arrow-left

All pages
gitbookPowered by GitBook
1 of 1

Loading...

JSON index

This page describes configuring the JSON index for Apache Pinot.

The JSON index can be applied to JSON string columns to accelerate value lookups and filtering for the column.

hashtag
When to use JSON index

Use the JSON string can be used to represent array, map, and nested fields without forcing a fixed schema. While JSON strings are flexible, filtering on JSON string columns is expensive, so consider the use case.

Suppose we have some JSON records similar to the following sample record stored in the person column:

Without an index, to look up the key and filter records based on the value, Pinot must scan and reconstruct the JSON object from the JSON string for every record, look up the key and then compare the value.

For example, in order to find all persons whose name is "adam", the query will look like:

The JSON index is designed to accelerate the filtering on JSON string columns without scanning and reconstructing all the JSON objects.

hashtag
Enable and configure a JSON index

To enable the JSON index, you can configure the following options in the table configuration:

Config Key
Description
Type
Default

hashtag
Recommended way to configure

The recommended way to configure a JSON index is in the fieldConfigList.indexes object, within the json key.

All options are optional, so the following is a valid configuration that use the default parameter values:

hashtag
Deprecated ways to configure JSON indexes

There are two older ways to configure the indexes that can be configured in the tableIndexConfig section inside table config.

The first one uses the same JSON explained above, but it is defined inside tableIndexConfig.jsonIndexConfigs.<column name>:

Like in the previous case, all parameters are optional, so the following is also valid:

The last option does not support to configure any parameter. In order to use this option, add the name of the column in tableIndexConfig.jsonIndexColumns like in this example:

hashtag
Example:

With the following JSON document:

Using the default setting, we will flatten the document into the following records:

With maxLevels set to 1:

With maxLevels set to 2:

With excludeArray set to true:

With disableCrossArrayUnnest set to true:

With includePaths set to ["$.name", "$.addresses[*].country"]:

With excludePaths set to ["$.age", "$.addresses[*].number"]:

With excludeFields set to ["age", "street"]:

Note that the JSON index can only be applied to STRING/JSON columns whose values are JSON strings.

circle-info

To reduce unnecessary storage overhead when using a JSON index, we recommend that you add the indexed column to the noDictionaryColumns columns list.

For instructions on that configuration property, see the documentation.

hashtag
How to use the JSON index

The JSON index can be used via the JSON_MATCH predicate: JSON_MATCH(<column>, '<filterExpression>'). For example, to find every entry with the name "adam":

Note that the quotes within the filter expression need to be escaped.

hashtag
Supported filter expressions

hashtag
Simple key lookup

Find all persons whose name is "adam":

hashtag
Chained key lookup

Find all persons who have an address (one of the addresses) with number 112:

hashtag
Nested filter expression

Find all persons whose name is "adam" and also have an address (one of the addresses) with number 112:

hashtag
Array access

Find all persons whose first address has number 112:

hashtag
Existence check

Find all persons who have a phone field within the JSON:

Find all persons whose first address does not contain floor field within the JSON:

hashtag
JSON context is maintained

The JSON context is maintained for object elements within an array, meaning the filter won't cross-match different objects in the array.

To find all persons who live on "main st" in "ca":

This query won't match "adam" because none of his addresses matches both the street and the country.

If you don't want JSON context, use multiple separate JSON_MATCH predicates. For example, to find all persons who have addresses on "main st" and have addresses in "ca" (matches need not have the same address):

This query will match "adam" because one of his addresses matches the street and another one matches the country.

The array index is maintained as a separate entry within the element, so in order to query different elements within an array, multiple JSON_MATCH predicates are required. For example, to find all persons who have first address on "main st" and second address on "second st":

hashtag
Supported JSON values

hashtag
Object

See examples above.

hashtag
Array

To find the records with array element "item1" in "arrayCol":

To find the records with second array element "item2" in "arrayCol":

hashtag
Value

To find the records with value 123 in "valueCol":

hashtag
Null

To find the records with null in "nullableCol":

hashtag
Limitations

  1. The key (left-hand side) of the filter expression must be the leaf level of the JSON object, for example, "$.addresses[*]"='main st' won't work.

includePaths

Only include the given paths, e.g. "$.a.b", "$.a.c[*]" (mutual exclusive with excludePaths). Paths under the included paths will be included, e.g. "$.a.b.c" will be included when "$.a.b" is configured to be included.

Set<String>

null (include all paths)

excludePaths

Exclude the given paths, e.g. "$.a.b", "$.a.c[*]" (mutual exclusive with includePaths). Paths under the excluded paths will also be excluded, e.g. "$.a.b.c" will be excluded when "$.a.b" is configured to be excluded.

Set<String>

null (include all paths)

excludeFields

Exclude the given fields, e.g. "b", "c", even if it is under the included paths.

Set<String>

null (include all fields)

maxLevels

Max levels to flatten the json object (array is also counted as one level)

int

-1 (unlimited)

excludeArray

Whether to exclude array when flattening the object

boolean

false (include array)

disableCrossArrayUnnest

Whether to not unnest multiple arrays (unique combination of all elements)

boolean

Raw value forward index

false (calculate unique combination of all elements)

{
  "name": "adam",
  "age": 30,
  "country": "us",
  "addresses":
  [
    {
      "number" : 112,
      "street" : "main st",
      "country" : "us"
    },
    {
      "number" : 2,
      "street" : "second st",
      "country" : "us"
    },
    {
      "number" : 3,
      "street" : "third st",
      "country" : "ca"
    }
  ]
}
SELECT * 
FROM mytable 
WHERE JSON_EXTRACT_SCALAR(person, '$.name', 'STRING') = 'adam'
json index defined in tableConfig
{
  "fieldConfigList": [
    {
      "name": "person",
      "indexes": {
        "json": {
          "maxLevels": 2,
          "excludeArray": false,
          "disableCrossArrayUnnest": true,
          "includePaths": null,
          "excludePaths": null,
          "excludeFields": null
        }
      }
    }
  ],
  ...
}
json index defined in tableConfig
{
  "fieldConfigList": [
    {
      "name": "person",
      "indexes": {
        "json": {}
      }
    }
  ],
  ...
}
older way to configure json indexes in table config
{
  "tableIndexConfig": {
    "jsonIndexConfigs": {
      "person": {
        "maxLevels": 2,
        "excludeArray": false,
        "disableCrossArrayUnnest": true,
        "includePaths": null,
        "excludePaths": null,
        "excludeFields": null
      },
      ...
    },
    ...
  }
}
json index with default config
{
  "tableIndexConfig": {
    "jsonIndexConfigs": {
      "person": {},
      ...
    },
    ...
  }
}
json index with default config
{
  "tableIndexConfig": {        
    "jsonIndexColumns": [
      "person",
      ...
    ],
    ...
  }
}
{
  "name": "adam",
  "age": 20,
  "addresses": [
    {
      "country": "us",
      "street": "main st",
      "number": 1
    },
    {
      "country": "ca",
      "street": "second st",
      "number": 2
    }
  ],
  "skills": [
    "english",
    "programming"
  ]
}
{
  "name": "adam",
  "age": 20,
  "addresses[0].country": "us",
  "addresses[0].street": "main st",
  "addresses[0].number": 1,
  "skills[0]": "english"
},
{
  "name": "adam",
  "age": 20,
  "addresses[0].country": "us",
  "addresses[0].street": "main st",
  "addresses[0].number": 1,
  "skills[1]": "programming"
},
{
  "name": "adam",
  "age": 20,
  "addresses[1].country": "ca",
  "addresses[1].street": "second st",
  "addresses[1].number": 2,
  "skills[0]": "english"
},
{
  "name": "adam",
  "age": 20,
  "addresses[1].country": "ca",
  "addresses[1].street": "second st",
  "addresses[1].number": 2,
  "skills[1]": "programming"
}
{
  "name": "adam",
  "age": 20
}
{
  "name": "adam",
  "age": 20,
  "skills[0]": "english"
},
{
  "name": "adam",
  "age": 20,
  "skills[1]": "programming"
}
{
  "name": "adam",
  "age": 20
}
{
  "name": "adam",
  "age": 20,
  "addresses[0].country": "us",
  "addresses[0].street": "main st",
  "addresses[0].number": 1
},
{
  "name": "adam",
  "age": 20,
  "addresses[0].country": "us",
  "addresses[0].street": "main st",
  "addresses[0].number": 1
},
{
  "name": "adam",
  "age": 20,
  "skills[0]": "english"
},
{
  "name": "adam",
  "age": 20,
  "skills[1]": "programming"
}
{
  "name": "adam",
  "addresses[0].country": "us"
},
{
  "name": "adam",
  "addresses[1].country": "ca"
}
{
  "name": "adam",
  "addresses[0].country": "us",
  "addresses[0].street": "main st",
  "skills[0]": "english"
},
{
  "name": "adam",
  "addresses[0].country": "us",
  "addresses[0].street": "main st",
  "skills[1]": "programming"
},
{
  "name": "adam",
  "addresses[1].country": "ca",
  "addresses[1].street": "second st",
  "skills[0]": "english"
},
{
  "name": "adam",
  "addresses[1].country": "ca",
  "addresses[1].street": "second st",
  "skills[1]": "programming"
}
{
  "name": "adam",
  "addresses[0].country": "us",
  "addresses[0].number": 1,
  "skills[0]": "english"
},
{
  "name": "adam",
  "addresses[0].country": "us",
  "addresses[0].number": 1,
  "skills[1]": "programming"
},
{
  "name": "adam",
  "addresses[1].country": "ca",
  "addresses[1].number": 2,
  "skills[0]": "english"
},
{
  "name": "adam",
  "addresses[1].country": "ca",
  "addresses[1].number": 2,
  "skills[1]": "programming"
}
SELECT ... 
FROM mytable 
WHERE JSON_MATCH(person, '"$.name"=''adam''')
SELECT ... 
FROM mytable 
WHERE JSON_MATCH(person, '"$.name"=''adam''')
SELECT ... 
FROM mytable 
WHERE JSON_MATCH(person, '"$.addresses[*].number"=112')
SELECT ... 
FROM mytable 
WHERE JSON_MATCH(person, '"$.name"=''adam'' AND "$.addresses[*].number"=112')
SELECT ... 
FROM mytable 
WHERE JSON_MATCH(person, '"$.addresses[0].number"=112')
SELECT ... 
FROM mytable 
WHERE JSON_MATCH(person, '"$.phone" IS NOT NULL')
SELECT ... 
FROM mytable
WHERE JSON_MATCH(person, '"$.addresses[0].floor" IS NULL')
SELECT ... 
FROM mytable 
WHERE JSON_MATCH(person, '"$.addresses[*].street"=''main st'' AND "$.addresses[*].country"=''ca''')
SELECT ... 
FROM mytable 
WHERE JSON_MATCH(person, '"$.addresses[*].street"=''main st''') AND JSON_MATCH(person, '"$.addresses[*].country"=''ca''')
SELECT ... 
FROM mytable 
WHERE JSON_MATCH(person, '"$.addresses[0].street"=''main st''') AND JSON_MATCH(person, '"$.addresses[1].street"=''second st''')
["item1", "item2", "item3"]
SELECT ... 
FROM mytable 
WHERE JSON_MATCH(arrayCol, '"$[*]"=''item1''')
SELECT ... 
FROM mytable 
WHERE JSON_MATCH(arrayCol, '"$[1]"=''item2''')
123
1.23
"Hello World"
SELECT ... 
FROM mytable 
WHERE JSON_MATCH(valueCol, '"$"=123')
null
SELECT ... 
FROM mytable 
WHERE JSON_MATCH(nullableCol, '"$" IS NULL')