Querying JSON data
To see how JSON data can be queried, assume that we have the following table:
1
Table myTable:
2
id INTEGER
3
jsoncolumn JSON
4
5
Table data:
6
101,{"name":{"first":"daffy"\,"last":"duck"}\,"score":101\,"data":["a"\,"b"\,"c"\,"d"]}
7
102,{"name":{"first":"donald"\,"last":"duck"}\,"score":102\,"data":["a"\,"b"\,"e"\,"f"]}
8
103,{"name":{"first":"mickey"\,"last":"mouse"}\,"score":103\,"data":["a"\,"b"\,"g"\,"h"]}
9
104,{"name":{"first":"minnie"\,"last":"mouse"}\,"score":104\,"data":["a"\,"b"\,"i"\,"j"]}
10
105,{"name":{"first":"goofy"\,"last":"dwag"}\,"score":104\,"data":["a"\,"b"\,"i"\,"j"]}
11
106,{"person":{"name":"daffy duck"\,"companies":[{"name":"n1"\,"title":"t1"}\,{"name":"n2"\,"title":"t2"}]}}
12
107,{"person":{"name":"scrooge mcduck"\,"companies":[{"name":"n1"\,"title":"t1"}\,{"name":"n2"\,"title":"t2"}]}}
Copied!
We also assume that "jsoncolumn" has a Json Index on it. Note that the last two rows in the table have different structure than the rest of the rows. In keeping with JSON specification, a JSON column can contain any valid JSON data and doesn't need to adhere to a predefined schema. To pull out the entire JSON document for each row, we can run the query below:
1
SELECT id, jsoncolumn FROM myTable.
2
3
id,jsoncolumn
4
"101","{"name":{"first":"daffy","last":"duck"},"score":101,"data":["a","b","c","d"]}"
5
"102","{"name":{"first":"donald","last":"duck"},"score":102,"data":["a","b","e","f"]}"
6
"103","{"name":{"first":"mickey","last":"mouse"},"score":103,"data":["a","b","g","h"]}"
7
"104","{"name":{"first":"minnie","last":"mouse"},"score":104,"data":["a","b","i","j"]}"
8
"105","{"name":{"first":"goofy","last":"dwag"},"score":104,"data":["a","b","i","j"]}"
9
"106","{"person":{"name":"daffy duck","companies":[{"name":"n1","title":"t1"},{"name":"n2","title":"t2"}]}}"
10
"107","{"person":{"name":"scrooge mcduck","companies":[{"name":"n1","title":"t1"},{"name":"n2","title":"t2"}]}}"
Copied!
To drill down and pull out specific keys within the JSON column, we simply append the JsonPath expression of those keys to the end of the column name.
1
SELECT id, jsoncolumn.name.last, jsoncolumn.name.first, jsoncolumn.data[1] FROM myTable
2
3
id,jsoncolumn.name.last,jsoncolumn.name.first,jsoncolumn.data[1]
4
"101","duck","daffy","b"
5
"102","duck","donald","b"
6
"103","mouse","mickey","b"
7
"104","mouse","minnie","b"
8
"105","dwag","goofy","b"
9
"106","null","null","null"
10
"107","null","null","null"
Copied!
Note that the third column (jsoncolumn.data[1]) is null for rows with id 106 and 107. This is because these rows have JSON documents that don't have a key with JsonPath jsoncolumn.data[1]. We can filter out these rows.
1
SELECT id, jsoncolumn.name.last, jsoncolumn.name.first, jsoncolumn.data[1] FROM myTable WHERE jsoncolumn.data[1] IS NOT NULL
2
3
id,jsoncolumn.name.last,jsoncolumn.name.first,jsoncolumn.data[1]
4
"101","duck","daffy","b"
5
"102","duck","donald","b"
6
"103","mouse","mickey","b"
7
"104","mouse","minnie","b"
8
"105","dwag","goofy","b"
Copied!
Notice that certain last names (duck and mouse for example) repeat in the data above. We can get a count of each last name by running a GROUP BY query on a JsonPath expression.
1
SELECT jsoncolumn.name.last, count(*) FROM myTable WHERE jsoncolumn.data[1] IS NOT NULL GROUP BY jsoncolumn.name.last ORDER BY 2 DESC
2
3
jsoncolumn.name.last,count(*)
4
"mouse","2"
5
"duck","2"
6
"dwag","1"
Copied!
Also there is numerical information (jsconcolumn.score) embeded within the JSON document. We can extract those numerical values from JSON data into SQL and sum them up using the query below.
1
SELECT jsoncolumn.name.last, sum(jsoncolumn.score) FROM myTable WHERE jsoncolumn.name.last IS NOT NULL GROUP BY jsoncolumn.name.last
2
3
jsoncolumn.name.last,sum(jsoncolumn.score)
4
"mouse","207"
5
"dwag","104"
6
"duck","203"
Copied!
In short, JSON querying support in Pinot will allow you to use a JsonPath expression whereever you can use a column name with the only difference being that to query a column with data type JSON, you must append a JsonPath expression after the name of the column.
Last modified 2mo ago
Copy link