arrow-left

All pages
gitbookPowered by GitBook
1 of 1

Loading...

Frequent questions

This page has a collection of frequently asked questions with answers from the community.

circle-info

This is a list of frequent questions most often asked in our troubleshooting channel on Slack. Please feel free to contribute your questions and answers here and make a pull request.

hashtag
Ingestion

hashtag
How do I flatten my JSON Kafka stream?

We have toJsonStr(key) function which can store a top level json field as a STRING in Pinot.

Then you can use jsonExtractScalar(JSON_STRING_FIELD, JSON_PATH, OUTPUT_FORMAT) function during query time to fetch the desired field from the json string. For example

circle-exclamation

NOTE This works well if some of your fields are nested json, but most of your fields are top level json keys. If all of your fields are within a nested JSON key, you will have to store the entire payload as 1 column, which is not ideal.

Support for flattening during ingestion is on the roadmap:

hashtag
Indexing

hashtag
How to set inverted indexes?

Inverted indexes are set in the tableConfig's tableIndexConfig -> invertedIndexColumns list. Here's the documentation for tableIndexConfig: along with a sample table that has set inverted indexes on some columns.

Applying inverted indexes to a table config will generate inverted index to all new segments. In order to apply the inverted indexes to all existing segments, follow steps in

hashtag
How to apply inverted index to existing setup?

  1. Add the columns you wish to index to the tableIndexConfig-> invertedIndexColumns list. This sample table config show inverted indexes set: To update the table config use the Pinot Swagger API:

  2. Invoke the reload API:

Right now, there’s no easy way to confirm that reload succeeded. One way it to check out the index_map file inside the segment metadata, you should see inverted index entries for the new columns. An API for this is coming soon:

hashtag
How to apply star tree index?

hashtag
Querying

hashtag
What are all the fields in the Pinot query's JSON response?

Here's the page explaining the Pinot response format:

hashtag
SQL Query fails with "Encountered 'timestamp' was expecting one of..."

"timestamp" is a reserved keyword in SQL. Escape timestamp with double quotes.

Other commonly encountered reserved keywords are date, time, table.

hashtag
Filtering on STRING column WHERE column = "foo" does not work?

For filtering on STRING columns, use single quotes

hashtag
ORDER BY using an alias doesn't work?

The fields in the ORDER BY clause must be one of the group by clauses or aggregations, BEFORE applying the alias. Therefore, this will not work

Instead, this will work

hashtag
Operations

hashtag
Can I change a column name in my table, without losing data?

hashtag
How to change number of replicas of a table?

You can change the number of replicas by updating the table config's section. Make sure you have at least as many servers as the replication.

For OFFLINE table, update

For REALTIME table update

After changing the replication, run a .

hashtag
How to run a rebalance on a table?

A rebalance is run to reassign all the segments of a table to the available servers. This is typically done when capacity changes are done i.e. adding more servers or removing servers from a table.

Offline

Use the rebalance API from the Swagger APIs on the controller , with tableType OFFLINE

Realtime

Use the rebalance API from the Swagger APIs on the controller , with tableType REALTIME. A realtime table has 2 components, the consuming segments and the completed segments. By default, only the completed segments will get rebalanced. The consuming segments will pick the right assignment once they complete. But you can enforce the consuming segments to also be included in the rebalance, by setting the param includeConsuming to true. Note that rebalancing the consuming segments would mean the consuming segment will drop the consumed data so far, and restart consumption from the last offset, which may lead to a short duration of data staleness.

You can check the status of the rebalance by

  1. Checking the controller logs

  2. Running rebalance again after a while, you should receive status "status": "NO_OP"

  3. Checking the External View of the table, to see the changes in capacity/replicas have taken effect.

hashtag
Tuning and Optimizations

hashtag
Do replica groups work for real-time?

Yes, replica groups work for realtime. There's 2 parts to enabling replica groups:

  1. Replica groups segment assignment

  2. Replica group query routing

Replica group segment assignment

Replica group segment assignment is achieved in realtime, if number of servers is a multiple of number of replicas. The partitions get uniformly sprayed across the servers, creating replica groups.

For example, consider we have 6 partitions, 2 replicas, and 4 servers.

As you can see, the set (S0, S2) contains r1 of every partition, and (s1, S3) contains r2 of every partition. The query will only be routed to one of the sets, and not span every server. If you are are adding/removing servers from an existing table setup, you have to run for segment assignment changes to take effect.

Replica group query routing

Once replica group segment assignment is in effect, the query routing can take advantage of it. For replica group based query routing, set the following in the table config's section, and then restart brokers

S3

p5

S0

S1

p6

S2

S3

r1

r2

p1

S0

S1

p2

S2

S3

p3

S0

S1

p4

https://github.com/apache/incubator-pinot/issues/5264arrow-up-right
https://docs.pinot.apache.org/basics/components/table#tableindexconfig-1arrow-up-right
How to apply inverted index to existing setup?
https://docs.pinot.apache.org/basics/components/table#offline-table-config arrow-up-right
http://localhost:9000/help#!/Table/updateTableConfigarrow-up-right
http://localhost:9000/help#!/Segment/reloadAllSegmentsarrow-up-right
https://github.com/apache/incubator-pinot/issues/5390arrow-up-right
https://docs.pinot.apache.org/users/api/querying-pinot-using-standard-sql/response-formatarrow-up-right
segmentsConfigarrow-up-right
replicationarrow-up-right
replicasPerPartitionarrow-up-right
table rebalance
http://localhost:9000/help#!/Table/rebalancearrow-up-right
http://localhost:9000/help#!/Table/rebalancearrow-up-right
rebalance
routingarrow-up-right

S2

Select jsonExtractScalar(myJsonMapStr,'$.k1','STRING') 
    from myTable  
    where jsonExtractScalar(myJsonMapStr,'$.k1','STRING') = 'value-k1-0'"
Select sum(jsonExtractScalar(complexMapStr,'$.k4.met','INT')) 
    from myTable 
    group by jsonExtractScalar(complexMapStr,'$.k1','STRING')
select "timestamp" from myTable
SELECT COUNT(*) from myTable WHERE column = 'foo'
SELECT count(colA) as aliasA, colA from tableA GROUP BY colA ORDER BY aliasA
SELECT count(colA) as sumA, colA from tableA GROUP BY colA ORDER BY count(colA)
{ 
    "tableName": "pinotTable", 
    "tableType": "OFFLINE", 
    "segmentsConfig": {
      "replication": "3", 
      ... 
    }
    ..
{ 
    "tableName": "pinotTable", 
    "tableType": "REALTIME", 
    "segmentsConfig": {
      "replicasPerPartition": "3", 
      ... 
    }
    ..
{
    "tableName": "pinotTable", 
    "tableType": "REALTIME",
    "routing": {
        "instanceSelectorType": "replicaGroup"
    }
    ..
}