arrow-left

All pages
gitbookPowered by GitBook
1 of 1

Loading...

Schema Evolution

So far, you've seen how to create a new schemaarrow-up-right for a Pinot table. In this tutorial, we'll see how to evolve the schema (e.g. add a new column to the schema). This guide assumes you have a Pinot cluster up and running (eg: as mentioned in https://docs.pinot.apache.org/basics/getting-started/running-pinot-locallyarrow-up-right). We will also assume there's an existing table baseballStats created as part of the batch quick startarrow-up-right.

circle-info

Pinot only allows adding new columns to the schema. In order to drop a column, change the column name or data type, a new table has to be created.

hashtag
Get the existing schema

Let's begin by first fetching the existing schema. We can do this using the controller API:

hashtag
Add a new column

Let's add a new column at the end of the schema, something like this (by editing baseballStats.schema

In this example, we're adding a new column called yearsOfExperience with a default value of 1.

hashtag
Update the schema

You can now update the schema using the following command

Please note: this will not be reflected immediately. You can use the following command to reload the table segments for this column to show up. This can be done as follows:

After the reload, now you can query the new column as shown below:

circle-info

Real-Time Pinot table: In case of real-time tables, make sure the "pinot.server.instance.reload.consumingSegment" config is set to true inside . Without this, the current consuming segment(s) will not reflect the default null value for newly added columns.

Note that the real values for the newly added columns won't be reflected within the current consuming segment(s). The next consuming segment(s) will start consuming the real values.

hashtag
Derived Column

New columns can be added with . If all the source columns for the new column exist in the schema, the transformed values will be generated for the new column instead of filling default values.

hashtag
Backfilling the Data

As you can observe, the current query returns the defaultNullValue for the newly added column. In order to populate this column with real values, you will need to re-run the batch ingestion job for the past dates.

circle-info

Real-Time Pinot table: Backfilling data does not work for real-time tables. If you only have a real-time table, you can convert it to a hybrid table, by adding an offline counterpart that uses the same schema. Then you can backfill the offline table and fill in values for the newly added column. More on .

Server configarrow-up-right
ingestion transforms
hybrid tables herearrow-up-right
$ curl localhost:9000/schemas/baseballStats > baseballStats.schema
{
  "schemaName" : "baseballStats",
  "dimensionFieldSpecs" : [ {
  
    ...
    
    }, {
    "name" : "myNewColumn",
    "dataType" : "INT",
    "defaultNullValue": 1
  } ]
}
bin/pinot-admin.sh AddSchema -schemaFile baseballStats.schema -exec
$ curl -F [email protected] localhost:9000/schemas
$ curl -X POST localhost:9000/segments/baseballStats/reload
$ bin/pinot-admin.sh PostQuery \
  -queryType sql \
  -brokerPort 8000 \
  -query "select playerID, yearsOfExperience from baseballStats limit 10" 2>/dev/null
Executing command: PostQuery -brokerHost 192.168.86.234 -brokerPort 8000 -queryType sql -query select playerID, yearsOfExperience from baseballStats limit 10
Result: {"resultTable":{"dataSchema":{"columnNames":["playerID","yearsOfExperience"],"columnDataTypes":["STRING","INT"]},"rows":[["aardsda01",1],["aardsda01",1],["aardsda01",1],["aardsda01",1],["aardsda01",1],["aardsda01",1],["aardsda01",1],["aaronha01",1],["aaronha01",1],["aaronha01",1]]},"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":1,"numSegmentsProcessed":1,"numSegmentsMatched":1,"numConsumingSegmentsQueried":0,"numDocsScanned":10,"numEntriesScannedInFilter":0,"numEntriesScannedPostFilter":20,"numGroupsLimitReached":false,"totalDocs":97889,"timeUsedMs":3,"segmentStatistics":[],"traceInfo":{},"minConsumingFreshnessTimeMs":0}