Schema Evolution
So far, you've seen how to create a schema for a Pinot table. In this tutorial, we'll see how to evolve the schema, for example by adding a new column to it. This guide assumes you have a Pinot cluster up and running, and that there is an existing table named baseballStats.
Let's begin by fetching the existing schema through the controller API and saving it to a local file, baseballStats.schema.
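As a rough illustration of this step, the sketch below fetches a schema from the controller and writes it to a file. It assumes the controller is reachable at a URL such as http://localhost:9000 and exposes the schema under /schemas/<schemaName>; the helper names (schema_url, fetch_schema, save_schema) are hypothetical and not part of Pinot.

```python
import json
import urllib.request


def schema_url(controller_url: str, schema_name: str) -> str:
    """Build the controller URL for a named schema (assumed path: /schemas/<name>)."""
    return f"{controller_url.rstrip('/')}/schemas/{schema_name}"


def fetch_schema(controller_url: str, schema_name: str) -> dict:
    """GET the schema from the controller and parse the JSON response."""
    with urllib.request.urlopen(schema_url(controller_url, schema_name)) as resp:
        return json.load(resp)


def save_schema(schema: dict, path: str) -> None:
    """Write the schema to a local file so it can be edited by hand."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(schema, f, indent=2)
```

Against a running cluster, save_schema(fetch_schema("http://localhost:9000", "baseballStats"), "baseballStats.schema") would produce the file edited in the next step. Adjust the host and port to match your deployment.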
Next, edit baseballStats.schema and add a new column at the end of the schema. In this example, we're adding a new column called yearsOfExperience with a default value of 1.
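The edit can also be made programmatically. The sketch below appends a field spec to a schema held as a Python dict. The INT data type and the choice of dimensionFieldSpecs as the target group are assumptions made for illustration, the sample schema is a trimmed-down stand-in for the real one, and add_column is a hypothetical helper.

```python
import copy

# Field-spec groups that can appear in a Pinot schema.
FIELD_GROUPS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def add_column(schema, name, data_type, default_null_value, group="dimensionFieldSpecs"):
    """Return a copy of the schema with one new field spec appended to `group`."""
    existing = {f["name"] for g in FIELD_GROUPS for f in (schema.get(g) or [])}
    if name in existing:
        raise ValueError(f"column {name!r} already exists in the schema")
    updated = copy.deepcopy(schema)  # leave the caller's schema untouched
    specs = updated.get(group) or []
    specs.append(
        {"name": name, "dataType": data_type, "defaultNullValue": default_null_value}
    )
    updated[group] = specs
    return updated


# Trimmed-down stand-in for the real baseballStats schema.
schema = {
    "schemaName": "baseballStats",
    "dimensionFieldSpecs": [{"name": "playerID", "dataType": "STRING"}],
}
updated = add_column(schema, "yearsOfExperience", "INT", 1)
```

Returning a copy rather than mutating the input keeps the fetched schema available for comparison before the change is uploaded.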
You can now update the schema on the cluster with the edited file.
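One possible way to apply the update is through the controller's REST API. The sketch below assumes the controller accepts a PUT to /schemas/<schemaName> with the schema as a JSON body, which may depend on your Pinot version; the function names are hypothetical.

```python
import json
import urllib.request


def build_schema_update(controller_url: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying the edited schema as JSON."""
    url = f"{controller_url.rstrip('/')}/schemas/{schema['schemaName']}"
    return urllib.request.Request(
        url,
        data=json.dumps(schema).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def update_schema(controller_url: str, schema: dict) -> str:
    """Send the update and return the controller's response body as text."""
    with urllib.request.urlopen(build_schema_update(controller_url, schema)) as resp:
        return resp.read().decode("utf-8")
```

Separating request construction from sending makes it easy to inspect the URL and payload before anything is changed on the cluster.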
Please note: the change will not be reflected immediately. For the new column to show up, you need to reload the table's segments.
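A reload can be triggered through the controller. The sketch below assumes a POST to /segments/<tableName>/reload on a controller at a URL such as http://localhost:9000; the function names are hypothetical.

```python
import urllib.request


def build_reload_request(controller_url: str, table_name: str) -> urllib.request.Request:
    """Build a POST request asking the controller to reload all segments of a table."""
    url = f"{controller_url.rstrip('/')}/segments/{table_name}/reload"
    # An empty body makes urllib send an explicit Content-Length of 0.
    return urllib.request.Request(url, data=b"", method="POST")


def reload_segments(controller_url: str, table_name: str) -> str:
    """Send the reload request and return the controller's response body as text."""
    with urllib.request.urlopen(build_reload_request(controller_url, table_name)) as resp:
        return resp.read().decode("utf-8")
```

The reload happens in the background on the servers, so the new column may take a moment to appear after the call returns.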
After the reload, you can query the new column.
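As an illustration, the sketch below sends a SQL query to a broker. It assumes the broker exposes a /query/sql endpoint that accepts a JSON body with a "sql" key, and that the table has a playerID column; the broker address you pass in (for example http://localhost:8099) depends on your setup, and the function names are hypothetical.

```python
import json
import urllib.request

# Example query; playerID is assumed to be an existing column of baseballStats.
QUERY = "SELECT playerID, yearsOfExperience FROM baseballStats LIMIT 10"


def build_query_request(broker_url: str, sql: str) -> urllib.request.Request:
    """Build a POST request that submits a SQL query to the broker."""
    return urllib.request.Request(
        f"{broker_url.rstrip('/')}/query/sql",
        data=json.dumps({"sql": sql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(broker_url: str, sql: str) -> dict:
    """Send the query and return the parsed JSON response."""
    with urllib.request.urlopen(build_query_request(broker_url, sql)) as resp:
        return json.load(resp)
```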
At this point, the query returns the defaultNullValue for the newly added column. To populate this column with real values, you will need to re-run the batch ingestion job for the past dates.
Real-Time Pinot table: For real-time tables, make sure the "pinot.server.instance.reload.consumingSegment" config is set to true in the server configuration. Without this, the current consuming segment(s) will not reflect the default null value for newly added columns.
New columns can be added with ingestion transforms. If all the source columns for the new column exist in the schema, the transformed values will be generated for the new column instead of default values. Note that the derived column and its data type must first be defined in the schema before you change the ingestion transform in the table config.
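To make the table config change concrete, the sketch below appends a transform entry under ingestionConfig.transformConfigs in a table config held as a Python dict. The derived column name hitsPlusHomeRuns, the source columns hits and homeRuns, and the add_transform helper are hypothetical, chosen only for illustration; the sample table config is a stub, not a complete one.

```python
import copy


def add_transform(table_config: dict, column_name: str, transform_function: str) -> dict:
    """Return a copy of the table config with one ingestion transform appended."""
    updated = copy.deepcopy(table_config)  # leave the caller's config untouched
    ingestion = updated.get("ingestionConfig") or {}
    transforms = ingestion.get("transformConfigs") or []
    transforms.append(
        {"columnName": column_name, "transformFunction": transform_function}
    )
    ingestion["transformConfigs"] = transforms
    updated["ingestionConfig"] = ingestion
    return updated


# Stub table config; a real one carries many more settings.
table_config = {"tableName": "baseballStats_OFFLINE", "tableType": "OFFLINE"}

# Hypothetical derived column computed from two assumed source columns.
with_transform = add_transform(table_config, "hitsPlusHomeRuns", "plus(hits, homeRuns)")
```

The derived column still has to be declared in the schema, with a matching data type, before a table config like this is applied.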
Real-Time Pinot table: Backfilling data does not work for real-time tables. If you only have a real-time table, you can convert it to a hybrid table by adding an offline counterpart that uses the same schema. You can then backfill the offline table to fill in values for the newly added column.