Schema Evolution
So far, you've seen how to create a schema for a Pinot table. In this tutorial, we'll see how to evolve the schema (e.g. add a new column to it). This guide assumes you have a Pinot cluster up and running. We will also assume there's an existing table, baseballStats.
Let's begin by first fetching the existing schema. We can do this using the controller API:
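As a minimal sketch, the fetch could look like the following in Python, using only the standard library. It assumes the controller is reachable at `http://localhost:9000` (a common local default, not stated in this guide) and that schemas are served at `/schemas/{schemaName}`. The helper names are made up for illustration.

```python
import json
import urllib.request

# Assumption: controller address of a local cluster; adjust for your setup.
CONTROLLER_URL = "http://localhost:9000"


def schema_url(schema_name: str, controller_url: str = CONTROLLER_URL) -> str:
    """Return the controller endpoint that serves a schema by name."""
    return f"{controller_url.rstrip('/')}/schemas/{schema_name}"


def fetch_schema(schema_name: str, controller_url: str = CONTROLLER_URL) -> dict:
    """GET the schema JSON from the controller and parse it into a dict."""
    with urllib.request.urlopen(schema_url(schema_name, controller_url)) as response:
        return json.load(response)


def save_schema(schema: dict, path: str) -> None:
    """Write the schema to a local file so it can be edited."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(schema, f, indent=2)
```

Against a running cluster, `save_schema(fetch_schema("baseballStats"), "baseballStats.schema")` would write the current schema to a local file named baseballStats.schema.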
Let's add a new column at the end of the schema by editing baseballStats.schema. In this example, we're adding a new column called yearsOfExperience with a default value of 1.
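The edit can also be made programmatically. The sketch below appends a field spec for yearsOfExperience to a schema held as a Python dict. The `INT` data type and the choice of `metricFieldSpecs` as the target list are assumptions made for illustration, and `add_column` and `edit_schema_file` are hypothetical helpers, not part of Pinot.

```python
import copy
import json

# Lists in a Pinot schema that hold column definitions.
FIELD_SPEC_SECTIONS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")

# The new column described above. The INT data type is an assumption.
NEW_COLUMN = {"name": "yearsOfExperience", "dataType": "INT", "defaultNullValue": 1}


def add_column(schema: dict, field_spec: dict, section: str = "metricFieldSpecs") -> dict:
    """Return a copy of `schema` with `field_spec` appended to `section`."""
    if section not in FIELD_SPEC_SECTIONS:
        raise ValueError(f"unknown schema section: {section}")
    # Collect every column name already defined, across all sections.
    existing = {
        spec["name"]
        for key in FIELD_SPEC_SECTIONS
        for spec in schema.get(key, [])
    }
    if field_spec["name"] in existing:
        raise ValueError(f"column already exists: {field_spec['name']}")
    updated = copy.deepcopy(schema)  # leave the caller's dict untouched
    updated.setdefault(section, []).append(dict(field_spec))
    return updated


def edit_schema_file(path: str, field_spec: dict = NEW_COLUMN) -> None:
    """Load the schema file, append the column, and write it back."""
    with open(path, encoding="utf-8") as f:
        schema = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(add_column(schema, field_spec), f, indent=2)
```

Returning a copy rather than mutating in place keeps the fetched schema available for comparison before anything is uploaded.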
You can now update the schema on the controller.
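One way to push the edited schema, sketched here as an assumption rather than as this guide's own command, is a `PUT` to the controller's `/schemas/{schemaName}` endpoint with the schema as a JSON body. The controller address and the helper names are illustrative.

```python
import json
import urllib.request

# Assumption: controller address of a local cluster; adjust for your setup.
CONTROLLER_URL = "http://localhost:9000"


def build_schema_update(schema: dict, controller_url: str = CONTROLLER_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT request that replaces the named schema."""
    url = f"{controller_url.rstrip('/')}/schemas/{schema['schemaName']}"
    body = json.dumps(schema).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request: urllib.request.Request) -> str:
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a cluster running, `send(build_schema_update(schema))` would submit the update. Building the request separately from sending it makes it easy to inspect the URL and payload first.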
Please note: the schema change is not reflected immediately. For the new column to show up, you need to reload the table segments.
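A reload can be triggered through the controller. The sketch below assumes the `POST /segments/{tableName}/reload` endpoint and a controller at `http://localhost:9000`. The function name is hypothetical.

```python
import urllib.request

# Assumption: controller address of a local cluster; adjust for your setup.
CONTROLLER_URL = "http://localhost:9000"


def build_reload_request(table_name: str, controller_url: str = CONTROLLER_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request that reloads all segments of a table."""
    url = f"{controller_url.rstrip('/')}/segments/{table_name}/reload"
    # An empty body is enough; the table is identified by the URL path.
    return urllib.request.Request(url, data=b"", method="POST")
```

Passing the result to `urllib.request.urlopen` sends the request, for example `urllib.request.urlopen(build_reload_request("baseballStats"))`.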
After the reload, you can query the new column.
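A query for the new column might be issued as in the sketch below. It assumes the controller accepts SQL at `POST /sql` with a JSON body of the form `{"sql": "..."}` and that the response carries a `resultTable` with `dataSchema.columnNames` and `rows`. The query text and helper names are illustrative.

```python
import json
import urllib.request

# Assumption: controller address of a local cluster; adjust for your setup.
CONTROLLER_URL = "http://localhost:9000"

# Illustrative query against the newly added column.
QUERY = "SELECT yearsOfExperience FROM baseballStats LIMIT 10"


def build_query_request(sql: str, controller_url: str = CONTROLLER_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs a SQL query."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{controller_url.rstrip('/')}/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def column_values(response: dict, column: str) -> list:
    """Extract the values of one column from a parsed query response."""
    table = response["resultTable"]
    index = table["dataSchema"]["columnNames"].index(column)
    return [row[index] for row in table["rows"]]
```

Once the response body is parsed with `json.loads`, `column_values(parsed, "yearsOfExperience")` returns the values of the new column row by row.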
The query returns the defaultNullValue for the newly added column. To populate this column with real values, you will need to re-run the batch ingestion job for the past dates.
Real-Time Pinot table: For real-time tables, make sure the "pinot.server.instance.reload.consumingSegment" config is set to true in the server configuration. Without this, the current consuming segment(s) will not reflect the default null value for newly added columns.
New columns can also be added with ingestion transforms. If all the source columns for the new column exist in the schema, the transformed values will be generated for the new column instead of filling in default values.
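As a sketch of what such a derived column could look like, the helper below adds an entry to `ingestionConfig.transformConfigs` in a table config held as a Python dict. It assumes transforms are declared as `columnName` / `transformFunction` pairs in the table config. The helper name is hypothetical, and the new column still has to be added to the schema as described earlier.

```python
import copy


def add_transform(table_config: dict, column_name: str, transform_function: str) -> dict:
    """Return a copy of `table_config` with one more ingestion transform."""
    updated = copy.deepcopy(table_config)  # leave the caller's dict untouched
    ingestion = updated.setdefault("ingestionConfig", {})
    transforms = ingestion.setdefault("transformConfigs", [])
    if any(t["columnName"] == column_name for t in transforms):
        raise ValueError(f"transform already defined for column: {column_name}")
    transforms.append(
        {"columnName": column_name, "transformFunction": transform_function}
    )
    return updated
```

For instance, `add_transform(config, "hitsAndHomeRuns", "plus(hits, homeRuns)")` would declare a hypothetical column computed from two source columns. Both the column name and the function expression here are illustrative and should be checked against the functions your Pinot version supports.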
Real-Time Pinot table: Backfilling data does not work for real-time tables. If you only have a real-time table, you can convert it to a hybrid table by adding an offline counterpart that uses the same schema. You can then backfill the offline table to fill in values for the newly added column.