Schema Evolution
Schemas rarely stay fixed. As business requirements evolve and data formats or structures change, you can update your Pinot schemas to keep them current. If you're just starting out with schemas in Pinot, see how to create a schema for a Pinot table.
In this tutorial, you'll learn how to add a new column to your schema, load data to the updated schema, run a query to test the updated schema, and backfill data.
Before you get started, you must have a Pinot cluster up and running, and a baseballStats table (created when you set up a Pinot cluster using the Quickstart option). For more information, see how to set up a cluster using the Quickstart option.
Fetch the existing schema using the controller API:
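As a minimal sketch of this step, the snippet below reads the schema from the controller's `/schemas/{schemaName}` endpoint and saves it to a local file. The controller address `http://localhost:9000` and the helper names are assumptions for illustration; adjust them to your cluster.

```python
import json
import urllib.request

# Assumption: the Quickstart controller listens on localhost:9000.
CONTROLLER = "http://localhost:9000"


def schema_url(schema_name, controller=CONTROLLER):
    """Build the controller URL that serves a schema by name."""
    return f"{controller}/schemas/{schema_name}"


def fetch_schema(schema_name, controller=CONTROLLER):
    """GET the schema from the controller and parse it as JSON."""
    with urllib.request.urlopen(schema_url(schema_name, controller)) as resp:
        return json.load(resp)


def save_schema(schema, path):
    """Write the schema to a local file so it can be edited."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(schema, f, indent=2)


# Usage (needs a running cluster):
# save_schema(fetch_schema("baseballStats"), "baseballStats.schema")
```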
Edit the baseballStats.schema file to include a new column at the end of the schema. For example, here we're adding a new column called yearsOfExperience with a dataType of INT and a defaultNullValue of 1.
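The edit can also be made programmatically. The sketch below appends the new column to a schema held as a Python dict. Two things are assumptions: the `add_column` helper is hypothetical, and the column is placed in `dimensionFieldSpecs`; pass `spec_list="metricFieldSpecs"` if the column belongs with the metrics in your schema. The `sample` dict is a tiny stand-in, not the real baseballStats schema.

```python
import json

# Field-spec lists that a Pinot schema can contain.
SPEC_LISTS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def add_column(schema, name, data_type, default_null_value,
               spec_list="dimensionFieldSpecs"):
    """Return a copy of `schema` with one new column appended to `spec_list`."""
    if spec_list not in SPEC_LISTS:
        raise ValueError(f"unknown field-spec list: {spec_list!r}")
    # Refuse to add a column whose name is already used anywhere in the schema.
    for key in SPEC_LISTS:
        if any(spec.get("name") == name for spec in schema.get(key, [])):
            raise ValueError(f"column {name!r} already exists in {key}")
    updated = json.loads(json.dumps(schema))  # deep copy via JSON round-trip
    updated.setdefault(spec_list, []).append({
        "name": name,
        "dataType": data_type,
        "defaultNullValue": default_null_value,
    })
    return updated


# Stand-in schema for illustration only.
sample = {
    "schemaName": "baseballStats",
    "dimensionFieldSpecs": [{"name": "playerName", "dataType": "STRING"}],
}
updated = add_column(sample, "yearsOfExperience", "INT", 1)
print(json.dumps(updated["dimensionFieldSpecs"][-1]))
```

Returning a copy keeps the fetched schema untouched, which makes it easy to compare the old and new versions before uploading.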
Update the schema using the following command:
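A hedged sketch of the update call: it sends the edited schema to the controller with a `PUT` on `/schemas/{schemaName}`. It assumes the controller accepts the schema as a JSON request body; the original command may instead have uploaded the file as multipart form data. The controller address and helper names are illustrative.

```python
import json
import urllib.request

# Assumption: the Quickstart controller listens on localhost:9000.
CONTROLLER = "http://localhost:9000"


def build_schema_update(schema, controller=CONTROLLER):
    """Build (but do not send) the PUT request that replaces a schema."""
    body = json.dumps(schema).encode("utf-8")
    return urllib.request.Request(
        f"{controller}/schemas/{schema['schemaName']}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def update_schema(schema, controller=CONTROLLER):
    """Send the update and return the controller's parsed JSON reply."""
    with urllib.request.urlopen(build_schema_update(schema, controller)) as resp:
        return json.load(resp)


# Usage (needs a running cluster):
# with open("baseballStats.schema", encoding="utf-8") as f:
#     update_schema(json.load(f))
```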
After you add the new column to your schema, reload the table segments so that the new yearsOfExperience column shows up in the baseballStats table. Run the following command to reload the segments, and note the reloadJobId returned in the response:
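The reload call can be sketched as a `POST` to the controller's `/segments/{tableName}/reload` endpoint. Because the exact layout of the response is not shown here, the hypothetical `find_reload_job_ids` helper simply searches the parsed JSON (including JSON encoded inside string values) for any `reloadJobId` key rather than assuming a fixed shape.

```python
import json
import urllib.request

# Assumption: the Quickstart controller listens on localhost:9000.
CONTROLLER = "http://localhost:9000"


def build_reload_request(table_name, controller=CONTROLLER):
    """Build (but do not send) the POST request that reloads a table's segments."""
    return urllib.request.Request(
        f"{controller}/segments/{table_name}/reload",
        data=b"",
        method="POST",
    )


def find_reload_job_ids(payload):
    """Collect every reloadJobId found anywhere in the parsed response."""
    found = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            if key == "reloadJobId":
                found.append(value)
            else:
                found.extend(find_reload_job_ids(value))
    elif isinstance(payload, list):
        for item in payload:
            found.extend(find_reload_job_ids(item))
    elif isinstance(payload, str):
        # Some responses may wrap JSON inside a string field; try to unwrap it.
        try:
            nested = json.loads(payload)
        except ValueError:
            return found
        if isinstance(nested, (dict, list)):
            found.extend(find_reload_job_ids(nested))
    return found


# Usage (needs a running cluster):
# with urllib.request.urlopen(build_reload_request("baseballStats")) as resp:
#     job_ids = find_reload_job_ids(json.load(resp))
```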
This triggers a reload operation on each of the servers hosting the table's segments. The API response has a reloadJobId that you can use to monitor the status of the reload operation using the segment reload status API.
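A rough sketch of the status check, assuming the status endpoint is `/segments/segmentReloadStatus/{jobId}` on the controller and that its response carries `totalSegmentCount` and `successCount` fields. Those field names, the controller address, and the helper names are assumptions to verify against your Pinot version.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Quickstart controller listens on localhost:9000.
CONTROLLER = "http://localhost:9000"


def reload_status_url(job_id, controller=CONTROLLER):
    """Build the status URL for a reload job, escaping the job ID."""
    return f"{controller}/segments/segmentReloadStatus/{urllib.parse.quote(job_id, safe='')}"


def fetch_reload_status(job_id, controller=CONTROLLER):
    """GET the status of a reload job and parse it as JSON."""
    with urllib.request.urlopen(reload_status_url(job_id, controller)) as resp:
        return json.load(resp)


def reload_finished(status):
    """True once every segment is reported as reloaded (assumed field names)."""
    total = status.get("totalSegmentCount")
    done = status.get("successCount")
    return total is not None and done is not None and done >= total


# Usage (needs a running cluster and a real job ID from the reload response):
# print(reload_finished(fetch_reload_status("<reloadJobId>")))
```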
After reloading the segments, run the following to query the new column:
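One way to run such a query from a script is to post SQL to the controller's `/sql` endpoint and flatten the `resultTable` in the reply. The query text, the controller address, and the helper names below are illustrative assumptions, and the `example_response` dict is made-up sample data in the shape of a Pinot query response, not real output.

```python
import json
import urllib.request

# Assumption: the Quickstart controller listens on localhost:9000.
CONTROLLER = "http://localhost:9000"

# Illustrative query; any query that selects the new column works.
QUERY = "SELECT playerName, yearsOfExperience FROM baseballStats LIMIT 10"


def build_query_request(sql, controller=CONTROLLER):
    """Build (but do not send) the POST request that runs a SQL query."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{controller}/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response):
    """Pair each result row with the column names from the result schema."""
    table = response.get("resultTable") or {}
    columns = table.get("dataSchema", {}).get("columnNames", [])
    return [dict(zip(columns, row)) for row in table.get("rows", [])]


# Made-up response for illustration; a real one comes from urlopen(...).
example_response = {
    "resultTable": {
        "dataSchema": {"columnNames": ["playerName", "yearsOfExperience"]},
        "rows": [["example player", 1]],
    }
}
print(rows_as_dicts(example_response))
```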
As you can see, the query returns the defaultNullValue for the newly added column. To populate this column with real values, re-run the batch ingestion job for the past dates.
Backfill data
(Real-time tables only): Open the Pinot server configuration, and set pinot.server.instance.reload.consumingSegment to true.
Backfilling data does not work for real-time tables. You can convert a real-time table to a hybrid table by adding an offline counterpart that uses the same schema, and then backfilling the offline table to fill in values for the newly added column. For more information, see the Pinot documentation on hybrid tables.