Each table in Pinot is associated with a Schema. A schema defines what fields are present in the table along with the data types.
The schema is stored in ZooKeeper, along with the table configuration.
A schema also defines what category a column belongs to. Columns in a Pinot table fall into three categories: dimension, metric, and date-time.
Data types determine the operations that can be performed on a column. Pinot supports the following data types:
BOOLEAN, TIMESTAMP, and JSON were added after release 0.7.1. In release 0.7.1 and older releases, BOOLEAN is equivalent to STRING.
Pinot also supports columns that contain lists or arrays of items, but there isn't an explicit data type to represent these lists or arrays. Instead, you can indicate that a dimension column accepts multiple values. For more information, see DimensionFieldSpec in the Schema configuration reference.
Since Pinot doesn't have a dedicated DATETIME data type, you need to supply time values in STRING, LONG, or INT format. To operate on these values, however, Pinot needs to convert them into a format it understands, such as an epoch timestamp.
To make this conversion possible, provide the format of the date along with the data type in the schema. The format is described using the following syntax: timeSize:timeUnit:timeFormat:pattern
- time size - the size of the time unit. The value in the time column is multiplied by this size to get the actual timestamp. For example, if the time size is 5 and the value in the time column is 4996308 minutes, the value converted to an epoch timestamp is 4996308 * 5 * 60 * 1000 = 1498892400000 milliseconds. If your date is not in EPOCH format, this value is not used and can be set to 1 or any other integer.
- timeFormat - can be either EPOCH or SIMPLE_DATE_FORMAT. If it is SIMPLE_DATE_FORMAT, the pattern string must also be specified.
- pattern - optional, and only specified when the date is in SIMPLE_DATE_FORMAT. The pattern should be written using the Java SimpleDateFormat representation. For example, 2020-08-21 can be represented as yyyy-MM-dd.
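The arithmetic in the time size example can be checked with a few lines of Python. This is only a sketch of the multiplication described above, not Pinot's internal conversion code; the function name `epoch_value_to_millis` and the `MILLIS_PER_UNIT` table are made up for illustration.

```python
# Milliseconds per time unit, keyed by unit names as they appear in the format string.
# (Hypothetical helper table for this sketch, not a Pinot API.)
MILLIS_PER_UNIT = {
    "MILLISECONDS": 1,
    "SECONDS": 1_000,
    "MINUTES": 60 * 1_000,
    "HOURS": 60 * 60 * 1_000,
    "DAYS": 24 * 60 * 60 * 1_000,
}


def epoch_value_to_millis(value: int, time_size: int, time_unit: str) -> int:
    """Convert a raw EPOCH column value into epoch milliseconds."""
    return value * time_size * MILLIS_PER_UNIT[time_unit]


# The example from the text: time size 5, column value 4996308, unit MINUTES.
print(epoch_value_to_millis(4996308, 5, "MINUTES"))  # 1498892400000
```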
Here are some sample date-time formats you can use in the schema:
- 1:MILLISECONDS:EPOCH - used when the timestamp is in epoch milliseconds and stored in LONG format.
- 1:HOURS:EPOCH - used when the timestamp is in epoch hours and stored in LONG or INT format.
- 1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd - used when the date is in STRING format and has the pattern year-month-date, e.g. 2020-08-21.
- 1:HOURS:SIMPLE_DATE_FORMAT:EEE MMM dd HH:mm:ss ZZZ yyyy - used when the date is in STRING format, e.g. Mon Aug 24 12:36:50 America/Los_Angeles 2019.
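To see how these format strings break down into their parts, here is a small Python sketch that splits one into time size, time unit, time format, and optional pattern. It is an illustration only and does not reflect how Pinot parses the string internally; `DateTimeFormat` and `parse_format` are hypothetical names. Note that the pattern itself can contain colons (as in HH:mm:ss), so the split is limited to the first three separators.

```python
from typing import NamedTuple, Optional


class DateTimeFormat(NamedTuple):
    time_size: int
    time_unit: str
    time_format: str
    pattern: Optional[str]


def parse_format(spec: str) -> DateTimeFormat:
    """Split a format string into its components (illustrative helper, not a Pinot API)."""
    # Split at most three times so that colons inside the pattern are preserved.
    parts = spec.split(":", 3)
    if len(parts) < 3:
        raise ValueError(f"expected at least size:unit:format, got {spec!r}")
    size, unit, time_format = parts[:3]
    pattern = parts[3] if len(parts) == 4 else None
    if time_format not in ("EPOCH", "SIMPLE_DATE_FORMAT"):
        raise ValueError(f"unknown time format: {time_format!r}")
    if time_format == "SIMPLE_DATE_FORMAT" and pattern is None:
        raise ValueError("SIMPLE_DATE_FORMAT requires a pattern")
    return DateTimeFormat(int(size), unit, time_format, pattern)


print(parse_format("1:MILLISECONDS:EPOCH"))
print(parse_format("1:HOURS:SIMPLE_DATE_FORMAT:EEE MMM dd HH:mm:ss ZZZ yyyy"))
```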
There are several built-in virtual columns in the schema that can be used for debugging purposes, such as $hostName, $segmentName, and $docId.
These virtual columns can be used in queries in a similar way to regular columns.
Let's create a schema and put it in a JSON file. For this example, we have created a schema for flight data.
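As a rough sketch of what such a schema file might contain, the Python snippet below builds a flight schema and prints it as JSON. The column names (`flightNumber`, `tags`, `price`, `millisSinceEpoch`, `hoursSinceEpoch`, `dateString`) and the granularity values are assumptions chosen for illustration; the top-level keys follow the Schema configuration reference as we understand it, so check them against the reference for your Pinot version.

```python
import json


def build_flights_schema() -> dict:
    """Return an illustrative flight-data schema (column names are assumptions)."""
    return {
        "schemaName": "flights",
        "dimensionFieldSpecs": [
            {"name": "flightNumber", "dataType": "LONG"},
            # A multi-value dimension: the column accepts a list of strings.
            {"name": "tags", "dataType": "STRING", "singleValueField": False},
        ],
        "metricFieldSpecs": [
            {"name": "price", "dataType": "DOUBLE"},
        ],
        "dateTimeFieldSpecs": [
            {
                "name": "millisSinceEpoch",
                "dataType": "LONG",
                "format": "1:MILLISECONDS:EPOCH",
                "granularity": "15:MINUTES",
            },
            {
                "name": "hoursSinceEpoch",
                "dataType": "INT",
                "format": "1:HOURS:EPOCH",
                "granularity": "1:HOURS",
            },
            {
                "name": "dateString",
                "dataType": "STRING",
                "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
                "granularity": "1:DAYS",
            },
        ],
    }


# Print the schema as JSON; redirect the output to flights-schema.json to save it.
print(json.dumps(build_flights_schema(), indent=2))
```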
Then, we can upload the sample schema using either a Bash command or a REST API call.
bin/pinot-admin.sh AddSchema -schemaFile flights-schema.json -exec
bin/pinot-admin.sh AddTable -schemaFile flights-schema.json -tableFile flights-table.json -exec
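For the REST API route, a minimal Python sketch might look like the following. It assumes the Pinot controller is reachable at `http://localhost:9000` and that your version accepts a JSON request body on `POST /schemas`; both are assumptions, so verify them against the controller's API documentation. The function names are hypothetical.

```python
import json
import urllib.request


def build_schema_upload_request(
    schema: dict, controller_url: str = "http://localhost:9000"
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the schema as JSON.

    Assumption: the controller exposes POST /schemas with a JSON body.
    """
    body = json.dumps(schema).encode("utf-8")
    return urllib.request.Request(
        url=f"{controller_url}/schemas",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def upload_schema_file(path: str, controller_url: str = "http://localhost:9000") -> str:
    """Read a schema file and send it to the controller; returns the response body."""
    with open(path, encoding="utf-8") as f:
        schema = json.load(f)
    request = build_schema_upload_request(schema, controller_url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Building the request needs no running controller, so it is safe to inspect.
request = build_schema_upload_request({"schemaName": "flights"})
print(request.get_method(), request.full_url)
```

With a controller running, calling `upload_schema_file("flights-schema.json")` would send the file used in the commands above.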