Schema is used to define the names, data types, and other information for the columns of a Pinot table.
The Pinot schema is composed of:
The above json configuration is the example of Pinot schema derived from the flight data. As seen in the example, the schema is composed of 4 parts: schemaName
, dimensionFieldSpec
, metricFieldSpec
, and dateTimeFieldSpec
. Below is a detailed description of each type of field spec.
Data types determine the operations that can be performed on a column. Pinot supports the following data types:
The lowest granularity TIMESTAMP type supports is milliseconds epoch, nanoseconds is not supported.
Read the following sections for details on how data types are used in various parts of a schema.
A dimensionFieldSpec is defined for each dimension column. Here's a list of the fields in the dimensionFieldSpec:
INT
LONG
FLOAT
DOUBLE
BOOLEAN
0 (false
)
TIMESTAMP
0 (1970-01-01 00:00:00 UTC
)
STRING
"null"
BYTES
byte array of length 0
JSON
"null"
A metricFieldSpec is defined for each metric column. Here's a list of fields in the metricFieldSpec
name
Name of the metric column
dataType
Data type of the column. Can be INT, LONG, FLOAT, DOUBLE, BIG_DECIMAL, BYTES (for specialized representations such as HLL, TDigest, etc, where the column stores byte serialized version of the value)
defaultNullValue
Represents null values in the data. If not specified, an internal default null value is used, as listed here.
INT
0
LONG
0
FLOAT
0.0
DOUBLE
0.0
BIG_DECIMAL
0.0
BYTES
byte array of length 0
A dateTimeFieldSpec is used to define time columns of the table. The following fields can be configured in the date time field spec -
In the pinot master (0.12.0-SNAPSHOT), We have simplified date time formats for the users. The formats now follow the pattern - timeFormat|pattern/timeUnit|
[timeZone/timeSize]
. The fields present in []
are completely optional. timeFormat can be one of EPOCH
, SIMPLE_DATE_FORMAT
or TIMESTAMP
.
TIMESTAMP
- This represents timestamp in milliseconds. It is equivalent to specifying EPOCH|MILLISECONDS|1
Examples -
TIMESTAMP
EPOCH
- Defaults to MILLISECONDS (only in master
branch)
EPOCH|SECONDS
EPOCH|SECONDS|5
SIMPLE_DATE_FORMAT
(only in master
branch)
SIMPLE_DATE_FORMAT|yyyy-MM-dd HH:mm:ss
SIMPLE_DATE_FORMAT|yyyy-MM-dd|IST
Only datetime timeformats in lexicographical order are support in Pinot. so yyyy-MM-dd
,MM-dd
and yyyy-dd
are valid while MM-dd-yyyy
is not.
The order is decided as year > month > day > hour > minutes > second.
These date-time formats are still supported in Pinot for backward compatibility. However, new users should prefer to use the formats mentioned in the previous sections.
You will need to provide the format of the date along with the data type in the schema. The format is described using the following syntax: timeSize:timeUnit:timeFormat:pattern
.
time size - the size of the time unit. This size is multiplied to the value present in the time column to get an actual timestamp. e.g. if timesize is 5 and value in time column is 4996308 minutes. The value that will be converted to epoch timestamp will be 4996308 * 5 * 60 * 1000 = 1498892400000 milliseconds.
If your date is not in EPOCH
format, this value is not used and can be set to 1 or any other integer.
timeFormat - can be either EPOCH
or SIMPLE_DATE_FORMAT
. If it is SIMPLE_DATE_FORMAT
, the pattern string is also specified.
Here are some sample date-time formats you can use in the schema:
1:MILLISECONDS:EPOCH
- used when timestamp is in the epoch milliseconds and stored in LONG
format
1:HOURS:EPOCH
- used when timestamp is in the epoch hours and stored in LONG
or INT
format
1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd
- when the date is in STRING
format and has the pattern year-month-date. e.g. 2020-08-21
1:HOURS:SIMPLE_DATE_FORMAT:EEE MMM dd HH:mm:ss ZZZ yyyy
- when date is in STRING
format. e.g. Mon Aug 24 12:36:50 America/Los_Angeles 2019
There are several built-in virtual columns inside the schema the can be used for debugging purposes:
These virtual columns can be used in queries in a similar way to regular columns.
Apart from these, there's some advanced fields. These are common to all field specs.