Schema
Schema is used to define the names, data types, and other information for the columns of a Pinot table.
The Pinot schema is composed of:
Field | Description |
schemaName | Defines the name of the schema. This is usually the same as the table name. The offline and the realtime table of a hybrid table should use the same schema. |
dimensionFieldSpecs | A dimensionFieldSpec is defined for each dimension column. For more details, scroll down to DimensionFieldSpec. |
metricFieldSpecs | A metricFieldSpec is defined for each metric column. For more details, scroll down to MetricFieldSpec. |
dateTimeFieldSpec | A dateTimeFieldSpec is defined for the time columns. There can be multiple time columns. For more details, scroll down to DateTimeFieldSpec. |
Below is a detailed description of each type of field spec.
DimensionFieldSpec
A dimensionFieldSpec is defined for each dimension column. Here's a list of the fields in the dimensionFieldSpec:
Property | Description |
name | Name of the dimension column. |
dataType | Data type of the dimension column. Can be INT, LONG, FLOAT, DOUBLE, BOOLEAN, TIMESTAMP, STRING, BYTES. |
defaultNullValue | Represents null values in the data, since Pinot doesn't support storing null column values natively (as part of its on-disk storage format). If not specified, an internal default null value is used as listed here. |
singleValueField | Boolean indicating if this is a single-valued or a multi-valued column. Multi-valued column is modeled as a list, where the order of the values are preserved and duplicate values are allowed. Individual rows don’t necessarily have the same number of values. Typical use case for this would be a column such as |
Internal default null values for dimension
Data Type | Internal Default Null Value |
INT | |
LONG | |
FLOAT | |
DOUBLE | |
BOOLEAN | 0 ( |
TIMESTAMP | 0 ( |
STRING | "null" |
BYTES | byte array of length 0 |
MetricFieldSpec
A metricFieldSpec is defined for each metric column. Here's a list of fields in the metricFieldSpec
Property | Description |
name | Name of the metric column |
dataType | Data type of the column. Can be INT, LONG, FLOAT, DOUBLE, BYTES (for specialized representations such as HLL, TDigest, etc, where the column stores byte serialized version of the value) |
defaultNullValue | Represents null values in the data. If not specified, an internal default null value is used, as listed here. |
Internal default null values for metric
Data Type | Internal Default Null Value |
INT | 0 |
LONG | 0 |
FLOAT | 0.0 |
DOUBLE | 0.0 |
BYTES | byte array of length 0 |
DateTimeFieldSpec
A dateTimeFieldSpec is used to define time columns of the table. Here's a list of the fields in a dateTimeFieldSpec
Property | Description |
name | Name of the date time column |
dataType | Data type of the date time column. Can be STRING, INT, LONG |
format | The format of the time column. The syntax of the format is timeFormat can be either EPOCH or SIMPLE_DATE_FORMAT. If it is SIMPLE_DATE_FORMAT, the pattern string is also specified. For example: 1:MILLISECONDS:EPOCH - epoch millis 1:HOURS:EPOCH - epoch hours 1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd - date specified like 1:HOURS:SIMPLE_DATE_FORMAT:EEE MMM dd HH:mm:ss ZZZ yyyy - date specified like |
granularity | The granularity in which the column is bucketed. The syntax of granularity is
|
defaultNullValue | Represents null values in the data. If not specified, an internal default null value is used, as listed here. The values are the same as those used for dimensionFieldSpec. |
Advanced fields
Apart from these, there's some advanced fields. These are common to all field specs.
Property | Description |
maxLength | Max length of this column |
virtualColumnProvider | Column value provider |
Last updated