Schema
Schema is used to define the names, data types, and other information for the columns of a Pinot table.
The Pinot schema is composed of:
Field | Description |
---|---|
schemaName | Defines the name of the schema. This is usually the same as the table name. The offline and the realtime table of a hybrid table should use the same schema. |
dimensionFieldSpec | A dimensionFieldSpec is defined for each dimension column. For more details, scroll down to DimensionFieldSpec. |
metricFieldSpec | A metricFieldSpec is defined for each metric column. For more details, scroll down to MetricFieldSpec. |
dateTimeFieldSpec | A dateTimeFieldSpec is defined for the time columns. There can be multiple time columns. For more details, scroll down to DateTimeFieldSpec. |
Below is a detailed description of each type of field spec.
A dimensionFieldSpec is defined for each dimension column. Here's a list of the fields in the dimensionFieldSpec:
Property | Description |
---|---|
name | Name of the dimension column. |
dataType | Data type of the dimension column. Can be INT, LONG, FLOAT, DOUBLE, BOOLEAN, TIMESTAMP, STRING, BYTES. |
defaultNullValue | Represents null values in the data, since Pinot doesn't support storing null column values natively (as part of its on-disk storage format). If not specified, an internal default null value is used as listed here. |
singleValueField | Boolean indicating if this is a single-valued or a multi-valued column. Multi-valued column is modeled as a list, where the order of the values are preserved and duplicate values are allowed. Individual rows don’t necessarily have the same number of values. Typical use case for this would be a column such as skillSet for a person (one row in the table) that can have multiple values such as Real Estate, Mortgages . The default null value for a multi-valued column is a single defaultNullValue , e.g. [Integer.MIN_VALUE] . |
Data Type | Internal Default Null Value |
---|---|
INT | |
LONG | |
FLOAT | |
DOUBLE | |
BOOLEAN | 0 ( false ) |
TIMESTAMP | 0 ( 1970-01-01 00:00:00 UTC ) |
STRING | "null" |
BYTES | byte array of length 0 |
A metricFieldSpec is defined for each metric column. Here's a list of fields in the metricFieldSpec
Property | Description |
---|---|
name | Name of the metric column |
dataType | Data type of the column. Can be INT, LONG, FLOAT, DOUBLE, BYTES (for specialized representations such as HLL, TDigest, etc, where the column stores byte serialized version of the value) |
defaultNullValue | Represents null values in the data. If not specified, an internal default null value is used, as listed here. |
Data Type | Internal Default Null Value |
---|---|
INT | 0 |
LONG | 0 |
FLOAT | 0.0 |
DOUBLE | 0.0 |
BYTES | byte array of length 0 |
A dateTimeFieldSpec is used to define time columns of the table. Here's a list of the fields in a dateTimeFieldSpec
Property | Description |
---|---|
name | Name of the date time column |
dataType | Data type of the date time column. Can be STRING, INT, LONG or TIMESTAMP |
format | The format of the time column. The syntax of the format is timeSize:timeUnit:timeFormat timeFormat can be either EPOCH or SIMPLE_DATE_FORMAT. If it is SIMPLE_DATE_FORMAT, the pattern string is also specified. For example: 1:MILLISECONDS:EPOCH - epoch millis 1:HOURS:EPOCH - epoch hours 1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd - date specified like 20191018 1:HOURS:SIMPLE_DATE_FORMAT:EEE MMM dd HH:mm:ss ZZZ yyyy - date specified like Mon Aug 24 12:36:50 America/Los_Angeles 2019 Noted that if TIMESTAMP type is used in dataType, format is ignored because JDBC standard SQL requires TIMESTAMP in yyyy-[m]m-[d]d hh:mm:ss[.f...] format. |
granularity | The granularity in which the column is bucketed. The syntax of granularity is
bucket size:bucket unit
For example, the format can be milliseconds 1:MILLISECONDS:EPOCH , but bucketed to 15 minutes i.e. we only have one value for every 15 minute interval, in which case granularity can be specified as 15:MINUTES |
defaultNullValue | Represents null values in the data. If not specified, an internal default null value is used, as listed here. The values are the same as those used for dimensionFieldSpec. |
Apart from these, there's some advanced fields. These are common to all field specs.
Property | Description |
---|---|
maxLength | Max length of this column |
virtualColumnProvider | Column value provider |