Schema
Schema is used to define the names, data types and other information for the columns of a Pinot table.
Types of columns
Columns in a Pinot table can be broadly categorized into three categories
Schema format
A Pinot schema is written in JSON format. Here's an example which shows all the fields of a schema
The Pinot schema is composed of
Below is a detailed description of each type of field spec.
dimensionFieldSpecs
A dimensionFieldSpec is defined for each dimension column. Here's a list of the fields in the dimensionFieldSpec
Internal default null values for dimension
metricFieldSpecs
A metricFieldSpec is defined for each metric column. Here's a list of fields in the metricFieldSpec
Internal default null values for metric
dateTimeFieldSpec
A dateTimeFieldSpec is used to define time columns of the table. Here's a list of the fields in a dateTimeFieldSpec
timeFieldSpec
This has been deprecated. Older schemas containing timeFieldSpec will be supported. But for new schemas, use DateTimeFieldSpec instead.
A timeFieldSpec is defined for the time column. A timeFieldSpec is composed of an incomingGranularitySpec and an outgoingGranularitySpec. IncomingGranularitySpec in combination with outgoingGranularitySpec can be used to transform the time column from incoming format to the outgoing format. If both of them are specified, the segment creation process will convert the time column from the incoming format to the outgoing format. If no time column transformation is required, you can specify just the incomingGranularitySpec.
The incoming and outgoing granularitySpec are defined as:
Advanced fields
Apart from these, there's some advanced fields. These are common to all field specs.
Ingestion Transform Functions
Transform functions can be defined on columns in the schema. For example:
Currently, we have support for 2 kinds of functions
Groovy functions
Inbuilt functions
Note
Currently, the arguments must be from the source data. They cannot be columns from the Pinot schema which have been created through transformations.
Groovy functions
Groovy functions can be defined using the syntax:
Here's some examples of commonly needed functions. Any valid Groovy expression can be used.
String concatenation
Concat firstName
and lasName
to get fullName
Find element in an array
Find max value in array bids
Time transformation
Convert timestamp
from MILLISECONDS
to HOURS
Column name change
Simply change name of the column from user_id
to userId
Ternary operation
If eventType
is IMPRESSION
set impression
to 1
. Similar for CLICK
.
AVRO Map
Store an AVRO Map in Pinot as two multi-value columns. Sort the keys, to maintain the mapping.
1) The keys of the map as map_keys
2) The values of the map as map_values
Inbuilt Pinot functions
We have several inbuilt functions that can be used directly in as ingestion transform functions
DateTime functions
These are functions which enable commonly needed time transformations.
toEpochXXX
Converts from epoch milliseconds to a higher granularity.
toEpochXXXRounded
Converts from epoch milliseconds to another granularity, rounding to the nearest rounding bucket. For example, 1588469352000
(2020-05-01 42:29:12) is 26474489
minutesSinceEpoch. `toEpochMinutesRounded(1588469352000) = 26474480
(2020-05-01 42:20:00)
fromEpochXXX
Converts from an epoch granularity to milliseconds.
Simple date format
Converts simple date format strings to milliseconds and vice-a-versa, as per the provided pattern string.
Json functions
Creating a Schema
Create a schema for your data, or see examples
for examples. Make sure you've setup the cluster
Note: schema can also be created as part of table creation, refer to Creating a table.
Check out the schema in the Rest API to make sure it was successfully uploaded
Last updated