Schema

Each table in Pinot is associated with a Schema. A schema defines what fields are present in the table along with the data types.

The schema is stored in Zookeeper, along with the table configuration.

Pinot also supports columns that contain lists or arrays of items, but there is no explicit data type for these lists or arrays. Instead, you indicate that a dimension column accepts multiple values. For more information, see the Schema configuration reference.
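As a rough illustration of a multi-value dimension column, here is a minimal sketch of a single field spec built in Python and printed as JSON. The column name `tags` is a made-up example, and the use of the `singleValueField` flag to mark the column as multi-value is an assumption to check against the Schema configuration reference for your Pinot version.

```python
import json

# Hypothetical dimension column that holds a list of strings per row.
# Assumption: setting "singleValueField" to false marks it as multi-value.
tags_field = {
    "name": "tags",            # illustrative column name, not from the original page
    "dataType": "STRING",      # data type of each element in the list
    "singleValueField": False,
}

print(json.dumps(tags_field, indent=2))
```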

Date Time Fields

Pinot has no dedicated DATETIME data type, so time values must be supplied as STRING, LONG, or INT. To operate on these values, Pinot needs to convert them into a form it understands, such as an epoch timestamp.

To enable this conversion, provide the format of the date along with the data type in the schema. The format is described using the following syntax: timeSize:timeUnit:timeFormat:pattern.

time size - the size of the time unit. The value in the time column is multiplied by this size to get the actual timestamp. For example, if the time size is 5 and the value in the time column is 4996308 minutes, the value converted to an epoch timestamp is 4996308 * 5 * 60 * 1000 = 1498892400000 milliseconds. If your date is not in EPOCH format, this value is not used and can be set to 1 or any other integer.
time unit - one of the time unit enum values, e.g. HOURS, MINUTES, etc. If your date is not in EPOCH format, this value is not used.

Here are some sample date-time formats you can use in the schema:

1:MILLISECONDS:EPOCH - used when timestamp is in the epoch milliseconds and stored in LONG format
1:HOURS:EPOCH - used when timestamp is in the epoch hours and stored in LONG or INT format
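The time size arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the multiplication, not Pinot's own implementation; the function name `epoch_value_to_millis` and the unit table are assumptions made for this example.

```python
# Milliseconds per time unit. The unit names are assumed to match the
# enum values mentioned in the text (HOURS, MINUTES, and so on).
MILLIS_PER_UNIT = {
    "MILLISECONDS": 1,
    "SECONDS": 1_000,
    "MINUTES": 60_000,
    "HOURS": 3_600_000,
    "DAYS": 86_400_000,
}


def epoch_value_to_millis(value: int, fmt: str) -> int:
    """Convert a raw column value to epoch milliseconds.

    `fmt` is expected to look like 'timeSize:timeUnit:EPOCH'.
    """
    parts = fmt.split(":", 3)
    if len(parts) < 3 or parts[2] != "EPOCH":
        raise ValueError(f"not an EPOCH format: {fmt!r}")
    time_size, time_unit = int(parts[0]), parts[1]
    # value * timeSize gives a count of time units; scale that to milliseconds.
    return value * time_size * MILLIS_PER_UNIT[time_unit]


print(epoch_value_to_millis(4996308, "5:MINUTES:EPOCH"))  # 1498892400000
```

With a time size of 1, as in 1:HOURS:EPOCH, the column value is simply scaled from the given unit to milliseconds.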

New DateTime Formats

Starting with Pinot release 0.11.0, date-time formats have been simplified. The formats now follow the pattern timeFormat|pattern/timeUnit|[timeZone/timeSize]. The fields in [] are optional. timeFormat can be one of EPOCH, SIMPLE_DATE_FORMAT, or TIMESTAMP.

TIMESTAMP - This represents a timestamp in milliseconds. It is equivalent to specifying EPOCH:MILLISECONDS:1
EPOCH - This represents time in timeUnit since 00:00:00 UTC on 1 January 1970. You can also specify the timeSize parameter. The value in the time column is multiplied by this size to get the actual timestamp. For example, if timeSize is 5 and the value in the time column is 4996308 minutes, the value converted to an epoch timestamp is 4996308 * 5 * 60 * 1000 = 1498892400000 milliseconds.

Built-in Virtual Columns

There are several built-in virtual columns in the schema that can be used for debugging purposes:

Column Name | Column Type | Data Type | Description

These virtual columns can be used in queries in a similar way to regular columns.

Creating a Schema

First, make sure your cluster is up and running.

Let's create a schema and put it in a JSON file. For this example, we have created a schema for flight data.
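The original sample schema is not reproduced here, so the following is only a hedged sketch of what a flight-data schema file could look like, written out from Python. Every field name (`flightNumber`, `tags`, `price`, `millisSinceEpoch`), the schema name `flights`, and the file name `flights-schema.json` are hypothetical; the top-level keys are assumed from the Schema configuration reference and should be checked against your Pinot version.

```python
import json

# Hypothetical flight-data schema; all names below are illustrative only.
schema = {
    "schemaName": "flights",
    "dimensionFieldSpecs": [
        {"name": "flightNumber", "dataType": "LONG"},
        # Assumed way to declare a multi-value dimension column.
        {"name": "tags", "dataType": "STRING", "singleValueField": False},
    ],
    "metricFieldSpecs": [
        {"name": "price", "dataType": "DOUBLE"},
    ],
    "dateTimeFieldSpecs": [
        {
            "name": "millisSinceEpoch",
            "dataType": "LONG",
            # Epoch milliseconds stored as LONG, as in the sample formats.
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "15:MINUTES",
        },
    ],
}

# Write the schema to a JSON file so it can be uploaded later.
with open("flights-schema.json", "w") as f:
    json.dump(schema, f, indent=2)
```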

For more details on constructing a schema file, see the Schema configuration reference.

Then, we can upload the sample schema provided above using either a Bash command or REST API call.
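As one possible shape for the REST API route, the sketch below builds a POST request carrying a schema as JSON. It assumes a controller listening at http://localhost:9000 and a `/schemas` endpoint that accepts a JSON body; both are assumptions to verify against your deployment. The helper name `build_schema_upload` is made up for this example, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_schema_upload(
    schema: dict, controller: str = "http://localhost:9000"
) -> urllib.request.Request:
    """Build (but do not send) a POST request that uploads a schema."""
    body = json.dumps(schema).encode("utf-8")
    return urllib.request.Request(
        url=f"{controller}/schemas",  # assumed controller endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Minimal placeholder schema; a real one would list its field specs.
request = build_schema_upload({"schemaName": "flights", "dimensionFieldSpecs": []})
print(request.get_method(), request.full_url)

# To send it against a running controller:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode())
```

Keeping request construction separate from sending makes the payload easy to inspect before it reaches a live cluster.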

Finally, check that the schema was uploaded successfully.

SIMPLE_DATE_FORMAT - This represents time in string format. The pattern should be specified using the Java SimpleDateFormat representation. If no pattern is specified, the ISO 8601 DateTimeFormat is used to parse the date-times. Optional components are supported with the ISO format, so a date-time string can be given as yyyy, yyyy-MM, yyyy-MM-dd, and so on. You can also specify an optional timeZone parameter, which is the ID for a TimeZone: either an abbreviation such as PST, a full name such as America/Los_Angeles, or a custom ID such as GMT-8:00. Examples -

SIMPLE_DATE_FORMAT
SIMPLE_DATE_FORMAT|yyyy-MM-dd HH:mm:ss
SIMPLE_DATE_FORMAT|yyyy-MM-dd|IST
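To make the pipe-separated layout (timeFormat|pattern/timeUnit|[timeZone/timeSize]) concrete, here is a small Python sketch that splits such a string into its parts. It is an illustration only, not Pinot's parser: the `DateTimeFormat` tuple and `parse_new_format` function are hypothetical names, and the sketch assumes the pattern itself never contains a `|` character.

```python
from typing import NamedTuple, Optional


class DateTimeFormat(NamedTuple):
    time_format: str                       # EPOCH, SIMPLE_DATE_FORMAT, or TIMESTAMP
    pattern_or_unit: Optional[str] = None  # pattern (SIMPLE_DATE_FORMAT) or timeUnit (EPOCH)
    zone_or_size: Optional[str] = None     # timeZone (SIMPLE_DATE_FORMAT) or timeSize (EPOCH)


def parse_new_format(spec: str) -> DateTimeFormat:
    """Split a 'timeFormat|pattern/timeUnit|[timeZone/timeSize]' string."""
    parts = spec.split("|")
    if parts[0] not in {"EPOCH", "SIMPLE_DATE_FORMAT", "TIMESTAMP"}:
        raise ValueError(f"unknown timeFormat in {spec!r}")
    if len(parts) > 3:
        raise ValueError(f"too many fields in {spec!r}")
    # Missing optional fields fall back to the None defaults.
    return DateTimeFormat(*parts)


for spec in (
    "SIMPLE_DATE_FORMAT",
    "SIMPLE_DATE_FORMAT|yyyy-MM-dd HH:mm:ss",
    "SIMPLE_DATE_FORMAT|yyyy-MM-dd|IST",
):
    print(parse_new_format(spec))
```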
