First Table + Schema

Create your first Pinot schema and table, ready for data ingestion.

Outcome

By the end of this page you will have a Pinot schema and an offline table called transcript registered in your cluster, ready to receive data.

Prerequisites

  • A running Pinot cluster. See the install guides for Local or Docker.

  • For Docker users: the cluster must be on the pinot-demo network.

  • Confirm your Pinot version. See the Version reference page and set the PINOT_VERSION environment variable:

export PINOT_VERSION=<your-pinot-version>

Steps

1. Understand schemas

A Pinot schema defines every column in your table and assigns each one a column type. There are three column types:

  • Dimension: used in filters and GROUP BY clauses for slicing and dicing data.

  • Metric: used in aggregations; represents quantitative measurements.

  • DateTime: represents the timestamp associated with each row.

Every table must have a schema before it can accept data. The schema tells Pinot how to interpret, index, and store each field.

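2. Create the data directory

Create the working directory /tmp/pinot-quick-start and its rawdata subdirectory, which will hold the sample CSV file.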
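Here is a minimal sketch, assuming a POSIX shell on the machine that will hold the quick-start files:

```shell
# Create the working directory and its rawdata subdirectory in one step.
# -p creates missing parent directories and does not fail if they already exist.
mkdir -p /tmp/pinot-quick-start/rawdata
```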

3. Save the sample CSV data

Create the file /tmp/pinot-quick-start/rawdata/transcript.csv with the following contents:

In this dataset, studentID, firstName, lastName, gender, and subject are dimensions, score is a metric, and timestampInEpoch is the datetime column.
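The schema declares these same column names, and Pinot's CSV reader typically matches CSV columns to schema fields by header name. A typo in the header is therefore easy to miss. The helper below is a small sketch that checks whether the header line names all seven columns listed above, in any order, and reports any that are missing. `check_transcript_header` is a hypothetical name, not part of Pinot.

```shell
# Hypothetical helper (not part of Pinot): check that a CSV header names every
# column this guide expects. Column order is not checked.
check_transcript_header() {
  local header missing col
  # Read the first line and drop carriage returns (CRLF files).
  header=$(head -n 1 "$1" | tr -d '\r')
  missing=0
  for col in studentID firstName lastName gender subject score timestampInEpoch; do
    # Wrap in commas so a name cannot match part of a longer column name.
    case ",$header," in
      *",$col,"*) ;;
      *) echo "missing column: $col" >&2; missing=1 ;;
    esac
  done
  # 0 if every column was found, 1 otherwise.
  return "$missing"
}
```

For example, run `check_transcript_header /tmp/pinot-quick-start/rawdata/transcript.csv` after you save the file. It prints nothing and returns 0 when all columns are present.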

4. Save the schema

Create the file /tmp/pinot-quick-start/transcript-schema.json:
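The block below sketches what such a schema can look like. It assumes the standard Pinot schema layout, in which each column type from step 1 gets its own list: `dimensionFieldSpecs`, `metricFieldSpecs`, and `dateTimeFieldSpecs`. The data types (INT, STRING, FLOAT, LONG) and the epoch-milliseconds format for timestampInEpoch are assumptions, so adjust them to match your data.

```shell
# Sketch: write the transcript schema.
# Assumption: data types and the epoch-millis time format are illustrative.
mkdir -p /tmp/pinot-quick-start
cat > /tmp/pinot-quick-start/transcript-schema.json <<'EOF'
{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    {"name": "studentID", "dataType": "INT"},
    {"name": "firstName", "dataType": "STRING"},
    {"name": "lastName", "dataType": "STRING"},
    {"name": "gender", "dataType": "STRING"},
    {"name": "subject", "dataType": "STRING"}
  ],
  "metricFieldSpecs": [
    {"name": "score", "dataType": "FLOAT"}
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestampInEpoch",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
EOF
```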

5. Understand table configs

A table config tells Pinot how to manage the table at runtime: which columns to index, how many replicas to keep, which tenants to assign, and whether the table is OFFLINE (batch) or REALTIME (streaming). Each table config is paired with one schema.

6. Save the offline table config

Create the file /tmp/pinot-quick-start/transcript-table-offline.json:
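The sketch below shows an offline table config that covers the settings described in step 5:

  • `tableIndexConfig` for indexing
  • `segmentsConfig.replication` for replicas
  • `tenants` for tenant assignment
  • `tableType` set to OFFLINE

One replica, the DefaultTenant tenants, and an empty inverted-index list are assumptions suited to a single-node quick start. Pinot adds the `_OFFLINE` suffix to `tableName` itself, which is why the table later appears as transcript_OFFLINE.

```shell
# Sketch: write the offline table config.
# Assumptions: one replica, default tenants, no extra indexes.
mkdir -p /tmp/pinot-quick-start
cat > /tmp/pinot-quick-start/transcript-table-offline.json <<'EOF'
{
  "tableName": "transcript",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "schemaName": "transcript",
    "timeColumnName": "timestampInEpoch",
    "replication": "1"
  },
  "tableIndexConfig": {
    "invertedIndexColumns": [],
    "loadMode": "MMAP"
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "metadata": {}
}
EOF
```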

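7. Upload the schema and table config

Register both files with the Pinot controller. Upload the schema before the table config so that the schema already exists when the table is created.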
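One way to do the upload is through the controller's REST API with curl. This works the same for Local and Docker setups, as long as port 9000 is reachable from your shell. In the sketch below, `upload_transcript`, `QS_DIR`, and `PINOT_CONTROLLER` are hypothetical names. The default address http://localhost:9000 is an assumption based on the Data Explorer URL used in the Verify section.

```shell
# Assumption: controller REST API at localhost:9000 unless PINOT_CONTROLLER is set.
PINOT_CONTROLLER="${PINOT_CONTROLLER:-http://localhost:9000}"
QS_DIR=/tmp/pinot-quick-start

# Hypothetical helper: register the schema, then the OFFLINE table.
upload_transcript() {
  # POST the schema first so it exists when the table is created.
  curl -sf -X POST -H "Content-Type: application/json" \
    --data-binary @"$QS_DIR/transcript-schema.json" \
    "$PINOT_CONTROLLER/schemas" || return 1
  echo
  # Then POST the table config; -f makes curl fail on HTTP errors.
  curl -sf -X POST -H "Content-Type: application/json" \
    --data-binary @"$QS_DIR/transcript-table-offline.json" \
    "$PINOT_CONTROLLER/tables" || return 1
  echo
}
```

Call `upload_transcript` once both files exist. With a local Pinot distribution, `bin/pinot-admin.sh AddTable -schemaFile /tmp/pinot-quick-start/transcript-schema.json -tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json -exec` should perform the same two registrations in one command.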

Verify

  1. Open the Pinot Data Explorer at http://localhost:9000.

  2. Navigate to the Tables tab.

  3. Confirm you see transcript_OFFLINE listed.

If the table appears, the schema and table config were registered successfully.
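You can also run the same check from the command line. The sketch below assumes the controller REST API at http://localhost:9000 and uses its `GET /tables` and `GET /schemas/{name}` endpoints. `check_transcript` and `PINOT_CONTROLLER` are hypothetical names. The grep matches both `transcript` and `transcript_OFFLINE`.

```shell
# Assumption: controller REST API at localhost:9000 unless PINOT_CONTROLLER is set.
PINOT_CONTROLLER="${PINOT_CONTROLLER:-http://localhost:9000}"

# Hypothetical helper: confirm the table and its schema are registered.
check_transcript() {
  # GET /tables lists the table names known to the controller.
  if ! curl -sf "$PINOT_CONTROLLER/tables" | grep -q 'transcript'; then
    echo "transcript table not found" >&2
    return 1
  fi
  # GET /schemas/transcript is expected to return an HTTP error if the schema is missing.
  if ! curl -sf "$PINOT_CONTROLLER/schemas/transcript" > /dev/null; then
    echo "transcript schema not found" >&2
    return 1
  fi
  echo "transcript table and schema are registered"
}
```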

Next step

You now have an empty table. Continue to First batch ingest to import the CSV data into your transcript table.
