Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
This page contains guides related to importing data from Apache Kafka using stream ingestion.
Loading...
This section contains a collection of short guides to show you how to import from a Pinot supported file system.
Loading...
Loading...
Loading...
This section contains a collection of guides that will show you how to import data from a Pinot supported input format.
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
This section contains articles that provide technical and implementation details of Pinot features
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Here you will find a collection of ready-made sample applications and examples for real-world data
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
The Pinot Controller is responsible for a number of things
Controllers maintain the global metadata (e.g. configs and schemas) of the system with the help of Zookeeper which is used as the persistent metadata store.
Controllers host Helix Controller and is responsible for managing other pinot components (brokers, servers, minions)
They maintain the mapping of which servers are responsible for which segments. This mapping is used by the servers, to download the portion of the segments that they are responsible for. This mapping is also used by the broker to decide which servers to route the queries to.
Controller has admin endpoints for viewing, creating, updating and deleting configs which help us manage and operate the cluster.
Controllers also have endpoints for segment uploads which are used in offline data pushes. They are responsible for initializing realtime consumption and coordination of persisting the realtime segments into the segment store periodically.
They undertake other management activities such as managing retention of segments, validations.
There can be multiple instances of Pinot controller for redundancy. If there are multiple controllers, Pinot expects that all of them are configured with the same back-end storage system so that they have a common view of the segments (e.g. NFS). Pinot can use other storage systems such as HDFS or ADLS.
Make sure you've setup Zookeeper. If you're using docker, make sure to pull the pinot docker image. To start a controller
Pinot Minion is a new component which leverages the Helix Task Framework . It can be attached to an existing Pinot cluster and then execute tasks as provided by the controller. It's a generic and single place for running background jobs. They help offload computationally intensive tasks—such as adding indexes to segments and merging segments—from other components.
A table is a logical abstraction to refer to a collection of related data. It consists of columns and rows (documents).
Data in Pinot tables is sharded into segments. A Pinot table is modeled as a Helix resource. Each segment of a table is modeled as a Helix Partition.
A table is typically associated with a schema, which is used to define the names, data types and other information of the columns of the table.
There are 3 types of a Pinot table
Table type
Description
Offline
Offline tables ingest pre-built pinot-segments from external data stores.
Realtime
Realtime tables ingest data from streams (such as Kafka) and build segments.
Hybrid
A hybrid Pinot table has both realtime as well as offline tables under the hood.
Note that the query does not know the existence of offline or realtime tables. It only specifies the table name in the query. For example, regardless of whether we have an offline table myTable_OFFLINE
, or a realtime table myTable_REALTIME
or a hybrid table containing both of these, the query will simply use mytable
as select count(*) from myTable
.
A table config file is used to define the table properties, such as name, type, indexing, routing, retention etc. It is written in JSON format, and stored in the property store in Zookeeper, along with the table schema.
Here's an example table config for an offline table
We will now discuss each section of the table config in detail.
Top level field
Description
tableName
Specifies the name of the table. Should only contain alpha-numeric characters, hyphens (‘-‘), or underscores (‘’). (Using a double-underscore (‘_’) is not allowed and reserved for other features within Pinot)
tableType
Defines the table type - OFFLINE
for offline table, REALTIME
for realtime table. A hybrid table is essentially 2 table configs one of each type, with the same table name.
quota
routing
segmentsConfig
tableIndexConfig
tenants
metadata
This section is for keeping custom configs, which are expressed as key value pairs.
quota fields
Description
storage
The maximum storage space the table is allowed to use, before replication. For example, in the above table, the storage is 140G and replication is 3. Therefore, the maximum storage the table is allowed to use is 140*3=420G. The space used by the table is calculated by adding up the sizes of all segments from every server hosting this table. Once this limit is reached, offline segment push throws a 403
exception with message, Quota check failed for segment: segment_0 of table: pinotTable
.
maxQueriesPerSecond
The maximum queries per second allowed to execute on this table. If query volume exceeds this, a 429
exception with message,Request 123 exceeds query quota for table:pinotTable, query:select count(*) from pinotTable