When operating Pinot in a production environment, it's not always ideal to have servers immediately available for querying when they have started up. This is especially so for real-time servers that may have to re-consume some amount of data before they are "caught up". Pinot offers several strategies for determining when a server is up, healthy, and available for querying.
Pinot servers have several endpoints for determining the health of the servers.
GET /health/liveness answers "is this server up?" It only ensures that the server was able to start and that you can connect to it.
GET /health/readiness answers "is this server up and ready to serve data?" The checkers below determine whether the "readiness" aspect returns OK.
GET /health performs the same check as the readiness endpoint.
It's possible to operate Pinot with no checkers at all by disabling the configurations below, but this is not recommended. Instead, the defaults are the following:
Pinot will wait up to 10 minutes for all server startup operations to complete, waiting for the server's Ideal State to match its External View before marking the server as healthy. This could mean downloading segments, building indexes, and creating consumption streams. It is recommended to start with the default time and add more time as needed.
Waiting for the Ideal State to match the External View is not configurable. If enableServiceStatusCheck=true, this is always one of the checks.
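As a sketch, these defaults look roughly like the following server configuration (the exact property names under pinot.server.startup.* are assumptions to verify against your Pinot version):

```properties
# Run the service status checks (Ideal State vs. External View) at startup.
pinot.server.startup.enableServiceStatusCheck=true
# Allow up to 10 minutes (600000 ms) for all startup operations to complete.
pinot.server.startup.timeoutMs=600000
```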
The most basic startup check is the static one. It is configured by the following:
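A minimal sketch of this configuration, assuming the standard pinot.server.starter.* property prefix:

```properties
# Wait a fixed 60 seconds for consuming segments before reporting healthy.
pinot.server.starter.realtimeConsumptionCatchupWaitMs=60000
```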
In the above example, a Pinot server will wait 60 seconds for its consuming segments to catch up before becoming healthy and available for serving queries. This gives the server 1 minute to consume data un-throttled before being marked as healthy. Overall, the server will still only wait 10 minutes for all startup actions to complete, so make sure realtimeConsumptionCatchupWaitMs < timeoutMs.
The first option for getting fresher real-time data is the offset-based status checker. This checker determines the end offset of each consuming segment at the time of Pinot startup, then consumes up to that offset before marking the segment as healthy. Once all segments are healthy, the checker returns healthy.
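A sketch of how this checker might be enabled in the server configuration (the exact property names are assumptions to verify against your Pinot version):

```properties
# Enable the offset-based consumption status checker.
pinot.server.starter.enableRealtimeOffsetBasedConsumptionStatusChecker=true
# Upper bound on how long the checker is allowed to wait.
pinot.server.starter.realtimeConsumptionCatchupWaitMs=60000
```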
There are some caveats to note here:
realtimeConsumptionCatchupWaitMs must still be set. This checker will only wait as long as the value of realtimeConsumptionCatchupWaitMs.
This checker never recomputes end offsets after it starts, so with high real-time volume you will still be behind. If your server takes 8 minutes to start up and for this checker to become healthy, you will be roughly 8 minutes behind and consuming rapidly once the server starts serving queries.
The strictest checker Pinot offers is the freshness-based one. It works similarly to the offset checker but with an extra condition: the actual events in the stream must meet a minimum freshness before the server is marked as healthy. This checker provides the best freshness guarantees for real-time data at the expense of longer startup time.
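A sketch of the configuration matching the example discussed below (property names are assumptions; the 10-second freshness and 1-minute wait come from the example itself):

```properties
# Enable the freshness-based consumption status checker.
pinot.server.starter.enableRealtimeFreshnessBasedConsumptionStatusChecker=true
# Events must be within 10 seconds of the current system time.
pinot.server.starter.realtimeMinFreshnessMs=10000
# Wait at most 1 minute for all consuming streams to reach that freshness.
pinot.server.starter.realtimeConsumptionCatchupWaitMs=60000
```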
In the example above, the Pinot server will wait up to 1 minute for all consuming streams to have data within 10 seconds of the current system time. This is re-evaluated on each pass of the checker, so it gives the best guarantee of having fresh data before a server starts. The checker also compares a segment's current offset to the max offset of the stream and marks the segment as healthy when they are equal. This is useful for low-volume streams where there may never be data fresher than realtimeConsumptionCatchupWaitMs.
There are still some caveats that apply here:
realtimeConsumptionCatchupWaitMs must still be set. This checker will only wait as long as the value of realtimeConsumptionCatchupWaitMs.
Your events must implement getMetadataAtIndex to pass the event timestamp correctly. The current Kafka, Kinesis, and Pulsar implementations already do this using the event ingestion time. If your data takes multiple hops, the freshness only reflects the last hop.
The recommended configurations in QA attempt to balance performing valid checks with fast and successful startup. We do not exit the server if the startup status checks fail, to avoid crash loops, but we also do not wait indefinitely to catch up if events are not being consumed. A stuck partition will lead to ingestion lag here.
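A sketch of what such a QA configuration could look like (the exitOnServiceStatusCheckFailure key and the exact values are assumptions, not a definitive recommendation):

```properties
pinot.server.startup.enableServiceStatusCheck=true
pinot.server.startup.timeoutMs=600000
# Do not crash-loop the server if the startup status checks fail.
pinot.server.startup.exitOnServiceStatusCheckFailure=false
pinot.server.starter.enableRealtimeFreshnessBasedConsumptionStatusChecker=true
pinot.server.starter.realtimeMinFreshnessMs=10000
# Bounded catch-up wait so a stuck partition cannot block startup forever.
pinot.server.starter.realtimeConsumptionCatchupWaitMs=60000
```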
The recommended configurations in production optimize for the highest availability, correctness, and lowest ingestion lag. We wait indefinitely for segment freshness to match the minimum criteria, and we stop the server if status checks are not met by the timeout.
It is important to get your timeout configuration right; otherwise, servers will keep shutting down if they cannot meet the freshness threshold in the allotted time.
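A sketch of a production-leaning configuration under the same assumptions (property names and values are illustrative only):

```properties
pinot.server.startup.enableServiceStatusCheck=true
# Generous overall startup timeout; the catch-up wait below must stay under it.
pinot.server.startup.timeoutMs=1200000
# Stop the server if the status checks are not met by the timeout.
pinot.server.startup.exitOnServiceStatusCheckFailure=true
pinot.server.starter.enableRealtimeFreshnessBasedConsumptionStatusChecker=true
pinot.server.starter.realtimeMinFreshnessMs=10000
pinot.server.starter.realtimeConsumptionCatchupWaitMs=600000
```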
Decouple the controller from the data path for real-time Pinot tables.
For real-time tables, when a Pinot server finishes consuming a segment, the segment goes through a completion protocol sequence. By default, the segment is uploaded to the lead Pinot controller which in turn persists the segment to deep store (for example, NFS, S3 or HDFS). As a result, because all real-time segments flow through the controller, it may become a bottleneck and slow down the overall ingestion rate. To overcome this limitation, we've added a new stream-level configuration to bypass the controller and upload the completed segment to deep store directly.
To upload the completed segment to the deep store directly, add the following stream-level configuration.
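For example, in the real-time table config (placing the flag under tableIndexConfig.streamConfigs is a sketch; put it wherever your other stream-level settings live):

```json
{
  "tableIndexConfig": {
    "streamConfigs": {
      "realtime.segment.serverUploadToDeepStore": "true"
    }
  }
}
```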
When this configuration is enabled, Pinot servers attempt to upload the completed segment to the segment store directly, bypassing the controller. When finished, Pinot updates the controller with the corresponding segment metadata.
pinot.server.instance.segment.store.uri is optional by default, but it is required here so that the server knows where the deep store is. Before enabling realtime.segment.serverUploadToDeepStore on the table, verify that pinot.server.instance.segment.store.uri=<controller.data.dir> is configured on the servers.
Peer download policy allows failure recovery in case uploading the completed segment to the deep store fails. If the segment store is unavailable, the corresponding segments can still be downloaded directly from the Pinot servers.
This scheme only works for real-time tables using the Low Level Consumer (LLC) mode. To enable peer download for segments, update the controller, server, and table configurations as follows:
Add the following to the controller configuration:
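A sketch of the controller-side settings (these key names are assumptions based on common peer-download setups; confirm them for your Pinot version):

```properties
# Peer download only works with LLC (low-level consumer) tables.
controller.allow.hlc.tables=false
# Use split commit so segment upload is decoupled from the commit protocol.
controller.enable.split.commit=true
```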
Add the following to the server configuration:
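A sketch of the server-side settings, using S3 purely as an illustrative scheme (the bucket path and exact keys are assumptions):

```properties
pinot.server.instance.enable.split.commit=true
# Where completed segments live in the deep store.
pinot.server.instance.segment.store.uri=s3://my-bucket/pinot/controller-data
# PinotFS implementation for the scheme used above.
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
```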
Here, the URI of the segment store should point to the full path of the corresponding data directory, including both the filesystem scheme and the path (e.g., file://dir, hdfs://path, or s3://path).
In pinot.server.storage.factory.class.(scheme), replace (scheme) with the scheme (for example, hdfs, s3, or gcs) of the segment store URI configured above, and set the corresponding PinotFS subclass as the config value.
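Then, add the download scheme to the real-time table config, for example (placing it under segmentsConfig is an assumption to verify):

```json
{
  "tableName": "myTable_REALTIME",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "peerSegmentDownloadScheme": "http"
  }
}
```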
In this case, the peerSegmentDownloadScheme can be either http or https.
With peer download enabled, LLC segments may still fail to be uploaded to the segment store in some failure cases, for example when the segment store is unavailable during segment completion. Add the following controller config to enable asynchronous upload retries by a controller periodic job.
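A sketch of that controller config (the key name is an assumption; verify it against your version):

```properties
# Retry deep-store uploads asynchronously via a controller periodic job.
controller.realtime.segment.deepStoreUploadRetryEnabled=true
```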
This page introduces all the instance assignment strategies, when to use them, and how to configure them.
Instance assignment is the strategy of assigning the servers to host a table. Each instance assignment strategy is associated with one segment assignment strategy (read more about Segment Assignment).
Instance assignment is configured via the InstanceAssignmentConfig. Based on the config, Pinot can assign servers to a table, then assign segments to servers using the segment assignment strategy associated with the instance assignment strategy.
There are three types of instances in the InstanceAssignmentConfig: OFFLINE, CONSUMING, and COMPLETED. OFFLINE represents the instances hosting the segments for the offline table; CONSUMING represents the instances hosting the consuming segments for the real-time table; COMPLETED represents the instances hosting the completed segments for the real-time table. For real-time tables, if COMPLETED instances are not configured, completed segments use the same instance assignment strategy as the consuming segments. If they are configured, completed segments are periodically moved to the COMPLETED instances.
The default instance assignment strategy simply assigns all the servers in the cluster to each table, and uses the Balanced Segment Assignment for the table. This strategy requires no extra configurations for the cluster, and it works well for small clusters with few tables where all the resources can be shared among all the tables.
For performance-critical use cases, we might not want to share server resources across use cases, so that one use case is not impacted by the others hosted on the same set of servers. We can use the Tag-Based Instance Assignment to achieve isolation for tables.
(Note: Logically the Tag-Based Instance Assignment is identical to the Tenant concept in Pinot, but just a different way of configuring the table. We recommend using the instance assignment over the tenant config because it can achieve more complex assignment strategies, as described below.)
In order to use the Tag-Based Instance Assignment, the servers should be tagged via the Helix InstanceConfig, where the tag suffix (_OFFLINE or _REALTIME) denotes the type of table the server is going to serve. Each server can have multiple tags if necessary.
After configuring the server tags, the Tag-Based Instance Assignment can be enabled by setting the tag within the InstanceAssignmentConfig for the table as shown below. Only the servers with this tag will be assigned to host this table, and the table will use the Balanced Segment Assignment.
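A sketch of such a table config fragment, where Tag1_OFFLINE is a placeholder tag:

```json
{
  "instanceAssignmentConfigMap": {
    "OFFLINE": {
      "tagPoolConfig": {
        "tag": "Tag1_OFFLINE"
      }
    }
  }
}
```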
On top of the Tag-Based Instance Assignment, we can also control the number of servers assigned to each table by configuring numInstances in the InstanceAssignmentConfig. This is useful when we want to serve multiple tables of different sizes on the same set of servers. For example, if we have 30 servers hosting hundreds of tables for different analytics use cases, we don't want to use all 30 servers for each table, especially for the tiny tables with only megabytes of data.
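For example, to limit the table to 6 of the tagged servers (a sketch, assuming numInstances sits inside replicaGroupPartitionConfig):

```json
{
  "instanceAssignmentConfigMap": {
    "OFFLINE": {
      "tagPoolConfig": {
        "tag": "Tag1_OFFLINE"
      },
      "replicaGroupPartitionConfig": {
        "numInstances": 6
      }
    }
  }
}
```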
In order to use the Replica-Group Segment Assignment, the servers need to be assigned to multiple replica-groups of the table, which is where the Replica-Group Instance Assignment comes into the picture. Enable it and configure numReplicaGroups and numInstancesPerReplicaGroup in the InstanceAssignmentConfig, and Pinot will assign the instances accordingly.
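A sketch with 2 replica-groups of 3 instances each (field placement assumed as in the previous examples):

```json
{
  "instanceAssignmentConfigMap": {
    "OFFLINE": {
      "tagPoolConfig": {
        "tag": "Tag1_OFFLINE"
      },
      "replicaGroupPartitionConfig": {
        "replicaGroupBased": true,
        "numReplicaGroups": 2,
        "numInstancesPerReplicaGroup": 3
      }
    }
  }
}
```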
Similar to the Replica-Group Segment Assignment, in order to use the Partitioned Replica-Group Segment Assignment, servers not only need to be assigned to each replica-group, but also to a partition within the replica-group. Adding numPartitions and numInstancesPerPartition in the InstanceAssignmentConfig fulfills this requirement.
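A sketch with 2 replica-groups, each split into 2 partitions of 2 instances (field placement assumed as above):

```json
{
  "instanceAssignmentConfigMap": {
    "OFFLINE": {
      "tagPoolConfig": {
        "tag": "Tag1_OFFLINE"
      },
      "replicaGroupPartitionConfig": {
        "replicaGroupBased": true,
        "numReplicaGroups": 2,
        "numInstancesPerReplicaGroup": 4,
        "numPartitions": 2,
        "numInstancesPerPartition": 2
      }
    }
  }
}
```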
(Note: The numPartitions configured here does not have to match the actual number of partitions of the table, in case the table's partitions have changed for some reason. If they do not match, the table partitions will be assigned to the server partitions in a round-robin fashion. For example, if there are 2 server partitions but 4 table partitions, table partitions 1 and 3 will be assigned to server partition 1, and table partitions 2 and 4 will be assigned to server partition 2.)