OFFLINE
for an offline table, REALTIME
for a realtime table. A hybrid table is essentially two table configs, one of each type, with the same table name.

403
exception with message Quota check failed for segment: segment_0 of table: pinotTable

429
exception with message Request 123 exceeds query quota for table:pinotTable, query:select count(*) from pinotTable
will be sent, and a BrokerMetric QUERY_QUOTA_EXCEEDED
will be recorded. The application should build an exponential backoff and retry mechanism to react to this exception.
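The backoff-and-retry advice can be sketched as follows. This is an illustrative client-side pattern, not part of any Pinot API; the simulated 429 responses and all names are hypothetical:

```java
import java.util.concurrent.ThreadLocalRandom;

public class QuotaBackoff {
    // Exponential backoff with full jitter: random delay in [0, min(base * 2^attempt, cap)].
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long exp = baseMillis << Math.min(attempt, 20);
        long capped = Math.min(exp, capMillis);
        return ThreadLocalRandom.current().nextLong(capped + 1);
    }

    public static void main(String[] args) throws InterruptedException {
        int maxRetries = 5;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            // Placeholder for the application's broker call; retry only on a
            // 429 quota-exceeded response. Here the first two calls simulate 429s.
            boolean quotaExceeded = attempt < 2;
            if (!quotaExceeded) {
                System.out.println("query succeeded on attempt " + attempt);
                return;
            }
            long sleep = delayMillis(attempt, 100, 10_000);
            System.out.println("429 quota exceeded, retrying in " + sleep + " ms");
            Thread.sleep(sleep);
        }
        System.out.println("giving up after " + maxRetries + " attempts");
    }
}
```

The jitter spreads retries out so that many clients hitting the quota at once do not retry in lockstep.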
partition
- prunes segments based on the partition metadata stored in ZooKeeper. By default, there is no pruner. For more details on how to configure this, check out Querying All Segments.
time
- prunes segments for queries filtering on timeColumnName
that do not contain data in the query's time range.

balanced
- balances the number of segments served by each selected instance. This is the default.
replicaGroup
- instance selector for the replica group routing strategy.
For more details on how to configure this, check out Querying All Servers.

Required for APPEND, optional for REFRESH.
timeColumnName along with timeColumnType is used to manage segment retention and the time boundary for offline vs realtime. 365 DAYS
in the example means that segments containing data older than 365 days will be deleted periodically. This is done by the RetentionManager
Controller periodic task. By default, no retention is set.

APPEND
- new data segments pushed periodically, to append to the existing data, e.g. daily or hourly

REFRESH
- the entire data is replaced every time during a data push. Refresh tables have no retention.

HOURLY, DAILY
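Putting the retention and push settings together, a segmentsConfig block might look like the sketch below. The time column name and concrete values are illustrative assumptions; verify field names against your Pinot version:

```json
"segmentsConfig": {
  "timeColumnName": "timestamp",
  "retentionTimeUnit": "DAYS",
  "retentionTimeValue": "365",
  "segmentPushType": "APPEND",
  "segmentPushFrequency": "DAILY",
  "replication": "3"
}
```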
foo, bar, moo
select count(*) from T where latency > 3000
will be faster if you enable a range index for latency.
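Enabling a range index for the latency column in the query above is done per column. A minimal sketch, assuming the rangeIndexColumns list lives under tableIndexConfig (verify against your Pinot version):

```json
"tableIndexConfig": {
  "rangeIndexColumns": ["latency"]
}
```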
Murmur
- murmur2 hash function

Modulo
- modulo on integer values

HashCode
- java hashCode() function

ByteArray
- java hashCode() on deserialized byte array

{
  "foo": {
    "functionName": "Murmur",
    "numPartitions": 32
  }
}
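The map above assigns one partition function per column. In a full table config this map typically sits under segmentPartitionConfig; the exact placement here is an assumption to verify against your Pinot version:

```json
"tableIndexConfig": {
  "segmentPartitionConfig": {
    "columnPartitionMap": {
      "foo": {
        "functionName": "Murmur",
        "numPartitions": 32
      }
    }
  }
}
```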
heap
- load data directly into direct memory

mmap
- load data segments to off-heap memory

NONE
- do not generate for any columns
ALL
- generate for all columns
TIME
- generate for only time column
NON_METRIC
- generate for time and dimension columns

IS NULL or IS NOT NULL
predicates in the query. Enabling this will lead to additional memory and storage usage per segment. By default, this is set to false.

true
to pre-aggregate the metrics

true
if you want to disable dictionaries for single-valued metric columns. Only applicable to single-valued metric columns. Default: false.
With optimizeDictionaryForMetrics enabled, a dictionary is not created for the metric columns for which noDictionaryIndexSize / indexWithDictionarySize
is less than the noDictionarySizeRatioThreshold. For example, if the raw (no-dictionary) index would take 80MB and the dictionary-encoded index 100MB, the ratio 0.8 is below the default threshold, so the dictionary is skipped.
Default: 0.85
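As a sketch, the flags from this subsection sit alongside each other in tableIndexConfig; treat the exact placement as an assumption to verify against your Pinot version:

```json
"tableIndexConfig": {
  "nullHandlingEnabled": true,
  "aggregateMetrics": true,
  "optimizeDictionaryForMetrics": true,
  "noDictionarySizeRatioThreshold": 0.85
}
```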
RAW or DICTIONARY
TEXT
is supported.

enableQueryCacheForTextIndex
- set to true to enable caching for the text index in Lucene

rawIndexWriterVersion

deriveNumDocsPerChunkForRawIndex
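These per-column settings are expressed as field configs. A hedged sketch for a text-indexed column follows; the column name is illustrative, and property support varies by Pinot version:

```json
"fieldConfigList": [
  {
    "name": "description",
    "encodingType": "RAW",
    "indexType": "TEXT",
    "properties": {
      "enableQueryCacheForTextIndex": "true"
    }
  }
]
```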
DOWNLOAD or empty

http or https

streamConfigs
section.

kafka
is supported at the moment.

smallest, largest, or a timestamp in millis

1d, 4h30m
Default is 6 hours.

150M or 1.1G, etc. This value is used when realtime.segment.flush.threshold.rows
is set to 0. Default is 200M
i.e. 200 megabytes.

realtime.segment.flush.threshold.rows
is set to 0 and the consumer type is LowLevel. Default: 100000 (i.e. 100K).
When specifying realtime.segment.flush.threshold.rows, the actual number of rows per segment is computed using the following formula:

realtime.segment.flush.threshold.rows / partitionsConsumedByServer

For example, if realtime.segment.flush.threshold.rows=1000 and each server consumes 10 partitions, the rows per segment will be: 1000/10 = 100
streamConfigs
section may look like:

realtimeConsuming or realtimeCompleted
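A hedged sketch of a Kafka streamConfigs section using the properties discussed above; topic and broker values are illustrative, and required consumer-factory and decoder class properties (omitted here) depend on your Pinot version:

```json
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.consumer.type": "lowlevel",
  "stream.kafka.topic.name": "myTopic",
  "stream.kafka.broker.list": "localhost:9092",
  "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
  "realtime.segment.flush.threshold.time": "6h",
  "realtime.segment.flush.threshold.rows": "0",
  "realtime.segment.flush.threshold.segment.size": "150M"
}
```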
${ENV_NAME}
or ${ENV_NAME:DEFAULT_VALUE}
as field values in the table config.

"$ENV_NAME"
is not supported.
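For instance, a storage quota could be injected from the environment as in this sketch; the variable name is illustrative:

```json
"quota": {
  "storage": "${STORAGE_QUOTA:100G}"
}
```

If STORAGE_QUOTA is unset, the default after the colon (100G) is used.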