1.2.0
Release Notes for 1.2.0
This release brings several improvements and bug fixes for the Multistage Engine, Upserts, and Compaction, along with many smaller features and general bug fixes.
LEAD allows you to access values after the current row in a frame.
LAG allows you to access values before the current row in a frame.
FIRST_VALUE and LAST_VALUE return the first and last values in the frame, respectively.
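The semantics of these four window functions can be illustrated with a small sketch. This is not Pinot code: it models a frame as a plain Python list that is already ordered, and the helper names and the `default` parameter are assumptions made for illustration.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def lag(frame: Sequence[T], i: int, offset: int = 1,
        default: Optional[T] = None) -> Optional[T]:
    """Value `offset` rows before row i, or `default` if outside the frame."""
    j = i - offset
    return frame[j] if 0 <= j < len(frame) else default


def lead(frame: Sequence[T], i: int, offset: int = 1,
         default: Optional[T] = None) -> Optional[T]:
    """Value `offset` rows after row i, or `default` if outside the frame."""
    j = i + offset
    return frame[j] if 0 <= j < len(frame) else default


def first_value(frame: Sequence[T]) -> Optional[T]:
    """First value of the (ordered) frame."""
    return frame[0] if frame else None


def last_value(frame: Sequence[T]) -> Optional[T]:
    """Last value of the (ordered) frame."""
    return frame[-1] if frame else None


prices = [10, 12, 9, 15]  # one partition, already ordered
lag_col = [lag(prices, i) for i in range(len(prices))]
lead_col = [lead(prices, i) for i in range(len(prices))]
```

Here `lag_col` has no value for the first row and `lead_col` has none for the last row, which is where a default would apply.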
V2 Engine now supports a "database" construct, enabling table namespace isolation within the same Pinot cluster.
Improves user experience when multiple users are using the same Pinot Cluster.
Access control policies can be set at the database level.
A database can be selected in a query using a SET statement, such as SET database=my_db;
Added array sum aggregation functions for point-wise array operations.
Added support for the valueIn MV transform function.
Fixed bug in numeric casts for MV columns in filters.
Fixed NPE in ArrayAgg when a column contains no data.
Fixed array literal handling.
The WITHIN GROUP clause can be used to process rows in a given order within a group.
One of the most common use cases for this is the ListAgg function, which, when combined with WITHIN GROUP, can be used to concatenate strings in a given order.
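As a rough illustration of what ordered concatenation within a group means, the sketch below groups rows, sorts each group by an ordering column, and joins the values. It models the semantics only. The function name, row layout, and separator are assumptions, not Pinot's API.

```python
from collections import defaultdict
from typing import Any, Dict, List


def list_agg_within_group(rows: List[Dict[str, Any]], group_key: str,
                          value_key: str, order_key: str,
                          separator: str = ",") -> Dict[Any, str]:
    """Concatenate `value_key` per group, ordered by `order_key`."""
    groups: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        group: separator.join(
            str(r[value_key]) for r in sorted(members, key=lambda r: r[order_key])
        )
        for group, members in groups.items()
    }


rows = [
    {"user": "a", "page": "checkout", "ts": 3},
    {"user": "a", "page": "home", "ts": 1},
    {"user": "b", "page": "home", "ts": 5},
    {"user": "a", "page": "cart", "ts": 2},
]
paths = list_agg_within_group(rows, "user", "page", "ts", " > ")
```

For user `a` the pages are joined in timestamp order regardless of the order in which the rows arrived.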
Minions now support resource isolation based on an instance tag.
Instance tag is configured at table level, and can be set for each task on a table.
This enables you to implement arbitrary resource isolation strategies, i.e. you can use a set of Minion Nodes for running any set of tasks across any set of tables.
Upsert compaction now schedules segments for compaction based on the number of invalid docs.
This helps the compaction task to handle arbitrary temporal distribution of invalid docs.
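One way to picture selection by invalid-doc count is the sketch below. It is only an illustration under stated assumptions: the threshold, the per-run cap, and the segment names are hypothetical, and Pinot's actual scheduling logic may differ.

```python
from typing import Dict, List


def pick_segments_for_compaction(invalid_doc_counts: Dict[str, int],
                                 min_invalid_docs: int,
                                 max_segments: int) -> List[str]:
    """Pick the segments with the most invalid docs, regardless of their age."""
    candidates = [
        (count, name)
        for name, count in invalid_doc_counts.items()
        if count >= min_invalid_docs
    ]
    # Most invalid docs first; break ties by name so the result is deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [name for _, name in candidates[:max_segments]]


counts = {
    "seg_2023_01": 900,
    "seg_2024_06": 50,
    "seg_2022_11": 4000,
    "seg_2024_07": 0,
}
picked = pick_segments_for_compaction(counts, min_invalid_docs=100, max_segments=2)
```

Because the ranking depends only on the counts, old and new segments are treated alike, which is what makes the approach insensitive to how invalid docs are distributed over time.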
Adds different modes of consistency guarantees for Upsert tables.
Adds a new UpsertConfig called consistencyMode, which can be set to NONE, SYNC, or SNAPSHOT.
SYNC is optimized for data freshness but can lead to elevated query latencies and is best for low-QPS use cases. In this mode, the ingestion threads take a WLock when updating validDocID bitmaps.
SNAPSHOT mode can handle high-QPS/high-ingestion use cases by getting the list of valid docs from a snapshot of validDocID. The snapshot can be refreshed every few seconds, and the tolerance can be set via the query option upsertViewFreshnessMs.
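The snapshot idea can be sketched as follows: queries read an immutable copy of the valid doc IDs, and the copy is refreshed only when it is older than the freshness tolerance. This is a single-threaded illustration, not Pinot's implementation. The class, its methods, and the injected clock are all hypothetical.

```python
from typing import Callable, FrozenSet, Set


class ValidDocSnapshotView:
    """Serves valid doc IDs from a snapshot refreshed on demand."""

    def __init__(self, clock_ms: Callable[[], float]) -> None:
        self._clock_ms = clock_ms
        self._live: Set[int] = set()             # updated by ingestion
        self._snapshot: FrozenSet[int] = frozenset()
        self._snapshot_time_ms = clock_ms()

    def mark_valid(self, doc_id: int) -> None:
        self._live.add(doc_id)

    def mark_invalid(self, doc_id: int) -> None:
        self._live.discard(doc_id)

    def valid_docs(self, freshness_ms: float) -> FrozenSet[int]:
        """Return the snapshot, refreshing it if older than `freshness_ms`."""
        now = self._clock_ms()
        if now - self._snapshot_time_ms > freshness_ms:
            self._snapshot = frozenset(self._live)
            self._snapshot_time_ms = now
        return self._snapshot


now_ms = [0.0]  # mutable cell standing in for a real clock
view = ValidDocSnapshotView(clock_ms=lambda: now_ms[0])
view.mark_valid(1)
stale = view.valid_docs(freshness_ms=1000)  # snapshot still within tolerance
now_ms[0] = 1500.0
fresh = view.valid_docs(freshness_ms=1000)  # tolerance exceeded, so refreshed
```

The trade-off is visible in `stale`: a query may miss very recent updates, but it never has to wait on ingestion.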
Partial Upsert merges the old record and the new incoming record to generate the final ingested record.
Pinot now allows users to customize how this merge of an old row and the new row is computed.
This allows a column value in the new row to be an arbitrary function of the old and the new row.
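To make the idea of a custom merge concrete, here is a minimal sketch in which each output column is computed from both the old and the new row. It does not use Pinot's merger interface. The row shape, column names, and merge rules are assumptions chosen for illustration.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def merge_rows(old: Row, new: Row) -> Row:
    """Combine the previously ingested row with the incoming row."""
    # Start from the old row and overwrite with non-null incoming values.
    merged = dict(old)
    merged.update({k: v for k, v in new.items() if v is not None})
    amount = new.get("order_amount") or 0
    # Columns derived from both rows: a running total and a running maximum.
    merged["total_spend"] = old.get("total_spend", 0) + amount
    merged["max_order_amount"] = max(old.get("max_order_amount", 0), amount)
    return merged


old = {"user_id": "u1", "city": "Paris", "total_spend": 40,
       "max_order_amount": 25, "order_amount": 25}
new = {"user_id": "u1", "city": None, "order_amount": 30}
merged = merge_rows(old, new)
```

In this example the null `city` in the new row does not overwrite the old value, while `total_spend` and `max_order_amount` depend on both rows.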
Segments uploaded for Upsert Backfill can now explicitly specify the Kafka partition they belong to.
This enables backfilling an Upsert table where the externally generated segments are partitioned using an arbitrary hash function on an arbitrary primary key.
Added the funnelMaxStep function, which can be used to calculate max funnel steps for a given sliding window.
Added funnelCompleteCount to calculate the number of completed funnels, and funnelMatchStep to get the funnel match array.
Prior to this feature, on a segment commit, Pinot would convert all the columnar data from the Mutable Segment to row-major, and then re-build column major Immutable Segments.
This feature skips the row-major conversion and is expected to be both space and time efficient.
It can help lower ingestion lag from segment commits, which is especially helpful when your segments are large.
You can now prettify SQL right in the Controller UI!
Added a new lossless hash-function for Upsert Primary Keys optimized for UUIDs.
The hash function can reduce Old Gen usage by up to 30%.
It maps a UUID to a 16-byte array, versus encoding it as a UTF-8 string, which would take 36 bytes.
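The size difference is easy to check with Python's standard `uuid` module. The sketch shows only the lossless 16-byte encoding described above, not Pinot's hash function itself, and the helper names are made up for illustration.

```python
import uuid


def uuid_key_to_bytes(key: str) -> bytes:
    """Pack a canonical UUID string into its 16 raw bytes."""
    return uuid.UUID(key).bytes


def bytes_to_uuid_key(raw: bytes) -> str:
    """Recover the canonical (lowercase, hyphenated) UUID string."""
    return str(uuid.UUID(bytes=raw))


key = "123e4567-e89b-12d3-a456-426614174000"
packed = uuid_key_to_bytes(key)
as_text = key.encode("utf-8")
```

`packed` is 16 bytes long while `as_text` is 36 bytes, and `bytes_to_uuid_key(packed)` returns the original key, which is what makes the mapping lossless.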
Convenient for debugging impact of indexes on query performance or results.
You can add the skipIndexes option to your query to skip any number of indexes, e.g. SET skipIndexes=inverted,range;
New GeoHash functions: encodeGeoHash, decodeGeoHash, decodeGeoHashLatitude and decodeGeoHashLongitude.
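For readers unfamiliar with geohashes, the sketch below implements the standard encoding algorithm: longitude and latitude bits are interleaved, starting with longitude, and every five bits become one base-32 character. It is a generic illustration. The function name, argument order, and `precision` parameter are assumptions and may not match Pinot's encodeGeoHash.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode_geohash(latitude: float, longitude: float, precision: int) -> str:
    """Encode a point as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_longitude = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_longitude:
            mid = (lon_lo + lon_hi) / 2
            if longitude >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if latitude >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_longitude = not use_longitude
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


origin_cell = encode_geohash(0.0, 0.0, 1)
coarse_cell = encode_geohash(57.64911, 10.40744, 2)
```

Longer hashes share a prefix with shorter ones for the same point, so truncating a geohash yields a coarser cell that contains the original.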
dateBin can be used to align a timestamp to the nearest time bucket.
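Bucket alignment can be sketched with integer arithmetic. The exact rounding rule is not stated here, so this sketch assumes that alignment means rounding down to the start of the bucket that contains the timestamp, measured from an origin. The function name, millisecond units, and parameters are illustrative assumptions rather than Pinot's dateBin signature.

```python
def date_bin(timestamp_ms: int, bucket_ms: int, origin_ms: int = 0) -> int:
    """Return the start of the bucket containing `timestamp_ms`."""
    if bucket_ms <= 0:
        raise ValueError("bucket_ms must be positive")
    # Count whole buckets since the origin, then step back to that boundary.
    buckets_since_origin = (timestamp_ms - origin_ms) // bucket_ms
    return origin_ms + buckets_since_origin * bucket_ms


aligned = date_bin(125, bucket_ms=60)
aligned_with_origin = date_bin(125, bucket_ms=60, origin_ms=10)
```

With 60 ms buckets, timestamp 125 falls in the bucket starting at 120. Shifting the origin to 10 moves the bucket boundaries, so the same timestamp aligns to 70.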
To enable this, you can set the compressionCodec in the fieldConfigList of the column you want to target.
do not fail on duplicate relaxed vars (#13214)
make reflection calls compatible with 0.9.11 [#12958](https://github.com/apache/
Added Geospatial Scalar Function support for use in intermediate stage in the v2 query engine.
Fix 'WEEK' transform function.
Support EXTRACT as a scalar function.
Added support for ALL modifier for INTERSECT and EXCEPT Set Operations.
Fixed bug in handling literal arguments in aggregation functions like Percentile.
Allow INT and FLOAT literals.
Fixed literal handling for all types.
Fixed null literal handling for null intolerant functions.
Added new metrics for tracking queries executed globally and at the table level.
New metrics to track join counts and window function counts.
Multiple meters and timers to track Multistage Engine Internals.
Improved Window operators resiliency, with new checks to make sure the window doesn't grow too large.
Optimized Group Key generation.
Fixed SortedMailboxReceiveOperator to honor convention of pulling at most 1 EOS block.
Improvement in how execution stats are handled.
Use Protobuf instead of Reflection for Plan Serialization.
Minions can now download segments from servers when deepstore copy is missing. This feature is enabled via a cluster level config allowDownloadFromServer.
Added support for TLS Port in Minions.
New metrics added for Minions to track segment/record processing information.
Minions can now handle invalid instance tags in Task Configs gracefully. Prior to this change, Minions would be stuck in IN_PROGRESS state until task timeout.
Fix bug to return validDocIDsMetadata from all servers.
Upsert compaction doesn't retain maxLength information and trims string fields.
Fixed a Bug in Handling Equal Comparison Column Values in Upsert, which could lead to data inconsistency.
Upsert snapshot will now snapshot only those segments which have updates.
JSON Index can now be used for evaluating Regex and Range Predicates.
jsonExtractIndex now supports contextual array filters.
JSON column type now supports filter predicates like =, !=, IN and NOT IN. This is convenient for scenarios where the JSON values are very small.
JSON_MATCH now supports exclusive predicates correctly. For instance, you can use predicates such as JSON_MATCH(person, '"$.addresses[*].country" != ''us''') to find all people who have at least one address that is not in the US.
jsonExtractIndex supports extracting Multi-Value JSON Fields, and also supports providing any default value when the key doesn't exist.
Added isJson UDF which increases your options to handle invalid JSONs. This can be used in queries and for filtering invalid json column values in ingestion.
Fix ArrayIndexOutOfBoundsException in jsonExtractIndex.
Improved Segment Build Time for Lucene Text Index by 40-60%. This improvement is realized when a consuming segment commits and changes to an ImmutableSegment. This significantly helps in lowering ingestion lag at commit time due to a large text index.
Phrase Search can run 3x faster when the Lucene Index Config enablePrefixSuffixMatchingInPhraseQueries is set to true. This is achieved by rewriting the phrase search query to a wildcard and prefix matching query.
Fixed bug in TextMatchFilterOptimizer that was not applying precedence to the filter expressions properly, which could lead to incorrect results.
Fixed bug in handling NOT text_match which could have returned incorrect results.
Added SchemaConformingTransformerV2 to enhance text search abilities.
Added metrics to track Lucene NRT Refresh Delay.
Switched to NRTCachingDirectory for Realtime segments and prevented duplicates in the Realtime Lucene Index to avoid IndexOutOfBounds query-time exceptions.
Lucene Version is upgraded to 9.11.1.
This can reduce the heap usage of a dictionary-encoded byte column for certain distributions of duplicate values.
prefixes, suffixes and uniqueNgrams UDFs for generating all respective string subsequences from a string input.
Added isJson UDF which increases your options to handle invalid JSONs. This can be used in queries and for filtering invalid json column values in ingestion.
splitPart UDF has minor improvements.
CLP is a compressed log processor which has a very high compression ratio for certain log types.
Enable segment preloading at partition level.
Use Temurin instead of AdoptOpenJdk
Adding record reader config/context param to record transformer
Removing legacy commons-lang dependency
12508: Feature add segment rows flush config
ADSS Race Condition and update to client error codes
Add ExceptionMapper to convert Exception to Response Object for Broker REST API's
Add FunnelMaxStepAggregationFunction and FunnelCompleteCountAggregationFunction
Add GZIP Compression Codec (#11434)
Add PodDisruptionBudgets to the Pinot Helm chart
Add Postgres compliant name aliasing for String Functions.
Add SchemaConformingTransformerV2 to enhance text search abilities
Add a benchmark to measure multi-stage block serde cost
Add a plan version field to QueryRequest Protobuf Message
Add a post-validator visitor that verifies there are no cast to bytes
Add a safe version of CLStaticHttpHandler that disallows path traversal.
Add ability to track filtered messages offset
Add back 'numRowsResultSet' to BrokerResponse, and retain it when result table id hidden
Add back profile for shade
Add back some exclude deps from hadoop-mapreduce-client-core
Add backward compatibility regression test suite for multi-stage query engine
Add base class for custom object accumulator
Add clickstream example table for funnel analysis
Add config option for timezone
Add config to skip record ingestion on string column length exceeding configured max schema length
Add controller API to get allLiveInstances
Add isJson UDF
Add list of collaborators to asf.yaml
Add locking logic to get consistent table view for upsert tables
Add metric to track number of segments missed in upsert-snapshot
Add metrics for SEGMENTS_WITH_LESS_REPLICAS monitoring
Add mode to allow adding dummy events for non-matching steps
Add offset based lag metrics
Add protobuf codegen decoder
Add retry policy to wait for job id to persist during rebalancing
Add round-robin logic during downloadSegmentFromPeer
Add schema as input to the decoder.
Add splitPartWithLimit and splitPartFromEnd UDFs
Add support for creating raw derived columns during segment reload
Add support for raw JSON filter predicates
Add the possibility of configuring ForwardIndexes with compressionCodec
Add upsert-snapshot timer metric
Add validation check for forward index disabled if it's a REALTIME table
Added PR compatibility test against release 1.1.0
Added kafka partition number to metadata.
Added pinot-error-code header in query response
Added tests for additional data types in SegmentPreProcessorTest.java
Adding a cluster config to enable instance pool and replica group configuration in table config
Adding batch api support for WindowFunction
Adding bytes string data type integration tests
Adding registerExtraComponents to allow registering additional components in various services
Adding support of insecure TLS
Adding support to insecure TLS when creating SSLFactory
Adds AGGREGATE_CASE_TO_FILTER rule
Adds per-column, query-time index skip option
Allow Aggregations in Case Expressions
Allow PinotHelixResourceManager subclasses to be used in the controller starter by providing an overridable PinotHelixResourceManager object creator function
Allow RequestContext to consider http-headers case-insensitivity
Allow Server throttling just before executing queries on server to allow max CPU and disk utilization
Allow all raw index config in star-tree index
Allow apply both environment variables and system properties to user and table configs, Environment variables take precedence over system properties
Allow configurable queryWorkerThreads in Pinot server side GrpcQueryServer
Allow dynamically setting the log level even for loggers that aren't already explicitly configured
Allow passing custom record reader to be inited/closed in SegmentProcessorFramework
Allow passing database context through database http header
Allow stop to interrupt the consumer thread and safely release the resource
Allow user configurable regex library for queries
Allow using 'serverReturnFinalResult' to optimize server partitioned table
Assign default value to newly added derived column upon reload
Avoid port conflict in integration tests
Better handling of null tableNames
CLP as a compressionCodec
Change helm app version to 1.0.0 for Apache Pinot latest release version
Clean Google Dependencies
Clean up BrokerRequestHandler and BrokerResponse
Clean up arbitrary sleep in GrpcBrokerClusterIntegrationTest
Cleaning up vector index comments and exceptions
Cleanup HTTP components dependencies and upgrade Thrift
Cleanup Javax and Jakarta dependencies
Cleanup deprecated query options
Cleanup the consumer interfaces and legacy code
Cleanup unnecessary dependencies under pinot-s3
Cleanup unused aggregate internal hint
Consistency in API response for live broker
Consolidate bouncycastle libraries
Consolidate nimbus-jose-jwt version to 9.37.3
ControllerRequestClient accepts headers. Useful for authN tests
Custom configuration property reader for segment metadata files
Delete database API
Deprecate PinotHelixResourceManager#getAllTables() in favour of getAllTables(String databaseName)
Detect expired messages in Kafka. Log and set a gauge.
Do not hard code resource class in BaseClusterIntegrationTest
Do not pause ingestion when upsert snapshot flow errors out
Don't drop original field during flatten
Don't enforce -realTimeInstanceCount and -offlineInstanceCount options when creating broker tenants
Egalpin/skip indexes minor changes
Emit Metrics for Broker Adaptive Server Selector type
Emit table size related metrics only in lead controller
Enable complexType handling in SegmentProcessFramework
Enable more integration tests to run on the v2 multi-stage query engine
Enabling avroParquet to read Int96 as bytes
Enhance Kinesis consumer
Enhance Parquet Test
Enhance ProtoSerializationUtils to handle class move
Enhance Pulsar consumer
Enhance PulsarConsumerTest
Enhance commit threshold to accept size threshold without setting rows to 0
Enhance json index to support regexp and range predicate evaluation
Enhancement: Sketch value aggregator performance
Ensure FieldConfig.getEncodingType() is never null
Ensure all the lists used in PinotQuery are ArrayList
Ensure brokerId and requestId are always set in BrokerResponse
Enter segment preloading at partition level
Exclude dimensions from star-tree index stored type check
Expose more helper API in TableDataManager
Extend compatibility verifier operation timeout from 1m to 2m to reduce flakiness
Extract json individual array elements from json index for the transform function jsonExtractIndex
Fetch query quota capacity utilization rate metric in a callback function
First with time
GitHub Actions checkout v4
Gzip compression, ensure uncompressed size can be calculated from compressed buffer
Handle errors gracefully during multi-stage stats collection in the broker
Handle shaded classes in all methods of kafka factory
Hash Function for UUID Primary Keys
Ignore case when checking for Direct Memory OOM
Improve Retention Manager Segment Lineage Clean Up
Improve error message for max rows in join limit breach
Improve exception logging when we fail to index / transform message
Improve logging in range index handler for index updates
Improve upsert compaction threshold validations
Improve warn logs for requesting validDocID snapshots
Improved metrics for server grpc query
Improved null check for varargs
Improved segment build time for Lucene text index realtime to offline conversion
In ClusterTest, make start port higher to avoid potential conflict with Kafka
Introduce PinotLogicalAggregate and remove internal hint
Introduce retries while creating stream message decoder for more robustness
Isolate bad server configs during broker startup phase
Issue #12367
Json extract index filter support
Json extract index mv
Keep get tables API with and without database
Lint failure
Logging a warn message instead of throwing exception
Made the error message around dimension table size clearer
Make Helix state transition handling idempotent
Make KafkaConsumerFactory method less restrictive to avoid incompatibility
Make task manager APIs database aware
Metric for count of tables configured with various tier backends
Metric for upsert tables count
Metrics for Realtime Rows Fetched and Stream Consumer Create Exceptions
Minmaxrange null
Modify consumingSegmentsInfo endpoint to indicate how many servers failed
Move offset validation logic to consumer classes
Move package org.apache.calcite to org.apache.pinot.calcite
Move resolveComparisonTies from addOrReplaceSegment to base class
Move some mispositioned tests under pinot-core
Move wildfly-openssl dependency management to root pom
Moving deleteSegment call from POST to DELETE call
Optimize unnecessary extra array allocation and conversion for raw derived column during segment reload
Pass explicit TypeRef when evaluating MV jsonPath
Percentile operations supporting null
Prepare for next development iteration
Propagate Disable User Agent Config to Http Client
Properly handle complex type transformer in segment processor framework
Properly return response if SegmentCompletion is aborted
Publish helm 0.2.8
Publish helm 0.2.9
Pull janino dependency to root pom
Pull pulsar version definition into root POM
Query response opt
Re-enable the Spotless plugin for Java 21
Readme - How to setup Pinot UI for development
Record enricher
Refactor PinotTaskManager class
Refactored CommonsConfigurationUtils for loading properties configuration.
Refactored compatibility-verifier module
Refactoring removeSegment flow in upsert
Refine PeerServerSegmentFinder
Refine SegmentFetcherFactory
Replace custom fmpp plugin with fmpp-maven-plugin
Reposition query submission spot for adaptive server selection
Reset controller port when stopping the controller in ControllerTest
Rest Endpoint to Create ZNode
Return clear error message when no common broker found for multi-stage query with tables from different tenants
Returning table names failing authorization in Exception of Multi Stage Engine Queries
Revert "Adding record reader config/context param to record transformer (#12520)"
Revert "Using local copy of segment instead of downloading from remote (#12863)"
Short circuit SubPlanFragmenter because we don't support multiple sub-plans yet
Simplify Google dependencies by importing BOM
Specify version for commons-validator
Support NOT in StarTree Index
Support empty strings as json nodes
Supporting human-readable format when configuring broker response size
Use ArrayList instead of LinkedList in SortOperator
Use a two server setup for multi-stage query engine backward compatibility regression test suite
Use more efficient variants of URLEncoder::encode and URLDecoder::decode
Use parameterized log messages instead of string concatenation
Use separate action for /tasks/scheduler/jobDetails API
Use try-with-resources to close file walk stream in LocalPinotFS
Using local copy of segment instead of downloading from remote
[Adaptive Server Selector] Add metrics for Stats Manager Queue Size
[Cleanup] Move classes in pinot-common to the correct package
[Feature] Add Support for SQL Formatting in Query Editor
[HELM]: Added additional probes options and startup probe.
[HELM]: Added checksum config annotation in stateful set for broker, controller and server
[HELM]: Added namespace support in K8s deployment.
[HELM]: zookeeper chart upgrade to version 13.2.0
[Minor] Add Nullable annotation to HttpHeaders in BrokerRequestHandler
[Minor] Small refactor of raw index creator constructor to be more clear
[Multi-stage] Clean up RelNode to Operator handling
[null-aggr] Add null handling support in mode aggregation
[partial-upsert] configure early release of _partitionGroupConsumerSemaphore in RealtimeSegmentDataManager
[spark-connector] Add option to fail read when there are invalid segments
add Netty arm64 dependencies
add Netty unit test
add SegmentContext to collect validDocIds bitmaps for many segments together
add skipUnavailableServers query option
add insecure mode when Pinot uses TLS connections
add instrumentation to json index getMatchingFlattenedDocsMap()
add jmx to prometheus metric exporting rule for realtimeRowsFiltered
add metrics for IdealState update
add some metrics for upsert table preloading
add some tests on jsonPathString
add test cases in RequestUtilsTest
add unit test for JsonAsyncHttpPinotClientTransport
add unit test for QueryServer
add unit test for ServerChannels
add unit test for StringFunctions encodeUrl
add unit tests for pinot-jdbc-client
add url assertion to SegmentCompletionProtocolTest
adjust the llc partition consuming metric reporting logic
allow passing null http headers object to translateTableName
allow to set segment when use SegmentProcessorFramework
auto renew jvm default sslcontext when it's loaded from files
avoid useless intermediate byte array allocation for VarChunkV4Reader's getStringMV
aws sdk 2.25.3
build-helper-maven-plugin 3.5.0
cache ssl contexts and reuse them
clean up jetbrain nullable annotation
cleanup: maven no transfer progress
close JDBC connections
dropwizard metrics 4.2.25
dynamic chunk sizing for v4 raw forward index
enable Netty leak detection
enable parallel Maven in pinot linter script
ensure inverse And/OrFilterOperator implementations match the query
exclude .mvn directory from source assembly
extend CompactedPinotSegmentRecordReader so that it can skip deleteRecord
get startTime outside the executor task to avoid flaky time checks
handle absent segments so that catchup checker doesn't get stuck on them
handle overflow for MutableOffHeapByteArrayStore buffer starting size
handle segments not tracked by partition mgr and add skipUpsertView query option
handle table name translation on missed api resources
hash4j version upgrade to 0.17.0
including the underlying exception in the logging output
int96 parity with native parquet reader
jsonExtractIndex support array of default values
log the log rate limiter rate for dropped broker logs
make http listener ssl config swappable
maven: no transfer progress
missed to delete the temp dir
move shouldReplaceOnComparisonTie to base class to be more reusable
reduce Java enum .values() usage in TimerContext
reduce logging for SpecialValueTransformer
reduce regex pattern compilation in Pinot jdbc
refactor TlsUtils class
refine when to registerSegment while doing addSegment and replaceSegment for upsert tables for better data consistency
reformat AdminConsoleIntegrationTest.java
reformat ClusterTest.java
release segment mgrs more reliably
replaced getServer with getServers
report rebalance job status for the early returns like noops
require noDictionaryColumns with aggregationConfigs
share the same table config object
track segments for snapshotting even if they lost all comparisons
untrack the segment out of TTL
update ControllerJobType from enum to string
update RewriterConstants so that expr min max would not collide with columns start with "parent"
update access control check error handling to catch throwable and log errors
Use gte(lte) to replace between() which has a bug
Fix the ConcurrentModificationException for And/Or DocIdSet
Upgrade RoaringBitmap to 1.0.5 to pick up the fix for RangeBitmap.between()
bugfix: do not move src ByteBuffer position for LZ4 length prefixed decompress
Bug Fix createDictionaryForColumn does not take into account inverted index
fix Cluster Manager error
fix for quick start Cluster Manager issue
Adding config for having suffix for client ID for realtime consumer
Addressed comments and fixed tests from pull request 12389. /uptime and /start-time endpoints working all components
Bugfix. Added missing paramName
Bug fix: Do not ignore scheme property
Bug fix: Handle missing shade config overwrites for Kafka
BugFix: Fix merge result from more than one server
Bugfix. Allow tenant rebalance with downtime as true
Bugfix. Avoid passing null table name input to translation util
Bugfix. Correct wrong method call from scheduleTask() to scheduleTaskForDatabase()
Bugfix. Maintain literal data type during function evaluation
Cleanup: Fix grammar in error message, also improve readability.
Fix Bug in Handling Equal Comparison Column Values in Upsert
Fix ColumnMinMaxValueGenerator
Fix JavaEE related dependencies
Fix Logging Location for CPU-Based Query Killing
Fix PulsarUtils to not share buffer
Fix URI construction so that AddSchema command line tool works when override flag is set to true
Fix [Type]ArrayList elements() method usage
Fix a typo when calculating query freshness
Fix an overflow in PinotDataBuffer.readFrom
Fix bug in logging in UpsertCompaction task
Fix bug to return validDocIDsMetadata from all servers
Fix connection issues if using JDBC and Hikari (#12267)
Fix controller host / port / protocol CLI option description for admin commands
Fix environment variables not applied when creating table
Fix error message for insufficient number of untagged brokers during tenant creation
Fix few metric rules which were affected by the database prefix handling
Fix file handle leaks in Pinot Driver (apache#12263)
Fix flakiness of ControllerPeriodicTasksIntegrationTest
Fix issue with startree index metadata loading for columns with '__' in name
Fix metric rule pattern regex
Fix pinot-parquet NoClassFound issue
Fix segment size check in OfflineClusterIntegrationTest
Fix some resource leak in tests
Fix the NPE from IS update metrics
Fix the NPE when metadataTTL is enabled without delete column
Fix the ServletConfig loading issue with swagger.
Fix the issue that map flatten shouldn't remove the map field from the record
Fix the race condition for H3InclusionIndexFilterOperator
Fix the time segment pruner on TIMESTAMP data type
Fix time stats in SegmentIndexCreationDriverImpl
Fixed infer logical type name from avro union schema
Fixing instance type to resolve and
Helm: bug fix for chart rendering issue.
Try to amend kafka common package with pinot shaded package prefix
Update getValidDocIdsMetadataFromServer to make call in batches to servers and other bug fixes
Upgrade com.microsoft.azure:msal4j from 1.3.5 to 1.3.10 for CVE fixing
[bugfix] Handling null value for kafka client id suffix
bugfix: fixing jdbc client sql feature not supported exception
bugfix: re-add support for not text_match
bugfix: reduce enum array allocation in QueryLogger
bugfix: use consumerDir during lucene realtime segment conversion
cleanup: fix apache rat violation
fix GuavaRateLimiter acquire method
fix fieldsToRead class not in decoder
fix flakey test, avoid early finalization
fix merging null multi value in partial upsert
fix race condition in ScalingThreadPoolExecutor
fix shared buffer, tests
fix(build): update node version to 16
fixing CVE critical issues by resolving kerby/jline and wildfly libraries
fixing pinot-adls high severity CVEs
fixing swagger setup using localhost as host name
swagger-ui upgrade to 5.15.0 Fixes
upgrade jettison version to fix CVE