LogoLogo
release-0.12.1
release-0.12.1
  • Introduction
  • Basics
    • Concepts
    • Architecture
    • Components
      • Cluster
      • Controller
      • Broker
      • Server
      • Minion
      • Tenant
      • Schema
      • Table
      • Segment
      • Deep Store
      • Pinot Data Explorer
    • Getting Started
      • Running Pinot locally
      • Running Pinot in Docker
      • Quick Start Examples
      • Running in Kubernetes
      • Running on public clouds
        • Running on Azure
        • Running on GCP
        • Running on AWS
      • Batch import example
      • Stream ingestion example
      • HDFS as Deep Storage
      • Troubleshooting Pinot
      • Frequently Asked Questions (FAQs)
        • General
        • Pinot On Kubernetes FAQ
        • Ingestion FAQ
        • Query FAQ
        • Operations FAQ
    • Import Data
      • From Query Console
      • Batch Ingestion
        • Spark
        • Flink
        • Hadoop
        • Backfill Data
        • Dimension Table
      • Stream ingestion
        • Apache Kafka
        • Amazon Kinesis
        • Apache Pulsar
      • Stream Ingestion with Upsert
      • Stream Ingestion with Dedup
      • Stream Ingestion with CLP
      • File Systems
        • Amazon S3
        • Azure Data Lake Storage
        • HDFS
        • Google Cloud Storage
      • Input formats
      • Complex Type (Array, Map) Handling
    • Indexing
      • Forward Index
      • Inverted Index
      • Star-Tree Index
      • Bloom Filter
      • Range Index
      • Native Text Index
      • Text search support
      • JSON Index
      • Geospatial
      • Timestamp Index
    • Releases
      • 0.12.0
      • 0.11.0
      • 0.10.0
      • 0.9.3
      • 0.9.2
      • 0.9.1
      • 0.9.0
      • 0.8.0
      • 0.7.1
      • 0.6.0
      • 0.5.0
      • 0.4.0
      • 0.3.0
      • 0.2.0
      • 0.1.0
    • Recipes
      • GitHub Events Stream
  • For Users
    • Query
      • Querying Pinot
      • Aggregation Functions
      • Transformation Functions
      • User-Defined Functions (UDFs)
      • Grouping Algorithm
      • Query Options
      • Cardinality Estimation
      • Lookup UDF Join
      • Querying JSON data
      • Filtering with IdSet
      • Explain Plan
      • GapFill Function For Time-Series Dataset
    • APIs
      • Broker Query API
        • Query Response Format
      • Controller Admin API
      • Controller API Reference
    • External Clients
      • JDBC
      • Java
      • Python
      • Golang
    • Tutorials
      • Use OSS as Deep Storage for Pinot
      • Ingest Parquet Files from S3 Using Spark
      • Creating Pinot Segments
      • Use S3 as Deep Storage for Pinot
      • Use S3 and Pinot in Docker
      • Batch Data Ingestion In Practice
      • Schema Evolution
  • For Developers
    • Basics
      • Extending Pinot
        • Writing Custom Aggregation Function
        • Segment Fetchers
      • Contribution Guidelines
      • Code Setup
      • Code Modules and Organization
      • Update Documentation
    • Advanced
      • Data Ingestion Overview
      • Ingestion Aggregations
      • Ingestion Transformations
      • Null Value Support
      • Multi-Stage Query Engine
      • Advanced Pinot Setup
    • Plugins
      • Write Custom Plugins
        • Input Format Plugin
        • Filesystem Plugin
        • Batch Segment Fetcher Plugin
        • Stream Ingestion Plugin
    • Design Documents
      • Segment Writer API
  • For Operators
    • Deployment and Monitoring
      • Setup cluster
      • Server Startup Status Checkers
      • Setup table
      • Setup ingestion
      • Decoupling Controller from the Data Path
      • Segment Assignment
      • Instance Assignment
      • Rebalance
        • Rebalance Servers
        • Rebalance Brokers
      • Separating data storage by age
        • Using multiple tenants
        • Using multiple directories
      • Pinot managed Offline flows
      • Minion merge rollup task
      • Consistent Push and Rollback
      • Access Control
      • Monitoring
      • Tuning
        • Realtime
        • Routing
        • Query Routing using Adaptive Server Selection
        • Query Scheduling
      • Upgrading Pinot with confidence
      • Managing Logs
      • OOM Protection Using Automatic Query Killing
    • Command-Line Interface (CLI)
    • Configuration Recommendation Engine
    • Tutorials
      • Authentication, Authorization, and ACLs
      • Configuring TLS/SSL
      • Build Docker Images
      • Running Pinot in Production
      • Kubernetes Deployment
      • Amazon EKS (Kafka)
      • Amazon MSK (Kafka)
      • Monitor Pinot using Prometheus and Grafana
      • Performance Optimization Configurations
  • Configuration Reference
    • Cluster
    • Controller
    • Broker
    • Server
    • Table
    • Schema
    • Ingestion Job Spec
    • Monitoring Metrics
    • Functions
      • ABS
      • ADD
      • ago
      • arrayConcatDouble
      • arrayConcatFloat
      • arrayConcatInt
      • arrayConcatLong
      • arrayConcatString
      • arrayContainsInt
      • arrayContainsString
      • arrayDistinctInt
      • arrayDistinctString
      • arrayIndexOfInt
      • arrayIndexOfString
      • ARRAYLENGTH
      • arrayRemoveInt
      • arrayRemoveString
      • arrayReverseInt
      • arrayReverseString
      • arraySliceInt
      • arraySliceString
      • arraySortInt
      • arraySortString
      • arrayUnionInt
      • arrayUnionString
      • AVGMV
      • Base64
      • ceil
      • CHR
      • codepoint
      • concat
      • count
      • COUNTMV
      • COVAR_POP
      • COVAR_SAMP
      • day
      • dayOfWeek
      • dayOfYear
      • DISTINCT
      • DISTINCTAVG
      • DISTINCTAVGMV
      • DISTINCTCOUNT
      • DISTINCTCOUNTBITMAP
      • DISTINCTCOUNTHLLMV
      • DISTINCTCOUNTHLL
      • DISTINCTCOUNTBITMAPMV
      • DISTINCTCOUNTMV
      • DISTINCTCOUNTRAWHLL
      • DISTINCTCOUNTRAWHLLMV
      • DISTINCTCOUNTRAWTHETASKETCH
      • DISTINCTCOUNTTHETASKETCH
      • DISTINCTSUM
      • DISTINCTSUMMV
      • DIV
      • DATETIMECONVERT
      • DATETRUNC
      • exp
      • FLOOR
      • FromDateTime
      • FromEpoch
      • FromEpochBucket
      • Histogram
      • hour
      • isSubnetOf
      • JSONFORMAT
      • JSONPATH
      • JSONPATHARRAY
      • JSONPATHARRAYDEFAULTEMPTY
      • JSONPATHDOUBLE
      • JSONPATHLONG
      • JSONPATHSTRING
      • jsonextractkey
      • jsonextractscalar
      • length
      • ln
      • lower
      • lpad
      • ltrim
      • max
      • MAXMV
      • MD5
      • millisecond
      • min
      • minmaxrange
      • MINMAXRANGEMV
      • MINMV
      • minute
      • MOD
      • mode
      • month
      • mult
      • now
      • percentile
      • percentileest
      • percentileestmv
      • percentilemv
      • percentiletdigest
      • percentiletdigestmv
      • quarter
      • regexpExtract
      • regexpReplace
      • remove
      • replace
      • reverse
      • round
      • rpad
      • rtrim
      • second
      • SEGMENTPARTITIONEDDISTINCTCOUNT
      • sha
      • sha256
      • sha512
      • sqrt
      • startswith
      • ST_AsBinary
      • ST_AsText
      • ST_Contains
      • ST_Distance
      • ST_GeogFromText
      • ST_GeogFromWKB
      • ST_GeometryType
      • ST_GeomFromText
      • ST_GeomFromWKB
      • STPOINT
      • ST_Polygon
      • strpos
      • ST_Union
      • SUB
      • substr
      • sum
      • summv
      • TIMECONVERT
      • timezoneHour
      • timezoneMinute
      • ToDateTime
      • ToEpoch
      • ToEpochBucket
      • ToEpochRounded
      • TOJSONMAPSTR
      • toGeometry
      • toSphericalGeography
      • trim
      • upper
      • Url
      • UTF8
      • VALUEIN
      • week
      • year
      • yearOfWeek
    • Plugin Reference
      • Stream Ingestion Connectors
      • VAR_POP
      • VAR_SAMP
      • STDDEV_POP
      • STDDEV_SAMP
  • RESOURCES
    • Community
    • Team
    • Blogs
    • Presentations
    • Videos
  • Integrations
    • Tableau
    • Trino
    • ThirdEye
    • Superset
    • Presto
Powered by GitBook
On this page
  • Summary
  • Support Segment Merge and Roll-up
  • UI Improvement
  • SQL Improvements
  • Performance Improvements
  • Other Notable New Features and Changes

Was this helpful?

Export as PDF
  1. Basics
  2. Releases

0.9.0

Summary

This release introduces a new features: Segment Merge and Rollup to simplify users day to day operational work. A new metrics plugin is added to support dropwizard. As usual, new functionalities and many UI/ Performance improvements.

The release was cut from the following commit: 13c9ee9 and the following cherry-picks: 668b5e0, ee887b9

Support Segment Merge and Roll-up

LinkedIn operates a large multi-tenant cluster that serves a business metrics dashboard, and noticed that their tables consisted of millions of small segments. This was leading to slow operations in Helix/Zookeeper, long running queries due to having too many tasks to process, as well as using more space because of a lack of compression.

To solve this problem they added the Segment Merge task, which compresses segments based on timestamps and rolls up/aggregates older data. The task can be run on a schedule or triggered manually via the Pinot REST API.

At the moment this feature is only available for offline tables, but will be added for real-time tables in a future release.

Major Changes:

  • Integrate enhanced SegmentProcessorFramework into MergeRollupTaskExecutor (#7180)

  • Merge/Rollup task scheduler for offline tables. (#7178)

  • Fix MergeRollupTask uploading segments not updating their metadata (#7289)

  • MergeRollupTask integration tests (#7283)

  • Add mergeRollupTask delay metrics (#7368)

  • MergeRollupTaskGenerator enhancement: enable parallel buckets scheduling (#7481)

  • Use maxEndTimeMs for merge/roll-up delay metrics. (#7617)

UI Improvement

This release also sees improvements to Pinot’s query console UI.

  • Cmd+Enter shortcut to run query in query console (#7359)

  • Showing tooltip in SQL Editor (#7387)

  • Make the SQL Editor box expandable (#7381)

  • Fix tables ordering by number of segments (#7564)

SQL Improvements

There have also been improvements and additions to Pinot’s SQL implementation.

New functions:

  • IN (#7542)

  • LASTWITHTIME (#7584)

  • ID_SET on MV columns (#7355)

  • Raw results for Percentile TDigest and Est (#7226),

  • Add timezone as argument in function toDateTime (#7552)

New predicates are supported:

  • LIKE(#7214)

  • REGEXP_EXTRACT(#7114)

  • FILTER(#7566)

Query compatibility improvements:

  • Infer data type for Literal (#7332)

  • Support logical identifier in predicate (#7347)

  • Support JSON queries with top-level array path expression. (#7511)

  • Support configurable group by trim size to improve results accuracy (#7241)

Performance Improvements

This release contains many performance improvement, you may sense it for you day to day queries. Thanks to all the great contributions listed below:

  • Reduce the disk usage for segment conversion task (#7193)

  • Simplify association between Java Class and PinotDataType for faster mapping (#7402)

  • Avoid creating stateless ParseContextImpl once per jsonpath evaluation, avoid varargs allocation (#7412)

  • Replace MINUS with STRCMP (#7394)

  • Bit-sliced range index for int, long, float, double, dictionarized SV columns (#7454)

  • Use MethodHandle to access vectorized unsigned comparison on JDK9+ (#7487)

  • Add option to limit thread usage per query (#7492)

  • Improved range queries (#7513)

  • Faster bitmap scans (#7530)

  • Optimize EmptySegmentPruner to skip pruning when there is no empty segments (#7531)

  • Map bitmaps through a bounded window to avoid excessive disk pressure (#7535)

  • Allow RLE compression of bitmaps for smaller file sizes (#7582)

  • Support raw index properties for columns with JSON and RANGE indexes (#7615)

  • Enhance BloomFilter rule to include IN predicate(#7444) (#7624)

  • Introduce LZ4_WITH_LENGTH chunk compression type (#7655)

  • Enhance ColumnValueSegmentPruner and support bloom filter prefetch (#7654)

  • Apply the optimization on dictIds within the segment to DistinctCountHLL aggregation func (#7630)

  • During segment pruning, release the bloom filter after each segment is processed (#7668)

  • Fix JSONPath cache inefficient issue (#7409)

  • Optimize getUnpaddedString with SWAR padding search (#7708)

  • Lighter weight LiteralTransformFunction, avoid excessive array fills (#7707)

  • Inline binary comparison ops to prevent function call overhead (#7709)

  • Memoize literals in query context in order to deduplicate them (#7720)

Other Notable New Features and Changes

  • Human Readable Controller Configs (#7173)

  • Add the support of geoToH3 function (#7182)

  • Add Apache Pulsar as Pinot Plugin (#7223) (#7247)

  • Add dropwizard metrics plugin (#7263)

  • Introduce OR Predicate Execution On Star Tree Index (#7184)

  • Allow to extract values from array of objects with jsonPathArray (#7208)

  • Add Realtime table metadata and indexes API. (#7169)

  • Support array with mixing data types (#7234)

  • Support force download segment in reload API (#7249)

  • Show uncompressed znRecord from zk api (#7304)

  • Add debug endpoint to get minion task status. (#7300)

  • Validate CSV Header For Configured Delimiter (#7237)

  • Add auth tokens and user/password support to ingestion job command (#7233)

  • Add option to store the hash of the upsert primary key (#7246)

  • Add null support for time column (#7269)

  • Add mode aggregation function (#7318)

  • Support disable swagger in Pinot servers (#7341)

  • Delete metadata properly on table deletion (#7329)

  • Add basic Obfuscator Support (#7407)

  • Add AWS sts dependency to enable auth using web identity token. (#7017)(#7445)

  • Mask credentials in debug endpoint /appconfigs (#7452)

  • Fix /sql query endpoint now compatible with auth (#7230)

  • Fix case sensitive issue in BasicAuthPrincipal permission check (#7354)

  • Fix auth token injection in SegmentGenerationAndPushTaskExecutor (#7464)

  • Add segmentNameGeneratorType config to IndexingConfig (#7346)

  • Support trigger PeriodicTask manually (#7174)

  • Add endpoint to check minion task status for a single task. (#7353)

  • Showing partial status of segment and counting CONSUMING state as good segment status (#7327)

  • Add "num rows in segments" and "num segments queried per host" to the output of Realtime Provisioning Rule (#7282)

  • Check schema backward-compatibility when updating schema through addSchema with override (#7374)

  • Optimize IndexedTable (#7373)

  • Support indices remove in V3 segment format (#7301)

  • Optimize TableResizer (#7392)

  • Introduce resultSize in IndexedTable (#7420)

  • Offset based realtime consumption status checker (#7267)

  • Add causes to stack trace return (#7460)

  • Create controller resource packages config key (#7488)

  • Enhance TableCache to support schema name different from table name (#7525)

  • Add validation for realtimeToOffline task (#7523)

  • Unify CombineOperator multi-threading logic (#7450)

  • Support no downtime rebalance for table with 1 replica in TableRebalancer (#7532)

  • Introduce MinionConf, move END_REPLACE_SEGMENTS_TIMEOUT_MS to minion config instead of task config. (#7516)

  • Adjust tuner api (#7553)

  • Adding config for metrics library (#7551)

  • Add geo type conversion scalar functions (#7573)

  • Add BOOLEAN_ARRAY and TIMESTAMP_ARRAY types (#7581)

  • Add MV raw forward index and MV BYTES data type (#7595)

  • Enhance TableRebalancer to offload the segments from most loaded instances first (#7574)

  • Improve get tenant API to differentiate offline and realtime tenants (#7548)

  • Refactor query rewriter to interfaces and implementations to allow customization (#7576)

  • In ServiceStartable, apply global cluster config in ZK to instance config (#7593)

  • Make dimension tables creation bypass tenant validation (#7559)

  • Allow Metadata and Dictionary Based Plans for No Op Filters (#7563)

  • Reject query with identifiers not in schema (#7590)

  • Round Robin IP addresses when retry uploading/downloading segments (#7585)

  • Support multi-value derived column in offline table reload (#7632)

  • Support segmentNamePostfix in segment name (#7646)

  • Add select segments API (#7651)

  • Controller getTableInstance() call now returns the list of live brokers of a table. (#7556)

  • Allow MV Field Support For Raw Columns in Text Indices (#7638)

  • Allow override distinctCount to segmentPartitionedDistinctCount (#7664)

  • Add a quick start with both UPSERT and JSON index (#7669)

  • Add revertSegmentReplacement API (#7662)

  • Smooth segment reloading with non blocking semantic (#7675)

  • Clear the reused record in PartitionUpsertMetadataManager (#7676)

  • Replace args4j with picocli (#7665)

  • Handle datetime column consistently (#7645)(#7705)

  • Allow to carry headers with query requests (#7696) (#7712)

  • Allow adding JSON data type for dimension column types (#7718)

  • Separate SegmentDirectoryLoader and tierBackend concepts (#7737)

  • Implement size balanced V4 raw chunk format (#7661)

  • Add presto-pinot-driver lib (#7384)

Major Bug fixes

  • Fix null pointer exception for non-existed metric columns in schema for JDBC driver (#7175)

  • Fix the config key for TASK_MANAGER_FREQUENCY_PERIOD (#7198)

  • Fixed pinot java client to add zkClient close (#7196)

  • Ignore query json parse errors (#7165)

  • Fix shutdown hook for PinotServiceManager (#7251) (#7253)

  • Make STRING to BOOLEAN data type change as backward compatible schema change (#7259)

  • Replace gcp hardcoded values with generic annotations (#6985)

  • Fix segment conversion executor for in-place conversion (#7265)

  • Fix reporting consuming rate when the Kafka partition level consumer isn't stopped (#7322)

  • Fix the issue with concurrent modification for segment lineage (#7343)

  • Fix TableNotFound error message in PinotHelixResourceManager (#7340)

  • Fix upload LLC segment endpoint truncated download URL (#7361)

  • Fix task scheduling on table update (#7362)

  • Fix metric method for ONLINE_MINION_INSTANCES metric (#7363)

  • Fix JsonToPinotSchema behavior to be consistent with AvroSchemaToPinotSchema (#7366)

  • Fix currentOffset volatility in consuming segment(#7365)

  • Fix misleading error msg for missing URI (#7367)

  • Fix the correctness of getColumnIndices method (#7370)

  • Fix SegmentZKMetadta time handling (#7375)

  • Fix retention for cleaning up segment lineage (#7424)

  • Fix segment generator to not return illegal filenames (#7085)

  • Fix missing LLC segments in segment store by adding controller periodic task to upload them (#6778)

  • Fix parsing error messages returned to FileUploadDownloadClient (#7428)

  • Fix manifest scan which drives /version endpoint (#7456)

  • Fix missing rate limiter if brokerResourceEV becomes null due to ZK connection (#7470)

  • Fix race conditions between segment merge/roll-up and purge (or convertToRawIndex) tasks: (#7427)

  • Fix pql double quote checker exception (#7485)

  • Fix minion metrics exporter config (#7496)

  • Fix segment unable to retry issue by catching timeout exception during segment replace (#7509)

  • Add Exception to Broker Response When Not All Segments Are Available (Partial Response) (#7397)

  • Fix segment generation commands (#7527)

  • Return non zero from main with exception (#7482)

  • Fix parquet plugin shading error (#7570)

  • Fix the lowest partition id is not 0 for LLC (#7066)

  • Fix star-tree index map when column name contains '.' (#7623)

  • Fix cluster manager URLs encoding issue(#7639)

  • Fix fieldConfig nullable validation (#7648)

  • Fix verifyHostname issue in FileUploadDownloadClient (#7703)

  • Fix TableCache schema to include the built-in virtual columns (#7706)

  • Fix DISTINCT with AS function (#7678)

  • Fix SDF pattern in DataPreprocessingHelper (#7721)

  • Fix fields missing issue in the source in ParquetNativeRecordReader (#7742)

Previous0.9.1Next0.8.0

Last updated 2 years ago

Was this helpful?