0.3.0

The 0.3.0 release of Apache Pinot introduces the concept of plugins, which makes it easy to extend Pinot and integrate it with other systems.

Last updated 2 years ago

What's the big change?

The architectural change between the previous release (0.2.0) and this release (0.3.0) was driven by the need to make Apache Pinot extensible. The 0.2.0 release was not flexible enough to support new storage types or new stream types: adding new functionality required changing too much code. The Pinot team therefore carried out an extensive refactoring of the source code.

For instance, the picture below shows the module dependencies in 0.2.x and earlier releases. Supporting a new storage type would have meant changing several modules. Pretty bad, huh?

To address this challenge, the following major changes were made:

  • Refactored common interfaces into the pinot-spi module

  • Settled on four types of modules:

    • Pinot input format: how to read records from various data/file formats, e.g. Avro/CSV/JSON/ORC/Parquet/Thrift

    • Pinot filesystem: how to operate on files in various filesystems, e.g. Azure Data Lake/Google Cloud Storage/S3/HDFS

    • Pinot stream ingestion: how to ingest data streams from various upstream systems, e.g. Kafka/Kinesis/Eventhub

    • Pinot batch ingestion: how to run Pinot batch ingestion jobs in various frameworks, e.g. Standalone/Hadoop/Spark

  • Built shaded jars for each individual plugin

  • Added support for dynamically loading Pinot plugins at server startup time

The architecture is now plug-and-play: new tools can be supported through small, simple extensions without touching large parts of the code base. Integrations with new streaming services and data formats can be developed much more simply and conveniently.
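
As a rough illustration of the startup-time plugin loading described above, the sketch below scans a plugins directory and groups the shaded jars it finds by the directory that holds them. The directory layout, the function name `discover_plugin_jars`, and any jar names used with it are assumptions made for this example; this is not Pinot's actual loader.

```python
import os
from collections import defaultdict


def discover_plugin_jars(plugins_dir):
    """Group every .jar under plugins_dir by the directory that contains it.

    Returns a dict mapping a plugin's path (relative to plugins_dir, with
    "/" separators) to a sorted list of jar file names in that directory.
    The layout <plugins_dir>/<type>/<plugin>/<jar> is an assumption.
    """
    found = defaultdict(list)
    for root, _dirs, files in os.walk(plugins_dir):
        for name in files:
            if name.endswith(".jar"):
                plugin = os.path.relpath(root, plugins_dir).replace(os.sep, "/")
                found[plugin].append(name)
    return {plugin: sorted(jars) for plugin, jars in found.items()}
```

A caller could use the returned mapping to decide which jars to hand to a per-plugin class loader, or to assemble a classpath entry for each plugin.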

Notable New Features

  • SQL Support

    • Added Calcite SQL compiler

    • Added SQL response format (#4694, #4877)

    • Added support for GROUP BY with ORDER BY (#4602)

    • Query console defaults to use SQL syntax (#4994)

    • Support column alias (#5016, #5033)

    • Added SQL query endpoint: /query/sql (#4964)

    • Support arithmetic operators (#5018)

    • Support non-literal expressions for right-side operand in predicate comparison (#5070)

  • Added support for DISTINCT (#4535)

  • Added support for a default value for BYTES columns (#4583)

  • JDK 11 Support

  • Added support to tune size vs accuracy for approximation aggregation functions: DistinctCountHLL, PercentileEst, PercentileTDigest (#4666)

  • Added Data Anonymizer Tool (#4747)

  • Deprecated pinot-hadoop and pinot-spark modules, replaced with pinot-batch-ingestion-hadoop and pinot-batch-ingestion-spark

  • Support STRING and BYTES for no dictionary columns in realtime consuming segments (#4791)

  • Make pinot-distribution build a pinot-all jar and assemble it (#4977)

  • Added support for case-insensitive PQL (#4983)

  • Enhanced TableRebalancer logic

    • Moved to new rebalance strategy (#4695)

    • Supported rebalancing tables under any condition (#4990)

    • Supported reassigning completed segments along with consuming segments for LLC realtime table (#5015)

  • Added experimental support for Text Search (#4993)

  • Upgraded Helix to version 0.9.4, task management now works as expected (#5020)

  • Added date_trunc transformation function (#4740)

  • Support schema evolution for consuming segment (#4954)

  • APIs Additions/Changes

    • Pinot Admin Command

      • Added -queryType option in PinotAdmin PostQuery subcommand (#4726)

      • Added -schemaFile as option in AddTable command (#4959)

      • Added OperateClusterConfig sub command in PinotAdmin (#5073)

    • Pinot Controller Rest APIs

      • Get Table leader controller resource (#4545)

      • Support HTTP POST/PUT to upload JSON encoded schema (#4639)

      • Table rebalance API now requires both table name and type as parameters (#4824)

      • Refactored Segments APIs (#4806)

      • Added segment batch deletion REST API (#4828)

      • Update schema API to reload table on schema change when applicable (#4838)

      • Enhance the task related REST APIs (#5054)

      • Added PinotClusterConfig REST APIs (#5073)

        • GET /cluster/configs

        • POST /cluster/configs

        • DELETE /cluster/configs/{configName}

  • Configurations Additions/Changes

    • Config: controller.host is now optional in Pinot Controller

    • Added instance config: queriesDisabled to disable query sending to a running server (#4767)

    • Added broker config: pinot.broker.enable.query.limit.override configurable max query response size (#5040)

    • Removed deprecated server configs (#4903)

      • pinot.server.starter.enableSegmentsLoadingCheck

      • pinot.server.starter.timeoutInSeconds

      • pinot.server.instance.enable.shutdown.delay

      • pinot.server.instance.starter.maxShutdownWaitTime

      • pinot.server.instance.starter.checkIntervalTime

    • Decouple server instance id with hostname/port config (#4995)

    • Add FieldConfig to encapsulate encoding, indexing info for a field (#5006)

Major Bug Fixes

  • Fixed the bug of releasing the segment when there are still threads working on it (#4764)

  • Fixed the bug of uneven task distribution for threads (#4793)

  • Fixed encryption for .tar.gz segment file upload (#4855)

  • Fixed controller rest API to download segment from non local FS (#4808)

  • Fixed the bug of not releasing segment lock if segment recovery throws exception (#4882)

  • Fixed the issue of server not registering state model factory before connecting the Helix manager (#4929)

  • Fixed the exception in server instance when Helix starts a new ZK session (#4976)

  • Fixed ThreadLocal DocIdSet issue in ExpressionFilterOperator (#5114)

  • Fixed the bug in default value provider classes (#5137)

  • Fixed the bug when no segment exists in RealtimeSegmentSelector (#5138)

Work in Progress

  • We are in the process of supporting text search query functionalities.

  • We are in the process of supporting null values (#4230); currently only limited query features are supported.

    • Added Presence Vector to represent null value (#4585)

    • Added null predicate support for leaf predicates (#4943)

Backward Incompatible Changes

  • Upgrading from version 0.1.0 directly to this release is disruptive because of protocol changes between Pinot Broker and Pinot Server. Please upgrade to release 0.2.0 first, then upgrade to this version.

  • If you build your own startable or WAR without using the scripts generated in the pinot-distribution module: for Java 8, an environment variable “plugins.dir” is required so that Pinot can find where to load the Pinot plugin jars from; for Java 11, the plugins directory must be explicitly added to the classpath. Please see pinot-admin.sh as an example.

  • As always, we recommend that you upgrade controllers first, then brokers, and lastly the servers in order to have zero downtime in production clusters.

  • Kafka 0.9 is no longer included in the release distribution.

  • Pull request #4806 introduces a backward incompatible API change for segments management.

    • Removed segment toggle APIs

    • Removed list all segments in cluster APIs

    • Deprecated the following APIs:

      • GET /tables/{tableName}/segments

      • GET /tables/{tableName}/segments/metadata

      • GET /tables/{tableName}/segments/crc

      • GET /tables/{tableName}/segments/{segmentName}

      • GET /tables/{tableName}/segments/{segmentName}/metadata

      • GET /tables/{tableName}/segments/{segmentName}/reload

      • POST /tables/{tableName}/segments/{segmentName}/reload

      • GET /tables/{tableName}/segments/reload

      • POST /tables/{tableName}/segments/reload

  • Pull request #5054 deprecated the following task-related APIs:

    • GET:

      • /tasks/taskqueues: List all task queues

      • /tasks/taskqueuestate/{taskType} -> /tasks/{taskType}/state

      • /tasks/tasks/{taskType} -> /tasks/{taskType}/tasks

      • /tasks/taskstates/{taskType} -> /tasks/{taskType}/taskstates

      • /tasks/taskstate/{taskName} -> /tasks/task/{taskName}/taskstate

      • /tasks/taskconfig/{taskName} -> /tasks/task/{taskName}/taskconfig

    • PUT:

      • /tasks/scheduletasks -> POST /tasks/schedule

      • /tasks/cleanuptasks/{taskType} -> /tasks/{taskType}/cleanup

      • /tasks/taskqueue/{taskType}: Toggle a task queue

    • DELETE:

      • /tasks/taskqueue/{taskType} -> /tasks/{taskType}

  • Deprecated modules pinot-hadoop and pinot-spark and replaced them with pinot-batch-ingestion-hadoop and pinot-batch-ingestion-spark.

  • Introduced new Pinot batch ingestion jobs and YAML-based job specs to define segment generation jobs and segment push jobs.

  • You may see exceptions like the one below in Pinot brokers during the cluster upgrade; they are safe to ignore.

    2020/03/09 23:37:19.879 ERROR [HelixTaskExecutor] [CallbackProcessor@b808af5-pinot] [pinot-broker] [] Message cannot be processed: 78816abe-5288-4f08-88c0-f8aa596114fe, {CREATE_TIMESTAMP=1583797034542, MSG_ID=78816abe-5288-4f08-88c0-f8aa596114fe, MSG_STATE=unprocessable, MSG_SUBTYPE=REFRESH_SEGMENT, MSG_TYPE=USER_DEFINE_MSG, PARTITION_NAME=fooBar_OFFLINE, RESOURCE_NAME=brokerResource, RETRY_COUNT=0, SRC_CLUSTER=pinot, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_hostname.domain,com_9000, TGT_NAME=Broker_hostname,domain.com_6998, TGT_SESSION_ID=f6e19a457b80db5, TIMEOUT=-1, segmentName=fooBar_559, tableName=fooBar_OFFLINE}{}{}
    java.lang.UnsupportedOperationException: Unsupported user defined message sub type: REFRESH_SEGMENT
          at org.apache.pinot.broker.broker.helix.TimeboundaryRefreshMessageHandlerFactory.createHandler(TimeboundaryRefreshMessageHandlerFactory.java:68) ~[pinot-broker-0.2.1172.jar:0.3.0-SNAPSHOT-c9d88e47e02d799dc334d7dd1446a38d9ce161a3]
          at org.apache.helix.messaging.handling.HelixTaskExecutor.createMessageHandler(HelixTaskExecutor.java:1096) ~[helix-core-0.9.1.509.jar:0.9.1.509]
          at org.apache.helix.messaging.handling.HelixTaskExecutor.onMessage(HelixTaskExecutor.java:866) [helix-core-0.9.1.509.jar:0.9.1.509]

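
For clients that still call the deprecated task endpoints, the "->" renames listed under Backward Incompatible Changes can be captured in a small lookup. The sketch below is only an illustration: the names `TASK_API_RENAMES` and `migrate_task_call` are made up for this example, and it assumes the HTTP method stays the same unless the list states otherwise (as it does for /tasks/scheduletasks). The two entries without a "->" replacement (/tasks/taskqueues and the PUT toggle on /tasks/taskqueue/{taskType}) are left out.

```python
import re

# (method, old path template) -> (method, new path template), copied from the
# "->" entries in the release notes. Unchanged methods are an assumption.
TASK_API_RENAMES = {
    ("GET", "/tasks/taskqueuestate/{taskType}"): ("GET", "/tasks/{taskType}/state"),
    ("GET", "/tasks/tasks/{taskType}"): ("GET", "/tasks/{taskType}/tasks"),
    ("GET", "/tasks/taskstates/{taskType}"): ("GET", "/tasks/{taskType}/taskstates"),
    ("GET", "/tasks/taskstate/{taskName}"): ("GET", "/tasks/task/{taskName}/taskstate"),
    ("GET", "/tasks/taskconfig/{taskName}"): ("GET", "/tasks/task/{taskName}/taskconfig"),
    ("PUT", "/tasks/scheduletasks"): ("POST", "/tasks/schedule"),
    ("PUT", "/tasks/cleanuptasks/{taskType}"): ("PUT", "/tasks/{taskType}/cleanup"),
    ("DELETE", "/tasks/taskqueue/{taskType}"): ("DELETE", "/tasks/{taskType}"),
}


def _template_to_regex(template):
    # Split on "{name}" placeholders; odd-indexed parts are placeholder names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        pattern += re.escape(part) if i % 2 == 0 else "(?P<%s>[^/]+)" % part
    return re.compile("^" + pattern + "$")


def migrate_task_call(method, path):
    """Return the (method, path) that replaces a deprecated task API call.

    Calls that match no rename are returned unchanged (method upper-cased).
    """
    method = method.upper()
    for (old_method, old_tmpl), (new_method, new_tmpl) in TASK_API_RENAMES.items():
        if method != old_method:
            continue
        match = _template_to_regex(old_tmpl).match(path)
        if match:
            return new_method, new_tmpl.format(**match.groupdict())
    return method, path
```

Keeping the renames in one table makes it easy to audit client code against the list and to drop the shim once all callers use the new paths.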
[Figure: 0.2.0 and before Pinot Module Dependency Diagram]

[Figure: Dependency graph after introducing pinot-plugin in 0.3.0]
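
The upgrade guidance in Backward Incompatible Changes (go through 0.2.0 when starting from 0.1.0, and upgrade controllers, then brokers, then servers) can be written down as a simple plan generator. This is a minimal sketch for planning purposes only; `RELEASES`, `ROLLING_ORDER`, and `upgrade_plan` are hypothetical names, and the sketch covers only the three releases mentioned in these notes.

```python
# Component order the notes recommend for a zero-downtime rolling upgrade.
ROLLING_ORDER = ["controller", "broker", "server"]

# Releases covered by this sketch, oldest first. Per the notes, a cluster on
# 0.1.0 must be upgraded to 0.2.0 before moving to 0.3.0.
RELEASES = ["0.1.0", "0.2.0", "0.3.0"]


def upgrade_plan(current, target="0.3.0"):
    """Return an ordered list of (version, component) upgrade steps."""
    start, end = RELEASES.index(current), RELEASES.index(target)
    if start > end:
        raise ValueError("downgrades are not covered by this sketch")
    steps = []
    # Visit each intermediate release in turn, never skipping one.
    for version in RELEASES[start + 1:end + 1]:
        for component in ROLLING_ORDER:
            steps.append((version, component))
    return steps
```

For a cluster already on 0.2.0 the plan has three steps; starting from 0.1.0 it has six, with the full controller, broker, server pass on 0.2.0 completed before any component moves to 0.3.0.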