0.3.0
The 0.3.0 release of Apache Pinot introduces the concept of plugins, which makes it easy to extend Pinot and integrate it with other systems.
The reason behind the architectural change between the previous release (0.2.0) and this release (0.3.0) is extensibility. The 0.2.0 release was not flexible enough to support new storage types or new stream types; adding new functionality required changing too much code. Thus, the Pinot team went through an extensive refactoring and improvement of the source code.
For instance, the picture below shows the module dependencies of the 0.2.X and earlier releases. If we wanted to support a new storage type, we would have had to change several modules. Pretty bad, huh?
0.2.0 and before Pinot Module Dependency Diagram
In order to conquer this challenge, the following major changes were made:
- Refactored common interfaces into the `pinot-spi` module
- Introduced four types of plugin modules:
  - Pinot input format: how to read records from various data/file formats, e.g. `Avro`/`CSV`/`JSON`/`ORC`/`Parquet`/`Thrift`
  - Pinot filesystem: how to operate on files on various filesystems, e.g. `Azure Data Lake`/`Google Cloud Storage`/`S3`/`HDFS`
  - Pinot stream ingestion: how to ingest data streams from various upstream systems, e.g. `Kafka`/`Kinesis`/`Eventhub`
  - Pinot batch ingestion: how to run Pinot batch ingestion jobs in various frameworks, e.g. `Standalone`, `Hadoop`, `Spark`
- Built shaded jars for each individual plugin
- Added support to dynamically load Pinot plugins at server startup time
Now the architecture supports a plug-and-play fashion, where new tools can be supported with small, simple extensions that do not touch large chunks of code. Integrations with new streaming services and data formats can be developed in a much simpler and more convenient way.
Dependency graph after introducing pinot-plugin in 0.3.0
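Pinot's plugin loading is implemented in Java (shaded plugin jars discovered at server startup). As a minimal sketch of the registry idea behind this design, the Python below (all names are hypothetical, not Pinot's actual API) shows how core code can look up implementations by plugin type and name instead of hard-coding them, so a new filesystem or input format becomes just a new registration:

```python
# Sketch of a plugin registry (hypothetical names, not Pinot's Java API).
# Core code resolves implementations by (plugin type, name), so adding a new
# integration means registering a factory, not editing core modules.

PLUGIN_TYPES = {"input-format", "filesystem", "stream-ingestion", "batch-ingestion"}

_registry = {}

def register_plugin(plugin_type, name, factory):
    """Register a factory for a plugin, keyed by type and name."""
    if plugin_type not in PLUGIN_TYPES:
        raise ValueError(f"unknown plugin type: {plugin_type}")
    _registry[(plugin_type, name)] = factory

def create_plugin(plugin_type, name):
    """Instantiate a registered plugin; callers never import it directly."""
    try:
        return _registry[(plugin_type, name)]()
    except KeyError:
        raise LookupError(f"no {plugin_type} plugin named {name!r}") from None

# A toy "filesystem" plugin, analogous in spirit to the S3 or HDFS plugins.
class InMemoryFileSystem:
    def __init__(self):
        self.files = {}
    def write(self, path, data):
        self.files[path] = data
    def read(self, path):
        return self.files[path]

register_plugin("filesystem", "memory", InMemoryFileSystem)

fs = create_plugin("filesystem", "memory")
fs.write("/segments/seg0", b"data")
print(fs.read("/segments/seg0"))  # b'data'
```

The point of the indirection is the one the release notes describe: the lookup key, not an import statement, decides which implementation runs, so plugins can be packaged and loaded independently of the core.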
- SQL Support
  - Added Calcite SQL compiler
- JDK 11 Support
- Added support to tune size vs. accuracy for approximation aggregation functions: `DistinctCountHLL`, `PercentileEst`, `PercentileTDigest` (#4666)
- Deprecated `pinot-hadoop` and `pinot-spark` modules, replaced with `pinot-batch-ingestion-hadoop` and `pinot-batch-ingestion-spark`
- APIs Additions/Changes
  - Pinot Controller REST APIs:
    - `GET /cluster/configs`
    - `POST /cluster/configs`
    - `DELETE /cluster/configs/{configName}`
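The new cluster-level config endpoints follow the controller's usual REST conventions. A small client-side sketch (hypothetical helper functions; the controller address is an assumption, and actually sending the requests requires a running controller):

```python
# Hypothetical helpers that build the (method, URL) pairs for the new cluster
# config endpoints. The controller is assumed at http://localhost:9000; sending
# the requests (e.g. with urllib.request) needs a live Pinot cluster.
CONTROLLER = "http://localhost:9000"

def list_cluster_configs():
    return ("GET", f"{CONTROLLER}/cluster/configs")

def update_cluster_configs():
    # Request body: a JSON map of config name -> value.
    return ("POST", f"{CONTROLLER}/cluster/configs")

def delete_cluster_config(config_name):
    return ("DELETE", f"{CONTROLLER}/cluster/configs/{config_name}")

print(list_cluster_configs())
print(delete_cluster_config("someClusterConfig"))
```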
- Configurations Additions/Changes
  - Config `controller.host` is now optional in Pinot Controller
  - Added broker config `pinot.broker.enable.query.limit.override` for a configurable max query response size (#5040)
  - Added server configs:
    - `pinot.server.starter.enableSegmentsLoadingCheck`
    - `pinot.server.starter.timeoutInSeconds`
    - `pinot.server.instance.enable.shutdown.delay`
    - `pinot.server.instance.starter.maxShutdownWaitTime`
    - `pinot.server.instance.starter.checkIntervalTime`
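For reference, the new server-side settings would appear in the server configuration file along these lines (the values shown are illustrative placeholders, not recommended defaults):

```properties
# Illustrative values only -- tune for your deployment.
pinot.server.starter.enableSegmentsLoadingCheck=true
pinot.server.starter.timeoutInSeconds=600
pinot.server.instance.enable.shutdown.delay=true
pinot.server.instance.starter.maxShutdownWaitTime=600000
pinot.server.instance.starter.checkIntervalTime=1000
```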
- Fixed the issue of server not registering state model factory before connecting the Helix manager (#4929)
- It’s a disruptive upgrade from version 0.1.0 to this release because of the protocol changes between Pinot Broker and Pinot Server. Please ensure that you upgrade to release 0.2.0 first, then upgrade to this version.
- If you build your own startable or war without using the scripts generated in the Pinot-distribution module: for Java 8, the environment variable `plugins.dir` is required for Pinot to find and load all the Pinot plugin jars; for Java 11, the plugins directory must be explicitly added to the classpath. Please see `pinot-admin.sh` as an example.
- As always, we recommend that you upgrade controllers first, then brokers, and lastly servers, in order to have zero downtime in production clusters.
- Kafka 0.9 is no longer included in the release distribution.
- Removed segment toggle APIs
- Removed the list-all-segments-in-cluster APIs
- Deprecated the following APIs:
  - `GET /tables/{tableName}/segments`
  - `GET /tables/{tableName}/segments/metadata`
  - `GET /tables/{tableName}/segments/crc`
  - `GET /tables/{tableName}/segments/{segmentName}`
  - `GET /tables/{tableName}/segments/{segmentName}/metadata`
  - `GET /tables/{tableName}/segments/{segmentName}/reload`
  - `POST /tables/{tableName}/segments/{segmentName}/reload`
  - `GET /tables/{tableName}/segments/reload`
  - `POST /tables/{tableName}/segments/reload`
- GET:
  - `/tasks/taskqueues`: List all task queues
  - `/tasks/taskqueuestate/{taskType}` -> `/tasks/{taskType}/state`
  - `/tasks/tasks/{taskType}` -> `/tasks/{taskType}/tasks`
  - `/tasks/taskstates/{taskType}` -> `/tasks/{taskType}/taskstates`
  - `/tasks/taskstate/{taskName}` -> `/tasks/task/{taskName}/taskstate`
  - `/tasks/taskconfig/{taskName}` -> `/tasks/task/{taskName}/taskconfig`
- PUT:
  - `/tasks/scheduletasks` -> `POST /tasks/schedule`
  - `/tasks/cleanuptasks/{taskType}` -> `/tasks/{taskType}/cleanup`
  - `/tasks/taskqueue/{taskType}`: Toggle a task queue
- DELETE:
  - `/tasks/taskqueue/{taskType}` -> `/tasks/{taskType}`
- Deprecated modules `pinot-hadoop` and `pinot-spark`, replaced with `pinot-batch-ingestion-hadoop` and `pinot-batch-ingestion-spark`.
- Introduced new Pinot batch ingestion jobs and YAML-based job specs to define segment generation jobs and segment push jobs.
- You may see exceptions like the one below in pinot-brokers during the cluster upgrade; it is safe to ignore them.

```
2020/03/09 23:37:19.879 ERROR [HelixTaskExecutor] [CallbackProcessor@b808af5-pinot] [pinot-broker] [] Message cannot be processed: 78816abe-5288-4f08-88c0-f8aa596114fe, {CREATE_TIMESTAMP=1583797034542, MSG_ID=78816abe-5288-4f08-88c0-f8aa596114fe, MSG_STATE=unprocessable, MSG_SUBTYPE=REFRESH_SEGMENT, MSG_TYPE=USER_DEFINE_MSG, PARTITION_NAME=fooBar_OFFLINE, RESOURCE_NAME=brokerResource, RETRY_COUNT=0, SRC_CLUSTER=pinot, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_hostname.domain,com_9000, TGT_NAME=Broker_hostname,domain.com_6998, TGT_SESSION_ID=f6e19a457b80db5, TIMEOUT=-1, segmentName=fooBar_559, tableName=fooBar_OFFLINE}{}{}
java.lang.UnsupportedOperationException: Unsupported user defined message sub type: REFRESH_SEGMENT
	at org.apache.pinot.broker.broker.helix.TimeboundaryRefreshMessageHandlerFactory.createHandler(TimeboundaryRefreshMessageHandlerFactory.java:68) ~[pinot-broker-0.2.1172.jar:0.3.0-SNAPSHOT-c9d88e47e02d799dc334d7dd1446a38d9ce161a3]
	at org.apache.helix.messaging.handling.HelixTaskExecutor.createMessageHandler(HelixTaskExecutor.java:1096) ~[helix-core-0.9.1.509.jar:0.9.1.509]
	at org.apache.helix.messaging.handling.HelixTaskExecutor.onMessage(HelixTaskExecutor.java:866) [helix-core-0.9.1.509.jar:0.9.1.509]
```