
Basics

Extending Pinot

Writing Custom Aggregation Function

Pinot has many built-in aggregation functions such as MIN, MAX, SUM, and AVG. See the PQL page for the full list of aggregation functions.

Adding a new AggregationFunction requires two things:

Implement the AggregationFunction interface and make it available on the classpath.
Register the function in . As of today, this requires a code change in Pinot, but we plan to add the ability to plug in functions without having to change Pinot code.

To get an overall idea, see Aggregation Function implementation. All other implementations can be found .

Let's look at the key methods to implement in AggregationFunction.

Before getting into the implementation, it's important to understand how Aggregation works in Pinot.

This is an advanced topic and assumes you know Pinot. All the data in Pinot is stored in segments across multiple nodes. At a high level, the query plan comprises three phases:

1. Map phase

This phase works on the individual segments in Pinot.

Initialization: Depending on the query type, the following methods are invoked to set up the result holder. While having different methods and return types adds complexity, it helps performance.
- AGGREGATION: createAggregationResultHolder. This must return an instance of type . You can either use the or

2. Combine phase

In this phase, the results from all segments within a single Pinot server are combined into an IntermediateResult. The type of IntermediateResult is based on the generic type defined in the AggregationFunction implementation.

3. Reduce phase

There are two steps in the reduce phase:

Merge all the IntermediateResults from the various servers using the merge function.
Extract the final results by invoking the extractFinalResult method. In most cases, FinalResult is the same type as IntermediateResult. The AVG function is an example where IntermediateResult (AvgPair) is different from FinalResult (Double).
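The three phases described above can be sketched with a toy average. This is a minimal illustration and not Pinot's actual AggregationFunction interface: the AvgPair class here is a hypothetical stand-in, and the names aggregateSegment and combine are assumptions made for the example. Only merge and extractFinalResult mirror method names mentioned in the text.

```java
import java.util.List;

/** Hypothetical intermediate result for AVG: a running sum and a count. */
final class AvgPair {
    final double sum;
    final long count;

    AvgPair(double sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    /** Merges two partial results; usable in both the combine and reduce phases. */
    static AvgPair merge(AvgPair a, AvgPair b) {
        return new AvgPair(a.sum + b.sum, a.count + b.count);
    }

    /** Converts the intermediate result (AvgPair) into the final result (Double). */
    static Double extractFinalResult(AvgPair pair) {
        return pair.count == 0 ? Double.NaN : pair.sum / pair.count;
    }
}

final class AvgPhasesSketch {
    /** Map phase: aggregate the values of a single segment. */
    static AvgPair aggregateSegment(double[] segmentValues) {
        double sum = 0;
        for (double value : segmentValues) {
            sum += value;
        }
        return new AvgPair(sum, segmentValues.length);
    }

    /** Combine phase: fold the per-segment results of one server into one result. */
    static AvgPair combine(List<AvgPair> perSegmentResults) {
        AvgPair result = new AvgPair(0, 0);
        for (AvgPair partial : perSegmentResults) {
            result = AvgPair.merge(result, partial);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two servers, each holding its own segments.
        AvgPair server1 = combine(List.of(
            aggregateSegment(new double[] {1, 2}),
            aggregateSegment(new double[] {3})));
        AvgPair server2 = combine(List.of(
            aggregateSegment(new double[] {4, 6})));

        // Reduce phase: merge the server results, then extract the final value.
        Double average = AvgPair.extractFinalResult(AvgPair.merge(server1, server2));
        System.out.println(average); // sum 16 over 5 values
    }
}
```

Keeping the sum and count separate until the very end is what makes the merge step correct: averaging per-segment averages would weight small segments too heavily.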

Segment Fetchers

When Pinot segment files are created in external systems (Hadoop, Spark, etc.), there are several ways to push that data to the Pinot Controller and Server:

Push the segment to a shared NFS and let Pinot pull the segment files from that NFS location.
Push the segment to a web server and let Pinot pull the segment files from the web server via an http/https link.
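As a rough illustration of the two options above, the sketch below picks a fetch strategy from the form of the segment location. Everything in it is hypothetical (the FetchStrategy enum, the class name, and the example paths); it is not Pinot's segment fetcher API, only a way to picture how a location string could map to one of the two pull mechanisms.

```java
import java.net.URI;

/** Hypothetical strategies matching the two options described in the text. */
enum FetchStrategy {
    SHARED_FILE_SYSTEM, // segment sits on a shared NFS mount
    HTTP                // segment is served by a web server over http/https
}

final class SegmentLocationSketch {
    /** Chooses a strategy from the scheme of the segment location. */
    static FetchStrategy strategyFor(String segmentLocation) {
        String scheme = URI.create(segmentLocation).getScheme();
        // A plain path such as /mnt/nfs/... has no scheme at all.
        if (scheme == null || scheme.equalsIgnoreCase("file")) {
            return FetchStrategy.SHARED_FILE_SYSTEM;
        }
        if (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https")) {
            return FetchStrategy.HTTP;
        }
        throw new IllegalArgumentException("Unsupported segment location: " + segmentLocation);
    }

    public static void main(String[] args) {
        System.out.println(strategyFor("/mnt/nfs/segments/seg_0.tar.gz"));
        System.out.println(strategyFor("https://example.com/segments/seg_0.tar.gz"));
    }
}
```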

Contribution Guidelines

Before you begin to contribute, make sure you have reviewed the Dev Environment Setup and Code Modules and Organization sections and that you have created your own fork of the Pinot source code.

Pinot Enhancement Proposal Workflow

The Apache Pinot community encourages members to contribute to the overall growth and success of the project. All contributors are expected to follow these guidelines when proposing an enhancement (aka PEP - Pinot Enhancement Proposal):

All enhancements, regardless of scope/size, must start with a GitHub issue. The issue should clearly state the following information:

What needs to be done?
Why the feature is needed (e.g. describing the use case).
It may also include an initial idea or proposal for how to implement it.

Once the GitHub issue is filed:

The PMC decides whether a detailed proposal/design doc is required or whether the issue can simply be followed by a PR.
There should be enough time (e.g. 5 business days) given for the PMC to review the issue/proposal before moving to implementation.
One +1 and zero -1 votes from the PMC may be used to proceed with the implementation.

The PMC uses the following guidelines when deciding whether a PEP requires an explicit proposal/design doc, or can simply be followed by a PR that includes a link to the GitHub issue.

Any new major feature, subsystem, or piece of functionality.
Any change that may potentially create backward incompatibility:
- Any change that impacts the public interfaces of the project.

If the request gets at least one +1 and no -1 from the PMC to go directly to the PR stage, the requestor can then submit the PR along with a link to the GitHub issue.

If the request requires a proposal, then the requestor is expected to provide a proposal design doc before submitting a PR for review. The design doc must include the following:

Motivation: Describe the problem to be solved, including why it needs solving (for example, the use case).
Proposed Change: Describe the new thing that needs to be done. This may be fairly extensive and have large subsections of its own, or it may be a few sentences, depending on the scope of the change. Also describe the “How” in detail, with a possible POC.
New or Changed Public Interfaces: Describe the impact on any of the "compatibility commitments" described above. We want to call these out in particular so everyone thinks about them.

The proposal/design doc should be in a Google Doc that has comment access enabled by default for any community member (it should not require asking for permissions). The only exceptions are small features where the initial proposal in the issue is generally accepted. Once the proposal/design doc is approved (all questions/comments resolved), it must be transferred into a common Google Drive where all Pinot proposal/design docs must be submitted.

If there are meetings/discussions offline with a subset of members, the meeting notes should be captured and added to the doc.

General Guidelines

- Smaller PRs that are easier to review

Create a design document

If your change is relatively minor, you can skip this step. If you are adding a new major feature, we suggest that you add a design document and solicit comments from the community before submitting any code.

is a list of current design documents.

Create an issue for the change

Create a Pinot issue for the change you would like to make. Provide information on why the change is needed and how you plan to address it. Use the conversations on the issue as a way to validate assumptions and the right way to proceed. Be sure to review sections on and .

If you have a design document, please reference it in your issue. You may even want to create multiple issues depending on the extent of your change.

Once you are clear about what you want to do, proceed with the next steps listed below.

Create a branch for your change

Make the necessary changes. If the changes you plan to make are too big, make sure you break them down into smaller tasks.

Making the changes

Follow the recommendations/best-practices noted here when you are making changes.

Code documentation

Please ensure your code is adequately documented. Some things to consider for documentation:

Always include class-level Javadoc. At the top class level, we are looking for information about what functionality is provided by the class, what state is maintained by the class, whether there are concurrency/thread-safety concerns, and any exceptional behavior that the class might exhibit.
Document public methods and their parameters.
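As an illustration of the points above, here is a small, hypothetical class (QueryCounter is not part of Pinot) whose class-level Javadoc covers functionality, state, thread-safety, and exceptional behavior, and whose public method documents its parameter, return value, and exception.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Tracks the total number of queries processed by a component.
 *
 * <p>State: a single running total that starts at zero and never decreases.
 *
 * <p>Thread-safety: safe for concurrent use; the total is kept in an {@link AtomicLong}.
 *
 * <p>Exceptional behavior: negative increments are rejected with an
 * {@link IllegalArgumentException} and leave the total unchanged.
 */
final class QueryCounter {
    private final AtomicLong total = new AtomicLong();

    /**
     * Adds newly processed queries to the running total.
     *
     * @param processed number of queries to add; must not be negative
     * @return the running total after the update
     * @throws IllegalArgumentException if {@code processed} is negative
     */
    public long add(long processed) {
        if (processed < 0) {
            throw new IllegalArgumentException("processed must not be negative: " + processed);
        }
        return total.addAndGet(processed);
    }
}
```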

Logging

Ensure there is adequate logging for positive paths as well as exceptional paths. As a corollary to this, ensure logs are not noisy.
Do not use System.out.println to log messages. Use the slf4j loggers.
Use logging levels correctly: set level to debug

Exceptions and Exception-Handling

Where possible, throw specific exceptions, preferably checked exceptions, so that callers can easily determine which error conditions they need to handle.
Avoid catching broad exceptions (i.e., catch (Exception e) blocks), except for when this is in the run() method of a thread/runnable.

Current Pinot code does not strictly adhere to this, but we would like to change this over time and adopt best practices around exception handling.
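A minimal sketch of these two recommendations follows. The exception and class names (SegmentNotFoundException, SegmentLookupSketch) are hypothetical and chosen only for illustration; the point is the shape: a specific checked exception on the throwing side, and a narrow catch on the calling side.

```java
import java.util.Map;

/** Hypothetical checked exception that names one specific failure. */
final class SegmentNotFoundException extends Exception {
    SegmentNotFoundException(String segmentName) {
        super("Segment not found: " + segmentName);
    }
}

final class SegmentLookupSketch {
    /** Throws a specific checked exception so callers know exactly what to handle. */
    static String locate(Map<String, String> segmentPaths, String segmentName)
            throws SegmentNotFoundException {
        String path = segmentPaths.get(segmentName);
        if (path == null) {
            throw new SegmentNotFoundException(segmentName);
        }
        return path;
    }

    /** Catches only the expected exception instead of using catch (Exception e). */
    static String locateOrDefault(
            Map<String, String> segmentPaths, String segmentName, String fallback) {
        try {
            return locate(segmentPaths, segmentName);
        } catch (SegmentNotFoundException e) {
            // Only the "missing segment" case is handled here; unexpected runtime
            // failures still propagate instead of being silently swallowed.
            return fallback;
        }
    }
}
```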

Backward and Forward compatibility changes

If you are making any changes to state stored, either in Zookeeper or in segments, make sure you consider both backward and forward compatibility issues.

For backward compatibility, consider cases where one component is using the new version and another is still on the old version. E.g., when the request format between broker and server is updated, consider resulting behaviors when a new broker is talking to an older server. Will it break?
For forward compatibility, consider rollback cases. E.g., consider what happens when state persisted by new code is handled by old code. Does the old code skip over new fields?
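The sketch below illustrates both questions with a deliberately simple, made-up `key=value;key=value` state format (not a format Pinot uses). The reader supplies a default when an older writer omitted a field, and skips fields it does not recognize, which is the behavior the rollback question above is probing for. The field names are assumptions for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class TolerantStateReader {
    /** Fields this (hypothetical) version of the code understands. */
    private static final Set<String> KNOWN_FIELDS = Set.of("name", "replicas");

    /** Parses "key=value;key=value" state, tolerating missing and unknown fields. */
    static Map<String, String> parse(String state) {
        Map<String, String> result = new LinkedHashMap<>();
        // Default for state written by older code that did not know this field.
        result.put("replicas", "1");
        for (String entry : state.split(";")) {
            String[] keyValue = entry.split("=", 2);
            if (keyValue.length == 2 && KNOWN_FIELDS.contains(keyValue[0])) {
                result.put(keyValue[0], keyValue[1]);
            }
            // Anything else (for example, a field added by newer code) is skipped
            // rather than treated as an error.
        }
        return result;
    }

    public static void main(String[] args) {
        // State written by a newer version that added "tier".
        System.out.println(parse("name=myTable;replicas=3;tier=hot"));
        // State written by an older version that had no "replicas" field.
        System.out.println(parse("name=myTable"));
    }
}
```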

External libraries

Be cautious about pulling in external dependencies. You will need to consider multiple things when faced with a need to pull in a new library.

What capability is the addition of the library providing you with? Can existing libraries provide this functionality (maybe with a little effort)?
Is the external library maintained by an active community of contributors?
What are the licensing terms for the library? For more information about handling licenses, see .

Testing your changes

Automated tests are always recommended for contributions. Make sure you write tests so that:

You verify the correctness of your contribution. This serves as proof to you as well as the reviewers.
You future-proof your contributions against code refactors or other changes. While this may not always be possible (see ), it's a good goal to aim for.

Identify a list of tests for the changes you have made. Depending on the scope of changes, you may need one or more of the following tests:

Unit Tests
Make sure your code has the necessary class or method level unit tests. It is important to write both positive and negative test cases. Document your tests well and add meaningful assertions in the tests; when the assertions fail, ensure that the right messages are logged with information that allows others to debug.
Integration Tests
Add integration tests to cover End-to-End paths without relying on mocking (see note below). You MUST

Testing Guidelines

Mocking
Use to mock classes to control specific behaviors - e.g., simulate various error conditions.

Note

DO NOT use advanced mock libraries such as PowerMock. They make bytecode-level changes to allow tests for static/private members, but this typically causes other tools such as JaCoCo to fail. They also promote incorrect implementation choices that make it harder to test additional changes. When faced with a choice to use PowerMock or advanced mocking options, you might either need to refactor the code to work better with mocking, or you actually need to write an integration test instead of a unit test.

Validate assumptions in tests
Make sure that adequate asserts are added in the tests to verify that the tests are passing for the right reasons.
Write reliable tests
Make sure you are writing tests that are reliable. If the tests depend on asynchronous events to be fired, do not add sleep to your tests. Where possible, use appropriate mocking or condition based triggers.
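One common way to avoid sleeping in a test is to block on a condition with a timeout. The following is a minimal sketch using `java.util.concurrent.CountDownLatch`; the helper name runAndAwait and the class name are hypothetical. The caller waits only as long as the asynchronous work actually takes, and the timeout is an upper bound rather than a fixed delay.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class AsyncWaitSketch {
    /**
     * Runs the work on another thread and waits until it signals completion.
     *
     * @return true if the work finished within the timeout, false otherwise
     */
    static boolean runAndAwait(Runnable work, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                work.run();
            } finally {
                done.countDown(); // signal completion even if the work throws
            }
        });
        worker.start();
        try {
            // Returns as soon as the latch is released; no fixed sleep involved.
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

In a test, the assertion then follows the wait, for example asserting that runAndAwait returned true before checking the state the asynchronous work was expected to change.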

License Headers for newly added files

All source code files should have license headers. To automatically add the header for any new file you plan to check in, run in pinot top-level folder:

Note

If you check in third-party code or files, please make sure you review Apache guidelines:

Once you determine that the code you are pulling in adheres to the guidelines above, go ahead and pull the changes in. Do not add license headers for them. Follow these instructions to ensure we are compliant with the Apache licensing process:

Under pinot/licenses add a LICENSE-<newlib> file that has the license terms of the included library.
Update the pinot/LICENSE file to indicate the newly added library file paths under the corresponding supported Licenses.
Update the exclusion rules for

If attention is not paid to the licensing terms early on, problems will be caught much later in the process, when we prepare to make a new release. Updating code at that point to work with the right libraries might require bigger refactoring changes and delay the release process.

Creating a Pull Request (PR)

Verifying code-style
Run the following command to verify the code-style before posting a PR

Run tests
Before you create a review request for the changes, make sure you have run the corresponding unit tests for your changes. You can run individual tests via the IDE or via the Maven command line. Finally, run all tests locally by running mvn clean install -Pbin-dist.
For changes that are related to performance issues or race conditions, it is hard to write reliable tests, so we recommend running manual stress tests to validate the changes. You MUST note the manual tests done in the PR description.

Once you receive comments on GitHub on your changes, be sure to respond to them on GitHub and address the concerns. If any discussions happen offline for the changes in question, make sure to capture the outcome of the discussion, so others can follow along as well.
It is possible that while your change is being reviewed, other changes were made to the master branch. Be sure to pull and rebase your change onto the new changes thus:

When you have addressed all comments and have an approved PR, one of the committers can merge your PR.
After your change is merged, check to see if any documentation needs to be updated. If so, create a PR for documentation.

Update Documentation

New features, functionality, and API changes usually require a documentation update to keep users up to date and to keep track of our development.

Please follow this link to accordingly

Code Setup

Dev Environment Setup

To contribute to Pinot, please follow the instructions below.

Git

Pinot uses git for source code management. If you are new to Git, it will be good to review of Git and a common tasks like and .

Getting the Source Code

Create a fork

To limit the number of branches created on the Apache Pinot repository, we recommend that you create a fork by clicking on the fork button . Read more about

Clone the repository locally

Maven

Pinot is a Maven project and familiarity with Maven will help you work with Pinot code. If you are new to Maven, you can read about Maven and .

Run the following Maven command to set up the project.

Setup IDE

Import the project into your favorite IDE. Set up the stylesheet according to your IDE. We have provided instructions for IntelliJ and Eclipse. If you are using other IDEs, please ensure you use stylesheet based on .

IntelliJ

To import the Pinot stylesheet, launch IntelliJ and navigate to Preferences (on Mac) or Settings (on Linux).

Navigate to Editor -> Code Style -> Java
Select Import Scheme -> IntelliJ IDEA code style XML

Eclipse

To import the Pinot stylesheet, launch Eclipse and navigate to Preferences (on Mac) or Settings (on Linux).

Navigate to Java -> Code Style -> Formatter
Choose codestyle-eclipse.xml from pinot/config folder of your workspace. Click Apply.

Starting Pinot via IDE

Once the IDE is set up, you can run for batch mode or for realtime mode.

Batch Quickstart

Start all Pinot components (ZK, Controller, Server, Broker) in the same JVM
Create the Baseball Stats table

Go to localhost:9000 in your browser and play with the query console.

Realtime Quickstart

Start all Pinot components (ZK, Controller, Server, Broker) in the same JVM
Start Kafka in the same JVM
Create the MeetUpRSVP table

Go to localhost:9000 in your browser and play with the meetup RSVP table.

Code Modules and Organization

TODO: Deprecated

Before proceeding to contributing changes to Pinot, review the contents of this section.

External Dependencies

Pinot depends on a number of external projects. The most notable ones are:

Apache Zookeeper
Apache Helix
Apache Kafka
Apache Thrift
Netty
Google Guava
Yammer

Helix is used for cluster management, and Pinot code is tightly integrated with Helix and Zookeeper interfaces.

Kafka is the default realtime stream provider, but can be replaced with others. See customizations section for more info.

Thrift is used for message exchange between broker and server components, with Netty providing the server functionality for processing messages in a non-blocking fashion.

Guava is used for a number of auxiliary components such as Caches and RateLimiters. Yammer metrics is used to register and expose metrics from Pinot components.

In addition, Pinot relies on several key external libraries for some of its core functionality:

Roaring Bitmaps: Pinot’s inverted indices are built using the Roaring Bitmaps library.
t-Digest: Pinot’s digest-based percentile calculations are based on the t-Digest library.

Pinot Modules

Pinot is a multi-module project, with each module providing specific functionality that helps us to build services from a combination of modules. This helps keep clean interface contracts between different modules and reduces the overall executable size of each individually deployable component.

Each module has a src/main/java folder where the code resides and src/test/java where the unit tests corresponding to the module’s code reside.

Foundational modules

The following figure provides a high-level overview of the foundational Pinot modules.

pinot-common

pinot-common provides classes common to Pinot components. Some key classes you will find here are:

config: Definitions for various elements of Pinot’s table config.
metrics: Definitions for base metrics provided by Controller, Broker and Server.
metadata

pinot-transport

The pinot-transport module provides the classes required to handle scatter-gather on the Pinot Broker, and the Netty wrapper classes used by the Server to handle connections from the Broker.

pinot-core

The pinot-core module provides the core functionality of Pinot, specifically for handling segments, various index structures, query execution (filters, transformations, aggregations, etc.), and support for realtime segments.

pinot-server

pinot-server provides server specific functionality including server startup and REST APIs exposed by the server.

pinot-controller

pinot-controller houses all the controller specific functionality, including many cluster administration APIs, segment upload (for both offline and realtime), segment assignment, retention strategies etc.

pinot-broker

pinot-broker provides broker functionality that includes wiring the broker startup sequence, building broker routing tables, and PQL request handling.

pinot-minion

pinot-minion provides functionality for running auxiliary/periodic tasks on a Pinot Cluster such as purging records for compliance with regulations like GDPR.

pinot-hadoop

pinot-hadoop provides classes for segment generation jobs using Hadoop infrastructure.

Auxiliary modules

In addition to the core modules described above, Pinot code provides the following modules:

pinot-tools: This module is a collection of tools useful for setting up a Pinot cluster and creating/updating segments. It also houses the Pinot quick start guide code.
pinot-perf: This module has a collection of benchmark test code used to evaluate design options.

These tests typically do not rely on mocking and provide more end to end coverage for code.

Extension modules

pinot-hadoop-filesystem and pinot-azure-filesystem are modules added to support extensions to the Pinot filesystem. The functionality is broken down into modules of their own to avoid polluting the common modules with additional large libraries. These libraries bring in transitive dependencies of their own that can cause classpath conflicts at runtime. We would like to avoid this for the common usage of Pinot as much as possible.