Here you will find a collection of how-to guides for operators or developers
Set up ZkBasicAuthAccessControl for access to controller and broker
Note: Be sure to keep your password safe, as encrypted passwords cannot be decrypted.
Apache Pinot 0.10.0+ includes built-in support for enhanced HTTP Basic Auth using ZooKeeper. Although it is disabled by default for simplified setup, authentication and authorization can be easily added to any environment through configuration. ACLs (Access Control Lists) can be set at both the API and table levels. This upgrade can be performed with zero downtime in any environment that provides replication.
The latest ZK Basic Auth offers the following features:
User Console offers a more convenient method for changing user authentication settings
Hot Deployment is supported when updating authentication information
Bcrypt Encryption Algorithm is used to encrypt passwords and store them in the Helix PropertyStore
ZkBasicAuthAccessControl also uses HTTP basic authentication. Enabling ZkBasicAuthAccessControl only requires adjusting the methods and procedures for user management. Both components can be protected via auth and can be configured independently. This makes it possible to separate accounts for administrative functions, such as table creation, from accounts that read the contents of tables in production.
Zk Basic auth still supports legacy tokens, which are commonly provided to service accounts, similar to BasicAuthAccessControl.
This is best demonstrated by an example of introducing ACLs with a simple admin + user setup. To enable ZK authentication on a cluster without interrupting operations, we'll go through these steps in sequence:
1. Use the default "admin" account when you start the controller/broker
2. Create user in the UI
The user roles in Pinot have been classified into "user" and "admin." Only the admin role has access to the user console page in the Pinot controller. Admin accounts are authorized to create Controller/Broker/Server users through the user console page.
3. Distribute service tokens to Pinot's components
This works the same way as with BasicAuthAccessControl.
4. Enable ACL enforcement on the controller
After a controller restart, any access to controller APIs requires authentication information, whether from internal components, external users, or the web UI.
5. Enable ACL enforcement on the Broker
After restarting the broker, any access to broker APIs requires authentication information as well.
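For reference, here is a minimal configuration sketch; the factory class names below follow Pinot's auth configuration pattern and should be verified against the configuration reference for your Pinot version.

```
# Controller (sketch): switch the access control factory to the ZK-backed implementation
controller.admin.access.control.factory.class=org.apache.pinot.controller.api.access.ZkBasicAuthAccessControlFactory

# Broker (sketch): enable ZK-backed basic auth on the query API
pinot.broker.access.control.class=org.apache.pinot.broker.broker.ZkBasicAuthAccessControlFactory
```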
Congratulations! You've successfully enabled authentication on Apache Pinot. Read on to learn more about the details and advanced configuration options.
See Authentication with Web UI and API.
General steps: update Kafka's advertised.listeners and make sure Kafka is accessible (e.g. allow inbound traffic in your Security Groups).
You will probably face the following problems.
If you want to connect to Kafka from outside of EKS, you will need to change advertised.listeners. When a client connects to a single Kafka bootstrap server, that server (like any other broker) sends the client a list of addresses for all brokers. If you want to connect to Kafka running in EKS, these default values will not be correct. This post provides an excellent explanation of the field.
If you use Helm to deploy Kafka to AWS EKS, review the chart's README. It describes multiple setups for communicating into EKS.
Running helm upgrade on the Kafka chart does not always update the pods. The exact reason is unknown; it is probably an issue with the chart's implementation. You should run kubectl describe pod and other commands to see the current status of the pods. During initial development, you can run helm uninstall and then helm install to force the values to update.
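As a rough sketch (release name, namespace, and chart reference are placeholders, not from the original guide):

```bash
# Inspect pod status and recent events for the Kafka brokers
kubectl get pods -n kafka
kubectl describe pod kafka-0 -n kafka

# During initial development, force the chart values to take effect
helm uninstall kafka -n kafka
helm install kafka <kafka-chart> -n kafka -f values.yaml
```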
The scripts to build Pinot-related Docker images are located here. You can access those scripts by running the commands below to check out the Pinot repo:
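For example (assuming the build scripts still live under docker/images in the repo):

```bash
git clone https://github.com/apache/pinot.git
cd pinot/docker/images
```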
You can find the 3 currently supported images in this directory:
Pinot: Pinot all-in-one distribution image
Pinot-Presto: Presto image with Presto-Pinot Connector built-in.
Pinot-Superset: Superset image with Pinot connector built-in.
This is a docker image of Apache Pinot.
There is a docker build script which will build a given Git repo/branch and tag the image.
Usage:
This script will check out the Pinot repo [Pinot Git URL] on branch [Git Branch] and build the Docker image for it. The Docker image is tagged as [Docker Tag].
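A sketch of the invocation, assuming the script takes positional arguments in the order listed below:

```bash
./docker-build.sh [Docker Tag] [Git Branch] [Pinot Git URL] [Kafka Version] [Java Version] [JDK Version] [OpenJDK Image]
```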
Docker Tag: Name and tag for your Docker image. Default is pinot:latest.
Git Branch: The Pinot branch to build. Default is master.
Pinot Git URL: The Pinot git repo to build; users can set it to their own fork. Note that the URL is https:// based, not git://. Default is the Apache repo: https://github.com/apache/pinot.git.
Kafka Version: The Kafka version to build Pinot with. Default is 2.0.
Java Version: The Java build and runtime image version. Default is 11.
JDK Version: The JDK parameter to build Pinot, set as part of the Maven build option -Djdk.version=${JDK_VERSION}. Default is 11.
OpenJDK Image: Base image to use for the Pinot build and runtime. Default is openjdk.
Example of building and tagging a snapshot on your own fork:
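For instance (fork URL, branch, and tag are placeholders):

```bash
./docker-build.sh pinot_fork:snapshot-5.2 snapshot-5.2 https://github.com/your_own_fork/pinot.git
```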
Example of building a release version:
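For instance (the release tag is illustrative):

```bash
./docker-build.sh pinot:release-0.10.0 release-0.10.0 https://github.com/apache/pinot.git
```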
Users on Mac M1 chips need to build the images with an arm64 base image, e.g. arm64v8/openjdk.
Example of building an arm64 image, either via the build script or by running docker build directly:
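A sketch of both options. The positional arguments follow the order listed above, and the OPENJDK_IMAGE build argument in the direct variant is an assumption; check the Dockerfile for the actual ARG name.

```bash
# Via the build script, overriding the base image with an arm64 OpenJDK image
./docker-build.sh pinot:latest master https://github.com/apache/pinot.git 2.0 11 11 arm64v8/openjdk

# Or run docker build directly from the image directory (ARG name is an assumption)
docker build --build-arg OPENJDK_IMAGE=arm64v8/openjdk -t pinot:latest .
```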
Note that if you are not on an arm64 machine, you can still build the image by turning on Docker's experimental features and adding --platform linux/arm64 to the docker build command, e.g.:
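For example (image tag and build context are placeholders):

```bash
docker build --platform linux/arm64 -t pinot:latest-arm64 .
```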
The script docker-push.sh publishes a given Docker image to your Docker registry.
In order to push to your own repo, the image needs to be explicitly tagged with the repo name.
Example of publishing an image to the apachepinot/pinot Docker Hub repo: tag a built image, then push it.
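For example (the tag names are illustrative):

```bash
docker tag pinot:release-0.10.0 apachepinot/pinot:release-0.10.0
docker push apachepinot/pinot:release-0.10.0
```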
The script docker-build-and-push.sh builds this Docker image and publishes it to your Docker registry after the build.
Example of building and publishing an image to the apachepinot/pinot Docker Hub repo:
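A sketch, assuming the same positional arguments as docker-build.sh:

```bash
./docker-build-and-push.sh apachepinot/pinot:latest master https://github.com/apache/pinot.git
```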
Refer to Kubernetes Quickstart for deployment examples.
Docker image for Presto with Pinot integration.
This docker build project is specialized for Pinot.
Usage:
This script will check out the Presto repo [Presto Git URL] on branch [Git Branch] and build the Docker image for it. The Docker image is tagged as [Docker Tag].
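A sketch of the invocation, assuming positional arguments in the order listed below:

```bash
./docker-build.sh [Docker Tag] [Git Branch] [Presto Git URL]
```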
Docker Tag: Name and tag for your Docker image. Default is pinot-presto:latest.
Git Branch: The Presto branch to build. Default is master.
Presto Git URL: The Presto git repo to build; users can set it to their own fork. Note that the URL is https:// based, not git://. Default is the PrestoDB repo: https://github.com/prestodb/presto.git.
Follow the instructions provided by Presto for writing your own configuration files under the etc directory.
The image defines two data volumes: one for mounting configuration into the container, and one for data. The configuration volume is located at /home/presto/etc, which contains all the configuration and plugins. The data volume is located at /home/presto/data.
Refer to presto-coordinator.yaml as a k8s deployment example.
Docker image for Superset with Pinot integration.
This docker build project is based on Project docker-superset and specialized for Pinot.
Modify the Makefile to change image and superset_version accordingly.
The command below will build the Docker image and tag it as superset_version and latest.
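Assuming the Makefile follows the upstream docker-superset convention and exposes a latest target (verify against the Makefile in this directory):

```bash
make latest
```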
You can also build directly with the docker build command by setting arguments:
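A sketch; the build-argument name is an assumption, so check the Dockerfile for the exact ARG names:

```bash
docker build \
  --build-arg SUPERSET_VERSION=1.5.0 \
  -t apachepinot/pinot-superset:latest .
```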
Follow the instructions provided by Apache Superset for writing your own superset_config.py. Place this file in a local directory and mount this directory to /etc/superset inside the container. This location is included in the image's PYTHONPATH. Mounting this file to a different location is possible, but it will need to be in the PYTHONPATH.
The image defines two data volumes: one for mounting configuration into the container, and one for data (logs, SQLite DBs, etc.). The configuration volume is located at either /etc/superset or /home/superset; either is acceptable. Both of these directories are included in the PYTHONPATH of the image. Mount any configuration (specifically the superset_config.py file) here to have it read by the app on startup. The data volume is located at /var/lib/superset and is where you would mount your SQLite file (if you are using that as your backend), or a volume to collect any logs that are routed there. This location is used as the value of the SUPERSET_HOME environment variable.
Refer to superset.yaml as a k8s deployment example.
Set up HTTP basic auth and ACLs for access to controller and broker
Apache Pinot 0.8.0+ comes out of the box with support for HTTP Basic Auth. While disabled by default for easier setup, authentication and authorization can be added to any environment simply via configuration. ACLs can be set on both API and table levels. This upgrade can be performed with zero downtime in any environment that provides replication.
For external access, Pinot exposes two primary APIs via the following components:
pinot-controller handles cluster management and configuration
pinot-broker handles incoming SQL queries
Both components can be protected via auth and can even be configured independently. This makes it possible to separate accounts for administrative functions such as table creation from accounts that read the contents of tables in production.
Additionally, all other Pinot components such as pinot-server and pinot-minion can be configured to authenticate themselves to pinot-controller via the same mechanism. This can be done independently of (and in addition to) using 2-way TLS/SSL to ensure intra-cluster authentication on the lower networking layer.
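For orientation, here is a hedged controller-side sketch of what such a configuration can look like; the exact property keys, class name, and principals are illustrative and should be checked against your Pinot version's configuration reference.

```
# Controller (sketch): enable HTTP basic auth with an admin and a read-only user
controller.admin.access.control.factory.class=org.apache.pinot.controller.api.access.BasicAuthAccessControlFactory
controller.admin.access.control.principals=admin,user
controller.admin.access.control.principals.admin.password=verysecret
controller.admin.access.control.principals.user.password=secret
controller.admin.access.control.principals.user.tables=myTable
controller.admin.access.control.principals.user.permissions=READ
```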
If you'd rather dive directly into the action with an all-in-one running example, we provide an AuthQuickstart runnable with Apache Pinot. This sample app is preconfigured with the settings below but intended only as a dev-friendly, local, single-node deployment.
How to Connect Pinot with Amazon Managed Streaming for Apache Kafka (Amazon MSK)
This wiki documents how to connect Pinot deployed in Amazon EKS to Amazon Managed Kafka.
Follow this AWS Quickstart Wiki to run Pinot on Amazon EKS.
Go to MSK Landing Page to create a Kafka Cluster.
Note:
For demo simplicity, this MSK cluster reuses the same VPC created by the EKS cluster in the previous step. Otherwise, VPC peering is required to ensure the two VPCs can talk to each other.
Under the Encryption section, choose Both TLS encrypted and plaintext traffic allowed.
Click Create.
Once the cluster is created, click View client information to see the ZooKeeper and Kafka broker lists.
Sample Client Information
At this point, the MSK cluster is still not accessible. You can follow this wiki to create an EC2 instance to connect to it for topic creation, and to run the console producer and consumer.
In order to connect MSK to EKS, we need to allow traffic to flow between them. This is configured through the Amazon VPC page.
Record the Amazon MSK SecurityGroup from the cluster page; in the demo above, it's sg-01e7ab1320a77f1a9.
Open the Amazon VPC page and click on SecurityGroups in the left bar. Find the EKS security group: eksctl-${PINOT_EKS_CLUSTER}-cluster/ClusterSharedNodeSecurityGroup.
Ensure you are picking ClusterSharedNodeSecurityGroup.
In SecurityGroups, click on the MSK security group (sg-01e7ab1320a77f1a9), click Edit Rules, then add the ClusterSharedNodeSecurityGroup above (sg-0402b59d7e440f8d1) to it.
Click the EKS security group ClusterSharedNodeSecurityGroup (sg-0402b59d7e440f8d1) and add an inbound rule for the MSK security group (sg-01e7ab1320a77f1a9).
Now the EKS cluster should be able to talk to Amazon MSK.
To run the commands below, ensure you set two environment variables, ZOOKEEPER_CONNECT_STRING and BROKER_LIST_STRING (use plaintext), from the Amazon MSK client information, and replace the variables accordingly. E.g.:
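For example (the hostnames are placeholders copied from the MSK client information page):

```bash
export ZOOKEEPER_CONNECT_STRING="z-1.msk-demo.xxxx.kafka.us-west-2.amazonaws.com:2181,z-2.msk-demo.xxxx.kafka.us-west-2.amazonaws.com:2181"
export BROKER_LIST_STRING="b-1.msk-demo.xxxx.kafka.us-west-2.amazonaws.com:9092,b-2.msk-demo.xxxx.kafka.us-west-2.amazonaws.com:9092"
```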
You can log into one EKS node or container and run the command below to create a topic. E.g., enter the Pinot controller container:
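A sketch, assuming the pod name and namespace used by the Pinot EKS quickstart (adjust to your deployment):

```bash
kubectl exec -it pinot-controller-0 -n pinot-quickstart -- /bin/bash
```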
Then install wget and download the Kafka binary.
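For example (the Kafka version is illustrative; pick client tools that match your MSK cluster version):

```bash
apt-get update && apt-get install -y wget
wget https://archive.apache.org/dist/kafka/2.8.1/kafka_2.12-2.8.1.tgz
tar -xzf kafka_2.12-2.8.1.tgz && cd kafka_2.12-2.8.1
```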
Create a Kafka topic:
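For example (the topic name is illustrative):

```bash
bin/kafka-topics.sh --create \
  --zookeeper ${ZOOKEEPER_CONNECT_STRING} \
  --replication-factor 1 \
  --partitions 1 \
  --topic pullRequestMergedEventsAwsMskDemo
```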
Topic creation succeeds with the message below:
Once the topic is created, we can start a simple application that produces to it.
You can download the YAML file below, then replace:
${ZOOKEEPER_CONNECT_STRING} -> MSK ZooKeeper string
${BROKER_LIST_STRING} -> MSK plaintext broker string in the deployment
${GITHUB_PERSONAL_ACCESS_TOKEN} -> a GitHub personal access token generated from here, with all read permissions granted to it. Here is the source code that generates the GitHub events.
Then apply the YAML file:
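For example (the file name is a placeholder for the YAML file you downloaded and edited):

```bash
kubectl apply -f github-events-aws-msk-kafka.yaml
```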
Once the pod is up, you can verify it by running a console consumer to read from the topic. Try running it from the Pinot controller container entered in the step above.
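For example (using the topic created earlier):

```bash
bin/kafka-console-consumer.sh \
  --bootstrap-server ${BROKER_LIST_STRING} \
  --topic pullRequestMergedEventsAwsMskDemo \
  --from-beginning
```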
This step is relatively easy. Since we already put the table creation request into the ConfigMap, we can just enter the pinot-github-events-data-into-msk-kafka pod to execute the command.
Check if the pod is running:
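For example (the namespace is an assumption based on the quickstart):

```bash
kubectl get pods -n pinot-quickstart | grep github-events
```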
Sample output:
Enter the pod:
Create the table:
Sample output:
Then you can open the Pinot Query Console to browse the data.
The Pinot community provides Helm-based deployment templates for Pinot. You can deploy it as simply as running a helm install command.
However, there are a few things to note before starting a benchmark or going to production.
We recommend running Pinot with pre-defined resources for the container, and making requests and limits the same. This ensures the container won't be killed if there is a sudden bump in workload. It also makes it simpler to benchmark the system, e.g. to determine the broker QPS limit.
Below is an example of the values to set in the values.yaml file. By default, resources are not set.
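A sketch of the controller section in values.yaml (the same pattern applies to the broker and server sections; the numbers are only illustrative):

```yaml
controller:
  resources:
    requests:
      cpu: 2
      memory: 4Gi
    limits:
      cpu: 2
      memory: 4Gi
```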
JVM settings should be compliant with the container resources for the Pinot controller and Pinot broker. You can set the JVM options as below to make -Xmx the same size as your container.
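A sketch, assuming the chart exposes a per-component jvmOpts value and the 4Gi containers from the example above:

```yaml
controller:
  jvmOpts: "-Xms4G -Xmx4G"   # -Xmx matches the controller container memory
broker:
  jvmOpts: "-Xms4G -Xmx4G"   # -Xmx matches the broker container memory
```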
For the Pinot server, heap is mainly used for query processing and metadata management. It uses off-heap memory for data loading/persistence and memory-mapped file page caching. So we recommend keeping only the minimal requirement for the JVM, and leaving the rest of the container for off-heap data operations.
E.g., assume the data is 100 GB on disk and the container size is 4 CPU and 10 GB memory. For the JVM, limit -Xmx to no more than 50% of the container memory limit, so that the rest of the container can be leveraged for off-heap operations.
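A sketch for the server under those assumptions (4 CPU / 10Gi container, -Xmx capped at about half the container memory):

```yaml
server:
  resources:
    requests:
      cpu: 4
      memory: 10Gi
    limits:
      cpu: 4
      memory: 10Gi
  jvmOpts: "-Xms2G -Xmx4G"   # leave the remaining memory for off-heap / mmap page cache
```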
Pinot uses remote storage as deep storage to back up segments. The default deployment creates a mounted disk (e.g. Amazon EBS) as deep storage in the controller.
Set up TLS-secured connections inside and outside your cluster
Pinot versions from 0.7.0+ support client-cluster and intra-cluster TLS. TLS-support comes in both 1-way and 2-way flavors. This guide walks through the relevant configuration options.
Looking to ingest from Kafka via secured connections? Check out .
In order to support incremental upgrades of unsecured Pinot clusters towards TLS, we introduce multi-ingress support via listeners. Each listener accepts connections for a specific protocol on a specific port. For example, pinot-broker may be configured to accept both http on port 8099 and https on port 8443 at the same time.
Existing configuration properties such as controller.port are still parsed and automatically translated to an http listener configuration to enable full backwards compatibility. TLS-secured ingress must be configured through the new listener specifications.
If you're bootstrapping a cluster from scratch, you can directly configure TLS-secured connections and you can forgo legacy http ingress. If you're upgrading an existing (production) cluster, you'll be able to perform the upgrade without downtime if your deployment is configured for high-availability.
On a high level, a zero-downtime upgrade includes the following 3 phases:
adding a secondary TLS-secured ingress to pinot controllers, brokers, and servers
switching client and internode egress to prefer TLS-secured connections
disabling unsecured ingress
This requires a rolling restart of (replicated) service containers after each re-configuration phase. The sample listener specifications below will guide you through this process.
Apache Pinot leverages the JVM's native TLS infrastructure with all its benefits and limitations. Certificates should be generated to include the host IP, hostname, and fully-qualified domain names (if accessed or identified this way).
We support both the JVM's default key/truststore and configuration options to load certificates from secondary locations. Note that some connector plugins require the default truststore to contain any trusted certs, since they do not parse Pinot's configuration properties for external truststores.
Most JVMs' default certificate stores can be configured with the following command-line arguments:
-Djavax.net.ssl.keyStore
-Djavax.net.ssl.keyStorePassword
-Djavax.net.ssl.trustStore
-Djavax.net.ssl.trustStorePassword
This section contains a number of examples for common situations. The complete list of options can be found in each component's configuration reference.
If you're bootstrapping a new cluster, scroll down towards the end. We order this section for purposes of migrating an existing unsecured cluster to TLS-only.
This is a minimal example of network configuration options prior to 0.7.0. This specification is still supported for backwards-compatibility and translated internally to a listener specification.
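A sketch of such a legacy configuration; the property names should be verified against each component's configuration reference:

```
controller.port=9000
pinot.broker.client.queryPort=8099
pinot.server.netty.port=8098
pinot.server.adminapi.port=8097
```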
This HTTP listener specification is the equivalent of manually translating the legacy configuration above to a listener specification.
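For example, on the controller (analogous listener properties exist for the broker and server; confirm the exact keys in the configuration reference):

```
controller.access.protocols=http
controller.access.protocols.http.port=9000
```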
This is a common scenario for development clusters and an intermediate phase during a zero-downtime migration of an unsecured cluster towards TLS. This configuration optionally accepts secure ingress on alternate ports, but still defaults to unsecured egress for all operations.
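A controller-side sketch: keep http ingress on its existing port, add https on an alternate port, and point the component at your key/truststore (paths and passwords are placeholders):

```
controller.access.protocols=http,https
controller.access.protocols.http.port=9000
controller.access.protocols.https.port=9443

controller.tls.keystore.path=/path/to/keystore.p12
controller.tls.keystore.password=changeit
controller.tls.truststore.path=/path/to/truststore.p12
controller.tls.truststore.password=changeit
```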
After all Pinot components have been configured and restarted to offer secure ingress, we can modify egress to default to secure internode connections. Clients, such as pinot-admin.sh, support an optional flag -controllerProtocol https to enable secure access. Ingestion jobs similarly support an optional tlsSpec key to configure key/truststores. Note that any console clients must have access to appropriate certificates via the JVM's default key/truststore.
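For example, a console client invocation with secure controller access (host, port, and file paths are placeholders):

```bash
bin/pinot-admin.sh AddTable \
  -controllerProtocol https \
  -controllerHost pinot-controller.example.com \
  -controllerPort 9443 \
  -tableConfigFile /path/to/table-config.json \
  -schemaFile /path/to/schema.json \
  -exec
```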
This is the default for a newly bootstrapped secure pinot cluster. It is also the final stage for any migration of an existing cluster. With this configuration applied, pinot's components will reject any unsecured connection attempt.
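A controller-side sketch of a TLS-only listener specification (the same pattern applies to the broker and server; keystore paths are placeholders):

```
controller.access.protocols=https
controller.access.protocols.https.port=9443

controller.tls.keystore.path=/path/to/keystore.p12
controller.tls.keystore.password=changeit
controller.tls.truststore.path=/path/to/truststore.p12
controller.tls.truststore.password=changeit
```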
Apache Pinot also supports 2-way TLS for environments with high security requirements. This can be enabled per component with the optional client.auth.enabled flag. Bear in mind that any client (or server) interacting with a component expecting client auth must have access to both a keystore and a truststore. This setting does NOT apply to unsecured http or netty connections.
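A sketch, assuming the flag lives under each component's tls configuration namespace (verify the exact key for your version):

```
# Require client certificates on the controller's TLS listener (sketch)
controller.tls.client.auth.enabled=true
```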
You can configure your own S3, Azure Data Lake, or Google Cloud Storage by following this guide.