Running in Kubernetes

Pinot quick start in Kubernetes

Get started running Pinot in Kubernetes.

Note: The examples in this guide are sample configurations intended as a reference. For a production setup, customize them to your needs.

Prerequisites

Kubernetes

This guide assumes that you already have a running Kubernetes cluster.

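If you haven't set up a Kubernetes cluster yet, create one first, either locally (for example, with Docker Desktop) or on a managed service from a cloud vendor such as AWS, GCP, or Azure.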

Pinot

Make sure that you've downloaded Apache Pinot. The scripts for the setup in this guide can be found in our open source project on GitHub.

# checkout pinot
git clone https://github.com/apache/pinot.git
cd pinot/helm/pinot

Set up a Pinot cluster in Kubernetes

Start Pinot with Helm

The Pinot repository has pre-packaged Helm charts for Pinot and Presto, and it also hosts the Helm repository index file.

Note: Specify the StorageClass that matches your cloud vendor. Don't mount a blob store (such as AzureFile, GoogleCloudStorage, or S3) as the data-serving file system; use only block-storage disks such as Amazon EBS, GCP Persistent Disk, or Azure Disk. Typical StorageClass values:

  • For AWS: "gp2"

  • For GCP: "pd-ssd" or "standard"

  • For Azure: "AzureDisk"

  • For Docker-Desktop: "hostpath"
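As a minimal sketch of the Helm installation, the function below installs the chart from the local checkout (pinot/helm/pinot). The release name pinot, the namespace pinot-quickstart, and the value keys controller.persistence.storageClass and server.persistence.storageClass are assumptions; check the chart's values.yaml before relying on them.

```shell
# Assumed namespace; the rest of this guide refers to pinot-quickstart.
PINOT_NAMESPACE="${PINOT_NAMESPACE:-pinot-quickstart}"
# StorageClass for your platform, e.g. gp2 (AWS), pd-ssd (GCP), hostpath (Docker Desktop).
STORAGE_CLASS="${STORAGE_CLASS:-hostpath}"

install_pinot() {
  # Fetch the dependencies declared by the local chart, create the
  # namespace, then install the chart from the current directory.
  helm dependency update . &&
  kubectl create namespace "${PINOT_NAMESPACE}" &&
  helm install pinot . \
    --namespace "${PINOT_NAMESPACE}" \
    --set controller.persistence.storageClass="${STORAGE_CLASS}" \
    --set server.persistence.storageClass="${STORAGE_CLASS}"
}
```

Run `STORAGE_CLASS=gp2 install_pinot` (or the value for your platform) from the pinot/helm/pinot directory.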

Check Pinot deployment status
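One way to check the deployment is to list every resource in the Pinot namespace. The sketch below assumes the namespace pinot-quickstart used elsewhere in this guide.

```shell
pinot_status() {
  # Lists pods, services, and stateful sets in the (assumed) Pinot namespace.
  kubectl get all --namespace "${PINOT_NAMESPACE:-pinot-quickstart}"
}
```

Call `pinot_status` and wait until the pods report a Running status before continuing.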

Load data into Pinot using Kafka

Bring up a Kafka cluster for real-time data ingestion
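A possible way to bring up Kafka is to install a community Helm chart into the same namespace. The sketch below assumes the Bitnami chart repository, registered under the hypothetical alias kafka, and a release named kafka. Chart defaults (replica count, listener protocol, authentication) differ between chart versions, so you may need extra --set options for your version.

```shell
install_kafka() {
  # Register the (assumed) Bitnami chart repository, then install Kafka
  # next to Pinot so both share the same namespace.
  helm repo add kafka https://charts.bitnami.com/bitnami &&
  helm install kafka kafka/kafka \
    --namespace "${PINOT_NAMESPACE:-pinot-quickstart}"
}
```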

Check Kafka deployment status

Ensure the Kafka deployment is ready before executing the scripts in the following steps. Run the following command:
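For instance, a check along these lines filters the namespace's resources down to the Kafka ones. It assumes the Kafka resources have "kafka" in their names and live in the pinot-quickstart namespace.

```shell
kafka_status() {
  # Show only the resources whose names mention kafka.
  kubectl get all --namespace "${PINOT_NAMESPACE:-pinot-quickstart}" | grep kafka
}
```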

Below is an example output showing the deployment is ready:

Create Kafka topics

Run the scripts below to create two Kafka topics for data ingestion:
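The topic creation can be sketched with kafka-topics.sh, executed inside a Kafka pod. The pod name kafka-0 and the bootstrap address localhost:9092 are assumptions that depend on your Kafka chart and its listener configuration; the topic names come from this guide.

```shell
create_topic() {
  # $1 = topic name. Pod name and bootstrap address are assumptions.
  kubectl --namespace "${PINOT_NAMESPACE:-pinot-quickstart}" \
    exec "${KAFKA_POD:-kafka-0}" -- \
    kafka-topics.sh --bootstrap-server "${KAFKA_BOOTSTRAP:-localhost:9092}" \
    --create --topic "$1" --partitions 1 --replication-factor 1
}

create_kafka_topics() {
  # One topic for JSON messages, one for Avro messages.
  create_topic flights-realtime &&
  create_topic flights-realtime-avro
}
```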

Load data into Kafka and create Pinot schema/tables

The script below does the following:

  • Ingests 19492 JSON messages to Kafka topic flights-realtime at a speed of 1 msg/sec

  • Ingests 19492 Avro messages to Kafka topic flights-realtime-avro at a speed of 1 msg/sec

  • Uploads Pinot schema airlineStats

  • Creates Pinot table airlineStats to ingest data from JSON encoded Kafka topic flights-realtime

  • Creates Pinot table airlineStatsAvro to ingest data from Avro encoded Kafka topic flights-realtime-avro
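These steps are typically packaged as a Kubernetes manifest that you apply with kubectl. In the sketch below, the manifest file name pinot-realtime-quickstart.yml is an assumption; confirm the name and location in your checkout of the repository before running it.

```shell
load_sample_data() {
  # Apply the quick-start manifest (assumed file name) that loads the
  # sample data and creates the schema and tables.
  kubectl apply -f "${QUICKSTART_MANIFEST:-pinot-realtime-quickstart.yml}"
}
```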

Query with the Pinot Data Explorer

Pinot Data Explorer

The script below, located at ./pinot/helm/pinot, performs local port forwarding and opens the Pinot query console in your default web browser.
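If you prefer to do the port forwarding by hand, a rough equivalent looks like the sketch below. It assumes the controller is exposed as the service pinot-controller on port 9000 in the pinot-quickstart namespace, and it leaves opening the browser to you.

```shell
open_pinot_console() {
  # Forward the controller's port 9000 to localhost in the background.
  kubectl port-forward service/pinot-controller 9000:9000 \
    --namespace "${PINOT_NAMESPACE:-pinot-quickstart}" >/dev/null &
  echo "Pinot query console: http://localhost:9000"
}
```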

Query Pinot with Superset

Bring up Superset using Helm

  1. Add the Superset Helm repository:

  2. Get the Helm values configuration file:

  3. For Superset to install Pinot dependencies, edit the /tmp/superset-values.yaml file to add a pinotdb pip dependency to the bootstrapScript field. Alternatively, build your own image with this dependency or use the image apachepinot/pinot-superset:latest.

  4. Replace the default admin credentials inside the init section with a meaningful user profile and a stronger password.

  5. Install Superset using Helm:

  6. Ensure your cluster is up by running:
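The Helm steps above can be sketched as two functions, with the manual edits to the values file happening between them. The repository alias superset, the release name superset, and the namespace superset are assumptions.

```shell
SUPERSET_VALUES="${SUPERSET_VALUES:-/tmp/superset-values.yaml}"

fetch_superset_values() {
  # Add the chart repository and dump the default values for editing.
  helm repo add superset https://apache.github.io/superset &&
  helm show values superset/superset > "${SUPERSET_VALUES}"
}

deploy_superset() {
  # Install (or upgrade) Superset with the edited values, then list
  # the resources to confirm the deployment is up.
  kubectl create namespace superset &&
  helm upgrade --install superset superset/superset \
    --namespace superset --values "${SUPERSET_VALUES}" &&
  kubectl get all --namespace superset
}
```

Run `fetch_superset_values`, edit the bootstrapScript field and the admin credentials in the values file, and then run `deploy_superset`.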

Access the Superset UI

  1. Run the command below to forward the Superset port to localhost:18088.

  2. Navigate to Superset in your browser and log in with the admin credentials you set in the previous section.

  3. Create a new database connection with the following URI: pinot+http://pinot-broker.pinot-quickstart:8099/query?controller=http://pinot-controller.pinot-quickstart:9000/

  4. Once the database is added, you can add more datasets and explore the dashboard options.
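The port forwarding in the first step might look like the sketch below. The service name superset, its port 8088, and the namespace superset are assumptions based on the chart's usual defaults; only the local port 18088 comes from this guide.

```shell
forward_superset() {
  # Forward the (assumed) Superset service port 8088 to localhost:18088.
  kubectl port-forward service/superset 18088:8088 \
    --namespace superset >/dev/null &
  echo "Superset UI: http://localhost:18088"
}
```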

Access Pinot with Trino

Deploy Trino

  1. Deploy Trino with the Pinot plugin installed:

  2. See the charts in the Trino Helm chart repository:

  3. To connect Trino to Pinot, you need to add the Pinot catalog, which requires extra configuration. Run the command below to get all the configurable values.

  4. To add the Pinot catalog, edit the additionalCatalogs section by adding:

     Pinot is deployed in the namespace pinot-quickstart, so the controller serviceURL is pinot-controller.pinot-quickstart:9000.

  5. After modifying the /tmp/trino-values.yaml file, deploy Trino with:

  6. Once you've deployed Trino, check the deployment status:
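A sketch of the Trino deployment steps follows. The repository alias trino, the release name my-trino, and the namespace trino-quickstart are assumptions. The catalog snippet uses the Trino Pinot connector properties connector.name and pinot.controller-urls; newer chart versions may name the section differently from additionalCatalogs, so compare it with the values file you dumped.

```shell
TRINO_VALUES="${TRINO_VALUES:-/tmp/trino-values.yaml}"

fetch_trino_values() {
  # Add the chart repository, list its charts, and dump the default values.
  helm repo add trino https://trinodb.github.io/charts/ &&
  helm search repo trino &&
  helm show values trino/trino > "${TRINO_VALUES}"
}

print_pinot_catalog() {
  # Catalog entry to merge into the additionalCatalogs section by hand.
  cat <<'EOF'
additionalCatalogs:
  pinot: |
    connector.name=pinot
    pinot.controller-urls=pinot-controller.pinot-quickstart:9000
EOF
}

deploy_trino() {
  # Install Trino with the edited values, then check the pods.
  kubectl create namespace trino-quickstart &&
  helm install my-trino trino/trino \
    --namespace trino-quickstart --values "${TRINO_VALUES}" &&
  kubectl get pods --namespace trino-quickstart
}
```

Run `fetch_trino_values`, merge the output of `print_pinot_catalog` into the values file, and then run `deploy_trino`.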

Query Pinot with the Trino CLI

Once Trino is deployed, follow the steps below to get a runnable Trino CLI.

  1. Download the Trino CLI:

  2. Port forward the Trino service to your local machine if it's not already exposed:

  3. Use the Trino console client to connect to the Trino service:

  4. Query Pinot data using the Trino CLI, as in the sample queries below.
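The CLI steps above can be sketched as follows. The download location (the executable JAR on Maven Central), the local path /tmp/trino, the service name my-trino, the namespace trino-quickstart, and the local port 18080 are all assumptions; pick a CLI version that matches your Trino server.

```shell
TRINO_CLI="${TRINO_CLI:-/tmp/trino}"

trino_cli_url() {
  # $1 = Trino CLI version, chosen to match your server.
  echo "https://repo1.maven.org/maven2/io/trino/trino-cli/$1/trino-cli-$1-executable.jar"
}

download_trino_cli() {
  # Download the executable JAR and make it runnable.
  curl -L "$(trino_cli_url "$1")" -o "${TRINO_CLI}" && chmod +x "${TRINO_CLI}"
}

forward_trino() {
  # Forward the (assumed) Trino service port 8080 to localhost:18080.
  kubectl port-forward service/my-trino 18080:8080 \
    --namespace trino-quickstart >/dev/null &
}

connect_trino() {
  # Open an interactive session against the pinot catalog.
  "${TRINO_CLI}" --server localhost:18080 --catalog pinot --schema default
}
```

For example, run `download_trino_cli <version>`, then `forward_trino`, then `connect_trino`.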

Sample queries to execute

List all catalogs

List all tables

Show schema

Count total documents
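The four sample queries can be run non-interactively with the Trino CLI's --execute option, as sketched below. The CLI path /tmp/trino and the server address localhost:18080 are assumptions, and the table name is written in lowercase (airlinestats) on the assumption that Trino exposes the airlineStats table that way.

```shell
trino_query() {
  # $1 = SQL statement to run against the pinot catalog.
  "${TRINO_CLI:-/tmp/trino}" --server localhost:18080 \
    --catalog pinot --schema default --execute "$1"
}

run_trino_samples() {
  trino_query "SHOW CATALOGS"                             # list all catalogs
  trino_query "SHOW TABLES"                               # list all tables
  trino_query "DESCRIBE airlinestats"                     # show schema
  trino_query "SELECT count(*) AS cnt FROM airlinestats"  # count total documents
}
```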

Access Pinot with Presto

Deploy Presto with the Pinot plugin

  1. First, deploy Presto with default configurations:

  2. To customize your deployment, run the command below to get all the configurable values.

  3. After modifying the /tmp/presto-values.yaml file, deploy Presto:

  4. Once you've deployed the Presto instance, check the deployment status:
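The Presto steps above can be sketched as below. The chart location ../presto (relative to pinot/helm/pinot in the checkout) and the release name presto are assumptions; the values file path comes from this guide.

```shell
PRESTO_CHART="${PRESTO_CHART:-../presto}"
PRESTO_VALUES="${PRESTO_VALUES:-/tmp/presto-values.yaml}"

deploy_presto() {
  # Extra arguments (for example --values FILE) are passed to helm install.
  helm install presto "${PRESTO_CHART}" \
    --namespace "${PINOT_NAMESPACE:-pinot-quickstart}" "$@"
}

dump_presto_values() {
  # Write the chart's configurable values to a file for editing.
  helm show values "${PRESTO_CHART}" > "${PRESTO_VALUES}"
}

presto_status() {
  kubectl get pods --namespace "${PINOT_NAMESPACE:-pinot-quickstart}"
}
```

Use `deploy_presto` for the defaults, or run `dump_presto_values`, edit the file, and call `deploy_presto --values /tmp/presto-values.yaml`. Either way, finish with `presto_status`.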

Sample Output of K8s Deployment Status

Query Presto using the Presto CLI

Once Presto is deployed, you can either run the helper script from the repository or follow the steps below.

  1. Download the Presto CLI:

  2. Port forward presto-coordinator port 8080 to localhost port 18080:

  3. Start the Presto CLI with the Pinot catalog:

  4. Query Pinot data with the Presto CLI, as in the sample queries below.
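A sketch of these steps follows. The service name presto-coordinator and the ports come from this guide; the download location (the executable JAR on Maven Central) and the local path /tmp/presto-cli are assumptions, and the CLI version should match your Presto deployment.

```shell
PRESTO_CLI="${PRESTO_CLI:-/tmp/presto-cli}"

presto_cli_url() {
  # $1 = Presto CLI version, chosen to match your deployment.
  echo "https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/$1/presto-cli-$1-executable.jar"
}

download_presto_cli() {
  # Download the executable JAR and make it runnable.
  curl -L "$(presto_cli_url "$1")" -o "${PRESTO_CLI}" && chmod +x "${PRESTO_CLI}"
}

forward_presto() {
  # Forward presto-coordinator port 8080 to localhost port 18080.
  kubectl port-forward service/presto-coordinator 18080:8080 \
    --namespace "${PINOT_NAMESPACE:-pinot-quickstart}" >/dev/null &
}

start_presto_cli() {
  # Open an interactive session against the pinot catalog.
  "${PRESTO_CLI}" --server localhost:18080 --catalog pinot --schema default
}
```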

Sample queries to execute

List all catalogs

List all tables

Show schema

Count total documents
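As with Trino, the sample queries can be run one at a time through the Presto CLI's --execute option. The CLI path /tmp/presto-cli and the lowercase table name airlinestats are assumptions.

```shell
presto_query() {
  # $1 = SQL statement to run against the pinot catalog.
  "${PRESTO_CLI:-/tmp/presto-cli}" --server localhost:18080 \
    --catalog pinot --schema default --execute "$1"
}

run_presto_samples() {
  presto_query "SHOW CATALOGS"                             # list all catalogs
  presto_query "SHOW TABLES"                               # list all tables
  presto_query "DESCRIBE airlinestats"                     # show schema
  presto_query "SELECT count(*) AS cnt FROM airlinestats"  # count total documents
}
```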

Delete a Pinot cluster in Kubernetes

To delete your Pinot cluster in Kubernetes, run the following command:
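Assuming everything was installed into the pinot-quickstart namespace, deleting that namespace removes the cluster. Note that this also removes anything else deployed there, such as Kafka and Presto.

```shell
delete_pinot_cluster() {
  # Deletes the namespace and every resource inside it.
  kubectl delete namespace "${PINOT_NAMESPACE:-pinot-quickstart}"
}
```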
