First Stream Ingest

Set up real-time streaming ingestion from Kafka and watch data arrive in Pinot.

For Kubernetes-specific streaming ingestion, see Stream ingestion (Kubernetes).

Outcome

By the end of this page you will have a realtime Pinot table consuming data from a Kafka topic, with 12 rows visible in the query console.

Prerequisites

  • Completed First table and schema -- the transcript schema must already exist in the cluster.

  • A running Pinot cluster. See the install guides for Local or Docker.

  • For Docker users: set the PINOT_VERSION environment variable. See the Version reference page.

Steps

1. Understand streaming ingestion

Streaming ingestion lets Pinot consume data from a message queue in real time. As messages arrive in a Kafka topic, Pinot reads them and makes the rows queryable within seconds. The realtime table config specifies the Kafka broker, topic, and decoder so that Pinot knows how to connect and interpret incoming records.

2. Start Kafka

Start Kafka on port 9876 using the same ZooKeeper from the Pinot quick-start:

bin/pinot-admin.sh StartKafka -zkAddress=localhost:2123/kafka -port 9876

3. Create a Kafka topic

Download Apache Kafka if you have not already, then create the topic:
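One way to create the topic is sketched below. It assumes the Kafka download is unpacked at a location held in a hypothetical KAFKA_HOME variable, and that the topic is called transcript-topic. The topic name is an assumption made for illustration; whatever name you pick must match the topic named in the realtime table config.

```shell
# Assumption: KAFKA_HOME points at the unpacked Kafka download (hypothetical variable).
KAFKA_HOME="${KAFKA_HOME:-.}"
# Assumption: illustrative topic name; it must match the realtime table config.
TOPIC="transcript-topic"
# Port 9876 is the Kafka port started in step 2.
BOOTSTRAP="localhost:9876"

if [ -x "$KAFKA_HOME/bin/kafka-topics.sh" ]; then
  # A single partition and a single replica are enough for a local walkthrough.
  "$KAFKA_HOME/bin/kafka-topics.sh" --create \
    --bootstrap-server "$BOOTSTRAP" \
    --replication-factor 1 \
    --partitions 1 \
    --topic "$TOPIC"
else
  echo "kafka-topics.sh not found under $KAFKA_HOME/bin; set KAFKA_HOME first" >&2
fi
```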

4. Save the realtime table config

Create the file /tmp/pinot-quick-start/transcript-table-realtime.json:
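A minimal sketch of what this file might contain, written with a shell heredoc. Several values are assumptions rather than taken from this page: the topic name transcript-topic, the time column timestampInEpoch, and the decoder and consumer factory class names, whose plugin package differs between Pinot releases. Check them against your Pinot version and your transcript schema before use.

```shell
mkdir -p /tmp/pinot-quick-start

# Write the realtime table config. Quoting 'EOF' keeps the shell from expanding anything inside.
cat > /tmp/pinot-quick-start/transcript-table-realtime.json <<'EOF'
{
  "tableName": "transcript",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestampInEpoch",
    "timeType": "MILLISECONDS",
    "schemaName": "transcript",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "transcript-topic",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "localhost:9876",
      "realtime.segment.flush.threshold.time": "3600000",
      "realtime.segment.flush.threshold.rows": "50000",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}
EOF
```

The three settings that tie the table to the stream are stream.kafka.broker.list (the Kafka broker from step 2), stream.kafka.topic.name (the topic from step 3), and stream.kafka.decoder.class.name (how each message is parsed into a row).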

5. Upload the realtime table config
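The upload can be done with the AddTable admin command. The sketch below assumes a hypothetical PINOT_HOME variable pointing at the Pinot distribution, and assumes the schema from the previous tutorial was saved as /tmp/pinot-quick-start/transcript-schema.json; adjust both to your setup.

```shell
# Assumption: PINOT_HOME points at the Pinot distribution (hypothetical variable).
PINOT_HOME="${PINOT_HOME:-.}"
CONFIG_DIR="/tmp/pinot-quick-start"
# Assumption: schema file name as saved during the previous tutorial.
SCHEMA_FILE="$CONFIG_DIR/transcript-schema.json"
TABLE_CONFIG_FILE="$CONFIG_DIR/transcript-table-realtime.json"

if [ -x "$PINOT_HOME/bin/pinot-admin.sh" ]; then
  # -exec actually submits the request to the controller.
  "$PINOT_HOME/bin/pinot-admin.sh" AddTable \
    -schemaFile "$SCHEMA_FILE" \
    -tableConfigFile "$TABLE_CONFIG_FILE" \
    -exec
else
  echo "pinot-admin.sh not found under $PINOT_HOME/bin; set PINOT_HOME first" >&2
fi
```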

As soon as the realtime table is created, Pinot begins consuming from the Kafka topic.

If the transcript schema was already uploaded during First table and schema, you can omit the -schemaFile flag. Including it is safe -- Pinot will skip re-creating an identical schema.

6. Save the sample streaming data

Create the file /tmp/pinot-quick-start/rawdata/transcript.json:
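An illustrative way to create the file is shown below: 12 JSON records, one per line, so that each line becomes one Kafka message and one Pinot row. The field names and values are assumptions made for this sketch; they must match the columns defined in your transcript schema.

```shell
mkdir -p /tmp/pinot-quick-start/rawdata

# One JSON object per line; each line is sent as a separate Kafka message.
cat > /tmp/pinot-quick-start/rawdata/transcript.json <<'EOF'
{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"Maths","score":3.8,"timestampInEpoch":1571900400000}
{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"History","score":3.5,"timestampInEpoch":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Maths","score":3.2,"timestampInEpoch":1571900400000}
{"studentID":207,"firstName":"Bob","lastName":"Lewis","gender":"Male","subject":"Chemistry","score":3.6,"timestampInEpoch":1572418800000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Geography","score":3.8,"timestampInEpoch":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"English","score":3.5,"timestampInEpoch":1572505200000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Maths","score":3.2,"timestampInEpoch":1572678000000}
{"studentID":209,"firstName":"Jane","lastName":"Doe","gender":"Female","subject":"Physics","score":3.6,"timestampInEpoch":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"Maths","score":3.8,"timestampInEpoch":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"English","score":3.5,"timestampInEpoch":1572678000000}
{"studentID":211,"firstName":"John","lastName":"Doe","gender":"Male","subject":"History","score":3.2,"timestampInEpoch":1572854400000}
{"studentID":212,"firstName":"Nick","lastName":"Young","gender":"Male","subject":"History","score":3.6,"timestampInEpoch":1572854400000}
EOF
```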

7. Push data into the Kafka topic
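A sketch of pushing the file with Kafka's console producer, which sends each input line as one message. It assumes the same hypothetical KAFKA_HOME variable and the illustrative topic name transcript-topic. Older Kafka releases take --broker-list in place of --bootstrap-server.

```shell
# Assumption: KAFKA_HOME points at the unpacked Kafka download (hypothetical variable).
KAFKA_HOME="${KAFKA_HOME:-.}"
# Assumption: illustrative topic name; use the topic created in step 3.
TOPIC="transcript-topic"
BOOTSTRAP="localhost:9876"
DATA_FILE="/tmp/pinot-quick-start/rawdata/transcript.json"

if [ -x "$KAFKA_HOME/bin/kafka-console-producer.sh" ] && [ -f "$DATA_FILE" ]; then
  # Redirect the file to stdin; the producer publishes one message per line.
  "$KAFKA_HOME/bin/kafka-console-producer.sh" \
    --bootstrap-server "$BOOTSTRAP" \
    --topic "$TOPIC" < "$DATA_FILE"
else
  echo "Producer script or $DATA_FILE not found; check KAFKA_HOME and step 6" >&2
fi
```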

Verify

  1. Open the Query Console in your browser.

  2. Run a query against the realtime table, for example SELECT * FROM transcript (assuming the table is named transcript, after its schema).

  3. You should see 12 rows of streaming data. Pinot ingests from Kafka in real time, so the rows appear within seconds of being pushed to the topic.

Next step

Continue to First query to learn how to write analytical queries against your Pinot tables.
