
Stream Ingestion (Local)

Step-by-step guide for streaming ingestion into a local Pinot installation

This guide walks you through setting up real-time stream ingestion into a Pinot cluster running locally. Make sure you have completed Running Pinot locally first.

Set up Kafka

Pinot has out-of-the-box real-time ingestion support for Kafka. Other streams can be plugged in; see Pluggable Streams.

Start Kafka on port 9876 using the same Zookeeper from the quick-start examples:

bin/pinot-admin.sh StartKafka -zkAddress=localhost:2123/kafka -port 9876

Create a Kafka topic. Download the latest Kafka release, then create the topic:

bin/kafka-topics.sh --create --bootstrap-server localhost:9876 \
    --replication-factor 1 --partitions 1 --topic transcript-topic
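To confirm the topic was created, you can list the topics on the same broker:

```shell
bin/kafka-topics.sh --list --bootstrap-server localhost:9876
```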

Create a schema

If you already pushed a schema during the Batch ingestion example, you can reuse it. Otherwise, see Creating a schema to learn how to create one.
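For reference, a schema compatible with the table configuration below might look like the following sketch. The field names besides `timestampInEpoch` are illustrative (taken from the batch quick-start transcript example), and the file path is an assumption used again in later steps:

```json
/tmp/pinot-quick-start/transcript-schema.json
{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    {"name": "studentID", "dataType": "INT"},
    {"name": "firstName", "dataType": "STRING"},
    {"name": "lastName", "dataType": "STRING"},
    {"name": "gender", "dataType": "STRING"},
    {"name": "subject", "dataType": "STRING"}
  ],
  "metricFieldSpecs": [
    {"name": "score", "dataType": "FLOAT"}
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestampInEpoch",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```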

Create a table configuration

/tmp/pinot-quick-start/transcript-table-realtime.json
{
  "tableName": "transcript",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestampInEpoch",
    "timeType": "MILLISECONDS",
    "schemaName": "transcript",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "transcript-topic",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka30.KafkaConsumerFactory",
      "stream.kafka.broker.list": "localhost:9876",
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.time": "24h",
      "realtime.segment.flush.threshold.segment.size": "50M",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}

Upload the schema and table configuration

As soon as the real-time table is created, it will begin ingesting from the Kafka topic.
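Assuming the schema file sits at `/tmp/pinot-quick-start/transcript-schema.json` (the path used in the batch example) and the quick-start controller is running, one way to upload both is the `AddTable` admin command:

```shell
# Upload the schema and the real-time table config in one step.
bin/pinot-admin.sh AddTable \
    -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
    -tableConfigFile /tmp/pinot-quick-start/transcript-table-realtime.json \
    -exec
```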

Load sample data into the stream

Push the sample JSON into the Kafka topic:
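A minimal sketch, assuming one JSON record per line whose fields match the transcript schema (the sample values and file path are illustrative):

```shell
# Write a couple of sample records, one JSON object per line.
mkdir -p /tmp/pinot-quick-start/rawData
cat > /tmp/pinot-quick-start/rawData/transcript.json <<'EOF'
{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"Maths","score":3.8,"timestampInEpoch":1571900400000}
{"studentID":205,"firstName":"Natalie","lastName":"Jones","gender":"Female","subject":"History","score":3.5,"timestampInEpoch":1571900400000}
EOF

# Push each line into the topic as a separate Kafka message.
bin/kafka-console-producer.sh \
    --bootstrap-server localhost:9876 \
    --topic transcript-topic < /tmp/pinot-quick-start/rawData/transcript.json
```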

Query your data

As soon as data flows into the stream, Pinot will consume it and make it available for querying. Open the Query Console to examine the real-time data.
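You can also query from the command line through the broker's SQL endpoint. The broker port below is an assumption (the quick-start commonly uses 8000); substitute the port your broker is listening on:

```shell
curl -s -X POST -H "Content-Type: application/json" \
    -d '{"sql": "SELECT COUNT(*) FROM transcript"}' \
    http://localhost:8000/query/sql
```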
