LogoLogo
release-0.4.0
release-0.4.0
  • Introduction
  • Basics
    • Concepts
    • Architecture
    • Components
      • Cluster
      • Controller
      • Broker
      • Server
      • Minion
      • Tenant
      • Table
      • Schema
      • Segment
    • Getting started
      • Frequent questions
      • Running Pinot locally
      • Running Pinot in Docker
      • Running Pinot in Kubernetes
      • Public cloud examples
        • Running on Azure
        • Running on GCP
        • Running on AWS
      • Manual cluster setup
      • Batch import example
      • Stream ingestion example
    • Data import
      • Stream ingestion
        • Import from Kafka
      • File systems
        • Import from ADLS (Azure)
        • Import from HDFS
        • Import from GCP
      • Input formats
        • Import from CSV
        • Import from JSON
        • Import from Avro
        • Import from Parquet
        • Import from Thrift
        • Import from ORC
    • Feature guides
      • Pinot data explorer
      • Text search support
      • Indexing
    • Releases
      • 0.3.0
      • 0.2.0
      • 0.1.0
    • Recipes
      • GitHub Events Stream
  • For Users
    • Query
      • Pinot Query Language (PQL)
        • Unique Counting
    • API
      • Querying Pinot
        • Response Format
      • Pinot Rest Admin Interface
    • Clients
      • Java
      • Golang
  • For Developers
    • Basics
      • Extending Pinot
        • Writing Custom Aggregation Function
        • Pluggable Streams
        • Pluggable Storage
        • Record Reader
        • Segment Fetchers
      • Contribution Guidelines
      • Code Setup
      • Code Modules and Organization
      • Update Documentation
    • Advanced
      • Data Ingestion Overview
      • Advanced Pinot Setup
    • Tutorials
      • Pinot Architecture
      • Store Data
        • Batch Tables
        • Streaming Tables
      • Ingest Data
        • Batch
          • Creating Pinot Segments
          • Write your batch
          • HDFS
          • AWS S3
          • Azure Storage
          • Google Cloud Storage
        • Streaming
          • Creating Pinot Segments
          • Write your stream
          • Kafka
          • Azure EventHub
          • Amazon Kinesis
          • Google Pub/Sub
    • Design Documents
  • For Operators
    • Basics
      • Setup cluster
      • Setup table
      • Setup ingestion
      • Access Control
      • Monitoring
      • Tuning
        • Realtime
        • Routing
    • Tutorials
      • Build Docker Images
      • Running Pinot in Production
      • Kubernetes Deployment
      • Amazon EKS (Kafka)
      • Amazon MSK (Kafka)
      • Batch Data Ingestion In Practice
  • RESOURCES
    • Community
    • Blogs
    • Presentations
    • Videos
  • Integrations
    • ThirdEye
    • Superset
    • Presto
  • PLUGINS
    • Plugin Architecture
    • Pinot Input Format
    • Pinot File System
    • Pinot Batch Ingestion
    • Pinot Stream Ingestion
Powered by GitBook
On this page

Was this helpful?

Edit on Git
Export as PDF
  1. For Developers
  2. Tutorials
  3. Ingest Data
  4. Streaming

Creating Pinot Segments

Realtime segment generation

To consume in realtime, we simply need to create a table with the same name as the schema and point to the Kafka topic to consume from, using a table definition such as this one:

{
  "tableName": "flights",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "7",
    "segmentPushFrequency": "daily",
    "segmentPushType": "APPEND",
    "replication": "1",
    "timeColumnName": "daysSinceEpoch",
    "timeType": "DAYS",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy"
  },
  "tableIndexConfig": {
    "invertedIndexColumns": [
      "flightNumber",
      "tags",
      "daysSinceEpoch"
    ],
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "highLevel",
      "stream.kafka.topic.name": "flights-realtime",
      "stream.kafka.decoder.class.name": "org.apache.pinot.core.realtime.impl.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.zk.broker.url": "localhost:2181",
      "stream.kafka.hlc.zk.connect.string": "localhost:2181"
    }
  },
  "tenants": {
    "broker": "brokerTenant",
    "server": "serverTenant"
  },
  "metadata": {
  }
}

First, we’ll start a local instance of Kafka and start streaming data into it:Untitled

bin/pinot-admin.sh StartKafka &
bin/pinot-admin.sh StreamAvroIntoKafka -avroFile flights-2014.avro -kafkaTopic flights-realtime &

This will stream one event per second from the Avro file to the Kafka topic. Then, we’ll create a realtime table, which will start consuming from the Kafka topic.

bin/pinot-admin.sh AddTable -filePath flights-definition-realtime.json

We can then query the table with the following query to see the events stream in:

SELECT COUNT(*) FROM flights

Repeating the query multiple times should show the events slowly being streamed into the table.

PreviousStreamingNextWrite your stream

Last updated 4 years ago

Was this helpful?