Broker GRPC API


Apart from the HTTP-based query endpoint, Pinot also supports a gRPC query endpoint in the broker.

Enable GRPC query entrypoint in broker

Add the gRPC port config to the Pinot broker configuration to enable the gRPC query endpoint:

pinot.broker.grpc.port=8010

If you want to enable TLS, use the configs below:

pinot.broker.grpc.tls.enabled=true
pinot.broker.grpc.tls.port=8020

# Server-side TLS only: the client does not present a certificate, so the server
# does not validate the client's identity via TLS. Common for public APIs, where
# only the server needs to be trusted.
pinot.broker.grpctls.client.auth.enabled=false

# TLS keystore and truststore configuration
pinot.broker.grpctls.keystore.path=/home/pinot/tls-store/keystore-internal.jks
pinot.broker.grpctls.keystore.password=changeit
pinot.broker.grpctls.keystore.type=JKS
pinot.broker.grpctls.truststore.path=/home/pinot/tls-store/truststore.jks
pinot.broker.grpctls.truststore.password=changeit
pinot.broker.grpctls.truststore.type=JKS

Broker GRPC Clients

Below are usage examples for pinot-java-client and pinot-jdbc-client.

Java Grpc Client

The main usage difference is that ConnectionFactory returns a GrpcConnection instead of a Connection.

If you want to use ARROW as the encoding type, you must start the JVM with

--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED

(See https://arrow.apache.org/docs/java/install.html)

Below is sample code using the Java client's GrpcConnection:

package org.apache.pinot.client.examples;

import java.io.IOException;
import java.util.Properties;
import org.apache.pinot.client.ConnectionFactory;
import org.apache.pinot.client.ResultSetGroup;
import org.apache.pinot.client.grpc.GrpcConnection;


public class PinotBrokerGrpcClientExample {

  private PinotBrokerGrpcClientExample() {
  }
  
  public static void main(String[] args)
      throws IOException {
    Properties properties = new Properties();
    properties.put("encoding", "JSON");
    properties.put("compression", "ZSTD");
    properties.put("blockRowSize", "10000");
    GrpcConnection grpcConnection = ConnectionFactory.fromControllerGrpc(properties, "localhost:9000");
    // GrpcConnection grpcConnection = ConnectionFactory.fromZookeeperGrpc(properties, "localhost:2123/QuickStartCluster");
    // GrpcConnection grpcConnection = ConnectionFactory.fromHostListGrpc(properties, List.of("localhost:8010"));
    ResultSetGroup resultSetGroup = grpcConnection.execute("SELECT * FROM airlineStats limit 1000");
    for (int i = 0; i < resultSetGroup.getResultSetCount(); i++) {
      org.apache.pinot.client.ResultSet resultSet = resultSetGroup.getResultSet(i);
      for (int rowId = 0; rowId < resultSet.getRowCount(); rowId++) {
        System.out.print("Row Id: " + rowId + "\t");
        for (int colId = 0; colId < resultSet.getColumnCount(); colId++) {
          System.out.print(resultSet.getString(rowId, colId) + "\t");
        }
        System.out.println();
      }
    }
    grpcConnection.close();
  }
}
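
As a variation on the example above, the commented-out fromHostListGrpc factory can be used to connect straight to a broker's gRPC port with ARROW encoding. The following is a minimal sketch; the class name, host, and port are assumptions for a local setup, and ARROW requires the --add-opens JVM flag noted earlier.

package org.apache.pinot.client.examples;

import java.io.IOException;
import java.util.List;
import java.util.Properties;
import org.apache.pinot.client.ConnectionFactory;
import org.apache.pinot.client.grpc.GrpcConnection;


public class PinotBrokerGrpcArrowExample {

  private PinotBrokerGrpcArrowExample() {
  }

  public static void main(String[] args)
      throws IOException {
    // Connect directly to the broker gRPC port (pinot.broker.grpc.port) instead of
    // discovering brokers through the controller. Host and port are illustrative.
    Properties properties = new Properties();
    properties.put("encoding", "ARROW");      // requires the --add-opens JVM flag
    properties.put("compression", "ZSTD");
    properties.put("blockRowSize", "10000");
    GrpcConnection grpcConnection = ConnectionFactory.fromHostListGrpc(properties, List.of("localhost:8010"));
    System.out.println(grpcConnection.execute("SELECT COUNT(*) FROM airlineStats").getResultSet(0).getString(0, 0));
    grpcConnection.close();
  }
}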

JDBC Grpc Client

The main usage difference here is that the JDBC URL scheme changes to pinotgrpc.

If you want to use ARROW as the encoding type, you must start the JVM with

--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED

(See https://arrow.apache.org/docs/java/install.html)

Below is sample code using the JDBC driver:

package org.apache.pinot.client.examples;

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;


public class PinotBrokerGrpcJdbcClientExample {

  private PinotBrokerGrpcJdbcClientExample() {
  }

  public static void main(String[] args)
      throws IOException {
    try (Connection connection = DriverManager.getConnection("jdbc:pinotgrpc://localhost:9000?blockRowSize=10000&encoding=JSON&compression=ZSTD");
        Statement statement = connection.createStatement();
        ResultSet resultSet = statement.executeQuery("SELECT * FROM airlineStats limit 1000")) {
      // Print results
      ResultSetMetaData metaData = resultSet.getMetaData();
      int columnCount = metaData.getColumnCount();
      while (resultSet.next()) {
        System.out.print("Row Id: " + resultSet.getRow() + "\t");
        for (int i = 1; i <= columnCount; i++) {
          System.out.print(metaData.getColumnName(i) + ": " + resultSet.getString(i) + "\t");
        }
        System.out.println();
      }
    } catch (SQLException e) {
      e.printStackTrace();
    }
  }
}

Client Configurations

Below are the parameters that can be set on a per-connection basis (a combined sketch follows the list):

  1. blockRowSize: the number of rows per block returned in the gRPC response; default is 10000.

  2. compression: the compression algorithm used over the wire; default is ZSTD. Supported values:

    1. LZ4_FAST: faster than LZ4_HIGH, but with a lower compression ratio

    2. LZ4_HIGH

    3. ZSTD (default)

    4. DEFLATE

    5. GZIP

    6. SNAPPY

    7. PASS_THROUGH/NONE: no compression; fastest, but may transfer much more data over the wire

  3. encoding: the serialization format for the ResultTable transported between the broker gRPC server and the gRPC client; default is JSON. Supported values:

    1. JSON (default)

    2. ARROW
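
As a combined sketch of how these parameters are supplied (values and class name below are illustrative), the Java client takes them as connection Properties, while the JDBC client takes them as URL query parameters, as already shown in the examples above:

import java.util.Properties;

public class GrpcClientConfigSketch {

  public static void main(String[] args) {
    // Java client: pass the parameters as Properties to
    // ConnectionFactory.fromControllerGrpc / fromZookeeperGrpc / fromHostListGrpc.
    Properties props = new Properties();
    props.put("blockRowSize", "10000");    // rows per gRPC response block (default 10000)
    props.put("compression", "LZ4_FAST");  // or ZSTD (default), LZ4_HIGH, DEFLATE, GZIP, SNAPPY, PASS_THROUGH/NONE
    props.put("encoding", "JSON");         // JSON (default) or ARROW

    // JDBC client: the same parameters go on the connection URL as query parameters.
    String jdbcUrl = "jdbc:pinotgrpc://localhost:9000"
        + "?blockRowSize=10000&compression=LZ4_FAST&encoding=JSON";
    System.out.println(jdbcUrl);
  }
}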

Benchmark

We ran a simple benchmark with the query:

SELECT * FROM airlineStats limit 1000

The benchmark compresses 97k rows (9.2 MB) with a block size of 10K rows.

Compression

| CompressionType | Compression Ratio | Response Latency (ms) | Latency Ratio |
| --- | --- | --- | --- |
| LZ4_FAST | 42.19% | 477.6 | 103.02% |
| LZ4_HIGH | 30.04% | 977.0 | 210.74% |
| ZSTD | 26.10% | 485.4 | 104.70% |
| DEFLATE | 25.64% | 810.0 | 174.72% |
| GZIP | 25.64% | 811.8 | 175.11% |
| SNAPPY | 42.56% | 461.2 | 99.48% |
| NONE | 100% | 463.6 | 100% |

Block Size

We also tested ZSTD on the same data set with different block sizes:

| BlockRowSize | Compression Ratio | Avg Latency (ms) |
| --- | --- | --- |
| 100 | 30.48% | 539.6 |
| 1000 | 27.56% | 478.3 |
| 10000 | 26.10% | 485.4 |
| 100000 | 25.73% | 561.2 |

Encoding

Tested with ZSTD compression for the same query, varying the encoding and block size:

| BlockRowSize | Encoding | Compression | Avg Latency (ms) | Bytes | Compression Ratio |
| --- | --- | --- | --- | --- | --- |
| 1000 | JSON | ZSTD | 473 | 2559246 | 27.56% |
| 10000 | JSON | ZSTD | 473 | 2424082 | 26.10% |
| 100000 | JSON | ZSTD | 594 | 2389717 | 25.73% |
| 1000 | ARROW | ZSTD | 372 | 3054993 | 32.90% |
| 10000 | ARROW | ZSTD | 367 | 2705245 | 29.13% |
| 100000 | ARROW | ZSTD | 447 | 2809744 | 30.26% |
