Query using Cursors


Cursors allow Pinot clients to consume query results in smaller chunks. With this approach,

  • clients use less memory because client-side caching of results is not required.

  • application logic is simpler. For example, an app that paginates through results in a table or a graph can fetch just the slice of results it needs when a page refreshes.

Features of the cursor implementation in Apache Pinot are:

  • A query is run once and its results are stored in a temporary location. The results are cleaned up after a configurable period of time.

  • The first page of results is returned in the response.

  • A client can iterate through the rest of the result set by using the responseStore API (see the sketch after this list).

  • The client can seek forward and backward, as well as change the number of rows in the response.

  • Cursors can be used with both the single-stage and multi-stage query engines.
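
A minimal sketch of this flow, assuming a broker at localhost:8000 and the nation table used in the examples below (the page size of 10 is illustrative):

# Submit a query with cursors enabled; only the first page is returned inline.
RESPONSE=$(curl -s --request POST \
  "http://localhost:8000/query/sql?getCursor=true&numRows=10" \
  --data '{"sql":"SELECT * FROM nation LIMIT 100"}')

# The requestId identifies the ResponseStore that holds the full result set.
REQUEST_ID=$(echo "$RESPONSE" | jq -r '.requestId')

# Fetch the next page (rows 10-19) from the same broker.
curl -s "http://localhost:8000/responseStore/${REQUEST_ID}/results?offset=10&numRows=10" | jq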

Concepts

Response Store

A ResponseStore stores the results of a query. It is created and managed by the broker that executes the query.

A client should access a response store from the same broker where it submitted the query.

Clients can determine the broker host and port from the query response. An error is thrown if a client tries to access a ResponseStore from another broker.

A ResponseStore is identified by the requestId of the query.

Any user that has READ permissions on all tables in the query can read from the response store.

New implementations of ResponseStore can be added by implementing the ResponseStore SPI. A specific implementation of the ResponseStore can be chosen at startup by specifying the config parameter pinot.broker.cursor.response.store.type.

Note that only ONE implementation of the ResponseStore can be used in a cluster.

FsResponseStore

FsResponseStore is the default implementation of the ResponseStore. Internally it uses PinotFileSystem, so it can be configured to use any filesystem supported by PinotFileSystem, such as HDFS, Amazon S3, Google Cloud Storage, or Azure Data Lake Storage.

By default, the broker's local storage is used to store responses.

# Example configuration for file using local storage

pinot.broker.cursor.response.store.type=file
pinot.broker.cursor.response.store.file.temp.dir=/home/pinot/broker/data/cursors/temp
pinot.broker.cursor.response.store.file.data.dir=file:///home/pinot/data/cursors/data

# Example configuration for file using S3

pinot.broker.cursor.response.store.type=file
pinot.broker.storage.factory.s3.region=us-west-2
pinot.broker.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.broker.cursor.response.store.file.temp.dir=/home/pinot/broker/data/cursors/temp
pinot.broker.cursor.response.store.file.data.dir=s3://bucket/dir/query-results/

ResponseStoreCleaner

This is a periodic job that runs on the controller. Each ResponseStore has an expiration time; the ResponseStoreCleaner sends DELETE requests to brokers to remove expired ResponseStores.
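
For example, the cleaner's schedule can be tuned on the controller. A minimal sketch, assuming the period accepts the same format as the 1h default and the initial delay is given in seconds (see the Configuration section below for the parameters):

# Run the ResponseStoreCleaner every 30 minutes instead of the default 1h.
controller.cluster.response.store.cleaner.frequencyPeriod=30m
# Wait 120 seconds before the first run instead of a random 0-300 second delay.
controller.cluster.response.store.cleaner.initialDelay=120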

User APIs

POST /query/sql

A new API parameter has been added to trigger pagination.

The API accepts the following new optional query parameters:

  • getCursor (boolean): When set to true, the query results are stored in a ResponseStore and the response contains the first page along with cursor metadata.

  • numRows (int): The number of rows to return in the first page.


curl --request POST "http://localhost:8000/query/sql?getCursor=true&numRows=1" \
  --data '{"sql":"SELECT * FROM nation limit 100"}' | jq

Response:
{
  "resultTable": {
    "dataSchema": {
      "columnNames": [
        "n_comment",
        "n_name",
        "n_nationkey",
        "n_regionkey"
      ],
      "columnDataTypes": [
        "STRING",
        "STRING",
        "INT",
        "INT"
      ]
    },
    "rows": [
      [
        " haggle. carefully final deposits detect slyly agai",
        "ALGERIA",
        0,
        0
      ]
    ]
  },
  "numRowsResultSet": 25,
  "requestId": "236490978000000006",
  "offset": 0,
  "numRows": 1,
  "cursorResultWriteTimeMs": 4,
  "submissionTimeMs": 1734928302801,
  "expirationTimeMs": 1734931902801,
  "brokerHost": "127.0.0.1",
  "brokerPort": 8000,
  "bytesWritten": 2489,
  "cursorFetchTimeMs": 0,
}

The output above shows the response fields that are specific to cursor responses. Apart from numRowsResultSet and requestId, fields shared with the standard BrokerResponse are omitted for brevity.

| Field | Description |
| --- | --- |
| numRowsResultSet | Total number of rows in the result set. Same as in the default BrokerResponse. |
| requestId | The unique ID of the query. It must be used in subsequent calls to the cursor API. Same as in the default BrokerResponse. |
| offset | The offset of the first row in the resultTable. |
| numRows | The number of rows in the resultTable. |
| cursorResultWriteTimeMs | Time in milliseconds to write the result set to the ResponseStore. Applicable only to the query submission API. |
| submissionTimeMs | Unix timestamp in milliseconds when the query was submitted. |
| expirationTimeMs | Unix timestamp in milliseconds when the ResponseStore expires. |
| brokerHost | Hostname or IP address of the broker that manages the ResponseStore. All subsequent cursor API calls should be directed to this broker. |
| brokerPort | The port of the broker that manages the ResponseStore. |
| bytesWritten | The number of bytes written to the ResponseStore when storing the result set. |
| cursorFetchTimeMs | Time in milliseconds to fetch a page from the ResponseStore. Applicable only to the cursor fetch API. |

GET /responseStore/{requestId}/results

This is a broker API that can be used to iterate over the result set of a query stored in a ResponseStore.

The API accepts the following query parameters:

  • offset (int) (required): Offset of the first row to be fetched. The offset starts at 0 for the first row in the result set.

  • numRows (int) (optional): The number of rows in the page. If not specified, the value of the config parameter pinot.broker.cursor.fetch.rows is used.

curl -X GET http://localhost:8000/responseStore/236490978000000006/results\?offset\=1\&numRows\=1 | jq

{
  "resultTable": {
    "dataSchema": {
      "columnNames": [
        "n_comment",
        "n_name",
        "n_nationkey",
        "n_regionkey"
      ],
      "columnDataTypes": [
        "STRING",
        "STRING",
        "INT",
        "INT"
      ]
    },
    "rows": [
      [
        "al foxes promise slyly according to the regular accounts. bold requests alon",
        "ARGENTINA",
        1,
        1
      ]
    ]
  },
  "numRowsResultSet": 25,
  "requestId": "236490978000000006",
  "offset": 1,
  "numRows": 1,
  "cursorResultWriteTimeMs": 0,
  "submissionTimeMs": 1734928302801,
  "expirationTimeMs": 1734931902801,
  "brokerHost": "127.0.0.1",
  "brokerPort": 8000,
  "bytesWritten": 2489,
  "cursorFetchTimeMs": 1,
}
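
Using numRowsResultSet from the metadata, a client can page through the entire result set. A minimal sketch with curl and jq, reusing the requestId from the examples above (the page size of 5 is illustrative):

# Page through all rows of a stored result set, 5 rows at a time.
REQUEST_ID=236490978000000006   # requestId from the submission response
TOTAL=25                        # numRowsResultSet from the submission response
OFFSET=0
PAGE=5
while [ "$OFFSET" -lt "$TOTAL" ]; do
  curl -s "http://localhost:8000/responseStore/${REQUEST_ID}/results?offset=${OFFSET}&numRows=${PAGE}" \
    | jq '.resultTable.rows'
  OFFSET=$((OFFSET + PAGE))
done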

GET /responseStore/{requestId}/

Returns the BrokerResponse metadata of the query.

The API accepts the following URL parameters:

  • requestId (required)
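
For example, to retrieve the metadata of the query from the earlier examples:

curl -X GET http://localhost:8000/responseStore/236490978000000006/ | jq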

Admin APIs

GET /responseStore

Returns a list of ResponseStores. Only the response metadata is returned.

curl -X GET http://localhost:8000/responseStore | jq

[
  {
    "requestId": "236490978000000005",
    ...
  },
  {
    "requestId": "236490978000000006",
    ...
  }
]

DELETE /responseStore/{requestId}/

Delete the results of a query.

The API accepts the following URL parameters:

  • requestId (required)
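
For example, to delete the stored results of the query from the earlier examples:

curl -X DELETE http://localhost:8000/responseStore/236490978000000006/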

Configuration

Configuration parameters with the pinot.broker prefix are broker configuration parameters; parameters with the controller prefix are controller configuration parameters.

| Configuration | Default | Description |
| --- | --- | --- |
| pinot.broker.cursor.response.store.type | file | Specifies the ResponseStore type to instantiate. |
| pinot.broker.cursor.response.store.file.data.dir | {java.io.tmpdir}/broker/responseStore/data | Directory where the responses are stored. |
| pinot.broker.cursor.response.store.file.temp.dir | {java.io.tmpdir}/broker/responseStore/temp | Directory where temporary files are stored. |
| pinot.broker.cursor.response.store.expiration | 1h | Time to live for a response store. |
| pinot.broker.cursor.fetch.rows | 10000 | The default number of rows in a cursor response. |
| controller.cluster.response.store.cleaner.frequencyPeriod | 1h | How often the ResponseStoreCleaner runs. |
| controller.cluster.response.store.cleaner.initialDelay | Random delay between 0 and 300 seconds | The initial delay before the first run of the periodic task. |
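
As an illustration, a broker can be configured to keep results around longer and serve smaller default pages (the values are illustrative, not recommendations):

# Keep stored responses for 2 hours instead of the default 1h.
pinot.broker.cursor.response.store.expiration=2h
# Return up to 1000 rows per page when numRows is not specified.
pinot.broker.cursor.fetch.rows=1000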

Figure: System diagram of cursor components, showing a ResponseStore using a blob store such as AWS S3.