Pinot can be queried via a broker endpoint as follows. This example assumes the broker is running on localhost:8099.
The Pinot REST API can be accessed by sending a POST request with a JSON body containing the parameter sql to the /query/sql endpoint on a broker.
When TLS/SSL is not enabled:
When TLS/SSL is enabled:
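A minimal Python sketch of both cases, using only the standard library. It assumes the broker address from the text above (localhost:8099). The helper names `build_sql_request` and `run_sql` are illustrative and not part of Pinot. In this sketch the only difference between the two cases is the URL scheme; a TLS deployment may also need certificates or authentication that are not shown here.

```python
import json
import urllib.request


def build_sql_request(sql, host="localhost", port=8099, tls=False):
    """Build (but do not send) a POST to the broker's /query/sql endpoint."""
    scheme = "https" if tls else "http"
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{scheme}://{host}:{port}/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(sql, **kwargs):
    """Send the query and return the decoded JSON response (needs a running broker)."""
    with urllib.request.urlopen(build_sql_request(sql, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a broker running, `run_sql("SELECT COUNT(*) FROM baseballStats")` would return the response as a Python dictionary, and passing `tls=True` switches the request to https.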
If the SQL statement contains a single quote (') and the JSON body is passed on the command line inside single quotes, each ' in the statement needs to be replaced by '"'"', for example:
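The '"'"' sequence is a shell-quoting idiom: it closes the single-quoted argument, adds a literal single quote, and reopens the argument. As a hedged illustration in Python, `json.dumps` takes care of escaping inside the JSON body, and the standard library's `shlex.quote` produces the same sequence when the body has to be pasted into a shell command. The helper name and the sample statement (including the `league` column) are assumptions made for this example.

```python
import json
import shlex


def shell_safe_body(sql):
    """Return the JSON body for `sql`, quoted for use as a single shell argument."""
    body = json.dumps({"sql": sql})  # JSON-escapes any double quotes in the statement
    return shlex.quote(body)         # rewrites each ' as '"'"' inside a '...' argument


# Illustrative statement; the column name is assumed.
statement = "select count(*) from baseballStats where league = 'NL'"
print(shell_safe_body(statement))
```

When the request is sent directly from a program rather than through a shell, only the `json.dumps` step is needed.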
Note
The PQL endpoint is deprecated and will soon be removed. The standard-SQL endpoint is the recommended endpoint.
The PQL endpoint can be accessed by sending a POST request with a JSON body containing the parameter pql to the /query endpoint on a broker.
When TLS/SSL is not enabled:
When TLS/SSL is enabled:
If the PQL statement contains a single quote (') and the JSON body is passed on the command line inside single quotes, each ' in the statement needs to be replaced by '"'"', for example:
The Query Console can be used for running ad hoc queries (a checkbox is available to query the PQL endpoint). The Query Console can be accessed by entering <controller host>:<controller port> in your browser.
You can also query using the pinot-admin scripts. Make sure you follow the instructions in Getting Pinot to get Pinot locally, and then
All user APIs available in Pinot
The full up-to-date list of APIs can be viewed on Swagger.
List all the cluster configs. These are fetched from the CONFIGS/CLUSTER/<clusterName> znode in ZooKeeper.
Request
Response
Post new configs to the cluster. These are stored in the same znode as above, i.e. CONFIGS/CLUSTER/<clusterName>. A property whose key is new is appended to the existing properties; a property whose key already exists is updated.
Request
Response
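The append-or-update behavior described for posting configs is the same as a dictionary merge. The sketch below only simulates that behavior locally in Python; it does not talk to the controller, and the config keys and values are made up for illustration.

```python
def apply_cluster_configs(existing, posted):
    """Simulate posting configs: new keys are added, existing keys are overwritten."""
    merged = dict(existing)  # copy, so the caller's dict is left untouched
    merged.update(posted)
    return merged


# Hypothetical config keys and values, for illustration only.
current = {"example.key.a": "1", "example.key.b": "x"}
update = {"example.key.b": "y", "example.key.c": "true"}
print(apply_cluster_configs(current, update))
```

Here `example.key.b` is updated in place, `example.key.c` is appended, and `example.key.a` is left unchanged.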
Delete a cluster config.
Request
Response
Gets cluster-related info, such as the cluster name.
Request
Response
Check controller health. The status is either OK or a WebApplicationException with ServiceUnavailable and a message.
Request
Response
Gets the leader resource map, which shows the tables that are mapped to each leader.
Request
Response
Gets the leaders for a specific table.
Request
Response
Debug information for the table, which includes metadata and error status for the segments, ingestion, servers, and brokers of the table.
Request
Response
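The exact request and response bodies for the operations above are best taken from Swagger. As a rough sketch, the Python helper below builds (without sending) a request for each of them. The controller address `localhost:9000` and every path in `ASSUMED_PATHS` are assumptions, not details given in this section; verify them against the Swagger listing of your cluster before relying on them. The operation keys and the helper name are likewise illustrative.

```python
import urllib.request
from urllib.parse import quote

# Assumed (method, path template) per operation; confirm each one on Swagger.
ASSUMED_PATHS = {
    "list_cluster_configs": ("GET", "/cluster/configs"),
    "post_cluster_configs": ("POST", "/cluster/configs"),
    "delete_cluster_config": ("DELETE", "/cluster/configs/{name}"),
    "cluster_info": ("GET", "/cluster/info"),
    "health": ("GET", "/health"),
    "leader_map": ("GET", "/leader/tables"),
    "table_leaders": ("GET", "/leader/tables/{name}"),
    "table_debug": ("GET", "/debug/tables/{name}"),
}


def controller_request(operation, name="", body=None, base="http://localhost:9000"):
    """Build an unsent request; `body` must be JSON-encoded bytes when given."""
    method, template = ASSUMED_PATHS[operation]
    path = template.format(name=quote(name, safe=""))  # escape the path segment
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(base + path, data=body, headers=headers, method=method)
```

For example, `controller_request("table_debug", name="baseballStats")` yields a GET request that can be passed to `urllib.request.urlopen` once the path has been confirmed.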
The response is returned in a SQL-like tabular structure. Note that this is the response returned from the standard-SQL endpoint. For the PQL endpoint response, skip to PQL endpoint response.
Note
The PQL endpoint is deprecated and will soon be removed. The standard-SQL endpoint is the recommended endpoint.
The response received from the PQL endpoint differs depending on the type of query.
The Pinot Admin UI contains all the APIs that you will need to operate and manage your cluster. It provides a set of APIs for Pinot cluster management, including health checks, instance management, schema and table management, and data segment management.
Note: The controller APIs are primarily for admin tasks. Even though the UI console queries Pinot when running queries from the Query Console, use the Broker Query API for querying Pinot.
Let's check out the tables in this cluster by going to Table -> List all tables in cluster and clicking Try it out!. We can see the baseballStats table listed here. We can also see the exact curl call made to the controller API.
You can look at the configuration of this table by going to Tables -> Get/Enable/Disable/Drop a table, typing baseballStats in the table name, and clicking Try it out!.
Let's check out the schemas in the cluster by going to Schema -> List all schemas in the cluster and clicking Try it out!. We can see a schema called baseballStats in this list.
Take a look at the schema by going to Schema -> Get a schema, typing baseballStats in the schema name, and clicking Try it out!.
Finally, let's check out the data segments in the cluster by going to List all segments, typing baseballStats in the table name, and clicking Try it out!. There is one segment for this table, called baseballStats_OFFLINE_0.
As you might have figured out by now, to get data into the Pinot cluster we need a table, a schema, and segments. Let's head over to Batch upload sample data to find out more about these components and learn how to create them for your own data.
Response Field | Description |
---|---|
resultTable | This contains everything needed to process the response. |
resultTable.dataSchema | This describes the schema of the response (columnNames and their dataTypes). |
resultTable.dataSchema.columnNames | Column names in the response. |
resultTable.dataSchema.columnDataTypes | Data types for each column. |
resultTable.rows | Actual content with values. This is an array of arrays. The number of rows depends on the limit value in the query. The number of columns in each row is equal to the length of resultTable.dataSchema.columnNames. |
timeUsedMs | Total time taken as seen by the broker before sending the response back to the client. |
totalDocs | The number of documents/records in the table. |
numServersQueried | The number of servers queried by the broker (note that this may be less than the total number of servers, since the broker can apply some optimizations to minimize the number of servers). |
numServersResponded | This should be equal to numServersQueried. If it is not, one or more servers might have timed out. If numServersQueried != numServersResponded, the results can be considered partial and clients can retry the query with exponential back-off. |
numSegmentsQueried | Total number of segments queried for this query. It may be less than the total number of segments, since the broker can apply optimizations. |
numSegmentsMatched | The number of segments processed with at least one document matched in the query response. In general, numSegmentsQueried >= numSegmentsProcessed >= numSegmentsMatched. |
numSegmentsProcessed | Number of segment operators used to process segments. This indicates the effectiveness of the pruning logic. |
numDocsScanned | The number of docs/records that were selected after the filter phase. |
numEntriesScannedInFilter | The number of entries scanned in the filtering phase of query execution. It can be larger than the total scanned doc count because of multiple filtering predicates and/or multi-value entries. It can also be smaller than the total scanned doc count if indexing is used for filtering. Together with numEntriesScannedPostFilter, this gives an idea of where most of the time is spent during query processing. If this is high, enabling indexing for the affected columns in the table config can be one way to bring it down. |
numEntriesScannedPostFilter | The number of entries scanned after the filtering phase of query execution, i.e. in the aggregation and/or group-by phases. This is equivalent to numDocsScanned * number of projected columns. Together with numEntriesScannedInFilter, this gives an idea of where most of the time is spent during query processing. A high number means the selectivity is low (i.e. Pinot needs to scan a lot of records to answer the query). If this is high, adding a regular inverted/bitmap index will not help; consider using a star-tree index instead. |
numGroupsLimitReached | If the query has a group by clause and top K, Pinot drops new entries after numGroupsLimit is reached. If this boolean is set to true, the query result may not be accurate. Note that the default value for numGroupsLimit is 100k, which should be sufficient for most use cases. |
exceptions | Will contain the stack trace if there is any exception while processing the query. |
segmentStatistics | N/A |
traceInfo | If trace is enabled (it can be enabled for each query), this will contain the timing for each stage and each segment. This is an advanced feature intended for dev/debugging purposes. |
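To make the field descriptions concrete, here is a small Python sketch that turns resultTable into a list of dictionaries, flags partial results by comparing numServersQueried with numServersResponded, and computes exponential back-off delays for a retry. The sample response is hand-written for illustration and is not real broker output; the column names, helper names, and back-off parameters are all assumptions.

```python
def rows_as_dicts(response):
    """Pair each row in resultTable.rows with resultTable.dataSchema.columnNames."""
    table = response["resultTable"]
    names = table["dataSchema"]["columnNames"]
    return [dict(zip(names, row)) for row in table["rows"]]


def is_partial(response):
    """True when the number of responding servers differs from the number queried."""
    return response["numServersResponded"] != response["numServersQueried"]


def backoff_delays(retries, base_seconds=0.5, factor=2.0):
    """Seconds to wait before each retry: base, base*factor, base*factor**2, ..."""
    return [base_seconds * factor ** attempt for attempt in range(retries)]


# Hand-written sample shaped like the fields described above (not real output).
sample = {
    "resultTable": {
        "dataSchema": {
            "columnNames": ["playerName", "runs"],
            "columnDataTypes": ["STRING", "INT"],
        },
        "rows": [["A", 10], ["B", 7]],
    },
    "exceptions": [],
    "numServersQueried": 2,
    "numServersResponded": 1,
}

print(rows_as_dicts(sample))
if is_partial(sample):
    print("partial result; retry after", backoff_delays(3), "seconds")
```

Keeping the partial-result check separate from row parsing lets a client decide whether to use the partial rows or wait and retry.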