APIs

Broker Query API

REST API on the Broker

Pinot can be queried through a broker endpoint as follows. These examples assume the broker is running on localhost:8099.

Standard-SQL endpoint

The Pinot REST API can be accessed by sending a POST request, with a JSON body containing the parameter sql, to the /query/sql endpoint on a broker.

When TLS/SSL is not enabled:

$ curl -H "Content-Type: application/json" -X POST \
   -d '{"sql":"select foo, count(*) from myTable group by foo limit 100"}' \
   http://localhost:8099/query/sql

When TLS/SSL is enabled:

$ curl -k -H "Content-Type: application/json" -X POST \
   -d '{"sql":"select foo, count(*) from myTable group by foo limit 100"}' \
   https://localhost:8099/query/sql

If the SQL statement contains a single quote ('), replace it with '"'"' in the JSON body, because the body itself is wrapped in single quotes on the command line. For example:

$ curl -H "Content-Type: application/json" -X POST \
   -d '{"sql":"select foo, count(*) from myTable where foo='"'"'abc'"'"' limit 100"}' \
   http://localhost:8099/query/sql

PQL endpoint

Note

This endpoint is deprecated and will soon be removed. The standard-SQL endpoint is the recommended endpoint.

The PQL endpoint can be accessed by sending a POST request, with a JSON body containing the parameter pql, to the /query endpoint on a broker.

When TLS/SSL is not enabled:

$ curl -H "Content-Type: application/json" -X POST \
   -d '{"pql":"select count(*) from myTable group by foo top 100"}' \
   http://localhost:8099/query

When TLS/SSL is enabled:

$ curl -k -H "Content-Type: application/json" -X POST \
   -d '{"pql":"select count(*) from myTable group by foo top 100"}' \
   https://localhost:8099/query

If the PQL statement contains a single quote ('), replace it with '"'"' in the JSON body. For example:

$ curl -H "Content-Type: application/json" -X POST \
   -d '{"pql":"select count(*) from myTable where foo='"'"'abc'"'"' top 100"}' \
   http://localhost:8099/query

Query Console

The Query Console can be used for running ad hoc queries (a checkbox is available to query the PQL endpoint). Open it by entering <controller host>:<controller port> in your browser.

pinot-admin

You can also query using the pinot-admin scripts. Follow the instructions in Getting Pinot to get Pinot locally, and then run:

cd incubator-pinot/pinot-tools/target/pinot-tools-pkg
bin/pinot-admin.sh PostQuery \
  -queryType sql \
  -brokerPort 8000 \
  -query "select count(*) from baseballStats"
2020/03/04 12:46:33.459 INFO [PostQueryCommand] [main] Executing command: PostQuery -brokerHost localhost -brokerPort 8000 -queryType sql -query select count(*) from baseballStats
2020/03/04 12:46:33.854 INFO [PostQueryCommand] [main] Result: {"resultTable":{"dataSchema":{"columnDataTypes":["LONG"],"columnNames":["count(*)"]},"rows":[[97889]]},"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":1,"numSegmentsProcessed":1,"numSegmentsMatched":1,"numConsumingSegmentsQueried":0,"numDocsScanned":97889,"numEntriesScannedInFilter":0,"numEntriesScannedPostFilter":0,"numGroupsLimitReached":false,"totalDocs":97889,"timeUsedMs":185,"segmentStatistics":[],"traceInfo":{},"minConsumingFreshnessTimeMs":0}
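
The POST request to /query/sql described above can also be issued from a program. The following is a minimal sketch in Python using only the standard library. It assumes the broker address localhost:8099 from the examples; the names build_sql_request and run_sql are illustrative and not part of Pinot. Because json.dumps produces the JSON body, a single quote in the SQL needs no '"'"' workaround here.

```python
import json
import urllib.request

# Assumption: broker address taken from the curl examples above.
BROKER_URL = "http://localhost:8099/query/sql"


def build_sql_request(sql, url=BROKER_URL):
    """Build (but do not send) a POST request for the /query/sql endpoint."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(sql, url=BROKER_URL, timeout=10):
    """Send the query to a running broker and return the parsed JSON response."""
    request = build_sql_request(sql, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With a broker running, run_sql("select foo, count(*) from myTable where foo='abc' limit 100") would return the parsed response as a dictionary. For a TLS-enabled broker, pass an https URL instead.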

Controller Admin API

The Pinot Admin UI contains all the APIs that you will need to operate and manage your cluster. It provides a set of APIs for Pinot cluster management, including health checks, instance management, schema and table management, and data segment management.

Note: The controller APIs are primarily for admin tasks. Even though the UI console queries Pinot when you run queries from the query console, use the Broker Query API for querying Pinot.

Let's check out the tables in this cluster by going to Table -> List all tables in cluster and clicking Try it out!. We can see the baseballStats table listed here. We can also see the exact curl call made to the controller API.

You can look at the configuration of this table by going to Tables -> Get/Enable/Disable/Drop a table, typing baseballStats in the table name, and clicking Try it out!.

Let's check out the schemas in the cluster by going to Schema -> List all schemas in the cluster and clicking Try it out!. We can see a schema called baseballStats in this list.

Take a look at the schema by going to Schema -> Get a schema, typing baseballStats in the schema name, and clicking Try it out!.

Finally, let's check out the data segments in the cluster by going to List all segments, typing baseballStats in the table name, and clicking Try it out!. There is 1 segment for this table, called baseballStats_OFFLINE_0.

As you might have figured out by now, to get data into the Pinot cluster we need a table, a schema, and segments. Head over to Batch upload sample data to find out more about these components and learn how to create them for your own data.

The baseballStats schema looks like this:
{
  "schemaName": "baseballStats",
  "dimensionFieldSpecs": [
    {
      "name": "playerID",
      "dataType": "STRING"
    },
    {
      "name": "yearID",
      "dataType": "INT"
    },
    {
      "name": "teamID",
      "dataType": "STRING"
    },
    {
      "name": "league",
      "dataType": "STRING"
    },
    {
      "name": "playerName",
      "dataType": "STRING"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "playerStint",
      "dataType": "INT"
    },
    {
      "name": "numberOfGames",
      "dataType": "INT"
    },
    {
      "name": "numberOfGamesAsBatter",
      "dataType": "INT"
    },
    {
      "name": "AtBatting",
      "dataType": "INT"
    },
    {
      "name": "runs",
      "dataType": "INT"
    },
    {
      "name": "hits",
      "dataType": "INT"
    },
    {
      "name": "doules",
      "dataType": "INT"
    },
    {
      "name": "tripples",
      "dataType": "INT"
    },
    {
      "name": "homeRuns",
      "dataType": "INT"
    },
    {
      "name": "runsBattedIn",
      "dataType": "INT"
    },
    {
      "name": "stolenBases",
      "dataType": "INT"
    },
    {
      "name": "caughtStealing",
      "dataType": "INT"
    },
    {
      "name": "baseOnBalls",
      "dataType": "INT"
    },
    {
      "name": "strikeouts",
      "dataType": "INT"
    },
    {
      "name": "intentionalWalks",
      "dataType": "INT"
    },
    {
      "name": "hitsByPitch",
      "dataType": "INT"
    },
    {
      "name": "sacrificeHits",
      "dataType": "INT"
    },
    {
      "name": "sacrificeFlies",
      "dataType": "INT"
    },
    {
      "name": "groundedIntoDoublePlays",
      "dataType": "INT"
    },
    {
      "name": "G_old",
      "dataType": "INT"
    }
  ]
}
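
A schema document like the one above can be inspected programmatically once it has been fetched. The following minimal Python sketch groups column names and data types by field-spec group. It works on a trimmed copy of the baseballStats schema; the helper name columns_by_role is illustrative, and treating every key that ends in "FieldSpecs" as a column group is an assumption based on the two groups shown here.

```python
import json

# Trimmed copy of the baseballStats schema shown above (two fields per group).
SCHEMA_JSON = """
{
  "schemaName": "baseballStats",
  "dimensionFieldSpecs": [
    {"name": "playerID", "dataType": "STRING"},
    {"name": "yearID", "dataType": "INT"}
  ],
  "metricFieldSpecs": [
    {"name": "runs", "dataType": "INT"},
    {"name": "hits", "dataType": "INT"}
  ]
}
"""


def columns_by_role(schema):
    """Map each field-spec group in a schema to {column name: data type}."""
    roles = {}
    for key, specs in schema.items():
        # Assumption: column groups are the keys ending in "FieldSpecs".
        if key.endswith("FieldSpecs"):
            roles[key] = {spec["name"]: spec["dataType"] for spec in specs}
    return roles


schema = json.loads(SCHEMA_JSON)
roles = columns_by_role(schema)
print(roles["dimensionFieldSpecs"])
print(roles["metricFieldSpecs"])
```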

Controller API Reference

All user APIs available in Pinot

The full up-to-date list of APIs can be viewed on Swagger.

Cluster

GET /cluster/configs

Lists all the cluster configs. These are fetched from ZooKeeper, from the CONFIGS/CLUSTER/<clusterName> znode.

Request

curl -X GET "http://localhost:9000/cluster/configs" -H "accept: application/json"

Response

{
  "allowParticipantAutoJoin": "true",
  "enable.case.insensitive": "false",
  "pinot.broker.enable.query.limit.override": "false",
  "default.hyperloglog.log2m": "8"
}

POST /cluster/configs

Posts new configs to the cluster. These are stored in the same znode as above, i.e. CONFIGS/CLUSTER/<clusterName>. A property whose key is new is added to the existing properties; a property whose key already exists is updated.

Request

curl -X POST "http://localhost:9000/cluster/configs" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d "{ \"pinot.helix.instance.state.maxStateTransitions\" : \"20\", \"custom.cluster.prop\": \"foo\"}"

Response

{
  "status": "Updated cluster config."
}

DELETE /cluster/configs

Deletes a cluster config. In the request below, the config key is given as the last path segment.

Request

curl -X DELETE "http://localhost:9000/cluster/configs/custom.cluster.prop"

Response

{
  "status": "Deleted cluster config: custom.cluster.prop"
}

GET /cluster/info

Gets cluster-related info, such as the cluster name.

Request

curl -X GET "http://localhost:9000/cluster/info" -H "accept: application/json"

Response

{
  "clusterName": "QuickStartCluster"
}

Health

GET /health

Checks controller health. The status is either OK or a WebApplicationException with ServiceUnavailable and a message.

Request

curl -X GET "http://localhost:9000/health" -H "accept: text/plain"

Response

OK

Leader

GET /leader/tables

Gets the leader resource map, which shows the tables that are mapped to each leader.

Request

curl -X GET "http://localhost:9000/leader/tables" -H "accept: application/json"

Response

{
  "leadControllerEntryMap": {
    "leadControllerResource_0": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_1": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_2": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_3": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_4": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_5": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_6": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_7": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": [
        "baseballStats_OFFLINE"
      ]
    },
    "leadControllerResource_8": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": [
        "dimBaseballTeams_OFFLINE",
        "starbucksStores_OFFLINE"
      ]
    },
    "leadControllerResource_9": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": [
        "billing_OFFLINE"
      ]
    },
    "leadControllerResource_10": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_11": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_12": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_13": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": [
        "githubComplexTypeEvents_OFFLINE"
      ]
    },
    "leadControllerResource_14": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_15": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": [
        "githubEvents_OFFLINE"
      ]
    },
    "leadControllerResource_16": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_17": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_18": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_19": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": [
        "airlineStats_OFFLINE"
      ]
    },
    "leadControllerResource_20": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_21": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_22": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    },
    "leadControllerResource_23": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": []
    }
  },
  "leadControllerResourceEnabled": true
}

GET /leader/tables/<tableName>

Gets the leaders for the specified table.

Request

curl -X GET "http://localhost:9000/leader/tables/baseballStats" -H "accept: application/json"

Response

{
  "leadControllerEntryMap": {
    "leadControllerResource_7": {
      "leadControllerId": "Controller_192.168.1.24_9000",
      "tableNames": [
        "baseballStats"
      ]
    }
  },
  "leadControllerResourceEnabled": true
}

Table

GET /debug/tables/<tableName>

Returns debug information for the table, including metadata and error status for the table's segments, ingestion, servers, and brokers.

Request

curl -X GET "http://localhost:9000/debug/tables/baseballStats?type=OFFLINE&verbosity=0" -H "accept: application/json"

Response

[
  {
    "tableName": "baseballStats_OFFLINE",
    "numSegments": 1,
    "numServers": 1,
    "numBrokers": 1,
    "segmentDebugInfos": [],
    "serverDebugInfos": [],
    "brokerDebugInfos": [],
    "tableSize": {
      "reportedSize": "3 MB",
      "estimatedSize": "3 MB"
    },
    "ingestionStatus": {
      "ingestionState": "HEALTHY",
      "errorMessage": ""
    }
  }
]
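
The GET /leader/tables response shown earlier is keyed by lead controller resource, so finding the tables handled by a given controller takes a small transformation. The following is a minimal Python sketch of that step, using an abbreviated sample in the same shape as that response. The helper name tables_by_leader is illustrative and not part of Pinot.

```python
def tables_by_leader(leader_response):
    """Group table names by lead controller ID from a /leader/tables response."""
    grouped = {}
    for entry in leader_response["leadControllerEntryMap"].values():
        grouped.setdefault(entry["leadControllerId"], []).extend(entry["tableNames"])
    return {leader: sorted(tables) for leader, tables in grouped.items()}


# Abbreviated sample in the shape of the GET /leader/tables response.
sample = {
    "leadControllerEntryMap": {
        "leadControllerResource_7": {
            "leadControllerId": "Controller_192.168.1.24_9000",
            "tableNames": ["baseballStats_OFFLINE"],
        },
        "leadControllerResource_8": {
            "leadControllerId": "Controller_192.168.1.24_9000",
            "tableNames": ["dimBaseballTeams_OFFLINE", "starbucksStores_OFFLINE"],
        },
        "leadControllerResource_10": {
            "leadControllerId": "Controller_192.168.1.24_9000",
            "tableNames": [],
        },
    },
    "leadControllerResourceEnabled": True,
}

print(tables_by_leader(sample))
```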

Query Response Format

This page shows Pinot query response examples for selection, aggregation, and group-by queries, returned in a SQL-like tabular structure, and describes each field in the Pinot broker query response.

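To learn more about how a Pinot broker routes and processes queries, how it computes the query explain plan, and ways to optimize queries, see the following topics:

  • Processing queries

  • Query explain plans: Single-stage query engine, Multi-stage query engine

  • Optimizing query routing

  • Use adaptive server selection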

Standard-SQL response

The query response is returned in a SQL-like tabular structure from the standard-SQL endpoint.

Broker query response fields

Response Field
Description

totalDocs

Number of documents/records in the table.

numServersQueried

The number of servers queried by the broker. This may be less than the total number of servers, since the broker can apply optimizations to minimize the number of servers queried.

numServersResponded

This should be equal to numServersQueried. If it is not, one or more servers might have timed out. If numServersQueried != numServersResponded, the results can be considered partial, and clients can retry the query with exponential backoff.

numSegmentsQueried

The total number of segments queried for a query. May be less than the total number of segments if the broker applies optimizations.

The broker decides how many segments to query on each server, based on broker pruning logic. The server decides how many of these segments to actually look at, based on server pruning logic. After processing segments for a query, fewer may have the matching records. In general, numSegmentsQueried >= numSegmentsProcessed >= numSegmentsMatched.

numSegmentsMatched

The number of segments processed with at least one document matched in the query response.

numSegmentsProcessed

The number of segment operators used to process segments. Indicates the effectiveness of the pruning logic. For more information, see the query plans for the single-stage query engine and the multi-stage query engine.

numDocsScanned

The number of docs/records selected after the filter phase.

numEntriesScannedInFilter

The number of entries scanned in the filtering phase of query execution.

Can be larger than the total scanned doc count because of multiple filtering predicates or multi-value entries.

Can also be smaller than the total scanned doc count if indexing is used for filtering.

Together with numEntriesScannedPostFilter, this indicates where most of the time is spent during query processing. If this value is high, enabling indexing for the affected columns in the table config is one way to bring it down.

numEntriesScannedPostFilter

The number of entries scanned after the filtering phase of query execution, i.e. in the aggregation and/or group-by phases. This is equivalent to numDocsScanned * number of projected columns.

This along with numEntriesScannedInFilter indicates where most of the time is spent during query processing.

A high value means the selectivity is low (that is, Pinot needs to scan a lot of records to answer the query). If this is high, consider using a star-tree index. (A regular inverted/bitmap index won't improve performance.)

numGroupsLimitReached

If the query has a group by clause and top K, Pinot drops new entries after the numGroupsLimit is reached. If this boolean is set to true, the query result may not be accurate. The default value for numGroupsLimit is 100k, and should be sufficient for most use cases.

exceptions

Contains the stack trace if an exception occurs while processing the query.

segmentStatistics

N/A

stageStats

In multi-stage queries, this field contains the stats for each stage. See Understanding multi-stage stats to learn how to interpret them.

resultTable

Contains everything needed to process the response

resultTable.dataSchema

Describes the schema of the response, including columnNames and their dataTypes

resultTable.dataSchema.columnNames

columnNames in the response

resultTable.dataSchema.columnDataTypes

dataTypes for each column

resultTable.rows

Actual content with values. This is an array of arrays. The number of rows depends on the limit value in the query. The number of columns in each row is equal to the length of resultTable.dataSchema.columnNames

timeUsedMs

Total time taken as seen by the broker before sending the response back to the client.
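
The numServersResponded description above suggests retrying with exponential backoff when a response is partial. The following is a minimal Python sketch of that client-side pattern, not a Pinot client API. The names is_partial and query_with_backoff, the attempt limit, and the base delay are all assumptions chosen for illustration; run_query stands for any zero-argument callable that returns the parsed broker response.

```python
import time


def is_partial(response):
    """True when fewer servers responded than were queried."""
    return response["numServersQueried"] != response["numServersResponded"]


def query_with_backoff(run_query, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call run_query() until the response is complete or attempts run out.

    The delay doubles after each partial response: base_delay, then
    2 * base_delay, and so on. The last response is returned either way.
    """
    response = run_query()
    for attempt in range(1, max_attempts):
        if not is_partial(response):
            break
        sleep(base_delay * 2 ** (attempt - 1))
        response = run_query()
    return response
```

The sleep parameter is injectable so the retry logic can be exercised without waiting. A caller that receives a still-partial response after the last attempt can decide whether to use it or fail.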

The following examples show the standard-SQL response for a selection query, an aggregation query, and a group-by query:
$ curl -H "Content-Type: application/json" -X POST \
   -d '{"sql":"SELECT moo, bar, foo FROM myTable ORDER BY foo DESC"}' \
   http://localhost:8099/query/sql
{
  "exceptions": [], 
  "minConsumingFreshnessTimeMs": 0, 
  "numConsumingSegmentsQueried": 0, 
  "numDocsScanned": 6, 
  "numEntriesScannedInFilter": 0, 
  "numEntriesScannedPostFilter": 18, 
  "numGroupsLimitReached": false, 
  "numSegmentsMatched": 2, 
  "numSegmentsProcessed": 2, 
  "numSegmentsQueried": 2, 
  "numServersQueried": 1, 
  "numServersResponded": 1, 
  "resultTable": {
    "dataSchema": {
      "columnDataTypes": [
        "LONG",
        "INT",
        "STRING"
      ], 
      "columnNames": [
        "moo", 
        "bar",
        "foo"
      ]
    }, 
    "rows": [
      [ 
        40015, 
        2019,
        "xyz"
      ], 
      [
        1002,
        2001,
        "pqr"
      ], 
      [
        20555,
        1988,
        "pqr"
      ],
      [ 
        203,
        2010,
        "pqr"
      ], 
      [
        500,
        2008,
        "abc"
      ], 
      [
        60, 
        2003,
        "abc"
      ]
    ]
  }, 
  "segmentStatistics": [], 
  "timeUsedMs": 4, 
  "totalDocs": 6, 
  "traceInfo": {}
}
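
Because resultTable.rows is an array of arrays whose positions line up with resultTable.dataSchema.columnNames, a client usually pairs the two before using the data. The following minimal Python sketch does that for the first two rows of the selection response above. The helper name rows_as_dicts is illustrative.

```python
def rows_as_dicts(response):
    """Pair each row in resultTable.rows with resultTable.dataSchema.columnNames."""
    table = response["resultTable"]
    names = table["dataSchema"]["columnNames"]
    return [dict(zip(names, row)) for row in table["rows"]]


# First two rows of the selection response above.
sample = {
    "resultTable": {
        "dataSchema": {
            "columnDataTypes": ["LONG", "INT", "STRING"],
            "columnNames": ["moo", "bar", "foo"],
        },
        "rows": [[40015, 2019, "xyz"], [1002, 2001, "pqr"]],
    }
}

for record in rows_as_dicts(sample):
    print(record)
```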
$ curl -X POST \
  -d '{"sql":"SELECT SUM(moo), MAX(bar), COUNT(*) FROM myTable"}' \
  localhost:8099/query/sql -H "Content-Type: application/json" 
{
  "exceptions": [], 
  "minConsumingFreshnessTimeMs": 0, 
  "numConsumingSegmentsQueried": 0, 
  "numDocsScanned": 6, 
  "numEntriesScannedInFilter": 0, 
  "numEntriesScannedPostFilter": 12, 
  "numGroupsLimitReached": false, 
  "numSegmentsMatched": 2, 
  "numSegmentsProcessed": 2, 
  "numSegmentsQueried": 2, 
  "numServersQueried": 1, 
  "numServersResponded": 1, 
  "resultTable": {
    "dataSchema": {
      "columnDataTypes": [
        "DOUBLE", 
        "DOUBLE", 
        "LONG"
      ], 
      "columnNames": [
        "sum(moo)", 
        "max(bar)", 
        "count(*)"
      ]
    }, 
    "rows": [
      [
        62335, 
        2019.0, 
        6
      ]
    ]
  }, 
  "segmentStatistics": [], 
  "timeUsedMs": 87, 
  "totalDocs": 6, 
  "traceInfo": {}
}
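
The relation numEntriesScannedPostFilter = numDocsScanned * number of projected columns can be checked against the two responses above. The following minimal Python sketch divides one statistic by the other. The helper name entries_per_doc is illustrative. The selection query projects three columns (moo, bar, foo), and the aggregation result is consistent with two columns (moo and bar) being read, presumably because COUNT(*) needs no column values.

```python
def entries_per_doc(response):
    """Post-filter entries scanned per selected document (None if no docs)."""
    docs = response["numDocsScanned"]
    if docs == 0:
        return None
    return response["numEntriesScannedPostFilter"] / docs


# Statistics copied from the selection and aggregation responses above.
selection = {"numDocsScanned": 6, "numEntriesScannedPostFilter": 18}
aggregation = {"numDocsScanned": 6, "numEntriesScannedPostFilter": 12}

print(entries_per_doc(selection))    # 3.0
print(entries_per_doc(aggregation))  # 2.0
```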
$ curl -X POST \
  -d '{"sql":"SELECT foo, SUM(moo), MAX(bar) FROM myTable GROUP BY foo ORDER BY foo"}' \
  localhost:8099/query/sql -H "Content-Type: application/json" 
{
  "exceptions": [], 
  "minConsumingFreshnessTimeMs": 0, 
  "numConsumingSegmentsQueried": 0, 
  "numDocsScanned": 6, 
  "numEntriesScannedInFilter": 0, 
  "numEntriesScannedPostFilter": 18, 
  "numGroupsLimitReached": false, 
  "numSegmentsMatched": 2, 
  "numSegmentsProcessed": 2, 
  "numSegmentsQueried": 2, 
  "numServersQueried": 1, 
  "numServersResponded": 1, 
  "resultTable": {
    "dataSchema": {
      "columnDataTypes": [
        "STRING", 
        "DOUBLE", 
        "DOUBLE"
      ], 
      "columnNames": [
        "foo", 
        "sum(moo)", 
        "max(bar)"
      ]
    }, 
    "rows": [
      [
        "abc", 
        560.0, 
        2008.0
      ], 
      [
        "pqr", 
        21760.0, 
        2010.0
      ], 
      [
        "xyz", 
        40015.0, 
        2019.0
      ]
    ]
  }, 
  "segmentStatistics": [], 
  "timeUsedMs": 15, 
  "totalDocs": 6, 
  "stageStats": {}
}
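
Several of the response fields described on this page signal that a result may be incomplete or inaccurate: exceptions, a mismatch between numServersQueried and numServersResponded, and numGroupsLimitReached. The following minimal Python sketch collects those signals into a list of warnings. The helper name response_warnings and the warning texts are illustrative, and the check covers only these three fields.

```python
def response_warnings(response):
    """Return human-readable reasons a broker response may be incomplete."""
    warnings = []
    if response.get("exceptions"):
        warnings.append("query raised exceptions")
    if response.get("numServersQueried") != response.get("numServersResponded"):
        warnings.append("not all queried servers responded")
    if response.get("numGroupsLimitReached"):
        warnings.append("numGroupsLimit reached; group-by result may be inaccurate")
    return warnings


# Relevant fields copied from the group-by response above.
group_by_response = {
    "exceptions": [],
    "numGroupsLimitReached": False,
    "numServersQueried": 1,
    "numServersResponded": 1,
}

print(response_warnings(group_by_response))
```

For the group-by response above the list is empty, so the result can be used as is.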