Presto
Integrate with Presto for ad-hoc queries with full SQL support
Start by running the Presto Docker image, which ships with a pre-built Presto Pinot connector.
Docker
Run the command below to start a standalone Presto coordinator.
    docker run \
      --network pinot-demo \
      --name=presto-coordinator \
      -p 8080:8080 \
      -d apachepinot/pinot-presto:latest
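The coordinator may need a few seconds before it accepts queries. A minimal sketch of a readiness check follows, assuming the coordinator serves Presto's `/v1/info` endpoint on the mapped port 8080 and that the JSON response carries a `starting` flag. The helper names `presto_is_ready` and `wait_for_presto` are hypothetical, and `curl` is assumed to be installed.

```shell
# Reads a /v1/info JSON document on stdin and succeeds only if it
# reports "starting": false (assumed field name; verify for your version).
presto_is_ready() {
  grep -Eq '"starting"[[:space:]]*:[[:space:]]*false'
}

# Polls the coordinator until it is ready or the attempts run out.
# Arguments: base URL (default http://localhost:8080), attempts (default 30).
wait_for_presto() {
  local url="${1:-http://localhost:8080}" attempts="${2:-30}" i
  for ((i = 0; i < attempts; i++)); do
    # If curl fails, the check sees empty input and fails as well.
    if curl -sf "${url}/v1/info" | presto_is_ready; then
      return 0
    fi
    sleep 2
  done
  return 1
}
```

Calling `wait_for_presto` right after `docker run` blocks until the coordinator reports that it has finished starting, or returns a non-zero status after roughly a minute.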
Then connect to Presto with the Presto CLI.
    if [[ ! -f "/tmp/presto-cli" ]]; then
      curl -L https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.228/presto-cli-0.228-executable.jar -o /tmp/presto-cli
      chmod +x /tmp/presto-cli
    fi
    /tmp/presto-cli --server localhost:8080 --catalog pinot_quickstart --schema default
Then write your own queries:
    presto:default> show tables;
        Table
    --------------
     airlinestats
    (1 row)

    Query 20200211_185652_00006_w6yfz, FINISHED, 1 node
    Splits: 19 total, 19 done (100.00%)
    0:00 [1 rows, 29B] [3 rows/s, 99B/s]

    presto:default> select count(*) as flights_from_ca_to_ny from airlinestats where originstate='CA' and deststate='NY';
     flights_from_ca_to_ny
    -----------------------
                        67
    (1 row)

    Query 20200211_190136_00018_w6yfz, FINISHED, 1 node
    Splits: 17 total, 17 done (100.00%)
    0:00 [1 rows, 8B] [5 rows/s, 42B/s]

    presto:default> select * from airlinestats limit 1;
    flightnum | origin | quarter | lateaircraftdelay | divactualelapsedtime | divwheelsons | divwheelsoffs | airtime | arrdel15 | divtotalgtimes | deptimeblk | destcitymarketid | divairportseqids | dayssinceepoch | deptime | month | crselapsedtime | deststatename | carrier |
    -----------+--------+---------+-------------------+----------------------+--------------+---------------+---------+----------+----------------+------------+------------------+------------------+----------------+---------+-------+----------------+---------------+---------+
    122 | DFW | 1 | -2147483648 | -2147483648 | | | 202 | 0 | | 0700-0759 | 32457 | | 16088 | 715 | 1 | 235 | California | AA |
    (1 row)

    Query 20200211_185719_00007_w6yfz, FINISHED, 1 node
    Splits: 17 total, 17 done (100.00%)
    0:02 [1 rows, 325B] [0 rows/s, 133B/s]
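The same queries can also be run without an interactive session, which is convenient for scripts. The sketch below wraps the CLI's `--execute` option and assumes the CLI was downloaded to `/tmp/presto-cli` as shown earlier. The function name `presto_query` and the `PRESTO_CLI` and `PRESTO_SERVER` variables are hypothetical names introduced for illustration.

```shell
# Path to the CLI downloaded earlier; override PRESTO_CLI to point elsewhere.
PRESTO_CLI="${PRESTO_CLI:-/tmp/presto-cli}"

# Runs one SQL statement against the pinot_quickstart catalog and exits.
# Argument: the SQL text to execute.
presto_query() {
  "$PRESTO_CLI" \
    --server "${PRESTO_SERVER:-localhost:8080}" \
    --catalog pinot_quickstart \
    --schema default \
    --execute "$1"
}
```

For example, `presto_query "select count(*) from airlinestats"` should print the result to standard output and return, so the output can be piped into other tools.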
Meanwhile, you can open the Presto Cluster UI to see query stats.
Presto Cluster UI

Advanced features

Using Pinot Streaming/gRPC connector

Presto supports aggregation and predicate push down to Pinot. However, for queries that Pinot cannot handle, Presto falls back to fetching all the rows from the Pinot table, segment by segment. This full scan is not an ideal access pattern for Pinot.
To support large data scans, Pinot (>=0.6.0) introduces a gRPC server for on-demand data scanning with a reasonably small memory footprint.
You can enable it by adding the configs below to the Pinot server config file:
    pinot.server.grpc.enable=true
    pinot.server.grpc.port=8090
Then enable the streaming connector in Presto (>=0.244) by adding the config below to the Pinot catalog config:
    pinot.use-streaming-for-segment-queries=true
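Both changes are plain `key=value` entries, so they can be applied from a script. The sketch below is one way to set such an entry idempotently; the function name `set_property` is hypothetical, and it assumes the target file uses one `key=value` pair per line with no spaces around `=`.

```shell
# Sets key=value in a Java-style properties file. Any existing line that
# starts with "key=" is dropped first, so repeated runs do not duplicate it.
set_property() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  if [ -f "$file" ]; then
    # Keep only the lines that do not begin with "key=".
    awk -v k="${key}=" 'index($0, k) != 1' "$file" > "$tmp"
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so an existing file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

With hypothetical file locations, the two steps become `set_property conf/pinot-server.conf pinot.server.grpc.enable true`, `set_property conf/pinot-server.conf pinot.server.grpc.port 8090`, and `set_property etc/catalog/pinot_quickstart.properties pinot.use-streaming-for-segment-queries true`. Substitute the paths used by your deployment. Such files are typically read at startup, so the affected process probably needs a restart afterwards.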
(Disclaimer: Presto is third-party software that is not part of the Apache Software Foundation.)