Presto Integrate with Presto for ad hoc queries with Full SQL
Start running Presto Image with pre-built Presto Pinot connector .
Docker
Run below command to start a standalone Presto coordinator.
Copy docker run \
--network pinot-demo \
--name=presto-coordinator \
-p 8080:8080 \
-d apachepinot/pinot-presto:latest
Then you can connect to presto with Presto-Cli .
Copy if [[ ! -f "/tmp/presto-cli" ]]; then
curl -L https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.228/presto-cli-0.228-executable.jar -o /tmp/presto-cli
chmod +x /tmp/presto-cli
fi
/tmp/presto-cli --server localhost:8080 --catalog pinot_quickstart --schema default
Then write your own queries;
Copy presto:default> show tables;
Table
--------------
airlinestats
(1 row)
Query 20200211_185652_00006_w6yfz, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [1 rows, 29B] [3 rows/s, 99B/s]
Copy presto:default> select count(*) as flights_from_ca_to_ny from airlinestats where originstate='CA' and deststate='NY';
flights_from_ca_to_ny
-----------------------
67
(1 row)
Query 20200211_190136_00018_w6yfz, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 8B] [5 rows/s, 42B/s]
Copy presto:default> select * from airlinestats limit 1;
flightnum | origin | quarter | lateaircraftdelay | divactualelapsedtime | divwheelsons | divwheelsoffs | airtime | arrdel15 | divtotalgtimes | deptimeblk | destcitymarketid | divairportseqids | dayssinceepoch | deptime | month | crselapsedtime | deststatename | carrier |
-----------+--------+---------+-------------------+----------------------+--------------+---------------+---------+----------+----------------+------------+------------------+------------------+----------------+---------+-------+----------------+---------------+---------+
122 | DFW | 1 | -2147483648 | -2147483648 | | | 202 | 0 | | 0700-0759 | 32457 | | 16088 | 715 | 1 | 235 | California | AA |
(1 row)
Query 20200211_185719_00007_w6yfz, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:02 [1 rows, 325B] [0 rows/s, 133B/s]
Meanwhile you can access Presto Cluster UI to see query stats.
Advanced features
Using Pinot Streaming/gRPC connector
For Pinot version [0.6.0, 0.10.0] and Presto Version [0.244, 0.277].
Pinot >=0.11.0 has enabled server gRPC by default;
Presto >=0.278 only supports gRPC for server queries, so this is supported by default.
Presto supports aggregation and predicate push down to Pinot. However, for certain queries that Pinot doesn't handle, Presto tries to fetch all the rows from the Pinot table segment by segment. This is definitely not an ideal access pattern for Pinot.
In order to support large data scanning, Pinot [0.6.0, 0.10.0] introduces a gRPC server for on-demand data scanning with a reasonably smaller memory footprint.
You can enable it by adding the below configs to the Pinot server config file:
Copy pinot.server.grpc.enable=true
pinot.server.grpc.port=8090
Then you can enable the streaming connector in Presto [0.244, 0.277] by adding the below config to the Pinot catalog configs.
Copy pinot.use-streaming-for-segment-queries=true
This config: pinot.use-streaming-for-segment-queries
is removed from prestodb 0.278, and will raise an error if you add it.
(Disclaimer: Presto is a third-party software that is not part of the Apache Software Foundation).
Last updated 7 months ago