Pull Docker image
Get the latest Docker image.
export PINOT_VERSION=latest
export PINOT_IMAGE=apachepinot/pinot:${PINOT_VERSION}
docker pull ${PINOT_IMAGE}
Long Version
Set up the Pinot cluster
Follow the instructions in Advanced Pinot Setup to set up a Pinot cluster with the following components: ZooKeeper, controller, broker, server, and Kafka.
Create a Kafka topic
Create a Kafka topic called pullRequestMergedEvents for the demo.
docker exec \
-t kafka \
/opt/kafka/bin/kafka-topics.sh \
--zookeeper pinot-zookeeper:2181/kafka \
--partitions=1 --replication-factor=1 \
--create --topic pullRequestMergedEvents
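To confirm that the topic exists before moving on, one option is to describe it with the same kafka-topics.sh tool. The sketch below assumes the container name (kafka), script path, and ZooKeeper address used in the command above; describe_topic is a hypothetical helper name, not part of Pinot or Kafka.

```shell
# Hypothetical helper: describe a topic from inside the "kafka" container.
# Assumes the container name, script path, and ZooKeeper address shown above.
describe_topic() {
  docker exec -t kafka \
    /opt/kafka/bin/kafka-topics.sh \
    --zookeeper pinot-zookeeper:2181/kafka \
    --describe --topic "$1"
}

# Usage: describe_topic pullRequestMergedEvents
```

If the topic was created, the output should list one partition with a replication factor of 1, matching the flags passed at creation time.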
Add Pinot table and schema
The schema is available at examples/stream/githubEvents/pullRequestMergedEvents_schema.json and is reproduced below.
pullRequestMergedEvents_schema.json
{
"schemaName" : "pullRequestMergedEvents" ,
"dimensionFieldSpecs" : [
{
"name" : "title" ,
"dataType" : "STRING" ,
"defaultNullValue" : ""
} ,
{
"name" : "labels" ,
"dataType" : "STRING" ,
"singleValueField" : false ,
"defaultNullValue" : ""
} ,
{
"name" : "userId" ,
"dataType" : "STRING" ,
"defaultNullValue" : ""
} ,
{
"name" : "userType" ,
"dataType" : "STRING" ,
"defaultNullValue" : ""
} ,
{
"name" : "authorAssociation" ,
"dataType" : "STRING" ,
"defaultNullValue" : ""
} ,
{
"name" : "mergedBy" ,
"dataType" : "STRING" ,
"defaultNullValue" : ""
} ,
{
"name" : "assignees" ,
"dataType" : "STRING" ,
"singleValueField" : false ,
"defaultNullValue" : ""
} ,
{
"name" : "authors" ,
"dataType" : "STRING" ,
"singleValueField" : false ,
"defaultNullValue" : ""
} ,
{
"name" : "committers" ,
"dataType" : "STRING" ,
"singleValueField" : false ,
"defaultNullValue" : ""
} ,
{
"name" : "requestedReviewers" ,
"dataType" : "STRING" ,
"singleValueField" : false ,
"defaultNullValue" : ""
} ,
{
"name" : "requestedTeams" ,
"dataType" : "STRING" ,
"singleValueField" : false ,
"defaultNullValue" : ""
} ,
{
"name" : "reviewers" ,
"dataType" : "STRING" ,
"singleValueField" : false ,
"defaultNullValue" : ""
} ,
{
"name" : "commenters" ,
"dataType" : "STRING" ,
"singleValueField" : false ,
"defaultNullValue" : ""
} ,
{
"name" : "repo" ,
"dataType" : "STRING" ,
"defaultNullValue" : ""
} ,
{
"name" : "organization" ,
"dataType" : "STRING" ,
"defaultNullValue" : ""
}
] ,
"metricFieldSpecs" : [
{
"name" : "count" ,
"dataType" : "LONG" ,
"defaultNullValue" : 1
} ,
{
"name" : "numComments" ,
"dataType" : "LONG"
} ,
{
"name" : "numReviewComments" ,
"dataType" : "LONG"
} ,
{
"name" : "numCommits" ,
"dataType" : "LONG"
} ,
{
"name" : "numLinesAdded" ,
"dataType" : "LONG"
} ,
{
"name" : "numLinesDeleted" ,
"dataType" : "LONG"
} ,
{
"name" : "numFilesChanged" ,
"dataType" : "LONG"
} ,
{
"name" : "numAuthors" ,
"dataType" : "LONG"
} ,
{
"name" : "numCommitters" ,
"dataType" : "LONG"
} ,
{
"name" : "numReviewers" ,
"dataType" : "LONG"
} ,
{
"name" : "numCommenters" ,
"dataType" : "LONG"
} ,
{
"name" : "createdTimeMillis" ,
"dataType" : "LONG"
} ,
{
"name" : "elapsedTimeMillis" ,
"dataType" : "LONG"
}
] ,
"timeFieldSpec" : {
"incomingGranularitySpec" : {
"timeType" : "MILLISECONDS" ,
"timeFormat" : "EPOCH" ,
"dataType" : "LONG" ,
"name" : "mergedTimeMillis"
}
}
}
The table config is available at examples/stream/githubEvents/docker/pullRequestMergedEvents_realtime_table_config.json and is reproduced below.
Note
If you're setting this up on a pre-configured cluster, set the properties stream.kafka.zk.broker.url and stream.kafka.broker.list to match the configuration of your Kafka cluster.
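For a pre-configured cluster, the two properties can be changed in a local copy of the table config without hand-editing. The following is a minimal sketch using sed; set_kafka_endpoints is a hypothetical helper, and it assumes each property sits on its own line as in the config below and that the new values contain no "|" or "&" characters.

```shell
# Hypothetical helper: rewrite the two Kafka endpoint properties in a
# local copy of the table config.
# Usage: set_kafka_endpoints <config.json> <zk_url> <broker_list>
set_kafka_endpoints() {
  config_file=$1
  zk_url=$2
  broker_list=$3
  tmp_file=$(mktemp)
  # Replace only the quoted value that follows each property name.
  sed \
    -e "s|\(\"stream.kafka.zk.broker.url\"[[:space:]]*:[[:space:]]*\)\"[^\"]*\"|\1\"${zk_url}\"|" \
    -e "s|\(\"stream.kafka.broker.list\"[[:space:]]*:[[:space:]]*\)\"[^\"]*\"|\1\"${broker_list}\"|" \
    "$config_file" > "$tmp_file" && mv "$tmp_file" "$config_file"
}

# Example (placeholder host names):
#   set_kafka_endpoints my_table_config.json "my-zk:2181/kafka" "my-broker:9092"
```

Writing to a temporary file and moving it into place avoids relying on sed -i, whose syntax differs between GNU and BSD sed.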
pullRequestMergedEvents_realtime_table_config.json
{
"tableName" : "pullRequestMergedEvents" ,
"tableType" : "REALTIME" ,
"segmentsConfig" : {
"timeColumnName" : "mergedTimeMillis" ,
"timeType" : "MILLISECONDS" ,
"retentionTimeUnit" : "DAYS" ,
"retentionTimeValue" : "60" ,
"schemaName" : "pullRequestMergedEvents" ,
"replication" : "1" ,
"replicasPerPartition" : "1"
} ,
"tenants" : {} ,
"tableIndexConfig" : {
"loadMode" : "MMAP" ,
"invertedIndexColumns" : [
"organization" ,
"repo"
] ,
"streamConfigs" : {
"streamType" : "kafka" ,
"stream.kafka.consumer.type" : "simple" ,
"stream.kafka.topic.name" : "pullRequestMergedEvents" ,
"stream.kafka.decoder.class.name" : "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder" ,
"stream.kafka.consumer.factory.class.name" : "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory" ,
"stream.kafka.zk.broker.url" : "pinot-zookeeper:2181/kafka" ,
"stream.kafka.broker.list" : "kafka:9092" ,
"realtime.segment.flush.threshold.time" : "12h" ,
"realtime.segment.flush.threshold.size" : "100000" ,
"stream.kafka.consumer.prop.auto.offset.reset" : "smallest"
}
} ,
"metadata" : {
"customConfigs" : {}
}
}
Add the table and schema using the following command:
$ docker run \
--network=pinot-demo \
--name pinot-streaming-table-creation \
${PINOT_IMAGE} AddTable \
-schemaFile examples/stream/githubEvents/pullRequestMergedEvents_schema.json \
-tableConfigFile examples/stream/githubEvents/docker/pullRequestMergedEvents_realtime_table_config.json \
-controllerHost pinot-controller \
-controllerPort 9000 \
-exec
Executing command: AddTable -tableConfigFile examples/stream/githubEvents/docker/pullRequestMergedEvents_realtime_table_config.json -schemaFile examples/stream/githubEvents/pullRequestMergedEvents_schema.json -controllerHost pinot-controller -controllerPort 9000 -exec
Sending request: http://pinot-controller:9000/schemas to controller: 20c241022a96, version: Unknown
{ "status" : "Table pullRequestMergedEvents_REALTIME succesfully added" }
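As an optional check, the stored table config can be read back from the controller's REST API. This sketch assumes the controller's port 9000 is reachable from where the command runs (for example, published on localhost); CONTROLLER_URL and show_table_config are hypothetical names introduced for illustration.

```shell
# Assumption: the controller is reachable at this address; override
# CONTROLLER_URL if your setup differs.
CONTROLLER_URL=${CONTROLLER_URL:-http://localhost:9000}

# Hypothetical helper: fetch the stored config for a table from the controller.
show_table_config() {
  curl -sS "${CONTROLLER_URL}/tables/$1"
}

# Usage: show_table_config pullRequestMergedEvents
```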
Publish events
Start streaming GitHub events into the Kafka topic:
$ docker run --rm -ti \
--network=pinot-demo \
--name pinot-github-events-into-kafka \
-d ${PINOT_IMAGE} StreamGitHubEvents \
-schemaFile examples/stream/githubEvents/pullRequestMergedEvents_schema.json \
-topic pullRequestMergedEvents \
-personalAccessToken <your_github_personal_access_token> \
-kafkaBrokerList kafka:9092
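To check that events are reaching the topic, one approach is to read a few messages with Kafka's console consumer. The sketch below assumes the same kafka container and /opt/kafka/bin path used earlier, and that the broker is reachable at kafka:9092 from inside that container; peek_topic is a hypothetical helper name.

```shell
# Hypothetical helper: print the first N messages of a topic (default 1).
# Assumes the "kafka" container and broker address used elsewhere on this page.
peek_topic() {
  topic=$1
  count=${2:-1}
  docker exec -t kafka \
    /opt/kafka/bin/kafka-console-consumer.sh \
    --bootstrap-server kafka:9092 \
    --topic "$topic" \
    --from-beginning \
    --max-messages "$count"
}

# Usage: peek_topic pullRequestMergedEvents 3
```

The consumer exits once it has printed the requested number of messages, so the command blocks until at least that many events have been published.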
Short Version
To run all of the steps above with a single command, use the following. Stop any previously running Pinot services first.
$ docker run --rm -ti \
--network=pinot-demo \
--name pinot-github-events-quick-start \
${PINOT_IMAGE} GitHubEventsQuickStart \
-personalAccessToken <your_github_personal_access_token>