Connect to Dash
In this Apache Pinot guide, we'll learn how visualize data using the Dash web framework.
In this guide you'll learn how to visualize data from Apache Pinot using Plotly's Dash web framework. Dash is the most downloaded, trusted Python framework for building ML & data science web apps.
We're going to use Dash to build a real-time dashboard to visualize the changes being made to Wikimedia properties.
Real-Time Dashboard Architecture
Startup components
We're going to use the following Docker compose file, which spins up instances of Zookeeper, Kafka, along with a Pinot controller, broker, and server:
version: '3.7'
services:
zookeeper:
image: zookeeper:3.5.6
container_name: "zookeeper-wiki"
ports:
- "2181:2181"
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
kafka:
image: wurstmeister/kafka:latest
restart: unless-stopped
container_name: "kafka-wiki"
ports:
- "9092:9092"
expose:
- "9093"
depends_on:
- zookeeper
environment:
KAFKA_ZOOKEEPER_CONNECT: zookeeper-wiki:2181/kafka
KAFKA_BROKER_ID: 0
KAFKA_ADVERTISED_HOST_NAME: kafka-wiki
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-wiki:9093,OUTSIDE://localhost:9092
KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9093,OUTSIDE://0.0.0.0:9092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,OUTSIDE:PLAINTEXT
pinot-controller:
image: apachepinot/pinot:0.10.0
command: "StartController -zkAddress zookeeper-wiki:2181 -dataDir /data"
container_name: "pinot-controller-wiki"
volumes:
- ./config:/config
- ./data:/data
restart: unless-stopped
ports:
- "9000:9000"
depends_on:
- zookeeper
pinot-broker:
image: apachepinot/pinot:0.10.0
command: "StartBroker -zkAddress zookeeper-wiki:2181"
restart: unless-stopped
container_name: "pinot-broker-wiki"
volumes:
- ./config:/config
ports:
- "8099:8099"
depends_on:
- pinot-controller
pinot-server:
image: apachepinot/pinot:0.10.0
command: "StartServer -zkAddress zookeeper-wiki:2181"
restart: unless-stopped
container_name: "pinot-server-wiki"
volumes:
- ./config:/config
depends_on:
- pinot-brokerdocker-compose.yml
Run the following command to launch all the components:
Wikimedia recent changes stream
Wikimedia provides provides a continuous stream of structured event data describing changes made to various Wikimedia properties. The events are published over HTTP using the Server-Side Events (SSE) Protocol.
You can find the endpoint at: stream.wikimedia.org/v2/stream/recentchange
We'll need to install the SSE client library to consume this data:
Next, create a file called wiki.py that contains the following:
wiki.py
The highlighted section shows how we connect to the recent changes feed using the SSE client library.
Let's run this script as shown below:
We'll see the following (truncated) output:
Output
Ingest recent changes into Kafka
Now we're going to import each of the events into Apache Kafka. First let's create a Kafka topic called wiki_events with 5 partitions:
Create a new file called wiki_to_kafka.py and import the following libraries:
wiki_to_kafka.py
Add these functions:
wiki_to_kafka.py
And now let's add the code that calls the recent changes API and imports events into the wiki_events topic:
wiki_to_kafka.py
The highlighted parts of this script indicate where events are ingested into Kafka and then flushed to disk.
If we run this script:
We'll see a message every time 100 messages are pushed to Kafka, as shown below:
Output
Explore Kafka
Let's check that the data has made its way into Kafka.
The following command returns the message offset for each partition in the wiki_events topic:
Output
Looks good. We can also stream all the messages in this topic by running the following command:
Output
Configure Pinot
Now let's configure Pinot to consume the data from Kafka.
We'll have the following schema:
schema.json
And the following table config:
table.json
The highlighted lines are how we connect Pinot to the Kafka topic that contains the events. Create the schema and table by running the following commnad:
Once you've done that, navigate to the Pinot UI and run the following query to check that the data has made its way into Pinot:
As long as you see some records, everything is working as expected.
Building a Dash Dashboard
Now let's write some more queries against Pinot and display the results in Dash.
First, install the following libraries:
Create a file called dashboard.py and import libraries and write a header for the page:
app.py
Connect to Pinot and write a query that returns recent changes, along with the users who made the changes, and domains where they were made:
app.py
The highlighted part of the query shows how to count the number of events from the last minute and the minute before that. We then do a similar thing to count the number of unique users and domains.
Metrics
Now let's create some metrics based on that data.
First, let's create a couple of helper functions for creating these metrics:
dash_utils.py
And now let's add the following import to app.py:
app.py
And the following code at the end of the file:
app.py
Go back to the terminal and run the following command:
Navigate to localhost:8501 to see the Dash app. You should see something like the following:
Dash Metrics
Changes per minute
Next, let's add a line chart that shows the number of changes being done to Wikimedia per minute. Update app.py as follows:
app.py
Go back to the web browser and you should see something like this:
Dash Time Series
Auto Refresh
At the moment we need to refresh our web browser to update the metrics and line chart, but it would be much better if that happened automatically. Let's now add auto refresh functionality.
This will require some restructuring of our application so that each component is rendered from a function annotated with a callback that causes the function to be called on an interval.
The app layout now looks like this:
app.py
interval-componentis configured to fire a callback every 1,000 milliseconds.latest-timestampis a container that will contain the latest timestamp.indicatorswill contain indicators with the latest counts of users, domains, and changes.time-serieswill contain the time series line chart.
The timestamp is refreshed by the following callback function:
app.py
The indicators are refreshed by this function:
app.py
And finally, the following function refreshes the line chart:
app.py
If we navigate back to our web browser, we'll see the following:
Dash Auto Refresh
The full script used in this example is shown below:
dashboard.py
Summary
In this guide we've learnt how to publish data into Kafka from Wikimedia's event stream, ingest it from there into Pinot, and finally make sense of the data using SQL queries run from Dash.
Last updated
Was this helpful?

