Visualize data with Redash

  1. Install Redash and start a running instance, following the Docker Based Developer Installation Guide.

  2. Configure Redash to query Pinot, by doing the following:

    1. Add the pinotdb dependency.

    2. Add a Python data source for Pinot.

  3. Create visualizations, by doing the following:

    1. Start Pinot.

    2. Run a query in Redash.

    3. Add a visualization and dashboard in Redash.

Add pinotdb dependency

Apache Pinot provides a Python client library pinotdb to query Pinot from Python applications. Install pinotdb inside the Redash worker instance to make network calls to Pinot.

  1. Navigate to the root directory where you’ve cloned Redash. Run the following command to get the name of the Redash worker container (by default, redash_worker_1):

docker-compose ps

  2. Run the commands shown after this list to open a shell inside the worker container and install pinotdb (change redash_worker_1 to your own Redash worker container name, if applicable).

  3. Restart Docker.
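For example, assuming the default worker container name redash_worker_1:

docker exec -it redash_worker_1 /bin/sh
pip install pinotdb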

Add Python data source for Pinot

  1. In Redash, select Settings > Data Sources.

  2. Select New Data Source, and then select Python from the list.

  3. On the Redash Settings - Data Source page, add Pinot as the name of the data source, and enter pinotdb in the Modules to import prior to running the script field.

  4. Enter the following optional fields as needed:

    • AdditionalModulesPaths: a comma-separated list of absolute paths on the Redash server to Python modules to make available when querying from Redash. Useful for private modules unavailable in pip.

    • AdditionalBuiltins: additional built-in functions as needed. By default, Redash automatically includes 25 Python built-in functions.

  5. Click Save.

Start Pinot

Run the following command in a new terminal to spin up an Apache Pinot Docker container in the quick start mode with a baseball stats dataset built in.
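For example, using the apachepinot/pinot:0.9.3 image (adjust the image tag and port mappings to suit your environment):

docker run \
  --name pinot-quickstart \
  -p 2123:2123 \
  -p 9000:9000 \
  -p 8000:8000 \
  apachepinot/pinot:0.9.3 QuickStart -type batch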

Run a query in Redash

  1. In Redash, select Queries > New Query, and then select the Python data source you created in Add a Python data source for Pinot.

  2. Add Python code to query data. For more information, see the Python query runner documentation.

  3. Click Execute to run the query and view results.

You can also include libraries like Pandas to perform more advanced data manipulation on Pinot’s data and visualize the output with Redash.
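For example, a rough sketch (assuming pandas is installed in the Redash worker, and using the same baseballStats table and connection settings as the example queries below) that filters the aggregated results with pandas before handing them to Redash:

from pinotdb import connect
import pandas as pd

conn = connect(host='host.docker.internal', port=8000, path='/query/sql', scheme='http')
curs = conn.cursor()
curs.execute("""
    select playerName, sum(runs) as total_runs
    from baseballStats
    group by playerName
    order by total_runs desc
    limit 100
""")

# Load the result set into a DataFrame and keep only players with at least 500 total runs
# (the 500-run cutoff is arbitrary, just to show a pandas transformation)
df = pd.DataFrame(curs, columns=[col[0] for col in curs.description])
df = df[df['total_runs'] >= 500]

# Convert the DataFrame into the columns/rows dictionary format Redash expects
result = {
    'columns': [
        {'name': 'playerName', 'type': 'string', 'friendly_name': 'Player'},
        {'name': 'total_runs', 'type': 'integer', 'friendly_name': 'Total Runs'}
    ],
    'rows': df.to_dict(orient='records')
}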

For more information, see Querying in the Redash documentation.

Example Python queries

Query top 10 players by total runs

The following query connects to Pinot and queries the baseballStats table to retrieve the top ten players by total runs. The results are transformed into a dictionary format supported by Redash.

Query top 10 teams by total runs

Query total strikeouts by year

Add a visualization and dashboard in Redash

Add a visualization

In Redash, after you've run your query, click the New Visualization tab, and select the type of visualization you want to create, for example, Bar Chart. The Visualization Editor appears with your chart.

For example, you may want to create a bar chart to view the top 10 players with highest scores.

You may want to create a line chart to view the variation in total strikeouts over time.

For more information, see Visualizations in the Redash documentation.

Add a dashboard

Create a dashboard with one or more visualizations (widgets).

  1. In Redash, go to Dashboards > New Dashboard.

  2. Add the widgets to your dashboard. For example, by adding the three example visualizations from above, you create a Baseball stats dashboard.

For more information, see Dashboards in the Redash documentation.


    docker exec -it redash_worker_1 /bin/sh                                
    pip install pinotdb
    docker run \
      --name pinot-quickstart \
      -p 2123:2123 \
      -p 9000:9000 \
      -p 8000:8000 \
      apachepinot/pinot:0.9.3 QuickStart -type batch
    from pinotdb import connect
    
    conn = connect(host='host.docker.internal', port=8000, path='/query/sql', scheme='http')
    curs = conn.cursor()
    curs.execute("""
        select 
    playerName, sum(runs) as total_runs
    from baseballStats
    group by playerName
    order by total_runs desc
    limit 10
    """)
    
    result = {}
    result['columns'] = [
        {
          "name": "player_name",
          "type": "string",
          "friendly_name": "playerName"
        },
        {
          "name": "total_runs",
          "type": "integer",
          "friendly_name": "total_runs"
        }
      ]
    
    rows = []
    
    for row in curs:
        record = {}
        record['player_name'] = row[0]
        record['total_runs'] = row[1]
    
    
        rows.append(record)
    
    result["rows"] = rows
    from pinotdb import connect
    
    conn = connect(host='host.docker.internal', port=8000, path='/query/sql', scheme='http')
    curs = conn.cursor()
    curs.execute("""
        select 
    teamID, sum(runs) as total_runs
    from baseballStats
    group by teamID
    order by total_runs desc
    limit 10
    """)
    
    result = {}
    result['columns'] = [
        {
          "name": "teamID",
          "type": "string",
          "friendly_name": "Team"
        },
        {
          "name": "total_runs",
          "type": "integer",
          "friendly_name": "Total Runs"
        }
      ]
    
    rows = []
    
    for row in curs:
        record = {}
        record['teamID'] = row[0]
        record['total_runs'] = row[1]
    
    
        rows.append(record)
    
    result["rows"] = rows
    from pinotdb import connect
    
    conn = connect(host='host.docker.internal', port=8000, path='/query/sql', scheme='http')
    curs = conn.cursor()
    curs.execute("""
        select 
    yearID, sum(strikeouts) as total_so
    from baseballStats
    group by yearID
    order by yearID asc
    limit 1000
    """)
    
    result = {}
    result['columns'] = [
        {
          "name": "yearID",
          "type": "integer",
          "friendly_name": "Year"
        },
        {
          "name": "total_so",
          "type": "integer",
          "friendly_name": "Total Strikeouts"
        }
      ]
    
    rows = []
    
    for row in curs:
        record = {}
        record['yearID'] = row[0]
        record['total_so'] = row[1]
    
    
        rows.append(record)
    
    result["rows"] = rows

    Recipes

    Here you will find a collection of ready-made sample applications and examples for real-world data

    Connect to Streamlit

    In this Apache Pinot guide, we'll learn how to visualize data using the Streamlit web framework.

    In this guide you'll learn how to visualize data from Apache Pinot using Streamlit. Streamlit is a Python library that makes it easy to build interactive, data-driven web applications.

    We're going to use Streamlit to build a real-time dashboard to visualize the changes being made to Wikimedia properties.

    Real-Time Dashboard Architecture

    Startup components

    We're going to use the following Docker compose file, which spins up Zookeeper and Kafka, along with a Pinot controller, broker, and server:

    docker-compose.yml

    Run the following command to launch all the components:

    Wikimedia recent changes stream

    Wikimedia provides a continuous stream of structured event data describing changes made to various Wikimedia properties. The events are published over HTTP using the Server-Sent Events (SSE) protocol.

    You can find the endpoint at https://stream.wikimedia.org/v2/stream/recentchange.

    We'll need to install the SSE client library to consume this data:

    Next, create a file called wiki.py that contains the following:

    wiki.py

    The key part of this script is where we connect to the recent changes feed using the SSE client library.

    Let's run this script as shown below:

    We'll see the following (truncated) output:

    Output

    Ingest recent changes into Kafka

    Now we're going to import each of the events into Apache Kafka. First let's create a Kafka topic called wiki_events with 5 partitions:

    Create a new file called wiki_to_kafka.py and import the following libraries:

    wiki_to_kafka.py

    Add these functions:

    wiki_to_kafka.py

    And now let's add the code that calls the recent changes API and imports events into the wiki_events topic:

    wiki_to_kafka.py

    The important parts of this script are where events are produced to the wiki_events topic and where the producer is flushed every 100 events.

    If we run this script:

    We'll see a message every time 100 messages are pushed to Kafka, as shown below:

    Output

    Explore Kafka

    Let's check that the data has made its way into Kafka.

    The following command returns the message offset for each partition in the wiki_events topic:

    Output

    Looks good. We can also stream all the messages in this topic by running the following command:

    Output

    Configure Pinot

    Now let's configure Pinot to consume the data from Kafka.

    We'll have the following schema:

    schema.json

    And the following table config:

    table.json

    The streamConfigs section is how we connect Pinot to the Kafka topic that contains the events. Create the schema and table by running the following command:

    Once you've done that, navigate to the Pinot UI (localhost:9000) and run the following query to check that the data has made its way into Pinot:

    As long as you see some records, everything is working as expected.

    Building a Streamlit Dashboard

    Now let's write some more queries against Pinot and display the results in Streamlit.

    First, install the following libraries:

    Create a file called app.py and import libraries and write a header for the page:

    app.py

    Connect to Pinot and write a query that returns recent changes, along with the users who made the changes, and domains where they were made:

    app.py

    The FILTER clauses in the query count the number of events from the last minute and the minute before that. We then do a similar thing to count the number of unique users and domains.

    Metrics

    Now let's create some metrics based on that data:

    app.py

    Go back to the terminal and run the following command:

    Navigate to localhost:8501 to see the Streamlit app. You should see something like the following:

    Streamlit Metrics

    Changes per minute

    Next, let's add a line chart that shows the number of changes being done to Wikimedia per minute. Add the following code to app.py:

    app.py

    Go back to the web browser and you should see something like this:

    Streamlit Time Series

    Auto Refresh

    At the moment we need to refresh our web browser to update the metrics and line chart, but it would be much better if that happened automatically. Let's now add auto refresh functionality.

    Add the following code just under the header at the top of app.py:

    app.py

    And the following code at the very end:

    app.py

    If we navigate back to our web browser, we'll see the following:

    Streamlit Auto Refresh

    The full script used in this example is shown below:

    app.py

    Summary

    In this guide we've learnt how to publish data into Kafka from Wikimedia's event stream, ingest it from there into Pinot, and finally make sense of the data using SQL queries run from Streamlit.

    version: '3.7'
    services:
      zookeeper:
        image: zookeeper:3.5.6
        container_name: "zookeeper-wiki"
        ports:
          - "2181:2181"
        environment:
          ZOOKEEPER_CLIENT_PORT: 2181
          ZOOKEEPER_TICK_TIME: 2000
      kafka:
        image: wurstmeister/kafka:latest
        restart: unless-stopped
        container_name: "kafka-wiki"
        ports:
          - "9092:9092"
        expose:
          - "9093"
        depends_on:
          - zookeeper
        environment:
          KAFKA_ZOOKEEPER_CONNECT: zookeeper-wiki:2181/kafka
          KAFKA_BROKER_ID: 0
          KAFKA_ADVERTISED_HOST_NAME: kafka-wiki
          KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-wiki:9093,OUTSIDE://localhost:9092
          KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9093,OUTSIDE://0.0.0.0:9092
          KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,OUTSIDE:PLAINTEXT
      pinot-controller:
        image: apachepinot/pinot:0.10.0
        command: "StartController -zkAddress zookeeper-wiki:2181 -dataDir /data"
        container_name: "pinot-controller-wiki"
        volumes:
          - ./config:/config
          - ./data:/data
        restart: unless-stopped
        ports:
          - "9000:9000"
        depends_on:
          - zookeeper
      pinot-broker:
        image: apachepinot/pinot:0.10.0
        command: "StartBroker -zkAddress zookeeper-wiki:2181"
        restart: unless-stopped
        container_name: "pinot-broker-wiki"
        volumes:
          - ./config:/config
        ports:
          - "8099:8099"
        depends_on:
          - pinot-controller
      pinot-server:
        image: apachepinot/pinot:0.10.0
        command: "StartServer -zkAddress zookeeper-wiki:2181"
        restart: unless-stopped
        container_name: "pinot-server-wiki"
        volumes:
          - ./config:/config
        depends_on:
          - pinot-broker
    docker-compose up
    pip install sseclient-py
    import json
    import pprint
    import sseclient
    import requests
    
    def with_requests(url, headers):
        """Get a streaming response for the given event feed using requests."""    
        return requests.get(url, stream=True, headers=headers)
    
    url = 'https://stream.wikimedia.org/v2/stream/recentchange'
    headers = {'Accept': 'text/event-stream'}
    response = with_requests(url, headers)
    client = sseclient.SSEClient(response)
    
    for event in client.events():
        stream = json.loads(event.data)
        pprint.pprint(stream)
    python wiki.py
    {'$schema': '/mediawiki/recentchange/1.0.0',
     'bot': False,
     'comment': '[[:File:Storemyr-Fagerbakken landskapsvernområde HVASSER '
                'Oslofjorden Norway (Protected coastal forest Recreational area '
                'hiking trails) Rituell-kultisk steinstreng sørøst i skogen (small '
                'archeological stone string) VÃ¥r (spring) 2021-04-24.jpg]] removed '
                'from category',
     'id': 1923506287,
     'meta': {'domain': 'commons.wikimedia.org',
              'dt': '2022-05-12T09:57:00Z',
              'id': '3800228e-43d8-440d-8034-c68977742653',
              'offset': 3855767440,
              'partition': 0,
              'request_id': '930b17cc-f14a-4656-afa1-d15b79a8f666',
              'stream': 'mediawiki.recentchange',
              'topic': 'eqiad.mediawiki.recentchange',
              'uri': 'https://commons.wikimedia.org/wiki/Category:Iron_Age_in_Norway'},
     'namespace': 14,
     'parsedcomment': '<a '
                      'href="/wiki/File:Storemyr-Fagerbakken_landskapsvernomr%C3%A5de_HVASSER_Oslofjorden_Norway_(Protected_coastal_forest_Recreational_area_hiking_trails)_Rituell-kultisk_steinstreng_s%C3%B8r%C3%B8st_i_skogen_(small_archeological_stone_string)_V%C3%A5r_(spring)_2021-04-24.jpg" '
                      'title="File:Storemyr-Fagerbakken landskapsvernområde '
                      'HVASSER Oslofjorden Norway (Protected coastal forest '
                      'Recreational area hiking trails) Rituell-kultisk '
                      'steinstreng sørøst i skogen (small archeological stone '
                      'string) VÃ¥r (spring) '
                      '2021-04-24.jpg">File:Storemyr-Fagerbakken '
                      'landskapsvernområde HVASSER Oslofjorden Norway (Protected '
                      'coastal forest Recreational area hiking trails) '
                      'Rituell-kultisk steinstreng sørøst i skogen (small '
                      'archeological stone string) VÃ¥r (spring) 2021-04-24.jpg</a> '
                      'removed from category',
     'server_name': 'commons.wikimedia.org',
     'server_script_path': '/w',
     'server_url': 'https://commons.wikimedia.org',
     'timestamp': 1652349420,
     'title': 'Category:Iron Age in Norway',
     'type': 'categorize',
     'user': 'Krg',
     'wiki': 'commonswiki'}
    {'$schema': '/mediawiki/recentchange/1.0.0',
     'bot': False,
     'comment': '[[:File:Storemyr-Fagerbakken landskapsvernområde HVASSER '
                'Oslofjorden Norway (Protected coastal forest Recreational area '
                'hiking trails) Rituell-kultisk steinstreng sørøst i skogen (small '
                'archeological stone string) VÃ¥r (spring) 2021-04-24.jpg]] removed '
                'from category',
     'id': 1923506289,
     'meta': {'domain': 'commons.wikimedia.org',
              'dt': '2022-05-12T09:57:00Z',
              'id': '2b819d20-beca-46a5-8ce3-b2f3b73d2cbe',
              'offset': 3855767441,
              'partition': 0,
              'request_id': '930b17cc-f14a-4656-afa1-d15b79a8f666',
              'stream': 'mediawiki.recentchange',
              'topic': 'eqiad.mediawiki.recentchange',
              'uri': 'https://commons.wikimedia.org/wiki/Category:Cultural_heritage_monuments_in_F%C3%A6rder'},
     'namespace': 14,
     'parsedcomment': '<a '
                      'href="/wiki/File:Storemyr-Fagerbakken_landskapsvernomr%C3%A5de_HVASSER_Oslofjorden_Norway_(Protected_coastal_forest_Recreational_area_hiking_trails)_Rituell-kultisk_steinstreng_s%C3%B8r%C3%B8st_i_skogen_(small_archeological_stone_string)_V%C3%A5r_(spring)_2021-04-24.jpg" '
                      'title="File:Storemyr-Fagerbakken landskapsvernområde '
                      'HVASSER Oslofjorden Norway (Protected coastal forest '
                      'Recreational area hiking trails) Rituell-kultisk '
                      'steinstreng sørøst i skogen (small archeological stone '
                      'string) VÃ¥r (spring) '
                      '2021-04-24.jpg">File:Storemyr-Fagerbakken '
                      'landskapsvernområde HVASSER Oslofjorden Norway (Protected '
                      'coastal forest Recreational area hiking trails) '
                      'Rituell-kultisk steinstreng sørøst i skogen (small '
                      'archeological stone string) VÃ¥r (spring) 2021-04-24.jpg</a> '
                      'removed from category',
     'server_name': 'commons.wikimedia.org',
     'server_script_path': '/w',
     'server_url': 'https://commons.wikimedia.org',
     'timestamp': 1652349420,
     'title': 'Category:Cultural heritage monuments in Færder',
     'type': 'categorize',
     'user': 'Krg',
     'wiki': 'commonswiki'}
    docker exec -it kafka-wiki kafka-topics.sh \
      --bootstrap-server localhost:9092 \
      --create \
      --topic wiki_events \
      --partitions 5
    import json
    import sseclient
    import datetime
    import requests
    import time
    from confluent_kafka import Producer
    def with_requests(url, headers):
        """Get a streaming response for the given event feed using requests."""    
        return requests.get(url, stream=True, headers=headers)
    
    def acked(err, msg):
        if err is not None:
            print("Failed to deliver message: {0}: {1}"
                  .format(msg.value(), err.str()))
    
    def json_serializer(obj):
        if isinstance(obj, (datetime.datetime, datetime.date)):
            return obj.isoformat()
        raise "Type %s not serializable" % type(obj)
    producer = Producer({'bootstrap.servers': 'localhost:9092'})
    
    url = 'https://stream.wikimedia.org/v2/stream/recentchange'
    headers = {'Accept': 'text/event-stream'}
    response = with_requests(url, headers) 
    client = sseclient.SSEClient(response)
    
    events_processed = 0
    while True:
        try: 
            for event in client.events():
                stream = json.loads(event.data)
                payload = json.dumps(stream, default=json_serializer, ensure_ascii=False).encode('utf-8')
                producer.produce(topic='wiki_events', 
                  key=str(stream['meta']['id']), value=payload, callback=acked)
    
                events_processed += 1
                if events_processed % 100 == 0:
                    print(f"{str(datetime.datetime.now())} Flushing after {events_processed} events")
                    producer.flush()
        except Exception as ex:
            print(f"{str(datetime.datetime.now())} Got error:" + str(ex))
            response = with_requests(url, headers) 
            client = sseclient.SSEClient(response)
            time.sleep(2)
    python wiki_to_kafka.py
    2022-05-12 10:58:34.449326 Flushing after 100 events
    2022-05-12 10:58:39.151599 Flushing after 200 events
    2022-05-12 10:58:43.399528 Flushing after 300 events
    2022-05-12 10:58:47.350277 Flushing after 400 events
    2022-05-12 10:58:50.847959 Flushing after 500 events
    2022-05-12 10:58:54.768228 Flushing after 600 events
    docker exec -it kafka-wiki kafka-run-class.sh kafka.tools.GetOffsetShell \
      --broker-list localhost:9092 \
      --topic wiki_events
    wiki_events:0:42
    wiki_events:1:61
    wiki_events:2:52
    wiki_events:3:56
    wiki_events:4:58
    docker exec -it kafka-wiki kafka-console-consumer.sh \
      --bootstrap-server localhost:9092 \
      --topic wiki_events \
      --from-beginning
    ...
    {"$schema": "/mediawiki/recentchange/1.0.0", "meta": {"uri": "https://en.wikipedia.org/wiki/Super_Wings", "request_id": "6f82e64d-220f-41f4-88c3-2e15f03ae504", "id": "c30cd735-1ead-405e-94d1-49fbe7c40411", "dt": "2022-05-12T10:05:36Z", "domain": "en.wikipedia.org", "stream": "mediawiki.recentchange", "topic": "eqiad.mediawiki.recentchange", "partition": 0, "offset": 3855779703}, "type": "log", "namespace": 0, "title": "Super Wings", "comment": "", "timestamp": 1652349936, "user": "2001:448A:50E0:885B:FD1D:2D04:233E:7647", "bot": false, "log_id": 0, "log_type": "abusefilter", "log_action": "hit", "log_params": {"action": "edit", "filter": "550", "actions": "tag", "log": 32575794}, "log_action_comment": "2001:448A:50E0:885B:FD1D:2D04:233E:7647 triggered [[Special:AbuseFilter/550|filter 550]], performing the action \"edit\" on [[Super Wings]]. Actions taken: Tag ([[Special:AbuseLog/32575794|details]])", "server_url": "https://en.wikipedia.org", "server_name": "en.wikipedia.org", "server_script_path": "/w", "wiki": "enwiki", "parsedcomment": ""}
    {"$schema": "/mediawiki/recentchange/1.0.0", "meta": {"uri": "https://no.wikipedia.org/wiki/Brukerdiskusjon:Haros", "request_id": "a20c9692-f301-4faf-9373-669bebbffff4", "id": "566ee63e-8e86-4a7e-a1f3-562704306509", "dt": "2022-05-12T10:05:36Z", "domain": "no.wikipedia.org", "stream": "mediawiki.recentchange", "topic": "eqiad.mediawiki.recentchange", "partition": 0, "offset": 3855779714}, "id": 84572581, "type": "edit", "namespace": 3, "title": "Brukerdiskusjon:Haros", "comment": "/* Stor forbokstav / ucfirst */", "timestamp": 1652349936, "user": "Asav", "bot": false, "minor": false, "patrolled": true, "length": {"old": 110378, "new": 110380}, "revision": {"old": 22579494, "new": 22579495}, "server_url": "https://no.wikipedia.org", "server_name": "no.wikipedia.org", "server_script_path": "/w", "wiki": "nowiki", "parsedcomment": "<span dir=\"auto\"><span class=\"autocomment\"><a href=\"/wiki/Brukerdiskusjon:Haros#Stor_forbokstav_/_ucfirst\" title=\"Brukerdiskusjon:Haros\">→‎Stor forbokstav / ucfirst</a></span></span>"}
    {"$schema": "/mediawiki/recentchange/1.0.0", "meta": {"uri": "https://es.wikipedia.org/wiki/Campo_de_la_calle_Industria", "request_id": "d45bd9af-3e2c-4aac-ae8f-e16d3340da76", "id": "7fb3956e-9bd2-4fa5-8659-72b266cdb45b", "dt": "2022-05-12T10:05:35Z", "domain": "es.wikipedia.org", "stream": "mediawiki.recentchange", "topic": "eqiad.mediawiki.recentchange", "partition": 0, "offset": 3855779718}, "id": 266270269, "type": "edit", "namespace": 0, "title": "Campo de la calle Industria", "comment": "/* Historia */", "timestamp": 1652349935, "user": "Raimon will", "bot": false, "minor": false, "length": {"old": 7566, "new": 7566}, "revision": {"old": 143485393, "new": 143485422}, "server_url": "https://es.wikipedia.org", "server_name": "es.wikipedia.org", "server_script_path": "/w", "wiki": "eswiki", "parsedcomment": "<span dir=\"auto\"><span class=\"autocomment\"><a href=\"/wiki/Campo_de_la_calle_Industria#Historia\" title=\"Campo de la calle Industria\">→‎Historia</a></span></span>"}
    ^CProcessed a total of 269 messages
    {
        "schemaName": "wikipedia",
        "dimensionFieldSpecs": [
          {
            "name": "id",
            "dataType": "STRING"
          },
          {
            "name": "wiki",
            "dataType": "STRING"
          },
          {
            "name": "user",
            "dataType": "STRING"
          },
          {
            "name": "title",
            "dataType": "STRING"
          },
          {
            "name": "comment",
            "dataType": "STRING"
          },
          {
            "name": "stream",
            "dataType": "STRING"
          },
          {
            "name": "domain",
            "dataType": "STRING"
          },
          {
            "name": "topic",
            "dataType": "STRING"
          },
          {
            "name": "type",
            "dataType": "STRING"
          },
          {
            "name": "uri",
            "dataType": "STRING"
          },
          {
            "name": "bot",
            "dataType": "BOOLEAN"
          },
          {
            "name": "metaJson",
            "dataType": "STRING"
          }
        ],
        "dateTimeFieldSpecs": [
          {
            "name": "ts",
            "dataType": "TIMESTAMP",
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "1:MILLISECONDS"
          }
        ]
      }
    {
        "tableName": "wikievents",
        "tableType": "REALTIME",
        "segmentsConfig": {
          "timeColumnName": "ts",
          "schemaName": "wikipedia",
          "replication": "1",
          "replicasPerPartition": "1"
        },
    
        "tableIndexConfig": {
          "invertedIndexColumns": [],
          "rangeIndexColumns": [],
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false,
          "sortedColumn": [],
          "bloomFilterColumns": [],
          "loadMode": "MMAP",
          "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.topic.name": "wiki_events",
            "stream.kafka.broker.list": "kafka-wiki:9093",
            "stream.kafka.consumer.type": "lowlevel",
            "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
            "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
            "realtime.segment.flush.threshold.rows": "1000",
            "realtime.segment.flush.threshold.time": "24h",
            "realtime.segment.flush.segment.size": "100M"
      },
      "noDictionaryColumns": [],
      "onHeapDictionaryColumns": [],
      "varLengthDictionaryColumns": [],
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": false
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant",
      "tagOverrideConfig": {}
    },
        "metadata": {},
        "quota": {},
        "routing": {},
        "query": {},
        "ingestionConfig": {
          "transformConfigs": [
            {
              "columnName": "metaJson",
              "transformFunction": "JSONFORMAT(meta)"
            },
            {
              "columnName": "id",
              "transformFunction": "JSONPATH(metaJson, '$.id')"
            },
            {
              "columnName": "stream",
              "transformFunction": "JSONPATH(metaJson, '$.stream')"
            },
            {
              "columnName": "domain",
              "transformFunction": "JSONPATH(metaJson, '$.domain')"
            },
            {
              "columnName": "topic",
              "transformFunction": "JSONPATH(metaJson, '$.topic')"
            },
            {
              "columnName": "uri",
              "transformFunction": "JSONPATH(metaJson, '$.uri')"
            },
            {
              "columnName": "ts",
              "transformFunction": "\"timestamp\" * 1000"
            }
          ]
        },
        "isDimTable": false
      }
    docker exec -it pinot-controller-wiki bin/pinot-admin.sh AddTable \
      -tableConfigFile /config/table.json \
      -schemaFile /config/schema.json \
      -exec
    select domain, count(*) 
    from wikievents 
    group by domain
    order by count(*) DESC
    limit 10
    pip install streamlit pinotdb plotly pandas
    import pandas as pd
    import streamlit as st
    from pinotdb import connect
    import plotly.express as px
    
    st.set_page_config(layout="wide")
    st.header("Wikipedia Recent Changes")
    conn = connect(host='localhost', port=8099, path='/query/sql', scheme='http')
    
    query = """select
      count(*) FILTER(WHERE  ts > ago('PT1M')) AS events1Min,
      count(*) FILTER(WHERE  ts <= ago('PT1M') AND ts > ago('PT2M')) AS events1Min2Min,     
      distinctcount(user) FILTER(WHERE  ts > ago('PT1M')) AS users1Min,
      distinctcount(user) FILTER(WHERE  ts <= ago('PT1M') AND ts > ago('PT2M')) AS users1Min2Min,
      distinctcount(domain) FILTER(WHERE  ts > ago('PT1M')) AS domains1Min,
      distinctcount(domain) FILTER(WHERE  ts <= ago('PT1M') AND ts > ago('PT2M')) AS domains1Min2Min
    from wikievents 
    where ts > ago('PT2M')
    limit 1
    """
    
    curs = conn.cursor()
    
    curs.execute(query)
    df_summary = pd.DataFrame(curs, columns=[item[0] for item in curs.description])
    metric1, metric2, metric3 = st.columns(3)
    metric1.metric(label="Changes", value=df_summary['events1Min'].values[0],
        delta=float(df_summary['events1Min'].values[0] - df_summary['events1Min2Min'].values[0]))
    
    metric2.metric(label="Users", value=df_summary['users1Min'].values[0],
        delta=float(df_summary['users1Min'].values[0] - df_summary['users1Min2Min'].values[0]))
    
    metric3.metric(label="Domains", value=df_summary['domains1Min'].values[0],
        delta=float(df_summary['domains1Min'].values[0] - df_summary['domains1Min2Min'].values[0]))
    streamlit run app.py
    query = """
    select ToDateTime(DATETRUNC('minute', ts), 'yyyy-MM-dd HH:mm:ss') AS dateMin, count(*) AS changes, 
           distinctcount(user) AS users,
           distinctcount(domain) AS domains
    from wikievents 
    where ts > ago('PT1H')
    group by dateMin
    order by dateMin desc
    LIMIT 30
    """
    
    curs.execute(query)
    df_ts = pd.DataFrame(curs, columns=[item[0] for item in curs.description])
    df_ts_melt = pd.melt(df_ts, id_vars=['dateMin'], value_vars=['changes', 'users', 'domains'])
    
    fig = px.line(df_ts_melt, x='dateMin', y="value", color='variable', color_discrete_sequence =['blue', 'red', 'green'])
    fig['layout'].update(margin=dict(l=0,r=0,b=0,t=40), title="Changes/Users/Domains per minute")
    fig.update_yaxes(range=[0, df_ts["changes"].max() * 1.1])
    st.plotly_chart(fig, use_container_width=True)
    if not "sleep_time" in st.session_state:
        st.session_state.sleep_time = 2
    
    if not "auto_refresh" in st.session_state:
        st.session_state.auto_refresh = True
    
    auto_refresh = st.checkbox('Auto Refresh?', st.session_state.auto_refresh)
    
    if auto_refresh:
        number = st.number_input('Refresh rate in seconds', value=st.session_state.sleep_time)
        st.session_state.sleep_time = number
    if auto_refresh:
        time.sleep(number)
        st.experimental_rerun()
    import pandas as pd
    import streamlit as st
    from pinotdb import connect
    from datetime import datetime
    import plotly.express as px
    import time
    
    st.set_page_config(layout="wide")
    
    conn = connect(host='localhost', port=8099, path='/query/sql', scheme='http')
    
    st.header("Wikipedia Recent Changes")
    
    now = datetime.now()
    dt_string = now.strftime("%d %B %Y %H:%M:%S")
    st.write(f"Last update: {dt_string}")
    
    # Use session state to keep track of whether we need to auto refresh the page and the refresh frequency
    
    if not "sleep_time" in st.session_state:
        st.session_state.sleep_time = 2
    
    if not "auto_refresh" in st.session_state:
        st.session_state.auto_refresh = True
    
    auto_refresh = st.checkbox('Auto Refresh?', st.session_state.auto_refresh)
    
    if auto_refresh:
        number = st.number_input('Refresh rate in seconds', value=st.session_state.sleep_time)
        st.session_state.sleep_time = number
    
    # Find changes that happened in the last 1 minute
    # Find changes that happened between 1 and 2 minutes ago
    
    query = """
    select count(*) FILTER(WHERE  ts > ago('PT1M')) AS events1Min,
            count(*) FILTER(WHERE  ts <= ago('PT1M') AND ts > ago('PT2M')) AS events1Min2Min,
            distinctcount(user) FILTER(WHERE  ts > ago('PT1M')) AS users1Min,
            distinctcount(user) FILTER(WHERE  ts <= ago('PT1M') AND ts > ago('PT2M')) AS users1Min2Min,
            distinctcount(domain) FILTER(WHERE  ts > ago('PT1M')) AS domains1Min,
            distinctcount(domain) FILTER(WHERE  ts <= ago('PT1M') AND ts > ago('PT2M')) AS domains1Min2Min
    from wikievents 
    where ts > ago('PT2M')
    limit 1
    """
    
    curs = conn.cursor()
    
    curs.execute(query)
    df_summary = pd.DataFrame(curs, columns=[item[0] for item in curs.description])
    
    
    metric1, metric2, metric3 = st.columns(3)
    
    metric1.metric(
        label="Changes",
        value=df_summary['events1Min'].values[0],
        delta=float(df_summary['events1Min'].values[0] - df_summary['events1Min2Min'].values[0])
    )
    
    metric2.metric(
        label="Users",
        value=df_summary['users1Min'].values[0],
        delta=float(df_summary['users1Min'].values[0] - df_summary['users1Min2Min'].values[0])
    )
    
    metric3.metric(
        label="Domains",
        value=df_summary['domains1Min'].values[0],
        delta=float(df_summary['domains1Min'].values[0] - df_summary['domains1Min2Min'].values[0])
    )
    
    # Find all the changes by minute in the last 10 minutes
    
    query = """
    select ToDateTime(DATETRUNC('minute', ts), 'yyyy-MM-dd HH:mm:ss') AS dateMin, count(*) AS changes, 
           distinctcount(user) AS users,
           distinctcount(domain) AS domains
    from wikievents 
    where ts > ago('PT10M')
    group by dateMin
    order by dateMin desc
    LIMIT 30
    """
    
    curs.execute(query)
    df_ts = pd.DataFrame(curs, columns=[item[0] for item in curs.description])
    df_ts_melt = pd.melt(df_ts, id_vars=['dateMin'], value_vars=['changes', 'users', 'domains'])
    
    fig = px.line(df_ts_melt, x='dateMin', y="value", color='variable', color_discrete_sequence =['blue', 'red', 'green'])
    fig['layout'].update(margin=dict(l=0,r=0,b=0,t=40), title="Changes/Users/Domains per minute")
    fig.update_yaxes(range=[0, df_ts["changes"].max() * 1.1])
    st.plotly_chart(fig, use_container_width=True)
    
    # Refresh the page
    if auto_refresh:
        time.sleep(number)
        st.experimental_rerun()

    GitHub Events Stream

    Steps for setting up a Pinot cluster and a real-time table which consumes from the GitHub events stream.

    In this recipe you will set up an Apache Pinot cluster and a real-time table which consumes data flowing from a GitHub events stream. The stream is based on GitHub pull requests and uses Kafka.

    In this recipe you will perform the following steps:

    1. Set up a Pinot cluster, to do which you will:

      a. Start zookeeper.

      b. Start the controller.

      c. Start the broker.

      d. Start the server.

    2. Set up a Kafka cluster.

    3. Create a Kafka topic, which will be called pullRequestMergedEvents.

    4. Create a real-time table called pullRequestMergedEvents and a schema.

    5. Start a task which reads from the GitHub events API and publishes events about merged pull requests to the Kafka topic.

    6. Query the real-time data.

    Steps

    Use either Docker images or launcher scripts

    Pull the Docker image

    Get the latest Docker image.

    Long version

    Set up the Pinot cluster

    Follow the instructions in Advanced Pinot Setup to set up a Pinot cluster with the following components:

    Kubernetes cluster

    If you already have a Kubernetes cluster with Pinot and Kafka (see Running Pinot in Kubernetes), first create the topic, then set up the table and streaming using the pinot-github-realtime-events.yml example (see the kubectl commands below).

    Query

    Browse to the Query Console to view the data.

    Visualize with SuperSet

    You can use SuperSet to visualize this data. Some of the interesting insights we captured were:

    List the most active organizations during the lockdown

    Repositories by number of commits in the Apache organization

    To integrate with SuperSet, check out the SuperSet Integrations page.

  • Zookeeper

  • Controller

  • Broker

  • Server

  • Kafka

Create a Kafka topic

    Create a Kafka topic called pullRequestMergedEvents for the demo.

    Add a Pinot table and schema

    The schema is present at examples/stream/githubEvents/pullRequestMergedEvents_schema.json and is also pasted below

    The table config is present at examples/stream/githubEvents/docker/pullRequestMergedEvents_realtime_table_config.json and is also pasted below.


    Note If you're setting this up on a pre-configured cluster, set the properties stream.kafka.zk.broker.url and stream.kafka.broker.list correctly, depending on the configuration of your Kafka cluster.

    Add the table and schema using the following command:

    Publish events

    Start streaming GitHub events into the Kafka topic:


    Prerequisites

    Generate a personal access token on GitHub.

    Short version

    The short method of setting things up is to use the following command. Make sure to stop any previously running Pinot services.

    Get Pinot

    Follow the instructions in Build from source to get the latest Pinot code.

    Long version

    Set up the Pinot cluster

    Follow the instructions in Advanced Pinot Setup to set up the Pinot cluster with the components:

    • Zookeeper

    • Controller

    • Broker

    • Server

    • Kafka

    Create a Kafka topic

    Download Apache Kafka.

    Create a Kafka topic called pullRequestMergedEvents for the demo.

    Add a Pinot table and schema

    Schema can be found at /examples/stream/githubevents/ in the release, and is also pasted below:

    The table config can be found at /examples/stream/githubevents/ in the release, and is also pasted below.


    Note

    If you're setting this up on a pre-configured cluster, set the properties stream.kafka.zk.broker.url and stream.kafka.broker.list correctly, depending on the configuration of your Kafka cluster.

    Add the table and schema using the command:

    Publish events

    Start streaming GitHub events into the Kafka topic


    Prerequisites

    Generate a personal access token on GitHub.

    Short version

    To set up all of the above steps with a single command, use GitHubEventsQuickStart:

    export PINOT_VERSION=latest
    export PINOT_IMAGE=apachepinot/pinot:${PINOT_VERSION}
    docker pull ${PINOT_IMAGE}
    docker exec \
      -t kafka \
      /opt/kafka/bin/kafka-topics.sh \
      --zookeeper pinot-zookeeper:2181/kafka \
      --partitions=1 --replication-factor=1 \
      --create --topic pullRequestMergedEvents
    $ cd kubernetes/helm
    $ kubectl apply -f pinot-github-realtime-events.yml
    pullRequestMergedEvents_schema.json
    {
      "schemaName": "pullRequestMergedEvents",
      "dimensionFieldSpecs": [
        {
          "name": "title",
          "dataType": "STRING",
          "defaultNullValue": ""
        },
        {
          "name": "labels",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "userId",
          "dataType": "STRING",
          "defaultNullValue": ""
        },
        {
          "name": "userType",
          "dataType": "STRING",
          "defaultNullValue": ""
        },
        {
          "name": "authorAssociation",
          "dataType": "STRING",
          "defaultNullValue": ""
        },
        {
          "name": "mergedBy",
          "dataType": "STRING",
          "defaultNullValue": ""
        },
        {
          "name": "assignees",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "authors",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "committers",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "requestedReviewers",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "requestedTeams",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "reviewers",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "commenters",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "repo",
          "dataType": "STRING",
          "defaultNullValue": ""
        },
        {
          "name": "organization",
          "dataType": "STRING",
          "defaultNullValue": ""
        }
      ],
      "metricFieldSpecs": [
        {
          "name": "count",
          "dataType": "LONG",
          "defaultNullValue": 1
        },
        {
          "name": "numComments",
          "dataType": "LONG"
        },
        {
          "name": "numReviewComments",
          "dataType": "LONG"
        },
        {
          "name": "numCommits",
          "dataType": "LONG"
        },
        {
          "name": "numLinesAdded",
          "dataType": "LONG"
        },
        {
          "name": "numLinesDeleted",
          "dataType": "LONG"
        },
        {
          "name": "numFilesChanged",
          "dataType": "LONG"
        },
        {
          "name": "numAuthors",
          "dataType": "LONG"
        },
        {
          "name": "numCommitters",
          "dataType": "LONG"
        },
        {
          "name": "numReviewers",
          "dataType": "LONG"
        },
        {
          "name": "numCommenters",
          "dataType": "LONG"
        },
        {
          "name": "createdTimeMillis",
          "dataType": "LONG"
        },
        {
          "name": "elapsedTimeMillis",
          "dataType": "LONG"
        }
      ],
      "dateTimeFieldSpecs": [
        {
          "name": "mergedTimeMillis",
          "dataType": "TIMESTAMP",
          "format": "1:MILLISECONDS:TIMESTAMP",
          "granularity": "1:MILLISECONDS"
        }
      ]
    }
    pullRequestMergedEvents_realtime_table_config.json
    {
      "tableName": "pullRequestMergedEvents",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "timeColumnName": "mergedTimeMillis",
        "timeType": "MILLISECONDS",
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "60",
        "schemaName": "pullRequestMergedEvents",
        "replication": "1",
        "replicasPerPartition": "1"
      },
      "tenants": {},
      "tableIndexConfig": {
        "loadMode": "MMAP",
        "invertedIndexColumns": [
          "organization",
          "repo"
        ],
        "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "simple",
          "stream.kafka.topic.name": "pullRequestMergedEvents",
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.zk.broker.url": "pinot-zookeeper:2181/kafka",
          "stream.kafka.broker.list": "kafka:9092",
          "realtime.segment.flush.threshold.time": "12h",
          "realtime.segment.flush.threshold.rows": "100000",
          "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
        }
      },
      "metadata": {
        "customConfigs": {}
      }
    }
    $ docker run \
        --network=pinot-demo \
        --name pinot-streaming-table-creation \
        ${PINOT_IMAGE} AddTable \
        -schemaFile examples/stream/githubEvents/pullRequestMergedEvents_schema.json \
        -tableConfigFile examples/stream/githubEvents/docker/pullRequestMergedEvents_realtime_table_config.json \
        -controllerHost pinot-controller \
        -controllerPort 9000 \
        -exec
    Executing command: AddTable -tableConfigFile examples/stream/githubEvents/docker/pullRequestMergedEvents_realtime_table_config.json -schemaFile examples/stream/githubEvents/pullRequestMergedEvents_schema.json -controllerHost pinot-controller -controllerPort 9000 -exec
    Sending request: http://pinot-controller:9000/schemas to controller: 20c241022a96, version: Unknown
    {"status":"Table pullRequestMergedEvents_REALTIME succesfully added"}
    $ docker run --rm -ti \
        --network=pinot-demo \
        --name pinot-github-events-into-kafka \
        -d ${PINOT_IMAGE} StreamGitHubEvents \
        -schemaFile examples/stream/githubEvents/pullRequestMergedEvents_schema.json \
        -topic pullRequestMergedEvents \
        -personalAccessToken <your_github_personal_access_token> \
        -kafkaBrokerList kafka:9092
    $ docker run --rm -ti \
        --network=pinot-demo \
        --name pinot-github-events-quick-start \
         ${PINOT_IMAGE} GitHubEventsQuickStart \
        -personalAccessToken <your_github_personal_access_token> 

    Connect to Dash

    In this Apache Pinot guide, we'll learn how to visualize data using the Dash web framework.

    In this guide you'll learn how to visualize data from Apache Pinot using Plotly's Dash web framework. Dash is the most downloaded, trusted Python framework for building ML & data science web apps.

    We're going to use Dash to build a real-time dashboard to visualize the changes being made to Wikimedia properties.

    Real-Time Dashboard Architecture

    Startup components

    We're going to use the following Docker compose file, which spins up Zookeeper and Kafka, along with a Pinot controller, broker, and server:

    docker-compose.yml

    Run the following command to launch all the components:

    Wikimedia recent changes stream

    Wikimedia provides a continuous stream of structured event data describing changes made to various Wikimedia properties. The events are published over HTTP using the Server-Sent Events (SSE) protocol.

    You can find the endpoint at https://stream.wikimedia.org/v2/stream/recentchange.

    We'll need to install the SSE client library to consume this data:

    Next, create a file called wiki.py that contains the following:

    wiki.py

    The key part of this script is where we connect to the recent changes feed using the SSE client library.

    Let's run this script as shown below:

    We'll see the following (truncated) output:

    Output

    Ingest recent changes into Kafka

    Now we're going to import each of the events into Apache Kafka. First let's create a Kafka topic called wiki_events with 5 partitions:

    Create a new file called wiki_to_kafka.py and import the following libraries:

    wiki_to_kafka.py

    Add these functions:

    wiki_to_kafka.py

    And now let's add the code that calls the recent changes API and imports events into the wiki_events topic:

    wiki_to_kafka.py

    The important parts of this script are where events are produced to the wiki_events topic and where the producer is flushed every 100 events.

    If we run this script:

    We'll see a message every time 100 messages are pushed to Kafka, as shown below:

    Output

    Explore Kafka

    Let's check that the data has made its way into Kafka.

    The following command returns the message offset for each partition in the wiki_events topic:

    Output

    Looks good. We can also stream all the messages in this topic by running the following command:

    Output

    Configure Pinot

    Now let's configure Pinot to consume the data from Kafka.

    We'll have the following schema:

    schema.json

    And the following table config:

    table.json

    The streamConfigs section is how we connect Pinot to the Kafka topic that contains the events. Create the schema and table by running the following command:

    Once you've done that, navigate to the Pinot UI (localhost:9000) and run the following query to check that the data has made its way into Pinot:

    As long as you see some records, everything is working as expected.

    Building a Dash Dashboard

    Now let's write some more queries against Pinot and display the results in Dash.

    First, install the following libraries:
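The exact package list isn't shown here; at a minimum you'll likely need Dash, the Pinot Python client, Plotly, and pandas (the package names below are an assumption):

pip install dash pinotdb plotly pandas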

    Create a file called dashboard.py, import the libraries we need, and write a header for the page:

    app.py
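A minimal sketch of what this file might start with (assuming Dash 2.x import style; the app title is illustrative):

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from dash import Dash, html, dcc
from pinotdb import connect

# Create the Dash app and give the page a title
app = Dash(__name__)
app.title = "Wikipedia Recent Changes"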

    Connect to Pinot and write a query that returns recent changes, along with the users who made the changes, and domains where they were made:

    app.py
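The query can be the same summary query used in the Streamlit guide above; a sketch (the query_summary variable name is just for this sketch):

# Connect to the Pinot broker started by docker-compose
conn = connect(host='localhost', port=8099, path='/query/sql', scheme='http')

query_summary = """
select count(*) FILTER(WHERE ts > ago('PT1M')) AS events1Min,
       count(*) FILTER(WHERE ts <= ago('PT1M') AND ts > ago('PT2M')) AS events1Min2Min,
       distinctcount(user) FILTER(WHERE ts > ago('PT1M')) AS users1Min,
       distinctcount(user) FILTER(WHERE ts <= ago('PT1M') AND ts > ago('PT2M')) AS users1Min2Min,
       distinctcount(domain) FILTER(WHERE ts > ago('PT1M')) AS domains1Min,
       distinctcount(domain) FILTER(WHERE ts <= ago('PT1M') AND ts > ago('PT2M')) AS domains1Min2Min
from wikievents
where ts > ago('PT2M')
limit 1
"""

curs = conn.cursor()
curs.execute(query_summary)
df_summary = pd.DataFrame(curs, columns=[item[0] for item in curs.description])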

    The FILTER clauses in the query count the number of events from the last minute and the minute before that. We then do a similar thing to count the number of unique users and domains.

    Metrics

    Now let's create some metrics based on that data.

    First, let's create a couple of helper functions for creating these metrics:

    dash_utils.py
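One plausible sketch of the helper module, using Plotly indicator traces to show each value with a delta against the previous minute (the function names add_delta_trace and add_trace are assumptions):

# dash_utils.py (sketch)
import plotly.graph_objects as go

def add_delta_trace(fig, title, value, last_value, row, column):
    # Indicator showing the current value plus the change versus the previous period;
    # fig is expected to be a make_subplots figure with "indicator"-type cells.
    fig.add_trace(go.Indicator(
        mode="number+delta",
        title={"text": title},
        value=value,
        delta={"reference": last_value, "relative": False},
    ), row=row, col=column)

def add_trace(fig, title, value, row, column):
    # Indicator showing only the current value
    fig.add_trace(go.Indicator(
        mode="number",
        title={"text": title},
        value=value,
    ), row=row, col=column)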

    And now let's add the following import to app.py:

    app.py
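Assuming the helper module above is saved as dash_utils.py next to the app, the import might be:

from dash_utils import add_delta_trace, add_trace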

    And the following code at the end of the file:

    app.py
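A sketch of the metrics section, laying the three indicators out side by side and serving them from the app layout (the layout structure and height value are assumptions):

# Build a 1x3 grid of indicator cells and fill it with the summary metrics
fig = make_subplots(rows=1, cols=3,
                    specs=[[{"type": "indicator"}, {"type": "indicator"}, {"type": "indicator"}]])
add_delta_trace(fig, "Changes", df_summary['events1Min'].values[0],
                df_summary['events1Min2Min'].values[0], 1, 1)
add_delta_trace(fig, "Users", df_summary['users1Min'].values[0],
                df_summary['users1Min2Min'].values[0], 1, 2)
add_delta_trace(fig, "Domains", df_summary['domains1Min'].values[0],
                df_summary['domains1Min2Min'].values[0], 1, 3)
fig.update_layout(height=300)

app.layout = html.Div([
    html.H1("Wikipedia Recent Changes"),
    dcc.Graph(figure=fig),
])

if __name__ == '__main__':
    app.run_server(debug=True)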

    Go back to the terminal and run the following command:
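Assuming the file is named dashboard.py as above:

python dashboard.py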

    Navigate to the Dash app in your web browser (by default, Dash serves on localhost:8050). You should see something like the following:

    Dash Metrics

    Changes per minute

    Next, let's add a line chart that shows the number of changes being done to Wikimedia per minute. Update app.py as follows:

    app.py
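A sketch using the same per-minute query as the Streamlit guide and a Plotly Express line chart (the query_time_series and line_chart names are assumptions):

query_time_series = """
select ToDateTime(DATETRUNC('minute', ts), 'yyyy-MM-dd HH:mm:ss') AS dateMin,
       count(*) AS changes,
       distinctcount(user) AS users,
       distinctcount(domain) AS domains
from wikievents
where ts > ago('PT1H')
group by dateMin
order by dateMin desc
limit 30
"""

curs.execute(query_time_series)
df_ts = pd.DataFrame(curs, columns=[item[0] for item in curs.description])
df_ts_melt = pd.melt(df_ts, id_vars=['dateMin'], value_vars=['changes', 'users', 'domains'])

line_chart = px.line(df_ts_melt, x='dateMin', y='value', color='variable',
                     color_discrete_sequence=['blue', 'red', 'green'])
line_chart['layout'].update(margin=dict(l=0, r=0, b=0, t=40),
                            title="Changes/Users/Domains per minute")

# Show the line chart underneath the indicators
app.layout = html.Div([
    html.H1("Wikipedia Recent Changes"),
    dcc.Graph(figure=fig),
    dcc.Graph(figure=line_chart),
])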

    Go back to the web browser and you should see something like this:

    Dash Time Series

    Auto Refresh

    At the moment we need to refresh our web browser to update the metrics and line chart, but it would be much better if that happened automatically. Let's now add auto refresh functionality.

    This will require some restructuring of our application so that each component is rendered from a function annotated with a callback that causes the function to be called on an interval.

    The app layout now looks like this:

    app.py
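A minimal sketch of what the restructured layout might look like (component ids match the bullets below; everything else is an assumption):

app.layout = html.Div([
    html.H1("Wikipedia Recent Changes"),
    # Fires the callbacks below every 1,000 milliseconds
    dcc.Interval(id='interval-component', interval=1000, n_intervals=0),
    # Placeholders that the callbacks fill in on every tick
    html.Div(id='latest-timestamp'),
    html.Div(id='indicators'),
    html.Div(id='time-series'),
])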

    • interval-component is configured to fire a callback every 1,000 milliseconds.

    • latest-timestamp is a container that will contain the latest timestamp.

    • indicators will contain indicators with the latest counts of users, domains, and changes.

    • time-series will contain the time series line chart.

    The timestamp is refreshed by the following callback function:

    app.py
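A sketch of the callback, assuming the ids from the layout above:

from datetime import datetime
from dash import Input, Output

@app.callback(Output('latest-timestamp', 'children'),
              Input('interval-component', 'n_intervals'))
def update_timestamp(n):
    # Re-rendered on every interval tick
    return html.Span(f"Last update: {datetime.now().strftime('%d %B %Y %H:%M:%S')}")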

    The indicators are refreshed by this function:

    app.py
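A sketch that re-runs the summary query on every tick (query_summary, add_delta_trace, and make_subplots come from the earlier sketches):

@app.callback(Output('indicators', 'children'),
              Input('interval-component', 'n_intervals'))
def update_indicators(n):
    curs.execute(query_summary)
    df = pd.DataFrame(curs, columns=[item[0] for item in curs.description])

    fig = make_subplots(rows=1, cols=3,
                        specs=[[{"type": "indicator"}, {"type": "indicator"}, {"type": "indicator"}]])
    add_delta_trace(fig, "Changes", df['events1Min'].values[0], df['events1Min2Min'].values[0], 1, 1)
    add_delta_trace(fig, "Users", df['users1Min'].values[0], df['users1Min2Min'].values[0], 1, 2)
    add_delta_trace(fig, "Domains", df['domains1Min'].values[0], df['domains1Min2Min'].values[0], 1, 3)
    fig.update_layout(height=300)
    return dcc.Graph(figure=fig)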

    And finally, the following function refreshes the line chart:

    app.py
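And a sketch for the line chart, reusing the per-minute query from the earlier sketch (query_time_series is the name used there):

@app.callback(Output('time-series', 'children'),
              Input('interval-component', 'n_intervals'))
def update_time_series(n):
    curs.execute(query_time_series)
    df = pd.DataFrame(curs, columns=[item[0] for item in curs.description])
    df_melt = pd.melt(df, id_vars=['dateMin'], value_vars=['changes', 'users', 'domains'])
    fig = px.line(df_melt, x='dateMin', y='value', color='variable',
                  color_discrete_sequence=['blue', 'red', 'green'])
    return dcc.Graph(figure=fig)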

    If we navigate back to our web browser, we'll see the following:

    Dash Auto Refresh

    The full script used in this example is shown below:

    dashboard.py

    Summary

    In this guide we've learnt how to publish data into Kafka from Wikimedia's event stream, ingest it from there into Pinot, and finally make sense of the data using SQL queries run from Dash.

    $ bin/kafka-topics.sh \
      --create \
      --bootstrap-server localhost:19092 \
      --replication-factor 1 \
      --partitions 1 \
      --topic pullRequestMergedEvents
    {
      "schemaName": "pullRequestMergedEvents",
      "dimensionFieldSpecs": [
        {
          "name": "title",
          "dataType": "STRING",
          "defaultNullValue": ""
        },
        {
          "name": "labels",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "userId",
          "dataType": "STRING",
          "defaultNullValue": ""
        },
        {
          "name": "userType",
          "dataType": "STRING",
          "defaultNullValue": ""
        },
        {
          "name": "authorAssociation",
          "dataType": "STRING",
          "defaultNullValue": ""
        },
        {
          "name": "mergedBy",
          "dataType": "STRING",
          "defaultNullValue": ""
        },
        {
          "name": "assignees",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "authors",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "committers",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "requestedReviewers",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "requestedTeams",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "reviewers",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "commenters",
          "dataType": "STRING",
          "singleValueField": false,
          "defaultNullValue": ""
        },
        {
          "name": "repo",
          "dataType": "STRING",
          "defaultNullValue": ""
        },
        {
          "name": "organization",
          "dataType": "STRING",
          "defaultNullValue": ""
        }
      ],
      "metricFieldSpecs": [
        {
          "name": "count",
          "dataType": "LONG",
          "defaultNullValue": 1
        },
        {
          "name": "numComments",
          "dataType": "LONG"
        },
        {
          "name": "numReviewComments",
          "dataType": "LONG"
        },
        {
          "name": "numCommits",
          "dataType": "LONG"
        },
        {
          "name": "numLinesAdded",
          "dataType": "LONG"
        },
        {
          "name": "numLinesDeleted",
          "dataType": "LONG"
        },
        {
          "name": "numFilesChanged",
          "dataType": "LONG"
        },
        {
          "name": "numAuthors",
          "dataType": "LONG"
        },
        {
          "name": "numCommitters",
          "dataType": "LONG"
        },
        {
          "name": "numReviewers",
          "dataType": "LONG"
        },
        {
          "name": "numCommenters",
          "dataType": "LONG"
        },
        {
          "name": "createdTimeMillis",
          "dataType": "LONG"
        },
        {
          "name": "elapsedTimeMillis",
          "dataType": "LONG"
        }
      ],
      "timeFieldSpec": {
        "incomingGranularitySpec": {
          "timeType": "MILLISECONDS",
          "timeFormat": "EPOCH",
          "dataType": "LONG",
          "name": "mergedTimeMillis"
        }
      }
    }
    {
      "tableName": "pullRequestMergedEvents",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "timeColumnName": "mergedTimeMillis",
        "timeType": "MILLISECONDS",
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "60",
        "schemaName": "pullRequestMergedEvents",
        "replication": "1",
        "replicasPerPartition": "1"
      },
      "tenants": {},
      "tableIndexConfig": {
        "loadMode": "MMAP",
        "invertedIndexColumns": [
          "organization",
          "repo"
        ],
        "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "simple",
          "stream.kafka.topic.name": "pullRequestMergedEvents",
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.zk.broker.url": "localhost:2191/kafka",
          "stream.kafka.broker.list": "localhost:19092",
          "realtime.segment.flush.threshold.time": "12h",
          "realtime.segment.flush.threshold.rows": "100000",
          "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
        }
      },
      "metadata": {
        "customConfigs": {}
      }
    }
    
    $ bin/pinot-admin.sh AddTable \
      -tableConfigFile $PATH_TO_CONFIGS/examples/stream/githubEvents/pullRequestMergedEvents_realtime_table_config.json \
      -schemaFile $PATH_TO_CONFIGS/examples/stream/githubEvents/pullRequestMergedEvents_schema.json \
      -exec
    $ bin/pinot-admin.sh StreamGitHubEvents \
      -topic pullRequestMergedEvents \
      -personalAccessToken <your_github_personal_access_token> \
      -kafkaBrokerList localhost:19092 \
      -schemaFile $PATH_TO_CONFIGS/examples/stream/githubEvents/pullRequestMergedEvents_schema.json
    $ bin/pinot-admin.sh GitHubEventsQuickStart \
      -personalAccessToken <your_github_personal_access_token>
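To sanity-check ingestion outside the Pinot query console, you can run a query from Python with the pinotdb client. The following is a minimal sketch; it assumes the quick-start broker is reachable on localhost:8000, so adjust the port to match your deployment.

    from pinotdb import connect

    # Assumption: the quick-start broker listens on localhost:8000; adjust if needed.
    conn = connect(host='localhost', port=8000, path='/query/sql', scheme='http')
    curs = conn.cursor()

    # Count merged pull requests per organization ingested so far
    curs.execute("""
        select organization, count(*) as mergedPRs
        from pullRequestMergedEvents
        group by organization
        order by count(*) desc
        limit 10
    """)
    for row in curs:
        print(row)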

    The time-series component will contain the time series line chart.

    version: '3.7'
    services:
      zookeeper:
        image: zookeeper:3.5.6
        container_name: "zookeeper-wiki"
        ports:
          - "2181:2181"
        environment:
          ZOOKEEPER_CLIENT_PORT: 2181
          ZOOKEEPER_TICK_TIME: 2000
      kafka:
        image: wurstmeister/kafka:latest
        restart: unless-stopped
        container_name: "kafka-wiki"
        ports:
          - "9092:9092"
        expose:
          - "9093"
        depends_on:
          - zookeeper
        environment:
          KAFKA_ZOOKEEPER_CONNECT: zookeeper-wiki:2181/kafka
          KAFKA_BROKER_ID: 0
          KAFKA_ADVERTISED_HOST_NAME: kafka-wiki
          KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-wiki:9093,OUTSIDE://localhost:9092
          KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9093,OUTSIDE://0.0.0.0:9092
          KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,OUTSIDE:PLAINTEXT
      pinot-controller:
        image: apachepinot/pinot:0.10.0
        command: "StartController -zkAddress zookeeper-wiki:2181 -dataDir /data"
        container_name: "pinot-controller-wiki"
        volumes:
          - ./config:/config
          - ./data:/data
        restart: unless-stopped
        ports:
          - "9000:9000"
        depends_on:
          - zookeeper
      pinot-broker:
        image: apachepinot/pinot:0.10.0
        command: "StartBroker -zkAddress zookeeper-wiki:2181"
        restart: unless-stopped
        container_name: "pinot-broker-wiki"
        volumes:
          - ./config:/config
        ports:
          - "8099:8099"
        depends_on:
          - pinot-controller
      pinot-server:
        image: apachepinot/pinot:0.10.0
        command: "StartServer -zkAddress zookeeper-wiki:2181"
        restart: unless-stopped
        container_name: "pinot-server-wiki"
        volumes:
          - ./config:/config
        depends_on:
          - pinot-broker
    docker-compose up
    pip install sseclient-py
    import json
    import pprint
    import sseclient
    import requests
    
    def with_requests(url, headers):
        """Get a streaming response for the given event feed using requests."""    
        return requests.get(url, stream=True, headers=headers)
    
    url = 'https://stream.wikimedia.org/v2/stream/recentchange'
    headers = {'Accept': 'text/event-stream'}
    response = with_requests(url, headers)
    client = sseclient.SSEClient(response)
    
    for event in client.events():
        stream = json.loads(event.data)
        pprint.pprint(stream)
    python wiki.py
    {'$schema': '/mediawiki/recentchange/1.0.0',
     'bot': False,
     'comment': '[[:File:Storemyr-Fagerbakken landskapsvernområde HVASSER '
                'Oslofjorden Norway (Protected coastal forest Recreational area '
                'hiking trails) Rituell-kultisk steinstreng sørøst i skogen (small '
                'archeological stone string) Vår (spring) 2021-04-24.jpg]] removed '
                'from category',
     'id': 1923506287,
     'meta': {'domain': 'commons.wikimedia.org',
              'dt': '2022-05-12T09:57:00Z',
              'id': '3800228e-43d8-440d-8034-c68977742653',
              'offset': 3855767440,
              'partition': 0,
              'request_id': '930b17cc-f14a-4656-afa1-d15b79a8f666',
              'stream': 'mediawiki.recentchange',
              'topic': 'eqiad.mediawiki.recentchange',
              'uri': 'https://commons.wikimedia.org/wiki/Category:Iron_Age_in_Norway'},
     'namespace': 14,
     'parsedcomment': '<a '
                      'href="/wiki/File:Storemyr-Fagerbakken_landskapsvernomr%C3%A5de_HVASSER_Oslofjorden_Norway_(Protected_coastal_forest_Recreational_area_hiking_trails)_Rituell-kultisk_steinstreng_s%C3%B8r%C3%B8st_i_skogen_(small_archeological_stone_string)_V%C3%A5r_(spring)_2021-04-24.jpg" '
                      'title="File:Storemyr-Fagerbakken landskapsvernområde '
                      'HVASSER Oslofjorden Norway (Protected coastal forest '
                      'Recreational area hiking trails) Rituell-kultisk '
                      'steinstreng sørøst i skogen (small archeological stone '
                      'string) Vår (spring) '
                      '2021-04-24.jpg">File:Storemyr-Fagerbakken '
                      'landskapsvernområde HVASSER Oslofjorden Norway (Protected '
                      'coastal forest Recreational area hiking trails) '
                      'Rituell-kultisk steinstreng sørøst i skogen (small '
                      'archeological stone string) Vår (spring) 2021-04-24.jpg</a> '
                      'removed from category',
     'server_name': 'commons.wikimedia.org',
     'server_script_path': '/w',
     'server_url': 'https://commons.wikimedia.org',
     'timestamp': 1652349420,
     'title': 'Category:Iron Age in Norway',
     'type': 'categorize',
     'user': 'Krg',
     'wiki': 'commonswiki'}
    {'$schema': '/mediawiki/recentchange/1.0.0',
     'bot': False,
     'comment': '[[:File:Storemyr-Fagerbakken landskapsvernområde HVASSER '
                'Oslofjorden Norway (Protected coastal forest Recreational area '
                'hiking trails) Rituell-kultisk steinstreng sørøst i skogen (small '
                'archeological stone string) Vår (spring) 2021-04-24.jpg]] removed '
                'from category',
     'id': 1923506289,
     'meta': {'domain': 'commons.wikimedia.org',
              'dt': '2022-05-12T09:57:00Z',
              'id': '2b819d20-beca-46a5-8ce3-b2f3b73d2cbe',
              'offset': 3855767441,
              'partition': 0,
              'request_id': '930b17cc-f14a-4656-afa1-d15b79a8f666',
              'stream': 'mediawiki.recentchange',
              'topic': 'eqiad.mediawiki.recentchange',
              'uri': 'https://commons.wikimedia.org/wiki/Category:Cultural_heritage_monuments_in_F%C3%A6rder'},
     'namespace': 14,
     'parsedcomment': '<a '
                      'href="/wiki/File:Storemyr-Fagerbakken_landskapsvernomr%C3%A5de_HVASSER_Oslofjorden_Norway_(Protected_coastal_forest_Recreational_area_hiking_trails)_Rituell-kultisk_steinstreng_s%C3%B8r%C3%B8st_i_skogen_(small_archeological_stone_string)_V%C3%A5r_(spring)_2021-04-24.jpg" '
                      'title="File:Storemyr-Fagerbakken landskapsvernområde '
                      'HVASSER Oslofjorden Norway (Protected coastal forest '
                      'Recreational area hiking trails) Rituell-kultisk '
                      'steinstreng sørøst i skogen (small archeological stone '
                      'string) Vår (spring) '
                      '2021-04-24.jpg">File:Storemyr-Fagerbakken '
                      'landskapsvernområde HVASSER Oslofjorden Norway (Protected '
                      'coastal forest Recreational area hiking trails) '
                      'Rituell-kultisk steinstreng sørøst i skogen (small '
                      'archeological stone string) Vår (spring) 2021-04-24.jpg</a> '
                      'removed from category',
     'server_name': 'commons.wikimedia.org',
     'server_script_path': '/w',
     'server_url': 'https://commons.wikimedia.org',
     'timestamp': 1652349420,
     'title': 'Category:Cultural heritage monuments in Færder',
     'type': 'categorize',
     'user': 'Krg',
     'wiki': 'commonswiki'}
    docker exec -it kafka-wiki kafka-topics.sh \
      --bootstrap-server localhost:9092 \
      --create \
      --topic wiki_events \
      --partitions 5
    import json
    import sseclient
    import datetime
    import requests
    import time
    from confluent_kafka import Producer
    def with_requests(url, headers):
        """Get a streaming response for the given event feed using requests."""    
        return requests.get(url, stream=True, headers=headers)
    
    def acked(err, msg):
        if err is not None:
            print("Failed to deliver message: {0}: {1}"
                  .format(msg.value(), err.str()))
    
    def json_serializer(obj):
        if isinstance(obj, (datetime.datetime, datetime.date)):
            return obj.isoformat()
        raise TypeError("Type %s not serializable" % type(obj))
    producer = Producer({'bootstrap.servers': 'localhost:9092'})
    
    url = 'https://stream.wikimedia.org/v2/stream/recentchange'
    headers = {'Accept': 'text/event-stream'}
    response = with_requests(url, headers) 
    client = sseclient.SSEClient(response)
    
    events_processed = 0
    while True:
        try: 
            for event in client.events():
                stream = json.loads(event.data)
                payload = json.dumps(stream, default=json_serializer, ensure_ascii=False).encode('utf-8')
                producer.produce(topic='wiki_events', 
                  key=str(stream['meta']['id']), value=payload, callback=acked)
    
                events_processed += 1
                if events_processed % 100 == 0:
                    print(f"{str(datetime.datetime.now())} Flushing after {events_processed} events")
                    producer.flush()
        except Exception as ex:
            print(f"{str(datetime.datetime.now())} Got error:" + str(ex))
            response = with_requests(url, headers) 
            client = sseclient.SSEClient(response)
            time.sleep(2)
    python wiki_to_kafka.py
    2022-05-12 10:58:34.449326 Flushing after 100 events
    2022-05-12 10:58:39.151599 Flushing after 200 events
    2022-05-12 10:58:43.399528 Flushing after 300 events
    2022-05-12 10:58:47.350277 Flushing after 400 events
    2022-05-12 10:58:50.847959 Flushing after 500 events
    2022-05-12 10:58:54.768228 Flushing after 600 events
    docker exec -it kafka-wiki kafka-run-class.sh kafka.tools.GetOffsetShell \
      --broker-list localhost:9092 \
      --topic wiki_events
    wiki_events:0:42
    wiki_events:1:61
    wiki_events:2:52
    wiki_events:3:56
    wiki_events:4:58
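The same watermark check can also be done from Python with the confluent_kafka client used earlier. This is a small sketch that assumes the broker address and topic name from the steps above.

    from confluent_kafka import Consumer, TopicPartition

    # Assumption: broker and topic match the docker-compose setup above
    consumer = Consumer({'bootstrap.servers': 'localhost:9092', 'group.id': 'offset-check'})

    metadata = consumer.list_topics('wiki_events', timeout=10)
    for partition_id in sorted(metadata.topics['wiki_events'].partitions):
        # get_watermark_offsets returns the (low, high) offsets for a partition
        low, high = consumer.get_watermark_offsets(TopicPartition('wiki_events', partition_id), timeout=10)
        print(f"wiki_events:{partition_id}:{high}")

    consumer.close()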
    docker exec -it kafka-wiki kafka-console-consumer.sh \
      --bootstrap-server localhost:9092 \
      --topic wiki_events \
      --from-beginning
    ...
    {"$schema": "/mediawiki/recentchange/1.0.0", "meta": {"uri": "https://en.wikipedia.org/wiki/Super_Wings", "request_id": "6f82e64d-220f-41f4-88c3-2e15f03ae504", "id": "c30cd735-1ead-405e-94d1-49fbe7c40411", "dt": "2022-05-12T10:05:36Z", "domain": "en.wikipedia.org", "stream": "mediawiki.recentchange", "topic": "eqiad.mediawiki.recentchange", "partition": 0, "offset": 3855779703}, "type": "log", "namespace": 0, "title": "Super Wings", "comment": "", "timestamp": 1652349936, "user": "2001:448A:50E0:885B:FD1D:2D04:233E:7647", "bot": false, "log_id": 0, "log_type": "abusefilter", "log_action": "hit", "log_params": {"action": "edit", "filter": "550", "actions": "tag", "log": 32575794}, "log_action_comment": "2001:448A:50E0:885B:FD1D:2D04:233E:7647 triggered [[Special:AbuseFilter/550|filter 550]], performing the action \"edit\" on [[Super Wings]]. Actions taken: Tag ([[Special:AbuseLog/32575794|details]])", "server_url": "https://en.wikipedia.org", "server_name": "en.wikipedia.org", "server_script_path": "/w", "wiki": "enwiki", "parsedcomment": ""}
    {"$schema": "/mediawiki/recentchange/1.0.0", "meta": {"uri": "https://no.wikipedia.org/wiki/Brukerdiskusjon:Haros", "request_id": "a20c9692-f301-4faf-9373-669bebbffff4", "id": "566ee63e-8e86-4a7e-a1f3-562704306509", "dt": "2022-05-12T10:05:36Z", "domain": "no.wikipedia.org", "stream": "mediawiki.recentchange", "topic": "eqiad.mediawiki.recentchange", "partition": 0, "offset": 3855779714}, "id": 84572581, "type": "edit", "namespace": 3, "title": "Brukerdiskusjon:Haros", "comment": "/* Stor forbokstav / ucfirst */", "timestamp": 1652349936, "user": "Asav", "bot": false, "minor": false, "patrolled": true, "length": {"old": 110378, "new": 110380}, "revision": {"old": 22579494, "new": 22579495}, "server_url": "https://no.wikipedia.org", "server_name": "no.wikipedia.org", "server_script_path": "/w", "wiki": "nowiki", "parsedcomment": "<span dir=\"auto\"><span class=\"autocomment\"><a href=\"/wiki/Brukerdiskusjon:Haros#Stor_forbokstav_/_ucfirst\" title=\"Brukerdiskusjon:Haros\">→‎Stor forbokstav / ucfirst</a></span></span>"}
    {"$schema": "/mediawiki/recentchange/1.0.0", "meta": {"uri": "https://es.wikipedia.org/wiki/Campo_de_la_calle_Industria", "request_id": "d45bd9af-3e2c-4aac-ae8f-e16d3340da76", "id": "7fb3956e-9bd2-4fa5-8659-72b266cdb45b", "dt": "2022-05-12T10:05:35Z", "domain": "es.wikipedia.org", "stream": "mediawiki.recentchange", "topic": "eqiad.mediawiki.recentchange", "partition": 0, "offset": 3855779718}, "id": 266270269, "type": "edit", "namespace": 0, "title": "Campo de la calle Industria", "comment": "/* Historia */", "timestamp": 1652349935, "user": "Raimon will", "bot": false, "minor": false, "length": {"old": 7566, "new": 7566}, "revision": {"old": 143485393, "new": 143485422}, "server_url": "https://es.wikipedia.org", "server_name": "es.wikipedia.org", "server_script_path": "/w", "wiki": "eswiki", "parsedcomment": "<span dir=\"auto\"><span class=\"autocomment\"><a href=\"/wiki/Campo_de_la_calle_Industria#Historia\" title=\"Campo de la calle Industria\">→‎Historia</a></span></span>"}
    ^CProcessed a total of 269 messages
    {
        "schemaName": "wikipedia",
        "dimensionFieldSpecs": [
          {
            "name": "id",
            "dataType": "STRING"
          },
          {
            "name": "wiki",
            "dataType": "STRING"
          },
          {
            "name": "user",
            "dataType": "STRING"
          },
          {
            "name": "title",
            "dataType": "STRING"
          },
          {
            "name": "comment",
            "dataType": "STRING"
          },
          {
            "name": "stream",
            "dataType": "STRING"
          },
          {
            "name": "domain",
            "dataType": "STRING"
          },
          {
            "name": "topic",
            "dataType": "STRING"
          },
          {
            "name": "type",
            "dataType": "STRING"
          },
          {
            "name": "uri",
            "dataType": "STRING"
          },
          {
            "name": "bot",
            "dataType": "BOOLEAN"
          },
          {
            "name": "metaJson",
            "dataType": "STRING"
          }
        ],
        "dateTimeFieldSpecs": [
          {
            "name": "ts",
            "dataType": "TIMESTAMP",
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "1:MILLISECONDS"
          }
        ]
      }
    {
        "tableName": "wikievents",
        "tableType": "REALTIME",
        "segmentsConfig": {
          "timeColumnName": "ts",
          "schemaName": "wikipedia",
          "replication": "1",
          "replicasPerPartition": "1"
        },
    
        "tableIndexConfig": {
          "invertedIndexColumns": [],
          "rangeIndexColumns": [],
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false,
          "sortedColumn": [],
          "bloomFilterColumns": [],
          "loadMode": "MMAP",
          "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.topic.name": "wiki_events",
            "stream.kafka.broker.list": "kafka-wiki:9093",
            "stream.kafka.consumer.type": "lowlevel",
            "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
            "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
            "realtime.segment.flush.threshold.rows": "1000",
            "realtime.segment.flush.threshold.time": "24h",
            "realtime.segment.flush.segment.size": "100M"
          },
          "noDictionaryColumns": [],
          "onHeapDictionaryColumns": [],
          "varLengthDictionaryColumns": [],
          "enableDefaultStarTree": false,
          "enableDynamicStarTreeCreation": false,
          "aggregateMetrics": false,
          "nullHandlingEnabled": false
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant",
          "tagOverrideConfig": {}
        },
        "metadata": {},
        "quota": {},
        "routing": {},
        "query": {},
        "ingestionConfig": {
          "transformConfigs": [
            {
              "columnName": "metaJson",
              "transformFunction": "JSONFORMAT(meta)"
            },
            {
              "columnName": "id",
              "transformFunction": "JSONPATH(metaJson, '$.id')"
            },
            {
              "columnName": "stream",
              "transformFunction": "JSONPATH(metaJson, '$.stream')"
            },
            {
              "columnName": "domain",
              "transformFunction": "JSONPATH(metaJson, '$.domain')"
            },
            {
              "columnName": "topic",
              "transformFunction": "JSONPATH(metaJson, '$.topic')"
            },
            {
              "columnName": "uri",
              "transformFunction": "JSONPATH(metaJson, '$.uri')"
            },
            {
              "columnName": "ts",
              "transformFunction": "\"timestamp\" * 1000"
            }
          ]
        },
        "isDimTable": false
      }
    docker exec -it pinot-controller-wiki bin/pinot-admin.sh AddTable \
      -tableConfigFile /config/table.json \
      -schemaFile /config/schema.json \
      -exec
    select domain, count(*) 
    from wikievents 
    group by domain
    order by count(*) DESC
    limit 10
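The same aggregation can also be run from Python with the pinotdb and pandas packages installed in the next step. The snippet below is a minimal sketch that connects to the broker port (8099) exposed by the docker-compose file above.

    import pandas as pd
    from pinotdb import connect

    # Connect to the Pinot broker started by docker-compose
    conn = connect(host='localhost', port=8099, path='/query/sql', scheme='http')
    curs = conn.cursor()

    curs.execute("""
        select domain, count(*) as changes
        from wikievents
        group by domain
        order by count(*) desc
        limit 10
    """)

    # Build a DataFrame, taking column names from the cursor metadata
    df = pd.DataFrame(curs, columns=[item[0] for item in curs.description])
    print(df)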
    pip install dash pinotdb plotly pandas
    import pandas as pd
    from dash import Dash, html, dcc
    import plotly.graph_objects as go
    from pinotdb import connect
    import plotly.express as px
    
    external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
    app = Dash(__name__, external_stylesheets=external_stylesheets)
    app.title = "Wiki Recent Changes Dashboard"
    conn = connect(host='localhost', port=8099, path='/query/sql', scheme='http')
    
    query = """select
      count(*) FILTER(WHERE  ts > ago('PT1M')) AS events1Min,
      count(*) FILTER(WHERE  ts <= ago('PT1M') AND ts > ago('PT2M')) AS events1Min2Min,  
      distinctcount(user) FILTER(WHERE  ts > ago('PT1M')) AS users1Min,
      distinctcount(user) FILTER(WHERE  ts <= ago('PT1M') AND ts > ago('PT2M')) AS users1Min2Min,
      distinctcount(domain) FILTER(WHERE  ts > ago('PT1M')) AS domains1Min,
      distinctcount(domain) FILTER(WHERE  ts <= ago('PT1M') AND ts > ago('PT2M')) AS domains1Min2Min
    from wikievents 
    where ts > ago('PT2M')
    limit 1
    """
    
    curs = conn.cursor()
    
    curs.execute(query)
    df_summary = pd.DataFrame(curs, columns=[item[0] for item in curs.description])
    from dash import html, dash_table
    import plotly.graph_objects as go
    
    def add_delta_trace(fig, title, value, last_value, row, column):
        fig.add_trace(go.Indicator(
            mode = "number+delta",
            title= {'text': title},
            value = value,
            delta = {'reference': last_value, 'relative': True},
            domain = {'row': row, 'column': column})
        )
    
    def add_trace(fig, title, value, row, column):
        fig.add_trace(go.Indicator(
            mode = "number",
            title= {'text': title},
            value = value,
            domain = {'row': row, 'column': column})
        )
    from dash_utils import add_delta_trace, add_trace
    fig = go.Figure(layout=go.Layout(height=300))
    if df_summary["events1Min"][0] > 0:
        if df_summary["events1Min2Min"][0] > 0:
            add_delta_trace(fig, "Changes", df_summary["events1Min"][0], df_summary["events1Min2Min"][0], 0, 0)
            add_delta_trace(fig, "Users", df_summary["users1Min"][0], df_summary["users1Min2Min"][0], 0, 1)
            add_delta_trace(fig, "Domains", df_summary["domains1Min"][0], df_summary["domains1Min2Min"][0], 0, 2)
        else:
            add_trace(fig, "Changes", df_summary["events1Min"][0], 0, 0)
            add_trace(fig, "Users", df_summary["users1Min"][0], 0, 1)
            add_trace(fig, "Domains", df_summary["domains1Min"][0], 0, 2)
        fig.update_layout(grid = {"rows": 1, "columns": 3,  'pattern': "independent"},) 
    else:
        fig.update_layout(annotations = [{"text": "No events found", "xref": "paper", "yref": "paper", "showarrow": False, "font": {"size": 28}}])
    
    app.layout = html.Div([
        html.H1("Wiki Recent Changes Dashboard", style={'text-align': 'center'}),
        html.Div(id='content', children=[
            dcc.Graph(figure=fig)
        ])
    ])
    
    if __name__ == '__main__':
        app.run_server(debug=True)    
    python dashboard.py
    query = """
    select ToDateTime(DATETRUNC('minute', ts), 'yyyy-MM-dd HH:mm:ss') AS dateMin, count(*) AS changes, 
        distinctcount(user) AS users,
        distinctcount(domain) AS domains
    from wikievents 
    where ts > ago('PT2M')
    group by dateMin
    order by dateMin desc
    LIMIT 30
    """
    
    curs.execute(query)
    df_ts = pd.DataFrame(curs, columns=[item[0] for item in curs.description])
    df_ts_melt = pd.melt(df_ts, id_vars=['dateMin'], value_vars=['changes', 'users', 'domains'])
    
    line_chart = px.line(df_ts_melt, x='dateMin', y="value", color='variable', color_discrete_sequence =['blue', 'red', 'green'])
    line_chart['layout'].update(margin=dict(l=0,r=0,b=0,t=40), title="Changes/Users/Domains per minute")
    line_chart.update_yaxes(range=[0, df_ts["changes"].max() * 1.1])
    
    app.layout = html.Div([
        html.H1("Wiki Recent Changes Dashboard", style={'text-align': 'center'}),
        html.Div(id='content', children=[
            dcc.Graph(figure=fig),
            dcc.Graph(figure=line_chart),
        ])
    ])
    app.layout = html.Div([
        html.H1("Wiki Recent Changes Dashboard", style={'text-align': 'center'}),
        html.Div(id='latest-timestamp', style={"padding": "5px 0", "text-align": "center"}),
        dcc.Interval(
                id='interval-component',
                interval=1 * 1000,
                n_intervals=0
            ),
        html.Div(id='content', children=[
            dcc.Graph(id="indicators"),
            dcc.Graph(id="time-series"),
        ])
    ])
    @app.callback(
        Output(component_id='latest-timestamp', component_property='children'),
        Input('interval-component', 'n_intervals'))
    def timestamp(n):
        return html.Span(f"Last updated: {datetime.datetime.now()}")
    @app.callback(Output(component_id='indicators', component_property='figure'),
                  Input('interval-component', 'n_intervals'))
    def indicators(n):
        query = """
        select count(*) FILTER(WHERE  ts > ago('PT1M')) AS events1Min,
            count(*) FILTER(WHERE  ts <= ago('PT1M') AND ts > ago('PT2M')) AS events1Min2Min,
            distinctcount(user) FILTER(WHERE  ts > ago('PT1M')) AS users1Min,
            distinctcount(user) FILTER(WHERE  ts <= ago('PT1M') AND ts > ago('PT2M')) AS users1Min2Min,
            distinctcount(domain) FILTER(WHERE  ts > ago('PT1M')) AS domains1Min,
            distinctcount(domain) FILTER(WHERE  ts <= ago('PT1M') AND ts > ago('PT2M')) AS domains1Min2Min
        from wikievents 
        where ts > ago('PT2M')
        limit 1
        """
    
        curs = connection.cursor()
        curs.execute(query)
        df_summary = pd.DataFrame(curs, columns=[item[0] for item in curs.description])
        curs.close()
    
        fig = go.Figure(layout=go.Layout(height=300))
        if df_summary["events1Min"][0] > 0:
            if df_summary["events1Min2Min"][0] > 0:
                add_delta_trace(fig, "Changes", df_summary["events1Min"][0], df_summary["events1Min2Min"][0], 0, 0)
                add_delta_trace(fig, "Users", df_summary["users1Min"][0], df_summary["users1Min2Min"][0], 0, 1)
                add_delta_trace(fig, "Domains", df_summary["domains1Min"][0], df_summary["domains1Min2Min"][0], 0, 2)
            else:
                add_trace(fig, "Changes", df_summary["events1Min"][0], 0, 0)
                add_trace(fig, "Users", df_summary["users1Min"][0], 0, 1)
                add_trace(fig, "Domains", df_summary["domains1Min"][0], 0, 2)
            fig.update_layout(grid = {"rows": 1, "columns": 3,  'pattern': "independent"},) 
        else:
            fig.update_layout(annotations = [{"text": "No events found", "xref": "paper", "yref": "paper", "showarrow": False, "font": {"size": 28}}])
        return fig
    @app.callback(Output(component_id='time-series', component_property='figure'),
        Input('interval-component', 'n_intervals'))
    def time_series(n):
        query = """
        select ToDateTime(DATETRUNC('minute', ts), 'yyyy-MM-dd HH:mm:ss') AS dateMin, count(*) AS changes, 
            distinctcount(user) AS users,
            distinctcount(domain) AS domains
        from wikievents 
        where ts > ago('PT1H')
        group by dateMin
        order by dateMin desc
        LIMIT 30
        """
    
        curs = connection.cursor()
        curs.execute(query)
        df_ts = pd.DataFrame(curs, columns=[item[0] for item in curs.description])
        curs.close()
    
        df_ts_melt = pd.melt(df_ts, id_vars=['dateMin'], value_vars=['changes', 'users', 'domains'])
    
        line_chart = px.line(df_ts_melt, x='dateMin', y="value", color='variable', color_discrete_sequence =['blue', 'red', 'green'])
        line_chart['layout'].update(margin=dict(l=0,r=0,b=0,t=40), title="Changes/Users/Domains per minute")
        line_chart.update_yaxes(range=[0, df_ts["changes"].max() * 1.1])
        return line_chart
    import pandas as pd
    from dash import Dash, html, dash_table, dcc, Input, Output
    import plotly.graph_objects as go
    from pinotdb import connect
    from dash_utils import add_delta_trace, add_trace
    import plotly.express as px
    import datetime
    
    external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
    app = Dash(__name__, external_stylesheets=external_stylesheets)
    app.title = "Wiki Recent Changes Dashboard"
    
    connection = connect(host="localhost", port=8099, path="/query/sql", scheme="http")
    
    
    @app.callback(Output(component_id='indicators', component_property='figure'),
                  Input('interval-component', 'n_intervals'))
    def indicators(n):
        query = """
        select count(*) FILTER(WHERE  ts > ago('PT1M')) AS events1Min,
            count(*) FILTER(WHERE  ts <= ago('PT1M') AND ts > ago('PT2M')) AS events1Min2Min,
            distinctcount(user) FILTER(WHERE  ts > ago('PT1M')) AS users1Min,
            distinctcount(user) FILTER(WHERE  ts <= ago('PT1M') AND ts > ago('PT2M')) AS users1Min2Min,
            distinctcount(domain) FILTER(WHERE  ts > ago('PT1M')) AS domains1Min,
            distinctcount(domain) FILTER(WHERE  ts <= ago('PT1M') AND ts > ago('PT2M')) AS domains1Min2Min
        from wikievents 
        where ts > ago('PT2M')
        limit 1
        """
    
        curs = connection.cursor()
        curs.execute(query)
        df_summary = pd.DataFrame(curs, columns=[item[0] for item in curs.description])
        curs.close()
    
        fig = go.Figure(layout=go.Layout(height=300))
        if df_summary["events1Min"][0] > 0:
            if df_summary["events1Min2Min"][0] > 0:
                add_delta_trace(fig, "Changes", df_summary["events1Min"][0], df_summary["events1Min2Min"][0], 0, 0)
                add_delta_trace(fig, "Users", df_summary["users1Min"][0], df_summary["users1Min2Min"][0], 0, 1)
                add_delta_trace(fig, "Domains", df_summary["domains1Min"][0], df_summary["domains1Min2Min"][0], 0, 2)
            else:
                add_trace(fig, "Changes", df_summary["events1Min"][0], 0, 0)
                add_trace(fig, "Users", df_summary["users1Min"][0], 0, 1)
                add_trace(fig, "Domains", df_summary["domains1Min"][0], 0, 2)
            fig.update_layout(grid = {"rows": 1, "columns": 3,  'pattern': "independent"},) 
        else:
            fig.update_layout(annotations = [{"text": "No events found", "xref": "paper", "yref": "paper", "showarrow": False, "font": {"size": 28}}])
        return fig
    
    @app.callback(Output(component_id='time-series', component_property='figure'),
        Input('interval-component', 'n_intervals'))
    def time_series(n):
        query = """
        select ToDateTime(DATETRUNC('minute', ts), 'yyyy-MM-dd HH:mm:ss') AS dateMin, count(*) AS changes, 
            distinctcount(user) AS users,
            distinctcount(domain) AS domains
        from wikievents 
        where ts > ago('PT1H')
        group by dateMin
        order by dateMin desc
        LIMIT 30
        """
    
        curs = connection.cursor()
        curs.execute(query)
        df_ts = pd.DataFrame(curs, columns=[item[0] for item in curs.description])
        curs.close()
    
        df_ts_melt = pd.melt(df_ts, id_vars=['dateMin'], value_vars=['changes', 'users', 'domains'])
    
        line_chart = px.line(df_ts_melt, x='dateMin', y="value", color='variable', color_discrete_sequence =['blue', 'red', 'green'])
        line_chart['layout'].update(margin=dict(l=0,r=0,b=0,t=40), title="Changes/Users/Domains per minute")
        line_chart.update_yaxes(range=[0, df_ts["changes"].max() * 1.1])
        return line_chart
    
    @app.callback(
        Output(component_id='latest-timestamp', component_property='children'),
        Input('interval-component', 'n_intervals'))
    def timestamp(n):
        return html.Span(f"Last updated: {datetime.datetime.now()}")
    
    app.layout = html.Div([
        html.H1("Wiki Recent Changes Dashboard", style={'text-align': 'center'}),
        html.Div(id='latest-timestamp', style={"padding": "5px 0", "text-align": "center"}),
        dcc.Interval(
                id='interval-component',
                interval=1 * 1000,
                n_intervals=0
            ),
        html.Div(id='content', children=[
            dcc.Graph(id="indicators"),
            dcc.Graph(id="time-series"),
        ])
    ])
    
    if __name__ == '__main__':
        app.run_server(debug=True)