Connect to Streamlit
In this Apache Pinot guide, we'll learn how to visualize data using the Streamlit web framework.
We're going to use Streamlit to build a real-time dashboard to visualize the changes being made to Wikimedia properties.
Real-Time Dashboard Architecture
Startup components
We're going to use the following Docker Compose file, which spins up instances of Zookeeper and Kafka, along with a Pinot controller, broker, and server:
version: '3.7'
services:
  zookeeper:
    image: zookeeper:3.5.6
    container_name: "zookeeper-wiki"
    ports:
      - "2181:2181"
    environment:
  kafka:
    image: wurstmeister/kafka:latest
    restart: unless-stopped
    container_name: "kafka-wiki"
    ports:
      - "9092:9092"
    expose:
      - "9093"
    depends_on:
      - zookeeper
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-wiki:2181/kafka
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-wiki:9093,OUTSIDE://localhost:9092
  pinot-controller:
    image: apachepinot/pinot:0.10.0
    command: "StartController -zkAddress zookeeper-wiki:2181 -dataDir /data"
    container_name: "pinot-controller-wiki"
    volumes:
      - ./config:/config
      - ./data:/data
    restart: unless-stopped
    ports:
      - "9000:9000"
    depends_on:
      - zookeeper
  pinot-broker:
    image: apachepinot/pinot:0.10.0
    command: "StartBroker -zkAddress zookeeper-wiki:2181"
    restart: unless-stopped
    container_name: "pinot-broker-wiki"
    volumes:
      - ./config:/config
    ports:
      - "8099:8099"
    depends_on:
      - pinot-controller
  pinot-server:
    image: apachepinot/pinot:0.10.0
    command: "StartServer -zkAddress zookeeper-wiki:2181"
    restart: unless-stopped
    container_name: "pinot-server-wiki"
    volumes:
      - ./config:/config
    depends_on:
      - pinot-broker
Run the following command to launch all the components:
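Once the containers have been launched, it can be handy to confirm that the published ports are reachable before moving on. The following is a minimal sketch, not part of the original guide: the helper names `wait_for_port` and `check_services` are hypothetical, and the port numbers are the host ports published in the Compose file above (the Pinot server publishes no host port, so it is not checked).

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Host ports published by the Compose file (assumed to be unchanged)
SERVICES = {
    "zookeeper": 2181,
    "kafka": 9092,
    "pinot-controller": 9000,
    "pinot-broker": 8099,
}


def check_services(services, host="localhost", timeout=60.0):
    """Map each service name to True/False depending on port reachability."""
    return {name: wait_for_port(host, port, timeout) for name, port in services.items()}
```

Calling `check_services(SERVICES)` returns a dictionary such as `{"zookeeper": True, ...}`. An open port only shows that a process is listening, not that the service has finished initializing.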
Wikimedia recent changes stream
Wikimedia provides a continuous stream of structured event data describing changes made to various Wikimedia properties. The events are published over HTTP using the Server-Sent Events (SSE) protocol.
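To make the protocol concrete: an SSE response is a plain-text stream of `field: value` lines, where a blank line ends one event and lines starting with a colon are comments. The sketch below is an illustrative parser for that framing only, under the assumption that lines are already decoded; `parse_sse` and the sample payload are hypothetical, and a client library does this work for us in practice.

```python
import json


def parse_sse(lines):
    """Yield (event_type, data) tuples from an iterable of decoded text lines."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has any data
            if data_lines:
                yield event_type, "\n".join(data_lines)
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)


# Hypothetical sample stream containing one event
sample = [
    ": keep-alive comment",
    "event: message",
    'data: {"bot": false, "namespace": 14}',
    "",
]
events = list(parse_sse(sample))
payload = json.loads(events[0][1])
```

Here `events` holds a single `("message", ...)` tuple whose data string decodes to a dictionary. The `id` and `retry` fields of the protocol are ignored in this sketch.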
You can find the endpoint at:
We'll need to install the SSE client library to consume this data:
pip install sseclient-py
Next, create a file called
that contains the following:
import json
import pprint
import sseclient
import requests


def with_requests(url, headers):
    """Get a streaming response for the given event feed using requests."""
    return requests.get(url, stream=True, headers=headers)


# Endpoint of the recent changes stream
url = ''
headers = {'Accept': 'text/event-stream'}
response = with_requests(url, headers)
client = sseclient.SSEClient(response)

# Decode each event's JSON payload and pretty-print it
for event in client.events():
    stream = json.loads(event.data)
    pprint.pprint(stream)
The with_requests function and the SSEClient wrapper show how we connect to the recent changes feed using the SSE client library.
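The script prints every event in full, which is verbose. As a minimal sketch of how each payload could be reduced before further processing, the hypothetical helper `summarize_change` below keeps a handful of fields that appear in the feed's events (`id`, `bot`, `namespace`, `meta.domain`, `meta.dt`) and returns `None` for payloads that are not JSON objects. The choice of fields and the output key names are assumptions made for illustration.

```python
import json


def summarize_change(raw_data):
    """Return a small dict of selected fields, or None if the payload is not a JSON object."""
    try:
        change = json.loads(raw_data)
    except (TypeError, ValueError):
        return None  # empty, missing, or malformed payload
    if not isinstance(change, dict):
        return None
    meta = change.get("meta")
    if not isinstance(meta, dict):
        meta = {}
    return {
        "id": change.get("id"),
        "bot": change.get("bot"),
        "namespace": change.get("namespace"),
        "domain": meta.get("domain"),
        "ts": meta.get("dt"),
    }


# Hypothetical payload with the same shape as a recent changes event
sample = (
    '{"id": 1923506287, "bot": false, "namespace": 14, '
    '"meta": {"dt": "2022-05-12T09:57:00Z", "domain": ""}}'
)
summary = summarize_change(sample)
```

Inside the event loop, this would be called as `summarize_change(event.data)`, skipping any result that is `None`.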
Let's run this script as shown below:
We'll see the following (truncated) output:
{'$schema': '/mediawiki/recentchange/1.0.0',
 'bot': False,
 'comment': '[[:File:Storemyr-Fagerbakken landskapsvernområde HVASSER '
            'Oslofjorden Norway (Protected coastal forest Recreational area '
            'hiking trails) Rituell-kultisk steinstreng sørøst i skogen (small '
            'archeological stone string) Vår (spring) 2021-04-24.jpg]] removed '
            'from category',
 'id': 1923506287,
 'meta': {'domain': '',
          'dt': '2022-05-12T09:57:00Z',
          'id': '3800228e-43d8-440d-8034-c68977742653',
          'offset': 3855767440,
          'partition': 0,
          'request_id': '930b17cc-f14a-4656-afa1-d15b79a8f666',
          'stream': 'mediawiki.recentchange',
          'topic': 'eqiad.mediawiki.recentchange',
          'uri': 'https://'},
 'namespace': 14,
 'parsedcomment': '<a '
                  'href="/wiki/File:Storemyr-Fagerbakken_landskapsvernomr%C3%A5de_HVASSER_Oslofjorden_Norway_(Protected_coastal_forest_Recreational_area_hiking_trails)_Rituell-kultisk_steinstreng_s%C3%B8r%C3%B8st_i_skogen_(small_archeological_stone_string)_V%C3%A5r_(spring)_2021-04-24.jpg" '
                  'title="File:Storemyr-Fagerbakken landskapsvernområde '
                  'HVASSER Oslofjorden Norway (Protected coastal forest '