Connect to Streamlit
In this Apache Pinot guide, you'll learn how to visualize data from Apache Pinot using Streamlit, a Python library that makes it easy to build interactive, data-driven web applications.
We're going to use Streamlit to build a real-time dashboard to visualize the changes being made to Wikimedia properties.
Real-Time Dashboard Architecture
Start up components
We're going to use the following Docker Compose file, which spins up instances of ZooKeeper and Kafka, along with a Pinot controller, broker, and server:
version: '3.7'
services:
  zookeeper:
    image: zookeeper:3.5.6
    container_name: "zookeeper-wiki"
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  kafka:
    image: wurstmeister/kafka:latest
    restart: unless-stopped
    container_name: "kafka-wiki"
    ports:
      - "9092:9092"
    expose:
      - "9093"
    depends_on:
      - zookeeper
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-wiki:2181/kafka
      KAFKA_BROKER_ID: 0
      KAFKA_ADVERTISED_HOST_NAME: kafka-wiki
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-wiki:9093,OUTSIDE://localhost:9092
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9093,OUTSIDE://0.0.0.0:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,OUTSIDE:PLAINTEXT
  pinot-controller:
    image: apachepinot/pinot:0.10.0
    command: "StartController -zkAddress zookeeper-wiki:2181 -dataDir /data"
    container_name: "pinot-controller-wiki"
    volumes:
      - ./config:/config
      - ./data:/data
    restart: unless-stopped
    ports:
      - "9000:9000"
    depends_on:
      - zookeeper
  pinot-broker:
    image: apachepinot/pinot:0.10.0
    command: "StartBroker -zkAddress zookeeper-wiki:2181"
    restart: unless-stopped
    container_name: "pinot-broker-wiki"
    volumes:
      - ./config:/config
    ports:
      - "8099:8099"
    depends_on:
      - pinot-controller
  pinot-server:
    image: apachepinot/pinot:0.10.0
    command: "StartServer -zkAddress zookeeper-wiki:2181"
    restart: unless-stopped
    container_name: "pinot-server-wiki"
    volumes:
      - ./config:/config
    depends_on:
      - pinot-broker
docker-compose.yml
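The Kafka service advertises two listeners: PLAINTEXT at kafka-wiki:9093, which only resolves for containers on the Compose network (such as the Pinot containers), and OUTSIDE at localhost:9092, the port published to the host machine. As a small illustration, not part of the Pinot setup itself, the sketch below parses the KAFKA_ADVERTISED_LISTENERS value from the compose file to show which address each kind of client would use. The helper name advertised_addresses is hypothetical.

```python
from typing import Dict


def advertised_addresses(listeners: str) -> Dict[str, str]:
    """Hypothetical helper: map each listener name to the host:port it advertises."""
    addresses = {}
    for entry in listeners.split(","):
        # Each entry looks like NAME://host:port
        name, _, address = entry.strip().partition("://")
        addresses[name] = address
    return addresses


# Value copied from KAFKA_ADVERTISED_LISTENERS in docker-compose.yml
listeners = "PLAINTEXT://kafka-wiki:9093,OUTSIDE://localhost:9092"
addresses = advertised_addresses(listeners)

# kafka-wiki only resolves inside the Compose network; the host uses the published port 9092
print("Inside Docker:", addresses["PLAINTEXT"])  # prints "Inside Docker: kafka-wiki:9093"
print("From the host:", addresses["OUTSIDE"])    # prints "From the host: localhost:9092"
```

In practice this means a script running on your machine would connect to Kafka via localhost:9092, while anything running in a container alongside Kafka would use kafka-wiki:9093.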
Run the following command to launch all the components:
docker-compose up
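The containers can take a little while to become ready. If you want a script to wait for Pinot before continuing, one option is to poll the HTTP ports published in docker-compose.yml. The sketch below assumes that the controller (port 9000) and broker (port 8099) answer GET /health with HTTP 200 once they are up; check this against the Pinot REST API documentation for version 0.10.0. The function names http_ok and wait_until_healthy are hypothetical.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and HTTP errors all mean "not ready yet"
        return False


def wait_until_healthy(url: str, timeout: float = 120.0, interval: float = 2.0,
                       check: Callable[[str], bool] = http_ok) -> bool:
    """Poll url until check(url) succeeds or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Assumed health endpoints on the ports published in docker-compose.yml
PINOT_HEALTH_URLS = {
    "controller": "http://localhost:9000/health",
    "broker": "http://localhost:8099/health",
}
```

For example, wait_until_healthy(PINOT_HEALTH_URLS["controller"]) should return True once the controller responds, or False if it has not answered within the default 120-second timeout. The check parameter exists so the polling logic can be exercised without a running cluster.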
Wikimedia recent changes stream
Wikimedia provides a continuous stream of structured event data describing changes made to various Wikimedia properties. The events are published over HTTP using the Server-Sent Events (SSE) protocol.
You can find the endpoint at: stream.wikimedia.org/v2/stream/recentchange
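With SSE, the server keeps the HTTP response open and writes each event as a block of `field: value` lines, ending with a blank line; the payload travels in one or more `data:` lines, and lines starting with `:` are comments. The SSE client library handles this for us, but the simplified sketch below (not the library's implementation; it ignores `retry:` fields, `\r` line endings, and last-event-ID tracking) shows the idea. The sample text is illustrative, not captured from stream.wikimedia.org, and parse_sse is a hypothetical name.

```python
import json
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Group SSE lines into events; a blank line ends each event (simplified)."""
    event: Dict[str, str] = {"event": "message", "data": ""}
    data_lines: List[str] = []
    for line in lines:
        if line == "":
            if data_lines:  # dispatch only events that carried data
                event["data"] = "\n".join(data_lines)
                yield event
            event, data_lines = {"event": "message", "data": ""}, []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        value = value[1:] if value.startswith(" ") else value
        if field == "data":
            data_lines.append(value)
        elif field in ("event", "id"):
            event[field] = value


# Illustrative stream text, not real output from the Wikimedia feed
raw = 'event: message\nid: 1\ndata: {"bot": false, "type": "edit"}\n\n'
for ev in parse_sse(raw.split("\n")):
    print(ev["event"], json.loads(ev["data"]))
```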
We'll need to install the SSE client library to consume this data, along with the requests library that our script uses to make the HTTP request:
pip install sseclient-py requests
Next, create a file called wiki.py that contains the following:
import json
import pprint
import sseclient
import requests

def with_requests(url, headers):
    """Get a streaming response for the given event feed using requests."""
    return requests.get(url, stream=True, headers=headers)

url = 'https://stream.wikimedia.org/v2/stream/recentchange'
headers = {'Accept': 'text/event-stream'}

# Open a long-lived HTTP response and hand it to the SSE client
response = with_requests(url, headers)
client = sseclient.SSEClient(response)

# Each SSE event carries one recent change encoded as JSON
for event in client.events():
    stream = json.loads(event.data)
    pprint.pprint(stream)
wiki.py
The with_requests function opens a streaming HTTP request to the recent changes feed, and sseclient.SSEClient wraps that response so we can iterate over the events it delivers.
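Because the recent changes feed never ends, wiki.py keeps printing until you stop it. When experimenting, it can be handy to read a fixed number of events and keep only a few fields. The sketch below does that with itertools.islice and uses only fields that appear in the sample output further down (id, bot, namespace, meta.domain, meta.dt); the helper names summarize and first_changes are hypothetical.

```python
import itertools
import json
from typing import Any, Dict, Iterable, List


def summarize(change: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keep a handful of fields from one recent change event."""
    meta = change.get("meta", {})
    return {
        "id": change.get("id"),
        "domain": meta.get("domain"),
        "dt": meta.get("dt"),
        "bot": change.get("bot"),
        "namespace": change.get("namespace"),
    }


def first_changes(payloads: Iterable[str], limit: int = 5) -> List[Dict[str, Any]]:
    """Decode at most `limit` non-empty JSON payloads and summarize each one."""
    non_empty = (p for p in payloads if p.strip())
    return [summarize(json.loads(p)) for p in itertools.islice(non_empty, limit)]


# Sample payload trimmed from the output shown in this guide
sample = json.dumps({"bot": False, "id": 1923506287, "namespace": 14,
                     "meta": {"domain": "commons.wikimedia.org",
                              "dt": "2022-05-12T09:57:00Z"}})
print(first_changes([sample]))
```

In wiki.py, you could replace the for loop with print(first_changes(event.data for event in client.events())) to print five summaries and exit; islice stops pulling from the stream once the limit is reached.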
Let's run this script as shown below:
python wiki.py
We'll see the following (truncated) output:
Output
{'$schema': '/mediawiki/recentchange/1.0.0',
 'bot': False,
 'comment': '[[:File:Storemyr-Fagerbakken landskapsvernområde HVASSER '
            'Oslofjorden Norway (Protected coastal forest Recreational area '
            'hiking trails) Rituell-kultisk steinstreng sørøst i skogen (small '
            'archeological stone string) Vår (spring) 2021-04-24.jpg]] removed '
            'from category',
 'id': 1923506287,
 'meta': {'domain': 'commons.wikimedia.org',
          'dt': '2022-05-12T09:57:00Z',
          'id': '3800228e-43d8-440d-8034-c68977742653',
          'offset': 3855767440,
          'partition': 0,
          'request_id': '930b17cc-f14a-4656-afa1-d15b79a8f666',
          'stream': 'mediawiki.recentchange',
          'topic': 'eqiad.mediawiki.recentchange',
          'uri': 'https://commons.wikimedia.org/wiki/Category:Iron_Age_in_Norway'},
 'namespace': 14,
 'parsedcomment': '<a '
                  'href="/wiki/File:Storemyr-Fagerbakken_landskapsvernomr%C3%A5de_HVASSER_Oslofjorden_Norway_(Protected_coastal_forest_Recreational_area_hiking_trails)_Rituell-kultisk_steinstreng_s%C3%B8r%C3%B8st_i_skogen_(small_archeological_stone_string)_V%C3%A5r_(spring)_2021-04-24.jpg" '
                  'title="File:Storemyr-Fagerbakken landskapsvernområde '
                  'HVASSER Oslofjorden Norway (Protected coastal forest '