Visualize data with Redash


This recipe shows how to visualize data from Apache Pinot using Redash. At a high level, you will:

  1. Install Redash and start a running instance, following the Redash Docker Based Developer Installation Guide.

  2. Configure Redash to query Pinot, by doing the following: add the pinotdb dependency, add a Python data source for Pinot, start Pinot, and run a query in Redash.

  3. Create visualizations, by doing the following: add a visualization and a dashboard in Redash.

Add pinotdb dependency

Apache Pinot provides a Python client library, pinotdb, for querying Pinot from Python applications. Install pinotdb inside the Redash worker instance so that Redash can make network calls to Pinot.

  1. Navigate to the root directory where you’ve cloned Redash. Run the following command to get the name of the Redash worker container (by default, redash_worker_1):

docker-compose ps

  2. Run the following command (change redash_worker_1 to your own Redash worker container name, if applicable):

docker exec -it redash_worker_1 /bin/sh                                
pip install pinotdb
  3. Restart Docker.
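
After the containers come back up, you can optionally confirm that the module is importable inside the worker. A quick check, assuming the default container name:

docker exec -it redash_worker_1 python -c "import pinotdb; print('pinotdb import OK')"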

Add Python data source for Pinot

  1. In Redash, select Settings > Data Sources.

  2. Select New Data Source, and then select Python from the list.

  3. On the Redash Settings - Data Source page, enter Pinot as the name of the data source, and enter pinotdb in the Modules to import prior to running the script field.

  4. Enter the following optional fields as needed:

    • AdditionalModulesPaths: Enter a comma-separated list of absolute paths on the Redash server to Python modules to make available when querying from Redash. Useful for private modules that aren't available in pip.

    • AdditionalBuiltins: Specify additional built-in functions as needed. By default, Redash automatically includes 25 Python built-in functions.

  5. Click Save.

Start Pinot

Run the following command in a new terminal to spin up an Apache Pinot Docker container in quick start mode with a built-in baseball stats dataset.

docker run \
  --name pinot-quickstart \
  -p 2123:2123 \
  -p 9000:9000 \
  -p 8000:8000 \
  apachepinot/pinot:0.9.3 QuickStart -type batch
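
Once the container is up (it takes a minute to load the baseball stats dataset), you can optionally verify that the broker is answering SQL queries before wiring up Redash. The following is a minimal check from the host machine, assuming the port mapping above; inside the Redash containers you'd use host.docker.internal instead of localhost, as in the examples below:

from pinotdb import connect

# Connect to the Pinot broker exposed on port 8000 by the quickstart container
conn = connect(host='localhost', port=8000, path='/query/sql', scheme='http')
curs = conn.cursor()
curs.execute("select count(*) from baseballStats")
print(curs.fetchall())  # a single row containing the total record count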

Run a query in Redash

  1. In Redash, select Queries > New Query, and then select the Python data source you created in Add a Python data source for Pinot.

  2. Add Python code to query data. For more information, see the Redash Python query runner.

  3. Click Execute to run the query and view results.

For more information, see Querying in the Redash documentation.

You can also include libraries like Pandas to perform more advanced data manipulation on Pinot's data and visualize the output with Redash, as in the sketch below.
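
For example, the following sketch loads team totals into a DataFrame, derives an extra column, and converts the frame back into the dictionary format Redash expects. It assumes pandas is available in the Redash worker (install it the same way as pinotdb if it isn't) and that homeRuns is a column in the quickstart's baseballStats dataset:

from pinotdb import connect
import pandas as pd

conn = connect(host='host.docker.internal', port=8000, path='/query/sql', scheme='http')
curs = conn.cursor()
curs.execute("""
    select teamID, sum(runs) as total_runs, sum(homeRuns) as total_home_runs
    from baseballStats
    group by teamID
    order by total_runs desc
    limit 10
""")

# Load the result set into a DataFrame for further manipulation
df = pd.DataFrame(curs.fetchall(), columns=['teamID', 'total_runs', 'total_home_runs'])

# Derived column: home runs as a percentage of total runs
df['home_run_pct'] = (100.0 * df['total_home_runs'] / df['total_runs']).round(2)

# Convert back into the dictionary format Redash expects
result = {
    'columns': [
        {'name': 'teamID', 'type': 'string', 'friendly_name': 'Team'},
        {'name': 'total_runs', 'type': 'integer', 'friendly_name': 'Total Runs'},
        {'name': 'total_home_runs', 'type': 'integer', 'friendly_name': 'Total Home Runs'},
        {'name': 'home_run_pct', 'type': 'float', 'friendly_name': 'Home Run %'},
    ],
    'rows': df.to_dict(orient='records'),
}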

Example Python queries

Query top 10 players by total runs

The following query connects to Pinot and queries the baseballStats table to retrieve the ten players with the highest total runs. The results are transformed into the dictionary format that Redash expects.

from pinotdb import connect

# Connect to the Pinot broker and fetch the top 10 players by total runs
conn = connect(host='host.docker.internal', port=8000, path='/query/sql', scheme='http')
curs = conn.cursor()
curs.execute("""
    select playerName, sum(runs) as total_runs
    from baseballStats
    group by playerName
    order by total_runs desc
    limit 10
""")

# Describe the result columns in the format Redash expects
result = {}
result['columns'] = [
    {
      "name": "player_name",
      "type": "string",
      "friendly_name": "playerName"
    },
    {
      "name": "total_runs",
      "type": "integer",
      "friendly_name": "total_runs"
    }
  ]

rows = []

for row in curs:
    record = {}
    record['player_name'] = row[0]
    record['total_runs'] = row[1]
    rows.append(record)

result["rows"] = rows

Query top 10 teams by total runs

from pinotdb import connect

conn = connect(host='host.docker.internal', port=8000, path='/query/sql', scheme='http')
curs = conn.cursor()
curs.execute("""
    select teamID, sum(runs) as total_runs
    from baseballStats
    group by teamID
    order by total_runs desc
    limit 10
""")

result = {}
result['columns'] = [
    {
      "name": "teamID",
      "type": "string",
      "friendly_name": "Team"
    },
    {
      "name": "total_runs",
      "type": "integer",
      "friendly_name": "Total Runs"
    }
  ]

rows = []

for row in curs:
    record = {}
    record['teamID'] = row[0]
    record['total_runs'] = row[1]
    rows.append(record)

result["rows"] = rows

Query total strikeouts by year

from pinotdb import connect

conn = connect(host='host.docker.internal', port=8000, path='/query/sql', scheme='http')
curs = conn.cursor()
curs.execute("""
    select yearID, sum(strikeouts) as total_so
    from baseballStats
    group by yearID
    order by yearID asc
    limit 1000
""")

result = {}
result['columns'] = [
    {
      "name": "yearID",
      "type": "integer",
      "friendly_name": "Year"
    },
    {
      "name": "total_so",
      "type": "integer",
      "friendly_name": "Total Strikeouts"
    }
  ]

rows = []

for row in curs:
    record = {}
    record['yearID'] = row[0]
    record['total_so'] = row[1]
    rows.append(record)

result["rows"] = rows

Add a visualization and dashboard in Redash

Add a visualization

In Redash, after you've run your query, click the New Visualization tab, and select the type of visualization you want to create, for example, Bar Chart. The Visualization Editor appears with your chart.

For example, you may want to create a bar chart to view the top 10 players with the highest total runs.

You may also want to create a line chart to view how total strikeouts vary over time.

For more information, see Visualizations in the Redash documentation.

Add a dashboard

Create a dashboard with one or more visualizations (widgets).

  1. In Redash, go to Dashboards > New Dashboard.

  2. Add the widgets to your dashboard. For example, by adding the three visualizations from the example queries above, you create a Baseball stats dashboard.

For more information, see Dashboards in the Redash documentation.