LogoLogo
release-1.2.0
release-1.2.0
  • Introduction
  • Basics
    • Concepts
      • Pinot storage model
      • Architecture
      • Components
        • Cluster
          • Tenant
          • Server
          • Controller
          • Broker
          • Minion
        • Table
          • Segment
            • Deep Store
            • Segment threshold
            • Segment retention
          • Schema
          • Time boundary
        • Pinot Data Explorer
    • Getting Started
      • Running Pinot locally
      • Running Pinot in Docker
      • Quick Start Examples
      • Running in Kubernetes
      • Running on public clouds
        • Running on Azure
        • Running on GCP
        • Running on AWS
      • Create and update a table configuration
      • Batch import example
      • Stream ingestion example
      • HDFS as Deep Storage
      • Troubleshooting Pinot
      • Frequently Asked Questions (FAQs)
        • General
        • Pinot On Kubernetes FAQ
        • Ingestion FAQ
        • Query FAQ
        • Operations FAQ
    • Import Data
      • From Query Console
      • Batch Ingestion
        • Spark
        • Flink
        • Hadoop
        • Backfill Data
        • Dimension table
      • Stream ingestion
        • Ingest streaming data from Apache Kafka
        • Ingest streaming data from Amazon Kinesis
        • Ingest streaming data from Apache Pulsar
        • Configure indexes
      • Stream ingestion with Upsert
      • Segment compaction on upserts
      • Stream ingestion with Dedup
      • Stream ingestion with CLP
      • File Systems
        • Amazon S3
        • Azure Data Lake Storage
        • HDFS
        • Google Cloud Storage
      • Input formats
        • Complex Type (Array, Map) Handling
        • Ingest records with dynamic schemas
      • Reload a table segment
      • Upload a table segment
    • Indexing
      • Bloom filter
      • Dictionary index
      • Forward index
      • FST index
      • Geospatial
      • Inverted index
      • JSON index
      • Native text index
      • Range index
      • Star-tree index
      • Text search support
      • Timestamp index
    • Release notes
      • 1.1.0
      • 1.0.0
      • 0.12.1
      • 0.12.0
      • 0.11.0
      • 0.10.0
      • 0.9.3
      • 0.9.2
      • 0.9.1
      • 0.9.0
      • 0.8.0
      • 0.7.1
      • 0.6.0
      • 0.5.0
      • 0.4.0
      • 0.3.0
      • 0.2.0
      • 0.1.0
    • Recipes
      • Connect to Streamlit
      • Connect to Dash
      • Visualize data with Redash
      • GitHub Events Stream
  • For Users
    • Query
      • Querying Pinot
      • Query Syntax
        • Aggregation Functions
        • Cardinality Estimation
        • Explain Plan (Single-Stage)
        • Explain Plan (Multi-Stage)
        • Filtering with IdSet
        • GapFill Function For Time-Series Dataset
        • Grouping Algorithm
        • JOINs
        • Lookup UDF Join
        • Querying JSON data
        • Transformation Functions
        • Window aggregate
        • Funnel Analysis
      • Query Options
      • Multi stage query
        • Operator Types
          • Aggregate
          • Filter
          • Join
          • Intersect
          • Leaf
          • Literal
          • Mailbox receive
          • Mailbox send
          • Minus
          • Sort or limit
          • Transform
          • Union
          • Window
        • Understanding Stages
        • Explain
        • Stats
      • User-Defined Functions (UDFs)
    • APIs
      • Broker Query API
        • Query Response Format
      • Controller Admin API
      • Controller API Reference
    • External Clients
      • JDBC
      • Java
      • Python
      • Golang
    • Tutorials
      • Use OSS as Deep Storage for Pinot
      • Ingest Parquet Files from S3 Using Spark
      • Creating Pinot Segments
      • Use S3 as Deep Storage for Pinot
      • Use S3 and Pinot in Docker
      • Batch Data Ingestion In Practice
      • Schema Evolution
  • For Developers
    • Basics
      • Extending Pinot
        • Writing Custom Aggregation Function
        • Segment Fetchers
      • Contribution Guidelines
      • Code Setup
      • Code Modules and Organization
      • Dependency Management
      • Update documentation
    • Advanced
      • Data Ingestion Overview
      • Ingestion Aggregations
      • Ingestion Transformations
      • Null value support
      • Use the multi-stage query engine (v2)
      • Troubleshoot issues with the multi-stage query engine (v2)
      • Advanced Pinot Setup
    • Plugins
      • Write Custom Plugins
        • Input Format Plugin
        • Filesystem Plugin
        • Batch Segment Fetcher Plugin
        • Stream Ingestion Plugin
    • Design Documents
      • Segment Writer API
  • For Operators
    • Deployment and Monitoring
      • Set up cluster
      • Server Startup Status Checkers
      • Set up table
      • Set up ingestion
      • Decoupling Controller from the Data Path
      • Segment Assignment
      • Instance Assignment
      • Rebalance
        • Rebalance Servers
        • Rebalance Brokers
        • Rebalance Tenant
      • Separating data storage by age
        • Using multiple tenants
        • Using multiple directories
      • Pinot managed Offline flows
      • Minion merge rollup task
      • Consistent Push and Rollback
      • Access Control
      • Monitoring
      • Tuning
        • Real-time
        • Routing
        • Query Routing using Adaptive Server Selection
        • Query Scheduling
      • Upgrading Pinot with confidence
      • Managing Logs
      • OOM Protection Using Automatic Query Killing
    • Command-Line Interface (CLI)
    • Configuration Recommendation Engine
    • Tutorials
      • Authentication
        • Basic auth access control
        • ZkBasicAuthAccessControl
      • Configuring TLS/SSL
      • Build Docker Images
      • Running Pinot in Production
      • Kubernetes Deployment
      • Amazon EKS (Kafka)
      • Amazon MSK (Kafka)
      • Monitor Pinot using Prometheus and Grafana
      • Performance Optimization Configurations
  • Configuration Reference
    • Cluster
    • Controller
    • Broker
    • Server
    • Table
    • Ingestion
    • Schema
    • Ingestion Job Spec
    • Monitoring Metrics
    • Functions
      • ABS
      • ADD
      • ago
      • EXPR_MIN / EXPR_MAX
      • arrayConcatDouble
      • arrayConcatFloat
      • arrayConcatInt
      • arrayConcatLong
      • arrayConcatString
      • arrayContainsInt
      • arrayContainsString
      • arrayDistinctInt
      • arrayDistinctString
      • arrayIndexOfInt
      • arrayIndexOfString
      • ARRAYLENGTH
      • arrayRemoveInt
      • arrayRemoveString
      • arrayReverseInt
      • arrayReverseString
      • arraySliceInt
      • arraySliceString
      • arraySortInt
      • arraySortString
      • arrayUnionInt
      • arrayUnionString
      • AVGMV
      • Base64
      • caseWhen
      • ceil
      • CHR
      • codepoint
      • concat
      • count
      • COUNTMV
      • COVAR_POP
      • COVAR_SAMP
      • day
      • dayOfWeek
      • dayOfYear
      • DISTINCT
      • DISTINCTAVG
      • DISTINCTAVGMV
      • DISTINCTCOUNT
      • DISTINCTCOUNTBITMAP
      • DISTINCTCOUNTHLLMV
      • DISTINCTCOUNTHLL
      • DISTINCTCOUNTBITMAPMV
      • DISTINCTCOUNTMV
      • DISTINCTCOUNTRAWHLL
      • DISTINCTCOUNTRAWHLLMV
      • DISTINCTCOUNTRAWTHETASKETCH
      • DISTINCTCOUNTTHETASKETCH
      • DISTINCTSUM
      • DISTINCTSUMMV
      • DIV
      • DATETIMECONVERT
      • DATETRUNC
      • exp
      • FIRSTWITHTIME
      • FLOOR
      • FrequentLongsSketch
      • FrequentStringsSketch
      • FromDateTime
      • FromEpoch
      • FromEpochBucket
      • FUNNELCOUNT
      • FunnelCompleteCount
      • FunnelMaxStep
      • FunnelMatchStep
      • Histogram
      • hour
      • isSubnetOf
      • JSONFORMAT
      • JSONPATH
      • JSONPATHARRAY
      • JSONPATHARRAYDEFAULTEMPTY
      • JSONPATHDOUBLE
      • JSONPATHLONG
      • JSONPATHSTRING
      • jsonextractkey
      • jsonextractscalar
      • LAG
      • LASTWITHTIME
      • LEAD
      • length
      • ln
      • lower
      • lpad
      • ltrim
      • max
      • MAXMV
      • MD5
      • millisecond
      • min
      • minmaxrange
      • MINMAXRANGEMV
      • MINMV
      • minute
      • MOD
      • mode
      • month
      • mult
      • now
      • percentile
      • percentileest
      • percentileestmv
      • percentilemv
      • percentiletdigest
      • percentiletdigestmv
      • percentilekll
      • percentilerawkll
      • percentilekllmv
      • percentilerawkllmv
      • quarter
      • regexpExtract
      • regexpReplace
      • remove
      • replace
      • reverse
      • round
      • ROW_NUMBER
      • rpad
      • rtrim
      • second
      • SEGMENTPARTITIONEDDISTINCTCOUNT
      • sha
      • sha256
      • sha512
      • sqrt
      • startswith
      • ST_AsBinary
      • ST_AsText
      • ST_Contains
      • ST_Distance
      • ST_GeogFromText
      • ST_GeogFromWKB
      • ST_GeometryType
      • ST_GeomFromText
      • ST_GeomFromWKB
      • STPOINT
      • ST_Polygon
      • strpos
      • ST_Union
      • SUB
      • substr
      • sum
      • summv
      • TIMECONVERT
      • timezoneHour
      • timezoneMinute
      • ToDateTime
      • ToEpoch
      • ToEpochBucket
      • ToEpochRounded
      • TOJSONMAPSTR
      • toGeometry
      • toSphericalGeography
      • trim
      • upper
      • Url
      • UTF8
      • VALUEIN
      • week
      • year
      • yearOfWeek
      • Extract
    • Plugin Reference
      • Stream Ingestion Connectors
      • VAR_POP
      • VAR_SAMP
      • STDDEV_POP
      • STDDEV_SAMP
    • Dynamic Environment
  • Reference
    • Single-stage query engine (v1)
    • Multi-stage query engine (v2)
    • Troubleshooting
      • Troubleshoot issues with the multi-stage query engine (v2)
      • Troubleshoot issues with ZooKeeper znodes
  • RESOURCES
    • Community
    • Team
    • Blogs
    • Presentations
    • Videos
  • Integrations
    • Tableau
    • Trino
    • ThirdEye
    • Superset
    • Presto
    • Spark-Pinot Connector
  • Contributing
    • Contribute Pinot documentation
    • Style guide
Powered by GitBook
On this page
  • Geospatial data types
  • Geometry vs geography
  • Geospatial functions
  • Aggregations
  • Constructors
  • Measurements
  • Outputs
  • Conversion
  • Relationship
  • Geospatial index
  • How to use geoindex
  • How geoindex works

Was this helpful?

Edit on GitHub
Export as PDF
  1. Basics
  2. Indexing

Geospatial

This page talks about geospatial support in Pinot.

PreviousFST indexNextInverted index

Was this helpful?

Pinot supports SQL/MM geospatial data and is compliant with the . This includes:

  • Geospatial data types, such as point, line and polygon;

  • Geospatial functions, for querying of spatial properties and relationships.

  • Geospatial indexing, used for efficient processing of spatial operations

Geospatial data types

Geospatial data types abstract and encapsulate spatial structures such as boundary and dimension. In many respects, spatial data types can be understood simply as shapes. Pinot supports the Well-Known Text (WKT) and Well-Known Binary (WKB) forms of geospatial objects, for example:

  • POINT (0, 0)

  • LINESTRING (0 0, 1 1, 2 1, 2 2)

  • POLYGON (0 0, 10 0, 10 10, 0 10, 0 0),(1 1, 1 2, 2 2, 2 1, 1 1)

  • MULTIPOINT (0 0, 1 2)

  • MULTILINESTRING ((0 0, 1 1, 1 2), (2 3, 3 2, 5 4))

  • MULTIPOLYGON (((0 0, 4 0, 4 4, 0 4, 0 0), (1 1, 2 1, 2 2, 1 2, 1 1)), ((-1 -1, -1 -2, -2 -2, -2 -1, -1 -1)))

  • GEOMETRYCOLLECTION(POINT(2 0),POLYGON((0 0, 1 0, 1 1, 0 1, 0 0)))

Geometry vs geography

It is common to have data in which the coordinates are geographics or latitude/longitude. Unlike coordinates in Mercator or UTM, geographic coordinates are not Cartesian coordinates.

  • Geographic coordinates do not represent a linear distance from an origin as plotted on a plane. Rather, these spherical coordinates describe angular coordinates on a globe.

  • Spherical coordinates specify a point by the angle of rotation from a reference meridian (longitude), and the angle from the equator (latitude).

You can treat geographic coordinates as approximate Cartesian coordinates and continue to do spatial calculations. However, measurements of distance, length and area will be nonsensical. Since spherical coordinates measure angular distance, the units are in degrees.

Geospatial functions

For manipulating geospatial data, Pinot provides a set of functions for analyzing geometric components, determining spatial relationships, and manipulating geometries. In particular, geospatial functions that begin with the ST_ prefix support the SQL/MM specification.

Following geospatial functions are available out of the box in Pinot:

Aggregations

Constructors

Measurements

  • ST_Area(Geometry/Geography g) → double For geometry type, it returns the 2D Euclidean area of a geometry. For geography, returns the area of a polygon or multi-polygon in square meters using a spherical model for Earth.

Outputs

Conversion

Relationship

  • ST_Equals(Geometry, Geometry) → boolean Returns true if the given geometries represent the same geometry/geography.

  • ST_Within(Geometry, Geometry) → boolean Returns true if first geometry is completely inside second geometry.

Geospatial index

A given geospatial location (longitude, latitude) can map to one hexagon (represented as H3Index). And its neighbors in H3 can be approximated by a ring of hexagons. To quickly identify the distance between any given two geospatial locations, we can convert the two locations in the H3Index, and then check the H3 distance between them. H3 distance is measured as the number of hexagons.

How to use geoindex

geoindex schema
{
  "dataType": "BYTES",
  "name": "location_st_point",
  "transformFunction": "toSphericalGeography(stPoint(lon,lat))"
}

Note the use of transformFunction that converts the created point into SphericalGeography format, which is needed by the ST_Distance function.

  • Enable the H3 index.

It is recommended to do the latter by using the indexes section:

geoindex tableConfig
{
  "fieldConfigList": [
    {
      "name": "location_st_point",
      "encodingType":"RAW", // this actually disables the dictionary
      "indexes": {
        "h3": {
          "resolutions": [13, 5, 6]
        }
      }
    }
  ],
  ...
}

Alternative the older way to configure H3 indexes is still supported:

geoindex tableConfig
{
  "fieldConfigList": [{
    "name": "location_st_point",
    "encodingType":"RAW", // this actually disables the dictionary
    "indexTypes":["H3"],
    "properties": {
      "resolutions": "13, 5, 6" // Here resolutions must be a string with ints separated by commas 
    }
  }],
  ...
}

The query below will use the geoindex to filter the Starbucks stores within 5km of the given point in the bay area.

SELECT address, ST_DISTANCE(location_st_point, ST_Point(-122, 37, 1))
FROM starbucksStores
WHERE ST_DISTANCE(location_st_point, ST_Point(-122, 37, 1)) < 5000
limit 1000

How geoindex works

The Pinot geoindex accelerates query evaluation while maintaining accuracy. Currently, geoindex supports the ST_Distance function in the WHERE clause.

At the high level, geoindex is used for retrieving the records within the nearby hexagons of the given location, and then use ST_Distance to accurately filter the matched results.

As in the example diagram above, if we want to find all relevant points within a given distance around San Francisco (area within the red circle), then the algorithm with geoindex will:

  • First find the H3 distance x that contains the range (for example, within a red circle).

  • Finally, for the points contained in the hexagons of kRing(x) at the outer edge of the red circle H3 distance, the algorithm will filter them by evaluating the condition ST_Distance(loc1, loc2) < x to find only those that are within the circle.

Pinot supports both geometry and geography types, which can be constructed by the corresponding functions as shown in . And for the geography types, the measurement functions such as ST_Distance and ST_Area calculate the spherical distance and area on earth respectively.

This aggregate function returns a MULTI geometry or NON-MULTI geometry from a set of geometries. it ignores NULL geometries.

Returns a geometry type object from WKT representation, with the optional spatial system reference.

Returns a geometry type object from WKB representation.

Returns a geometry type point object with the given coordinate values.

Returns a geometry type polygon object from .

Creates a geography instance from a

Returns a specified geography value from .

For geometry type, returns the 2-dimensional cartesian minimum distance (based on spatial ref) between two geometries in projected units. For geography, returns the great-circle distance in meters between two SphericalGeography points. Note that g1, g2 shall have the same type.

Returns the type of the geometry as a string. e.g.: ST_Linestring, ST_Polygon,ST_MultiPolygon etc.

Returns the WKB representation of the geometry.

Returns the WKT representation of the geometry/geography.

Converts a Geometry object to a spherical geography object.

Converts a spherical geographical object to a Geometry object.

Returns true if and only if no points of the second geometry/geography lie in the exterior of the first geometry/geography, and at least one point of the interior of the first geometry lies in the interior of the second geometry. Warning: ST_Contains on Geography only give close approximation

Geospatial functions are typically expensive to evaluate, and using geoindex can greatly accelerate the query evaluation. Geoindexing in Pinot is based on Uber’s , a hexagon-based hierarchical gridding.

For example, in the diagram below, the red hexagons are within the 1 distance of the central hexagon. The size of the hexagon is determined by the resolution of the indexing. Check this table for the level of and the corresponding precision (measured in km).

To use the geoindex, first declare the geolocation field as bytes in the schema, as in the example of the .

Next, declare the geospatial index in the you need to

Verify the dictionary is disabled (see how to ).

Then, for the points within the H3 distance (those covered by the hexagons completely within ), directly accept those points without filtering.

Open Geospatial Consortium’s (OGC) OpenGIS Specifications
ST_Union(geometry[] g1_array) → Geometry
ST_GeomFromText(String wkt) → Geometry
ST_GeomFromWKB(bytes wkb) → Geometry
ST_Point(double x, double y) → Point
ST_Polygon(String wkt) → Polygon
WKT representation
ST_GeogFromWKB(bytes wkb) → Geography
Well-Known Binary geometry representation (WKB)
ST_GeogFromText(String wkt) → Geography
Well-Known Text representation or extended (WKT)
ST_Distance(Geometry/Geography g1, Geometry/Geography g2) → double
ST_GeometryType(Geometry g) → String
ST_AsBinary(Geometry/Geography g) → bytes
ST_AsText(Geometry/Geography g) → string
toSphericalGeography(Geometry g) → Geography
toGeometry(Geography g) → Geometry
ST_Contains(Geometry/Geography, Geometry/Geography) → boolean
H3
resolutions
QuickStart example
table configuration
disable the dictionary index
kRing(x)
section
Hexagonal grid in H3
Geoindex example