This page talks about geospatial support in Pinot.
- Geospatial data types, such as point, line and polygon;
- Geospatial functions, for querying of spatial properties and relationships.
- Geospatial indexing, used for efficient processing of spatial operations
Geospatial data types abstract and encapsulate spatial structures such as boundary and dimension. In many respects, spatial data types can be understood simply as shapes. Pinot supports the Well-Known Text (WKT) and Well-Known Binary (WKB) form of geospatial objects, for example:
POINT (0, 0)
LINESTRING (0 0, 1 1, 2 1, 2 2)
POLYGON (0 0, 10 0, 10 10, 0 10, 0 0),(1 1, 1 2, 2 2, 2 1, 1 1)
MULTIPOINT (0 0, 1 2)
MULTILINESTRING ((0 0, 1 1, 1 2), (2 3, 3 2, 5 4))
MULTIPOLYGON (((0 0, 4 0, 4 4, 0 4, 0 0), (1 1, 2 1, 2 2, 1 2, 1 1)), ((-1 -1, -1 -2, -2 -2, -2 -1, -1 -1)))
GEOMETRYCOLLECTION(POINT(2 0),POLYGON((0 0, 1 0, 1 1, 0 1, 0 0)))
It is common to have data in which the coordinates are
latitude/longitude.Unlike coordinates in Mercator or UTM, geographic coordinates are not Cartesian coordinates.
- Geographic coordinates do not represent a linear distance from an origin as plotted on a plane. Rather, these spherical coordinates describe angular coordinates on a globe.
- Spherical coordinates specify a point by the angle of rotation from a reference meridian (longitude), and the angle from the equator (latitude).
You can treat geographic coordinates as approximate Cartesian coordinates and continue to do spatial calculations. However, measurements of distance, length and area will be nonsensical. Since spherical coordinates measure angular distance, the units are in degrees.
Pinot supports both geometry and geography types, which can be constructed by the corresponding functions as shown in section. And for the geography types, the measurement functions such as
ST_Areacalculate the spherical distance and area on earth respectively.
For manipulating geospatial data, Pinot provides a set of functions for analyzing geometric components, determining spatial relationships, and manipulating geometries. In particular, geospatial functions that begin with the
ST_prefix support the SQL/MM specification.
Following geospatial functions are available out of the box in Pinot-
- ST_Area(Geometry/Geography g) → double For geometry type, it returns the 2D Euclidean area of a geometry. For geography, returns the area of a polygon or multi-polygon in square meters using a spherical model for Earth.
- ST_Distance(Geometry/Geography g1, Geometry/Geography g2) → double For geometry type, returns the 2-dimensional cartesian minimum distance (based on spatial ref) between two geometries in projected units. For geography, returns the great-circle distance in meters between two SphericalGeography points. Note that g1, g2 shall have the same type.
- ST_Contains(Geometry, Geometry) → boolean Returns true if and only if no points of the second geometry/geography lie in the exterior of the first geometry/geography, and at least one point of the interior of the first geometry lies in the interior of the second geometry.
- ST_Equals(Geometry, Geometry) → boolean Returns true if the given geometries represent the same geometry/geography.
- ST_Within(Geometry, Geometry) → boolean Returns true if first geometry is completely inside second geometry.
Geospatial functions are typically expensive to evaluate, and using geoindex can greatly accelerate the query evaluation. Geoindexing in Pinot is based on Uber’s H3, a hexagon-based hierarchical gridding.
A given geospatial location (longitude, latitude) can map to one hexagon (represented as H3Index). And its neighbors in H3 can be approximated by a ring of hexagons. To quickly identify the distance between any given two geospatial locations, we can convert the two locations in the H3Index, and then check the H3 distance between them. H3 distance is measured as the number of hexagons.
For example, in the diagram below, the red hexagons are within the 1 distance of the central hexagon. The size of the hexagon is determined by the resolution of the indexing. Please check this table for the level of resolutions and the corresponding precision (measured in km).
Hexagonal grid in H3
Note the use of
transformFunctionthat converts the created point into
SphericalGeographyformat, which is needed by the
The query below will use the geoindex to filter the Starbucks stores within 5km of the given point in the bay area.
SELECT address, ST_DISTANCE(location_st_point, ST_Point(-122, 37, 1))
WHERE ST_DISTANCE(location_st_point, ST_Point(-122, 37, 1)) < 5000
Geoindex in Pinot accelerates the query evaluation without compromising the correctness of the query result. Currently, geoindex supports the
ST_Distancefunction used in the range predicates in the
WHEREclause, as shown in the query example in the previous section.
At the high level, geoindex is used for retrieving the records within the nearby hexagons of the given location, and then use
ST_Distanceto accurately filter the matched results.
As in the example diagram above, if we want to find all relevant points within a given distance at San Francisco (represented in the area within the red circle), then the algorithm with geoindex works as the following:
- Find the H3 distance
xthat contains the range (i.e. red circle)
- For the points falling into the H3 distance (i.e. in the hexagons of
kRing(x)), we do filtering on them by evaluating the condition
ST_Distance(loc1, loc2) < x