LogoLogo
release-1.3.0
release-1.3.0
  • Introduction
  • Basics
    • Concepts
      • Pinot storage model
      • Architecture
      • Components
        • Cluster
          • Tenant
          • Server
          • Controller
          • Broker
          • Minion
        • Table
          • Segment
            • Deep Store
            • Segment threshold
            • Segment retention
          • Schema
          • Time boundary
        • Pinot Data Explorer
    • Getting Started
      • Running Pinot locally
      • Running Pinot in Docker
      • Quick Start Examples
      • Running in Kubernetes
      • Running on public clouds
        • Running on Azure
        • Running on GCP
        • Running on AWS
      • Create and update a table configuration
      • Batch import example
      • Stream ingestion example
      • HDFS as Deep Storage
      • Troubleshooting Pinot
      • Frequently Asked Questions (FAQs)
        • General
        • Pinot On Kubernetes FAQ
        • Ingestion FAQ
        • Query FAQ
        • Operations FAQ
    • Import Data
      • From Query Console
      • Batch Ingestion
        • Spark
        • Flink
        • Hadoop
        • Backfill Data
        • Dimension table
      • Stream ingestion
        • Ingest streaming data from Apache Kafka
        • Ingest streaming data from Amazon Kinesis
        • Ingest streaming data from Apache Pulsar
        • Configure indexes
      • Stream ingestion with Upsert
      • Segment compaction on upserts
      • Stream ingestion with Dedup
      • Stream ingestion with CLP
      • File Systems
        • Amazon S3
        • Azure Data Lake Storage
        • HDFS
        • Google Cloud Storage
      • Input formats
        • Complex Type (Array, Map) Handling
        • Complex Type Examples
        • Ingest records with dynamic schemas
      • Reload a table segment
      • Upload a table segment
    • Indexing
      • Bloom filter
      • Dictionary index
      • Forward index
      • FST index
      • Geospatial
      • Inverted index
      • JSON index
      • Native text index
      • Range index
      • Star-tree index
      • Text search support
      • Timestamp index
      • Vector index
    • Release notes
      • 1.3.0
      • 1.2.0
      • 1.1.0
      • 1.0.0
      • 0.12.1
      • 0.12.0
      • 0.11.0
      • 0.10.0
      • 0.9.3
      • 0.9.2
      • 0.9.1
      • 0.9.0
      • 0.8.0
      • 0.7.1
      • 0.6.0
      • 0.5.0
      • 0.4.0
      • 0.3.0
      • 0.2.0
      • 0.1.0
    • Recipes
      • Connect to Streamlit
      • Connect to Dash
      • Visualize data with Redash
      • GitHub Events Stream
  • For Users
    • Query
      • Querying Pinot
      • Query Syntax
        • Aggregation Functions
        • Array Functions
        • Cardinality Estimation
        • Explain Plan (Single-Stage)
        • Filtering with IdSet
        • Funnel Analysis
        • GapFill Function For Time-Series Dataset
        • Grouping Algorithm
        • Hash Functions
        • JOINs
        • Lookup UDF Join
        • Querying JSON data
        • Transformation Functions
        • URL Functions
        • Window Functions
      • Query Options
      • Query Quotas
      • Query using Cursors
      • Multi-stage query
        • Understanding Stages
        • Stats
        • Optimizing joins
        • Join strategies
          • Random + broadcast join strategy
          • Query time partition join strategy
          • Colocated join strategy
          • Lookup join strategy
        • Hints
        • Operator Types
          • Aggregate
          • Filter
          • Join
          • Intersect
          • Leaf
          • Literal
          • Mailbox receive
          • Mailbox send
          • Minus
          • Sort or limit
          • Transform
          • Union
          • Window
        • Stage-Level Spooling
      • User-Defined Functions (UDFs)
      • Explain plan
    • APIs
      • Broker Query API
        • Query Response Format
      • Controller Admin API
      • Controller API Reference
    • External Clients
      • JDBC
      • Java
      • Python
      • Golang
    • Tutorials
      • Use OSS as Deep Storage for Pinot
      • Ingest Parquet Files from S3 Using Spark
      • Creating Pinot Segments
      • Use S3 as Deep Storage for Pinot
      • Use S3 and Pinot in Docker
      • Batch Data Ingestion In Practice
      • Schema Evolution
  • For Developers
    • Basics
      • Extending Pinot
        • Writing Custom Aggregation Function
        • Segment Fetchers
      • Contribution Guidelines
      • Code Setup
      • Code Modules and Organization
      • Dependency Management
      • Update documentation
    • Advanced
      • Data Ingestion Overview
      • Ingestion Aggregations
      • Ingestion Transformations
      • Null value support
      • Use the multi-stage query engine (v2)
      • Advanced Pinot Setup
    • Plugins
      • Write Custom Plugins
        • Input Format Plugin
        • Filesystem Plugin
        • Batch Segment Fetcher Plugin
        • Stream Ingestion Plugin
    • Design Documents
      • Segment Writer API
  • For Operators
    • Deployment and Monitoring
      • Set up cluster
      • Server Startup Status Checkers
      • Set up table
      • Set up ingestion
      • Decoupling Controller from the Data Path
      • Segment Assignment
      • Instance Assignment
      • Rebalance
        • Rebalance Servers
        • Rebalance Brokers
        • Rebalance Tenant
      • Separating data storage by age
        • Using multiple tenants
        • Using multiple directories
      • Pinot managed Offline flows
      • Minion merge rollup task
      • Consistent Push and Rollback
      • Access Control
      • Monitoring
      • Tuning
        • Tuning Default MMAP Advice
        • Real-time
        • Routing
        • Query Routing using Adaptive Server Selection
        • Query Scheduling
      • Upgrading Pinot with confidence
      • Managing Logs
      • OOM Protection Using Automatic Query Killing
      • Pause ingestion based on resource utilization
    • Command-Line Interface (CLI)
    • Configuration Recommendation Engine
    • Tutorials
      • Authentication
        • Basic auth access control
        • ZkBasicAuthAccessControl
      • Configuring TLS/SSL
      • Build Docker Images
      • Running Pinot in Production
      • Kubernetes Deployment
      • Amazon EKS (Kafka)
      • Amazon MSK (Kafka)
      • Monitor Pinot using Prometheus and Grafana
      • Performance Optimization Configurations
      • Segment Operations Throttling
  • Configuration Reference
    • Cluster
    • Controller
    • Broker
    • Server
    • Table
    • Ingestion
    • Schema
    • Ingestion Job Spec
    • Monitoring Metrics
    • Functions
      • ABS
      • ADD
      • ago
      • EXPR_MIN / EXPR_MAX
      • ARRAY_AGG
      • arrayConcatDouble
      • arrayConcatFloat
      • arrayConcatInt
      • arrayConcatLong
      • arrayConcatString
      • arrayContainsInt
      • arrayContainsString
      • arrayDistinctInt
      • arrayDistinctString
      • arrayIndexOfInt
      • arrayIndexOfString
      • ARRAYLENGTH
      • arrayRemoveInt
      • arrayRemoveString
      • arrayReverseInt
      • arrayReverseString
      • arraySliceInt
      • arraySliceString
      • arraySortInt
      • arraySortString
      • arrayUnionInt
      • arrayUnionString
      • AVGMV
      • Base64
      • caseWhen
      • ceil
      • CHR
      • codepoint
      • concat
      • count
      • COUNTMV
      • COVAR_POP
      • COVAR_SAMP
      • day
      • dayOfWeek
      • dayOfYear
      • DISTINCT
      • DISTINCTAVG
      • DISTINCTAVGMV
      • DISTINCTCOUNT
      • DISTINCTCOUNTBITMAP
      • DISTINCTCOUNTBITMAPMV
      • DISTINCTCOUNTHLL
      • DISTINCTCOUNTSMARTHLL
      • DISTINCTCOUNTHLLPLUS
      • DISTINCTCOUNTHLLMV
      • DISTINCTCOUNTMV
      • DISTINCTCOUNTRAWHLL
      • DISTINCTCOUNTRAWHLLMV
      • DISTINCTCOUNTRAWTHETASKETCH
      • DISTINCTCOUNTTHETASKETCH
      • DISTINCTCOUNTULL
      • DISTINCTSUM
      • DISTINCTSUMMV
      • DIV
      • DATETIMECONVERT
      • DATETRUNC
      • exp
      • FIRSTWITHTIME
      • FLOOR
      • FrequentLongsSketch
      • FrequentStringsSketch
      • FromDateTime
      • FromEpoch
      • FromEpochBucket
      • FUNNELCOUNT
      • FunnelCompleteCount
      • FunnelMaxStep
      • FunnelMatchStep
      • Histogram
      • hour
      • isSubnetOf
      • JSONFORMAT
      • JSONPATH
      • JSONPATHARRAY
      • JSONPATHARRAYDEFAULTEMPTY
      • JSONPATHDOUBLE
      • JSONPATHLONG
      • JSONPATHSTRING
      • jsonextractkey
      • jsonextractscalar
      • LAG
      • LASTWITHTIME
      • LEAD
      • length
      • ln
      • lower
      • lpad
      • ltrim
      • max
      • MAXMV
      • MD5
      • millisecond
      • min
      • minmaxrange
      • MINMAXRANGEMV
      • MINMV
      • minute
      • MOD
      • mode
      • month
      • mult
      • now
      • percentile
      • percentileest
      • percentileestmv
      • percentilemv
      • percentiletdigest
      • percentiletdigestmv
      • percentilekll
      • percentilerawkll
      • percentilekllmv
      • percentilerawkllmv
      • quarter
      • regexpExtract
      • regexpReplace
      • remove
      • replace
      • reverse
      • round
      • roundDecimal
      • ROW_NUMBER
      • rpad
      • rtrim
      • second
      • SEGMENTPARTITIONEDDISTINCTCOUNT
      • sha
      • sha256
      • sha512
      • sqrt
      • startswith
      • ST_AsBinary
      • ST_AsText
      • ST_Contains
      • ST_Distance
      • ST_GeogFromText
      • ST_GeogFromWKB
      • ST_GeometryType
      • ST_GeomFromText
      • ST_GeomFromWKB
      • STPOINT
      • ST_Polygon
      • strpos
      • ST_Union
      • SUB
      • substr
      • sum
      • summv
      • TIMECONVERT
      • timezoneHour
      • timezoneMinute
      • ToDateTime
      • ToEpoch
      • ToEpochBucket
      • ToEpochRounded
      • TOJSONMAPSTR
      • toGeometry
      • toSphericalGeography
      • trim
      • upper
      • Url
      • UTF8
      • VALUEIN
      • week
      • year
      • Extract
      • yearOfWeek
      • FIRST_VALUE
      • LAST_VALUE
      • ST_GeomFromGeoJSON
      • ST_GeogFromGeoJSON
      • ST_AsGeoJSON
    • Plugin Reference
      • Stream Ingestion Connectors
      • VAR_POP
      • VAR_SAMP
      • STDDEV_POP
      • STDDEV_SAMP
    • Dynamic Environment
  • Reference
    • Single-stage query engine (v1)
    • Multi-stage query engine (v2)
    • Troubleshooting
      • Troubleshoot issues with the multi-stage query engine (v2)
      • Troubleshoot issues with ZooKeeper znodes
  • RESOURCES
    • Community
    • Team
    • Blogs
    • Presentations
    • Videos
  • Integrations
    • Tableau
    • Trino
    • ThirdEye
    • Superset
    • Presto
    • Spark-Pinot Connector
  • Contributing
    • Contribute Pinot documentation
    • Style guide
Powered by GitBook
On this page
  • Cryptographic Hash Functions
  • Non-Cryptographic Hash Functions
  • Notes

Was this helpful?

Export as PDF
  1. For Users
  2. Query
  3. Query Syntax

Hash Functions

Apache Pinot provides a suite of hash functions to compute various hash values for data transformation within queries. These functions support cryptographic hashes (e.g., SHA, MD5) and non-cryptographic hashes (e.g., Murmur, Adler, CRC). Below is a detailed reference for each function.

Cryptographic Hash Functions


SHA

Computes the SHA-1 hash of the input.

Syntax

SHA(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: SHA-1 hash as a lowercase hex string.

Example

SELECT SHA(TO_UTF8('testString')) FROM myTable
-- Returns '956265657d0b637ef65b9b59f9f858eecf55ed6a'

SHA224

Computes the SHA-224 hash of the input.

Syntax

SHA224(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: SHA-224 hash as a lowercase hex string.

Example

SELECT SHA224(TO_UTF8('testString')) FROM myTable
-- Returns 'bb54d1095764bff72b570dcdc3172ed6d1b26695494528a0059c95ae'

SHA256

Computes the SHA-256 hash of the input.

Syntax

SHA256(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: SHA-256 hash as a lowercase hex string.

Example

SELECT SHA256(TO_UTF8('testString')) FROM myTable
-- Returns '4acf0b39d9c4766709a3689f553ac01ab550545ffa4544dfc0b2cea82fba02a3'

SHA512

Computes the SHA-512 hash of the input.

Syntax

SHA512(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: SHA-512 hash as a lowercase hex string.

Example

SELECT SHA512(TO_UTF8('testString')) FROM myTable
-- Returns 'c48af5a7f6d4a851fc8a434eed638ab1a6ef68e19dbcae894ac67c9fbc5bcb0182b8e7123b3df3c9e4dcb7690c23103f03dc17f54352071ceb2a4eb204b26b91'

MD2

Computes the MD2 hash of the input.

Syntax

MD2(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: MD2 hash as a lowercase hex string.

Example

SELECT MD2(TO_UTF8('testString')) FROM myTable
-- Returns '466c453913ba0d8325f96b2d47984fb5'

MD5

Computes the MD5 hash of the input.

Syntax

MD5(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: MD5 hash as a lowercase hex string.

Example

SELECT MD5(TO_UTF8('testString')) FROM myTable
-- Returns '536788f4dbdffeecfbb8f350a941eea3'

Non-Cryptographic Hash Functions


MurmurHash2

Computes a 32-bit MurmurHash2 value.

Syntax

MURMURHASH2(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • INT: 32-bit hash value (signed integer).

Example

SELECT MURMURHASH2(TO_UTF8('testString')) FROM myTable
-- Returns -534425817

MurmurHash2UTF8

Computes a 32-bit MurmurHash2 value for a UTF-8 string.

Syntax

MURMURHASH2UTF8(input)

Parameters

  • input (STRING): Input string (converted to UTF-8 bytes).

Returns

  • INT: 32-bit hash value (signed integer).

Example

SELECT MURMURHASH2UTF8('testString') FROM myTable
-- Returns -534425817

MurmurHash2Bit64

Computes a 64-bit MurmurHash2 value. Two overloads are supported:

Syntax 1 (No Seed)

MURMURHASH2BIT64(input)

Syntax 2 (With Seed)

MURMURHASH2BIT64(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT, optional): Seed value for the hash function.

Returns

  • LONG: 64-bit hash value (signed long).

Examples

SELECT MURMURHASH2BIT64(TO_UTF8('testString')) FROM myTable
-- Returns 3907736674355139845

SELECT MURMURHASH2BIT64(TO_UTF8('testString'), 12345) FROM myTable
-- Returns -2138976126980760436

MurmurHash3Bit32

Computes a 32-bit MurmurHash3 value with a seed.

Syntax

MURMURHASH3BIT32(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT): Seed value for the hash function.

Returns

  • INT: 32-bit hash value (signed integer).

Example

SELECT MURMURHASH3BIT32(TO_UTF8('testString'), 0) FROM myTable
-- Returns -1435605585

MurmurHash3Bit64

Computes a 64-bit MurmurHash3 value with a seed.

Syntax

MURMURHASH3BIT64(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT): Seed value for the hash function.

Returns

  • LONG: 64-bit hash value (signed long).

Example

SELECT MURMURHASH3BIT64(TO_UTF8('testString'), 0) FROM myTable
-- Returns -3652179990542706350

MurmurHash3Bit128

Computes a 128-bit MurmurHash3 value with a seed.

Syntax

MURMURHASH3BIT128(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT): Seed value for the hash function.

Returns

  • BYTES: 128-bit hash as a 16-byte array.

Example

SELECT MURMURHASH3BIT128(TO_UTF8('testString'), 0) FROM myTable
-- Returns byte array: [82, -103, -23, 15, -90, -39, 80, -51, 15, 73, -81, -28, 111, -21, -78, 108]

MurmurHash3X64Bit32 (x64 Optimized)

Computes a 32-bit MurmurHash3 optimized for x64 platforms.

Syntax

MURMURHASH3X64BIT32(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT): Seed value for the hash function.

Returns

  • INT: 32-bit hash value (signed integer).

Example

SELECT MURMURHASH3X64BIT32(TO_UTF8('testString'), 0) FROM myTable
-- Returns -1096986291

MurmurHash3X64Bit128 (x64 Optimized)

Computes a 128-bit MurmurHash3 optimized for x64 platforms.

Syntax

MURMURHASH3X64BIT128(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT): Seed value for the hash function.

Returns

  • BYTES: 128-bit hash as a 16-byte array.

Example

SELECT MURMURHASH3X64BIT128(TO_UTF8('testString'), 0) FROM myTable
-- Returns byte array: [-66, -99, 81, 77, -7, 29, 124, 76, 42, 38, -34, -42, -92, -83, 83, 13]

Adler32

Computes a 32-bit Adler checksum.

Syntax

ADLER32(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • INT: 32-bit Adler checksum (signed integer).

Example

SELECT ADLER32(TO_UTF8('testString')) FROM myTable
-- Returns 392102968

CRC32

Computes a 32-bit CRC (Cyclic Redundancy Check).

Syntax

CRC32(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • INT: 32-bit CRC32 value (signed integer).

Example

SELECT CRC32(TO_UTF8('testString')) FROM myTable
-- Returns 418708744

CRC32C

Computes a 32-bit CRC32C (Castagnoli).

Syntax

CRC32C(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • INT: 32-bit CRC32C value (signed integer).

Example

SELECT CRC32C(TO_UTF8('testString')) FROM myTable
-- Returns -1608760557

Notes

  1. Input Conversion: Use TO_UTF8(string) to convert strings to BYTES where required.

  2. Negative Values: Hash functions return signed integers/longs. Use CAST to interpret them as unsigned if needed.

  3. Byte Arrays: Functions like MURMURHASH3BIT128 return BYTES as a 16-byte array.

  4. Platform-Specific Variants: Functions like MURMURHASH3X64BIT32/64/128 are optimized for x64 architectures. The results could be different cross platform.

PreviousGrouping AlgorithmNextJOINs

Was this helpful?