LogoLogo
latest
latest
  • Introduction
  • Basics
    • Concepts
      • Pinot storage model
      • Architecture
      • Components
        • Cluster
          • Tenant
          • Server
          • Controller
          • Broker
          • Minion
        • Table
          • Segment
            • Deep Store
            • Segment threshold
            • Segment retention
          • Schema
          • Time boundary
        • Pinot Data Explorer
    • Getting Started
      • Running Pinot locally
      • Running Pinot in Docker
      • Quick Start Examples
      • Running in Kubernetes
      • Running on public clouds
        • Running on Azure
        • Running on GCP
        • Running on AWS
      • Create and update a table configuration
      • Batch import example
      • Stream ingestion example
      • HDFS as Deep Storage
      • Troubleshooting Pinot
      • Frequently Asked Questions (FAQs)
        • General
        • Pinot On Kubernetes FAQ
        • Ingestion FAQ
        • Query FAQ
        • Operations FAQ
    • Indexing
      • Bloom filter
      • Dictionary index
      • Forward index
      • FST index
      • Geospatial
      • Inverted index
      • JSON index
      • Native text index
      • Range index
      • Star-tree index
      • Text search support
      • Timestamp index
      • Vector index
    • Release notes
      • 1.3.0
      • 1.2.0
      • 1.1.0
      • 1.0.0
      • 0.12.1
      • 0.12.0
      • 0.11.0
      • 0.10.0
      • 0.9.3
      • 0.9.2
      • 0.9.1
      • 0.9.0
      • 0.8.0
      • 0.7.1
      • 0.6.0
      • 0.5.0
      • 0.4.0
      • 0.3.0
      • 0.2.0
      • 0.1.0
    • Recipes
      • Connect to Streamlit
      • Connect to Dash
      • Visualize data with Redash
      • GitHub Events Stream
  • For Users
    • Query
      • Querying Pinot
      • Query Syntax
        • Explain Plan (Single-Stage)
        • Filtering with IdSet
        • GapFill Function For Time-Series Dataset
        • Grouping Algorithm
        • JOINs
        • Lookup UDF Join
      • Query Options
      • Query Quotas
      • Query using Cursors
      • Multi-stage query
        • Understanding Stages
        • Stats
        • Optimizing joins
        • Join strategies
          • Random + broadcast join strategy
          • Query time partition join strategy
          • Colocated join strategy
          • Lookup join strategy
        • Hints
        • Operator Types
          • Aggregate
          • Filter
          • Join
          • Intersect
          • Leaf
          • Literal
          • Mailbox receive
          • Mailbox send
          • Minus
          • Sort or limit
          • Transform
          • Union
          • Window
        • Stage-Level Spooling
      • Explain plan
    • APIs
      • Broker Query API
        • Query Response Format
      • Broker GRPC API
      • Controller Admin API
      • Controller API Reference
    • External Clients
      • JDBC
      • Java
      • Python
      • Golang
    • Tutorials
      • Use OSS as Deep Storage for Pinot
      • Ingest Parquet Files from S3 Using Spark
      • Creating Pinot Segments
      • Use S3 as Deep Storage for Pinot
      • Use S3 and Pinot in Docker
      • Batch Data Ingestion In Practice
      • Schema Evolution
  • For Developers
    • Basics
      • Extending Pinot
        • Writing Custom Aggregation Function
        • Segment Fetchers
      • Contribution Guidelines
      • Code Setup
      • Code Modules and Organization
      • Dependency Management
      • Update documentation
    • Advanced
      • Data Ingestion Overview
      • Ingestion Aggregations
      • Ingestion Transformations
      • Null value support
      • Use the multi-stage query engine (v2)
      • Advanced Pinot Setup
    • Plugins
      • Write Custom Plugins
        • Input Format Plugin
        • Filesystem Plugin
        • Batch Segment Fetcher Plugin
        • Stream Ingestion Plugin
    • Design Documents
      • Segment Writer API
  • For Operators
    • Deployment and Monitoring
      • Set up cluster
      • Server Startup Status Checkers
      • Set up table
      • Set up ingestion
      • Decoupling Controller from the Data Path
      • Segment Assignment
      • Instance Assignment
      • Rebalance
        • Rebalance Servers
          • Examples and Scenarios
        • Rebalance Brokers
        • Rebalance Tenant
      • Separating data storage by age
        • Using multiple tenants
        • Using multiple directories
      • Pinot managed Offline flows
      • Minion merge rollup task
      • Consistent Push and Rollback
      • Access Control
      • Monitoring
      • Tuning
        • Tuning Default MMAP Advice
        • Real-time
        • Routing
        • Query Routing using Adaptive Server Selection
        • Query Scheduling
      • Upgrading Pinot with confidence
      • Managing Logs
      • OOM Protection Using Automatic Query Killing
      • Pause ingestion based on resource utilization
    • Command-Line Interface (CLI)
    • Configuration Recommendation Engine
    • Tutorials
      • Authentication
        • Basic auth access control
        • ZkBasicAuthAccessControl
      • Configuring TLS/SSL
      • Build Docker Images
      • Running Pinot in Production
      • Kubernetes Deployment
      • Amazon EKS (Kafka)
      • Amazon MSK (Kafka)
      • Monitor Pinot using Prometheus and Grafana
      • Performance Optimization Configurations
      • Segment Operations Throttling
      • Reload a table segment
  • Configuration Reference
    • Cluster
    • Controller
    • Broker
    • Server
    • Table
    • Ingestion
    • Schema
    • Database
    • Ingestion Job Spec
    • Monitoring Metrics
    • Plugin Reference
      • Stream Ingestion Connectors
      • VAR_POP
      • VAR_SAMP
      • STDDEV_POP
      • STDDEV_SAMP
    • Dynamic Environment
  • Manage Data
    • Import Data
      • SQL Insert Into From Files
      • Upload Pinot segment Using CommandLine
      • Batch Ingestion
        • Spark
        • Flink
        • Hadoop
        • Backfill Data
        • Dimension table
      • Stream Ingestion
        • Ingest streaming data from Apache Kafka
        • Ingest streaming data from Amazon Kinesis
        • Ingest streaming data from Apache Pulsar
        • Configure indexes
        • Stream ingestion with CLP
      • Upsert and Dedup
        • Stream ingestion with Upsert
        • Segment compaction on upserts
        • Stream ingestion with Dedup
      • Supported Data Formats
      • File Systems
        • Amazon S3
        • Azure Data Lake Storage
        • HDFS
        • Google Cloud Storage
      • Complex Type (Array, Map) Handling
        • Complex Type Examples (Unnest)
      • Ingest records with dynamic schemas
  • Functions
    • Aggregation Functions
    • Transformation Functions
    • Array Functions
    • Funnel Analysis Functions
    • Hash Functions
    • JSON Functions
    • User-Defined Functions (UDFs)
    • URL Functions
    • Unique Count and cardinality Estimation Functions
  • Window Functions
  • (Deprecating) Function List
    • ABS
    • ADD
    • ago
    • EXPR_MIN / EXPR_MAX
    • ARRAY_AGG
    • arrayConcatDouble
    • arrayConcatFloat
    • arrayConcatInt
    • arrayConcatLong
    • arrayConcatString
    • arrayContainsInt
    • arrayContainsString
    • arrayDistinctInt
    • arrayDistinctString
    • arrayIndexOfInt
    • arrayIndexOfString
    • ARRAYLENGTH
    • arrayRemoveInt
    • arrayRemoveString
    • arrayReverseInt
    • arrayReverseString
    • arraySliceInt
    • arraySliceString
    • arraySortInt
    • arraySortString
    • arrayUnionInt
    • arrayUnionString
    • AVGMV
    • Base64
    • caseWhen
    • ceil
    • CHR
    • codepoint
    • concat
    • count
    • COUNTMV
    • COVAR_POP
    • COVAR_SAMP
    • day
    • dayOfWeek
    • dayOfYear
    • DISTINCT
    • DISTINCTCOUNT
    • DISTINCTCOUNTMV
    • DISTINCT_COUNT_OFF_HEAP
    • SEGMENTPARTITIONEDDISTINCTCOUNT
    • DISTINCTCOUNTBITMAP
    • DISTINCTCOUNTBITMAPMV
    • DISTINCTCOUNTHLL
    • DISTINCTCOUNTHLLMV
    • DISTINCTCOUNTRAWHLL
    • DISTINCTCOUNTRAWHLLMV
    • DISTINCTCOUNTSMARTHLL
    • DISTINCTCOUNTHLLPLUS
    • DISTINCTCOUNTULL
    • DISTINCTCOUNTTHETASKETCH
    • DISTINCTCOUNTRAWTHETASKETCH
    • DISTINCTSUM
    • DISTINCTSUMMV
    • DISTINCTAVG
    • DISTINCTAVGMV
    • DIV
    • DATETIMECONVERT
    • DATETRUNC
    • exp
    • FIRSTWITHTIME
    • FLOOR
    • FrequentLongsSketch
    • FrequentStringsSketch
    • FromDateTime
    • FromEpoch
    • FromEpochBucket
    • FUNNELCOUNT
    • FunnelCompleteCount
    • FunnelMaxStep
    • FunnelMatchStep
    • GridDistance
    • Histogram
    • hour
    • isSubnetOf
    • JSONFORMAT
    • JSONPATH
    • JSONPATHARRAY
    • JSONPATHARRAYDEFAULTEMPTY
    • JSONPATHDOUBLE
    • JSONPATHLONG
    • JSONPATHSTRING
    • jsonextractkey
    • jsonextractscalar
    • LAG
    • LASTWITHTIME
    • LEAD
    • length
    • ln
    • lower
    • lpad
    • ltrim
    • max
    • MAXMV
    • MD5
    • millisecond
    • min
    • minmaxrange
    • MINMAXRANGEMV
    • MINMV
    • minute
    • MOD
    • mode
    • month
    • mult
    • now
    • percentile
    • percentileest
    • percentileestmv
    • percentilemv
    • percentiletdigest
    • percentiletdigestmv
    • percentilekll
    • percentilerawkll
    • percentilekllmv
    • percentilerawkllmv
    • quarter
    • regexpExtract
    • regexpReplace
    • remove
    • replace
    • reverse
    • round
    • roundDecimal
    • ROW_NUMBER
    • rpad
    • rtrim
    • second
    • sha
    • sha256
    • sha512
    • sqrt
    • startswith
    • ST_AsBinary
    • ST_AsText
    • ST_Contains
    • ST_Distance
    • ST_GeogFromText
    • ST_GeogFromWKB
    • ST_GeometryType
    • ST_GeomFromText
    • ST_GeomFromWKB
    • STPOINT
    • ST_Polygon
    • strpos
    • ST_Union
    • SUB
    • substr
    • sum
    • summv
    • TIMECONVERT
    • timezoneHour
    • timezoneMinute
    • ToDateTime
    • ToEpoch
    • ToEpochBucket
    • ToEpochRounded
    • TOJSONMAPSTR
    • toGeometry
    • toSphericalGeography
    • trim
    • upper
    • Url
    • UTF8
    • VALUEIN
    • week
    • year
    • Extract
    • yearOfWeek
    • FIRST_VALUE
    • LAST_VALUE
    • ST_GeomFromGeoJSON
    • ST_GeogFromGeoJSON
    • ST_AsGeoJSON
  • Reference
    • Single-stage query engine (v1)
    • Multi-stage query engine (v2)
    • Troubleshooting
      • Troubleshoot issues with the multi-stage query engine (v2)
      • Troubleshoot issues with ZooKeeper znodes
      • Realtime Ingestion Stopped
  • RESOURCES
    • Community
    • Team
    • Blogs
    • Presentations
    • Videos
  • Integrations
    • Tableau
    • Trino
    • ThirdEye
    • Superset
    • Presto
    • Spark-Pinot Connector
  • Contributing
    • Contribute Pinot documentation
    • Style guide
Powered by GitBook
On this page
  • Cryptographic Hash Functions
  • Non-Cryptographic Hash Functions
  • Notes

Was this helpful?

Edit on GitHub
Export as PDF
  1. Functions

Hash Functions

Apache Pinot provides a suite of hash functions to compute various hash values for data transformation within queries. These functions support cryptographic hashes (e.g., SHA, MD5) and non-cryptographic hashes (e.g., Murmur, Adler, CRC). Below is a detailed reference for each function.

Cryptographic Hash Functions


SHA

Computes the SHA-1 hash of the input.

Syntax

SHA(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: SHA-1 hash as a lowercase hex string.

Example

SELECT SHA(TO_UTF8('testString')) FROM myTable
-- Returns '956265657d0b637ef65b9b59f9f858eecf55ed6a'

SHA224

Computes the SHA-224 hash of the input.

Syntax

SHA224(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: SHA-224 hash as a lowercase hex string.

Example

SELECT SHA224(TO_UTF8('testString')) FROM myTable
-- Returns 'bb54d1095764bff72b570dcdc3172ed6d1b26695494528a0059c95ae'

SHA256

Computes the SHA-256 hash of the input.

Syntax

SHA256(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: SHA-256 hash as a lowercase hex string.

Example

SELECT SHA256(TO_UTF8('testString')) FROM myTable
-- Returns '4acf0b39d9c4766709a3689f553ac01ab550545ffa4544dfc0b2cea82fba02a3'

SHA512

Computes the SHA-512 hash of the input.

Syntax

SHA512(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: SHA-512 hash as a lowercase hex string.

Example

SELECT SHA512(TO_UTF8('testString')) FROM myTable
-- Returns 'c48af5a7f6d4a851fc8a434eed638ab1a6ef68e19dbcae894ac67c9fbc5bcb0182b8e7123b3df3c9e4dcb7690c23103f03dc17f54352071ceb2a4eb204b26b91'

MD2

Computes the MD2 hash of the input.

Syntax

MD2(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: MD2 hash as a lowercase hex string.

Example

SELECT MD2(TO_UTF8('testString')) FROM myTable
-- Returns '466c453913ba0d8325f96b2d47984fb5'

MD5

Computes the MD5 hash of the input.

Syntax

MD5(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: MD5 hash as a lowercase hex string.

Example

SELECT MD5(TO_UTF8('testString')) FROM myTable
-- Returns '536788f4dbdffeecfbb8f350a941eea3'

Non-Cryptographic Hash Functions


MurmurHash2

Computes a 32-bit MurmurHash2 value.

Syntax

MURMURHASH2(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • INT: 32-bit hash value (signed integer).

Example

SELECT MURMURHASH2(TO_UTF8('testString')) FROM myTable
-- Returns -534425817

MurmurHash2UTF8

Computes a 32-bit MurmurHash2 value for a UTF-8 string.

Syntax

MURMURHASH2UTF8(input)

Parameters

  • input (STRING): Input string (converted to UTF-8 bytes).

Returns

  • INT: 32-bit hash value (signed integer).

Example

SELECT MURMURHASH2UTF8('testString') FROM myTable
-- Returns -534425817

MurmurHash2Bit64

Computes a 64-bit MurmurHash2 value. Two overloads are supported:

Syntax 1 (No Seed)

MURMURHASH2BIT64(input)

Syntax 2 (With Seed)

MURMURHASH2BIT64(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT, optional): Seed value for the hash function.

Returns

  • LONG: 64-bit hash value (signed long).

Examples

SELECT MURMURHASH2BIT64(TO_UTF8('testString')) FROM myTable
-- Returns 3907736674355139845

SELECT MURMURHASH2BIT64(TO_UTF8('testString'), 12345) FROM myTable
-- Returns -2138976126980760436

MurmurHash3Bit32

Computes a 32-bit MurmurHash3 value with a seed.

Syntax

MURMURHASH3BIT32(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT): Seed value for the hash function.

Returns

  • INT: 32-bit hash value (signed integer).

Example

SELECT MURMURHASH3BIT32(TO_UTF8('testString'), 0) FROM myTable
-- Returns -1435605585

MurmurHash3Bit64

Computes a 64-bit MurmurHash3 value with a seed.

Syntax

MURMURHASH3BIT64(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT): Seed value for the hash function.

Returns

  • LONG: 64-bit hash value (signed long).

Example

SELECT MURMURHASH3BIT64(TO_UTF8('testString'), 0) FROM myTable
-- Returns -3652179990542706350

MurmurHash3Bit128

Computes a 128-bit MurmurHash3 value with a seed.

Syntax

MURMURHASH3BIT128(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT): Seed value for the hash function.

Returns

  • BYTES: 128-bit hash as a 16-byte array.

Example

SELECT MURMURHASH3BIT128(TO_UTF8('testString'), 0) FROM myTable
-- Returns byte array: [82, -103, -23, 15, -90, -39, 80, -51, 15, 73, -81, -28, 111, -21, -78, 108]

MurmurHash3X64Bit32 (x64 Optimized)

Computes a 32-bit MurmurHash3 optimized for x64 platforms.

Syntax

MURMURHASH3X64BIT32(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT): Seed value for the hash function.

Returns

  • INT: 32-bit hash value (signed integer).

Example

SELECT MURMURHASH3X64BIT32(TO_UTF8('testString'), 0) FROM myTable
-- Returns -1096986291

MurmurHash3X64Bit128 (x64 Optimized)

Computes a 128-bit MurmurHash3 optimized for x64 platforms.

Syntax

MURMURHASH3X64BIT128(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT): Seed value for the hash function.

Returns

  • BYTES: 128-bit hash as a 16-byte array.

Example

SELECT MURMURHASH3X64BIT128(TO_UTF8('testString'), 0) FROM myTable
-- Returns byte array: [-66, -99, 81, 77, -7, 29, 124, 76, 42, 38, -34, -42, -92, -83, 83, 13]

Adler32

Computes a 32-bit Adler checksum.

Syntax

ADLER32(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • INT: 32-bit Adler checksum (signed integer).

Example

SELECT ADLER32(TO_UTF8('testString')) FROM myTable
-- Returns 392102968

CRC32

Computes a 32-bit CRC (Cyclic Redundancy Check).

Syntax

CRC32(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • INT: 32-bit CRC32 value (signed integer).

Example

SELECT CRC32(TO_UTF8('testString')) FROM myTable
-- Returns 418708744

CRC32C

Computes a 32-bit CRC32C (Castagnoli).

Syntax

CRC32C(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • INT: 32-bit CRC32C value (signed integer).

Example

SELECT CRC32C(TO_UTF8('testString')) FROM myTable
-- Returns -1608760557

Notes

  1. Input Conversion: Use TO_UTF8(string) to convert strings to BYTES where required.

  2. Negative Values: Hash functions return signed integers/longs. Use CAST to interpret them as unsigned if needed.

  3. Byte Arrays: Functions like MURMURHASH3BIT128 return BYTES as a 16-byte array.

  4. Platform-Specific Variants: Functions like MURMURHASH3X64BIT32/64/128 are optimized for x64 architectures. The results could be different cross platform.

PreviousFunnel Analysis FunctionsNextJSON Functions

Last updated 2 months ago

Was this helpful?