Hash Functions

Apache Pinot provides a suite of hash functions to compute various hash values for data transformation within queries. These functions support cryptographic hashes (e.g., SHA, MD5) and non-cryptographic hashes (e.g., Murmur, Adler, CRC). Below is a detailed reference for each function.

Cryptographic Hash Functions


SHA

Computes the SHA-1 hash of the input.

Syntax

SHA(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: SHA-1 hash as a lowercase hex string.

Example

SELECT SHA(TO_UTF8('testString')) FROM myTable
-- Returns '956265657d0b637ef65b9b59f9f858eecf55ed6a'

SHA224

Computes the SHA-224 hash of the input.

Syntax

SHA224(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: SHA-224 hash as a lowercase hex string.

Example

SELECT SHA224(TO_UTF8('testString')) FROM myTable

SHA256

Computes the SHA-256 hash of the input.

Syntax

SHA256(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: SHA-256 hash as a lowercase hex string.

Example

SELECT SHA256(TO_UTF8('testString')) FROM myTable

SHA512

Computes the SHA-512 hash of the input.

Syntax

SHA512(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: SHA-512 hash as a lowercase hex string.

Example

SELECT SHA512(TO_UTF8('testString')) FROM myTable

MD2

Computes the MD2 hash of the input.

Syntax

MD2(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: MD2 hash as a lowercase hex string.

Example

SELECT MD2(TO_UTF8('testString')) FROM myTable

MD5

Computes the MD5 hash of the input.

Syntax

MD5(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • STRING: MD5 hash as a lowercase hex string.

Example

SELECT MD5(TO_UTF8('testString')) FROM myTable

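To cross-check digests outside Pinot, the sketch below uses Python's standard hashlib module. It assumes that each Pinot function applies the standard algorithm to the raw input bytes and returns lowercase hex, as the descriptions above suggest; that equivalence and the name mapping in ALGORITHMS are assumptions for illustration, not something verified against a Pinot cluster. MD2 is left out because hashlib does not guarantee it.

```python
import hashlib

# UTF-8 bytes of the sample string, mirroring TO_UTF8('testString').
data = "testString".encode("utf-8")

# Pinot function name -> hashlib algorithm name (mapping assumed for illustration).
ALGORITHMS = {
    "SHA": "sha1",
    "SHA224": "sha224",
    "SHA256": "sha256",
    "SHA512": "sha512",
    "MD5": "md5",
}

digests = {
    name: hashlib.new(algo, data).hexdigest()
    for name, algo in ALGORITHMS.items()
}

for name, hex_digest in digests.items():
    # Each hex character encodes 4 bits of the digest.
    print(f"{name}: {len(hex_digest) * 4} bits -> {hex_digest}")
```

The hex string length is fixed per algorithm (40 characters for SHA-1, 32 for MD5, and so on), which is a quick sanity check on a column of stored digests.
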
Non-Cryptographic Hash Functions


MurmurHash2

Computes a 32-bit MurmurHash2 value.

Syntax

MURMURHASH2(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • INT: 32-bit hash value (signed integer).

Example

SELECT MURMURHASH2(TO_UTF8('testString')) FROM myTable

MurmurHash2UTF8

Computes a 32-bit MurmurHash2 value for a UTF-8 string.

Syntax

MURMURHASH2UTF8(input)

Parameters

  • input (STRING): Input string (converted to UTF-8 bytes).

Returns

  • INT: 32-bit hash value (signed integer).

Example

SELECT MURMURHASH2UTF8('testString') FROM myTable

MurmurHash2Bit64

Computes a 64-bit MurmurHash2 value. Two overloads are supported:

Syntax 1 (No Seed)

MURMURHASH2BIT64(input)

Syntax 2 (With Seed)

MURMURHASH2BIT64(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT, optional): Seed value for the hash function.

Returns

  • LONG: 64-bit hash value (signed long).

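Examples

SELECT MURMURHASH2BIT64(TO_UTF8('testString')) FROM myTable
SELECT MURMURHASH2BIT64(TO_UTF8('testString'), 0) FROM myTable
-- The seed 0 is an arbitrary value chosen for illustration
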
MurmurHash3Bit32

Computes a 32-bit MurmurHash3 value with a seed.

Syntax

MURMURHASH3BIT32(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT): Seed value for the hash function.

Returns

  • INT: 32-bit hash value (signed integer).

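Example

SELECT MURMURHASH3BIT32(TO_UTF8('testString'), 0) FROM myTable
-- The seed 0 is an arbitrary value chosen for illustration
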
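The effect of the seed can be seen in a pure-Python sketch of the public 32-bit MurmurHash3 algorithm (the x86_32 variant). Whether MURMURHASH3BIT32 matches this variant byte for byte is an assumption here, and the function name murmur3_32 is hypothetical. The sketch returns an unsigned value, whereas Pinot reports a signed INT.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Sketch of MurmurHash3 x86_32; returns an unsigned 32-bit value."""
    mask = 0xFFFFFFFF
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & mask

    def rotl(x: int, r: int) -> int:
        # 32-bit left rotation.
        return ((x << r) | (x >> (32 - r))) & mask

    # Body: consume the input four bytes at a time (little-endian).
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & mask
        k = rotl(k, 15)
        k = (k * c2) & mask
        h ^= k
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: mix in the remaining 0 to 3 bytes.
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & mask
        k = rotl(k, 15)
        k = (k * c2) & mask
        h ^= k

    # Finalization: fold in the length, then avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


payload = "testString".encode("utf-8")
for seed in (0, 1, 42):
    # The same input hashed with different seeds gives unrelated-looking values.
    print(seed, hex(murmur3_32(payload, seed)))
```

Because the seed is the initial hash state, two systems must agree on it to produce comparable hashes for the same input.
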
MurmurHash3Bit64

Computes a 64-bit MurmurHash3 value with a seed.

Syntax

MURMURHASH3BIT64(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT): Seed value for the hash function.

Returns

  • LONG: 64-bit hash value (signed long).

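Example

SELECT MURMURHASH3BIT64(TO_UTF8('testString'), 0) FROM myTable
-- The seed 0 is an arbitrary value chosen for illustration
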
MurmurHash3Bit128

Computes a 128-bit MurmurHash3 value with a seed.

Syntax

MURMURHASH3BIT128(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT): Seed value for the hash function.

Returns

  • BYTES: 128-bit hash as a 16-byte array.

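Example

SELECT MURMURHASH3BIT128(TO_UTF8('testString'), 0) FROM myTable
-- The seed 0 is an arbitrary value chosen for illustration
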
MurmurHash3X64Bit32 (x64 Optimized)

Computes a 32-bit MurmurHash3 optimized for x64 platforms.

Syntax

MURMURHASH3X64BIT32(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT): Seed value for the hash function.

Returns

  • INT: 32-bit hash value (signed integer).

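Example

SELECT MURMURHASH3X64BIT32(TO_UTF8('testString'), 0) FROM myTable
-- The seed 0 is an arbitrary value chosen for illustration
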
MurmurHash3X64Bit128 (x64 Optimized)

Computes a 128-bit MurmurHash3 optimized for x64 platforms.

Syntax

MURMURHASH3X64BIT128(input, seed)

Parameters

  • input (BYTES): Input byte array to hash.

  • seed (INT): Seed value for the hash function.

Returns

  • BYTES: 128-bit hash as a 16-byte array.

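Example

SELECT MURMURHASH3X64BIT128(TO_UTF8('testString'), 0) FROM myTable
-- The seed 0 is an arbitrary value chosen for illustration
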
Adler32

Computes a 32-bit Adler checksum.

Syntax

ADLER32(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • INT: 32-bit Adler checksum (signed integer).

Example

SELECT ADLER32(TO_UTF8('testString')) FROM myTable

CRC32

Computes a 32-bit CRC (Cyclic Redundancy Check).

Syntax

CRC32(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • INT: 32-bit CRC32 value (signed integer).

Example

SELECT CRC32(TO_UTF8('testString')) FROM myTable

CRC32C

Computes a 32-bit CRC32C (Castagnoli).

Syntax

CRC32C(input)

Parameters

  • input (BYTES): Input byte array to hash.

Returns

  • INT: 32-bit CRC32C value (signed integer).

Example

SELECT CRC32C(TO_UTF8('testString')) FROM myTable

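The checksum functions can be compared against Python's standard zlib module, as sketched below. This assumes Pinot's ADLER32 and CRC32 follow the same standard algorithms as zlib, which is not confirmed here. The helper to_signed32 is a hypothetical name. CRC32C is omitted because the Python standard library does not provide it.

```python
import zlib


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


# UTF-8 bytes of the sample string, mirroring TO_UTF8('testString').
data = "testString".encode("utf-8")

crc = zlib.crc32(data)      # unsigned, in the range 0 .. 2**32 - 1
adler = zlib.adler32(data)  # unsigned, in the range 0 .. 2**32 - 1

print("CRC32  :", crc, "as signed INT:", to_signed32(crc))
print("Adler32:", adler, "as signed INT:", to_signed32(adler))

# The standard CRC-32 check input has its top bit set, so the value
# becomes negative once it is stored in a signed 32-bit integer.
check = zlib.crc32(b"123456789")
print(hex(check), "as signed INT:", to_signed32(check))
```

Any checksum of 2^31 or more appears negative in a signed INT column, which explains negative results from these functions.
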
Notes

  1. Input Conversion: Use TO_UTF8(string) to convert strings to BYTES where required.

  2. Negative Values: The 32-bit and 64-bit hash functions return signed INT and LONG values, so results can be negative. Use CAST to interpret them as unsigned if needed.

  3. Byte Arrays: The 128-bit functions, such as MURMURHASH3BIT128, return a 16-byte BYTES value.

  4. Platform-Specific Variants: Functions such as MURMURHASH3X64BIT32/64/128 are optimized for x64 architectures. Results may differ across platforms.
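
The notes on negative values and byte arrays can also be handled on the client side. The sketch below is one possible approach in Python, not a Pinot API: to_unsigned is a hypothetical helper, the sample values are placeholders rather than real query results, and the 16-byte value is assumed to arrive as a hex string.

```python
def to_unsigned(value: int, bits: int) -> int:
    """Map a signed two's-complement integer of the given width to its unsigned form."""
    return value & ((1 << bits) - 1)


# Placeholder values standing in for query results.
signed_int = -873187034   # an INT result from a 32-bit hash function
signed_long = -1          # a LONG result from a 64-bit hash function

print(to_unsigned(signed_int, 32))   # 3421780262
print(to_unsigned(signed_long, 64))  # 18446744073709551615

# A 128-bit hash is 16 bytes; the hex string below is a placeholder, not a real hash.
hex_value = "00112233445566778899aabbccddeeff"
raw = bytes.fromhex(hex_value)
print(len(raw))  # 16

# The same 128 bits viewed as a single non-negative integer (big-endian order assumed).
as_integer = int.from_bytes(raw, "big")
print(as_integer)
```

Masking with 2^bits - 1 keeps the bit pattern intact and only changes how it is read, so the unsigned value can be compared with output from tools that report unsigned hashes.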
