Vector / Similarity Functions

Pinot provides built-in vector similarity and utility functions for working with float array columns. These functions support nearest-neighbor search, recommendation systems, retrieval-augmented generation (RAG), and any use case involving embedding vectors.

Functions that take two vectors require both inputs to have the same number of dimensions. Passing a null vector, or two vectors of different lengths, results in an error.

To accelerate vector search queries with approximate nearest-neighbor (ANN) lookup, configure a vector index on your float array column and use the VECTOR_SIMILARITY predicate described below.

VECTOR_SIMILARITY

VECTOR_SIMILARITY is a WHERE-clause predicate that performs approximate nearest-neighbor (ANN) search using a vector index. It is not a standalone function—it acts as a filter that returns the top-K documents whose vectors are closest to the given query vector.

Syntax

WHERE VECTOR_SIMILARITY(vectorColumn, queryVector [, topK])

| Parameter | Type | Description |
| --- | --- | --- |
| vectorColumn | identifier | A multi-valued FLOAT column with a vector index configured. |
| queryVector | ARRAY[...] | A float array literal representing the query embedding. |
| topK | integer literal | Number of nearest neighbors to retrieve. Optional; defaults to 10 if omitted. |

Prerequisites

VECTOR_SIMILARITY requires a vector index on the target column. Without a vector index the predicate will fail. See the vector index documentation for setup instructions.

Minimal field config:

{
  "fieldConfigList": [
    {
      "name": "embedding",
      "encodingType": "RAW",
      "indexes": {
        "vector": {
          "vectorIndexType": "HNSW",
          "vectorDimension": 512,
          "vectorDistanceFunction": "COSINE",
          "version": 1
        }
      }
    }
  ]
}

Examples

Find the 5 nearest products to a query embedding and rank by cosine distance:

Combine with metadata filters — retrieve 20 ANN candidates and then filter by category:
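The interaction between topK and a metadata filter can be sketched in plain Python. This is not Pinot code: an exact brute-force search stands in for the index lookup, and the `rows` data, the `category` field, and the helper names are all hypothetical. Assuming the filter is applied to the candidate set as described above, the sketch shows that a query may return fewer rows than topK even when more matching rows exist in the table.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; assumes non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k_then_filter(rows, query, top_k, category):
    """Select the top_k nearest rows first, then apply the metadata filter."""
    # Exact search is a stand-in for the approximate index lookup.
    candidates = sorted(
        rows, key=lambda r: cosine_distance(r["embedding"], query)
    )[:top_k]
    return [r for r in candidates if r["category"] == category]

# Hypothetical data: two "shoes" rows, but only one is among the 2 nearest.
rows = [
    {"id": 1, "category": "shoes", "embedding": [1.0, 0.0]},
    {"id": 2, "category": "hats", "embedding": [0.9, 0.1]},
    {"id": 3, "category": "shoes", "embedding": [0.0, 1.0]},
]
result = top_k_then_filter(rows, query=[1.0, 0.0], top_k=2, category="shoes")
print([r["id"] for r in result])  # [1]
```

If a filter is selective, requesting a larger topK than the number of rows you ultimately need leaves more candidates for the filter to work with.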

Distance Functions

cosineDistance

Returns the cosine distance between two vectors, defined as 1 - cosine_similarity. The result ranges from 0 (identical direction) to 2 (opposite direction).

If either vector has a norm of zero, the two-argument form returns NaN while the three-argument form returns the specified defaultValue.
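The definition and the zero-norm behavior can be illustrated with a small Python reference sketch. It mirrors the semantics described above rather than Pinot's actual implementation; the function name and the `default` parameter are illustrative stand-ins for the two- and three-argument forms.

```python
import math

def cosine_distance(v1, v2, default=float("nan")):
    """Return 1 - cosine_similarity, or `default` if either norm is zero."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        return default  # NaN in the two-argument form
    return 1.0 - dot / (norm1 * norm2)

same = cosine_distance([1.0, 2.0], [2.0, 4.0])        # same direction, close to 0
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])   # opposite direction, 2.0
zero_nan = cosine_distance([0.0, 0.0], [1.0, 0.0])    # zero norm, NaN
zero_default = cosine_distance([0.0, 0.0], [1.0, 0.0], 1.0)  # zero norm, 1.0
```

Because NaN does not compare or sort predictably, supplying an explicit default is the safer choice when zero vectors may be present and the result feeds an ORDER BY.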

innerProduct

Returns the inner product (sum of element-wise products) of two vectors. Useful when embeddings are pre-normalized and higher scores indicate greater similarity.
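To see why pre-normalization matters, the following Python sketch (illustrative helper names, not Pinot code) computes the inner product before and after scaling two vectors to unit length. For unit vectors the inner product equals the cosine similarity, so it is bounded by 1 and independent of the original magnitudes.

```python
import math

def inner_product(v1, v2):
    """Sum of element-wise products of two equal-length vectors."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(a * b for a, b in zip(v1, v2))

def normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

raw = inner_product([3.0, 4.0], [4.0, 3.0])  # 24.0, depends on magnitudes
score = inner_product(normalize([3.0, 4.0]), normalize([4.0, 3.0]))
print(raw, round(score, 6))  # 24.0 0.96
```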

l1Distance

Returns the L1 distance (Manhattan distance) between two vectors, computed as the sum of absolute differences of their components.
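As a worked instance of the formula, here is a minimal Python sketch of the same computation (the function name is illustrative and this is not Pinot's implementation):

```python
def l1_distance(v1, v2):
    """Sum of absolute component-wise differences (Manhattan distance)."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(abs(a - b) for a, b in zip(v1, v2))

# |1 - 4| + |2 - 0| + |3 - 3| = 3 + 2 + 0
d = l1_distance([1.0, 2.0, 3.0], [4.0, 0.0, 3.0])
print(d)  # 5.0
```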

l2Distance

Returns the L2 distance (Euclidean distance) between two vectors, computed as the square root of the sum of squared differences.

euclideanDistance

Returns the squared Euclidean distance between two vectors (the sum of squared differences without the square root). This is computationally cheaper than l2Distance when you only need to compare relative distances, since omitting the square root preserves the ordering.
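The ordering claim can be checked with a short Python sketch. The two helpers below are reference versions of the formulas described for l2Distance and euclideanDistance, with illustrative names and sample points; they are not Pinot's implementation.

```python
import math

def squared_euclidean(v1, v2):
    """Sum of squared differences, without the square root."""
    return sum((a - b) ** 2 for a, b in zip(v1, v2))

def l2_distance(v1, v2):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(squared_euclidean(v1, v2))

query = [0.0, 0.0]
points = {"a": [3.0, 4.0], "b": [1.0, 1.0], "c": [0.0, 2.0]}

# Squared distances are 25, 2, and 4; the square root is monotonic,
# so both keys produce the same ranking.
by_squared = sorted(points, key=lambda k: squared_euclidean(points[k], query))
by_l2 = sorted(points, key=lambda k: l2_distance(points[k], query))
print(by_squared, by_l2)  # ['b', 'c', 'a'] ['b', 'c', 'a']
```

The values themselves differ (25.0 versus 5.0 for point "a"), so use l2Distance when the actual distance is reported or compared against a threshold expressed in Euclidean units.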

dotProduct

Returns the dot product of two vectors. Functionally equivalent to innerProduct.

Utility Functions

vectorDims

Returns the number of dimensions (length) of a vector.

vectorNorm

Returns the L2 norm (Euclidean length) of a vector, computed as the square root of the sum of squared components.
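One plausible use of these two utilities is a data-quality check on stored embeddings. The Python sketch below is a hypothetical illustration (the helper names and the `expected_dims` parameter are assumptions, and this is not Pinot code): it flags vectors with an unexpected dimension count, and vectors with zero norm, which would make cosine distance undefined.

```python
import math

def vector_dims(v):
    """Number of components in the vector."""
    return len(v)

def vector_norm(v):
    """L2 norm: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))

def embedding_problems(v, expected_dims):
    """Return a list of issues found in one embedding (empty if none)."""
    problems = []
    if vector_dims(v) != expected_dims:
        problems.append("wrong dimension")
    if vector_norm(v) == 0.0:
        problems.append("zero norm")
    return problems

print(vector_norm([3.0, 4.0]))                 # 5.0
print(embedding_problems([3.0, 4.0], 2))       # []
print(embedding_problems([0.0, 0.0, 0.0], 2))  # ['wrong dimension', 'zero norm']
```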

End-to-End Example

This example walks through setting up a table for semantic search over product reviews, from schema definition to querying.

1. Define the schema

The embedding column must be a multi-valued FLOAT field:

2. Configure the table with a vector index

Enable the HNSW vector index on the embedding column. Choose a distance function that matches how your embeddings were produced—COSINE is the most common choice for normalized text embeddings.

3. Query with vector similarity

Use VECTOR_SIMILARITY to retrieve nearest neighbors and a distance function to rank results:

This query first uses the HNSW index to retrieve the 10 approximate nearest neighbors, then orders those candidates by exact cosine distance and returns the top 5.
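The two phases can be mimicked in Python to make the data flow explicit. This is a sketch under stated assumptions, not Pinot's execution engine: an exact search stands in for the HNSW lookup, the `docs` data is synthetic, and `candidate_k` and `limit` are illustrative names for the topK argument and the LIMIT clause.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; assumes non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def search(docs, query, candidate_k, limit):
    # Phase 1: the filter yields an unordered candidate set of size candidate_k.
    nearest = sorted(docs, key=lambda d: cosine_distance(d["embedding"], query))
    candidate_ids = {d["id"] for d in nearest[:candidate_k]}
    # Phase 2: rank only the candidates by exact distance, then apply the limit.
    candidates = [d for d in docs if d["id"] in candidate_ids]
    ranked = sorted(candidates, key=lambda d: cosine_distance(d["embedding"], query))
    return [d["id"] for d in ranked[:limit]]

# Synthetic unit vectors at 10, 20, ..., 120 degrees from the query direction.
docs = [
    {"id": i, "embedding": [math.cos(math.radians(10 * i)), math.sin(math.radians(10 * i))]}
    for i in range(1, 13)
]
top = search(docs, [1.0, 0.0], candidate_k=10, limit=5)
print(top)  # [1, 2, 3, 4, 5]
```

With a real approximate index the candidate set may miss some true neighbors, which is one reason to request more candidates than the final limit.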

Function Summary

| Function | Return type | Description |
| --- | --- | --- |
| VECTOR_SIMILARITY(col, query, topK) | predicate | ANN filter; requires a vector index. |
| cosineDistance(v1, v2 [, default]) | DOUBLE | Cosine distance (1 - cosine_similarity). |
| innerProduct(v1, v2) | DOUBLE | Inner product (sum of element-wise products). |
| l1Distance(v1, v2) | DOUBLE | Manhattan distance. |
| l2Distance(v1, v2) | DOUBLE | Euclidean distance (with square root). |
| euclideanDistance(v1, v2) | DOUBLE | Squared Euclidean distance (no square root). |
| dotProduct(v1, v2) | DOUBLE | Dot product (equivalent to innerProduct). |
| vectorDims(v) | INT | Number of vector dimensions. |
| vectorNorm(v) | DOUBLE | L2 norm (magnitude) of a vector. |
