Null value support

Multi-stage engine warning

This document describes null handling for the single-stage query engine. At this time, the multi-stage query engine (v2) does not support null handling. Queries involving null values in a multi-stage environment may return unexpected results.

Null handling is configured in two places: at ingestion time and at query time.

  • Basic null handling support means that you have enabled null handling at ingestion.

  • Advanced null support means that you have also enabled null handling at query time.

Basic null handling support

By default, null handling is disabled (nullHandlingEnabled=false) in the Table index configuration (tableIndexConfig). When null support is disabled, IS NOT NULL evaluates to true, and IS NULL evaluates to false. For example, the predicate in the query below matches all records.

select count(*) from my_table where column IS NOT NULL
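
Conversely, because IS NULL evaluates to false when null support is disabled, the following query matches no records:

select count(*) from my_table where column IS NULL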

Enable basic null support

To enable basic null support (IS NULL and IS NOT NULL) and generate the null index, in the Table index configuration (tableIndexConfig), set nullHandlingEnabled=true.
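
For example, this is the relevant fragment of the table configuration (only the key discussed here is shown):

    "tableIndexConfig": {
      "nullHandlingEnabled": true
    }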

When null support is enabled, IS NOT NULL and IS NULL evaluate to true or false according to whether a null is detected.

Example workarounds to handle null values

If you're not able to generate the null index for your use case, you may filter for null values using a default value specified in your schema or a specific value included in your query.

The following example queries work only when the value chosen to represent null never appears as a real value in the dataset. Results may be incorrect if the chosen value is also a valid value in the data.

Filter for default null value(s) specified in your schema

  1. Specify a default null value (defaultNullValue) in your schema for dimension fields (dimensionFieldSpecs), metric fields (metricFieldSpecs), and date-time fields (dateTimeFieldSpecs), as shown in the schema fragment after this list.

  2. To filter out the specified default null value, for example, you could write a query like the following:

    select count(*) from my_table where column <> 'default_null_value'
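
For reference, this is a minimal dimensionFieldSpecs entry declaring a default null value; the column name and the value default_null_value are placeholders used for illustration:

    "dimensionFieldSpecs": [
      {
        "name": "column",
        "dataType": "STRING",
        "defaultNullValue": "default_null_value"
      }
    ]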

Filter for a specific value in your query

Filter for a specific value in your query that will never appear in the dataset. For example, to calculate the average age, use -1 to indicate that the value of Age is null.

  • Start with the following query:

    select avg(Age) from my_table
  • Rewrite it to exclude the placeholder null value, as follows:

    select avg(Age) from my_table WHERE Age <> -1

Advanced null handling support

Work is in progress to improve the performance of advanced null handling.

Pinot provides advanced null handling support similar to standard SQL null handling. Because this feature carries a notable performance impact (even for queries that do not involve null values), it is not enabled by default. For optimal query latency, we recommend enabling only basic null support.

Enable advanced null handling

To enable advanced null handling, do the following:

  1. To enable null handling during ingestion, in tableIndexConfig, set nullHandlingEnabled=true.

  2. To enable null handling for queries, set the enableNullHandling query option.

Important

You must set enableNullHandling=true before you query. Having nullHandlingEnabled: true in your table config does not automatically set enableNullHandling=true when you execute a query. Basic null handling supports the IS NOT NULL and IS NULL predicates; advanced null handling adds SQL-compatible null semantics.

Ingestion time

To store null values in a segment, you must enable nullHandlingEnabled in the tableIndexConfig section before ingesting the data.

During real-time or offline ingestion, Pinot checks whether null handling is enabled and, if so, stores null values in the segment itself. Data ingested while null handling was disabled does not store null values and must be re-ingested.

The nullHandlingEnabled configuration affects all columns in a Pinot table.

Column-level null support is under development.

Query time

By default, null handling at query time is disabled.

To handle nulls in aggregation functions, explicitly enable null support by setting the query option enableNullHandling to true. Configure this option in one of the following ways:

  1. Set enableNullHandling=true at the beginning of the query (see the example after this list).

  2. If using JDBC, set the connection option enableNullHandling=true (either in the URL or as a property).
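
For example, the option can be prefixed to the query text; my_table and salary below are hypothetical names used for illustration:

    SET enableNullHandling=true;
    SELECT avg(salary) FROM my_table

With the option set, rows where salary is null are skipped by the aggregation, following standard SQL semantics, instead of contributing the column's default value.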

When this option is enabled, the Pinot query engine uses a different execution path that checks null predicates. Therefore, some indexes may not be usable, and the query is significantly more expensive. This is the main reason why null handling is not enabled by default.

If the query includes an IS NULL or IS NOT NULL predicate, Pinot fetches the NULL value vector for the corresponding column within FilterPlanNode and retrieves the corresponding bitmap that represents all document IDs containing NULL values for that column. This bitmap is then used to create a BitmapBasedFilterOperator to perform the filtering.

Example queries

Advanced null handling applies to the following query types, illustrated by the sketches after this list:

  • Select

  • Filter

  • Aggregate

  • Aggregate filter

  • Group by

  • Order by

  • Transform
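
The sketches below illustrate these query types with null handling enabled. They assume a hypothetical table my_table with nullable columns name and salary; each query would be submitted with the enableNullHandling option set, for example via a leading SET enableNullHandling=true; statement.

    -- Select: null values are returned as nulls rather than default values
    SELECT name, salary FROM my_table

    -- Filter: match only rows where salary is null
    SELECT count(*) FROM my_table WHERE salary IS NULL

    -- Aggregate: null salaries are ignored by the aggregation
    SELECT avg(salary) FROM my_table

    -- Aggregate filter: aggregate only over rows with a non-null salary
    SELECT sum(salary) FROM my_table WHERE salary IS NOT NULL

    -- Group by: rows with a null name form their own group
    SELECT name, count(*) FROM my_table GROUP BY name

    -- Order by: sort rows by salary
    SELECT name, salary FROM my_table ORDER BY salary

    -- Transform: replace null salaries with 0
    SELECT name, COALESCE(salary, 0) FROM my_table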
