Null value support
Multi-stage engine warning
This document describes null handling for the single-stage query engine. At this time, the multi-stage query engine (v2) does not support null handling. Queries involving null values in a multi-stage environment may return unexpected results.
Null handling is configured in two places: at ingestion time and at query time.
Basic null handling support means that you have enabled null handling at ingestion.
Advanced null support means that you have also enabled null handling at query time.
Basic null handling support
By default, null handling is disabled (nullHandlingEnabled=false) in the table index configuration (tableIndexConfig). When null support is disabled, IS NOT NULL always evaluates to true, and IS NULL always evaluates to false. For example, an IS NOT NULL predicate on any column matches all records.
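As an illustration, with null handling disabled, a query like the following sketch would count every record. The table name my_table and the column name my_column are hypothetical placeholders, not names from a real schema.

```sql
-- Hypothetical table and column names, used for illustration only.
-- With nullHandlingEnabled=false, IS NOT NULL always evaluates to true,
-- so this count includes every record in the table.
SELECT COUNT(*)
FROM my_table
WHERE my_column IS NOT NULL
```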
Enable basic null support
To enable basic null support (IS NULL and IS NOT NULL) and generate the null index, set nullHandlingEnabled=true in the table index configuration (tableIndexConfig).
When null support is enabled, IS NOT NULL and IS NULL evaluate to true or false according to whether a null is detected.
Important
You must set the query option enableNullHandling=true (for example, SET enableNullHandling=true;) before you query. Having "nullHandlingEnabled": true set in your table config does not automatically apply enableNullHandling=true when you execute a query. Basic null handling supports the IS NOT NULL and IS NULL predicates. Advanced null handling adds SQL compatibility.
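A minimal sketch of a query that relies on basic null handling, assuming a hypothetical table my_table with a column my_column that was ingested with nullHandlingEnabled=true. The SET statement is placed ahead of the SELECT statement in the same request.

```sql
-- Hypothetical names: my_table, my_column.
-- The query option must come before the SELECT statement.
SET enableNullHandling = true;
SELECT COUNT(*)
FROM my_table
WHERE my_column IS NULL
```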
Example workarounds to handle null values
If you're not able to generate the null index for your use case, you may filter for null values using a default value specified in your schema or a specific value included in your query.
The following example queries work only when the value chosen to represent null never occurs as a real value in the dataset. Results may be incorrect if the specified null value is also a valid value in the dataset.
Filter for default null value(s) specified in your schema
Specify a default null value (defaultNullValue) in your schema for dimension fields (dimensionFieldSpecs), metric fields (metricFieldSpecs), and date time fields (dateTimeFieldSpecs). To filter out the specified default null value, compare the column against that value in the WHERE clause of your query.
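For instance, assuming a hypothetical string dimension my_column whose defaultNullValue is set to 'N/A' in the schema, a query along these lines would exclude the records that were ingested as null. Both the names and the default value are assumptions made for illustration.

```sql
-- Hypothetical names (my_table, my_column) and default null value ('N/A').
-- Records that arrived as null were stored as 'N/A', so excluding that
-- value leaves only the records with a real value.
SELECT COUNT(*)
FROM my_table
WHERE my_column <> 'N/A'
```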
Filter for a specific value in your query
Filter for a specific value in your query that will not be included in the dataset. For example, to calculate the average age, use -1 to indicate that the value of Age is null.
Rewrite the query that computes the average so that it excludes the -1 placeholder, and therefore the null values.
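As a sketch of this rewrite, assume a hypothetical table named my_table with an Age column in which -1 stands in for null. The original query and its rewritten form, run as two separate queries, might look like the following.

```sql
-- Original query: records holding the -1 placeholder distort the average.
SELECT AVG(Age)
FROM my_table;

-- Rewritten query: skip records where Age holds the -1 placeholder.
SELECT AVG(Age)
FROM my_table
WHERE Age <> -1;
```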
Advanced null handling support
Work to improve the performance of advanced null handling is under development.
Pinot provides advanced null handling support similar to standard SQL null handling. Because this feature carries a notable performance impact (even for queries that do not involve null values), it is not enabled by default. For optimal query latency, we recommend enabling basic null support.
Enable advanced null handling
To enable NULL handling, do the following:
To enable null handling during ingestion, in tableIndexConfig, set nullHandlingEnabled=true.
To enable null handling for queries, set the enableNullHandling query option.
Ingestion time
To store null values in a segment, you must enable nullHandlingEnabled in the tableIndexConfig section before ingesting the data.
During real-time or offline ingestion, Pinot checks whether null handling is enabled and, if so, stores null values in the segment itself. Data ingested while null handling is disabled does not store null values, and should be ingested again.
The nullHandlingEnabled configuration affects all columns in a Pinot table.
Column-level null support is under development.
Query time
By default, null usage in the predicate is disabled.
For handling nulls in aggregation functions, explicitly enable null support by setting the query option enableNullHandling to true. Configure this option in one of the following ways:
Set enableNullHandling=true at the beginning of the query.
If using JDBC, set the connection option enableNullHandling=true (either in the URL or as a property).
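A sketch of the first option, assuming a hypothetical table my_table with a nullable Age column that was ingested with nullHandlingEnabled=true. With the option set, the aggregation is expected to skip null values, in line with standard SQL semantics, rather than treat them as the column's default value.

```sql
-- Hypothetical names: my_table, Age.
-- The query option precedes the SELECT statement in the same request.
SET enableNullHandling = true;
SELECT AVG(Age)
FROM my_table
```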
When this option is enabled, the Pinot query engine uses a different execution path that checks null predicates. Therefore, some indexes may not be usable, and the query is significantly more expensive. This is the main reason why null handling is not enabled by default.
If the query includes an IS NULL or IS NOT NULL predicate, Pinot fetches the NULL value vector for the corresponding column within FilterPlanNode and retrieves the corresponding bitmap that represents all document IDs containing NULL values for that column. This bitmap is then used to create a BitmapBasedFilterOperator to do the filtering operation.