
Null value support

Multi-stage engine warning

This document describes null handling for the single-stage query engine. At this time, the multi-stage query engine (v2) does not support null handling. Queries involving null values in a multi-stage environment may return unexpected results.

Null handling is configured in two places: at ingestion time and at query time.

  • Basic null handling support means that you have enabled null handling at ingestion.

  • Advanced null support means that you have also enabled null handling at query time.

Basic null handling support

By default, null handling is disabled (nullHandlingEnabled=false) in the Table index configuration (tableIndexConfig). When null support is disabled, IS NOT NULL always evaluates to true, and IS NULL always evaluates to false. For example, the predicate in the following query matches all records:

    select count(*) from my_table where column IS NOT NULL

Enable basic null support

To enable basic null support (IS NULL and IS NOT NULL) and generate the null index, set nullHandlingEnabled=true in the Table index configuration (tableIndexConfig).
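
As a minimal sketch, assuming the table config is held as a Python dict that mirrors the table config JSON, the change is a single key under tableIndexConfig. The helper enable_null_handling and the trimmed config are hypothetical; a real table config has many more fields, and this code only edits the dict locally without contacting a Pinot controller.

```python
import copy
import json


def enable_null_handling(table_config: dict) -> dict:
    """Hypothetical helper: return a copy of a table config with null handling on.

    Only edits the in-memory dict; it does not talk to a Pinot controller.
    """
    updated = copy.deepcopy(table_config)
    # nullHandlingEnabled lives under tableIndexConfig and applies to every column.
    updated.setdefault("tableIndexConfig", {})["nullHandlingEnabled"] = True
    return updated


# Illustrative, heavily trimmed table config.
config = {"tableName": "my_table", "tableType": "OFFLINE", "tableIndexConfig": {}}
print(json.dumps(enable_null_handling(config)["tableIndexConfig"]))
# prints {"nullHandlingEnabled": true}
```

Copying the dict keeps the original config untouched, which makes it easy to diff the before and after versions before uploading.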

When null support is enabled, IS NULL evaluates to true for records whose value is null, and IS NOT NULL evaluates to true for all other records.
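
The difference between the two modes can be shown with a toy model in Python. This is not Pinot code: the functions is_null and is_not_null are hypothetical, and a column is modeled as a list where None marks a null. With null support disabled, no null information exists, so IS NULL matches nothing and IS NOT NULL matches every record.

```python
from typing import List, Optional, Sequence


def is_null(values: Sequence[Optional[str]], null_support: bool) -> List[bool]:
    """Toy model of the IS NULL predicate over one column (not Pinot code)."""
    if not null_support:
        # Without null support no value is tracked as null: IS NULL is always false.
        return [False] * len(values)
    return [v is None for v in values]


def is_not_null(values: Sequence[Optional[str]], null_support: bool) -> List[bool]:
    # IS NOT NULL is the complement of IS NULL in both modes.
    return [not flag for flag in is_null(values, null_support)]


column = ["a", None, "b"]
print(sum(is_not_null(column, null_support=False)))  # 3: every record matches
print(sum(is_not_null(column, null_support=True)))   # 2: the null record is excluded
```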

Important

You MUST run SET enableNullHandling=true; before your query. Setting "nullHandlingEnabled": true in your table config does not automatically apply enableNullHandling=true when you execute a query. Basic null handling supports the IS NOT NULL and IS NULL predicates; advanced null handling adds SQL compatibility.

Example workarounds to handle null values

If you're not able to generate the null index for your use case, you may filter for null values using a default value specified in your schema or a specific value included in your query.

The following example queries work only when the value chosen to represent null never occurs as real data in the dataset. If that value is also a valid value in the dataset, the queries return incorrect results.

Filter for default null value(s) specified in your schema

  1. Specify a default null value (defaultNullValue) in your schema for dimension fields (dimensionFieldSpecs), metric fields (metricFieldSpecs), and date time fields (dateTimeFieldSpecs).

  2. To filter out the specified default null value, write a query like the following:

    select count(*) from my_table where column <> 'default_null_value'
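
The workaround can be modeled in a few lines of Python. This is a toy sketch, not Pinot's ingestion code: it assumes missing values are stored as the schema's defaultNullValue, and the field spec, the column name country, and the helpers ingest and count_non_null are hypothetical. The last line demonstrates the caveat above: a genuine value equal to the default is miscounted as null.

```python
# Hypothetical schema field spec; only "defaultNullValue" matters here.
FIELD_SPEC = {"name": "country", "dataType": "STRING", "defaultNullValue": "default_null_value"}
DEFAULT_NULL = FIELD_SPEC["defaultNullValue"]


def ingest(raw_values):
    """Toy ingestion: store the schema default wherever a value is missing."""
    return [DEFAULT_NULL if v is None else v for v in raw_values]


def count_non_null(stored_values):
    # Mirrors: select count(*) from my_table where column <> 'default_null_value'
    return sum(1 for v in stored_values if v != DEFAULT_NULL)


stored = ingest(["us", None, "fr", None])
print(count_non_null(stored))  # 2

# The caveat: a real value equal to the default is indistinguishable from null.
print(count_non_null(ingest(["default_null_value"])))  # 0
```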

Filter for a specific value in your query

Choose a specific value that never occurs in the dataset and filter it out in your query. For example, if -1 indicates that Age is null, calculate the average age as follows:

  • Instead of the following query:

    select avg(Age) from my_table

  • Write this query, which excludes the null placeholder:

    select avg(Age) from my_table WHERE Age <> -1
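
A quick Python calculation shows why the filter matters: the placeholder -1 is a number, so an unfiltered average treats it as a real age. The sample ages are made up for illustration, and NULL_AGE is a hypothetical name for the placeholder.

```python
NULL_AGE = -1  # placeholder value that stands in for a null Age


def avg(values):
    return sum(values) / len(values)


ages = [30, NULL_AGE, 50]  # illustrative data only

naive = avg(ages)                                   # select avg(Age) from my_table
filtered = avg([a for a in ages if a != NULL_AGE])  # ... WHERE Age <> -1
print(round(naive, 2), filtered)  # 26.33 40.0
```

The unfiltered result is pulled down by the placeholder and also divides by the wrong row count.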

Advanced null handling support

Work to improve the performance of advanced null handling is under development.

Pinot provides advanced null handling support similar to standard SQL null handling. Because this feature has a notable performance impact (even on queries that do not involve null values), it is not enabled by default. For optimal query latency, we recommend enabling basic null support.
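
To make "standard SQL null handling" concrete, the sketch below models a few standard SQL rules in Python: aggregates skip nulls, COUNT(*) counts every row while COUNT(col) skips nulls, arithmetic with a null yields null, and a comparison with null is unknown, so WHERE drops the row. These are general SQL semantics, not a description of Pinot internals; Pinot's behavior for individual functions may differ, and all function names are hypothetical.

```python
from typing import List, Optional

Value = Optional[int]


def sql_sum(values: List[Value]) -> Value:
    # Aggregates skip nulls; SUM over only nulls is null.
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def sql_count_star(values: List[Value]) -> int:
    return len(values)  # COUNT(*) counts rows, nulls included


def sql_count(values: List[Value]) -> int:
    return sum(1 for v in values if v is not None)  # COUNT(col) skips nulls


def sql_add(a: Value, b: Value) -> Value:
    # Transform: arithmetic involving null yields null.
    return None if a is None or b is None else a + b


def sql_greater(a: Value, b: int) -> Optional[bool]:
    # Comparison with null is UNKNOWN (None); WHERE keeps only TRUE rows.
    return None if a is None else a > b


col = [10, None, 5]
print(sql_sum(col), sql_count_star(col), sql_count(col))  # 15 3 2
print([sql_add(v, 1) for v in col])                        # [11, None, 6]
print([v for v in col if sql_greater(v, 6) is True])       # [10]
```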

Enable advanced null handling

To enable advanced null handling, do the following:

  1. To enable null handling during ingestion, set nullHandlingEnabled=true in tableIndexConfig.

  2. To enable null handling for queries, set the enableNullHandling query option.
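
Because the two switches live in different places, it can help to reason about them together. The following hypothetical helper classifies which level of null handling a query gets, using this page's own terms: "basic" means the ingestion flag only, "advanced" means the ingestion flag plus the query option. It is a sketch over plain dicts, not a Pinot API.

```python
def null_handling_mode(table_config: dict, query_options: dict) -> str:
    """Hypothetical helper: which null handling level a query gets.

    "basic" = ingestion flag only; "advanced" = ingestion flag plus the
    enableNullHandling query option.
    """
    ingestion = bool(
        table_config.get("tableIndexConfig", {}).get("nullHandlingEnabled", False)
    )
    # Accept either a boolean or the string "true" for the query option.
    query = str(query_options.get("enableNullHandling", "false")).lower() == "true"
    if ingestion and query:
        return "advanced"
    if ingestion:
        return "basic"
    # Without the ingestion flag no nulls were stored, so the query option
    # alone has no null information to work with.
    return "disabled"


table = {"tableIndexConfig": {"nullHandlingEnabled": True}}
print(null_handling_mode(table, {}))                              # basic
print(null_handling_mode(table, {"enableNullHandling": "true"}))  # advanced
print(null_handling_mode({}, {"enableNullHandling": True}))       # disabled
```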

Ingestion time

To store null values in a segment, you must set nullHandlingEnabled in the tableIndexConfig section before ingesting the data.

During real-time or offline ingestion, Pinot checks whether null handling is enabled and, if so, stores the null values in the segment itself. Data ingested while null handling was disabled does not store null values and must be ingested again.

The nullHandlingEnabled configuration affects all columns in a Pinot table.
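
The ingestion behavior can be pictured with a toy segment builder. This is not Pinot's implementation: it assumes missing values are stored as a per-column default, and it models the per-column null information as a set of document IDs. The set is kept only when the table-level flag is on, and then for every column; with the flag off, nothing records which values were null. The function, column names, and defaults are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def build_segment(
    rows: List[Dict[str, object]],
    columns: List[str],
    defaults: Dict[str, object],
    null_handling_enabled: bool,
) -> Tuple[Dict[str, list], Optional[Dict[str, Set[int]]]]:
    """Toy segment builder, not Pinot's implementation.

    Missing values are stored as the column default. The per-column set of
    null doc IDs is returned only when the table-level flag is on.
    """
    values: Dict[str, list] = {c: [] for c in columns}
    null_docs: Dict[str, Set[int]] = {c: set() for c in columns}
    for doc_id, row in enumerate(rows):
        for c in columns:
            v = row.get(c)
            if v is None:
                values[c].append(defaults[c])
                null_docs[c].add(doc_id)
            else:
                values[c].append(v)
    # With the flag off, the null information is discarded.
    return values, (null_docs if null_handling_enabled else None)


rows = [{"city": "Paris", "age": 30}, {"city": None, "age": 41}]
defaults = {"city": "default_null_value", "age": -1}

values, nulls = build_segment(rows, ["city", "age"], defaults, True)
print(nulls)  # {'city': {1}, 'age': set()}

_, nulls_off = build_segment(rows, ["city", "age"], defaults, False)
print(nulls_off)  # None: nothing records which values were null
```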

Column-level null support is under development.

Query time

By default, null handling in query predicates is disabled.

To handle nulls in aggregation functions, explicitly enable null support by setting the query option enableNullHandling to true. Configure this option in one of the following ways:

  1. Add SET enableNullHandling=true; at the beginning of the query.

  2. If using JDBC, set the connection option enableNullHandling=true (either in the URL or as a property).
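
For the first option, a client can prepend the SET statement before sending the query text. The helper below is a hypothetical sketch that renders each keyword argument as a SET key=value; statement, with booleans lowercased to match SET enableNullHandling=true;. It assumes several SET statements may precede the query; check the query option documentation for the options your Pinot version accepts.

```python
def with_query_options(sql: str, **options) -> str:
    """Hypothetical helper: prefix a Pinot SQL query with SET statements.

    Booleans are rendered in lowercase, as in SET enableNullHandling=true;
    """
    prefix = ""
    for key, value in options.items():
        rendered = str(value).lower() if isinstance(value, bool) else str(value)
        prefix += f"SET {key}={rendered}; "
    return prefix + sql


query = with_query_options(
    "select count(*) from my_table where column IS NULL",
    enableNullHandling=True,
)
print(query)
# SET enableNullHandling=true; select count(*) from my_table where column IS NULL
```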

When this option is enabled, the Pinot query engine uses a different execution path that accounts for null values. As a result, some indexes may not be usable and the query is significantly more expensive, which is the main reason null handling is not enabled by default.

If the query includes an IS NULL or IS NOT NULL predicate, Pinot fetches the null value vector for the corresponding column within FilterPlanNode and retrieves the bitmap of all document IDs that contain null values in that column. This bitmap is then used to create a BitmapBasedFilterOperator that performs the filtering.
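
The bitmap step can be illustrated with a Python int used as a bitmap, where bit d set means document d is null in the column. This only models the idea; it is not FilterPlanNode or BitmapBasedFilterOperator, and the helper names are hypothetical. IS NULL returns the null bitmap itself, and IS NOT NULL returns its complement within the segment's document range.

```python
def bitmap_from_doc_ids(doc_ids):
    """Build an int bitmap with bit d set for each doc ID d."""
    bitmap = 0
    for d in doc_ids:
        bitmap |= 1 << d
    return bitmap


def filter_docs(null_bitmap: int, num_docs: int, predicate: str) -> list:
    """Toy stand-in for a bitmap-based filter over one column's null bitmap."""
    all_docs = (1 << num_docs) - 1
    if predicate == "IS NULL":
        matches = null_bitmap
    elif predicate == "IS NOT NULL":
        # Complement of the null bitmap, limited to docs [0, num_docs).
        matches = all_docs & ~null_bitmap
    else:
        raise ValueError(f"unsupported predicate: {predicate}")
    return [d for d in range(num_docs) if (matches >> d) & 1]


nulls = bitmap_from_doc_ids([1, 3])
print(filter_docs(nulls, 5, "IS NULL"))      # [1, 3]
print(filter_docs(nulls, 5, "IS NOT NULL"))  # [0, 2, 4]
```

Evaluating either predicate is a single bitmap operation, with no per-row value comparison.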

Example queries

Select Query

Filter Query

Aggregate Query

Aggregate Filter Query

Group By Query

Order By Query

Transform Query
