Union
Describes the union relation operator in the multi-stage query engine.
The union operator combines the results of two or more queries into a single result set that contains all the rows from every query. Unlike the other set operations, the union operator does not remove duplicates from the result set. Its semantics are therefore those of the SQL UNION ALL operator, not those of the SQL UNION operator.
There is no guarantee on the order of the rows in the result set.
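To make the duplicate-preserving behavior concrete, here is a minimal Python sketch. It is not Pinot code: the `union_all` helper and the sample rows are hypothetical, and relations are modeled as plain lists of tuples. Because row order is not part of the contract, the result is inspected as a multiset.

```python
from collections import Counter
from itertools import chain


def union_all(*relations):
    """Concatenate every input relation, keeping duplicate rows (hypothetical helper)."""
    return list(chain.from_iterable(relations))


# Hypothetical sample inputs; each row is a tuple.
first = [(1, "a"), (2, "b")]
second = [(2, "b"), (3, "c")]

result = union_all(first, second)

# Order is not guaranteed, so count rows instead of comparing lists.
row_counts = Counter(result)
```

The row `(2, "b")` appears in both inputs and is therefore counted twice in `row_counts`, which is the behavior a set-style union would not have.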
The current implementation consumes input relations one by one. It first returns all rows from the first input relation, then all rows from the second input relation, and so on.
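The union operator is a streaming operator: it emits rows as it reads them instead of waiting for all of its input. The current implementation fully consumes each input, in order, before moving on to the next one. The note on input order later on this page discusses the consequences.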
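The combination of streaming output and one-by-one consumption can be sketched with Python generators. This is only an illustration under assumed names (`union_stream`, `traced` and `events` are hypothetical), not the engine's implementation.

```python
def union_stream(*inputs):
    """Yield rows from each input in turn, fully draining one before starting the next."""
    for relation in inputs:
        for row in relation:
            yield row


events = []


def traced(name, rows):
    """Wrap an input so that every row read is recorded in `events` (illustration only)."""
    for row in rows:
        events.append(f"read {name}:{row}")
        yield row


stream = union_stream(traced("first", [1, 2]), traced("second", [3]))

# Streaming: the first output row is available after reading a single input row.
first_row = next(stream)
events_after_first = list(events)

# Draining the rest reads the remainder of the first input, then the second input.
remaining = list(stream)
```

After the first `next` call only one row of the first input has been read, and the second input is not touched until the first one is exhausted.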
None
Type: Long
The sum of the time spent by all threads executing the operator. This means that the wall-clock time spent in the operation may be smaller than this value if the parallelism is greater than 1.
Type: Long
The number of rows emitted by the operator.
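The difference between the summed execution time and the wall-clock time can be shown with a small arithmetic sketch. The per-thread timings below are made-up numbers, and the sketch assumes the ideal case where all threads run fully in parallel.

```python
# Hypothetical per-thread execution times, in milliseconds, for one operator
# running with a parallelism of 4.
thread_times_ms = [100, 120, 90, 110]

# The execution-time stat reports the sum across all threads.
summed_time_ms = sum(thread_times_ms)

# With perfect parallelism the wall-clock time is bounded by the slowest thread,
# which is smaller than the summed value.
ideal_wall_time_ms = max(thread_times_ms)
```

Here the stat would report 420 ms even though the operation could finish in roughly 120 ms of wall-clock time.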
The union operator is represented in the explain plan as a LogicalUnion explain node.
Type: Boolean
Whether the union operator should remove duplicates from the result set.
For example the plan of:
Is expected to be:
While the plan of:
Is a bit more complex
The current implementation of the union operator consumes the input relations one by one, starting from the first one. This means that the second input relation is not consumed until the first one is fully consumed, and so on. It is therefore recommended to put the fastest input relation first to reduce the overall latency.
Although Pinot supports the SQL UNION and UNION ALL clauses, the union operator only supports the UNION ALL semantics. To implement the UNION semantics, the multi-stage query engine adds an extra aggregate operator that computes the distinct rows.
Notice that LogicalUnion is still using all=[true], but the LogicalAggregate is used to remove the duplicates. This also means that while the union operator is always streaming, the UNION clause results in a blocking plan (given that the aggregate operator is blocking).
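The following Python sketch illustrates why stacking a distinct step on top of a streaming union yields a blocking plan. It is a simplified model, not the engine's code: `blocking_distinct`, `traced` and `events` are hypothetical names, and the distinct step is assumed to drain its whole input before emitting anything, as a blocking aggregate would.

```python
def union_all(*inputs):
    """Streaming union: yield rows from each input in order, keeping duplicates."""
    for relation in inputs:
        yield from relation


def blocking_distinct(rows):
    """Consume every input row before emitting anything, mimicking a blocking aggregate."""
    seen = set(rows)  # drains the whole input first
    yield from seen


events = []


def traced(rows):
    """Record every row that is read from an input (illustration only)."""
    for row in rows:
        events.append(row)
        yield row


plan = blocking_distinct(union_all(traced([1, 2]), traced([2, 3])))

first_output = next(plan)
rows_read_before_first_output = len(events)

all_output = sorted([first_output, *plan])
```

All four input rows are read before the first output row is produced, and the duplicate `2` is emitted only once.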
Usually a good way to set the order of the input relations is to change the input order trying to minimize the value of the stat of all the inputs.