Ingest records with dynamic schemas
Storing records with dynamic schemas in a table with a fixed schema.
Some domains (e.g., logging) generate records where each record can have a different set of keys, whereas Pinot tables have a relatively static schema. For records with varying keys, it's impractical to store each field in its own table column. However, most (if not all) fields may be important, so fields should not be dropped unnecessarily.
The SchemaConformingTransformer is a RecordTransformer that can transform records with dynamic schemas such that they can be ingested in a table with a static schema. The transformer primarily takes record fields that don't exist in the schema and stores them in a type of catchall field.
For example, consider this record:
{
  "timestamp": 1687786535928,
  "hostname": "host1",
  "HOSTNAME": "host1",
  "level": "INFO",
  "message": "Started processing job1",
  "tags": {
    "platform": "data",
    "service": "serializer",
    "params": {
      "queueLength": 5,
      "timeout": 299,
      "userData_noIndex": {
        "nth": 99
      }
    }
  }
}
Let's say the table's schema contains the following fields:
timestamp
hostname
level
message
tags.platform
tags.service
indexableExtras
unindexableExtras
Without this transformer, the HOSTNAME field and the entire tags field would be dropped when storing the record in the table. However, with this transformer, the record would be transformed into the following:
{
  "timestamp": 1687786535928,
  "hostname": "host1",
  "level": "INFO",
  "message": "Started processing job1",
  "tags.platform": "data",
  "tags.service": "serializer",
  "indexableExtras": {
    "tags": {
      "params": {
        "queueLength": 5,
        "timeout": 299
      }
    }
  },
  "unindexableExtras": {
    "tags": {
      "userData_noIndex": {
        "nth": 99
      }
    }
  }
}
Notice that the transformer does the following:
Flattens nested fields which exist in the schema, like tags.platform
Drops some fields like HOSTNAME, where HOSTNAME must be listed as a field in the config option fieldPathsToDrop
Moves fields that don't exist in the schema and have the suffix _noIndex into the unindexableExtras field
Moves any remaining fields that don't exist in the schema into the indexableExtras field
The unindexableExtras field allows the transformer to separate fields that don't need indexing (because they are only retrieved, not searched) from those that do.
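The routing rules listed above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not Pinot's Java implementation: the helper names flatten and route are hypothetical, and the order in which the checks are applied (drop first, then suffix, then schema lookup) is an assumption. The sketch reports where each leaf value of the example record would end up; it does not build the nested output record.

```python
from typing import Any, Dict, Iterator, Set, Tuple


def flatten(record: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, value) for every leaf value in a nested record."""
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def route(path: str, schema_fields: Set[str], paths_to_drop: Set[str],
          unindexable_suffix: str) -> str:
    """Return the destination of one leaf path (hypothetical helper)."""
    segments = path.split(".")
    # Assumption: dropping a path also drops everything nested under it.
    prefixes = {".".join(segments[:i]) for i in range(1, len(segments) + 1)}
    if prefixes & paths_to_drop:
        return "dropped"
    # Assumption: a suffix match on any key along the path is enough.
    if any(segment.endswith(unindexable_suffix) for segment in segments):
        return "unindexableExtras"
    if path in schema_fields:
        return "column"
    return "indexableExtras"


record = {
    "timestamp": 1687786535928,
    "hostname": "host1",
    "HOSTNAME": "host1",
    "level": "INFO",
    "message": "Started processing job1",
    "tags": {
        "platform": "data",
        "service": "serializer",
        "params": {
            "queueLength": 5,
            "timeout": 299,
            "userData_noIndex": {"nth": 99},
        },
    },
}
schema_fields = {"timestamp", "hostname", "level", "message",
                 "tags.platform", "tags.service"}

routes = {
    path: route(path, schema_fields, {"HOSTNAME"}, "_noIndex")
    for path, _ in flatten(record)
}
for path, destination in routes.items():
    print(f"{path} -> {destination}")
```

Working on dotted leaf paths keeps the sketch short: each rule becomes a single set or suffix check, which makes it easy to see why HOSTNAME and hostname are treated differently (the drop list is matched case-sensitively here).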
SchemaConformingTransformer Configuration
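To use the transformer, add the schemaConformingTransformerConfig option to the ingestionConfig section of your table configuration. The values in this example (extras, extrasNoIndex, _no_index) differ from the field names and suffix used in the record example above.
For example: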
{
  "ingestionConfig": {
    "schemaConformingTransformerConfig": {
      "indexableExtrasField": "extras",
      "unindexableExtrasField": "extrasNoIndex",
      "unindexableFieldSuffix": "_no_index",
      "fieldPathsToDrop": [
        "HOSTNAME"
      ]
    }
  }
}
Available configuration options are listed in SchemaConformingTransformerConfig.
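Before applying a configuration like this one, it can help to check that the extras fields it names are also columns in the table schema, as indexableExtras and unindexableExtras are in the earlier example. The following Python sketch does that check locally. The function name missing_extras_columns and the sample schema field set are hypothetical, and the check reflects an assumption drawn from the example rather than a statement of how Pinot validates table configurations.

```python
import json
from typing import Any, Dict, List, Set


def missing_extras_columns(table_config: Dict[str, Any],
                           schema_fields: Set[str]) -> List[str]:
    """Return extras field names from the transformer config that the schema lacks."""
    transformer_config = (
        table_config.get("ingestionConfig", {})
        .get("schemaConformingTransformerConfig", {})
    )
    missing = []
    for option in ("indexableExtrasField", "unindexableExtrasField"):
        field_name = transformer_config.get(option)
        if field_name is not None and field_name not in schema_fields:
            missing.append(field_name)
    return missing


table_config = json.loads("""
{
  "ingestionConfig": {
    "schemaConformingTransformerConfig": {
      "indexableExtrasField": "extras",
      "unindexableExtrasField": "extrasNoIndex",
      "unindexableFieldSuffix": "_no_index",
      "fieldPathsToDrop": ["HOSTNAME"]
    }
  }
}
""")

# Hypothetical schema that defines "extras" but forgot "extrasNoIndex".
schema_fields = {"timestamp", "hostname", "level", "message", "extras"}

missing = missing_extras_columns(table_config, schema_fields)
print("Missing extras columns:", missing)
```

An empty list means every extras field named in the config has a matching schema column.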

