File Systems
This section contains a collection of short guides that show you how to import data from a Pinot-supported file system.
FileSystem is an abstraction provided by Pinot to access data in distributed file systems (DFS).
Pinot uses distributed file systems for the following purposes:
Batch Ingestion Job - Reads the input data (CSV, Avro, Thrift, etc.) from the DFS and writes the generated segments to it.
Controller - When a segment is uploaded to the controller, the controller saves it in the configured DFS.
Server - When a server is notified of a new segment, it copies the segment from the remote DFS to its local node using the DFS abstraction.
Supported File Systems
Pinot lets you choose a distributed file system provider. The following file systems are supported by Pinot:
Enabling a File System
To use a distributed file system, you need to enable plugins. To do that, specify the plugin directory and include the required plugins.
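As a rough sketch, the plugin directory and the list of plugins to load are usually passed as JVM system properties when a Pinot component starts. The path /opt/pinot/plugins and the plugin name pinot-s3 below are placeholder assumptions, not values taken from this page; substitute your own installation path and the plugins for your file system.

```
-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-s3
```

To load more than one plugin, plugins.include takes a comma-separated list.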
You can then change the file system in the controller and server configuration.
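The fragment below is a hedged sketch of what the controller and server settings typically look like. Here [scheme] is a placeholder for the URI scheme of your file system, and the class name on the first line of each group must be replaced with the implementation class shipped by the plugin you enabled. Treat the property names as assumptions to verify against the configuration reference for your Pinot version.

```properties
# Controller (assumed property names; [scheme] is a placeholder, e.g. s3)
pinot.controller.storage.factory.class.[scheme]=<class name of the Pinot file system>
pinot.controller.segment.fetcher.protocols=file,http,[scheme]
pinot.controller.segment.fetcher.[scheme].class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

# Server (same pattern with the pinot.server prefix)
pinot.server.storage.factory.class.[scheme]=<class name of the Pinot file system>
pinot.server.segment.fetcher.protocols=file,http,[scheme]
pinot.server.segment.fetcher.[scheme].class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```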
Here, scheme refers to the prefix used in the URI of the file system. For example, for the URI s3://bucket/path/to/file, the scheme is s3.
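To make the notion of a scheme concrete, the short Python sketch below extracts it from a URI using the standard library. The helper name uri_scheme is hypothetical and is not part of Pinot; it only illustrates which part of the URI the configuration keys refer to.

```python
from urllib.parse import urlparse


def uri_scheme(uri: str) -> str:
    """Return the scheme of a URI, i.e. the part before '://'."""
    return urlparse(uri).scheme


# The example URI from the text above
print(uri_scheme("s3://bucket/path/to/file"))  # prints: s3
```

The value returned here is what replaces the scheme placeholder in the configuration keys.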
You can also change the file system during ingestion by specifying it in the ingestion job spec.
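A minimal sketch of such an entry is shown below, assuming the job spec is written in YAML and uses a pinotFSSpecs list. The local file system is used here only as an illustration; for a remote file system, the scheme and className would be those of the plugin you enabled, and any plugin-specific settings are omitted.

```yaml
# Assumed job spec fragment: one entry per file system used by the job
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
```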