Azure Data Lake Storage
This guide shows you how to import data from files stored in Azure Data Lake Storage (ADLS).
You can enable Azure Data Lake Storage by loading the pinot-adls plugin. In the controller or server, add the following config:

```
-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-adls
```
By default, Pinot loads all plugins, so you can simply drop this plugin into the plugins directory. However, if you specify -Dplugins.include, you must list every plugin you want to use, e.g. pinot-json, pinot-avro, pinot-kafka-2.0.
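As a concrete sketch (the launch command and Zookeeper address are illustrative assumptions, not part of the guide above), these JVM options can be supplied via the JAVA_OPTS environment variable when starting a Pinot component with the admin scripts:

```shell
# Sketch: pass the plugin options through JAVA_OPTS when starting a component.
# Paths, the component choice, and the Zookeeper address are illustrative.
export JAVA_OPTS="-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-adls"
bin/pinot-admin.sh StartController -zkAddress localhost:2181
```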
The Azure Data Lake Storage filesystem supports the following configuration options:
- accountName: name of the Azure account under which the storage is created
- accessKey: access key required for authentication
- fileSystemName: name of the filesystem to use, i.e. the container name (a container in Azure is similar to a bucket in S3)
- enableChecksum: enable MD5 checksum for verification. Default is false.
Each of these properties should be prefixed by pinot.[node].storage.factory.adl., where node is either controller or server depending on which component the config applies to. For example:

```
pinot.controller.storage.factory.adl.accountName=test-user
```

Examples

Job spec

```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 'adl://path/to/input/directory/'
outputDirURI: 'adl://path/to/output/directory/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: adl
    className: org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
    configs:
      accountName: 'my-account'
      accessKey: 'foo-bar-1234'
      fileSystemName: 'fs-name'
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'students'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```
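With the job spec above saved to a file, the ingestion job can be launched with the standalone admin tool; a sketch, assuming the spec is saved as job-spec.yaml:

```shell
# Launch the standalone batch ingestion job using the spec above.
# The file name job-spec.yaml is an assumption; substitute your actual path.
bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile job-spec.yaml
```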

Controller config

```
controller.data.dir=adl://path/to/data/directory/
controller.local.temp.dir=/path/to/local/temp/directory
controller.enable.split.commit=true
pinot.controller.storage.factory.class.adl=org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
pinot.controller.storage.factory.adl.accountName=my-account
pinot.controller.storage.factory.adl.accessKey=foo-bar-1234
pinot.controller.storage.factory.adl.fileSystemName=fs-name
pinot.controller.segment.fetcher.protocols=file,http,adl
pinot.controller.segment.fetcher.adl.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```
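These properties typically go into the controller's configuration file, which is passed at startup; a sketch, assuming the file is saved as controller.conf (the file name and Zookeeper address are illustrative):

```shell
# Start the controller with the config file above.
# controller.conf and the Zookeeper address are assumptions for this example.
bin/pinot-admin.sh StartController -configFileName controller.conf -zkAddress localhost:2181
```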

Server config

```
pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.adl=org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS
pinot.server.storage.factory.adl.accountName=my-account
pinot.server.storage.factory.adl.accessKey=foo-bar-1234
pinot.server.storage.factory.adl.fileSystemName=fs-name
pinot.server.segment.fetcher.protocols=file,http,adl
pinot.server.segment.fetcher.adl.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```
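Likewise, the server properties are typically placed in a configuration file and passed at startup; a sketch, assuming the file is saved as server.conf (the file name and Zookeeper address are illustrative):

```shell
# Start the server with the config file above.
# server.conf and the Zookeeper address are assumptions for this example.
bin/pinot-admin.sh StartServer -configFileName server.conf -zkAddress localhost:2181
```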
