Amazon S3
You can enable the Amazon S3 filesystem backend by including the pinot-s3 plugin.
-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-s3
By default, Pinot loads every plugin in the plugins directory, so you can simply drop this plugin there. If you do specify -Dplugins.include, you must list every plugin you want to use, e.g. pinot-json, pinot-avro, pinot-kafka-2.0.
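The flag rules above can be sketched as a small helper. This is only an illustration: `plugin_jvm_flags` is a hypothetical name, and joining the plugin names with commas is an assumption about the expected list format.

```python
def plugin_jvm_flags(plugins_dir, plugins=None):
    """Build the -D flags that control Pinot plugin loading.

    With plugins=None only plugins.dir is emitted, so every plugin in the
    directory is loaded. With a list, each plugin is named explicitly.
    Assumption: plugins.include takes a comma-separated list.
    """
    flags = [f"-Dplugins.dir={plugins_dir}"]
    if plugins:
        flags.append("-Dplugins.include=" + ",".join(plugins))
    return " ".join(flags)


# Once plugins.include is used, every required plugin must be listed.
print(plugin_jvm_flags("/opt/pinot/plugins",
                       ["pinot-s3", "pinot-json", "pinot-avro", "pinot-kafka-2.0"]))
```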
You can also configure the S3 filesystem using the following options:
| Configuration | Description |
| --- | --- |
| region | The AWS data center region in which the bucket is located. |
| accessKey | (Optional) AWS access key required for authentication. Use this only for testing, because Pinot does not store these keys as secrets. |
| secretKey | (Optional) AWS secret key required for authentication. Use this only for testing, because Pinot does not store these keys as secrets. |
| endpoint | (Optional) Override endpoint for the S3 client. |
| disableAcl | If this is set to false, the bucket owner is granted full access to the objects created by Pinot. The default value is true. |
| serverSideEncryption | (Optional) The server-side encryption algorithm used when storing objects in Amazon S3 (currently aws:kms is supported). Set to null to disable SSE. |
| ssekmsKeyId | (Optional, but required when serverSideEncryption=aws:kms) The AWS KMS key ID to use for object encryption. All GET and PUT requests for an object protected by AWS KMS fail if they are not made via SSL or using SigV4. |
| ssekmsEncryptionContext | (Optional) The AWS KMS encryption context to use for object encryption. The value of this header is a base64-encoded UTF-8 string holding JSON with the encryption context key-value pairs. |
Prefix each of these properties with pinot.[node].storage.factory.s3., where node is either controller or server depending on which config you are editing, e.g.
pinot.controller.storage.factory.s3.region=ap-southeast-1
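As a rough sketch of the prefixing rule, the hypothetical helper below turns a dictionary of S3 options into fully qualified property lines. It also enforces the one dependency stated in the option descriptions (ssekmsKeyId is required when serverSideEncryption=aws:kms). The function name and error messages are illustrative, not part of Pinot.

```python
def s3_properties(node, options):
    """Render S3 filesystem options as Pinot property lines.

    node    -- "controller" or "server"
    options -- mapping of option name (e.g. "region") to value
    """
    if node not in ("controller", "server"):
        raise ValueError("node must be 'controller' or 'server'")
    # Stated in the option descriptions: aws:kms needs a KMS key ID.
    if options.get("serverSideEncryption") == "aws:kms" and not options.get("ssekmsKeyId"):
        raise ValueError("ssekmsKeyId is required when serverSideEncryption=aws:kms")
    prefix = f"pinot.{node}.storage.factory.s3."
    return [f"{prefix}{key}={value}" for key, value in options.items()]


for line in s3_properties("controller", {"region": "ap-southeast-1"}):
    print(line)  # pinot.controller.storage.factory.s3.region=ap-southeast-1
```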
The S3 filesystem supports authentication using the DefaultCredentialsProviderChain. The credential provider looks for credentials in the following order:
    Environment variables: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (RECOMMENDED, since they are recognized by all the AWS SDKs and the CLI except for .NET), or AWS_ACCESS_KEY and AWS_SECRET_KEY (only recognized by the Java SDK)
    Java system properties: aws.accessKeyId and aws.secretKey
    Web Identity Token credentials from the environment or container
    Credential profiles file at the default location (~/.aws/credentials), shared by all AWS SDKs and the AWS CLI
    Credentials delivered through the Amazon EC2 container service, if the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and the security manager has permission to access the variable
    Instance profile credentials delivered through the Amazon EC2 metadata service
You can also specify accessKey and secretKey through the configuration properties. However, this method is not secure and should only be used for proof-of-concept setups.
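To make the lookup order above concrete, here is a minimal simulation of it. It is not the AWS SDK's implementation: it only checks whether each source looks present, in the listed order, and reports the first match. The function name is hypothetical, and using AWS_WEB_IDENTITY_TOKEN_FILE to detect web identity credentials is an assumption that does not come from this page.

```python
import os
from pathlib import Path


def first_credential_source(env=None, java_props=None, home=None):
    """Return a label for the first credential source that appears configured."""
    env = os.environ if env is None else env
    java_props = java_props or {}
    home = Path.home() if home is None else Path(home)

    # 1. Environment variables (either naming pair).
    if ("AWS_ACCESS_KEY_ID" in env and "AWS_SECRET_ACCESS_KEY" in env) or \
       ("AWS_ACCESS_KEY" in env and "AWS_SECRET_KEY" in env):
        return "environment variables"
    # 2. Java system properties.
    if "aws.accessKeyId" in java_props and "aws.secretKey" in java_props:
        return "Java system properties"
    # 3. Web identity token (assumed to be signalled by this variable).
    if "AWS_WEB_IDENTITY_TOKEN_FILE" in env:
        return "web identity token"
    # 4. Shared credential profiles file.
    if (home / ".aws" / "credentials").is_file():
        return "credential profiles file"
    # 5. Container credentials.
    if "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI" in env:
        return "container credentials"
    # 6. Last resort: instance profile via the EC2 metadata service.
    return "instance profile (EC2 metadata service)"


print(first_credential_source())
```

A check like this can help explain why a deployment picks up unexpected credentials, for example when leftover environment variables shadow an instance profile.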

Examples

Job spec

executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 's3://pinot-bucket/pinot-ingestion/batch-input/'
outputDirURI: 's3://pinot-bucket/pinot-ingestion/batch-output/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'ap-southeast-1'
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'students'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
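In the job spec above, the s3:// scheme of inputDirURI and outputDirURI is matched by an entry under pinotFSSpecs. The sketch below checks that pairing on a spec that has already been loaded into a Python dictionary. `missing_fs_schemes` is a hypothetical helper, and the idea that every URI scheme needs its own pinotFSSpecs entry is an assumption drawn from the example, not a documented rule.

```python
from urllib.parse import urlparse


def missing_fs_schemes(job_spec):
    """Return URI schemes used by the job spec with no pinotFSSpecs entry."""
    configured = {fs["scheme"] for fs in job_spec.get("pinotFSSpecs", [])}
    used = set()
    for key in ("inputDirURI", "outputDirURI"):
        scheme = urlparse(job_spec.get(key, "")).scheme
        if scheme:  # plain local paths have no scheme and are skipped
            used.add(scheme)
    return sorted(used - configured)


job_spec = {
    "inputDirURI": "s3://pinot-bucket/pinot-ingestion/batch-input/",
    "outputDirURI": "s3://pinot-bucket/pinot-ingestion/batch-output/",
    "pinotFSSpecs": [
        {
            "scheme": "s3",
            "className": "org.apache.pinot.plugin.filesystem.S3PinotFS",
            "configs": {"region": "ap-southeast-1"},
        }
    ],
}
print(missing_fs_schemes(job_spec))  # [] because s3 is configured
```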

Controller config

controller.data.dir=s3://path/to/data/directory/
controller.local.temp.dir=/path/to/local/temp/directory
controller.enable.split.commit=true
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=ap-southeast-1
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Server config

pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=ap-southeast-1
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
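The controller and server examples share three S3-related settings: the storage factory class, an s3 entry in the fetcher protocols, and the s3 fetcher class. A quick sanity check over a properties file could look like the sketch below. Both helper names are hypothetical, the parser handles only simple key=value lines, and treating all three settings as mandatory is an assumption based on the examples.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_s3_settings(props, node):
    """List the S3 settings from the examples that are absent for a node."""
    problems = []
    factory_key = f"pinot.{node}.storage.factory.class.s3"
    if factory_key not in props:
        problems.append(factory_key)
    protocols_key = f"pinot.{node}.segment.fetcher.protocols"
    protocols = [p.strip() for p in props.get(protocols_key, "").split(",")]
    if "s3" not in protocols:
        problems.append(f"{protocols_key} (no s3 entry)")
    fetcher_key = f"pinot.{node}.segment.fetcher.s3.class"
    if fetcher_key not in props:
        problems.append(fetcher_key)
    return problems


server_conf = """
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=ap-southeast-1
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
"""
print(missing_s3_settings(parse_properties(server_conf), "server"))  # []
```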