The commands below are based on the Pinot distribution binary. To set up Pinot to use S3 as the deep store, you need to add extra configs for the Controller and Server.
Below is a sample `controller.conf` file.
Please set `controller.data.dir` to your S3 bucket. All uploaded segments will be stored there.
Then add S3 as a Pinot storage with the following configs:
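A minimal sketch of these controller settings, assuming the `S3PinotFS` plugin class; the bucket name and region are placeholders to replace with your own:

```
controller.data.dir=s3://my.bucket/pinot-data/controller-data
controller.local.temp.dir=/tmp/pinot-tmp-data
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
```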
For AWS credentials, Pinot follows the convention of `DefaultAWSCredentialsProviderChain`. You can specify the AccessKey and SecretKey using any of the following:

1. Environment variables: `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` (recommended, since they are recognized by all AWS SDKs and the CLI except the .NET SDK), or `AWS_ACCESS_KEY` and `AWS_SECRET_KEY` (only recognized by the Java SDK)
2. Java system properties: `aws.accessKeyId` and `aws.secretKey`
3. The credential profiles file at the default location (`~/.aws/credentials`), shared by all AWS SDKs and the AWS CLI
4. AWS credentials in the Pinot config files, e.g. set `pinot.controller.storage.factory.s3.accessKey` and `pinot.controller.storage.factory.s3.secretKey` in the config file (not recommended)
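For the environment-variable option, a sketch with placeholder values (replace them with your own credentials):

```shell
# Placeholder credentials; substitute your real values before starting Pinot
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
```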
Add `s3` to `pinot.controller.segment.fetcher.protocols` and set `pinot.controller.segment.fetcher.s3.class` to `org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher`.
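In config-file form, that is:

```
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```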
If you want to grant full control to the bucket owner, then add this to the config:
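A sketch, assuming the `disableAcl` flag of the S3 filesystem plugin (when set to `false`, Pinot attaches an ACL granting the bucket owner full control):

```
pinot.controller.storage.factory.s3.disableAcl=false
```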
Then start the Pinot Controller with:
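For example, from the distribution directory (the config file path is an assumption; point it at wherever you saved the controller config):

```shell
bin/pinot-admin.sh StartController -configFileName conf/controller.conf
```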
The Broker needs no S3-specific config; you can just start it with the defaults:
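For example, assuming ZooKeeper is running locally on the default port:

```shell
bin/pinot-admin.sh StartBroker -zkAddress localhost:2181
```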
Below is a sample `server.conf` file.
Similar to the controller config, please also set the S3 configs in the Pinot Server.
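A sketch of the server-side settings, mirroring the controller ones under the `pinot.server` prefix (region and local paths are placeholders):

```
pinot.server.instance.dataDir=/tmp/pinot-tmp/server/index
pinot.server.instance.segmentTarDir=/tmp/pinot-tmp/server/segmentTars
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=us-west-2
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```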
If you want to grant full control to the bucket owner, then add this to the config:
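As with the controller, a sketch assuming the same `disableAcl` flag on the server side:

```
pinot.server.storage.factory.s3.disableAcl=false
```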
Then start the Pinot Server with:
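For example (the config file path is an assumption; adjust it to your layout):

```shell
bin/pinot-admin.sh StartServer -configFileName conf/server.conf -zkAddress localhost:2181
```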
In this demo, we just use the `airlineStats` table as an example.
Create the table with the command below:
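For example, using the `airlineStats` schema and table config shipped with the distribution (the `examples/...` paths are assumptions based on the standard distribution layout):

```shell
bin/pinot-admin.sh AddTable \
  -schemaFile examples/batch/airlineStats/airlineStats_schema.json \
  -tableConfigFile examples/batch/airlineStats/airlineStats_offline_table_config.json \
  -exec
```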
Below is a sample standalone ingestion job spec with a few notable changes:

- `jobType` is `SegmentCreationAndUriPush`
- `inputDirURI` is set to an S3 location: `s3://my.bucket/batch/airlineStats/rawdata/`
- `outputDirURI` is set to an S3 location: `s3://my.bucket/output/airlineStats/segments`
- A new PinotFS is added under `pinotFSSpecs`
- For library versions < 0.6.0, please set `segmentUriPrefix` to `[scheme]://[bucket.name]`, e.g. `s3://my.bucket`; from version 0.6.0 onward, you can set `segmentUriPrefix` to an empty string or just omit it.
Sample ingestionJobSpec.yaml
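A minimal sketch of such a spec, assuming the standalone batch runners and Avro input data; the bucket names, region, and controller URI are placeholders:

```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndUriPush
inputDirURI: 's3://my.bucket/batch/airlineStats/rawdata/'
includeFileNamePattern: 'glob:**/*.avro'
outputDirURI: 's3://my.bucket/output/airlineStats/segments'
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-west-2'
recordReaderSpec:
  dataFormat: 'avro'
  className: 'org.apache.pinot.plugin.inputformat.avro.AvroRecordReader'
tableSpec:
  tableName: 'airlineStats'
  schemaURI: 'http://localhost:9000/tables/airlineStats/schema'
  tableConfigURI: 'http://localhost:9000/tables/airlineStats'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```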
Below is a sample job output:
Please follow this page to set up a local Spark cluster.
Below is a sample Spark ingestion job spec.
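The main difference from the standalone spec is the `executionFrameworkSpec`; a sketch, assuming the Spark batch runner plugin classes (the staging bucket is a placeholder):

```yaml
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner'
  extraConfigs:
    stagingDir: 's3://my.bucket/spark/staging/'
```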
Submit the Spark job with the ingestion job spec:
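A sketch of the `spark-submit` invocation, assuming a local Spark master and the distribution's fat jar; the version, paths, and spec filename are placeholders for your setup:

```shell
export PINOT_VERSION=0.6.0
export PINOT_DISTRIBUTION_DIR=/path/to/apache-pinot-${PINOT_VERSION}-bin

${SPARK_HOME}/bin/spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master "local[2]" \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins" \
  --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
  ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  -jobSpecFile ingestionJobSpec.yaml
```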
Below is a sample snapshot of the S3 location for the controller:
Below is a sample download URI in the PropertyStore. We expect the segment download URI to start with `s3://`.