Google Cloud Storage
This guide shows you how to import data from GCP (Google Cloud Platform).
You can enable Google Cloud Storage support with the pinot-gcs plugin. On the controller or server, add the following JVM options:
-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-gcs
By default, Pinot loads every plugin in the plugins directory, so you can simply drop this plugin there. If you do specify -Dplugins.include, you must list every plugin you want to use, e.g. pinot-json, pinot-avro, pinot-kafka-2.0.
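The rule above (list everything once -Dplugins.include is set) can be sketched with a small helper that assembles the flags. This is only an illustration: the function name plugin_jvm_flags is hypothetical, and joining plugin names with commas is an assumption that should be checked against your Pinot version.

```python
def plugin_jvm_flags(plugins_dir, include=None):
    """Build the -Dplugins.* JVM flags described above.

    With include=None only plugins.dir is emitted, so Pinot loads every
    plugin it finds. Otherwise every plugin to load must be listed.
    ASSUMPTION: plugin names are joined with commas.
    """
    flags = [f"-Dplugins.dir={plugins_dir}"]
    if include:
        flags.append("-Dplugins.include=" + ",".join(include))
    return " ".join(flags)


# Load GCS support together with the other plugins named in this guide.
print(plugin_jvm_flags(
    "/opt/pinot/plugins",
    ["pinot-gcs", "pinot-json", "pinot-avro", "pinot-kafka-2.0"],
))
```

Calling the helper with only pinot-gcs reproduces the flag line shown earlier.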
The GCS filesystem provides the following options:
    projectId - The name of the Google Cloud Platform project under which you created your storage bucket.
    gcpKey - Location of the JSON file containing the GCP keys. Refer to "Creating and managing service account keys" to download the keys.
Each of these properties should be prefixed with pinot.[node].storage.factory.gs. where node is either controller or server, depending on the config, e.g.
pinot.controller.storage.factory.gs.projectId=test-project
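As a rough sketch of the prefixing rule, the helper below generates the storage-factory lines for a given node. It follows the key layout used in the controller and server config examples in this guide; the function name gcs_storage_properties is hypothetical and not part of Pinot.

```python
def gcs_storage_properties(node, project_id, gcp_key):
    """Return the GCS storage-factory property lines for one node type.

    Key layout is taken from the controller/server examples in this guide.
    """
    if node not in ("controller", "server"):
        raise ValueError("node must be 'controller' or 'server'")
    prefix = f"pinot.{node}.storage.factory"
    return [
        # Maps the gs scheme to the GCS filesystem implementation.
        f"{prefix}.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS",
        # Options for that filesystem, prefixed per node.
        f"{prefix}.gs.projectId={project_id}",
        f"{prefix}.gs.gcpKey={gcp_key}",
    ]


for line in gcs_storage_properties("server", "my-project", "path/to/gcp/key.json"):
    print(line)
```

Generating both node types from one function helps keep the controller and server settings from drifting apart.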

Examples

Job spec

executionFrameworkSpec:
    name: 'standalone'
    segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
    segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
    segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 'gs://my-bucket/path/to/input/directory/'
outputDirURI: 'gs://my-bucket/path/to/output/directory/'
overwriteOutput: true
pinotFSSpecs:
    - scheme: gs
      className: org.apache.pinot.plugin.filesystem.GcsPinotFS
      configs:
        projectId: 'my-project'
        gcpKey: 'path-to-gcp json key file'
recordReaderSpec:
    dataFormat: 'csv'
    className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
    configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
    tableName: 'students'
pinotClusterSpecs:
    - controllerURI: 'http://localhost:9000'
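Before launching a job, it can help to confirm which bucket the inputDirURI and outputDirURI values actually point at. The sketch below splits a gs:// URI using the usual gs://bucket/object-path convention; the helper name split_gs_uri is hypothetical, and the URIs are the placeholder values from the job spec above.

```python
from urllib.parse import urlsplit


def split_gs_uri(uri):
    """Split a gs:// URI into (bucket, object_prefix)."""
    parts = urlsplit(uri)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError(f"not a gs://bucket/... URI: {uri!r}")
    # The authority part is the bucket; the rest is the object prefix.
    return parts.netloc, parts.path.lstrip("/")


job_uris = {
    "inputDirURI": "gs://my-bucket/path/to/input/directory/",
    "outputDirURI": "gs://my-bucket/path/to/output/directory/",
}
for key, uri in job_uris.items():
    bucket, prefix = split_gs_uri(uri)
    print(f"{key}: bucket={bucket} prefix={prefix}")
```

Under this convention the first component after gs:// is always read as the bucket name, so a URI that starts with a directory name instead of a bucket will resolve to the wrong location.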

Controller config

controller.data.dir=gs://path/to/data/directory/
controller.local.temp.dir=/path/to/local/temp/directory
controller.enable.split.commit=true
pinot.controller.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.controller.storage.factory.gs.projectId=my-project
pinot.controller.storage.factory.gs.gcpKey=path/to/gcp/key.json
pinot.controller.segment.fetcher.protocols=file,http,gs
pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Server config

pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.server.storage.factory.gs.projectId=my-project
pinot.server.storage.factory.gs.gcpKey=path/to/gcp/key.json
pinot.server.segment.fetcher.protocols=file,http,gs
pinot.server.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
