Apache Pinot Docs
Google Cloud Storage
This guide shows you how to import data from Google Cloud Storage (GCS) on Google Cloud Platform (GCP).
To enable Google Cloud Storage, use the pinot-gcs plugin. Add the following JVM options when starting the controller or server:
-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-gcs
By default, Pinot loads every plugin in the plugins directory, so you can simply drop this plugin there. If you do specify -Dplugins.include, you must list every plugin you want to use, e.g. pinot-json, pinot-avro, pinot-kafka-2.0.
The GCS file system provides the following options:
  • projectId - The name of the Google Cloud Platform project under which you have created your storage bucket.
  • gcpKey - Location of the JSON file containing the GCP keys. Refer to Creating and managing service account keys to download the keys.
Each of these properties should be prefixed by pinot.[node].storage.factory.gs. where node is controller, server, or minion, depending on the config. The file system class itself is registered under pinot.[node].storage.factory.class.gs, as in the examples on this page.
e.g.
pinot.controller.storage.factory.gs.projectId=test-project
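To illustrate the naming scheme, here is a minimal Python sketch that generates the GCS property lines for one node type. The helper name `gcs_properties` is hypothetical and not part of Pinot; it assumes the key layout used in the controller, server, and minion examples on this page.

```python
# Hypothetical helper, not part of Pinot: builds the GCS file system
# properties for one node type, following the key layout used in the
# controller, server, and minion examples on this page.
VALID_NODES = ("controller", "server", "minion")


def gcs_properties(node, project_id, gcp_key_path):
    """Return property lines that register GcsPinotFS for the gs scheme."""
    if node not in VALID_NODES:
        raise ValueError(f"node must be one of {VALID_NODES}, got {node!r}")
    prefix = f"pinot.{node}.storage.factory"
    return [
        f"{prefix}.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS",
        f"{prefix}.gs.projectId={project_id}",
        f"{prefix}.gs.gcpKey={gcp_key_path}",
    ]


if __name__ == "__main__":
    print("\n".join(gcs_properties("controller", "test-project", "path/to/gcp/key.json")))
```

Generating the lines from a single prefix keeps the node name consistent across all three keys, which is the part most easily mistyped when copying a config between roles.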

Examples

Job spec

executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 'gs://my-bucket/path/to/input/directory/'
outputDirURI: 'gs://my-bucket/path/to/output/directory/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: gs
    className: org.apache.pinot.plugin.filesystem.GcsPinotFS
    configs:
      projectId: 'my-project'
      gcpKey: 'path-to-gcp json key file'
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'students'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
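In the job spec, the gs:// input and output URIs rely on the pinotFSSpecs entry that declares the gs scheme. As a rough pre-flight check, the Python sketch below reports any URI scheme that has no matching entry. The function name `missing_fs_schemes` is hypothetical, the spec is represented as a plain dict rather than parsed from YAML, and treating scheme-less or file paths as local is an assumption rather than documented Pinot behavior.

```python
from urllib.parse import urlparse

# A dict mirroring the relevant fields of the job spec example.
job_spec = {
    "inputDirURI": "gs://my-bucket/path/to/input/directory/",
    "outputDirURI": "gs://my-bucket/path/to/output/directory/",
    "pinotFSSpecs": [
        {"scheme": "gs", "className": "org.apache.pinot.plugin.filesystem.GcsPinotFS"},
    ],
}


def missing_fs_schemes(spec):
    """Return URI schemes used by the input/output dirs that pinotFSSpecs does not declare."""
    declared = {fs["scheme"] for fs in spec.get("pinotFSSpecs", [])}
    used = {
        urlparse(spec[key]).scheme
        for key in ("inputDirURI", "outputDirURI")
        if key in spec
    }
    # Assumption: paths with no scheme, or with the file scheme, are local
    # and are not checked here.
    return sorted(s for s in used if s and s != "file" and s not in declared)


if __name__ == "__main__":
    print(missing_fs_schemes(job_spec))
```

An empty result means every remote scheme used by the job has a file system declared for it.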

Controller config

controller.data.dir=gs://path/to/data/directory/
controller.local.temp.dir=/path/to/local/temp/directory
controller.enable.split.commit=true
pinot.controller.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.controller.storage.factory.gs.projectId=my-project
pinot.controller.storage.factory.gs.gcpKey=path/to/gcp/key.json
pinot.controller.segment.fetcher.protocols=file,http,gs
pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Server config

pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.server.storage.factory.gs.projectId=my-project
pinot.server.storage.factory.gs.gcpKey=path/to/gcp/key.json
pinot.server.segment.fetcher.protocols=file,http,gs
pinot.server.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

Minion config

pinot.minion.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.minion.storage.factory.gs.projectId=my-project
pinot.minion.storage.factory.gs.gcpKey=path/to/gcp/key.json
pinot.minion.segment.fetcher.protocols=file,http,gs
pinot.minion.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
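The controller, server, and minion configs share the same set of GCS keys, differing only in the node name. The Python sketch below checks a properties file for that set. Both function names are hypothetical, and the check treats every key shown in the examples as required, which may be stricter than Pinot itself.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_gcs_config(props, node):
    """Return a list of problems found in the GCS settings for one node type."""
    problems = []
    base = f"pinot.{node}"
    # Assumption: all keys from the examples are treated as required.
    required = (
        f"{base}.storage.factory.class.gs",
        f"{base}.storage.factory.gs.projectId",
        f"{base}.storage.factory.gs.gcpKey",
        f"{base}.segment.fetcher.gs.class",
    )
    for key in required:
        if not props.get(key):
            problems.append(f"missing {key}")
    protocols = props.get(f"{base}.segment.fetcher.protocols", "").split(",")
    if "gs" not in [p.strip() for p in protocols]:
        problems.append(f"gs not listed in {base}.segment.fetcher.protocols")
    return problems
```

Calling `check_gcs_config(parse_properties(text), "minion")` on the minion config above should return an empty list; a config that omits gs from the fetcher protocols yields a corresponding message.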