Google Cloud Storage
This guide shows you how to import data from Google Cloud Storage on GCP (Google Cloud Platform).
You can enable Google Cloud Storage support with the pinot-gcs plugin. In the controller or server, add the following JVM options:
-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-gcs
By default Pinot loads all the plugins, so you can just drop this plugin into the plugins directory. If you specify -Dplugins.include, you must list every plugin you want to use, e.g. pinot-json, pinot-avro, pinot-kafka-2.0...
The GCS file system provides the following options:
projectId - The name of the Google Cloud Platform project under which you have created your storage bucket.
gcpKey - The path to the GCP JSON key file, as used in the examples below.
Each of these properties should be prefixed by pinot.[node].storage.factory.gs. where [node] is controller, server, or minion, depending on the config being edited. For example:
pinot.controller.storage.factory.gs.projectId=test-project
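To make the prefix rule concrete, the following sketch renders the GCS storage-factory lines for a given node role. It follows the key layout used in the controller, server, and minion examples on this page. The function name `gcs_storage_properties` and its parameters are hypothetical helpers for illustration, not part of Pinot.

```python
# Hypothetical helper (not part of Pinot): renders the GCS storage-factory
# properties for one node role, following the key layout used on this page.
VALID_NODES = ("controller", "server", "minion")
GCS_FS_CLASS = "org.apache.pinot.plugin.filesystem.GcsPinotFS"


def gcs_storage_properties(node, project_id, gcp_key):
    """Return the property lines that register and configure GcsPinotFS."""
    if node not in VALID_NODES:
        raise ValueError(f"node must be one of {VALID_NODES}, got {node!r}")
    prefix = f"pinot.{node}.storage.factory"
    return [
        # Registers the file system class that handles the gs scheme.
        f"{prefix}.class.gs={GCS_FS_CLASS}",
        # Options for that file system live under the scheme name.
        f"{prefix}.gs.projectId={project_id}",
        f"{prefix}.gs.gcpKey={gcp_key}",
    ]
```

Calling `gcs_storage_properties("server", "my-project", "path/to/gcp/key.json")` yields the three storage-factory lines of the server config, with only the node segment of each key changing between roles.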
Examples
Job spec
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 'gs://my-bucket/path/to/input/directory/'
outputDirURI: 'gs://my-bucket/path/to/output/directory/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: gs
    className: org.apache.pinot.plugin.filesystem.GcsPinotFS
    configs:
      projectId: 'my-project'
      gcpKey: 'path-to-gcp json key file'
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'students'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
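The job spec reads from and writes to gs:// URIs, and it declares a pinotFSSpecs entry for the gs scheme to handle them. The sketch below checks that pairing on a spec that has already been loaded into a Python dict (for example with a YAML parser). The function `missing_fs_schemes` is a hypothetical helper, and treating the `file` scheme as available without a declaration is an assumption of this sketch.

```python
from urllib.parse import urlparse


def missing_fs_schemes(job_spec):
    """Return URI schemes used by the job spec but not declared in pinotFSSpecs."""
    declared = {fs["scheme"] for fs in job_spec.get("pinotFSSpecs", [])}
    declared.add("file")  # assumption: local files need no explicit entry

    used = set()
    for key in ("inputDirURI", "outputDirURI"):
        uri = job_spec.get(key)
        if uri:
            # A URI without a scheme is treated as a local path here.
            used.add(urlparse(uri).scheme or "file")
    return used - declared


# Dict form of the relevant parts of the job spec shown above.
spec = {
    "inputDirURI": "gs://my-bucket/path/to/input/directory/",
    "outputDirURI": "gs://my-bucket/path/to/output/directory/",
    "pinotFSSpecs": [
        {"scheme": "gs", "className": "org.apache.pinot.plugin.filesystem.GcsPinotFS"}
    ],
}
```

With the spec above, `missing_fs_schemes(spec)` returns an empty set. Removing the pinotFSSpecs entry makes it return a set containing `gs`.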
Controller config
controller.data.dir=gs://path/to/data/directory/
controller.local.temp.dir=/path/to/local/temp/directory
controller.enable.split.commit=true
pinot.controller.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.controller.storage.factory.gs.projectId=my-project
pinot.controller.storage.factory.gs.gcpKey=path/to/gcp/key.json
pinot.controller.segment.fetcher.protocols=file,http,gs
pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
Server config
pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.server.storage.factory.gs.projectId=my-project
pinot.server.storage.factory.gs.gcpKey=path/to/gcp/key.json
pinot.server.segment.fetcher.protocols=file,http,gs
pinot.server.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
Minion config
pinot.minion.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.minion.storage.factory.gs.projectId=my-project
pinot.minion.storage.factory.gs.gcpKey=path/to/gcp/key.json
pinot.minion.segment.fetcher.protocols=file,http,gs
pinot.minion.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
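The controller, server, and minion configs share the same GCS-related keys apart from the node name. As a rough sanity check before restarting a node, the sketch below scans a properties file for those keys. The helper names are hypothetical, and the list of keys it checks is taken from the examples on this page, so it is an assumption about what a working setup needs rather than an official checklist.

```python
# Key suffixes taken from the config examples on this page (an assumption,
# not an official list of required settings).
REQUIRED_SUFFIXES = ("storage.factory.class.gs", "segment.fetcher.gs.class")


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def gcs_config_problems(text, node):
    """Return a list of human-readable problems; empty if none were found."""
    props = parse_properties(text)
    prefix = f"pinot.{node}."
    problems = [
        f"missing {prefix}{suffix}"
        for suffix in REQUIRED_SUFFIXES
        if prefix + suffix not in props
    ]
    protocols = props.get(prefix + "segment.fetcher.protocols", "")
    if "gs" not in [p.strip() for p in protocols.split(",")]:
        problems.append(f"{prefix}segment.fetcher.protocols does not list gs")
    return problems
```

Passing the minion config above together with `node="minion"` returns an empty list. Dropping the segment.fetcher.protocols line produces one reported problem.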