HDFS as Deep Storage
This guide shows how to set up HDFS as deep storage for Pinot segments.
To use HDFS as deep storage, you must include the HDFS dependency JARs and plugins.
Server Setup
Configuration
pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.server.storage.factory.hdfs.hadoop.conf.path=/path/to/hadoop/conf/directory/
# For server, instructing the HadoopPinotFS plugin to use the specified keytab and principal when accessing HDFS paths
pinot.server.storage.factory.hdfs.hadoop.kerberos.principal=<hdfs-principal>
pinot.server.storage.factory.hdfs.hadoop.kerberos.keytab=<hdfs-keytab>
pinot.server.segment.fetcher.protocols=file,http,hdfs
pinot.server.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>
pinot.set.instance.id.to.hostname=true
pinot.server.instance.dataDir=/path/in/local/filesystem/for/pinot/data/server/index
pinot.server.instance.segmentTarDir=/path/in/local/filesystem/for/pinot/data/server/segment
pinot.server.grpc.enable=true
pinot.server.grpc.port=8090
Executable
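The launch command did not survive extraction. A sketch using the `pinot-admin.sh` launcher from the Pinot distribution (all paths, the distribution directory, and the ZooKeeper address are placeholders for your environment):

```shell
# Make the Hadoop jars visible to Pinot before starting the server.
export HADOOP_HOME=/path/to/hadoop/home
export PINOT_DISTRIBUTION_DIR=/path/to/apache-pinot-bin

# Start the server with the HDFS-enabled configuration shown above.
${PINOT_DISTRIBUTION_DIR}/bin/pinot-admin.sh StartServer \
  -zkAddress localhost:2181 \
  -configFileName /path/to/pinot/conf/server.conf
```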
Controller Setup
Configuration
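The controller configuration block did not survive extraction. A sketch that mirrors the server settings above and uses the controller property names discussed later in this guide (the HDFS URI, paths, and principals are placeholders; note the intentional `principle` spelling on the legacy segment-fetcher keys):

controller.data.dir=hdfs://namenode:8020/path/in/hdfs/for/controller/data
controller.local.temp.dir=/path/in/local/filesystem/for/controller/temp
controller.enable.split.commit=true
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.hdfs.hadoop.conf.path=/path/to/hadoop/conf/directory/
pinot.controller.storage.factory.hdfs.hadoop.kerberos.principal=<hdfs-principal>
pinot.controller.storage.factory.hdfs.hadoop.kerberos.keytab=<hdfs-keytab>
pinot.controller.segment.fetcher.protocols=file,http,hdfs
pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.principle=<your kerberos principal>
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.keytab=<your kerberos keytab>
pinot.set.instance.id.to.hostname=true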
Executable
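The launch command did not survive extraction. A sketch using the `pinot-admin.sh` launcher (paths and the ZooKeeper address are placeholders):

```shell
# Make the Hadoop jars visible to Pinot before starting the controller.
export HADOOP_HOME=/path/to/hadoop/home
export PINOT_DISTRIBUTION_DIR=/path/to/apache-pinot-bin

# Start the controller with the HDFS-enabled configuration shown above.
${PINOT_DISTRIBUTION_DIR}/bin/pinot-admin.sh StartController \
  -zkAddress localhost:2181 \
  -configFileName /path/to/pinot/conf/controller.conf
```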
Broker Setup
Configuration
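The broker configuration block did not survive extraction. The broker does not read from deep storage, so no HDFS plugin or Kerberos settings are needed here; a minimal sketch (property values are placeholders):

pinot.set.instance.id.to.hostname=true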
Executable
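The launch command did not survive extraction. A sketch using the `pinot-admin.sh` launcher (paths and the ZooKeeper address are placeholders):

```shell
export PINOT_DISTRIBUTION_DIR=/path/to/apache-pinot-bin

# Start the broker; no HDFS settings are required for this component.
${PINOT_DISTRIBUTION_DIR}/bin/pinot-admin.sh StartBroker \
  -zkAddress localhost:2181 \
  -configFileName /path/to/pinot/conf/broker.conf
```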
Kerberos Authentication
When using HDFS with Kerberos security enabled, Pinot provides two ways to authenticate:
1. Automatic Authentication (Recommended)
By configuring the storage.factory Kerberos properties shown above, Pinot will automatically handle Kerberos authentication using the specified keytab and principal. This eliminates the need for manual kinit commands and ensures continuous authentication even after ticket expiration.
Why These Properties Are Required
The storage.factory Kerberos properties serve a critical purpose in Pinot's HDFS integration:
For Controller:
The controller uses controller.data.dir to store segment metadata and other data in HDFS
When controller.data.dir points to an HDFS path (e.g., hdfs://namenode:8020/pinot/data), the HadoopPinotFS plugin needs Kerberos credentials to access it
Without storage.factory Kerberos properties, the controller would fail to read/write to HDFS, causing segment upload and metadata operations to fail
These properties enable the HadoopPinotFS plugin to programmatically authenticate using the keytab file
For Server:
The server uses HadoopPinotFS for various HDFS operations including segment downloads and deep storage access
When servers need to access segments stored in HDFS deep storage, they require valid Kerberos credentials
The storage.factory properties provide persistent authentication that survives server restarts and ticket expirations
Understanding the Two Sets of Kerberos Properties
You may notice two sets of Kerberos properties in the configuration:
storage.factory properties (NEW - Recommended):
pinot.controller.storage.factory.hdfs.hadoop.kerberos.principal
pinot.controller.storage.factory.hdfs.hadoop.kerberos.keytab
pinot.server.storage.factory.hdfs.hadoop.kerberos.principal
pinot.server.storage.factory.hdfs.hadoop.kerberos.keytab
Purpose: These properties configure Kerberos authentication for the HadoopPinotFS storage factory, which handles:
Controller's deep storage operations (reading/writing to controller.data.dir)
Server's deep storage operations
General HDFS filesystem operations through the storage factory
Why needed: The storage factory is initialized at startup and used throughout the component's lifecycle for HDFS access. Without these properties, any HDFS operation through the storage factory would fail with authentication errors.
segment.fetcher properties (Legacy - For backward compatibility):
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.principle (note: the typo "principle" instead of "principal" is maintained for compatibility)
pinot.controller.segment.fetcher.hdfs.hadoop.kerberos.keytab
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.principle
pinot.server.segment.fetcher.hdfs.hadoop.kerberos.keytab
Purpose: These configure Kerberos for the segment fetcher component specifically
Why both are needed: While there is some functional overlap, having both ensures:
Complete coverage of all HDFS access patterns
Backward compatibility with existing deployments
Segment fetcher operations work independently of storage factory
Benefits of Automatic Authentication
No Manual Intervention:
Eliminates the need to run kinit commands manually
Reduces operational overhead and human error
Enables fully automated deployments
Automatic Ticket Renewal:
Kerberos tickets typically expire after 24 hours (configurable)
Manual kinit requires re-authentication before expiration
With keytab-based authentication, Pinot automatically renews tickets internally
Prevents service disruptions due to expired tickets
Production Reliability:
Manual authentication is unsuitable for production as it requires:
Someone to monitor ticket expiration times
Manual intervention during off-hours if tickets expire
Service restarts or re-authentication during critical operations
Automatic authentication runs 24/7 without human intervention
Security Best Practices:
Keytab files provide secure, long-term credentials
No need to store passwords in scripts or configuration
Keytabs can be managed through enterprise key management systems
Follows Hadoop's recommended security practices
2. Manual Authentication (Legacy)
Alternatively, you can manually authenticate using kinit before starting Pinot components:
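The command example did not survive extraction. A typical manual flow (the principal, realm, and keytab path are placeholders) looks like:

```shell
# Obtain a ticket from the keytab before launching any Pinot component.
kinit -kt /path/to/pinot.keytab pinot/hostname@EXAMPLE.COM

# Confirm the ticket was granted and note its expiry time.
klist
```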
Limitations of Manual Authentication:
Ticket Expiration: Kerberos tickets typically expire after 24 hours, requiring re-authentication
Service Interruption: If tickets expire while Pinot is running, HDFS operations will fail until re-authentication
Operational Burden: Requires monitoring and manual intervention, especially problematic for 24/7 production systems
Automation Challenges: Difficult to integrate into automated deployment pipelines
Not Recommended: This approach is only suitable for development/testing environments
Note: Manual authentication is not recommended for production environments. Always use the storage.factory Kerberos properties for production deployments.
Troubleshooting
HDFS FileSystem Issues
If you receive an error that says No FileSystem for scheme "hdfs", the problem is likely a class-loading issue.
To fix this, try adding the following property to core-site.xml:
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
Then add /opt/pinot/lib/hadoop-common-<release-version>.jar to the classpath.
Kerberos Authentication Issues
Error: "Failed to authenticate with Kerberos"
Possible Causes:
Incorrect keytab path: Ensure the keytab file path is absolute and accessible by the Pinot process
Wrong principal name: Verify the principal name matches the one in the keytab file
Keytab file permissions: The keytab file must be readable by the user running Pinot (typically chmod 400 or chmod 600)
Solution:
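The solution steps did not survive extraction. A sketch of the usual checks (the keytab path and principal are placeholders):

```shell
# List the principals stored in the keytab and compare them
# against the principal configured in the Pinot properties.
klist -kt /path/to/pinot.keytab

# Restrict the keytab to the user running Pinot, then verify.
chmod 400 /path/to/pinot.keytab
ls -l /path/to/pinot.keytab
```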
Error: "GSSException: No valid credentials provided"
Cause: This typically occurs when:
The storage.factory Kerberos properties are not set
The keytab file path is incorrect or the file doesn't exist
The Kerberos configuration (krb5.conf) is not properly configured
Solution:
Verify all storage.factory Kerberos properties are correctly set in the configuration
Ensure the keytab file exists and has correct permissions
Check that /etc/krb5.conf (or $JAVA_HOME/jre/lib/security/krb5.conf) is properly configured with your Kerberos realm settings
Error: "Unable to obtain Kerberos password" or "Clock skew too great"
Cause: Time synchronization issue between Pinot server and Kerberos KDC
Solution:
Kerberos requires clock synchronization within 5 minutes (default) between client and KDC.
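One way to check the skew (the NTP server here is only an example; use your site's time source):

```shell
# Query an NTP server and report the local clock offset without changing it.
ntpdate -q pool.ntp.org

# On systemd hosts, check whether NTP synchronization is active.
timedatectl status
```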
Error: "HDFS operation fails after running for several hours"
Cause: This typically indicates that:
Manual kinit was used instead of storage.factory properties
Kerberos tickets have expired (default 24 hours)
Solution:
Configure storage.factory Kerberos properties to enable automatic ticket renewal
Remove any manual kinit commands from startup scripts
Restart Pinot components to apply the configuration
Verifying Kerberos Configuration
To verify your Kerberos setup is working correctly:
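The verification commands did not survive extraction. A sketch of a manual walkthrough (the keytab path, principal, and HDFS URI are placeholders):

```shell
# 1. Confirm the keytab contains the expected principal.
klist -kt /path/to/pinot.keytab

# 2. Authenticate from the keytab exactly as Pinot will.
kinit -kt /path/to/pinot.keytab pinot/hostname@EXAMPLE.COM

# 3. Confirm HDFS is reachable with those credentials.
hadoop fs -ls hdfs://namenode:8020/pinot

# 4. After starting Pinot, watch the component logs to confirm
#    HadoopPinotFS initializes without authentication errors.
```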
Best Practices
Use absolute paths for keytab files in configuration
Secure keytab files with appropriate permissions (400 or 600)
Use service principals (e.g., pinot/hostname@REALM) rather than user principals for production
Monitor Kerberos ticket expiration in logs to ensure automatic renewal is working
Keep keytab files backed up in secure locations
Test configuration in a non-production environment first