Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
The document discusses advanced security measures for Apache Hadoop clusters, emphasizing the need for wire encryption, RPC encryption, and transparent data encryption (TDE) for HDFS. Key points include the configuration of TLS for communication, implementing proper encryption policies, and managing key providers effectively. It also highlights various challenges and considerations for successfully securing Hadoop environments, while outlining future work directions regarding ongoing security enhancements.
1.
Technical tips for secure Apache Hadoop cluster
Akira Ajisaka, Kei Kori
Yahoo Japan Corporation, Big Data
2.
Akira Ajisaka (@ajis_ka)
• Software Engineer in Hadoop team @ Yahoo! JAPAN
  – Upgraded HDFS to 3.3.0 and enabled RBF
  – R&D for a more secure Hadoop cluster than just enabling Kerberos auth
• Apache Hadoop committer/PMC
  – ~800 commits across various components in 6 years
  – Handled and announced several CVEs
  – Manages the build and QA environment
3.
Kei Kori (@2k0ri)
• Data Platform Engineer in Hadoop team @ Yahoo! JAPAN
  – Built the upgrade to and continuous delivery for HDFS 3.3.0
  – Research on operations for a more secure Hadoop cluster
• Kubernetes admin for the Hadoop client environment
  – Migrates users from VM/BM to a cloud-native way
  – Integrates ML/DL workloads with the Hadoop ecosystem
Session Overview
Prerequisites:
• Hadoop is not secure by default
• Kerberos authentication is required
This talk introduces further details in practice:
• Wire encryption in the Hadoop ecosystem
• HDFS transparent data encryption at rest
• Other considerations
Background
To make the Hadoop ecosystem more secure than perimeter security alone:
• Not only authenticate but also encrypt communications
• Protection and mitigation against internal threats such as packet sniffing
• Part of security compliance such as NIST SP 800-171
HTTP encryption for Hadoop
• dfs.http.policy: HTTPS_ONLY in hdfs-site, yarn.http.policy: HTTPS_ONLY in yarn-site, mapreduce.jobhistory.http.policy: HTTPS_ONLY in mapred-site, etc.
  – Enable TLS on WebUI/REST API endpoints
  – Use HTTP_AND_HTTPS while rolling-updating endpoints
• yarn.timeline-service.webapp.https.address in yarn-site, mapreduce.jobhistory.webapp.https.address in mapred-site
  – Set History/Timeline Server endpoints with HTTPS
• Store certs and passphrases using the Hadoop Credential Provider via hadoop.security.credential.provider.path
  – Separates permissions from configs
  – Prevents exposure outside of hadoop.security.sensitive-config-keys filtering
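As a reference, a minimal sketch of the HTTPS-only policy settings, one per daemon config (keystore and truststore locations are assumed to be configured separately in ssl-server.xml / ssl-client.xml):

<!-- hdfs-site.xml -->
<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>

<!-- yarn-site.xml -->
<property>
  <name>yarn.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.jobhistory.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>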
10.
RPC encryption for Hadoop
• hadoop.rpc.protection: privacy in core-site
  – Encrypts RPC, incl. Kerberos authentication, at the SASL layer
  – Propagates to hadoop.security.saslproperties.resolver.class, dfs.data.transfer.saslproperties.resolver.class and dfs.data.transfer.protection
• hadoop.rpc.protection: privacy,authentication while rolling-updating all Hadoop servers/clients
  – Accepts falling back to non-encrypted RPC
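A sketch of the corresponding settings (the rolling-update value is shown as a comment; adjust to your deployment):

<!-- core-site.xml -->
<property>
  <name>hadoop.rpc.protection</name>
  <!-- use "privacy,authentication" while rolling over servers/clients -->
  <value>privacy</value>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.data.transfer.protection</name>
  <value>privacy</value>
</property>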
11.
Block data transfer encryption for Hadoop
• dfs.encrypt.data.transfer: true, dfs.encrypt.data.transfer.cipher.suites: AES/CTR/NoPadding in hdfs-site
  – Only encrypts the payload between HDFS clients and DataNodes
• Rolling update is not supported by configs alone
  – Needs managing a list of encrypted nodes, or extending/implementing your own dfs.trustedchannel.resolver.class
  – Nodes trusted by dfs.trustedchannel.resolver.class are forced to transfer without encryption regardless of their encryption status
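A minimal hdfs-site.xml sketch for encrypted block data transfer:

<!-- hdfs-site.xml -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
<property>
  <name>dfs.encrypt.data.transfer.cipher.suites</name>
  <value>AES/CTR/NoPadding</value>
</property>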
12.
Encryption for Spark
In spark-defaults:
• HTTP encryption
  – spark.ssl.sparkHistory.enabled true
    • Switches the protocol on one port; does not support HTTP_AND_HTTPS
  – spark.yarn.historyServer.address https://...
• RPC encryption
  – spark.authenticate true
    • Also in yarn-site
  – spark.authenticate.enableSaslEncryption true
  – spark.network.sasl.serverAlwaysEncrypt true
    • Only after all Spark components recognize enableSaslEncryption
• Shuffle encryption
  – spark.network.crypto.enabled true
  – spark.io.encryption.enabled true
    • Encrypts spilled caches and RDDs on local disks
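A sketch of the SASL/shuffle encryption part in spark-defaults.conf (the TLS and history-server settings above are omitted here; spark.authenticate must also be set in yarn-site for the external shuffle service):

# spark-defaults.conf
spark.authenticate                        true
spark.authenticate.enableSaslEncryption   true
spark.network.sasl.serverAlwaysEncrypt    true
spark.network.crypto.enabled              true
spark.io.encryption.enabled               true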
13.
Encryption for Hive
• hive.server2.thrift.sasl.qop: auth-conf in hive-site
  – Encrypts JDBC between clients and HiveServer2 in binary mode
  – And Thrift between clients and Hive Metastore
• hive.server2.use.SSL: true in hive-site
  – Only for HS2 http mode
  – HS2 binary mode cannot enable both TLS and SASL
• Encryption for JDBC between HS2/Hive Metastore and the remote RDBMS
• Shuffle encryption
  – Tez: tez.runtime.shuffle.ssl.enable: true, tez.runtime.shuffle.keep-alive.enabled: true in tez-site
  – MapReduce: mapreduce.ssl.enabled: true, mapreduce.shuffle.ssl.enabled: true in mapred-site
  – Requires server certs on all NodeManagers
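A sketch of the hive-site and tez-site parts (the mapred-site shuffle settings above are omitted; check property names against your distribution):

<!-- hive-site.xml -->
<property>
  <name>hive.server2.thrift.sasl.qop</name>
  <value>auth-conf</value>
</property>

<!-- tez-site.xml -->
<property>
  <name>tez.runtime.shuffle.ssl.enable</name>
  <value>true</value>
</property>
<property>
  <name>tez.runtime.shuffle.keep-alive.enabled</name>
  <value>true</value>
</property>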
14.
Challenges in HTTP encryption: ApplicationMaster / Spark Driver
• Server certs for ApplicationMaster / Spark Driver need to be readable by the user who submitted the application
  – ApplicationMaster and Spark Driver run as that user
  – The WebApplicationProxy between ResourceManager and ApplicationMaster relies on this encryption
• Applications support TLS and can bundle certs since
  – Spark 3.0.0: SPARK-24621
  – MapReduce 3.3.0: MAPREDUCE-4669
  – Tez: not supported yet
15.
Encryption for ZooKeeper server
• Authenticate with SASL, encrypt with TLS
  – ZooKeeper does not respect SASL QOP
• Requires ZooKeeper 3.5.6 or above for servers/quorums
  – serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
  – sslQuorum=true
  – ssl.clientAuth=NONE
  – ssl.quorum.clientAuth=NONE
• Needs ZOOKEEPER-4276 to follow "Upgrading existing non-TLS cluster with no downtime"
  – Allows ZK to serve with only secureClientPort
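A zoo.cfg sketch combining the settings above with the keystore entries they rely on (port number, file paths and passwords are placeholders):

# zoo.cfg (ZooKeeper >= 3.5.6)
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
secureClientPort=2281
sslQuorum=true
ssl.clientAuth=NONE
ssl.quorum.clientAuth=NONE
ssl.keyStore.location=/path/to/keystore.jks
ssl.keyStore.password=********
ssl.trustStore.location=/path/to/truststore.jks
ssl.trustStore.password=********
ssl.quorum.keyStore.location=/path/to/keystore.jks
ssl.quorum.keyStore.password=********
ssl.quorum.trustStore.location=/path/to/truststore.jks
ssl.quorum.trustStore.password=********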
16.
Encryption for ZooKeeper client
• Also requires ZooKeeper 3.5.6 or above for clients
  – -Dzookeeper.client.secure=true -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty in client JVM args
    • HADOOP_OPTS environment variable
    • mapreduce.admin.map.child.java.opts, mapreduce.admin.reduce.child.java.opts in mapred-site for Oozie Coordinator MapReduce jobs
• Needs to replace and update ZooKeeper jars in all components that communicate with ZooKeeper
  – ZKFC, ResourceManager, Hive clients incl. HS2, Oozie and Livy
  – Apache Curator must also be updated to 4.2.0, Netty from 4.0 to 4.1
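For Hadoop daemons and CLI clients, the flags can be injected via hadoop-env.sh, for example (truststore system property names per the ZooKeeper client TLS docs; paths and passwords are placeholders):

# hadoop-env.sh
export HADOOP_OPTS="${HADOOP_OPTS} \
  -Dzookeeper.client.secure=true \
  -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty \
  -Dzookeeper.ssl.trustStore.location=/path/to/truststore.jks \
  -Dzookeeper.ssl.trustStore.password=********"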
17.
Enforcing Kerberos AuthN/Z for ZooKeeper
• Requires ZooKeeper 3.6.0 or above for servers
  – 3.6.0+: zookeeper.sessionRequireClientSASLAuth=true
  – 3.7.0+: enforce.auth.enabled=true, enforce.auth.schemes=sasl
• Oozie Hive action will not work with forced ZK SASL
  – When acquiring the lock for Hive Metastore
  – Has no mechanism to delegate authentication or impersonation for ZooKeeper
  – Using HiveServer2 / Oozie Hive2 action solves it
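A sketch of the two enforcement options (on 3.6 the flag is a server-side Java system property, set here via the SERVER_JVMFLAGS used by zkServer.sh; on 3.7+ the zoo.cfg keys can be used):

# ZooKeeper 3.6.x: Java system property in the server JVM flags
SERVER_JVMFLAGS="${SERVER_JVMFLAGS} -Dzookeeper.sessionRequireClientSASLAuth=true"

# ZooKeeper 3.7.0+: zoo.cfg
enforce.auth.enabled=true
enforce.auth.schemes=sasl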
Background
HDFS blocks are written to the local filesystem of the DataNodes
• The data is not encrypted by default
• Encryption is required in several use cases
Encryption can be done at several layers:
• Application: most secure, but hardest to do
• Database: most databases have this, but it may incur performance penalties
• Filesystem: high performance, transparent, but may not be flexible
• Disk: only really protects against physical theft
HDFS TDE fits between the database and filesystem levels
KeyProvider: where the KEK is saved
Implementations of the KeyProvider API:
• Hadoop KMS: JavaKeyStoreProvider
  – JCEKS files in Hadoop-compatible filesystems (localFS, HDFS, cloud storage)
  – Not recommended
• Apache Ranger KMS: RangerKeyStoreProvider
  – RDBMS
  – The master key can be stored in a Luna HSM (optional)
  – An HSM is required in some use cases
    • PCI-DSS, FIPS 140-2
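Regardless of the implementation, HDFS clients and NameNodes find the key provider through a URI in core-site.xml; a sketch with a hypothetical KMS host:

<!-- core-site.xml -->
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://https@kms.example.com:9600/kms</value>
</property>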
22.
Extending the KeyProvider API is not difficult
• Mandatory methods for HDFS TDE
  – getKeyVersion, getCurrentKey, getMetadata
• Optional methods (nice to have for operation)
  – getKeys, getKeysMetadata, getKeyVersions, createKey, deleteKey, rollNewVersion
  – If not implemented, you need to create/delete/list/roll keys in some other way
• Use cases:
  – LinkedIn integrated with its own key management service, LiKMS https://engineering.linkedin.com/blog/2021/the-exabyte-club--linkedin-s-journey-of-scaling-the-hadoop-distr
  – Yahoo! JAPAN also integrated with our own credential store in only ~500 LOC (including test code)
23.
KeyProvider is actually stable and can be used safely
• KeyProvider is @Public and @Unstable
  – @Unstable in Hadoop means "incompatible changes are allowed at any time"
• Actually, the API is very stable
  – No incompatible changes
  – Ranger has used it since 2015: RANGER-247
• Provided a patch to mark it stable
  – HADOOP-17544
24.
Hadoop KMS: where the KEK is cached and authorization is performed
• KMS interacts with HDFS clients, NameNodes, and the KeyProvider
• KMS has its own ACLs, separated from HDFS ACLs
  – An attacker cannot decrypt data even if HDFS ACLs are compromised
  – If 'usera' reads/writes data in an encryption zone with 'keya', the configuration in kms-acls.xml will be:

<property>
  <name>key.acl.keya.DECRYPT_EEK</name>
  <value>usera</value>
</property>

  – The configuration is hot-reloaded
• For HA and scalability, multiple KMS instances are supported
25.
How to deploy multiple KMS instances
Two approaches:
1. Behind a load balancer or VIP
2. Using LoadBalancingKMSClientProvider
  – Implicitly used when multiple URIs are specified in hadoop.security.key.provider.path
If you have an LB or VIP, use it
• No configuration change is needed to scale out or decommission
• The LB saves clients' retry cost
  – LoadBalancingKMSClientProvider first tries to connect to one KMS; if that fails, it connects to another
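When relying on LoadBalancingKMSClientProvider instead of an LB/VIP, the KMS hosts are listed in a single URI separated by semicolons; a sketch with hypothetical hostnames:

<!-- core-site.xml -->
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://https@kms01.example.com;kms02.example.com:9600/kms</value>
</property>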
26.
How to configure multiple KMS instances
• Delegation tokens must be synchronized
  – Use ZKDelegationTokenSecretManager
  – An example configuration is documented: HADOOP-17794
• hadoop.security.token.service.use_ip
  – If true (default), SSL certificate validation fails in a multi-homed environment
  – Documented: HADOOP-12665
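A sketch of the pieces above (delegation-token synchronization via ZooKeeper and hostname-based token service). Property names follow the Hadoop KMS documentation; the ZooKeeper connection string and znode path are placeholders, and the Kerberos/ACL settings for the znodes are omitted:

<!-- kms-site.xml -->
<property>
  <name>hadoop.kms.authentication.zk-dt-secret-manager.enable</name>
  <value>true</value>
</property>
<property>
  <name>hadoop.kms.authentication.zk-dt-secret-manager.zkConnectionString</name>
  <value>zk01.example.com:2181,zk02.example.com:2181,zk03.example.com:2181</value>
</property>
<property>
  <name>hadoop.kms.authentication.zk-dt-secret-manager.znodeWorkingPath</name>
  <value>kms/zkdtsm</value>
</property>

<!-- core-site.xml (clients) -->
<property>
  <name>hadoop.security.token.service.use_ip</name>
  <value>false</value>
</property>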
27.
Tuning Hadoop KMS
• Documented and discussed in HADOOP-15743
  – Reduce SSL session cache size and TTL
  – Tune the HTTPS idle timeout
  – Increase max file descriptors
  – etc.
• This tuning is effective for HttpFS as well
  – Both KMS and HttpFS use Jetty via HttpServer2
28.
Recap: HDFS TDE
• Careful configuration is required
  – How to save the KEK
  – Running multiple KMS instances
  – KMS tuning
  – Where to create encryption zones
  – ACLs (including key ACLs and impersonation)
• These are not straightforward despite the long time since the feature was developed
Updating SSL certificates
• Hadoop >= 3.3.1 allows updating SSL certificates without downtime: HADOOP-16524
  – Uses the hot-reload feature in Jetty
  – Except for the DataNode, since DNs don't rely on Jetty
• Useful especially for the NameNode, because it takes > 30 minutes to restart in a large cluster
31.
Other considerations
• It is important to be ready to upgrade at any time
  – Sometimes CVEs are published and the vendors warn users to upgrade
• Security requirements may increase later, so be prepared for that early
• Operational considerations are also necessary
  – Not only the cluster configuration but also the operations will change
32.
Conclusion & Future work
We introduced many technical tips for a secure Hadoop cluster
• However, they might change in the future
• Need to catch up with the OSS community
Future work
• How to enable SSL/TLS in ApplicationMaster & Spark Driver Web UIs
• Impersonation does not work correctly in KMSClientProvider: HDFS-13697