Apache Hive
The Apache Hive connector allows Trino to connect to a Hive metastore and query data stored in Apache Hadoop (HDFS) or S3-compatible object storage.
Example Hive catalog configuration
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  # The name of the catalog as it will appear in Trino
  name: hive-catalog
  # TrinoCluster can use these labels to select which catalogs to include
  labels:
    trino: simple-trino
spec:
  connector:
    # Specify hive here when defining a hive catalog
    hive:
      metastore:
        configMap: simple-hive
      s3:
        inline:
          host: test-minio
          port: 9000
          accessStyle: Path
          credentials:
            secretClass: minio-credentials
  # We can use configOverrides to add arbitrary properties to the Trino catalog configuration
  configOverrides:
    hive.metastore.username: trino
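The labels on the catalog only take effect when a TrinoCluster selects them. The fragment below is a minimal sketch of that selection, not a complete cluster definition. It assumes the selector is located at spec.clusterConfig.catalogLabelSelector, which may differ between operator versions, and the cluster name simple-trino is hypothetical.

```yaml
# Fragment of a TrinoCluster; all unrelated fields are omitted.
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino  # hypothetical cluster name
spec:
  clusterConfig:
    # Assumption: the selector lives under clusterConfig in your operator version.
    catalogLabelSelector:
      matchLabels:
        # Must match the label set on the TrinoCatalog
        trino: simple-trino
```

Any TrinoCatalog carrying the label trino: simple-trino would then be included in this cluster.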
Connect to S3 store
The Hive connector can connect to an S3 store as follows:
spec:
  connector:
    hive:
      s3:
        inline:
          host: test-minio
          port: 9000
          accessStyle: Path
          credentials:
            secretClass: minio-credentials
      # OR
      s3:
        reference: my-minio
See S3 resources for details about S3 connections.
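For the reference variant, my-minio is assumed to be the name of an S3Connection object in the same namespace as the catalog. The following is a minimal sketch that reuses the values from the inline example. The apiVersion and field names are assumed to mirror the inline form and should be checked against the S3 resources documentation for your operator version.

```yaml
# Sketch of the S3Connection object that `reference: my-minio` would point to.
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: my-minio
spec:
  host: test-minio
  port: 9000
  accessStyle: Path
  credentials:
    # Assumes a SecretClass with this name already exists
    secretClass: minio-credentials
```

A standalone S3Connection can be useful when several catalogs share the same store, because the connection details are then defined only once.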
Make sure that the underlying Hive metastore also has access to the S3 store, because the metastore accesses the storage itself, for example to check whether the directory exists when creating tables.
Connect to HDFS cluster
The Hive connector can connect to an HDFS cluster operated by Stackable as follows:
spec:
  connector:
    hive:
      hdfs:
        configMap: simple-hdfs
Make sure that the underlying Hive metastore also has access to the HDFS cluster, because the metastore accesses the storage itself, for example to check whether the directory exists when creating tables.
Adding unmanaged Hive clusters
You can connect Trino to Hive catalogs from systems that are not managed by Stackable, including Hive running on existing Hadoop clusters. Unmanaged Hive instances are defined by creating ConfigMaps that contain the configuration for the remote Hive metastore and for the HDFS or S3 storage service.
Create a Hive metastore ConfigMap
The Hive metastore ConfigMap contains the URL of the metastore's Thrift endpoint.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloudera-hive
data:
  HIVE: thrift://10.132.0.59:9083
Create an HDFS ConfigMap
When the Hive data is stored on HDFS, you need to provide a ConfigMap containing the HDFS configuration. To do this, take the core-site.xml and hdfs-site.xml files from your Hadoop cluster and create a ConfigMap with the keys core-site.xml and hdfs-site.xml.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cloudera-hdfs
data:
  core-site.xml: |-
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://my.hadoop.cluster:8020</value>
      </property>
      <!-- truncated for brevity -->
    </configuration>
  hdfs-site.xml: |-
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <property>
        <name>dfs.namenode.servicerpc-address</name>
        <value>my.hadoop.cluster:8022</value>
      </property>
      <!-- truncated for brevity -->
    </configuration>
Create the Trino Hive catalog
To use the unmanaged Hive metastore, define a TrinoCatalog object in the same way as for a managed cluster, referencing the custom ConfigMaps created for Hive and HDFS.
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: clouderahive
  labels:
    trino: simple-trino
spec:
  connector:
    hive:
      metastore:
        configMap: cloudera-hive
      hdfs:
        configMap: cloudera-hdfs
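If the unmanaged Hive data is stored in S3 rather than HDFS, the same pattern should apply with an s3 section in place of hdfs. The following sketch combines the metastore ConfigMap from above with the S3 reference form shown earlier. The catalog name clouderahive-s3 and the connection name my-minio are hypothetical placeholders.

```yaml
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: clouderahive-s3  # hypothetical catalog name
  labels:
    trino: simple-trino
spec:
  connector:
    hive:
      metastore:
        # ConfigMap holding the Thrift URL of the unmanaged metastore
        configMap: cloudera-hive
      s3:
        # Hypothetical S3 connection; an inline definition works as well
        reference: my-minio
```

As with the managed setup, the remote Hive metastore needs its own access to the S3 store.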