Apache Hive

The Apache Hive connector allows Trino to connect to a Hive metastore and query data stored in Apache Hadoop or in S3-compatible object storage.

Example Hive catalog configuration

apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  # The name of the catalog as it will appear in Trino
  name: hive-catalog
  # TrinoCluster can use these labels to select which catalogs to include
  labels:
    trino: simple-trino
spec:
  connector:
    # Specify hive here when defining a hive catalog
    hive:
      metastore:
        configMap: simple-hive
      s3:
        inline:
          host: test-minio
          port: 9000
          accessStyle: Path
          credentials:
            secretClass: minio-credentials
  # We can use configOverrides to add arbitrary properties to the Trino catalog configuration
  configOverrides:
    hive.metastore.username: trino
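The labels on a TrinoCatalog only take effect if a TrinoCluster selects them. A minimal sketch of the matching selector on the cluster side, assuming a recent operator version where the selector lives under spec.clusterConfig.catalogLabelSelector (older versions may place it elsewhere); the image, coordinator and worker settings that a real TrinoCluster requires are omitted:

```yaml
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino
spec:
  clusterConfig:
    # Assumed field location: include every TrinoCatalog labelled trino=simple-trino,
    # such as hive-catalog above
    catalogLabelSelector:
      matchLabels:
        trino: simple-trino
  # image, coordinators and workers omitted
```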

Connect to S3 store

The Hive connector can connect to an S3 store either with an inline connection definition or by referencing an existing S3 connection by name:

spec:
  connector:
    hive:
      s3:
        inline:
          host: test-minio
          port: 9000
          accessStyle: Path
          credentials:
            secretClass: minio-credentials
      # OR
      s3:
        reference: my-minio

See S3 resources for details about S3 connections.
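For the reference variant, an object named my-minio has to exist, and both variants rely on the minio-credentials SecretClass. A sketch of these objects, assuming the Stackable S3Connection resource and the secret-operator's k8sSearch backend with the accessKey/secretKey key names; the key values are placeholders:

```yaml
# S3 connection that "s3: reference: my-minio" resolves to
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: my-minio
spec:
  host: test-minio
  port: 9000
  accessStyle: Path
  credentials:
    secretClass: minio-credentials
---
# SecretClass that looks up Secrets labelled with its name in the Pod's namespace
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: minio-credentials
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}
---
# Secret holding the actual S3 credentials (placeholder values)
apiVersion: v1
kind: Secret
metadata:
  name: minio-credentials
  labels:
    secrets.stackable.tech/class: minio-credentials
stringData:
  accessKey: placeholder-access-key
  secretKey: placeholder-secret-key
```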

Make sure that the underlying Hive metastore also has access to the S3 store, because the metastore accesses the storage itself, for example to check whether the table directory exists when a table is created.
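If the metastore is managed by the Stackable Hive operator, one way to do this is to point it at the same S3 connection. A fragment, assuming a Hive operator version that accepts spec.clusterConfig.s3 with the same inline/reference forms; the database settings and role groups that a real HiveCluster requires are omitted:

```yaml
apiVersion: hive.stackable.tech/v1alpha1
kind: HiveCluster
metadata:
  name: simple-hive
spec:
  clusterConfig:
    # Assumed field: reuse the S3 connection that the Trino catalog references
    s3:
      reference: my-minio
    # database connection settings omitted
  # metastore role groups omitted
```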

Connect to HDFS cluster

The Hive connector can connect to an HDFS cluster operated by Stackable as follows:

spec:
  connector:
    hive:
      hdfs:
        configMap: simple-hdfs

Make sure that the underlying Hive metastore also has access to the HDFS cluster, because the metastore accesses the storage itself, for example to check whether the table directory exists when a table is created.
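For a metastore managed by the Stackable Hive operator, a sketch of granting that access, assuming a Hive operator version that accepts spec.clusterConfig.hdfs.configMap and assuming simple-hdfs is the discovery ConfigMap published by the HdfsCluster of the same name; database settings and role groups are omitted:

```yaml
apiVersion: hive.stackable.tech/v1alpha1
kind: HiveCluster
metadata:
  name: simple-hive
spec:
  clusterConfig:
    # Assumed field: same HDFS discovery ConfigMap that the Trino catalog uses
    hdfs:
      configMap: simple-hdfs
    # database connection settings omitted
  # metastore role groups omitted
```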

Adding unmanaged Hive clusters

You can connect Trino to Hive catalogs from systems that are not managed by Stackable, including Hive running on existing Hadoop clusters. Unmanaged Hive instances are defined by creating ConfigMaps that contain the configuration for the remote Hive metastore and for its HDFS or S3 storage services.

Create a Hive metastore ConfigMap

The Hive metastore ConfigMap contains the URL of the metastore's Thrift endpoint under the key HIVE.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cloudera-hive
data:
  HIVE: thrift://10.132.0.59:9083

Create an HDFS ConfigMap

When the Hive data is stored on HDFS, you need to provide a ConfigMap containing the HDFS configuration. To do this, take the core-site.xml and hdfs-site.xml files from your Hadoop cluster and create a ConfigMap with the keys core-site.xml and hdfs-site.xml.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cloudera-hdfs
data:
  core-site.xml: |-
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://my.hadoop.cluster:8020</value>
      </property>
      <!-- truncated for brevity -->
    </configuration>
  hdfs-site.xml: |-
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <property>
        <name>dfs.namenode.servicerpc-address</name>
        <value>my.hadoop.cluster:8022</value>
      </property>
      <!-- truncated for brevity -->
    </configuration>

Create the Trino Hive catalog

To use the unmanaged Hive metastore, we define a TrinoCatalog object in the same way as for a managed cluster, referencing the custom ConfigMaps we created for Hive and HDFS.

apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: clouderahive
  labels:
    trino: simple-trino
spec:
  connector:
    hive:
      metastore:
        configMap: cloudera-hive
      hdfs:
        configMap: cloudera-hdfs
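Unmanaged Hive instances whose data lives in S3 rather than HDFS follow the same pattern, with the s3 block from the S3 section in place of the hdfs block. A sketch, assuming the remote metastore already has access to the same bucket; the catalog name clouderahive-s3 and the S3 connection name my-s3 are hypothetical:

```yaml
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: clouderahive-s3  # hypothetical catalog name
  labels:
    trino: simple-trino
spec:
  connector:
    hive:
      # Same metastore ConfigMap as for the HDFS case
      metastore:
        configMap: cloudera-hive
      # S3 storage instead of HDFS; my-s3 is a hypothetical S3 connection name
      s3:
        reference: my-s3
```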