All Products
Search
Document Center

Data Lake Formation:Upgrade the EMR-HOOK component in an EMR gateway

Last Updated:Dec 26, 2024

This topic describes how to upgrade the EMR-HOOK component in an E-MapReduce (EMR) gateway.

Background information

Scenarios: For computing operations in an EMR gateway, if you use EMR-CLI to deploy a gateway and use access frequency metrics that are displayed on the Data Overview tab and data access frequency rules on the Lifecycle Management page in the Data Lake Formation (DLF) console, you need to manually upgrade the EMR-HOOK component.

Note

During the upgrade process, the running computation tasks are not affected. After the upgrade, the tasks will automatically take effect.

Limits

Upgrade procedure

EMR-5.10.1, EMR-3.44.1, and versions later than EMR-3.44.1

Step 1: Upgrade the JAR package

Log on to the gateway by using SSH and execute the following script. You must have root permissions. Note that you must replace ${region} with the current region ID, such as cn-hangzhou.

sudo mkdir -p /opt/apps/EMRHOOK/upgrade/ sudo wget https://dlf-repo-${region}.oss-${region}-internal.aliyuncs.com/emrhook/latest/emrhook.tar.gz -P /opt/apps/EMRHOOK/upgrade sudo tar -p -zxf /opt/apps/EMRHOOK/upgrade/emrhook.tar.gz -C /opt/apps/EMRHOOK/upgrade/ sudo cp -p /opt/apps/EMRHOOK/upgrade/emrhook/* /opt/apps/EMRHOOK/emrhook-current/

Step 2: Modify Hive configurations

Important

${hive-jar} varies depending on the Spark version. For Hive 2, set ${hive-jar} to hive-hook-hive23.jar. For Hive 3, set ${hive-jar} to hive-hook-hive31.jar.

Configuration file

Configuration item

Configuration value

hive-site.xml (/etc/taihao-apps/hive-conf/hive-site.xml)

hive.aux.jars.path

Append ,/opt/apps/EMRHOOK/emrhook-current/${hive-jar}.

Note

The separator is a comma (,).

hive.exec.post.hooks

Add com.aliyun.emr.meta.hive.hook.LineageLoggerHook.

hive-env.sh (/etc/taihao-apps/hive-conf/hive-env.sh)

hive_aux_jars_path

Append ,/opt/apps/EMRHOOK/emrhook-current/${hive-jar}.

Note

The separator is a comma (,).

Step 3: Modify Spark configurations

Important

${spark-jar} varies depending on the Spark version. For Spark 2, set ${spark-jar}to spark-hook-spark24.jar. For Spark 3, set ${spark-jar} to spark-hook-spark30.jar.

Configuration file

Configuration item

Configuration value

spark-defaults.conf (/etc/taihao-apps/spark-conf/spark-defaults.conf)

spark.driver.extraClassPath

Append :/opt/apps/EMRHOOK/emrhook-current/${spark-jar}. Note that the separator is a colon (:).

Note

The separator is a colon (:).

spark.executor.extraClassPath

Append :/opt/apps/EMRHOOK/emrhook-current/${spark-jar}. Note that the separator is a colon (:).

Note

The separator is a colon (:)

spark.sql.queryExecutionListeners

Add com.aliyun.emr.meta.spark.listener.EMRQueryLogger.

EMR-5.10.1 and versions earlier than EMR-3.44.1

Step 1: Upgrade the JAR package

  1. Log on to the gateway by using SSH and execute the following script. You must have root permissions. You must replace ${region} with the current region ID, such as cn-hangzhou.

    The following script is used to download and decompress the latest EMR-HOOK JAR package. After the EMR-HOOK JAR package is decompressed, proceed with subsequent steps to upgrade the decompressed JAR packages.

    sudo mkdir -p /opt/apps/EMRHOOK/upgrade/ sudo wget https://dlf-repo-${region}.oss-${region}-internal.aliyuncs.com/emrhook/latest/emrhook.tar.gz -P /opt/apps/EMRHOOK/upgrade sudo tar -p -zxf /opt/apps/EMRHOOK/upgrade/emrhook.tar.gz -C /opt/apps/EMRHOOK/upgrade/
  2. Rename the decompressed JAR packages. The minor version of the EMR-HOOK component varies based on EMR versions. Therefore, before you replace the original JAR packages, you must manually rename the JAR package to the current EMR-HOOK version before you perform the copy and replace operation.

    For example, in EMR-3.43.1, the minor version of the EMR-HOOK component is 1.1.4, and the naming convention of JAR packages is hive-hook-${version}-hive20.jar. In this case, you need to replace ${version} with the version of the current EMR-HOOK component.

    cd /opt/apps/EMRHOOK/upgrade/emrhook mv hive-hook-hive20.jar hive-hook-1.1.4-hive20.jar mv hive-hook-hive23.jar hive-hook-1.1.4-hive23.jar mv hive-hook-hive31.jar hive-hook-1.1.4-hive31.jar mv spark-hook-spark24.jar spark-hook-1.1.4-spark24.jar mv spark-hook-spark30.jar spark-hook-1.1.4-spark30.jar

    image

  3. After the modification, run the following command:

    sudo cp -p /opt/apps/EMRHOOK/upgrade/emrhook/* /opt/apps/EMRHOOK/emrhook-current/

Step 2: Modify Hive configurations

Important

${hive-jar} varies depending on the Spark version. For Hive 2, set ${hive-jar} to hive-hook-${emrhook-version}-hive23.jar. For Hive 3, set ${hive-jar} to hive-hook-${emrhook-version}-hive31.jar. Set ${emrhook-version} to the version of the EMR-HOOK component, such as hive-hook-1.1.4-hive23.jar.

Configuration file

Configuration item

Configuration value

hive-site.xml (/etc/taihao-apps/hive-conf/hive-site.xml)

hive.aux.jars.path

Append ,/opt/apps/EMRHOOK/emrhook-current/${hive-jar}.

Note

The separator is a comma (,).

hive.exec.post.hooks

Add com.aliyun.emr.meta.hive.hook.LineageLoggerHook.

hive-env.sh (/etc/taihao-apps/hive-conf/hive-env.sh)

hive_aux_jars_path

Append ,/opt/apps/EMRHOOK/emrhook-current/${hive-jar}.

Note

The separator is a comma (,).

Step 3: Modify Spark configurations

Important

${spark-jar} varies depending on the Spark version. For Spark 2, set ${spark-jar} to spark-hook-${emrhook-version}-spark24.jar. For Spark 3, set ${spark-jar} to spark-hook-${emrhook-version}-spark30.jar. Set ${emrhook-version} to the version of the EMR-HOOK component, such as spark-hook-1.1.4-spark24.jar.

Configuration file

Configuration item

Configuration value

spark-defaults.conf (/etc/taihao-apps/spark-conf/spark-defaults.conf)

spark.driver.extraClassPath

Append :/opt/apps/EMRHOOK/emrhook-current/${spark-jar}. Note that the separator is a colon (:).

Note

The separator is a colon (:).

spark.executor.extraClassPath

Append :/opt/apps/EMRHOOK/emrhook-current/${spark-jar}. Note that the separator is a colon (:).

Note

The separator is a colon (:)

spark.sql.queryExecutionListeners

Add com.aliyun.emr.meta.spark.listener.EMRQueryLogger.