View job details and resource usage

The Job O&M feature in MaxCompute lets you view historical and running jobs. This helps you understand job details, analyze resource loads during runtime, and perform O&M on jobs.

Features

The Job O&M feature in MaxCompute lets you view and manage historical and running jobs in the current project.

For data developers, the Job O&M feature helps you view job details, promptly detect exceptions and issues, and handle problematic jobs. For example, you can terminate a single job or multiple jobs in a batch.
For administrators, the Job O&M feature helps you view the resource load of a quota group at a specific time, efficiently allocate and manage system resources, and improve job execution efficiency and performance.

In the navigation pane on the left of the MaxCompute console, choose Workspace > Job O&M. On the Job O&M page, you can configure filter conditions to find specific jobs. This lets you view job details and analyze jobs. The following features are available:

Operations

Filter jobs

You can filter jobs based on parameters to find the jobs that you want to view. The following table describes the filter parameters.

Filter parameters

Parameter	Description
Time Range	Filters jobs by time range (start time and end time). This parameter is required. Note This is a global filter condition that affects the job statistics chart and the job list. The specified time range selects two sets of jobs: jobs that were completed within the time range, and jobs that are running at the end time or within the 3 minutes before the end time, for which job snapshot information is available. The end time is therefore also called the job snapshot observation time. The default range is the last hour. The maximum time range for a search is 7 days, and the minimum is 2 minutes. You can search for jobs from the last 45 days. You can select a preset time range or click the Time Range input box to configure a time range in the time selection panel: 1h: The last hour. 12h: The last 12 hours. 1d: The last day. Select a specific period: In the time selection panel, select the year, month, and day. Then, click Select Time to select a time period.
Choose Project	Filters jobs by MaxCompute project name. Note This is a global filter condition that affects the job statistics chart and the job list. You can select multiple MaxCompute projects at the same time. This parameter is empty by default.
Select Quota	Filters jobs by quota group. Note This is a global filter condition that affects the job statistics chart and the job list. You can select only subscription quota groups. This parameter is empty by default. Note You do not need to configure this parameter when you query pay-as-you-go jobs. For more information about quota groups, see Manage computing resources - Manage quotas.
Job Type	Filters jobs by job type. Note This is a global filter condition that affects the job statistics chart and the job list. Valid values: SQL: SQL jobs. SQLRT: Query-accelerated SQL jobs. SQLCost: SQL cost estimation jobs. LOT: MapReduce jobs. CUPID: Spark or Mars jobs. AlgoTask: Machine learning jobs. Graph: Graph computing jobs. MaxQA (MCQA2): MaxQA jobs. MaxFrame: MaxFrame jobs.
Instance ID	Filters jobs by the InstanceID generated by a MaxCompute job. You can enter a job's InstanceID to find the exact job. Note This is a secondary filter for the results in the job list. It only affects the job list. This parameter is empty by default. For more information about InstanceIDs, see View instance information.
Latest Status	Filters jobs by running status. Note This is a secondary filter for the results in the job list. It only affects the job list. Valid values: Running: The job is running. All unfinished jobs are in this state. Success: The job ran successfully. Failed: The job failed. Cancelled: The job was canceled. Submitted: The job is submitted and waiting for computing resources. By default, no status is selected, which means all statuses are included. Note This status is the overall status of the job. However, a job may have multiple concurrent tasks, and each task can have a different sub-status. To view details, go to the LogView page. For more information, see Use LogView 2.0 to view job information.
Job Owner	Filters jobs by the account that submitted the MaxCompute job. Note This is a secondary filter for the results in the job list. It only affects the job list. This parameter is empty by default. Fuzzy search is not supported. The format must be a complete account name, such as ALIYUN$xxx or RAM$xxx.
ExtNodeId	Filters jobs by the job ID from the source that ran the MaxCompute job. Note This is a secondary filter for the results in the job list. It only affects the job list. An example is a node ID from DataWorks. For more information about DataWorks node IDs, see Configure basic properties.
Signature	Filters by SQL job signature. Note This is a secondary filter for the results in the job list. It only affects the job list. This parameter is available only for SQL jobs. You can use this signature to find the instances for each execution of the same SQL statement. This parameter is empty by default.
Intelligent Diagnosis	Filters jobs by the labels from the intelligent diagnosis result. By default, no labels are selected. For more information about the meanings of the intelligent diagnosis result labels, see Job intelligent diagnosis.
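
The Time Range limits in the table above (a minimum of 2 minutes, a maximum of 7 days, and a lookback of 45 days) can be expressed as a small pre-check, for example before you build a query in your own tooling. The following is a minimal sketch and not the console's implementation. The function name `validate_time_range`, the use of `ValueError`, and the treatment of the limits as inclusive are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

MIN_RANGE = timedelta(minutes=2)    # minimum search range stated in the table
MAX_RANGE = timedelta(days=7)       # maximum search range stated in the table
MAX_LOOKBACK = timedelta(days=45)   # jobs are searchable for the last 45 days


def validate_time_range(start, end, now):
    """Hypothetical helper: return the range length or raise ValueError."""
    if end <= start:
        raise ValueError("end time must be later than start time")
    span = end - start
    if span < MIN_RANGE:
        raise ValueError("time range must be at least 2 minutes")
    if span > MAX_RANGE:
        raise ValueError("time range must not exceed 7 days")
    if start < now - MAX_LOOKBACK:
        raise ValueError("start time must fall within the last 45 days")
    return span


# Usage: the default "last hour" range passes the checks.
now = datetime(2024, 1, 31, 12, 0)
print(validate_time_range(now - timedelta(hours=1), now, now))  # 1:00:00
```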

Sort jobs
By default, the filtered job results are sorted by job end time in descending order. Unfinished jobs appear at the top. You can use basic single-column sorting or advanced multi-column sorting.
- Basic single-column sorting: You can sort columns that have a sort button in ascending or descending order.
- Advanced multi-column sorting: Click the Advanced Sort button in the upper-right corner of the list. You can add multiple column names by clicking Add Sort and specify the sort order (ascending or descending) for each column. Click OK to apply the multi-column sorting.
Note
When advanced sorting conditions are active, you cannot perform basic single-column sorting. You must click the Advanced Sort button in the upper-right corner of the list, click Reset, and then click OK before you can perform basic single-column sorting.
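
The two sorting modes can be illustrated on exported job records. The sketch below assumes a hypothetical list of dictionaries with `instance_id`, `priority`, and `end_time` keys. It mirrors the default order (end time descending, unfinished jobs first) and a multi-column sort, and it is not how the console itself is implemented.

```python
from datetime import datetime

# Hypothetical job records; end_time is None for a job that has not finished.
jobs = [
    {"instance_id": "job_a", "priority": 1, "end_time": datetime(2024, 1, 1, 10, 0)},
    {"instance_id": "job_b", "priority": 9, "end_time": None},
    {"instance_id": "job_c", "priority": 1, "end_time": datetime(2024, 1, 1, 11, 0)},
]


def default_sort(rows):
    """End time descending, with unfinished jobs at the top."""
    unfinished = [r for r in rows if r["end_time"] is None]
    finished = [r for r in rows if r["end_time"] is not None]
    finished.sort(key=lambda r: r["end_time"], reverse=True)
    return unfinished + finished


def advanced_sort(rows, sort_spec):
    """Multi-column sort; sort_spec is a list of (column, ascending) pairs.

    Columns that contain None values are not handled in this sketch.
    """
    result = list(rows)
    # Python's sort is stable, so sorting by the last column first makes the
    # first column the primary key and later columns the tie-breakers.
    for column, ascending in reversed(sort_spec):
        result.sort(key=lambda r: r[column], reverse=not ascending)
    return result


print([j["instance_id"] for j in default_sort(jobs)])
print([j["instance_id"] for j in advanced_sort(jobs, [("priority", True), ("instance_id", False)])])
```

Applying the sort keys in reverse order relies on sort stability, which keeps the code short without building composite keys for mixed ascending and descending columns.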
View job details
In the job list, click LogView in the Actions column of a specific job to go to the LogView page. On this page, you can view the status, details, and results of the job.
Terminate jobs
You can terminate a single job (Terminate) or terminate multiple jobs in a batch (Terminate Multiple Jobs In A Batch). This operation is available only for jobs whose Latest Status is Running.
Obtain job insights
You can perform an Insight operation on a single job to view its overview, resource consumption, and the resource allocation of its computing quota at a specific time. You can also trigger a job intelligent diagnosis.
Note
- Currently, intelligent diagnosis is supported only for SQL jobs.
- Job-level resource consumption data is not available for jobs with a runtime of less than 2 minutes or for job types other than SQL, MapReduce, Spark, or Mars.

Job statistics chart

Based on the filtered results, a stacked column chart of job counts is generated based on time and status. This chart helps you view the overall running status of jobs.

Job statistics chart details

The time interval that is represented by each column varies based on the Time Range setting:

If the Time Range is within 24 hours, the minimum time interval for each column is 2 minutes. The number of columns is adaptive, with a maximum of 24.
If the Time Range is greater than 24 hours and less than or equal to 48 hours, the time interval for each column is fixed at 2 hours. The number of columns is adaptive, with a maximum of 24.
If the Time Range is greater than 48 hours and less than or equal to 7 days, the time interval for each column is 6 hours. The number of columns is adaptive, with a maximum of 29.
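
The interval rules above can be sketched as a function that maps a Time Range to a column width. The 2-hour and 6-hour tiers come directly from the rules. For ranges within 24 hours the page states only a 2-minute minimum and a 24-column maximum, so the formula used for that tier (spread the range over 24 columns and round up to a whole minute) is an assumption, as is the function name `column_interval`.

```python
import math
from datetime import timedelta


def column_interval(time_range):
    """Hypothetical helper: width of one chart column for a given Time Range."""
    if time_range <= timedelta(hours=24):
        # Assumption: divide the range across at most 24 columns,
        # rounded up to whole minutes, and never below 2 minutes.
        minutes = math.ceil(time_range / timedelta(minutes=1) / 24)
        return timedelta(minutes=max(2, minutes))
    if time_range <= timedelta(hours=48):
        return timedelta(hours=2)   # fixed 2-hour columns
    if time_range <= timedelta(days=7):
        return timedelta(hours=6)   # 6-hour columns
    raise ValueError("time range exceeds the 7-day maximum")


print(column_interval(timedelta(hours=36)))  # 2:00:00
```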

The job statuses included in the statistics are:

Running: The job is in the Running state at the time of the snapshot.
Finished: The job has a status of Success, Failed, or Cancelled.

Note

Job snapshot data is collected every 3 minutes. Therefore, snapshot data for some jobs may not be collected. As a result, the snapshot status of a running job may be empty.

You can drag the mouse to select a range on the chart to shorten the time period.

Job list

The job list shows the filtered job results. It provides common job information to help you with job O&M. MaxCompute provides a Regular List and a Snapshot List to obtain job information for different scenarios:

Regular List: View all job information over a period of time.
Snapshot List: View snapshot information of jobs that are running at a specific time. This includes the snapshot status, along with CPU usage, memory usage, requested amounts, and usage ratios at the time of the snapshot.

Note

The following job information cannot be collected:

Job snapshot data is collected every 3 minutes. Therefore, snapshot data may be missing for some jobs, in particular jobs that started within the 3 minutes before the collection time.
Some MaxCompute jobs initiated through PAI cannot be collected, especially jobs initiated by RAM users.
Jobs in projects of the Developer Edition (to be discontinued) cannot be collected.

Because data is processed at a specific frequency, some jobs may be in the Running state in the job list but are already finished in LogView. This is common for jobs that run for a very short time. The latest status in LogView prevails.

Regular list

Parameters:

Column Name	Description
Instance ID	Each MaxCompute job generates an instance, and each instance has a corresponding Instance ID. The project, computing quota, and type information of the job are also displayed. Note You can click LogView in the Actions column of the instance to go to the LogView page and view the specific progress of the job. For more information about how to use LogView, see Use LogView 2.0 to view job information. You can also click Insight in the Actions column of the instance to go to the Job Insights page. On this page, you can view the diagnosis results, resource consumption, and similar job information. For more information, see Job insights.
Latest Status	The latest status of the job.
Job Owner	The Alibaba Cloud account used to run the MaxCompute job. You can find the job owner based on the account information. If a job uses too many resources and affects other jobs, you can contact the owner to stop the job. For more information about how to stop a job, see Instance operations.
Priority	Each job has a priority from 0 to 9. A smaller value indicates a higher priority. High-priority jobs get computing resources before low-priority jobs. For more information, see Job priority.
Submission Time	The submission time of the instance.
Start Time	The time when the job acquired its first computing resource. For jobs that run for a short time or do not consume computing resources, such as DDL statements, the job submission time is used instead. This column is hidden by default. You can click the custom list options to display it.
Wait Time	The duration from the job submission time to the start time. This column is hidden by default. You can click the custom list options to display it.
Running Time	The duration from the job start time to the end time. This column is hidden by default. You can click the custom list options to display it.
End Time	The end time of the instance run.
Total Running Time	The total duration from the job submission time to the end time.
Cumulative CPU Usage	The total CPU consumption during the entire job execution. Unit: `100 × Core × s`.
Cumulative Memory Usage	The total memory consumption during the entire job execution. Unit: `MB × s`.
Input Data Size	The amount of input data for the job's computation.
Intelligent Diagnosis	The labels generated from the results of the job's intelligent diagnosis.
ExtPlantFrom	The client that initiated the job. For example, DataWorks. The initiating client must actively pass this information when it starts the job.
ExtNodeId	The task ID corresponding to the job initiator. For example, the node ID in DataWorks. The initiating client must actively pass this information when it starts the job.
ExtNodeOnDuty	The account ID of the task owner corresponding to the job initiator. For example, the node owner in DataWorks. The initiating client must actively pass this information when it starts the job.
Signature	The SQL job signature. You can use this signature to find the instances for each execution of an SQL statement.
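
The time columns in the Regular List are derived from three timestamps, and the cumulative usage columns use compound units. The sketch below shows those relationships on hypothetical values. It assumes that a Cumulative CPU Usage value of 100 corresponds to one core used for one second, which is one reading of the unit `100 × Core × s`, and that 1 GB equals 1,024 MB. The function names are illustrative only.

```python
from datetime import datetime


def job_durations(submitted, started, ended):
    """Derive the three duration columns from the job timestamps."""
    return {
        "wait_time": started - submitted,          # submission to start
        "running_time": ended - started,           # start to end
        "total_running_time": ended - submitted,   # submission to end
    }


def cpu_core_hours(cumulative_cpu):
    """Convert a value in `100 x Core x s` to core-hours (assumed reading)."""
    return cumulative_cpu / 100 / 3600


def memory_gb_hours(cumulative_memory):
    """Convert a value in `MB x s` to GB-hours, assuming 1 GB = 1024 MB."""
    return cumulative_memory / 1024 / 3600


durations = job_durations(
    datetime(2024, 1, 1, 9, 0, 0),
    datetime(2024, 1, 1, 9, 0, 30),
    datetime(2024, 1, 1, 9, 10, 30),
)
print(durations["wait_time"], durations["running_time"], durations["total_running_time"])
print(cpu_core_hours(360000))  # 1.0
```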

Snapshot list

Parameters:

Column Name	Description
Instance ID	Each MaxCompute job generates an instance, and each instance has a corresponding Instance ID. The project, computing quota, and type information of the job are also displayed. Note You can click LogView in the Actions column of the instance to go to the LogView page and view the specific progress of the job. For more information about how to use LogView, see Use LogView 2.0 to view job information. You can also click Insight in the Actions column of the instance to go to the Job Insights page. On this page, you can view the diagnosis results, resource consumption, and similar job information. For more information, see Job insights.
Snapshot Time	The time when the job snapshot information was collected.
Snapshot Status	The status of the job at the time of snapshot collection.
Job Owner	The Alibaba Cloud account used to run the MaxCompute job. You can find the job owner based on the account information. If a job uses too many resources and affects other jobs, you can contact the owner to stop the job. For more information about how to stop a job, see Instance operations.
Priority	Each job has a priority from 0 to 9. A smaller value indicates a higher priority. High-priority jobs get computing resources before low-priority jobs. For more information, see Job priority.
CPU Usage	The CPU usage of the job at the snapshot time. Unit: Core.
CPU Requested	The requested CPU of the job at the snapshot time. Unit: Core.
CPU Fulfillment Rate	CPU Usage / CPU Requested at the snapshot time.
CPU Usage Snapshot	The CPU usage percentage of the job at the observation time (`CPU Usage / (Guaranteed Reserved CPU + Elastic Reserved CPU)`). This information is not available for pay-as-you-go jobs or jobs for which snapshot information cannot be collected.
Memory Usage	The memory usage of the job at the snapshot time. The unit is displayed adaptively.
Memory Requested	The requested memory of the job at the snapshot time. The unit is displayed adaptively.
Memory Fulfillment Rate	Memory Usage / Memory Requested at the snapshot time.
Memory Usage Snapshot	The memory usage percentage of the job at the observation time (`Memory Usage / (Guaranteed Reserved Memory + Elastic Reserved Memory)`). This information is not available for pay-as-you-go jobs or jobs for which snapshot information cannot be collected.
Submission Time	The submission time of the instance.
Total Running Time	The total duration from the job submission time to the snapshot time.
ExtPlantFrom	The client that initiated the job. For example, DataWorks. The initiating client must actively pass this information when it starts the job.
ExtNodeId	The task ID corresponding to the job initiator. For example, the node ID in DataWorks. The initiating client must actively pass this information when it starts the job.
ExtNodeOnDuty	The account ID of the task owner corresponding to the job initiator. For example, the node owner in DataWorks. The initiating client must actively pass this information when it starts the job.
Signature	The SQL job signature. You can use this signature to find the instances for each execution of an SQL statement.
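
The ratio columns in the Snapshot List follow directly from the formulas in the table. The sketch below computes them for hypothetical snapshot values. Returning `None` when the denominator is zero is an assumption made here to mirror the cases in which the console shows no value, such as pay-as-you-go jobs that have no reserved resources.

```python
def fulfillment_rate(usage, requested):
    """CPU or memory usage divided by the requested amount at snapshot time."""
    if requested == 0:
        return None  # assumption: nothing requested, so no meaningful rate
    return usage / requested


def usage_snapshot(usage, guaranteed_reserved, elastic_reserved):
    """Usage as a percentage of guaranteed plus elastic reserved resources."""
    reserved = guaranteed_reserved + elastic_reserved
    if reserved == 0:
        return None  # assumption: no reserved resources to compare against
    return usage * 100 / reserved


# A job using 30 cores of the 50 it requested, in a quota with
# 100 guaranteed reserved cores and 50 elastic reserved cores.
print(fulfillment_rate(30, 50))     # 0.6
print(usage_snapshot(30, 100, 50))  # 20.0
```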

Examples of common O&M scenarios

View details of a specific job

Scenario

You need to view the running status of a job initiated by a DataWorks hourly scheduling node or audit a specific MaxCompute job.

Procedure

Log on to the MaxCompute console. In the navigation pane on the left, choose Workspace > Job O&M.
Set the Time Range as required.
Click Search.
Above the job list, select the ExtNodeId or Instance ID parameter and enter the value for your job.
Click the icon to filter the job list again.
In the results list, you can click LogView in the Actions column of the target instance to go to the LogView page and view detailed job information. For more information, see Use LogView 2.0 to view job information.

View details of a job in a specific time range

Scenario

You need to view the jobs that ran in the last day for the Project_1 and Project_2 projects that you are responsible for, and analyze which jobs failed so that you can handle them.

Procedure

Log on to the MaxCompute console. In the navigation pane on the left, choose Workspace > Job O&M.
Set Time Range to 1d, or set a custom Time Range from 00:00:00 on the desired day to the current time.
From the Choose Project drop-down list, select Project_1 and Project_2.
In the results list, you can click LogView in the Actions column of the target instance to go to the LogView page and view detailed job information. For more information, see Use LogView 2.0 to view job information.

View resource usage of a subscription quota at a specific time

Scenario

When the resource usage of the Subscription Default Quota is high and many jobs are waiting, you need to identify which jobs are occupying the quota resources.

Procedure

Log on to the MaxCompute console. In the navigation pane on the left, choose Workspace > Job O&M.
Set Time Range to 1h or define a custom time range by setting the Start Time and entering the current time as the End Time.
Set the Select Quota parameter to Subscription Default Quota.
Click Search.
In the query result list, view the CPU Usage Snapshot and Memory Usage Snapshot values for jobs whose Latest Status is Running. Check whether the job with the highest percentage meets your business requirements, and use the other job information to decide whether the job is normal or needs to be terminated.
Note
For more information about a job, click LogView in the Actions column of the target instance to go to the LogView page and view detailed job information. For more information, see Use LogView 2.0 to view job information.

View details of a query acceleration job

Scenario

You need to view the running status and details of query acceleration jobs from the last day.

Procedure

Log on to the MaxCompute console. In the navigation pane on the left, choose Workspace > Job O&M.
Set Time Range to 1d and select SQLRT (Query Acceleration) for Job Type.
Click Search.
View the basic job information in the job list. For more information about a job, click LogView in the Actions column of the target instance to go to the LogView page and view detailed job information. For more information, see Use LogView 2.0 to view job information.
Note
For jobs that use the query acceleration feature, multiple SQL commands may run in the same session. One session corresponds to one Instance ID. You can use the LogView for that Instance ID to view the running status of all SQL statements in the session. Therefore, when you view query acceleration jobs on the Job O&M page, note the following:
- If the session has not exited, which means some SQL statements are finished but others are still running, the job's Latest Status is Running.
- If the session expires or is exited because the interface was closed, the job's Latest Status is Cancelled.

View job resource consumption and computing quota resource allocation at a specific time

Scenario

When a job runs for a long time without finishing and the cause is difficult to find in LogView, or when a finished job ran slower than expected, you need to analyze whether the issue is caused by insufficient resource supply.

Procedure

Log on to the MaxCompute console. In the navigation pane on the left, choose Workspace > Job O&M.
Select a Time Range and use the Select Quota parameter to filter. Click Search.
Click Insight in the Actions column of the target Instance ID to go to the Job Insights page.
On the Resource Consumption tab, view the resource consumption during the job's lifecycle.
- The resource consumption chart shows the change curves of used CUs and waiting CUs at the job level over time, along with the change of used CUs and waiting CUs at the quota level where the job ran. If you find that the job's CU usage is low but the quota-level CU usage is high or continuously at its limit, this means that the quota resources are insufficient and other jobs are competing for computing resources with the current job.
- You can click a time point on the horizontal axis of the resource consumption chart to view the resource allocation at the computing quota level for that moment. This shows the number and priority distribution of jobs to which Running and Waiting resources are allocated. You can click the color block that corresponds to a target priority to go to the job list and view the details of those jobs. This helps you identify which jobs are competing for computing resources with the current job. You can optimize task execution based on your business needs by adjusting job priorities or managing computing resources. For more information, see Job priority or Manage computing resources - Manage quotas.
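
The reasoning in the first item above (low job-level CU usage while the quota is at or near its limit) can be written as a simple check on values read from the resource consumption chart. This is a hypothetical heuristic and not a MaxCompute feature. The function name, the 90% saturation threshold, and the rule that a job counts as starved when it waits for more CUs than it uses are all assumptions.

```python
def likely_quota_contention(job_used_cu, job_waiting_cu,
                            quota_used_cu, quota_limit_cu,
                            saturation=0.9):
    """Hypothetical heuristic: is the job probably held back by a full quota?"""
    if quota_limit_cu <= 0:
        raise ValueError("quota_limit_cu must be positive")
    # Assumption: the quota counts as saturated at 90% usage or more.
    quota_saturated = quota_used_cu / quota_limit_cu >= saturation
    # Assumption: the job is starved when it waits for more CUs than it uses.
    job_starved = job_waiting_cu > 0 and job_used_cu < job_waiting_cu
    return quota_saturated and job_starved


# The job uses 10 CUs and waits for 90 while the quota uses 480 of 500 CUs.
print(likely_quota_contention(10, 90, 480, 500))  # True
```

A `True` result only suggests where to look next: the per-priority job distribution for that time point shows which jobs hold the resources.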

Next steps

If a job consistently has high resource usage and many other jobs are waiting, you can take the following measures:

If this job does not meet business requirements, you can terminate it.
If this job meets business requirements, the quota resource configuration is probably not appropriate for the workload, and you need to optimize the resource configuration plan. For more information, see Optimize computing resource configurations.

References

To view job information, check job statuses, and stop jobs using commands, see Instance operations.