The Job O&M feature in MaxCompute lets you view historical and running jobs. This helps you understand job details, analyze resource loads during runtime, and perform O&M on jobs.
Features
The Job O&M feature in MaxCompute lets you view and manage historical and running jobs in the current project.
For data developers, the Job O&M feature helps you view job details, promptly detect exceptions and issues, and handle problematic jobs. For example, you can terminate a single job or multiple jobs in a batch.
For administrators, the Job O&M feature helps you view the resource load of a quota group at a specific time, efficiently allocate and manage system resources, and improve job execution efficiency and performance.
In the navigation pane on the left of the MaxCompute console, choose Workspace > Job O&M. On the Job O&M page, you can configure filter conditions to find specific jobs. This lets you view job details and analyze jobs. The following features are available:
Operations
Filter jobs
You can filter jobs based on parameters to find the jobs that you want to view. The following table describes the filter parameters.
Sort jobs
By default, the filtered job results are sorted by job end time in descending order. Unfinished jobs appear at the top. You can use basic single-column sorting or advanced multi-column sorting.
Basic single-column sorting: You can sort columns that have a sort button in ascending or descending order.
Advanced multi-column sorting: Click the Advanced Sort button in the upper-right corner of the list. You can add multiple column names by clicking Add Sort and specify the sort order (ascending or descending) for each column. Click OK to apply the multi-column sorting.
Note: When advanced sorting conditions are active, you cannot perform basic single-column sorting. To re-enable it, click the Advanced Sort button in the upper-right corner of the list, click Reset, and then click OK.
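The ordering rules above can be sketched in Python. This is an illustrative model only, not console or SDK code: the record fields (`instance_id`, `priority`, `end_time`) and both function names are hypothetical, and an unfinished job is assumed to be represented by `end_time = None`.

```python
from datetime import datetime

# Hypothetical job records; the field names are illustrative, not a MaxCompute API.
jobs = [
    {"instance_id": "a", "priority": 1, "end_time": datetime(2024, 1, 1, 10, 0)},
    {"instance_id": "b", "priority": 9, "end_time": None},  # still running
    {"instance_id": "c", "priority": 1, "end_time": datetime(2024, 1, 1, 12, 0)},
]


def default_order(jobs):
    """Default view: end time descending, unfinished jobs (end_time is None) first."""
    return sorted(
        jobs,
        key=lambda j: (j["end_time"] is None, j["end_time"] or datetime.min),
        reverse=True,
    )


def advanced_order(jobs, columns):
    """Multi-column sort. columns is a list of (field, descending) pairs,
    highest precedence first. Every field used must hold comparable values."""
    result = list(jobs)
    # Python's sort is stable, so sorting from the last column to the first
    # yields the combined multi-column order.
    for field, descending in reversed(columns):
        result.sort(key=lambda j: j[field], reverse=descending)
    return result


print([j["instance_id"] for j in default_order(jobs)])
print([j["instance_id"] for j in advanced_order(jobs, [("priority", False), ("instance_id", True)])])
```

The second call sorts by priority ascending and breaks ties by instance ID descending, which mirrors adding two columns in the Advanced Sort dialog.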
View job details
In the job list, click LogView in the Actions column of a specific job to go to the LogView page. On this page, you can view the status, details, and results of the job.
Terminate jobs
You can Terminate a single job or Terminate Multiple Jobs In A Batch. This operation is available for jobs in the Running state, which is displayed in the Latest Status column.
Obtain job insights
You can perform an Insight operation on a single job to view its overview, its resource consumption, and the resource allocation of its computing quota at a specific time. You can also trigger intelligent diagnosis for the job.
Note: Currently, intelligent diagnosis is supported only for SQL jobs.
Job-level resource consumption data is not available for jobs with a runtime of less than 2 minutes or for job types other than SQL, MapReduce, Spark, or Mars.
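The availability rule for job-level resource consumption data can be written as a small predicate. This is a sketch under assumptions: the function name is hypothetical, and the job type strings are taken from the sentence above rather than from any API.

```python
from datetime import timedelta

# Job types for which job-level resource consumption data is collected,
# as listed in the text. The string labels are assumptions for illustration.
SUPPORTED_TYPES = {"SQL", "MapReduce", "Spark", "Mars"}
MIN_RUNTIME = timedelta(minutes=2)


def has_resource_consumption_data(job_type: str, runtime: timedelta) -> bool:
    """Return True if the job is expected to have job-level resource data."""
    return job_type in SUPPORTED_TYPES and runtime >= MIN_RUNTIME


print(has_resource_consumption_data("SQL", timedelta(minutes=5)))
print(has_resource_consumption_data("SQL", timedelta(seconds=90)))
```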
Job statistics chart
From the filtered results, the console generates a stacked column chart of job counts grouped by time and status. This chart helps you view the overall running status of jobs.
Job list
The job list shows the filtered job results. It provides common job information to help you with job O&M. MaxCompute provides a Regular List and a Snapshot List to obtain job information for different scenarios:
Regular List: View all job information over a period of time.
Snapshot List: View snapshot information of jobs that are running at a specific time. This includes the snapshot status, along with CPU usage, memory usage, requested amounts, and usage ratios at the time of the snapshot.
The following job information cannot be collected:
Job snapshot data is collected every 3 minutes. Therefore, snapshot data for some jobs may not be collected. This applies to jobs started within 3 minutes before the collection time.
Some MaxCompute jobs initiated through PAI cannot be collected, especially jobs initiated by RAM users.
Jobs in projects of the Developer Edition (to be discontinued) cannot be collected.
Because data is processed at a specific frequency, some jobs may be in the Running state in the job list but are already finished in LogView. This is common for jobs that run for a very short time. The latest status in LogView prevails.
Regular list
Parameters:
Column Name | Description |
Instance ID | Each MaxCompute job generates an instance, and each instance has a corresponding Instance ID. The project, computing quota, and type information of the job are also displayed. |
Latest Status | The latest status of the job. |
Job Owner | The Alibaba Cloud account used to run the MaxCompute job. You can find the job owner based on the account information. If a job uses too many resources and affects other jobs, you can contact the owner to stop the job. For more information about how to stop a job, see Instance operations. |
Priority | Each job has a priority from 0 to 9. A smaller value indicates a higher priority. High-priority jobs get computing resources before low-priority jobs. For more information, see Job priority. |
Submission Time | The submission time of the instance. |
Start Time | The time when the job acquired its first computing resource. For jobs that run for a short time or do not consume computing resources, such as DDL statements, the job submission time is used instead. This column is hidden by default. You can click the custom list options to display it. |
Wait Time | The duration from the job submission time to the start time. This column is hidden by default. You can click the custom list options to display it. |
Running Time | The duration from the job start time to the end time. This column is hidden by default. You can click the custom list options to display it. |
End Time | The end time of the instance run. |
Total Running Time | The total duration from the job submission time to the end time. |
Cumulative CPU Usage | The total CPU consumption during the entire job execution. Unit: |
Cumulative Memory Usage | The total memory consumption during the entire job execution. Unit: |
Input Data Size | The amount of input data for the job's computation. |
Intelligent Diagnosis | The labels generated from the results of the job's intelligent diagnosis. |
ExtPlantFrom | The client that initiated the job. For example, DataWorks. The initiating client must actively pass this information when it starts the job. |
ExtNodeId | The task ID corresponding to the job initiator. For example, the node ID in DataWorks. The initiating client must actively pass this information when it starts the job. |
ExtNodeOnDuty | The account ID of the task owner corresponding to the job initiator. For example, the node owner in DataWorks. The initiating client must actively pass this information when it starts the job. |
Signature | The SQL job signature. You can use this signature to find the instances for each execution of an SQL statement. |
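The relationship between the time columns in the table above can be checked with a short calculation. The sketch below is illustrative: `job_durations` is a hypothetical helper, and passing `started=None` is an assumed way to model jobs for which the submission time is used as the start time.

```python
from datetime import datetime, timedelta
from typing import Optional


def job_durations(submitted: datetime, started: Optional[datetime], ended: datetime) -> dict:
    """Derive Wait Time, Running Time, and Total Running Time for a finished job."""
    # For short jobs or jobs that consume no computing resources (such as DDL
    # statements), the submission time is used as the start time.
    effective_start = started or submitted
    return {
        "wait_time": effective_start - submitted,      # submission -> start
        "running_time": ended - effective_start,       # start -> end
        "total_running_time": ended - submitted,       # submission -> end
    }


d = job_durations(
    submitted=datetime(2024, 1, 1, 10, 0, 0),
    started=datetime(2024, 1, 1, 10, 0, 30),
    ended=datetime(2024, 1, 1, 10, 5, 30),
)
print(d["wait_time"], d["running_time"], d["total_running_time"])
```

By construction, Wait Time plus Running Time equals Total Running Time.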
Snapshot list
Parameters:
Column Name | Description |
Instance ID | Each MaxCompute job generates an instance, and each instance has a corresponding Instance ID. The project, computing quota, and type information of the job are also displayed. Note: You can click LogView in the Actions column of the instance to go to the LogView page and view the specific progress of the job. For more information about how to use LogView, see Use LogView 2.0 to view job information. You can also click Insight in the Actions column of the instance to go to the Job Insights page. On this page, you can view the diagnosis results, resource consumption, and similar job information. For more information, see Job insights. |
Snapshot Time | The time when the job snapshot information was collected. |
Snapshot Status | The status of the job at the time of snapshot collection. |
Job Owner | The Alibaba Cloud account used to run the MaxCompute job. You can find the job owner based on the account information. If a job uses too many resources and affects other jobs, you can contact the owner to stop the job. For more information about how to stop a job, see Instance operations. |
Priority | Each job has a priority from 0 to 9. A smaller value indicates a higher priority. High-priority jobs get computing resources before low-priority jobs. For more information, see Job priority. |
CPU Usage | The CPU usage of the job at the snapshot time. Unit: Core. |
CPU Requested | The requested CPU of the job at the snapshot time. Unit: Core. |
CPU Fulfillment Rate | CPU Usage / CPU Requested at the snapshot time. |
CPU Usage Snapshot | The CPU usage percentage of the job at the observation time ( |
Memory Usage | The memory usage of the job at the snapshot time. The unit is displayed adaptively. |
Memory Requested | The requested memory of the job at the snapshot time. The unit is displayed adaptively. |
Memory Fulfillment Rate | Memory Usage / Memory Requested at the snapshot time. |
Memory Usage Snapshot | The memory usage percentage of the job at the observation time ( |
Submission Time | The submission time of the instance. |
Total Running Time | The total duration from the job submission time to the snapshot time. |
ExtPlantFrom | The client that initiated the job. For example, DataWorks. The initiating client must actively pass this information when it starts the job. |
ExtNodeId | The task ID corresponding to the job initiator. For example, the node ID in DataWorks. The initiating client must actively pass this information when it starts the job. |
ExtNodeOnDuty | The account ID of the task owner corresponding to the job initiator. For example, the node owner in DataWorks. The initiating client must actively pass this information when it starts the job. |
Signature | The SQL job signature. You can use this signature to find the instances for each execution of an SQL statement. |
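The two fulfillment rate columns in the table above are plain ratios of usage to request at the snapshot time. A minimal sketch follows. The function name is hypothetical, and returning `None` when nothing was requested is an assumption, because the table does not say how the console displays that case.

```python
from typing import Optional


def fulfillment_rate(usage: float, requested: float) -> Optional[float]:
    """Usage divided by the requested amount at the snapshot time.

    Works for both CPU (cores) and memory, as long as usage and requested
    are expressed in the same unit.
    """
    if requested <= 0:
        # Assumption: treat "nothing requested" as undefined rather than dividing by zero.
        return None
    return usage / requested


# A job that requested 40 cores and holds 30 of them has a rate of 0.75.
print(fulfillment_rate(30.0, 40.0))
```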
Examples of common O&M scenarios
View details of a specific job
Scenario
You need to view the running status of a job initiated by a DataWorks hourly scheduling node or audit a specific MaxCompute job.
Procedure
Log on to the MaxCompute console. In the navigation pane on the left, choose Workspace > Job O&M.
Set the Time Range as required.
Click Search.
Above the job list, select the ExtNodeId or Instance ID parameter and enter the value for your job.
Click the icon to filter the job list again.
In the results list, you can click LogView in the Actions column of the target instance to go to the LogView page and view detailed job information. For more information, see Use LogView 2.0 to view job information.
View details of a job in a specific time range
Scenario
You are responsible for the Project_1 and Project_2 projects. You need to view the jobs that ran in these projects in the last day and identify which jobs failed so that you can handle them.
Procedure
Log on to the MaxCompute console. In the navigation pane on the left, choose Workspace > Job O&M.
Set Time Range to 1d, or set a custom Time Range from 00:00:00 on the desired day to the current time.
From the Choose Project drop-down list, select Project_1 and Project_2.
In the results list, you can click LogView in the Actions column of the target instance to go to the LogView page and view detailed job information. For more information, see Use LogView 2.0 to view job information.
View resource usage of a subscription quota at a specific time
Scenario
When the resource usage of the Subscription Default Quota is high and many jobs are waiting, you need to identify which jobs are occupying the quota resources.
Procedure
Log on to the MaxCompute console. In the navigation pane on the left, choose Workspace > Job O&M.
Set Time Range to 1h, or define a custom time range by setting the Start Time and entering the current time as the End Time.
Set the Select Quota parameter to Subscription Default Quota.
Click Search.
In the query result list, you can view the CPU Utilization Percentage Snapshot and Memory Usage Percentage Snapshot for jobs whose Latest Status is Running. You can check whether the job with the highest percentage meets your business requirements and use other job information to help you decide whether the job is normal or needs to be terminated.
Note: For more information about a job, click LogView in the Actions column of the target instance to go to the LogView page and view detailed job information. For more information, see Use LogView 2.0 to view job information.
View details of a query acceleration job
Scenario
You need to view the running status and details of query acceleration jobs from the last day.
Procedure
Log on to the MaxCompute console. In the navigation pane on the left, choose Workspace > Job O&M.
Set Time Range to 1d and select SQLRT (Query Acceleration) for Job Type.
Click Search.
View the basic job information in the job list. For more information about a job, click LogView in the Actions column of the target instance to go to the LogView page and view detailed job information. For more information, see Use LogView 2.0 to view job information.
Note: For jobs that use the query acceleration feature, multiple SQL commands may run in the same session. One session corresponds to one Instance ID. You can use the LogView for that Instance ID to view the running status of all SQL statements in the session. Therefore, when you view query acceleration jobs on the Job O&M page, note the following:
If the session has not exited, which means some SQL statements are finished but others are still running, the job's Latest Status is Running.
If the session expires or is exited because the interface was closed, the job's Latest Status is Cancelled.
View job resource consumption and computing quota resource allocation at a specific time
Scenario
When a job runs for a long time without finishing and the cause is difficult to find in LogView, or when a finished job ran slower than expected, you need to analyze whether the issue is caused by insufficient resource supply.
Procedure
Log on to the MaxCompute console. In the navigation pane on the left, choose Workspace > Job O&M.
Select a Time Range and use the Select Quota parameter to filter. Click Search.
Click Insight in the Actions column of the target Instance ID to go to the Job Insights page.
On the Resource Consumption tab, view the resource consumption during the job's lifecycle.
The resource consumption chart shows the change curves of used CUs and waiting CUs at the job level over time, along with the change of used CUs and waiting CUs at the quota level where the job ran. If you find that the job's CU usage is low but the quota-level CU usage is high or continuously at its limit, this means that the quota resources are insufficient and other jobs are competing for computing resources with the current job.
You can click a time point on the horizontal axis of the resource consumption chart to view the resource allocation at the computing quota level for that moment. This shows the number and priority distribution of jobs to which Running and Waiting resources are allocated. You can click the color block that corresponds to a target priority to go to the job list and view the details of those jobs. This helps you identify which jobs are competing for computing resources with the current job. You can optimize task execution based on your business needs by adjusting job priorities or managing computing resources. For more information, see Job priority or Manage computing resources - Manage quotas.
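The reading of the resource consumption chart described above can be expressed as a rough heuristic for a single time point. Everything in this sketch is an assumption made for illustration: the function name, the 95% saturation threshold, and the definition of "low" job usage as fewer used CUs than waiting CUs. The console does not expose such a function.

```python
def quota_contention_suspected(
    job_used_cu: float,
    job_waiting_cu: float,
    quota_used_cu: float,
    quota_limit_cu: float,
    saturation: float = 0.95,  # assumed threshold for "at or near its limit"
) -> bool:
    """Return True if the job looks starved while its quota is saturated."""
    # Quota-level usage is high or at its limit.
    quota_saturated = quota_used_cu >= saturation * quota_limit_cu
    # Job-level usage is low relative to what the job is still waiting for.
    job_starved = job_waiting_cu > 0 and job_used_cu < job_waiting_cu
    return quota_saturated and job_starved


# The job holds 2 CUs and waits for 20 while the quota uses 98 of 100 CUs.
print(quota_contention_suspected(2, 20, 98, 100))
```

A True result only suggests that other jobs in the quota are competing for resources. Confirm it by checking the priority distribution for that time point as described above.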
Next steps
If a job consistently has high resource usage and many other jobs are waiting, you can take the following measures:
If this job does not meet business requirements, you can terminate it.
If this job meets business requirements, the quota resource configuration is likely inadequate for the workload, and you need to optimize the resource configuration plan. For more information, see Optimize computing resource configurations.
References
To view job information, check job statuses, and stop jobs using commands, see Instance operations.