After you train a model, you can use Elastic Algorithm Service (EAS) to quickly deploy it as an online inference service or an AI-powered web application. EAS supports heterogeneous resources and provides features such as automatic scaling, one-click stress testing, phased release, and real-time monitoring. These features help ensure service stability and business continuity in high-concurrency scenarios while keeping costs low.
Billing
When you deploy services using EAS, you may be billed for computing resources, system disks, and dedicated gateways:
Computing resources: Includes public resources, dedicated resources, and Lingjun resources.
System disk (Optional): A free quota is available. This quota includes 30 GB for public resources and 200 GB for dedicated resources. You are billed separately for any additional system disk usage.
Dedicated gateway (Optional): Deployments use a free shared gateway by default. If you require features such as isolation, Resource Access Management, or a custom domain name, you can purchase a dedicated gateway. You must manually configure the dedicated gateway.
EAS supports two billing methods:
Pay-as-you-go: Billing is based on the service runtime, not the number of service invocations. This method is suitable for scenarios with uncertain or fluctuating demand.
Subscription: You pay in advance to receive a lower price. This method is suitable for long-term, stable business needs.
For specific scenarios, such as Use Stable Diffusion web UI to deploy an AI painting service and Use ComfyUI to deploy an AI video generation model service, EAS provides a Serverless version. Service deployment is free, and you are billed only for the actual inference duration when the service is invoked.
If you use other Alibaba Cloud services, such as Elastic IP, OSS, or NAS, you will incur charges for those services.
For more information, see Billing of Elastic Algorithm Service (EAS).
Usage workflow
Step 1: Preparations
Prepare inference resources
You can select a suitable EAS resource type based on your model size, concurrency requirements, and budget. If you use dedicated resources or Lingjun resources, you must purchase them before use. For more information about resource selection and purchase, see Overview of EAS deployment resources.
Prepare model and code files
Prepare the trained model, the processing code, and other dependencies, and then upload the files to a supported cloud storage service, such as OSS. You must also configure storage settings so that the deployment can access the required data.
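The following is a minimal sketch of the upload step, assuming the oss2 Python SDK; the credentials, endpoint, bucket name, and paths are placeholders that you should replace with your own values.

```python
# Minimal upload sketch using the oss2 SDK (pip install oss2).
# All credentials, endpoints, and paths below are placeholders.
import oss2

auth = oss2.Auth('<your_access_key_id>', '<your_access_key_secret>')
bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', '<your_bucket>')

# Upload the trained model and its processing code under one prefix,
# which the service deployment will later reference.
bucket.put_object_from_file('models/demo/model.pth', './model.pth')
bucket.put_object_from_file('models/demo/processor.py', './processor.py')
```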
Step 2: Deploy a service
Deployment tools: You can deploy and manage services using the console, the EASCMD command line, or an SDK.
Console: The console provides custom deployment and scenario-based deployment methods. It is easy to use and suitable for beginners.
EASCMD command line: This tool supports operations such as creating, updating, and viewing services. It is suitable for algorithm engineers who are familiar with EAS deployment.
SDK: The SDK is suitable for large-scale, unified scheduling and O&M.
Deployment methods: EAS supports image-based deployment (recommended) and processor-based deployment. For more information about the differences between these methods, see Deployment principles.
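As a reference, the following is a minimal sketch of a service description for a processor-based deployment, assuming the EAS JSON configuration format used by EASCMD; the service name, processor type, model path, and resource values are placeholders, and the full schema is described in the EAS documentation.

```python
# Minimal sketch of an EAS service description for a processor-based
# deployment. All values are placeholders; see the EAS documentation
# for the full configuration schema.
import json

service_desc = {
    "name": "demo_service",
    "processor": "pmml",  # a built-in processor type
    "model_path": "oss://<your_bucket>/models/demo/model.pmml",
    "metadata": {
        "instance": 1,    # number of service instances
        "cpu": 1,
        "memory": 2000,   # in MB
    },
}

with open("service.json", "w") as f:
    json.dump(service_desc, f, indent=2)

# The description can then be deployed with EASCMD, for example:
#   eascmd create service.json
```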
Step 3: Invoke and stress test the service
If you deploy the model as a WebUI application, you can open the interactive page in a browser from the console to directly experience the capabilities of the model.
If you deploy the model as an API service:
You can use online service debugging to send HTTP requests and verify that inference works as expected.
You can call the service synchronously or asynchronously through an API. EAS supports multiple invocation methods, such as shared gateways, dedicated gateways, and direct connections (see the invocation sketch after this list).
You can use the built-in stress testing tool in EAS to perform one-click stress tests on the deployed service. This helps you evaluate how the service performs under load and understand its inference capacity. For more information, see Automatic stress testing.
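The following is a minimal synchronous invocation sketch over plain HTTP; the endpoint URL and token are placeholders that you can copy from the service's invocation information in the console, and the request body format depends on your model.

```python
# Minimal synchronous invocation sketch. The endpoint and token are
# placeholders; copy the real values from the service details page.
import requests

url = 'http://<your_endpoint>.<region>.pai-eas.aliyuncs.com/api/predict/<service_name>'
token = '<your_service_token>'

resp = requests.post(
    url,
    headers={'Authorization': token},
    data='[{"feature_a": 1.0, "feature_b": 2.0}]',  # body format depends on your model
)
resp.raise_for_status()
print(resp.text)
```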
Step 4: Monitor and scale the service
After the service is running, you can enable service monitoring and alerts. This helps you stay informed about resource usage, performance metrics, and potential anomalies to ensure that the service runs smoothly.
You can also enable horizontal or scheduled auto scaling to dynamically adjust the computing resources of your online service. For more information, see Service Auto Scaling.
Step 5: Asynchronous inference service
For time-consuming requests, such as text-to-image generation or video processing, you can use asynchronous inference. In this mode, a queue service receives requests. After a request is processed, the results are written to an output queue. The client then asynchronously retrieves the results. This method prevents request backlogs and data loss, and improves system throughput. EAS supports automatic scaling based on the queue backlog to intelligently adjust the number of instances. For more information, see Asynchronous inference services.
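As an illustration, the following sketch submits a request to the input queue and watches the output queue, assuming the QueueClient interface from the eas-prediction Python SDK; the endpoint, queue names, and token are placeholders, and method signatures may differ across SDK versions.

```python
# Async inference sketch, assuming the eas-prediction SDK's QueueClient.
# Endpoint, queue names, and token are placeholders.
from eas_prediction import QueueClient

endpoint = '<your_endpoint>.<region>.pai-eas.aliyuncs.com'
input_queue = QueueClient(endpoint, '<service_name>')        # input queue
sink_queue = QueueClient(endpoint, '<service_name>/sink')    # output queue
for q in (input_queue, sink_queue):
    q.set_token('<your_service_token>')
    q.init()

# Submit a request; processing happens asynchronously on the service side.
index, request_id = input_queue.put('[{"prompt": "a cat"}]')

# Watch the output queue and consume results as they are produced.
watcher = sink_queue.watch(0, 5, auto_commit=True)
for result in watcher.run():
    print(result.data)
    break  # stop after the first result in this sketch
watcher.close()
```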
Step 6: Update the service
In the inference service list, find the service that you want to update, and then click Update in the Actions column.
The service is temporarily interrupted during the update. This may cause requests that depend on this service to fail. Proceed with caution.
After the update is complete, you can click the current version number to view the Version Information or switch the service version.
FAQ
If you encounter problems during service deployment or invocation, see FAQ about EAS for solutions.
References
For more EAS use cases, see EAS use case summary.