Retrieval-augmented generation (RAG) technology enhances the ability of a large language model (LLM) to answer questions about private domains. This technology retrieves relevant information from an external knowledge base and combines it with user input before sending the combined data to the LLM. Elastic Algorithm Service (EAS) provides a scenario-based deployment method that lets you flexibly select an LLM and a vector database to quickly build and deploy a RAG chatbot. This topic describes how to deploy a RAG chatbot service and test its model inference capabilities.
Step 1: Deploy the RAG service
Log on to the PAI console. Select a region on the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
On the Elastic Algorithm Service (EAS) page, click Deploy Service. Under Scenario-based Model Deployment, click RAG-based Smart Dialogue Deployment.
On the RAG-based Smart Dialogue Deployment page, configure the parameters and click Deploy. The deployment is complete when the Service Status changes to Running. The deployment typically takes about 5 minutes, but the actual duration may vary depending on factors such as the number of model parameters. The following table describes the key parameters.
Basic Information
Parameter
Description
Version Selection
Two deployment versions are available:
LLM Integrated Deployment: Deploys the large language model (LLM) service and the retrieval-augmented generation (RAG) service in the same service.
LLM-Separated Deployment: Deploys only the RAG service. This option offers greater flexibility, letting you freely replace and connect LLM services.
Model Category
When you select LLM Integrated Deployment as the version, select the large language model (LLM) to deploy. Choose an open-source model based on your specific scenario.
Resource Information
Parameter
Description
Deploy Resources
For the LLM Integrated Deployment version, the system automatically selects suitable resource specifications based on the chosen model category. Changing to a different resource specification may prevent the model service from starting correctly.
For the LLM-Separated Deployment version, select an instance type with at least 8 vCPUs and 16 GB of memory. Recommended instance types include ecs.g6.2xlarge and ecs.g6.4xlarge.
Vector Database Settings
RAG supports using Faiss, Elasticsearch, Hologres, OpenSearch, or RDS PostgreSQL to build a vector database. Select one of these services as your vector database based on your requirements.
FAISS
Use Faiss to build a local vector database. This method is lightweight, user-friendly, and eliminates the need to purchase an online vector database product.
Parameter
Description
Version Type
Select FAISS.
OSS Address
Select an existing OSS folder in the current region to store the uploaded knowledge base files. If no folder is available, see Console Quick Start to create one.
Note: If you use a self-hosted fine-tuned model service, ensure that the selected OSS path is different from the model's path to prevent conflicts.
Elasticsearch
Configure the connection information for your Alibaba Cloud Elasticsearch instance. For more information about how to create an Elasticsearch instance and prepare the configuration items, see Prepare a vector database by using Elasticsearch.
Parameter
Description
Version Type
Select Elasticsearch.
Private Endpoint/Port
Configure the private endpoint and port of the Elasticsearch instance in the format http://<private_endpoint>:<private_port>. To obtain the private endpoint and port, see View instance basic information.
Index Name
Enter a new or existing index name. For an existing index, the index schema must meet PAI-RAG requirements. For example, you can use an index that was automatically created when you deployed the RAG service using EAS.
Account
The logon name configured when creating the Elasticsearch instance. The default is elastic.
Password
Enter the logon password that you set when you created the Elasticsearch instance. If you forget the logon password, reset the instance access password.
OSS Address
Select an existing OSS folder in the current region. This folder is mounted as a path for knowledge base management.
Hologres
Configure the connection information for your Hologres instance. If you have not activated a Hologres instance, see Purchase a Hologres instance.
Parameter
Description
Version Type
Select Hologres.
Call Information
The host of the specified VPC. Go to the instance details page in the Hologres console. In the Network Information section, click Copy next to Specified VPC to obtain the domain name. The host is the part of the domain name before :80.
Database Name
The name of the database for the Hologres instance. For more information, see Create a database.
Account
The custom account that you created. For more information, see Create a custom user. For Select Member Role, select Instance Super Administrator (SuperUser).
Password
The password for the existing custom user.
Table Name
Enter a new table name or an existing table name. The schema for an existing table must meet PAI-RAG requirements. For example, you can specify a Hologres table that was automatically created when you deployed a RAG service on EAS.
OSS Address
Select an existing OSS storage folder in the current region. This lets you manage the knowledge base by mounting an OSS path.
OpenSearch
Configure the connection information for your OpenSearch Vector Search Edition instance. For more information about how to create an OpenSearch instance and prepare the configuration items, see Prepare the OpenSearch vector database.
Parameter
Description
Version Type
Select OpenSearch.
Endpoint
The Internet endpoint for the OpenSearch Vector Search Edition instance. Enable Internet access for the instance. For more information, see Set up an OpenSearch vector search database.
Instance ID
Obtain the instance ID from the OpenSearch Vector Search Edition instance list.
Username/Password
The username and password that you specified when you created the OpenSearch Vector Search Edition instance.
Table Name
The name of the index table that you created when you prepared the OpenSearch Vector Search Edition instance. For more information, see Prepare the OpenSearch vector search database.
OSS Path
Select an existing OSS storage directory in the current region. Manage the knowledge base by attaching an OSS path.
ApsaraDB RDS for PostgreSQL
Configure the connection information for your ApsaraDB RDS for PostgreSQL instance. For more information about how to create an ApsaraDB RDS for PostgreSQL instance and prepare the configuration items, see Prepare the ApsaraDB RDS for PostgreSQL vector database.
Parameter
Description
Version Type
Select RDS PostgreSQL.
Host Address
Configure the internal endpoint of the ApsaraDB RDS for PostgreSQL instance. You can find it on the Database Connection page of your ApsaraDB RDS for PostgreSQL instance in the ApsaraDB RDS for PostgreSQL console.
Port
The default is 5432. Change the value as needed.
Database
Configure the name of the database you created. For more information about how to create a database and an account, see Create a database and an account. Note the following:
When creating an account, set Account Type to Privileged Account.
When creating a database, set Authorized Account to the privileged account you created.
Table Name
Customize the database table name.
Account/Password
The username and password of the privileged account that you created. For more information about how to create a privileged account, see Create a database and an account. Set Account Type to Privileged Account.
OSS Address
Select an existing OSS storage directory in the current region. Manage the knowledge base by attaching an OSS path.
VPC
Parameter
Description
Virtual Private Cloud (VPC)/VSwitch/Security Group Name
When deploying the RAG service, if you select LLM-Separated Deployment, make sure the RAG service can access the LLM service. The network requirements are as follows:
To access the LLM service over the Internet, configure a VPC with Internet access here. For more information, see Scenario 1: Enable an EAS service to access the Internet.
To access the LLM service using an internal endpoint, the RAG service and the LLM service must use the same VPC.
To use an Alibaba Cloud Model Studio model or use web search for Q&A, configure a VPC with Internet access. For more information, see Scenario 1: Enable an EAS service to access the Internet.
Network requirements for the vector database:
A Faiss vector database does not require network access.
EAS can access Hologres, Elasticsearch, or RDS PostgreSQL over the Internet or a private network. We recommend that you use private network access. For private network access, the virtual private cloud (VPC) that is configured in EAS must be the same as the VPC of the vector search database. To create a VPC, vSwitch, or security group, see Create and manage a VPC and Create a security group.
EAS can only access OpenSearch over the Internet. For configuration instructions, see Step 2: Prepare configuration items.
Step 2: Call the API
This section describes how to call the API for common RAG features. For information about other features, such as managing knowledge bases and updating RAG service configurations, see RAG API operations.
You can specify a knowledge base name in both the query and upload APIs to switch between knowledge bases. If you omit the knowledge base name parameter, the default knowledge base is used.
Obtain invocation information
On the Elastic Algorithm Service (EAS) page, click the RAG service name and then click View Endpoint Information in the Basic Information section.
Obtain the service endpoint and token from the Invocation Method dialog box.
Note: You can use a public endpoint or a VPC endpoint:
To use a public endpoint, the client must have Internet access.
To use a VPC endpoint, the client must be in the same VPC as the RAG service.
Upload knowledge base files
You can upload local knowledge base files using the API. You can query the status of the file upload task using the `task_id` returned by the upload interface.
In the following examples, replace <EAS_SERVICE_URL> with the RAG service endpoint and <EAS_TOKEN> with the RAG service token. For more information about how to obtain them, see Obtain invocation information.
Upload a single file (use a -F 'files=@<path>' parameter to specify the file)
```
# Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service endpoint, respectively.
# Replace <name> with the name of your knowledge base.
# Replace the path after "-F 'files=@" with the path to your knowledge base file.
curl -X 'POST' '<EAS_SERVICE_URL>/api/v1/knowledgebases/<name>/files' \
  -H 'Authorization: <EAS_TOKEN>' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@example_data/paul_graham/paul_graham_essay.txt'
```
Upload multiple files
Use multiple -F 'files=@<path>' parameters, one per file to upload, as shown in the following example:
```
# Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service endpoint, respectively.
# Replace <name> with the name of your knowledge base.
# Replace the paths after "-F 'files=@" with the paths to your knowledge base files.
curl -X 'POST' '<EAS_SERVICE_URL>/api/v1/knowledgebases/<name>/files' \
  -H 'Authorization: <EAS_TOKEN>' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@example_data/paul_graham/paul_graham_essay.txt' \
  -F 'files=@example_data/another_file1.md' \
  -F 'files=@example_data/another_file2.pdf'
```
Query the upload status
```
# Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service endpoint, respectively.
# Replace <name> with the name of your knowledge base. Replace <file_name> with the name of your file.
curl -X 'GET' '<EAS_SERVICE_URL>/api/v1/knowledgebases/<name>/files/<file_name>' \
  -H 'Authorization: <EAS_TOKEN>'
```
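For scripted workflows, you can issue the same upload and status calls from Python. The following is a minimal sketch that uses the requests library; the multipart field name files and the URL paths mirror the curl examples above, while the exact response schema (including the task_id field mentioned earlier) is not documented here, so inspect the actual responses before relying on specific fields.
```python
import requests

# Replace the placeholders as in the curl examples above.
EAS_SERVICE_URL = "<EAS_SERVICE_URL>"  # RAG service endpoint
EAS_TOKEN = "<EAS_TOKEN>"              # RAG service token
KB_NAME = "<name>"                     # knowledge base name

headers = {"Authorization": EAS_TOKEN}

# Upload a local file to the knowledge base.
file_path = "example_data/paul_graham/paul_graham_essay.txt"
with open(file_path, "rb") as f:
    upload_resp = requests.post(
        f"{EAS_SERVICE_URL}/api/v1/knowledgebases/{KB_NAME}/files",
        headers=headers,
        files={"files": f},
    )
upload_resp.raise_for_status()
print(upload_resp.json())  # the response includes a task_id for the upload task

# Query the upload status by file name.
status_resp = requests.get(
    f"{EAS_SERVICE_URL}/api/v1/knowledgebases/{KB_NAME}/files/paul_graham_essay.txt",
    headers=headers,
)
status_resp.raise_for_status()
print(status_resp.json())
```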
Send a chat request
The service provides an OpenAI-compatible API for chat requests. Before you call the service, complete the corresponding configurations on the WebUI page of the RAG service based on your business requirements. A minimal curl sketch follows the feature list below, and feature-specific Python examples come after it.
Supported features
Web search: You need to configure web search parameters on the WebUI page of the RAG service in advance.
Knowledge base query: You need to upload knowledge base files in advance.
LLM chat: Use large language model (LLM) services to provide answers. You need to configure LLM services in advance.
Agent chat: You need to complete agent-related code configurations on the WebUI page of the RAG service in advance.
Database or table query: You need to complete data analysis-related configurations on the WebUI page of the RAG service in advance.
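Because the service exposes an OpenAI-compatible API, you can send a quick test request with curl before using the Python examples below. This is a minimal sketch, assuming the standard /v1/chat/completions path and the default model name used in the examples that follow. The Authorization header here carries the raw token, matching the upload examples; if your deployment expects the Bearer <token> form (which the OpenAI SDK sends), adjust accordingly.
```
curl -X POST '<EAS_SERVICE_URL>/v1/chat/completions' \
  -H 'Authorization: <EAS_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{"model": "default", "messages": [{"role": "user", "content": "Hello"}]}'
```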
Request examples
Web search
```python
from openai import OpenAI

##### API configuration #####
# Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service endpoint, respectively.
openai_api_key = "<EAS_TOKEN>"
openai_api_base = "<EAS_SERVICE_URL>/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

#### Chat ######
def chat():
    stream = True
    chat_completion = client.chat.completions.create(
        model="default",
        stream=stream,
        messages=[
            {"role": "user", "content": "Hello"},
            {"role": "assistant", "content": "Hello, how can I help you?"},
            {"role": "user", "content": "What is the capital of Zhejiang province?"},
            {"role": "assistant", "content": "Hangzhou is the capital of Zhejiang province."},
            {"role": "user", "content": "What are some interesting places to visit?"},
        ],
        extra_body={
            "search_web": True,
        },
    )
    if stream:
        for chunk in chat_completion:
            print(chunk.choices[0].delta.content, end="")
    else:
        result = chat_completion.choices[0].message.content
        print(result)

chat()
```
Database query
```python
from openai import OpenAI

##### API configuration #####
# Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service endpoint, respectively.
openai_api_key = "<EAS_TOKEN>"
openai_api_base = "<EAS_SERVICE_URL>/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

#### Chat ######
def chat():
    stream = True
    chat_completion = client.chat.completions.create(
        model="default",
        stream=stream,
        messages=[
            {"role": "user", "content": "How many cats are there?"},
            {"role": "assistant", "content": "There are 2 cats"},
            {"role": "user", "content": "What about dogs?"},
        ],
        extra_body={
            "chat_db": True,
        },
    )
    if stream:
        for chunk in chat_completion:
            print(chunk.choices[0].delta.content, end="")
    else:
        result = chat_completion.choices[0].message.content
        print(result)

chat()
```
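The two examples above enable web search and database query through extra_body flags. For a knowledge base query, a reasonable sketch, assuming that a request without extra_body flags is answered from the uploaded knowledge base (per the feature list above), looks like the following:
```python
from openai import OpenAI

# Replace <EAS_TOKEN> and <EAS_SERVICE_URL> with the service token and the service endpoint.
client = OpenAI(
    api_key="<EAS_TOKEN>",
    base_url="<EAS_SERVICE_URL>/v1",
)

# No extra_body flags: the assumption is that the service answers from
# the uploaded knowledge base by default.
chat_completion = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "What did the author work on?"}],
)
print(chat_completion.choices[0].message.content)
```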
Notes
Because of server resource limits and the default token limit of the LLM service, this tutorial supports only conversations of limited length. The purpose of this tutorial is to help you experience the basic retrieval features of the RAG chatbot.
FAQ
How am I charged?
Billing
When you deploy a RAG-based LLM chatbot, you are charged only for EAS resources. If you use other products, such as Alibaba Cloud Model Studio, vector databases (such as Elasticsearch, Hologres, OpenSearch, or RDS PostgreSQL), Object Storage Service (OSS), an Internet NAT gateway, or web search services (such as Bing), you are billed for each product separately according to its billing rules.
Stop billing
After you stop the EAS service, only the billing for EAS resources stops. To stop billing for other products, refer to the documentation for those products and follow the instructions to stop or delete the relevant instances.
Are knowledge base documents uploaded through the API stored permanently?
Knowledge base files uploaded by using the RAG service API are not necessarily stored permanently. Their storage duration depends on the configuration of the underlying storage, such as Object Storage Service (OSS), or of the selected vector database, such as Elasticsearch or Hologres. We recommend that you consult the relevant documentation to understand the storage policies and ensure long-term data preservation.
Why are the parameters I set through the API not taking effect?
Currently, the PAI-RAG service lets you configure parameters using the API only for those listed in the API Reference document. You must configure all other parameters using the WebUI.
References
EAS also supports other scenario-based deployments.
EAS provides stress testing methods for LLM services and general services. This feature helps you easily create stress testing tasks and perform one-click stress tests to fully understand the performance of your EAS services. For more information, see Automatic stress testing.