Quick Start for Model Gallery - Platform For AI - Alibaba Cloud Documentation Center

Model Gallery simplifies PAI-DLC and PAI-EAS, allowing you to deploy and train open-source large language models (LLMs) without writing any code. This topic uses the Qwen3-0.6B model as an example to demonstrate how to use Model Gallery. The same process applies to other models.

Prerequisites

Use your Alibaba Cloud main account to activate PAI and create a workspace. Log on to the PAI console, select a region in the upper-left corner, and then activate the product.

Billing description

This example uses public resources to create a DLC job and an EAS service. The billing method is pay-as-you-go. For more information about billing, see DLC billing and EAS billing.

Model deployment

Deploy the model

Log on to the PAI console. In the navigation pane on the left, click Model Gallery. Search for Qwen3-0.6B and click Deploy.
Configure the deployment parameters. You can use the default parameters on the deployment configuration page. Click Deploy > OK. The deployment takes approximately 5 minutes. The deployment is successful when the status changes to Running.
By default, public resources are used for deployment, and the billing method is pay-as-you-go.

Invoke the model

On the service details page, click View Invocation Information to obtain the Endpoint and Token.
To view the deployment task details later, in the navigation pane on the left, click Model Gallery > Service details > Call information.
You can test the model service using the following common invocation methods:
Online debugging
Switch to the Online Debugging page. In the content field of the request, enter a question, such as Hello, who are you?. Then, click Send Request. The response from the LLM is displayed on the right.
Use the Cherry Studio client
Cherry Studio is a widely used chat client for large language models with integrated MCP support.
Connect to the Qwen3 model deployed on PAI
1. Install the client
  Visit Cherry Studio to download and install the client.
  You can also visit https://github.com/CherryHQ/cherry-studio/releases to download it.
2. Add a provider
  1. Click the Settings button in the upper-right corner. In the Model Service section, click Add.
  2. For Provider Name, enter a custom name, such as Platform for AI. For Provider Type, select OpenAI.
  3. Click OK.
3. In the API Key field, enter the token that you obtained. In the API Host field, enter the endpoint that you obtained.
4. Click Add. In the Model ID field, enter Qwen3-0.6B (case-sensitive) to add the model.
5. You can click Test next to the API Key field to check the connectivity.
6. Click the Home tab to return to the chat page. At the top of the window, switch to the Qwen3-0.6B model that you added to start chatting.
Use the Python SDK
```
import os

from openai import OpenAI

# Set the Token as an environment variable to prevent sensitive information leaks.
# For more information about how to configure environment variables, see https://help.aliyun.com/zh/sdk/developer-reference/configure-the-alibaba-cloud-accesskey-environment-variable-on-linux-macos-and-windows-systems
token = os.environ.get("Token")

# The endpoint ends with /v1. Do not remove it.
client = OpenAI(
    api_key=token,
    base_url='Your Endpoint/v1',
)

query = 'Hello, who are you?'
messages = [{'role': 'user', 'content': query}]

resp = client.chat.completions.create(
    model='Qwen3-0.6B', messages=messages, max_tokens=512, temperature=0)
query = messages[0]['content']
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
```
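The SDK example notes that the base URL must end with /v1. The sketch below shows one way to guard against a missing or doubled suffix when the endpoint is copied from the console. The helper name build_base_url and the example endpoint are hypothetical; they are not part of the OpenAI SDK or PAI.

```python
def build_base_url(endpoint: str) -> str:
    """Return the endpoint with exactly one trailing /v1 suffix."""
    cleaned = endpoint.strip().rstrip("/")
    if cleaned.endswith("/v1"):
        return cleaned
    return f"{cleaned}/v1"


if __name__ == "__main__":
    # Placeholder endpoint used only for illustration.
    print(build_base_url("http://example.invalid/api/predict/my_service/"))
```

The returned value can be passed as base_url when the OpenAI client is created.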

Important

This example uses public resources to create the model service, which is billed on a pay-as-you-go basis. To avoid incurring further charges, please stop or delete the service when you no longer need it.

Model fine-tuning

If you want the model to perform better in a specific domain, you can fine-tune it on a dataset from that domain. This section uses an example scenario to describe the purpose and steps of model fine-tuning.

Scenario example

In the logistics industry, it is often necessary to extract structured information, such as recipient, address, and phone number, from natural language. Using a large-parameter model, such as Qwen3-235B-A22B, yields good results but is costly and slow. To balance performance and cost, you can first use a large-parameter model to annotate data, and then use this data to fine-tune a small-parameter model, such as Qwen3-0.6B, to achieve similar performance on the same task. This process is also known as model distillation.

For the same structured information extraction task, the accuracy of the original Qwen3-0.6B model is 14%, while the accuracy after fine-tuning can reach over 90%.

Example recipient address information:

Seefeldstrasse 45, 3. Obergeschoss, Apartment 32, Kreis 6, Zürich, Switzerland Contact #: +81 88357171 For Lukas Meier

Example of extracted structured information:

{
  "country": "Switzerland",
  "state_province": null,
  "city": "Zürich",
  "district": "Kreis 6",
  "specific_location": "Seefeldstrasse 45, 3. Obergeschoss, Apartment 32",
  "name": "Lukas Meier",
  "phone": "+81 88357171"
}
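A structured record like the one above is only useful downstream if it parses as JSON and carries the expected keys. The following minimal sketch checks both conditions. The names EXPECTED_KEYS and parse_extraction are hypothetical and are introduced here only for illustration.

```python
import json
from typing import Optional

# The seven keys used in the extraction example of this topic.
EXPECTED_KEYS = {
    "country", "state_province", "city", "district",
    "specific_location", "name", "phone",
}


def parse_extraction(response_text: str) -> Optional[dict]:
    """Return the parsed record, or None if the text is not a JSON object
    that contains exactly the expected keys."""
    try:
        record = json.loads(response_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or set(record) != EXPECTED_KEYS:
        return None
    return record


if __name__ == "__main__":
    sample = (
        '{"country": "Switzerland", "state_province": null, "city": "Zürich", '
        '"district": "Kreis 6", '
        '"specific_location": "Seefeldstrasse 45, 3. Obergeschoss, Apartment 32", '
        '"name": "Lukas Meier", "phone": "+81 88357171"}'
    )
    print(parse_extraction(sample))
```

A response that fails this check can be counted as an extraction error before any field values are compared.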

Prepare the data

To distill the knowledge of the teacher model (Qwen3-235B-A22B) for this task into Qwen3-0.6B, you first need to use the teacher model's API to extract recipient address information into structured JSON-formatted data. Generating this JSON-formatted data can take a long time. Therefore, this topic provides a sample training dataset train_qwen3.json and a validation set eval_qwen3.json that you can download and use directly.

In model distillation, large-parameter models are also called teacher models. The data used in this topic is generated by a large language model and does not involve sensitive user information.

Recommendations for Data Collection in Production Applications

If the model is to be applied to real-world business scenarios in the future, we recommend preparing your data using the following methods:

Real Business Scenarios (Recommended):

Genuine business data better reflects actual use cases, allowing the fine-tuned model to more effectively adapt to your specific business needs.

After obtaining the business data, you will need to programmatically convert your data into a JSON file in the following format:

[
  {
    "instruction": "You are a professional information extraction assistant specialized in extracting recipient JSON information from English text. The keys include: country (country name), state_province (state/province), city (city name), district (district/county/area), specific_location (street, number, apartment, building details), name (recipient name), phone (contact phone). Input text:Seefeldstrasse 45, 3. Obergeschoss, Apartment 32, Kreis 6, Zürich, Switzerland / Contact #: +81 88357171 / For Lukas Meier",
    "output": "{\"country\": \"Switzerland\", \"state_province\": null, \"city\": \"Zürich\", \"district\": \"Kreis 6\", \"specific_location\": \"Seefeldstrasse 45, 3. Obergeschoss, Apartment 32\", \"name\": \"Lukas Meier\", \"phone\": \"+81 88357171\"}"
  },
  {
    "instruction": "You are a professional information extraction assistant specialized in extracting recipient JSON information from English text. The keys include: country (country name), state_province (state/province), city (city name), district (district/county/area), specific_location (street, number, apartment, building details), name (recipient name), phone (contact phone). Input text:Carrer de Valencia 247, 3r 2n, 08007, Eixample, Barcelona, Spain / Ph: +47 60676201 / Recipient Carmen Ruiz Vázquez",
    "output": "{\"country\": \"Spain\", \"state_province\": \"\", \"city\": \"Barcelona\", \"district\": \"Eixample\", \"specific_location\": \"Carrer de Valencia 247, 3r 2n\", \"name\": \"Carmen Ruiz Vázquez\", \"phone\": \"+47 60676201\"}"
  }
]

The JSON file contains multiple training examples. Each example includes two fields, instruction and output:

instruction: Includes prompts that guide the behavior of the LLM, as well as the input data.
output: The expected standard answer, typically generated by human experts or other large language models (e.g., qwen3-235b-a22b).
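The conversion into the instruction/output format described above can be sketched as follows. This is a minimal illustration that assumes each business record is already available as a pair of raw text and a labeled dictionary. The names PROMPT_PREFIX, to_sft_example, and write_sft_file are hypothetical; the prompt text is copied from the sample data in this topic.

```python
import json
from typing import Dict, Iterable, List, Tuple

# Prompt text taken from the sample training data shown in this topic.
PROMPT_PREFIX = (
    "You are a professional information extraction assistant specialized in "
    "extracting recipient JSON information from English text. The keys include: "
    "country (country name), state_province (state/province), city (city name), "
    "district (district/county/area), specific_location (street, number, "
    "apartment, building details), name (recipient name), phone (contact phone). "
    "Input text:"
)


def to_sft_example(raw_text: str, label: Dict[str, object]) -> Dict[str, str]:
    """Build one training example: prompt plus input text, and the label as a JSON string."""
    return {
        "instruction": PROMPT_PREFIX + raw_text,
        "output": json.dumps(label, ensure_ascii=False),
    }


def write_sft_file(pairs: Iterable[Tuple[str, Dict[str, object]]], path: str) -> List[Dict[str, str]]:
    """Convert (raw_text, label) pairs and write them as a JSON array."""
    examples = [to_sft_example(text, label) for text, label in pairs]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(examples, f, ensure_ascii=False, indent=2)
    return examples
```

Because output is stored as a string, the label is serialized with json.dumps first, which is why the quotes inside output appear escaped in the sample file.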

LLM Generation

When business data is insufficiently rich, you can consider using an LLM for data augmentation, which helps improve data diversity and coverage.

To avoid leaking user privacy, this solution uses an LLM to generate a set of synthetic address data. The following sample code is provided for reference:

Sample Code for Generating a Synthetic Dataset

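This sample code calls the LLM service of Alibaba Cloud Bailian, so you need to obtain a Bailian API key. The code reads the key from the DASHSCOPE_API_KEY environment variable and the service base URL from the OPENAI_ENDPOINT environment variable.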

In this example, the qwen-plus-latest model is used to generate business data, and the qwen3-235b-a22b model is used for labeling.

# -*- coding: utf-8 -*-
import os
import asyncio
import random
import json
import sys
from typing import List, Dict
from openai import AsyncOpenAI
import platform


# Create async client instance
# Try multiple API providers for better reliability
def create_client():
    dashscope_key = os.getenv("DASHSCOPE_API_KEY")
    if dashscope_key:
        return AsyncOpenAI(
            api_key=dashscope_key,
            base_url=os.getenv("OPENAI_ENDPOINT")
        )
    # If no API key available, return None
    return None


client = create_client()

# Backup data for when API calls fail
BACKUP_NAMES = [
    # American names
    "Michael Johnson", "Sarah Williams", "David Rodriguez", "Emily Chen", "James Wilson",
    "Maria Garcia", "Robert Taylor", "Lisa Anderson", "Christopher Brown", "Jessica Davis",
    "Daniel Miller", "Ashley Martinez", "Matthew Jones", "Amanda Thompson", "Joshua Garcia",
    "Stephanie Rodriguez", "Andrew Wilson", "Jennifer Martinez", "Ryan Anderson", "Michelle Thomas",
    # International names
    "Mohammed Al-Rashid", "Priya Patel", "Ahmed Hassan", "Anna Kowalski", "Giovanni Rossi",
    "Marie Dubois", "Hans Mueller", "Yuki Tanaka", "Carlos Silva", "Emma Thompson",
    "Klaus Weber", "Isabella Romano", "Pierre Martin", "Sofia Andersson", "Raj Gupta",
    "Fatima Al-Zahra", "Olaf Larsen", "Camila Santos", "Dimitri Petrov", "Aisha Ibrahim"
]

BACKUP_STREETS = [
    "Oak Street", "Maple Avenue", "Pine Road", "Cedar Lane", "Elm Drive", "Main Street",
    "Park Avenue", "Washington Street", "Lincoln Road", "Jefferson Drive", "Madison Lane",
    "First Street", "Second Avenue", "Third Street", "Broadway", "Market Street",
    "Church Street", "School Road", "Mill Lane", "River Road", "Hill Street", "Lake Drive",
    "Forest Avenue", "Sunset Boulevard", "Spring Street", "Summer Lane", "Winter Road",
    "Garden Street", "Valley Road", "Mountain View", "Ocean Drive", "Bay Street", "Harbor Lane"
]

BACKUP_CITIES_US = {
    "California": ["Los Angeles", "San Francisco", "San Diego", "Sacramento", "Oakland"],
    "New York": ["New York City", "Buffalo", "Rochester", "Syracuse", "Albany"],
    "Texas": ["Houston", "Dallas", "Austin", "San Antonio", "Fort Worth"],
    "Florida": ["Miami", "Tampa", "Orlando", "Jacksonville", "Tallahassee"],
    "Illinois": ["Chicago", "Springfield", "Rockford", "Peoria", "Aurora"]
}

BACKUP_INTERNATIONAL = {
    "United Kingdom": {"cities": ["London", "Manchester", "Birmingham", "Liverpool"],
                       "areas": ["Westminster", "Camden", "Kensington", "Greenwich"]},
    "Germany": {"cities": ["Berlin", "Munich", "Hamburg", "Frankfurt"],
                "areas": ["Mitte", "Charlottenburg", "Schöneberg", "Kreuzberg"]},
    "France": {"cities": ["Paris", "Lyon", "Marseille", "Toulouse"],
               "areas": ["1st Arrondissement", "Montmartre", "Latin Quarter", "Marais"]},
    "Italy": {"cities": ["Rome", "Milan", "Naples", "Turin"],
              "areas": ["Centro Storico", "Trastevere", "Vatican City", "Brera"]},
    "Canada": {"cities": ["Toronto", "Vancouver", "Montreal", "Calgary"],
               "areas": ["Downtown", "Midtown", "Old Town", "Financial District"]}
}

# US states list
states = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut",
    "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa",
    "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan",
    "Minnesota", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire",
    "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio",
    "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington", "West Virginia",
    "Wisconsin", "Wyoming"
]

# International countries for diversity
countries = [
    "United States", "Canada", "United Kingdom", "Germany", "France", "Italy", "Spain",
    "Australia", "New Zealand", "Japan", "South Korea", "India", "Brazil", "Mexico",
    "Netherlands", "Sweden", "Norway", "Denmark", "Switzerland", "Austria"
]

# Recipient format templates
recipient_templates = [
    "Recipient: {name}", "To: {name}", "Name: {name}", "Deliver to: {name}", "For: {name}",
    "{name}", "Contact: {name}", "Addressee: {name}", "Ship to: {name}", "Customer: {name}",
    "Recipient {name}", "To {name}", "Deliver to {name}", "For {name}", "Attn: {name}",
    "ATTN {name}", "C/O {name}", "Care of {name}", "Send to: {name}", "Mail to: {name}"
]

# Phone number format templates
phone_templates = [
    "Tel: {phone}", "Phone: {phone}", "Mobile: {phone}", "Cell: {phone}", "Contact: {phone}",
    "Ph: {phone}", "Call: {phone}", "{phone}", "Telephone: {phone}", "Mob: {phone}",
    "Phone number: {phone}", "Tel #{phone}", "Contact #: {phone}", "Mobile #: {phone}",
    "Cell phone: {phone}", "Office: {phone}", "Home: {phone}", "Work: {phone}",
    "Primary: {phone}", "Main: {phone}"
]


# Generate US phone numbers
def generate_us_mobile():
    """Generate realistic US mobile phone numbers"""
    area_codes = ['415', '650', '408', '510', '925', '707', '209', '559', '661', '805',
                  '818', '323', '213', '310', '424', '562', '714', '949', '760', '858',
                  '619', '951', '909', '626', '202', '301', '240', '410', '443', '667',
                  '717', '412', '484', '610', '215', '267', '570', '610', '302', '914',
                  '516', '631', '917', '212', '347', '646', '718', '929']
    area_code = random.choice(area_codes)
    exchange = random.randint(200, 999)
    number = random.randint(1000, 9999)
    # Random format selection
    formats = [f"{area_code}-{exchange}-{number}",
               f"({area_code}) {exchange}-{number}",
               f"{area_code}.{exchange}.{number}",
               f"{area_code}{exchange}{number}"]
    return random.choice(formats)


# Generate international phone numbers
def generate_international_phone():
    """Generate international phone numbers"""
    country_codes = ['+1', '+44', '+49', '+33', '+39', '+34', '+61', '+64', '+81', '+82',
                     '+91', '+55', '+52', '+31', '+46', '+47', '+45', '+41', '+43']
    country_code = random.choice(country_codes)
    if country_code == '+1':  # North America
        return generate_us_mobile()
    elif country_code == '+44':  # UK
        return f"{country_code} {random.randint(1000, 9999)} {random.randint(100000, 999999)}"
    else:  # Other countries
        digits = ''.join([str(random.randint(0, 9)) for _ in range(random.randint(8, 10))])
        return f"{country_code} {digits}"


# Generate realistic backup data without API calls
def generate_backup_address(location: str, is_international: bool = False):
    """Generate realistic address data without API calls"""
    name = random.choice(BACKUP_NAMES)
    street = random.choice(BACKUP_STREETS)
    street_number = random.randint(1, 9999)
    if is_international:
        if location in BACKUP_INTERNATIONAL:
            city = random.choice(BACKUP_INTERNATIONAL[location]["cities"])
            district = random.choice(BACKUP_INTERNATIONAL[location]["areas"])
        else:
            city = f"{location.split()[0]} City"
            district = "Downtown District"
    else:
        # US address
        if location in BACKUP_CITIES_US:
            city = random.choice(BACKUP_CITIES_US[location])
        else:
            city = f"{location.split()[0]} City"
        districts = ["County", "Downtown", "Uptown", "Midtown", "East Side", "West Side",
                     "North End", "South District"]
        district = f"{random.choice(districts)}"
    # Add apartment/suite sometimes
    apartment_types = ["Apt", "Suite", "Unit", "#", "Room"]
    if random.random() < 0.3:  # 30% chance of apartment
        apt_num = random.randint(1, 999)
        specific_location = f"{street_number} {street}, {random.choice(apartment_types)} {apt_num}"
    else:
        specific_location = f"{street_number} {street}"
    # The source text is garbled from this return statement through the first guard of the
    # next function. Both are reconstructed (an assumption) from how the callers use them:
    # the same (info, is_international) tuple that the LLM path returns.
    return {
        "name": name,
        "city": city,
        "district": district,
        "specific_location": specific_location
    }, is_international


# Generate recipient and address info using LLM
async def generate_recipient_and_address_by_llm(location: str, is_international: bool = False):
    # Assumed guard: fall back to local data when no API client is configured
    if client is None:
        return generate_backup_address(location, is_international)
    if is_international:
        prompt = f"""Generate recipient information for {location}, including:
1. A realistic name from that region/culture (diversify names, avoid repetition)
2. A real city name in {location}
3. A district/region/postal area within that city
4. A specific street address (street name + number, building/apartment, etc.)

Return only valid JSON format:
{{"name": "recipient name", "city": "city name", "district": "district/area", "specific_location": "detailed address"}}

Make names culturally appropriate and diverse. No explanations, only JSON."""
    else:
        prompt = f"""Generate recipient information for {location}, USA, including:
1. A realistic American name (diverse ethnic backgrounds, avoid repetition)
2. A real city name in {location} state
3. A county or district within that area
4. A specific street address (street name + number, apt/suite if appropriate)

Return only valid JSON format:
{{"name": "recipient name", "city": "city name", "district": "county/district", "specific_location": "detailed address"}}

Diversify names ethnically. No explanations, only JSON."""
    try:
        response = await client.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            model="qwen-plus-latest",
            temperature=1.3,  # High temperature for diversity
            max_tokens=400,  # Limit tokens for JSON only
        )
        result = response.choices[0].message.content.strip()
        # Clean result more aggressively
        if "```json" in result:
            result = result.split("```json")[1].split("```")[0].strip()
        elif "```" in result:
            result = result.split("```")[1].split("```")[0].strip()
        # Remove any non-JSON text before/after
        start_idx = result.find('{')
        end_idx = result.rfind('}')
        if start_idx != -1 and end_idx != -1:
            result = result[start_idx:end_idx+1]
        # Parse JSON
        info = json.loads(result)
        # Validate required fields
        required_fields = ['name', 'city', 'district', 'specific_location']
        if all(field in info and info[field].strip() for field in required_fields):
            print(f"Successfully generated data for {location}: {info['name']}")
            return info, is_international
        else:
            raise ValueError("Missing required fields in response")
    except json.JSONDecodeError as e:
        print(f"JSON decode error for {location}: {e}")
        print(f"Raw response: {result[:100]}...")
        return generate_backup_address(location, is_international)
    except Exception as e:
        print(f"API call failed for {location}: {e}")
        return generate_backup_address(location, is_international)


# Generate a single record
async def generate_record():
    # 30% chance for international address, 70% US
    is_international = random.random() < 0.3
    if is_international:
        # Exclude US from international list
        intl_countries = [c for c in countries if c != "United States"]
        location = random.choice(intl_countries)
    else:
        location = random.choice(states)
    # Generate recipient and address info using LLM
    info, is_intl = await generate_recipient_and_address_by_llm(location, is_international)
    # Generate recipient info format
    recipient = random.choice(recipient_templates).format(name=info['name'])
    # Generate phone number (60% international format for international addresses, otherwise US format)
    if is_intl and random.random() < 0.6:
        phone = generate_international_phone()
    else:
        phone = generate_us_mobile()
    phone_info = random.choice(phone_templates).format(phone=phone)
    # Assemble address
    if is_intl:
        full_address = f"{info['specific_location']}, {info['district']}, {info['city']}, {location}"
    else:
        # US format
        zip_code = ''.join([str(random.randint(0, 9)) for _ in range(5)])
        full_address = f"{info['specific_location']}, {info['city']}, {location} {zip_code}"
    # Assemble data
    components = [recipient, phone_info, full_address]
    # Randomly shuffle order
    random.shuffle(components)
    # Random separator selection
    separators = [' | ', ', ', '; ', ' - ', '\t', ' ', ' / ', ' \\ ', '\n', ' :: ', ' -- ']
    separator = random.choice(separators)
    # Combine data
    combined_data = separator.join(components)
    return combined_data


# Generate batch data
async def generate_batch_data(count: int) -> List[str]:
    """Generate specified amount of data"""
    print(f"Starting to generate {count} records...")
    data = []
    # Use semaphore to control concurrency
    semaphore = asyncio.Semaphore(15)

    async def generate_single_record(index):
        async with semaphore:
            record = await generate_record()
            print(f"Generated record {index+1}: {record}")
            return record

    # Concurrent generation
    tasks = [generate_single_record(i) for i in range(count)]
    data = await asyncio.gather(*tasks)
    return data


# Save data to file
def save_data(data: List[str], filename: str = "recipient_data.json"):
    """Save data to JSON file"""
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    print(f"Data saved to {filename}")


# Data generation phase
async def produce_data_phase():
    print("=== Phase 1: Starting recipient data generation ===")
    # Generate 2000 records
    batch_size = 2000
    data = await generate_batch_data(batch_size)
    # Save data
    save_data(data)
    print(f"\nGenerated total of {len(data)} records")
    print("\nSample data:")
    for i, record in enumerate(data[:3]):  # Show first 3 as examples
        print(f"{i+1}. Raw data: {record}")
        print()
    print("=== Phase 1 Complete ===\n")
    return True


def get_system_prompt():
    """Return system prompt"""
    return """You are a professional information extraction assistant specialized in extracting structured recipient information from English text.

## Task Description
Please extract and generate JSON format output containing the following seven fields based on the given input text:
- country: Country name (complete official name, like "United States", "United Kingdom", "Germany", etc.)
- state_province: State/Province/Region name (for US: state like "California", for others: province/region)
- city: City name
- district: District/County/Area name (like "Los Angeles County", "Downtown", etc.)
- specific_location: Specific address (street, number, apartment, building details)
- name: Recipient full name
- phone: Contact phone number (complete phone number with area code)

## Extraction Rules
1. **Address Information Processing**:
   - Must accurately identify country, state/province, city hierarchy
   - Use complete official country names
   - For international addresses, adapt state_province field to local administrative divisions
   - specific_location should include detailed street address, building names, apartment numbers, etc.
2. **Name Recognition**:
   - Extract complete names including international names
   - Handle various cultural naming conventions
3. **Phone Number Processing**:
   - Extract complete phone numbers, maintain original format
   - Handle international formats with country codes

## Output Format
Please strictly follow this JSON format, do not add any explanatory text:
{
  "country": "country name",
  "state_province": "state/province name",
  "city": "city name",
  "district": "district/area name",
  "specific_location": "detailed address",
  "name": "recipient name",
  "phone": "contact phone"
}"""


# The code below calls predict_structured_data_backup, but its definition is not shown in
# this topic. This minimal stand-in is an assumption: it returns a record with empty fields
# so that the pipeline keeps running when the LLM call fails.
def predict_structured_data_backup(raw_data: str) -> Dict[str, str]:
    keys = ["country", "state_province", "city", "district",
            "specific_location", "name", "phone"]
    return {key: "" for key in keys}


# Use LLM to predict structured data (improved version)
async def predict_structured_data(raw_data: str):
    """Use LLM to predict structured data with better error handling"""
    if client is None:
        print("No API client available, using backup parser")
        return predict_structured_data_backup(raw_data)
    system_prompt = """Extract recipient information from the text and return ONLY valid JSON with these exact keys:
- country: Country name
- state_province: State/Province name
- city: City name
- district: District/County/Area
- specific_location: Street address
- name: Recipient name
- phone: Phone number

Return only the JSON object, no other text."""
    try:
        # Try different models based on API provider
        model = "qwen-plus-latest"
        extra_params = {}
        response = await client.chat.completions.create(
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": raw_data}
            ],
            model=model,
            temperature=0.1,
            max_tokens=300,
            **extra_params
        )
        result = response.choices[0].message.content.strip()
        # Clean result
        if "```json" in result:
            result = result.split("```json")[1].split("```")[0].strip()
        elif "```" in result:
            result = result.split("```")[1].split("```")[0].strip()
        # Extract JSON portion
        start_idx = result.find('{')
        end_idx = result.rfind('}')
        if start_idx != -1 and end_idx != -1:
            result = result[start_idx:end_idx+1]
        structured_data = json.loads(result)
        # Validate structure
        required_keys = ["country", "state_province", "city", "district",
                         "specific_location", "name", "phone"]
        for key in required_keys:
            if key not in structured_data:
                structured_data[key] = ""
        return structured_data
    except json.JSONDecodeError as e:
        print(f"JSON decode error in prediction: {e}")
        print(f"Raw response: {result[:100]}...")
        return predict_structured_data_backup(raw_data)
    except Exception as e:
        print(f"LLM prediction failed: {e}")
        return predict_structured_data_backup(raw_data)


# Data conversion phase
async def convert_data_phase():
    """Convert data format and use LLM to predict structured data"""
    print("=== Phase 2: Starting data format conversion ===")
    try:
        print("Reading recipient_data.json file...")
        # Read raw data
        with open('recipient_data.json', 'r', encoding='utf-8') as f:
            raw_data_list = json.load(f)
        print(f"Successfully read data, total {len(raw_data_list)} records")
        print("Starting to use qwen3-235b-a22b to predict structured data...")
        system_prompt = (
            "You are a professional information extraction assistant specialized in "
            "extracting recipient JSON information from English text. The keys include: "
            "country (country name), state_province (state/province), city (city name), "
            "district (district/county/area), specific_location (street, number, "
            "apartment, building details), name (recipient name), phone (contact phone). "
            "Input text:"
        )
        output_file = 'recipient_sft_data.json'
        # Use semaphore to control concurrency
        semaphore = asyncio.Semaphore(10)

        async def process_single_item(index, raw_data):
            async with semaphore:
                # Use LLM to predict structured data with fallback
                structured_data = await predict_structured_data(raw_data)
                print(f"Processing record {index+1}: {raw_data}")
                conversation = {
                    "instruction": system_prompt + raw_data,
                    "output": json.dumps(structured_data, ensure_ascii=False)
                }
                return conversation

        print(f"Starting data conversion to {output_file}...")
        # Process all data concurrently
        tasks = [process_single_item(i, raw_data) for i, raw_data in enumerate(raw_data_list)]
        conversations = await asyncio.gather(*tasks)
        with open(output_file, 'w', encoding='utf-8') as outfile:
            json.dump(conversations, outfile, ensure_ascii=False, indent=4)
        print(f"Conversion complete! Processed {len(raw_data_list)} records")
        print(f"Output file: {output_file}")
        print("=== Phase 2 Complete ===")
    except FileNotFoundError:
        print("Error: Cannot find recipient_data.json file")
        sys.exit(1)
    except json.JSONDecodeError as e:
        print(f"JSON parsing error: {e}")
        sys.exit(1)
    except Exception as e:
        print(f"Error during conversion: {e}")
        sys.exit(1)


# Main function
async def main():
    print("Starting combined data processing pipeline...")
    print("This program will execute two phases sequentially:")
    print("1. Generate raw recipient data")
    print("2. Use qwen3-235b-a22b to predict structured data and convert to SFT training format")
    print("-" * 50)
    # Phase 1: Generate data
    success = await produce_data_phase()
    if success:
        # Phase 2: Convert data
        await convert_data_phase()
        print("\n" + "=" * 50)
        print("All processes completed!")
        print("Generated files:")
        print("- recipient_data.json: Raw data list")
        print("- recipient_sft_data.json: SFT training format data")
        print("=" * 50)
    else:
        print("Data generation phase failed, terminating execution")


if __name__ == '__main__':
    # Set event loop policy
    if platform.system() == 'Windows':
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
    # Run main coroutine
    asyncio.run(main(), debug=False)
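The script above writes all labeled examples to a single file, while fine-tuning in this topic uses a separate training set and validation set. One possible way to carve out a validation set is sketched below. The function name split_examples, the 10% ratio, and the fixed seed are illustrative assumptions and do not describe how train_qwen3.json and eval_qwen3.json were produced.

```python
import random
from typing import List, Tuple


def split_examples(
    examples: List[dict], eval_ratio: float = 0.1, seed: int = 42
) -> Tuple[List[dict], List[dict]]:
    """Shuffle deterministically and return (train, validation) lists."""
    if not 0.0 < eval_ratio < 1.0:
        raise ValueError("eval_ratio must be between 0 and 1")
    shuffled = list(examples)  # copy so the caller's list is not modified
    random.Random(seed).shuffle(shuffled)
    eval_size = max(1, int(len(shuffled) * eval_ratio))
    return shuffled[eval_size:], shuffled[:eval_size]


if __name__ == "__main__":
    demo = [{"instruction": f"text {i}", "output": "{}"} for i in range(20)]
    train, validation = split_examples(demo)
    print(len(train), len(validation))
```

Using a fixed seed keeps the split reproducible, so repeated training runs are evaluated on the same validation examples.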

Fine-tune the model

In the navigation pane on the left, click Model Gallery. Search for Qwen3-0.6B and click Train.
Configure the training task parameters. You need to configure only the following key parameters. You can use the default values for the other parameters.
- Training Method: The default value is SFT (Supervised Fine-Tuning), which uses the LoRA fine-tuning method.
  LoRA is an efficient fine-tuning technique that freezes the original model weights and trains only small low-rank matrices added to the model, which conserves training resources.
- Training Dataset: First, click train_qwen3.json to download the sample training dataset. Then, select OSS file or directory, click the icon to select a Bucket, click Upload File to upload the downloaded training dataset to OSS, and then select the file.
- Validation Dataset: Click eval_qwen3.json to download the validation set, click Add Validation Dataset, and then follow the same procedure as for the training dataset to upload and select the file.
  The validation set is used to evaluate the model's performance during training and helps assess its performance on unseen data.
- Model Output Path: The fine-tuned model is stored in OSS by default. If the OSS folder is empty, you must click Create A New Folder and specify the new folder.
- Resource Group Type: Select Public Resource Group. This fine-tuning task requires approximately 5 GB of GPU memory. The console has already filtered the specifications that meet this requirement. Select a specification, such as ecs.gn7i-c16g1.4xlarge.
- Hyperparameter Configuration:
  - learning_rate: Set to 0.0005
  - num_train_epochs: Set to 4
  - per_device_train_batch_size: Set to 8
  - seq_length: Set to 512
  Then, click Train > OK. The training task enters the Creating state. When the status changes to Running, the model fine-tuning process begins.
View the training task and wait for the training to complete. Model fine-tuning takes approximately 10 minutes. During this process, the task details page displays task logs and metric curves. After the training is complete, the fine-tuned model is stored in the specified OSS directory.
To view the training task details later, in the navigation pane on the left, click Model Gallery > Task Management > Training Tasks, and then click the task name.
(Optional) Adjust hyperparameters based on the loss graph to improve model performance
On the task details page, you can view the train_loss curve, which reflects the training set loss, and the eval_loss curve, which reflects the validation set loss:
You can make an initial judgment about the training effectiveness of the current model based on the trend of the loss values:
- train_loss and eval_loss are still decreasing when the training ends (underfitting)
  You can increase the num_train_epochs parameter (number of training epochs, which is positively correlated with training depth) or appropriately increase the value of lora_rank (the rank of the low-rank matrix; a larger rank allows the model to express more complex tasks but is more prone to overfitting) and then train the model again to improve its fit to the training data.
- train_loss continues to decrease, but eval_loss starts to increase before the training ends (overfitting)
  You can decrease the num_train_epochs parameter or appropriately decrease the value of lora_rank and then train the model again to prevent it from overfitting.
- Both train_loss and eval_loss are stable before the training ends (good fit)
  When the model is in this state, you can proceed to the next steps.
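The three cases above can also be checked with a rough heuristic on the logged loss values. The sketch below compares the last logged value of each curve with the value a few points earlier. The function names, the window of 3 points, and the tolerance of 0.01 are hypothetical choices for illustration; real curves are noisy, so the result is only a first hint and does not replace looking at the graph.

```python
from typing import List


def trend(values: List[float], window: int, tolerance: float) -> str:
    """Describe how the curve moved over its last `window` logged points."""
    delta = values[-1] - values[-1 - window]
    if delta < -tolerance:
        return "decreasing"
    if delta > tolerance:
        return "increasing"
    return "stable"


def diagnose_fit(
    train_loss: List[float],
    eval_loss: List[float],
    window: int = 3,
    tolerance: float = 0.01,
) -> str:
    """Map the end-of-training trends to the three cases described above."""
    if min(len(train_loss), len(eval_loss)) <= window:
        raise ValueError("not enough logged points for the chosen window")
    train_trend = trend(train_loss, window, tolerance)
    eval_trend = trend(eval_loss, window, tolerance)
    if train_trend == "decreasing" and eval_trend == "decreasing":
        return "underfitting"
    if train_trend == "decreasing" and eval_trend == "increasing":
        return "overfitting"
    if train_trend == "stable" and eval_trend == "stable":
        return "good fit"
    return "inconclusive"


if __name__ == "__main__":
    print(diagnose_fit([1.0, 0.6, 0.4, 0.3, 0.2], [1.0, 0.7, 0.6, 0.7, 0.9]))
```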

Deploy the fine-tuned model

On the Model Gallery > Job Management > Training Jobs page, click the model you trained, and then click the Deploy button to open the deployment configuration page. Set Resource Type to Public Resources. Deploying the 0.6B model requires approximately 5 GB of GPU memory. In the Deployment Resources section, the console has filtered the specifications that meet this requirement. Select a specification, such as ecs.gn7i-c8g1.2xlarge. Keep the default values for the other parameters, and then click Deploy > OK.

The deployment process takes approximately 5 minutes. The deployment is successful when the status changes to Running.

If the Deploy button is not clickable after the training task succeeds, the output model is still being registered. Wait approximately 1 minute and try again.

The subsequent steps for invoking the model are the same as those in Invoke the model.

Verify the performance of the fine-tuned model

Before you deploy the fine-tuned model to a production environment, you must systematically evaluate its performance to ensure stability and accuracy and to prevent unexpected issues after it goes online.

Prepare test data

Prepare test data that does not overlap with the training data. This topic provides a test dataset that is downloaded automatically when you run the accuracy test code below.

Keeping test samples separate from the training data gives a more accurate picture of how well the model generalizes to new data and prevents scores from being inflated by samples the model has already seen.
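One way to check for overlap is to compare the user inputs of the two datasets. The sketch below assumes that both files are JSONL with a messages list per line, as the test code in this topic expects; the helper names user_texts and find_overlap are hypothetical.

```python
import json

def user_texts(jsonl_lines):
    """Collect the first user message of every sample in a JSONL dataset."""
    texts = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sample = json.loads(line)
        for message in sample.get("messages", []):
            if message.get("role") == "user":
                texts.add(message.get("content", "").strip())
                break
    return texts

def find_overlap(train_lines, test_lines):
    """Return user inputs that appear in both the training and the test set."""
    return user_texts(train_lines) & user_texts(test_lines)
```

Because both arguments are plain iterables of lines, open file objects can be passed directly. An empty result means no test input appears verbatim in the training data; near-duplicates are not detected by this exact-match check.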

Design evaluation metrics

Evaluation standards must align closely with actual business goals. In this example, it is not enough for the generated JSON string to be valid; the key-value pairs it contains must also be correct.

Define the evaluation metrics programmatically. For the implementation used in this example, see the compare_address_info function in the accuracy test code below.
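If a single pass/fail result per sample is too coarse, a field-level breakdown can show which keys the model gets wrong most often. The sketch below is an optional variation, not the metric used in this topic: the function name field_level_match is hypothetical, and the field list is taken from the six fields named in the system prompt of the test code.

```python
import json

FIELDS = ["province", "city", "district", "specific_location", "name", "phone"]

def field_level_match(actual_str, predicted_str):
    """Return {field: bool} showing which fields the prediction got right.

    A prediction that is not valid JSON, or not a JSON object,
    counts as wrong on every field.
    """
    actual = json.loads(actual_str)
    try:
        predicted = json.loads(predicted_str)
    except json.JSONDecodeError:
        predicted = {}
    if not isinstance(predicted, dict):
        predicted = {}
    # A field is correct only if it is present and equal to the reference value.
    return {f: f in predicted and actual.get(f) == predicted[f] for f in FIELDS}
```

Summing the per-field results over the test set gives one accuracy figure per field, which helps decide whether more training data is needed for a specific kind of value.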

Verify the performance of the fine-tuned model

Run the following test code to output the model's accuracy on the test dataset.

Example code for testing model accuracy

Note: Before you run the code, set the Token environment variable to the token of your service, and replace Your Endpoint in base_url with the endpoint that you obtained earlier.

from openai import AsyncOpenAI
import requests
import json
import asyncio
import os

# Set the Token as an environment variable to prevent sensitive information leaks.
# For more information about how to configure environment variables, see https://help.aliyun.com/zh/sdk/developer-reference/configure-the-alibaba-cloud-accesskey-environment-variable-on-linux-macos-and-windows-systems
client = AsyncOpenAI(
    api_key=os.getenv("Token"),
    base_url="Your Endpoint/v1"
)

# You can also call the Qwen3-0.6b model in Model Studio to test the accuracy of the original model.
# Note that you need to change model="Qwen3-0.6B" to model="qwen3-0.6b".
# client = AsyncOpenAI(
#     api_key=os.getenv("DASHSCOPE_API_KEY"),
#     base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
# )

system_prompt = """You are a professional information extraction assistant specializing in extracting structured recipient information from Chinese text.

## Task Description
Based on the given input text, accurately extract and generate a JSON output containing the following six fields:
- province: Province/Municipality/Autonomous Region (must be the full official name, such as "Henan Province", "Shanghai City", "Xinjiang Uygur Autonomous Region", etc.)
- city: City name (including "City", such as "Zhengzhou City", "Xi'an City", etc.)
- district: District/County name (including "District", "County", etc., such as "Jinshui District", "Yanta District", etc.)
- specific_location: Specific address (street, house number, community, building, etc.)
- name: Recipient's full Chinese name
- phone: Full contact phone number, including area code

## Extraction Rules
1. **Address Information Processing**:
   - Must accurately identify the hierarchical relationship of province, city, and district
   - Province names must use the official full name (e.g., "Henan Province" not "Henan")
   - For municipalities, the province and city fields should be the same (e.g., both "Shanghai City")
   - specific_location should contain the detailed street address, community name, building number, etc.

2. **Name Recognition**:
   - Accurately extract the full Chinese name, including compound surnames
   - Include names of ethnic minorities

3. **Phone Number Processing**:
   - Extract the complete phone number, maintaining its original format

## Output Format
Strictly follow the JSON format below, without adding any explanatory text:

{
  "province": "Province Name",
  "city": "City Name",
  "district": "District Name",
  "specific_location": "Detailed Address",
  "name": "Recipient Name",
  "phone": "Contact Phone"
}"""


def compare_address_info(actual_address_str, predicted_address_str):
    """Compare if two JSON strings representing address information are the same"""
    try:
        # Parse the actual address information
        if actual_address_str:
            actual_address_json = json.loads(actual_address_str)
        else:
            actual_address_json = {}

        # Parse the predicted address information
        if predicted_address_str:
            predicted_address_json = json.loads(predicted_address_str)
        else:
            predicted_address_json = {}

        # Directly compare if the two JSON objects are identical
        is_same = actual_address_json == predicted_address_json

        return {
            "is_same": is_same,
            "actual_address_parsed": actual_address_json,
            "predicted_address_parsed": predicted_address_json,
            "comparison_error": None
        }
    except json.JSONDecodeError as e:
        return {
            "is_same": False,
            "actual_address_parsed": None,
            "predicted_address_parsed": None,
            "comparison_error": f"JSON parsing error: {str(e)}"
        }
    except Exception as e:
        return {
            "is_same": False,
            "actual_address_parsed": None,
            "predicted_address_parsed": None,
            "comparison_error": f"Comparison error: {str(e)}"
        }


async def predict_single_conversation(conversation_data):
    """Predict the label for a single conversation"""
    try:
        # Extract user content (remove assistant message)
        messages = conversation_data.get("messages", [])
        user_content = None
        for message in messages:
            if message.get("role") == "user":
                user_content = message.get("content", "")
                break

        if not user_content:
            return {"error": "User message not found"}

        response = await client.chat.completions.create(
            model="Qwen3-0.6B",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_content}
            ],
            response_format={"type": "json_object"},
            extra_body={
                "enable_thinking": False
            }
        )

        predicted_labels = response.choices[0].message.content.strip()
        return {"prediction": predicted_labels}
    except Exception as e:
        return {"error": f"Prediction failed: {str(e)}"}


async def process_batch(batch_data, batch_id):
    """Process a batch of data"""
    print(f"Processing batch {batch_id}, containing {len(batch_data)} data entries...")

    # Send all requests of the batch concurrently
    tasks = [predict_single_conversation(conversation) for conversation in batch_data]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    batch_results = []
    for result in results:
        if isinstance(result, Exception):
            batch_results.append({"error": f"Exception: {str(result)}"})
        else:
            batch_results.append(result)

    return batch_results


async def main():
    output_file = "predicted_labels.jsonl"
    batch_size = 20  # Amount of data to process per batch

    # Read the test data
    url = 'https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250616/ssrgii/test.jsonl'
    conversations = []
    try:
        response = requests.get(url)
        response.raise_for_status()  # Check if the request was successful
        for line_num, line in enumerate(response.text.splitlines(), 1):
            try:
                data = json.loads(line.strip())
                conversations.append(data)
            except json.JSONDecodeError as e:
                print(f"JSON parsing error on line {line_num}: {e}")
                continue
    except requests.exceptions.RequestException as e:
        print(f"Request error: {e}")
        return

    print(f"Successfully read {len(conversations)} conversation data entries")

    # Process in batches
    all_results = []
    total_batches = (len(conversations) + batch_size - 1) // batch_size

    for batch_id in range(total_batches):
        start_idx = batch_id * batch_size
        end_idx = min((batch_id + 1) * batch_size, len(conversations))
        batch_data = conversations[start_idx:end_idx]

        batch_results = await process_batch(batch_data, batch_id + 1)
        all_results.extend(batch_results)

        print(f"Batch {batch_id + 1}/{total_batches} complete")

        # Add a small delay to avoid rapid requests
        if batch_id < total_batches - 1:
            await asyncio.sleep(1)

    # Save results
    same_count = 0
    different_count = 0
    error_count = 0

    with open(output_file, 'w', encoding='utf-8') as f:
        for i, (original_data, prediction_result) in enumerate(zip(conversations, all_results)):
            result_entry = {
                "index": i,
                "original_user_content": None,
                "actual_address": None,
                "predicted_address": None,
                "prediction_error": None,
                "address_comparison": None
            }

            # Extract original user content
            messages = original_data.get("messages", [])
            for message in messages:
                if message.get("role") == "user":
                    result_entry["original_user_content"] = message.get("content", "")
                    break

            # Extract actual address information (if assistant message exists)
            for message in messages:
                if message.get("role") == "assistant":
                    result_entry["actual_address"] = message.get("content", "")
                    break

            # Save prediction result
            if "error" in prediction_result:
                result_entry["prediction_error"] = prediction_result["error"]
                error_count += 1
            else:
                result_entry["predicted_address"] = prediction_result.get("prediction", "")

                # Compare address information
                comparison_result = compare_address_info(
                    result_entry["actual_address"],
                    result_entry["predicted_address"]
                )
                result_entry["address_comparison"] = comparison_result

                # Tally comparison results
                if comparison_result["comparison_error"]:
                    error_count += 1
                elif comparison_result["is_same"]:
                    same_count += 1
                else:
                    different_count += 1

            f.write(json.dumps(result_entry, ensure_ascii=False) + '\n')

    print(f"All predictions are complete! The results have been saved to {output_file}")

    # Tally results
    success_count = sum(1 for result in all_results if "error" not in result)
    print(f"Number of samples: {success_count}")
    print(f"Correct responses: {same_count}")
    print(f"Incorrect responses: {different_count}")
    # Avoid a division by zero when every request failed
    if success_count:
        print(f"Accuracy: {same_count * 100 / success_count} %")
    else:
        print("Accuracy: not available (no successful predictions)")


if __name__ == "__main__":
    asyncio.run(main())

Output:

All predictions are complete! The results have been saved to predicted_labels.jsonl
Number of samples: 400
Correct responses: 361
Incorrect responses: 39
Accuracy: 91.25 %

Because of the random seed used in model fine-tuning and the randomness of the large language model's output, the accuracy that you measure may differ from the results in this topic. This is expected behavior.
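To see which samples the model got wrong, the saved results file can be inspected after the run. The sketch below assumes the record layout written by the test code in this topic (index, prediction_error, and address_comparison with is_same and comparison_error); the function name summarize_results is hypothetical.

```python
import json

def summarize_results(jsonl_lines):
    """Group the sample indexes in a results file into correct, incorrect, and error."""
    summary = {"correct": [], "incorrect": [], "error": []}
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        comparison = entry.get("address_comparison") or {}
        if entry.get("prediction_error") or comparison.get("comparison_error"):
            summary["error"].append(entry["index"])
        elif comparison.get("is_same"):
            summary["correct"].append(entry["index"])
        else:
            summary["incorrect"].append(entry["index"])
    return summary
```

Passing the open predicted_labels.jsonl file object to summarize_results returns the indexes in each group, so that the incorrect entries can be looked up and compared with their reference answers.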

The accuracy is 91.25%, which is a significant improvement over the 14% accuracy of the original Qwen3-0.6B model. This indicates that the fine-tuned model has significantly enhanced its ability to extract structured information in the logistics form-filling domain.

Important reminder

This topic uses public resources to create the model service, which is billed on a pay-as-you-go basis. To avoid incurring further charges, remember to stop or delete the service when you no longer need it.

References

For more information about Model Gallery features such as evaluation and compression, see Model Gallery.
For more information about EAS features such as Auto Scaling, stress testing, and monitoring and alerting, see EAS overview.
