Qwen has a strong understanding of structured data such as tables. This helps it extract insightful information from structured data, answer user queries about it, and generate new datasets. For example, Qwen2.5-72B can produce formatted output based on your requirements and the input data (a table in JSON format).
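Below is a minimal sketch of this kind of table query, assuming access through the DashScope OpenAI-compatible endpoint with the OpenAI Python SDK; the API key, model name, and sample data are placeholders and may differ in your account or region.

```python
import json
from openai import OpenAI

# Assumed endpoint and key; replace with your own configuration.
client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

# A small table in JSON format (hypothetical sample data).
sales = [
    {"region": "North", "quarter": "Q1", "revenue": 120000},
    {"region": "South", "quarter": "Q1", "revenue": 95000},
    {"region": "North", "quarter": "Q2", "revenue": 134000},
]

response = client.chat.completions.create(
    model="qwen2.5-72b-instruct",  # assumed model name; check your model list
    messages=[
        {"role": "system", "content": "You answer questions about tabular data."},
        {
            "role": "user",
            "content": (
                "Here is a sales table in JSON format:\n"
                + json.dumps(sales, indent=2)
                + "\n\nReturn the total revenue per region as a Markdown table."
            ),
        },
    ],
)

print(response.choices[0].message.content)
```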
Qwen-VL analyzes objects and text in images and creates new content based on what it learns. For example, it can recognize the woman and the dog in the picture and their gesture (a high five).
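A hedged sketch of passing an image to Qwen-VL follows, again assuming the DashScope OpenAI-compatible endpoint; the image URL and model name are placeholders.

```python
from openai import OpenAI

# Assumed endpoint and key; the image URL below is a placeholder
# and must point to a publicly reachable file.
client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen-vl-max",  # assumed model name; check your model list
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/high-five.jpg"},
                },
                {
                    "type": "text",
                    "text": "Describe the people and animals in this image and what they are doing.",
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```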
QVQ can understand what the two images depict in real life, interpret the user's request, and use chain-of-thought (CoT) reasoning to "think" through the problem and find the connection between the content of the images.
Qwen-Audio accepts diverse types of audio (such as human speech, natural sounds, instrumental music, and songs) together with text as input, understands the audio content, and summarizes information such as the music genre or the speaker's emotion. It can also use tools to edit audio files.
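The sketch below shows one possible way to send an audio clip plus a text question through the DashScope Python SDK's multimodal conversation interface; the API key, model name, audio URL, and exact response structure are assumptions and may differ from your environment.

```python
import dashscope
from dashscope import MultiModalConversation

# Assumed key and model name; replace with your own values.
dashscope.api_key = "YOUR_DASHSCOPE_API_KEY"

messages = [
    {
        "role": "user",
        "content": [
            {"audio": "https://example.com/clip.wav"},  # placeholder audio URL
            {"text": "What genre is this music, and what mood does it convey?"},
        ],
    }
]

response = MultiModalConversation.call(
    model="qwen2-audio-instruct",  # assumed model name; check your model list
    messages=messages,
)

# Print the full response; the answer is typically found under
# output.choices[0].message.content.
print(response)
```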
You can use Qwen models (such as Qwen, Qwen-VL, and Qwen-Audio) to build a chat assistant that interacts with users intelligently and understands multimodal data, including text, images, audio, and video.
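As a starting point, the following minimal sketch keeps a running conversation history and sends it to a Qwen chat model on each turn, assuming the DashScope OpenAI-compatible endpoint; to handle images or audio, you could swap in a multimodal model and structured content parts as in the earlier examples.

```python
from openai import OpenAI

# Assumed endpoint, key, and model name; adjust for your account.
client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

# Conversation history shared across turns.
history = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("You: ")
    if user_input.strip().lower() in {"exit", "quit"}:
        break
    history.append({"role": "user", "content": user_input})
    reply = client.chat.completions.create(
        model="qwen-plus",  # assumed model name; any Qwen chat model works
        messages=history,
    )
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print("Assistant:", answer)
```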
