# Qwen3-Coder-30B-A3B

## Introduction

The newly released Qwen3-Coder-30B-A3B employs a sparse MoE architecture for efficient training and inference, delivering significant optimizations in agentic coding, extended context support of up to 1M tokens, and versatile function calling.

This document describes the main verification steps for the model, including supported features, feature configuration, environment preparation, single-node and multi-node deployment, and accuracy and performance evaluation.

## Supported Features

Refer to [supported features](../user_guide/support_matrix/supported_models.md) for the model's supported feature matrix.

Refer to the [feature guide](../user_guide/feature_guide/index.md) for how to configure each feature.

## Environment Preparation

### Model Weight

- `Qwen3-Coder-30B-A3B-Instruct` (BF16 version): requires 1 Atlas 800 A3 node (64 GB × 16) or 1 Atlas 800 A2 node (64 GB/32 GB × 8). [Download model weight](https://modelers.cn/models/Modelers_Park/Qwen3-Coder-30B-A3B-Instruct)

It is recommended to download the model weights to a directory shared across nodes, such as `/root/.cache/`.
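
One hedged way to fetch the weights is the ModelScope CLI, which matches the `VLLM_USE_MODELSCOPE` setting used in the serving script below; the target path is only an example:

```shell
pip install modelscope
# Download the BF16 weights into the shared cache directory (example path).
modelscope download --model Qwen/Qwen3-Coder-30B-A3B-Instruct \
  --local_dir /root/.cache/Qwen3-Coder-30B-A3B-Instruct
```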

### Installation

`Qwen3-Coder` has been supported since `vllm-ascend:v0.10.0rc1`; please use that version or later to run this model.

You can use our official Docker image to run `Qwen3-Coder-30B-A3B-Instruct` directly.

- Start the docker image on your node, refer to [using docker](../installation.md#set-up-using-docker); a hedged launch example follows.

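The sketch below assumes the official image on quay.io; the tag, device list, and mounts are examples to adapt to your environment:

```shell
# Example only: adjust the image tag, NPU devices, and mounts to your setup.
export IMAGE=quay.io/ascend/vllm-ascend:v0.11.0rc0
docker run --rm -it --name vllm-ascend \
  --device /dev/davinci0 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /root/.cache:/root/.cache \
  -p 8000:8000 \
  $IMAGE bash
```
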
In addition, if you don't want to use the Docker image as above, you can also build everything from source:

- Install `vllm-ascend` from source, refer to [installation](../installation.md); a hedged sketch follows.

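The following is only a sketch of the from-source build; the branch tags are assumptions, and the linked installation guide is authoritative:

```shell
# Install vLLM (Python parts only) and then vllm-ascend from source.
git clone --depth 1 --branch v0.11.0 https://github.com/vllm-project/vllm.git
cd vllm
VLLM_TARGET_DEVICE=empty pip install -v -e .
cd ..

git clone --depth 1 --branch v0.11.0rc0 https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -v -e .
```
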
## Deployment

### Single-node Deployment

Run the following script to start online inference.

For an Atlas 800 A2 node with 64 GB of memory per NPU card, `--tensor-parallel-size` should be at least 2; with 32 GB per card, it should be at least 4.

```shell
#!/bin/sh

# Pull the model weights from ModelScope instead of Hugging Face.
export VLLM_USE_MODELSCOPE=true

vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
  --served-model-name qwen3-coder \
  --tensor-parallel-size 4 \
  --enable-expert-parallel
```
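
Model loading can take a while. You can wait for the server to come up using vLLM's standard health endpoint, then confirm the served model name:

```shell
# Poll until the OpenAI-compatible server reports healthy (HTTP 200).
until curl -sf http://localhost:8000/health > /dev/null; do
  sleep 5
done
# Should list "qwen3-coder" as the served model.
curl -s http://localhost:8000/v1/models
```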

## Functional Verification

Once your server is started, you can query the model with input prompts:

```shell
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "qwen3-coder",
  "messages": [
    {"role": "user", "content": "Give me a short introduction to large language models."}
  ],
  "temperature": 0.6,
  "top_p": 0.95,
  "top_k": 20,
  "max_tokens": 4096
}'
```
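
Since the model is optimized for function calling, you can also exercise the tools interface. This is a hedged sketch: it assumes the server was launched with tool calling enabled (for example `--enable-auto-tool-choice --tool-call-parser qwen3_coder`), and `get_weather` is a made-up example tool:

```shell
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "qwen3-coder",
  "messages": [
    {"role": "user", "content": "What is the weather like in Beijing?"}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city.",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"]
      }
    }
  }]
}'
```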

## Accuracy Evaluation

### Using AISBench

1. Refer to [Using AISBench](../developer_guide/evaluation/using_ais_bench.md) for details.

2. After execution, you can get the results. The following result of `Qwen3-Coder-30B-A3B-Instruct` on `vllm-ascend:0.11.0rc0` is for reference only; a hedged invocation sketch follows the table.

| dataset          | version | metric           | mode | vllm-api-general-chat |
|------------------|---------|------------------|------|-----------------------|
| openai_humaneval | f4a973  | humaneval_pass@1 | gen  | 94.51                 |
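
A minimal invocation might look as follows; the model and dataset task names are assumptions, and the linked AISBench guide is authoritative:

```shell
# Hypothetical AISBench run against the server started above;
# task names and endpoint configuration follow the linked guide.
ais_bench --models vllm_api_general_chat --datasets humaneval_gen --debug
```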

## Performance

### Using AISBench

Refer to [Using AISBench for performance evaluation](../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
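
For a quick latency/throughput sanity check without AISBench, you can also use vLLM's built-in benchmark CLI; this is only a sketch, and the request counts and lengths are arbitrary examples:

```shell
# Random-prompt load test against the running server; values are illustrative.
vllm bench serve \
  --backend openai-chat \
  --model Qwen/Qwen3-Coder-30B-A3B-Instruct \
  --served-model-name qwen3-coder \
  --endpoint /v1/chat/completions \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 256 \
  --num-prompts 64
```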