Try Qwen3 VL 2B - Instruct in the Workbench
Run this model interactively, tune parameters, and compare outputs.
qwen3-vl-2b-instruct
Qwen/Qwen3-VL-2B-Instruct is a multimodal LLM that excels at lightweight vision‑language tasks such as visual question answering, document and UI understanding, and general image‑grounded chat, while being small enough for edge or resource‑constrained environments.
Some other noteworthy use cases of Qwen/Qwen3-VL-2B-Instruct include OCR and document analysis across many languages, and agentic interactions that involve interpreting screen content or layouts before deciding on actions.
| Metric | Value |
|---|---|
| Parameter Count | 2 billion |
| Mixture of Experts | No |
| Context Length | 256,000 tokens |
| Multilingual | Yes |
| Quantized* | No |
Example request
- Minimal
- Basic parameters
- All parameters
Fetch model details
The models endpoint returns the full model object, including itsjson_request_schema.