Use this file to discover all available pages before exploring further.
Try Qwen3 VL 4B - Instruct in the Workbench
Run this model interactively, tune parameters, and compare outputs.
Model ID:qwen3-vl-4b-instructQwen/Qwen3-VL-4B-Instruct is a multimodal LLM that processes both text and images, offering a relatively lightweight option for vision-language tasks while maintaining strong general language capabilities.It excels in visual question answering, document and UI understanding, spatial reasoning over images, and general instruction-following dialogue, making it suitable when you need a compact model that can both see and read.Some other noteworthy use cases of Qwen/Qwen3-VL-4B-Instruct include image captioning and explanation, multimodal coding assistance from designs or screenshots, and agentic visual assistants that can reason about interfaces and complex scenes.
Metric
Value
Parameter Count
4 billion
Mixture of Experts
No
Context Length
256,000 tokens (up to 1M with extension)
Multilingual
Yes
Quantized*
No
*Quantization is specific to the inference provider and the model may be offered with different quantization levels by other providers.