How to Run Qwen 3.6 Locally — 27B Dense, 35B MoE, and Coding Variants
Qwen 3.6 dropped on April 21 2026. Two main families: a 27B dense model that activates every parameter per token and a 35B MoE with 3B active per token. Both ship with vision, agentic coding, thinking-mode preservation, and a 256K context window. This guide covers everything you need to run them locally on consumer hardware.
If you only have time for the short version: install Locally Uncensored, open Model Manager, click Discover, search Qwen 3.6, hit the download arrow on the variant that fits your VRAM. The rest of this post is the long version.
Which Qwen 3.6 Variant Should You Pick?
The biggest decision is dense vs MoE. The second biggest is which quant. The third is whether you want the coding-specialised variant of the 35B MoE.
Dense vs MoE
The 27B dense activates all 27B parameters for every token. Slower per token, but every token gets the full model. Quality is consistent. This is the recommended default for general chat, reasoning, and most coding work.
The 35B MoE only activates 3B parameters per token via routing. Much faster per token (often 2-3x the throughput of the dense at similar quants). VRAM peak during inference is lower than the model size suggests. But routing introduces variance — some tokens get the wrong expert and quality dips. The MoE wins on coding benchmarks (SWE-bench specifically) when you pick the coding-specialised variant.
Quant Comparison Table
All sizes below are the disk footprint of the GGUF file. VRAM usage during inference is roughly file size + 1-2 GB of overhead.
| Variant | Quant | Disk | VRAM Target | Quality |
|---|---|---|---|---|
| 27B dense | UD-IQ2_XXS | 8.7 GB | 8 GB GPU | Good (low-VRAM lifesaver) |
| 27B dense | Q3_K_M | 13 GB | 12 GB GPU | Very good (RTX 3060 sweet spot) |
| 27B dense | Q4_K_M | 16 GB | 16 GB GPU | Recommended default |
| 27B dense | UD-Q4_K_XL | 16 GB | 16 GB GPU | Better quality per GB than Q4_K_M |
| 27B dense | Q5_K_M | 18 GB | 20 GB GPU | High |
| 27B dense | Q6_K | 21 GB | 24 GB GPU | Near-lossless |
| 27B dense | Q8_0 | 27 GB | 32 GB GPU | Effectively lossless |
| 35B MoE | Q4_K_M | 24 GB | 24 GB GPU | Recommended for MoE |
| 35B MoE | NVFP4 | 22 GB | 22 GB GPU (RTX 50+) | Smallest with full quality on Blackwell |
| 35B MoE coding NVFP4 | NVFP4 | 22 GB | 22 GB GPU (RTX 50+) | Best coding-bench-per-GB |
| 35B MoE | BF16 | 71 GB | 96 GB GPU | Reference quality |
| 35B MoE | MLX BF16 | 70 GB | Apple Silicon M3/M4 | MLX-optimised |
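The sizing rule above (VRAM ≈ GGUF file size plus 1-2 GB of overhead) is easy to encode. The helper below is a sketch using the disk figures from the table; the dictionary keys and function names are invented for illustration, not any real tool's API.

```python
# Rough VRAM planner for the quants listed above.
# Rule of thumb from this guide: VRAM ~ GGUF file size + 1-2 GB overhead.

QUANT_DISK_GB = {          # disk footprints taken from the table above
    "27b-ud-iq2_xxs": 8.7,
    "27b-q3_k_m": 13,
    "27b-q4_k_m": 16,
    "27b-q5_k_m": 18,
    "27b-q6_k": 21,
    "27b-q8_0": 27,
    "35b-moe-q4_k_m": 24,
}

def vram_needed_gb(quant: str, overhead_gb: float = 2.0) -> float:
    """Worst-case VRAM estimate: file size plus inference overhead."""
    return QUANT_DISK_GB[quant] + overhead_gb

def fits(quant: str, vram_gb: float) -> bool:
    """True if the quant should fit in the given VRAM budget."""
    return vram_needed_gb(quant) <= vram_gb
```

Using the worst-case 2 GB overhead keeps the estimate conservative; with a small context window the real peak often lands closer to the lower bound.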
Recommendation by Hardware
- 8 GB VRAM (RTX 3060 8GB, RTX 4060 8GB): 27B UD-IQ2_XXS — the only quant that fits
- 12 GB VRAM (RTX 3060 12GB, RTX 3080 Ti, RTX 4070): 27B Q3_K_M — sweet spot, ~15-25 tok/s
- 16 GB VRAM (RTX 4070 Ti Super, RTX 4080): 27B Q4_K_M or UD-Q4_K_XL — the recommended default
- 24 GB VRAM (RTX 3090, RTX 4090, RTX 5090): 27B Q6_K for max dense quality, OR 35B MoE Q4_K_M for coding
- Blackwell GPUs (RTX 5090, RTX PRO 6000): 35B MoE NVFP4 wins — smallest size with native quality
- Apple Silicon M3/M4: Qwen 3.6 35B MoE MLX BF16 via MLX runtime
- CPU only with 32 GB RAM: 27B Q4_K_M at 1-3 tok/s — usable for short tasks
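The bullet list above reduces to a simple decision tree. A minimal sketch, with a hypothetical `recommend_variant` helper that mirrors the recommendations (it does not detect hardware or talk to any runtime):

```python
# Map available VRAM (GB) to the variant recommended above.
# Hypothetical helper; return strings mirror the bullet list.

def recommend_variant(vram_gb: int, prefer_coding: bool = False) -> str:
    if vram_gb >= 24:
        # 24 GB cards can run either; MoE only pays off for coding work
        return "35B MoE Q4_K_M" if prefer_coding else "27B Q6_K"
    if vram_gb >= 16:
        return "27B Q4_K_M"       # the recommended default
    if vram_gb >= 12:
        return "27B Q3_K_M"       # RTX 3060 12GB sweet spot
    return "27B UD-IQ2_XXS"       # only quant that fits in 8 GB
```

Apple Silicon and Blackwell-specific choices (MLX BF16, NVFP4) are omitted here since they key on architecture rather than raw VRAM.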
Installation Path 1 — Ollama (CLI)
Ollama is the no-frills route if you don’t want a GUI.
- Install Ollama from ollama.com
- Pull the model: `ollama pull qwen3.6:27b` (dense Q4_K_M, 16 GB) or `ollama pull qwen3.6` (35B MoE Q4_K_M, 24 GB)
- For variants: `ollama pull qwen3.6:35b-a3b-coding-nvfp4` for the NVFP4 coding model
- Chat: `ollama run qwen3.6:27b`
Ollama’s default 4096-token context window is conservative for Qwen 3.6’s 256K capability. To enable long context, create a Modelfile:

```
FROM qwen3.6:27b
PARAMETER num_ctx 32768
```

Then run `ollama create qwen3.6-long -f Modelfile` and use that model. 32K is a sane starting point — the full 256K eats VRAM aggressively.
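If you would rather not bake the context size into a Modelfile, Ollama also accepts a per-request `num_ctx` via its REST API on the default port 11434. A sketch (the helper names are ours; the endpoint and `options.num_ctx` field are standard Ollama API):

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str, num_ctx: int = 32768) -> dict:
    """Build a non-streaming /api/generate payload with a context override."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # per-request context window
    }

def generate(payload: dict, host: str = "http://localhost:11434") -> str:
    """POST the payload to a running Ollama instance and return the text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The per-request override only applies to that call; the Modelfile route is better when every client should get long context by default.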
Installation Path 2 — Locally Uncensored (GUI)
If you want a one-click experience plus chat, agent mode, image generation, and A/B model comparison in the same window, use Locally Uncensored.
- Download the v2.4.0 installer for your OS from GitHub Releases
- Run the installer (Windows: signed NSIS .exe; Linux: deb / rpm / AppImage)
- The first-launch wizard auto-detects Ollama if you have it. If not, the wizard offers a one-click Ollama install.
- Open the Model Manager, switch to the Discover tab, sub-tab Text
- Search Qwen 3.6 — you’ll see all variants with size + hardware tags
- Click the download arrow on the variant matching your VRAM. The download badge in the header shows progress
- Once done, switch to Chat, the model picker shows qwen3.6:27b (or your variant). Type a prompt
The new v2.4.0 Settings > Model Storage section lets you redirect the GGUF folder to a custom path — useful for dual-boot setups or NAS-mounted model libraries.
Performance Numbers
Tested on RTX 3060 12 GB (Windows 11, CUDA 12.1) with Qwen 3.6 27B Q3_K_M, 4096-token context, fp16 KV cache:
| Workload | tok/s | Notes |
|---|---|---|
| Cold first response (model load) | ~3 | Includes load_duration |
| Warm chat (50-token answers) | 22-26 | Steady state |
| Long-form generation (1000 tokens) | 18-20 | Sustained |
| Thinking-mode enabled | 15-18 | Includes hidden chain-of-thought tokens |
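If you want to reproduce these tok/s figures yourself, Ollama's non-streaming responses report timing fields in nanoseconds; generation speed is `eval_count / eval_duration`. A small sketch:

```python
# Derive tok/s and load time from an Ollama response dict.
# eval_count, eval_duration, and load_duration are standard Ollama fields.

def tokens_per_second(response: dict) -> float:
    """Generation throughput: tokens emitted per second of eval time."""
    return response["eval_count"] / response["eval_duration"] * 1e9

def load_seconds(response: dict) -> float:
    """Model load time — the bulk of a slow cold first response."""
    return response.get("load_duration", 0) / 1e9
```

The cold-start row in the table is dominated by `load_duration`, which is why its effective tok/s is so low.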
For comparison, our broader benchmark of 2026 local models covers the same hardware against Llama 4, GPT-OSS, GLM 4.7, and DeepSeek R1.
Thinking Mode
Qwen 3.6 preserves the thinking-mode lineage from QwQ and DeepSeek R1: the model produces a hidden chain-of-thought before the visible answer. In Locally Uncensored, toggle the Think button in the chat input. With Ollama, set `"think": true` in the request body. Without thinking mode the model still answers but skips the deliberation step — faster but lower quality on hard reasoning prompts.
Note: thinking mode adds 1.5-3x to response latency depending on prompt complexity. Worth it for code review, math, multi-step planning. Skip it for casual chat.
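For the Ollama route, a request-body sketch for thinking mode over `/api/chat` (the `build_chat_request` helper is ours; the `think` flag is the request-body field described above):

```python
# Build a /api/chat payload with thinking mode enabled.

def build_chat_request(model: str, user_text: str, think: bool = True) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "think": think,   # enables the hidden chain-of-thought pass
        "stream": False,
    }
```

Set `think=False` for casual chat to skip the 1.5-3x latency penalty.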
Vision Support
Both 27B dense and 35B MoE accept image input. Drag-and-drop a screenshot, photo, or chart into the LU chat input. The model returns a description, transcribes text, identifies objects, or answers questions about the image.
VRAM cost for vision inference is +1-2 GB on top of the base model. So 27B Q3_K_M with vision needs ~14 GB — doesn’t fit on 12 GB GPUs. Use Q3_K_M for text only or step down to UD-IQ2_XXS for vision on 12 GB GPUs.
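Outside the GUI, you can send an image to the same model over Ollama's `/api/chat` endpoint: images travel inside the message as base64 strings. A minimal sketch (helper name is ours):

```python
import base64

def image_message(image_bytes: bytes, question: str) -> dict:
    """Wrap raw image bytes and a question into an Ollama chat message."""
    b64 = base64.b64encode(image_bytes).decode()
    return {"role": "user", "content": question, "images": [b64]}
```

Pass the returned dict in the `messages` list of a chat request; the model sees the image alongside the question.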
Coding Performance
The 35B MoE coding-specialised variants (qwen3.6:35b-a3b-coding-nvfp4, qwen3.6:35b-a3b-coding-mxfp8) are tuned on the SWE-bench training data. On the SWE-bench-verified benchmark the coding NVFP4 variant scores in the same ballpark as Claude 3.5 Sonnet at a fraction of the inference cost.
For day-to-day coding inside LU’s Codex agent, the 27B dense Q4_K_M is the better default — consistent quality, no MoE-routing variance. Switch to the 35B MoE coding variant for hard refactors or when SWE-bench-style multi-file changes are involved.
Comparison — Qwen 3.6 vs Qwen 3.5
| Feature | Qwen 3.5 | Qwen 3.6 |
|---|---|---|
| Vision | No | Yes (both 27B and 35B) |
| Context window | 128K | 256K |
| Thinking mode | QwQ-only | Preserved across variants |
| Coding-specific MoE | No | Yes (35B-a3b-coding) |
| NVFP4 quant | No | Yes (35B MoE) |
| MLX variant for Apple Silicon | No | Yes (35B MoE BF16 MLX) |
| Best dense size | 27B | 27B (higher benchmark scores at the same size) |
Troubleshooting
Out of memory on Q4_K_M
Step down one quant level. Q4_K_M → Q3_K_M (12 GB), Q3_K_M → UD-IQ2_XXS (8 GB). Or reduce context window size in the Modelfile.
"does not support thinking" HTTP 400 error
You’re on an Ollama version older than 0.3.10 with thinking enabled in LU. Update Ollama. v2.4.0 of LU also retries automatically without the thinking flag and warns in-app.
Slow first response, fast subsequent
Normal. The first prompt loads the model into VRAM (`load_duration` in the Ollama response). Keep the model warm by setting `OLLAMA_KEEP_ALIVE=24h` as an env var, or in LM Studio set "Keep model loaded for" to infinite.
HuggingFace download 404
If you’re downloading via LU’s search box (not the curated list) and hit a 404, you may have hit the pre-v2.4.0 filename heuristic bug. Update LU to v2.4.0 — that release fixes the doubled-quant-tag issue.
Related Reading
- Best uncensored AI models 2026 — Qwen 3.6 in the broader landscape
- Best local AI apps 2026 — how the Ollama / LM Studio / LU stack compares
- How to run uncensored AI locally — full setup walkthrough including abliterated variants
- v2.4.0 release notes — configurable HF download path, single-instance lock, more
Locally Uncensored is AGPL-3.0 licensed. Built by PurpleDoubleD. Bug reports and feature requests on GitHub Discussions or in the Discord.