Locally Uncensored v2.2.2 — Codex Agent, MCP Tools, Vision & Thinking Mode
Version 2.2.2 of Locally Uncensored is the release where a chat app becomes an engineering tool. This update adds a built-in coding agent that reads, writes, and executes code directly on your machine; a 13-tool MCP system that gives any LLM real-world capabilities; file upload with vision across every provider; a thinking mode that works with any model; granular permissions to control exactly what the AI can touch; and a complete UI overhaul that makes everything 15% larger and easier to use.
This isn't an incremental polish release. It's the foundation for using local LLMs as agents — not just conversational assistants, but tools that can navigate your filesystem, run shell commands, analyze images, and build software. All locally, all private, all under your control.
Let's walk through every major feature.
Codex Coding Agent
The headline feature of v2.2.2 is the Codex coding agent — a fully integrated coding environment that turns any LLM into a software development assistant. This isn't a prompt wrapper. It's an autonomous agent loop that can read your codebase, write new files, modify existing ones, and run shell commands to test its work.
The interface introduces a 3-tab system: LU (standard chat), Codex (the coding agent), and OpenClaw (an open-ended task runner). Each tab maintains its own conversation history and context, so you can chat in LU while Codex works through a multi-step refactor in the background.
How It Works
When you open the Codex tab, you select a project folder using a native folder picker powered by Tauri's Rust backend (no browser file dialogs — the real OS picker). Codex scans the directory structure and builds a context map of your project. From there, you describe what you want done in plain language.
The agent then enters an autonomous tool loop, calling up to 20 tool iterations per task. In a single session, it might:
- Read files to understand existing code structure and conventions
- Write or modify files with precise, targeted edits
- Run shell commands to install dependencies, run tests, or check build output
- Iterate on errors — if a test fails, it reads the error, fixes the code, and retries
The 20-iteration cap exists for safety. Each iteration consumes tokens and potentially modifies your filesystem, so there's a hard stop to prevent runaway loops. In practice, most coding tasks complete in 5–10 iterations. The permissions system (covered below) gives you additional control over what the agent can actually do.
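The loop described above is simple to picture. Here is a minimal sketch in TypeScript, assuming hypothetical callModel and executeTool helpers (the real implementation also threads permissions and streaming through each step):

```typescript
// Minimal sketch of an autonomous agent loop with a hard iteration cap.
// callModel and executeTool are hypothetical stand-ins, not the app's API.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { text: string; toolCall?: ToolCall };

const MAX_ITERATIONS = 20; // hard stop against runaway loops

async function runAgent(
  task: string,
  callModel: (history: string[]) => Promise<ModelTurn>,
  executeTool: (call: ToolCall) => Promise<string>,
): Promise<string> {
  const history: string[] = [task];
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const turn = await callModel(history);
    if (!turn.toolCall) return turn.text; // no tool requested: final answer
    // Execute the requested tool and feed its result back as context,
    // so the next model call can react to errors or output.
    const result = await executeTool(turn.toolCall);
    history.push(`tool ${turn.toolCall.name}: ${result}`);
  }
  return "Stopped: iteration limit reached";
}
```

The key property is that the model, not the app, decides when to stop: any turn without a tool call ends the loop early, which is why most tasks finish well under the cap.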
This works with any LLM — local Ollama models, OpenAI, Anthropic, Google Gemini. Smaller models will produce less reliable code, but the agent loop itself is model-agnostic. Pair it with a strong coder like Qwen 2.5 Coder 32B or Claude 3.5 Sonnet and the results are genuinely useful for real development work.
13 MCP Tools
The Codex agent is powered by MCP (Model Context Protocol) tools — a standardized way for LLMs to call functions in the real world. Version 2.2.2 ships with 13 built-in tools, and the architecture is designed for extensibility.
Dynamic Tool Registry
Tools aren't hardcoded into the prompt. Instead, Locally Uncensored maintains a dynamic registry that injects only the relevant tools into each conversation based on context. If you're chatting in the standard LU tab, you might get web search and file read tools. In Codex mode, the full suite — file write, shell execute, directory listing — becomes available.
Smart Keyword Filtering
Here's the problem with tool-use and local LLMs: tool definitions are verbose. Injecting 13 tool schemas into every prompt eats context window and wastes tokens on tools the model will never call for a given query. The smart keyword filtering system analyzes the user's message and injects only the tools that match relevant keywords. Asking about code? You get file tools. Asking about your system? You get system info tools. Asking a general knowledge question? No tools injected at all.
This saves approximately 80% of tool-description tokens compared to naive injection. For local models running on limited context windows (4K–8K), this is the difference between tools working and tools not fitting in the prompt at all.
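The matching step itself is lightweight. A sketch under stated assumptions (the keyword sets below are invented for illustration; the real tuned sets are larger):

```typescript
// Sketch of keyword-based tool filtering: inject a tool's schema only
// when at least one of its keywords appears in the user's message.
interface Tool {
  name: string;
  keywords: string[]; // illustrative keyword sets, not the app's tuned ones
  schema: string;
}

function filterTools(message: string, tools: Tool[]): Tool[] {
  const text = message.toLowerCase();
  return tools.filter((t) => t.keywords.some((k) => text.includes(k)));
}
```

A general-knowledge question matches no keywords, so no schemas are injected and the entire tool-description budget is saved.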
JSON Repair for Local LLMs
Cloud APIs like OpenAI and Anthropic return well-formed JSON tool calls reliably. Local models? Not so much. A 7B parameter model running through Ollama will frequently produce malformed JSON — missing closing braces, unescaped quotes in strings, trailing commas. The JSON repair layer catches these errors and fixes them before the tool executor sees them. It handles the most common failure modes: incomplete JSON, single quotes instead of double quotes, missing commas, and unquoted keys.
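To make the failure modes concrete, here is a naive regex-based sketch of the kinds of fixes such a layer applies. A production repairer needs a real tokenizer (regexes can corrupt quoted strings); this version only illustrates the idea:

```typescript
// Naive sketch of JSON repair for malformed local-model tool calls.
// Illustrative only: these regexes do not protect string contents.
function repairJson(raw: string): string {
  let s = raw.trim();
  s = s.replace(/'/g, '"'); // single quotes -> double quotes
  s = s.replace(/,\s*([}\]])/g, "$1"); // drop trailing commas
  s = s.replace(/([{,]\s*)([A-Za-z_]\w*)\s*:/g, '$1"$2":'); // quote bare keys
  // Balance any missing closing braces (incomplete generations).
  const opens = (s.match(/{/g) ?? []).length - (s.match(/}/g) ?? []).length;
  s += "}".repeat(Math.max(0, opens));
  return s;
}
```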
XML Fallback
Some local models are better at generating XML than JSON (particularly models fine-tuned on Anthropic-style tool use). If JSON parsing fails even after repair, the tool system falls back to XML parsing. The model can emit tool calls as <tool_call> blocks and they'll be interpreted correctly. This dual-format approach means tool use works reliably across a much wider range of local models than any single-format system.
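The fallback parse can be as simple as a pattern match over the response text. A sketch, assuming a <name>/<args> structure inside the <tool_call> block (the inner tag names are an assumption; the post only specifies the outer <tool_call> wrapper):

```typescript
// Sketch of the XML fallback path: extract a tool call from a
// <tool_call> block when JSON parsing (even after repair) fails.
// The <name>/<args> inner tags are assumed, not confirmed.
function parseXmlToolCall(
  text: string,
): { name: string; args: string } | null {
  const m = text.match(
    /<tool_call>\s*<name>(.*?)<\/name>\s*<args>([\s\S]*?)<\/args>\s*<\/tool_call>/,
  );
  return m ? { name: m[1], args: m[2].trim() } : null;
}
```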
Granular Permissions
Giving an AI agent access to your shell and filesystem is powerful. It's also dangerous if unchecked. Version 2.2.2 introduces a granular permissions system that lets you control exactly what the AI can and cannot do.
7 Permission Categories
Permissions are organized into seven categories:
- File Read — can the agent read files on your filesystem?
- File Write — can the agent create or modify files?
- Shell Execute — can the agent run shell commands?
- System Info — can the agent query CPU, RAM, GPU, and OS details?
- Screenshot — can the agent capture your screen?
- Network — can the agent make HTTP requests?
- Process Control — can the agent start or stop processes?
3 Permission Levels
Each category supports three levels: Deny (never allowed), Ask (prompt the user each time), and Allow (always permitted). The default for destructive operations like shell execute and file write is Ask — you see a confirmation dialog before anything runs. You can upgrade trusted workflows to Allow or lock down categories to Deny if you want a read-only agent.
Per-Conversation Overrides
Global defaults apply across all conversations, but you can override them per conversation. This means you can have a Codex session with full write + execute permissions for a trusted project, while another conversation is locked to read-only. Overrides are stored with the conversation and persist across app restarts.
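Resolution is a simple two-level lookup: the per-conversation override wins when present, otherwise the global default applies. A sketch (the snake_case identifiers are my mapping of the category names; only the Ask defaults for file write and shell execute come from the post, the rest are illustrative):

```typescript
// Sketch of permission resolution: per-conversation overrides take
// precedence over global defaults. Defaults other than file_write and
// shell_execute (Ask, per the post) are illustrative assumptions.
type Level = "deny" | "ask" | "allow";
type Category =
  | "file_read" | "file_write" | "shell_execute" | "system_info"
  | "screenshot" | "network" | "process_control";

const globalDefaults: Record<Category, Level> = {
  file_read: "allow",
  file_write: "ask",      // destructive: confirm each time by default
  shell_execute: "ask",   // destructive: confirm each time by default
  system_info: "allow",
  screenshot: "ask",
  network: "ask",
  process_control: "ask",
};

function resolvePermission(
  category: Category,
  conversationOverrides: Partial<Record<Category, Level>>,
): Level {
  return conversationOverrides[category] ?? globalDefaults[category];
}
```

A read-only conversation is then just an override map setting file_write and shell_execute to "deny".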
File Upload & Vision
Version 2.2.2 adds full image upload and vision analysis to the chat interface. This works across every provider that supports vision — Ollama (with multimodal models like LLaVA, Llama 3.2 Vision), OpenAI (GPT-4o, GPT-4 Vision), Anthropic (Claude 3.5 Sonnet, Claude 3 Haiku), and Google Gemini.
Multiple Input Methods
Getting images into the chat is flexible:
- Drag and drop — drag image files from your file manager directly onto the chat input
- Paste from clipboard — Ctrl+V to paste a screenshot or copied image
- Clip button — click the attachment icon to open a file picker
You can attach up to 5 images per message. Each image is resized and encoded appropriately for the target provider — base64 for Ollama, URL upload for cloud APIs. The app handles the format conversion transparently so you don't need to think about which provider expects what encoding.
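The per-provider dispatch can be sketched as follows. The payload shapes are heavily simplified assumptions for illustration; each cloud API has its own content-part schema:

```typescript
// Sketch of per-provider image preparation. The 5-image cap is from the
// post; the payload shapes are simplified placeholders, not real schemas.
type Provider = "ollama" | "openai" | "anthropic" | "gemini";
const MAX_IMAGES = 5;

function prepareImages(
  base64Images: string[],
  provider: Provider,
): { images?: string[]; content?: { type: string; data: string }[] } {
  if (base64Images.length > MAX_IMAGES) {
    throw new Error(`At most ${MAX_IMAGES} images per message`);
  }
  // Ollama accepts raw base64 strings in an images array; cloud APIs
  // take structured content parts (shapes abbreviated here).
  if (provider === "ollama") return { images: base64Images };
  return {
    content: base64Images.map((b64) => ({ type: "image", data: b64 })),
  };
}
```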
Vision is particularly powerful when combined with the Codex agent. Screenshot your terminal error, paste it into Codex, and ask it to fix the bug. It reads the error from the image, finds the relevant file, and writes the fix.
Thinking Mode
Chain-of-thought reasoning makes LLMs significantly better at complex tasks. But every provider implements it differently (or doesn't implement it at all). Version 2.2.2 adds a provider-agnostic thinking mode that works consistently regardless of which backend you're using.
How It Adapts Per Provider
- Ollama — uses the native think parameter when the model supports it (Qwen 3, DeepSeek R1). The thinking output is captured and displayed separately from the response.
- OpenAI & Anthropic — injects a system prompt that instructs the model to think step-by-step inside designated blocks before answering. The app parses these blocks out of the response and displays them separately.
- Gemma 3 & Gemma 4 — these models emit thinking content wrapped in special tags. The app strips the thinking tags from the displayed response and renders the thinking content in a separate collapsible section, so you get a clean answer with the reasoning available on demand.
Collapsible Thinking Blocks
Thinking output can be verbose — sometimes longer than the actual answer. The UI renders all thinking content in collapsible blocks that are collapsed by default. You see a compact "Thinking..." indicator with the token count. Click to expand and read the full chain of thought. This keeps the conversation view clean while preserving full transparency into the model's reasoning process.
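Splitting reasoning from the answer is a small parsing step. A sketch, assuming the common <think>…</think> tag convention (one of several formats the app actually handles):

```typescript
// Sketch of separating thinking content from the final answer.
// The <think> tag convention is assumed; real models vary in format.
function splitThinking(response: string): {
  thinking: string;
  answer: string;
} {
  const parts: string[] = [];
  const answer = response.replace(
    /<think>([\s\S]*?)<\/think>/g,
    (_match: string, body: string) => {
      parts.push(body.trim()); // collect reasoning for the collapsible block
      return ""; // strip it from the displayed answer
    },
  );
  return { thinking: parts.join("\n\n"), answer: answer.trim() };
}
```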
Model Load & Unload
Local models consume VRAM even when you're not using them. If you loaded a 13B model for a coding task and now want to generate images, that model is sitting in GPU memory doing nothing while ComfyUI fights for VRAM. Version 2.2.2 adds explicit model load and unload controls.
Power Icons
Each model in the model selector now has a power icon — a visual indicator showing whether the model is currently loaded in VRAM. Green means loaded and ready, gray means unloaded. Click the icon to toggle: load a model before you need it (pre-loading for faster first response), or unload it when you're done to free VRAM for other tasks.
VRAM Status
The model panel shows real-time VRAM usage — how much is consumed, how much is free, and which models are currently occupying memory. This is pulled from the Ollama API and updated in real time. No more guessing whether you have enough VRAM to load the next model or wondering why image generation is running out of memory.
Pre-loading is especially useful for Codex workflows. Before starting a coding session, load your preferred model so the first tool iteration doesn't incur the cold-start delay (which can be 10–30 seconds for large models). The model stays warm in VRAM across all 20 iterations.
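Load and unload both map onto Ollama's documented keep_alive parameter on the generate endpoint: an empty prompt with keep_alive -1 pins the model in VRAM, and 0 evicts it immediately. A sketch of building that request (the URL assumes Ollama's default local port):

```typescript
// Sketch of model residency control via Ollama's keep_alive parameter:
// keep_alive -1 keeps the model loaded indefinitely, 0 unloads it now.
function residencyRequest(model: string, load: boolean) {
  return {
    url: "http://localhost:11434/api/generate", // Ollama's default port
    body: { model, prompt: "", keep_alive: load ? -1 : 0 },
  };
}
```

Sending the "load" form before a Codex session is exactly the pre-loading trick described above.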
Native PC Control via Rust
The MCP tools need to actually do things on your computer — read files, run commands, take screenshots. In a browser, all of that is sandboxed. The Tauri v2 backend solves this with a set of Rust-powered native commands that bypass the browser sandbox and interact with your OS directly.
What the Rust Backend Provides
- Async shell execution — run any command asynchronously. Output streams back to the UI in real time, so you see build logs and test results as they happen, not as a single dump when the command finishes.
- Filesystem operations — read, write, create, delete, and list files and directories. All operations go through the permissions system before executing.
- System info — query CPU model, core count, RAM total/available, GPU name, VRAM, OS version, and disk space. Useful for the AI to understand the hardware it's working with.
- Screenshot capture — capture the current screen and return it as a base64 image. Combined with vision, this lets the AI "see" what's on your screen.
- Native folder picker — the OS-native folder selection dialog, used by the Codex agent to select project directories. No fake browser dialogs.
All of these run through Tauri's IPC bridge, which means they execute in the Rust process with full OS permissions while the frontend remains sandboxed. The sandbox bypass is intentional and controlled — every native operation is gated by the permissions system described above. The AI can't run a shell command unless you've explicitly allowed (or approved) it.
UI Overhaul
The entire interface has been redesigned in v2.2.2. This isn't a theme change — it's a structural overhaul that affects how you read, write, and interact with every element in the app.
15% Larger Interface
Every text element, button, and input field is 15% larger than the previous version. Font sizes are up, padding is more generous, click targets are bigger. The previous UI was optimized for information density; this version is optimized for readability and comfort during long sessions. If you're spending hours in Codex mode working through a refactor, you'll appreciate the reduced eye strain.
Collapsible Code Blocks
LLM responses frequently contain large code blocks — sometimes hundreds of lines. In previous versions, these expanded inline and pushed the rest of the conversation off screen. Now, code blocks above a certain length are automatically collapsed with a click-to-expand control. You see the first few lines and the language label. Expand when you need to read the code, collapse it when you're done.
Monochrome Tool Blocks
When the AI uses MCP tools (file read, shell execute, etc.), the tool call and result are displayed in monochrome blocks — visually distinct from the conversation but not attention-grabbing. Previous versions used colored badges for each tool type, which became visually noisy in a Codex session with 15+ tool calls. The monochrome design keeps the focus on the AI's reasoning and the final output, not the mechanics of tool execution.
Elapsed Counter
Every AI response now shows a real-time elapsed counter while generating. You can see exactly how long the model has been thinking. On completion, it shows total generation time and tokens per second. This is essential for benchmarking local models and knowing when a response is genuinely slow versus when the model is working through a complex chain of thought.
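The completion stats are straightforward arithmetic over timestamps and the token count; a minimal sketch:

```typescript
// Sketch of the completion stats: elapsed time and tokens per second
// from a start/end timestamp pair and the generated token count.
function generationStats(
  tokenCount: number,
  startMs: number,
  endMs: number,
): { elapsedSeconds: number; tokensPerSecond: number } {
  const seconds = (endMs - startMs) / 1000;
  return {
    elapsedSeconds: seconds,
    tokensPerSecond: seconds > 0 ? tokenCount / seconds : 0, // guard zero
  };
}
```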
Working Stop Button
The stop button actually works now. In previous versions, hitting stop during generation would sometimes leave the model in a hung state, requiring a page refresh or app restart. The new implementation properly cancels the in-flight request at the HTTP level, cleans up partial responses, and leaves the UI in a consistent state. This is especially important for Codex sessions — if the agent is heading in the wrong direction on iteration 12 of 20, you need to be able to stop it immediately and redirect.
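Cancelling at the HTTP level is what AbortController is for; a sketch of the pattern (the app's exact wiring may differ, and fetchImpl is injectable here only to make the sketch testable):

```typescript
// Sketch of HTTP-level cancellation with AbortController: aborting the
// signal rejects the in-flight fetch, which we treat as clean stoppage.
function startGeneration(
  url: string,
  body: unknown,
  fetchImpl: typeof fetch = fetch, // injectable for testing
) {
  const controller = new AbortController();
  const done = fetchImpl(url, {
    method: "POST",
    body: JSON.stringify(body),
    signal: controller.signal,
  }).catch((err) => {
    if (err.name === "AbortError") return null; // expected on stop
    throw err; // real network errors still surface
  });
  return { done, stop: () => controller.abort() };
}
```

Because the rejection is caught and normalized, the UI can distinguish a user-initiated stop (null) from a genuine failure and always return to a consistent state.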
DWM Border Removed
On Windows, the Tauri window previously displayed the default Desktop Window Manager border — a thin colored line around the window that looked out of place against the dark UI. This has been removed in v2.2.2 for a cleaner, borderless look that blends seamlessly with the app's dark theme. A small detail, but it eliminates one of the most common visual complaints from Windows users.
Upgrade Guide
If you're already running Locally Uncensored, upgrading is straightforward:
cd locally-uncensored
git pull
npm install
npm run dev
For the standalone desktop app, download the latest .exe (or .dmg / .AppImage) from the releases page. Your existing conversations, settings, and permissions carry over automatically — they're stored in your OS's app data directory, not inside the app bundle.
New users can clone and run:
git clone https://github.com/PurpleDoubleD/locally-uncensored.git
cd locally-uncensored
# Windows: setup.bat | macOS/Linux: ./setup.sh
What's Next
Version 2.2.2 lays the foundation for agent-based workflows. The next milestones on the roadmap:
- Custom MCP tool authoring — write your own tools in JavaScript or Rust and register them with the tool system
- Multi-agent orchestration — let multiple specialized agents collaborate on complex tasks
- RAG integration — ground conversations in your local documents and codebases
- Voice-to-agent pipeline — speak commands, have the agent execute them, hear the results
Follow the GitHub repo for updates, or join the Discussions to share feedback and feature requests.
FAQ
Does the Codex agent work with small local models?
It works with any model, but results scale with model quality. A 7B model can handle simple file reads and basic code edits. For multi-step refactoring, writing tests, and debugging, you'll want at least a 14B+ coding model like Qwen 2.5 Coder 14B/32B, DeepSeek Coder V2, or a cloud model like Claude 3.5 Sonnet. The tool system itself is model-agnostic — the quality of the agent depends entirely on the model's coding ability.
Is the Codex agent safe to use on production code?
Use it with version control. The permissions system adds a safety net (set file write and shell execute to Ask mode so you approve every change), but the best safety net is a git commit before you start. If the agent makes a mess, git checkout . reverts modified tracked files, and git clean -fd removes any new files it created. We recommend treating it like a junior developer with full repo access: powerful, but supervise it.
How does smart keyword filtering decide which tools to inject?
Each tool has a set of associated keywords. The user's message is analyzed for keyword matches, and only tools with at least one match are included in the prompt. For example, "read the config file" matches file_read, while "what's the weather" matches nothing and skips tool injection entirely. The keyword sets are tuned to minimize false negatives — it's better to inject an unnecessary tool than to miss one the model needs.
Can I use vision with local models?
Yes, if the model supports multimodal input. LLaVA 1.6, Llama 3.2 Vision, and BakLLaVA work through Ollama. The app detects whether the selected model supports vision and enables/disables the upload UI accordingly. Cloud providers (OpenAI, Anthropic, Gemini) all support vision with their latest models.
Does thinking mode use more tokens?
Yes. The thinking output is generated in addition to the answer, so you'll use roughly 2–5x more output tokens per response. On local models, this means longer generation times. The benefit is significantly better reasoning on complex tasks — math, logic, multi-step planning, and code architecture decisions. For simple questions, leave thinking mode off.
Why 20 tool iterations? Can I increase the limit?
The 20-iteration cap balances capability with safety. Each iteration can modify files or run commands, so an unbounded loop risks unintended consequences. In practice, most tasks complete in 5–10 iterations. The limit is configurable in settings if you need more for complex workflows, but we recommend keeping the default until you're comfortable with the agent's behavior.
Is this update free?
Yes. Locally Uncensored is MIT licensed and completely free. No paid tiers, no pro features behind a paywall, no subscriptions. The full source code is on GitHub.
Locally Uncensored v2.2.2 is available now. MIT licensed and free to use. Built by PurpleDoubleD.