OpenHands Local Coding Agent: Run Autonomously via Ollama

7 June 2026 · 5 min read


Most AI coding tools sit inside an IDE and complete lines. OpenHands does something fundamentally different: it takes a task description and works through it autonomously, reading files, writing code, running tests, reading error output, and iterating until the task is done or it needs human input.

The project launched in 2024 as OpenDevin and was renamed OpenHands later that year. By version 1.7.0 (May 2026), it had crossed 74,000 GitHub stars, making it one of the most widely adopted open-source coding agent frameworks. With recent updates to the Ollama inference backend, including the new MLX-powered performance layer for Apple Silicon, OpenHands now runs at practically useful speeds on a single workstation, with no cloud API required.

What OpenHands Actually Does

Think of OpenHands as a developer assistant that works in a sandboxed environment. You assign it a task, such as "Write unit tests for all public methods in payments.py", "Refactor the API handler module to use async/await", or "Find the bug causing the 422 error in POST /orders and propose a fix", and it executes a multi-step plan.

Each step involves one or more tool calls: reading a file, running a shell command, browsing documentation, writing code. The LLM receives the result of each step and decides the next. At the end, it produces a diff that you review before merging.
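The loop described above can be sketched in a few lines of Python. This is an illustrative outline, not OpenHands' actual implementation: `Action`, `run_agent`, `scripted_model`, and the `read_file` tool are hypothetical names, and the "model" is a scripted stub standing in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    tool: str        # hypothetical tool name, e.g. "read_file" or "finish"
    argument: str    # input passed to the tool


def run_agent(
    task: str,
    choose_action: Callable[[list[str]], Action],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 50,
) -> list[str]:
    """Ask the model for an action, run the tool, feed the result back, repeat."""
    history = [f"TASK: {task}"]
    for _ in range(max_steps):
        action = choose_action(history)
        if action.tool == "finish":
            history.append(f"DONE: {action.argument}")
            break
        result = tools[action.tool](action.argument)
        history.append(f"{action.tool}({action.argument}) -> {result}")
    return history


def scripted_model(history: list[str]) -> Action:
    """Stub standing in for the LLM: read one file, then finish."""
    if len(history) == 1:
        return Action("read_file", "payments.py")
    return Action("finish", "diff ready for review")


log = run_agent(
    "Write unit tests for payments.py",
    scripted_model,
    {"read_file": lambda path: f"<contents of {path}>"},
)
for entry in log:
    print(entry)
```

The `max_steps` cap is a design choice in this sketch: without it, a model that never emits a finishing action would loop indefinitely.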

Crucially, every session runs in a dedicated Docker container. The agent can read and write files, run shell commands, and open browser windows, but it has no permissions outside that container without explicit configuration. When the LLM backend is also local (Ollama), the entire execution loop happens without touching any external service.

Ollama as the Local Backend: Why Timing Matters

The official Ollama account stated on X: "Ollama is now updated to run the fastest on Apple silicon, powered by MLX, Apple's machine learning framework". The MLX backend matters specifically for agentic workloads because agents issue many sequential requests per task, typically 20-50 for a moderate, well-defined task. Because each request must finish before the next one starts, lower per-token latency compounds across those calls.
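A rough back-of-the-envelope calculation shows how throughput compounds over sequential requests. The figures below are illustrative assumptions (35 requests, 400 generated tokens each), and the function name is hypothetical. The estimate counts generation time only and ignores prompt processing and tool execution.

```python
def task_generation_seconds(
    requests: int, tokens_per_request: int, tokens_per_second: float
) -> float:
    """Total time spent generating tokens when requests run one after another."""
    return requests * tokens_per_request / tokens_per_second


# Assumed workload: 35 sequential requests, 400 generated tokens per request.
slow = task_generation_seconds(35, 400, tokens_per_second=10)
fast = task_generation_seconds(35, 400, tokens_per_second=30)

print(f"10 tok/s: {slow / 60:.1f} min")
print(f"30 tok/s: {fast / 60:.1f} min")
```

Under these assumptions, tripling throughput cuts generation time for the whole task from about 23 minutes to under 8.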

Community practitioners report running OpenHands with 32B coding models at 25-40 tokens/second on Mac Studio M3 Ultra hardware (192 GB unified memory). These are community measurements, not Freshlab benchmarks. At that throughput, iteration cycles take a few minutes rather than the 15-20 minutes typical on lower-spec hardware from two years ago, which changes the practical experience significantly.

On Linux workstations with an NVIDIA GPU (RTX 4090 or equivalent, 24 GB VRAM), DeepSeek-Coder-V2 16B via Ollama delivers comparable throughput according to community reports, making this a viable path for development teams who prefer dedicated GPU servers.

Choosing the Right Model

Not all open-weight models perform reliably in agentic settings. OpenHands requires a model with consistent JSON tool-call formatting, long-context handling for files and error logs, and strong code generation across common languages. Recommendations from the OpenHands documentation and community experience:

Qwen2.5-Coder 32B (Q4_K_M, ~20 GB): Best-in-class open coding model at this parameter count. Consistent tool-call formatting. Runs on a 24 GB VRAM GPU or Mac Studio with 192 GB unified memory.
DeepSeek-Coder-V2 16B (Q4, ~11 GB): Strong code generation, well-documented tool-call support, good fit for workstations with 16-24 GB VRAM.
OpenHands LM 32B: Purpose-trained for OpenHands task execution, available on Hugging Face. Published resolve rate of 37.2% on SWE-Bench Verified, among the highest reported figures for open-weight models under 70B parameters.
Llama 3.3 70B (Q4_K_M, ~44 GB): For Mac Studio Ultra with 192+ GB unified memory. Generalist but highly reliable on tool calls across diverse task types.
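The download sizes quoted for the 4-bit quantisations above can be sanity-checked with simple arithmetic: parameter count times bits per weight, divided by eight. The sketch below assumes an average of roughly 4.8 bits per weight for a mixed 4-bit scheme and a 4 GB allowance for context cache and runtime overhead. Both numbers are assumptions, the function names are hypothetical, and real file sizes vary by model and quantisation variant.

```python
def quantized_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of quantised weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits_in_memory(weights_gb: float, memory_gb: float, headroom_gb: float = 4.0) -> bool:
    """True if the weights plus an assumed allowance for cache/overhead fit."""
    return weights_gb + headroom_gb <= memory_gb


ASSUMED_BITS_PER_WEIGHT = 4.8  # assumption: rough average for a mixed 4-bit scheme

for name, params in [("32B model", 32), ("70B model", 70)]:
    size = quantized_weight_gb(params, ASSUMED_BITS_PER_WEIGHT)
    print(f"{name}: ~{size:.1f} GB, fits in 24 GB: {fits_in_memory(size, 24)}")
```

The estimate leaves a 32B model only a thin margin on a 24 GB card, so long contexts may still force some layers onto the CPU.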

Setup Overview

Prerequisites: Docker Desktop, Ollama, macOS or Linux (Windows via WSL2 works with some caveats), 16+ GB RAM.

Pull your model: ollama pull qwen2.5-coder:32b
Start Ollama with external container access: OLLAMA_HOST=0.0.0.0 ollama serve
Launch OpenHands via Docker with LLM_MODEL, LLM_BASE_URL (pointing to your local Ollama instance), and LLM_API_KEY=ollama set as environment variables. The full Docker run command is in the OpenHands documentation.
Open http://localhost:3000 in your browser.
Assign a task and observe the step-by-step execution log.
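Before launching the container, it can help to confirm that Ollama is reachable and that the model has actually been pulled. The sketch below queries Ollama's `/api/tags` endpoint on its default port 11434 and assembles the three environment variables. The helper names are hypothetical. The `ollama/` model prefix and the `host.docker.internal` address are assumptions that depend on your OpenHands version and Docker setup, so check them against the OpenHands documentation.

```python
import json
import urllib.request


def model_names(payload: dict) -> list[str]:
    """Extract model names from the JSON returned by Ollama's /api/tags."""
    return [model["name"] for model in payload.get("models", [])]


def list_ollama_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return model_names(json.load(response))


def openhands_env(model: str, base_url: str) -> dict[str, str]:
    """Build the environment variables passed to the OpenHands container.

    Assumption: the model is addressed with an "ollama/" prefix; verify this
    against the documentation for the OpenHands version you run.
    """
    return {
        "LLM_MODEL": f"ollama/{model}",
        "LLM_BASE_URL": base_url,
        "LLM_API_KEY": "ollama",
    }


# Assumed address of the host as seen from inside a Docker container.
env = openhands_env("qwen2.5-coder:32b", "http://host.docker.internal:11434")
for key, value in env.items():
    print(f"{key}={value}")
```

With Ollama running, `"qwen2.5-coder:32b" in list_ollama_models()` tells you whether the pull in the first step succeeded.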

OpenHands displays each planned action, the tool call it issues, and the result before deciding the next step, making it easy to monitor, interrupt, or redirect mid-task.

What Small Development Teams Actually Gain

Three task categories where a local autonomous agent delivers consistent value without needing frontier-model performance:

Test coverage lift. Most teams have production code with inadequate test coverage. Generating tests for existing, well-defined functions is mechanical and repetitive, exactly the profile where agentic execution works well. A developer reviews and merges; they don't write boilerplate.

Codebase documentation. "Read this 400-line module, generate docstrings for all public methods, and write a README section explaining its purpose" is an agent-friendly task. The agent reads the code, generates the documentation, and stages a commit. The team reviews.

Dependency modernisation. Updating deprecated API calls across a codebase is tedious and error-prone when done manually. An agent walks the files systematically, applies changes, runs the test suite after each file, and stops when it encounters ambiguous cases.
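The file-by-file pattern in the dependency example can be outlined as follows. This is a hypothetical sketch of the control flow, not code from OpenHands: `migrate_files` and its callbacks are invented for illustration, and stopping on a failed test run is an assumption about sensible behaviour.

```python
from typing import Callable, Iterable, Optional


def migrate_files(
    files: Iterable[str],
    apply_change: Callable[[str], str],
    tests_pass: Callable[[], bool],
) -> tuple[list[str], Optional[str]]:
    """Walk files in order; stop at the first ambiguous case or failing test run.

    apply_change returns "changed", "unchanged", or "ambiguous" for each file.
    Returns the files migrated so far and the file that needs human attention.
    """
    migrated: list[str] = []
    for path in files:
        outcome = apply_change(path)
        if outcome == "ambiguous":
            return migrated, path      # hand over to a human
        if outcome == "changed":
            if not tests_pass():
                return migrated, path  # assumed behaviour: stop on red tests
            migrated.append(path)
    return migrated, None


# Scripted outcomes standing in for the agent's per-file decisions.
outcomes = {"a.py": "changed", "b.py": "unchanged", "c.py": "ambiguous", "d.py": "changed"}
done, stopped_at = migrate_files(outcomes, lambda path: outcomes[path], lambda: True)
print(done, stopped_at)
```

Running the tests after each changed file keeps any failure attributable to a single edit, which is what makes the later human review quick.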

The honest limits: OpenHands performs worse on under-specified tasks, on changes requiring deep knowledge of an unfamiliar framework, and on large multi-file architectural rewrites. Human review before merging is not optional. It is the intended workflow, and the agent surfaces its uncertainty clearly.

IP Protection and GDPR Compliance

For European businesses, source code has a regulatory dimension beyond its commercial value. It often contains business logic, database schemas, and internal API structures that qualify as trade secrets under the EU Trade Secrets Directive (2016/943). Routing it through a cloud API creates a processing chain that may span multiple jurisdictions, sub-processors, and data transfer agreements.

Running OpenHands with a local Ollama backend eliminates this chain entirely. Nothing leaves the network. The data minimisation principle of GDPR Article 5(1)(c) is supported by architecture rather than by policy alone. There is no third-country transfer chapter to draft and no Data Processing Agreement to negotiate with an AI provider for your source code.

For companies subject to sector-specific regulations (financial services, healthcare, legal), this architectural property can be the difference between a compliant deployment and one requiring extensive legal review.

Next Steps

A local coding agent stack is no longer a weekend project. OpenHands and Ollama can be set up in an afternoon, and the first real task reveals whether the approach fits your workflow and hardware.

If you want to evaluate this for a European SMB context, including hardware sizing, model selection, and integration with your existing CI/CD toolchain, Freshlab offers structured pilot projects.
