Vitalik Buterin's Local LLM Setup Validates Freshlab

22 Apr 2026 · 6 min read

local-llm security self-sovereign

In early April, Vitalik Buterin published a post that rippled through technical circles: "My self-sovereign / local / private / secure LLM setup, April 2026". It is neither a manifesto nor marketing, but a plain-spoken field report on how he personally uses AI and why he keeps it strictly local, sandboxed and disconnected from the cloud.

The post matters to us at Freshlab for a specific reason. Buterin argues from the same premises we have been building infrastructure on for years: data sovereignty first, inference on the user's own hardware, minimum attack surface. He comes from the crypto and systems-security corner; we come from European SMB consulting. The conclusions overlap almost completely. The original post is available at vitalik.eth.limo (securellms.html).

For European SMBs trying to reconcile GDPR, the EU AI Act and productive AI use, whether a German GmbH, a Spanish PYME or a small Danish company, Buterin's post offers independent confirmation of a technical direction we have been executing for over a year.

The core thesis

Buterin opens with a stocktake that he treats as an alarm signal. His example is "OpenClaw", standing in for a new generation of agent systems that act in the browser, on the desktop or directly at the operating-system level. These agents change critical settings without confirmation. Malicious web content can be enough to trigger arbitrary code execution. Prompt injection is not an edge case; it is the default.

His diagnosis is blunt: the ecosystem as a whole is casual about security and privacy. Convenience dominates, telemetry-by-default is accepted, and "we don't store anything except for training purposes" has become the standard excuse.

Against that, Buterin sets three non-negotiable requirements:

Privacy — no data leaves the device unless the user explicitly wants it to.
Security — dangerous operations are sandboxed by default, not by opt-in.
Self-sovereignty — no vendor lock, no kill switch, no mandatory cloud account.

This is not an ideological position but an engineering requirement. It describes the conditions under which an AI system can be taken seriously as part of real infrastructure at all, whether for an individual or for a company with confidentiality obligations.

What he recommends

The practical part of the post is a detailed toolbox. Buterin lists the hardware options he has tested himself:

NVIDIA RTX 5090 laptop — around 90 tokens/sec, the mobile high-end option.
AMD Ryzen AI Max with 128 GB unified memory — around 51 tokens/sec, interesting because of the large memory pool.
NVIDIA DGX Spark — disappointing for the price, not recommended.
Hardware pooling — he explicitly suggests teaming up with trusted people at a static IP if individual investment is too high.
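To put the throughput figures in perspective, the arithmetic below converts tokens per second into the wait for a single reply. It is a minimal sketch that assumes a constant decode speed and ignores prompt processing; the 1,000-token reply length is an illustrative assumption, not a figure from Buterin's post.

```python
def generation_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Seconds to emit `output_tokens` at a steady decode speed (prompt processing ignored)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return output_tokens / tokens_per_sec


# Throughput figures as quoted in the post; the 1,000-token reply is an assumption.
for name, tps in [("RTX 5090 laptop", 90.0), ("Ryzen AI Max, 128 GB", 51.0)]:
    print(f"{name}: {generation_seconds(1000, tps):.1f} s per 1,000 tokens")
```

At the quoted speeds, a 1,000-token answer takes roughly 11 seconds on the RTX 5090 laptop and roughly 20 seconds on the Ryzen AI Max machine.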

On the software side he runs NixOS as a reproducible, declarative operating system. For inference he uses llama-server behind llama-swap, loading different models on demand. For multimodal work he uses ComfyUI, tested with Qwen-Image and Hunyuan Video 1.5. His workhorse model is Qwen 3.5:35B, which in his practice delivers the best balance between speed and capability.
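The llama-swap pattern is simple to state: a proxy sits in front of the inference server, reads which model a request asks for, and starts or swaps the backend only when that model is not already loaded. The toy model below illustrates that behaviour under the simplifying assumption that exactly one model is resident at a time. The class, method and model names are hypothetical and do not reflect llama-swap's actual code or configuration.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ModelSwapper:
    """Toy model of an on-demand swapping proxy that keeps at most one model loaded.

    Hypothetical illustration only: not llama-swap's code or configuration format.
    """
    available: Set[str]
    loaded: Optional[str] = None
    load_count: int = 0

    def route(self, model: str) -> str:
        """Return the model that will serve this request, swapping if necessary."""
        if model not in self.available:
            raise KeyError(f"unknown model: {model}")
        if model != self.loaded:
            # Stands in for stopping the current backend and starting one for `model`.
            self.loaded = model
            self.load_count += 1
        return self.loaded


# Hypothetical model names; four requests trigger three loads.
swapper = ModelSwapper(available={"chat-model", "code-model"})
for requested in ["chat-model", "chat-model", "code-model", "chat-model"]:
    swapper.route(requested)
print(swapper.loaded, swapper.load_count)  # chat-model 3
```

Repeated requests for the same model reuse the loaded one, so the costly load happens only on a switch. That is what makes one machine practical for several models, at the price of a delay whenever the requested model changes.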

Four mechanisms run through his architecture:

Local-first — inference stays on the user's hardware.
Comprehensive sandboxing — dangerous operations are isolated so malicious content cannot compromise the host.
Substitution of dependencies — local AI replaces third-party libraries where possible, reducing the attack surface.
Air-gap capable — offline operation is supported.
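Sandboxing by default implies a deny-by-default decision in front of every agent action. The actual isolation happens at the operating-system level (containers, virtual machines, separate user accounts); the sketch below only illustrates the policy gate in front of it. The action table is hypothetical: anything unlisted is refused, and risky actions require explicit confirmation.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"  # runs inside the sandbox without prompting
    ASK = "ask"      # runs only after explicit user confirmation
    DENY = "deny"    # never runs


# Hypothetical policy table; the action names are illustrative.
POLICY = {
    "read_workspace_file": Decision.ALLOW,
    "write_workspace_file": Decision.ASK,
    "run_shell_command": Decision.ASK,
}


def decide(action: str) -> Decision:
    """Deny by default: any action missing from the table is refused."""
    return POLICY.get(action, Decision.DENY)


print(decide("read_workspace_file").value)     # allow
print(decide("change_system_settings").value)  # deny
```

The design choice is the default in `POLICY.get`: an agent that invents a new action, for example after a prompt injection, gets a refusal rather than a silent execution.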

Buterin himself sums it up plainly: the setup is "a starting point for a space that desperately needs to exist, not a description of a finished product." He also makes clear that he departs from the mainstream view in security research, which is comfortable with corporate data access: he describes himself as "deeply opposed to normalizing 'feeding your entire life to cloud-based AI'."

Where Freshlab's approach differs

We have been building the same underlying principle on different hardware since 2024. Our standard configuration for SMBs is the Mac Studio M3 Ultra, not the RTX 5090. As inference runtime we use Ollama (and MLX directly for specific workloads), not llama-server/llama-swap. As operating system we run macOS, not NixOS.
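With Ollama, local inference concretely means that the application talks to a server on the same machine. The sketch below builds a non-streaming request to Ollama's `/api/generate` endpoint on its default local port 11434, using only the Python standard library. The model tag is a placeholder for whichever model is installed, and `generate` needs a running Ollama server.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text from the JSON `response` field."""
    with urllib.request.urlopen(build_generate_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

A call such as `generate("<model-tag>", "Summarise this clause: ...")` returns the answer as a string; the prompt travels only over the loopback interface.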

These are real differences, not cosmetic — and each has a reason:

One box, not a rack. The Mac Studio is a single unit. No ATX case, no tower, no separate 1000 W power supply. For an office in Berlin or Valencia, that difference decides whether the system gets adopted.
Noise and heat. The Mac Studio's cooling is essentially silent. An RTX 5090 system under load is audible and heats the room.
Fewer driver fragility points. Apple's toolchain for Apple Silicon is tightly integrated. A CUDA stack plus a NixOS configuration plus kernel-module updates is a skill-dependent environment that many SMB customers simply do not have in-house.
Unified memory up to 192 GB. The Mac Studio M3 Ultra can be configured up to 192 GB for around 5,800 euros for hardware alone. That is enough headroom to run models in the 70B–120B range locally without the complexity of multi-GPU setups.
Lifecycle. Apple's 5–7 year horizon for security updates and its closed firmware chain are easier to document for compliance audits than a self-rolled NixOS build.
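The memory claim can be checked with back-of-the-envelope arithmetic: a model's weights need roughly parameters × bits per weight ÷ 8 bytes. The sketch below applies that formula to the 70B–120B range, assuming nominal bit widths and a hypothetical 32 GB reserved for the KV cache, the operating system and the runtime.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 10**9 bytes)."""
    # bytes = parameters * bits / 8; the 10**9 of "billion" and of "GB" cancel out.
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float = 192.0, headroom_gb: float = 32.0) -> bool:
    """True if the weights fit while leaving `headroom_gb` (an assumption) for everything else."""
    return weight_memory_gb(params_billion, bits_per_weight) + headroom_gb <= memory_gb


for params in (70, 120):
    for bits in (16, 8, 4):
        size = weight_memory_gb(params, bits)
        print(f"{params}B @ {bits}-bit: {size:.0f} GB, fits: {fits(params, bits)}")
```

Under these assumptions a 70B model fits even at 16-bit (about 140 GB), while a 120B model needs 8-bit or lower quantization (about 120 GB at 8-bit versus 240 GB at 16-bit). Real quantized files use somewhat more than the nominal bits per weight, and long contexts enlarge the KV cache, so the headroom matters.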

We consider Buterin's stack technically excellent, but it is built for someone who reads NixOS configs the way other people read a newspaper. Our stack is built for the managing director of a 40-person company who wants to put a device on the shelf, plug it in and have it run.

The real agreement

The differences in detail (RTX vs. M3 Ultra, NixOS vs. macOS, llama-server vs. Ollama) can easily obscure what the two architectures actually share. And that is the important part:

Inference stays local. No prompts, no documents, no embeddings leave the device.
Sandboxing is mandatory, not optional. Agent actions run in isolated environments, not with full host permissions.
No cloud exfiltration. Not during operation, not "for training purposes later".
Compliance-ready by design. GDPR, the EU AI Act, industry-specific regulators — all of these regimes get simpler when the data never leaves the company's control.
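The "no cloud exfiltration" rule can be backed by a simple configuration check at startup: refuse any inference endpoint that does not point to the machine itself or the internal network. The function below is a minimal sketch of such a check; its name and policy are assumptions, and it complements, rather than replaces, egress rules on the firewall.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """True if the URL points to localhost or a loopback/private-range IP address."""
    host = urlparse(url).hostname  # lower-cased, IPv6 brackets stripped
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Any other hostname is treated as potentially external and refused.
        return False
    return ip.is_loopback or ip.is_private
```

Hostnames other than `localhost` are rejected rather than resolved, which keeps a DNS change from quietly redirecting prompts to an external service.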

For a European SMB the practical consequence is concrete: the AI infrastructure lives in the company's own server room or a trusted local data centre. Data-processing agreements shrink. The answer to "where is my input processed?" is: here, in the building. National funding programmes — the German BAFA digital subsidy, Spain's Kit Digital, Danish SMV:Digital — apply more cleanly to this kind of investment because there is a real capex line (hardware, setup, training) and no recurring cloud drain.

Buterin's post is valuable to us precisely because it is a second, independent voice. Someone arriving from a completely different direction — Ethereum, cryptography, systems research — lands at the same architecture. That is convergence, not coincidence. Local inference, sandboxing and trust minimisation are no longer a niche; they are the sensible default for serious AI use.

For a more detailed look at data sovereignty, see /data-sovereignty.html. A technical overview of local models is at /local-ai.html.

Next step

If you have read Buterin's post and now wonder how the same architecture can be implemented in a European SMB with 20 to 200 employees, including hardware, setup, training and compliance documentation, start with our pilot package and request a pilot project.