Osaurus: Run a Local AI Agent on macOS Fully Offline

6. May 2026 English 6 min read

apple-silicon ai-agents local-llm

Most tools for running AI locally on a Mac solve one specific problem: they replace cloud inference with local inference. The model runs on your hardware instead of a remote server — and that matters. But a language model without memory, without access to your tools, and without the ability to act autonomously is not an assistant. It is an expensive autocomplete.

What is missing in most local setups is continuity and agency. After every session, the model forgets who you are, what your company does, and what was discussed yesterday. For a business that needs a reliable AI assistant embedded in daily workflows, that is a hard limitation to work around.

Osaurus addresses this directly. The open-source project describes itself in its official GitHub documentation as "the native macOS harness for AI agents — any model, persistent memory, autonomous execution, cryptographic identity". AI practitioner Rohan Paul highlighted the project on X as a "native, Apple Silicon–only" solution. It is written in Swift, runs exclusively on Apple Silicon, and operates fully offline in normal use.

How Osaurus Works

Osaurus is not a language model and not a bare inference server. It is an orchestration layer: the project connects a local LLM of your choice, a structured memory system, a sandboxed execution environment, and a library of native macOS plugins into a running, autonomous agent.

Installation is straightforward:

brew install --cask osaurus

After installation, Osaurus can be launched as a native macOS UI (osaurus ui) or as a local API server (osaurus serve). The server speaks the OpenAI API format — existing integrations can be pointed at the local Osaurus endpoint without code changes.

For the model, you choose: MLX-optimized local models (recommended for full privacy), cloud providers such as OpenAI or Anthropic, or Apple Foundation Models. For privacy-first business use, local models are the only option that keeps all data on-device.

Osaurus also supports the Model Context Protocol (MCP) — as both MCP server and MCP client. Existing MCP toolkits can be plugged in immediately, without additional integration work.

Three Memory Layers

The structured memory system is Osaurus's core differentiator. According to the official GitHub documentation, the system maintains three layers:

Identity layer: The permanent profile of the agent — persists across all sessions
Pinned facts: Durably stored facts the agent can access at any time
Per-session episodes: Session-specific context that documents the progress of a task

In practice this means the agent knows who you are. You configure it once — company name, products, key contacts, preferred communication style, internal policies — and that configuration persists indefinitely. No manual context rebuilding each time. No explaining from scratch what your business does.

This is what separates Osaurus from a chat interface sitting on top of a local model: the agent operates with your organization's institutional knowledge and updates it over time.

20+ Native macOS Plugins

What sets Osaurus apart from simpler local AI projects is the native macOS integration layer. According to the GitHub documentation, more than 20 plugins are available, including:

Mail: Read, summarize, and triage incoming emails — locally, no content leaves the device
Calendar: Analyse calendar entries, detect scheduling conflicts, draft time proposals
Vision: Process and analyse images and documents directly on-device
Git: Commit analysis, automated code reviews, diff evaluation
Browser: Web access for the agent, optionally enabled as needed

Osaurus also provides voice input with on-device transcription: speech is converted to text locally — no audio is sent to an external transcription service.

The combination of persistent memory and native tool access enables workflows that are not achievable with a bare chat interface: the agent reads morning emails, ranks them by urgency, drafts replies based on stored templates, and enters confirmed appointments into the Calendar plugin — autonomously, without sending a single byte to a cloud endpoint.

Sandboxing and Cryptographic Identity

Code generated and executed by the agent runs in isolated Linux VMs via Apple's own Containerization framework. Even if the agent produces unexpected or faulty code, the execution environment is fully separated from the host operating system. The blast radius of any error is structurally limited.

Each agent also receives a cryptographic identity, implemented as secp256k1 addresses stored in iCloud Keychain. This is more than a security feature: it enables tamper-evident audit trails showing which agent performed which action and when. For regulated industries — legal, financial services, healthcare — that kind of accountability record is directly useful for compliance documentation.

What This Means for European SMBs

For companies operating under GDPR, the offline-first architecture of Osaurus is not a convenience feature — it is a structural response to data protection obligations. Customer data, contract details, internal correspondence: based on our reading of GDPR requirements, processing this kind of information through third-party cloud services requires a valid legal basis and often a Data Processing Agreement. Running the same workflows locally eliminates this layer of complexity entirely.

Concrete use cases across European mid-market businesses:

Professional services firms: Client emails are summarized and prioritized locally; the agent flags time-sensitive matters without any content leaving the office network
Engineering and construction: A locally-running agent reviews incoming tender documents, cross-references stored project criteria, and generates a summary for the project lead — on an M2 Ultra Mac Studio, no internet required
Software development teams: The Git plugin enables a fully local code review agent that comments on pull requests, identifies common error patterns, and checks against internal coding standards

System requirements are macOS 15.5 or later on Apple Silicon. A Mac Studio M2 Ultra or M3 Ultra handles sustained agent server workloads well; practitioners report MLX-optimized models of moderate size achieving 20–60 tok/s on these chips, depending on model size and context length.

On the EU AI Act side: based on our reading, a locally-deployed agent harness where no data is processed by an external provider falls into a significantly lower risk tier than cloud-based AI services. Companies looking to document AI usage for EU AI Act compliance purposes will find the cryptographic audit trail in Osaurus a useful starting point. That said, this is informational commentary — not individual compliance advice.

Getting Started

Anyone already running Ollama has the simplest path in: Osaurus is Ollama-compatible and layers on top of the existing stack. Already-downloaded models do not need to be reinstalled.

A practical first agent: a mail triage agent that runs each morning, reads incoming messages, ranks by urgency, and generates a prioritized summary with brief action notes. Configuration takes around 30 minutes; the value is immediately visible — and builds confidence in the system before more complex workflows are added.

More on local AI infrastructure for businesses is available on our local AI page and in the Kaira Toolkit. For a conversation about hardware sizing, model selection, and GDPR-compliant setup for your organization, reach out via /contact.html.