Microsoft Foundry Local: Local LLM Runtime Now GA

13. Jun 2026 English 5 min read Also in: Deutsch, Español

local-llm microsoft on-premise

Microsoft has moved Foundry Local to general availability, a local LLM runtime that runs entirely on-device, with no cloud dependency, no per-token costs, and no data transmitted to external services. For SMBs that have been waiting for a production-grade local AI option that works on Windows, not just on Mac or Linux, this changes the calculus significantly.

A developer in the community put it simply on X: "Microsoft is officially joining the local LLM trend with Foundry Local", and described it as "more or less the same as Ollama or LM Studio." That framing is accurate for the basic use case, though the enterprise differentiators matter for teams choosing a runtime for production deployment.

What Foundry Local Actually Is

Foundry Local is a roughly 20 MB native runtime library that can be embedded directly into desktop applications, developer tools, or enterprise software. It exposes an OpenAI-compatible endpoint for chat completions and audio transcription, running locally, with no network round-trip, no usage-based billing, and no API key required.

The runtime automatically detects available hardware and selects the optimal execution provider: NVIDIA CUDA, AMD GPUs, Intel NPUs, Qualcomm NPUs (on Snapdragon-based Windows Copilot+ PCs), or CPU as a fallback. This automatic hardware optimisation is a meaningful difference from Ollama, which currently does not support NPU acceleration on Windows. For SMBs running modern Windows laptops with dedicated AI processing units, Foundry Local may deliver noticeably faster inference out of the box.

Models download on first use, are cached locally for instant subsequent launches, and the runtime selects the best-performing variant for the specific hardware configuration.

Getting Started in Two Commands

On Windows:

winget install Microsoft.FoundryLocal
foundry model run phi-4-mini

After the initial download, phi-4-mini, Microsoft's compact 3.8-billion-parameter model, runs as an interactive CLI chat. For applications that need an OpenAI-compatible API endpoint, foundry service start exposes the server at http://localhost:5273/v1/.

On macOS (Apple Silicon), a native ARM64 installer handles setup. On Linux, the x64 CLI is available. In all three cases, the setup requires no Docker containers, no Python virtual environments, and no manual CUDA configuration, a genuine barrier-reduction compared to earlier local AI tooling.

Supported Models

Foundry Local ships with a curated model library. Based on available documentation, current supported families include:

Microsoft Phi-4 and Phi-4-mini: Compact Small Language Models (3.8B, 14B parameters) with strong reasoning performance, designed to run on consumer hardware
Qwen 3.5 (Alibaba): General-purpose model with a 256K-token context window
DeepSeek-R1-Distill: Reasoning-optimised variants from 1.5B to 14B parameters
Mistral: Established open-weight model for general workloads

The trade-off compared to Ollama is clear: Ollama supports nearly any model in GGUF format, Gemma 4, Llama 4, Qwen 3.5, Command R+, and hundreds more. Foundry Local is deliberately selective, but the models it does support are more deeply optimised and actively maintained by Microsoft.

Foundry Local vs. Ollama: Honest Comparison

Both tools are worth running. They address slightly different needs:

Criterion	Foundry Local	Ollama
Platforms	Windows, macOS, Linux	macOS, Linux, Windows
NPU acceleration	Yes (Intel, Qualcomm)	No
Model selection	Curated (~20 models)	Open (GGUF, 500+)
App embedding	~20 MB library	HTTP daemon (separate process)
OpenAI-compatible API	Yes	Yes
Enterprise support	Microsoft support channel	Community
Licence	Microsoft EULA	MIT (open source)

For teams building Windows applications and wanting to embed the LLM stack directly, Foundry Local is the more natural fit. For users who need maximum model flexibility or primarily work on Apple Silicon, Ollama remains the broader choice. The two are not mutually exclusive: because both expose an OpenAI-compatible API, applications can be switched between runtimes without code changes.

GDPR and Data Sovereignty

For European SMBs operating under GDPR, the architectural point matters more than the performance delta: Foundry Local sends no requests to external APIs. Prompts, context, and responses remain on the local device. For workloads involving employee data, customer queries, or commercially sensitive information, this is not a convenience feature, it is a compliance prerequisite.

Based on our reading of current GDPR guidance, running inference entirely on-premises means personal data is not transferred to a third-party data processor, which simplifies the data processing inventory and removes the need for data processing agreements with AI vendors. When the EU AI Act's deployer obligations under Article 26 fully apply (currently deferred to December 2027 for Annex III systems), local deployments face a materially lighter documentation burden than cloud-routed equivalents.

For an overview of how local AI supports GDPR compliance in practice, and what data sovereignty looks like in SMB deployments, we have documented both topics on our site.

What This Means for SMBs in Practice

The GA release of Foundry Local removes several common blockers for SMBs exploring local AI:

No dedicated server required: A Windows PC with sufficient RAM handles initial testing with Phi-4-mini (as reported by practitioners, 8 GB RAM suffices for the 3.8B model)
No container expertise needed: A single winget install and one command gets a model running
Clear scaling path: The same Foundry API surface can be pointed at Azure AI Foundry for cloud-scale workloads, with no application code changes
Predictable cost: No per-token charges, no cloud subscription, one-time hardware cost and ongoing electricity

For European SMBs with access to digitalisation support schemes, the German BAFA Digital module, KfW SMB programmes, or pan-EU equivalents, a documented on-premises AI pilot can serve as the foundation for a funding application before committing to larger hardware investments such as a dedicated GPU workstation.

If you want to explore how Foundry Local or a comparable local AI stack would work in your organisation, we are happy to walk through the concrete steps, request a pilot project.