Open WebUI: Self-Hosted Local LLM Interface for Teams

open-webui self-hosted lokale-ki

Most organisations that experiment with AI assistants eventually hit the same wall: the best interfaces โ€” ChatGPT, Claude, Gemini โ€” require sending your data to external servers. For companies handling client files, internal contracts, or any category of personal data under GDPR, that is not a neutral design choice. Open WebUI exists precisely to solve this.

Version 0.9.5, released on 10 May 2026, brings Open WebUI to 138,000+ GitHub stars, making it the most widely adopted open-source AI interface for self-hosted deployments. The platform runs entirely on your own hardware, connects to locally running language models via Ollama or any OpenAI-compatible endpoint, and ships with multi-user management, built-in RAG, and enterprise authentication โ€” all without a token subscription.

What Open WebUI actually is (and is not)

Open WebUI is a browser-based chat interface designed to be completely offline-capable. It does not host or run language models itself; it acts as a polished front end that connects to a model runtime โ€” most commonly Ollama โ€” running on the same machine or network. From the user's perspective, the experience is nearly identical to ChatGPT: a familiar chat UI, conversation history, file uploads, and model switching.

The critical difference is where computation happens. Every prompt, every document chunk, every generated token stays on infrastructure you control. Nothing is transmitted to a third-party API unless you explicitly configure an external backend. For organisations under the EU AI Act's deployer obligations (Article 26), this distinction matters: local deployments give you full audit trails and technical control over AI system outputs.

Installation: one Docker command

Provided Docker is already installed, the full setup takes under five minutes:

docker run -d -p 3000:80 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Open http://localhost:3000 in a browser and create the first account โ€” it automatically receives administrator privileges. If Ollama is already running on the same machine, point Open WebUI to http://host.docker.internal:11434 and all locally installed models appear immediately in the model selector.

For production environments, the community recommends switching the default SQLite database to PostgreSQL and configuring an S3-compatible object store for document uploads. Kubernetes deployments via Helm are also officially supported, making it straightforward to integrate Open WebUI into existing DevOps pipelines.

Choosing the right model

Open WebUI is backend-agnostic, connecting to Ollama, vLLM, LM Studio, LocalAI, or any OpenAI-compatible endpoint. This means the interface stays consistent while you swap models as requirements evolve.

Practitioners report the following configurations as production-viable in 2026:

  • Llama 3.3 (70B, quantised): Strong all-round quality for drafting, summarisation, and internal Q&A. Requires a Mac Studio M3 Ultra (192 GB unified memory) or a GPU server with 80+ GB VRAM. Reported inference speeds: 20โ€“40 tok/s depending on quantisation level and prompt length.
  • Qwen 2.5 (32B): Particularly strong on structured tasks, multilingual text, and code. Runs on a Mac Studio M2 Ultra or a server with 64 GB RAM.
  • Gemma 4 (12B): A practical choice for teams with modest hardware โ€” M3 Pro laptops (18โ€“36 GB unified memory) or servers with 32 GB RAM. Suitable for email drafting, meeting summaries, and simple document lookups.
  • DeepSeek-V3 (MoE): Reported as a strong coding and reasoning model that fits on hardware typically used for 30B dense models, thanks to its Mixture-of-Experts architecture.

All of these models can be pulled directly from within the Open WebUI admin panel once Ollama is connected, without touching a command line.

RAG: querying company documents with natural language

Open WebUI's integrated RAG (Retrieval Augmented Generation) engine is one of the features that most clearly separates it from simpler chat interfaces. Users can upload documents โ€” PDFs, Word files, spreadsheets, presentations โ€” directly into a conversation or into a persistent shared knowledge base accessible to the whole team.

The platform automatically chunks documents, generates embeddings, and stores them in a local vector database. Nine vector backends are supported, including ChromaDB, Qdrant, PGVector, Milvus, Elasticsearch, and OpenSearch. Content extraction from scanned documents is handled by Tika, Docling, or optional OCR backends โ€” all running locally.

The practical result: a law firm's staff can ask "summarise the indemnity clauses across these five contracts" and receive a cited answer within seconds, without a single character of document content leaving the firm's network. The same applies to HR teams reviewing CVs, finance teams querying balance sheets, or manufacturing teams looking up maintenance procedures from technical manuals.

This is the architecture we describe in more depth on our local AI overview and data sovereignty page.

Multi-user management and access control

Open WebUI implements a three-tier permission model: administrators manage models, system configuration, and user accounts; standard users work within the boundaries set by admins; pending accounts await approval before activation.

For organisations with existing identity infrastructure, v0.9 adds full LDAP and Active Directory integration, SCIM 2.0 automated provisioning compatible with Okta, Azure AD, and Google Workspace, and SSO via OAuth providers. New hires gain the right access automatically on day one; leavers lose it the moment they are deprovisioned in the identity provider โ€” without any manual Open WebUI intervention.

Each user's conversation history is isolated by default. Shared prompts, shared knowledge bases, and shared model configurations are opt-in, controlled by administrators. This gives organisations a level of access governance that most cloud AI subscriptions do not offer.

What it costs

The open-source core of Open WebUI is free. Costs consist of one-time hardware investment, electricity, and โ€” optionally โ€” an Enterprise Plan that adds custom theming, SLA support, and long-term support (LTS) versions.

For teams of five or more users, the total cost of ownership of a self-hosted setup typically compares favourably to commercial AI subscriptions within 12โ€“18 months, based on our reading of community cost analyses. The exact timeline depends on hardware choice, usage volume, and whether an Enterprise Plan is required.

Organisations in EU member states may find local AI infrastructure investments eligible for national or EU-level digitisation funding programmes. We recommend individual verification with a qualified advisor, as eligibility depends on company size, sector, and programme-specific criteria.

A breakdown of how pilot projects are structured and budgeted is available on our pilot project page.

Who should use Open WebUI today?

Open WebUI is a strong fit for:

  • Teams of 3 or more who need a shared, governed AI workspace without external accounts
  • Regulated industries โ€” legal, healthcare, finance, HR, manufacturing โ€” where data residency is non-negotiable
  • Companies with existing cloud-AI restrictions driven by legal, IT security, or procurement policy
  • Organisations scaling internal AI who want to avoid token costs growing linearly with usage

It is worth being precise about limitations: Open WebUI does not improve model quality. The quality of responses depends entirely on the chosen LLM. Teams that need specialised domain knowledge โ€” internal jargon, proprietary processes โ€” should consider combining Open WebUI with RAG pipelines over curated internal documents, or, for deeper adaptation, LoRA fine-tuning.

For a broader view of the local AI stack โ€” hardware, model selection, RAG, and orchestration โ€” visit our local AI solution page.


If you want to evaluate Open WebUI for your organisation but are unsure which hardware configuration, model, or deployment approach fits your use case, contact us. We help SMBs move from a first local AI test to a production deployment that meets both technical and compliance requirements.