How to Self-Host an AI App With Docker
Packaging an AI app into Docker containers so it runs the same on your laptop and your server — images, compose, GPU access and the real trade-offs.

How do you self-host an AI app with Docker?
You package the app and its dependencies into Docker images, describe how the pieces run together with Docker Compose, then run those containers on any machine you control. Because the image carries the runtime and libraries with it, the app behaves the same on your laptop and your server as long as both use a compatible CPU architecture, which removes most of the classic "works on my machine" failures that plague AI projects with their heavy, fussy dependency trees.
Self-hosting with Docker gives you full control and predictable cost — you rent a server, not a per-token API — at the price of running the infrastructure yourself. For teams that want their data on their own hardware or need a specific model and runtime, it is the right call.
Containerise the app
The core idea is one image per service. A typical AI app has a backend that loads or calls a model, perhaps a separate model-serving container, and a database.
- Write a Dockerfile that installs your runtime, copies your code and declares the start command.
- Pin dependency versions so a rebuild months later produces the same image.
- Keep images lean — use a slim base and multi-stage builds so the final image carries only what it needs to run.
- Pass configuration and secrets in as environment variables, never baked into the image.
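The last point in the list can be sketched as a Compose fragment. This is a minimal sketch rather than a definitive setup: the service name `backend` and the variables `MODEL_NAME`, `LOG_LEVEL` and `API_KEY` are hypothetical placeholders, not names from any particular app.

```yaml
# compose.yaml (fragment): configuration is injected at run time,
# not baked into the image. All names here are hypothetical.
services:
  backend:
    build: .
    environment:
      # Plain, non-secret settings can live in the file.
      MODEL_NAME: "example-model"
      LOG_LEVEL: "info"
      # Read from the shell or from a .env file next to compose.yaml.
      # Compose stops with this message if the variable is unset or empty.
      API_KEY: "${API_KEY:?set API_KEY before starting the stack}"
```

If you use a `.env` file for the secret, keep it out of version control and list it in `.dockerignore` so it never ends up in the build context.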
Wire the services together with Compose
Docker Compose describes your whole stack in one file: the backend, the model server, the database and the network between them. With one command, Compose starts everything and connects the containers so they can reach each other by service name. Start order follows the dependencies you declare, and Compose waits for a service to be ready, rather than merely started, only if you give that service a health check and make the dependent services wait on it. This is what makes the setup reproducible: a teammate or a fresh server runs the same command and gets the same environment. Use named volumes for anything that must survive a restart, such as your database files and downloaded model weights, so recreating a container does not wipe your data.
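A stack like the one described might look roughly like this. It is a sketch under stated assumptions: PostgreSQL stands in for "a database", `example/model-server:1.0` is a hypothetical image, and the ports, variable names and URLs are placeholders to adapt.

```yaml
# compose.yaml: a three-service sketch. Images, ports and variable
# names are assumptions for illustration.
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: "${POSTGRES_PASSWORD:?set POSTGRES_PASSWORD}"
    volumes:
      - db-data:/var/lib/postgresql/data   # database files survive restarts
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10

  model-server:
    image: example/model-server:1.0        # hypothetical image
    volumes:
      - model-weights:/models              # weights persist across restarts

  backend:
    build: .
    ports:
      - "8000:8000"
    environment:
      # Other services are reachable by their service name.
      DATABASE_URL: "postgresql://postgres:${POSTGRES_PASSWORD}@db:5432/postgres"
      MODEL_SERVER_URL: "http://model-server:8080"
    depends_on:
      db:
        condition: service_healthy         # wait until the health check passes
      model-server:
        condition: service_started         # only waits for the container to start

volumes:
  db-data:
  model-weights:
```

Running `docker compose up -d` in the directory holding this file starts all three services in the background.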
Giving containers GPU access
If your AI app runs a model locally rather than calling an API, it almost certainly wants a GPU. Docker can expose the host's NVIDIA GPU to a container through the NVIDIA Container Toolkit, which you install on the host alongside the NVIDIA driver. Once configured, you grant a service GPU access in your Compose file and the container can use CUDA much as if it were running on the host directly. This is the standard way to run open models (text generation, embeddings, image models) on your own hardware without rewriting anything for the container.
One detail worth getting right: model weights are large and slow to download, so do not bake them into the image or pull them on every start. Mount them from a named volume or a host path that persists, so a container restart reuses weights already on disk. This keeps your images small, your rebuilds fast, and your startup time short — and it means swapping the application code does not force a multi-gigabyte re-download of the model.
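Both ideas from this section, GPU access and persistent weights, can be sketched in one service definition. The image name, the `MODEL_DIR` variable and the `/models` path are hypothetical; the `deploy.resources.reservations.devices` block is the Compose syntax for reserving GPU devices, and it assumes the NVIDIA driver and Container Toolkit are already set up on the host.

```yaml
# compose.yaml (fragment): one GPU plus a persistent weights volume.
services:
  model-server:
    image: example/model-server:1.0   # hypothetical image
    environment:
      MODEL_DIR: /models              # hypothetical variable read by the server
    volumes:
      - model-weights:/models         # reused on every restart, not re-downloaded
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1                # number of GPUs to reserve
              capabilities: [gpu]

volumes:
  model-weights:
```

A common smoke test, assuming the container runtime exposes the utility, is to run `nvidia-smi` inside the running container and check that the GPU is listed.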
The trade-offs of self-hosting
Docker makes self-hosting tractable, but it does not make the responsibilities disappear. You now own patching the host, monitoring the containers, handling restarts and crashes, securing the server, and backing up your volumes. There is no provider quietly keeping things alive for you. Against that, you get cost predictability, data residency on your own hardware, freedom to run any model, and no per-request inference bill. For a regulated client, a data-sensitive workload, or a high-volume app where API costs would balloon, those benefits outweigh the operational load. For a quick demo, a managed platform is usually less work.
From laptop to server
The payoff of the Docker approach is that promotion is boring in the best way. The images that ran on your laptop run unchanged on the server, provided they were built for the server's CPU architecture (an image built on an ARM laptop needs an x86-64 or multi-platform build before it will run on a typical x86 server); only the environment variables and the volume locations differ. Push your images to a registry, pull them on the host, run Compose, and the app is live. Add a reverse proxy in front for TLS and routing, and you have a self-hosted AI app you fully control.
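One way to keep the server-specific differences out of the main file is a small override file layered on top of it. The sketch below assumes a base `compose.yaml` that defines a `backend` service and a `model-weights` volume; the registry address, image tag, `.env.prod` file and `/srv/models` path are all hypothetical.

```yaml
# compose.prod.yaml: server-only overrides, merged over compose.yaml.
services:
  backend:
    image: registry.example.com/acme/backend:1.4.2   # hypothetical registry and tag
    env_file:
      - .env.prod                                    # server-specific settings
    restart: unless-stopped                          # come back after crashes and reboots

volumes:
  model-weights:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/models   # host directory holding the weights; must already exist
```

On the server, `docker compose -f compose.yaml -f compose.prod.yaml pull` followed by `docker compose -f compose.yaml -f compose.prod.yaml up -d --no-build` fetches the pushed images and starts the stack without rebuilding anything locally.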
This repeatability is also what makes recovery calm. If the server dies, you provision a fresh host, pull the same images, restore your volumes from backup, and run the same Compose command — the app comes back identical, because nothing about it lived in the snowflake configuration of one machine. That discipline, keeping everything in images, Compose files and backed-up volumes rather than in undocumented manual steps, is what separates a self-hosted setup you can sleep through from one that becomes a single point of failure nobody dares touch.
Prefer it built and managed for you?
Self-hosting is the right answer often, but only if the operations around it are done properly. Talk to BSH Technologies and we will containerise your AI app, set up GPU access, and put monitoring and backups in place so it stays healthy. Explore our cloud engineering services to see how we run self-hosted workloads without the 2 a.m. surprises.
Frequently asked questions
Why use Docker to self-host an AI app?
Docker packages the app and its dependencies into images that run the same way on any machine with a compatible CPU architecture, which removes the dependency conflicts that plague AI projects. It makes the setup reproducible, simplifies moving from laptop to server, and lets you keep data and models on hardware you control with predictable cost.
How do containers get GPU access?
Install the NVIDIA driver and the NVIDIA Container Toolkit on the host, then grant a service GPU access in your Docker Compose configuration. The container can then use CUDA much as if running on the host directly. This is the standard way to run open models locally for text generation, embeddings or image tasks without code changes.
What does Docker Compose do for an AI stack?
Compose describes the whole stack — backend, model server, database and network — in one file and starts it with a single command. Containers find each other by name, and named volumes keep data and model weights across restarts. A fresh server runs the same command and gets an identical environment.
What are the downsides of self-hosting?
You own patching the host, monitoring containers, handling crashes, securing the server and backing up volumes. No provider keeps it alive for you. The upside is cost predictability, data residency, freedom to run any model and no per-request bill, which suits regulated, data-sensitive or high-volume workloads.
How do I move Docker images from laptop to server?
Push your built images to a container registry, pull them on the host, and run Docker Compose there. The images run unchanged; only environment variables and volume locations differ. Adding a reverse proxy in front handles TLS and routing, giving you a self-hosted app you fully control.