Introduction
Carve one NVIDIA GPU into memory-isolated slices for multiple containers — no Kubernetes, no driver patches.
GPU Shards is a self-hosted toolkit that partitions a single NVIDIA GPU into memory-isolated slices ("shards") so multiple containers can share one card without stepping on each other. It runs on infrastructure you own or control — there is no cloud, no account, and no telemetry.
What you get
- Memory-level isolation: Each container is pinned to a fixed slice of GPU memory and cannot exceed it, powered by the Project-HAMi `libvgpu` library. Workloads still run on the same physical card; this is isolation at the memory level, not hardware-level partitioning like MIG.
- No Kubernetes: A one-line installer wires up Docker, the NVIDIA Container Toolkit, and the management panel. No cluster, no operators.
- Stock CUDA images: Workloads run against the real driver with unmodified CUDA images. The memory cap is transparent to the code inside the container.
- A web panel: Pick a GPU instance, allocate shards, configure a container, and deploy from your browser.
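To make the idea of fixed memory slices concrete, here is a minimal sketch of the arithmetic behind a shard plan. It is not part of GPU Shards: `ShardRequest` and `validate_plan` are hypothetical names, and the rule that slices may not add up to more than the card's memory is an assumption this page does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardRequest:
    """Hypothetical description of one container's memory slice."""
    container: str
    memory_mib: int


def validate_plan(total_mib: int, requests: list[ShardRequest]) -> int:
    """Return the unallocated MiB, or raise ValueError if the plan cannot fit.

    Assumes no oversubscription: the slices together must fit on the card.
    """
    for req in requests:
        if req.memory_mib <= 0:
            raise ValueError(f"{req.container}: shard size must be positive")
    used = sum(req.memory_mib for req in requests)
    if used > total_mib:
        raise ValueError(
            f"plan needs {used} MiB but the card has {total_mib} MiB"
        )
    return total_mib - used
```

For a 24 GiB card (24576 MiB), a plan with one 8192 MiB shard and one 12288 MiB shard leaves 4096 MiB unallocated.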
How it fits together
The installer sets up three pieces on a single Ubuntu host:
| Component | Port | Role |
|---|---|---|
| Backend (FastAPI) | 8000 | Orchestrates containers and enforces shard limits |
| Frontend (Next.js) | 3000 | The management panel |
| libvgpu image | n/a | The slim CUDA image with HAMi-core baked in |
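After installation, one quick way to confirm that the two services are listening is a plain TCP probe of the ports in the table. The sketch below uses only the Python standard library. The host `127.0.0.1` is an assumption (adjust it if the panel is bound to another interface), and `check_components` is a hypothetical helper, not part of the toolkit.

```python
import socket

# Ports taken from the component table; the default host is an assumption.
COMPONENTS = {"backend": 8000, "frontend": 3000}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_components(host: str = "127.0.0.1") -> dict[str, bool]:
    """Probe every known component port and report which ones accept connections."""
    return {name: is_port_open(host, port) for name, port in COMPONENTS.items()}
```

Calling `check_components()` returns a dictionary keyed by component name. An open port only shows that something is listening there, not that the service behind it is healthy.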
Requirements
- Ubuntu 22.04 or newer
- An NVIDIA GPU with a working driver (verify with `nvidia-smi`)
- Docker (the installer can set this up for you)
GPU Shards interacts with GPU drivers at a low level. Test it on a non-production host first.
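The requirements above can be checked with a short preflight script before running the installer. This is a sketch rather than anything the installer ships: `parse_os_release`, `is_supported_ubuntu`, and `preflight` are hypothetical helpers. It only confirms that `nvidia-smi` and `docker` are on the `PATH`; running `nvidia-smi` yourself is still the way to confirm the driver actually works.

```python
import shutil


def parse_os_release(text: str) -> dict[str, str]:
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")
    return info


def is_supported_ubuntu(info: dict[str, str], minimum=(22, 4)) -> bool:
    """Return True for Ubuntu with VERSION_ID at or above the minimum."""
    if info.get("ID") != "ubuntu":
        return False
    try:
        major, minor = (int(p) for p in info.get("VERSION_ID", "").split(".")[:2])
    except ValueError:
        # Missing or malformed version string.
        return False
    return (major, minor) >= minimum


def preflight(os_release_path: str = "/etc/os-release") -> dict[str, bool]:
    """Report which of the listed requirements appear to be met on this host."""
    try:
        with open(os_release_path, encoding="utf-8") as fh:
            info = parse_os_release(fh.read())
    except OSError:
        info = {}
    return {
        "ubuntu_22_04_or_newer": is_supported_ubuntu(info),
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "docker_on_path": shutil.which("docker") is not None,
    }
```

Since the installer can set up Docker for you, a `False` value for `docker_on_path` is informational rather than a blocker.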
Ready? Head to the Quick Start.