
Introduction

Carve one NVIDIA GPU into memory-isolated slices for multiple containers — no Kubernetes, no driver patches.

GPU Shards is a self-hosted toolkit that partitions a single NVIDIA GPU into memory-isolated slices ("shards") so multiple containers can share one card without stepping on each other. It runs on infrastructure you own or control — there is no cloud, no account, and no telemetry.

What you get

Memory-level isolation — Each container is pinned to a fixed slice of GPU memory and cannot exceed it, powered by the Project-HAMi libvgpu library. Workloads still run on the same physical card — this is isolation at the memory level, not hardware-level partitioning like MIG.
No Kubernetes — A one-line installer wires up Docker, the NVIDIA Container Toolkit, and the management panel. No cluster, no operators.
Stock CUDA images — Workloads run against the real driver with unmodified CUDA images. The memory cap is transparent to the code inside the container.
A web panel — Pick a GPU instance, allocate shards, configure a container, and deploy from your browser.
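
To make the fixed-slice idea concrete, here is a minimal Python sketch of the budgeting arithmetic involved in carving one card into shards. It is an illustration only, not GPU Shards' actual allocator: the helper name `plan_shards`, the MiB units, and the 24 GiB card in the example are all assumptions.

```python
from __future__ import annotations


def plan_shards(total_mib: int, requests: dict[str, int]) -> int:
    """Check that the requested shards fit on one card; return leftover MiB.

    Hypothetical helper for illustration; not part of GPU Shards.
    """
    for name, size in requests.items():
        if size <= 0:
            raise ValueError(f"shard {name!r} must request a positive size")
    used = sum(requests.values())
    if used > total_mib:
        raise ValueError(
            f"requested {used} MiB but the card only has {total_mib} MiB"
        )
    return total_mib - used


if __name__ == "__main__":
    # Assumed example: a 24 GiB card split between three containers.
    leftover = plan_shards(
        24 * 1024,
        {"train": 12 * 1024, "infer": 8 * 1024, "dev": 2 * 1024},
    )
    print(f"{leftover} MiB unallocated")  # prints "2048 MiB unallocated"
```

Because each slice is fixed, the sizes simply add up against the card's total; a request that would push the sum past the total is rejected here rather than overcommitted.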

How it fits together

The installer sets up three pieces on a single Ubuntu host:

| Component | Port | Role |
| --- | --- | --- |
| Backend (FastAPI) | `8000` | Orchestrates containers and enforces shard limits |
| Frontend (Next.js) | `3000` | The management panel |
| `libvgpu` image | n/a | The slim CUDA image with HAMi-core baked in |
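
After installation, one quick way to confirm that the two services are up is to probe their ports. The sketch below is a generic TCP check, not a GPU Shards tool. The port numbers come from the table above; probing `127.0.0.1` is an assumption, so substitute your host's address if the services are bound elsewhere.

```python
import socket

# Ports taken from the component table; the host used below is an assumption.
SERVICES = {"backend": 8000, "frontend": 3000}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "listening" if is_port_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

An open port only shows that something is listening; it does not prove the service behind it is healthy.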

Requirements

Ubuntu 22.04 or newer
An NVIDIA GPU with a working driver (verify with `nvidia-smi`)
Docker (the installer can set this up for you)
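
These requirements can be checked before running the installer. The following Python sketch is one possible preflight script, not something shipped with GPU Shards; the function names are hypothetical. It reads the standard `/etc/os-release` file and looks for `nvidia-smi` and `docker` on the `PATH`.

```python
from __future__ import annotations

import shutil


def parse_os_release(text: str) -> dict[str, str]:
    """Parse /etc/os-release style KEY=value lines into a dict."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")
    return info


def ubuntu_is_supported(info: dict[str, str], minimum=(22, 4)) -> bool:
    """True if ID is ubuntu and VERSION_ID is at least 22.04."""
    if info.get("ID") != "ubuntu":
        return False
    try:
        major, minor = (int(p) for p in info.get("VERSION_ID", "").split(".")[:2])
    except ValueError:
        return False
    return (major, minor) >= minimum


if __name__ == "__main__":
    try:
        with open("/etc/os-release", encoding="utf-8") as fh:
            info = parse_os_release(fh.read())
    except OSError:
        info = {}
    print("Ubuntu 22.04+:", ubuntu_is_supported(info))
    print("nvidia-smi on PATH:", shutil.which("nvidia-smi") is not None)
    print("docker on PATH:", shutil.which("docker") is not None)
```

Finding `nvidia-smi` on the `PATH` does not show that the driver works, so still run `nvidia-smi` itself and confirm that it lists your GPU.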

GPU Shards interacts with GPU drivers at a low level. Test it on a non-production host first.

Ready? Head to the Quick Start.
