Troubleshooting
Common issues when installing and running GPU Shards.
nvidia-smi works on the host but not in containers
The NVIDIA Container Toolkit is likely not configured as the Docker runtime. Re-run:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Then test:
docker run --rm --gpus all nvidia/cuda:12.2.2-base-ubuntu22.04 nvidia-smi
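If the test still fails, it can help to check whether Docker actually lists a runtime named nvidia. The sketch below assumes `docker info --format '{{json .Runtimes}}'` prints the runtime list as JSON. `has_nvidia_runtime` is a hypothetical helper name, not part of GPU Shards or the NVIDIA Container Toolkit.

```shell
# Hypothetical helper: succeeds if the JSON on stdin has a key named "nvidia".
has_nvidia_runtime() {
  grep -q '"nvidia"[[:space:]]*:'
}

# On a host with Docker installed, feed it the runtime list:
#   docker info --format '{{json .Runtimes}}' | has_nvidia_runtime \
#     && echo "nvidia runtime registered" \
#     || echo "nvidia runtime missing: re-run nvidia-ctk runtime configure"
```

A registered runtime only shows that the configuration step was applied. The docker run test above is still the real check.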
A container ignores its memory limit
Confirm both pieces are present:
- LD_PRELOAD=/libvgpu/build/libvgpu.so is set.
- CUDA_DEVICE_MEMORY_LIMIT is set to a value with a unit (e.g. 4096m).
If LD_PRELOAD points at a path that does not exist inside the image, the dynamic
loader skips the library (at most printing a warning to stderr), the container
still starts, and the cap is not applied. Use the provided hami-core-demo:latest
image, which has the library at that path.
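Both settings can be checked from a shell inside the container. This is a minimal sketch, assuming LD_PRELOAD holds a single path and that a valid limit looks like digits followed by a one-letter unit. `check_shard_env` is a hypothetical helper, and it checks only the shape of the value, not which units the library accepts.

```shell
# Hypothetical helper. Values are passed as arguments so it is easy to try
# with different inputs.
#   $1 = value of LD_PRELOAD (assumed to be a single path)
#   $2 = value of CUDA_DEVICE_MEMORY_LIMIT
# Prints one line per problem and returns non-zero if any were found.
check_shard_env() {
  rc=0
  if [ -z "$1" ]; then
    echo "LD_PRELOAD is not set"; rc=1
  elif [ ! -f "$1" ]; then
    echo "LD_PRELOAD points at a missing file: $1"; rc=1
  fi
  if [ -z "$2" ]; then
    echo "CUDA_DEVICE_MEMORY_LIMIT is not set"; rc=1
  elif ! printf '%s\n' "$2" | grep -Eq '^[0-9]+[A-Za-z]$'; then
    echo "CUDA_DEVICE_MEMORY_LIMIT has no unit suffix: $2"; rc=1
  fi
  return "$rc"
}
```

Inside the container, call it as `check_shard_env "$LD_PRELOAD" "$CUDA_DEVICE_MEMORY_LIMIT"`. No output and exit status 0 mean both settings look plausible. That does not prove the library was actually loaded.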
"permission denied" talking to the Docker daemon
Your user is not in the docker group yet, or you have not started a new session
since being added:
sudo usermod -aG docker "$USER"
# then log out and back in
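To tell "not added yet" apart from "added, but this session is stale", compare the group database with the current session. The sketch assumes `id -nG "$USER"` reports the groups recorded for the user and `id -nG` with no argument reports the groups of the running session. `has_group` and `docker_group_state` are hypothetical helper names.

```shell
# Succeeds if group name $1 appears in the space-separated list $2.
has_group() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# $1 = groups recorded for the user, $2 = groups of the current session.
docker_group_state() {
  if has_group docker "$2"; then
    echo "ok"
  elif has_group docker "$1"; then
    echo "relogin"      # added, but this session predates the change
  else
    echo "not-added"    # run the usermod command first
  fi
}

# Usage:
#   docker_group_state "$(id -nG "$USER")" "$(id -nG)"
```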
The panel can't reach the backend
Make sure both services are running — the frontend on :3000 and the backend on
:8000. If you started them with run.sh, check its output for errors. Confirm
nothing else is bound to those ports:
sudo lsof -i :3000 -i :8000
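The same lsof output can also show which of the two ports has no listener at all. A rough sketch, assuming the usual lsof column layout where a listening socket's line ends in `host:port (LISTEN)`. `missing_listeners` is a hypothetical helper name.

```shell
# Reads lsof output on stdin and prints every port given as an argument
# that has no listening socket in that output.
missing_listeners() {
  awk -v ports="$*" '
    BEGIN { n = split(ports, want, " ") }
    # For listening sockets, the second-to-last field looks like "*:3000".
    /\(LISTEN\)/ { p = $(NF-1); sub(/.*:/, "", p); seen[p] = 1 }
    END { for (i = 1; i <= n; i++) if (!(want[i] in seen)) print want[i] }
  '
}

# Usage:
#   sudo lsof -nP -iTCP -sTCP:LISTEN | missing_listeners 3000 8000
```

Empty output means something is listening on both ports. Use the lsof command above to confirm that it is the frontend and backend rather than another process.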
CUDA out of memory immediately on start
The shard is too small for the model plus the CUDA context. Increase
CUDA_DEVICE_MEMORY_LIMIT or pick a larger shard in the panel. See
Memory Limits & Shards for sizing guidance.
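The sizing check itself is plain arithmetic: the limit has to cover the model plus some headroom for the CUDA context. The sketch below assumes the limit is given in megabytes with an `m` suffix. `shard_fits` is a hypothetical helper, and both the model size and the headroom are numbers you supply, not values taken from GPU Shards.

```shell
# $1 = limit such as 4096m, $2 = model footprint in MB,
# $3 = headroom in MB for the CUDA context (an estimate you provide).
shard_fits() {
  limit_mb=${1%[mM]}          # strip the unit suffix
  need=$(( $2 + $3 ))
  if [ "$limit_mb" -ge "$need" ]; then
    echo "fits"
  else
    echo "too small: need at least ${need}m"
  fi
}
```

For example, with placeholder numbers, `shard_fits 4096m 3500 600` reports `too small: need at least 4100m`.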
Still stuck?
Check the HAMi-core documentation for the underlying library, or revisit the manual install guide.