Troubleshooting

Common issues when installing and running GPU Shards.

nvidia-smi works on the host but not in containers

The NVIDIA Container Toolkit is likely not configured as the Docker runtime. Re-run:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Then test:

docker run --rm --gpus all nvidia/cuda:12.2.2-base-ubuntu22.04 nvidia-smi
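
If the test container still cannot see the GPU, it may help to confirm that Docker actually registered the runtime. The sketch below assumes that docker info --format '{{json .Runtimes}}' prints a JSON map keyed by runtime name; has_nvidia_runtime is a hypothetical helper name, not part of the toolkit. A missing entry is a hint rather than proof of the cause.

```shell
# has_nvidia_runtime: read the runtime JSON from `docker info` on stdin and
# succeed only if a runtime named "nvidia" appears as a key.
has_nvidia_runtime() {
  grep -q '"nvidia":'
}

# Usage on the host (left commented so the snippet has no side effects):
# docker info --format '{{json .Runtimes}}' | has_nvidia_runtime \
#   && echo "nvidia runtime registered" \
#   || echo "nvidia runtime missing; re-run nvidia-ctk runtime configure"
```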

A container ignores its memory limit

Confirm both pieces are present:

LD_PRELOAD=/libvgpu/build/libvgpu.so is set.
CUDA_DEVICE_MEMORY_LIMIT is set to a value with a unit (e.g. 4096m).

If LD_PRELOAD points at a path that does not exist inside the image, the dynamic loader skips the library and the program runs anyway, so the cap is not applied. The only sign is an ld.so warning on stderr, which is easy to miss. Use the provided hami-core-demo:latest image, which has the library at that path.
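
Both conditions can be checked from a shell inside the container. The following is a minimal sketch: check_cap_env is a hypothetical helper, and the accepted unit suffixes (k, m, g in either case) are an assumption, since only the m form is shown above.

```shell
# check_cap_env PRELOAD_PATH LIMIT
# Prints "ok" and returns 0 when the preload file exists and the limit
# looks like digits followed by a unit suffix; otherwise explains why not.
check_cap_env() {
  preload_path=$1
  limit=$2
  rc=0
  if [ -z "$preload_path" ]; then
    echo "LD_PRELOAD is not set"
    rc=1
  elif [ ! -f "$preload_path" ]; then
    echo "LD_PRELOAD file is missing: $preload_path"
    rc=1
  fi
  # Assumed format: one or more digits, then k, m, or g.
  if ! printf '%s\n' "$limit" | grep -Eq '^[0-9]+[kKmMgG]$'; then
    echo "CUDA_DEVICE_MEMORY_LIMIT has no unit suffix: '$limit'"
    rc=1
  fi
  if [ "$rc" -eq 0 ]; then
    echo "ok"
  fi
  return "$rc"
}

# Inside the container you would call it with the live values:
# check_cap_env "$LD_PRELOAD" "$CUDA_DEVICE_MEMORY_LIMIT"
```

The values are passed as arguments instead of being read from the environment so the function can be tried on the host without actually preloading anything.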

"permission denied" talking to the Docker daemon

Your user is not in the docker group yet, or you have not started a new session since being added:

sudo usermod -aG docker "$USER"
# then log out and back in
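
To tell the two cases apart, you can compare the groups of the current session with the groups recorded for your account. This is a sketch with hypothetical helper names; it relies on id -nG printing the groups of the running process when called without a user, and the account's configured groups when given a user name. Group names containing spaces would confuse the word splitting.

```shell
# has_group WANTED GROUP...: succeed if WANTED is among the listed groups.
has_group() {
  want=$1
  shift
  for g in "$@"; do
    if [ "$g" = "$want" ]; then
      return 0
    fi
  done
  return 1
}

# docker_group_status: print one of
#   session-ok      the current session already has the docker group
#   relogin-needed  the account has it, but this session predates the change
#   not-in-group    the account is not in the docker group at all
docker_group_status() {
  session_groups=$(id -nG 2>/dev/null) || session_groups=""
  user=$(id -un 2>/dev/null) || user=""
  account_groups=$(id -nG "$user" 2>/dev/null) || account_groups=""
  # Unquoted on purpose so each group name becomes its own argument.
  if has_group docker $session_groups; then
    echo "session-ok"
  elif has_group docker $account_groups; then
    echo "relogin-needed"
  else
    echo "not-in-group"
  fi
}

docker_group_status
```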

The panel can't reach the backend

Make sure both services are running: the frontend on :3000 and the backend on :8000. If you started them with run.sh, check its output for errors. Confirm nothing else is bound to those ports:

sudo lsof -i :3000 -i :8000
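
lsof shows which process holds a port; a complementary check is whether anything accepts connections on it at all. The sketch below is one way to do that without extra tools. It assumes bash, because /dev/tcp/HOST/PORT is a pseudo-path that bash handles itself, and port_open is a hypothetical helper name.

```shell
# port_open HOST PORT: succeed if a TCP connection can be opened.
# The subshell keeps the temporary file descriptor from leaking.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# 3000 is the frontend and 8000 the backend, per the text above.
for port in 3000 8000; do
  if port_open 127.0.0.1 "$port"; then
    echo "port $port: accepting connections"
  else
    echo "port $port: nothing listening"
  fi
done
```

If a port accepts connections but the panel still fails, the lsof output tells you whether the listener is the expected service or something else.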

CUDA out of memory immediately on start

The shard is too small for the model plus the CUDA context. Increase CUDA_DEVICE_MEMORY_LIMIT or pick a larger shard in the panel. See Memory Limits & Shards for sizing guidance.
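
As a rough illustration of "model plus CUDA context", the arithmetic can be sketched as below. The helper name and all three inputs are hypothetical placeholders, not measured values; take real figures from your own model and from the sizing guidance.

```shell
# estimate_limit_mib WEIGHTS_MIB CONTEXT_MIB HEADROOM_PERCENT
# Adds the model weights and the CUDA context overhead, then pads the sum
# by a percentage to leave room for activations and other allocations.
estimate_limit_mib() {
  weights=$1
  context=$2
  headroom=$3
  base=$((weights + context))
  echo $((base + base * headroom / 100))
}

# Illustrative numbers only: 3000 MiB of weights, 500 MiB of context,
# 20 percent headroom.
echo "CUDA_DEVICE_MEMORY_LIMIT=$(estimate_limit_mib 3000 500 20)m"
```

With those placeholder inputs the sum is 3500 MiB and the padded result is 4200 MiB, so a 4096m shard would be slightly too small.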

Still stuck?

Check the HAMi-core documentation for the underlying library, or revisit the manual install guide.
