For Linux systems with NVIDIA GPUs, you can run Cake via Docker. The NVIDIA Container Toolkit is required.
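Before building, it can help to check what compute capability your GPU reports, since the image build accepts a CUDA_COMPUTE_CAP build argument. The following is a minimal sketch, not part of Cake: `cap_to_build_arg` is a hypothetical helper name, and the `compute_cap` query field is assumed to be supported by your installed `nvidia-smi` (older drivers may not have it).

```shell
#!/bin/sh
# Hypothetical helper (not part of Cake): turn a dotted compute capability
# such as "8.6" into the undotted form used for CUDA_COMPUTE_CAP ("86").
cap_to_build_arg() {
    printf '%s\n' "$1" | tr -d '. '
}

# Query the first GPU only when nvidia-smi is installed.
# Assumption: the driver is recent enough to support the compute_cap field.
if command -v nvidia-smi >/dev/null 2>&1; then
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
    echo "CUDA_COMPUTE_CAP=$(cap_to_build_arg "$cap")"
else
    echo "nvidia-smi not found; install the NVIDIA driver first" >&2
fi
```

If the script prints a value, pass it to the build with `--build-arg`. To confirm that the NVIDIA Container Toolkit itself is working, a common check is to run `nvidia-smi` inside a throwaway container started with `--gpus all`.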
docker build -t cake .

Set CUDA_COMPUTE_CAP to match your GPU's compute capability (common values: 75 for RTX 20-series, 80 for A100, 86 for RTX 30-series, 89 for RTX 40-series, 90 for H100):
docker build -t cake --build-arg CUDA_COMPUTE_CAP=86 .

Load the entire model in a single container (no cluster):
docker run --rm --gpus all \
-v /path/to/model:/model:ro \
-p 8080:8080 \
cake serve /model

A docker-compose.yml is provided as an example for running a multi-worker cluster. Create a topology-docker.yml mapping layers to the worker-1 / worker-2 service names, place your model data in ./cake-data/, and run:
docker compose up --build

Docker on macOS cannot access Metal GPUs. For Apple Silicon, build and run natively:
cargo build --release --features metal
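
The choice between the Docker path and the native build can be sketched as a small platform check. This is an illustrative sketch only: `suggest_build` is a hypothetical helper, not part of Cake, and the Linux branch assumes an NVIDIA GPU with the NVIDIA Container Toolkit installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cake): print the build command this
# section suggests for a given OS and CPU architecture.
suggest_build() {
    os="$1"    # e.g. the output of `uname -s`
    arch="$2"  # e.g. the output of `uname -m`
    case "$os/$arch" in
        # Apple Silicon: Docker cannot reach Metal, so build natively.
        Darwin/arm64) echo "cargo build --release --features metal" ;;
        # Linux: assumes an NVIDIA GPU and the NVIDIA Container Toolkit.
        Linux/*)      echo "docker build -t cake ." ;;
        *)            echo "no suggestion for $os/$arch" >&2; return 1 ;;
    esac
}

# Print the suggestion for the current machine, if there is one.
suggest_build "$(uname -s)" "$(uname -m)" || true
```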