Try out Command R+ with Medusa heads on 4xA100s with:

```shell
model=text-generation-inference/commandrplus-medusa
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:2.0 \
    --model-id $model --speculate 3 --num-shard 4 --trust-remote-code
```

by @Narsil in https://github.com/huggingface/text-generation-inference/pull/1704

**Full Changelog**: https://github.com/huggingface/text-generation-inference/compare/v1.4.5...v2.0.0
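Once the container is serving, you can query TGI's `/generate` REST endpoint on the port published by `-p 8080:80`. A minimal Python sketch of such a request, assuming the server above is running locally (the prompt text and `max_new_tokens` value are illustrative):

```python
import json
import urllib.request

# Request body for TGI's /generate endpoint: "inputs" holds the prompt,
# "parameters" holds generation options such as max_new_tokens.
payload = {
    "inputs": "What is speculative decoding?",  # illustrative prompt
    "parameters": {"max_new_tokens": 64},
}

req = urllib.request.Request(
    "http://127.0.0.1:8080/generate",  # port mapped by `docker run -p 8080:80`
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is up; the response JSON contains "generated_text":
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["generated_text"])
```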