v0.6.0
Features
- server: flash attention past key values optimization (contributed by @njhill)
- router: remove requests when client closes the connection (co-authored by @njhill)
- server: support quantization for flash models
- router: add info route
- server: optimize token decode
- server: support flash sharded santacoder
- security: image signing with cosign
- security: image analysis with trivy
- docker: improve image size
Fix
- server: check cuda capability before importing flash attention
- server: fix hf_transfer issue with private repositories
- router: add auth token for private tokenizers
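The capability check in the first fix above can be pictured as a guard around an optional import. This is a minimal sketch, not the project's actual code: the helper name `load_flash_attention`, the `MIN_CAPABILITY` threshold, and the `flash_attn` module name are all assumptions made for illustration, and the real minimum capability is not stated in these notes.

```python
import importlib

# Assumed minimum CUDA compute capability (major, minor).
# Hypothetical value: the threshold used by the server is not given in these notes.
MIN_CAPABILITY = (7, 5)


def load_flash_attention(capability, module_name="flash_attn",
                         min_capability=MIN_CAPABILITY):
    """Import the optional module only when the GPU is capable enough.

    `capability` is a (major, minor) tuple, or None when no GPU is present.
    Returns the imported module, or None if the check fails or the
    package is not installed.
    """
    # Check the hardware first so an unsupported GPU never triggers the import.
    if capability is None or tuple(capability) < tuple(min_capability):
        return None
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Package missing: the caller falls back to a non-flash code path.
        return None
```

With PyTorch installed, the capability tuple could come from `torch.cuda.get_device_capability()` when `torch.cuda.is_available()` is true, and `None` otherwise. A caller would then choose the flash or non-flash model implementation depending on whether the helper returned a module.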
Misc