## Highlights
- server: add support for Flash Attention v2
- server: add support for Llama v2
## Features
- launcher: add debug logs
- server: rework the quantization to support all models
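As a rough sketch of how the reworked quantization might be exercised, the launcher can be started with a quantization backend selected at startup. The image tag, model ID, and flag values below are illustrative assumptions, not confirmed by these notes:

```shell
# Hypothetical launch: run the 0.9.3 server image with quantization enabled.
# The model ID and mounted data path are placeholders; adjust to your setup.
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v "$PWD/data:/data" \
    ghcr.io/huggingface/text-generation-inference:0.9.3 \
    --model-id meta-llama/Llama-2-7b-hf \
    --quantize bitsandbytes
```

With the quantization rework, the same `--quantize` switch is intended to apply across supported model architectures rather than a model-specific subset.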
**Full Changelog**: https://github.com/huggingface/text-generation-inference/compare/v0.9.2...v0.9.3