Features
- router: support vectorized warpers in flash causal lm (co-authored by @jlamypoirier)
- proto: decrease IPC proto size
- benchmarker: add summary tables
- server: support RefinedWeb models
Fixes
- server: fix issue when loading AutoModelForSeq2SeqLM models (contributed by @CL-Shang)
New Contributors
- @CL-Shang
Full Changelog: https://github.com/huggingface/text-generation-inference/compare/v0.7.0...v0.8.0