
Text Generation Inference

Releases: 2 · Avg: 0/wk · Versions: v3.3.6 → v3.3.7

Mar 26, 2023

Features

  • server: New faster GPTNeoX implementation based on flash attention

Fix

  • server: fix input-length discrepancy between Rust and Python tokenizers
Mar 9, 2023

Features

  • router: support best_of sampling
  • router: support left truncation
  • server: support typical sampling
  • launcher: allow local models
  • clients: add text-generation Python client
  • launcher: allow parsing num_shard from CUDA_VISIBLE_DEVICES
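Typical sampling keeps the tokens whose surprisal is closest to the distribution's entropy, rather than simply the most probable ones. A minimal sketch of the idea in plain Python (the function name and defaults are illustrative, not the server's actual implementation):

```python
import math

def typical_filter(probs, mass=0.95):
    """Return the indices kept by typical sampling over a probability list."""
    # Entropy of the distribution: the "expected surprisal".
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Rank tokens by how close their surprisal is to the entropy.
    ranked = sorted(
        range(len(probs)),
        key=lambda i: abs(-math.log(probs[i]) - entropy) if probs[i] > 0 else float("inf"),
    )
    # Keep the most "typical" tokens until the target probability mass is covered.
    kept, total = [], 0.0
    for i in ranked:
        kept.append(i)
        total += probs[i]
        if total >= mass:
            break
    return sorted(kept)
```

With `probs=[0.5, 0.3, 0.1, 0.1]` and `mass=0.8`, the 0.3 token is the most typical (its surprisal is closest to the entropy), and adding the 0.5 token covers the mass, so sampling is restricted to those two.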

Fix

  • server: do not warp prefill logits
  • server: fix formatting issues in generate_stream tokens
  • server: fix galactica batch
  • server: fix index out of range issue with watermarking
Mar 3, 2023

Features

  • router: add support for huggingface api-inference
  • server: add logits watermarking following "A Watermark for Large Language Models"
  • server: use a fixed transformers commit
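The watermarking scheme in that paper seeds a pseudorandom "greenlist" of vocabulary tokens from the previous token and biases the greenlisted logits upward, so generated text carries a detectable statistical signature. A rough illustrative sketch (the `gamma`/`delta` names follow the paper; this is not the server's code):

```python
import random

def watermark_logits(logits, prev_token, gamma=0.5, delta=2.0):
    """Bias a gamma-fraction 'greenlist' of the vocabulary, seeded by prev_token."""
    # Seeding the PRNG with the previous token makes the greenlist reproducible,
    # which is what lets a detector recompute it later.
    rng = random.Random(prev_token)
    vocab = list(range(len(logits)))
    rng.shuffle(vocab)
    greenlist = set(vocab[: int(gamma * len(vocab))])
    # Add delta to greenlisted logits; leave the rest untouched.
    return [l + delta if i in greenlist else l for i, l in enumerate(logits)]
```

Because the greenlist depends only on the seed, re-running with the same `prev_token` yields the same bias pattern.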

Fix

  • launcher: add missing parameters to launcher
  • server: update to hf_transfer==0.1.2 to fix corrupted files issue
Feb 24, 2023

Features

  • server: allocate full attention mask to decrease latency
  • server: enable hf-transfer for insane download speeds
  • router: add CORS options

Fix

  • server: remove position_ids from galactica forward
Feb 16, 2023

Features

  • server: support t5 models
  • router: add max_total_tokens and empty_input validation
  • launcher: add the possibility to disable custom CUDA kernels
  • server: add automatic safetensors conversion
  • router: add prometheus scrape endpoint
  • server, router: add distributed tracing
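The router-side validation can be pictured as rejecting requests whose prompt is empty or whose prompt plus requested generation would exceed the token budget. A hypothetical sketch (names and the default budget are illustrative):

```python
def validate_request(input_tokens, max_new_tokens, max_total_tokens=2048):
    """Reject requests that are empty or that would blow the token budget."""
    # Empty-input validation: a prompt must contain at least one token.
    if input_tokens == 0:
        raise ValueError("empty input")
    # max_total_tokens validation: prompt + generation must fit the budget.
    if input_tokens + max_new_tokens > max_total_tokens:
        raise ValueError(
            f"input ({input_tokens}) + max_new_tokens ({max_new_tokens}) "
            f"exceeds max_total_tokens ({max_total_tokens})"
        )
    return True
```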

Fix

  • launcher: copy current env vars to subprocesses
  • docker: add note around shared memory
Feb 7, 2023

Fix

  • server: fix bug with repetition penalty when using GPUs and inference mode
Feb 3, 2023

Features

  • router: support token streaming using Server-Sent Events (SSE)
  • router: support seeding
  • server: support gpt-neox
  • server: support santacoder
  • server: support repetition penalty
  • server: allow the server to use a local weight cache
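A repetition penalty rescales the logits of tokens that have already appeared: in the common CTRL-style formulation, positive logits are divided by the penalty and negative logits multiplied by it, making repeats less likely either way. A small illustrative sketch (not the server's implementation):

```python
def apply_repetition_penalty(logits, seen_tokens, penalty=1.3):
    """Penalize the logits of previously generated token ids."""
    out = list(logits)
    for t in set(seen_tokens):
        # Dividing a positive logit or multiplying a negative one
        # both push the token's probability down.
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out
```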

Breaking changes

  • router: refactor Token API
  • router: modify /generate API to only return generated text

Misc

  • router: use background task to manage request queue
  • ci: docker build/push on update
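Managing the request queue with a background task means the HTTP handlers only enqueue work while a single long-lived task drains it. The router's actual queue is in Rust; the pattern can be sketched with Python's asyncio (all names here are illustrative):

```python
import asyncio

async def queue_worker(queue, handler):
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        request = await queue.get()
        if request is None:
            queue.task_done()
            break
        await handler(request)
        queue.task_done()

async def demo():
    results = []

    async def handler(req):
        # Stand-in for batching/inference work.
        results.append(req.upper())

    queue = asyncio.Queue()
    worker = asyncio.create_task(queue_worker(queue, handler))
    for req in ("hello", "world"):
        await queue.put(req)
    await queue.put(None)  # signal shutdown
    await worker
    return results
```

Requests are processed in order by the single worker, which is what makes centralized batching decisions possible.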
Latest: v3.3.7
Tracking since: Feb 3, 2023
Last checked: Apr 21, 2026