v0.24.0: Inference, serialization and optimizations
The InferenceClient's chat completion API is now fully compliant with the OpenAI client, making it a drop-in replacement in your script:
```diff
- from openai import OpenAI
+ from huggingface_hub import InferenceClient

- client = OpenAI(
+ client = InferenceClient(
    base_url=...,
    api_key=...,
)

output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

for chunk in output:
    print(chunk.choices[0].delta.content)
```
Why switch to InferenceClient if you already use OpenAI? Because it is better integrated with HF services, such as the Serverless Inference API and Dedicated Endpoints. Check out the more detailed answer in this HF Post.
For more details about OpenAI compatibility, check out this guide's section.
InferenceClient improvements

Some new parameters have been added to the InferenceClient, following the latest changes in our Inference API:
- prompt_name, truncate and normalize in feature_extraction
- model_id and response_format in chat_completion
- adapter_id in text_generation
- hypothesis_template and multi_labels in zero_shot_classification

Of course, all of those changes are also available in the AsyncInferenceClient async equivalent 🤗
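For intuition on what hypothesis_template controls, here is a hedged sketch (not the server's actual code) of how such a template is typically expanded into one NLI hypothesis per candidate label before classification:

```python
# Illustrative sketch: a hypothesis_template like "This example is about {}."
# is formatted once per candidate label to build the NLI hypotheses.
def expand_hypotheses(template: str, candidate_labels: list) -> list:
    return [template.format(label) for label in candidate_labels]


hypotheses = expand_hypotheses(
    "This example is about {}.",
    ["politics", "sports", "economy"],
)
print(hypotheses)
# → ['This example is about politics.', 'This example is about sports.', 'This example is about economy.']
```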
- prompt_name to feature-extraction + update types by @Wauplin in #2363
- adapter_id (text-generation) and response_format (chat-completion) by @Wauplin in #2383

Added helpers for TGI servers:
- get_endpoint_info to get information about an endpoint (running model, framework, etc.). Only available on TGI/TEI-powered models.
- health_check to check the health status of the server. Only available on TGI/TEI-powered models and only for InferenceEndpoint or local deployment. For the serverless Inference API, it's better to use get_model_status.

Other fixes:
- image_to_text output type has been fixed
- wait-for-model to avoid being rate limited while the model is not loaded
- proxies support

The serialization module introduced in v0.22.x has been improved to become the preferred way to serialize a torch model to disk. It handles out-of-the-box sharding and safe serialization (using safetensors), with subtleties to work with shared layers. This logic was previously scattered across libraries like transformers, diffusers, accelerate and safetensors. The goal of centralizing it in huggingface_hub is to allow any external library to safely benefit from the same naming convention, making it easier for end users to manage.
```python
>>> from huggingface_hub import save_torch_model
>>> model = ...  # A PyTorch model

# Save state dict to "path/to/folder". The model will be split into shards of 5GB each and saved as safetensors.
>>> save_torch_model(model, "path/to/folder")

# Or save the state dict manually
>>> from huggingface_hub import save_torch_state_dict
>>> save_torch_state_dict(model.state_dict(), "path/to/folder")
```
More details in the serialization package reference.
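To make the shared naming convention concrete, here is a minimal sketch of greedy sharding by size and of the model-0000X-of-0000Y.safetensors naming scheme. This is illustrative only; the real implementation in huggingface_hub additionally handles shared tensors, dtypes and index files.

```python
# Hedged sketch: greedily pack tensors into shards no larger than
# max_shard_size, then map each tensor name to its shard filename.
def shard_names(tensor_sizes: dict, max_shard_size: int) -> dict:
    shards, current, current_size = [], [], 0
    for name, size in tensor_sizes.items():
        if current and current_size + size > max_shard_size:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)

    total = len(shards)
    mapping = {}
    for i, names in enumerate(shards, start=1):
        filename = f"model-{i:05d}-of-{total:05d}.safetensors"
        for n in names:
            mapping[n] = filename
    return mapping


sizes = {"embed": 4, "layer1": 3, "layer2": 3, "head": 2}  # sizes in arbitrary units
print(shard_names(sizes, max_shard_size=6))
```

With these toy sizes, "embed" lands alone in shard 1, "layer1"/"layer2" fill shard 2 exactly, and "head" goes to shard 3.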
- save_torch_state_dict + add save_torch_model by @Wauplin in #2373

Some helpers related to serialization have been made public for reuse in external libraries:
- get_torch_storage_id
- get_torch_storage_size
- max_shard_size as string in split_state_dict_into_shards_factory by @SunMarc in #2286

The HfFileSystem has been improved to optimize calls, especially when listing files from a repo. This is especially useful for large datasets like HuggingFaceFW/fineweb, enabling faster processing and reducing the risk of being rate limited.
- hf_file_system.py by @lappemic in #2278
- fs.walk() by @lhoestq in #2346

Thanks to @lappemic, HfFileSystem methods are now properly documented. Check it out here!
- HfFilesyStem Methods by @lappemic in #2380

A new mechanism has been introduced to prevent empty commits if no changes have been detected. It is enabled by default in upload_file, upload_folder, create_commit and the huggingface-cli upload command. There is no way to force an empty commit.
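The idea behind this mechanism can be sketched as follows. This is a simplification, assuming we compare content hashes of staged files against what is already on the remote; the actual detection inside huggingface_hub is more involved:

```python
import hashlib


# Hedged sketch: drop upload operations whose content already matches the
# remote file, and skip the commit entirely if nothing is left to push.
def filter_unchanged(staged: dict, remote_hashes: dict) -> dict:
    kept = {}
    for path, content in staged.items():
        digest = hashlib.sha256(content).hexdigest()
        if remote_hashes.get(path) != digest:
            kept[path] = content
    return kept


remote = {"README.md": hashlib.sha256(b"hello").hexdigest()}
staged = {"README.md": b"hello", "config.json": b"{}"}

to_commit = filter_unchanged(staged, remote)
print(sorted(to_commit))  # only the genuinely new file survives
# → ['config.json']
```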
Resource Groups allow organization administrators to group related repositories together and manage access to those repos. It is now possible to specify a resource group ID when creating a repo:
```python
from huggingface_hub import create_repo

create_repo("my-secret-repo", private=True, resource_group_id="66670e5163145ca562cb1988")
```
- resource_group_id in create_repo by @Wauplin in #2324

Webhooks allow you to listen for new changes on specific repos or on all repos belonging to a particular set of users/organizations (not just your repos, but any repo). With the Webhooks API you can create, enable, disable, delete, update, and list webhooks from a script!
```python
from huggingface_hub import create_webhook

# Example: Creating a webhook
webhook = create_webhook(
    url="https://webhook.site/your-custom-url",
    watched=[{"type": "user", "name": "your-username"}, {"type": "org", "name": "your-org-name"}],
    domains=["repo", "discussion"],
    secret="your-secret",
)
```
The search API has been slightly improved. It is now possible to pass an expand parameter to model_info/list_models (and similarly for datasets/Spaces). For example, you can ask the server to return downloadsAllTime for all models.

```python
>>> from huggingface_hub import list_models

>>> for model in list_models(library="transformers", expand="downloadsAllTime", sort="downloads", limit=5):
...     print(model.id, model.downloads_all_time)
MIT/ast-finetuned-audioset-10-10-0.4593 1676502301
sentence-transformers/all-MiniLM-L12-v2 115588145
sentence-transformers/all-MiniLM-L6-v2 250790748
google-bert/bert-base-uncased 1476913254
openai/clip-vit-large-patch14 590557280
```
- expand parameter in xxx_info and list_xxxs (model/dataset/Space) by @Wauplin in #2333

It is now possible to delete files from a repo using the command line:
Delete a folder:

```shell
>>> huggingface-cli repo-files Wauplin/my-cool-model delete folder/
Files correctly deleted from repo. Commit: https://huggingface.co/Wauplin/my-cool-mo...
```
Use Unix-style wildcards to delete sets of files:

```shell
>>> huggingface-cli repo-files Wauplin/my-cool-model delete *.txt folder/*.bin
Files correctly deleted from repo. Commit: https://huggingface.co/Wauplin/my-cool-mo...
```
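The patterns follow standard Unix globbing. As a quick sketch of which repo files patterns like the ones above would select (using Python's fnmatch for illustration; the CLI's exact matching semantics may differ in edge cases such as nested paths):

```python
import fnmatch

# Hypothetical repo contents, for illustration only.
repo_files = ["notes.txt", "draft.txt", "folder/weights.bin", "model.safetensors"]
patterns = ["*.txt", "folder/*.bin"]

# Collect every file matching at least one pattern.
matched = sorted({f for p in patterns for f in fnmatch.filter(repo_files, p)})
print(matched)
# → ['draft.txt', 'folder/weights.bin', 'notes.txt']
```

Note that model.safetensors is untouched: only files matching a pattern are deleted.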
- repo_files command, with recursive deletion. by @OlivierKessler01 in #2280

The ModelHubMixin, allowing quick integration of external libraries with the Hub, has been updated to fix some existing bugs and to ease its use. Learn how to integrate your library from this guide.
- ModelHubMixin siblings by @Wauplin in #2394

Efforts from the Korean-speaking community continued to translate guides and package references to KO! Check out the result here.
- package_reference/cards.md to Korean by @usr-bin-ksh in #2204
- package_reference/community.md to Korean by @seoulsky-field in #2183
- guides/integrations.md to Korean by @cjfghk5697 in #2256
- package_reference/environment_variables.md to Korean by @jungnerd in #2311
- package_reference/webhooks_server.md to Korean by @fabxoe in #2344
- guides/manage-cache.md to Korean by @cjfghk5697 in #2347

French documentation is also being updated, thanks to @JibrilEl!
A very nice illustration has been made by @severo to explain how hf:// URLs work with the HfFileSystem object. Check it out here!
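For intuition, an hf:// path decomposes into a repo type, a repo id and an in-repo path. A toy parser illustrating the idea (illustrative only; HfFileSystem's real resolver also handles revisions, defaults and validation):

```python
# Hedged sketch: split an hf:// path into its components. Paths without a
# leading "datasets/" or "spaces/" segment refer to model repos.
def parse_hf_path(path: str) -> dict:
    assert path.startswith("hf://")
    parts = path[len("hf://"):].split("/")
    if parts[0] in ("datasets", "spaces"):
        repo_type, rest = parts[0], parts[1:]
    else:
        repo_type, rest = "models", parts
    return {
        "repo_type": repo_type,
        "repo_id": "/".join(rest[:2]),
        "path_in_repo": "/".join(rest[2:]),
    }


print(parse_hf_path("hf://datasets/HuggingFaceFW/fineweb/data/file.parquet"))
# → {'repo_type': 'datasets', 'repo_id': 'HuggingFaceFW/fineweb', 'path_in_repo': 'data/file.parquet'}
```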
A few breaking changes have been introduced:
- ModelFilter and DatasetFilter are completely removed. You can now pass arguments directly to list_models and list_datasets. This removes one level of complexity for the same result.
- organization and name have been removed from update_repo_visibility. Please use a proper repo_id instead. This makes the method consistent with all other methods from HfApi.

These breaking changes have been announced with a regular deprecation cycle.
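Migrating away from ModelFilter is mechanical. As a hedged sketch (the exact keyword arguments depend on what you were filtering on), the change looks like:

```diff
- from huggingface_hub import list_models, ModelFilter
- models = list_models(filter=ModelFilter(library="transformers", author="google"))
+ from huggingface_hub import list_models
+ models = list_models(library="transformers", author="google")
```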
The legacy_cache_layout parameter (in hf_hub_download/snapshot_download) as well as the cached_download, filename_to_url and url_to_filename helpers are now deprecated and will be removed in huggingface_hub==0.26.x. The proper way to download files is to use the current cache system with hf_hub_download/snapshot_download, which has been in place for 2 years already.
- legacy_cache_layout parameter in hf_hub_download by @Wauplin in #2317
- .resume() if Inference Endpoint is already running by @Wauplin in #2335
- docs/README.md by @lappemic in #2382
- safetensors[torch] by @qgallouedec in #2371

The following contributors have made significant changes to the library over the last release:
- package_reference/cards.md to Korean (#2204)
- package_reference/community.md to Korean (#2183)
- hf_file_system.py (#2278)
- docs/README.md (#2382)
- HfFilesyStem Methods (#2380)
- repo_files command, with recursive deletion. (#2280)
- guides/integrations.md to Korean (#2256)
- guides/manage-cache.md to Korean (#2347)
- package_reference/environment_variables.md to Korean (#2311)
- package_reference/webhooks_server.md to Korean (#2344)