After more than a year of development, we're excited to announce the release of 🤗 Transformers.js v3!
You can get started by installing Transformers.js v3 from NPM using:
npm i @huggingface/transformers
Then, import the library with:
import { pipeline } from "@huggingface/transformers";
or, via a CDN:
import { pipeline } from "https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.0";
For more information, check out the documentation.
WebGPU is a new web standard for accelerated graphics and compute. The API enables web developers to use the underlying system's GPU to carry out high-performance computations directly in the browser. WebGPU is the successor to WebGL and provides significantly better performance, because it allows for more direct interaction with modern GPUs. Lastly, it supports general-purpose GPU computations, which makes it just perfect for machine learning!
[!WARNING]
As of October 2024, global WebGPU support is around 70% (according to caniuse.com), meaning some users may not be able to use the API. If the following demos do not work in your browser, you may need to enable WebGPU using a feature flag.
Thanks to our collaboration with ONNX Runtime Web, enabling WebGPU acceleration is as simple as setting device: 'webgpu' when loading a model. Let's see some examples!
Example: Compute text embeddings on WebGPU (demo)
import { pipeline } from "@huggingface/transformers";
// Create a feature-extraction pipeline
const extractor = await pipeline(
"feature-extraction",
"mixedbread-ai/mxbai-embed-xsmall-v1",
{ device: "webgpu" },
);
// Compute embeddings
const texts = ["Hello world!", "This is an example sentence."];
const embeddings = await extractor(texts, { pooling: "mean", normalize: true });
console.log(embeddings.tolist());
// [
// [-0.016986183822155, 0.03228696808218956, -0.0013630966423079371, ... ],
// [0.09050482511520386, 0.07207386940717697, 0.05762749910354614, ... ],
// ]
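Because the embeddings above are computed with `normalize: true`, each vector has unit length, so the dot product of two rows equals their cosine similarity. Here is a minimal sketch of that comparison; the `dot` helper and the two short vectors are hypothetical stand-ins for rows of `embeddings.tolist()`, which are much longer in practice.

```javascript
// Dot product of two equal-length vectors. For unit-length (normalized)
// embeddings, this value equals their cosine similarity.
function dot(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; ++i) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Hypothetical unit-length vectors standing in for two embedding rows
const first = [0.6, 0.8];
const second = [0.8, 0.6];
console.log(dot(first, second)); // close to 0.96
```

With real output, you would call `dot(rows[0], rows[1])` where `rows = embeddings.tolist()`.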
Example: Perform automatic speech recognition with OpenAI Whisper on WebGPU (demo)
import { pipeline } from "@huggingface/transformers";
// Create automatic speech recognition pipeline
const transcriber = await pipeline(
"automatic-speech-recognition",
"onnx-community/whisper-tiny.en",
{ device: "webgpu" },
);
// Transcribe audio from a URL
const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
const output = await transcriber(url);
console.log(output);
// { text: ' And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.' }
Example: Perform image classification with MobileNetV4 on WebGPU (demo)
import { pipeline } from "@huggingface/transformers";
// Create image classification pipeline
const classifier = await pipeline(
"image-classification",
"onnx-community/mobilenetv4_conv_small.e2400_r224_in1k",
{ device: "webgpu" },
);
// Classify an image from a URL
const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg";
const output = await classifier(url);
console.log(output);
// [
// { label: 'tiger, Panthera tigris', score: 0.6149784922599792 },
// { label: 'tiger cat', score: 0.30281734466552734 },
// { label: 'tabby, tabby cat', score: 0.0019135422771796584 },
// { label: 'lynx, catamount', score: 0.0012161266058683395 },
// { label: 'Egyptian cat', score: 0.0011465961579233408 }
// ]
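Since WebGPU is not yet available in every browser, it can help to check for it before choosing a device. The sketch below is one possible approach, not part of the library: `pickDevice` is a hypothetical helper, and the `"wasm"` fallback value is an assumption about which other device you would want to use. It relies only on the standard `navigator.gpu.requestAdapter()` call.

```javascript
// Returns "webgpu" when a WebGPU adapter can be obtained, otherwise "wasm".
// `nav` defaults to the global `navigator` (if any) so that it can be faked in tests.
async function pickDevice(nav = globalThis.navigator) {
  // No navigator, or the browser does not expose WebGPU at all
  if (!nav || !nav.gpu) return "wasm";
  try {
    // requestAdapter() resolves to null when no suitable adapter exists
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure while probing as "WebGPU unavailable"
    return "wasm";
  }
}
```

The resolved value can then be passed along when loading a model, for example `{ device: await pickDevice() }`.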
Before Transformers.js v3, we used the quantized option to specify whether to use a quantized (q8) or full-precision (fp32) variant of the model by setting quantized to true or false, respectively. Now, we've added the ability to select from a much larger list with the dtype parameter.
The list of available quantizations depends on the model, but some common ones are: full-precision ("fp32"), half-precision ("fp16"), 8-bit ("q8", "int8", "uint8"), and 4-bit ("q4", "bnb4", "q4f16").
Example: Run Qwen2.5-0.5B-Instruct in 4-bit quantization (demo)
import { pipeline } from "@huggingface/transformers";
// Create a text generation pipeline
const generator = await pipeline(
"text-generation",
"onnx-community/Qwen2.5-0.5B-Instruct",
{ dtype: "q4", device: "webgpu" },
);
// Define the list of messages
const messages = [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Tell me a funny joke." },
];
// Generate a response
const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);
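If you are migrating code from v2, the old `quantized` flag maps onto `dtype` as described above: `true` corresponds to `"q8"` and `false` to `"fp32"`. A minimal sketch of that mapping follows; `dtypeFromQuantized` and `migrateOptions` are hypothetical helpers written for illustration, not library functions.

```javascript
// Maps the pre-v3 `quantized` boolean onto the v3 `dtype` string:
// true -> "q8" (8-bit quantized), false -> "fp32" (full precision).
function dtypeFromQuantized(quantized) {
  return quantized ? "q8" : "fp32";
}

// Converts a v2-style options object into a v3-style one.
function migrateOptions({ quantized, ...rest } = {}) {
  // If `quantized` was never set, leave `dtype` unset so the library default applies
  if (quantized === undefined) return { ...rest };
  return { ...rest, dtype: dtypeFromQuantized(quantized) };
}

console.log(migrateOptions({ quantized: false, revision: "main" }));
// { revision: 'main', dtype: 'fp32' }
```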
Some encoder-decoder models, like Whisper or Florence-2, are extremely sensitive to quantization settings, especially those of the encoder. For this reason, we added the ability to select per-module dtypes, which can be done by providing a mapping from module name to dtype.
Example: Run Florence-2 on WebGPU (demo)
import { Florence2ForConditionalGeneration } from "@huggingface/transformers";
const model = await Florence2ForConditionalGeneration.from_pretrained(
"onnx-community/Florence-2-base-ft",
{
dtype: {
embed_tokens: "fp16",
vision_encoder: "fp16",
encoder_model: "q4",
decoder_model_merged: "q4",
},
device: "webgpu",
},
);
<p align="middle">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/transformersjs-v3/florence-2-webgpu.gif" alt="Florence-2 running on WebGPU" />
</p>
<details>
<summary>
See full code example
</summary>
import {
Florence2ForConditionalGeneration,
AutoProcessor,
AutoTokenizer,
RawImage,
} from "@huggingface/transformers";
// Load model, processor, and tokenizer
const model_id = "onnx-community/Florence-2-base-ft";
const model = await Florence2ForConditionalGeneration.from_pretrained(
model_id,
{
dtype: {
embed_tokens: "fp16",
vision_encoder: "fp16",
encoder_model: "q4",
decoder_model_merged: "q4",
},
device: "webgpu",
},
);
const processor = await AutoProcessor.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
// Load image and prepare vision inputs
const url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg";
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);
// Specify task and prepare text inputs
const task = "<MORE_DETAILED_CAPTION>";
const prompts = processor.construct_prompts(task);
const text_inputs = tokenizer(prompts);
// Generate text
const generated_ids = await model.generate({
...text_inputs,
...vision_inputs,
max_new_tokens: 100,
});
// Decode generated text
const generated_text = tokenizer.batch_decode(generated_ids, {
skip_special_tokens: false,
})[0];
// Post-process the generated text
const result = processor.post_process_generation(
generated_text,
task,
image.size,
);
console.log(result);
// { '<MORE_DETAILED_CAPTION>': 'A green car is parked in front of a tan building. The building has a brown door and two brown windows. The car is a two door and the door is closed. The green car has black tires.' }
</details>
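The same idea applies to other encoder-decoder models: keep the encoder at a higher precision and quantize the decoder more aggressively. Here is a small sketch of building such a mapping. `encoderDecoderDtypes` is a hypothetical helper, and the module names are assumed to match those in the Florence-2 example above; the names that actually apply depend on the ONNX files exported for a given model.

```javascript
// Builds a per-module dtype map that keeps the encoder at a higher precision
// than the decoder. Module names are an assumption taken from the example above.
function encoderDecoderDtypes(encoderDtype = "fp32", decoderDtype = "q4") {
  return {
    encoder_model: encoderDtype,
    decoder_model_merged: decoderDtype,
  };
}

// Options object in the shape accepted by `from_pretrained` in the example above
const options = { dtype: encoderDecoderDtypes(), device: "webgpu" };
console.log(options);
```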
This release increases the total number of supported architectures to 120 (see full list), spanning a wide range of input modalities and tasks. Notable new names include: Phi-3, Gemma & Gemma 2, LLaVa, Moondream, Florence-2, MusicGen, Sapiens, Depth Pro, PyAnnote, and RT-DETR.
<p align="middle"> <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/transformersjs-v3/architectures.png" alt="Bubble diagram of new architectures in Transformers.js v3" /> </p>

As part of the release, we've published 25 new example projects and templates, primarily focused on showcasing WebGPU support! This includes demos like Phi-3.5 WebGPU and Whisper WebGPU, as shown below.
[!NOTE]
We're in the process of moving all our example projects and demos to https://github.com/huggingface/transformers.js-examples, so stay tuned for updates on this!
| <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/transformersjs-v3/phi-3.5-webgpu.gif" style="max-height: 500px;" alt="Phi-3.5 running on WebGPU" /> | <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/transformersjs-v3/whisper-turbo-webgpu.gif" style="max-height: 500px;" alt="Whisper Turbo running on WebGPU" /> |
|---|---|
As of today's release, the community has converted over 1200 models to be compatible with Transformers.js! You can find the full list of available models here.
If you'd like to convert your own models or fine-tunes, you can use our conversion script as follows:
python -m scripts.convert --quantize --model_id <model_name_or_path>
After uploading the generated files to the Hugging Face Hub, remember to add the transformers.js tag so others can easily find and use your model!
Transformers.js v3 is now compatible with the three most popular server-side JavaScript runtimes:
| Runtime | Description | Examples |
|---|---|---|
| Node.js | A widely used JavaScript runtime built on Chrome's V8. It has a large ecosystem and supports a wide range of libraries and frameworks. | ESM Example / CJS Example |
| Deno | A modern runtime for JavaScript and TypeScript that is secure by default. It uses ES modules and even features experimental WebGPU support. | Deno Example |
| Bun | A fast JavaScript runtime optimized for performance. It features a built-in bundler, transpiler, and package manager. | Bun Example |
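If a single script needs to behave differently across these runtimes (for example, to log which one it is running on), the runtime can be detected from well-known globals. The sketch below is one way to do it, not something the library provides: `detectRuntime` is a hypothetical helper that checks for the `Bun` and `Deno` globals and for `process.versions.node`.

```javascript
// Returns "bun", "deno", "node", or "unknown" for the given global object.
// Bun and Deno are checked first because both can expose a Node-compatible `process`.
function detectRuntime(g = globalThis) {
  if (typeof g.Bun !== "undefined") return "bun";
  if (typeof g.Deno !== "undefined") return "deno";
  if (g.process?.versions?.node) return "node";
  return "unknown";
}

console.log(detectRuntime());
```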
Finally, we're delighted to announce that Transformers.js will now be published under the official Hugging Face organization on NPM as @huggingface/transformers (instead of @xenova/transformers, which was used for v1 and v2).
We've also moved the repository to the official Hugging Face organization on GitHub (https://github.com/huggingface/transformers.js), which will be our new home — come say hi! We look forward to hearing your feedback, responding to your issues, and reviewing your PRs!
This is a significant milestone and we're extremely grateful to the community for helping us achieve this long-term goal! None of this would be possible without all of you… thank you! 🤗
Full Changelog: https://github.com/huggingface/transformers.js/compare/2.17.2...3.0.0