
Transformers.js

Mar 30, 2026

🚀 Transformers.js v4

We're excited to announce that Transformers.js v4 is now available on NPM! After a year of development (we started in March 2025 🤯), we're finally ready for you to use it.

npm i @huggingface/transformers

Links: YouTube Video, Blog Post, Demo Collection

New WebGPU backend

The biggest change is undoubtedly the adoption of a new WebGPU Runtime, completely rewritten in C++. We've worked closely with the ONNX Runtime team to thoroughly test this runtime across our ~200 supported model architectures, as well as many new v4-exclusive architectures.

In addition to better operator support (for performance, accuracy, and coverage), this new WebGPU runtime allows the same transformers.js code to be used across a wide variety of JavaScript environments, including browsers, server-side runtimes, and desktop applications. That's right, you can now run WebGPU-accelerated models directly in Node, Bun, and Deno!

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformersjs-v4/webgpu.png" alt="WebGPU Overview" width="100%">

We've proven that it's possible to run state-of-the-art AI models 100% locally in the browser, and now we're focused on performance: making these models run as fast as possible, even in resource-constrained environments. This required completely rethinking our export strategy, especially for large language models. We achieve this by re-implementing new models operation by operation, leveraging specialized ONNX Runtime Contrib Operators like com.microsoft.GroupQueryAttention, com.microsoft.MatMulNBits, or com.microsoft.QMoE to maximize performance.

For example, by adopting the com.microsoft.MultiHeadAttention operator, we achieved a ~4x speedup for BERT-based embedding models.

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformersjs-v4/speedups.png" alt="Optimized ONNX Exports" width="100%">

New models

Thanks to our new export strategy and ONNX Runtime's expanding support for custom operators, we've been able to add many new models and architectures to Transformers.js v4. These include popular models like GPT-OSS, Chatterbox, GraniteMoeHybrid, LFM2-MoE, HunYuanDenseV1, Apertus, Olmo3, FalconH1, and Youtu-LLM. Many of these required us to implement support for advanced architectural patterns, including Mamba (state-space models), Multi-head Latent Attention (MLA), and Mixture of Experts (MoE). Perhaps most importantly, these models are all compatible with WebGPU, allowing users to run them directly in the browser or server-side JavaScript environments with hardware acceleration. We've released several Transformers.js v4 demos so far... and we'll continue to release more!

Additionally, we've added support for larger models exceeding 8B parameters. In our tests, we've been able to run GPT-OSS 20B (q4f16) at ~60 tokens per second on an M4 Max.

New features

ModelRegistry

The new ModelRegistry API is designed for production workflows. It provides explicit visibility into pipeline assets before loading anything: list required files with get_pipeline_files, inspect per-file metadata with get_file_metadata (useful for calculating the total download size), check cache status with is_pipeline_cached, and clear cached artifacts with clear_pipeline_cache. You can also query available precision types for a model with get_available_dtypes. Based on this new API, progress_callback now includes a progress_total event, making it easy to render end-to-end loading progress without manually aggregating per-file updates.

<details> <summary>See `ModelRegistry` examples</summary>
import { ModelRegistry, pipeline } from "@huggingface/transformers";

const modelId = "onnx-community/all-MiniLM-L6-v2-ONNX";
const modelOptions = { dtype: "fp32" };

const files = await ModelRegistry.get_pipeline_files(
  "feature-extraction",
  modelId,
  modelOptions
);
// ['config.json', 'onnx/model.onnx', ..., 'tokenizer_config.json']

const metadata = await Promise.all(
  files.map(file => ModelRegistry.get_file_metadata(modelId, file))
);

const downloadSize = metadata.reduce((total, item) => total + item.size, 0);

const cached = await ModelRegistry.is_pipeline_cached(
  "feature-extraction",
  modelId,
  modelOptions
);

const dtypes = await ModelRegistry.get_available_dtypes(modelId);
// ['fp32', 'fp16', 'q4', 'q4f16']

if (cached) {
  await ModelRegistry.clear_pipeline_cache(
    "feature-extraction",
    modelId,
    modelOptions
  );
}

const pipe = await pipeline(
  "feature-extraction",
  modelId,
  {
    progress_callback: e => {
      if (e.status === "progress_total") {
        console.log(`${Math.round(e.progress)}%`);
      }
    },
  }
);
</details>

New Environment Settings

We also added new environment controls for model loading. env.useWasmCache enables caching of WASM runtime files (when cache storage is available), allowing applications to work fully offline after the initial load.

env.fetch lets you provide a custom fetch implementation for use cases such as authenticated model access, custom headers, and abortable requests.

<details> <summary>See env examples</summary>
import { env } from "@huggingface/transformers";

env.useWasmCache = true;

env.fetch = (url, options) =>
  fetch(url, {
    ...options,
    headers: {
      ...options?.headers,
      Authorization: `Bearer ${MY_TOKEN}`,
    },
  });
</details>

Improved Logging Controls

Finally, logging is easier to manage in real-world deployments. ONNX Runtime WebGPU warnings are now hidden by default, and you can set explicit verbosity levels for both Transformers.js and ONNX Runtime. This update, also driven by community feedback, keeps console output focused on actionable signals rather than low-value noise.

<details> <summary>See `logLevel` example</summary>
import { env, LogLevel } from "@huggingface/transformers";

// LogLevel.DEBUG
// LogLevel.INFO
// LogLevel.WARNING
// LogLevel.ERROR
// LogLevel.NONE

env.logLevel = LogLevel.WARNING;
</details>

Repository Restructuring

Developing a new major version gave us the opportunity to invest in the codebase and tackle long-overdue refactoring efforts.

PNPM Workspaces

Until now, the GitHub repository served as our npm package. This worked well as long as the repository only exposed a single library. However, looking to the future, we saw the need for various sub-packages that depend heavily on the Transformers.js core while addressing different use cases, like library-specific implementations, or smaller utilities that most users don't need but are essential for some.

That's why we converted the repository to a monorepo using pnpm workspaces. This allows us to ship smaller packages that depend on @huggingface/transformers without the overhead of maintaining separate repositories.
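A layout like this is declared with a small `pnpm-workspace.yaml` at the repository root; the glob below is a generic sketch of the pattern, not necessarily the repository's exact configuration:

```yaml
# pnpm-workspace.yaml — illustrative workspace layout; actual
# package directories in the repository may differ.
packages:
  - "packages/*"
```

Sub-packages can then depend on the core via the workspace protocol (e.g. `"@huggingface/transformers": "workspace:*"`), so they always build against the local source while still publishing with a regular version range.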

Modular Class Structure

Another major refactoring effort targeted the ever-growing models.js file. In v3, all available models were defined in a single file spanning over 8,000 lines, which had become increasingly difficult to maintain. For v4, we split this into smaller, focused modules with a clear distinction between utility functions, core logic, and model-specific implementations. This new structure improves readability and makes it much easier to add new models. Developers can now focus on model-specific logic without navigating through thousands of lines of unrelated code.

Examples Repository

In v3, many Transformers.js example projects lived directly in the main repository. For v4, we've moved them to a dedicated repository, allowing us to maintain a cleaner codebase focused on the core library. This also makes it easier for users to find and contribute to examples without sifting through the main repository.

Prettier

We updated the Prettier configuration and reformatted all files in the repository. This ensures consistent formatting throughout the codebase, with all future PRs automatically following the same style. No more debates about formatting... Prettier handles it all, keeping the code clean and readable for everyone.

Standalone Tokenizers.js Library

A frequent request from users was to extract the tokenization logic into a separate library, and with v4, that's exactly what we've done. @huggingface/tokenizers is a complete refactor of the tokenization logic, designed to work seamlessly across browsers and server-side runtimes. At just 8.8kB (gzipped) with zero dependencies, it's incredibly lightweight while remaining fully type-safe.

<details> <summary>See example code</summary>
import { Tokenizer } from "@huggingface/tokenizers";

// Load from Hugging Face Hub
const modelId = "HuggingFaceTB/SmolLM3-3B";
const tokenizerJson = await fetch(
  `https://huggingface.co/${modelId}/resolve/main/tokenizer.json`
).then(res => res.json());

const tokenizerConfig = await fetch(
  `https://huggingface.co/${modelId}/resolve/main/tokenizer_config.json`
).then(res => res.json());

// Create tokenizer
const tokenizer = new Tokenizer(tokenizerJson, tokenizerConfig);

// Tokenize text
const tokens = tokenizer.tokenize("Hello World");
// ['Hello', 'ĠWorld']

const encoded = tokenizer.encode("Hello World");
// { ids: [9906, 4435], tokens: ['Hello', 'ĠWorld'], ... }
</details>

This separation keeps the core of Transformers.js focused and lean while offering a versatile, standalone tool that any WebML project can use independently.

New build system

We've migrated our build system from Webpack to esbuild, and the results have been incredible. Build times dropped from 2 seconds to just 200 milliseconds, a 10x improvement that makes development iteration significantly faster. Speed isn't the only benefit, though: bundle sizes also decreased by an average of 10% across all builds. The most notable improvement is in transformers.web.js, our default export, which is now 53% smaller, meaning faster downloads and quicker startup times for users.

Improved types

We've made several quality-of-life improvements across the library. The type system has been enhanced with dynamic pipeline types that adapt based on inputs, providing better developer experience and type safety.

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformersjs-v4/types.png" alt="Type Improvements" width="100%">
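The idea can be illustrated with a simplified conditional type; this is a sketch only, not the library's actual type definitions, and `Classification`/`classify` are hypothetical names:

```typescript
// Simplified illustration of input-dependent return types: a single
// string yields one result, a string[] yields an array of results.
// NOT the library's real definitions.
interface Classification {
  label: string;
  score: number;
}

function classify<T extends string | string[]>(
  input: T,
): T extends string[] ? Classification[] : Classification {
  const run = (text: string): Classification => ({
    label: text.length > 5 ? "LONG" : "SHORT",
    score: 1,
  });
  // Cast needed: TypeScript cannot narrow the conditional return
  // type inside the function body.
  return (Array.isArray(input) ? input.map(run) : run(input)) as never;
}

const one = classify("hi"); // typed as Classification
const many = classify(["hello there", "ok"]); // typed as Classification[]
```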

Bug fixes

Documentation improvements

Miscellaneous improvements

New Contributors

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.8.1...4.0.0

Nov 19, 2025

🚀 Transformers.js v3.8 — SAM2, SAM3, EdgeTAM, Supertonic TTS

New Contributors

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.7.6...3.8.0

Oct 20, 2025

What's new?

New Contributors

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.7.5...3.7.6

Sep 29, 2025

What's new?

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.7.3...3.7.4

Sep 12, 2025

What's new?

New Contributors

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.7.2...3.7.3

Aug 15, 2025

What's new?

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.7.1...3.7.2

Aug 1, 2025
Jul 23, 2025

🚀 Transformers.js v3.7 — Voxtral, LFM2, ModernBERT Decoder

🤖 New models

This update adds support for 3 new architectures:

  • Voxtral
  • LFM2
  • ModernBERT Decoder
<h3 id="new-models-voxtral">Voxtral</h3>

Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. ONNX weights for Voxtral-Mini-3B-2507 can be found here. Learn more about Voxtral in the release blog post.

Try it out with our online demo:

https://github.com/user-attachments/assets/e1b95fe1-461d-4cb9-8fe8-5ec17e6c93f0

Example: Audio transcription

import { VoxtralForConditionalGeneration, VoxtralProcessor, TextStreamer, read_audio } from "@huggingface/transformers";

// Load the processor and model
const model_id = "onnx-community/Voxtral-Mini-3B-2507-ONNX";
const processor = await VoxtralProcessor.from_pretrained(model_id);
const model = await VoxtralForConditionalGeneration.from_pretrained(
    model_id,
    {
        dtype: {
            embed_tokens: "fp16", // "fp32", "fp16", "q8", "q4"
            audio_encoder: "q4", // "fp32", "fp16", "q8", "q4", "q4f16"
            decoder_model_merged: "q4", // "q4", "q4f16"
        },
        device: "webgpu",
    },
);

// Prepare the conversation
const conversation = [
    {
        "role": "user",
        "content": [
            { "type": "audio" },
            { "type": "text", "text": "lang:en [TRANSCRIBE]" },
        ],
    }
];
const text = processor.apply_chat_template(conversation, { tokenize: false });
const audio = await read_audio("http://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/mlk.wav", 16000);
const inputs = await processor(text, audio);

// Generate the response
const generated_ids = await model.generate({
    ...inputs,
    max_new_tokens: 256,
    streamer: new TextStreamer(processor.tokenizer, { skip_special_tokens: true, skip_prompt: true }),
});

// Decode the generated tokens
const new_tokens = generated_ids.slice(null, [inputs.input_ids.dims.at(-1), null]);
const generated_texts = processor.batch_decode(
    new_tokens,
    { skip_special_tokens: true },
);
console.log(generated_texts[0]);
// I have a dream that one day this nation will rise up and live out the true meaning of its creed.

Added in https://github.com/huggingface/transformers.js/pull/1373 and https://github.com/huggingface/transformers.js/pull/1375.

<h3 id="new-models-lfm2">LFM2</h3>

LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency.

The models, which we have converted to ONNX, come in three different sizes: 350M, 700M, and 1.2B parameters.

Example: Text-generation with LFM2-350M:

import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/LFM2-350M-ONNX",
  { dtype: "q4" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
];

// Generate a response
const output = await generator(messages, {
    max_new_tokens: 512,
    do_sample: false,
    streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text.at(-1).content);
// The capital of France is Paris. It is a vibrant city known for its historical landmarks, art, fashion, and gastronomy.

Added in https://github.com/huggingface/transformers.js/pull/1367 and https://github.com/huggingface/transformers.js/pull/1369.

<h3 id="new-models-modernbert-decoder">ModernBERT Decoder</h3>

These models form part of the Ettin suite: the first collection of paired encoder-only and decoder-only models trained with identical data, architecture, and training recipes. Ettin enables fair comparisons between encoder and decoder architectures across multiple scales, providing state-of-the-art performance for open-data models in their respective size categories.

The list of supported models can be found here.

import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/ettin-decoder-150m-ONNX",
  { dtype: "fp32" },
);

// Generate a response
const text = "Q: What is the capital of France?\nA:";
const output = await generator(text, {
  max_new_tokens: 128,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text);

Added in https://github.com/huggingface/transformers.js/pull/1371.

🛠️ Other improvements

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.6.3...3.7.0

Jul 11, 2025

What's new?

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.6.2...3.6.3

Jul 8, 2025

What's new?

  • Add support for SmolLM3 in https://github.com/huggingface/transformers.js/pull/1359

    SmolLM3 is a 3B parameter language model designed to push the boundaries of small models. It supports 6 languages, advanced reasoning and long context. SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale.

    <img src="https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/zy0dqTCCt5IHmuzwoqtJ9.png" />

    Example:

    import { pipeline, TextStreamer } from "@huggingface/transformers";
    
    // Create a text generation pipeline
    const generator = await pipeline(
      "text-generation",
      "HuggingFaceTB/SmolLM3-3B-ONNX",
      { dtype: "q4f16" },
    );
    
    // Define the list of messages
    const messages = [
      { role: "system", content: "You are SmolLM, a language model created by Hugging Face. If asked by the user, here is some information about you: SmolLM has 3 billion parameters and can converse in 6 languages: English, Spanish, German, French, Italian, and Portuguese. SmolLM is a fully open model and was trained on a diverse mix of public datasets./think" },
      { role: "user", content: "Solve the equation x^2 - 3x + 2 = 0" },
    ];
    
    // Generate a response
    const output = await generator(messages, {
        max_new_tokens: 1024,
        do_sample: false,
        streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
    });
    console.log(output[0].generated_text.at(-1).content);
  • Add support for ERNIE-4.5 in https://github.com/huggingface/transformers.js/pull/1354 Example:

    import { pipeline, TextStreamer } from "@huggingface/transformers";
    
    // Create a text generation pipeline
    const generator = await pipeline(
      "text-generation",
      "onnx-community/ERNIE-4.5-0.3B-ONNX",
      { dtype: "fp32" }, // Options: "fp32", "fp16", "q8", "q4", "q4f16"
    );
    
    // Define the list of messages
    const messages = [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "What is the capital of France?" },
    ];
    
    // Generate a response
    const output = await generator(messages, {
        max_new_tokens: 512,
        do_sample: false,
        streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
    });
    console.log(output[0].generated_text.at(-1).content);
    // The capital of France is Paris.

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.6.1...3.6.2

Jul 2, 2025

What's new?

New Contributors

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.6.0...3.6.1

Jun 26, 2025

🚀 Transformers.js v3.6 — Gemma 3n, Qwen3-Embedding, Llava-Qwen2

  • 🤖 New models

    • Gemma 3n
    • Qwen3-Embedding
    • Llava-Qwen2
  • 🛠️ Other improvements

<h2 id="new-models">🤖 New models</h2> <h3 id="new-models-gemma3n">Gemma 3n</h3>

Gemma 3n, which was announced as a preview during Google I/O, is designed from the ground up to run locally on your hardware. On top of that, it's natively multimodal, supporting image, text, audio, and video inputs 🤯

Gemma 3n models have multiple architecture innovations:

  • They are available in two sizes based on effective parameters. While the raw parameter count of this model is 6B, the architecture design allows the model to be run with a memory footprint comparable to a traditional 2B model by offloading low-utilization matrices from the accelerator.
  • They use a MatFormer architecture that allows nesting sub-models within the E4B model. We provide one extracted sub-model (the E2B variant), or you can access a spectrum of custom-sized models using the Mix-and-Match method.

Learn more about these techniques in the technical blog post and the Gemma documentation.

As part of the release, we are releasing ONNX weights for the gemma-3n-E2B-it variant (link), making it compatible with Transformers.js:

[!WARNING]
Due to the model's large size, we currently only support Node.js, Deno, and Bun execution. In-browser WebGPU support is actively being worked on, so stay tuned for an update!

Example: Caption an image

import {
  AutoProcessor,
  AutoModelForImageTextToText,
  load_image,
  TextStreamer,
} from "@huggingface/transformers";

// Load processor and model
const model_id = "onnx-community/gemma-3n-E2B-it-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await AutoModelForImageTextToText.from_pretrained(model_id, {
  dtype: {
    embed_tokens: "q8",
    audio_encoder: "q8",
    vision_encoder: "fp16",
    decoder_model_merged: "q4",
  },
  device: "cpu", // NOTE: WebGPU support coming soon!
});

// Prepare prompt
const messages = [
  {
    role: "user",
    content: [
      { type: "image" },
      { type: "text", text: "Describe this image in detail." },
    ],
  },
];
const prompt = processor.apply_chat_template(messages, {
  add_generation_prompt: true,
});

// Prepare inputs
const url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg";
const image = await load_image(url);
const audio = null;
const inputs = await processor(prompt, image, audio, {
  add_special_tokens: false,
});

// Generate output
const outputs = await model.generate({
  ...inputs,
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(processor.tokenizer, {
    skip_prompt: true,
    skip_special_tokens: false,
    // callback_function: (text) => { /* Do something with the streamed output */ },
  }),
});

// Decode output
const decoded = processor.batch_decode(
  outputs.slice(null, [inputs.input_ids.dims.at(-1), null]),
  { skip_special_tokens: true },
);
console.log(decoded[0]);
<details> <summary>See example output</summary>
The image is a close-up, slightly macro shot of a cluster of vibrant pink cosmos flowers in full bloom. The flowers are the focal point, with their delicate, slightly ruffled petals radiating outwards. They have a soft, almost pastel pink hue, and their edges are subtly veined. 

A small, dark-colored bee is actively visiting one of the pink flowers, its body positioned near the center of the bloom. The bee appears to be collecting pollen or nectar. 

The flowers are attached to slender, brownish-green stems, and some of the surrounding foliage is visible in a blurred background, suggesting a natural outdoor setting. There are also hints of other flowers in the background, including some red ones, adding a touch of contrast to the pink. 

The lighting in the image seems to be natural daylight, casting soft shadows and highlighting the textures of the petals and the bee. The overall impression is one of delicate beauty and the gentle activity of nature.
</details>

Example: Transcribe audio

import {
  AutoProcessor,
  AutoModelForImageTextToText,
  TextStreamer,
} from "@huggingface/transformers";
import wavefile from "wavefile";

// Load processor and model
const model_id = "onnx-community/gemma-3n-E2B-it-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await AutoModelForImageTextToText.from_pretrained(model_id, {
  dtype: {
    embed_tokens: "q8",
    audio_encoder: "q4",
    vision_encoder: "fp16",
    decoder_model_merged: "q4",
  },
  device: "cpu", // NOTE: WebGPU support coming soon!
});

// Prepare prompt
const messages = [
  {
    role: "user",
    content: [
      { type: "audio" },
      { type: "text", text: "Transcribe this audio verbatim." },
    ],
  },
];
const prompt = processor.apply_chat_template(messages, {
  add_generation_prompt: true,
});

// Prepare inputs
const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
const buffer = Buffer.from(await fetch(url).then((x) => x.arrayBuffer()));
const wav = new wavefile.WaveFile(buffer);
wav.toBitDepth("32f"); // Pipeline expects input as a Float32Array
wav.toSampleRate(processor.feature_extractor.config.sampling_rate);
let audioData = wav.getSamples();
if (Array.isArray(audioData)) {
  if (audioData.length > 1) {
    for (let i = 0; i < audioData[0].length; ++i) {
      audioData[0][i] = (Math.sqrt(2) * (audioData[0][i] + audioData[1][i])) / 2;
    }
  }
  audioData = audioData[0];
}

const image = null;
const audio = audioData;
const inputs = await processor(prompt, image, audio, {
  add_special_tokens: false,
});

// Generate output
const outputs = await model.generate({
  ...inputs,
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(processor.tokenizer, {
    skip_prompt: true,
    skip_special_tokens: false,
    // callback_function: (text) => { /* Do something with the streamed output */ },
  }),
});

// Decode output
const decoded = processor.batch_decode(
  outputs.slice(null, [inputs.input_ids.dims.at(-1), null]),
  { skip_special_tokens: true },
);
console.log(decoded[0]);
<details> <summary>See example output</summary>
And so, my fellow Americans, ask not what your country can do for you. Ask what you can do for your country.
</details> <h3 id="new-models-qwen3-embed">Qwen3-Embedding</h3>

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model.

You can run it with Transformers.js as follows:

import { pipeline, matmul } from "@huggingface/transformers";

// Create a feature extraction pipeline
const extractor = await pipeline(
  "feature-extraction",
  "onnx-community/Qwen3-Embedding-0.6B-ONNX",
  {
    dtype: "fp32", // Options: "fp32", "fp16", "q8"
    // device: "webgpu",
  },
);

function get_detailed_instruct(task_description, query) {
  return `Instruct: ${task_description}\nQuery:${query}`;
}

// Each query must come with a one-sentence instruction that describes the task
const task = "Given a web search query, retrieve relevant passages that answer the query";
const queries = [
  get_detailed_instruct(task, "What is the capital of China?"),
  get_detailed_instruct(task, "Explain gravity"),
];

// No need to add instruction for retrieval documents
const documents = [
  "The capital of China is Beijing.",
  "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
];
const input_texts = [...queries, ...documents];

// Extract embeddings for queries and documents
const output = await extractor(input_texts, {
  pooling: "last_token",
  normalize: true,
});
const scores = await matmul(
  output.slice([0, queries.length]), // Query embeddings
  output.slice([queries.length, null]).transpose(1, 0), // Document embeddings
);
console.log(scores.tolist());
// [
//   [ 0.7645590305328369, 0.14142560958862305 ],
//   [ 0.13549776375293732, 0.599955141544342 ]
// ]
<h3 id="new-models-llava-qwen2">Llava-Qwen2</h3>

Finally, we also added support for Llava models with a Qwen2 text backbone:

import {
  AutoProcessor,
  AutoModelForImageTextToText,
  load_image,
  TextStreamer,
} from "@huggingface/transformers";

// Load processor and model
const model_id = "onnx-community/FastVLM-0.5B-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await AutoModelForImageTextToText.from_pretrained(model_id, {
  dtype: {
    embed_tokens: "fp16",
    vision_encoder: "q4",
    decoder_model_merged: "q4",
  },
});

// Prepare prompt
const messages = [
  {
    role: "user",
    content: "<image>Describe this image in detail.",
  },
];
const prompt = processor.apply_chat_template(messages, {
  add_generation_prompt: true,
});

// Prepare inputs
const url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg";
const image = await load_image(url);
const inputs = await processor(image, prompt, {
  add_special_tokens: false,
});

// Generate output
const outputs = await model.generate({
  ...inputs,
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(processor.tokenizer, {
    skip_prompt: true,
    skip_special_tokens: false,
    // callback_function: (text) => { /* Do something with the streamed output */ },
  }),
});

// Decode output
const decoded = processor.batch_decode(
  outputs.slice(null, [inputs.input_ids.dims.at(-1), null]),
  { skip_special_tokens: true },
);
console.log(decoded[0]);
<details> <summary>See here for example output</summary>
The image depicts a vibrant and colorful scene featuring a variety of flowers and plants. The main focus is on a striking pink flower with a dark center, which appears to be a type of petunia. The petals are a rich, deep pink, and the flower has a classic, slightly ruffled appearance. The dark center of the flower is a contrasting color, likely a deep purple or black, which adds to the flower's visual appeal.

In the background, there are several other flowers and plants, each with their unique colors and shapes. To the left, there is a red flower with a bright, vivid hue, which stands out against the pink flower. The red flower has a more rounded shape and a lighter center, with petals that are a lighter shade of red compared to the pink flower.

To the right of the pink flower, there is a plant with red flowers, which are smaller and more densely packed. The red flowers are a deep, rich red color, and they have a more compact shape compared to the pink flower.

In the foreground, there is a green plant with a few leaves and a few small flowers. The leaves are a bright green color, and the flowers are a lighter shade of green, with a few petals that are slightly open.

Overall, the image is a beautiful representation of a garden or natural setting, with a variety of flowers and plants that are in full bloom. The colors are vibrant and the composition is well-balanced, with the pink flower in the center drawing the viewer's attention.
</details> <h2 id="other-improvements">🛠️ Other improvements</h2>

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.5.2...3.6.0

May 30, 2025

What's new?

New Contributors

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.5.1...3.5.2

May 3, 2025

What's new?

New Contributors

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.5.0...3.5.1

Apr 16, 2025

🔥 Transformers.js v3.5

<h2 id="improvements">🛠️ Improvements</h2> <h2 id="new-contributors">🤗 New contributors</h2>

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.4.2...3.5.0

Apr 2, 2025

What's new?

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.4.1...3.4.2

Mar 25, 2025

What's new?

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.4.0...3.4.1

Mar 7, 2025

🚀 Transformers.js v3.4 — Background Removal Pipeline, Ultravox DAC, Mimi, SmolVLM2, LiteWhisper.

  • 🖼️ Background Removal Pipeline
  • 🤖 New models: Ultravox DAC, Mimi, SmolVLM2, LiteWhisper
  • 🛠️ Other improvements
  • 🤗 New contributors
<h2 id="new-pipeline">🖼️ New Background Removal Pipeline</h2>

Removing backgrounds from images is now as easy as:

import { pipeline } from "@huggingface/transformers";
const segmenter = await pipeline("background-removal", "onnx-community/BEN2-ONNX");
const output = await segmenter("input.png");
output[0].save("output.png"); // (Optional) Save the image

You can find the full list of compatible models here, and the list will continue to grow in the future! 🔥 For more information, check out https://github.com/huggingface/transformers.js/pull/1216.
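Conceptually, the pipeline's segmentation model predicts a per-pixel foreground matte, which becomes the alpha channel of the returned image. A toy sketch of that final compositing step in plain JavaScript (no library; `applyAlphaMatte` is a hypothetical helper, not part of the Transformers.js API):

```javascript
// Write a predicted foreground matte (0..1 per pixel) into the alpha
// channel of an RGBA pixel buffer, making background pixels transparent.
function applyAlphaMatte(rgbaPixels, matte) {
  const out = rgbaPixels.slice(); // copy, leave the input untouched
  for (let i = 0; i < matte.length; i++) {
    out[i * 4 + 3] = Math.round(matte[i] * 255); // alpha byte of pixel i
  }
  return out;
}

// 2x1 image: a red pixel and a green pixel, both fully opaque.
const pixels = new Uint8ClampedArray([255, 0, 0, 255, 0, 255, 0, 255]);
const matte = [1.0, 0.0]; // model says: keep pixel 0, remove pixel 1
const result = applyAlphaMatte(pixels, matte);
// result[3] === 255 (foreground kept), result[7] === 0 (background removed)
```

The real pipeline does this (plus mask post-processing) for you and hands back a saveable image, as in the snippet above.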

<h2 id="new-models">🤖 New models</h2>
  • Ultravox for audio-text-to-text generation (https://github.com/huggingface/transformers.js/pull/1207). See here for the list of supported models.

    <details> <summary> See example usage </summary>
    import { UltravoxProcessor, UltravoxModel, read_audio } from "@huggingface/transformers";
    
    const processor = await UltravoxProcessor.from_pretrained(
      "onnx-community/ultravox-v0_5-llama-3_2-1b-ONNX",
    );
    const model = await UltravoxModel.from_pretrained(
      "onnx-community/ultravox-v0_5-llama-3_2-1b-ONNX",
      {
        dtype: {
          embed_tokens: "q8", // "fp32", "fp16", "q8"
          audio_encoder: "q4", // "fp32", "fp16", "q8", "q4", "q4f16"
          decoder_model_merged: "q4", // "q8", "q4", "q4f16"
        },
      },
    );
    
    const audio = await read_audio("http://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/mlk.wav", 16000);
    const messages = [
      {
        role: "system",
        content: "You are a helpful assistant.",
      },
      { role: "user", content: "Transcribe this audio:<|audio|>" },
    ];
    const text = processor.tokenizer.apply_chat_template(messages, {
      add_generation_prompt: true,
      tokenize: false,
    });
    
    const inputs = await processor(text, audio);
    const generated_ids = await model.generate({
      ...inputs,
      max_new_tokens: 128,
    });
    
    const generated_texts = processor.batch_decode(
      generated_ids.slice(null, [inputs.input_ids.dims.at(-1), null]),
      { skip_special_tokens: true },
    );
    console.log(generated_texts[0]);
    // "I can transcribe the audio for you. Here's the transcription:\n\n\"I have a dream that one day this nation will rise up and live out the true meaning of its creed.\"\n\n- Martin Luther King Jr.\n\nWould you like me to provide the transcription in a specific format (e.g., word-for-word, character-for-character, or a specific font)?"
    </details>
  • DAC and Mimi for audio tokenization/neural audio codecs (https://github.com/huggingface/transformers.js/pull/1215). See here for the list of supported DAC models and here for the list of supported Mimi models.

    <details> <summary> See example usage </summary>

    DAC:

    import { DacModel, AutoFeatureExtractor } from '@huggingface/transformers';
    
    const model_id = "onnx-community/dac_16khz-ONNX";
    const model = await DacModel.from_pretrained(model_id);
    const feature_extractor = await AutoFeatureExtractor.from_pretrained(model_id);
    
    const audio_sample = new Float32Array(12000);
    
    // pre-process the inputs
    const inputs = await feature_extractor(audio_sample);
    {
        // explicitly encode then decode the audio inputs
        const encoder_outputs = await model.encode(inputs);
        const { audio_values } = await model.decode(encoder_outputs);
        console.log(audio_values);
    }
    
    {
        // or the equivalent with a forward pass
        const { audio_values } = await model(inputs);
        console.log(audio_values);
    }

    Mimi:

    import { MimiModel, AutoFeatureExtractor } from '@huggingface/transformers';
    
    const model_id = "onnx-community/kyutai-mimi-ONNX";
    const model = await MimiModel.from_pretrained(model_id);
    const feature_extractor = await AutoFeatureExtractor.from_pretrained(model_id);
    
    const audio_sample = new Float32Array(12000);
    
    // pre-process the inputs
    const inputs = await feature_extractor(audio_sample);
    {
        // explicitly encode then decode the audio inputs
        const encoder_outputs = await model.encode(inputs);
        const { audio_values } = await model.decode(encoder_outputs);
        console.log(audio_values);
    }
    
    {
        // or the equivalent with a forward pass
        const { audio_values } = await model(inputs);
        console.log(audio_values);
    }
    </details>
  • SmolVLM2, a lightweight multimodal model designed to analyze image and video content (https://github.com/huggingface/transformers.js/pull/1196). See here for the list of supported models. Usage is identical to SmolVLM.

  • LiteWhisper for automatic speech recognition (https://github.com/huggingface/transformers.js/pull/1219). See here for the list of supported models. Usage is identical to Whisper.
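
In the Ultravox example above, `generated_ids.slice(null, [inputs.input_ids.dims.at(-1), null])` strips the prompt tokens before decoding, because `generate` returns each sequence as prompt tokens followed by newly generated tokens. A plain-JavaScript sketch of the same trimming with arrays instead of tensors (`trimPrompt` and the token ids are illustrative, not library code):

```javascript
// generate() returns, per sequence, the prompt ids followed by the new
// ids; only the new ids should be decoded back to text.
function trimPrompt(generatedIds, promptLength) {
  return generatedIds.map((seq) => seq.slice(promptLength));
}

const promptIds = [101, 2054, 2003]; // 3 prompt tokens (hypothetical ids)
const generated = [[101, 2054, 2003, 7592, 999]]; // batch of one sequence
const newTokens = trimPrompt(generated, promptIds.length);
console.log(newTokens); // → [[7592, 999]]
```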

<h2 id="other-improvements">🛠️ Other improvements</h2>

<h2 id="new-contributors">🤗 New contributors</h2>

Full Changelog: https://github.com/huggingface/transformers.js/compare/3.3.3...3.4.0

Latest: 4.0.0
Tracking since: May 15, 2023
Last fetched: Apr 18, 2026