Add support for visualizing self-attention heatmaps in https://github.com/huggingface/transformers.js/pull/1117
<table> <tr> <td rowspan="2"> <img src="https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg" alt="Cats" width="200"> </td> <td> <img src="https://github.com/user-attachments/assets/928c3d97-2c67-4ddb-9e9c-2a06745a532f" alt="Attention Head 0" width="200"> </td> <td> <img src="https://github.com/user-attachments/assets/e7725424-10fd-4a47-8350-8f367d21657d" alt="Attention Head 1" width="200"> </td> <td> <img src="https://github.com/user-attachments/assets/81790060-f4bf-4e5c-8d35-a9246acb9a36" alt="Attention Head 2" width="200"> </td> </tr> <tr> <td> <img src="https://github.com/user-attachments/assets/ebe44550-8a40-4e17-84eb-75fe6fce5df5" alt="Attention Head 3" width="200"> </td> <td> <img src="https://github.com/user-attachments/assets/32439d8d-7798-40e2-a4aa-d0e109afe1b5" alt="Attention Head 4" width="200"> </td> <td> <img src="https://github.com/user-attachments/assets/2faff471-fba1-4456-8332-e66a4a05bc5d" alt="Attention Head 5" width="200"> </td> </tr> </table>

<details>
<summary>Example code</summary>

```js
import { AutoProcessor, AutoModelForImageClassification, interpolate_4d, RawImage } from "@huggingface/transformers";

// Load model and processor
const model_id = "onnx-community/dinov2-with-registers-small-with-attentions";
const model = await AutoModelForImageClassification.from_pretrained(model_id);
const processor = await AutoProcessor.from_pretrained(model_id);

// Load image from URL
const image = await RawImage.read("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg");

// Pre-process image
const inputs = await processor(image);

// Perform inference
const { logits, attentions } = await model(inputs);

// Get the predicted class
const cls = logits[0].argmax().item();
const label = model.config.id2label[cls];
console.log(`Predicted class: ${label}`);

// Set config values
const patch_size = model.config.patch_size;
// pixel_values has shape [batch, channels, height, width]
const [height, width] = inputs.pixel_values.dims.slice(-2);
const h_featmap = Math.floor(height / patch_size);
const w_featmap = Math.floor(width / patch_size);
const num_heads = model.config.num_attention_heads;
const num_cls_tokens = 1;
const num_register_tokens = model.config.num_register_tokens ?? 0;

// Visualize attention maps
const selected_attentions = attentions
  .at(-1) // we are only interested in the attention maps of the last layer
  .slice(0, null, 0, [num_cls_tokens + num_register_tokens, null])
  .view(num_heads, 1, h_featmap, w_featmap);
const upscaled = await interpolate_4d(selected_attentions, {
  size: [height, width],
  mode: "nearest",
});

for (let i = 0; i < num_heads; ++i) {
  const head_attentions = upscaled[i];
  const minval = head_attentions.min().item();
  const maxval = head_attentions.max().item();
  const image = RawImage.fromTensor(
    head_attentions
      .sub_(minval)
      .div_(maxval - minval)
      .mul_(255)
      .to("uint8"),
  );
  await image.save(`attn-head-${i}.png`);
}
```

</details>
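The per-head normalization at the end of the example above (subtract the minimum, divide by the range, scale to 255) can be sketched in plain JavaScript. This is an illustration of the arithmetic only, not the library's tensor implementation; `normalizeToUint8` is a hypothetical helper:

```javascript
// Min-max normalize a flat attention map to the 0-255 range,
// mirroring the sub_/div_/mul_/to("uint8") chain in the example.
function normalizeToUint8(values) {
  const minval = Math.min(...values);
  const maxval = Math.max(...values);
  const range = maxval - minval || 1; // avoid division by zero for constant maps
  return Uint8Array.from(values, (v) => Math.round(((v - minval) / range) * 255));
}

// The smallest value maps to 0, the largest to 255.
console.log(normalizeToUint8([0, 1, 3])); // Uint8Array [ 0, 85, 255 ]
```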
Add `min`, `max`, `argmin`, and `argmax` tensor ops for `dim=null`
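With `dim=null`, these reductions collapse the whole tensor to a single value (a flat index for `argmin`/`argmax`), as used by `logits[0].argmax().item()` in the example above. A minimal plain-JavaScript sketch of the semantics, not the library's implementation:

```javascript
// Full-tensor (dim = null) argmax: scans every element of the flat
// data buffer, ignoring the tensor's shape.
function argmax(data) {
  let best = 0;
  for (let i = 1; i < data.length; ++i) {
    if (data[i] > data[best]) best = i;
  }
  return best; // flat index of the maximum element
}

const logits = [1.2, -0.5, 3.7, 0.9];
console.log(argmax(logits)); // → 2
console.log(Math.max(...logits)); // → 3.7
```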
Add support for nearest-neighbour interpolation in `interpolate_4d`
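Nearest-neighbour mode copies the closest source pixel rather than blending neighbours, which is why each patch in the attention maps above appears as a solid block. A minimal 2-D sketch in plain JavaScript (the library itself operates on 4-D NCHW tensors; `nearestResize` is a hypothetical helper for illustration):

```javascript
// Upscale an h×w grid to H×W by sampling the nearest source cell.
function nearestResize(src, h, w, H, W) {
  const out = new Float32Array(H * W);
  for (let y = 0; y < H; ++y) {
    const sy = Math.min(h - 1, Math.floor((y * h) / H));
    for (let x = 0; x < W; ++x) {
      const sx = Math.min(w - 1, Math.floor((x * w) / W));
      out[y * W + x] = src[sy * w + sx];
    }
  }
  return out;
}

// A 2×2 patch grid upscaled to 4×4: each value becomes a 2×2 block.
console.log(nearestResize([1, 2, 3, 4], 2, 2, 4, 4));
```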
Depth Estimation pipeline improvements (faster & returns resized depth map)
TypeScript improvements by @ocavue and @shrirajh in https://github.com/huggingface/transformers.js/pull/1081 and https://github.com/huggingface/transformers.js/pull/1122
Remove unused imports from tokenizers.js by @pratapvardhan in https://github.com/huggingface/transformers.js/pull/1116
Full Changelog: https://github.com/huggingface/transformers.js/compare/3.2.3...3.2.4