Add support for visualizing self-attention heatmaps in https://github.com/huggingface/transformers.js/pull/1117
<table> <tr> <td rowspan="2"> <img src="https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg" alt="Cats" width="200"> </td> <td> <img src="https://github.com/user-attachments/assets/928c3d97-2c67-4ddb-9e9c-2a06745a532f" alt="Attention Head 0" width="200"> </td> <td> <img src="https://github.com/user-attachments/assets/e7725424-10fd-4a47-8350-8f367d21657d" alt="Attention Head 1" width="200"> </td> <td> <img src="https://github.com/user-attachments/assets/81790060-f4bf-4e5c-8d35-a9246acb9a36" alt="Attention Head 2" width="200"> </td> </tr> <tr> <td> <img src="https://github.com/user-attachments/assets/ebe44550-8a40-4e17-84eb-75fe6fce5df5" alt="Attention Head 3" width="200"> </td> <td> <img src="https://github.com/user-attachments/assets/32439d8d-7798-40e2-a4aa-d0e109afe1b5" alt="Attention Head 4" width="200"> </td> <td> <img src="https://github.com/user-attachments/assets/2faff471-fba1-4456-8332-e66a4a05bc5d" alt="Attention Head 5" width="200"> </td> </tr> </table>

<details>
<summary>Example code</summary>

```js
import { AutoProcessor, AutoModelForImageClassification, interpolate_4d, RawImage } from "@huggingface/transformers";

// Load model and processor
const model_id = "onnx-community/dinov2-with-registers-small-with-attentions";
const model = await AutoModelForImageClassification.from_pretrained(model_id);
const processor = await AutoProcessor.from_pretrained(model_id);

// Load image from URL
const image = await RawImage.read("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg");

// Pre-process image
const inputs = await processor(image);

// Perform inference
const { logits, attentions } = await model(inputs);

// Get the predicted class
const cls = logits[0].argmax().item();
const label = model.config.id2label[cls];
console.log(`Predicted class: ${label}`);

// Set config values
const patch_size = model.config.patch_size;
// pixel_values has shape [batch, channels, height, width]
const [height, width] = inputs.pixel_values.dims.slice(-2);
const h_featmap = Math.floor(height / patch_size);
const w_featmap = Math.floor(width / patch_size);
const num_heads = model.config.num_attention_heads;
const num_cls_tokens = 1;
const num_register_tokens = model.config.num_register_tokens ?? 0;

// Visualize attention maps
const selected_attentions = attentions
  .at(-1) // we are only interested in the attention maps of the last layer
  .slice(0, null, 0, [num_cls_tokens + num_register_tokens, null])
  .view(num_heads, 1, h_featmap, w_featmap);
const upscaled = await interpolate_4d(selected_attentions, {
  size: [height, width],
  mode: "nearest",
});

for (let i = 0; i < num_heads; ++i) {
  const head_attentions = upscaled[i];
  const minval = head_attentions.min().item();
  const maxval = head_attentions.max().item();
  const image = RawImage.fromTensor(
    head_attentions
      .sub_(minval)
      .div_(maxval - minval)
      .mul_(255)
      .to("uint8"),
  );
  await image.save(`attn-head-${i}.png`);
}
```

</details>
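The per-head normalization at the end of the example above (subtract the minimum, divide by the range, scale to 255) can be sketched in plain JavaScript. This is an illustration of the arithmetic only, not the library's tensor implementation; `normalizeToUint8` is a hypothetical helper:

```javascript
// Min-max normalize a flat attention map to the 0-255 range,
// mirroring the sub_/div_/mul_/to("uint8") chain in the example.
function normalizeToUint8(values) {
  const minval = Math.min(...values);
  const maxval = Math.max(...values);
  const range = maxval - minval || 1; // avoid division by zero for constant maps
  return Uint8Array.from(values, (v) => Math.round(((v - minval) / range) * 255));
}

// The smallest value maps to 0, the largest to 255.
console.log(normalizeToUint8([0, 1, 3])); // Uint8Array [ 0, 85, 255 ]
```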
Add `min`, `max`, `argmin`, and `argmax` tensor ops for `dim=null`
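With `dim=null`, these reductions collapse the whole tensor to a single value (a flat index for `argmin`/`argmax`), as used by `logits[0].argmax().item()` in the example above. A minimal plain-JavaScript sketch of the semantics, not the library's implementation:

```javascript
// Full-tensor (dim = null) argmax: scans every element of the flat
// data buffer, ignoring the tensor's shape.
function argmax(data) {
  let best = 0;
  for (let i = 1; i < data.length; ++i) {
    if (data[i] > data[best]) best = i;
  }
  return best; // flat index of the maximum element
}

const logits = [1.2, -0.5, 3.7, 0.9];
console.log(argmax(logits)); // → 2
console.log(Math.max(...logits)); // → 3.7
```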
Add support for nearest-neighbour interpolation in `interpolate_4d`
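Nearest-neighbour mode copies the closest source pixel rather than blending neighbours, which is why each patch in the attention maps above appears as a solid block. A minimal 2-D sketch in plain JavaScript (the library itself operates on 4-D NCHW tensors; `nearestResize` is a hypothetical helper for illustration):

```javascript
// Upscale an h×w grid to H×W by sampling the nearest source cell.
function nearestResize(src, h, w, H, W) {
  const out = new Float32Array(H * W);
  for (let y = 0; y < H; ++y) {
    const sy = Math.min(h - 1, Math.floor((y * h) / H));
    for (let x = 0; x < W; ++x) {
      const sx = Math.min(w - 1, Math.floor((x * w) / W));
      out[y * W + x] = src[sy * w + sx];
    }
  }
  return out;
}

// A 2×2 patch grid upscaled to 4×4: each value becomes a 2×2 block.
console.log(nearestResize([1, 2, 3, 4], 2, 2, 4, 4));
```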
Depth Estimation pipeline improvements (faster & returns resized depth map)
TypeScript improvements by @ocavue and @shrirajh in https://github.com/huggingface/transformers.js/pull/1081 and https://github.com/huggingface/transformers.js/pull/1122
Remove unused imports from tokenizers.js by @pratapvardhan in https://github.com/huggingface/transformers.js/pull/1116
Full Changelog: https://github.com/huggingface/transformers.js/compare/3.2.3...3.2.4