This release adds support for a bunch of new model architectures, covering a wide range of use cases! In total, we now support 73 different model architectures!
Example: Image matting w/ Xenova/vitmatte-small-distinctions-646.
```js
import { AutoProcessor, VitMatteForImageMatting, RawImage } from '@xenova/transformers';
// Load processor and model
const processor = await AutoProcessor.from_pretrained('Xenova/vitmatte-small-distinctions-646');
const model = await VitMatteForImageMatting.from_pretrained('Xenova/vitmatte-small-distinctions-646');
// Load image and trimap
const image = await RawImage.fromURL('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/vitmatte_image.png');
const trimap = await RawImage.fromURL('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/vitmatte_trimap.png');
// Prepare image + trimap for the model
const inputs = await processor(image, trimap);
// Predict alpha matte
const { alphas } = await model(inputs);
// Tensor {
// dims: [ 1, 1, 640, 960 ],
// type: 'float32',
// size: 614400,
// data: Float32Array(614400) [ 0.9894027709960938, 0.9970508813858032, ... ]
// }
```
<details>
<summary>Visualization code</summary>

```js
import { Tensor, cat } from '@xenova/transformers';
// Visualize predicted alpha matte
const imageTensor = new Tensor(
'uint8',
new Uint8Array(image.data),
[image.height, image.width, image.channels]
).transpose(2, 0, 1);
// Convert float (0-1) alpha matte to uint8 (0-255)
const alphaChannel = alphas
.squeeze(0)
.mul_(255)
.clamp_(0, 255)
.round_()
.to('uint8');
// Concatenate original image with predicted alpha
const imageData = cat([imageTensor, alphaChannel], 0);
// Save output image
const outputImage = RawImage.fromTensor(imageData);
outputImage.save('output.png');
```
</details>
Inputs: the example image and its trimap.
Outputs: the predicted alpha matte (quantized and unquantized).
Example: Protein sequence classification w/ Xenova/esm2_t6_8M_UR50D_sequence_classifier_v1.
```js
import { pipeline } from '@xenova/transformers';
// Create text classification pipeline
const classifier = await pipeline('text-classification', 'Xenova/esm2_t6_8M_UR50D_sequence_classifier_v1');
// Suppose these are your new sequences that you want to classify
// Additional Family 0: Enzymes
const new_sequences_0 = [ 'ACGYLKTPKLADPPVLRGDSSVTKAICKPDPVLEK', 'GVALDECKALDYLPGKPLPMDGKVCQCGSKTPLRP', 'VLPGYTCGELDCKPGKPLPKCGADKTQVATPFLRG', 'TCGALVQYPSCADPPVLRGSDSSVKACKKLDPQDK', 'GALCEECKLCPGADYKPMDGDRLPAAATSKTRPVG', 'PAVDCKKALVYLPKPLPMDGKVCRGSKTPKTRPYG', 'VLGYTCGALDCKPGKPLPKCGADKTQVATPFLRGA', 'CGALVQYPSCADPPVLRGSDSSVKACKKLDPQDKT', 'ALCEECKLCPGADYKPMDGDRLPAAATSKTRPVGK', 'AVDCKKALVYLPKPLPMDGKVCRGSKTPKTRPYGR' ]
// Additional Family 1: Receptor Proteins
const new_sequences_1 = [ 'VGQRFYGGRQKNRHCELSPLPSACRGSVQGALYTD', 'KDQVLTVPTYACRCCPKMDSKGRVPSTLRVKSARS', 'PLAGVACGRGLDYRCPRKMVPGDLQVTPATQRPYG', 'CGVRLGYPGCADVPLRGRSSFAPRACMKKDPRVTR', 'RKGVAYLYECRKLRCRADYKPRGMDGRRLPKASTT', 'RPTGAVNCKQAKVYRGLPLPMMGKVPRVCRSRRPY', 'RLDGGYTCGQALDCKPGRKPPKMGCADLKSTVATP', 'LGTCRKLVRYPQCADPPVMGRSSFRPKACCRQDPV', 'RVGYAMCSPKLCSCRADYKPPMGDGDRLPKAATSK', 'QPKAVNCRKAMVYRPKPLPMDKGVPVCRSKRPRPY' ]
// Additional Family 2: Structural Proteins
const new_sequences_2 = [ 'VGKGFRYGSSQKRYLHCQKSALPPSCRRGKGQGSAT', 'KDPTVMTVGTYSCQCPKQDSRGSVQPTSRVKTSRSK', 'PLVGKACGRSSDYKCPGQMVSGGSKQTPASQRPSYD', 'CGKKLVGYPSSKADVPLQGRSSFSPKACKKDPQMTS', 'RKGVASLYCSSKLSCKAQYSKGMSDGRSPKASSTTS', 'RPKSAASCEQAKSYRSLSLPSMKGKVPSKCSRSKRP', 'RSDVSYTSCSQSKDCKPSKPPKMSGSKDSSTVATPS', 'LSTCSKKVAYPSSKADPPSSGRSSFSMKACKKQDPPV', 'RVGSASSEPKSSCSVQSYSKPSMSGDSSPKASSTSK', 'QPSASNCEKMSSYRPSLPSMSKGVPSSRSKSSPPYQ' ]
// Merge all sequences
const new_sequences = [...new_sequences_0, ...new_sequences_1, ...new_sequences_2];
// Get the predicted class for each sequence
const predictions = await classifier(new_sequences);
// Output the predicted class for each sequence
for (let i = 0; i < predictions.length; ++i) {
console.log(`Sequence: ${new_sequences[i]}, Predicted class: '${predictions[i].label}'`)
}
// Sequence: ACGYLKTPKLADPPVLRGDSSVTKAICKPDPVLEK, Predicted class: 'Enzymes'
// ... (truncated)
// Sequence: AVDCKKALVYLPKPLPMDGKVCRGSKTPKTRPYGR, Predicted class: 'Enzymes'
// Sequence: VGQRFYGGRQKNRHCELSPLPSACRGSVQGALYTD, Predicted class: 'Receptor Proteins'
// ... (truncated)
// Sequence: QPKAVNCRKAMVYRPKPLPMDKGVPVCRSKRPRPY, Predicted class: 'Receptor Proteins'
// Sequence: VGKGFRYGSSQKRYLHCQKSALPPSCRRGKGQGSAT, Predicted class: 'Structural Proteins'
// ... (truncated)
// Sequence: QPSASNCEKMSSYRPSLPSMSKGVPSSRSKSSPPYQ, Predicted class: 'Structural Proteins'
```
Example: Speech command recognition w/ Xenova/hubert-base-superb-ks.
```js
import { pipeline } from '@xenova/transformers';
// Create audio classification pipeline
const classifier = await pipeline('audio-classification', 'Xenova/hubert-base-superb-ks');
// Classify audio
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speech-commands_down.wav';
const output = await classifier(url, { topk: 5 });
// [
// { label: 'down', score: 0.9954305291175842 },
// { label: 'go', score: 0.004518700763583183 },
// { label: '_unknown_', score: 0.00005029444946558215 },
// { label: 'no', score: 4.877569494965428e-7 },
// { label: 'stop', score: 5.504634081887616e-9 }
// ]
```
Example: Automatic speech recognition w/ Xenova/hubert-large-ls960-ft.
```js
import { pipeline } from '@xenova/transformers';
// Create automatic speech recognition pipeline
const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/hubert-large-ls960-ft');
// Transcribe audio
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
const output = await transcriber(url);
// { text: 'AND SO MY FELLOW AMERICA ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY' }
```
Example: Zero-shot image classification w/ Xenova/chinese-clip-vit-base-patch16.
```js
import { pipeline } from '@xenova/transformers';
// Create zero-shot image classification pipeline
const classifier = await pipeline('zero-shot-image-classification', 'Xenova/chinese-clip-vit-base-patch16');
// Set image url and candidate labels
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/pikachu.png';
const candidate_labels = ['杰尼龟', '妙蛙种子', '小火龙', '皮卡丘'] // Squirtle, Bulbasaur, Charmander, Pikachu in Chinese
// Classify image
const output = await classifier(url, candidate_labels);
console.log(output);
// [
// { score: 0.9926728010177612, label: '皮卡丘' }, // Pikachu
// { score: 0.003480620216578245, label: '妙蛙种子' }, // Bulbasaur
// { score: 0.001942147733643651, label: '杰尼龟' }, // Squirtle
// { score: 0.0019044597866013646, label: '小火龙' } // Charmander
// ]
```
Example: Image classification w/ Xenova/dinov2-small-imagenet1k-1-layer.
```js
import { pipeline } from '@xenova/transformers';
// Create image classification pipeline
const classifier = await pipeline('image-classification', 'Xenova/dinov2-small-imagenet1k-1-layer');
// Classify an image
const url = 'http://images.cocodataset.org/val2017/000000039769.jpg';
const output = await classifier(url);
console.log(output)
// [{ label: 'tabby, tabby cat', score: 0.8088238835334778 }]
```
Example: Feature extraction w/ Xenova/conv-bert-small.
```js
import { pipeline } from '@xenova/transformers';
// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/conv-bert-small');
// Perform feature extraction
const output = await extractor('This is a test sentence.');
console.log(output)
// Tensor {
// dims: [ 1, 8, 256 ],
// type: 'float32',
// data: Float32Array(2048) [ -0.09434918314218521, 0.5715903043746948, ... ],
// size: 2048
// }
```
Example: Feature extraction w/ Xenova/electra-small-discriminator.
```js
import { pipeline } from '@xenova/transformers';
// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/electra-small-discriminator');
// Perform feature extraction
const output = await extractor('This is a test sentence.');
console.log(output)
// Tensor {
// dims: [ 1, 8, 256 ],
// type: 'float32',
// data: Float32Array(2048) [ 0.5410046577453613, 0.18386700749397278, ... ],
// size: 2048
// }
```
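The outputs above are per-token hidden states. If you want a single fixed-size sentence embedding instead, the feature-extraction pipeline also accepts pooling options. A minimal sketch (the printed dims are illustrative):

```js
import { pipeline } from '@xenova/transformers';

// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/electra-small-discriminator');

// Mean-pool over tokens and L2-normalize to get one 256-dimensional sentence embedding
const embedding = await extractor('This is a test sentence.', { pooling: 'mean', normalize: true });
console.log(embedding.dims); // [ 1, 256 ]
```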
NOTE: This only adds support for the Phi architecture. Once the external data format is supported in ONNX Runtime, we will make an update that includes converted versions of the available Phi models.
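For reference, once converted checkpoints are published, loading a Phi model should work like any other text-generation model in the library. A minimal sketch (the model ID is hypothetical until the converted weights are available):

```js
import { pipeline } from '@xenova/transformers';

// Hypothetical model ID: converted Phi checkpoints are not yet published
const generator = await pipeline('text-generation', 'Xenova/phi-1_5');

// Generate a completion
const output = await generator('def fibonacci(n):', { max_new_tokens: 50 });
console.log(output[0].generated_text);
```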
In the last release, we added support for CLAP models (CLIP, but for audio). In this release, we're publishing a simple demo application that shows how to use a CLAP model to perform real-time semantic music search! For simplicity, we implemented everything in vanilla JavaScript, but feel free to adapt it to your framework of choice. As always, the source code is open source! 🥳 PR: https://github.com/xenova/transformers.js/pull/442
https://github.com/xenova/transformers.js/assets/26504141/72e09f8c-d6e9-4430-a56c-7994737966db
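If you'd rather wire this up yourself, the core of the demo boils down to embedding the query text and each track with CLAP's two projection heads, then ranking tracks by cosine similarity. A minimal sketch using the CLAP checkpoint from the previous release (the track URL is a placeholder):

```js
import {
    AutoTokenizer, AutoProcessor,
    ClapTextModelWithProjection, ClapAudioModelWithProjection,
    read_audio, cos_sim,
} from '@xenova/transformers';

// Load the tokenizer/processor and the text and audio towers
const model_id = 'Xenova/clap-htsat-unfused';
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const text_model = await ClapTextModelWithProjection.from_pretrained(model_id);
const processor = await AutoProcessor.from_pretrained(model_id);
const audio_model = await ClapAudioModelWithProjection.from_pretrained(model_id);

// Embed the search query
const text_inputs = tokenizer('calm piano music', { padding: true, truncation: true });
const { text_embeds } = await text_model(text_inputs);

// Embed a candidate track (CLAP expects 48kHz audio); placeholder URL
const audio = await read_audio('https://example.com/track.wav', 48000);
const audio_inputs = await processor(audio);
const { audio_embeds } = await audio_model(audio_inputs);

// Rank tracks by cosine similarity between query and audio embeddings
const score = cos_sim(text_embeds.data, audio_embeds.data);
console.log(score);
```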
SpeechT5ForSpeechToText in https://github.com/xenova/transformers.js/pull/438

Full Changelog: https://github.com/xenova/transformers.js/compare/2.10.1...2.11.0