In this release, we've added a ton of new architectures: BLOOM, MPT, BeiT, CamemBERT, CodeLlama, GPT NeoX, GPT-J, HerBERT, mBART, mBART-50, OPT, ResNet, WavLM, and XLM. This brings the total number of supported architectures up to 46! Here's some example code to help you get started:
Text-generation with MPT (models):
import { pipeline } from '@xenova/transformers';
const generator = await pipeline('text-generation', 'Xenova/ipt-350m', {
    quantized: false, // use the unquantized version to ensure it matches the Python version
});
const output = await generator('La nostra azienda');
// { generated_text: "La nostra azienda è specializzata nella vendita di prodotti per l'igiene orale e per la salute." }
Other text-generation models: BLOOM, GPT-NeoX, CodeLlama, GPT-J, OPT.
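For instance, here's a minimal sketch using OPT (the 'Xenova/opt-125m' model id and generation options are assumed here for illustration):
import { pipeline } from '@xenova/transformers';
// Note: model id and options are illustrative.
const generator = await pipeline('text-generation', 'Xenova/opt-125m');
const output = await generator('Once upon a time', { max_new_tokens: 20 });
console.log(output);
// [{ generated_text: '...' }]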
CamemBERT for masked language modelling, text classification, token classification, question answering, and feature extraction (models). For example:
import { pipeline } from '@xenova/transformers';
let pipe = await pipeline('token-classification', 'Xenova/camembert-ner-with-dates');
let output = await pipe("Je m'appelle jean-baptiste et j'habite à montréal depuis fevr 2012");
// [
// { entity: 'I-PER', score: 0.9258053302764893, index: 5, word: 'jean' },
// { entity: 'I-PER', score: 0.9048717617988586, index: 6, word: '-' },
// { entity: 'I-PER', score: 0.9227054119110107, index: 7, word: 'ba' },
// { entity: 'I-PER', score: 0.9385354518890381, index: 8, word: 'pt' },
// { entity: 'I-PER', score: 0.9139659404754639, index: 9, word: 'iste' },
// { entity: 'I-LOC', score: 0.9877734780311584, index: 15, word: 'montré' },
// { entity: 'I-LOC', score: 0.9891639351844788, index: 16, word: 'al' },
// { entity: 'I-DATE', score: 0.9858269691467285, index: 18, word: 'fe' },
// { entity: 'I-DATE', score: 0.9780661463737488, index: 19, word: 'vr' },
// { entity: 'I-DATE', score: 0.980688214302063, index: 20, word: '2012' }
// ]
WavLM for feature-extraction (models). For example:
import { AutoProcessor, AutoModel, read_audio } from '@xenova/transformers';
// Read and preprocess audio
const processor = await AutoProcessor.from_pretrained('Xenova/wavlm-base');
const audio = await read_audio('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav', 16000);
const inputs = await processor(audio);
// Run model with inputs
const model = await AutoModel.from_pretrained('Xenova/wavlm-base');
const output = await model(inputs);
// {
// last_hidden_state: Tensor {
// dims: [ 1, 549, 768 ],
// type: 'float32',
// data: Float32Array(421632) [-0.349443256855011, -0.39341306686401367, 0.022836603224277496, ...],
// size: 421632
// }
// }
MBart + MBart50 for multilingual translation (models). For example:
import { pipeline } from '@xenova/transformers';
let translator = await pipeline('translation', 'Xenova/mbart-large-50-many-to-many-mmt');
let output = await translator('संयुक्त राष्ट्र के प्रमुख का कहना है कि सीरिया में कोई सैन्य समाधान नहीं है', {
    src_lang: 'hi_IN', // Hindi
    tgt_lang: 'fr_XX', // French
});
// [{ translation_text: "Le chef des Nations affirme qu'il n'y a military solution in Syria." }]
See here for the full list of languages and their corresponding codes.
BeiT for image classification (models):
import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
let pipe = await pipeline('image-classification', 'Xenova/beit-base-patch16-224');
let output = await pipe(url);
// [{ label: 'tiger, Panthera tigris', score: 0.7168469429016113 }]
ResNet for image classification (models):
import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
let pipe = await pipeline('image-classification', 'Xenova/resnet-50');
let output = await pipe(url);
// [{ label: 'tiger, Panthera tigris', score: 0.7576608061790466 }]
To get started with these new architectures (and expand coverage for other models), we're releasing over 150 new models on the Hugging Face Hub! Check out the full list here.
Thanks to a recent update of 🤗 Optimum, we were able to remove duplicate weights across various models. In some cases, like whisper-tiny's decoder, this resulted in a 40% reduction in size!
Play around with some of the smaller whisper models (for automatic speech recognition) here!
Transformers.js integration with LangChain JS (docs)
import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";
const model = new HuggingFaceTransformersEmbeddings({
    modelName: "Xenova/all-MiniLM-L6-v2",
});
/* Embed queries */
const res = await model.embedQuery(
    "What would be a good company name for a company that makes colorful socks?"
);
console.log({ res });
/* Embed documents */
const documentRes = await model.embedDocuments(["Hello world", "Bye bye"]);
console.log({ documentRes });
Refactored PreTrainedModel to require significantly less code when adding new models
Typing improvements by @kungfooman
Swin for image classification (models):
import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
let classifier = await pipeline('image-classification', 'Xenova/swin-base-patch4-window7-224-in22k');
let output = await classifier(url, { topk: null });
// [
// { label: 'Bengal_tiger', score: 0.2258443683385849 },
// { label: 'tiger, Panthera_tigris', score: 0.21161635220050812 },
// { label: 'predator, predatory_animal', score: 0.09135803580284119 },
// { label: 'tigress', score: 0.08038495481014252 },
// // ... 21838 more items
// ]
DeiT for image classification (models):
import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
let classifier = await pipeline('image-classification', 'Xenova/deit-tiny-distilled-patch16-224');
let output = await classifier(url);
// [{ label: 'tiger, Panthera tigris', score: 0.9804046154022217 }]
YOLOS for object detection (models):
import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
let detector = await pipeline('object-detection', 'Xenova/yolos-small-300');
let output = await detector(url);
// [
// { label: 'remote', score: 0.9837935566902161, box: { xmin: 331, ymin: 80, xmax: 367, ymax: 192 } },
// { label: 'cat', score: 0.94994056224823, box: { xmin: 8, ymin: 57, xmax: 316, ymax: 470 } },
// { label: 'couch', score: 0.9843178987503052, box: { xmin: 0, ymin: 0, xmax: 639, ymax: 474 } },
// { label: 'remote', score: 0.9704685211181641, box: { xmin: 39, ymin: 71, xmax: 179, ymax: 114 } },
// { label: 'cat', score: 0.9921762943267822, box: { xmin: 339, ymin: 17, xmax: 642, ymax: 380 } }
// ]
Full Changelog: https://github.com/xenova/transformers.js/compare/2.5.3...2.5.4
Full Changelog: https://github.com/xenova/transformers.js/compare/2.5.2...2.5.3
audio-classification with MMS and Wav2Vec2 in https://github.com/xenova/transformers.js/pull/220. Example usage:
// npm i @xenova/transformers
import { pipeline } from '@xenova/transformers';
// Create audio classification pipeline
let classifier = await pipeline('audio-classification', 'Xenova/mms-lid-4017');
// Run inference
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jeanNL.wav';
let output = await classifier(url);
// [
// { label: 'fra', score: 0.9995712041854858 },
// { label: 'hat', score: 0.00003788191679632291 },
// { label: 'lin', score: 0.00002646935718075838 },
// { label: 'hun', score: 0.000015628289474989288 },
// { label: 'bre', score: 0.000007014674793026643 }
// ]
automatic-speech-recognition for Wav2Vec2 models in https://github.com/xenova/transformers.js/pull/220 (MMS coming soon).
Full Changelog: https://github.com/xenova/transformers.js/compare/2.5.1...2.5.2
Full Changelog: https://github.com/xenova/transformers.js/compare/2.5.0...2.5.1
You can now compute CLIP text and vision embeddings separately, allowing for faster inference when you only need to query one of the modalities. We've also released a demo application for semantic image search to showcase this functionality.
Example: Compute text embeddings with CLIPTextModelWithProjection.
import { AutoTokenizer, CLIPTextModelWithProjection } from '@xenova/transformers';
// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clip-vit-base-patch16');
const text_model = await CLIPTextModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');
// Run tokenization
let texts = ['a photo of a car', 'a photo of a football match'];
let text_inputs = tokenizer(texts, { padding: true, truncation: true });
// Compute embeddings
const { text_embeds } = await text_model(text_inputs);
// Tensor {
// dims: [ 2, 512 ],
// type: 'float32',
// data: Float32Array(1024) [ ... ],
// size: 1024
// }
Example: Compute vision embeddings with CLIPVisionModelWithProjection.
import { AutoProcessor, CLIPVisionModelWithProjection, RawImage } from '@xenova/transformers';
// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
const vision_model = await CLIPVisionModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');
// Read image and run processor
let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
let image_inputs = await processor(image);
// Compute embeddings
const { image_embeds } = await vision_model(image_inputs);
// Tensor {
// dims: [ 1, 512 ],
// type: 'float32',
// data: Float32Array(512) [ ... ],
// size: 512
// }
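Putting the two together, here's a minimal sketch (not from the release, and it assumes both snippets above run in the same script) that scores each caption against the image with cosine similarity:
// Hedged example: compare each text embedding to the image embedding.
const img = image_embeds.data; // Float32Array of length 512
const norm = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
const [num_texts, dim] = text_embeds.dims;
for (let i = 0; i < num_texts; ++i) {
    const text = text_embeds.data.subarray(i * dim, (i + 1) * dim);
    const dot = text.reduce((sum, x, j) => sum + x * img[j], 0);
    console.log(texts[i], dot / (norm(text) * norm(img)));
}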
We've updated the source code for our example browser extension, making the following improvements:
RawImage.save()
New model: StarCoder (Xenova/starcoderbase-1b and Xenova/tiny_starcoder_py)
In-browser code completion example application (demo and source code)
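For a quick taste, here's a minimal sketch of code completion with the smaller model (the prompt and generation options are illustrative):
import { pipeline } from '@xenova/transformers';
// Generate a Python completion with the tiny StarCoder model.
const generator = await pipeline('text-generation', 'Xenova/tiny_starcoder_py');
const output = await generator('def fibonacci(n):', { max_new_tokens: 30 });
console.log(output);
// [{ generated_text: '...' }] (the actual completion will vary)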
Full Changelog: https://github.com/xenova/transformers.js/compare/2.4.3...2.4.4
Example Next.js applications in https://github.com/xenova/transformers.js/pull/211
Demo: client-side or server-side
Source code: client-side or server-side
Add support for mpnet models by @xenova in https://github.com/xenova/transformers.js/pull/221
Full Changelog: https://github.com/xenova/transformers.js/compare/2.4.2...2.4.3
Full Changelog: https://github.com/xenova/transformers.js/compare/2.4.1...2.4.2
Full Changelog: https://github.com/xenova/transformers.js/compare/2.4.0...2.4.1
This release adds the ability to predict word-level timestamps for our whisper automatic-speech-recognition models by analyzing the cross-attentions and applying dynamic time warping. Our implementation is adapted from this PR, which added this functionality to the 🤗 transformers Python library.
Example usage: (see docs)
import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en', {
    revision: 'output_attentions',
});
let output = await transcriber(url, { return_timestamps: 'word' });
// {
// "text": " And so my fellow Americans ask not what your country can do for you ask what you can do for your country.",
// "chunks": [
// { "text": " And", "timestamp": [0, 0.78] },
// { "text": " so", "timestamp": [0.78, 1.06] },
// { "text": " my", "timestamp": [1.06, 1.46] },
// ...
// { "text": " for", "timestamp": [9.72, 9.92] },
// { "text": " your", "timestamp": [9.92, 10.22] },
// { "text": " country.", "timestamp": [10.22, 13.5] }
// ]
// }
Note: For now, you need to choose the output_attentions revision (see above). In future, we may merge these models into the main branch. Also, we currently do not have exports for the medium and large models, simply because I don't have enough RAM to do the export myself (>25GB needed) 😅 ... so, if you would like to use our conversion script to do the conversion yourself, please make a PR on the hub with these new models (under a new output_attentions branch)!
From our testing, the JS implementation exactly matches the output produced by the Python implementation (when using the same model of course)! 🥳
Python (left) vs. JavaScript (right)
I'm excited to see what you all build with this! Please tag me on Twitter if you use it in your project - I'd love to see! I'm also planning on adding this as an option to whisper-web, so stay tuned! 🚀
MobileViT for image classification
Roberta for token classification (thanks @julien-c)
XLMRoberta for masked language modelling, sequence classification, token classification, and question answering
FalconTokenizer, GPTNeoXTokenizer
Aligned .generate() function output with the original Python implementation
Full Changelog: https://github.com/xenova/transformers.js/compare/2.3.0...2.3.1
All Transformers.js-compatible models are now displayed with a super cool tag! To indicate your model is compatible with the library, simply add the "transformers.js" library tag in your README (example).
This also means you can now search for and filter these models by task!
For example, you can filter for all models that work with the feature-extraction pipeline! And lastly, clicking the "Use in Transformers.js" button will show some sample code for how to use the model!
You can now use all Transformers.js-compatible feature-extraction models for embeddings computation directly in Chroma! For example:
const {ChromaClient, TransformersEmbeddingFunction} = require('chromadb');
const client = new ChromaClient();
// Create the embedder. In this case, I just use the defaults, but you can change the model,
// quantization, revision, or add a progress callback, if desired.
const embedder = new TransformersEmbeddingFunction({ /* Configuration goes here */ });
const main = async () => {
    // Empties and completely resets the database.
    await client.reset();
    // Create the collection
    const collection = await client.createCollection({ name: "my_collection", embeddingFunction: embedder });
    // Add some data to the collection
    await collection.add({
        ids: ["id1", "id2", "id3"],
        metadatas: [{ "source": "my_source" }, { "source": "my_source" }, { "source": "my_source" }],
        documents: ["I love walking my dog", "This is another document", "This is a legal document"],
    });
    // Query the collection
    const results = await collection.query({
        nResults: 2,
        queryTexts: ["This is a query document"],
    });
    console.log(results);
    // {
    //     ids: [ [ 'id2', 'id3' ] ],
    //     embeddings: null,
    //     documents: [ [ 'This is another document', 'This is a legal document' ] ],
    //     metadatas: [ [ [Object], [Object] ] ],
    //     distances: [ [ 1.0109775066375732, 1.0756263732910156 ] ]
    // }
};
main();
You can now call decoder-only models loaded via AutoModel.from_pretrained(...):
import { AutoModel, AutoTokenizer } from '@xenova/transformers';
// Choose model to use
let model_id = "Xenova/gpt2";
// Load model and tokenizer
let tokenizer = await AutoTokenizer.from_pretrained(model_id);
let model = await AutoModel.from_pretrained(model_id);
// Tokenize text and call
let model_inputs = await tokenizer('Once upon a time');
let output = await model(model_inputs);
console.log(output);
// {
// logits: Tensor {
// dims: [ 1, 4, 50257 ],
// type: 'float32',
// data: Float32Array(201028) [
// -20.166624069213867, -19.662782669067383, -23.189680099487305,
// ...
// ],
// size: 201028
// },
// past_key_values: { ... }
// }
Examples for computing perplexity: https://github.com/xenova/transformers.js/issues/137#issuecomment-1595496161
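As a rough sketch of the idea (assuming output and model_inputs from the snippet above; this is not the code from the linked thread):
// Hedged example: perplexity is the exponential of the average negative log-likelihood of each next token.
function perplexity(logits, input_ids) {
    const [, seq_len, vocab_size] = logits.dims;
    let nll = 0;
    for (let i = 0; i < seq_len - 1; ++i) {
        const row = logits.data.subarray(i * vocab_size, (i + 1) * vocab_size);
        // Log-softmax of the logit assigned to the actual next token
        let max = -Infinity;
        for (const x of row) max = Math.max(max, x);
        let sum_exp = 0;
        for (const x of row) sum_exp += Math.exp(x - max);
        const target = Number(input_ids.data[i + 1]); // input ids are stored as BigInt64
        nll -= row[target] - max - Math.log(sum_exp);
    }
    return Math.exp(nll / (seq_len - 1));
}
console.log(perplexity(output.logits, model_inputs.input_ids));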
We've updated the quantization parameters used for the pre-converted whisper models on the hub. You can test them out with whisper web! Thanks to @jozefchutka for reporting this issue.
You can now transcribe and translate speech for over 100 different languages, directly in your browser, with Whisper! Play around with our demo application here.
Example: Transcribe English.
import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
let output = await transcriber(url);
// { text: " And so my fellow Americans ask not what your country can do for you, ask what you can do for your country." }
Example: Transcribe English w/ timestamps.
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
let output = await transcriber(url, { return_timestamps: true });
// {
// text: " And so my fellow Americans ask not what your country can do for you, ask what you can do for your country."
// chunks: [
// { timestamp: [0, 8], text: " And so my fellow Americans ask not what your country can do for you" }
// { timestamp: [8, 11], text: " ask what you can do for your country." }
// ]
// }
Example: Transcribe French.
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/french-audio.mp3';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small');
let output = await transcriber(url, { language: 'french', task: 'transcribe' });
// { text: " J'adore, j'aime, je n'aime pas, je déteste." }
Example: Translate French to English.
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/french-audio.mp3';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small');
let output = await transcriber(url, { language: 'french', task: 'translate' });
// { text: " I love, I like, I don't like, I hate." }
Aligned .generate() function with the original Python implementation
Full Changelog: https://github.com/xenova/transformers.js/compare/2.1.1...2.2.0
Minor patch for v2.1.0 to fix an issue with browser caching.
You can now perform feature extraction on models other than sentence-transformers! All you need to do is target a repo (and/or revision) that was exported with --task default. Also be sure to use the correct quantization for your use-case!
Example: Run feature extraction with bert-base-uncased (without pooling/normalization).
import { pipeline } from '@xenova/transformers';
let extractor = await pipeline('feature-extraction', 'Xenova/bert-base-uncased', { revision: 'default' });
let result = await extractor('This is a simple test.');
console.log(result);
// Tensor {
// type: 'float32',
// data: Float32Array [0.05939924716949463, 0.021655935794115067, ...],
// dims: [1, 8, 768]
// }
Example: Run feature extraction with bert-base-uncased (with pooling/normalization).
let extractor = await pipeline('feature-extraction', 'Xenova/bert-base-uncased', { revision: 'default' });
let result = await extractor('This is a simple test.', { pooling: 'mean', normalize: true });
console.log(result);
// Tensor {
// type: 'float32',
// data: Float32Array [0.03373778983950615, -0.010106077417731285, ...],
// dims: [1, 768]
// }
Example: Calculating embeddings with sentence-transformers models.
let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
let result = await extractor('This is a simple test.', { pooling: 'mean', normalize: true });
console.log(result);
// Tensor {
// type: 'float32',
// data: Float32Array [0.09094982594251633, -0.014774246141314507, ...],
// dims: [1, 384]
// }
This also means you can do things like semantic search directly in JavaScript/TypeScript! Check out the Pinecone docs for an example app which uses Transformers.js!
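Here's a minimal sketch of the idea (the model, query, and documents are just examples): with normalized embeddings, cosine similarity reduces to a dot product.
import { pipeline } from '@xenova/transformers';
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
// With `normalize: true`, cosine similarity is a simple dot product.
const embed = async (text) => (await extractor(text, { pooling: 'mean', normalize: true })).data;
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);
const query = await embed('pets and animals');
for (const doc of ['I love walking my dog', 'The stock market fell today']) {
    console.log(doc, dot(query, await embed(doc)));
}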
We now have 109 models to choose from! Check them out at https://huggingface.co/models?other=transformers.js! If you'd like to contribute models (exported with Optimum), you can tag them with library_name: "transformers.js"! Let's make ML more web-friendly!
Full Changelog: https://github.com/xenova/transformers.js/compare/2.0.2...2.1.0
Fixes issues stemming from ORT's recent release of a buggy version 1.15.0 🙄 (https://www.npmjs.com/package/onnxruntime-web)
Also freezes examples and updates links to use the latest stable wasm files.
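If you'd rather pin the wasm files yourself, here's a minimal sketch (the CDN URL and version shown are illustrative):
import { env } from '@xenova/transformers';
// Point ONNX Runtime Web at a known-good set of wasm files.
env.backends.onnx.wasm.wasmPaths = 'https://cdn.jsdelivr.net/npm/onnxruntime-web@1.14.0/dist/';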
It's finally here! 🔥
Run Hugging Face transformers directly in your browser, with no need for a server!
GitHub: https://github.com/xenova/transformers.js
Demo site: https://xenova.github.io/transformers.js/
Documentation: https://huggingface.co/docs/transformers.js
🛠️ Complete ES6 rewrite
📄 Documentation and examples
🤗 Improved Hugging Face Hub integration
🖥️ Server-side model caching (in Node.js)
🧪 Improved testing framework w/ Jest
⚙️ CI/CD with GitHub actions
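As a quick illustration of the server-side caching point (a minimal sketch; the directory shown is just an example location):
import { env } from '@xenova/transformers';
// In Node.js, downloaded models are cached on disk; override the location if desired.
env.cacheDir = './.cache';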
Same as https://github.com/xenova/transformers.js/releases/tag/2.0.0-alpha.3 with various improvements, including: