Add Background Removal demo in https://github.com/xenova/transformers.js/pull/576 (online demo).
Add support for owlv2 models in https://github.com/xenova/transformers.js/pull/579
Example: Zero-shot object detection w/ Xenova/owlv2-base-patch16-ensemble.
import { pipeline } from '@xenova/transformers';
const detector = await pipeline('zero-shot-object-detection', 'Xenova/owlv2-base-patch16-ensemble');
const url = 'http://images.cocodataset.org/val2017/000000039769.jpg';
const candidate_labels = ['a photo of a cat', 'a photo of a dog'];
const output = await detector(url, candidate_labels);
console.log(output);
// [
// { score: 0.7400985360145569, label: 'a photo of a cat', box: { xmin: 0, ymin: 50, xmax: 323, ymax: 485 } },
// { score: 0.6315087080001831, label: 'a photo of a cat', box: { xmin: 333, ymin: 23, xmax: 658, ymax: 378 } }
// ]
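The detector returns a flat list of `{ score, label, box }` objects. If you want to post-process it yourself, the sketch below keeps only detections above a score threshold and computes the intersection-over-union (IoU) between two boxes, for example to spot duplicate detections. It is a plain-JavaScript illustration on the output shown above. The helper names (`filterByScore`, `boxIoU`) and the 0.7 threshold are hypothetical and are not part of the Transformers.js API.
```javascript
// Hypothetical helpers for post-processing detector output (not part of Transformers.js).
function filterByScore(detections, threshold) {
  return detections.filter((d) => d.score >= threshold);
}

// Intersection-over-union of two boxes given as { xmin, ymin, xmax, ymax }.
function boxIoU(a, b) {
  const ix = Math.max(0, Math.min(a.xmax, b.xmax) - Math.max(a.xmin, b.xmin));
  const iy = Math.max(0, Math.min(a.ymax, b.ymax) - Math.max(a.ymin, b.ymin));
  const inter = ix * iy;
  const area = (r) => (r.xmax - r.xmin) * (r.ymax - r.ymin);
  const union = area(a) + area(b) - inter;
  return union > 0 ? inter / union : 0;
}

// Detections copied from the example output above.
const detections = [
  { score: 0.7400985360145569, label: 'a photo of a cat', box: { xmin: 0, ymin: 50, xmax: 323, ymax: 485 } },
  { score: 0.6315087080001831, label: 'a photo of a cat', box: { xmin: 333, ymin: 23, xmax: 658, ymax: 378 } },
];

const confident = filterByScore(detections, 0.7);
const overlap = boxIoU(detections[0].box, detections[1].box);
console.log(confident.length, overlap);
```
The two cat boxes do not overlap horizontally, so their IoU is zero, which is what you would expect for two distinct cats.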
Add support for Adaptive Retrieval w/ Matryoshka Embeddings (nomic-ai/nomic-embed-text-v1.5) in https://github.com/xenova/transformers.js/pull/587 and https://github.com/xenova/transformers.js/pull/588 (online demo).
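Matryoshka embeddings are trained so that a prefix of the vector is itself a usable, lower-dimensional embedding. Adaptive retrieval builds on this by shortlisting candidates cheaply with truncated embeddings and then reranking only the shortlist with the full-size ones. The sketch below shows that two-stage idea in plain JavaScript with tiny hypothetical vectors. It assumes the embeddings were already computed (e.g. with a feature-extraction pipeline), and it uses a simple truncate-and-renormalize step. The model card's recommended recipe may include extra steps (such as a layer norm before truncation), which are omitted here. All function names are hypothetical.
```javascript
// Truncate an embedding to its first `dim` values and re-normalize to unit length.
function truncateAndNormalize(vec, dim) {
  const head = vec.slice(0, dim);
  const norm = Math.hypot(...head);
  return head.map((x) => x / norm);
}

// Dot product; equals cosine similarity for unit-length vectors.
function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; ++i) s += a[i] * b[i];
  return s;
}

// Two-stage adaptive retrieval:
// 1. score every document with cheap, truncated embeddings and keep the top `shortlistSize`;
// 2. rerank that shortlist with the full embeddings.
function adaptiveRetrieve(query, docs, smallDim, shortlistSize) {
  const qSmall = truncateAndNormalize(query, smallDim);
  const qFull = truncateAndNormalize(query, query.length);
  const shortlist = docs
    .map((doc, index) => ({ index, score: dot(qSmall, truncateAndNormalize(doc, smallDim)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, shortlistSize);
  return shortlist
    .map(({ index }) => ({ index, score: dot(qFull, truncateAndNormalize(docs[index], docs[index].length)) }))
    .sort((a, b) => b.score - a.score);
}

// Tiny hypothetical 4-d "embeddings" for illustration only.
const query = [1, 0, 1, 0];
const docs = [
  [1, 0, 0, 1], // matches the query on the first 2 dims only
  [1, 0, 1, 0], // identical to the query
  [0, 1, 0, 1], // orthogonal to the query
];
console.log(adaptiveRetrieve(query, docs, 2, 2));
```
With only 2 dimensions, documents 0 and 1 look equally good. The full-dimensional rerank then separates them, while document 2 is discarded without ever being scored at full size.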
Add support for Gemma Tokenizer in https://github.com/xenova/transformers.js/pull/597 and https://github.com/xenova/transformers.js/pull/598
Full Changelog: https://github.com/xenova/transformers.js/compare/2.15.0...2.15.1
Yesterday, the Qwen team (Alibaba Group) released the Qwen1.5 series of chat models. As part of the release, they published several sub-2B-parameter models, including Qwen/Qwen1.5-0.5B-Chat and Qwen/Qwen1.5-1.8B-Chat, which both demonstrate strong performance despite their small sizes. The best part? They can run in the browser with Transformers.js (PR)! 🚀 See here for the full list of supported models.
Example: Text generation with Xenova/Qwen1.5-0.5B-Chat.
import { pipeline } from '@xenova/transformers';
// Create text-generation pipeline
const generator = await pipeline('text-generation', 'Xenova/Qwen1.5-0.5B-Chat');
// Define the prompt and list of messages
const prompt = "Give me a short introduction to large language model."
const messages = [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": prompt }
]
// Apply chat template
const text = generator.tokenizer.apply_chat_template(messages, {
tokenize: false,
add_generation_prompt: true,
});
// Generate text
const output = await generator(text, {
max_new_tokens: 128,
do_sample: false,
});
console.log(output[0].generated_text);
// 'A large language model is a type of artificial intelligence system that can generate text based on the input provided by users, such as books, articles, or websites. It uses advanced algorithms and techniques to learn from vast amounts of data and improve its performance over time through machine learning and natural language processing (NLP). Large language models have become increasingly popular in recent years due to their ability to handle complex tasks such as generating human-like text quickly and accurately. They have also been used in various fields such as customer service chatbots, virtual assistants, and search engines for information retrieval purposes.'
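`apply_chat_template` renders the chat template stored with the model's tokenizer, so the exact string depends on the checkpoint. Qwen1.5 chat models appear to use a ChatML-style template (check the model's `tokenizer_config.json` to be sure). The sketch below hand-builds a ChatML-style prompt to show what `add_generation_prompt: true` contributes: a trailing, empty assistant turn that the model then continues. `toChatML` is a hypothetical helper for illustration. In real code, rely on `apply_chat_template`.
```javascript
// Hand-rolled ChatML-style formatter (illustrative only; the real template comes from the
// model's tokenizer files and may differ in details such as default system prompts).
function toChatML(messages, addGenerationPrompt) {
  let text = '';
  for (const { role, content } of messages) {
    text += `<|im_start|>${role}\n${content}<|im_end|>\n`;
  }
  // The "generation prompt" opens an assistant turn so the model continues from there.
  if (addGenerationPrompt) text += '<|im_start|>assistant\n';
  return text;
}

const messages = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Give me a short introduction to large language model.' },
];
console.log(toChatML(messages, true));
```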
Next, we added support for MODNet, a small (but powerful) portrait image matting model (PR). Thanks to @cyio for the suggestion!
Example: Perform portrait image matting with Xenova/modnet.
import { AutoModel, AutoProcessor, RawImage } from '@xenova/transformers';
// Load model and processor
const model = await AutoModel.from_pretrained('Xenova/modnet', { quantized: false });
const processor = await AutoProcessor.from_pretrained('Xenova/modnet');
// Load image from URL
const url = 'https://images.pexels.com/photos/5965592/pexels-photo-5965592.jpeg?auto=compress&cs=tinysrgb&w=1024';
const image = await RawImage.fromURL(url);
// Pre-process image
const { pixel_values } = await processor(image);
// Predict alpha matte
const { output } = await model({ input: pixel_values });
// Save output mask
const mask = await RawImage.fromTensor(output[0].mul(255).to('uint8')).resize(image.width, image.height);
mask.save('mask.png');
| Input image | Output mask |
|---|---|
We also added support for several new text embedding models, including:
Check out the links for example usage.
jsdoc-to-markdown dev dependency (https://github.com/xenova/transformers.js/pull/574).
Full Changelog: https://github.com/xenova/transformers.js/compare/2.14.2...2.15.0
Full Changelog: https://github.com/xenova/transformers.js/compare/2.14.1...2.14.2
Add support for Depth Anything (https://github.com/xenova/transformers.js/pull/534). See here for the list of available models.
Example: Depth estimation with Xenova/depth-anything-small-hf.
import { pipeline } from '@xenova/transformers';
// Create depth-estimation pipeline
const depth_estimator = await pipeline('depth-estimation', 'Xenova/depth-anything-small-hf');
// Predict depth map for the given image
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/bread_small.png';
const output = await depth_estimator(url);
// {
// predicted_depth: Tensor {
// dims: [350, 518],
// type: 'float32',
// data: Float32Array(181300) [...],
// size: 181300
// },
// depth: RawImage {
// data: Uint8Array(271360) [...],
// width: 640,
// height: 424,
// channels: 1
// }
// }
You can visualize the output with:
output.depth.save('depth.png');
| Input image | Visualized output |
|---|---|
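The output contains both the raw `predicted_depth` tensor (float32, at the model's input resolution) and the `depth` image (uint8, at the original image's resolution). The sketch below illustrates one way to turn a float depth map into 8-bit grayscale values, using min-max scaling. It is an illustration only. The pipeline's exact scaling and its resizing step are not reproduced here, and `depthToUint8` is a hypothetical helper.
```javascript
// Map float depth values to 0-255 grayscale using min-max scaling (illustrative sketch).
function depthToUint8(depth) {
  let min = Infinity;
  let max = -Infinity;
  for (const d of depth) {
    if (d < min) min = d;
    if (d > max) max = d;
  }
  const range = max - min || 1; // avoid division by zero for a constant map
  const out = new Uint8Array(depth.length);
  for (let i = 0; i < depth.length; ++i) {
    out[i] = Math.round(((depth[i] - min) / range) * 255);
  }
  return out;
}

// Hypothetical 2x2 depth map.
console.log(depthToUint8(new Float32Array([0.5, 1.0, 1.5, 2.5])));
```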
Online demo: https://huggingface.co/spaces/Xenova/depth-anything-web
Example video:
https://github.com/xenova/transformers.js/assets/26504141/bbac3db6-8d8f-4386-a212-7e66ca616a0d
Fix typo in tokenizers.js (https://github.com/xenova/transformers.js/pull/518)
Return empty tokens array if text is empty after normalization (https://github.com/xenova/transformers.js/pull/535)
Full Changelog: https://github.com/xenova/transformers.js/compare/2.14.0...2.14.1
The Segment Anything Model (SAM) can be used to generate segmentation masks for objects in a scene, given an input image and input points. See here for the full list of pre-converted models. Support for this model was added in https://github.com/xenova/transformers.js/pull/510.
Demo + source code: https://huggingface.co/spaces/Xenova/segment-anything-web
Example: Perform mask generation w/ Xenova/slimsam-77-uniform.
import { SamModel, AutoProcessor, RawImage } from '@xenova/transformers';
const model = await SamModel.from_pretrained('Xenova/slimsam-77-uniform');
const processor = await AutoProcessor.from_pretrained('Xenova/slimsam-77-uniform');
const img_url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/corgi.jpg';
const raw_image = await RawImage.read(img_url);
const input_points = [[[340, 250]]]; // 2D (x, y) location of a point on the object to segment
const inputs = await processor(raw_image, input_points);
const outputs = await model(inputs);
const masks = await processor.post_process_masks(outputs.pred_masks, inputs.original_sizes, inputs.reshaped_input_sizes);
console.log(masks);
// [
// Tensor {
// dims: [ 1, 3, 410, 614 ],
// type: 'bool',
// data: Uint8Array(755220) [ ... ],
// size: 755220
// }
// ]
const scores = outputs.iou_scores;
console.log(scores);
// Tensor {
// dims: [ 1, 1, 3 ],
// type: 'float32',
// data: Float32Array(3) [
// 0.8350210189819336,
// 0.9786665439605713,
// 0.8379436731338501
// ],
// size: 3
// }
You can then visualize the 3 predicted masks with:
const image = RawImage.fromTensor(masks[0][0].mul(255));
image.save('mask.png');
| Input image | Visualized output |
|---|---|
Next, select the channel with the highest IoU score, which in this case is the second (green) channel. Intersecting this with the original image gives us an isolated version of the subject:
| Selected Mask | Intersected |
|---|---|
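The selection and intersection step can be sketched on plain typed arrays. The sketch assumes `masks` is a flat array of shape [numMasks, height, width] (like the data of `masks[0][0]`), `scores` holds the values of `outputs.iou_scores`, and the image is RGBA data of the same height and width (post-processing resizes masks to the original image size). Pixels outside the chosen mask are made transparent. The helper names are hypothetical, and the toy 1x2 image only demonstrates the mechanics.
```javascript
// Index of the largest value in an array-like of scores.
function argmax(values) {
  let best = 0;
  for (let i = 1; i < values.length; ++i) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

// masks: flat Uint8Array of shape [numMasks, height, width] (1 = inside the object).
// rgba:  flat Uint8ClampedArray of shape [height, width, 4].
// Returns a copy of the image whose alpha channel is zero outside the selected mask.
function applyBestMask(masks, scores, rgba, height, width) {
  const best = argmax(scores);
  const offset = best * height * width;
  const out = new Uint8ClampedArray(rgba);
  for (let p = 0; p < height * width; ++p) {
    if (!masks[offset + p]) out[p * 4 + 3] = 0;
  }
  return { best, out };
}

// Toy example: 3 candidate masks over a 1x2 image; scores mirror the IoU values shown above.
const scores = [0.8350210189819336, 0.9786665439605713, 0.8379436731338501];
const masks = new Uint8Array([
  1, 1, // mask 0
  1, 0, // mask 1
  0, 1, // mask 2
]);
const rgba = new Uint8ClampedArray([10, 20, 30, 255, 40, 50, 60, 255]);
const { best, out } = applyBestMask(masks, scores, rgba, 1, 2);
console.log(best, Array.from(out));
```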
ConvNextFeatureExtractor in https://github.com/xenova/transformers.js/pull/503
Full Changelog: https://github.com/xenova/transformers.js/compare/2.13.4...2.14.0
Add support for cross-encoder models (+fix token type ids) (#501)
Example: Information Retrieval w/ Xenova/ms-marco-TinyBERT-L-2-v2.
import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';
const model = await AutoModelForSequenceClassification.from_pretrained('Xenova/ms-marco-TinyBERT-L-2-v2');
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/ms-marco-TinyBERT-L-2-v2');
const features = tokenizer(
['How many people live in Berlin?', 'How many people live in Berlin?'],
{
text_pair: [
'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
'New York City is famous for the Metropolitan Museum of Art.',
],
padding: true,
truncation: true,
}
)
const { logits } = await model(features)
console.log(logits.data);
// quantized: [ 7.210887908935547, -11.559350967407227 ]
// unquantized: [ 7.235750675201416, -11.562294006347656 ]
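The cross-encoder produces one logit per (query, passage) pair, and a higher logit means a more relevant passage. To rank passages, sort by logit. If you want a 0-1 relevance score, applying a sigmoid is a common convention for MS MARCO cross-encoders, but treat that as an assumption rather than something the model guarantees. The sketch below uses the quantized logits shown above, and `rankPassages` is a hypothetical helper.
```javascript
// Logistic sigmoid: maps a raw logit to (0, 1).
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Rank passages by cross-encoder logit (highest first) and attach a sigmoid score.
function rankPassages(passages, logits) {
  return passages
    .map((text, i) => ({ text, logit: logits[i], score: sigmoid(logits[i]) }))
    .sort((a, b) => b.logit - a.logit);
}

const passages = [
  'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
  'New York City is famous for the Metropolitan Museum of Art.',
];
const logits = [7.210887908935547, -11.559350967407227]; // quantized logits from above
console.log(rankPassages(passages, logits));
```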
Check out the list of pre-converted models here. We also put out a demo for you to try out.
Full Changelog: https://github.com/xenova/transformers.js/compare/2.13.3...2.13.4
Full Changelog: https://github.com/xenova/transformers.js/compare/2.13.2...2.13.3
This release is a follow-up to #485, with additional intellisense-focused improvements (see PR).
Full Changelog: https://github.com/xenova/transformers.js/compare/2.13.1...2.13.2
Improve typing of pipeline function in https://github.com/xenova/transformers.js/pull/485. Thanks to @wesbos for the suggestion!
This also means when you hover over the class name, you'll get example code to help you out.
Add phi-1_5 model in https://github.com/xenova/transformers.js/pull/493.
import { pipeline } from '@xenova/transformers';
// Create a text-generation pipeline
const generator = await pipeline('text-generation', 'Xenova/phi-1_5_dev');
// Construct prompt
const prompt = `\`\`\`py
import math
def print_prime(n):
"""
Print all primes between 1 and n
"""`;
// Generate text
const result = await generator(prompt, {
max_new_tokens: 100,
});
console.log(result[0].generated_text);
Results in:
import math
def print_prime(n):
   """
   Print all primes between 1 and n
   """
   primes = []
   for num in range(2, n+1):
       is_prime = True
       for i in range(2, int(math.sqrt(num))+1):
           if num % i == 0:
               is_prime = False
               break
       if is_prime:
           primes.append(num)
   print(primes)
print_prime(20)
Running the code produces the correct result:
[2, 3, 5, 7, 11, 13, 17, 19]
</details>
Full Changelog: https://github.com/xenova/transformers.js/compare/2.13.0...2.13.1
This release adds support for many new multimodal architectures, bringing the total number of supported architectures to 80! 🤯
import { pipeline } from '@xenova/transformers';
// Create English text-to-speech pipeline
const synthesizer = await pipeline('text-to-speech', 'Xenova/mms-tts-eng');
// Generate speech
const output = await synthesizer('I love transformers');
// {
// audio: Float32Array(26112) [...],
// sampling_rate: 16000
// }
https://github.com/xenova/transformers.js/assets/26504141/63c1a315-1ad6-44a2-9a2f-6689e2d9d14e
See here for the list of available models. To start, we've converted 12 of the ~1140 models on the Hugging Face Hub. If we haven't added the one you wish to use, you can make it web-ready using our conversion script.
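The synthesizer returns raw samples (a `Float32Array`, nominally in [-1, 1]) together with `sampling_rate`, so the clip's duration is `audio.length / sampling_rate` (26112 / 16000 = 1.632 s for the example above). To play the result outside the browser, you can wrap the samples in a WAV container. The sketch below encodes mono 16-bit PCM WAV bytes. `encodeWav` is a hypothetical helper, not a Transformers.js function. In Node.js you could then write the bytes to disk, e.g. with `fs.writeFileSync('out.wav', bytes)`.
```javascript
// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file (RIFF container).
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeString = (offset, str) => {
    for (let i = 0; i < str.length; ++i) view.setUint8(offset + i, str.charCodeAt(i));
  };
  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format: PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true); // sample rate
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true); // data chunk size
  // Clamp each sample and convert it to a signed 16-bit integer.
  for (let i = 0; i < samples.length; ++i) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}

// Hypothetical 4-sample signal at 16 kHz.
const wav = encodeWav(new Float32Array([0, 0.5, -0.5, 1]), 16000);
console.log(wav.length); // 44-byte header + 4 samples * 2 bytes
```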
import { AutoTokenizer, AutoProcessor, CLIPSegForImageSegmentation, RawImage } from '@xenova/transformers';
// Load tokenizer, processor, and model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clipseg-rd64-refined');
const processor = await AutoProcessor.from_pretrained('Xenova/clipseg-rd64-refined');
const model = await CLIPSegForImageSegmentation.from_pretrained('Xenova/clipseg-rd64-refined');
// Run tokenization
const texts = ['a glass', 'something to fill', 'wood', 'a jar'];
const text_inputs = tokenizer(texts, { padding: true, truncation: true });
// Read image and run processor
const image = await RawImage.read('https://github.com/timojl/clipseg/blob/master/example_image.jpg?raw=true');
const image_inputs = await processor(image);
// Run model with both text and pixel inputs
const { logits } = await model({ ...text_inputs, ...image_inputs });
// logits: Tensor {
// dims: [4, 352, 352],
// type: 'float32',
// data: Float32Array(495616)[ ... ],
// size: 495616
// }
You can visualize the predictions as follows:
const preds = logits
.unsqueeze_(1)
.sigmoid_()
.mul_(255)
.round_()
.to('uint8');
for (let i = 0; i < preds.dims[0]; ++i) {
const img = RawImage.fromTensor(preds[i]);
img.save(`prediction_${i}.png`);
}
| Original | "a glass" | "something to fill" | "wood" | "a jar" |
|---|---|---|---|---|
See here for the list of available models.
import { pipeline } from '@xenova/transformers';
// Create an image segmentation pipeline
const segmenter = await pipeline('image-segmentation', 'Xenova/segformer_b2_clothes');
// Segment an image
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/young-man-standing-and-leaning-on-car.jpg';
const output = await segmenter(url);
<details>
<summary>See output</summary>
[
{
score: null,
label: 'Background',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Hair',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Upper-clothes',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Pants',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Left-shoe',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Right-shoe',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Face',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Left-leg',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Right-leg',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Left-arm',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
},
{
score: null,
label: 'Right-arm',
mask: RawImage {
data: [Uint8ClampedArray],
width: 970,
height: 1455,
channels: 1
}
}
]
</details>
See here for the list of available models.
import { pipeline } from '@xenova/transformers';
// Create an object detection pipeline
const detector = await pipeline('object-detection', 'Xenova/table-transformer-detection', { quantized: false });
// Detect tables in an image
const img = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/invoice-with-table.png';
const output = await detector(img);
// [{ score: 0.9967531561851501, label: 'table', box: { xmin: 52, ymin: 322, xmax: 546, ymax: 525 } }]
See here for the list of available models.
import { pipeline } from '@xenova/transformers';
// Create an image classification pipeline
const classifier = await pipeline('image-classification', 'Xenova/dit-base-finetuned-rvlcdip');
// Classify an image
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/coca_cola_advertisement.png';
const output = await classifier(url);
// [{ label: 'advertisement', score: 0.9035086035728455 }]
See here for the list of available models.
import { pipeline } from '@xenova/transformers';
// Create a zero-shot image classification pipeline
const classifier = await pipeline('zero-shot-image-classification', 'Xenova/siglip-base-patch16-224');
// Classify images according to provided labels
const url = 'http://images.cocodataset.org/val2017/000000039769.jpg';
const output = await classifier(url, ['2 cats', '2 dogs'], {
hypothesis_template: 'a photo of {}',
});
// [
// { score: 0.16770583391189575, label: '2 cats' },
// { score: 0.000022096000975579955, label: '2 dogs' }
// ]
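`hypothesis_template` turns each candidate label into a full text prompt (the `{}` placeholder is replaced by the label) before it is compared with the image. Note also that the two scores above do not sum to 1. SigLIP is trained with a sigmoid loss rather than a softmax, so each label appears to be scored independently instead of competing for probability mass as in CLIP. The sketch below contrasts the two normalizations on hypothetical logits. It does not reproduce the pipeline's internals, and the helper names are illustrative.
```javascript
// Fill the hypothesis template for each candidate label.
function buildHypotheses(labels, template) {
  return labels.map((label) => template.replace('{}', label));
}

// Softmax: scores compete and sum to 1 (CLIP-style).
function softmax(logits) {
  const m = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Sigmoid: each label is scored independently (SigLIP-style), so scores need not sum to 1.
const sigmoidAll = (logits) => logits.map((x) => 1 / (1 + Math.exp(-x)));

const hypotheses = buildHypotheses(['2 cats', '2 dogs'], 'a photo of {}');
const logits = [-1.5, -10]; // hypothetical image-text logits
console.log(hypotheses, softmax(logits), sigmoidAll(logits));
```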
See here for the list of available models.
import { pipeline } from '@xenova/transformers';
// Create a masked language modelling pipeline
const pipe = await pipeline('fill-mask', 'Xenova/antiberta2');
// Predict missing token
const output = await pipe('Ḣ Q V Q ... C A [MASK] D ... T V S S');
<details>
<summary>See output</summary>
[
{
score: 0.48774364590644836,
token: 19,
token_str: 'R',
sequence: 'Ḣ Q V Q C A R D T V S S'
},
{
score: 0.2768442928791046,
token: 18,
token_str: 'Q',
sequence: 'Ḣ Q V Q C A Q D T V S S'
},
{
score: 0.0890476182103157,
token: 13,
token_str: 'K',
sequence: 'Ḣ Q V Q C A K D T V S S'
},
{
score: 0.05106702819466591,
token: 14,
token_str: 'L',
sequence: 'Ḣ Q V Q C A L D T V S S'
},
{
score: 0.021606773138046265,
token: 8,
token_str: 'E',
sequence: 'Ḣ Q V Q C A E D T V S S'
}
]
</details>
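Each result above is one candidate token for the `[MASK]` position. Conceptually, the model's logits at that position are turned into probabilities with a softmax, and the most probable tokens are returned (5 here). The sketch below shows that top-k step on a hypothetical 5-token vocabulary. The pipeline itself also decodes each token and builds the filled-in `sequence`, which is omitted here.
```javascript
// Softmax over a vector of logits (numerically stable).
function softmax(logits) {
  const m = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Return the k most probable tokens for the masked position.
function topKTokens(logits, vocab, k) {
  return softmax(logits)
    .map((score, token) => ({ score, token, token_str: vocab[token] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Hypothetical 5-token vocabulary and logits at the [MASK] position.
const vocab = ['A', 'E', 'K', 'Q', 'R'];
const maskLogits = [0.1, 1.2, 2.0, 3.1, 3.6];
console.log(topKTokens(maskLogits, vocab, 3));
```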
See here for the list of available models.
Full Changelog: https://github.com/xenova/transformers.js/compare/2.12.1...2.13.0
Patch for release 2.12.1, making @huggingface/jinja a dependency instead of a peer dependency. This also means apply_chat_template is now synchronous (and does not lazily load the module). In the future, we may want to add this functionality, but for now, it causes issues with lazy loading from a CDN.
import { AutoTokenizer } from "@xenova/transformers";
// Load tokenizer from the Hugging Face Hub
const tokenizer = await AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1");
// Define chat messages
const chat = [
{ role: "user", content: "Hello, how are you?" },
{ role: "assistant", content: "I'm doing great. How can I help you today?" },
{ role: "user", content: "I'd like to show off how chat templating works!" },
]
const text = tokenizer.apply_chat_template(chat, { tokenize: false });
// "<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"
const input_ids = tokenizer.apply_chat_template(chat, { tokenize: true, return_tensor: false });
// [1, 733, 16289, 28793, 22557, 28725, 910, 460, 368, 28804, 733, 28748, 16289, 28793, ...]
</details>
Full Changelog: https://github.com/xenova/transformers.js/compare/2.12.0...2.12.1
This release adds support for chat templates, a highly-requested feature that enables users to convert conversations (represented as a list of chat objects) into a single tokenizable string, in the format that the model expects. As you may know, chat templates can vary greatly across model types, so it was important to design a system that (1) supports complex chat templates, (2) is generalizable, and (3) is easy to use. So, how did we do it? 🤔
This is made possible with @huggingface/jinja, a minimalistic JavaScript implementation of the Jinja templating engine, that we created to align with how transformers handles templating. Although it was originally designed for parsing and rendering ChatML templates, we decided to separate out the templating logic into an external (optional) library due to its usefulness in other types of applications. Special thanks to @tlaceby for his amazing "Guide to Interpreters" series, which provided the basis for our implementation. 🤗
Anyway, let's take a look at an example:
import { AutoTokenizer } from "@xenova/transformers";
// Load tokenizer from the Hugging Face Hub
const tokenizer = await AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1");
// Define chat messages
const chat = [
{ role: "user", content: "Hello, how are you?" },
{ role: "assistant", content: "I'm doing great. How can I help you today?" },
{ role: "user", content: "I'd like to show off how chat templating works!" },
]
const text = tokenizer.apply_chat_template(chat, { tokenize: false });
// "<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"
Notice how the entire chat is condensed into a single string. If you would instead like to return the tokenized version (i.e., a list of token IDs), you can use the following:
const input_ids = tokenizer.apply_chat_template(chat, { tokenize: true, return_tensor: false });
// [1, 733, 16289, 28793, 22557, 28725, 910, 460, 368, 28804, 733, 28748, 16289, 28793, 28737, 28742, 28719, 2548, 1598, 28723, 1602, 541, 315, 1316, 368, 3154, 28804, 2, 28705, 733, 16289, 28793, 315, 28742, 28715, 737, 298, 1347, 805, 910, 10706, 5752, 1077, 3791, 28808, 733, 28748, 16289, 28793]
For more information about chat templates, check out the transformers documentation.
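To make concrete what the template does for this model, the sketch below hand-codes formatting rules that reproduce the Mistral-Instruct output string shown above: the BOS token comes first, user turns are wrapped in `[INST] ... [/INST]`, and assistant turns are followed by the EOS token and a space. These rules are inferred from that output, not copied from the model's actual Jinja template, and `formatMistralChat` is a hypothetical helper. In practice, `apply_chat_template` is what you should use.
```javascript
// Hand-coded approximation of the Mistral-Instruct chat format, inferred from the
// output shown above (the real template is Jinja code stored with the tokenizer).
function formatMistralChat(chat, bos = '<s>', eos = '</s>') {
  let text = bos;
  for (const { role, content } of chat) {
    if (role === 'user') {
      text += `[INST] ${content} [/INST]`;
    } else if (role === 'assistant') {
      text += `${content}${eos} `;
    } else {
      throw new Error(`Unsupported role: ${role}`);
    }
  }
  return text;
}

const chat = [
  { role: 'user', content: 'Hello, how are you?' },
  { role: 'assistant', content: "I'm doing great. How can I help you today?" },
  { role: 'user', content: "I'd like to show off how chat templating works!" },
];
console.log(formatMistralChat(chat));
```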
Incorrect encoding/decoding of whitespace around special characters with Fast Llama tokenizers. These bugs will also soon be fixed in the transformers library. For backwards compatibility reasons, if the tokenizer was exported with the legacy behaviour, it will still act in the same way unless explicitly set otherwise. Newer exports won't be affected. If you wish to override this default, to either still use the legacy behaviour (for backwards compatibility reasons), or to upgrade to the fixed version, you can do so with:
import { AutoTokenizer } from '@xenova/transformers';
// Use the default behaviour (specified in tokenizer_config.json, which in this case is `{legacy: false}`).
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/llama2-tokenizer');
const { input_ids } = tokenizer('<s>\n', { add_special_tokens: false, return_tensor: false });
console.log(input_ids); // [1, 13]
// Use the legacy behaviour (separate variable names, so both snippets can run in the same scope)
const legacy_tokenizer = await AutoTokenizer.from_pretrained('Xenova/llama2-tokenizer', { legacy: true });
const { input_ids: legacy_input_ids } = legacy_tokenizer('<s>\n', { add_special_tokens: false, return_tensor: false });
console.log(legacy_input_ids); // [1, 29871, 13]
Strip whitespace around special tokens for wav2vec tokenizers.
Full Changelog: https://github.com/xenova/transformers.js/compare/2.11.0...2.12.0
This release adds support for a bunch of new model architectures, covering a wide range of use cases! In total, we now support 73 different model architectures!
Example: Image matting w/ Xenova/vitmatte-small-distinctions-646.
import { AutoProcessor, VitMatteForImageMatting, RawImage } from '@xenova/transformers';
// Load processor and model
const processor = await AutoProcessor.from_pretrained('Xenova/vitmatte-small-distinctions-646');
const model = await VitMatteForImageMatting.from_pretrained('Xenova/vitmatte-small-distinctions-646');
// Load image and trimap
const image = await RawImage.fromURL('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/vitmatte_image.png');
const trimap = await RawImage.fromURL('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/vitmatte_trimap.png');
// Prepare image + trimap for the model
const inputs = await processor(image, trimap);
// Predict alpha matte
const { alphas } = await model(inputs);
// Tensor {
// dims: [ 1, 1, 640, 960 ],
// type: 'float32',
// size: 614400,
// data: Float32Array(614400) [ 0.9894027709960938, 0.9970508813858032, ... ]
// }
<details>
<summary>Visualization code</summary>
import { Tensor, cat } from '@xenova/transformers';
// Visualize predicted alpha matte
const imageTensor = new Tensor(
'uint8',
new Uint8Array(image.data),
[image.height, image.width, image.channels]
).transpose(2, 0, 1);
// Convert float (0-1) alpha matte to uint8 (0-255)
const alphaChannel = alphas
.squeeze(0)
.mul_(255)
.clamp_(0, 255)
.round_()
.to('uint8');
// Concatenate original image with predicted alpha
const imageData = cat([imageTensor, alphaChannel], 0);
// Save output image
const outputImage = RawImage.fromTensor(imageData);
outputImage.save('output.png');
</details>
Inputs:
| Image | Trimap |
|---|---|
Outputs:
| Quantized | Unquantized |
|---|---|
Example: Protein sequence classification w/ Xenova/esm2_t6_8M_UR50D_sequence_classifier_v1.
import { pipeline } from '@xenova/transformers';
// Create text classification pipeline
const classifier = await pipeline('text-classification', 'Xenova/esm2_t6_8M_UR50D_sequence_classifier_v1');
// Suppose these are your new sequences that you want to classify
// Additional Family 0: Enzymes
const new_sequences_0 = [ 'ACGYLKTPKLADPPVLRGDSSVTKAICKPDPVLEK', 'GVALDECKALDYLPGKPLPMDGKVCQCGSKTPLRP', 'VLPGYTCGELDCKPGKPLPKCGADKTQVATPFLRG', 'TCGALVQYPSCADPPVLRGSDSSVKACKKLDPQDK', 'GALCEECKLCPGADYKPMDGDRLPAAATSKTRPVG', 'PAVDCKKALVYLPKPLPMDGKVCRGSKTPKTRPYG', 'VLGYTCGALDCKPGKPLPKCGADKTQVATPFLRGA', 'CGALVQYPSCADPPVLRGSDSSVKACKKLDPQDKT', 'ALCEECKLCPGADYKPMDGDRLPAAATSKTRPVGK', 'AVDCKKALVYLPKPLPMDGKVCRGSKTPKTRPYGR' ]
// Additional Family 1: Receptor Proteins
const new_sequences_1 = [ 'VGQRFYGGRQKNRHCELSPLPSACRGSVQGALYTD', 'KDQVLTVPTYACRCCPKMDSKGRVPSTLRVKSARS', 'PLAGVACGRGLDYRCPRKMVPGDLQVTPATQRPYG', 'CGVRLGYPGCADVPLRGRSSFAPRACMKKDPRVTR', 'RKGVAYLYECRKLRCRADYKPRGMDGRRLPKASTT', 'RPTGAVNCKQAKVYRGLPLPMMGKVPRVCRSRRPY', 'RLDGGYTCGQALDCKPGRKPPKMGCADLKSTVATP', 'LGTCRKLVRYPQCADPPVMGRSSFRPKACCRQDPV', 'RVGYAMCSPKLCSCRADYKPPMGDGDRLPKAATSK', 'QPKAVNCRKAMVYRPKPLPMDKGVPVCRSKRPRPY' ]
// Additional Family 2: Structural Proteins
const new_sequences_2 = [ 'VGKGFRYGSSQKRYLHCQKSALPPSCRRGKGQGSAT', 'KDPTVMTVGTYSCQCPKQDSRGSVQPTSRVKTSRSK', 'PLVGKACGRSSDYKCPGQMVSGGSKQTPASQRPSYD', 'CGKKLVGYPSSKADVPLQGRSSFSPKACKKDPQMTS', 'RKGVASLYCSSKLSCKAQYSKGMSDGRSPKASSTTS', 'RPKSAASCEQAKSYRSLSLPSMKGKVPSKCSRSKRP', 'RSDVSYTSCSQSKDCKPSKPPKMSGSKDSSTVATPS', 'LSTCSKKVAYPSSKADPPSSGRSSFSMKACKKQDPPV', 'RVGSASSEPKSSCSVQSYSKPSMSGDSSPKASSTSK', 'QPSASNCEKMSSYRPSLPSMSKGVPSSRSKSSPPYQ' ]
// Merge all sequences
const new_sequences = [...new_sequences_0, ...new_sequences_1, ...new_sequences_2];
// Get the predicted class for each sequence
const predictions = await classifier(new_sequences);
// Output the predicted class for each sequence
for (let i = 0; i < predictions.length; ++i) {
console.log(`Sequence: ${new_sequences[i]}, Predicted class: '${predictions[i].label}'`)
}
// Sequence: ACGYLKTPKLADPPVLRGDSSVTKAICKPDPVLEK, Predicted class: 'Enzymes'
// ... (truncated)
// Sequence: AVDCKKALVYLPKPLPMDGKVCRGSKTPKTRPYGR, Predicted class: 'Enzymes'
// Sequence: VGQRFYGGRQKNRHCELSPLPSACRGSVQGALYTD, Predicted class: 'Receptor Proteins'
// ... (truncated)
// Sequence: QPKAVNCRKAMVYRPKPLPMDKGVPVCRSKRPRPY, Predicted class: 'Receptor Proteins'
// Sequence: VGKGFRYGSSQKRYLHCQKSALPPSCRRGKGQGSAT, Predicted class: 'Structural Proteins'
// ... (truncated)
// Sequence: QPSASNCEKMSSYRPSLPSMSKGVPSSRSKSSPPYQ, Predicted class: 'Structural Proteins'
Example: Speech command recognition w/ Xenova/hubert-base-superb-ks.
import { pipeline } from '@xenova/transformers';
// Create audio classification pipeline
const classifier = await pipeline('audio-classification', 'Xenova/hubert-base-superb-ks');
// Classify audio
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speech-commands_down.wav';
const output = await classifier(url, { topk: 5 });
// [
// { label: 'down', score: 0.9954305291175842 },
// { label: 'go', score: 0.004518700763583183 },
// { label: '_unknown_', score: 0.00005029444946558215 },
// { label: 'no', score: 4.877569494965428e-7 },
// { label: 'stop', score: 5.504634081887616e-9 }
// ]
Example: Perform automatic speech recognition w/ Xenova/hubert-large-ls960-ft.
import { pipeline } from '@xenova/transformers';
// Create automatic speech recognition pipeline
const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/hubert-large-ls960-ft');
// Transcribe audio
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
const output = await transcriber(url);
// { text: 'AND SO MY FELLOW AMERICA ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY' }
Example: Zero-shot image classification w/ Xenova/chinese-clip-vit-base-patch16.
import { pipeline } from '@xenova/transformers';
// Create zero-shot image classification pipeline
const classifier = await pipeline('zero-shot-image-classification', 'Xenova/chinese-clip-vit-base-patch16');
// Set image url and candidate labels
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/pikachu.png';
const candidate_labels = ['杰尼龟', '妙蛙种子', '小火龙', '皮卡丘'] // Squirtle, Bulbasaur, Charmander, Pikachu in Chinese
// Classify image
const output = await classifier(url, candidate_labels);
console.log(output);
// [
// { score: 0.9926728010177612, label: '皮卡丘' }, // Pikachu
// { score: 0.003480620216578245, label: '妙蛙种子' }, // Bulbasaur
// { score: 0.001942147733643651, label: '杰尼龟' }, // Squirtle
// { score: 0.0019044597866013646, label: '小火龙' } // Charmander
// ]
Example: Image classification w/ Xenova/dinov2-small-imagenet1k-1-layer.
import { pipeline } from '@xenova/transformers';
// Create image classification pipeline
const classifier = await pipeline('image-classification', 'Xenova/dinov2-small-imagenet1k-1-layer');
// Classify an image
const url = 'http://images.cocodataset.org/val2017/000000039769.jpg';
const output = await classifier(url);
console.log(output)
// [{ label: 'tabby, tabby cat', score: 0.8088238835334778 }]
Example: Feature extraction w/ Xenova/conv-bert-small.
import { pipeline } from '@xenova/transformers';
// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/conv-bert-small');
// Perform feature extraction
const output = await extractor('This is a test sentence.');
console.log(output)
// Tensor {
// dims: [ 1, 8, 256 ],
// type: 'float32',
// data: Float32Array(2048) [ -0.09434918314218521, 0.5715903043746948, ... ],
// size: 2048
// }
Example: Feature extraction w/ Xenova/electra-small-discriminator.
import { pipeline } from '@xenova/transformers';
// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/electra-small-discriminator');
// Perform feature extraction
const output = await extractor('This is a test sentence.');
console.log(output)
// Tensor {
// dims: [ 1, 8, 256 ],
// type: 'float32',
// data: Float32Array(2048) [ 0.5410046577453613, 0.18386700749397278, ... ],
// size: 2048
// }
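Both feature-extraction examples return one 256-dimensional vector per token (dims `[1, 8, 256]`: batch, tokens, hidden size). To get a single sentence embedding, a common approach is to mean-pool over the tokens and then L2-normalize. With the pipeline, this can typically be requested directly via `extractor(text, { pooling: 'mean', normalize: true })`. The sketch below shows the underlying arithmetic on a flat array. It assumes a single unpadded sequence, so no attention mask is applied, and `meanPool` is a hypothetical helper.
```javascript
// Mean-pool a flat [numTokens * hiddenSize] array into one hiddenSize vector,
// then L2-normalize it. Assumes a single, unpadded sequence (no attention mask needed).
function meanPool(data, numTokens, hiddenSize) {
  const pooled = new Float32Array(hiddenSize);
  for (let t = 0; t < numTokens; ++t) {
    for (let h = 0; h < hiddenSize; ++h) {
      pooled[h] += data[t * hiddenSize + h] / numTokens;
    }
  }
  const norm = Math.hypot(...pooled) || 1;
  return pooled.map((x) => x / norm);
}

// Hypothetical output for 2 tokens with hidden size 2: [[3, 0], [1, 4]].
console.log(meanPool(new Float32Array([3, 0, 1, 4]), 2, 2));
```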
NOTE: This only adds support for the architecture. When the external data format is supported in ONNX Runtime, we will make an update that includes converted versions of the available Phi models.
In the last release, we added support for CLAP models (CLIP but for audio), so in this one, we're releasing a simple demo application which shows how you can use a CLAP model to perform real-time semantic music search! For simplicity, we implemented everything in vanilla JavaScript, but feel free to adapt it to your framework of choice! As always, the demo is open source! 🥳 PR: https://github.com/xenova/transformers.js/pull/442
https://github.com/xenova/transformers.js/assets/26504141/72e09f8c-d6e9-4430-a56c-7994737966db
SpeechT5ForSpeechToText in https://github.com/xenova/transformers.js/pull/438
Full Changelog: https://github.com/xenova/transformers.js/compare/2.10.1...2.11.0
{percentage: true} in https://github.com/xenova/transformers.js/pull/434. Thanks to @tobiascornille for reporting the issue!
HF_ACCESS_TOKEN -> HF_TOKEN environment variables in https://github.com/xenova/transformers.js/pull/431
Full Changelog: https://github.com/xenova/transformers.js/compare/2.10.0...2.10.1
The task of classifying audio into classes that are unseen during training. See here for more information.
Example: Perform zero-shot audio classification with Xenova/clap-htsat-unfused.
import { pipeline } from '@xenova/transformers';
// Create a zero-shot audio classification pipeline
const classifier = await pipeline('zero-shot-audio-classification', 'Xenova/clap-htsat-unfused');
const audio = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/dog_barking.wav';
const candidate_labels = ['dog', 'vaccum cleaner'];
const scores = await classifier(audio, candidate_labels);
// [
// { score: 0.9993992447853088, label: 'dog' },
// { score: 0.0006007603369653225, label: 'vaccum cleaner' }
// ]
We added support for 4 new architectures, bringing the total up to 65!
CLAP for zero-shot audio classification, text embeddings, and audio embeddings (https://github.com/xenova/transformers.js/pull/427). See here for the list of available models.
Zero-shot audio classification (same as above)
Text embeddings with Xenova/clap-htsat-unfused:
```javascript
import { AutoTokenizer, ClapTextModelWithProjection } from '@xenova/transformers';
// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clap-htsat-unfused');
const text_model = await ClapTextModelWithProjection.from_pretrained('Xenova/clap-htsat-unfused');
// Run tokenization
const texts = ['a sound of a cat', 'a sound of a dog'];
const text_inputs = tokenizer(texts, { padding: true, truncation: true });
// Compute embeddings
const { text_embeds } = await text_model(text_inputs);
// Tensor {
// dims: [ 2, 512 ],
// type: 'float32',
// data: Float32Array(1024) [ ... ],
// size: 1024
// }
```
Audio embeddings with Xenova/clap-htsat-unfused:
```javascript
import { AutoProcessor, ClapAudioModelWithProjection, read_audio } from '@xenova/transformers';
// Load processor and audio model
const processor = await AutoProcessor.from_pretrained('Xenova/clap-htsat-unfused');
const audio_model = await ClapAudioModelWithProjection.from_pretrained('Xenova/clap-htsat-unfused');
// Read audio and run processor
const audio = await read_audio('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cat_meow.wav');
const audio_inputs = await processor(audio);
// Compute embeddings
const { audio_embeds } = await audio_model(audio_inputs);
// Tensor {
// dims: [ 1, 512 ],
// type: 'float32',
// data: Float32Array(512) [ ... ],
// size: 512
// }
```
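Both calls return a `Tensor` whose `data` is a single flat `Float32Array` described by `dims`; for example, `[ 2, 512 ]` holds two 512-dimensional text embeddings back to back. Assuming the usual row-major layout, a small helper can split it into one vector per input so that text and audio embeddings can be compared with a dot product or cosine similarity. `toRows` is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: split a flat row-major buffer of shape [rows, cols]
// into `rows` subarray views (no data is copied).
function toRows({ dims, data }) {
  if (dims.length !== 2) throw new Error('Expected a 2-D tensor');
  const [rows, cols] = dims;
  if (data.length !== rows * cols) throw new Error('data length does not match dims');
  const out = [];
  for (let r = 0; r < rows; ++r) {
    // subarray shares memory with `data`
    out.push(data.subarray(r * cols, (r + 1) * cols));
  }
  return out;
}

// Usage with the outputs above:
// const [catText, dogText] = toRows(text_embeds);
// const [meowAudio] = toRows(audio_embeds);
```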
Audio Spectrogram Transformer for audio classification (https://github.com/xenova/transformers.js/pull/427). See here for the list of available models.
```javascript
import { pipeline } from '@xenova/transformers';
// Create an audio classification pipeline
const classifier = await pipeline('audio-classification', 'Xenova/ast-finetuned-audioset-10-10-0.4593');
// Predict class
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cat_meow.wav';
const output = await classifier(url, { topk: 4 });
// [
// { label: 'Meow', score: 0.5617874264717102 },
// { label: 'Cat', score: 0.22365376353263855 },
// { label: 'Domestic animals, pets', score: 0.1141069084405899 },
// { label: 'Animal', score: 0.08985692262649536 },
// ]
```
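The `topk` option limits the output to the highest-scoring labels. Conceptually, that is a sort-and-slice over per-class scores. The sketch below illustrates the idea on plain arrays and is not the pipeline's actual implementation; `topK` and the `id2label` array (class index to name) are hypothetical.

```javascript
// Hypothetical helper: return the k highest-scoring { label, score } pairs, best first.
// `scores` may be a plain array or a typed array; `id2label[i]` names class i.
function topK(scores, id2label, k) {
  return Array.from(scores, (score, i) => ({ label: id2label[i], score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

For a few hundred classes a full sort is cheap; a partial selection would only pay off for much larger label sets.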
ConvNeXT for image classification (https://github.com/xenova/transformers.js/pull/428). See here for the list of available models.
```javascript
import { pipeline } from '@xenova/transformers';
// Create image classification pipeline
const classifier = await pipeline('image-classification', 'Xenova/convnext-tiny-224');
// Classify an image
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
const output = await classifier(url);
// [{ label: 'tiger, Panthera tigris', score: 0.6153212785720825 }]
```
ConvNeXT-v2 for image classification (https://github.com/xenova/transformers.js/pull/428). See here for the list of available models.
```javascript
import { pipeline } from '@xenova/transformers';
// Create image classification pipeline
const classifier = await pipeline('image-classification', 'Xenova/convnextv2-atto-1k-224');
// Classify an image
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
const output = await classifier(url);
// [{ label: 'tiger, Panthera tigris', score: 0.6391205191612244 }]
```
Full Changelog: https://github.com/xenova/transformers.js/compare/2.9.0...2.10.0
Transformers.js v2.9.0 adds support for three new tasks: (1) Depth estimation, (2) Zero-shot object detection, and (3) Optical document understanding.
Depth estimation is the task of predicting the depth of objects present in an image. See here for more information.
```javascript
import { pipeline } from '@xenova/transformers';
// Create depth estimation pipeline
let depth_estimator = await pipeline('depth-estimation', 'Xenova/dpt-hybrid-midas');
// Predict depth for image
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
let output = await depth_estimator(url);
```
<details>
<summary>Raw output</summary>

```javascript
// {
// predicted_depth: Tensor {
// dims: [ 384, 384 ],
// type: 'float32',
// data: Float32Array(147456) [ 542.859130859375, 545.2833862304688, 546.1649169921875, ... ],
// size: 147456
// },
// depth: RawImage {
// data: Uint8Array(307200) [ 86, 86, 86, ... ],
// width: 640,
// height: 480,
// channels: 1
// }
// }
```
</details>
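The output contains both the raw `predicted_depth` tensor (384 × 384 float values) and a `depth` image (640 × 480, one 8-bit channel), so the prediction has been resized and quantized for display. The sketch below covers only the quantization step, assuming simple max-scaling into 0 to 255; resizing is omitted, the pipeline's exact normalization may differ, and `depthToUint8` is a hypothetical helper.

```javascript
// Hypothetical helper: quantize a float depth map into 0..255 by dividing by its maximum.
function depthToUint8(depth) {
  let max = 0;
  for (const v of depth) if (v > max) max = v;
  const out = new Uint8Array(depth.length);
  if (max === 0) return out; // an all-zero (or all-negative) map stays black
  for (let i = 0; i < depth.length; ++i) {
    // Clamp negatives to 0, then scale to 0..255 and round to the nearest integer
    out[i] = Math.round((Math.max(depth[i], 0) / max) * 255);
  }
  return out;
}

// Usage (before any resizing): depthToUint8(output.predicted_depth.data)
```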
Zero-shot object detection is the task of identifying objects of classes that are unseen during training. See here for more information.
```javascript
import { pipeline } from '@xenova/transformers';
// Create zero-shot object detection pipeline
let detector = await pipeline('zero-shot-object-detection', 'Xenova/owlvit-base-patch32');
// Predict bounding boxes
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/astronaut.png';
let candidate_labels = ['human face', 'rocket', 'helmet', 'american flag'];
let output = await detector(url, candidate_labels);
```
<details>
<summary>Raw output</summary>

```javascript
// [
// {
// score: 0.24392342567443848,
// label: 'human face',
// box: { xmin: 180, ymin: 67, xmax: 274, ymax: 175 }
// },
// {
// score: 0.15129457414150238,
// label: 'american flag',
// box: { xmin: 0, ymin: 4, xmax: 106, ymax: 513 }
// },
// {
// score: 0.13649864494800568,
// label: 'helmet',
// box: { xmin: 277, ymin: 337, xmax: 511, ymax: 511 }
// },
// {
// score: 0.10262022167444229,
// label: 'rocket',
// box: { xmin: 352, ymin: -1, xmax: 463, ymax: 287 }
// }
// ]
```
</details>
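Predicted boxes can extend slightly past the image edges (note `ymin: -1` for the rocket and `ymax: 513` for the flag). A common post-processing step is to drop low-confidence detections and clamp boxes to the image bounds. In this sketch, the helper name and the default threshold of 0.1 are arbitrary, hypothetical choices; pass the width and height of your own input image.

```javascript
// Hypothetical helper: keep detections with score >= threshold and clamp
// each box to [0, width - 1] x [0, height - 1].
function postprocessDetections(detections, width, height, threshold = 0.1) {
  const clamp = (v, lo, hi) => Math.min(Math.max(v, lo), hi);
  return detections
    .filter((d) => d.score >= threshold)
    .map(({ score, label, box }) => ({
      score,
      label,
      box: {
        xmin: clamp(box.xmin, 0, width - 1),
        ymin: clamp(box.ymin, 0, height - 1),
        xmax: clamp(box.xmax, 0, width - 1),
        ymax: clamp(box.ymax, 0, height - 1),
      },
    }));
}
```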
Optical document understanding involves translating images of scientific PDFs into Markdown, making them easier to access. See here for more information.
```javascript
import { pipeline } from '@xenova/transformers';
// Create image-to-text pipeline
let pipe = await pipeline('image-to-text', 'Xenova/nougat-small');
// Generate markdown
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/nougat_paper.png';
let output = await pipe(url, {
  min_length: 1,
  max_new_tokens: 40,
  bad_words_ids: [[pipe.tokenizer.unk_token_id]],
});
// [{ generated_text: "# Nougat: Neural Optical Understanding for Academic Documents\n\nLukas Blecher\n\nCorrespondence to: lblecher@meta.com\n\nGuillem Cucur" }]
```
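Passing `bad_words_ids: [[pipe.tokenizer.unk_token_id]]` stops the model from emitting the unknown token. A common way to implement such a constraint is to set a banned token's logit to `-Infinity` before the next token is chosen; for multi-token entries, only the last token is banned, and only when the preceding tokens match the end of the sequence so far. This is a sketch of that general idea, not Transformers.js's internal code, and `applyBadWords` is a hypothetical helper.

```javascript
// Hypothetical helper: mask banned next tokens in place.
// `badWordsIds` is a list of token-id sequences; the last id of a sequence is banned
// when the ids before it match the end of `inputIds` (single-token entries always apply).
function applyBadWords(logits, inputIds, badWordsIds) {
  for (const seq of badWordsIds) {
    if (seq.length === 0) continue;
    const prefix = seq.slice(0, -1);
    const start = inputIds.length - prefix.length;
    if (start < 0) continue; // not enough context to match the prefix
    const matches = prefix.every((id, i) => inputIds[start + i] === id);
    if (matches) logits[seq[seq.length - 1]] = -Infinity;
  }
  return logits;
}
```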
We added support for 4 new architectures, bringing the total up to 61!
image-to-text). See here for the list of available models.
CLIPFeatureExtractor (and tests) in https://github.com/xenova/transformers.js/pull/387
multilingual-e5-* models by @do-me in https://github.com/xenova/transformers.js/pull/403
Full Changelog: https://github.com/xenova/transformers.js/compare/2.8.0...2.9.0
This release adds support for image-to-image translation (e.g., super-resolution) with Swin2SR models.
As always, you can get started in just a few lines of code!
```javascript
import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/spaces/jjourney1125/swin2sr/resolve/main/testsets/real-inputs/0855.jpg';
let upscaler = await pipeline('image-to-image', 'Xenova/swin2SR-compressed-sr-x4-48');
let output = await upscaler(url);
// RawImage {
// data: Uint8Array(12582912) [165, 166, 163, ...],
// width: 2048,
// height: 2048,
// channels: 3
// }
```
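The output `RawImage` stores its pixels in one flat `Uint8Array` of length `width × height × channels` (here 2048 × 2048 × 3 = 12,582,912). Assuming a row-major, channel-interleaved (HWC) layout, reading a single pixel is plain index arithmetic. `getPixel` is a hypothetical helper, not a `RawImage` method.

```javascript
// Hypothetical helper: return the channel values of pixel (x, y),
// assuming a row-major buffer with interleaved channels (HWC).
function getPixel({ data, width, height, channels }, x, y) {
  if (x < 0 || x >= width || y < 0 || y >= height) throw new RangeError('Pixel out of bounds');
  const offset = (y * width + x) * channels;
  return Array.from(data.subarray(offset, offset + channels));
}

// Usage: getPixel(output, 0, 0) returns the [R, G, B] values of the top-left pixel.
```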
We also added support for 4 new architectures, bringing the total up to 57! 🤯
TrOCR for optical character recognition (OCR).
```javascript
import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/handwriting.jpg';
let captioner = await pipeline('image-to-text', 'Xenova/trocr-small-handwritten');
let output = await captioner(url);
// [{ generated_text: 'Mr. Brown commented icily.' }]
```
Added in https://github.com/xenova/transformers.js/pull/375. See here for the list of available models.
Swin2SR for super-resolution and image restoration.
```javascript
import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/butterfly.jpg';
let upscaler = await pipeline('image-to-image', 'Xenova/swin2SR-classical-sr-x2-64');
let output = await upscaler(url);
// RawImage {
// data: Uint8Array(786432) [ 41, 31, 24, 43, ... ],
// width: 512,
// height: 512,
// channels: 3
// }
```
Added in https://github.com/xenova/transformers.js/pull/381. See here for the list of available models.
Mistral and Falcon for text-generation. Added in https://github.com/xenova/transformers.js/pull/379. Note: Other than testing models, we haven't yet converted any of the larger (≥7B parameter) models. Stay tuned for more updates on this!
text2text-generation pipeline output inconsistency w/ python library in https://github.com/xenova/transformers.js/pull/384
Full Changelog: https://github.com/xenova/transformers.js/compare/2.7.0...2.8.0
Due to popular demand, we've added text-to-speech support to Transformers.js! 😍
https://github.com/xenova/transformers.js/assets/26504141/9fa5131d-0e07-47fa-9a13-122c1b69d233
You can get started in just a few lines of code!
```javascript
import { pipeline } from '@xenova/transformers';
let speaker_embeddings = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin';
let synthesizer = await pipeline('text-to-speech', 'Xenova/speecht5_tts', { quantized: false });
let out = await synthesizer('Hello, my dog is cute', { speaker_embeddings });
// {
// audio: Float32Array(26112) [-0.00005657337896991521, 0.00020583874720614403, ...],
// sampling_rate: 16000
// }
```
You can then save the audio to a .wav file with the wavefile package:
```javascript
import wavefile from 'wavefile';
import fs from 'fs';
let wav = new wavefile.WaveFile();
// Mono (1 channel), 32-bit float samples at the model's sampling rate
wav.fromScratch(1, out.sampling_rate, '32f', out.audio);
fs.writeFileSync('out.wav', wav.toBuffer());
```
Alternatively, you can play the file in your browser (see below).
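A minimal browser sketch using the standard Web Audio API (`AudioContext`, `createBuffer`, `createBufferSource`). `playAudio` is a hypothetical helper; the context is a parameter so you can reuse one across calls. Browsers may keep a new `AudioContext` suspended until the user interacts with the page, so call it from a click handler or similar.

```javascript
// Hypothetical helper: play a mono Float32Array through the Web Audio API (browser only).
function playAudio(audio, samplingRate, ctx = new AudioContext()) {
  // One channel, `audio.length` frames, at the model's sampling rate
  const buffer = ctx.createBuffer(1, audio.length, samplingRate);
  buffer.copyToChannel(audio, 0);
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.start();
  return source;
}

// Usage (after running the synthesizer above):
// playAudio(out.audio, out.sampling_rate);
```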
Don't like the speaker's voice? Well, you can choose another from the >7000 speaker embeddings in the CMU Arctic dataset (see here)!
Note: currently, we only support TTS w/ SpeechT5, but in the future we'll add others like Bark and MMS!
To showcase the power of in-browser TTS, we're also releasing a simple example app (demo, code). Feel free to make improvements to it... and if you do (or end up building your own), please tag me on Twitter! 🤗
https://github.com/xenova/transformers.js/assets/26504141/98adea31-b002-403b-ba9d-1edcc7e7bf11
< and > symbols generated from docs in https://github.com/xenova/transformers.js/pull/335
Full Changelog: https://github.com/xenova/transformers.js/compare/2.6.2...2.7.0
Document Question Answering is the task of answering questions based on an image of a document. Document Question Answering models take a (document, question) pair as input and return an answer in natural language. Check out the docs for more info!
<details>
<summary>Example code</summary>

```javascript
// npm i @xenova/transformers
import { pipeline } from '@xenova/transformers';
let image = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/invoice.png';
let question = 'What is the invoice number?';
// Create document question answering pipeline
let qa_pipeline = await pipeline('document-question-answering', 'Xenova/donut-base-finetuned-docvqa');
// Run the pipeline
let output = await qa_pipeline(image, question);
// [{ answer: 'us-001' }]
```
</details>
DonutSwin models in https://github.com/xenova/transformers.js/pull/320
Blenderbot and BlenderbotSmall in https://github.com/xenova/transformers.js/pull/292
LongT5 models in https://github.com/xenova/transformers.js/pull/316
In-browser semantic image search in https://github.com/xenova/transformers.js/pull/326 (demo, code, tweet)
https://github.com/xenova/transformers.js/assets/26504141/c2ea6e69-2344-401e-8745-fdea3a0613ad
_call LSP errors + extra typings by @kungfooman in https://github.com/xenova/transformers.js/pull/304
CustomCache requirement for example browser extension project in https://github.com/xenova/transformers.js/pull/325
Full Changelog: https://github.com/xenova/transformers.js/compare/2.6.1...2.6.2
Add Vanilla JavaScript tutorial by @perborgen in https://github.com/xenova/transformers.js/pull/271. This includes an interactive video tutorial ("scrim"), which walks you through the code! Let us know if you want to see more of these video tutorials! 🤗
Add support for min_length and min_new_tokens generation parameters in https://github.com/xenova/transformers.js/pull/308
Fix issues with minification in https://github.com/xenova/transformers.js/pull/307
Fix ByteLevel pretokenizer and improve whisper test cases in https://github.com/xenova/transformers.js/pull/287
Misc. documentation improvements by @rubiagatra in https://github.com/xenova/transformers.js/pull/293
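`min_length` and `min_new_tokens` prevent generation from ending too early. A common mechanism is to mask the end-of-sequence token's logit until the sequence is long enough. This is a sketch of that idea under stated assumptions, not Transformers.js's internal code: `applyMinLength` and its option names are hypothetical, `minLength` is taken to count the whole sequence, and `minNewTokens` only the generated tokens (whether the prompt counts toward `min_length` can depend on the model type).

```javascript
// Hypothetical helper: suppress EOS (in place) until both length constraints are met.
function applyMinLength(logits, { curLength, promptLength, eosTokenId, minLength = 0, minNewTokens = 0 }) {
  const newTokens = curLength - promptLength; // tokens generated so far
  if (curLength < minLength || newTokens < minNewTokens) {
    logits[eosTokenId] = -Infinity;
  }
  return logits;
}
```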
Full Changelog: https://github.com/xenova/transformers.js/compare/2.6.0...2.6.1