What's new?

Support for computing CLIP image and text embeddings separately (https://github.com/xenova/transformers.js/pull/227)

You can now compute CLIP text and vision embeddings separately, allowing for faster inference when you only need to query one of the modalities. We've also released a demo application for semantic image search to showcase this functionality.

Example: Compute text embeddings with CLIPTextModelWithProjection.

import { AutoTokenizer, CLIPTextModelWithProjection } from '@xenova/transformers';

// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clip-vit-base-patch16');
const text_model = await CLIPTextModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');

// Run tokenization
let texts = ['a photo of a car', 'a photo of a football match'];
let text_inputs = tokenizer(texts, { padding: true, truncation: true });

// Compute embeddings
const { text_embeds } = await text_model(text_inputs);
// Tensor {
//   dims: [ 2, 512 ],
//   type: 'float32',
//   data: Float32Array(1024) [ ... ],
//   size: 1024
// }

Example: Compute vision embeddings with CLIPVisionModelWithProjection.

import { AutoProcessor, CLIPVisionModelWithProjection, RawImage} from '@xenova/transformers';

// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
const vision_model = await CLIPVisionModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');

// Read image and run processor
let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
let image_inputs = await processor(image);

// Compute embeddings
const { image_embeds } = await vision_model(image_inputs);
// Tensor {
//   dims: [ 1, 512 ],
//   type: 'float32',
//   data: Float32Array(512) [ ... ],
//   size: 512
// }

Improved browser extension example/template (https://github.com/xenova/transformers.js/pull/196)

We've updated the source code for our example browser extension, making the following improvements:

Custom model caching - meaning you don't need to ship the weights of the model with the extension. In addition to a smaller bundle size, when the user updates, they won't need to redownload the weights!
Use ES6 module syntax (vs. CommonJS) - much cleaner code!
Persistent service worker - fixed an issue where the service worker would go to sleep after a portion of inactivity.

Summary of updates since last minor release (2.4.0):

(2.4.1) Improved documentation
(2.4.2) Support for private/gated models (https://github.com/xenova/transformers.js/pull/202)
(2.4.3) Example Next.js applications (https://github.com/xenova/transformers.js/pull/211) + MPNet model support (https://github.com/xenova/transformers.js/pull/221)
(2.4.4) StarCoder models + example application (release; demo + source code)

Misc bug fixes and improvements

Fixed floating-point-precision edge-case for resizing images
Fixed RawImage.save()
BPE tokenization for weird whitespace characters (https://github.com/xenova/transformers.js/pull/208)

2.5.0

What's new?

Support for computing CLIP image and text embeddings separately (https://github.com/xenova/transformers.js/pull/227)

Improved browser extension example/template (https://github.com/xenova/transformers.js/pull/196)

Summary of updates since last minor release (2.4.0):

Misc bug fixes and improvements