3.1.0 — Datasets — releases.sh

Dataset Features

Video support by @lhoestq in https://github.com/huggingface/datasets/pull/7230

>>> from datasets import Dataset, Video, load_dataset
>>> ds = Dataset.from_dict({"video":["path/to/Screen Recording.mov"]}).cast_column("video", Video())
>>> # or from the hub
>>> ds = load_dataset("username/dataset_name", split="train")
>>> ds[0]["video"]
<decord.video_reader.VideoReader at 0x105525c70>

Add IterableDataset.shard() by @lhoestq in https://github.com/huggingface/datasets/pull/7252

>>> from datasets import load_dataset
>>> full_ds = load_dataset("amphion/Emilia-Dataset", split="train", streaming=True)
>>> full_ds.num_shards
2360
>>> ds = full_ds.shard(num_shards=full_ds.num_shards, index=0)
>>> ds.num_shards
1
>>> ds = full_ds.shard(num_shards=8, index=0)
>>> ds.num_shards
295
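
The shard counts above follow from simple division: 2360 source shards split 8 ways leaves 295 per shard, and split 2360 ways leaves 1. A minimal pure-Python sketch of that arithmetic is below. It assumes source shards are assigned in contiguous, near-equal groups, which is an illustrative assumption and may not match how the library actually distributes them. `split_shards` is a hypothetical helper, not part of `datasets`.

```python
def split_shards(total_shards: int, num_shards: int, index: int) -> range:
    """Return the source-shard indices given to shard `index`.

    Hypothetical helper for illustration only; assumes contiguous,
    near-equal groups, which may differ from the library's behavior.
    """
    if not 0 <= index < num_shards:
        raise ValueError("index must be in [0, num_shards)")
    # Every shard gets `base` source shards; the first `extra` get one more.
    base, extra = divmod(total_shards, num_shards)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return range(start, stop)


print(len(split_shards(2360, 8, 0)))     # 295, matching the example above
print(len(split_shards(2360, 2360, 0)))  # 1
```

When the total is not divisible by the requested number of shards, this sketch gives the leftover source shards to the lowest indices, so group sizes differ by at most one.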

Basic XML support by @lhoestq in https://github.com/huggingface/datasets/pull/7250

What's Changed

(Super tiny doc update) Mention to_polars by @fzyzcjy in https://github.com/huggingface/datasets/pull/7232
[MINOR:TYPO] Update arrow_dataset.py by @cakiki in https://github.com/huggingface/datasets/pull/7236
Missing video docs by @lhoestq in https://github.com/huggingface/datasets/pull/7251
fix decord import by @lhoestq in https://github.com/huggingface/datasets/pull/7255
fix ci for pyarrow 18 by @lhoestq in https://github.com/huggingface/datasets/pull/7257
Retry all requests timeouts by @lhoestq in https://github.com/huggingface/datasets/pull/7256
Always set non-null writer batch size by @lhoestq in https://github.com/huggingface/datasets/pull/7258
Don't embed videos by @lhoestq in https://github.com/huggingface/datasets/pull/7259
Allow video with disabled decoding without decord by @lhoestq in https://github.com/huggingface/datasets/pull/7262
Small addition to video docs by @lhoestq in https://github.com/huggingface/datasets/pull/7263
fix docs relative links by @lhoestq in https://github.com/huggingface/datasets/pull/7264
Disallow video push_to_hub by @lhoestq in https://github.com/huggingface/datasets/pull/7265

New Contributors

@fzyzcjy made their first contribution in https://github.com/huggingface/datasets/pull/7232

Full Changelog: https://github.com/huggingface/datasets/compare/3.0.2...3.1.0