- `.flatten_indices()` (x2) + `save`/`load_from_disk` (x100) on selected/shuffled datasets
- `verification_mode` you can pass to `load_dataset()`
- `.map()` in multiprocessing
- `.to_iterable_dataset()` to get an `IterableDataset` from a `Dataset`
- `IterableDataset` in the documentation about the differences between `Dataset` and `IterableDataset`
- `.select_columns()` to return a dataset only containing the requested columns
- `ds = ds.sort(['col_1', 'col_2'], reverse=[True, False])`
- `ds = ds.with_format("jax", device=device)`
- `nyu_depth_v2` dataset by @awsaf49 in https://github.com/huggingface/datasets/pull/5484
- `load_from_cache_file` arg from `Dataset.shard()` docstring by @polinaeterna in https://github.com/huggingface/datasets/pull/5493
- `NumpyFormatter` by @alvarobartt in https://github.com/huggingface/datasets/pull/5530
- `load_from_cache_file` type and logic by @HallerPatrick in https://github.com/huggingface/datasets/pull/5515
- `ruff` by @mariosasko in https://github.com/huggingface/datasets/pull/5519

Full Changelog: https://github.com/huggingface/datasets/compare/2.9.0...ef
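
The multi-column call `ds.sort(['col_1', 'col_2'], reverse=[True, False])` listed above orders rows by `col_1` descending and breaks ties by `col_2` ascending. The sketch below illustrates that ordering in plain Python on a list of dict rows. It is not the library's implementation, and the helper name `sort_rows` and the sample rows are made up for illustration.

```python
from operator import itemgetter


def sort_rows(rows, column_names, reverse):
    """Sort dict rows by several columns, each with its own direction.

    Hypothetical helper: it mimics the ordering of a multi-column sort,
    not how the datasets library performs it internally.
    """
    ordered = list(rows)
    # Python's sort is stable (also with reverse=True), so sorting by the
    # least significant column first and the most significant column last
    # produces the combined multi-column order.
    for name, descending in reversed(list(zip(column_names, reverse))):
        ordered.sort(key=itemgetter(name), reverse=descending)
    return ordered


# Made-up sample rows for illustration.
rows = [
    {"col_1": 1, "col_2": "b"},
    {"col_1": 2, "col_2": "b"},
    {"col_1": 2, "col_2": "a"},
    {"col_1": 1, "col_2": "a"},
]

# col_1 descending, then col_2 ascending within equal col_1 values.
result = sort_rows(rows, ["col_1", "col_2"], reverse=[True, False])
for row in result:
    print(row)
```

Passing one boolean per column lets each sort key have its own direction, which a single `reverse` flag cannot express.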