Older versions of `datasets` are not able to reload datasets pushed with this new model, so we encourage everyone to update.

* Fix `IterableDataset.map` that lead to `features=None` by @alvarobartt in https://github.com/huggingface/datasets/pull/5287
  * Streaming datasets now update their `features` after column renaming or removal
* Add `features` param to `IterableDataset.map` by @alvarobartt in https://github.com/huggingface/datasets/pull/5311
* Sharded saving: pass `num_shards` or `max_shard_size` to `ds.save_to_disk()` or `ds.push_to_hub()`, and pass `num_proc` to use multiprocessing
* Stream a dataset and iterate over it in a PyTorch `DataLoader` with multiple workers:

```python
from datasets import load_dataset
from torch.utils.data import DataLoader  # import needed for the DataLoader below

ds = load_dataset("c4", "en", streaming=True, split="train")
dataloader = DataLoader(ds, batch_size=32, num_workers=4)
```
* `max_shard_size` docs by @lhoestq in https://github.com/huggingface/datasets/pull/5267
* `from_generator` docs by @mariosasko in https://github.com/huggingface/datasets/pull/5307
* `ArrowWriter.finalize` before inference error by @mariosasko in https://github.com/huggingface/datasets/pull/5309
* `num_proc` for dataset download and generation by @mariosasko in https://github.com/huggingface/datasets/pull/5300 (useful for big datasets like `wikipedia` or `natural_questions`)
* `IterableDataset.map` param `batch_size` typing as optional by @alvarobartt in https://github.com/huggingface/datasets/pull/5336
* `topdown` parameter in `xwalk` by @mariosasko in https://github.com/huggingface/datasets/pull/5308
* `use_auth_token` docstring and deprecate `use_auth_token` in `download_and_prepare` by @mariosasko in https://github.com/huggingface/datasets/pull/5302
* `.tar` archives in the same way as for `.tar.gz` and `.tgz` in `_get_extraction_protocol` by @polinaeterna in https://github.com/huggingface/datasets/pull/5322

**Full Changelog**: https://github.com/huggingface/datasets/compare/2.7.0...2.8.0