
3.2.0

December 10, 2024 · Datasets

Dataset Features

  • Faster Parquet streaming + filters with predicate pushdown by @lhoestq in https://github.com/huggingface/datasets/pull/7309
    • Up to +100% streaming speed
    • Fast filtering via predicate pushdown: files and row groups that cannot match the predicate are skipped instead of downloading the full data, e.g.:
      from datasets import load_dataset
      filters = [('date', '>=', '2023')]
      ds = load_dataset("HuggingFaceFW/fineweb-2", "fra_Latn", streaming=True, filters=filters)
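Predicate pushdown pays off because Parquet files store per-row-group column statistics (such as min/max values) in their metadata, so a reader can rule out whole row groups before fetching their data. The following is a minimal, hypothetical model of that idea in plain Python. It assumes AND-combined `(column, op, value)` tuples shaped like the `filters` list above. `RowGroup`, `may_match` and `scan` are illustrative names, not part of the `datasets` or `pyarrow` API, and this is not how either library is implemented internally.

```python
# Hypothetical sketch of predicate pushdown over row-group min/max statistics.
# Illustrative only: not the datasets or pyarrow implementation.
import operator
from dataclasses import dataclass

# Comparison operators allowed in (column, op, value) filter tuples.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass
class RowGroup:
    stats: dict  # column -> (min, max), like Parquet row-group statistics
    rows: list   # row dicts; only touched if the group is not skipped


def may_match(stats, column, op, value):
    """Return False only if no row in the group can satisfy `column op value`."""
    lo, hi = stats[column]
    if op == "==":
        return lo <= value <= hi
    if op == "!=":
        return not (lo == hi == value)
    if op == "<":
        return lo < value
    if op == "<=":
        return lo <= value
    if op == ">":
        return hi > value
    if op == ">=":
        return hi >= value
    raise ValueError(f"unsupported operator: {op}")


def scan(row_groups, filters):
    """Return (matching rows, number of row groups actually read).

    `filters` is a list of (column, op, value) tuples combined with AND.
    """
    matches, groups_read = [], 0
    for group in row_groups:
        # Pushdown step: decide from statistics alone whether to read the group.
        if not all(may_match(group.stats, c, op, v) for c, op, v in filters):
            continue
        groups_read += 1
        # Row-level check, since statistics only bound the possible values.
        for row in group.rows:
            if all(OPS[op](row[c], v) for c, op, v in filters):
                matches.append(row)
    return matches, groups_read


groups = [
    RowGroup({"date": ("2021-01-03", "2022-12-30")},
             [{"date": "2021-01-03"}, {"date": "2022-12-30"}]),
    RowGroup({"date": ("2022-06-01", "2023-03-15")},
             [{"date": "2022-06-01"}, {"date": "2023-03-15"}]),
    RowGroup({"date": ("2023-04-01", "2024-02-10")},
             [{"date": "2023-04-01"}, {"date": "2024-02-10"}]),
]
rows, read = scan(groups, [("date", ">=", "2023")])
print(read)                       # 2
print([r["date"] for r in rows])  # ['2023-03-15', '2023-04-01', '2024-02-10']
```

Only the second and third row groups are read here; the first is skipped because its maximum `date` string sorts before `'2023'`. A real reader also has to cope with missing statistics and null values, which this sketch ignores.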
      

Other improvements and bug fixes

New Contributors

Full Changelog: https://github.com/huggingface/datasets/compare/3.1.0...3.2.0

Fetched April 7, 2026