3.0.0

September 11, 2024 (Datasets)

Dataset Features

  • Use Polars functions in .map()
    • Allow Polars as valid output type by @psmyth94 in https://github.com/huggingface/datasets/pull/6762

    • Example:

      >>> import polars as pl
      >>> from datasets import load_dataset
      >>> ds = load_dataset("lhoestq/CudyPokemonAdventures", split="train").with_format("polars")
      >>> cols = [pl.col("content").str.len_bytes().alias("length")]
      >>> ds_with_length = ds.map(lambda df: df.with_columns(cols), batched=True)
      >>> ds_with_length[:5]
      shape: (5, 5)
      ┌─────┬───────────────────────────────────┬───────────────────────────────────┬───────────────────────┬────────┐
      │ idx ┆ title                             ┆ content                           ┆ labels                ┆ length │
      │ --- ┆ ---                               ┆ ---                               ┆ ---                   ┆ ---    │
      │ i64 ┆ str                               ┆ str                               ┆ str                   ┆ u32    │
      ╞═════╪═══════════════════════════════════╪═══════════════════════════════════╪═══════════════════════╪════════╡
      │ 0   ┆ The Joyful Adventure of Bulbasau… ┆ Bulbasaur embarked on a sunny qu… ┆ joyful_adventure      ┆ 180    │
      │ 1   ┆ Pikachu's Quest for Peace         ┆ Pikachu, with his cheeky persona… ┆ peaceful_narrative    ┆ 138    │
      │ 2   ┆ The Tender Tale of Squirtle       ┆ Squirtle took everyone on a memo… ┆ gentle_adventure      ┆ 135    │
      │ 3   ┆ Charizard's Heartwarming Tale     ┆ Charizard found joy in helping o… ┆ heartwarming_story    ┆ 112    │
      │ 4   ┆ Jolteon's Sparkling Journey       ┆ Jolteon, with his zest for life,… ┆ celebratory_narrative ┆ 111    │
      └─────┴───────────────────────────────────┴───────────────────────────────────┴───────────────────────┴────────┘
      
  • Support NumPy 2

Cache Changes

  • Use huggingface_hub cache by @lhoestq in https://github.com/huggingface/datasets/pull/7105
    • Files downloaded from the Hugging Face Hub are now stored in the huggingface_hub cache, by default at ~/.cache/huggingface/hub
    • Cached datasets (Arrow files) are still reloaded from the datasets cache, by default at ~/.cache/huggingface/datasets
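
To make the two cache locations above concrete, here is a minimal, standard-library-only sketch that computes where each cache would be expected to live. The helper name `default_cache_dirs` is hypothetical. The override variables `HF_HOME`, `HF_HUB_CACHE`, and `HF_DATASETS_CACHE` are not mentioned in these release notes; they are included as an assumption about how the defaults can be changed, and the resolution logic is simplified. The libraries themselves remain the authoritative source for the paths they use.

```python
import os
from pathlib import Path


def default_cache_dirs(env=None, home=None):
    """Return the expected hub and datasets cache directories.

    Simplified sketch (assumption, not the libraries' own code):
    HF_HUB_CACHE and HF_DATASETS_CACHE override the individual caches,
    and HF_HOME overrides their common parent directory.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Default parent directory: ~/.cache/huggingface
    hf_home = Path(env.get("HF_HOME", home / ".cache" / "huggingface"))

    return {
        # Files downloaded from the Hub (huggingface_hub cache)
        "hub": Path(env.get("HF_HUB_CACHE", hf_home / "hub")),
        # Processed Arrow files (datasets cache)
        "datasets": Path(env.get("HF_DATASETS_CACHE", hf_home / "datasets")),
    }


if __name__ == "__main__":
    for name, path in default_cache_dirs().items():
        print(f"{name}: {path}")
```

A practical consequence of the split is that disk usage is now spread over two directories, so clearing only the datasets cache would not remove the raw downloaded files.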

Breaking changes

General improvements and bug fixes

New Contributors

Full Changelog: https://github.com/huggingface/datasets/compare/2.21.0...3.0.0

Fetched April 7, 2026