Support async functions in map() by @lhoestq in https://github.com/huggingface/datasets/pull/7384
prompt = "Answer the following question: {question}. You should think step by step."
# query_model is assumed to be a user-defined async function; it is not part of datasets
async def ask_llm(example):
    return await query_model(prompt.format(question=example["question"]))
ds = ds.map(ask_llm)
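The snippet above leaves `query_model` undefined. To try the pattern without an LLM client or the library, here is a minimal sketch in plain `asyncio`. It assumes a hypothetical stand-in `query_model` that sleeps briefly, and it uses a list of dicts in place of a dataset. It shows the shape of an async per-example function, not how `map()` schedules coroutines internally. The function returns a dict, which is what `map()` functions usually return to add columns.

```python
import asyncio

prompt = "Answer the following question: {question}. You should think step by step."

async def query_model(text):
    # Hypothetical stand-in for a real LLM client call (not a datasets API)
    await asyncio.sleep(0.01)
    return f"[model reply to: {text}]"

async def ask_llm(example):
    # Return a dict so the result can be merged into the example as a new field
    answer = await query_model(prompt.format(question=example["question"]))
    return {"answer": answer}

async def run_all(examples):
    # Run all requests concurrently; gather returns results in input order
    outputs = await asyncio.gather(*(ask_llm(ex) for ex in examples))
    return [{**ex, **out} for ex, out in zip(examples, outputs)]

examples = [{"question": "What is 2 + 2?"}, {"question": "Why is the sky blue?"}]
results = asyncio.run(run_all(examples))
for row in results:
    print(row["answer"])
```

With a real client, swapping the body of `query_model` should be the only change needed before passing `ask_llm` to `ds.map()`.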
Add repeat method to datasets by @alex-hh in https://github.com/huggingface/datasets/pull/7198
ds = ds.repeat(10)
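As a rough mental model, and assuming `repeat(n)` behaves like concatenating `n` copies of the dataset in order, the effect on the rows can be sketched with the standard library alone. `repeat_rows` is a hypothetical helper for illustration, not part of datasets.

```python
from itertools import chain, repeat

def repeat_rows(rows, num_times):
    """Return the items of `rows` repeated `num_times` times, in order."""
    return list(chain.from_iterable(repeat(rows, num_times)))

rows = [{"id": 0}, {"id": 1}, {"id": 2}]
repeated = repeat_rows(rows, 10)

print(len(repeated))                    # 30
print([r["id"] for r in repeated[:5]])  # [0, 1, 2, 0, 1]
```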
Support faster processing using pandas or polars functions in IterableDataset.map() by @lhoestq in https://github.com/huggingface/datasets/pull/7370
import polars as pl
from datasets import load_dataset

ds = load_dataset("ServiceNow-AI/R1-Distill-SFT", "v0", split="train", streaming=True)
ds = ds.with_format("polars")
# Capture the text inside boxed{...} in each solution as a new column
expr = pl.col("solution").str.extract("boxed\\{(.*)\\}").alias("value_solution")
ds = ds.map(lambda df: df.with_columns(expr), batched=True)
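To see what the pattern in the expression captures, the same regular expression can be tried on a few strings with Python's `re` module. This is only an approximation: Polars uses a different regex engine, though for a pattern this simple the two are expected to agree. `extract_boxed` and the sample strings are made up for illustration.

```python
import re

# Same pattern string as in the Polars expression: the regex is boxed\{(.*)\}
pattern = re.compile("boxed\\{(.*)\\}")

def extract_boxed(solution):
    """Return the first capture group, or None when the pattern does not match."""
    match = pattern.search(solution)
    return match.group(1) if match else None

print(extract_boxed(r"The answer is \boxed{42}."))   # 42
print(extract_boxed(r"So x = \boxed{\frac{1}{2}}"))  # \frac{1}{2}
print(extract_boxed("no final answer here"))         # None
```

Because `.*` is greedy, the capture runs to the last `}` on the same line. Nested braces are kept intact, but trailing text that contains another `}` would be included as well.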
Apply formatting after iter_arrow to speed up format -> map, filter for iterable datasets by @alex-hh in https://github.com/huggingface/datasets/pull/7207
Full Changelog: https://github.com/huggingface/datasets/compare/3.2.0...3.3.0