The library shifted toward flexible data handling and cloud-native workflows. It added support for Hugging Face Storage Buckets, letting developers load raw data directly from cloud storage, process it, and publish results—fixing multiprocessed operations on macOS in the process. Concurrently, the Json() type shipped to handle mixed-type fields that Arrow normally rejects, essential for datasets with arbitrary JSON structures like tool-calling examples. Support for Lance format and rich media types (Image, Video, Audio) expanded the ecosystem for specialized data formats, with push_to_hub() graduating to handle videos alongside structured data.
Expanded data loading flexibility and fixed regressions across the 4.7–4.8 line. The major addition was HF Storage Bucket support, letting developers load raw data directly from cloud storage, process it with map/filter operations, and push the result to dataset repos—while also fixing a macOS segfault in multiprocessed push_to_hub. The Json() type arrived to handle mixed-type fields in datasets like tool calling use cases, complemented by quick fixes to JSON parsing, distributed node splitting, and Arrow iteration behavior.