We now have the highest-quality, most comprehensive web data API available, powered by our new semantic index and custom browser stack.
See the benchmarks below:
<img width="1200" height="675" alt="image" src="https://github.com/user-attachments/assets/96a2ba36-0c7f-4fa3-829e-d6ac91b53705" />
New Features
- Implemented scraping for .xlsx (Excel) files.
- Introduced new crawl architecture and NUQ concurrency tracking system.
- Added per-owner/group concurrency limiting and dynamic concurrency calculation.
- Added group backlog handling and improved group operations.
- Added /search pricing update.
- Added team flag to skip country check.
- Always populate NUQ metrics for improved observability.
- New test-site app for improved CI testing.
- Extract metadata from document head for richer output.
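
To illustrate the idea behind per-owner/group concurrency limiting, here is a minimal sketch in Python. It is not Firecrawl's implementation: the `GroupLimiter` class, the group names, and the fixed limit of 2 are all hypothetical, and it does not model NUQ, backlog handling, or dynamic concurrency calculation. It only shows how one group's jobs can be capped without blocking another group.

```python
import asyncio
from collections import defaultdict


class GroupLimiter:
    """Hypothetical helper: caps concurrently running jobs per group key."""

    def __init__(self, limit_per_group):
        self._limit = limit_per_group
        self._semaphores = {}            # group name -> asyncio.Semaphore
        self.running = defaultdict(int)  # jobs currently running per group
        self.peak = defaultdict(int)     # highest concurrency seen per group

    def _semaphore(self, group):
        # Create each group's semaphore lazily, the first time it is seen.
        if group not in self._semaphores:
            self._semaphores[group] = asyncio.Semaphore(self._limit)
        return self._semaphores[group]

    async def run(self, group, job):
        # Jobs over the group's limit wait here; other groups are unaffected.
        async with self._semaphore(group):
            self.running[group] += 1
            self.peak[group] = max(self.peak[group], self.running[group])
            try:
                return await job()
            finally:
                self.running[group] -= 1


async def demo():
    limiter = GroupLimiter(limit_per_group=2)

    async def fake_scrape():
        # Stand-in for real work such as scraping a page.
        await asyncio.sleep(0.01)

    jobs = [limiter.run("team-a", fake_scrape) for _ in range(5)]
    jobs.append(limiter.run("team-b", fake_scrape))
    await asyncio.gather(*jobs)
    return dict(limiter.peak)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch, "team-a" submits five jobs but never has more than two running at once, while the single "team-b" job starts immediately.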
Enhancements & Improvements
- Improved blocklist loading and unsupported site error messages.
- Updated x402-express version.
- Improved includePaths handling for subdomains.
- Updated self-hosted search to use DuckDuckGo.
- JS & Python SDKs no longer require API key for self-hosted deployments.
- Python SDK timeout handling improvements.
- Rust client now uses tracing instead of print.
- Reduced noise in auto-recharge Slack notifications.
Fixes
- Ensured crawl robots.txt warnings surface reliably.
- Resolved concurrency deadlocks and duplicate job handling.
- Fixed search country defaults and pricing logic bugs.
- Fixed port conflicts in harness environments.
- Fixed viewport dimension support and screenshot behavior in Playwright.
- Resolved CI test flakiness (Playwright cache, prod tests).
👋 New Contributors
- @delong3
- @c4nc
- @codetheweb
Full diff: https://github.com/firecrawl/firecrawl/compare/v2.4.0...v2.5.0