Firecrawl v2.9.0
Improvements
- Browser Interaction via the /interact endpoint: Scrape a page, then call /interact to take actions on it, such as clicking buttons, filling forms, navigating deeper, or extracting dynamic content. Describe what you want in natural language via prompt, or write Playwright code (Node.js, Python) or Bash (agent-browser) for full control. Sessions persist across calls, with live view and interactive live view URLs for real-time browser streaming. Persistent profiles let you save and reuse browser state (cookies, localStorage) across scrapes. Available in the JS, Python, Java, and Rust SDKs.
- query format: Added the query format to the /scrape endpoint. Pass a natural-language prompt and get a direct answer back in data.answer.
- audio format: Added an audio format option to scrape responses, returning audio output as a field on the document.
- onlyCleanContent parameter: Added the onlyCleanContent parameter to the /scrape endpoint, which strips navigation, ads, cookie banners, and other non-semantic content from markdown output.
- PDF parsing modes: Added PDF parsing modes (fast, auto, ocr) and a maxPages option to control extraction depth and OCR behavior.
- Java and Elixir SDKs: Added official Java and Elixir SDKs with full v2 API support.
- Legacy .doc file support: Added support for parsing legacy .doc files.
- Wikimedia engine: Added a dedicated engine for scraping Wikipedia and Wikimedia pages with improved output quality.
- contentType in scrape responses: Added contentType to scrape responses for PDFs and documents.
- PDF pipeline improvements: Improved the PDF pipeline with better table detection, header/footer stripping, mixed PDF handling, inline image parsing, and magic byte detection.
- Branding extraction: Improved branding extraction to skip hidden DOM elements for cleaner output.
- HTML-to-markdown performance: Improved HTML-to-markdown conversion performance and fixed code blocks losing content during conversion.
- Concurrency queue: Added a new concurrency queue system with a reconciler and backfill for more reliable job scheduling.
- Rust SDK v2: Added a v2 API namespace with agent support to the Rust SDK.
- Fixed the Python SDK parameters timeout, max_retries, and backoff_factor, which were previously accepted but silently ignored.
- Capped job timeouts at 48 hours to prevent runaway jobs from consuming resources.
- Added retry limits to prevent scrape loops.
- Binary content types are now rejected early in the scrape pipeline to avoid wasted processing.
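The Python SDK fix concerns timeout, max_retries, and backoff_factor. To illustrate what parameters with those names conventionally control, here is a minimal sketch of retries with exponential backoff. It uses the urllib3-style delay formula backoff_factor * 2 ** (n - 1). The helper names retry_with_backoff and backoff_delays, and the formula itself, are assumptions made for illustration. The SDK's actual retry logic may differ.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int, backoff_factor: float) -> List[float]:
    """Hypothetical: seconds to wait before retry n, urllib3-style factor * 2**(n-1)."""
    return [backoff_factor * (2 ** (n - 1)) for n in range(1, max_retries + 1)]


def retry_with_backoff(
    call: Callable[[float], T],
    timeout: float,
    max_retries: int,
    backoff_factor: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: run call(timeout), retrying up to max_retries times.

    The first attempt runs immediately; each retry waits for the next backoff delay.
    Every attempt receives the same per-request timeout.
    """
    last_error: Optional[Exception] = None
    for delay in [0.0] + backoff_delays(max_retries, backoff_factor):
        if delay:
            sleep(delay)
        try:
            return call(timeout)
        except Exception as exc:  # a real client would retry only transient errors
            last_error = exc
    assert last_error is not None  # the loop always runs at least once
    raise last_error


# Usage: a call that fails twice, then succeeds; sleeps are recorded, not performed.
attempts: List[float] = []
slept: List[float] = []


def flaky(timeout: float) -> str:
    attempts.append(timeout)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = retry_with_backoff(flaky, timeout=30.0, max_retries=3, backoff_factor=0.5, sleep=slept.append)
print(result, slept)  # ok [0.5, 1.0]
```

The sleep function is injectable so the backoff schedule can be checked without actually waiting.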
Fixes
- Fixed empty responses when using the o3-mini model on extract jobs.
- Fixed revoked API keys remaining valid for up to 10 minutes after deletion.
- Fixed a race condition in extract jobs that caused "Job not found" crashes.
- Fixed time_taken in /v1/map always returning ~0.
- Fixed crawl status responses so that a crawl-level failure now surfaces a failed status with an error message and partial data.
- Fixed maxPages not being passed to the PDF extractor. Previously, the full PDF content was returned while only the limited page count was charged.
- Fixed free request credits being incorrectly consumed and billed on agent jobs exceeding the maxCredits threshold.
- Fixed the dashboard displaying incorrect concurrency limits due to stale reads.
- Fixed branding colors.secondary not being populated.
- Fixed removeBase64Images running after deriveDiff in the transformer pipeline, which caused diff issues.
- Fixed GCS fetch using the wrong row index for cache info lookups.
- Fixed an unhandled ZodError in the /v1/search controller.
- Resolved multiple CVEs across dependencies, including handlebars, path-to-regexp, fast-xml-parser, rollup (CVE-2026-27606), undici, and others.
- Hardened the Playwright service against SSRF attacks.
API
- Added the GET /v2/team/activity endpoint for listing recent scrape, crawl, and extract jobs with cursor-based pagination (last 24 hours, up to 100 results per page, filterable by endpoint type).
- Added the regexOnFullURL parameter on crawl requests to apply includePaths/excludePaths filtering against the full URL, including query parameters. Available in the JS, Python, Java, and Elixir SDKs.
- Added the deduplicateSimilarURLs parameter on crawl requests. Available in the JS, Python, Java, and Elixir SDKs.
- Deprecated the extract endpoint; use the /agent endpoint instead. Existing extract methods in the JS and Python SDKs are marked deprecated.
- Renamed persistentSession to profile on browser/interact requests (writeMode is now saveChanges). The old parameter name remains functional but is no longer documented.
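To make regexOnFullURL concrete, the sketch below contrasts matching includePaths/excludePaths patterns against only the URL path with matching them against the full URL, query string included. This is a local simulation, not the crawler's code. The helper name url_allowed is hypothetical. The assumption that the default mode tests only the path is an inference from the description above.

```python
import re
from typing import Sequence
from urllib.parse import urlsplit


def url_allowed(url: str, include_paths: Sequence[str], exclude_paths: Sequence[str],
                regex_on_full_url: bool = False) -> bool:
    """Hypothetical filter: include/exclude regexes tested against the path or the full URL."""
    target = url if regex_on_full_url else urlsplit(url).path
    if include_paths and not any(re.search(p, target) for p in include_paths):
        return False
    return not any(re.search(p, target) for p in exclude_paths)


url = "https://example.com/products?page=2&sort=price"
# Excluding sorted listings only works when the query string is visible to the regex.
path_only = url_allowed(url, [r"^/products"], [r"sort="])
full_url = url_allowed(url, [r"/products"], [r"sort="], regex_on_full_url=True)
print(path_only, full_url)  # True False
```

In this sketch, an anchored pattern such as ^/products stops matching once the full URL (scheme and host included) is tested. Patterns may therefore need adjusting when the option is turned on.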
New Contributors
- @misza-one made their first contribution in https://github.com/firecrawl/firecrawl/pull/2660
- @madmikeross made their first contribution in https://github.com/firecrawl/firecrawl/pull/2948
- @rowinsg made their first contribution in https://github.com/firecrawl/firecrawl/pull/3065
- @Bortlesboat made their first contribution in https://github.com/firecrawl/firecrawl/pull/3243
- @dagecko made their first contribution in https://github.com/firecrawl/firecrawl/pull/3249
- @cokemine made their first contribution in https://github.com/firecrawl/firecrawl/pull/3262
- @paulonasc made their first contribution in https://github.com/firecrawl/firecrawl/pull/3275
Contributors
- @nickscamara
- @mogery
- @amplitudesxd
- @abimaelmartell
- @ericciarla
- @rafaelsideguide
- @delong3
- @devhims
- @Chadha93
- @tomsideguide
- @charlietlamb
- @developersdigest
- @micahstairs
- @rhys-firecrawl
- @firecrawl-spring
- @devin-ai-integration
- @misza-one
- @madmikeross
- @rowinsg
- @Bortlesboat
- @dagecko
- @cokemine
- @paulonasc
Full Changelog: https://github.com/firecrawl/firecrawl/compare/v2.8.0...v2.9.0
Fetched April 11, 2026

