Firecrawl
Apr 10, 2026

Firecrawl v2.9.0

Improvements

  • Browser Interaction via /interact endpoint — Scrape a page, then call /interact to take actions on it — click buttons, fill forms, navigate deeper, or extract dynamic content. Describe what you want in natural language via prompt, or write Playwright code (Node.js, Python) and Bash (agent-browser) for full control. Sessions persist across calls, with live view and interactive live view URLs for real-time browser streaming. Persistent profiles let you save and reuse browser state (cookies, localStorage) across scrapes. Available in JS, Python, Java, and Rust SDKs.
  • query format — Added query format to the /scrape endpoint — pass a natural-language prompt and get a direct answer back in data.answer.
  • audio format — Added audio format option to scrape responses, returning audio output as a field on the document.
  • onlyCleanContent parameter — Added onlyCleanContent parameter to the /scrape endpoint, which strips navigation, ads, cookie banners, and other non-semantic content from markdown output.
  • PDF parsing modes — Added PDF parsing modes (fast, auto, ocr) and a maxPages option to control extraction depth and OCR behavior.
  • Java and Elixir SDKs — Added official Java and Elixir SDKs with full v2 API support.
  • Legacy .doc file support — Added support for parsing legacy .doc files.
  • Wikimedia engine — Added a dedicated engine for scraping Wikipedia and Wikimedia pages with improved output quality.
  • contentType in scrape responses — Added contentType to scrape responses for PDFs and documents.
  • PDF pipeline improvements — Improved PDF pipeline with better table detection, header/footer stripping, mixed PDF handling, inline image parsing, and magic byte detection.
  • Branding extraction — Improved branding extraction to skip hidden DOM elements for cleaner output.
  • HTML-to-markdown performance — Improved HTML-to-markdown conversion performance and fixed code blocks losing content during conversion.
  • Concurrency queue — New concurrency queue system with reconciler and backfill for more reliable job scheduling.
  • Rust SDK v2 — Added v2 API namespace with agent support to the Rust SDK.
  • Fixed Python SDK parameters timeout, max_retries, and backoff_factor — these were previously accepted but silently ignored.
  • Capped job timeouts at 48 hours to prevent runaway jobs from consuming resources.
  • Added retry limits to prevent scrape loops.
  • Binary content types are now rejected early in the scrape pipeline to avoid wasted processing.
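As a rough illustration, several of the new scrape options above might combine into a single request body like this. The field names follow the changelog wording; the exact wire schema is an assumption, not taken from the API reference.

```python
# Hypothetical /scrape request-body builder for the v2.9.0 additions
# described above (field names are illustrative, not authoritative).
def build_scrape_request(url, question=None, clean_only=False,
                         pdf_mode=None, max_pages=None):
    body = {"url": url, "formats": ["markdown"]}
    if question is not None:
        # query format: natural-language prompt, answered in data.answer
        body["formats"].append({"type": "query", "prompt": question})
    if clean_only:
        # strip navigation, ads, and cookie banners from markdown output
        body["onlyCleanContent"] = True
    if pdf_mode is not None or max_pages is not None:
        parser = {"type": "pdf"}
        if pdf_mode is not None:
            parser["mode"] = pdf_mode      # "fast" | "auto" | "ocr"
        if max_pages is not None:
            parser["maxPages"] = max_pages
        body["parsers"] = [parser]
    return body

req = build_scrape_request("https://example.com/report.pdf",
                           question="What is the total on page 1?",
                           clean_only=True, pdf_mode="ocr", max_pages=5)
```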

Fixes

  • Fixed empty responses when using the o3-mini model on extract jobs.
  • Fixed revoked API keys remaining valid for up to 10 minutes after deletion.
  • Fixed a race condition in extract jobs that caused "Job not found" crashes.
  • Fixed time_taken in /v1/map always returning ~0.
  • Crawl status responses now surface a failed status with an error message and partial data when a crawl-level failure occurs.
  • Fixed maxPages not being passed to the PDF extractor — previously, full PDF content was returned while only charging for the limited page count.
  • Fixed free request credits being incorrectly consumed and billed on agent jobs exceeding the maxCredits threshold.
  • Fixed dashboard displaying incorrect concurrency limits due to stale reads.
  • Fixed branding colors.secondary not being populated.
  • Fixed removeBase64Images running after deriveDiff in the transformer pipeline, causing diff issues.
  • Fixed GCS fetch using wrong row index for cache info lookups.
  • Fixed unhandled ZodError in /v1/search controller.
  • Resolved multiple CVEs across dependencies including handlebars, path-to-regexp, fast-xml-parser, rollup (CVE-2026-27606), undici, and others.
  • Hardened the Playwright service against SSRF attacks.

API

  • Added GET /v2/team/activity endpoint for listing recent scrape, crawl, and extract jobs with cursor-based pagination (last 24 hours, up to 100 results per page, filterable by endpoint type).
  • Added regexOnFullURL parameter on crawl requests to apply includePaths/excludePaths filtering against the full URL including query parameters. Available in JS, Python, Java, and Elixir SDKs.
  • Added deduplicateSimilarURLs parameter on crawl requests. Available in JS, Python, Java, and Elixir SDKs.
  • Deprecated the extract endpoint — use the /agent endpoint instead. Existing extract methods in JS and Python SDKs are marked deprecated.
  • Renamed persistentSession to profile on browser/interact requests (writeMode is now saveChanges). The old parameter name remains functional but is no longer documented.
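A crawl request using the new filtering parameters above might look like the following sketch. The parameter names follow the changelog; treat the exact schema and the `limit` field as assumptions.

```python
# Illustrative crawl request body. With regexOnFullURL enabled, the
# includePaths/excludePaths patterns match the FULL URL, query string
# included, rather than just the path component.
crawl_request = {
    "url": "https://example.com",
    "includePaths": [r"^https://example\.com/docs/.*\?lang=en$"],
    "regexOnFullURL": True,            # apply patterns to the full URL
    "deduplicateSimilarURLs": True,    # collapse near-duplicate URLs
    "limit": 100,                      # hypothetical safety cap
}
```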

Contributors

  • @nickscamara
  • @mogery
  • @amplitudesxd
  • @abimaelmartell
  • @ericciarla
  • @rafaelsideguide
  • @delong3
  • @devhims
  • @Chadha93
  • @tomsideguide
  • @charlietlamb
  • @developersdigest
  • @micahstairs
  • @rhys-firecrawl
  • @firecrawl-spring
  • @devin-ai-integration
  • @misza-one
  • @madmikeross
  • @rowinsg
  • @Bortlesboat
  • @dagecko
  • @cokemine
  • @paulonasc

Full Changelog: https://github.com/firecrawl/firecrawl/compare/v2.8.0...v2.9.0

v2.9.0 is live

Firecrawl v2.9.0 includes browser interaction via /interact, new scrape formats, smarter PDF handling, two new SDKs, and reliability fixes.

Key Features:

  • Browser Interaction via /interact — Scrape a page, then call /interact to click buttons, fill forms, navigate, or extract dynamic content using natural language or Playwright/Bash code. Sessions persist across calls with live view URLs and reusable browser profiles.
  • Query Format — Pass natural-language prompts to /scrape and get direct answers in data.answer.
  • Audio Format — Request audio output from any scrape as a field on the document.
  • onlyCleanContent Parameter — Strip navigation, ads, and non-semantic content from markdown output.
  • PDF Parsing Modes — Choose fast, auto, or ocr parsing with maxPages option for fine-grained extraction control.
  • Java & Elixir SDKs — Official SDKs with full v2 API support, joining JS, Python, Go, and Rust.
Mar 25, 2026

Introducing the new /interact endpoint, which turns any scrape into a live browser session where agents can click, type, and navigate using natural language.

Key Features:

  • Natural Language Control — Describe what you want in plain English; the agent clicks, types, scrolls, and extracts data automatically without selectors or scripts.
  • Live Browser Sessions — Every session includes a live URL you can embed, share, or interact with in real time for debugging and demos.
  • Persistent Profiles — Log in once and pick up where you left off; named profiles carry cookies and localStorage across scrapes.
  • Full Playwright Control — Switch to code mode and run Playwright (Node.js or Python) or Bash for precision control.
  • Session Reuse — Chain multiple interact calls on the same scrape with the browser maintaining state between calls for complex multi-step workflows.
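The call chaining described above could be sketched like this. Both request bodies are hypothetical, including the `"playwright-node"` language label; consult the API reference for the real /interact schema.

```python
# First /interact call: natural-language mode. The session persists,
# so a follow-up call continues from the resulting page state.
interact_prompt_call = {
    "prompt": "Click 'Load more' until every result is visible",
}

# Follow-up call on the same session: code mode with explicit
# Playwright for precise control (language label is illustrative).
interact_code_call = {
    "code": {
        "language": "playwright-node",
        "script": "await page.click('#export-csv');",
    },
}
```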
Mar 12, 2026

Full support for core endpoints including scrape, search, and crawl. Works with Maven, Gradle, and Java 17+.

Key Features:

  • Maven & Gradle Ready — Drop into any Java project via JitPack with standard dependency management.
  • Java 17+ Support — Built for modern Java environments.
  • Core Endpoint Coverage — Scrape, search, crawl, map, and agent endpoints all supported.
Feb 26, 2026

New PDF parsing engine delivers 3x faster parsing and significantly improved reliability. Rebuilt in Rust, it automatically adapts to any PDF from clean text files to scanned reports and complex layouts.

Key Features:

  • Rust-Based Parser — High-performance engine built in Rust delivers up to 3x faster parsing, reducing latency in data ingestion and embedding workflows.
  • Three Parsing Modes:
    • fast — text-only parsing for maximum performance.
    • auto — new default; starts in fast mode and automatically falls back to OCR when needed, intelligently detecting edge cases like embedded images, graphs, multi-column layouts, and unusual text encodings.
    • ocr — forces OCR parsing for fully image-based or scanned documents.
  • Built for Production Reliability — Extensively tested across thousands of real-world PDFs for consistent, accurate extraction.
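The mode selection described above amounts to a decision rule like this sketch. The rule is paraphrased from the bullets, not taken from the engine itself.

```python
def pick_pdf_mode(scanned: bool, speed_critical: bool) -> str:
    """Paraphrase of the three PDF parsing modes described above."""
    if scanned:
        return "ocr"    # force OCR for image-based or scanned documents
    if speed_critical:
        return "fast"   # text-only parsing, maximum performance
    return "auto"       # default: fast first, OCR fallback when needed
```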
Feb 17, 2026

Browser Sandbox gives agents a secure, fully managed browser environment for interactive web automation with no local setup, Chromium installs, or driver compatibility issues. Each session runs in an isolated, disposable sandbox that scales without infrastructure management.

Key Features:

  • Browser Sandbox — Launch secure, isolated browser sessions with Python, JavaScript, and bash execution. Pre-installed with agent-browser CLI and Playwright.
  • Multi-Language Support — Execute Python, JavaScript, or bash code remotely via API, CLI, or SDK with instant results.
  • agent-browser Integration — Pre-installed CLI with 40+ commands for AI agents to write simple bash commands instead of complex Playwright code.
  • Live View & CDP Access — Watch sessions in real time via an embeddable stream URL or connect your own Playwright instance over WebSocket.
  • Session Management — Configurable TTL controls, parallel sessions (up to 20 concurrent), and automatic cleanup. 2 credits per browser minute with 5 minutes free.
Feb 6, 2026

Significantly improved logo extraction accuracy for Branding Format v2, the endpoint for extracting brand identities from websites.

Key Features:

  • Significantly improved logo detection — More reliable logo extraction with fewer false positives and better handling of edge cases like logos embedded in background images.
  • Works with modern site builders — Branding Format now properly detects logos built with Wix, Framer, and other drag-and-drop platforms generating complex or non-semantic HTML.
  • Built for AI agents and developers — Captures colors, typography, spacing, and UI components in structured format to power AI agents and apps.
Feb 3, 2026

Firecrawl v2.8.0 is here!

Firecrawl v2.8.0 brings major improvements to agent workflows, developer tooling, and self-hosted deployments across the API and SDKs, including our new Skill.

  • Parallel Agents for running thousands of /agent queries simultaneously, powered by our new Spark 1 Fast model.
  • Firecrawl CLI with full support for scrape, search, crawl, and map commands.
  • Firecrawl Skill for enabling AI agents (Claude Code, Codex, OpenCode) to use Firecrawl autonomously.
  • Three new models powering /agent: Spark 1 Fast for instant retrieval (currently only available in Playground), Spark 1 Mini for complex research queries, and Spark 1 Pro for advanced extraction tasks.
  • Agent enhancements including webhooks, model selection, and new MCP Server tools.
  • Platform-wide performance improvements including faster search execution and optimized Redis calls.
  • SDK improvements including Zod v4 compatibility.

And much more; check it out below!

New Features

  • Parallel Agents
    Execute thousands of /agent queries in parallel with automatic failure handling and intelligent waterfall execution. Powered by Spark 1-Fast for instant retrieval, automatically upgrading to Spark 1 Mini for complex queries requiring full research.

  • Firecrawl CLI
    New command-line interface for Firecrawl with full support for scrape, search, crawl, and map commands. Install with npm install -g firecrawl-cli.

  • Firecrawl Skill
    Enables agents like Claude Code, Codex, and OpenCode to use Firecrawl for web scraping and data extraction, installable via npx skills add firecrawl/cli.

  • Spark Model Family
    Three new models powering /agent: Spark 1 Fast for instant retrieval (currently available in Playground), Spark 1 Mini (default) for everyday extraction tasks at 60% lower cost, and Spark 1 Pro for complex multi-domain research requiring maximum accuracy. Spark 1 Pro achieves ~50% recall while Mini delivers ~40% recall, both significantly outperforming tools costing 4-7x more per task.

  • Firecrawl MCP Server Agent Tools
    New firecrawl_agent and firecrawl_agent_status tools for autonomous web data gathering via MCP-enabled agents.

  • Agent Webhooks
    Agent endpoint now supports webhooks for real-time notifications on job completion and progress.

  • Agent Model Selection
    Agent endpoint now accepts a model parameter and includes model info in status responses.

  • Multi-Arch Docker Images
    Self-hosted deployments now support linux/arm64 architecture in addition to amd64.

  • Sitemap-Only Crawl Mode
    New crawl option to exclusively use sitemap URLs without following links.

  • ignoreCache Map Parameter
    New option to bypass cached results when mapping URLs.

  • Custom Headers for /map
    Map endpoint now supports custom request headers.

  • Background Image Extraction
    Scraper now extracts background images from CSS styles.

  • Improved Error Messages
    All user-facing error messages now include detailed explanations to help diagnose issues.


API Improvements

  • Search without concurrency limits — scrapes in search now execute directly without queue overhead.
  • Return 400 for unsupported actions with clear errors when requested actions aren't supported by available engines.
  • Job ID now included in search metadata for easier tracking.
  • Metadata responses now include detected timezone.
  • Backfill metadata title from og:title or twitter:title when missing.
  • Preserve gid parameter when rewriting Google Sheets URLs.
  • Fixed v2 path in batch scrape status pagination.
  • Validate team ownership when appending to existing crawls.
  • Screenshots with custom viewport or quality settings now bypass cache.
  • Optimized Redis calls across endpoints.
  • Reduced excessive robots.txt fetching and parsing.
  • Minimum request timeout parameter now configurable.

SDK Improvements

JavaScript SDK

  • Zod v4 Compatibility — schema conversion now works with Zod v4 with improved error detection.
  • Watcher Exports — Watcher and WatcherOptions now exported from the SDK entrypoint.
  • Agent Webhook Support — new webhook options for agent calls.
  • Error Retry Polling — SDK retries polling after transient errors.
  • Job ID in Exceptions — error exceptions now include jobId for debugging.

Python SDK

  • Manual pagination helpers for iterating through results.
  • Agent webhook support added to agent client.
  • Agent endpoint now accepts model selection parameter.
  • Metadata now includes concurrency limit information.
  • Fixed max_pages handling in crawl requests.

Dashboard Improvements

  • Dark mode is now supported.
  • On the usage page, you can now view credit usage broken down by day.
  • On the activity logs page, you can now filter by the API key that was used.
  • The "images" output format is now supported in the Playground.
  • All admins can now manage their team's subscriptions.

Quality & Performance

  • Skip markdown conversion checks for large HTML documents.
  • Export Google Docs as HTML instead of PDF for improved performance.
  • Improved branding format with better logo detection and error messages for PDFs and documents.
  • Improved lopdf metadata loading performance.
  • Updated html-to-markdown module with multiple bug fixes.
  • Increased markdown service body limit and added request ID logging.
  • Better Sentry filtering for cancelled jobs and engine errors.
  • Fixed extract race conditions and RabbitMQ poison pill handling.
  • Centralized Firecrawl configuration across the codebase.
  • Multiple security vulnerability fixes, including CVE-2025-59466 and lodash prototype pollution.

Self-Hosted Improvements

  • CLI custom API URL support via firecrawl --api-url http://localhost:3002 for local instances.
  • ARM64 Docker support via multi-arch images for Apple Silicon and ARM servers.
  • Fixed docker-compose database credentials out of the box.
  • Fixed Playwright service startup caused by Chromium path issues.
  • Updated Node.js to major version 22 instead of a pinned minor.
  • Added RabbitMQ health check endpoint.
  • Fixed PostgreSQL port exposure in docker-compose.

New Contributors

  • @gemyago
  • @loganaden
  • @pcgeek86
  • @dmlarionov

Full Changelog: https://github.com/firecrawl/firecrawl/compare/v2.7.0...v2.8.0

v2.8.0 is live

Firecrawl v2.8.0 brings major improvements to agent workflows, developer tooling, and self-hosted deployments across the API and SDKs.

Key Features:

  • Parallel Agents — Execute thousands of /agent queries simultaneously with automatic failure handling and intelligent waterfall execution. Powered by Spark 1 Fast for instant retrieval, automatically upgrading to Spark 1 Mini for complex queries.
  • Firecrawl Skill — Enables agents to use Firecrawl for web scraping and data extraction.
  • Firecrawl CLI — Command-line interface with full scrape, search, crawl, and map support.
  • Spark Model Family — Three new models: Spark 1 Fast for instant retrieval, Spark 1 Mini for complex research queries, and Spark 1 Pro for advanced extraction tasks.
  • Agent Enhancements — Webhook support, model selection, and new MCP Server tools for autonomous web data gathering.
Jan 30, 2026

Bringing parallel processing to /agent, letting you batch hundreds or thousands of queries simultaneously. What took hours of sequential queries now completes in minutes with automatic failure handling and parallel execution.

Key Features:

  • Parallel Batch Processing — Run thousands of /agent queries simultaneously to enrich companies, research competitors, or build datasets at scale.
  • Intelligent Waterfall — Tries instant retrieval first, then automatically upgrades specific cells to full agent research (Spark 1 Mini) only when needed.
  • Real-Time Spreadsheet Interface — Work in familiar CSV format with instant visual feedback as cells populate in real-time.
  • Zero Configuration — Input data schema, write one prompt, hit run without workflow building.
  • Predictable Pricing — 10 credits per cell with Spark 1 Fast.
Jan 27, 2026

Introducing the Firecrawl Skill and CLI, a new way for AI agents to reliably access real-time web data. With a single install, agents like Claude Code, Antigravity, and OpenCode can access Firecrawl endpoints including scrape, search, crawl, and map.

Key Features:

  • One-Command Install — Install the skill with a single command to teach agents how to authenticate and use all of Firecrawl's endpoints.
  • Real-Time Web Data at Runtime — Agents can pull fresh, full-page content from docs, product pages, pricing, and articles exactly when needed.
  • Context-Efficient for Agents — Uses a file-based approach for context management and bash methods for efficient search and retrieval.
  • Works Across Complex & Dynamic Sites — Powered by Firecrawl's custom browser stack for reliable extraction from large, JavaScript-heavy sites.
  • Proven, Best-in-Class Coverage — Backed by benchmark results showing >80% coverage across real-world evaluations.
Dec 5, 2025

Firecrawl v2.7.0 is here!

  • ZDR Search support for enterprise customers.
  • Improved Branding Format with better detection.
  • Partner Integrations API now in closed beta.
  • Faster and more accurate screenshots.
  • Self-hosted improvements.

And a lot more enhancements; check it out below!

New Features

  • Improved Branding Extract
    Better logo and color detection for more accurate brand extraction results.

  • NOQ Scrape System (Experimental)
    New scrape pipeline with improved stability and integrated concurrency checks.

  • Enhanced Redirect Handling
    URLs now resolve before mapping, with safer redirect-chain detection and new abort timeouts.

  • Enterprise Search Parameters
    New enterprise-level options available for the /search endpoint.

  • Integration-Based User Creation
    Users can now be automatically created when coming from referring integrations.

  • minAge Scrape Parameter
    Allows requiring a minimum cached age before re-scraping.

  • Extract Billing Credits
    Extract jobs now use the same credit billing system as other endpoints.

  • Self-Host: Configurable Crawl Concurrency
    Self-hosted deployments can now set custom concurrency limits.

  • Sentry Enhancements
    Added Vercel AI integration, configurable sampling rates, and improved exception filtering.

  • UUIDv7 IDs
    All new resources use lexicographically sortable UUIDv7.
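For context, a UUIDv7 leads with a 48-bit millisecond timestamp, which is exactly what makes the IDs lexicographically sortable by creation time. A minimal sketch per RFC 9562 follows; this is not Firecrawl's implementation.

```python
import os
import time

def uuid7_hex() -> str:
    """Build one RFC 9562 UUIDv7 as a 32-character hex string."""
    ts = int(time.time() * 1000) & ((1 << 48) - 1)               # 48-bit ms timestamp
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF        # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
    # layout: timestamp | version (7) | rand_a | variant (0b10) | rand_b
    value = (ts << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b
    return f"{value:032x}"

earlier = uuid7_hex()
time.sleep(0.002)      # a later timestamp yields a later sort position
later = uuid7_hex()
```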

API Improvements

  • DNS Resolution Errors Now Return 200 for more consistent failure handling.
  • Improved URL Mapping Logic including sitemap maxAge fixes, recursive sitemap support, Vue/Angular router normalization, and skipping subdomain logic for IP addresses.
  • Partial Results for Multi-Source Search instead of failing all sources.
  • Concurrency Metadata Added to scrape job responses.
  • Enhanced Metrics including total wait time, LLM usage, and format details.
  • Batch Scrape Upgrades
    • Added missing /v2/batch/scrape/:jobId/errors endpoint
    • Fixed pagination off-by-one bug
  • More Robust Error Handling for PDF/document engines, pydantic parsing, Zod validation, URL validation, and billing edge cases.

SDK Improvements

JavaScript SDK

  • Returns job ID from synchronous methods.
  • Improved WebSocket document event handling.
  • Fixed types, Deno WS, and added support for ignoreQueryParameter.
  • Version bump with internal cleanup.

Python SDK

  • Added extra metadata fields.
  • Improved batch validation handling.

Quality & Performance

  • Reduced log file size and improved tmp file cleanup.
  • Updated Express version and patched vulnerable packages.
  • Disabled markdown conversion for sitemap scrapes for improved performance.
  • Better precrawl logging and formatting.
  • Skip URL rewriting for published Google Docs.
  • Prevent empty cookie headers during webhook callbacks.

Self-Hosted Improvements

  • Disabled concurrency limit enforcement for self-hosted mode.
  • PostgreSQL credentials now configurable via environment variables.
  • Docker-compose build instructions fixed.

New Contributors

  • @omahs
  • @davidkhala
  • @DraPraks
  • @devhims

Full Changelog: https://github.com/firecrawl/firecrawl/compare/v2.6.0...v2.7.0

v2.7.0 is here!

Major release with enterprise features and platform improvements.

Key Features:

  • ZDR Search Support — Enterprise customers can now search with Zero Data Retention enabled end-to-end.
  • Partner Integrations API — Available in closed beta for native integrations in partner products.
  • Improved Branding Format — Better detection and support across all platforms.
  • Faster Screenshots — Enhanced viewport and full page screenshots with improved speed and accuracy.
  • Self-hosted Improvements — Significant enhancements for deployments and infrastructure.
  • Performance Enhancements — Platform-wide improvements for better user experience.
Nov 14, 2025

Highlights

  • Unified Billing Model - Credits and tokens merged into single system. Extract now uses credits (15 tokens = 1 credit), existing tokens work everywhere.
  • Full Release of Branding Format - Full support across Playground, MCP, JS and Python SDKs.
  • Change Tracking - Faster and more reliable detection of web page content updates.
  • Reliability and Speed Improvements - All endpoints significantly faster with improved reliability.
  • Instant Credit Purchases - Buy credit packs directly from dashboard without waiting for auto-recharge.
  • Improved Markdown Parsing - Enhanced markdown conversion and main content extraction accuracy.
  • Core Stability Fixes - Fixed change-tracking issues, PDF timeouts, and improved error handling.
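The stated conversion (15 tokens = 1 credit) implies a mapping like the following; rounding partial credits up is my assumption, not something documented above.

```python
import math

def tokens_to_credits(tokens: int) -> int:
    # 15 tokens = 1 credit per the unified billing model described above;
    # rounding up partial credits is an assumption.
    return math.ceil(tokens / 15)
```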

Full Changelog: https://github.com/firecrawl/firecrawl/compare/v2.5.0...v2.6.0

v2.6.0 available now

Major release with unified billing, enhanced features, and significant reliability improvements.

Key Features:

  • Unified Billing Model — Credits and tokens merged into single system. Extract now uses credits (15 tokens = 1 credit), existing tokens work everywhere.
  • Enhanced Branding Format — Full support across Playground, MCP, JS and Python SDKs.
  • Reliability and Speed Improvements — All endpoints significantly faster with improved reliability.
  • Instant Credit Purchases — Buy credit packs directly from dashboard without waiting for auto-recharge.
  • Improved Markdown Parsing — Enhanced markdown conversion and main content extraction accuracy.
  • Change Tracking — Faster and more reliable detection of web page content updates.
  • Core Stability Fixes — Fixed core stability issues, PDF timeouts, and improved error handling.
Oct 30, 2025
v2.5.0 - The World's Best Web Data API

We now have the highest quality and most comprehensive web data API available powered by our new semantic index and custom browser stack.

See the benchmarks below:

Benchmark chart: https://github.com/user-attachments/assets/96a2ba36-0c7f-4fa3-829e-d6ac91b53705

New Features

  • Implemented scraping for .xlsx (Excel) files.
  • Introduced new crawl architecture and NUQ concurrency tracking system.
  • Per-owner/group concurrency limiting + dynamic concurrency calculation.
  • Added group backlog handling and improved group operations.
  • Updated /search pricing.
  • Added team flag to skip country check.
  • Always populate NUQ metrics for improved observability.
  • New test-site app for improved CI testing.
  • Extract metadata from document head for richer output.

Enhancements & Improvements

  • Improved blocklist loading and unsupported site error messages.
  • Updated x402-express version.
  • Improved includePaths handling for subdomains.
  • Updated self-hosted search to use DuckDuckGo.
  • JS & Python SDKs no longer require API key for self-hosted deployments.
  • Python SDK timeout handling improvements.
  • Rust client now uses tracing instead of print.
  • Reduced noise in auto-recharge Slack notifications.

Fixes

  • Ensured crawl robots.txt warnings surface reliably.
  • Resolved concurrency deadlocks and duplicate job handling.
  • Fixed search country defaults and pricing logic bugs.
  • Fixed port conflicts in harness environments.
  • Fixed viewport dimension support and screenshot behavior in Playwright.
  • Resolved CI test flakiness (playwright cache, prod tests).

New Contributors

  • @delong3
  • @c4nc
  • @codetheweb

Full Changelog: https://github.com/firecrawl/firecrawl/compare/v2.4.0...v2.5.0

Oct 25, 2025
v2.5.0 - The World's Best Web Data API

Major release delivering the highest quality and most comprehensive web data API with two major infrastructure improvements: a new Semantic Index and a completely custom browser stack.

Key Features:

  • Semantic Index — New infrastructure improvement for better understanding and extraction of web content.
  • Custom Browser Stack — Completely redesigned browser infrastructure for improved reliability and performance.
  • Benchmark Results — Represents a significant leap forward in web data extraction quality and comprehensiveness.
  • Open-Sourced Benchmarks — Released scrape-evals, a reproducible framework for testing web scraping engines on 1,000 real URLs.
Oct 13, 2025

New Features

  • New PDF Search Category — You can now search for PDFs only via the /v2/search endpoint by specifying the .pdf category
  • Gemini 2.5 Flash CLI Image Editor — Create and edit images directly in the CLI using Firecrawl + Gemini 2.5 Flash integration (#2172)
  • x402 Search Endpoint (/v2/x402) — Added a next-gen search API with improved accuracy and speed (#2218)
  • RabbitMQ Event System — Firecrawl jobs now support event-based communication and prefetching from Postgres (#2230, #2233)
  • Improved Crawl Status API — More accurate and real-time crawl status reporting using the new crawl_status_2 RPC (#2239)
  • Low-Results & Robots.txt Warnings — Users now receive clear feedback when crawls are limited by robots.txt or yield few results (#2248)
  • Enhanced Tracing (OpenTelemetry) — Much-improved distributed tracing for better observability across services (#2219)
  • Metrics & Analytics — Added request-level metrics for both Scrape and Search endpoints (#2216)
  • Self-Hosted Webhook Support — Webhooks can now be delivered to private IP addresses for self-hosted environments (#2232)

Improvements

  • Reduced Docker Image Size — Playwright service image size reduced by 1 GB by only installing Chromium (#2210)
  • Python SDK Enhancements — Added "cancelled" job status handling and poll interval fixes (#2240, #2265)
  • Faster Node SDK Timeouts — Axios timeouts now propagate correctly, improving reliability under heavy loads (#2235)
  • Improved Crawl Parameter Previews — Enhanced prompts and validation for crawl parameter previews (#2220)
  • Zod Schema Validation — Stricter API parameter validation with rejection of extra fields (#2058)
  • Better Redis Job Handling — Fixed edge cases in getDoneJobsOrderedUntil for more stable Redis retrieval (#2258)
  • Markdown & YouTube Fixes — Fixed YouTube cache and empty markdown summary bugs (#2226, #2261)
  • Updated Docs & Metadata — README updates and new metadata fields added to the JS SDK (#2250, #2254)
  • Improved API Port Configuration — The API now respects environment-defined ports (#2209)

Fixes

  • Fixed recursive $ref schema validation edge cases (#2238)
  • Fixed enum arrays being incorrectly converted to objects (#2224)
  • Fixed harness timeouts and self-hosted docker-compose.yaml issues (#2242, #2252)

Full Changelog: https://github.com/firecrawl/firecrawl/compare/v2.3.0...v2.4.0

Firecrawl v2.5

Major release featuring:

  • Open-source scrape-evals benchmark testing 13 web scraping engines on 1,000 URLs.
  • Improved full-page extraction with an enhanced browser stack.
  • Semantic index for faster retrieval of fresh or previously indexed data.
  • 5x cheaper search with auto-recharge credit packs.
  • Smarter concurrency and crawl architecture for improved throughput and reliability.
  • Excel (.xlsx) scraping support for spreadsheets and CSV files.

Sep 19, 2025

New Features

  • YouTube Support: You can now get YouTube transcripts.
  • Enterprise Auto-Recharge: Added enterprise support for auto-recharge.
  • .odt and .rtf Support: Firecrawl now parses .odt and .rtf files.
  • Docx Parsing: 50x faster .docx parsing.
  • K8s Deployment: Added a NuQ worker deployment example.
  • Self-Host: Many improvements for self-hosted users.

Improvements & Fixes

  • Stability: Fixed timeout race condition, infinite scrape loop, and location query bug
  • Tooling: Replaced ts-prune with knip, updated pnpm with minimumReleaseAge
  • Docs: Added Rust to CONTRIBUTING and fixed typos
  • Security: Fixed pkgvuln issue

Full Changelog: https://github.com/firecrawl/firecrawl/compare/v2.2.0...v2.3.0
