releases.shpreview
Home/Firecrawl
Firecrawl

Firecrawl

Enter a URL, describe what you want to track, and /monitor notifies your AI agent via webhook the moment pages or sites change. Use up to 90% fewer LLM tokens by only ingesting what actually changes.

Key Features:

  • Set a goal in plain English — Describe what to watch and /monitor configures the URLs, schema, and schedule for you
  • Up to 90% fewer LLM tokens — Your agent only ingests what changes on a page
  • Any cadence, with cost upfront — 5 minutes, hourly, daily, or custom cron schedule with estimated monthly cost shown upfront
  • Webhook or email delivery — Signed webhooks with custom headers or email with diff in body
  • Permalinks for every change — Diffs are first-class objects you can share or hand to another agent

Improvements

  • /parse endpoint — Upload local files (PDF, DOCX, DOC, ODT, RTF, XLSX, XLS, HTML) up to 50 MB and get back clean, LLM-ready Markdown, JSON, or a summary. Tables and reading order are preserved, with full Zero Data Retention support for enterprise plans. Available in JS, Python, Go, Rust, Java, .NET, PHP, Ruby, and Elixir SDKs.
  • Lockdown Mode — Set lockdown: true on /scrape to serve results exclusively from Firecrawl's index with zero outbound requests and zero data retention by default. Gated outbound paths include HTTP fetches, robots.txt, audio downloads, and media. Available in every SDK, the CLI (--lockdown), and MCP.
  • question format — Pass a natural-language prompt to /scrape and get a grounded, hallucination-free answer back in data.question. Runs on a managed model chain with automatic fallback, prompt-injection isolation via XML tagging and zero-width-space escaping, and up to 100x fewer tokens per call.
  • highlights format — Returns the exact sentences, code blocks, and table rows on a page that match your query. Consecutive sentences re-join into paragraphs, code lines wrap in fenced blocks with their original language, and table rows rebuild into Markdown tables with headers — all from the source page, using up to 100x fewer tokens per call.
  • video format — Added video to scrape formats. Returns a signed downloadable video URL for supported sites (e.g. YouTube), with cookie forwarding for authenticated downloads and explicit Lockdown gating.
  • /search domain filters — Added includeDomains and excludeDomains parameters to /search for scoping results to a specific set of sites.
  • /search feedback endpoint — Submit a rating on a search result with POST /v2/search/:jobId/feedback. Each accepted submission refunds 1 credit, capped per UTC day, with idempotent retries.
  • Custom robots.txt user agent — Added robotsUserAgent to crawl requests to evaluate robots.txt rules and crawl delays against a custom agent string, and a separate customRobotsAgent org flag independent from ignoreRobots. Available in JS, Python, and Java SDKs.
  • Official Go SDK — Added a first-party Go SDK for the v2 API, replacing the community module. Includes context-aware retry backoff and proper MapData.Links typing.
  • Ruby SDK — Added the official Firecrawl Ruby SDK v2 with full endpoint coverage and v2-native typing.
  • PHP SDK — Added the official PHP SDK with Laravel support, scrape/search/crawl/map/parse coverage, and a published firecrawl/firecrawl-sdk Composer package.
  • .NET SDK — Added the official .NET SDK with v2 API support, parse, and an firecrawl-sdk NuGet package.
  • Rust SDK v2 — The Rust SDK has been promoted to the official v2 SDK with parity across scrape, search, crawl, map, agent, and parse.
  • /interact suggestion — Calls to /scrape that pass an actions array now return a warning suggesting /interact for stateful browser automation.
  • PDF size cap — Raised the PDF upload size limit from 10 MB to 30 MB.
  • PDF page-processed billing — Updated PDF billing to reflect pages processed instead of raw page count.
  • Docker harness — Exposed HARNESS_STARTUP_TIMEOUT_MS through docker-compose for self-hosted users who need longer startup windows.
  • Elixir SDK — Added parse_file/3 to the Elixir SDK for the /parse endpoint.
  • JS SDK request timeout — Added an explicit request timeout option to the JS SDK to prevent hanging requests.

Fixes

  • Resolved multiple CVEs across the API and SDKs including axios, postcss, fast-xml-parser, protobufjs, follow-redirects, langsmith, lodash, fast-uri, and fast-xml-builder.
  • Fixed branding colors.secondary being incorrectly populated when the LLM omitted a value — secondary is now optional and is no longer applied as a default.
  • Fixed the Playwright service ignoring the caller's User-Agent request header.
  • Fixed screenshot signed URLs returning stale results from cache by forcing a cache miss when the signed URL has expired.
  • Fixed Lockdown requests being billed twice for ZDR by treating Lockdown as zero data retention by default.
  • Fixed proxy billing for cached scrapes incorrectly charging proxy credits when no proxy egress occurred.
  • Fixed YouTube transcript scripts running on audio-only scrapes and audio downloads not receiving CDP cookies.
  • Fixed html-to-md conversion service ignoring zero data retention.
  • Fixed a stack overflow in marked.parse when handling certain PDF outputs.
  • Fixed robotsUserAgent not being honored by the native link filter and not being included in JS SDK crawl payloads.
  • Fixed /v1 status endpoints returning 500 on non-UUID job IDs — now returns a proper 400.
  • Fixed empty actions: [] arrays being treated as actions in feature flags.
  • Fixed JS SDK watcher emitting duplicate events, leaking timeouts, and hanging start() on watcher timeouts.
  • Fixed Ruby SDK unwrapping of credit_usage data fields and defaulted skipTlsVerification to false.
  • Fixed missing negative-limit validation in Python, Java, and Go SDKs.
  • Fixed Java SDK accepting empty API keys and missing async lifecycle methods.
  • Fixed billing period timestamps, subscription lookups, and plan credit reporting.
  • Fixed crawl-backlog timeouts being unbounded — now capped at 48h.

API

  • Added POST /v2/parse for multipart file uploads up to 50 MB. Returns a standard Document. Disallowed scrape options on parse: changeTracking, screenshot, branding, actions, waitFor, location, mobile; proxy is restricted to auto or basic. Errors with PARSE_UNSUPPORTED_OPTIONS on disallowed input.
  • Added lockdown: boolean to /scrape. Cache misses return 404 with SCRAPE_LOCKDOWN_CACHE_MISS. Billing: +4 credits when lockdown is enabled, 1 credit on cache miss. Available across all SDKs.
  • Added question and highlights to /scrape formats, returning data.question and data.highlights respectively.
  • Added video to /scrape formats. Returns document.video as a signed URL. +4 credits per request. Unsupported URLs raise SCRAPE_VIDEO_UNSUPPORTED_URL; parse rejects the video format client- and server-side.
  • Added includeDomains and excludeDomains arrays on /v2/search for scoping results to specific domains.
  • Added POST /v2/search/:jobId/feedback for rating search results. Each accepted submission refunds 1 credit, capped per UTC day via SEARCH_FEEDBACK_DAILY_CAP_CREDITS, with idempotent retries returning alreadySubmitted: true. Feedback submissions older than SEARCH_FEEDBACK_MAX_AGE_SEC (default 120s) are rejected. Search billing is now ceil(results/10) * 2 credits, surfaced in responses.
  • Added robotsUserAgent to /v2/crawl crawlerOptions for custom-agent robots.txt evaluation. Gated behind the ignoreRobots org flag.
  • Added a separate customRobotsAgent org flag independent from ignoreRobots, so teams can ship custom user-agents without disabling robots.txt enforcement.
  • Migrated the ignoreRobots org flag from a boolean to a disabled / allowed / forced pattern. The legacy ignoreRobots: boolean request shape has been removed — clients must use the new flag values.
  • Deprecated /v0/scrape, /v0/crawl, /v0/crawl/status/:jobId, DELETE /v0/crawl/cancel/:jobId, /v0/search, /v1/extract, /v1/extract/:jobId, /v2/extract, /v2/extract/:jobId, /v1/deep-research, /v1/deep-research/:jobId, /v1/llmstxt, and /v1/llmstxt/:jobId. Deprecated endpoints emit Deprecation: true, Warning: 299 - "<message>", Link; rel="successor-version", and (when configured) Sunset headers, plus warnings[] and replacement in the JSON body. JS and Python SDKs surface these to clients.

Full Changelog: https://github.com/firecrawl/firecrawl/compare/v2.9.0...v2.10

Firecrawl v2.10 ships a new /parse endpoint, Lockdown Mode, Question and Highlights formats, and four new official SDKs (Go, Ruby, PHP, .NET) plus reliability and security fixes.

Key Features:

  • /parse endpoint — Upload PDFs, Word docs, and spreadsheets up to 50 MB and get clean, LLM-ready Markdown, JSON, or summaries back. Powered by a new Rust-based engine that's up to 5x faster
  • Lockdown Mode — Set lockdown: true on /scrape to serve results exclusively from Firecrawl's index with no outbound requests and zero data retention by default
  • Question Format — Pass a natural-language prompt to /scrape and get a grounded answer back, with up to 100x fewer tokens per call
  • Highlights Format — Get back the exact sentences, code blocks, and table rows on a page that match your query, with original formatting preserved
  • Four New Official SDKs — Go, Ruby, PHP (with Laravel support), and .NET all joined the SDK family with v2 parity. The Rust SDK has been promoted to the official v2 SDK

Firecrawl v2.10 ships a new /parse endpoint, Lockdown Mode, Question and Highlights formats, and four new official SDKs (Go, Ruby, PHP, .NET) - plus a long list of reliability and security fixes.

Highlights
  • /parse endpoint — Upload PDFs, Word docs, and spreadsheets up to 50 MB and get clean, LLM-ready Markdown, JSON, or summaries back. Powered by a new Rust-based engine that's up to 5x faster.
  • Lockdown Mode — Set lockdown: true on /scrape to serve results exclusively from Firecrawl's index with no outbound requests and zero data retention by default. Available everywhere, including the CLI (--lockdown) and MCP.
  • Question Format — Pass a natural-language prompt to /scrape and get a grounded answer back, with up to 100x fewer tokens per call.
  • Highlights Format — Get back the exact sentences, code blocks, and table rows on a page that match your query, with original formatting preserved — also using up to 100x fewer tokens per call.
  • Four New Official SDKs — Go, Ruby, PHP (with Laravel support), and .NET all joined the SDK family with v2 parity. The Rust SDK has been promoted to the official v2 SDK.

Highlights is a new format for /scrape that returns the exact sentences, code blocks, and table rows on a page that match your query, all while using up to 100x fewer tokens.

Highlights
  • Citable, hallucination-free output — Nothing in the response is rewritten, translated, or hallucinated. Every sentence is provably from the source page, in the page's own words.
  • Code blocks and tables preserved — Consecutive sentences from the same block re-join into paragraphs, consecutive code lines wrap in fenced blocks with their original language, and table rows rebuild into Markdown tables with headers auto-included.
  • Up to 100x fewer tokens per call — Returning just the matching lines instead of the full page lowers inference costs, speeds up responses, and keeps your context window lean.

Question is a format for /scrape that returns high-quality, grounded answers from any web page using up to 100x fewer tokens.

Highlights
  • High-quality, grounded answersquestion pulls the page content most relevant to your prompt and answers strictly from it, with zero hallucinations.
  • Up to 100x fewer tokens per callquestion returns just the answer, not the page, giving you significantly lower inference costs, faster responses, and a leaner agent context window on every request.
  • Built for AI agents — Skip the scrape-parse-prompt pipeline. Drop precise, page-grounded answers straight into agent loops with a single call.
  • Fully managed LLM stackquestion runs on a managed model chain with automatic fallback and a production-tuned system prompt. Token usage and cost roll into the same billing surface as /scrape.
  • Hardened against prompt injection — Page content is isolated with XML tagging and zero-width-space escaping, and the model is instructed to ignore any instructions embedded in the page.

Lockdown Mode is a cache-only option for /scrape that keeps security-sensitive requests inside Firecrawl. Set lockdown: true to serve results exclusively from Firecrawl's index, with zero data retention by default.

Highlights
  • No outbound request - Lockdown serves results from Firecrawl's index only and gates every outbound path, including HTTP and robots.txt.
  • Zero data retention by default - URLs aren't persisted, response data isn't stored, and the scrape job is cleaned up after delivery.
  • One flag, every surface - lockdown: true works the same across the API, every SDK (Python, Node, Go, Rust, Java, .NET, Ruby, PHP, Elixir), the CLI (--lockdown), and MCP.

The /parse endpoint turns documents into clean, structured data for AI agents and RAG pipelines. Powered by a new Rust-based engine that's up to 5x faster, it works across PDFs, Word docs, spreadsheets, and more.

Highlights
  • Clean, LLM-ready output — Get back Markdown, JSON, or a summary, with tables and reading order preserved. No post-processing required.
  • Rust-based engine — A high-performance Rust core delivers up to 5x faster parsing, cutting latency in document ingestion and embedding workflows.
  • Zero Data Retention support — Enterprise plans with ZDR enabled ensure parsed output is never stored, so data from contracts, medical records, and internal reports stays secure.
  • Upload files up to 50 MB — Supports PDF, DOCX, DOC, ODT, RTF, XLSX, XLS, and HTML.

Firecrawl web-agent is an open framework for building AI agents that search, scrape, and interact with the web. Powered by the same architecture behind our /agent endpoint.

Highlights
  • Bring any model — Anthropic, OpenAI, Google, or your own. You control the logic, tools, and infra.
  • One command, full stack$ firecrawl create agent gives you /scrape, /search, and /interact in a plan-act loop, parallel sub-agents for concurrent research, and your choice of Streaming UI, API server, or library templates.
  • Teachable by design — Add Skill playbooks and your agent learns reusable routines. Paginate e-commerce sites, run multi-source research, and extract structured data your way.

Fire-PDF is a Rust-based parsing engine that converts any PDF - scanned, text-based, or mixed - into structured markdown, up to 5x faster.

Highlights
  • 5x Faster — Our open-source Rust library pdf-inspector classifies each page in milliseconds and picks the fastest extraction path. Pages are processed in under 400ms on average.
  • Layout-Aware Accuracy — A neural document layout model detects tables, formulas, text blocks, and headers individually. Tables get full markdown output, formulas are preserved in LaTeX, and reading order is predicted neurally.
  • Zero Configuration — Every PDF sent through Firecrawl's API now goes through Fire-PDF automatically.

Firecrawl v2.9.0

Improvements

  • Browser Interaction via /interact endpoint — Scrape a page, then call /interact to take actions on it — click buttons, fill forms, navigate deeper, or extract dynamic content. Describe what you want in natural language via prompt, or write Playwright code (Node.js, Python) and Bash (agent-browser) for full control. Sessions persist across calls, with live view and interactive live view URLs for real-time browser streaming. Persistent profiles let you save and reuse browser state (cookies, localStorage) across scrapes. Available in JS, Python, Java, and Rust SDKs.
  • query format — Added query format to the /scrape endpoint — pass a natural-language prompt and get a direct answer back in data.answer.
  • audio format — Added audio format option to scrape responses, returning audio output as a field on the document.
  • onlyCleanContent parameter — Added onlyCleanContent parameter to the /scrape endpoint, which strips navigation, ads, cookie banners, and other non-semantic content from markdown output.
  • PDF parsing modes — Added PDF parsing modes (fast, auto, ocr) and a maxPages option to control extraction depth and OCR behavior.
  • Java and Elixir SDKs — Added official Java and Elixir SDKs with full v2 API support.
  • Legacy .doc file support — Added support for parsing legacy .doc files.
  • Wikimedia engine — Added a dedicated engine for scraping Wikipedia and Wikimedia pages with improved output quality.
  • contentType in scrape responses — Added contentType to scrape responses for PDFs and documents.
  • PDF pipeline improvements — Improved PDF pipeline with better table detection, header/footer stripping, mixed PDF handling, inline image parsing, and magic byte detection.
  • Branding extraction — Improved branding extraction to skip hidden DOM elements for cleaner output.
  • HTML-to-markdown performance — Improved HTML-to-markdown conversion performance and fixed code blocks losing content during conversion.
  • Concurrency queue — New concurrency queue system with reconciler and backfill for more reliable job scheduling.
  • Rust SDK v2 — Added v2 API namespace with agent support to the Rust SDK.
  • Fixed Python SDK parameters timeout, max_retries, and backoff_factor — these were previously accepted but silently ignored.
  • Capped job timeouts at 48 hours to prevent runaway jobs from consuming resources.
  • Added retry limits to prevent scrape loops.
  • Binary content types are now rejected early in the scrape pipeline to avoid wasted processing.

Fixes

  • Fixed empty responses when using the o3-mini model on extract jobs.
  • Fixed revoked API keys remaining valid for up to 10 minutes after deletion.
  • Fixed a race condition in extract jobs that caused "Job not found" crashes.
  • Fixed time_taken in /v1/map always returning ~0.
  • Fixed crawl status responses now surfacing a failed status with an error message and partial data when a crawl-level failure occurs.
  • Fixed maxPages not being passed to the PDF extractor — previously, full PDF content was returned while only charging for the limited page count.
  • Fixed free request credits being incorrectly consumed and billed on agent jobs exceeding the maxCredits threshold.
  • Fixed dashboard displaying incorrect concurrency limits due to stale reads.
  • Fixed branding colors.secondary not being populated.
  • Fixed removeBase64Images running after deriveDiff in the transformer pipeline, causing diff issues.
  • Fixed GCS fetch using wrong row index for cache info lookups.
  • Fixed unhandled ZodError in /v1/search controller.
  • Resolved multiple CVEs across dependencies including handlebars, path-to-regexp, fast-xml-parser, rollup (CVE-2026-27606), undici, and others.
  • Hardened the Playwright service against SSRF attacks.

API

  • Added GET /v2/team/activity endpoint for listing recent scrape, crawl, and extract jobs with cursor-based pagination (last 24 hours, up to 100 results per page, filterable by endpoint type).
  • Added regexOnFullURL parameter on crawl requests to apply includePaths/excludePaths filtering against the full URL including query parameters. Available in JS, Python, Java, and Elixir SDKs.
  • Added deduplicateSimilarURLs parameter on crawl requests. Available in JS, Python, Java, and Elixir SDKs.
  • Deprecated the extract endpoint — use the /agent endpoint instead. Existing extract methods in JS and Python SDKs are marked deprecated.
  • Renamed persistentSession to profile on browser/interact requests (writeMode is now saveChanges). The old parameter name remains functional but is no longer documented.

New Contributors

Contributors


Full Changelog: https://github.com/firecrawl/firecrawl/compare/v2.8.0...v2.9.0

Firecrawl v2.9.0 includes browser interaction via /interact, new scrape formats, smarter PDF handling, two new SDKs, and reliability fixes.

Key Features:

  • Browser Interaction via /interact — Scrape a page, then call /interact to click buttons, fill forms, navigate, or extract dynamic content using natural language or Playwright/Bash code. Sessions persist across calls with live view URLs and reusable browser profiles.
  • Query Format — Pass natural-language prompts to /scrape and get direct answers in data.answer.
  • Audio Format — Request audio output from any scrape as a field on the document.
  • onlyCleanContent Parameter — Strip navigation, ads, and non-semantic content from markdown output.
  • PDF Parsing Modes — Choose fast, auto, or ocr parsing with maxPages option for fine-grained extraction control.
  • Java & Elixir SDKs — Official SDKs with full v2 API support, joining JS, Python, Go, and Rust.

Introduce the new /interact endpoint that turns any scrape into a live browser session where agents can click, type, and navigate using natural language.

Key Features:

  • Natural Language Control — Describe what you want in plain English; the agent clicks, types, scrolls, and extracts data automatically without selectors or scripts.
  • Live Browser Sessions — Every session includes a live URL you can embed, share, or interact with in real time for debugging and demos.
  • Persistent Profiles — Log in once and pick up where you left off with cookies and localStorage carrying across scrapes with named profiles.
  • Full Playwright Control — Switch to code mode and run Playwright (Node.js or Python) or Bash for precision control.
  • Session Reuse — Chain multiple interact calls on the same scrape with the browser maintaining state between calls for complex multi-step workflows.

Full support for core endpoints including scrape, search, and crawl. Works with Maven, Gradle, and Java 17+.

Key Features:

  • Maven & Gradle Ready — Drop into any Java project via JitPack with standard dependency management.
  • Java 17+ Support — Built for modern Java environments.
  • Core Endpoint Coverage — Scrape, search, crawl, map, and agent endpoints all supported.

New PDF parsing engine delivers 3x faster parsing and significantly improved reliability. Rebuilt in Rust, it automatically adapts to any PDF from clean text files to scanned reports and complex layouts.

Key Features:

  • Rust-Based Parser — High-performance engine built in Rust delivers up to 3x faster parsing, reducing latency in data ingestion and embedding workflows.
  • Three Parsing Modes:
    • fast — text-only parsing for maximum performance.
    • auto — new default; starts in fast mode and automatically falls back to OCR when needed, intelligently detecting edge cases like embedded images, graphs, multi-column layouts, and unusual text encodings.
    • ocr — forces OCR parsing for fully image-based or scanned documents.
  • Built for Production Reliability — Extensively tested across thousands of real-world PDFs for consistent, accurate extraction.

Browser Sandbox gives agents a secure, fully managed browser environment for interactive web automation with no local setup, Chromium installs, or driver compatibility issues. Each session runs in an isolated, disposable sandbox that scales without infrastructure management.

Key Features:

  • Browser Sandbox — Launch secure, isolated browser sessions with Python, JavaScript, and bash execution. Pre-installed with agent-browser CLI and Playwright.
  • Multi-Language Support — Execute Python, JavaScript, or bash code remotely via API, CLI, or SDK with instant results.
  • agent-browser Integration — Pre-installed CLI with 40+ commands for AI agents to write simple bash commands instead of complex Playwright code.
  • Live View & CDP Access — Watch sessions in real time via embeddable stream URL or connect own Playwright instance over WebSocket.
  • Session Management — Configurable TTL controls, parallel sessions (up to 20 concurrent), and automatic cleanup. 2 credits per browser minute with 5 minutes free.

Significantly improved logo extraction accuracy for Branding Format v2, the endpoint for extracting brand identities from websites.

Key Features:

  • Significantly improved logo detection — More reliable logo extraction with fewer false positives and better handling of edge cases like logos embedded in background images.
  • Works with modern site builders — Branding Format now properly detects logos built with Wix, Framer, and other drag-and-drop platforms generating complex or non-semantic HTML.
  • Built for AI agents and developers — Captures colors, typography, spacing, and UI components in structured format to power AI agents and apps.

Firecrawl v2.8.0 is here!

Firecrawl v2.8.0 brings major improvements to agent workflows, developer tooling, and self-hosted deployments across the API and SDKs, including our new Skill.

  • Parallel Agents for running thousands of /agent queries simultaneously, powered by our new Spark 1 Fast model.
  • Firecrawl CLI with full support for scrape, search, crawl, and map commands.
  • Firecrawl Skill for enabling AI agents (Claude Code, Codex, OpenCode) to use Firecrawl autonomously.
  • Three new models powering /agent: Spark 1 Fast for instant retrieval (currently only available in Playground), Spark 1 Mini for complex research queries, and Spark 1 Pro for advanced extraction tasks.
  • Agent enhancements including webhooks, model selection, and new MCP Server tools.
  • Platform-wide performance improvements including faster search execution and optimized Redis calls.
  • SDK improvements including Zod v4 compatibility.

And much more, check it out below!

New Features

  • Parallel Agents
    Execute thousands of /agent queries in parallel with automatic failure handling and intelligent waterfall execution. Powered by Spark 1-Fast for instant retrieval, automatically upgrading to Spark 1 Mini for complex queries requiring full research.

  • Firecrawl CLI
    New command-line interface for Firecrawl with full support for scrape, search, crawl, and map commands. Install with npm install -g firecrawl-cli.

  • Firecrawl Skill
    Enables agents like Claude Cursor, Codex, and OpenCode to use Firecrawl for web scraping and data extraction, installable via npx skills add firecrawl/cli.

  • Spark Model Family
    Three new models powering /agent: Spark 1 Fast for instant retrieval (currently available in Playground), Spark 1 Mini (default) for everyday extraction tasks at 60% lower cost, and Spark 1 Pro for complex multi-domain research requiring maximum accuracy. Spark 1 Pro achieves ~50% recall while Mini delivers ~40% recall, both significantly outperforming tools costing 4-7x more per task.

  • Firecrawl MCP Server Agent Tools
    New firecrawl_agent and firecrawl_agent_status tools for autonomous web data gathering via MCP-enabled agents.

  • Agent Webhooks
    Agent endpoint now supports webhooks for real-time notifications on job completion and progress.

  • Agent Model Selection
    Agent endpoint now accepts a model parameter and includes model info in status responses.

  • Multi-Arch Docker Images
    Self-hosted deployments now support linux/arm64 architecture in addition to amd64.

  • Sitemap-Only Crawl Mode
    New crawl option to exclusively use sitemap URLs without following links.

  • ignoreCache Map Parameter
    New option to bypass cached results when mapping URLs.

  • Custom Headers for /map
    Map endpoint now supports custom request headers.

  • Background Image Extraction
    Scraper now extracts background images from CSS styles.

  • Improved Error Messages
    All user-facing error messages now include detailed explanations to help diagnose issues.


API Improvements

  • Search without concurrency limits — scrapes in search now execute directly without queue overhead.
  • Return 400 for unsupported actions with clear errors when requested actions aren't supported by available engines.
  • Job ID now included in search metadata for easier tracking.
  • Metadata responses now include detected timezone.
  • Backfill metadata title from og:title or twitter:title when missing.
  • Preserve gid parameter when rewriting Google Sheets URLs.
  • Fixed v2 path in batch scrape status pagination.
  • Validate team ownership when appending to existing crawls.
  • Screenshots with custom viewport or quality settings now bypass cache.
  • Optimized Redis calls across endpoints.
  • Reduced excessive robots.txt fetching and parsing.
  • Minimum request timeout parameter now configurable.

SDK Improvements

JavaScript SDK
  • Zod v4 Compatibility — schema conversion now works with Zod v4 with improved error detection.
  • Watcher ExportsWatcher and WatcherOptions now exported from the SDK entrypoint.
  • Agent Webhook Support — new webhook options for agent calls.
  • Error Retry Polling — SDK retries polling after transient errors.
  • Job ID in Exceptions — error exceptions now include jobId for debugging.
Python SDK
  • Manual pagination helpers for iterating through results.
  • Agent webhook support added to agent client.
  • Agent endpoint now accepts model selection parameter.
  • Metadata now includes concurrency limit information.
  • Fixed max_pages handling in crawl requests.

Dashboard Improvements

  • Dark mode is now supported.
  • On the usage page, you can now view credit usage broken down by day.
  • On the activity logs page, you can now filter by the API key that was used.
  • The "images" output format is now supported in the Playground.
  • All admins can now manage their team's subscriptions.

Quality & Performance

  • Skip markdown conversion checks for large HTML documents.
  • Export Google Docs as HTML instead of PDF for improved performance.
  • Improved branding format with better logo detection and error messages for PDFs and documents.
  • Improved lopdf metadata loading performance.
  • Updated html-to-markdown module with multiple bug fixes.
  • Increased markdown service body limit and added request ID logging.
  • Better Sentry filtering for cancelled jobs and engine errors.
  • Fixed extract race conditions and RabbitMQ poison pill handling.
  • Centralized Firecrawl configuration across the codebase.
  • Multiple security vulnerability fixes, including CVE-2025-59466 and lodash prototype pollution.

Self-Hosted Improvements

  • CLI custom API URL support via firecrawl --api-url http://localhost:3002 for local instances.
  • ARM64 Docker support via multi-arch images for Apple Silicon and ARM servers.
  • Fixed docker-compose database credentials out of the box.
  • Fixed Playwright service startup caused by Chromium path issues.
  • Updated Node.js to major version 22 instead of a pinned minor.
  • Added RabbitMQ health check endpoint.
  • Fixed PostgreSQL port exposure in docker-compose.

New Contributors


Full Changelog: https://github.com/firecrawl/firecrawl/compare/v2.7.0...v2.8.0

What's Changed

Firecrawl v2.8.0 brings major improvements to agent workflows, developer tooling, and self-hosted deployments across the API and SDKs.

Key Features:

  • Parallel Agents — Execute thousands of /agent queries simultaneously with automatic failure handling and intelligent waterfall execution. Powered by Spark 1 Fast for instant retrieval, automatically upgrading to Spark 1 Mini for complex queries.
  • Firecrawl Skill — Enables agents to use Firecrawl for web scraping and data extraction.
  • Firecrawl CLI — Command-line interface with full scrape, search, crawl, and map support.
  • Spark Model Family — Three new models: Spark 1 Fast for instant retrieval, Spark 1 Mini for complex research queries, and Spark 1 Pro for advanced extraction tasks.
  • Agent Enhancements — Webhook support, model selection, and new MCP Server tools for autonomous web data gathering.

Bringing parallel processing to /agent, letting you batch hundreds or thousands of queries simultaneously. What took hours of sequential queries now completes in minutes with automatic failure handling and parallel execution.

Key Features:

  • Parallel Batch Processing — Run thousands of /agent queries simultaneously to enrich companies, research competitors, or build datasets at scale.
  • Intelligent Waterfall — Tries instant retrieval first, then automatically upgrades specific cells to full agent research (Spark One Mini) only when needed.
  • Real-Time Spreadsheet Interface — Work in familiar CSV format with instant visual feedback as cells populate in real-time.
  • Zero Configuration — Input data schema, write one prompt, hit run without workflow building.
  • Predictable Pricing — 10 credits per cell with Spark-1 Fast.
Last Checked
7h ago
Tracking since Aug 29, 2024