
Neon Blog


Starting June 1, 2026, every Neon paid plan includes 500 GB of data transfer per month, up from 100 GB. The change is automatic and appears on your June invoice.

We're increasing the amount of public data transfer included in every Neon paid plan, from 100 GB to 500 GB per month. This completely removes data transfer (or "egress") charges for most workloads.

The change takes effect automatically on June 1, 2026, and applies to all paid plans. There's no action required, and the new allowance will appear on your June invoice.

Why we're doing this

Few things are more frustrating than an unexpected egress charge landing on your invoice. A chatty client, a backfill, a misconfigured connection, or a noisy analytics job can move more data than you expect, and the cost often shows up after the fact, when it's too late to do much about it.

Raising the included amount to 500 GB removes that surprise for most Neon customers.

If you do move more than 500 GB in a month, anything above the included amount incurs data transfer fees exactly as before. Nothing else about how data transfer is metered or priced is changing.

What you need to do

Nothing at all! The higher allowance applies automatically to every paid plan on June 1, 2026, and will be reflected in your June invoice.

If you want to see your own usage, you can track egress in the Neon Console under your billing and usage details. Questions about the change are welcome in the Neon community on Discord.

We're building the boring backend for apps and agents

Why Neon is expanding beyond Postgres into a branchable stack of backend primitives — auth, data API, object storage, compute, and an AI gateway — for the agentic era.

Everyone has been talking about throwing it all away and building entirely new magic sci-fi cloud infrastructure for agents.

Amidst all the hype, this tweet stood out to me as a voice of reason:

"The agent-native cloud needs boring primitives more than magic. Identity, permissions, logs, rollback, and cost controls before the sci-fi layer."

@rtheoryxyz

Building real infrastructure is hard enough as it is. AI has only raised the stakes, dialing up the operational requirements and pushing the limits in new and unexpected ways. When autonomous agents or developers move at breakneck speeds, applications break. "Magic" won't help you recover when a runaway agent deletes your production database and all its backups. Robust, familiar infrastructure with rollbacks, AI-friendly APIs and higher operational capacity is the way forward.

The Hard Requirements of the AI Era

When we founded Neon four years ago, the core principles laid out in our Hello World post were aimed at helping human developers move faster. As luck would have it, the AI era has shifted those exact principles into the "hard requirements" column:

  • Low entry cost: When code generation is free and instant, even a $5 upfront infrastructure cost is a non-starter.
  • Branching: Code has always had isolated environments, but the data stack lacked them. This created a massive gap in the ability to experiment safely.
  • Serverless: Infrastructure should live automatically in the background, scaling instantly to meet shifting usage demands. A backend shouldn't be t-shirt sized; it should precisely match what the application demands of it.

Human developers make mistakes (cue the Matrix meme: "Only human"). But AI coding agents make mistakes at a blistering, automated velocity that traditional infrastructure simply wasn't designed to handle. Without strict guardrails, agents will tear down systems just as quickly as they build them.

An Agentic Stack Built by Systems Engineers

Neon's serverless Postgres branching changed how developers work by ensuring every single database change could be validated in an isolated environment. At this point, we start tens of millions of branches every day. Now, we're taking the same copy-on-write, instant branching approach and applying it to the full suite of backend primitives today's agent stack requires.

The Complete Agent-Native Backend
  • Postgres Database — ✅ Available
  • Authentication — ✅ Available
  • Data API — ✅ Available
  • Object Storage — 🔜 Coming Soon
  • Compute — 🔜 Coming Soon
  • AI Gateway — 🔜 Coming Soon

Scaling with Enterprise Muscle

One year after joining Databricks, the benefits are showing on both sides. Lakebase, the same technology as Neon on Databricks, is the fastest-growing new offering in Databricks history. In turn, being part of a larger company has helped us grow our database team with world-class engineers, improve platform performance, lower costs, and now ship mature, battle-tested products to developers on Neon:

The AI Gateway, for example, already handles more than 125 trillion tokens a month, hardened by rigorous enterprise requirements for day-0 model coverage, high availability, deep metrics, logging, and granular cost controls.

To be clear: We are not shifting focus away from our core database product. Postgres remains the bedrock for everything we do. The Neon team has aggressively expanded within Databricks, and we've hired top-tier, senior engineering talent from other major database services. We are expanding our platform by building entirely new, dedicated teams while simultaneously growing our core Postgres engineering powerhouse.

We're building the boring infrastructure layer. Go build sci-fi.


FAQ

Does this mean you're focusing less on database?

No. The same storage and compute technology powers both Neon Serverless Postgres and Databricks Lakebase, so every improvement to the core engine benefits both products. Lakebase serves large enterprise customers; Neon serves startups, agent platforms and individual developers. Both are growing, and that growth funds a bigger systems engineering team, not a smaller one.

Today, around 120 engineers work across storage, compute, proxy, and Postgres itself, including upstream contributions. The new primitives (auth, object storage, compute, AI gateway) are built by new, dedicated teams. We're adding to the platform, not redirecting from it. To accelerate progress of the core database platform, we've brought in senior engineering talent from other major database and cloud services over the past year. The Postgres team is the largest it's ever been.

Are you building an entire cloud platform?

No. We're focused on the primitives that apps and agents need to function: database, authentication, data API, object storage, compute, and an AI gateway. These are the pieces where branching, instant provisioning, and scale-to-zero matter most. For everything else, you'll still want the tools you already use: front-end hosting (Vercel, Netlify), email (Resend), error tracking (Sentry), and so on. We're not trying to replace them.

Why AI Gateway?

The lines are starting to blur between applications and agents, but regardless of what we call them, the lifeblood of what everyone is building today is inference. We're bringing reliable, scalable inference directly to you when you build your backend in Neon.

We're not building this from scratch. Databricks already operates an AI gateway that handles trillions of requests a day for everyone from Fortune 500 enterprises to popular coding agents, with day-0 coverage of new models, rate limiting, logging, metrics, and cost controls.

When will the new primitives be available?

Authentication and the Data API are available today. Object Storage, Compute, and the AI Gateway are coming soon. If you want early access, sign up above and we'll reach out when each one is ready.

Will existing Neon projects need to change?

No. Your existing databases, branches, and connections keep working exactly as they do now. The new primitives are additive. Adopt the ones you need, ignore the ones you don't.

In the last year, agents have strained the limits of cloud infrastructure with new usage patterns: higher throughput of control-plane operations, more demand for on-demand infrastructure, and capacity crunch. The resulting spate of failures and incidents amongst cloud services has taught us lessons that inform our reliability roadmap...

We've managed to give customers up to 5x performance increase on write-heavy workloads by disabling full-page writes, a Postgres durability safety feature that is made redundant by Neon's own storage engine.

David Wein, Vlad Lazar

May 07, 2026

Note: This is a cross-post of an engineering blog post that was originally published on Databricks. Neon and Databricks Lakebase both run on the same technology, and this engineering optimization benefits customers of both platforms.

In Neon's lakebase architecture, compute and storage are separated by design. While this separation was originally built for operational flexibility, including scaling, branching, and instant recovery, it also unlocks a massive performance frontier.

By decoupling these layers, we can offload work from your Postgres compute to our distributed storage in ways that are structurally impossible in traditional, monolithic Postgres deployments. In this post, we will explore how we exploited this architectural advantage to eliminate a decade-old Postgres bottleneck to improve Postgres write throughput by 5x, while reducing read tail latencies by 2x and WAL traffic by 94%.

The hidden cost of traditional Postgres durability

To understand how we achieved a 5x improvement in managed Postgres performance, we have to look at how traditional Postgres handles durability.

In Postgres, every database change is first saved to a sequential log (the Write-Ahead Log, or WAL) to ensure data isn't lost in a crash. To keep crash recovery times fast, Postgres periodically performs a background cleanup event called a "checkpoint." Unlike a snapshot, a checkpoint is simply a milestone marker in the log. During a checkpoint, Postgres takes all the modified data currently in memory (managed in 8KB chunks called "pages") and flushes it to the main disk, up to a specific point in the log. If a crash happens, Postgres restores your data by starting at that checkpoint milestone and replaying the recent WAL logs over the disk.

However, there's a risk: if the server crashes exactly while saving an 8KB page to disk, the page might only get partially written, resulting in a corrupted "torn page." If Postgres tries to replay a tiny log update over a torn page, the data is permanently ruined. To fix this, Postgres has to ensure it never relies on a corrupted disk for recovery.

It does this using a "Full Page Write" (FPW). The very first time a page is modified after a checkpoint milestone, Postgres doesn't just log the tiny change; it copies the entire 8KB page into the WAL. If a crash happens and the disk page is torn, Postgres ignores the ruined disk, grabs the pristine 8KB backup from the WAL, and uses that as the perfect starting point to replay the rest of the logs. While this guarantees absolute safety, it is expensive: on write-heavy applications, logging entire 8KB pages can inflate log volume by up to 15x, often becoming the system's biggest performance bottleneck.
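To make the cost concrete, here is a minimal toy model of WAL volume with and without full page writes. It is a sketch, not how Postgres accounts for WAL: the function name `wal_bytes`, the 64-byte delta size, and the idea of a checkpoint every fixed number of updates are all assumptions made for illustration.

```python
PAGE_SIZE = 8192  # Postgres default page size in bytes


def wal_bytes(writes, checkpoint_every, delta_size=64, full_page_writes=True):
    """Return total WAL bytes for a sequence of small page updates (toy model).

    writes: iterable of page ids, one entry per small update.
    checkpoint_every: a checkpoint happens every this many updates (assumption).
    delta_size: assumed size in bytes of one small change record (hypothetical).
    """
    total = 0
    touched_since_checkpoint = set()
    for i, page in enumerate(writes):
        if i % checkpoint_every == 0:
            # A new checkpoint: every page counts as untouched again.
            touched_since_checkpoint.clear()
        if full_page_writes and page not in touched_since_checkpoint:
            # First touch after a checkpoint: log the whole page image.
            total += PAGE_SIZE
        else:
            # Later touches: log only the small change.
            total += delta_size
        touched_since_checkpoint.add(page)
    return total


# 1,000 small updates spread over 100 pages, with a checkpoint every 500 updates.
updates = [i % 100 for i in range(1000)]
with_fpw = wal_bytes(updates, checkpoint_every=500)
without_fpw = wal_bytes(updates, checkpoint_every=500, full_page_writes=False)
print(with_fpw, without_fpw, round(with_fpw / without_fpw, 1))
```

The inflation factor in this toy depends entirely on the assumed delta size and on how often pages are touched for the first time after a checkpoint, so it should not be read as a reproduction of the figure quoted above.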

Neon storage eliminates the risk of torn pages

In Neon your compute is stateless. It does not rely on a local data directory. Instead, it streams WAL to a Paxos-based quorum of safekeepers.

Because there is no local-disk page to tear, the failure mode FPW was designed to prevent simply does not exist. However, naively turning off FPW creates a secondary problem: read performance. Without those periodic full page images in the log, the storage layer would have to replay an arbitrarily long chain of small deltas to reconstruct a page for a read request. What was once a replay bounded by the checkpoint interval becomes an unbounded chain, leading to a spike in read latency and resource consumption.

Image generation pushdown to distributed storage

We solved this by moving the intelligence from the compute node to the storage layer. We call this image generation pushdown.

When Postgres compute requests a page from storage, the pageserver (a component of Neon's distributed storage system) reconstructs it by finding the most recent materialized image of that page and replaying any WAL deltas on top. The full page images that the compute used to embed in WAL doubled as periodic reset points in that delta chain, naturally keeping the chain reasonably bounded and reads fast. For a deeper treatment of this mechanism, see Deep dive into Neon storage engine.

With full page writes disabled, those reset points disappear. Without additional intelligence in the distributed storage system, a frequently updated page could accumulate a long chain of small deltas with no intervening image. The pageserver would then have to replay the entire chain to serve a read, increasing latency and resource consumption.

To avoid this problem we pushed down the image-generation responsibility from the compute's WAL stream into the storage layer, preserving the bounded read behavior of storage while still eliminating the WAL overhead on the compute. The pageserver now generates full page images when a page has accumulated more delta records than a configured threshold without an intervening image. This is a naturally better approach because the decision to generate a new image is based on the actual number of changes to a page rather than the unrelated Postgres checkpoint process.

Here's why this is significantly better for performance:

  1. Network efficiency: The compute sends only the compact deltas, which are the actual changes, leading to a 94% reduction in traffic in our benchmarks.
  2. Scalability: Work is moved from the single Postgres writer to the distributed, independently scalable storage layer. Image generation for a project branch is now shared across multiple pageservers in the background.
  3. Optimal reads: When images are generated is now based on actual changes to a page rather than the unrelated Postgres checkpoint process.
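The threshold idea can be sketched in a few lines. This is a toy model under stated assumptions, not Neon's pageserver: `ToyPageserver`, `PageHistory`, and `image_threshold` are hypothetical names, byte concatenation stands in for applying a WAL record, and the image is generated inline on ingest for brevity rather than in the background.

```python
from dataclasses import dataclass, field


@dataclass
class PageHistory:
    """Toy per-page history: one materialized image plus the deltas since."""
    image: bytes = b""
    deltas: list = field(default_factory=list)


class ToyPageserver:
    def __init__(self, image_threshold=3):
        self.image_threshold = image_threshold  # hypothetical tuning knob
        self.pages = {}
        self.images_generated = 0

    def ingest(self, page_id, delta):
        """Store a delta; materialize a new image once the chain is too long."""
        history = self.pages.setdefault(page_id, PageHistory())
        history.deltas.append(delta)
        if len(history.deltas) > self.image_threshold:
            history.image = self._replay(history)
            history.deltas.clear()
            self.images_generated += 1

    def read(self, page_id):
        """Reconstruct a page: latest image plus any deltas on top."""
        return self._replay(self.pages.get(page_id, PageHistory()))

    @staticmethod
    def _replay(history):
        page = history.image
        for delta in history.deltas:
            page += delta  # stand-in for applying one WAL record
        return page


ps = ToyPageserver(image_threshold=3)
for change in [b"a", b"b", b"c", b"d", b"e"]:
    ps.ingest(1, change)
print(ps.read(1), len(ps.pages[1].deltas), ps.images_generated)
```

The point of the sketch is that the length of the delta chain a read has to replay is capped by the threshold, regardless of when checkpoints happen on the compute.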

Quantifying the impact: from lab to production

We benchmarked this optimization using HammerDB TPROC-C (a TPC-C derived OLTP benchmark) and validated the results across real-world production workloads.

1. Serverless compute scaling

Throughput is measured in new orders per minute (NOPM). The gains scale dramatically with the size of the compute instance:

Compute size    Before (NOPM)    After (NOPM)    Throughput gain
4-vCPU          78,876           94,891          20%
16-vCPU         95,832           269,189         2.8x
32-vCPU         95,686           439,300         4.5x+

On a 32 vCPU compute, throughput rose to more than 4.5x the baseline.

With full page images generated on compute, each transaction generates 58 KB of WAL on average. With image generation pushed down, that drops to under 4 KB, a 94% reduction. The throughput improvement follows directly: less WAL means less contention on the write path, less network bandwidth consumed, and less work for the storage layer to ingest.

By removing Postgres's FPW bottleneck, we allowed throughput to scale linearly with compute resources. This is something monolithic Postgres struggles to do under heavy write load.
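As a quick sanity check on the figures above, the ratios can be recomputed from the reported numbers. The helper names below are ours, and the WAL sizes are taken at the rounded values quoted in the text.

```python
# NOPM before and after, as reported in the benchmark results above.
benchmarks = {
    "4-vCPU": (78_876, 94_891),
    "16-vCPU": (95_832, 269_189),
    "32-vCPU": (95_686, 439_300),
}


def throughput_ratio(before, after):
    """How many times higher the new throughput is than the old one."""
    return after / before


def reduction_percent(before, after):
    """Percentage by which a quantity shrank."""
    return (1 - after / before) * 100


for size, (before, after) in benchmarks.items():
    print(size, round(throughput_ratio(before, after), 2))

# Per-transaction WAL: 58 before, "under 4" after, so 4 is an upper bound.
print(round(reduction_percent(58, 4), 1))
```

The ratios come out at roughly 1.2x, 2.8x, and 4.6x. With 4 as the upper bound on the new WAL size, the reduction is at least 93%, which is in line with the reported 94%.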

2. Real-world production validation

In a production environment for a high-profile 56 vCPU project, enabling image pushdown reduced steady-state WAL generation from 30 MB/s to just 1 MB/s.

[Figure: Prod customer WAL rate (lower is better)]

This decrease in volume correlated directly to increased transaction throughput during daily peaks.

This did not just help writes. By optimizing the delta chains, the number of WAL records that must be applied per read dropped significantly. We saw p99 read latencies drop by 30% to 50% and p50 latencies drop by approximately 30%.

[Figure: Prod customer throughput (higher is better)]

Zooming out, at the regional level, post enablement we saw the total amount of WAL generated by computes drop by up to 4x. P99 latency of reads from the storage engine improved by up to 3x and became much more stable.

[Figure: Regional WAL ingest rate (lower is better)]

[Figure: Regional storage p99 page retrieval latency (lower is better)]

3. Synced tables

For data-intensive Synced Tables (a Databricks feature in which analytics tables are automatically synced to Postgres), the impact was immediate. One customer saw ingestion throughput jump from 17k rows per second to 62k rows per second, a more than 3.5x increase, simply by enabling image pushdown.

[Figure: Prod customer sync table (higher is better)]

FPWs seamlessly turned off for all databases

Since late March, we have rolled this out across our entire fleet. It is now active for all Neon databases globally.

The change was applied to running computes via our control plane and storage system, which coordinated the transition automatically. This was achieved using the existing Postgres XLOG_FPW_CHANGE WAL record mechanism, meaning no restarts or interruptions were required for our customers.

What's Next

Neon's lakebase architecture was built for flexibility, but it pays off in performance too. Pushing down full page writes is part of a systematic effort to harvest the benefits of storage and compute separation.

Just as we introduced cache prewarming for zero-downtime patching, we are continuing to move heavy-lifting tasks away from your transactions and into our scalable background storage stack. The Postgres write tax is officially a thing of the past.

Rest easy knowing you'll get alerted if your spend hits a certain level

Your team ships a new feature, traffic spikes, and autoscaling does its job. Great — until the bill arrives and it's three times what anyone expected. By then it's too late to do anything about it.

Most cloud providers handle this the same way: you find out what you spent after you've already spent it. Monitoring tools can help, but they live outside your database console, require separate setup, and still only tell you what happened — they don't give you a way to act on it.

We think cost controls should be built in, not bolted on. That's why we're introducing spending limits for Neon organizations.

What are spending limits?

A spending limit is a monthly dollar threshold you set for your organization. When your spend approaches or reaches that threshold, Neon takes action — today that means email alerts, and soon it will mean the option to automatically suspend project computes.

You set it once, and it works in the background. No external monitoring to configure, no third-party integrations to maintain.

How it works

Setting up a spending limit takes about 30 seconds:

  1. Navigate to your organization's Billing page in the Neon console.
  2. Find the Spending limit card and click Enable.
  3. Enter a monthly dollar amount.
  4. Choose what happens when the limit is reached — Send email alerts is available now, with Suspend projects coming soon.
  5. Click Enable, and you're done.

Once enabled, your Billing page shows a progress bar with your current spend relative to the limit. Neon checks your organization's spend every 15 minutes.

Approaching and exceeding the limit

When your organization reaches 80% of its spending limit, org admins receive an email alert. A second alert fires at 100%. In addition, a banner appears across the Neon console so that everyone on the team has visibility — not just whoever set up the limit.

The banner includes a direct link to adjust your limit, so you can react immediately without navigating through the billing page.
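The alerting rule described above is simple enough to sketch. How Neon implements the periodic check is not described here, so `alerts_to_send` and its arguments are hypothetical; the only facts taken from the text are the 80% and 100% thresholds. Amounts are in whole cents to avoid floating-point comparisons.

```python
ALERT_THRESHOLDS_PERCENT = (80, 100)  # from the text: alerts at 80% and 100%


def alerts_to_send(spend_cents, limit_cents, already_sent):
    """Return thresholds that current spend has crossed and not yet alerted on.

    already_sent: set of threshold percentages alerted earlier this cycle.
    """
    if limit_cents <= 0:
        raise ValueError("limit must be positive")
    return [
        pct
        for pct in ALERT_THRESHOLDS_PERCENT
        if spend_cents * 100 >= pct * limit_cents and pct not in already_sent
    ]


# Simulate a few periodic checks against a $100.00 limit.
sent = set()
for spend in (50_00, 79_99, 80_00, 95_00, 120_00):
    due = alerts_to_send(spend, 100_00, sent)
    sent.update(due)
    print(spend, due)
```

Tracking which thresholds have already fired keeps each alert to one email per billing cycle in this sketch, even though the check runs repeatedly.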

Editing or disabling

Org admins can edit the dollar amount or disable the spending limit entirely at any time from the same Spending limit card. Changes take effect on the next 15-minute check cycle.

What's coming next

Today, spending limits notify you. Soon, they'll be able to enforce.

The Suspend projects option is already visible in the setup dialog with a "Coming soon" badge. When it ships, reaching your spending limit will automatically pause compute for all projects in the organization — a hard guardrail that prevents runaway costs without requiring anyone to be online to react. Computes will resume automatically when the limit is raised or a new billing cycle begins.

This gives teams two levels of control: alerts for awareness, suspension for enforcement. Use one or both depending on how tightly you need to manage spend.

Get started

Spending limits are available today for all organizations on paid Neon plans. Head to your Billing page to set one up.

Have feedback or questions? Let us know in Discord or check the spending limits documentation for more details.

Lessons from making Neon's docs agent-readable: MDX-to-Markdown pipelines, content negotiation, llms.txt structure, and a scan of 250+ doc sites.

Philip Olson–Documentation Engineer

Apr 23, 2026

A year ago, if you asked an agent about Neon, you got whatever it half-remembered from training. Now it goes looking and reads what it finds. Our docs were written for humans who scroll, not machines that fetch.

We've been fixing this in pieces, not all at once. This post is what worked, what didn't, and what we're still figuring out. Maybe it saves you a few curl commands.

The setup

Agents can read HTML fine. Crawlers have been at it for decades and modern agents handle it well. We just think we can do better. Our pages are built from dozens of rendered React components (<Admonition>, <CodeTabs>, <DetailIconCards>, <Steps>), which expand into nested <div>s, class names, and event handlers in the final HTML. The actual docs are buried in there somewhere.

You might think: just serve your source MDX from GitHub. We once did, and it works. MDX is Markdown with React components mixed in. Our MDX uses 30+ custom ones, and some, like <SharedContent>, inline text from separate files at render time. An agent reading the raw MDX just sees the tag.

You will correctly say: convert them. We do now, after plenty of yak shaving.

Phase 1: hand-maintained text files

Our first approach: ask Claude for "one of them cool llms.txt things that all the kids are talking about." It produced a public/llms/ directory, one .txt file per doc page, and an enormous llms.txt index listing them all. Keeping them current was a handful of Python scripts, run by hand, no CI.

It worked. The thinking at the time was "feed the models" not "serve the agents" (the spec itself leans that way). Live fetching was new and rare. Predictably, the files drifted from the source, went missing, or went stale for weeks at a time. The implementation was an afterthought because the use case still felt like one.

The lesson: if keeping two copies in sync is a manual job, they will drift. Clearer now than it was at the time.

Phase 2: teach the site to recognize agents

What if the site detected agent requests and served something cleaner than HTML? We built middleware that checks the User-Agent (ChatGPT, Claude, Cursor, Copilot, and others) and the Accept header. When either matches, we serve Markdown instead.

What we actually served was raw MDX from GitHub's API with a text/markdown content type. Technically Markdown-ish, practically Markdown with a pile of React components. We hit GitHub rate limits within hours, switched to pre-built local files, still MDX. Detection was solved, content was not.
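The detection step can be sketched as a small pure function. This is an illustration in Python rather than the site's actual middleware: `wants_markdown` is a hypothetical name, the pattern list is an illustrative subset, and the User-Agent strings in the usage lines are made up.

```python
# Illustrative subset of the agent names mentioned above, matched case-insensitively.
AGENT_UA_PATTERNS = ("chatgpt", "claude", "cursor", "copilot")


def wants_markdown(headers):
    """Return True if the request looks like it comes from an agent.

    Matches either an explicit Accept: text/markdown or a known agent
    substring in the User-Agent header.
    """
    normalized = {name.lower(): value.lower() for name, value in headers.items()}
    if "text/markdown" in normalized.get("accept", ""):
        return True
    user_agent = normalized.get("user-agent", "")
    return any(pattern in user_agent for pattern in AGENT_UA_PATTERNS)


# Made-up header sets for illustration.
print(wants_markdown({"User-Agent": "ExampleAgent/1.0 (Claude)"}))
print(wants_markdown({"Accept": "text/markdown, text/html;q=0.8"}))
print(wants_markdown({"User-Agent": "Mozilla/5.0", "Accept": "text/html"}))
```

Substring matching is deliberately loose. A stricter version would parse the Accept header's quality values instead of checking for the media type anywhere in the string.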

Phase 3: converting MDX to Markdown

I (okay, Claude) wrote a Node.js post-build script that converts MDX to Markdown and writes it to public/md/, which we serve via URL rewrites.

For example, <CodeTabs labels={["Node.js", "Python"]}> becomes labeled code blocks. <SharedContent> tags inline the referenced text directly. About 30 components handled, all from one file.

The processor builds ~1,400 files in a few seconds. Doc authors edit MDX as usual. No manual sync, no drift, no thought.
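To show the shape of one such conversion, here is a minimal regex-based sketch for the <CodeTabs> case. It is not the Node.js processor described above: the function name, the assumption that the labels array is JSON-compatible, and the choice of a bold label above each block are all ours.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so this example nests cleanly in a fence

CODE_TABS = re.compile(
    r"<CodeTabs labels=\{(\[.*?\])\}>(.*?)</CodeTabs>", re.DOTALL
)
CODE_BLOCK = re.compile(FENCE + r"(\w*)\n(.*?)" + FENCE, re.DOTALL)


def convert_code_tabs(mdx):
    """Replace each <CodeTabs> wrapper with its code blocks, one label per block."""

    def replace(match):
        labels = json.loads(match.group(1))  # assumes a JSON-compatible array
        blocks = CODE_BLOCK.findall(match.group(2))
        parts = [
            f"**{label}**\n\n{FENCE}{lang}\n{body}{FENCE}"
            for label, (lang, body) in zip(labels, blocks)
        ]
        return "\n\n".join(parts)

    return CODE_TABS.sub(replace, mdx)


sample = (
    'Intro\n\n<CodeTabs labels={["Node.js", "Python"]}>\n\n'
    + FENCE + "js\nconsole.log(1)\n" + FENCE + "\n\n"
    + FENCE + "python\nprint(1)\n" + FENCE + "\n\n</CodeTabs>\n"
)
print(convert_code_tabs(sample))
```

A regex is enough for a flat component like this one. Components that nest or pull in other files, such as <SharedContent>, are better handled by walking a parsed syntax tree.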

Context matters too

Clean Markdown isn't enough. Agents need to know where they are and what to read next. So we wrap each page with a breadcrumb at the top and related docs at the bottom:

> This page location: Connect to Neon > Connection pooling
> Full Neon documentation index: https://neon.com/docs/llms.txt

...

## Related docs (Connect to Neon)
- [Connect to Neon](https://neon.com/docs/connect/connect-intro)
- [Choosing your connection method](https://neon.com/docs/connect/choose-connection)

...

Without it, an agent fetches one page and doesn't know what else is nearby.
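The wrapper shown above is straightforward to generate. A minimal sketch, assuming the breadcrumb and related links are already known for each page; `wrap_with_context` and its parameters are hypothetical names, and the URLs are the ones from the example.

```python
INDEX_URL = "https://neon.com/docs/llms.txt"


def wrap_with_context(body, breadcrumb, section, related):
    """Add a location header and a related-docs footer around a Markdown page.

    breadcrumb: list of titles from the section root down to this page.
    related: list of (title, url) pairs for nearby pages.
    """
    header = [
        "> This page location: " + " > ".join(breadcrumb),
        "> Full Neon documentation index: " + INDEX_URL,
    ]
    footer = [f"## Related docs ({section})"]
    footer += [f"- [{title}]({url})" for title, url in related]
    return "\n".join(header) + "\n\n" + body.strip() + "\n\n" + "\n".join(footer) + "\n"


page = wrap_with_context(
    "# Connection pooling\n\nPage content goes here.",
    breadcrumb=["Connect to Neon", "Connection pooling"],
    section="Connect to Neon",
    related=[
        ("Connect to Neon", "https://neon.com/docs/connect/connect-intro"),
        ("Choosing your connection method", "https://neon.com/docs/connect/choose-connection"),
    ],
)
print(page)
```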

What other sites are doing

Nikita (Neon's fearless leader) has a habit of pointing people back to first principles. It's why we tend to build small tools instead of guessing, even when the tool's whole point is to see how others are doing it. Ours, a scanner, probes doc sites and measures how they serve content to agents: same URL as HTML, with .md appended, Accept: text/markdown, discovery headers, plus variations. Findings across over 250 sites, mostly tech docs such as Vercel, Stripe, Mintlify, Sentry, and Google:

  • 53% serve Markdown by appending .md to the URL.
  • 41% honor content negotiation via Accept: text/markdown. The ones that do also tend to have llms.txt, discovery headers, and structured indexes. They've thought about agents. About 30% also accept text/plain.
  • llms.txt is common but placement varies. 93% of polled sites have one, and 58% also publish llms-full.txt with concatenated doc content. The standard says place llms.txt in root. In practice, sites put it at /docs/llms.txt, at the root, or both. Some have different content at each path, and some use sub-indexes (child llms.txt files within llms.txt).
  • 404 handling is mostly not content-type aware. Only 9% return Markdown for a 404 when Markdown was requested. The rest return HTML, and a handful return empty responses, even when the agent clearly asked for Markdown via .md or Accept: text/markdown. Of those 9%, most sites return 200 instead of 404 (we chose 404).
  • Discovery hints are rarely used, and the conventions aren't settled. Only 9% include a <link rel="alternate" type="text/markdown"> tag in the HTML head, a convention that emerged organically (ours did). The X-LLMs-Txt and Link: rel="llms-txt" headers Mintlify proposed have adoption almost entirely driven by Mintlify itself.
  • Headers are mixed and the impact is unclear. Only 3% set Vary: Accept on HTML (6% on Markdown). 27% set noindex on Markdown. We're still figuring out which of these actually help versus which are habit.

Doc-specific platforms like Mintlify, GitBook, and Fern score near 100% on most of these, because agent readiness is the point. Open-source frameworks are further behind and could use agent advocates. Tooling exists in the community but often sits unmaintained.
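The core probes the scanner runs can be described as data before any request is sent. The sketch below only builds the probe list and classifies a response. The function names are hypothetical, the real scanner's checks are broader than these three, and treating only `text/markdown` as a hit is a simplifying assumption.

```python
def build_probes(url):
    """Return the basic requests to try for one doc page URL."""
    base = url.rstrip("/")
    return [
        {"name": "html", "url": base, "headers": {"Accept": "text/html"}},
        {"name": "md-suffix", "url": base + ".md", "headers": {}},
        {"name": "accept-markdown", "url": base, "headers": {"Accept": "text/markdown"}},
    ]


def serves_markdown(status, content_type):
    """True if a response is a successful Markdown response (strict check)."""
    media_type = content_type.split(";")[0].strip().lower()
    return status == 200 and media_type == "text/markdown"


for probe in build_probes("https://neon.com/docs/connect/connect-intro/"):
    print(probe["name"], probe["url"], probe["headers"])
```

Keeping probe construction separate from the HTTP client makes the checks easy to test offline and to extend with further variations.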

A few more lessons

404s should be helpful and aware, not empty. Our 404s match the request: HTML for browsers, Markdown for agents, the latter returning links to the full index, the complete docs bundle, and the API reference. Idea stolen from a Vercel tweet and implemented immediately.
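A content-type-aware 404 can be sketched as a function that returns a status, headers, and body. The names are hypothetical, the response wording is invented, and the two example.com links are placeholders. Only the llms.txt URL comes from the text.

```python
def not_found_response(wants_markdown, links):
    """Build a 404 that matches what the client asked for.

    links: list of (title, url) pairs to offer as next steps.
    Returns a (status, headers, body) tuple.
    """
    if wants_markdown:
        lines = ["# Page not found", "", "Try one of these instead:", ""]
        lines += [f"- [{title}]({url})" for title, url in links]
        body = "\n".join(lines) + "\n"
        return 404, {"Content-Type": "text/markdown; charset=utf-8"}, body
    body = "<h1>Page not found</h1>"
    return 404, {"Content-Type": "text/html; charset=utf-8"}, body


helpful_links = [
    ("Full documentation index", "https://neon.com/docs/llms.txt"),
    ("Complete docs bundle", "https://example.com/placeholder-bundle"),  # placeholder URL
    ("API reference", "https://example.com/placeholder-api"),  # placeholder URL
]
status, headers, body = not_found_response(True, helpful_links)
print(status, headers["Content-Type"])
print(body)
```

The status stays 404 in both branches, matching the choice described above, so an agent can tell a missing page from a real one while still getting somewhere useful to go.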

Discovery has to be automatic, and responding to agents has to be too. Agents don't know to look for llms.txt or that appending .md works. Set discovery headers on every HTML response so they find out, and honor Accept: text/markdown when they do ask. Like children, they often ignore the reminders, but we do our best as parents.
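One way to attach discovery hints to every HTML response is a Link header in standard Web Linking syntax, plus Vary: Accept so caches keep the HTML and Markdown variants apart. Which headers Neon actually sets is not spelled out here, so treat this as a sketch: `discovery_headers` is a hypothetical helper, and the `llms-txt` relation is the proposed convention mentioned earlier, not a registered one.

```python
def discovery_headers(page_url, index_url):
    """Headers to add to an HTML response so agents can find the Markdown."""
    markdown_url = page_url.rstrip("/") + ".md"
    links = [
        f'<{markdown_url}>; rel="alternate"; type="text/markdown"',
        f'<{index_url}>; rel="llms-txt"',
    ]
    return {"Link": ", ".join(links), "Vary": "Accept"}


hints = discovery_headers(
    "https://neon.com/docs/connect/connect-intro",
    "https://neon.com/docs/llms.txt",
)
for name, value in hints.items():
    print(f"{name}: {value}")
```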

The index needs structure, not just a list. Our first llms.txt was a flat list of over 1,000 URLs. Way too much to parse before deciding what to read. We now restructure it with sections and descriptions, sub-indexes for large areas, a "Common Queries" section at the top (pricing, connection methods and troubleshooting, API reference), and collapsed routes for large but useful content (changelog, Postgres tutorials). The primary index is now ~200 entries with sub-indexes for the rest.

Agents use HTTP clients, not browsers. Looking at User-Agent strings, we saw axios, got, node-fetch as often as named agents. Claude Code uses axios, Cursor uses got. The agent identity is in the tool, not always the header. We added those patterns to the detection list. A false positive (Markdown to a human) is harmless; a false negative (HTML to an agent) defeats the purpose. A real question: is changing content based on who's asking a form of cloaking?

What the system looks like now

Four layers:

  • Build time. The MDX processor converts source docs to Markdown. The index generator builds llms.txt, sub-indexes, and llms-full.txt (all docs concatenated).
  • URL rewrites. Appending .md to any doc URL serves its Markdown version from public/md/. Non-doc pages will follow.
  • Middleware. Detects agents via User-Agent and Accept headers. Serves Markdown transparently. Adds discovery headers to HTML responses.
  • Content. Every doc page gets navigation context. The index is hierarchical. 404s are helpful and content-type aware.

What we'd do differently

One URL, two ways to ask for Markdown. We built a parallel /llms/ namespace first. Eventually we moved to serving Markdown from the canonical URL via a .md suffix or an Accept: text/markdown header. That should have been the starting point.

Invest in analytics earlier. We added agent traffic tracking late. Having it from the start would have shown which pages agents request, which ones they 404 on, and how they navigate. That data would have shaped our system sooner.

Design the index first. The flat file list was an afterthought. Structuring it with sections, descriptions, and sub-indexes earlier would have made it more useful.

Build the scanner first. Studying other sites first would have saved us from reinventing patterns and surfaced cracks we didn't think of until later.

None of this was planned from the start. It came together one small change at a time.

What's next

Humans reach docs through agents, not just browsers. That's the new audience and it doesn't execute JavaScript or follow visual navigation. Agents want plain text, structured metadata, and machine-readable discovery. The tools aren't exotic: a remark pipeline, some middleware, a few HTTP headers, a config file. The hard part is recognizing that and choosing to serve them.

An agent can implement most of this for you. What it can't do is write good content without review.

Community tooling is catching up. The afdocs scorecard flagged a coverage issue in our llms.txt that we were briefly convinced wasn't our problem, but it was. The associated agent doc spec is also growing, turning ad-hoc conventions into something documented. The tools are new, the category is new, and everyone is figuring it out together.

On our list:

  • Focus on accuracy. Continue testing whether an agent can complete tasks using a given doc page, similar to agent skills testing. Goal: fewer mistake-then-fix cycles.
  • Offer interfaces built for agents. Like search APIs, and ways for them to send feedback when we get something wrong. Markdown is a human format agents happen to parse well, and we can do better than that.
  • Think more about agent skills. There's something wrong with committing .claude folders into every repo. Treating them like devDependencies feels saner, and we're watching how this evolves.
  • Continue integrating tools like afdocs. Discuss with maintainers and submit PRs to include more (optional) checks, such as 404 handling and headers.
  • But most importantly, what every doc site has tried to do since the dawn of time: write good, reliable content. Treat docs like code, like tests, like the source of truth.

None of this is magic. Just small, honest work that only matters if the content is worth reading.

Thanks

Thanks to Neon and Databricks for letting engineers experiment (and for the tokens), and to my docs-team colleagues Dan and Barry for keeping the real docs moving while I poked at this.

Give your Codex agent Neon superpowers

Andy Hattemer–Member of Product Staff

Apr 16, 2026

An official Neon plugin is now available in the OpenAI Codex marketplace. It connects Codex directly to your Neon databases through MCP, so you can provision and manage Postgres databases without leaving your workflow.

Once installed, Codex can interact with your Neon account, not just read static guidance about it. You can ask it to create a new project, spin up a branch for a feature, run a migration, validate a connection string, or query your schema. It understands Neon-specific concepts like branching and autoscaling, so you get steps that are actually correct for how Neon works.

What you can do with it

The plugin bundles three components:

  • Neon Postgres app — gives Codex MCP-backed tools to create and manage projects, branches, and databases, run SQL queries, and validate connections.
  • Neon Postgres skill — guides Codex through Neon-specific workflows: connection patterns, ORM setup, branching strategies, autoscaling, and Neon Auth.
  • Neon Postgres Egress Optimizer skill — helps diagnose and reduce data transfer costs when egress is higher than expected.

A few things that become straightforward once the plugin is connected: setting up a new Serverless Postgres database and getting a working connection string for your framework, creating an isolated branch before running a migration, or asking Codex to walk through reducing egress without digging through docs manually.

How to add the plugin

To get started, open the plugins menu in Codex, search for Neon, and click install. If you prefer the CLI, run codex, then /plugins to find and add it.

Once connected, you can manage your Neon databases directly from Codex. Ask it to pull your schema, insert rows, create projects, create branches, or run queries. The results show up right in the chat window.

Ship faster with Codex

Database provisioning, branching, migrations — these have always been necessary but rarely the interesting part of building. Giving Codex the tools to handle them closes the loop: the agent can now take a task from code to running database without handing off to you for the operational steps in between.

Try it today: open or download Codex and install the Neon plugin!

The first of several features that make compute restarts invisible.

Note: This is a cross-post of an engineering blog post that was originally published on Databricks. Neon and Databricks Lakebase both run on the same technology, and this engineering optimization benefits customers of both platforms.

Ensuring customer databases are always available is one of the most important things we do in Neon and Lakebase. We've designed the system with redundancy at every level, automatically failing over and recovering your database in the event of hardware or software failures.

In a large-scale system, such unplanned failures are a statistical expectation, but for any individual database they're infrequent. Planned maintenance tends to cause more workload disruption: a typical database is patched more often than it experiences hardware failure.

Today, nearly every database provider operates with maintenance windows: scheduled periods where your database severs all active connections and gets updated and restarted in a process that can take anywhere from a few seconds to minutes. While Neon lets you schedule updates at a time that's optimal for you, it's still a brief interruption when it happens.

We think we can do better. This blog post is the first in a series on how we're leveraging the lakebase architecture with separation of compute and storage to eliminate the impact of planned maintenance entirely. Our goal: make version updates and security patches completely unnoticeable.

In this post, we'll cover prewarming: a technique that prevents any performance degradation that follows a database restart. In future posts, we'll discuss improvements to the failover process itself and additional optimizations that bring us closer to true zero-downtime patching.

The Problem with Cold Restarts

The challenge with restarting Postgres is that in-memory caches (specifically the buffer cache and local file cache) are lost. Even though the database is back online very quickly (1 second at P99), the workload may slow down in the first minutes after the restart: we saw a ~70% reduction in pgbench TPS. The cause is a low cache hit ratio while data is read back from storage and the cache warms up. While this might seem like only a performance problem, it can become an availability issue if the slowdown is severe enough that the database cannot keep up with the workload and timeouts occur.
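To see why a lower hit ratio hurts so much, it helps to write average read latency as a weighted mix of cache hits and storage reads. The sketch below does that arithmetic. The latency figures and hit ratios are illustrative assumptions, not measurements from Neon or Lakebase.

```python
def expected_read_latency_ms(hit_ratio: float, cache_ms: float, storage_ms: float) -> float:
    """Average read latency when a fraction `hit_ratio` of reads hit the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * storage_ms


# Hypothetical numbers, chosen only to illustrate the shape of the effect.
CACHE_MS = 0.1    # assumed cost of a read served from cache
STORAGE_MS = 1.0  # assumed cost of a read served from remote storage

warm = expected_read_latency_ms(0.99, CACHE_MS, STORAGE_MS)  # warmed-up cache
cold = expected_read_latency_ms(0.50, CACHE_MS, STORAGE_MS)  # shortly after a restart

print(f"warm cache: {warm:.3f} ms per read")
print(f"cold cache: {cold:.3f} ms per read")
print(f"slowdown:   {cold / warm:.1f}x")
```

Because storage reads cost so much more than cache hits, even a moderate drop in hit ratio dominates the average, which is consistent with a large TPS drop right after a cold restart.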

Techniques to address this exist in Postgres: pg_prewarm can be used to warm up buffer caches. However, this runs after a restart when the workload is already impacted. Streaming replication can be used to set up a replica, which can be prewarmed before failing over to it (promoting it to primary). However, this requires creating a full replica and carefully orchestrating the prewarming before failover.

Prewarming on Neon's lakebase Architecture

In the lakebase architecture, we combine stateless, elastic compute nodes with disaggregated, shared storage. The compute nodes employ local caches to deliver maximum performance without sacrificing serverless properties. While the cache faces the same cold-start issues outlined above, the lakebase architecture gives us more options for addressing them.

Since Neon's Postgres compute replicas are stateless, we can spin them up and down on demand. We utilize this and combine it with automatic prewarming on planned restarts to minimize the performance impact on the workload. This is how it works:

  1. A new version of Neon's Postgres compute image becomes available. You receive a notification and can schedule the restart for a time that works for you.
  2. Shortly before the scheduled time, our control plane spins up a new Postgres compute in the background. You don't see it, and you're not billed for it. The current primary's workload is unaffected.
  3. A list of pages in the current primary's cache is sent to the new compute. The new compute loads those pages into cache from our shared storage tier without impacting the primary.
  4. The new compute subscribes to the WAL (write-ahead log) to keep its cache up to date. For efficiency, unlike a normal Postgres replica, it can ignore all WAL records that do not affect its cache. It gets the WAL from our Safekeepers, putting no additional load on the primary compute.
  5. When prewarming is complete, we quickly shut down the old primary, promote the new compute to primary, and switch it in. Promotion uses the standard pg_promote from OSS Postgres and does not restart the database server.
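The steps above can be sketched as a toy in-memory model. This is not Neon's implementation: the class and function names are hypothetical, pages are plain strings, and shared storage is a dictionary. It only illustrates the three ideas in steps 3 to 5: copy the list of cached pages (not their contents), skip WAL records for pages that are not cached, then switch roles.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WalRecord:
    lsn: int       # position in the write-ahead log
    page_id: int   # page modified by this record
    value: str     # new page contents (simplified)


@dataclass
class Compute:
    cache: Dict[int, str] = field(default_factory=dict)
    is_primary: bool = False


def prewarm(new: Compute, primary: Compute, storage: Dict[int, str]) -> None:
    """Step 3: take only the primary's page list, load contents from shared storage."""
    for page_id in list(primary.cache):
        new.cache[page_id] = storage[page_id]


def apply_wal(new: Compute, records: List[WalRecord]) -> int:
    """Step 4: apply WAL only to pages already cached; ignore everything else."""
    applied = 0
    for record in records:
        if record.page_id in new.cache:
            new.cache[record.page_id] = record.value
            applied += 1
    return applied


def switch_over(old: Compute, new: Compute) -> None:
    """Step 5: shut down the old primary and promote the prewarmed compute."""
    old.is_primary = False
    old.cache.clear()
    new.is_primary = True


storage = {1: "a", 2: "b", 3: "c"}
primary = Compute(cache={1: "a", 2: "b"}, is_primary=True)
replacement = Compute()

prewarm(replacement, primary, storage)
wal = [WalRecord(lsn=10, page_id=2, value="b2"), WalRecord(lsn=11, page_id=3, value="c2")]
applied = apply_wal(replacement, wal)  # the record for page 3 is skipped: it is not cached
switch_over(primary, replacement)

print(replacement.cache, applied, replacement.is_primary)
```

In this model the primary only hands over page identifiers, so the cost of warming falls on the storage side rather than on the compute that is still serving traffic.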

[Figures: BEFORE and AFTER diagrams]

With Neon's lakebase architecture, you get this at no additional cost, without paying for additional replicas. All planned restarts of read/write endpoints in all regions are now performed this way without you having to do anything. Soon we'll be extending it to read-only endpoints as well.

Results

To measure the impact of cold caches, we ran a 10 GB pgbench workload (scale factor 670) on a database while restarting it, first with prewarming enabled and then without prewarming. The first chart shows a read-only workload (pgbench "select only"), while the second shows a read-write workload (pgbench "simple update").
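For a sense of scale, the sketch below works out what scale factor 670 means in rows. The per-table row counts at scale factor 1 come from the pgbench documentation; the size per scale unit is not an independent measurement, just the 10 GB figure above divided by the scale factor (treating 1 GB as 1,000 MB).

```python
# Rows created per unit of scale factor by `pgbench -i -s <scale>`.
ROWS_PER_SCALE_UNIT = {
    "pgbench_branches": 1,
    "pgbench_tellers": 10,
    "pgbench_accounts": 100_000,
}


def table_rows(scale: int) -> dict:
    """Row counts for each scaled pgbench table at a given scale factor."""
    return {table: rows * scale for table, rows in ROWS_PER_SCALE_UNIT.items()}


SCALE = 670
DATASET_MB = 10 * 1000  # the 10 GB quoted in the post, assuming 1 GB = 1,000 MB

rows = table_rows(SCALE)
mb_per_scale_unit = DATASET_MB / SCALE

for table, count in rows.items():
    print(f"{table}: {count:,} rows")
print(f"implied size per scale unit: {mb_per_scale_unit:.1f} MB")
```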

In both cases, throughput recovers nearly instantly with prewarming. Without prewarming, recovery is much slower while the cold cache warms up. The difference is starkest for the read-only workload because prewarming improves the cache hit ratio, which helps reads proportionally more than writes.

For most of 2025, AI coding agents got good at a specific thing: writing code. Give an agent a prompt, and it could scaffold an app, wire up an API, write migrations. But when the code was done, the agent stopped. Spinning up a real database, creating an account, getting credentials into the environment… that […]

“I’m genuinely surprised by how well it handles that scale. You can create tons of databases and they’re available immediately. You can branch out immediately. All of those things make it really nice for agent-managed infra.” Iman Radjavi, Co-founder, Specific.dev What Specific builds Specific (YC F25) is a cloud platform designed for coding agents. With […]

There are a few different reasons to hit the brakes on a Postgres query. Maybe it’s taking too long to finish. Maybe you realised you forgot to create an index that will make it orders of magnitude quicker. Maybe there’s some reason the results are no longer needed. Or maybe you, or your LLM buddy, […]

“The biggest strength of Neon is how it decouples storage and compute and makes them independently scalable. When an app isn’t being used, the compute node can be put in idle mode at extremely low cost, which lets us handle a wide range of scale and complexity without compromise.” (Nilesh Trivedi, co-founder and CTO at […]

From the start, the team at Encore has been focused on solving a simple problem: shipping production infrastructure shouldn’t require a dedicated platform engineering team. They set out to make deploying real applications feel simple without abstracting away control; in Encore, devs can define infrastructure directly in Go or TypeScript, and the platform turns that […]

Every AI lab is shipping research agents. OpenAI’s Deep Research, Perplexity, and Gemini’s research mode. These products are not simple RAG pipelines. Recent papers like DeepResearcher and Step-DeepResearch formalize what makes them work: a recursive loop of planning, searching, learning, and reflecting, where the agent decides when to go deeper and when to stop. The […]

Cursor just launched plugins, making it easier than ever to give Cursor structured access to external tools and infrastructure. Neon is part of the initial launch set: you can install the Neon plugin today from the Cursor Marketplace to give Cursor live access to your Neon organization along with the knowledge it needs to be […]

A few weeks ago, Vercel released add-skill (now npx skills), a CLI for installing agent skills across different coding agents and editors like Claude Code, Cursor, and VS Code. It solves a very real problem: each tool looks for agent skills in a different place, which makes setup repetitive and documentation painful to maintain. The […]

“We were getting ready to hire dedicated engineers just to manage and scale Zite Database. With Neon, we didn’t need to do that – we were able to give every end user their own database, including on the free plan” (Dominic Whyte, Co-founder at Zite) Zite is an AI-native app builder for the kind of […]

v0 just went through a big rebuild. What started as a fast way to explore ideas has now evolved into a platform designed to ship real, production-ready software, not just quick demos or one-off prototypes. v0 is no longer about generating code – it’s about helping teams ship. With this release, v0 moves beyond UI […]

“Agents don’t want heavyweight infrastructure that lives forever, they want primitives they can spin up, use, and discard as part of their work. Neon fits that model perfectly: it behaves the way agents actually think about state” (Rick Blalock, Co-founder at Agentuity) Existing cloud infrastructure is built around the software model developers have used for […]
