Pulumi/Pulumi Blog

For about two years, the unit of work with a coding agent was the prompt. You wrote a good one, you gave it enough context, you read what came back, and you wrote the next one. The agent was a tool, and you were holding it the entire time, one turn after another.

That part is ending. Addy Osmani, a director of AI at Google Cloud, has a name for what replaces it, and I have not stopped thinking about it since: loop engineering. You stop being the person who prompts the agent. You design the loop that prompts it for you.

In my phrasing: you stop being the thing that runs, and start designing the thing that runs. The leverage moves up a layer. What I want to do here is take an honest look at the pieces, and at the part nobody automates.

The leverage moved up a layer

The people building these tools have already made the jump. Peter Steinberger has been posting it as a monthly reminder.

Peter Steinberger (@steipete) on X.

Boris Cherny, who heads Claude Code at Anthropic, says the same thing about his own job. He does not prompt Claude anymore. He has loops running that prompt Claude and decide what to do next, scanning the issue tracker, the team chat, and the timeline for what to build. “My job is to write loops.”

A loop is a goal that prompts itself. You set the purpose, and the system keeps iterating until it’s met. In practice it finds the work, hands it out, checks the result, writes down what got finished, and decides the next thing. Then it prompts the agent in your place. You build that small system once and let it run.

Look closer, and a loop is really two loops nested. The inner one does the work against a spec. The outer one decides what the work should be: it watches an issue tracker, an error feed, a changelog, then writes the next spec and hands it down. Most people are still running that outer loop by hand, in their head, and calling it a backlog.
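The nested shape can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. `outer_loop`, `inner_loop`, and the `find_next_spec`, `run_agent`, and `check` callables are hypothetical stand-ins, not part of Claude Code, Codex, or any real API. The outer loop decides what the work is. The inner loop iterates on one spec until an external check passes or its attempt budget runs out.

```python
from typing import Callable, Optional


def inner_loop(spec: str,
               run_agent: Callable[[str, int], str],
               check: Callable[[str, str], bool],
               max_attempts: int = 3) -> Optional[str]:
    """Iterate on one spec until the external check passes (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        result = run_agent(spec, attempt)   # the agent does the work
        if check(spec, result):             # something outside the agent grades it
            return result
    return None                             # give up; the outer loop decides what next


def outer_loop(find_next_spec: Callable[[], Optional[str]],
               run_agent: Callable[[str, int], str],
               check: Callable[[str, str], bool],
               max_specs: int = 10) -> dict:
    """Decide what the work is, hand it down, and record each outcome."""
    outcomes = {}
    for _ in range(max_specs):
        spec = find_next_spec()             # e.g. scan an issue tracker or error feed
        if spec is None:                    # nothing left to do
            break
        outcomes[spec] = inner_loop(spec, run_agent, check)
    return outcomes


if __name__ == "__main__":
    backlog = ["fix flaky test", "bump dependency"]
    print(outer_loop(lambda: backlog.pop(0) if backlog else None,
                     lambda spec, n: f"{spec}: attempt {n}",
                     lambda spec, result: result.endswith("2")))
```

In a real loop, `find_next_spec` would read the tracker or error feed, and `check` would be an oracle such as a test run.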

The part that surprised me is that this is barely a tooling problem anymore. A year ago a loop meant a pile of bash you wrote and maintained forever. Now the pieces ship inside the products, and the same shapes show up in Claude Code and in Codex. Osmani puts loop engineering one floor above the harness, the context and tooling you wire around a single agent. I wrote about that harness a couple of weeks ago. The loop is the thing that runs on top of it: it runs on a timer, it spawns helpers, and it feeds itself.

The five pieces, and the one that holds them together

Strip loop engineering down and you get roughly five building blocks, plus one place to remember things. Both Claude Code and Codex have all five now. The names differ here and there; the capability is the same.

  1. Automations are the heartbeat. They are what make a loop an actual loop rather than a one-off run. An automation can be a prompt or command on a cadence, a scheduled task, a hook that fires at a point in the agent’s lifecycle, or a CI job that keeps running after you close the laptop. Discovery and triage run themselves, and the findings that matter come to you.
  2. Worktrees keep parallel work from turning into chaos. The second you run more than one agent, the files start colliding. Two agents writing the same file is the same headache as two engineers committing to the same lines with nobody talking first. A git worktree is a separate working directory on its own branch, so one agent’s edits cannot touch another’s checkout.
  3. Skills are intent, written down. An agent starts every session cold and fills any hole in your intent with a confident guess. A skill is that intent written on the outside: the conventions, the build steps, the “we don’t do it like this because of that one incident,” recorded once where the agent reads it every run. Without skills, the loop re-derives your whole project from zero every cycle.
  4. Connectors let the loop touch your real tools. Built on MCP, they let the agent read the issue tracker, query a database, hit a staging API, or drop a message in chat. This is the difference between an agent that says “here is the fix” and a loop that opens the pull request, links the ticket, and pings the channel once CI goes green.
  5. Sub-agents keep the maker away from the checker. The model that wrote the code is far too generous grading its own homework. A second agent with different instructions, and sometimes a different model, catches the things the first one talked itself into. Worktrees and a cold-context reviewer are two pieces I have written about before, back when the question was running agents in parallel without them trampling each other.

Then the sixth thing: memory. A markdown file, a Linear board, a state file, anything that lives outside the single conversation and holds what is done and what is next. It sounds too dumb to matter, and it’s the whole game. The model forgets everything between runs, so the memory has to live on disk, not in the context window. The agent forgets. The repo does not.
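Here is a sketch of memory that lives on disk rather than in the context window. The hypothetical `LoopMemory` class keeps what is done, what is next, and what was tried and failed in a JSON file. The file name and the three-list layout are assumptions for illustration. A markdown file or a board serves the same purpose.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class LoopMemory:
    """State that survives between runs (hypothetical layout)."""
    done: list = field(default_factory=list)
    todo: list = field(default_factory=list)
    failed: list = field(default_factory=list)

    @classmethod
    def load(cls, path: Path) -> "LoopMemory":
        # A missing file means a fresh loop with nothing remembered yet.
        if not path.exists():
            return cls()
        return cls(**json.loads(path.read_text()))

    def save(self, path: Path) -> None:
        # Write a temp file, then rename it, so a crash never leaves half a file.
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(json.dumps(asdict(self), indent=2))
        tmp.replace(path)

    def finish(self, task: str) -> None:
        if task in self.todo:
            self.todo.remove(task)
        self.done.append(task)

    def fail(self, task: str, reason: str) -> None:
        # Record why it failed, so the next run does not retry blindly.
        if task in self.todo:
            self.todo.remove(task)
        self.failed.append({"task": task, "reason": reason})
```

Each run loads the file at the start and saves it at the end. The model forgets between runs; the file does not.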

What makes a loop hold together

A loop running unattended is also a loop making mistakes unattended. The one thing that keeps it honest is verification, and verification needs an oracle, something outside the model that returns a hard yes or no. Passing tests, a clean build, a green pipeline, a real production signal. Without an oracle, the loop compounds confidently wrong work, faster than you can read it.
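One way to make the oracle concrete is to treat a command’s exit status as the hard yes or no. This sketch assumes your check signals success with exit code 0, the usual convention for test runners, build tools, and CI steps. `run_oracle` and `Verdict` are hypothetical names, not part of any agent product.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool
    detail: str


def run_oracle(command: list[str], timeout_s: float = 600) -> Verdict:
    """Run a check outside the model; only the exit code decides the verdict."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return Verdict(False, f"timed out after {timeout_s}s")
    except FileNotFoundError as exc:
        return Verdict(False, f"command not found: {exc}")
    # Keep the tail of the output so the next iteration can read why it failed.
    tail = (proc.stdout + proc.stderr)[-2000:]
    return Verdict(proc.returncode == 0, tail)


if __name__ == "__main__":
    # Stand-in command; a real loop would run something like its test suite.
    print(run_oracle([sys.executable, "-c", "print('ok')"]))
```

The model never gets a vote here. A timeout or a missing command counts as a failure, not as a pass.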

The cleanest version of this already ships in the tools. Claude Code’s /goal keeps working across turns until a condition you actually wrote holds, something like “every test in auth/ passes and lint is clean,” and after every turn a separate, faster model reads the transcript and decides whether you are there yet. The agent that wrote the code is not the one that grades it. That is the maker-and-checker split applied to the stop condition itself. Codex’s /goal reaches the same finish line a different way: the agent audits its own work against the evidence before it can call the goal done.

What the loop still won’t do for you

The loop changes the shape of the work. It does not take it off your desk. And a few things get sharper as the loop gets better, not softer.

Verification is still on you. The split reviewer is what makes “it’s done” mean something, but “done” is a claim, not a proof. Your job is still to ship code you confirmed works, which is harder to remember when the diff arrived while you were at lunch.

The bill comes in two currencies: tokens and attention. A single unattended run can burn through millions of tokens, and that is only worth it when the tokens buy something worth more than they cost. The quieter trap is the second currency, attention: memory is what lets a loop compound over time, and slop compounds right alongside it, piling up for someone to read. A loop pointed at a vague goal does not get tired and stop. It gets faster.

Your understanding rots if you let it. The faster the loop ships code you did not write, the wider the gap between what exists in the repo and what you actually understand. A smooth loop grows that gap faster, not slower, unless you read what it made. The comfortable posture, where you stop having an opinion and take whatever the loop gives back, is the risky one. Two engineers can build the exact same loop and get opposite results, one moving faster on work they understand deeply, the other avoiding the work entirely. The loop cannot tell which one you are.

When the loop reaches production

Most of this thinking grew up around application code, where a bad run costs you a revert. When the loop reaches into infrastructure, the blast radius is a production outage rather than a revert, and the verification bar has to rise to meet it. The upside is that infrastructure hands the loop a better oracle than application code does. A plan diff is deterministic and machine-readable, a policy check returns a hard verdict, and drift and cost are numbers you can put a threshold on. A reviewer, whether human or agent, can read the change cold, with no memory of the prompt that produced it. That cold-context check is exactly what an unattended loop needs, and it’s the reason an infrastructure loop can be built to hold together while you sleep. Pulumi Neo reasons over the state graph directly, so the checker grounds every claim in what the change actually does, not in what the writer says it does.
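To show what a hard verdict on a plan can look like, here is a sketch of a gate over a preview summary. It assumes the summary has been parsed into a mapping of operation names to counts, similar to the `changeSummary` object in `pulumi preview --json` output. The `PlanGate` thresholds, the `gate_plan` helper, and the cost-delta input are hypothetical; real values would come from your own policy and cost tooling.

```python
from dataclasses import dataclass


@dataclass
class PlanGate:
    """Hypothetical limits an unattended infrastructure loop must stay under."""
    max_deletes: int = 0              # any delete needs a human
    max_replaces: int = 0             # replacements can mean downtime
    max_changes: int = 20             # large diffs get a human review
    max_monthly_cost_delta: float = 100.0


def gate_plan(summary: dict, cost_delta: float,
              gate: PlanGate) -> tuple[bool, list[str]]:
    """Return (approved, reasons) for a summary such as {"create": 2, "same": 10}."""
    reasons = []
    deletes = summary.get("delete", 0)
    replaces = summary.get("replace", 0)
    changes = sum(n for op, n in summary.items() if op != "same")
    if deletes > gate.max_deletes:
        reasons.append(f"{deletes} delete(s) exceeds limit {gate.max_deletes}")
    if replaces > gate.max_replaces:
        reasons.append(f"{replaces} replace(s) exceeds limit {gate.max_replaces}")
    if changes > gate.max_changes:
        reasons.append(f"{changes} change(s) exceeds limit {gate.max_changes}")
    if cost_delta > gate.max_monthly_cost_delta:
        reasons.append(f"cost delta {cost_delta:.2f} exceeds "
                       f"{gate.max_monthly_cost_delta:.2f}")
    return (not reasons, reasons)
```

The gate reads only what the plan says the change will do, never what the writer claims it does. That is the same cold-context property described above.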

Where to start

Pick the loop you can actually trust first. In order:

  1. Start where “done” is unambiguous. CI triage, dependency bumps, a flaky-test hunt, a failing job you keep re-running by hand. Loops need an oracle, so begin where the oracle already exists.
  2. Write the memory file before the loop. One markdown file, or a board. What is done, what is next, what was tried and failed. This is the spine, and everything else hangs off it.
  3. Split the checker from the maker. Use /goal with a verifiable condition, or a second agent with its own instructions. Never let the agent that did the work be the one that decides the work is finished.
  4. Cap it, then read everything. A max-iteration count, a token budget, a teardown step. Run it once, end to end, then read every line it shipped. The first run is the measurement, not the payoff.
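Steps 3 and 4 can be combined into one small driver. This is a sketch under assumptions. `maker` and `checker` are hypothetical callables that each return their result plus the tokens they used. The budget numbers are placeholders, and `teardown` stands in for whatever cleanup your run needs.

```python
from typing import Callable, Tuple


def run_capped_loop(maker: Callable[[str], Tuple[str, int]],
                    checker: Callable[[str], Tuple[bool, int]],
                    teardown: Callable[[], None],
                    goal: str,
                    max_iterations: int = 5,
                    token_budget: int = 200_000) -> dict:
    """Maker proposes, a separate checker decides; stop on success or on either cap."""
    tokens_used = 0
    iterations = 0
    status = "iteration cap reached"
    try:
        for iterations in range(1, max_iterations + 1):
            work, maker_tokens = maker(goal)
            done, checker_tokens = checker(work)   # never the maker grading itself
            tokens_used += maker_tokens + checker_tokens
            if done:
                status = "goal met"
                break
            if tokens_used >= token_budget:
                status = "token budget exhausted"
                break
    finally:
        teardown()                                 # runs even if a step raises
    return {"status": status, "iterations": iterations, "tokens": tokens_used}
```

The returned status and token count are exactly what you would record after the first run, the one that serves as the measurement.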

Then look at what you built. You designed it once, and it ran without you steering each step. That is the real shift. But the leverage only holds if you wire the loop like an engineer, not like someone looking for permission to stop thinking. Read what it ships. Keep an opinion. Judgment is the one part that does not move up a layer.

The loop will do the typing. The thinking is the work.

See how Pulumi Neo closes the loop on your infrastructure

A git tag is how many teams mark a release as ready. Pulumi Deployments can now act on that signal directly: configure a tag-based trigger, push a version tag like v1.2.0, and Pulumi automatically runs pulumi up for your stack. No extra pipeline glue, no manual click — your release tag is the deployment.

Why tags?

Push to Deploy has long let you preview changes on a pull request and update a stack when commits merge to a branch. That branch-based model is a great fit for continuous delivery to shared development and QA environments, where every merge should flow straight through.

But promotion to production is often deliberate, not continuous. You merge throughout the day, then decide — separately — that a particular commit is the release. The conventional way to record that decision is a git tag: v1.2.0, 2026.06.0, release-2026-06-04. Tagging is already part of most teams’ release rituals.

Tag-based triggers connect that ritual to your infrastructure. Instead of wiring up a separate CI job to call the Pulumi Deployments REST API on a tag event, you configure the trigger once in your stack’s deployment settings and let Pulumi handle the rest.

How it works

Tag triggers are controlled by two settings on your stack’s deployment configuration:

  • Run updates for pushed tags — a toggle that enables running pulumi up when a matching tag is pushed.
  • Tag filters — a list of glob patterns that decide which tag names qualify.

Tag filters use the same model as the path filters you may already know, except the patterns match against the tag name rather than changed file paths. A few examples:

  • v* — deploy on any tag beginning with v, such as v1.0.0 and v2.3.1.
  • v* plus !*-rc* — deploy on release tags but skip release candidates like v1.2.0-rc1.
  • 2026.* — deploy on calendar-versioned releases such as 2026.06.0.

Filters prefixed with ! are exclusions, and an exclusion always wins over an include. With no filters configured and the toggle on, every tag push deploys. Deleting a tag never triggers a deployment.
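These rules can be modeled in a few lines. This is a sketch of the documented behavior, not Pulumi’s implementation. It uses Python’s `fnmatch`, whose glob semantics may differ from Pulumi’s in edge cases. `should_deploy` is a hypothetical name. How a filter list containing only exclusions behaves is an assumption here: every tag not excluded deploys.

```python
from fnmatch import fnmatchcase


def should_deploy(tag: str, filters: list[str], enabled: bool = True,
                  deleted: bool = False) -> bool:
    """Model the tag-trigger rules described above (illustrative only)."""
    if not enabled or deleted:        # toggle off, or a tag deletion: never deploy
        return False
    includes = [f for f in filters if not f.startswith("!")]
    excludes = [f[1:] for f in filters if f.startswith("!")]
    if any(fnmatchcase(tag, pat) for pat in excludes):
        return False                  # an exclusion always wins over an include
    if not includes:
        return True                   # assumption: no include filters means any tag
    return any(fnmatchcase(tag, pat) for pat in includes)
```

For example, `should_deploy("v1.2.0-rc1", ["v*", "!*-rc*"])` is False, while `should_deploy("v1.2.0", ["v*", "!*-rc*"])` is True.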

When a tag push kicks off a deployment, Pulumi sets the PULUMI_CI_TAG_NAME environment variable to the tag name. Your pre-run commands or your Pulumi program can read it — for example, to stamp the release version onto a resource tag or an application config value.
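For example, a Python Pulumi program could turn the variable into resource tags. The sketch keeps only the plain-Python part, so it runs anywhere. `release_tags` is a hypothetical helper. The `release` and `version` keys and the `untagged` fallback for non-tag runs are assumptions.

```python
import os


def release_tags(default: str = "untagged") -> dict[str, str]:
    """Build resource tags from the tag that triggered this deployment, if any."""
    tag = os.environ.get("PULUMI_CI_TAG_NAME")   # set on tag-triggered deployments
    if not tag:
        return {"release": default}              # e.g. a branch push or a local run
    # Strip a leading "v" so v1.2.0 records version 1.2.0.
    version = tag[1:] if tag.startswith("v") else tag
    return {"release": tag, "version": version}
```

Inside a program, you might pass the result as `tags=release_tags()` to any resource that accepts a tags mapping.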

Works across every VCS integration

Tag triggers are available across all five version control integrations: GitHub, GitLab, Bitbucket, Azure DevOps, and Custom VCS.

Get started

You can configure tag triggers wherever you manage deployment settings today — the Pulumi Cloud console, the REST API, or as code with the pulumiservice.DeploymentSettings resource.

To try it out:

  1. Open a stack’s Settings > Deploy tab in the Pulumi Cloud console.
  2. Enable Run updates for pushed tags and add a tag filter such as v*.
  3. Push a tag — git tag v1.0.0 && git push origin v1.0.0 — and watch the deployment run.

For the full details, see the deployment triggers and tag filtering documentation. We’d love to hear how you put tag-based deployments to work.

If you run AI tools and agents, you’ve probably accepted three tradeoffs: your data leaves your network, you can’t work offline, and your bill scales with usage.

Open-weight models now run well on consumer hardware. Once the model is on your machine, your data stays local, inference works offline, and there is no per-token bill. If you own a modern Mac, you can run a high-quality model yourself.

Gemma 4 is an open-weights model family from Google. This post focuses on Gemma 4 12 B, released in June 2026, using Unsloth’s Q8_0 GGUF. The 12 B model fits comfortably on a modern Mac while leaving enough headroom for local llama.cpp and a chat UI.

We’ll use llama.cpp for host-native inference, k3d for a local Kubernetes cluster, Pulumi for infrastructure as code, and Tailscale for secure access.

Prerequisites

This setup was validated on the following hardware:

  • macOS 26 Tahoe, version 26.5
  • MacBook Pro with Apple M3 Max
  • 36 GB RAM

On this machine, llama.cpp reported about 20 output tokens per second for a 160-token validation response. That run used the gemma-4-12b-it-Q8_0.gguf file from unsloth/gemma-4-12b-it-GGUF with a 131,072-token context. Sustained throughput varies by prompt length, thermal state, and llama.cpp settings.

You’ll need brew, docker, pulumi, and tailscale installed. We’ll also install k3d during the process.

Run Gemma 4 with host-native llama.cpp

We run llama.cpp directly on macOS to use Apple Metal acceleration. Containers in a local Kubernetes VM cannot reach the Mac’s Metal GPU, so running the LLM on the host is what makes GPU-accelerated inference possible here.

Install the build tools:

```shell
brew install cmake git
```

Then build llama.cpp from source and download the multimodal projector. In validation, Homebrew llama.cpp 9430 could run text inference, but it could not load the new Gemma 4 12 B projector and failed with unknown projector type: gemma4uv. Building current llama.cpp from source fixed that.

```shell
# Working directory for the source checkout, model files, and logs
llm_home="$HOME/pulumi-gemma4-llm"
mkdir -p "$llm_home/models" "$llm_home/logs"

# Clone llama.cpp once; rerunning this script skips the clone
if [ ! -d "$llm_home/llama.cpp/.git" ]; then
  git clone --depth 1 https://github.com/ggml-org/llama.cpp.git "$llm_home/llama.cpp"
fi

# Configure a Release build with Metal and BLAS enabled
cmake -S "$llm_home/llama.cpp" \
  -B "$llm_home/llama.cpp/build" \
  -DGGML_METAL=ON \
  -DGGML_BLAS=ON \
  -DCMAKE_BUILD_TYPE=Release

# Build only the server binary, with 10 parallel jobs
cmake --build "$llm_home/llama.cpp/build" --target llama-server -j 10

# Download the multimodal projector used for image and audio input
curl -L --fail \
  --output "$llm_home/models/mmproj-F16.gguf" \
  https://huggingface.co/unsloth/gemma-4-12b-it-GGUF/resolve/main/mmproj-F16.gguf
```

Then download and run the model. On the first run, --hf-repo and --hf-file download the GGUF file from Hugging Face and cache it locally:

```shell
"$HOME/pulumi-gemma4-llm/llama.cpp/build/bin/llama-server" \
  --hf-repo unsloth/gemma-4-12b-it-GGUF \
  --hf-file gemma-4-12b-it-Q8_0.gguf \
  --mmproj "$HOME/pulumi-gemma4-llm/models/mmproj-F16.gguf" \
  --host 127.0.0.1 \
  --port 18080 \
  --ctx-size 131072 \
  --parallel 1 \
  --jinja \
  --reasoning off
```

We use port 18080 because 8080 is llama-server’s default and a common choice for other local services, so it is likely to conflict with something you already have running. If port 8080 is free, you can use it and adjust the Pulumi config later.

The model file is about 12.65 GB, and the projector is about 116 MB. Gemma 4 12 B advertises a 131,072-token context, and this Mac loaded that full context with --parallel 1. llama.cpp projected about 15.1 GiB of Apple Metal device memory for the text model and about 258 MiB worst-case memory for the projector, leaving enough headroom for Open WebUI and the rest of the local stack. The --reasoning off flag keeps OpenAI-compatible chat responses visible in clients that do not read separate reasoning fields.

With --mmproj, /v1/models advertised capabilities: ["completion","multimodal"]. In local validation, Open WebUI accepted an uploaded Pulumi logo image and Gemma 4 described it correctly. A small WAV file also worked through the OpenAI-compatible input_audio request shape, though llama.cpp logs still mark audio input as experimental.
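For reference, here are the OpenAI-style content parts an image or audio request uses: `image_url` with a base64 data URL, and `input_audio` with base64 data and a format. This sketch only builds the request body. The helper names are hypothetical, and the file name in the usage comment is a placeholder. Because llama.cpp marks audio input as experimental, support for these fields may vary between builds.

```python
import base64


def image_part(data: bytes, mime: str = "image/png") -> dict:
    """Image content part as a base64 data URL."""
    b64 = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}


def audio_part(data: bytes, fmt: str = "wav") -> dict:
    """Audio content part in the OpenAI input_audio shape."""
    b64 = base64.b64encode(data).decode("ascii")
    return {"type": "input_audio", "input_audio": {"data": b64, "format": fmt}}


def multimodal_payload(prompt: str, *parts: dict,
                       model: str = "unsloth/gemma-4-12b-it-GGUF") -> dict:
    """Chat request body that mixes text with image or audio parts."""
    content = [{"type": "text", "text": prompt}, *parts]
    return {"model": model, "messages": [{"role": "user", "content": content}]}


# Usage (placeholder file name):
# with open("pulumi-logo.png", "rb") as f:
#     body = multimodal_payload("Describe this image.", image_part(f.read()))
```

Post the resulting body to `/v1/chat/completions` in the same way as a text-only request.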

Verify the LLM API

Open a new terminal and check if llama.cpp is responding:

```shell
curl http://127.0.0.1:18080/v1/models
```

The /v1/models endpoint should return the model ID unsloth/gemma-4-12b-it-GGUF. Now try a chat completion:

```shell
curl http://127.0.0.1:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/gemma-4-12b-it-GGUF",
    "messages": [{"role": "user", "content": "Reply with exactly: OK"}],
    "temperature": 0,
    "max_tokens": 32
  }'
```

The prompt Reply with exactly: OK should return the content OK. For a longer 160-token response in validation, llama.cpp reported about 20 output tokens per second.
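The same chat request can be sent from Python using only the standard library. This is a sketch: `chat_payload`, `post_json`, `tokens_per_second`, and `main` are hypothetical helpers. The `timings` block, with `predicted_n` generated tokens and `predicted_ms` generation time, is an assumption about llama-server’s response; if a build omits it, the helper returns None. At about 20 tokens per second, a 160-token answer takes roughly 8 seconds.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://127.0.0.1:18080/v1"      # matches --host and --port above
MODEL = "unsloth/gemma-4-12b-it-GGUF"


def chat_payload(prompt: str, max_tokens: int = 32) -> dict:
    """Same request body as the curl example."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": max_tokens,
    }


def post_json(url: str, payload: dict, timeout: float = 120) -> dict:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def tokens_per_second(response: dict) -> Optional[float]:
    """Output speed from an assumed `timings` block, or None if it is missing."""
    t = response.get("timings") or {}
    n, ms = t.get("predicted_n"), t.get("predicted_ms")
    return n / (ms / 1000.0) if n and ms else None


def main() -> None:
    """Needs llama-server running at BASE_URL; not called on import."""
    reply = post_json(f"{BASE_URL}/chat/completions",
                      chat_payload("Reply with exactly: OK"))
    print(reply["choices"][0]["message"]["content"])
    print(tokens_per_second(reply))
```

Call `main()` while the server is running to repeat the validation from Python.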

Keep llama.cpp running after reboot

For a permanent setup, keep the llama.cpp startup script and logs in a folder under your home directory. Then let launchd start the server when you sign in and restart it if it exits:

```shell
llm_home="$HOME/pulumi-gemma4-llm"
mkdir -p "$llm_home/logs" "$HOME/Library/LaunchAgents"

# Startup script; the quoted 'EOF' keeps $HOME unexpanded until launchd runs it
cat > "$llm_home/start-llama-server.sh" <<'EOF'
#!/bin/zsh
set -euo pipefail

export PATH="/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"

exec "$HOME/pulumi-gemma4-llm/llama.cpp/build/bin/llama-server" \
  --hf-repo unsloth/gemma-4-12b-it-GGUF \
  --hf-file gemma-4-12b-it-Q8_0.gguf \
  --mmproj "$HOME/pulumi-gemma4-llm/models/mmproj-F16.gguf" \
  --host 127.0.0.1 \
  --port 18080 \
  --ctx-size 131072 \
  --parallel 1 \
  --jinja \
  --reasoning off
EOF

chmod +x "$llm_home/start-llama-server.sh"

# LaunchAgent plist; the unquoted EOF expands $llm_home into absolute paths now
cat > "$HOME/Library/LaunchAgents/com.pulumi.gemma4.llama-server.plist" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.pulumi.gemma4.llama-server</string>
  <key>ProgramArguments</key>
  <array>
    <string>$llm_home/start-llama-server.sh</string>
  </array>
  <key>WorkingDirectory</key>
  <string>$llm_home</string>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>$llm_home/logs/llama-server.out.log</string>
  <key>StandardErrorPath</key>
  <string>$llm_home/logs/llama-server.err.log</string>
</dict>
</plist>
EOF

# Reload the agent: remove any old copy, register the new plist, start it now
launchctl bootout gui/$(id -u)/com.pulumi.gemma4.llama-server 2>/dev/null || true
launchctl bootstrap gui/$(id -u) "$HOME/Library/LaunchAgents/com.pulumi.gemma4.llama-server.plist"
launchctl kickstart -k gui/$(id -u)/com.pulumi.gemma4.llama-server
```
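If you prefer not to hand-write XML in a heredoc, Python’s standard `plistlib` module can produce an equivalent property list and handle escaping for you. This is a sketch that assumes the same label and directory layout as the shell version. `launch_agent_plist` and `write_plist` are hypothetical helpers. They only write the file, so you still load it with the `launchctl bootstrap` and `kickstart` commands.

```python
import plistlib
from pathlib import Path

LABEL = "com.pulumi.gemma4.llama-server"


def launch_agent_plist(llm_home: Path) -> dict:
    """Same keys as the heredoc version: run at login, restart if the server exits."""
    return {
        "Label": LABEL,
        "ProgramArguments": [str(llm_home / "start-llama-server.sh")],
        "WorkingDirectory": str(llm_home),
        "RunAtLoad": True,
        "KeepAlive": True,
        "StandardOutPath": str(llm_home / "logs" / "llama-server.out.log"),
        "StandardErrorPath": str(llm_home / "logs" / "llama-server.err.log"),
    }


def write_plist(llm_home: Path, agents_dir: Path) -> Path:
    """Write the plist into agents_dir and return its path."""
    agents_dir.mkdir(parents=True, exist_ok=True)
    target = agents_dir / f"{LABEL}.plist"
    with target.open("wb") as f:              # plistlib writes bytes
        plistlib.dump(launch_agent_plist(llm_home), f)
    return target


# Usage:
# home = Path.home()
# write_plist(home / "pulumi-gemma4-llm", home / "Library" / "LaunchAgents")
```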

Check the launchd service and llama.cpp logs:

```shell
launchctl print gui/$(id -u)/com.pulumi.gemma4.llama-server
tail -f "$HOME/pulumi-gemma4-llm/logs/llama-server.err.log"
```

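To stop llama.cpp later, unload the launchd service. The plist stays in ~/Library/LaunchAgents, so launchd loads it again at your next sign-in unless you delete the file: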

```shell
launchctl bootout gui/$(id -u)/com.pulumi.gemma4.llama-server
```

Deploy Open WebUI with Pulumi and k3d

Now we’ll deploy Open WebUI into a local Kubernetes cluster. This provides a polished chat interface that connects to our host-native LLM.

First, install k3d if you haven’t already:

```shell
brew install k3d
```

Create a new cluster for this project:

```shell
k3d cluster create pulumi-gemma4-blog-qa
```

We’ll use the Pulumi program in pulumi/examples. This program defaults to runtimeMode=host, which creates a Kubernetes ExternalName service pointing to your host machine.
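To make the `runtimeMode=host` wiring concrete, here is roughly what an ExternalName Service looks like as a Kubernetes manifest. It is built as a Python dict and printed as JSON, which `kubectl apply -f` accepts. This is an illustration, not the example program’s actual code. The service name `llm-server` and the target `host.k3d.internal` come from this walkthrough. The `default` namespace and the port entry are assumptions.

```python
import json


def external_name_service(name: str = "llm-server",
                          host: str = "host.k3d.internal",
                          port: int = 18080,
                          namespace: str = "default") -> dict:
    """A Service whose in-cluster DNS name resolves to a host outside the cluster."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "ExternalName",
            # Cluster DNS answers lookups for this Service with a CNAME to `host`.
            "externalName": host,
            # No proxying happens for ExternalName; clients connect to host:port directly.
            "ports": [{"name": "http", "port": port, "protocol": "TCP"}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(external_name_service(), indent=2))
```

In this walkthrough, the Pulumi program creates the Service for you, so the manifest is only a way to see what gets deployed.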

Why not run the LLM inside Kubernetes on this Mac? Pulumi can do that, and the example supports it with runtimeMode=cluster, but that path is meant for Linux hosts with NVIDIA or AMD GPU device plugins.

On macOS, llama.cpp enables Metal by default, and Metal acceleration is available to native macOS processes. k3d runs Linux containers through Docker Desktop, so those pods do not get direct access to the Mac’s Metal device. Docker’s own vLLM Metal announcement calls out the same boundary: Metal-backed inference runs natively on the host because there is no Metal GPU passthrough for containers. That is why this setup keeps inference host-native and lets Pulumi manage the Kubernetes UI, service wiring, and optional Tailscale access around it.

Clone the examples repo, navigate to the program directory, and initialize a new stack:

```shell
git clone https://github.com/pulumi/examples.git
cd examples/kubernetes-py-self-host-gemma4-llm
pulumi stack init gemma4-local
```

Configure the program to match your local setup:

```shell
pulumi config set hostLlmPort 18080
pulumi config set llmBaseUrl http://llm-server:18080/v1
```

The Kubernetes service named llm-server maps to host.k3d.internal. In our validation, a disposable k3d pod could reach the Mac’s llama.cpp API at http://llm-server:18080/v1/models after a CoreDNS restart, so restart CoreDNS before you deploy:

```shell
kubectl rollout restart deployment coredns -n kube-system
```

Run pulumi up to deploy Open WebUI and connect it to host-native llama.cpp:

```shell
pulumi up
```

In our validation environment, this command successfully reached the resource synthesis phase without requiring Tailscale credentials because Tailscale exposure is opt-in.

Access Open WebUI through Tailscale

Tailscale lets you reach your private Open WebUI instance from any device on your tailnet. Only the web interface is exposed. The raw LLM API stays bound to 127.0.0.1 on the Mac, which keeps the unauthenticated model endpoint off the network.

The base Open WebUI deployment works without Tailscale credentials. To expose the web UI on your tailnet, enable the Tailscale resources and provide an explicit API key or an OAuth/identity token:

```shell
pulumi config set enableTailscale true
pulumi config set tailscale:apiKey YOUR_API_KEY --secret
```

Once configured, Pulumi will create a Tailscale device or proxy that routes traffic to your Open WebUI service.

Use the model with Pi

Open WebUI gives you a browser-based chat interface, but local models are also useful from coding agents. This validation used Pi, a coding agent that can point at the same OpenAI-compatible llama.cpp endpoint and use the model running on your Mac. If you do not use Pi, treat this section as an example of how any OpenAI-compatible client can target that endpoint.

For a fresh Pi config, create ~/.pi/agent/models.json with a local provider that points at the llama.cpp API:

```json
{
  "providers": {
    "local-llama": {
      "baseUrl": "http://127.0.0.1:18080/v1",
      "api": "openai-completions",
      "apiKey": "local",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "unsloth/gemma-4-12b-it-GGUF",
          "name": "Gemma 4 12B Q8 (local llama.cpp)",
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 131072,
          "maxTokens": 1024,
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          }
        }
      ]
    }
  }
}
```

Then set Pi to use that provider and model by default in ~/.pi/agent/settings.json:

```json
{
  "defaultProvider": "local-llama",
  "defaultModel": "unsloth/gemma-4-12b-it-GGUF",
  "defaultThinkingLevel": "off",
  "hideThinkingBlock": true
}
```

If you already have Pi configuration files, merge the local-llama provider and defaults into your existing JSON instead of replacing the files.
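For models.json, the merge amounts to adding or replacing one key under `providers` while keeping everything else. The minimal TypeScript sketch below assumes the file shape shown above and leaves out file I/O.

```typescript
// Sketch: add or replace one provider in an existing Pi models.json object
// while keeping every other provider and top-level key. Shape assumed from the example above.
interface PiModelsFile {
  providers?: Record<string, unknown>;
  [key: string]: unknown;
}

function mergeProvider(existing: PiModelsFile, name: string, provider: unknown): PiModelsFile {
  return {
    ...existing,
    // Copy the providers map so the input object is not mutated.
    providers: { ...(existing.providers || {}), [name]: provider },
  };
}
```

To use it, parse the existing file with `JSON.parse`, pass the result through `mergeProvider` with the `local-llama` object, and write `JSON.stringify(result, null, 2)` back. settings.json needs only a shallow spread of the new default keys over the existing object.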

Advanced: Linux GPU in-cluster serving

If you’re running on a Linux host with an NVIDIA or AMD GPU, you can run the LLM directly inside the Kubernetes cluster. This requires the matching device plugin (the NVIDIA device plugin or the AMD ROCm device plugin) to be installed in the cluster.

The Pulumi program supports this through runtimeMode=cluster. In this mode, it deploys an LlmServer pod that manages the llama.cpp process within the cluster and uses GPU resource requests so the pod is scheduled with hardware acceleration.
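In Kubernetes, GPU access works through extended resources that the device plugin advertises on each node. The sketch below builds a container `resources` block for either vendor. It assumes the standard resource names registered by the NVIDIA device plugin (`nvidia.com/gpu`) and the AMD ROCm device plugin (`amd.com/gpu`). It is an illustration, not the post's LlmServer implementation.

```typescript
// Sketch: container resources for a GPU-backed llama.cpp pod (not the post's LlmServer code).
// Resource names are those advertised by the NVIDIA and AMD ROCm device plugins.
type GpuVendor = "nvidia" | "amd";

const gpuResourceName: Record<GpuVendor, string> = {
  nvidia: "nvidia.com/gpu",
  amd: "amd.com/gpu",
};

interface ContainerResources {
  requests: Record<string, string>;
  limits: Record<string, string>;
}

function gpuContainerResources(vendor: GpuVendor, gpus: number, memory: string): ContainerResources {
  // GPUs are scheduled as whole devices, so only positive integers make sense.
  if (gpus < 1 || Math.floor(gpus) !== gpus) {
    throw new Error("gpus must be a positive integer, got " + gpus);
  }
  const limits: Record<string, string> = { memory };
  limits[gpuResourceName[vendor]] = String(gpus);
  return {
    // Requesting memory lets the scheduler reserve room for the model weights.
    requests: { memory },
    // GPUs are set in limits; for extended resources Kubernetes uses the limit as the request.
    limits,
  };
}
```

The returned object would go in the `resources` field of the llama.cpp container in a Deployment or Pod spec. The right memory value (for example "16Gi") depends on the model and quantization you serve.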

Cleanup

When you’re done, you can tear down the resources:

```bash
pulumi destroy
k3d cluster delete pulumi-gemma4-blog-qa
# Stop llama.cpp using the PID from your terminal
kill <PID>
```

In an AWS Architecture Blog case study, AWS reports that Deloitte’s move to a virtual cluster model on Amazon EKS resulted in 89% faster testing environment provisioning. According to the case study, consolidating dozens of disparate clusters into a single host cluster running more than 50 vCluster instances saved Deloitte about 500 QA hours per year. This “Environment Factory” pattern lets platform teams provide isolated, ephemeral Kubernetes environments on demand without the cost or lag of provisioning full clusters.

This post adapts that general architecture with Pulumi to orchestrate Amazon EKS Auto Mode and vCluster.

The problem: environment sprawl and provisioning lag

Traditional development workflows often rely on one full EKS cluster per developer or feature branch. While this provides strong isolation, it introduces major pain points. Provisioning a full cluster can take 15 minutes or more, which slows down CI/CD pipelines. Managing dozens of clusters also leads to high costs and significant operational overhead.

Platform teams need a “soft multi-tenancy” model. This model should feel like a dedicated cluster to the developer but run on shared infrastructure to keep costs low and startup times fast.

Architecture overview: the host and the tenants

The environment factory architecture consists of two main layers.

  1. Host cluster: A single, reliable EKS cluster managed with EKS Auto Mode. This cluster provides the underlying compute, networking, and storage.
  2. Tenant environments: Virtual clusters (vCluster) running as pods within host namespaces.

According to the vCluster architecture documentation, a virtual control plane handles API requests while a syncer maps virtual resources onto the host cluster. This separation lets tenants manage their own CRDs, namespaces, and RBAC, while platform teams use quotas, NetworkPolicies, pod security, IAM boundaries, and node isolation controls to protect the host and other tenants.
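To make the syncer's role concrete: a pod created inside the virtual cluster becomes a real pod in the tenant's host namespace under a translated name. The sketch below follows the `<name>-x-<namespace>-x-<vcluster>` pattern vCluster uses for synced pods, in simplified form. It skips the hashing vCluster applies to names that would be too long, so treat it as an illustration rather than the syncer's actual code.

```typescript
// Simplified illustration of how a vCluster syncer maps a virtual object to a host object.
// The "-x-" pattern follows vCluster's naming for synced pods; the real syncer also
// shortens over-long names with a hash, which is omitted here.
interface VirtualRef {
  name: string;
  namespace: string; // namespace inside the virtual cluster
}

interface HostRef {
  name: string;
  namespace: string; // host namespace where the vCluster is installed
}

function toHost(obj: VirtualRef, vclusterName: string, hostNamespace: string): HostRef {
  return {
    // Encoding the virtual namespace in the name prevents collisions when several
    // virtual namespaces share one host namespace.
    name: `${obj.name}-x-${obj.namespace}-x-${vclusterName}`,
    namespace: hostNamespace,
  };
}
```

For example, a pod `web` in the virtual `default` namespace of `vcluster-alpha` would appear on the host as `web-x-default-x-vcluster-alpha` in `tenant-alpha`. Host-side controls such as quotas and NetworkPolicies therefore apply to it.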

Implementation: the EKS Auto Mode host

EKS Auto Mode simplifies the host cluster by automating infrastructure management. It handles node provisioning, scaling, and updates based on pod requirements.

The following snippet shows how to define an EKS cluster with Auto Mode enabled using Pulumi.

```typescript
import * as awsx from "@pulumi/awsx";
import * as eks from "@pulumi/eks";
import * as k8s from "@pulumi/kubernetes";
import { SubnetType } from "@pulumi/awsx/ec2";

const clusterName = "environment-factory";

const vpc = new awsx.ec2.Vpc("environment-factory", {
    enableDnsHostnames: true,
    cidrBlock: "10.0.0.0/16",
    subnetSpecs: [
        {
            type: SubnetType.Public,
            tags: {
                [`kubernetes.io/cluster/${clusterName}`]: "shared",
                "kubernetes.io/role/elb": "1",
            },
        },
        {
            type: SubnetType.Private,
            tags: {
                [`kubernetes.io/cluster/${clusterName}`]: "shared",
                "kubernetes.io/role/internal-elb": "1",
            },
        },
    ],
    subnetStrategy: "Auto",
});

// Create an EKS cluster with Auto Mode enabled.
const hostCluster = new eks.Cluster("host-cluster", {
    name: clusterName,
    authenticationMode: eks.AuthenticationMode.Api, // Use API authentication mode for EKS access entries.
    vpcId: vpc.vpcId,
    publicSubnetIds: vpc.publicSubnetIds,
    privateSubnetIds: vpc.privateSubnetIds,
    autoMode: {
        enabled: true,
    },
});

const hostProvider = new k8s.Provider("host-provider", {
    kubeconfig: hostCluster.kubeconfig,
});
```

Implementation: the environment factory

Once the host cluster is ready, we can build the factory that stamps out tenant environments. Each tenant needs a dedicated namespace, resource quotas, and the vCluster itself.
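A factory needs a consistent way to derive those per-tenant pieces from a tenant name. The sketch below is a hypothetical helper, not part of the post's code. It produces the namespace name, Helm release name, expected kubeconfig secret name, and quota for a tenant. The "small" profile reuses the quota numbers from the guardrail example below. The "large" profile and the `vc-<release>` secret convention are assumptions to check against your setup and chart version.

```typescript
// Hypothetical helper that derives per-tenant names and quotas for the factory.
type Profile = "small" | "large";

// "small" mirrors the tenant-alpha quota in this post; "large" is an illustrative assumption.
const quotaProfiles: Record<Profile, Record<string, string>> = {
  small: { pods: "20", "requests.cpu": "4", "requests.memory": "8Gi", "limits.cpu": "8", "limits.memory": "16Gi" },
  large: { pods: "50", "requests.cpu": "8", "requests.memory": "16Gi", "limits.cpu": "16", "limits.memory": "32Gi" },
};

interface TenantPlan {
  namespace: string;
  releaseName: string;
  kubeconfigSecret: string;
  quota: Record<string, string>;
}

// Namespace names must be RFC 1123 DNS labels: lowercase alphanumerics and '-', at most 63 chars.
const dnsLabel = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/;

function planTenant(tenant: string, profile: Profile = "small"): TenantPlan {
  const namespace = `tenant-${tenant}`;
  if (namespace.length > 63 || !dnsLabel.test(namespace)) {
    throw new Error(`invalid tenant name: ${tenant}`);
  }
  const releaseName = `vcluster-${tenant}`;
  return {
    namespace,
    releaseName,
    // Matches the vc-vcluster-alpha secret name used later in this post.
    kubeconfigSecret: `vc-${releaseName}`,
    quota: { ...quotaProfiles[profile] },
  };
}
```

Each field would feed the corresponding Namespace, ResourceQuota, Helm Release, and Secret lookup, so adding a tenant becomes a matter of adding one name.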

Tenant guardrails

Before installing vCluster, we set up a namespace and resource quotas to ensure one tenant cannot consume all host resources.

```typescript
import * as k8s from "@pulumi/kubernetes";

// Define a tenant namespace on the host cluster.
const tenantNamespace = new k8s.core.v1.Namespace("tenant-alpha", {
    metadata: { name: "tenant-alpha" },
}, { provider: hostProvider });

// Apply resource quotas to the tenant namespace.
// With compute quotas set, pods in this namespace must declare CPU and memory requests/limits
// (or inherit defaults from a LimitRange); otherwise the API server rejects them.
const quota = new k8s.core.v1.ResourceQuota("tenant-quota", {
    metadata: { namespace: tenantNamespace.metadata.name },
    spec: {
        hard: {
            pods: "20",
            "requests.cpu": "4",
            "requests.memory": "8Gi",
            "limits.cpu": "8",
            "limits.memory": "16Gi",
        },
    },
}, { provider: hostProvider });

// Define a Role for the tenant within their namespace.
const tenantRole = new k8s.rbac.v1.Role("tenant-role", {
    metadata: { namespace: tenantNamespace.metadata.name },
    rules: [{
        apiGroups: [""],
        resources: ["pods", "services", "configmaps", "secrets"],
        verbs: ["get", "list", "watch", "create", "update", "patch", "delete"],
    }],
}, { provider: hostProvider });

// Bind the Role to a tenant user or group.
const tenantRoleBinding = new k8s.rbac.v1.RoleBinding("tenant-role-binding", {
    metadata: { namespace: tenantNamespace.metadata.name },
    subjects: [{
        kind: "User",
        // Replace "tenant-user" with the IAM-mapped user or group for this tenant.
        name: "tenant-user",
        apiGroup: "rbac.authorization.k8s.io",
    }],
    roleRef: {
        kind: "Role",
        name: tenantRole.metadata.name,
        apiGroup: "rbac.authorization.k8s.io",
    },
}, { provider: hostProvider });
```

For production use, map these Kubernetes identities to IAM principals with EKS access entries. Older clusters may still rely on the legacy aws-auth ConfigMap, but the host cluster above uses the API authentication mode, which reads only access entries.

Deploying vCluster with Helm

We use the kubernetes.helm.v3.Release resource to install vCluster. It performs the Helm install, upgrade, and uninstall for the release, so Pulumi manages vCluster as a single Helm release. Adjust the values block for each tenant profile to control resource synchronization and control plane behavior. Review the vCluster release notes when changing chart versions, because the values schema and generated secret names can change across releases.

```typescript
import * as k8s from "@pulumi/kubernetes";

// Install vCluster using the Helm Release resource.
const vcluster = new k8s.helm.v3.Release("vcluster-alpha", {
    name: "vcluster-alpha",
    chart: "vcluster",
    version: "0.20.0", // Tested with vCluster 0.20.x; review release notes before changing versions.
    repositoryOpts: {
        repo: "https://charts.loft.sh",
    },
    namespace: tenantNamespace.metadata.name,
    values: {
        // Explicit sync configuration; adjust per tenant profile.
        sync: {
            toHost: {
                pods: { enabled: true },
            },
        },
    },
}, { provider: hostProvider });
```
Accessing the virtual cluster

vCluster generates a kubeconfig for the virtual API server and stores it in a secret in the tenant namespace. Treat this kubeconfig as a secret. Before using it to create resources inside the virtual cluster, make sure the endpoint it contains is reachable from the Pulumi runner.

```typescript
import * as pulumi from "@pulumi/pulumi";
import * as k8s from "@pulumi/kubernetes";

// Retrieve the vCluster kubeconfig from the generated secret.
// The vCluster-generated secret can lag behind Helm release readiness on first creation,
// so teams may choose an explicit readiness check or rerun after the virtual control plane initializes.
const vclusterKubeconfig = k8s.core.v1.Secret.get("vcluster-secret",
    pulumi.interpolate`${tenantNamespace.metadata.name}/vc-vcluster-alpha`,
    {
        provider: hostProvider,
        dependsOn: [vcluster],
    }
).data.apply(data => Buffer.from(data["config"], "base64").toString());

// Export the kubeconfig as a secret.
export const tenantKubeconfig = pulumi.secret(vclusterKubeconfig);

// Create a provider for the virtual cluster using the secret kubeconfig.
const vclusterProvider = new k8s.Provider("vcluster-provider", {
    kubeconfig: tenantKubeconfig,
});
```

Operational caveats

  • RBAC and permissions: vCluster generates default RBAC rules that work for most scenarios. However, if your host cluster is heavily locked down, you may need to provide additional permissions to the vCluster service account.
  • Helm release previews: When using kubernetes.helm.v3.Release, Pulumi previews may not show every detail of the rendered Kubernetes resources. It primarily tracks the state of the Helm release itself.
  • EKS Auto Mode node lifetime: EKS Auto Mode uses immutable AMIs and replaces nodes after a maximum lifetime of 21 days. When nodes are replaced, Kubernetes reschedules vCluster pods and tenant workloads, so configure replicas, PodDisruptionBudgets, resource requests, and persistent storage to tolerate the disruption.

Conclusion: ephemeral environments at scale

By combining Pulumi with EKS Auto Mode and vCluster, you can build a scalable environment factory. This approach provides the isolation developers need while maintaining the speed and cost-efficiency required by platform teams.

The snippets provided here are adapted for illustration. In a production environment, you would likely wrap these resources in a Pulumi ComponentResource to give your internal developers a clean, reusable API. When a feature branch is merged, running pulumi destroy on its stack (and then pulumi stack rm) removes the resources that stack manages. As part of cleanup, still validate namespace finalizers, persistent volume reclaim policies, and any external cloud artifacts the stack does not track.

For more on managing EKS with Pulumi, see the EKS guide.

AI coding has two shapes right now. One agent in a loop, sequential work, you babysitting the chat window. Call that 2x. Most teams live here. Five agents in worktrees, parallel work, fresh-context review on every change. Call that 10x. The trick: 2x is mostly prompting, 10x is mostly plumbing.

The parallel coding playbook is a five-pattern setup for running multiple AI coding agents at the same time without them stepping on each other: an issue used as the spec, a plan/build/validate loop, parallel git worktrees, fresh-session review, and a self-healing layer. The whole thing targets application code. The interesting question, and the one I keep ending up at, is what changes when the five agents are touching infrastructure.

2x is prompting, 10x is plumbing

2x is one human, one agent, one repo, one branch. The agent writes, you review, you tell it to try again, it tries again. The bottleneck is your attention. Whatever the agent’s raw throughput, your reading speed sets the ceiling.

10x moves you out of the per-change loop and into the issue loop. You write five issues with sharp acceptance criteria, send each one to its own agent in its own worktree, and let them plan, build, and validate end-to-end. You read five PRs at lunch instead of pair-programming on one all morning.

Concurrent isolation does the work. And isolation is mostly an infrastructure problem.

The five pillars

The five pillars, in brief.

  1. Issue is the spec. The GitHub issue carries the acceptance criteria. The pull request is the artifact that gets validated. Input and output of every implementation are versioned, scoped, and reviewable on their own.
  2. Plan, build, validate. Three stages, three artifacts. A markdown plan you can read in thirty seconds. A build that produces a diff. A validate step that checks the diff against the spec.
  3. Parallel worktrees. Each agent runs in its own git worktree so concurrent changes never trample each other. One repo, five working trees, five branches.
  4. Fresh-session review. A different agent, a different conversation, no shared context, reads the output and judges it. The reviewer never sees the writer’s chat. An agent reviewing its own output in the same context is theater.
  5. Self-healing layer. When the same issue keeps coming back, fix the system that allowed it. Update the rules, the skills, the AGENTS.md. The agent gets better; the bug class disappears.
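The self-healing pillar needs a concrete signal for "the same issue keeps coming back." A minimal sketch, assuming each fresh-session review emits the IDs of the rules a pull request failed; the `ReviewFinding` shape, the `recurringRules` helper, and the default threshold of 3 distinct PRs are hypothetical choices, not part of any Pulumi API. Rules that cross the threshold are the ones whose fix belongs in the rules, skills, or AGENTS.md rather than in the next PR.

```typescript
// Hypothetical shape: one entry per failed rule per pull-request review.
interface ReviewFinding {
  pr: number;
  ruleId: string;
}

// Returns rule IDs that failed in at least `threshold` distinct PRs,
// most frequent first. Repeats within one PR count once.
function recurringRules(findings: ReviewFinding[], threshold = 3): string[] {
  const prsByRule = new Map<string, Set<number>>();
  for (const f of findings) {
    const prs = prsByRule.get(f.ruleId) ?? new Set<number>();
    prs.add(f.pr);
    prsByRule.set(f.ruleId, prs);
  }
  return Array.from(prsByRule.entries())
    .filter(([, prs]) => prs.size >= threshold)
    .sort((a, b) => b[1].size - a[1].size)
    .map(([ruleId]) => ruleId);
}
```

Counting distinct PRs rather than raw findings keeps one noisy PR from looking like a systemic problem.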

The application-code version of this playbook leans on ports, node_modules, and databases to get isolation right. The infrastructure version has a different toolbox.

What changes when the agents are touching infrastructure

Walk the pillars again, this time with a Pulumi shop in mind.

Issue is the spec. For application code, the spec describes behavior. For infrastructure, the spec is a Pulumi component contract plus a CrossGuard policy excerpt. “The resulting bucket is private, lives in eu-west-1, has SSE-KMS, and is tagged owner=team-x.” That sentence compiles to a typed component signature and three policy assertions. The agent does not get to interpret “looks right.” The acceptance criteria are deterministic, which is the whole reason this works.
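To make "compiles to a typed component signature and three policy assertions" concrete, here is a minimal sketch. The `PrivateBucketArgs` and `BucketFacts` shapes and the `failedCriteria` helper are hypothetical; in a real setup the facts would come from the component's outputs or the preview, and the assertions would live in a CrossGuard policy pack rather than a plain function. The region is encoded as a literal type, so a wrong region fails to compile; the other three criteria are runtime checks.

```typescript
// Hypothetical component signature: the region is a literal type,
// so passing any other region is a compile-time error.
interface PrivateBucketArgs {
  region: "eu-west-1";
  ownerTag: string; // e.g. "team-x"
}

// Hypothetical flattened view of the bucket the component produced.
interface BucketFacts {
  publicAccessBlocked: boolean;
  sseAlgorithm?: string; // "aws:kms" means SSE-KMS
  tags: Record<string, string>;
}

// The remaining three acceptance criteria as deterministic checks.
// Returns the names of the criteria that failed; empty means accepted.
function failedCriteria(args: PrivateBucketArgs, bucket: BucketFacts): string[] {
  const checks: Array<[string, boolean]> = [
    ["private", bucket.publicAccessBlocked],
    ["sse-kms", bucket.sseAlgorithm === "aws:kms"],
    ["owner-tag", bucket.tags["owner"] === args.ownerTag],
  ];
  return checks.filter(([, ok]) => !ok).map(([name]) => name);
}

const spec: PrivateBucketArgs = { region: "eu-west-1", ownerTag: "team-x" };
```

Writing `{ region: "us-east-1", ownerTag: "team-x" }` as a `PrivateBucketArgs` is rejected by the compiler, which is the "agent does not get to interpret" property in miniature.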

Plan, build, validate. Pulumi already ships the validate step: pulumi preview --json produces a deterministic, machine-readable diff that a second reviewer can judge without the conversation that produced it. The plan is a markdown doc the agent writes before touching code. The build is pulumi up --target against a review stack, scoped to the resources the issue covers. The validate step is the preview output plus the CrossGuard verdict.
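A validate step can check the preview against the issue's scope with no access to the writer's context. A minimal sketch, assuming the `pulumi preview --json` output contains a `steps` array whose entries carry `op` and `urn` fields (verify the shape your CLI version emits); the `allowedUrns` set standing in for "the resources the issue covers" and the `outOfScopeChanges` helper are hypothetical.

```typescript
// Assumed subset of the `pulumi preview --json` output shape.
interface PreviewStep {
  op: string; // e.g. "same", "create", "update", "delete", "replace"
  urn: string;
}
interface PreviewDigest {
  steps?: PreviewStep[];
}

// Returns one reason per changed resource that falls outside the issue's
// scope. Unchanged ("same") steps are ignored. Empty means in scope.
function outOfScopeChanges(preview: PreviewDigest, allowedUrns: Set<string>): string[] {
  return (preview.steps ?? [])
    .filter((s) => s.op !== "same" && !allowedUrns.has(s.urn))
    .map((s) => `${s.op} on ${s.urn} is outside the issue's scope`);
}
```

The reviewer only needs the JSON file and the list of URNs from the spec, which is exactly the cold-context input the fresh-session pillar asks for.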

Parallel worktrees. Worktrees alone are not enough. Two worktrees pointing at the same Pulumi stack collide on the first concurrent up: Pulumi Cloud rejects the second update while the first is in progress, and once both have run, each up reverts or deletes what the other worktree just deployed. The unit of isolation for infrastructure is the stack, not the worktree. Each worktree gets its own ephemeral review stack and its own ESC environment for credentials. State branches with the work, credentials branch with the work, and the cloud account does not see five agents elbowing each other.

Fresh-session review. The hardest part of the application-code version is keeping the reviewer cold. For infrastructure, the substrate hands you the cold context. The pulumi preview JSON has no memory of the prompt that produced it. A separate agent reading it has the same starting point a human reviewer has: a diff, a stack name, a policy report. Pulumi Neo reasons over the state graph directly, so the reviewer grounds every claim in what the change actually does, not what the writer says it does. Reviewer quality still depends on how well your policies cover the stack, but the cold-context part comes built in.

Self-healing layer. Most CrossGuard rule messages today read like assertions. “S3 bucket has no encryption.” A self-healing layer needs them to read like instructions. “S3 bucket has no encryption. Set serverSideEncryptionConfiguration with SSE-KMS to fix.” That single rewrite is the difference between an agent flailing and an agent fixing the violation on the first try. When the same rule keeps tripping, the fix is upstream of the next pull request: in the rules, in the skills, in the policy itself.
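One way to keep rule messages in instruction form is to store the problem and the fix side by side and always emit both. A minimal sketch; the `InstructiveRule` shape, `BucketProps`, and the `evaluate` helper are hypothetical. In an actual CrossGuard policy pack, the resulting string is what you would pass to `reportViolation` inside a `validateResourceOfType` callback.

```typescript
// Hypothetical rule shape: every rule carries its fix next to its problem.
interface InstructiveRule<T> {
  id: string;
  violates: (props: T) => boolean;
  problem: string; // the usual assertion: what is wrong
  fix: string; // the instruction: what the agent should change
}

// Hypothetical subset of bucket inputs the rule inspects.
interface BucketProps {
  serverSideEncryptionConfiguration?: unknown;
}

const s3Encryption: InstructiveRule<BucketProps> = {
  id: "s3-encryption",
  violates: (b) => b.serverSideEncryptionConfiguration === undefined,
  problem: "S3 bucket has no encryption.",
  fix: "Set serverSideEncryptionConfiguration with SSE-KMS to fix.",
};

// Returns the instruction-style message, or undefined when the rule passes.
function evaluate<T>(rule: InstructiveRule<T>, props: T): string | undefined {
  return rule.violates(props) ? `${rule.problem} ${rule.fix}` : undefined;
}
```

Making `fix` a required field means a rule cannot be added without an instruction, which keeps the whole pack in the form an agent can act on.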

The five catches, infra edition

Every parallelism story has a catch list. The application-code version lists port conflicts, node_modules sprawl, database conflicts, token blowouts, and PR pile-up. The infrastructure equivalents map almost one to one.

Port conflicts become stack-name collisions. Two agents naming their stack dev and racing each other into Pulumi Cloud. The fix is the same hash-the-path trick the app-code playbook uses: derive the stack name from pulumi.getProject() plus a hash of the worktree path. Resource names follow the same pattern. Collisions go away.
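The hash-the-path trick, sketched under assumptions: the `reviewStackName` helper, the `<project>-review-<hash>` format, and the 8-hex-character length are hypothetical conventions. Inside a Pulumi program, `pulumi.getProject()` returns the project name; this sketch takes it as a parameter so a wrapper script can compute the name before selecting or creating the stack.

```typescript
import { createHash } from "node:crypto";
import { resolve } from "node:path";

// Derives a stable, collision-resistant stack name for one worktree.
// Hypothetical convention: <project>-review-<first 8 hex chars of sha256(path)>.
function reviewStackName(project: string, worktreePath: string): string {
  const digest = createHash("sha256")
    .update(resolve(worktreePath)) // normalizes trailing slashes and relative segments
    .digest("hex")
    .slice(0, 8);
  return `${project}-review-${digest}`;
}
```

Because the name depends only on the project and the worktree path, re-running the agent in the same worktree lands on the same stack, while a second worktree always gets a different one. The same suffix can be reused for resource names and the worktree's ESC environment.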

node_modules sprawl becomes provider plugin sprawl, mostly already solved. Three worktrees each pulling their own copy of pulumi-aws would add up fast, except Pulumi already shares plugins through a single cache at ~/.pulumi/plugins. Identical provider versions are reused across worktrees automatically. Per-worktree language SDKs (node_modules, venv) still need the usual care, but the provider layer is free.

Database conflicts become state conflicts. Two agents racing each other into pulumi up on the same stack is the same hazard as two agents writing to the same migrated database. The app-code playbook reaches for Neon branches or per-worktree SQLite files to isolate state. The infra answer is simpler: each worktree gets its own review stack. State branches with the work, by construction.

Token blowouts become cloud spend per ephemeral stack. The cost vector flips. For app code, the worry is LLM bills. For infrastructure, the worry is what your five agents just spun up in five review stacks. The mitigations are boring and they work. Use TTL stacks to tear review stacks down on a schedule. Avoid retainOnDelete on review-stack resources so the teardown actually frees them. Cap retries per spec. Watch the bill.
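A TTL policy is easiest to audit when the deadline is computed in one place. A minimal sketch, assuming a 24-hour window with teardown on PR close, whichever comes first; `teardownAt` and `isExpired` are hypothetical helpers that only compute the deadline. The actual teardown (a Pulumi Cloud TTL stack, or pulumi destroy in a CI job) stays in your pipeline.

```typescript
const REVIEW_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

// When a review stack should be torn down: on PR close, or 24 hours
// after the PR opened, whichever comes first.
function teardownAt(openedAt: Date, closedAt?: Date): Date {
  const ttlDeadline = openedAt.getTime() + REVIEW_TTL_MS;
  if (closedAt === undefined) return new Date(ttlDeadline);
  return new Date(Math.min(ttlDeadline, closedAt.getTime()));
}

// True when the stack is due for teardown at time `now`.
function isExpired(now: Date, openedAt: Date, closedAt?: Date): boolean {
  return now.getTime() >= teardownAt(openedAt, closedAt).getTime();
}
```

A scheduled cleanup job can call `isExpired` for every open review stack, so a PR that is never closed still stops costing money after a day.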

PR pile-up is the same problem. Five reviewed diffs are still five things waiting on the merge queue. The infra-flavored mitigations: stack-scoped reviewers (the human who owns the stack approves the change to it), the Pulumi Cloud audit log for grouping by stack and time, and auto-merge for the narrow class of changes where the preview diff is clean and every policy passes. That last one is where most of the throughput hides.
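The auto-merge gate can be written as a single predicate. A minimal sketch, assuming you already have per-operation counts from the preview (the `changeSummary` field of `pulumi preview --json` reports counts keyed by operation; confirm the key names your CLI version uses) and a count of policy violations. Treating any delete or replacement as "not clean" is a hypothetical, deliberately conservative choice, as is the `eligibleForAutoMerge` name.

```typescript
// Operation counts, e.g. { same: 12, update: 1 } (assumed shape).
type ChangeSummary = Record<string, number>;

// Operations that disqualify a change from auto-merge (hypothetical policy).
const destructiveOps = ["delete", "replace", "create-replacement", "delete-replaced"];

// Clean means: nothing destroyed or replaced, and every policy passed.
function eligibleForAutoMerge(summary: ChangeSummary, policyViolations: number): boolean {
  const destructive = destructiveOps.some((op) => (summary[op] ?? 0) > 0);
  return !destructive && policyViolations === 0;
}
```

Everything the predicate rejects falls back to the stack-scoped human reviewer, so the gate only widens throughput for the changes that were already safe.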

Where to start, this afternoon

Three steps, in order, on a stack with a small blast radius.

  1. Write an AGENTS.md for the repo. Five paragraphs is enough. The component library, the stack naming convention, the policy rules, the review-stack TTL, and the one thing in this repo that bites every newcomer. Neo reads AGENTS.md natively, as do most coding agents. This file is the spec for how the agent should behave even before you write a spec for what it should build.
  2. Cut a 24-hour review-stack TTL. Spin up a review stack on PR open, tear it down on PR close or after 24 hours, whichever comes first. This is the gate that turns “ephemeral” from a slogan into a line item that does not appear on next month’s bill.
  3. Run three issues in parallel. Pick three open issues that touch unrelated resources. Spin up three worktrees, three review stacks, three ESC environments. Let each agent run end-to-end against its own stack. Then have a fourth agent read each preview JSON cold and produce a one-paragraph review. Read three PRs plus the reviewer’s summary at lunch.
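Before fanning out, it helps to check mechanically that the chosen issues really touch unrelated resources. A minimal sketch: the `PlannedIssue` shape, the `overlappingResources` helper, and the idea that each issue's target resource names are known up front (for example, listed in the markdown plan each agent writes) are all assumptions.

```typescript
// Hypothetical: each issue lists the resource names its plan expects to touch.
interface PlannedIssue {
  number: number;
  resources: string[];
}

// Maps every resource claimed by more than one issue to those issue numbers.
// An empty map means the issues are safe to run in parallel worktrees.
function overlappingResources(issues: PlannedIssue[]): Map<string, number[]> {
  const owners = new Map<string, number[]>();
  issues.forEach((issue) => {
    // Dedupe within one issue so a repeated name is not a false overlap.
    Array.from(new Set(issue.resources)).forEach((r) => {
      owners.set(r, (owners.get(r) ?? []).concat(issue.number));
    });
  });
  const overlaps = new Map<string, number[]>();
  owners.forEach((nums, r) => {
    if (nums.length > 1) overlaps.set(r, nums);
  });
  return overlaps;
}
```

If the map is not empty, swap one of the conflicting issues for another before spinning up worktrees, rather than discovering the overlap in two competing previews.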

That last step is the measurement. The first time you run it, half of the changes will fail validation. The second time, fewer. By the third time you will know whether your spec quality, your policies, and your stack hygiene are good enough to scale this to five, then ten, then to every issue tagged infra:fix.

If three issues finish cleanly, you have the substrate. If they do not, the gap is almost always in the spec or the policy rules, not the agent. Fix the spec, tighten the rule, run it again.

Five stacks before lunch

10x is five concurrent agents, working from five issues, against five stacks, behind five fresh-session reviews. The substrate is already there. Stacks isolate state. ESC isolates credentials. pulumi preview is the deterministic artifact a fresh reviewer can read cold. CrossGuard is the self-healing layer when you write the rule messages as instructions.

The remaining work is small and mostly wiring. Write the AGENTS.md. Cut the TTL. Pick three issues that touch unrelated resources. Read three PRs at lunch. Five stacks before the room empties out is a realistic Monday.

Today, we are announcing v1.0 of the Pulumi Service Provider: a major milestone in managing Pulumi Cloud with Pulumi itself. The provider is now generated directly from the Pulumi Cloud OpenAPI specification, unlocking a dramatically expanded pulumiservice:api/* resource surface and enabling Pulumi Cloud capabilities to become available in the provider faster than ever before.

This release also brings several major new capabilities to infrastructure as code, including fine-grained RBAC as code, Pulumi IDP as code, and audit log export as IaC. Together, these changes make the Pulumi Service Provider the most powerful and extensible way yet to manage and automate your Pulumi Cloud infrastructure.

Why this matters for users

Historically, every new Pulumi Cloud feature implied a follow-up PR in the provider before that feature could be used from a Pulumi program. The provider was always slightly behind the API it wrapped, and entirely new capability areas could take months to land.

The api/* surface changes both timelines. Because the schema is derived from the OpenAPI spec at runtime:

  1. Whole new resource families land in the provider in the same release they reach Pulumi Cloud.
  2. New fields, features, and enum values on existing resources show up across all five language SDKs soon after they appear in the spec.

What’s new in v1.0

v1.0 lifts whole capability areas of Pulumi Cloud into the api/* surface, not just incremental field additions. None of it required bespoke provider code.

  1. Fine-grained RBAC as code. Custom roles, organization membership, and team role assignments are now managed resources. For example, defining a read-only role and assigning it to a team:

    TypeScript:

    import * as ps from "@pulumi/pulumiservice";

    const readOnly = new ps.api.Role("readOnly", {
        orgName: "acme",
        name: "stack-reader",
        description: "Read-only access to stacks across the org.",
        uxPurpose: "role",
        details: {
            __type: "PermissionDescriptorAllow",
            permissions: ["stack:read", "stack:list"],
        },
    });

    new ps.api.teams.Role("readOnlyForPlatform", {
        orgName: "acme",
        teamName: "platform",
        roleID: readOnly.roleID,
    });
    Python:

    import pulumi_pulumiservice as pulumiservice

    read_only = pulumiservice.api.Role("readOnly",
        org_name="acme",
        name="stack-reader",
        description="Read-only access to stacks across the org.",
        ux_purpose="role",
        details={
            "__type": "PermissionDescriptorAllow",
            "permissions": ["stack:read", "stack:list"],
        })

    pulumiservice.api.teams.Role("readOnlyForPlatform",
        org_name="acme",
        team_name="platform",
        role_id=read_only.role_id)
    Go:

    readOnly, err := api.NewRole(ctx, "readOnly", &api.RoleArgs{
        OrgName:     pulumi.String("acme"),
        Name:        pulumi.String("stack-reader"),
        Description: pulumi.String("Read-only access to stacks across the org."),
        UxPurpose:   pulumi.String("role"),
        Details: pulumi.Map{
            "__type":      pulumi.String("PermissionDescriptorAllow"),
            "permissions": pulumi.StringArray{pulumi.String("stack:read"), pulumi.String("stack:list")},
        },
    })
    if err != nil {
        return err
    }

    _, err = teams.NewRole(ctx, "readOnlyForPlatform", &teams.RoleArgs{
        OrgName:  pulumi.String("acme"),
        TeamName: pulumi.String("platform"),
        RoleID:   readOnly.RoleID,
    })
    if err != nil {
        return err
    }
    C#:

    var readOnly = new Ps.Api.Role("readOnly", new()
    {
        OrgName = "acme",
        Name = "stack-reader",
        Description = "Read-only access to stacks across the org.",
        UxPurpose = "role",
        Details = ImmutableDictionary.CreateRange(new[]
        {
            new KeyValuePair<string, object>("__type", "PermissionDescriptorAllow"),
            new KeyValuePair<string, object>("permissions", new[] { "stack:read", "stack:list" }),
        }),
    });

    new Ps.Api.Teams.Role("readOnlyForPlatform", new()
    {
        OrgName = "acme",
        TeamName = "platform",
        RoleID = readOnly.RoleID,
    });
    Java:

    var readOnly = new Role("readOnly", RoleArgs.builder()
        .orgName("acme")
        .name("stack-reader")
        .description("Read-only access to stacks across the org.")
        .uxPurpose("role")
        .details(Map.of(
            "__type", "PermissionDescriptorAllow",
            "permissions", List.of("stack:read", "stack:list")))
        .build());

    new com.pulumi.pulumiservice.api_teams.Role("readOnlyForPlatform",
        com.pulumi.pulumiservice.api_teams.RoleArgs.builder()
            .orgName("acme")
            .teamName("platform")
            .roleID(readOnly.roleID())
            .build());
    <span class="line"><span class="cl"><span class="nt">resources</span><span class="p">:</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">readOnly</span><span class="p">:</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">pulumiservice:api:Role</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">properties</span><span class="p">:</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">orgName</span><span class="p">:</span><span class="w"> </span><span class="l">acme</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">stack-reader</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">description</span><span class="p">:</span><span class="w"> </span><span class="l">Read-only access to stacks across the org.</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">uxPurpose</span><span class="p">:</span><span class="w"> </span><span class="l">role</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">details</span><span class="p">:</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">__type</span><span class="p">:</span><span class="w"> </span><span class="l">PermissionDescriptorAllow</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">permissions</span><span class="p">:</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">stack:read</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span>- <span class="l">stack:list</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">readOnlyForPlatform</span><span class="p">:</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">pulumiservice:api/teams:Role</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">properties</span><span class="p">:</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">orgName</span><span class="p">:</span><span class="w"> </span><span class="l">acme</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">teamName</span><span class="p">:</span><span class="w"> </span><span class="l">platform</span><span class="w">
    </span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">roleID</span><span class="p">:</span><span class="w"> </span><span class="l">${readOnly.roleID}</span><span class="w">
    </span></span></span>
  2. Pulumi IDP as code. services:Service makes the Pulumi IDP catalog manageable from your Pulumi programs, arriving alongside the IDP release in Pulumi Cloud. Platform teams can publish service definitions as code rather than only through the IDP console.

  3. Audit-log export as IaC. AuditLogExportConfiguration brings audit-log export sinks under Pulumi management with a real destroy path.

How it works

Pulumi Cloud’s OpenAPI document (published at https://api.pulumi.com/api/openapi/pulumi-spec.json) is embedded in the provider binary at build time, so the provider version you pin is the API surface you get. Preview and update are deterministic, and a version released today will still behave the same way years from now. Alongside the spec, the runtime loads a small companion metadata file that captures the Pulumi-specific semantics OpenAPI can’t express: which endpoints pair into a single resource, what a resource’s ID looks like, and which response fields are secrets that arrive exactly once at create time. That metadata is what lets api/* resources behave as expected.
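To make the endpoint-pairing idea concrete, here is a minimal Python sketch that groups an OpenAPI paths object into CRUD-shaped resources. It is not the provider's scaffolder: the pair_endpoints name, the heuristic (POST on a collection path plus GET/PATCH/DELETE on the matching item path), and the sample paths are all assumptions for illustration, not excerpts from the Pulumi Cloud spec.

```python
"""Hypothetical sketch: pair OpenAPI operations into CRUD-shaped resources.

Not the pulumi-pulumiservice scaffolder; the heuristic below is an assumption.
"""
import re
from collections import defaultdict

ITEM_SUFFIX = re.compile(r"/\{[^/]+\}$")  # a trailing path parameter such as /{roleID}
VERBS = {"get": "read", "patch": "update", "delete": "delete"}


def pair_endpoints(paths: dict) -> dict:
    """Group POST on a collection path with GET/PATCH/DELETE on its item path."""
    resources = defaultdict(dict)
    for path, operations in paths.items():
        is_item = bool(ITEM_SUFFIX.search(path))
        for method in operations:
            method = method.lower()
            if method == "post" and not is_item:
                resources[path]["create"] = f"POST {path}"
            elif method in VERBS and is_item:
                collection = ITEM_SUFFIX.sub("", path)
                resources[collection][VERBS[method]] = f"{method.upper()} {path}"
    # Only groups that can be both created and read back make a usable resource.
    return {name: ops for name, ops in resources.items()
            if {"create", "read"}.issubset(ops)}


# Illustrative paths, invented for this example.
spec_paths = {
    "/api/orgs/{orgName}/roles": {"get": {}, "post": {}},
    "/api/orgs/{orgName}/roles/{roleID}": {"get": {}, "patch": {}, "delete": {}},
    "/api/orgs/{orgName}": {"get": {}},
}
paired = pair_endpoints(spec_paths)
print(paired)
```

The read-only /api/orgs/{orgName} path is dropped because nothing creates it, which is the kind of judgment the real metadata file records explicitly instead of guessing.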

Most of that metadata is auto-derived by a scaffolder, but the editorial layer, including resource descriptions, examples, and the v0 aliases that make migration safe, stays handmade. Any human override is pinned across regeneration, so a future spec change can't quietly undo it. The language SDKs are still generated against the runtime schema, so new fields and enum values reach typed SDKs in all five languages the moment the spec ships.
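The "override is pinned" rule amounts to overlaying hand-written values on every fresh generation. A minimal sketch, assuming metadata is a nested dict; merge_metadata and the field names below are hypothetical, not the provider's schema.

```python
"""Hypothetical sketch: regenerate metadata without losing hand-written overrides."""
from copy import deepcopy


def merge_metadata(generated: dict, overrides: dict) -> dict:
    """Overlay overrides on freshly generated metadata; an override always wins."""
    merged = deepcopy(generated)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_metadata(merged[key], value)  # recurse into nested sections
        else:
            merged[key] = deepcopy(value)  # pinned: the human value replaces the generated one
    return merged


# Invented example data; field names are illustrative only.
overrides = {"Role": {"description": "A custom role scoped to one org."}}
first_run = {"Role": {"description": "Auto-derived text", "idFormat": "{orgName}/{roleID}"}}
second_run = {"Role": {"description": "Reworded by a spec change", "idFormat": "{orgName}/{roleID}"}}

for generated in (first_run, second_run):
    print(merge_metadata(generated, overrides)["Role"]["description"])
```

Both runs print the hand-written description, while fields nobody overrode (idFormat here) keep tracking the spec.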

What the api namespace covers

The api namespace already spans most of Pulumi Cloud’s resource model.

For resources that have an ancestor under pulumiservice:index:*, the mapping lives in docs/v0-api-coverage.md. That file is auto-generated, so it stays in sync. Each api/* resource ships hand-maintained per-language examples in TypeScript, Python, Go, C#, Java, and YAML.

What to know before adopting the preview

The pulumiservice:api:* resource surface is in preview. Resource shape and module layout may change before GA.

The existing pulumiservice:index:* resources remain supported and are not being deprecated as part of v1.0. Migration to api/* is opt-in via Pulumi aliases.

Try it

If you want to take the expanded provider for a spin:

  1. The Pulumi Registry page for pulumiservice has install instructions for every language.
  2. The examples/api/ directory has runnable programs for each resource, in every supported language.
  3. The pulumi-pulumiservice repo is open source if you want to read the runtime, the embedded spec, or the metadata file directly.

Feedback during the preview is especially valuable. Please open an issue in the pulumi-pulumiservice repo if you run into any problems.

Anthropic shipped a piece earlier this month called How Claude Code Works in Large Codebases. I have not read anything more useful about coding agents this year. The core claim, in their words: “the ecosystem built around the model—the harness—determines how Claude Code performs more than the model alone.” In my phrasing: in a real codebase, the model is the smaller variable. The layer of context and tooling you wire around the agent matters more than which version of Sonnet or Opus is behind it.

The post stays high-level, which is the right move for a launch piece. What I want to do here is land it. Same seven pieces, but with the wiring you would actually put in a repo, in the order I would put it.

How Claude Code navigates without an index

Anthropic’s writeup says Claude Code works from the live codebase and does not require a codebase index to be built, maintained, or uploaded. The agent navigates the way an engineer would, with grep, find, ls, file reads, and reference-following. Anthropic calls this agentic search, and the upside is obvious: no separate index exists for you to keep fresh.

The downside is also obvious. An engineer who has never seen your repo and only has shell tools will flounder if you drop them in the root with no map. That is your agent on day one. Everything that follows is about giving it the map.

The AI layer in seven pieces

Every codebase used to have two artifacts engineers cared about: the code and the tests. A third exists now. Call it the AI layer, or the harness, or whatever you want. This layer is the set of context and tools you give your coding agent to operate in this specific repo. Anthropic breaks it into seven pieces, and each one solves a different scaling problem.

Anthropic gives each piece a role: CLAUDE.md is the foundation, hooks do self-improvement, skills are progressive disclosure, plugins handle distribution, LSP gives navigation, MCP is extension, subagents split exploration from editing. They are not equal in usage either. CLAUDE.md is read at the start of each session and stays in context for the duration. The others fire when relevant.

Lean and layered CLAUDE.md

The single biggest mistake I see is a root CLAUDE.md that has grown into a small book. Two thousand lines of conventions for parts of the repo the current task will never touch. Every session pays the tax. Anthropic’s own guidance is to keep these files focused on what applies broadly so they do not become a drag on performance, and you can feel that drag in practice: the agent gets cautious, slow, and oddly literal.

Keep the root file lean. What is this repo, broadly. The tech stack. The commands the agent will need (make test, make lint, how to run the dev server). General conventions that apply everywhere. That is most of what belongs there.

Local conventions go in subdirectory CLAUDE.md files. When the agent starts in a subdirectory, Claude Code walks upward from the working directory and loads every CLAUDE.md it finds on the way to the repo root, so root context is never lost and intermediate layers stack in the order you would expect. Claude Code can also discover files below the current working directory when it reads files in those subdirectories. That means services/api/CLAUDE.md only joins the session when the work reaches that service. Same for services/billing/, the frontend, the data layer.
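A small Python model of that upward walk makes the stacking order concrete. It only mirrors the behavior described above: claude_md_chain is a hypothetical helper, not Claude Code's loader, and it ignores the on-demand loading of files below the working directory.

```python
"""Model of the upward CLAUDE.md walk described above (not Claude Code's own loader)."""
import tempfile
from pathlib import Path


def claude_md_chain(cwd: Path, repo_root: Path) -> list[Path]:
    """Return every CLAUDE.md from repo_root down to cwd, root first."""
    cwd, repo_root = cwd.resolve(), repo_root.resolve()
    found = []
    current = cwd
    while True:
        candidate = current / "CLAUDE.md"
        if candidate.is_file():
            found.append(candidate)
        if current == repo_root or current == current.parent:
            break  # stop at the repo root (or the filesystem root as a safety net)
        current = current.parent
    return found[::-1]  # broadest context first, most specific last


# Usage: a throwaway repo with a root file and one service-level file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "services" / "api" / "handlers").mkdir(parents=True)
    (root / "CLAUDE.md").write_text("root conventions")
    (root / "services" / "api" / "CLAUDE.md").write_text("api conventions")
    chain = claude_md_chain(root / "services" / "api" / "handlers", root)
    loaded = [p.read_text() for p in chain]

print(loaded)
```

Starting in services/api/handlers picks up the root file and the service file, and never touches services/billing/.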

If you already know the task is scoped to one service, start the agent in that subdirectory. The working directory becomes the focus, and the agent stays out of unrelated code unless you tell it otherwise. Most of the time, you know.

Two more cheap wins live in the same neighborhood. Scope the make test and make lint commands so the subdirectory version runs only the slice the agent is working in, instead of the whole repo on every change. And version-control your exclusion rules in .claude/settings.json so the agent never reads dist/, generated SDKs, or vendored code. Every file the agent skips leaves more tokens for the work that matters. If your directory layout is unconventional or has historical baggage, Anthropic also suggests adding a short codebase map to the root CLAUDE.md so the agent has somewhere to anchor.
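One way to keep those exclusions under version control is a tiny script that adds Read deny rules to the checked-in .claude/settings.json. The Read(./path/**) form follows Claude Code's permission-rule syntax; the directory list (dist, sdk/generated, vendor) and the helper names are assumptions for illustration, so check the Claude Code settings documentation for exactly which tools each rule covers.

```python
"""Sketch: add Read deny rules to a checked-in .claude/settings.json."""
import json
from pathlib import Path

# Assumed exclusions for illustration; replace with your repo's build and vendored paths.
EXCLUDED_DIRS = ["dist", "sdk/generated", "vendor"]


def add_read_denies(settings: dict, dirs: list[str]) -> dict:
    """Append a Read(./dir/**) deny rule per directory, skipping rules already present."""
    deny = settings.setdefault("permissions", {}).setdefault("deny", [])
    for d in dirs:
        rule = f"Read(./{d}/**)"
        if rule not in deny:
            deny.append(rule)
    return settings


def update_settings_file(path: Path, dirs: list[str]) -> dict:
    """Merge the rules into an existing settings file, keeping hooks and other keys."""
    settings = json.loads(path.read_text()) if path.exists() else {}
    add_read_denies(settings, dirs)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings


if __name__ == "__main__":
    update_settings_file(Path(".claude/settings.json"), EXCLUDED_DIRS)
```

The merge is idempotent, so the script can run in CI or a pre-commit step without piling up duplicate rules.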

Hooks that make the harness self-improving

Most teams use hooks as guardrails. Block edits in vendor/, refuse to delete migrations, kill the run if a secret turns up in a diff. That is fine and you should do it. But hooks have a second life that almost no one uses, and that second life is the more interesting one.

Both kinds register the same way, in .claude/settings.json, against named events Claude Code fires during a session:

{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "uv run --directory \"$CLAUDE_PROJECT_DIR\" python .claude/hooks/session_start_context.py"
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "uv run --directory \"$CLAUDE_PROJECT_DIR\" python .claude/hooks/propose_claude_md.py"
          }
        ]
      }
    ]
  }
}

A SessionStart hook fires before the agent has done anything. Whatever the script prints to stdout is injected straight into the session as context, so you can preload the things the agent would otherwise have to spend a turn discovering: the current branch, the uncommitted diff, the last few commits. For a larger team you might fetch the Confluence or Notion page that owns the directory the engineer is working in. Every developer starts each session pre-oriented, with no manual setup.

"""SessionStart hook: prints orientation that Claude reads as session context."""
import os
import subprocess
from pathlib import Path

root = Path(os.environ.get("CLAUDE_PROJECT_DIR", "."))


def git(*args):
    out = subprocess.run(["git", *args], cwd=root, capture_output=True, text=True)
    return out.stdout.strip()


print("# Orientation\n")
print(f"## Branch\n{git('rev-parse', '--abbrev-ref', 'HEAD')}\n")
print(f"## Uncommitted changes\n{git('status', '--porcelain') or '(clean)'}\n")
print(f"## Recent commits\n{git('log', '-5', '--oneline')}\n")

The Stop hook is the more interesting one. It fires when the agent finishes its turn. At that moment the session context is still fresh, the diff is still small, and you have a free shot at a question nobody asks: did anything I changed invalidate the rules I wrote down? Spawn a separate headless Claude session, hand it the diff and the relevant CLAUDE.md files, ask it to propose updates, and write the result to a markdown review file. You read it when you are ready. The CLAUDE.md files stop going stale on their own.

The trick is to make the hook itself cheap and dispatch the LLM call in the background, so the end of every turn does not block on a reflection:

"""Stop hook: dispatch a headless Claude reflection in the background."""
import os
import subprocess
import sys
from pathlib import Path

# The reflector spawns its own headless Claude, whose Stop hook lands back
# here. The lock prevents infinite recursion.
if os.environ.get("REFLECT_LOCK"):
    sys.exit(0)

root = Path(os.environ.get("CLAUDE_PROJECT_DIR", "."))
diff = subprocess.run(
    ["git", "diff", "HEAD"], cwd=root, capture_output=True, text=True
).stdout
if not diff.strip():
    sys.exit(0)

env = {**os.environ, "REFLECT_LOCK": "1"}
subprocess.Popen(
    ["uv", "run", "python", ".claude/hooks/reflect_claude_md.py"],
    cwd=root, env=env,
    # Detach from the hook's stdin and session so the reflection keeps running
    # after this hook exits (start_new_session only has an effect on POSIX).
    stdin=subprocess.DEVNULL,
    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    start_new_session=True,
)

reflect_claude_md.py is the part that calls a headless claude against the diff and writes .claude/claude-md-review.md. You can grow it from twenty lines to two hundred without ever blocking the agent.
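A minimal sketch of what reflect_claude_md.py might look like, assuming the claude CLI is on PATH: claude -p runs a single non-interactive prompt and prints the answer, and here the diff is piped in on stdin. The prompt wording, the 600-second timeout, and the helper names are assumptions; the review path matches the one above.

```python
"""reflect_claude_md.py sketch: ask a headless Claude whether a diff made any CLAUDE.md stale."""
import re
import subprocess
from pathlib import Path

REVIEW_FILE = Path(".claude/claude-md-review.md")
PROMPT = (
    "The diff on stdin was just made in this repo. Below are the CLAUDE.md files that "
    "govern the changed paths. Propose concrete edits for any rule the diff makes stale, "
    "as markdown. If nothing is stale, say so.\n\n"
)


def changed_paths(diff: str) -> list[str]:
    """Take the post-change path from each 'diff --git a/... b/...' header."""
    return re.findall(r"^diff --git a/.+? b/(.+)$", diff, flags=re.MULTILINE)


def relevant_claude_mds(paths: list[str], root: Path) -> list[Path]:
    """Every CLAUDE.md between a changed file and the repo root, deduplicated, in order."""
    found: dict[Path, None] = {}
    for rel in paths:
        for parent in (root / rel).parents:
            candidate = parent / "CLAUDE.md"
            if candidate.is_file():
                found[candidate] = None
            if parent == root:
                break
    return list(found)


def build_prompt(claude_mds: list[Path]) -> str:
    docs = "\n\n".join(f"--- {p} ---\n{p.read_text()}" for p in claude_mds)
    return PROMPT + docs


def main() -> None:
    root = Path(".")  # the Stop hook starts this script with cwd set to the project root
    diff = subprocess.run(["git", "diff", "HEAD"], capture_output=True, text=True).stdout
    claude_mds = relevant_claude_mds(changed_paths(diff), root)
    if not claude_mds:
        return  # no CLAUDE.md governs these paths, so there is nothing to review
    result = subprocess.run(
        ["claude", "-p", build_prompt(claude_mds)],
        input=diff, capture_output=True, text=True, timeout=600,
    )
    REVIEW_FILE.parent.mkdir(parents=True, exist_ok=True)
    REVIEW_FILE.write_text(result.stdout)


if __name__ == "__main__":
    main()
```

Scoping the prompt to the CLAUDE.md files on the changed paths keeps the reflection cheap and keeps it from proposing edits to rules the diff never touched.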

The pattern that ties the two together: hooks let the harness improve itself in the background while you do the actual work.
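For the guardrail kind, a PreToolUse hook receives the pending tool call as JSON on stdin and can refuse it by exiting with code 2, in which case Claude Code feeds the stderr message back to Claude. A sketch that blocks edits under vendor/; the protected directory and the tool-name set are assumptions to adapt, and you would register it under PreToolUse in .claude/settings.json with a matcher such as Edit|MultiEdit|Write.

```python
"""PreToolUse guardrail sketch: refuse file edits under protected directories."""
from __future__ import annotations

import json
import os
import sys
from pathlib import Path

PROTECTED_DIRS = {"vendor"}  # assumption: top-level directories the agent must never edit
EDIT_TOOLS = {"Edit", "MultiEdit", "Write"}


def block_reason(event: dict, project_dir: Path) -> str | None:
    """Return a refusal message for a protected edit, or None to allow the call."""
    if event.get("tool_name") not in EDIT_TOOLS:
        return None
    file_path = event.get("tool_input", {}).get("file_path", "")
    try:
        rel = Path(file_path).resolve().relative_to(project_dir.resolve())
    except ValueError:
        return None  # outside the project; leave that to other rules
    if rel.parts and rel.parts[0] in PROTECTED_DIRS:
        return f"{rel} is vendored code. Change the upstream dependency instead."
    return None


def main() -> int:
    event = json.load(sys.stdin)  # the pending tool call, as JSON
    reason = block_reason(event, Path(os.environ["CLAUDE_PROJECT_DIR"]))
    if reason:
        print(reason, file=sys.stderr)  # stderr is what Claude sees
        return 2  # exit code 2 blocks the tool call
    return 0


# Claude Code sets CLAUDE_PROJECT_DIR when it spawns a hook, so importing this
# file elsewhere (for example in tests) does not block waiting on stdin.
if __name__ == "__main__" and "CLAUDE_PROJECT_DIR" in os.environ:
    sys.exit(main())
```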

Path-scoped skills

Skills are where the agent learns how to do a thing. CLAUDE.md is conventions (“every route is registered here”). Skills are workflows (“here is how you add a new route in this repo, end to end”). The two overlap, but the framing keeps me honest: rules in CLAUDE.md, recipes in skills.

The piece of the skills system most teams miss is the path scope. A skill can declare which directories it activates in. A create-api-endpoint skill that only loads when the agent is editing under services/api/ is invisible the rest of the time. With dozens of skills in a real repo, scoping is the difference between a useful library and a wall of irrelevant prompts.

The mental model: progressive disclosure for expertise. Most knowledge in a large codebase is local. Load it locally.
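The activation rule itself is easy to model. This toy Python matcher is not how Claude Code implements skill scoping; the registry, the skill names, and the glob patterns are hypothetical, and it only shows why a scoped skill stays invisible outside its directories.

```python
"""Toy model of path-scoped skill activation (not Claude Code's implementation)."""
from fnmatch import fnmatchcase

# Hypothetical registry: skill name -> glob patterns it activates under.
# fnmatch's * also matches "/", so "services/api/*" covers nested files too.
SKILLS = {
    "create-api-endpoint": ["services/api/*"],
    "add-billing-webhook": ["services/billing/*"],
    "write-migration": ["db/migrations/*", "services/*/migrations/*"],
}


def active_skills(edited_file: str, skills: dict = SKILLS) -> list:
    """Names of the skills whose scope covers the file being edited."""
    return sorted(
        name for name, patterns in skills.items()
        if any(fnmatchcase(edited_file, pattern) for pattern in patterns)
    )


for path in ["services/api/routes/users.py", "services/billing/migrations/0001.sql", "frontend/app.tsx"]:
    print(path, "->", active_skills(path))
```

Editing the frontend loads none of the three, which is the whole point: the library can grow without every session paying for it.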

Symbol-level search through LSP and MCP

grep is fine until it isn’t. Past six-digit line counts, plain string search gets slow, returns too much, and burns tokens reading files the agent did not need to open. You also lose what every IDE has done for decades: jump-to-definition, find-references, hover-for-types.

You can give the agent the same navigation. Run a language server locally, wrap it in a small MCP server, expose two or three tools: where_is, find_references, goto_definition. The agent now searches by symbol, not by string. A request like “find every place monthly_total_cents is referenced” returns one definition and the actual references, instead of fifty grep hits that mention the substring in unrelated comments.
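The gap between string and symbol search shows up even at toy scale. This sketch uses Python's standard ast module as a stand-in for a language server (a real setup would send LSP requests such as textDocument/definition and textDocument/references); the SOURCE snippet is invented around the monthly_total_cents example.

```python
"""Symbol search vs string search, with Python's ast module standing in for an LSP."""
import ast

SOURCE = '''
# TODO: monthly_total_cents should move to the billing service
def monthly_total_cents(invoices):
    return sum(i.cents for i in invoices)

def report(invoices):
    label = "monthly_total_cents"  # a string literal, not a reference
    return label, monthly_total_cents(invoices)
'''


def grep(source: str, needle: str) -> list[int]:
    """Line numbers containing the substring, the way plain grep sees them."""
    return [n for n, line in enumerate(source.splitlines(), 1) if needle in line]


def symbol_search(source: str, name: str) -> dict[str, list[int]]:
    """Definitions and name references, ignoring comments and string literals."""
    defs, refs = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) and node.name == name:
            defs.append(node.lineno)
        elif isinstance(node, ast.Name) and node.id == name:
            refs.append(node.lineno)
    return {"definitions": sorted(defs), "references": sorted(refs)}


print("grep:", grep(SOURCE, "monthly_total_cents"))
print("symbols:", symbol_search(SOURCE, "monthly_total_cents"))
```

grep reports four lines, two of them a comment and a string literal; the symbol search returns the one definition and the one real call site.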

This is also where bigger orgs invest. Custom MCP servers that expose internal search systems, the code-ownership graph, the design-doc index. The patterns are the same; the targets are domain-specific. The point is that the agent does not have to brute-force its way through your repo when you already have better tools for finding things.

Image: Anthropic, How Claude Code Works in Large Codebases.

Subagents for exploration

The rule I follow: split exploration from editing. A subagent runs in its own context window. You ask which files implement the billing webhook flow, or what the user model looks like across services. It does the digging, and only the summary comes back to your primary session.

The win is context budget, not parallelism. Exploration is wasteful by nature. The agent reads forty files to find the three that matter, and most of those forty get thrown away. If that happens in your primary session, your editing turns start with a context window already half full of noise. If it happens in a subagent, the noise stays there. You get the answer.

Use the built-in Explore subagent liberally. Custom subagents earn their place when you have a workflow specific enough that a generic explorer is the wrong tool. The file shape is small: a single markdown file under .claude/agents/, a short frontmatter block, and a prompt body. name, description, tools, and model are enough to start:

---
name: explorer
description: Read-only repo explorer. Map a service or package without burning the main session's context, then return findings.
tools: Read, Grep, Glob
model: sonnet
---

You are a read-only explorer. The parent agent will hand you one service or
package to map. Read its `CLAUDE.md` if there is one, then trace entry points,
the public surface, and dependencies. Return findings as your final response.
No edits.

Restricting tools to read-only is the load-bearing line. The model only sees the tools you expose, so an explorer subagent without Write or Edit has nothing to call when it gets tempted, even if the prompt body forgot to say so. Treat that as a strong default. If you need a hard guarantee, layer a PreToolUse hook on top.

Don’t let it rot

The harness is not a one-time setup. Models improve, and rules written for last year’s model often constrain this year’s. A note like “always split refactors into single-file changes” might have saved you in 2024 and might block a beneficial cross-file edit in 2026. Anthropic suggests reviewing your CLAUDE.md files every three to six months, or whenever performance feels like it has plateaued after a major model release. The stop-hook reflection gives you a head start. The rest is on you.
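A small script can turn that review cadence into a reminder. This sketch flags CLAUDE.md files whose last commit is older than 90 days, the low end of the three-to-six-month window; the threshold, the choice of git commit time over file mtime (a fresh clone resets mtimes), and the helper names are assumptions.

```python
"""Sketch: list CLAUDE.md files that have not changed in roughly three months."""
from __future__ import annotations

import subprocess
import time
from pathlib import Path

MAX_AGE_DAYS = 90  # assumption: the low end of the three-to-six-month review window


def last_commit_time(path: Path) -> float | None:
    """Unix time of the last commit touching path, or None if git has no record."""
    out = subprocess.run(["git", "log", "-1", "--format=%ct", "--", str(path)],
                         capture_output=True, text=True).stdout.strip()
    return float(out) if out else None


def stale_files(commit_times: dict[str, float | None], now: float,
                max_age_days: int = MAX_AGE_DAYS) -> list[str]:
    """Paths whose last commit predates the cutoff; uncommitted files are skipped."""
    cutoff = now - max_age_days * 86400
    return sorted(p for p, t in commit_times.items() if t is not None and t < cutoff)


if __name__ == "__main__":
    files = sorted(Path(".").rglob("CLAUDE.md"))
    times = {str(p): last_commit_time(p) for p in files}
    for p in stale_files(times, time.time()):
        print(f"review due: {p}")
```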

Assign an owner

The last piece is not technical. The teams that get value out of Claude Code at scale have someone who owns the harness. A small platform-engineering team, or one DRI, or a hybrid PM/engineer doing it half-time. Their job is the same shape as owning a CI pipeline: write the conventions, build the skills, run the LSP wrapper, version the hooks, evangelize what works, retire what does not.

Plugins are the distribution vehicle. A good harness that lives in one engineer’s dotfiles stays tribal. The same harness packaged as a plugin (or a private marketplace) is how a team of five hundred ends up running the same skills, the same MCP servers, and the same hooks without anyone having to remember to copy a config.

The pattern that fails: ship Claude Code to the org on a Friday, hope adoption goes viral, watch every team grow its own slightly different version of CLAUDE.md for six months. The pattern that works: a quiet build-out period, a small set of approved skills, a working plugin or two, a documented governance story, then broad access.

Treat the harness like infrastructure.

Where to start

The order that has worked for me, in any repo:

  1. Trim the root CLAUDE.md until it fits on one screen. Move the rest into subdirectories.
  2. Add a Stop hook that proposes updates to those CLAUDE.md files in headless mode.
  3. Convert your three most common repeated tasks into path-scoped skills.
  4. Run a language server behind an MCP server. Stop searching by string.
  5. Get comfortable dispatching exploration to subagents.

Most teams will plateau on step one for a week and find the agent is already noticeably sharper. The rest compounds. I have written more on the agent-tooling shift this is part of in How Building AI Agents Has Changed in 2026, and on the workflow side in The Claude Skills I Actually Use for DevOps and Superpowers, GSD, and GSTACK.

The model will keep getting better. The harness is the work.

The phrase “AI infrastructure” now means two different things. One is the GPUs, schedulers, and MLOps platforms that exist to run AI workloads. The other is AI that runs infrastructure: agents and assistants that generate, deploy, and govern cloud resources on your behalf. They’re different markets with different vendors, and most teams need to think about both.

The pressure to think about both is real. McKinsey research puts the productivity lift from generative AI in software development at 20–45%, which is great for application teams and a problem for platform teams trying to keep up with the resulting feature flow. Infrastructure investment is climbing on both fronts: more spend on the compute that trains and serves models, more spend on AI tools that manage everything else.

This guide covers both categories: the compute and MLOps stack in Part 1, and AI-powered infrastructure management in Part 2, where the more interesting product shift is happening.

AI infrastructure tools overview

Tools for building AI infrastructure
  1. CoreWeave: GPU cloud built for AI workloads
  2. Lambda Labs: straightforward GPU cloud for research and startups
  3. Modal: serverless GPU compute
  4. Weights & Biases: ML experiment tracking and model management
  5. MLflow: open-source ML lifecycle platform
  6. Hyperscaler AI platforms: AWS SageMaker, Google Vertex AI, Azure ML

AI-powered infrastructure management tools
  1. Pulumi Neo: agentic AI with policy automation
  2. Firefly AIaC: asset codification and IaC generation
  3. env0 Cloud Compass: multi-IaC insights and analysis
  4. Spacelift AI: run explanation and troubleshooting
  5. Crossplane with Upbound: Kubernetes-native infrastructure
  6. General-purpose code assistants: Copilot, Claude Code, Cursor, Gemini
  7. AWS Application Composer: visual serverless builder

Quick picks

If you only have two minutes:

  • Enterprise compliance: Pulumi Neo. Executes changes (not only suggestions), ships with policy packs for CIS, HITRUST, NIST, and PCI DSS, and works with Terraform, CloudFormation, and resources created by hand.
  • Serious GPU compute: CoreWeave. Purpose-built for AI workloads, deep NVIDIA partnership, and prices that generally undercut the hyperscalers.
  • Best developer experience for ML: Modal. Decorate a Python function, get a GPU, pay by the second.
  • Open-source MLOps: MLflow. No vendor lock-in, runs anywhere, plays well with everything.

What is AI infrastructure?

The term covers two distinct categories that share almost no vendors.

Infrastructure for AI is the compute, storage, and orchestration that AI workloads run on. Training a large model is not a normal cloud workload: it wants thousands of GPUs talking to each other over fat, low-latency networks for weeks at a time. Inference is different again: lower latency, smarter batching, different hardware. General-purpose cloud was not designed for either case, which is why specialized GPU clouds and MLOps platforms exist.

AI-powered infrastructure management is the inverse: AI tools that manage cloud infrastructure. They generate IaC, run deployments, detect drift, and remediate policy violations. The pitch is that modern infrastructure (multi-cloud, containers, microservices, regulated workloads) has gotten too complex for humans to manage by hand and too varied for scripted automation to keep up with.

Most organizations end up needing both: somewhere to run their ML workloads, and something to keep the rest of the cloud sane.

Part 1: Tools for building AI infrastructure

These are the platforms you run AI and ML workloads on: GPU clouds for raw compute, MLOps platforms for the lifecycle around them.

CoreWeave

CoreWeave is the GPU cloud that broke out of the AI hype cycle into a real public company. They went public in 2025, signed a multi-billion-dollar capacity deal with OpenAI, and acquired Weights & Biases. Their thesis from day one was that AI workloads deserve infrastructure designed for AI workloads, not a GPU SKU bolted onto a general-purpose cloud.

  • License: Proprietary
  • Best for: Large-scale training and high-throughput inference; teams that need dedicated GPU capacity with first access to new NVIDIA hardware
  • Strengths: GPU infrastructure designed for AI; Kubernetes-native; direct NVIDIA partnership; handles distributed training at scale
  • Watch out for: Smaller global footprint than AWS/GCP/Azure; not a general-purpose cloud, so if you need RDS, S3, and a managed Kafka in the same provider, this isn’t it

Lambda Labs

Lambda has been the approachable GPU cloud for a long time. Environments come pre-configured with PyTorch and TensorFlow, and you can be running on an H100 in about as long as it takes to copy your SSH key.

  • License: Proprietary
  • Best for: Research teams, startups, and individual practitioners who want GPUs without a configuration tax
  • Strengths: Straightforward to start on; pre-configured deep learning environments; competitive on-demand pricing; strong learning resources
  • Watch out for: Smaller scale than CoreWeave or the hyperscalers; availability gets tight during demand spikes

Modal

Modal’s pitch is that you write a Python function, decorate it, and Modal handles the GPU. No capacity planning, no idle instances burning money overnight, no Dockerfile to maintain.

  • License: Proprietary
  • Best for: Variable ML workloads where reserved capacity would sit idle; data scientists who’d rather not learn Kubernetes
  • Strengths: Strong developer experience; serverless GPUs with automatic scaling; pay-per-second pricing; cold starts are fast for what they are
  • Watch out for: You give up infrastructure control. Not ideal for long training jobs that need reserved hardware or strict configuration requirements.

Weights & Biases

Weights & Biases is the de facto standard for ML experiment tracking and model management, integrated with essentially every framework and cloud you’d plausibly use. CoreWeave acquired the company in 2025, which has accelerated the joint roadmap but raised some neutrality questions for teams that prefer their tooling cloud-agnostic.

  • License: Proprietary with a free tier
  • Best for: ML teams that need shared experiment tracking, model versioning, and reporting
  • Strengths: Industry-leading experiment tracking and visualization; comprehensive model registry; strong team collaboration; broad integration surface
  • Watch out for: Costs scale quickly past the free tier; some teams self-host alternatives for data residency reasons

MLflow

MLflow is the leading open-source MLOps platform: experiment tracking, packaging, registry, and serving, with no lock-in. Originally built at Databricks, it’s now a broad open-source ecosystem with managed offerings from multiple vendors (including Databricks and the major clouds).

  • License: Apache 2.0
  • Best for: Teams that want MLOps without a vendor; or want the option to start managed and self-host later
  • Strengths: Open source; covers the full ML lifecycle; runs locally, on-prem, or managed; broad framework support
  • Watch out for: Self-hosting carries the usual operational tax; commercial alternatives have stronger collaboration UX out of the box

Hyperscaler AI platforms

The major clouds all sell end-to-end ML platforms. Each leads on the dimensions that line up with its parent cloud (Vertex for Google’s models and TPUs, SageMaker for AWS-native data pipelines, Azure ML for Microsoft-stack integration), but in practice the deciding factor is how tightly each one integrates with the rest of its cloud.

  • AWS SageMaker: end-to-end ML on AWS, deeply integrated with S3 and Glue, with first-class connections to AWS Lambda for serverless inference and to the rest of the AWS data stack. The default pick if your data already lives in AWS.
  • Google Vertex AI: Google’s ML stack, including TPUs for workloads that need them, plus access to Google’s foundation models. Strongest when paired with BigQuery.
  • Azure Machine Learning: the natural choice when the rest of your stack is Microsoft, particularly for Azure shops with Microsoft compliance requirements. First-party MLOps integrations span GitHub Actions, Azure DevOps, and Microsoft Fabric for downstream reporting.

The shared tradeoff: hyperscaler GPU compute typically runs 2–3x the per-hour price of specialized providers, and the platforms work best when you commit to them top to bottom. For organizations already inside one cloud, the unified billing and single support contract usually justifies the premium. For a new ML team starting from scratch, it rarely does.
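To make the 2–3x figure concrete: the premium grows linearly with GPU-hours, so it is small for occasional experiments and large for sustained training. The sketch below computes the extra spend from a specialized-provider rate that you supply. The $2.00 per GPU-hour figure and the 8-GPU month are placeholders for illustration, not quoted prices from any vendor.

```python
def hyperscaler_premium(gpu_hours: float, specialized_rate: float, multiplier: float) -> float:
    """Extra spend from running gpu_hours on a hyperscaler priced at
    `multiplier` times a specialized provider's per-GPU-hour rate."""
    if multiplier < 1:
        raise ValueError("a multiplier below 1 means the hyperscaler is cheaper")
    specialized_cost = gpu_hours * specialized_rate
    hyperscaler_cost = specialized_cost * multiplier
    return hyperscaler_cost - specialized_cost


# Placeholder inputs: 8 GPUs running for a 30-day month at a hypothetical $2.00/GPU-hour.
hours = 8 * 24 * 30
for m in (2.0, 3.0):
    print(f"{m:.0f}x -> extra ${hyperscaler_premium(hours, 2.00, m):,.0f} per month")
```

That monthly difference is what unified billing and a single support contract have to be worth to your organization.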

Part 2: AI-powered infrastructure management tools

This is where the more interesting product shift is happening. Instead of running AI on infrastructure, these tools point AI at your infrastructure and let it do work.

From code generation to agentic execution

Before the tool list, one distinction matters more than any feature comparison: whether the tool generates code or executes changes.

Code generation tools like GitHub Copilot suggest infrastructure code based on context. You review it, maybe edit it, run it yourself. The AI helps, but you’re still the one doing the work.

Agentic platforms generate the code and run it, with the guardrails you define. They understand your environment, handle multi-step workflows, and enforce policies on the way through. You describe the outcome; the platform makes it happen.

| Capability | Code generation | Agentic execution |
| --- | --- | --- |
| Generates infrastructure code | Yes | Yes |
| Understands infrastructure context | Limited | Deep |
| Executes changes | No | Yes |
| Handles multi-step workflows | No | Yes |
| Enforces policies automatically | No | Yes |
| Remediates drift and violations | No | Yes |

Where you want to land on this spectrum is mostly a governance question, not a productivity one.

Pulumi Neo

Pulumi Neo is Pulumi’s agentic AI for infrastructure. The distinguishing claim is execution: Neo doesn’t only suggest a Terraform snippet, it figures out the right resources, generates the code, and runs the deployment inside whatever guardrails you’ve set.

  • License: Proprietary (Pulumi Cloud)
  • Best for: Platform engineering teams that want AI automation with real policy controls, especially in regulated industries

A few things that set it apart in practice:

Policy automation and compliance. Neo is integrated with Pulumi Insights and Governance, which ships pre-built policy packs for CIS benchmarks, HITRUST CSF, NIST SP 800-53, and PCI DSS. Detection and remediation run in the same loop: Neo finds a violation, generates a fix, and (subject to approvals) applies it. You can batch-remediate across stacks and accounts with prompts like “find and fix all unencrypted S3 buckets across our AWS accounts.”
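The detect, fix, approve, apply loop can be sketched in a few lines using the unencrypted-S3-bucket example. This is not Neo’s implementation or Pulumi’s policy API. The `Resource` shape, the `encrypted` property, and the `approve` callback are hypothetical stand-ins chosen only to show how detection and remediation share one loop with an approval gate in the middle.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Resource:
    # Hypothetical, simplified view of a cloud resource.
    urn: str
    type: str
    props: dict = field(default_factory=dict)


def find_violations(resources: list[Resource]) -> list[Resource]:
    """Detect: S3 buckets that do not declare encryption (hypothetical `encrypted` flag)."""
    return [r for r in resources if r.type == "aws:s3:Bucket" and not r.props.get("encrypted")]


def remediate(resources: list[Resource], approve: Callable[[Resource], bool]) -> list[str]:
    """Generate a fix for each violation, but apply it only when `approve` says yes."""
    applied = []
    for r in find_violations(resources):
        proposed = {**r.props, "encrypted": True}  # generate the fix
        if approve(r):                              # approval gate
            r.props = proposed                      # stand-in for a real deployment
            applied.append(r.urn)
    return applied


fleet = [
    Resource("urn:a", "aws:s3:Bucket", {"encrypted": True}),
    Resource("urn:b", "aws:s3:Bucket"),
    Resource("urn:c", "aws:s3:Bucket"),
]
# Approve everything except urn:c, which stays flagged for a human.
print(remediate(fleet, approve=lambda r: r.urn != "urn:c"))  # ['urn:b']
```

Anything the gate declines simply remains a violation, so the next pass of the loop surfaces it again.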

Works with infrastructure you didn’t create with Pulumi. Neo’s governance applies to Pulumi-managed resources, Terraform state, CloudFormation stacks, and resources someone clicked together in the AWS console. That matters because the realistic adoption path is to point Neo at what you have, audit it, and gradually bring it under management, not to migrate everything first.

Progressive autonomy. Trust levels are configurable. Start with human approval for everything; loosen it for well-defined, low-risk operations as confidence builds; keep production and sensitive resources behind strict approvals. This is the part that tends to determine whether enterprises actually deploy agentic AI in anger, versus letting it sit as a sandbox toy.
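One way to picture configurable trust levels is as a rule that maps an operation’s risk and target environment to an approval requirement. The three levels, the `"low"` risk label, and the rule that production always needs a human are illustrative assumptions, not Neo’s configuration format.

```python
from enum import Enum


class Autonomy(Enum):
    APPROVE_EVERYTHING = 1  # starting point: a human signs off on every change
    AUTO_LOW_RISK = 2       # well-defined, low-risk operations run unattended
    AUTO_NON_PROD = 3       # anything outside production runs unattended


def needs_approval(level: Autonomy, risk: str, environment: str) -> bool:
    """Return True when a human must approve the operation."""
    if environment == "production":
        return True  # production stays behind strict approvals at every level
    if level is Autonomy.APPROVE_EVERYTHING:
        return True
    if level is Autonomy.AUTO_LOW_RISK:
        return risk != "low"
    return False  # AUTO_NON_PROD: non-production changes proceed without a human
```

Raising the level loosens the gate one step at a time, while the production check never moves.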

IDE and CI/CD integration. The Pulumi MCP Server brings Neo into Cursor, Claude Code, Claude Desktop, Windsurf, and any other MCP-compatible client. The Pulumi Cloud UI is the home base for approvals, history, and remediation status. Neo also slots into CI/CD pipelines for pre-merge policy remediation.

Case studies:

  • Werner Enterprises reduced infrastructure provisioning time from 3 days to 4 hours using Pulumi.
  • Spear AI cut their Authority to Operate (ATO) timeline from an expected 1.5 years to roughly 3 months by using policy-as-code to evidence compliance controls for auditors.

Tradeoff to be honest about: Neo gets more valuable the deeper you are in the Pulumi ecosystem. If you’re running IaC, ESC, and policy packs already, Neo has a lot of context to draw on. If you’re kicking the tires, it’s still useful, but the differentiating capability (context-aware, policy-respecting agentic execution) is harder to feel.

Firefly AIaC

Firefly is an asset management platform with AI features bolted on top of a strong core. The core capability is asset codification: it discovers cloud resources you already have and generates the IaC for them.

  • License: Proprietary
  • Best for: Teams that need to codify existing cloud footprints or generate IaC from natural language

Strengths: solid asset discovery, multi-cloud coverage, natural-language IaC generation, drift detection with remediation hooks. Caveat: AI features here are supplementary to the asset management product, not the main event, and Firefly is less focused on agentic execution than on inventory and policy.

env0 Cloud Compass

env0’s Cloud Compass adds AI to env0’s IaC automation platform, focusing on analysis rather than autonomous execution.

  • License: Proprietary
  • Best for: Multi-IaC shops that want AI-generated PR summaries, drift explanations, and cost insights

Strengths: multi-tool support across Terraform, OpenTofu, Pulumi, and Terragrunt; AI-generated PR summaries; drift cause analysis; cost estimation. Caveat: this is analysis and explanation, not action, so Cloud Compass complements an agentic tool rather than replacing one.

Spacelift AI

Spacelift’s AI work is focused on the post-run experience: explaining what happened in a deployment and helping troubleshoot failures.

  • License: Proprietary
  • Best for: GitOps shops that want AI assistance reading complex runs and diagnosing failed deployments

Strengths: AI-powered run explanation; troubleshooting guidance for failures; broad IaC tool support; mature CI/CD integration. Caveat: like Spacelift as a whole, this is observation and explanation, not generation or execution. Pair with something that writes the code.

Crossplane with Upbound

Crossplane brings Kubernetes-style declarative management to cloud resources. Upbound is the company that commercializes it, and is layering AI-native control-plane capabilities into the 2.0 generation.

  • License: Apache 2.0 (Crossplane); proprietary (Upbound)
  • Best for: Teams already deep in Kubernetes that want to manage cloud resources the same way

Strengths: Kubernetes-native model; native GitOps fit; very active OSS community; AI control-plane work emerging from Upbound. Caveat: the learning curve is real if you’re not already living in Kubernetes; the commercial AI features are still maturing.

General-purpose code assistants

General-purpose AI coding assistants are the tools your developers already have open: GitHub Copilot, Claude Code, Cursor, and Google’s Gemini and Antigravity. They write Terraform HCL, Pulumi programs, and CloudFormation templates competently, about as well as they write anything else.

  • License: Proprietary (subscription), varies by tool
  • Best for: Developers who want broad code assistance, including infrastructure code, inside their existing editor

Strengths: excellent line-by-line code completion; broad language support; first-class editor integration; trained on huge corpora. Caveat: no infrastructure context. They don’t know what’s in your account, what your policies are, or which subnet you should pick. Treat their IaC suggestions as first-pass scaffolding, not production output.

AWS Application Composer

Application Composer is AWS’s visual builder for serverless applications. Drag services onto a canvas, get a CloudFormation template out, with AI suggestions for service configuration along the way.

  • License: Proprietary (AWS, included)
  • Best for: Teams building AWS serverless apps who prefer a visual workflow

Strengths: visual development for serverless; direct AWS integration; AI suggestions for service configuration; emits CloudFormation. Caveat: AWS-only, CloudFormation-only, and best suited to serverless rather than general infrastructure.

Comparison tables

Infrastructure for AI

| Tool | Category | Key strength | Limitation | Pricing | Best for |
| --- | --- | --- | --- | --- | --- |
| CoreWeave | GPU cloud | Purpose-built GPU infra, NVIDIA partnership | Not a general-purpose cloud | Per-GPU-hour | Large-scale AI training |
| Lambda Labs | GPU cloud | Approachable, pre-configured environments | Smaller scale | Per-GPU-hour | Research teams, startups |
| Modal | Serverless GPU | Developer experience, pay-per-second | Less infrastructure control | Pay-per-use | Variable ML workloads |
| Weights & Biases | MLOps | Industry-standard experiment tracking | Costs scale quickly | Free tier + paid | ML team collaboration |
| MLflow | MLOps | Open source, no lock-in | Self-hosting overhead | Free (self-hosted) | Flexible ML lifecycle |
| AWS SageMaker | Hyperscaler | AWS ecosystem integration | Higher cost, lock-in | Per-use | AWS-native orgs |
| Google Vertex AI | Hyperscaler | Google models, TPU access | Lock-in | Per-use | Google Cloud users |
| Azure ML | Hyperscaler | Microsoft integration, enterprise features | Lock-in | Per-use | Microsoft ecosystem |

AI-powered infrastructure management

| Tool | Approach | Key strength | Limitation | Pricing | Best for |
| --- | --- | --- | --- | --- | --- |
| Pulumi Neo | Agentic AI | Execution + policy automation | Best within Pulumi ecosystem | Pulumi Cloud tiers | Enterprise platform teams |
| Firefly AIaC | Asset management | Asset codification, IaC generation | AI is supplementary | Commercial | Codifying existing infra |
| env0 Cloud Compass | Multi-IaC platform | Multi-tool support, PR analysis | Analysis, not execution | Commercial | Multi-IaC environments |
| Spacelift AI | CI/CD platform | Run explanation, troubleshooting | Observation, not action | Commercial | GitOps workflows |
| Crossplane / Upbound | Kubernetes-native | K8s patterns for infra | Requires K8s expertise | Open source + commercial | Kubernetes-native teams |
| Code assistants | Code assistant | Broad language support, IDE | No infrastructure context | Subscription | General code assistance |
| AWS Composer | Visual builder | Visual serverless development | AWS- and CFN-only | Included with AWS | AWS serverless apps |

How to choose

There’s no universal best tool. Five questions sort the field quickly:

  • Cloud strategy. Multi-cloud means tools like Pulumi Neo, Firefly, env0, or Crossplane. Single-cloud commitment means hyperscaler-native tools may integrate more deeply (AWS Composer, SageMaker, and so on).
  • Team expertise. Programmers gravitate to tools that use real languages (Pulumi Neo, Pulumi IaC). Kubernetes teams find Crossplane natural; everyone else finds it steep. Teams that prefer visual workflows should look at AWS Composer or env0’s UI.
  • Compliance. Regulated industries (healthcare, finance, government) get the most value from tools with pre-built compliance packs and audit trails. Pulumi Neo’s CIS/HITRUST/NIST/PCI packs are the most direct fit. If preventative policy enforcement matters, prefer tools that block non-compliant deployments rather than flag them after the fact.
  • Existing footprint. Greenfield projects can use anything. Brownfield is where it gets interesting: Pulumi Neo works against Terraform, CloudFormation, and manually-created resources, which lets you adopt incrementally instead of migrating first. Mixed-IaC shops should also look at env0.
  • Budget. Open source first: MLflow for MLOps, Crossplane for Kubernetes-native infra. Open source is not free, though: self-hosting carries a real total cost of ownership in hosting, maintenance, and the expertise to operate it. Commercial tools (Pulumi Cloud, env0, Spacelift) fold that operational cost into the price, on top of support, SLAs, and the enterprise-tier features open source can lack.

Before adopting anything, get visibility into what you have today, pilot on staging where mistakes are cheap, and define success metrics up front: time to provision, policy violation rates, mean time to remediate. The best AI infrastructure tool is the one your team will actually use, which means meeting developers where they already work.
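The three success metrics are simple enough to compute from your own records before and during a pilot. This is a minimal sketch, assuming each provisioning request and each remediated violation can be reduced to a (start, end) timestamp pair. The record shapes and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_duration(pairs: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (end - start) over completed (start, end) pairs."""
    return timedelta(seconds=mean((end - start).total_seconds() for start, end in pairs))


def violation_rate(deployments: int, deployments_with_violations: int) -> float:
    """Share of deployments that shipped at least one policy violation."""
    return deployments_with_violations / deployments if deployments else 0.0


# Hypothetical sample records.
t0 = datetime(2025, 1, 6, 9, 0)
provisioning = [(t0, t0 + timedelta(hours=4)), (t0, t0 + timedelta(hours=2))]
remediations = [(t0, t0 + timedelta(minutes=30)), (t0, t0 + timedelta(minutes=90))]

print("time to provision:", mean_duration(provisioning))  # 3:00:00
print("mean time to remediate:", mean_duration(remediations))  # 1:00:00
print(f"violation rate: {violation_rate(40, 6):.0%}")  # 15%
```

Capture the same numbers for a baseline period before the pilot starts, otherwise there is nothing to compare against.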

Key trends and outlook

From copilots to agents. “AI suggests code” and “AI runs the deploy” are different products with different governance implications. The teams getting value from agentic tools have figured out which tasks to delegate fully, which to keep human-in-the-loop, and which to leave alone.

Progressive autonomy. Enterprise adoption follows a predictable shape: visibility → recommendations → human-approved execution → autonomous execution for well-understood scenarios. Tools that support that graduation will see stronger enterprise traction than tools that force an all-or-nothing choice.

Policy as the control plane. As AI takes on more infrastructure tasks, policy frameworks become the primary control plane. Done well, policy becomes an enabler (guardrails that let you safely expand automation) rather than a brake on velocity.

MCP standardization. The Model Context Protocol is becoming the integration standard between AI assistants and infrastructure tools. The practical upshot is that the IDE is increasingly a viable surface for managing infrastructure, with AI mediating between natural language and the underlying APIs.

Consolidation. CoreWeave acquiring Weights & Biases and NVIDIA acquiring Run:ai both point toward integrated platforms across the AI infrastructure stack. For tool selection today, that’s an argument for picking vendors with clear strategic direction over point solutions likely to be acquired or out-competed.

Frequently asked questions

What is the best AI agent for cloud infrastructure management?

For enterprise governance plus true agentic capability, Pulumi Neo is currently the most complete offering: it executes changes (not just suggests them), integrates with pre-built compliance frameworks, and works with infrastructure regardless of how it was provisioned. For Kubernetes-native shops, Crossplane with Upbound’s emerging AI features is worth tracking.

How can I use generative AI to manage cloud infrastructure?

Start by identifying the repetitive, time-consuming infrastructure work in your team. The highest-value early use cases tend to be:

  • Code generation: write IaC from natural-language descriptions, then review.
  • Documentation: explain unfamiliar configurations and reduce onboarding time.
  • Troubleshooting: analyze logs, errors, and configs to suggest likely causes.
  • Security and compliance: scan for violations and generate fixes.
  • Full automation: for shops that want it, agentic platforms like Pulumi Neo execute provisioning workflows end-to-end with governance controls intact.

What is agentic AI for infrastructure?

Agentic AI for infrastructure means AI systems that autonomously execute infrastructure tasks, not just generate code suggestions. The difference from a code assistant is action: an agent understands your environment, respects your policies, and performs multi-step work (provisioning, configuration, security controls) within the boundaries you’ve defined.

How do AI agents improve DevOps workflows?

By automating the repetitive parts (provisioning, drift remediation, policy enforcement), reducing context-switching, and catching issues earlier. Teams that have rolled out agentic tools well report faster provisioning, fewer policy violations slipping into production, and quicker compliance remediation. The compounding effect (engineers freed for higher-value work as the agent absorbs the routine) is the actual point.

What’s the difference between AI code generation and agentic execution?

Code generation suggests IaC for a human to review and run. Agentic execution generates the code and runs it, with policy and governance enforced along the way. It’s the difference between a knowledgeable colleague who suggests an approach and a knowledgeable colleague who also ships the change with appropriate oversight.

Can AI generate Terraform or Pulumi programs?

Yes. Most general-purpose AI assistants (Copilot, Claude, Gemini, ChatGPT, Cursor) can produce Terraform HCL, Pulumi programs in TypeScript / Python / Go, and CloudFormation. Quality varies. Generic assistants lack environment context and will happily emit syntactically correct but operationally wrong code. Infrastructure-specific tools like Pulumi Neo generate code that’s aware of your existing resources, policies, and provider constraints.
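To see why missing environment context matters, consider a generated config that references a subnet. The syntax can be perfectly valid while the subnet does not exist in your account. The check below compares referenced IDs against an inventory you already have. The field names and inventory are hypothetical, and infrastructure-aware tools do considerably more than this.

```python
def unknown_references(
    generated: dict,
    inventory: set[str],
    ref_keys: tuple[str, ...] = ("subnetId", "vpcId", "securityGroupIds"),
) -> list[str]:
    """Return IDs referenced by a generated resource config that are missing from the inventory."""
    missing = []
    for key in ref_keys:
        value = generated.get(key)
        if value is None:
            continue
        ids = value if isinstance(value, list) else [value]
        missing.extend(i for i in ids if i not in inventory)
    return missing


inventory = {"vpc-0a1", "subnet-11", "sg-web"}  # hypothetical account state
generated = {"vpcId": "vpc-0a1", "subnetId": "subnet-99", "securityGroupIds": ["sg-web", "sg-db"]}
print(unknown_references(generated, inventory))  # ['subnet-99', 'sg-db']
```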

Can AI help with infrastructure compliance and policy automation?

Yes, and this is one of the highest-leverage uses of AI in infrastructure. Tools like Pulumi Neo detect policy violations across your footprint (including resources created outside IaC), generate compliant remediation, and apply it with the approvals you require. Pre-built frameworks for CIS, HITRUST, NIST, and PCI DSS shorten what would otherwise be a long manual compliance project.

Are AI infrastructure tools secure for enterprise use?

Enterprise-grade ones are. Look for RBAC, full audit logging of AI actions, preventative policy enforcement (not just detection), and human-in-the-loop approvals for sensitive operations. SOC 2, data residency options, and configurable autonomy levels are table stakes. The risk to avoid is wiring a consumer AI assistant directly into a production cloud account without those controls.

How do I choose between different AI infrastructure tools?

Match the tool to your context: existing clouds and IaC, team skills, compliance requirements, budget. Enterprise platform teams with governance needs should evaluate Pulumi Neo first. MLOps-focused teams should look at Weights & Biases or MLflow. For general code assistance inside the editor, a general-purpose assistant like Copilot, Cursor, or Gemini is the default. Most organizations end up with more than one: a code assistant for daily development and an agentic platform for production infrastructure.

What are the best tools for machine learning infrastructure?

For GPU compute, CoreWeave leads at scale, Modal wins for variable workloads and developer experience, and the hyperscalers are the default pick if you’re already inside one of them. For experiment tracking and model management, Weights & Biases is the leading commercial platform; MLflow is the leading open-source one. Most teams choose based on deployment model and pricing fit rather than on capability gaps. For the cloud infrastructure underneath the ML workloads, the same infrastructure management story applies: Pulumi Neo can provision and govern ML infrastructure the same way it handles everything else.

Conclusion

Two categories, two problems. GPU clouds and MLOps platforms (CoreWeave, Lambda, Modal, the hyperscaler trio, W&B, MLflow) solve the compute and lifecycle problem for running AI workloads. AI-powered infrastructure tools (Neo, Firefly, env0, Spacelift, Crossplane, code assistants, Composer) solve the management problem for everything else.

For GPU workloads, the choice mostly comes down to scale and where you already are. For infrastructure management, the real question is how much you actually want AI to do. Code assistants help you write IaC faster, but you’re still running it. Agentic platforms like Pulumi Neo execute changes and enforce policy on the way through, with the guardrails you control.

The pattern from teams getting real value: treat AI as a force multiplier on routine work (provisioning, drift, compliance) and keep human judgment in the loop for the architecture and the edge cases.

If you want to see agentic infrastructure management running against real resources, start with Pulumi Neo.

Infrastructure as code is the right model for production systems. State tracking, drift detection, and repeatable deployments all matter when you’re managing real workloads.

But sometimes, you also need a quick, one-off interaction with the cloud: create a bucket or a database, look up a VPC, delete a stray resource.

Today we’re introducing pulumi do, a new command for direct resource operations. With pulumi do, you can create, read, update, delete, and query any cloud resource from the terminal with a single command, across thousands of Pulumi-supported providers — no project, code, or state required.

The problem: Sometimes IaC is more than you need

When you’re managing production workloads, IaC is the proven solution. Code lets you declare complex systems, state tracking catches drift before it becomes a problem, dependency graphs sequence changes safely, and policy keeps everything in bounds. That full lifecycle, especially with the backing of a platform like Pulumi Cloud, is exactly what you want to build systems that scale.

But when you (or your coding agent) need an ad-hoc Postgres database, the simplest path with IaC still takes several steps: make a directory, create a project, configure your credentials, write the code, preview, deploy. It works, but it’s not always necessary for what should be a simple operation. pulumi do collapses all of those steps into one, using the same Pulumi providers, resource model, and ecosystem that powers the core Pulumi platform.

Resource creation is also only part of the problem. As Joe laid out in The Agentic Infrastructure Era, the real challenge for AI agents isn’t code or CLI commands; it’s everything around them, such as getting a cloud account, resolving credentials, and wiring configuration across multiple services. Agent accounts, also released this week, simplify this by letting an agent provision its own ephemeral Pulumi Cloud account, and Pulumi ESC takes care of consolidating credentials across providers. Combined with pulumi do, these let agents go from zero to deployed infrastructure without requiring a human in the loop. And when that one-off resource needs to grow into a more permanent system, there’s a clear graduation path back to full Pulumi IaC.

What it looks like

As an example, say you wanted to provision an S3 bucket. With the AWS CLI, you’d need to assemble an aws s3api create-bucket invocation with the right set of command-line flags, region constraints, a globally unique name, and so on. With pulumi do, it’s just this:

```shell
$ pulumi do aws:s3:Bucket create
```

That might not look all that different on the surface — but because you’re using the Pulumi engine and resource model, you can provide a minimal set of input properties, take advantage of provider-defined defaults, and use Pulumi’s auto-naming feature to give the bucket a unique name automatically:

```shell
$ pulumi do aws:s3:Bucket create

This will create aws:s3/bucket:Bucket with the following inputs:
{
 "bucket": "bucket-279ea56",
 "tagsAll": {}
}

Please confirm that this is what you'd like to do by typing `yes`:
```

Answer yes (or just pass --yes), and you’re done. To delete the bucket:

```shell
$ pulumi do aws:s3:Bucket delete bucket-279ea56 --yes
```
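The generated name, bucket-279ea56, comes from Pulumi’s auto-naming: a base name plus a random suffix, so repeated runs don’t collide on globally unique names like S3 bucket names. The sketch below only mimics the pattern visible in that output (base name, hyphen, seven lowercase hex characters). It is an illustration, not Pulumi’s implementation or its exact naming rules.

```python
import re
import secrets

# Matches names shaped like bucket-279ea56 and captures the base name.
AUTO_NAME = re.compile(r"^(?P<base>.+)-[0-9a-f]{7}$")


def auto_name(base: str, suffix_len: int = 7) -> str:
    """Append a random lowercase-hex suffix to a base name."""
    # token_hex(n) returns 2*n hex characters; trim to the requested length.
    suffix = secrets.token_hex((suffix_len + 1) // 2)[:suffix_len]
    return f"{base}-{suffix}"


name = auto_name("bucket")
print(name)  # random on every call
```

The regex is handy in cleanup scripts for recovering the base name from a resource that was auto-named.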

Need to look up an existing resource? Use a provider function:

```shell
$ pulumi do aws:ec2:getVpc --default

{
 "arn": "arn:aws:ec2:us-west-2:663782525873:vpc/vpc-d7b311af",
 "cidrBlock": "172.31.0.0/16",
 "enableDnsHostnames": true,
 "enableDnsSupport": true,
 "enableNetworkAddressUsageMetrics": false,
 "id": "vpc-d7b311af",
 ...
}
```

Same CLI, same output contract, same provider ecosystem.

The command shape

The do command takes a Pulumi type token, which tells it what resource or function to act on. Type tokens have the form <package:module:resource>. For example, aws:s3:Bucket refers to the Bucket resource in the s3 module of the aws package. Provider functions such as aws:ec2:getVpc use the same shape.

You can also provide a portion of the token to help you find what you’re looking for without ever having to leave the terminal:

```shell
$ pulumi do aws:s3

Functions and resources for the s3 module.

Run 'pulumi do <module/resource/function> --help' for more details on usage.

Functions:
  aws:s3:getAccessPoint
  aws:s3:getAccountPublicAccessBlock
  aws:s3:getBucket
  aws:s3:getBucketObject
  ...

Resources:
  aws:s3:AccessPoint
  aws:s3:AccountPublicAccessBlock
  aws:s3:AnalyticsConfiguration
  aws:s3:Bucket
  ...

$ pulumi do aws:s3:Bucket read bucket-d20976f

{
  "arn": "arn:aws:s3:::bucket-d20976f",
  "bucket": "bucket-d20976f",
  "bucketDomainName": "bucket-d20976f.s3.amazonaws.com",
  "bucketNamespace": "global",
  ...
}
```

The package, module, and resource/function segments all come directly from the Pulumi provider schema, so --help works at every level of the tree. Pass a package name, optional module, and optional function or resource type, and do returns the appropriate level of detail.
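Because every token follows the same colon-separated shape, a wrapper script or agent can tell which level of the tree a token addresses before asking for --help. The following is a minimal sketch that assumes the naming convention visible in the aws:s3 listing above, where function names start lowercase (getBucket) and resource names start uppercase (Bucket). The token_level helper is hypothetical and not part of the Pulumi CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a pulumi do token by its segments.
# Assumes functions start lowercase (getBucket) and resources start
# uppercase (Bucket), as in the aws:s3 listing.
token_level() {
  local token="$1" pkg mod member extra
  IFS=: read -r pkg mod member extra <<< "$token"

  if [[ -z "$pkg" || -n "$extra" ]]; then
    echo "invalid"            # empty, or more than three segments
  elif [[ -z "$mod" ]]; then
    echo "package"
  elif [[ -z "$member" ]]; then
    echo "module"
  elif [[ "$member" =~ ^[[:lower:]] ]]; then
    echo "function"
  elif [[ "$member" =~ ^[[:upper:]] ]]; then
    echo "resource"
  else
    echo "invalid"
  fi
}

# Usage: print the level for each token.
for t in aws aws:s3 aws:s3:getBucket aws:s3:Bucket; do
  printf '%-20s %s\n' "$t" "$(token_level "$t")"
done
```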

You can also supply a resource’s input properties in a YAML or JSON file. Use the --input option to name the format and --input-file to point at the file. For example, to create a Cloud Run service on Google Cloud:

```yaml
# service.yaml
location: us-central1
deletionProtection: false
template:
  containers:
    - image: us-docker.pkg.dev/cloudrun/container/hello
```

```shell
$ pulumi do gcp:cloudrunv2:Service create \
    --input yaml \
    --input-file service.yaml

This will create gcp:cloudrunv2/service:Service with the following inputs:
{
  "deletionProtection": false,
  "location": "us-central1",
  "name": "service-b8af752",
  "template": {
    "containers": [
      {
        "image": "us-docker.pkg.dev/cloudrun/container/hello"
      }
    ]
  }
}
```

The result:

```json
{
  "createTime": "2026-05-22T23:00:22.415839Z",
  ...
  "urls": [
    "https://service-b8af752-921927215178.us-central1.run.app",
    "https://service-b8af752-ctnulmzwoa-uc.a.run.app"
  ]
}
```

Resource operations

Most resources support the full set of CRUD operations (create, read, update, delete, and list) directly from the CLI; the example below uses patch for updates. Each operation maps to a provider CRUD method using the same provider logic a full Pulumi program would use, and resources are addressable by their cloud provider IDs:

```shell
# Create a resource
$ pulumi do aws:s3:Bucket create --yes | jq -r ".name"
bucket-4f5cb22

# Fetch it
$ pulumi do aws:s3:Bucket read bucket-4f5cb22 | jq -r ".hostedZoneId"
Z3BJ6K6RIION7M

# Update/patch it
$ pulumi do aws:s3:Bucket patch bucket-4f5cb22 --input yaml --input-file tags.yaml

$ pulumi do aws:s3:Bucket read bucket-4f5cb22 | jq ".tags"
{
  "key": "value"
}

# Delete it
$ pulumi do aws:s3:Bucket delete bucket-4f5cb22
```

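Because create prints the new resource as JSON on stdout, a script can capture the generated name and make sure the resource is deleted even if the work it was created for fails. The following is a minimal sketch for throwaway test buckets that uses only the commands above. The with_scratch_bucket helper is hypothetical, and it assumes, as the example above does, that the create output exposes a name field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a scratch S3 bucket with pulumi do, run a
# command with the bucket name appended, then always delete the bucket.
with_scratch_bucket() {
  local name status
  # JSON is on stdout, so jq can pull out the generated name.
  name="$(pulumi do aws:s3:Bucket create --yes | jq -r '.name')"
  if [[ -z "$name" || "$name" == "null" ]]; then
    echo "could not determine bucket name" >&2
    return 1
  fi

  "$@" "$name"
  status=$?

  # Delete whether or not the command succeeded, then report its status.
  pulumi do aws:s3:Bucket delete "$name" --yes >/dev/null
  return "$status"
}

# Usage: print the hosted zone ID of a fresh bucket, then delete it.
# with_scratch_bucket bash -c 'pulumi do aws:s3:Bucket read "$1" | jq -r .hostedZoneId' _
```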
Provider configuration

Today, pulumi do resolves provider configuration, such as your AWS credentials, from the environment variables and credential files that each Pulumi provider supports. See the Pulumi Registry for provider-specific configuration details.
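Because configuration comes from the environment, a script can pin it per command and fail fast when something is missing. The following is a minimal sketch. AWS_PROFILE and AWS_REGION are standard AWS environment variables that the AWS provider reads. The require_env helper and the staging profile name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check: report every unset or empty exported
# variable, and fail if any are missing.
require_env() {
  local var missing=0
  for var in "$@"; do
    if [[ -z "$(printenv "$var")" ]]; then
      echo "missing required environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (the "staging" profile name is illustrative):
# export AWS_PROFILE=staging AWS_REGION=us-west-2
# require_env AWS_PROFILE AWS_REGION && pulumi do aws:ec2:getVpc --default
```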

Designed for humans and agents

We’ve designed pulumi do to serve humans and coding agents equally well, guided by three fundamental ideas:

  • Consistent command structure across every provider. The do <package:module:type> <operation> pattern is the same for AWS, Azure, Google Cloud, Kubernetes, Cloudflare, Datadog, and every provider, including packages containing higher-level component resources. Once an agent learns that pattern, it applies across the board.

  • Predictable output contract. JSON on stdout, progress on stderr, consistent exit codes. An agent can parse the result programmatically without scraping human-formatted tables.

  • A single CLI command that works across every cloud. Many cloud and SaaS providers don’t have a full CLI at all. pulumi do generates commands from the provider schema, so if a Pulumi provider exists for it, the CLI just works. Neither humans nor agents need to install, learn, or even know about cloud provider-specific tooling.
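The output contract is what makes scripted use safe: check the exit code first, keep stderr out of the parse, and only then treat stdout as JSON. The following is a minimal sketch that relies only on what the list above states: JSON on stdout, progress on stderr, and a nonzero exit code on failure. The run_do helper and its temporary log file are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a pulumi do command, send progress output
# (stderr) to a log file, and print stdout only if the command succeeded.
run_do() {
  local log out status
  log="$(mktemp)"
  out="$(pulumi do "$@" 2>"$log")"
  status=$?
  if (( status != 0 )); then
    echo "pulumi do $* failed with exit code $status; see $log" >&2
    return "$status"
  fi
  rm -f "$log"
  printf '%s\n' "$out"
}

# Usage: the result is plain JSON, ready for jq.
# run_do aws:ec2:getVpc --default | jq -r '.cidrBlock'
```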

What’s next

Resource operations and provider functions are the foundation. The pulumi do roadmap extends the same direct-operation model with credential management, state tracking, and a path to full IaC.

Unified credentials with Pulumi ESC

One of the hardest parts of multi-cloud operations is credential management. Every provider has its own authentication scheme, environment variables, and session lifecycle. An agent working across AWS, Cloudflare, and Datadog today manages three separate credential mechanisms.

We’re building Pulumi ESC integration into pulumi do so you can manage credentials in one place and resolve them everywhere. ESC handles credential resolution (including OIDC-based dynamic credential generation and short-lived tokens) across all of your providers. Name the credential set, reference it, and ESC does the rest, with rotation, RBAC, and audit built in.

Cross-resource references

Real infrastructure has dependencies — subnets need VPCs, security group rules need their security groups, and so on. When you’re building resources one at a time, those references need to flow between commands somehow.

A future version of pulumi do will let resource inputs reference outputs from previously created resources, allowing the CLI to resolve them automatically and preserve the dependency graph. Later, when the time comes to graduate to a full IaC program, the generated code contains proper resource references rather than hard-coded strings.
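Until then, references flow between commands by hand. You capture an ID from one command’s JSON and write it into the next command’s input file, which is exactly the hard-coded string plumbing the planned feature would replace. The following is a minimal sketch. It assumes that aws:ec2:Subnet accepts vpcId and cidrBlock inputs, as the AWS provider schema defines them. The write_subnet_input helper, the file name, and the CIDR block are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an input file for aws:ec2:Subnet that
# references a VPC ID captured from an earlier command.
write_subnet_input() {
  local vpc_id="$1" cidr="$2" file="$3"
  cat > "$file" <<EOF
vpcId: ${vpc_id}
cidrBlock: ${cidr}
EOF
}

# Usage: look up the default VPC, then create a subnet inside it.
# The CIDR is illustrative and must not overlap existing subnets.
# vpc_id="$(pulumi do aws:ec2:getVpc --default | jq -r '.id')"
# write_subnet_input "$vpc_id" 172.31.96.0/20 subnet.yaml
# pulumi do aws:ec2:Subnet create --input yaml --input-file subnet.yaml
```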

Stateful mode and the graduation path

Today, pulumi do is stateless. Each command runs independently. A planned stateful mode will persist resource state across operations, enabling drift detection, lifecycle management, and a graduation path to full infrastructure as code.

Here’s what we’re planning:

  1. Zero setup. Your first pulumi do implicitly creates a project and stack. No manual initialization.

  2. Accumulate resources. Each operation stores resource state. After a few commands, you have a lightweight representation of your infrastructure.

  3. Eject to a full project. When the time comes, generate a Pulumi project in your chosen language with all resources imported and dependency graphs intact.

  4. Connect to Pulumi Cloud. Layer on governance, compliance, team collaboration, and deployment automation through Pulumi Cloud. Resources created via pulumi do can be governed by Pulumi Insights from day one, even before you opt into full IaC.

This path works because pulumi do uses the same providers, resource types, and property schemas as every other pulumi operation. Provisioned cloud resources stay where they are; you add management capabilities as you need them.

Get started

pulumi do ships as a research preview in Pulumi CLI v3.242.0 and later. Install or update the CLI, install a provider plugin, and start running commands. The documentation has the full reference.
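A script that depends on pulumi do can check for a new enough CLI before using it. The following is a minimal sketch. `pulumi version` and `pulumi plugin install resource aws` are existing Pulumi CLI commands, and the comparison uses `sort -V`, which GNU coreutils and recent BSD sort provide. The version_ge helper is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 >= version $2.
# A leading "v" (as printed by `pulumi version`) is ignored.
version_ge() {
  local a="${1#v}" b="${2#v}"
  # sort -V puts the smaller version first; if that is $b, then $a >= $b.
  [[ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" == "$b" ]]
}

# Usage: gate on the research-preview release, then add the AWS provider.
# if version_ge "$(pulumi version)" 3.242.0; then
#   pulumi plugin install resource aws
# else
#   echo "pulumi do needs Pulumi CLI v3.242.0 or later" >&2
# fi
```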

We can’t wait to hear your feedback. Give it a try today, tell us what works (and what doesn’t), and help shape the CLI that agents and humans both reach for first.

This week, Pulumi Neo started working in two more places: GitHub and Slack. The agent that already runs Pulumi tasks from the Cloud console and the terminal now participates in the threads where your team discusses changes.

Mention @pulumi-neo in a pull request or issue and Neo replies in the thread. Mention @Neo in a Slack channel and Neo starts a task, continuing the conversation as you reply.

Neo in GitHub

Mention @pulumi-neo in a pull request description, a top-level or inline review comment, or an issue. Neo sees the diff, the stacks linked to the repository, and their current state. Reviewers can ask Neo to walk through what a proposed change does, including resources that change in stacks the PR doesn’t touch directly. Responses land in the same thread, so the analysis becomes part of the review record and any follow-up stays with it.

Neo in Slack

Mention @Neo in any channel where Neo has been added, and Neo starts a task in the thread. The reply lands in the same thread, and follow-up messages continue the conversation there. The rest of the channel can see what was asked and what Neo found. Neo has the same capabilities here as in the Pulumi Cloud console or the terminal: check stack state, investigate failures, walk through what a change will do, or carry out actions the team has approved.

Integrations in action

A teammate posts in #platform-engineering: “API latency p95 has been climbing for two days, nobody can figure out why.” You reply:

You: @Neo check the production API stack. Anything change in the last 72 hours?

Neo starts a task in the thread, walks the stack history, and finds a configuration change to the load balancer’s idle-timeout setting that landed Friday afternoon. It posts the change, who deployed it, and when. The rest of the channel sees the finding without you having to retell it.

You: @Neo open a PR to revert idle-timeout to the previous value.

Neo edits the stack’s Pulumi program, runs pulumi preview to confirm the change touches only the load balancer, and opens a pull request with the diff and the preview output. A reviewer pulls it up:

Reviewer: @pulumi-neo what else does this change affect downstream?

Neo replies in the same review thread with the resources that change: the listener config and the target group health check. The reviewer reads the reply and approves the PR, and the change ships.

The investigation moved from Slack to GitHub, and both threads keep the record.

Permissions and governance

Whether the conversation starts in GitHub or Slack, Neo runs with the RBAC permissions of your Pulumi Cloud user. Stack-level controls, organization-level guardrails, and audit logging apply the same way they do for a task started from the console. Starting a conversation in a new place doesn’t grant Neo new permissions; it just changes where the conversation happens.

Try it out

Both integrations are available now for Neo-enabled organizations. The GitHub integration docs and Slack integration docs cover the one-time setup. From there, every engineer with a linked Pulumi Cloud identity can mention Neo from the threads they already work in.

Today’s launch is part of a bigger story. Read our launch-day piece on the agentic infrastructure era for the broader vision, the Neo CLI launch post for Neo’s new home in the terminal, and the Neo Integrations post for the MCP servers and cloud CLIs that ship with this release.

As always, we’d love to hear what you think — and if you have any suggestions for places we should put Neo next, file an issue in pulumi-cloud-requests.

Recurring platform work slips: provider versions fall behind, drift accumulates between checks, and the quarterly audit keeps getting pushed back another month. Pulumi Neo can now run any task on a cadence you set, opening a pull request for each run.

Automations in action

Your platform team runs stacks across staging and production, and the AWS, GCP, and Kubernetes providers keep shipping new versions. Nobody has time to bump them stack by stack.

You write one automation:

Every Monday at 8 AM, check the infra/ project for stacks where the AWS, GCP, or Kubernetes provider is more than two minor versions behind. For each one, bump the out-of-date provider, run pulumi preview, and open a PR if the preview is clean.

Monday morning, Neo runs the prompt. It finds three stacks behind on the AWS provider, edits each program, runs preview, and opens a PR for each clean run. You review the PRs like you would any other dependency bump, merge them, and Neo runs again next Monday.
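Pin down the threshold in that prompt before you schedule it. The following is a minimal sketch of one reading of “more than two minor versions behind”: any older major version counts as stale, and within the same major version the minor numbers are compared. The is_stale helper is hypothetical, and the version numbers in the usage lines are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if installed version $1 is more than
# $3 (default 2) minor versions behind latest version $2.
is_stale() {
  local installed="${1#v}" latest="${2#v}" max_lag="${3:-2}"
  local i_major i_minor l_major l_minor
  IFS=. read -r i_major i_minor _ <<< "$installed"
  IFS=. read -r l_major l_minor _ <<< "$latest"

  if (( i_major < l_major )); then
    return 0            # an older major version is always stale
  fi
  (( i_major == l_major && l_minor - i_minor > max_lag ))
}

# Usage (illustrative versions):
# is_stale 6.2.0 6.5.1 && echo "bump needed"    # 3 minors behind
# is_stale 6.4.0 6.5.1 || echo "up to date"     # 1 minor behind
```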

What automations are for

The launch includes four built-in templates: a provider freshness check, an encryption audit, a backup audit, and an activity digest. You can also skip the templates and write your own prompt.

Pick from hourly, daily, weekdays, or weekly cadences. Each automation gets its own page in the Automations tab, where you can edit the prompt, change the schedule, run it once on demand, or pause it.

Safe by default

Automations default to two settings that fit recurring work. Approval mode is auto, so a run doesn’t wait for human confirmation between steps. Permission mode is read-only, so a run can read state and propose changes through pull requests but can’t apply changes directly. You can override either default per automation.

How automations fit with the rest of Neo

A scheduled task uses the same context as an interactive Neo task. Custom Instructions at the organization and project level apply, so a scheduled run respects the same naming conventions, tagging policies, and architecture rules your team has written down.

MCP integrations and CLI integrations work in scheduled tasks the same way they work in interactive ones, so a weekly drift check can query AWS through the aws CLI, file Linear issues, and link related PagerDuty incidents. Scheduled tasks also run with the RBAC permissions of the user who scheduled them, checked at run time; if permissions change between scheduling and execution, the new permissions apply.

Try it out

Open Neo in Pulumi Cloud, switch to the Automations tab, and pick a template or write your own prompt. The automations docs cover the form, scheduling options, and per-automation overrides.

Today’s launch is part of a bigger story. Read our launch-day piece on the agentic infrastructure era for the broader vision, and the Neo Integrations post for the third-party tools and CLIs your automations can use.

As always, we’d love to hear what you think — and if you have any suggestions for automations that’d make Neo even better, file an issue in pulumi-cloud-requests.

Ewan Dawson is CTO of Compostable AI, where five engineers run an AI-native software factory: nineteen clients, custom AWS deployments, most of them shipped within a day of contract signing. This article is adapted from his recent Pulumi webinar, and covers rules in more depth than we had time for on stage.

For the past twenty years, I’ve viewed software development as a craft. The best engineers drew on decades of experience to get every function right.

But two years into the agentic AI revolution, I realised software is going to look more like a factory than a craft. The economics have changed. We can’t treat code as bespoke anymore. To scale, we have to think industrial — use the tools to ship more value with fewer engineers.

I joined Compostable AI soon after it was founded 2.5 years ago, and I built the engineering org AI-native from day one. The technology has come a long way since then, and so has my understanding of what AI-native actually means. Here are seven rules I keep coming back to.

1. Transform, don’t enhance

Going AI-native isn’t an upgrade to your existing process. If you treat AI as a way to hand your developers smarter tools, you leave most of the value on the table. You get the leverage by rebuilding how you write software — and the culture and processes around it.

I know that’s a tall order for a large, mature engineering org. My advice: start small. Pick one team or one business area and run it as a fully AI-native function. Take what you learn and roll it out from there. And do the political work early, especially with your Governance, Risk, and Compliance (GRC) function. Get GRC on your side, or AI becomes a compliance fight instead of a structural advantage.

Don’t bolt AI onto your existing workflow. Redesign the workflow around what agents can do.

Most of the leverage in this technology comes from rebuilding around it. The tool change is the small part.

2. Remove the problem, don’t solve it

Going AI-native flips which problems are hard and which are easy. The right move often isn’t to engineer a solution. It’s to reframe the problem so it goes away.

Here’s an example. When you serve multiple clients with agents writing the code, blast radius isn’t hypothetical. One bad agent run could trash a customer’s database, or leak one client’s data into another’s. Our instinct was to build a secure multi-tenant sandbox with guardrails, approvals, and rollback. But every version we tried still had agents loose in a shared environment, one bug away from making one customer’s data visible to another. So we removed the problem: every client gets two dedicated AWS accounts, one for production and one “digital twin” staging account. Agents iterate on staging until the work checks out. Only then does it ship to production. We have nineteen clients now, each with its own pair of accounts.

Managing that many AWS accounts with five engineers used to be an administrative nightmare. When code is cheap, AWS Control Tower and infrastructure as code with Pulumi make it the easier path.

Remove the problem before you try to solve it.

It’s cheaper to reframe the problem than to engineer your way through it.
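The staging-first rule can be enforced mechanically rather than by convention. The following is a minimal sketch using the standard `pulumi up --stack <name> --yes` command. It assumes that each stack is configured to deploy into its own AWS account. The stack naming scheme (client-staging, client-production), the promote helper, and the check command are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical promotion gate: deploy a client's staging stack, run a
# verification command against it, and deploy production only if it passes.
promote() {
  local client="$1"; shift
  local staging="${client}-staging" production="${client}-production"

  pulumi up --stack "$staging" --yes || return 1
  if ! "$@" "$staging"; then
    echo "checks failed on $staging; production untouched" >&2
    return 1
  fi
  pulumi up --stack "$production" --yes
}

# Usage: ./run-checks.sh is a hypothetical script that tests a stack.
# promote acme ./run-checks.sh
```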

3. Pick tools your agents can drive

Removing problems is the process side. The other side is tooling. If you want an automated factory, your tech stack has to be something agents can drive. This overlaps a lot with tools that have great developer experience. If a tool has a robust API plus a clean CLI, agents can drive it. If it’s heavy click-ops around a web UI, agents stop there.

We didn’t get there first try. Our first IaC tool worked fine when we had a couple of clients. As we added more, accounts drifted, deployments slowed, retries got complicated. We needed something built for where we were heading.

I went looking, and Pulumi fit. We express infrastructure as type-safe code (TypeScript, in our case, rather than HCL), and agents are good at writing it. Pair that with Pulumi Neo, which comes pre-loaded with domain-specific Pulumi skills, and we ship infrastructure that follows best practices. As one of my colleagues put it: “The scary thing about Neo is it just seems to know everything about what we do.” Pulumi IaC plus Pulumi ESC for configuration beats stitching tools together. And TypeScript lets us build higher-level abstractions that keep the AWS account fleet tractable.

“I don’t actually care if it’s HCL or TypeScript, as long as my software development agents can write it. And they do a better job with TypeScript than HCL.”

Tools have to share your AI-native mindset. If they don’t integrate deeply, the human becomes the glue.

If part of your stack still requires a human to click through a web UI to provision an account, your agents stop there.

4. Don’t let one agent do everything

When I first started with agents, I reached for a god prompt: one massive system prompt meant to guide a single agent through the whole software lifecycle. It didn’t work. Agents struggle when you give them multiple goals. The writer is lenient on its own work — it won’t catch what it just shipped. You don’t want it reviewing the code, checking for security flaws, or hunting bugs.

We get better results from a constellation of specialized agents, each handling one part of the line. Pulumi Neo handles infrastructure. Alongside it sit agents specialized in:

  • Code implementation
  • Code review and testing
  • Security auditing
  • Internal standards compliance
  • Documentation updates

Tasks pass down the line. Clean code comes out the other end, with almost no human involved.

Don’t let any agent mark its own homework. Specialize by job.

Treat agents the way you’d treat a team. The one who writes the code shouldn’t be the one signing it off.

5. Measure human hours per unit of value

Once we had agents writing and agents reviewing, throughput went up — but the bottleneck moved past the PR. Engineering hours were still the most expensive thing in the building, so my core metric is human hours per unit of value produced. Minimize that.

That means hunting for every step that still goes through a person — especially the mid-pipeline steps between ideation and production. Automate the human touchpoints along that line, and the factory runs 24/7.

Pushing automation this hard also forces good engineering. A chaotic, undocumented process is impossible to automate. Good engineering is still good engineering, AI or not. Agents won’t fix a weak process.

Measure human hours per unit of value. Treat every one as a bottleneck to remove.

You can’t automate what you can’t describe. Every human in the pipeline marks a piece that hasn’t been described yet.

6. Design for convergence, not one-shot correctness

Even with the human touchpoints removed, the agents don’t ship right the first try. Once you embrace the factory pipeline, you stop needing them to. We design for convergence instead — a system that lands on the right answer through automated iteration.

The loop we run looks like this:

  1. Refinement: agents iterate on the Product Requirements Document until the problem is clear.
  2. Planning: agents draft multiple technical approaches, and evaluation agents pick the best one.
  3. Implementation: coding agents write the software.
  4. Review: specialized checking agents look for bugs, API misuse, and security flaws.

If the checkers find a problem, they hand it back to the implementation agent. The loop repeats until the tests pass and the agents agree on a clean PR. Once it converges, we merge and deploy to staging.

Two things have to be true. You need a way to evaluate the output. Without that, you don’t know when to stop. And the loop has to converge — each pass has to get closer. A checker that fails every PR for a different reason isn’t helping — it just keeps the work going in circles. The feedback has to narrow the search, not widen it.
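Those two conditions translate directly into loop logic: an evaluator that scores each pass, and a guard that stops when the score stops improving. The following is a minimal sketch. It assumes the evaluator prints the number of outstanding findings, with 0 meaning done. The converge helper, the implement and evaluate commands, and the pass limit are hypothetical stand-ins for the agents described above.

```shell
#!/usr/bin/env bash
# Hypothetical convergence loop. $1: maximum passes, $2: implement
# command, $3: evaluate command that prints a count of outstanding findings.
converge() {
  local max_passes="$1" implement="$2" evaluate="$3"
  local pass findings previous=""

  for (( pass = 1; pass <= max_passes; pass++ )); do
    "$implement" || return 1
    findings="$("$evaluate")" || return 1
    [[ "$findings" =~ ^[0-9]+$ ]] || { echo "evaluator printed '$findings', not a count" >&2; return 1; }

    if (( findings == 0 )); then
      echo "converged after $pass pass(es)"
      return 0
    fi
    # Feedback must narrow the search: stop if this pass didn't improve.
    if [[ -n "$previous" ]] && (( findings >= previous )); then
      echo "not converging: $findings findings after $previous" >&2
      return 2
    fi
    previous="$findings"
  done
  echo "gave up after $max_passes passes" >&2
  return 3
}
```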

Once it converges, the question moves on. How cheap can we make it? Lower the time to PR, reduce token count, drop the overall cost. The optimization never really ends.

Don’t aim for one-shot correctness. Design for convergence.

It doesn’t matter how many tries it takes, as long as the loop closes without a human in it. Get convergence first. The optimization comes after.

7. Run the factory in the cloud, not on a laptop

Even a converged factory has to live somewhere. Try running a fully automated factory on individual developers’ laptops, and it falls apart. Laptops are highly trusted machines. Put autonomous agents on them and your security posture drops, fast. And the factory has to run 24/7. Events come from elsewhere — PR comments, Slack threads, errors in test environments.

Cloud also kills configuration drift across a dozen developer machines. The same prompts run against different model versions, and env vars sit half-set on half the laptops. The thing you’re trying to optimize lives in different states across the team. Cloud isn’t just where the factory runs; it’s the only place a team can iterate on it together. Keep everything in one place — AWS, Pulumi Cloud, GitHub. The specific stack matters less than the principle of one place.

And the part that matters most: the factory keeps running, testing, and deploying long after we’ve closed our laptops and gone to sleep.

Build the factory somewhere you can work on it — not just somewhere it can run.

A factory scattered across laptops can’t be improved as a system. Cloud keeps it in one shape, 24/7, and lets the team iterate together.

Closing thought

I’ve shipped more code in the last two years than I did in the fifteen before that. Most of it in languages I couldn’t write by hand. And that’s after a stretch in leadership where I wrote almost none.

If you’re where I was two years ago: don’t ask how AI fits into what you already do. The factory is built one rule at a time, and it isn’t a template. It’s the practice of finding where you’re taking advantage of the new economics, where you’re not, and where your practices still need an update. The leverage is in finding these places and improving them.


Watch the original Pulumi webinar. Learn more about Compostable AI and Pulumi Neo.

Since launching Pulumi Neo, over 4,500 organizations have used it to delegate real infrastructure work: scaffolding, migrating, investigating, operationalizing, and more. Though that usage has come entirely through Pulumi Cloud, we know a large portion of Pulumi users live in the terminal, and increasingly that’s where AI tools run too. Now we’re bringing Neo there.

pulumi neo brings the same Neo experience you’ve had in Pulumi Cloud to your terminal. Running locally means there’s no separate branch to push, no credentials to provision, and no context to paste: Neo picks up the setup you already have.

What local execution unlocks

Neo inherits your setup when it runs locally. The CLIs you’ve authenticated, the environment variables and kubeconfigs you’ve configured, and the project you’re editing right now are all available without any setup on your part. That means Neo can run the same commands you would, against the same systems you have access to.

That makes pulumi neo a fit for paired, interactive sessions where you and Neo work through a problem together. For asynchronous, autonomous tasks you set up and come back to, Pulumi Cloud Neo is still the surface to reach for. Both reach the same Neo.

You can also hand tasks to Neo from other agent sessions. Simply ask your agent, such as Claude Code or Codex, to hand the task off to Neo, and the Neo handoff skill packages the current thread (goal, repo pointers, conversation summary) and starts a Neo task using pulumi neo under the hood. This works anywhere skills are supported, without leaving your current session.

What carries over

Local tools and context are what’s new. The full set of controls you have in Pulumi Cloud Neo applies in the terminal: approval modes (manual, balanced, auto) for tool calls, permission modes (default, read-only) for what Neo can change, and Plan Mode for research and planning before execution.

Integrations carry over too. The integration catalog (connectors to Atlassian, Datadog, Linear, PagerDuty, and others) works the same way from the terminal. Identity, RBAC, and audit all run through your pulumi login, the same way they do in the console. See the Pulumi Neo docs for details.

Get started

pulumi neo ships with the latest Pulumi CLI. To start a session:

  1. Authenticate to Pulumi Cloud with pulumi login.
  2. Run pulumi neo, or pass an initial prompt: pulumi neo "what's in this stack?".

pulumi neo is part of a broader launch on agentic infrastructure. See the pulumi neo command reference and the Pulumi Neo docs for details. “10 things you can do with Neo” is a good starting point for tasks to try. The Pulumi Community Slack is the place for questions and feedback.

Pulumi Neo already understands your infrastructure: your code, your stacks, your state. Today we’re launching new capabilities that extend Neo’s reach in two directions: into the third-party systems your team uses to plan and observe, and out to the cloud CLIs that actually drive your infrastructure.

The first half is MCP integrations: connections to Atlassian, Datadog, Honeycomb, Linear, PagerDuty, and Supabase that show up as tools Neo can call during a task. The second half is CLI integrations: scopable access to aws, gcloud, az, and kubectl. Both are configured once at the org level and available to every Neo task in the organization.

Integrations in action

A PagerDuty alert just fired: RDS storage on payments-prod is at 90% and climbing. You want to know how fast, and whether you can buy yourself any runway before it fills.

You: Neo, RDS storage on payments-prod just paged at 90%. How fast is it growing, and what do we have configured?

Neo pulls the active incident from PagerDuty, decides on its own to check Datadog for the storage-utilization curve over the last 30 days, and runs aws rds describe-db-instances --db-instance-identifier payments-prod through your production-aws CLI integration (the name your org gave its production AWS credentials). The database has been growing about 5 GB a day. The instance has AllocatedStorage at 200 GB and MaxAllocatedStorage also at 200, so storage autoscaling is effectively disabled. At current growth, the disk fills in under four days.

You: Bump max allocated storage to 500. Open a PR.

Neo edits the payments stack’s Pulumi program to raise maxAllocatedStorage from 200 to 500 on the RDS instance, runs pulumi preview to confirm the change is scoped to that one resource, and opens a pull request with the diff, the preview output, and links to the PagerDuty incident and the Datadog graph. You review the PR and merge it. Pulumi applies the change, and Neo posts the resolution back to PagerDuty.

With three integrations and one conversation, the change is reviewed and shipped and the alert is resolved within a few minutes.
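The runway figure in this scenario is back-of-the-envelope arithmetic: headroom divided by daily growth. Here's a minimal sketch with the numbers above. It assumes linear growth and treats MaxAllocatedStorage as the ceiling that storage autoscaling can grow to. It ignores the other conditions RDS checks before it actually scales. daysOfRunway is an illustrative helper, not part of any Pulumi or AWS API.

```typescript
// Days until storage reaches its ceiling, assuming usage grows linearly.
function daysOfRunway(ceilingGb: number, usedGb: number, growthGbPerDay: number): number {
  if (growthGbPerDay <= 0) {
    return Infinity; // flat or shrinking usage never hits the ceiling in this model
  }
  return Math.max(0, ceilingGb - usedGb) / growthGbPerDay;
}

const usedGb = 200 * 0.9; // 90% of the 200 GB allocation when the page fired
const growth = 5;         // roughly 5 GB a day, per the Datadog curve

// MaxAllocatedStorage equals AllocatedStorage (200), so 200 GB is a hard ceiling.
const beforeFix = daysOfRunway(200, usedGb, growth);
// Raising MaxAllocatedStorage to 500 lets autoscaling grow toward 500 GB.
const afterFix = daysOfRunway(500, usedGb, growth);
```

This gives at most four days of headroom from the 90% mark, and less if usage kept climbing after the page. With the cap at 500 GB, the same rate gives roughly 64 days.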

MCP integrations: context from your existing tools

The launch catalog covers six services that show up most often in infrastructure investigations: Atlassian for Jira issues and Confluence runbooks, Datadog for metrics and logs, Honeycomb for traces, Linear for issue tracking, PagerDuty for incidents and on-call schedules, and Supabase for managed database changes. Each connects Neo to a remote MCP server hosted by the provider, so the agent has access to the full set of tools the vendor chooses to expose.

Integrations can be enabled by organization administrators on the Neo Settings page. Once configured, they’re available to every Neo task in your organization.

CLI integrations: live cloud insights

CLI integrations cover what MCP doesn’t reach: live cloud insights. With AWS, GCP, Azure, or Kubernetes connected, Neo can check live database utilization, look up the current state of a running service, verify a service quota before scaling, or reach into resources that aren’t managed by any Pulumi stack.

An admin enables a CLI integration the same way as an MCP one, from your org’s Neo settings. Each integration gets a name your team chooses, like production-aws or staging-gcloud, and tasks reference that name to tell Neo which environment to reach into. You can connect multiple instances of the same CLI (for example, production-aws and staging-aws) so Neo can investigate staging without touching production. Credentials are backed by Pulumi ESC environments your org owns; the CLI integrations docs walk through setup.

Per-task control and failure handling

Both surfaces default to org-wide availability, with per-task overrides. Before starting a task, you can toggle individual MCP integrations off. The toggles only affect that task; the org-level configuration is unchanged.

Failures behave the same way for both. If an integration can’t be reached, Neo logs a warning, skips it, and continues with the rest. A single broken integration doesn’t stop a task. CLI integration connect and disconnect events go to your organization’s audit log, and Neo’s individual CLI calls appear in the task transcript alongside its other tool calls.
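The override and failure rules can be modeled in a few lines. This is a hypothetical sketch of the behavior described above, not Neo's implementation, and the Integration type and connect callback are stand-ins. A task's set is the org-level set minus whatever was toggled off for that task. If an integration fails to connect, it is logged as a warning and skipped, and the task continues.

```typescript
// Hypothetical model; names and shapes are illustrative only.
interface Integration {
  name: string; // e.g. "datadog" or "production-aws"
}

interface TaskSetup {
  available: string[]; // integrations that connected
  warnings: string[];  // one entry per integration that was skipped
}

// Per-task view: the org-level list minus anything toggled off for this task.
// The org list itself is left untouched, so other tasks still see the full set.
function resolveForTask(orgEnabled: Integration[], disabledForTask: string[]): Integration[] {
  return orgEnabled.filter((i) => disabledForTask.indexOf(i.name) === -1);
}

// Connect each integration; a failure becomes a warning, not a task failure.
function connectAll(integrations: Integration[], connect: (i: Integration) => void): TaskSetup {
  const setup: TaskSetup = { available: [], warnings: [] };
  for (const i of integrations) {
    try {
      connect(i);
      setup.available.push(i.name);
    } catch (err) {
      setup.warnings.push(`skipping ${i.name}: ${String(err)}`);
    }
  }
  return setup;
}
```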

Try it out

Both MCP and CLI integrations are available now for Neo-enabled organizations. Open your org’s Neo settings, connect the MCP server or CLI of your choice, and let Neo do the next investigation against the tools you already use. The MCP integrations docs and CLI integrations docs walk through credential setup for each one, and the Neo integrations hub ties it all together.

Today’s launch is part of a bigger story. Read our launch-day piece on the agentic infrastructure era for the broader vision, and the Neo CLI launch post for Neo’s new home in the terminal.

As always, we’d love to hear what you think — and if you have any suggestions for integrations that’d make Neo even better, file an issue in pulumi-cloud-requests.

Last fall, after launching Pulumi Neo, we wrote up 10 things you could do with it. In the months that followed, as platform teams handed Neo more real work, we watched and listened, shipping a steady stream of features like plan mode, read-only mode, AGENTS.md, an integration catalog, cross-cloud migration, and task sharing. With today’s release, Neo extends beyond the Pulumi Cloud console into the Pulumi CLI, GitHub, and Slack.

So here are 10 more things you can do with Neo.

1. Deploy your app to AWS without writing IaC

Hand Neo a repo and a target cloud. Neo picks the right services, writes the Pulumi, and opens a PR.

The cloud infrastructure part of getting a new service running, especially one in a new language, is always a few hours of boilerplate: a VPC and subnets, an IAM role, security groups, a load balancer, DNS, and a TLS cert.

With Neo, that work collapses into a prompt. Point Neo at a repo and ask:

Deploy this app to AWS as a publicly accessible service.

Plan mode comes back with the resources Neo will create, named and sized: ECS Fargate, an ALB, and the VPC wiring. Approve, and Neo writes the Pulumi program, runs a preview, and opens a PR. You, the human in the loop, merge it after review.

Neo planning a PR and deploying an app to AWS.

[Start a Neo task: Ask Neo to deploy your app to AWS and make a PR](https://app.pulumi.com/neo?prompt=I%27d+like+to+deploy+this+app+to+AWS.+Confirm+what+you%27ll+create.)

2. Diagnose a slow API from metrics, logs, and code

Slow endpoints live at the seam between runtime metrics and the stack that runs them. Neo reads both and proposes a fix with the metric evidence as the rationale.

Production incidents often involve multiple tools. When the checkout endpoint’s p95 climbs from 200ms to 1.2s, the metric is in Datadog, but the cause might be somewhere in your AWS account: maybe RDS is out of IOPS, maybe the connection pool is too small, maybe the autoscaler isn’t keeping up. Connecting “this metric looks bad” to a recent backend change and then to a one-line fix in your Pulumi program is an exercise in detective work.

Neo’s integration catalog bridges this gap. With built-in Datadog, PagerDuty, and Honeycomb integrations sitting alongside your Pulumi state, Neo can read traces and metrics from the tools your team already uses and take action.

Ask Neo:

Find the scaling bottleneck on /checkout from the last 7 days of metrics and propose a fix.

Neo pulls the metric history, matches the Datadog tag db.cluster=checkout-rds to the RDS instance in your prod-checkout Pulumi stack, and opens a PR with a Pulumi diff that bumps the storage IOPS and raises the connection-pool ceiling. You review and roll out the fix.

Toggle on the Honeycomb integration so Neo can read traces and metrics alongside your Pulumi stacks.

3. Triage a PagerDuty alert from Slack

A page comes in. You paste it into your on-call channel and tag Neo, and Neo replies with the cross-system view you’d otherwise spend the first 20 minutes assembling.

On-call triage is often about getting up to speed quickly. You get paged because something is in the red, and you don’t know why.

You mention Neo in the on-call Slack channel:

@neo, what’s going on with this alert?

Neo starts querying metrics and traces. With PagerDuty and Datadog in the integration catalog, it correlates the alert with every deploy and stack change tagged with the alert’s service in the last hour, and finds the change that lines up:

Two deploys in the last hour touched services tagged service:checkout: checkout-api@a3f9c2 (12 min ago, app-layer deploy) and Pulumi stack prod-checkout-rds (45 min ago, decreased max_connections from 200 → 100). p99 inflection at 14:03 lines up with the stack change. Likely cause: the connection-pool reduction is starving the API under current load.

You ask a couple of clarifying questions in-thread, then ask Neo to open a rollback PR against the Pulumi stack.

Authorize PagerDuty and Datadog in Neo's settings. Neo can then read alerts in your on-call Slack channel, find the change that correlates, and open a PR when you ask.
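The correlation in the example can be approximated with a simple heuristic. First, keep the changes that carry the alerting service's tag and fall inside the lookback window. Then take the most recent one that landed at or before the metric's inflection point. This is a minimal sketch with hypothetical types. It illustrates the idea and is not how Neo reasons internally. Times are minutes since midnight. 14:47 is an assumed "now" that puts the two changes at 12 and 45 minutes ago, as in the example.

```typescript
interface Change {
  id: string;
  tags: string[]; // e.g. "service:checkout"
  at: number;     // minutes since midnight UTC, to keep the example small
}

// Most recent change tagged `serviceTag` that landed inside the lookback window
// and at or before the inflection point; undefined if none qualifies.
function likelyCause(
  changes: Change[],
  serviceTag: string,
  inflectionAt: number,
  now: number,
  windowMinutes: number,
): Change | undefined {
  let best: Change | undefined;
  for (const c of changes) {
    const tagged = c.tags.indexOf(serviceTag) !== -1;
    const inWindow = c.at >= now - windowMinutes && c.at <= inflectionAt;
    if (tagged && inWindow && (best === undefined || c.at > best.at)) {
      best = c;
    }
  }
  return best;
}

const hm = (h: number, m: number): number => h * 60 + m;
const now = hm(14, 47); // assumed time the alert is examined
const recent: Change[] = [
  { id: "checkout-api@a3f9c2", tags: ["service:checkout"], at: now - 12 },
  { id: "prod-checkout-rds", tags: ["service:checkout"], at: now - 45 },
];
const cause = likelyCause(recent, "service:checkout", hm(14, 3), now, 60);
```

The app deploy is ruled out here because it landed after the 14:03 inflection. That leaves the prod-checkout-rds change. A real investigation weighs more signals than timing alone.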

4. Implement a Linear ticket end-to-end

Hand Neo a ticket number from Linear, Jira, or GitHub Issues. Neo reads the description and acceptance criteria, plans against your stack, and opens a PR.

Tickets often pile up not because they’re unimportant, but because they’re not urgent. Ongoing maintenance quietly accumulates. Bumping a provider version, centralizing secret management, working through small policy violations: each one matters, but none of them ever moves to the top of the queue. Explaining each one to an agent is its own overhead.

The fix is letting Neo read the ticket itself. Connect Linear or Jira through the integration catalog (GitHub Issues works too), and Neo pulls the ticket the same way an engineer would: title, description, acceptance criteria.

Ask Neo:

Implement CAD-1234 in our payments stack.

Neo reads the ticket, plans against your existing stack, opens a PR, and drops a comment back on the ticket. The ticket and the PR end up linked, and your backlog shrinks.

Neo running locally in the Pulumi CLI: fielding a Linear issue, analyzing the codebase, and producing a PR that upgrades multiple projects to the latest Pulumi and AWS provider versions.

[Start a Neo task: Implement a Linear ticket end-to-end](https://app.pulumi.com/neo?prompt=I%27d+like+to+implement+a+ticket+from+Linear+%28or+Jira%2C+or+GitHub+Issues%29.+Ask+me+for+the+ticket+number.)

5. Tighten over-privileged IAM roles

Neo audits each role against what your stack code actually does, and proposes scoped policies that improve your security posture.

IAM cleanup is the kind of work nobody has the time to prioritize. Production has 40 roles. Half of them started with s3:* because nobody had time to scope them, and the cleanup slips quarter to quarter.

Ask Neo:

Audit IAM permissions across my accounts and propose narrower policies for over-privileged stack-managed roles.

Neo cross-references each role’s policy against what the stack code actually calls, and opens a PR per role. The PR body lists the API calls Neo found in the stack code, like s3:GetObject on audit-logs-* and s3:PutObject on audit-logs-staging, as the justification for the scoped policy. The evidence sits next to the diff.

If you’re unclear about which roles count as in-scope or what your team considers over-privileged, start in plan mode and agree on that with Neo first.
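Turning observed calls into a scoped policy is mostly grouping. Collect the resources each action touches, then emit one Allow statement per action. This is a minimal sketch that assumes the calls have already been extracted from stack code as (action, resource ARN) pairs. The ARNs are illustrative versions of the bucket names above. The output follows the standard IAM policy document format.

```typescript
interface ObservedCall {
  action: string;   // e.g. "s3:GetObject"
  resource: string; // ARN the call targets
}

interface Statement {
  Effect: "Allow";
  Action: string;
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: Statement[];
}

// Least-privilege policy: one Allow statement per observed action,
// listing only the resources that action was seen against.
function scopedPolicy(calls: ObservedCall[]): PolicyDocument {
  const byAction: { [action: string]: string[] } = {};
  for (const c of calls) {
    const resources = byAction[c.action] || (byAction[c.action] = []);
    if (resources.indexOf(c.resource) === -1) {
      resources.push(c.resource); // de-duplicate repeated calls
    }
  }
  const statements: Statement[] = Object.keys(byAction)
    .sort()
    .map((action): Statement => ({
      Effect: "Allow",
      Action: action,
      Resource: byAction[action].slice().sort(),
    }));
  return { Version: "2012-10-17", Statement: statements };
}

// s3:GetObject and s3:PutObject are object-level actions, so the ARNs end in /*.
const policy = scopedPolicy([
  { action: "s3:GetObject", resource: "arn:aws:s3:::audit-logs-*/*" },
  { action: "s3:PutObject", resource: "arn:aws:s3:::audit-logs-staging/*" },
  { action: "s3:GetObject", resource: "arn:aws:s3:::audit-logs-*/*" },
]);
```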

Neo auditing an over-privileged IAM role and proposing a narrower policy, with the actually-used permissions as evidence.

[Start a Neo task: Audit IAM and tighten over-privileged roles](https://app.pulumi.com/neo?prompt=Audit+IAM+permissions+across+my+accounts+and+propose+narrower+policies+for+over-privileged+stack-managed+roles.)

6. Migrate from AWS CDK onto your platform’s golden paths

Neo reads your existing CDK app and lands a PR that swaps AWS’s defaults for your team’s published components.

CDK’s L2 constructs encode AWS’s defaults. s3.Bucket with encryption: BucketEncryption.S3_MANAGED is a sane choice, but it’s AWS’s idea of sane, not yours. A platform team that’s published its own components to the Pulumi Private Registry has already decided what your bucket defaults look like: encryption with the right KMS key, tagging by cost center.

Ask Neo:

Migrate the payments-vpc CDK stack to Pulumi using our published components. [1]

Neo reads the source CDK app and your registry side by side. It maps each CDK construct to its closest team-published equivalent, clarifying with you where the mapping is ambiguous.

```typescript
// Before (AWS CDK, AWS's defaults)
import * as s3 from "aws-cdk-lib/aws-s3";

// Inside a Stack constructor, where `this` is the stack scope
const bucket = new s3.Bucket(this, "Assets", {
  bucketName: "payments-assets",
  encryption: s3.BucketEncryption.S3_MANAGED,
  versioned: true,
});
```

```typescript
// After (Pulumi, your team's published component)
import * as platform from "@payments/platform";

const bucket = new platform.Bucket("assets", {
  bucketName: "payments-assets",
  classification: "internal",
});
```
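The mapping step works like a lookup from each CDK construct type to the components your team has published. Whenever the lookup isn't one-to-one, Neo asks you. This is a hypothetical sketch. The registry index, the construct-type keys (written as jsii-style fully qualified names), and every @payments/platform component name other than Bucket are made up for illustration.

```typescript
// Hypothetical index of team components, keyed by CDK construct type.
const registryIndex: { [cdkType: string]: string[] } = {
  "aws-cdk-lib.aws_s3.Bucket": ["@payments/platform.Bucket"],
  "aws-cdk-lib.aws_ec2.Vpc": ["@payments/platform.Vpc", "@payments/platform.SharedVpc"],
};

type Mapping =
  | { kind: "mapped"; component: string }
  | { kind: "ask"; candidates: string[] }; // zero or several fits: check with a human

// Map one construct: a single candidate is used directly; anything else is a question.
function mapConstruct(cdkType: string): Mapping {
  const candidates = registryIndex[cdkType] || [];
  if (candidates.length === 1) {
    return { kind: "mapped", component: candidates[0] };
  }
  return { kind: "ask", candidates };
}
```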

[Start a Neo task: Migrate CDK onto your golden paths](https://app.pulumi.com/neo?prompt=I%27d+like+to+migrate+this+CDK+stack+to+Pulumi.+Use+our+published+components+where+you+can.)

7. Migrate a service to Kubernetes from a runbook

Once the migration pattern is written down, the next service to move is a prompt away.

Containerizing an app and moving it to Kubernetes involves several small decisions: which base image, what labels go on deployments, how ingress is wired, and how secrets reach the pod. But after a team has moved two or three services, the pattern is set. The decisions get written down in a runbook, and every subsequent migration is mostly the same shape.

Ask Neo:

Containerize the billing-api service and write its Kubernetes manifests, following our K8s migration runbook in Confluence.

Neo reads the source repo and the runbook in Confluence via the integration catalog and starts working on your request.

You can save this as a Neo skill that splits the work into multiple PRs (Dockerfile first, ECR config next, Deployment/Service/Ingress manifests after) and links each change back to the runbook convention it follows, for ease of review. The output reflects your conventions: the labels you actually use, the ingress class you’ve standardized on, and the External Secrets Operator config your team prefers.

You’re still the one reviewing the PRs and deciding what the cutover looks like in production. Neo follows your internal standards, so the new service ends up shaped like the last one you migrated.

Neo migrating a VM-based service to Kubernetes step by step, following the team's Confluence runbook.

Once you’ve delegated something a few times, the next move is to automate it. The remaining three tasks are the kind Neo doesn’t need to be asked for. Drift, deps, compliance: they’re the operations you put on a schedule.

8. Schedule daily drift checks across your cloud infrastructure

Schedule a daily drift check across your cloud. Wake up to PRs that fix what changed overnight.

Configuration drift is an ongoing challenge. The security team rotated an IAM role at 04:47 UTC. Someone changed a security group in the AWS console three weeks ago. Left alone, drift turns into security gaps, into compliance issues, and into the kind of “wait, who changed that?” confusion nobody wants to chase down.

Pulumi Cloud is already good at drift detection. Neo takes it a step further.

Ask Neo:

Every morning at 6 AM, check all production infrastructure for drift and create PRs to fix any issues you find.

From then on, the task runs on its own, and you wake up to a PR per drifted resource. The description spells out what happened (iam_role.audit-reader had inline policy AllowReadAuditLogs added at 04:47 UTC) and cites the section of infra/runbooks/drift.md Neo followed.

Some drift gets encoded into the Pulumi program, like the IAM rotation above. Some gets reverted, like the security group rule added from the console. Some gets ignored entirely, like autoscaler-managed Lambda concurrency reservations the runbook tells Neo to skip. You write the runbook once; Neo follows it every morning to decide what to do.
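The runbook in this example boils down to an ordered list of rules. Each rule maps a kind of drift to one of the three outcomes and names the section to cite. This is a hypothetical sketch of that decision step. The rule format, section names, and source values are assumptions, since the post doesn't show what infra/runbooks/drift.md actually contains. The Pulumi type tokens are real, but your runbook decides which properties count as autoscaler-managed.

```typescript
type DriftAction = "encode" | "revert" | "ignore";

interface Drift {
  resourceType: string; // Pulumi type token, e.g. "aws:iam/role:Role"
  property: string;     // the property that changed
  source: string;       // who or what changed it, e.g. "console" or "security-team"
}

interface RunbookRule {
  section: string; // runbook section to cite in the PR body
  action: DriftAction;
  matches: (d: Drift) => boolean;
}

// Ordered rules: the first match wins.
const rules: RunbookRule[] = [
  {
    section: "autoscaler-managed fields",
    action: "ignore",
    matches: (d) => d.resourceType === "aws:lambda/function:Function" && d.property === "reservedConcurrentExecutions",
  },
  {
    section: "IAM rotations",
    action: "encode",
    matches: (d) => d.resourceType === "aws:iam/role:Role" && d.source === "security-team",
  },
  {
    section: "console changes",
    action: "revert",
    matches: (d) => d.source === "console",
  },
];

// Returns the matching rule, or null when the runbook doesn't cover the drift;
// flagging that case for a human is safer than guessing.
function decide(d: Drift): RunbookRule | null {
  for (const r of rules) {
    if (r.matches(d)) {
      return r;
    }
  }
  return null;
}
```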

Neo's morning drift PR. The body names the resource, the change, when it happened, and the section of the runbook Neo followed to decide what to do.

[Start a Neo task: Schedule a daily drift check](https://app.pulumi.com/neo?prompt=Every+morning+at+6+AM%2C+check+all+production+infrastructure+for+drift+and+create+PRs+to+fix+any+issues+you+find.)

9. Schedule weekly upgrades for outdated providers and runtimes

Lambda runtimes and container base images age out. Schedule the upgrade pass; review the PRs Neo opens.

AWS Lambda end-of-life notices come out months ahead. Node 20 stopped receiving runtime updates at the end of April. Python 3.9 reached end of support last December. After the deprecation date, a runtime no longer gets security patches, and AWS later blocks creating and then updating functions that use it. Existing functions keep running, but unpatched. Each one needs to move to a supported runtime before the cutoff. [2]

Schedule it:

Every Sunday night at 10 PM, check our Lambdas for runtimes nearing end-of-support and open PRs to upgrade them.

Neo reads the AWS Lambda runtime deprecation page, matches the end-of-support runtimes against every Lambda in your stacks, and opens one PR per stack.

If Python 3.9 is reaching end of support, the upgrade is to Python 3.12. In 3.12, datetime.utcnow() is deprecated, so those calls should move to the timezone-aware datetime.now(timezone.utc). Neo can make all of those replacements in the same PR.
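Matching runtimes to stacks is a join between a deprecation table and the functions each stack manages. The results are grouped so that each stack gets one PR. This is a minimal sketch that assumes a hypothetical LambdaFn record per function. The sample dates follow the schedule described above (Python 3.9 in December 2025, Node.js 20 at the end of April 2026). The nodejs22.x upgrade target is an assumption. In practice, read both from AWS's runtime deprecation page rather than a hardcoded table.

```typescript
interface LambdaFn {
  stack: string;   // Pulumi stack that manages the function
  name: string;
  runtime: string; // Lambda runtime identifier, e.g. "python3.9"
}

interface Deprecation {
  endOfSupport: string; // ISO date, YYYY-MM-DD
  upgradeTo: string;
}

// Sample table; confirm dates and targets against AWS's runtime deprecation page.
const deprecations: { [runtime: string]: Deprecation } = {
  "python3.9": { endOfSupport: "2025-12-15", upgradeTo: "python3.12" },
  "nodejs20.x": { endOfSupport: "2026-04-30", upgradeTo: "nodejs22.x" },
};

// Functions whose runtime reaches end of support within `lookaheadDays` of `today`
// (or already has), grouped by stack so each stack gets one PR.
function upgradesByStack(fns: LambdaFn[], today: Date, lookaheadDays: number): { [stack: string]: string[] } {
  const horizon = today.getTime() + lookaheadDays * 24 * 60 * 60 * 1000;
  const plan: { [stack: string]: string[] } = {};
  for (const fn of fns) {
    const dep = deprecations[fn.runtime];
    if (!dep || Date.parse(dep.endOfSupport) > horizon) {
      continue; // not deprecated, or not within the window yet
    }
    (plan[fn.stack] = plan[fn.stack] || []).push(`${fn.name}: ${fn.runtime} -> ${dep.upgradeTo}`);
  }
  return plan;
}
```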

The same task can catch container base images with critical CVEs and bump them too.

Setting up a weekly task in the Scheduled Tasks UI. Once saved, Neo runs the prompt every Sunday night and opens PRs you review on Monday.

[Start a Neo task: Schedule a weekly runtime upgrade check](https://app.pulumi.com/neo?prompt=Every+Sunday+night+at+10+PM%2C+check+our+Lambdas+for+runtimes+nearing+end-of-support+and+open+PRs+to+upgrade+them.)

10. Fix CIS Benchmark failures with daily PRs

Run the benchmark on a schedule. Wake up to PRs that fix what failed.

The CIS AWS Foundations Benchmark, available through AWS Security Hub, is something every team should be keeping an eye on. The benchmark finds issues like S3 buckets that allow public read access (S3.1), root user access keys that shouldn’t exist (IAM.4), or CloudTrail not being enabled (CloudTrail.1). Detecting these issues is a solved problem; fixing them is not. They pile up between audits because each one is a code change in a different stack, and nobody owns the cross-stack cleanup. [3]

Schedule the cleanup:

Every morning, read CIS Benchmark failures from Security Hub. For every failure on an IaC-managed resource, open a PR with the fix.

Neo opens one PR per failure. A bucket failing S3.1 arrives as a Pulumi diff that adds blockPublicAccess to the bucket in your prod-checkout stack. The PR body lists the CIS rule number, the resource ID, the diff, and a clean pulumi preview against the live infrastructure.

The runbook is where your security team writes down what each control means for your stacks. Block public S3 buckets, except the ones tagged public-content=true for CloudFront origins. Don’t auto-touch the break-glass IAM roles; page a human instead. Multi-region CloudTrail stays on, no exceptions. Neo reads that file, checks each Security Hub finding against it, and only opens a PR for the ones you’ve said are safe to fix. The rest get routed or ignored, the way your team already handles them.
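Checking each finding against the runbook comes down to a routing decision: open a PR, page a person, or leave the finding alone. This is a hypothetical sketch that encodes the three example rules above. The Finding shape is a simplified stand-in, since Security Hub's real findings use the much larger AWS Security Finding Format. It also assumes that break-glass roles can be recognized by name.

```typescript
type Route = "open-pr" | "page-human" | "skip";

interface Finding {
  controlId: string;               // Security Hub control ID, e.g. "S3.1"
  resourceId: string;              // e.g. a bucket or role ARN
  tags: { [key: string]: string };
  iacManaged: boolean;             // true if a Pulumi stack manages the resource
}

function route(f: Finding): Route {
  // Break-glass roles are never changed automatically; a person decides.
  if (f.resourceId.indexOf("break-glass") !== -1) {
    return "page-human";
  }
  // Resources outside IaC have no program to patch.
  if (!f.iacManaged) {
    return "skip";
  }
  // Buckets that are deliberately public as CloudFront origins are exempt.
  if (f.controlId === "S3.1" && f.tags["public-content"] === "true") {
    return "skip";
  }
  // Everything else, CloudTrail included, gets a fix PR.
  return "open-pr";
}
```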

A PR raised by Neo to fix a CIS Benchmark failure, with the failing rule, the resource, and the runbook decision laid out in the body.

[Start a Neo task: Schedule a daily compliance scan](https://app.pulumi.com/neo?prompt=Every+morning%2C+verify+all+resources+meet+our+compliance+policies+and+create+PRs+to+fix+violations.)

Neo: your newest platform engineer

Over the past year, many product teams have stopped treating AI as a request-by-request assistant and started delegating to it outright. Agents open pull requests, investigate issues, and iterate on review feedback.

But platform engineers have held back, because a bad infrastructure change doesn’t just fail; it can take production down. Coding agents benefit from fast, forgiving feedback loops, but infrastructure recovery is rarely as simple as reverting a commit.

What was missing wasn’t the appetite. It was an agent with enough organizational context and grounding to plan reliably, enough guardrails to feel safe and contain mistakes, and enough discipline to keep working without being asked.

The theme across these tasks is clear. A thing platform engineers used to keep in their heads becomes a task you delegate, then becomes work that runs without you. Neo isn’t generating infrastructure from a template. It’s a teammate who knows your code, your providers, your conventions, your production metrics, and can raise PRs for you to review.

Neo now lives in your terminal, in your pull requests, in your Slack workspace, and in Pulumi Cloud. Pick one of these workflows and give it a try.


  1. The observant reader will notice Terraform-to-Pulumi was covered in the original post.

  2. Also covered in the original post. Last year you could ask Neo to do it once. This year you can put it on a schedule.

  3. Also covered in the original post. Last year Neo could remediate violations on demand. This year Security Hub feeds findings to a scheduled task that knows your runbook’s interpretation of each control.

AI agents do a lot of their work through CLIs. They’re easier to call than HTTP APIs and they produce predictable output. Over the last few months our own CLI traffic has shifted from mostly people typing commands to people and agents running commands together, often in the same session.

Today we’re shipping a release built for both. The Pulumi CLI is reorganized around three ideas: the right command should be the one you can guess, anything you can do in Pulumi Cloud should also be doable from the terminal, and what comes back should be just as readable to an agent as it is to a person.

Designing for guessability

The bar we set was that both developers and coding agents should be able to guess at the right command for a particular task: pulumi env edit to modify an environment, pulumi stack get to see what’s going on with a stack, pulumi org member list to see who’s on the team. If we had to explain which command did what, the usability bar hadn’t been met.

Branches in the tree are now singular nouns like stack, env, org, and deployment. Leaves are now verbs from a canonical vocabulary — list, get, set, new, edit, remove — and they mean the same thing wherever they’re used. edit always means modify an existing thing. Wherever the old vocabulary differed, though, the old name still works: ls, rm, update, and open are all aliased to preserve backward compatibility.

For the most part, product names have also been replaced with familiar nouns. Users (human or otherwise) don’t think in product names; they think in terms of resources, stacks, and environments. Take Pulumi ESC: the product may be named ESC (and for a while the command was too), but nobody thinks “I need to initialize a new ESC.” They think “I need to create a new environment.” The command is therefore pulumi env new, with esc init preserved as an alias so existing workflows keep working.

```shell
$ pulumi env new my-project my-env
Environment created.
```

All of Pulumi Cloud in the terminal

Up to now, most of what you could do with Pulumi Cloud had to be done either in the browser or through direct API calls. Things like reviewing deployments, setting up webhooks, finding non-compliant resources, or managing deployment settings all required you to break out curl and hit the API docs or open a browser and navigate the Pulumi Cloud console.

That changes today. Pulumi Cloud is now fully accessible from the command line through the pulumi CLI, with consistently named nouns and verbs aligned to what you’d expect:

  • pulumi stack get returns a complete stack overview, including metadata, the resource list, and more:

    ```shell
    $ pulumi stack get \
      --stack cnunciato/chris.nunciato.org/production \
      --output json | jq -r ".resources[].type" | grep "aws:s3"

    aws:s3:BucketEventSubscription
    aws:s3/bucket:Bucket
    aws:s3/bucket:Bucket
    aws:s3/bucketPublicAccessBlock:BucketPublicAccessBlock
    aws:s3/bucketWebsiteConfiguration:BucketWebsiteConfiguration
    aws:s3/bucketOwnershipControls:BucketOwnershipControls
    aws:s3/bucketNotification:BucketNotification
    ```

    … with other stack-related commands like pulumi stack history get events, pulumi stack drift list, pulumi stack schedule new, and pulumi stack webhook new alongside it.

  • Organizational commands like pulumi org member list, pulumi org role list, pulumi org usage get, and pulumi org audit-log export can help you dig into the details when you need to as well.

  • Deployment-related commands like pulumi deployment list, get, log, and cancel let you see what’s running, dive into what happened, and take action without having to leave the terminal.

    ```shell
    $ pulumi deployment list \
      --stack cnunciato/chris.nunciato.org/production \
      --output table

    ┌──────────────────────────────────────┬───────────┬─────────┬───────────┬──────────────┬─────────────────────────┐
    │ ID                                   │ OPERATION │ VERSION │ STATUS    │ INITIATED BY │ MODIFIED                │
    ├──────────────────────────────────────┼───────────┼─────────┼───────────┼──────────────┼─────────────────────────┤
    │ 83e44b8c-643c-4e9f-9f36-0c6a81d9db2e │ update    │ 140     │ running   │ cnunciato    │ 2026-05-17 21:26:37.340 │
    │ 52a37cbe-b7fd-4027-8e0f-7b4785ab12e8 │ update    │ 139     │ succeeded │ cnunciato    │ 2026-05-16 23:36:07.999 │
    │ 94e04525-b3a4-42b5-9987-e344018a3324 │ preview   │ 138     │ succeeded │ cnunciato    │ 2026-05-16 23:29:19.709 │
    └──────────────────────────────────────┴───────────┴─────────┴───────────┴──────────────┴─────────────────────────┘
    ```

  • And when you need to query across managed (and even unmanaged) resources, pulumi insights resource search and get can help you find what you’re looking for quickly:

    ```shell
    $ pulumi insights resource search \
      --query 'type:aws:s3/bucket:Bucket org:cnunciato project:photomap stack:dev' \
      --output table

    ┌──────────────────────────────────────────────────────────────────────────┬──────────────────────┬───────┬──────────────────────────┐
    │ URN                                                                      │ TYPE                 │ STACK │ MODIFIED                 │
    ├──────────────────────────────────────────────────────────────────────────┼──────────────────────┼───────┼──────────────────────────┤
    │ urn:pulumi:dev::photomap::aws:apigateway:x:API$aws:s3/bucket:Bucket::api │ aws:s3/bucket:Bucket │ dev   │ 2020-10-31T00:39:47.926Z │
    │ urn:pulumi:dev::photomap::aws:s3/bucket:Bucket::images                   │ aws:s3/bucket:Bucket │ dev   │ 2020-10-31T00:39:47.926Z │
    └──────────────────────────────────────────────────────────────────────────┴──────────────────────┴───────┴──────────────────────────┘

    Showing 2 of 2 resources.
    ```

Flags and output formats are consistent across commands (--output table, json), as are the shapes of cross-cutting features like webhooks. If you’ve used pulumi stack webhook, for example, you already know how to use pulumi env webhook and pulumi org webhook, and so on.
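Because --output json is meant to be a scriptable contract, the jq filter from the pulumi stack get example translates directly into code. This is a minimal sketch that assumes only the shape that filter implies: a top-level resources array whose items have a type field. It takes the JSON text as a string, so you'd pass in whatever you captured from the command.

```typescript
// Only the fields the jq filter relies on; the real output carries much more.
interface StackOverview {
  resources: { type: string }[];
}

// TypeScript equivalent of: jq -r ".resources[].type" | grep "aws:s3"
function typesMatching(stackJson: string, needle: string): string[] {
  const overview = JSON.parse(stackJson) as StackOverview;
  return overview.resources
    .map((r) => r.type)
    .filter((t) => t.indexOf(needle) !== -1); // substring match, like grep
}
```

Capture the output of pulumi stack get --stack <org>/<project>/<stack> --output json and pass it in. typesMatching(output, "aws:s3") should then return the same list as the shell pipeline, duplicates included.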

Direct access to the Pulumi Cloud API

For any features of Pulumi Cloud that don’t yet have their own commands, you’ve also got pulumi api. It’s a gh api-inspired command designed to give you direct access to the full REST API, without having to manage separate access tokens, auth settings, or request/response payloads. Everything is handled for you through your authenticated pulumi CLI.

There’s even pulumi api list, which enumerates every single endpoint that’s exposed:

$ pulumi api list

┌───────────────┬────────┬───────────────────────────────────────┬──────────────────────────────┐
│ TAG           │ METHOD │ PATH                                  │ SUMMARY                      │
├───────────────┼────────┼───────────────────────────────────────┼──────────────────────────────┤
│ AccessTokens  │ GET    │ /api/orgs/{orgName}/tokens            │ ListOrgTokens                │
│ AccessTokens  │ POST   │ /api/orgs/{orgName}/tokens            │ CreateOrgToken               │
│ AccessTokens  │ DELETE │ /api/orgs/{orgName}/tokens/{tokenId}  │ DeleteOrgToken               │
│ AccessTokens  │ GET    │ /api/user/tokens                      │ ListPersonalTokens           │
│ AccessTokens  │ POST   │ /api/user/tokens                      │ CreatePersonalToken          │
│ AccessTokens  │ DELETE │ /api/user/tokens/{tokenId}            │ DeletePersonalToken          │
...

537 operations. Pass --output=json for a stable, scriptable contract.

To get the details about a particular API, use pulumi api describe:

$ pulumi api describe 'DELETE /api/user/tokens/{tokenId}'  # or DeletePersonalToken

DELETE /api/user/tokens/{tokenId}
Tag: AccessTokens
Operation: DeletePersonalToken

DeletePersonalToken

Permanently deletes a personal access token by its identifier. The token is immediately
invalidated and can no longer be used for authentication. Returns 204 on success or 404
if the token does not exist.

Parameters:
  [path] tokenId* (string) — The access token identifier

All requests are made through your authenticated pulumi CLI:

$ pulumi login
Logged in to pulumi.com as cnunciato.

$ pulumi whoami
cnunciato

$ pulumi api /api/user/tokens/2cf15c7d-afad-458f-ace0-fc7ff0512b10 \
    --method DELETE && echo "Token deleted."
Token deleted.

Newly published endpoints are available through pulumi api immediately, so you don’t have to wait for a new CLI release before you can start using them. See the Pulumi Cloud REST API documentation to learn more.

Finding templates in the Pulumi Cloud Registry

Finding out which templates are available to you through your Pulumi organization used to mean navigating to the Pulumi Cloud Registry and searching there. The new pulumi template commands make this easier by letting you ask for what’s available right from the shell, either by fetching the full list or by filtering with the --name or --search flags:

$ pulumi template list --search "container typescript" --org cnunciato

┌─────────────────────────────────────────────┬────────┬────────────┬────────────┐
│ Name                                        │ Source │ Language   │ Visibility │
├─────────────────────────────────────────────┼────────┼────────────┼────────────┤
│ pulumi/templates/container-aws-typescript   │ github │ typescript │ public     │
│ pulumi/templates/container-azure-typescript │ github │ typescript │ public     │
│ pulumi/templates/container-gcp-typescript   │ github │ typescript │ public     │
└─────────────────────────────────────────────┴────────┴────────────┴────────────┘

This is especially useful when you’re working with an agent, because it lets the agent discover your org’s approved templates without you having to name them. Start with a prompt that tells the agent what you want to build, and let the agent find the right template for you.

Agent-friendly Markdown docs for providers and components

Both humans and agents need to be able to understand what’s inside a Pulumi package before they can use it. And while the Registry is an excellent resource for that, it was mainly designed to deliver HTML — a human-friendly format that agents can certainly use, but that’s much more verbose than they actually need.

With pulumi api, agents can fetch the details about a package from the Registry directly and get them back in either Markdown or JSON, whichever works best, filtering on properties like language where applicable:

$ pulumi api "/api/registry/packages/pulumi/pulumi/random/versions/4.19.1"

{
  "name": "random",
  "publisher": "pulumi",
  "publisherDisplayName": "Pulumi",
  "source": "pulumi",
  "version": "4.19.1",
  "description": "A Pulumi package to safely use randomness in Pulumi programs.",
  "repoUrl": "https://github.com/pulumi/pulumi-random",
  ...
}

$ pulumi api "/api/registry/packages/pulumi/pulumi/random/versions/4.19.1/docs/random%3Aindex%2FrandomPassword%3ARandomPassword" \
    --output markdown

# RandomPassword

resource `random:index/randomPassword:RandomPassword`

## Example Usage

package main
...

Resources are individually addressable using their URL-encoded Pulumi type tokens — e.g., random:index/randomPassword:RandomPassword — and API endpoints are configured to deliver Markdown when agents ask for it:

$ curl "https://api.pulumi.com/api/registry/packages/pulumi/pulumi/random/versions/latest/readme?lang=python" \
    -H "Accept: text/markdown"

# Installation

The Random provider is available as a package in all Pulumi languages:
...
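To make the addressing scheme concrete, here is a minimal Python sketch that builds a docs URL for one resource by URL-encoding its full type token, then prepares (but does not send) a request that asks for Markdown through the Accept header. It uses only the standard library. The path layout is copied from the pulumi api example above; pointing that docs path at api.pulumi.com directly, the way the curl example does for /readme, is an assumption, and the function names are illustrative.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: the same public base URL the curl example uses for /readme.
API_BASE = "https://api.pulumi.com"


def registry_docs_url(source: str, publisher: str, package: str,
                      version: str, type_token: str) -> str:
    """Build a per-resource docs URL, mirroring the `pulumi api` path above."""
    # safe="" makes quote() encode ":" and "/" as well, so the whole
    # type token stays a single path segment.
    encoded = quote(type_token, safe="")
    return (f"{API_BASE}/api/registry/packages/{source}/{publisher}/{package}"
            f"/versions/{version}/docs/{encoded}")


def markdown_request(url: str) -> Request:
    """Prepare a GET request that asks the endpoint for Markdown."""
    return Request(url, headers={"Accept": "text/markdown"})


if __name__ == "__main__":
    url = registry_docs_url("pulumi", "pulumi", "random", "4.19.1",
                            "random:index/randomPassword:RandomPassword")
    req = markdown_request(url)
    print(req.full_url)
    print(req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would actually fetch it; the sketch stops short of that so it runs offline.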

Even compared to JSON (which is itself a significant improvement over HTML), Markdown is a much more token-efficient format for agents to work with:

Package       Endpoint                JSON      Markdown  Tokens saved
random        /readme                 10.68 KB  6.04 KB   43%
aws           /readme                 4.22 KB   2.54 KB   40%
aws           /nav?depth=full         204 KB    170 KB    17%
aws           /docs/{resource token}  15.24 KB  11.28 KB  26%
azure-native  /docs/{resource token}  48.13 KB  30.37 KB  37%
aws           /docs/{function token}  2.40 KB   1.46 KB   39%
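The percentages follow directly from the payload sizes: each one is (JSON - Markdown) / JSON, rounded to a whole percent. A quick Python check against the figures in the table above:

```python
# Sizes in KB from the table above: (package, endpoint, JSON, Markdown).
MEASUREMENTS = [
    ("random", "/readme", 10.68, 6.04),
    ("aws", "/readme", 4.22, 2.54),
    ("aws", "/nav?depth=full", 204.0, 170.0),
    ("aws", "/docs/{resource token}", 15.24, 11.28),
    ("azure-native", "/docs/{resource token}", 48.13, 30.37),
    ("aws", "/docs/{function token}", 2.40, 1.46),
]


def percent_saved(json_kb: float, markdown_kb: float) -> int:
    """Size reduction relative to the JSON payload, rounded to a whole percent."""
    return round(100 * (json_kb - markdown_kb) / json_kb)


if __name__ == "__main__":
    for package, endpoint, json_kb, md_kb in MEASUREMENTS:
        print(f"{package:<13} {endpoint:<23} {percent_saved(json_kb, md_kb)}%")
```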

Learn more about our Registry endpoints in the REST API docs. (Or just ask your agent!)

New to the CLI: Pulumi Neo

When we launched Pulumi Neo last year, the only way to use it was in the Pulumi Cloud Console. But while there’s a ton you can do with Neo in the browser, if you’re an engineer already living in the terminal, chances are that eventually you’re going to wish you had Neo right in the CLI along with you.

Now you do. Running pulumi neo with or without a prompt launches a Pulumi Cloud-connected session that gives Neo access to your local environment just like any other coding agent. Use it on its own to scaffold a new project, understand an existing codebase, or debug a failing deployment — or pull it into an active session with the coding agent you’re already using. Either way, it stays in the shell you’re already working in.

We’ll cover Neo in the CLI in more detail later this week.

Smaller changes that add up

A long list of smaller changes also runs through this release:

  • The core loop now speaks JSON end to end, with pulumi up, pulumi destroy, and pulumi import all emitting structured JSON output when called with --output json.

  • Streams now behave the way scripts expect them to: data goes to stdout, and progress and diagnostics go to stderr.

  • Exit codes are more consistent across the board. Every failure mode — auth, resource, policy, missing stack, cancellation, timeout, and others — has its own exit code, so agents can branch on the actual cause instead of having to interpret output. The full table is in the docs.

  • Help text explains why a command exists, not just what it does, and includes at least one concrete example. Examples in --help are one of the most effective ways to improve LLM accuracy on first-try invocations — and it turns out they’re pretty handy for humans, too.
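Together, separate streams and cause-specific exit codes let a script or agent treat a CLI call much like a function call. Here is a minimal Python sketch of a wrapper that does exactly that, assuming stdout holds a single JSON document on success (for example, pulumi up with --output json as described above). The exit-code table is passed in by the caller: the mapping in the example is hypothetical and stands in for the real values in Pulumi’s docs, and run_cli is an illustrative name.

```python
import json
import subprocess
import sys
from typing import Any, Mapping, Sequence


def run_cli(args: Sequence[str], causes: Mapping[int, str]) -> dict[str, Any]:
    """Run a command and return its exit cause, diagnostics, and parsed data.

    `causes` maps exit codes to labels and is supplied by the caller.
    """
    proc = subprocess.run(list(args), capture_output=True, text=True)
    code = proc.returncode
    result: dict[str, Any] = {
        "code": code,
        # Branch on the exit code rather than scraping output text.
        "cause": "ok" if code == 0 else causes.get(code, "unknown"),
        # Progress and diagnostics arrive on stderr, kept apart from the data.
        "diagnostics": proc.stderr,
    }
    # Data arrives on stdout; parse it only on success (assumes one JSON document).
    if code == 0 and proc.stdout.strip():
        result["data"] = json.loads(proc.stdout)
    return result


if __name__ == "__main__":
    # Stand-in child process and a HYPOTHETICAL code mapping, for illustration only.
    hypothetical_causes = {3: "auth"}
    child = [sys.executable, "-c",
             "import sys; sys.stderr.write('not logged in'); sys.exit(3)"]
    outcome = run_cli(child, hypothetical_causes)
    print(outcome["cause"], "|", outcome["diagnostics"])
```

Parsing stdout only on success keeps a half-written payload from being mistaken for data; on failure, the agent gets the named cause plus the stderr text to reason about.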

A sneak peek at a new command

Later this week, you’ll get a closer look at pulumi do, a new top-level command that enables direct resource operations like create, read, update, delete, and list across every Pulumi-supported cloud provider and resource, all in one command. A simple example:

$ pulumi do aws getAvailabilityZones

{
  "groupNames": [
    "us-west-2-zg-1"
  ],
  "id": "us-west-2",
  "names": [
    "us-west-2a",
    "us-west-2b",
    "us-west-2c",
    "us-west-2d"
  ],
  "region": "us-west-2",
  "zoneIds": [
    "usw2-az2",
    "usw2-az1",
    "usw2-az3",
    "usw2-az4"
  ]
}

It might look like that’s calling the AWS CLI, but it’s not — it’s using the same AWS provider function a full Pulumi program would use, only without the program, and invoked directly from the CLI.

More on how it works, and what you can do with it, in the days ahead.

Try it yourself

A lot of what makes a developer tool worth using is in the details, and most of what’s in this release is exactly that, across the whole CLI, with humans and agents in mind.

We’d love for you to grab the latest release and give it a try. Tell us what’s now easy, what’s still hard, and what to fix next on GitHub or in the community Slack. The fastest way the CLI gets better is feedback from the humans and agents who live in it.

Twelve months ago, building an AI agent meant picking a framework, defining your tools, standing up a RAG pipeline, and writing a stack of glue code to wire it all together. That was the default playbook. The post-mortem on six months of work usually went the same way: half the time went into infrastructure that had nothing to do with the agent’s actual job.

That isn’t where the work is anymore. Most of the middle layer is gone. The SDKs ship with the tools, the skills system replaced the upfront tool registry, and longer context windows pushed vector search out of the default slot it held all of last year.

The shape is the same as a lot of infrastructure shifts before it. The hard thing got cheap, the cheap thing got expected, and the question moved up a level.

The old playbook

A 2024 to 2025 agent project looked like this. You picked a framework, usually LangChain, LlamaIndex, or an early version of Pydantic AI. You wrote tool definitions, usually a wrapper around an API the agent would call. You stood up a RAG pipeline: chunk your documents, embed them, pick a vector database, write retrievers, layer reranking on top. Then you wrote the agent loop yourself, including prompt assembly, tool dispatch, retry logic, and observability.

This was the default for good reasons. Foundation models had short context windows. They didn’t ship with file access. They couldn’t run code. If you wanted an agent to do anything useful with your data, you had to bring the data to the model in pre-digested chunks.

The cost wasn’t only setup time. It was infra bills, retries against embedding APIs, and a context strategy that fought the model as the model got better. By mid-2025 the retrieval layer was often the bottleneck on quality. The agent would ask a question, get five plausible-looking chunks, and answer from those instead of the document you actually wanted it to read. Chunking decisions made on a Tuesday in March were still hurting answer quality six months later.

Most teams I talked to in 2025 were tuning their RAG pipeline. Almost nobody enjoyed it.

The shift: three things changed at once

Three changes landed close enough together that they collapsed the middle layer.

Built-in tools. The Claude Agent SDK ships with Read, Write, Edit, Bash, Grep, Glob, WebSearch, and WebFetch out of the box. OpenAI’s Codex SDK is similar in shape, with shell and file tools available to the agent by default. These are the tools every agent project was rebuilding in 2024, often as a side quest to the work the agent was actually meant to do: a Read that handles binary files, a Bash that streams output and respects the working directory, a Grep that doesn’t choke on large files. The 80% of agent tooling everyone was paying their team to reimplement is now table stakes.

The consequence is that you can give an agent the ability to do real work with about ten lines of configuration. The flip side is that the differentiator moved up a layer. The value isn’t in having Read. It’s in what the agent does with it.

Anything outside the built-in toolbox plugs in through MCP servers. The registry has grown nearly 8x since early 2025, and every major model vendor now ships first-party support. The picture in 2026 is more layered than that, though. A lot of what used to call for an MCP server is now better served by the agent invoking a CLI through Bash and wrapping the recipe in a skill. Benchmarks put CLI-based tool calls at a fraction of the context cost of equivalent MCP calls, with fewer round-trips and fewer failure modes. MCP still earns its place for protocol-heavy work like browser control, OAuth flows, and streaming services, but it stopped being the automatic answer to “how do I give my agent a new capability.”

Skills replaced tool stuffing. The old way was to register every tool the agent might need at startup, eating context every turn whether the agent used the tool or not. A hundred tools meant a heavy system prompt before the agent had thought about anything. The skills pattern flips that. A skill is a small markdown package with a name and a one-line description. The agent sees the description (around 100 tokens) and only loads the body when it decides the skill is relevant. A hundred skills no longer means a hundred tools’ worth of context tax. Anthropic frames this as progressive disclosure: because the body only loads on demand, the amount of content you can bundle into a single skill is effectively unbounded.

Progressive disclosure isn’t a new idea. What’s new is that the agent harness now treats it as the default loading strategy instead of something you have to engineer.
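Here is a minimal Python sketch of that loading strategy, assuming the folder-per-skill layout where each skill lives in its own directory with a SKILL.md file whose frontmatter carries a name and a description. It is a simplified stand-in, not Anthropic’s harness: the frontmatter parser only handles flat key: value lines, and the function names and the example skill are hypothetical. Only index_text goes into context on every turn; load_body runs after the agent picks a skill.

```python
import tempfile
from pathlib import Path

Skills = dict[str, tuple[str, Path]]  # name -> (one-line description, path)


def parse_skill(path: Path) -> tuple[dict[str, str], str]:
    """Split a SKILL.md file into flat `key: value` frontmatter and a body."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError(f"{path}: missing frontmatter")
    end = lines.index("---", 1)  # closing delimiter (raises ValueError if absent)
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:]).strip()


def discover(root: Path) -> Skills:
    """Find every <dir>/SKILL.md under root and record its name and description."""
    skills: Skills = {}
    for path in sorted(root.glob("*/SKILL.md")):
        meta, _ = parse_skill(path)
        skills[meta["name"]] = (meta["description"], path)
    return skills


def index_text(skills: Skills) -> str:
    """What sits in context every turn: names and descriptions, no bodies."""
    return "\n".join(f"- {name}: {desc}" for name, (desc, _) in sorted(skills.items()))


def load_body(skills: Skills, name: str) -> str:
    """Pulled into context on demand, once the agent decides the skill is relevant."""
    return parse_skill(skills[name][1])[1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "preview-check").mkdir()
        (root / "preview-check" / "SKILL.md").write_text(
            "---\nname: preview-check\ndescription: Run a preview and summarize it.\n---\n"
            "Long, detailed instructions live here.\n",
            encoding="utf-8",
        )
        skills = discover(root)
        print(index_text(skills))
        print(load_body(skills, "preview-check"))
```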

RAG got demoted. This is the change with the biggest blast radius and the smallest amount of commentary. A year ago, “we need to add RAG” was the reflex answer when somebody asked how an agent would handle a corpus. Today that question splits three ways. If the corpus fits in the context window, put it in. If the agent can grep the filesystem, let it grep. If the corpus is genuinely too large for either, vector search is still right, but you’ll find that’s a smaller set of cases than it used to be. You can see this in the coding agents that already ship today. Cursor, Claude Code, and Devin lean on grep, find, and direct file reads more than vector search. LlamaIndex’s own writing on agentic retrieval is one of the clearer reads on where this is going.

Vector search didn’t get worse. The context around it improved enough that it stopped being the right first move.
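The three-way split reads naturally as a small decision function. A minimal Python sketch, where the corpus description, the context budget, and the four-characters-per-token estimate are all assumptions; the estimate is a rough rule of thumb for English text, not a tokenizer.

```python
from dataclasses import dataclass

CHARS_PER_TOKEN = 4  # rough rule of thumb for English text; not a real tokenizer


@dataclass
class Corpus:
    total_chars: int
    on_filesystem: bool  # can the agent reach it with grep, find, and file reads?


def retrieval_strategy(corpus: Corpus, context_budget_tokens: int) -> str:
    """Pick context stuffing, grep, or vector search, in that order of preference."""
    # Ceiling division so a partial token still counts against the budget.
    estimated_tokens = -(-corpus.total_chars // CHARS_PER_TOKEN)
    if estimated_tokens <= context_budget_tokens:
        return "context"  # fits: put the whole corpus in the prompt
    if corpus.on_filesystem:
        return "grep"     # too big for context, but the agent can search it directly
    return "vector"       # neither applies: vector search is still the right tool


if __name__ == "__main__":
    print(retrieval_strategy(Corpus(300_000, True), context_budget_tokens=100_000))
```

The ordering is the point: vector search only wins when both cheaper options are ruled out.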

Taken together, what got pulled into the SDK is the middle of an agent project: the tools layer, the retrieval layer, and the loop. What’s left for the team is the system prompt, the skills, and the policies around what the agent is allowed to do.

When you still need a framework

The first reaction to a lot of this is to declare that frameworks are over. They aren’t, but the cases where you reach for one have narrowed.

Pydantic AI is still the right choice when you want strong typing, deterministic output schemas, and an evaluation loop that matches how the rest of your Python codebase already thinks. LangGraph is still the right choice when your problem is genuinely a graph of agent states with branching and human approval steps. OpenAI’s Agents SDK is built around explicit handoffs between agents and earns its place when that pattern fits how you want to decompose the work. CrewAI is the fastest path I’ve seen for prototyping a multi-agent system, as long as you can live with its opinions. Any team running production traffic across multiple model providers is going to want a routing layer that the official SDK from any single vendor isn’t going to give them. Anthropic’s own writing on building effective agents lands in the same place: start with the simplest thing, add complexity only when the problem demands it.

The mental model that works for me: start with the SDK, reach for a framework when you outgrow it. “Outgrow” usually means one of four things:

  • Multi-provider routing. You’re running production traffic across more than one model vendor and need a routing layer the official SDKs don’t ship.

  • Multi-agent orchestration. Your problem genuinely decomposes into separate agents with handoffs, branching, or human approval steps.

  • Deterministic typing. You need strong schemas and validation around inputs and outputs, and the rest of your codebase already thinks that way.

  • Production observability. You need eval loops, replay, or tracing beyond what the SDK provides out of the box.

If none of those four are biting, the SDK is probably enough, and adding a framework on top is a layer you’ll regret in six months.

Where this lands for infrastructure work

Two things from the new agent shape map cleanly onto infrastructure work. The first is that “built-in tools plus governed actions” is the model an IaC platform was already running. The SDK assumes the agent has tools that do real work. The platform assumes those tools have policies, audit logs, and short-lived credentials around them. Those assumptions stack.

The second is that a state graph is already structured context. You don’t need to chunk it. You don’t need to embed it. An agent reasoning over a Pulumi stack can grep its way through the program graph the same way it greps a codebase, and the answers are grounded in the same source of truth the rest of your platform uses. I wrote the deeper version in Grounded AI: Why Neo Knows Your Infrastructure. The dark-factory and sprawl posts (The Dark Factory Pattern for Infrastructure and Agent Sprawl Is Here. Your IaC Platform Is the Answer.) are the places to go if you want to push on this further.
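To make the “grep the state graph” point concrete: pulumi stack export prints a stack’s state as JSON, with resources listed under deployment.resources, each carrying a urn and a type and, where present, a parent and dependencies. A minimal Python sketch that queries that structure directly, with no chunking or embedding step; the function names and the tiny example state are hypothetical.

```python
from typing import Any

State = dict[str, Any]


def _resources(state: State) -> list[dict[str, Any]]:
    # Exported state nests resources under deployment.resources.
    return state.get("deployment", {}).get("resources", [])


def resources_of_type(state: State, type_token: str) -> list[str]:
    """URNs of every resource whose type token matches exactly."""
    return [r["urn"] for r in _resources(state) if r.get("type") == type_token]


def dependents_of(state: State, urn: str) -> list[str]:
    """URNs of resources that depend on `urn` or are parented to it."""
    return [
        r["urn"]
        for r in _resources(state)
        if urn in r.get("dependencies", []) or r.get("parent") == urn
    ]


if __name__ == "__main__":
    # A hand-written stand-in for real export output (resource names are hypothetical).
    bucket = "urn:pulumi:dev::photomap::aws:s3/bucket:Bucket::images"
    policy = "urn:pulumi:dev::photomap::aws:s3/bucketPolicy:BucketPolicy::images-policy"
    state = {"version": 3, "deployment": {"resources": [
        {"urn": bucket, "type": "aws:s3/bucket:Bucket"},
        {"urn": policy, "type": "aws:s3/bucketPolicy:BucketPolicy", "dependencies": [bucket]},
    ]}}
    print(resources_of_type(state, "aws:s3/bucket:Bucket"))
    print(dependents_of(state, bucket))
```

Against a real stack, you could load the state with something like json.loads(subprocess.run(["pulumi", "stack", "export"], capture_output=True, text=True, check=True).stdout).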

Start with the SDK

A year ago, an agent project was 80% glue code and 20% the thing the agent actually did. On most projects today that ratio is flipped. If you’ve been sitting on an agent idea, build it the SDK way first and reach for a framework only when you hit something the SDK genuinely can’t do. Most teams will be surprised how often they don’t.

There’s one agent you don’t have to build at all. Pulumi Neo is the same SDK-first shape applied to the IaC slice: tools that reason directly over your state graph, governed by the controls the rest of your platform already runs on. Save your own SDK time for the agents only you can build.


The original dark factory was Fanuc’s robotics plant in Oshino, Japan, where the lights are off because nobody is on the floor. Robots build robots. Parts move through the line for weeks at a time without a person walking past them.

The same pattern is now showing up in software. Three engineers at StrongDM shipped roughly 32,000 lines of production code without writing or reviewing any of it. Stripe’s “Minions” agent system merges over a thousand pull requests every week. In January, Dan Shapiro of Glowforge published a five-level autonomy ladder that landed cleanly enough to become the shorthand most people now use, and BCG put out a piece calling it the dark software factory.

Almost every public writeup so far is about application code. The harder question is what this looks like for infrastructure.

What a dark factory actually is

Shapiro’s ladder is the cleanest framing I’ve seen. He borrows it from the SAE’s self-driving levels, and it fits surprisingly well:

Level  What it is                                                      Driving analogy
0      Spicy autocomplete                                              Stick shift; you do everything.
1      Coding intern (boilerplate)                                     Cruise control.
2      Junior developer (interactive pair)                             One hand on the wheel.
3      AI writes the majority; you review every PR                     Eyes still on the road.
4      Spec-driven; agent runs unattended for hours; you review later  Sleeping at the wheel, you can still wake up.
5      Dark factory; no human review of code before production         No steering wheel at all.

Most teams are at level 2 or 3. A few of the more aggressive ones are at 4. Level 5 is the experiment. Most teams won’t get there safely, and probably shouldn’t try to. The interesting design question is what has to be true for level 5 to be safe at all, and that question gets sharper when the thing being shipped is infrastructure.

A dark factory is not a coding harness. A harness is the framework an agent runs inside; the dark factory is the surrounding system that makes a harness’s output mergeable without a human reading the diff. Copilot and Cursor sit at the other end: they are interactive, and the human stays in the loop on every keystroke. The dark factory takes the human out of the per-change loop entirely and puts them at the top, writing the spec and the acceptance criteria.

The wall between generator and validator

Strip the dark factory down to its layers and there are four of them.

flowchart LR
    A["Inputs<br/>Humans"] --> B["Code Generation<br/>Autonomous"]
    B --> C["Validation<br/>Autonomous, isolated"]
    C -->|pass| D["Merge & Deploy<br/>Autonomous + existing CI/CD"]
    C -->|fail| B
    A -.->|"holdout scenarios<br/>generator never sees these"| C

The single most important rule is that Code Generation and Validation must be completely isolated. The generator never sees the acceptance scenarios. A separate evaluator does, and it judges the generator’s output against scenarios the generator could not have memorized.

The reason is sycophancy. LLMs are too eager to agree with their own prior turns and too willing to declare victory on something they just produced. Without isolation, the same model that wrote the change is the one telling you it’s fine. The practical concern is direct: a test stored in the same codebase as the implementation will get lazily rewritten to match the code, not the other way around. It isn’t malice; it’s the agent doing exactly what it was asked, badly. The wall is what stops that.

StrongDM’s pattern for this is holdout scenarios: plain-English BDD acceptance tests stored where the generator cannot reach them. Each scenario runs three times against an ephemeral deployment, two of three must pass, and the overall pass rate has to clear 90% before the change moves forward. If the generator fails, it gets a one-line failure message (“SQL Injection Detection failed: endpoint returned 500”), not the scenario text. It cannot game the test.

Without that wall, you don’t have a quality gate. You have theater.
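StrongDM’s gate, as described above, reduces to a few lines. A minimal Python sketch under stated assumptions: each scenario’s run is a hypothetical callable that deploys and checks once, “clear 90%” is read as at least 90%, and the feedback line carries only the scenario’s name and run count. The names are illustrative, not StrongDM’s code.

```python
from dataclasses import dataclass
from typing import Callable

RUNS_PER_SCENARIO = 3
PASSES_REQUIRED = 2    # 2 of 3 runs must pass
MIN_PASS_RATE = 0.90   # read here as "at least 90%" of scenarios


@dataclass
class Scenario:
    name: str
    text: str                # plain-English acceptance test; stays with the evaluator
    run: Callable[[], bool]  # one run against an ephemeral deployment


def evaluate(scenarios: list[Scenario]) -> tuple[bool, list[str]]:
    """Return (verdict, feedback for the generator).

    Feedback is one line per failed scenario: its name and run count, never its text.
    """
    feedback: list[str] = []
    passed = 0
    for scenario in scenarios:
        wins = sum(1 for _ in range(RUNS_PER_SCENARIO) if scenario.run())
        if wins >= PASSES_REQUIRED:
            passed += 1
        else:
            feedback.append(f"{scenario.name} failed: {wins}/{RUNS_PER_SCENARIO} runs passed")
    rate = passed / len(scenarios) if scenarios else 0.0
    return rate >= MIN_PASS_RATE, feedback
```

One consequence of the 90% gate: with five scenarios, 4/5 is 80%, so all five must pass; with ten, one can fail.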

Why infrastructure is the harder version

Application code factories can lean on tests, linters, and type checkers. Infrastructure adds blast radius, drift, secrets, irreversible actions, and multi-region state. A code dark factory shipping a broken UI causes a bad user experience. An infrastructure dark factory shipping a broken IAM policy ends in a postmortem.

A few things make this manageable on Pulumi specifically.

The orchestrator does not need to be invented. The Pulumi Automation API exposes the Pulumi engine as an SDK in TypeScript, Python, Go, .NET, and Java, which is the same surface a dark factory orchestrator runs on. Credentials don’t have to be long-lived: ESC and OIDC issue short-lived ones per run, so the agent never sees a static secret.

Policy doesn’t have to be probabilistic: CrossGuard enforces deterministic rules at preview time. Execution doesn’t have to happen on a laptop: Pulumi Cloud Deployments runs pulumi up inside a governed runner with audit logs and approval rules already wired. And the reasoning layer doesn’t have to start from scratch: Pulumi Neo is grounded in your state graph and ships with three modes (Auto, Balanced, Review) that line up cleanly with Shapiro’s levels 5, 4, and 3.

That doesn’t make Pulumi a dark factory by itself. It means the parts that an application-code factory has to build from scratch are pieces a Pulumi shop already has: a credential broker, a policy engine, a governed runner, a state-aware reasoning layer, an audit trail.

And one more piece nobody talks about: pulumi preview produces a clean, deterministic validation artifact, and CrossGuard evaluates that artifact without ever seeing the conversation that produced the program. That’s the same context-free judgment the holdout pattern depends on, applied at the policy layer instead of the acceptance-test layer. For infrastructure, half the wall is already built.

The interesting work is the part that nobody ships in a box.

The interesting work

What no platform ships for you is the wall: the holdout scenarios for infrastructure, the isolated evaluator that runs them, and the agreement on which stacks are even allowed to run lights-out.

The happy-path orchestrator is small. It pulls a spec, runs preview, hands the preview to an isolated evaluator (with its own credentials and its own access to the cloud, no access to the generator’s prompt or output), and branches on the verdict. Auto mode runs up immediately. Balanced mode submits a deployment that requires approval. Review mode opens a PR for a human. Every branch records a stack version traceable in the audit log. Retries, observability, secret rotation, and the rest of the production-grade plumbing add up to real code, but the shape is small.
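A minimal Python sketch of that happy path, with every side effect injected as a hypothetical callable so the branching is visible on its own. The Mode names mirror the three modes named in the text; none of this is Pulumi’s or Neo’s API, and retries, observability, and secret rotation are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Mode(Enum):
    AUTO = "auto"          # apply immediately
    BALANCED = "balanced"  # submit a deployment that waits for approval
    REVIEW = "review"      # open a pull request for a human


@dataclass
class Hooks:
    """HYPOTHETICAL side effects; wire these to your own tooling."""
    preview: Callable[[str], str]              # spec -> preview artifact
    evaluate: Callable[[str], bool]            # isolated evaluator: artifact -> verdict
    apply_now: Callable[[str], str]            # runs `up`, returns a stack version
    submit_for_approval: Callable[[str], str]  # returns a deployment reference
    open_pr: Callable[[str], str]              # returns a PR reference
    record: Callable[[str, str], None]         # writes (action, reference) to an audit log


def run_once(spec: str, mode: Mode, hooks: Hooks) -> str:
    """One pass of the happy path; returns the action taken."""
    artifact = hooks.preview(spec)
    # The evaluator receives only the preview artifact, never the generator's prompt.
    if not hooks.evaluate(artifact):
        return "retry"  # back to the generator
    if mode is Mode.AUTO:
        action, ref = "applied", hooks.apply_now(artifact)
    elif mode is Mode.BALANCED:
        action, ref = "awaiting-approval", hooks.submit_for_approval(artifact)
    else:
        action, ref = "pr-opened", hooks.open_pr(artifact)
    hooks.record(action, ref)  # every deploy branch leaves a traceable entry
    return action
```

Injecting the hooks keeps the orchestrator testable with fakes, which is useful while you are still earning trust in each branch.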

The wall is the part that takes a week to get right. You write five plain-English scenarios for one stack (“after pulumi up, the bucket is private, has SSE-KMS, lives in eu-west-1, and is tagged owner=team-x”) and a janky evaluator that runs preview and up against an ephemeral copy, queries the cloud, and asks a separate model whether the resulting state satisfies the scenario. Triple-run, 90% pass gate. Then you watch it for a few weeks before you let anything auto-apply.
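The evaluator described above asks a separate model whether the resulting state satisfies the scenario. Where a clause is mechanically checkable, a deterministic predicate can sit alongside that judgment. A minimal Python sketch for the example scenario, assuming the evaluator has already queried the cloud and normalized the bucket into a dict; the field names are hypothetical. "aws:kms" is the S3 server-side encryption algorithm value for SSE-KMS.

```python
from typing import Any

EXPECTED_REGION = "eu-west-1"
EXPECTED_OWNER = "team-x"


def unmet_clauses(bucket: dict[str, Any]) -> list[str]:
    """Check each clause of the example scenario; an empty list means it holds.

    `bucket` is a HYPOTHETICAL normalized view of the queried cloud state;
    missing fields count as failures.
    """
    unmet: list[str] = []
    if bucket.get("public_access", True):
        unmet.append("bucket is not private")
    if bucket.get("sse_algorithm") != "aws:kms":  # the S3 value for SSE-KMS
        unmet.append("bucket is not encrypted with SSE-KMS")
    if bucket.get("region") != EXPECTED_REGION:
        unmet.append(f"bucket is not in {EXPECTED_REGION}")
    if bucket.get("tags", {}).get("owner") != EXPECTED_OWNER:
        unmet.append(f"bucket is not tagged owner={EXPECTED_OWNER}")
    return unmet


if __name__ == "__main__":
    print(unmet_clauses({"public_access": False, "sse_algorithm": "aws:kms",
                         "region": "us-east-1", "tags": {"owner": "team-x"}}))
```

Each unmet clause is already a one-line failure message, the kind of terse signal the generator can act on without ever seeing the scenario text.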

A four-phase rollout

This is the same path the application-code factories walked, with the gates tightened.

Phase 1: better context, this afternoon

Write an AGENTS.md for your most active stack repo. Pulumi Neo reads it natively, as do most coding agents. While you’re there, look at your CrossGuard rules and rewrite the error messages as instructions. Not “S3 bucket has no encryption” but “S3 bucket has no encryption. Set serverSideEncryptionConfiguration with SSE-KMS to fix.” That single change is the difference between an agent flailing and an agent fixing the policy violation on the first try. Wire pulumi preview as a build-before-push gate so PRs don’t show up just to fail CI.
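Here is a sketch of that rule's logic with instructive messages, written in plain Python so it runs anywhere. In a real CrossGuard pack, this body would sit inside the validate function of a ResourceValidationPolicy from the pulumi_policy package and pass the message to report_violation. The property path (serverSideEncryptionConfiguration.rule.applyServerSideEncryptionByDefault.sseAlgorithm) is an assumption about the classic aws.s3.Bucket input shape, so check it against the provider version you use.

```python
from typing import Any, Dict, Optional

BUCKET_TYPE = "aws:s3/bucket:Bucket"  # classic aws.s3.Bucket resource token


def check_bucket_encryption(resource_type: str, props: Dict[str, Any]) -> Optional[str]:
    """Return a violation message phrased as an instruction, or None if compliant."""
    if resource_type != BUCKET_TYPE:
        return None
    # Assumed input shape; adjust for your AWS provider version.
    sse = props.get("serverSideEncryptionConfiguration") or {}
    rule = sse.get("rule") or {}
    default = rule.get("applyServerSideEncryptionByDefault") or {}
    algorithm = default.get("sseAlgorithm")
    if algorithm is None:
        return ("S3 bucket has no encryption. Set serverSideEncryptionConfiguration "
                "with SSE-KMS to fix.")
    if algorithm != "aws:kms":
        # Encrypted, but not with KMS: say exactly which field to change.
        return (f"S3 bucket uses {algorithm} instead of SSE-KMS. Set "
                "rule.applyServerSideEncryptionByDefault.sseAlgorithm to \"aws:kms\" to fix.")
    return None
```

The second message matters as much as the first: a bucket encrypted with AES256 is not unencrypted, and telling the agent otherwise sends it chasing the wrong fix.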

Phase 2: spec-driven with holdouts, this week

Pick one stack with a small blast radius. A review-stack lifecycle is ideal. Write five plain-English holdout scenarios for it and the janky evaluator above. Humans still approve every PR. Don’t auto-merge yet. You’re earning the data, not declaring trust.

Phase 3: take the human out of the merge

Only after the three measurable gates hold over twenty PRs (scenario pass rate above 90%, false positive rate below 5%, human override rate below 10%) flip auto-apply on for that one stack. Add a weekly drift sweep that goes through the same scenario gate as everything else.
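The three gates reduce to a small check over recent history. This sketch assumes one record per PR with three booleans; how your team defines a false positive is up to you, so the field is a placeholder. It evaluates the most recent twenty PRs and uses integer math so the strict "above" and "below" thresholds stay exact at the boundaries.

```python
from dataclasses import dataclass
from typing import Sequence

WINDOW = 20  # the gates must hold over twenty PRs


@dataclass(frozen=True)
class PrResult:
    scenarios_passed: bool  # holdout gate passed for this PR
    false_positive: bool    # evaluator verdict later judged wrong (team-defined)
    human_override: bool    # a human reversed the pipeline's decision


def ready_for_auto_apply(history: Sequence[PrResult]) -> bool:
    """Check the three Phase 3 gates over the most recent twenty PRs."""
    if len(history) < WINDOW:
        return False  # not enough data yet
    recent = history[-WINDOW:]
    n = len(recent)
    passed = sum(r.scenarios_passed for r in recent)
    false_pos = sum(r.false_positive for r in recent)
    overrides = sum(r.human_override for r in recent)
    return (
        100 * passed > 90 * n            # scenario pass rate above 90%
        and 100 * false_pos < 5 * n      # false positive rate below 5%
        and 100 * overrides < 10 * n     # human override rate below 10%
    )
```

Over exactly twenty PRs, those thresholds allow at most one PR with failing scenarios, zero false positives, and at most one override.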

Phase 4: lights out

Expand the auto-apply flag to every stack with strong scenario numbers. Wire your issue tracker so tickets tagged infra:fix flow through the pipeline. Mock the cloud APIs that are slow or flaky enough to make scenario evaluation expensive. At this point the orchestrator is configuration, not architecture.

What could go wrong

None of these have clean fixes. The mitigations below reduce risk; they don’t eliminate it. Any team running Level 5 should expect to eat one or two of these in the first year.

The validator approves a bad change. This is the obvious one. The standard mitigation is layered: triple-run each scenario with a 2-of-3 threshold, a 90% gate over the run set, a human audit of the first fifty auto-applied changes, and your existing policies still run after the validator says yes.

The agent gets a destroy permission it shouldn’t have. There’s a class of operations that should not sit in the autonomous loop yet: dropping a database, deleting a hosted zone, rotating a root key, anything that crosses a regulated data boundary. Scope what each agent identity can do at the credential layer, require human approval for anything destructive, and start every stack in Review mode. Tag changes, security-group adjustments, and instance resizes can run autonomously today. Release-branch cuts and config promotions can probably run by next quarter. The destructive class earns its way in over months.

You need all three layers: policy, approvals, and a human kill switch. Approvals without policy means anything a human approves in a hurry ships. Policy without approvals means a sufficiently clever spec eventually finds the gap. Both without a human kill switch means an incident at 3 a.m. has nobody to escalate to.
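One way to picture the layers working together is a single admission check where any layer can say no and none can say yes on its own. This is a hypothetical sketch: the operation labels, the per-identity scope set, and the kill-switch flag are illustrative, and in practice the scoping belongs in the credential layer itself rather than in application code.

```python
from typing import FrozenSet

# Hypothetical labels for the destructive class described above.
DESTRUCTIVE: FrozenSet[str] = frozenset({
    "drop_database",
    "delete_hosted_zone",
    "rotate_root_key",
    "cross_regulated_boundary",
})


def may_apply(
    operation: str,
    identity_scope: FrozenSet[str],  # operations this agent identity's credentials allow
    policy_passed: bool,             # policy results for this change
    human_approved: bool,            # explicit approval recorded for this change
    kill_switch_engaged: bool,       # a human has halted autonomous applies
) -> bool:
    """Each layer can block; only all of them together allow."""
    if kill_switch_engaged:
        return False
    if operation not in identity_scope:
        return False  # credential layer: this identity cannot do it at all
    if not policy_passed:
        return False  # an approval never overrides policy
    if operation in DESTRUCTIVE and not human_approved:
        return False  # the destructive class always needs a human
    return True
```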

Costs blow up. Cap retries at three per spec, alert on token spend per run, and remember that StrongDM reported roughly $1,000 per day per engineer-equivalent. That’s still cheaper than a salary, but only if you put the cap in place before you find out.
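Both caps fit in a few lines around each spec. This sketch assumes "cap retries at three" means three total attempts and that each attempt reports its own token count; attempt and alert are hypothetical hooks into your agent runner and paging or chat tooling.

```python
from typing import Callable, Tuple

MAX_ATTEMPTS = 3  # assumption: the cap is read as three total attempts per spec


def run_spec(
    spec_id: str,
    attempt: Callable[[int], Tuple[bool, int]],  # hypothetical: (succeeded, tokens used)
    token_budget: int,
    alert: Callable[[str], None],                # hypothetical: page or post to chat
) -> Tuple[bool, int, int]:
    """Return (succeeded, attempts made, tokens spent) for one spec."""
    spent = 0
    alerted = False
    for n in range(1, MAX_ATTEMPTS + 1):
        ok, tokens = attempt(n)
        spent += tokens
        if spent > token_budget and not alerted:
            alert(f"{spec_id}: spent {spent} tokens, budget {token_budget}")
            alerted = True  # alert once per run, not once per attempt
        if ok:
            return True, n, spent
    return False, MAX_ATTEMPTS, spent
```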

Where to start

Most of what a dark factory needs already exists in any reasonably mature platform. Whatever you have for state, policy, credentials, audit, and a deployment runner is the substrate. The interesting work is not building the factory. It’s the wall: the holdout scenarios that make the gap between “the model says it’s fine” and “the system is actually fine” mean something.

For most teams, Phase 1 alone is the win. Full Level 5 may stay out of reach indefinitely, and that’s fine. The path itself forces useful work: clearer specs, named bottlenecks, the deterministic gates humans had been running in their heads.

Write an AGENTS.md and five holdout scenarios for one stack this week. That’s enough to get a real signal on whether the pattern fits your team. The rest of the path is the same problem the application-code factories have already worked through, with the gates set tighter.

Custom VCS is a new Pulumi Cloud integration that connects any Git or Mercurial version control system to Pulumi Deployments using webhooks and centrally managed credentials. Pulumi Cloud already has native integrations with GitHub, GitLab, and Azure DevOps, but if your team uses a self-hosted or third-party VCS, you’ve been limited to manually configuring credentials per stack with no webhook-driven automation. Custom VCS closes that gap.

The problem

Many teams run self-hosted or third-party Git servers that Pulumi Cloud doesn’t have a native integration for, and some teams still use Mercurial. Until now, their only option was the raw git source approach: embedding credentials directly in each stack’s deployment settings, with no way to trigger deployments automatically on push, and no support for Mercurial at all.

This meant:

  • No push-to-deploy: Every deployment had to be triggered manually or through a separate CI pipeline.

  • Scattered credentials: Each stack configured its own credentials independently, with no centralized management.

  • No org-level integration: There was no shared configuration that multiple stacks could reference.

How Custom VCS works

Custom VCS integrations introduce an org-level integration type that works with any Git or Mercurial server. The setup has three parts:

Credentials through ESC: Instead of OAuth flows, you store your VCS credentials (a personal access token, SSH key, or username/password) in a Pulumi ESC environment. The same credential structure works for both Git and Mercurial. The integration references this environment by name and resolves credentials at deployment time. Multiple stacks can share the same credentials without duplicating secrets.

Manual repository registration: You add repositories to the integration by name. Pulumi joins the repository name with the integration’s base URL to form clone URLs. There’s no auto-discovery, so you control exactly which repositories are available.
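A small model of that registration step makes the behavior concrete. The class is hypothetical, and so is the slash handling: the exact joining rules Pulumi applies (trailing slashes, any suffix) aren’t described here, so this simply trims slashes and joins with one.

```python
from typing import Set


class CustomVcsRepos:
    """Hypothetical model of an integration's manually registered repositories."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")  # assumption: a trailing slash is ignored
        self.repos: Set[str] = set()

    def add(self, name: str) -> None:
        self.repos.add(name.strip("/"))

    def clone_url(self, name: str) -> str:
        name = name.strip("/")
        if name not in self.repos:
            # No auto-discovery: unregistered repositories are simply unavailable.
            raise KeyError(f"repository {name!r} is not registered")
        return f"{self.base_url}/{name}"
```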

Webhook-driven deployments: Pulumi provides a webhook endpoint and an HMAC shared secret. You configure your VCS server to POST a JSON payload on push events, and Pulumi automatically triggers deployments for matching stacks. The webhook supports branch filtering and optional path filtering.
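To see what your VCS server’s push hook has to produce, and how a receiver can check it, here is a sketch of HMAC signing plus branch and path filtering. The real payload format and signing scheme are defined in the Custom VCS documentation; here the hex HMAC-SHA256 over the raw body and the "branch" and "paths" fields are assumptions made for illustration.

```python
import hashlib
import hmac
import json
from fnmatch import fnmatch
from typing import Optional, Sequence


def sign(secret: bytes, body: bytes) -> str:
    """What the push hook would compute (assumed: hex HMAC-SHA256 of the raw body)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    # Constant-time comparison so the check does not leak timing information.
    return hmac.compare_digest(sign(secret, body), signature)


def should_deploy(
    body: bytes, branch: str, path_filters: Optional[Sequence[str]] = None
) -> bool:
    """Branch filter first, then optional path globs (hypothetical payload fields)."""
    event = json.loads(body)
    if event.get("branch") != branch:
        return False
    if not path_filters:
        return True  # no path filter: any push to the branch deploys
    changed = event.get("paths", [])
    return any(fnmatch(path, pattern) for path in changed for pattern in path_filters)


def handle_push(
    secret: bytes,
    body: bytes,
    signature: str,
    branch: str,
    path_filters: Optional[Sequence[str]] = None,
) -> bool:
    if not verify(secret, body, signature):
        raise PermissionError("webhook signature mismatch")
    return should_deploy(body, branch, path_filters)
```

Signing the exact bytes that are sent matters: re-serializing the JSON on either side can change whitespace or key order and break the signature.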

What’s supported

Custom VCS focuses on the deployment automation use case. Here’s how it compares to native integrations:

| Capability | Native integrations | Custom VCS |
| --- | --- | --- |
| Push-to-deploy | Yes | Yes |
| Path filtering | Yes | Yes |
| PR/MR previews | Yes | No |
| Commit status checks | Yes | No |
| PR comments | Yes | No |
| Review stacks | Yes | No |

Features like PR comments, commit statuses, and review stacks require deep API integration with each VCS platform, so they aren’t available with Custom VCS. If your VCS provider is GitHub, GitLab, or Azure DevOps, we recommend using the native integration for the full feature set.

Neo support

Neo, Pulumi’s AI assistant, works with Custom VCS integrations for repository operations that don’t depend on VCS-specific APIs. Neo can clone and push to Git and Mercurial repositories registered with your Custom VCS integration using the credentials from the integration’s ESC environment. Neo cannot open pull requests or create new repositories on Custom VCS servers at this time. Those operations require APIs unique to each VCS platform and are only available through native integrations.

Get started

To set up a Custom VCS integration:

  • Navigate to Management > Version control in Pulumi Cloud.

  • Select Add integration and choose Custom VCS.

  • Provide a name, base URL, and ESC environment containing your credentials.

  • Add your repositories.

  • Configure your VCS server to send webhooks to the provided URL.

For the full setup guide including webhook payload format, HMAC signing, and credential configuration, see the Custom VCS documentation.


Neo already helps your team manage Pulumi infrastructure, but no infrastructure team works inside Pulumi alone. Pages come from PagerDuty, telemetry from Datadog or Honeycomb, follow-ups from Linear or Jira. Most of the job is shuttling context between those tools.

Today we’re launching the Integration Catalog for Pulumi Neo: one place to connect Neo to the tools your team already uses, so your agent has the context it needs to help.

Six integrations in the launch catalog

Neo ships with six integrations at launch, each exposed to the agent through the Model Context Protocol (MCP):

  • Atlassian — Jira issues, Confluence pages, project context

  • Datadog — metrics, logs, monitors

  • Honeycomb — traces and observability queries

  • Linear — issue tracking and project workflows

  • PagerDuty — incidents, on-call schedules, escalations

  • Supabase — database management and edge functions

Each integration is a remote MCP server. Neo calls the integration through a structured tool protocol and only sees the tools the vendor chooses to expose.

Neo in action: one task, many systems

A latency spike showed up in Datadog yesterday afternoon, and you want to know whether your deploy caused it.

You: Neo, our payments stack saw elevated p95 starting around 3pm yesterday. Did our deploy cause it? Check Datadog and Honeycomb.

Neo lines up the Pulumi update history for the payments stack against the latency and error-rate metrics in Datadog around the same window, then surfaces the top slow traces in Honeycomb to confirm the suspect change.

You: Open a Linear ticket on the platform team with the findings and link the offending update.

Neo opens the Linear issue with the summary, the Pulumi update URL, and a pointer to the Datadog dashboard, all without you leaving the chat or copy-pasting context between tabs.

How the Integration Catalog works

Admins configure credentials once. In your org’s Neo settings, open the Integration Catalog, pick an integration, and paste in an API token or service-account key.

Your team gets the capability immediately. No per-user setup, no extra OAuth flow for each developer, no asking the platform team to share a token in 1Password.

Credentials stay encrypted at rest. When a task runs, the service decrypts the configured credentials just long enough to hand them to the agent runtime as MCP server auth.

What’s coming next: CLI, OAuth, and access controls

This is the first cut. Here’s what we’re working on:

  • CLI integrations — give Neo access to command-line tools like kubectl, aws, gcloud, and az.

  • OAuth integrations — for providers whose hosted MCP servers only speak OAuth (Notion, Sentry, Vercel), and for orgs that want per-user credentials.

  • Per-integration access controls — team-scoped policies so admins can say “only the platform team can let Neo touch PagerDuty.”

Try it out

The Integration Catalog is available now for Neo-enabled organizations. Open your org’s Neo settings, head to the Integrations tab, and connect the first tool you reach for when something breaks. The Neo integrations docs walk through the setup for each one.

As always, we’d love to hear what’s missing. File a feature request in pulumi-cloud-requests with the integration you want next. We’re prioritizing based on what teams actually use.

Happy building.

Latest
Jun 9, 2026
Tracking since Dec 16, 2025