Pulumi

Deploy on git tag push

A git tag is how many teams mark a release as ready. Pulumi Deployments can now act on that signal directly: configure a tag-based trigger, push a version tag like v1.2.0, and Pulumi automatically runs pulumi up for your stack. No extra pipeline glue, no manual click — your release tag is the deployment.

Why tags?

Push to Deploy has long let you preview changes on a pull request and update a stack when commits merge to a branch. That branch-based model is a great fit for continuous delivery to shared development and QA environments, where every merge should flow straight through.

But promotion to production is often deliberate, not continuous. You merge throughout the day, then decide — separately — that a particular commit is the release. The conventional way to record that decision is a git tag: v1.2.0, 2026.06.0, release-2026-06-04. Tagging is already part of most teams’ release rituals.

Tag-based triggers connect that ritual to your infrastructure. Instead of wiring up a separate CI job to call the Pulumi Deployments REST API on a tag event, you configure the trigger once in your stack’s deployment settings and let Pulumi handle the rest.

How it works

Tag triggers are controlled by two settings on your stack’s deployment configuration:

  • Run updates for pushed tags — a toggle that enables running pulumi up when a matching tag is pushed.
  • Tag filters — a list of glob patterns that decide which tag names qualify.

Tag filters use the same model as the path filters you may already know, except the patterns match against the tag name rather than changed file paths. A few examples:

  • v* — deploy on any tag beginning with v, such as v1.0.0 and v2.3.1.
  • v* plus !*-rc* — deploy on release tags but skip release candidates like v1.2.0-rc1.
  • 2026.* — deploy on calendar-versioned releases such as 2026.06.0.

Filters prefixed with ! are exclusions, and an exclusion always wins over an include. With no filters configured and the toggle on, every tag push deploys. Deleting a tag never triggers a deployment.
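
The filter rules above can be modeled in a few lines of Python. This is an illustrative sketch, not Pulumi's implementation: the function name tag_triggers_deploy is hypothetical, the standard library's fnmatchcase stands in for whatever glob dialect Pulumi Deployments actually uses, and the behavior when only exclusions are configured is an assumption.

```python
from fnmatch import fnmatchcase


def tag_triggers_deploy(tag: str, filters: list[str]) -> bool:
    """Return True if a pushed tag would qualify under the given tag filters."""
    includes = [p for p in filters if not p.startswith("!")]
    excludes = [p[1:] for p in filters if p.startswith("!")]

    # An exclusion always wins over an include.
    if any(fnmatchcase(tag, pattern) for pattern in excludes):
        return False

    # With no filters configured, every tag push deploys.
    # Assumption: the same applies when only exclusions are configured.
    if not includes:
        return True

    return any(fnmatchcase(tag, pattern) for pattern in includes)
```

For example, tag_triggers_deploy("v1.2.0-rc1", ["v*", "!*-rc*"]) is False because the exclusion matches, while the same filters accept v1.2.0.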

When a tag push kicks off a deployment, Pulumi sets the PULUMI_CI_TAG_NAME environment variable to the tag name. Your pre-run commands or your Pulumi program can read it — for example, to stamp the release version onto a resource tag or an application config value.
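
Here is a minimal sketch of the program side, assuming a Python Pulumi program. Only PULUMI_CI_TAG_NAME comes from the feature itself; the helper name release_tags and the release tag key are hypothetical, and the fallback assumes the variable may be unset outside tag-triggered deployments.

```python
import os
from collections.abc import Mapping


def release_tags(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build resource tags that record the git tag behind this deployment."""
    tag = environ.get("PULUMI_CI_TAG_NAME", "")
    # Fall back to no extra tags when the variable is missing or empty,
    # for example during a local run.
    return {"release": tag} if tag else {}
```

A program could then merge the result into a resource's existing tags, for instance tags={**base_tags, **release_tags()}, where base_tags is whatever tag dictionary the program already defines.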

Works across every VCS integration

Tag triggers are available across all five version control integrations: GitHub, GitLab, Bitbucket, Azure DevOps, and Custom VCS.

Get started

You can configure tag triggers wherever you manage deployment settings today — the Pulumi Cloud console, the REST API, or as code with the pulumiservice.DeploymentSettings resource.

To try it out:

  1. Open a stack’s Settings > Deploy tab in the Pulumi Cloud console.
  2. Enable Run updates for pushed tags and add a tag filter such as v*.
  3. Push a tag — git tag v1.0.0 && git push origin v1.0.0 — and watch the deployment run.

For the full details, see the deployment triggers and tag filtering documentation. We’d love to hear how you put tag-based deployments to work.

Stack init renamed to new; onError hooks no longer dropped

Pulumi · v3.245.0

3.245.0 (2026-06-04)

Improvements
  • [backend/diy] Update gocloud.dev to 0.46 and aws sdk to v2 #23421
  • [cli] Show download and unpack progress when installing provider plugins #23408
  • [cli/do] Expose the selected stack's organization and short name to PCL input files when running pulumi do inside a project
  • [cli/do] Suggest similar tokens when an unknown token is passed to pulumi do #23341
  • [cli/neo] Support up/down arrows to scroll through prompt history in pulumi neo #23425
  • [cli/stack] Rename pulumi stack init to new #23422
Bug Fixes
  • [engine] Download the HCL language runtime on demand instead of bundling it #23356
  • [sdk/nodejs] Fix mergeOptions dropping onError hooks from ResourceOptions in the Node.js SDK
  • [cli/auth] Delete shared temporary agent credentials when logging out
Miscellaneous
  • [sdk/go] Remove "name" from plugin loading functions and require "Type" on Configure & DiffConfig calls #23363

If you run AI tools and agents, you’ve probably accepted three tradeoffs: your data leaves your network, you can’t work offline, and your bill scales with usage.

Open-weight models now run well on consumer hardware. Once the model is on your machine, your data stays local, inference works offline, and tokens cost nothing. If you own a modern Mac, you can run a high-quality model yourself.

Gemma 4 is an open-weights model family from Google. This post focuses on Gemma 4 12 B, released in June 2026, using Unsloth’s Q8_0 GGUF. The 12 B model fits comfortably on a modern Mac while leaving enough headroom for local llama.cpp and a chat UI.

We’ll use llama.cpp for host-native inference, k3d for a local Kubernetes cluster, Pulumi for infrastructure as code, and Tailscale for secure access.

Prerequisites

This setup was validated on the following hardware:

  • macOS 26 Tahoe, version 26.5
  • MacBook Pro with Apple M3 Max
  • 36 GB RAM

On this machine, llama.cpp reported about 20 output tokens per second for a 160-token validation response with unsloth/gemma-4-12b-it-GGUF, gemma-4-12b-it-Q8_0.gguf, and a 131,072-token context. Sustained throughput varies by prompt length, thermal state, and llama.cpp settings.

You’ll need brew, docker, pulumi, and tailscale installed. We’ll also install k3d during the process.

Run Gemma 4 with host-native llama.cpp

We use llama.cpp directly on macOS to leverage Apple Metal acceleration. Running the LLM on the host is more efficient than trying to pass GPU access into a local Kubernetes VM.

Install the build tools:

brew install cmake git

Then build llama.cpp from source and download the multimodal projector. In validation, Homebrew llama.cpp 9430 could run text inference, but it could not load the new Gemma 4 12 B projector and failed with unknown projector type: gemma4uv. Building current llama.cpp from source fixed that.

llm_home="$HOME/pulumi-gemma4-llm"
mkdir -p "$llm_home/models" "$llm_home/logs"

if [ ! -d "$llm_home/llama.cpp/.git" ]; then
  git clone --depth 1 https://github.com/ggml-org/llama.cpp.git "$llm_home/llama.cpp"
fi

cmake -S "$llm_home/llama.cpp" \
  -B "$llm_home/llama.cpp/build" \
  -DGGML_METAL=ON \
  -DGGML_BLAS=ON \
  -DCMAKE_BUILD_TYPE=Release

cmake --build "$llm_home/llama.cpp/build" --target llama-server -j 10

curl -L --fail \
  --output "$llm_home/models/mmproj-F16.gguf" \
  https://huggingface.co/unsloth/gemma-4-12b-it-GGUF/resolve/main/mmproj-F16.gguf

Then download and run the model with this command:

"$HOME/pulumi-gemma4-llm/llama.cpp/build/bin/llama-server" \
  --hf-repo unsloth/gemma-4-12b-it-GGUF \
  --hf-file gemma-4-12b-it-Q8_0.gguf \
  --mmproj "$HOME/pulumi-gemma4-llm/models/mmproj-F16.gguf" \
  --host 127.0.0.1 \
  --port 18080 \
  --ctx-size 131072 \
  --parallel 1 \
  --jinja \
  --reasoning off

We use port 18080 because 8080 is commonly used and is likely to conflict with another service you may already have running locally. If your port 8080 is free, you can use it and adjust the Pulumi config later.

The model file is about 12.65 GB, and the projector is about 116 MB. Gemma 4 12 B advertises a 131,072-token context, and this Mac loaded that full context with --parallel 1. llama.cpp projected about 15.1 GiB of Apple Metal device memory for the text model and about 258 MiB worst-case memory for the projector, leaving enough headroom for Open WebUI and the rest of the local stack. The --reasoning off flag keeps OpenAI-compatible chat responses visible in clients that do not read separate reasoning fields.
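
The headroom claim can be checked with a little arithmetic. The sketch below uses only the figures quoted above and assumes the machine's 36 GB of RAM can be treated as 36 GiB; the remaining memory is shared with macOS, Docker, and everything else on the machine, so treat it as an upper bound rather than a guarantee.

```python
MIB_PER_GIB = 1024

total_ram_gib = 36.0               # assumption: "36 GB" treated as 36 GiB
text_model_gib = 15.1              # llama.cpp's projected Metal memory for the text model
projector_gib = 258 / MIB_PER_GIB  # worst-case projector memory (258 MiB)

inference_gib = text_model_gib + projector_gib
headroom_gib = total_ram_gib - inference_gib

print(f"inference: {inference_gib:.2f} GiB, headroom: {headroom_gib:.2f} GiB")
# prints "inference: 15.35 GiB, headroom: 20.65 GiB"
```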

With --mmproj, /v1/models advertised capabilities: ["completion","multimodal"]. In local validation, Open WebUI accepted an uploaded Pulumi logo image and Gemma 4 described it correctly. A small WAV file also worked through the OpenAI-compatible input_audio request shape, though llama.cpp logs still mark audio input as experimental.

Verify the LLM API

Open a new terminal and check if llama.cpp is responding:

curl http://127.0.0.1:18080/v1/models

The /v1/models endpoint should return the model ID unsloth/gemma-4-12b-it-GGUF. Now try a chat completion:

curl http://127.0.0.1:18080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/gemma-4-12b-it-GGUF",
    "messages": [{"role": "user", "content": "Reply with exactly: OK"}],
    "temperature": 0,
    "max_tokens": 32
  }'

The prompt Reply with exactly: OK should return the content OK. In validation, llama.cpp reported about 20 output tokens per second for a longer 160-token response.
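
The same check can be scripted. The following is a minimal sketch using only the Python standard library, assuming the server from the previous section is listening on 127.0.0.1:18080; the helper names build_chat_payload, extract_reply, and ask are hypothetical, and the response shape is the standard OpenAI-compatible chat completion format.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:18080/v1"
MODEL_ID = "unsloth/gemma-4-12b-it-GGUF"


def build_chat_payload(prompt: str, max_tokens: int = 32) -> dict:
    """Build the same request body as the curl example."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": max_tokens,
    }


def extract_reply(body: dict) -> str:
    """Pull the assistant text out of an OpenAI-compatible chat response."""
    return body["choices"][0]["message"]["content"]


def ask(prompt: str) -> str:
    """POST the prompt to the local server and return the reply text."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return extract_reply(json.load(response))
```

With llama.cpp running, ask("Reply with exactly: OK") should return the model's reply as a string.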

Keep llama.cpp running after reboot

For a permanent setup, put the llama.cpp startup script and logs under a folder in your home directory and let launchd restart it when you sign in:

llm_home="$HOME/pulumi-gemma4-llm"
mkdir -p "$llm_home/logs" "$HOME/Library/LaunchAgents"

cat > "$llm_home/start-llama-server.sh" <<'EOF'
#!/bin/zsh
set -euo pipefail

export PATH="/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"

exec "$HOME/pulumi-gemma4-llm/llama.cpp/build/bin/llama-server" \
  --hf-repo unsloth/gemma-4-12b-it-GGUF \
  --hf-file gemma-4-12b-it-Q8_0.gguf \
  --mmproj "$HOME/pulumi-gemma4-llm/models/mmproj-F16.gguf" \
  --host 127.0.0.1 \
  --port 18080 \
  --ctx-size 131072 \
  --parallel 1 \
  --jinja \
  --reasoning off
EOF

chmod +x "$llm_home/start-llama-server.sh"

cat > "$HOME/Library/LaunchAgents/com.pulumi.gemma4.llama-server.plist" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.pulumi.gemma4.llama-server</string>
  <key>ProgramArguments</key>
  <array>
    <string>$llm_home/start-llama-server.sh</string>
  </array>
  <key>WorkingDirectory</key>
  <string>$llm_home</string>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>$llm_home/logs/llama-server.out.log</string>
  <key>StandardErrorPath</key>
  <string>$llm_home/logs/llama-server.err.log</string>
</dict>
</plist>
EOF

launchctl bootout gui/$(id -u)/com.pulumi.gemma4.llama-server 2>/dev/null || true
launchctl bootstrap gui/$(id -u) "$HOME/Library/LaunchAgents/com.pulumi.gemma4.llama-server.plist"
launchctl kickstart -k gui/$(id -u)/com.pulumi.gemma4.llama-server

Check the launchd service and llama.cpp logs:

launchctl print gui/$(id -u)/com.pulumi.gemma4.llama-server
tail -f "$HOME/pulumi-gemma4-llm/logs/llama-server.err.log"

If you want to stop llama.cpp later, unload the launchd service:

launchctl bootout gui/$(id -u)/com.pulumi.gemma4.llama-server

Deploy Open WebUI with Pulumi and k3d

Now we’ll deploy Open WebUI into a local Kubernetes cluster. This provides a polished chat interface that connects to our host-native LLM.

First, install k3d if you haven’t already:

brew install k3d

Create a new cluster for this project:

k3d cluster create pulumi-gemma4-blog-qa

We’ll use the Pulumi program in pulumi/examples. This program defaults to runtimeMode=host, which creates a Kubernetes ExternalName service pointing to your host machine.

Why not run the LLM inside Kubernetes on this Mac? Pulumi can do that, and the example supports it with runtimeMode=cluster, but that path is meant for Linux hosts with NVIDIA or AMD GPU device plugins.

On macOS, llama.cpp enables Metal by default, and Metal acceleration is available to native macOS processes. k3d runs Linux containers through Docker Desktop, so those pods do not get direct access to the Mac’s Metal device. Docker’s own vLLM Metal announcement calls out the same boundary: Metal-backed inference runs natively on the host because there is no Metal GPU passthrough for containers. That is why this setup keeps inference host-native and lets Pulumi manage the Kubernetes UI, service wiring, and optional Tailscale access around it.

Clone the examples repo, navigate to the program directory, and initialize a new stack:

git clone https://github.com/pulumi/examples.git
cd examples/kubernetes-py-self-host-gemma4-llm
pulumi stack init gemma4-local

Configure the program to match your local setup:

pulumi config set hostLlmPort 18080
pulumi config set llmBaseUrl http://llm-server:18080/v1

The Kubernetes service named llm-server maps to host.k3d.internal. In our validation, we confirmed that a disposable k3d pod could reach the Mac’s llama.cpp API at http://llm-server:18080/v1/models after a CoreDNS restart.

kubectl rollout restart deployment coredns -n kube-system
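
To make the service wiring concrete, here is a rough sketch of what an ExternalName Service manifest for this setup could look like, written as a plain Python dictionary. It is not taken from the example program, whose actual resource definition may differ; the helper names external_name_service and llm_base_url are hypothetical.

```python
def external_name_service(name: str, external_name: str, port: int) -> dict:
    """Return a Kubernetes Service manifest of type ExternalName."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "ExternalName",
            # Cluster DNS answers lookups for the service name with a CNAME
            # pointing at this hostname.
            "externalName": external_name,
            "ports": [{"port": port}],
        },
    }


def llm_base_url(service_name: str, port: int) -> str:
    """Build the OpenAI-compatible base URL that pods would use."""
    return f"http://{service_name}:{port}/v1"


manifest = external_name_service("llm-server", "host.k3d.internal", 18080)
base_url = llm_base_url("llm-server", 18080)
```

An ExternalName service works purely at the DNS level and does not proxy or remap ports, which is why the port in llmBaseUrl has to match the port llama.cpp listens on.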

Run pulumi up to deploy Open WebUI and connect it to host-native llama.cpp:

pulumi up

In our validation environment, this command successfully reached the resource synthesis phase without requiring Tailscale credentials because Tailscale exposure is opt-in.

Access Open WebUI through Tailscale

Tailscale allows you to access your private Open WebUI instance from any device on your tailnet. Note that we only expose the web interface, not the raw LLM API, to keep the system secure.

The base Open WebUI deployment works without Tailscale credentials. To expose the web UI on your tailnet, enable Tailscale resources and provide an explicit api_key or OAuth/identity token:

pulumi config set enableTailscale true
pulumi config set tailscale:apiKey YOUR_API_KEY --secret

Once configured, Pulumi will create a Tailscale device or proxy that routes traffic to your Open WebUI service.

Use the model with Pi

Open WebUI gives you a browser-based chat interface, but local models are also useful from coding agents. Pi is the local coding agent used for this validation: it points at the same OpenAI-compatible llama.cpp endpoint and uses the model running on your Mac. If you do not use Pi, treat this section as an example of how any OpenAI-compatible client can reach that endpoint.

For a fresh Pi config, create ~/.pi/agent/models.json with a local provider that points at the llama.cpp API:

{
  "providers": {
    "local-llama": {
      "baseUrl": "http://127.0.0.1:18080/v1",
      "api": "openai-completions",
      "apiKey": "local",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "unsloth/gemma-4-12b-it-GGUF",
          "name": "Gemma 4 12B Q8 (local llama.cpp)",
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 131072,
          "maxTokens": 1024,
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          }
        }
      ]
    }
  }
}

Then set Pi to use that provider and model by default in ~/.pi/agent/settings.json:

{
  "defaultProvider": "local-llama",
  "defaultModel": "unsloth/gemma-4-12b-it-GGUF",
  "defaultThinkingLevel": "off",
  "hideThinkingBlock": true
}

If you already have Pi configuration files, merge the local-llama provider and defaults into your existing JSON instead of replacing the files.
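
One way to do that merge without hand-editing is sketched below. It assumes the file layout shown above (a top-level providers object in models.json and flat keys in settings.json); the helper names merge_provider and merge_defaults are hypothetical and not part of Pi.

```python
import copy


def merge_provider(existing: dict, name: str, provider: dict) -> dict:
    """Return a copy of a models.json document with one provider added or replaced."""
    merged = copy.deepcopy(existing)
    # Other providers are left untouched; only the named entry changes.
    merged.setdefault("providers", {})[name] = copy.deepcopy(provider)
    return merged


def merge_defaults(existing_settings: dict, defaults: dict) -> dict:
    """Return a copy of a settings.json document with the new defaults applied on top."""
    return {**existing_settings, **defaults}
```

In practice you would load each file with json.load, pass the parsed dictionary through the matching helper, and write the result back with json.dump, keeping a backup of the original files.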

Advanced: Linux GPU in-cluster serving

If you’re running on a Linux host with an NVIDIA or AMD GPU, you can run the LLM directly inside the Kubernetes cluster. This requires the NVIDIA or ROCm device plugins.

The Pulumi program supports this through runtimeMode=cluster. In this mode, it deploys a LlmServer pod that manages the llama.cpp process within the cluster, using GPU resource requests to ensure hardware acceleration.

Cleanup

When you’re done, you can tear down the resources:

pulumi destroy
k3d cluster delete pulumi-gemma4-blog-qa
# If you started llama.cpp in a terminal, stop it using its PID
kill <PID>
# If you installed the launchd agent, unload it instead (KeepAlive would restart a killed process)
launchctl bootout gui/$(id -u)/com.pulumi.gemma4.llama-server

AWS reports in an AWS Architecture Blog case study that Deloitte’s move to a virtual cluster model on Amazon EKS resulted in 89% faster testing environment provisioning. According to the case study, consolidating dozens of disparate clusters into a single host cluster with over 50 vCluster instances saved Deloitte about 500 QA hours per year. This “Environment Factory” pattern allows platform teams to provide isolated, ephemeral Kubernetes environments on demand without the cost or lag of full cluster provisioning.

This post adapts that general architecture with Pulumi to orchestrate Amazon EKS Auto Mode and vCluster.

The problem: environment sprawl and provisioning lag

Traditional development workflows often rely on one full EKS cluster per developer or feature branch. While this provides strong isolation, it introduces major pain points. Provisioning a full cluster can take 15 minutes or more, which slows down CI/CD pipelines. Managing dozens of clusters also leads to high costs and significant operational overhead.

Platform teams need a “soft multi-tenancy” model. This model should feel like a dedicated cluster to the developer but run on shared infrastructure to keep costs low and startup times fast.

Architecture overview: the host and the tenants

The environment factory architecture consists of two main layers.

  1. Host cluster: A single, reliable EKS cluster managed with EKS Auto Mode. This cluster provides the underlying compute, networking, and storage.
  2. Tenant environments: Virtual clusters (vCluster) running as pods within host namespaces.

According to the vCluster architecture, the virtual control plane handles API requests while a syncer maps virtual resources to the host cluster. This separation allows tenants to manage their own CRDs, namespaces, and RBAC while platform teams use quotas, NetworkPolicies, pod security, IAM boundaries, and node isolation controls to protect the host and other tenants.

Implementation: the EKS Auto Mode host

EKS Auto Mode simplifies the host cluster by automating infrastructure management. It handles node provisioning, scaling, and updates based on pod requirements.

The following snippet shows how to define an EKS cluster with Auto Mode enabled using Pulumi.

```typescript
import * as awsx from "@pulumi/awsx";
import * as eks from "@pulumi/eks";
import * as k8s from "@pulumi/kubernetes";
import { SubnetType } from "@pulumi/awsx/ec2";

const clusterName = "environment-factory";

const vpc = new awsx.ec2.Vpc("environment-factory", {
    enableDnsHostnames: true,
    cidrBlock: "10.0.0.0/16",
    subnetSpecs: [
        {
            type: SubnetType.Public,
            tags: {
                [`kubernetes.io/cluster/${clusterName}`]: "shared",
                "kubernetes.io/role/elb": "1",
            },
        },
        {
            type: SubnetType.Private,
            tags: {
                [`kubernetes.io/cluster/${clusterName}`]: "shared",
                "kubernetes.io/role/internal-elb": "1",
            },
        },
    ],
    subnetStrategy: "Auto",
});

// Create an EKS cluster with Auto Mode enabled.
const hostCluster = new eks.Cluster("host-cluster", {
    name: clusterName,
    authenticationMode: eks.AuthenticationMode.Api, // Use API authentication mode for EKS access entries.
    vpcId: vpc.vpcId,
    publicSubnetIds: vpc.publicSubnetIds,
    privateSubnetIds: vpc.privateSubnetIds,
    autoMode: {
        enabled: true,
    },
});

const hostProvider = new k8s.Provider("host-provider", {
    kubeconfig: hostCluster.kubeconfig,
});
```

Implementation: the environment factory

Once the host cluster is ready, we can build the factory that stamps out tenant environments. Each tenant needs a dedicated namespace, resource quotas, and the vCluster itself.

Tenant guardrails

Before installing vCluster, we set up a namespace and resource quotas to ensure one tenant cannot consume all host resources.

```typescript
import * as k8s from "@pulumi/kubernetes";

// Define a tenant namespace on the host cluster.
const tenantNamespace = new k8s.core.v1.Namespace("tenant-alpha", {
    metadata: { name: "tenant-alpha" },
}, { provider: hostProvider });

// Apply resource quotas to the tenant namespace.
const quota = new k8s.core.v1.ResourceQuota("tenant-quota", {
    metadata: { namespace: tenantNamespace.metadata.name },
    spec: {
        hard: {
            pods: "20",
            "requests.cpu": "4",
            "requests.memory": "8Gi",
            "limits.cpu": "8",
            "limits.memory": "16Gi",
        },
    },
}, { provider: hostProvider });

// Define a Role for the tenant within their namespace.
const tenantRole = new k8s.rbac.v1.Role("tenant-role", {
    metadata: { namespace: tenantNamespace.metadata.name },
    rules: [{
        apiGroups: [""],
        resources: ["pods", "services", "configmaps", "secrets"],
        verbs: ["get", "list", "watch", "create", "update", "patch", "delete"],
    }],
}, { provider: hostProvider });

// Bind the Role to a tenant user or group.
const tenantRoleBinding = new k8s.rbac.v1.RoleBinding("tenant-role-binding", {
    metadata: { namespace: tenantNamespace.metadata.name },
    subjects: [{
        kind: "User",
        // Replace "tenant-user" with the IAM-mapped user or group for this tenant.
        name: "tenant-user",
        apiGroup: "rbac.authorization.k8s.io",
    }],
    roleRef: {
        kind: "Role",
        name: tenantRole.metadata.name,
        apiGroup: "rbac.authorization.k8s.io",
    },
}, { provider: hostProvider });
```
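The quota above is hard-coded for one tenant. As a minimal sketch of how a factory might derive those values from a named tenant profile, the helper below builds the `hard` block from a small lookup table. The profile names, their sizes, and the `quotaFor` function are hypothetical and only illustrate the idea.

```typescript
// Hypothetical tenant profiles; names and sizes are illustrative assumptions.
type TenantProfile = "small" | "medium";

interface QuotaHard {
    pods: string;
    "requests.cpu": string;
    "requests.memory": string;
    "limits.cpu": string;
    "limits.memory": string;
}

const profiles: Record<TenantProfile, { pods: number; cpu: number; memoryGi: number }> = {
    small: { pods: 20, cpu: 4, memoryGi: 8 },
    medium: { pods: 40, cpu: 8, memoryGi: 16 },
};

// Build the `hard` block of a ResourceQuota spec. Limits are set to twice the
// requests, matching the 4/8 CPU and 8Gi/16Gi memory split in the quota above.
function quotaFor(profile: TenantProfile): QuotaHard {
    const p = profiles[profile];
    return {
        pods: String(p.pods),
        "requests.cpu": String(p.cpu),
        "requests.memory": `${p.memoryGi}Gi`,
        "limits.cpu": String(p.cpu * 2),
        "limits.memory": `${p.memoryGi * 2}Gi`,
    };
}
```

The returned object could be passed as `spec.hard` of the ResourceQuota, so that adding a tenant only means choosing a profile.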

For production use, map these Kubernetes identities to IAM principals using EKS access entries. Older clusters may still rely on the legacy aws-auth ConfigMap.

Deploying vCluster with Helm

We use the kubernetes.helm.v3.Release resource to install vCluster. This resource provides controlled Helm lifecycle management for the vCluster release. The values block should be adjusted for each tenant profile to control resource synchronization and control plane behavior. Review the vCluster release notes when changing chart versions because values schema and generated secret names can change across releases.

```typescript
import * as k8s from "@pulumi/kubernetes";

// Install vCluster using the Helm Release resource.
const vcluster = new k8s.helm.v3.Release("vcluster-alpha", {
    name: "vcluster-alpha",
    chart: "vcluster",
    version: "0.20.0", // Tested with vCluster 0.20.x; review release notes before changing versions.
    repositoryOpts: {
        repo: "https://charts.loft.sh",
    },
    namespace: tenantNamespace.metadata.name,
    values: {
        // Explicit sync configuration; adjust per tenant profile.
        sync: {
            toHost: {
                pods: { enabled: true },
            },
        },
    },
}, { provider: hostProvider });
```

Accessing the virtual cluster

The vCluster generates a kubeconfig that allows developers to interact with the virtual API server. Treat this kubeconfig as a secret. The endpoint it points to must also be reachable from the Pulumi runner before you use it to create resources inside the virtual cluster.

```typescript
import * as pulumi from "@pulumi/pulumi";
import * as k8s from "@pulumi/kubernetes";

// Retrieve the vCluster kubeconfig from the generated secret.
// The vCluster-generated secret can lag behind Helm release readiness on first creation,
// so teams may choose an explicit readiness check or rerun after the virtual control plane initializes.
const vclusterKubeconfig = k8s.core.v1.Secret.get("vcluster-secret",
    pulumi.interpolate`${tenantNamespace.metadata.name}/vc-vcluster-alpha`,
    {
        provider: hostProvider,
        dependsOn: [vcluster],
    }
).data.apply(data => Buffer.from(data["config"], "base64").toString());

// Export the kubeconfig as a secret.
export const tenantKubeconfig = pulumi.secret(vclusterKubeconfig);

// Create a provider for the virtual cluster using the secret kubeconfig.
const vclusterProvider = new k8s.Provider("vcluster-provider", {
    kubeconfig: tenantKubeconfig,
});
```
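Because the endpoint has to be reachable from the Pulumi runner, it can help to inspect which server the generated kubeconfig points at before building a provider from it. The sketch below is one possible pre-check, not part of the original setup. The function names are hypothetical, and it reads the `server:` line with a regular expression instead of a YAML parser to stay dependency-free.

```typescript
// Decode the base64 `config` field of the vCluster secret and return the API
// server URL, or undefined when no `server:` line is present.
function serverFromSecret(configBase64: string): string | undefined {
    const kubeconfig = Buffer.from(configBase64, "base64").toString("utf8");
    const match = kubeconfig.match(/^\s*server:\s*(\S+)\s*$/m);
    return match ? match[1] : undefined;
}

// True when the endpoint is a loopback address, which a remote Pulumi runner
// cannot reach without a port-forward or an exposed service.
function isLoopbackEndpoint(server: string): boolean {
    const host = new URL(server).hostname;
    return host === "localhost" || host === "127.0.0.1";
}
```

If the check reports a loopback endpoint, the virtual API server likely needs to be exposed, or the runner needs a tunnel, before resources are created through `vclusterProvider`.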

Operational caveats

  • RBAC and permissions: vCluster generates default RBAC rules that work for most scenarios. However, if your host cluster is heavily locked down, you may need to provide additional permissions to the vCluster service account.
  • Helm release previews: When using kubernetes.helm.v3.Release, Pulumi previews may not show every detail of the rendered Kubernetes resources. It primarily tracks the state of the Helm release itself.
  • EKS Auto Mode node lifetime: EKS Auto Mode uses immutable AMIs and enforces a maximum node lifetime of 21 days. Kubernetes reschedules vCluster pods and tenant workloads when nodes are replaced, so configure replicas, PodDisruptionBudgets, requests, and persistent storage for disruption tolerance.

Conclusion: ephemeral environments at scale

By combining Pulumi with EKS Auto Mode and vCluster, you can build a scalable environment factory. This approach provides the isolation developers need while maintaining the speed and cost-efficiency required by platform teams.

The snippets provided here are adapted for illustration. In a production environment, you would likely wrap these resources into a Pulumi ComponentResource to provide a clean, reusable API for your internal developers. When a feature branch is merged, deleting the Pulumi stack removes the resources managed by that stack, but validate namespace finalizers, persistent volume reclaim policies, and external cloud artifacts as part of cleanup.

For more on managing EKS with Pulumi, see the EKS guide.

AI coding has two shapes right now. One agent in a loop, sequential work, you babysitting the chat window. Call that 2x. Most teams live here. Five agents in worktrees, parallel work, fresh-context review on every change. Call that 10x. The trick: 2x is mostly prompting, 10x is mostly plumbing.

The parallel coding playbook is a five-pattern setup for running multiple AI coding agents at the same time without them stepping on each other: an issue used as the spec, a plan/build/validate loop, parallel git worktrees, fresh-session review, and a self-healing layer. The whole thing targets application code. The interesting question, and the one I keep ending up at, is what changes when the five agents are touching infrastructure.

2x is prompting, 10x is plumbing

2x is one human, one agent, one repo, one branch. The agent writes, you review, you tell it to try again, it tries again. The bottleneck is your attention. Whatever the agent’s raw throughput, your reading speed sets the ceiling.

10x moves you out of the per-change loop and into the issue loop. You write five issues with sharp acceptance criteria, send each one to its own agent in its own worktree, and let them plan, build, and validate end-to-end. You read five PRs at lunch instead of pair-programming on one all morning.

Concurrent isolation does the work. And isolation is mostly an infrastructure problem.

The five pillars

The five pillars, briefly.

  1. Issue is the spec. The GitHub issue carries the acceptance criteria. The pull request is the artifact that gets validated. Input and output of every implementation are versioned, scoped, and reviewable on their own.
  2. Plan, build, validate. Three stages, three artifacts. A markdown plan you can read in thirty seconds. A build that produces a diff. A validate step that checks the diff against the spec.
  3. Parallel worktrees. Each agent runs in its own git worktree so concurrent changes never trample each other. One repo, five working trees, five branches.
  4. Fresh-session review. A different agent, a different conversation, no shared context, reads the output and judges it. The reviewer never sees the writer’s chat. An agent reviewing its own output in the same context is theater.
  5. Self-healing layer. When the same issue keeps coming back, fix the system that allowed it. Update the rules, the skills, the AGENTS.md. The agent gets better; the bug class disappears.

The application-code version of this playbook leans on ports, node_modules, and databases to get isolation right. The infrastructure version has a different toolbox.

What changes when the agents are touching infrastructure

Walk the pillars again, this time with a Pulumi shop in mind.

Issue is the spec. For application code, the spec describes behavior. For infrastructure, the spec is a Pulumi component contract plus a CrossGuard policy excerpt. “The resulting bucket is private, lives in eu-west-1, has SSE-KMS, and is tagged owner=team-x.” That sentence compiles to a typed component signature and three policy assertions. The agent does not get to interpret “looks right.” The acceptance criteria are deterministic, which is the whole reason this works.
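To make "deterministic acceptance criteria" concrete, here is a minimal sketch of the example sentence expressed as plain predicate checks. It deliberately does not use the CrossGuard SDK. The `BucketProps` shape and the `violations` helper are hypothetical stand-ins for whatever resolved properties a real policy would inspect.

```typescript
// Hypothetical shape of the resolved bucket properties a policy would inspect.
interface BucketProps {
    acl: string;
    region: string;
    sseAlgorithm?: string;
    tags?: Record<string, string>;
}

// Each check returns a violation message, or undefined when the property holds.
const checks: Array<(b: BucketProps) => string | undefined> = [
    (b) => (b.acl === "private" ? undefined : `bucket ACL is "${b.acl}", expected "private"`),
    (b) => (b.region === "eu-west-1" ? undefined : `bucket region is ${b.region}, expected eu-west-1`),
    (b) => (b.sseAlgorithm === "aws:kms" ? undefined : "bucket is not encrypted with SSE-KMS"),
    (b) => (b.tags?.owner === "team-x" ? undefined : "bucket is missing tag owner=team-x"),
];

// Run every check and collect the messages of the ones that failed.
function violations(bucket: BucketProps): string[] {
    return checks
        .map((check) => check(bucket))
        .filter((msg): msg is string => msg !== undefined);
}
```

An empty result means the spec is met. Anything else is a list the agent can act on without interpreting intent.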

Plan, build, validate. Pulumi already ships the validate step. pulumi preview --json produces a deterministic, machine-readable diff a second reviewer can judge without the conversation that produced it. The plan is a markdown doc the agent writes before touching code. The build is pulumi up --target against a review stack scoped to the resources the issue covers. The validate step is the preview output plus the CrossGuard verdict.
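As a rough sketch of what a validate step could do with that diff, the helper below counts planned operations from the JSON that `pulumi preview --json` emits. The `steps` and `op` field names are an assumed subset of that output and should be checked against your CLI version. `countByOp` itself is a hypothetical helper.

```typescript
// Assumed subset of the `pulumi preview --json` output; verify the field names
// against the CLI version you run before relying on them.
interface PreviewStep {
    op: string;
    urn: string;
}

interface PreviewResult {
    steps?: PreviewStep[];
}

// Count planned operations by type, e.g. { create: 2, same: 1 }.
function countByOp(previewJson: string): Record<string, number> {
    const preview = JSON.parse(previewJson) as PreviewResult;
    const counts: Record<string, number> = {};
    for (const step of preview.steps ?? []) {
        counts[step.op] = (counts[step.op] ?? 0) + 1;
    }
    return counts;
}
```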

Parallel worktrees. Worktrees alone are not enough. Two worktrees pointing at the same Pulumi stack will fight over state on the first concurrent up. The unit of isolation for infrastructure is the stack, not the worktree. Each worktree gets its own ephemeral review stack and its own ESC environment for credentials. State branches with the work, credentials branch with the work, and the cloud account does not see five agents elbowing each other.

Fresh-session review. The hardest part of the application-code version is keeping the reviewer cold. For infrastructure, the substrate hands you the cold context. The pulumi preview JSON has no memory of the prompt that produced it. A separate agent reading it has the same starting point a human reviewer has: a diff, a stack name, a policy report. Pulumi Neo reasons over the state graph directly, so the reviewer grounds every claim in what the change actually does, not what the writer says it does. Reviewer quality still depends on how well your policies cover the stack, but the cold-context part comes built in.

Self-healing layer. Most CrossGuard rule messages today read like assertions. “S3 bucket has no encryption.” A self-healing layer needs them to read like instructions. “S3 bucket has no encryption. Set serverSideEncryptionConfiguration with SSE-KMS to fix.” That single rewrite is the difference between an agent flailing and an agent fixing the violation on the first try. When the same rule keeps tripping, the fix is upstream of the next pull request: in the rules, in the skills, in the policy itself.
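One way to keep rule messages instructional is to store the problem and the remediation separately and flag rules that lack the second part. The sketch below assumes a hypothetical `RuleMessage` shape. It is not a CrossGuard API, only an illustration of the rewrite described above.

```typescript
// Hypothetical message shape: what is wrong, plus what to change to fix it.
interface RuleMessage {
    problem: string;
    remediation?: string;
}

// Render the message an agent would see when the rule trips.
function renderMessage(msg: RuleMessage): string {
    return msg.remediation ? `${msg.problem} ${msg.remediation} to fix.` : msg.problem;
}

// Lint helper: list the rules whose message gives an agent nothing to act on.
function rulesMissingRemediation(rules: Record<string, RuleMessage>): string[] {
    return Object.keys(rules).filter((name) => !rules[name].remediation);
}
```

Running the lint helper over a policy pack's messages gives a concrete backlog for the self-healing layer: every name it returns is a rule that still reads like an assertion.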

The five catches, infra edition

Every parallelism story has a catch list. The application-code version lists port conflicts, node_modules sprawl, database conflicts, token blowouts, and PR pile-up. The infrastructure equivalents map almost one to one.

Port conflicts become stack-name collisions. Two agents naming their stack dev and racing each other into Pulumi Cloud. The fix is the same hash-the-path trick the app-code playbook uses: derive the stack name from pulumi.getProject() plus a hash of the worktree path. Resource names follow the same pattern. Collisions go away.
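A minimal sketch of the hash-the-path trick, assuming the stack name is computed in a wrapper script before the stack is created. The `-review-` infix and the 8-character digest length are arbitrary choices for illustration. Inside a Pulumi program, `project` would come from pulumi.getProject().

```typescript
import { createHash } from "node:crypto";
import * as path from "node:path";

// Derive a stack name that is stable for a given worktree and unlikely to
// collide across worktrees of the same project.
function reviewStackName(project: string, worktreePath: string): string {
    const digest = createHash("sha256")
        .update(path.resolve(worktreePath)) // normalize so relative paths hash consistently
        .digest("hex")
        .slice(0, 8);
    return `${project}-review-${digest}`;
}
```

The same digest can be used as a suffix for physical resource names, so two review stacks never compete for one globally unique name.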

node_modules sprawl becomes provider plugin sprawl, mostly already solved. Three worktrees each pulling their own copy of pulumi-aws would add up fast, except Pulumi already shares plugins through a single cache at ~/.pulumi/plugins. Identical provider versions are reused across worktrees automatically. Per-worktree language SDKs (node_modules, venv) still need the usual care, but the provider layer is free.

Database conflicts become state conflicts. Two agents racing each other into pulumi up on the same stack is the same hazard as two agents writing to the same migrated database. The app-code playbook reaches for Neon branches or per-worktree SQLite files to isolate state. The infra answer is simpler: each worktree gets its own review stack. State branches with the work, by construction.

Token blowouts become cloud spend per ephemeral stack. The cost vector flips. For app code, the worry is LLM bills. For infrastructure, the worry is what your five agents just spun up in five review stacks. The mitigations are boring and they work. Use TTL stacks to tear review stacks down on a schedule. Avoid retainOnDelete on review-stack resources so the teardown actually frees them. Cap retries per spec. Watch the bill.
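
The teardown rule itself is small enough to sketch. This is an illustration in plain Python, not a Pulumi Cloud API: `should_tear_down` is a hypothetical helper a scheduled job could call, and the 24-hour default is an assumed TTL. It says a review stack goes away when its pull request is closed or when it has outlived the TTL, whichever comes first.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed default TTL for review stacks; tune to your own budget.
REVIEW_STACK_TTL = timedelta(hours=24)


def should_tear_down(
    created_at: datetime,
    pr_closed: bool,
    now: Optional[datetime] = None,
    ttl: timedelta = REVIEW_STACK_TTL,
) -> bool:
    """Return True when a review stack should be destroyed."""
    # Default to the current UTC time; tests can pass a fixed clock instead.
    now = now or datetime.now(timezone.utc)
    # Closed PR or expired TTL, whichever happens first.
    return pr_closed or (now - created_at) >= ttl
```

Keeping the decision in one pure function makes it easy to test separately from whatever actually runs the destroy.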

PR pile-up is the same problem. Five reviewed diffs are still five things waiting on the merge queue. The infra-flavored mitigations: stack-scoped reviewers (the human who owns the stack approves the change to it), the Pulumi Cloud audit log for grouping by stack and time, and auto-merge for the narrow class of changes where the preview diff is clean and every policy passes. That last one is where most of the throughput hides.

Where to start, this afternoon

Three steps, in order, on a stack with a small blast radius.

  1. Write an AGENTS.md for the repo. Five paragraphs is enough. The component library, the stack naming convention, the policy rules, the review-stack TTL, and the one thing in this repo that bites every newcomer. Neo reads AGENTS.md natively, as do most coding agents. This file is the spec for how the agent should behave even before you write a spec for what it should build.
  2. Cut a 24-hour review-stack TTL. Spin up a review stack on PR open, tear it down on PR close or after 24 hours, whichever comes first. This is the gate that turns “ephemeral” from a slogan into a line item that does not appear on next month’s bill.
  3. Run three issues in parallel. Pick three open issues that touch unrelated resources. Spin up three worktrees, three review stacks, three ESC environments. Let each agent run end-to-end against its own stack. Then have a fourth agent read each preview JSON cold and produce a one-paragraph review. Read three PRs plus the reviewer’s summary at lunch.

That last step is the measurement. The first time you run it, half of the changes will fail validation. The second time, fewer. By the third time you will know whether your spec quality, your policies, and your stack hygiene are good enough to scale this to five, then ten, then to every issue tagged infra:fix.

If three issues finish cleanly, you have the substrate. If they do not, the gap is almost always in the spec or the policy rules, not the agent. Fix the spec, tighten the rule, run it again.

Five stacks before lunch

10x is five concurrent agents, working from five issues, against five stacks, behind five fresh-session reviews. The substrate is already there. Stacks isolate state. ESC isolates credentials. pulumi preview is the deterministic artifact a fresh reviewer can read cold. CrossGuard is the self-healing layer when you write the rule messages as instructions.

The remaining work is small and mostly wiring. Write the AGENTS.md. Cut the TTL. Pick three issues that touch unrelated resources. Read three PRs at lunch. Five stacks before the room empties out is a realistic Monday.


Snapshot integrity fixed with up --refresh; provider errors now forwarded to hooks

This release: 6 bug fixes (AI-tallied from the release notes)
Pulumi · v3.244.0

3.244.0 (2026-05-28)

Bug Fixes
  • [cli/do] Fix top level flags like --logtostderr being recognized when using pulumi do #23355
  • [cli/install] Distinguish multiple packages with the same plugin
  • [engine] Fix snapshot integrity issue with up --refresh
  • [engine] Forward all provider errors to error hooks for retry #23347
  • [engine] Trace cancel RPCs sent to plugins during shutdown as children of the active operation instead of emitting separate root spans #23362
  • [sdk/python] Merge requirements.txt into pyproject.toml when using uv without a project section #23340

Service Provider now auto-generates from OpenAPI; RBAC and audit logs as code

This release: 3 new capabilities (AI-tallied from the release notes)

Today, we are announcing v1.0 of the Pulumi Service Provider: a major milestone in managing Pulumi Cloud with Pulumi itself. The provider is now generated directly from the Pulumi Cloud OpenAPI specification, unlocking a dramatically expanded pulumiservice:api/* resource surface and enabling Pulumi Cloud capabilities to become available in the provider faster than ever before.

This release also brings several major new capabilities to infrastructure as code, including fine-grained RBAC as code, Pulumi IDP as code, and audit log export as IaC. Together, these changes make the Pulumi Service Provider the most powerful and extensible way yet to manage and automate your Pulumi Cloud infrastructure.

Why this matters for users

Historically, every new Pulumi Cloud feature implied a follow-up PR in the provider before that feature could be used from a Pulumi program. The provider was always slightly behind the API it wrapped, and entirely new capability areas could take months to land.

The api/* surface changes both timelines. Because the schema is derived from the OpenAPI spec at runtime:

  1. Whole new resource families land in the provider the same release they reach Pulumi Cloud.
  2. New fields, features, and enum values on existing resources show up across all five language SDKs soon after they appear in the spec.

What’s new in v1.0

v1.0 lifts whole capability areas of Pulumi Cloud into the api/* surface, not just incremental field additions. None of it required bespoke provider code.

  1. Fine-grained RBAC as code. Custom roles, organization membership, and team role assignments are now managed resources. For example, defining a read-only role and assigning it to a team:

    TypeScript:

        const readOnly = new ps.api.Role("readOnly", {
            orgName: "acme",
            name: "stack-reader",
            description: "Read-only access to stacks across the org.",
            uxPurpose: "role",
            details: {
                __type: "PermissionDescriptorAllow",
                permissions: ["stack:read", "stack:list"],
            },
        });

        new ps.api.teams.Role("readOnlyForPlatform", {
            orgName: "acme",
            teamName: "platform",
            roleID: readOnly.roleID,
        });

    Python:

        read_only = pulumiservice.api.Role("readOnly",
            org_name="acme",
            name="stack-reader",
            description="Read-only access to stacks across the org.",
            ux_purpose="role",
            details={
                "__type": "PermissionDescriptorAllow",
                "permissions": ["stack:read", "stack:list"],
            })

        pulumiservice.api.teams.Role("readOnlyForPlatform",
            org_name="acme",
            team_name="platform",
            role_id=read_only.role_id)

    Go:

        readOnly, _ := api.NewRole(ctx, "readOnly", &api.RoleArgs{
            OrgName:     pulumi.String("acme"),
            Name:        pulumi.String("stack-reader"),
            Description: pulumi.String("Read-only access to stacks across the org."),
            UxPurpose:   pulumi.String("role"),
            Details: pulumi.Map{
                "__type":      pulumi.String("PermissionDescriptorAllow"),
                "permissions": pulumi.StringArray{pulumi.String("stack:read"), pulumi.String("stack:list")},
            },
        })

        teams.NewRole(ctx, "readOnlyForPlatform", &teams.RoleArgs{
            OrgName:  pulumi.String("acme"),
            TeamName: pulumi.String("platform"),
            RoleID:   readOnly.RoleID,
        })

    C#:

        var readOnly = new Ps.Api.Role("readOnly", new()
        {
            OrgName = "acme",
            Name = "stack-reader",
            Description = "Read-only access to stacks across the org.",
            UxPurpose = "role",
            Details = ImmutableDictionary.CreateRange(new[]
            {
                new KeyValuePair<string, object>("__type", "PermissionDescriptorAllow"),
                new KeyValuePair<string, object>("permissions", new[] { "stack:read", "stack:list" }),
            }),
        });

        new Ps.Api.Teams.Role("readOnlyForPlatform", new()
        {
            OrgName = "acme",
            TeamName = "platform",
            RoleID = readOnly.RoleID,
        });

    Java:

        var readOnly = new Role("readOnly", RoleArgs.builder()
            .orgName("acme")
            .name("stack-reader")
            .description("Read-only access to stacks across the org.")
            .uxPurpose("role")
            .details(Map.of(
                "__type", "PermissionDescriptorAllow",
                "permissions", List.of("stack:read", "stack:list")))
            .build());

        new com.pulumi.pulumiservice.api_teams.Role("readOnlyForPlatform",
            com.pulumi.pulumiservice.api_teams.RoleArgs.builder()
                .orgName("acme")
                .teamName("platform")
                .roleID(readOnly.roleID())
                .build());

    YAML:

        resources:
          readOnly:
            type: pulumiservice:api:Role
            properties:
              orgName: acme
              name: stack-reader
              description: Read-only access to stacks across the org.
              uxPurpose: role
              details:
                __type: PermissionDescriptorAllow
                permissions:
                  - stack:read
                  - stack:list
          readOnlyForPlatform:
            type: pulumiservice:api/teams:Role
            properties:
              orgName: acme
              teamName: platform
              roleID: ${readOnly.roleID}

  2. Pulumi IDP as code. services:Service makes the Pulumi IDP catalog manageable from your Pulumi programs, and it is surfaced in the same release that IDP ships in Pulumi Cloud. Platform teams can publish service definitions as code rather than only through the IDP console.

  3. Audit-log export as IaC. AuditLogExportConfiguration brings audit-log export sinks under Pulumi management with a real destroy path.

How it works

Pulumi Cloud’s OpenAPI document (published at https://api.pulumi.com/api/openapi/pulumi-spec.json) is embedded in the provider binary at build time, so the provider version you pin is the API surface you get. Preview and update are deterministic, and a version released today will still behave the same way years from now. Alongside the spec, the runtime loads a small companion metadata file that captures the Pulumi-specific semantics OpenAPI can’t express: which endpoints pair into a single resource, what a resource’s ID looks like, and which response fields are secrets that arrive exactly once at create time. That metadata is what lets api/* resources behave as expected.

Most of that metadata is auto-derived by a scaffolder, but the editorial layer, including resource descriptions, examples, and the v0 aliases that make migration safe, stays handmade. Any human override is pinned across regeneration so a future spec change can’t quietly overwrite it. The language SDKs are still generated against the runtime schema, so new fields and enum values reach typed SDKs in all five languages the moment the spec ships.

What the api namespace covers

The api namespace already spans most of Pulumi Cloud’s resource model.

For resources that have an ancestor under pulumiservice:index:*, the mapping lives in docs/v0-api-coverage.md. That file is auto-generated, so it stays in sync. Each api/* resource ships hand-maintained per-language examples in TypeScript, Python, Go, C#, Java, and YAML.

What to know before adopting the preview

The pulumiservice:api:* resource surface is in preview. Resource shape and module layout may change before GA.

The existing pulumiservice:index:* resources remain supported and are not being deprecated as part of v1.0. Migration to api/* is opt-in via Pulumi aliases.

Try it

If you want to take the expanded provider for a spin:

  1. The Pulumi Registry page for pulumiservice has install instructions for every language.
  2. The examples/api/ directory has runnable programs for each resource, in every supported language.
  3. The pulumi-pulumiservice repo is open source if you want to read the runtime, the embedded spec, or the metadata file directly.

Feedback during the preview is especially valuable. Please open an issue in the pulumi-pulumiservice repo if you run into any problems.

Anthropic shipped a piece earlier this month called How Claude Code Works in Large Codebases. I have not read anything more useful about coding agents this year. The core claim, in their words: “the ecosystem built around the model—the harness—determines how Claude Code performs more than the model alone.” In my phrasing: in a real codebase, the model is the smaller variable. The layer of context and tooling you wire around the agent matters more than which version of Sonnet or Opus is behind it.

The post stays high-level, which is the right move for a launch piece. What I want to do here is land it. Same seven pieces, but with the wiring you would actually put in a repo, in the order I would put it.

How Claude Code navigates without an index

Anthropic’s writeup says Claude Code works from the live codebase and does not require a codebase index to be built, maintained, or uploaded. The agent navigates the way an engineer would, with grep, find, ls, file reads, and reference-following. Anthropic calls this agentic search, and the upside is obvious: no separate index exists for you to keep fresh.

The downside is also obvious. An engineer who has never seen your repo and only has shell tools will flounder if you drop them in the root with no map. That is your agent on day one. Everything that follows is about giving it the map.

The AI layer in seven pieces

Every codebase used to have two artifacts engineers cared about: the code and the tests. A third exists now. Call it the AI layer, or the harness, or whatever you want. This layer is the set of context and tools you give your coding agent to operate in this specific repo. Anthropic breaks it into seven pieces, and each one solves a different scaling problem.

Anthropic gives each piece a role: CLAUDE.md is the foundation, hooks do self-improvement, skills are progressive disclosure, plugins handle distribution, LSP gives navigation, MCP is extension, subagents split exploration from editing. They are not equal in usage either. CLAUDE.md is read at the start of each session and stays in context for the duration. The others fire when relevant.

Lean and layered CLAUDE.md

The single biggest mistake I see is a root CLAUDE.md that has grown into a small book. Two thousand lines of conventions for parts of the repo the current task will never touch. Every session pays the tax. Anthropic’s own guidance is to keep these files focused on what applies broadly so they do not become a drag on performance, and you can feel that drag in practice: the agent gets cautious, slow, and oddly literal.

Keep the root file lean. What is this repo, broadly. The tech stack. The commands the agent will need (make test, make lint, how to run the dev server). General conventions that apply everywhere. That is most of what belongs there.

Local conventions go in subdirectory CLAUDE.md files. When the agent starts in a subdirectory, Claude Code walks upward from the working directory and loads every CLAUDE.md it finds on the way to the repo root, so root context is never lost and intermediate layers stack in the order you would expect. Claude Code can also discover files below the current working directory when it reads files in those subdirectories. That means services/api/CLAUDE.md only joins the session when the work reaches that service. Same for services/billing/, the frontend, the data layer.
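To make the upward walk concrete, here is a minimal sketch of the lookup described above. It is an illustration of the idea, not Claude Code's implementation: the function name `claude_md_chain` is hypothetical, and returning the files root first is an assumption about the stacking order.

```python
from pathlib import Path


def claude_md_chain(cwd: Path, repo_root: Path) -> list[Path]:
    """Return the CLAUDE.md files between repo_root and cwd, root first."""
    cwd, repo_root = cwd.resolve(), repo_root.resolve()
    if cwd != repo_root and repo_root not in cwd.parents:
        raise ValueError(f"{cwd} is not inside {repo_root}")

    found = []
    current = cwd
    while True:
        candidate = current / "CLAUDE.md"
        if candidate.is_file():
            found.append(candidate)
        if current == repo_root:
            break
        current = current.parent  # step one level toward the root
    return found[::-1]  # walked bottom-up, so reverse to get root first
```

Calling `claude_md_chain(Path("services/api"), Path("."))` from the repo root would list the root file and the API service file, and would leave services/billing/CLAUDE.md out because it is not on the path to the root.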

If you already know the task is scoped to one service, start the agent in that subdirectory. The working directory becomes the focus, and the agent stays out of unrelated code unless you tell it otherwise. Most of the time, you know.

Two more cheap wins live in the same neighborhood. Scope the make test and make lint commands so the subdirectory version runs only the slice the agent is working in, instead of the whole repo on every change. And version-control your exclusion rules in .claude/settings.json so the agent never reads dist/, generated SDKs, or vendored code. Every file the agent skips is tokens you keep for the work that matters. If your directory layout is unconventional or has historical baggage, Anthropic also suggests adding a short codebase map to the root CLAUDE.md so the agent has somewhere to anchor.
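The exclusion rules can be generated rather than typed by hand. The sketch below assumes exclusions are expressed as `Read(...)` entries under `permissions.deny` in .claude/settings.json; check the settings reference for your Claude Code version before relying on that shape. The directory names and the helper `deny_read_rules` are hypothetical.

```python
import json


def deny_read_rules(directories):
    """Turn directory names into Read deny rules (assumed rule syntax)."""
    return [f"Read(./{d.strip('/')}/**)" for d in directories]


# Hypothetical directories; substitute the build output and generated
# code paths from your own repo.
settings_fragment = {
    "permissions": {"deny": deny_read_rules(["dist", "vendor", "sdk/generated"])}
}

print(json.dumps(settings_fragment, indent=2))
```

Merge the printed fragment into the checked-in settings file so every engineer's session skips the same directories.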

Hooks that make the harness self-improving

Most teams use hooks as guardrails. Block edits in vendor/, refuse to delete migrations, kill the run if a secret turns up in a diff. That is fine and you should do it. But hooks have a second life that almost no one uses, and that second life is the more interesting one.

Both kinds register the same way, in .claude/settings.json, against named events Claude Code fires during a session:

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "uv run --directory \"$CLAUDE_PROJECT_DIR\" python .claude/hooks/session_start_context.py"
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "uv run --directory \"$CLAUDE_PROJECT_DIR\" python .claude/hooks/propose_claude_md.py"
          }
        ]
      }
    ]
  }
}
```

A SessionStart hook fires before the agent has done anything. Whatever the script prints to stdout is injected straight into the session as context, so you can preload the things the agent would otherwise have to spend a turn discovering: the current branch, the uncommitted diff, the last few commits. For a larger team you might fetch the Confluence or Notion page that owns the directory the engineer is working in. Every developer starts each session pre-oriented, with no manual setup.

```python
"""SessionStart hook: prints orientation Claude reads as session context."""
import os, subprocess
from pathlib import Path

root = Path(os.environ.get("CLAUDE_PROJECT_DIR", "."))

def git(*args):
    out = subprocess.run(["git", *args], cwd=root, capture_output=True, text=True)
    return out.stdout.strip()

print("# Orientation\n")
print(f"## Branch\n{git('rev-parse', '--abbrev-ref', 'HEAD')}\n")
print(f"## Uncommitted changes\n{git('status', '--porcelain') or '(clean)'}\n")
print(f"## Recent commits\n{git('log', '-5', '--oneline')}\n")
```

The Stop hook is the more interesting one. It fires when the agent finishes its turn. At that moment the session context is still fresh, the diff is still small, and you have a free shot at a question nobody asks: did anything I changed invalidate the rules I wrote down? Spawn a separate headless Claude session, hand it the diff and the relevant CLAUDE.md files, ask it to propose updates, and write the result to a markdown review file. You read it when you are ready. The CLAUDE.md files stop going stale on their own.

The trick is to make the hook itself cheap and dispatch the LLM call in the background, so the end of every turn does not block on a reflection:

```python
"""Stop hook: dispatch a headless Claude reflection in the background."""
import os, subprocess, sys
from pathlib import Path

# The reflector spawns its own headless Claude, whose Stop hook lands back
# here. The lock prevents infinite recursion.
if os.environ.get("REFLECT_LOCK"):
    sys.exit(0)

root = Path(os.environ.get("CLAUDE_PROJECT_DIR", "."))
diff = subprocess.run(
    ["git", "diff", "HEAD"], cwd=root, capture_output=True, text=True
).stdout
if not diff.strip():
    sys.exit(0)

env = {**os.environ, "REFLECT_LOCK": "1"}
subprocess.Popen(
    ["uv", "run", "python", ".claude/hooks/reflect_claude_md.py"],
    cwd=root, env=env,
    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
)
```

reflect_claude_md.py is the part that calls a headless claude against the diff and writes .claude/claude-md-review.md. You can grow it from twenty lines to two hundred without ever blocking the agent.
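For a starting point, a twenty-line version of that reflector might look like the sketch below. Several things here are assumptions rather than anything the setup above prescribes: that the `claude` CLI is on PATH and runs non-interactively with `-p` while reading extra material from stdin, that every tracked CLAUDE.md is relevant, and the prompt wording itself. The helper `build_prompt` is hypothetical.

```python
"""reflect_claude_md.py: hypothetical sketch of the background reflector."""
import os
import subprocess
from pathlib import Path

REVIEW_FILE = Path(".claude/claude-md-review.md")


def build_prompt(diff: str, claude_md: dict[str, str]) -> str:
    """Assemble the material for the reflection: the diff, then each CLAUDE.md."""
    sections = ["## Diff", diff]
    for path, text in sorted(claude_md.items()):
        sections += [f"## {path}", text]
    return "\n\n".join(sections)


def main() -> None:
    root = Path(os.environ.get("CLAUDE_PROJECT_DIR", "."))

    def run(*cmd: str, **kwargs) -> str:
        done = subprocess.run(cmd, cwd=root, capture_output=True, text=True, **kwargs)
        return done.stdout

    diff = run("git", "diff", "HEAD")
    if not diff.strip():
        return
    # Assumption: all tracked CLAUDE.md files are relevant. A fuller version
    # would keep only the ones on the path of the changed files.
    tracked = [p for p in run("git", "ls-files").splitlines() if Path(p).name == "CLAUDE.md"]
    claude_md = {p: (root / p).read_text() for p in tracked}
    # Assumption: `claude -p` takes the instruction as an argument and the
    # piped text as additional input.
    review = run(
        "claude", "-p",
        "Propose CLAUDE.md updates only where this diff invalidates a documented rule.",
        input=build_prompt(diff, claude_md),
    )
    (root / REVIEW_FILE).write_text(review)


# Run only when dispatched by the Stop hook, which sets REFLECT_LOCK.
if __name__ == "__main__" and os.environ.get("REFLECT_LOCK"):
    main()
```

Because the child process inherits REFLECT_LOCK, the Stop hook that fires at the end of the headless session exits immediately instead of spawning another reflection.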

The pattern that ties the two together: hooks let the harness improve itself in the background while you do the actual work.

Path-scoped skills

Skills are where the agent learns how to do a thing. CLAUDE.md is conventions (“every route is registered here”). Skills are workflows (“here is how you add a new route in this repo, end to end”). The two overlap, but the framing keeps me honest: rules in CLAUDE.md, recipes in skills.

The piece of the skills system most teams miss is the path scope. A skill can declare which directories it activates in. A create-api-endpoint skill that only loads when the agent is editing under services/api/ is invisible the rest of the time. With dozens of skills in a real repo, scoping is the difference between a useful library and a wall of irrelevant prompts.

The mental model: progressive disclosure for expertise. Most knowledge in a large codebase is local. Load it locally.

Symbol-level search through LSP and MCP

grep is fine until it isn’t. Past six-digit line counts, plain string search gets slow, returns too much, and burns tokens reading files the agent did not need to open. You also lose what every IDE has done for decades: jump-to-definition, find-references, hover-for-types.

You can give the agent the same navigation. Run a language server locally, wrap it in a small MCP server, expose two or three tools: where_is, find_references, goto_definition. The agent now searches by symbol, not by string. A request like “find every place monthly_total_cents is referenced” returns one definition and the actual references, instead of fifty grep hits that mention the substring in unrelated comments.
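The gap between string search and symbol search is easy to see in miniature. The sketch below swaps the language server for Python's standard `ast` module, so it only handles a single Python source string and is not an LSP or MCP implementation; `find_symbol` and the sample source are hypothetical.

```python
import ast


def find_symbol(source: str, name: str) -> dict[str, list[int]]:
    """Return the line numbers where `name` is defined and where it is referenced."""
    definitions, references = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name == name:
                definitions.append(node.lineno)
        elif isinstance(node, ast.Name) and node.id == name:
            # An assignment target counts as a definition, anything else as a use.
            target = definitions if isinstance(node.ctx, ast.Store) else references
            target.append(node.lineno)
        elif isinstance(node, ast.Attribute) and node.attr == name:
            references.append(node.lineno)
    return {"definitions": sorted(definitions), "references": sorted(references)}


SAMPLE = '''\
# monthly_total_cents is mentioned in this comment only
def monthly_total_cents(items):
    return sum(i.cents for i in items)

def invoice(items):
    label = "monthly_total_cents"  # a string, not a reference
    return monthly_total_cents(items)
'''

result = find_symbol(SAMPLE, "monthly_total_cents")
grep_hits = [n for n, line in enumerate(SAMPLE.splitlines(), 1) if "monthly_total_cents" in line]
print(result)     # {'definitions': [2], 'references': [7]}
print(grep_hits)  # [1, 2, 6, 7]
```

The substring search reports the comment and the string literal alongside the real hits; the syntax-aware search returns only the definition and the call. A language server gives the agent the same precision across files and languages.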

This is also where bigger orgs invest. Custom MCP servers that expose internal search systems, the code-ownership graph, the design-doc index. The patterns are the same; the targets are domain-specific. The point is that the agent does not have to brute-force its way through your repo when you already have better tools for finding things.

Image: Anthropic, How Claude Code Works in Large Codebases.

Subagents for exploration

The rule I follow: split exploration from editing. A subagent runs in its own context window. You ask which files implement the billing webhook flow, or what the user model looks like across services. It does the digging, and only the summary comes back to your primary session.

The win is context budget, not parallelism. Exploration is wasteful by nature. The agent reads forty files to find the three that matter, and most of those forty get thrown away. If that happens in your primary session, your editing turns start with a context window already half full of noise. If it happens in a subagent, the noise stays there. You get the answer.

Use the built-in Explore subagent liberally. Custom subagents earn their place when you have a workflow specific enough that a generic explorer is the wrong tool. The file shape is small: a single markdown file under .claude/agents/, a short frontmatter block, and a prompt body. name, description, tools, and model are enough to start:

```markdown
---
name: explorer
description: Read-only repo explorer. Map a service or package without burning the main session's context, then return findings.
tools: Read, Grep, Glob
model: sonnet
---

You are a read-only explorer. The parent agent will hand you one service or
package to map. Read its `CLAUDE.md` if there is one, then trace entry points,
the public surface, and dependencies. Return findings as your final response.
No edits.
```

Restricting tools to read-only is the load-bearing line. The model only sees the tools you expose, so an explorer subagent without Write or Edit has nothing to call when it gets tempted, even if the prompt body forgot to say so. Treat that as a strong default. If you need a hard guarantee, layer a PreToolUse hook on top.
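A hard guarantee of that kind could be sketched as the hook script below. It assumes the hook receives a JSON event on stdin with a `tool_name` field and that exiting with code 2 blocks the call and passes stderr back to the agent; verify both against the hooks reference for your version. The tool list and the `decision` helper are hypothetical, and how you scope the hook to the explorer alone is left open here.

```python
"""PreToolUse hook (hypothetical sketch): refuse write-capable tools."""
import json
import sys

# Assumption: these are the tool names that can modify files.
WRITE_TOOLS = {"Write", "Edit", "MultiEdit", "NotebookEdit"}


def decision(event: dict) -> tuple[int, str]:
    """Map a tool-call event to an exit code and a message for stderr."""
    tool = event.get("tool_name", "")
    if tool in WRITE_TOOLS:
        return 2, f"{tool} is blocked: this session is read-only."
    return 0, ""


def main() -> int:
    code, message = decision(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code


# As a hook entry point, end the file with: sys.exit(main())
```

Keeping the decision in a pure function makes the rule easy to unit test before it is wired into .claude/settings.json.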

Don’t let it rot

The harness is not a one-time setup. Models improve, and rules written for last year’s model often constrain this year’s. A note like “always split refactors into single-file changes” might have saved you in 2024 and might block a beneficial cross-file edit in 2026. Anthropic suggests reviewing your CLAUDE.md files every three to six months, or whenever performance feels like it has plateaued after a major model release. The stop-hook reflection gives you a head start. The rest is on you.

Assign an owner

The last piece is not technical. The teams that get value out of Claude Code at scale have someone who owns the harness. A small platform-engineering team, or one DRI, or a hybrid PM/engineer doing it half-time. Their job is the same shape as owning a CI pipeline: write the conventions, build the skills, run the LSP wrapper, version the hooks, evangelize what works, retire what does not.

Plugins are the distribution vehicle. A good harness that lives in one engineer’s dotfiles stays tribal. The same harness packaged as a plugin (or a private marketplace) is how a team of five hundred ends up running the same skills, the same MCP servers, and the same hooks without anyone having to remember to copy a config.

The pattern that fails: ship Claude Code to the org on a Friday, hope adoption goes viral, watch every team grow its own slightly different version of CLAUDE.md for six months. The pattern that works: a quiet build-out period, a small set of approved skills, a working plugin or two, a documented governance story, then broad access.

Treat the harness like infrastructure.

Where to start

The order that has worked for me, in any repo:

  1. Trim the root CLAUDE.md until it fits on one screen. Move the rest into subdirectories.
  2. Add a Stop hook that proposes updates to those CLAUDE.md files in headless mode.
  3. Convert your three most common repeated tasks into path-scoped skills.
  4. Run a language server behind an MCP server. Stop searching by string.
  5. Get comfortable dispatching exploration to subagents.

Most teams will plateau on step one for a week and find the agent is already noticeably sharper. The rest compounds. I have written more on the agent-tooling shift this is part of in How Building AI Agents Has Changed in 2026, and on the workflow side in The Claude Skills I Actually Use for DevOps and Superpowers, GSD, and GSTACK.

The model will keep getting better. The harness is the work.

The phrase “AI infrastructure” now means two different things. One is the GPUs, schedulers, and MLOps platforms that exist to run AI workloads. The other is AI that runs infrastructure: agents and assistants that generate, deploy, and govern cloud resources on your behalf. They’re different markets with different vendors, and most teams need to think about both.

The pressure to think about both is real. McKinsey research puts the productivity lift from generative AI in software development at 20–45%, which is great for application teams and a problem for platform teams trying to keep up with the resulting feature flow. Infrastructure investment is climbing on both fronts: more spend on the compute that trains and serves models, more spend on AI tools that manage everything else.

This guide covers both categories: the compute and MLOps stack in Part 1, and AI-powered infrastructure management in Part 2, where the more interesting product shift is happening.

AI infrastructure tools overview

Tools for building AI infrastructure
  1. CoreWeave: GPU cloud built for AI workloads
  2. Lambda Labs: straightforward GPU cloud for research and startups
  3. Modal: serverless GPU compute
  4. Weights & Biases: ML experiment tracking and model management
  5. MLflow: open-source ML lifecycle platform
  6. Hyperscaler AI platforms: AWS SageMaker, Google Vertex AI, Azure ML

AI-powered infrastructure management tools
  1. Pulumi Neo: agentic AI with policy automation
  2. Firefly AIaC: asset codification and IaC generation
  3. env0 Cloud Compass: multi-IaC insights and analysis
  4. Spacelift AI: run explanation and troubleshooting
  5. Crossplane with Upbound: Kubernetes-native infrastructure
  6. General-purpose code assistants: Copilot, Claude Code, Cursor, Gemini
  7. AWS Application Composer: visual serverless builder

Quick picks

If you only have two minutes:

  • Enterprise compliance: Pulumi Neo. Executes changes (not only suggestions), ships with policy packs for CIS, HITRUST, NIST, and PCI DSS, and works with Terraform, CloudFormation, and resources created by hand.
  • Serious GPU compute: CoreWeave. Purpose-built for AI workloads, deep NVIDIA partnership, and prices that generally undercut the hyperscalers.
  • Best developer experience for ML: Modal. Decorate a Python function, get a GPU, pay by the second.
  • Open-source MLOps: MLflow. No vendor lock-in, runs anywhere, plays well with everything.

What is AI infrastructure?

The term covers two distinct categories that share almost no vendors.

Infrastructure for AI is the compute, storage, and orchestration that AI workloads run on. Training a large model is not a normal cloud workload: it wants thousands of GPUs talking to each other over fat, low-latency networks for weeks at a time. Inference is different again: lower latency, smarter batching, different hardware. General-purpose cloud was not designed for either case, which is why specialized GPU clouds and MLOps platforms exist.

AI-powered infrastructure management is the inverse: AI tools that manage cloud infrastructure. They generate IaC, run deployments, detect drift, and remediate policy violations. The pitch is that modern infrastructure (multi-cloud, containers, microservices, regulated workloads) has gotten too complex for humans to manage by hand and too varied for scripted automation to keep up with.

Most organizations end up needing both: somewhere to run their ML workloads, and something to keep the rest of the cloud sane.

Part 1: Tools for building AI infrastructure

These are the platforms you run AI and ML workloads on: GPU clouds for raw compute, MLOps platforms for the lifecycle around them.

CoreWeave

CoreWeave is the GPU cloud that broke out of the AI hype cycle into a real public company. They went public in 2025, signed a multi-billion-dollar capacity deal with OpenAI, and acquired Weights & Biases. Their thesis from day one was that AI workloads deserve infrastructure designed for AI workloads, not a GPU SKU bolted onto a general-purpose cloud.

  • License: Proprietary
  • Best for: Large-scale training and high-throughput inference; teams that need dedicated GPU capacity with first access to new NVIDIA hardware
  • Strengths: GPU infrastructure designed for AI; Kubernetes-native; direct NVIDIA partnership; handles distributed training at scale
  • Watch out for: Smaller global footprint than AWS/GCP/Azure; not a general-purpose cloud, so if you need RDS, S3, and a managed Kafka in the same provider, this isn’t it

Lambda Labs

Lambda has been the approachable GPU cloud for a long time. Environments come pre-configured with PyTorch and TensorFlow, and you can be running on an H100 in about as long as it takes to copy your SSH key.

  • License: Proprietary
  • Best for: Research teams, startups, and individual practitioners who want GPUs without a configuration tax
  • Strengths: Straightforward to start on; pre-configured deep learning environments; competitive on-demand pricing; strong learning resources
  • Watch out for: Smaller scale than CoreWeave or the hyperscalers; availability gets tight during demand spikes

Modal

Modal’s pitch is that you write a Python function, decorate it, and Modal handles the GPU. No capacity planning, no idle instances burning money overnight, no Dockerfile to maintain.

  • License: Proprietary
  • Best for: Variable ML workloads where reserved capacity would sit idle; data scientists who’d rather not learn Kubernetes
  • Strengths: Strong developer experience; serverless GPUs with automatic scaling; pay-per-second pricing; cold starts are fast for what they are
  • Watch out for: You give up infrastructure control. Not ideal for long training jobs that need reserved hardware or strict configuration requirements.

Weights & Biases

Weights & Biases is the de facto standard for ML experiment tracking and model management, integrated with essentially every framework and cloud you’d plausibly use. CoreWeave acquired the company in 2025, which has accelerated the joint roadmap but raised some neutrality questions for teams that prefer their tooling cloud-agnostic.

  • License: Proprietary with a free tier
  • Best for: ML teams that need shared experiment tracking, model versioning, and reporting
  • Strengths: Industry-leading experiment tracking and visualization; comprehensive model registry; strong team collaboration; broad integration surface
  • Watch out for: Costs scale quickly past the free tier; some teams self-host alternatives for data residency reasons

MLflow

MLflow is the leading open-source MLOps platform: experiment tracking, packaging, registry, and serving, with no lock-in. Originally built at Databricks, it’s now a broad open-source ecosystem with managed offerings from multiple vendors (including Databricks and the major clouds).

  • License: Apache 2.0
  • Best for: Teams that want MLOps without a vendor, or that want the option to start managed and self-host later
  • Strengths: Open source; covers the full ML lifecycle; runs locally, on-prem, or managed; broad framework support
  • Watch out for: Self-hosting carries the usual operational tax; commercial alternatives have stronger collaboration UX out of the box

Hyperscaler AI platforms

The major clouds all sell end-to-end ML platforms. Each leads on the dimensions that line up with its parent cloud (Vertex for Google’s models and TPUs, SageMaker for AWS-native data pipelines, Azure ML for Microsoft-stack integration), but the wider integration with the rest of the cloud is the deciding factor.

  • AWS SageMaker: end-to-end ML on AWS, deeply integrated with S3 and Glue, with first-class connections to Lambda for serverless inference and to the rest of the AWS data stack. The default pick if your data already lives in AWS.
  • Google Vertex AI: Google’s ML stack, including TPUs for workloads that need them, plus access to Google’s foundation models. Strongest when paired with BigQuery.
  • Azure Machine Learning: Microsoft's ML stack, with first-party MLOps integrations across GitHub Actions, Azure DevOps, and Microsoft Fabric for downstream reporting. The right choice if you’re already an Azure shop with Microsoft compliance requirements.

The shared tradeoff: hyperscaler GPU compute typically runs 2–3x the per-hour price of specialized providers, and the platforms work best when you commit to them top to bottom. For organizations already inside one cloud, the unified billing and single support contract usually justify the premium. For a new ML team starting from scratch, they rarely do.

Part 2: AI-powered infrastructure management tools

This is where the more interesting product shift is happening. Instead of running AI on infrastructure, these tools point AI at your infrastructure and let it do work.

From code generation to agentic execution

Before the tool list, one distinction matters more than any feature comparison: whether the tool generates code or executes changes.

Code generation tools like GitHub Copilot suggest infrastructure code based on context. You review it, maybe edit it, run it yourself. The AI helps, but you’re still the one doing the work.

Agentic platforms generate the code and run it, with the guardrails you define. They understand your environment, handle multi-step workflows, and enforce policies on the way through. You describe the outcome; the platform makes it happen.

| Capability | Code generation | Agentic execution |
|---|---|---|
| Generates infrastructure code | Yes | Yes |
| Understands infrastructure context | Limited | Deep |
| Executes changes | No | Yes |
| Handles multi-step workflows | No | Yes |
| Enforces policies automatically | No | Yes |
| Remediates drift and violations | No | Yes |

Where you want to land on this spectrum is mostly a governance question, not a productivity one.

Pulumi Neo

Pulumi Neo is Pulumi’s agentic AI for infrastructure. The distinguishing claim is execution: Neo doesn’t only suggest a Terraform snippet, it figures out the right resources, generates the code, and runs the deployment inside whatever guardrails you’ve set.

  • License: Proprietary (Pulumi Cloud)
  • Best for: Platform engineering teams that want AI automation with real policy controls, especially in regulated industries

A few things that set it apart in practice:

Policy automation and compliance. Neo is integrated with Pulumi Insights and Governance, which ships pre-built policy packs for CIS benchmarks, HITRUST CSF, NIST SP 800-53, and PCI DSS. Detection and remediation run in the same loop: Neo finds a violation, generates a fix, and (subject to approvals) applies it. You can batch-remediate across stacks and accounts with prompts like “find and fix all unencrypted S3 buckets across our AWS accounts.”

Works with infrastructure you didn’t create with Pulumi. Neo’s governance applies to Pulumi-managed resources, Terraform state, CloudFormation stacks, and resources someone clicked together in the AWS console. That matters because the realistic adoption path is to point Neo at what you have, audit it, and gradually bring it under management, not to migrate everything first.

Progressive autonomy. Trust levels are configurable. Start with human approval for everything; loosen it for well-defined, low-risk operations as confidence builds; keep production and sensitive resources behind strict approvals. This is the part that tends to determine whether enterprises actually deploy agentic AI in anger, versus letting it sit as a sandbox toy.
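The progression described above can be pictured as an approval gate that defaults to human review and auto-approves only an explicit allow-list. The following is a minimal sketch, not Neo's configuration format: the `Change` fields, the `LOW_RISK_OPERATIONS` set, and the operation names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto-approve"
    NEEDS_HUMAN = "needs-human-approval"


@dataclass(frozen=True)
class Change:
    environment: str  # e.g. "dev", "staging", "production"
    operation: str    # hypothetical operation label
    sensitive: bool   # touches secrets, IAM, data stores, and so on


# Hypothetical allow-list: operations a team has decided are low risk.
LOW_RISK_OPERATIONS = frozenset({"tag-resource", "resize-dev-instance"})


def gate(change: Change, low_risk=LOW_RISK_OPERATIONS) -> Decision:
    """Default to human approval; auto-approve only explicit low-risk cases."""
    # Production and sensitive resources always stay behind strict approval.
    if change.environment == "production" or change.sensitive:
        return Decision.NEEDS_HUMAN
    # Outside production, only allow-listed operations skip review.
    if change.operation in low_risk:
        return Decision.AUTO_APPROVE
    return Decision.NEEDS_HUMAN


print(gate(Change("dev", "tag-resource", sensitive=False)).value)         # auto-approve
print(gate(Change("production", "tag-resource", sensitive=False)).value)  # needs-human-approval
```

Loosening trust over time then amounts to growing the allow-list, while the production and sensitive-resource checks stay fixed.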

IDE and CI/CD integration. The Pulumi MCP Server brings Neo into Cursor, Claude Code, Claude Desktop, Windsurf, and any other MCP-compatible client. The Pulumi Cloud UI is the home base for approvals, history, and remediation status. Neo also slots into CI/CD pipelines for pre-merge policy remediation.

Case studies:

  • Werner Enterprises reduced infrastructure provisioning time from 3 days to 4 hours using Pulumi.
  • Spear AI cut their Authority to Operate (ATO) timeline from an expected 1.5 years to roughly 3 months by using policy-as-code to evidence compliance controls for auditors.

Tradeoff to be honest about: Neo gets more valuable the deeper you are in the Pulumi ecosystem. If you’re running IaC, ESC, and policy packs already, Neo has a lot of context to draw on. If you’re kicking the tires, it’s still useful, but the differentiating capability (context-aware, policy-respecting agentic execution) is harder to feel.

Firefly AIaC

Firefly is an asset management platform with AI features bolted on top of a strong core. The core capability is asset codification: it discovers cloud resources you already have and generates the IaC for them.

  • License: Proprietary
  • Best for: Teams that need to codify existing cloud footprints or generate IaC from natural language

Strengths: solid asset discovery, multi-cloud coverage, natural-language IaC generation, drift detection with remediation hooks. Caveat: AI features here are supplementary to the asset management product, not the main event, and Firefly is less focused on agentic execution than on inventory and policy.

env0 Cloud Compass

env0’s Cloud Compass adds AI to env0’s IaC automation platform, focusing on analysis rather than autonomous execution.

  • License: Proprietary
  • Best for: Multi-IaC shops that want AI-generated PR summaries, drift explanations, and cost insights

Strengths: multi-tool support across Terraform, OpenTofu, Pulumi, and Terragrunt; AI-generated PR summaries; drift cause analysis; cost estimation. Caveat: this is analysis and explanation, not action, so Cloud Compass complements an agentic tool rather than replacing one.

Spacelift AI

Spacelift’s AI work is focused on the post-run experience: explaining what happened in a deployment and helping troubleshoot failures.

  • License: Proprietary
  • Best for: GitOps shops that want AI assistance reading complex runs and diagnosing failed deployments

Strengths: AI-powered run explanation; troubleshooting guidance for failures; broad IaC tool support; mature CI/CD integration. Caveat: like Spacelift as a whole, this is observation and explanation, not generation or execution. Pair with something that writes the code.

Crossplane with Upbound

Crossplane brings Kubernetes-style declarative management to cloud resources. Upbound is the company that commercializes it, and is layering AI-native control-plane capabilities into the 2.0 generation.

  • License: Apache 2.0 (Crossplane); proprietary (Upbound)
  • Best for: Teams already deep in Kubernetes that want to manage cloud resources the same way

Strengths: Kubernetes-native model; native GitOps fit; very active OSS community; AI control-plane work emerging from Upbound. Caveat: the learning curve is real if you’re not already living in Kubernetes; the commercial AI features are still maturing.

General-purpose code assistants

General-purpose AI coding assistants are the tools your developers already have open: GitHub Copilot, Claude Code, Cursor, and Google’s Gemini and Antigravity. They write Terraform HCL, Pulumi programs, and CloudFormation templates competently, about as well as they write anything else.

  • License: Proprietary (subscription), varies by tool
  • Best for: Developers who want broad code assistance, including infrastructure code, inside their existing editor

Strengths: excellent line-by-line code completion; broad language support; first-class editor integration; trained on huge corpora. Caveat: no infrastructure context. They don’t know what’s in your account, what your policies are, or which subnet you should pick. Treat their IaC suggestions as first-pass scaffolding, not production output.

AWS Application Composer

Application Composer is AWS’s visual builder for serverless applications. Drag services onto a canvas, get a CloudFormation template out, with AI suggestions for service configuration along the way.

  • License: Proprietary (AWS, included)
  • Best for: Teams building AWS serverless apps who prefer a visual workflow

Strengths: visual development for serverless; direct AWS integration; AI suggestions for service configuration; emits CloudFormation. Caveat: AWS-only, CloudFormation-only, and best suited to serverless rather than general infrastructure.

Comparison tables

Infrastructure for AI

| Tool | Category | Key strength | Limitation | Pricing | Best for |
|---|---|---|---|---|---|
| CoreWeave | GPU cloud | Purpose-built GPU infra, NVIDIA partnership | Not a general-purpose cloud | Per-GPU-hour | Large-scale AI training |
| Lambda Labs | GPU cloud | Approachable, pre-configured environments | Smaller scale | Per-GPU-hour | Research teams, startups |
| Modal | Serverless GPU | Developer experience, pay-per-second | Less infrastructure control | Pay-per-use | Variable ML workloads |
| Weights & Biases | MLOps | Industry-standard experiment tracking | Costs scale quickly | Free tier + paid | ML team collaboration |
| MLflow | MLOps | Open source, no lock-in | Self-hosting overhead | Free (self-hosted) | Flexible ML lifecycle |
| AWS SageMaker | Hyperscaler | AWS ecosystem integration | Higher cost, lock-in | Per-use | AWS-native orgs |
| Google Vertex AI | Hyperscaler | Google models, TPU access | Lock-in | Per-use | Google Cloud users |
| Azure ML | Hyperscaler | Microsoft integration, enterprise features | Lock-in | Per-use | Microsoft ecosystem |

AI-powered infrastructure management

| Tool | Approach | Key strength | Limitation | Pricing | Best for |
|---|---|---|---|---|---|
| Pulumi Neo | Agentic AI | Execution + policy automation | Best within Pulumi ecosystem | Pulumi Cloud tiers | Enterprise platform teams |
| Firefly AIaC | Asset management | Asset codification, IaC generation | AI is supplementary | Proprietary | Codifying existing infra |
| env0 Cloud Compass | Multi-IaC platform | Multi-tool support, PR analysis | Analysis, not execution | Proprietary | Multi-IaC environments |
| Spacelift AI | CI/CD platform | Run explanation, troubleshooting | Observation, not action | Proprietary | GitOps workflows |
| Crossplane / Upbound | Kubernetes-native | K8s patterns for infra | Requires K8s expertise | Open source + commercial | Kubernetes-native teams |
| Code assistants | Code assistant | Broad language support, IDE | No infrastructure context | Subscription | General code assistance |
| AWS Composer | Visual builder | Visual serverless development | AWS- and CFN-only | Included with AWS | AWS serverless apps |

How to choose

There’s no universal best tool. Five questions sort the field quickly:

  • Cloud strategy. Multi-cloud means tools like Pulumi Neo, Firefly, env0, or Crossplane. Single-cloud commitment means hyperscaler-native tools may integrate more deeply (AWS Composer, SageMaker, and so on).
  • Team expertise. Programmers gravitate to tools that use real languages (Pulumi Neo, Pulumi IaC). Kubernetes teams find Crossplane natural; everyone else finds it steep. Teams that prefer visual workflows should look at AWS Composer or env0’s UI.
  • Compliance. Regulated industries (healthcare, finance, government) get the most value from tools with pre-built compliance packs and audit trails. Pulumi Neo’s CIS/HITRUST/NIST/PCI packs are the most direct fit. If preventative policy enforcement matters, prefer tools that block non-compliant deployments rather than flag them after the fact.
  • Existing footprint. Greenfield projects can use anything. Brownfield is where it gets interesting: Pulumi Neo works against Terraform, CloudFormation, and manually-created resources, which lets you adopt incrementally instead of migrating first. Mixed-IaC shops should also look at env0.
  • Budget. Open source first: MLflow for MLOps, Crossplane for Kubernetes-native infra. Open source is not free, though: self-hosting carries a real total cost of ownership in hosting, maintenance, and the expertise to operate it. Commercial tools (Pulumi Cloud, env0, Spacelift) fold that operational cost into the price, on top of support, SLAs, and the enterprise-tier features open source can lack.
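The compliance point in the list above separates tools that block non-compliant deployments from tools that only flag them afterwards. A minimal sketch of that difference follows, using a single illustrative rule (buckets must be encrypted). The `Resource` type, the `encrypted` property, and the `advisory` and `mandatory` labels are assumptions made for this example, not any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    type: str
    name: str
    props: dict = field(default_factory=dict)


def unencrypted_buckets(resources):
    """Return the names of bucket resources that are not marked encrypted."""
    return [
        r.name
        for r in resources
        if r.type == "bucket" and not r.props.get("encrypted", False)
    ]


def evaluate(resources, enforcement):
    """Apply the rule; "advisory" only reports, "mandatory" also blocks."""
    if enforcement not in ("advisory", "mandatory"):
        raise ValueError(f"unknown enforcement level: {enforcement!r}")
    violations = unencrypted_buckets(resources)
    blocked = enforcement == "mandatory" and bool(violations)
    return {"violations": violations, "deploy": not blocked}


# Hypothetical planned resources, for illustration only.
planned = [
    Resource("bucket", "logs", {"encrypted": True}),
    Resource("bucket", "uploads"),
    Resource("vm", "web"),
]
print(evaluate(planned, "advisory"))
print(evaluate(planned, "mandatory"))
```

Both modes find the same violation; only the mandatory mode stops the deployment, which is the preventative behavior the list recommends for regulated environments.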

Before adopting anything, get visibility into what you have today, pilot on staging where mistakes are cheap, and define success metrics up front: time to provision, policy violation rates, mean time to remediate. The best AI infrastructure tool is the one your team will actually use, which means meeting developers where they already work.
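The three success metrics named above are simple to compute once a pilot records durations and counts. The sketch below is one way to do it. The function names and all sample numbers are hypothetical, and a real pilot would pull these from deployment and policy logs.

```python
from datetime import timedelta
from statistics import mean


def mean_hours(durations):
    """Average a collection of timedeltas and return the result in hours."""
    return mean(d.total_seconds() for d in durations) / 3600


def violation_rate(deployments_with_violations, total_deployments):
    """Fraction of deployments that had at least one policy violation."""
    if total_deployments <= 0:
        raise ValueError("total_deployments must be positive")
    return deployments_with_violations / total_deployments


# Hypothetical pilot data, for illustration only.
provisioning = [timedelta(hours=5), timedelta(hours=3), timedelta(hours=4)]
remediation = [timedelta(minutes=30), timedelta(minutes=90)]

print(f"time to provision: {mean_hours(provisioning):.1f} h")
print(f"policy violation rate: {violation_rate(3, 60):.1%}")
print(f"mean time to remediate: {mean_hours(remediation):.1f} h")
```

Recording the same three numbers before the pilot gives the baseline that makes the comparison meaningful.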

Key trends and outlook

From copilots to agents. “AI suggests code” and “AI runs the deploy” are different products with different governance implications. The teams getting value from agentic tools have figured out which tasks to delegate fully, which to keep human-in-the-loop, and which to leave alone.

Progressive autonomy. Enterprise adoption follows a predictable shape: visibility → recommendations → human-approved execution → autonomous execution for well-understood scenarios. Tools that support that graduation will see stronger enterprise traction than tools that force an all-or-nothing choice.

Policy as the control plane. As AI takes on more infrastructure tasks, policy frameworks become the primary control plane. Done well, policy becomes an enabler (guardrails that let you safely expand automation) rather than a brake on velocity.

MCP standardization. The Model Context Protocol is becoming the integration standard between AI assistants and infrastructure tools. The practical upshot is that the IDE is increasingly a viable surface for managing infrastructure, with AI mediating between natural language and the underlying APIs.

Consolidation. CoreWeave acquiring Weights & Biases and NVIDIA acquiring Run:ai both point toward integrated platforms across the AI infrastructure stack. For tool selection today, that’s an argument for picking vendors with clear strategic direction over point solutions likely to be acquired or out-competed.

Frequently asked questions

What is the best AI agent for cloud infrastructure management?

For enterprise governance plus true agentic capability, Pulumi Neo is currently the most complete offering: it executes changes (not just suggests them), integrates with pre-built compliance frameworks, and works with infrastructure regardless of how it was provisioned. For Kubernetes-native shops, Crossplane with Upbound’s emerging AI features is worth tracking.

How can I use generative AI to manage cloud infrastructure?

Start by identifying the repetitive, time-consuming infrastructure work in your team. The highest-value early use cases tend to be:

  • Code generation: write IaC from natural-language descriptions, then review.
  • Documentation: explain unfamiliar configurations and reduce onboarding time.
  • Troubleshooting: analyze logs, errors, and configs to suggest likely causes.
  • Security and compliance: scan for violations and generate fixes.
  • Full automation: for shops that want it, agentic platforms like Pulumi Neo execute provisioning workflows end-to-end with governance controls intact.
What is agentic AI for infrastructure?

Agentic AI for infrastructure means AI systems that autonomously execute infrastructure tasks, not just generate code suggestions. The difference from a code assistant is action: an agent understands your environment, respects your policies, and performs multi-step work (provisioning, configuration, security controls) within the boundaries you’ve defined.

How do AI agents improve DevOps workflows?

By automating the repetitive parts (provisioning, drift remediation, policy enforcement), reducing context-switching, and catching issues earlier. Teams that have rolled out agentic tools well report faster provisioning, fewer policy violations slipping into production, and quicker compliance remediation. The compounding effect (engineers freed for higher-value work as the agent absorbs the routine) is the actual point.

What’s the difference between AI code generation and agentic execution?

Code generation suggests IaC for a human to review and run. Agentic execution generates the code and runs it, with policy and governance enforced along the way. It’s the difference between a knowledgeable colleague who suggests an approach and a knowledgeable colleague who also ships the change with appropriate oversight.

Can AI generate Terraform or Pulumi programs?

Yes. Most general-purpose AI assistants (Copilot, Claude, Gemini, ChatGPT, Cursor) can produce Terraform HCL, Pulumi programs in TypeScript / Python / Go, and CloudFormation. Quality varies. Generic assistants lack environment context and will happily emit syntactically correct but operationally wrong code. Infrastructure-specific tools like Pulumi Neo generate code that’s aware of your existing resources, policies, and provider constraints.

Can AI help with infrastructure compliance and policy automation?

Yes, and this is one of the highest-leverage uses of AI in infrastructure. Tools like Pulumi Neo detect policy violations across your footprint (including resources created outside IaC), generate compliant remediation, and apply it with the approvals you require. Pre-built frameworks for CIS, HITRUST, NIST, and PCI DSS shorten what would otherwise be a long manual compliance project.

Are AI infrastructure tools secure for enterprise use?

Enterprise-grade ones are. Look for RBAC, full audit logging of AI actions, preventative policy enforcement (not just detection), and human-in-the-loop approvals for sensitive operations. SOC 2, data residency options, and configurable autonomy levels are table stakes. The risk to avoid is wiring a consumer AI assistant directly into a production cloud account without those controls.

How do I choose between different AI infrastructure tools?

Match the tool to your context: existing clouds and IaC, team skills, compliance requirements, budget. Enterprise platform teams with governance needs should evaluate Pulumi Neo first. MLOps-focused teams should look at Weights & Biases or MLflow. For general code assistance inside the editor, a general-purpose assistant like Copilot, Cursor, or Gemini is the default. Most organizations end up with more than one: a code assistant for daily development and an agentic platform for production infrastructure.

What are the best tools for machine learning infrastructure?

For GPU compute, CoreWeave leads at scale, Modal wins for variable workloads and developer experience, and the hyperscalers are the default pick if you’re already inside one of them. For experiment tracking and model management, Weights & Biases is the leading commercial platform; MLflow is the leading open-source one. Most teams choose on deployment model and pricing fit rather than on any capability gap. For the cloud infrastructure underneath the ML workloads, the same infrastructure management story applies: Pulumi Neo can provision and govern ML infrastructure the same way it handles everything else.

Conclusion

Two categories, two problems. GPU clouds and MLOps platforms (CoreWeave, Lambda, Modal, the hyperscaler trio, W&B, MLflow) solve the compute and lifecycle problem for running AI workloads. AI-powered infrastructure tools (Neo, Firefly, env0, Spacelift, Crossplane, code assistants, Composer) solve the management problem for everything else.

For GPU workloads, the choice mostly comes down to scale and where you already are. For infrastructure management, the real question is how much you actually want AI to do. Code assistants help you write IaC faster, but you’re still running it. Agentic platforms like Pulumi Neo execute changes and enforce policy on the way through, with the guardrails you control.

The pattern from teams getting real value: treat AI as a force multiplier on routine work (provisioning, drift, compliance) and keep human judgment in the loop for the architecture and the edge cases.

If you want to see agentic infrastructure management running against real resources, start with Pulumi Neo.

pulumi do gains resource support; neo UX improved

This release: 8 features (new capabilities), 1 enhancement (improvements to existing features), 4 fixes (bug fixes). AI-tallied from the release notes.
Pulumi · v3.243.0

3.243.0 (2026-05-22)

Features
  • [cli] Make the pulumi project new -y command write a minimal project file with no template #22847

  • [cli] Allow coding agents to create claimable temporary accounts when not authenticated

  • [cli] Suggest pulumi neo in pulumi preview and pulumi up diagnostics output #23326

  • [sdk] Add List to the Go plugin.Provider interface, wired to the streaming ResourceProvider.List RPC #23287

  • [sdkgen] Reserve the package names 'pulumi' and 'input' for internal use #23321

  • [cli/cloud] Prefer text/markdown over JSON in pulumi api when an endpoint produces both #22963

  • [cli/do] Add resource support to pulumi do #23215

  • [cli/neo] Pressing Esc in pulumi neo now clears the input box when it has text; with an empty box, Esc still cancels the agent's current turn #23299

  • [cli/new] Alias pulumi new to pulumi project new #23265

Bug Fixes
  • [cli] Require --yes to confirm pulumi deployment cancel, pulumi stack schedule remove, pulumi org webhook remove, and pulumi stack webhook remove when running non-interactively #23264

  • [pcl] Don't silently ignore ... in function arguments #23309

  • [sdkgen/nodejs] Cache package references per-deployment in generated SDKs to fix concurrent inline programs #23068

  • [backend/service] Error out when setting up journaler fails #22671

Miscellaneous
  • [cli/package] Update the pulumi package add --agent documentation hint to use <type-token> as the placeholder for the /docs/... URL #23294

  • [cli/plugin] Rename plugin ls to list and rm to remove #23291

pulumi do command for ad-hoc cloud resource operations

This release: 1 feature (new capabilities). AI-tallied from the release notes.

Infrastructure as code is the right model for production systems. State tracking, drift detection, and repeatable deployments all matter when you’re managing real workloads.

But sometimes, you also need a quick, one-off interaction with the cloud: create a bucket or a database, look up a VPC, delete a stray resource.

Today we’re introducing pulumi do, a new command for direct resource operations. With pulumi do, you can create, read, update, delete, and query any cloud resource from the terminal with a single command, across thousands of Pulumi-supported providers — no project, code, or state required.

The problem: Sometimes IaC is more than you need

When you’re managing production workloads, IaC is the proven solution. Code lets you declare complex systems, state tracking catches drift before it becomes a problem, dependency graphs sequence changes safely, and policy keeps everything in bounds. That full lifecycle, especially with the backing of a platform like Pulumi Cloud, is exactly what you want to build systems that scale.

But when you (or your coding agent) need an ad-hoc Postgres database, the simplest path with IaC still takes several steps: make a directory, create a project, configure your credentials, write the code, preview, deploy. It works, but it’s not always necessary for what should be a simple operation. pulumi do collapses all of those steps into one, using the same Pulumi providers, resource model, and ecosystem that power the core Pulumi platform.

Resource creation is also only part of the problem. As Joe laid out in The Agentic Infrastructure Era, the real challenge for AI agents isn’t code or CLI commands; it’s everything else: getting a cloud account, resolving credentials, wiring configuration across multiple services. Agent accounts, also released this week, simplify this by letting an agent provision its own ephemeral Pulumi Cloud account, and Pulumi ESC takes care of consolidating credentials across providers. Together with pulumi do, these let agents go from zero to deployed infrastructure without requiring a human in the loop. And when that one-off resource needs to grow into a more permanent system, there’s a clear graduation path back to full Pulumi IaC.

What it looks like

As an example, say you wanted to provision an S3 bucket. With the AWS CLI, you’d need to assemble an aws s3api create-bucket invocation with the right set of command-line flags, region constraints, a globally unique name, and so on. With pulumi do, it’s just this:

    $ pulumi do aws:s3:Bucket create

That might not look all that different on the surface — but because you’re using the Pulumi engine and resource model, you can provide a minimal set of input properties, take advantage of provider-defined defaults, and use Pulumi’s auto-naming feature to give the bucket a unique name automatically:

    $ pulumi do aws:s3:Bucket create

    This will create aws:s3/bucket:Bucket with the following inputs:
    {
      "bucket": "bucket-279ea56",
      "tagsAll": {}
    }

    Please confirm that this is what you'd like to do by typing `yes`:

Answer yes (or just pass --yes), and you’re done. To delete the bucket:

    $ pulumi do aws:s3:Bucket delete bucket-279ea56 --yes
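The name bucket-279ea56 in the session above comes from auto-naming. Judging only from the sample names in this post, the pattern looks like the lowercased resource name plus a short random hex suffix. The sketch below reproduces that pattern. It is an assumption about the shape of the output, not Pulumi's actual implementation, and `auto_name` is a hypothetical helper.

```python
import secrets


def auto_name(type_token: str, suffix_len: int = 7) -> str:
    """Build a name like 'bucket-279ea56' from a token like 'aws:s3:Bucket'.

    Assumption: base = lowercased last token segment, suffix = random hex.
    """
    base = type_token.rsplit(":", 1)[-1].lower()
    # token_hex(4) yields 8 hex characters; trim to the requested length.
    suffix = secrets.token_hex(4)[:suffix_len]
    return f"{base}-{suffix}"


print(auto_name("aws:s3:Bucket"))
print(auto_name("gcp:cloudrunv2:Service"))
```

A random suffix is what lets repeated create calls succeed without the caller inventing a globally unique name each time.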

Need to look up an existing resource? Use a provider function:

    $ pulumi do aws:ec2:getVpc --default

    {
      "arn": "arn:aws:ec2:us-west-2:663782525873:vpc/vpc-d7b311af",
      "cidrBlock": "172.31.0.0/16",
      "enableDnsHostnames": true,
      "enableDnsSupport": true,
      "enableNetworkAddressUsageMetrics": false,
      "id": "vpc-d7b311af",
      ...
    }

Same CLI, same output contract, same provider ecosystem.

The command shape

The do command accepts a Pulumi resource type, or type token, to determine the action to take. Type tokens have the form <package:module:resource>. For example, aws:s3:Bucket refers to the Amazon S3 Bucket resource that belongs to the s3 module of the aws package.

You can also provide a portion of the token to help you find what you’re looking for without ever having to leave the terminal:

    $ pulumi do aws:s3

    Functions and resources for the s3 module.

    Run 'pulumi do <module/resource/function> --help' for more details on usage.

    Functions:
      aws:s3:getAccessPoint
      aws:s3:getAccountPublicAccessBlock
      aws:s3:getBucket
      aws:s3:getBucketObject
      ...

    Resources:
      aws:s3:AccessPoint
      aws:s3:AccountPublicAccessBlock
      aws:s3:AnalyticsConfiguration
      aws:s3:Bucket
      ...

    $ pulumi do aws:s3:Bucket read bucket-d20976f

    {
      "arn": "arn:aws:s3:::bucket-d20976f",
      "bucket": "bucket-d20976f",
      "bucketDomainName": "bucket-d20976f.s3.amazonaws.com",
      "bucketNamespace": "global",
      ...
    }

The package, module, and resource/function segments all come directly from the Pulumi provider schema, so --help works at every level of the tree. Pass a package name, optional module, and optional function or resource type, and do returns the appropriate level of detail.
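The three levels described above (package, optional module, optional resource or function) can be modeled with a small parser. The sketch below is illustrative only: `Token` and `parse_token` are hypothetical names, and folding the long form (aws:s3/bucket:Bucket, as printed in the create confirmation) into the short form is an inference from the output shown in this post, not documented CLI behavior.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    package: str
    module: Optional[str] = None
    member: Optional[str] = None  # a resource or function name

    @property
    def level(self) -> str:
        """Which level of the tree this (possibly partial) token addresses."""
        if self.member:
            return "member"
        if self.module:
            return "module"
        return "package"


def parse_token(text: str) -> Token:
    """Parse 'package', 'package:module', or 'package:module:member'."""
    parts = text.split(":")
    if len(parts) >= 2:
        # Treat 'aws:s3/bucket:Bucket' and 'aws:s3:Bucket' as the same token.
        parts[1] = parts[1].split("/", 1)[0]
    if not 1 <= len(parts) <= 3 or not all(parts):
        raise ValueError(f"expected package[:module[:member]], got {text!r}")
    return Token(*parts)


for text in ("aws", "aws:s3", "aws:s3:Bucket", "aws:s3/bucket:Bucket"):
    token = parse_token(text)
    print(text, "->", token.level, tuple(token))
```

Keeping the level explicit mirrors how a partial token returns a listing while a full token names something you can act on.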

You can also provide the input properties of a resource in a YAML or JSON file with the --input and --input-file options. For example, to create a container service in Google Cloud Run:

    # service.yaml
    location: us-central1
    deletionProtection: false
    template:
      containers:
        - image: us-docker.pkg.dev/cloudrun/container/hello

    $ pulumi do gcp:cloudrunv2:Service create \
        --input yaml \
        --input-file service.yaml

    This will create gcp:cloudrunv2/service:Service with the following inputs:
    {
      "deletionProtection": false,
      "location": "us-central1",
      "name": "service-b8af752",
      "template": {
        "containers": [
          {
            "image": "us-docker.pkg.dev/cloudrun/container/hello"
          }
        ]
      }
    }

The result:

```json
{
  "createTime": "2026-05-22T23:00:22.415839Z",
  ...
  "urls": [
    "https://service-b8af752-921927215178.us-central1.run.app",
    "https://service-b8af752-ctnulmzwoa-uc.a.run.app"
  ]
}
```
Resource operations

Most resources support the full set of CRUD operations — create, read, update, delete, and list — directly from the CLI. Each operation maps to a provider CRUD method using the same provider logic a full Pulumi program would use, and resources are addressable by their cloud provider IDs:

```shell
# Create a resource
$ pulumi do aws:s3:Bucket create --yes | jq -r ".name"
bucket-4f5cb22

# Fetch it
$ pulumi do aws:s3:Bucket read bucket-4f5cb22 | jq -r ".hostedZoneId"
Z3BJ6K6RIION7M

# Update/patch it
$ pulumi do aws:s3:Bucket patch bucket-4f5cb22 --input yaml --input-file tags.yaml

$ pulumi do aws:s3:Bucket read bucket-4f5cb22 | jq ".tags"
{
  "key": "value"
}

# Delete it
$ pulumi do aws:s3:Bucket delete bucket-4f5cb22
```
Provider configuration

Today, pulumi do resolves provider configuration — for example, applying your AWS credentials — using environment variables or credential files as supported by each individual Pulumi provider. See the Pulumi Registry for provider-specific configuration details.

Designed for humans and agents

We’ve designed pulumi do to serve humans and coding agents equally well, guided by three fundamental ideas:

  • Consistent command structure across every provider. The do <package:module:type> <operation> pattern is the same for AWS, Azure, Google Cloud, Kubernetes, Cloudflare, Datadog, and every other provider, including packages containing higher-level component resources. Once an agent learns that pattern, it applies across the board.

  • Predictable output contract. JSON on stdout, progress on stderr, consistent exit codes. An agent can parse the result programmatically without scraping human-formatted tables.

  • A single CLI command that works across every cloud. Many cloud and SaaS providers don’t have a full CLI at all. pulumi do generates commands from the provider schema, so if a Pulumi provider exists for it, the CLI just works. Neither humans nor agents need to install, learn, or even know about cloud provider-specific tooling.
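The output contract in the second bullet can be illustrated with a short Python sketch. It assumes only what the text states: result JSON on stdout, progress on stderr, and a non-zero exit code on failure. The helper name `run_json` and the stand-in child command are hypothetical; a real caller would pass a pulumi do command line instead.

```python
import json
import subprocess
import sys


def run_json(argv):
    """Run a command and parse its stdout as JSON.

    Assumes the contract described above: result JSON on stdout,
    progress on stderr, non-zero exit code on failure.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        # stderr carries the human-readable progress and error text.
        raise RuntimeError(f"command failed ({proc.returncode}): {proc.stderr.strip()}")
    return json.loads(proc.stdout)


# Stand-in child process that mimics the contract (hypothetical; not a pulumi command).
FAKE_TOOL = [
    sys.executable,
    "-c",
    "import sys, json; print('creating...', file=sys.stderr); "
    "print(json.dumps({'name': 'bucket-4f5cb22'}))",
]

result = run_json(FAKE_TOOL)
print(result["name"])  # bucket-4f5cb22
```

Because progress never mixes with the result, the caller can hand stdout straight to a JSON parser and treat the exit code as the only success signal.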

What’s next

Resource operations and provider functions are the foundation. The pulumi do roadmap extends the same direct-operation model with credential management, state tracking, and a path to full IaC.

Unified credentials with Pulumi ESC

One of the hardest parts of multi-cloud operations is credential management. Every provider has its own authentication scheme, environment variables, and session lifecycle. An agent working across AWS, Cloudflare, and Datadog today manages three separate credential mechanisms.

We’re building Pulumi ESC integration into pulumi do so you can manage credentials in one place and resolve them everywhere. ESC handles credential resolution (including OIDC-based dynamic credential generation and short-lived tokens) across all of your providers. Name the credential set, reference it, and ESC does the rest, with rotation, RBAC, and audit built in.

Cross-resource references

Real infrastructure has dependencies — subnets need VPCs, security group rules need their security groups, and so on. When you’re building resources one at a time, those references need to flow between commands somehow.

A future version of pulumi do will let resource inputs reference outputs from previously created resources, allowing the CLI to resolve them automatically and preserve the dependency graph. Later, when the time comes to graduate to a full IaC program, the generated code contains proper resource references rather than hard-coded strings.
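How these references will be resolved is not specified here. As a rough sketch of the underlying idea only: if each resource lists the resources whose outputs it consumes, a topological sort yields a safe creation order. The resource names and the `deps` mapping below are hypothetical and are not pulumi do syntax.

```python
from graphlib import TopologicalSorter

# Hypothetical resources mapped to the resources whose outputs they reference.
deps = {
    "vpc": set(),
    "subnet": {"vpc"},
    "security_group": {"vpc"},
    "security_group_rule": {"security_group"},
}

# static_order() yields each resource only after all of its dependencies.
creation_order = list(TopologicalSorter(deps).static_order())
print(creation_order)
```

Deleting in the reverse of this order would tear dependents down before the resources they rely on.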

Stateful mode and the graduation path

Today, pulumi do is stateless. Each command runs independently. A planned stateful mode will persist resource state across operations, enabling drift detection, lifecycle management, and a graduation path to full infrastructure as code.

Here’s what we’re planning:

  1. Zero setup. Your first pulumi do implicitly creates a project and stack. No manual initialization.

  2. Accumulate resources. Each operation stores resource state. After a few commands, you have a lightweight representation of your infrastructure.

  3. Eject to a full project. When the time comes, generate a Pulumi project in your chosen language with all resources imported and dependency graphs intact.

  4. Connect to Pulumi Cloud. Layer on governance, compliance, team collaboration, and deployment automation through Pulumi Cloud. Resources created via pulumi do can be governed by Pulumi Insights from day one, even before you opt into full IaC.

This path works because pulumi do uses the same providers, resource types, and property schemas as every other pulumi operation. Provisioned cloud resources stay where they are while management capabilities are added as needed.

Get started

pulumi do ships as a research preview in Pulumi CLI v3.242.0 and later. Install or update the CLI, install a provider plugin, and start running commands. The documentation has the full reference.

We can’t wait to hear your feedback. Give it a try today, tell us what works (and what doesn’t), and help shape the CLI that agents and humans both reach for first.

Neo works in GitHub pull requests and Slack channels

This release: 2 features (new capabilities). AI-tallied from the release notes.

This week, Pulumi Neo started working in two more places: GitHub and Slack. The agent that already runs Pulumi tasks from the Cloud console and the terminal now participates in the threads where your team discusses changes.

Mention @pulumi-neo in a pull request or issue and Neo replies in the thread. Mention @Neo in a Slack channel and Neo starts a task, continuing the conversation as you reply.

Neo in GitHub

Mention @pulumi-neo in a pull request description, a top-level or inline review comment, or an issue. Neo sees the diff, the stacks linked to the repository, and their current state. Reviewers can ask Neo to walk through what a proposed change does, including resources that change in stacks the PR doesn’t touch directly. Responses land in the same thread, so the analysis becomes part of the review record and any follow-up stays with it.

Neo in Slack

Mention @Neo in any channel where Neo has been added, and Neo starts a task in the thread. The reply lands in the same thread, and follow-up messages continue the conversation there. The rest of the channel can see what was asked and what Neo found. Neo has the same capabilities here as in the Pulumi Cloud console or the terminal: check stack state, investigate failures, walk through what a change will do, or carry out actions the team has approved.

Integrations in action

A teammate posts in #platform-engineering: “API latency p95 has been climbing for two days, nobody can figure out why.” You reply:

You: @Neo check the production API stack. Anything change in the last 72 hours?

Neo starts a task in the thread, walks the stack history, and finds a configuration change to the load balancer’s idle-timeout setting that landed Friday afternoon. It posts the change, who deployed it, and when. The rest of the channel sees the finding without you having to retell it.

You: @Neo open a PR to revert idle-timeout to the previous value.

Neo edits the stack’s Pulumi program, runs pulumi preview to confirm the change touches only the load balancer, and opens a pull request with the diff and the preview output. A reviewer pulls it up:

Reviewer: @pulumi-neo what else does this change affect downstream?

Neo replies in the same review thread with the resources that change: the listener config and the target group health check. The reviewer reads, approves, and the change ships.

The investigation moved from Slack to GitHub, and both threads keep the record.

Permissions and governance

Whether the conversation starts in GitHub or Slack, Neo runs with the RBAC permissions of your Pulumi Cloud user. Stack-level controls, organization-level guardrails, and audit logging apply the same way they do for a task started from the console. Starting a conversation in a new place doesn’t grant Neo new permissions; it just changes where the conversation happens.

Try it out

Both integrations are available now for Neo-enabled organizations. The GitHub integration docs and Slack integration docs cover the one-time setup. From there, every engineer with a linked Pulumi Cloud identity can mention Neo from the threads they already work in.

Today’s launch is part of a bigger story. Read our launch-day piece on the agentic infrastructure era for the broader vision, the Neo CLI launch post for Neo’s new home in the terminal, and the Neo Integrations post for the MCP servers and cloud CLIs that ship with this release.

As always, we’d love to hear what you think — and if you have any suggestions for places we should put Neo next, file an issue in pulumi-cloud-requests.

Neo Automations ships scheduled tasks as pull requests

This release: 1 feature (new capabilities). AI-tallied from the release notes.

Recurring platform work slips: provider versions fall behind, drift accumulates between checks, and the quarterly audit keeps getting pushed back another month. Pulumi Neo can now run any task on a cadence you set, opening a pull request for each run.

Automations in action

Your platform team runs stacks across staging and production, and the AWS, GCP, and Kubernetes providers keep shipping new versions. Nobody has time to bump them stack by stack.

You write one automation:

Every Monday at 8 AM, check the infra/ project for stacks where the AWS, GCP, or Kubernetes provider is more than two minor versions behind. For each one, bump the out-of-date provider, run pulumi preview, and open a PR if the preview is clean.

Monday morning, Neo runs the prompt. It finds three stacks behind on the AWS provider, edits each program, runs preview, and opens a PR for each clean run. You review the PRs like you would any other dependency bump, merge them, and Neo runs again next Monday.
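What "more than two minor versions behind" means is left to the prompt and to Neo's judgment. One plausible reading is sketched below in Python. The helper names and version numbers are hypothetical, and treating any major-version gap as stale is an assumption of this sketch.

```python
def parse(version):
    """Split 'v7.2.0' or '7.2.0' into (major, minor) integers."""
    major, minor, *_ = version.lstrip("v").split(".")
    return int(major), int(minor)


def is_stale(current, latest, max_minor_gap=2):
    """Return True when `current` trails `latest` by more than the allowed gap."""
    cur_major, cur_minor = parse(current)
    new_major, new_minor = parse(latest)
    if new_major != cur_major:
        # Assumption: being a whole major version behind always counts as stale.
        return new_major > cur_major
    return new_minor - cur_minor > max_minor_gap


print(is_stale("6.50.0", "6.53.1"))  # True: three minor versions behind
print(is_stale("6.51.0", "6.53.1"))  # False: exactly two behind
```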

What automations are for

The launch includes four built-in templates: a provider freshness check, an encryption audit, a backup audit, and an activity digest. You can also skip the templates and write your own prompt.

Pick from hourly, daily, weekdays, or weekly cadences. Each automation gets its own page in the Automations tab, where you can edit the prompt, change the schedule, run it once on demand, or pause it.

Safe by default

Automations default to two settings that fit recurring work. Approval mode is auto, so a run doesn’t wait for human confirmation between steps. Permission mode is read-only, so a run can read state and propose changes through pull requests but can’t apply changes directly. You can override either default per automation.

How automations fit with the rest of Neo

A scheduled task uses the same context as an interactive Neo task. Custom Instructions at the organization and project level apply, so a scheduled run respects the same naming conventions, tagging policies, and architecture rules your team has written down.

MCP integrations and CLI integrations work in scheduled tasks the same way they work in interactive ones, so a weekly drift check can query AWS through the aws CLI, file Linear issues, and link related PagerDuty incidents. Scheduled tasks also run with the RBAC permissions of the user who scheduled them, checked at run time; if permissions change between scheduling and execution, the new permissions apply.

Try it out

Open Neo in Pulumi Cloud, switch to the Automations tab, and pick a template or write your own prompt. The automations docs cover the form, scheduling options, and per-automation overrides.

Today’s launch is part of a bigger story. Read our launch-day piece on the agentic infrastructure era for the broader vision, and the Neo Integrations post for the third-party tools and CLIs your automations can use.

As always, we’d love to hear what you think — and if you have any suggestions for automations that’d make Neo even better, file an issue in pulumi-cloud-requests.

Ewan Dawson is CTO of Compostable AI, where five engineers run an AI-native software factory: nineteen clients, custom AWS deployments, most of them shipped within a day of contract signing. This article is adapted from his recent Pulumi webinar, and covers rules in more depth than we had time for on stage.

For the past twenty years, I’ve viewed software development as a craft. The best engineers drew on decades of experience to get every function right.

But two years into the agentic AI revolution, I realised software is going to look more like a factory than a craft. The economics have changed. We can’t treat code as bespoke anymore. To scale, we have to think industrial — use the tools to ship more value with fewer engineers.

I joined Compostable AI soon after it was founded 2.5 years ago, and I built the engineering org AI-native from day one. The technology has come a long way since then, and so has my understanding of what AI-native actually means. Here are seven rules I keep coming back to.

1. Transform, don’t enhance

Going AI-native isn’t an upgrade to your existing process. If you treat AI as a way to hand your developers smarter tools, you leave most of the value on the table. You get the leverage by rebuilding how you write software — and the culture and processes around it.

I know that’s a tall order for a large, mature engineering org. My advice: start small. Pick one team or one business area and run it as a fully AI-native function. Take what you learn and roll it out from there. And do the political work early, especially with your Governance, Risk, and Compliance function. Get GRC on your side early. Otherwise AI becomes a compliance fight instead of a structural advantage.

Don’t bolt AI onto your existing workflow. Redesign the workflow around what agents can do.

Most of the leverage in this technology comes from rebuilding around it. The tool change is the small part.

2. Remove the problem, don’t solve it

Going AI-native flips which problems are hard and which are easy. The right move often isn’t to engineer a solution. It’s to reframe the problem so it goes away.

Here’s an example. With agents writing the code for multiple clients, blast radius wasn’t hypothetical. One bad agent run could trash a customer’s database, or leak one client’s data into another’s. Our instinct was to build a secure multi-tenant sandbox with guardrails, approvals, and rollback. But every version we tried still had agents loose in a shared environment, one bug away from exposing one customer’s data to another. So we removed the problem: every client gets two dedicated AWS accounts, one for production and one “digital twin” staging account. Agents iterate on staging until the work checks out. Only then does it ship to production. We have nineteen accounts now, one per client.

Managing nineteen AWS accounts with five engineers used to be an administrative nightmare. When code is cheap, infrastructure-as-code tools like AWS Control Tower and Pulumi make it the easier path.

Remove the problem before you try to solve it.

It’s cheaper to reframe the problem than to engineer your way through it.

3. Pick tools your agents can drive

Removing problems is the process side. The other side is tooling. If you want an automated factory, your tech stack has to be something agents can drive. This overlaps a lot with tools that have great developer experience. If a tool has a robust API plus a clean CLI, agents can drive it. If it’s heavy click-ops around a web UI, agents stop there.

We didn’t get there first try. Our first IaC tool worked fine when we had a couple of clients. As we added more, accounts drifted, deployments slowed, retries got complicated. We needed something built for where we were heading.

I went looking, and Pulumi fit. We express infrastructure as type-safe code (TypeScript, in our case, rather than HCL), and agents are good at writing it. Pair that with Pulumi Neo, pre-loaded with domain-specific Pulumi skills, and we ship infrastructure that follows best practices. As one of my colleagues put it: “The scary thing about Neo is it just seems to know everything about what we do.” Pulumi IaC plus Pulumi ESC for configuration beats stitching tools together. And TypeScript lets us build higher-level abstractions that keep the AWS account fleet tractable.

“I don’t actually care if it’s HCL or TypeScript, as long as my software development agents can write it. And they do a better job with TypeScript than HCL.”

Tools have to share your AI-native mindset. If they don’t integrate deeply, the human becomes the glue.

If part of your stack still requires a human to click through a web UI to provision an account, your agents stop there.

4. Don’t let one agent do everything

When I first started with agents, I reached for a god prompt: one massive system prompt meant to guide a single agent through the whole software lifecycle. It didn’t work. Agents struggle when you give them multiple goals. The writer is lenient on its own work — it won’t catch what it just shipped. You don’t want it reviewing the code, checking for security flaws, or hunting bugs.

We get better results from a constellation of specialized agents, each handling one part of the line. Pulumi Neo handles infrastructure. Alongside it sit agents specialized in:

  • Code implementation
  • Code review and testing
  • Security auditing
  • Internal standards compliance
  • Documentation updates

Tasks pass down the line. Clean code comes out the other end, with almost no human involved.

Don’t let any agent mark its own homework. Specialize by job.

Treat agents the way you’d treat a team. The one who writes the code shouldn’t be the one signing it off.

5. Measure human hours per unit of value

Once we had agents writing and agents reviewing, throughput went up — but the bottleneck moved past the PR. Engineering hours were still the most expensive thing in the building, so my core metric is human hours per unit of value produced. Minimize that.

That means hunting for every step that still goes through a person — especially the mid-pipeline steps between ideation and production. Automate the human touchpoints along that line, and the factory runs 24/7.

Pushing automation this hard also forces good engineering. A chaotic, undocumented process is impossible to automate. Good engineering is still good engineering, AI or not. Agents won’t fix a weak process.

Measure human hours per unit of value. Treat every one as a bottleneck to remove.

You can’t automate what you can’t describe. Every human in the pipeline marks a piece that hasn’t been described yet.

6. Design for convergence, not one-shot correctness

Even with the human touchpoints removed, the agents don’t ship right the first try. Once you embrace the factory pipeline, you stop needing them to. We design for convergence instead — a system that lands on the right answer through automated iteration.

The loop we run looks like this:

  1. Refinement: agents iterate on the Product Requirements Document until the problem is clear.
  2. Planning: agents draft multiple technical approaches, and evaluation agents pick the best one.
  3. Implementation: coding agents write the software.
  4. Review: specialized checking agents look for bugs, API misuse, and security flaws.

If the checkers find a problem, they hand it back to the implementation agent. The loop repeats until the tests pass and the agents agree on a clean PR. Once it converges, we merge and deploy to staging.

Two things have to be true. You need a way to evaluate the output. Without that, you don’t know when to stop. And the loop has to converge — each pass has to get closer. A checker that fails every PR for a different reason isn’t helping — it just keeps the work going in circles. The feedback has to narrow the search, not widen it.
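The loop and its two conditions can be sketched in a few lines of Python. This is a toy model, not the pipeline described in the talk: the function names, the stand-in agent and checkers, and the `max_passes` budget are all assumptions made for illustration.

```python
def converge(implement, checkers, max_passes=5):
    """Repeat implement -> review until no checker reports an issue.

    implement(feedback) returns a candidate; each checker(candidate)
    returns a list of issues (empty when satisfied).
    """
    feedback = []
    for attempt in range(1, max_passes + 1):
        candidate = implement(feedback)
        feedback = [issue for check in checkers for issue in check(candidate)]
        if not feedback:
            return candidate, attempt
    # A bounded budget is what lets us detect a loop that is not converging.
    raise RuntimeError(f"no convergence after {max_passes} passes: {feedback}")


def make_toy_agent():
    """Toy implementation agent: addresses every issue it was handed."""
    addressed = set()

    def implement(feedback):
        addressed.update(feedback)
        return frozenset(addressed)

    return implement


def requires(item):
    """Toy checker: reports `item` until the candidate includes it."""
    return lambda candidate: [] if item in candidate else [item]


candidate, passes = converge(make_toy_agent(), [requires("tests"), requires("input validation")])
print(passes)  # 2
```

A checker that raises a new complaint on every pass never returns an empty list, so the loop exhausts its budget and fails loudly instead of circling forever.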

Once it converges, the question moves on. How cheap can we make it? Lower the time to PR, reduce token count, drop the overall cost. The optimization never really ends.

Don’t aim for one-shot correctness. Design for convergence.

It doesn’t matter how many tries it takes, as long as the loop closes without a human in it. Get convergence first. The optimization comes after.

7. Run the factory in the cloud, not on a laptop

Even a converged factory has to live somewhere. Try running a fully automated factory on individual developers’ laptops, and it falls apart. Laptops are highly trusted machines. Put autonomous agents on them and your security posture drops, fast. And the factory has to run 24/7. Events come from elsewhere — PR comments, Slack threads, errors in test environments.

Cloud also kills configuration drift across a dozen developer machines. On laptops, the same prompts run against different model versions, and env vars sit half-set on half the machines. The thing you’re trying to optimize lives in different states across the team. Cloud isn’t just where the factory runs; it’s the only place a team can iterate on it together. Keep everything in one place: AWS, Pulumi Cloud, GitHub. The specific stack matters less than the principle of one place.

And the part that matters most: the factory keeps running, testing, and deploying long after we’ve closed our laptops and gone to sleep.

Build the factory somewhere you can work on it — not just somewhere it can run.

A factory scattered across laptops can’t be improved as a system. Cloud keeps it in one shape, 24/7, and lets the team iterate together.

Closing thought

I’ve shipped more code in the last two years than I did in the fifteen before that. Most of it in languages I couldn’t write by hand. And that’s after a stretch in leadership where I wrote almost none.

If you’re where I was two years ago: don’t ask how AI fits into what you already do. The factory is built one rule at a time, and it’s not a template — it’s the practice of finding where you’re taking advantage of the new economics and where you’re not, where your practices still need an update. The leverage is in finding these places and improving them.


Watch the original Pulumi webinar. Learn more about Compostable AI and Pulumi Neo.

Neo CLI brings AI infrastructure tasks to the terminal

This release: 1 feature (new capabilities). AI-tallied from the release notes.

Since launching Pulumi Neo, over 4,500 organizations have used it to delegate real infrastructure work: scaffolding, migrating, investigating, operationalizing, and more. Though that usage has come entirely through Pulumi Cloud, we know a large portion of Pulumi users live in the terminal, and increasingly that’s where AI tools run too. Now we’re bringing Neo there.

pulumi neo brings the same Neo experience you’ve had in Pulumi Cloud to your terminal. Running locally means there’s no separate branch to push, no credentials to provision, and no context to paste: Neo picks up the setup you already have.

What local execution unlocks

Neo inherits your setup when it runs locally. The CLIs you’ve authenticated, the environment variables and kubeconfigs you’ve configured, and the project you’re editing right now are all available without any setup on your part. That means Neo can run the same commands you would, against the same systems you have access to.

That makes pulumi neo a fit for paired, interactive sessions where you and Neo work through a problem together. For asynchronous, autonomous tasks you set up and come back to, Pulumi Cloud Neo is still the surface to reach for. Both reach the same Neo.

You can also hand tasks to Neo from other agent sessions. Simply ask your agent, such as Claude Code or Codex, to hand the task off to Neo, and the Neo handoff skill packages the current thread (goal, repo pointers, conversation summary) and starts a Neo task using pulumi neo under the hood. This works anywhere skills are supported, without leaving your current session.

What carries over

Local tools and context are what’s new. The full set of controls you have in Pulumi Cloud Neo applies in the terminal: approval modes (manual, balanced, auto) for tool calls, permission modes (default, read-only) for what Neo can change, and Plan Mode for research and planning before execution.

Integrations carry over too. The integration catalog (connectors to Atlassian, Datadog, Linear, PagerDuty, and others) works the same way from the terminal. Identity, RBAC, and audit all run through your pulumi login, the same way they do in the console. See the Pulumi Neo docs for details.

Get started

pulumi neo ships with the latest Pulumi CLI. To start a session:

  1. Authenticate to Pulumi Cloud with pulumi login.
  2. Run pulumi neo, or pass an initial prompt: pulumi neo "what's in this stack?".

pulumi neo is part of a broader launch on agentic infrastructure. See the pulumi neo command reference and the Pulumi Neo docs for details. 10 things you can do with Neo is a good starting point for tasks to try. The Pulumi Community Slack is the place for questions and feedback.

Pulumi Neo already understands your infrastructure: your code, your stacks, your state. Today we’re launching new capabilities that extend Neo’s reach in two directions: into the third-party systems your team uses to plan and observe, and out to the cloud CLIs that actually drive your infrastructure.

The first half is MCP integrations: connections to Atlassian, Datadog, Honeycomb, Linear, PagerDuty, and Supabase that show up as tools Neo can call during a task. The second half is CLI integrations: scopable access to aws, gcloud, az, and kubectl. Both are configured once at the org level and available to every Neo task in the organization.

Integrations in action

A PagerDuty alert just fired: RDS storage on payments-prod is at 90% and climbing. You want to know how fast, and whether you can buy yourself any runway before it fills.

You: Neo, RDS storage on payments-prod just paged at 90%. How fast is it growing, and what do we have configured?

Neo pulls the active incident from PagerDuty, decides on its own to check Datadog for the storage-utilization curve over the last 30 days, and runs aws rds describe-db-instances --db-instance-identifier payments-prod through your production-aws CLI integration (the name your org gave its production AWS credentials). The database has been growing about 5 GB a day. The instance has AllocatedStorage at 200 GB and MaxAllocatedStorage also at 200 GB, so storage autoscaling is effectively disabled. At current growth, the disk fills in three days.

You: Bump max allocated storage to 500. Open a PR.

Neo edits the payments stack’s Pulumi program to raise maxAllocatedStorage from 200 to 500 on the RDS instance, runs pulumi preview to confirm the change is scoped to that one resource, and opens a pull request with the diff, the preview output, and links to the PagerDuty incident and the Datadog graph. You review the PR and merge it. Pulumi applies the change, and Neo posts the resolution back to PagerDuty.

With three integrations and one conversation, the change is reviewed and shipped, and the alert is resolved a few minutes later.
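The runway estimate in this scenario is simple arithmetic: free space divided by daily growth. A sketch follows, with a hypothetical helper name. The used-space figure is an assumption derived from the numbers in the narrative (90% of 200 GB is 180 GB), not a value Neo reported.

```python
def days_until_full(allocated_gb, used_gb, growth_gb_per_day):
    """Estimate how many days remain before the volume is full."""
    if growth_gb_per_day <= 0:
        raise ValueError("growth must be positive to estimate a fill date")
    return (allocated_gb - used_gb) / growth_gb_per_day


# 90% of 200 GB is 180 GB used, growing about 5 GB a day.
print(days_until_full(200, 180, 5))  # 4.0
# With the ceiling raised to 500 GB at the same growth rate.
print(days_until_full(500, 180, 5))  # 64.0
```

At exactly 90% the estimate is four days, so the three-day figure in the scenario suggests usage had already climbed past 180 GB by the time of the check. Raising the ceiling to 500 GB extends the runway to roughly two months at the same growth rate.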

MCP integrations: context from your existing tools

The launch catalog covers six services that show up most often in infrastructure investigations: Atlassian for Jira issues and Confluence runbooks, Datadog for metrics and logs, Honeycomb for traces, Linear for issue tracking, PagerDuty for incidents and on-call schedules, and Supabase for managed database changes. Each connects Neo to a remote MCP server hosted by the provider, so the agent has access to the full set of tools the vendor chooses to expose.

Integrations can be enabled by organization administrators on the Neo Settings page. Once configured, they’re available to every Neo task in your organization.

CLI integrations: live cloud insights

CLI integrations cover what MCP doesn’t reach: live cloud insights. With AWS, GCP, Azure, or Kubernetes connected, Neo can check live database utilization, look up the current state of a running service, verify a service quota before scaling, or reach into resources that aren’t managed by any Pulumi stack.

An admin enables a CLI integration the same way as an MCP one, from your org’s Neo settings. Each integration gets a name your team chooses, like production-aws or staging-gcloud, and tasks reference that name to tell Neo which environment to reach into. You can connect multiple instances of the same CLI (for example, production-aws and staging-aws) so Neo can investigate staging without touching production. Credentials are backed by Pulumi ESC environments your org owns; the CLI integrations docs walk through setup.

Per-task control and failure handling

Both surfaces default to org-wide availability, with per-task overrides. Before starting a task, you can toggle individual MCP integrations off. The toggles only affect that task; the org-level configuration is unchanged.

Failures behave the same way for both. If an integration can’t be reached, Neo logs a warning, skips it, and continues with the rest. A single broken integration doesn’t stop a task. CLI integration connect and disconnect events go to your organization’s audit log, and Neo’s individual CLI calls appear in the task transcript alongside its other tool calls.
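The skip-and-continue behavior can be sketched as follows. This illustrates the pattern only and is not Neo's implementation; the connector callables, the function name `connect_all`, and the use of `ConnectionError` as the failure signal are assumptions.

```python
import logging

logger = logging.getLogger("integrations")


def connect_all(connectors):
    """Connect to each integration, skipping any that cannot be reached."""
    available = {}
    for name, connect in connectors.items():
        try:
            available[name] = connect()
        except ConnectionError as exc:
            # Log a warning and carry on; one failure must not stop the task.
            logger.warning("skipping integration %s: %s", name, exc)
    return available


def unreachable():
    raise ConnectionError("server did not respond")


# Hypothetical connectors: each returns a client object or raises ConnectionError.
connectors = {
    "pagerduty": lambda: "pagerduty-client",
    "datadog": unreachable,
    "production-aws": lambda: "aws-cli-session",
}

available = connect_all(connectors)
print(sorted(available))  # ['pagerduty', 'production-aws']
```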

Try it out

Both MCP and CLI integrations are available now for Neo-enabled organizations. Open your org’s Neo settings, connect the MCP server or CLI of your choice, and let Neo do the next investigation against the tools you already use. The MCP integrations docs and CLI integrations docs walk through credential setup for each one, and the Neo integrations hub ties it all together.

Today’s launch is part of a bigger story. Read our launch-day piece on the agentic infrastructure era for the broader vision, and the Neo CLI launch post for Neo’s new home in the terminal.

As always, we’d love to hear what you think — and if you have any suggestions for integrations that’d make Neo even better, file an issue in pulumi-cloud-requests.

Deployment and Insights commands ship; npx support added

This release: 9 features (new capabilities), 3 fixes (bug fixes). AI-tallied from the release notes.
Pulumi · v3.242.0

3.242.0 (2026-05-19)

Features
  • [cli] Add a pulumi package for npx support

  • [cli] Add the pulumi org member edit command #23235

  • [cli] Add the pulumi org member remove command #23237

  • [cli/cloud] Add pulumi deployment get to retrieve details for a specific deployment #23238

  • [cli/cloud] Add pulumi insights account scan get <account> <scan-id> to show the full workflow run for a single Insights scan #23255

  • [cli/cloud] Add pulumi insights account scan list <account> to discover recent scan IDs to feed into pulumi insights account scan log #23255

  • [cli/deployment] Add dedicated flags for each deployment setting #23236

  • [cli/do] Add the start of pulumi do #23176

  • [cli/neo] Add --print/-p to pulumi neo to run a single prompt non-interactively and print the agent's final response to stdout #23245

Bug Fixes
  • [cli/cloud] Fix pulumi insights account scan log --all to follow the server's pagination cursor through the end of the log, and render --job/--step mode as structured lines instead of an empty raw-string blob #23256

  • [sdk] Close gzip.Writer in archiveTarGZIP to produce valid tar.gz output #23240

  • [sdkgen/python] Fix usage of ArgsDict types in typed dictionaries #23253

Last fall, after launching Pulumi Neo, we wrote up 10 things you could do with it. In the months that followed, as platform teams handed Neo more real work, we watched and listened, shipping a steady stream of features like plan mode, read-only mode, AGENTS.md, an integration catalog, cross-cloud migration, and task sharing. With today’s release, Neo extends beyond the Pulumi Cloud console into the Pulumi CLI, GitHub, and Slack.

So here are 10 more things you can do with Neo.

1. Deploy your app to AWS without writing IaC

Hand Neo a repo and a target cloud. Neo picks the right services, writes the Pulumi, and opens a PR.

The cloud infrastructure part of getting a new service running, especially one in a new language, is always a few hours of boilerplate: a VPC and subnets, an IAM role, security groups, a load balancer, DNS, and a TLS cert.

With Neo, that work collapses into a prompt. Point Neo at a repo and ask:

Deploy this app to AWS as a publicly accessible service.

Plan mode comes back with the resources Neo will create, named and sized: ECS Fargate, an ALB, and the VPC wiring. Approve, and Neo writes the Pulumi program, runs a preview, and opens a PR. You, the human in the loop, merge it after review.

Neo planning a PR and deploying an app to AWS.

[Start a Neo task: Ask Neo to deploy your app to AWS and make a PR](https://app.pulumi.com/neo?prompt=I%27d+like+to+deploy+this+app+to+AWS.+Confirm+what+you%27ll+create.)

2. Diagnose a slow API from metrics, logs, and code

Slow endpoints live at the seam between runtime metrics and the stack that runs them. Neo reads both and proposes a fix with the metric evidence as the rationale.

Production incidents often involve multiple tools. When the checkout endpoint’s p95 climbs from 200ms to 1.2s, the metric is in Datadog, but the cause might be somewhere in your AWS account: maybe RDS is out of IOPS, maybe the connection pool is too small, maybe the autoscaler isn’t keeping up. Connecting “this metric looks bad” to a recent backend change and then to a one-line fix in your Pulumi program is an exercise in detective work.

Neo’s integration catalog bridges this gap. With built-in Datadog, PagerDuty, and Honeycomb integrations sitting alongside your Pulumi state, Neo can read traces and metrics from the tools your team already uses and take action.

Ask Neo:

Find the scaling bottleneck on /checkout from the last 7 days of metrics and propose a fix.

Neo pulls the metric history, matches the Datadog tag db.cluster=checkout-rds to the RDS instance in your prod-checkout Pulumi stack, and opens a PR with a Pulumi diff that bumps the storage IOPS and raises the connection-pool ceiling. You review and roll out the fix.

Toggle on the Honeycomb integration so Neo can read traces and metrics alongside your Pulumi stacks.

3. Triage a PagerDuty alert from Slack

A page comes in. You paste it into your on-call channel and tag Neo, and Neo replies with the cross-system view you’d otherwise spend the first 20 minutes assembling.

On-call triage is often about getting up to speed quickly. You get paged because something is in the red, and you don’t know why.

You mention Neo in the on-call Slack channel:

@neo, what’s going on with this alert?

Neo starts querying metrics and traces. With PagerDuty and Datadog in the integration catalog, it correlates the alert with every deploy and stack change tagged with the alert’s service in the last hour, and finds the change that lines up:

Two deploys in the last hour touched services tagged service:checkout: checkout-api@a3f9c2 (12 min ago, app-layer deploy) and Pulumi stack prod-checkout-rds (45 min ago, decreased max_connections from 200 → 100). p99 inflection at 14:03 lines up with the stack change. Likely cause: the connection-pool reduction is starving the API under current load.

You ask a couple of clarifying questions in-thread, then ask Neo to open a rollback PR against the Pulumi stack.
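One plausible way to picture the correlation in Neo's reply is the heuristic below: among changes tagged with the alert's service inside the lookback window, pick the most recent one that happened at or before the metric inflection. This is an assumed heuristic for illustration, not a description of how Neo works; the `Change` type and `likelyCause` function are hypothetical.

```typescript
// Hypothetical record of a deploy or stack change; field names are illustrative.
interface Change {
  id: string;
  service: string;
  at: Date;
}

// Return the tagged change closest to (and not after) the metric inflection,
// considering only changes inside the lookback window before the alert.
function likelyCause(
  changes: Change[],
  service: string,
  alertAt: Date,
  inflectionAt: Date,
  windowMinutes: number = 60,
): Change | undefined {
  const windowStart = alertAt.getTime() - windowMinutes * 60000;
  return changes
    .filter((c) => c.service === service)
    .filter((c) => c.at.getTime() >= windowStart && c.at.getTime() <= inflectionAt.getTime())
    .sort((a, b) => b.at.getTime() - a.at.getTime())[0];
}
```

In the scenario above, the app-layer deploy lands after the inflection and is filtered out, which leaves the stack change as the candidate.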

Authorize PagerDuty and Datadog in Neo's settings. Neo can then read alerts in your on-call Slack channel, find the change that correlates, and open a PR when you ask.

4. Implement a Linear ticket end-to-end

Hand Neo a ticket number from Linear, Jira, or GitHub Issues. Neo reads the description and acceptance criteria, plans against your stack, and opens a PR.

Tickets often pile up not because they’re unimportant, but because they’re not urgent. Ongoing maintenance quietly accumulates. Bumping a provider version, centralizing secret management, working through small policy violations: each one matters, but none of them ever moves to the top of the queue. Explaining each one to an agent is its own overhead.

The fix is letting Neo read the ticket itself. Connect Linear or Jira through the integration catalog (GitHub Issues works too), and Neo pulls the ticket the same way an engineer would: title, description, acceptance criteria.

Ask Neo:

Implement CAD-1234 in our payments stack.

Neo reads the ticket, plans against your existing stack, opens a PR, and drops a comment back on the ticket. The ticket and the PR end up linked, and your backlog shrinks.

Neo running locally in the Pulumi CLI: fielding a Linear issue, analyzing the codebase, and producing a PR that upgrades multiple projects to the latest Pulumi and AWS provider versions.

[Start a Neo task: Implement a Linear ticket end-to-end](https://app.pulumi.com/neo?prompt=I%27d+like+to+implement+a+ticket+from+Linear+%28or+Jira%2C+or+GitHub+Issues%29.+Ask+me+for+the+ticket+number.)

5. Tighten over-privileged IAM roles

Neo audits each role against what your stack code actually does, and proposes scoped policies that improve your security posture.

IAM cleanup is the kind of work nobody has the time to prioritize. Production has 40 roles. Half of them started with s3:* because nobody had time to scope them, and the cleanup slips quarter to quarter.

Ask Neo:

Audit IAM permissions across my accounts and propose narrower policies for over-privileged stack-managed roles.

Neo cross-references each role’s policy against what the stack code actually calls, and opens a PR per role. The PR body lists the API calls Neo found in the stack code, like s3:GetObject on audit-logs-* and s3:PutObject on audit-logs-staging, as the justification for the scoped policy. The evidence sits next to the diff.
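As a rough sketch of what a scoped policy derived from observed calls might look like, the function below turns a list of action and bucket pairs into an IAM policy document. The `ObservedCall` type and `scopedS3Policy` function are hypothetical and only handle object-level S3 actions; a real PR would be written against your stack's own resources.

```typescript
// Minimal IAM policy types, just enough for this sketch.
interface Statement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}
interface PolicyDocument {
  Version: "2012-10-17";
  Statement: Statement[];
}

// One observed call: an IAM action and the bucket name (or pattern) it touched.
interface ObservedCall {
  action: string;
  bucket: string;
}

// Build a policy that allows only the observed object-level calls,
// grouped into one statement per bucket.
function scopedS3Policy(calls: ObservedCall[]): PolicyDocument {
  const actionsByBucket: Record<string, string[]> = {};
  for (const call of calls) {
    const actions = actionsByBucket[call.bucket] || (actionsByBucket[call.bucket] = []);
    if (actions.indexOf(call.action) === -1) actions.push(call.action);
  }
  const statements: Statement[] = Object.keys(actionsByBucket).map((bucket) => ({
    Effect: "Allow" as const,
    Action: actionsByBucket[bucket].slice().sort(),
    Resource: [`arn:aws:s3:::${bucket}/*`],
  }));
  return { Version: "2012-10-17", Statement: statements };
}
```

Fed the two calls from the example PR body, this yields two statements in place of a single `s3:*` grant.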

If you’re unclear about which roles count as in-scope or what your team considers over-privileged, start in plan mode and agree on that with Neo first.

Neo auditing an over-privileged IAM role and proposing a narrower policy, with the actually-used permissions as evidence.

[Start a Neo task: Audit IAM and tighten over-privileged roles](https://app.pulumi.com/neo?prompt=Audit+IAM+permissions+across+my+accounts+and+propose+narrower+policies+for+over-privileged+stack-managed+roles.)

6. Migrate from AWS CDK onto your platform’s golden paths

Neo reads your existing CDK app and lands a PR that swaps AWS’s defaults for your team’s published components.

CDK’s L2 constructs encode AWS’s defaults. s3.Bucket with encryption: BucketEncryption.S3_MANAGED is a sane choice, but it’s AWS’s idea of sane, not yours. A platform team that’s published its own components to the Pulumi Private Registry has already decided what your bucket defaults look like: encryption with the right KMS key, tagging by cost center.

Ask Neo:

Migrate the payments-vpc CDK stack to Pulumi using our published components.[1]

Neo reads the source CDK app and your registry side by side. It maps each CDK construct to its closest team-published equivalent, clarifying with you where the mapping is ambiguous.

    // Before (AWS CDK, AWS's defaults)
    const bucket = new s3.Bucket(this, "Assets", {
      bucketName: "payments-assets",
      encryption: s3.BucketEncryption.S3_MANAGED,
      versioned: true,
    });

    // After (Pulumi, your team's published component)
    import * as platform from "@payments/platform";

    const bucket = new platform.Bucket("assets", {
      bucketName: "payments-assets",
      classification: "internal",
    });

[Start a Neo task: Migrate CDK onto your golden paths](https://app.pulumi.com/neo?prompt=I%27d+like+to+migrate+this+CDK+stack+to+Pulumi.+Use+our+published+components+where+you+can.)

7. Migrate a service to Kubernetes from a runbook

Once the migration pattern is written down, the next service to move is a prompt away.

Containerizing an app and moving it to Kubernetes involves several small decisions: which base image, what labels go on deployments, how ingress is wired, and how secrets reach the pod. But after a team has moved two or three services, the pattern is set. The decisions get written down in a runbook, and every subsequent migration is mostly the same shape.

Ask Neo:

Containerize the billing-api service and write its Kubernetes manifests, following our K8s migration runbook in Confluence.

Neo reads the source repo and the runbook in Confluence via the integration catalog and starts working on your request.

You can save this as a Neo skill that splits the work into multiple PRs — Dockerfile first, ECR config next, Deployment/Service/Ingress manifests after — and link back to each runbook convention for ease of review. The output reflects your conventions: the labels you actually use, the ingress class you’ve standardized on, and the External Secrets Operator config your team prefers.

You’re still the one reviewing the PRs and deciding what the cutover looks like in production. Neo follows your internal standards, so the new service ends up shaped like the last one you migrated.

Neo migrating a VM-based service to Kubernetes step by step, following the team's Confluence runbook.

Once you’ve delegated something a few times, the next move is to automate it. The remaining three tasks are the kind Neo doesn’t need to be asked for. Drift, deps, compliance: they’re the operations you put on a schedule.

8. Schedule daily drift checks across your cloud infrastructure

Schedule a daily drift check across your cloud. Wake up to PRs that fix what changed overnight.

Configuration drift is an ongoing challenge. The security team rotated an IAM role at 04:47 UTC. Someone changed a security group in the AWS console three weeks ago. Left alone, drift turns into security gaps, into compliance issues, and into the kind of “wait, who changed that?” confusion nobody wants to chase down.

Pulumi Cloud is already good at drift detection. Neo takes it a step further.

Ask Neo:

Every morning at 6 AM, check all production infrastructure for drift and create PRs to fix any issues you find.

From then on, the task runs on its own, and you wake up to a PR per drifted resource. The description spells out what happened (iam_role.audit-reader had inline policy AllowReadAuditLogs added at 04:47 UTC) and cites the section of infra/runbooks/drift.md Neo followed.

Some drift gets encoded into the Pulumi program, like the IAM rotation above. Some gets reverted, like the security group rule added from the console. Some gets ignored entirely, like autoscaler-managed Lambda concurrency reservations the runbook tells Neo to skip. You write the runbook once; Neo follows it every morning to decide what to do.

Neo's morning drift PR. The body names the resource, the change, when it happened, and the section of the runbook Neo followed to decide what to do.

[Start a Neo task: Schedule a daily drift check](https://app.pulumi.com/neo?prompt=Every+morning+at+6+AM%2C+check+all+production+infrastructure+for+drift+and+create+PRs+to+fix+any+issues+you+find.)

9. Schedule weekly upgrades for outdated providers and runtimes

Lambda runtimes and container base images age out. Schedule the upgrade pass; review the PRs Neo opens.

AWS Lambda end-of-life notices come out months ahead. Node 20 stopped receiving runtime updates at the end of April. Python 3.9 ended last December. After the deadline, AWS first blocks creating new functions on the runtime and later blocks updates to existing ones, while the functions themselves keep running unpatched. Each one needs to move to a supported runtime before the cutoff.[2]

Schedule it:

Every Sunday night at 10 PM, check our Lambdas for runtimes nearing end-of-support and open PRs to upgrade them.

Neo reads the AWS Lambda runtime deprecation page, matches the end-of-support runtimes against every Lambda in your stacks, and opens one PR per stack.

If Python 3.9 is reaching end-of-support, the upgrade is to Python 3.12, where datetime.utcnow() is deprecated, so those calls should move to the timezone-aware datetime.now(timezone.utc). Neo can make all of those replacements in the same PR.
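The matching step can be sketched as a small planning function: given an inventory of functions and a table of upgrade targets, group the needed upgrades by stack so that each stack gets one PR. The `LambdaFn` type, the `planUpgrades` function, and the inventory shape are hypothetical. Only the python3.9 to python3.12 target comes from the text above; the Node target is an assumption.

```typescript
// Hypothetical inventory entry; in practice this would come from your stacks.
interface LambdaFn {
  stack: string;
  name: string;
  runtime: string; // Lambda runtime identifier, e.g. "python3.9"
}

// Upgrade targets. python3.9 -> python3.12 is named in the text;
// nodejs20.x -> nodejs22.x is an assumption for illustration.
const upgradeTargets: Record<string, string> = {
  "python3.9": "python3.12",
  "nodejs20.x": "nodejs22.x",
};

// Group the functions that need an upgrade by stack: one PR per stack.
function planUpgrades(fns: LambdaFn[]): Record<string, string[]> {
  const plan: Record<string, string[]> = {};
  for (const fn of fns) {
    const target = upgradeTargets[fn.runtime];
    if (target === undefined) continue; // runtime is still supported
    (plan[fn.stack] = plan[fn.stack] || []).push(`${fn.name}: ${fn.runtime} -> ${target}`);
  }
  return plan;
}
```

Functions on supported runtimes are left out of the plan, so a stack with nothing to upgrade produces no PR.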

The same task can catch container base images with critical CVEs and bump them too.

Setting up a weekly task in the Scheduled Tasks UI. Once saved, Neo runs the prompt every Sunday night and opens PRs you review on Monday.

[Start a Neo task: Schedule a weekly runtime upgrade check](https://app.pulumi.com/neo?prompt=Every+Sunday+night+at+10+PM%2C+check+our+Lambdas+for+runtimes+nearing+end-of-support+and+open+PRs+to+upgrade+them.)

10. Fix CIS Benchmark failures with daily PRs

Run the benchmark on a schedule. Wake up to PRs that fix what failed.

The CIS AWS Foundations Benchmark, available through AWS Security Hub, is something every team should be keeping an eye on. The benchmark finds issues like S3 buckets that allow public read access (S3.1), root user access keys that shouldn’t exist (IAM.4), or CloudTrail not being enabled (CloudTrail.1). Scanning for these issues is a solved problem, but closing and addressing them is not. They pile up between audits because each one is a code change in a different stack, and nobody owns the cross-stack cleanup.[3]

Schedule the cleanup:

Every morning, read CIS Benchmark failures from Security Hub. For every failure on an IaC-managed resource, open a PR with the fix.

Neo opens one PR per failure. A bucket failing S3.1 arrives as a Pulumi diff that adds blockPublicAccess to the bucket in your prod-checkout stack. The PR body lists the CIS rule number, the resource ID, the diff, and a clean pulumi preview against the live infrastructure.

The runbook is where your security team writes down what each control means for your stacks. Block public S3 buckets, except the ones tagged public-content=true for CloudFront origins. Don’t auto-touch the break-glass IAM roles; page a human instead. Multi-region CloudTrail stays on, no exceptions. Neo reads that file, checks each Security Hub finding against it, and only opens a PR for the ones you’ve said are safe to fix. The rest get routed or ignored, the way your team already handles them.
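The example runbook rules above can be read as a small triage function. The sketch below is one hypothetical encoding of them: the `Finding` shape is a simplification rather than Security Hub's real payload, and the `break-glass` tag is an assumed convention for marking those roles.

```typescript
type Decision = "open-pr" | "page-human" | "skip";

// Hypothetical finding shape; Security Hub's real payload has many more fields.
interface Finding {
  control: string; // e.g. "S3.1"
  resourceId: string;
  tags: Record<string, string>;
}

// Encodes the example runbook rules from the paragraph above.
function triage(f: Finding): Decision {
  // Public buckets that serve as CloudFront origins are an approved exception.
  if (f.control === "S3.1" && f.tags["public-content"] === "true") return "skip";
  // Break-glass roles are never auto-fixed; the tag name is an assumption.
  if (f.control.indexOf("IAM.") === 0 && f.tags["break-glass"] === "true") return "page-human";
  // Everything else, including CloudTrail findings, is safe to fix with a PR.
  return "open-pr";
}
```

Only findings that come back as `open-pr` turn into pull requests; the others are routed or ignored as the runbook says.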

A PR raised by Neo to fix a CIS Benchmark failure, with the failing rule, the resource, and the runbook decision laid out in the body.

[Start a Neo task: Schedule a daily compliance scan](https://app.pulumi.com/neo?prompt=Every+morning%2C+verify+all+resources+meet+our+compliance+policies+and+create+PRs+to+fix+violations.)

Neo: your newest platform engineer

Over the past year, many product teams have stopped treating AI as a request-by-request assistant and started delegating to it outright. Agents open pull requests, investigate issues, and iterate on review feedback.

But platform engineers have held back because a bad infrastructure change doesn’t just fail, it can take production down. Coding agents benefit from fast, forgiving feedback loops, but infrastructure recovery is rarely as simple as reverting a commit.

What was missing wasn’t the appetite. It was an agent with enough organizational context and grounding to plan reliably, enough guardrails to feel safe and contain mistakes, and enough discipline to keep working without being asked.

The theme across these tasks is clear. A thing platform engineers used to keep in their heads becomes a task you delegate, then becomes work that runs without you. Neo isn’t generating infrastructure from a template. It’s a teammate who knows your code, your providers, your conventions, your production metrics, and can raise PRs for you to review.

Neo now lives in your terminal, in your pull requests, in your Slack workspace, and in Pulumi Cloud. Pick one of these workflows and give it a try.


  1. The observant reader will notice Terraform-to-Pulumi was covered in the original post.

  2. Also covered in the original post. Last year you could ask Neo to do it once. This year you can put it on a schedule.

  3. Also covered in the original post. Last year Neo could remediate violations on demand. This year Security Hub feeds findings to a scheduled task that knows your runbook’s interpretation of each control.

CLI redesigned for guessability; Pulumi Cloud fully accessible from terminal

This release: 1 feature (new capabilities), 1 enhancement (improvements to existing features). AI-tallied from the release notes.

AI agents do a lot of their work through CLIs. They’re easier to call than HTTP APIs and they produce predictable output. Over the last few months our own CLI traffic has shifted from mostly people typing commands to people and agents running commands together, often in the same session.

Today we’re shipping a release built for both. The Pulumi CLI is reorganized around three ideas: the right command should be the one you can guess, anything you can do in Pulumi Cloud should also be doable from the terminal, and what comes back should be just as readable to an agent as it is to a person.

Designing for guessability

The bar we set was that both developers and coding agents should be able to guess at the right command for a particular task: pulumi env edit to modify an environment, pulumi stack get to see what’s going on with a stack, pulumi org member list to see who’s on the team. If we had to explain which command did what, the usability bar hadn’t been met.

Branches in the tree are now singular nouns like stack, env, org, and deployment. Leaves are now verbs from a canonical vocabulary — list, get, set, new, edit, remove — and they mean the same thing wherever they’re used. edit always means modify an existing thing. Wherever the old vocabulary differed, though, the old name still works: ls, rm, update, and open are all aliased to preserve backward compatibility.
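To make the alias idea concrete, here is a minimal sketch of verb resolution. It is not how the Pulumi CLI implements aliases; `resolveVerb` is a hypothetical helper, and only the two pairs that follow directly from the vocabulary (`ls` for `list`, `rm` for `remove`) are included, since the text doesn't say which canonical verbs `update` and `open` map to.

```typescript
// The canonical verb vocabulary listed above.
const canonicalVerbs = ["list", "get", "set", "new", "edit", "remove"];

// Inferred alias pairs; `update` and `open` are left out because their
// canonical targets aren't stated in the text.
const aliases: Record<string, string> = { ls: "list", rm: "remove" };

// Resolve a verb typed by a person or an agent to its canonical form,
// or return undefined if it isn't part of the vocabulary.
function resolveVerb(verb: string): string | undefined {
  const resolved = Object.prototype.hasOwnProperty.call(aliases, verb) ? aliases[verb] : verb;
  return canonicalVerbs.indexOf(resolved) !== -1 ? resolved : undefined;
}
```

Because the same table applies under every noun, an alias learned under `stack` carries over to `env`, `org`, and `deployment`.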

For the most part, product names have also been replaced with familiar nouns. Users (human or otherwise) don’t think in product names; they think in terms of resources, stacks, environments. For example, take Pulumi ESC: the product may be named ESC (and for a while the command was too), but nobody thinks I need to initialize a new ESC — they think I need to create a new environment. The command is therefore pulumi env new, with esc init preserved as an alias to avoid disrupting anyone’s existing workflows.

    $ pulumi env new my-project my-env
    Environment created.

All of Pulumi Cloud in the terminal

Up to now, most of what you could do with Pulumi Cloud had to be done either in the browser or through direct API calls. Things like reviewing deployments, setting up webhooks, finding non-compliant resources, or managing deployment settings all required you to break out curl and hit the API docs or open a browser and navigate the Pulumi Cloud console.

That changes today. Pulumi Cloud is now fully accessible from the command line through the pulumi CLI, with consistently named nouns and verbs aligned to what you’d expect:

  • pulumi stack get returns a complete stack overview, metadata, resource list, and more:

    $ pulumi stack get \
        --stack cnunciato/chris.nunciato.org/production \
        --output json | jq -r ".resources[].type" | grep "aws:s3"

    aws:s3:BucketEventSubscription
    aws:s3/bucket:Bucket
    aws:s3/bucket:Bucket
    aws:s3/bucketPublicAccessBlock:BucketPublicAccessBlock
    aws:s3/bucketWebsiteConfiguration:BucketWebsiteConfiguration
    aws:s3/bucketOwnershipControls:BucketOwnershipControls
    aws:s3/bucketNotification:BucketNotification

    … with other stack-related commands like pulumi stack history get events, pulumi stack drift list, pulumi stack schedule new, and pulumi stack webhook new alongside it.

  • Organizational commands like pulumi org member list, pulumi org role list, pulumi org usage get, and pulumi org audit-log export can help you dig into the details when you need to as well.

  • Deployment-related commands like pulumi deployment list, get, log, and cancel let you see what’s running, dive into what happened, and take action without having to leave the terminal.

    $ pulumi deployment list \
        --stack cnunciato/chris.nunciato.org/production \
        --output table

    ┌──────────────────────────────────────┬───────────┬─────────┬───────────┬──────────────┬─────────────────────────┐
    │ ID                                   │ OPERATION │ VERSION │ STATUS    │ INITIATED BY │ MODIFIED                │
    ├──────────────────────────────────────┼───────────┼─────────┼───────────┼──────────────┼─────────────────────────┤
    │ 83e44b8c-643c-4e9f-9f36-0c6a81d9db2e │ update    │ 140     │ running   │ cnunciato    │ 2026-05-17 21:26:37.340 │
    │ 52a37cbe-b7fd-4027-8e0f-7b4785ab12e8 │ update    │ 139     │ succeeded │ cnunciato    │ 2026-05-16 23:36:07.999 │
    │ 94e04525-b3a4-42b5-9987-e344018a3324 │ preview   │ 138     │ succeeded │ cnunciato    │ 2026-05-16 23:29:19.709 │
    └──────────────────────────────────────┴───────────┴─────────┴───────────┴──────────────┴─────────────────────────┘

  • And when you need to query across managed (and even unmanaged) resources, pulumi insights resource search and get can help you find what you’re looking for quickly:

    $ pulumi insights resource search \
        --query 'type:aws:s3/bucket:Bucket org:cnunciato project:photomap stack:dev' \
        --output table

    ┌──────────────────────────────────────────────────────────────────────────┬──────────────────────┬───────┬──────────────────────────┐
    │ URN                                                                      │ TYPE                 │ STACK │ MODIFIED                 │
    ├──────────────────────────────────────────────────────────────────────────┼──────────────────────┼───────┼──────────────────────────┤
    │ urn:pulumi:dev::photomap::aws:apigateway:x:API$aws:s3/bucket:Bucket::api │ aws:s3/bucket:Bucket │ dev   │ 2020-10-31T00:39:47.926Z │
    │ urn:pulumi:dev::photomap::aws:s3/bucket:Bucket::images                   │ aws:s3/bucket:Bucket │ dev   │ 2020-10-31T00:39:47.926Z │
    └──────────────────────────────────────────────────────────────────────────┴──────────────────────┴───────┴──────────────────────────┘

    Showing 2 of 2 resources.

Flags and output formats are consistent across commands (--output table, json), as are the shapes of cross-cutting features like webhooks. If you’ve used pulumi stack webhook, for example, you already know how to use pulumi env webhook and pulumi org webhook, and so on.
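For scripts and agents, the JSON output can be consumed directly instead of piped through jq and grep. The sketch below mirrors the `pulumi stack get` pipeline shown earlier; the `StackOutput` shape is an assumption inferred from the jq filter `.resources[].type`, and `resourceTypes` is a hypothetical helper.

```typescript
// Assumed minimal shape of `pulumi stack get --output json`, inferred from
// the jq filter `.resources[].type`; all other fields are omitted.
interface StackOutput {
  resources: { type: string }[];
}

// Return the resource types that contain the given substring,
// like `jq -r ".resources[].type" | grep <needle>`.
function resourceTypes(json: string, needle: string): string[] {
  const stack = JSON.parse(json) as StackOutput;
  return stack.resources
    .map((r) => r.type)
    .filter((t) => t.indexOf(needle) !== -1);
}
```

Given the captured output as a string, calling `resourceTypes(output, "aws:s3")` returns the same lines the shell pipeline prints.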

Direct access to the Pulumi Cloud API

For any features of Pulumi Cloud that don’t yet have their own commands, you’ve also got pulumi api. It’s a gh api-inspired command designed to give you direct access to the full REST API, without having to manage separate access tokens, auth settings, or request/response payloads. Everything is handled for you through your authenticated pulumi CLI.

There’s even pulumi api list, which enumerates every single endpoint that’s exposed:

    $ pulumi api list

    ┌───────────────┬────────┬───────────────────────────────────────┬──────────────────────────────┐
    │ TAG           │ METHOD │ PATH                                  │ SUMMARY                      │
    ├───────────────┼────────┼───────────────────────────────────────┼──────────────────────────────┤
    │ AccessTokens  │ GET    │ /api/orgs/{orgName}/tokens            │ ListOrgTokens                │
    │ AccessTokens  │ POST   │ /api/orgs/{orgName}/tokens            │ CreateOrgToken               │
    │ AccessTokens  │ DELETE │ /api/orgs/{orgName}/tokens/{tokenId}  │ DeleteOrgToken               │
    │ AccessTokens  │ GET    │ /api/user/tokens                      │ ListPersonalTokens           │
    │ AccessTokens  │ POST   │ /api/user/tokens                      │ CreatePersonalToken          │
    │ AccessTokens  │ DELETE │ /api/user/tokens/{tokenId}            │ DeletePersonalToken          │
    ...

    537 operations. Pass --output=json for a stable, scriptable contract.

To get the details about a particular API, use pulumi api describe:

    $ pulumi api describe 'DELETE /api/user/tokens/{tokenId}' # or DeletePersonalToken

    DELETE /api/user/tokens/{tokenId}
    Tag: AccessTokens
    Operation: DeletePersonalToken

    DeletePersonalToken

    Permanently deletes a personal access token by its identifier. The token is immediately
    invalidated and can no longer be used for authentication. Returns 204 on success or 404
    if the token does not exist.

    Parameters:
      [path] tokenId* (string) — The access token identifier

All requests are made through your authenticated pulumi CLI:

```shell
$ pulumi login
Logged in to pulumi.com as cnunciato.

$ pulumi whoami
cnunciato

$ pulumi api /api/user/tokens/2cf15c7d-afad-458f-ace0-fc7ff0512b10 \
 --method DELETE && echo "Token deleted."
Token deleted.
```

Newly published endpoints are available through pulumi api immediately, so you don’t have to wait for a new CLI release before you can start using them. See the Pulumi Cloud REST API documentation to learn more.

Finding templates in the Pulumi Cloud Registry

Finding out which templates are available to you through your Pulumi organization used to mean navigating to the Pulumi Cloud Registry and searching there. The new pulumi template commands let you ask for what’s available right from the shell, either by fetching the full list or by filtering with the --name or --search flags:

```shell
$ pulumi template list --search "container typescript" --org cnunciato

┌─────────────────────────────────────────────┬────────┬────────────┬────────────┐
│ Name                                        │ Source │ Language   │ Visibility │
├─────────────────────────────────────────────┼────────┼────────────┼────────────┤
│ pulumi/templates/container-aws-typescript   │ github │ typescript │ public     │
│ pulumi/templates/container-azure-typescript │ github │ typescript │ public     │
│ pulumi/templates/container-gcp-typescript   │ github │ typescript │ public     │
└─────────────────────────────────────────────┴────────┴────────────┴────────────┘
```

This is especially useful when you’re working with an agent because it helps the agent discover your org’s approved templates without having to name them. Start with a prompt that tells the agent what you want to build, and let the agent find the right template for you.

Agent-friendly Markdown docs for providers and components

Both humans and agents need to be able to understand what’s inside a Pulumi package before they can use it. And while the Registry is an excellent resource for that, it was mainly designed to deliver HTML — a human-friendly format that agents can certainly use, but that’s much more verbose than they actually need.

With pulumi api, agents can fetch the details about a package directly from the Registry and get them back as either Markdown or JSON, whichever works best, filtering on properties like language where applicable:

```shell
$ pulumi api "/api/registry/packages/pulumi/pulumi/random/versions/4.19.1"

{
  "name": "random",
  "publisher": "pulumi",
  "publisherDisplayName": "Pulumi",
  "source": "pulumi",
  "version": "4.19.1",
  "description": "A Pulumi package to safely use randomness in Pulumi programs.",
  "repoUrl": "https://github.com/pulumi/pulumi-random",
  ...
}
```

```shell
$ pulumi api "/api/registry/packages/pulumi/pulumi/random/versions/4.19.1/docs/random%3Aindex%2FrandomPassword%3ARandomPassword" \
 --output markdown

# RandomPassword

resource `random:index/randomPassword:RandomPassword`

## Example Usage

package main
...
```

Resources are individually addressable using their URL-encoded Pulumi type tokens — e.g., random:index/randomPassword:RandomPassword — and API endpoints are configured to deliver Markdown when agents ask for it:

```shell
$ curl "https://api.pulumi.com/api/registry/packages/pulumi/pulumi/random/versions/latest/readme?lang=python" \
 -H "Accept: text/markdown"

# Installation

The Random provider is available as a package in all Pulumi languages:
...
```
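The percent-encoded form of a type token, as used in the /docs/ request path earlier, can be produced with a few lines of shell. This is a minimal sketch that only handles the two reserved characters appearing in the example token (`:` and `/`); the helper name `encode_token` is made up for illustration, and the path layout is copied from the example above rather than from the API reference.

```shell
# Hypothetical helper: percent-encode the ':' and '/' in a Pulumi type token.
encode_token() {
  printf '%s\n' "$1" | sed -e 's/:/%3A/g' -e 's|/|%2F|g'
}

token='random:index/randomPassword:RandomPassword'
encoded="$(encode_token "$token")"

# Build the docs path in the same shape as the example request above.
echo "/api/registry/packages/pulumi/pulumi/random/versions/4.19.1/docs/${encoded}"
```

The resulting path can be passed to pulumi api as in the earlier example. A general-purpose URL encoder would be the safer choice for tokens containing other reserved characters.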

Even compared to JSON (which is itself a significant improvement over HTML), Markdown is a much more token-efficient format for agents to work with:

| Package | Endpoint | JSON | Markdown | Tokens saved |
|---|---|---|---|---|
| random | /readme | 10.68 KB | 6.04 KB | 43% |
| aws | /readme | 4.22 KB | 2.54 KB | 40% |
| aws | /nav?depth=full | 204 KB | 170 KB | 17% |
| aws | /docs/{resource token} | 15.24 KB | 11.28 KB | 26% |
| azure-native | /docs/{resource token} | 48.13 KB | 30.37 KB | 37% |
| aws | /docs/{function token} | 2.40 KB | 1.46 KB | 39% |
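The savings figures are consistent with a simple ratio of the two payload sizes. The sketch below assumes the percentage is computed as (JSON - Markdown) / JSON from the sizes listed, which treats payload size as a proxy for token count; the `savings` helper is illustrative only.

```shell
# Percentage saved when moving from a JSON payload to a Markdown payload.
# Arguments: JSON size and Markdown size, in the same unit (KB here).
savings() {
  awk -v json="$1" -v md="$2" 'BEGIN { printf "%.0f%%\n", (json - md) / json * 100 }'
}

savings 10.68 6.04   # random /readme          -> 43%
savings 204 170      # aws /nav?depth=full     -> 17%
```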

Learn more about our Registry endpoints in the REST API docs. (Or just ask your agent!)

New to the CLI: Pulumi Neo

When we launched Pulumi Neo last year, the only way to use it was in the Pulumi Cloud Console. But while there’s a ton you can do with Neo in the browser, if you’re an engineer already living in the terminal, chances are that eventually you’re going to wish you had Neo right in the CLI along with you.

Now you do. Running pulumi neo with or without a prompt launches a Pulumi Cloud-connected session that gives Neo access to your local environment just like any other coding agent. Use it on its own to scaffold a new project, understand an existing codebase, or debug a failing deployment — or pull it into an active session with the coding agent you’re already using. Either way, it stays in the shell you’re already working in.

We’ll cover Neo in the CLI in more detail later this week.

Smaller changes that add up

A long list of smaller changes also runs through this release:

  • The core loop now speaks JSON end to end, with pulumi up, pulumi destroy, and pulumi import all emitting structured JSON output when called with --output json.

  • Streams now behave the way scripts expect them to, with data on stdout, progress and diagnostics on stderr.

  • Exit codes are more consistent across the board. Every failure mode — auth, resource, policy, missing stack, cancellation, timeout, and others — has its own exit code, so agents can branch on the actual cause instead of having to interpret output. The full table is in the docs.

  • Help text explains why a command exists, not just what it does, and includes at least one concrete example. Examples in --help are one of the most effective ways to improve LLM accuracy on first-try invocations — and it turns out they’re pretty handy for humans, too.
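The stream and exit-code behavior described in the list above lends itself to a small scripting pattern: capture stdout as data, divert stderr to a log, and branch on the exit status. The sketch below uses a stand-in function, `fake_cli`, instead of a real pulumi command so that it runs anywhere; the exit code 3 and the JSON payload are placeholders, not values from Pulumi's exit-code table.

```shell
# Stand-in for a CLI that writes data to stdout, progress to stderr, and
# reports the failure cause through its exit code. Swap in a real command.
fake_cli() {
  echo 'updating...' >&2
  echo '{"result":"failed"}'
  return 3   # placeholder value, not a documented Pulumi exit code
}

log="$(mktemp)"

# Capture stdout only; progress and diagnostics go to the log file.
output="$(fake_cli 2>"$log")"
code=$?

# Branch on the exit code instead of interpreting the output text.
case "$code" in
  0) echo "success" ;;
  3) echo "handling the failure mapped to code 3" ;;
  *) echo "unexpected exit code: $code" ;;
esac
```

With a real command, the `case` arms would use the codes listed in the docs, and `$output` could be handed to a JSON parser without progress text mixed in.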

A sneak peek at a new command

Later this week, you’ll get a closer look at pulumi do, a new top-level command that enables direct resource operations like create, read, update, delete, and list across every Pulumi-supported cloud provider and resource, all in one command. A simple example:

```shell
$ pulumi do aws getAvailabilityZones

{
  "groupNames": [
    "us-west-2-zg-1"
  ],
  "id": "us-west-2",
  "names": [
    "us-west-2a",
    "us-west-2b",
    "us-west-2c",
    "us-west-2d"
  ],
  "region": "us-west-2",
  "zoneIds": [
    "usw2-az2",
    "usw2-az1",
    "usw2-az3",
    "usw2-az4"
  ]
}
```

It might look like that’s calling the AWS CLI, but it’s not — it’s using the same AWS provider function a full Pulumi program would use, only without the program, and invoked directly from the CLI.

More on how it works, and what you can do with it, in the days ahead.

Try it yourself

Much of what makes a developer tool worth using is in the details, and this release is mostly details: improvements across the whole CLI, made with both humans and agents in mind.

We’d love for you to grab the latest release and give it a try. Tell us what’s now easy, what’s still hard, and what to fix next on GitHub or in the community Slack. The fastest way the CLI gets better is feedback from the humans and agents who live in it.

JSON output for preview/refresh/destroy; pulumi neo no longer experimental

This release: 48 features (new capabilities) and 4 fixes (bug fixes), AI-tallied from the release notes.
Pulumi · v3.241.0

3.241.0 (2026-05-18)

Features
  • [cli] Add --output json to pulumi preview for a structured JSON summary of the operation result #22927

  • [cli] Add --output json to pulumi refresh for a structured JSON summary of the operation result #22928

  • [cli] Add --output json to pulumi destroy for a structured JSON summary of the operation result #22875

  • [cli] Add pulumi stack schedule get to retrieve the configuration of a scheduled action #23153

  • [cli] Add pulumi stack schedule list to list all scheduled actions configured for a stack #23153

  • [cli] Add pulumi stack schedule new to create a raw, drift, or TTL deployment schedule for a stack #23153

  • [cli] Add pulumi stack webhook delivery list to list recent deliveries for a stack webhook #23116

  • [cli] Add pulumi stack webhook delivery redeliver to redeliver a webhook event #23118

  • [cli] Add pulumi stack webhook edit to update an existing stack webhook #23139

  • [cli] Add pulumi stack webhook new to create a new stack webhook #23101

  • [cli] Add pulumi stack webhook remove to delete a stack webhook #23102

  • [cli] Add pulumi org webhook delivery list to list recent deliveries for an organization webhook #23179

  • [cli] Add pulumi org webhook edit to update an organization webhook #23179

  • [cli] Add pulumi org webhook list to list all webhooks configured for an organization #23174

  • [cli] Add pulumi org webhook new to create a new organization webhook #23172

  • [cli] Add pulumi org webhook ping to send a test ping to an organization webhook #23179

  • [cli] Add pulumi org webhook remove to delete an organization webhook #23177

  • [cli] Add pulumi stack drift list to list drift detection runs for a stack #23159

  • [cli] Add pulumi stack drift status to show the drift detection status for a stack #23161

  • [cli] Add pulumi stack schedule edit to update an existing scheduled deployment action #23153

  • [cli] Add pulumi stack schedule remove to delete a scheduled deployment action #23153

  • [cli] Add pulumi audit-log export command #23212

  • [cli] Add pulumi org audit-log list #23211

  • [cli] Add pulumi policy compliance list to list compliance results grouped by entity #23209

  • [cli] Add pulumi policy group new command to allow creating new policy groups #23202

  • [cli] Add the pulumi policy issue get command #23200

  • [cli/cloud] Add pulumi insights account list to list Insights accounts in an organization #23091

  • [cli/cloud] Add pulumi insights account new to create a Pulumi Insights account #23093

  • [cli/cloud] Add pulumi insights account scan log to fetch log output for a Pulumi Insights scan #23092

  • [cli/cloud] Add pulumi insights account scan to trigger a resource discovery scan for an Insights account #23094

  • [cli/cloud] Implement pulumi stack history events to retrieve engine events for a Pulumi Cloud update #23109

  • [cli/cloud] Add pulumi deployment cancel to terminate an in-progress deployment #23164

  • [cli/cloud] Add pulumi org member list to list the members of an organization #23170

  • [cli/cloud] Add pulumi org usage get to fetch the resources-under-management summary for an organization #23166

  • [cli/cloud] Add pulumi org role assign to assign a custom role to a team #23117

  • [cli/cloud] Add pulumi org role edit to update a custom role's name, description, or permission tree #23117

  • [cli/cloud] Add pulumi org role list to list custom roles for an organization #23117

  • [cli/cloud] Add pulumi org role new to create a custom role from a permission descriptor JSON file #23117

  • [cli/cloud] Add pulumi org role remove to delete a custom role from an organization #23117

  • [cli/env] Update esc to v0.24.0 #23213

  • [cli/neo] Show tool call arguments and results in pulumi neo via a ctrl+o overlay #23075

  • [cli/neo] Multi-line input in pulumi neo - Enter sends, Shift+Enter / Alt+Enter / Ctrl+J / trailing \ insert a newline #23151

  • [cli/neo] Make pulumi neo visible by default; the PULUMI_EXPERIMENTAL gate has been removed #23228

  • [cli/policy] Add pulumi policy group edit to edit policy groups #23206

  • [cli/policy] Add pulumi policy group remove command to remove policies #23208

  • [cli/policy] Add the pulumi policy group get command #23203

Bug Fixes
  • [cli] Add pulumi policy issue list command #23198

  • [cli] Fix pulumi org role list to send the uxPurpose query parameter the service requires #23231

  • [engine] Fix pulumi import dropping map entries whose value matched the enum member of a Union<Input<Enum<T>>, ...> element type #23190

  • [cli/cloud] Surface a clear error when pulumi api --all is used against an endpoint whose response is not paginatable, instead of silently emitting an empty array #23191 #23128

  • [cli/cloud] pulumi deployment settings edit no longer clears fields that the patch does not mention #23217

  • [cli/import] Generate PCL for asset and archive inputs when importing resources, instead of returning a "NYI" error #22938

  • [cli/import] Preserve asset/archive/resource-reference values inside map and array inputs, and HCL-escape map keys containing ${ or %{ template sequences #23222

  • [cli/install] Hint at adding a .git / _git when VCS URL resolution fails #22831

  • [cli/neo] Transparently reconnect the Neo event stream after a transient network drop, resuming from the last seen event ID #23134

  • [cli/neo] Show preparing in the live preview/up block until the first resource arrives #23155

  • [cli/new] Support specific versions when using registry-backed templates with pulumi new #22909

Tracking since Oct 1, 2024