Pulumi Insights account scanning now supports every AWS partition. If your workloads run in GovCloud, China, the European Sovereign Cloud, or one of the ISO intelligence-community clouds, you can get the same resource discovery, cross-account search, and AI-assisted insights that commercial accounts already have.
- AWS Standard (Commercial)
- AWS GovCloud (US)
- AWS ISO (US)
- AWS ISOB (US)
- AWS ISOF (US)
- AWS ISOE (Europe)
- AWS European Sovereign Cloud
- AWS China
You can also exclude specific regions from discovery — useful when regions are disabled by SCPs or fall outside an audit’s scope.
Credentials are exchanged against the partition’s STS endpoint, and every scanner API call targets that partition’s regional endpoints. Discovery traffic doesn’t cross the boundary.
In the Pulumi Cloud console:
1. Go to Accounts → Create account.
2. Select AWS as the provider.
3. Under Add your configuration, pick the target partition.
4. Supply credentials via a Pulumi ESC environment. The OIDC trust policy uses the partition-appropriate ARN prefix (arn:aws-us-gov:, arn:aws-cn:, etc.).
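To make the partition-specific prefix concrete, here is a sketch of what a GovCloud trust policy for Pulumi's OIDC provider might look like. The account ID and audience value are placeholders (not values from this post), and your exact statement may differ, so treat the Insights accounts docs as authoritative:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws-us-gov:iam::123456789012:oidc-provider/api.pulumi.com/oidc"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": { "api.pulumi.com/oidc:aud": "my-org" }
      }
    }
  ]
}
```

In a commercial account the same policy would use the arn:aws: prefix; in China, arn:aws-cn:.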
For IAM and ESC setup, see the Insights accounts docs. Log in to Pulumi Cloud to get started.
Three community frameworks have emerged that fix the specific ways AI coding agents break down on real projects. Superpowers enforces test-driven development. GSD prevents context rot. GSTACK adds role-based governance. All three started with Claude Code but now work across Cursor, Codex, Windsurf, Gemini CLI, and more.
Pulumi uses general-purpose programming languages to define infrastructure: TypeScript, Python, Go, C#, Java. Every framework that makes AI agents write better TypeScript also makes your next pulumi up better. After spending a few weeks with each one, I have opinions about when to use which.
AI coding agents are impressive for the first 30 minutes. Then things go sideways. The patterns are predictable enough that three separate teams independently built frameworks to fix them.
Context rot. Every LLM has a context window. As that window fills up, earlier instructions fade. You start a session asking for an S3 bucket with AES-256 encryption, proper ACLs, and access logging. Two hours and 200K tokens later, the agent creates a new bucket with none of those requirements. The context window got crowded and your original instructions lost weight.
No test discipline. Agents write code that looks plausible. Plausible code compiles. Plausible code even runs, for a while. But plausible code without tests is a liability. The agent adds a feature and quietly breaks two others because nothing verified the existing behavior was preserved.
Scope drift. You ask for a VPC with three subnets. The agent decides you also need a NAT gateway, a transit gateway, a VPN endpoint, and a custom DNS resolver. Helpful in theory. In practice, you now have infrastructure you never requested and barely understand. You will also pay for it monthly.
These problems are not specific to Claude Code or any particular agent. They happen with Cursor, Codex, Windsurf, and every other LLM-powered coding tool. The context window does not care which brand name is on the wrapper.
Superpowers was created by Jesse Vincent and has accumulated over 149K GitHub stars. The core idea is simple: no production code gets written without a failing test first.
The framework enforces a 7-phase workflow. Brainstorm the approach. Write a spec. Create a plan. Write failing tests (TDD). Spin up subagents to implement. Review. Finalize. Every phase has gates. You cannot skip ahead. The iron law is that production code only exists to make a failing test pass.
This sounds rigid. It is. That is the point.
Superpowers includes a Visual Companion for design decisions, which helps when you are making architectural choices that need visual reasoning. The main orchestrator manages the entire workflow from a single context window, delegating implementation work to subagents that run in isolation.
The tradeoff is that the mega-orchestrator pattern means the orchestrator itself can hit context limits on very long sessions. One big brain coordinating everything works well until the big brain fills up. For most projects, this is not an issue. For marathon sessions with dozens of files, keep it in mind.
The workflow breaks down into skills that trigger automatically:
| Skill | Phase | What it does |
| --- | --- | --- |
| brainstorming | Design | Refines rough ideas through Socratic questions, saves design doc |
| writing-plans | Planning | Breaks work into 2-5 minute tasks with exact file paths and code |
| test-driven-development | Implementation | RED-GREEN-REFACTOR: failing test first, minimal code, commit |
| subagent-driven-development | Implementation | Dispatches fresh subagent per task with two-stage review |
| requesting-code-review | Review | Reviews against plan, blocks progress on critical issues |
| finishing-a-development-branch | Finalize | Verifies tests pass, presents merge/PR/keep/discard options |
The results speak for themselves. The chardet maintainer used Superpowers to rewrite chardet v7.0.0 from scratch, achieving a 41x performance improvement. Not a 41% improvement. 41 times faster. That is what happens when every code change has to pass a test: the agent optimizes aggressively because it has a safety net.
Superpowers works with Claude Code, Cursor, Codex, OpenCode, GitHub Copilot CLI, and Gemini CLI.
GSD (Get Shit Done) was created by Lex Christopherson and has over 51K stars. Where Superpowers focuses on test discipline, GSD attacks the context window problem directly.
The key architectural decision: GSD does not use a single mega-orchestrator. Instead, it assigns a separate orchestrator to each phase of work. Each orchestrator stays under 50% of its context capacity. When a phase completes, the orchestrator writes its state to disk as Markdown files, then a fresh orchestrator picks up where the last one left off.
Think about why this matters. With a single orchestrator, your 200K token context window is a shared resource. Instructions from hour one compete with code from hour three. GSD sidesteps this entirely. Every phase starts with a full context budget because the previous phase’s orchestrator handed off cleanly and shut down.
The state files use XML-formatted instructions because (it turns out) LLMs parse structured XML more reliably than freeform Markdown. GSD also includes quality gates that detect schema drift and scope reduction. If the agent starts cutting corners or wandering from the plan, the gates catch it.
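The handoff pattern itself is easy to sketch. The following is my own illustration in TypeScript, not GSD's actual state format or code: a phase serializes its state to a Markdown file before shutting down, and a fresh orchestrator rehydrates from disk with a clean context.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Illustrative only: a minimal state-to-disk handoff, not GSD's real schema.
interface PhaseState {
  phase: string;
  decisions: string[]; // choices the next orchestrator must honor
  artifacts: string[]; // files produced by this phase
}

// Each phase writes its state as Markdown before shutting down.
function savePhaseState(dir: string, state: PhaseState): string {
  const file = path.join(dir, `${state.phase}.md`);
  const body = [
    `# Phase: ${state.phase}`,
    `## Decisions`,
    ...state.decisions.map(d => `- ${d}`),
    `## Artifacts`,
    ...state.artifacts.map(a => `- ${a}`),
  ].join("\n");
  fs.writeFileSync(file, body, "utf8");
  return file;
}

// A fresh orchestrator rehydrates from disk with a full context budget.
function loadPhaseState(file: string): PhaseState {
  const lines = fs.readFileSync(file, "utf8").split("\n");
  const phase = lines[0].replace("# Phase: ", "");
  const decisions: string[] = [];
  const artifacts: string[] = [];
  let section = "";
  for (const line of lines.slice(1)) {
    if (line.startsWith("## ")) section = line.slice(3);
    else if (line.startsWith("- ")) {
      (section === "Decisions" ? decisions : artifacts).push(line.slice(2));
    }
  }
  return { phase, decisions, artifacts };
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "gsd-"));
const file = savePhaseState(dir, {
  phase: "networking",
  decisions: ["three subnets", "no NAT gateway"],
  artifacts: ["vpc.ts"],
});
const restored = loadPhaseState(file);
```

The key property is that nothing survives the handoff except what was deliberately written down, which is exactly what keeps instructions from hour one from competing with code from hour three.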
GSD evolved from v1 (pure Markdown configuration) to v2 (TypeScript SDK), which tells you something about the level of engineering behind it. The v2 SDK gives you programmatic control over orchestration, not just static instruction files.
The tradeoff: GSD has more ceremony than the other two frameworks. For a quick script or a single-file change, the phase-based workflow is overkill. GSD earns its keep on projects that span multiple files, multiple sessions, or multiple days.
The core commands map to a phase-based workflow:
| Command | What it does |
| --- | --- |
| /gsd-new-project | Full initialization: questions, research, requirements, roadmap |
| /gsd-discuss-phase | Capture implementation decisions before planning starts |
| /gsd-plan-phase | Research, plan, and verify for a single phase |
| /gsd-execute-phase | Execute all plans in parallel waves, verify when complete |
| /gsd-verify-work | Manual user acceptance testing |
| /gsd-ship | Create PR from verified phase work with auto-generated body |
| /gsd-fast | Inline trivial tasks, skips planning entirely |
GSD supports the widest range of agents: 14 and counting. Claude Code, Cursor, Windsurf, Codex, Copilot, Gemini CLI, Cline, Augment, Trae, Qwen Code, and more.
GSTACK was created by Garry Tan (CEO of Y Combinator) and has over 71K stars. It takes a fundamentally different approach from the other two frameworks.
Instead of disciplining a single agent, GSTACK models a 23-person team. CEO, product manager, QA lead, engineer, designer, security reviewer. Each role has its own responsibilities, its own constraints, and its own slice of the problem.
The framework enforces five layers of constraint. Role focus keeps each specialist in their lane. Data flow controls what information passes between roles. Quality control gates ensure standards at handoff points. The “boil the lake” principle means each role finishes what it can do perfectly and skips what it cannot, rather than producing mediocre work across everything. And the simplicity layer pushes back against unnecessary complexity.
The role isolation is what makes GSTACK distinctive. The engineer role does not see the product roadmap. The QA role does not see the implementation details. Each role only receives the context it needs to do its job. This is not just about efficiency. It prevents the kind of scope creep where an agent that knows everything tries to do everything.
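The mechanics are straightforward to picture. Here is a hypothetical sketch of role-scoped context filtering (my illustration, not GSTACK's implementation): each role declares the slice of project state it is allowed to see.

```typescript
// Illustrative sketch of role-scoped context, not GSTACK's actual code.
interface ProjectContext {
  roadmap: string;
  implementation: string;
  testPlan: string;
}

// Each role declares which context keys it is allowed to see.
const roleScopes: Record<string, (keyof ProjectContext)[]> = {
  engineer: ["implementation", "testPlan"], // no roadmap: no scope creep
  qa: ["testPlan"],                         // no implementation details
  productManager: ["roadmap"],
};

function contextFor(role: string, ctx: ProjectContext): Partial<ProjectContext> {
  const scoped: Partial<ProjectContext> = {};
  for (const key of roleScopes[role] ?? []) {
    scoped[key] = ctx[key];
  }
  return scoped;
}

const ctx: ProjectContext = {
  roadmap: "Q3: multi-region",
  implementation: "vpc.ts, eks.ts",
  testPlan: "preview diff must be empty",
};
const engineerView = contextFor("engineer", ctx);
```

An agent that never receives the roadmap cannot decide to implement it, which is the point.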
“Boil the lake” is my favorite principle across all three frameworks. It is the opposite of how most agents work. Agents default to attempting everything and producing something mediocre. GSTACK says: do fewer things, but do them right.
The tradeoff: 23 specialist roles feels heavy for pure infrastructure work. If you are writing Pulumi programs and deploying cloud resources with component resources, you probably do not need a product manager role or a designer role. GSTACK shines when you are building a product, not just provisioning infrastructure.
Each slash command activates a different specialist:
| Command | Role | What it does |
| --- | --- | --- |
| /office-hours | YC partner | Six forcing questions that reframe your product before you write code |
| /plan-ceo-review | CEO | Four modes: expand scope, selective expand, hold, reduce |
| /plan-eng-review | Engineering manager | Lock architecture, map data flow, list edge cases |
| /review | Staff engineer | Find bugs that pass CI but break in production, auto-fix the obvious ones |
| /qa | QA lead | Real Playwright browser testing, not simulated |
| /ship | Release engineer | One-command deploy with coverage audit |
| /cso | Security officer | OWASP and STRIDE security audits |
GSTACK works with Claude Code, Codex CLI, OpenCode, Cursor, Factory Droid, Slate, and Kiro.
| | Superpowers | GSD | GSTACK |
| --- | --- | --- | --- |
| What it locks down | The dev process itself | The execution environment | Who decides what |
| Orchestration | Single orchestrator | Per-phase orchestrators | 23 specialist roles |
| Context management | One window | State-to-disk, fresh per phase | Role-scoped handoffs |
| Where it shines | TDD, subagent delegation, disciplined plan execution | Marathon sessions, parallel workstreams, crash recovery | Product strategy, multi-perspective review, real browser QA |
| Where it struggles | Anything beyond the build phase | Overkill for small tasks, no role separation | The actual writing-code part |
| Best for | Solo devs who need test discipline | Complex projects that span days or weeks | Founder-engineers shipping a product |
| GitHub stars | 149K | 51K | 71K |
| Agent support | 6 agents | 14+ agents | 7 agents |
For infrastructure work, GSD’s context management matters most. Long Pulumi sessions that provision dozens of resources across multiple stacks are exactly the scenario where context rot bites hardest. GSD’s phase-based approach keeps each orchestrator fresh.
Superpowers’ TDD workflow maps well to application code where unit tests are straightforward. Infrastructure testing is different. You cannot unit test whether an IAM policy actually grants the right permissions. You can test the shape of the policy with Pulumi’s testing frameworks, but the real validation happens at pulumi preview and pulumi up. Superpowers still helps here (discipline is discipline), but the TDD cycle is less natural for infra than for app code.
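Shape-testing is still cheap and useful, though. A minimal illustration in plain TypeScript (the policy document and allow-list here are made-up examples, and real validation still happens at pulumi preview and pulumi up): assert that a policy grants only expected actions and never a wildcard.

```typescript
// Illustrative shape test for a policy document. The document itself is a
// made-up example; this checks structure, not what the cloud will enforce.
const policy = {
  Version: "2012-10-17",
  Statement: [
    { Effect: "Allow", Action: ["s3:GetObject", "s3:PutObject"], Resource: "*" },
  ],
};

// The actions this role is expected to need.
const allowedActions = new Set(["s3:GetObject", "s3:PutObject", "s3:ListBucket"]);

// Collect every action the policy grants.
const granted = policy.Statement
  .filter(s => s.Effect === "Allow")
  .flatMap(s => s.Action);

// The "test": every granted action must be on the allow-list,
// and no action may use a wildcard.
const unexpected = granted.filter(a => !allowedActions.has(a) || a.includes("*"));
```

A check like this fails fast in the RED phase of the TDD loop, long before a preview runs.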
GSTACK shines when the project has product dimensions. If you are building a SaaS platform where the infrastructure serves a product vision, GSTACK’s multi-role governance keeps the product thinking connected to the engineering work. For pure infra provisioning, the extra roles add overhead without much benefit.
My honest take: none of these is universally best. Knowing your failure mode is the real decision.
| What keeps going wrong | Try this | The reason |
| --- | --- | --- |
| Code works today, breaks tomorrow | Superpowers | Forces every change through a failing test first |
| Quality drops after the first hour | GSD | Fresh context per phase, nothing carries over |
| You ship features nobody asked for | GSTACK | Product review before engineering starts |
| All of the above | GSTACK for direction, bolt on Superpowers TDD | No single framework covers everything yet |
These frameworks solve the “how” of agent orchestration. Skills (like the ones from Pulumi Agent Skills) solve the “what,” teaching agents the right patterns for specific technologies. Frameworks and skills complement each other. A skill tells the agent to use OIDC instead of hardcoded credentials. A framework makes sure the agent still remembers that instruction 200K tokens later.
GSD’s state-to-disk approach pairs naturally with Pulumi stack outputs. Each phase can read the previous phase’s stack outputs from the state files, so a networking phase can provision a VPC and the compute phase can reference the subnet IDs without any context window gymnastics.
Superpowers’ TDD cycle maps to infrastructure validation. Write a failing test (the expected shape of your infrastructure). Run pulumi preview (red, the resources do not exist yet). Run pulumi up (green, the infrastructure matches the test). This is not a perfect analogy since infrastructure tests are broader than unit tests, but the discipline of “verify before moving on” translates directly.
You do not have to pick one framework and commit forever. Try GSD for a long multi-stack project. Try Superpowers for a focused library. See which failure mode bites you most and let that guide your choice.
- [github.com/obra/superpowers](https://github.com/obra/superpowers)
- [github.com/gsd-build/get-shit-done](https://github.com/gsd-build/get-shit-done)
- [github.com/garrytan/gstack](https://github.com/garrytan/gstack)
All three frameworks support multiple agents. For Claude Code, the install commands are straightforward:
```shell
# Superpowers
/plugin install superpowers@claude-plugins-official

# GSD (the installer asks which agents and whether to install globally or locally)
npx get-shit-done-cc@latest

# GSTACK
git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup
```
Check each repository’s README for Cursor, Codex, Windsurf, and other agents.
If you want a managed experience that handles orchestration for you, Pulumi Neo is grounded in your actual infrastructure, not internet patterns. It understands your stacks, your dependencies, and your deployment history. The 10 things you can do with Neo post shows what that looks like in practice.
Pick one and give it a project. You will know within an hour whether it fixes your particular failure mode.
Last year we added support for Bun as a package manager for Pulumi TypeScript projects. Today we’re taking the next step: Bun is now a fully supported runtime for Pulumi programs. Set runtime: bun in your Pulumi.yaml and Bun will execute your entire Pulumi program, with no Node.js required. Since Bun’s 1.0 release, this has been one of our most requested features.
Bun is a JavaScript runtime designed as an all-in-one toolkit: runtime, package manager, bundler, and test runner. For Pulumi users, the most relevant advantages are:
- Native TypeScript support: Bun runs TypeScript directly without requiring ts-node or a separate compile step.
- Fast package management: Bun’s built-in package manager can install dependencies significantly faster than npm.
- Node.js compatibility: Bun aims for 100% Node.js compatibility, so the npm packages you already use with Pulumi should work out of the box.
With runtime: bun, Pulumi uses Bun for both running your program and managing your packages, giving you a streamlined single-tool experience.
To create a new Pulumi project with the Bun runtime, run:
```shell
pulumi new bun
```
This creates a TypeScript project configured to use Bun. The generated Pulumi.yaml looks like this:
```yaml
name: my-bun-project
runtime: bun
```
From here, write your Pulumi program as usual. For example, to create a random password using the @pulumi/random package:
```shell
bun add @pulumi/random
```

```typescript
import * as random from "@pulumi/random";

const password = new random.RandomPassword("password", {
    length: 20,
});

export const pw = password.result;
```
Then deploy with:
```shell
pulumi up
```
Prerequisites: Bun must be installed on your machine.
If you have an existing Pulumi TypeScript project running on Node.js, you can convert it to use the Bun runtime in a few steps.
Pulumi.yaml: Change the runtime field from nodejs to bun.

Before:

```yaml
runtime:
  name: nodejs
  options:
    packagemanager: npm
```

After:

```yaml
runtime: bun
```
When the runtime is set to bun, Bun is also used as the package manager — there’s no need to configure a separate packagemanager option.
tsconfig.json: Bun handles TypeScript differently from Node.js with ts-node. Update your tsconfig.json to use Bun’s recommended compiler options:
```json
{
  "compilerOptions": {
    "lib": ["ESNext"],
    "target": "ESNext",
    "module": "Preserve",
    "moduleDetection": "force",
    "moduleResolution": "bundler",
    "allowJs": true,
    "allowImportingTsExtensions": true,
    "verbatimModuleSyntax": true,
    "strict": true,
    "skipLibCheck": true,
    "noFallthroughCasesInSwitch": true,
    "noUncheckedIndexedAccess": true,
    "noImplicitOverride": true
  }
}
```
Key differences from a typical Node.js tsconfig.json:
module: "Preserve" and moduleResolution: "bundler": Let Bun handle module resolution instead of compiling to CommonJS. The bundler resolution strategy allows extensionless imports while still respecting package.json exports, matching how Bun resolves modules in practice.
verbatimModuleSyntax: true: Enforces consistent use of ESM import/export syntax. TypeScript will flag any remaining CommonJS patterns like require() at compile time.
Bun makes it easy to go full ESM and it’s the recommended module format for Bun projects. Add "type": "module" to your package.json:
```json
{
  "type": "module"
}
```
With ECMAScript module (ESM) syntax, one thing that gets easier is working with async code. In a CommonJS Pulumi program, if you need to await a data source or other async call before declaring resources, the program must be wrapped in an async entrypoint function. With ESM and Bun, top-level await just works, so you can skip the wrapper function entirely and await directly at the module level:
```typescript
import * as aws from "@pulumi/aws";

const azs = await aws.getAvailabilityZones({ state: "available" });
const buckets = azs.names.map(az => new aws.s3.BucketV2(`my-bucket-${az}`));

export const bucketNames = buckets.map(b => b.id);
```
If your existing program does use an async entrypoint with export =, just replace it with the ESM-standard export default:
```typescript
// CommonJS (Node.js default)
export = async () => {
    const bucket = new aws.s3.BucketV2("my-bucket");
    return { bucketName: bucket.id };
};
```

```typescript
// ESM (used with Bun)
export default async () => {
    const bucket = new aws.s3.BucketV2("my-bucket");
    return { bucketName: bucket.id };
};
```
Make sure you’re running @pulumi/pulumi version 3.226.0 or later:
```shell
bun add @pulumi/pulumi@latest
pulumi install
pulumi up
```
With this release, there are now two ways to use Bun with Pulumi:
| Configuration | Bun’s role | Node.js required? |
| --- | --- | --- |
| runtime: bun | Runs your program and manages packages | No |
| runtime: { name: nodejs, options: { packagemanager: bun } } | Manages packages only | Yes |
Use runtime: bun for the full Bun experience. The package-manager-only mode is still available for projects that need Node.js-specific features like function serialization.
The following Pulumi features are not currently supported when using the Bun runtime:
- Callback functions (magic lambdas) are not supported. APIs like aws.lambda.CallbackFunction and event handler shortcuts (e.g., bucket.onObjectCreated) use function serialization, which requires Node.js v8 and inspector modules that are only partially supported in Bun.
- Dynamic providers (pulumi.dynamic.Resource) are not supported, as they similarly rely on function serialization.
If your project uses any of these features, continue using runtime: nodejs. You can still benefit from Bun’s fast package management by setting packagemanager: bun in your runtime options.
Bun runtime support is available now in Pulumi 3.227.0. To get started:
- Create a new project: pulumi new bun
- Read the docs: TypeScript (Node.js) SDK
- Report issues or share feedback on GitHub or in the Pulumi Community Slack
Thank you to everyone who upvoted, commented on, and contributed to the original feature request. Your feedback helped shape this feature, and we’d love to hear how it works for you.
Microsoft Entra ID (formerly Azure Active Directory) is Azure’s identity and access management service. Any time your application needs to authenticate with Entra ID, you create an app registration and give it a client secret that proves its identity. But those secrets expire, and if you don’t rotate them in time, your app loses access.
If you or your team manages Azure app registrations, you know that keeping track of client secrets is a constant hassle. Forgetting to rotate them before they expire can lead to broken authentication and unexpected outages. With Pulumi ESC’s azure-app-secret rotator, you can automate client secret rotation for your Azure apps, so you never have to worry about expired credentials again.
- An Azure App Registration
- An azure-login environment
- The Application.ReadWrite.All Graph API permission, or the identity must be added as an Owner of the specific app registration whose secrets will be rotated

Note for OIDC users: Since Azure does not support wildcard subject matches, you will need to add a federated credential for the azure-login environment as well as each environment that imports it.

Let’s assume your azure-login environment looks like this:
```yaml
# my-org/logins/production
values:
  azure:
    login:
      fn::open::azure-login:
        clientId:
        tenantId:
        subscriptionId:
        oidc: true
```
Create a new environment for your rotator. If you have the existing credentials, set them in the state object so the rotator will treat them as the current credentials.
```yaml
# my-org/rotators/secret-rotator
values:
  appSecret:
    fn::rotate::azure-app-secret:
      inputs:
        login: ${environments.logins.production.azure.login}
        clientId:
        lifetimeInDays: 180 # How long each new secret is valid (max 730 days)
      state:
        current:
          secretId:
          secretValue:
            fn::secret:
```
The lifetimeInDays field controls how long each generated secret remains valid before it expires. Azure allows a maximum of 730 days (two years), but shorter lifetimes are recommended for better security. Make sure to set a rotation schedule that runs before the lifetime expires so your credentials are always fresh.
Azure app registrations can have at most two client secrets at any given time, so the rotator maintains a current and previous secret. When a rotation occurs, the existing current secret becomes the previous secret, and a new secret is created to take its place as the new current. This ensures a smooth rollover with no downtime, since the previous secret remains valid until the next rotation.
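The rollover is easiest to see as a pure state transition. A minimal sketch (my own illustration of the two-slot scheme described above, not ESC's internals):

```typescript
// Illustrative two-slot rollover: Azure allows at most two client secrets,
// so rotation shifts current -> previous and mints a fresh current.
interface Secret { secretId: string; secretValue: string; }
interface RotationState { current: Secret; previous?: Secret; }

function rotate(state: RotationState, fresh: Secret): RotationState {
  return {
    previous: state.current, // stays valid until the next rotation
    current: fresh,          // new secret issued for the app registration
  };
}

const s0: RotationState = { current: { secretId: "a", secretValue: "secret-a" } };
const s1 = rotate(s0, { secretId: "b", secretValue: "secret-b" });
const s2 = rotate(s1, { secretId: "c", secretValue: "secret-c" });
// After two rotations, "a" has been dropped: only the last two secrets exist.
```

Consumers that refreshed before the latest rotation are still holding the previous secret, which remains valid, so there is no window where authentication breaks.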
Once this is set up, you’re ready to go! You never need to worry about your client secrets expiring, and you will always have the latest credentials in your ESC Environment.
The fn::rotate::azure-app-secret rotator is available now in all Pulumi ESC environments. For more information, check out the fn::rotate::azure-app-secret documentation!
You can now run policy packs against your existing stack state without running your Pulumi program or making provider calls. The new pulumi policy analyze command evaluates your current infrastructure against local policy packs directly, turning policy validation into a fast, repeatable check.
Policy authoring and policy updates usually involve an iteration loop:
1. Make a policy change.
2. Run a policy check.
3. Inspect violations or remediations.
4. Repeat until the policy behavior matches intent.
Before this command, that loop often depended on pulumi preview or pulumi up, which can be heavier than you need when your goal is validating policy logic against known state.
With pulumi policy analyze, you can evaluate your current stack state directly and quickly.
At minimum, provide a policy pack path and optionally a stack:
```shell
pulumi policy analyze \
  --policy-pack ./policy-pack \
  --stack dev
```
You can also pass a config file for each policy pack:
pulumi policy analyze \
--policy-pack ./policy-pack \
--policy-pack-config ./policy-config.dev.json \
--stack dev
If any mandatory policy violations are found, the command exits non-zero.
If remediation policies fire, those changes are reported in output, but stack state is not modified.
For policy pack development, this command is useful as a tight local feedback loop:
1. Pick a representative stack (dev, staging, or a fixture stack).
2. Run pulumi policy analyze against that stack after each policy change.
3. Use the output to verify mandatory, advisory, and remediation behavior.
4. Repeat before publishing the policy pack or attaching it to broader policy groups.
Two output modes are especially useful:

- --diff for a concise, human-readable view while iterating locally.
- --json for structured output that can be consumed in scripts and CI.
This command is also a good primitive for AI-assisted policy workflows.
Because pulumi policy analyze can emit JSON and a clear process exit code, agents can use it for deterministic policy evaluation steps:
1. Propose or edit policy rules.
2. Run pulumi policy analyze --json against target stacks.
3. Parse violations and remediation signals.
4. Suggest policy fixes, config adjustments, or targeted infrastructure changes.
5. Re-run analysis until mandatory violations are resolved.
For example, an agent tasked with fixing a policy violation can run pulumi policy analyze --json to get a structured list of violations, identify which resources are non-compliant, generate targeted infrastructure changes, then re-run analysis to confirm the violations are resolved, all without triggering a full preview on each iteration. The same loop works for policy authoring: an agent can propose a new policy rule, test it against several representative stacks, and surface unintended violations before the rule is published.
This works well for automation because the command doesn’t execute your Pulumi program or make provider calls, so there are no side effects or runtime variance between runs. The JSON output and non-zero exit code on failure give agents a clear pass/fail contract to build on.
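As a sketch of what an agent-side consumer of that contract might look like, assuming a hypothetical shape for the JSON output (the field names here are invented; check the actual CLI output for the real schema):

```typescript
// Hypothetical shape for `pulumi policy analyze --json` output. The real
// schema may differ; the point is the pass/fail contract, not field names.
interface Violation {
  policyName: string;
  enforcementLevel: "mandatory" | "advisory";
  resourceUrn: string;
}

function mustFix(violations: Violation[]): Violation[] {
  // Only mandatory violations fail the run (non-zero exit code).
  return violations.filter(v => v.enforcementLevel === "mandatory");
}

// Sample parsed output an agent might work from.
const sample: Violation[] = [
  {
    policyName: "s3-no-public-read",
    enforcementLevel: "mandatory",
    resourceUrn: "urn:pulumi:dev::app::aws:s3/bucketV2:BucketV2::logs",
  },
  {
    policyName: "prefer-tags",
    enforcementLevel: "advisory",
    resourceUrn: "urn:pulumi:dev::app::aws:s3/bucketV2:BucketV2::logs",
  },
];
const blocking = mustFix(sample);
```

An agent loops until mustFix returns an empty list, then treats the advisory remainder as suggestions rather than blockers.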
pulumi policy analyze is available in Pulumi v3.229.0. Upgrade with:
```shell
brew upgrade pulumi
# or
pulumi self-update
```
If you are authoring or tuning policy packs, start by running this command against a known stack in your environment. It is a quick way to validate policy behavior before rollout.
For implementation details, see the merged PR: pulumi/pulumi#22250.
Amsterdam in late March still has that sharp North Sea wind, but inside the RAI Convention Centre, 13,350 people generated enough energy to heat the building twice over. KubeCon + CloudNativeCon EU 2026 was the biggest European edition yet, and the shift from previous years was impossible to miss. AI dominated the conference.
I spent most of the conference at the Pulumi booth, and that turned out to be the best vantage point. Hundreds of visitors stopped by over four days, and I kept asking the same question: what are you actually running in production with AI on Kubernetes? The answers shaped this post more than any keynote did. Almost everyone had a proof of concept. Almost nobody had a production story they were happy with.
Here is the stat that framed the entire conference for me: 66% of organizations use Kubernetes to host generative AI workloads, but only 7% deploy to production daily. That gap between experimentation and actual production use matched what I was hearing at the booth. The CNCF’s own survey now counts 19.9 million cloud native developers worldwide, 7.3 million of them building AI workloads. The tooling and the infrastructure need to catch up.
My takeaway after four days on the ground: lots of working demos, very few production setups people trust. Teams are trying to scale inference, put guardrails around agents, and make GPU infrastructure behave like anything else they run.
Here is what I saw.
About 67% of AI compute now goes to inference, not training. The inference market is projected to hit $255 billion by 2030. It’s also where most of the operational complexity lives.
NVIDIA leaned into this hard. Their open-source stack around NeMo and Dynamo got significant stage time, but the bigger move was donating three projects to the CNCF: the DRA driver for fractional GPU allocation, the KAI Scheduler for GPU-aware scheduling, and Grove. Moving these to community governance signals that GPU infra is becoming part of the standard Kubernetes toolkit.
Every KubeCon has its crop of new CNCF projects, but this year’s batch felt different. We are starting to see the building blocks of an AI runtime for Kubernetes.
llm-d was the headline donation. Created by IBM Research, Red Hat, and Google Cloud, it splits inference workloads by separating prefill and decode phases across different pods. The collaborator list reads like an industry consortium: NVIDIA, CoreWeave, AMD, Cisco, Hugging Face, Intel, Lambda, Mistral AI, UC Berkeley, and UChicago. When that many organizations agree on a single approach to distributed inference, pay attention.
NVIDIA’s DRA driver enables fractional GPU allocation and multi-node NVLink support. GPU multi-tenancy is one of the hardest unsolved problems in Kubernetes right now. Scheduling, isolation, cost attribution — all of it breaks down when multiple workloads share a GPU. The DRA driver does not solve everything, but it gives the community a real starting point.
KAI Scheduler entered the CNCF Sandbox for GPU-aware scheduling. If llm-d handles the inference runtime and the DRA driver handles allocation, KAI Scheduler handles placement. Together, these three projects form the skeleton of a GPU-native Kubernetes stack.
Velero, donated by Broadcom, moved into CNCF Sandbox for backup and restore. AI workloads are stateful now (model weights, checkpoints, fine-tuning data), and backup is no longer optional. Good timing.
Microsoft AI Runway is an open-source Kubernetes API for inference that plugs in Hugging Face model discovery, GPU memory fit calculations, and cost estimates. Think of it as a model-aware control plane. HolmesGPT and Dalec, also from Microsoft, entered CNCF Sandbox for AI-powered troubleshooting and dependency analysis.
The Kubernetes AI Conformance Program is growing fast, with certifications nearly doubled and three new requirements proposed for Kubernetes 1.36. Conformance programs are boring until they are not. This one will determine which distributions can credibly claim AI readiness.
If inference was this year’s production story, agentic AI was the architecture story. Agents are proliferating, and nobody has quite figured out how to manage and secure them inside Kubernetes yet.
kagent, donated to CNCF Sandbox by Solo.io, defines agents as Kubernetes CRDs. It ships with pre-built MCP (Model Context Protocol) servers for Kubernetes, Istio, Helm, Argo, Prometheus, Grafana, and Cilium. An agent becomes a first-class Kubernetes resource, schedulable and observable and subject to RBAC, instead of a rogue process running in someone’s notebook.
kagenti from IBM goes after the identity problem directly. Using SPIFFE/SPIRE, it gives agents cryptographic identities. When an agent calls an API, you can verify exactly which agent made the call, what trust domain it belongs to, and whether it is authorized. This kind of security work needs to happen before agents proliferate across production clusters. Retrofitting identity later is ugly.
Dapr Agents took a different angle with the actor model and durable execution. Each agent gets reliable state management and exactly-once messaging semantics. If your workflows cannot tolerate lost messages or duplicate actions, this matters.
agentregistry showed up as a centralized discovery service for MCP servers and agents. As agents and tool servers multiply, you need a registry to find and manage them, the same way container registries became necessary for images.
David Soria Parra from Anthropic gave a talk on MCP evolving beyond simple tool-calling into richer interaction patterns. Google announced the Kubernetes Agent Sandbox for running agentic AI workloads in secure, isolated environments.
Gateway infrastructure had its own mini-conference within KubeCon. The Gateway API Inference Extension from the Kubernetes SIG introduces model-aware routing and load balancing at the gateway level. Instead of routing by URL path, your gateway routes by model name, version, and capacity. That changes how inference traffic flows through a cluster in a fundamental way.
Envoy AI Gateway builds on Envoy’s existing proxy capabilities with token-aware rate limiting and provider failover. If your primary inference provider is saturated, traffic shifts to a secondary automatically. Rate limiting by token count rather than request count makes much more sense for LLM workloads, where a single request can consume vastly different amounts of compute.
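The idea behind token-count limiting is easy to see in a sketch (illustrative Python, not Envoy AI Gateway's implementation):

```python
class TokenBudget:
    """Toy rate limiter that charges by LLM tokens consumed, not by request count."""

    def __init__(self, tokens_per_minute: int):
        self.budget = tokens_per_minute

    def allow(self, estimated_tokens: int) -> bool:
        # A single large completion can cost more than hundreds of small ones,
        # so the budget is debited per token rather than per request.
        if estimated_tokens <= self.budget:
            self.budget -= estimated_tokens
            return True
        return False

limiter = TokenBudget(tokens_per_minute=10_000)
limiter.allow(2_000)   # True  -- small completion fits the remaining budget
limiter.allow(9_000)   # False -- would exceed the 8,000 tokens left
```

A request-count limiter would have admitted both calls; a token-count limiter correctly rejects the second, which is the property that matters for LLM traffic.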
I want to call out Agentgateway specifically. Written in Rust, it proxies LLM traffic, MCP connections, and agent-to-agent communication, with Cedar and CEL policy engines for fine-grained access control. Rust’s performance characteristics matter here because inference gateway latency adds directly to user-perceived response time.
Kuadrant, now in CNCF Sandbox, layers policy on top of gateway infrastructure and includes MCP server aggregation. Gateways are evolving from dumb traffic proxies into intelligent control planes for AI workloads, and these four projects are driving that shift.
The observability and platform engineering vendors showed up in force. The message was consistent: LLMOps is just platform engineering with new requirements.
Chronosphere demonstrated parallel AI investigation, with multiple agents analyzing different aspects of an incident simultaneously and combining their findings. SUSE Liz takes a domain-specialized approach, deploying different AI agents for different operational domains rather than one general-purpose assistant. groundcover combines eBPF with OpenTelemetry to give coding agents rich runtime context about the systems they are modifying. That last one is subtle but important: if an AI agent is writing code that touches a service, it should understand that service’s actual runtime behavior, not just its source code.
Dynatrace and DevCycle partnered to make feature flags observable primitives via OpenFeature. Rolling out AI features behind feature flags is table stakes, but having those flags show up in your observability pipeline as first-class signals closes a real gap.
Shadow AI governance emerged as its own theme. CAST AI’s Kimchi can route requests across 50+ models while providing centralized visibility into what models are being used, by whom, and at what cost. Every large organization I talked to had some version of the same problem: teams spinning up model endpoints without central oversight, burning through GPU budgets, creating compliance blind spots they did not even know about.
GPU multi-tenancy remains genuinely unsolved. Scheduling, workload isolation, cost attribution across shared GPUs — all of it breaks down at scale. Multiple talks addressed pieces of this, but nobody had a complete answer.
Regulation came up in almost every conversation. The EU Cyber Resilience Act is driving compliance requirements deep into software supply chains, and every European organization I spoke with is feeling the pressure. Teams are already changing how they build and deploy software.
Sovereign Kubernetes is a platform architecture requirement now, not something you can defer to next quarter. Organizations need Kubernetes distributions and cloud regions that guarantee data residency, and they need the tooling to enforce those guarantees programmatically. Self-hosted models are proliferating partly because of capability and cost, but data sovereignty is the accelerant. If your data cannot leave a jurisdiction, neither can your model.
Runtime isolation is expanding beyond containers. Several talks covered KVM-based isolation for AI workloads, which is heavier than containers but necessary when the threat model includes side-channel attacks on shared GPU memory. The sandboxing conversation has gotten more sophisticated since last year.
These constraints are not uniquely European. Any organization operating across jurisdictions faces similar pressures, and the regulatory direction globally is toward more data sovereignty requirements, not fewer.
Four days in Amsterdam distilled into five things I would act on now:
Treat inference workloads like production services. If you are still deploying models with scripts and hope, stop. Inference infrastructure needs the same IaC discipline as any other production system: version-controlled, tested, policy-enforced.
Evaluate the Gateway API Inference Extension and llm-d. These are not speculative projects. They have broad industry backing and solve real problems around inference routing and distributed serving. Get them into your test environments.
Plan agent identity before agents proliferate. SPIFFE/SPIRE for agent identity is not optional if you are running agents in production. Retrofitting identity onto an existing agent fleet is painful. Start with kagenti now.
Platform teams should own AI infrastructure. Shadow AI is already happening in your organization. The platform engineering team needs to provide self-service AI infrastructure with guardrails before ungoverned model endpoints become a security and cost problem.
Sovereignty and GPU multi-tenancy are universal. Even if you are not subject to the EU Cyber Resilience Act today, data residency requirements are spreading globally. GPU multi-tenancy will affect every organization running inference at scale.
Kubernetes spent the past decade proving it could orchestrate containers. The next decade will test whether it can orchestrate intelligence. Based on what I saw in Amsterdam, the community is building the right pieces, but the gap between what exists and what production demands is still wide. That gap is where the interesting work happens.
Tracing is an important part of our CLI observability story. So far we've relied on the now-deprecated OpenTracing for this. We have now added OTel tracing to the CLI, which is more future-proof and should in most cases give you a better view of what the CLI is doing.
We introduced tracing using OpenTracing all the way back in 2017, before OpenTelemetry existed. It served us well over the years, but as OpenTracing was deprecated and OTel emerged as the new, actively maintained standard, it became harder and harder to justify further investment in deprecated tracing infrastructure. Last year we started focusing more on performance, and it became increasingly clear that we'd either have to enhance our existing OpenTracing setup or do the work to switch to OTel.
In the end it was a relatively easy decision to move to the more modern and fully supported OTel, especially as more and more tooling around it starts emerging.
With the decision to move to OTel made, the only thing left to decide was how to implement it. We faced a few constraints here. First, we wanted to make sure that traces are easily shareable — ideally a plain text file, in whatever format, that can be passed around easily. Second, Pulumi's plugin system works by spawning a new process per plugin, and we want traces from each of these plugins to get as much coverage as possible. And third, we ideally also want traces from plugins that only implement OpenTracing but not OTel yet, since someone might upgrade the CLI but not their plugins, for example.
Given these constraints, we decided to implement an OTel collector in the CLI, which can then forward the traces to whatever output format we want. Plugins only ever need to send traces back to the CLI over gRPC, and the CLI handles any further processing. It also means that only one process ever writes to the trace file, if one was requested.
For plugins we always request both OpenTracing and OTel traces. If both are requested and OTel is supported by the plugin, the plugin is expected to only send the OTel version of the traces. For OpenTracing traces, we collect them in the collector in the CLI, and then translate them internally to OTel traces. This way we can still get the traces from older plugins, without them needing to change anything.
Currently the OTel exporter supports exporting traces either directly via gRPC to a collector, or to a file where the traces are JSON encoded. This file can then be shared and imported into a trace viewer at a later time. To do this, use the --otel-traces <file://...|grpc://...> flag with Pulumi v3.226 or newer. For further documentation see our performance tracing docs.
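For example, assuming the file:// and grpc:// forms of the flag (the paths and ports here are placeholders):

```shell
# Write JSON-encoded traces to a file that can be shared later:
pulumi up --otel-traces file:///tmp/up-traces.json

# Or send them straight to an OTLP collector over gRPC:
pulumi up --otel-traces grpc://localhost:4317
```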
To view the traces, you can use one of the various trace viewers that exist. Popular options include Jaeger, OTel Desktop Viewer, or OTel TUI if you prefer not leaving your terminal. Once you've ingested the traces there, either by uploading the trace file or by sending them directly by giving pulumi the collector address, look for the pulumi-cli: pulumi root span.
All further spans will be parented to that root span, and you should thus be able to see a nice flow diagram in the viewer of your choice.
As always, we would love any feedback either in the Community Slack, or through a GitHub issue.
A platform engineer with broad access might want Neo to analyze infrastructure and suggest changes, but include guarantees it won’t actually apply them. Read-only mode makes that possible: Neo does the heavy lifting and hands off a pull request for your existing deployment process to pick up.
Neo runs with the permissions of the user who creates a task, but you often want a tighter boundary. Read-only mode solves this by letting you cap Neo’s permissions at task creation time. Neo can still read your infrastructure, run previews, and open pull requests, but it cannot deploy, update, or destroy resources.
When you create a Neo task, you now choose between two permission levels:
| Option | What Neo can do | Availability |
| --- | --- | --- |
| Use my permissions | Full access (current default behavior) | All tiers |
| Read-only | Read, preview, and create PRs. No infrastructure mutations. | All tiers |
Read-only mode takes your existing permissions and removes the ability to make changes. Neo remains fully active: it can still read your infrastructure state, run previews, write and refactor code, create branches, and open pull requests. The only difference is that Neo cannot trigger deployments or other write operations in Pulumi Cloud directly. If Neo encounters an operation it can't perform in read-only mode, the operation fails and Neo reports what it would have done.
Neo’s operating modes let you choose how much oversight you want: review mode for full approval at each step, balanced mode for approving only mutating operations, and auto mode for hands-off execution.
Read-only mode pairs well with auto-approve. Because Neo cannot perform write operations like deployments or destroys, you can let it run autonomously and trust that the output is a pull request, not a production change. Kick off a task, let Neo work in the background, and come back to a ready-to-review PR.
Read-only mode is available today for all Pulumi Cloud users.
Sign in to Pulumi Cloud and select Read-only when creating your next Neo task
Read the Neo documentation for detailed guides on permission levels
Join the Community Slack to share your feedback
Infrastructure work ranges from simple updates to complex multi-stack operations. For straightforward tasks, jumping straight to execution is often fine. But complex tasks benefit from deliberate upfront thinking: understanding what exists, identifying dependencies, and agreeing on an approach before anything changes. Today we’re launching Plan Mode, a dedicated experience for collaborating with Neo on a detailed plan before execution begins.
Without dedicated planning, Neo balances planning with progress toward execution. That works well for many tasks, but complex operations benefit from more thorough upfront discovery. Plan Mode now makes upfront deliberation a first-class workflow, where instead of focusing on getting to execution, Neo focuses entirely on discovery and synthesis until you explicitly approve the plan.
Enter Plan Mode by selecting the plan button when starting a task. Neo shifts its behavior:
Discovery: Neo investigates your environment — examining existing infrastructure, reading relevant code, checking dependencies, and researching patterns.
Synthesis: From that research, Neo produces a plan explaining what it will do and why. The plan references specific things Neo discovered, like a particular stack configuration or dependency.
Refinement: You refine the plan through normal conversation, challenging assumptions, asking for an alternative approach, or requesting more detail on a specific area.
Approval: Once you’re satisfied, you approve the plan and execution begins. Neo carries forward everything it learned during discovery, so the transition from planning to execution is seamless.
Plan Mode is opt-in. You choose it when you want to work through an approach before committing:
Complex multi-stack operations where understanding dependencies matters
Unfamiliar infrastructure where discovery reduces churn
Autonomous execution where plan approval is your key control point before Neo runs without step-by-step oversight
Plan Mode is available now for all Pulumi Cloud organizations. It works with any task mode, so you can pair thorough upfront planning with whatever level of execution autonomy fits the situation.
To try it, open Neo in Pulumi Cloud. For more details, see the Plan Mode documentation.
Supply chain attacks on CI/CD pipelines are accelerating. A growing pattern involves attackers compromising popular GitHub Actions through tag poisoning — rewriting trusted version tags to point to malicious code that harvests environment variables, cloud credentials, and API tokens from runner environments. The stolen credentials are then exfiltrated to attacker-controlled infrastructure, often before anyone notices.
For every engineering organization, the question is no longer if your CI pipeline will encounter a compromised dependency, but what is exposed when it does.
At Pulumi, we asked ourselves that question and decided the answer should be “nothing useful.” Here’s how we got there.
Most organizations store long-lived cloud credentials, API tokens, and service account keys as GitHub repository or organization secrets. But this approach has several well-known problems:
Broad availability. Every workflow run on a repository can access every secret stored in that repo. A compromised action in any workflow can read them all.
No expiration. Secrets persist until someone manually rotates them. If exfiltrated, they give attackers persistent access for weeks or months.
No granular audit trail. GitHub tells you a secret was used, but not which workflow, which step, or what it was used for.
Secret sprawl. Across dozens or hundreds of repos, the same credentials are often duplicated, making rotation a coordinated, error-prone effort.
In a supply chain attack scenario, this is exactly what attackers count on: a single compromised action that can dump a trove of long-lived credentials.
We replaced every static GitHub Secret across our CI pipelines with short-lived, dynamically fetched credentials using Pulumi ESC and OpenID Connect (OIDC). The credential flow works in layers, each scoped and ephemeral:
GitHub generates a short-lived OIDC token scoped to the specific workflow run, repository, and branch. This token is cryptographically signed by GitHub’s OIDC provider.
The token is exchanged with Pulumi Cloud for a short-lived Pulumi access token. Pulumi Cloud validates the OIDC claims (organization, repository, branch) against a configured trust policy before issuing the token.
The Pulumi access token opens an ESC environment to retrieve the credentials the workflow needs — cloud provider keys, API tokens, or other secrets.
Cloud credentials themselves are dynamic. ESC environments use OIDC login providers to fetch short-lived credentials directly from AWS, Azure, or GCP. No static keys or cloud credentials are stored anywhere.
The pulumi/esc-action GitHub Action handles this entire flow in a single workflow step.
```mermaid
sequenceDiagram
    participant Runner as GitHub Actions Runner
    participant GH as GitHub OIDC Provider
    participant PC as Pulumi Cloud
    participant ESC as Pulumi ESC
    participant Cloud as Cloud Provider (AWS/Azure/GCP)
    Runner->>GH: Request OIDC token (scoped to workflow run)
    GH-->>Runner: Short-lived JWT
    Runner->>PC: Exchange JWT for Pulumi access token
    PC-->>Runner: Short-lived access token
    Runner->>ESC: Open environment with access token
    ESC->>Cloud: OIDC login (assume role / federated identity)
    Cloud-->>ESC: Short-lived cloud credentials
    ESC-->>Runner: Cloud credentials + secrets
    Note over Runner,Cloud: Nothing is stored. Everything expires.
```
Before this migration, our workflows referenced static secrets stored in GitHub:
```yaml
env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```
After the migration, an ESC environment handles credential fetching via OIDC. Here is what the environment definition looks like:
```yaml
values:
  aws:
    login:
      fn::open::aws-login:
        oidc:
          duration: 1h
          roleArn: arn:aws:iam::123456789012:role/pulumi-esc-role
          sessionName: esc-${context.pulumi.user.login}
          # Optional: scope down the session beyond what the role allows
          policyArns:
            - arn:aws:iam::123456789012:policy/ci-build-minimal
  environmentVariables:
    AWS_ACCESS_KEY_ID: ${aws.login.accessKeyId}
    AWS_SECRET_ACCESS_KEY: ${aws.login.secretAccessKey}
```
The roleArn and optional policyArns make least-privilege straightforward: each login provider assumes a specific role, and policyArns can scope the session down further. You can use multiple login providers in one environment or separate environments per workflow to match permissions to each job’s needs.
The workflow itself becomes minimal — a single step that authenticates via OIDC and injects the credentials:
```yaml
permissions:
  contents: read
  id-token: write # Required for OIDC

steps:
  - name: Fetch secrets from ESC
    uses: pulumi/esc-action@v1
    with:
      environment: '/'
```
The static secrets.* references are gone entirely. Every credential is fetched at runtime through ESC.
We didn’t do this for one or two flagship repos; we rolled it out across every Pulumi provider repository: AWS, Azure, GCP, Kubernetes, and over 60 more. The migration was managed centrally through our ci-mgmt tooling, which generates consistent workflow configurations across all provider repos.
The pattern is the same everywhere:
Each repo has a corresponding ESC environment under a github-secrets/ project.
All workflow-level ${{ secrets.* }} references have been removed.
The pulumi/esc-action step with OIDC auth is the single entry point for all credentials.
When every repo follows the same pattern, the security posture becomes far easier to verify and audit.
Beyond eliminating static secrets, this migration gave us centralized visibility and control that GitHub Secrets cannot provide:
Audit logging. ESC records which credentials were accessed, when, and by which workflow. This is a meaningful improvement over GitHub’s binary “secret was used” signal.
Centralized access policies. Access rules are defined once in ESC rather than scattered across individual repository settings pages.
Single-point rotation. Because ESC environments can import other environments, shared credentials live in a common base environment that all 70+ repo environments import. Update it once, and every repo picks up the change on its next run.
Dynamic credentials by default. For cloud providers like AWS, Azure, and GCP, ESC fetches credentials via OIDC at open time. There is nothing to rotate because nothing is stored.
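The composition behind single-point rotation looks roughly like this in an ESC environment definition (the environment and value names here are hypothetical):

```yaml
# Per-repo environment, e.g. github-secrets/my-repo (hypothetical name)
imports:
  - github-secrets/base   # shared credentials defined once; rotate here
values:
  environmentVariables:
    REPO_SPECIFIC_FLAG: "true"
```

Each repo's environment stays tiny; everything shared lives in the imported base.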
With this architecture in place, here is what an attacker gets if a compromised GitHub Action runs in our CI:
No GitHub Secrets to dump. The repository settings page has no stored secrets for a malicious action to exfiltrate.
OIDC tokens are scoped and short-lived. The GitHub-issued JWT is valid only for the specific workflow run and expires within minutes.
Cloud credentials are ephemeral. Any AWS, Azure, or GCP credentials fetched through ESC are short-lived and scoped to the role assumed during that run.
No persistent access. There are no long-lived tokens to reuse hours or days later.
Compare this to the traditional model, where a single compromised action could exfiltrate AWS access keys that remain valid until someone manually rotates them — which could be weeks or months.
The goal is not to prevent every possible attack. It is to make the blast radius as small as possible when something goes wrong.
If you want to adopt the same pattern in your own CI pipelines:
Tutorial: Using ESC with GitHub Actions — Step-by-step setup guide.
Announcing the Pulumi ESC GitHub Action — Full feature overview and capabilities.
Configuring OIDC for ESC — Set up OIDC trust between ESC and your cloud providers.
Pulumi ESC documentation — Full documentation for environments, secrets, and configuration.
Your CI secrets do not have to be a liability. With OIDC and Pulumi ESC, they do not have to exist at all.
Since the launch of Pulumi IAM with custom roles and scoped access tokens, organizations have been using fine-grained permissions to secure their automation and CI/CD pipelines. As teams scale to hundreds or thousands of stacks, environments, and accounts, the next challenge is applying those permissions efficiently.
Today, we’re introducing three new capabilities to help you manage permissions more dynamically at scale: tag-based access control, team role assignments, and user role assignments.
With custom roles, you can define granular permissions using fine-grained scopes. However, applying those roles still requires selecting individual stacks, environments, or accounts one by one. For organizations managing a large number of Pulumi entities, this means either granting overly broad access or spending significant time on manual configuration. Tag-based access control solves this problem.
You can now create rules within a custom role that dynamically grant permissions based on entity tags. This works across IaC stacks, ESC environments, and Insights accounts. For example, when a new stack is created and tagged env:prod, anyone with a role containing a matching tag-based rule automatically gets the right permissions. No manual assignment required.
A single role can include multiple tag-based rules, and they are evaluated with OR logic. If an entity matches any of the rules, the permissions are granted. Within a single rule, you can combine multiple key-value conditions with implicit AND logic for precise targeting. For example, a rule with conditions env:prod and team:payments ensures access is granted only to production resources owned by the payments team.
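These matching semantics can be sketched in a few lines (an illustrative model, not Pulumi's implementation; the rule and tag shapes are assumed):

```python
def rule_matches(rule: dict, entity_tags: dict) -> bool:
    # Within a single rule, every key-value condition must hold (implicit AND).
    return all(entity_tags.get(key) == value for key, value in rule.items())

def role_grants_access(rules: list, entity_tags: dict) -> bool:
    # Across rules, matching any one rule is enough (OR).
    return any(rule_matches(rule, entity_tags) for rule in rules)

rules = [{"env": "prod", "team": "payments"}]
role_grants_access(rules, {"env": "prod", "team": "payments"})  # True
role_grants_access(rules, {"env": "prod", "team": "search"})    # False
```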
Custom roles can now be assigned directly to teams within your Pulumi organization. When an engineer joins a team, whether manually or via SCIM provisioning, they automatically inherit the permissions defined in the team’s assigned roles.
Teams support both inline permissions (ad-hoc access to specific stacks, environments, or accounts) and role-based permissions simultaneously. You can assign multiple roles to a single team, giving you full flexibility to compose access from reusable building blocks while retaining the ability to grant one-off access where needed. If you have existing workflows built around ad-hoc assignments to teams, those continue to work exactly as before. You can adopt roles incrementally or mix both approaches on the same team.
Team admins (or users with the team:update scope) can continue to manage their team’s inline permissions as they do today. However, assigning organization-level custom roles to a team requires additional permissions: role:read and role:update.
Custom roles can also be assigned directly to individual organization members. This is useful for users whose responsibilities span multiple teams or require permissions beyond the existing org-level Admin, Member, and Billing Manager roles.
Permissions in Pulumi IAM are additive. A user receives the union of all permissions granted to them, including permissions from roles assigned directly to them as a user and permissions from roles assigned to any team they belong to. A user on both the “SRE” and “Security” teams inherits permissions from both team roles, plus any role assigned to them individually.
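A minimal sketch of that union semantics (team names and permission strings are made up for illustration):

```python
def effective_permissions(user_roles, team_roles, memberships):
    # A user's effective permissions are the union of every role assigned
    # to them directly plus every role of every team they belong to.
    perms = set()
    for role in user_roles:
        perms |= role
    for team in memberships:
        for role in team_roles.get(team, []):
            perms |= role
    return perms

team_roles = {
    "sre": [{"stack:read", "stack:write"}],
    "security": [{"policy:read"}],
}
effective_permissions([{"billing:read"}], team_roles, ["sre", "security"])
# → {'billing:read', 'stack:read', 'stack:write', 'policy:read'}
```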
Configuring tag-based access control and role assignments is done through the Pulumi Cloud console and REST API.
In Pulumi Cloud, navigate to Settings > Access Management > Roles and create a new custom role. In the role configuration, add tag-based rules that define which entities the role should apply to.
For example, to create a role that grants write access to all production stacks:
Click Create custom role and give it a descriptive name (e.g., “Production Deployer”)
Add a permission set (e.g., Stack Write) to the role
Under entity selection, choose Tag-based rule
Set the condition: tag key env equals prod
Save the role
Go to Settings > Access Management > Teams, select a team, and assign your custom role. All team members immediately inherit the defined permissions.
For users with unique access requirements, go to Settings > Access Management > Members, select a user, and assign a custom role directly.
Tag-based access control relies on consistent tagging. If a stack is missing a tag or has an incorrect value, permissions won’t be applied as expected. Pulumi Policy closes this gap by letting you enforce tagging standards as a preventative policy group, so any pulumi up on a stack with missing or invalid tags is blocked before deployment. This ensures your tag-based RBAC rules always grant the correct permissions. Policy enforces the standard, RBAC enforces the access.
To learn how to write policies that validate stack tags, see Using stack tags in policies.
Pulumi Policy currently supports tag enforcement for IaC stacks. For ESC environments and Insights accounts, tags are managed through the Pulumi Cloud console or REST API.
Tag-based access control, team role assignments, and user role assignments are available today for customers on the Pulumi Enterprise and Pulumi Business Critical plans. Check out our pricing page for more details on editions and what’s included.
With custom roles providing fine-grained permissions, tag-based rules enabling dynamic access policies, and the ability to assign roles directly to teams and users, Pulumi IAM now provides everything you need to implement automated, least-privilege access control at scale. We’re excited to see how you leverage these new capabilities to secure and streamline your cloud operations.
Explore the IAM documentation to get started, and share your feedback in our GitHub repository.
Pulumi’s OPA (Open Policy Agent) support is now stable. The v1.1.0 release of pulumi-policy-opa makes OPA/Rego a first-class policy language for Pulumi with full feature parity alongside the native TypeScript and Python policy SDKs. Write Rego policies that validate any resource Pulumi manages, across AWS, Azure, GCP, Kubernetes, and the rest of the provider ecosystem. If you already have Kubernetes Gatekeeper constraint templates, a new compatibility mode lets you drop those .rego files directly into a Pulumi policy pack and enforce them against your Kubernetes resources without modification.
OPA/Rego is now fully supported as a policy language for Pulumi Insights, with the same capabilities as the TypeScript and Python SDKs:
Resource and stack-level policies: Validate individual resources with deny and warn rules, or evaluate your entire stack at once with stack_deny and stack_warn for cross-resource checks like relationship validation and resource count limits.
Enforcement levels: Control how violations are handled. mandatory blocks deployments, advisory surfaces warnings, and disabled turns rules off without removing them. Enforcement levels can be overridden per policy without modifying Rego source.
Policy configuration: Pass custom parameters to policies via configuration files, with optional JSON schema validation. Configuration values are accessible in Rego as data.config...
OPA metadata annotations: Use standard OPA # METADATA comments to provide titles, descriptions, and messages for your policies. These populate the policy metadata displayed in Pulumi Cloud.
Preventative and audit evaluation: OPA policies work with both preventative enforcement during pulumi up and audit policy scans for continuous compliance monitoring.
You can choose whichever language best fits your team. Organizations already using OPA across their toolchain can standardize on Rego for Pulumi policies, while teams preferring TypeScript or Python can continue to use those. All three languages work side by side in the same policy groups.
The headline feature of this release is native support for Kubernetes Gatekeeper constraint template rules. If you’re running Gatekeeper as an admission controller in your clusters, you likely have a library of .rego policies that enforce security and operational standards at admission time. With v1.1.0, those same rules can now run as Pulumi policies, catching violations during pulumi preview before resources ever reach the cluster.
To enable Gatekeeper compatibility, set inputFormat: kubernetes-admission in your PulumiPolicy.yaml:
```yaml
description: Kubernetes Gatekeeper Policy Pack
runtime: opa
inputFormat: kubernetes-admission
```
With this setting, Pulumi automatically wraps Kubernetes resources in the Gatekeeper AdmissionReview structure (input.review.object, input.review.kind, etc.), so your existing rules work without modification. Non-Kubernetes resources are silently skipped.
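Conceptually the wrapping looks like this (a Python sketch of the shape, not Pulumi's actual code, and with field coverage simplified):

```python
def to_admission_input(resource: dict) -> dict:
    # Wrap a Kubernetes resource in a Gatekeeper-style AdmissionReview shape
    # so rules can reference input.review.object, input.review.kind, etc.
    return {
        "review": {
            "object": resource,
            "kind": {"kind": resource.get("kind", "")},
            "name": resource.get("metadata", {}).get("name", ""),
        }
    }

pod = {"kind": "Pod", "metadata": {"name": "web", "labels": {}}}
inp = to_admission_input(pod)
# A rule like `not input.review.object.metadata.labels["app"]` would now
# fire for this pod, since it carries no "app" label.
```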
Here’s an example that reuses standard Gatekeeper-style rules, requiring an app label and prohibiting the latest image tag:
```rego
package gatekeeper

import rego.v1

# METADATA
# title: Require App Label
# description: All Kubernetes resources must have an "app" label.
violation contains {"msg": msg} if {
    not input.review.object.metadata.labels["app"]
    msg := sprintf("%s '%s' is missing required label: app",
        [input.review.kind.kind, input.review.name])
}

# METADATA
# title: Disallow Latest Tag
# description: Container images must not use the "latest" tag.
deny contains msg if {
    container := input.review.object.spec.template.spec.containers[_]
    endswith(container.image, ":latest")
    msg := sprintf("container '%s' uses the 'latest' tag -- pin to a specific version",
        [container.name])
}
```
These rules are identical to what you’d write for Gatekeeper. Both rule head formats are supported and can coexist: the violation[{"msg": msg}] map format and the deny[msg] string format. Per-policy configuration via input.parameters also works as expected. You can take a .rego file from your Gatekeeper constraint templates, drop it into a Pulumi policy pack, and publish it to Pulumi Cloud to enforce automatically across your stacks.
This shifts policy enforcement left. Instead of waiting for the Kubernetes API server to reject a resource at admission time, you catch the violation during pulumi preview, before anything is deployed.
The OPA Gatekeeper Library is a community-maintained collection of constraint templates covering common Kubernetes guardrails like pod security, image provenance, and resource limits. You can use these policies directly with Pulumi. Here’s an end-to-end example using the allowedrepos policy to restrict which container image registries your deployments can use.
Create a new Kubernetes OPA policy pack:
pulumi policy new kubernetes-opa
Copy the Rego source from gatekeeper-library into your policy pack as allowedrepos.rego. No modifications are needed:
package k8sallowedrepos
violation[{"msg": msg}] {
    container := input.review.object.spec.containers[_]
    not strings.any_prefix_match(container.image, input.parameters.repos)
    msg := sprintf("container <%v> has an invalid image repo <%v>, allowed repos are %v",
        [container.name, container.image, input.parameters.repos])
}
violation[{"msg": msg}] {
    container := input.review.object.spec.initContainers[_]
    not strings.any_prefix_match(container.image, input.parameters.repos)
    msg := sprintf("initContainer <%v> has an invalid image repo <%v>, allowed repos are %v",
        [container.name, container.image, input.parameters.repos])
}
violation[{"msg": msg}] {
    container := input.review.object.spec.ephemeralContainers[_]
    not strings.any_prefix_match(container.image, input.parameters.repos)
    msg := sprintf("ephemeralContainer <%v> has an invalid image repo <%v>, allowed repos are %v",
        [container.name, container.image, input.parameters.repos])
}
Verify that your PulumiPolicy.yaml has Gatekeeper compatibility enabled:
description: Kubernetes Gatekeeper Policy Pack
runtime: opa
inputFormat: kubernetes-admission
Configure the allowed registries. Create a policy-config.json file to pass the repos parameter:
{
    "k8sallowedrepos": {
        "repos": ["gcr.io/my-company/", "docker.io/library/"]
    }
}
Test the policy locally against a stack:
pulumi preview --policy-pack . --policy-pack-config policy-config.json
Any Kubernetes deployment using an image outside the allowed registries will produce a violation at preview time, before it reaches the cluster.
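All three Rego rules above perform the same underlying check: does the image start with any allowed prefix? As a plain-TypeScript illustration of what strings.any_prefix_match does (not part of the policy pack itself; names are invented for illustration):

```typescript
// Illustrative equivalent of Rego's strings.any_prefix_match builtin:
// an image passes if it starts with any of the allowed repo prefixes.
function imageAllowed(image: string, allowedRepos: string[]): boolean {
    return allowedRepos.some((repo) => image.startsWith(repo));
}

const repos = ["gcr.io/my-company/", "docker.io/library/"];
console.log(imageAllowed("gcr.io/my-company/api:v1.2", repos)); // true
console.log(imageAllowed("quay.io/other/app:latest", repos));   // false
```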
Publish the pack and add it to a policy group to enforce it across your organization:
pulumi policy publish
The same approach works for any policy in the gatekeeper-library: containerlimits, requiredlabels, disallowedtags, and others. Copy the Rego, configure parameters, and publish.
OPA policy support is part of the broader Pulumi Insights governance platform. Insights gives you visibility and compliance across your entire cloud footprint, and OPA policies plug directly into that:
Audit policy scans continuously evaluate OPA policies against your Pulumi stacks and discovered cloud resources, providing a compliance baseline without redeploying anything.
Self-hosted execution lets you run policy evaluations on your own infrastructure using customer-managed workflow runners, keeping credentials and data within your network.
Pre-built compliance packs for CIS, NIST, PCI DSS, and other frameworks are available alongside your custom OPA policies in the same policy groups.
Whether you’re enforcing policy at deployment time, scanning existing infrastructure for drift, or running continuous compliance checks, OPA policies are a native participant.
Do I need to rewrite my existing Gatekeeper .rego files? No. Set inputFormat: kubernetes-admission in your PulumiPolicy.yaml and your existing Gatekeeper constraint template rules work as-is. Pulumi handles the AdmissionReview wrapping automatically.
When using inputFormat: kubernetes-admission, non-Kubernetes resources are silently skipped during evaluation. Your Gatekeeper rules only run against Kubernetes resources.
Do I need to install the OPA CLI separately? No. The pulumi-policy-opa analyzer plugin embeds the OPA evaluation engine and is installed automatically by the Pulumi CLI (v3.227.0+). The standalone OPA CLI is only needed if you want to run opa test against your policies independently.
If your team already writes Rego for other tools like Gatekeeper, writing Pulumi policies in Rego keeps your policy language consistent. If your team is more comfortable with general-purpose languages or needs auto-remediation, use the TypeScript or Python SDKs.
Gatekeeper constraint templates can be reused directly via the kubernetes-admission input format, but other OPA integrations use different input structures, so those policies would need to be adapted to Pulumi’s resource model. All three languages work together in the same policy groups.
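For comparison, the core of the earlier "latest tag" check might look like this in a TypeScript policy's validation logic. This is a simplified sketch with hand-rolled types, not the actual @pulumi/policy API:

```typescript
// Simplified sketch of the "disallow latest tag" rule in TypeScript.
// The Container interface is invented here for illustration; a real policy
// would read container specs from the resource's validated properties.
interface Container {
    name: string;
    image: string;
}

function latestTagViolations(containers: Container[]): string[] {
    return containers
        .filter((c) => c.image.endsWith(":latest"))
        .map((c) => `container '${c.name}' uses the 'latest' tag -- pin to a specific version`);
}

console.log(latestTagViolations([
    { name: "web", image: "nginx:latest" },
    { name: "api", image: "myorg/api:v2.1.0" },
]));
```

In a real policy pack, this check would run inside a resource validation callback and report violations instead of returning them.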
Templates are available for kubernetes-opa, aws-opa, azure-opa, and gcp-opa via pulumi policy new. For more details, see the policy authoring guide and the Policy as Code overview.
Get started with OPA policies at github.com/pulumi/pulumi-policy-opa.
Pulumi ESC (Environments, Secrets, and Configuration) allows you to compose environments by importing configuration and secrets from other environments, but this also means a child environment can silently override a value set by a parent. When that value is a security policy or a compliance setting, an accidental override can cause real problems. With the new fn::final built-in function, you can mark values as final, preventing child environments from overriding them. If a child environment tries to override a final value, ESC raises a warning and preserves the original value.
Let’s say you have a parent environment that sets the AWS region for all deployments. You can use fn::final to ensure no child environment can change it:
# project/parent-env
values:
  aws-region:
    fn::final: us-east-1
If a child environment tries to override the final value, ESC raises a cannot override final value warning.
# project/child-env
imports:
  - project/parent-env
values:
  aws-region: eu-west-1 # raises a warning
This evaluates to:
{
  "aws-region": "us-east-1"
}
In this scenario, the ESC environment is still valid, but the final value remains unchanged.
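The override behavior can be pictured as a merge in which final keys always win. The following is a conceptual sketch only, not ESC's actual evaluator; the types and function names are invented for illustration:

```typescript
// Conceptual model of fn::final: when merging a child environment over a
// parent, keys the parent marked final keep their value and raise a warning.
type Env = Record<string, { value: string; final?: boolean }>;

function mergeEnvs(parent: Env, child: Env): { result: Env; warnings: string[] } {
    const result: Env = { ...parent };
    const warnings: string[] = [];
    for (const [key, entry] of Object.entries(child)) {
        if (result[key]?.final) {
            warnings.push(`cannot override final value "${key}"`); // value preserved
        } else {
            result[key] = entry;
        }
    }
    return { result, warnings };
}

const merged = mergeEnvs(
    { "aws-region": { value: "us-east-1", final: true } },
    { "aws-region": { value: "eu-west-1" } },
);
console.log(merged.result["aws-region"].value); // "us-east-1"
```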
Use fn::final for:
Security-sensitive values that shouldn’t be changed
Compliance or policy settings enforced by a platform team
Shared base environments where certain values must remain consistent
The fn::final function is available now in all Pulumi ESC environments. For more information, check out the fn::final documentation!
The Pulumi Registry now supports browsing documentation for previous versions of first-party Pulumi providers. If you’ve ever needed to look up the API docs for an older provider version, you no longer have to dig through Git history or guess at changes — the docs are right there in the Registry. These docs also help Pulumi Neo and other agents more accurately assist you with your Pulumi code and operations.
When you visit a first-party provider’s page in the Pulumi Registry, you’ll now see a version dropdown selector that lets you switch between the current version and previous versions.
Select a previous version from the dropdown, and the Registry loads the full API documentation for that version.
This feature currently includes documentation for the latest release of each previous major version, going back two major versions. For example, if a provider is on v7.x, you’ll be able to view docs for the latest v6.x and v5.x releases in addition to the current version.
Head over to the Pulumi Registry and try it out. Pick any first-party provider with multiple major versions and use the version dropdown to browse its history.
We’d love to hear your feedback — let us know what you think in the Pulumi Community Slack or by opening an issue on GitHub.
Many developers and platform engineers already use Google accounts daily for email, cloud console access, and collaboration. Until now, signing in to Pulumi Cloud required a GitHub, GitLab, or Atlassian account, or an email/password combination. Today, we’re adding Google as a first-class identity provider, so you can sign in to Pulumi Cloud with the same Google account you already use for everything else.
Adding Google as an identity provider brings several benefits:
Use the account you already have. If your team already lives in Google Workspace, you can sign in to Pulumi Cloud with a single click, no new credentials required.
Inherit your existing security policies. If you’ve already configured two-factor authentication, device management, and other protections in Google Workspace, those protections carry over to your Pulumi Cloud sign-in automatically.
On the Pulumi Cloud sign-in page, you’ll see a new Sign in with Google button alongside the existing GitHub, GitLab, and Atlassian options. If you are a new user, select it, authenticate with your Google account, and you’re in.
Note: If you already have an existing Pulumi Cloud account, make sure to associate your Google identity with it as described in the next section.
If you already have a Pulumi Cloud account, you can link your Google identity from your account settings:
Navigate to your Account Settings.
Scroll to the Identity providers section.
Under Available identities, select Connect Google.
Once connected, you can use Google to sign in to your existing Pulumi Cloud account.
Google sign-in lets you authenticate with Pulumi Cloud using your individual Google account. It does not enable Google as a single sign-on (SSO) identity provider for your Pulumi Cloud organization.
If your team uses Google Workspace and needs centralized membership governance for Pulumi Cloud, configure SAML SSO with Google Workspace instead. SAML SSO is available on Pulumi Enterprise and Business Critical editions.
Google sign-in is available now for all new and existing Pulumi Cloud users:
New users: Sign up with Google on the Pulumi Cloud sign-up page.
Existing users: Connect your Google account in your account settings.
For more details, see the Pulumi Cloud accounts documentation.
We’d love to hear your feedback. Join the conversation in the Pulumi Community Slack or open an issue on GitHub.
Your version control provider shouldn’t limit your infrastructure workflows. Pulumi Cloud now works with GitHub, GitHub Enterprise Server, Azure DevOps, and GitLab. Every team gets the same deployment pipelines, PR previews, and AI-powered change summaries regardless of where their code lives.
You can connect multiple VCS providers to a single Pulumi organization simultaneously, like GitHub, GitLab, and Azure DevOps all at once. You can also connect multiple accounts of the same provider, such as two separate GitHub organizations or two GitLab groups. This means teams that work across different repositories, providers, or organizational boundaries can manage everything from one place.
Note: GitHub Enterprise Server is currently limited to one connection per Pulumi organization.
Connect a repository to a stack, and infrastructure deploys automatically when you push. Configure path filters to trigger only when relevant files change, and manage environment variables and secrets directly in Pulumi Cloud. No external CI/CD pipeline required.
Every pull request gets an infrastructure preview so reviewers can see exactly what will change before merging. The preview runs the same Pulumi operations your deployment would, giving your team confidence that a merge won’t break anything.
Neo posts AI-generated summaries on your pull requests explaining what infrastructure changes mean in plain language. Reviewers who aren’t Pulumi experts can still understand the impact of a change without reading resource diffs.
Ask Neo to make infrastructure changes and it opens pull requests directly against your connected repositories. Describe what you want in natural language, and Neo writes the code, opens the PR, and kicks off a preview, all without leaving Pulumi Cloud.
Schedule drift detection to catch out-of-band changes automatically. When someone modifies infrastructure outside of your Pulumi programs, drift detection flags the difference so your team can remediate before it causes issues.
Pulumi Cloud authenticates with your VCS provider using OIDC or OAuth so no long-lived credentials need to be stored. Short-lived tokens keep your deployment pipelines secure without manual secret rotation.
The new project wizard discovers your organizations, repositories, and branches so you can scaffold and deploy a new stack without leaving Pulumi Cloud. Pick your repo, choose a branch, and you’re ready to deploy.
An org admin configures the integration under Settings > Version Control.
Authorize with your VCS provider.
Deploy infrastructure with first-class workflows.
For setup details, see the docs for GitHub, GitHub Enterprise Server, Azure DevOps, and GitLab.
Pulumi has a lot of engineers. It has marketers, solution architects, developer advocates. Everyone has something to contribute to docs and blog posts — domain expertise, hard-won lessons, real-world examples. What they don’t all have is familiarity with our Hugo setup, our style guide, our metadata conventions, or where a new document is supposed to live in the navigation tree. I joined Pulumi in July 2025 as a Senior Technical Content Engineer. A few weeks in, my sole teammate departed. The docs practice was now, functionally, me.
The problem was clear enough: how do you take one docs engineer’s accumulated knowledge and make it available to everyone who needs it, without that engineer becoming a bottleneck?
I started packaging it. Here’s what that looked like in practice.
Everyone talks about AI making you faster. That’s not wrong, but it’s not the most interesting part — at least not for me.
The most interesting part is what it does to the starting problem. I have an ADHD brain (not formally diagnosed, but with enough self-recognition to know what’s going on). I know what that means for my relationship with most tasks: I can see the problem, I understand it, I want to fix it, and then the sheer weight of starting crushes me flat.
When I’m stuck on a task, the issue is almost never that I don’t know what to do. It’s that my brain is trying to hold the entire finished product in working memory while simultaneously producing the first step. That’s an enormous cognitive tax, and for an ADHD brain it’s often insurmountable.
Talking through a problem conversationally is a completely different cognitive load. I can tell Claude “here’s the issue, here’s what I’m trying to accomplish, here’s what’s weird about it,” and suddenly I’m not staring at a blank page anymore. I’m in a conversation. The scaffold exists. I can build on it.
That dynamic isn’t new for me. In a previous role writing training modules at Microsoft, I did some of my best work, not because the work was easy, but because I had a collaborator. A friend to think out loud with. Someone to say “okay, so what are we actually trying to say here?” That conversational scaffolding was the difference between spinning and shipping.
In my current role as a team of one, AI turned out to be that collaborator.
This isn’t really a productivity story. It’s closer to a cognitive accommodation story. And I’d bet a lot of people — diagnosed or not — will recognize what I’m describing.
If conversational scaffolding could lower my own activation energy, the next question was obvious: could I build that for anyone who needed it? I knew I wanted to use AI to solve this problem, but I didn’t want to just write a bunch of one-off prompts. That would be a maintenance nightmare, and it wouldn’t scale beyond me. I needed a system. Claude Code calls these reusable prompts skills — other platforms have the same idea under names like plugins or extensions. My first real experiment was /docs-review — a reusable prompt that would run my writing through a consistent set of criteria before I committed it. Nothing fancy. I just wanted a reliable bar that didn’t depend on my mood or how much coffee I’d had.
Then it occurred to me: every PR to our docs repo should get this automatically. So I wired it into our CI/CD pipeline. Meagan, my manager, loved it — and after a few weeks, she noticed that PR quality had improved dramatically. On almost every PR, contributors were now spontaneously pushing an “Addressing feedback” commit right after the automated review posted — catching and fixing issues before I ever saw the PR.
That’s when something clicked: I wasn’t writing prompts anymore. I was writing modules — reusable, composable pieces of my own expertise.
The insight was straightforward, but it changed how I thought about the whole system: if multiple skills need the same context — our style guide, our review criteria, our content standards — that context should live in one place and get consumed by everything that needs it. Just like a shared library. Just like any decent software project.
I created a REVIEW-CRITERIA.md file as the single source of truth for what a “good” docs PR review looks like at Pulumi. Every skill that does any kind of review pulls from it. Change it once, and everything gets smarter at once. Likewise with our style guide, our Hugo conventions, our navigation structure. All of that lives in central reference files that any skill can pull from. If something changes, I change it in one place and all the skills get the update.
This also matters for token efficiency — which sounds like a nerdy footnote but isn’t, especially when automated reviews are running on every PR. Duplicating context across skills bloats token usage fast. Modularizing keeps it lean. Your CI/CD pipeline doesn’t care about elegance, but it definitely cares about cost.
The mental model I kept coming back to: Don’t Repeat Yourself. It’s the same principle that makes good software maintainable. It turns out it makes good AI workflows maintainable too.
From there, the system grew organically. Whenever I found myself doing something more than once, I asked: “Can I turn this into a skill?” Here’s a sampling of what that produced:
/fix-issue — takes a GitHub issue and recommends a concrete plan of attack, so I go from “here’s a ticket” to “here’s what I’m doing” without the spinning-up tax.
/shipit — runs pre-commit checks, writes a focused commit message, and drafts a PR description.
/pr-review — full doc review on a PR branch: style guide, code examples, screenshots, optional test deployment, then an Approve/Merge/Request Changes dialog with a drafted comment.
/slack-to-issue — converts #docs Slack conversations into properly formed GitHub issues. Slack is where decisions happen; issues are where work gets tracked.
/glow-up — runs an older doc through the modern style guide and flags outdated screenshots, for digging out of accumulated technical debt.
/new-doc and /new-blog-post — guide anyone through adding a new document or blog post with the right location, metadata, and navigation wiring. Engineers, marketers, whoever. The barrier to contributing just dropped significantly.
/docs-tools — helps other repo users discover that any of this exists. Discoverability is a real problem with internal tooling.
Note: Slack’s built-in Claude integration isn’t the same Claude running your Claude Code workflows — they don’t share context or custom instructions. If you want consistent criteria across both surfaces, you need to bring your own backend. That’s exactly what /slack-to-issue handles.
Other people started contributing skills to the repo — not because I asked, but because the pattern was legible enough to extend. Someone built a skill for SEO analysis. Marketing added their own review criteria. Engineers contributed workflows I never would have thought to build.
The thing I’d built as a personal survival tool had become a shared platform. That happened because I treated the prompts like code: modular, reusable, documented, open for contribution.
It’s not a replacement for human judgment. These are probabilistic tools — they’re right most of the time, not all of the time. /pr-review doesn’t approve PRs autonomously. It highlights things and then asks me, the human, to read them and make the call. The AI does the first pass; I do the last one. That’s not a workaround for a limitation — that’s the design.
The system isn’t finished, either. It’s probably never finished. I’m still tweaking review criteria, still finding edge cases where a skill produces something weird, still adding new tools as new pain points emerge. Treating prompts like code means treating them like software: you ship, you iterate, you maintain. There’s no version 1.0 and done.
And the ADHD angle is real but it’s not magic. There are still days where the paralysis wins. AI lowers the activation energy for starting; it doesn’t eliminate it. I’m still the one who has to show up. I suppose I could automate that too, but then we’d be in a whole different kind of dystopia.
Know your models and their costs. At Pulumi we primarily use Claude, and I work in Claude Code; for most tasks I reach for Sonnet rather than Opus. Opus is excellent, but it’s significantly more expensive, and well-crafted instructions to Sonnet handle the vast majority of my work just as effectively.
Treat it like a coworker. Don’t just issue commands and wait for output. Ask what it thinks. Push back when it’s wrong. Explain your reasoning. The more you engage conversationally, the better the results tend to be. That extends to alignment, too — before diving into a complex task, talk through the approach first. A few minutes of alignment up front beats iterating on a misunderstood spec. I’ve gone as far as adding personal instructions to my config — things like playing along when I’m pretending to be Captain Picard, or using colorful language when the context calls for it. (Yes, those are literal config settings.) That sounds frivolous, but it isn’t: a tool you actually enjoy using is a tool you’ll reach for instead of avoid.
Modularize your workflow. Don’t write one giant monolithic prompt that tries to do everything. Break it into focused skills that do one thing well and share common context through a central reference file. Easier to maintain, easier to debug, cheaper to run.
Version control your prompts. Your skills are code. Treat them like code. Commit them, review them, iterate on them. If a skill starts producing weird output after a tweak, you’ll want to know what changed.
Think about token burn rate. This matters most when running automation in CI/CD. Keep your skills focused — a skill that checks style doesn’t need to load your Hugo navigation conventions. The model only reads what you give it, so give it only what it needs.
Not everything needs to be a prompt. This one is underappreciated: skills can include scripts, and that’s often the right call. When my team moves a doc in the repo, it needs to happen via git mv to preserve history, and we need to add a redirect alias to the front matter to prevent 404s and protect SEO. That’s not something I want an AI to reason through from scratch every time — it’s a solved problem. So it’s a script. The skill just knows the script exists and what it does. Claude orchestrates; the script executes. That’s a cleaner, more reliable division of labor than asking an LLM to reinvent the wheel on every run.
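As a flavor of what such a script can handle deterministically, here’s a hypothetical sketch of the redirect-alias step: given a doc’s YAML front matter and its old URL path, ensure an aliases entry exists so the old link keeps working. The function name and front-matter handling are invented for illustration; the real script differs.

```typescript
// Hypothetical sketch: add a Hugo redirect alias to front matter (idempotently).
// Assumes block-style YAML front matter; illustration only.
function addAlias(frontMatter: string, oldPath: string): string {
    if (frontMatter.includes(`- ${oldPath}`)) return frontMatter; // already present
    if (/^aliases:/m.test(frontMatter)) {
        // Append the new entry under the existing aliases key.
        return frontMatter.replace(/^aliases:\n/m, `aliases:\n  - ${oldPath}\n`);
    }
    // No aliases key yet: add one at the end.
    return `${frontMatter}\naliases:\n  - ${oldPath}`;
}

console.log(addAlias("title: Migrating state", "/docs/old/migrating-state/"));
```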
Not everything needs to be generative. Corollary to the last point: if you need deterministic output, don’t use probabilistic tools. We have a skill that generates the meta image for blog posts — procedurally, not generatively. No AI-generated imagery. We have a brand to protect, and “let the AI vibe it out” isn’t a content strategy. The skill follows our visual standards programmatically and produces something consistent every time. Know what you’re automating and why.
The next frontier is bringing some of this tooling to the less technical members of the team — marketing, in particular. The skills I’ve built assume a certain comfort level with terminals and repos. That’s fine for engineers. It’s a barrier for everyone else. A friendly interface would lower that bar significantly — that’s the direction I’m currently exploring.
If you’re a technical writer, a developer advocate, or a solo practitioner figuring out how AI fits into your workflow, the approach described here is a solid starting point. The tools matter, but the mental model matters more: treat your prompts like code. Make them reusable. Document them. Share them.
Our docs repo is public, so the skills are there for anyone who wants them. If you’re building something similar, steal freely — or contribute back.
The blank page is still there. It’s just a lot less intimidating when you’ve got a good collaborator and a solid set of tools.
In January, we introduced a major performance enhancement for Pulumi Cloud through a fundamental change to how Pulumi manages state that speeds up operations by up to 20x. After a staged rollout across many organizations, it is now enabled by default for every Pulumi Cloud operation. No opt-in required—just use Pulumi CLI v3.225.0+ with Pulumi Cloud. The improvement applies to pulumi up, pulumi destroy, and pulumi refresh; pulumi preview does not modify state, so it is unchanged.
First and foremost, nothing about how you work with Pulumi needs to change. Your updates now benefit from better parallelism and should complete faster. Before this change, Pulumi saved a full state snapshot to the cloud after each operation, so the current state could always be recovered if something went wrong. With journaling, we send only the state change for each operation, which lets us send these updates in parallel as long as the resources involved are unrelated. For the full deep dive, see the blog post linked above.
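Conceptually, journaling replaces "rewrite the whole snapshot after every operation" with "append small independent entries that can be replayed into the current state." A toy sketch, not Pulumi's implementation:

```typescript
// Toy model of a state journal: each entry records one resource operation,
// and replaying the journal in order reconstructs the current state.
type JournalEntry = { op: "create" | "update" | "delete"; urn: string; state?: object };

function replay(journal: JournalEntry[]): Map<string, object> {
    const state = new Map<string, object>();
    for (const entry of journal) {
        if (entry.op === "delete") state.delete(entry.urn);
        else state.set(entry.urn, entry.state ?? {});
    }
    return state;
}

const journal: JournalEntry[] = [
    { op: "create", urn: "urn:bucket", state: { name: "logs" } },
    { op: "create", urn: "urn:queue", state: { name: "jobs" } },
    { op: "delete", urn: "urn:queue" },
];
console.log(replay(journal).size); // 1
```

Because entries for unrelated resources don't depend on each other, they can be written concurrently, which is where the parallelism gains come from.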
Since January, we’ve had many early adopters of journaling. This helped us shake out one final bug on the server side, and journaling has been stable since then. With that, we feel confident rolling it out to all our users.
We’ve also gathered real-world data on how journaling is performing. The data from the preview period shows significant improvements in update times. For stacks with fewer than 100 resources, the median improvement is 25.3%, the p90 improvement is 75.2%, and the p99 improvement is up to 92.6%. Meanwhile, for larger stacks, the median improvement is 60.2%. We need more data for stacks with more than 100 resources and will update this post once it comes in.
This data already shows the expected improvement in update times, especially for larger stacks, though the gains depend strongly on the shape and type of the resources being set up. Stacks with many quick-to-update resources benefit more than smaller stacks whose resources are slow to provision. For more numbers, see the Benchmarks section in the previous blog post.
During the preview, journaling was an opt-in process using the PULUMI_ENABLE_JOURNALING environment variable; that opt-in is no longer required. Just upgrade your Pulumi CLI to v3.225.0+ and use the Pulumi Cloud backend, and journaling will automatically speed up your updates.
If you encounter any issues, reach out on the Pulumi Community Slack or through Pulumi Support. You can also set the PULUMI_DISABLE_JOURNALING=true env variable to opt out of journaling.
Platform engineering teams managing infrastructure across Terraform and Pulumi now have a way to unify state management without rewriting a single line of HCL. Starting today, Pulumi Cloud can serve as a Terraform state backend, letting you store and manage Terraform state alongside your Pulumi stacks. Your team continues using the Terraform or OpenTofu CLI for day-to-day operations while gaining the benefits of Pulumi Cloud: AI-powered infrastructure management with Pulumi Neo — our infrastructure agent — encrypted state storage, update history, state locking, role-based access control, audit policies, and unified resource visibility through Insights.
This feature is now available in public preview.
Most organizations adopting Pulumi are not starting from scratch. They have years of Terraform deployments spread across teams, and migrating everything to a new IaC tool overnight is not realistic. We have heard from customers who are excited about the power of Pulumi Cloud but have had to manage migration projects before they can fully benefit from centralized visibility and governance.
The Terraform state backend in Pulumi Cloud changes that equation. Instead of requiring a full code conversion before teams see value, you can migrate your state in minutes and immediately unlock Pulumi Cloud capabilities for your existing Terraform infrastructure — including Neo, Pulumi’s AI infrastructure agent. Once your Terraform state is in Pulumi Cloud, Neo can reason about those resources the same way it does for Pulumi IaC stacks: finding resources, troubleshooting issues, understanding dependencies, and writing infrastructure code PRs. Teams that prefer Terraform can keep using it, while platform engineers get a single AI-powered control plane across the entire infrastructure estate.
When you store Terraform state in Pulumi Cloud, your Terraform-managed resources get the following added functionality:
Agentic infrastructure with Neo. Neo, Pulumi’s AI infrastructure agent, works across your entire cloud footprint — Terraform and Pulumi IaC alike. Once your Terraform state is in Pulumi Cloud, you can ask Neo to find resources across both tools, trace dependencies that span Terraform and Pulumi stacks, troubleshoot configuration issues, and generate new infrastructure code informed by your existing resources. This means platform teams get a single AI-powered interface regardless of which IaC tool manages each piece of infrastructure.
Encrypted state with update history. State is encrypted in transit and at rest. Every change is tracked as a versioned checkpoint visible in the stack activity tab, giving you full rollback capability. This is a common concern for teams currently storing state in S3 buckets.
Automatic state locking. Pulumi Cloud prevents concurrent Terraform operations from corrupting state, without requiring you to configure DynamoDB tables or other external locking mechanisms.
Role-based access control. Control who can read or modify each stack using teams and RBAC, applying the same access policies you use for Pulumi stacks.
Unified resource visibility. View Terraform-managed resources alongside Pulumi-managed resources in Resource Search. Each Terraform resource appears in the console using a pulumi:terraform: naming convention, so you can search and filter using the attribute names you already know.
Audit policies. Run audit (detective) policy packs against your Terraform-managed stacks, including Pulumi’s pre-built compliance packs for CIS, PCI, and more. Pulumi Cloud performs a best-effort schema mapping from Terraform resource shapes to Pulumi provider equivalents, so existing policy packs work without modification in most cases.
Stack outputs and references. Terraform root module outputs are automatically mapped to Pulumi stack outputs, making them available via stack references and the pulumi-stacks ESC provider. This is useful for sharing foundational infrastructure like VPC IDs or DNS zones between Terraform and Pulumi stacks, and for incremental migrations where legacy infrastructure stays in Terraform while new stacks are written in Pulumi.
Pulumi Cloud implements the Terraform remote backend API. You point the Terraform CLI at Pulumi Cloud using the standard backend "remote" configuration block, and no changes to your Terraform code or workflow are required.
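A minimal backend block might look like the following. The hostname is inferred from the TF_TOKEN_api_pulumi_com variable used during migration, and the organization and workspace names are placeholders; check the Terraform state backend guide for the exact configuration:

```hcl
terraform {
  backend "remote" {
    hostname     = "api.pulumi.com"   # inferred from TF_TOKEN_api_pulumi_com; verify in the guide
    organization = "my-org"           # placeholder: your Pulumi Cloud organization

    workspaces {
      name = "networking_prod"        # maps to stack "prod" in project "networking"
    }
  }
}
```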
Each Terraform workspace maps to a Pulumi stack. The workspace name follows the convention <project>_<stack>. For example, networking_prod creates a stack named prod in the networking project.
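A hypothetical helper makes the mapping concrete. It assumes the last underscore separates project from stack, which is consistent with the example above but is an assumption of this sketch, not documented behavior:

```typescript
// Hypothetical sketch of the workspace-to-stack mapping described above.
// Assumption: the last underscore separates the project name from the stack name.
function workspaceToStack(workspace: string): { project: string; stack: string } {
    const idx = workspace.lastIndexOf("_");
    if (idx < 0) throw new Error(`workspace "${workspace}" has no project_stack separator`);
    return { project: workspace.slice(0, idx), stack: workspace.slice(idx + 1) };
}

console.log(workspaceToStack("networking_prod")); // { project: "networking", stack: "prod" }
```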
Migration from S3, Azure Blob, GCS, local backends, or HCP Terraform (Terraform Cloud) takes minutes and is documented in the Terraform state backend guide. From S3, Azure Blob, GCS, or local state, back up your state, update your backend block to point to Pulumi Cloud, set TF_TOKEN_api_pulumi_com, and run terraform init -migrate-state. From HCP Terraform, export state manually and push it to Pulumi Cloud.
Each Terraform resource stored in Pulumi Cloud counts as a resource under management, the same as a Pulumi-managed resource. See the pricing page for details.
Store Terraform State in Pulumi Cloud
If you have questions or feedback, join us in the Pulumi Community Slack or open an issue on GitHub.
When an AI agent writes infrastructure code, two things matter: how compact the output is (token efficiency) and how well the model actually reasons about what it’s writing (cognitive efficiency). HCL produces fewer tokens for the same resource. But does that make it the better choice when agents need to refactor, debug, and iterate? We ran a benchmark across Claude Opus 4.6 and GPT-5.2-Codex to find out.
You might assume that the language producing fewer tokens is also the one models reason about best. Research into LLM-driven infrastructure generation suggests otherwise.
HCL is declarative and minimal. It requires no imports, no runtime constructs, and no language scaffolding. For simple infrastructure generation, HCL leads to fewer tokens and lower generation cost.
For a straightforward resource definition, HCL gets straight to the point:
resource "aws_s3_bucket" "example" {
  bucket = "my-bucket"
}
Compare that with the Pulumi TypeScript equivalent:
import * as aws from "@pulumi/aws";

const bucket = new aws.s3.Bucket("example", {
  bucket: "my-bucket",
});
The HCL version requires fewer tokens. No import statement, no variable declaration, no constructor syntax. For single-shot generation of simple resources, that compactness matters. But the picture changes once you account for deployability and refactoring.
HCL’s token advantage is real for simple generation. But agents don’t just generate once: they validate, repair failures, and refactor. We built an open-source benchmark that measures the full cycle. It sends identical prompts to Claude Opus 4.6 (claude-opus-4-6) and OpenAI GPT-5.2-Codex (gpt-5.2-codex), requesting both Terraform HCL and Pulumi TypeScript for the same AWS infrastructure: a VPC with public and private subnets across 2 AZs, an EC2 instance with security groups for SSH and HTTP, and an RDS PostgreSQL instance with a security group allowing port 5432 only from the EC2 security group, plus all networking (internet gateway, NAT gateway, route tables, and associations). We measured token consumption, cost, and deployability across two scenarios: initial generation and refactoring into reusable components.
Methodology: Temperature 0, 5 runs per combination, randomized execution order. Each generated output goes through a three-stage validation pipeline: formatting (terraform fmt for HCL, prettier for TypeScript), static analysis (terraform validate for HCL, tsc --noEmit for TypeScript), and provider-level validation (terraform plan for HCL, pulumi preview for TypeScript). Both plan and preview check against real AWS provider schemas without creating resources, making their pass rates comparable across formats. If plan/preview fails, the benchmark feeds the error back to the model for one self-repair attempt. At temperature 0, Claude Opus 4.6 produced near-identical outputs across runs (sd=0-4 tokens). GPT-5.2-Codex showed more natural variation (sd=130-165 tokens). With 5 runs per combination the results are directional, not statistically conclusive. Costs are estimates based on published pricing as of 2026-02-22. Full methodology and reproducible code:
[github.com/dirien/iac-token-benchmark](https://github.com/dirien/iac-token-benchmark)
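The self-repair loop in that pipeline can be sketched in TypeScript. This is an illustrative reconstruction, not code from the benchmark repo: `Stage`, `validate`, and `repair` are invented names, and the real harness shells out to the actual tools (terraform fmt / prettier, terraform validate / tsc --noEmit, terraform plan / pulumi preview) rather than taking injected functions.

```typescript
// Each validation stage returns pass/fail plus the tool's error output.
type Stage = (code: string) => { ok: boolean; error?: string };

interface PipelineResult {
  deployable: boolean; // did every stage, including plan/preview, pass?
  repaired: boolean;   // was the single self-repair attempt consumed?
}

function validate(
  code: string,
  stages: Stage[],
  repair: (code: string, error: string) => string,
): PipelineResult {
  let current = code;
  let repaired = false;
  for (let i = 0; i < stages.length; i++) {
    const result = stages[i](current);
    if (result.ok) continue;
    if (repaired) {
      // Only one self-repair turn is allowed; a second failure is final.
      return { deployable: false, repaired };
    }
    // Feed the error back to the model once, then re-run all stages.
    repaired = true;
    current = repair(current, result.error ?? "unknown error");
    i = -1;
  }
  return { deployable: true, repaired };
}
```

A failing `plan`/`preview` stage returns `ok: false`, the model gets exactly one repair turn, and the pipeline re-runs from the first stage, which is why repaired runs consume extra tokens that never appear in the first-generation cost.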
HCL uses fewer tokens for generation:
| Provider | Format | Output tokens (mean) | LOC (mean) | Cost (mean) | Plan/Preview pass | Repairs needed |
| --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 | Terraform | 2,007 | 212 | $0.051 | 5/5 | 0/5 |
| Claude Opus 4.6 | Pulumi TS | 2,555 | 200 | $0.065 | 5/5 | 0/5 |
| GPT-5.2-Codex | Terraform | 1,565 | 110 | $0.022 | 2/5 | 2/5 |
| GPT-5.2-Codex | Pulumi TS | 2,322 | 147 | $0.033 | 0/5 | 5/5 |
HCL produces 21-33% fewer output tokens across both models. For simple resource generation, this translates directly to lower cost. Pulumi TypeScript uses more tokens for fewer lines of code because imports, type annotations, and constructor syntax add tokens without adding functional lines.
The Plan/Preview column tells a more complete story. Claude Opus 4.6 produced deployable code on the first pass for both formats: 5/5 for Terraform and 5/5 for Pulumi. Neither needed repairs. GPT-5.2-Codex struggled with both formats, but Terraform fared slightly better (2/5 vs 0/5).
We took each model’s generation output and asked it to refactor the code into a reusable module or component with parameterized environment name, instance sizes, and availability zone count.
| Provider | Format | Output tokens (mean) | LOC (mean) | Cost (mean) | Plan/Preview pass |
| --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 | Pulumi TS | 2,720 | 218 | $0.082 | 5/5 |
| Claude Opus 4.6 | Terraform | 3,379 | 345 | $0.095 | 5/5 |
| GPT-5.2-Codex | Pulumi TS | 2,477 | 248 | $0.038 | 4/5 |
| GPT-5.2-Codex | Terraform | 1,356 | 119 | $0.021 | 0/5 |
This is where the results get interesting. Opus + Pulumi refactoring used 20% fewer tokens, cost 14% less, and passed pulumi preview on every run (5/5) with zero repairs. Opus + Terraform also ended up at 5/5 for terraform plan, but it needed repair cycles to get there. The benchmark run log tells the story:
# Pulumi refactoring: runs 29-33, sequential, no gaps = zero repairs
✓ [29/40] anthropic/pulumi-ts/refactor run 1 — 2721 tokens, $0.0817
✓ [30/40] anthropic/pulumi-ts/refactor run 2 — 2721 tokens, $0.0817
✓ [31/40] anthropic/pulumi-ts/refactor run 3 — 2721 tokens, $0.0817
✓ [32/40] anthropic/pulumi-ts/refactor run 4 — 2721 tokens, $0.0817
✓ [33/40] anthropic/pulumi-ts/refactor run 5 — 2714 tokens, $0.0816
# Terraform refactoring: 34→36→38→40→42, every run skips a number = every run triggered self-repair
✓ [34/40] anthropic/terraform/refactor run 1 — 3388 tokens, $0.0956
✓ [36/40] anthropic/terraform/refactor run 2 — 3339 tokens, $0.0944
✓ [38/40] anthropic/terraform/refactor run 3 — 3390 tokens, $0.0957
✓ [40/40] anthropic/terraform/refactor run 4 — 3388 tokens, $0.0956
✓ [42/40] anthropic/terraform/refactor run 5 — 3388 tokens, $0.0956
The skipped numbers (35, 37, 39, 41) are self-repair turns where the model received terraform plan errors and regenerated the code. Each repair consumed additional tokens that do not show up in the first-generation cost but do show up in the total pipeline cost.
GPT-5.2-Codex tells a different story. Both formats needed repair on every run, but what happened after repair is what matters:
# Codex + Terraform refactoring: repaired, but still failed plan (0/5 deployable)
terraform run 1 turn 2: plan_valid=False tokens=1559
terraform run 2 turn 2: plan_valid=False tokens=1165
terraform run 3 turn 2: plan_valid=False tokens=1068
terraform run 4 turn 2: plan_valid=False tokens=1134
terraform run 5 turn 2: plan_valid=False tokens=1101
# Codex + Pulumi refactoring: repaired, and preview passed (4/5 deployable)
pulumi-ts run 1 turn 2: plan_valid=True tokens=2246
pulumi-ts run 2 turn 2: plan_valid=True tokens=2567
pulumi-ts run 3 turn 2: plan_valid=True tokens=2187
pulumi-ts run 4 turn 2: plan_valid=True tokens=2531
pulumi-ts run 5 turn 2: plan_valid=False tokens=2647
Codex + Terraform used fewer tokens but produced zero deployable refactored code after repair. Codex + Pulumi used more tokens but recovered to deployable code 4 out of 5 times. TypeScript’s type errors gave the model enough information to fix the problems. HCL’s plan errors did not.
This is the number that matters for production agent workflows. It includes generation, any self-repair cycles, and refactoring:
| Provider | Format | Total tokens (mean) | Total cost (mean) |
| --- | --- | --- | --- |
| Claude Opus 4.6 | Pulumi TS | 8,183 | $0.146 |
| Claude Opus 4.6 | Terraform | 14,669 | $0.249 |
| GPT-5.2-Codex | Terraform | 8,723 | $0.084 |
| GPT-5.2-Codex | Pulumi TS | 15,211 | $0.138 |
Opus + Pulumi had the lowest total pipeline cost at $0.146, 41% cheaper than Opus + Terraform at $0.249. The difference comes entirely from repair cycles: Pulumi needed zero repairs across both scenarios, while Terraform refactoring triggered self-repair on every run.
With Codex, Terraform had the lower pipeline cost ($0.084 vs $0.138), driven by its smaller token output. But Codex + Terraform produced zero deployable refactored code (0/5 plan pass), while Codex + Pulumi produced deployable code 4 out of 5 times.
HCL uses fewer tokens per generation. For single-shot resource creation, HCL’s compactness saves 21-33% on output tokens. That advantage is consistent across both models.
Pulumi produces deployable refactored code more reliably. With Opus, Pulumi refactoring passed preview 5/5 with zero repairs. Codex + Pulumi passed 4/5. Codex + Terraform passed 0/5.
Total pipeline cost favors Pulumi with Opus. Opus + Pulumi cost 41% less than Opus + Terraform across the full pipeline ($0.146 vs $0.249), because Terraform refactoring needed repair cycles that Pulumi did not.
The tradeoff depends on your model and workflow. Codex + Terraform is cheapest on raw tokens but produces no deployable refactored code. Codex + Pulumi costs more per token but actually deploys. Opus + Pulumi is the best of both: fewer refactoring tokens and zero repairs.
The TerraFormer project identified what they call the correctness-congruence gap: LLMs generate configurations that look valid but fail to match the user’s architectural intent. An error taxonomy study cataloged the same pattern across models. A survey of LLMs for IaC found the gap between syntax validity and architectural correctness widens with infrastructure complexity.
Refactoring is where this gap bites hardest. Turning a flat resource list into a parameterized, reusable module requires the model to restructure dependencies, introduce variables, and compose abstractions. With Pulumi, the model can use TypeScript’s standard refactoring patterns: extract a class, add typed constructor parameters, compose functions. These are patterns it has practiced across millions of repositories during training. With HCL, the same refactoring requires count, for_each, dynamic blocks, and module variable plumbing: domain-specific constructs that have far less representation in training data.
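To make "extract a class, add typed constructor parameters" concrete, here is an illustrative sketch of that refactoring shape, not code from the benchmark runs. `NetworkArgs` and `NetworkPlan` are hypothetical names, and a real Pulumi component would extend pulumi.ComponentResource and create aws.ec2 resources rather than computing CIDR strings:

```typescript
interface NetworkArgs {
  environment: string; // e.g. "prod"
  azCount: number;     // how many availability zones to span
  cidrBlock?: string;  // VPC CIDR; defaults to 10.0.0.0/16 below
}

class NetworkPlan {
  readonly publicSubnets: string[] = [];
  readonly privateSubnets: string[] = [];

  constructor(args: NetworkArgs) {
    const base = args.cidrBlock ?? "10.0.0.0/16";
    // Take the first two octets of the VPC CIDR, e.g. "10.0".
    const prefix = base.split("/")[0].split(".").slice(0, 2).join(".");
    for (let az = 0; az < args.azCount; az++) {
      // Carve one public and one private /24 per AZ out of the VPC CIDR.
      this.publicSubnets.push(`${prefix}.${az}.0/24`);
      this.privateSubnets.push(`${prefix}.${100 + az}.0/24`);
    }
  }
}
```

The typed args surface the exact parameters the refactoring task asks for (environment, sizing, AZ count), and the compiler flags a missing or mistyped argument before any preview runs; this is the ordinary-TypeScript feedback loop the model has seen millions of times.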
Our benchmark confirms this directly. Opus produced 2,720 tokens for Pulumi refactoring versus 3,379 for Terraform, a 20% reduction, and every Pulumi run passed pulumi preview without repair. The Terraform refactoring runs all triggered self-repair because the restructured HCL modules had issues that terraform plan caught.
Training data distribution makes this structural. LLMs have far more TypeScript than HCL in their corpora. A model refactoring TypeScript draws on patterns from the entire open-source ecosystem. A model refactoring HCL modules has a much smaller pool. Since general-purpose languages dominate new code production, this gap will widen over time.
Tooling can close the gap further. The Pulumi MCP server gives AI agents direct access to resource schemas at generation time. A tool like get-resource returns every property, type, and required field for a given cloud resource. The agent does not have to guess from what it memorized during training. It can look up the correct schema before writing a single line of code.
This changes the workflow from “generate, fail, read error, retry” to “look up schema, generate correctly.” Agent skills push this further by encoding working Pulumi idioms as structured prompts, so the model starts from a known-good baseline. Terraform has no equivalent to this MCP-based schema lookup. That difference matters more with every iteration.
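As a toy illustration of the difference between the two workflows, an agent with the schema in hand can reject a malformed config before any plan or preview runs. The schema shape below is invented for the example; the MCP server's real get-resource response carries full Pulumi schema metadata:

```typescript
interface PropertySchema {
  type: "string" | "number" | "boolean";
  required: boolean;
}

type ResourceSchema = Record<string, PropertySchema>;

// Check a proposed resource config against a looked-up schema and return
// every problem found: missing required properties, unknown properties,
// and primitive type mismatches.
function checkConfig(
  schema: ResourceSchema,
  config: Record<string, unknown>,
): string[] {
  const errors: string[] = [];
  for (const [name, prop] of Object.entries(schema)) {
    if (prop.required && !(name in config)) {
      errors.push(`missing required property: ${name}`);
    }
  }
  for (const [name, value] of Object.entries(config)) {
    const prop = schema[name];
    if (!prop) {
      errors.push(`unknown property: ${name}`);
    } else if (typeof value !== prop.type) {
      errors.push(`wrong type for ${name}: expected ${prop.type}`);
    }
  }
  return errors;
}
```

An empty result means the config matches the schema; a non-empty one is caught at generation time instead of surfacing as a plan or preview failure one repair turn later.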
One way to think about IaC language choice is through the lens of AI engineering maturity levels. At Level 3 (agentic coding), agents generate infrastructure from prompts. HCL’s 21-33% token savings on generation matters here. At Levels 4-5, agents iterate on specifications, refactor code, and maintain systems over time. Our benchmark shows this is where Pulumi pulls ahead: 41% lower total pipeline cost with Opus, and more deployable refactored output with both models.
The industry is moving toward Levels 4-5. Agents are taking on refactoring, feature flags, environment parameterization. 43% of DevOps teams need four or more deployment iterations before infrastructure is production-ready. The first-generation token advantage HCL holds applies to the task that is shrinking as a share of agent workloads. The refactoring and deployability advantages that Pulumi offers apply to the tasks that are growing.
If your agents primarily generate well-scoped resource definitions, HCL saves 21-33% on output tokens. That advantage is real and consistent.
If your agents need deployable output on the first pass (which avoids repair costs entirely), our data shows Opus + Pulumi is the strongest combination: 5/5 plan/preview pass for both generation and refactoring, zero repairs, lowest total pipeline cost.
If your agents evolve infrastructure over time through refactoring and modularizing, Pulumi produced deployable refactored code more reliably across both models we tested (5/5 and 4/5 vs 5/5 and 0/5 for Terraform).
Pulumi covers the full iteration loop, from generation through repair to refactoring, using the same language patterns and tooling that models already know.
Ready to explore how AI agents work with Pulumi? Check out Pulumi AI to see LLM-powered infrastructure generation in action, or get started with Pulumi’s documentation.
Join the conversation in the Pulumi Community Slack or Pulumi Community Discussions.