[cli] Include the running command name and detected AI agent (when present) in the User-Agent header on Pulumi Cloud API requests #22908
[engine] Include result on the summary engine event
#22883
[sdkgen] Eagerly error on schemas with unconstructable types #22890
[cli/cloud] Auto-fill lang and os query parameters on pulumi cloud api GET/HEAD requests when the matched OpenAPI operation declares them and the caller hasn't supplied them
#22726
[cli/package] Add pulumi package new to bootstrap a Pulumi package from a template
#22837
[cli] Add blank-line gaps between pulumi neo TUI conversation blocks
#22846
[cli/import] Preserve __-prefixed keys when generating PCL for imported resource state, so provider-defined payloads round-trip correctly
#22856
[cli/neo] Render ux__ask_user clarifying questions as questions instead of approval prompts
#22862
[cli/neo] Fix a panic when cancelling a pulumi neo session
#22898
[cli/neo] Render every assistant message in the TUI scrollback so multi-turn commentary no longer disappears between tool calls
[cli/neo] Return the bare stack name and canonical project name from pulumi_preview and pulumi_up tool results instead of echoing the raw input
#22891
[codegen/pcl] Stop reporting spurious circular references when an ignoreChanges, hideDiffs, replaceOnChanges, or additionalSecretOutputs entry shares a name with a top-level node
#22916
[programgen/pcl] Fix PCL binder panic when a conditional mixes a Promise-typed branch with a try() branch #22907
[sdk/python] Support NotRequired, Required and total=False in TypedDicts for component resource arg types
#22858
[cli] Add pulumi logs decrypt command for viewing logs
#22523
[cli] Bundle the hcl language host (from pulumi-labs/pulumi-hcl)
#22807
[cli] Automatically install the hcl converter from pulumi-labs/pulumi-hcl when running pulumi convert --from hcl
#22816
[pcl] Add read blocks to PCL to read resources via ID and query instead of registering them
#22641
[cli/cloud] Add pulumi cloud api <op-or-path> for calling any Pulumi Cloud API
endpoint, with --field/--header/--input/--body flag handling, path
template binding, content negotiation via --format, and --dry-run
[cli/cloud] Add --paginate to pulumi cloud api: follow continuation cursors,
accumulate items into a single JSON envelope, and surface progress
events to stderr with --emit-events (page, complete, truncated,
partial_failure, cancelled).
[cli] Fix the pulumi neo shell tool to honor the agent-supplied timeout and to terminate the whole process tree (and unblock cmd.Wait) when the deadline fires, so commands like kubectl logs -f no longer hang Neo indefinitely.
#22820
[cli] Surface the error and exit when pulumi neo fails to create the underlying task, instead of leaving the TUI stuck in Thinking…
#22825
[codegen/go] Correctly generate []pulumi.Asset & []pulumi.Archive
#22827
[cli/neo] Exit cleanly when the user presses Ctrl+C twice in pulumi neo instead of hanging until a third press
#22821
[engine] The engine now caches schemas at PULUMI_HOME/schemas, and will cache for parameterized packages as well
#22812
[sdk/python] Preserve __-prefixed keys (e.g. __type discriminators) across RPC deserialization, matching the behavior of the other language SDKs
#22834
[programgen/{nodejs,python}] Fix programgen to emit the correct length check for string lengths
#22802
[cli] Update the pulumi neo welcome banner with new Neo-branded ASCII art.
#22817

The original dark factory was Fanuc’s robotics plant in Oshino, Japan, where the lights are off because nobody is on the floor. Robots build robots. Parts move through the line for weeks at a time without a person walking past them.
The same pattern is now showing up in software. Three engineers at StrongDM shipped roughly 32,000 lines of production code without writing or reviewing any of it. Stripe’s “Minions” agent system merges over a thousand pull requests every week. In January, Dan Shapiro of Glowforge published a five-level autonomy ladder that landed cleanly enough to become the shorthand most people now use, and BCG put out a piece calling it the dark software factory.
Almost every public writeup so far is about application code. The harder question is what this looks like for infrastructure.
Shapiro’s ladder is the cleanest framing I’ve seen. He borrows it from the SAE’s self-driving levels, and it fits surprisingly well:
| Level | What it is | Driving analogy |
| --- | --- | --- |
| 0 | Spicy autocomplete | Stick shift; you do everything. |
| 1 | Coding intern (boilerplate) | Cruise control. |
| 2 | Junior developer (interactive pair) | One hand on the wheel. |
| 3 | AI writes the majority; you review every PR | Eyes still on the road. |
| 4 | Spec-driven; agent runs unattended for hours; you review later | Sleeping at the wheel, you can still wake up. |
| 5 | Dark factory; no human review of code before production | No steering wheel at all. |
Most teams are at level 2 or 3. A few of the more aggressive ones are at 4. Level 5 is the experiment. Most teams won’t get there safely, and probably shouldn’t try to. The interesting design question is what has to be true for level 5 to be safe at all, and that question gets sharper when the thing being shipped is infrastructure.
A dark factory is not a coding harness. A harness is the framework an agent runs inside; the dark factory is the surrounding system that makes a harness’s output mergeable without a human reading the diff. Copilot and Cursor sit at the other end: interactive, the human stays in the loop on every keystroke. The dark factory takes the human out of the per-change loop entirely and puts them at the top, writing the spec and the acceptance criteria.
Strip the dark factory down to its layers and there are four of them.
```mermaid
flowchart LR
    A["Inputs<br/>Humans"] --> B["Code Generation<br/>Autonomous"]
    B --> C["Validation<br/>Autonomous, isolated"]
    C -->|pass| D["Merge & Deploy<br/>Autonomous + existing CI/CD"]
    C -->|fail| B
    A -.->|"holdout scenarios<br/>generator never sees these"| C
```
The single most important rule is that Code Generation and Validation must be completely isolated. The generator never sees the acceptance scenarios. A separate evaluator does, and it judges the generator’s output against scenarios the generator could not have memorized.
The reason is sycophancy. LLMs are too eager to agree with their own prior turns and too willing to declare victory on something they just produced. Without isolation, the same model that wrote the change is the one telling you it’s fine. The practical concern is direct: a test stored in the same codebase as the implementation will get lazily rewritten to match the code, not the other way around. It isn’t malice; it’s the agent doing exactly what it was asked, badly. The wall is what stops that.
StrongDM’s pattern for this is holdout scenarios: plain-English BDD acceptance tests stored where the generator cannot reach them. Each scenario runs three times against an ephemeral deployment, two of three must pass, and the overall pass rate has to clear 90% before the change moves forward. If the generator fails, it gets a one-line failure message (“SQL Injection Detection failed: endpoint returned 500”), not the scenario text. It cannot game the test.
Without that wall, you don’t have a quality gate. You have theater.
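The gate itself is small enough to sketch. Here is a minimal TypeScript version, assuming a hypothetical runScenario() helper that stands up an ephemeral deployment, executes one plain-English scenario against it, and returns pass or fail:

```typescript
// Hypothetical helper: deploys an ephemeral copy, runs one plain-English
// scenario against it, and reports whether the resulting state satisfied it.
type RunScenario = (scenarioName: string) => Promise<boolean>;

async function holdoutGate(scenarios: string[], runScenario: RunScenario): Promise<boolean> {
    let passed = 0;
    for (const scenario of scenarios) {
        // Each scenario runs three times; two of three must pass.
        const runs = await Promise.all([1, 2, 3].map(() => runScenario(scenario)));
        if (runs.filter(Boolean).length >= 2) {
            passed++;
        }
    }
    // The overall pass rate has to clear 90% before the change moves forward.
    return passed / scenarios.length >= 0.9;
}
```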
Application code factories can lean on tests, linters, and type checkers. Infrastructure adds blast radius, drift, secrets, irreversible actions, and multi-region state. A code dark factory shipping a broken UI causes a bad user experience. An infrastructure dark factory shipping a broken IAM policy ends in a postmortem.
A few things make this manageable on Pulumi specifically.
The orchestrator does not need to be invented. The Pulumi Automation API is the engine as an SDK in Python, TypeScript, Go, .NET, Java, or YAML, which is the same surface a dark factory orchestrator runs on. Credentials don’t have to be long-lived: ESC and OIDC issue short-lived ones per run, so the agent never sees a static secret.
Policy doesn’t have to be probabilistic: CrossGuard enforces deterministic rules at preview time. Execution doesn’t have to happen on a laptop: Pulumi Cloud Deployments runs pulumi up inside a governed runner with audit logs and approval rules already wired. And the reasoning layer doesn’t have to start from scratch: Pulumi Neo is grounded in your state graph and ships with three modes (Auto, Balanced, Review) that line up cleanly with Shapiro’s levels 5, 4, and 3.
That doesn’t make Pulumi a dark factory by itself. It means the parts that an application-code factory has to build from scratch are pieces a Pulumi shop already has: a credential broker, a policy engine, a governed runner, a state-aware reasoning layer, an audit trail.
And one more piece nobody talks about: pulumi preview produces a clean, deterministic validation artifact, and CrossGuard evaluates that artifact without ever seeing the conversation that produced the program. That’s the same context-free judgment the holdout pattern depends on, applied at the policy layer instead of the acceptance-test layer. For infrastructure, half the wall is already built.
The interesting work is the part that nobody ships in a box.
What no platform ships for you is the wall: the holdout scenarios for infrastructure, the isolated evaluator that runs them, and the agreement on which stacks are even allowed to run lights-out.
The happy-path orchestrator is small. It pulls a spec, runs preview, hands the preview to an isolated evaluator (with its own credentials and its own access to the cloud, no access to the generator’s prompt or output), and branches on the verdict. Auto mode runs up immediately. Balanced mode submits a deployment that requires approval. Review mode opens a PR for a human. Every branch records a stack version traceable in the audit log. Retries, observability, secret rotation, and the rest of the production-grade plumbing add up to real code, but the shape is small.
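That shape is small enough to sketch with the Node.js Automation API. In this sketch, evaluateHoldouts() is a stand-in for the isolated evaluator you would build yourself, and the Balanced and Review branches are stubbed out:

```typescript
import * as auto from "@pulumi/pulumi/automation";

// Stand-in for the isolated evaluator: it gets the preview, its own
// credentials, and the holdout scenarios, but never the generator's prompt.
declare function evaluateHoldouts(previewOutput: string): Promise<"pass" | "fail">;

async function runSpec(stackName: string, workDir: string, mode: "auto" | "balanced" | "review") {
    const stack = await auto.LocalWorkspace.selectStack({ stackName, workDir });
    const preview = await stack.preview();

    if (await evaluateHoldouts(preview.stdout) === "fail") {
        return; // hand the one-line failure back to the generator and retry
    }
    if (mode === "auto") {
        await stack.up(); // apply immediately; the stack version lands in the audit log
    } else if (mode === "balanced") {
        // submit a Pulumi Deployment that requires approval (not shown)
    } else {
        // open a PR for a human to review (not shown)
    }
}
```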
The wall is the part that takes a week to get right. You write five plain-English scenarios for one stack (“after pulumi up, the bucket is private, has SSE-KMS, lives in eu-west-1, and is tagged owner=team-x”) and a janky evaluator that runs preview and up against an ephemeral copy, queries the cloud, and asks a separate model whether the resulting state satisfies the scenario. Triple-run, 90% pass gate. Then you watch it for a few weeks before you let anything auto-apply.
This is the same path the application-code factories walked, with the gates tightened.
Write an AGENTS.md for your most active stack repo. Pulumi Neo reads it natively, as do most coding agents. While you’re there, look at your CrossGuard rules and rewrite the error messages as instructions. Not “S3 bucket has no encryption” but “S3 bucket has no encryption. Set serverSideEncryptionConfiguration with SSE-KMS to fix.” That single change is the difference between an agent flailing and an agent fixing the policy violation on the first try. Wire pulumi preview as a build-before-push gate so PRs don’t show up just to fail CI.
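For illustration, here is what that instruction-style message looks like inside a CrossGuard policy built with @pulumi/policy; the S3 resource and property are one plausible example, not a prescription:

```typescript
import * as aws from "@pulumi/aws";
import { PolicyPack, validateResourceOfType } from "@pulumi/policy";

new PolicyPack("agent-friendly-policies", {
    policies: [{
        name: "s3-encryption-required",
        description: "S3 buckets must be encrypted with SSE-KMS.",
        enforcementLevel: "mandatory",
        validateResource: validateResourceOfType(aws.s3.Bucket, (bucket, args, reportViolation) => {
            if (!bucket.serverSideEncryptionConfiguration) {
                // The message is an instruction the agent can act on,
                // not just a diagnosis.
                reportViolation(
                    "S3 bucket has no encryption. Set serverSideEncryptionConfiguration " +
                    "with SSE-KMS to fix.",
                );
            }
        }),
    }],
});
```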
Pick one stack with a small blast radius. A review-stack lifecycle is ideal. Write five plain-English holdout scenarios for it and the janky evaluator above. Humans still approve every PR. Don’t auto-merge yet. You’re earning the data, not declaring trust.
Only after the three measurable gates hold over twenty PRs (scenario pass rate above 90%, false positive rate below 5%, human override rate below 10%) should you flip auto-apply on for that one stack. Add a weekly drift sweep that goes through the same scenario gate as everything else.
Expand the auto-apply flag to every stack with strong scenario numbers. Wire your issue tracker so tickets tagged infra:fix flow through the pipeline. Mock the cloud APIs that are slow or flaky enough to make scenario evaluation expensive. At this point the orchestrator is configuration, not architecture.
None of the failure modes below have clean fixes. The mitigations reduce risk; they don’t eliminate it. Any team running level 5 should expect to eat one or two of these in the first year.
The validator approves a bad change. This is the obvious one. The standard mitigation is layered: triple-run each scenario with a 2-of-3 threshold, a 90% gate over the run set, a human audit of the first fifty auto-applied changes, and your existing policies still run after the validator says yes.
The agent gets a destroy permission it shouldn’t have. There’s a class of operations that should not sit in the autonomous loop yet: dropping a database, deleting a hosted zone, rotating a root key, anything that crosses a regulated data boundary. Scope what each agent identity can do at the credential layer, require human approval for anything destructive, and start every stack at Review mode. Tag changes, security-group adjustments, and instance resizes can run autonomously today. Release-branch cuts and config promotions can probably run by next quarter. The destructive class earns its way in over months.
You need all three of those layers. Approvals without policy means anything a human approves in a hurry ships. Policy without approvals means a sufficiently clever spec eventually finds the gap. Both without a human kill switch means an incident at 3 a.m. has nobody to escalate to.
Costs blow up. Cap retries at three per spec, alert on token spend per run, and remember that StrongDM reported roughly $1,000 per day per engineer-equivalent. That’s still cheaper than a salary, but only if you put the cap in place before you find out.
Most of what a dark factory needs already exists in any reasonably mature platform. Whatever you have for state, policy, credentials, audit, and a deployment runner is the substrate. The interesting work is not building the factory. It’s the wall: the holdout scenarios that make the gap between “the model says it’s fine” and “the system is actually fine” mean something.
For most teams, Phase 1 alone is the win. Full Level 5 may stay out of reach indefinitely, and that’s fine. The path itself forces useful work: clearer specs, named bottlenecks, the deterministic gates humans had been running in their heads.
Write an AGENTS.md and five holdout scenarios for one stack this week. That’s enough to get a real signal on whether the pattern fits your team. The rest of the path is the same problem the application-code factories have already worked through, with the gates set tighter.
Custom VCS is a new Pulumi Cloud integration that connects any Git or Mercurial version control system to Pulumi Deployments using webhooks and centrally managed credentials. Pulumi Cloud already has native integrations with GitHub, GitLab, and Azure DevOps, but if your team uses a self-hosted or third-party VCS, you’ve been limited to manually configuring credentials per stack with no webhook-driven automation. Custom VCS closes that gap.
Many teams run self-hosted or third-party Git servers that Pulumi Cloud doesn’t have a native integration for, and some teams still use Mercurial. Until now, their only option was the raw git source approach: embedding credentials directly in each stack’s deployment settings, with no way to trigger deployments automatically on push, and no support for Mercurial at all.
This meant:

- No push-to-deploy: Every deployment had to be triggered manually or through a separate CI pipeline.
- Scattered credentials: Each stack configured its own credentials independently, with no centralized management.
- No org-level integration: There was no shared configuration that multiple stacks could reference.
Custom VCS integrations introduce an org-level integration type that works with any Git or Mercurial server. The setup has three parts:
Credentials through ESC: Instead of OAuth flows, you store your VCS credentials (a personal access token, SSH key, or username/password) in a Pulumi ESC environment. The same credential structure works for both Git and Mercurial. The integration references this environment by name and resolves credentials at deployment time. Multiple stacks can share the same credentials without duplicating secrets.
Manual repository registration: You add repositories to the integration by name. Pulumi joins the repository name with the integration’s base URL to form clone URLs. There’s no auto-discovery, so you control exactly which repositories are available.
Webhook-driven deployments: Pulumi provides a webhook endpoint and an HMAC shared secret. You configure your VCS server to POST a JSON payload on push events, and Pulumi automatically triggers deployments for matching stacks. The webhook supports branch filtering and optional path filtering.
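As a sketch, here is the signing side from your VCS server’s point of view in TypeScript. The endpoint URL, signature header name, and payload fields below are placeholders; use the values the integration’s setup page and the Custom VCS docs give you:

```typescript
import { createHmac } from "node:crypto";

const secret = process.env.WEBHOOK_SHARED_SECRET!; // HMAC shared secret from Pulumi
const endpoint = process.env.PULUMI_WEBHOOK_URL!;  // webhook endpoint from Pulumi

async function sendPushEvent() {
    // Illustrative payload; match the format documented for Custom VCS.
    const payload = JSON.stringify({
        repository: "platform/networking",
        branch: "main",
        commit: "0a1b2c3d",
    });
    // Sign the raw body with HMAC-SHA256 so Pulumi can verify the sender.
    const signature = createHmac("sha256", secret).update(payload).digest("hex");
    await fetch(endpoint, {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            "X-Signature": signature, // placeholder header name; see the docs
        },
        body: payload,
    });
}
```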
Custom VCS focuses on the deployment automation use case. Here’s how it compares to native integrations:
| Capability | Native integrations | Custom VCS |
| --- | --- | --- |
| Push-to-deploy | Yes | Yes |
| Path filtering | Yes | Yes |
| PR/MR previews | Yes | No |
| Commit status checks | Yes | No |
| PR comments | Yes | No |
| Review stacks | Yes | No |
Neo, Pulumi’s AI assistant, works with Custom VCS integrations for repository operations that don’t depend on VCS-specific APIs. Neo can clone and push to Git and Mercurial repositories registered with your Custom VCS integration using the credentials from the integration’s ESC environment. Neo cannot open pull requests or create new repositories on Custom VCS servers at this time. Those operations require APIs unique to each VCS platform and are only available through native integrations.
To set up a Custom VCS integration:

1. Navigate to Management > Version control in Pulumi Cloud.
2. Select Add integration and choose Custom VCS.
3. Provide a name, base URL, and ESC environment containing your credentials.
4. Add your repositories.
5. Configure your VCS server to send webhooks to the provided URL.
For the full setup guide including webhook payload format, HMAC signing, and credential configuration, see the Custom VCS documentation.
[cli/cloud] Add pulumi cloud api describe for inspecting the parameters, request
body, and response schema of any Pulumi Cloud API operation, with
text, markdown, and JSON output
[cli/cloud] Add pulumi cloud api list for browsing every endpoint exposed by the Pulumi
Cloud OpenAPI spec, with table and JSON output
[auto/python] Expose the auto-generated Pulumi CLI interface as workspace.cli_api
#22638
[cli] Add encrypted logging to ~/.pulumi/logs; use the PULUMI_ENABLE_AUTOMATIC_LOGGING feature flag to turn it on #22494
[cli] Implement the filesystem__grep and filesystem__content_replace local tools
for pulumi neo. grep runs a regex search across files in the project root
with an optional include glob filter and returns results in path:lineno: line
form. content_replace performs a literal multi-file search-and-replace with a
file_pattern glob and dry_run preview mode. Both tools skip binary files,
hidden directories, and node_modules, and reject paths outside the project
root. Their input schemas match the cloud-side tool definitions.
[cli] Add pulumi_preview and pulumi_up as local tools for the experimental pulumi neo
agent. The Neo TUI renders a persistent bordered block for each operation that
streams changed resources and diagnostics as the engine runs and finalizes with a
summary of the op counts. Hidden behind PULUMI_EXPERIMENTAL.
[engine] Add List to the provider protocol and schema
#22693
[engine] Return a clear error when two installed plugins claim the same default provider package name (for example, a native scaleway provider alongside a terraform-provider bridge parameterized as scaleway) instead of panicking with "Should not have seen an older plugin if sorting is correct!"
#22679
[programgen] Do not wrap a call(...) on a method whose return type is marked plain in an Output. Previously
PCL bound every method call's return type as Output<T>, which caused downstream program-gen
to emit broken .apply(...)/.ApplyT(...) traversals against plain struct returns (e.g.
methods with liftSingleValueMethodReturns=true or ReturnTypePlain=true).
[backend/diy] When using a backend URL containing creds (e.g. a PostgreSQL conn string), mask the user:pass portion in lock-related error messages #22701
[codegen/go] Generate unqualified Provider references for the package's own provider resource. Previously
the Go codegen always emitted <pkg>.Provider even when the reference appeared inside <pkg>
itself, producing identifiers that would not compile. Affects generated code for method return
types (and other schema positions) that reference pulumi:providers:<pkg>.
[codegen/nodejs] Generate unqualified Provider references for the package's own provider resource when emitting
TypeScript code inside that package. Previously the generator always qualified the name as
<pkg>.Provider, which does not resolve when no <pkg> namespace import is in scope.
[codegen/nodejs] Import the correct class name for a provider resource. Imports for pulumi:providers:<pkg> used
the title-cased package name instead of Provider, producing a phantom identifier that clashed
with the containing package's component/resource classes.
[programgen/nodejs] Emit await for call(...) invocations of methods whose return type is marked plain, and
force the generated program into an async export = async () => ... wrapper whenever such a
call is present. The Node SDK returns Promise<T> for plain methods; previously program-gen
used the result directly, which did not match its runtime type.
[codegen/python] Avoid a self-import (import pulumi_<pkg> inside pulumi_<pkg>/<module>.py) when referencing
the package's own provider resource. Python referenced the Provider as pulumi_<pkg>.Provider
even inside that package, which caused a circular import at runtime.
[sdk/python] Reduce internal Output[T] data to a single asyncio.Future
#22661
[sdkgen/{nodejs,python}] Generate optional input types that accept undefined/None values #22552
Neo already helps your team manage Pulumi infrastructure, but no infrastructure team works inside Pulumi alone. Pages come from PagerDuty, telemetry from Datadog or Honeycomb, follow-ups from Linear or Jira. Most of the job is shuttling context between those tools.
Today we’re launching the Integration Catalog for Pulumi Neo: one place to connect Neo to the tools your team already uses, so your agent has the context it needs to help.
Neo ships with six integrations at launch, each exposed to the agent through the Model Context Protocol (MCP):
- Atlassian — Jira issues, Confluence pages, project context
- Datadog — metrics, logs, monitors
- Honeycomb — traces and observability queries
- Linear — issue tracking and project workflows
- PagerDuty — incidents, on-call schedules, escalations
- Supabase — database management and edge functions
Each integration is a remote MCP server. Neo calls the integration through a structured tool protocol and only sees the tools the vendor chooses to expose.
A latency spike showed up in Datadog yesterday afternoon, and you want to know whether your deploy caused it.
You: Neo, our payments stack saw elevated p95 starting around 3pm yesterday. Did our deploy cause it? Check Datadog and Honeycomb.
Neo lines up the Pulumi update history for the payments stack against the latency and error-rate metrics in Datadog around the same window, then surfaces the top slow traces in Honeycomb to confirm the suspect change.
You: Open a Linear ticket on the platform team with the findings and link the offending update.
Neo opens the Linear issue with the summary, the Pulumi update URL, and a pointer to the Datadog dashboard, all without you leaving the chat or copy-pasting context between tabs.
Admins configure credentials once. In your org’s Neo settings, open the Integration Catalog, pick an integration, and paste in an API token or service-account key.
Your team gets the capability immediately. No per-user setup, no extra OAuth flow for each developer, no asking platform to share a token in 1Password.
Credentials stay encrypted at rest. When a task runs, the service decrypts the configured credentials just long enough to hand them to the agent runtime as MCP server auth.
This is the first cut. Here’s what we’re working on:
- CLI integrations — give Neo access to command-line tools like kubectl, aws, gcloud, and az.
- OAuth integrations — for providers whose hosted MCP servers only speak OAuth (Notion, Sentry, Vercel), and for orgs that want per-user credentials.
- Per-integration access controls — team-scoped policies so admins can say “only the platform team can let Neo touch PagerDuty.”
The Integration Catalog is available now for Neo-enabled organizations. Open your org’s Neo settings, head to the Integrations tab, and connect the first tool you reach for when something breaks. The Neo integrations docs walk through the setup for each one.
As always, we’d love to hear what’s missing. File a feature request in pulumi-cloud-requests with the integration you want next. We’re prioritizing based on what teams actually use.
Happy building.
[cli] Add an experimental pulumi neo command that creates a Pulumi Neo agent task in CLI
tool-execution mode and runs the local tool loop. Filesystem and shell tool calls
issued by the agent run on the user's machine in their working directory; the
interactive chat continues to happen in the Pulumi Console at the URL the command
prints. Hidden behind PULUMI_EXPERIMENTAL.
[cli] Add support for handling user approval requests in the pulumi neo terminal UI.
When the agent requests confirmation for a sensitive action, the TUI prompts the
user and forwards their response back to the Pulumi Console. Hidden behind
PULUMI_EXPERIMENTAL.
[cli] Add an interactive terminal UI for pulumi neo built with bubbletea, rendering
agent messages, tool calls, and streaming output in the terminal alongside the
Pulumi Console session. Hidden behind PULUMI_EXPERIMENTAL.
[cli] Switch logging library from glog to slog.
BREAKING: any check of the form if logging.V(x) { must be changed to if logging.V(x).Enabled() {
[cli] Add a plan-mode toggle to the pulumi neo TUI, bound to Shift+Tab. When
plan mode is on, Neo explores and asks questions without writing files,
running pulumi up, or opening PRs, and surfaces an approved plan via a
dedicated approval gate. The toggle must be set before the first message
(plan mode is task-level on the wire); approving the proposed plan exits
plan mode automatically.
[cli] The pulumi neo TUI now drives its "thinking" spinner off a single declarative
rule (the spinner stays on until a final event — final assistant message, approval
request, cancellation, or error — lands), so the indicator no longer flickers off
when the agent hands off tool calls to the CLI or when streaming text arrives
between tools. Press Esc during a turn to ask the agent to cancel; the label
switches to "Cancelling..." until the backend acknowledges.
[cli] pulumi neo now executes the edit filesystem tool locally, matching the schema
and response wording of the upstream mcp-claude-code tool so the agent sees
identical output whether the call ran on Cloud or CLI. edit performs exact-string
replacement with occurrence-count validation, and creates a new file when the
target is missing and old_string is empty.
[cli] Render user messages in the pulumi neo TUI as soon as they're submitted
instead of waiting for the Pulumi Cloud event stream to echo them back.
The initial prompt passed on the command line also appears in the
transcript at startup. Self-echoes from the server are de-duplicated;
user input that originated from another client (e.g. the web UI on the
same task) still renders.
[cli] Wrap warnings, errors, and user-message bubbles to the terminal width in the
pulumi neo TUI. Previously these blocks rendered as single long lines that
were clipped at the right edge of the viewport. On resize, all width-dependent
transcript blocks (user messages, warnings, errors, assistant messages) now
reflow to the new terminal width.
[cli/policy] Fix policy ls to use the default org name, not username
#22656
[engine] Fix provider registry race condition in parallel delete-before-replace #21487
[engine] Signal providers to cancel before closing them during replacement
[sdkgen] Error on 'id' in state inputs #22636
[programgen/python] Add necessary casts between types in generated programs #22567
[sdkgen/go] Fix caching of package references to be per-deployment not per-process #22170
[java] Upgrade java to v1.25.0 #22673
[auto/go] Generate command methods for the Go Automation API codegen #22612
[sdk] Clarify docs on the remote parameter of ComponentResource / Resource in the Node and Python SDKs
#22603
[sdk/dotnet] Upgrade dotnet to v3.103.1 #22676
[yaml] Upgrade yaml to v1.32.0 #22674
Policy authors who need external credentials or environment-specific configuration have had to hardcode values or manage them outside of Pulumi. Policy packs can now reference Pulumi ESC environments, bringing centralized secrets and configuration management to your policies.
Pulumi policy packs let you enforce rules across your infrastructure, but some policies need more than just the resource inputs they evaluate. A policy that validates resources against an external compliance API needs an API token. A cost-enforcement policy might need different spending thresholds for development and production environments. An access-control policy might need to reference an internal service registry.
Until now, these values had to be hardcoded in your policy group configuration or managed through a separate process entirely. This created several problems:
- Security risk: Credentials stored in plain text in policy group config
- Operational burden: Updating a credential meant touching every policy group that used it
- No environment separation: The same values applied everywhere, with no way to vary configuration across environments
Policy packs can now reference ESC environments, just like stacks already do. When you attach an ESC environment to a policy pack in a policy group, the values from that environment are available to your policies at runtime — whether you’re running preventative or audit policies.
This means your policy packs can use ESC for:
- Secrets: API tokens, service credentials, and other sensitive values managed through ESC’s secrets management, including dynamic credentials from providers like AWS, Azure, and GCP
- Configuration: Environment-specific thresholds, allowed regions, service allowlists, and other policy parameters that vary across environments
You configure ESC environment references on a policy pack within a policy group. At runtime, the values from those environments are resolved and made available to your policies through the policy pack’s configuration.
Here’s an example ESC environment that provides configuration to a compliance policy pack:
```yaml
values:
  compliance:
    apiToken:
      fn::secret: xxxxxxxxxxxxxxxx
    costThreshold: 5000
  policyConfig:
    cost-compliance:
      maxMonthlyCost: ${compliance.costThreshold}
      apiEndpoint: https://compliance.example.com
      apiToken: ${compliance.apiToken}
```
The policyConfig property works just like pulumiConfig does for stacks. Values nested under each policy name are made available as configuration to that policy at runtime. Secrets remain encrypted and are only decrypted when the environment is resolved.
You can also use the environmentVariables property to inject values as environment variables into the policy runtime, following the same pattern as stack environment variables.
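Here is a sketch of the policy side reading those values through args.getConfig from @pulumi/policy. The config shape mirrors the example environment above, and the actual compliance-API call is elided:

```typescript
import { PolicyPack } from "@pulumi/policy";

// Mirrors the values the environment nests under policyConfig.cost-compliance.
interface CostComplianceConfig {
    maxMonthlyCost: number;
    apiEndpoint: string;
    apiToken: string;
}

new PolicyPack("cost-compliance-pack", {
    policies: [{
        name: "cost-compliance",
        description: "Validates resources against an external compliance API.",
        enforcementLevel: "mandatory",
        validateResource: async (args, reportViolation) => {
            const config = args.getConfig<CostComplianceConfig>();
            // Call config.apiEndpoint with config.apiToken here, then compare
            // the estimated cost against config.maxMonthlyCost (elided).
            if (config.maxMonthlyCost <= 0) {
                reportViolation("cost-compliance is misconfigured: maxMonthlyCost must be positive.");
            }
        },
    }],
});
```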
Consider a policy that validates every new resource against an external compliance API before it can be provisioned. The API requires an authentication token and returns whether the resource configuration meets your organization’s compliance standards.
Before, the API token lived in the policy group configuration in plain text. Rotating the token meant updating every policy group. There was no audit trail for who accessed the credential, and no way to use different API endpoints for staging and production compliance checks.
After, the API token lives in an ESC environment. You get:
- Centralized rotation: Update the token in one place and every policy group that references the environment picks up the change
- Access controls: ESC’s role-based access controls govern who can view or modify the credential
- Audit trail: Every access to the environment is logged
- Environment separation: Use different ESC environments for different policy groups, so staging policies validate against a staging compliance endpoint while production policies use the production endpoint
To start using ESC environments with your policy packs:

1. Create an ESC environment with your policy configuration and secrets
2. Attach the environment to a policy pack in your policy group through the Pulumi Cloud console
3. Update your policies to read from the configuration values provided by the environment
Somewhere in your company right now, a developer is building an AI agent. Maybe it’s a release agent that cuts tags when tests pass. Maybe it’s a cost agent that shuts down idle EC2 overnight. It’s running, it’s in production, and there’s a decent chance the platform team doesn’t know it exists.
This isn’t a thought experiment. OutSystems just surveyed 1,900 IT leaders and the numbers are rough: 96% of enterprises run AI agents in production today, 94% say the sprawl is becoming a real security problem, and only 12% have any central way to manage it. Twelve percent. You can read the full report here.
The real question is where those agents run. Inside the platform you’ve already built, or somewhere off to the side where nobody on the platform team can see them.
Platform teams have always had two jobs that pull in opposite directions. Let developers ship without waiting on a ticket. Keep the infrastructure coherent while they do. Golden paths, review stacks, a catalog of components that don’t fight each other.
Agents break the second half of that deal.
A developer with a sharp prompt can spin up an SRE agent that watches a queue, a release agent that cuts tags when the test suite goes green, or a cost agent that kills idle infra at 2 a.m. That’s useful. It’s also running on your production cloud account, using credentials you never provisioned, writing to systems you never approved, and the only audit trail is whatever the developer remembered to log. The Salesforce 2026 Connectivity Benchmark pegs the average enterprise at twelve agents today, projected to grow 67% over the next two years. Most teams aren’t ready for one, let alone twenty.
This is the same shape as every sprawl problem before it. I wrote about the last one in How Secrets Sprawl Is Slowing You Down, and the pattern keeps repeating. When something useful gets cheap, it proliferates. When it proliferates without structure, it becomes a liability.
The clock is also ticking on the compliance side. The EU AI Act’s high-risk obligations kick in on 2 August 2026. Colorado’s AI Act goes live on 30 June 2026 after last year’s delay. A folder of unreviewed agent scripts isn’t going to hold up against either of those.
There are roughly three paths from here.
Do nothing. Accept the sprawl and hope nothing catches fire. This is the default, and it’s also how you end up explaining to an auditor why some finance agent moved data between three systems last Thursday and nobody remembers which prompt triggered it.
Mandate centralization. Tell developers every agent has to be registered and approved before it runs. This sounds responsible on a slide, and it falls apart inside a sprint. Developers route around friction. If the official path takes a week and the unofficial path takes an afternoon, the unofficial path wins, and you’ve just pushed the sprawl underground where you can’t see it anymore.
Make the platform the obvious path. Build the thing developers actually want to use. A place where an agent inherits the guardrails, credentials, policies, and audit trail by default, because that’s what’s on offer. Adoption becomes a side effect of shipping something good.
Option three is the only one that scales. It’s also the one where most platform teams look at their existing stack and assume they need to build a pile of new scaffolding. I don’t think they do, and the rest of this post is why.
An agent needs seven concrete things from the platform it runs on. Each one maps to a Pulumi primitive you already own.
Agents are only as good as the context they can reason over. Drop a generic LLM into your cloud account and you’ll get plausible-sounding nonsense, because the model has never seen your environment. What you actually need is a grounded source of truth: what resources exist, how they relate, which stack owns what, which version is running where.
Pulumi state is already that. Your program graph, your stack outputs, your resource metadata, all of it adds up to a structured record of what you’ve actually deployed. Pulumi Neo reasons directly over that graph, which is why it can tell you why a deployment drifted instead of guessing. I wrote the long version of that argument there. Short version: you already have the context lake. Point agents at it.
An agent that needs to touch five systems shouldn’t need five separate credential dances. That’s where credential sprawl starts. Every agent gets a long-lived key, every key ends up in somebody’s .env, and every rotation turns into an incident.
The Pulumi surface here is the 200+ providers plus Pulumi ESC handling dynamic credentials through OIDC. An agent doesn’t ask for an AWS access key. It asks ESC for a short-lived, scoped token bound to the environment it’s allowed to operate in, and the token expires when the task ends. No static keys, no rotation pain, no awkward postmortem about how something got committed to GitHub. The ESC patterns I walked through in the Claude skills post work just as well for an autonomous agent as they do for a human developer, which is really the whole point.
There’s a real difference between “an agent can see your infrastructure” and “an agent can change your infrastructure.” The second one is where you actually need structure. Pulumi Deployments gives you that structure: defined workflows, controlled triggers, running inside your Pulumi Cloud boundary instead of whatever environment the developer happened to spin up. The Automation API lets you build higher-order orchestration on the same primitives your developers already use.
The framing I keep coming back to goes like this. An agent shouldn’t call pulumi up directly. It should submit an action to a governed pipeline that runs pulumi up on its behalf, inside an environment you control, with a log trail and the guardrails already in place. Same effect, very different threat model.
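A sketch of what “submit an action” can look like in practice, using the Pulumi Deployments REST API. The request shape here is abbreviated; check the Deployments REST API docs for the full set of fields:

```typescript
const token = process.env.PULUMI_ACCESS_TOKEN!; // short-lived, scoped to the agent identity

async function submitUpdate(org: string, project: string, stack: string) {
    const res = await fetch(
        `https://api.pulumi.com/api/stacks/${org}/${project}/${stack}/deployments`,
        {
            method: "POST",
            headers: {
                "Authorization": `token ${token}`,
                "Content-Type": "application/json",
            },
            // Reuse the deployment settings already configured on the stack so
            // the run happens inside the governed runner, guardrails included.
            body: JSON.stringify({
                inheritSettings: true,
                operationContext: { operation: "update" },
            }),
        },
    );
    if (!res.ok) {
        throw new Error(`deployment request failed: ${res.status}`);
    }
    return res.json(); // the returned deployment ID ties into the audit trail
}
```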
Real governance lives outside the prompt. “Please don’t delete production” is a wish written into a system prompt, not an enforced control. And when an agent overrides your intent to do what it thought you meant, it’s behaving exactly the way the technology was designed to behave.
Pulumi Policies is the answer the IaC community landed on years ago: policy as code, written in a real programming language, evaluated deterministically at preview and update time. Disallow production RDS deletions. Require encryption at rest. Block S3 buckets with public ACLs. An agent running through Pulumi hits those gates whether it “wants” to or not, because the gates live in the pipeline and not in the prompt. This is the pillar most teams underweight, and it’s the first one most auditors ask about.
When something goes wrong at 3 a.m. (and with enough agents running, something will), you need answers fast. What changed, who changed it, and why. Not just “which agent,” but which version of which agent, triggered by what event, authorized by which policy, touching which resources.
Pulumi Cloud’s activity log, the stack update history, and ESC audit logs already capture all of that. Every update is versioned. Every secret access is logged. Every policy evaluation is recorded. When an agent submits a change through your Pulumi pipeline, it inherits that audit surface for free. The alternative is reconstructing an incident from a mix of Slack messages, container logs, and developer memory, which is roughly the state most teams without a platform are in today.
Not every agent action should wait for a human. But agents do need a promotion path, the same way new platform components do. Experimental, then reviewed, then trusted, then autonomous. That’s exactly what pulumi preview, review stacks, and Deployments PR workflows already model for human contributors. An agent that wants to make a change should have to submit it the same way a junior engineer would. As a diff, with a plan, against a preview environment, until it earns the trust to skip steps.
This connects back to the pattern I laid out in Golden Paths: Infrastructure Components and Templates. Golden paths were never only for humans. They’re just paths, and agents can walk them too.
The last pillar is the one that keeps the other six honest. Some decisions shouldn’t be automated, full stop. Production rollbacks outside business hours. Destructive changes above a certain blast-radius threshold. Anything that touches a regulated data boundary. For those cases, you want a forced human checkpoint that the agent can’t route around.
Pulumi Deployments approvals already play that role for human changes. Pulumi Neo’s review steps add the AI-aware version: a structured plan, a diff, a named approver, and a record of what they decided and why. I walked through what this looks like in practice in Self-Verifying AI Agents. Short version: an agent that proposes is much safer than an agent that commits.
Step back from the seven pillars and look at what they have in common. Context, integrations, governed actions, deterministic policy, audit, review, approval. None of those are new problems that AI agents invented. They’re the problems infrastructure-as-code has been quietly solving for a decade, for human developers.
Every meaningful agent action ends up being a change, whether that’s to infrastructure, configuration, secrets, or state. IaC is the one layer in your stack that already treats change as the unit of work. Plan, preview, apply, record. If you want governance for agents and you don’t want to build it twice, the most efficient move is to route agent changes through the same substrate your humans already use.
I made the same point from a different angle in Token Efficiency vs Cognitive Efficiency: Choosing IaC for AI Agents. An IaC platform that models your world as a graph of typed resources is a much better reasoning surface for an agent than a stack of YAML or a bash script somebody wrote on a Friday. The structure is what makes it work.
There’s a narrative floating around that AI is going to make platform engineers less relevant. I haven’t seen it hold up against an actual production environment. Every stat I’ve looked at points the other way. Gartner expects 70% of enterprises to deploy agentic AI as part of IT infrastructure and operations by 2029, up from less than 5% in 2025. LangChain’s State of Agent Engineering report already has 57% of teams running agents in production today. And Gartner projects that 80% of large software engineering orgs will have a platform team by end of 2026, up from 45% in 2022. More agents means more changes, more changes means more blast radius, and more blast radius means more need for the thing platform teams are uniquely equipped to provide.
Your classic responsibilities haven’t gone anywhere either. Golden paths, service catalogs, CI/CD, on-call rotations, all of that is still yours. Agents are an additional layer that needs the same discipline. The upside is that if your platform already runs on a mature IaC surface, you’re extending a muscle you’ve been building for years instead of growing a new one.
The developer-facing side matters too. A developer building an agent needs to know what’s available to them, needs templates that work on the first try, and needs to see what teammates have already built so they don’t start from a blank page. That’s the territory the Claude skills post and IDP Strategy: Self-Service Infrastructure That Balances Autonomy With Control cover. That’s the experience layer that makes developers actually choose your platform instead of routing around it. You need both sides working at once. The governance your security team cares about, and the experience your developers will actually reach for.
The agents your developers are shipping this week are going to outlive the experiment that started them. Some of them will become critical. At least one will cause an incident. At least one will eventually show up in an audit. All of them are going to be easier to govern if they were built on your platform from day one than if you try to wrap policy around them later.
If you want the longer view on where this is going, AI Predictions for 2026: A DevOps Engineer’s Guide is the companion piece. If you want the developer-facing version of the grounding argument, Grounded AI is what to read next.
Either way, here’s where I land. The substrate for agent governance is already running in your stack. You’ve been pointing it at human changes for years. Now point it at the agents too.
Pulumi Cloud now supports Bitbucket Cloud as a first-class VCS integration, joining GitHub, GitLab, and Azure DevOps. Connect your Bitbucket workspace to deploy infrastructure on every push, preview changes on pull requests, spin up ephemeral review stacks, and get AI-powered change summaries — all without an external CI/CD pipeline.
Connect a Bitbucket repository to a stack and infrastructure deploys automatically when you push to your configured branch. Configure path filters so only relevant file changes trigger deployments, and manage environment variables and secrets directly in Pulumi Cloud. No external CI/CD pipeline required.
Every pull request gets an infrastructure preview showing exactly what will change before merging. Neo posts AI-generated summaries explaining what the changes mean in plain language, so reviewers can understand the impact without reading resource diffs.
The integration supports two authentication methods depending on your Bitbucket plan:
Personal OAuth works with every workspace, including free plans. Authorize through the standard OAuth flow and you’re connected.
Workspace tokens are available for Premium workspaces. Generate a token with the required scopes (repository:admin, repository:write, pullrequest:write, webhook) and paste it into Pulumi Cloud for a service-account-style connection that isn’t tied to an individual user.
Both methods register webhooks automatically — no manual configuration required.
The new project wizard discovers your Bitbucket workspace, repositories, and branches so you can scaffold and deploy a new stack without leaving Pulumi Cloud. Create a new repository directly from the wizard or select an existing one and configure VCS-backed deployments in a few clicks.
1. An org admin configures the integration under Management > Version control.
2. Authorize with Bitbucket using personal OAuth or a workspace token.
3. Deploy infrastructure with first-class workflows.
For full setup details, see the Bitbucket integration docs.
The Pulumi Cloud REST API reference is now generated directly from the live OpenAPI spec at build time. Every endpoint, parameter, request body, and response schema you see on the page comes from the same spec the API itself publishes. The docs now stay in sync with the API automatically!
The previous REST API reference was a set of handwritten pages. That meant every new endpoint, renamed parameter, or revised response shape needed a matching docs PR, and in practice the pages drifted. Small inconsistencies added up: missing parameters, outdated request shapes, schemas that no longer matched what the API returned. We wanted a durable fix that keeps the docs in sync as the API grows.
Generating the reference from the OpenAPI spec closes that gap. When the API ships a change, the docs pick it up automatically the next time our docs are built.
The reference at /docs/reference/cloud-rest-api/ now includes:

- Endpoints grouped by product area — Stacks, Deployments, Environments, Organizations, Registry, Insights, AI, Workflows, and more — so you can jump straight to the part of the API you’re working with.
- Complete request and response documentation: every endpoint documents its parameters, request body, and the exact shape of what it returns, so you know what to send and what to expect back without guessing.
- Linked type definitions: when a response references another object, the type name is a link, so you can click through to its full definition instead of scrolling a lengthy reference page.
Keeping the reference in sync with the spec isn’t just a human convenience. It changes what’s reliable for AI agents that read the docs and call the API on your behalf. An agent reading a handwritten reference might see a parameter that was renamed six months ago, or miss a field the API now returns, and the call fails silently or in ways that are hard to debug. When the reference is generated from the spec, the agent is working from what the API actually accepts today.
Say you’re onboarding a new team and need to stand up their access in Pulumi Cloud. Point an agent at the REST API reference and ask it to create an sre-oncall team, add four members, and grant admin on three stacks. The agent walks the teams, memberships, and stack-permissions endpoints, builds the right sequence of calls, and executes.
The same pattern holds for bulk audits and cleanup. Ask an agent to find every stack in your org with no recent updates and tag them stale, and it can paginate correctly because the response schema matches reality. While workflows like these were technically possible before, they’re much more reliable now.
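The loop an agent derives for that is short. Here is a sketch of cursor-following pagination against the List Stacks endpoint; the field names follow the published spec, but treat the details as illustrative:

```typescript
const token = process.env.PULUMI_ACCESS_TOKEN!;

async function listAllStacks(organization: string) {
    const stacks: unknown[] = [];
    let continuationToken: string | undefined;
    do {
        const url = new URL("https://api.pulumi.com/api/user/stacks");
        url.searchParams.set("organization", organization);
        if (continuationToken) {
            url.searchParams.set("continuationToken", continuationToken);
        }
        const res = await fetch(url, {
            headers: { Authorization: `token ${token}`, Accept: "application/json" },
        });
        const page = await res.json();
        stacks.push(...page.stacks);
        // A present cursor means more pages; absent means this was the last one.
        continuationToken = page.continuationToken;
    } while (continuationToken);
    return stacks;
}
```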
The generated docs live at the same URL as the previous reference: /docs/reference/cloud-rest-api/. Bookmarks, blog links, and inbound search traffic still land on the right page. Redirects are in place for any API reference docs page that has been tweaked, renamed, or moved.
Start at the new REST API reference and browse by category. Each page links through to the request and response object schemas it uses.
If you spot anything that looks wrong, the most likely culprit is the OpenAPI spec itself — file an issue in pulumi/docs and we’ll trace it back to the source. For tag intros and structural improvements, PRs to pulumi/docs are welcome. Questions and feedback are always welcome in the Pulumi Community Slack.
[cli] Auto-detect Mercurial repository metadata for pulumi up / pulumi preview updates, mirroring existing Git support
#22618
[engine] Send Cancel RPC to plugins on host close for graceful shutdown #22569
[engine] Pass resource options to hooks through the engine #22582
[engine] Add GetDeploymentInfo to the resource monitor service
[auto/go] Add New command to Automation API
#22439
[auto/{go,nodejs,python}] Add --diff to automation api for destroy #22563
[auto/python] Add new command to Automation API
#22439
[pcl] Support for resource hooks in PCL #22365
[sdk/nodejs] Warn when a non-ComponentResource class is passed in the explicit components list to componentProviderHost
#22619
[sdk/python] Warn when a non-ComponentResource class is passed in the explicit components list to component_provider_host
#22619
[cli/import] Add support for providers to be defined in the same import file as their users #21671
[cli/policy] ESC environment support for local policy packs #22495
[sdk/nodejs] Support package.yaml when using pnpm #22491
[sdk/python] Add function decorator variants for resource and error hooks #22519
[auto/{nodejs,python}] Support --run-program for inline programs with preview_refresh/destroy
[backend/diy] Don't take the state lock for preview-only destroy and import operations in the DIY backend #22561
[engine] Fix snapshot integrity error with component/provider resources in refresh --run-program #21817
[pcl] Type list and tuple indices as integers not numbers #22592
[pcl] Builtin functions element and range take int parameters not numbers #22597
[programgen/{go,nodejs,python}] Fix some cases of name conflicts in program gen not being handled correctly #22556
[programgen/nodejs] Add necessary casts between types in generated programs #22557
[programgen/{nodejs,python}] Fix imports of camelCase modules #22536
[sdk/python] Support Input[Optional[T]] in Python runtime type unwrapping #22553
[sdkgen] Warn about modules nested under the index module which are not supported #22531
Pulumi Insights account scanning now supports every AWS partition. If your workloads run in GovCloud, China, the European Sovereign Cloud, or one of the ISO intelligence-community clouds, you can get the same resource discovery, cross-account search, and AI-assisted insights that commercial accounts already have.
- AWS Standard (Commercial)
- AWS GovCloud (US)
- AWS ISO (US)
- AWS ISOB (US)
- AWS ISOF (US)
- AWS ISOE (Europe)
- AWS European Sovereign Cloud
- AWS China
You can also exclude specific regions from discovery — useful when regions are disabled by SCPs or fall outside an audit’s scope.
Credentials are exchanged against the partition’s STS endpoint, and every scanner API call targets that partition’s regional endpoints. Discovery traffic doesn’t cross the boundary.
In the Pulumi Cloud console:

1. Go to Accounts → Create account.
2. Select AWS as the provider.
3. Under Add your configuration, pick the target partition.
4. Supply credentials via a Pulumi ESC environment. The OIDC trust policy uses the partition-appropriate ARN prefix (arn:aws-us-gov:, arn:aws-cn:, etc.).
For IAM and ESC setup, see the Insights accounts docs. Log in to Pulumi Cloud to get started.
Three community frameworks have emerged that fix the specific ways AI coding agents break down on real projects. Superpowers enforces test-driven development. GSD prevents context rot. GSTACK adds role-based governance. All three started with Claude Code but now work across Cursor, Codex, Windsurf, Gemini CLI, and more.
Pulumi uses general-purpose programming languages to define infrastructure. TypeScript, Python, Go, C#, Java. Every framework that makes AI agents write better TypeScript also makes your pulumi up better. After spending a few weeks with each one, I have opinions about when to use which.
AI coding agents are impressive for the first 30 minutes. Then things go sideways. The patterns are predictable enough that three separate teams independently built frameworks to fix them.
Context rot. Every LLM has a context window. As that window fills up, earlier instructions fade. You start a session asking for an S3 bucket with AES-256 encryption, proper ACLs, and access logging. Two hours and 200K tokens later, the agent creates a new bucket with none of those requirements. The context window got crowded and your original instructions lost weight.
No test discipline. Agents write code that looks plausible. Plausible code compiles. Plausible code even runs, for a while. But plausible code without tests is a liability. The agent adds a feature and quietly breaks two others because nothing verified the existing behavior was preserved.
Scope drift. You ask for a VPC with three subnets. The agent decides you also need a NAT gateway, a transit gateway, a VPN endpoint, and a custom DNS resolver. Helpful in theory. In practice, you now have infrastructure you never requested and barely understand. You will also pay for it monthly.
These problems are not specific to Claude Code or any particular agent. They happen with Cursor, Codex, Windsurf, and every other LLM-powered coding tool. The context window does not care which brand name is on the wrapper.
Superpowers was created by Jesse Vincent and has accumulated over 149K GitHub stars. The core idea is simple: no production code gets written without a failing test first.
The framework enforces a 7-phase workflow. Brainstorm the approach. Write a spec. Create a plan. Write failing tests (TDD). Spin up subagents to implement. Review. Finalize. Every phase has gates. You cannot skip ahead. The iron law is that production code only exists to make a failing test pass.
This sounds rigid. It is. That is the point.
Superpowers includes a Visual Companion for design decisions, which helps when you are making architectural choices that need visual reasoning. The main orchestrator manages the entire workflow from a single context window, delegating implementation work to subagents that run in isolation.
The tradeoff is that the mega-orchestrator pattern means the orchestrator itself can hit context limits on very long sessions. One big brain coordinating everything works well until the big brain fills up. For most projects, this is not an issue. For marathon sessions with dozens of files, keep it in mind.
The workflow breaks down into skills that trigger automatically:
| Skill | Phase | What it does |
|---|---|---|
| brainstorming | Design | Refines rough ideas through Socratic questions, saves design doc |
| writing-plans | Planning | Breaks work into 2-5 minute tasks with exact file paths and code |
| test-driven-development | Implementation | RED-GREEN-REFACTOR: failing test first, minimal code, commit |
| subagent-driven-development | Implementation | Dispatches fresh subagent per task with two-stage review |
| requesting-code-review | Review | Reviews against plan, blocks progress on critical issues |
| finishing-a-development-branch | Finalize | Verifies tests pass, presents merge/PR/keep/discard options |
The results speak for themselves. The chardet maintainer used Superpowers to rewrite chardet v7.0.0 from scratch, achieving a 41x performance improvement. Not a 41% improvement. 41 times faster. That is what happens when every code change has to pass a test: the agent optimizes aggressively because it has a safety net.
Superpowers works with Claude Code, Cursor, Codex, OpenCode, GitHub Copilot CLI, and Gemini CLI.
GSD (Get Shit Done) was created by Lex Christopherson and has over 51K stars. Where Superpowers focuses on test discipline, GSD attacks the context window problem directly.
The key architectural decision: GSD does not use a single mega-orchestrator. Instead, it assigns a separate orchestrator to each phase of work. Each orchestrator stays under 50% of its context capacity. When a phase completes, the orchestrator writes its state to disk as Markdown files, then a fresh orchestrator picks up where the last one left off.
Think about why this matters. With a single orchestrator, your 200K token context window is a shared resource. Instructions from hour one compete with code from hour three. GSD sidesteps this entirely. Every phase starts with a full context budget because the previous phase’s orchestrator handed off cleanly and shut down.
The state files use XML-formatted instructions because (it turns out) LLMs parse structured XML more reliably than freeform Markdown. GSD also includes quality gates that detect schema drift and scope reduction. If the agent starts cutting corners or wandering from the plan, the gates catch it.
GSD evolved from v1 (pure Markdown configuration) to v2 (TypeScript SDK), which tells you something about the level of engineering behind it. The v2 SDK gives you programmatic control over orchestration, not just static instruction files.
The tradeoff: GSD has more ceremony than the other two frameworks. For a quick script or a single-file change, the phase-based workflow is overkill. GSD earns its keep on projects that span multiple files, multiple sessions, or multiple days.
The core commands map to a phase-based workflow:
| Command | What it does |
|---|---|
| /gsd-new-project | Full initialization: questions, research, requirements, roadmap |
| /gsd-discuss-phase | Capture implementation decisions before planning starts |
| /gsd-plan-phase | Research, plan, and verify for a single phase |
| /gsd-execute-phase | Execute all plans in parallel waves, verify when complete |
| /gsd-verify-work | Manual user acceptance testing |
| /gsd-ship | Create PR from verified phase work with auto-generated body |
| /gsd-fast | Inline trivial tasks, skips planning entirely |
GSD supports the widest range of agents: 14 and counting. Claude Code, Cursor, Windsurf, Codex, Copilot, Gemini CLI, Cline, Augment, Trae, Qwen Code, and more.
GSTACK was created by Garry Tan (CEO of Y Combinator) and has over 71K stars. It takes a fundamentally different approach from the other two frameworks.
Instead of disciplining a single agent, GSTACK models a 23-person team. CEO, product manager, QA lead, engineer, designer, security reviewer. Each role has its own responsibilities, its own constraints, and its own slice of the problem.
The framework enforces five layers of constraint. Role focus keeps each specialist in their lane. Data flow controls what information passes between roles. Quality control gates ensure standards at handoff points. The “boil the lake” principle means each role finishes what it can do perfectly and skips what it cannot, rather than producing mediocre work across everything. And the simplicity layer pushes back against unnecessary complexity.
The role isolation is what makes GSTACK distinctive. The engineer role does not see the product roadmap. The QA role does not see the implementation details. Each role only receives the context it needs to do its job. This is not just about efficiency. It prevents the kind of scope creep where an agent that knows everything tries to do everything.
“Boil the lake” is my favorite principle across all three frameworks. It is the opposite of how most agents work. Agents default to attempting everything and producing something mediocre. GSTACK says: do fewer things, but do them right.
The tradeoff: 23 specialist roles feels heavy for pure infrastructure work. If you are writing Pulumi programs and deploying cloud resources with component resources, you probably do not need a product manager role or a designer role. GSTACK shines when you are building a product, not just provisioning infrastructure.
Each slash command activates a different specialist:
| Command | Role | What it does |
|---|---|---|
| /office-hours | YC partner | Six forcing questions that reframe your product before you write code |
| /plan-ceo-review | CEO | Four modes: expand scope, selective expand, hold, reduce |
| /plan-eng-review | Engineering manager | Lock architecture, map data flow, list edge cases |
| /review | Staff engineer | Find bugs that pass CI but break in production, auto-fix the obvious ones |
| /qa | QA lead | Real Playwright browser testing, not simulated |
| /ship | Release engineer | One-command deploy with coverage audit |
| /cso | Security officer | OWASP and STRIDE security audits |
GSTACK works with Claude Code, Codex CLI, OpenCode, Cursor, Factory Droid, Slate, and Kiro.
| | Superpowers | GSD | GSTACK |
|---|---|---|---|
| What it locks down | The dev process itself | The execution environment | Who decides what |
| Orchestration | Single orchestrator | Per-phase orchestrators | 23 specialist roles |
| Context management | One window | State-to-disk, fresh per phase | Role-scoped handoffs |
| Where it shines | TDD, subagent delegation, disciplined plan execution | Marathon sessions, parallel workstreams, crash recovery | Product strategy, multi-perspective review, real browser QA |
| Where it struggles | Anything beyond the build phase | Overkill for small tasks, no role separation | The actual writing-code part |
| Best for | Solo devs who need test discipline | Complex projects that span days or weeks | Founder-engineers shipping a product |
| GitHub stars | 149K | 51K | 71K |
| Agent support | 6 agents | 14+ agents | 7 agents |
For infrastructure work, GSD’s context management matters most. Long Pulumi sessions that provision dozens of resources across multiple stacks are exactly the scenario where context rot bites hardest. GSD’s phase-based approach keeps each orchestrator fresh.
Superpowers’ TDD workflow maps well to application code where unit tests are straightforward. Infrastructure testing is different. You cannot unit test whether an IAM policy actually grants the right permissions. You can test the shape of the policy with Pulumi’s testing frameworks, but the real validation happens at pulumi preview and pulumi up. Superpowers still helps here (discipline is discipline), but the TDD cycle is less natural for infra than for app code.
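As a concrete example of that shape testing, a unit test with Pulumi's runtime mocks might look like the sketch below. It assumes a Jest-style test runner, and the ../index module and its bucketPolicy export are hypothetical, not part of any framework:
import * as pulumi from "@pulumi/pulumi";
// Route resource creation through mocks so the test makes no cloud calls.
pulumi.runtime.setMocks({
    newResource: (args: pulumi.runtime.MockResourceArgs) => ({
        id: `${args.name}-id`,
        state: args.inputs,
    }),
    call: (args: pulumi.runtime.MockCallArgs) => args.inputs,
});
describe("bucket policy shape", () => {
    it("contains an explicit Deny statement", async () => {
        // "../index" and its bucketPolicy export are hypothetical.
        const infra = await import("../index");
        const doc = await new Promise<any>(resolve =>
            infra.bucketPolicy.policy.apply((p: string) => resolve(JSON.parse(p))));
        // This verifies the document's shape, not the effective permissions.
        expect(doc.Statement.some((s: any) => s.Effect === "Deny")).toBe(true);
    });
});
This is the RED half of the cycle; the GREEN half is pulumi up confirming the real resources match.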
GSTACK shines when the project has product dimensions. If you are building a SaaS platform where the infrastructure serves a product vision, GSTACK’s multi-role governance keeps the product thinking connected to the engineering work. For pure infra provisioning, the extra roles add overhead without much benefit.
My honest take: none of these is universally best. Knowing your failure mode is the real decision.
| What keeps going wrong | Try this | The reason |
|---|---|---|
| Code works today, breaks tomorrow | Superpowers | Forces every change through a failing test first |
| Quality drops after the first hour | GSD | Fresh context per phase, nothing carries over |
| You ship features nobody asked for | GSTACK | Product review before engineering starts |
| All of the above | GSTACK for direction, bolt on Superpowers TDD | No single framework covers everything yet |
These frameworks solve the “how” of agent orchestration. Skills (like the ones from Pulumi Agent Skills) solve the “what,” teaching agents the right patterns for specific technologies. Frameworks and skills complement each other. A skill tells the agent to use OIDC instead of hardcoded credentials. A framework makes sure the agent still remembers that instruction 200K tokens later.
GSD’s state-to-disk approach pairs naturally with Pulumi stack outputs. Each phase can read the previous phase’s stack outputs from the state files, so a networking phase can provision a VPC and the compute phase can reference the subnet IDs without any context window gymnastics.
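As a minimal sketch of that handoff (the stack and output names here are hypothetical), the compute phase can pick up the networking phase's results through a StackReference rather than the conversation history:
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
// Outputs exported by the networking phase's stack (hypothetical name).
const network = new pulumi.StackReference("my-org/networking/dev");
const subnetIds = network.getOutput("subnetIds") as pulumi.Output<string[]>;
// The compute phase consumes them directly from Pulumi state.
const worker = new aws.ec2.Instance("worker", {
    ami: "ami-0123456789abcdef0", // placeholder AMI
    instanceType: "t3.micro",
    subnetId: subnetIds.apply(ids => ids[0]),
});
export const workerId = worker.id;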
Superpowers’ TDD cycle maps to infrastructure validation. Write a failing test (the expected shape of your infrastructure). Run pulumi preview (red, the resources do not exist yet). Run pulumi up (green, the infrastructure matches the test). This is not a perfect analogy since infrastructure tests are broader than unit tests, but the discipline of “verify before moving on” translates directly.
You do not have to pick one framework and commit forever. Try GSD for a long multi-stack project. Try Superpowers for a focused library. See which failure mode bites you most and let that guide your choice.
[github.com/obra/superpowers](https://github.com/obra/superpowers)
[github.com/gsd-build/get-shit-done](https://github.com/gsd-build/get-shit-done)
[github.com/garrytan/gstack](https://github.com/garrytan/gstack)
All three frameworks support multiple agents. For Claude Code, the install commands are straightforward:
# Superpowers
/plugin install superpowers@claude-plugins-official
# GSD (the installer asks which agents and whether to install globally or locally)
npx get-shit-done-cc@latest
# GSTACK
git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup
Check each repository’s README for Cursor, Codex, Windsurf, and other agents.
If you want a managed experience that handles orchestration for you, Pulumi Neo is grounded in your actual infrastructure, not internet patterns. It understands your stacks, your dependencies, and your deployment history. The 10 things you can do with Neo post shows what that looks like in practice.
Pick one and give it a project. You will know within an hour whether it fixes your particular failure mode.
[cli] Detect AI agents and send in update metadata #22497
[auto/nodejs] Add "org" commands (get-default, set-default, search, search ai) to the auto-generated interface #22395
[auto/nodejs] Add "new" command to the auto-generated interface #22421
[sdk/{nodejs,python}] Add Cancel handler to Python & Node.js providers #22516
[cli/policy] Add ESC environment resolution for policy packs #22425
[auto/go] Insert --non-interactive flag before "--" positional separator to avoid misinterpretation as a positional argument #22462
[ci] Add code coverage collection for Node.js automation tools tests #22412
[pcl] Allow key indexing into a map of resources created by range #22498
[programgen/go] Fix an issue with formatted index modules not importing correctly
[cli/state] Check for Pulumi.yaml projects and backend option in state upgrade
#22483
[sdk/go] Fix Go program generation for ternary expressions inside apply callbacks, add Go predeclared identifiers to reserved words, handle optional list types in union type codegen, and fix unmarshalOutput for known outputs with null elements. #22460
[sdk/nodejs] Fix inherited input properties missing from component schema when args interface extends another interface #22446
[sdk/python] Fix inherited input properties missing from component schema when args class inherits from a base class #22484
Last year we added support for Bun as a package manager for Pulumi TypeScript projects. Today we’re taking the next step: Bun is now a fully supported runtime for Pulumi programs. Set runtime: bun in your Pulumi.yaml and Bun will execute your entire Pulumi program, with no Node.js required. Since Bun’s 1.0 release, this has been one of our most requested features.
Bun is a JavaScript runtime designed as an all-in-one toolkit: runtime, package manager, bundler, and test runner. For Pulumi users, the most relevant advantages are:
Native TypeScript support: Bun runs TypeScript directly without requiring ts-node or a separate compile step.
Fast package management: Bun’s built-in package manager can install dependencies significantly faster than npm.
Node.js compatibility: Bun aims for 100% Node.js compatibility, so the npm packages you already use with Pulumi should work out of the box.
With runtime: bun, Pulumi uses Bun for both running your program and managing your packages, giving you a streamlined single-tool experience.
To create a new Pulumi project with the Bun runtime, run:
pulumi new bun
This creates a TypeScript project configured to use Bun. The generated Pulumi.yaml looks like this:
name: my-bun-project
runtime: bun
From here, write your Pulumi program as usual. For example, to create a random password, first add the @pulumi/random package:
bun add @pulumi/random
Then, in index.ts:
import * as random from "@pulumi/random";
const password = new random.RandomPassword("password", {
length: 20,
});
export const pw = password.result;
Then deploy with:
pulumi up
Prerequisites: Bun installed on your machine, plus an up-to-date Pulumi toolchain (Bun runtime support ships with Pulumi 3.227.0 and @pulumi/pulumi 3.226.0 or later).
If you have an existing Pulumi TypeScript project running on Node.js, you can convert it to use the Bun runtime in a few steps.
In Pulumi.yaml, change the runtime field from nodejs to bun:
Before:
runtime:
name: nodejs
options:
packagemanager: npm
After:
runtime: bun
Note: when the runtime is set to bun, Bun is also used as the package manager; there's no need to configure a separate packagemanager option.
In tsconfig.json: Bun handles TypeScript differently from Node.js with ts-node. Update your tsconfig.json to use Bun's recommended compiler options:
{
"compilerOptions": {
"lib": ["ESNext"],
"target": "ESNext",
"module": "Preserve",
"moduleDetection": "force",
"moduleResolution": "bundler",
"allowJs": true,
"allowImportingTsExtensions": true,
"verbatimModuleSyntax": true,
"strict": true,
"skipLibCheck": true,
"noFallthroughCasesInSwitch": true,
"noUncheckedIndexedAccess": true,
"noImplicitOverride": true
}
}
Key differences from a typical Node.js tsconfig.json:
module: "Preserve" and moduleResolution: "bundler": Let Bun handle module resolution instead of compiling to CommonJS. The bundler resolution strategy allows extensionless imports while still respecting package.json exports, matching how Bun resolves modules in practice.
verbatimModuleSyntax: true: Enforces consistent use of ESM import/export syntax. TypeScript will flag leftover CommonJS module syntax, such as export = or import x = require(...), at compile time.
Bun makes it easy to go full ESM and it’s the recommended module format for Bun projects. Add "type": "module" to your package.json:
{
"type": "module"
}
With ECMAScript module (ESM) syntax, one thing that gets easier is working with async code. In a CommonJS Pulumi program, if you need to await a data source or other async call before declaring resources, the program must be wrapped in an async entrypoint function. With ESM and Bun, top-level await just works, so you can skip the wrapper function entirely and await directly at the module level:
import * as aws from "@pulumi/aws";
const azs = await aws.getAvailabilityZones({ state: "available" });
const buckets = azs.names.map(az => new aws.s3.BucketV2(`my-bucket-${az}`));
export const bucketNames = buckets.map(b => b.id);
If your existing program does use an async entrypoint with export =, just replace it with the ESM-standard export default:
// CommonJS (Node.js default)
export = async () => {
const bucket = new aws.s3.BucketV2("my-bucket");
return { bucketName: bucket.id };
};
// ESM (used with Bun)
export default async () => {
const bucket = new aws.s3.BucketV2("my-bucket");
return { bucketName: bucket.id };
};
Make sure you’re running @pulumi/pulumi version 3.226.0 or later:
bun add @pulumi/pulumi@latest
pulumi install
pulumi up
With this release, there are now two ways to use Bun with Pulumi:
| Configuration | Bun's role | Node.js required? |
|---|---|---|
| runtime: bun | Runs your program and manages packages | No |
| runtime: { name: nodejs, options: { packagemanager: bun } } | Manages packages only | Yes |
Use runtime: bun for the full Bun experience. The package-manager-only mode is still available for projects that need Node.js-specific features like function serialization.
The following Pulumi features are not currently supported when using the Bun runtime:
Callback functions (magic lambdas) are not supported. APIs like aws.lambda.CallbackFunction and event handler shortcuts (e.g., bucket.onObjectCreated) use function serialization, which requires the Node.js v8 and inspector modules, and those are only partially supported in Bun.
Dynamic providers are not supported. Dynamic providers (pulumi.dynamic.Resource) similarly rely on function serialization.
If your project uses any of these features, continue using runtime: nodejs. You can still benefit from Bun’s fast package management by setting packagemanager: bun in your runtime options.
Bun runtime support is available now in Pulumi 3.227.0. To get started:
Create a new project: pulumi new bun
Read the docs: TypeScript (Node.js) SDK
Report issues or share feedback on GitHub or in the Pulumi Community Slack
Thank you to everyone who upvoted, commented on, and contributed to the original feature request. Your feedback helped shape this feature, and we’d love to hear how it works for you.
Microsoft Entra ID (formerly Azure Active Directory) is Azure’s identity and access management service. Any time your application needs to authenticate with Entra ID, you create an app registration and give it a client secret that proves its identity. But those secrets expire, and if you don’t rotate them in time, your app loses access.
If you or your team manages Azure app registrations, you know that keeping track of client secrets is a constant hassle. Forgetting to rotate them before they expire can lead to broken authentication and unexpected outages. With Pulumi ESC’s azure-app-secret rotator, you can automate client secret rotation for your Azure apps, so you never have to worry about expired credentials again.
To set this up, you'll need:
An Azure App Registration
An azure-login environment
Note for OIDC users: Since Azure does not support wildcard subject matches, you will need to add a federated credential for the azure-login environment as well as each environment that imports it.
Application.ReadWrite.All Graph API permission, or the identity must be added as an Owner of the specific app registration whose secrets will be rotated.
Let's assume your azure-login environment looks like this:
# my-org/logins/production
values:
azure:
login:
fn::open::azure-login:
clientId:
tenantId:
subscriptionId:
oidc: true
Create a new environment for your rotator. If you have the existing credentials, set them in the state object so the rotator will treat them as the current credentials.
# my-org/rotators/secret-rotator
values:
appSecret:
fn::rotate::azure-app-secret:
inputs:
login: ${environments.logins.production.azure.login}
clientId:
lifetimeInDays: 180 # How long each new secret is valid (max 730 days)
state:
current:
secretId:
secretValue:
fn::secret:
The lifetimeInDays field controls how long each generated secret remains valid before it expires. Azure allows a maximum of 730 days (two years), but shorter lifetimes are recommended for better security. Make sure to set a rotation schedule that runs before the lifetime expires so your credentials are always fresh.
Azure app registrations can have at most two client secrets at any given time, so the rotator maintains a current and previous secret. When a rotation occurs, the existing current secret becomes the previous secret, and a new secret is created to take its place as the new current. This ensures a smooth rollover with no downtime, since the previous secret remains valid until the next rotation.
Once this is set up, you’re ready to go! You never need to worry about your client secrets expiring, and you will always have the latest credentials in your ESC Environment.
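If a Pulumi program should consume the rotated value, one common pattern is to import the environment into your stack's config and read it as a secret. A sketch, assuming the environment exposes the value under a hypothetical pulumiConfig key named azureAppSecret:
import * as pulumi from "@pulumi/pulumi";
const config = new pulumi.Config();
// Resolved by ESC at runtime; "azureAppSecret" is a hypothetical key name.
const appSecret = config.requireSecret("azureAppSecret");
// Use it as a resource input; Pulumi keeps it encrypted in state and masked in logs.
export const secretSet = appSecret.apply(s => s.length > 0);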
The fn::rotate::azure-app-secret rotator is available now in all Pulumi ESC environments. For more information, check out the fn::rotate::azure-app-secret documentation!
You can now run policy packs against your existing stack state without running your Pulumi program or making provider calls. The new pulumi policy analyze command evaluates your current infrastructure against local policy packs directly, turning policy validation into a fast, repeatable check.
Policy authoring and policy updates usually involve an iteration loop:
Make a policy change.
Run a policy check.
Inspect violations or remediations.
Repeat until the policy behavior matches intent.
Before this command, that loop often depended on pulumi preview or pulumi up, which can be heavier than you need when your goal is validating policy logic against known state.
With pulumi policy analyze, you can evaluate your current stack state directly and quickly.
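For example, a minimal policy pack to iterate on could look like this; a sketch using @pulumi/policy, where the rule and names are illustrative rather than part of the release:
import * as aws from "@pulumi/aws";
import { PolicyPack, validateResourceOfType } from "@pulumi/policy";
new PolicyPack("my-policies", {
    policies: [{
        name: "s3-no-public-read",
        description: "S3 buckets may not use the public-read ACL.",
        enforcementLevel: "mandatory",
        // Runs against each matching resource in the analyzed stack state.
        validateResource: validateResourceOfType(aws.s3.BucketV2, (bucket, args, reportViolation) => {
            if (bucket.acl === "public-read") {
                reportViolation("S3 buckets may not be public-read.");
            }
        }),
    }],
});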
At minimum, provide a policy pack path and optionally a stack:
pulumi policy analyze \
--policy-pack ./policy-pack \
--stack dev
You can also pass a config file for each policy pack:
pulumi policy analyze \
--policy-pack ./policy-pack \
--policy-pack-config ./policy-config.dev.json \
--stack dev
If any mandatory policy violations are found, the command exits non-zero.
If remediation policies fire, those changes are reported in output, but stack state is not modified.
For policy pack development, this command is useful as a tight local feedback loop:
Pick a representative stack (dev, staging, or a fixture stack).
Run pulumi policy analyze against that stack after each policy change.
Use the output to verify mandatory, advisory, and remediation behavior.
Repeat before publishing the policy pack or attaching it to broader policy groups.
Two output modes are especially useful:
--diff for a concise, human-readable view while iterating locally.
--json for structured output that can be consumed in scripts and CI.
This command is also a good primitive for AI-assisted policy workflows.
Because pulumi policy analyze can emit JSON and a clear process exit code, agents can use it for deterministic policy evaluation steps:
Propose or edit policy rules.
Run pulumi policy analyze --json against target stacks.
Parse violations and remediation signals.
Suggest policy fixes, config adjustments, or targeted infrastructure changes.
Re-run analysis until mandatory violations are resolved.
For example, an agent tasked with fixing a policy violation can run pulumi policy analyze --json to get a structured list of violations, identify which resources are non-compliant, generate targeted infrastructure changes, then re-run analysis to confirm the violations are resolved, all without triggering a full preview on each iteration. The same loop works for policy authoring: an agent can propose a new policy rule, test it against several representative stacks, and surface unintended violations before the rule is published.
This works well for automation because the command doesn’t execute your Pulumi program or make provider calls, so there are no side effects or runtime variance between runs. The JSON output and non-zero exit code on failure give agents a clear pass/fail contract to build on.
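A minimal driver for that contract might look like the following sketch. It assumes Node.js; the report's JSON schema isn't documented here, so it is passed through opaque beyond JSON.parse:
import { execFile } from "node:child_process";
import { promisify } from "node:util";
const run = promisify(execFile);
// Returns the structured report plus the pass/fail signal from the exit code.
async function analyze(stack: string): Promise<{ passed: boolean; report: unknown }> {
    const args = ["policy", "analyze", "--policy-pack", "./policy-pack", "--stack", stack, "--json"];
    try {
        const { stdout } = await run("pulumi", args);
        return { passed: true, report: JSON.parse(stdout) };
    } catch (err: any) {
        // Non-zero exit signals mandatory violations; stdout still carries the JSON report.
        return { passed: false, report: JSON.parse(err.stdout || "{}") };
    }
}
// An agent (or CI step) can loop on this until `passed` flips to true.
analyze("dev").then(({ passed }) => process.exit(passed ? 0 : 1));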
pulumi policy analyze is available in Pulumi v3.229.0. Upgrade with:
brew upgrade pulumi
# or
pulumi self-update
If you are authoring or tuning policy packs, start by running this command against a known stack in your environment. It is a quick way to validate policy behavior before rollout.
For implementation details, see the merged PR: pulumi/pulumi#22250.