Pulumi Blog

Today, we are announcing v1.0 of the Pulumi Service Provider: a major milestone in managing Pulumi Cloud with Pulumi itself. The provider is now generated directly from the Pulumi Cloud OpenAPI specification, unlocking a dramatically expanded pulumiservice:api/* resource surface and enabling Pulumi Cloud capabilities to become available in the provider faster than ever before.

This release also brings several major new capabilities to infrastructure as code, including fine-grained RBAC as code, Pulumi IDP as code, and audit log export as IaC. Together, these changes make the Pulumi Service Provider the most powerful and extensible way yet to manage and automate your Pulumi Cloud infrastructure.

Why this matters for users

Historically, every new Pulumi Cloud feature implied a follow-up PR in the provider before that feature could be used from a Pulumi program. The provider was always slightly behind the API it wrapped, and entirely new capability areas could take months to land.

The api/* surface changes both timelines. Because the schema is derived from the OpenAPI spec at runtime:

  1. Whole new resource families land in the provider in the same release they reach Pulumi Cloud.
  2. New fields, features, and enum values on existing resources show up across all five language SDKs soon after they appear in the spec.

What’s new in v1.0

v1.0 lifts whole capability areas of Pulumi Cloud into the api/* surface, not just incremental field additions. None of it required bespoke provider code.

  1. Fine-grained RBAC as code. Custom roles, organization membership, and team role assignments are now managed resources. For example, defining a read-only role and assigning it to a team:

    // TypeScript
    const readOnly = new ps.api.Role("readOnly", {
        orgName: "acme",
        name: "stack-reader",
        description: "Read-only access to stacks across the org.",
        uxPurpose: "role",
        details: {
            __type: "PermissionDescriptorAllow",
            permissions: ["stack:read", "stack:list"],
        },
    });

    new ps.api.teams.Role("readOnlyForPlatform", {
        orgName: "acme",
        teamName: "platform",
        roleID: readOnly.roleID,
    });

    # Python
    read_only = pulumiservice.api.Role("readOnly",
        org_name="acme",
        name="stack-reader",
        description="Read-only access to stacks across the org.",
        ux_purpose="role",
        details={
            "__type": "PermissionDescriptorAllow",
            "permissions": ["stack:read", "stack:list"],
        })

    pulumiservice.api.teams.Role("readOnlyForPlatform",
        org_name="acme",
        team_name="platform",
        role_id=read_only.role_id)

    // Go
    readOnly, _ := api.NewRole(ctx, "readOnly", &api.RoleArgs{
        OrgName:     pulumi.String("acme"),
        Name:        pulumi.String("stack-reader"),
        Description: pulumi.String("Read-only access to stacks across the org."),
        UxPurpose:   pulumi.String("role"),
        Details: pulumi.Map{
            "__type":      pulumi.String("PermissionDescriptorAllow"),
            "permissions": pulumi.StringArray{pulumi.String("stack:read"), pulumi.String("stack:list")},
        },
    })

    teams.NewRole(ctx, "readOnlyForPlatform", &teams.RoleArgs{
        OrgName:  pulumi.String("acme"),
        TeamName: pulumi.String("platform"),
        RoleID:   readOnly.RoleID,
    })

    // C#
    var readOnly = new Ps.Api.Role("readOnly", new()
    {
        OrgName = "acme",
        Name = "stack-reader",
        Description = "Read-only access to stacks across the org.",
        UxPurpose = "role",
        Details = ImmutableDictionary.CreateRange(new[]
        {
            new KeyValuePair<string, object>("__type", "PermissionDescriptorAllow"),
            new KeyValuePair<string, object>("permissions", new[] { "stack:read", "stack:list" }),
        }),
    });

    new Ps.Api.Teams.Role("readOnlyForPlatform", new()
    {
        OrgName = "acme",
        TeamName = "platform",
        RoleID = readOnly.RoleID,
    });

    // Java
    var readOnly = new Role("readOnly", RoleArgs.builder()
        .orgName("acme")
        .name("stack-reader")
        .description("Read-only access to stacks across the org.")
        .uxPurpose("role")
        .details(Map.of(
            "__type", "PermissionDescriptorAllow",
            "permissions", List.of("stack:read", "stack:list")))
        .build());

    new com.pulumi.pulumiservice.api_teams.Role("readOnlyForPlatform",
        com.pulumi.pulumiservice.api_teams.RoleArgs.builder()
            .orgName("acme")
            .teamName("platform")
            .roleID(readOnly.roleID())
            .build());

    # YAML
    resources:
      readOnly:
        type: pulumiservice:api:Role
        properties:
          orgName: acme
          name: stack-reader
          description: Read-only access to stacks across the org.
          uxPurpose: role
          details:
            __type: PermissionDescriptorAllow
            permissions:
              - stack:read
              - stack:list
      readOnlyForPlatform:
        type: pulumiservice:api/teams:Role
        properties:
          orgName: acme
          teamName: platform
          roleID: ${readOnly.roleID}

  2. Pulumi IDP as code. services:Service makes the Pulumi IDP catalog manageable from your Pulumi programs, and it surfaced in the same release that IDP shipped in Pulumi Cloud. Platform teams can publish service definitions as code rather than only through the IDP console.

  3. Audit-log export as IaC. AuditLogExportConfiguration brings audit-log export sinks under Pulumi management with a real destroy path.

How it works

Pulumi Cloud’s OpenAPI document (published at https://api.pulumi.com/api/openapi/pulumi-spec.json) is embedded in the provider binary at build time, so the provider version you pin is the API surface you get. Preview and update are deterministic, and a version released today will still behave the same way years from now. Alongside the spec, the runtime loads a small companion metadata file that captures the Pulumi-specific semantics OpenAPI can’t express: which endpoints pair into a single resource, what a resource’s ID looks like, and which response fields are secrets that arrive exactly once at create time. That metadata is what lets api/* resources behave as expected.

Most of that metadata is auto-derived by a scaffolder, but the editorial layer, including resource descriptions, examples, and the v0 aliases that make migration safe, stays handmade. Any human override is pinned across regeneration so a future spec change can’t quietly overwrite it. The language SDKs are still generated against the runtime schema, so new fields and enum values reach typed SDKs in all five languages the moment the spec ships.
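
To make the role of that companion metadata concrete, here is a minimal sketch of what one entry could look like and how a runtime might use it. Everything in it is an assumption for illustration: the field names (`createPath`, `readPath`, `deletePath`, `idTemplate`, `secretOutputs`), the example paths, and both helper functions are hypothetical and are not taken from the provider's actual metadata format.

```typescript
// Hypothetical shape for one entry in the companion metadata file.
// None of these field names come from the provider; they only illustrate
// the three kinds of information the post says OpenAPI cannot express.
interface ResourceMetadata {
    // Endpoints that pair into a single resource.
    createPath: string;
    readPath: string;
    deletePath: string;
    // What the resource's ID looks like, as a "{property}" template.
    idTemplate: string;
    // Response fields that are secrets and arrive only at create time.
    secretOutputs: string[];
}

// Illustrative entry; the paths and property names are made up for this sketch.
const exampleToken: ResourceMetadata = {
    createPath: "/orgs/{orgName}/tokens",
    readPath: "/orgs/{orgName}/tokens/{tokenId}",
    deletePath: "/orgs/{orgName}/tokens/{tokenId}",
    idTemplate: "{orgName}/{tokenId}",
    secretOutputs: ["tokenValue"],
};

// Fill a "{name}" template from a property bag, failing loudly on gaps.
function renderTemplate(template: string, props: Record<string, string>): string {
    return template.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
        const value = props[key];
        if (value === undefined) {
            throw new Error(`missing property: ${key}`);
        }
        return value;
    });
}

// Split a create response into plain outputs and secret outputs.
function splitSecrets(
    meta: ResourceMetadata,
    response: Record<string, string>,
): { plain: Record<string, string>; secret: Record<string, string> } {
    const plain: Record<string, string> = {};
    const secret: Record<string, string> = {};
    for (const key of Object.keys(response)) {
        const target = meta.secretOutputs.indexOf(key) !== -1 ? secret : plain;
        target[key] = response[key];
    }
    return { plain, secret };
}
```

With an entry like this, the same template helper can produce both the request path and the resource ID, and the secret split only has to run once, on the create response.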

What the api namespace covers

The api namespace already spans most of Pulumi Cloud’s resource model.

For resources that have an ancestor under pulumiservice:index:*, the mapping lives in docs/v0-api-coverage.md. That file is auto-generated, so it stays in sync. Each api/* resource ships hand-maintained per-language examples in TypeScript, Python, Go, C#, Java, and YAML.

What to know before adopting the preview

The pulumiservice:api/* resource surface is in preview. Resource shapes and module layout may change before GA.

The existing pulumiservice:index:* resources remain supported and are not being deprecated as part of v1.0. Migration to api/* is opt-in via Pulumi aliases.

Try it

If you want to take the expanded provider for a spin:

  1. The Pulumi Registry page for pulumiservice has install instructions for every language.
  2. The examples/api/ directory has runnable programs for each resource, in every supported language.
  3. The pulumi-pulumiservice repo is open source if you want to read the runtime, the embedded spec, or the metadata file directly.

Feedback during the preview is especially valuable. Please open an issue in the pulumi-pulumiservice repo if you run into any problems.

Anthropic shipped a piece earlier this month called How Claude Code Works in Large Codebases. I have not read anything more useful about coding agents this year. The core claim, in their words: “the ecosystem built around the model—the harness—determines how Claude Code performs more than the model alone.” In my phrasing: in a real codebase, the model is the smaller variable. The layer of context and tooling you wire around the agent matters more than which version of Sonnet or Opus is behind it.

The post stays high-level, which is the right move for a launch piece. What I want to do here is land it. Same seven pieces, but with the wiring you would actually put in a repo, in the order I would put it.

How Claude Code navigates without an index

Anthropic’s writeup says Claude Code works from the live codebase and does not require a codebase index to be built, maintained, or uploaded. The agent navigates the way an engineer would, with grep, find, ls, file reads, and reference-following. Anthropic calls this agentic search, and the upside is obvious: no separate index exists for you to keep fresh.

The downside is also obvious. An engineer who has never seen your repo and only has shell tools will flounder if you drop them in the root with no map. That is your agent on day one. Everything that follows is about giving it the map.

The AI layer in seven pieces

Every codebase used to have two artifacts engineers cared about: the code and the tests. A third exists now. Call it the AI layer, or the harness, or whatever you want. This layer is the set of context and tools you give your coding agent to operate in this specific repo. Anthropic breaks it into seven pieces, and each one solves a different scaling problem.

Anthropic gives each piece a role: CLAUDE.md is the foundation, hooks do self-improvement, skills are progressive disclosure, plugins handle distribution, LSP gives navigation, MCP is extension, subagents split exploration from editing. They are not equal in usage either. CLAUDE.md is read at the start of each session and stays in context for the duration. The others fire when relevant.

Lean and layered CLAUDE.md

The single biggest mistake I see is a root CLAUDE.md that has grown into a small book. Two thousand lines of conventions for parts of the repo the current task will never touch. Every session pays the tax. Anthropic’s own guidance is to keep these files focused on what applies broadly so they do not become a drag on performance, and you can feel that drag in practice: the agent gets cautious, slow, and oddly literal.

Keep the root file lean. What is this repo, broadly. The tech stack. The commands the agent will need (make test, make lint, how to run the dev server). General conventions that apply everywhere. That is most of what belongs there.

Local conventions go in subdirectory CLAUDE.md files. When the agent starts in a subdirectory, Claude Code walks upward from the working directory and loads every CLAUDE.md it finds on the way to the repo root, so root context is never lost and intermediate layers stack in the order you would expect. Claude Code can also discover files below the current working directory when it reads files in those subdirectories. That means services/api/CLAUDE.md only joins the session when the work reaches that service. Same for services/billing/, the frontend, the data layer.
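
The upward walk described above can be sketched in a few lines. This is an illustration of the lookup order only, not Claude Code's implementation: the function name `claudeMdChain`, the injected `exists` predicate, and the choice to return the root file first are all assumptions made for the example.

```typescript
// Collect the CLAUDE.md files between the working directory and the repo root.
// `exists` is injected so the sketch runs without touching a real filesystem.
// Paths are assumed to be absolute and POSIX-style ("/" separators).
function claudeMdChain(
    repoRoot: string,
    workingDir: string,
    exists: (path: string) => boolean,
): string[] {
    const root = repoRoot.replace(/\/+$/, "");
    const cwd = workingDir.replace(/\/+$/, "");
    if (cwd !== root && !cwd.startsWith(root + "/")) {
        throw new Error("working directory is outside the repo root");
    }

    const found: string[] = [];
    let dir = cwd;
    while (true) {
        const candidate = dir + "/CLAUDE.md";
        if (exists(candidate)) {
            found.push(candidate);
        }
        if (dir === root) {
            break;
        }
        // Step up one directory.
        dir = dir.slice(0, dir.lastIndexOf("/"));
    }

    // Assumed ordering: broadest context (root) first, most local last.
    return found.reverse();
}

// Usage: only the files on the path from root to services/api are returned.
const files = new Set([
    "/repo/CLAUDE.md",
    "/repo/services/api/CLAUDE.md",
    "/repo/services/billing/CLAUDE.md",
]);
const chain = claudeMdChain("/repo", "/repo/services/api", (p) => files.has(p));
console.log(chain);
```

Starting in services/api picks up the root file and the API file but never the billing one, which is the scoping behavior the layered layout relies on.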

If you already know the task is scoped to one service, start the agent in that subdirectory. The working directory becomes the focus, and the agent stays out of unrelated code unless you tell it otherwise. Most of the time, you know.

Two more cheap wins live in the same neighborhood. Scope the make test and make lint commands so the subdirectory version runs only the slice the agent is working in, instead of the whole repo on every change. And version-control your exclusion rules in .claude/settings.json so the agent never reads dist/, generated SDKs, or vendored code. Every file the agent skips is tokens you keep for the work that matters. If your directory layout is unconventional or has historical baggage, Anthropic also suggests adding a short codebase map to the root CLAUDE.md so the agent has somewhere to anchor.
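
As a rough sketch of the exclusion idea, the snippet below builds a settings fragment from a list of directories. It assumes the exclusions are expressed as `Read(...)` entries under `permissions.deny` in .claude/settings.json; check that against the settings reference for your Claude Code version. The helper name `denyReadRules` and the `sdk/generated` path are hypothetical.

```typescript
// Turn directory names into deny rules of the assumed form "Read(./<dir>/**)".
function denyReadRules(dirs: string[]): string[] {
    return dirs.map((dir) => {
        // Normalize "./dist/", "dist/", and "dist" to the same bare name.
        const clean = dir.replace(/^\.?\/+|\/+$/g, "");
        return `Read(./${clean}/**)`;
    });
}

// Directories the agent should never read; "sdk/generated" is a placeholder
// for wherever your generated SDKs live.
const settingsFragment = {
    permissions: {
        deny: denyReadRules(["dist/", "sdk/generated", "./vendor/"]),
    },
};

// Print the fragment so it can be merged into .claude/settings.json by hand.
console.log(JSON.stringify(settingsFragment, null, 2));
```

Generating the rules from one list keeps the checked-in settings file and any matching lint or test scoping in agreement.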

Hooks that make the harness self-improving

Most teams use hooks as guardrails. Block edits in vendor/, refuse to delete migrations, kill the run if a secret turns up in a diff. That is fine and you should do it. But hooks have a second life that almost no one uses, and that second life is the more interesting one.
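
For the guardrail kind, the decision logic is usually tiny. The sketch below shows only that logic, under stated assumptions: that a tool-use hook receives JSON with `tool_name` and `tool_input.file_path`, and that the surrounding script would print the reason to stderr and exit with code 2 to block the call. The function name `blockReason` and the constant names are hypothetical, and the stdin and exit-code wiring is deliberately left out.

```typescript
// Fields of the hook input this sketch relies on (assumed shape).
interface HookInput {
    tool_name: string;
    tool_input: { file_path?: string };
}

// Tools that modify files, and path prefixes that must stay untouched.
const EDIT_TOOLS = ["Edit", "Write", "MultiEdit"];
const PROTECTED_PREFIXES = ["vendor/"];

// Return a human-readable reason when the call should be blocked, else null.
function blockReason(input: HookInput, projectDir: string): string | null {
    if (EDIT_TOOLS.indexOf(input.tool_name) === -1) {
        return null; // Reads and other tools are not this guardrail's concern.
    }
    const filePath = input.tool_input.file_path;
    if (filePath === undefined) {
        return null;
    }

    // Make the path relative to the project so prefixes compare cleanly.
    const root = projectDir.replace(/\/+$/, "") + "/";
    const relative = filePath.startsWith(root)
        ? filePath.slice(root.length)
        : filePath.replace(/^\.\//, "");

    for (const prefix of PROTECTED_PREFIXES) {
        if (relative.startsWith(prefix)) {
            return `edits under ${prefix} are blocked by policy`;
        }
    }
    return null;
}
```

Keeping the decision in a pure function like this makes the guardrail easy to unit test before it is registered against a real event.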

Both kinds register the same way, in .claude/settings.json, against named events Claude Code fires during a session:

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "uv run --directory \"$CLAUDE_PROJECT_DIR\" python .claude/hooks/session_start_context.py"
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "uv run --directory \"$CLAUDE_PROJECT_DIR\" python .claude/hooks/propose_claude_md.py"
          }
        ]
      }
    ]
  }
}
```

A SessionStart hook fires before the agent has done anything. Whatever the script prints to stdout is injected straight into the session as context, so you can preload the things the agent would otherwise have to spend a turn discovering: the current branch, the uncommitted diff, the last few commits. For a larger team you might fetch the Confluence or Notion page that owns the directory the engineer is working in. Every developer starts each session pre-oriented, with no manual setup.

```python
"""SessionStart hook: prints orientation Claude reads as session context."""
import os, subprocess
from pathlib import Path

root = Path(os.environ.get("CLAUDE_PROJECT_DIR", "."))

def git(*args):
    out = subprocess.run(["git", *args], cwd=root, capture_output=True, text=True)
    return out.stdout.strip()

print("# Orientation\n")
print(f"## Branch\n{git('rev-parse', '--abbrev-ref', 'HEAD')}\n")
print(f"## Uncommitted changes\n{git('status', '--porcelain') or '(clean)'}\n")
print(f"## Recent commits\n{git('log', '-5', '--oneline')}\n")
```

The Stop hook is the more interesting one. It fires when the agent finishes its turn. At that moment the session context is still fresh, the diff is still small, and you have a free shot at a question nobody asks: did anything I changed invalidate the rules I wrote down? Spawn a separate headless Claude session, hand it the diff and the relevant CLAUDE.md files, ask it to propose updates, and write the result to a markdown review file. You read it when you are ready. The CLAUDE.md files stop going stale on their own.

The trick is to make the hook itself cheap and dispatch the LLM call in the background, so the end of every turn does not block on a reflection:

```python
"""Stop hook: dispatch a headless Claude reflection in the background."""
import os, subprocess, sys
from pathlib import Path

# The reflector spawns its own headless Claude, whose Stop hook lands back
# here. The lock prevents infinite recursion.
if os.environ.get("REFLECT_LOCK"):
    sys.exit(0)

root = Path(os.environ.get("CLAUDE_PROJECT_DIR", "."))
diff = subprocess.run(
    ["git", "diff", "HEAD"], cwd=root, capture_output=True, text=True
).stdout
if not diff.strip():
    sys.exit(0)

env = {**os.environ, "REFLECT_LOCK": "1"}
subprocess.Popen(
    ["uv", "run", "python", ".claude/hooks/reflect_claude_md.py"],
    cwd=root, env=env,
    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
)
```

reflect_claude_md.py is the part that calls a headless claude against the diff and writes .claude/claude-md-review.md. You can grow it from twenty lines to two hundred without ever blocking the agent.
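For orientation, a twenty-line-ish version of that reflector might look like the sketch below. It assumes the `claude` CLI is on the PATH and that `claude -p` runs one headless prompt and reads extra context from stdin; the prompt wording, the `build_context` helper, and the choice to include only tracked `CLAUDE.md` files are illustrative assumptions, not the author's script.

```python
"""Reflector sketch: ask a headless Claude whether CLAUDE.md files need updates."""
import os
import shutil
import subprocess
from pathlib import Path

INSTRUCTION = (
    "Below are the CLAUDE.md files for this repo and the current git diff. "
    "Propose updates to any rule the diff invalidates or leaves incomplete. "
    "Answer in markdown. Do not edit files."
)


def build_context(diff: str, claude_md: dict[str, str]) -> str:
    """Assemble the text piped to the headless session on stdin."""
    parts = [f"## {path}\n{body.strip()}\n" for path, body in sorted(claude_md.items())]
    parts.append(f"## git diff HEAD\n{diff.strip()}\n")
    return "\n".join(parts)


def main() -> None:
    # Fail quietly if either tool is missing; a hook should never break a turn.
    if shutil.which("git") is None or shutil.which("claude") is None:
        return
    root = Path(os.environ.get("CLAUDE_PROJECT_DIR", "."))

    def git(*args: str) -> str:
        return subprocess.run(
            ["git", *args], cwd=root, capture_output=True, text=True
        ).stdout

    diff = git("diff", "HEAD")
    if not diff.strip():
        return
    # Tracked files only, so vendored or ignored copies stay out of the prompt.
    files = {
        p: (root / p).read_text()
        for p in git("ls-files").splitlines()
        if Path(p).name == "CLAUDE.md" and (root / p).is_file()
    }
    # REFLECT_LOCK is inherited from the dispatching hook, so the Stop hook
    # of this nested session exits immediately instead of recursing.
    result = subprocess.run(
        ["claude", "-p", INSTRUCTION],
        input=build_context(diff, files),
        cwd=root, capture_output=True, text=True,
    )
    if result.returncode == 0 and result.stdout.strip():
        (root / ".claude" / "claude-md-review.md").write_text(result.stdout)


if __name__ == "__main__":
    main()
```

Piping the diff through stdin rather than the command line keeps large diffs from hitting argument-length limits, and overwriting a single review file means there is only ever one thing to read.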

The pattern that ties the two together: hooks let the harness improve itself in the background while you do the actual work.

Path-scoped skills

Skills are where the agent learns how to do a thing. CLAUDE.md is conventions (“every route is registered here”). Skills are workflows (“here is how you add a new route in this repo, end to end”). The two overlap, but the framing keeps me honest: rules in CLAUDE.md, recipes in skills.

The piece of the skills system most teams miss is the path scope. A skill can declare which directories it activates in. A create-api-endpoint skill that only loads when the agent is editing under services/api/ is invisible the rest of the time. With dozens of skills in a real repo, scoping is the difference between a useful library and a wall of irrelevant prompts.

The mental model: progressive disclosure for expertise. Most knowledge in a large codebase is local. Load it locally.

Symbol-level search through LSP and MCP

grep is fine until it isn’t. Past six-digit line counts, plain string search gets slow, returns too much, and burns tokens reading files the agent did not need to open. You also lose what every IDE has done for decades: jump-to-definition, find-references, hover-for-types.

You can give the agent the same navigation. Run a language server locally, wrap it in a small MCP server, expose two or three tools: where_is, find_references, goto_definition. The agent now searches by symbol, not by string. A request like “find every place monthly_total_cents is referenced” returns one definition and the actual references, instead of fifty grep hits that mention the substring in unrelated comments.
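The difference between string search and symbol search is easy to see on a toy scale. The sketch below is not a language server and not an MCP tool; it uses Python's standard `ast` module as a stand-in to show what a `find_references`-style lookup returns compared with a substring match. The `find_symbol` function and the sample source are hypothetical.

```python
import ast


def find_symbol(source: str, name: str) -> dict[str, list[int]]:
    """Return line numbers where `name` is defined and where it is referenced."""
    definitions, references = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name == name:
                definitions.append(node.lineno)
        elif isinstance(node, ast.Name) and node.id == name:
            # Store context is an assignment (a definition); anything else is a use.
            target = definitions if isinstance(node.ctx, ast.Store) else references
            target.append(node.lineno)
        elif isinstance(node, ast.Attribute) and node.attr == name:
            references.append(node.lineno)
    return {"definitions": sorted(definitions), "references": sorted(references)}


SAMPLE = '''\
# monthly_total_cents used to be a float; see the billing postmortem.
def monthly_total_cents(items):
    return sum(i.cents for i in items)

def invoice(items):
    note = "monthly_total_cents is rounded down"
    return {"total": monthly_total_cents(items), "note": note}
'''

hits = find_symbol(SAMPLE, "monthly_total_cents")
grep_hits = [n for n, line in enumerate(SAMPLE.splitlines(), 1) if "monthly_total_cents" in line]

print(hits)       # {'definitions': [2], 'references': [7]}
print(grep_hits)  # [1, 2, 6, 7]
```

The substring search also returns the comment on line 1 and the string literal on line 6. A real language server does the same filtering across files and languages, with type information on top.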

This is also where bigger orgs invest. Custom MCP servers that expose internal search systems, the code-ownership graph, the design-doc index. The patterns are the same; the targets are domain-specific. The point is that the agent does not have to brute-force its way through your repo when you already have better tools for finding things.

Image: Anthropic, How Claude Code Works in Large Codebases.

Subagents for exploration

The rule I follow: split exploration from editing. A subagent runs in its own context window. You ask which files implement the billing webhook flow, or what the user model looks like across services. It does the digging, and only the summary comes back to your primary session.

The win is context budget, not parallelism. Exploration is wasteful by nature. The agent reads forty files to find the three that matter, and most of those forty get thrown away. If that happens in your primary session, your editing turns start with a context window already half full of noise. If it happens in a subagent, the noise stays there. You get the answer.

Use the built-in Explore subagent liberally. Custom subagents earn their place when you have a workflow specific enough that a generic explorer is the wrong tool. The file shape is small: a single markdown file under .claude/agents/, a short frontmatter block, and a prompt body. name, description, tools, and model are enough to start:

```markdown
---
name: explorer
description: Read-only repo explorer. Map a service or package without burning the main session's context, then return findings.
tools: Read, Grep, Glob
model: sonnet
---

You are a read-only explorer. The parent agent will hand you one service or
package to map. Read its `CLAUDE.md` if there is one, then trace entry points,
the public surface, and dependencies. Return findings as your final response.
No edits.
```

Restricting tools to read-only is the load-bearing line. The model only sees the tools you expose, so an explorer subagent without Write or Edit has nothing to call when it gets tempted, even if the prompt body forgot to say so. Treat that as a strong default. If you need a hard guarantee, layer a PreToolUse hook on top.
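A PreToolUse guard of that kind can be very small. The sketch below assumes the hook receives a JSON payload on stdin with a `tool_name` field and that exit code 2 blocks the call and passes stderr back to the agent. The `EXPLORE_ONLY` environment variable is a hypothetical switch you would set when launching a session meant only for exploration; scoping the guard to one named subagent depends on what the payload exposes and is left out here.

```python
"""PreToolUse hook sketch: refuse write-capable tools in exploration sessions."""
import json
import os
import sys

# Tool names assumed to be the write-capable built-ins; adjust to your setup.
WRITE_TOOLS = {"Write", "Edit", "MultiEdit", "NotebookEdit"}


def should_block(event: dict, env: dict) -> bool:
    """Block when the hypothetical EXPLORE_ONLY switch is on and the tool writes."""
    return env.get("EXPLORE_ONLY") == "1" and event.get("tool_name") in WRITE_TOOLS


def main() -> int:
    try:
        event = json.load(sys.stdin)
    except json.JSONDecodeError:
        return 0  # Unreadable payload: do not interfere with the session.
    if should_block(event, dict(os.environ)):
        print(f"{event['tool_name']} is disabled in exploration sessions.", file=sys.stderr)
        return 2  # Assumed convention: exit code 2 blocks the tool call.
    return 0


if __name__ == "__main__":
    code = main()
    if code:
        sys.exit(code)
```

Keeping the decision in a pure function (`should_block`) makes the guard easy to unit-test without starting a session.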

Don’t let it rot

The harness is not a one-time setup. Models improve, and rules written for last year’s model often constrain this year’s. A note like “always split refactors into single-file changes” might have saved you in 2024 and might block a beneficial cross-file edit in 2026. Anthropic suggests reviewing your CLAUDE.md files every three to six months, or whenever performance feels like it has plateaued after a major model release. The stop-hook reflection gives you a head start. The rest is on you.

Assign an owner

The last piece is not technical. The teams that get value out of Claude Code at scale have someone who owns the harness. A small platform-engineering team, or one DRI, or a hybrid PM/engineer doing it half-time. Their job is the same shape as owning a CI pipeline: write the conventions, build the skills, run the LSP wrapper, version the hooks, evangelize what works, retire what does not.

Plugins are the distribution vehicle. A good harness that lives in one engineer’s dotfiles stays tribal. The same harness packaged as a plugin, or published through a private marketplace, is how a team of five hundred ends up running the same skills, the same MCP servers, and the same hooks without anyone having to remember to copy a config.

The pattern that fails: ship Claude Code to the org on a Friday, hope adoption goes viral, watch every team grow its own slightly different version of CLAUDE.md for six months. The pattern that works: a quiet build-out period, a small set of approved skills, a working plugin or two, a documented governance story, then broad access.

Treat the harness like infrastructure.

Where to start

The order that has worked for me, in any repo:

  1. Trim the root CLAUDE.md until it fits on one screen. Move the rest into subdirectories.
  2. Add a Stop hook that proposes updates to those CLAUDE.md files in headless mode.
  3. Convert your three most common repeated tasks into path-scoped skills.
  4. Run a language server behind an MCP server. Stop searching by string.
  5. Get comfortable dispatching exploration to subagents.

Most teams will spend a week on step one alone and find the agent is already noticeably sharper. The rest compounds. I have written more on the agent-tooling shift this is part of in How Building AI Agents Has Changed in 2026, and on the workflow side in The Claude Skills I Actually Use for DevOps and Superpowers, GSD, and GSTACK.

The model will keep getting better. The harness is the work.

The phrase “AI infrastructure” now means two different things. One is the GPUs, schedulers, and MLOps platforms that exist to run AI workloads. The other is AI that runs infrastructure: agents and assistants that generate, deploy, and govern cloud resources on your behalf. They’re different markets with different vendors, and most teams need to think about both.

The pressure to think about both is real. McKinsey research puts the productivity lift from generative AI in software development at 20–45%, which is great for application teams and a problem for platform teams trying to keep up with the resulting feature flow. Infrastructure investment is climbing on both fronts: more spend on the compute that trains and serves models, more spend on AI tools that manage everything else.

This guide covers both categories: the compute and MLOps stack in Part 1, and AI-powered infrastructure management in Part 2, where the more interesting product shift is happening.

AI infrastructure tools overview

Tools for building AI infrastructure

  1. CoreWeave: GPU cloud built for AI workloads
  2. Lambda Labs: straightforward GPU cloud for research and startups
  3. Modal: serverless GPU compute
  4. Weights & Biases: ML experiment tracking and model management
  5. MLflow: open-source ML lifecycle platform
  6. Hyperscaler AI platforms: AWS SageMaker, Google Vertex AI, Azure ML

AI-powered infrastructure management tools

  1. Pulumi Neo: agentic AI with policy automation
  2. Firefly AIaC: asset codification and IaC generation
  3. env0 Cloud Compass: multi-IaC insights and analysis
  4. Spacelift AI: run explanation and troubleshooting
  5. Crossplane with Upbound: Kubernetes-native infrastructure
  6. General-purpose code assistants: Copilot, Claude Code, Cursor, Gemini
  7. AWS Application Composer: visual serverless builder

Quick picks

If you only have two minutes:

  • Enterprise compliance: Pulumi Neo. Executes changes (not only suggestions), ships with policy packs for CIS, HITRUST, NIST, and PCI DSS, and works with Terraform, CloudFormation, and resources created by hand.
  • Serious GPU compute: CoreWeave. Purpose-built for AI workloads, deep NVIDIA partnership, and prices that generally undercut the hyperscalers.
  • Best developer experience for ML: Modal. Decorate a Python function, get a GPU, pay by the second.
  • Open-source MLOps: MLflow. No vendor lock-in, runs anywhere, plays well with everything.

What is AI infrastructure?

The term covers two distinct categories that share almost no vendors.

Infrastructure for AI is the compute, storage, and orchestration that AI workloads run on. Training a large model is not a normal cloud workload: it wants thousands of GPUs talking to each other over fat, low-latency networks for weeks at a time. Inference is different again: lower latency, smarter batching, different hardware. General-purpose cloud was not designed for either case, which is why specialized GPU clouds and MLOps platforms exist.

AI-powered infrastructure management is the inverse: AI tools that manage cloud infrastructure. They generate IaC, run deployments, detect drift, and remediate policy violations. The pitch is that modern infrastructure (multi-cloud, containers, microservices, regulated workloads) has gotten too complex for humans to manage by hand and too varied for scripted automation to keep up with.

Most organizations end up needing both: somewhere to run their ML workloads, and something to keep the rest of the cloud sane.

Part 1: Tools for building AI infrastructure

These are the platforms you run AI and ML workloads on: GPU clouds for raw compute, MLOps platforms for the lifecycle around them.

CoreWeave

CoreWeave is the GPU cloud that broke out of the AI hype cycle into a real public company. It went public in 2025, signed a multi-billion-dollar capacity deal with OpenAI, and acquired Weights & Biases. Its thesis from day one was that AI workloads deserve infrastructure designed for AI workloads, not a GPU SKU bolted onto a general-purpose cloud.

  • License: Proprietary
  • Best for: Large-scale training and high-throughput inference; teams that need dedicated GPU capacity with first access to new NVIDIA hardware
  • Strengths: GPU infrastructure designed for AI; Kubernetes-native; direct NVIDIA partnership; handles distributed training at scale
  • Watch out for: Smaller global footprint than AWS/GCP/Azure; not a general-purpose cloud, so if you need RDS, S3, and a managed Kafka in the same provider, this isn’t it

Lambda Labs

Lambda has been the approachable GPU cloud for a long time. Environments come pre-configured with PyTorch and TensorFlow, and you can be running on an H100 in about as long as it takes to copy your SSH key.

  • License: Proprietary
  • Best for: Research teams, startups, and individual practitioners who want GPUs without a configuration tax
  • Strengths: Straightforward to start on; pre-configured deep learning environments; competitive on-demand pricing; strong learning resources
  • Watch out for: Smaller scale than CoreWeave or the hyperscalers; availability gets tight during demand spikes

Modal

Modal’s pitch is that you write a Python function, decorate it, and Modal handles the GPU. No capacity planning, no idle instances burning money overnight, no Dockerfile to maintain.

  • License: Proprietary
  • Best for: Variable ML workloads where reserved capacity would sit idle; data scientists who’d rather not learn Kubernetes
  • Strengths: Strong developer experience; serverless GPUs with automatic scaling; pay-per-second pricing; cold starts are fast for what they are
  • Watch out for: You give up infrastructure control. Not ideal for long training jobs that need reserved hardware or strict configuration requirements.

Weights & Biases

Weights & Biases is the de facto standard for ML experiment tracking and model management, integrated with essentially every framework and cloud you’d plausibly use. CoreWeave acquired the company in 2025, which has accelerated the joint roadmap but raised some neutrality questions for teams that prefer their tooling cloud-agnostic.

  • License: Proprietary with a free tier
  • Best for: ML teams that need shared experiment tracking, model versioning, and reporting
  • Strengths: Industry-leading experiment tracking and visualization; comprehensive model registry; strong team collaboration; broad integration surface
  • Watch out for: Costs scale quickly past the free tier; some teams self-host alternatives for data residency reasons

MLflow

MLflow is the leading open-source MLOps platform: experiment tracking, packaging, registry, and serving, with no lock-in. Originally built at Databricks, it’s now a broad open-source ecosystem with managed offerings from multiple vendors (including Databricks and the major clouds).

  • License: Apache 2.0
  • Best for: Teams that want MLOps without a vendor, or that want the option to start managed and self-host later
  • Strengths: Open source; covers the full ML lifecycle; runs locally, on-prem, or managed; broad framework support
  • Watch out for: Self-hosting carries the usual operational tax; commercial alternatives have stronger collaboration UX out of the box

Hyperscaler AI platforms

The major clouds all sell end-to-end ML platforms. Each leads on the dimensions that line up with its parent cloud (Vertex for Google’s models and TPUs, SageMaker for AWS-native data pipelines, Azure ML for Microsoft-stack integration), but the wider integration with the rest of the cloud is the deciding factor.

  • AWS SageMaker: end-to-end ML on AWS, deeply integrated with S3 and Glue, with first-class connections to Lambda for serverless inference and to the rest of the AWS data stack. The default pick if your data already lives in AWS.
  • Google Vertex AI: Google’s ML stack, including TPUs for workloads that need them, plus access to Google’s foundation models. Strongest when paired with BigQuery.
  • Azure Machine Learning: the natural choice when the rest of your stack is Microsoft, with first-party MLOps integrations across GitHub Actions, Azure DevOps, and Microsoft Fabric for downstream reporting. Especially strong if you’re already an Azure shop with Microsoft compliance requirements.

The shared tradeoff: hyperscaler GPU compute typically runs 2–3x the per-hour price of specialized providers, and the platforms work best when you commit to them top to bottom. For organizations already inside one cloud, the unified billing and single support contract usually justify the premium. For a new ML team starting from scratch, they rarely do.

Part 2: AI-powered infrastructure management tools

This is where the more interesting product shift is happening. Instead of running AI on infrastructure, these tools point AI at your infrastructure and let it do work.

From code generation to agentic execution

Before the tool list, one distinction matters more than any feature comparison: whether the tool generates code or executes changes.

Code generation tools like GitHub Copilot suggest infrastructure code based on context. You review it, maybe edit it, run it yourself. The AI helps, but you’re still the one doing the work.

Agentic platforms generate the code and run it, with the guardrails you define. They understand your environment, handle multi-step workflows, and enforce policies on the way through. You describe the outcome; the platform makes it happen.

| Capability | Code generation | Agentic execution |
| --- | --- | --- |
| Generates infrastructure code | Yes | Yes |
| Understands infrastructure context | Limited | Deep |
| Executes changes | No | Yes |
| Handles multi-step workflows | No | Yes |
| Enforces policies automatically | No | Yes |
| Remediates drift and violations | No | Yes |

Where you want to land on this spectrum is mostly a governance question, not a productivity one.

Pulumi Neo

Pulumi Neo is Pulumi’s agentic AI for infrastructure. The distinguishing claim is execution: Neo doesn’t only suggest a Terraform snippet, it figures out the right resources, generates the code, and runs the deployment inside whatever guardrails you’ve set.

  • License: Proprietary (Pulumi Cloud)
  • Best for: Platform engineering teams that want AI automation with real policy controls, especially in regulated industries

A few things that set it apart in practice:

Policy automation and compliance. Neo is integrated with Pulumi Insights and Governance, which ships pre-built policy packs for CIS benchmarks, HITRUST CSF, NIST SP 800-53, and PCI DSS. Detection and remediation run in the same loop: Neo finds a violation, generates a fix, and (subject to approvals) applies it. You can batch-remediate across stacks and accounts with prompts like “find and fix all unencrypted S3 buckets across our AWS accounts.”

Works with infrastructure you didn’t create with Pulumi. Neo’s governance applies to Pulumi-managed resources, Terraform state, CloudFormation stacks, and resources someone clicked together in the AWS console. That matters because the realistic adoption path is to point Neo at what you have, audit it, and gradually bring it under management, not to migrate everything first.

Progressive autonomy. Trust levels are configurable. Start with human approval for everything; loosen it for well-defined, low-risk operations as confidence builds; keep production and sensitive resources behind strict approvals. This is the part that tends to determine whether enterprises actually deploy agentic AI in anger, versus letting it sit as a sandbox toy.

IDE and CI/CD integration. The Pulumi MCP Server brings Neo into Cursor, Claude Code, Claude Desktop, Windsurf, and any other MCP-compatible client. The Pulumi Cloud UI is the home base for approvals, history, and remediation status. Neo also slots into CI/CD pipelines for pre-merge policy remediation.

Case studies:

  • Werner Enterprises reduced infrastructure provisioning time from 3 days to 4 hours using Pulumi.
  • Spear AI cut their Authority to Operate (ATO) timeline from an expected 1.5 years to roughly 3 months by using policy-as-code to evidence compliance controls for auditors.

Tradeoff to be honest about: Neo gets more valuable the deeper you are in the Pulumi ecosystem. If you’re running IaC, ESC, and policy packs already, Neo has a lot of context to draw on. If you’re kicking the tires, it’s still useful, but the differentiating capability (context-aware, policy-respecting agentic execution) is harder to feel.

Firefly AIaC

Firefly is an asset management platform with AI features bolted on top of a strong core. The core capability is asset codification: it discovers cloud resources you already have and generates the IaC for them.

  • License: Proprietary
  • Best for: Teams that need to codify existing cloud footprints or generate IaC from natural language

Strengths: solid asset discovery, multi-cloud coverage, natural-language IaC generation, drift detection with remediation hooks. Caveat: AI features here are supplementary to the asset management product, not the main event, and Firefly is less focused on agentic execution than on inventory and policy.

env0 Cloud Compass

env0’s Cloud Compass adds AI to env0’s IaC automation platform, focusing on analysis rather than autonomous execution.

  • License: Proprietary
  • Best for: Multi-IaC shops that want AI-generated PR summaries, drift explanations, and cost insights

Strengths: multi-tool support across Terraform, OpenTofu, Pulumi, and Terragrunt; AI-generated PR summaries; drift cause analysis; cost estimation. Caveat: this is analysis and explanation, not action. Cloud Compass complements an agentic tool rather than replacing one.

Spacelift AI

Spacelift’s AI work is focused on the post-run experience: explaining what happened in a deployment and helping troubleshoot failures.

  • License: Proprietary
  • Best for: GitOps shops that want AI assistance reading complex runs and diagnosing failed deployments

Strengths: AI-powered run explanation; troubleshooting guidance for failures; broad IaC tool support; mature CI/CD integration. Caveat: like Spacelift as a whole, this is observation and explanation, not generation or execution. Pair with something that writes the code.

Crossplane with Upbound

Crossplane brings Kubernetes-style declarative management to cloud resources. Upbound is the company that commercializes it, and is layering AI-native control-plane capabilities into the 2.0 generation.

  • License: Apache 2.0 (Crossplane); proprietary (Upbound)
  • Best for: Teams already deep in Kubernetes that want to manage cloud resources the same way

Strengths: Kubernetes-native model; native GitOps fit; very active OSS community; AI control-plane work emerging from Upbound. Caveat: the learning curve is real if you’re not already living in Kubernetes; the commercial AI features are still maturing.

General-purpose code assistants

General-purpose AI coding assistants are the tools your developers already have open: GitHub Copilot, Claude Code, Cursor, and Google’s Gemini and Antigravity. They write Terraform HCL, Pulumi programs, and CloudFormation templates competently, about as well as they write anything else.

  • License: Proprietary (subscription), varies by tool
  • Best for: Developers who want broad code assistance, including infrastructure code, inside their existing editor

Strengths: excellent line-by-line code completion; broad language support; first-class editor integration; trained on huge corpora. Caveat: no infrastructure context. They don’t know what’s in your account, what your policies are, or which subnet you should pick. Treat their IaC suggestions as first-pass scaffolding, not production output.

AWS Application Composer

Application Composer is AWS’s visual builder for serverless applications. Drag services onto a canvas, get a CloudFormation template out, with AI suggestions for service configuration along the way.

  • License: Proprietary (AWS, included)
  • Best for: Teams building AWS serverless apps who prefer a visual workflow

Strengths: visual development for serverless; direct AWS integration; AI suggestions for service configuration; emits CloudFormation. Caveat: AWS-only, CloudFormation-only, and best suited to serverless rather than general infrastructure.

Comparison tables

Infrastructure for AI

| Tool | Category | Key strength | Limitation | Pricing | Best for |
|---|---|---|---|---|---|
| CoreWeave | GPU cloud | Purpose-built GPU infra, NVIDIA partnership | Not a general-purpose cloud | Per-GPU-hour | Large-scale AI training |
| Lambda Labs | GPU cloud | Approachable, pre-configured environments | Smaller scale | Per-GPU-hour | Research teams, startups |
| Modal | Serverless GPU | Developer experience, pay-per-second | Less infrastructure control | Pay-per-use | Variable ML workloads |
| Weights & Biases | MLOps | Industry-standard experiment tracking | Costs scale quickly | Free tier + paid | ML team collaboration |
| MLflow | MLOps | Open source, no lock-in | Self-hosting overhead | Free (self-hosted) | Flexible ML lifecycle |
| AWS SageMaker | Hyperscaler | AWS ecosystem integration | Higher cost, lock-in | Per-use | AWS-native orgs |
| Google Vertex AI | Hyperscaler | Google models, TPU access | Lock-in | Per-use | Google Cloud users |
| Azure ML | Hyperscaler | Microsoft integration, enterprise features | Lock-in | Per-use | Microsoft ecosystem |

AI-powered infrastructure management

| Tool | Approach | Key strength | Limitation | Pricing | Best for |
|---|---|---|---|---|---|
| Pulumi Neo | Agentic AI | Execution + policy automation | Best within Pulumi ecosystem | Pulumi Cloud tiers | Enterprise platform teams |
| Firefly AIaC | Asset management | Asset codification, IaC generation | AI is supplementary | Proprietary | Codifying existing infra |
| env0 Cloud Compass | Multi-IaC platform | Multi-tool support, PR analysis | Analysis, not execution | Proprietary | Multi-IaC environments |
| Spacelift AI | CI/CD platform | Run explanation, troubleshooting | Observation, not action | Proprietary | GitOps workflows |
| Crossplane / Upbound | Kubernetes-native | K8s patterns for infra | Requires K8s expertise | Open source + commercial | Kubernetes-native teams |
| Code assistants | Code assistant | Broad language support, IDE | No infrastructure context | Subscription | General code assistance |
| AWS Composer | Visual builder | Visual serverless development | AWS- and CFN-only | Included with AWS | AWS serverless apps |

How to choose

There’s no universal best tool. Five questions sort the field quickly:

  • Cloud strategy. Multi-cloud means tools like Pulumi Neo, Firefly, env0, or Crossplane. Single-cloud commitment means hyperscaler-native tools may integrate more deeply (AWS Composer, SageMaker, and so on).
  • Team expertise. Programmers gravitate to tools that use real languages (Pulumi Neo, Pulumi IaC). Kubernetes teams find Crossplane natural; everyone else finds it steep. Teams that prefer visual workflows should look at AWS Composer or env0’s UI.
  • Compliance. Regulated industries (healthcare, finance, government) get the most value from tools with pre-built compliance packs and audit trails. Pulumi Neo’s CIS/HITRUST/NIST/PCI packs are the most direct fit. If preventative policy enforcement matters, prefer tools that block non-compliant deployments rather than flag them after the fact.
  • Existing footprint. Greenfield projects can use anything. Brownfield is where it gets interesting: Pulumi Neo works against Terraform, CloudFormation, and manually-created resources, which lets you adopt incrementally instead of migrating first. Mixed-IaC shops should also look at env0.
  • Budget. Open source first: MLflow for MLOps, Crossplane for Kubernetes-native infra. Open source is not free, though: self-hosting carries a real total cost of ownership in hosting, maintenance, and the expertise to operate it. Commercial tools (Pulumi Cloud, env0, Spacelift) fold that operational cost into the price, on top of support, SLAs, and the enterprise-tier features open source can lack.

Before adopting anything, get visibility into what you have today, pilot on staging where mistakes are cheap, and define success metrics up front: time to provision, policy violation rates, mean time to remediate. The best AI infrastructure tool is the one your team will actually use, which means meeting developers where they already work.

Key trends and outlook

From copilots to agents. “AI suggests code” and “AI runs the deploy” are different products with different governance implications. The teams getting value from agentic tools have figured out which tasks to delegate fully, which to keep human-in-the-loop, and which to leave alone.

Progressive autonomy. Enterprise adoption follows a predictable shape: visibility → recommendations → human-approved execution → autonomous execution for well-understood scenarios. Tools that support that graduation will see stronger enterprise traction than tools that force an all-or-nothing choice.

Policy as the control plane. As AI takes on more infrastructure tasks, policy frameworks become the primary control plane. Done well, policy becomes an enabler (guardrails that let you safely expand automation) rather than a brake on velocity.

MCP standardization. The Model Context Protocol is becoming the integration standard between AI assistants and infrastructure tools. The practical upshot is that the IDE is increasingly a viable surface for managing infrastructure, with AI mediating between natural language and the underlying APIs.

Consolidation. CoreWeave acquiring Weights & Biases and NVIDIA acquiring Run:ai both point toward integrated platforms across the AI infrastructure stack. For tool selection today, that’s an argument for picking vendors with clear strategic direction over point solutions likely to be acquired or out-competed.

Frequently asked questions

What is the best AI agent for cloud infrastructure management?

For enterprise governance plus true agentic capability, Pulumi Neo is currently the most complete offering: it executes changes (not just suggests them), integrates with pre-built compliance frameworks, and works with infrastructure regardless of how it was provisioned. For Kubernetes-native shops, Crossplane with Upbound’s emerging AI features is worth tracking.

How can I use generative AI to manage cloud infrastructure?

Start by identifying the repetitive, time-consuming infrastructure work in your team. The highest-value early use cases tend to be:

  • Code generation: write IaC from natural-language descriptions, then review.
  • Documentation: explain unfamiliar configurations and reduce onboarding time.
  • Troubleshooting: analyze logs, errors, and configs to suggest likely causes.
  • Security and compliance: scan for violations and generate fixes.
  • Full automation: for shops that want it, agentic platforms like Pulumi Neo execute provisioning workflows end-to-end with governance controls intact.

What is agentic AI for infrastructure?

Agentic AI for infrastructure means AI systems that autonomously execute infrastructure tasks, not just generate code suggestions. The difference from a code assistant is action: an agent understands your environment, respects your policies, and performs multi-step work (provisioning, configuration, security controls) within the boundaries you’ve defined.

How do AI agents improve DevOps workflows?

By automating the repetitive parts (provisioning, drift remediation, policy enforcement), reducing context-switching, and catching issues earlier. Teams that have rolled out agentic tools well report faster provisioning, fewer policy violations slipping into production, and quicker compliance remediation. The compounding effect (engineers freed for higher-value work as the agent absorbs the routine) is the actual point.

What’s the difference between AI code generation and agentic execution?

Code generation suggests IaC for a human to review and run. Agentic execution generates the code and runs it, with policy and governance enforced along the way. It’s the difference between a knowledgeable colleague who suggests an approach and a knowledgeable colleague who also ships the change with appropriate oversight.

Can AI generate Terraform or Pulumi programs?

Yes. Most general-purpose AI assistants (Copilot, Claude, Gemini, ChatGPT, Cursor) can produce Terraform HCL, Pulumi programs in TypeScript / Python / Go, and CloudFormation. Quality varies. Generic assistants lack environment context and will happily emit syntactically correct but operationally wrong code. Infrastructure-specific tools like Pulumi Neo generate code that’s aware of your existing resources, policies, and provider constraints.

Can AI help with infrastructure compliance and policy automation?

Yes, and this is one of the highest-leverage uses of AI in infrastructure. Tools like Pulumi Neo detect policy violations across your footprint (including resources created outside IaC), generate compliant remediation, and apply it with the approvals you require. Pre-built frameworks for CIS, HITRUST, NIST, and PCI DSS shorten what would otherwise be a long manual compliance project.

Are AI infrastructure tools secure for enterprise use?

Enterprise-grade ones are. Look for RBAC, full audit logging of AI actions, preventative policy enforcement (not just detection), and human-in-the-loop approvals for sensitive operations. SOC 2, data residency options, and configurable autonomy levels are table stakes. The risk to avoid is wiring a consumer AI assistant directly into a production cloud account without those controls.

How do I choose between different AI infrastructure tools?

Match the tool to your context: existing clouds and IaC, team skills, compliance requirements, budget. Enterprise platform teams with governance needs should evaluate Pulumi Neo first. MLOps-focused teams should look at Weights & Biases or MLflow. For general code assistance inside the editor, a general-purpose assistant like Copilot, Cursor, or Gemini is the default. Most organizations end up with more than one: a code assistant for daily development and an agentic platform for production infrastructure.

What are the best tools for machine learning infrastructure?

For GPU compute, CoreWeave leads at scale, Modal wins for variable workloads and developer experience, and the hyperscalers are the default pick if you’re already inside one of them. For experiment tracking and model management, Weights & Biases is the leading commercial platform; MLflow is the leading open-source one. Most teams choose based on deployment model and pricing fit rather than on a capability gap. For the cloud infrastructure underneath the ML workloads, the same infrastructure management story applies: Pulumi Neo can provision and govern ML infrastructure the same way it handles everything else.

Conclusion

Two categories, two problems. GPU clouds and MLOps platforms (CoreWeave, Lambda, Modal, the hyperscaler trio, W&B, MLflow) solve the compute and lifecycle problem for running AI workloads. AI-powered infrastructure tools (Neo, Firefly, env0, Spacelift, Crossplane, code assistants, Composer) solve the management problem for everything else.

For GPU workloads, the choice mostly comes down to scale and where you already are. For infrastructure management, the real question is how much you actually want AI to do. Code assistants help you write IaC faster, but you’re still running it. Agentic platforms like Pulumi Neo execute changes and enforce policy on the way through, with the guardrails you control.

The pattern from teams getting real value: treat AI as a force multiplier on routine work (provisioning, drift, compliance) and keep human judgment in the loop for the architecture and the edge cases.

If you want to see agentic infrastructure management running against real resources, start with Pulumi Neo.

Infrastructure as code is the right model for production systems. State tracking, drift detection, and repeatable deployments all matter when you’re managing real workloads.

But sometimes, you also need a quick, one-off interaction with the cloud: create a bucket or a database, look up a VPC, delete a stray resource.

Today we’re introducing pulumi do, a new command for direct resource operations. With pulumi do, you can create, read, update, delete, and query any cloud resource from the terminal with a single command, across thousands of Pulumi-supported providers — no project, code, or state required.

The problem: Sometimes IaC is more than you need

When you’re managing production workloads, IaC is the proven solution. Code lets you declare complex systems, state tracking catches drift before it becomes a problem, dependency graphs sequence changes safely, and policy keeps everything in bounds. That full lifecycle, especially with the backing of a platform like Pulumi Cloud, is exactly what you want to build systems that scale.

But when you (or your coding agent) need an ad-hoc Postgres database, the simplest path with IaC still takes several steps: make a directory, create a project, configure your credentials, write the code, preview, deploy. It works, but it’s not always necessary for what should be a simple operation. pulumi do collapses all of those steps into one, using the same Pulumi providers, resource model, and ecosystem that powers the core Pulumi platform.

Resource creation is also only part of the problem. As Joe laid out in The Agentic Infrastructure Era, the real challenge for AI agents isn’t with code or CLI commands, it’s with everything else: getting a cloud account, resolving credentials, wiring configuration across multiple services. Agent accounts, also released this week, simplify this by letting an agent provision its own ephemeral Pulumi Cloud account, and Pulumi ESC takes care of consolidating credentials across providers. Together with pulumi do, these let agents go from zero to deployed infrastructure without requiring a human in the loop. And when that one-off resource needs to grow into a more permanent system, there’s a clear graduation path back to full Pulumi IaC.

What it looks like

As an example, say you wanted to provision an S3 bucket. With the AWS CLI, you’d need to assemble an aws s3api create-bucket invocation with the right set of command-line flags, region constraints, a globally unique name, and so on. With pulumi do, it’s just this:

```shell
$ pulumi do aws:s3:Bucket create
```

That might not look all that different on the surface — but because you’re using the Pulumi engine and resource model, you can provide a minimal set of input properties, take advantage of provider-defined defaults, and use Pulumi’s auto-naming feature to give the bucket a unique name automatically:

```shell
$ pulumi do aws:s3:Bucket create

This will create aws:s3/bucket:Bucket with the following inputs:
{
  "bucket": "bucket-279ea56",
  "tagsAll": {}
}

Please confirm that this is what you'd like to do by typing `yes`:
```

Answer yes (or just pass --yes), and you’re done. To delete the bucket:

```shell
$ pulumi do aws:s3:Bucket delete bucket-279ea56 --yes
```

Need to look up an existing resource? Use a provider function:

```shell
$ pulumi do aws:ec2:getVpc --default

{
  "arn": "arn:aws:ec2:us-west-2:663782525873:vpc/vpc-d7b311af",
  "cidrBlock": "172.31.0.0/16",
  "enableDnsHostnames": true,
  "enableDnsSupport": true,
  "enableNetworkAddressUsageMetrics": false,
  "id": "vpc-d7b311af",
  ...
}
```

Same CLI, same output contract, same provider ecosystem.

The command shape

The do command accepts a Pulumi resource type, or type token, to determine the action to take. Type tokens have the form <package:module:resource>. For example, aws:s3:Bucket refers to the Amazon S3 Bucket resource that belongs to the s3 module of the aws package.
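As a rough illustration of that shape, the sketch below splits a short-form type token into its three segments. The `TypeToken` tuple and `parse_type_token` helper are hypothetical names for this example, not part of the Pulumi CLI or SDKs, and the parser handles only the three-part form described above.

```python
from typing import NamedTuple


class TypeToken(NamedTuple):
    package: str
    module: str
    name: str  # resource or function name, e.g. "Bucket" or "getVpc"


def parse_type_token(token: str) -> TypeToken:
    """Split a 'package:module:resource' token into its three segments."""
    parts = token.split(":")
    # Reject tokens with missing, extra, or empty segments.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected package:module:resource, got {token!r}")
    return TypeToken(*parts)


token = parse_type_token("aws:s3:Bucket")
print(token.package, token.module, token.name)
```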

You can also provide a portion of the token to help you find what you’re looking for without ever having to leave the terminal:

```shell
$ pulumi do aws:s3

Functions and resources for the s3 module.

Run 'pulumi do <module/resource/function> --help' for more details on usage.

Functions:
  aws:s3:getAccessPoint
  aws:s3:getAccountPublicAccessBlock
  aws:s3:getBucket
  aws:s3:getBucketObject
  ...

Resources:
  aws:s3:AccessPoint
  aws:s3:AccountPublicAccessBlock
  aws:s3:AnalyticsConfiguration
  aws:s3:Bucket
  ...

$ pulumi do aws:s3:Bucket read bucket-d20976f

{
  "arn": "arn:aws:s3:::bucket-d20976f",
  "bucket": "bucket-d20976f",
  "bucketDomainName": "bucket-d20976f.s3.amazonaws.com",
  "bucketNamespace": "global",
  ...
}
```

The package, module, and resource/function segments all come directly from the Pulumi provider schema, so --help works at every level of the tree. Pass a package name, optional module, and optional function or resource type, and do returns the appropriate level of detail.

You can also provide the input properties of a resource in a YAML or JSON file, passing the format with --input and the file path with --input-file. For example, to create a container service in Google Cloud Run:

```yaml
# service.yaml
location: us-central1
deletionProtection: false
template:
  containers:
    - image: us-docker.pkg.dev/cloudrun/container/hello
```

```shell
$ pulumi do gcp:cloudrunv2:Service create \
    --input yaml \
    --input-file service.yaml

This will create gcp:cloudrunv2/service:Service with the following inputs:
{
  "deletionProtection": false,
  "location": "us-central1",
  "name": "service-b8af752",
  "template": {
    "containers": [
      {
        "image": "us-docker.pkg.dev/cloudrun/container/hello"
      }
    ]
  }
}
```

The result:

```json
{
  "createTime": "2026-05-22T23:00:22.415839Z",
  ...
  "urls": [
    "https://service-b8af752-921927215178.us-central1.run.app",
    "https://service-b8af752-ctnulmzwoa-uc.a.run.app"
  ]
}
```

Resource operations

Most resources support the full set of CRUD operations — create, read, update, delete, and list — directly from the CLI. Each operation maps to a provider CRUD method using the same provider logic a full Pulumi program would use, and resources are addressable by their cloud provider IDs:

```shell
# Create a resource
$ pulumi do aws:s3:Bucket create --yes | jq -r ".name"
bucket-4f5cb22

# Fetch it
$ pulumi do aws:s3:Bucket read bucket-4f5cb22 | jq -r ".hostedZoneId"
Z3BJ6K6RIION7M

# Update/patch it
$ pulumi do aws:s3:Bucket patch bucket-4f5cb22 --input yaml --input-file tags.yaml

$ pulumi do aws:s3:Bucket read bucket-4f5cb22 | jq ".tags"
{
  "key": "value"
}

# Delete it
$ pulumi do aws:s3:Bucket delete bucket-4f5cb22
```

Provider configuration

Today, pulumi do resolves provider configuration — for example, applying your AWS credentials — using environment variables or credential files as supported by each individual Pulumi provider. See the Pulumi Registry for provider-specific configuration details.

Designed for humans and agents

We’ve designed pulumi do to serve humans and coding agents equally well, guided by three fundamental ideas:

  • Consistent command structure across every provider. The do <package:module:type> <operation> pattern is the same for AWS, Azure, Google Cloud, Kubernetes, Cloudflare, Datadog, and every other provider, including packages containing higher-level component resources. Once an agent learns that pattern, it applies across the board.

  • Predictable output contract. JSON on stdout, progress on stderr, consistent exit codes. An agent can parse the result programmatically without scraping human-formatted tables.

  • A single CLI command that works across every cloud. Many cloud and SaaS providers don’t have a full CLI at all. pulumi do generates commands from the provider schema, so if a Pulumi provider exists for it, the CLI just works. Neither humans nor agents need to install, learn, or even know about cloud provider-specific tooling.
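To show why the output contract matters for automation, here is a minimal sketch of a wrapper that relies on it: JSON on stdout, progress on stderr, and a non-zero exit code on failure. The `run_json_command` helper is a hypothetical name, and the example drives a stand-in Python one-liner rather than the real CLI so that it runs anywhere.

```python
import json
import subprocess
import sys


def run_json_command(argv: list) -> dict:
    """Run a command and parse its stdout as JSON, keeping stderr separate."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    # A non-zero exit code signals failure; surface stderr for diagnosis.
    if completed.returncode != 0:
        raise RuntimeError(
            f"command failed ({completed.returncode}): {completed.stderr.strip()}"
        )
    # Progress went to stderr, so stdout should contain only the JSON result.
    return json.loads(completed.stdout)


# Stand-in for a CLI that follows the contract: progress on stderr, JSON on stdout.
fake_cli = [
    sys.executable,
    "-c",
    "import sys; print('creating...', file=sys.stderr); "
    "print('{\"bucket\": \"bucket-279ea56\"}')",
]
result = run_json_command(fake_cli)
print(result["bucket"])
```

With the real CLI, the argument list would presumably be something like `["pulumi", "do", "aws:s3:Bucket", "read", "bucket-279ea56"]`, as in the examples above. The wrapper itself needs no provider-specific parsing.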

What’s next

Resource operations and provider functions are the foundation. The pulumi do roadmap extends the same direct-operation model with credential management, state tracking, and a path to full IaC.

Unified credentials with Pulumi ESC

One of the hardest parts of multi-cloud operations is credential management. Every provider has its own authentication scheme, environment variables, and session lifecycle. An agent working across AWS, Cloudflare, and Datadog today manages three separate credential mechanisms.

We’re building Pulumi ESC integration into pulumi do so you can manage credentials in one place and resolve them everywhere. ESC handles credential resolution (including OIDC-based dynamic credential generation and short-lived tokens) across all of your providers. Name the credential set, reference it, and ESC does the rest, with rotation, RBAC, and audit built in.

Cross-resource references

Real infrastructure has dependencies — subnets need VPCs, security group rules need their security groups, and so on. When you’re building resources one at a time, those references need to flow between commands somehow.

A future version of pulumi do will let resource inputs reference outputs from previously created resources, allowing the CLI to resolve them automatically and preserve the dependency graph. Later, when the time comes to graduate to a full IaC program, the generated code contains proper resource references rather than hard-coded strings.
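The feature is not shipped yet, so the following is only a sketch of the idea: inputs carry placeholders, and a resolver swaps in recorded outputs while noting which resources were referenced. The "${name.property}" placeholder syntax, the Outputs shape, and the resolveInputs helper are all assumptions made for illustration, not pulumi do syntax.

```typescript
// Outputs recorded from earlier operations, keyed by a caller-chosen name.
type Outputs = Record<string, Record<string, string>>;

interface Resolved {
  inputs: Record<string, string>;
  dependsOn: string[]; // names of the resources these inputs reference
}

// Hypothetical placeholder syntax: "${name.property}".
const REFERENCE = /\$\{(\w+)\.(\w+)\}/g;

// Replace each placeholder with the recorded output and collect the
// referenced resource names so the dependency edge is not lost.
function resolveInputs(inputs: Record<string, string>, outputs: Outputs): Resolved {
  const resolved: Resolved = { inputs: {}, dependsOn: [] };
  for (const key of Object.keys(inputs)) {
    resolved.inputs[key] = inputs[key].replace(
      REFERENCE,
      (_match: string, name: string, prop: string) => {
        const source = outputs[name];
        const value = source === undefined ? undefined : source[prop];
        if (value === undefined) {
          throw new Error(`unresolved reference ${name}.${prop} in input "${key}"`);
        }
        if (resolved.dependsOn.indexOf(name) === -1) resolved.dependsOn.push(name);
        return value;
      },
    );
  }
  return resolved;
}
```

Keeping the dependsOn list alongside the resolved values is what would let generated code emit a real resource reference later instead of a hard-coded ID.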

Stateful mode and the graduation path

Today, pulumi do is stateless. Each command runs independently. A planned stateful mode will persist resource state across operations, enabling drift detection, lifecycle management, and a graduation path to full infrastructure as code.

Here’s what we’re planning:

  1. Zero setup. Your first pulumi do implicitly creates a project and stack. No manual initialization.

  2. Accumulate resources. Each operation stores resource state. After a few commands, you have a lightweight representation of your infrastructure.

  3. Eject to a full project. When the time comes, generate a Pulumi project in your chosen language with all resources imported and dependency graphs intact.

  4. Connect to Pulumi Cloud. Layer on governance, compliance, team collaboration, and deployment automation through Pulumi Cloud. Resources created via pulumi do can be governed by Pulumi Insights from day one, even before you opt into full IaC.

This path works because pulumi do uses the same providers, resource types, and property schemas as every other pulumi operation. Provisioned cloud resources stay where they are while management capabilities are layered on as needed.

Get started

pulumi do ships as a research preview in Pulumi CLI v3.242.0 and later. Install or update the CLI, install a provider plugin, and start running commands. The documentation has the full reference.

We can’t wait to hear your feedback. Give it a try today, tell us what works (and what doesn’t), and help shape the CLI that agents and humans both reach for first.

This week, Pulumi Neo started working in two more places: GitHub and Slack. The agent that already runs Pulumi tasks from the Cloud console and the terminal now participates in the threads where your team discusses changes.

Mention @pulumi-neo in a pull request or issue and Neo replies in the thread. Mention @Neo in a Slack channel and Neo starts a task, continuing the conversation as you reply.

Neo in GitHub

Mention @pulumi-neo in a pull request description, a top-level or inline review comment, or an issue. Neo sees the diff, the stacks linked to the repository, and their current state. Reviewers can ask Neo to walk through what a proposed change does, including resources that change in stacks the PR doesn’t touch directly. Responses land in the same thread, so the analysis becomes part of the review record and any follow-up stays with it.

Neo in Slack

Mention @Neo in any channel where Neo has been added, and Neo starts a task in the thread. The reply lands in the same thread, and follow-up messages continue the conversation there. The rest of the channel can see what was asked and what Neo found. Neo has the same capabilities here as in the Pulumi Cloud console or the terminal: check stack state, investigate failures, walk through what a change will do, or carry out actions the team has approved.

Integrations in action

A teammate posts in #platform-engineering: “API latency p95 has been climbing for two days, nobody can figure out why.” You reply:

You: @Neo check the production API stack. Anything change in the last 72 hours?

Neo starts a task in the thread, walks the stack history, and finds a configuration change to the load balancer’s idle-timeout setting that landed Friday afternoon. It posts the change, who deployed it, and when. The rest of the channel sees the finding without you having to retell it.

You: @Neo open a PR to revert idle-timeout to the previous value.

Neo edits the stack’s Pulumi program, runs pulumi preview to confirm the change touches only the load balancer, and opens a pull request with the diff and the preview output. A reviewer pulls it up:

Reviewer: @pulumi-neo what else does this change affect downstream?

Neo replies in the same review thread with the resources that change: the listener config and the target group health check. The reviewer reads, approves, and the change ships.

The investigation moved from Slack to GitHub, and both threads keep the record.

Permissions and governance

Whether the conversation starts in GitHub or Slack, Neo runs with the RBAC permissions of your Pulumi Cloud user. Stack-level controls, organization-level guardrails, and audit logging apply the same way they do for a task started from the console. Starting a conversation in a new place doesn’t grant Neo new permissions; it just changes where the conversation happens.

Try it out

Both integrations are available now for Neo-enabled organizations. The GitHub integration docs and Slack integration docs cover the one-time setup. From there, every engineer with a linked Pulumi Cloud identity can mention Neo from the threads they already work in.

Today’s launch is part of a bigger story. Read our launch-day piece on the agentic infrastructure era for the broader vision, the Neo CLI launch post for Neo’s new home in the terminal, and the Neo Integrations post for the MCP servers and cloud CLIs that ship with this release.

As always, we’d love to hear what you think — and if you have any suggestions for places we should put Neo next, file an issue in pulumi-cloud-requests.

Recurring platform work slips: provider versions fall behind, drift accumulates between checks, and the quarterly audit keeps getting pushed back another month. Pulumi Neo can now run any task on a cadence you set, opening a pull request for each run.

Automations in action

Your platform team runs stacks across staging and production, and the AWS, GCP, and Kubernetes providers keep shipping new versions. Nobody has time to bump them stack by stack.

You write one automation:

Every Monday at 8 AM, check the infra/ project for stacks where the AWS, GCP, or Kubernetes provider is more than two minor versions behind. For each one, bump the out-of-date provider, run pulumi preview, and open a PR if the preview is clean.

Monday morning, Neo runs the prompt. It finds three stacks behind on the AWS provider, edits each program, runs preview, and opens a PR for each clean run. You review the PRs like you would any other dependency bump, merge them, and Neo runs again next Monday.
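The "more than two minor versions behind" rule in that prompt is easy to state precisely. A minimal sketch, assuming plain major.minor.patch version strings and treating any older major version as behind; the helper names are hypothetical, and Neo's own logic is not described in this post.

```typescript
// Parse "major.minor.patch" (an optional leading "v" is ignored).
function parseVersion(v: string): { major: number; minor: number; patch: number } {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(v);
  if (!match) throw new Error(`not a semantic version: ${v}`);
  return { major: Number(match[1]), minor: Number(match[2]), patch: Number(match[3]) };
}

// True when `installed` trails `latest` by more than `maxMinorsBehind` minor
// releases. Assumption: an older major version always counts as behind.
function isStale(installed: string, latest: string, maxMinorsBehind = 2): boolean {
  const a = parseVersion(installed);
  const b = parseVersion(latest);
  if (a.major !== b.major) return a.major < b.major;
  return b.minor - a.minor > maxMinorsBehind;
}
```

With a threshold of two, a provider three minor releases behind the latest is flagged, while one exactly two behind is left alone.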

What automations are for

The launch includes four built-in templates: a provider freshness check, an encryption audit, a backup audit, and an activity digest. You can also skip the templates and write your own prompt.

Pick from hourly, daily, weekdays, or weekly cadences. Each automation gets its own page in the Automations tab, where you can edit the prompt, change the schedule, run it once on demand, or pause it.

Safe by default

Automations default to two settings that fit recurring work. Approval mode is auto, so a run doesn’t wait for human confirmation between steps. Permission mode is read-only, so a run can read state and propose changes through pull requests but can’t apply changes directly. You can override either default per automation.
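One way to picture the defaults and the per-automation override is as a small typed settings object. This is an illustrative sketch only: the mode names match those Neo uses (manual, balanced, auto; default, read-only), but the field names and the effectiveSettings helper are hypothetical and do not reflect the actual configuration format.

```typescript
type ApprovalMode = "manual" | "balanced" | "auto";
type PermissionMode = "default" | "read-only";

// Hypothetical settings shape for a single automation.
interface AutomationSettings {
  approvalMode: ApprovalMode;
  permissionMode: PermissionMode;
}

// The defaults described above: no waiting between steps, no direct applies.
const AUTOMATION_DEFAULTS: AutomationSettings = {
  approvalMode: "auto",
  permissionMode: "read-only",
};

// Per-automation overrides win; anything left unset falls back to the default.
function effectiveSettings(overrides: Partial<AutomationSettings> = {}): AutomationSettings {
  return { ...AUTOMATION_DEFAULTS, ...overrides };
}
```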

How automations fit with the rest of Neo

A scheduled task uses the same context as an interactive Neo task. Custom Instructions at the organization and project level apply, so a scheduled run respects the same naming conventions, tagging policies, and architecture rules your team has written down.

MCP integrations and CLI integrations work in scheduled tasks the same way they work in interactive ones, so a weekly drift check can query AWS through the aws CLI, file Linear issues, and link related PagerDuty incidents. Scheduled tasks also run with the RBAC permissions of the user who scheduled them, checked at run time; if permissions change between scheduling and execution, the new permissions apply.

Try it out

Open Neo in Pulumi Cloud, switch to the Automations tab, and pick a template or write your own prompt. The automations docs cover the form, scheduling options, and per-automation overrides.

Today’s launch is part of a bigger story. Read our launch-day piece on the agentic infrastructure era for the broader vision, and the Neo Integrations post for the third-party tools and CLIs your automations can use.

As always, we’d love to hear what you think — and if you have any suggestions for automations that’d make Neo even better, file an issue in pulumi-cloud-requests.

Ewan Dawson is CTO of Compostable AI, where five engineers run an AI-native software factory: nineteen clients, custom AWS deployments, most of them shipped within a day of contract signing. This article is adapted from his recent Pulumi webinar, and covers rules in more depth than we had time for on stage.

For the past twenty years, I’ve viewed software development as a craft. The best engineers drew on decades of experience to get every function right.

But two years into the agentic AI revolution, I realised software is going to look more like a factory than a craft. The economics have changed. We can’t treat code as bespoke anymore. To scale, we have to think industrial — use the tools to ship more value with fewer engineers.

I joined Compostable AI soon after it was founded 2.5 years ago, and I built the engineering org AI-native from day one. The technology has come a long way since then, and so has my understanding of what AI-native actually means. Here are seven rules I keep coming back to.

1. Transform, don’t enhance

Going AI-native isn’t an upgrade to your existing process. If you treat AI as a way to hand your developers smarter tools, you leave most of the value on the table. You get the leverage by rebuilding how you write software — and the culture and processes around it.

I know that’s a tall order for a large, mature engineering org. My advice: start small. Pick one team or one business area and run it as a fully AI-native function. Take what you learn and roll it out from there. And do the political work early, especially with your Governance, Risk, and Compliance (GRC) function. Without GRC on your side, AI becomes a compliance fight instead of a structural advantage.

Don’t bolt AI onto your existing workflow. Redesign the workflow around what agents can do.

Most of the leverage in this technology comes from rebuilding around it. The tool change is the small part.

2. Remove the problem, don’t solve it

Going AI-native flips which problems are hard and which are easy. The right move often isn’t to engineer a solution. It’s to reframe the problem so it goes away.

Here’s an example. With agents writing the code for multiple clients, blast radius wasn’t hypothetical. One bad agent run could trash a customer’s database, or leak one client’s data into another’s. Our instinct was to build a secure multi-tenant sandbox with guardrails, approvals, and rollback. But every version we tried still had agents loose in a shared environment, one bug away from exposing one customer’s data to another. So we removed the problem: every client gets two dedicated AWS accounts, one for production and one “digital twin” staging account. Agents iterate on staging until the work checks out. Only then does it ship to production. We have nineteen accounts now, one per client.

Managing nineteen AWS accounts with five engineers used to be an administrative nightmare. When code is cheap, infrastructure-as-code tools like AWS Control Tower and Pulumi make it the easier path.

Remove the problem before you try to solve it.

It’s cheaper to reframe the problem than to engineer your way through it.

3. Pick tools your agents can drive

Removing problems is the process side. The other side is tooling. If you want an automated factory, your tech stack has to be something agents can drive. This overlaps a lot with tools that have great developer experience. If a tool has a robust API plus a clean CLI, agents can drive it. If it’s heavy click-ops around a web UI, agents stop there.

We didn’t get there first try. Our first IaC tool worked fine when we had a couple of clients. As we added more, accounts drifted, deployments slowed, retries got complicated. We needed something built for where we were heading.

I went looking, and Pulumi fit. We express infrastructure as type-safe code — TypeScript, in our case, rather than HCL — and agents are good at writing it. Pair that with Pulumi Neo — pre-loaded with domain-specific Pulumi skills — and we ship infrastructure that follows best practices. One of my colleagues put it: “The scary thing about Neo is it just seems to know everything about what we do.” Pulumi IaC plus Pulumi ESC for configuration beats stitching tools together. And TypeScript lets us build higher-level abstractions that keep the AWS account fleet tractable.

“I don’t actually care if it’s HCL or TypeScript, as long as my software development agents can write it. And they do a better job with TypeScript than HCL.”

Tools have to share your AI-native mindset. If they don’t integrate deeply, the human becomes the glue.

If part of your stack still requires a human to click through a web UI to provision an account, your agents stop there.

4. Don’t let one agent do everything

When I first started with agents, I reached for a god prompt: one massive system prompt meant to guide a single agent through the whole software lifecycle. It didn’t work. Agents struggle when you give them multiple goals. The writer is lenient on its own work — it won’t catch what it just shipped. You don’t want it reviewing the code, checking for security flaws, or hunting bugs.

We get better results from a constellation of specialized agents, each handling one part of the line. Pulumi Neo handles infrastructure. Alongside it sit agents specialized in:

  • Code implementation
  • Code review and testing
  • Security auditing
  • Internal standards compliance
  • Documentation updates

Tasks pass down the line. Clean code comes out the other end, with almost no human involved.

Don’t let any agent mark its own homework. Specialize by job.

Treat agents the way you’d treat a team. The one who writes the code shouldn’t be the one signing it off.

5. Measure human hours per unit of value

Once we had agents writing and agents reviewing, throughput went up — but the bottleneck moved past the PR. Engineering hours were still the most expensive thing in the building, so my core metric is human hours per unit of value produced. Minimize that.

That means hunting for every step that still goes through a person — especially the mid-pipeline steps between ideation and production. Automate the human touchpoints along that line, and the factory runs 24/7.

Pushing automation this hard also forces good engineering. A chaotic, undocumented process is impossible to automate. Good engineering is still good engineering, AI or not. Agents won’t fix a weak process.

Measure human hours per unit of value. Treat every one as a bottleneck to remove.

You can’t automate what you can’t describe. Every human in the pipeline marks a piece that hasn’t been described yet.

6. Design for convergence, not one-shot correctness

Even with the human touchpoints removed, the agents don’t ship right the first try. Once you embrace the factory pipeline, you stop needing them to. We design for convergence instead — a system that lands on the right answer through automated iteration.

The loop we run looks like this:

  1. Refinement: agents iterate on the Product Requirements Document until the problem is clear.
  2. Planning: agents draft multiple technical approaches, and evaluation agents pick the best one.
  3. Implementation: coding agents write the software.
  4. Review: specialized checking agents look for bugs, API misuse, and security flaws.

If the checkers find a problem, they hand it back to the implementation agent. The loop repeats until the tests pass and the agents agree on a clean PR. Once it converges, we merge and deploy to staging.

Two things have to be true. You need a way to evaluate the output. Without that, you don’t know when to stop. And the loop has to converge — each pass has to get closer. A checker that fails every PR for a different reason isn’t helping — it just keeps the work going in circles. The feedback has to narrow the search, not widen it.
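Both conditions can be written down as a loop. The sketch below is a toy model, not our production pipeline: checkers return a list of issues, a reviser produces the next candidate, and the loop gives up when the issue count stops shrinking. Using a strictly decreasing issue count as the test for "getting closer" is a simplifying assumption, and all names are hypothetical.

```typescript
// A checker returns the problems it found; an empty list means "clean".
type Checker<T> = (candidate: T) => string[];
// A reviser produces the next candidate from the current one plus feedback.
type Reviser<T> = (candidate: T, issues: string[]) => T;

interface LoopResult<T> {
  candidate: T;
  converged: boolean;
  passes: number;
}

function runUntilConverged<T>(
  initial: T,
  checkers: Checker<T>[],
  revise: Reviser<T>,
  maxPasses = 10,
): LoopResult<T> {
  let candidate = initial;
  let previousIssueCount = Infinity;
  for (let pass = 1; pass <= maxPasses; pass++) {
    // Condition one: there has to be a way to evaluate the output.
    const issues = checkers.reduce<string[]>(
      (all, check) => all.concat(check(candidate)),
      [],
    );
    if (issues.length === 0) return { candidate, converged: true, passes: pass };
    // Condition two: each pass has to get closer. If the feedback is not
    // narrowing the search, stop instead of going in circles.
    if (issues.length >= previousIssueCount) {
      return { candidate, converged: false, passes: pass };
    }
    previousIssueCount = issues.length;
    candidate = revise(candidate, issues);
  }
  return { candidate, converged: false, passes: maxPasses };
}
```

The maxPasses cap and the early exit are where cost control would hook in once convergence is reliable.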

Once it converges, the question moves on. How cheap can we make it? Lower the time to PR, reduce token count, drop the overall cost. The optimization never really ends.

Don’t aim for one-shot correctness. Design for convergence.

It doesn’t matter how many tries it takes, as long as the loop closes without a human in it. Get convergence first. The optimization comes after.

7. Run the factory in the cloud, not on a laptop

Even a converged factory has to live somewhere. Try running a fully automated factory on individual developers’ laptops, and it falls apart. Laptops are highly trusted machines. Put autonomous agents on them and your security posture drops, fast. And the factory has to run 24/7. Events come from elsewhere — PR comments, Slack threads, errors in test environments.

Cloud also kills configuration drift. Across a dozen developer machines, the same prompts run against different model versions, and env vars sit half-set on half the laptops. The thing you’re trying to optimize lives in different states across the team. Cloud isn’t just where the factory runs; it’s the only place a team can iterate on it together. Keep everything in one place: AWS, Pulumi Cloud, GitHub. The specific stack matters less than the principle of one place.

And the part that matters most: the factory keeps running, testing, and deploying long after we’ve closed our laptops and gone to sleep.

Build the factory somewhere you can work on it — not just somewhere it can run.

A factory scattered across laptops can’t be improved as a system. Cloud keeps it in one shape, 24/7, and lets the team iterate together.

Closing thought

I’ve shipped more code in the last two years than I did in the fifteen before that. Most of it in languages I couldn’t write by hand. And that’s after a stretch in leadership where I wrote almost none.

If you’re where I was two years ago: don’t ask how AI fits into what you already do. The factory is built one rule at a time, and it’s not a template — it’s the practice of finding where you’re taking advantage of the new economics and where you’re not, where your practices still need an update. The leverage is in finding these places and improving them.


Watch the original Pulumi webinar. Learn more about Compostable AI and Pulumi Neo.

Since launching Pulumi Neo, over 4,500 organizations have used it to delegate real infrastructure work: scaffolding, migrating, investigating, operationalizing, and more. Though that usage has come entirely through Pulumi Cloud, we know a large portion of Pulumi users live in the terminal, and increasingly that’s where AI tools run too. Now we’re bringing Neo there.

pulumi neo brings the same Neo experience you’ve had in Pulumi Cloud to your terminal. Running locally means there’s no separate branch to push, no credentials to provision, and no context to paste: Neo picks up the setup you already have.

What local execution unlocks

Neo inherits your setup when it runs locally. The CLIs you’ve authenticated, the environment variables and kubeconfigs you’ve configured, and the project you’re editing right now are all available without any setup on your part. That means Neo can run the same commands you would, against the same systems you have access to.

That makes pulumi neo a fit for paired, interactive sessions where you and Neo work through a problem together. For asynchronous, autonomous tasks you set up and come back to, Pulumi Cloud Neo is still the surface to reach for. Both reach the same Neo.

You can also hand tasks to Neo from other agent sessions. Simply ask your agent, such as Claude Code or Codex, to hand the task off to Neo, and the Neo handoff skill packages the current thread (goal, repo pointers, conversation summary) and starts a Neo task using pulumi neo under the hood. This works anywhere skills are supported, without leaving your current session.

What carries over

Local tools and context are what’s new. The full set of controls you have in Pulumi Cloud Neo applies in the terminal: approval modes (manual, balanced, auto) for tool calls, permission modes (default, read-only) for what Neo can change, and Plan Mode for research and planning before execution.

Integrations carry over too. The integration catalog (connectors to Atlassian, Datadog, Linear, PagerDuty, and others) works the same way from the terminal. Identity, RBAC, and audit all run through your pulumi login, the same way they do in the console. See the Pulumi Neo docs for details.

Get started

pulumi neo ships with the latest Pulumi CLI. To start a session:

  1. Authenticate to Pulumi Cloud with pulumi login.
  2. Run pulumi neo, or pass an initial prompt: pulumi neo "what's in this stack?".

pulumi neo is part of a broader launch on agentic infrastructure. See the pulumi neo command reference and the Pulumi Neo docs for details. 10 things you can do with Neo is a good starting point for tasks to try. The Pulumi Community Slack is the place for questions and feedback.

Pulumi Neo already understands your infrastructure: your code, your stacks, your state. Today we’re launching new capabilities that extend Neo’s reach in two directions: into the third-party systems your team uses to plan and observe, and out to the cloud CLIs that actually drive your infrastructure.

The first half is MCP integrations: connections to Atlassian, Datadog, Honeycomb, Linear, PagerDuty, and Supabase that show up as tools Neo can call during a task. The second half is CLI integrations: scopable access to aws, gcloud, az, and kubectl. Both are configured once at the org level and available to every Neo task in the organization.

Integrations in action

A PagerDuty alert just fired: RDS storage on payments-prod is at 90% and climbing. You want to know how fast, and whether you can buy yourself any runway before it fills.

You: Neo, RDS storage on payments-prod just paged at 90%. How fast is it growing, and what do we have configured?

Neo pulls the active incident from PagerDuty, decides on its own to check Datadog for the storage-utilization curve over the last 30 days, and runs aws rds describe-db-instances --db-instance-identifier payments-prod through your production-aws CLI integration (the name your org gave its production AWS credentials). The database has been growing about 5 GB a day. The instance has AllocatedStorage at 200 GB and MaxAllocatedStorage also at 200, so storage autoscaling is effectively disabled. At current growth, the disk fills in three days.

You: Bump max allocated storage to 500. Open a PR.

Neo edits the payments stack’s Pulumi program to raise maxAllocatedStorage from 200 to 500 on the RDS instance, runs pulumi preview to confirm the change is scoped to that one resource, and opens a pull request with the diff, the preview output, and links to the PagerDuty incident and the Datadog graph. You review the PR and merge it. Pulumi applies the change, and Neo posts the resolution back to PagerDuty.

With three integrations and one conversation, the change is reviewed and shipped, and the alert is resolved a few minutes later.
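The runway estimate in this scenario is simple arithmetic, sketched here as a straight-line projection. The daysUntilFull helper is hypothetical, and real growth is rarely perfectly linear. With the figures quoted above (200 GB allocated, 90% used, about 5 GB a day), the estimate is about four days at exactly 90%, and it shrinks toward three as utilization climbs past that.

```typescript
// Straight-line estimate of the days left before storage reaches its ceiling.
function daysUntilFull(ceilingGb: number, usedGb: number, growthGbPerDay: number): number {
  if (growthGbPerDay <= 0) return Infinity; // not growing, so it never fills
  return Math.max(0, ceilingGb - usedGb) / growthGbPerDay;
}

const usedGb = 180; // 90% of the 200 GB allocation
const beforeChange = daysUntilFull(200, usedGb, 5); // ceiling stuck at 200 GB
const afterChange = daysUntilFull(500, usedGb, 5);  // ceiling raised to 500 GB
```

Under the same assumptions, raising the ceiling to 500 GB stretches the runway to (500 - 180) / 5 = 64 days, which is the breathing room the PR buys.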

MCP integrations: context from your existing tools

The launch catalog covers six services that show up most often in infrastructure investigations: Atlassian for Jira issues and Confluence runbooks, Datadog for metrics and logs, Honeycomb for traces, Linear for issue tracking, PagerDuty for incidents and on-call schedules, and Supabase for managed database changes. Each connects Neo to a remote MCP server hosted by the provider, so the agent has access to the full set of tools the vendor chooses to expose.

Integrations can be enabled by organization administrators on the Neo Settings page. Once configured, they’re available to every Neo task in your organization.

CLI integrations: live cloud insights

CLI integrations cover what MCP doesn’t reach: live cloud insights. With AWS, GCP, Azure, or Kubernetes connected, Neo can check live database utilization, look up the current state of a running service, verify a service quota before scaling, or reach into resources that aren’t managed by any Pulumi stack.

An admin enables a CLI integration the same way as an MCP one, from your org’s Neo settings. Each integration gets a name your team chooses, like production-aws or staging-gcloud, and tasks reference that name to tell Neo which environment to reach into. You can connect multiple instances of the same CLI (for example, production-aws and staging-aws) so Neo can investigate staging without touching production. Credentials are backed by Pulumi ESC environments your org owns; the CLI integrations docs walk through setup.

Per-task control and failure handling

Both surfaces default to org-wide availability, with per-task overrides. Before starting a task, you can toggle individual MCP integrations off. The toggles only affect that task; the org-level configuration is unchanged.

Failures behave the same way for both. If an integration can’t be reached, Neo logs a warning, skips it, and continues with the rest. A single broken integration doesn’t stop a task. CLI integration connect and disconnect events go to your organization’s audit log, and Neo’s individual CLI calls appear in the task transcript alongside its other tool calls.
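The warn-skip-continue behavior described above follows a familiar pattern. A minimal sketch of that pattern, with a hypothetical Integration shape and helper name; it illustrates the behavior and is not Neo's implementation.

```typescript
interface Integration {
  name: string;
  // Throws when the integration cannot be reached. A real client would be
  // asynchronous; this stays synchronous to keep the sketch short.
  connect: () => void;
}

// Try every integration, warn about the ones that fail, and return the
// names of those that are usable for the rest of the task.
function connectAvailable(
  integrations: Integration[],
  warn: (message: string) => void,
): string[] {
  const available: string[] = [];
  for (const integration of integrations) {
    try {
      integration.connect();
      available.push(integration.name);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      warn(`skipping ${integration.name}: ${reason}`);
    }
  }
  return available;
}
```

The failure is caught per integration, so one unreachable service never aborts the whole task.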

Try it out

Both MCP and CLI integrations are available now for Neo-enabled organizations. Open your org’s Neo settings, connect the MCP server or CLI of your choice, and let Neo do the next investigation against the tools you already use. The MCP integrations docs and CLI integrations docs walk through credential setup for each one, and the Neo integrations hub ties it all together.

Today’s launch is part of a bigger story. Read our launch-day piece on the agentic infrastructure era for the broader vision, and the Neo CLI launch post for Neo’s new home in the terminal.

As always, we’d love to hear what you think — and if you have any suggestions for integrations that’d make Neo even better, file an issue in pulumi-cloud-requests.

Last fall, after launching Pulumi Neo, we wrote up 10 things you could do with it. In the months that followed, as platform teams handed Neo more real work, we watched and listened, shipping a steady stream of features like plan mode, read-only mode, AGENTS.md, an integration catalog, cross-cloud migration, and task sharing. With today’s release, Neo extends beyond the Pulumi Cloud console into the Pulumi CLI, GitHub, and Slack.

So here are 10 more things you can do with Neo.

1. Deploy your app to AWS without writing IaC

Hand Neo a repo and a target cloud. Neo picks the right services, writes the Pulumi, and opens a PR.

The cloud infrastructure part of getting a new service running, especially one in a new language, is always a few hours of boilerplate: a VPC and subnets, an IAM role, security groups, a load balancer, DNS, and a TLS cert.

With Neo, that work collapses into a prompt. Point Neo at a repo and ask:

Deploy this app to AWS as a publicly accessible service.

Plan mode comes back with the resources Neo will create, named and sized: ECS Fargate, an ALB, and the VPC wiring. Approve, and Neo writes the Pulumi program, runs a preview, and opens a PR. You, the human in the loop, merge it after review.

Neo planning a PR and deploying an app to AWS.

[Start a Neo task: Ask Neo to deploy your app to AWS and make a PR](https://app.pulumi.com/neo?prompt=I%27d+like+to+deploy+this+app+to+AWS.+Confirm+what+you%27ll+create.)

2. Diagnose a slow API from metrics, logs, and code

Slow endpoints live at the seam between runtime metrics and the stack that runs them. Neo reads both and proposes a fix with the metric evidence as the rationale.

Production incidents often involve multiple tools. When the checkout endpoint’s p95 climbs from 200ms to 1.2s, the metric is in Datadog, but the cause might be somewhere in your AWS account: maybe RDS is out of IOPS, maybe the connection pool is too small, maybe the autoscaler isn’t keeping up. Connecting “this metric looks bad” to a recent backend change and then to a one-line fix in your Pulumi program is an exercise in detective work.

Neo’s integration catalog bridges this gap. With built-in Datadog, PagerDuty, and Honeycomb integrations sitting alongside your Pulumi state, Neo can read traces and metrics from the tools your team already uses and take action.

Ask Neo:

Find the scaling bottleneck on /checkout from the last 7 days of metrics and propose a fix.

Neo pulls the metric history, matches the Datadog tag db.cluster=checkout-rds to the RDS instance in your prod-checkout Pulumi stack, and opens a PR with a Pulumi diff that bumps the storage IOPS and raises the connection-pool ceiling. You review and roll out the fix.

Toggle on the Honeycomb integration so Neo can read traces and metrics alongside your Pulumi stacks.

3. Triage a PagerDuty alert from Slack

A page comes in. You paste it into your on-call channel and tag Neo, and Neo replies with the cross-system view you’d otherwise spend the first 20 minutes assembling.

On-call triage is often about getting up to speed quickly. You get paged because something is in the red, and you don’t know why.

You mention Neo in the on-call Slack channel:

@neo, what’s going on with this alert?

Neo starts querying metrics and traces. With PagerDuty and Datadog in the integration catalog, it correlates the alert with every deploy and stack change tagged with the alert’s service in the last hour, and finds the change that lines up:

Two deploys in the last hour touched services tagged service:checkout: checkout-api@a3f9c2 (12 min ago, app-layer deploy) and Pulumi stack prod-checkout-rds (45 min ago, decreased max_connections from 200 → 100). p99 inflection at 14:03 lines up with the stack change. Likely cause: the connection-pool reduction is starving the API under current load.

You ask a couple of clarifying questions in-thread, then ask Neo to open a rollback PR against the Pulumi stack.

Authorize PagerDuty and Datadog in Neo's settings. Neo can then read alerts in your on-call Slack channel, find the change that correlates, and open a PR when you ask.
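The correlation in Neo's reply can be approximated with a filter and a sort: keep the changes tagged with the alert's service inside the lookback window, drop any that landed after the metric inflection, and put the one closest to the inflection first. This is a sketch under assumptions, not how Neo works internally. The Change shape and rankSuspects helper are hypothetical, times are minutes since midnight, and the 14:45 alert time is invented so that "45 min ago" lands at 14:00.

```typescript
interface Change {
  id: string;
  service: string;
  at: number; // minutes since midnight, for simplicity
}

// Changes for `service` within `windowMinutes` before the alert that also
// precede the inflection, ordered with the closest to the inflection first.
function rankSuspects(
  changes: Change[],
  service: string,
  alertAt: number,
  inflectionAt: number,
  windowMinutes = 60,
): Change[] {
  return changes
    .filter((c) => c.service === service)
    .filter((c) => c.at <= alertAt && alertAt - c.at <= windowMinutes)
    .filter((c) => c.at <= inflectionAt) // a later change cannot be the cause
    .sort((a, b) => b.at - a.at);
}

const recentChanges: Change[] = [
  { id: "checkout-api@a3f9c2", service: "checkout", at: 14 * 60 + 33 }, // 12 min ago
  { id: "prod-checkout-rds", service: "checkout", at: 14 * 60 },        // 45 min ago
];
// Assumed alert time 14:45, inflection at 14:03 as in the reply above.
const suspects = rankSuspects(recentChanges, "checkout", 14 * 60 + 45, 14 * 60 + 3);
```

With these inputs only the prod-checkout-rds stack change survives the filters, because the app-layer deploy landed after the inflection.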

4. Implement a Linear ticket end-to-end

Hand Neo a ticket number from Linear, Jira, or GitHub Issues. Neo reads the description and acceptance criteria, plans against your stack, and opens a PR.

Tickets often pile up not because they’re unimportant, but because they’re not urgent. Ongoing maintenance quietly accumulates. Bumping a provider version, centralizing secret management, working through small policy violations: each one matters, but none of them ever moves to the top of the queue. Explaining each one to an agent is its own overhead.

The fix is letting Neo read the ticket itself. Connect Linear or Jira through the integration catalog (GitHub Issues works too), and Neo pulls the ticket the same way an engineer would: title, description, acceptance criteria.

Ask Neo:

Implement CAD-1234 in our payments stack.

Neo reads the ticket, plans against your existing stack, opens a PR, and drops a comment back on the ticket. The ticket and the PR end up linked, and your backlog shrinks.

Neo running locally in the Pulumi CLI: fielding a Linear issue, analyzing the codebase, and producing a PR that upgrades multiple projects to the latest Pulumi and AWS provider versions.

[Start a Neo task: Implement a Linear ticket end-to-end](https://app.pulumi.com/neo?prompt=I%27d+like+to+implement+a+ticket+from+Linear+%28or+Jira%2C+or+GitHub+Issues%29.+Ask+me+for+the+ticket+number.)

5. Tighten over-privileged IAM roles

Neo audits each role against what your stack code actually does, and proposes scoped policies that improve your security posture.

IAM cleanup is the kind of work nobody has the time to prioritize. Production has 40 roles. Half of them started with s3:* because nobody had time to scope them, and the cleanup slips quarter to quarter.

Ask Neo:

Audit IAM permissions across my accounts and propose narrower policies for over-privileged stack-managed roles.

Neo cross-references each role’s policy against what the stack code actually calls, and opens a PR per role. The PR body lists the API calls Neo found in the stack code, like s3:GetObject on audit-logs-* and s3:PutObject on audit-logs-staging, as the justification for the scoped policy. The evidence sits next to the diff.
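To make the idea of a scoped policy concrete, here is a minimal sketch that turns the two observed calls mentioned above into a standard IAM policy document. The `ObservedCall` shape and the `scopedS3Policy` helper are hypothetical names for illustration, not Neo's actual output format, and the sketch assumes the calls are object-level S3 actions.

```typescript
// Hypothetical shape for an API call that an audit found in stack code.
interface ObservedCall {
  action: string; // e.g. "s3:GetObject"
  bucketPattern: string; // e.g. "audit-logs-*"
}

interface PolicyStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Build a policy that allows only the observed object-level S3 calls.
function scopedS3Policy(calls: ObservedCall[]): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: calls.map((call) => ({
      Effect: "Allow" as const,
      Action: [call.action],
      // Object-level actions apply to keys inside the bucket, hence the "/*".
      Resource: ["arn:aws:s3:::" + call.bucketPattern + "/*"],
    })),
  };
}

const policy = scopedS3Policy([
  { action: "s3:GetObject", bucketPattern: "audit-logs-*" },
  { action: "s3:PutObject", bucketPattern: "audit-logs-staging" },
]);

console.log(JSON.stringify(policy, null, 2));
```

Compared with a blanket s3:* grant, each statement here can be traced back to one call in the stack code, which is the kind of evidence a reviewer would want next to the diff.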

If you’re unclear about which roles count as in-scope or what your team considers over-privileged, start in plan mode and agree on that with Neo first.

Neo auditing an over-privileged IAM role and proposing a narrower policy, with the actually-used permissions as evidence.

[Start a Neo task: Audit IAM and tighten over-privileged roles](https://app.pulumi.com/neo?prompt=Audit+IAM+permissions+across+my+accounts+and+propose+narrower+policies+for+over-privileged+stack-managed+roles.)

6. Migrate from AWS CDK onto your platform’s golden paths

Neo reads your existing CDK app and lands a PR that swaps AWS’s defaults for your team’s published components.

CDK’s L2 constructs encode AWS’s defaults. s3.Bucket with encryption: BucketEncryption.S3_MANAGED is a sane choice, but it’s AWS’s idea of sane, not yours. A platform team that’s published its own components to the Pulumi Private Registry has already decided what your bucket defaults look like: encryption with the right KMS key, tagging by cost center.

Ask Neo:

Migrate the payments-vpc CDK stack to Pulumi using our published components. [1]

Neo reads the source CDK app and your registry side by side. It maps each CDK construct to its closest team-published equivalent, clarifying with you where the mapping is ambiguous.

```typescript
// Before (AWS CDK, AWS's defaults)
const bucket = new s3.Bucket(this, "Assets", {
    bucketName: "payments-assets",
    encryption: s3.BucketEncryption.S3_MANAGED,
    versioned: true,
});
```
```typescript
// After (Pulumi, your team's published component)
import * as platform from "@payments/platform";

const bucket = new platform.Bucket("assets", {
    bucketName: "payments-assets",
    classification: "internal",
});
```

[Start a Neo task: Migrate CDK onto your golden paths](https://app.pulumi.com/neo?prompt=I%27d+like+to+migrate+this+CDK+stack+to+Pulumi.+Use+our+published+components+where+you+can.)

7. Migrate a service to Kubernetes from a runbook

Once the migration pattern is written down, the next service to move is a prompt away.

Containerizing an app and moving it to Kubernetes involves several small decisions: which base image, what labels go on deployments, how ingress is wired, and how secrets reach the pod. But after a team has moved two or three services, the pattern is set. The decisions get written down in a runbook, and every subsequent migration is mostly the same shape.

Ask Neo:

Containerize the billing-api service and write its Kubernetes manifests, following our K8s migration runbook in Confluence.

Neo reads the source repo and the runbook in Confluence via the integration catalog and starts working on your request.

You can save this as a Neo skill that splits the work into multiple PRs (Dockerfile first, ECR config next, Deployment/Service/Ingress manifests after) and links back to each runbook convention for ease of review. The output reflects your conventions: the labels you actually use, the ingress class you’ve standardized on, and the External Secrets Operator config your team prefers.

You’re still the one reviewing the PRs and deciding what the cutover looks like in production. Neo follows your internal standards, so the new service ends up shaped like the last one you migrated.

Neo migrating a VM-based service to Kubernetes step by step, following the team's Confluence runbook.

Once you’ve delegated something a few times, the next move is to automate it. The remaining three tasks are the kind Neo doesn’t need to be asked for. Drift, deps, compliance: they’re the operations you put on a schedule.

8. Schedule daily drift checks across your cloud infrastructure

Schedule a daily drift check across your cloud. Wake up to PRs that fix what changed overnight.

Configuration drift is an ongoing challenge. The security team rotated an IAM role at 04:47 UTC. Someone changed a security group in the AWS console three weeks ago. Left alone, drift turns into security gaps, into compliance issues, and into the kind of “wait, who changed that?” confusion nobody wants to chase down.

Pulumi Cloud is already good at drift detection. Neo takes it a step further.

Ask Neo:

Every morning at 6 AM, check all production infrastructure for drift and create PRs to fix any issues you find.

From then on, the task runs on its own, and you wake up to a PR per drifted resource. The description spells out what happened (iam_role.audit-reader had inline policy AllowReadAuditLogs added at 04:47 UTC) and cites the section of infra/runbooks/drift.md Neo followed.

Some drift gets encoded into the Pulumi program, like the IAM rotation above. Some gets reverted, like the security group rule added from the console. Some gets ignored entirely, like autoscaler-managed Lambda concurrency reservations the runbook tells Neo to skip. You write the runbook once; Neo follows it every morning to decide what to do.
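The three outcomes above amount to a small decision table. The sketch below is only an illustration of that logic: the real runbook is a prose file that Neo reads, and the rule shape, the resource and actor labels, and the "ask-a-human" fallback are all assumptions made for this example.

```typescript
type DriftAction = "encode" | "revert" | "ignore" | "ask-a-human";

// Hypothetical description of one drifted resource.
interface DriftEvent {
  resourceType: string;
  changedBy: string;
}

// Hypothetical runbook rule: who changed what, and what to do about it.
interface RunbookRule {
  resourceType: string;
  changedBy: string;
  action: DriftAction;
}

const runbook: RunbookRule[] = [
  // Intentional change by the security team: keep it and update the program.
  { resourceType: "iam-role", changedBy: "security-team", action: "encode" },
  // Manual console edit: restore what the program declares.
  { resourceType: "security-group", changedBy: "console-user", action: "revert" },
  // Autoscaler-managed value: expected to change, so skip it.
  { resourceType: "lambda-concurrency", changedBy: "autoscaler", action: "ignore" },
];

// Return the first matching rule's action, or defer to a person (assumed fallback).
function decide(event: DriftEvent, rules: RunbookRule[]): DriftAction {
  for (const rule of rules) {
    if (rule.resourceType === event.resourceType && rule.changedBy === event.changedBy) {
      return rule.action;
    }
  }
  return "ask-a-human";
}

const decision = decide({ resourceType: "security-group", changedBy: "console-user" }, runbook);
console.log(decision);
```

Writing the rules down once, in whatever form, is what lets the same decision be made the same way every morning.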

Neo's morning drift PR. The body names the resource, the change, when it happened, and the section of the runbook Neo followed to decide what to do.

[Start a Neo task: Schedule a daily drift check](https://app.pulumi.com/neo?prompt=Every+morning+at+6+AM%2C+check+all+production+infrastructure+for+drift+and+create+PRs+to+fix+any+issues+you+find.)

9. Schedule weekly upgrades for outdated providers and runtimes

Lambda runtimes and container base images age out. Schedule the upgrade pass; review the PRs Neo opens.

AWS Lambda end-of-life notices come out months ahead. Node 20 stopped receiving runtime updates at the end of April. Python 3.9 ended last December. After the deadline, AWS blocks new deploys and eventually stops invoking the function. Each one needs to move to a supported runtime before the cutoff. [2]

Schedule it:

Every Sunday night at 10 PM, check our Lambdas for runtimes nearing end-of-support and open PRs to upgrade them.

Neo reads the AWS Lambda runtime deprecation page, matches the end-of-support runtimes against every Lambda in your stacks, and opens one PR per stack.

If Python 3.9 is reaching end-of-support, the upgrade is to Python 3.12, where datetime.utcnow() is deprecated. Those calls should move to the timezone-aware datetime.now(timezone.utc) (equivalently, datetime.datetime.now(datetime.UTC)). Neo can make all of those replacements in the same PR.

The same task can catch container base images with critical CVEs and bump them too.

Setting up a weekly task in the Scheduled Tasks UI. Once saved, Neo runs the prompt every Sunday night and opens PRs you review on Monday.

[Start a Neo task: Schedule a weekly runtime upgrade check](https://app.pulumi.com/neo?prompt=Every+Sunday+night+at+10+PM%2C+check+our+Lambdas+for+runtimes+nearing+end-of-support+and+open+PRs+to+upgrade+them.)

10. Fix CIS Benchmark failures with daily PRs

Run the benchmark on a schedule. Wake up to PRs that fix what failed.

The CIS AWS Foundations Benchmark, available through AWS Security Hub, is something every team should be keeping an eye on. The benchmark finds issues like S3 buckets that allow public read access (S3.1), root user access keys that shouldn’t exist (IAM.4), or CloudTrail not being enabled (CloudTrail.1). Scanning for these issues is a solved problem, but closing and addressing them is not. They pile up between audits because each one is a code change in a different stack, and nobody owns the cross-stack cleanup. [3]

Schedule the cleanup:

Every morning, read CIS Benchmark failures from Security Hub. For every failure on an IaC-managed resource, open a PR with the fix.

Neo opens one PR per failure. A bucket failing S3.1 arrives as a Pulumi diff that adds blockPublicAccess to the bucket in your prod-checkout stack. The PR body lists the CIS rule number, the resource ID, the diff, and a clean pulumi preview against the live infrastructure.

The runbook is where your security team writes down what each control means for your stacks. Block public S3 buckets, except the ones tagged public-content=true for CloudFront origins. Don’t auto-touch the break-glass IAM roles; page a human instead. Multi-region CloudTrail stays on, no exceptions. Neo reads that file, checks each Security Hub finding against it, and only opens a PR for the ones you’ve said are safe to fix. The rest get routed or ignored, the way your team already handles them.
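As a rough sketch of how those runbook rules could be expressed, the function below triages one finding at a time. The `Finding` shape, the `isBreakGlassRole` flag, the `safeToFix` list, and the disposition names are assumptions for illustration; Security Hub's real finding format is richer, and Neo reads the runbook as prose rather than code.

```typescript
type Disposition = "open-pr" | "page-human" | "ignore" | "route";

// Hypothetical, simplified view of a Security Hub finding.
interface Finding {
  controlId: string; // e.g. "S3.1"
  resourceId: string;
  tags: { [key: string]: string };
  isBreakGlassRole: boolean;
}

// Controls the security team has marked as safe to fix automatically (assumed list).
const safeToFix = ["S3.1", "CloudTrail.1"];

function triage(finding: Finding): Disposition {
  // Break-glass roles are never touched automatically.
  if (finding.isBreakGlassRole) return "page-human";
  // Public buckets serving CloudFront content are an accepted exception.
  if (finding.controlId === "S3.1" && finding.tags["public-content"] === "true") return "ignore";
  // Everything else gets a PR only if the control is on the safe list.
  return safeToFix.indexOf(finding.controlId) >= 0 ? "open-pr" : "route";
}

const publicAssets: Finding = {
  controlId: "S3.1",
  resourceId: "example-cdn-assets",
  tags: { "public-content": "true" },
  isBreakGlassRole: false,
};
const leakyBucket: Finding = {
  controlId: "S3.1",
  resourceId: "example-internal-reports",
  tags: {},
  isBreakGlassRole: false,
};

console.log(triage(publicAssets), triage(leakyBucket));
```

The order of the checks matters: the exceptions come first, so a safe-listed control never overrides a rule that says to leave a resource alone.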

A PR raised by Neo to fix a CIS Benchmark failure, with the failing rule, the resource, and the runbook decision laid out in the body.

[Start a Neo task: Schedule a daily compliance scan](https://app.pulumi.com/neo?prompt=Every+morning%2C+verify+all+resources+meet+our+compliance+policies+and+create+PRs+to+fix+violations.)

Neo: your newest platform engineer

Over the past year, many product teams have stopped treating AI as a request-by-request assistant and started delegating to it outright. Agents open pull requests, investigate issues, and iterate on review feedback.

But platform engineers have held back because a bad infrastructure change doesn’t just fail; it can take production down. Coding agents benefit from fast, forgiving feedback loops, but infrastructure recovery is rarely as simple as reverting a commit.

What was missing wasn’t the appetite. It was an agent with enough organizational context and grounding to plan reliably, enough guardrails to feel safe and contain mistakes, and enough discipline to keep working without being asked.

The theme across these tasks is clear. A thing platform engineers used to keep in their heads becomes a task you delegate, then becomes work that runs without you. Neo isn’t generating infrastructure from a template. It’s a teammate who knows your code, your providers, your conventions, your production metrics, and can raise PRs for you to review.

Neo now lives in your terminal, in your pull requests, in your Slack workspace, and in Pulumi Cloud. Pick one of these workflows and give it a try.


  1. The observant reader will notice Terraform-to-Pulumi was covered in the original post.

  2. Also covered in the original post. Last year you could ask Neo to do it once. This year you can put it on a schedule.

  3. Also covered in the original post. Last year Neo could remediate violations on demand. This year Security Hub feeds findings to a scheduled task that knows your runbook’s interpretation of each control.

AI agents do a lot of their work through CLIs. They’re easier to call than HTTP APIs and they produce predictable output. Over the last few months our own CLI traffic has shifted from mostly people typing commands to people and agents running commands together, often in the same session.

Today we’re shipping a release built for both. The Pulumi CLI is reorganized around three ideas: the right command should be the one you can guess, anything you can do in Pulumi Cloud should also be doable from the terminal, and what comes back should be just as readable to an agent as it is to a person.

Designing for guessability

The bar we set was that both developers and coding agents should be able to guess at the right command for a particular task: pulumi env edit to modify an environment, pulumi stack get to see what’s going on with a stack, pulumi org member list to see who’s on the team. If we had to explain which command did what, the usability bar hadn’t been met.

Branches in the tree are now singular nouns like stack, env, org, and deployment. Leaves are now verbs from a canonical vocabulary — list, get, set, new, edit, remove — and they mean the same thing wherever they’re used. edit always means modify an existing thing. Wherever the old vocabulary differed, though, the old name still works: ls, rm, update, and open are all aliased to preserve backward compatibility.

For the most part, product names have also been replaced with familiar nouns. Users (human or otherwise) don’t think in product names; they think in terms of resources, stacks, environments. Take Pulumi ESC: the product may be named ESC (and for a while the command was too), but nobody thinks “I need to initialize a new ESC”; they think “I need to create a new environment.” The command is therefore pulumi env new, with esc init preserved as an alias to avoid disrupting anyone’s existing workflows.

```shell
$ pulumi env new my-project my-env
Environment created.
```

All of Pulumi Cloud in the terminal

Up to now, most of what you could do with Pulumi Cloud had to be done either in the browser or through direct API calls. Things like reviewing deployments, setting up webhooks, finding non-compliant resources, or managing deployment settings all required you to break out curl and hit the API docs or open a browser and navigate the Pulumi Cloud console.

That changes today. Pulumi Cloud is now fully accessible from the command line through the pulumi CLI, with consistently named nouns and verbs aligned to what you’d expect:

  • pulumi stack get returns a complete stack overview, metadata, resource list, and more:

    ```shell
    $ pulumi stack get \
      --stack cnunciato/chris.nunciato.org/production \
      --output json | jq -r ".resources[].type" | grep "aws:s3"

    aws:s3:BucketEventSubscription
    aws:s3/bucket:Bucket
    aws:s3/bucket:Bucket
    aws:s3/bucketPublicAccessBlock:BucketPublicAccessBlock
    aws:s3/bucketWebsiteConfiguration:BucketWebsiteConfiguration
    aws:s3/bucketOwnershipControls:BucketOwnershipControls
    aws:s3/bucketNotification:BucketNotification
    ```

    … with other stack-related commands like pulumi stack history get events, pulumi stack drift list, pulumi stack schedule new, and pulumi stack webhook new alongside it.

  • Organizational commands like pulumi org member list, pulumi org role list, pulumi org usage get, and pulumi org audit-log export can help you dig into the details when you need to as well.

  • Deployment-related commands like pulumi deployment list, get, log, and cancel let you see what’s running, dive into what happened, and take action without having to leave the terminal.

    ```shell
    $ pulumi deployment list \
      --stack cnunciato/chris.nunciato.org/production \
      --output table

    ┌──────────────────────────────────────┬───────────┬─────────┬───────────┬──────────────┬─────────────────────────┐
    │ ID                                   │ OPERATION │ VERSION │ STATUS    │ INITIATED BY │ MODIFIED                │
    ├──────────────────────────────────────┼───────────┼─────────┼───────────┼──────────────┼─────────────────────────┤
    │ 83e44b8c-643c-4e9f-9f36-0c6a81d9db2e │ update    │ 140     │ running   │ cnunciato    │ 2026-05-17 21:26:37.340 │
    │ 52a37cbe-b7fd-4027-8e0f-7b4785ab12e8 │ update    │ 139     │ succeeded │ cnunciato    │ 2026-05-16 23:36:07.999 │
    │ 94e04525-b3a4-42b5-9987-e344018a3324 │ preview   │ 138     │ succeeded │ cnunciato    │ 2026-05-16 23:29:19.709 │
    └──────────────────────────────────────┴───────────┴─────────┴───────────┴──────────────┴─────────────────────────┘
    ```
  • And when you need to query across managed (and even unmanaged) resources, pulumi insights resource search and get can help you find what you’re looking for quickly:

    ```shell
    $ pulumi insights resource search \
      --query 'type:aws:s3/bucket:Bucket org:cnunciato project:photomap stack:dev' \
      --output table

    ┌──────────────────────────────────────────────────────────────────────────┬──────────────────────┬───────┬──────────────────────────┐
    │ URN                                                                      │ TYPE                 │ STACK │ MODIFIED                 │
    ├──────────────────────────────────────────────────────────────────────────┼──────────────────────┼───────┼──────────────────────────┤
    │ urn:pulumi:dev::photomap::aws:apigateway:x:API$aws:s3/bucket:Bucket::api │ aws:s3/bucket:Bucket │ dev   │ 2020-10-31T00:39:47.926Z │
    │ urn:pulumi:dev::photomap::aws:s3/bucket:Bucket::images                   │ aws:s3/bucket:Bucket │ dev   │ 2020-10-31T00:39:47.926Z │
    └──────────────────────────────────────────────────────────────────────────┴──────────────────────┴───────┴──────────────────────────┘

    Showing 2 of 2 resources.
    ```

Flags and output formats are consistent across commands (--output table, json), as are the shapes of cross-cutting features like webhooks. If you’ve used pulumi stack webhook, for example, you already know how to use pulumi env webhook and pulumi org webhook, and so on.
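Because JSON output is available on these commands, the jq pipeline shown for pulumi stack get can also be written inside a script or agent tool. The sketch below assumes only what that jq filter implies, namely a top-level `resources` array whose items have a `type` field; the rest of the payload is not modeled, and the sample data is made up.

```typescript
// Assumed minimal shape, inferred from the jq filter `.resources[].type`.
interface StackOutput {
  resources: { type: string }[];
}

// Return the resource types that start with the given prefix.
// (The grep in the shell example matches anywhere in the line; a prefix match is stricter.)
function resourceTypesWithPrefix(stackJson: string, prefix: string): string[] {
  const stack = JSON.parse(stackJson) as StackOutput;
  return stack.resources
    .map((resource) => resource.type)
    .filter((type) => type.indexOf(prefix) === 0);
}

// Made-up sample standing in for the command's JSON output.
const sample = JSON.stringify({
  resources: [
    { type: "pulumi:pulumi:Stack" },
    { type: "aws:s3/bucket:Bucket" },
    { type: "aws:s3/bucketNotification:BucketNotification" },
    { type: "aws:cloudfront/distribution:Distribution" },
  ],
});

const s3Types = resourceTypesWithPrefix(sample, "aws:s3");
console.log(s3Types);
```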

Direct access to the Pulumi Cloud API

For any features of Pulumi Cloud that don’t yet have their own commands, you’ve also got pulumi api. It’s a gh api-inspired command designed to give you direct access to the full REST API, without having to manage separate access tokens, auth settings, or request/response payloads. Everything is handled for you through your authenticated pulumi CLI.

There’s even pulumi api list, which enumerates every single endpoint that’s exposed:

```shell
$ pulumi api list

┌──────────────┬────────┬──────────────────────────────────────┬─────────────────────┐
│ TAG          │ METHOD │ PATH                                 │ SUMMARY             │
├──────────────┼────────┼──────────────────────────────────────┼─────────────────────┤
│ AccessTokens │ GET    │ /api/orgs/{orgName}/tokens           │ ListOrgTokens       │
│ AccessTokens │ POST   │ /api/orgs/{orgName}/tokens           │ CreateOrgToken      │
│ AccessTokens │ DELETE │ /api/orgs/{orgName}/tokens/{tokenId} │ DeleteOrgToken      │
│ AccessTokens │ GET    │ /api/user/tokens                     │ ListPersonalTokens  │
│ AccessTokens │ POST   │ /api/user/tokens                     │ CreatePersonalToken │
│ AccessTokens │ DELETE │ /api/user/tokens/{tokenId}           │ DeletePersonalToken │
...

537 operations. Pass --output=json for a stable, scriptable contract.
```

To get the details about a particular API, use pulumi api describe:

```shell
$ pulumi api describe 'DELETE /api/user/tokens/{tokenId}' # or DeletePersonalToken

DELETE /api/user/tokens/{tokenId}
Tag: AccessTokens
Operation: DeletePersonalToken

DeletePersonalToken

Permanently deletes a personal access token by its identifier. The token is immediately
invalidated and can no longer be used for authentication. Returns 204 on success or 404
if the token does not exist.

Parameters:
  [path] tokenId* (string) — The access token identifier
```

All requests are made through your authenticated pulumi CLI:

```shell
$ pulumi login
Logged in to pulumi.com as cnunciato.

$ pulumi whoami
cnunciato

$ pulumi api /api/user/tokens/2cf15c7d-afad-458f-ace0-fc7ff0512b10 \
  --method DELETE && echo "Token deleted."
Token deleted.
```

Newly published endpoints are available through pulumi api immediately, so you don’t have to wait for a new CLI release before you can start using them. See the Pulumi Cloud REST API documentation to learn more.

Finding templates in the Pulumi Cloud Registry

Finding out which templates are available to you through your Pulumi organization used to mean having to navigate to the Pulumi Cloud Registry and start searching. The new pulumi template commands make this easier by letting you ask for what’s available right from the shell, either by fetching the full list or filtering with the --name or --search params:

```shell
$ pulumi template list --search "container typescript" --org cnunciato

┌─────────────────────────────────────────────┬────────┬────────────┬────────────┐
│ Name                                        │ Source │ Language   │ Visibility │
├─────────────────────────────────────────────┼────────┼────────────┼────────────┤
│ pulumi/templates/container-aws-typescript   │ github │ typescript │ public     │
│ pulumi/templates/container-azure-typescript │ github │ typescript │ public     │
│ pulumi/templates/container-gcp-typescript   │ github │ typescript │ public     │
└─────────────────────────────────────────────┴────────┴────────────┴────────────┘
```

This is especially useful when you’re working with an agent because it helps the agent discover your org’s approved templates without having to name them. Start with a prompt that tells the agent what you want to build, and let the agent find the right template for you.

Agent-friendly Markdown docs for providers and components

Both humans and agents need to be able to understand what’s inside a Pulumi package before they can use it. And while the Registry is an excellent resource for that, it was mainly designed to deliver HTML — a human-friendly format that agents can certainly use, but that’s much more verbose than they actually need.

With pulumi api, agents can fetch the details about a package from the Registry directly and get back those details either in markdown or json, whichever works best, filtering on properties like language where applicable:

```shell
$ pulumi api "/api/registry/packages/pulumi/pulumi/random/versions/4.19.1"

{
  "name": "random",
  "publisher": "pulumi",
  "publisherDisplayName": "Pulumi",
  "source": "pulumi",
  "version": "4.19.1",
  "description": "A Pulumi package to safely use randomness in Pulumi programs.",
  "repoUrl": "https://github.com/pulumi/pulumi-random",
  ...
}
```
```shell
$ pulumi api "/api/registry/packages/pulumi/pulumi/random/versions/4.19.1/docs/random%3Aindex%2FrandomPassword%3ARandomPassword" \
  --output markdown

# RandomPassword

resource `random:index/randomPassword:RandomPassword`

## Example Usage

package main
...
```

Resources are individually addressable using their URL-encoded Pulumi type tokens — e.g., random:index/randomPassword:RandomPassword — and API endpoints are configured to deliver Markdown when agents ask for it:

```shell
$ curl "https://api.pulumi.com/api/registry/packages/pulumi/pulumi/random/versions/latest/readme?lang=python" \
  -H "Accept: text/markdown"

# Installation

The Random provider is available as a package in all Pulumi languages:
...
```
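The URL-encoding step for type tokens is easy to get wrong by hand, so here is a small sketch of it. The `registryDocsPath` helper is a hypothetical name; the path layout is copied from the pulumi api example above rather than from a documented contract.

```typescript
// Build the docs path for one resource, URL-encoding the Pulumi type token.
// packagePath is the "pulumi/pulumi/random" portion seen in the example above.
function registryDocsPath(packagePath: string, version: string, typeToken: string): string {
  return (
    "/api/registry/packages/" + packagePath +
    "/versions/" + version +
    "/docs/" + encodeURIComponent(typeToken)
  );
}

const docsPath = registryDocsPath(
  "pulumi/pulumi/random",
  "4.19.1",
  "random:index/randomPassword:RandomPassword",
);

console.log(docsPath);
```

`encodeURIComponent` escapes both the colons and the slash in the token, which keeps the token from being read as extra path segments.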

Even compared to JSON (which is itself a significant improvement over HTML), Markdown is a much more token-efficient format for agents to work with:

| Package | Endpoint | JSON | Markdown | Tokens saved |
|---|---|---|---|---|
| random | /readme | 10.68 KB | 6.04 KB | 43% |
| aws | /readme | 4.22 KB | 2.54 KB | 40% |
| aws | /nav?depth=full | 204 KB | 170 KB | 17% |
| aws | /docs/{resource token} | 15.24 KB | 11.28 KB | 26% |
| azure-native | /docs/{resource token} | 48.13 KB | 30.37 KB | 37% |
| aws | /docs/{function token} | 2.40 KB | 1.46 KB | 39% |
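The savings column appears to follow directly from the two size columns: the difference between the JSON and Markdown sizes, divided by the JSON size, rounded to a whole percent. That reading is an assumption (the post does not state the formula), but it reproduces every row, as this short check shows.

```typescript
// Percentage reduction when going from the JSON size to the Markdown size.
function percentSaved(jsonKb: number, markdownKb: number): number {
  return Math.round(((jsonKb - markdownKb) / jsonKb) * 100);
}

// [package, endpoint, JSON KB, Markdown KB], copied from the table.
const rows: [string, string, number, number][] = [
  ["random", "/readme", 10.68, 6.04],
  ["aws", "/readme", 4.22, 2.54],
  ["aws", "/nav?depth=full", 204, 170],
  ["aws", "/docs/{resource token}", 15.24, 11.28],
  ["azure-native", "/docs/{resource token}", 48.13, 30.37],
  ["aws", "/docs/{function token}", 2.4, 1.46],
];

for (const row of rows) {
  console.log(row[0], row[1], percentSaved(row[2], row[3]) + "%");
}
```

Byte size is only a proxy for token count, so the exact savings an agent sees will depend on its tokenizer.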

Learn more about our Registry endpoints in the REST API docs. (Or just ask your agent!)

New to the CLI: Pulumi Neo

When we launched Pulumi Neo last year, the only way to use it was in the Pulumi Cloud Console. But while there’s a ton you can do with Neo in the browser, if you’re an engineer already living in the terminal, chances are that eventually you’re going to wish you had Neo right in the CLI along with you.

Now you do. Running pulumi neo with or without a prompt launches a Pulumi Cloud-connected session that gives Neo access to your local environment just like any other coding agent. Use it on its own to scaffold a new project, understand an existing codebase, or debug a failing deployment — or pull it into an active session with the coding agent you’re already using. Either way, it stays in the shell you’re already working in.

We’ll cover Neo in the CLI in more detail later this week.

Smaller changes that add up

A long list of smaller changes also runs through this release:

  • The core loop now speaks JSON end to end, with pulumi up, pulumi destroy, and pulumi import all emitting structured JSON output when called with --output json.

  • Streams now behave the way scripts expect them to, with data on stdout, progress and diagnostics on stderr.

  • Exit codes are more consistent across the board. Every failure mode — auth, resource, policy, missing stack, cancellation, timeout, and others — has its own exit code, so agents can branch on the actual cause instead of having to interpret output. The full table is in the docs.

  • Help text explains why a command exists, not just what it does, and includes at least one concrete example. Examples in --help are one of the most effective ways to improve LLM accuracy on first-try invocations — and it turns out they’re pretty handy for humans, too.
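Taken together, the first three bullets change how a script or agent can consume a run: parse stdout as data, keep stderr for diagnostics, and branch on the exit code. The sketch below shows that shape under stated assumptions. The exit-code numbers and the JSON payload are placeholders invented for illustration; the real values are in the exit-code table in the docs.

```typescript
// Result of running a CLI command, captured by whatever process API you use.
interface RunResult {
  exitCode: number;
  stdout: string; // data only (JSON when structured output is requested)
  stderr: string; // progress and diagnostics
}

type Outcome =
  | { kind: "ok"; data: unknown }
  | { kind: "failed"; cause: string; diagnostics: string };

// HYPOTHETICAL exit-code table; look up the real codes in the Pulumi docs.
const exampleCauses: Record<number, string> = {
  2: "auth",
  3: "policy",
};

function interpret(result: RunResult, causes: Record<number, string>): Outcome {
  if (result.exitCode === 0) {
    // stdout carries only data, so it parses without stripping progress lines.
    return { kind: "ok", data: JSON.parse(result.stdout) };
  }
  // Branch on the exit code instead of pattern-matching the error text.
  const cause = causes[result.exitCode] ?? "unknown";
  return { kind: "failed", cause, diagnostics: result.stderr };
}

// Placeholder payloads, not real Pulumi output.
const ok = interpret(
  { exitCode: 0, stdout: '{"changes": 1}', stderr: "updating..." },
  exampleCauses,
);
const denied = interpret(
  { exitCode: 3, stdout: "", stderr: "policy violation" },
  exampleCauses,
);
```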

A sneak peek at a new command

Later this week, you’ll get a closer look at pulumi do, a new top-level command that enables direct resource operations like create, read, update, delete, and list across every Pulumi-supported cloud provider and resource, all in one command. A simple example:

```
$ pulumi do aws getAvailabilityZones

{
  "groupNames": [
    "us-west-2-zg-1"
  ],
  "id": "us-west-2",
  "names": [
    "us-west-2a",
    "us-west-2b",
    "us-west-2c",
    "us-west-2d"
  ],
  "region": "us-west-2",
  "zoneIds": [
    "usw2-az2",
    "usw2-az1",
    "usw2-az3",
    "usw2-az4"
  ]
}
```

It might look like that’s calling the AWS CLI, but it’s not — it’s using the same AWS provider function a full Pulumi program would use, only without the program, and invoked directly from the CLI.

More on how it works, and what you can do with it, in the days ahead.

Try it yourself

A lot of what makes a developer tool worth using is in the details, and most of what’s in this release is exactly that, across the whole CLI, with humans and agents in mind.

We’d love for you to grab the latest release and give it a try. Tell us what’s now easy, what’s still hard, and what to fix next on GitHub or in the community Slack. The fastest way the CLI gets better is feedback from the humans and agents who live in it.

Twelve months ago, building an AI agent meant picking a framework, defining your tools, standing up a RAG pipeline, and writing a stack of glue code to wire it all together. That was the default playbook. The post-mortem on six months of work usually went the same way: half the time went into infrastructure that had nothing to do with the agent’s actual job.

That isn’t where the work is anymore. Most of the middle layer is gone. The SDKs ship with the tools, the skills system replaced the upfront tool registry, and longer context windows pushed vector search out of the default slot it held all of last year.

The shape is the same as a lot of infrastructure shifts before it. The hard thing got cheap, the cheap thing got expected, and the question moved up a level.

The old playbook

A 2024 to 2025 agent project looked like this. You picked a framework, usually LangChain, LlamaIndex, or an early version of Pydantic AI. You wrote tool definitions, usually a wrapper around an API the agent would call. You stood up a RAG pipeline: chunk your documents, embed them, pick a vector database, write retrievers, layer reranking on top. Then you wrote the agent loop yourself, including prompt assembly, tool dispatch, retry logic, and observability.

This was the default for good reasons. Foundation models had short context windows. They didn’t ship with file access. They couldn’t run code. If you wanted an agent to do anything useful with your data, you had to bring the data to the model in pre-digested chunks.

The cost wasn’t only setup time. It was infra bills, retries against embedding APIs, and a context strategy that fought the model as the model got better. By mid-2025 the retrieval layer was often the bottleneck on quality. The agent would ask a question, get five plausible-looking chunks, and answer from those instead of the document you actually wanted it to read. Chunking decisions made on a Tuesday in March were still hurting answer quality six months later.

Most teams I talked to in 2025 were tuning their RAG pipeline. Almost nobody enjoyed it.

The shift: three things changed at once

Three changes landed close enough together that they collapsed the middle layer.

Built-in tools. The Claude Agent SDK ships with Read, Write, Edit, Bash, Grep, Glob, WebSearch, and WebFetch out of the box. OpenAI’s Codex SDK is similar in shape, with shell and file tools available to the agent by default. These are the tools every agent project was rebuilding in 2024, often as a side quest to the work the agent was actually meant to do. A Read that handles binary files. A Bash that streams output and respects working directory. A Grep that doesn’t choke on large files. The 80% of agent tooling everyone was paying their team to reimplement is now table stakes.

The consequence is that you can give an agent the ability to do real work with about ten lines of configuration. The flip side is that the differentiator moved up a layer. The value isn’t in having Read. It’s in what the agent does with it.

Anything outside the built-in toolbox plugs in through MCP servers. The registry has grown nearly 8x since early 2025, and every major model vendor now ships first-party support. The picture in 2026 is more layered than that, though. A lot of what used to call for an MCP server is now better served by the agent invoking a CLI through Bash and wrapping the recipe in a skill. Benchmarks put CLI-based tool calls at a fraction of the context cost of equivalent MCP calls, with fewer round-trips and fewer failure modes. MCP still earns its place for protocol-heavy work like browser control, OAuth flows, and streaming services, but it stopped being the automatic answer to “how do I give my agent a new capability.”

Skills replaced tool stuffing. The old way was to register every tool the agent might need at startup, eating context every turn whether the agent used the tool or not. A hundred tools meant a heavy system prompt before the agent had thought about anything. The skills pattern flips that. A skill is a small markdown package with a name and a one-line description. The agent sees the description (around 100 tokens) and only loads the body when it decides the skill is relevant. A hundred skills no longer means a hundred tools’ worth of context tax. Anthropic frames this as progressive disclosure: because the body only loads on demand, the amount of content you can bundle into a single skill is effectively unbounded.

Progressive disclosure isn’t a new idea. What’s new is that the agent harness now treats it as the default loading strategy instead of something you have to engineer.
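To make the loading strategy concrete, here is a minimal sketch of progressive disclosure. It is not the Claude Agent SDK's implementation; the `Skill` shape and both function names are hypothetical. The point is only that the per-turn index contains descriptions, and a body is read once the agent selects that skill.

```typescript
// A skill: a short description that is always visible, and a body loaded on demand.
interface Skill {
  name: string;
  description: string; // one line, always in context
  loadBody: () => string; // full instructions, read only when selected
}

// What the agent sees every turn: names and descriptions only.
function buildIndex(skills: Skill[]): string {
  return skills.map((s) => `${s.name}: ${s.description}`).join("\n");
}

// What gets added to context once the agent picks a skill.
function activate(skills: Skill[], name: string): string {
  const skill = skills.find((s) => s.name === name);
  if (!skill) throw new Error(`unknown skill: ${name}`);
  return skill.loadBody();
}

// Counts body loads so the lazy behavior is observable.
const loads = { count: 0 };
const skills: Skill[] = [
  {
    name: "rotate-keys",
    description: "Rotate service credentials safely.",
    loadBody: () => {
      loads.count += 1;
      return "Step 1: ...\nStep 2: ...";
    },
  },
  {
    name: "resize-db",
    description: "Resize a managed database.",
    loadBody: () => {
      loads.count += 1;
      return "Step 1: ...";
    },
  },
];

const index = buildIndex(skills); // no body has been loaded yet
const body = activate(skills, "rotate-keys"); // exactly one body loaded
```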

RAG got demoted. This is the change with the biggest blast radius and the smallest amount of commentary. A year ago, “we need to add RAG” was the reflex answer when somebody asked how an agent would handle a corpus. Today that question splits three ways. If the corpus fits in the context window, put it in. If the agent can grep the filesystem, let it grep. If the corpus is genuinely too large for either, vector search is still right, but you’ll find that’s a smaller set of cases than it used to be. You can see this in the coding agents that already ship today. Cursor, Claude Code, and Devin lean on grep, find, and direct file reads more than vector search. LlamaIndex’s own writing on agentic retrieval is one of the clearer reads on where this is going.

Vector search didn’t get worse. The context around it improved enough that it stopped being the right first move.
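The three-way split above can be written down as a small decision function. This is a sketch, not a rule from any SDK: the type names are ours, and reserving half the context window for everything other than the corpus is an arbitrary assumption you would tune.

```typescript
type Retrieval = "in-context" | "grep" | "vector-search";

interface Corpus {
  tokens: number; // estimated size of the corpus in tokens
  onFilesystem: boolean; // can the agent reach it with grep, find, and file reads?
}

function chooseRetrieval(corpus: Corpus, contextWindowTokens: number): Retrieval {
  // ASSUMPTION: leave half of the window for instructions, tool output, and the answer.
  const budget = contextWindowTokens * 0.5;
  if (corpus.tokens <= budget) return "in-context";
  if (corpus.onFilesystem) return "grep";
  return "vector-search";
}

// Three illustrative cases against a 200,000-token window.
const smallDocs = chooseRetrieval({ tokens: 40000, onFilesystem: false }, 200000);
const bigRepo = chooseRetrieval({ tokens: 5000000, onFilesystem: true }, 200000);
const bigRemoteCorpus = chooseRetrieval({ tokens: 5000000, onFilesystem: false }, 200000);
```

Vector search is the last branch, which matches the argument: it is still the right answer, only for fewer cases.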

Taken together, what got pulled into the SDK is the middle of an agent project: the tools layer, the retrieval layer, and the loop. What’s left for the team is the system prompt, the skills, and the policies around what the agent is allowed to do.

When you still need a framework

The first reaction to a lot of this is to declare that frameworks are over. They aren’t, but the cases where you reach for one have narrowed.

Pydantic AI is still the right choice when you want strong typing, deterministic output schemas, and an evaluation loop that matches how the rest of your Python codebase already thinks. LangGraph is still the right choice when your problem is genuinely a graph of agent states with branching and human approval steps. OpenAI’s Agents SDK is built around explicit handoffs between agents and earns its place when that pattern fits how you want to decompose the work. CrewAI is the fastest path I’ve seen for prototyping a multi-agent system, as long as you can live with its opinions. Any team running production traffic across multiple model providers is going to want a routing layer that the official SDK from any single vendor isn’t going to give them. Anthropic’s own writing on building effective agents lands in the same place: start with the simplest thing, add complexity only when the problem demands it.

The mental model that works for me: start with the SDK, reach for a framework when you outgrow it. “Outgrow” usually means one of four things:

  • Multi-provider routing. You’re running production traffic across more than one model vendor and need a routing layer the official SDKs don’t ship.

  • Multi-agent orchestration. Your problem genuinely decomposes into separate agents with handoffs, branching, or human approval steps.

  • Deterministic typing. You need strong schemas and validation around inputs and outputs, and the rest of your codebase already thinks that way.

  • Production observability. You need eval loops, replay, or tracing beyond what the SDK provides out of the box.

If none of those four are biting, the SDK is probably enough, and adding a framework on top is a layer you’ll regret in six months.

Where this lands for infrastructure work

Two things from the new agent shape map cleanly onto infrastructure work. The first is that “built-in tools plus governed actions” is the model an IaC platform was already running. The SDK assumes the agent has tools that do real work. The platform assumes those tools have policies, audit logs, and short-lived credentials around them. Those assumptions stack.

The second is that a state graph is already structured context. You don’t need to chunk it. You don’t need to embed it. An agent reasoning over a Pulumi stack can grep its way through the program graph the same way it greps a codebase, and the answers are grounded in the same source of truth the rest of your platform uses. I wrote the deeper version in Grounded AI: Why Neo Knows Your Infrastructure. The dark-factory and sprawl posts (The Dark Factory Pattern for Infrastructure and Agent Sprawl Is Here. Your IaC Platform Is the Answer.) are the places to go if you want to push on this further.

Start with the SDK

A year ago, an agent project was 80% glue code and 20% the thing the agent actually did. On most projects today that ratio is flipped. If you’ve been sitting on an agent idea, build it the SDK way first and reach for a framework only when you hit something the SDK genuinely can’t do. Most teams will be surprised how often they don’t.

There’s one agent you don’t have to build at all. Pulumi Neo is the same SDK-first shape applied to the IaC slice: tools that reason directly over your state graph, governed by the controls the rest of your platform already runs on. Save your own SDK time for the agents only you can build.

The original dark factory was Fanuc’s robotics plant in Oshino, Japan, where the lights are off because nobody is on the floor. Robots build robots. Parts move through the line for weeks at a time without a person walking past them.

The same pattern is now showing up in software. Three engineers at StrongDM shipped roughly 32,000 lines of production code without writing or reviewing any of it. Stripe’s “Minions” agent system merges over a thousand pull requests every week. In January, Dan Shapiro of Glowforge published a five-level autonomy ladder that landed cleanly enough to become the shorthand most people now use, and BCG put out a piece calling it the dark software factory.

Almost every public writeup so far is about application code. The harder question is what this looks like for infrastructure.

What a dark factory actually is

Shapiro’s ladder is the cleanest framing I’ve seen. He borrows it from the SAE’s self-driving levels, and it fits surprisingly well:

| Level | What it is | Driving analogy |
| --- | --- | --- |
| 0 | Spicy autocomplete | Stick shift; you do everything. |
| 1 | Coding intern (boilerplate) | Cruise control. |
| 2 | Junior developer (interactive pair) | One hand on the wheel. |
| 3 | AI writes the majority; you review every PR | Eyes still on the road. |
| 4 | Spec-driven; agent runs unattended for hours; you review later | Sleeping at the wheel, you can still wake up. |
| 5 | Dark factory; no human review of code before production | No steering wheel at all. |

Most teams are at level 2 or 3. A few of the more aggressive ones are at 4. Level 5 is the experiment. Most teams won’t get there safely, and probably shouldn’t try to. The interesting design question is what has to be true for level 5 to be safe at all, and that question gets sharper when the thing being shipped is infrastructure.

A dark factory is not a coding harness. A harness is the framework an agent runs inside; the dark factory is the surrounding system that makes a harness’s output mergeable without a human reading the diff. Copilot and Cursor sit at the other end: interactive, the human stays in the loop on every keystroke. The dark factory takes the human out of the per-change loop entirely and puts them at the top, writing the spec and the acceptance criteria.

The wall between generator and validator

Strip the dark factory down to its layers and there are four of them.

flowchart LR
    A[Inputs Humans] --> B[Code Generation Autonomous]
    B --> C[Validation Autonomous, isolated]
    C -->|pass| D[Merge & Deploy Autonomous + existing CI/CD]
    C -->|fail| B
    A -.->|holdout scenarios generator never sees these| C

The single most important rule is that Code Generation and Validation must be completely isolated. The generator never sees the acceptance scenarios. A separate evaluator does, and it judges the generator’s output against scenarios the generator could not have memorized.

The reason is sycophancy. LLMs are too eager to agree with their own prior turns and too willing to declare victory on something they just produced. Without isolation, the same model that wrote the change is the one telling you it’s fine. The practical concern is direct: a test stored in the same codebase as the implementation will get lazily rewritten to match the code, not the other way around. It isn’t malice; it’s the agent doing exactly what it was asked, badly. The wall is what stops that.

StrongDM’s pattern for this is holdout scenarios: plain-English BDD acceptance tests stored where the generator cannot reach them. Each scenario runs three times against an ephemeral deployment, two of three must pass, and the overall pass rate has to clear 90% before the change moves forward. If the generator fails, it gets a one-line failure message (“SQL Injection Detection failed: endpoint returned 500”), not the scenario text. It cannot game the test.

Without that wall, you don’t have a quality gate. You have theater.
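The gate arithmetic is simple enough to sketch. The version below assumes the 90% threshold is inclusive and that the pass rate is counted over scenarios, each of which passes when at least two of its three runs pass. The type and function names are ours, not StrongDM's. Note that the feedback carries only a scenario label and a one-line symptom, never the scenario text.

```typescript
interface ScenarioRun {
  name: string; // short label, safe to show the generator
  runs: boolean[]; // one entry per run against the ephemeral deployment
  failureSummary?: string; // one-line symptom, never the scenario text
}

interface GateResult {
  passed: boolean;
  passRate: number;
  feedback: string[]; // the only thing sent back to the generator
}

function evaluateGate(scenarios: ScenarioRun[], minPassRate = 0.9): GateResult {
  // A scenario passes when at least two of its runs pass.
  const failed = scenarios.filter((s) => s.runs.filter(Boolean).length < 2);
  const passedCount = scenarios.length - failed.length;
  // An empty scenario set never clears the gate.
  const passRate = scenarios.length === 0 ? 0 : passedCount / scenarios.length;
  return {
    passed: passRate >= minPassRate,
    passRate,
    feedback: failed.map(
      (s) => `${s.name} failed: ${s.failureSummary ?? "no details"}`,
    ),
  };
}

const gate = evaluateGate([
  { name: "Bucket Is Private", runs: [true, false, true] },
  {
    name: "SQL Injection Detection",
    runs: [false, false, true],
    failureSummary: "endpoint returned 500",
  },
]);
```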

Why infrastructure is the harder version

Application code factories can lean on tests, linters, and type checkers. Infrastructure adds blast radius, drift, secrets, irreversible actions, and multi-region state. A code dark factory shipping a broken UI causes a bad user experience. An infrastructure dark factory shipping a broken IAM policy ends in a postmortem.

A few things make this manageable on Pulumi specifically.

The orchestrator does not need to be invented. The Pulumi Automation API is the engine exposed as an SDK for Python, TypeScript, Go, .NET, and Java, which is the same surface a dark factory orchestrator runs on. Credentials don’t have to be long-lived: ESC and OIDC issue short-lived ones per run, so the agent never sees a static secret.

Policy doesn’t have to be probabilistic: CrossGuard enforces deterministic rules at preview time. Execution doesn’t have to happen on a laptop: Pulumi Cloud Deployments runs pulumi up inside a governed runner with audit logs and approval rules already wired. And the reasoning layer doesn’t have to start from scratch: Pulumi Neo is grounded in your state graph and ships with three modes (Auto, Balanced, Review) that line up cleanly with Shapiro’s levels 5, 4, and 3.

That doesn’t make Pulumi a dark factory by itself. It means the parts that an application-code factory has to build from scratch are pieces a Pulumi shop already has: a credential broker, a policy engine, a governed runner, a state-aware reasoning layer, an audit trail.

And one more piece nobody talks about: pulumi preview produces a clean, deterministic validation artifact, and CrossGuard evaluates that artifact without ever seeing the conversation that produced the program. That’s the same context-free judgment the holdout pattern depends on, applied at the policy layer instead of the acceptance-test layer. For infrastructure, half the wall is already built.

The interesting work is the part that nobody ships in a box.

The interesting work

What no platform ships for you is the wall: the holdout scenarios for infrastructure, the isolated evaluator that runs them, and the agreement on which stacks are even allowed to run lights-out.

The happy-path orchestrator is small. It pulls a spec, runs preview, hands the preview to an isolated evaluator (with its own credentials and its own access to the cloud, no access to the generator’s prompt or output), and branches on the verdict. Auto mode runs up immediately. Balanced mode submits a deployment that requires approval. Review mode opens a PR for a human. Every branch records a stack version traceable in the audit log. Retries, observability, secret rotation, and the rest of the production-grade plumbing add up to real code, but the shape is small.
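The branching step reduces to a lookup. The sketch below covers only that decision; the action names are hypothetical labels for steps you would implement against the Automation API and Pulumi Cloud Deployments, and sending a failed verdict back to the generator follows the fail edge in the flowchart above.

```typescript
type Mode = "auto" | "balanced" | "review";

type Action =
  | "run-up"
  | "submit-deployment-for-approval"
  | "open-pull-request"
  | "return-to-generator";

// One action per mode, taken only when the isolated evaluator passes the preview.
const actionForMode: Record<Mode, Action> = {
  auto: "run-up",
  balanced: "submit-deployment-for-approval",
  review: "open-pull-request",
};

function nextAction(verdictPassed: boolean, mode: Mode): Action {
  // A failed verdict never reaches the deploy branches.
  if (!verdictPassed) return "return-to-generator";
  return actionForMode[mode];
}

const autoPass = nextAction(true, "auto");
const reviewPass = nextAction(true, "review");
const autoFail = nextAction(false, "auto");
```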

The wall is the part that takes a week to get right. You write five plain-English scenarios for one stack (“after pulumi up, the bucket is private, has SSE-KMS, lives in eu-west-1, and is tagged owner=team-x”) and a janky evaluator that runs preview and up against an ephemeral copy, queries the cloud, and asks a separate model whether the resulting state satisfies the scenario. Triple-run, 90% pass gate. Then you watch it for a few weeks before you let anything auto-apply.

A four-phase rollout

This is the same path the application-code factories walked, with the gates tightened.

Phase 1: better context, this afternoon

Write an AGENTS.md for your most active stack repo. Pulumi Neo reads it natively, as do most coding agents. While you’re there, look at your CrossGuard rules and rewrite the error messages as instructions. Not “S3 bucket has no encryption” but “S3 bucket has no encryption. Set serverSideEncryptionConfiguration with SSE-KMS to fix.” That single change is the difference between an agent flailing and an agent fixing the policy violation on the first try. Wire pulumi preview as a build-before-push gate so PRs don’t show up just to fail CI.
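As an illustration of the message rewrite, here is a plain function that pairs the problem with its fix. It is a simplified stand-in and not the CrossGuard policy API: the `BucketArgs` type is hypothetical and models only the one property named above.

```typescript
// Simplified stand-in for a bucket's inputs; not the real AWS provider type.
interface BucketArgs {
  serverSideEncryptionConfiguration?: object;
}

// Returns undefined when compliant, otherwise a message that says what to change.
function checkEncryption(args: BucketArgs): string | undefined {
  if (args.serverSideEncryptionConfiguration !== undefined) return undefined;
  const problem = "S3 bucket has no encryption.";
  const fix = "Set serverSideEncryptionConfiguration with SSE-KMS to fix.";
  return `${problem} ${fix}`;
}

const violation = checkEncryption({});
const compliant = checkEncryption({ serverSideEncryptionConfiguration: {} });
```

Keeping the problem and the fix as separate strings makes it easy to audit that every rule has both halves.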

Phase 2: spec-driven with holdouts, this week

Pick one stack with a small blast radius. A review-stack lifecycle is ideal. Write five plain-English holdout scenarios for it and the janky evaluator above. Humans still approve every PR. Don’t auto-merge yet. You’re earning the data, not declaring trust.

Phase 3: take the human out of the merge

Only after the three measurable gates hold over twenty PRs (scenario pass rate above 90%, false positive rate below 5%, human override rate below 10%) flip auto-apply on for that one stack. Add a weekly drift sweep that goes through the same scenario gate as everything else.
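The three gates are easy to encode so the decision to flip auto-apply is mechanical instead of a judgment call. A minimal sketch, with field names of our own choosing and rates expressed as fractions:

```typescript
interface GateStats {
  pullRequests: number; // PRs observed for this stack
  scenarioPassRate: number; // fraction of holdout scenarios passing
  falsePositiveRate: number; // fraction of validator approvals that were wrong
  humanOverrideRate: number; // fraction of verdicts a human reversed
}

function readyForAutoApply(stats: GateStats): boolean {
  return (
    stats.pullRequests >= 20 &&
    stats.scenarioPassRate > 0.9 &&
    stats.falsePositiveRate < 0.05 &&
    stats.humanOverrideRate < 0.1
  );
}

const healthy = readyForAutoApply({
  pullRequests: 24,
  scenarioPassRate: 0.95,
  falsePositiveRate: 0.02,
  humanOverrideRate: 0.04,
});
const tooEarly = readyForAutoApply({
  pullRequests: 12,
  scenarioPassRate: 0.99,
  falsePositiveRate: 0.0,
  humanOverrideRate: 0.0,
});
```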

Phase 4: lights out

Expand the auto-apply flag to every stack with strong scenario numbers. Wire your issue tracker so tickets tagged infra:fix flow through the pipeline. Mock the cloud APIs that are slow or flaky enough to make scenario evaluation expensive. At this point the orchestrator is configuration, not architecture.

What could go wrong

None of these have clean fixes. The mitigations below reduce risk; they don’t eliminate it. Any team running level 5 should expect to eat one or two of these in the first year.

The validator approves a bad change. This is the obvious one. The standard mitigation is layered: triple-run each scenario with a 2-of-3 threshold, apply a 90% gate over the run set, have a human audit the first fifty auto-applied changes, and keep your existing policies running after the validator says yes.

The agent gets a destroy permission it shouldn’t have. There’s a class of operations that should not sit in the autonomous loop yet: dropping a database, deleting a hosted zone, rotating a root key, anything that crosses a regulated data boundary. Scope what each agent identity can do at the credential layer, require human approval for anything destructive, and start every stack at Review mode. Tag changes, security-group adjustments, and instance resizes can run autonomously today. Release-branch cuts and config promotions can probably run by next quarter. The destructive class earns its way in over months.

You need all three layers: policy, approvals, and a human kill switch. Approvals without policy means anything a human approves in a hurry ships. Policy without approvals means a sufficiently clever spec eventually finds the gap. Both without a human kill switch means an incident at 3 a.m. has nobody to escalate to.

Costs blow up. Cap retries at three per spec, alert on token spend per run, and remember that StrongDM reported roughly $1,000 per day per engineer-equivalent. That’s still cheaper than a salary, but only if you put the cap in place before you find out.
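A retry cap and a spend alert fit in one small loop. The sketch below reads "three retries" as one initial attempt plus up to three more, which is an assumption, and the token threshold is a made-up placeholder. The `attempt` callback stands in for one generate-and-validate cycle and is synchronous here only to keep the example short.

```typescript
interface Attempt {
  passed: boolean;
  tokensUsed: number;
}

interface RunSummary {
  passed: boolean;
  attempts: number;
  tokensUsed: number;
  spendAlert: boolean;
}

// `retry` is 0 for the first try, then 1..maxRetries for each retry.
function runWithCap(
  attempt: (retry: number) => Attempt,
  maxRetries = 3,
  tokenAlertThreshold = 500000, // HYPOTHETICAL budget; set your own
): RunSummary {
  let tokensUsed = 0;
  let attempts = 0;
  let passed = false;
  for (let retry = 0; retry <= maxRetries && !passed; retry++) {
    const result = attempt(retry);
    attempts += 1;
    tokensUsed += result.tokensUsed;
    passed = result.passed;
  }
  return { passed, attempts, tokensUsed, spendAlert: tokensUsed > tokenAlertThreshold };
}

// A spec that never passes stops after the cap instead of looping forever.
const alwaysFails = runWithCap(() => ({ passed: false, tokensUsed: 1000 }));
// A spec that passes on its first retry stops early.
const passesOnRetry = runWithCap((retry) => ({ passed: retry === 1, tokensUsed: 1000 }));
```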

Where to start

Most of what a dark factory needs already exists in any reasonably mature platform. Whatever you have for state, policy, credentials, audit, and a deployment runner is the substrate. The interesting work is not building the factory. It’s the wall: the holdout scenarios that make the gap between “the model says it’s fine” and “the system is actually fine” mean something.

For most teams, Phase 1 alone is the win. Full Level 5 may stay out of reach indefinitely, and that’s fine. The path itself forces useful work: clearer specs, named bottlenecks, the deterministic gates humans had been running in their heads.

Write an AGENTS.md and five holdout scenarios for one stack this week. That’s enough to get a real signal on whether the pattern fits your team. The rest of the path is the same problem the application-code factories have already worked through, with the gates set tighter.

Custom VCS is a new Pulumi Cloud integration that connects any Git or Mercurial version control system to Pulumi Deployments using webhooks and centrally managed credentials. Pulumi Cloud already has native integrations with GitHub, GitLab, and Azure DevOps, but if your team uses a self-hosted or third-party VCS, you’ve been limited to manually configuring credentials per stack with no webhook-driven automation. Custom VCS closes that gap.

The problem

Many teams run self-hosted or third-party Git servers that Pulumi Cloud doesn’t have a native integration for, and some teams still use Mercurial. Until now, their only option was the raw git source approach: embedding credentials directly in each stack’s deployment settings, with no way to trigger deployments automatically on push, and no support for Mercurial at all.

This meant:

  • No push-to-deploy: Every deployment had to be triggered manually or through a separate CI pipeline.

  • Scattered credentials: Each stack configured its own credentials independently, with no centralized management.

  • No org-level integration: There was no shared configuration that multiple stacks could reference.

How Custom VCS works

Custom VCS integrations introduce an org-level integration type that works with any Git or Mercurial server. The setup has three parts:

Credentials through ESC: Instead of OAuth flows, you store your VCS credentials (a personal access token, SSH key, or username/password) in a Pulumi ESC environment. The same credential structure works for both Git and Mercurial. The integration references this environment by name and resolves credentials at deployment time. Multiple stacks can share the same credentials without duplicating secrets.

Manual repository registration: You add repositories to the integration by name. Pulumi joins the repository name with the integration’s base URL to form clone URLs. There’s no auto-discovery, so you control exactly which repositories are available.

Webhook-driven deployments: Pulumi provides a webhook endpoint and an HMAC shared secret. You configure your VCS server to POST a JSON payload on push events, and Pulumi automatically triggers deployments for matching stacks. The webhook supports branch filtering and optional path filtering.
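Two of these parts are easy to picture in code: forming a clone URL and deciding whether a push should trigger a stack. The sketch below is illustrative only. The join rule, the `PushEvent` and `StackSource` shapes, and prefix-based path matching are all assumptions; the real payload format and HMAC verification are covered in the Custom VCS documentation and are not reproduced here.

```typescript
// Join the integration's base URL with a registered repository name.
// ASSUMPTION: a plain path join; the exact rule Pulumi applies may differ.
function cloneUrl(baseUrl: string, repoName: string): string {
  const base = baseUrl.replace(/\/+$/, "");
  const repo = repoName.replace(/^\/+/, "");
  return `${base}/${repo}`;
}

// HYPOTHETICAL shapes: the real webhook payload is defined in the Custom VCS docs.
interface PushEvent {
  repo: string;
  branch: string;
  changedPaths: string[];
}

interface StackSource {
  repo: string;
  branch: string;
  pathPrefixes?: string[]; // optional path filter
}

function shouldDeploy(event: PushEvent, stack: StackSource): boolean {
  // Branch filtering: the push must target the stack's repository and branch.
  if (event.repo !== stack.repo || event.branch !== stack.branch) return false;
  // Path filtering is optional; with no filter, every matching push deploys.
  const prefixes = stack.pathPrefixes;
  if (prefixes === undefined || prefixes.length === 0) return true;
  return event.changedPaths.some((path) =>
    prefixes.some((prefix) => path.startsWith(prefix)),
  );
}

const url = cloneUrl("https://git.example.com/", "team/infra");
const stack: StackSource = {
  repo: "team/infra",
  branch: "main",
  pathPrefixes: ["stacks/payments/"],
};
const triggers = shouldDeploy(
  { repo: "team/infra", branch: "main", changedPaths: ["stacks/payments/index.ts"] },
  stack,
);
```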

What’s supported

Custom VCS focuses on the deployment automation use case. Here’s how it compares to native integrations:

| Capability | Native integrations | Custom VCS |
| --- | --- | --- |
| Push-to-deploy | Yes | Yes |
| Path filtering | Yes | Yes |
| PR/MR previews | Yes | No |
| Commit status checks | Yes | No |
| PR comments | Yes | No |
| Review stacks | Yes | No |

Features like PR comments, commit statuses, and review stacks require deep API integration with each VCS platform, so they aren’t available with Custom VCS. If your VCS provider is GitHub, GitLab, or Azure DevOps, we recommend using the native integration for the full feature set.

Neo support

Neo, Pulumi’s AI assistant, works with Custom VCS integrations for repository operations that don’t depend on VCS-specific APIs. Neo can clone and push to Git and Mercurial repositories registered with your Custom VCS integration using the credentials from the integration’s ESC environment. Neo cannot open pull requests or create new repositories on Custom VCS servers at this time. Those operations require APIs unique to each VCS platform and are only available through native integrations.

Get started

To set up a Custom VCS integration:

  • Navigate to Management > Version control in Pulumi Cloud.

  • Select Add integration and choose Custom VCS.

  • Provide a name, base URL, and ESC environment containing your credentials.

  • Add your repositories.

  • Configure your VCS server to send webhooks to the provided URL.

For the full setup guide including webhook payload format, HMAC signing, and credential configuration, see the Custom VCS documentation.

Neo already helps your team manage Pulumi infrastructure, but no infrastructure team works inside Pulumi alone. Pages come from PagerDuty, telemetry from Datadog or Honeycomb, follow-ups from Linear or Jira. Most of the job is shuttling context between those tools.

Today we’re launching the Integration Catalog for Pulumi Neo: one place to connect Neo to the tools your team already uses, so your agent has the context it needs to help.

Six integrations in the launch catalog

Neo ships with six integrations at launch, each exposed to the agent through the Model Context Protocol (MCP):

  • Atlassian — Jira issues, Confluence pages, project context

  • Datadog — metrics, logs, monitors

  • Honeycomb — traces and observability queries

  • Linear — issue tracking and project workflows

  • PagerDuty — incidents, on-call schedules, escalations

  • Supabase — database management and edge functions

Each integration is a remote MCP server. Neo calls the integration through a structured tool protocol and only sees the tools the vendor chooses to expose.

Neo in action: one task, many systems

A latency spike showed up in Datadog yesterday afternoon, and you want to know whether your deploy caused it.

You: Neo, our payments stack saw elevated p95 starting around 3pm yesterday. Did our deploy cause it? Check Datadog and Honeycomb.

Neo lines up the Pulumi update history for the payments stack against the latency and error-rate metrics in Datadog around the same window, then surfaces the top slow traces in Honeycomb to confirm the suspect change.

You: Open a Linear ticket on the platform team with the findings and link the offending update.

Neo opens the Linear issue with the summary, the Pulumi update URL, and a pointer to the Datadog dashboard, all without you leaving the chat or copy-pasting context between tabs.

How the Integration Catalog works

Admins configure credentials once. In your org’s Neo settings, open the Integration Catalog, pick an integration, and paste in an API token or service-account key.

Your team gets the capability immediately. No per-user setup, no extra OAuth flow for each developer, no asking platform to share a token in 1Password.

Credentials stay encrypted at rest. When a task runs, the service decrypts the configured credentials just long enough to hand them to the agent runtime as MCP server auth.

What’s coming next: CLI, OAuth, and access controls

This is the first cut. Here’s what we’re working on:

  • CLI integrations — give Neo access to command-line tools like kubectl, aws, gcloud, and az.

  • OAuth integrations — for providers whose hosted MCP servers only speak OAuth (Notion, Sentry, Vercel), and for orgs that want per-user credentials.

  • Per-integration access controls — team-scoped policies so admins can say “only the platform team can let Neo touch PagerDuty.”

Try it out

The Integration Catalog is available now for Neo-enabled organizations. Open your org’s Neo settings, head to the Integrations tab, and connect the first tool you reach for when something breaks. The Neo integrations docs walk through the setup for each one.

As always, we’d love to hear what’s missing. File a feature request in pulumi-cloud-requests with the integration you want next. We’re prioritizing based on what teams actually use.

Happy building.

Policy authors who need external credentials or environment-specific configuration have had to hardcode values or manage them outside of Pulumi. Policy packs can now reference Pulumi ESC environments, bringing centralized secrets and configuration management to your policies.

The problem

Pulumi policy packs let you enforce rules across your infrastructure, but some policies need more than just the resource inputs they evaluate. A policy that validates resources against an external compliance API needs an API token. A cost-enforcement policy might need different spending thresholds for development and production environments. An access-control policy might need to reference an internal service registry.

Until now, these values had to be hardcoded in your policy group configuration or managed through a separate process entirely. This created several problems:

  • Security risk: Credentials stored in plain text in policy group config

  • Operational burden: Updating a credential meant touching every policy group that used it

  • No environment separation: The same values applied everywhere, with no way to vary configuration across environments

What’s new

Policy packs can now reference ESC environments, just like stacks already do. When you attach an ESC environment to a policy pack in a policy group, the values from that environment are available to your policies at runtime — whether you’re running preventative or audit policies.

This means your policy packs can use ESC for:

  • Secrets: API tokens, service credentials, and other sensitive values managed through ESC’s secrets management, including dynamic credentials from providers like AWS, Azure, and GCP

  • Configuration: Environment-specific thresholds, allowed regions, service allowlists, and other policy parameters that vary across environments

How it works

You configure ESC environment references on a policy pack within a policy group. At runtime, the values from those environments are resolved and made available to your policies through the policy pack’s configuration.

Here’s an example ESC environment that provides configuration to a compliance policy pack:

values:
  compliance:
    apiToken:
      fn::secret: xxxxxxxxxxxxxxxx
    costThreshold: 5000

  policyConfig:
    cost-compliance:
      maxMonthlyCost: ${compliance.costThreshold}
      apiEndpoint: https://compliance.example.com
      apiToken: ${compliance.apiToken}

The policyConfig property works just like pulumiConfig does for stacks. Values nested under each policy name are made available as configuration to that policy at runtime. Secrets remain encrypted and are only decrypted when the environment is resolved.

You can also use the environmentVariables property to inject values as environment variables into the policy runtime, following the same pattern as stack environment variables.
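To make the reference syntax in the example environment concrete, here is a minimal TypeScript sketch of how `${compliance.costThreshold}`-style references could resolve against the values tree. It is a simplified model for illustration, not ESC's actual evaluator: the `lookup` and `resolve` helpers are hypothetical, secrets are stood in for by a plain placeholder string, and only whole-string references are handled.

```typescript
// Simplified model of reference resolution; NOT ESC's real evaluator.
type Value = string | number | { [key: string]: Value };

// Look up a dotted path such as "compliance.costThreshold" in the values tree.
function lookup(root: Value, path: string): Value {
  let current: Value = root;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || !(key in current)) {
      throw new Error(`unresolved reference: ${path}`);
    }
    current = current[key];
  }
  return current;
}

// Replace any string that is exactly "${a.b}" with the value it points at.
// No cycle detection: a self-referencing value would recurse forever.
function resolve(node: Value, root: Value): Value {
  if (typeof node === "string") {
    const match = /^\$\{([^}]+)\}$/.exec(node);
    return match ? resolve(lookup(root, match[1]), root) : node;
  }
  if (typeof node === "object") {
    const out: { [key: string]: Value } = {};
    for (const [key, child] of Object.entries(node)) {
      out[key] = resolve(child, root);
    }
    return out;
  }
  return node;
}

// Mirrors the example environment; the secret is a placeholder string here.
const values: Value = {
  compliance: { apiToken: "<secret>", costThreshold: 5000 },
  policyConfig: {
    "cost-compliance": {
      maxMonthlyCost: "${compliance.costThreshold}",
      apiEndpoint: "https://compliance.example.com",
      apiToken: "${compliance.apiToken}",
    },
  },
};

const resolved = resolve(values, values);
```

After resolution, the `cost-compliance` block holds concrete values, which is the shape a policy would see as its configuration.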

Example: compliance API validation

Consider a policy that validates every new resource against an external compliance API before it can be provisioned. The API requires an authentication token and returns whether the resource configuration meets your organization’s compliance standards.

Before, the API token lived in the policy group configuration in plain text. Rotating the token meant updating every policy group. There was no audit trail for who accessed the credential, and no way to use different API endpoints for staging and production compliance checks.

After, the API token lives in an ESC environment. You get:

  • Centralized rotation: Update the token in one place and every policy group that references the environment picks up the change

  • Access controls: ESC’s role-based access controls govern who can view or modify the credential

  • Audit trail: Every access to the environment is logged

  • Environment separation: Use different ESC environments for different policy groups, so staging policies validate against a staging compliance endpoint while production policies use the production endpoint

Get started

To start using ESC environments with your policy packs:

  • Create an ESC environment with your policy configuration and secrets

  • Attach the environment to a policy pack in your policy group through the Pulumi Cloud console

  • Update your policies to read from the configuration values provided by the environment
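As a rough illustration of the last step, the sketch below shows policy logic that takes its threshold from configuration instead of a hardcoded constant. The types and the `checkCost` function are hypothetical stand-ins, not the policy SDK's API; in a real policy pack the config object would come from the SDK's configuration accessor, and the cost estimate is assumed to be available from somewhere else.

```typescript
// Hypothetical shapes for illustration; not the policy SDK's types.
interface CostComplianceConfig {
  maxMonthlyCost: number;
  apiEndpoint: string;
  apiToken: string;
}

interface ResourceEstimate {
  urn: string;
  estimatedMonthlyCost: number; // assumed to be computed elsewhere
}

// Returns violation messages; an empty list means the resource passes.
function checkCost(resource: ResourceEstimate, config: CostComplianceConfig): string[] {
  if (resource.estimatedMonthlyCost > config.maxMonthlyCost) {
    return [`${resource.urn} exceeds the monthly cost limit of ${config.maxMonthlyCost}`];
  }
  return [];
}

// Two environments supply different thresholds to the same policy code.
const production: CostComplianceConfig = {
  maxMonthlyCost: 5000,
  apiEndpoint: "https://compliance.example.com",
  apiToken: "<from ESC>",
};
const development: CostComplianceConfig = { ...production, maxMonthlyCost: 500 };

const database: ResourceEstimate = {
  urn: "urn:pulumi:dev::app::aws:rds/instance:Instance::db",
  estimatedMonthlyCost: 1200,
};

const prodViolations = checkCost(database, production);
const devViolations = checkCost(database, development);
```

The policy code is identical in both cases; only the attached environment changes the outcome.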


Somewhere in your company right now, a developer is building an AI agent. Maybe it’s a release agent that cuts tags when tests pass. Maybe it’s a cost agent that shuts down idle EC2 overnight. It’s running, it’s in production, and there’s a decent chance the platform team doesn’t know it exists.

This isn’t a thought experiment. OutSystems just surveyed 1,900 IT leaders and the numbers are rough: 96% of enterprises run AI agents in production today, 94% say the sprawl is becoming a real security problem, and only 12% have any central way to manage it. Twelve percent. You can read the full report here.

The real question is where those agents run. Inside the platform you’ve already built, or somewhere off to the side where nobody on the platform team can see them.

The new platform tension

Platform teams have always had two jobs that pull in opposite directions. Let developers ship without waiting on a ticket. Keep the infrastructure coherent while they do. Golden paths, review stacks, a catalog of components that don’t fight each other.

Agents break the second half of that deal.

A developer with a sharp prompt can spin up an SRE agent that watches a queue, a release agent that cuts tags when the test suite goes green, or a cost agent that kills idle infra at 2 a.m. That’s useful. It’s also running on your production cloud account, using credentials you never provisioned, writing to systems you never approved, and the only audit trail is whatever the developer remembered to log. The Salesforce 2026 Connectivity Benchmark pegs the average enterprise at twelve agents today, projected to grow 67% over the next two years. Most teams aren’t ready for one, let alone twenty.

This is the same shape as every sprawl problem before it. I wrote about the last one in How Secrets Sprawl Is Slowing You Down, and the pattern keeps repeating. When something useful gets cheap, it proliferates. When it proliferates without structure, it becomes a liability.

The clock is also ticking on the compliance side. The EU AI Act’s high-risk obligations kick in on 2 August 2026. Colorado’s AI Act goes live on 30 June 2026 after last year’s delay. A folder of unreviewed agent scripts isn’t going to hold up against either of those.

Three ways to respond (only one of them works)

There are roughly three paths from here.

Do nothing. Accept the sprawl and hope nothing catches fire. This is the default, and it’s also how you end up explaining to an auditor why some finance agent moved data between three systems last Thursday and nobody remembers which prompt triggered it.

Mandate centralization. Tell developers every agent has to be registered and approved before it runs. This sounds responsible on a slide, and it falls apart inside a sprint. Developers route around friction. If the official path takes a week and the unofficial path takes an afternoon, the unofficial path wins, and you’ve just pushed the sprawl underground where you can’t see it anymore.

Make the platform the obvious path. Build the thing developers actually want to use. A place where an agent inherits the guardrails, credentials, policies, and audit trail by default, because that’s what’s on offer. Adoption becomes a side effect of shipping something good.

Option three is the only one that scales. It’s also the one where most platform teams look at their existing stack and assume they need to build a pile of new scaffolding. I don’t think they do, and the rest of this post is why.

The seven things an AI agent needs from your platform

An agent needs seven concrete things from the platform it runs on. Each one maps to a Pulumi primitive you already own.

1. A trustworthy context lake

Agents are only as good as the context they can reason over. Drop a generic LLM into your cloud account and you’ll get plausible-sounding nonsense, because the model has never seen your environment. What you actually need is a grounded source of truth: what resources exist, how they relate, which stack owns what, which version is running where.

Pulumi state is already that. Your program graph, your stack outputs, your resource metadata, all of it adds up to a structured record of what you’ve actually deployed. Pulumi Neo reasons directly over that graph, which is why it can tell you why a deployment drifted instead of guessing. I’ve written the long version of that argument elsewhere. Short version: you already have the context lake. Point agents at it.

2. Pre-cleared integrations

An agent that needs to touch five systems shouldn’t need five separate credential dances. That’s where credential sprawl starts. Every agent gets a long-lived key, every key ends up in somebody’s .env, and every rotation turns into an incident.

The Pulumi surface here is the 200+ providers plus Pulumi ESC handling dynamic credentials through OIDC. An agent doesn’t ask for an AWS access key. It asks ESC for a short-lived, scoped token bound to the environment it’s allowed to operate in, and the token expires when the task ends. No static keys, no rotation pain, no awkward postmortem about how something got committed to GitHub. The ESC patterns I walked through in the Claude skills post work just as well for an autonomous agent as they do for a human developer, which is really the whole point.

3. Governed actions

There’s a real difference between “an agent can see your infrastructure” and “an agent can change your infrastructure.” The second one is where you actually need structure. Pulumi Deployments gives you that structure: defined workflows, controlled triggers, running inside your Pulumi Cloud boundary instead of whatever environment the developer happened to spin up. The Automation API lets you build higher-order orchestration on the same primitives your developers already use.

The framing I keep coming back to goes like this. An agent shouldn’t call pulumi up directly. It should submit an action to a governed pipeline that runs pulumi up on its behalf, inside an environment you control, with a log trail and the guardrails already in place. Same effect, very different threat model.
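Here is a small TypeScript sketch of that framing. Everything in it is hypothetical (`GovernedPipeline`, `ActionRequest`, the guardrail signature); it models the idea of "submit, don't execute" and is not how Pulumi Deployments is implemented.

```typescript
// All names are hypothetical; this models the idea, not Pulumi Deployments.
interface ActionRequest {
  agent: string;
  stack: string;
  operation: "preview" | "update" | "destroy";
}

interface AuditEntry extends ActionRequest {
  outcome: "executed" | "rejected";
  reason: string;
}

// A guardrail returns a rejection reason, or null when the request is allowed.
type Guardrail = (request: ActionRequest) => string | null;

class GovernedPipeline {
  readonly auditLog: AuditEntry[] = [];
  private readonly guardrails: Guardrail[];
  private readonly execute: (request: ActionRequest) => void;

  constructor(guardrails: Guardrail[], execute: (request: ActionRequest) => void) {
    this.guardrails = guardrails;
    this.execute = execute;
  }

  // Agents call submit(); only the pipeline ever calls execute().
  submit(request: ActionRequest): boolean {
    for (const guardrail of this.guardrails) {
      const reason = guardrail(request);
      if (reason !== null) {
        this.auditLog.push({ ...request, outcome: "rejected", reason });
        return false;
      }
    }
    this.execute(request);
    this.auditLog.push({ ...request, outcome: "executed", reason: "all guardrails passed" });
    return true;
  }
}

// Usage: the executor stands in for "run pulumi up inside an environment you control".
const executed: string[] = [];
const pipeline = new GovernedPipeline(
  [(r) => (r.operation === "destroy" && r.stack.endsWith("/prod") ? "destroy on prod is blocked" : null)],
  (r) => { executed.push(`${r.operation} ${r.stack}`); },
);

const allowed = pipeline.submit({ agent: "release-agent", stack: "acme/app/dev", operation: "update" });
const blocked = pipeline.submit({ agent: "cost-agent", stack: "acme/app/prod", operation: "destroy" });
```

The agent never holds the executor. Every request, accepted or not, leaves an audit entry.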

4. Deterministic policy

Real governance lives outside the prompt. “Please don’t delete production” is a wish written into a system prompt, not an enforced control. And when an agent overrides your intent to do what it thought you meant, it’s behaving exactly the way the technology was designed to behave.

Pulumi Policies is the answer the IaC community landed on years ago: policy as code, written in a real programming language, evaluated deterministically at preview and update time. Disallow production RDS deletions. Require encryption at rest. Block S3 buckets with public ACLs. An agent running through Pulumi hits those gates whether it “wants” to or not, because the gates live in the pipeline and not in the prompt. This is the pillar most teams underweight, and it’s the first one most auditors ask about.
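To show what "deterministic" means in practice, the three example rules above can be written as plain predicates. This is a self-contained sketch with a hypothetical `ResourceChange` model; a real policy pack would use the policy SDK's own types, and the property names here (`storageEncrypted`, `acl`) are used as illustrative inputs.

```typescript
// Hypothetical change model for illustration; not the policy SDK's types.
interface ResourceChange {
  type: string; // e.g. "aws:rds/instance:Instance"
  operation: "create" | "update" | "delete";
  stack: string;
  props: Record<string, unknown>;
}

interface Rule {
  name: string;
  violates: (change: ResourceChange) => boolean;
}

const RDS = "aws:rds/instance:Instance";
const S3 = "aws:s3/bucket:Bucket";

const rules: Rule[] = [
  {
    name: "no-prod-rds-deletion",
    violates: (c) => c.type === RDS && c.operation === "delete" && c.stack === "prod",
  },
  {
    name: "require-encryption-at-rest",
    violates: (c) => c.type === RDS && c.operation !== "delete" && c.props["storageEncrypted"] !== true,
  },
  {
    name: "no-public-s3-acl",
    violates: (c) =>
      c.type === S3 &&
      c.operation !== "delete" &&
      (c.props["acl"] === "public-read" || c.props["acl"] === "public-read-write"),
  },
];

// Same input, same answer, every time: returns the names of violated rules.
function evaluate(change: ResourceChange): string[] {
  return rules.filter((rule) => rule.violates(change)).map((rule) => rule.name);
}
```

Nothing here depends on who or what proposed the change, which is the point: a human and an agent hit the same gate.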

5. An audit trail

When something goes wrong at 3 a.m. (and with enough agents running, something will), you need answers fast. What changed, who changed it, and why. Not just “which agent,” but which version of which agent, triggered by what event, authorized by which policy, touching which resources.

Pulumi Cloud’s activity log, the stack update history, and ESC audit logs already capture all of that. Every update is versioned. Every secret access is logged. Every policy evaluation is recorded. When an agent submits a change through your Pulumi pipeline, it inherits that audit surface for free. The alternative is reconstructing an incident from a mix of Slack messages, container logs, and developer memory, which is roughly the state most teams without a platform are in today.

6. A review process

Not every agent action should wait for a human. But agents do need a promotion path, the same way new platform components do. Experimental, then reviewed, then trusted, then autonomous. That’s exactly what pulumi preview, review stacks, and Deployments PR workflows already model for human contributors. An agent that wants to make a change should have to submit it the same way a junior engineer would. As a diff, with a plan, against a preview environment, until it earns the trust to skip steps.
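One way to picture the promotion path is as a small ladder with explicit rules for moving up and down. The sketch below is an assumption about how a team might encode it, not a Pulumi feature: the `nextLevel` function, the clean-run threshold of 10, and the "any violation resets to experimental" rule are all made up for illustration.

```typescript
// The four stages named above, in promotion order.
const levels = ["experimental", "reviewed", "trusted", "autonomous"] as const;
type TrustLevel = (typeof levels)[number];

// Hypothetical promotion rule: move up one step after enough clean runs,
// and drop back to the bottom on any policy violation.
function nextLevel(
  level: TrustLevel,
  cleanRuns: number,
  violations: number,
  requiredCleanRuns = 10,
): TrustLevel {
  if (violations > 0) {
    return "experimental";
  }
  const index = levels.indexOf(level);
  if (index < levels.length - 1 && cleanRuns >= requiredCleanRuns) {
    return levels[index + 1];
  }
  return level;
}
```

The exact thresholds matter less than the fact that promotion is a recorded decision with a rule behind it.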

This connects back to the pattern I laid out in Golden Paths: Infrastructure Components and Templates. Golden paths were never only for humans. They’re just paths, and agents can walk them too.

7. Human-in-the-loop approval

The last pillar is the one that keeps the other six honest. Some decisions shouldn’t be automated, full stop. Production rollbacks outside business hours. Destructive changes above a certain blast-radius threshold. Anything that touches a regulated data boundary. For those cases, you want a forced human checkpoint that the agent can’t route around.

Pulumi Deployments approvals already play that role for human changes. Pulumi Neo’s review steps add the AI-aware version: a structured plan, a diff, a named approver, and a record of what they decided and why. I walked through what this looks like in practice in Self-Verifying AI Agents. Short version: an agent that proposes is much safer than an agent that commits.
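The three cases listed above translate into a short predicate. This is a sketch under stated assumptions: the `ProposedChange` shape, the 9-to-5 business hours, and the blast-radius limit of 25 resources are all hypothetical values chosen for illustration.

```typescript
// Hypothetical change description; field names are illustrative only.
interface ProposedChange {
  kind: "update" | "rollback" | "destroy";
  environment: "dev" | "staging" | "production";
  resourcesAffected: number;
  touchesRegulatedData: boolean;
  hourOfDay: number; // 0-23, in the team's local time
}

const BLAST_RADIUS_LIMIT = 25; // assumed threshold

function requiresHumanApproval(change: ProposedChange): boolean {
  const outsideBusinessHours = change.hourOfDay < 9 || change.hourOfDay >= 17;
  const prodRollbackAfterHours =
    change.kind === "rollback" && change.environment === "production" && outsideBusinessHours;
  const largeDestructiveChange =
    change.kind === "destroy" && change.resourcesAffected > BLAST_RADIUS_LIMIT;
  return prodRollbackAfterHours || largeDestructiveChange || change.touchesRegulatedData;
}
```

Because the check lives in the pipeline and not in the agent, the agent has no way to decide it doesn't apply.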

Why IaC is the natural substrate for this

Step back from the seven pillars and look at what they have in common. Context, integrations, governed actions, deterministic policy, audit, review, approval. None of those are new problems that AI agents invented. They’re the problems infrastructure-as-code has been quietly solving for a decade, for human developers.

Every meaningful agent action ends up being a change, whether that’s to infrastructure, configuration, secrets, or state. IaC is the one layer in your stack that already treats change as the unit of work. Plan, preview, apply, record. If you want governance for agents and you don’t want to build it twice, the most efficient move is to route agent changes through the same substrate your humans already use.

I made the same point from a different angle in Token Efficiency vs Cognitive Efficiency: Choosing IaC for AI Agents. An IaC platform that models your world as a graph of typed resources is a much better reasoning surface for an agent than a stack of YAML or a bash script somebody wrote on a Friday. The structure is what makes it work.

What this means for the platform engineer

There’s a narrative floating around that AI is going to make platform engineers less relevant. I haven’t seen it hold up against an actual production environment. Every stat I’ve looked at points the other way. Gartner expects 70% of enterprises to deploy agentic AI as part of IT infrastructure and operations by 2029, up from less than 5% in 2025. LangChain’s State of Agent Engineering report already has 57% of teams running agents in production today. And Gartner projects that 80% of large software engineering orgs will have a platform team by end of 2026, up from 45% in 2022. More agents means more changes, more changes means more blast radius, and more blast radius means more need for the thing platform teams are uniquely equipped to provide.

Your classic responsibilities haven’t gone anywhere either. Golden paths, service catalogs, CI/CD, on-call rotations, all of that is still yours. Agents are an additional layer that needs the same discipline. The upside is that if your platform already runs on a mature IaC surface, you’re extending a muscle you’ve been building for years instead of growing a new one.

The developer-facing side matters too. A developer building an agent needs to know what’s available to them, needs templates that work on the first try, and needs to see what teammates have already built so they don’t start from a blank page. That’s the territory the Claude skills post and IDP Strategy: Self-Service Infrastructure That Balances Autonomy With Control cover. That’s the experience layer that makes developers actually choose your platform instead of routing around it. You need both sides working at once. The governance your security team cares about, and the experience your developers will actually reach for.

Close the window

The agents your developers are shipping this week are going to outlive the experiment that started them. Some of them will become critical. At least one will cause an incident. At least one will eventually show up in an audit. All of them are going to be easier to govern if they were built on your platform from day one than if you try to wrap policy around them later.

If you want the longer view on where this is going, AI Predictions for 2026: A DevOps Engineer’s Guide is the companion piece. If you want the developer-facing version of the grounding argument, Grounded AI is what to read next.

Either way, here’s where I land. The substrate for agent governance is already running in your stack. You’ve been pointing it at human changes for years. Now point it at the agents too.

See how Pulumi Neo governs agent actions

Pulumi Cloud now supports Bitbucket Cloud as a first-class VCS integration, joining GitHub, GitLab, and Azure DevOps. Connect your Bitbucket workspace to deploy infrastructure on every push, preview changes on pull requests, spin up ephemeral review stacks, and get AI-powered change summaries — all without an external CI/CD pipeline.

Deploy infrastructure from Bitbucket

Connect a Bitbucket repository to a stack and infrastructure deploys automatically when you push to your configured branch. Configure path filters so only relevant file changes trigger deployments, and manage environment variables and secrets directly in Pulumi Cloud. No external CI/CD pipeline required.

Every pull request gets an infrastructure preview showing exactly what will change before merging. Neo posts AI-generated summaries explaining what the changes mean in plain language, so reviewers can understand the impact without reading resource diffs.

Two ways to connect

The integration supports two authentication methods depending on your Bitbucket plan:

  • Personal OAuth works with every workspace, including free plans. Authorize through the standard OAuth flow and you’re connected.

  • Workspace tokens are available for Premium workspaces. Generate a token with the required scopes (repository:admin, repository:write, pullrequest:write, webhook) and paste it into Pulumi Cloud for a service-account-style connection that isn’t tied to an individual user.

Both methods register webhooks automatically — no manual configuration required.

Scaffold new projects from your repositories

The new project wizard discovers your Bitbucket workspace, repositories, and branches so you can scaffold and deploy a new stack without leaving Pulumi Cloud. Create a new repository directly from the wizard or select an existing one and configure VCS-backed deployments in a few clicks.

Getting started

  • An org admin configures the integration under Management > Version control.

  • Authorize with Bitbucket using personal OAuth or a workspace token.

  • Deploy infrastructure with first-class workflows.

For full setup details, see the Bitbucket integration docs.

Connect your Bitbucket workspace

The Pulumi Cloud REST API reference is now generated directly from the live OpenAPI spec at build time. Every endpoint, parameter, request body, and response schema you see on the page comes from the same spec the API itself publishes. The docs now stay in sync with the API automatically!

Why this matters

The previous REST API reference was a set of handwritten pages. That meant every new endpoint, renamed parameter, or revised response shape needed a matching docs PR, and in practice the pages drifted. Small inconsistencies added up: missing parameters, outdated request shapes, schemas that no longer matched what the API returned. We wanted a durable fix that keeps the docs in sync as the API grows.

Generating the reference from the OpenAPI spec closes that gap. When the API ships a change, the docs pick it up automatically the next time our docs are built.

What’s new

The reference at /docs/reference/cloud-rest-api/ now includes:

  • Find what you need faster

Endpoints are grouped by product area — Stacks, Deployments, Environments, Organizations, Registry, Insights, AI, Workflows, and more — so you can jump straight to the part of the API you’re working with.

  • Complete request and response details

Every endpoint documents its parameters, request body, and the exact shape of what it returns, so you know what to send and what to expect back without guessing.

  • One-click navigation between related types

When a response references another object, the type name is a link. Click through to see its full definition instead of scrolling through a lengthy API reference page.

What this unlocks for agents

Keeping the reference in sync with the spec isn’t just a human convenience. It changes what’s reliable for AI agents that read the docs and call the API on your behalf. An agent reading a handwritten reference might see a parameter that was renamed six months ago, or miss a field the API now returns, and the call fails silently or in ways that are hard to debug. When the reference is generated from the spec, the agent is working from what the API actually accepts today.

Say you’re onboarding a new team and need to stand up their access in Pulumi Cloud. Point an agent at the REST API reference and ask it to create an sre-oncall team, add four members, and grant admin on three stacks. The agent walks the teams, memberships, and stack-permissions endpoints, builds the right sequence of calls, and executes.

The same pattern holds for bulk audits and cleanup. Ask an agent to find every stack in your org with no recent updates and tag them stale, and it can paginate correctly because the response schema matches reality. While workflows like these were technically possible before, they’re much more reliable now.
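As a sketch of the stale-stack audit, here is the pagination loop an agent (or a person) would end up writing. The `Page` shape, the `continuationToken` field name, and the `lastUpdate` field are assumptions made for illustration, so check them against the generated reference before relying on them. The page fetcher is injected so the sketch doesn't invent any endpoint or client.

```typescript
// Assumed response shape; verify field names against the API reference.
interface Page<T> {
  items: T[];
  continuationToken?: string;
}

// The caller supplies the actual HTTP call; nothing here names an endpoint.
type FetchPage<T> = (token?: string) => Promise<Page<T>>;

// Keep requesting pages until the response stops returning a token.
async function listAll<T>(fetchPage: FetchPage<T>): Promise<T[]> {
  const all: T[] = [];
  let token: string | undefined;
  do {
    const page: Page<T> = await fetchPage(token);
    all.push(...page.items);
    token = page.continuationToken;
  } while (token !== undefined);
  return all;
}

interface StackSummary {
  name: string;
  lastUpdate?: number; // assumed: Unix seconds, absent if never updated
}

// A stack is stale if it has never been updated or was last updated before the cutoff.
function findStale(stacks: StackSummary[], cutoff: number): StackSummary[] {
  return stacks.filter((s) => s.lastUpdate === undefined || s.lastUpdate < cutoff);
}
```

If the documented schema and the real response disagree about the token field, this loop either stops after one page or never stops, which is why an accurate reference matters here.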

Same URL, existing links keep working

The generated docs live at the same URL as the previous reference: /docs/reference/cloud-rest-api/. Bookmarks, blog links, and inbound search traffic still land on the right page. Redirects are in place for any API reference docs page that has been tweaked, renamed, or moved.

Try it out

Start at the new REST API reference and browse by category. Each page links through to the request and response object schemas it uses.

If you spot anything that looks wrong, the most likely culprit is the OpenAPI spec itself — file an issue in pulumi/docs and we’ll trace it back to the source. For tag intros and structural improvements, PRs to pulumi/docs are welcome. Questions and feedback are always welcome in the Pulumi Community Slack.

Explore the REST API

Pulumi Insights account scanning now supports every AWS partition. If your workloads run in GovCloud, China, the European Sovereign Cloud, or one of the ISO intelligence-community clouds, you can get the same resource discovery, cross-account search, and AI-assisted insights that commercial accounts already have.

Supported partitions

  • AWS Standard (Commercial)

  • AWS GovCloud (US)

  • AWS ISO (US)

  • AWS ISOB (US)

  • AWS ISOF (US)

  • AWS ISOE (Europe)

  • AWS European Sovereign Cloud

  • AWS China

You can also exclude specific regions from discovery — useful when regions are disabled by SCPs or fall outside an audit’s scope.

Discovery stays inside the partition

Credentials are exchanged against the partition’s STS endpoint, and every scanner API call targets that partition’s regional endpoints. Discovery traffic doesn’t cross the boundary.

Set it up

In the Pulumi Cloud console:

  • Go to Accounts → Create account.

  • Select AWS as the provider.

  • Under Add your configuration, pick the target partition.

  • Supply credentials via a Pulumi ESC environment. The OIDC trust policy uses the partition-appropriate ARN prefix (arn:aws-us-gov:, arn:aws-cn:, etc.).
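The partition-specific ARN prefix is easy to get wrong by hand, so here is a small sketch of how it varies. Only the three partitions whose prefixes are spelled out or implied above are included; the `oidcProviderArn` helper, the account ID, and the issuer path passed to it are placeholders for illustration, so take the real values from the Insights accounts docs.

```typescript
// ARN partition IDs for the partitions named in the step above.
const arnPartition = {
  commercial: "aws",
  govcloud: "aws-us-gov",
  china: "aws-cn",
} as const;

type PartitionName = keyof typeof arnPartition;

// Hypothetical helper: builds the IAM OIDC provider ARN referenced by a trust policy.
function oidcProviderArn(partition: PartitionName, accountId: string, issuerPath: string): string {
  return `arn:${arnPartition[partition]}:iam::${accountId}:oidc-provider/${issuerPath}`;
}

// Placeholder account ID and issuer path.
const govCloudArn = oidcProviderArn("govcloud", "123456789012", "api.pulumi.com/oidc");
const chinaArn = oidcProviderArn("china", "123456789012", "api.pulumi.com/oidc");
```

A trust policy copied from a commercial account will carry `arn:aws:` and won't match in another partition, so the prefix is the first thing to check when role assumption fails.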

For IAM and ESC setup, see the Insights accounts docs. Log in to Pulumi Cloud to get started.
