Back to Issue 1
    The Operations Room Issue 1

    Agentic AI Gets Metered: Vertex AI Agent Engine Billing Goes Live

    January 12, 2026
    Agentic AI Gets Metered: Vertex AI Agent Engine Billing Goes Live
    [ AI Integration, Deployment & Production Operations ]

    On January 28, 2026, Google Cloud will begin billing for three core components of Vertex AI Agent Engine: Sessions, Memory Bank, and Code Execution. This change makes agent state, persistence, and sandboxed execution first-class, metered resources rather than implicitly bundled conveniences.

    Vertex AI Agent Engine, formerly known as the Reasoning Engine, has been generally available since 2025, with runtime compute billed based on vCPU and memory usage. But key elements of agent behavior, including session history, long-term memory, and sandboxed code execution, operated without explicit pricing during preview and early GA phases.

    In December 2025, Google updated both the Vertex AI pricing page and Agent Builder release notes to confirm that these components would become billable starting January 28, 2026. With SKUs and pricing units now published, the platform moves from a partially bundled cost model to one where agent state and behavior are directly metered.

    How The Mechanism Works

    Billing for Vertex AI Agent Engine splits across compute execution and agent state persistence.

    Runtime compute is billed using standard Google Cloud units. Agent Engine runtime consumes vCPU hours and GiB-hours of RAM, metered per second with idle time excluded. Each project receives a monthly free tier of 50 vCPU hours and 100 GiB-hours of RAM, after which usage is charged at published rates.

    Sessions are billed based on stored session events that contain content. Sessions are not billed by duration, but by the number of content-bearing events retained. Billable events include user messages, model responses, function calls, and function responses. System control events, such as checkpoints, are explicitly excluded. Pricing is expressed as a per-event storage model, illustrated using per-1,000-event examples, rather than compute time.

    Memory Bank is billed based on the number of memories stored and returned. Unlike session events, which capture raw conversational turns, Memory Bank persists distilled, long-term information extracted from sessions. Configuration options determine what content is considered meaningful enough to store. Each stored or retrieved memory contributes to billable usage.

    Code Execution allows agents to run code in an isolated sandbox. This sandbox is metered similarly to runtime compute, using per-second vCPU and RAM consumption, with no charges for idle time. Code Execution launched in preview in 2025 and begins billing alongside Sessions and Memory Bank in January 2026.

    What This Looks Like In Practice

    Consider a customer service agent handling 10,000 conversations per month. Each conversation averages 12 events: a greeting, three customer messages, three agent responses, two function calls to check order status, two function responses, and a closing message.

    That is 120,000 billable session events per month, before accounting for Memory Bank extractions or any code execution. If the agent also stores a memory for each returning customer and retrieves it on subsequent visits, memory operations add another layer of metered usage.

    Now scale that to five agents across three departments, each with different verbosity levels and tool dependencies. The billing surface area expands across sessions, memory operations, and compute usage, and without instrumentation, teams may not see the accumulation until the invoice arrives.

    Analysis

    This change matters because it alters the economic model of agent design. During preview, teams could retain long session histories, extract extensive long-term memories, and rely heavily on sandboxed code execution without seeing distinct cost signals for those choices.

    By introducing explicit billing for sessions and memories, Google is making agent state visible as a cost driver. The platform now treats conversational history, long-term context, and tool execution as resources that must be managed, not just features that come for free with inference.

    Implications For Enterprises

    For platform and engineering teams, cost management becomes a design concern rather than a post-deployment exercise. Session length, verbosity, and event volume directly affect spend. Memory policies such as summarization, deduplication, and selective persistence now have financial as well as architectural consequences.

    From an operational perspective, autoscaling settings, concurrency limits, and sandbox usage patterns influence both performance and cost. Long-running agents, multi-agent orchestration, and tool-heavy workflows can multiply runtime hours, stored events, and memory usage.

    For governance and FinOps teams, agent state becomes something that must be monitored, budgeted, and potentially charged back internally. Deleting unused sessions and memories is not just a data hygiene task but the primary way to stop ongoing costs.

    The Bigger Picture

    Google is not alone in moving toward granular agent billing. As agentic architectures become production workloads, every major cloud provider faces the same question: how do you price something that thinks, remembers, and acts?

    Token-based billing made sense when AI was stateless. But agents accumulate context over time, persist memories across sessions, and invoke tools that consume compute independently of inference. Metering these components separately reflects a broader industry shift: agents are not just models. They are systems, and systems have operational costs.

    Similar pricing structures are increasingly plausible across AWS, Azure, and independent agent platforms as agentic workloads mature. The teams that build cost awareness into their agent architectures now will have an advantage when granular agent billing becomes standard.

    Risks and Open Questions

    Several uncertainties remain. Google documentation does not yet clearly define default retention periods for sessions or memories, nor how quickly deletions translate into reduced billing. This creates risk for teams that assume a short-lived state by default.

    Forecasting costs may also be challenging. Session and memory usage scales with user behavior, response verbosity, and tool invocation patterns, making spend less predictable than token-based inference alone.

    Finally, as agent systems grow more complex, attributing costs to individual agents or workflows becomes harder, especially in multi-agent or agent-to-agent designs. This complicates optimization, internal chargeback, and accountability.

    Further Reading

    Google Cloud Vertex AI Pricing, Vertex AI Agent Builder Release Notes, Vertex AI Agent Engine Memory Bank Documentation, AI CERTs analysis on Vertex AI Agent Engine, GA Google Cloud blog on enhanced tool governance in Agent Builder

    [ From the Issue ]

    The Enterprise AI Brief | Issue 1

    View all articles in this issue
    [ Keep Reading ]

    More from The Operations Room

    Issue 7

    Treasury’s New AI Risk Framework Gives the Financial Sector a Governance Playbook

    The Treasury’s new Financial Services AI Risk Management Framework turns the abstract ideas of trustworthy AI into something financial institutions can actually implement. Instead of principles alone, it introduces more than 200 concrete control objectives and a toolkit built for real governance workflows. For banks deploying AI in lending, fraud detection, and customer systems, the question is no longer whether governance exists. It is whether governance holds up under examination.

    Read article
    Issue 6

    The Trace Is the Truth: Observability Is Becoming the Operational Backbone of AI Systems

    An AI system can return a 200 OK and still be wrong. As enterprises move from single-model services to autonomous agents, tracing prompts, retrieval, tool calls, and state transitions is the only reliable way to explain what happened. This edition looks at why observability is shifting from background logging to the operational backbone of AI in production — and what it means for teams that can’t afford to find out after the fact.

    Read article
    Issue 5

    When Prompts Started Breaking Production

    By early 2026, prompts were breaking production often enough that teams stopped treating them as configuration and started treating them like code: versioned, regression-tested, blocked in CI/CD when quality metrics slip. This is what happened when informal text became the functional interface defining system behavior, and why the teams that got ahead of it caught failures before their users did.

    Read article

    Have a Project in Mind?

    Talk to our team about how we can put these ideas to work in your organization.

    Contact Us