Coordinating Multiple AI Agents at Scale: Lessons from Intuit’s Engineering Team

Learn how Intuit engineers tackle the hardest problem in engineering: coordinating multiple AI agents at scale, with strategies for communication, consistency, and orchestration.

Published 2026-05-03 01:03:21 • Atinec Stack Staff

The Rising Complexity of Multi-Agent Systems

As artificial intelligence becomes more pervasive, companies are increasingly deploying not just one but multiple AI agents to handle complex tasks autonomously. From customer support chatbots to automated data pipelines, these agents must collaborate seamlessly to achieve business goals. Yet, as Chase Roossin, group engineering manager, and Steven Kulesza, staff software engineer at Intuit, revealed in a recent podcast, orchestrating multiple agents at scale is arguably the hardest problem in modern engineering.

Coordinating Multiple AI Agents at Scale: Lessons from Intuit’s Engineering Team — Source: stackoverflow.blog

Their conversation shed light on the unique challenges of getting diverse AI agents to “play nice”—communicate effectively, avoid conflicts, and maintain consistency—all while operating under real-world constraints like latency, cost, and reliability. This article distills key lessons from their experience and offers a broader framework for tackling multi-agent coordination at scale.

Key Challenges in Multi-Agent Coordination

Building a system where multiple AI agents work together involves far more than just integrating separate models. The following challenges must be addressed head-on:

Communication Overhead: Agents often speak different “languages” or use disparate data formats. Without a common protocol, messages can be lost, misinterpreted, or delayed.
Consistency and State Management: Each agent may hold partial views of the world. Keeping shared state consistent across agents without introducing bottlenecks is non-trivial.
Conflict Resolution: When agents make contradictory decisions—for instance, booking overlapping resources—a mechanism must exist to detect and resolve conflicts quickly.
Resource Contention: Agents competing for limited computational resources (GPU, memory, API calls) can degrade overall system performance if not carefully managed.
Observability: Understanding which agent did what, and why, becomes much harder as the number of agents grows, since the number of possible pairwise interactions rises roughly with the square of the agent count.
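
To make the resource contention item above concrete, here is a minimal sketch of capping how many agents may call a shared backend at once. It is an illustration only, not something described in the podcast: the function names, the limit of 3, and the `asyncio.sleep` stand-in for a real API call are all assumptions.

```python
import asyncio


async def call_model(agent_id: int, limiter: asyncio.Semaphore, stats: dict) -> int:
    """Hypothetical agent call that must hold a slot before using the shared backend."""
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for a real model or API call
        stats["active"] -= 1
    return agent_id


async def main(num_agents: int = 10, max_concurrent: int = 3) -> dict:
    # The semaphore is the shared budget: at most max_concurrent calls in flight.
    limiter = asyncio.Semaphore(max_concurrent)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(call_model(i, limiter, stats) for i in range(num_agents))
    )
    stats["completed"] = len(results)
    return stats


stats = asyncio.run(main())
```

All ten hypothetical agents finish, but the number of calls in flight at any moment never exceeds the limit. A production system would more likely enforce this in a shared gateway or rate limiter rather than inside one process.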

Strategies for Making Agents “Play Nice” at Scale

Roossin and Kulesza emphasized that solving these challenges requires a combination of architectural choices and operational practices. Below are some of the most effective strategies drawn from their work and industry best practices.

Standardized Protocols and APIs

Just as microservices rely on APIs to communicate, multi-agent systems need a common contract. Defining standardized message schemas and interaction patterns (e.g., request/response, publish/subscribe) reduces ambiguity. Intuit, for example, uses a lightweight schema registry that agents query before exchanging data, ensuring every agent knows the expected format and semantics.
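
The article does not describe how Intuit's registry is built, so the following is only a rough sketch of the general idea: a lookup that tells an agent which fields a message type must carry, checked before the message is sent. The class `SchemaRegistry`, the message type `document.extracted`, and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SchemaRegistry:
    """Hypothetical in-memory registry: message type -> required fields and types."""

    _schemas: dict[str, dict[str, type]] = field(default_factory=dict)

    def register(self, message_type: str, fields: dict[str, type]) -> None:
        self._schemas[message_type] = dict(fields)

    def validate(self, message_type: str, payload: dict[str, Any]) -> list[str]:
        """Return a list of problems; an empty list means the payload conforms."""
        if message_type not in self._schemas:
            return [f"unknown message type: {message_type}"]
        problems = []
        for name, expected in self._schemas[message_type].items():
            if name not in payload:
                problems.append(f"missing field: {name}")
            elif not isinstance(payload[name], expected):
                problems.append(f"wrong type for {name}: expected {expected.__name__}")
        return problems


registry = SchemaRegistry()
registry.register("document.extracted", {"document_id": str, "confidence": float})

ok = registry.validate("document.extracted", {"document_id": "doc-1", "confidence": 0.93})
bad = registry.validate("document.extracted", {"document_id": "doc-1"})
```

Here `ok` is an empty list and `bad` reports the missing `confidence` field, so a sending agent can refuse to publish a malformed message instead of leaving the receiver to guess.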

Centralized Orchestration vs. Decentralized Consensus

There is no one-size-fits-all approach. In some contexts, a centralized orchestrator (like a workflow engine) provides clear control and easy debugging. In others, a decentralized model—where agents negotiate through a consensus algorithm—offers better scalability and fault tolerance. The key is to choose based on the criticality of decisions and the speed of coordination required.
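
As an illustration of the centralized option only, the sketch below runs agents in a fixed order and records that order, which is what makes this style easy to debug. It assumes agents can be modeled as plain functions over a shared context; the step names are hypothetical, and the decentralized consensus variant is not shown.

```python
from typing import Callable

# An "agent" here is just a function that reads the shared context and returns updates.
Agent = Callable[[dict], dict]


def run_workflow(steps: list[tuple[str, Agent]], context: dict) -> tuple[dict, list[str]]:
    """Run agents in a fixed order; return the final context and the execution order."""
    history: list[str] = []
    for name, agent in steps:
        updates = agent(context)
        context = {**context, **updates}  # the orchestrator owns all state changes
        history.append(name)
    return context, history


def extract(ctx: dict) -> dict:
    return {"fields": {"name": ctx["raw"].strip()}}


def fill_form(ctx: dict) -> dict:
    return {"form": {"line_1": ctx["fields"]["name"]}}


def check(ctx: dict) -> dict:
    return {"valid": bool(ctx["form"]["line_1"])}


final, history = run_workflow(
    [("extract", extract), ("fill_form", fill_form), ("check", check)],
    {"raw": "  Ada Lovelace "},
)
```

Because every state change passes through `run_workflow`, there is a single place to log, retry, or halt. That same property is what turns the orchestrator into a bottleneck as the number of agents grows.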

Monitoring and Observability

You cannot fix what you cannot see. Intuit’s teams built custom dashboards that track agent-level metrics: decision latency, conflict rates, state drift, and resource usage. They also implemented distributed tracing to follow a single request across multiple agents, making it easier to pinpoint failures or inefficiencies. As Kulesza noted, “observability is the bedrock of trust in a multi-agent system.”
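
The core idea behind distributed tracing can be sketched in a few lines: every agent that touches a request records a span tagged with the same trace ID. This is a simplified stand-in using only the standard library, not Intuit's tooling and not a real tracing SDK; `SPANS`, `span`, and the agent names are hypothetical.

```python
import time
import uuid
from contextlib import contextmanager

SPANS: list[dict] = []  # stand-in for a tracing backend


@contextmanager
def span(trace_id: str, agent: str, operation: str):
    """Record one agent's work on a request, including duration and failure status."""
    start = time.perf_counter()
    record = {"trace_id": trace_id, "agent": agent, "operation": operation, "status": "ok"}
    try:
        yield record
    except Exception as exc:
        record["status"] = f"error: {type(exc).__name__}"
        raise
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        SPANS.append(record)


def handle_request() -> str:
    trace_id = uuid.uuid4().hex  # one ID shared by every agent on this request
    with span(trace_id, "extractor", "parse_document"):
        pass  # agent work would happen here
    with span(trace_id, "checker", "validate_fields"):
        pass
    return trace_id


trace_id = handle_request()
trace = [s for s in SPANS if s["trace_id"] == trace_id]
```

Filtering `SPANS` by `trace_id` reconstructs the path of a single request across agents, and per-agent decision latency falls out of the same records.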

Lessons from Intuit’s Implementation

During the podcast, Roossin and Kulesza shared specific insights from their experience rolling out multi-agent systems at Intuit. One notable example involved their tax preparation platform, where multiple agents handle different parts of the workflow (e.g., data extraction, form completion, error checking). The team discovered that early conflict detection dramatically reduced rework: by having agents broadcast their intended actions before committing, the system could flag potential overlaps.
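
The broadcast-before-commit pattern described above might look roughly like the sketch below. The podcast summary gives no implementation details, so `Intent`, `IntentBoard`, and the resource key `form:line_1` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Intent:
    agent: str
    resource: str
    action: str


class IntentBoard:
    """Hypothetical shared board where agents announce actions before committing."""

    def __init__(self) -> None:
        self._pending: dict[str, Intent] = {}

    def announce(self, intent: Intent) -> Optional[Intent]:
        """Record the intent, or return the conflicting intent already pending."""
        existing = self._pending.get(intent.resource)
        if existing is not None and existing.agent != intent.agent:
            return existing  # overlap detected before any work is done
        self._pending[intent.resource] = intent
        return None

    def commit(self, intent: Intent) -> None:
        """Release the resource once the announced action has been applied."""
        if self._pending.get(intent.resource) == intent:
            del self._pending[intent.resource]


board = IntentBoard()
fill = Intent("form_filler", "form:line_1", "write")
fix = Intent("error_checker", "form:line_1", "write")

first = board.announce(fill)      # no conflict, intent is recorded
conflict = board.announce(fix)    # returns the form_filler intent
board.commit(fill)
retry = board.announce(fix)       # resource is free again
```

The second agent learns about the overlap before it does any work, which is where the reduction in rework comes from. How a conflict is then resolved (wait, merge, or escalate) is a separate policy decision that this sketch leaves open.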

Another lesson was the importance of graceful degradation. When one agent fails or returns a low-confidence result, the system should not halt entirely. Instead, fallback agents or human-in-the-loop mechanisms take over, ensuring the overall user experience remains smooth.
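
One way to express this fallback chain in code is sketched below, assuming each agent returns an answer together with a confidence score. The 0.8 threshold, the agent names, and the `human_review` marker are hypothetical.

```python
from typing import Callable

Result = tuple[str, float]  # (answer, confidence)


def run_with_fallback(
    task: str,
    agents: list[tuple[str, Callable[[str], Result]]],
    min_confidence: float = 0.8,
) -> dict:
    """Try agents in order; hand off to a human if none is confident enough."""
    for name, agent in agents:
        try:
            answer, confidence = agent(task)
        except Exception:
            continue  # a failed agent must not halt the whole request
        if confidence >= min_confidence:
            return {"answer": answer, "handled_by": name}
    return {"answer": None, "handled_by": "human_review"}


def primary(task: str) -> Result:
    raise TimeoutError("primary agent unavailable")


def backup(task: str) -> Result:
    return (f"backup result for {task}", 0.91)


def unsure(task: str) -> Result:
    return ("best guess", 0.40)


recovered = run_with_fallback("classify document", [("primary", primary), ("backup", backup)])
escalated = run_with_fallback("classify document", [("primary", primary), ("unsure", unsure)])
```

In the first call the backup agent absorbs the primary's failure. In the second, no agent clears the confidence threshold, so the request is routed to human review rather than returning a weak answer.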

Finally, they emphasized iterative scaling. Starting with just two agents and gradually adding more allowed the team to identify coordination pitfalls early. They recommend that any organization embarking on multi-agent development begin with a minimum viable orchestration layer and expand only after proving the core logic.

The Future of Multi-Agent Systems

As AI agents become more capable, the need for robust coordination will only grow. Emerging trends include self-organizing agent swarms, where agents dynamically form teams based on task requirements, and reinforcement learning for coordination, where agents learn optimal interaction strategies over time. However, these advances will bring new challenges in safety, accountability, and explainability.

For now, the lessons from Intuit’s engineering team provide a solid foundation: standardize communication, build for observability, plan for conflict, and scale deliberately. Overcoming the “hardest problem in engineering” is not only possible but essential as we entrust more critical tasks to multi-agent systems.

To dive deeper, listen to the full podcast episode with Chase Roossin and Steven Kulesza, or explore Intuit’s engineering blog for detailed technical walkthroughs.