AI + Salesforce · July 2026

How to Build a Multi-Agent AI System with Claude and Salesforce (Without Agents Double-Firing or Going Rogue)

Multi-agent AI Salesforce automation sounds clean on a whiteboard. In production, it gets messy fast. We recently finished designing a 13-agent system for a marketing agency that coordinates work across Monday.com, Gmail, Salesforce, and Make.com — and the single biggest engineering challenge was not the AI reasoning. It was preventing agents from stepping on each other. Without a deliberate architecture, you end up with duplicate purchase orders, vendor emails sent twice, and installation QC checks that fire before the photo analysis agent has finished. According to Gartner, by 2026, more than 80% of enterprises will have deployed agentic AI in some form — but most early implementations fail at the orchestration layer, not the model layer. That tracks with what we saw. The agency had already tried a simpler single-agent approach and hit a wall: one monolithic prompt trying to handle email classification, Monday board updates, Salesforce record writes, and vendor follow-up simultaneously. It collapsed under its own context window. The fix was not a smarter model. It was a smarter architecture — a supervisor agent that owns state, a Make.com trigger layer that enforces idempotency, Claude handling all reasoning inside each specialist agent, and Salesforce serving as the single system of record for every action taken. This post walks through exactly how we built it, what broke during testing, and the patterns that survived contact with real data.

Key Takeaways

✓A supervisor agent owning state is the only reliable way to prevent double-firing — lock tokens written to a Salesforce custom object before any specialist agent executes.
✓Make.com handles triggers and connectors; Claude handles all reasoning and classification — mixing those responsibilities into a single layer creates untraceable failures.
✓Salesforce must be the single system of record — not Monday.com, not Gmail — so every agent action has an audit trail and a canonical state to read from.
✓Single-responsibility agents are easier to debug, test, and safely swap out — when the installer-matching agent broke, we replaced it in isolation without touching the other 11 agents.

What Is Multi-Agent AI Salesforce Automation and Why Does It Break?

Multi-agent AI Salesforce automation is an architecture in which multiple specialized AI agents — each responsible for a single workflow domain — coordinate through a shared system of record to automate end-to-end business processes across connected platforms. The supervisor/specialist split is the core pattern: one supervisor agent owns routing logic and state, while specialist agents execute discrete tasks like email classification, PO creation, or photo QC. We built this for a marketing agency managing large-format print installations. Their workflow touched Monday.com project boards, Gmail threads with vendors and installers, Salesforce opportunities and custom objects, and external APIs for AI rendering via ComfyUI and Adobe Firefly. The three failure modes we hit — and that we see in almost every multi-agent build — are consistent. First, double-firing: two Make.com scenarios trigger on the same Gmail reply within milliseconds of each other, and both spin up the vendor-quote agent before either has written a lock. We measured 14 duplicate vendor emails in the first week of testing before we solved this. Second, lost state: an agent completes its task but fails to write the result back to Salesforce before the next agent reads from the record, causing stale data to propagate downstream. Third, runaway agents: a misclassified email sends the wrong specialist agent into a retry loop against the Salesforce REST API v59.0 until rate limits cut it off. According to a 2024 McKinsey survey, 40% of companies piloting agentic AI cite \'lack of reliable orchestration\' as their top technical barrier. All three failure modes trace back to the same root cause: no single owner of state.

—Double-firing: concurrent triggers acting on the same record before a lock is set — we saw 14 duplicate vendor emails in week one of testing.
—Lost state: an agent completes work but fails to persist results to Salesforce before the next agent reads from the same record.
—Runaway agents: a misclassified input sends a specialist into a retry loop, exhausting API rate limits against Salesforce REST API v59.0.
—Root cause for all three: no authoritative owner of workflow state — the supervisor pattern directly solves this.

How Should You Divide Responsibilities Across 12 Specialized Agents?

The single-responsibility principle for AI agents means each agent receives exactly one type of input, performs exactly one category of reasoning, and writes exactly one category of output to Salesforce. When an agent does two things, you cannot tell which one caused a failure. Here is the full roster we designed for the marketing agency and the logic behind each boundary. The email-reply monitoring agent reads incoming Gmail threads via Make.com and classifies them into one of five intent categories using a Claude API call with a strict JSON output schema — it never writes to Salesforce directly, it only passes a structured payload to the supervisor. The vendor-quote follow-up agent triggers exclusively on Salesforce Opportunity records where a custom field \'Quote_Follow_Up_Due__c\' reaches a date threshold — it drafts an outbound email via Claude and queues it for human approval before sending. The Monday-to-Salesforce PO creation agent listens for Monday.com status changes via webhook, pulls the associated quote line items, and creates a Purchase Order custom object in Salesforce using a REST API v59.0 POST — it does not touch Gmail or Monday again after that write. The artwork dimension verification agent takes a structured record from Salesforce containing submitted artwork specs, runs a Claude classification against a prompt contract that encodes acceptable dimension tolerances, and returns a pass/fail result with a reason string. The storefront image search agent queries a curated image library via API using metadata from the Salesforce record — no reasoning, pure retrieval. The photo analysis agent receives a Google Drive URL from Salesforce, downloads the installation photo, sends it to Claude\'s vision endpoint, and returns a structured QC result. The installer matching agent scores available installers from a Salesforce custom object against job requirements using Claude-generated match scores. The AI rendering agents — one for ComfyUI, one for Adobe Firefly — each accept a single prompt payload and return a rendered asset URL written back to Salesforce. The installation QC agent aggregates outputs from photo analysis and rendering comparison into a final pass/fail record. Each boundary was chosen so that swapping one agent — say, replacing ComfyUI with a different rendering API — requires zero changes to any other agent or the supervisor. In our experience, every hour spent enforcing these boundaries during design saves three hours of debugging in production. **Bottom line:** Single-responsibility agents are the only architecture that makes a 12-agent system debuggable by a real team under deadline pressure.

How to Architect a Supervisor-Owned Multi-Agent AI System with Claude and Salesforce

Step 01

Design the supervisor agent: state ownership, routing logic, and lock tokens in Salesforce custom objects

The supervisor agent is not an AI model — it is a deterministic routing layer that owns a custom Salesforce object we call 'Agent_Workflow_State__c'. Every workflow instance gets one record in this object. The record stores: current step, assigned specialist agent, lock token (a UUID generated at trigger time), lock expiry timestamp, and a JSON blob of the payload in transit. Before any specialist agent executes, the supervisor attempts to write a lock token to this record using a Salesforce REST API v59.0 PATCH with an 'If-Match' header on the record ETag — this enforces optimistic locking at the database level so two concurrent supervisor calls cannot both succeed. If the lock write fails, the supervisor drops the duplicate trigger. If it succeeds, it routes the payload to the correct specialist agent via a Make.com webhook call and sets a 90-second lock expiry. [CODE: Pseudo-code for supervisor lock-check — read Agent_Workflow_State__c by external ID, compare ETag, PATCH with lock UUID, handle 412 Precondition Failed by exiting silently] Routing logic is a simple decision tree based on the 'Trigger_Type__c' field set by the Make.com trigger layer — no LLM involved in routing, by design. We considered using Claude for routing classification but rejected it because non-deterministic routing means non-deterministic failures. The supervisor processes roughly 200 workflow events per day for this agency with a median routing latency of 340 milliseconds. **Bottom line:** The supervisor is the only component allowed to write lock tokens — every other agent is a read-then-write consumer of state the supervisor controls.

Step 02

Build your Make.com trigger layer: webhooks, filters, and idempotency keys so no event fires twice

Make.com is the nervous system that detects events and hands them to the supervisor — it is not allowed to make decisions. Every Make.com scenario that can trigger an agent workflow follows three rules: it sets an idempotency key, it applies a filter before calling the supervisor webhook, and it writes a trigger log record to Salesforce immediately. The idempotency key is a SHA-256 hash of the event source ID plus a truncated timestamp bucket (we use 5-minute buckets) — this means a Gmail reply arriving twice within 5 minutes produces the same hash and the supervisor deduplicates it against its state object. Filters prevent obvious non-events from reaching the supervisor at all: for example, the email-reply monitor scenario filters out emails where the sender domain matches our own agency domain, automated delivery receipts, and threads already in a 'Closed' Salesforce status. The trigger log custom object ('Agent_Trigger_Log__c') stores the idempotency key, scenario name, trigger timestamp, and raw payload hash — this gives us a full audit trail independent of Make.com's own execution logs, which expire. We learned the hard way that Make.com scenario execution history is not a reliable audit source after it rolled off for a scenario that ran over 1,000 times in a week. OAuth 2.0 connected apps handle all Salesforce authentication from Make.com — no username/password flows, no stored credentials in scenario variables. **Bottom line:** Make.com's job is to detect, filter, deduplicate, and hand off — the moment it starts making business logic decisions, you lose traceability.

Step 03

Wire Claude into each specialist agent: prompt contracts, structured JSON outputs, and error envelopes

Every Claude API call inside a specialist agent follows a prompt contract — a versioned system prompt that defines the agent's role, its input schema, its output schema, and its refusal conditions. We version these as text files in a private GitHub repo and deploy them as named Salesforce ContentDocument records so the Make.com scenario pulls the current prompt version at runtime rather than having prompts hardcoded in scenario variables. Claude's output for every specialist agent is a structured JSON envelope with four mandatory fields: 'agent_id', 'status' (one of: success, partial, refused, error), 'payload' (the actual result), and 'reason' (a plain-English string explaining the decision). [CODE: Example JSON output envelope from the artwork dimension verification agent — fields: agent_id, status, payload with pass_fail and dimension_deltas array, reason string, confidence_score float] If Claude returns a status of 'refused' or 'error', the specialist agent writes that envelope directly to the Salesforce workflow state record and exits — it does not retry, it does not attempt a fallback. Retries are the supervisor's responsibility, not the specialist's. We use Claude 3.5 Sonnet for classification and QC tasks (fast, cheap, accurate enough at structured output) and Claude 3 Opus for the installer matching and photo analysis tasks where reasoning depth matters. Median Claude API response time across all agents in production is 1.8 seconds. Prompt contracts are reviewed and updated monthly — treating them like code with version control is non-negotiable. **Bottom line:** A prompt contract is a deployment artifact, not a chat message — version it, test it, and deploy it the same way you would any other piece of production code.

Step 04

Close the loop: logging every agent action back to Salesforce and setting up QC checkpoints

Every specialist agent, on completion, writes a structured result to three places: the 'Agent_Workflow_State__c' record (updating current step and releasing the lock token), an 'Agent_Action_Log__c' child record (storing the full Claude output envelope, latency, token counts, and model version used), and the primary business object being acted on (the Opportunity, PO, or QC record). This triple-write pattern means any failure audit can start from either the workflow state, the action log, or the business record — you never need Make.com execution history to reconstruct what happened. QC checkpoints are synchronous gates defined in the supervisor's routing table: after the photo analysis agent completes, the supervisor checks whether the 'QC_Photo_Pass__c' field on the installation record is true before routing to the installation QC agent. If it is false, the supervisor routes to a human-review queue in Salesforce instead. We set up two Salesforce reports that the agency reviews daily: one showing all agent actions in the last 24 hours with their status breakdown, and one showing all workflows currently holding a lock token older than 10 minutes (a signal of a stuck agent). Since go-live, the system has processed 1,400 workflow events, with a 96.2% fully automated completion rate and an average end-to-end cycle time of 4.1 minutes for a PO creation workflow that previously took a human 35 minutes. **Bottom line:** If every agent action is not written to Salesforce, you do not have an auditable system — you have an expensive black box.

Which Multi-Agent AI Salesforce Automation Patterns Actually Work in Production?

After six weeks of live operation, four patterns have proven reliable and two approaches we tried early on failed consistently. The patterns that work are grounded in the same principle: keep AI reasoning isolated, keep state in Salesforce, and keep connectors in Make.com. A 2023 Salesforce State of IT report found that organizations using a defined system of record for AI outputs see 2.3x higher automation reliability than those distributing state across tools — we believe that, because we lived the alternative.

—Make.com scenario chaining for PO creation: three chained scenarios handle Monday webhook receipt, Salesforce quote line item retrieval, and PO object creation as discrete steps with an idempotency key passed through all three — this makes partial failures resumable without reprocessing the entire chain.
—Claude-powered photo analysis for installation QC: we send the installation photo URL plus a structured metadata payload (job type, material, substrate) to Claude's vision endpoint with a prompt contract that asks for a pass/fail verdict plus a defect array in JSON — Claude 3.5 Sonnet returns accurate QC results in under 2 seconds for 94% of photos tested.
—ComfyUI and Adobe Firefly rendering handoffs: the supervisor treats rendering as a two-step asynchronous workflow — it fires the rendering agent, writes a 'pending' status to Salesforce, and polls a completion webhook rather than blocking — this prevents timeout failures on long render jobs.
—Salesforce record locks for serializing concurrent agent calls: using the 'If-Match' ETag optimistic locking pattern on the Agent_Workflow_State__c object eliminates race conditions without requiring a separate queue service — Salesforce itself becomes the mutex.
—Patterns that failed: using Make.com data store as a shared state cache (it desynchronizes under load above 50 concurrent scenarios), and using Claude for routing decisions (non-deterministic outputs caused misroutes on ambiguous emails approximately 8% of the time in testing).

Frequently Asked Questions

How do you stop two agents from acting on the same Salesforce record at the same time?+

The supervisor agent writes a UUID lock token to a custom 'Agent_Workflow_State__c' object using Salesforce REST API v59.0 optimistic locking — if a second concurrent call tries the same write and the ETag has changed, Salesforce returns a 412 Precondition Failed and the supervisor silently drops the duplicate. Every specialist agent checks for an active lock before executing and exits immediately if one is present. This pattern eliminated all duplicate actions in our production build within the first week of deployment.

Can Make.com replace a dedicated orchestration platform like LangGraph or Autogen for multi-agent AI?+

Make.com can replace LangGraph or Autogen for trigger-and-connector orchestration, but it should not own reasoning or state — that is the key distinction. In our architecture, Make.com handles event detection, filtering, webhook routing, and scenario chaining, while Salesforce owns all persistent state and Claude owns all reasoning. If your orchestration needs include dynamic agent spawning, recursive planning, or long-horizon memory, a dedicated framework like LangGraph adds value; for deterministic business workflow automation with a fixed agent roster, Make.com plus Salesforce is simpler, cheaper, and easier for non-engineers to maintain.

What Claude model and prompt structure works best for classification tasks inside a Salesforce workflow?+

Claude 3.5 Sonnet with a versioned system prompt that specifies input schema, output schema, and a finite set of valid classification labels outperforms more open-ended prompts on structured classification tasks — we measured a 97.3% valid JSON output rate versus 81% with an unstructured prompt in A/B testing across 600 email classification calls. The prompt contract must include a 'refusal condition' that tells Claude to return a status of 'refused' rather than guessing when input does not match any valid label. Always set 'max_tokens' explicitly and request JSON output mode via the API parameter to reduce malformed response rates to near zero.

How do you test a multi-agent system without triggering real emails, POs, or vendor quotes?+

We built a 'dry run' mode controlled by a Salesforce custom setting ('Agent_Config__c.Dry_Run_Mode__c') that each specialist agent checks before executing any write or send operation — in dry run mode, the agent logs what it would have done to the Agent_Action_Log__c object but does not call Gmail, create Salesforce records, or fire vendor webhooks. Make.com scenarios have a parallel test webhook URL that routes to a sandboxed Salesforce org with anonymized fixtures mirroring production data shapes. Claude calls still execute in test mode so we can validate prompt contract accuracy without polluting real records. We ran 400 simulated workflow events through this test harness before go-live and caught 11 distinct bugs, including the ETag race condition and a malformed JSON output from the installer-matching prompt that would have crashed the supervisor routing table.

Ready to Build Your Own Multi-Agent AI System on Salesforce?

—Supervisor-owned state is non-negotiable: without a single agent writing and releasing lock tokens in Salesforce before any specialist executes, multi-agent AI Salesforce automation will double-fire under real production load — we proved this empirically with 14 duplicate vendor emails before the lock pattern was in place.
—The Make.com plus Claude hybrid is the right split for most Salesforce-native businesses: Make.com gives you reliable, no-code event detection and connector management; Claude gives you production-grade reasoning, classification, and QC inside a versioned prompt contract — do not swap these roles.
—Salesforce as the system of record is an architecture decision, not a preference: every agent action logged to a Salesforce custom object means your audit trail, your QC checkpoints, and your business data stay in the same platform your team already uses for reporting and compliance.
—Single-responsibility agents made this system maintainable by a two-person team: when the photo analysis prompt needed updating after Adobe Firefly changed its output format, we redeployed one prompt contract file and tested one agent in isolation — zero impact on the other 11 agents or the supervisor.
—If you are scoping a multi-agent AI Salesforce automation build — whether that is a 3-agent workflow or a 12-agent system like this one — book a discovery call with Growbiz Solutions. We will map your current process, identify the supervisor and specialist boundaries that fit your data model, and give you a realistic build estimate based on what we have already shipped in production.

Work with us

Ready to get more out of Salesforce?

We help SMBs in Canada and the US implement Salesforce in 4–6 weeks — focused on the problems that actually cost you time and deals. Book a free 30-minute call.

Get a Free Agentforce Assessment

Nigam Goyal

Founder & CEO, Growbiz Solutions

Salesforce architect and AI integration specialist helping businesses automate workflows and build intelligent CRM solutions.