5 Comments
Jordan Gronkowski

I’m building an agent that can ingest a large amount of product data, extract signals that indicate growth, decline, risk, etc., and produce hypotheses and summaries for GTM teams.

From an agentic perspective, a single agent seems to do the job for this use case. It’s also something a downstream agent can consume, because its output is the hypotheses/summaries plus “evidence”: the underlying data points restructured and combined into short sentences.

I weight the data in different ways to give the agent more context on what the numbers mean. The core problem we’re solving is: what has changed since I last reviewed this customer’s data, and is it meaningful? So I feed the agent both the raw data and these labels (high growth, decline, etc.), the idea being that the agent should have the context to judge whether a change from X to Y is actually meaningful. And I don’t feed it 365 days of raw data; instead I take medians or averages of periods and provide those to the AI so it can understand the data clearly, rather than dumping a pile of individual data points on it. Between the labels and these raw aggregations, the agent can answer the question of what is changing and whether it’s meaningful.
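For what it’s worth, a minimal sketch of that aggregation-plus-labeling step (the function names, period length, and thresholds here are invented for illustration, not Jordan’s actual values):

```python
import statistics

def aggregate_periods(daily_values, period_days=30):
    """Collapse daily raw data into per-period medians (hypothetical helper)."""
    periods = []
    for start in range(0, len(daily_values), period_days):
        chunk = daily_values[start:start + period_days]
        if chunk:
            periods.append(round(statistics.median(chunk), 2))
    return periods

def label_trend(periods, growth_threshold=0.10, decline_threshold=-0.10):
    """Attach a coarse label so the agent knows whether a change is meaningful."""
    if len(periods) < 2 or periods[0] == 0:
        return "insufficient data"
    change = (periods[-1] - periods[0]) / periods[0]
    if change >= growth_threshold:
        return "high growth"
    if change <= decline_threshold:
        return "decline"
    return "stable"

# Example: a year of daily usage becomes ~12 medians plus one label,
# and both go into the agent's context instead of 365 raw points.
daily_usage = [1000 + i * 2 for i in range(365)]  # placeholder data
period_medians = aggregate_periods(daily_usage)
context_for_agent = {
    "period_medians": period_medians,
    "label": label_trend(period_medians),
}
print(context_for_agent)
```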

Based on this, we are able to reduce the time it takes for someone to analyze product usage data for a given customer, and we can flag ICP accounts that are investing or at risk.

Would love any feedback you have on this! I’m also trying to think through how this might become a tool that other agents can plug into. Do I create an agent that can produce both summaries and evidence snippets and expose it via MCP? More to come!

Lily Luo

Hi Jordan, thanks for sharing! I love seeing how others are approaching and leveraging AI. A few thoughts:

-What you've described sounds like a well-architected workflow that processes data, runs AI analysis, and produces outputs (product usage analysis). (From building my own agent, I now know that "agent" implies autonomy, so I'm adding this as an update to my post above.)

-Your single-process approach is smart. If you broke this into three different steps where AI summaries were passed back and forth, you’d lose the nuance. By keeping the raw data and your weighted labels in one view, you’re making sure the AI has the data it needs to be accurate.

-The evidence snippets are smart. And for your MCP question: downstream tools or agents really just need a reliable "handshake." If your tool returns a clear hypothesis + evidence + confidence score, it gives the next tool in the chain (or the human) everything they need to take action without having to re-run the numbers themselves.
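To make that "handshake" concrete, here is a minimal sketch of the payload such a tool could return, whether exposed over MCP or called directly (the field names are placeholders, not anything from Jordan's system):

```python
from dataclasses import dataclass, field, asdict

@dataclass
class UsageHypothesis:
    """Hypothetical contract for downstream agents: claim + evidence + confidence."""
    account_id: str
    hypothesis: str                                     # e.g. "Account is expanding usage of feature X"
    evidence: list[str] = field(default_factory=list)   # short, data-backed sentences
    confidence: float = 0.0                             # 0.0-1.0, so consumers set their own cutoffs

result = UsageHypothesis(
    account_id="acme-co",
    hypothesis="Seat activations suggest expansion into a second team.",
    evidence=[
        "Weekly active users rose from a median of 42 to 61 over the last two periods.",
        "API calls per seat held steady, so growth is new users rather than heavier use.",
    ],
    confidence=0.7,
)

# Downstream agents (or an MCP tool wrapper) can consume this as plain JSON.
print(asdict(result))
```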

I then asked Atlas (my agent) what it thought and some additional ideas to take it further:

1. Play "Devil's Advocate": Ask the AI to try and disprove its own growth hypothesis using that same data. For example: "The data looks like growth, but give me one reason why this might just be a seasonal fluke." If the AI can't find a counterargument, the human end-user will trust the original summary much more.

2. Give it a "None of the Above" Option: One of the biggest risks with AI is that it feels forced to give an answer even when the data is messy. Give your system permission to say, "The signals are too conflicting, I can't make a call." An "I don't know" is much more valuable than a guess when you're flagging at-risk accounts.

3. Check the Narrative: Instead of just looking at the raw data for this period, try feeding the AI the last summary it wrote for that customer. This helps the AI see the "story" of the account over time, rather than just treating every analysis like a brand-new event. (See the prompt sketch below for how all three ideas could be combined.)
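A minimal sketch of that combined prompt, with wording and data shapes that are illustrative only:

```python
def build_analysis_prompt(labeled_periods, prior_summary=None):
    """Rough prompt assembly showing the three ideas above (wording is illustrative)."""
    sections = [
        "You are analyzing product usage data for one customer.",
        f"Labeled period aggregates: {labeled_periods}",
        # 3. Check the narrative: feed back the last summary so changes read as a story.
        f"Your previous summary of this account: {prior_summary}" if prior_summary
        else "No previous summary exists for this account.",
        # 1. Devil's advocate: force the model to argue against its own hypothesis.
        "After stating your hypothesis, give the strongest reason it could be wrong "
        "(e.g. a seasonal fluke), using the same data.",
        # 2. None of the above: explicit permission to abstain.
        "If the signals conflict or are too thin, say 'The signals are too conflicting, "
        "I can't make a call' instead of guessing.",
    ]
    return "\n\n".join(sections)

print(build_analysis_prompt({"label": "high growth", "period_medians": [42, 48, 61]}))
```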

Let me know what you think!

Jordan Gronkowski

A lot of the real demos for GTM problem spaces aren't in public yet, so since I appreciated your article so much, I thought I'd share something I'm working on too!

These are great suggestions - I will think more about all of them! So far with this MVP, we know we can generate valuable hypotheses purely from product data, which, crucially, can act as inputs to downstream agents for broader/deeper hypotheses and actions.

One thing missing from the system right now is the feedback loop. While it's doing work similar to what our Solutions and Sales teams do, sometimes these AI projects feel like data science in reverse: I don't have a target variable that the hypotheses are optimizing toward (in my case, because I don't have all the time series data I need). Instead, I'm collecting human feedback and then fine-tuning. We'll use Google's AI eval tooling for this since it's all built around the Gemini batch API.
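One lightweight way to start that feedback loop, before any eval tooling or fine-tuning comes into play, is simply to log structured human judgments next to each hypothesis; a minimal sketch, with a made-up record shape and file name:

```python
import json
from datetime import datetime, timezone

def record_feedback(account_id, hypothesis, reviewer, verdict, note="",
                    path="hypothesis_feedback.jsonl"):
    """Append one human judgment per line; these records can later seed an eval set."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "account_id": account_id,
        "hypothesis": hypothesis,
        "reviewer": reviewer,
        "verdict": verdict,   # e.g. "useful", "wrong", "obvious"
        "note": note,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

record_feedback(
    account_id="acme-co",
    hypothesis="Seat activations suggest expansion into a second team.",
    reviewer="solutions-team",
    verdict="useful",
    note="Matches what we heard on the last QBR call.",
)
```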

One major takeaway from this project: **Know AI capabilities and limitations.**

One example from this project so far: AI is great at synthesizing large amounts of data, but only within its context window/connected knowledge. When we gave it only the raw data, it was like, "Variable X increased from 1,000 to 1,050, so crazy right?!" which is actually uninteresting, but it doesn't know that!

Instead, I'm giving it the labels as well as the raw data, e.g. "when variable X goes up by 5%, that's meaningful if the value was at 1M to start, based on a distribution of changes in accounts at that scale; but 5% is not meaningful if you're at 1K to start." So we do the heavy lifting in preprocessing and then give it everything we want it to synthesize from there. I'll look to write about it once it's in prod. (:
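A toy version of that scale-dependent labeling might look like the following; the thresholds are invented for illustration, whereas Jordan derives his from a distribution of changes across accounts at each scale:

```python
def label_change(start, end):
    """Label a percent change as meaningful or not, depending on the starting scale."""
    if start <= 0:
        return "insufficient data"
    pct_change = (end - start) / start
    # Invented thresholds: bigger bases need smaller relative moves to matter.
    min_meaningful_pct = 0.20 if start < 10_000 else 0.05
    if abs(pct_change) < min_meaningful_pct:
        return f"{pct_change:+.1%} change, not meaningful at this scale"
    direction = "growth" if pct_change > 0 else "decline"
    return f"{pct_change:+.1%} change, meaningful {direction}"

print(label_change(1_000, 1_050))          # +5% on a small base: not meaningful
print(label_change(1_000_000, 1_050_000))  # +5% on a large base: meaningful growth
```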

Stan Heaton

Perfect timing. I’m about to build an AI org/roles/workflow with multiple agents. This is good advice. I think, like any good system, the agents need self-correction mechanisms. I’m planning on training mine using the Socratic method - having them ask questions and then assessing the answers together. That may prevent them from being starved of context and may reveal gaps. Have you tried that? Certainly it will take more time, but my hope is the investment in teaching yields better outputs.

Lily Luo

Interesting - let's talk more about your use case and what your final output is intended to look like. But yes, when I'm using AI, especially for a larger-scale project, I ask it to fill in the gaps and review my logic: 1) Ask me any clarifying questions before finding a solution. 2) Weigh the pros and cons of each solution and, based on my requirements, recommend a method (sometimes the AI has a hard time picking one, so I try to force a choice). 3) I always try to create a project file and instructions for larger projects so I can keep track of it.

You can have AI update it continuously so it's always recent. When I need to come back to it, it has the most up-to-date details.

I have also experimented with "QA agents" to review requirements and ensure the output meets them, saving me time doing it manually. For example, for ad creation, it will check character counts, any messaging guidelines, etc., as a last step to refine the results.
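A minimal sketch of what such a QA pass might check; the limits and banned phrases are placeholders, not any platform's real rules:

```python
def qa_check_ad(headline, body, max_headline=30, max_body=90,
                banned_phrases=("guaranteed", "best ever")):
    """Return a list of issues so the agent (or a human) can refine before publishing."""
    issues = []
    if len(headline) > max_headline:
        issues.append(f"Headline is {len(headline)} chars (limit {max_headline}).")
    if len(body) > max_body:
        issues.append(f"Body is {len(body)} chars (limit {max_body}).")
    for phrase in banned_phrases:
        if phrase in (headline + " " + body).lower():
            issues.append(f"Contains disallowed phrase: '{phrase}'.")
    return issues or ["All checks passed."]

print(qa_check_ad("Launch faster with Atlas", "Automate usage analysis for every account."))
```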