Understanding Hierarchical Multi-Agent Systems with Minute-Level Response Times

Greetings,

Our team recently conducted a proof of concept with CrewAI using a simple hierarchical architecture. In this setup, a manager LLM/agent dispatches queries to specialist agents, with all specialists using only function calls that require structured inputs.
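For context, the crew is wired up roughly as follows. This is a simplified sketch of CrewAI's hierarchical process, not our exact configuration: the role names, the single specialist, and the "gpt-4o" manager model are placeholders.

```python
from crewai import Agent, Crew, Process, Task

# One specialist agent; in our POC each specialist has 1-3 function tools
# (tools omitted here for brevity).
billing_agent = Agent(
    role="Billing specialist",
    goal="Answer billing questions via structured function calls",
    backstory="Handles billing-related intents only.",
)

task = Task(
    description="Answer the user's question.",
    expected_output="A short, direct answer.",
    agent=billing_agent,
)

# Hierarchical process: a manager LLM decides which specialist handles
# each query. Note that every routing hop is at least one extra LLM call.
crew = Crew(
    agents=[billing_agent],
    tasks=[task],
    process=Process.hierarchical,
    manager_llm="gpt-4o",  # placeholder model name
)

# crew.kickoff() would run the pipeline end to end.
```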

While this approach seems straightforward, we encountered minute-level latency, which falls well short of our target of seconds-level response times. This puzzles us, as CrewAI is considered one of the more established agent frameworks, with numerous production applications.

My questions are:

  • Have others encountered similar latency challenges, especially in function call-oriented applications?
  • What response times have other developers typically seen in their agent-based applications?
  • What might be causing these long processing times in our specific setup?

Agent response times depend on multiple factors, including how much work your agents are actually doing per request.
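If you want to see where the minutes are going, one option is to time each step yourself before blaming the framework. A generic Python sketch (not a CrewAI API; the `timed` decorator, `TIMINGS` dict, and `mock_tool` are made-up names for illustration):

```python
import time
from functools import wraps

# Accumulates wall-clock durations per function, so slow steps stand out.
TIMINGS: dict[str, float] = {}

def timed(fn):
    """Record how long each call to fn takes."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            TIMINGS[fn.__name__] = time.perf_counter() - start
    return wrapper

@timed
def mock_tool():
    time.sleep(0.05)  # stand-in for a real tool or LLM call
    return "ok"

mock_tool()
print({name: f"{secs:.2f}s" for name, secs in TIMINGS.items()})
```

Wrapping each tool (and, if possible, each LLM call) this way usually shows whether the time is spent in model inference, in tool execution, or in the extra routing hops a manager agent adds.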

How many tools are your agents calling? Are they calling an API?

Curious about the use case that requires second-level response times: are you using it in a chatbot?

How many tools are your agents calling? Are they calling an API?

For simplicity in our POC, each specialist agent has access to only one to three function tools, and the manager LLM/agent dispatches queries to at most 10 specialists.

Rather than making actual API calls, we implemented mock functions, using Pydantic for parameter-schema validation. At this stage we're evaluating the system's ability to understand intent and parse parameters, not the accuracy of its responses.
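A mock tool in this style might look like the following. This is a minimal sketch of the pattern, not our actual code: the `WeatherQuery` schema and `get_weather` name are invented for illustration (Pydantic v2 API).

```python
from pydantic import BaseModel, ValidationError

class WeatherQuery(BaseModel):
    """Parameter schema the LLM's function-call arguments must satisfy."""
    city: str
    days: int = 1

def get_weather(raw_args: dict) -> str:
    """Mock tool: validate parsed arguments, return a canned answer."""
    try:
        query = WeatherQuery.model_validate(raw_args)
    except ValidationError as exc:
        return f"invalid arguments: {exc.error_count()} error(s)"
    return f"(mock) forecast for {query.city}, next {query.days} day(s)"

print(get_weather({"city": "Berlin", "days": 3}))
print(get_weather({"days": "not-a-number"}))
```

Since the tool body is a no-op, any residual latency comes from the LLM calls themselves (intent understanding plus parameter extraction), not from tool execution.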

Curious about the use case that requires second-level response times: are you using it in a chatbot?

Yes, we are currently using it in a chatbot.