I have limited experience, and some of the bugs mentioned could be my own fault, but I will share the following observations:
-turning on verbose mode and piping all output to a text file reveals much more detail than what makes it into the ‘final’ report stored by the manager. Reviewing that file shows what the Manager is trying to do and whether it actually got a response. Pay particular attention to the points where the manager records a ‘Thought’ about what it is doing; that is where I found the richest information. (A small sketch for capturing that output follows the Crew snippet below.)
-sometimes the interaction between the manager and a coworker does not go as planned; instead, it cycles or times out, leaving a task incomplete or returning a very short answer. So far I have observed one cause for this: when the Manager tries to delegate to a coworker with a task and a question at the same time, an error like the following appears:
Tool Output:
Error: the Action Input is not a valid key, value dictionary.
Unfortunately, this tends to occur just as the crew is getting deep into the problem. The task being assigned and the question being asked at that point are usually the ones that get to the core of the problem and show the manager has built on what it learned in earlier steps.
But even after these errors, the interaction between the coworkers and the manager is impressive, especially considering I am still using a local LLM for testing rather than a production deployment - once I was able to see everything going on in verbose mode. Verbose mode was set here:
return Crew(
    agents=self.agents,
    tasks=self.tasks,
    process=Process.hierarchical,
    verbose=True,
    manager_llm='ollama/llama3.2',
)
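Referring back to the first observation: the simplest way to capture the verbose output is to redirect it from the shell, but the sketch below does the same thing in-process. The Tee class and the crew_run.log filename are my own hypothetical additions, not part of CrewAI, and since some console output may go to stderr I mirror both streams:

import sys

class Tee:
    """Write everything to the console and to a log file at the same time."""
    def __init__(self, *streams):
        self.streams = streams
    def write(self, data):
        for stream in self.streams:
            stream.write(data)
    def flush(self):
        for stream in self.streams:
            stream.flush()

log_file = open("crew_run.log", "w", encoding="utf-8")
sys.stdout = Tee(sys.__stdout__, log_file)  # mirror stdout
sys.stderr = Tee(sys.__stderr__, log_file)  # mirror stderr as well

# result = crew.kickoff()  # start the crew as usual; every verbose line also lands in crew_run.log
# log_file.close()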
To answer the other part of your question, the crew seems to iterate more in this mode than in sequential mode, and I am seeing a benefit as the manager turns answers from one coworker into questions for another coworker.
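For comparison, the sequential crew I tested against was essentially the same definition with the process switched and no manager model (a sketch only, assuming the same agents and tasks as above):

return Crew(
    agents=self.agents,
    tasks=self.tasks,
    process=Process.sequential,  # tasks run in the order listed; no manager_llm involved
    verbose=True,
)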
One additional note. Once I switched the crew LLM over from the local Ollama model I was testing with to a production Azure-hosted OpenAI model, the behavior of the Manager agent changed remarkably. It was clear that the Manager was really struggling with the format of the schemas and even developed a Thought that it was having a recurring problem:
“Agent: Crew Manager
Thought: Since previous attempts to delegate the work have failed, I will now simplify the descriptions even further, ensuring the language is direct and clear.”
“Agent: Crew Manager
Thought: Since past attempts to delegate the task have failed due to ongoing validation errors, I need to ensure the input strings are as clear and direct as possible. I will focus on making everything concise and precise.”
The larger LLM still gave a better result and was orders of magnitude faster, but it was interesting how often the delegation of tasks hung up. In this case the Manager asked far fewer questions of the coworkers and instead kept trying to adjust the delegation to make the handoff work.
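For reference, this is roughly how I pointed the manager at the Azure-hosted model. Treat the deployment name, API version, and credentials below as placeholders only; CrewAI routes model strings through LiteLLM, so the 'azure/<deployment>' form and the AZURE_* environment variables are what I relied on, but check them against your own setup.

import os
from crewai import Crew, Process, LLM

# Placeholder Azure OpenAI settings; substitute your own resource details
# (in practice these would be set in the environment, not in code).
os.environ["AZURE_API_KEY"] = "<your-azure-openai-key>"
os.environ["AZURE_API_BASE"] = "https://<your-resource>.openai.azure.com"
os.environ["AZURE_API_VERSION"] = "2024-02-15-preview"

# 'azure/<deployment-name>' is the LiteLLM-style model string.
manager_llm = LLM(model="azure/<your-deployment-name>")

return Crew(
    agents=self.agents,
    tasks=self.tasks,
    process=Process.hierarchical,
    verbose=True,
    manager_llm=manager_llm,
)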
Hope this is helpful.