Agent Final Answer is modified before output?

I am experiencing the same issue described in Parsed final answer in AgentFinish / TaskOutput often incomplete · Issue #788 · crewAIInc/crewAI · GitHub.

The issue was closed without resolution, so I am seeking help here.

I am building on the example: crewAI-examples/write_a_book_with_flows at main · crewAIInc/crewAI-examples · GitHub

I can see from the verbose log that my agent constructed the expected output as its Final Answer, which is a markdown-formatted chapter, but the actual task output has content stripped out of the markdown.

# Gather each chapter-writing crew's output once all async kickoffs finish.
chapters = await asyncio.gather(*tasks)
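For context, that line comes from a flow step roughly like the sketch below, based on the write_a_book_with_flows example (the import path, class name, and field names are my paraphrase and may not match the repo exactly):

import asyncio

from write_book_chapter_crew import WriteBookChapterCrew  # hypothetical import path

async def write_chapters(outline):
    # Kick off one chapter-writing crew per outline entry, concurrently.
    tasks = [
        WriteBookChapterCrew()
        .crew()
        .kickoff_async(
            inputs={
                "chapter_title": entry.title,
                "chapter_description": entry.description,
            }
        )
        for entry in outline
    ]

    # Each result is a CrewOutput; the (possibly stripped) markdown is in .raw.
    chapters = await asyncio.gather(*tasks)
    return [chapter.raw for chapter in chapters]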

Does anyone know how to prevent this from happening? Is this an output-formatting issue or a prompting issue (e.g., something in tasks.yaml)?

expected_output: >
  A markdown-formatted chapter of around 5,000 words that covers the
  provided chapter title and outline description.

It is frustrating because I can verify the desired “Final Answer” in the agent’s verbose log, but the task just keeps returning content with pieces of the markdown missing.

Any help will be appreciated.

@Jeff_Chen If you’re using a tool for the task, you can force the tool output to be returned as the task result by setting the result_as_answer parameter to True. As stated in the docs:

This parameter ensures that the tool output is captured and returned as the task result, without any modifications by the agent.

Here’s an example from the docs:

from crewai.agent import Agent
from my_tool import MyCustomTool

# result_as_answer=True makes the tool's raw output be captured as the task
# result, without any rewriting by the agent.
coding_agent = Agent(
    role="Data Scientist",
    goal="Produce amazing reports on AI",
    backstory="You work with data and AI",
    tools=[MyCustomTool(result_as_answer=True)],
)

Hi @rokbenko,

I am not using a tool at the moment, but I was contemplating whether I could define a simple tool that just passes the Final Answer through from the agent, with result_as_answer=True, to fix this problem. It’s a little kludgy.

I did compare the outputs of gpt-4o-mini vs. gpt-4o (2024-08-06). The latter got it about 95% correct: one markdown tag was not output correctly, but the rest came through intact.

This suggests the problem is, at least in part, the instruction-following capability of the LLM behind the agent?

This was exactly my thought too! :sweat_smile:

Absolutely! The choice of LLM can significantly impact the quality of the final answer. In a past project, I switched from Anthropic’s Haiku to Sonnet, and the results improved dramatically, even though the agent and task YAML configurations stayed exactly the same.

If using a cheaper (i.e., less capable) LLM saves a lot of money and doesn’t really lower the quality of the final answer (since the agent only strips some parts of the markdown), it might be worth creating a simple tool that just passes the final answer through and setting its result_as_answer parameter to True, roughly as sketched below. You’d just need to make sure the agent always uses this tool.
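Here’s a rough sketch of what I mean, assuming the BaseTool interface from crewai.tools (the tool name, schema, and agent fields below are just illustrative, not taken from your project):

from crewai import Agent
from crewai.tools import BaseTool
from pydantic import BaseModel, Field

class PassthroughInput(BaseModel):
    content: str = Field(..., description="The complete final markdown chapter, verbatim.")

class FinalAnswerPassthrough(BaseTool):
    name: str = "Final Answer Passthrough"
    description: str = (
        "Pass the complete final markdown chapter to this tool. "
        "It returns the content unchanged."
    )
    args_schema: type[BaseModel] = PassthroughInput

    def _run(self, content: str) -> str:
        # Return the chapter untouched; with result_as_answer=True this value
        # becomes the task output without further rewriting by the agent.
        return content

writer_agent = Agent(
    role="Chapter Writer",
    goal="Write complete, markdown-formatted chapters",
    backstory="You write long-form content and never truncate it.",
    tools=[FinalAnswerPassthrough(result_as_answer=True)],
)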

I tried the approach of passing the content to a “null custom tool” and then forcing the result as the answer, but the agent passed in its self-reflection about the content rather than the actual content.

This is most frustrating, but at least the usable output is in the verbose log as the “Final Answer”; I am just having to harvest it by hand at the moment.
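To avoid copying it out of the log by hand, I am also thinking about capturing it with a step_callback on the agent, something like the rough sketch below (the attribute names on the step object are guesses on my part, since they seem to differ between crewAI versions):

from crewai import Agent

captured_answers = []

def capture_final_answer(step):
    # Keep only steps that look like a final answer; duck-type the attributes
    # because the step object's shape varies across crewAI versions.
    if type(step).__name__ == "AgentFinish":
        raw = getattr(step, "output", None) or getattr(step, "text", None)
        if raw:
            captured_answers.append(raw)

writer_agent = Agent(
    role="Chapter Writer",
    goal="Write complete, markdown-formatted chapters",
    backstory="You write long-form content.",
    step_callback=capture_final_answer,
)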

The output of gpt-4o turns out not to be consistent: the first output I examined was very close, but then chunks started getting dropped between the “Final Answer” and the actual agent output. This is very strange. I am experimenting with my task’s expected_output definition but have not gotten the result I’m after yet.