How to deal with failing tasks?

Hey there,

I’m relatively new to CrewAI, but I’m quite fascinated so far. I’m struggling to keep the process under control, though, primarily when it comes to custom-made tools.

My primary question is: how do I best control a Crew/Agent when Tasks or Tools fail?

Let’s assume I have a custom tool that queries some data from an endpoint. Getting that far was easy, but I noticed it’s hard to get a reliable answer structure, and it also seems hard to deal with failure. On top of that, it’s tough to control the number of retries/executions.

I already ran into this in a rather big application, so I tried to reproduce it in a simple test flow, and it behaves the same way.

Here’s my code - I hope it’s okay to post this here.

import os

from crewai import Agent, Crew, Process, Task
from crewai.tools import BaseTool
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

# Load environment variables from .env file
load_dotenv()

# Check if the key is loaded (optional debug step)
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    print("Warning: OPENAI_API_KEY not found after loading .env!")
else:
    print("OPENAI_API_KEY loaded successfully.")

# --- 1. Define a Tool that Simulates Data Fetching Failure ---
class DataFetcherTool(BaseTool):
    name: str = "DataFetcherTool"
    description: str = "A tool to fetch specific data based on an ID."
    raise_on_error: bool = True # Ensure errors propagate

    def _run(self, resource_id: str) -> str:
        print(f"Tool: Attempting to fetch data for resource_id: {resource_id}")
        # Simulate a failure condition (e.g., data not found)
        raise ValueError(f"Resource Error: Could not find data for resource ID '{resource_id}'")

# --- 2. Initialize the Tool ---
data_fetcher_tool = DataFetcherTool()

# --- 3. Define the Agent ---
# Assumes OPENAI_API_KEY is set in the environment
llm = ChatOpenAI(model="gpt-4o")

data_retrieval_agent = Agent(
    role='Data Retrieval Specialist',
    goal=(
        'Use the DataFetcherTool to retrieve specific information. '
        'Handle potential issues gracefully.'
    ),
    backstory=(
        "You are an agent responsible for fetching data from various sources. "
        "You must use the provided tool to get the data for a specific resource ID. "
        "If the tool indicates the resource cannot be found or returns an error, "
        "you must stop immediately and report this outcome. Do not invent data."
    ),
    verbose=True,
    allow_delegation=False,
    llm=llm,
    max_retry_limit=0, # Limit Agent retries
    max_iter=1, # Limit tool iterations to ensure it stops on tool error
    tools=[data_fetcher_tool]
)

# --- 4. Define the Task ---
fetch_data_task = Task(
    description=(
        "Use the DataFetcherTool to fetch data for resource ID 'item-123'. "
        "If successful, the tool should return a JSON string with 'name' and 'value'. "
        "Your final answer MUST be this JSON string. "
        "If the tool cannot find the resource or encounters an error, report that the operation could not be completed."
    ),
    expected_output="A JSON string like '{\"name\": \"...\", \"value\": \"...\"}' or a confirmation that the data could not be fetched.",
    agent=data_retrieval_agent,
    max_retries=0, # Limit task retries
    verbose=True
)

# --- 5. Define the Crew ---
data_crew = Crew(
    agents=[data_retrieval_agent],
    tasks=[fetch_data_task],
    process=Process.sequential,
    verbose=True # Set verbose to boolean True
)

# --- 6. Execute the Crew with Error Handling ---
if __name__ == "__main__":
    print("\n--- Starting Minimal Crew Data Fetch Test ---")
    test_input = {'input_data': 'trigger'} # inputs dict is optional; used for template interpolation

    try:
        result = data_crew.kickoff(inputs=test_input)
        print("\n--- Crew Execution Finished ---")
        print(f"Result: {result}")
        print("Status: SUCCESS (Should not happen if tool fails correctly)")

    except Exception as e:
        print("\n--- Crew Execution FAILED ---")
        print(f"Caught Exception: {type(e).__name__}")
        print(f"Error Message: {e}")
        print("Status: FAILURE (Expected due to tool error)")

    print("\n--- Test Complete ---")

Here’s the result of the code above: Hastebin

From the execution you can see that the Agent uses the tool multiple times even though max_iter is set to 1 (it doesn’t use the tool at all if I set it to 0, so I guess 1 is the right value if I only want a single execution?). I also set the retries on the Agent and the Task to 0, but it still executes more than once.

And even though the tool fails, it still reports the Task as done, and the Crew as well.

I’m sure I’m just missing some very obvious things, but I hope the foundation is good enough to get some help here :slight_smile:

Thanks

First off, welcome aboard.

Let me clarify what the Agent class docstring says about the max_iter attribute: “Maximum number of iterations for an agent to execute a task.” This information is then passed to the CrewAgentExecutor class. When this limit is reached (in your example, 1), the _handle_max_iterations_exceeded method is called, and it “requests” that the final answer be generated. That’s exactly the “Maximum iterations reached. Requesting final answer.” message you’re seeing in your output.

As for what to do in this situation, I’ll leave it to folks much more qualified than me to chime in and guide you. Personally, I tend to keep Python-things apart from LLM-things and only bring them together when absolutely necessary. I’m already losing my hair, and I don’t want to lose what’s left trying to figure out how the LLM is going to decide to use my tools. Specifically, for the use case you presented, a good old try-except block inside your tool’s _run method could help you. And maybe even wrap that try-except block inside a loop to retry N times? Finally, after N attempts, your tool could return something like, “Hey, Mr. LLM, I couldn’t get the info for client ID-666!”, and while you’re at it, log the error details for a flesh-and-blood human to check out and handle later.
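Just to make that concrete, here’s a rough sketch of the idea. Note that fetch_from_endpoint is a stand-in for your real request logic, and the attempt count is arbitrary:

import logging
import time

from crewai.tools import BaseTool

logger = logging.getLogger(__name__)

def fetch_from_endpoint(resource_id: str) -> str:
    # Placeholder for your real HTTP call; raises to simulate failure
    raise ValueError(f"Could not find data for resource ID '{resource_id}'")

class RetryingDataFetcherTool(BaseTool):
    name: str = "DataFetcherTool"
    description: str = "A tool to fetch specific data based on an ID."
    max_attempts: int = 3  # plain Python retries, invisible to the LLM

    def _run(self, resource_id: str) -> str:
        for attempt in range(1, self.max_attempts + 1):
            try:
                return fetch_from_endpoint(resource_id)
            except Exception as exc:
                logger.warning("Attempt %d/%d failed for '%s': %s",
                               attempt, self.max_attempts, resource_id, exc)
                time.sleep(1)  # naive fixed backoff between attempts
        # After N failures, return a plain-text outcome instead of raising,
        # so the LLM gets a clear, final signal to stop and report the failure
        return (f"ERROR: Could not fetch data for resource ID '{resource_id}' "
                f"after {self.max_attempts} attempts. Do not retry; report this failure.")

This way the retry policy lives entirely in Python, and the LLM only ever sees either the data or a single, final error message.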

If you want to learn a bit more about some of the principles I use to structure my solutions, I recommend this text from Anthropic and this recent video. Good luck!

Thanks for replying Max.

I totally get your point. I’m only slowly getting started with CrewAI and still have a bunch to figure out, but over the last two days I’ve already been able to move on quite a bit.

I thought I’d come back and add this to my post here, because others might profit from my learnings in the future. So, I think there are two big learnings.

The first is that making use of Pydantic output from tasks really helps here; it can already solve quite a bit and makes processes like these much more reliable.
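For example, this is roughly what I mean; the model and field names here are just illustrative, and the agent is the one from my snippet above:

from pydantic import BaseModel

from crewai import Task

class FetchResult(BaseModel):
    name: str = ""
    value: str = ""
    fetched: bool  # False when the tool reported a failure

fetch_data_task = Task(
    description=(
        "Use the DataFetcherTool to fetch data for resource ID 'item-123'. "
        "If the tool reports an error, set fetched to False and leave name and value empty."
    ),
    expected_output="A FetchResult object describing the outcome.",
    agent=data_retrieval_agent,
    output_pydantic=FetchResult,  # CrewAI validates the final answer against this model
)

# After kickoff, the structured result is available via result.pydantic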

But then second, and more importantly, as you already said, Max, it’s probably not worth having CrewAI handle a task with LLMs when that task doesn’t involve reasoning or any other LLM-related decision process. Along the way I also found out that you can use Flows and integrate your own plain functions into the flow, which is great, and that’s what I did. I built my tool so that it doesn’t really act as a tool in this case; instead, I made it flexible enough to be called directly as a regular function, but also to be used as a CrewAI tool when the AI needs it. In my case it’s a WordPress tool: my flow can access WordPress directly through it, or it can serve as a helpful tool for the AI when needed. That’s a great solution, I guess.
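Here’s a stripped-down sketch of the pattern, with the actual WordPress call replaced by a placeholder:

from crewai.flow.flow import Flow, listen, start

def fetch_wordpress_posts(site_url: str) -> list[dict]:
    # Placeholder for a real WordPress REST API call, e.g.
    # requests.get(f"{site_url}/wp-json/wp/v2/posts").json()
    return [{"id": 1, "title": "Hello world"}]

class WordPressFlow(Flow):
    @start()
    def fetch_posts(self):
        # Deterministic step: plain Python, no agent, nothing for the LLM to decide
        return fetch_wordpress_posts("https://example.com")

    @listen(fetch_posts)
    def process_posts(self, posts):
        # Hand off to a Crew/LLM here only if actual reasoning is needed
        print(f"Fetched {len(posts)} posts.")
        return posts

if __name__ == "__main__":
    WordPressFlow().kickoff()

The same fetch_wordpress_posts function can also be wrapped as a CrewAI tool (e.g. with the @tool decorator from crewai.tools) and handed to an Agent when the AI genuinely needs to call it on its own.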

I’m happy with it; I just wanted to come back, update you all, and share what I found out.

Cheers

Great points, @Helmi.

The combination of Flows + clear LLM calls + outputs in Pydantic models can solve 90% of these problems in a very elegant and scalable way.

Best of luck on your journey and thanks for sharing!