Error: LLM call exception in async task doesn't stop flow execution

Issue Overview

I’m running into inconsistent LLM failures that I would like to handle properly in my Flow execution. Everything works (and fails) as expected when running tasks synchronously, but since my flow really benefits from running them asynchronously, this is an issue I would rather fix than circumvent. Unfortunately, when an LLM call throws an exception in an async task, the flow freezes instead of failing gracefully.

Context

I have a crew which summarizes a bunch of input documents and then finds similar documentation in a private database. Since the input can sometimes be very large, I split it into smaller groups so that I don’t run into context-window limits. My crew execution is composed of three tasks:

  • Summarize documents: Executed with async_execution=True, this task may be executed n times, depending on the input.
  • Combine summaries: Has all previous summary tasks in its context and combines all summaries into one concise summary.
  • Find relevant documentation: Searches private documentation based on the previous summary.

The code for this crew creation is as follows:

def crew(self, total_executions) -> Crew:
    summarization_tasks = []

    for i in range(total_executions):
        summarize_task = self.summarize_code()  # Creates a new task based on task.yaml
        summarize_task.async_execution = True

        # For each task, change the input placeholder so it receives a different group of documents
        summarize_task.description = summarize_task.description.replace('input', f'input_{i}')
        summarization_tasks.append(summarize_task)

    combine_task = self.combine_summaries()
    combine_task.context = summarization_tasks  # Define all previous summary tasks as context

    return Crew(
        agents=[
            self.summarizer(),  # Used in the summary tasks and the combine task
            self.evaluator()    # Used in the task that finds similar documentation
        ],
        tasks=[
            *summarization_tasks,
            combine_task,
            self.evaluate()
        ]
    )

Each agent has its LLM defined with this function:

def _build_llm(self, model) -> LLM:
    base_url = os.getenv("BASE_URL")
    return LLM(
        model=model,
        base_url=base_url,
        api_key=os.getenv("SECRET"),
        temperature=0.6
    )

The crew is called inside a flow step as usual:

summary_crew = SummaryCrew()
crew = summary_crew.crew(total_executions=5)
result = crew.kickoff(inputs=inputs)
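As a defensive workaround while the underlying issue is open, the blocking `kickoff()` call can be wrapped in a watchdog so the flow step raises instead of hanging indefinitely. This is a stdlib sketch of my own, not a CrewAI API; `run_with_timeout` and the timeout value are illustrative names/choices:

```python
from concurrent.futures import ThreadPoolExecutor

def run_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run a blocking call in a worker thread and fail loudly instead of hanging."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        # result() re-raises any exception raised inside fn, or raises
        # concurrent.futures.TimeoutError after timeout_s seconds.
        return pool.submit(fn, *args, **kwargs).result(timeout=timeout_s)
    finally:
        # Don't block on a hung worker; note the thread may leak until process exit.
        pool.shutdown(wait=False)

# Hypothetical usage inside the flow step:
# result = run_with_timeout(crew.kickoff, 300, inputs=inputs)
```

This converts a silent freeze into a `TimeoutError` the flow step can catch, at the cost of a possibly leaked worker thread if the crew truly hangs.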

Flow is called asynchronously, as it is running in an uvicorn app:

flow = MyFlow()
result = await flow.kickoff_async(flow_input)
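On the flow side, a similar safety net (again my own suggestion, not documented CrewAI behavior) is `asyncio.wait_for`, so a frozen step eventually surfaces as a `TimeoutError` in the uvicorn handler. Note that `wait_for` cancels the coroutine but cannot kill a hung worker thread underneath it:

```python
import asyncio

async def hung_step():
    # Stand-in for a flow step that never returns.
    await asyncio.sleep(3600)

async def main():
    try:
        # In the real app this would be something like:
        # await asyncio.wait_for(flow.kickoff_async(flow_input), timeout=600)
        await asyncio.wait_for(hung_step(), timeout=0.1)
    except asyncio.TimeoutError:
        return "timed out"

print(asyncio.run(main()))  # timed out
```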

Error information

During the asynchronous summarize tasks, I sometimes get the following error:
litellm.APIError: APIError: OpenAIException
Although this particular failure is caused by an issue in my company’s proxy, what I’m really trying to solve is how to handle this and any other errors that may happen.

The full trace is the following:

Exception in thread Thread-3 (_execute_task_async):
Traceback (most recent call last):
  File "/project/.venv/lib/python3.12/site-packages/litellm/llms/openai/openai.py", line 745, in completion
    raise e
  File "/project/.venv/lib/python3.12/site-packages/litellm/llms/openai/openai.py", line 673, in completion
    ) = self.make_sync_openai_chat_completion_request(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/logging_utils.py", line 237, in sync_wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/litellm/llms/openai/openai.py", line 489, in make_sync_openai_chat_completion_request
    raise e
  File "/project/.venv/lib/python3.12/site-packages/litellm/llms/openai/openai.py", line 471, in make_sync_openai_chat_completion_request
    raw_response = openai_client.chat.completions.with_raw_response.create(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/openai/_legacy_response.py", line 364, in wrapped
    return cast(LegacyAPIResponse[R], func(*args, **kwargs))
                                      ^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/openai/resources/chat/completions/completions.py", line 1189, in create
    return self._post(
           ^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.PermissionDeniedError: redacted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/project/.venv/lib/python3.12/site-packages/litellm/main.py", line 2158, in completion
    raise e
  File "/project/.venv/lib/python3.12/site-packages/litellm/main.py", line 2130, in completion
    response = openai_chat_completions.completion(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/litellm/llms/openai/openai.py", line 756, in completion
    raise OpenAIError(
litellm.llms.openai.common_utils.OpenAIError: redacted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/project/.venv/lib/python3.12/site-packages/crewai/task.py", line 497, in _execute_task_async
    result = self._execute_core(agent, context, tools)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/task.py", line 591, in _execute_core
    raise e  # Re-raise the exception after emitting the event
    ^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/task.py", line 522, in _execute_core
    result = agent.execute_task(
             ^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/agent/core.py", line 514, in execute_task
    raise e
  File "/project/.venv/lib/python3.12/site-packages/crewai/agent/core.py", line 490, in execute_task
    result = self._execute_without_timeout(task_prompt, task)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/agent/core.py", line 598, in _execute_without_timeout
    return self.agent_executor.invoke(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/agents/crew_agent_executor.py", line 188, in invoke
    formatted_answer = self._invoke_loop()
                       ^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/agents/crew_agent_executor.py", line 287, in _invoke_loop
    raise e
  File "/project/.venv/lib/python3.12/site-packages/crewai/agents/crew_agent_executor.py", line 229, in _invoke_loop
    answer = get_llm_response(
             ^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/utilities/agent_utils.py", line 276, in get_llm_response
    raise e
  File "/project/.venv/lib/python3.12/site-packages/crewai/utilities/agent_utils.py", line 268, in get_llm_response
    answer = llm.call(
             ^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/llm.py", line 1321, in call
    return self._handle_non_streaming_response(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/llm.py", line 1081, in _handle_non_streaming_response
    response = litellm.completion(**params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1381, in wrapper
    raise e
  File "/project/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1250, in wrapper
    result = original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/litellm/main.py", line 3772, in completion
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2328, in exception_type
    raise e
  File "/project/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 563, in exception_type
    raise APIError(
litellm.exceptions.APIError: litellm.APIError: APIError: OpenAIException

After this exception is thrown, my flow freezes like this:

🌊 Flow: MyFlow
ID: uuid
├── ✨ Created
├── ✅ Initialization Complete
├── ✅ Completed: process_input_step
├── ✅ Completed: input_router
└── 🔄 Running: iterative_evaluation_step # step that called the crew

When running tasks synchronously, the flow fails as expected:

🌊 Flow: MyFlow
ID: uuid
├── ✨ Created
├── ✅ Initialization Complete
├── ✅ Completed: process_input_step
├── ✅ Completed: input_router
└── ❌ Failed: iterative_evaluation_step

Environment:
  • Python 3.12.3
  • CrewAI 1.6.0

What I want is for asynchronous task execution to fail exactly the way synchronous execution does. Ideally, I would handle the exception inside either the crew execution or the flow execution. If there is no workaround, being able to properly propagate the exception and stop the flow execution is enough for me.

Heads up

  • Both LLMCallFailedEvent and TaskFailedEvent are being raised correctly.
  • Flow execution freezes whether or not the async tasks are included in the context of the following task (combine summaries).

This is an AI-generated guide (made with an agent I created). All the best!

🛠️ Troubleshooting Guide: LLM Call Exception in Async Task Doesn’t Stop Flow Execution (CrewAI)

Key Takeaway:
When using CrewAI, exceptions from LLM calls in async tasks may not halt the flow as expected, causing the workflow to freeze instead of failing gracefully. This guide provides step-by-step troubleshooting, immediate workarounds, root cause analysis, and best practices to ensure robust error handling in CrewAI async workflows.


1. Understand the Problem Context

  • Issue:
    When an LLM call fails (e.g., litellm.APIError: APIError: OpenAIException) inside an async task (async_execution=True), the CrewAI flow “freezes” instead of stopping or propagating the error. In synchronous execution, the flow fails as expected.
  • Desired Behavior:
    Async task failures should halt the flow, matching synchronous error handling.

2. Immediate Troubleshooting Steps

A. Reproduce the Issue

  • Run your flow with async_execution=True and intentionally trigger an LLM error (e.g., by using an invalid API key or exceeding rate limits).
  • Observe if the flow freezes or continues instead of failing.

B. Switch to Synchronous Execution (Workaround)

  • Temporarily set async_execution=False for critical tasks to ensure errors halt the flow as expected.
  • Use kickoff() instead of kickoff_async() for flow execution.

C. Check for Error Events

  • Confirm that LLMCallFailedEvent and TaskFailedEvent are being raised in your logs.
  • If these events are present but the flow does not stop, proceed to error handling configuration.

3. Root Cause Analysis

Execution Mode | Error Propagation | Typical Behavior
---------------|-------------------|----------------------
Synchronous    | Yes               | Flow halts on error
Asynchronous   | No (by default)   | Flow may freeze/hang

  • Why?
    CrewAI’s async task architecture does not always propagate exceptions up to the flow controller, especially when using decorators like @listen or when async methods are not properly awaited.
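The traceback above also shows the failing task running on a plain thread (Exception in thread Thread-3 (_execute_task_async)). In Python, an exception inside a bare threading.Thread is reported via threading.excepthook but never reaches the caller that join()s it, which is consistent with the observed freeze. A minimal stdlib illustration of this behavior (not CrewAI internals):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def failing_task():
    raise RuntimeError("simulated LLM APIError")

# 1) Bare thread: join() returns normally; the caller never sees the error.
errors = []
threading.excepthook = lambda args: errors.append(args.exc_value)  # observe, don't propagate
t = threading.Thread(target=failing_task)
t.start()
t.join()
print(f"join() returned; error only visible via excepthook: {errors[0]}")

# 2) Future-based execution re-raises the worker's exception in the caller:
with ThreadPoolExecutor() as pool:
    future = pool.submit(failing_task)
    try:
        future.result()  # re-raises RuntimeError here
    except RuntimeError as e:
        print(f"caught in caller: {e}")
```

This is why event hooks fire (the thread does fail) while the flow keeps waiting: nothing re-raises the error on the coordinating side.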

4. Configuration and Code Fixes

A. Explicit Exception Handling in Async Tasks

  • Wrap LLM calls in try-except blocks and re-raise exceptions to ensure they are not silently swallowed:
async def my_async_task(...):
    try:
        # LLM call here
        result = await llm_call(...)
        return result
    except Exception as e:
        # Log and re-raise to halt the flow
        logger.error(f"LLM call failed: {e}")
        raise

B. Use Synchronous Decorators for Critical Tasks

  • Prefer @start over @listen for tasks where error propagation is critical, as @listen may not propagate exceptions correctly.

C. Adjust Flow and Task Settings

Setting         | Recommended Value | Purpose
----------------|-------------------|-------------------------------
raise_on_error  | True              | Ensures errors halt execution
max_retry_limit | 0                 | Prevents retries on failure
max_iter        | 1                 | Stops after first error

D. Implement Error Callbacks

  • Use a callback such as the crew-level task_callback hook to capture and handle errors at the task or crew level. Verify the exact hook names (e.g. set_error_handler below) against your installed CrewAI version, as these APIs differ between releases:
def my_error_handler(task, error):
    logger.error(f"Task {task.name} failed: {error}")
    # Optionally, halt or clean up the flow here

task.set_error_handler(my_error_handler)

5. Best Practices for Robust Error Handling in CrewAI

  • Always validate inputs and outputs (e.g., with Pydantic models) to catch errors early.
  • Log all exceptions with detailed context for debugging.
  • Implement retry logic for transient errors, but avoid infinite retries.
  • Test error handling logic in both sync and async modes before production deployment.
  • Monitor flows using CrewAI’s observability tools or external monitoring solutions.
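For the bounded-retry bullet above, here is a small stdlib sketch (my own helper, not a CrewAI API) that retries transient failures with exponential backoff and re-raises once the budget is spent, so errors still propagate instead of looping forever:

```python
import time
from functools import wraps

def retry(times=3, delay_s=0.5, backoff=2.0, exceptions=(Exception,)):
    """Retry transient failures a bounded number of times, then re-raise."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            wait = delay_s
            for attempt in range(times):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == times - 1:
                        raise  # budget exhausted: fail loudly, never retry forever
                    time.sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator

# Hypothetical usage around a flaky proxy call:
# @retry(times=3, exceptions=(ConnectionError,))
# def call_proxy(payload): ...
```

Restricting `exceptions` to known-transient types keeps permanent failures (like the PermissionDeniedError above) failing fast.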

6. Summary Table: Troubleshooting Checklist

Step                                   | Action/Checkpoint
---------------------------------------|----------------------------------------------
Reproduce error in async mode          | Confirm flow freezes on LLM exception
Switch to sync mode                    | Verify flow halts as expected
Check error events in logs             | Look for LLMCallFailedEvent, TaskFailedEvent
Add explicit try-except in async tasks | Ensure exceptions are re-raised
Use @start for critical tasks          | Avoid @listen for error-prone tasks
Set raise_on_error=True                | Enforce error propagation
Limit retries (max_retry_limit=0)      | Prevent repeated failures
Implement error callbacks              | Capture and handle errors centrally
Validate inputs/outputs                | Use Pydantic or similar
Test both sync and async flows         | Ensure robust error handling in all modes

7. Key Takeaways & Next Steps

Key Finding:
CrewAI’s async task error propagation is currently limited. For critical workflows, use synchronous execution or explicit error handling in async tasks. Monitor for updates in CrewAI’s async support and consider contributing to or following related GitHub issues for long-term solutions.


If you need code samples or configuration templates, let me know!