Error: LLM call exception in async task doesn't stop flow execution

Issue Overview

I’m running into inconsistent LLM failures that I would like to handle properly in my Flow execution. Everything works (and fails) as expected when running tasks synchronously, but since my flow really benefits from running them asynchronously, this is an issue I would rather fix than circumvent. Unfortunately, when an LLM call throws an exception in an async task, the flow freezes instead of failing gracefully.

Context

I have a crew which summarizes a bunch of input documents and then finds similar documentation in a private database. Since the input can sometimes be very large, I split it into smaller groups so that I don’t run into context-window limits. My crew execution is composed of three tasks:

  • Summarize documents: Executed with async_execution=True, this task may be executed n times, depending on the input.
  • Combine summaries: Has all previous summary tasks in its context and combines all summaries into one concise summary.
  • Find relevant documentation: Searches private documentation based on the previous summary.

The code for this crew creation is as follows:

def crew(self, total_executions) -> Crew:
    summarization_tasks = []

    for i in range(total_executions):
        summarize_task = self.summarize_code()  # Creates a new task based on task.yaml
        summarize_task.async_execution = True

        # For each task, change the input placeholder so it receives a different group of documents
        summarize_task.description = summarize_task.description.replace('input', f'input_{i}')
        summarization_tasks.append(summarize_task)

    combine_task = self.combine_summaries()
    combine_task.context = summarization_tasks  # Define all previous summary tasks as context

    return Crew(
        agents=[
            self.summarizer(),  # Used in the summary tasks and the combine task
            self.evaluator()    # Used in the task that finds similar documentation
        ],
        tasks=[
            *summarization_tasks,
            combine_task,
            self.evaluate()
        ]
    )

Each agent has its LLM defined with this function:

def _build_llm(self, model) -> LLM:
    base_url = os.getenv("BASE_URL")
    return LLM(
        model=model,
        base_url=base_url,
        api_key=os.getenv("SECRET"),
        temperature=0.6
    )

The crew is called inside a flow step as usual:

summary_crew = SummaryCrew()
crew = summary_crew.crew(total_executions=5)
result = crew.kickoff(inputs=inputs)
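As a defensive workaround while the underlying issue is open, the blocking `kickoff()` call can be wrapped in a watchdog so the flow step raises instead of hanging indefinitely. This is a stdlib sketch of my own, not a CrewAI API; `run_with_timeout` and the timeout value are illustrative names/choices:

```python
from concurrent.futures import ThreadPoolExecutor

def run_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run a blocking call in a worker thread and fail loudly instead of hanging."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        # result() re-raises any exception raised inside fn, or raises
        # concurrent.futures.TimeoutError after timeout_s seconds.
        return pool.submit(fn, *args, **kwargs).result(timeout=timeout_s)
    finally:
        # Don't block on a hung worker; note the thread may leak until process exit.
        pool.shutdown(wait=False)

# Hypothetical usage inside the flow step:
# result = run_with_timeout(crew.kickoff, 300, inputs=inputs)
```

This converts a silent freeze into a `TimeoutError` the flow step can catch, at the cost of a possibly leaked worker thread if the crew truly hangs.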

Flow is called asynchronously, as it is running in an uvicorn app:

flow = MyFlow()
result = await flow.kickoff_async(flow_input)
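On the flow side, a similar safety net (again my own suggestion, not documented CrewAI behavior) is `asyncio.wait_for`, so a frozen step eventually surfaces as a `TimeoutError` in the uvicorn handler. Note that `wait_for` cancels the coroutine but cannot kill a hung worker thread underneath it:

```python
import asyncio

async def hung_step():
    # Stand-in for a flow step that never returns.
    await asyncio.sleep(3600)

async def main():
    try:
        # In the real app this would be something like:
        # await asyncio.wait_for(flow.kickoff_async(flow_input), timeout=600)
        await asyncio.wait_for(hung_step(), timeout=0.1)
    except asyncio.TimeoutError:
        return "timed out"

print(asyncio.run(main()))  # timed out
```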

Error information

During the asynchronous summarize tasks, I sometimes get the following error:
litellm.APIError: APIError: OpenAIException
Although this particular failure is caused by an issue in my company’s proxy, what I’m really trying to solve is how to handle this and any other errors that may happen.

The full trace is the following:

Exception in thread Thread-3 (_execute_task_async):
Traceback (most recent call last):
  File "/project/.venv/lib/python3.12/site-packages/litellm/llms/openai/openai.py", line 745, in completion
    raise e
  File "/project/.venv/lib/python3.12/site-packages/litellm/llms/openai/openai.py", line 673, in completion
    ) = self.make_sync_openai_chat_completion_request(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/logging_utils.py", line 237, in sync_wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/litellm/llms/openai/openai.py", line 489, in make_sync_openai_chat_completion_request
    raise e
  File "/project/.venv/lib/python3.12/site-packages/litellm/llms/openai/openai.py", line 471, in make_sync_openai_chat_completion_request
    raw_response = openai_client.chat.completions.with_raw_response.create(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/openai/_legacy_response.py", line 364, in wrapped
    return cast(LegacyAPIResponse[R], func(*args, **kwargs))
                                      ^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/openai/resources/chat/completions/completions.py", line 1189, in create
    return self._post(
           ^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.PermissionDeniedError: redacted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/project/.venv/lib/python3.12/site-packages/litellm/main.py", line 2158, in completion
    raise e
  File "/project/.venv/lib/python3.12/site-packages/litellm/main.py", line 2130, in completion
    response = openai_chat_completions.completion(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/litellm/llms/openai/openai.py", line 756, in completion
    raise OpenAIError(
litellm.llms.openai.common_utils.OpenAIError: redacted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/project/.venv/lib/python3.12/site-packages/crewai/task.py", line 497, in _execute_task_async
    result = self._execute_core(agent, context, tools)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/task.py", line 591, in _execute_core
    raise e  # Re-raise the exception after emitting the event
    ^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/task.py", line 522, in _execute_core
    result = agent.execute_task(
             ^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/agent/core.py", line 514, in execute_task
    raise e
  File "/project/.venv/lib/python3.12/site-packages/crewai/agent/core.py", line 490, in execute_task
    result = self._execute_without_timeout(task_prompt, task)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/agent/core.py", line 598, in _execute_without_timeout
    return self.agent_executor.invoke(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/agents/crew_agent_executor.py", line 188, in invoke
    formatted_answer = self._invoke_loop()
                       ^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/agents/crew_agent_executor.py", line 287, in _invoke_loop
    raise e
  File "/project/.venv/lib/python3.12/site-packages/crewai/agents/crew_agent_executor.py", line 229, in _invoke_loop
    answer = get_llm_response(
             ^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/utilities/agent_utils.py", line 276, in get_llm_response
    raise e
  File "/project/.venv/lib/python3.12/site-packages/crewai/utilities/agent_utils.py", line 268, in get_llm_response
    answer = llm.call(
             ^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/llm.py", line 1321, in call
    return self._handle_non_streaming_response(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/crewai/llm.py", line 1081, in _handle_non_streaming_response
    response = litellm.completion(**params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1381, in wrapper
    raise e
  File "/project/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1250, in wrapper
    result = original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/litellm/main.py", line 3772, in completion
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/project/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2328, in exception_type
    raise e
  File "/project/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 563, in exception_type
    raise APIError(
litellm.exceptions.APIError: litellm.APIError: APIError: OpenAIException

After this exception is thrown, my flow freezes like this:

🌊 Flow: MyFlow
ID: uuid
├── ✨ Created
├── ✅ Initialization Complete
├── ✅ Completed: process_input_step
├── ✅ Completed: input_router
└── 🔄 Running: iterative_evaluation_step # step that called the crew

When running tasks synchronously, the flow fails as expected:

🌊 Flow: MyFlow
ID: uuid
├── ✨ Created
├── ✅ Initialization Complete
├── ✅ Completed: process_input_step
├── ✅ Completed: input_router
└── ❌ Failed: iterative_evaluation_step

Environment:
  • Python 3.12.3
  • CrewAI 1.6.0

What I want is for asynchronous task execution to fail exactly the way synchronous execution does. Ideally, I would handle the exception inside either the crew execution or the flow execution. If there is no workaround, being able to properly propagate the exception and stop the flow execution is enough for me.

Heads up

  • Both LLMCallFailedEvent and TaskFailedEvent are being raised correctly.
  • Flow execution freezes whether or not the async tasks are included in the context of the following task (combine summaries).

This is an AI-generated guide (made with an agent I created). All the best!

🛠️ Troubleshooting Guide: LLM Call Exception in Async Task Doesn’t Stop Flow Execution (CrewAI)

Key Takeaway:
When using CrewAI, exceptions from LLM calls in async tasks may not halt the flow as expected, causing the workflow to freeze instead of failing gracefully. This guide provides step-by-step troubleshooting, immediate workarounds, root cause analysis, and best practices to ensure robust error handling in CrewAI async workflows.


1. Understand the Problem Context

  • Issue:
    When an LLM call fails (e.g., litellm.APIError: APIError: OpenAIException) inside an async task (async_execution=True), the CrewAI flow “freezes” instead of stopping or propagating the error. In synchronous execution, the flow fails as expected.
  • Desired Behavior:
    Async task failures should halt the flow, matching synchronous error handling.

2. Immediate Troubleshooting Steps

A. Reproduce the Issue

  • Run your flow with async_execution=True and intentionally trigger an LLM error (e.g., by using an invalid API key or exceeding rate limits).
  • Observe if the flow freezes or continues instead of failing.

B. Switch to Synchronous Execution (Workaround)

  • Temporarily set async_execution=False for critical tasks to ensure errors halt the flow as expected.
  • Use kickoff() instead of kickoff_async() for flow execution.

C. Check for Error Events

  • Confirm that LLMCallFailedEvent and TaskFailedEvent are being raised in your logs.
  • If these events are present but the flow does not stop, proceed to error handling configuration.

3. Root Cause Analysis

Execution Mode | Error Propagation | Typical Behavior
---------------|-------------------|----------------------
Synchronous    | Yes               | Flow halts on error
Asynchronous   | No (by default)   | Flow may freeze/hang

  • Why?
    CrewAI’s async task architecture does not always propagate exceptions up to the flow controller, especially when using decorators like @listen or when async methods are not properly awaited.
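The traceback above also shows the failing task running on a plain thread (Exception in thread Thread-3 (_execute_task_async)). In Python, an exception inside a bare threading.Thread is reported via threading.excepthook but never reaches the caller that join()s it, which is consistent with the observed freeze. A minimal stdlib illustration of this behavior (not CrewAI internals):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def failing_task():
    raise RuntimeError("simulated LLM APIError")

# 1) Bare thread: join() returns normally; the caller never sees the error.
errors = []
threading.excepthook = lambda args: errors.append(args.exc_value)  # observe, don't propagate
t = threading.Thread(target=failing_task)
t.start()
t.join()
print(f"join() returned; error only visible via excepthook: {errors[0]}")

# 2) Future-based execution re-raises the worker's exception in the caller:
with ThreadPoolExecutor() as pool:
    future = pool.submit(failing_task)
    try:
        future.result()  # re-raises RuntimeError here
    except RuntimeError as e:
        print(f"caught in caller: {e}")
```

This is why event hooks fire (the thread does fail) while the flow keeps waiting: nothing re-raises the error on the coordinating side.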

4. Configuration and Code Fixes

A. Explicit Exception Handling in Async Tasks

  • Wrap LLM calls in try-except blocks and re-raise exceptions to ensure they are not silently swallowed:
async def my_async_task(...):
    try:
        # LLM call here
        result = await llm_call(...)
        return result
    except Exception as e:
        # Log and re-raise to halt the flow
        logger.error(f"LLM call failed: {e}")
        raise

B. Use Synchronous Decorators for Critical Tasks

  • Prefer @start over @listen for tasks where error propagation is critical, as @listen may not propagate exceptions correctly.

C. Adjust Flow and Task Settings

Setting         | Recommended Value | Purpose
----------------|-------------------|-------------------------------
raise_on_error  | True              | Ensures errors halt execution
max_retry_limit | 0                 | Prevents retries on failure
max_iter        | 1                 | Stops after first error

D. Implement Error Callbacks

  • Use a callback such as the crew-level task_callback hook to capture and handle errors at the task or crew level. Verify the exact hook names (e.g. set_error_handler below) against your installed CrewAI version, as these APIs differ between releases:
def my_error_handler(task, error):
    logger.error(f"Task {task.name} failed: {error}")
    # Optionally, halt or clean up the flow here

task.set_error_handler(my_error_handler)

5. Best Practices for Robust Error Handling in CrewAI

  • Always validate inputs and outputs (e.g., with Pydantic models) to catch errors early.
  • Log all exceptions with detailed context for debugging.
  • Implement retry logic for transient errors, but avoid infinite retries.
  • Test error handling logic in both sync and async modes before production deployment.
  • Monitor flows using CrewAI’s observability tools or external monitoring solutions.
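For the bounded-retry bullet above, here is a small stdlib sketch (my own helper, not a CrewAI API) that retries transient failures with exponential backoff and re-raises once the budget is spent, so errors still propagate instead of looping forever:

```python
import time
from functools import wraps

def retry(times=3, delay_s=0.5, backoff=2.0, exceptions=(Exception,)):
    """Retry transient failures a bounded number of times, then re-raise."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            wait = delay_s
            for attempt in range(times):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == times - 1:
                        raise  # budget exhausted: fail loudly, never retry forever
                    time.sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator

# Hypothetical usage around a flaky proxy call:
# @retry(times=3, exceptions=(ConnectionError,))
# def call_proxy(payload): ...
```

Restricting `exceptions` to known-transient types keeps permanent failures (like the PermissionDeniedError above) failing fast.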

6. Summary Table: Troubleshooting Checklist

Step                                   | Action/Checkpoint
---------------------------------------|----------------------------------------------
Reproduce error in async mode          | Confirm flow freezes on LLM exception
Switch to sync mode                    | Verify flow halts as expected
Check error events in logs             | Look for LLMCallFailedEvent, TaskFailedEvent
Add explicit try-except in async tasks | Ensure exceptions are re-raised
Use @start for critical tasks          | Avoid @listen for error-prone tasks
Set raise_on_error=True                | Enforce error propagation
Limit retries (max_retry_limit=0)      | Prevent repeated failures
Implement error callbacks              | Capture and handle errors centrally
Validate inputs/outputs                | Use Pydantic or similar
Test both sync and async flows         | Ensure robust error handling in all modes

7. Key Takeaways & Next Steps

Key Finding:
CrewAI’s async task error propagation is currently limited. For critical workflows, use synchronous execution or explicit error handling in async tasks. Monitor for updates in CrewAI’s async support and consider contributing to or following related GitHub issues for long-term solutions.


If you need code samples or configuration templates, let me know!