Hello friends,
Hope everyone doing great.
I’ve inherited the LLM
class in CrewAI to add custom error handling and logging (e.g., storing LLM errors and metrics in MongoDB). The custom class also supports retry logic and fallback models. Here’s a simplified version of what I’m doing:
class CustomLLM(LLM):
def call(self, messages, tools=None, callbacks=[], available_functions=None, retry_count=0):
try:
return super().call(...) # Normal LLM call
except Exception as e:
# Log error, fallback to another model or retry
if retry_count < 3:
# Retry logic with fallback model
return self.call(messages, ..., retry_count + 1)
else:
# Final failure after retries
raise e # Intend to stop Crew here
The issue I’m facing is: even after raising the exception (after retries are exhausted), the Crew continues execution — particularly in hierarchical setups with a manager agent and sub-agents.
My question:
How can I terminate the entire Crew execution when an unrecoverable LLM failure occurs, even if it happens deep in a hierarchy of agents?
Any guidance on how to hook into Crew or manager-agent logic to intercept such terminal failures and gracefully stop would be highly appreciated.
Thanks in advance!