Issues with token limits

Hi there, I’m running a crew with 4 agents and sometimes I get this error:

Error code: 429 - {'error': {'message': 'Request too large for gpt-4o-mini in organization org-cYJxGgQQaN0Q42pQUmO3ksQr on tokens per min (TPM): Limit 4000000, Requested 8814213. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}

Here's the full stack trace:

LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

ERROR:root:LiteLLM call failed: litellm.RateLimitError: RateLimitError: OpenAIException - Error code: 429 - {'error': {'message': 'Request too large for gpt-4o-mini in organization org-cYJxGgQQaN0Q42pQUmO3ksQr on tokens per min (TPM): Limit 4000000, Requested 8814213. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}

ERROR:agent_rose:Error in property feasibility analysis
Traceback (most recent call last):
File "…/litellm/llms/openai/openai.py", line 711, in completion
raise e
File "…/litellm/llms/openai/openai.py", line 638, in completion
self.make_sync_openai_chat_completion_request(
File "…/litellm/litellm_core_utils/logging_utils.py", line 145, in sync_wrapper
result = func(*args, **kwargs)
File "…/litellm/llms/openai/openai.py", line 457, in make_sync_openai_chat_completion_request
raise e
File "…/litellm/llms/openai/openai.py", line 439, in make_sync_openai_chat_completion_request
raw_response = openai_client.chat.completions.with_raw_response.create(
File "…/openai/_legacy_response.py", line 364, in wrapped
return cast(LegacyAPIResponse[R], func(*args, **kwargs))
File "…/openai/_utils/_utils.py", line 279, in wrapper
return func(*args, **kwargs)
File "…/openai/resources/chat/completions/completions.py", line 879, in create
return self._post(
File "…/openai/_base_client.py", line 1296, in post
return cast(ResponseT, self.request(…))
File "…/openai/_base_client.py", line 973, in request
return self._request(
File "…/openai/_base_client.py", line 1062, in _request
return self._retry_request(
File "…/openai/_base_client.py", line 1111, in _retry_request
return self._request(
File "…/openai/_base_client.py", line 1077, in _request
raise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Request too large for gpt-4o-mini… (TPM): Limit 4000000, Requested 8814213.'}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "…/litellm/main.py", line 1692, in completion
raise e
File "…/litellm/main.py", line 1665, in completion
response = openai_chat_completions.completion(
File "…/litellm/llms/openai/openai.py", line 721, in completion
raise OpenAIError(
litellm.llms.openai.common_utils.OpenAIError: Error code: 429 - {'error': {'message': 'Request too large for gpt-4o-mini…'}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "…/property_analyzer.py", line 164, in analyze_properties
analyzed_properties = crew.analyze_properties(properties, …)
File "…/crew.py", line 293, in analyze_properties
analyzed_property = self.analyze_property(property_data)
File "…/crew.py", line 199, in analyze_property
crew_output = self.crew().kickoff(inputs=inputs)
File "…/crew.py", line 576, in kickoff
result = self._run_sequential_process()
File "…/crew.py", line 683, in _run_sequential_process
return self._execute_tasks(self.tasks)
File "…/crew.py", line 781, in _execute_tasks
task_output = task.execute_sync(…)
File "…/task.py", line 302, in execute_sync
return self._execute_core(…)
File "…/task.py", line 366, in _execute_core
result = agent.execute_task(…)
File "…/agent.py", line 254, in execute_task
raise e
File "…/agent.py", line 243, in execute_task
result = self.agent_executor.invoke(…)
File "…/crew_agent_executor.py", line 112, in invoke
raise e
File "…/crew_agent_executor.py", line 102, in invoke
formatted_answer = self._invoke_loop()
File "…/crew_agent_executor.py", line 160, in _invoke_loop
raise e
File "…/crew_agent_executor.py", line 140, in _invoke_loop
answer = self._get_llm_response()
File "…/crew_agent_executor.py", line 210, in _get_llm_response
raise e
File "…/crew_agent_executor.py", line 201, in _get_llm_response
answer = self.llm.call(…)
File "…/llm.py", line 291, in call
response = litellm.completion(**params)
File "…/utils.py", line 1154, in wrapper
raise e
File "…/utils.py", line 1032, in wrapper
result = original_function(*args, **kwargs)
File "…/main.py", line 3068, in completion
raise exception_type(
File "…/exception_mapping_utils.py", line 2201, in exception_type
raise e
File "…/exception_mapping_utils.py", line 345, in exception_type
raise RateLimitError(
litellm.exceptions.RateLimitError: litellm.RateLimitError: RateLimitError: OpenAIException - Error code: 429 - {'error': {'message': 'Request too large for gpt-4o-mini…'}}

ValueError: Error in property feasibility analysis: litellm.RateLimitError: RateLimitError: OpenAIException - Error code: 429 - {'error': {'message': 'Request too large for gpt-4o-mini…'}}

I thought CrewAI handled these things by default; am I wrong? I've also seen similar errors when I enable memory.

I have exactly the same issue. All I can find is that CrewAI limits the request rate through the max_rpm parameter, but there is nothing about limiting tokens per minute. I can lower max_tokens, but that reduces output quality. A max_token_per_minute option would be good.
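For reference, a minimal sketch of the knobs that do exist today: max_rpm on the agent or crew (which counts requests per minute, not tokens) and max_tokens on the LLM. The role, goal, and task values below are made-up placeholders; only the two parameters are the point.

from crewai import Agent, Crew, Task, LLM

# max_tokens caps each completion's output (the quality trade-off noted above)
llm = LLM(model="gpt-4o-mini", max_tokens=2048)

analyst = Agent(
    role="Property analyst",  # placeholder values throughout
    goal="Assess the feasibility of a property",
    backstory="Analyst agent used for feasibility checks",
    llm=llm,
    max_rpm=10,  # throttles this agent's requests per minute
)

task = Task(
    description="Analyze the property data and report on feasibility",
    expected_output="A short feasibility summary",
    agent=analyst,
)

# max_rpm can also be set crew-wide; since it counts requests, a single
# oversized prompt can still blow through the provider's TPM limit,
# exactly as in the 429 above.
crew = Crew(agents=[analyst], tasks=[task], max_rpm=10)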

These types of errors are not handled by CrewAI: when one occurs, the task simply fails because the exception propagates. The only workaround I found is to modify the Task class so the exception is not raised and the task is rerun after a fixed delay. That way it no longer ends in an error when it hits a rate-limit or throttling error:

def _execute_core(
    self,
    agent: Optional[BaseAgent],
    context: Optional[str],
    tools: Optional[List[Any]],
    max_retries: Optional[int] = 5,
    delay: Optional[int] = 5,
) -> TaskOutput:
    ...
    # ... original method body (inside a try block) unchanged ...
    except Exception as e:
        self.end_time = datetime.datetime.now()
        crewai_event_bus.emit(self, TaskFailedEvent(error=str(e), task=self))
        if max_retries > 0:
            # Back off, then rerun this task instead of failing the crew
            time.sleep(delay)  # note: task.py may need `import time` added
            max_retries -= 1
            return self._execute_core(agent, context, tools, max_retries, delay)
        else:
            # Out of retries: emit a completion event with a failure output
            crewai_event_bus.emit(
                self,
                TaskCompletedEvent(
                    output=TaskOutput(description="Task failed", agent=self.agent.role),
                    task=self,
                ),
            )
            # raise e  # Re-raise the exception after emitting the event

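If you would rather not patch crewai internals, a cruder alternative is to retry kickoff() itself. This is a hypothetical sketch (kickoff_with_retry is my own helper, not a CrewAI API), and it assumes the litellm.exceptions.RateLimitError seen in the traceback above propagates out of kickoff():

import time
import litellm

def kickoff_with_retry(crew, inputs, max_retries=5, delay=60):
    # Hypothetical helper: rerun the whole crew when the provider rate-limits us.
    for attempt in range(max_retries + 1):
        try:
            return crew.kickoff(inputs=inputs)
        except litellm.exceptions.RateLimitError:
            if attempt == max_retries:
                raise  # out of retries; surface the 429 to the caller
            time.sleep(delay)  # TPM windows reset every minute

The trade-off versus the Task-level patch is that the whole crew reruns from the first task, not just the task that hit the limit.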

@tonykipkemboi Is this worth a feature request? I have seen a few of these reports and have hit the issue myself.
Request: add a max_token_per_minute option to the params.
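To make the request concrete, here is a rough sketch of what a client-side tokens-per-minute gate could look like. This is entirely hypothetical (no such option exists in CrewAI today), and the caller would still need its own token estimate, e.g. from tiktoken:

import time

class TPMLimiter:
    # Hypothetical sketch of a max_token_per_minute gate, not a CrewAI feature.
    def __init__(self, max_tokens_per_minute: int):
        self.limit = max_tokens_per_minute
        self.used = 0
        self.window_start = time.monotonic()

    def acquire(self, tokens: int) -> None:
        now = time.monotonic()
        if now - self.window_start >= 60:
            # A new one-minute window has started: reset the budget
            self.used = 0
            self.window_start = now
        elif self.used + tokens > self.limit:
            # Budget exhausted: sleep out the rest of the window, then reset
            time.sleep(60 - (now - self.window_start))
            self.used = 0
            self.window_start = time.monotonic()
        self.used += tokens

# Usage idea: call limiter.acquire(estimated_prompt_tokens) before each LLM
# request, with the estimate taken from something like tiktoken's o200k_base
# encoding (the one used by gpt-4o-mini).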