Issues with token limits

Hi there, I’m running a crew with 4 agents and sometimes I get this error:

Error code: 429 - {'error': {'message': 'Request too large for gpt-4o-mini in organization org-cYJxGgQQaN0Q42pQUmO3ksQr on tokens per min (TPM): Limit 4000000, Requested 8814213. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}

here’s the full stack trace:

LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

ERROR:root:LiteLLM call failed: litellm.RateLimitError: RateLimitError: OpenAIException - Error code: 429 - {'error': {'message': 'Request too large for gpt-4o-mini in organization org-cYJxGgQQaN0Q42pQUmO3ksQr on tokens per min (TPM): Limit 4000000, Requested 8814213. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}

ERROR:agent_rose:Error in property feasibility analysis
Traceback (most recent call last):
File "…/litellm/llms/openai/openai.py", line 711, in completion
raise e
File "…/litellm/llms/openai/openai.py", line 638, in completion
self.make_sync_openai_chat_completion_request(
File "…/litellm/litellm_core_utils/logging_utils.py", line 145, in sync_wrapper
result = func(*args, **kwargs)
File "…/litellm/llms/openai/openai.py", line 457, in make_sync_openai_chat_completion_request
raise e
File "…/litellm/llms/openai/openai.py", line 439, in make_sync_openai_chat_completion_request
raw_response = openai_client.chat.completions.with_raw_response.create(
File "…/openai/_legacy_response.py", line 364, in wrapped
return cast(LegacyAPIResponse[R], func(*args, **kwargs))
File "…/openai/_utils/_utils.py", line 279, in wrapper
return func(*args, **kwargs)
File "…/openai/resources/chat/completions/completions.py", line 879, in create
return self._post(
File "…/openai/_base_client.py", line 1296, in post
return cast(ResponseT, self.request(…))
File "…/openai/_base_client.py", line 973, in request
return self._request(
File "…/openai/_base_client.py", line 1062, in _request
return self._retry_request(
File "…/openai/_base_client.py", line 1111, in _retry_request
return self._request(
File "…/openai/_base_client.py", line 1077, in _request
raise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Request too large for gpt-4o-mini… (TPM): Limit 4000000, Requested 8814213.'}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "…/litellm/main.py", line 1692, in completion
raise e
File "…/litellm/main.py", line 1665, in completion
response = openai_chat_completions.completion(
File "…/litellm/llms/openai/openai.py", line 721, in completion
raise OpenAIError(
litellm.llms.openai.common_utils.OpenAIError: Error code: 429 - {'error': {'message': 'Request too large for gpt-4o-mini…'}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "…/property_analyzer.py", line 164, in analyze_properties
analyzed_properties = crew.analyze_properties(properties, …)
File "…/crew.py", line 293, in analyze_properties
analyzed_property = self.analyze_property(property_data)
File "…/crew.py", line 199, in analyze_property
crew_output = self.crew().kickoff(inputs=inputs)
File "…/crew.py", line 576, in kickoff
result = self._run_sequential_process()
File "…/crew.py", line 683, in _run_sequential_process
return self._execute_tasks(self.tasks)
File "…/crew.py", line 781, in _execute_tasks
task_output = task.execute_sync(…)
File "…/task.py", line 302, in execute_sync
return self._execute_core(…)
File "…/task.py", line 366, in _execute_core
result = agent.execute_task(…)
File "…/agent.py", line 254, in execute_task
raise e
File "…/agent.py", line 243, in execute_task
result = self.agent_executor.invoke(…)
File "…/crew_agent_executor.py", line 112, in invoke
raise e
File "…/crew_agent_executor.py", line 102, in invoke
formatted_answer = self._invoke_loop()
File "…/crew_agent_executor.py", line 160, in _invoke_loop
raise e
File "…/crew_agent_executor.py", line 140, in _invoke_loop
answer = self._get_llm_response()
File "…/crew_agent_executor.py", line 210, in _get_llm_response
raise e
File "…/crew_agent_executor.py", line 201, in _get_llm_response
answer = self.llm.call(…)
File "…/llm.py", line 291, in call
response = litellm.completion(**params)
File "…/utils.py", line 1154, in wrapper
raise e
File "…/utils.py", line 1032, in wrapper
result = original_function(*args, **kwargs)
File "…/main.py", line 3068, in completion
raise exception_type(
File "…/exception_mapping_utils.py", line 2201, in exception_type
raise e
File "…/exception_mapping_utils.py", line 345, in exception_type
raise RateLimitError(
litellm.exceptions.RateLimitError: litellm.RateLimitError: RateLimitError: OpenAIException - Error code: 429 - {'error': {'message': 'Request too large for gpt-4o-mini…'}}

ValueError: Error in property feasibility analysis: litellm.RateLimitError: RateLimitError: OpenAIException - Error code: 429 - {'error': {'message': 'Request too large for gpt-4o-mini…'}}

I thought CrewAI handled these limits by default; am I wrong? From the message it looks like a single request of 8,814,213 tokens against the 4,000,000 TPM cap, so retrying alone wouldn't fix it. I've also seen similar errors when I enable memory.
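For context, the crew is wired up roughly like this (a single agent and task shown for brevity; all names here are placeholders, not my real code):

```python
# Simplified shape of my setup (placeholder names, not the real code).
from crewai import Agent, Crew, Process, Task

analyst = Agent(
    role="Property analyst",
    goal="Assess the feasibility of each property",
    backstory="An experienced real-estate analyst.",
    llm="gpt-4o-mini",
)

feasibility = Task(
    description="Analyze the feasibility of {property_data}",
    expected_output="A short feasibility report",
    agent=analyst,
)

crew = Crew(
    agents=[analyst],            # the real crew has 4 agents
    tasks=[feasibility],
    process=Process.sequential,
    memory=True,                 # errors seem more frequent with this on
)

result = crew.kickoff(inputs={"property_data": "..."})
```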

I have exactly the same issue. All I can find is that CrewAI can throttle the request rate through the max_rpm parameter (on the agent or the crew), but there is nothing about token-per-minute limiting. I can lower max_tokens, but that reduces output quality. A max_tokens_per_minute option would be good.
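In the meantime, the only workaround I can think of is throttling tokens on the caller's side. Below is a rough sketch of that idea; it is not a CrewAI or LiteLLM feature, the budget and window values are my own assumptions for a 4M TPM org limit, and it only counts prompt tokens for plain string messages:

```python
# Sketch of a client-side tokens-per-minute throttle around litellm.completion.
# Not a CrewAI or LiteLLM feature -- just the shape of the logic; the budget
# and window values are assumptions you would tune to your org's limits.
import time

import litellm
import tiktoken

TPM_BUDGET = 3_500_000                      # keep headroom under the 4M TPM limit
WINDOW = 60.0                               # TPM window, in seconds

_enc = tiktoken.get_encoding("o200k_base")  # tokenizer used by gpt-4o-mini
_spent = 0                                  # tokens sent in the current window
_window_start = time.monotonic()

def throttled_completion(**params):
    """Call litellm.completion, sleeping first if the TPM budget is spent.

    Assumes each message's content is a plain string (no multimodal parts).
    """
    global _spent, _window_start
    prompt_tokens = sum(
        len(_enc.encode(m.get("content") or ""))
        for m in params.get("messages", [])
    )
    now = time.monotonic()
    if now - _window_start >= WINDOW:        # new minute: reset the budget
        _spent, _window_start = 0, now
    if _spent + prompt_tokens > TPM_BUDGET:  # budget spent: wait for the reset
        time.sleep(WINDOW - (now - _window_start))
        _spent, _window_start = 0, time.monotonic()
    _spent += prompt_tokens
    return litellm.completion(**params)
```

Note that a throttle like this only helps when many normal-sized requests add up to the limit. A single 8,814,213-token request like the one above exceeds the cap on its own, so the context itself has to shrink (trim tool output, summarize intermediate results, or turn memory off) before any rate limiting can help.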