TPM: Tokens per minute limiting

CrewAI has RPM (requests per minute) options for crews, but not TPM (tokens per minute), which is necessary for services such as Groq. Here’s an example error:

2024-12-10 16:50:11,862 - 135614723733312 - llm.py-llm:170 - ERROR: LiteLLM call failed: litellm.RateLimitError: RateLimitError: GroqException - {"error":{"message":"Rate limit reached for model llama-3.3-70b-versatile in organization org_xxxx on tokens per minute (TPM): Limit 6000, Used 7599, Requested 1777. Please try again in 33.764s. Visit https://console.groq.com/docs/rate-limits for more information.","type":"tokens","code":"rate_limit_exceeded"}}

Are there any plans to implement this, or are there any workarounds that people have used?
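One workaround, while there is no built-in option, is to track token usage yourself in a sliding 60-second window and sleep before a call would push you over the budget. This is a minimal sketch, assuming you can estimate the token count of each request before sending it; `TPMLimiter` is a hypothetical helper, not part of CrewAI:

```python
import time
from collections import deque

class TPMLimiter:
    """Sliding-window tokens-per-minute limiter (hypothetical helper).

    Call wait(n) before each LLM request with an estimate of the tokens
    it will consume; it blocks until the request fits under the budget.
    """

    def __init__(self, tokens_per_minute: int):
        self.limit = tokens_per_minute
        self.events = deque()  # (timestamp, tokens) pairs within the last 60s

    def _used(self, now: float) -> int:
        # Drop events that have aged out of the 60-second window.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()
        return sum(tokens for _, tokens in self.events)

    def wait(self, tokens: int) -> None:
        while True:
            now = time.monotonic()
            if self._used(now) + tokens <= self.limit:
                self.events.append((now, tokens))
                return
            # Sleep until the oldest event leaves the window, then re-check.
            time.sleep(max(0.0, 60 - (now - self.events[0][0])))
```

For the error above (limit 6000 TPM), you would create `TPMLimiter(6000)` and call `limiter.wait(estimated_tokens)` before each LLM call. Estimating tokens ahead of time is the hard part; overestimating slightly is safer than underestimating.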


That’d be awesome to have. Are there any plans to implement the limit, or at least a backoff?

Welcome to the community

Yes, TPM (tokens per minute) sounds like a good idea to have.
In the meantime, this might be useful: Agents - CrewAI

@tonykipkemboi Is there an approved way to make suggestions?

Thanks!

Just FYI: I’ve used all of these recommendations in my project from the start. I’ve only hit the TPM rate limit once so far, so they don’t guarantee it won’t happen, but I’m sure they reduce the risk.