Error 'insufficient_quota' using Llama3

I am trying to create a crew with Llama3, which is already correctly installed on my machine; I have even run tests with LangChain where everything worked as expected.

I created a crew that worked well with OpenAI. Now that I am changing it to use Llama, the following error occurs:

2024-11-06 22:49:46,582 - 131459711406208 - llm.py-llm:161 - ERROR: LiteLLM call failed: litellm.RateLimitError: RateLimitError: OpenAIException - Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

I would like help to understand the reason and know how to solve it. I am creating the LLM like this:

from crewai import LLM

llm = LLM(model="ollama/llama3.1", base_url="http://localhost:11434")
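As a side note on my setup: assuming a default local Ollama install (port 11434), I confirmed the server is reachable and the model is pulled before pointing CrewAI at it:

```shell
# List the models the local Ollama server is currently serving
# (assumes Ollama's default endpoint at http://localhost:11434)
curl http://localhost:11434/api/tags
```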

I also tried with Groq, but the same error occurs.

from langchain_groq import ChatGroq

llm = ChatGroq(
    api_key="my key",
    model="llama3-8b-8192"
)

I would greatly appreciate it if someone could help.

FabrĂ­cio Silva

@Fabricio_Silva You’re seeing this error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.

This happens because the code is using OpenAI’s API, but your OpenAI account doesn’t have any credit.

Why? Even though you’re using Llama via Ollama, the OpenAI API is still called somewhere in your CrewAI code, likely without you realizing it.

Where? A common place this happens is when memory is set to True in CrewAI. By default, CrewAI uses OpenAI’s embedding model (e.g., text-embedding-3-small) to manage memory.

How do you avoid using the OpenAI API for memory? You can customize the embedder; see the docs. If you want to use Ollama for embeddings, configure your crew like this:

from crewai import Crew, Agent, Task, Process

my_crew = Crew(
    agents=[...],
    tasks=[...],
    process=Process.sequential,
    memory=True,
    verbose=True,
    embedder={
        "provider": "ollama",
        "config": {
            "model": "mxbai-embed-large"
        }
    }
)
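One practical note, assuming a default local Ollama install: the embedding model named in the `embedder` config above must already be pulled locally, or the embedding calls will fail at runtime:

```shell
# Download the mxbai-embed-large embedding model so Ollama can serve it
ollama pull mxbai-embed-large
```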