Hello, I have downloaded the Llama-3.2-1B model from Hugging Face to a local folder on my computer. Is it possible to define an agent in CrewAI that uses this local model as its underlying model? I have tried defining a wrapper class that hides the instantiation of the model with the transformers library and assigning that class as the agent's model, but it doesn't work. Is this possible?
@dcd_cdc CrewAI's LLM class leverages LiteLLM in the background, which supports a wide range of LLM providers. Among these is Hugging Face, which lets you run models like Llama 3.2. See the LiteLLM tutorial on how to use Hugging Face.
Here’s a code example:
from crewai import Agent, LLM

my_llm = LLM(
    api_base="<your-api-base>",
    model="huggingface/meta-llama/Llama-3.2-3B",
)

my_agent = Agent(
    ...,
    llm=my_llm,
)
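One note: if you point the LLM at the hosted Hugging Face Inference API rather than your own endpoint, LiteLLM also needs a Hugging Face token. As far as I know it reads it from an environment variable; check the LiteLLM docs for your version:

```python
import os

# Only needed when calling the hosted Hugging Face Inference API, not a
# local endpoint. HUGGINGFACE_API_KEY is the variable name the LiteLLM docs
# use; verify it against your LiteLLM version.
os.environ["HUGGINGFACE_API_KEY"] = "<your-hf-token>"
```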
Hi
Thank you for your response. I am trying to test it, but I am receiving errors about the format of the response returned by my service. Basically, I have created a simple example of a local service that instantiates a local llama_3.1 model and answers a question. I invoke it with http://localhost:5000/generate, and it responds with a string generated by the model. I have made a small example using CrewAI as you indicated in your response:

from crewai import Agent, LLM

my_llm = LLM(
    api_base="http://localhost:5000/generate",
    model="huggingface/meta-llama/Llama-3.2-1B-Instruct",
)

amable = Agent(
    role="Agente Amable",
    goal="Saludar con amabilidad a quien le pregunte en idioma {idioma}",
    backstory="eres un agente amable siempre responde con saludos amables ",
    allow_delegation=False,
    verbose=True,
    llm=my_llm,
)
I can see that the service is called and the model returns an answer to the agent, but it seems that the agent expects a different response format, because it gives this error:

raise APIError(
litellm.exceptions.APIError: litellm.APIError: HuggingfaceException - response is not in expected format - [{'response': 'system\n\nYou are Agente Amable. eres un agente amable siempre responde con saludos amables \nYour personal goal is: Saludar c…}]

Do you know what response format the service calling the model should return?
Thanks very much for your help
best regards
diego
Unfortunately, smaller LLMs (e.g., Llama 3.2 1B) sometimes struggle to work with CrewAI. Try to switch the LLM to a more capable one (e.g., Llama 3.2 11B).
Hi,
In this example the LLM only answers with a string. We can adapt the model's response to the format that CrewAI accepts, but we don't know what format CrewAI expects to receive. Is there any documentation that specifies the format the agent expects as a response to a call to the LLM?
thanks in advance for your help
best regards
diego
@dcd_cdc Have you tried switching the LLM to a more capable one? Such errors can happen simply because of the LLM, trust me. I’m 99% certain it’s the LLM causing this error in your case. A 1B LLM will not work with CrewAI.
Hi @rokbenko, I have tried two slightly larger models, meta_llama_3_8B_instruct and llama3_70b_awq, but the result is the same. I always receive the same error:

ERROR:root:LiteLLM call failed: litellm.APIError: HuggingfaceException - response is not in expected format - [{'response': 'system\n\nYou are Kind Agent. You are a kind agent, always respond with kind greetings \nYour persona…,

I understand that I am doing something wrong. The /generate service instantiates the Llama model and responds to the request it receives, but the response it generates is always in the same format, a JSON like this: return jsonify({'response': response}), regardless of the model I use. The example code that implements the service is simple:
from flask import Flask, request, jsonify
from transformers import AutoModelForCausalLM, AutoTokenizer

app = Flask(__name__)

model_path = "/local/model…"
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

@app.route('/generate', methods=['POST'])
def generate():
    data = request.json
    input_text = data['inputs']
    input_ids = tokenizer.encode(input_text, return_tensors='pt')
    output = model.generate(input_ids, max_length=350)
    response = tokenizer.decode(output[0], skip_special_tokens=True)
    print(f"Response:----> {response}")
    return jsonify({'response': response})
Is it possible that I need to format the service's response in a different way?
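For example, I wonder whether LiteLLM's Hugging Face backend expects a TGI-style response, i.e. a JSON list with a generated_text field, instead of my custom {"response": ...} object. This is only a guess on my side; a minimal sketch of the service with that response shape (the real model call omitted) would be:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/generate', methods=['POST'])
def generate():
    data = request.json
    input_text = data['inputs']
    # Real generation omitted here; only the response shape matters for this test.
    generated = f"(model output for: {input_text})"
    # Guess: a TGI-style list with 'generated_text' instead of {'response': ...}
    return jsonify([{"generated_text": generated}])

if __name__ == '__main__':
    app.run(host='localhost', port=5000, debug=False)
```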
Thanks in advance for your help
I see, but now you picked an older Llama model family, right? You switched Llama 3.2 for Llama 3.
Try Llama 3.2 11B, for example. That’s a newer model family and a more capable one.
Hi, I cannot find the model Llama 3.2 11B. On Hugging Face, the only Llama 3.2 11B model is meta-llama/Llama-3.2-11B-Vision; the other Llama 3.2 models are Llama 3.2 1B and Llama 3.2 3B (Llama 3.2 - a meta-llama Collection). Do you know where I can find Llama 3.2 11B?
thanks
regards
diego
This is the one I had in mind.
Hi @rokbenko,
Unfortunately, Meta does not allow us to use the 11B-Vision model from Europe: "Meta-llama has disallowed access to this model in the EU. Downloads of this model are not accessible from the European Union (EU). Please see the Llama Acceptable Use Policy and License FAQ page for more information."
We cannot use that model for testing, so we have tested with the Qwen/Qwen2.5-14B model, but the result is the same:

File "C:\ProgramData\Anaconda3\envs\udemy\lib\site-packages\litellm\litellm_core_utils\exception_mapping_utils.py", line 1438, in exception_type
    raise APIError(
litellm.exceptions.APIError: litellm.APIError: HuggingfaceException - response is not in expected format - [{'response': '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are Friendly Agent…

It is possible that we always receive the same error because, in the end, we are telling the service that wraps the call to the model to return a JSON with the format {"response": "…answer from the model…"}, and we don't know whether this could be the problem.
I have tried skipping the call to the model inside the service and always responding to the agent with the fixed string "hello how are you", and the error changed: it now seems to indicate that the response is not JSON, which is expected, since I only returned a plain string.
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\udemy\lib\site-packages\litellm\llms\huggingface_restapi.py", line 699, in completion
    completion_response = response.json()
  File "C:\ProgramData\Anaconda3\envs\udemy\lib\site-packages\requests\models.py", line 978, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
That is, it seems that the response must be JSON, but I don't know what JSON format it expects or what fields it needs. I don't know if this is correct; can you tell me what the JSON format of the response should be?
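For reference, the fixed-string test I mentioned was roughly this (just the endpoint, with the model call skipped entirely); since the body is plain text and not JSON, I assume that is why response.json() fails:

```python
from flask import Flask

app = Flask(__name__)

@app.route('/generate', methods=['POST'])
def generate():
    # Test variant: ignore the request and the model and return a plain string.
    # This is not JSON, so LiteLLM fails earlier, at response.json().
    return "hello how are you"

if __name__ == '__main__':
    app.run(host='localhost', port=5000, debug=False)
```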
Thanks in advance for your help
best regards
diego
Yes, unfortunately…
Can you please share your full code?
Yes, of course. I use two Python scripts to test this use case.
The first script encapsulates the Hugging Face model as an HTTP POST service.
I have downloaded, for example, the model Qwen2.5-14B-Instruct from this URL https://huggingface.co/Qwen/Qwen2.5-14B-Instruct/blob/main/LICENSE to a local folder /model/qwen/Qwen2.5-14B-Instruct.
The first script is:
from transformers import AutoModelForCausalLM, AutoTokenizer
from flask import Flask, request, jsonify

app = Flask(__name__)

model_path = "/model/qwen/Qwen2.5-14B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

@app.route('/generate', methods=['POST'])
def generate():
    data = request.json
    input_text = data['inputs']
    input_ids = tokenizer.encode(input_text, return_tensors='pt')
    output = model.generate(input_ids, max_length=350)
    response = tokenizer.decode(output[0], skip_special_tokens=True)
    print(f"Response:----> {response}")
    return jsonify({'response': response})

if __name__ == '__main__':
    app.run(host='host1.domain.net', port=8601, debug=False)
After starting this script, the service waits for POST requests for the model to answer.
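A quick way to check the service on its own, before CrewAI is involved, would be something like this (not part of my scripts, just a sanity test; the 'inputs' key matches what the Flask handler reads):

```python
import requests

# Sanity check of the local service on its own, outside CrewAI.
resp = requests.post(
    "http://host1.domain.net:8601/generate",
    json={"inputs": "Hello, how are you?"},
    timeout=300,
)
print(resp.status_code)
print(resp.json())  # currently returns {'response': '<text generated by the model>'}
```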
The second script is a small script to test a CrewAI agent. It is very simple, but I only want to test how to integrate a local LLM so it can be used by a CrewAI agent:
from crewai import LLM, Agent, Task, Crew
import os
import warnings

warnings.filterwarnings('ignore')

# Disable telemetry
os.environ["OTEL_SDK_DISABLED"] = "true"
os.environ['CREWAI_DISABLE_TELEMETRY'] = 'true'

my_llm = LLM(
    api_base="http://host1.domain.net:8601/generate",
    model="huggingface/Qwen/Qwen2.5-14B"
)

amable = Agent(
    role="Agente Amable",
    goal="Saludar con amabilidad a quien le pregunte en idioma {idioma}",
    backstory="eres un agente amable siempre responde con saludos amables ",
    allow_delegation=False,
    verbose=True,
    llm=my_llm,
)

#####################################
# TASKS
saludar = Task(
    description=(
        "Contestar con un saludo amable"
        "Responde siempre con un saludo amable"
    ),
    expected_output="Un saludo amable como hola que tal, o como te va o "
                    "cualquier saludo amable que se te ocurra, ",
    agent=amable,
)

# Crew
crew = Crew(
    agents=[amable],
    tasks=[saludar],
    verbose=True
)

#############
# Run
result = crew.kickoff(inputs={"idioma": "español"})
print(f"RESULTADO ----> {result}")
I start the HTTP service with the first script first, and then run the second script.
I can see that the agent calls the model, because in the console of the first script I see the request and the answer that the model generates, but in the output console of the second script I see the following messages:
Agent: Agente Amable
Task: Contestar con un saludo amableResponde siempre con un saludo amable
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True`.
ERROR:root:LiteLLM call failed: litellm.APIError: HuggingfaceException - response is not in expected format - [{'response': 'You are Agente Amable. eres un agente amable siempre responde con saludos amables \nYour personal goal is: Saludar con amabilidad a quien le pregunte en idioma español\nTo give my best complete final answer to the task use the exact following format:\n\nThought: I now can give a great answer\nFinal Answer: Your final answer must be the great and the most complete as possible, it must be outcome described.\n\nI MUST use these formats, my job …
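Following the hint in that info line, I could also enable LiteLLM's verbose logging at the top of the second script to see the raw request and response that fail to parse; the flag name is taken from the message itself (newer LiteLLM versions may offer a different debug mechanism):

```python
import litellm

# Enable verbose logging, as suggested by the LiteLLM info message above,
# to inspect the raw request/response that is "not in expected format".
litellm.set_verbose = True
```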
thanks very much for your help
best regards
diego