Hello, I have downloaded the Llama-3.2-1B model from Hugging Face to a local folder on my computer. Is it possible to define an agent in CrewAI that uses this local model as its underlying model? I have tried defining a wrapper class that hides the instantiation of the model with the transformers library and assigning that class as the agent's model, but it doesn't work. Is this possible?
@dcd_cdc CrewAI's LLM class leverages LiteLLM in the background, which supports a wide range of LLM providers. Among these is Hugging Face, which lets you run models like Llama 3.2. See the LiteLLM tutorial on how to use Hugging Face.
Here’s a code example:
from crewai import Agent, LLM

my_llm = LLM(
    api_base="<your-api-base>",
    model="huggingface/meta-llama/Llama-3.2-3B",
)

my_agent = Agent(
    ...,
    llm=my_llm,
)
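One note: if you point the LLM at the hosted Hugging Face Inference API rather than your own endpoint, LiteLLM also needs a Hugging Face token. As far as I know it reads it from an environment variable; check the LiteLLM docs for your version:

```python
import os

# Only needed when calling the hosted Hugging Face Inference API, not a
# local endpoint. HUGGINGFACE_API_KEY is the variable name the LiteLLM docs
# use; verify it against your LiteLLM version.
os.environ["HUGGINGFACE_API_KEY"] = "<your-hf-token>"
```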
Hi
Thank you for your response. I am trying to test it, but I am receiving errors about the format of the response returned by my service. Basically, I have created a simple example of a local service that instantiates a local llama_3.1 model and answers a question. I invoke it with http://localhost:5000/generate, and it responds with a string generated by the model. I have made a small example using CrewAI as you indicated in your response:

from crewai import Agent, LLM

my_llm = LLM(
    api_base="http://localhost:5000/generate",
    model="huggingface/meta-llama/Llama-3.2-1B-Instruct",
)

amable = Agent(
    role="Agente Amable",
    goal="Saludar con amabilidad a quien le pregunte en idioma {idioma}",
    backstory="eres un agente amable siempre responde con saludos amables ",
    allow_delegation=False,
    verbose=True,
    llm=my_llm,
)
I can see that the service is called and the model returns an answer to the agent, but it seems that the agent expects a different response format, because it gives this error:

raise APIError(
litellm.exceptions.APIError: litellm.APIError: HuggingfaceException - response is not in expected format - [{'response': 'system\n\nYou are Agente Amable. eres un agente amable siempre responde con saludos amables \nYour personal goal is: Saludar c…}]

Do you know what response format the service calling the model should return?
Thanks very much for your help
best regards
diego
Unfortunately, smaller LLMs (e.g., Llama 3.2 1B) sometimes struggle to work with CrewAI. Try to switch the LLM to a more capable one (e.g., Llama 3.2 11B).
Hi,
In this example the LLM only answers with a string. We can adapt the model's response to the format that CrewAI accepts, but we don't know what format CrewAI expects to receive. Is there any documentation that specifies the format the agent expects as a response to a call to the LLM?
thanks in advance for your help
best regards
diego
@dcd_cdc Have you tried switching the LLM to a more capable one? Such errors can happen simply because of the LLM, trust me. I’m 99% certain it’s the LLM causing this error in your case. A 1B LLM will not work with CrewAI.
Hi @rokbenko, I have tried two slightly larger models, meta_llama_3_8B_instruct and llama3_70b_awq, but the result is the same. I always receive the same error:

ERROR:root:LiteLLM call failed: litellm.APIError: HuggingfaceException - response is not in expected format - [{'response': 'system\n\nYou are Kind Agent. You are a kind agent, always respond with kind greetings \nYour persona…,

I understand that I am doing something wrong. The /generate service instantiates the Llama model and responds to the request it receives, but the response it generates is always in the same format, a JSON like this: return jsonify({'response': response}), regardless of the model I use. The example code that implements the service is simple:
from flask import Flask, request, jsonify
from transformers import AutoModelForCausalLM, AutoTokenizer

app = Flask(__name__)

model_path = "/local/model…"
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

@app.route('/generate', methods=['POST'])
def generate():
    data = request.json
    input_text = data['inputs']
    input_ids = tokenizer.encode(input_text, return_tensors='pt')
    output = model.generate(input_ids, max_length=350)
    response = tokenizer.decode(output[0], skip_special_tokens=True)
    print(f"Response:----> {response}")
    return jsonify({'response': response})
Is it possible that I need to format the service's response in a different way?
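For example, I wonder whether LiteLLM's Hugging Face backend expects a TGI-style response, i.e. a JSON list with a generated_text field, instead of my custom {"response": ...} object. This is only a guess on my side; a minimal sketch of the service with that response shape (the real model call omitted) would be:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/generate', methods=['POST'])
def generate():
    data = request.json
    input_text = data['inputs']
    # Real generation omitted here; only the response shape matters for this test.
    generated = f"(model output for: {input_text})"
    # Guess: a TGI-style list with 'generated_text' instead of {'response': ...}
    return jsonify([{"generated_text": generated}])

if __name__ == '__main__':
    app.run(host='localhost', port=5000, debug=False)
```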
Thanks in advance for your help
I see, but now you picked an older Llama model family, right? You switched Llama 3.2 for Llama 3.
Try Llama 3.2 11B, for example. That’s a newer model family and a more capable one.
Hi, I cannot find the model Llama 3.2 11B. On Hugging Face, the only Llama 3.2 11B model is meta-llama/Llama-3.2-11B-Vision; the other Llama 3.2 models are Llama 3.2 1B and Llama 3.2 3B (Llama 3.2 - a meta-llama Collection). Do you know where I can find Llama 3.2 11B?
thanks
regards
diego
This is the one I had in mind.
Hi @rokbenko,
Unfortunately, Meta does not allow us to use the 11B-Vision model from Europe: "Meta-llama has disallowed access to this model in the EU. Downloads of this model are not accessible from the European Union (EU). Please see the Llama Acceptable Use Policy and License FAQ page for more information."
We cannot use that model for testing, so we have tested with the Qwen/Qwen2.5-14B model, but the result is the same:

File "C:\ProgramData\Anaconda3\envs\udemy\lib\site-packages\litellm\litellm_core_utils\exception_mapping_utils.py", line 1438, in exception_type
    raise APIError(
litellm.exceptions.APIError: litellm.APIError: HuggingfaceException - response is not in expected format - [{'response': '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are Friendly Agent…

It is possible that we always receive the same error because, in the end, we are telling the service that wraps the call to the model to return a JSON with the format {"response": "…answer from the model…"}, and we don't know whether this could be the problem.
I have tried skipping the call to the model inside the service and always responding to the agent with the fixed string "hello how are you", and the error changed: it now seems to indicate that the response is not JSON, which is expected, since I only returned a plain string.
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\udemy\lib\site-packages\litellm\llms\huggingface_restapi.py", line 699, in completion
    completion_response = response.json()
  File "C:\ProgramData\Anaconda3\envs\udemy\lib\site-packages\requests\models.py", line 978, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
That is, it seems that the response must be JSON, but I don't know what JSON format it expects or what fields it needs. I don't know if this is correct; can you tell me what the JSON format of the response should be?
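For reference, the fixed-string test I mentioned was roughly this (just the endpoint, with the model call skipped entirely); since the body is plain text and not JSON, I assume that is why response.json() fails:

```python
from flask import Flask

app = Flask(__name__)

@app.route('/generate', methods=['POST'])
def generate():
    # Test variant: ignore the request and the model and return a plain string.
    # This is not JSON, so LiteLLM fails earlier, at response.json().
    return "hello how are you"

if __name__ == '__main__':
    app.run(host='localhost', port=5000, debug=False)
```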
Thanks in advance for your help
best regards
diego
Yes, unfortunately…
Can you please share your full code?
Yes, of course. I use two Python scripts to test this use case.
The first script encapsulates the Hugging Face model as an HTTP POST service.
I have downloaded, for example, the model Qwen2.5-14B-Instruct from this URL https://huggingface.co/Qwen/Qwen2.5-14B-Instruct/blob/main/LICENSE to a local folder /model/qwen/Qwen2.5-14B-Instruct.
The first script is:
from transformers import AutoModelForCausalLM, AutoTokenizer
from flask import Flask, request, jsonify

app = Flask(__name__)

model_path = "/model/qwen/Qwen2.5-14B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

@app.route('/generate', methods=['POST'])
def generate():
    data = request.json
    input_text = data['inputs']
    input_ids = tokenizer.encode(input_text, return_tensors='pt')
    output = model.generate(input_ids, max_length=350)
    response = tokenizer.decode(output[0], skip_special_tokens=True)
    print(f"Response:----> {response}")
    return jsonify({'response': response})

if __name__ == '__main__':
    app.run(host='host1.domain.net', port=8601, debug=False)
After starting this script, the service waits for POST requests for the model to answer.
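A quick way to check the service on its own, before CrewAI is involved, would be something like this (not part of my scripts, just a sanity test; the 'inputs' key matches what the Flask handler reads):

```python
import requests

# Sanity check of the local service on its own, outside CrewAI.
resp = requests.post(
    "http://host1.domain.net:8601/generate",
    json={"inputs": "Hello, how are you?"},
    timeout=300,
)
print(resp.status_code)
print(resp.json())  # currently returns {'response': '<text generated by the model>'}
```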
The second script is a small script to test a CrewAI agent. It is very simple, but I only want to test how to integrate a local LLM so it can be used by a CrewAI agent:
from crewai import LLM, Agent, Task, Crew
import os
import warnings

warnings.filterwarnings('ignore')

# Disable telemetry
os.environ["OTEL_SDK_DISABLED"] = "true"
os.environ['CREWAI_DISABLE_TELEMETRY'] = 'true'

my_llm = LLM(
    api_base="http://host1.domain.net:8601/generate",
    model="huggingface/Qwen/Qwen2.5-14B"
)

amable = Agent(
    role="Agente Amable",
    goal="Saludar con amabilidad a quien le pregunte en idioma {idioma}",
    backstory="eres un agente amable siempre responde con saludos amables ",
    allow_delegation=False,
    verbose=True,
    llm=my_llm,
)

#####################################
# TASKS
saludar = Task(
    description=(
        "Contestar con un saludo amable"
        "Responde siempre con un saludo amable"
    ),
    expected_output="Un saludo amable como hola que tal, o como te va o "
                    "cualquier saludo amable que se te ocurra, ",
    agent=amable,
)

# Crew
crew = Crew(
    agents=[amable],
    tasks=[saludar],
    verbose=True
)

#############
# Run
result = crew.kickoff(inputs={"idioma": "español"})
print(f"RESULTADO ----> {result}")
I start the HTTP service with the first script first, and then run the second script.
I can see that the agent calls the model, because in the console of the first script I see the request and the answer that the model generates, but in the output console of the second script I see the following messages:
Agent: Agente Amable
Task: Contestar con un saludo amableResponde siempre con un saludo amable
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True`.
ERROR:root:LiteLLM call failed: litellm.APIError: HuggingfaceException - response is not in expected format - [{'response': 'You are Agente Amable. eres un agente amable siempre responde con saludos amables \nYour personal goal is: Saludar con amabilidad a quien le pregunte en idioma español\nTo give my best complete final answer to the task use the exact following format:\n\nThought: I now can give a great answer\nFinal Answer: Your final answer must be the great and the most complete as possible, it must be outcome described.\n\nI MUST use these formats, my job …
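Following the hint in that info line, I could also enable LiteLLM's verbose logging at the top of the second script to see the raw request and response that fail to parse; the flag name is taken from the message itself (newer LiteLLM versions may offer a different debug mechanism):

```python
import litellm

# Enable verbose logging, as suggested by the LiteLLM info message above,
# to inspect the raw request/response that is "not in expected format".
litellm.set_verbose = True
```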
thanks very much for your help
best regards
diego