CrewAI and OpenAI-compatible vLLM-hosted model

I have the following code:
from crewai import Agent, Task, Crew, LLM

raw_llm = LLM(
    base_url="https://<url>/v1/chat/completions",
    model="hosted_vllm/mistralai/Devstral-Small-2505",
    api_key="<api_key>"
)

hello_agent = Agent(
    role="Greeter",
    goal="Say 'Hello World' in a creative way.",
    backstory="You're a friendly assistant who loves to greet people with flair.",
    llm=raw_llm
)

hello_task = Task(
    description="Come up with a creative greeting using the phrase 'Hello World'.",
    agent=hello_agent,
    expected_output="A short, creative greeting that includes the phrase 'Hello World'."
)

crew = Crew(
    agents=[hello_agent],
    tasks=[hello_task]
)

result = crew.kickoff()
print(result)

The following error occurs:
litellm.exceptions.NotFoundError: litellm.NotFoundError: NotFoundError: Hosted_vllmException - Error code: 404 - {'detail': 'Not Found'}

If I replace base_url=… with endpoint=… in the LLM definition, I get the following error:
litellm.exceptions.AuthenticationError: litellm.AuthenticationError: AuthenticationError: Hosted_vllmException - Incorrect API key provided: sdfadswe***********RSXs. You can find your API key at https://platform.openai.com/account/api-keys.

Please note that my model is hosted via vLLM and is OpenAI-compatible.

Is there a problem with the way base_url is specified, or with the api_key? Both should be correct; at least other agentic frameworks work fine with them…

Thank you for your help!
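For reference, this is the kind of direct call (outside of any framework) that I would expect to succeed against the endpoint, which is why I don't think the URL or the key themselves are the problem (same placeholders as above, prompt just for illustration):

import requests

# Direct call to the OpenAI-compatible chat completions endpoint served by vLLM.
response = requests.post(
    "https://<url>/v1/chat/completions",
    headers={"Authorization": "Bearer <api_key>"},
    json={
        "model": "mistralai/Devstral-Small-2505",
        "messages": [{"role": "user", "content": "Say 'Hello World'."}],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])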

According to the documentation, the parameter is called api_base, not base_url:

raw_llm = LLM(
    model="hosted_vllm/mistralai/Devstral-Small-2505",
    api_base="https://<url>/v1/chat/completions",
    api_key="<api_key>"
)

Thank you for your answer!

I tried your suggestion, but I got the NotFoundError again.

For completeness, here is the whole error message:

Traceback (most recent call last):
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\litellm\llms\openai\openai.py", line 725, in completion
    raise e
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\litellm\llms\openai\openai.py", line 653, in completion
    ) = self.make_sync_openai_chat_completion_request(
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        openai_client=openai_client,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
        logging_obj=logging_obj,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\litellm\litellm_core_utils\logging_utils.py", line 149, in sync_wrapper
    result = func(*args, **kwargs)
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\litellm\llms\openai\openai.py", line 471, in make_sync_openai_chat_completion_request
    raise e
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\litellm\llms\openai\openai.py", line 453, in make_sync_openai_chat_completion_request
    raw_response = openai_client.chat.completions.with_raw_response.create(
        **data, timeout=timeout
    )
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\openai\_legacy_response.py", line 364, in wrapped
    return cast(LegacyAPIResponse[R], func(*args, **kwargs))
                                      ~~~~^^^^^^^^^^^^^^^^^
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\openai\_utils\_utils.py", line 287, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\openai\resources\chat\completions\completions.py", line 925, in create
    return self._post(
           ~~~~~~~~~~^
        "/chat/completions",
        ^^^^^^^^^^^^^^^^^^^^
    ...<43 lines>...
        stream_cls=Stream[ChatCompletionChunk],
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\openai\_base_client.py", line 1239, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\openai\_base_client.py", line 1034, in request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'detail': 'Not Found'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\litellm\main.py", line 1969, in completion
    raise e
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\litellm\main.py", line 1942, in completion
    response = openai_chat_completions.completion(
        model=model,
    ...<15 lines>...
        custom_llm_provider=custom_llm_provider,
    )
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\litellm\llms\openai\openai.py", line 736, in completion
    raise OpenAIError(
    ...<4 lines>...
    )
litellm.llms.openai.common_utils.OpenAIError: Error code: 404 - {'detail': 'Not Found'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\Markus.Jehle\test\agents\crewai\Travel_devstral\_crewai_test3.py", line 34, in <module>
    result = crew.kickoff()
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\crew.py", line 669, in kickoff
    result = self._run_sequential_process()
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\crew.py", line 780, in _run_sequential_process
    return self._execute_tasks(self.tasks)
           ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\crew.py", line 883, in _execute_tasks
    task_output = task.execute_sync(
        agent=agent_to_use,
        context=context,
        tools=cast(List[BaseTool], tools_for_task),
    )
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\task.py", line 356, in execute_sync
    return self._execute_core(agent, context, tools)
           ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\task.py", line 504, in _execute_core
    raise e  # Re-raise the exception after emitting the event
    ^^^^^^^
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\task.py", line 420, in _execute_core
    result = agent.execute_task(
        task=self,
        context=context,
        tools=tools,
    )
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\agent.py", line 462, in execute_task
    raise e
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\agent.py", line 438, in execute_task
    result = self._execute_without_timeout(task_prompt, task)
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\agent.py", line 534, in _execute_without_timeout
    return self.agent_executor.invoke(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~^
        {
        ^
    ...<4 lines>...
        }
        ^
    )["output"]
    ^
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\agents\crew_agent_executor.py", line 114, in invoke
    formatted_answer = self._invoke_loop()
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\agents\crew_agent_executor.py", line 208, in _invoke_loop
    raise e
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\agents\crew_agent_executor.py", line 154, in _invoke_loop
    answer = get_llm_response(
        llm=self.llm,
    ...<3 lines>...
        from_task=self.task
    )
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\utilities\agent_utils.py", line 160, in get_llm_response
    raise e
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\utilities\agent_utils.py", line 153, in get_llm_response
    answer = llm.call(
        messages,
    ...<2 lines>...
        from_agent=from_agent,
    )
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\llm.py", line 971, in call
    return self._handle_non_streaming_response(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        params, callbacks, available_functions, from_task, from_agent
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\crewai\llm.py", line 781, in _handle_non_streaming_response
    response = litellm.completion(**params)
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\litellm\utils.py", line 1306, in wrapper
    raise e
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\litellm\utils.py", line 1181, in wrapper
    result = original_function(*args, **kwargs)
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\litellm\main.py", line 3430, in completion
    raise exception_type(
          ~~~~~~~~~~~~~~^
        model=model,
        ^^^^^^^^^^^^
    ...<3 lines>...
        extra_kwargs=kwargs,
        ^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\litellm\litellm_core_utils\exception_mapping_utils.py", line 2293, in exception_type
    raise e
  File "C:\Users\Markus.Jehle\test\.venv\Lib\site-packages\litellm\litellm_core_utils\exception_mapping_utils.py", line 465, in exception_type
    raise NotFoundError(
    ...<5 lines>...
    )
litellm.exceptions.NotFoundError: litellm.NotFoundError: NotFoundError: OpenAIException - Error code: 404 - {'detail': 'Not Found'}

Might the reason be that the hosted_vllm prefix is not correct? I also tried the openai prefix, with the same error message, and when I leave the prefix out entirely, I get:

litellm.exceptions.BadRequestError: litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=mistralai/Devstral-Small-2505
 Pass model as E.g. For 'Huggingface' inference endpoints pass in completion(model='huggingface/starcoder',..) Learn more: https://docs.litellm.ai/docs/providers 

Really strange… since it works with other frameworks…
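If it helps, this is how I understand the prefix: it is only LiteLLM's provider routing (CrewAI calls LiteLLM under the hood), and the part after it is the model name the server actually exposes. A quick check with LiteLLM's helper shows the split (I'm not sure this helper counts as public API, so take it as a sketch):

from litellm import get_llm_provider

# Splits "provider/model": the prefix selects the LiteLLM provider, the rest is
# passed through as the model name. Without a recognised prefix, LiteLLM raises
# the "LLM Provider NOT provided" error shown above.
model, provider, _, _ = get_llm_provider(model="hosted_vllm/mistralai/Devstral-Small-2505")
print(provider)  # expected: hosted_vllm
print(model)     # expected: mistralai/Devstral-Small-2505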

I’m not a vLLM user myself, but shouldn’t the address be specified as something like:

api_base="http://localhost:8000"

In other words, could the problem be that you’re adding /v1/chat/completions to the end of the address?
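The reason I ask: as far as I know, LiteLLM hands the base URL to the openai client, and that client appends the route itself, so an api_base that already ends in /v1/chat/completions produces a request to .../v1/chat/completions/chat/completions, which would explain the 404. A rough sketch (the localhost URL is just a placeholder; the point is that api_base should be the server's API root, without /chat/completions):

from openai import OpenAI

# The openai client (used by LiteLLM under the hood) appends the route itself:
# a request here goes to {base_url}/chat/completions. If base_url already ended
# in /v1/chat/completions, the path would be doubled and the server returns 404.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="<api_key>")
completion = client.chat.completions.create(
    model="mistralai/Devstral-Small-2505",
    messages=[{"role": "user", "content": "Say 'Hello World'."}],
)
print(completion.choices[0].message.content)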

Thank you, that helped a lot.

Now I’ve specified the LLM as:

raw_llm = LLM(
    model="hosted_vllm/mistralai/Devstral-Small-2505",
    api_base="http://<ip>:<port>/vllm-devstral/v1",
    api_key="<api_key>"
)

And now it works :slight_smile:

Please note that the /v1 has to be added, otherwise it won’t work…
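In case it helps anyone else: a quick way to double-check that the api_base (route prefix plus /v1) is right, at least for a standard vLLM OpenAI-compatible server, is to list the models against the same base URL:

from openai import OpenAI

# If this prints the model id, the base URL (including the /v1) is correct.
client = OpenAI(base_url="http://<ip>:<port>/vllm-devstral/v1", api_key="<api_key>")
for model in client.models.list():
    print(model.id)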