CrewAI + multimodal

Hi Team,

I need to use a multimodal LLM in CrewAI. With multimodal models, the image usually needs to be converted to a base64 string, and that encoded value is sent to the LLM as “image_url” in the message request body. I would like to know whether CrewAI has an option to send that encoded image value via CrewAI’s LLM() class.
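For readers following along, the encoding step described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a CrewAI API: the helper names `image_to_data_url` and `build_messages` are hypothetical, and the message layout assumes the OpenAI-style "image_url" content part used in the request body in this post.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(image_path: str) -> str:
    """Read an image file and return it as a base64 data URL (hypothetical helper)."""
    # Guess the MIME type from the file extension, e.g. "image/jpeg" for .jpg.
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {image_path}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_messages(system_prompt: str, user_text: str, image_path: str) -> list:
    """Build chat messages with one text part and one image part (hypothetical helper)."""
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_text},
                {"type": "image_url", "image_url": {"url": image_to_data_url(image_path)}},
            ],
        },
    ]
```

The list returned by `build_messages` has the same shape as the "messages" field of a chat request; whether a given provider accepts data URLs in "image_url" depends on that provider.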

Here is my request body:

import requests

# system_prompt, user_text_input, imgBase64EncValue, credentials and headers
# are defined elsewhere in my script.
request_body_sample = {
    "messages": [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_text_input},
                # The image is passed as a base64 data URL.
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{imgBase64EncValue}"}},
            ],
        },
    ],
    "project_id": credentials.get("project_id"),
    "model_id": "watsonx/meta-llama/llama-3-2-90b-vision-instruct",
    "decoding_method": "sample",
    "random_seed": 568743,
    "temperature": 0,
    "top_k": 50,
    "top_p": 1,
    "repetition_penalty": 1,
    "max_tokens": 8000,
}

response = requests.post(
    credentials.get("url"),
    headers=headers,
    json=request_body_sample,
)
if response.status_code != 200:
    raise Exception("Non-200 response: " + str(response.text))
data = response.json()
print(data['choices'][0]['message']['content'])

Alternatively, please help me find a better way to use multimodal models in CrewAI.

When I asked the documentation chatbot about “multimodal support”, it returned a message like:

crewai doesn’t support multimodal?

Thank you.

Hey @Paarttipaabhalaji, did you find a solution for this?

No, @uma-08, please help me with this.

Sure, I would love to discuss workflows for this; I’m also stuck on it. What’s the best way to connect with you, @Paarttipaabhalaji?

Kindly connect with me on LinkedIn.

We do sort of support multimodal input through one of our tools, the Vision Tool, but it is only a tool, so it may or may not help.

@matt, I need to use a multimodal model from the watsonx provider.

I also faced the same issue.

I want to pass the image path directly to my local multimodal model, without using an additional tool.

@matt, any guidance or update on this?