Using Ollama (llama3.2-vision) to extract text from images

Hi team,

I am trying to use llama3.2-vision on Ollama to spin up a crew that extracts text from images. I use the following to do the job:

import ollama

response = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': "What's in the image?",
        'images': ['test/test.jpg']
    }]
)

print(response['message']['content'])

However, I am struggling to configure the agent to process images from a directory. I don't think using Tools is the right way to go. Is there any way to configure this?
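For what it's worth, if you route this through a plain Python helper rather than a crew Tool, a minimal sketch could look like the following. (The directory name, file extensions, and helper names here are my own assumptions, and the `ollama.chat` call requires a running Ollama server with llama3.2-vision pulled.)

```python
from pathlib import Path


def collect_images(directory: str) -> list[Path]:
    """Gather image files from a directory (the extension set is an assumption)."""
    exts = {".jpg", ".jpeg", ".png"}
    return sorted(p for p in Path(directory).iterdir() if p.suffix.lower() in exts)


def extract_text(image_path: Path) -> str:
    """Send one image to llama3.2-vision and return the model's text response."""
    import ollama  # requires the ollama package and a running Ollama server

    response = ollama.chat(
        model="llama3.2-vision",
        messages=[{
            "role": "user",
            "content": "Extract all text from this image.",
            "images": [str(image_path)],
        }],
    )
    return response["message"]["content"]


if __name__ == "__main__":
    for path in collect_images("test"):  # hypothetical directory name
        print(path, "->", extract_text(path))
```

An agent could then call something like `extract_text` per file instead of receiving the whole directory at once.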

Thanks!

I used this formatting to do some OCR with Llama-3.2-11B-Vision


      prompt = "What's in the image?"

      messages = [
          {
              "role": "user",
              "content": [
                  {"type": "text", "text": prompt},
                  {"type": "image", "image": sample["image"]},  # image path
              ],
          },
          {
              "role": "assistant",
              "content": [{"type": "text", "text": sample["text"]}],
          },
      ]

See if something like that works
