Using Ollama (llama3.2-vision) to extract text from images

Hi team,

I am trying to use llama3.2-vision on Ollama to spin up a crew that extracts text from images. I use the following to do the job:

import ollama

response = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': "What's in the image?",
        'images': ['test/test.jpg']
    }]
)

print(response['message']['content'])

However, I am struggling to configure the agent to process images from a directory. I don't think using Tools is the right way to go. Is there any way to configure this?
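For what it's worth, if you route this through a plain Python helper rather than a crew Tool, a minimal sketch could look like the following. (The directory name, file extensions, and helper names here are my own assumptions, and the `ollama.chat` call requires a running Ollama server with llama3.2-vision pulled.)

```python
from pathlib import Path


def collect_images(directory: str) -> list[Path]:
    """Gather image files from a directory (the extension set is an assumption)."""
    exts = {".jpg", ".jpeg", ".png"}
    return sorted(p for p in Path(directory).iterdir() if p.suffix.lower() in exts)


def extract_text(image_path: Path) -> str:
    """Send one image to llama3.2-vision and return the model's text response."""
    import ollama  # requires the ollama package and a running Ollama server

    response = ollama.chat(
        model="llama3.2-vision",
        messages=[{
            "role": "user",
            "content": "Extract all text from this image.",
            "images": [str(image_path)],
        }],
    )
    return response["message"]["content"]


if __name__ == "__main__":
    for path in collect_images("test"):  # hypothetical directory name
        print(path, "->", extract_text(path))
```

An agent could then call something like `extract_text` per file instead of receiving the whole directory at once.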

Thanks!

I used this formatting to do some OCR with Llama-3.2-11B-Vision


      prompt = "What's in the image?"

      messages = [
          {
              "role": "user",
              "content": [
                  {"type": "text", "text": prompt},
                  {"type": "image", "image": sample["image"]},  # image path
              ],
          },
          {
              "role": "assistant",
              "content": [{"type": "text", "text": sample["text"]}],
          },
      ]

See if something like that works
