Where do you feed the url/path argument to the Vision tool?

GundulaGause · October 17, 2024, 7:08am

Hi all,

I am new to CrewAI and wanted to use its Vision tool. However, I am a bit puzzled as to where to pass the path of the image that I want it to read. Could anybody help me, please?

Thanks,
Gundi

rokbenko · October 17, 2024, 8:20am

@GundulaGause You set the image_path parameter to the Vision tool as follows:

from crewai import Agent
from crewai_tools import VisionTool

vision_tool = VisionTool(image_path="path/to/your/image.png")

my_agent = Agent(
    ...,
    tools=[vision_tool]
)

See the docs.

GundulaGause · October 17, 2024, 10:04am

Thanks for the response.

I did exactly that but get the following error when kicking off the crew:
“I encountered an error while trying to use the tool. This was the error: [Errno 2] No such file or directory: ‘URL_of_the_image’.
Tool Vision Tool accepts these inputs: Vision Tool(image_path_url: ‘string’) - This tool uses OpenAI’s Vision API to describe the contents of an image.”

However, the image path is set correctly and when I run vision_tool.run(image_path_url=image_path), it does extract the correct information.

rokbenko · October 17, 2024, 10:35am

@GundulaGause You’re right. The docs are not updated. It looks like the parameter was renamed.

The following code, as you’ve already figured out, should work.

from crewai import Agent
from crewai_tools import VisionTool

vision_tool = VisionTool(image_path_url="path/to/your/image.png")

my_agent = Agent(
    ...,
    tools=[vision_tool]
)

@joaomdmoura @matt @tonykipkemboi I created a pull request with fixed docs.

GundulaGause · October 17, 2024, 10:44am

Unfortunately it does not. I tried your suggestion but keep getting the same error:

“I encountered an error while trying to use the tool. This was the error: [Errno 2] No such file or directory: ‘URL_of_the_image’.
Tool Vision Tool accepts these inputs: Vision Tool(image_path_url: ‘string’) - This tool uses OpenAI’s Vision API to describe the contents of an image.”

Again, just running “extracted_text = vision_tool.run(image_path_url=image_path)” works fine, I just don’t know how to integrate that into my Agent or Task framework.

rokbenko · October 17, 2024, 10:57am

@GundulaGause Got it! Still, the docs needed to be updated. If you take a look at the source code, the Vision tool expects the image_path_url parameter, not the image_path parameter.

Can you confirm that the file is located where the code searches for it?

GundulaGause · October 17, 2024, 11:30am

I agree, what is needed is the ‘image_path_url’ not the ‘image_path’.

But I can confirm that the defined path links to the correct file. That is why it extracts the correct information when running: vision_tool.run(image_path_url=image_path).

rokbenko · October 17, 2024, 11:44am

@GundulaGause That makes me think that there’s maybe something wrong with the source code?

@joaomdmoura @matt @tonykipkemboi Can you please check this out?

matt · October 17, 2024, 11:57am

We need to update the docs to reflect how the tool is meant to be used.

The Vision tool should be used where the Agent is passed an image from another agent to be processed - in a more autonomous way.

We can add an attribute for direct paths, for sure.

Sorry about the confusion.
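
To illustrate the distinction described above, here is a plain-Python sketch with no CrewAI involved. `FakeVisionTool` and `fake_agent_call` are hypothetical stand-ins, not the library's classes. It assumes the path reaches the tool as a call-time argument chosen by the agent rather than a value fixed at construction. If the agent never learns the real path, it has nothing valid to pass, which would be consistent with the 'URL_of_the_image' placeholder in the error quoted earlier.

```python
import os
import tempfile
from typing import Optional


class FakeVisionTool:
    """Hypothetical stand-in: the path arrives as a call-time argument."""

    def run(self, image_path_url: str) -> str:
        if not os.path.exists(image_path_url):
            raise FileNotFoundError(2, "No such file or directory", image_path_url)
        return f"described {os.path.basename(image_path_url)}"


def fake_agent_call(tool: FakeVisionTool, path_from_prompt: Optional[str] = None) -> str:
    """Hypothetical agent: it can only pass a path it was actually told about."""
    # With no path in its prompt, the agent falls back to a made-up placeholder.
    argument = path_from_prompt or "URL_of_the_image"
    try:
        return tool.run(image_path_url=argument)
    except FileNotFoundError as err:
        return f"error: {err}"


with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "receipt.png")
    with open(image, "wb"):
        pass  # create an empty placeholder file
    tool = FakeVisionTool()
    with_path = fake_agent_call(tool, image)   # agent was given the real path
    without_path = fake_agent_call(tool)       # agent had to guess
```

In this toy model the direct call succeeds whenever the real path is supplied, while the "agent" call fails only because the argument was invented. That matches the symptoms reported in this thread, though it is not a confirmed diagnosis of the real tool.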

GundulaGause · October 17, 2024, 1:36pm

Thank you guys, but I am still not sure what to do now. Is there a solution at this point, or should I wait until you have made some updates?

tonykipkemboi · October 29, 2024, 3:22pm

Thank you for pointing this out.

Looking into this today and will update.

srinivasa_yumunus · November 15, 2024, 1:06am

Has anyone successfully demonstrated how to use VisionTool for agent communication or for completing a task like extracting text from an image? The documentation lacks detail, and numerous users have reported issues and inconsistencies with the tool. A functional example would greatly aid in understanding its use.

rokbenko · November 15, 2024, 9:26am

There’s a simple example in the docs:

from crewai import Agent
from crewai.project import agent
from crewai_tools import VisionTool

vision_tool = VisionTool()

@agent
def researcher(self) -> Agent:
    '''
    This agent uses the VisionTool to extract text from images.
    '''
    return Agent(
        config=self.agents_config["researcher"],
        allow_delegation=False,
        tools=[vision_tool]
    )

Does this not help you? What are you missing?

AREL · November 22, 2024, 10:49pm

Hi rokbenko! As matt noted, the vision tool only seems to work if it receives the expected image_path_url parameter from another agent. Setting the parameter when instantiating the tool in an Agent configuration doesn’t seem to carry over and the agent fails to read the image. I believe that that’s what’s slightly misleading in the docs.
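
A possible way to act on this, sketched under assumptions rather than confirmed by the maintainers: put the image location into the task description, so the agent has a real value to pass as image_path_url when it calls the tool. The helper names `build_task_description` and `run_crew`, and the role, goal and backstory strings, are made up for illustration. Whether the agent copies the path through verbatim depends on the model.

```python
def build_task_description(image_path: str) -> str:
    """Embed the image location in the task text so the agent can hand it to the tool."""
    return (
        f"Extract all text from the image located at '{image_path}'. "
        "Call the Vision Tool with image_path_url set to exactly that value."
    )


def run_crew(image_path: str):
    # Imported here so the helper above can be used without crewai installed.
    from crewai import Agent, Crew, Task
    from crewai_tools import VisionTool

    reader = Agent(
        role="Image reader",                       # illustrative values
        goal="Read text from images",
        backstory="You transcribe images accurately.",
        tools=[VisionTool()],                      # no path passed at construction
        allow_delegation=False,
    )
    task = Task(
        description=build_task_description(image_path),
        expected_output="The text found in the image.",
        agent=reader,
    )
    return Crew(agents=[reader], tasks=[task]).kickoff()
```

Calling run_crew("path/to/your/image.png") would then kick off a single-agent crew. It needs an OpenAI API key in the environment, since the tool description quoted above says it uses OpenAI's Vision API.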

rokbenko · November 23, 2024, 12:57pm

Aha, now I understand!
