Where do you feed the url/path argument to the Vision tool?

Hi all,

I am new to CrewAI and want to use its Vision tool. However, I am a bit puzzled about where to give the tool the path to the image I want it to read. Could anybody help me, please?

Thanks,
Gundi

@GundulaGause You set the image_path parameter on the Vision tool as follows:

from crewai import Agent
from crewai_tools import VisionTool

vision_tool = VisionTool(image_path="path/to/your/image.png")

my_agent = Agent(
    ...,
    tools=[vision_tool]
)

See the docs.

Thanks for the response.

I did exactly that but get the following error when kicking off the crew:
“I encountered an error while trying to use the tool. This was the error: [Errno 2] No such file or directory: ‘URL_of_the_image’.
Tool Vision Tool accepts these inputs: Vision Tool(image_path_url: ‘string’) - This tool uses OpenAI’s Vision API to describe the contents of an image.”

However, the image path is set correctly and when I run vision_tool.run(image_path_url=image_path), it does extract the correct information.

@GundulaGause You’re right. The docs are out of date. It looks like the parameter was renamed.

The following code, as you’ve already figured out, should work.

from crewai import Agent
from crewai_tools import VisionTool

vision_tool = VisionTool(image_path_url="path/to/your/image.png")

my_agent = Agent(
    ...,
    tools=[vision_tool]
)

@joaomdmoura @matt @tonykipkemboi I created a pull request that fixes the docs.

Unfortunately it does not. I tried your suggestion but keep getting the same error:

“I encountered an error while trying to use the tool. This was the error: [Errno 2] No such file or directory: ‘URL_of_the_image’.
Tool Vision Tool accepts these inputs: Vision Tool(image_path_url: ‘string’) - This tool uses OpenAI’s Vision API to describe the contents of an image.”

Again, just running “extracted_text = vision_tool.run(image_path_url=image_path)” works fine. I just don’t know how to integrate that into my Agent or Task framework.

@GundulaGause Got it! Still, the docs needed to be updated. If you take a look at the source code, the Vision tool expects the image_path_url parameter, not the image_path parameter.

Can you confirm that the file is located where the code searches for it?
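One quick way to check is to resolve the path and test it before the crew starts. This is only a minimal sketch using the standard library. check_image_path is a hypothetical helper name, not something from crewai_tools:

```python
from pathlib import Path


def check_image_path(image_path: str) -> Path:
    """Return the absolute path if the file exists, else raise FileNotFoundError."""
    # Resolve relative paths against the current working directory,
    # which is where a relative path would be looked up at run time.
    resolved = Path(image_path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(
            f"No image at {resolved} (working directory: {Path.cwd()})"
        )
    return resolved
```

If this raises, the path itself is the problem. If it returns normally, the file is where the code looks for it and the issue lies elsewhere.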

I agree, what is needed is the ‘image_path_url’ not the ‘image_path’.

But I can confirm that the defined path links to the correct file. That is why it extracts the correct information when running: vision_tool.run(image_path_url=image_path).

@GundulaGause That makes me think there may be something wrong with the source code.

@joaomdmoura @matt @tonykipkemboi Can you please check this out?

We need to update the docs to reflect how the tool is meant to be used.

The Vision tool is meant for cases where the Agent is passed an image by another agent to process, i.e. in a more autonomous way.

We can add an attribute for direct paths for sure.

Sorry about the confusion.

Thank you guys, but I am still not sure what to do now. Is there a solution at this point, or should I wait until you have made some updates?

Thank you for pointing this out.

Looking into this today and will update.

Has anyone successfully demonstrated how to use VisionTool for agent communication or for completing a task like extracting text from an image? The documentation lacks detail, and numerous users have reported issues and inconsistencies with the tool. A functional example would greatly aid in understanding its use.

There’s a simple example in the docs:

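from crewai import Agent
from crewai.project import agent
from crewai_tools import VisionTool

vision_tool = VisionTool()

# Defined as a method on the crew class, hence the self parameter.
@agent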
def researcher(self) -> Agent:
    '''
    This agent uses the VisionTool to extract text from images.
    '''
    return Agent(
        config=self.agents_config["researcher"],
        allow_delegation=False,
        tools=[vision_tool]
    )

Does this not help you? What are you missing?

Hi rokbenko! As matt noted, the Vision tool only seems to work if it receives the expected image_path_url parameter from another agent. Setting the parameter when instantiating the tool in an Agent configuration doesn’t seem to carry over, and the agent fails to read the image. I believe that’s what is slightly misleading in the docs.
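Since the agent has to supply image_path_url itself when it calls the tool, one thing that might be worth trying is to spell the path out in the task description. This is an untested sketch of that idea, not a fix confirmed by anyone in this thread. The helper names, the role/goal/backstory strings and the task wording are all made up for illustration, and it assumes crewai and crewai_tools are installed and an OpenAI API key is configured:

```python
from pathlib import Path


def vision_task_description(image_path: str) -> str:
    """Build a task description that spells out the image path for the agent."""
    absolute = Path(image_path).expanduser().resolve()
    return (
        f"Describe the image stored at '{absolute}'. "
        f"Call the Vision Tool with image_path_url set to exactly '{absolute}'."
    )


def build_crew(image_path: str):
    """Assemble a one-agent crew whose task text carries the image path."""
    # Imported lazily so the helper above works without the packages installed.
    from crewai import Agent, Crew, Task
    from crewai_tools import VisionTool

    reader = Agent(
        role="Image reader",  # illustrative values, not from the thread
        goal="Describe images accurately",
        backstory="You inspect images with the Vision Tool.",
        tools=[VisionTool()],  # no path here; the agent passes it at call time
    )
    task = Task(
        description=vision_task_description(image_path),
        expected_output="A description of the image contents.",
        agent=reader,
    )
    return Crew(agents=[reader], tasks=[task])
```

Running it would look like build_crew("path/to/your/image.png").kickoff(). Whether the agent then passes the path through correctly is an assumption on my part.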

Aha, now I understand! :)