I am creating a crew to analyze images/diagrams. I am using Anthropic’s claude-3-haiku-20240307 model. Since it can’t read images directly, I am using a tool that I created being hosted on Cloudflare as a MCP that takes in the base64 encoding of a PNG, the prompt for what to do with that image (generated by my crew), and an Anthropic API key so that it can query Anthropic.
I have tested the tool and I know that it works perfectly fine. However, when I run it using my crew, the crew is unable to correctly input arguments into the tool. My code is below:
agents.yaml:
analyzer_agent:
role: >
Image Analyzer
goal: >
Utilize the tool provided by a remote SSE MCP server to analyze images
backstory: >
A AI agent that uses external services via SSE to analyze images
tasks.yaml:
analyze_image_task:
description: >
Use the tool from the MCP server to analyze the image at {image_path}. You
have been given the api key at {api_key}, and the image encoded in base64
png at {image_base64}. Use these to use the tool.
expected_output: >
The analysis results of the image
agent: analyzer_agent
crew.py:
from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, task, crew, before_kickoff, after_kickoff
from crewai.agents.agent_builder.base_agent import BaseAgent
from typing import List
# import streamlit as st
from crewai.agents.parser import AgentFinish
from crewai.agents.parser import AgentAction
from crewai_tools import MCPServerAdapter
import sys
import os
import base64
# If you want to run a snippet of code before or after the crew starts,
# you can use the @before_kickoff and @after_kickoff decorators
# https://docs.crewai.com/concepts/crews#example-crew-class-with-decorators
@CrewBase
class Temp():
"""Temp crew"""
agents_config = "config/agents.yaml"
tasks_config = "config/tasks.yaml"
agents: List[BaseAgent]
tasks: List[Task]
def __init__(self):
self.server_params = {
"url": "***MY CLOUDFLARE SERVER URL***",
"transport": "sse"
}
self.tools = []
self._setup()
def _setup(self):
try:
self.adapter = MCPServerAdapter(self.server_params)
self.tools = self.adapter.__enter__()
print(f"Available tools from the SSE MCP server: {[tool.name for tool in self.tools]}")
except Exception as e:
print(f"Error setting up tools: {e}")
sys.exit(1)
@before_kickoff
def b64_encode(self, inputs):
with open(inputs.get('image_path'), 'rb') as image_file:
inputs['image_base64'] = base64.b64encode(image_file.read()).decode('utf-8')
@agent
def analyzer_agent(self) -> Agent:
return Agent(
config=self.agents_config['analyzer_agent'], # type: ignore[index]
max_iter=3,
)
@task
def analyze_image_task(self) -> Task:
return Task(
config=self.tasks_config['analyze_image_task'], # type: ignore[index]
tools=self.tools, # Registering the custom tool
)
@crew
def crew(self) -> Crew:
return Crew(
agents=self.agents, # Automatically created by the @agent decorator
tasks=self.tasks, # Automatically created by the @task decorator
process=Process.sequential,
verbose=True,
# process=Process.hierarchical, # In case you wanna use that instead https://docs.crewai.com/how-to/Hierarchical/
)
main.py (important part):
def run():
"""
Run the crew.
"""
inputs = {
'image_path': '/Users/ak/vision/example-1.png',
'api_key': '****MY API KEY****',
}
try:
Temp().crew().kickoff(inputs=inputs)
except Exception as e:
raise Exception(f"An error occurred while running the crew: {e}")
note: I am aware it’s not the greatest idea to just put my API key into the crew like that. I’m worrying about security once I get it to correctly use the tool on the server :). I delete the token once I am done testing with it.
Here is the output from crewai when I run crewai run
The issue is that it is sending the request as
“{"image_base64": "{image_base64}", "prompt": "Describe the contents of this image.", "anthropicApiKey": "{api_key}"}”
rather than
“{"image_base64": "actual value in image_base64", "prompt": "Describe the contents of this image.", "anthropicApiKey": "actual value in api_key"}”
I’ve been trying different things for the past 2 days. I’ve played around with changing the YAML prompts to be super specific, super vague, and everything in between. I’m not sure where to go from here.

