This code works:
import os
from dotenv import load_dotenv
from crewai import Agent, Task, Crew, LLM
import google.generativeai as genai
# Load environment variables
load_dotenv()
# Initialize Gemini
GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY')
if not GOOGLE_API_KEY:
    raise ValueError("Please set GOOGLE_API_KEY environment variable")
genai.configure(api_key=GOOGLE_API_KEY)
# Configure CrewAI to use Gemini
llm = LLM(
    model="gemini/gemini-1.5-pro-latest",
    temperature=0.7,
    api_key=GOOGLE_API_KEY
)
def analyze_image():
    print("\nStarting image analysis...\n")
    # Validate image path
    image_path = "data/test2.png"
    abs_image_path = os.path.abspath(image_path)
    if not os.path.exists(abs_image_path):
        raise FileNotFoundError(f"Image not found at: {abs_image_path}")
    print(f"Found image at: {abs_image_path}")
    # Create a multimodal image analyst agent
    image_analyst = Agent(
        role='Image Analyst',
        goal='Analyze images and provide detailed, accurate descriptions with meaningful insights',
        backstory='''You are an expert image analyst with years of experience in visual content
        interpretation. You have a keen eye for detail and can identify subtle elements that others
        might miss. Your expertise spans across various types of images, from photographs to
        technical diagrams, and you excel at providing clear, structured analysis.''',
        llm=llm,
        multimodal=True,  # Enable multimodal capabilities
        verbose=True,  # Enable detailed execution logs for debugging
        max_iter=1
    )
    # Create an image analysis task
    image_task = Task(
        description=f'''Analyze the image at {abs_image_path} and generate a comprehensive description.
        Focus on:
        1. Main subjects and objects in the image
        2. Visual composition, colors, and lighting
        3. Any text or notable details
        4. Context and setting
        5. Any notable patterns or relationships between elements''',
        expected_output='''A detailed, structured analysis of the image covering all the requested focus areas.
        The output should be clear, concise, and organized into sections for easy reading.''',
        agent=image_analyst
    )
    # Create and run the crew
    crew = Crew(
        agents=[image_analyst],
        tasks=[image_task],
        verbose=True  # Enable detailed execution logs
    )
    result = crew.kickoff()
    print("\nImage Analysis Result:")
    print(result)

if __name__ == "__main__":
    analyze_image()
However, the console output shows a 429, 500, or 503 error, regardless of how many requests are being sent.
I have tried setting max_iter=1, max_execution_time=5, max_rpm=5, and max_retry_limit. No matter what I try, the result is always the same: a flood of 429, 500, or 503 errors.
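To check whether these errors are transient, one option is to wrap the crew.kickoff() call in an explicit exponential backoff loop instead of relying on the agent settings. The following is only a sketch: status_from_error and call_with_backoff are hypothetical helper names, and pulling the status code out of the exception text is an assumption, since I don't know which exception types are actually raised here.

```python
import re
import time

# Status codes reported in this post; treated as "worth retrying" in this sketch.
RETRYABLE_STATUS = {429, 500, 503}


def status_from_error(exc):
    """Best effort: return the first retryable status code found in the error text.

    Assumption: the status code appears as a standalone 3-digit number in str(exc).
    """
    for match in re.findall(r"\b(\d{3})\b", str(exc)):
        if int(match) in RETRYABLE_STATUS:
            return int(match)
    return None


def call_with_backoff(fn, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt, then retry.

    Non-retryable errors, and the error from the final attempt, are re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            status = status_from_error(exc)
            is_last_attempt = attempt == max_attempts - 1
            if status is None or is_last_attempt:
                raise
            delay = base_delay * (2 ** attempt)
            print(f"Got {status}, retrying in {delay:.0f}s "
                  f"(attempt {attempt + 1}/{max_attempts})")
            sleep(delay)
```

In the script above this would replace the direct call, e.g. result = call_with_backoff(crew.kickoff). If the errors persist even with long delays, that would point away from ordinary rate limiting.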
What confuses me is that a similar request made directly with the Gemini SDK works fine: I never get 429, 500, or 503 errors. Using CrewAI with Gemini for text only also works fine, with no errors.
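For comparison, the direct SDK request looks roughly like the sketch below. This is a reconstruction rather than the exact script: build_image_part and describe_image_with_sdk are illustrative names, the prompt is a placeholder, and the image is passed as an inline mime_type/data dict, which is one of the input forms generate_content accepts.

```python
import mimetypes
import os
from pathlib import Path


def build_image_part(image_path):
    """Read the image and return it as an inline {'mime_type', 'data'} dict."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"Image not found at: {path.resolve()}")
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognised image type: {path.name}")
    return {"mime_type": mime_type, "data": path.read_bytes()}


def describe_image_with_sdk(image_path, prompt="Describe this image in detail."):
    """Send the image straight to Gemini, bypassing CrewAI entirely."""
    # Imported lazily so build_image_part can be used without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-pro-latest")
    response = model.generate_content([prompt, build_image_part(image_path)])
    return response.text
```

Calling describe_image_with_sdk("data/test2.png") with the same key and the same model name as the CrewAI script keeps the two paths as comparable as possible, so any remaining difference should come from how the request is built and sent.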
Has anybody gotten multimodal agents working with Gemini? Do we know if it works?
Let me know if I can provide any additional information.