Feature Request: Real-Time Multimodal Support for Agent

ai_agent_recruiter · February 12, 2025, 10:35am

Hi Guys,

I’d love to see real-time multimodal support for agent requests, allowing agents to process voice, video, and image inputs & outputs. Is this on your roadmap? If so, do you have a timeline for it?

Thanks!

tonykipkemboi · February 12, 2025, 3:35pm

I love this idea. Could you give an example flow of how you would want to use it?

ai_agent_recruiter · February 12, 2025, 7:32pm

Dear Tony,

Thank you for your reply. I’m glad that you liked it. I was going to develop a real-time application that my client will continuously stream his/her voice/video and I have to pass this stream to my Crew and I expect to have a stream of responses in Video/Voice/Text or tool calling etc.

For example I will stream the user’s voice to my crew and my crew needs to process it in real-time and respond based on the instructions given to it.

So far some LLMs are supporting this feature like gpt-4v and gemini etc.

Thanks for your attention,

Kind regards,

Hadi.

Mo-Shiha · February 15, 2025, 11:03am

Another use case is to consider screenshots in email attachment along with the text to give the agent the full context of a (for example a help desk ticket).

Topic		Replies	Views
Real-Time LLM Response Streaming in CrewAI CrewAI Community Support tools_issues , agent , task , crewai , feature	0	1588	November 15, 2024
CrewAI multimodal Capability CrewAI Community Support	3	428	April 26, 2025
Crewai + multimodal CrewAI Community Support crewai , feature	13	998	June 10, 2025
Building crewAI agents with locally run ollama llm CrewAI Community Support agent , crewai	2	1554	February 12, 2025
Using Langchain Tool VertexAIImageGeneratorChat with CrewAI Agent CrewAI Community Support agent	0	198	February 3, 2025

Feature Request: Real-Time Multimodal Support for Agent

Related topics