Hi everyone,
I’m building a voice agent using CrewAI where Arabic audio is transcribed and fed into GPT-4o. However, I face two issues:
- Session Persistence: Every call to crew.kickoff resets the session, losing previous context. How can I maintain a continuous, multi-turn conversation with human-in-the-loop feedback?
- Language Output: Since my transcriber outputs Arabic text, GPT-4o replies in Arabic. What's the best way to force responses in another language (e.g., English) without breaking context?
Any advice or code snippets would be greatly appreciated!
Maintain the conversation history yourself and pass it in with your inputs when you kick off the crew. That way, context is preserved across multi-turn conversations.
To get replies in another language, you could add a translator agent to your crew whose task is to translate the output into the language you want.
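As a rough sketch of the history idea, something like this could work. The names `ConversationHistory`, `build_inputs`, and `run_turn` are made up for illustration, and so are the `history` and `user_message` input keys; the only assumption about CrewAI is that `kickoff` accepts an `inputs` dict. The `kickoff` argument here is any callable, so the snippet runs without CrewAI installed.

```python
from collections import deque


class ConversationHistory:
    """Keeps the most recent messages for one session (hypothetical helper)."""

    def __init__(self, max_messages=20):
        # deque drops the oldest message once max_messages is reached,
        # which keeps the prompt from growing without bound.
        self.turns = deque(maxlen=max_messages)

    def add(self, role, text):
        self.turns.append((role, text))

    def as_text(self):
        return "\n".join(f"{role}: {text}" for role, text in self.turns)


def build_inputs(history, user_message):
    """Build the dict to pass as inputs; the key names are assumptions."""
    return {"history": history.as_text(), "user_message": user_message}


def run_turn(kickoff, history, user_message):
    """Run one turn: send prior history plus the new message, then record both."""
    inputs = build_inputs(history, user_message)
    reply = str(kickoff(inputs))
    history.add("user", user_message)
    history.add("assistant", reply)
    return reply
```

With a real crew you would call it roughly as `run_turn(lambda inputs: crew.kickoff(inputs=inputs), history, text)`, and reference `{history}` and `{user_message}` in the task description so the values get interpolated.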
Yes, this is what I created, but the issue is that I am using a database, which adds 2-4 seconds of latency. Additionally, streaming is not supported. How can I reduce the time?
I want to create an application like VAPI or Retell.
The database cost will always be there. You might need to be creative and use something in-memory like Redis, or another database that's fast.
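To illustrate the in-memory idea without needing a Redis server, here is a minimal cache-aside sketch in plain Python. It is a stand-in for Redis, not a replacement: `SessionCache` and `load_from_db` are hypothetical names, and the TTL value is arbitrary. The point is that only the first read of a session pays the slow database cost.

```python
import time


class SessionCache:
    """In-process cache in front of a slow store (hypothetical helper)."""

    def __init__(self, load_from_db, ttl_seconds=300.0, clock=time.monotonic):
        self._load = load_from_db      # slow lookup, e.g. a database query
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._entries = {}             # session_id -> (expires_at, value)

    def get(self, session_id):
        now = self._clock()
        entry = self._entries.get(session_id)
        if entry is not None and entry[0] > now:
            return entry[1]            # cache hit: no database round trip
        value = self._load(session_id)  # miss or expired: hit the slow store
        self._entries[session_id] = (now + self._ttl, value)
        return value

    def put(self, session_id, value):
        # Update the cached copy; persisting to the database is left to the
        # caller and could happen after the reply has been sent.
        self._entries[session_id] = (self._clock() + self._ttl, value)
```

The same get/put shape maps onto Redis if you need the history shared across processes; in that case the dict would be replaced by Redis reads and writes with an expiry.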