Transcribe audio file

Hello,
I am writing my master's thesis on LLM multi-agent systems.
I want to build a demonstrator and came up with an idea.
I thought about a crew that receives an audio file containing an order, transcribes it, checks it against the current stock, and sends an email.
Now I'm stuck on how to transcribe an audio file. Is it possible to do this with a task?
Sorry, I got a bit lost there…

The fastest way I can think of is to give the agent a link to the audio you want to transcribe. First, you would need to create a tool that takes a URL to an audio file and sends it to a speech-to-text model; personally I use OpenAI's Whisper Large V3. The tool returns the transcription as text, which you can then do whatever you want with. So in summary:

  1. Upload the audio and get a URL
  2. Create a tool that sends the audio URL to Whisper
  3. Process the transcription and send it as an email
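
For step 2, here is a rough sketch of what the body of such a tool could look like in plain Python. It is only one way to do it, under a few assumptions: the function names (`download_audio`, `whisper_transcribe`, `transcribe_audio_url`) are made up for illustration, the audio URL is publicly downloadable, the `openai` package is installed with `OPENAI_API_KEY` set, and `whisper-1` is used as the model name for OpenAI's hosted transcription endpoint (swap in whatever model or provider you actually use). How you register the function as a tool depends on your agent framework, so that part is left out.

```python
import posixpath
import urllib.request
from typing import Callable
from urllib.parse import urlparse


def download_audio(url: str) -> bytes:
    """Fetch the raw audio bytes from a publicly reachable URL."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def whisper_transcribe(audio: bytes, filename: str) -> str:
    """Send audio bytes to OpenAI's transcription endpoint and return the text.

    Assumption: the `openai` package is installed and OPENAI_API_KEY is set.
    The import is done lazily so the rest of the module works without it.
    """
    from openai import OpenAI

    client = OpenAI()
    result = client.audio.transcriptions.create(
        model="whisper-1",  # assumed model name; adjust to your setup
        file=(filename, audio),  # the filename lets the API infer the format
    )
    return result.text


def transcribe_audio_url(
    url: str,
    download: Callable[[str], bytes] = download_audio,
    transcribe: Callable[[bytes, str], str] = whisper_transcribe,
) -> str:
    """Download the audio behind `url` and return its transcription.

    `download` and `transcribe` are injectable so the function can be
    tested, or pointed at another speech-to-text backend, without changes.
    """
    # Derive a filename from the URL path; fall back to a generic name.
    filename = posixpath.basename(urlparse(url).path) or "audio.mp3"
    audio = download(url)
    return transcribe(audio, filename).strip()
```

The agent would then call `transcribe_audio_url("https://example.com/order.mp3")` (a placeholder URL) and pass the returned text on to the stock check and email steps.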
