Hey, I wanted to get opinions and explore possibilities for a particular use of a crew/agentic workflow.
Context
A bit of context: I stumbled upon the problem of creating video directors (for ads, music videos, etc.). To do that I needed a script-writing step that ensures coherence. Initially this was a single prompt using a bigger model to reason, which took a little over a minute on gpt-4o (playground).
I had the idea of splitting it up so that I could play around with parallelism and tackle the challenge of continuity separately. I knew that if I split it too much, there would be too many API calls, but I needed to start somewhere.
Initial Version
Overall this started to feel like a good use case to take the manager_agent for a spin. So I split the whole task into 7-8 agents, controlled by 1 task, of which only 2 had delegation allowed:
- The Workflow Manager / Central Coordinator
- The Continuity Manager
Overall it worked out; the latency was 135s (all max_* config values were defaults).
Second Attempt
I wanted to check whether there could be something better. Also, the problem with the above is that I need to wait for it to finish completely before I can start sending the text-to-image requests for the generated shot prompts. So I batched the work into two:
- User Request & Context Analysis + Narrator and Scene Planner + Continuity Agent, which generates an intermediate JSON containing everything needed to build the images, action, and environment
- Cinematographer: per-scene expansion, which takes each scene from the intermediate JSON and expands it (image, motion, voice-over, etc.)
Only the first batch is driven by a manager. The Cinematographer gets only two adjacent scenes plus the previously extracted features, and expands the selected scene (passed via inputs). It sees two scenes at a time to help keep the shots consistent, so the windows are [0, 1], [1, 2] .. [n-1, n].
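A minimal sketch of the overlapping scene windows described above, in plain Python. The function names and the input keys (`scene_pair`, `features`) are hypothetical, chosen only for illustration; which scene of each pair gets expanded is left to the caller.

```python
from typing import Any


def scene_windows(scenes: list[Any]) -> list[list[Any]]:
    """Return overlapping windows of two adjacent scenes: [s0, s1], [s1, s2], ..."""
    # With fewer than two scenes the range is empty, so no windows are produced.
    return [scenes[i:i + 2] for i in range(len(scenes) - 1)]


def build_inputs(scenes: list[Any], features: dict) -> list[dict]:
    """Build one inputs dict per Cinematographer run (key names are assumptions)."""
    return [
        {"scene_pair": window, "features": features}
        for window in scene_windows(scenes)
    ]


inputs = build_inputs(["intro", "chase", "finale"], {"style": "noir"})
```

A list like `inputs` could then be handed to whatever per-input kickoff the crew exposes, for example a for-each style kickoff if your CrewAI version provides one.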
This split lets me parallelize: I can send the image generation requests as soon as each of these cinematic scenes has been expanded. The request came down to 90s for completion, but that's okay, since we can trigger work sooner.
I would still need to generate the videos, though.
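To illustrate the "trigger image generation as soon as a scene is expanded" idea, here is a small asyncio sketch. Both `expand_scene` and `generate_image` are hypothetical stand-ins (sleeps instead of real LLM and text-to-image calls); only the chaining pattern is the point.

```python
import asyncio


async def expand_scene(index: int, delay: float) -> dict:
    """Hypothetical stand-in for one Cinematographer run."""
    await asyncio.sleep(delay)
    return {"scene": index, "image_prompt": f"prompt for scene {index}"}


async def generate_image(expanded: dict) -> str:
    """Hypothetical stand-in for the text-to-image API call."""
    await asyncio.sleep(0.01)
    return f"image-{expanded['scene']}"


async def expand_then_render(index: int, delay: float) -> str:
    # Chain both steps per scene, so rendering starts the moment this
    # scene's expansion finishes rather than when all expansions finish.
    expanded = await expand_scene(index, delay)
    return await generate_image(expanded)


async def run_pipeline(delays: list[float]) -> list[str]:
    tasks = [expand_then_render(i, d) for i, d in enumerate(delays)]
    # gather returns results in input order, so they line up with scene order.
    return await asyncio.gather(*tasks)


images = asyncio.run(run_pipeline([0.02, 0.0, 0.01]))
```

Because each scene carries its own expand-then-render chain, a slow scene never blocks the image request of a fast one.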
Looking for improvements
Now, I also want continuity to be handled across these per-scene expansions (the Cinematographers).
One way to do it is something like a fork-join model where, once the kickoff is completed, a continuity task re-arranges these expanded scene prompts.
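A rough sketch of that fork-join shape, using a thread pool for the fork and a plain function for the join. `expand_scene` and `continuity_pass` are hypothetical placeholders: in the real setup the first would be a Cinematographer kickoff and the second an LLM-driven continuity task, not a sort.

```python
from concurrent.futures import ThreadPoolExecutor


def expand_scene(scene: dict) -> dict:
    """Hypothetical stand-in for one blocking Cinematographer kickoff."""
    return {**scene, "shot_prompt": f"shot for {scene['title']}"}


def continuity_pass(expanded: list[dict]) -> list[dict]:
    """Hypothetical join step: restore script order and link each shot to its predecessor."""
    ordered = sorted(expanded, key=lambda s: s["order"])
    for prev, cur in zip(ordered, ordered[1:]):
        cur["follows"] = prev["title"]
    return ordered


def fork_join(scenes: list[dict]) -> list[dict]:
    # Fork: expand every scene concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        expanded = list(pool.map(expand_scene, scenes))
    # Join: a single continuity step sees all expanded scenes at once.
    return continuity_pass(expanded)


result = fork_join([{"order": 1, "title": "chase"}, {"order": 0, "title": "intro"}])
```

The trade-off is that the join waits for the slowest expansion, so image requests that depend on the continuity pass cannot start early.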
Another approach could be to write an agent to generate images, and equip it with an API call tool for the 1st approach (single Task multiple Agents).
A third would be if, instead of crew.kickoff_async, I had some way to dynamically create these n CinematographerAgent()s, then create a Task and use the ContinuityAgent as a manager. I could use the prompt templates in the backstory. So something like:
    Crew(
        agents=self.build_cinematographers(intermediate_result),
        tasks=[scene_planning],
        manager_agent=continuity_agent,
    )
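One way the dynamic builder might look, sketched in plain Python so it runs without any framework. The shape of `intermediate_result` (a `scenes` list with a `summary` per scene), the template text, and the function name are all assumptions; the output is a list of keyword dicts rather than real agent objects.

```python
from string import Template

# Hypothetical backstory template; the wording is illustrative only.
BACKSTORY = Template(
    "You are the cinematographer for scene $index: $summary. "
    "Keep continuity with: $previous."
)


def build_cinematographer_specs(intermediate_result: dict) -> list[dict]:
    """Build one role/goal/backstory spec per scene in the intermediate JSON."""
    scenes = intermediate_result["scenes"]
    specs = []
    for i, scene in enumerate(scenes):
        previous = scenes[i - 1]["summary"] if i > 0 else "nothing (opening scene)"
        specs.append({
            "role": f"Cinematographer {i}",
            "goal": "Expand the scene into image, motion and voice-over prompts",
            "backstory": BACKSTORY.substitute(
                index=i, summary=scene["summary"], previous=previous
            ),
        })
    return specs


specs = build_cinematographer_specs(
    {"scenes": [{"summary": "sunrise over the bay"}, {"summary": "city traffic"}]}
)
```

Each spec could then be turned into an agent with something like `Agent(**spec)`, assuming the agent constructor in your CrewAI version accepts `role`, `goal`, and `backstory`.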
But then the question is: do I use Events for each Agent to parallelise the API call on each Agent's completion, instead of waiting for the task to complete?
So I just wanted to know your thoughts and ideas for improving this, if possible.
Some of the resources I saw: