I think I can kill two birds with one stone here and give my take on both questions at once.
It’s pretty common to hear people advocating for breaking tasks down. And the upsides are pretty obvious: everything gets more organized, and for each task (or each step in the chain, you could say) it’s like you’ve got hyper-focus going on. If you roughly picture an LLM as a “brain,” it’s as if the entire brain is wiped clean, ready to take in only the specific rules needed for that one sub-task. And yes, LLMs do get more efficient when you feed them fewer rules, even if the model has a massive context window. Take Gemini 1.5 Pro’s 2-million-token window: you could probably upload the entire Bible into its context, and I bet it could find specific passages anywhere in the text (that’s the “needle in a haystack” test). But even then, it wouldn’t exactly become a good Christian, right? It has access to all the content, sure, but that doesn’t mean it can actually apply all the rules packed into that massive amount of information. So all this seems to back up the idea that sub-tasks are a smart move, ’cause they help optimize how many rules you’re passing along at each step of the chain, right?
What doesn’t get talked about as much is that this really only holds true 100% when those sub-tasks are atomic, meaning they’re self-contained and self-sufficient. And in real-world scenarios with agentic systems, that’s almost never the case. Splitting tasks can mean breaking a line of reasoning that’s crucial for getting the job done well. And that break point is never perfectly put back together. No matter how you try to pass along what was done before (CrewAI, for example, sends the previous task’s output to the next task’s input as context in its default sequential process), there’s still some loss in that handoff. Even if your agent can ping another agent while working on a task, if that second agent is just going to spit back the full text of the rules, wouldn’t it have been better to just give all that text to the first agent from the get-go? These are the kinds of questions that’ll help you figure out how to divide tasks in your agentic system. Basically, if a task is complex, then its prompt is gonna be complex and long, and that’s okay. What matters is that the agent has all the instructions it needs to nail that task properly during execution.
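To make the handoff problem concrete, here’s a minimal sketch in plain Python (no real framework, names are hypothetical, the LLM call is stubbed) of the sequential pattern described above, where the only bridge between steps is the previous step’s output:

```python
# Sketch of a sequential agent pipeline: each step's prompt is rebuilt
# from scratch, and the only thing carried across the boundary is the
# previous step's output, passed in as context.

def run_step(rules: str, context: str) -> str:
    # Stand-in for an LLM call; a real system would call a model here.
    prompt = f"{rules}\n\n# Context from previous task:\n{context}"
    return f"<output for a {len(prompt)}-char prompt>"

def run_pipeline(steps: list[str], initial_input: str) -> str:
    context = initial_input
    for rules in steps:
        # Everything NOT inside `rules` or `context` is lost at this
        # boundary: the model never sees the earlier steps' rules or
        # the reasoning that produced `context`, only the text itself.
        context = run_step(rules, context)
    return context
```

The comment at the loop boundary is the whole point: if a rule from step one still matters in step two, it has to be re-sent explicitly, or it’s gone.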
To give you some concrete examples: up until about a month and a half ago, a task in a system I was using to generate content for a YouTube channel had a prompt that was over 41k characters long (around 10k tokens). You read that right. Over 40,000 characters just for the rules. This task would churn out a 13k-character video script in one go. It was absolutely crucial that the generated text used every trick in the book for text humanization in Portuguese (my native language) and storytelling techniques. The prompt was broken down into sections, like a mini-book: a section for persona creation, rules on what to do (you could call these business rules), examples of how to do it, rules on what not to do, a chain of thought, a section with review procedures before the final output, and so on. I tried splitting some of those rules off into other tasks, but the results would always take a nosedive. It was easy to spot because the views would plummet. I deliberately picked a niche that really relies on the audience making a human connection, so less humanized text meant the channel’s numbers dropped, and I used that as a benchmark.
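A sectioned mega-prompt like that is easy to keep maintainable in code. This is a hypothetical sketch, not my actual prompt: the section titles mirror the ones described above, and every body string is a placeholder.

```python
# Hypothetical assembly of a long, sectioned prompt ("mini-book" style).
# Section titles follow the structure described in the text; the bodies
# here are placeholders, not real production rules.

SECTIONS = {
    "PERSONA": "You are a storyteller writing for a Brazilian audience...",
    "RULES (DO)": "Write in conversational Portuguese; vary sentence rhythm...",
    "EXAMPLES": "Example of a good opening hook: ...",
    "RULES (DON'T)": "Never use bureaucratic or robotic phrasing...",
    "CHAIN OF THOUGHT": "Before writing, outline the emotional arc...",
    "REVIEW CHECKLIST": "Re-read the draft for repetition and rhythm...",
}

def build_prompt(sections: dict[str, str]) -> str:
    # Join every section into one prompt so all the rules travel
    # together to a single task, instead of being split across steps.
    parts = [f"## {title}\n{body}" for title, body in sections.items()]
    return "\n\n".join(parts)

prompt = build_prompt(SECTIONS)
```

One long prompt, one task: you can edit each section independently in source control, but the model always sees the full rulebook at once.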
To give you some smarter, more sophisticated examples than my little toy project: the prompt for the Devin AI programming agent is said to be over 34,000 characters (around 8-9,000 tokens), and the system prompt for Claude 4 is reportedly 24-25,000 tokens long (that’s about 95-100,000 characters). As a rule of thumb, don’t sweat it if your task needs up to 20,000 characters of text for its instructions. Just make sure all those rules really belong to the same logical course of action, in other words, the same task.
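The characters-to-tokens conversions above all use the same rough heuristic (about 4 characters per token for English-like text; real tokenizers vary by model and language):

```python
def rough_tokens(text: str, chars_per_token: float = 4.0) -> int:
    # Very rough heuristic: ~4 chars per token for English-like text.
    # Real tokenizers vary by model and by language, so treat this as
    # a ballpark for budgeting prompts, not an exact count.
    return int(len(text) / chars_per_token)

rough_tokens("x" * 41_000)  # 10250 under this heuristic, i.e. ~10k tokens
```

Good enough to check whether a prompt is in the 10k or the 100k range before you reach for an actual tokenizer.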
This doesn’t mean you can’t or shouldn’t have, say, review tasks after the main job is done (doesn’t matter if it’s the same agent or another one doing the review, the latency cost is the same). But you’ll definitely see a solid difference when the initial task is done by an agent that’s aware of all the necessary rules upfront, and then it’s just reviewed in a separate, dedicated task later on.
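That generate-then-review split is one place where cutting a task in two actually is safe, because the review step only needs the checklist and the draft. A stubbed sketch (hypothetical names, the LLM call is a stand-in):

```python
# Sketch of the pattern above: one fully-informed generation task,
# then a separate, dedicated review task. The llm() call is a stub.

def llm(prompt: str) -> str:
    # Stand-in for a real model call.
    return f"<model output for a {len(prompt)}-char prompt>"

def generate(all_rules: str, brief: str) -> str:
    # The generator sees EVERY rule upfront, not a subset: this is the
    # step where splitting the rules would break the line of reasoning.
    return llm(f"{all_rules}\n\nTask: {brief}")

def review(checklist: str, draft: str) -> str:
    # The reviewer only needs the checklist plus the finished draft,
    # so this boundary IS self-contained and safe to split off.
    return llm(f"{checklist}\n\nDraft to review:\n{draft}")

draft = generate("...full rulebook...", "write the video script")
final = review("...review checklist...", draft)
```

Whether the same agent or a different one runs `review` doesn’t change the structure; what matters is that `generate` was never starved of rules.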
Hope these thoughts, even if they seem pretty basic, help you get the best out of your projects. Good luck!