Is there a pattern that solves the following issue?
I have a large document (~5 MB) that I want to keep in context across conversations, especially when the manager asks questions or delegates to an agent (or retries until it gets a final answer). The manager sometimes summarises or re-interprets the task description, so the document context can be lost and the run can go badly off track.
I would use the knowledge module, but it relies on embeddings and I can't use a single chunk for the whole document, because I hit a maximum payload size error on upsert, imposed by the Google embedding API (google.api_core.exceptions.InvalidArgument: 400 400 Request payload size exceeds the limit: 36000 bytes. in upsert). If I use smaller chunks, retrieval never returns enough of the document, or all of the results needed to produce a good answer.
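For reference, this is roughly the kind of byte-limited chunking I had to fall back to (a plain-Python sketch only, no framework or Google API calls; the 30 000-byte headroom figure is just an assumption to stay under the 36 000-byte upsert limit):

```python
# Sketch only: split the document into chunks that stay under the embedding
# payload limit. MAX_CHUNK_BYTES is an assumed headroom value, not a documented
# constant; a single paragraph larger than the limit would still need splitting.
MAX_CHUNK_BYTES = 30_000

def chunk_by_bytes(text: str, max_bytes: int = MAX_CHUNK_BYTES) -> list[str]:
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for paragraph in text.split("\n\n"):
        para_bytes = len(paragraph.encode("utf-8")) + 2  # +2 for the separator
        if current and size + para_bytes > max_bytes:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(paragraph)
        size += para_bytes
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Even with chunking like this working, the retrieved subset is never enough for the answers I need.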
So I'd rather use the long context window (1 million input tokens is more than enough) than a RAG approach. I'm not worried about the token cost of sending the whole document with each request: the LLM returns much better, more consistent results when it has the whole document in context, because it then refers to the provided document rather than relying on training data or any grounded results.
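To make the intent concrete, this is the behaviour I'm effectively reproducing by hand today (a minimal sketch in plain Python; build_prompt is just an illustrative helper, not a framework API):

```python
def build_prompt(document: str, task_description: str) -> str:
    """Prepend the full reference document so every LLM request sees it verbatim."""
    return (
        "Reference document (use this as the sole source of truth, "
        "not training data or web results):\n"
        f"{document}\n\n"
        f"Task:\n{task_description}"
    )
```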
Since callbacks and guardrails run after the task has executed, I can't see any mechanism to force the document to be inserted before the prompt is passed to the LLM.
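What I'm effectively looking for is a hook that fires before each LLM call rather than after the task, along these lines (purely hypothetical names, not an existing API):

```python
def before_llm_call(messages: list[dict], document: str) -> list[dict]:
    # Hypothetical pre-request hook: inject the full document as a leading
    # system message so the manager's rephrased or delegated prompts never lose it.
    return [{"role": "system", "content": document}] + messages
```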
So I'm out of ideas at the moment, other than to suggest that it would be good if the knowledge module supported full-text attachments (rather than chunking/embedding), so that every request generated by the manager would keep the document in context.
Is there any way to achieve this today, or should this be raised as an enhancement?