CrewAI is not able to generate a "huge" list of objects

I’ve built a CrewAI system for generating educational flashcards and multiple-choice questions from PDF documents. My DeckCrew consists of two main agents:

  1. Knowledge Extractor: Analyzes PDFs, extracts structured content, and identifies key concepts

  2. Deck Creator: Transforms extracted content into flashcards or multiple-choice cards

Despite configuring the Deck Creator agent to “cover 100% of the content” and create “as many cards as possible,” I’m getting fewer cards than expected. The quality is good, but the quantity doesn’t provide comprehensive coverage of the source material.

Some specific issues I’m facing:

  • Cards tend to focus only on major concepts, missing details and nuances
  • The deck size seems capped, regardless of document length
  • Many sections from the PDF aren’t represented in the final cards

I’m looking for strategies to significantly increase card generation. Has anyone:

  • Successfully configured agents to generate larger volumes of educational content?
  • Found effective prompting techniques to encourage more exhaustive coverage?
  • Implemented parallel processing or chunking strategies for comprehensive content extraction?
  • Used specific tools or techniques that improved the quantity of generated items?

I’d appreciate any insights on optimizing CrewAI for high-volume educational content generation while maintaining quality. My goal is to have complete coverage of the source material with dozens or even hundreds of cards per document.

My agents:

knowledge_extractor:
  role: "Content Extractor for Flashcards/Multiple Choice"
  goal: "Extract, analyze, and organize key information from documents to create flashcards/multiple choice questions."
  backstory: |
    You are a renowned expert in pedagogical document analysis, with over 15 years of experience identifying fundamental concepts and important details for effective learning. Your multidisciplinary background in information science, pedagogy, and cognitive psychology gives you a unique perspective on what makes knowledge truly assimilable.

    Your specialty is breaking down complex materials into structured and digestible knowledge components, creating "learning blocks" that maintain the integrity of the original material while highlighting what is most valuable for the process of memorization and deep understanding.

    You have developed your own method of textual analysis that identifies not only key concepts but also the hierarchical relationships between them, allowing a coherent mental reconstruction of knowledge. Your work is recognized for preserving the original context while precisely selecting what is essential for learning.

    Beyond your talent for extracting knowledge from documents, you master advanced web research techniques that complement and enrich the extracted material. You know exactly when to seek additional information online to contextualize, update, or expand concepts found in the original material.

    You are capable of extracting and enriching relevant information from complex documents of different formats, such as:
    - Books and academic articles
    - Technical reports and scientific research
    - Required reading texts and curricular materials
    - Presentation slides and lecture notes
    - Textbooks and study guides
    - Technical manuals and specialized documentation
    - Scientific articles and research papers

    Your mission is to transform any study material into structured knowledge that maximizes retention and understanding, combining the best of the original document with updated complementary information from the web.

deck_creator:
  role: "Creator of Study Decks and Cards"
  goal: "Create study decks with multiple high-quality cards that cover 100% of the extracted content."
  backstory: |
    You are a renowned expert in instructional design and creation of effective study materials, with over 20 years dedicated to developing card-based learning techniques. Your background in educational psychology, cognitive neuroscience, and learning experience design has given you extraordinary abilities to transform knowledge into systems of cards perfectly optimized for memorization and understanding.

    You have developed your own methodology, internationally recognized, that transforms complex content into study cards that follow scientific principles of information retention. Your techniques maximize the effectiveness of spaced repetition and active testing, two of the most proven methods for long-term memorization.

    Your expertise includes:
    - Creating pedagogically optimized multiple-choice questions with carefully crafted distractors
    - Developing flashcards with the perfect balance between objectivity and necessary context
    - Structuring decks that cover the entire spectrum of a topic, leaving no knowledge gaps
    - Calibrating difficulty to adequately challenge without overwhelming the student
    - Implementing elaboration techniques that promote connections between concepts

    You are particularly skilled at consulting online sources to enrich your cards, incorporating additional examples, enlightening analogies, and complementary contexts that deepen understanding.

    For each topic you address, you can generate dozens or even hundreds of meticulously crafted cards that, together, form a complete learning system, ensuring that no important detail is overlooked.

    Your mission is to transform structured knowledge into transformative learning experiences through perfectly designed cards to maximize retention, understanding, and practical application.

My tasks:

# Extract Content Task
extract_content:
  description: |
    Use your expertise in pedagogical document analysis to extract the content from the PDF file provided in {pdf}. Apply your own method of textual analysis to identify not only the key concepts but also the hierarchical relationships between them.

    When analyzing the document, create structured "learning blocks" that:
    - Preserve the integrity of the original material
    - Highlight what is most valuable for the memorization process
    - Allow a coherent mental reconstruction of knowledge
    - Facilitate the subsequent transformation into flashcards and multiple-choice questions

    Analysis strategies you should apply:
    - Complete contextual reading for general understanding
    - Identification of patterns of hierarchical organization of knowledge
    - Isolation of essential definitions, principles, and facts
    - Evaluation of the pedagogical relevance of each concept
    - Capture of illustrative examples and cases of practical application
    - Mapping of relationships between concepts for a coherent structure

    Complementation with web research:
    Use your advanced web research techniques (SerperDevTool) to enrich the extracted material when:
    - Concepts are incomplete or outdated
    - Additional examples can reinforce learning
    - Definitions need clarification or expansion
    - Recent information can better contextualize the content
    - Different perspectives enrich the understanding of the topic

    When conducting web research:
    1. Formulate precise queries based on your analysis of the document
    2. Critically evaluate sources for credibility and pedagogical relevance
    3. Integrate information to complement (not replace) the original material
    4. Maintain focus on the educational value of the content for memorization and understanding
    5. Document sources for future reference

  expected_output: |
    An educationally optimized structured analysis, including:
    1. Pedagogical synthesis of the main topics and their value for learning
    2. Collection of fundamental principles with contextualized explanations
    3. Essential facts and data organized by relevance for memorization
    4. Illustrative examples and case studies selected to reinforce concepts
    5. Technical terminology with clear and accessible definitions
    6. Conceptual map showing the hierarchical relationships between concepts
    7. Assessment of the educational relevance of each component
    8. Enriching complementations obtained through specialized web research

  output_format: |
    Provide the result as a JSON object with the following structure:
    ```json
    {
      "document_summary": "Pedagogical synthesis of the document, highlighting the general educational value",
      "topics": [
        {
          "title": "Topic title",
          "summary": "Educational summary of the topic",
          "importance": "High/Medium/Low based on learning value"
        }
      ],
      "key_concepts": [
        {
          "concept": "Concept name",
          "definition": "Clear and pedagogically optimized definition",
          "context": "Learning context where the concept is applied",
          "web_information": "Complementary educational information found on the web (if any)"
        }
      ],
      "facts": [
        {
          "statement": "Essential fact or information for memorization",
          "context": "Context that facilitates the retention of the fact",
          "source_page": "Page number (when available)",
          "web_verification": "Additional verification or contextualization from the web (if applicable)"
        }
      ],
      "examples": [
        {
          "title": "Example title",
          "description": "Description of the example selected for pedagogical value",
          "relevance": "Explanation of how this example reinforces learning",
          "additional_examples": "Complementary examples from the web that reinforce the concept (if any)"
        }
      ],
      "terminology": {
        "term_name": "Definition of the term optimized for memorization"
      },
      "hierarchical_structure": [
        {
          "main_topic": "Main topic in the conceptual map",
          "subtopics": ["Subtopic 1", "Subtopic 2"],
          "relationship_description": "Description of the hierarchical relationship between concepts"
        }
      ],
      "web_resources": [
        {
          "topic": "Related topic",
          "url": "Source URL",
          "summary": "Summary of the educational information found",
          "relevance": "Specific pedagogical value of this complementary information"
        }
      ],
      "learning_focus": {
        "key_memorization_points": ["Critical points for memorization"],
        "conceptual_understanding_areas": ["Areas that require deep understanding"],
        "application_opportunities": ["How to apply this knowledge"]
      }
    }
    ```

  agent: knowledge_extractor

# Create Deck Task
create_deck:
  description: |
    Use your expertise in instructional design to create a complete deck of study cards based on the content extracted by the knowledge_extractor agent. You should create the type of deck specified in {type}, which can be "multiple_choice" or "flashcard".

    Your mission is to create an extensive deck with as many cards as possible that cover 100% of the content of the analyzed PDF document, leaving no knowledge gaps. Each card should follow the scientific principles of information retention that you master.

    To create the deck:
    1. Carefully analyze all the structured content provided by the knowledge_extractor
    2. Identify each concept, fact, definition, example, and relationship that can be transformed into a card
    3. Create specific cards for each significant piece of information
    4. Ensure that the cards, as a set, cover 100% of the document's content
    5. Calibrate the difficulty of the cards to adequately challenge without overwhelming

    For multiple-choice cards:
    - Create clear and objective questions
    - Develop 4 answer options, including 1 correct and 3 distractors
    - The distractors should be plausible but unequivocally incorrect
    - Each option should have approximately the same length and grammatical structure
    - Avoid linguistic clues that might reveal the correct answer

    For flashcards:
    - Create direct questions that test a single concept or specific fact
    - Formulate concise but complete answers
    - Avoid ambiguities that may confuse the student
    - Maintain the balance between objectivity and necessary context

    Complementation with web research:
    Use the SerperDevTool to enrich your cards with:
    - Additional examples and complementary contexts
    - Enlightening analogies that facilitate understanding
    - Updated and relevant information on the topics
    - Additional perspectives that broaden understanding
    - Practical applications of concepts in real contexts

    When conducting web research:
    1. Formulate precise queries to find relevant information
    2. Critically evaluate sources for credibility
    3. Integrate the information found naturally into the cards
    4. Maintain focus on educational value and relevance to the topic

  expected_output: |
    A complete deck with as many study cards as possible, structured according to the specified type (multiple_choice or flashcard), including:
    1. Deck information (title, description, type)
    2. Extensive collection of cards covering 100% of the document's content
    3. For multiple-choice cards: question, options, and correct answer for each card
    4. For flashcards: question and answer for each card

  output_format: |
    Provide the result as a JSON object with the following structure:
    ```json
    {
      "deck": {
        "id": "Dynamically generated UUID",
        "title": "Deck title based on document content",
        "description": "Detailed description of the deck and its content",
        "type": "multiple_choice or flashcard (as specified in {type})"
      },
      "cards": [
        // For multiple_choice
        {
          "id": "Dynamically generated UUID",
          "question": "Question formulated to test knowledge",
          "options": ["Option 1", "Option 2", "Option 3", "Option 4"],
          "correctAnswer": "Text of the correct option (must be identical to one of the options)"
        },
        // For flashcard
        {
          "id": "Dynamically generated UUID",
          "question": "Question formulated to test knowledge",
          "correctAnswer": "Correct answer to the question"
        }
        // Multiple additional cards until 100% of the content is covered
      ]
    }
    ```

  agent: deck_creator

My Crew code:

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai_tools.tools import PDFSearchTool, SerperDevTool, RagTool

from .models.extract_content_output import ExtractContentOutput
from .models.create_deck_output import CreateDeckOutput

@CrewBase
class DeckCrew:
    """
    DeckCrew orchestrates agents to generate study cards from files.
    It generates two types of cards: multiple choice (multiple_choice) and flashcards (flashcard).
    """

    agents_config = "config/agents.yaml"
    tasks_config = "config/tasks.yaml"

    @agent
    def knowledge_extractor(self) -> Agent:
        """
        Agent responsible for extracting and understanding the content of the file.
        Uses RAG to process the document.
        """

        config = dict(
            llm=dict(
                provider="openai",
                config=dict(
                    model="gpt-4o-mini",
                ),
            ),
            embedder=dict(
                provider="openai",
                config=dict(
                    model="text-embedding-ada-002",
                ),
            ),
        )

        serper_tool = SerperDevTool()
        rag_tool = RagTool()

        return Agent(
            config=self.agents_config["knowledge_extractor"],
            verbose=True,
            memory=True,
            tools=[PDFSearchTool(config=config), serper_tool, rag_tool],
            multimodal=True,
            respect_context_window=True,
            max_rpm=10,
            function_calling_llm="gpt-4o-mini",
        )

    @agent
    def deck_creator(self) -> Agent:
        """
        Agent responsible for creating decks of study cards.
        Uses the extracted content to create multiple choice cards or flashcards.
        """

        serper_tool = SerperDevTool()

        return Agent(
            config=self.agents_config["deck_creator"],
            verbose=True,
            memory=True,
            tools=[serper_tool],
            respect_context_window=True,
            max_rpm=10,
            function_calling_llm="gpt-4o-mini",
        )

    @task
    def extract_content(self) -> Task:
        """
        Task to extract and analyze the content of the file.
        Identifies the main concepts, definitions, and information.
        """

        return Task(
            config=self.tasks_config["extract_content"],
            async_execution=True,
            agent=self.knowledge_extractor(),
            output_pydantic=ExtractContentOutput,
        )

    @task
    def create_deck(self) -> Task:
        """
        Task to create a deck with cards based on the extracted content.
        Creates multiple choice or flashcards depending on the specified type.
        """

        return Task(
            config=self.tasks_config["create_deck"],
            agent=self.deck_creator(),
            output_pydantic=CreateDeckOutput,
        )

    @crew
    def crew(self) -> Crew:
        """
        Configures the crew for processing the file and generating cards.
        """
        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.sequential,
            verbose=True,
            # planning=True,
            # planning_llm=ChatOpenAI(model="gpt-4o-mini"),
        )

I suggest converting this to a Flow: add a planner agent that studies the document and lists every section that needs cards, then run your current card-creating crew in a loop, once per section, until all sections are covered.
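
A minimal sketch of that loop, in plain Python rather than actual Flow code. The names `build_deck` and `fake_generator` are hypothetical; `fake_generator` is only a stand-in for whatever runs the card-creating crew on one section. The point is that each call handles a small slice of the document, and the per-section results are merged and de-duplicated afterwards.

```python
from typing import Callable, Dict, Iterable, List

Card = Dict[str, str]


def build_deck(
    sections: Iterable[str],
    generate_cards: Callable[[str], List[Card]],
) -> List[Card]:
    """Run the card generator once per section and merge the results."""
    deck: List[Card] = []
    seen_questions = set()
    for section in sections:
        for card in generate_cards(section):
            # Normalize the question text so near-identical repeats are dropped.
            key = card["question"].strip().lower()
            if key in seen_questions:
                continue
            seen_questions.add(key)
            deck.append(card)
    return deck


def fake_generator(section: str) -> List[Card]:
    # Hypothetical stand-in for a real crew run on a single section.
    return [
        {"question": f"What is the main idea of {section}?", "correctAnswer": "placeholder"},
        {"question": "What is spaced repetition?", "correctAnswer": "placeholder"},
    ]


deck = build_deck(["Chapter 1", "Chapter 2"], fake_generator)
print(len(deck))  # 3: the repeated question from Chapter 2 is dropped
```

In a real setup, `generate_cards` could wrap something like `DeckCrew().crew().kickoff(inputs={...})` and return the cards from its output. That assumes the task descriptions are changed to work on one section at a time, for example through an extra `{section}` placeholder, which is not part of the original configuration.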


First of all, I’m new to the AI world, so I apologize for any dumb questions.

When you say ‘Planner Agent,’ are you suggesting that the Crew’s planning property should be set to true, just like in the docs?

No, if I may: you create one agent that generates the high-level structure (i.e., the table of contents, or TOC), then another one that iterates through the items in the TOC to expand them and create the actual content.
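
As a rough illustration, the TOC produced by the first agent could be turned into one set of inputs per section. Everything here is an assumption for the sake of the example: the `TocEntry` fields, the `to_crew_inputs` helper, and the `section` and `pages` keys. Only `pdf` and `type` correspond to placeholders used in the tasks above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TocEntry:
    """One item of the planner's table of contents (hypothetical shape)."""
    title: str
    first_page: int
    last_page: int


def to_crew_inputs(toc: List[TocEntry], pdf: str, deck_type: str) -> List[Dict[str, str]]:
    """Build one inputs dict per TOC entry, ready for a per-section crew run."""
    inputs = []
    for entry in toc:
        inputs.append({
            "pdf": pdf,              # matches the {pdf} placeholder in the tasks
            "type": deck_type,       # matches the {type} placeholder in the tasks
            "section": entry.title,  # assumed extra placeholder
            "pages": f"{entry.first_page}-{entry.last_page}",  # assumed extra placeholder
        })
    return inputs


toc = [TocEntry("Introduction", 1, 4), TocEntry("Cell Biology", 5, 20)]
inputs = to_crew_inputs(toc, "biology.pdf", "flashcard")
print(inputs[1]["section"], inputs[1]["pages"])  # Cell Biology 5-20
```

A list like this could then be passed to the crew one item at a time with `kickoff(inputs=...)`, or all at once with `kickoff_for_each(inputs=...)`, provided the task descriptions actually reference `{section}` and `{pages}`.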