Why am I not getting the expected number of articles from the tool output, and the last article is cut off?

Nexago · November 26, 2024, 8:23am

I am feeling frustrated because I’ve spent over three weeks trying to implement a simple solution. While I’ve managed to achieve the desired output twice, the results are inconsistent: when I run the crew again, the output doesn’t meet the requirements. I am seriously considering abandoning CrewAI and switching to a different framework, because I need to finish this project.

Here’s what my crew is supposed to do:

1. Create 3 web search queries related to some keywords. Status: Working correctly
2. For each query received, search for 15 articles using SerperDevTool. Status: Working correctly
3. Select the 25 most relevant and impactful articles for the user. Issue: It’s only returning 4 articles, and the last one is incomplete.
4. Summarize the selected articles based on specific criteria using OpenAI. Status: Working correctly
5. Return 10 correctly summarized articles. Issue: It’s only returning 1 summarized article.

I have tried running the crew with memory enabled and disabled, and with and without a manager.

crewai version: 0.80.0

This is my code:

crew.py:

import logging
from datetime import datetime, timedelta

from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai_tools import SerperDevTool

# `llm`, ArticleValidatorExtractorTool and ArticleSummarizerTool are
# presumably defined elsewhere in the project (not shown in this post).
logger = logging.getLogger(__name__)


@CrewBase
class ArticleResearchCrew:
    """NexagoAgentsProject crew"""

    agents_config = "config/agents.yaml"
    tasks_config = "config/tasks.yaml"

    def prepare_data(self, data):
        # Search window: the last 7 days, formatted as YYYY-MM-DD
        start_date = (datetime.today() - timedelta(days=7)).strftime('%Y-%m-%d')
        end_date = datetime.today().strftime('%Y-%m-%d')
        keywords = " and ".join(data.get("keywords", []))
        return {
            "keywords": keywords,
            "startDate": start_date,
            "endDate": end_date,
            "numQueries": data.get("numQueries"),
            "userQuery": data.get("userQuery")
        }

    def kickoff(self, data):
        try:
            prepared_data = self.prepare_data(data)
            crew_instance = self.crew()

            result = crew_instance.kickoff(inputs=prepared_data)
            logger.debug("Kickoff result: %s", result)
            return result

        except Exception as e:
            logger.error(f"Error in kickoff: {e}")
            return {"error": str(e), "status": "kickoff_failed"}

    @agent
    def query_generator(self) -> Agent:
        return Agent(
            config=self.agents_config['query_generator'],
            llm=llm,
            verbose=False,
            allow_delegation=False
        )

    @agent
    def article_researcher(self) -> Agent:
        serper_tool = SerperDevTool(n_results=15, tbs="qdr:w", url="https://google.serper.dev/news")
        return Agent(
            config=self.agents_config['article_researcher'],
            tools=[serper_tool],
            verbose=True,
            llm=llm
        )

    @agent
    def article_summarizer(self) -> Agent:
        article_validation = ArticleValidatorExtractorTool()
        summarizer_tool = ArticleSummarizerTool()
        return Agent(
            config=self.agents_config['article_summarizer'],
            tools=[article_validation, summarizer_tool],
            verbose=True,
            llm=llm
        )

    @task
    def query_generator_task(self) -> Task:
        return Task(
            config=self.tasks_config['query_generator_task'],
            expected_output=self.tasks_config['query_generator_task'].get('expected_output'),
            agent=self.query_generator()
        )

    @task
    def article_research_task(self) -> Task:
        return Task(
            config=self.tasks_config['article_research_task'],
            expected_output=self.tasks_config['article_research_task'].get('expected_output'),
            agent=self.article_researcher()
        )

    @task
    def article_summarizer_task(self) -> Task:
        return Task(
            config=self.tasks_config['article_summarizer_task'],
            expected_output=self.tasks_config['article_summarizer_task'].get('expected_output'),
            agent=self.article_summarizer()
        )

    @crew
    def crew(self) -> Crew:
        """Creates the NexagoAgentsProject crew"""
        logger.debug("Initializing agents and tasks in crew")
        return Crew(
            agents=self.agents,  # Automatically created by the @agent decorator
            tasks=self.tasks,    # Automatically created by the @task decorator
            #manager_llm=llm,
            process=Process.sequential,
            #process=Process.hierarchical,
            verbose=True,
            memory=True
        )

agents.yaml:

query_generator:
  role: >
    Expert Web Queries Generator
  goal: >
    Generate {numQueries} unique, targeted search queries related to {keywords}.
    These queries should be designed to retrieve high-quality articles or news published recently.
    Prioritize queries that are likely to yield the most relevant and engaging content for users searching on {keywords}.
  backstory: >
    As a seasoned researcher with deep expertise in {keywords}, you stay informed about the most recent projects, developments, innovations, and potential impacts.
    You use your knowledge to develop queries that expose cutting-edge research and new developments in {keywords}.

article_researcher:
  role: >
    Senior Articles Researcher specializing in {keywords}
  goal: >
    Do thorough research using SerperDevTool to find all recent articles about queries provided to you.
    Your goal is to create a list of the 20 most relevant and up-to-date articles about {keywords}.
  backstory: >
    You are a seasoned {keywords} researcher with a knack for putting together the most relevant and recent articles about queries provided.
    Your expertise lies in turning raw research into a clear and comprehensive list of articles.
    You are very strict with the final output, you meet all requirements.

article_summarizer:
  role: >
    Senior Articles Summarizer specializing in {keywords}
  goal: >
    You will receive a list of articles, you will:
    1. Validate each link using the ArticleValidatorExtractorToolValidate to ensure it leads to a full-length article and, if so, extract its content.
    2. Pass each content extracted as a block to ArticleSummarizerTool to create a comprehensive, detailed, and cohesive summary for each article.
    3. Create a final list of articles about {keywords}.
  backstory: >
    You are a seasoned expert in evaluating URLs to confirm they lead to high-quality, full-length articles.
    Your expertise lies in crafting well-structured, detailed summaries that provide valuable insights into {keywords} for industry professionals and researchers.
    You are very strict with the final output, you meet all requirements.

tasks.yaml:

query_generator_task:
  description: >
    Generate {numQueries} distinct, high-quality web search queries and identify the most promising full articles published recently about {keywords}.
    Follow these guidelines:
    - Query variety:
      - For the first query, use phrases that directly relate to "{userQuery}", if it is not null. If the user query doesn't define a site for the search, don't include one. If userQuery is null, skip this query and move on to the second query.
      - The second query must be: "Latest news, trends, new developments, latest releases and new studies in {keywords}". Don't define a "site" in this query.
      - Subsequent queries must differ significantly from each other, each addressing distinct aspects of the topic, such as social or business impacts, trends, new studies, or applications.
        Ensure diverse perspectives are explored by using specific and relevant keywords.
    - Article focus:
      - Use "full-length articles" to avoid listing pages, blog posts, or video content.
      - Always use "filetype:html" to ensure article pages only.
      - Emphasize up-to-date and relevant information by specifying the timeframe: "after:{startDate} before:{endDate}".
    - Clarity and relevance:
      - Ensure each query targets the retrieval of insightful, engaging, and high-quality content that aligns with the specified {keywords}.
      - Avoid ambiguous terms and focus on precision to enhance search accuracy.
    - Web search format:
      - Queries must have a structured web search format, varying in structure to ensure diverse results.
  expected_output: >
    A single list of {numQueries} well-structured web search queries with the following structure:
    - queryText (str format)
  agent: query_generator

article_research_task:
  description: >
    Conduct thorough research using each query received and SerperDevTool. Ensure the following:
    - Use all {numQueries} queries for articles searching.
    - The URLs should link to full-length articles, avoiding listing pages.
    - Select the 25 most relevant articles found, taking into consideration:
      - Each article found is directly related to the search queries and provides insightful, relevant, and engaging content published between {startDate} and {endDate}.
      - Relevance score criteria:
        - society impact level: up to 3 points
        - business impact level: up to 3 points
        - audience engagement level: up to 4 points
      - Content:
        - 40% of articles directly address {userQuery}.
        - 40% focus on the latest news, developments, or research.
        - 20% cover other relevant or related topics.
  expected_output: >
    - A structured single list containing the 25 most relevant articles, each one with the following data:
      - title
      - link (raw URL)
      - snippet
      - date (date of publication, format: YYYY-MM-DD)
      - source
      - query (query used to find this article)
    - All information should be factual, and no content should be invented.
    - MANDATORY:
      - Return 25 articles, don't return less.
      - Don't return a list with articles without complete data.
  agent: article_researcher

article_summarizer_task:
  description: >
    - For each article of the provided list, perform the following steps in order:
      - Use the ArticleValidatorExtractorTool to validate its link and then extract its content.
        If the link does not lead to a valid article, exclude it from extraction and any further processing.
      - Use the ArticleSummarizerTool to generate a summary of the content extracted.
        ArticleSummarizerTool must receive the complete extracted content, don't truncate it.
        If you cannot summarize an article's content, exclude it from further processing.
    - Create a final list of 10 articles that meet these criteria:
      - Information is accurate and directly sourced from the original articles.
      - Articles are relevant and engaging for {keywords} experts.
      - Articles that fail any step are excluded from the output.
      - Ensure that the full text summary generated by ArticleSummarizerTool is included without modification or truncation, maintain paragraph separation.
    - MANDATORY:
      - Process all articles of the list.
      - If an article returns an error, move on to the next article until the whole list has been processed.
  expected_output: >
    - A single structured list of the 10 most relevant articles found, in JSON format.
    - Each article must contain:
      - title
      - link (only raw URL)
      - date (date of publication, format: YYYY-MM-DD)
      - source
      - summarize (Ensure full text generated by ArticleSummarizerTool is included without truncation, maintain also paragraph separation)
    - MANDATORY: return 10 articles, don't return less. If you cannot do it, explain why.
  agent: article_summarizer

I will appreciate any help.

rokbenko · November 26, 2024, 10:22am

Let’s start step-by-step, with the first issue.

Can you confirm that you’re not hitting the context window of the LLM you’re using?
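
One rough way to check this before running the crew is to estimate how many tokens the requested output needs and compare that with the model's completion limit. The sketch below is only a heuristic: the figure of about 4 characters per token is a rule of thumb for English text, and the article size and `max_output_tokens` values are made-up assumptions, not measurements from this crew.

```python
# Rough check of whether N articles fit in the model's output limit.
# Assumption: ~4 characters per token (rule of thumb for English text).
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; use the model's real tokenizer for accuracy."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_output(num_articles: int, chars_per_article: int, max_output_tokens: int) -> bool:
    """Return True if the estimated output stays within the completion limit."""
    tokens_per_article = -(-chars_per_article // CHARS_PER_TOKEN)
    return num_articles * tokens_per_article <= max_output_tokens


# Hypothetical numbers: 25 articles of ~600 characters each
# (title, link, snippet, date, source, query).
print(fits_in_output(25, 600, max_output_tokens=1024))  # False: output would be cut off
print(fits_in_output(25, 600, max_output_tokens=4096))  # True: enough headroom
```

If the estimate is close to or above the limit, a list that stops after a few items with a truncated last entry is the expected symptom.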

If that’s not the case, can you confirm that the tool you’re using to get the articles actually returns 25 articles? Set the verbose parameter to True for all your agents and the crew to see what’s happening behind the scenes.

If that’s the case, then the agent is modifying the tool’s output. You can force tool output as a result by setting the result_as_answer parameter to True. As stated in the docs:

This parameter ensures that the tool output is captured and returned as the task result, without any modifications by the agent.
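
As a minimal sketch of that setting applied to the search tool from the question: `result_as_answer` is passed when the tool is constructed. `SEARCH_TOOL_KWARGS` and `build_article_researcher` are hypothetical names, the imports are deferred into the function only so the settings can be inspected without CrewAI installed, and this has not been run against crewai 0.80.0.

```python
# Hypothetical helper: the names below are illustrative, not part of the original crew.
SEARCH_TOOL_KWARGS = {
    "n_results": 15,           # same value as in the question
    "result_as_answer": True,  # return the tool output unmodified as the task result
}


def build_article_researcher(agent_config, llm):
    """Build the researcher agent with a search tool whose raw output becomes the task result."""
    # Imported lazily so this module can be loaded without crewai installed.
    from crewai import Agent
    from crewai_tools import SerperDevTool

    serper_tool = SerperDevTool(**SEARCH_TOOL_KWARGS)
    return Agent(config=agent_config, tools=[serper_tool], llm=llm, verbose=True)
```

Inside the crew class this would stand in for the body of `article_researcher`, for example `return build_article_researcher(self.agents_config['article_researcher'], llm)`. Since the agent no longer rewrites the tool output, any filtering or ranking of the articles presumably has to happen in a later task.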

bigilad · November 26, 2024, 6:21pm

I would start by saving the Serper results to a text file and reading its documentation on how to get more results. There are other search tools as well (Exa, Tavily).
Break the issue down into smaller ones.
Also, if the temperature of the LLM is not 0, you shouldn't expect to get the same results on every run.
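
A minimal sketch of the first suggestion, calling the Serper news endpoint directly (outside CrewAI) and saving the raw response so the result count can be checked per query. The request shape (`X-API-KEY` header, `q`/`num`/`tbs` fields) and the `"news"` response key are assumptions based on my understanding of the Serper API, so check them against its documentation; the function names and output file names are made up for illustration.

```python
import json
import urllib.request
from pathlib import Path

SERPER_NEWS_URL = "https://google.serper.dev/news"  # endpoint used in the question


def build_request(query: str, api_key: str, num: int = 15) -> urllib.request.Request:
    """Prepare a POST request for one query (no network access happens here)."""
    payload = json.dumps({"q": query, "num": num, "tbs": "qdr:w"}).encode("utf-8")
    headers = {"X-API-KEY": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(SERPER_NEWS_URL, data=payload, headers=headers, method="POST")


def fetch_results(query: str, api_key: str, num: int = 15) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(query, api_key, num), timeout=30) as response:
        return json.load(response)


def save_results(results: dict, path: Path) -> int:
    """Write the raw response to disk and return how many news items it contains."""
    path.write_text(json.dumps(results, indent=2, ensure_ascii=False), encoding="utf-8")
    return len(results.get("news", []))  # assumption: items are listed under "news"


def run(queries: list[str], api_key: str, out_dir: Path) -> None:
    """Fetch and save every query, printing the number of results for each."""
    for i, query in enumerate(queries, start=1):
        count = save_results(fetch_results(query, api_key), out_dir / f"serper_query_{i}.json")
        print(f"query {i}: {count} results saved")
```

Running `run(...)` with the three generated queries and the API key shows whether each query really yields 15 results before any agent touches them, which separates a search problem from an LLM output problem.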

Nexago · November 26, 2024, 7:26pm

Thanks for your help. I have set the result_as_answer parameter to True to force the tool output as the result, and raised the LLM's token limit to a higher value.
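
For readers landing here later, a sketch of what the LLM side of that change might look like. The original post does not show how `llm` was created, so the use of CrewAI's `LLM` class, the model name, and the numbers below are all assumptions chosen for illustration.

```python
# Hypothetical settings; the model name and limits are assumptions, not values from this thread.
LLM_SETTINGS = {
    "model": "gpt-4o-mini",
    "temperature": 0,    # reduces run-to-run variation, as suggested above
    "max_tokens": 4096,  # upper bound on the completion length
}


def build_llm():
    """Create the LLM object shared by the agents."""
    # Imported lazily so the settings can be inspected without crewai installed.
    from crewai import LLM

    return LLM(**LLM_SETTINGS)
```

The resulting object would be passed as `llm=` to each agent, as in the crew code from the question.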

rokbenko · November 26, 2024, 9:08pm

@Nexago Glad we solved the issue!

system · November 27, 2024, 9:08pm

This topic was automatically closed 24 hours after the last reply. New replies are no longer allowed.
