Hello Folks,
I am trying to create a simple agent that downloads a paper and some of its key citations from arXiv and then creates a presentation from them. However, the agent keeps getting stuck in a loop, calling the same tool with the same input over and over:
# Agent: You are a Junior Researcher
## Thought: Thought: I have attempted the same extraction process repeatedly without success due to an apparent issue with the execution of the tool. I will ensure that my input and execution format are accurate before running the extraction process for paper content.
## Using tool: ArXiv Multi Paper Extractor
## Tool Input:
"{\"paper_names\": [\"Decoupled graph convolution network for inferring substitutable and complementary items\", \"Inferring networks of substitutable and complementary products\", \"Bundle recommendation in e-commerce\"]}"
## Tool Output:
I tried reusing the same input, I must stop using this action input. I'll try something else instead.
# Agent: You are a Junior Researcher
## Thought: Thought: I will change my approach and gather the cited paper names into a clean format so that I can extract their details correctly.
## Using tool: ArXiv Multi Paper Extractor
## Tool Input:
"{\"paper_names\": [\"Decoupled graph convolution network for inferring substitutable and complementary items\", \"Inferring networks of substitutable and complementary products\", \"Bundle recommendation in e-commerce\"]}"
## Tool Output:
I tried reusing the same input, I must stop using this action input. I'll try something else instead.
# Agent: You are a Junior Researcher
## Thought: Thought: I need to find a way to retrieve the citation details without falling into the same repetitive execution. The plan is to execute a clean and concise extraction for the three cited papers.
## Using tool: ArXiv Multi Paper Extractor
## Tool Input:
"{\"paper_names\": [\"Decoupled graph convolution network for inferring substitutable and complementary items\", \"Inferring networks of substitutable and complementary products\", \"Bundle recommendation in e-commerce\"]}"
## Tool Output:
I tried reusing the same input, I must stop using this action input. I'll try something else instead.
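One thing that stands out in the log is that the tool input is a JSON object serialized inside a quoted string, rather than a plain object. Below is a minimal sketch, assuming the tool may receive either a dict or such a string, of how the input could be normalized before use. The helper name `normalize_paper_names` is hypothetical; it is not part of CrewAI or of the tools.py used here.

```python
import json
from typing import Any, List


def normalize_paper_names(tool_input: Any) -> List[str]:
    """Return paper names from a dict, a JSON string, or a double-encoded JSON string.

    Hypothetical helper for illustration only.
    """
    value = tool_input
    # Decode at most twice: once for an outer quoted string, once for the inner object.
    for _ in range(2):
        if isinstance(value, str):
            value = json.loads(value)
    if isinstance(value, dict):
        value = value.get("paper_names", [])
    if not isinstance(value, list):
        raise ValueError("expected a list of paper names")
    return [str(name).strip() for name in value]


# Rebuild the shape seen in the log: a JSON object serialized, then serialized again.
logged = json.dumps(json.dumps({"paper_names": ["Bundle recommendation in e-commerce"]}))
print(normalize_paper_names(logged))
```

If the tool only accepts a `List[str]`, a guard like this at the top of the tool function would at least make a malformed input fail loudly instead of silently returning nothing.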
Here is the complete code:
from crewai import Agent, Task, Crew
import os
from tools import arxiv_pdf_extractor, arxiv_paper_link_finder, arxiv_multi_paper_extractor
from crewai import LLM
from typing import List
from pydantic import BaseModel


class Citation(BaseModel):
    citation_name: str
    citation_url: str


class CrawlerOutput(BaseModel):
    paper_name: str
    paper_url: str
    paper_content: str
    citations: List[Citation]


# llm=LLM(model="ollama/phi", base_url="http://localhost:11434")
# os.environ["OPENAI_MODEL_NAME"] = 'o1-mini'
llm = LLM(model="gpt-4o")

arxiv_pdf_extractor_tool = arxiv_pdf_extractor  # @tool("ArXiv PDF Extractor")
arxiv_paper_link_tool = arxiv_paper_link_finder  # @tool("ArXiv Paper Link Finder")

crawler = Agent(
    role="Paper Crawler",
    goal="Download the paper at this {link}",
    backstory="""You are a crawler who downloads the paper at the given link""",
    allow_delegation=False,
    verbose=True,
    tools=[arxiv_pdf_extractor_tool]
)

junior_researcher = Agent(
    role="You are a Junior Researcher",
    goal="Provide the full content of the paper and all the content of the cited papers.",
    backstory="""
    You are an A+ Junior Researcher who extracts citations from the content provided by the Paper Crawler and then extracts the content of all the cited papers. You do this by searching for the cited papers using
    the arxiv_multi_paper_extractor to extract the summary of each cited paper. Make sure you pass paper names to the arxiv_multi_paper_extractor as the paper_names List[str] parameter. Make sure you only
    extract citations from the current paper and not from the cited papers.
    """,
    allow_delegation=True,
    verbose=True,
    tools=[arxiv_multi_paper_extractor]
)

researcher = Agent(
    role="Senior Researcher and Educator",
    goal="Take a look at the current paper and its citations and then use them to create detailed slides that teach the key concepts, methodologies, validation, and results of the paper",
    backstory="""
    You are an expert in the field of computer science, artificial intelligence, machine learning, deep learning, and mathematics.
    You look at all the content provided by the Paper Crawler and the Junior Researcher and create engaging presentation slides that teach the key concepts, focusing more on why and how they are implemented in the paper.
    If this is something new and novel, explain how it differs from the existing methods and why it is better.
    Always include the current deficiencies in the field and how the paper is addressing them.
    """,
    allow_delegation=False,
    verbose=True,
    llm=llm
)

# slides_creator = Agent(
# role="Engaging Slides Creator",
# goal="Create slides to teach the user about the paper at this {link} using the core ideas identified by the Expert Paper Researcher and the explanation provided by the Amazing Educator",
# backstory="You are an expert at creating slides to teach about a technical paper. You are very good at making the slides engaging and interesting to watch.",
# allow_delegation=False,
# verbose=True
# )
crawl = Task(
    description=(
        """ 1. Download the paper at this {link}
        2. Provide the full content of the paper and the top 3 citations to other papers that are referred to in the context of the topic
        """
    ),
    expected_output="""Full paper content and citations""",
    agent=crawler,
    verbose=True,
    # output_pydantic is the Task parameter that takes a Pydantic output model
    output_pydantic=CrawlerOutput
)

cited_papers_researcher = Task(
    description="""Take the content provided by the crawler
    1. Use the content provided by the crawler to extract the citations
    2. Use the arxiv_multi_paper_extractor to extract all the content of the top 5 cited papers. Make sure you pass the citation names to the arxiv_multi_paper_extractor as the paper_names List[str] parameter
    3. Don't extract citations from the cited papers
    4. Return the content of the current paper and all the content of the cited papers.
    """,
    expected_output="""Full content of the current paper, along with abstract and summary of the cited papers separated by '-------'""",
    verbose=True,
    agent=junior_researcher,
)

research = Task(
    description="""Take the content provided by the crawler and junior researcher
    1. Understand the whole paper and the citations that are referred to in the context of the topic
    2. Create detailed slides to teach the key concepts, explain how they are implemented in the paper and what results they got. Double down on techniques and technical details. Include any differences from the existing methods and why it is better.
    3. Include results and validation in the slides and explain them in detail.
    4. Have 4 to 5 bullet points in each slide and make it as engaging as possible.
    5. Include how this paper overcomes the current deficiencies in the field and how it is better than the existing methods.
    6. Important: Don't say they discussed this in the paper. Extract that topic from the paper and include it in the slides.
    7. Include a couple of slides that extract content from the cited papers and explain them in detail and how they are related to the current paper's topic.
    """,
    expected_output="""Provide a high level overview of the paper and the core contribution. Please generate content for a presentation; don't include more than 15 slides.
    Include content which you think are important things readers should know about the paper. Use storytelling techniques or other frameworks to make it more engaging.
    Use first person POV to tell the story. Provide the content in a markdown format that I can just copy and paste into the slides.""",
    verbose=True,
    agent=researcher,
    llm=llm
)

# research = Task(
# description="""Take the content provided by the crawler and junior researcher
# 1. Understand the whole paper and the citations that are referred to in the context of the topic
# 2. Create detailed slides to teach the key concepts, explain how they are implemented in the paper and what results they got. Double down on techniques and technical details. Include any differences from the existing methods and why it is better.
# 3. Include results and validation in the slides and explain them in detail.
# 4. Have 4 to 5 bullet points in each slide and make it as engaging as possible.
# 5. Here is how I want the slides to be structured:
# Slide 1: Title of the paper and authors
# Slide 2: Abstract
# Slide 3: Introduction
# Slide 4: Problem Statement
# Slide 5: Current Deficiencies in the field
# Slide 6: Proposed Solution
# Slide 7: How it works
# Slide 8: Detailed Technical Breakdown and techniques
# Slide 9: Any Charts or Graphs
# Slide 10: Experiments and Results
# Slide 11: Analysis
# Slide 12: Conclusion and Future Work
# """,
# expected_output="""A detailed 12 slides presentation that teaches the key concepts,key techniques, detailed implementation, and results""",
# verbose=True,
# agent=researcher
# )

crew = Crew(
    agents=[crawler, junior_researcher, researcher],
    tasks=[crawl, cited_papers_researcher, research],
    verbose=True,
    memory=True
)

result = crew.kickoff(inputs={"link": "https://arxiv.org/pdf/2402.03277"})
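
The contents of tools.py are not shown above. As a rough sketch of the kind of lookup the multi-paper tool is meant to perform, the snippet below turns a `paper_names` list into title queries against the public arXiv API query endpoint. This is an illustration under stated assumptions, not the actual tool code: the function name `build_title_queries` is hypothetical, and it only builds the URLs without sending any request.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Public arXiv API query endpoint; `ti:` restricts the search to the title field.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_title_queries(paper_names: List[str], max_results: int = 1) -> Dict[str, str]:
    """Map each non-empty paper title to an arXiv API query URL (hypothetical helper)."""
    queries: Dict[str, str] = {}
    for name in paper_names:
        title = " ".join(name.split())  # collapse stray whitespace and newlines
        if not title:
            continue  # skip blank entries rather than sending an empty query
        params = {
            "search_query": f'ti:"{title}"',
            "start": 0,
            "max_results": max_results,
        }
        queries[title] = f"{ARXIV_API}?{urlencode(params)}"
    return queries


urls = build_title_queries(["Bundle recommendation in e-commerce", "  "])
for title, url in urls.items():
    print(title, "->", url)
```

Building the queries separately from fetching them makes it easy to check what the tool actually receives from the agent before any network call is made.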