New to CrewAI & I’m investigating how I can remove some of the ‘fuzziness’ in the interfaces between tasks by way of having a known input & output data schema.
Got this from Crew GPT Assistant:
from crewai import Agent, Task, Crew, Process
from pydantic import BaseModel, Field
# Define a Pydantic model to validate order data
class OrderInputModel(BaseModel):
    order_id: int
    product_name: str
    quantity: int = Field(gt=0, description="Quantity must be greater than zero")
# Define a Pydantic model for the output structure
class OrderAnalysisOutputModel(BaseModel):
    total_orders: int
    top_product: str
    total_quantity: int
# Create an agent to analyze orders
order_analyst = Agent(
    role='Order Analyst',
    goal='Analyze the list of orders and provide a summary report.',
    verbose=True,
    memory=True,
    backstory="You are an expert in processing and analyzing order data."
)
# Define the task using the Pydantic models for input and output validation
order_analysis_task = Task(
    description="Analyze the incoming list of orders and provide a report on total orders, top product, and total quantity.",
    input_model=OrderInputModel,  # Pydantic model for input validation
    output_model=OrderAnalysisOutputModel,  # Pydantic model for output structure
    agent=order_analyst
)
# Define the crew and process
order_analysis_crew = Crew(
    agents=[order_analyst],
    tasks=[order_analysis_task],
    process=Process.sequential
)
# Sample input data (list of orders)
input_data = [
    {"order_id": 1, "product_name": "Laptop", "quantity": 2},
    {"order_id": 2, "product_name": "Monitor", "quantity": 5},
    {"order_id": 3, "product_name": "Laptop", "quantity": 3}
]
# Kick off the task
result = order_analysis_crew.kickoff(inputs={'input_data': input_data})
# The result will be validated and structured by the output_model
print(result)
I am guessing that the ‘input_model’/‘output_model’ params are hallucinations, as I cannot find any other reference to them! However, it would be good if it could work that way: Pydantic model → input, Pydantic model → output.
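If `input_model` really is not a `Task` parameter, one workaround is to validate the rows yourself with Pydantic before handing them to `kickoff`. A minimal sketch, assuming the `OrderInputModel` from the snippet above; `validate_orders` is a hypothetical helper of mine, not part of CrewAI:

```python
from typing import Any, Dict, List

from pydantic import BaseModel, Field, ValidationError


class OrderInputModel(BaseModel):
    order_id: int
    product_name: str
    quantity: int = Field(gt=0, description="Quantity must be greater than zero")


def validate_orders(rows: List[Dict[str, Any]]) -> List[OrderInputModel]:
    """Hypothetical helper: raises ValidationError on the first bad row."""
    return [OrderInputModel(**row) for row in rows]


input_data = [
    {"order_id": 1, "product_name": "Laptop", "quantity": 2},
    {"order_id": 2, "product_name": "Monitor", "quantity": 5},
    {"order_id": 3, "product_name": "Laptop", "quantity": 3},
]

# All three sample rows satisfy the model.
orders = validate_orders(input_data)

# A row with quantity=0 violates the gt=0 constraint and is rejected.
try:
    validate_orders([{"order_id": 4, "product_name": "Mouse", "quantity": 0}])
    rejected = False
except ValidationError:
    rejected = True
```

Only rows that pass would then go into `kickoff(inputs=...)`, so the agent never sees malformed orders.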
Back to my question: how can I get a consistent data schema as output from a task, and how do I use that known schema within the next task?
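For what it's worth, here is a rough sketch of how this might look with `output_pydantic` on the first task and `context` on the second. It assumes a CrewAI version whose `Task` accepts `expected_output`, `output_pydantic` and `context`; the second task and its wording are made up for illustration, and `build_crew` is a hypothetical wrapper so that nothing runs until an LLM is configured:

```python
from pydantic import BaseModel


class OrderAnalysisOutputModel(BaseModel):
    total_orders: int
    top_product: str
    total_quantity: int


def build_crew():
    """Build a two-task crew. crewai is imported lazily, so the model above
    can be used on its own without crewai installed."""
    from crewai import Agent, Crew, Process, Task

    analyst = Agent(
        role="Order Analyst",
        goal="Analyze the list of orders and provide a summary report.",
        backstory="You are an expert in processing and analyzing order data.",
    )
    analysis = Task(
        description="Analyze these orders: {orders}",
        expected_output="Total orders, top product, and total quantity.",
        agent=analyst,
        output_pydantic=OrderAnalysisOutputModel,  # ask for structured output
    )
    followup = Task(  # hypothetical second task, for illustration only
        description="Write a one-paragraph restocking note from the analysis.",
        expected_output="A short paragraph.",
        agent=analyst,
        context=[analysis],  # the first task's output is passed along as context
    )
    return Crew(
        agents=[analyst],
        tasks=[analysis, followup],
        process=Process.sequential,
    )
```

Running it would be something like `build_crew().kickoff(inputs={"orders": input_data})`, which needs an LLM configured. Parameter names may differ between CrewAI versions, so check the Task docs for the one you have installed.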
@matt this seems to be coming up everywhere in the posts here over the last few days. I agree that output_pydantic or output_json has helped me get more consistent results. Even a dumb dictionary in the narrative of the task's expected output helps. It would be great to do a somewhat sophisticated example of this and talk about best practices for consuming it in a subsequent task.
I was going to start some experiments on using all caps, threatening the LLM, using special delimiters around the data passed between tasks, etc., to compare how correctly and consistently these items are recognized in the jumble of context passed to the model as the prompt grows in complexity. Does the team have any data on this?
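A minimal sketch of how the delimiter part of such an experiment could be scored, standard library only. The delimiters, the required keys and the sample replies are all made up; in a real run the replies would come from the model:

```python
import json
import re
from typing import List, Optional

START, END = "<<<JSON>>>", "<<<END>>>"  # arbitrary, hypothetical delimiters
REQUIRED_KEYS = {"total_orders", "top_product", "total_quantity"}


def wrap(payload: dict) -> str:
    """Wrap a payload in delimiters before placing it in a prompt."""
    return f"{START}{json.dumps(payload)}{END}"


def extract(reply: str) -> Optional[dict]:
    """Return the delimited JSON object from a reply, or None if it is
    missing, malformed, or lacks a required key."""
    pattern = re.escape(START) + r"(.*?)" + re.escape(END)
    match = re.search(pattern, reply, re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


def success_rate(replies: List[str]) -> float:
    """Fraction of replies from which a valid payload could be recovered."""
    if not replies:
        return 0.0
    return sum(extract(r) is not None for r in replies) / len(replies)


good = "Here you go: " + wrap(
    {"total_orders": 3, "top_product": "Laptop", "total_quantity": 10}
)
bad = "Total orders: 3, top product: Laptop"
rate = success_rate([good, bad])
```

Swapping in different prompt styles (all caps, other delimiters) and comparing `success_rate` over the same inputs would give a rough consistency number per style.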
I haven’t looked in the code, but how does crewai treat the data in the json object when it is output from a task? Do you embed it any special way in the overall context you keep for the whole process?
From the docs: Output Pydantic (optional), output_pydantic: Outputs a Pydantic model object, requiring an OpenAI client. Only one output format can be set.
Outputs a Pydantic model object
requiring an OpenAI client
Only one output format can be set
All 3 points raise more questions than answers for me
This is a straightforward example of structured output using crewai tools. What if you then use task.output, take what you want from that and pass it as context to the next task?
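A small sketch of that idea without any CrewAI calls: validate the raw text of the finished task, pick out the fields you care about, and build the next task's description from them. `raw_output` is a made-up stand-in for whatever `task.output` gives you in your version, and the follow-up wording is purely illustrative:

```python
import json

from pydantic import BaseModel


class OrderAnalysisOutputModel(BaseModel):
    total_orders: int
    top_product: str
    total_quantity: int


# Stand-in for the raw text of a finished task's output (made-up sample).
raw_output = '{"total_orders": 3, "top_product": "Laptop", "total_quantity": 10}'

# Validate before trusting it; this raises if a field is missing or mistyped.
analysis = OrderAnalysisOutputModel(**json.loads(raw_output))

# Keep only what the next task needs and state it explicitly.
next_description = (
    f"Draft a restocking note for '{analysis.top_product}'. "
    f"In total, {analysis.total_quantity} units were ordered "
    f"across {analysis.total_orders} orders."
)
```

The point of going through the model is that the next task only ever sees values that passed validation, instead of a free-form blob.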
I’m trying to get output_json to work as well, and I’m struggling to find the right syntax.
In the task I’m setting expected_output to the string "Return the results as JSON following this schema [{'url': 'url', 'location': 'location', 'expiration_date': 'expiration_date'}]".
from typing import List
from pydantic import BaseModel, Field
# One extracted article
class ResearchReport(BaseModel):
    Title: str = Field(..., description="The title of the article.")
    URL: str = Field(..., description="The URL of the article.")
    Author: List[str] = Field(..., description="An author of the article.")
    Published_Date: str = Field(..., format="MM/YYYY", description="The publication date of the article using the format MM/YYYY.")
    Methodology: str = Field(..., description="The methods or procedures used in the experiment in the article in great DETAIL.")
    Drug: str = Field(..., description="The specific drug tested in the article.")
    Dosage: str = Field(..., description="The dosage amount and frequency of the drug used in the article.")
    Results: str = Field(..., description="The results or outcome of the experiment in the article in great DETAIL.")
# Wrapper so the task can return a list of reports under a single key
class ResearchReportList(BaseModel):
    root: List[ResearchReport]
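One thing that may help with a wrapper model like this: generate the expected_output text from the model instead of typing the schema by hand, so the prompt and the validation cannot drift apart. A sketch using the url/location/expiration_date fields from the string above; `Listing` and `ListingList` are hypothetical names, the field descriptions are invented, and `reply` is a made-up sample:

```python
import json
from typing import List

from pydantic import BaseModel, Field


class Listing(BaseModel):
    url: str = Field(..., description="URL of the listing.")
    location: str = Field(..., description="Location of the listing.")
    expiration_date: str = Field(..., description="Expiration date of the listing.")


class ListingList(BaseModel):
    root: List[Listing]


# Pydantic v2 exposes model_json_schema(); fall back to v1's schema().
schema_fn = getattr(ListingList, "model_json_schema", None) or ListingList.schema
schema_text = json.dumps(schema_fn(), indent=2)
expected_output = f"A JSON object matching this schema:\n{schema_text}"

# Made-up reply, used only to show the validation step.
reply = (
    '{"root": [{"url": "https://example.com/a", '
    '"location": "Berlin", "expiration_date": "2025-01-31"}]}'
)
listings = ListingList(**json.loads(reply))
```

As far as I can tell, `output_json` expects the model class itself, i.e. `Task(..., expected_output=expected_output, output_json=ListingList)`, but treat that as an assumption and check it against the docs for your CrewAI version. Note that with a `root` field the model expects an object with a "root" key, not a bare list.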