Issue: LLM Hallucination in Structured Data Extraction (CrewAI)

Category: CrewAI, Tools Issue, Task Processing

Description of the Problem:

We are using CrewAI to extract structured real estate details from property descriptions provided in a CSV file. The dataset contains columns such as House Address, Guide Price, and Description, where the Description column holds relevant property details such as:

  • Tenure (Freehold/Leasehold)
  • Number of Bedrooms
  • House Type (Flat, Detached, Bungalow, etc.)
  • Annual Ground Rent (If applicable)
  • EPC Rating (Energy Efficiency Rating)

The issue arises when extracting structured details from the unstructured Description field. The LLM is hallucinating incorrect values and generating outputs that do not match the dataset.

What We Have Tried:

  1. Refining the Task Prompt
  • Clearly instructing the agent to extract only explicitly mentioned details and return “Not Provided” for missing fields.
  • Ensuring correct UK real estate terminology is used.
  2. Adjusting the LLM Configuration (see the sketch after this list)
  • Setting temperature to 0.0 to reduce randomness.
  • Using DeepSeek-R1 (1.5B) on Ollama for structured text extraction.
  3. Testing the CSV Tool Independently
  • The CSVSearchTool correctly retrieves the Description column.
  • However, when the agent processes the descriptions, it fabricates incorrect data.
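
For reference, the LLM configuration from step 2 looks roughly like this (a minimal sketch; the model tag and base_url are assumptions, adjust them to your local Ollama setup):

```python
from crewai import LLM

# Pin temperature to 0.0 for deterministic, extraction-style output.
# Model tag and base_url are illustrative -- match them to your setup.
llm = LLM(
    model="ollama/deepseek-r1:1.5b",
    base_url="http://localhost:11434",
    temperature=0.0,
)
```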

Code Implementation Overview:

  • Agents Configuration (agents.yaml); all three files are sketched after this list

    • Role: Real Estate Data Analyst
    • Goal: Extract only factual details without making assumptions.
    • Backstory: Ensures structured extraction and correct data alignment.
  • Task Definition (tasks.yaml)

    • Extract details from the Description column and return structured CSV output.
    • Strict rules to avoid hallucination and ensure factual accuracy.
  • Crew Configuration (crew.py)

    • LLM: DeepSeek-R1 (1.5B)
    • Task Inputs: Property descriptions from the CSV.
    • Output: Structured CSV file with extracted details.
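
The actual files are longer, but stripped down they look roughly like this (sketches only; the `property_analyst` and `extract_property_details` names are illustrative, not our exact identifiers):

```yaml
# agents.yaml (sketch)
property_analyst:
  role: Real Estate Data Analyst
  goal: Extract only factual details without making assumptions.
  backstory: >
    Ensures structured extraction and correct alignment with the source data.
```

```yaml
# tasks.yaml (sketch)
extract_property_details:
  description: >
    Extract Tenure, Number of Bedrooms, House Type, Annual Ground Rent,
    and EPC Rating from the Description column. Return "Not Provided"
    for any field that is not explicitly mentioned.
  expected_output: One structured CSV row per property.
  agent: property_analyst
```

And the crew wiring (a sketch, assuming direct instantiation rather than the `@CrewBase` decorator style):

```python
# crew.py (sketch)
from crewai import Agent, Crew, Process, Task, LLM

llm = LLM(model="ollama/deepseek-r1:1.5b", temperature=0.0)

analyst = Agent(
    role="Real Estate Data Analyst",
    goal="Extract only factual details without making assumptions.",
    backstory="Ensures structured extraction and correct data alignment.",
    llm=llm,
)

extract = Task(
    description=(
        "Extract Tenure, Bedrooms, House Type, Annual Ground Rent and "
        "EPC Rating from this property description. Return 'Not Provided' "
        "for any missing field.\n\nDescription: {description}"
    ),
    expected_output="The five fields as a structured record.",
    agent=analyst,
)

crew = Crew(agents=[analyst], tasks=[extract], process=Process.sequential)
result = crew.kickoff(inputs={"description": "..."})  # one description per run
```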

Current Issue:

  • The agent sometimes makes up property details (wrong house type, incorrect tenure, fabricated EPC ratings).
  • The output CSV is not aligned with the original dataset.
  • Even when explicitly told to return “Not Provided” for missing data, it generates incorrect values instead.

Question to the Community:

Has anyone faced similar issues with CrewAI or LLM hallucination in structured data extraction?
Are there specific techniques or settings that helped in ensuring factual consistency when extracting structured details?

Any insights on refining prompting techniques, CrewAI task configuration, or LLM adjustments to mitigate hallucination would be greatly appreciated.

Have you tried using Pydantic models?

Hello Alain! I believe I have not tried them. Would you please guide me on this? Currently I'm running DeepSeek-R1 1.5B locally on Ollama and using that.

Check the docs; also, my experience is that smaller models perform poorly when it comes to structured output. Experiment with other models if you can, work on your prompts, include examples (see the sketches below) … I am no expert, but I struggled with this as well.
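
In CrewAI that usually means defining a Pydantic schema and passing it to the task via `output_pydantic`. A minimal sketch (field names are illustrative, and `analyst` stands in for your existing agent):

```python
from pydantic import BaseModel
from crewai import Task

class PropertyDetails(BaseModel):
    # All fields are strings so the model can return "Not Provided".
    tenure: str
    bedrooms: str
    house_type: str
    annual_ground_rent: str
    epc_rating: str

extract = Task(
    description=(
        "Extract the property details from: {description}. "
        "Use 'Not Provided' for any field not explicitly stated."
    ),
    expected_output="A PropertyDetails object.",
    agent=analyst,  # your existing agent
    output_pydantic=PropertyDetails,
)
```

With `output_pydantic`, the task output is parsed and validated against the schema rather than left as free-form text, which tends to cut down on invented fields.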
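
And for in-prompt examples, a few-shot block appended to the task description might look like this (the example descriptions are invented):

```python
# Invented examples showing the exact output format expected,
# including "Not Provided" for absent fields.
FEW_SHOT = """
Description: "A charming two-bedroom leasehold flat with EPC rating C."
Output: Tenure=Leasehold; Bedrooms=2; House Type=Flat; Annual Ground Rent=Not Provided; EPC Rating=C

Description: "Detached family home offered freehold."
Output: Tenure=Freehold; Bedrooms=Not Provided; House Type=Detached; Annual Ground Rent=Not Provided; EPC Rating=Not Provided
"""
```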

Thank you so much for the suggestion, Alain.

I have been trying a lot of DeepSeek models. Currently I am not even using any tool like the CSV or JSON search tools; I am manually preprocessing the CSV file and giving only the required column to the model, roughly as in the sketch below.
Still, I am not getting exactly what I require.
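
The preprocessing looks roughly like this (a sketch; `properties.csv` is a placeholder filename):

```python
import pandas as pd

# Load the source file and keep only the column the agent needs.
df = pd.read_csv("properties.csv")
descriptions = df["Description"].dropna().tolist()

# Each description is then passed to the crew one at a time.
for desc in descriptions:
    result = crew.kickoff(inputs={"description": desc})
```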

I tried giving the complete prompt to the same model directly, without CrewAI, and got the right output.
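
The direct test looked roughly like this (a sketch, assuming the `ollama` Python package; `prompt` is the same extraction prompt):

```python
import ollama

# Same extraction prompt, sent straight to the model with no framework.
response = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[{"role": "user", "content": prompt}],
    options={"temperature": 0.0},
)
print(response["message"]["content"])
```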

It would be very helpful if someone could guide me through this blocker.

Hi, @Naveed_Ali. I think you'd have more success if you could illustrate your problem in a more concrete way. Even if your solution contains confidential data, you could anonymize it and perhaps generate 5 or 10 sample cases (CSV rows) of the source data, along with examples of how you would like the final data to be presented.
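
For example, a couple of entirely made-up rows in the shape of your source data:

```csv
House Address,Guide Price,Description
"1 Example Road, Testtown","£250,000","Two bedroom leasehold flat, EPC rating C, £150 annual ground rent."
"2 Sample Lane, Testtown","£475,000","Detached four bedroom freehold family home. EPC rating B."
```

plus the matching rows you would want in the extracted output.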