I built an AMS project using CrewAI that integrates with JIRA, ServiceNow, GitHub, and a knowledge base (RAG). I am facing a lot of hallucination issues. Can you please tell me how to fix them?
That’s pretty cool that you got all those integrations working. Would love to hear how easy it was. Or hard. I’d like to try the same thing for my company.
As for hallucinations: do you have observability on your LLM calls? This will help you identify where in your pipeline you need more focus. Hallucinations are a challenge we all face.
Look at your task prompt and be sure to tell it to only use retrieved sources.
Look at your retrieved chunks to see if it is even possible to answer the question based on them. These first two checks are a great place to start.
Afterwards, you will know whether you need to change your embedding model or your prompt expansion.
Then you can check your output response and play with the LLM used in response.
Lastly, add citations so readers can check the answer against its sources.
Good luck
This is not happening in RAG; it is happening with JIRA and ServiceNow. Even when nothing related to a project ID exists, and the project ID is not present at all, the agent reports that it is creating the ticket. The same happens with ServiceNow: it generates random INC numbers and URLs.
Great work.
For me, when I face hallucinations, I break the crew down into smaller parts. It takes a little time, but break it down and then find the problematic area.
Hope this helps
I’m curious, what sort of hallucinations have you seen, for example?
Generally they will get lost if I am too broad. I call a hallucination anything that goes off my script.
Could it be due to the temperature setting of the LLM? Which LLM are you using, and can you share your settings?
This usually comes down to task boundary leakage + missing termination conditions.
In CrewAI I’ve seen hallucinations spike when agents are allowed to “self-extend” tasks without explicit success/fail criteria.
What helped me:
- hard stop conditions per task
- explicit “no new assumptions” system instruction
- logging intermediate agent thoughts (even briefly) to spot divergence
Curious — are your hallucinations appearing mid-task or at final output?
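To make "hard stop conditions per task" a bit more concrete, here is a minimal sketch in plain Python. It is not CrewAI code: `run_with_hard_stop`, `TaskResult`, and the `DONE:` / `NO_ACTION:` prefixes are all names I made up for illustration, and `fake_step` stands in for whatever actually calls the agent. The idea is only that each task has explicit success and failure criteria plus a step budget, so it cannot keep extending itself.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TaskResult:
    status: str             # "success", "failed", or "aborted"
    output: Optional[str]   # last output, or None when the budget ran out
    steps_used: int


def run_with_hard_stop(
    step: Callable[[int], str],
    is_success: Callable[[str], bool],
    is_failure: Callable[[str], bool],
    max_steps: int = 3,
) -> TaskResult:
    """Run `step` until a success or failure criterion fires, or the budget is spent."""
    for i in range(1, max_steps + 1):
        output = step(i)
        if is_success(output):
            return TaskResult("success", output, i)
        if is_failure(output):
            return TaskResult("failed", output, i)
    # Budget exhausted: stop rather than let the agent keep extending the task.
    return TaskResult("aborted", None, max_steps)


# Hypothetical stand-in for one agent step; a real one would call the LLM.
def fake_step(i: int) -> str:
    return "TICKET_LOOKUP: not found" if i < 2 else "NO_ACTION: project does not exist"


result = run_with_hard_stop(
    fake_step,
    is_success=lambda out: out.startswith("DONE:"),
    is_failure=lambda out: out.startswith("NO_ACTION:"),
)
print(result.status, result.steps_used)  # prints: failed 2
```

The useful part is that "failed" and "aborted" are first-class outcomes. An agent that is allowed to report "nothing to do" has less pressure to invent a ticket.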
Interesting problem — this might actually be a bit different from the usual “hallucination” people talk about.
In many agent systems the issue isn’t only hallucinated text, but what I’d call action hallucination. The agent starts executing tools or workflows even when the underlying signal is weak or ambiguous.
When tools like JIRA or ServiceNow are connected, a few things can help:
• Make tool descriptions extremely explicit about when they should be used.
• Add constraints in the task prompt (for example: only create tickets if a verified incident exists).
• Log the reasoning + tool calls so you can trace why the agent decided to create something.
In practice I’ve found the biggest stability improvement comes from adding a validation layer between the agent and external systems. Instead of executing actions immediately, the system checks whether the action is consistent with the context.
Curious if others here have run into similar “action hallucination” behavior when agents interact with enterprise tools.
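For the fabricated INC numbers mentioned earlier in the thread, one cheap check is to compare the IDs in the agent's final answer against the IDs that actually came back from tool calls. This is only a sketch under assumptions: the ID formats in `ID_PATTERN` are my guess at typical ServiceNow and JIRA conventions, and `unverified_ids` is a hypothetical helper, not part of any library. It assumes you already log raw tool outputs as strings.

```python
import re

# Assumed ID formats: ServiceNow incidents like INC0012345, JIRA keys like OPS-42.
# Adjust to whatever your instances really use.
ID_PATTERN = re.compile(r"\b(?:INC\d{7}|[A-Z][A-Z0-9]+-\d+)\b")


def unverified_ids(agent_answer: str, tool_outputs: list[str]) -> set[str]:
    """Return IDs the agent mentions that never appeared in any real tool output."""
    claimed = set(ID_PATTERN.findall(agent_answer))
    seen: set[str] = set()
    for output in tool_outputs:
        seen.update(ID_PATTERN.findall(output))
    return claimed - seen


answer = "Created INC0098765 and linked it to OPS-42."
tool_outputs = ['{"key": "OPS-42", "status": "created"}']

print(unverified_ids(answer, tool_outputs))  # prints: {'INC0098765'}
```

Any non-empty result means the answer cites something no tool ever returned, so it can be rejected or sent back to the agent before a user sees it. The JIRA-style pattern is loose and will also match strings like "UTF-8", so expect to tune it.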
@Bin_Zhang Thank you for the explanation. I would like to better understand what is meant by the validation layer. Is it intended to function as a set of guardrails, checkpoints, or another form of control mechanism? Additionally, where would this layer sit within the architecture, and how would adding an extra layer contribute to improved performance? Is it possible to achieve a similar result by designing the tasks to be more deterministic instead?
Good question.
The validation layer I mentioned is basically a control point between the agent and external systems.
Instead of letting the agent execute actions directly, the flow becomes:
agent → validation layer → tools / APIs
So the agent proposes an action, and the system checks it before execution.
Typical checks are simple things like:
• does the action match the current context
• is the tool call structurally valid
• is the action allowed by policy
• should the action be blocked or delayed
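A rough sketch of what those checks could look like in code, in plain Python. Everything here is an assumption for illustration: `ProposedAction`, `Verdict`, `validate`, the tool name `create_jira_ticket`, and the required-argument table are hypothetical and do not come from CrewAI, JIRA, or ServiceNow. The set of known project IDs is assumed to be fetched from the real system beforehand.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedAction:
    tool: str
    args: dict


@dataclass
class Verdict:
    allowed: bool
    reasons: list = field(default_factory=list)


# Illustrative policy tables (assumptions, not a real API).
ALLOWED_TOOLS = {"create_jira_ticket"}
REQUIRED_ARGS = {"create_jira_ticket": {"project_id", "summary"}}


def validate(action: ProposedAction, known_project_ids: set) -> Verdict:
    """Check a proposed tool call before anything is sent to an external system."""
    reasons = []
    # Policy: is this tool allowed at all?
    if action.tool not in ALLOWED_TOOLS:
        reasons.append(f"tool not allowed: {action.tool}")
    # Structure: are all required arguments present?
    missing = REQUIRED_ARGS.get(action.tool, set()) - set(action.args)
    if missing:
        reasons.append(f"missing args: {sorted(missing)}")
    # Context: does the referenced project actually exist?
    project_id = action.args.get("project_id")
    if project_id is not None and project_id not in known_project_ids:
        reasons.append(f"unknown project_id: {project_id}")
    # Block whenever any check failed.
    return Verdict(allowed=not reasons, reasons=reasons)


known = {"OPS", "AMS"}
bad = ProposedAction("create_jira_ticket", {"project_id": "XYZ", "summary": "Disk full"})
verdict = validate(bad, known)
print(verdict.allowed, verdict.reasons)  # prints: False ['unknown project_id: XYZ']
```

The tool wrapper would call `validate` first and only hit the real API when `allowed` is true. When it is false, the reasons can be returned to the agent as the tool result, so it reports a failure instead of a made-up ticket.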
The reason I find this useful is that many agent failures are not just text hallucinations but what I call action hallucination.
The agent decides to execute something even though the signal is weak or ambiguous.
This becomes risky once agents are connected to systems like JIRA, databases, or internal APIs.
The validation layer acts as a control point between reasoning and execution.
Another benefit is that every action can be logged so the full chain of decisions can be reconstructed later.
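As a small illustration of the logging point, here is one way to record each proposed action as a JSON line so the chain can be replayed later. This is a sketch, not the layer I am building: `log_action` and `read_trace` are hypothetical helpers, and the in-memory `io.StringIO` stands in for a real log file.

```python
import io
import json
import time


def log_action(stream, tool: str, args: dict, allowed: bool, reason: str = "") -> dict:
    """Append one proposed action, and the decision taken on it, as a JSON line."""
    record = {"ts": time.time(), "tool": tool, "args": args,
              "allowed": allowed, "reason": reason}
    stream.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_trace(stream) -> list:
    """Read the whole trace back in the order it was written."""
    stream.seek(0)
    return [json.loads(line) for line in stream if line.strip()]


trace = io.StringIO()  # stand-in for an append-only log file
log_action(trace, "create_jira_ticket", {"project_id": "XYZ"}, False, "unknown project_id")
log_action(trace, "create_jira_ticket", {"project_id": "OPS"}, True)

for entry in read_trace(trace):
    print(entry["tool"], entry["args"]["project_id"], entry["allowed"])
# prints:
# create_jira_ticket XYZ False
# create_jira_ticket OPS True
```

Logging blocked actions as well as executed ones is what makes the trace useful for debugging: you can see what the agent tried to do, not only what got through.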
I’ve been experimenting with this idea as an execution-integrity layer for agent runtimes:
Still early exploration, but the goal is to make action traces deterministic and portable across frameworks like LangGraph, CrewAI, and AutoGen.
