Inconsistent Results in Resume Ranking Application with CrewAI

I’ve built a resume ranking application using CrewAI that matches resumes against job descriptions. While the code runs without errors, I’m facing a critical issue: the same set of resumes receives different rankings and scores with each run, making the application unreliable for business use.

Below are the YAML files that I used:

Tasks.yaml

talent_acquisition_task:
  description: >
    Go through all the resumes in the given drive link {link}. Look through each and
    every document carefully for the requirements stated in {requirements}.
  expected_output: >
    Those who qualify for a minimum 75% of the requirements, their resumes are
    then sent to senior recruiter.
  agent: talent_acquisition_specialist

recruiting_task:
  description: >
    Review all the resumes sent by talent acquisition specialist and shortlist the
    candidates for interviews ensuring there are no duplicate resumes. Also,
    segregate the names of those candidates who didn’t get shortlisted for
    interviews but could be a good fit for other job roles.
  expected_output: >
    State the names of candidates who got shortlisted for interviews along with
    their requirement match percentage and the detailed reason why they were
    chosen. Sort these candidates in the descending order of match percentage
    starting with the candidates with highest score.
    State the names of those candidates who did not get shortlisted for interviews
    but could be a good fit for other roles based on their resume
    and they should be different from those shortlisted candidates. Candidates who
    have been already shortlisted should not be considered for other roles.
    Do not change the match percentage score of a candidate when its profile is
    submitted again and again.
  agent: senior_recruiter

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

agents.yaml

talent_acquisition_specialist:
  role: >
    Talent Acquisition Specialist
  goal: >
    Review the resumes submitted by candidates
  backstory: >
    You’re a part of talent acquisition at a leading Multinational Company and have
    been known to get the best candidates for the required position.

senior_recruiter:
  role: >
    Senior Recruiter
  goal: >
    Compare the resumes sent by talent acquisition specialist and shortlist
    candidates for interview
  backstory: >
    You’re a senior recruiter at a leading Multinational Company and have been
    known to shortlist the outstanding candidates for interviews.

How can I achieve completely deterministic and consistent results from CrewAI for resume evaluation without making substantial code changes?

Is there a recommended pattern for ensuring LLM outputs remain identical across multiple runs when using CrewAI? Are there any configurations or techniques specifically for enforcing consistency in agent evaluations?

Any solutions from the community would be greatly appreciated!

Hi Tanmay,
The first point to understand is that LLMs have very low repeatability.
What does this mean? Consider a calculator: every time you enter the same numbers and operations, you are guaranteed to get the same result, i.e. 100% repeatability. LLMs, as you have discovered, have very low repeatability and will often give differing results for the same input. The results may all be semantically correct, but the actual text can differ from run to run.
Going deeper: LLMs are repeatable in one limited sense. At each step they produce a list of probabilities for what the next word should be and choose based on those probabilities. The variance comes from the fact that these probabilities depend on context, which can include words generated many steps earlier. Because that earlier text was itself chosen by probability, small differences compound, and it becomes almost impossible for an LLM to have a good repeatability score.
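
To make the "list of probabilities" point concrete, here is a minimal sketch in plain Python. It is not how CrewAI or any particular model is implemented; the candidate outputs, the scores, and the function names are all made up for illustration. It contrasts sampling from the probabilities (different runs can pick different outputs) with always taking the most probable option (the same pick every time). Many LLM APIs expose a temperature setting that moves sampling toward the second behaviour; whether and how your CrewAI setup passes that through is something to check in its documentation.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_sampled(tokens, logits, rng, temperature=1.0):
    """Choose one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


def pick_greedy(tokens, logits):
    """Always choose the single most probable token."""
    best = max(range(len(tokens)), key=lambda i: logits[i])
    return tokens[best]


# Hypothetical candidates for "the match percentage is ..." (illustrative only).
TOKENS = ["82%", "78%", "85%"]
LOGITS = [2.0, 1.6, 1.5]

# Each seed stands in for a separate run of the application.
sampled = {pick_sampled(TOKENS, LOGITS, random.Random(seed)) for seed in range(200)}
greedy = {pick_greedy(TOKENS, LOGITS) for _ in range(200)}

print("distinct outputs when sampling:", sorted(sampled))
print("distinct outputs when greedy:  ", sorted(greedy))
```

In a real model this choice is repeated for every word, and each choice changes the context for the next one, which is why small differences early on can lead to noticeably different scores.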

You can improve LLM repeatability by giving it a more narrowly defined task/prompt. I suggest splitting the task over several agents, which lets you focus each prompt on more specific requirements.
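
One way to read "more defined" is to ask the LLM only for a yes/no judgement per requirement and leave the arithmetic, the 75% cut-off, and the sorting to ordinary code. The sketch below assumes exactly that and is not a CrewAI feature: `match_percentage`, `ScoreCache`, and `shortlist` are hypothetical helper names, and the judgements are hand-written stand-ins for what an agent might return. The cache is one possible way to honour "do not change the match percentage score" when the same resume is submitted again.

```python
import hashlib


def match_percentage(judgements):
    """judgements maps each requirement to True (met) or False (not met)."""
    if not judgements:
        return 0.0
    met = sum(1 for ok in judgements.values() if ok)
    return round(100.0 * met / len(judgements), 1)


def resume_key(resume_text, requirements):
    """Stable fingerprint of a resume evaluated against a set of requirements."""
    payload = "\n".join(sorted(requirements)) + "\n---\n" + resume_text.strip()
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ScoreCache:
    """Remembers the first score given to a resume so resubmissions keep it."""

    def __init__(self):
        self._scores = {}

    def score(self, resume_text, judgements):
        key = resume_key(resume_text, list(judgements))
        if key not in self._scores:
            self._scores[key] = match_percentage(judgements)
        return self._scores[key]


def shortlist(scored, threshold=75.0):
    """Keep (name, score) pairs at or above the threshold, highest score first.

    Ties are broken alphabetically by name so the order never depends on
    the order in which resumes were processed.
    """
    qualified = [(name, s) for name, s in scored if s >= threshold]
    return sorted(qualified, key=lambda item: (-item[1], item[0]))


# Hand-written judgements standing in for agent output (illustrative only).
cache = ScoreCache()
first = cache.score("Jane Doe CV", {"python": True, "sql": True, "aws": True, "docker": False})
# Same resume resubmitted; the judgements differ, but the stored score is reused.
second = cache.score("Jane Doe CV", {"python": True, "sql": False, "aws": True, "docker": False})
ranked = shortlist([("Bob", 75.0), ("Alice", 75.0), ("Cara", 100.0), ("Dan", 50.0)])
```

The per-requirement judgements can still vary between runs, so this narrows the variation rather than removing it, but the score and ranking no longer depend on the model doing the arithmetic or the ordering itself.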
Hope this helps.


Thanks for an insightful reply.