What are the criteria behind the scores generated by crewAI's testing feature? I understand they are generated by an LLM acting as a judge, but it would be helpful to know the logic behind them. Also, having LLMs assign numeric scores isn't the best idea IMO, since LLMs have a weak grasp of the difference between, say, a 7.2 and a 7.45. Transparency about the scoring criteria would nonetheless help make the scores more actionable. If I get a 7, how do I bring it up to an 8? What led the LLM to penalize me by 3 points?
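To illustrate what I mean by "actionable": below is a rough sketch of a rubric-style result, where the judge returns a pass/fail verdict and a reason for each criterion, and the number is derived from those verdicts. This is purely hypothetical and is not how crewAI computes its scores as far as I know; the criterion names, the `CriterionResult` class, and the `summarize` helper are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    name: str     # hypothetical criterion name, not a crewAI concept
    passed: bool  # the judge's verdict for this criterion
    reason: str   # short explanation given by the judge


def summarize(results):
    """Turn per-criterion verdicts into a 0-10 score plus a list of fixes."""
    if not results:
        raise ValueError("at least one criterion result is required")
    passed = sum(1 for r in results if r.passed)
    score = round(10 * passed / len(results), 1)
    # Every failed criterion explains which points were lost and why.
    fixes = [f"{r.name}: {r.reason}" for r in results if not r.passed]
    return score, fixes


# Made-up verdicts, standing in for what an LLM judge might return.
results = [
    CriterionResult("answers the task", True, "covers the question asked"),
    CriterionResult("uses the provided context", True, "cites the given data"),
    CriterionResult("follows the output format", False, "missing the summary section"),
    CriterionResult("no unsupported claims", True, "all claims are sourced"),
]

score, fixes = summarize(results)
print(score)
for fix in fixes:
    print("to improve ->", fix)
```

With output shaped like this, each lost point traces back to a named criterion, which is the kind of transparency I'm asking about.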