What are the criteria behind the scores generated by the testing feature of crewAI? I understand they are generated by an LLM acting as a judge, but it would be helpful to know the logic behind them. Also, having LLMs assign numeric scores isn't the best idea IMO, since LLMs have a weak grasp of the difference between, say, a 7.2 and a 7.45. Transparency on the scoring criteria would nonetheless help make the scores more actionable. If I get a 7, how do I bring it up to an 8? What led the LLM to penalize me by 3 points?
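To illustrate what I mean by "actionable", here is a rough sketch of the kind of per-criterion breakdown I would find useful. This is not crewAI's API or its actual scoring logic; `CriterionResult`, `summarize`, the criteria names, and the point values are all hypothetical and made up for the example.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    """One rubric line: how many points were available, how many were given, and why."""
    name: str
    max_points: int
    awarded: int
    reason: str


def summarize(results):
    """Return (total, max_total, deductions), with the largest deductions first."""
    total = sum(r.awarded for r in results)
    max_total = sum(r.max_points for r in results)
    deductions = sorted(
        (
            (r.name, r.max_points - r.awarded, r.reason)
            for r in results
            if r.awarded < r.max_points
        ),
        key=lambda d: d[1],
        reverse=True,
    )
    return total, max_total, deductions


# Hypothetical judge output for a single task (illustrative values only).
results = [
    CriterionResult("task completion", 4, 2, "skipped the summary step"),
    CriterionResult("accuracy", 3, 3, "all claims supported by the source"),
    CriterionResult("formatting", 3, 2, "output was not valid JSON"),
]

total, max_total, deductions = summarize(results)
print(f"score: {total}/{max_total}")
for name, lost, reason in deductions:
    print(f"-{lost} {name}: {reason}")
```

With a breakdown like this, a 7 out of 10 would come with the reasons for the 3 missing points, so I would know what to fix first.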