What are the criteria behind the scores generated by crewAI's testing feature? I understand they are produced by an LLM acting as a judge, but it would help to know the logic it follows. Also, having LLMs assign numeric scores isn't the best idea IMO, since LLMs have a weak grasp of the difference between, say, a 7.2 and a 7.45. Either way, transparency about the scoring criteria would make the scores more actionable. If I get a 7, how do I bring it to an 8? What led the LLM to penalize me by 3 points?
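To make the request concrete, here is a rough sketch of the kind of breakdown I have in mind. This is not how crewAI computes its scores. The `CriterionScore` class, the `summarize` helper, the criterion names, and the coarse 0-10 integer scale are all made up for illustration, and the rationales are hand-written placeholders rather than real judge output.

```python
from dataclasses import dataclass


@dataclass
class CriterionScore:
    """One line of a hypothetical judge rubric (not a crewAI type)."""
    name: str        # assumed criterion label, e.g. "accuracy"
    score: int       # coarse integer on an assumed 0-10 scale
    rationale: str   # the judge's stated reason for this score


def summarize(scores):
    """Return the rounded mean score and the lowest-scoring criterion."""
    overall = round(sum(s.score for s in scores) / len(scores))
    weakest = min(scores, key=lambda s: s.score)
    return overall, weakest


# Hand-written placeholder values, only to show the shape of the output.
example = [
    CriterionScore("task completion", 9, "All requested sections are present."),
    CriterionScore("accuracy", 5, "Two figures do not match the source data."),
    CriterionScore("clarity", 7, "Readable, but the summary repeats itself."),
]

overall, weakest = summarize(example)
print(f"overall: {overall}/10")
print(f"fix first: {weakest.name} ({weakest.score}/10): {weakest.rationale}")
```

With a breakdown like this, a 7 would come with a pointer to the criterion that cost the most points and the reason why, which is what I'd need in order to get to an 8.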