CrewAI test scoring logic

What criteria are behind the scores generated by crewAI's testing feature? I understand they are produced by an LLM acting as a judge, but it would be helpful to know the logic it follows. Also, having LLMs assign numeric scores isn't the best idea IMO, since LLMs have a weak grasp of the difference between, say, a 7.2 and a 7.45. Even so, transparency about the scoring criteria would make the scores more actionable. If I get a 7, how do I bring it up to an 8? What led the LLM to penalize me by 3 points?
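
To make the ask concrete, here is a rough sketch of the kind of breakdown I would find actionable. This is not crewAI's API or its actual rubric (I don't know what that is, hence the question). The `CriterionResult` class, the `summarize` helper, the criterion names, and the point values are all made up for illustration. The idea is just that each criterion gets a whole-number score and a reason, so every lost point can be traced.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    """One rubric line as a judge might report it (hypothetical structure)."""
    name: str
    awarded: int   # whole points only, no 7.2 vs 7.45 ambiguity
    maximum: int
    reason: str


def summarize(results):
    """Return (total, maximum, deductions) for a list of criterion results."""
    total = sum(r.awarded for r in results)
    maximum = sum(r.maximum for r in results)
    # Keep only the criteria where points were lost, with the judge's reason.
    deductions = [
        f"{r.name}: -{r.maximum - r.awarded} ({r.reason})"
        for r in results
        if r.awarded < r.maximum
    ]
    return total, maximum, deductions


# Made-up example data, not real crewAI output.
results = [
    CriterionResult("completeness", 3, 4, "skipped one requested section"),
    CriterionResult("accuracy", 3, 3, "no factual errors found"),
    CriterionResult("formatting", 1, 3, "output was not valid JSON"),
]

total, maximum, deductions = summarize(results)
print(f"score: {total}/{maximum}")
for line in deductions:
    print(line)
```

With a report like this, a 7 out of 10 tells me exactly where the 3 points went and what to fix to reach an 8.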