Tool Usage not guaranteed

Hello,
I'm using a combination of agents that need to interact with each other to build a state machine representing a system:

def vision_agent(self) -> Agent:
    return Agent(
        role="HMI Analyser",
        goal="Command a robot to observe and interact with HMI in a strict sequence",
        backstory="Specialized in procedural HMI analysis with strict operation ordering",
        llm=self.llm,
        tools=[get_icons_tool, get_text_elements_tool],
        memory=MilvusMemory(),
        verbose=True
    )
def control_agent(self) -> Agent:
    return Agent(
        role="Senior HMI Tester who excels at making decisions to explore HMI system functionalities",
        goal="Command a robot through tools to explore the HMI system under test",
        backstory="Specialized in procedural HMI analysis with strict operation ordering",
        llm=self.llm,
        tools=[click, double_click, swipe],
        memory=MilvusMemory(),
        verbose=True
    )

and these are the tasks :

def scan_interface_task(self) -> Task:
    return Task(
        description="""
            <description>
                <TOOLS_DESCRIPTIONS>
                    <tool>
                        <name>get_icons_tool</name>
                        <description>Triggered when we need to get the icons currently displayed in the HMI system. This tool does not require parameters.</description>
                    </tool>
                    <tool>
                        <name>get_text_elements_tool</name>
                        <description>Triggered when we need to get the text elements currently displayed in the HMI system. This tool does not require parameters.</description>
                    </tool>
                </TOOLS_DESCRIPTIONS>
                
                <RULES>
                    <rule>1. RETURN ONLY the completed JSON object. No extra explanation or output is allowed.</rule>
                </RULES>
                
                <INSTRUCTIONS>
                    <instruction> 0. Make sure to start with the <think> tag </instruction>
                    <instruction>1. Use the Tool: get_icons_tool to get the Current Icons. Fail the Task if the tool is not used.</instruction>
                    <instruction>2. Use the Tool: get_text_elements_tool to get the Current Text elements. Fail the Task if the tool is not used.</instruction>
                    <instruction>3. DO NOT hardcode or simulate results — always call the tools to fetch real-time data.</instruction>
                    <instruction>4. Populate the results into a JSON object with the following structure:
                        <json_structure>
{
    "id": "",
    "Current_Icons": [/* list of icon names */],
    "Current_text": [/* list of text elements */],
    "ui_state": {
        /* for each icon: { "interactable": true/false, "interaction_type": ["click/swipe/double_click"], "position": [] } */
    },
    "text_state": {
        /* for each text element: { "interactable": true/false, "interaction_type": ["click/swipe/double_click"], "position": [] } */
    }
}
                        </json_structure>
                    </instruction>
                    <instruction>5. The JSON keys and structure are fixed and must be followed exactly.</instruction>
                    <instruction>6. Interaction metadata (interactable, interaction_type, position) should be initialized to default values as shown in the example.</instruction>
                    <instruction>7. If either tool fails or returns no data, fail the task accordingly.</instruction>
                </INSTRUCTIONS>
            </description>
        """,
        expected_output="""
            <expected_output>
                <description>A JSON object representing the current UI state with detected icons and text.</description>
                <example_format>
{
    "id": "{node_id}",
    "Current_Icons": [{use_tool_to_get_it}],
    "Current_text": [{use_tool_to_get_it}],
    "ui_state": {
        /* for each icon: { "interactable": true, "interaction_type": ["click/swipe/double_click"], "position": [] } */
    },
    "text_state": {
        /* for each text element: { "interactable": true, "interaction_type": ["click/swipe/double_click"], "position": [] } */
    }
}
                </example_format>
            </expected_output>
        """,
        output_file="outputs/scan_output.json",
        output_json=ScanResult,
        agent=self.vision_agent(),
        tools=[get_icons_tool, get_text_elements_tool]
    )

def execute_action_task(self, context) -> Task:
    tools_description = """
        <TOOLS_DESCRIPTION>
            <tool_requirement>
                <rule>You MUST use one of these tools for every action. Never bypass them.</rule>
                <tool>
                    <name>click</name>
                    <description>Single press on a UI element. Parameters: {"element_id": "string"}</description>
                    <purpose>Select buttons/icons/text.</purpose>
                </tool>
                <tool>
                    <name>double_click</name>
                    <description>Two rapid presses. Parameters: {"element_id": "string", "interval_ms": 300}</description>
                    <purpose>Zoom/shortcuts/advanced menus.</purpose>
                </tool>
                <tool>
                    <name>swipe</name>
                    <description>Directional movement. Parameters: {"start_x": int, "start_y": int, "end_x": int, "end_y": int}</description>
                    <purpose>Scroll/swipe between screens.</purpose>
                </tool>
            </tool_requirement>
        </TOOLS_DESCRIPTION>
    """

    return Task(
        description=f"""
            <task_description>
                {tools_description}
                
                <INSTRUCTIONS>  
                    <instruction> 0. Make sure to start with the <think> tag </instruction>
                    <instruction>1. Carefully analyze the provided context including:
                        <subpoint>- Previous interface state</subpoint>
                    </instruction>
                    <instruction>2. Examine the current scanned interface data (icons, text elements)</instruction>
                    <instruction>3. Determine the most appropriate tool (click, double_click, or swipe) based on:
                        <subpoint>- The current interface state</subpoint>
                        <subpoint>- The historical context</subpoint>
                    </instruction>
                    <instruction>4. Make sure to Call the Tool Api provided to Command the Robot otherwise fail the Task</instruction>
                    <instruction>5. Construct a JSON transition node using the Template Provided:
                        <subpoint>- DO NOT COPY THE TEMPLATE AS IS</subpoint>
                        <subpoint>- MAKE SURE TO REPLACE ALL FIELDS WITH ACTUAL DATA</subpoint>
                    </instruction>

                    <RULES>
                        <rule>- Must thoroughly analyze context before selecting action</rule>
                        <rule>- Must wait for action execution to finish before post-action scan</rule>
                        <rule>- Final output must conform exactly to this schema:
                            <schema>{JSON_TEMPLATE}</schema>
                        </rule>
                    </RULES>
                </INSTRUCTIONS>
            </task_description>
        """,
        expected_output="<expected_output>A structured JSON representation of the HMI state transition with context analysis.</expected_output>",
        output_file="outputs/output.txt",
        output_json_schema=GraphModel,
        agent=self.control_agent(),
        tools=[click, double_click, swipe],
        max_retries=5,
        context=[context]
    ) 

The problem is I can't guarantee 100% tool use on every iteration. The graph-building process can take a very long time and many iterations, so a single missed tool call can mess things up. Should I use a flow instead, or should I fix my prompts?
Your insights are appreciated.

Which model are you using?

Open-source models are inconsistent when it comes to tool use, so you need to be careful with them.

I would use a flow and a validator, so you can run the crew, validate the output, and re-run if you don't get the right result.
It's worth trying lots of approaches and sharing feedback here.
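The validate-and-retry idea can be sketched without committing to a specific flow framework. This is a minimal plain-Python sketch, assuming the crew's output can be obtained as a string (e.g. via `str(crew.kickoff())`, shown only in a comment); `run_with_validation` and `looks_like_scan_result` are hypothetical names, and the required keys come from the scan task's JSON template above:

```python
import json

def run_with_validation(run_crew, validate, max_attempts=3):
    """Run one crew iteration, validate its output, and retry on failure.

    run_crew:  callable that executes one crew iteration and returns raw output.
    validate:  callable that returns True if the output is usable.
    """
    for attempt in range(1, max_attempts + 1):
        result = run_crew()
        if validate(result):
            return result
        print(f"Attempt {attempt} failed validation, retrying...")
    raise RuntimeError(f"No valid output after {max_attempts} attempts")

def looks_like_scan_result(raw) -> bool:
    """Example validator: the scan task must emit JSON with the fixed keys."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return False
    required = {"id", "Current_Icons", "Current_text", "ui_state", "text_state"}
    return isinstance(data, dict) and required.issubset(data)

# Hypothetical usage with a crew:
# result = run_with_validation(lambda: str(crew.kickoff()), looks_like_scan_result)
```

The same wrapper works around the action task if you swap in a validator that checks the transition-node schema, which catches the "agent skipped the tool and hallucinated output" case before it poisons the graph.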


Can you share your full code?

from crewai import Agent, Task, Process, Crew, LLM

from crewai.project import CrewBase, agent, task, crew

from vlm_interfaces import VLMClient

from crewai.tools import tool

from Memory.Vector import MilvusMemory

from Validation.model import GraphModel ,Node ,UIElement, ScanResult

vlm_client = VLMClient(ip_address="localhost:3000")

@tool
def get_icons_tool():
    """Fetch all visible icons from the HMI."""
    print("USING TOOLS  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
    return vlm_client.get_icons()


@tool
def get_text_elements_tool():
    """Fetch all visible text elements from the HMI."""
    # print("USING TOOLS  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
    return vlm_client.get_text_elements()
@tool 
def click(element):
    """Command the Robot to click on element"""
    print("USING Click  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
    return vlm_client.click(element)
@tool 
def double_click(element):
    """Command the Robot to double_click on element"""
    print("USING doubleClick  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")

    return vlm_client.double_click(element)
@tool
def swipe(element, path_list):
    """Command the Robot to swipe on element along path_list,
    a list of (x, y, z) tuples representing the swipe path."""
    print("USING SWIPPPPPPPPPPPE  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
    return vlm_client.swipe(element, path_list)



class VLMCore:
    """Class-based Crew for HMI analysis using Vision-Language and robotic control."""

    def __init__(self):
        self.llm = LLM(
            model="ollama/deepseek-r1:8b",
            base_url="http://192.168.22.28:5000",
            temperature=0.1,
            timeout=999999
        )

    def vision_agent(self) -> Agent:
        return Agent(
            role="HMI Analyser",
            goal="Command a robot to observe and interact with HMI in a strict sequence",
            backstory="Specialized in procedural HMI analysis with strict operation ordering",
            llm=self.llm,
            tools=[get_icons_tool, get_text_elements_tool],
            memory=MilvusMemory(),
            verbose=True
        )
    def control_agent(self) -> Agent:
        return Agent(
            role="Senior HMI Tester who excels at making decisions to explore HMI system functionalities",
            goal="Command a robot through tools to explore the HMI system under test",
            backstory="Specialized in procedural HMI analysis with strict operation ordering",
            llm=self.llm,
            tools=[click, double_click, swipe],
            memory=MilvusMemory(),
            verbose=True
        )

    def scan_interface_task(self) -> Task:
        return Task(
            description="""
                <description>
                    <TOOLS_DESCRIPTIONS>
                        <tool>
                            <name>get_icons_tool</name>
                            <description>Triggered when we need to get the icons currently displayed in the HMI system. This tool does not require parameters.</description>
                        </tool>
                        <tool>
                            <name>get_text_elements_tool</name>
                            <description>Triggered when we need to get the text elements currently displayed in the HMI system. This tool does not require parameters.</description>
                        </tool>
                    </TOOLS_DESCRIPTIONS>
                    
                    <RULES>
                        <rule>1. RETURN ONLY the completed JSON object. No extra explanation or output is allowed.</rule>
                    </RULES>
                    
                    <INSTRUCTIONS>
                        <instruction> 0. Make sure to start with the <think> tag </instruction>
                        <instruction>1. Use the Tool: get_icons_tool to get the Current Icons. Fail the Task if the tool is not used.</instruction>
                        <instruction>2. Use the Tool: get_text_elements_tool to get the Current Text elements. Fail the Task if the tool is not used.</instruction>
                        <instruction>3. DO NOT hardcode or simulate results — always call the tools to fetch real-time data.</instruction>
                        <instruction>4. Populate the results into a JSON object with the following structure:
                            <json_structure>
    {
        "id": "",
        "Current_Icons": [/* list of icon names */],
        "Current_text": [/* list of text elements */],
        "ui_state": {
            /* for each icon: { "interactable": true/false, "interaction_type": ["click/swipe/double_click"], "position": [] } */
        },
        "text_state": {
            /* for each text element: { "interactable": true/false, "interaction_type": ["click/swipe/double_click"], "position": [] } */
        }
    }
                            </json_structure>
                        </instruction>
                        <instruction>5. The JSON keys and structure are fixed and must be followed exactly.</instruction>
                        <instruction>6. Interaction metadata (interactable, interaction_type, position) should be initialized to default values as shown in the example.</instruction>
                        <instruction>7. If either tool fails or returns no data, fail the task accordingly.</instruction>
                    </INSTRUCTIONS>
                </description>
            """,
            expected_output="""
                <expected_output>
                    <description>A JSON object representing the current UI state with detected icons and text.</description>
                    <example_format>
    {
        "id": "{node_id}",
        "Current_Icons": [{use_tool_to_get_it}],
        "Current_text": [{use_tool_to_get_it}],
        "ui_state": {
            /* for each icon: { "interactable": true, "interaction_type": ["click/swipe/double_click"], "position": [] } */
        },
        "text_state": {
            /* for each text element: { "interactable": true, "interaction_type": ["click/swipe/double_click"], "position": [] } */
        }
    }
                    </example_format>
                </expected_output>
            """,
            output_file="outputs/scan_output.json",
            output_json=ScanResult,
            agent=self.vision_agent(),
            tools=[get_icons_tool, get_text_elements_tool]
        )

    def execute_action_task(self, context) -> Task:
        tools_description = """
            <TOOLS_DESCRIPTION>
                <tool_requirement>
                    <rule>You MUST use one of these tools for every action. Never bypass them.</rule>
                    <tool>
                        <name>click</name>
                        <description>Single press on a UI element. Parameters: {"element_id": "string"}</description>
                        <purpose>Select buttons/icons/text.</purpose>
                    </tool>
                    <tool>
                        <name>double_click</name>
                        <description>Two rapid presses. Parameters: {"element_id": "string", "interval_ms": 300}</description>
                        <purpose>Zoom/shortcuts/advanced menus.</purpose>
                    </tool>
                    <tool>
                        <name>swipe</name>
                        <description>Directional movement. Parameters: {"start_x": int, "start_y": int, "end_x": int, "end_y": int}</description>
                        <purpose>Scroll/swipe between screens.</purpose>
                    </tool>
                </tool_requirement>
            </TOOLS_DESCRIPTION>
        """

        return Task(
            description=f"""
                <task_description>
                    {tools_description}
                    
                    <INSTRUCTIONS>  
                        <instruction> 0. Make sure to start with the <think> tag </instruction>
                        <instruction>1. Carefully analyze the provided context including:
                            <subpoint>- Previous interface state</subpoint>
                        </instruction>
                        <instruction>2. Examine the current scanned interface data (icons, text elements)</instruction>
                        <instruction>3. Determine the most appropriate tool (click, double_click, or swipe) based on:
                            <subpoint>- The current interface state</subpoint>
                            <subpoint>- The historical context</subpoint>
                        </instruction>
                        <instruction>4. Make sure to Call the Tool Api provided to Command the Robot otherwise fail the Task</instruction>
                        <instruction>5. Construct a JSON transition node using the Template Provided:
                            <subpoint>- DO NOT COPY THE TEMPLATE AS IS</subpoint>
                            <subpoint>- MAKE SURE TO REPLACE ALL FIELDS WITH ACTUAL DATA</subpoint>
                        </instruction>

                        <RULES>
                            <rule>- Must thoroughly analyze context before selecting action</rule>
                            <rule>- Must wait for action execution to finish before post-action scan</rule>
                            <rule>- Final output must conform exactly to this schema:
                                <schema>{JSON_TEMPLATE}</schema>
                            </rule>
                        </RULES>
                    </INSTRUCTIONS>
                </task_description>
            """,
            expected_output="<expected_output>A structured JSON representation of the HMI state transition with context analysis.</expected_output>",
            output_file="outputs/output.txt",
            output_json_schema=GraphModel,
            agent=self.control_agent(),
            tools=[click, double_click, swipe],
            max_retries=5,
            context=[context]
        )

Sorry you had to edit my code, I'm still new to posting on forums.


No worries. For code, you just need to wrap it in these:

(screenshot showing triple-backtick code fences)

Also, here are some guidelines to follow:


So when you’re using the @tool decorator, here’s the deal: you’ve gotta nail your type hints. That means clearly specifying the data type for every single parameter your function takes. Why bother? Because that’s the exact info your LLM needs to call the tools correctly.
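Concretely, `click(element)` and `swipe(element, path_list)` in the posted code carry no type hints, so the decorator has nothing to build an argument schema from. A minimal sketch of what annotated signatures could look like (plain functions with a stubbed return so the signatures are the focus; in the real code each would keep its `@tool` decorator and call the `vlm_client` methods instead):

```python
import inspect
from typing import List, Tuple

def click(element_id: str) -> str:
    """Command the robot to click on the element identified by element_id."""
    return f"clicked {element_id}"  # real code: vlm_client.click(element_id)

def swipe(element_id: str, path_list: List[Tuple[int, int, int]]) -> str:
    """Swipe on element_id along path_list, a list of (x, y, z) points."""
    return f"swiped {element_id} along {len(path_list)} points"

# The @tool decorator reads these annotations to build the argument schema
# the LLM sees; without them the model has to guess parameter types.
print(inspect.signature(click))  # (element_id: str) -> str
```

Matching the docstring to the actual parameters matters for the same reason: the `swipe` docstring in the posted code documents a `points` argument that isn't in the signature, which gives the model conflicting information.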
