SQL agent response time and multi-user Q&A

Hi,

I am building a CrewAI agent for converting natural language questions to SQL (BigQuery). I need the agent to understand my BigQuery dataset and business logic so that it can handle domain-specific jargon. I have provided two documents, a data dictionary and a business logic document, for this purpose. I am using two agents: a Data Explorer that calls list_tables and tables_schema with the mentioned documents as additional reference, and a Query Agent that creates a SQL query based on the schema and business logic, verifies it using the check_sql tool, and finally executes it using the execute_sql tool. I then use an LLM separately to analyze the results and send them back to the frontend. I am using GPT-4.1 and GPT-4.1 mini.

The approach is good in terms of accuracy, but my main issue is response speed: my average response time is around 30 seconds, with a best case of 15 seconds and a worst case of 60 seconds.

I have two questions that I want help with:

  1. How can I improve my agents’ response time?
  2. How can I add the ability to serve multiple users at the same time?

Right now, if one user’s query is being processed and a second user asks a question, the second user waits a long time because the first request has to complete before theirs can start. (I built a FastAPI backend and a Streamlit frontend for the demo.)

I have also provided some code.

Data Dictionary and Logic

BUSINESS_RULES = """
        1. Completed Revenue = SUM(revenue) filtered by completed_on_date for date range where job_status = 'Completed'.

        2. Total Revenue (or simply Revenue) = SUM(revenue) filtered by created_on_date. Do not filter by job_status.

        3. Created Jobs or Total Jobs = COUNT(DISTINCT job_id) using created_on_date. Include all job_status ['Completed', 'Canceled', 'In Progress', 'Scheduled', 'Hold'].

        4. Completed Jobs = COUNT(DISTINCT job_id) using completed_on_date where job_status = 'Completed'.

        5. Cancelled Jobs = COUNT(DISTINCT job_id) using created_on_date where job_status = 'Canceled'.

        6. In Progress Jobs = COUNT(DISTINCT job_id) using created_on_date where job_status = 'In Progress'.

        7. Scheduled Jobs = COUNT(DISTINCT job_id) using created_on_date where job_status = 'Scheduled'.

        8. Hold Jobs = COUNT(DISTINCT job_id) using created_on_date where job_status = 'Hold'.

        9. New Customers = COUNT(DISTINCT customer_id) where customer_created_on_date_job_created >= created_on_date.

        10. Active Customers = COUNT(DISTINCT customer_id) where:
            - job_status = 'Completed'
            - completed_on_date is between the most recent date (end of time period) and 36 months prior.
            - Use: DATE_SUB(DATE(start_date), INTERVAL 36 MONTH) as the lower bound.
"""


DATA_DICTIONARY = """Use only the following tables and columns to generate SQL queries. Ignore all other sources.

      ---

      Table Usage Rules:

      1. **demo_jobs (Primary Table)**
         - Use for all general job and customer queries.
         - Allowed date columns: created_on_date, completed_on_date, scheduled_on_date, customer_created_on_date_job_created

      2. **demo_memberships**
         - Use only if the query explicitly involves "member" or "membership".
         - Always use date columns, brands, regions, and other columns from demo_jobs for filtering.
         
      ---

      Column Reference:

      🔹 demo_jobs:
      - job_id: Unique ID for each job.
      - region: Region of the job.
      - customer_id: Unique ID for each customer; may appear multiple times.
      - csr_name: CSR who booked and assigned the job.
      - agent_name: Technician responsible for completing the job.
      - business_unit_name: Business unit names.
      - trades_group: Trade group names.
      - trades_type: Trade types.
      - revenue: Revenue in USD generated by the job.
      - job_status: ['Completed', 'Canceled', 'In Progress', 'Scheduled', 'Hold']
      - campaign_name: Source marketing campaign of the job.
      - created_on_date: Date when the job was created.
      - completed_on_date: Date when the job was completed.
      - scheduled_on_date: Date when the job was scheduled.
      - brand: Brand name extracted from business_unit_name.
      - customer_created_on_date_job_created: The date the customer was created (same as their first job creation date).

      🔹 demo_memberships:
      - membership_id: Unique ID for each membership.
      - customer_id: Foreign key to demo_jobs.
      - customer_name: Name of the customer.
      - customer_type: 'Residential' or 'Commercial'
      - city: Customer's city.
      - membership_type: Describes duration and type of membership (e.g., '1 Yr', 'Monthly', '2 Year').

      ---

      Guidelines:
      - Always prefer demo_jobs unless "membership" related question is asked.
      - All date-based filters must use dates from demo_jobs, even for membership queries.
      - Do not reference columns not listed above.
      - Keep all queries focused and aligned with these rules to ensure relevance and accuracy."""
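
As an example of the membership guideline above, a question like “How many residential members had jobs created this month?” should take its entities from demo_memberships but its date filter from demo_jobs, roughly like this (a sketch):

-- Membership query: membership columns from demo_memberships,
-- date filter from demo_jobs, per the guidelines above
SELECT COUNT(DISTINCT m.membership_id) AS residential_membership_count
FROM demo_memberships AS m
JOIN demo_jobs AS j USING (customer_id)
WHERE m.customer_type = 'Residential'
  AND DATE(j.created_on_date) BETWEEN DATE_TRUNC(CURRENT_DATE(), MONTH) AND CURRENT_DATE()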

Prompts

DATA_EXPLORER_BACKSTORY = """
        You are an Expert Data Explorer.
        You are in charge of exploring the data in the database. Your goal is to list the tables using the "list_tables" tool, carefully select the tables relevant to
        the user question, and then extract their schemas using the "tables_schema" tool.

        ---

        System Configuration:
        - Project ID: {project}
        - Dataset ID: {dataset}

        ---

        Guidelines:
        - Use the `list_tables` tool to see all available tables.
        - Carefully analyze the question and decide which tables are relevant using the data dictionary.
        - Then use the `tables_schema` tool to inspect only the relevant tables and their column structures.
        - Always use the data dictionary to understand table relationships and relevant columns.

        ---

        References:        
        **Data Dictionary**: 
        {data_dictionary} 
    """

SQL_GENERATOR_BACKSTORY = """
        You are an Expert Data Engineer. 
        Your goal is to:
        1. Generate a valid production-grade BigQuery SQL query based on the user question using the provided Data Dictionary, Business Rules, and Date Filtering Rules.
        2. Verify the generated SQL query using the tool 'check_sql'.
        3. Execute the verified SQL query using the tool 'execute_sql'.

        ---

        📘 Data Handling Rules:
        - Only use columns listed in the provided Data Dictionary.
        - Use **fully-qualified table names** in the format `{project}.{dataset}.<table_name>`.
        - Always **filter out `NULL` and `'None'` values** from any date fields or categorical fields before processing.
        - Ensure all filters are valid for the intended data type (e.g., wrap `DATETIME` or `TIMESTAMP` columns with `DATE()` for comparison).
        - Avoid string casting of dates unless absolutely necessary.

        ---

        📅 Date Filtering Rules:
        - Always use the `BETWEEN` operator when filtering over a date range.
        - Use `DATE(column_name)` to normalize any date field before filtering.

        1. **Last Week (Calendar Week)**  
        - Definition: Previous full week, **Monday through Sunday**.
        - Start: `DATE_SUB(DATE_TRUNC(CURRENT_DATE(), WEEK(MONDAY)), INTERVAL 7 DAY)`
        - End: `DATE_SUB(DATE_TRUNC(CURRENT_DATE(), WEEK(MONDAY)), INTERVAL 1 DAY)`

        2. **This Week (Current Week-to-Date)**  
        - Start: `DATE_TRUNC(CURRENT_DATE(), WEEK(MONDAY))`
        - End: `CURRENT_DATE()`

        3. **Past 7 Days (Rolling)**  
        - Start: `DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)`
        - End: `DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)`

        4. **Previous Month (Calendar Month)**  
        - Start: `DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH)`
        - End: `LAST_DAY(DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH))`

        5. **This Month (Month-to-Date)**  
        - Start: `DATE_TRUNC(CURRENT_DATE(), MONTH)`
        - End: `CURRENT_DATE()`

        6. **Year-to-Date (YTD)**  
        - Start: `DATE_TRUNC(CURRENT_DATE(), YEAR)`
        - End: `CURRENT_DATE()`

        7. **Last 30 Days (Rolling)**  
        - Start: `DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)`
        - End: `CURRENT_DATE()`

        8. **Previous Quarter (Calendar Quarter)**
        - Previous full quarter (Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec), not including current quarter.
        - Start: `DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 QUARTER), QUARTER)`
        - End: `LAST_DAY(DATE_SUB(DATE_TRUNC(CURRENT_DATE(), QUARTER), INTERVAL 1 DAY))`

        9. **This Quarter (Calendar Quarter)**
        - From the start of the current quarter through today.
        - Start: `DATE_TRUNC(CURRENT_DATE(), QUARTER)`
        - End: `CURRENT_DATE()`

        10. **Same Quarter Last Year**
        - The same quarter last year (for year-over-year comparison).
        - Start: `DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR), QUARTER)`
        - End: `LAST_DAY(DATE_ADD(DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR), QUARTER), INTERVAL 2 MONTH))`

        11. **Past 90 Days (Rolling)**
        - Last 90 days, not including today.
        - Start: `DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)`
        - End: `DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)`

        12. **First Quarter (Q1 of Current Year)**
        - The first calendar quarter, January 1 through March 31 of the current year.
        - Start: `DATE_TRUNC(CURRENT_DATE(), YEAR)`
        - End: `LAST_DAY(DATE_ADD(DATE_TRUNC(CURRENT_DATE(), YEAR), INTERVAL 2 MONTH))`

        13. **Second Quarter (Q2 of Current Year)**
        - Second quarter of the current year (Apr 1 to Jun 30).
        - Start: `DATE_ADD(DATE_TRUNC(CURRENT_DATE(), YEAR), INTERVAL 3 MONTH)`
        - End: `LAST_DAY(DATE_ADD(DATE_TRUNC(CURRENT_DATE(), YEAR), INTERVAL 5 MONTH))`

        14. **Third Quarter (Q3 of Current Year)**        
        - Third quarter of the current year (Jul 1 to Sep 30).
        - Start: `DATE_ADD(DATE_TRUNC(CURRENT_DATE(), YEAR), INTERVAL 6 MONTH)`
        - End: `LAST_DAY(DATE_ADD(DATE_TRUNC(CURRENT_DATE(), YEAR), INTERVAL 8 MONTH))`

        15. **Fourth Quarter or last quarter (Q4 of Current Year)**
        - Fourth quarter of the current year (Oct 1 to Dec 31).
        - Start: `DATE_ADD(DATE_TRUNC(CURRENT_DATE(), YEAR), INTERVAL 9 MONTH)`
        - End: `LAST_DAY(DATE_ADD(DATE_TRUNC(CURRENT_DATE(), YEAR), INTERVAL 11 MONTH))`
        
        ---

        ⚙️ SQL Query Guidelines:
        - Always apply the Business Rules when computing any metric.
        - Always refer to Data Dictionary for table relations and relevant columns.
        - Use **clear, business-relevant aliases** (e.g., `scheduled_job_count`, `active_customers`, `avg_days_to_schedule`).
        - Use `LIMIT` for multi-row result sets unless otherwise stated.
        - Prefer **CTEs only when necessary** for readability or reuse.
        - Avoid SELECT * — explicitly name columns.
        - Ensure **logical consistency**, **performance efficiency**, and **syntactic correctness**.
        - Format queries cleanly with proper indentation and keyword casing.
        
        ---

        📘 Available References:
        **Data Dictionary**: 
        {data_dictionary}

        **Business Rules and Logic**: 
        {business_rules}

        Respond only with the final SQL query — no explanations unless explicitly asked.
    """

Agent Definitions

data_explorer = Agent(
    role="Data Explorer",
    goal="Explore the data in the database. List the tables and schemas of the tables.",
    backstory=PromptTemplates.get_prompt('DATA_EXPLORER_BACKSTORY', data_dictionary=DATA_DICTIONARY, business_rules=BUSINESS_RULES, dataset=DATASET, project=PROJECT),
    tools=[list_tables, tables_schema],
    verbose=VERBOSE,
    max_iter=10,
    max_execution_time=90,
    llm=secondary_llm,
)


sql_generator = Agent(
    role="SQL Generator",
    goal="""Execute verified and Optimized SQL query for BigQuery. The SQL queries should reflect business logic and domain understanding. 
    Always use clear column aliases and include appropriate filters for data quality.""",
    backstory=PromptTemplates.get_prompt('SQL_GENERATOR_BACKSTORY', data_dictionary=DATA_DICTIONARY, business_rules=BUSINESS_RULES, dataset=DATASET, project=PROJECT),
    tools=[check_sql, execute_sql],
    verbose=VERBOSE,
    max_iter=10,
    max_execution_time=90,
    llm=main_llm,
)


# Task Definitions for Agents
explore_data = Task(
    description="Explore the BigQuery Dataset and Tables. Decide which tables are relevant for the query: {query}. Extract schemas of the relevant tables.",
    expected_output="The output of 'list_tables' should be a comma-separated string of table names, and the schema output should be a dictionary with table names as keys and schemas as values.",
    agent=data_explorer,
    async_execution=True,
)

generate_sql = Task(
    description="""Execute the generated Optimized SQL query for BigQuery for question: {query}. 
    Make sure to use clear column aliases that will be preserved in the output.
    
    Please respond ONLY with a valid JSON object following this precise structure:

    {{
        "query" (str): < The user question >,
        "sql_query" (str): < The optimized SQL query >,
        "query_result" (list[dict[str, Any]]): < The result of the query >
    }}

    Output Guidelines:   
        - Ensure the JSON is properly formatted and parsable.  
        - Do NOT output any text outside the JSON object.
    """,
    expected_output="""A JSON object with the following keys: 'query', 'sql_query', 'query_result'""",
    agent=sql_generator,
    context=[explore_data],
    output_pydantic=QueryResult,
)
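
For reference, `QueryResult` is a Pydantic model along these lines (the exact definition isn't shown here; this is a minimal sketch matching the task's JSON contract):

from typing import Any

from pydantic import BaseModel


class QueryResult(BaseModel):
    query: str                           # the user question
    sql_query: str                       # the generated SQL query
    query_result: list[dict[str, Any]]   # rows returned by execute_sql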

Hey Usama, welcome to the community.

I took a quick look, and here are a few simple, general thoughts. Hopefully, one of them will do the trick for you:

  1. If you need to optimize an agentic system, you definitely need an observability setup. Here’s the documentation on AgentOps integration, which is a solid choice. Make sure to measure the performance gains at each step of your optimization.
  2. The latency cost of running two agents back-to-back is way higher than using a single agent with a prompt that covers both jobs. It looks to me like a single Agent + Task + Tools setup could get the job done for you. Basically, the task seems to be: Gather info about the database → Convert the user’s request into a top-notch SQL query based on your rules → Validate and run the query → Return the result to the user. If that’s the case, a single, well-structured prompt should be enough for one Agent/Task to handle. You’ll lose some modularity, but remember, you have to make trade-offs when you’re optimizing for latency.
  3. It looks like the list_tables tool could be run before kickoff and its output hardcoded directly into your prompt. The fewer tools your agent has to use, the lower your latency will be. Do as much as you can with good old Python. As a rule of thumb, use the agent to turn unstructured data into structured data (that’s their main advantage), and then process that structured data the old-fashioned way. Every time your agent has to use a tool, you’re adding to the latency and increasing the chance of hallucinations. So, always ask yourself: “Couldn’t I just run this myself and feed the result to the LLM for the reasoning step?” When you’re just cobbling together a toy agent, it’s all fun and games to load it up with tools and watch it go. But when you start scaling your solution, you’ll quickly figure out it’s smarter to separate what should be deterministic (good old programming) from what’s non-deterministic (agents). There’s a quick sketch of this right after the list.
  4. Pick the right LLM for the job. For instance, if you’re using a reasoning-heavy model for a particular task, remember that some models really go overboard on latency (Qwen3-235B-A22B and Grok3, I’m looking at you!). A model without extended reasoning or with fewer parameters might give you an answer that’s just as good, but faster. And how do you measure “just as good”? Isolate the agent’s job, like I mentioned before. Have it take a natural language query and spit out a SQL query. Then, set up a benchmark with known inputs and expected outputs. From there, you can test smaller, faster models until you find one that’s good enough for your needs.
  5. And here’s the most important point. It’s okay if your first user in the queue has to wait 20-30 seconds for a response, given the complexity. Nothing a quick “🤖 Just a moment while I finish looking into your request…” message can’t smooth over. What you can’t have is the fourth user in line waiting 90 seconds just to get started. So, a big chunk of your optimization should focus on this. Use asyncio, multithreading, or even multiprocessing (whatever you’re comfortable with), but focus on reducing the client’s perceived latency. There’s a second sketch below for this one.
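
To make point 3 concrete, here’s roughly what that pre-computation could look like (a sketch using the google-cloud-bigquery client; adapt it to however your list_tables tool is implemented today):

from google.cloud import bigquery

client = bigquery.Client(project=PROJECT)

# Run once at startup, not per request: the table list rarely changes.
table_names = [t.table_id for t in client.list_tables(DATASET)]
TABLES_CONTEXT = "Available tables: " + ", ".join(table_names)

# Bake TABLES_CONTEXT into the backstory/prompt, then drop the list_tables
# tool from the agent entirely: one fewer LLM round trip per question.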
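
And for point 5: since crew.kickoff() is a blocking call, a low-effort first step in FastAPI is to run each kickoff on a thread pool so the event loop stays free for other users. A minimal sketch (build_crew() is a hypothetical helper standing in for however you wire up your Crew):

import asyncio
from concurrent.futures import ThreadPoolExecutor

from fastapi import FastAPI

app = FastAPI()
executor = ThreadPoolExecutor(max_workers=4)  # one slot per concurrent crew run

@app.post("/ask")
async def ask(question: str):
    loop = asyncio.get_running_loop()
    # Running the blocking kickoff() in a worker thread means a second
    # user's request starts immediately instead of waiting in line.
    result = await loop.run_in_executor(
        executor,
        lambda: build_crew().kickoff(inputs={"query": question}),
    )
    return {"answer": str(result)}

Building a fresh Crew per request also helps here, since a Crew instance holds run state and isn’t meant to serve overlapping requests.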

Hi Max,

Thank you for this detailed response; a lot of it makes sense. I was already planning to do certain steps in a Pythonic way rather than having the agent do them, and I will also look into multithreading and multiprocessing for the latency issues.

A couple of questions, if you can provide some insight:

  1. How good is it to keep only one agent and give it 5-6 small subtasks instead of just one big complex task? Also, is there a trade-off between the number of tasks and the number of agents?
  2. I was thinking of including the business logic document and the data dictionary document in separate tasks so that the agent has a better understanding of the business data and jargon.

I think I can kill two birds with one stone here and give my take on both questions at once.

It’s pretty common to hear people advocating for breaking down tasks. And the upsides are pretty obvious: everything gets more organized, and for each task (or each step in the chain, you could say), it’s like you’ve got this hyper-focus going on. If you, like, roughly picture an LLM as a “brain,” it’s as if the entire brain is wiped clean, ready to take in only the specific rules needed for that one sub-task. And yeah, LLMs do get more efficient when you feed them fewer rules in the context window, even if your LLM has a massive context window. With Gemini 1.5 Pro’s 2 million token context window, you could probably upload the entire Bible into its context, and I bet it could find specific passages throughout the whole text (that’s the “needle in a haystack” test). But even then, it wouldn’t exactly become a good Christian, right? It has access to all the content, sure, but that doesn’t mean it can actually use all the rules packed into that massive amount of information. So, all this seems to back up the idea that sub-tasks are a smart move, 'cause they help optimize how many rules you’re passing along at each step of the chain, right?

What doesn’t get talked about as much is that this really only holds true 100% when those sub-tasks are atomic, meaning they’re self-contained and self-sufficient. And in real-world scenarios with agentic systems, that’s almost never the case. Splitting tasks can mean breaking a line of reasoning that’s crucial for getting the job done well. And that break point is never perfectly put back together. No matter how you try to pass along what was done before (CrewAI, for example, sends the previous task’s output to the next task’s input as context in its default sequential process), there’s still some loss in that handoff. Even if your agent can ping another agent while working on a task, if that second agent is just going to spit back the full text of the rules, wouldn’t it have been better to just give all that text to the first agent from the get-go? These are the kinds of questions that’ll help you figure out how to divide tasks in your agentic system. Basically, if a task is complex, then its prompt is gonna be complex and long, and that’s okay. What matters is that the agent has all the instructions it needs to nail that task properly during execution.

To give you some concrete examples, up until about a month and a half ago, a task in a system I was using to generate content for a YouTube channel had a prompt that was over 41k characters long (around 10k tokens). You read that right. Over 40,000 characters just for the rules. This task would churn out a 13k-character video script in one go. It was absolutely crucial that the generated text used every trick in the book for text humanization in Portuguese (my native language) and storytelling techniques. The prompt is broken down into sections, like a mini-book. There’s a section for persona creation, rules on what to do (you could call these business rules), examples of how to do it, rules on what not to do, a chain of thought, a section with review procedures before the final output, and so on. I tried splitting off some of the rules into other tasks, but the results would always take a nosedive. It was easy to spot because the views would plummet. I deliberately picked a niche that really relies on the audience making a human connection, so less humanized text meant the channel’s numbers dropped, and I used that as a benchmark.

To give you some smarter, more sophisticated examples than my little toy project, the prompt for the Devin-AI programming agent is said to be over 34,000 characters (around 8-9,000 tokens), and the system prompt for Claude4 is reportedly 24-25,000 tokens long (that’s about 95-100,000 characters). As a rule of thumb, don’t sweat it if your task needs up to 20,000 characters of text for its instructions. Just make sure all those rules really belong to the same logical course of action, in other words, the same task.

This doesn’t mean you can’t or shouldn’t have, say, review tasks after the main job is done (doesn’t matter if it’s the same agent or another one doing the review, the latency cost is the same). But you’ll definitely see a solid difference when the initial task is done by an agent that’s aware of all the necessary rules upfront, and then it’s just reviewed in a separate, dedicated task later on.

Hope these thoughts, even if they seem pretty basic, help you get the best out of your projects. Good luck!

