Intermittent 401 errors with my custom LiteLLM setup

Hello,

I am building a sequential multi-agent system (MAS) using CrewAI with my custom LiteLLM setup (code below), but I am getting 401 errors intermittently: roughly one out of every three kickoffs fails with a 401 partway through.

Error

```text
Message: 'Error: '
Arguments: ('Failed to convert text into a Pydantic model due to the following error: litellm.AuthenticationError: AuthenticationError: OpenAIException - Error code: 401',)
16:52:48 - LiteLLM:DEBUG: main.py:5490 - openai.py: Received openai error - Error code: 401
16:52:48 - LiteLLM:DEBUG: utils.py:298 - RAW RESPONSE:
Error code: 401

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

16:52:48 - LiteLLM:DEBUG: exception_mapping_utils.py:2230 - Logging Details: logger_fn - None | callable(logger_fn) - False
16:52:48 - LiteLLM:DEBUG: litellm_logging.py:1842 - Logging Details LiteLLM-Failure Call: ['litellm_custom_failure_callback', <crewai.utilities.token_counter_callback.TokenCalcHandler object at 0x000001AA45F1D3A0>]
16:52:48 - LiteLLM:DEBUG: utils.py:298 -
```
My custom LLM setup:

```python
import os

import litellm
from crewai import LLM
from src.backend.utils.logging_utils import logger
from src.backend.utils.litellm_utils import *

oai_api_key = os.getenv('OPENAI_API_KEY')
api_base = os.getenv('OPENAI_API_BASE')
api_model = os.getenv('OPENAI_MODEL_NAME')
gif_api_key = os.getenv('MY_API_KEY')

if not api_base or not api_model or not gif_api_key or not oai_api_key:
    raise ValueError(
        "Missing required environment variables: Ensure OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_MODEL_NAME, and MY_API_KEY are set.")

logger.info(f"Using API Base: {api_base}")
logger.info(f"Using Model: {api_model}")

# Note: this header dict is not passed to LLM(...) below.
litellm_params = {"Authorization": f"Bearer {gif_api_key}"}

os.environ["LITELLM_SUCCESS_CALLBACKS"] = "litellm_custom_success_callback"
os.environ["LITELLM_FAILURE_CALLBACKS"] = "litellm_custom_failure_callback"

custom_llm = LLM(
    model=api_model,
    base_url=api_base,
    api_key=gif_api_key
)

def litellm_custom_success_callback(
    kwargs,
    completion_response,
    start_time, end_time
):
    # Build a single message string: passing extra positional arguments
    # without %s placeholders makes `logging` report a formatting error.
    logger.info(
        f"On Success: Completion Response={completion_response}, "
        f"Duration={end_time - start_time}, "
        f"kwargs:{kwargs}"
    )


def litellm_custom_failure_callback(
    kwargs,
    completion_response,
    start_time, end_time
):
    # Same single-string pattern as the success callback.
    logger.info(
        f"On failure: Completion Response={completion_response}, "
        f"Duration={end_time - start_time}, "
        f"kwargs:{kwargs}"
    )

logger.debug("----- Debugging -------")
os.environ['LITELLM_LOG'] = 'DEBUG'
os.environ['LITELLM_CACHE'] = 'True'

logger.debug("Turning litellm DEBUG logs on...")
litellm._turn_on_debug()
```

CrewAI version, from `pyproject.toml`:

```toml
crewai = {extras = ["tools"], version = "^0.102.0"}
```

Even though I have added success and failure callbacks, I don't see any logs from them. The LiteLLM debug output only shows this:

```text
16:52:48 - LiteLLM:DEBUG: logging_callback_manager.py:173 - Custom logger of type TokenCalcHandler, key: TokenCalcHandler already exists in ['litellm_custom_failure_callback', <crewai.utilities.token_counter_callback.TokenCalcHandler object at 0x000001AA45F1D3A0>], not adding again..
16:52:48 - LiteLLM:DEBUG: logging_callback_manager.py:173 - Custom logger of type TokenCalcHandler, key: TokenCalcHandler already exists in [<crewai.utilities.token_counter_callback.TokenCalcHandler object at 0x000001AA45F1D3A0>], not adding again..
16:52:48 - LiteLLM:DEBUG: utils.py:298 - Initialized litellm callbacks, Async Success Callbacks: [<crewai.utilities.token_counter_callback.TokenCalcHandler object at 0x000001AA4C95B2F0>]
```
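One likely reason the callbacks stay silent is visible in the log above. The failure list holds the *string* `'litellm_custom_failure_callback'`, not the function. As far as I can tell, CrewAI treats `LITELLM_SUCCESS_CALLBACKS` / `LITELLM_FAILURE_CALLBACKS` as names of LiteLLM's built-in logging integrations (e.g. `langfuse`), so a custom function name is probably never resolved.

Below is a minimal sketch that registers the function objects directly. It assumes LiteLLM's `litellm.success_callback` / `litellm.failure_callback` lists and its `(kwargs, completion_response, start_time, end_time)` callback signature. `register_callbacks` is a hypothetical helper that takes the module as a parameter, so it can be tried without a live LLM. The callbacks also use `%s` placeholders so `logging` can format every argument.

```python
import logging

# Stand-in for the project's logger (src.backend.utils.logging_utils.logger).
logger = logging.getLogger("litellm_callbacks")


def litellm_custom_success_callback(kwargs, completion_response, start_time, end_time):
    # One %s placeholder per argument, so logging can format the message.
    logger.info(
        "On success: model=%s duration=%s response=%s",
        kwargs.get("model"), end_time - start_time, completion_response,
    )


def litellm_custom_failure_callback(kwargs, completion_response, start_time, end_time):
    # ASSUMPTION: LiteLLM passes the raised exception as kwargs["exception"] on failure.
    logger.error(
        "On failure: model=%s duration=%s exception=%r",
        kwargs.get("model"), end_time - start_time, kwargs.get("exception"),
    )


def register_callbacks(litellm_module):
    """Hypothetical helper: append the function objects (not their names)
    to LiteLLM's callback lists, keeping anything already registered."""
    for attr, fn in (
        ("success_callback", litellm_custom_success_callback),
        ("failure_callback", litellm_custom_failure_callback),
    ):
        callbacks = getattr(litellm_module, attr)
        if fn not in callbacks:  # avoid double registration on repeated calls
            callbacks.append(fn)
```

Assumed usage: drop the two `os.environ[...]` callback lines, then call `register_callbacks(litellm)` *after* `custom_llm = LLM(...)`. Constructing the CrewAI `LLM` may reset these lists from the environment variables.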

Can anyone please advise how to troubleshoot these intermittent 401 errors, using callbacks or any other approach?

Thanks,
PS

I suspect that, at some point, CrewAI or LiteLLM uses the default OpenAI endpoint instead of my base URL and key, and that this causes the error. It is still intermittent but is becoming more frequent. Any help would be highly appreciated. Thanks.
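One way to test the fallback theory is a failure callback that logs which base URL and key each failing call actually used. This is a sketch built on assumptions:

- Reading the base URL from `kwargs["litellm_params"]["api_base"]` and the key from `kwargs["api_key"]` is my reading of LiteLLM's logging payload. The key names may differ by version, so print `kwargs.keys()` to check.
- `make_endpoint_check_callback` and `describe_call` are hypothetical names.
- Only the last four characters of any key are logged.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger("litellm_401_debug")


def _mask(secret: Optional[str]) -> str:
    # Keep only the last 4 characters so keys can be told apart without leaking them.
    return "..." + secret[-4:] if secret else "<none>"


def describe_call(kwargs: Dict[str, Any]) -> Dict[str, str]:
    # ASSUMPTION: key names follow LiteLLM's logging payload; verify with kwargs.keys().
    litellm_params = kwargs.get("litellm_params") or {}
    return {
        "model": str(kwargs.get("model")),
        "api_base": str(litellm_params.get("api_base") or kwargs.get("api_base")),
        "api_key": _mask(kwargs.get("api_key")),
    }


def make_endpoint_check_callback(expected_base: str, expected_key: str) -> Callable[..., bool]:
    """Hypothetical factory: returns a LiteLLM-style failure callback that flags
    failed calls whose api_base differs from the configured gateway."""
    expected = expected_base.rstrip("/")
    expected_key_tail = _mask(expected_key)

    def endpoint_check_failure_callback(kwargs, completion_response, start_time, end_time) -> bool:
        info = describe_call(kwargs)
        mismatch = info["api_base"].rstrip("/") != expected
        logger.error(
            "LLM call failed [%s] %s expected_key=%s exception=%r",
            "UNEXPECTED ENDPOINT" if mismatch else "configured endpoint",
            info, expected_key_tail, kwargs.get("exception"),
        )
        # The log line is the useful side effect; the return value is only for testing.
        return mismatch

    return endpoint_check_failure_callback
```

Assumed usage: `litellm.failure_callback.append(make_endpoint_check_callback(api_base, gif_api_key))` after `custom_llm` is created. The theory gains support if the failing calls show an OpenAI URL, or a key ending that matches `OPENAI_API_KEY` rather than `MY_API_KEY`. The conversion step is worth watching, since the error comes from the "Failed to convert text into a Pydantic model" path.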