Trying to run a Llama model downloaded locally on my server using Hugging Face Hub

Hi, I am trying to build a RAG pipeline for PDFs. I'm using Llama 3.1 as my LLM, which has been downloaded locally onto my server with an L4 GPU. How do I use that model with CrewAI?
PS: I want everything to run locally.

Did you manage to get this working?

I suggest you look at Ollama and LM Studio. These should work out of the box with LiteLLM, which is what CrewAI uses under the hood for LLM calls.
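For example, with Ollama serving your local Llama 3.1, you can point CrewAI's `LLM` class at it. A minimal sketch, assuming Ollama is running on its default port (11434) and you've already pulled the model with `ollama pull llama3.1`:

```python
from crewai import Agent, Crew, Task, LLM

# Point CrewAI (via LiteLLM) at the local Ollama server.
# "ollama/" is LiteLLM's provider prefix; the base_url is
# Ollama's default endpoint, so nothing leaves your machine.
local_llm = LLM(
    model="ollama/llama3.1",
    base_url="http://localhost:11434",
)

# A hypothetical agent for the PDF RAG use case described above.
analyst = Agent(
    role="PDF analyst",
    goal="Answer questions about the ingested PDF",
    backstory="An analyst that runs entirely on local infrastructure.",
    llm=local_llm,
)

task = Task(
    description="Summarize the key points of the uploaded PDF.",
    expected_output="A short bullet-point summary.",
    agent=analyst,
)

crew = Crew(agents=[analyst], tasks=[task])
result = crew.kickoff()
print(result)
```

LM Studio works similarly since it exposes an OpenAI-compatible endpoint; you'd swap in its local URL as the `base_url` instead.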