Self-Hosted LLMs for Enterprise #4
This is the final part of the series on deploying your own LLM. With all the necessary services and tools set up in the previous parts, let's continue by downloading the model and creating an API endpoint.
If you've landed here first, you can catch up on the previous parts at:
Let's start Part 4!!
1. Create Project and Download Model
# 1. Create folder for project
mkdir -p llm-chat-api
cd llm-chat-api
# 2. Download model (Llama 3.2 1B Q8_0)
huggingface-cli download bartowski/Llama-3.2-1B-Instruct-GGUF \
Llama-3.2-1B-Instruct-Q8_0.gguf --local-dir model
The model file will be stored at ./model/Llama-3.2-1B-Instruct-Q8_0.gguf
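If you'd rather do the download from Python instead of the CLI, the huggingface_hub library (which the CLI ships with, and which the earlier parts should already have installed) exposes an equivalent function. A minimal sketch:
# download_model.py - optional alternative to the huggingface-cli command above
from huggingface_hub import hf_hub_download

# Download the same GGUF file into ./model/
hf_hub_download(
    repo_id="bartowski/Llama-3.2-1B-Instruct-GGUF",
    filename="Llama-3.2-1B-Instruct-Q8_0.gguf",
    local_dir="model",
)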
2. Create Python File to Run Model
Create a main.py file:
# main.py
from llama_cpp import Llama

# Load the GGUF model; n_gpu_layers=-1 offloads all layers to the GPU
llm = Llama(
    model_path="model/Llama-3.2-1B-Instruct-Q8_0.gguf",
    n_gpu_layers=-1,
    verbose=False,
    chat_format="llama-3",
)

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Introduce yourself"},
    ]
)

# Print only the generated text from the first choice
print(output["choices"][0]["message"]["content"])
Test run with:
python3 main.py
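Note: the llama_cpp import comes from the llama-cpp-python package, which the earlier parts should already have installed. If you hit a ModuleNotFoundError here, install it with pip install llama-cpp-python (building it with GPU support may require extra build flags, depending on your hardware and setup).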
3. Open API with FastAPI
Install FastAPI and Uvicorn (the ASGI server):
pip install fastapi uvicorn pydantic
Update main.py to expose a REST API with a POST endpoint:
# main.py
from fastapi import FastAPI
from pydantic import BaseModel
from llama_cpp import Llama

app = FastAPI()

# Load the model once at startup and reuse it for every request
llm = Llama(
    model_path="model/Llama-3.2-1B-Instruct-Q8_0.gguf",
    n_gpu_layers=-1,
    verbose=False,
    chat_format="llama-3",
)

class PromptRequest(BaseModel):
    prompt: str

@app.post("/chat")
def chat(req: PromptRequest):
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are an assistant who helps answer general questions"},
            {"role": "user", "content": req.prompt},
        ]
    )
    return {"response": response["choices"][0]["message"]["content"]}
4. Run API Server
uvicorn main:app --host 0.0.0.0 --port 8000
The API will now be available at http://localhost:8000/chat
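FastAPI also serves interactive API docs at http://localhost:8000/docs, so you can try the endpoint from a browser as well.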
We can test it via curl:
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"prompt": "Describe a sunset over the ocean."}'
Overall Summary
Across all the parts of this series, we've built a simple LLM endpoint that a team can use, whether for testing or for developing into various products. Finally, I'd like to invite everyone to keep following the upcoming articles. I guarantee there will be interesting content to come.