# Ollama
Use locally hosted models through Ollama by prefixing the model name with `ollama:`. No API key is required.
## Setup

- Install Ollama
- Pull a model: `ollama pull llama3`
- Ollama runs locally on port 11434 by default

```bash
# Optional: configure a custom endpoint
export OLLAMA_API_BASE_URL="http://localhost:11434/v1"
export OLLAMA_API_KEY=""  # Optional, usually not needed
```
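The fallback behavior implied by the defaults above can be sketched as follows (a minimal illustration; `resolve_ollama_base_url` is a hypothetical helper, not part of eval_lib):

```python
import os

def resolve_ollama_base_url() -> str:
    # Fall back to Ollama's default local endpoint when the
    # environment variable is unset or empty.
    return os.environ.get("OLLAMA_API_BASE_URL") or "http://localhost:11434/v1"
```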
## Available Models
Any model available in Ollama can be used:
| Model | Size | Description |
|---|---|---|
| `llama3` | 8B | Meta's Llama 3 |
| `llama3:70b` | 70B | Larger Llama 3 |
| `mistral` | 7B | Mistral 7B |
| `mixtral` | 47B | Mixtral MoE |
| `codellama` | 7B | Code-focused Llama |
| `phi3` | 3.8B | Microsoft Phi-3 |
## Usage

```python
from eval_lib import AnswerRelevancyMetric, BiasMetric

# Route evaluation through locally hosted models via the ollama: prefix
metric = AnswerRelevancyMetric(model="ollama:llama3", threshold=0.6)
metric = BiasMetric(model="ollama:mistral", threshold=0.7)
```
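The `ollama:` prefix routing can be pictured roughly like this (an illustrative sketch; `split_provider` and the default provider are assumptions, not eval_lib internals). Only the first colon separates the provider, since Ollama tags such as `llama3:70b` contain colons themselves:

```python
def split_provider(model: str, default: str = "openai") -> tuple[str, str]:
    """Split an id like 'ollama:llama3' into (provider, model name)."""
    provider, sep, name = model.partition(":")
    if not sep:
        # No prefix: assume the library's default provider.
        return default, model
    return provider, name
```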
## Advantages
- Free — no API costs
- Private — data never leaves your machine
- Fast — no network latency (with GPU)
## Limitations
- Quality depends on model size and capability
- Requires local compute resources (GPU recommended)
- Smaller models may produce less reliable evaluation results
- Cost tracking returns `None` (no API costs)
> **Tip:** For best results with Ollama, use larger models (70B+) for evaluation metrics. Smaller models may struggle with the nuanced reasoning required for verdict generation.