Ollama

Use locally hosted models via Ollama by prefixing the model name with ollama:. No API key is required.

Setup

  1. Install Ollama
  2. Pull a model: ollama pull llama3
  3. Make sure the Ollama server is running; it listens on port 11434 by default

# Optional: configure a custom endpoint
export OLLAMA_API_BASE_URL="http://localhost:11434/v1"
export OLLAMA_API_KEY=""  # Optional, usually not needed
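
The endpoint resolution above can be sketched as follows. This is a minimal illustration, assuming the library reads OLLAMA_API_BASE_URL and falls back to Ollama's default local endpoint when the variable is unset; the helper name is hypothetical, not part of eval_lib:

```python
import os

# Ollama's default local endpoint (port 11434), per the setup steps above
DEFAULT_OLLAMA_BASE_URL = "http://localhost:11434/v1"

def resolve_ollama_base_url() -> str:
    """Return the configured Ollama endpoint, or the local default.

    Hypothetical helper: sketches how a library might honor the
    OLLAMA_API_BASE_URL environment variable.
    """
    return os.environ.get("OLLAMA_API_BASE_URL") or DEFAULT_OLLAMA_BASE_URL
```

With no environment override, resolve_ollama_base_url() returns the default local endpoint, so a stock Ollama install works with zero configuration.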

Available Models

Any model available in Ollama can be used:

Model      | Size | Description
-----------|------|-------------------------------
llama3     | 8B   | Meta's Llama 3
llama3:70b | 70B  | Larger Llama 3 variant
mistral    | 7B   | Mistral 7B
mixtral    | 47B  | Mixtral mixture-of-experts (MoE)
codellama  | 7B   | Code-focused Llama variant
phi3       | 3.8B | Microsoft Phi-3

Usage

from eval_lib import AnswerRelevancyMetric, BiasMetric

metric = AnswerRelevancyMetric(model="ollama:llama3", threshold=0.6)
metric = BiasMetric(model="ollama:mistral", threshold=0.7)
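
The ollama: prefix is what routes the request to the local Ollama server rather than a hosted API. The parsing convention can be sketched as below; split_model_spec is a hypothetical helper for illustration, not part of eval_lib:

```python
def split_model_spec(spec: str) -> tuple[str, str]:
    """Split a "provider:model" spec into (provider, model name).

    The model name may itself contain a colon (e.g. a size tag such as
    "llama3:70b"), so only the first colon acts as the separator.
    """
    provider, sep, name = spec.partition(":")
    if not sep or not name:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, name

print(split_model_spec("ollama:llama3"))      # → ('ollama', 'llama3')
print(split_model_spec("ollama:llama3:70b"))  # → ('ollama', 'llama3:70b')
```

Note that size tags survive the split intact, which is why model="ollama:llama3:70b" selects the 70B variant rather than failing to parse.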

Advantages

  • Free — no API costs
  • Private — data never leaves your machine
  • Fast — no network round-trips; speed depends on local hardware (a GPU helps)

Limitations

  • Quality depends on model size and capability
  • Requires local compute resources (GPU recommended)
  • Smaller models may produce less reliable evaluation results
  • Cost tracking returns None (no API costs)

Tip

For best results with Ollama, use larger models (70B+) for evaluation metrics. Smaller models may struggle with the nuanced reasoning required for verdict generation.