Dashboard

Eval AI Library includes an interactive web dashboard for visualizing evaluation results.

Enabling the Dashboard

During Evaluation

import asyncio

# test_cases, metrics, and evaluate are assumed to be defined/imported earlier
results = asyncio.run(evaluate(
    test_cases=test_cases,
    metrics=metrics,
    show_dashboard=True,
    session_name="my-evaluation-2024-01"
))

Setting show_dashboard=True will automatically open the dashboard in your browser after evaluation completes.

Standalone Dashboard

Launch the dashboard to view cached results from previous sessions:

eval-lib dashboard --port 14500 --host 0.0.0.0 --cache-dir .eval_cache
Flag         Default      Description
--port       14500        Server port
--host       0.0.0.0      Server host
--cache-dir  .eval_cache  Directory for cached results

Dashboard Features

  • Session Overview — summary of all evaluation runs
  • Metric Breakdown — per-metric scores and pass/fail rates
  • Test Case Details — drill down into individual test cases
  • Cost Analysis — API cost tracking per metric and session
  • Success Rate — overall and per-metric success rates
  • Visual Charts — score distributions and trends

Session Caching

Results are cached in the .eval_cache/ directory:

.eval_cache/
├── session_2024-01-15_14-30-00.json
├── session_my-evaluation.json
└── ...
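Because each session is a plain JSON file, you can enumerate previous runs without the dashboard. A minimal sketch, assuming only the `session_*.json` naming shown above (the `list_sessions` helper and its sorting choice are illustrative, not part of the library's API):

```python
from pathlib import Path

def list_sessions(cache_dir=".eval_cache"):
    """Return cached session file names, newest first."""
    files = sorted(
        Path(cache_dir).glob("session_*.json"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return [p.name for p in files]
```

Pointing the helper at a custom `--cache-dir` works the same way; pass that directory instead of the default.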

Use session_name to give meaningful names to evaluation runs:

# Named session (must run inside an async function, or via asyncio.run)
results = await evaluate(
    test_cases=test_cases,
    metrics=metrics,
    show_dashboard=True,
    session_name="rag-v2-regression-test"
)

Technology

The dashboard is built with Flask and serves a static frontend. It runs entirely on your machine and does not send evaluation data to any external service.
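To illustrate the pattern, here is a minimal sketch of a Flask app serving a static frontend alongside a small JSON endpoint. This is not the library's actual code; the `static` folder, the `/api/sessions` route, and the hardcoded response are all illustrative assumptions:

```python
from flask import Flask, jsonify

# Serve files from ./static at the site root (e.g. static/index.html -> /index.html)
app = Flask(__name__, static_folder="static", static_url_path="/")

@app.route("/api/sessions")
def sessions():
    # In a real app this would read the cache directory; here it is a stub.
    return jsonify(["session_my-evaluation.json"])

if __name__ == "__main__":
    # Match the dashboard's documented defaults
    app.run(host="0.0.0.0", port=14500)
```

The same local-only property holds here: nothing leaves the machine unless you expose the host and port yourself.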