# Dashboard
Eval AI Library includes an interactive web dashboard for visualizing evaluation results.
## Enabling the Dashboard
### During Evaluation
```python
import asyncio

# test_cases, metrics, and evaluate are assumed to be defined/imported already
results = asyncio.run(evaluate(
    test_cases=test_cases,
    metrics=metrics,
    show_dashboard=True,
    session_name="my-evaluation-2024-01",
))
```
Setting `show_dashboard=True` automatically opens the dashboard in your browser after the evaluation completes.
### Standalone Dashboard
Launch the dashboard to view cached results from previous sessions:
| Flag | Default | Description |
|---|---|---|
| `--port` | `14500` | Server port |
| `--host` | `0.0.0.0` | Server host |
| `--cache-dir` | `.eval_cache` | Directory for cached results |
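As a sketch, a standalone launch with these flags might look like the following. The `eval-ai dashboard` entry point is a placeholder, not the library's documented command; substitute the actual CLI name for your installation.

```shell
# Hypothetical CLI entry point -- replace with the library's actual command
eval-ai dashboard --port 14500 --host 0.0.0.0 --cache-dir .eval_cache
```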
## Dashboard Features
- Session Overview — summary of all evaluation runs
- Metric Breakdown — per-metric scores and pass/fail rates
- Test Case Details — drill down into individual test cases
- Cost Analysis — API cost tracking per metric and session
- Success Rate — overall and per-metric success rates
- Visual Charts — score distributions and trends
## Session Caching
Results are cached in the `.eval_cache/` directory.
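As an illustration of inspecting the cache, here is a minimal sketch; the one-JSON-file-per-session layout assumed below is a guess for demonstration, not the library's documented schema:

```python
import json
import tempfile
from pathlib import Path

def list_sessions(cache_dir):
    """List cached evaluation sessions (assumes one JSON file per session)."""
    return sorted(p.stem for p in Path(cache_dir).glob("*.json"))

# Demo with a temporary cache directory and a fake session file
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "rag-v2-regression-test.json").write_text(
        json.dumps({"success_rate": 0.9})
    )
    print(list_sessions(d))  # ['rag-v2-regression-test']
```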
Use session_name to give meaningful names to evaluation runs:
```python
# Named session
results = await evaluate(
    test_cases=test_cases,
    metrics=metrics,
    show_dashboard=True,
    session_name="rag-v2-regression-test",
)
```
## Technology
The dashboard is built with Flask and serves a static frontend. It runs locally and doesn't send data to external services.
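To illustrate the local-only serving model (this is not the library's actual code, and it uses the standard library's `http.server` rather than Flask), a static frontend served only on the loopback interface never sends data off the machine:

```python
import http.server
import tempfile
import threading
import urllib.request
from functools import partial
from pathlib import Path

with tempfile.TemporaryDirectory() as docroot:
    # A stand-in for the dashboard's static frontend
    (Path(docroot) / "index.html").write_text("<h1>Eval Dashboard</h1>")

    # Bind to 127.0.0.1 only, on an OS-assigned free port
    handler = partial(http.server.SimpleHTTPRequestHandler, directory=docroot)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    port = server.server_address[1]
    html = urllib.request.urlopen(f"http://127.0.0.1:{port}/index.html").read()
    print(html.decode())  # <h1>Eval Dashboard</h1>
    server.shutdown()
```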