Dashboard

Eval AI Library includes an interactive web dashboard for visualizing evaluation results.

Enabling the Dashboard

During Evaluation

import asyncio

# test_cases, metrics, and evaluate are assumed to be defined/imported earlier
results = asyncio.run(evaluate(
    test_cases=test_cases,
    metrics=metrics,
    show_dashboard=True,
    session_name="my-evaluation-2024-01"
))

Setting show_dashboard=True will automatically open the dashboard in your browser after evaluation completes.

Standalone Dashboard

Launch the dashboard to view cached results from previous sessions:

eval-lib dashboard --port 14500 --host 0.0.0.0 --cache-dir .eval_cache
Flag         Default      Description
--port       14500        Server port
--host       0.0.0.0      Server host
--cache-dir  .eval_cache  Directory for cached results

Dashboard Features

  • Session Overview — summary of all evaluation runs
  • Metric Breakdown — per-metric scores and pass/fail rates
  • Test Case Details — drill down into individual test cases
  • Cost Analysis — API cost tracking per metric and session
  • Success Rate — overall and per-metric success rates
  • Visual Charts — score distributions and trends

Session Caching

Results are cached in the .eval_cache/ directory:

.eval_cache/
├── session_2024-01-15_14-30-00.json
├── session_my-evaluation.json
└── ...
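Because each session is a plain JSON file, you can enumerate previous runs without the dashboard. A minimal sketch, assuming only the `session_*.json` naming shown above (the `list_sessions` helper and its sorting choice are illustrative, not part of the library's API):

```python
from pathlib import Path

def list_sessions(cache_dir=".eval_cache"):
    """Return cached session file names, newest first."""
    files = sorted(
        Path(cache_dir).glob("session_*.json"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return [p.name for p in files]
```

Pointing the helper at a custom `--cache-dir` works the same way; pass that directory instead of the default.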

Use session_name to give meaningful names to evaluation runs:

# Named session (must run inside an async function, or via asyncio.run)
results = await evaluate(
    test_cases=test_cases,
    metrics=metrics,
    show_dashboard=True,
    session_name="rag-v2-regression-test"
)

Technology

The dashboard is built with Flask and serves a static frontend. It runs entirely on your machine and does not send evaluation data to any external service.
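To illustrate the pattern, here is a minimal sketch of a Flask app serving a static frontend alongside a small JSON endpoint. This is not the library's actual code; the `static` folder, the `/api/sessions` route, and the hardcoded response are all illustrative assumptions:

```python
from flask import Flask, jsonify

# Serve files from ./static at the site root (e.g. static/index.html -> /index.html)
app = Flask(__name__, static_folder="static", static_url_path="/")

@app.route("/api/sessions")
def sessions():
    # In a real app this would read the cache directory; here it is a stub.
    return jsonify(["session_my-evaluation.json"])

if __name__ == "__main__":
    # Match the dashboard's documented defaults
    app.run(host="0.0.0.0", port=14500)
```

The same local-only property holds here: nothing leaves the machine unless you expose the host and port yourself.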