Agent Evaluation Dashboard

EvalForge v0.1.0 · 500 conversations · 3600 turns · Evaluated 2026-02-20

Coverage: 78.0%
Relevance: 85.0%
Executability: 72.0%
Practicality: 69.0%

[Charts not reproduced: Intent Distribution, Overall Quality Scores, Quality by Intent Category]

Output Format Distribution

Long article: 35.2%
Code snippet: 24.1%
Bullet list: 18.3%
Short answer: 12.0%
Table: 5.4%
Structured data: 3.0%
Other: 2.0%

Negative Feedback Analysis

9.0% negative rate

Unclear Requirements: 36%
Hallucination: 22%
Content Issues: 22%
Incomplete: 12%
Other: 8%
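
The cause percentages above appear to be shares of the negative feedback rather than of all evaluated items. That reading is an assumption, since the dashboard does not state the denominator. Under that assumption, each cause's overall rate is the 9.0% negative rate multiplied by its share. The sketch below does that arithmetic and checks that the shares sum to 100%. The function name `overall_rates` is hypothetical and not part of EvalForge.

```python
import math

# Values copied from the dashboard; the unit of the negative rate
# (conversations or turns) is not stated there.
NEGATIVE_RATE = 9.0  # percent of evaluated items with negative feedback
CAUSE_SHARES = {
    "Unclear Requirements": 36.0,
    "Hallucination": 22.0,
    "Content Issues": 22.0,
    "Incomplete": 12.0,
    "Other": 8.0,
}


def overall_rates(negative_rate, cause_shares):
    """Convert each cause's share of negative feedback into a percent of all items.

    Assumes the shares are percentages of the negative feedback only.
    """
    total = sum(cause_shares.values())
    # Allow a small tolerance for rounding in the displayed percentages.
    if not math.isclose(total, 100.0, abs_tol=0.5):
        raise ValueError(f"cause shares sum to {total}, expected about 100")
    return {
        cause: round(negative_rate * share / 100.0, 2)
        for cause, share in cause_shares.items()
    }


if __name__ == "__main__":
    for cause, rate in overall_rates(NEGATIVE_RATE, CAUSE_SHARES).items():
        print(f"{cause}: {rate}% of all evaluated items")
```

Read this way, Unclear Requirements would account for roughly 3.24% of all evaluated items, and the five causes together recover the 9.0% negative rate.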