Agent Evaluation Dashboard
EvalForge v0.1.0 · 500 conversations · 3600 turns · Evaluated 2026-02-20
Coverage: 78.0%
Relevance: 85.0%
Executability: 72.0%
Practicality: 69.0%
Intent Distribution
Overall Quality Scores
Quality by Intent Category
Output Format Distribution
Long article: 35.2%
Code snippet: 24.1%
Bullet list: 18.3%
Short answer: 12.0%
Table: 5.4%
Structured data: 3.0%
Other: 2.0%
Negative Feedback Analysis
9.0% negative rate
Unclear Requirements: 36%
Hallucination: 22%
Content Issues: 22%
Incomplete: 12%
Other: 8%