EvalForge Dashboard
Unified evaluation results for generative video models and conversational agents
Video Track
5 models · 9 metrics · 200 prompts
Agent Track
500 conversations · 3600 turns · 7 intent categories
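The track counts above imply some derived scale figures. The short Python sketch below computes them from the numbers shown on the dashboard. It assumes a full evaluation grid, with every video model run on every prompt and scored on every metric; the dashboard does not state this. The constant and function names are hypothetical and used only for illustration.

```python
# Derived scale of each evaluation track, using the counts shown on the dashboard.
# Assumption (not stated on the dashboard): the video track is a full grid, i.e.
# every model was run on every prompt and scored on every metric.

VIDEO_MODELS = 5
VIDEO_METRICS = 9
VIDEO_PROMPTS = 200

AGENT_CONVERSATIONS = 500
AGENT_TURNS = 3600


def video_grid_size(models: int, prompts: int, metrics: int) -> tuple[int, int]:
    """Return (generated outputs, individual metric scores) for a full grid."""
    outputs = models * prompts
    return outputs, outputs * metrics


def mean_turns_per_conversation(turns: int, conversations: int) -> float:
    """Average number of turns in one agent conversation."""
    return turns / conversations


if __name__ == "__main__":
    outputs, scores = video_grid_size(VIDEO_MODELS, VIDEO_PROMPTS, VIDEO_METRICS)
    print(f"Video track: {outputs} generated outputs, {scores} metric scores")
    avg_turns = mean_turns_per_conversation(AGENT_TURNS, AGENT_CONVERSATIONS)
    print(f"Agent track: {avg_turns:.1f} turns per conversation")
```

Under the full-grid assumption, the video track produces 1,000 generated outputs and 9,000 metric scores. The agent track averages 7.2 turns per conversation, and that figure does not depend on the assumption.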
Key Highlights
Total Models Evaluated: 5 (generative video models)
Total Prompts: 200 (across all categories)
Average Quality Score: 76.0% (agent response quality)
Best Performer: Veo 3.1 (overall: 91.0%)