EvalForge Dashboard

Unified evaluation results for generative video models and conversational agents

Video Track

5 models · 9 metrics · 200 prompts

Agent Track

500 conversations · 3,600 turns · 7 intent categories

Key Highlights

Total Models Evaluated

5

Generative video models

Total Prompts

200

Across all categories

Avg Quality Score

76.0%

Agent response quality

Best Performer

Veo 3.1

Overall: 91.0%