The Evaluation Game: What Counts as Good AI

Start: 2025-11-10T17:30:00+01:00
Location: Station

Schedule

Mon, 10 Nov, 2025 at 05:30 pm

UTC+01:00

Location

Station | Frederiksberg, SF

Evaluation has long been the scoreboard of AI progress. It decides who’s ahead, what counts as “a good model”, and which systems are seen as breakthroughs. But as models race forward to break records one moment while failing at basic math the next, you start to wonder: are we still playing by the right rules? This talk by Ruchira Dhar from the University of Copenhagen explores how the “rules of play” in AI evaluation—our choices of metrics, datasets, and reporting—shape what we think models can do and what we expect of them. Evaluation isn’t just a step in the pipeline; it's a strategic game we play that has very real consequences for public trust, governance, and safety.
The event is hosted at Station by Effective Altruism Denmark. After the talk there will be an opportunity for discussion (both on and off topic), and snacks will be provided.
Everyone is warmly welcome!
The address is Howitzvej 30, 2000 Frederiksberg, and we'll be in The Wunderkammer. If you have trouble finding us, you can call me (Albert) at +45 42727690.