The founders of the popular generative artificial intelligence benchmarking platform LMArena have said they’re founding an official company called Arena Intelligence Inc. to help them improve the ...
AI frontier models fail to provide safe and accurate output on medical topics. LMArena and DataTecnica aim to 'rigorously' test LLMs' medical knowledge. It's not clear how agents and medicine-specific ...
In the years since OpenAI launched ChatGPT to the world, kicking off the generative AI boom, developers have relied on LMArena (previously Chatbot Arena) as the default AI leaderboard. Now, Scale AI ...
The AI industry has become adept at measuring itself. Benchmarks improve, model scores rise, and every new release arrives with a list of metrics meant to signal progress. And yet, somewhere between ...