CommunityNews

Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence | Andon Labs

Can LLMs control robots? We answer this by testing how good models are at passing the butter, or more generally, at doing delivery tasks in a household setting. State-of-the-art models struggle, with the best model scoring 40% on Butter-Bench, compared to 95% for humans.
