
What you suggest is not impossible, but it flies in the face of all currently available evidence and of what every leading lab says and does. We know they are actively looking for ways to do things more efficiently; OpenAI alone has done a couple of releases to that effect. Because it is so easy to switch providers, if even one lab found a way to run a small model that competed with the big ones, it would simply win the entire space. So everyone has to be looking for that (and clearly they are, given that all of them offer smaller versions of their models).

Scepticism is fine if it's plausible. If not, it's conspiratorial.



There are at least two different optimizations happening:

1) optimizing the model training

2) optimizing the model operation

The holy grail of the $1B spend is a proprietary model that costs a lot of money to train, almost nothing to operate, and benchmarks and chats better than anyone else's.

OpenAI’s optimizations fall into the latter category. The risk to the business model is in the former — if someone can train a world-beating model without lots of money, it’s a tough day for the big players.
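To make that economics argument concrete, here is a back-of-envelope sketch in Python. Every number in it is a made-up placeholder, not any lab's actual figure; the point is just that a huge one-time training cost only pays off when the per-token serving margin is positive and volume is enormous:

    # All figures are hypothetical placeholders, not real lab numbers.
    TRAIN_COST = 1e9             # one-time training spend, $
    SERVE_COST_PER_1K = 0.002    # marginal cost to serve 1K tokens, $
    PRICE_PER_1K = 0.010         # price charged per 1K tokens, $

    margin_per_1k = PRICE_PER_1K - SERVE_COST_PER_1K
    breakeven_tokens = TRAIN_COST / margin_per_1k * 1000
    print(f"tokens to recoup training: {breakeven_tokens:.2e}")
    # ~1.25e14 tokens at these numbers. If serving cost ever exceeds
    # price, no amount of volume recoups the training spend.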


I disagree. Not categorically, because you're kind of right, but enough to comment. OpenAI doesn't believe in optimizing the training costs of AI; it believes in optimizing (read: maxing out) the training phase. Their billions go to collecting, collating, and transforming as much training data as they can get their hands on.

To see what optimizing model operation looks like, Groq is a good example. OpenAI isn't (yet) obviously doing that kind of optimization, though I'm sure they're working on it internally.
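For a concrete (if much smaller-scale) example of the operation-side category, here is a sketch using PyTorch's dynamic int8 quantization, one standard technique for cutting inference cost without touching training. This is purely illustrative of the category; it is not a claim about what Groq or OpenAI actually do (Groq's approach is custom hardware):

    import io
    import torch
    import torch.nn as nn

    def state_dict_size(m):
        # Serialized size of the model's parameters, in bytes.
        buf = io.BytesIO()
        torch.save(m.state_dict(), buf)
        return buf.getbuffer().nbytes

    model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(),
                          nn.Linear(4096, 4096))
    # Replace fp32 Linear weights with packed int8 weights; activations
    # are quantized dynamically at inference time on CPU.
    qmodel = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8)

    print(state_dict_size(model), "->", state_dict_size(qmodel))
    # Roughly 4x smaller weights, and correspondingly less memory
    # bandwidth per forward pass -- same trained model, cheaper to run.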


My argument wasn’t that the well-funded entities were optimizing to reduce training costs, but the opposite: they need creative ways to spend $1B that provide some tangible advantage. But they also need operating costs to be low, or they lose money on every query and have to somehow make it up on volume.

I would roll data acquisition and cleaning into training costs for this purpose, because what else is the data for if not training?

If 4o wasn’t an optimization for model operation costs, what was it?



