
It's kinda funny how nowadays an AI with 8 billion parameters is something "small". Especially when just two years back entire racks were needed to run something giving way worse performance.


IDK, 8B-class quantized models run pretty fast on commodity laptops, with CPU-only inference. Thanks to the people who figured out quantization and reimplemented everything in C++, instead of academic-grade Python.
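The core trick behind those quantized models is storing weights in low-precision integers plus a scale factor. A minimal sketch of one possible scheme (symmetric per-tensor int8, simplified compared to what llama.cpp actually does):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization:
    keep weights as int8 plus a single float32 scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float32 weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# 4x smaller than float32, with a small reconstruction error
print(q.nbytes, w.nbytes)
print(np.abs(w - w_hat).max())
```

Real inference engines use finer-grained (per-block) scales and 4-bit formats, but the memory/accuracy trade-off is the same idea: int8 is 4x smaller than float32, and the rounding error per weight is bounded by half the scale.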


A solid chunk of Python is just wrappers around C/C++, most tensor frameworks included.
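The gap is easy to see: the same dot product in an interpreted Python loop versus NumPy, which dispatches to compiled BLAS/C code (a quick illustrative benchmark, not a rigorous one):

```python
import time
import numpy as np

x = np.random.randn(1_000_000)
y = np.random.randn(1_000_000)

t0 = time.perf_counter()
dot_py = sum(a * b for a, b in zip(x, y))  # interpreted Python loop
t_py = time.perf_counter() - t0

t0 = time.perf_counter()
dot_np = np.dot(x, y)  # dispatches to compiled C/BLAS
t_np = time.perf_counter() - t0

print(f"pure Python: {t_py:.3f}s, numpy: {t_np:.4f}s")
```

Both compute the same number; the compiled path is typically orders of magnitude faster, which is why the framework overhead mattered less than how the model code above it was written.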


I know, and yet early model implementations were quite unoptimized compared to the modern ones.




