
It's kinda funny how nowadays an AI with 8 billion parameters is something "small". Especially when just two years back entire racks were needed to run something giving way worse performance.


IDK, 8B-class quantized models run pretty fast on commodity laptops, with CPU-only inference. Thanks to the people who figured out quantization and reimplemented everything in C++, instead of academic-grade Python.
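The core trick behind those quantized models is storing weights in low-precision integers plus a scale factor. A minimal sketch of one possible scheme (symmetric per-tensor int8, simplified compared to what llama.cpp actually does):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization:
    keep weights as int8 plus a single float32 scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float32 weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# 4x smaller than float32, with a small reconstruction error
print(q.nbytes, w.nbytes)
print(np.abs(w - w_hat).max())
```

Real inference engines use finer-grained (per-block) scales and 4-bit formats, but the memory/accuracy trade-off is the same idea: int8 is 4x smaller than float32, and the rounding error per weight is bounded by half the scale.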


A solid chunk of Python is just wrappers around C/C++, most tensor frameworks included.
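The gap is easy to see: the same dot product in an interpreted Python loop versus NumPy, which dispatches to compiled BLAS/C code (a quick illustrative benchmark, not a rigorous one):

```python
import time
import numpy as np

x = np.random.randn(1_000_000)
y = np.random.randn(1_000_000)

t0 = time.perf_counter()
dot_py = sum(a * b for a, b in zip(x, y))  # interpreted Python loop
t_py = time.perf_counter() - t0

t0 = time.perf_counter()
dot_np = np.dot(x, y)  # dispatches to compiled C/BLAS
t_np = time.perf_counter() - t0

print(f"pure Python: {t_py:.3f}s, numpy: {t_np:.4f}s")
```

Both compute the same number; the compiled path is typically orders of magnitude faster, which is why the framework overhead mattered less than how the model code above it was written.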


I know, and yet early model implementations were quite unoptimized compared to the modern ones.




