Have you tried a Llamafile? Not sure what platform you are using. From their readme:

 > … by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation. 
Low cost to experiment, IMO. I'm personally on macOS with an M1 chip and 64 GB of memory and it works perfectly, but the idea behind this project is to democratize access to generative AI, so it's at least possible it will run on whatever you're using.
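The basic flow is: download a .llamafile, mark it executable, and run it; it then serves llama.cpp's local HTTP server. Here's a minimal Python sketch for talking to that server once it's up. The port (8080) and the OpenAI-compatible chat route are assumptions based on llama.cpp's defaults, so check the readme for your build, and the prompt is just a placeholder:

  import requests

  # Assumes a llamafile is already running and serving the llama.cpp HTTP
  # server on its default port; endpoint and port are assumptions here.
  resp = requests.post(
      "http://localhost:8080/v1/chat/completions",
      json={
          "model": "local",  # placeholder; the server uses whatever model it was launched with
          "messages": [{"role": "user", "content": "Give me one sentence about llamafile."}],
      },
      timeout=300,
  )
  print(resp.json()["choices"][0]["message"]["content"])

The same request should work against any llama.cpp server, so swapping models is just a matter of launching a different llamafile.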


With 64 GB, can you run the 70B-size Llama models well?


I should have qualified what I meant by “works perfectly” :) No 70B for me, but I'm able to experiment with many quantized models (and I'm running a Llama model successfully; latency isn't terrible).


No, you can't. I have 128 GB and a 70B llamafile is unusable.
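A rough back-of-envelope suggests why, assuming ~2 bytes per parameter at fp16 and ~0.5 bytes per parameter at 4-bit quantization (weights only, ignoring KV cache and runtime overhead): full-precision weights don't fit in 64 GB at all, and even when a 4-bit quant does fit, CPU/unified-memory token generation tends to be too slow to feel usable.

  # Back-of-envelope memory estimate for a 70B-parameter model (weights only).
  # Bytes-per-parameter figures are approximations, not measured numbers.
  params = 70e9
  fp16_gb = params * 2.0 / 1e9  # ~140 GB: does not fit in 64 GB of RAM
  q4_gb = params * 0.5 / 1e9    # ~35 GB: fits, but generation is bandwidth-bound
  print(f"fp16 weights: ~{fp16_gb:.0f} GB, 4-bit weights: ~{q4_gb:.0f} GB")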



