I like self-hosting random stuff with Docker, and Ollama has been a great addition. I know it isn't really, but it feels on par with ChatGPT.
It works perfectly on my 4090, but I've also seen it work perfectly on my friend's M3 laptop. It feels like an excellent alternative for when you don't need the heavy weights, but want something bespoke and private.
I've integrated it with my Obsidian notes for (1) note generation and (2) fuzzy search.
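The fuzzy-search half of that is basically embeddings plus cosine similarity against Ollama's local API. Here's a minimal sketch of the idea — the endpoint and response shape are Ollama's standard `/api/embeddings`, but the embedding model name is an assumption (use whatever you've pulled), and a real setup would cache note embeddings instead of recomputing them per query:

```python
import json
import math
import urllib.request

EMBED_URL = "http://localhost:11434/api/embeddings"  # Ollama's default local endpoint

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    # Model name is an assumption -- any embedding model you've pulled works.
    req = urllib.request.Request(
        EMBED_URL,
        data=json.dumps({"model": model, "prompt": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    # Plain cosine similarity; no numpy needed for vault-sized collections.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query: str, notes: dict[str, str], top_k: int = 3) -> list[str]:
    # notes maps note path -> note text; returns the top_k closest paths.
    # Embeds every note per call for simplicity -- cache embeddings in practice.
    q = embed(query)
    ranked = sorted(notes, key=lambda p: cosine(q, embed(notes[p])), reverse=True)
    return ranked[:top_k]
```

Point `search()` at a dict built from your vault's `.md` files and it gets you semantic "fuzzy" matches that plain substring search misses.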
I've used it as an assistant for mental health and medical questions.
I'd much rather use it to query things about my music or photos than whatever the big players have planned.
I'd be interested in other people's recommendations as well. Personally I'm mostly using OpenChat with Q5_K_M quantization.
OpenChat is IMHO one of the best 7B models, and while I could run bigger ones, at least for me they monopolize too many resources to keep loaded all the time.
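The "keeping it loaded" tradeoff is actually tunable: Ollama's generate API takes a `keep_alive` field controlling how long weights stay resident after a call, and quantization variants are just model tags. A minimal sketch — the exact OpenChat tag name is an assumption on my part (check `ollama list` or the library page for what's actually published):

```python
import json
import urllib.request

GENERATE_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str, keep_alive: str = "5m") -> dict:
    # keep_alive tells Ollama how long to keep the weights in (V)RAM after the
    # call: "5m" frees them after five idle minutes, "0" unloads immediately.
    # stream=False returns one JSON object instead of a token stream.
    return {"model": model, "prompt": prompt, "stream": False, "keep_alive": keep_alive}

# Tag name assumed from Ollama's usual quant-variant naming; unload right away
# so a bigger model doesn't hog VRAM between occasional queries.
payload = build_request("openchat:7b-v3.5-q5_K_M", "Say hi", keep_alive="0")

def generate(payload: dict, url: str = GENERATE_URL) -> str:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `keep_alive="0"` you pay the load time on every call, but a 70B stops camping on your VRAM between uses.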
I would prefer personal recommendations - I've had some success with Llama 3.1 8B at 8-bit and Llama 3.1 70B at 1-bit quantization, but this is a fast-moving field, so I think it's worth sharing the details.