How does Lemmy feel about "open source" machine learning, akin to the Fediverse vs Social Media?

brucethemoose@lemmy.world · edit-2 15 days ago

How does Lemmy feel about "open source" machine learning, akin to the Fediverse vs Social Media?

tkw8 · 15 days ago

I’m running Nvidia on Ubuntu. I’ll give exllama a shot.

brucethemoose@lemmy.world · edit-2 15 days ago

I’d recommend TabbyAPI with your favorite frontend, anything that works with OpenAI.

Or exui (which is what I tend to use) but is a bit more manual. text-gen-web-ui has better samplers, but its IMO more clanky and crufty, and really slow at long context.

Also, uh, you’ll have to be careful about picking a model, you have to fit it to your GPU instead of letting ollama do it for you. I view this as a positive, as it forces you to search more a more optimal fit.

tkw8 · 15 days ago

I manually specify what models to pull. I’m not running anything too crazy. My largest model is gemma27B. But I’ve worked with dolphin-mistral which was fun.

brucethemoose@lemmy.world · 15 days ago

If you have a 24GB card, just go straight to the most recent Command R, a 3.75bpw-4bpw quantization. It’s incredible, and you can do the full 131K context on a 24GB GPU easy.

Gemma 27B Is actually quite good, but “narrow.” Its super low context and seems to be hyper optimized for short chatbot-arena style questions.

tkw8 · edit-2 15 days ago

Gemma 27B Is actually quite good, but “narrow.” Its super low context and seems to be hyper optimized for short chatbot-arena style questions.

This is the stuff I love to know so thanks for sharing. I will be pulling Command R tomorrow.

brucethemoose@lemmy.world · 15 days ago

Good! So Command-R excels at “RAG” style tasks like asking questions about a huge document, continuing a long story or so on. You should also read up on its super intricate system prompt format, which can steer it quite well.

I dunno about code, I tend to use Mistral Code 22B (or deepseek v2 API) for that.

I am happy to ramble on about this stuff, just ask.