6 comments

  • bluejay2387 a day ago

    2x 3090s running Ollama and vLLM... Ollama for most stuff and vLLM for the few models that I need to test that don't run on Ollama. Open WebUI as my primary interface. I just moved to Devstral for coding using the Continue plugin in VSCode. I use Qwen 3 32B for creative stuff and Flux Dev for images. Gemma 3 27B for most everything else (slightly less smart than Qwen, but it's faster). Mixed Bread for embeddings (though apparently NV-Embed-v2 is better?). Pydantic as my main utility library. This is all for personal stuff. My stack at work is completely different and driven more by our Legal teams than by technical decisions.
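
    A minimal sketch of how a stack like this can be scripted against Ollama from Python, assuming the default localhost:11434 endpoint and that the gemma3:27b and mxbai-embed-large (Mixed Bread) tags are already pulled; the Idea schema and helper names are made up for illustration:

      import requests
      from pydantic import BaseModel

      OLLAMA = "http://localhost:11434"   # Ollama's default local endpoint

      class Idea(BaseModel):
          # hypothetical schema -- Pydantic validates whatever JSON the model returns
          title: str
          summary: str

      def chat_json(model: str, prompt: str) -> Idea:
          # non-streaming chat call; format="json" asks Ollama for JSON output
          r = requests.post(f"{OLLAMA}/api/chat", json={
              "model": model,
              "messages": [{"role": "user", "content": prompt}],
              "format": "json",
              "stream": False,
          })
          r.raise_for_status()
          return Idea.model_validate_json(r.json()["message"]["content"])

      def embed(model: str, text: str) -> list[float]:
          # embeddings endpoint; pass the tag of whichever embedder you pulled
          r = requests.post(f"{OLLAMA}/api/embeddings",
                            json={"model": model, "prompt": text})
          r.raise_for_status()
          return r.json()["embedding"]

      print(chat_json("gemma3:27b", "Give me a blog post idea as JSON with title and summary."))
      print(len(embed("mxbai-embed-large", "hello world")))

    The convenient part of a setup like this is that Open WebUI, the Continue plugin, and one-off scripts can all point at the same local Ollama server.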

  • v5v3 12 hours ago

    Ollama on an M1 MacBook Pro, but will be moving to an Nvidia GPU setup.

  • fazlerocks 2 days ago

    Running Llama 3.1 70B on 2x 4090s with vLLM. Memory is a pain, but it works decently for most stuff.

    Tbh for coding I just use the smaller ones like CodeQwen 7B. Way faster and good enough for autocomplete. I only fire up the big model when I actually need it to think.

    The annoying part is keeping everything updated; a new model drops every week, and half of them don't work with whatever you're already running.
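
    For reference, a rough sketch of what serving a 70B across two 4090s with vLLM's offline Python API can look like. Assumptions: a 4-bit AWQ checkpoint (an unquantized 70B won't fit in 48 GB of VRAM), and the repo name below is just a placeholder for whatever quantized build is actually in use:

      from vllm import LLM, SamplingParams

      # Placeholder: any 4-bit AWQ build of Llama 3.1 70B Instruct
      llm = LLM(
          model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
          tensor_parallel_size=2,        # split the model across both 4090s
          gpu_memory_utilization=0.92,   # fraction of each GPU vLLM may claim
          max_model_len=8192,            # cap context so the KV cache fits
      )

      params = SamplingParams(temperature=0.2, max_tokens=512)
      out = llm.generate(["Explain tensor parallelism in two sentences."], params)
      print(out[0].outputs[0].text)

    Capping max_model_len is usually what keeps the KV cache from eating whatever memory headroom is left after the weights.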

  • runjake a day ago

    Ollama + M3 Max 36GB Mac. Usually with Python + SQLite3.

    The models vary depending on the task. DeepSeek distilled has been a favorite for the past several months.

    I use various smaller (~3B) models for simpler tasks.
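
    A small sketch of the Ollama + Python + SQLite3 combination, standard library only; the deepseek-r1:8b tag stands in for whichever distilled DeepSeek build is installed, and the runs table is just an example:

      import json
      import sqlite3
      import urllib.request

      DB = sqlite3.connect("runs.db")   # hypothetical local results database
      DB.execute("CREATE TABLE IF NOT EXISTS runs (model TEXT, prompt TEXT, reply TEXT)")

      def ask(model: str, prompt: str) -> str:
          # one-shot, non-streaming call to Ollama's local /api/generate endpoint
          req = urllib.request.Request(
              "http://localhost:11434/api/generate",
              data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
              headers={"Content-Type": "application/json"},
          )
          with urllib.request.urlopen(req) as resp:
              reply = json.loads(resp.read())["response"]
          # keep a record of every run in SQLite
          DB.execute("INSERT INTO runs VALUES (?, ?, ?)", (model, prompt, reply))
          DB.commit()
          return reply

      print(ask("deepseek-r1:8b", "Give me three SQLite pragmas worth knowing."))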

  • gabriel_dev a day ago

    Ollama + Mac mini 24GB (inference)

  • xyc a day ago

    recurse.chat + M2 Max Mac