Run on 16 GB RAM?

#48
by Yoyo406 - opened

Is it possible to run the models on 16 GB of unified RAM?

Is it possible? Yes. Q4_K_M should fit just fine. Give it a try. I use an RTX 5080 and Gemma 12B is pure speed. One common mistake is getting greedy about a higher quant, only to realize you didn't account for context size, KV cache, etc.

@Yoyo406 Yes, Mk2Oracle has it right. Q4_K_M (about 7.4 GB) is the sweet spot on 16 GB; it leaves room for the app and the KV cache. The thing to watch (as he hinted) is context: Gemma 4 can go to 256K, but the KV cache grows with it and eats your headroom fast, so cap context around 8K and you'll be comfortable. Want more margin? Q3_K_M (about 6 GB) also runs fine. Skip Q6/Q8 on 16 GB: they fit on disk but leave nothing for context. Thanks @Mk2Oracle for the assist.
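
If you want to put rough numbers on the context point, here is a minimal back-of-the-envelope sketch. It uses the generic estimate for a transformer KV cache (keys plus values, per layer, per KV head, per token). The layer count, KV head count, and head dimension below are hypothetical placeholders, not values taken from the model card, so substitute the real ones from the model's config. It also assumes full attention on every layer and a 16-bit cache; models with sliding-window layers or a quantized cache would need less, so treat the result as an upper bound.

```python
GIB = 1024 ** 3

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Upper-bound KV cache size, assuming full attention on every layer."""
    # Factor of 2: one tensor for keys and one for values in each layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value

# Hypothetical architecture values for illustration only (not from the model card).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 48, 8, 256

def headroom_gib(total_ram_gib, model_gib, context_tokens):
    """RAM left after model weights and KV cache; the OS and app still need a share."""
    kv_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, context_tokens) / GIB
    return total_ram_gib - model_gib - kv_gib

# 16 GB machine with a ~7.4 GB Q4_K_M file, at three context sizes.
for ctx in (8_192, 32_768, 262_144):
    kv_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, ctx) / GIB
    left = headroom_gib(16, 7.4, ctx)
    print(f"context {ctx:>7}: KV cache ~{kv_gib:.1f} GiB, headroom {left:+.1f} GiB")
```

With these placeholder values, an 8K context costs about 3 GiB of cache and leaves roughly 5.6 GiB for everything else, while the full 256K context would need about 96 GiB. The exact figures will differ for the real model, but the linear growth with context length is the part that matters when picking a quant.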
