model w/o quantized?

#43
by junujunu - opened

I want to serve it on vLLM.

Sorry about that. There's no unquantized safetensors version up yet, only the GGUF. The honest reason is just
bandwidth: my home internet has been bad lately and I haven't managed to push the full fp16 master, even as of today.
I'll get it uploaded as soon as I have a stable enough connection.

You're right to want the safetensors for vLLM: GGUF support there is limited, so the fp16 master is what you'll want.
Once it's up you can point vLLM straight at it.
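
For reference, here is a rough sketch of what pointing vLLM at the repo could look like once the upload exists. The repo ID `your-username/your-model-fp16` is a placeholder, not a real repository, and the helper name `build_vllm_serve_command` is made up for illustration. It only assembles the `vllm serve` command line; nothing is launched.

```python
import shlex


def build_vllm_serve_command(model_id, dtype="float16", port=8000):
    """Assemble the argv list for `vllm serve`.

    model_id is a Hugging Face repo ID or a local directory that
    holds the safetensors weights. The default dtype assumes the
    fp16 master; adjust it if the upload ends up in another precision.
    """
    return [
        "vllm", "serve", model_id,
        "--dtype", dtype,
        "--port", str(port),
    ]


# Placeholder repo ID: replace it with the real one once the upload is live.
command = build_vllm_serve_command("your-username/your-model-fp16")
print(shlex.join(command))
```

Run the printed command in a shell where vLLM is installed to start an OpenAI-compatible server on the chosen port. Depending on the model and your GPU you may need extra flags, so treat this as a starting point rather than a tested recipe.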

In the meantime, if you want to experiment right now: a community member (tepirale on HF) converted the v2 GGUF back
to safetensors and even grafted the vision/audio towers back on from the base model, so there's a usable safetensors
version on their profile. It's reconstructed from the GGUF so it's slightly lossy vs my original master, but it should
serve fine on vLLM for testing.
