Hugging Face
gen
ginigini
15
205
21 followers · 38 following
AI & ML interests
None yet
Recent Activity
upvoted a collection 1 day ago
VKAE Accelerated
upvoted an article 1 day ago
Adding a GPU Without Building One
reacted to SeaWolf-AI's post with ❤️ 1 day ago
Adding a GPU without building one

AI is usually framed as "how smart is the model / how many GPUs did you buy." The real bottleneck is elsewhere: how efficiently you use the GPUs you already have. Training happens once; inference runs the entire time users use your product. So a service's economics come down to cost per token. Inference acceleration uses software to pull several times more out of the same GPU, the effect of plugging in one more "virtual GPU."

VIDRAFT's VKAE, measured (B200, same-harness, no quality loss):
Qwen3.5-35B-A3B (MoE): 25.7 → 601 tok/s (23.4×)
Darwin-36B-Opus (in-house MoE): 25.0 → 280.8 tok/s (11.2×)
10,000+ tok/s peak aggregate under concurrency

The key: it's reproducible, with model + serving shipped as one container.
docker pull vidraft/qwen35-vkae:601
Don't take our word for it: run it yourself. The mechanism will be released as a paper.

Leaderboard & demo: https://huggingface.co/spaces/VIDraft/vkae
Articles: https://huggingface.co/blog/FINAL-Bench/vkae-leaderboard
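The speedup ratios and the cost-per-token argument in the post can be checked with a few lines of arithmetic. The sketch below is illustrative only: the throughput figures are the ones quoted in the post, while the GPU hourly price (`HYPOTHETICAL_GPU_HOURLY_USD`) and both helper functions are assumptions introduced here, not anything published by VIDRAFT.

```python
def speedup(baseline_tps: float, accelerated_tps: float) -> float:
    """Ratio of accelerated throughput to baseline throughput."""
    return accelerated_tps / baseline_tps


def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """Cost of generating one million tokens on a single GPU at a steady rate."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


# Assumption: a made-up hourly GPU price, used only to show the shape of the math.
HYPOTHETICAL_GPU_HOURLY_USD = 5.00

# (baseline tok/s, accelerated tok/s) as quoted in the post.
measurements = {
    "Qwen3.5-35B-A3B": (25.7, 601.0),
    "Darwin-36B-Opus": (25.0, 280.8),
}

for name, (before, after) in measurements.items():
    ratio = speedup(before, after)
    cost_before = cost_per_million_tokens(HYPOTHETICAL_GPU_HOURLY_USD, before)
    cost_after = cost_per_million_tokens(HYPOTHETICAL_GPU_HOURLY_USD, after)
    print(f"{name}: {ratio:.1f}x, ${cost_before:.2f} -> ${cost_after:.2f} per 1M tokens")
```

At a fixed hourly price, cost per token falls by the same factor that throughput rises, which is the sense in which the post describes acceleration as adding a "virtual GPU."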
Organizations
None yet
spaces
2
pinned
Build error
Agents
HeartMuLa
A Family of Open Sourced Music Foundation Models
Build error
FinePDFs: Liberating 3T of the finest tokens from PDFs
models
0
None public yet
datasets
0
None public yet