Currently waiting on finalization of "4-bit" compression for these model types, which should address the tuning/VRAM issues.
That work is in progress at BNB (bitsandbytes) / Unsloth.
That is really the only holdup.
Otherwise, training these sparse MoEs takes VRAM in the 70 to 100 GB range, and it is really slow too.
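
For a rough sense of why 4-bit matters here, below is a back-of-the-envelope sketch of the memory needed for the weights alone. The 30B parameter count is a hypothetical placeholder, not the size of any particular model, and the estimate ignores activations, gradients, optimizer state, and quantization overhead, so real training usage will be higher. Note that for a sparse MoE it is the total parameter count that has to fit in memory, not just the experts active per token.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical total parameter count for a sparse MoE (assumption for illustration only).
HYPOTHETICAL_PARAMS = 30e9

bf16_gb = weight_memory_gb(HYPOTHETICAL_PARAMS, 16)     # 16-bit weights
four_bit_gb = weight_memory_gb(HYPOTHETICAL_PARAMS, 4)  # 4-bit weights

print(f"16-bit weights: ~{bf16_gb:.0f} GB")
print(f"4-bit weights:  ~{four_bit_gb:.0f} GB")
```

Going from 16-bit to 4-bit cuts the weight footprint by 4x, which is the part of the budget that quantized loading is meant to bring down.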