How Flash-MoE Runs a 397B Parameter Model on a MacBook Pro at 4.4 tok/s
A developer ran Qwen3.5-397B—a model bigger than GPT-4—on a laptop with no Python and no frameworks. Here's exactly how.
March 23, 2026 · 12 min read