AIStackInsights

Tag: local-llm

1 post tagged with “local-llm”

How Flash-MoE Runs a 397B Parameter Model on a MacBook Pro at 4.4 tok/s

A developer ran Qwen3.5-397B, a 397-billion-parameter mixture-of-experts model, on a laptop with no Python and no frameworks. Here's exactly how.

March 23, 2026 · 12 min read

local-llm mixture-of-experts inference-optimization