AIStackInsights

Tag: mixture-of-experts

2 posts tagged with “mixture-of-experts”

The iPhone 17 Pro Is Running a 400B LLM. Here's the Engineering That Makes It Possible.

An iPhone with 12GB of RAM just ran a 400-billion-parameter model. The trick is streaming weights from flash, and the implications are massive.

March 24, 2026 · 13 min read

on-device-ai apple-neural-engine llm-inference

How Flash-MoE Runs a 397B-Parameter Model on a MacBook Pro at 4.4 tok/s

A developer ran Qwen3.5-397B, a model bigger than GPT-4, on a laptop with no Python and no frameworks. Here's exactly how.

March 23, 2026 · 12 min read

local-llm mixture-of-experts inference-optimization