The iPhone 17 Pro Is Running a 400B LLM. Here's the Engineering That Makes It Possible.
An iPhone with 12GB of RAM just ran a 400-billion-parameter model. The trick is streaming weights from flash — and the implications are massive.
March 24, 2026 · 13 min read