Writing

Long-form articles on LLM inference, performance engineering, low-level development, and AI compiler work for XPUs. Written here first, cross-posted to Medium.

2025
01
LLM Inference Series · §00 · Apr 23, 2025
Why Running an LLM Is Harder Than It Looks
Training gets the papers. Using models gets the API tutorials. In between sits inference — a complete systems engineering discipline that is almost entirely distinct from the ML research that produced the model.
02
LLM Inference Series · §01 · Coming soon
Memory Mapping and How a 140GB Model Actually Loads
How safetensors and GGUF make zero-copy model loading possible, and why .pt files cannot.

More articles are in progress. New pieces are published here first, then cross-posted to Medium.
