Writing

Essays, paper deep dives, and architecture notes. Use the topic filters to jump to what interests you.

How the warp schedulers, tensor cores, and instruction pipelines inside an H100 streaming multiprocessor (SM) keep thousands of threads in flight.

GPU Architecture · Large-Scale Distributed Training · Compute Architecture · Research Deep Dive

A tour of caching, shared memory, and register files on modern accelerators, with tips for keeping tensor cores fed.

GPU Architecture · Large-Scale Distributed Training · Memory Systems

Covers the evidence lower bound (ELBO) derivation and the mechanics of variational autoencoders (VAEs) for generative modeling.

Generative Models · Research Deep Dive

Byte Pair Encoding

Walkthrough of the BPE tokenization algorithm that powers modern language models.

Natural Language Processing · ML Concepts

Analyzes early text-to-image transformers, discrete VAEs, and zero-shot generation techniques.

Computer Vision · Diffusion Models · Research Deep Dive · Generative Models