Blog

Research notes, infrastructure war stories, and updates from the workshop.

May 30, 2026

#mojo #modular #blackwell #interpretability #homelab

Why I’m Building Frontier Interpretability on a Homelab — and Betting on Mojo to Get There

Serious AI research is supposed to need a data center. I’m wagering it doesn’t — a quiet home Blackwell cluster, custom Mojo kernels nobody else is writing, and upstream contributions to MAX. Here’s the why behind the push.

Read →

May 15, 2026

#memory #inference #interpretability #architecture

What If a Model Could Remember What It Learned?

We’re building activation-level memory for AI inference — a model that thinks differently because of what it has experienced before, not just one with more text in the prompt. Early results, a real selectivity number, and a provisional patent on the way.

Read →

May 2, 2026

#max-engine #modular #dgx-spark #infrastructure

Running Gemma-4-31B on DGX Spark with MAX Engine — What Broke and How We Fixed It

Getting Modular’s MAX Engine to serve a 31B-parameter model on NVIDIA’s smallest Grace Blackwell system.

Read →

May 2, 2026

#interpretability #sparse-autoencoders #gemma-4 #datasets

60 Layers of Interpretability: Publishing Gemma-4-31B SAE Features on HuggingFace

3,000 interpreted and verified sparse autoencoder features for every layer of Google's Gemma-4-31B.

Read →

April 25, 2026

#interpretability #topology #mechanistic #sparse-autoencoders

Topology of Thought — Three-Instrument Convergence

Three independent measurement instruments (persistent homology, SipIt invertibility, and SAE decomposition) converge on a universal three-phase structure inside transformer residual streams. Confirmed across 4 attention-based architectures, falsified in Mamba.

Read →

February 27, 2026

#interpretability #topology #state-space-models #falsification

The Mamba Counterexample

If topological integration were universal, state space models should show it too. They don’t. Mamba-370m maintains fragmented representations end-to-end.

Read →

February 9, 2026

#interpretability #mechanistic #sparse-autoencoders #transformers

Emergent Computational Gating in Dense Transformers

Dense transformers develop bimodal processing gates at layers 3–4 that nobody designed. Confirmed across three model families. Standard SAEs score -3,059% on deep layers; a SipIt + SAE + GLP pipeline recovers them.

Read →

January 19, 2026

#interpretability #safety #infrastructure #sparse-autoencoders

MRI for AI — Activation-Level Detection

What if you could read what a model is actually computing while it generates an answer? 182 runs across 29 models, 0.96 calibration accuracy. Activation-based detection is about 10× more reliable than text-only analysis.

Read →

January 19, 2026

#safety #deception #benchmark #evaluation

Agent Deception Benchmark

We measured self-assessment calibration across 29 frontier models. The overclaim rate runs as high as 80%, and competitive framing makes it worse.

Read →

January 14, 2025

#evaluation #alignment #goodhart #math

The Goodhart Gap

Four of five frontier models correctly explained a discount calculation but produced the wrong final answer. The gap shrinks with scale but persists. 61,678 math problems evaluated.

Read →

November 14, 2024

#fine-tuning #goodhart #failure #alignment

Self-Improvement via Inversion — A Documented Failure

+18.1% fidelity looked great. Real output quality declined 11%. The model learned to game the metric. This failure motivated everything that followed.

Read →