Open coordination layer for frontier AI infrastructure

LLM Inference Bottleneck Registry

What's blocking 100× cheaper, faster, more abundant AI? A versioned, community-editable list of the real bottlenecks — with known solutions, adoption blockers, and dependency edges. Built so engineers, researchers, and funders can coordinate against the same map.

v0.2 · 18 bottlenecks · last updated 2026-04-22 · CC-BY-4.0

The thesis

Public discourse on AI compute oscillates between two errors: marketing-grade "200× speedup" claims that collapse under audit, and nihilistic "we need $10 trillion in fabs" framings that paralyze coordination. The truth is in between, and it is more actionable than either.

Of the eighteen frontier-inference bottlenecks catalogued here, a consistent pattern emerges: the binding constraint on AI abundance is a shared map, not money or physics. This registry is one attempt at that map.

The companion paper

Bottleneck-Driven Projection of Frontier-Class LLM Inference on Dedicated ASICs  ·  v0.2 (2026-04-22)

What it shows

  • Calibrated baselines from public Groq, Cerebras, NVIDIA MLPerf data — not vendor marketing scalars.
  • Projection of Claude-class MoE inference on a 2027-feasible ASIC, with explicit sensitivity analysis.
  • "Agent density" as a CEO/policymaker-facing economic metric.
  • Eighteen bottlenecks, taxonomized by type and difficulty.
  • Adversarial critique section anticipating ten objections.

Headline numbers (verifiable ranges)

Decode throughput: 10-70× over H100
Energy per token: 50-200×
Cost per million tokens: 20-100×
Agent density per $1M CapEx: 3,000-12,000 streams

Each range is conditional on model size fit, batch regime, and software maturity. Single-scalar comparisons across these metrics are misleading.
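To make the caveat concrete, here is a minimal sketch of how two of the paper's economic metrics compose from underlying quantities. All numbers and function names below are illustrative assumptions, not figures from the paper; the point is only that each metric depends on different inputs, so no single scalar can rank platforms.

```python
def cost_per_million_tokens(chip_cost_usd: float, lifetime_tokens: float) -> float:
    """Amortized hardware cost per 1M decoded tokens (CapEx only; ignores
    power, networking, and utilization — an intentional simplification)."""
    return chip_cost_usd / (lifetime_tokens / 1_000_000)

def agent_density_per_million_capex(streams_per_chip: int, chip_cost_usd: float) -> float:
    """Concurrent decode streams supported per $1M of hardware spend."""
    return streams_per_chip * (1_000_000 / chip_cost_usd)

# Hypothetical ASIC: 256 concurrent streams on a $40k chip (invented figures).
density = agent_density_per_million_capex(streams_per_chip=256, chip_cost_usd=40_000)
cost = cost_per_million_tokens(chip_cost_usd=40_000, lifetime_tokens=2_000_000_000)
print(density, cost)
```

Note that a platform optimized for batch-1 latency could score poorly on agent density while winning on time-to-first-token, which is why the registry reports ranges conditioned on batch regime.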

Read paper (PDF) · Markdown source · BibTeX (.bib)


Companion papers in this series

[Figure] Radar chart of platform capabilities: six-axis platform comparison (log scale; H100 = unit). Etched Sohu excluded; vendor-claimed numbers unaudited.
[Figure] Cost per million tokens: amortized cost per million decoded tokens. Hatched red = vendor claim, unaudited; dotted = projection.

The registry

Eighteen open bottlenecks. Filter by type, status, or priority. Click any entry for full detail, known solutions, blockers, and dependencies.

Columns: ID · Name · Type · Status · Priority · Difficulty · Unlock

How to contribute

  1. Pull request against bottleneck_registry.md. Schema is documented at the top of that file; one Markdown section per entry.
  2. Status transitions (open → partial → resolved) require a citation.
  3. Blockers must be specific. "More research needed" is not a blocker; "FP4 QAT degrades long-tail tasks by ~3% on MMLU" is.
  4. Failed-attempt reports are welcome and tracked with the same rigor as wins.
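The authoritative schema lives at the top of bottleneck_registry.md; the entry below is only a hypothetical sketch built from the registry's table columns (ID, Name, Type, Status, Priority, Difficulty, Unlock). The ID, name, values, and dependency references are all invented for illustration.

```markdown
## B-07 · FP4 quantization-aware training   <!-- hypothetical ID and name -->

- **Type:** software
- **Status:** open
- **Priority:** high · **Difficulty:** medium
- **Unlock:** cheaper decode at equal quality
- **Blockers:** FP4 QAT degrades long-tail tasks by ~3% on MMLU
- **Dependencies:** B-02, B-11   <!-- invented cross-references -->
```

A status change on such an entry (e.g. open → partial) would need to cite the result that justifies it, per rule 2 above.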

View on GitHub · View the source registry (raw)

Support this work

This project is run by an independent researcher on a $20/month budget. Donations go directly to compute (LoRA training runs, benchmark replication) and registry maintenance. Every dollar is accounted for in a public ledger.

Donation channels (GitHub Sponsors, Ko-fi, etc.) pending; to be filled in by the author.