Bursty genes and the two-state promoter model
This tutorial is a follow-up to Modeling Assumptions for Single-Cell RNA-seq
with scribe. Every model in that tutorial was, at heart, a
negative binomial, and a negative binomial is always unimodal, so it can never
be bimodal. This tutorial asks the question those models structurally cannot answer:
which genes are genuinely bursty (a two-mode count distribution from a
promoter that switches slowly between OFF and ON states), and what does the
two-state (telegraph) promoter model capture that the negative binomial,
even with a gene-specific \(p_g\), cannot?
The notebook has two phases:
- Validate on synthetic data, where the ground truth is known. We generate counts from a known two-state process (some genes deeply bursty, some negative-binomial), recover the bimodality a negative binomial cannot, and work carefully through which biophysical parameters are identifiable from snapshot counts and which are not (the mean, excess Fano, burst frequency, and bursting regime are; the absolute switching rates are not).
- Apply to a real monoculture — a deeply-sequenced K562 dataset from 10x Genomics. A monoculture is essential: it isolates promoter bursting within one cell type from the very different signal of a gene being on in one cell type and off in another (a job for a mixture model). The honest result is that most genes sit in the negative-binomial limit, which the two-state model correctly reports — and we examine the handful it flags as deviating.
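To make the synthetic phase concrete, here is a minimal sketch of how two-state counts can be simulated. It is not the notebook's own code and does not use the scribe API. It relies on the standard result that the steady-state telegraph model is a Poisson-Beta mixture when rates are expressed in units of the mRNA degradation rate. The function names, parameter values, and the reading of "excess Fano" as the Fano factor minus one are all assumptions made for illustration.

```python
import numpy as np


def sample_two_state(k_on, k_off, r, n_cells, rng):
    """Draw steady-state counts from a two-state (telegraph) promoter.

    Hypothetical helper, not part of scribe. All rates are in units of the
    mRNA degradation rate. Uses the Poisson-Beta representation:
    p ~ Beta(k_on, k_off), counts ~ Poisson(r * p).
    """
    p_on = rng.beta(k_on, k_off, size=n_cells)
    return rng.poisson(r * p_on)


def theoretical_mean_fano(k_on, k_off, r):
    """Closed-form mean and Fano factor of the Poisson-Beta distribution."""
    s = k_on + k_off
    mean = r * k_on / s
    fano = 1.0 + r * k_off / (s * (s + 1.0))
    return mean, fano


rng = np.random.default_rng(0)

# Slow switching (k_on, k_off < 1): the Beta is U-shaped, so counts are bimodal.
bursty = sample_two_state(k_on=0.2, k_off=0.3, r=60.0, n_cells=20_000, rng=rng)

# Fast OFF switching (k_off >> k_on): short bursts, close to the
# negative-binomial limit with burst size r / k_off.
nb_like = sample_two_state(k_on=2.0, k_off=50.0, r=600.0, n_cells=20_000, rng=rng)

for name, counts, params in [
    ("bursty", bursty, (0.2, 0.3, 60.0)),
    ("nb_like", nb_like, (2.0, 50.0, 600.0)),
]:
    mean_th, fano_th = theoretical_mean_fano(*params)
    fano_emp = counts.var() / counts.mean()
    print(
        f"{name}: mean {counts.mean():.2f} (theory {mean_th:.2f}), "
        f"excess Fano {fano_emp - 1:.2f} (theory {fano_th - 1:.2f})"
    )
```

Both genes have nearly the same mean, so the mean alone cannot separate them. The difference lies in the shape of the distribution: the slow-switching gene piles up near zero and near `r`, while the fast-switching gene stays unimodal.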
Pre-computed outputs
This notebook requires a GPU to run. All outputs shown below were
pre-computed and exported to static HTML. To re-run it yourself, clone
the repository and execute the notebook with `marimo edit docs/tutorials/k562_cells_twostate.py`
on a GPU-enabled machine.