NB Log-Normal Model¶
The Negative Binomial Log-Normal (NBLN) model is the heavier-tailed sibling of the Poisson Log-Normal model. It shares the same GRN-derived multivariate Gaussian on log-rates, but replaces PLN's Poisson observation channel with a Negative Binomial — giving each gene an explicit dispersion parameter \(r_g\) that captures bursty transcription beyond what the Gaussian prior alone can produce.
Motivation¶
PLN's Poisson observation channel implicitly assumes that, conditional on the log-rate, gene counts are equi-dispersed (variance = mean). This is exact for the deterministic mass-action limit of bursty transcription, but real scRNA-seq data shows substantially over-dispersed per-cell counts even after the log-normal prior absorbs gene-to-gene rate variation:
-
Bursty transcription persists. The Gamma-Poisson mixture representation of bursty gene expression (Beta Negative Binomial) survives the GRN coupling: each gene's per-cell count is still NB-distributed given its log-rate, not Poisson.
-
PLN underfits the low-count regime. Genes with mean expression \(\sim 1\) UMI per cell show heavier-than-Poisson tails. PLN tries to absorb this through inflated \(\Sigma_{gg}\), which corrupts the cross-gene covariance structure.
-
The NB is the right per-gene channel. Adding a per-gene dispersion \(r_g\) decouples gene-intrinsic burst overdispersion from the cross-gene regulatory signal. The two get clean separate parameters.
The NBLN preserves PLN's log-concave posterior geometry, biology- informed capture handling, and Laplace inference path while restoring the Gamma-Poisson burst kinetics at the observation channel.
Definition¶
The NBLN distribution is a hierarchical model for a \(G\)-dimensional count vector:
where \(\mathrm{NegBinomial}_{\text{mean}}(\mu, r)\) is the NB parameterization with mean \(\mu\) and concentration (dispersion) \(r\) — equivalently, variance \(\mu + \mu^2 / r\). The latent \(\underline{x} \in \mathbb{R}^G\) is the log-rate vector and \(\underline{\underline{\Sigma}}\) encodes cross-gene regulatory correlations.
The PLN limit is \(r_g \to \infty\), where the NB collapses to its Poisson mean and the model recovers PLN exactly. The independent-gene NB limit is \(\underline{\underline{\Sigma}} \to 0\), where each gene becomes a Gamma-Poisson mixture with no cross-gene coupling.
Moments¶
The NBLN has tractable closed-form moments derived from the law of total variance:
| Quantity | Expression |
|---|---|
| Mean | \(\langle u_g \rangle = \exp(\mu_g + \tfrac{1}{2}\Sigma_{gg})\) |
| Variance | PLN variance \(+\;\langle u_g \rangle^2 / r_g\) — extra term is the NB-over-Poisson burst contribution |
| Covariance | \(\text{Cov}(u_g, u_{g'}) = \exp(\mu_g + \mu_{g'} + \tfrac{1}{2}(\Sigma_{gg} + \Sigma_{g'g'}))[\exp(\Sigma_{gg'}) - 1]\) |
Cross-gene covariance is identical to PLN: the per-gene dispersion \(r_g\) only inflates within-gene variance, not between-gene covariance.
Low-rank covariance¶
The covariance is parameterized the same way as PLN:
The latent dimension \(k\) (set via the latent_dim kwarg) is the
rank of the low-rank regulatory signal. With NBLN's Phase-3
loadings-shrinkage priors, \(k\) can be set
generously and the prior selects the effective rank adaptively.
Capture as a log-rate offset¶
NBLN inherits PLN's clean capture handling — capture enters as an additive log-rate offset \(\eta^{(c)}\), with the same truncated-normal biology-informed prior:
The capture model is structurally identical to PLN's, just routed through the NB likelihood instead of the Poisson likelihood.
Log-concave posterior + per-cell rigid-translation gauge¶
The per-cell conditional posterior on \(\underline{x}\) is strictly log-concave (same proof structure as PLN — the NB log-likelihood is concave in \(x_g\) at fixed \(r_g\), plus the Gaussian prior contributes another strictly concave term). This guarantees a unique MAP and validates the Laplace approximation.
NBLN has one structural complication PLN doesn't: a per-cell rigid-translation gauge. Under the transformation \((x_c, \eta_c) \to (x_c + \Delta_c \mathbf{1}, \eta_c + \Delta_c)\) for an arbitrary scalar \(\Delta_c\) per cell, the NB likelihood is exactly invariant (because \(x_g - \eta\) appears in the rate and the shift cancels). This is a \(C\)-dimensional degeneracy (one per cell) that's broken only by the priors on \(\underline{x}\) (MVN) and \(\eta\) (TruncatedNormal). Compositional predictions and the gauge-invariant projection \(\underline{\underline{W}}_\perp\) are exactly invariant under this gauge; the absolute log-rates and the raw loadings \(\underline{\underline{W}}\) are not.
Practical consequence: raw \(\underline{\underline{W}}\) carries
a rank-1 all-ones-direction contamination that reflects cell-scaling
slop rather than biology. For cross-gene correlation analysis, always
use the gauge-invariant projection
\(\underline{\underline{W}}_\perp = \underline{\underline{W}} -
\overline{\underline{\underline{W}}}\) (see
Loadings Shrinkage for the full theorem). The
get_W_compositional() and get_gauge_diagnostics() accessors expose
this projection and quantify how much gauge slop the raw fit carries.
SVI-cascade + freeze for the gauge¶
Because the per-cell gauge is \(C\)-dimensional rather than 1D (as in PLN), even the MVN+TruncN priors leave a near-flat ridge in the posterior at default hyperparameters. SCRIBE addresses this with a two-phase informative-prior cascade that pins the gauge structurally:
Phase 1: SVI cascade as soft prior¶
Fit an NBVCP-SVI source first, derive empirical Gaussian priors on \(r_g\), \(\mu_g\), and \(\eta_c\) from the SVI posterior samples, and inject them into the NBLN-Laplace loss. The priors' scales (not just locations) propagate into the Hessian:
import scribe, numpy as np
# Fit NBVCP-SVI (cascade source)
svi_results = scribe.fit(
adata, model="nbvcp", parameterization="mean_odds",
priors={"capture_efficiency": (np.log(100_000), 0.5)},
inference_method="svi", n_steps=50_000,
)
# NBLN-Laplace with cascade as informative prior
laplace_results = scribe.fit(
adata, model="nbln", inference_method="laplace",
informative_priors_from=svi_results,
informative_priors_tau=1.0, # default
n_steps=20_000,
)
Phase 2: Freeze for structural gauge fix¶
Promote selected cascade-derived parameters from soft priors to hard freezes: the NBLN M-step does not refine them. Default freeze is \((\underline{r}, \underline{\eta})\), which pins the per-cell rigid-translation gauge exactly:
laplace_results = scribe.fit(
adata, model="nbln", inference_method="laplace",
informative_priors_from=svi_results,
informative_priors_freeze=("r", "eta"), # default
n_steps=20_000,
)
The freeze excludes \(r_g\) and \(\eta_c\) from the optimizer's
parameter dict, so they cannot drift. Frozen parameters retain the
cascade's full SVI posterior (not a moment-matched Gaussian summary)
— get_distributions() and PPC paths route through
result.cascade_source to preserve full guide fidelity.
| Freeze level | Freeze set | When to use |
|---|---|---|
| Level 1 | () |
No freeze. Soft cascade only. Use for diagnostic comparison. |
| Level 2 | ("r",) |
Freeze dispersion. Use when NBVCP's \(\eta\) is suspect. |
| Level 3 | ("r", "eta") |
Default. Pins the gauge structurally. Recommended for production. |
| Level 4 | ("r", "mu", "eta") |
NBLN learns only \(\underline{\underline{W}}\) and \(\underline{d}\) on top of NBVCP. |
Descriptive aliases for freeze keys
informative_priors_freeze accepts either the internal short
names shown above or their descriptive aliases:
"r"↔"dispersion""mu"↔"expression"or"mean_expression""eta"↔"capture_efficiency"
So ("dispersion", "capture_efficiency") is equivalent to
("r", "eta"). Both forms work; the descriptive form matches
the convention used in priors={"capture_efficiency": ...}.
Passing both an internal name and its alias (e.g.
("r", "dispersion")) raises ValueError.
Loadings shrinkage for adaptive rank selection¶
Even with the Phase-2 freeze, at generous latent_dim (e.g. 32) the
loadings matrix \(\underline{\underline{W}}\) can still overfit by
filling unused factors with noise — manifested as spurious
cross-block diagonal contours in the
compositional corner PPC. The Phase-3
loadings-shrinkage prior adds an adaptive
rank-selection prior on the columns of
\(\underline{\underline{W}}_\perp\), letting the data pick the
effective rank:
laplace_results = scribe.fit(
adata, model="nbln", inference_method="laplace",
informative_priors_from=svi_results,
informative_priors_freeze=("r", "eta"),
priors={
"capture_efficiency": (np.log(100_000), 0.5),
"loadings": {"type": "horseshoe_columnwise", "tau_scale": 1.0},
},
latent_dim=16, # generous; prior picks effective rank
n_steps=20_000,
)
print(laplace_results.w_prior_diagnostics["effective_rank"]) # e.g. 3-5
See the Loadings Shrinkage theory page for
the strategy catalog (gaussian, horseshoe_columnwise,
neg_columnwise), calibration workflow, and math details.
Comparison with PLN and LNM¶
| Property | LNM | PLN | NBLN |
|---|---|---|---|
| Observation channel | Multinomial composition | Poisson on rate | NB on rate (gene-specific dispersion) |
| Per-gene overdispersion | Implicit via composition | Equi-dispersed (Var = mean) | Explicit via \(r_g\) |
| Bursty transcription | Absorbed into log-normal prior | Absorbed into log-normal prior | Native via NB observation channel |
| Gauge dimension | 0 (ALR-space W) | 1 (global \(\mu\)-\(\eta\) shift) | \(C\) (per-cell rigid translation) |
| Gauge resolution | Parametric (ALR reference) | Capture prior + Gaussian prior | Phase-2 cascade freeze on \(\eta\) |
| Total count | Separate NB on \(u_T\) | Emergent from per-gene Poissons | Emergent from per-gene NBs |
| Best for | Direct compositional analysis | Heavy-tailed totals, capture clean | Bursty data, cross-gene correlation, cascade-freeze fits |
Posterior predictive checks¶
NBLN supports three PPC conditioning regimes, plus a compositional PPC family that specifically tests the gauge-invariant signal:
| Mode | What it tests | Conditioning |
|---|---|---|
ppc_level="marginal" |
Honest full generative test | Fresh \(\underline{x}\), \(\eta\), totals from the model |
ppc_level="library_anchored" (default) |
Compositional fit | Fresh composition, totals from data |
ppc_level="per_cell" |
Per-cell predictive | Per-cell MAP latents + observation noise |
plot_compositional_corner_ppc |
Gauge-invariant compositions | Pre-observation-noise simplex draws vs empirical + pseudobulk |
For cascade-frozen fits, PPC routes \(r_g\), \(\eta_c\) (and
optionally \(\mu_g\)) through result.cascade_source to preserve the
full SVI posterior structure — no moment-matching.
When to use NBLN¶
| Use NBLN when… | Use PLN when… |
|---|---|
| You have a working NBVCP-SVI fit and want a cascade-frozen refinement | No NBVCP fit available; PLN as a standalone Laplace fit |
| Low-count regime where Poisson under-fits the per-gene tail | Counts are well-modeled by Poisson on the rate |
| Cross-gene regulatory correlation recovery is the primary goal | Compositional structure is the primary downstream interest |
| You need both per-gene dispersion and cross-gene covariance with structural gauge identifiability | Posterior log-concavity is more important than gauge structure |
Practical recommendation
For modern cross-gene correlation analysis on bursty scRNA-seq data, the recommended pipeline is:
- Fit
model="nbvcp"with SVI as the cascade source. - Fit
model="nbln"withinference_method="laplace",informative_priors_from=svi_results, default freeze("r", "eta"), andpriors={"loadings": {"type": "horseshoe_columnwise"}}for adaptive rank selection. - Validate via
plot_ppc(ppc_level="marginal")andplot_compositional_corner_ppc. - Extract correlations via
get_W_compositional()— the gauge-invariant projection.
See Loadings Shrinkage and the Model Selection guide for the decision tree.