Variational Guide Families¶

In variational inference, the guide (also called the variational distribution) is the family of distributions used to approximate the true posterior. The choice of guide family controls the trade-off between computational cost, posterior accuracy, and the ability to capture correlations between parameters.

SCRIBE supports seven guide families, configurable per parameter, so different parameters in the same model can use different guide families. For example, gene-specific dispersion might use a normalizing flow guide while a scalar success probability uses mean-field.

At a glance¶

Guide Family	Correlations	Memory	Speed	Best For
Mean-Field	None	\(O(G)\)	Fastest	Default, most analyses
Low-Rank	Within a parameter group	\(O(Gk)\)	Fast	Gene-gene correlations
Joint Low-Rank	Across parameter groups	\(O(Gk)\)	Moderate	Cross-parameter correlations (e.g., \(\mu\) and \(p\))
Normalizing Flow	Within a parameter group (non-Gaussian)	Network size	Moderate	Multimodal, skewed, heavy-tailed posteriors
Joint Normalizing Flow	Across parameter groups (non-Gaussian)	Network size	Slower	Non-linear cross-parameter dependencies
Amortized	Data-driven	Network size	Moderate	Cell-specific parameters (capture probability)
VAE Latent	Learned latent space	Network size	Slowest	Representation learning, embeddings

Mean-Field¶

The simplest and fastest guide family. Each parameter has an independent variational distribution---no correlations are captured between parameters:

\[ q(\theta_1, \theta_2, \ldots) = q(\theta_1)\,q(\theta_2)\,\cdots \]

This is the default for all parameters when neither guide_rank nor guide_flow is specified.

Advantages:

Fast convergence, low memory
Works well when parameters are approximately independent
Good baseline for most analyses

Limitations:

Ignores correlations between genes or parameters
Can underestimate posterior uncertainty
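
The second limitation can be made concrete with a textbook case that is independent of SCRIBE. When the target is a correlated bivariate Gaussian, the factorized Gaussian that minimizes KL(q || p) has per-coordinate variance \(1/\Lambda_{ii}\), where \(\Lambda\) is the target precision matrix; this is the true marginal variance shrunk by \(1 - \rho^2\). The helper name below is made up for this illustration.

```python
def mean_field_variances(sigma1: float, sigma2: float, rho: float) -> tuple[float, float]:
    """Variances of the KL(q||p)-optimal factorized Gaussian for a 2-D Gaussian target.

    The target has marginal standard deviations sigma1, sigma2 and correlation rho.
    The optimal factor for coordinate i has variance 1 / Lambda_ii, which for a
    2x2 covariance equals sigma_i^2 * (1 - rho^2).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    shrink = 1.0 - rho**2
    return sigma1**2 * shrink, sigma2**2 * shrink


# True marginal variance is 1.0 in both coordinates; the correlation is strong.
v1, v2 = mean_field_variances(1.0, 1.0, rho=0.9)
print(f"mean-field variances: {v1:.2f}, {v2:.2f}")
```

With \(\rho = 0.9\) the factorized guide reports roughly a fifth of the true marginal variance, which is why strongly correlated posteriors are a poor fit for mean-field.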

# Mean-field is the default --- no special arguments needed
results = scribe.fit(adata, model="nbdm")

When mean-field is sufficient

For many scRNA-seq analyses, mean-field provides excellent results. Upgrade to low-rank only when downstream tasks such as differential expression (DE) or denoising benefit from capturing gene correlations.

Low-Rank¶

Captures correlations within a parameter group (e.g., between genes for the dispersion parameter \(r_g\)) using a low-rank multivariate normal approximation:

\[ \underline{\underline{\Sigma}} = \underline{\underline{W}}\,\underline{\underline{W}}^\top + \text{diag}(\underline{d}), \]

where \(\underline{\underline{W}}\) is a \(G \times k\) factor matrix and \(\underline{d}\) is a length-\(G\) vector of positive diagonal variances. The rank \(k\) controls how many correlation modes are captured, with memory scaling as \(O(Gk)\) instead of \(O(G^2)\) for a full covariance.
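
To see what this structure buys, the sketch below counts covariance parameters and draws one sample from a covariance of this form without ever building the \(G \times G\) matrix. It is plain Python for illustration, not SCRIBE's implementation, and every function name is hypothetical.

```python
import math
import random


def low_rank_cov_params(G: int, k: int) -> int:
    """Free covariance parameters for W (G x k) plus the diagonal d (length G)."""
    return G * k + G


def full_cov_params(G: int) -> int:
    """Free parameters of a dense symmetric G x G covariance."""
    return G * (G + 1) // 2


def covariance_entry(W, d, i, j):
    """Entry (i, j) of Sigma = W W^T + diag(d), computed from the factors."""
    low_rank = sum(wi * wj for wi, wj in zip(W[i], W[j]))
    return low_rank + (d[i] if i == j else 0.0)


def sample_low_rank(mu, W, d, rng):
    """Draw theta = mu + W eps + sqrt(d) * eta with eps ~ N(0, I_k), eta ~ N(0, I_G)."""
    k = len(W[0])
    eps = [rng.gauss(0.0, 1.0) for _ in range(k)]
    return [
        m + sum(w * e for w, e in zip(row, eps)) + math.sqrt(di) * rng.gauss(0.0, 1.0)
        for m, row, di in zip(mu, W, d)
    ]


G, k = 20_000, 8
print(low_rank_cov_params(G, k), "vs", full_cov_params(G))

# Tiny example: 3 genes, rank 1. Genes 0 and 1 load on the shared factor.
W = [[1.0], [0.5], [0.0]]
d = [0.1, 0.1, 0.1]
theta = sample_low_rank([0.0, 0.0, 0.0], W, d, random.Random(0))
```

In the tiny example, genes 0 and 1 share covariance \(1.0 \times 0.5 = 0.5\) through the single factor, while gene 2 stays uncorrelated with both.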

Advantages:

Captures up to \(k\) dominant modes of gene-gene correlation
Memory-efficient compared to full covariance
Important for accurate DE and denoising (cross-gene uncertainty)

Limitations:

More parameters to optimize than mean-field
May be slower to converge

# Low-rank guide with rank 8
results = scribe.fit(adata, model="nbdm", guide_rank=8)

`guide_rank`	Use case
5--10	Standard analysis, moderate gene correlations
10--20	DE analysis where cross-gene uncertainty matters
20--50	Large datasets with complex correlation structures

Joint Low-Rank¶

Extends the low-rank guide to capture correlations across parameter groups. For example, the gene-specific mean \(\mu_g\) and the success probability may be correlated in the posterior---a joint guide captures this structure.

Internally, the joint guide factorizes the distribution with the chain rule and uses the Woodbury identity to compute the conditionals efficiently:

\[ q(\theta_1, \theta_2) = q(\theta_1)\,q(\theta_2 \mid \theta_1), \]

where both the marginal and the conditional are low-rank MVN of the same rank. This extends naturally to three or more parameter groups (e.g., \(\mu\), \(p\), and gate in a ZINB model).
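
For Gaussians this factorization is exact, which the two-variable sketch below checks numerically: the joint log-density equals the marginal log-density of \(\theta_1\) plus the conditional log-density of \(\theta_2\) given \(\theta_1\). It uses scalar blocks and the textbook conditioning formulas only; SCRIBE's Woodbury-based handling of low-rank blocks is not reproduced here, and all names are illustrative.

```python
import math


def normal_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def conditional_2_given_1(theta1, mu1, mu2, var1, var2, cov12):
    """Mean and variance of theta2 | theta1 for a bivariate Gaussian."""
    slope = cov12 / var1
    return mu2 + slope * (theta1 - mu1), var2 - slope * cov12


def joint_logpdf(theta1, theta2, mu1, mu2, var1, var2, cov12):
    """Log-density of the bivariate Gaussian, evaluated directly."""
    det = var1 * var2 - cov12**2
    d1, d2 = theta1 - mu1, theta2 - mu2
    quad = (var2 * d1**2 - 2.0 * cov12 * d1 * d2 + var1 * d2**2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


params = dict(mu1=0.0, mu2=1.0, var1=4.0, var2=1.0, cov12=1.0)
theta1, theta2 = 2.0, 0.5

cond_mean, cond_var = conditional_2_given_1(theta1, **params)
# Chain rule: log q(theta1) + log q(theta2 | theta1)
chain = normal_logpdf(theta1, params["mu1"], params["var1"]) + normal_logpdf(
    theta2, cond_mean, cond_var
)
direct = joint_logpdf(theta1, theta2, **params)
print(abs(chain - direct) < 1e-9)
```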

Advantages:

Captures cross-parameter correlations (e.g., \(\mu_g\) and \(\phi\))
Supports heterogeneous dimensions (scalar + gene-specific in one group)
Each conditional is itself a low-rank MVN (efficient computation)

Limitations:

At rank \(k\), within-group expressivity is reduced vs. separate rank-\(k\) guides
More complex optimization landscape

# Joint low-rank for mu and phi
results = scribe.fit(
    adata,
    model="nbdm",
    parameterization="mean_odds",  # alias: "odds_ratio"
    unconstrained=True,
    guide_rank=10,
    joint_params="biological",  # resolves to ["phi", "mu"] for mean_odds
)

Dense vs. structured params¶

For models with many parameter groups, you can designate which parameters get full cross-gene low-rank coupling (dense_params) while others only couple locally. Both joint_params and dense_params accept semantic shorthands ("all", "biological", "mean", "prob", "gate") or explicit lists:

# mu gets cross-gene correlations; phi and gate only couple to mu per gene
results = scribe.fit(
    adata,
    zero_inflation=True,
    parameterization="mean_odds",  # matches the phi/mu/gate names in the comments
    unconstrained=True,
    guide_rank=10,
    joint_params="all",            # ["phi", "mu", "gate"] for mean_odds ZINB
    dense_params="mean",           # ["mu"]: only mu gets cross-gene coupling
)

Normalizing Flow¶

All the Gaussian-based families above (Mean-Field, Low-Rank, Joint Low-Rank) share a fundamental limitation: the variational distribution is always a (possibly correlated) Gaussian in unconstrained space. When the true posterior is multimodal, skewed, or heavy-tailed, a Gaussian guide cannot reproduce that shape and tends to misstate the uncertainty, often underestimating it. A normalizing flow guide replaces the Gaussian with a learned invertible transformation of a simple base distribution, which lets the guide represent far more flexible densities.
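
As a rough picture of what "learned invertible transformation" means, here is a single affine coupling layer in plain Python. The conditioner is a fixed toy function standing in for a neural network, and it emits one shared log-scale where real layers emit one per dimension; none of this is SCRIBE's code, and all names are hypothetical.

```python
import math


def conditioner(x_a):
    """Toy stand-in for the conditioner network: maps x_a to (log_scale, shift)."""
    h = sum(x_a) / len(x_a)
    return 0.5 * math.tanh(h), h**2


def coupling_forward(x):
    """Keep the first half; affinely transform the second half given the first."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_scale, shift = conditioner(x_a)
    y_b = [xb * math.exp(log_scale) + shift for xb in x_b]
    # The Jacobian is triangular with exp(log_scale) on the transformed diagonal.
    log_det = log_scale * len(x_b)
    return x_a + y_b, log_det


def coupling_inverse(y):
    """Invert the layer exactly: y_a equals x_a, so the conditioner output is reused."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_scale, shift = conditioner(y_a)
    return y_a + [(yb - shift) * math.exp(-log_scale) for yb in y_b]


def standard_normal_logpdf(x):
    """Log-density of independent standard normals (the base distribution)."""
    return sum(-0.5 * (math.log(2.0 * math.pi) + xi**2) for xi in x)


x = [0.3, -1.2, 0.8, 2.0]  # a draw from the base distribution
y, log_det = coupling_forward(x)
log_q_y = standard_normal_logpdf(x) - log_det  # change-of-variables formula
```

Stacking several such layers, with the roles of the two halves alternating, is what produces a non-Gaussian density while keeping both sampling and density evaluation cheap.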

Use affine coupling for scRNA-seq

In the high-dimensional setting of scRNA-seq (thousands to tens of thousands of genes), only affine coupling layers are numerically stable enough for reliable training. Spline coupling and autoregressive flows can produce NaN gradients at these dimensions because per-layer log-determinant contributions accumulate rapidly and the conditioner networks face enormous fan-in. SCRIBE recommends guide_flow="affine_coupling" for all guide-level flow usage.

Spline coupling remains the recommended choice for VAE-level flows (vae_flow_type="spline_coupling"), where the latent dimension is low (typically 10--30) and the extra expressiveness per layer is beneficial.

Stability features¶

Training coupling flows in 20,000+ dimensions is inherently challenging. SCRIBE implements several stabilization techniques inspired by Andrade 2024 (arXiv:2402.16408) that make high-dimensional affine coupling flows practical. They are enabled by default unless the table says otherwise:

Feature	What it does	Why it matters
Zero-init output	Conditioner output layer is initialized to zero so the flow starts as an identity transform	Prevents log-determinant overflow at initialization when G is large
Layer normalization	`LayerNorm` after each hidden Dense in the conditioner MLP	Stabilizes activations when fan-in is large (e.g. 20K inputs into a 64-wide bottleneck)
Residual connections	Skip connections between hidden layers of the same width	Improves gradient flow during training
Soft clamping	Smooth asymmetric arctan-based clamp on the affine log-scale	Replaces hard clipping; caps per-layer expansion to approximately 10% while preserving gradients at the boundary
LOFT	Log Soft Extension layer + trainable final affine after all coupling layers	Compresses extreme sample magnitudes logarithmically, then re-expands to match the target posterior's scale
Float64 log-det	Accumulate the log-determinant Jacobian in float64	Prevents precision loss when summing many small per-layer contributions. Off by default; recommended only for datacenter GPUs (A100, H100) with full-rate float64
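
Two of these techniques are simple enough to sketch in a few lines. The functional forms below are assumptions: an arctan soft clamp of the kind commonly used in coupling flows, and the log soft extension as I understand it from the cited paper. The bounds `pos_bound` and `neg_bound` and the threshold `tau` are hypothetical values picked for illustration; SCRIBE's actual forms and defaults may differ.

```python
import math


def soft_clamp(s, pos_bound=math.log(1.1), neg_bound=2.0):
    """Smoothly squash a raw log-scale s into (-neg_bound, pos_bound).

    Asymmetric: expansion is capped at exp(pos_bound), about 10% here, while
    contraction gets a looser bound. The gradient shrinks toward the bounds
    but never becomes exactly zero, unlike hard clipping.
    """
    bound = pos_bound if s >= 0.0 else neg_bound
    return (2.0 / math.pi) * bound * math.atan(s / bound)


def loft(z, tau=100.0):
    """Log soft extension: identity for |z| <= tau, logarithmic growth beyond."""
    a = abs(z)
    return math.copysign(min(a, tau) + math.log1p(max(a - tau, 0.0)), z)


def loft_inverse(y, tau=100.0):
    """Exact inverse of loft, re-expanding the compressed tail."""
    a = abs(y)
    if a <= tau:
        return y
    return math.copysign(tau + math.expm1(a - tau), y)


print(soft_clamp(5.0) < math.log(1.1))  # even a large raw log-scale stays under the cap
print(loft(1e6))                         # an extreme sample is compressed to order tau
```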

Usage¶

# Per-parameter affine coupling flow
results = scribe.fit(
    adata,
    model="nbdm",
    unconstrained=True,
    guide_flow="affine_coupling",
    guide_flow_num_layers=4,
    guide_flow_hidden_dims=[64, 64],
)

Parameters¶

Parameter	Default	Description
`guide_flow`	`None`	Flow type: `"affine_coupling"` (recommended for guides), `"spline_coupling"`, `"maf"`, `"iaf"`. Mutually exclusive with `guide_rank`
`guide_flow_num_layers`	`4`	Number of coupling layers
`guide_flow_hidden_dims`	`[64, 64]`	Hidden layer sizes in the conditioner MLP
`guide_flow_activation`	`"relu"`	Activation function (`"relu"`, `"gelu"`, `"silu"`, `"leaky_relu"`, ...)
`guide_flow_n_bins`	`8`	Spline bins (only for `"spline_coupling"`)
`guide_flow_mixture_strategy`	`"independent"`	`"independent"` (separate flow per component) or `"shared"` (one flow conditioned on one-hot index)
`guide_flow_zero_init`	`True`	Zero-initialize conditioner output (identity-init)
`guide_flow_layer_norm`	`True`	Apply LayerNorm in conditioner MLP
`guide_flow_residual`	`True`	Residual connections in conditioner MLP
`guide_flow_soft_clamp`	`True`	Smooth asymmetric arctan clamp on affine log-scale
`guide_flow_loft`	`True`	LOFT compression + trainable final affine
`guide_flow_log_det_f64`	`False`	Float64 log-det accumulation. Enabling it auto-promotes `enable_x64=True`

When to use flows vs. low-rank

For nearly Gaussian posteriors, a low-rank guide (LowRankGuide) is faster and about as accurate, so use it as your default when you need gene correlations. Switch to a flow guide when diagnostics suggest the posterior is substantially non-Gaussian (multimodality, skewness, heavy tails) and the low-rank approximation is visibly inadequate.
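
This page does not say which diagnostics SCRIBE provides. One generic check, assuming you have reference draws of a single unconstrained parameter (for example from a short MCMC run on a gene subset), is to look at sample skewness and excess kurtosis, both of which are zero for a Gaussian. The function below is a hypothetical helper, not part of SCRIBE.

```python
import math
import random


def skewness_and_excess_kurtosis(samples):
    """Moment-based skewness and excess kurtosis; both are 0 for a Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m3 / m2**1.5, m4 / m2**2 - 3.0


rng = random.Random(0)
gaussian_draws = [rng.gauss(0.0, 1.0) for _ in range(5000)]
skewed_draws = [math.exp(x) for x in gaussian_draws]  # log-normal: strongly right-skewed

print(skewness_and_excess_kurtosis(gaussian_draws))
print(skewness_and_excess_kurtosis(skewed_draws))
```

Values far from zero on many genes suggest that a Gaussian guide is the wrong shape; multimodality needs a separate check such as a histogram, since moments alone can miss it.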

Joint Normalizing Flow¶

Analogous to Joint Low-Rank but uses normalizing flows instead of Gaussians. Cross-parameter dependencies are captured via a chain-rule decomposition:

\[ q(\theta_1, \theta_2) = q(\theta_1)\;q(\theta_2 \mid \theta_1), \]

where each factor is a full normalizing flow. The conditional \(q(\theta_2 \mid \theta_1)\) is implemented by passing the unconstrained sample of \(\theta_1\) as a continuous context vector to the flow for \(\theta_2\). This extends naturally to three or more parameters via cumulative context.
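
The context mechanism can be illustrated with two scalars. In the sketch below, \(\theta_1\) is drawn first and then passed to a toy conditioner that sets the shift and log-scale of the transform producing \(\theta_2\). The conditioner is a fixed function rather than a trained network, and all names are hypothetical; the point is only how the joint log-density is assembled.

```python
import math


def normal_logpdf(x, mean=0.0, std=1.0):
    """Log-density of a univariate Gaussian."""
    z = (x - mean) / std
    return -0.5 * (math.log(2.0 * math.pi) + z * z) - math.log(std)


def context_net(theta1):
    """Toy conditioner that receives theta1 as context.

    Returns (shift, log_scale) for the theta2 transform. The quadratic shift
    gives a banana-shaped joint that a Gaussian guide cannot represent.
    """
    return theta1**2, -1.0


def sample_joint(z1, z2):
    """Map base noise (z1, z2) to (theta1, theta2) and return log q(theta1, theta2)."""
    theta1 = z1  # q(theta1): identity transform of a standard normal
    shift, log_scale = context_net(theta1)
    theta2 = shift + math.exp(log_scale) * z2  # q(theta2 | theta1): affine in z2
    # Chain rule plus change of variables for the conditional transform.
    log_q = normal_logpdf(z1) + normal_logpdf(z2) - log_scale
    return theta1, theta2, log_q


theta1, theta2, log_q = sample_joint(1.5, -0.4)
```

Adding a third parameter would follow the same pattern, with its conditioner receiving both \(\theta_1\) and \(\theta_2\) as context.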

Advantages:

Captures non-linear cross-parameter dependencies
Each conditional is a full flow --- more expressive than the Woodbury low-rank MVN conditionals
Supports dense_params (same semantics as Joint Low-Rank)

Limitations:

More flow parameters than Joint Low-Rank
Context-conditioned flows add dimensionality to conditioner networks
For approximately Gaussian joint posteriors, Joint Low-Rank is more parameter-efficient

# Joint affine coupling flow for mu and phi
results = scribe.fit(
    adata,
    model="nbdm",
    parameterization="mean_odds",
    unconstrained=True,
    guide_flow="affine_coupling",
    joint_params="biological",     # ["phi", "mu"] for mean_odds
    guide_flow_num_layers=4,
)

Scalar parameters in a joint flow group (e.g. phi when it is not gene-specific) automatically receive a context-conditioned Normal instead of a full flow, since coupling flows require at least two features.

Dense vs. structured params¶

Just like Joint Low-Rank, you can designate which parameters get a full flow (dense_params) while others receive diagonal Normal treatment with learned regression on the dense-flow residuals. The same shorthands apply:

# mu gets a full flow; phi and gate regress on mu per gene
results = scribe.fit(
    adata,
    zero_inflation=True,
    parameterization="mean_odds",  # matches the phi/mu/gate names in the comments
    unconstrained=True,
    guide_flow="affine_coupling",
    joint_params="all",            # ["phi", "mu", "gate"] for mean_odds ZINB
    dense_params="mean",           # only mu gets full flow
)

Mixture and dataset support¶

When a parameter has mixture components or dataset axes, the flow guide creates per-component or per-dataset flow instances. The behavior is controlled by guide_flow_mixture_strategy:

"independent" (default) --- a separate flow chain per component, each with its own parameters. Maximum expressiveness.
"shared" --- a single flow chain conditioned on a one-hot component index. More parameter-efficient when components share structure.

Amortized¶

Instead of learning separate variational parameters for each data point, an amortized guide uses a neural network to predict variational parameters from data features (sufficient statistics like total UMI count). This is particularly useful for cell-specific parameters where the number of variational parameters would otherwise scale with the number of cells.
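
A minimal sketch of the idea, assuming a one-hidden-layer network that maps log total UMI count to the location and scale of a per-cell variational distribution. The weights are arbitrary and untrained, the architecture is a placeholder, and the function names are hypothetical; the point is that the number of shared parameters does not grow with the number of cells.

```python
import math


def softplus(x):
    """Numerically stable softplus; keeps the predicted scale positive."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def leaky_relu(x, slope=0.01):
    """Leaky ReLU activation for the hidden layer."""
    return x if x > 0.0 else slope * x


def amortized_guide(total_umi, weights):
    """Predict (loc, scale) of one cell's variational distribution from its total UMI."""
    feature = math.log1p(total_umi)
    hidden = [leaky_relu(w * feature + b) for w, b in zip(weights["w1"], weights["b1"])]
    loc = sum(w * h for w, h in zip(weights["w_loc"], hidden)) + weights["b_loc"]
    raw_scale = sum(w * h for w, h in zip(weights["w_scale"], hidden)) + weights["b_scale"]
    return loc, softplus(raw_scale)


weights = {  # arbitrary illustrative values, not trained
    "w1": [0.5, -0.3], "b1": [0.0, 2.0],
    "w_loc": [0.4, 0.1], "b_loc": -1.0,
    "w_scale": [-0.2, 0.3], "b_scale": 0.0,
}
n_shared = sum(len(v) if isinstance(v, list) else 1 for v in weights.values())

cells = [500, 2_000, 15_000]  # total UMI counts for three cells
per_cell = [amortized_guide(umi, weights) for umi in cells]
print(n_shared, "shared parameters for", len(cells), "cells")
```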

Advantages:

Scales to arbitrarily many cells without per-cell parameters
Shares statistical strength across similar cells
Fewer total variational parameters

Limitations:

Requires choosing a network architecture
May not be as flexible as per-cell optimization
Training can be more sensitive to hyperparameters

The primary use case in SCRIBE is amortizing the capture probability in variable capture probability (VCP) models:

# Amortized inference for capture probability
results = scribe.fit(
    adata,
    variable_capture=True,
    amortize_capture=True,
    capture_hidden_dims=[128, 64],
    capture_activation="leaky_relu",
)

Parameter	Default	Description
`amortize_capture`	`False`	Enable amortized capture inference
`capture_hidden_dims`	`[64, 32]`	MLP hidden layer sizes
`capture_activation`	`"leaky_relu"`	Activation function
`capture_output_transform`	`"softplus"`	Output transform for positive params

VAE Latent¶

The VAE guide uses an encoder-decoder neural network architecture. The encoder maps each cell's counts to a low-dimensional latent representation, and the decoder maps back to model parameters. This provides both a variational approximation and a learned cell embedding.
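
The encoder, sampling, and decoder steps can be sketched with toy linear maps. Everything here is an assumption made for illustration: a one-dimensional latent (real models typically use 10 to 30), untrained weights, a log1p input transform, and an exponential output to keep per-gene parameters positive. It is not SCRIBE's architecture.

```python
import math
import random


def encode(counts, w_mean, w_log_var):
    """Toy linear encoder: log1p(counts) -> (mean, log-variance) of a 1-D latent."""
    features = [math.log1p(c) for c in counts]
    mean = sum(w * f for w, f in zip(w_mean, features))
    log_var = sum(w * f for w, f in zip(w_log_var, features))
    return mean, log_var


def reparameterize(mean, log_var, rng):
    """Draw z = mean + std * eps with eps ~ N(0, 1) (the reparameterization trick)."""
    return mean + math.exp(0.5 * log_var) * rng.gauss(0.0, 1.0)


def decode(z, w_dec, b_dec):
    """Toy decoder: latent z -> one positive parameter per gene."""
    return [math.exp(w * z + b) for w, b in zip(w_dec, b_dec)]


# Arbitrary, untrained weights for a 3-gene example.
w_mean, w_log_var = [0.2, -0.1, 0.3], [-0.5, 0.0, -0.2]
w_dec, b_dec = [1.0, -0.5, 0.2], [0.0, 1.0, -1.0]

cell_counts = [12, 0, 3]
mean, log_var = encode(cell_counts, w_mean, w_log_var)  # the mean doubles as the embedding
z = reparameterize(mean, log_var, random.Random(0))
gene_params = decode(z, w_dec, b_dec)
```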

Advantages:

Produces low-dimensional cell embeddings for visualization/clustering
Captures complex nonlinear relationships
Can be enhanced with normalizing flow priors for richer latent distributions

Limitations:

Most computationally expensive guide family
Requires tuning network architecture
Posterior quality depends on encoder/decoder capacity

# Standard VAE
results = scribe.fit(
    adata,
    model="nbdm",
    inference_method="vae",
    vae_latent_dim=10,
    n_steps=100_000,
    batch_size=256,
)

# Cell embeddings
embeddings = results.get_latent_embeddings(data=adata.X, n_samples=100)

Normalizing flow priors¶

For more expressive latent distributions, attach a normalizing flow:

results = scribe.fit(
    adata,
    model="nbdm",
    inference_method="vae",
    vae_latent_dim=10,
    vae_flow_type="spline_coupling",
    vae_flow_num_layers=4,
    vae_flow_hidden_dims=[64, 64],
)

Flow type	Description
`"affine_coupling"`	Fast baseline
`"spline_coupling"`	Expressive, recommended for production
`"maf"`	Fast density evaluation
`"iaf"`	Fast sampling

How to choose¶

graph TD
    Start["Start"] --> Q1{"Need cell<br/>embeddings?"}
    Q1 -->|Yes| VAE["VAE Latent"]
    Q1 -->|No| Q2{"Cell-specific<br/>parameters?"}
    Q2 -->|Yes| Amort["Amortized"]
    Q2 -->|No| Q3{"Posterior likely<br/>non-Gaussian?"}
    Q3 -->|Yes| Q3b{"Cross-parameter<br/>dependencies?"}
    Q3b -->|Yes| JointFlow["Joint Normalizing Flow"]
    Q3b -->|No| Flow["Normalizing Flow"]
    Q3 -->|No| Q4{"Need cross-parameter<br/>correlations?"}
    Q4 -->|Yes| Joint["Joint Low-Rank"]
    Q4 -->|No| Q5{"Need gene-gene<br/>correlations?"}
    Q5 -->|Yes| LR["Low-Rank"]
    Q5 -->|No| MF["Mean-Field"]
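
The same decision tree written as a function, for readers who prefer code to diagrams. The function and argument names are made up for this page; the branch order follows the flowchart exactly.

```python
def choose_guide_family(
    need_embeddings=False,
    cell_specific=False,
    non_gaussian=False,
    cross_parameter=False,
    gene_gene=False,
):
    """Return the guide family the flowchart recommends for the given needs."""
    if need_embeddings:
        return "VAE Latent"
    if cell_specific:
        return "Amortized"
    if non_gaussian:
        return "Joint Normalizing Flow" if cross_parameter else "Normalizing Flow"
    if cross_parameter:
        return "Joint Low-Rank"
    return "Low-Rank" if gene_gene else "Mean-Field"


print(choose_guide_family())                # Mean-Field
print(choose_guide_family(gene_gene=True))  # Low-Rank
```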

Rules of thumb:

Start with mean-field. It is fast and works well for most analyses.
Add low-rank when doing DE or denoising, where cross-gene uncertainty propagation matters. Rank 8--15 is usually sufficient.
Use joint low-rank for unconstrained models with hierarchical priors, where \(\mu\) and \(\phi\) (or \(p\)) are expected to correlate.
Upgrade to a normalizing flow guide when Gaussian-based guides visibly struggle (multimodality, skewness, heavy tails). Use guide_flow="affine_coupling" for high-dimensional gene parameters.
Use joint normalizing flow when cross-parameter relationships are non-linear or the joint posterior is non-Gaussian (banana-shaped, multimodal).
Use amortized for variable capture probability (VCP) models with many cells, to avoid learning separate variational parameters for every cell.
Use VAE when you also need cell embeddings for visualization or clustering.

Combining guide families in one model¶

SCRIBE's guide families are per-parameter, so a single model can use multiple families simultaneously. In scribe.fit(), the arguments combine as follows:

guide_rank + joint_params configure gene-specific parameters with low-rank or joint low-rank guides
guide_flow + joint_params configure gene-specific parameters with normalizing flow or joint normalizing flow guides
amortize_capture configures cell-specific capture probability with an amortized guide
Parameters not covered by joint_params or guide_flow/guide_rank default to mean-field

guide_flow and guide_rank are mutually exclusive

You cannot use both in the same scribe.fit() call. Choose one approach for gene-specific parameters: Gaussian-based (low-rank) or flow-based.
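
One way to keep this constraint from surfacing late is to assemble the guide arguments in a small helper that rejects the invalid combination up front. The helper below is hypothetical and not part of SCRIBE; it only uses argument names that appear on this page, and it assumes the low-rank and amortized-capture options can be passed together as the list above suggests.

```python
def build_guide_kwargs(guide_rank=None, guide_flow=None, joint_params=None,
                       amortize_capture=False):
    """Collect guide-related keyword arguments, rejecting the invalid combination."""
    if guide_rank is not None and guide_flow is not None:
        raise ValueError("guide_rank and guide_flow are mutually exclusive; pick one")
    kwargs = {}
    if guide_rank is not None:
        kwargs["guide_rank"] = guide_rank
    if guide_flow is not None:
        kwargs["guide_flow"] = guide_flow
    if joint_params is not None:
        kwargs["joint_params"] = joint_params
    if amortize_capture:
        # The amortized capture example on this page also sets variable_capture=True.
        kwargs.update(variable_capture=True, amortize_capture=True)
    return kwargs


kwargs = build_guide_kwargs(guide_rank=10, joint_params="biological", amortize_capture=True)
# Intended use (not executed here):
# scribe.fit(adata, parameterization="mean_odds", unconstrained=True, **kwargs)
```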

For more on how guide families fit into the broader inference workflow, see the Inference Methods page.