# The scribe.fit() Interface

`scribe.fit()` is the single entry point for all SCRIBE inference. Every
model, parameterization, prior, inference engine, and guide family is
configured through keyword arguments to this one function. This page walks
through every parameter group, explains when and why each matters, and links
to the deeper guides and theory pages for full details.

```python
import scribe

# Sensible defaults: variable capture is on by default
results = scribe.fit(adata)

# Add a low-rank guide for gene-gene correlations
results = scribe.fit(adata, guide_rank=64)
```
**Read order:** If you are new to SCRIBE, read sections 1-4 below and the Model Selection page. The remaining sections cover progressively more advanced features that you can explore as needed.
**Naming convention:** Parameters use descriptive names (`expression_prior`, `prob_prior`, `zero_inflation_prior`, etc.) rather than single-letter math notation. Legacy shorthand (`mu_prior`, `p_prior`, `gate_prior`, ...) is still accepted for backward compatibility, but the descriptive forms are recommended.
Variable capture is on by default. The default model is `"nbvcp"`, which
includes a cell-specific capture probability. Use `variable_capture=False`
to disable it, or `zero_inflation=True` to add a zero-inflation gate.
The `model` keyword still accepts `"nbdm"`, `"nbvcp"`, `"zinb"`, and
`"zinbvcp"` for the same four combinations. See
Model selection for the full resolution table.
## 1. Data input

These parameters control what data SCRIBE reads and how it interprets the count matrix.

| Parameter | Default | Description |
|---|---|---|
| `counts` | (required) | Count matrix (`jnp.ndarray`) or AnnData object. Shape is `(n_cells, n_genes)` when `cells_axis=0` |
| `cells_axis` | `0` | Which axis represents cells. `0` = rows are cells (the standard layout) |
| `layer` | `None` | AnnData layer to use for counts. `None` uses `.X` |
| `seed` | `42` | Random seed for reproducibility |
```python
# From a raw JAX/NumPy array
results = scribe.fit(counts_array, model="nbdm")

# From AnnData, reading a specific layer
results = scribe.fit(adata, layer="raw_counts")
```
**Note:** When you need AnnData-specific features (annotation priors, multi-dataset keys, or layer selection), `counts` must be an AnnData object.
## 2. Model selection

All four likelihoods share the same Negative Binomial core. The default
includes variable capture (`model="nbvcp"`), which models cell-specific
library-size variation and is the right choice for the vast majority of
scRNA-seq datasets. Use `variable_capture` and `zero_inflation` to compose
the model explicitly, or set `model` to a single string. If you pass both
flags and `model=`, they must agree or SCRIBE raises an error.
| Parameter | Default | Description |
|---|---|---|
| `variable_capture` | `None` | `True` adds cell-specific capture probability. `False` removes it. `None` defers to `model` |
| `zero_inflation` | `None` | `True` adds a per-gene zero-inflation gate. `None` defers to `model` |
| `model` | `"nbvcp"` | Likelihood short name. Default includes variable capture. Flags override this when set |

| What you pass | Same as `model=` |
|---|---|
| Default (nothing) | `"nbvcp"` |
| `variable_capture=False` | `"nbdm"` |
| `zero_inflation=True` | `"zinbvcp"` |
| `variable_capture=False, zero_inflation=True` | `"zinb"` |
```python
# Default: variable capture is already on
results = scribe.fit(adata)

# Disable variable capture (plain NB): only when library sizes are very tight
results = scribe.fit(adata, variable_capture=False)

# Add zero inflation on top of the default variable capture
results = scribe.fit(adata, zero_inflation=True)

# String form is still supported
results = scribe.fit(adata, model="zinbvcp")
```
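When the flags and `model=` disagree, SCRIBE refuses the call rather than guessing. A minimal sketch of the failure mode (the exact exception type is not documented here; `ValueError` is an assumption based on SCRIBE's other validation errors):

```python
# model="nbdm" excludes variable capture, but the flag requests it,
# so scribe.fit() raises instead of silently picking one.
try:
    scribe.fit(adata, model="nbdm", variable_capture=True)
except ValueError as err:
    print(f"conflicting model specification: {err}")
```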
**Add a low-rank guide:** Adding `guide_rank=64` gives SCRIBE a parameter-efficient way to capture gene-gene correlations that a mean-field posterior would miss. See Model Selection for the full decision guide.
**Why variable capture is on by default:** Empirically, we have not yet encountered a dataset that does not benefit from variable capture. Cell-specific capture probability accounts for library-size heterogeneity that is ubiquitous in scRNA-seq protocols. Set `variable_capture=False` only if your library sizes are tightly controlled (less than 2x variation between cells).
Full guide: Model Selection | Parameter cheatsheet: Parameter Reference
## 3. Parameterization

How the Negative Binomial parameters are represented internally. The choice
affects optimization speed, numerical stability, and which downstream
analyses are available. This is independent of whether you select the
likelihood with `variable_capture` / `zero_inflation` or with a `model=`
string (both remain valid; see Model selection).
| Parameter | Default | Description |
|---|---|---|
| `parameterization` | `"canonical"` | `"canonical"` (alias `"standard"`), `"mean_prob"` (alias `"linked"`), or `"mean_odds"` (alias `"odds_ratio"`) |
| `unconstrained` | `False` | Use Normal + transform instead of constrained distributions. Required for hierarchical priors and BNB overdispersion |

| Name | Code | Samples | Derives | Best for |
|---|---|---|---|---|
| Canonical | `"canonical"` | \(p, r\) | (none) | Direct interpretation |
| Mean prob | `"mean_prob"` | \(p, \mu\) | \(r = \mu(1-p)/p\) | Couples mean and success probability |
| Mean odds | `"mean_odds"` | \(\phi, \mu\) | \(p = 1/(1+\phi)\), \(r = \mu\phi\) | Stable when \(p\) is near 1 |
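The derivations in the parameterization table are mutually consistent: under the Negative Binomial convention implied by the mean-prob row (\(\mu = r\,p/(1-p)\)), both derived forms recover the same mean. A quick sanity check in plain Python (not the SCRIBE API):

```python
from math import isclose

# NB convention implied by the mean_prob row: mu = r * p / (1 - p)
mu, p = 5.0, 0.8

# mean_prob: sample (p, mu), derive r = mu * (1 - p) / p
r_mp = mu * (1 - p) / p
assert isclose(r_mp * p / (1 - p), mu)

# mean_odds: sample (phi, mu) with phi = (1 - p) / p,
# then derive p = 1 / (1 + phi) and r = mu * phi
phi = (1 - p) / p
p_mo = 1.0 / (1.0 + phi)
r_mo = mu * phi
assert isclose(p_mo, p)  # round-trips back to the same p
assert isclose(r_mo * p_mo / (1 - p_mo), mu)
```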
```python
# Mean odds parameterization (often converges faster)
results = scribe.fit(adata, variable_capture=True, parameterization="mean_odds")

# Unconstrained mode: needed for hierarchical priors and BNB
results = scribe.fit(adata, model="nbdm", unconstrained=True)
```
**When to use unconstrained=True:** You must set `unconstrained=True` when using any of the following: hierarchical priors (`expression_prior`, `prob_prior`, `zero_inflation_prior`), mean anchoring (`expression_anchor`), BNB overdispersion (`overdispersion="bnb"`), or dataset-level priors. SCRIBE will raise a ValueError if you forget.
Full guide: Model Selection > Parameterizations | Parameter cheatsheet: Parameter Reference
## 4. Inference method

SCRIBE supports three inference backends, all accessed through the same
`scribe.fit()` call.

| Parameter | Default | Description |
|---|---|---|
| `inference_method` | `"svi"` | `"svi"` (Stochastic Variational Inference), `"mcmc"` (NUTS), or `"vae"` (Variational Autoencoder) |
### SVI parameters

| Parameter | Default | Description |
|---|---|---|
| `n_steps` | `50_000` | Number of optimization steps |
| `batch_size` | `None` | Mini-batch size. `None` = full-batch. Recommended for > 10 K cells |
| `stable_update` | `True` | Numerically stable parameter updates |
| `log_progress_lines` | `False` | Emit periodic plain-text progress lines (useful for SLURM logs) |
| `early_stopping` | `None` | Dict or EarlyStoppingConfig for automatic convergence detection |
| `restore_best` | `False` | Track the best variational parameters during training and restore them at the end |
| `optimizer_config` | `None` | Custom optimizer: `{"name": "adam", "step_size": 1e-3}`. Supports `"adam"`, `"clipped_adam"`, `"adagrad"`, `"rmsprop"`, `"sgd"`, `"momentum"` |
```python
# SVI with mini-batching and early stopping
results = scribe.fit(
    adata,
    variable_capture=True,
    n_steps=200_000,
    batch_size=512,
    early_stopping={
        "patience": 500,
        "min_delta": 1.0,
        "smoothing_window": 50,
        "restore_best": True,
    },
)
```
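The optimizer can be swapped the same way, using the dict form shown in the table above. A sketch (the step size here is an arbitrary illustration, not a recommended value):

```python
# Clipped Adam with a smaller step size than the default
results = scribe.fit(
    adata,
    variable_capture=True,
    optimizer_config={"name": "clipped_adam", "step_size": 5e-4},
    n_steps=100_000,
    batch_size=512,
)
```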
### MCMC parameters

| Parameter | Default | Description |
|---|---|---|
| `n_samples` | `2_000` | Posterior samples per chain |
| `n_warmup` | `1_000` | Warmup (burn-in) samples |
| `n_chains` | `1` | Number of parallel NUTS chains |
| `svi_init` | `None` | ScribeSVIResults to warm-start MCMC (cross-parameterization supported) |
| `enable_x64` | `None` | Float64 precision. Defaults to `True` for MCMC, `False` for SVI/VAE |
```python
# MCMC warm-started from SVI
svi_results = scribe.fit(adata, model="nbdm", n_steps=50_000)
mcmc_results = scribe.fit(
    adata,
    model="nbdm",
    inference_method="mcmc",
    svi_init=svi_results,
    n_samples=2_000,
    n_warmup=500,
    n_chains=4,
)
```
**Cross-parameterization initialization:** The `svi_init` parameter handles parameterization mapping automatically. You can run SVI with `parameterization="mean_prob"` and initialize MCMC with `parameterization="mean_odds"`; SCRIBE converts the MAP estimates internally.
Full guide: Inference Methods
## 5. Variational guide configuration

The guide (variational family) controls how well the approximate posterior can capture correlations between parameters.

### Low-rank Gaussian guides

| Parameter | Default | Description |
|---|---|---|
| `guide_rank` | `None` | Rank for low-rank guide on gene-specific parameters. `None` = mean-field (fully factorized) |
| `joint_params` | `None` | Parameter names to model jointly. Accepts shorthands (`"all"`, `"biological"`, `"mean"`, `"prob"`, `"gate"`) or an explicit list (e.g. `["mu", "phi"]`). Works with `guide_rank` or `guide_flow` |
| `dense_params` | `None` | Subset of `joint_params` that get full cross-gene coupling. Accepts same shorthands or explicit list. Others get gene-local conditioning |
```python
# Low-rank guide (captures gene-gene correlations)
results = scribe.fit(adata, model="nbdm", guide_rank=8)

# Joint low-rank across mu and phi
results = scribe.fit(
    adata,
    model="nbdm",
    parameterization="mean_odds",
    unconstrained=True,
    guide_rank=10,
    joint_params=["mu", "phi"],
)
```
### Normalizing flow guides

Replaces the Gaussian variational family with a learned invertible
transformation, enabling multimodal, skewed, and heavy-tailed posterior
approximations. Mutually exclusive with `guide_rank`.

| Parameter | Default | Description |
|---|---|---|
| `guide_flow` | `None` | Flow type: `"affine_coupling"` (recommended), `"spline_coupling"`, `"maf"`, `"iaf"` |
| `guide_flow_num_layers` | `4` | Number of coupling layers |
| `guide_flow_hidden_dims` | `[64, 64]` | Hidden sizes in the conditioner MLP |
| `guide_flow_activation` | `"relu"` | Activation function for conditioner MLPs |
| `guide_flow_n_bins` | `8` | Spline bins (only for `"spline_coupling"`) |
| `guide_flow_mixture_strategy` | `"independent"` | `"independent"` or `"shared"` for mixture/dataset components |
| `guide_flow_zero_init` | `True` | Identity-init via zero output layer |
| `guide_flow_layer_norm` | `True` | LayerNorm in conditioner MLP |
| `guide_flow_residual` | `True` | Residual connections in conditioner MLP |
| `guide_flow_soft_clamp` | `True` | Smooth arctan clamp on affine log-scale (Andrade 2024) |
| `guide_flow_loft` | `True` | LOFT compression + trainable final affine |
| `guide_flow_log_det_f64` | `False` | Float64 log-det accumulation (datacenter GPUs only) |
```python
# Affine coupling flow guide (recommended for high-dimensional gene params)
results = scribe.fit(
    adata,
    model="nbdm",
    unconstrained=True,
    guide_flow="affine_coupling",
    guide_flow_num_layers=4,
)

# Joint flow across mu and phi
results = scribe.fit(
    adata,
    model="nbdm",
    parameterization="mean_odds",
    unconstrained=True,
    guide_flow="affine_coupling",
    joint_params=["mu", "phi"],
)
```
**Use affine_coupling for guide-level flows:** In scRNA-seq, gene-specific parameters live in thousands to tens of thousands of dimensions. Only affine coupling layers are numerically stable enough at this scale. Spline coupling and autoregressive flows are better suited for low-dimensional settings like VAE latent spaces.
Full guide: Variational Guide Families
## 6. Capture amortization (VCP models)

When `variable_capture=True` (NBVCP and ZINBVCP), each cell has its own capture
probability. Amortization replaces per-cell variational parameters with a
small neural network that predicts them from total UMI count, reducing the
parameter count from \(O(N_{\text{cells}})\) to the network weights.

| Parameter | Default | Description |
|---|---|---|
| `amortize_capture` | `False` | Enable neural-network amortization of capture probability |
| `capture_hidden_dims` | `[64, 32]` | Hidden layer sizes for the amortizer MLP |
| `capture_activation` | `"leaky_relu"` | Activation function (`"relu"`, `"gelu"`, `"silu"`, `"tanh"`, ...) |
| `capture_output_transform` | `"softplus"` | Output transform for positive parameters (`"softplus"` or `"exp"`) |
| `capture_clamp_min` | `0.1` | Minimum clamp for MLP outputs. `None` to disable |
| `capture_clamp_max` | `50.0` | Maximum clamp for MLP outputs. `None` to disable |
| `capture_amortization` | `None` | AmortizationConfig or dict that overrides all six parameters above |
```python
# Amortized capture with defaults: useful for very large datasets
results = scribe.fit(adata, variable_capture=True, amortize_capture=True)

# Custom amortizer architecture
results = scribe.fit(
    adata,
    variable_capture=True,
    amortize_capture=True,
    capture_hidden_dims=[128, 64, 32],
    capture_activation="gelu",
)
```
**When to amortize:** Amortization is most beneficial when the number of cells is so large that per-cell variational parameters cause out-of-memory issues. For small datasets the per-cell parameterization is fine and avoids the neural-network overhead.
See also: Variational Guide Families > Amortized
## 7. Hierarchical priors (gene-level)

Hierarchical priors provide adaptive shrinkage across mixture components
(for `expression_prior`) or across genes (for `prob_prior`,
`zero_inflation_prior`). They share statistical strength so that most parameters
stay close to a population center while allowing true outliers to deviate. All
require `unconstrained=True`.

| Parameter | Default | Description |
|---|---|---|
| `expression_prior` | `"none"` | Hierarchical prior on \(\mu\) (or \(r\)) across mixture components. Requires `n_components >= 2` |
| `prob_prior` | `"none"` | Hierarchical prior on \(p\) (or \(\phi\)) across genes |
| `zero_inflation_prior` | `"none"` | Hierarchical prior on the zero-inflation gate across genes. Only for ZI models |
All three accept `"none"`, `"gaussian"`, `"horseshoe"`, or `"neg"`:

- Gaussian: simple Normal shrinkage. Lightest; suitable when most parameters are expected to differ moderately.
- Horseshoe: strong shrinkage toward zero with heavy tails for true outliers. Good default for sparse signals.
- NEG (Normal-Exponential-Gamma): even heavier tails than Horseshoe, with continuous adaptive shrinkage.
```python
# Horseshoe shrinkage on mu across cell types
results = scribe.fit(
    adata,
    variable_capture=True,
    unconstrained=True,
    n_components=5,
    expression_prior="horseshoe",
)

# Gaussian prior on gene-specific p
results = scribe.fit(
    adata,
    model="nbdm",
    unconstrained=True,
    prob_prior="gaussian",
)

# NEG prior on zero-inflation gate
results = scribe.fit(
    adata,
    zero_inflation=True,
    unconstrained=True,
    zero_inflation_prior="neg",
)
```
### Hyperparameters

Fine-tune the behavior of the Horseshoe and NEG priors:

| Parameter | Default | Prior | Description |
|---|---|---|---|
| `horseshoe_tau0` | `1.0` | Horseshoe | Global shrinkage scale. Smaller = stronger shrinkage |
| `horseshoe_slab_df` | `4` | Horseshoe | Degrees of freedom of the regularizing slab |
| `horseshoe_slab_scale` | `2.0` | Horseshoe | Scale of the regularizing slab |
| `neg_u` | `1.0` | NEG | Shape parameter \(u\) |
| `neg_a` | `1.0` | NEG | Shape parameter \(a\) |
| `neg_tau` | `1.0` | NEG | Scale parameter \(\tau\) |
```python
# Horseshoe with tighter global shrinkage
results = scribe.fit(
    adata,
    variable_capture=True,
    unconstrained=True,
    n_components=4,
    expression_prior="horseshoe",
    horseshoe_tau0=0.5,
    horseshoe_slab_scale=1.0,
)
```
Full guide: Theory: Hierarchical Priors
## 8. Mean anchoring prior

The mean anchoring prior resolves the \(\mu\)-\(\phi\) degeneracy in the Negative Binomial by centering each gene's biological mean on its observed sample mean, adjusted for average capture efficiency.

| Parameter | Default | Description |
|---|---|---|
| `expression_anchor` | `False` | Enable data-informed anchoring. Automatically sets `unconstrained=True` |
| `expression_anchor_sigma` | `0.3` | Log-scale standard deviation. 0.1-0.2 = tight, 0.3-0.5 = recommended, > 1 = weak |
For VCP models, SCRIBE needs to estimate the average capture probability
from data. Provide biology-informed capture information via the `priors`
dictionary:
```python
# Mean anchoring with organism-informed capture
results = scribe.fit(
    adata,
    variable_capture=True,
    expression_anchor=True,
    expression_anchor_sigma=0.3,
    priors={"organism": "human"},
    amortize_capture=True,
)

# Mean anchoring with explicit capture efficiency
results = scribe.fit(
    adata,
    variable_capture=True,
    expression_anchor=True,
    priors={"capture_efficiency": (10.0, 1e5)},
)
```
**Note:** Without variable capture (`nbdm`, `zinb`), the anchor uses the implicit capture \(\bar{\nu} = 1\), so no extra priors are needed.
Full guide: Theory: Anchoring Priors
## 9. BNB overdispersion

The Beta Negative Binomial extension adds a per-gene concentration parameter \(\kappa_g\) that allows heavier tails than the standard NB. It can be combined with any model.

| Parameter | Default | Description |
|---|---|---|
| `overdispersion` | `"none"` | `"none"` (standard NB) or `"bnb"` (Beta Negative Binomial) |
| `overdispersion_prior` | `"horseshoe"` | Hierarchical prior on \(\kappa_g\): `"horseshoe"` or `"neg"` |
```python
# BNB overdispersion on top of the default variable-capture model
results = scribe.fit(
    adata,
    variable_capture=True,
    overdispersion="bnb",
    unconstrained=True,
    amortize_capture=True,
)
```
**Fit variable capture first:** What appears as heavy-tailed gene expression often reflects variable capture efficiency rather than genuine per-gene overdispersion. Always fit an NBVCP model first and check the posterior predictive distribution. Add BNB only when excess dispersion persists after accounting for capture.
Full guide: Theory: Beta Negative Binomial | Model Selection > BNB
## 10. Mixture models

Mixture models discover cell subpopulations by fitting \(K\) sets of gene-specific parameters. Each cell is softly assigned to a component.

| Parameter | Default | Description |
|---|---|---|
| `n_components` | `None` | Number of mixture components. `None` = single component (no mixture). Must be >= 2 if set |
| `mixture_params` | `"all"` | Which parameters are component-specific. Accepts shorthands: `"all"` (every param incl. gate), `"biological"` (core NB params only), `"mean"`, `"prob"`, `"gate"`, or an explicit list like `["r"]` |
```python
# Discover 5 cell types
results = scribe.fit(
    adata,
    variable_capture=True,
    n_components=5,
    n_steps=150_000,
    amortize_capture=True,
)

# Extract assignments
assignments = results.cell_type_assignments(counts=adata.X)
```
### Annotation priors

If you have partial or complete cell-type labels, use them as soft priors on mixture assignments. This guides the model without forcing hard assignments.

| Parameter | Default | Description |
|---|---|---|
| `annotation_key` | `None` | Column(s) in `adata.obs` with cell-type labels. Accepts a string or a list of strings for composite labels |
| `annotation_confidence` | `3.0` | Prior strength \(\kappa\). 0 = ignored, 3 = ~20x boost, large = near-hard assignment |
| `annotation_component_order` | `None` | Explicit label-to-component mapping. `None` sorts labels alphabetically |
| `annotation_min_cells` | `None` | Minimum cells per label. Labels below this threshold are treated as unlabeled |
```python
# Use existing annotations as soft priors
results = scribe.fit(
    adata,
    variable_capture=True,
    n_components=5,
    annotation_key="cell_type",
    annotation_confidence=3.0,
    amortize_capture=True,
)

# Composite labels from two columns (e.g. cell_type x treatment)
results = scribe.fit(
    adata,
    variable_capture=True,
    annotation_key=["cell_type", "treatment"],
    annotation_confidence=5.0,
    annotation_min_cells=20,
    amortize_capture=True,
)
```
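If you need a fixed label-to-component correspondence (e.g. to compare component indices across runs), pass the labels in the desired order instead of relying on the alphabetical default. A sketch with hypothetical label names:

```python
# Pin component 0 to "B_cell", component 1 to "T_cell", component 2 to
# "NK_cell" (label names are illustrative; use your adata.obs values)
results = scribe.fit(
    adata,
    variable_capture=True,
    n_components=3,
    annotation_key="cell_type",
    annotation_component_order=["B_cell", "T_cell", "NK_cell"],
)
```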
**Automatic component inference:** When `annotation_key` is set but `n_components` is omitted, SCRIBE automatically infers the number of components from the unique non-null labels (filtered by `annotation_min_cells` if set).
See also: Results Class (mixture assignments and components)
## 11. Multi-dataset hierarchy

When your experiment spans multiple datasets (e.g. batches, conditions, or labs), SCRIBE can share statistical strength across datasets via dataset-level hierarchical priors on gene-specific parameters.

### Dataset specification

| Parameter | Default | Description |
|---|---|---|
| `dataset_key` | `None` | Column in `adata.obs` identifying which dataset each cell belongs to |
| `n_datasets` | `None` | Number of datasets. Auto-inferred from `dataset_key` when `None` |
| `dataset_params` | `None` | Which parameters become dataset-specific (auto-determined from priors when `None`) |
| `dataset_mixing` | `None` | Dataset-specific mixture weights. `None` = auto (`True` when >= 2 datasets) |
| `auto_downgrade_single_dataset_hierarchy` | `True` | Automatically simplify the hierarchy when `dataset_key` resolves to a single dataset |
### Dataset-level priors

Each parameter that varies across genes can also vary across datasets, with a hierarchical prior controlling how much dataset-to-dataset variation is allowed.

| Parameter | Default | Description |
|---|---|---|
| `expression_dataset_prior` | `"none"` | Prior on \(\mu\) across datasets: `"none"`, `"gaussian"`, `"horseshoe"`, `"neg"` |
| `prob_dataset_prior` | `"none"` | Prior on \(p\) across datasets |
| `prob_dataset_mode` | `"gene_specific"` | How \(p\) varies: `"scalar"`, `"gene_specific"`, or `"two_level"` |
| `zero_inflation_dataset_prior` | `"none"` | Prior on the zero-inflation gate across datasets |
| `overdispersion_dataset_prior` | `"none"` | Prior on BNB \(\kappa\) across datasets. Requires `overdispersion="bnb"` |
| `capture_scaling_prior` | `"none"` | Prior on per-dataset capture scaling \(\eta_d\). For VCP models |
```python
# Two-dataset comparison with horseshoe shrinkage on mu
results = scribe.fit(
    adata,
    variable_capture=True,
    unconstrained=True,
    dataset_key="batch",
    expression_dataset_prior="horseshoe",
    prob_dataset_prior="gaussian",
    amortize_capture=True,
)
```
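The granularity of the dataset-level \(p\) effect is set with `prob_dataset_mode`. A sketch switching to the `"scalar"` mode (which, as the name suggests, presumably fits a single dataset-wide effect rather than one per gene; check the full guide for the exact semantics):

```python
# Dataset-level prior on p with a scalar (dataset-wide) effect
# instead of the gene-specific default
results = scribe.fit(
    adata,
    variable_capture=True,
    unconstrained=True,
    dataset_key="batch",
    prob_dataset_prior="gaussian",
    prob_dataset_mode="scalar",
)
```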
**Single-dataset downgrade:** When `dataset_key` points to a column with only one unique value, SCRIBE automatically downgrades dataset-level priors to gene-level equivalents (or drops them) and emits a UserWarning. Disable this with `auto_downgrade_single_dataset_hierarchy=False`.
Full guide: Theory: Hierarchical Priors > Multiple datasets
## 12. Custom prior hyperparameters

The `priors` dictionary lets you override default prior hyperparameters for
any model parameter. Values are tuples of hyperparameters whose meaning
depends on the distribution family.

| Parameter | Default | Description |
|---|---|---|
| `priors` | `None` | Dict mapping parameter names to hyperparameter tuples |
```python
# Override p and r priors
results = scribe.fit(
    adata,
    model="nbdm",
    priors={
        "p": (1.0, 1.0),  # Beta(1, 1): uniform
        "r": (0.0, 1.0),  # LogNormal(0, 1)
    },
)

# Symmetric Dirichlet for mixture weights (scalar is broadcast)
results = scribe.fit(
    adata,
    variable_capture=True,
    n_components=4,
    priors={"mixing": 5.0},  # equivalent to (5.0, 5.0, 5.0, 5.0)
)

# Biology-informed capture prior
results = scribe.fit(
    adata,
    variable_capture=True,
    priors={"organism": "human"},
)
```
## 13. VAE architecture

When `inference_method="vae"`, these parameters configure the encoder-decoder
neural network architecture.

| Parameter | Default | Description |
|---|---|---|
| `vae_latent_dim` | `10` | Dimensionality of the latent space |
| `vae_encoder_hidden_dims` | `None` | Encoder hidden layer sizes (e.g. `[512, 256]`) |
| `vae_decoder_hidden_dims` | `None` | Decoder hidden layer sizes |
| `vae_activation` | `None` | Activation function (`"relu"`, `"gelu"`, `"silu"`, ...) |
| `vae_input_transform` | `"log1p"` | Input preprocessing: `"log1p"`, `"log"`, `"sqrt"`, `"identity"` |
| `vae_standardize` | `False` | Standardize transformed inputs to zero mean, unit variance |
| `vae_decoder_transforms` | `None` | Per-parameter decoder output transforms |
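Input preprocessing is configured independently of the architecture. A sketch combining a square-root transform with standardization (both values come from the table above; whether this helps over the `"log1p"` default depends on your data):

```python
# Square-root transform + standardized encoder inputs
results = scribe.fit(
    adata,
    variable_capture=True,
    inference_method="vae",
    vae_input_transform="sqrt",
    vae_standardize=True,
)
```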
### Normalizing flow priors

For more expressive latent distributions, attach a normalizing flow:

| Parameter | Default | Description |
|---|---|---|
| `vae_flow_type` | `"none"` | `"none"`, `"affine_coupling"`, `"spline_coupling"`, `"maf"`, or `"iaf"` |
| `vae_flow_num_layers` | `4` | Number of flow layers |
| `vae_flow_hidden_dims` | `None` | Hidden dimensions in each flow layer |
```python
# VAE with spline coupling flow
results = scribe.fit(
    adata,
    model="nbdm",
    inference_method="vae",
    vae_latent_dim=15,
    vae_encoder_hidden_dims=[512, 256],
    vae_flow_type="spline_coupling",
    vae_flow_num_layers=4,
    n_steps=100_000,
    batch_size=256,
)

# Cell embeddings
embeddings = results.get_latent_embeddings(data=adata.X, n_samples=100)
```
Full guide: Inference Methods > VAE
## 14. Power-user overrides

For maximum control, bypass the flat keyword interface and pass fully constructed configuration objects. When provided, these override the corresponding keyword arguments.

| Parameter | Default | Description |
|---|---|---|
| `model_config` | `None` | ModelConfig object. Overrides flat `model`, `variable_capture`, `zero_inflation`, `parameterization`, `unconstrained`, `n_components`, `mixture_params`, `joint_params`, `dense_params`, `guide_rank`, and `priors` |
| `inference_config` | `None` | InferenceConfig object. Overrides `inference_method`, `n_steps`, `batch_size`, `stable_update`, `log_progress_lines`, `n_samples`, `n_warmup`, and `n_chains` |
```python
from scribe.models.config import ModelConfigBuilder

# Build a model config step by step (horseshoe shrinkage on per-component means)
builder = (
    ModelConfigBuilder()
    .for_model("nbvcp")
    .with_parameterization("mean_odds")
    .unconstrained()
    .as_mixture(n_components=5)
)
builder._expression_prior = "horseshoe"
model_cfg = builder.build()

results = scribe.fit(adata, model_config=model_cfg)
```
## Return value

`scribe.fit()` returns a results object whose type depends on the inference
method:

| Inference method | Return type | Key capabilities |
|---|---|---|
| `"svi"` | ScribeSVIResults | Posterior samples, loss history, PPC, denoising |
| `"mcmc"` | ScribeMCMCResults | Chain samples, NUTS diagnostics, PPC, denoising |
| `"vae"` | ScribeVAEResults | Latent embeddings, posterior samples, PPC, denoising |

All result types share a common analysis API: posterior sampling, posterior predictive checks, Bayesian denoising, and log-likelihood computation.

Full guide: Results Class
## Common recipes

### Typical single-dataset analysis

```python
# NBVCP with amortized capture and early stopping
results = scribe.fit(
    adata,
    variable_capture=True,
    parameterization="mean_odds",
    n_steps=100_000,
    batch_size=512,
    amortize_capture=True,
    early_stopping={"patience": 500, "restore_best": True},
)
```
### Multi-dataset with hierarchical priors

```python
# Share strength across batches
results = scribe.fit(
    adata,
    variable_capture=True,
    unconstrained=True,
    dataset_key="batch",
    expression_dataset_prior="horseshoe",
    prob_dataset_prior="gaussian",
    amortize_capture=True,
    n_steps=200_000,
)
```
### Mixture model with annotations

```python
# 8 cell types, guided by partial annotations
results = scribe.fit(
    adata,
    variable_capture=True,
    n_components=8,
    annotation_key="cell_type",
    annotation_confidence=3.0,
    annotation_min_cells=50,
    amortize_capture=True,
    n_steps=150_000,
)
```
### SVI-to-MCMC warm start

```python
# Fast exploration with SVI
svi_results = scribe.fit(
    adata, model="nbdm", parameterization="mean_prob", n_steps=50_000,
)

# Gold-standard posteriors with MCMC, initialized from SVI
mcmc_results = scribe.fit(
    adata,
    model="nbdm",
    parameterization="mean_odds",
    inference_method="mcmc",
    svi_init=svi_results,
    n_samples=4_000,
    n_warmup=500,
    n_chains=4,
)
```
### Full hierarchical model with anchoring and BNB

```python
# Everything turned on: VCP, anchoring, horseshoe, BNB
results = scribe.fit(
    adata,
    variable_capture=True,
    unconstrained=True,
    expression_anchor=True,
    expression_anchor_sigma=0.3,
    priors={"organism": "human"},
    overdispersion="bnb",
    prob_prior="gaussian",
    amortize_capture=True,
    n_steps=300_000,
    batch_size=512,
)
```
### VAE with normalizing flows

```python
# Latent representation + spline flow
results = scribe.fit(
    adata,
    model="nbdm",
    inference_method="vae",
    vae_latent_dim=15,
    vae_encoder_hidden_dims=[512, 256],
    vae_flow_type="spline_coupling",
    n_steps=100_000,
    batch_size=256,
)

# Retrieve embeddings for downstream analysis
z = results.get_latent_embeddings(data=adata.X, n_samples=100)
```
## Quick reference

All `scribe.fit()` parameters at a glance, grouped by function.

**Data input**

| Parameter | Default | Type |
|---|---|---|
| `counts` | (required) | ndarray or AnnData |
| `cells_axis` | `0` | int |
| `layer` | `None` | str |
| `seed` | `42` | int |

**Model**

| Parameter | Default | Type |
|---|---|---|
| `variable_capture` | `None` | bool |
| `zero_inflation` | `None` | bool |
| `model` | `"nbvcp"` | str |
| `parameterization` | `"canonical"` | str |
| `unconstrained` | `False` | bool |

**Hierarchical priors (gene-level)**

| Parameter | Default | Type |
|---|---|---|
| `expression_prior` | `"none"` | str |
| `prob_prior` | `"none"` | str |
| `zero_inflation_prior` | `"none"` | str |

**Prior hyperparameters**

| Parameter | Default | Type |
|---|---|---|
| `horseshoe_tau0` | `1.0` | float |
| `horseshoe_slab_df` | `4` | int |
| `horseshoe_slab_scale` | `2.0` | float |
| `neg_u` | `1.0` | float |
| `neg_a` | `1.0` | float |
| `neg_tau` | `1.0` | float |

**Mean anchoring**

| Parameter | Default | Type |
|---|---|---|
| `expression_anchor` | `False` | bool |
| `expression_anchor_sigma` | `0.3` | float |

**Overdispersion**

| Parameter | Default | Type |
|---|---|---|
| `overdispersion` | `"none"` | str |
| `overdispersion_prior` | `"horseshoe"` | str |

**Mixture**

| Parameter | Default | Type |
|---|---|---|
| `n_components` | `None` | int |
| `mixture_params` | `"all"` | str or list[str] |

**Annotation priors**

| Parameter | Default | Type |
|---|---|---|
| `annotation_key` | `None` | str or list[str] |
| `annotation_confidence` | `3.0` | float |
| `annotation_component_order` | `None` | list[str] |
| `annotation_min_cells` | `None` | int |

**Multi-dataset**

| Parameter | Default | Type |
|---|---|---|
| `dataset_key` | `None` | str |
| `n_datasets` | `None` | int |
| `dataset_params` | `None` | list[str] |
| `dataset_mixing` | `None` | bool |
| `expression_dataset_prior` | `"none"` | str |
| `prob_dataset_prior` | `"none"` | str |
| `prob_dataset_mode` | `"gene_specific"` | str |
| `zero_inflation_dataset_prior` | `"none"` | str |
| `overdispersion_dataset_prior` | `"none"` | str |
| `capture_scaling_prior` | `"none"` | str |
| `auto_downgrade_single_dataset_hierarchy` | `True` | bool |

**Guide (Gaussian)**

| Parameter | Default | Type |
|---|---|---|
| `guide_rank` | `None` | int |
| `joint_params` | `None` | str or list[str] |
| `dense_params` | `None` | str or list[str] |
| `priors` | `None` | dict |

**Guide (Normalizing Flow)**

| Parameter | Default | Type |
|---|---|---|
| `guide_flow` | `None` | str |
| `guide_flow_num_layers` | `4` | int |
| `guide_flow_hidden_dims` | `[64, 64]` | list[int] |
| `guide_flow_activation` | `"relu"` | str |
| `guide_flow_n_bins` | `8` | int |
| `guide_flow_mixture_strategy` | `"independent"` | str |
| `guide_flow_zero_init` | `True` | bool |
| `guide_flow_layer_norm` | `True` | bool |
| `guide_flow_residual` | `True` | bool |
| `guide_flow_soft_clamp` | `True` | bool |
| `guide_flow_loft` | `True` | bool |
| `guide_flow_log_det_f64` | `False` | bool |

**Capture amortization**

| Parameter | Default | Type |
|---|---|---|
| `amortize_capture` | `False` | bool |
| `capture_hidden_dims` | `[64, 32]` | list[int] |
| `capture_activation` | `"leaky_relu"` | str |
| `capture_output_transform` | `"softplus"` | str |
| `capture_clamp_min` | `0.1` | float |
| `capture_clamp_max` | `50.0` | float |
| `capture_amortization` | `None` | AmortizationConfig |

**Inference**

| Parameter | Default | Type |
|---|---|---|
| `inference_method` | `"svi"` | str |
| `n_steps` | `50_000` | int |
| `batch_size` | `None` | int |
| `optimizer_config` | `None` | dict |
| `stable_update` | `True` | bool |
| `log_progress_lines` | `False` | bool |
| `early_stopping` | `None` | dict or EarlyStoppingConfig |
| `restore_best` | `False` | bool |
| `n_samples` | `2_000` | int |
| `n_warmup` | `1_000` | int |
| `n_chains` | `1` | int |
| `svi_init` | `None` | ScribeSVIResults |
| `enable_x64` | `None` | bool |

**VAE**

| Parameter | Default | Type |
|---|---|---|
| `vae_latent_dim` | `10` | int |
| `vae_encoder_hidden_dims` | `None` | list[int] |
| `vae_decoder_hidden_dims` | `None` | list[int] |
| `vae_activation` | `None` | str |
| `vae_input_transform` | `"log1p"` | str |
| `vae_standardize` | `False` | bool |
| `vae_decoder_transforms` | `None` | dict |
| `vae_flow_type` | `"none"` | str |
| `vae_flow_num_layers` | `4` | int |
| `vae_flow_hidden_dims` | `None` | list[int] |

**Power-user overrides**

| Parameter | Default | Type |
|---|---|---|
| `model_config` | `None` | ModelConfig |
| `inference_config` | `None` | InferenceConfig |