# The scribe.fit() Interface

`scribe.fit()` is the single entry point for all SCRIBE inference. Every
model, parameterization, prior, inference engine, and guide family is
configured through keyword arguments to this one function. This page walks
through every parameter group, explains when and why each matters, and links
to the deeper guides and theory pages for full details.

```python
import scribe

# Sensible defaults: variable capture is on by default
results = scribe.fit(adata)

# Add a low-rank guide for gene-gene correlations
results = scribe.fit(adata, guide_rank=64)
```
**Read order:** If you are new to SCRIBE, read sections 1-4 below and the Model Selection page. The remaining sections cover progressively more advanced features that you can explore as needed.
**Naming convention:** Parameters use descriptive names (`expression_prior`, `prob_prior`, `zero_inflation_prior`, etc.) rather than single-letter math notation. Legacy shorthand (`mu_prior`, `p_prior`, `gate_prior`, ...) is still accepted for backward compatibility, but the descriptive forms are recommended.
Variable capture is on by default. The default model is `"nbvcp"`, which
includes a cell-specific capture probability. Use `variable_capture=False`
to disable it, or `zero_inflation=True` to add a zero-inflation gate.
The `model` keyword still accepts `"nbdm"`, `"nbvcp"`, `"zinb"`, and
`"zinbvcp"` for the same four combinations. See
Model selection for the full resolution table.
## 1. Data input

These parameters control what data SCRIBE reads and how it interprets the count matrix.

| Parameter | Default | Description |
|---|---|---|
| `counts` | (required) | Count matrix (`jnp.ndarray`) or AnnData object. Shape is `(n_cells, n_genes)` when `cells_axis=0` |
| `cells_axis` | `0` | Which axis represents cells. `0` = rows are cells (the standard layout) |
| `layer` | `None` | AnnData layer to use for counts. `None` uses `.X` |
| `seed` | `42` | Random seed for reproducibility |
```python
# From a raw JAX/NumPy array
results = scribe.fit(counts_array, model="nbdm")

# From AnnData, reading a specific layer
results = scribe.fit(adata, layer="raw_counts")
```
**Note:** When you need AnnData-specific features (annotation priors, multi-dataset keys, or layer selection), `counts` must be an AnnData object.
## 2. Model selection

All four likelihoods share the same Negative Binomial core. The default
includes variable capture (`model="nbvcp"`), which models cell-specific
library-size variation and is the right choice for the vast majority of
scRNA-seq datasets. Use `variable_capture` and `zero_inflation` to compose
the model explicitly, or set `model` to a single string. If you pass both
flags and `model=`, they must agree or SCRIBE raises an error.
| Parameter | Default | Description |
|---|---|---|
| `variable_capture` | `None` | `True` adds cell-specific capture probability. `False` removes it. `None` defers to `model` |
| `zero_inflation` | `None` | `True` adds a per-gene zero-inflation gate. `None` defers to `model` |
| `model` | `"nbvcp"` | Likelihood short name. Default includes variable capture. Flags override this when set |

| What you pass | Same as `model=` |
|---|---|
| Default (nothing) | `"nbvcp"` |
| `variable_capture=False` | `"nbdm"` |
| `zero_inflation=True` | `"zinbvcp"` |
| `variable_capture=False, zero_inflation=True` | `"zinb"` |
```python
# Default: variable capture is already on
results = scribe.fit(adata)

# Disable variable capture (plain NB): only when library sizes are very tight
results = scribe.fit(adata, variable_capture=False)

# Add zero inflation on top of the default variable capture
results = scribe.fit(adata, zero_inflation=True)

# String form is still supported
results = scribe.fit(adata, model="zinbvcp")
```
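When the flags and `model=` disagree, SCRIBE refuses the call rather than guessing. A minimal sketch of the failure mode (the exact exception type is not documented here; `ValueError` is an assumption based on SCRIBE's other validation errors):

```python
# model="nbdm" excludes variable capture, but the flag requests it,
# so scribe.fit() raises instead of silently picking one.
try:
    scribe.fit(adata, model="nbdm", variable_capture=True)
except ValueError as err:
    print(f"conflicting model specification: {err}")
```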
**Add a low-rank guide:** Adding `guide_rank=64` gives SCRIBE a parameter-efficient way to capture gene-gene correlations that a mean-field posterior would miss. See Model Selection for the full decision guide.
**Why variable capture is on by default:** Empirically, we have not yet encountered a dataset that does not benefit from variable capture. Cell-specific capture probability accounts for library-size heterogeneity that is ubiquitous in scRNA-seq protocols. Set `variable_capture=False` only if your library sizes are tightly controlled (less than 2x variation between cells).
Full guide: Model Selection | Parameter cheatsheet: Parameter Reference
## 3. Parameterization

How the Negative Binomial parameters are represented internally. The choice
affects optimization speed, numerical stability, and which downstream
analyses are available. This is independent of whether you select the
likelihood with `variable_capture` / `zero_inflation` or with a `model=`
string (both remain valid; see Model selection).
| Parameter | Default | Description |
|---|---|---|
| `parameterization` | `"canonical"` | `"canonical"` (alias `"standard"`), `"mean_prob"` (alias `"linked"`), or `"mean_odds"` (alias `"odds_ratio"`) |
| `unconstrained` | `False` | Use Normal + transform instead of constrained distributions. Required for hierarchical priors and BNB overdispersion |

| Name | Code | Samples | Derives | Best for |
|---|---|---|---|---|
| Canonical | `"canonical"` | \(p, r\) | (none) | Direct interpretation |
| Mean prob | `"mean_prob"` | \(p, \mu\) | \(r = \mu(1-p)/p\) | Couples mean and success probability |
| Mean odds | `"mean_odds"` | \(\phi, \mu\) | \(p = 1/(1+\phi)\), \(r = \mu\phi\) | Stable when \(p\) is near 1 |
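The derivations in the parameterization table are mutually consistent: under the Negative Binomial convention implied by the mean-prob row (\(\mu = r\,p/(1-p)\)), both derived forms recover the same mean. A quick sanity check in plain Python (not the SCRIBE API):

```python
from math import isclose

# NB convention implied by the mean_prob row: mu = r * p / (1 - p)
mu, p = 5.0, 0.8

# mean_prob: sample (p, mu), derive r = mu * (1 - p) / p
r_mp = mu * (1 - p) / p
assert isclose(r_mp * p / (1 - p), mu)

# mean_odds: sample (phi, mu) with phi = (1 - p) / p,
# then derive p = 1 / (1 + phi) and r = mu * phi
phi = (1 - p) / p
p_mo = 1.0 / (1.0 + phi)
r_mo = mu * phi
assert isclose(p_mo, p)  # round-trips back to the same p
assert isclose(r_mo * p_mo / (1 - p_mo), mu)
```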
```python
# Mean odds parameterization (often converges faster)
results = scribe.fit(adata, variable_capture=True, parameterization="mean_odds")

# Unconstrained mode: needed for hierarchical priors and BNB
results = scribe.fit(adata, model="nbdm", unconstrained=True)
```
**When to use unconstrained=True:** You must set `unconstrained=True` when using any of the following: hierarchical priors (`expression_prior`, `prob_prior`, `zero_inflation_prior`), mean anchoring (`expression_anchor`), BNB overdispersion (`overdispersion="bnb"`), or dataset-level priors. SCRIBE will raise a ValueError if you forget.
Full guide: Model Selection > Parameterizations | Parameter cheatsheet: Parameter Reference
## 4. Inference method

SCRIBE supports three inference backends, all accessed through the same
`scribe.fit()` call.

| Parameter | Default | Description |
|---|---|---|
| `inference_method` | `"svi"` | `"svi"` (Stochastic Variational Inference), `"mcmc"` (NUTS), or `"vae"` (Variational Autoencoder) |
### SVI parameters

| Parameter | Default | Description |
|---|---|---|
| `n_steps` | `50_000` | Number of optimization steps |
| `batch_size` | `None` | Mini-batch size. `None` = full-batch. Recommended for > 10 K cells |
| `stable_update` | `True` | Numerically stable parameter updates |
| `log_progress_lines` | `False` | Emit periodic plain-text progress lines (useful for SLURM logs) |
| `early_stopping` | `None` | Dict or EarlyStoppingConfig for automatic convergence detection |
| `restore_best` | `False` | Track the best variational parameters during training and restore them at the end |
| `optimizer_config` | `None` | Custom optimizer: `{"name": "adam", "step_size": 1e-3}`. Supports `"adam"`, `"clipped_adam"`, `"adagrad"`, `"rmsprop"`, `"sgd"`, `"momentum"` |
```python
# SVI with mini-batching and early stopping
results = scribe.fit(
    adata,
    variable_capture=True,
    n_steps=200_000,
    batch_size=512,
    early_stopping={
        "patience": 500,
        "min_delta": 1.0,
        "smoothing_window": 50,
        "restore_best": True,
    },
)
```
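The optimizer can be swapped the same way, using the dict form shown in the table above. A sketch (the step size here is an arbitrary illustration, not a recommended value):

```python
# Clipped Adam with a smaller step size than the default
results = scribe.fit(
    adata,
    variable_capture=True,
    optimizer_config={"name": "clipped_adam", "step_size": 5e-4},
    n_steps=100_000,
    batch_size=512,
)
```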
### MCMC parameters

| Parameter | Default | Description |
|---|---|---|
| `n_samples` | `2_000` | Posterior samples per chain |
| `n_warmup` | `1_000` | Warmup (burn-in) samples |
| `n_chains` | `1` | Number of parallel NUTS chains |
| `svi_init` | `None` | ScribeSVIResults to warm-start MCMC (cross-parameterization supported) |
| `enable_x64` | `None` | Float64 precision. Defaults to `True` for MCMC, `False` for SVI/VAE |
```python
# MCMC warm-started from SVI
svi_results = scribe.fit(adata, model="nbdm", n_steps=50_000)
mcmc_results = scribe.fit(
    adata,
    model="nbdm",
    inference_method="mcmc",
    svi_init=svi_results,
    n_samples=2_000,
    n_warmup=500,
    n_chains=4,
)
```
**Cross-parameterization initialization:** The `svi_init` parameter handles parameterization mapping automatically. You can run SVI with `parameterization="mean_prob"` and initialize MCMC with `parameterization="mean_odds"`; SCRIBE converts the MAP estimates internally.
Full guide: Inference Methods
## 5. Variational guide configuration

The guide (variational family) controls how well the approximate posterior can capture correlations between parameters.

### Low-rank Gaussian guides

| Parameter | Default | Description |
|---|---|---|
| `guide_rank` | `None` | Rank for low-rank guide on gene-specific parameters. `None` = mean-field (fully factorized) |
| `joint_params` | `None` | Parameter names to model jointly. Accepts shorthands (`"all"`, `"biological"`, `"mean"`, `"prob"`, `"gate"`) or an explicit list (e.g. `["mu", "phi"]`). Works with `guide_rank` or `guide_flow` |
| `dense_params` | `None` | Subset of `joint_params` that get full cross-gene coupling. Accepts same shorthands or explicit list. Others get gene-local conditioning |
```python
# Low-rank guide (captures gene-gene correlations)
results = scribe.fit(adata, model="nbdm", guide_rank=8)

# Joint low-rank across mu and phi
results = scribe.fit(
    adata,
    model="nbdm",
    parameterization="mean_odds",
    unconstrained=True,
    guide_rank=10,
    joint_params=["mu", "phi"],
)
```
### Normalizing flow guides

Replaces the Gaussian variational family with a learned invertible
transformation, enabling multimodal, skewed, and heavy-tailed posterior
approximations. Mutually exclusive with `guide_rank`.

| Parameter | Default | Description |
|---|---|---|
| `guide_flow` | `None` | Flow type: `"affine_coupling"` (recommended), `"spline_coupling"`, `"maf"`, `"iaf"` |
| `guide_flow_num_layers` | `4` | Number of coupling layers |
| `guide_flow_hidden_dims` | `[64, 64]` | Hidden sizes in the conditioner MLP |
| `guide_flow_activation` | `"relu"` | Activation function for conditioner MLPs |
| `guide_flow_n_bins` | `8` | Spline bins (only for `"spline_coupling"`) |
| `guide_flow_mixture_strategy` | `"independent"` | `"independent"` or `"shared"` for mixture/dataset components |
| `guide_flow_zero_init` | `True` | Identity-init via zero output layer |
| `guide_flow_layer_norm` | `True` | LayerNorm in conditioner MLP |
| `guide_flow_residual` | `True` | Residual connections in conditioner MLP |
| `guide_flow_soft_clamp` | `True` | Smooth arctan clamp on affine log-scale (Andrade 2024) |
| `guide_flow_loft` | `True` | LOFT compression + trainable final affine |
| `guide_flow_log_det_f64` | `False` | Float64 log-det accumulation (datacenter GPUs only) |
```python
# Affine coupling flow guide (recommended for high-dimensional gene params)
results = scribe.fit(
    adata,
    model="nbdm",
    unconstrained=True,
    guide_flow="affine_coupling",
    guide_flow_num_layers=4,
)

# Joint flow across mu and phi
results = scribe.fit(
    adata,
    model="nbdm",
    parameterization="mean_odds",
    unconstrained=True,
    guide_flow="affine_coupling",
    joint_params=["mu", "phi"],
)
```
**Use affine_coupling for guide-level flows:** In scRNA-seq, gene-specific parameters live in thousands to tens of thousands of dimensions. Only affine coupling layers are numerically stable enough at this scale. Spline coupling and autoregressive flows are better suited for low-dimensional settings like VAE latent spaces.
Full guide: Variational Guide Families
## 6. Capture amortization (VCP models)

When `variable_capture=True` (NBVCP and ZINBVCP), each cell has its own capture
probability. Amortization replaces per-cell variational parameters with a
small neural network that predicts them from total UMI count, reducing the
parameter count from \(O(N_{\text{cells}})\) to the network weights.

| Parameter | Default | Description |
|---|---|---|
| `amortize_capture` | `False` | Enable neural-network amortization of capture probability |
| `capture_hidden_dims` | `[64, 32]` | Hidden layer sizes for the amortizer MLP |
| `capture_activation` | `"leaky_relu"` | Activation function (`"relu"`, `"gelu"`, `"silu"`, `"tanh"`, ...) |
| `capture_output_transform` | `"softplus"` | Output transform for positive parameters (`"softplus"` or `"exp"`) |
| `capture_clamp_min` | `0.1` | Minimum clamp for MLP outputs. `None` to disable |
| `capture_clamp_max` | `50.0` | Maximum clamp for MLP outputs. `None` to disable |
| `capture_amortization` | `None` | AmortizationConfig or dict that overrides all six parameters above |
```python
# Amortized capture with defaults: useful for very large datasets
results = scribe.fit(adata, variable_capture=True, amortize_capture=True)

# Custom amortizer architecture
results = scribe.fit(
    adata,
    variable_capture=True,
    amortize_capture=True,
    capture_hidden_dims=[128, 64, 32],
    capture_activation="gelu",
)
```
**When to amortize:** Amortization is most beneficial when the number of cells is so large that per-cell variational parameters cause out-of-memory issues. For small datasets the per-cell parameterization is fine and avoids the neural-network overhead.
See also: Variational Guide Families > Amortized
## 7. Hierarchical priors (gene-level)

Hierarchical priors provide adaptive shrinkage across mixture components
(for `expression_prior`) or across genes (for `prob_prior`,
`zero_inflation_prior`). They share statistical strength so that most parameters
stay close to a population center while allowing true outliers to deviate. All
require `unconstrained=True`.

| Parameter | Default | Description |
|---|---|---|
| `expression_prior` | `"none"` | Hierarchical prior on \(\mu\) (or \(r\)) across mixture components. Requires `n_components >= 2` |
| `prob_prior` | `"none"` | Hierarchical prior on \(p\) (or \(\phi\)) across genes |
| `zero_inflation_prior` | `"none"` | Hierarchical prior on the zero-inflation gate across genes. Only for ZI models |
All three accept `"none"`, `"gaussian"`, `"horseshoe"`, or `"neg"`:

- Gaussian: simple Normal shrinkage. Lightest; suitable when most parameters are expected to differ moderately.
- Horseshoe: strong shrinkage toward zero with heavy tails for true outliers. Good default for sparse signals.
- NEG (Normal-Exponential-Gamma): even heavier tails than Horseshoe, with continuous adaptive shrinkage.
```python
# Horseshoe shrinkage on mu across cell types
results = scribe.fit(
    adata,
    variable_capture=True,
    unconstrained=True,
    n_components=5,
    expression_prior="horseshoe",
)

# Gaussian prior on gene-specific p
results = scribe.fit(
    adata,
    model="nbdm",
    unconstrained=True,
    prob_prior="gaussian",
)

# NEG prior on zero-inflation gate
results = scribe.fit(
    adata,
    zero_inflation=True,
    unconstrained=True,
    zero_inflation_prior="neg",
)
```
### Hyperparameters

Fine-tune the behavior of the Horseshoe and NEG priors:

| Parameter | Default | Prior | Description |
|---|---|---|---|
| `horseshoe_tau0` | `1.0` | Horseshoe | Global shrinkage scale. Smaller = stronger shrinkage |
| `horseshoe_slab_df` | `4` | Horseshoe | Degrees of freedom of the regularizing slab |
| `horseshoe_slab_scale` | `2.0` | Horseshoe | Scale of the regularizing slab |
| `neg_u` | `1.0` | NEG | Shape parameter \(u\) |
| `neg_a` | `1.0` | NEG | Shape parameter \(a\) |
| `neg_tau` | `1.0` | NEG | Scale parameter \(\tau\) |
```python
# Horseshoe with tighter global shrinkage
results = scribe.fit(
    adata,
    variable_capture=True,
    unconstrained=True,
    n_components=4,
    expression_prior="horseshoe",
    horseshoe_tau0=0.5,
    horseshoe_slab_scale=1.0,
)
```
Full guide: Theory: Hierarchical Priors
## 8. Mean anchoring prior

The mean anchoring prior resolves the \(\mu\)-\(\phi\) degeneracy in the Negative Binomial by centering each gene's biological mean on its observed sample mean, adjusted for average capture efficiency.

| Parameter | Default | Description |
|---|---|---|
| `expression_anchor` | `False` | Enable data-informed anchoring. Automatically sets `unconstrained=True` |
| `expression_anchor_sigma` | `0.3` | Log-scale standard deviation. 0.1-0.2 = tight, 0.3-0.5 = recommended, > 1 = weak |
For VCP models, SCRIBE needs to estimate the average capture probability
from data. Provide biology-informed capture information via the `priors`
dictionary:
```python
# Mean anchoring with organism-informed capture
results = scribe.fit(
    adata,
    variable_capture=True,
    expression_anchor=True,
    expression_anchor_sigma=0.3,
    priors={"organism": "human"},
    amortize_capture=True,
)

# Mean anchoring with explicit capture efficiency
results = scribe.fit(
    adata,
    variable_capture=True,
    expression_anchor=True,
    priors={"capture_efficiency": (10.0, 1e5)},
)
```
**Note:** Without variable capture (`nbdm`, `zinb`), the anchor uses the implicit capture \(\bar{\nu} = 1\), so no extra priors are needed.
Full guide: Theory: Anchoring Priors
## 9. BNB overdispersion

The Beta Negative Binomial extension adds a per-gene concentration parameter \(\kappa_g\) that allows heavier tails than the standard NB. It can be combined with any model.

| Parameter | Default | Description |
|---|---|---|
| `overdispersion` | `"none"` | `"none"` (standard NB) or `"bnb"` (Beta Negative Binomial) |
| `overdispersion_prior` | `"horseshoe"` | Hierarchical prior on \(\kappa_g\): `"horseshoe"` or `"neg"` |
```python
# BNB overdispersion on top of the default variable-capture model
results = scribe.fit(
    adata,
    variable_capture=True,
    overdispersion="bnb",
    unconstrained=True,
    amortize_capture=True,
)
```
**Fit variable capture first:** What appears as heavy-tailed gene expression often reflects variable capture efficiency rather than genuine per-gene overdispersion. Always fit an NBVCP model first and check the posterior predictive distribution. Add BNB only when excess dispersion persists after accounting for capture.
Full guide: Theory: Beta Negative Binomial | Model Selection > BNB
## 10. Mixture models

Mixture models discover cell subpopulations by fitting \(K\) sets of gene-specific parameters. Each cell is softly assigned to a component.

| Parameter | Default | Description |
|---|---|---|
| `n_components` | `None` | Number of mixture components. `None` = single component (no mixture). Must be >= 2 if set |
| `mixture_params` | `"all"` | Which parameters are component-specific. Accepts shorthands: `"all"` (every param incl. gate), `"biological"` (core NB params only), `"mean"`, `"prob"`, `"gate"`, or an explicit list like `["r"]` |
```python
# Discover 5 cell types
results = scribe.fit(
    adata,
    variable_capture=True,
    n_components=5,
    n_steps=150_000,
    amortize_capture=True,
)

# Extract assignments
assignments = results.cell_type_assignments(counts=adata.X)
```
### Annotation priors

If you have partial or complete cell-type labels, use them as soft priors on mixture assignments. This guides the model without forcing hard assignments.

| Parameter | Default | Description |
|---|---|---|
| `annotation_key` | `None` | Column(s) in `adata.obs` with cell-type labels. Accepts a string or a list of strings for composite labels |
| `annotation_confidence` | `3.0` | Prior strength \(\kappa\). 0 = ignored, 3 = ~20x boost, large = near-hard assignment |
| `annotation_component_order` | `None` | Explicit label-to-component mapping. `None` sorts labels alphabetically |
| `annotation_min_cells` | `None` | Minimum cells per label. Labels below this threshold are treated as unlabeled |
```python
# Use existing annotations as soft priors
results = scribe.fit(
    adata,
    variable_capture=True,
    n_components=5,
    annotation_key="cell_type",
    annotation_confidence=3.0,
    amortize_capture=True,
)

# Composite labels from two columns (e.g. cell_type x treatment)
results = scribe.fit(
    adata,
    variable_capture=True,
    annotation_key=["cell_type", "treatment"],
    annotation_confidence=5.0,
    annotation_min_cells=20,
    amortize_capture=True,
)
```
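If you need a fixed label-to-component correspondence (e.g. to compare component indices across runs), pass the labels in the desired order instead of relying on the alphabetical default. A sketch with hypothetical label names:

```python
# Pin component 0 to "B_cell", component 1 to "T_cell", component 2 to
# "NK_cell" (label names are illustrative; use your adata.obs values)
results = scribe.fit(
    adata,
    variable_capture=True,
    n_components=3,
    annotation_key="cell_type",
    annotation_component_order=["B_cell", "T_cell", "NK_cell"],
)
```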
**Automatic component inference:** When `annotation_key` is set but `n_components` is omitted, SCRIBE automatically infers the number of components from the unique non-null labels (filtered by `annotation_min_cells` if set).
See also: Results Class (mixture assignments and components)
## 11. Multi-dataset hierarchy

When your experiment spans multiple datasets (e.g. batches, conditions, or labs), SCRIBE can share statistical strength across datasets via dataset-level hierarchical priors on gene-specific parameters.

### Dataset specification

| Parameter | Default | Description |
|---|---|---|
| `dataset_key` | `None` | Column in `adata.obs` identifying which dataset each cell belongs to |
| `n_datasets` | `None` | Number of datasets. Auto-inferred from `dataset_key` when `None` |
| `dataset_params` | `None` | Which parameters become dataset-specific (auto-determined from priors when `None`) |
| `dataset_mixing` | `None` | Dataset-specific mixture weights. `None` = auto (`True` when >= 2 datasets) |
| `auto_downgrade_single_dataset_hierarchy` | `True` | Automatically simplify the hierarchy when `dataset_key` resolves to a single dataset |
### Dataset-level priors

Each parameter that varies across genes can also vary across datasets, with a hierarchical prior controlling how much dataset-to-dataset variation is allowed.

| Parameter | Default | Description |
|---|---|---|
| `expression_dataset_prior` | `"none"` | Prior on \(\mu\) across datasets: `"none"`, `"gaussian"`, `"horseshoe"`, `"neg"` |
| `prob_dataset_prior` | `"none"` | Prior on \(p\) across datasets |
| `prob_dataset_mode` | `"gene_specific"` | How \(p\) varies: `"scalar"`, `"gene_specific"`, or `"two_level"` |
| `zero_inflation_dataset_prior` | `"none"` | Prior on the zero-inflation gate across datasets |
| `overdispersion_dataset_prior` | `"none"` | Prior on BNB \(\kappa\) across datasets. Requires `overdispersion="bnb"` |
| `capture_scaling_prior` | `"none"` | Prior on per-dataset capture scaling \(\eta_d\). For VCP models |
```python
# Two-dataset comparison with horseshoe shrinkage on mu
results = scribe.fit(
    adata,
    variable_capture=True,
    unconstrained=True,
    dataset_key="batch",
    expression_dataset_prior="horseshoe",
    prob_dataset_prior="gaussian",
    amortize_capture=True,
)
```
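The granularity of the dataset-level \(p\) effect is set with `prob_dataset_mode`. A sketch switching to the `"scalar"` mode (which, as the name suggests, presumably fits a single dataset-wide effect rather than one per gene; check the full guide for the exact semantics):

```python
# Dataset-level prior on p with a scalar (dataset-wide) effect
# instead of the gene-specific default
results = scribe.fit(
    adata,
    variable_capture=True,
    unconstrained=True,
    dataset_key="batch",
    prob_dataset_prior="gaussian",
    prob_dataset_mode="scalar",
)
```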
**Single-dataset downgrade:** When `dataset_key` points to a column with only one unique value, SCRIBE automatically downgrades dataset-level priors to gene-level equivalents (or drops them) and emits a UserWarning. Disable this with `auto_downgrade_single_dataset_hierarchy=False`.
Full guide: Theory: Hierarchical Priors > Multiple datasets
## 12. Custom prior hyperparameters

The `priors` dictionary lets you override default prior hyperparameters for
any model parameter. Values are tuples of hyperparameters whose meaning
depends on the distribution family.

| Parameter | Default | Description |
|---|---|---|
| `priors` | `None` | Dict mapping parameter names to hyperparameter tuples |
```python
# Override p and r priors
results = scribe.fit(
    adata,
    model="nbdm",
    priors={
        "p": (1.0, 1.0),  # Beta(1, 1): uniform
        "r": (0.0, 1.0),  # LogNormal(0, 1)
    },
)

# Symmetric Dirichlet for mixture weights (scalar is broadcast)
results = scribe.fit(
    adata,
    variable_capture=True,
    n_components=4,
    priors={"mixing": 5.0},  # equivalent to (5.0, 5.0, 5.0, 5.0)
)

# Biology-informed capture prior
results = scribe.fit(
    adata,
    variable_capture=True,
    priors={"organism": "human"},
)
```
## 13. VAE architecture

When `inference_method="vae"`, these parameters configure the encoder-decoder
neural network architecture.

| Parameter | Default | Description |
|---|---|---|
| `vae_latent_dim` | `10` | Dimensionality of the latent space |
| `vae_encoder_hidden_dims` | `None` | Encoder hidden layer sizes (e.g. `[512, 256]`) |
| `vae_decoder_hidden_dims` | `None` | Decoder hidden layer sizes |
| `vae_activation` | `None` | Activation function (`"relu"`, `"gelu"`, `"silu"`, ...) |
| `vae_input_transform` | `"log1p"` | Input preprocessing: `"log1p"`, `"log"`, `"sqrt"`, `"identity"` |
| `vae_standardize` | `False` | Standardize transformed inputs to zero mean, unit variance |
| `vae_decoder_transforms` | `None` | Per-parameter decoder output transforms |
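Input preprocessing is configured independently of the architecture. A sketch combining a square-root transform with standardization (both values come from the table above; whether this helps over the `"log1p"` default depends on your data):

```python
# Square-root transform + standardized encoder inputs
results = scribe.fit(
    adata,
    variable_capture=True,
    inference_method="vae",
    vae_input_transform="sqrt",
    vae_standardize=True,
)
```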
### Normalizing flow priors

For more expressive latent distributions, attach a normalizing flow:

| Parameter | Default | Description |
|---|---|---|
| `vae_flow_type` | `"none"` | `"none"`, `"affine_coupling"`, `"spline_coupling"`, `"maf"`, or `"iaf"` |
| `vae_flow_num_layers` | `4` | Number of flow layers |
| `vae_flow_hidden_dims` | `None` | Hidden dimensions in each flow layer |
```python
# VAE with spline coupling flow
results = scribe.fit(
    adata,
    model="nbdm",
    inference_method="vae",
    vae_latent_dim=15,
    vae_encoder_hidden_dims=[512, 256],
    vae_flow_type="spline_coupling",
    vae_flow_num_layers=4,
    n_steps=100_000,
    batch_size=256,
)

# Cell embeddings
embeddings = results.get_latent_embeddings(data=adata.X, n_samples=100)
```
Full guide: Inference Methods > VAE
## 14. Power-user overrides

For maximum control, bypass the flat keyword interface and pass fully constructed configuration objects. When provided, these override the corresponding keyword arguments.

| Parameter | Default | Description |
|---|---|---|
| `model_config` | `None` | ModelConfig object. Overrides flat `model`, `variable_capture`, `zero_inflation`, `parameterization`, `unconstrained`, `n_components`, `mixture_params`, `joint_params`, `dense_params`, `guide_rank`, and `priors` |
| `inference_config` | `None` | InferenceConfig object. Overrides `inference_method`, `n_steps`, `batch_size`, `stable_update`, `log_progress_lines`, `n_samples`, `n_warmup`, and `n_chains` |
```python
from scribe.models.config import ModelConfigBuilder

# Build a model config step by step (horseshoe shrinkage on per-component means)
builder = (
    ModelConfigBuilder()
    .for_model("nbvcp")
    .with_parameterization("mean_odds")
    .unconstrained()
    .as_mixture(n_components=5)
)
builder._expression_prior = "horseshoe"
model_cfg = builder.build()

results = scribe.fit(adata, model_config=model_cfg)
```
## Return value

`scribe.fit()` returns a results object whose type depends on the inference
method:

| Inference method | Return type | Key capabilities |
|---|---|---|
| `"svi"` | ScribeSVIResults | Posterior samples, loss history, PPC, denoising |
| `"mcmc"` | ScribeMCMCResults | Chain samples, NUTS diagnostics, PPC, denoising |
| `"vae"` | ScribeVAEResults | Latent embeddings, posterior samples, PPC, denoising |

All result types share a common analysis API: posterior sampling, posterior predictive checks, Bayesian denoising, and log-likelihood computation.

Full guide: Results Class
## Common recipes

### Typical single-dataset analysis

```python
# NBVCP with amortized capture and early stopping
results = scribe.fit(
    adata,
    variable_capture=True,
    parameterization="mean_odds",
    n_steps=100_000,
    batch_size=512,
    amortize_capture=True,
    early_stopping={"patience": 500, "restore_best": True},
)
```
### Multi-dataset with hierarchical priors

```python
# Share strength across batches
results = scribe.fit(
    adata,
    variable_capture=True,
    unconstrained=True,
    dataset_key="batch",
    expression_dataset_prior="horseshoe",
    prob_dataset_prior="gaussian",
    amortize_capture=True,
    n_steps=200_000,
)
```
### Mixture model with annotations

```python
# 8 cell types, guided by partial annotations
results = scribe.fit(
    adata,
    variable_capture=True,
    n_components=8,
    annotation_key="cell_type",
    annotation_confidence=3.0,
    annotation_min_cells=50,
    amortize_capture=True,
    n_steps=150_000,
)
```
### SVI-to-MCMC warm start

```python
# Fast exploration with SVI
svi_results = scribe.fit(
    adata, model="nbdm", parameterization="mean_prob", n_steps=50_000,
)

# Gold-standard posteriors with MCMC, initialized from SVI
mcmc_results = scribe.fit(
    adata,
    model="nbdm",
    parameterization="mean_odds",
    inference_method="mcmc",
    svi_init=svi_results,
    n_samples=4_000,
    n_warmup=500,
    n_chains=4,
)
```
### Full hierarchical model with anchoring and BNB

```python
# Everything turned on: VCP, anchoring, horseshoe, BNB
results = scribe.fit(
    adata,
    variable_capture=True,
    unconstrained=True,
    expression_anchor=True,
    expression_anchor_sigma=0.3,
    priors={"organism": "human"},
    overdispersion="bnb",
    prob_prior="gaussian",
    amortize_capture=True,
    n_steps=300_000,
    batch_size=512,
)
```
### VAE with normalizing flows

```python
# Latent representation + spline flow
results = scribe.fit(
    adata,
    model="nbdm",
    inference_method="vae",
    vae_latent_dim=15,
    vae_encoder_hidden_dims=[512, 256],
    vae_flow_type="spline_coupling",
    n_steps=100_000,
    batch_size=256,
)

# Retrieve embeddings for downstream analysis
z = results.get_latent_embeddings(data=adata.X, n_samples=100)
```
## Quick reference

All `scribe.fit()` parameters at a glance, grouped by function.

**Data input**

| Parameter | Default | Type |
|---|---|---|
| `counts` | (required) | ndarray or AnnData |
| `cells_axis` | `0` | int |
| `layer` | `None` | str |
| `seed` | `42` | int |

**Model**

| Parameter | Default | Type |
|---|---|---|
| `variable_capture` | `None` | bool |
| `zero_inflation` | `None` | bool |
| `model` | `"nbvcp"` | str |
| `parameterization` | `"canonical"` | str |
| `unconstrained` | `False` | bool |

**Hierarchical priors (gene-level)**

| Parameter | Default | Type |
|---|---|---|
| `expression_prior` | `"none"` | str |
| `prob_prior` | `"none"` | str |
| `zero_inflation_prior` | `"none"` | str |

**Prior hyperparameters**

| Parameter | Default | Type |
|---|---|---|
| `horseshoe_tau0` | `1.0` | float |
| `horseshoe_slab_df` | `4` | int |
| `horseshoe_slab_scale` | `2.0` | float |
| `neg_u` | `1.0` | float |
| `neg_a` | `1.0` | float |
| `neg_tau` | `1.0` | float |

**Mean anchoring**

| Parameter | Default | Type |
|---|---|---|
| `expression_anchor` | `False` | bool |
| `expression_anchor_sigma` | `0.3` | float |

**Overdispersion**

| Parameter | Default | Type |
|---|---|---|
| `overdispersion` | `"none"` | str |
| `overdispersion_prior` | `"horseshoe"` | str |

**Mixture**

| Parameter | Default | Type |
|---|---|---|
| `n_components` | `None` | int |
| `mixture_params` | `"all"` | str or list[str] |

**Annotation priors**

| Parameter | Default | Type |
|---|---|---|
| `annotation_key` | `None` | str or list[str] |
| `annotation_confidence` | `3.0` | float |
| `annotation_component_order` | `None` | list[str] |
| `annotation_min_cells` | `None` | int |

**Multi-dataset**

| Parameter | Default | Type |
|---|---|---|
| `dataset_key` | `None` | str |
| `n_datasets` | `None` | int |
| `dataset_params` | `None` | list[str] |
| `dataset_mixing` | `None` | bool |
| `expression_dataset_prior` | `"none"` | str |
| `prob_dataset_prior` | `"none"` | str |
| `prob_dataset_mode` | `"gene_specific"` | str |
| `zero_inflation_dataset_prior` | `"none"` | str |
| `overdispersion_dataset_prior` | `"none"` | str |
| `capture_scaling_prior` | `"none"` | str |
| `auto_downgrade_single_dataset_hierarchy` | `True` | bool |

**Guide (Gaussian)**

| Parameter | Default | Type |
|---|---|---|
| `guide_rank` | `None` | int |
| `joint_params` | `None` | str or list[str] |
| `dense_params` | `None` | str or list[str] |
| `priors` | `None` | dict |

**Guide (Normalizing Flow)**

| Parameter | Default | Type |
|---|---|---|
| `guide_flow` | `None` | str |
| `guide_flow_num_layers` | `4` | int |
| `guide_flow_hidden_dims` | `[64, 64]` | list[int] |
| `guide_flow_activation` | `"relu"` | str |
| `guide_flow_n_bins` | `8` | int |
| `guide_flow_mixture_strategy` | `"independent"` | str |
| `guide_flow_zero_init` | `True` | bool |
| `guide_flow_layer_norm` | `True` | bool |
| `guide_flow_residual` | `True` | bool |
| `guide_flow_soft_clamp` | `True` | bool |
| `guide_flow_loft` | `True` | bool |
| `guide_flow_log_det_f64` | `False` | bool |

**Capture amortization**

| Parameter | Default | Type |
|---|---|---|
| `amortize_capture` | `False` | bool |
| `capture_hidden_dims` | `[64, 32]` | list[int] |
| `capture_activation` | `"leaky_relu"` | str |
| `capture_output_transform` | `"softplus"` | str |
| `capture_clamp_min` | `0.1` | float |
| `capture_clamp_max` | `50.0` | float |
| `capture_amortization` | `None` | AmortizationConfig |

**Inference**

| Parameter | Default | Type |
|---|---|---|
| `inference_method` | `"svi"` | str |
| `n_steps` | `50_000` | int |
| `batch_size` | `None` | int |
| `optimizer_config` | `None` | dict |
| `stable_update` | `True` | bool |
| `log_progress_lines` | `False` | bool |
| `early_stopping` | `None` | dict or EarlyStoppingConfig |
| `restore_best` | `False` | bool |
| `n_samples` | `2_000` | int |
| `n_warmup` | `1_000` | int |
| `n_chains` | `1` | int |
| `svi_init` | `None` | ScribeSVIResults |
| `enable_x64` | `None` | bool |

**VAE**

| Parameter | Default | Type |
|---|---|---|
| `vae_latent_dim` | `10` | int |
| `vae_encoder_hidden_dims` | `None` | list[int] |
| `vae_decoder_hidden_dims` | `None` | list[int] |
| `vae_activation` | `None` | str |
| `vae_input_transform` | `"log1p"` | str |
| `vae_standardize` | `False` | bool |
| `vae_decoder_transforms` | `None` | dict |
| `vae_flow_type` | `"none"` | str |
| `vae_flow_num_layers` | `4` | int |
| `vae_flow_hidden_dims` | `None` | list[int] |

**Power-user overrides**

| Parameter | Default | Type |
|---|---|---|
| `model_config` | `None` | ModelConfig |
| `inference_config` | `None` | InferenceConfig |