scribe-infer CLI Guide

scribe-infer is the unified command-line interface for running SCRIBE inference. It wraps the full scribe.fit() pipeline behind Hydra-managed YAML configs, so you can launch reproducible runs, locally or on a SLURM cluster, without writing Python scripts.

The CLI automatically detects whether the selected dataset should run as:

  • Standard single-run inference (split_by not set in data config), or
  • Covariate-split orchestration (split_by set), which launches one independent fit per unique value of the splitting variable.

Installation

The CLI requires optional Hydra dependencies:

pip install 'scribe[hydra]'

Or, if using uv:

uv pip install 'scribe[hydra]'

Quick start

# 1. Scaffold starter config files
scribe-infer --initialize ./conf

# 2. Copy the example data config and edit it
cp conf/data/example.yaml conf/data/my_dataset.yaml
# edit conf/data/my_dataset.yaml: set name and path

# 3. Run inference
scribe-infer --config-path ./conf data=my_dataset

Initialize starter configs

Use --initialize to scaffold a documented conf/ tree:

# Interactive path selection (or defaults to ./conf in non-interactive mode)
scribe-infer --initialize

# Explicit target
scribe-infer --initialize ./conf
scribe-infer --initialize /path/to/conf

The generated tree includes:

conf/
├── config.yaml                          # Global config (model, priors, guide, output layout)
├── data/
│   └── example.yaml                     # Template dataset config
├── inference/
│   ├── svi.yaml                         # SVI defaults (optimizer, early stopping, etc.)
│   ├── mcmc.yaml                        # MCMC defaults (samples, warmup, chains)
│   └── vae.yaml                         # VAE defaults (inherits SVI + architecture)
├── viz/
│   └── default.yaml                     # Visualization defaults for scribe-visualize
├── amortization/
│   └── capture.yaml                     # Capture amortization preset
├── dirname_aliases/
│   └── default.yaml                     # Path aliasing for compact output directories
├── paths/
│   ├── paths.yaml                       # Default output directory
│   └── paths.local.yaml.example         # Machine-local override example
├── slurm/
│   └── default.yaml                     # Reusable SLURM profile
└── README.md

If managed files already exist, the CLI prompts before overwriting each file.


Usage

scribe-infer --config-path ./conf data=<dataset_key> [hydra_overrides...]

Everything after the scribe-infer flags is forwarded as Hydra overrides, so any config field can be changed from the command line.
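The forwarding rule can be pictured with a minimal Python sketch. It assumes the wrapper flags are parsed with argparse and that every unrecognized token is passed through untouched; the function name split_cli_args is hypothetical and this is not the actual scribe-infer source.

```python
import argparse


def split_cli_args(argv):
    """Separate wrapper flags from tokens passed through as Hydra overrides.

    Hypothetical helper: only three of the documented flags are modeled here.
    """
    parser = argparse.ArgumentParser(prog="scribe-infer-sketch", add_help=False)
    parser.add_argument("--config-path", default="./conf")
    parser.add_argument("--config-name", default="config")
    parser.add_argument("--slurm", action="store_true")
    # parse_known_args returns (namespace, leftover tokens) instead of
    # raising an error on tokens it does not recognize.
    flags, overrides = parser.parse_known_args(argv)
    return flags, overrides


flags, overrides = split_cli_args(
    ["--config-path", "./conf", "data=singer", "model=zinb", "inference.n_steps=100000"]
)
print(flags.config_path)  # wrapper flag consumed by the CLI
print(overrides)          # everything else, in its original order
```

In this sketch the override tokens keep their original order, which matters when the same key is given more than once.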

Common examples

# Standard single-run inference with default model (NBDM)
scribe-infer --config-path ./conf data=singer

# Override model to ZINB
scribe-infer --config-path ./conf data=singer model=zinb

# Enable variable capture (→ NBVCP) with amortized capture
scribe-infer --config-path ./conf data=singer variable_capture=true \
    amortization.capture.enabled=true

# Use MCMC instead of SVI
scribe-infer --config-path ./conf data=singer inference=mcmc

# Override multiple settings
scribe-infer --config-path ./conf data=singer \
    variable_capture=true \
    parameterization=mean_odds \
    unconstrained=true \
    prob_prior=neg \
    guide_rank=8 \
    inference.n_steps=100000
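Dotted overrides such as inference.n_steps=100000 address nested config fields. The sketch below illustrates that idea on a plain dictionary. It is a simplified stand-in, not Hydra's parser: values stay strings, and config-group selections such as data=singer or inference=mcmc (which pick a whole YAML file) are not modeled. The function name apply_overrides is hypothetical.

```python
def apply_overrides(config, overrides):
    """Apply 'a.b.c=value' tokens to a nested dict, creating levels as needed."""
    for token in overrides:
        key, _, value = token.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            # Walk down one level per dotted segment
            node = node.setdefault(part, {})
        node[leaf] = value  # kept as a string in this sketch
    return config


cfg = {"inference": {"n_steps": "50000"}, "guide_rank": None}
apply_overrides(cfg, ["guide_rank=8", "inference.n_steps=100000"])
print(cfg)
```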

Configuration reference

Global config (config.yaml)

The global config controls model behavior, priors, guide settings, and output layout. Key sections:

| Section | Key fields | Description |
|---|---|---|
| Model flags | zero_inflation, variable_capture | Toggle model components. Maps to nbdm / zinb / nbvcp / zinbvcp |
| Overdispersion | overdispersion, overdispersion_prior | "none" or "bnb" with horseshoe/NEG prior |
| Parameterization | parameterization, unconstrained | canonical, linked (mean_prob), or mean_odds |
| Gene-level priors | expression_prior, prob_prior, zero_inflation_prior | "none", "gaussian", "horseshoe", "neg" |
| Multi-dataset | dataset_key, n_datasets, expression_dataset_prior, ... | Joint multi-dataset fitting |
| Guide | guide_rank, joint_params, dense_params | Low-rank / joint low-rank guide. joint_params/dense_params accept shorthands ("all", "biological", "mean", "prob", "gate") or explicit lists |
| Flow guide | guide_flow, guide_flow_num_layers, ... | Normalizing flow guide (mutually exclusive with guide_rank) |
| Mixture | n_components, mixture_params | Mixture model components. mixture_params defaults to "all" and accepts shorthands or explicit lists |
| Priors | priors.organism, priors.eta_capture, ... | Biology-informed and base distribution priors |
| Anchoring | expression_anchor, expression_anchor_sigma | Mean anchoring prior |
| Amortization | amortization.capture.* | Amortized capture inference |
| Annotations | annotation_key, annotation_confidence | Annotation-informed mixture priors |

For the meaning of each parameter, see the Parameter Reference and the scribe.fit() Interface.
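As an illustration of how the two model flags could select among the four model names, here is a small lookup sketch. The mapping is an assumption inferred from the model names themselves (for example, variable_capture=true alone giving nbvcp); the names MODEL_BY_FLAGS and model_name are hypothetical and not part of the SCRIBE API.

```python
# Keys are (zero_inflation, variable_capture); mapping assumed from the names.
MODEL_BY_FLAGS = {
    (False, False): "nbdm",
    (True, False): "zinb",
    (False, True): "nbvcp",
    (True, True): "zinbvcp",
}


def model_name(zero_inflation=False, variable_capture=False):
    """Return the model identifier implied by the two boolean flags."""
    return MODEL_BY_FLAGS[(bool(zero_inflation), bool(variable_capture))]


print(model_name())
print(model_name(variable_capture=True))
print(model_name(zero_inflation=True, variable_capture=True))
```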

Dataset config (data/*.yaml)

Each dataset gets its own YAML file under conf/data/. Required fields:

| Field | Description |
|---|---|
| name | Short identifier used in output paths and job names |
| path | Path to count matrix (.h5ad or .csv) |

Optional fields:

| Field | Description |
|---|---|
| layer | AnnData layer name when counts are not in adata.X |
| split_by | Column in adata.obs for automatic split orchestration |
| filter_obs | Pre-filter observations before fitting (dict of column → allowed values) |
| preprocessing | Scanpy-like pipeline (filter_cells, filter_genes, normalize_total, log1p, highly_variable_genes) |
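To make the filter_obs shape concrete, the sketch below applies a "column → allowed values" dictionary to a list of observation records. It assumes that conditions on several columns are combined with AND, which the field description does not state; filter_obs_rows and the example column names are hypothetical.

```python
def filter_obs_rows(obs, filter_obs):
    """Keep rows whose value in every listed column is among the allowed values.

    obs: list of dicts, one per observation (a stand-in for adata.obs).
    filter_obs: dict mapping a column name to a list of allowed values.
    """
    return [
        row
        for row in obs
        if all(row[col] in allowed for col, allowed in filter_obs.items())
    ]


obs = [
    {"cell": "c1", "condition": "ctrl", "batch": 1},
    {"cell": "c2", "condition": "treated", "batch": 1},
    {"cell": "c3", "condition": "ctrl", "batch": 2},
]
kept = filter_obs_rows(obs, {"condition": ["ctrl"], "batch": [1, 2]})
print([row["cell"] for row in kept])
```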

SCRIBE fits on counts

Even when preprocessing includes HVG selection or normalization, SCRIBE always uses the raw count matrix for model fitting. Preprocessing is applied for gene selection only.
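The separation between "select genes on transformed data" and "fit on raw counts" can be sketched in plain Python. The ranking used here (variance of log1p-normalized values) is a deliberately simplified stand-in for Scanpy's highly_variable_genes, and select_genes_then_keep_counts is a hypothetical name, not a SCRIBE function.

```python
import math


def select_genes_then_keep_counts(counts, n_top, target_sum=1e4):
    """Pick genes using transformed values, but return untouched raw counts.

    counts: list of cells, each a list of raw integer counts per gene.
    """
    n_genes = len(counts[0])
    # Normalize each cell to a common total, then log1p (used for ranking only)
    transformed = []
    for cell in counts:
        total = sum(cell) or 1
        transformed.append([math.log1p(c * target_sum / total) for c in cell])

    def variance(j):
        column = [row[j] for row in transformed]
        mean = sum(column) / len(column)
        return sum((v - mean) ** 2 for v in column) / len(column)

    # Indices of the n_top most variable genes, restored to original order
    ranked = sorted(range(n_genes), key=variance, reverse=True)
    keep = sorted(ranked[:n_top])
    # The returned matrix is sliced from the raw counts, not from `transformed`
    return keep, [[cell[j] for j in keep] for cell in counts]


raw = [[0, 10, 5], [0, 0, 5], [0, 20, 5]]
genes, subset = select_genes_then_keep_counts(raw, n_top=2)
print(genes)
print(subset)
```

The transformed matrix is discarded after ranking, so whatever model is fit afterwards only ever sees integer counts.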

Inference configs (inference/*.yaml)

Three presets are provided:

| Config | Method | Key defaults |
|---|---|---|
| svi.yaml | SVI | n_steps=50000, batch_size=null, early_stopping.enabled=false |
| mcmc.yaml | MCMC | n_samples=2000, n_warmup=1000, n_chains=1, enable_x64=true |
| vae.yaml | VAE | Inherits SVI settings + vae_latent_dim=10, vae_flow_type=coupling_spline |

Select at runtime:

scribe-infer --config-path ./conf data=singer inference=mcmc

Dispatch behavior

scribe-infer inspects each selected data=<key> config under <config-path>/data/:

  • If at least one data config defines split_by, split mode is used. The CLI launches one independent inference run per unique value (or value combination when split_by is a list).
  • Otherwise, direct inference mode is used.

All remaining CLI tokens are forwarded as Hydra overrides unchanged.
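The dispatch decision described above can be sketched as follows. This is an illustration under stated assumptions, not the CLI's actual code: the data config is a plain dict, observations are a list of dicts standing in for adata.obs, and only value combinations that actually occur in the data produce a run. The name plan_runs is hypothetical.

```python
def plan_runs(data_cfg, obs):
    """Return ('direct', [None]) or ('split', [one value tuple per run])."""
    split_by = data_cfg.get("split_by")
    if not split_by:
        return "direct", [None]
    # Accept a single column name or a list of column names
    columns = [split_by] if isinstance(split_by, str) else list(split_by)
    # Collect unique observed combinations in first-seen order
    seen = {}
    for row in obs:
        seen.setdefault(tuple(row[c] for c in columns), None)
    return "split", list(seen)


obs = [
    {"condition": "ctrl", "batch": "a"},
    {"condition": "treated", "batch": "a"},
    {"condition": "ctrl", "batch": "b"},
    {"condition": "ctrl", "batch": "a"},
]
print(plan_runs({"name": "x", "path": "x.h5ad"}, obs))
print(plan_runs({"name": "x", "path": "x.h5ad", "split_by": "condition"}, obs))
print(plan_runs({"name": "x", "path": "x.h5ad", "split_by": ["condition", "batch"]}, obs))
```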


SLURM integration

Interactive launch

Use --slurm to route execution through Hydra's submitit_slurm launcher with interactive prompts for cluster resources:

scribe-infer --slurm --config-path ./conf data=singer

The CLI prompts for partition (required), account (optional), CPUs, memory, and timeout. The same command auto-dispatches to direct vs. split mode.

Reusable SLURM profiles

Keep reusable cluster settings in conf/slurm/*.yaml:

scribe-infer --slurm-profile default --config-path ./conf data=singer

Resolution rules:

  • Named profile default resolves to ./conf/slurm/default.yaml
  • You may also pass an explicit path to --slurm-profile
  • --slurm-set key=value entries override profile values
  • Missing core fields fall back to interactive prompts (or defaults where safe)
  • --slurm-profile and --slurm-set automatically enable SLURM mode
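A minimal sketch of these resolution rules, for illustration only. It assumes a profile argument counts as an explicit path when it has a .yaml/.yml suffix or more than one path component, and that --slurm-set values are kept as strings; both are assumptions, and the names resolve_profile_path, merge_slurm_settings, and FALLBACKS are hypothetical.

```python
from pathlib import Path

# A few fallback values, taken from the default profile fields documented in this guide
FALLBACKS = {"cpus_per_task": 4, "mem_gb": 64, "timeout_min": 240}


def resolve_profile_path(profile, config_path="./conf"):
    """Bare names map to <config_path>/slurm/<name>.yaml; paths pass through."""
    candidate = Path(profile)
    looks_like_path = candidate.suffix in {".yaml", ".yml"} or len(candidate.parts) > 1
    if looks_like_path:
        return candidate
    return Path(config_path) / "slurm" / f"{profile}.yaml"


def merge_slurm_settings(profile_values, slurm_set):
    """Layer settings: fallbacks, then profile values, then --slurm-set entries."""
    merged = dict(FALLBACKS)
    merged.update({k: v for k, v in profile_values.items() if v is not None})
    for entry in slurm_set:
        key, _, value = entry.partition("=")
        merged[key] = value  # kept as a string in this sketch
    return merged


print(resolve_profile_path("default"))
print(merge_slurm_settings({"partition": "cpu", "mem_gb": 32}, ["partition=gpu"]))
```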

Per-run overrides

scribe-infer --slurm-profile default \
    --slurm-set partition=gpu \
    --slurm-set timeout_min=480 \
    --slurm-set mem_gb=128 \
    --config-path ./conf data=singer

Default SLURM profile fields

| Field | Default | Description |
|---|---|---|
| partition | null (required) | Cluster partition name |
| account | null | Optional account/project string |
| cpus_per_task | 4 | CPU cores per task |
| mem_gb | 64 | Memory in GB |
| timeout_min | 240 | Wall-time limit in minutes |
| array_parallelism | 1 | Max concurrent array jobs (split mode) |
| job_name | scribe_infer | SLURM job name |
| submitit_folder | slurm_logs/submitit/%j | Log directory |
| gres | null | Generic resources (e.g., gpu:1) |
| launcher_overrides | {} | Escape hatch for cluster-specific submitit keys |

Config root structure

The CLI expects this minimum structure:

conf/
├── config.yaml
├── data/
│   └── <dataset_key>.yaml
└── inference/
    ├── svi.yaml
    ├── mcmc.yaml
    └── vae.yaml

Override config root and top-level config name:

scribe-infer --config-path /path/to/conf --config-name config data=my_dataset
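Before launching, it can be handy to check a config root against the minimum structure listed above. The following is a small standalone sketch using only the standard library; missing_config_files is a hypothetical helper and not something scribe-infer provides.

```python
import tempfile
from pathlib import Path


def missing_config_files(root, dataset_key, config_name="config"):
    """List required files (relative to root) that are absent from the tree."""
    root = Path(root)
    required = [
        root / f"{config_name}.yaml",
        root / "data" / f"{dataset_key}.yaml",
        *(root / "inference" / f"{method}.yaml" for method in ("svi", "mcmc", "vae")),
    ]
    return [p.relative_to(root).as_posix() for p in required if not p.is_file()]


# Demo on a throwaway directory that lacks inference/vae.yaml
with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp)
    (conf / "data").mkdir()
    (conf / "inference").mkdir()
    for rel in ("config.yaml", "data/singer.yaml", "inference/svi.yaml", "inference/mcmc.yaml"):
        (conf / rel).touch()
    missing = missing_config_files(conf, "singer")
    print(missing)
```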

CLI flags reference

| Flag | Default | Description |
|---|---|---|
| --config-path | ./conf | Hydra config root directory |
| --config-name | config | Top-level Hydra config filename (without .yaml) |
| --initialize [PATH] | none | Scaffold starter configs. Cannot be combined with --slurm |
| --slurm | false | Launch via submitit with interactive resource prompts |
| --slurm-profile PROFILE | null | Load a reusable SLURM profile (auto-enables SLURM mode) |
| --slurm-set KEY=VALUE | none | Per-run SLURM override (repeatable, auto-enables SLURM mode) |

Workflow recipes

Fit NBVCP with biology-informed capture on a cluster

scribe-infer --slurm-profile default \
    --slurm-set partition=gpu \
    --slurm-set gres=gpu:1 \
    --config-path ./conf \
    data=my_dataset \
    variable_capture=true \
    amortization.capture.enabled=true \
    priors.organism=human \
    inference.n_steps=100000

Covariate-split run (one fit per condition)

Create conf/data/experiment.yaml:

# @package data
name: "experiment"
path: "data/experiment.h5ad"
split_by: "condition"

Then:

scribe-infer --config-path ./conf data=experiment variable_capture=true

The CLI detects split_by and launches one independent fit per unique value of the condition column.

MCMC warm-started from SVI

# Step 1: SVI
scribe-infer --config-path ./conf data=singer inference=svi

# Step 2: MCMC initialized from SVI results
scribe-infer --config-path ./conf data=singer inference=mcmc \
    svi_init=/path/to/svi/results.pkl

For the full Python API, see the scribe.fit() Interface. For model and parameter details, see Model Selection and the Parameter Reference.