Quick Overview

SCRIBE (Single-Cell RNA-Seq Inference using Bayesian Estimation) takes a fundamentally different approach to single-cell RNA sequencing analysis. Instead of treating cells as isolated data points that need to be normalized, batch-corrected, and processed, SCRIBE views each cell as a sample from a statistical model that describes the entire dataset.

The Power of Probabilistic Modeling

At its core, SCRIBE embraces the inherent uncertainty in scRNA-seq data through Bayesian modeling. Consider a typical dataset with 10,000 cells and 20,000 genes —that’s 200 million data points! Rather than making point estimates about expression levels, SCRIBE learns probability distributions that capture:

  • How variable each gene’s expression truly is

  • Which zeros represent technical dropouts versus biological absence

  • How capture efficiency varies between cells

  • Whether cells belong to distinct subpopulations

Example posterior distribution

Rather than single values, SCRIBE learns complete probability distributions for model parameters. Here, the posterior distribution for the success probability parameter shows the range of plausible values given the data.

Why Variational Inference?

With such high-dimensional data, traditional Bayesian methods such as Markov Chain Monte Carlo (MCMC) would take prohibitively long to converge. SCRIBE uses variational inference—a method that transforms Bayesian inference into an optimization problem—to learn approximate posterior distributions efficiently. This allows SCRIBE to:

  • Scale to datasets with millions of cells

  • Accelerate inference with GPUs

  • Provide results in minutes to hours rather than days

  • Maintain uncertainty quantification despite approximations

What Can You Do with SCRIBE?

Once SCRIBE learns your model, you can:

  • Generate normalized expression values with principled uncertainty estimates

  • Identify technical artifacts and batch effects probabilistically

  • Find cell subpopulations without arbitrary clustering

  • Make predictions about new cells

  • Compare different models to understand your data’s structure

Posterior predictive checks

SCRIBE can generate synthetic data (blue bands) that matches the statistical properties of your real data (black line), allowing you to validate model fit and make predictions.

The Bayesian Advantage

The Bayesian framework provides several key benefits for single-cell analysis:

  • Uncertainty Quantification: Every estimate comes with credible regions (what most scientist confuse with confidence intervals from the frequentist world)

  • Model Comparison: Rigorous ways to choose between competing models

  • Missing Data: Principled handling of dropouts and technical zeros

  • Integration: Natural ways to combine data from multiple experiments (to be developed)

  • Predictions: Generate synthetic data with realistic properties

SCRIBE makes these powerful Bayesian methods accessible through a simple Python interface, while maintaining the mathematical rigor necessary for proper statistical inference. Whether you’re interested in basic normalization or complex mixture modeling, SCRIBE provides a principled foundation for your single-cell analysis.