Mixture Models

This document explains SCRIBE’s mixture model extensions for handling heterogeneous cell populations. Each base model (NBDM, ZINB, NBVCP, ZINBVCP) can be extended to a mixture model by introducing multiple components and component-specific parameters.

General Structure

All mixture models in SCRIBE share a common hierarchical structure:

Global Parameters (shared across all components):

Base success probability \(p \sim \text{Beta}(\alpha_p, \beta_p)\)

Mixing weights \(\pi \sim \text{Dirichlet}(\alpha_{\text{mixing}})\)

Component-Specific Parameters:

Gene dispersion parameters \(r_{k,g} \sim \text{Gamma}(\alpha_r, \beta_r)\)

One per gene \(g\) per component \(k\)

Additional parameters depending on base model

Cell-Specific Parameters (when applicable):

Capture probabilities \(\nu^{(c)}\) (for NBVCP and ZINBVCP variants)

Independent of components

Gene-Specific Parameters (when applicable):

Dropout probabilities (for ZINB and ZINBVCP variants)

Component-specific versions in mixture setting
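As a worked equation, the structure above implies the following per-cell likelihood for the simplest case, the NBDM mixture. This is a sketch under stated assumptions rather than a formula quoted from the SCRIBE source: the symbol \(u_g^{(c)}\) for the count of gene \(g\) in cell \(c\), the totals \(K\) (components) and \(G\) (genes), conditional independence of genes given the component, and the negative binomial convention in which \(p\) is the success probability are all introduced here for illustration.

```latex
\[
P\left(u^{(c)} \mid \pi, r, p\right)
  = \sum_{k=1}^{K} \pi_k \prod_{g=1}^{G}
    \text{NB}\left(u_g^{(c)};\, r_{k,g},\, p\right)
\]
\[
\text{NB}(u;\, r,\, p)
  = \frac{\Gamma(u + r)}{\Gamma(u + 1)\,\Gamma(r)}\; p^{r} (1 - p)^{u}
\]
```

Here \(\pi_k\) is the \(k\)-th mixing weight, so each cell's counts are explained by a weighted sum over components that share \(p\) but differ in \(r_{k,g}\).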

Parameter Dependencies by Model

NBDM Mixture

Component-dependent:
- Gene dispersion parameters \(r_{k,g}\)
Component-independent:
- Base success probability \(p\)
No cell-specific parameters

ZINB Mixture

Component-dependent:
- Gene dispersion parameters \(r_{k,g}\)
- Dropout probabilities \(\pi_{k,g}\) (not to be confused with the mixing weights \(\pi\))
Component-independent:
- Base success probability \(p\)
No cell-specific parameters

NBVCP Mixture

Component-dependent:
- Gene dispersion parameters \(r_{k,g}\)
Component-independent:
- Base success probability \(p\)
- Cell capture probabilities \(\nu^{(c)}\)
Cell-specific:
- Capture probabilities \(\nu^{(c)}\), one per cell \(c\)

ZINBVCP Mixture

Component-dependent:
- Gene dispersion parameters \(r_{k,g}\)
- Dropout probabilities \(\pi_{k,g}\) (not to be confused with the mixing weights \(\pi\))
Component-independent:
- Base success probability \(p\)
- Cell capture probabilities \(\nu^{(c)}\)
Cell-specific:
- Capture probabilities \(\nu^{(c)}\), one per cell \(c\)
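The dependencies listed above can be summarized as parameter shapes. The following is a minimal sketch, not SCRIBE's API: the function names, the dictionary keys, and the lowercase model labels are hypothetical and chosen only for illustration. It records which parameter blocks exist for each mixture variant and how their sizes scale with the number of cells, genes, and components.

```python
import math


def mixture_param_shapes(model, n_cells, n_genes, n_components):
    """Return the shape of each parameter block for a mixture variant.

    Keys and model labels are hypothetical, not SCRIBE identifiers.
    An empty tuple () denotes a scalar.
    """
    if model not in {"nbdm", "zinb", "nbvcp", "zinbvcp"}:
        raise ValueError(f"unknown model: {model!r}")

    shapes = {
        "mixing_weights": (n_components,),  # global, one weight per component
        "p": (),                            # global base success probability
        "r": (n_components, n_genes),       # dispersion per component and gene
    }
    if model in {"zinb", "zinbvcp"}:
        # Dropout probabilities are component-specific in the mixture setting.
        shapes["dropout"] = (n_components, n_genes)
    if model in {"nbvcp", "zinbvcp"}:
        # Capture probabilities are per cell and independent of components.
        shapes["capture"] = (n_cells,)
    return shapes


def count_entries(shapes):
    """Total number of scalar entries across all parameter blocks.

    This counts array entries, not free degrees of freedom (the mixing
    weights, for example, are constrained to sum to one).
    """
    return sum(math.prod(shape) for shape in shapes.values())
```

For example, `mixture_param_shapes("zinbvcp", n_cells=100, n_genes=50, n_components=3)` contains all five blocks, while the `"nbdm"` variant contains only the mixing weights, `p`, and `r`.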

Learning Process

For all mixture models:

Component Assignment Phase:
- Each cell’s data influences the posterior over component assignments
- Mixing weights are learned globally
- Component-specific parameters adapt to their assigned cells
Parameter Updates:
- Global parameters: Updated using data from all cells
- Component parameters: Updated primarily using data from cells assigned to that component
- Cell-specific parameters: Updated using that cell’s data across all components
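To illustrate how a cell's data influences the posterior over component assignments, the sketch below computes assignment probabilities for one cell under an NBDM mixture with fixed parameter values. It is an illustration under assumptions, not SCRIBE's inference code: the function names are hypothetical, the negative binomial is parameterized with \(p\) as the success probability, genes are treated as conditionally independent given the component, and the parameters are held fixed, whereas the actual models estimate them by variational inference.

```python
import math


def nb_logpmf(u, r, p):
    """Log-probability of count u under NB(r, p), with p the success
    probability (assumed convention; the mean is r * (1 - p) / p)."""
    return (
        math.lgamma(u + r) - math.lgamma(u + 1) - math.lgamma(r)
        + r * math.log(p) + u * math.log1p(-p)
    )


def assignment_probabilities(counts, r, p, weights):
    """Posterior probability of each component for a single cell.

    counts  : list of gene counts for the cell, length G
    r       : nested list of dispersions, r[k][g], shape K x G
    p       : shared base success probability
    weights : mixing weights, length K, summing to one
    """
    log_post = []
    for k, w in enumerate(weights):
        # Genes are assumed independent given the component.
        log_lik = sum(nb_logpmf(u, r[k][g], p) for g, u in enumerate(counts))
        log_post.append(math.log(w) + log_lik)

    # Normalize with the log-sum-exp trick for numerical stability.
    m = max(log_post)
    log_norm = m + math.log(sum(math.exp(x - m) for x in log_post))
    return [math.exp(x - log_norm) for x in log_post]
```

Cells whose counts are much better explained by one component receive a probability near one for it, so that component's parameters are shaped mostly by those cells, which is the sense in which component parameters are "updated primarily using data from cells assigned to that component".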

Usage Guidelines

When to use mixture models:

Clear biological heterogeneity (multiple cell types)
Multimodal expression patterns
Technical variation that differs by cell type

Model selection considerations:

NBDM Mixture: Baseline mixture model, good for initial exploration
ZINB Mixture: When dropout patterns vary by cell type
NBVCP Mixture: When capture efficiency varies significantly across cells
ZINBVCP Mixture: Most complex, but handles both dropout and capture variation

Implementation Details

All mixture models use:

Shared parameters across cells within each component
Soft assignments of cells to components
Variational inference for parameter estimation
Mini-batch processing for scalability

Inference and Results

The mixture model variants return specialized results objects that provide:

Component-specific parameter estimates
Cell assignment probabilities
Model-specific normalizations
Uncertainty quantification for all parameters
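Cell assignment probabilities can be turned into cluster labels together with a per-cell measure of how ambiguous each assignment is. The sketch below assumes the probabilities have already been extracted into a plain list of rows (one row per cell, one entry per component); it does not use or imply any particular method of the results objects, and the function name is hypothetical.

```python
import math


def summarize_assignments(probs):
    """Return hard labels and assignment entropies for each cell.

    probs : list of rows, each a list of component probabilities
            that sums to one.
    """
    # Hard label: index of the most probable component for each cell.
    labels = [max(range(len(row)), key=row.__getitem__) for row in probs]
    # Entropy in nats: 0 for a certain assignment, log(K) for a uniform one.
    entropies = [-sum(q * math.log(q) for q in row if q > 0) for row in probs]
    return labels, entropies
```

Cells with entropy close to \(\log K\) sit between components and may be worth inspecting before treating the hard labels as a clustering.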

Key Differences from Base Models

Parameter Interpretation:
- Parameters now represent component-specific patterns
- Cell assignments provide clustering information
- Mixing weights quantify population proportions
Computational Considerations:
- Higher computational cost
- More parameters to estimate
- Requires more data for reliable inference
Biological Interpretation:
- Captures subpopulation structure
- Allows different technical characteristics by component
- Provides natural clustering framework

References

Base model documentation:

SCRIBE Models for Single-Cell RNA Sequencing