Mixture Models
This document explains SCRIBE’s mixture model extensions for handling heterogeneous cell populations. Each base model (NBDM, ZINB, NBVCP, ZINBVCP) can be extended to a mixture model by introducing multiple components and component-specific parameters.
General Structure
All mixture models in SCRIBE share a common hierarchical structure:
Global Parameters (shared across all components):
Base success probability \(p \sim \text{Beta}(\alpha_p, \beta_p)\)
Mixing weights \(\pi \sim \text{Dirichlet}(\alpha_{\text{mixing}})\)
Component-Specific Parameters:
Gene dispersion parameters \(r_{k,g} \sim \text{Gamma}(\alpha_r, \beta_r)\)
One per gene \(g\) per component \(k\)
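Additional parameters depending on the base model (e.g., dropout probabilities \(\pi_{k,g}\) in the zero-inflated variants)
Cell-Specific Parameters (when applicable):
Cell capture probabilities \(\nu^{(c)}\), one per cell \(c\) (NBVCP and ZINBVCP mixtures)
Gene-Specific Parameters (when applicable):
Dropout probabilities \(\pi_{k,g}\), one per gene \(g\) per component \(k\) (ZINB and ZINBVCP mixtures)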
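To make the shared structure concrete, here is a minimal forward simulation of the simplest case (an NBDM mixture) using NumPy. It is a sketch, not SCRIBE's implementation. The function name `sample_nbdm_mixture` and the default hyperparameter values are hypothetical, and the code assumes the negative binomial convention in which the mean count is \(r\,p/(1-p)\).

```python
import numpy as np

def sample_nbdm_mixture(n_cells, n_genes, n_components, seed=0,
                        alpha_p=1.0, beta_p=1.0, alpha_mixing=1.0,
                        alpha_r=2.0, beta_r=0.1):
    """Forward-simulate an NBDM mixture (hypothetical helper, illustrative hyperparameters)."""
    rng = np.random.default_rng(seed)
    # Global parameters, shared by all components
    p = rng.beta(alpha_p, beta_p)
    pi = rng.dirichlet(np.full(n_components, alpha_mixing))
    # Component-specific dispersions r_{k,g}; NumPy's gamma takes a scale, so scale = 1 / rate
    r = rng.gamma(alpha_r, 1.0 / beta_r, size=(n_components, n_genes))
    # Latent component assignment for each cell, drawn from the mixing weights
    z = rng.choice(n_components, size=n_cells, p=pi)
    # Assumed convention: mean count = r * p / (1 - p).
    # NumPy's negative_binomial(n, q) has mean n * (1 - q) / q, so pass q = 1 - p.
    counts = rng.negative_binomial(r[z], 1.0 - p)
    return counts, z, {"p": p, "pi": pi, "r": r}

counts, z, params = sample_nbdm_mixture(n_cells=50, n_genes=10, n_components=3)
```

Each cell's counts are drawn with the dispersions of its own component, while \(p\) and \(\pi\) are shared, which is exactly the split between global and component-specific parameters described above.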
Parameter Dependencies by Model
NBDM Mixture
Component-dependent:
Gene dispersion parameters \(r_{k,g}\)
Component-independent:
Base success probability \(p\)
No cell-specific parameters
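For the NBDM mixture, and assuming genes are conditionally independent given a cell's component, the log-likelihood of a cell is a log-sum-exp over components of \(\log \pi_k + \sum_g \log \text{NB}(u_g \mid r_{k,g}, p)\), writing \(u_g\) for the cell's count of gene \(g\). The sketch below computes this with NumPy. `nb_logpmf` and `nbdm_mixture_loglik` are hypothetical helpers, not SCRIBE functions, and the mean \(r\,p/(1-p)\) convention is assumed.

```python
import numpy as np
from math import lgamma

_lgamma = np.vectorize(lgamma)  # elementwise log-gamma without SciPy

def nb_logpmf(u, r, p):
    """log NB(u | r, p), assuming mean r * p / (1 - p)."""
    u = np.asarray(u, dtype=float)
    return (_lgamma(u + r) - _lgamma(r) - _lgamma(u + 1.0)
            + r * np.log1p(-p) + u * np.log(p))

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def nbdm_mixture_loglik(counts, r, p, pi):
    """Per-cell log-likelihood under an NBDM mixture.

    counts: (cells, genes); r: (K, genes); p: scalar shared by all components;
    pi: (K,) mixing weights. Returns an array of shape (cells,).
    """
    # Sum over genes within each component, then combine components
    comp = nb_logpmf(counts[:, None, :], r[None, :, :], p).sum(axis=-1)  # (cells, K)
    return logsumexp(np.log(pi)[None, :] + comp, axis=1)
```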
ZINB Mixture
Component-dependent:
Gene dispersion parameters \(r_{k,g}\)
Dropout probabilities \(\pi_{k,g}\)
Component-independent:
Base success probability \(p\)
No cell-specific parameters
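Zero inflation changes only the per-gene likelihood: with probability \(\pi_{k,g}\) the count is a structural zero, and otherwise it is drawn from the negative binomial. A minimal NumPy sketch follows. The dropout probability is called `gate` in the code to avoid confusion with the mixing weights \(\pi\), the function names are hypothetical, and the mean \(r\,p/(1-p)\) convention is assumed.

```python
import numpy as np
from math import lgamma

_lgamma = np.vectorize(lgamma)  # elementwise log-gamma without SciPy

def nb_logpmf(u, r, p):
    """log NB(u | r, p), assuming mean r * p / (1 - p)."""
    return (_lgamma(u + r) - _lgamma(r) - _lgamma(u + 1.0)
            + r * np.log1p(-p) + u * np.log(p))

def zinb_logpmf(u, r, p, gate):
    """Zero-inflated NB: a structural zero with probability `gate`, else NB(r, p)."""
    u = np.asarray(u, dtype=float)
    log_nb = nb_logpmf(u, r, p)
    # A zero can come either from dropout or from the NB itself
    log_zero = np.logaddexp(np.log(gate), np.log1p(-gate) + log_nb)
    # A positive count can only come from the NB branch
    log_pos = np.log1p(-gate) + log_nb
    return np.where(u == 0, log_zero, log_pos)

def zinb_component_loglik(counts, r, p, gate):
    """Per-cell, per-component log-likelihood.

    counts: (cells, genes); r and gate: (K, genes); p: scalar. Returns (cells, K).
    """
    return zinb_logpmf(counts[:, None, :], r[None], p, gate[None]).sum(axis=-1)
```

Because `gate` has shape (K, genes), two components can assign very different dropout rates to the same gene, which is what "dropout patterns vary by cell type" means in model terms.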
NBVCP Mixture
Component-dependent:
Gene dispersion parameters \(r_{k,g}\)
Component-independent:
Base success probability \(p\)
Cell capture probabilities \(\nu^{(c)}\)
Cell-specific:
Capture probabilities
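One way to see how a cell-specific capture probability can enter the likelihood is binomial subsampling. Suppose a cell holds \(m \sim \text{NB}(r, p)\) transcripts of a gene (mean \(r\,p/(1-p)\)) and each transcript is captured independently with probability \(\nu^{(c)}\). Then, by a probability generating function argument, the observed count is again negative binomial with the same \(r\) and an adjusted success probability \(\hat{p}^{(c)} = p\,\nu^{(c)} / (1 - p\,(1 - \nu^{(c)}))\). This is offered as an assumption-laden sketch rather than a statement of SCRIBE's exact parameterization, and the helper name is hypothetical.

```python
import numpy as np

def capture_adjusted_p(p, nu):
    """Success probability of observed counts after binomial capture with probability nu.

    Assumes m ~ NB(r, p) with mean r * p / (1 - p) and u | m ~ Binomial(m, nu).
    Works elementwise, so nu can be an array with one entry per cell.
    """
    return p * nu / (1.0 - p * (1.0 - nu))

# Per-cell capture probabilities give per-cell effective success probabilities
p = 0.6
nu = np.array([0.1, 0.3, 0.9])
p_hat = capture_adjusted_p(p, nu)
# Sanity check: the expected observed count is nu times the expected true count
r = 3.0
assert np.allclose(r * p_hat / (1 - p_hat), nu * r * p / (1 - p))
```

Because \(\hat{p}^{(c)}\) depends only on the cell, the same adjustment applies in every component, which is consistent with listing \(\nu^{(c)}\) among the component-independent parameters.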
ZINBVCP Mixture
Component-dependent:
Gene dispersion parameters \(r_{k,g}\)
Dropout probabilities \(\pi_{k,g}\)
Component-independent:
Base success probability \(p\)
Cell capture probabilities \(\nu^{(c)}\)
Cell-specific:
Capture probabilities
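The ZINBVCP mixture combines both pieces. The sketch below assumes, as an illustration rather than a confirmed detail of SCRIBE, that zero inflation acts on the capture-adjusted negative binomial: each cell uses its own \(\hat{p}^{(c)} = p\,\nu^{(c)} / (1 - p\,(1 - \nu^{(c)}))\), while dropout probabilities (called `gate` in code) vary by component and gene. It returns a cells-by-components matrix of log-likelihoods; the function name is hypothetical.

```python
import numpy as np
from math import lgamma

_lgamma = np.vectorize(lgamma)  # elementwise log-gamma without SciPy

def zinbvcp_component_loglik(counts, r, p, nu, gate):
    """Per-cell, per-component log-likelihood for a ZINBVCP mixture (sketch).

    counts: (cells, genes); r, gate: (K, genes); p: scalar; nu: (cells,).
    Returns an array of shape (cells, K).
    """
    u = counts[:, None, :].astype(float)                 # (cells, 1, genes)
    # Capture-adjusted success probability, one per cell, broadcast over K and genes
    p_hat = (p * nu / (1.0 - p * (1.0 - nu)))[:, None, None]
    rk = r[None]                                          # (1, K, genes)
    # NB log-pmf with mean rk * p_hat / (1 - p_hat)
    log_nb = (_lgamma(u + rk) - _lgamma(rk) - _lgamma(u + 1.0)
              + rk * np.log1p(-p_hat) + u * np.log(p_hat))
    g = gate[None]                                        # (1, K, genes)
    # Zeros come from dropout or from the NB; positive counts only from the NB
    log_zero = np.logaddexp(np.log(g), np.log1p(-g) + log_nb)
    log_pos = np.log1p(-g) + log_nb
    return np.where(u == 0, log_zero, log_pos).sum(axis=-1)
```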
Learning Process
For all mixture models:
Component Assignment Phase:
Each cell’s counts determine its posterior probability of belonging to each component
Mixing weights are learned globally
Component-specific parameters adapt to their assigned cells
Parameter Updates:
Global parameters: Updated using data from all cells
Component parameters: Updated mainly from the cells assigned to that component, weighted by their assignment probabilities
Cell-specific parameters: Updated using that cell’s data across all components
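The assignment step can be made explicit with Bayes' rule: a cell's posterior probability of belonging to component \(k\) is proportional to \(\pi_k\) times its likelihood under that component. The EM-style sketch below illustrates this and shows why component parameters are driven mainly by the cells assigned to them. It is a conceptual illustration only, since SCRIBE fits these models with variational inference, and the helper names are hypothetical.

```python
import numpy as np

def responsibilities(log_pi, comp_loglik):
    """Posterior over component assignments for each cell.

    log_pi: (K,) log mixing weights; comp_loglik: (cells, K) log-likelihood of
    each cell under each component. Returns (cells, K) with rows summing to 1.
    """
    logits = log_pi[None, :] + comp_loglik
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def weighted_mean_counts(counts, resp):
    """Responsibility-weighted mean expression per component, shape (K, genes).

    Each cell contributes to component k in proportion to resp[c, k], so cells
    assigned elsewhere have little influence on component k's estimates.
    """
    return (resp.T @ counts) / resp.sum(axis=0)[:, None]
```

In an EM algorithm the mixing weights would then be set to `resp.mean(axis=0)`, the average assignment probability over all cells, which mirrors the global learning of mixing weights described above.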
Usage Guidelines
When to use mixture models:
Clear biological heterogeneity (multiple cell types)
Multimodal expression patterns
Technical variation that differs between cell types
Model selection considerations:
NBDM Mixture: Baseline mixture model, good for initial exploration
ZINB Mixture: When dropout patterns vary by cell type
NBVCP Mixture: When capture efficiency varies significantly across cells
ZINBVCP Mixture: Most complex, but handles both dropout and capture variation
Implementation Details
All mixture models use:
Shared parameters across cells within each component
Soft assignments of cells to components
Variational inference for parameter estimation
Mini-batch processing for scalability
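Mini-batch processing works because a log-likelihood that is a sum over cells can be estimated from a random subset: summing over a batch of \(B\) out of \(N\) cells and scaling by \(N/B\) gives an unbiased estimate of the full sum. NumPyro applies this rescaling inside `numpyro.plate` when `subsample_size` is set. The standalone NumPy sketch below, with a hypothetical function name, shows only the arithmetic.

```python
import numpy as np

def minibatch_loglik_estimate(per_cell_loglik, batch_size, rng):
    """Unbiased estimate of the full-data log-likelihood from a random mini-batch.

    per_cell_loglik: (cells,) log-likelihood of each cell under current parameters.
    Each cell is included with probability B / N, so scaling the batch sum by
    N / B makes its expectation equal to the sum over all N cells.
    """
    n = per_cell_loglik.shape[0]
    idx = rng.choice(n, size=batch_size, replace=False)
    return (n / batch_size) * per_cell_loglik[idx].sum()
```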
Inference and Results
The mixture model variants return specialized results objects that provide:
Component-specific parameter estimates
Cell assignment probabilities
Model-specific normalizations
Uncertainty quantification for all parameters
Key Differences from Base Models
Parameter Interpretation:
Parameters now represent component-specific patterns
Cell assignments provide clustering information
Mixing weights quantify population proportions
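Soft assignment probabilities can be summarized in the ways listed above: an argmax gives hard cluster labels, column means give expected population proportions (a data-driven counterpart of the mixing weights), and per-cell entropy flags cells whose assignment is ambiguous. A minimal sketch with a hypothetical function name, taking a cells-by-components probability matrix as input:

```python
import numpy as np

def summarize_assignments(assign_probs):
    """Summarize soft cell-to-component probabilities of shape (cells, K)."""
    labels = assign_probs.argmax(axis=1)                 # hard clustering
    k = assign_probs.shape[1]
    hard_fractions = np.bincount(labels, minlength=k) / labels.size
    soft_fractions = assign_probs.mean(axis=0)           # expected proportions
    # Per-cell entropy in nats: 0 means certain, log(K) means maximally uncertain
    safe = np.clip(assign_probs, 1e-12, 1.0)
    entropy = -(assign_probs * np.log(safe)).sum(axis=1)
    return labels, hard_fractions, soft_fractions, entropy
```

Soft fractions are usually preferable to hard fractions when many cells sit between components, because the argmax discards that uncertainty.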
Computational Considerations:
Higher computational cost
More parameters to estimate
Requires more data for reliable inference
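The extra cost can be quantified by counting free parameters from the parameter lists above: the success probability \(p\), \(K-1\) free mixing weights (they sum to 1), \(K \cdot G\) dispersions, another \(K \cdot G\) dropout probabilities for the zero-inflated variants, and one capture probability per cell for the VCP variants. This counts model parameters only, not the (typically larger) set of variational parameters; the function name and model strings are hypothetical.

```python
def count_free_parameters(model, n_genes, n_components, n_cells):
    """Count free model parameters for each mixture variant (hypothetical helper)."""
    if model not in {"nbdm", "zinb", "nbvcp", "zinbvcp"}:
        raise ValueError(f"unknown model: {model}")
    n = 1                                  # base success probability p
    n += n_components - 1                  # mixing weights pi (sum-to-one constraint)
    n += n_components * n_genes            # dispersions r_{k,g}
    if model.startswith("zinb"):
        n += n_components * n_genes        # dropout probabilities pi_{k,g}
    if model.endswith("vcp"):
        n += n_cells                       # capture probabilities nu^(c)
    return n
```

For example, with \(G = 2000\) genes, \(K = 3\) components, and \(N = 10000\) cells, this gives 6003 (NBDM), 12003 (ZINB), 16003 (NBVCP), and 22003 (ZINBVCP) free parameters, versus 2001 for a single-component NBDM model.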
Biological Interpretation:
Captures subpopulation structure
Allows different technical characteristics by component
Provides a natural clustering framework
References
Base model documentation: