Energy-based evolutionary dynamics

Author
Affiliation

Manuel Razo-Mejia

Department of Biology, Stanford University

Keywords

evolutionary dynamics, fitness landscape, genotype-phenotype density, Metropolis-Hastings

(c) This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

# Import project package
import Antibiotic

# Import basic math libraries
import StatsBase
import LinearAlgebra
import Random
import Distributions

# Load CairoMakie for plotting
using CairoMakie
import ColorSchemes

# Activate backend
CairoMakie.activate!()

# Set custom plotting style
Antibiotic.viz.theme_makie!()

Evolutionary dynamics on a fitness and genotype-phenotype density landscape

In this notebook we will explore a simple model of evolution on a fitness landscape with an explicit genotype-phenotype density function.

The setting is the following:

Let \(\underline{x}(t) = (x_1(t), x_2(t), \cdots, x_N(t))\) be an \(N\) dimensional vector of phenotypes at time \(t\). The fitness for a given environment \(E\) is a scalar function \(F_E(\underline{x})\) such that \[ F_E: \mathbb{R}^N \to \mathbb{R}, \tag{1}\]

i.e, the coordinates in phenotype space map to a fitness value. As a particular fitness function we will consider a series of Gaussian fitness peaks,

\[ F_E(\underline{x}) = \sum_{p=1}^P A_p \exp\left( -\frac{1}{2} (\underline{x} - \underline{\hat{x}}^{(p)})^T \Sigma_p^{-1} (\underline{x} - \underline{\hat{x}}^{(p)}) \right), \tag{2}\]

where \(P\) is the number of peaks, \(A_p\) is the amplitude of the \(p\)-th peak, \(\underline{\hat{x}}^{(p)}\) is the \(N\)-dimensional vector of optimal phenotypes for the \(p\)-th peak, and \(\Sigma_p\) is the \(N \times N\) covariance matrix for the \(p\)-th peak. This formulation allows for multiple peaks in the fitness landscape, each potentially having different heights, widths, and covariance structures between dimensions. The covariance matrix \(\Sigma_p\) captures potential interactions between different phenotypic dimensions, allowing for more complex and realistic fitness landscapes.

In the same vein, we define a genotype-phenotype density map \(GP(\underline{x})\) as a scalar function that maps a phenotype to a number related to the probability of sampling a given phenotype, i.e.,

\[ GP: \mathbb{R}^N \to \mathbb{R}, \tag{3}\]

where \(GP(\underline{x})\) is related to the probability of a genotype mapping to a phenotype \(\underline{x}\). As a particular genotype-phenotype density function we will consider a set of negative Gaussian peaks that will serve as “mutational barriers” that limit the range of phenotypes that can be reached. The deeper the peak, the more difficult it is to reach the phenotype associated with that peak. This idea is captured by the following genotype-phenotype density function:

\[ GP(\underline{x}) = \sum_{p=1}^P -B_p \exp\left( -\frac{1}{2}(\underline{x} - \underline{\hat{x}}^{(p)})^T \Sigma_p^{-1} (\underline{x} - \underline{\hat{x}}^{(p)}) \right), \tag{4}\]

where \(B_p\) is the depth of the \(p\)-th peak, and \(\underline{\hat{x}}^{(p)}\) is the \(N\)-dimensional vector of optimal phenotypes for the \(p\)-th peak.

Both, the fitness landscape and the genotype-phenotype density function are built on Gaussian functions. Let’s therefore define a struct to hold the parameters of a single peak.

@doc raw"""
    GaussianPeak

A struct to hold the parameters of a single Gaussian peak.

# Fields
- `amplitude::AbstractFloat`: The amplitude of the peak.
- `mean::AbstractVector`: The mean of the peak.
- `covariance::AbstractMatrix`: The covariance matrix of the peak.
"""
struct GaussianPeak
    amplitude::AbstractFloat
    mean::AbstractVector
    covariance::AbstractMatrix
end

Next, let’s define a function to evaluate the fitness landscape at a given phenotype.

@doc raw"""
    fitness(peak::GaussianPeak, x::AbstractVecOrMat; min_value::AbstractFloat=0.0)
    fitness(peaks::Vector{GaussianPeak}, x::AbstractVecOrMat; min_value::AbstractFloat=0.0)

Calculate the fitness value for a given phenotype `x` based on Gaussian peak(s).

# Arguments
- `peak::GaussianPeak`: A single Gaussian peak.
- `peaks::Vector{GaussianPeak}`: A vector of Gaussian peaks.
- `x::AbstractVecOrMat`: The phenotype(s) for which to calculate the fitness.
  Can be a vector for a single phenotype or a matrix for multiple phenotypes,
  where each column corresponds to a phenotype.
- `min_value::AbstractFloat=1.0`: The minimum fitness value to be added to the
  Gaussian contribution.

# Returns
The calculated fitness value(s).

# Description
The first method computes the fitness for a single Gaussian peak, while the
second method computes the sum of fitness values for multiple Gaussian peaks. In
both cases, the `min_value` is added to the Gaussian contribution to ensure a
minimum fitness level.
"""
function fitness(
    peak::GaussianPeak, x::AbstractVecOrMat; min_value::AbstractFloat=1.0
)
    # Calculate the Gaussian peak
    gaussian = peak.amplitude * exp(
        -0.5 *
        (x - peak.mean)' *
        LinearAlgebra.inv(peak.covariance) *
        (x - peak.mean)
    )
    # Return the fitness by shifting the Gaussian peak by the minimum value
    return gaussian + min_value
end

function fitness(
    peaks::Vector{GaussianPeak},
    x::AbstractVecOrMat;
    min_value::AbstractFloat=1.0
)
    # Sum the Gaussian contributions from all peaks
    total_gaussian = sum(
        peak.amplitude *
        exp(
            -0.5 *
            (x - peak.mean)' *
            LinearAlgebra.inv(peak.covariance) *
            (x - peak.mean)
        )
        for peak in peaks
    )
    # Return the fitness by shifting the Gaussian peak by the minimum value
    return total_gaussian + min_value
end

Let’s test the fitness function with a single Gaussian peak.

# Define peak parameters
amplitude = 1.0
mean = [0.0, 0.0]
covariance = [0.5 0.0; 0.0 0.5]

# Create peak
fit_peak = GaussianPeak(amplitude, mean, covariance)

# Define range of phenotypes to evaluate
x = range(-2, 2, length=100)
y = range(-2, 2, length=100)

# Create meshgrid
F = fitness.(Ref(fit_peak), [[x, y] for x in x, y in y])

# Initialize figure
fig = Figure(size=(700, 300))

# Add axis for 3D plot
ax3 = Axis3(
    fig[1, 1],
    xlabel="phenotype 1",
    ylabel="phenotype 2",
    zlabel="fitness",
    yticklabelsvisible=false,
    xticklabelsvisible=false,
    zticklabelsvisible=false,
)

# Add axis for contour plot
ax2 = Axis(
    fig[1, 2],
    xlabel="phenotype 1",
    ylabel="phenotype 2",
    aspect=AxisAspect(1),
    yticklabelsvisible=false,
    xticklabelsvisible=false,
)

# Plot fitness landscape
surface!(ax3, x, y, F, color=F, colormap=:algae)

# Plot a heatmap of the fitness landscape
hm = heatmap!(ax2, x, y, F, colormap=:algae)

# Add a colorbar
Colorbar(
    fig[1, 3],
    hm,
    label="fitness",
    labelsize=13,
    height=Relative(1),
    ticksvisible=false,
    ticklabelsvisible=false
)
# Plot contour plot
contour!(ax2, x, y, F, color=:white)

fig

Next, let’s define a function to evaluate the genotype-phenotype density function at a given phenotype.

@doc raw"""
    genotype_phenotype_density(
        peak::Union{GaussianPeak, Vector{GaussianPeak}}, 
        x::AbstractVecOrMat;
        max_value::Float64 = 1.0
    )

Calculate the genotype-phenotype density value for a given phenotype `x` based
on Gaussian peak(s).

# Arguments
- `peak::Union{GaussianPeak, Vector{GaussianPeak}}`: A single Gaussian peak or a
  vector of Gaussian peaks.
- `x::AbstractVecOrMat`: The phenotype(s) for which to calculate the genotype-
  phenotype density. Can be a vector for a single phenotype or a matrix for
  multiple phenotypes, where each column corresponds to a phenotype.
- `max_value::Float64`: Optional. The maximum value of the genotype-phenotype
  density. Default is 1.0.
  1.0.

# Returns
The calculated genotype-phenotype density value(s).
"""
function genotype_phenotype_density(
    peak::GaussianPeak, x::AbstractVecOrMat; max_value::AbstractFloat=1.0
)
    # Calculate the Gaussian peak
    gaussian = peak.amplitude * exp(
        -0.5 *
        (x - peak.mean)' *
        LinearAlgebra.inv(peak.covariance) *
        (x - peak.mean)
    )
    # Return the genotype-phenotype density by inverting the Gaussian peak
    # shifted by the maximum value
    return -(gaussian - max_value)
end

function genotype_phenotype_density(
    peaks::Vector{GaussianPeak},
    x::AbstractVecOrMat;
    max_value::AbstractFloat=1.0
)
    # Sum the Gaussian contributions from all peaks
    total_gaussian = sum(
        peak.amplitude *
        exp(
            -0.5 *
            (x - peak.mean)' *
            LinearAlgebra.inv(peak.covariance) *
            (x - peak.mean)
        )
        for peak in peaks
    )
    # Return the genotype-phenotype density by inverting the Gaussian peak
    # shifted by the maximum value
    return -(total_gaussian - max_value)
end

Let’s test the genotype-phenotype density function with four Gaussian peaks in the corners of a square in phenotype space.

# Define peak parameters
amplitude = 0.5
means = [
    [-1.5, -1.5],
    [1.5, -1.5],
    [1.5, 1.5],
    [-1.5, 1.5],
]
covariance = [0.45 0.0; 0.0 0.45]

# Create peak
mut_peaks = GaussianPeak.(Ref(amplitude), means, Ref(covariance))

# Define range of phenotypes to evaluate
x = range(-3, 3, length=100)
y = range(-3, 3, length=100)

# Create meshgrid
M = genotype_phenotype_density.(Ref(mut_peaks), [[x, y] for x in x, y in y])

# Initialize figure
fig = Figure(size=(700, 300))

# Add axis for 3D plot
ax3 = Axis3(
    fig[1, 1],
    xlabel="phenotype 1",
    ylabel="phenotype 2",
    zlabel="∝ probability of \nphenotype",
    yticklabelsvisible=false,
    xticklabelsvisible=false,
    zticklabelsvisible=false,
)

# Add axis for contour plot
ax2 = Axis(
    fig[1, 2],
    xlabel="phenotype 1",
    ylabel="phenotype 2",
    aspect=AxisAspect(1),
    yticklabelsvisible=false,
    xticklabelsvisible=false,
)

# Plot genotype-phenotype density
surface!(ax3, x, y, M, color=M, colormap=Reverse(ColorSchemes.Purples_9))

# Plot a heatmap of the genotype-phenotype density
hm = heatmap!(ax2, x, y, M, colormap=Reverse(ColorSchemes.Purples_9))

# Add a colorbar
Colorbar(
    fig[1, 3],
    hm,
    label="∝ probability of phenotype",
    labelsize=13,
    height=Relative(1),
    ticksvisible=false,
    ticklabelsvisible=false
)

# Plot contour plot
contour!(ax2, x, y, M, color=:white)

fig

Evolutionary dynamics

Metropolis-Hastings Algorithm in Evolutionary Dynamics

The Metropolis-Hastings algorithm is a Markov Chain Monte Carlo (MCMC) method used to sample from a target probability distribution \(\pi(\underline{x})\). In our evolutionary context, \(\pi(\underline{x})\) can be thought of as the steady-state distribution of phenotypes under the combined influence of selection and mutation. We can define this distribution as:

\[ \pi(\underline{x}) \propto \exp\left( -\beta U(\underline{x}) \right), \tag{5}\]

here, \(\beta\) is an inverse temperature parameter (from statistical mechanics) that controls the level of stochasticity in the system. A higher \(\beta\) means the system is more likely to move towards higher fitness and mutational accessibility.

Proposal Distribution

At each step, we propose a new phenotype \(\underline{x}{\prime}\) from a proposal distribution \(q(\underline{x}{\prime} | \underline{x})\). For simplicity, we can use a symmetric proposal distribution, such as a Gaussian centered at the current phenotype. Furthere, we constrain the proposal distribution to be symmetric, i.e., \[ q(x | x') = q(x' | x). \tag{6}\]

This symmetry simplifies the acceptance probability later on.

Acceptance Probability

The acceptance probability \(P_{\text{accept}}\) for moving from \(\underline{x}\) to \(\underline{x}{\prime}\) is given by:

\[ P_{\text{accept}} = \min\left(1, \frac{\pi(\underline{x}{\prime}) q(\underline{x} | \underline{x}{\prime})} {\pi(\underline{x}) q(\underline{x}{\prime} | \underline{x})} \right). \tag{7}\]

Given the symmetry of the proposal distribution, \(q(\underline{x}{\prime} | \underline{x}) = q(\underline{x} | \underline{x}{\prime})\) , so the acceptance probability simplifies to:

\[ P_{\text{accept}} = \min\left( 1, \frac{\pi(\underline{x}{\prime})}{\pi(\underline{x})} \right) = \min\left( 1, e^{\beta [\ln F_E(\underline{x}{\prime}) + \ln M(\underline{x}{\prime})] - \beta [\ln F_E(\underline{x}) + \ln M(\underline{x})]} \right). \tag{8}\]

This can be rewritten as \[ P_{\text{accept}} = \min\left( 1, \frac{F_E(\underline{x}{\prime})^β M(\underline{x}{\prime})^β} {F_E(\underline{x})^β M(\underline{x})^β} \right), \tag{9}\]

This expression shows that the acceptance probability depends on the difference in fitness and mutational accessibility between the current and proposed phenotypes.

Let’s now implement the Metropolis-Hastings algorithm in code.

@doc raw"""
    evo_metropolis_hastings(x0, fitness_peaks, mut_peaks, β, µ, n_steps)

Perform evolutionary Metropolis-Hastings algorithm to simulate phenotypic
evolution.

# Arguments
- `x0::AbstractVecOrMat`: Initial phenotype vector or matrix.
- `fitness_peaks::Union{GaussianPeak, Vector{GaussianPeak}}`: Fitness landscape
  defined by one or more Gaussian peaks.
- `mut_peaks::Union{GaussianPeak, Vector{GaussianPeak}}`: genotype-phenotype
  density function defined by one or more Gaussian peaks.
- `β::AbstractFloat`: Inverse temperature parameter controlling selection
  strength.
- `µ::AbstractFloat`: Mutation step size standard deviation.
- `n_steps::Int`: Number of steps to simulate.

# Returns
- `Matrix{Float64}`: Matrix of phenotypes, where each column represents a step
  in the simulation.

# Description
This function implements the Metropolis-Hastings algorithm to simulate
phenotypic evolution in a landscape defined by fitness and mutational
accessibility. It uses the following steps:

1. Initialize the phenotype trajectory with the given starting point.
2. For each step: 
   a. Propose a new phenotype by adding Gaussian noise. 
   b. Calculate the fitness and genotype-phenotype density values for the new 
      phenotype.
   c. Compute the acceptance probability based on the ratio of new and current 
      values. 
   d. Accept or reject the proposed phenotype based on the acceptance 
      probability.
3. Return the complete phenotype trajectory.

The acceptance probability is calculated using the simplified form: 
P_accept = min(1, (F_E(x_new) * GP(x_new)) / (F_E(x) * GP(x)))

where F_E is the fitness function and GP is the genotype-phenotype density
function.
"""
function evo_metropolis_hastings(
    x0::AbstractVector,
    fitness_peaks::Union{GaussianPeak,Vector{GaussianPeak}},
    mut_peaks::Union{GaussianPeak,Vector{GaussianPeak}},
    β::Real,
    µ::Real,
    n_steps::Int,
)
    # Initialize array to hold phenotypes
    x = Matrix{Float64}(undef, length(x0), n_steps + 1)
    # Set initial phenotype
    x[:, 1] = x0

    # Compute fitness and genotype-phenotype density at initial phenotype
    fitness_val = fitness(fitness_peaks, x0)
    mut_val = genotype_phenotype_density(mut_peaks, x0)

    # Loop over steps
    for t in 1:n_steps
        # Propose new phenotype
        x_new = x[:, t] + µ * randn(length(x0))

        # Calculate fitness and genotype-phenotype density
        fitness_val_new = fitness(fitness_peaks, x_new)
        mut_val_new = genotype_phenotype_density(mut_peaks, x_new)

        # Compute acceptance probability
        P_accept = min(
            1, (fitness_val_new * mut_val_new / (fitness_val * mut_val))^β
        )

        # Accept or reject proposal
        if rand() < P_accept
            # Accept proposal
            x[:, t+1] = x_new
            # Update fitness and genotype-phenotype density
            fitness_val = fitness_val_new
            mut_val = mut_val_new
        else
            # Reject proposal
            x[:, t+1] = x[:, t]
        end
    end # for
    return x
end # function

Let’s test this function with the previously defined fitness and genotype-phenotype density functions.

Random.seed!(42)

# Define peak parameters
amplitude = 5.0
mean = [0.0, 0.0]
covariance = [3.0 0.0; 0.0 3.0]
# Create peak
fit_peak = GaussianPeak(amplitude, mean, covariance)
# Define peak parameters
amplitude = 1.0
means = [
    [-1.5, -1.5],
    [1.5, -1.5],
    [1.5, 1.5],
    [-1.5, 1.5],
]
covariance = [0.45 0.0; 0.0 0.45]

# Create peak
mut_peaks = GaussianPeak.(Ref(amplitude), means, Ref(covariance))

# Set initial phenotype
x0 = [-2.5, -2.5]

# Set parameters
β = 10.0
µ = 0.1
n_steps = 300

# Run Metropolis-Hastings algorithm
x_traj = evo_metropolis_hastings(x0, fit_peak, mut_peaks, β, µ, n_steps)

Let’s plot the trajectory of the phenotypes in phenotype space.

Random.seed!(42)

# Define range of phenotypes to evaluate
x = range(-4, 4, length=100)
y = range(-4, 4, length=100)

# Create meshgrid
F = fitness.(Ref(fit_peak), [[x, y] for x in x, y in y])
M = genotype_phenotype_density.(Ref(mut_peaks), [[x, y] for x in x, y in y])

# Initialize figure
fig = Figure(size=(600, 300))

# Add axis for trajectory in fitness landscape
ax1 = Axis(
    fig[1, 1],
    xlabel="phenotype 1",
    ylabel="phenotype 2",
    aspect=AxisAspect(1),
    title="Fitness landscape",
    yticklabelsvisible=false,
    xticklabelsvisible=false,
)
# Add axis for trajectory in genotype-phenotype density
ax2 = Axis(
    fig[1, 2],
    xlabel="phenotype 1",
    ylabel="phenotype 2",
    aspect=AxisAspect(1),
    title="Genotype-phenotype\nDensity",
    yticklabelsvisible=false,
    xticklabelsvisible=false,
)

# Plot a heatmap of the fitness landscape
heatmap!(ax1, x, y, F, colormap=:algae)
# Plot heatmap of genotype-phenotype density
heatmap!(ax2, x, y, M, colormap=Reverse(ColorSchemes.Purples_9))

# Plot contour plot
contour!(ax1, x, y, F, color=:white)
contour!(ax2, x, y, M, color=:white)

# Plot trajectory
scatterlines!.(
    [ax1, ax2],
    Ref(x_traj[1, :]),
    Ref(x_traj[2, :]),
    color=ColorSchemes.seaborn_colorblind[4],
    markersize=5
)

fig

Let’s plot the energy for the phenotypes in the trajectory.

# Initialize array to hold energy
U = Vector{Float64}(undef, n_steps + 1)

# Loop over steps
for (i, x) in enumerate(eachcol(x_traj))
    U[i] = -log(fitness(fit_peak, x)) - log(genotype_phenotype_density(mut_peaks, x))
end

# Initialize figure
fig = Figure(size=(350, 300))

# Add axis for energy
ax = Axis(
    fig[1, 1],
    xlabel="step",
    ylabel="energy",
    yticklabelsvisible=false,
)

# Plot energy
lines!(ax, U)

fig

Let’s now repeat the simulation multiple times and plot all different trajectories.

Random.seed!(42)

# Define number of simulations
n_sim = 5

# Run simulations
x_traj_list = [
    evo_metropolis_hastings(x0, fit_peak, mut_peaks, β, µ, n_steps)
    for _ in 1:n_sim
]

# Initialize figure
fig = Figure(size=(600, 300))

# Add axis for fitness landscape
ax1 = Axis(
    fig[1, 1],
    xlabel="phenotype 1",
    ylabel="phenotype 2",
    aspect=AxisAspect(1),
    title="Fitness",
    yticklabelsvisible=false,
    xticklabelsvisible=false,
)
# Add axis for genotype-phenotype density
ax2 = Axis(
    fig[1, 2],
    xlabel="phenotype 1",
    ylabel="phenotype 2",
    aspect=AxisAspect(1),
    title="Genotype-phenotype\nDensity",
    yticklabelsvisible=false,
    xticklabelsvisible=false,
)

# Plot fitness landscape
heatmap!(ax1, x, y, F, colormap=:algae)
# Plot heatmap of genotype-phenotype density
heatmap!(ax2, x, y, M, colormap=Reverse(ColorSchemes.Purples_9))

# Plot contour plot
contour!(ax1, x, y, F, color=:white)
contour!(ax2, x, y, M, color=:white)

# Loop over simulations
for (i, x_traj) in enumerate(x_traj_list)
    # Plot trajectory
    scatterlines!.(
        [ax1, ax2],
        Ref(x_traj[1, :]),
        Ref(x_traj[2, :]),
        color=ColorSchemes.seaborn_colorblind[i],
        markersize=3
    )
end

# Set limits
xlims!(ax1, -4, 4)
ylims!(ax1, -4, 4)
xlims!(ax2, -4, 4)
ylims!(ax2, -4, 4)

fig

As desired, all trajectories climb up the fitness peak avoiding the low genotype-phenotype density regions.

Conclusion

In this notebook we proposed a simple model for evolutionary dynamics inspired on the classic Metropolis-Hastings algorithm. We used this model to simulate the evolution of a phenotype in a fitness landscape with an explicit genotype-phenotype density function. The simple energy-based model allows us to integrate both fitness and genotype-phenotype density in a single framework. This model is the basis for the simulations presented in the main text of the paper.