Chapter V

Supporting Information for First-principles prediction of the information processing capacity of a simple genetic circuit


A version of this chapter originally appeared as Razo-Mejia, M., Marzen, S., Chure, G., Taubman, R., Morrison, M., and Phillips, R. (2020). First-principles prediction of the information processing capacity of a simple genetic circuit. Physical Review E 102, 022404. DOI: 10.1103/PhysRevE.102.022404.

Abstract

Given the stochastic nature of gene expression, genetically identical cells exposed to the same environmental inputs will produce different outputs. This heterogeneity has been hypothesized to have consequences for how cells are able to survive in changing environments. Recent work has explored the use of information theory as a framework to understand the accuracy with which cells can ascertain the state of their surroundings. Yet the predictive power of these approaches is limited and has not been rigorously tested using precision measurements. To that end, we generate a minimal model for a simple genetic circuit in which all parameter values for the model come from independently published data sets. We then predict the information processing capacity of the genetic circuit for a suite of biophysical parameters such as protein copy number and protein-DNA affinity. We compare these parameter-free predictions with an experimental determination of protein expression distributions and the resulting information processing capacity of E. coli cells. We find that our minimal model captures the scaling of the cell-to-cell variability in the data and the inferred information processing capacity of our simple genetic circuit up to a systematic deviation.

Three-State Promoter Model for Simple Repression

To tackle the question of how much information the simple repression motif can process, we require the joint probability distribution of mRNA and protein \(P(m, p; t)\). To obtain this distribution, we use the chemical master equation formalism. Specifically, we assume a three-state model, where the promoter can be found 1) in a transcriptionally active state (\(A\) state), 2) in a transcriptionally inactive state without the repressor bound (\(I\) state), and 3) with the repressor bound (\(R\) state). (see Fig. ¿fig:ch3_fig02?(A)). These three states generate a system of coupled differential equations for each of the three state distributions \(P_A(m, p)\), \(P_I(m, p)\), and \(P_R(m, p)\). Given the rates shown in Fig. ¿fig:ch3_fig02?(A), let us define the system of ODEs. For the transcriptionally active state, we have \[ \begin{split} \frac{d P_A(m, p)}{dt} &= - \overbrace{k^{(p)}_{\text{off}} P_A(m, p)}^{A \rightarrow I} % A -> I + \overbrace{k^{(p)}_{\text{on}} P_I(m, p)}^{I \rightarrow A}\\ % I -> A &+ \overbrace{r_m P_A(m-1, p)}^{m-1 \rightarrow m} % m-1 -> m - \overbrace{r_m P_A(m, p)}^{m \rightarrow m+1}% m -> m+1 + \overbrace{\gamma _m (m + 1) P_A(m+1 , p)}^{m+1 \rightarrow m} % m+1 -> m - \overbrace{\gamma _m m P_A(m , p)}^{m \rightarrow m-1}\\ % m -> m-1 &+ \overbrace{r_p m P_A(m, p - 1)}^{p-1 \rightarrow p} % p-1 -> p - \overbrace{r_p m P_A(m, p)}^{p \rightarrow p+1} % p -> p+1 + \overbrace{\gamma _p (p + 1) P_A(m, p + 1)}^{p + 1 \rightarrow p} % p+1 -> p - \overbrace{\gamma _p p P_A(m, p)}^{p \rightarrow p-1}. % p -> p-1 \end{split} \] For the inactive promoter state \(I\), we have \[ \begin{split} \frac{d P_I(m, p)}{dt} &= \overbrace{k^{(p)}_{\text{off}} P_A(m, p)}^{A \rightarrow I} % A -> I - \overbrace{k^{(p)}_{\text{on}} P_I(m, p)}^{I \rightarrow A} % I -> A + \overbrace{k^{(r)}_{\text{off}} P_R(m, p)}^{R \rightarrow I} % R -> I - \overbrace{k^{(r)}_{\text{on}} P_I(m, p)}^{I \rightarrow R}\\ % I -> R &+ \overbrace{\gamma _m (m + 1) P_I(m+1 , p)}^{m+1 \rightarrow m} % m+1 -> m - \overbrace{\gamma _m m P_I(m , p)}^{m \rightarrow m-1}\\ % m -> m-1 &+ \overbrace{r_p m P_I(m, p - 1)}^{p-1 \rightarrow p} % p-1 -> p - \overbrace{r_p m P_I(m, p)}^{p \rightarrow p+1} % p -> p+1 + \overbrace{\gamma _p (p + 1) P_I(m, p + 1)}^{p + 1 \rightarrow p} % p+1 -> p - \overbrace{\gamma _p p P_I(m, p)}^{p \rightarrow p-1}. % p -> p-1 \end{split} \] And finally, for the repressor bound state \(R\), we have \[ \begin{split} \frac{d P_R(m, p)}{dt} &= - \overbrace{k^{(r)}_{\text{off}} P_R(m, p)}^{R \rightarrow I} % R -> I + \overbrace{k^{(r)}_{\text{on}} P_I(m, p)}^{I \rightarrow R}\\ % I -> R &+ \overbrace{\gamma _m (m + 1) P_R(m+1 , p)}^{m+1 \rightarrow m} % m+1 -> m - \overbrace{\gamma _m m P_R(m , p)}^{m \rightarrow m-1}\\ % m -> m-1 &+ \overbrace{r_p m P_R(m, p - 1)}^{p-1 \rightarrow p} % p-1 -> p - \overbrace{r_p m P_R(m, p)}^{p \rightarrow p+1} % p -> p+1 + \overbrace{\gamma _p (p + 1) P_R(m, p + 1)}^{p + 1 \rightarrow p} % p+1 -> p - \overbrace{\gamma _p p P_R(m, p)}^{p \rightarrow p-1}. % p -> p-1 \end{split} \]

For an unregulated promoter, i.e., a promoter in a cell that has no repressors present and therefore constitutively expresses the gene, we use a two-state model in which the state \(R\) is not allowed. All the terms in the system of ODEs containing \(k^{(r)}_{\text{on}}\) or \(k^{(r)}_{\text{off}}\) are then set to zero.

It is convenient to express this system using matrix notation  [1]. For this, we define \(\mathbf{P}(m, p) = (P_A(m, p), P_I(m, p), P_R(m, p))^T\). Then the system of ODEs can be expressed as \[ \begin{split} \frac{d \mathbf{P}(m, p)}{dt} &= \mathbf{K} \mathbf{P}(m, p) - \mathbf{R}_m \mathbf{P}(m, p) + \mathbf{R}_m \mathbf{P}(m-1, p)\\ &- m \mathbf{\Gamma}_m \mathbf{P}(m, p) + (m + 1) \mathbf{\Gamma}_m \mathbf{P}(m + 1, p)\\ &- m \mathbf{R}_p \mathbf{P}(m, p) + m \mathbf{R}_p \mathbf{P}(m, p - 1)\\ &- p \mathbf{\Gamma}_p \mathbf{P}(m, p) + (p + 1) \mathbf{\Gamma}_p \mathbf{P}(m, p + 1), \end{split} \] where we defined matrices representing the promoter state transition \(\mathbf{K}\), \[ \mathbf{K} \equiv \begin{bmatrix} -k^{(p)}_{\text{off}} & k^{(p)}_{\text{on}} & 0\\ k^{(p)}_{\text{off}} & -k^{(p)}_{\text{on}} -k^{(r)}_{\text{on}} & k^{(r)}_{\text{off}}\\ 0 & k^{(r)}_{\text{on}} & -k^{(r)}_{\text{off}} \end{bmatrix}, \] mRNA production, \(\mathbf{R}_m\), and degradation, \(\mathbf{\Gamma}_m\), as \[ \mathbf{R}_m \equiv \begin{bmatrix} r_m & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\\ \end{bmatrix}, \] and \[ \mathbf{\Gamma}_m \equiv \begin{bmatrix} \gamma _m & 0 & 0\\ 0 & \gamma _m & 0\\ 0 & 0 & \gamma _m\\ \end{bmatrix}. \] For the protein, we also define production \(\mathbf{R}_p\) and degradation \(\mathbf{\Gamma}_p\) matrices as \[ \mathbf{R}_p \equiv \begin{bmatrix} r_p & 0 & 0\\ 0 & r_p & 0\\ 0 & 0 & r_p\\ \end{bmatrix} \] and \[ \mathbf{\Gamma}_p \equiv \begin{bmatrix} \gamma _p & 0 & 0\\ 0 & \gamma _p & 0\\ 0 & 0 & \gamma _p\\ \end{bmatrix}. \]

The corresponding equation for the unregulated two-state promoter takes the same form with the definition of the matrices following the same scheme without including the third row and third column, and setting \(k^{(r)}_{\text{on}}\) and \(k^{(r)}_{\text{off}}\) to zero.
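
To make these definitions concrete, here is a minimal Python sketch (our own illustration, not the code that accompanies the paper) of the matrices \(\mathbf{K}\), \(\mathbf{R}_m\), \(\mathbf{\Gamma}_m\), \(\mathbf{R}_p\), and \(\mathbf{\Gamma}_p\); all numerical rate values below are placeholders.

```python
import numpy as np

# Promoter states are ordered (A, I, R). All rate values below are placeholders;
# the actual values are inferred later in this chapter.
kon_p, koff_p = 0.03, 0.7     # RNAP-governed promoter on/off rates (1/s)
kon_r, koff_r = 0.04, 0.002   # repressor on/off rates (1/s)
rm, gm = 1.4, 1 / 180         # mRNA production and degradation rates (1/s)
rp, gp = 0.06, 1 / 3600       # protein production and degradation rates (1/s)

# Promoter state-transition matrix K (columns sum to zero).
K = np.array([[-koff_p,  kon_p,          0.0   ],
              [ koff_p, -kon_p - kon_r,  koff_r],
              [ 0.0,     kon_r,         -koff_r]])

Rm = np.diag([rm, 0.0, 0.0])  # transcription only occurs from the active state A
Gm = gm * np.eye(3)           # mRNA degradation acts in every state
Rp = rp * np.eye(3)           # translation proportional to mRNA count in every state
Gp = gp * np.eye(3)           # protein degradation acts in every state

# Unregulated two-state promoter: drop the R state (third row and column).
K2 = np.array([[-koff_p, kon_p], [koff_p, -kon_p]])
```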

A closed-form solution for this master equation might not even exist. The approximate solution of chemical master equations of this kind is an active area of research. As we will see later in this chapter, the two-state promoter master equation has been analytically solved for the mRNA  [2] and protein distributions  [3]. For our purposes, we will detail how to use the Maximum Entropy principle to approximate the full distribution for the two- and three-state promoter.

Parameter Inference

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook.)

To generate falsifiable predictions with meaningful parameters, we infer the kinetic rates for this three-state promoter model using different data sets generated in our lab over the last decade concerning different aspects of the regulation of the simple repression motif. For example, for the unregulated promoter transition rates \(k^{(p)}_{\text{on}}\) and \(k^{(p)}_{\text{off}}\) and the mRNA production rate \(r_m\), we use single-molecule mRNA FISH counts from an unregulated promoter  [4]. Once these parameters are fixed, we use the values to constrain the repressor rates \(k^{(r)}_{\text{on}}\) and \(k^{(r)}_{\text{off}}\). These repressor rates are obtained using information from mean gene expression measurements from bulk LacZ colorimetric assays  [5]. We also expand our model to include the allosteric nature of the repressor protein, taking advantage of video microscopy measurements done in the context of multiple promoter copies  [6] and flow-cytometry measurements of the mean response of the system to different levels of induction  [7]. In what follows, we detail the steps taken to infer the parameter values. At each step, the values of the parameters inferred in previous steps constrain the values of the parameters that are not yet determined, building in this way a self-consistent model informed by work that spans several experimental techniques.

Unregulated Promoter Rates

We begin our parameter inference problem with the promoter on and off rates \(k^{(p)}_{\text{on}}\) and \(k^{(p)}_{\text{off}}\), as well as the mRNA production rate \(r_m\). In this case, there are only two states available to the promoter— the inactive state \(I\) and the transcriptionally active state \(A\). That means that the third ODE for \(P_R(m, p)\) is removed from the system. The mRNA steady-state distribution for this particular two-state promoter model was solved analytically by Peccoud and Ycart  [2]. This distribution \(P(m) \equiv P_I(m) + P_A(m)\) is of the form \[ {\scriptstyle P(m \mid k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m, \gamma _m) = {\Gamma \left( \frac{k^{(p)}_{\text{on}}}{\gamma _m} + m \right) \over \Gamma(m + 1) \Gamma\left( \frac{k^{(p)}_{\text{off}}+k^{(p)}_{\text{on}}}{\gamma _m} + m \right)} {\Gamma\left( \frac{k^{(p)}_{\text{off}}+k^{(p)}_{\text{on}}}{\gamma _m} \right) \over \Gamma\left( \frac{k^{(p)}_{\text{on}}}{\gamma _m} \right)} \left( \frac{r_m}{\gamma _m} \right)^m F_1^1 \left( \frac{k^{(p)}_{\text{on}}}{\gamma _m} + m, \frac{k^{(p)}_{\text{off}} + k^{(p)}_{\text{on}}}{\gamma _m} + m, -\frac{r_m}{\gamma _m} \right), } \label{eq:two_state_mRNA} \] where \(\Gamma(\cdot)\) is the gamma function, and \(F_1^1\) is the confluent hypergeometric function of the first kind. This rather complicated expression will aid us in finding parameter values for the rates. The inferred rates \(k^{(p)}_{\text{on}}\), \(k^{(p)}_{\text{off}}\), and \(r_m\) will be expressed in units of the mRNA degradation rate \(\gamma _m\). This is because the model in Eq. \(\ref{eq:two_state_mRNA}\) is homogeneous in time, meaning that if we divide all rates by a constant, it would be equivalent to multiplying the characteristic time scale by the same constant. As we will discuss in the next section, Eq. \(\ref{eq:two_state_mRNA}\) has degeneracy in the parameter values. What this means is that a change in one of the parameters, specifically \(r_m\), can be compensated by a change in another parameter, specifically \(k^{(p)}_{\text{off}}\), to obtain the same distribution. To work around this intrinsic limitation of the model, we will include information from what we know from equilibrium-based models in our inference prior.
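
To illustrate how Eq. \(\ref{eq:two_state_mRNA}\) can be evaluated in practice, the following minimal Python sketch (ours, not the annotated notebook) computes the distribution in log space to avoid overflow in the gamma functions; the example parameter values are illustrative only.

```python
import numpy as np
from scipy.special import gammaln, hyp1f1

def two_state_log_pmf(m, kon, koff, rm):
    """Log of the Peccoud-Ycart mRNA distribution, Eq. (two_state_mRNA).

    m             : mRNA copy number (integer or array of integers)
    kon, koff, rm : promoter rates in units of the mRNA degradation rate gamma_m.
    """
    return (gammaln(kon + m) - gammaln(m + 1) - gammaln(kon + koff + m)
            + gammaln(kon + koff) - gammaln(kon)
            + m * np.log(rm)
            + np.log(hyp1f1(kon + m, kon + koff + m, -rm)))

# Example evaluation for illustrative parameter values (in units of gamma_m).
m = np.arange(0, 100)
pmf = np.exp(two_state_log_pmf(m, 4.3, 18.8, 103.8))
print(pmf.sum())  # should be close to one if the support is wide enough
```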

Bayesian Parameter Inference of RNAP Rates

To make progress at inferring the unregulated promoter state transition rates, we make use of the single-molecule mRNA FISH data from Jones et al.  [4]. Fig. 1 shows the distribution of mRNA per cell for the lacUV5 promoter used for our inference. This promoter, being very strong, has a mean copy number of \(\langle m \rangle \approx 18\) mRNA/cell.

Figure 1: lacUV5 mRNA per cell distribution. Data from  [4] of the unregulated lacUV5 promoter as inferred from single-molecule mRNA FISH. The Python code (ch5_fig01.py) used to generate this figure can be found on the original paper’s GitHub repository.

Having these data in hand, we now turn to Bayesian parameter inference. Writing Bayes’ theorem, we have \[ P(k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m \mid D) = \frac{P(D \mid k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m) P(k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m)}{P(D)}, \label{eq:bayes_rnap_rates} \] where \(D\) represents the data. For this case, the data consists of single-cell mRNA counts \(D = \{ m_1, m_2, \ldots, m_N \}\), where \(N\) is the number of cells. We assume that each cell’s measurement is independent of the others such that we can rewrite Eq. \(\ref{eq:bayes_rnap_rates}\) as \[ P(k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m \mid \{m_i\}) \propto \left[\prod_{i=1}^N P(m_i \mid k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m) \right] P(k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m), \label{eq:bayes_sample} \] where we ignore the normalization constant \(P(D)\). The likelihood term \(P(m_i \mid k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m)\) is exactly given by Eq. \(\ref{eq:two_state_mRNA}\) with \(\gamma _m = 1\). Given that we have this functional form for the distribution, we can use Markov Chain Monte Carlo (MCMC) sampling to explore the 3D parameter space in order to fit Eq. \(\ref{eq:two_state_mRNA}\) to the mRNA FISH data.

Constraining the Rates Given Prior Thermodynamic Knowledge

One of the Bayesian approach’s strengths is that we can include all the prior knowledge on the parameters when performing an inference  [8]. Basic features such as the fact that the rates have to be strictly positive constrain these parameters’ values. We know more than the simple constraint of non-negative values for the specific rates analyzed in this section. For example, the expression of an unregulated promoter has been studied from a thermodynamic perspective  [9]. Given the underlying assumptions of these equilibrium models, in which the probability of finding the RNAP bound to the promoter is proportional to the transcription rate  [10], they can only make statements about the mean expression level. Nevertheless, if both the thermodynamic and kinetic models describe the same process, the mean gene expression level predictions must agree. That means that we can use what we know about the mean gene expression and how this is related to parameters such as molecule copy numbers and binding affinities to constrain the values that the rates in question can take.

In the case of this two-state promoter, it can be shown that the mean number of mRNA is given by  [1] (see Sec. 5.3 for moment computation) \[ \langle m \rangle = \frac{r_m}{\gamma _m} \frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}. \label{eq:mean_kinetic} \] Another way of expressing this is as \(\frac{r_m}{\gamma _m} \times p_{\text{active}}^{(p)}\), where \(p_{\text{active}}^{(p)}\) is the probability of the promoter being in the transcriptionally active state. The thermodynamic picture has an equivalent result where the mean number of mRNA is given by  [9,10] \[ \left\langle m \right\rangle = \frac{r_m}{\gamma _m} \frac{\frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_p} }{1 + \frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_p}}, \label{eq:mean_thermo} \] where \(P\) is the number of RNAP per cell, \(N_{NS}\) is the number of non-specific binding sites, \(\Delta\varepsilon_p\) is the RNAP binding energy in \(k_BT\) units and \(\beta\equiv {(k_BT)}^{-1}\). Using Eq. \(\ref{eq:mean_kinetic}\) and Eq. \(\ref{eq:mean_thermo}\), we can easily see that if these frameworks are to be equivalent, then it must be true that \[ \frac{k^{(p)}_{\text{on}}}{ k^{(p)}_{\text{off}}} = \frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_p}, \] or equivalently \[ \ln \left(\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}}\right) = -\beta\Delta\varepsilon_p + \ln P - \ln N_{NS}. \] To put numerical values into these variables, we can use information from the literature. The RNAP copy number is of order \(P \approx 1000-3000\) RNAP/cell for a one-hour doubling time  [11]. As for the number of non-specific binding sites and the binding energy, we have that \(N_{NS} = 4.6\times 10^6\)  [10] and \(-\beta\Delta\varepsilon_p \approx 5 - 7\)  [9]. Given these values, we define a Gaussian prior for the log ratio of these two quantities of the form \[ P\left(\ln \left(\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}}\right) \right) \propto \exp \left\{ - \frac{\left(\ln \left(\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}}\right) - \left(-\beta\Delta\varepsilon_p + \ln P - \ln N_{NS} \right) \right)^2 }{2 \sigma^2} \right\}, \label{eq:prior_single} \] where \(\sigma\) is the standard deviation that accounts for the uncertainty in these parameters. We include this prior as part of the prior term \(P(k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m)\) of Eq. \(\ref{eq:bayes_sample}\). We then use MCMC to sample the posterior distribution given by Eq. \(\ref{eq:bayes_sample}\). Fig. 2 shows the MCMC samples of the posterior distribution. For the case of the \(k^{(p)}_{\text{on}}\) parameter, there is a single symmetric peak. \(k^{(p)}_{\text{off}}\) and \(r_m\) have rather long tails towards large values. The 2D projection of \(k^{(p)}_{\text{off}}\) vs. \(r_m\) shows that the model is sloppy, meaning that the two parameters are highly correlated. This feature is a common problem for many non-linear systems used in biophysics and systems biology  [12]. What this implies is that we can change the value of \(k^{(p)}_{\text{off}}\), and then compensate by a change in \(r_m\) to maintain the shape of the mRNA distribution. Therefore, it is impossible for the data and the model to narrow down a single value for the parameters.
Nevertheless, since we included the prior information on the rates, given by the correspondence between the equilibrium and non-equilibrium expressions for the mean mRNA level, we obtain more constrained values for the RNAP rates and the transcription rate, which we take as the modes of their long-tailed posterior distributions.
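
To sketch how such a sampler could be assembled, the following minimal Python example (ours, not the published inference pipeline) combines the likelihood of Eq. \(\ref{eq:bayes_sample}\) with a Gaussian prior of the form of Eq. \(\ref{eq:prior_single}\) and hands the log-posterior to the emcee package. The prior hyperparameters (\(P = 2000\), \(-\beta\Delta\varepsilon_p = 6\), \(\sigma = 1\)) are illustrative choices within the ranges quoted above, and the data file name is a stand-in for the actual data set.

```python
import numpy as np
import emcee
from scipy.special import gammaln, hyp1f1

def two_state_log_pmf(m, kon, koff, rm):
    """Log of Eq. (two_state_mRNA) with rates in units of gamma_m."""
    return (gammaln(kon + m) - gammaln(m + 1) - gammaln(kon + koff + m)
            + gammaln(kon + koff) - gammaln(kon)
            + m * np.log(rm)
            + np.log(hyp1f1(kon + m, kon + koff + m, -rm)))

# Illustrative hyperparameters for the Gaussian prior on ln(kon/koff):
# midpoints of the ranges quoted in the text, not the published values.
MU = 6.0 + np.log(2000) - np.log(4.6e6)   # -beta*deps_p + ln P - ln N_NS
SIGMA = 1.0                               # assumed prior width

def log_posterior(theta, m_counts):
    """Log-posterior for theta = (ln kon, ln koff, ln rm); sampling logs keeps rates positive."""
    kon, koff, rm = np.exp(theta)
    log_prior = -0.5 * ((np.log(kon / koff) - MU) / SIGMA) ** 2
    log_like = np.sum(two_state_log_pmf(m_counts, kon, koff, rm))
    return log_prior + log_like if np.isfinite(log_like) else -np.inf

# m_counts would hold the single-molecule mRNA FISH counts (one integer per cell);
# the file name below is a hypothetical stand-in.
m_counts = np.loadtxt("lacUV5_mRNA_counts.txt")
ndim, nwalkers = 3, 32
p0 = np.log([5.0, 20.0, 100.0]) + 0.1 * np.random.randn(nwalkers, ndim)
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_posterior, args=(m_counts,))
sampler.run_mcmc(p0, 5000)
```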

Figure 2: MCMC posterior distribution. Sampling out of Eq. \(\ref{eq:bayes_sample}\), the plot shows 2D and 1D projections of the 3D parameter space. The parameter values are (in units of the mRNA degradation rate \(\gamma _m\)) \(k^{(p)}_{\text{on}} = 4.3^{+1}_{-0.3}\), \(k^{(p)}_{\text{off}} = 18.8^{+120}_{-10}\), and \(r_m = 103.8^{+423}_{-37}\), which are the modes of their respective distributions; the superscripts and subscripts represent the upper and lower bounds of the 95\(^\text{th}\) percentile of the parameter value distributions. The Python code (ch5_fig02.py) used to generate this figure can be found on the original paper’s GitHub repository.

The inferred values \(k^{(p)}_{\text{on}} = 4.3^{+1}_{-0.3}\), \(k^{(p)}_{\text{off}} = 18.8^{+120}_{-10}\) and \(r_m = 103.8^{+423}_{-37}\) are given in units of the mRNA degradation rate \(\gamma _m\). Given the asymmetry of the parameter distributions, we report the upper and lower bounds of the 95\(^\text{th}\) percentile of the posterior distributions. Assuming a mean lifetime for mRNA of \(\approx\) 3 min (from this link), we have an mRNA degradation rate of \(\gamma _m \approx 2.84 \times 10^{-3} s^{-1}\). Using this value, the inferred rates are \(k^{(p)}_{\text{on}} = 0.024_{-0.002}^{+0.005} s^{-1}\), \(k^{(p)}_{\text{off}} = {0.11}_{-0.05}^{+0.66} s^{-1}\), and \(r_m = 0.3_{-0.2}^{+2.3} s^{-1}\).

Fig. 3 compares the experimental data from Fig. 1 with the resulting distribution obtained by substituting the most likely parameter values into Eq. \(\ref{eq:two_state_mRNA}\). As we can see, this two-state model fits the data adequately.

Figure 3: Experimental vs. theoretical distribution of mRNA per cell using parameters from Bayesian inference. Dotted line shows the result of using Eq. \ref{eq:two_state_mRNA} along with the parameters inferred for the rates. Blue bars are the same data as Fig. 1 obtained from  [4]. The Python code (ch5_fig03.py) used to generate this figure can be found on the original paper’s GitHub repository.

Accounting for Variability in the Number of Promoters

As discussed in ref.  [4] and further expanded in  [13], an essential source of cell-to-cell variability in gene expression in bacteria is the fact that, depending on the growth rate and the position relative to the chromosome replication origin, cells can have multiple copies of any given gene. Genes closer to the replication origin have, on average, higher gene copy numbers compared to genes at the opposite end. For the locus in which our reporter construct is located (galK) and the doubling time of the mRNA FISH experiments, we expect to have \(\approx\) 1.66 copies of the gene  [4,14]. This implies that the cells spend 2/3 of the cell cycle with two copies of the promoter and the rest with a single copy.

To account for this variability in gene copy number, we extend the model by assuming that when cells have two copies of the promoter, the mRNA production rate is \(2 r_m\) compared to the rate \(r_m\) for a single promoter copy. The probability of observing a particular mRNA copy number \(m\) is therefore given by \[ {\scriptstyle P(m) = P(m \mid \text{one promoter}) \cdot P(\text{one promoter}) + P(m \mid \text{two promoters}) \cdot P(\text{two promoters}). } \label{eq:prob_multipromoter} \] Both terms \(P(m \mid \text{promoter copy})\) are given by Eq. \(\ref{eq:two_state_mRNA}\), with the only difference being the rate \(r_m\). It is important to acknowledge that Eq. \(\ref{eq:prob_multipromoter}\) assumes that once the gene is replicated, the time scale on which the mRNA count relaxes to the new steady state is much shorter than the time that cells spend in the two-promoter-copy state. This approximation should be valid for a short-lived mRNA molecule, but it is not applicable for proteins whose degradation rate is comparable to the cell cycle length, as explored in Sec. 5.5.

To repeat the Bayesian inference, including this variability in gene copy number, we must split the mRNA count data into two sets—cells with a single copy of the promoter and cells with two copies of the promoter. There is no labeling of the locus for the single-molecule mRNA FISH data, making it impossible to determine the number of promoter copies in any given cell. We therefore follow Jones et al.  [4] in using the cell area as a proxy for the stage in the cell cycle. In their approach, they sorted cells by area, considering cells below the 33\(^\text{rd}\) percentile as having a single promoter copy and the rest as having two copies. This approach ignores that cells are not uniformly distributed along the cell cycle. As first derived in  [15], populations of cells in log-phase growth are exponentially distributed along the cell cycle. This distribution is of the form \[ P(a) = (\ln 2) \cdot 2^{1 - a}, \label{eq:cell_cycle_dist} \] where \(a \in [0, 1]\) is the stage of the cell cycle, with \(a = 0\) being the start of the cycle and \(a = 1\) being the cell division (see Sec. 5.10 for a derivation of Eq. \(\ref{eq:cell_cycle_dist}\)). Fig. 4 shows the separation of the two groups based on area, where Eq. \(\ref{eq:cell_cycle_dist}\) was used to weight the distribution along the cell cycle.

Figure 4: Separation of cells based on cell size. Using the area as a proxy for position in the cell cycle, cells can be sorted into two groups—small cells (with one promoter copy) and large cells (with two promoter copies). The vertical black line delimits the threshold that divides both groups as weighted by Eq. \ref{eq:cell_cycle_dist}. The Python code (ch5_fig04.py) used to generate this figure can be found on the original paper’s GitHub repository.

A subtle but important consequence of Eq. \(\ref{eq:cell_cycle_dist}\) is that computing any quantity for a single cell is not equivalent to computing the same quantity for a population of cells. For example, let us assume that we want to compute the mean mRNA copy number \(\langle m \rangle\). For a single cell, this would be of the form \[ \langle m \rangle_{\text{cell}} = \langle m \rangle_1 \cdot f + \langle m \rangle_2 \cdot (1 - f), \] where \(\langle m \rangle_i\) is the mean mRNA copy number with \(i\) promoter copies in the cell, and \(f\) is the fraction of the cell cycle that cells spend with a single copy of the promoter. For a single cell, the probability of having a single promoter copy is equivalent to this fraction \(f\). But Eq. \(\ref{eq:cell_cycle_dist}\) tells us that if we sample unsynchronized cells, we are not sampling uniformly across the cell cycle. Therefore for a population of cells, the mean mRNA is given by \[ \langle m \rangle_{\text{population}} = \langle m \rangle_1 \cdot \phi + \langle m \rangle_2 \cdot (1 - \phi) \label{eq:mean_m_pop} \] where the probability of sampling a cell with one promoter \(\phi\) is given by \[ \phi = \int_0^f P(a) da, \] where \(P(a)\) is given by Eq. \(\ref{eq:cell_cycle_dist}\). What this equation computes is the probability of sampling a cell during a stage of the cell cycle \(< f\) where the reporter gene has not been replicated yet. Fig. 5 shows the distribution of both groups. As expected, larger cells have a higher mRNA copy number on average.
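
To make this distinction concrete, the short sketch below (ours) evaluates \(\phi\) from Eq. \(\ref{eq:cell_cycle_dist}\) and compares the single-cell and population averages; the one- and two-copy mean mRNA values used in the example are hypothetical.

```python
import numpy as np
from scipy.integrate import quad

def age_pdf(a):
    """Exponential cell-age distribution for an asynchronous population, Eq. (cell_cycle_dist)."""
    return np.log(2) * 2 ** (1 - a)

f = 1 / 3                                 # illustrative fraction of the cycle spent with one copy
phi_numeric, _ = quad(age_pdf, 0, f)      # probability of sampling a one-copy cell
phi_analytic = 2 - 2 ** (1 - f)           # closed form of the same integral

# Population mean mRNA mixes the one- and two-copy means with weight phi,
# which differs from the single-cell (time-averaged) weight f.
m1, m2 = 10.0, 20.0                       # hypothetical one- and two-copy means
mean_single_cell = f * m1 + (1 - f) * m2
mean_population = phi_analytic * m1 + (1 - phi_analytic) * m2
print(phi_numeric, phi_analytic, mean_single_cell, mean_population)
```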

Figure 5: mRNA distribution for small and large cells. (A) histogram and (B) the cumulative distribution function of the small and large cells as determined in Fig. 4. The triangles above histograms in (A) indicate the mean mRNA copy number for each group. The Python code (ch5_fig05.py) used to generate this figure can be found on the original paper’s GitHub repository.

We modify Eq. \(\ref{eq:bayes_sample}\) to account for the two separate groups of cells. Let \(N_s\) be the number of cells in the small size group and \(N_l\) the number of cells in the large size group. Then the posterior distribution for the parameters is of the form \[ {\scriptstyle P(k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m \mid \{m_i\}) \propto \left[\prod_{i=1}^{N_s} P(m_i \mid k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m)\right] \left[\prod_{j=1}^{N_l} P(m_j \mid k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, 2 r_m)\right] P(k^{(p)}_{\text{on}}, k^{(p)}_{\text{off}}, r_m), } \label{eq:bayes_sample_double} \] where we split the product of small and large cells.

For the two-promoter model, the prior shown in Eq. \(\ref{eq:prior_single}\) requires a small modification. Eq. \(\ref{eq:mean_m_pop}\) gives the mean mRNA copy number of a population of asynchronous cells growing at steady state. Given that we assume the only difference between the one- and two-promoter-copy states is the change in transcription rate from \(r_m\) in the single-promoter case to \(2 r_m\) in the two-promoter case, we can write Eq. \(\ref{eq:mean_m_pop}\) as \[ \langle m \rangle = \phi \cdot \frac{r_m}{\gamma _m} \frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}} + (1 -\phi) \cdot \frac{2 r_m}{\gamma _m} \frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}. \] This can be simplified to \[ \langle m \rangle = (2 - \phi) \frac{r_m}{\gamma _m} \frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}. \label{eq:mean_m_double_rates} \]

Equating Eq. \(\ref{eq:mean_m_double_rates}\) and Eq. \(\ref{eq:mean_thermo}\) to again require self-consistent predictions of the mean mRNA from the equilibrium and kinetic models gives \[ (2 - \phi) \frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}} = \frac{\frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_p} }{1 + \frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_p}}. \] Solving for \(\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}}\) results in \[ \left(\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}}\right) = \frac{\rho}{\left[ (1 + \rho)(2 - \phi) - \rho \right]}, \label{eq:kinetic_thermo_equiv} \] where we define \(\rho \equiv \frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_p}\). To simplify things further, we notice that for the specified values of \(P = 1000 - 3000\) per cell, \(N_{NS} = 4.6 \times 10^6\) bp, and \(-\beta\Delta\varepsilon_p = 5 - 7\), we can safely assume that \(\rho \ll 1\). This simplifying assumption has been previously called the weak promoter approximation  [5]. Given this, we can simplify Eq. \(\ref{eq:kinetic_thermo_equiv}\) as \[ \frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}} = \frac{1}{2 - \phi} \frac{P}{N_{NS}} e^{-\beta\Delta\varepsilon_p}. \] Taking the log of both sides gives \[ \ln\left(\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}}\right) = -\ln (2 - \phi) + \ln P - \ln N_{NS} - \beta\Delta\varepsilon_p. \] With this, we can set as before a Gaussian prior to constrain the ratio of the RNAP rates as \[ P\left(\ln \left(\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}}\right) \right) \propto \exp \left\{ - \frac{\left(\ln \left(\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}}\right) - \left(-\ln(2 - \phi) -\beta\Delta\varepsilon_p + \ln P - \ln N_{NS} \right) \right)^2 }{2 \sigma^2} \right\}. \label{eq:prior_double} \]

Fig. 6 shows the result of sampling out of Eq. \(\ref{eq:bayes_sample_double}\). Again we see that the model is highly sloppy with large credible regions obtained for \(k^{(p)}_{\text{off}}\) and \(r_m\). Nevertheless, the prior information allows us to get parameter values consistent with the equilibrium picture.

Figure 6: MCMC posterior distribution for a multi-promoter model. Sampling out of Eq. \(\ref{eq:bayes_sample_double}\), the plot shows 2D and 1D projections of the 3D parameter space. The parameter values are (in units of the mRNA degradation rate \(\gamma _m\)) \(k^{(p)}_{\text{on}} = 6.4^{+0.8}_{-0.4}\), \(k^{(p)}_{\text{off}} = 132^{+737}_{-75}\), and \(r_m = 257^{+1307}_{-132}\), which are the modes of their respective distributions; the superscripts and subscripts represent the upper and lower bounds of the 95\(^\text{th}\) percentile of the parameter value distributions. The sampling was bounded to values \(< 1000\) for numerical stability when computing the confluent hypergeometric function. The Python code (ch5_fig06.py) used to generate this figure can be found on the original paper’s GitHub repository.

Using again an mRNA mean lifetime of \(\approx 3\) min gives the following values for the parameters: \(k^{(p)}_{\text{on}} = {0.03}_{-0.002}^{+0.004} s^{-1}\), \(k^{(p)}_{\text{off}} = {0.7}_{-0.4}^{+4.1} s^{-1}\), and \(r_m = {1.4}_{-0.7}^{+7.3} s^{-1}\). Fig. 7 shows the result of applying Eq. \(\ref{eq:prob_multipromoter}\) using these parameter values. Specifically, Fig. 7(A) shows the global distribution, including cells with one and two promoters, and Fig. 7(B) splits the distributions of the two populations. Given that the model adequately describes both populations independently and pooled together, we confirm that using the cell area as a proxy for the stage in the cell cycle and the doubling of the transcription rate once cells have two promoters are reasonable approximations.

Figure 7: Experimental vs. theoretical distribution of mRNA per cell using parameters for multi-promoter model. (A) Solid line shows the result of using Eq. \ref{eq:prob_multipromoter} with the parameters inferred by sampling Eq. \ref{eq:bayes_sample_double}. Blue bars are the same data as Fig. 1 from  [4]. (B) Split distributions of small cells (light blue bars) and large cells (dark blue) with the corresponding theoretical predictions with transcription rate r_m (light blue line) and transcription rate 2 r_m (dark blue line). The Python code (ch5_fig07.py) used to generate this figure can be found on the original paper’s GitHub repository.

It is hard to compare our inferred values with those reported in the literature because these kinetic rates are effective parameters that hide much of the complexity of transcription initiation  [16]. In addition, the parameters’ non-identifiability limits a direct comparison of the actual numerical values of the inferred rates. Nevertheless, from the model, we can see that the mean burst size for each transcription event is given by \(r_m / k^{(p)}_{\text{off}}\). From our inferred values, we obtain a mean burst size of \(\approx 1.9\) transcripts. This mean burst size is similar to the burst size of 1.15 reported for a similar system in E. coli  [17].

Repressor Rates from a Three-State Regulated Promoter

Having determined the unregulated promoter transition rates, we now proceed to determine the repressor rates \(k^{(r)}_{\text{on}}\) and \(k^{(r)}_{\text{off}}\). These rates’ values are again constrained by the correspondence between our kinetic picture and what we know from equilibrium models  [18]. For this analysis, we again exploit the feature that, at the level of the mean, both the kinetic and the thermodynamic languages should make equivalent predictions. Over the last decade, there has been a great deal of effort in developing equilibrium models for gene expression regulation  [10,19,20]. In particular, our group has extensively characterized the simple repression motif using this formalism  [5–7].

The dialogue between theory and experiments has led to simplified expressions that capture the phenomenology of the gene expression response as a function of natural variables such as molecule count and affinities between molecular players. A particularly interesting quantity for the simple repression motif used by Garcia & Phillips  [5] is the fold-change in gene expression, defined as \[ \text{fold-change} = \frac{\left\langle{\text{gene expression}(R \neq 0)}\right\rangle}{ \left\langle{\text{gene expression}(R = 0)}\right\rangle}, \] where \(R\) is the number of repressors per cell and \(\left\langle{\cdot}\right\rangle\) is the population average. The fold-change is simply the mean expression level in the presence of the repressor relative to the mean expression level in the absence of regulation. In the language of statistical mechanics, this quantity takes the form \[ \text{fold-change} = \left( 1 + \frac{R}{N_{NS}} e^{-\beta\Delta\varepsilon_r} \right)^{-1}, \label{eq:fc_thermo} \] where \(\Delta\varepsilon_r\) is the repressor-DNA binding energy, and as before, \(N_{NS}\) is the number of non-specific binding sites where the repressor can bind  [5].

To compute the fold-change in the chemical master equation language, we compute the first moment of the steady-state mRNA distribution \(\langle m \rangle\) for both the three-state promoter (\(R \neq 0\)) and the two-state promoter case (\(R=0\)) (see Sec. 5.3 for moment derivation). The unregulated (two-state) promoter mean mRNA copy number is given by Eq. \(\ref{eq:mean_m_double_rates}\). For the regulated (three-state) promoter, we have an equivalent expression of the form \[ \left\langle{m (R \neq 0)}\right\rangle = (2 - \phi)\frac{r_m}{\gamma _m} \frac{k^{(r)}_{\text{off}}k^{(p)}_{\text{on}} }{k^{(p)}_{\text{off}}k^{(r)}_{\text{off}} + k^{(p)}_{\text{off}}k^{(r)}_{\text{on}} + k^{(r)}_{\text{off}}k^{(p)}_{\text{on}}}. \] Computing the fold-change then gives \[ \text{fold-change} = \frac{\left\langle{m (R \neq 0)}\right\rangle} {\left\langle{m (R = 0)}\right\rangle} = \frac{k^{(r)}_{\text{off}} \left( k^{(p)}_{\text{off}} + k^{(p)}_{\text{on}} \right)}{ k^{(p)}_{\text{off}}k^{(r)}_{\text{on}} + k^{(r)}_{\text{off}} \left( k^{(p)}_{\text{off}} + k^{(p)}_{\text{on}} \right)}, \label{eq:fold_change_cme} \] where the factor \((2 - \phi)\) due to the multiple promoter copies, the transcription rate \(r_m\), and the mRNA degradation rate \(\gamma _m\) all cancel out.
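
For a quick numerical comparison of the two expressions, the following sketch (ours) implements Eq. \(\ref{eq:fc_thermo}\) and Eq. \(\ref{eq:fold_change_cme}\); the repressor copy numbers in the example call are illustrative inputs only.

```python
import numpy as np

def fold_change_thermo(R, eps_r, NNS=4.6e6):
    """Thermodynamic fold-change, Eq. (fc_thermo).

    R     : repressors per cell
    eps_r : repressor-DNA binding energy in kBT (negative for favorable binding).
    """
    return 1.0 / (1.0 + (R / NNS) * np.exp(-eps_r))

def fold_change_kinetic(kon_p, koff_p, kon_r, koff_r):
    """Kinetic fold-change from the three-state promoter, Eq. (fold_change_cme).
    All rates must be given in the same (arbitrary but consistent) time units."""
    return (koff_r * (koff_p + kon_p)) / (koff_p * kon_r + koff_r * (koff_p + kon_p))

# Example: an O2-like binding energy evaluated at a few illustrative repressor numbers.
print(fold_change_thermo(np.array([10, 100, 1000]), -13.9))
```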

Given that the number of repressors per cell \(R\) is an experimental variable that we can control, we assume that the rate at which the promoter transitions from the transcriptionally inactive state to the repressor-bound state, \(k^{(r)}_{\text{on}}\), is given by the concentration of repressors \([R]\) times a diffusion-limited on rate \(k_o\). For the diffusion-limited constant \(k_o\), we use the value reported by Jones et al.  [4]. With this in hand, we can rewrite Eq. \(\ref{eq:fold_change_cme}\) as \[ \text{fold-change} = \left( 1 + \frac{k_o [R]}{k^{(r)}_{\text{off}}} \frac{k^{(p)}_{\text{off}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}} \right)^{-1}. \label{eq:fc_kinetic} \]

We note that both Eq. \(\ref{eq:fc_thermo}\) and Eq. \(\ref{eq:fc_kinetic}\) have the same functional form. Therefore if both languages predict the same output for the mean gene expression level, it must be true that \[ \frac{k_o [R]}{k^{(r)}_{\text{off}}} \frac{k^{(p)}_{\text{off}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}} = \frac{R}{N_{NS}} e^{-\beta\Delta\varepsilon_r}. \] Solving for \(k^{(r)}_{\text{off}}\) gives \[ k^{(r)}_{\text{off}} = \frac{k_o [R] N_{NS} e^{\beta\Delta\varepsilon_r}}{R} \frac{k^{(p)}_{\text{off}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}. \label{eq:kroff_complete} \]

Since the reported value of \(k_o\) is given in units of nM\(^{-1}\)s\(^{-1}\), for the units to cancel properly the repressor concentration must be given in nM rather than as an absolute count. The repressor concentration is given by \[ [R] = \frac{R}{V_{cell}}\cdot \frac{1}{N_A}, \] where \(R\) is the absolute repressor copy number per cell, \(V_{cell}\) is the cell volume, and \(N_A\) is Avogadro’s number. The E. coli cell volume is 2.1 fL  [21], and Avogadro’s number is \(6.022 \times 10^{23}\). If we further include the conversion factor to turn M into nM, we find that \[ [R] = \frac{R}{2.1 \times 10^{-15}\text{ L}} \cdot \frac{1}{6.022 \times 10^{23}} \cdot \frac{10^9 \text{ nmol}}{1 \text{ mol}} \approx 0.8 \times R. \] Using this, we simplify Eq. \(\ref{eq:kroff_complete}\) as \[ k^{(r)}_{\text{off}} \approx 0.8 \cdot k_o \cdot N_{NS} e^{\beta\Delta\varepsilon_r} \cdot \frac{k^{(p)}_{\text{off}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}. \label{eq:kroff} \] What Eq. \(\ref{eq:kroff}\) shows is the direct relationship that must be satisfied if the equilibrium model is to be consistent with the non-equilibrium kinetic picture. Tbl. 1 summarizes the values obtained for the three operator sequences used throughout this work. To compute these numbers, the number of non-specific binding sites \(N_{NS}\) was taken to be \(4.6 \times 10^6\) bp, i.e., the size of the E. coli K12 genome.

Table 1: Binding sites and corresponding parameters.
Operator \(\Delta\varepsilon_r\; (k_BT)\) \(k^{(r)}_{\text{off}}\; (s^{-1})\)
O1 -15.3 0.002
O2 -13.9 0.008
O3 -9.7 0.55
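
The entries of Tbl. 1 follow directly from Eq. \(\ref{eq:kroff}\); a compact sketch of this computation (ours) is shown below, with the diffusion-limited on rate \(k_o\) left as an input to be taken from  [4] and the unregulated rates taken from the inference above.

```python
import numpy as np

def repressor_off_rate(eps_r, k_o, kon_p, koff_p, NNS=4.6e6):
    """Repressor off rate from Eq. (kroff).

    eps_r         : repressor-DNA binding energy in kBT (e.g. -15.3 for O1)
    k_o           : diffusion-limited on rate in 1/(nM s), taken from Jones et al. [4]
    kon_p, koff_p : unregulated promoter rates in 1/s
    The prefactor 0.8 converts one molecule per cell into ~0.8 nM (2.1 fL cell volume).
    """
    return 0.8 * k_o * NNS * np.exp(eps_r) * koff_p / (kon_p + koff_p)

# Example call (k_o left symbolic here; plugging in the value from [4] together with
# the unregulated rates inferred above should approximately reproduce Table 1):
# repressor_off_rate(-15.3, k_o, kon_p=0.03, koff_p=0.7)
```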

In vivo measurements of the Lac repressor off rate have been made with single-molecule resolution  [22]. The authors report a mean residence time of \(5.3 \pm 0.2\) minutes for the repressor on an O1 operator. The corresponding rate is \(k^{(r)}_{\text{off}} \approx 0.003\; s^{-1}\), a value very similar to the one we inferred from our model. In this same reference, the authors determined that, on average, the repressor takes \(30.9 \pm 0.5\) seconds to bind to the operator  [22]. Given the kinetic model presented in Fig. ¿fig:ch3_fig02?(A), this time can be converted to the probability of the promoter not being in the repressor-bound state, \(P_{\text{not }R}\). This is computed as \[ P_{\text{not }R} = {\tau_{\text{not }R} \over \tau_{\text{not }R} + \tau_{R}}, \] where \(\tau_{\text{not }R}\) is the average time that the repressor does not occupy the operator, and \(\tau_{R}\) is the average time that the repressor spends bound to the operator. Substituting the numbers from  [22] gives \(P_{\text{not }R} \approx 0.088\). From our model, we can compute the zeroth moment \(\left\langle{m^0 p^0}\right\rangle\) for each of the three promoter states. This moment is equivalent to the probability of being in each of the promoter states. Upon substitution of our inferred rate parameters, we can compute \(P_{\text{not }R}\) as \[ P_{\text{not }R} = 1 - P_R \approx 0.046, \] where \(P_R\) is the probability of the promoter being bound by the repressor. The value we obtain is within a factor of two of the one reported in  [22].
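
For reference, a small sketch (ours) of both calculations is given below; the repressor on rate used in the model part is a hypothetical value, since the actual number depends on the repressor copy number of the strain measured in  [22].

```python
import numpy as np

# From the single-molecule measurements in [22]: residence and search times on O1.
tau_R = 5.3 * 60       # s, mean time bound to the operator
tau_notR = 30.9        # s, mean time to (re)bind the operator
p_not_R_measured = tau_notR / (tau_notR + tau_R)   # ~0.088

# From the three-state model: steady state of the promoter transition matrix K.
# kon_r = k_o * [R] depends on the repressor copy number; the value here is hypothetical.
kon_p, koff_p = 0.03, 0.7
kon_r, koff_r = 0.04, 0.002
K = np.array([[-koff_p, kon_p, 0.0],
              [koff_p, -kon_p - kon_r, koff_r],
              [0.0, kon_r, -koff_r]])
# The steady state is the null vector of K, normalized to sum to one.
eigvals, eigvecs = np.linalg.eig(K)
p_ss = np.real(eigvecs[:, np.argmin(np.abs(eigvals))])
p_ss /= p_ss.sum()
p_not_R_model = 1.0 - p_ss[2]   # states ordered (A, I, R)
print(p_not_R_measured, p_not_R_model)
```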

Computing Moments from the Master Equation

This section will compute the moment equations for the distribution \(P(m, p)\). Without loss of generality, here, we will focus on the three-state regulated promoter. The computation of the two-state promoter’s moments follows the same procedure, changing only the matrices’ definition in the master equation.

Computing Moments of a Distribution

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook.)

To compute any moment of our chemical master equation, let us define a vector \[ \left\langle \mathbf{m^x p^y} \right\rangle \equiv (\left\langle m^x p^y \right\rangle_A, \left\langle m^x p^y \right\rangle_I, \left\langle m^x p^y \right\rangle_R)^T, \] where \(\left\langle m^x p^y \right\rangle_S\) is the expected value of \(m^x p^y\) in state \(S \in \{A, I, R\}\) with \(x, y \in \mathbb{N}\). In other words, just as we defined the vector \(\mathbf{P}(m, p)\), here we define a vector to collect the expected value of each of the promoter states. By definition, these moments \(\left\langle m^x p^y \right\rangle_S\) are computed as \[ \left\langle m^x p^y \right\rangle_S \equiv \sum_{m=0}^\infty \sum_{p=0}^\infty m^x p^y P_S(m, p). \label{eq:mom_def} \] To simplify the notation, let \(\sum_x \equiv \sum_{x=0}^\infty\). Since we are working with a system of three ODEs, one for each state, let us define the following operation: \[ \left\langle \mathbf{m^x p^y} \right\rangle = \sum_m \sum_p m^x p^y \mathbf{P}(m, p) \equiv \begin{bmatrix} \sum_m \sum_p m^x p^y P_A(m, p)\\ \sum_m \sum_p m^x p^y P_I(m, p)\\ \sum_m \sum_p m^x p^y P_R(m, p)\\ \end{bmatrix}. \]

With this in hand, we can then apply this sum over \(m\) and \(p\) to Eq. ¿eq:ch3_eq09?. For the left-hand side, we have \[ \sum_m \sum_p m^x p^y \frac{d \mathbf{P}(m, p)}{dt} = \frac{d}{dt}\left[ \sum_m \sum_p m^x p^y \mathbf{P}(m, p) \right], \label{eq:sum_mom} \] where we made use of the linearity property of the derivative to switch the order between the sum and the derivative. Notice that the right-hand side of Eq. \(\ref{eq:sum_mom}\) contains the definition of a moment from Eq. \(\ref{eq:mom_def}\). That means that we can rewrite it as \[ \frac{d}{dt}\left[ \sum_m \sum_p m^x p^y \mathbf{P}(m, p) \right] = \frac{d \mathbf{\left\langle m^x p^y \right\rangle}}{dt}. \]

Distributing the sum on the right-hand side of Eq. ¿eq:ch3_eq09? gives \[ \begin{split} \frac{d \mathbf{\left\langle m^x p^y \right\rangle}}{dt} &= \mathbf{K} \sum_m \sum_p m^x p^y \mathbf{P}(m, p)\\ &- \mathbf{R}_m \sum_m \sum_p m^x p^y \mathbf{P}(m, p) + \mathbf{R}_m \sum_m \sum_p m^x p^y \mathbf{P}(m-1, p)\\ &- \mathbf{\Gamma}_m \sum_m \sum_p (m) m^x p^y \mathbf{P}(m, p) + \mathbf{\Gamma}_m \sum_m \sum_p (m + 1) m^x p^y \mathbf{P}(m + 1, p)\\ &- \mathbf{R}_p \sum_m \sum_p (m) m^x p^y \mathbf{P}(m, p) + \mathbf{R}_p \sum_m \sum_p (m) m^x p^y \mathbf{P}(m, p - 1)\\ &- \mathbf{\Gamma}_p \sum_m \sum_p (p) m^x p^y \mathbf{P}(m, p) + \mathbf{\Gamma}_p \sum_m \sum_p (p + 1) m^x p^y \mathbf{P}(m, p + 1). \end{split} \label{eq:master_sum} \]

Let us look at each term on the right-hand side individually. For the terms in Eq. \(\ref{eq:master_sum}\) involving \(\mathbf{P}(m, p)\), we can again use Eq. \(\ref{eq:mom_def}\) to rewrite them in a more compact form. This means that we can rewrite the state transition term as \[ \mathbf{K} \sum_m \sum_p m^x p^y \mathbf{P}(m, p) = \mathbf{K} \mathbf{\left\langle m^x p^y \right\rangle}. \] The mRNA production term involving \(\mathbf{P}(m, p)\) can be rewritten as \[ \mathbf{R}_m \sum_m \sum_p m^x p^y \mathbf{P}(m, p) = \mathbf{R}_m \mathbf{\left\langle m^x p^y \right\rangle}. \] In the same way, the mRNA degradation term gives \[ \mathbf{\Gamma}_m \sum_m \sum_p (m) m^x p^y \mathbf{P}(m, p) = \mathbf{\Gamma}_m \mathbf{\left\langle{m^{(x + 1)} p^y}\right\rangle}. \] For the protein production and degradation terms involving \(\mathbf{P}(m, p)\), we have \[ \mathbf{R}_p \sum_m \sum_p (m) m^x p^y \mathbf{P}(m, p) = \mathbf{R}_p \mathbf{\left\langle{m^{(x + 1)} p^y}\right\rangle}, \] and \[ \mathbf{\Gamma}_p \sum_m \sum_p (p) m^x p^y \mathbf{P}(m, p) = \mathbf{\Gamma}_p \mathbf{\left\langle{m^x p^{(y + 1)}}\right\rangle}, \] respectively.

For the terms of the sum in Eq. \(\ref{eq:master_sum}\) involving \(\mathbf{P}(m \pm 1, p)\) or \(\mathbf{P}(m, p \pm 1)\), we can reindex the sum to work around this mismatch. To be more specific, let us again look at each term case by case. For the mRNA production term involving \(\mathbf{P}(m-1, p)\) we define \(m' \equiv m - 1\). Using this, we write \[ \mathbf{R}_m \sum_m \sum_p m^x p^y \mathbf{P}(m-1, p) = \mathbf{R}_m \sum_{m' = -1}^\infty \sum_p (m' + 1)^x p^y \mathbf{P}(m', p). \] Since having negative numbers of mRNA or protein does not make physical sense, we have that \(\mathbf{P}(-1, p) = 0\). Therefore we can rewrite the sum starting from 0 rather than from -1, obtaining \[ \mathbf{R}_m \sum_{m' = -1}^\infty \sum_p (m' + 1)^x p^y \mathbf{P}(m', p) = \mathbf{R}_m \sum_{m'=0}^\infty \sum_p (m' + 1)^x p^y \mathbf{P}(m', p). \label{eq:reindex} \] Recall that our distribution \(\mathbf{P}(m, p)\) takes \(m\) and \(p\) as numerical inputs and returns a probability associated with such a molecule count. Nevertheless, \(m\) and \(p\) themselves are dimensionless quantities that serve as indices of how many molecules are in the cell. The distribution is the same whether the variable is called \(m\) or \(m'\); for a specific number, let us say \(m = 5\), or \(m' = 5\), \(\mathbf{P}(5, p)\) will return the same result. This means that the variable name is arbitrary, and the right-hand side of Eq. \(\ref{eq:reindex}\) can be written as \[ \mathbf{R}_m \sum_{m'=0}^\infty \sum_p (m' + 1)^x p^y \mathbf{P}(m', p) = \mathbf{R}_m \mathbf{\left\langle{(m+1)^x p^y}\right\rangle}, \] since the left-hand side corresponds to the definition of a moment.

For the mRNA degradation term involving \(\mathbf{P}(m + 1, p)\), we follow a similar procedure in which we define \(m' = m + 1\) to obtain \[ \mathbf{\Gamma}_m \sum_m \sum_p (m + 1) m^x p^y \mathbf{P}(m + 1, p) = \mathbf{\Gamma}_m \sum_{m' = 1}^\infty \sum_p m' (m' - 1)^x p^y \mathbf{P}(m', p). \] Since the term on the right-hand side of the equation is multiplied by \(m'\), starting the sum over \(m'\) from zero rather than one will not affect the result since this factor will not contribute to the total sum. Nevertheless, this is useful since our definition of a moment from Eq. \(\ref{eq:mom_def}\) requires the sum to start at zero. This means that we can rewrite this term as \[ \mathbf{\Gamma}_m \sum_{m' = 1}^\infty m' \sum_p (m' - 1)^x p^y \mathbf{P}(m', p) = \mathbf{\Gamma}_m \sum_{m' = 0}^\infty m' \sum_p (m' - 1)^x p^y \mathbf{P}(m', p). \] Here again, we can change the arbitrary label \(m'\) back to \(m\), obtaining \[ \mathbf{\Gamma}_m \sum_{m' = 0}^\infty m' \sum_p (m' - 1)^x p^y \mathbf{P}(m', p) = \mathbf{\Gamma}_m \mathbf{\left\langle{m (m - 1)^x p^y}\right\rangle}. \]

The protein production term involving \(\mathbf{P}(m, p - 1)\) can be reindexed by defining \(p' \equiv p - 1\). This gives \[ \mathbf{R}_p \sum_m \sum_p (m) m^x p^y \mathbf{P}(m, p - 1) = \mathbf{R}_p \sum_m \sum_{p'=-1}^\infty m^{(x + 1)} (p' + 1)^y \mathbf{P}(m, p'). \] We again use the fact that negative molecule copy numbers are assigned probability zero to begin the sum from 0 rather than -1, and the arbitrary nature of the label \(p'\), to write \[ \mathbf{R}_p \sum_m \sum_{p'=0}^\infty m^{(x + 1)} (p' + 1)^y \mathbf{P}(m, p') = \mathbf{R}_p \mathbf{\left\langle{m^{(x + 1)} (p + 1)^y}\right\rangle}. \] Finally, we take care of the protein degradation term involving \(\mathbf{P}(m, p + 1)\). As before, we define \(p' = p + 1\) and substitute this to obtain \[ \mathbf{\Gamma}_p \sum_m \sum_p (p + 1) m^x p^y \mathbf{P}(m, p + 1) = \mathbf{\Gamma}_p \sum_m \sum_{p'=1}^\infty (p') m^x (p' - 1)^y \mathbf{P}(m, p'). \] Just as with the mRNA degradation term, having a factor of \(p'\) inside the sum allows us to start the sum over \(p'\) from 0 rather than 1. We can therefore write \[ \mathbf{\Gamma}_p \sum_m \sum_{p'=0}^\infty (p') m^x (p' - 1)^y \mathbf{P}(m, p') = \mathbf{\Gamma}_p \mathbf{\left\langle{m^x p (p - 1)^y}\right\rangle}. \]

Putting all these terms together, we can write the general moment ODE. This is of the form \[ \begin{split} \frac{d\mathbf{\left\langle m^x p^y \right\rangle}}{dt} &= \mathbf{K} \mathbf{\left\langle m^x p^y \right\rangle} \text{ (promoter state transition)}\\ &- \mathbf{R}_m \mathbf{\left\langle m^x p^y \right\rangle} + \mathbf{R}_m \mathbf{\left\langle{(m+1)^x p^y}\right\rangle} \text{ (mRNA production)}\\ &- \mathbf{\Gamma}_m \mathbf{\left\langle{m^{(x + 1)} p^y}\right\rangle} + \mathbf{\Gamma}_m \mathbf{\left\langle{m (m - 1)^x p^y}\right\rangle} \text{ (mRNA degradation)}\\ &- \mathbf{R}_p \mathbf{\left\langle{m^{(x + 1)} p^y}\right\rangle} + \mathbf{R}_p \mathbf{\left\langle{m^{(x + 1)} (p + 1)^y}\right\rangle} \text{ (protein production)}\\ &- \mathbf{\Gamma}_p \mathbf{\left\langle{m^x p^{(y + 1)}}\right\rangle} + \mathbf{\Gamma}_p \mathbf{\left\langle{m^x p (p - 1)^y}\right\rangle} \text{ (protein degradation)}. \end{split} \label{eq:mom_ode} \]
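
As an example of putting Eq. \(\ref{eq:mom_ode}\) to work, the sketch below (ours; all rate values are placeholders) numerically integrates the zeroth and first moments of the three-state promoter.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Placeholder rates (1/s); matrices follow the definitions given earlier in this chapter.
kon_p, koff_p, kon_r, koff_r = 0.03, 0.7, 0.04, 0.002
rm, gm = 1.4, 1 / 180          # mRNA production and degradation
rp, gp = 0.06, 1 / 3600        # protein production and (effective) degradation

K = np.array([[-koff_p, kon_p, 0.0],
              [koff_p, -kon_p - kon_r, koff_r],
              [0.0, kon_r, -koff_r]])
Rm, Gm = np.diag([rm, 0.0, 0.0]), gm * np.eye(3)
Rp, Gp = rp * np.eye(3), gp * np.eye(3)

def moment_rhs(t, y):
    """Eq. (mom_ode) written out for (x, y) = (0,0), (1,0) and (0,1)."""
    p0, m1, p1 = y[:3], y[3:6], y[6:]
    dp0 = K @ p0                          # d<m^0 p^0>/dt: promoter-state probabilities
    dm1 = K @ m1 + Rm @ p0 - Gm @ m1      # d<m>/dt, one entry per promoter state
    dp1 = K @ p1 + Rp @ m1 - Gp @ p1      # d<p>/dt, one entry per promoter state
    return np.concatenate([dp0, dm1, dp1])

y0 = np.concatenate([[0.0, 1.0, 0.0], np.zeros(3), np.zeros(3)])  # start in state I, no molecules
sol = solve_ivp(moment_rhs, (0.0, 6 * 3600.0), y0, method="LSODA")
mean_mRNA = sol.y[3:6, -1].sum()      # <m> = sum over promoter states
mean_protein = sol.y[6:, -1].sum()    # <p> = sum over promoter states
```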

Moment Closure of the Simple-Repression Distribution

A very interesting and useful feature of Eq. \(\ref{eq:mom_ode}\) is that for a given value of \(x\) and \(y\), the moment \(\mathbf{\left\langle m^x p^y \right\rangle}\) is only a function of lower moments. Specifically \(\mathbf{\left\langle m^x p^y \right\rangle}\) is a function of moments \(\mathbf{\left\langle{m^{x'} p^{y'}}\right\rangle}\) that satisfy two conditions:

\[ \begin{split} &1) y' \leq y,\\ &2) x' + y' \leq x + y. \end{split} \label{eq:mom_conditions} \]

To prove this, we rewrite Eq. \(\ref{eq:mom_ode}\) as \[ \begin{split} \frac{d\mathbf{\left\langle m^x p^y \right\rangle}}{dt} &= \mathbf{K} \mathbf{\left\langle m^x p^y \right\rangle}\\ &+ \mathbf{R}_m \mathbf{\left\langle{p^y \left[ (m + 1)^x -m^x \right]}\right\rangle}\\ &+ \mathbf{\Gamma}_m \mathbf{\left\langle{m p^y \left[ (m - 1)^x - m^x \right]}\right\rangle}\\ &+ \mathbf{R}_p \mathbf{\left\langle{m^{(x + 1)} \left[ (p + 1)^y - p^y \right]}\right\rangle}\\ &+ \mathbf{\Gamma}_p \mathbf{\left\langle{m^x p \left[ (p - 1)^y - p^y \right]}\right\rangle}, \end{split} \label{eq:mom_ode_factorized} \] where the factorization is valid given the linearity of expected values. The objective is to find the highest moment for each term once the relevant binomial, such as \((m-1)^x\), is expanded. Take, for example, a simple case in which we want to find the second moment of the mRNA distribution. We then set \(x = 2\) and \(y = 0\). Eq. \(\ref{eq:mom_ode_factorized}\) then becomes \[ \begin{split} \frac{d \mathbf{\left\langle{m^2 p^0}\right\rangle}}{dt} &= \mathbf{K} \mathbf{\left\langle{m^2 p^0}\right\rangle}\\ &+ \mathbf{R}_m \mathbf{\left\langle{p^0 \left[ (m + 1)^2 - m^2 \right]}\right\rangle}\\ &+ \mathbf{\Gamma}_m \mathbf{\left\langle{m p^0 \left[ (m - 1)^2 - m^2 \right]}\right\rangle}\\ &+ \mathbf{R}_p \mathbf{\left\langle{m^{(2 + 1)} \left[ (p + 1)^0 - p^0 \right]}\right\rangle}\\ &+ \mathbf{\Gamma}_p \mathbf{\left\langle{m^2 p \left[ (p - 1)^0 - p^0 \right]}\right\rangle}. \end{split} \] Simplifying this equation gives \[ \frac{d \mathbf{\left\langle{m^2}\right\rangle}}{dt} = \mathbf{K} \mathbf{\left\langle{m^2}\right\rangle} + \mathbf{R}_m \mathbf{\left\langle{\left[ 2m + 1 \right]}\right\rangle} + \mathbf{\Gamma}_m \mathbf{\left\langle{\left[- 2m^2 + m \right]}\right\rangle}. \label{eq:second_mom_mRNA} \]

Eq. \(\ref{eq:second_mom_mRNA}\) satisfies both of our conditions. Since we set \(y\) to be zero, none of the terms depend on any moment that involves the protein number. Therefore \(y' \leq y\) is satisfied. Also, the highest moment in Eq. \(\ref{eq:second_mom_mRNA}\) also satisfies \(x' + y' \leq x + y\) since the second moment of mRNA does not depend on any moment higher than \(\mathbf{\left\langle{m^2}\right\rangle}\). To demonstrate that this is true for any \(x\) and \(y\), we now rewrite Eq. \(\ref{eq:mom_ode_factorized}\), making use of the binomial expansion \[ (z \pm 1)^n = \sum_{k=0}^n {n \choose k} (\pm 1)^{k} z^{n-k}. \] Just as before, let us look at each term individually. For the mRNA production term, we have \[ \mathbf{R}_m \mathbf{\left\langle{p^y \left[ (m + 1)^x -m^x \right]}\right\rangle} = \mathbf{R}_m \mathbf{\left\langle{p^y \left[ \sum_{k=0}^x {x \choose k} m^{x-k} - m^x \right]}\right\rangle}. \] When \(k = 0\), the term inside the sum on the right-hand side cancels with the other \(m^x\) so that we can simplify to \[ \mathbf{R}_m \mathbf{\left\langle{p^y \left[ (m + 1)^x -m^x \right]}\right\rangle} = \mathbf{R}_m \mathbf{\left\langle{p^y \left[ \sum_{k=1}^x {x \choose k} m^{x-k} \right]}\right\rangle}. \] Once the sum is expanded, we can see that the highest moment in this sum is given by \(\mathbf{\left\langle{m^{(x-1)} p^y}\right\rangle}\) which satisfies both of the conditions on Eq. \(\ref{eq:mom_conditions}\).

For the mRNA degradation term, we similarly have \[ \mathbf{\Gamma}_m \mathbf{\left\langle{m p^y \left[ (m - 1)^x - m^x \right]}\right\rangle} = \mathbf{\Gamma}_m \mathbf{\left\langle{m p^y \left[ \sum_{k=0}^x {x \choose k}(-1)^k m^{x-k} - m^x \right]}\right\rangle}. \] Simplifying terms, we obtain \[ \mathbf{\Gamma}_m \mathbf{\left\langle{m p^y \left[ \sum_{k=0}^x {x \choose k}(-1)^k m^{x-k} - m^x \right]}\right\rangle} = \mathbf{\Gamma}_m \mathbf{\left\langle{p^y \left[ \sum_{k=1}^x {x \choose k}(-1)^k m^{x+1-k} \right]}\right\rangle}. \] The largest moment in this case is \(\mathbf{\left\langle{m^x p^y}\right\rangle}\), which again satisfies the conditions on Eq. \(\ref{eq:mom_conditions}\).

The protein production term gives \[ \mathbf{R}_p \mathbf{\left\langle{m^{(x + 1)} \left[ (p + 1)^y - p^y \right]}\right\rangle} = \mathbf{R}_p \mathbf{\left\langle{m^{(x + 1)} \left[ \sum_{k=0}^y {y \choose k} p^{y-k} - p^y \right]}\right\rangle}. \] Upon simplification, we obtain \[ \mathbf{R}_p \mathbf{\left\langle{m^{(x + 1)} \left[ \sum_{k=0}^y {y \choose k} p^{y-k} - p^y \right]}\right\rangle} = \mathbf{R}_p \mathbf{\left\langle{m^{(x + 1)} \left[ \sum_{k=1}^y {y \choose k} p^{y-k} \right]}\right\rangle}. \] Here the largest moment is given by \(\mathbf{\left\langle{m^{x+1} p^{y-1}}\right\rangle}\), which again satisfies both of our conditions. For the last term, the protein degradation term, we have \[ \mathbf{\Gamma}_p \mathbf{\left\langle{m^x p \left[ (p - 1)^y - p^y \right]}\right\rangle} = \mathbf{\Gamma}_p \mathbf{\left\langle{m^x p \left[ \sum_{k=1}^y {y \choose k} (-1)^k p^{y - k} \right]}\right\rangle}. \] The largest moment involved in this term is therefore \(\mathbf{\left\langle{m^x p^y}\right\rangle}\). With this, we show that the four terms involved in our general moment equation depend only on moments that satisfy Eq. \(\ref{eq:mom_conditions}\).

As a reminder, we showed in this section that the kinetic model introduced in Fig. ¿fig:ch3_fig02?(A) has no moment-closure problem. In other words, moments of the joint mRNA and protein distribution can be computed from knowledge of lower moments. This allows us to cleanly integrate the distribution moment dynamics as cells progress through the cell cycle.

Computing Single Promoter Steady-State Moments

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook.)

One of the main factors contributing to cell-to-cell variability in gene expression is the change in gene copy number during the cell cycle as cells replicate their genome before cell division. Our minimal model accounts for this variability by considering the time trajectory of the distribution moments given by Eq. \(\ref{eq:mom_ode_factorized}\). In Sec. 4.4, these predictions are contrasted with those of a kinetic model that does not account for gene copy number changes during the cell cycle.

Suppose we do not account for the change in gene copy number during the cell cycle or the partition of proteins during division. In that case, the dynamics of the moments of the distribution described in this section will reach a steady state. To compute the kinetic model’s steady-state moments with a single gene across the cell cycle, we use the moment closure property of our master equation. By equating Eq. \(\ref{eq:mom_ode_factorized}\) to zero for a given \(x\) and \(y\), we can solve the resulting linear system and obtain a solution for \(\mathbf{\left\langle m^x p^y \right\rangle}\) at steady state as a function of moments \(\mathbf{\left\langle{m^{x'} p^{y'}}\right\rangle}\) that satisfy Eq. \(\ref{eq:mom_conditions}\). Then, by solving for the zero\(^\text{th}\) moment \(\mathbf{\left\langle{m^0 p^0}\right\rangle}\) subject to the constraint that the probabilities of the promoter being in each state must add up to one, we can substitute back all of the solutions in terms of moments \(\mathbf{\left\langle{m^{x'} p^{y'}}\right\rangle}\) with solutions in terms of the rates shown in Fig. ¿fig:ch3_fig02?. In other words, through an iterative process, we can obtain the value of any moment of the distribution. We start by solving for the zero\(^\text{th}\) moment. Since all higher moments depend on lower moments, we can use the solution of the zero\(^\text{th}\) moment to compute the first mRNA moment. This solution is then used for higher moments in a hierarchical iterative process.
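
A minimal numerical sketch of this hierarchical procedure (ours; placeholder rates) that solves for the zeroth moment and then the first mRNA and protein moments at steady state:

```python
import numpy as np

# Illustrative three-state rates (1/s); matrices as defined earlier in this chapter.
kon_p, koff_p, kon_r, koff_r = 0.03, 0.7, 0.04, 0.002
rm, gm, rp, gp = 1.4, 1 / 180, 0.06, 1 / 3600

K = np.array([[-koff_p, kon_p, 0.0],
              [koff_p, -kon_p - kon_r, koff_r],
              [0.0, kon_r, -koff_r]])
Rm, Gm = np.diag([rm, 0.0, 0.0]), gm * np.eye(3)
Rp, Gp = rp * np.eye(3), gp * np.eye(3)

# Zeroth moment: K p0 = 0 with the normalization sum(p0) = 1
# (replace one redundant row of K by the normalization condition).
A = K.copy()
A[-1] = 1.0
p0 = np.linalg.solve(A, np.array([0.0, 0.0, 1.0]))

# First mRNA moment: 0 = K <m> + Rm p0 - Gm <m>  =>  (Gm - K) <m> = Rm p0
m1 = np.linalg.solve(Gm - K, Rm @ p0)
# First protein moment: 0 = K <p> + Rp <m> - Gp <p>  =>  (Gp - K) <p> = Rp <m>
p1 = np.linalg.solve(Gp - K, Rp @ m1)

print(p0.sum(), m1.sum(), p1.sum())   # state probabilities, <m>, <p> summed over states
```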

Accounting for the Variability in Gene Copy Number During the Cell Cycle

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook.)

When growing in rich media, bacteria can double every \(\approx\) 20 minutes. With two replication forks, each traveling at \(\approx\) 1000 bp per second, and a genome of \(\approx\) 5 Mbp for E. coli  [23], a cell would need \(\approx\) 40 minutes to replicate its genome. The apparent paradox of growth rates faster than one division per 40 minutes is solved because cells have multiple replisomes, i.e., molecular machines that replicate the genome running in parallel. Cells can have up to 8 copies of the genome being replicated simultaneously, depending on the growth rate  [14].

This observation implies that the gene copy number varies during the cell cycle. This variation depends on the growth rate and on the relative position of the gene with respect to the replication origin: genes close to the origin spend more time with multiple copies than genes closer to the replication termination site. This change in gene dosage directly affects cell-to-cell variability in gene expression  [4,13].

Numerical Integration of Moment Equations

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook.)

For our specific locus (galK) and a doubling time of \(\approx\) 60 min for our experimental conditions, cells have on average 1.66 copies of the reporter gene during the cell cycle  [4]. This means that cells spend 60% of the time having one copy of the gene and 40% of the time with two copies. To account for this variability in gene copy number across the cell cycle, we numerically integrate the moment equations derived above for a time \(t \in [0, t_s]\) with an mRNA production rate \(r_m\), where \(t_s\) is the time point at which the replication fork reaches our specific locus. For the remaining time before cell division, \(t \in [t_s, t_d]\), which the cell spends with two promoters, we assume that the only parameter that changes is the mRNA production rate, going from \(r_m\) to \(2 r_m\). This simplifying assumption ignores potential changes in the protein translation rate \(r_p\) or changes in the repressor copy number that would be reflected in changes in the repressor on rate \(k^{(r)}_{\text{on}}\).
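As a minimal illustration of this piecewise integration, the sketch below propagates only the mean mRNA and mean protein counts of a promoter that is always transcriptionally active, doubling the mRNA production rate at \(t_s\). The full calculation integrates the complete set of moment equations for the three-state promoter, and all parameter values here are placeholders.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Placeholder parameters (min^-1) and cell-cycle times; illustrative only.
r_m, gamma_m = 15.0, 0.5      # mRNA production / degradation rates
r_p, gamma_p = 8.0, 0.02      # protein production / degradation rates
t_s, t_d = 36.0, 60.0         # locus replication and cell division times

def mean_dynamics(t, y):
    """d<m>/dt and d<p>/dt for a constitutive promoter; the mRNA
    production rate doubles once the locus has been replicated."""
    m, p = y
    rm = r_m if t < t_s else 2 * r_m
    return [rm - gamma_m * m, r_p * m - gamma_p * p]

# Integrate a single cell cycle starting from some initial mean counts.
sol = solve_ivp(mean_dynamics, (0.0, t_d), y0=[10.0, 500.0],
                t_eval=np.linspace(0.0, t_d, 200))
m_t, p_t = sol.y
```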

Computing Distribution Moments After Cell Division

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook.)

We have already solved a general form for the dynamics of the moments of the distribution, i.e., we wrote differential equations for the moments \(\frac{d\left\langle{m^x p^y}\right\rangle}{dt}\). Given that we know all parameters for our model, we can numerically integrate these equations to compute how the distribution moments evolve as cells progress through their cell cycle. Once the cell reaches a time \(t_d\) and divides, the mRNA and proteins that we are interested in undergo a binomial partitioning between the two daughter cells. In other words, each molecule flips a coin to decide which daughter cell to go to. The question then becomes: given the value of a moment \(\left\langle m^x p^y \right\rangle_{t_d}\) just before cell division, what is the value of this moment \(\left\langle m^x p^y \right\rangle_{t_o}\) right after the cell division takes place?

The probability distribution of mRNA and protein after the cell division \(P_{t_o}(m, p)\) must satisfy \[ P_{t_o}(m, p) = \sum_{m'=m}^\infty \sum_{p'=p}^\infty P(m, p \mid m', p') P_{t_d}(m', p'), \label{eq_dist_post_div} \] where we are summing over all the possibilities of having \(m'\) mRNA and \(p'\) proteins before cell division. Note that the sums start at \(m\) and \(p\); this is because for a daughter cell to have these copy numbers after cell division, the mother cell must have had at least that many molecules before division, since we are not assuming that there is any production at the instantaneous cell division time. Since we assume that the partition of mRNA is independent of the partition of protein, the conditional probability \(P(m, p \mid m', p')\) is given by a product of two binomial distributions, one for the mRNA and one for the protein, i.e. \[ P(m, p \mid m', p') = {m' \choose m} \left( \frac{1}{2} \right)^{m'} \cdot {p' \choose p} \left( \frac{1}{2} \right)^{p'}. \label{eq_binom_prod} \] Because of this product of binomial probabilities, we are allowed to extend the sum in Eq. \(\ref{eq_dist_post_div}\) to start at \(m'=0\) and \(p'=0\) as \[ P_{t_o}(m, p) = \sum_{m'=0}^\infty \sum_{p'=0}^\infty P(m, p \mid m', p') P_{t_d}(m', p'), \] since the product of the binomial distributions in Eq. \(\ref{eq_binom_prod}\) is zero for all \(m' < m\) and/or \(p' < p\). Thus, to simplify notation, for the rest of this section we take \(\sum_x \equiv \sum_{x=0}^\infty\).

We can then compute the distribution moments after the cell division \(\left\langle{m^x p^y}\right\rangle_{t_o}\) as \[ \left\langle m^x p^y \right\rangle_{t_o} = \sum_m \sum_p m^x p^y P_{t_o}(m, p), \] for all \(x, y \in \mathbb{N}\). Substituting Eq. \(\ref{eq_dist_post_div}\) results in \[ \left\langle m^x p^y \right\rangle_{t_o} = \sum_m \sum_p m^x p^y \sum_{m'} \sum_{p'} P(m, p \mid m', p') P_{t_d}(m', p'). \] We can rearrange the sums to be \[ \left\langle m^x p^y \right\rangle_{t_o} = \sum_{m'} \sum_{p'} P_{t_d}(m', p') \sum_m \sum_p m^x p^y P(m, p \mid m', p'). \] The fact that Eq. \(\ref{eq_binom_prod}\) is the product of two independent events allows us to rewrite the joint probability \(P(m, p \mid m', p')\) as \[ P(m, p \mid m', p') = P(m \mid m') \cdot P(p \mid p'). \] With this, we can then write the moment \(\left\langle m^x p^y \right\rangle_{t_o}\) as \[ \left\langle m^x p^y \right\rangle_{t_o} = \sum_{m'} \sum_{p'} P_{t_d}(m', p') \sum_m m^x P(m \mid m') \sum_p p^y P(p \mid p'). \] Notice that both terms summing over \(m\) and \(p\) are the conditional expected values, i.e. \[ \sum_z z^x P(z \mid z') \equiv \left\langle{z^x \mid z'}\right\rangle, \; {\text{ for } z\in \{m, p \}}. \] These conditional expected values are the expected values of a binomial random variable \(z \sim \text{Bin}(z', 1/2)\), which can be easily computed, as we will show later in this section. We then rewrite the expected values after the cell division in terms of these moments of a binomial distribution \[ \left\langle m^x p^y \right\rangle_{t_o} = \sum_{m'} \sum_{p'} \left\langle{m^x \mid m'}\right\rangle \left\langle{p^y \mid p'}\right\rangle P_{t_d}(m', p'). \label{eq_general_binom_mom} \]

To see how this general formula for the moments after the cell division works, let us compute the mean protein per cell after the cell division \(\left\langle{p}\right\rangle_{t_o}\). That means setting \(x = 0\) and \(y = 1\). This results in \[ \left\langle{p}\right\rangle_{t_o} = \sum_{m'} \sum_{p'} \left\langle{m^0 \mid m'}\right\rangle \left\langle{p \mid p'}\right\rangle P_{t_d}(m', p'). \] The zeroth moment \(\left\langle{m^0 \mid m'}\right\rangle\) by definition must be one, i.e., \[ \left\langle{m^0 \mid m'}\right\rangle = \sum_m m^0 P(m \mid m') = \sum_m P(m \mid m') = 1, \] since the probability distribution must be normalized. This leaves us then with \[ \left\langle{p}\right\rangle_{t_o} = \sum_{m'} \sum_{p'} P_{t_d}(m', p') \left\langle p \mid p' \right\rangle. \] If we take the sum over \(m'\), we simply compute the marginal probability distribution \(\sum_{m'} P_{t_d}(m', p') = P_{t_d}(p')\); then we have \[ \left\langle p \right\rangle_{t_o} = \sum_{p'} \left\langle p \mid p' \right\rangle P_{t_d}(p'). \] For the particular case of the first moment of the binomial distribution with parameters \(p'\) and \(1/2\), we know that \[ \left\langle p \mid p' \right\rangle = \frac{p'}{2}. \] Therefore the moment after division is equal to \[ \left\langle p \right\rangle_{t_o} = \sum_{p'} \frac{p'}{2} P_{t_d}(p') = \frac{1}{2} \sum_{p'} p' P_{t_d}(p'). \] Notice that this is just 1/2 of the expected value of \(p'\) averaging over the distribution before cell division, i.e. \[ \left\langle p \right\rangle_{t_o} = \frac{\left\langle{p'}\right\rangle_{t_d}}{2}, \] where \(\left\langle{\cdot}\right\rangle_{t_d}\) highlights that the moment is computed prior to the cell division. This result makes perfect sense: the mean protein copy number right after the cell divides is half of the mean protein copy number just before the cell division, exactly as we would expect. So, in principle, to know the first moment of either the mRNA distribution \(\langle m \rangle_{t_o}\) or the protein distribution \(\langle p \rangle_{t_o}\) right after cell division, it suffices to multiply the moments before the cell division \(\langle m \rangle_{t_d}\) or \(\left\langle p \right\rangle_{t_d}\) by 1/2. Let us now explore how this generalizes to any other moment \(\left\langle m^x p^y \right\rangle_{t_o}\).
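A quick numerical sanity check of this halving result, using an arbitrary pre-division protein distribution, might look as follows:

```python
import numpy as np

rng = np.random.default_rng(42)

# Arbitrary pre-division protein counts (any distribution works here).
p_before = rng.poisson(lam=800, size=100_000)

# Each molecule independently goes to one daughter with probability 1/2.
p_after = rng.binomial(n=p_before, p=0.5)

print(p_before.mean() / 2, p_after.mean())  # the two values should agree
```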

Computing the Moments of a Binomial Distribution

The last section’s result depended on us knowing the functional form of the first moment of the binomial distribution. For higher moments, we need some systematic way to compute such moments. Luckily for us, we can do so by using the so-called moment generating function (MGF). The MGF of a random variable \(X\) is defined as \[ M_X(t) = \left\langle{e^{tX}}\right\rangle, \] where \(t\) is a dummy variable. Once we know the MGF, we can obtain any moment of the distribution by simply computing \[ \left\langle{X^n}\right\rangle = \left. \frac{d^n}{dt^n} M_X(t) \right\vert_{t=0}, \label{eq_mgf_def} \] i.e., taking the \(n\)-th derivative of the MGF returns the \(n\)-th moment of the distribution. For the particular case of the binomial distribution \(X \sim \text{Bin}(N, q)\), it can be shown that the MGF is of the form \[ M_X(t) = \left[ (1 - q) + qe^{t} \right]^N. \] As an example, let us compute the first moment of this binomially distributed variable. For this, the first derivative of the MGF results in \[ \frac{d M_X(t)}{dt} = N [(1 - q) + qe^t]^{N - 1} q e^t. \] We just need to follow Eq. \(\ref{eq_mgf_def}\) and set \(t = 0\) to obtain the first moment \[ \left. \frac{d M_X(t)}{dt} \right\vert_{t=0} = N q, \label{eq_mgf_mean} \] which is precisely the expected value of a binomially distributed random variable.
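The MGF derivatives are straightforward to compute symbolically. As a quick check, the short sketch below recovers the first two moments of a binomial random variable this way.

```python
import sympy

N, q, t = sympy.symbols("N q t", positive=True)

# MGF of a Binomial(N, q) random variable.
mgf = ((1 - q) + q * sympy.exp(t)) ** N

# n-th moment = n-th derivative of the MGF evaluated at t = 0.
first_moment = sympy.diff(mgf, t, 1).subs(t, 0)                 # N*q
second_moment = sympy.expand(sympy.diff(mgf, t, 2).subs(t, 0))  # N*q + N*(N-1)*q**2

print(sympy.simplify(first_moment), sympy.simplify(second_moment))
```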

So, according to Eq. \(\ref{eq_general_binom_mom}\), to compute any moment \(\left\langle{m^x p^y}\right\rangle\) after cell division, we can take the \(x\)-th and \(y\)-th derivatives of the binomial MGF to obtain \(\left\langle{m^x \mid m'}\right\rangle\) and \(\left\langle{p^y \mid p'}\right\rangle\), respectively, and then average the result over the distribution before division. Let us compute the specific case of the moment \(\left\langle{m p}\right\rangle\) to illustrate the procedure. The moment after cell division \(\left\langle{mp}\right\rangle_{t_o}\) is of the form \[ \left\langle{mp}\right\rangle_{t_o} = \sum_{m'} \sum_{p'} \left\langle{m \mid m'}\right\rangle \left\langle p \mid p' \right\rangle P_{t_d}(m', p'). \] The product \(\left\langle{m \mid m'}\right\rangle \left\langle p \mid p' \right\rangle\) is then \[ \left\langle{m \mid m'}\right\rangle \left\langle p \mid p' \right\rangle = \frac{m'}{2} \cdot \frac{p'}{2}, \] where we used the result in Eq. \(\ref{eq_mgf_mean}\), setting \(N = m'\) and \(N = p'\), respectively, and \(q = 1/2\). Substituting this result into the moment gives \[ \left\langle{mp}\right\rangle_{t_o} = \sum_{m'} \sum_{p'} \frac{m' p'}{4} P_{t_d}(m', p') = \frac{\left\langle{m' p'}\right\rangle_{t_d}}{4}. \] Therefore, to compute the moment after cell division \(\left\langle{mp}\right\rangle_{t_o}\), we simply divide the corresponding moment before the cell division by 4.

Not all moments after cell division depend only on the equivalent moment before cell division. For example, if we compute the third moment of the protein distribution \(\left\langle{p^3}\right\rangle_{t_o}\), we find \[ \left\langle{p^3}\right\rangle_{t_o} = \frac{\left\langle{p^3}\right\rangle_{t_d}}{8} + \frac{3 \left\langle{p^2}\right\rangle_{t_d}}{8}. \] For this particular case, the third moment of the protein distribution depends on the third moment and the second moment before the cell division. In general, all moments after cell division \(\left\langle m^x p^y \right\rangle_{t_o}\) depend linearly on moments before cell division. Furthermore, there is “moment closure” for this specific case in the sense that all moments after cell division depend on lower moments before cell division. To generalize these results to all the moments computed in this work, let us then define a vector that collects all moments before the cell division up to the \(\left\langle m^x p^y \right\rangle_{t_d}\) moment, i.e. \[ \mathbf{\left\langle m^x p^y \right\rangle}_{t_d} = \left( \left\langle{m^0 p^0}\right\rangle_{t_d}, \left\langle{m^1}\right\rangle_{t_d}, \ldots , \left\langle m^x p^y \right\rangle_{t_d} \right). \] Then any moment after cell division \(\left\langle{m^{x'} p^{y'}}\right\rangle_{t_o}\) for \(x' \leq x\) and \(y' \leq y\) can be computed as \[ \left\langle{m^{x'} p^{y'}}\right\rangle_{t_o} = \mathbf{z}_{x'y'} \cdot \mathbf{\left\langle m^x p^y \right\rangle}_{t_d}, \] where we define the vector \(\mathbf{z}_{x'y'}\) as the vector containing all the coefficients that we obtain from the product of the two binomial distributions. For example, for the case of the third protein moment \(\left\langle{p^3}\right\rangle_{t_o}\), the vector \(\mathbf{z}_{x'y'}\) would have zeros for all entries except for the entries corresponding to \(\left\langle{p^2}\right\rangle_{t_d}\) and \(\left\langle{p^3}\right\rangle_{t_d}\), where it would have \(3/8\) and \(1/8\), respectively.

If we then want to compute all the moments after the cell division up to \(\left\langle m^x p^y \right\rangle_{t_o}\), let us define an equivalent vector \[ \mathbf{\left\langle m^x p^y \right\rangle}_{t_o} = \left( \left\langle{m^0 p^0}\right\rangle_{t_o}, \left\langle{m^1}\right\rangle_{t_o}, \ldots , \left\langle m^x p^y \right\rangle_{t_o} \right). \] Then we need to build a square matrix \(\mathbf{Z}\) such that each row of the matrix contains the corresponding vector \(\mathbf{z}_{x' y'}\) for each of the moments. Having this matrix, we would simply compute the moments after the cell division as \[ \mathbf{\left\langle{m^x p^y}\right\rangle}_{t_o} = \mathbf{Z} \cdot \mathbf{\left\langle{m^x p^y}\right\rangle}_{t_d}. \] In other words, the matrix \(\mathbf{Z}\) contains all the coefficients that we need to multiply by the moments before the cell division to obtain the moments after cell division. The matrix \(\mathbf{Z}\) was generated automatically using Python’s symbolic mathematics library sympy  [24].
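A sketch of how such a matrix can be assembled with sympy is shown below. The list of moments and their ordering are chosen here purely for illustration; the last row reproduces the \(3/8\) and \(1/8\) coefficients of the \(\left\langle{p^3}\right\rangle\) example quoted above.

```python
import sympy

mp_, pp_, t = sympy.symbols("m_prime p_prime t", positive=True)

def conditional_moment(n, N):
    """<z^n | z'> for z ~ Binomial(N=z', q=1/2), computed from the MGF."""
    q = sympy.Rational(1, 2)
    mgf = ((1 - q) + q * sympy.exp(t)) ** N
    return sympy.expand(sympy.diff(mgf, t, n).subs(t, 0))

# Moments <m^x p^y> collected in this (illustrative) order.
exponents = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2), (0, 3)]

Z = sympy.zeros(len(exponents), len(exponents))
for row, (x, y) in enumerate(exponents):
    # <m^x | m'><p^y | p'> is a polynomial in m' and p'; its coefficients
    # are the entries of the corresponding row of Z.
    poly = sympy.Poly(conditional_moment(x, mp_) * conditional_moment(y, pp_),
                      mp_, pp_)
    for col, (a, b) in enumerate(exponents):
        Z[row, col] = poly.coeff_monomial(mp_**a * pp_**b)

# The row for <p^3> contains the coefficients 3/8 and 1/8 quoted above.
print(Z[len(exponents) - 1, :])
```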

Fig. 8 (adapted from Fig. ¿fig:ch3_fig03?(B)) shows how the first moment of both mRNA and protein changes over several cell cycles. The mRNA count quickly relaxes to the steady state corresponding to the parameters for one and for two promoter copies. This is expected since the parameters for mRNA production were determined under this assumption in the first place. We note that there is no apparent delay before reaching the steady state of the mean mRNA count after the cell divides. This is because the mean mRNA count for the two-promoter state is precisely twice the expected mRNA count for the single-promoter state (see Sec. 5.1). Therefore, once the mean mRNA count is halved after the cell division, it is already at the steady-state value for the single-promoter case. On the other hand, given that the degradation rate determines the relaxation time to steady state, the mean protein count does not reach its corresponding steady-state value for either promoter copy number state. Interestingly, once a couple of cell cycles have passed, the first moment follows a repetitive trajectory from cycle to cycle. We have observed this experimentally by tracking cells as they grow under the microscope. Comparing cells at the beginning of the cell cycle with the daughter cells that appear after cell division showed that, on average, all cells have the same amount of protein at the start of the cell cycle (see Fig. 18 of  [25]), suggesting that this dynamical steady state takes place in vivo.

Figure 8: First and second moment dynamics over the cell cycle. Mean \pm standard deviation mRNA (upper panel) and mean \pm standard deviation protein copy number (lower panel) as the cell cycle progresses. The dark shaded region delimits the fraction of the cell cycle that cells spend with a single copy of the promoter. The light-shaded region delimits the fraction of the cell cycle that cells spend with two copies of the promoter. For a 100 min doubling time at the galK locus, cells spend 60% of the time with one copy of the promoter and the rest with two copies. The Python code (ch5_fig08.py) used to generate this figure can be found on the original paper’s GitHub repository.

When measuring gene expression levels experimentally from an asynchronous culture, cells are sampled from any time point across their cell cycles. This means that the moments determined experimentally correspond to an average over the cell cycle. In the following section, we discuss how to account for the fact that cells are not uniformly distributed across the cell cycle to compute these averages.

Exponentially Distributed Ages

As mentioned in Sec. 5.2, cells in exponential growth have exponentially distributed ages across the cell cycle, with more young cells than old ones. Specifically, the probability of a cell being at any point in the cell cycle is given by  [15] \[ P(a) = (\ln 2) \cdot 2^{1 - a}, \label{seq_age_prob} \] where \(a \in [0, 1]\) is the stage of the cell cycle, with \(a = 0\) being the start of the cycle and \(a = 1\) being the cell division. In Sec. 5.10, we reproduce this derivation. It is a surprising result, but it can be intuitively thought of as follows: if the culture is growing exponentially, the number of cells is increasing at all times. That means, for example, that if in a time interval \(\Delta t\) \(N\) “old” cells divided, they produced \(2N\) “young” cells. So at any point, there are always more young cells than old ones.

Our numerical integration of the moment equations gave us the time evolution of the moments as cells progress through the cell cycle. Since experimentally we sample asynchronous cells that follow Eq. \(\ref{seq_age_prob}\), each time point along the moment dynamics must be weighted by the probability of having sampled a cell at that specific time point of the cell cycle. Without loss of generality, let us focus on the first mRNA moment \(\left\langle{m(t)}\right\rangle\) (the same applies to all other moments). As mentioned before, to calculate the first moment across the entire cell cycle, we must weigh each time point by the corresponding probability that a cell is found at such a point of its cell cycle. This translates to computing the integral \[ \langle m \rangle_c = \int_{\text{beginning cell cycle}}^{\text{end cell cycle}} \left\langle{m(t)}\right\rangle P(t) dt, \] where \(\langle m \rangle_c\) is the mean mRNA copy number averaged over the entire cell cycle trajectory, and \(P(t)\) is the probability of a cell being at a time \(t\) of its cell cycle.

If we set the time in units of the cell cycle length, we can use Eq. \(\ref{seq_age_prob}\) and compute instead \[ \langle m \rangle = \int_0^1 \left\langle{m(a)}\right\rangle P(a) da, \label{seq_moment_avg} \] where \(P(a)\) is given by Eq. \(\ref{seq_age_prob}\).

What Eq. \(\ref{seq_moment_avg}\) implies is that to compute the first moment (or any moment of the distribution), we must weigh each point in the moment dynamics by the corresponding probability of a cell being at that point along its cell cycle. That is why, when computing a moment, we take the time trajectory over a single cell cycle, such as the ones shown in Fig. 8, and compute the average using Eq. \(\ref{seq_age_prob}\) to weigh each time point. We perform this integral numerically for all moments using Simpson’s rule.
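This averaging step takes only a few lines. The sketch below assumes a moment trajectory already evaluated on a grid of cell-cycle stages; the trajectory used here is a placeholder, and in practice it comes from the numerical integration of the moment equations over one cell cycle.

```python
import numpy as np
from scipy.integrate import simpson

# Grid of cell-cycle stages a in [0, 1] and a placeholder moment trajectory <m(a)>.
a = np.linspace(0.0, 1.0, 500)
m_of_a = 15.0 + 5.0 * a

# Probability of sampling a cell at stage a of its cell cycle.
p_age = np.log(2) * 2.0 ** (1.0 - a)

# Cell-cycle-averaged first moment (the integral is done with Simpson's rule).
m_avg = simpson(m_of_a * p_age, x=a)
print(m_avg)
```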

Reproducing the Equilibrium Picture

Given the large variability of the first moments depicted in Fig. 8, it is worth considering why a simplistic equilibrium picture has proven so successful in predicting the mean expression level under diverse conditions  [5–7,26]. This section compares the simple repression thermodynamic model with this dynamical picture of the cell cycle. But before diving into this comparison, it is worth recapping the assumptions that go into the equilibrium model.

Steady-State Under the Thermodynamic Model

Within the thermodynamic model, we can only describe the dynamics of the first moment  [18]. This is because these models are based on the probability distribution of the promoter microstates rather than the distribution over the mRNA and protein counts. Let us focus only on the first mRNA moment \(\langle m \rangle\); the same principles apply if we consider the first protein moment. We can write a dynamical system of the form \[ \frac{d \langle m \rangle}{dt} = r_m \cdot p_{\text{bound}} - \gamma _m \langle m \rangle, \] where \(r_m\) and \(\gamma _m\) are the mRNA production and degradation rates, respectively, and \(p_{\text{bound}}\) is the probability of finding the RNAP bound to the promoter  [10]. This dynamical system has a single stable fixed point that we can find by computing the steady state. When we solve for the mean mRNA copy number at steady state \(\langle m \rangle_{ss}\), we find \[ \langle m \rangle_{ss} = \frac{r_m}{\gamma _m} p_{\text{bound}}. \]

Since the only effect that we assume the repressor has on the promoter’s regulation is the exclusion of the RNAP from binding to the promoter, only \(p_{\text{bound}}\) depends on the repressor copy number \(R\). Therefore, when computing the fold-change in gene expression, we are left with \[ \text{fold-change} = \frac{\left\langle{m (R \neq 0)}\right\rangle_{ss}}{ \left\langle{m (R = 0)}\right\rangle_{ss}} = \frac{p_{\text{bound}} (R \neq 0)}{p_{\text{bound}} (R = 0)}. \] As derived in  [5], this can be written in the language of equilibrium statistical mechanics as \[ \text{fold-change} = \left(1 + \frac{R}{N_{NS}}e^{-\beta \Delta\varepsilon_r} \right)^{-1}, \label{seq_fold_change_thermo} \] where \(\beta \equiv (k_BT)^{-1}\), \(\Delta\varepsilon_r\) is the repressor-DNA binding energy, and \(N_{NS}\) is the number of non-specific binding sites where the repressor can bind.
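For reference, Eq. \(\ref{seq_fold_change_thermo}\) translates into a one-line function. In the sketch below, the repressor copy numbers are placeholders and \(N_{NS}\) is assumed to be the length of the E. coli genome, the value commonly used for this model.

```python
import numpy as np

def fold_change(R, eps_r, N_ns=4.6e6):
    """Equilibrium fold-change of Eq. (seq_fold_change_thermo).
    R: repressor copy number; eps_r: repressor-DNA binding energy in k_BT units;
    N_ns: number of non-specific binding sites (assumed: E. coli genome length)."""
    return 1.0 / (1.0 + (R / N_ns) * np.exp(-eps_r))

# Illustrative titration of repressor copy number for the O2 operator energy.
repressors = np.logspace(0, 3, 7)
print(fold_change(repressors, eps_r=-13.9))
```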

To arrive at Eq. \(\ref{seq_fold_change_thermo}\), we ignore the physiological changes that occur during the cell cycle; one of the most important being the variability in gene copy number that we are exploring in this section. It is, therefore, worth thinking about whether or not the dynamical picture exemplified in Fig. 8 can be reconciled with the predictions made by Eq. \(\ref{seq_fold_change_thermo}\) both at the mRNA and protein level.

Fig. 9 compares the predictions of both theoretical frameworks for varying repressor copy numbers and repressor-DNA affinities. The solid lines are directly computed from Eq. \(\ref{seq_fold_change_thermo}\). The hollow triangles and the solid circles represent the fold-change in mRNA and protein, respectively, as computed from the moment dynamics. To compute the fold-change from the kinetic picture, we first numerically integrate the moment dynamics for both the two- and the three-state promoter (see Fig. 8 for the unregulated case), and then average the time series accounting for the probability of cells being sampled at each stage of the cell cycle as defined in Eq. \(\ref{seq_moment_avg}\). The small systematic deviations between both models come partly from the simplifying assumption that the repressor copy number, and therefore the repressor on rate \(k^{(r)}_{\text{on}}\), remains constant during the cell cycle. In principle, the gene producing the repressor protein itself is also subject to the same duplication during the cell cycle, thereby changing the mean repressor copy number between the two stages.

Figure 9: Comparison of the equilibrium and kinetic repressor titration predictions. The equilibrium model (solid lines) and the kinetic model with variation over the cell cycle (solid circles and white triangles) predictions are compared for varying repressor copy numbers and operator binding energy. The equilibrium model is directly computed from Eq. \ref{seq_fold_change_thermo}, while the kinetic model is computed by numerically integrating the moment equations over several cell cycles, and then averaging over the extent of the cell cycle as defined in Eq. \ref{seq_moment_avg}. The Python code (ch5_fig09.py) used to generate this figure can be found on the original paper’s GitHub repository.

For completeness, Fig. 10 compares the kinetic and equilibrium models for the extended model of  [7] in which the inducer concentration enters into the equation. The solid line is directly computed from Eq. 2.5. The hollow triangles and solid points follow the same procedure as for Fig. 9, where the only effect that the inducer is assumed to have on the kinetics is an effective change in the number of active repressors, thereby affecting \(k^{(r)}_{\text{on}}\).

Figure 10: Comparison of the equilibrium and kinetic inducer titration predictions. The equilibrium model (solid lines) and the kinetic model with variation over the cell cycle (solid circles and white triangles) predictions are compared for varying repressor copy numbers and inducer concentrations. The equilibrium model is directly computed from Eq. 5 of reference  [7] with repressor-DNA binding energy \Delta\varepsilon_r = -13.5 \; k_BT, while the kinetic model is computed by numerically integrating the moment dynamics over several cell cycles, and then averaging over the extent of a single cell cycle as defined in Eq. \ref{seq_moment_avg}. The Python code (ch5_fig10.py) used to generate this figure can be found on the original paper’s GitHub repository.

Comparison Between Single- and Multi-Promoter Kinetic Model

After these calculations, it is worth asking whether this change in gene dosage leads to predictions drastically different from those of the simpler kinetic model that ignores the gene copy number variability during the cell cycle. To this end, we systematically computed the average moments for varying repressor copy numbers and repressor-DNA affinities. We then compare these results with the moments obtained from a single-promoter model and its corresponding parameters. The derivation of the steady-state moments of the distribution for the single-promoter model is detailed in Sec. 4.3.

Fig. 9 and Fig. 10 both suggest that, since the dynamic multi-promoter model can reproduce the results of the equilibrium model at the first-moment level, it must then also be able to reproduce the results of the single-promoter model at this level (see Sec. 5.2). The interesting comparison comes with higher moments. A useful metric to consider for gene expression variability is the noise in gene expression  [3]. This quantity, defined as the standard deviation divided by the mean, is a dimensionless metric of how much variability there is with respect to the mean of a distribution. As we will show below, this quantity differs from the commonly used metric known as the Fano factor (variance/mean). For experimentally determined expression levels in arbitrary fluorescent units, the noise is a dimensionless quantity while the Fano factor is not.

Fig. 11 shows the comparison of the predicted protein noise between the single- (dashed lines) and the multi-promoter model (solid lines) for different operators and repressor copy numbers. A striking difference between the two is that the single-promoter model predicts that, as the inducer concentration increases, the standard deviation grows much more slowly than the mean, giving a very small noise. In comparison, the multi-promoter model has a much higher floor for the lowest value of the noise, reflecting the expected result that the variability in gene copy number across the cell cycle should increase the cell-to-cell variability in gene expression  [4,13].

Figure 11: Comparison of the predicted protein noise between a single- and a multi-promoter kinetic model. Comparison of the noise (standard deviation/mean) between a kinetic model that considers a single promoter at all times (dashed line) and the multi-promoter model developed in this section (solid line) for different repressor operators. (A) Operator O1, \Delta\varepsilon_r = -15.3 \; k_BT, (B) O2, \Delta\varepsilon_r = -13.9 \; k_BT, (C) O3, \Delta\varepsilon_r = -9.7 \; k_BT. The Python code (ch5_fig11.py) used to generate this figure can be found on the original paper’s GitHub repository.

Comparison with Experimental Data

Having shown that the kinetic model presented in this section can not only reproduce the results from the equilibrium picture at the mean level (see Fig. 9 and Fig. 10), but also make predictions for the cell-to-cell variability as quantified by the noise (see Fig. 11), we can assess whether or not this model can predict experimental measurements of the noise. For this, we take the single-cell intensity measurements (see Methods) to compute the noise at the protein level.

This metric differs from the Fano factor since the noise is a dimensionless quantity for arbitrary fluorescence units. To see why, consider that the noise is defined as \[ \text{noise} \equiv \frac{\sqrt{\left\langle p^2 \right\rangle - \left\langle p \right\rangle^2}} {\left\langle p \right\rangle}. \label{seq_noise_protein} \] We assume that the intensity level of a cell \(I\) is linearly proportional to the absolute protein count, i.e., \[ p = \alpha I, \label{seq_calibration_factor} \] where \(\alpha\) is the calibration factor between arbitrary fluorescence units and absolute protein number \(p\). Substituting this definition into Eq. \(\ref{seq_noise_protein}\) gives \[ \text{noise} = \frac{\sqrt{\left\langle{(\alpha I)^2}\right\rangle - \left\langle{\alpha I}\right\rangle^2}}{ \left\langle{\alpha I}\right\rangle}. \]

Since \(\alpha\) is a constant, it can be taken out of the average operator \(\left\langle{\cdot}\right\rangle\), obtaining \[ \text{noise} = \frac{\sqrt{\alpha^2 \left(\left\langle{I^2}\right\rangle - \left\langle{I}\right\rangle^2 \right)}}{ \alpha \left\langle{I}\right\rangle} = \frac{\sqrt{\left(\left\langle{I^2}\right\rangle - \left\langle{I}\right\rangle^2 \right)}}{ \left\langle{I}\right\rangle}. \]

Notice that in Eq. \(\ref{seq_calibration_factor}\), the linear proportionality between intensity and protein count has no intercept. This ignores the autofluorescence that cells without reporter would generate. To account for this, in practice, we compute \[ \text{noise} = \frac{\sqrt{\left(\left\langle{(I - \left\langle {I_\text{auto}}\right\rangle)^2}\right\rangle - \left\langle{I - \left\langle{I_\text{auto}}\right\rangle}\right\rangle^2 \right)}}{ \left\langle{I - \left\langle{I_\text{auto}}\right\rangle}\right\rangle}. \] where \(I\) is the intensity of the strain of interest and \(\left\langle{I_\text{auto}}\right\rangle\) is the mean autofluorescence intensity, obtained from a strain that does not carry the fluorescent reporter gene.
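In practice, this computation amounts to a couple of numpy operations. The sketch below assumes arrays of single-cell intensities for the strain of interest and for an autofluorescence control; the simulated values are placeholders.

```python
import numpy as np

def expression_noise(intensity, intensity_auto):
    """Noise (std/mean) from background-subtracted single-cell intensities."""
    background_subtracted = intensity - intensity_auto.mean()
    return background_subtracted.std() / background_subtracted.mean()

# Hypothetical single-cell intensities in arbitrary fluorescence units.
rng = np.random.default_rng(0)
i_cells = rng.gamma(shape=4.0, scale=300.0, size=500) + 100.0  # reporter strain
i_auto = rng.normal(loc=100.0, scale=10.0, size=500)           # autofluorescence
print(expression_noise(i_cells, i_auto))
```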

Fig. 12 shows the comparison between theoretical predictions and experimental measurements for the unregulated promoter. Since these are unregulated promoters, the strains with different operators should, in principle, have identical expression profiles; we nevertheless split the data by operator to verify that this is indeed the case, since we have found in the past that sequences downstream of the RNAP binding site can affect the expression level of constitutively expressed genes. We can see that both models, the single-promoter (gray dotted line) and the multi-promoter (black dashed line), underestimate the experimental noise to different degrees. The single-promoter model does a worse job predicting the experimental data since it does not account for the differences in gene dosage during the cell cycle. Still, we can see that accounting for this variability takes us to within a factor of two of the experimentally determined noise for these unregulated strains.

Figure 12: Protein noise of the unregulated promoter. Comparison of the experimental noise for different operators with the theoretical predictions for the single-promoter (gray dotted line) and the multi-promoter model (black dashed line). Each datum represents a single date measurement of the corresponding \Delta lacI strain with \geq 300 cells. The points correspond to the median, and the error bars correspond to the 95% confidence interval as determined by 10,000 bootstrap samples. The Python code (ch5_fig12.py) used to generate this figure can be found on the original paper’s GitHub repository.

To further test the model’s predictive power, we compare the predictions for the three-state regulated promoter. Fig. 13 shows the theoretical predictions for the single- and multi-promoter model for varying repressor copy numbers and repressor-DNA binding affinities as a function of the inducer concentration. Again, we can see that our zero-parameter fits systematically underestimate the noise for all strains and all inducer concentrations. We highlight that the \(y\)-axis is shown in a log-scale to emphasize this deviation more, but, as we will show in the next section, our predictions still fall within a factor of two from the experimental data.

Figure 13: Protein noise of the regulated promoter. Comparison of the experimental noise for different operators ((A) O1, \Delta\varepsilon_r = -15.3 \; k_BT, (B) O2, \Delta\varepsilon_r = -13.9 \; k_BT, (C) O3, \Delta\varepsilon_r = -9.7 \; k_BT) with the theoretical predictions for the single-promoter (dashed lines) and the multi-promoter model (solid lines). Points represent the experimental noise as computed from single-cell fluorescence measurements of different E. coli strains under 12 different inducer concentrations. The dotted line indicates the plot in linear rather than logarithmic scale. Each datum represents a single date measurement of the corresponding strain and IPTG concentration with \geq 300 cells. The points correspond to the median, and the error bars correspond to the 95% confidence interval as determined by 10,000 bootstrap samples. White-filled dots are plotted at a different scale for better visualization. The Python code (ch5_fig13.py) used to generate this figure can be found on the original paper’s GitHub repository.

Systematic Deviation of the Noise in Gene Expression

Fig. 12 and Fig. 13 highlight that our model underestimates the cell-to-cell variability as measured by the noise. To further explore this systematic deviation, Fig. 14 shows the theoretical vs. experimental noise both in linear and log scale. As we can see, the data are systematically above the identity line. The data are colored by their corresponding experimental fold-change values. The data with the largest deviations from the identity line also correspond to the data with the largest error bars and the smallest fold-change. This is because measurements with very small fold-changes correspond to intensities very close to the autofluorescence background; minimal changes in such measurements are therefore amplified when computing the noise, given that it is a ratio of standard deviation to mean. In Sec. 4.8, we will explore empirical ways to improve the agreement between our minimal model and the experimental data in order to guide future efforts to improve the model.

Figure 14: Systematic comparison of theoretical vs. experimental noise in gene expression. Theoretical vs. experimental noise both in linear (left) and log (right) scale. The dashed line shows the identity line of slope one and intercept zero. All data are colored by the corresponding experimental fold-change value in gene expression, as indicated by the color bar. Each datum represents a single date measurement of the corresponding strain and IPTG concentration with \geq 300 cells. The points correspond to the median, and the error bars correspond to the 95% confidence interval as determined by 10,000 bootstrap samples. The Python code (ch5_fig14.py) used to generate this figure can be found on the original paper’s GitHub repository.

Maximum Entropy Approximation of Distributions

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook.)

On the one hand, chemical master equations like the one presented here represent a hard mathematical challenge. Peccoud and Ycart derived a closed-form solution for the two-state promoter at the mRNA level  [2]. In an impressive display of mathematical skill, Shahrezaei and Swain were able to derive an approximate solution for the one- (not considered in this work) and two-state promoter master equation at the protein level  [3]. Nevertheless, neither of these solutions gives immediate insight into the distributions, as they involve complicated terms such as confluent hypergeometric functions.

On the other hand, there has been a great deal of work to generate methods that can approximate the solution of these discrete-state Markovian models  [27–31]. In particular, for master equations like the one that concerns us here, whose moments can be easily computed, the moment expansion method provides a simple way to approximate the full joint distribution of mRNA and protein  [31]. This section will explain the principles behind this method and show the implementation for our particular case study.

The MaxEnt Principle

The principle of maximum entropy (MaxEnt), first proposed by E. T. Jaynes in 1957, tackles the question of, given limited information, what is the least biased inference one can make about a particular probability distribution  [32]. In particular, Jaynes used this principle to show the correspondence between statistical mechanics and information theory, demonstrating, for example, that the Boltzmann distribution is the probability distribution that maximizes Shannon’s entropy subject to a constraint that the average energy of the system is fixed.

To illustrate the principle, let us focus on a univariate distribution \(P_X(x)\). The \(n^{\text{th}}\) moment of the distribution for a discrete set of possible values of \(x\) is given by \[ \left\langle{x^n}\right\rangle \equiv \sum_x x^n P_X(x). \label{eq:mom_ref} \]

Now assume that we have knowledge of the first \(m\) moments \(\mathbf{\left\langle{x}\right\rangle}_m = (\left\langle{x}\right\rangle, \left\langle{x^2}\right\rangle, \ldots, \left\langle{x^m}\right\rangle )\). The question is then how we can use this information to build an estimator \(P_H(x \mid \mathbf{\left\langle{x}\right\rangle}_m)\) of the distribution such that \[ \lim_{m \rightarrow \infty} P_H(x \mid \mathbf{\left\langle{x}\right\rangle}_m) \rightarrow P_X(x), \] i.e., that the more moments we add to our approximation, the more the estimator distribution converges to the real distribution.

The MaxEnt principle tells us that our best guess for this estimator is to build it based on maximizing the Shannon entropy, constrained by the information we have about these \(m\) moments. Shannon’s entropy maximization guarantees that we are the least committed to information that we do not possess. The Shannon entropy for a univariate discrete distribution is given by  [33] \[ H(x) \equiv - \sum_x P_X(x) \log P_X(x). \]

For an optimization problem subject to constraints, we make use of the method of Lagrange multipliers. For this, we define the constraint equation \(\mathcal{L}(x)\) as \[ \mathcal{L}(x) \equiv H(x) - \sum_{i=0}^m \left[ \lambda_i \left( \sum_x x^i P_X(x) - \left\langle{x^i}\right\rangle \right) \right], \label{eq:constraint_eq} \] where \(\lambda_i\) is the Lagrange multiplier associated with the \(i^\text{th}\) moment. The inclusion of the zeroth moment is an additional constraint to guarantee the normalization of the resulting distribution. Since \(P_X(x)\) has a finite set of discrete values, when taking the derivative of the constraint equation with respect to \(P_X(x)\), we choose a particular value of \(X = x\); therefore, from the sum over all possible \(x\) values, only a single term survives. With this in mind, we take the derivative of the constraint equation, obtaining \[ \frac{d\mathcal{L}}{d P_X(x)} = -\log P_X(x) - 1 - \sum_{i=0}^m \lambda_i x^i. \]

Equating this derivative to zero and solving for the distribution (that we now start calling \(P_H(x)\), our MaxEnt estimator) gives \[ P_H(x) = \exp \left(- 1 - \sum_{i=0}^m \lambda_i x^i \right) = \frac{1}{\mathcal{Z}} \exp \left( - \sum_{i=1}^m \lambda_i x^i \right), \label{eq:maxEnt} \] where \(\mathcal{Z}\) is the normalization constant that can be obtained by substituting this solution into the normalization constraint. This results in \[ \mathcal{Z} \equiv \exp\left( 1 + \lambda_0 \right) = \sum_x \exp \left( - \sum_{i=1}^m \lambda_i x^i \right). \]

Eq. \(\ref{eq:maxEnt}\) is the general form of the MaxEnt distribution for a univariate distribution. The computational challenge then consists of finding numerical values for the Lagrange multipliers \(\{ \lambda_i \}\) such that \(P_H(x)\) satisfies our constraints. In other words, the Lagrange multipliers weigh the contribution of each term in the exponent such that when computing any of the moments, we recover the value of our constraint. Mathematically what this means is that \(P_H(x)\) must satisfy \[ \sum_x x^n P_H(x) = \sum_x \frac{x^n}{\mathcal{Z}} \exp \left( - \sum_{i=1}^m \lambda_i x^i \right) = \left\langle{x^n}\right\rangle. \]

As an example of applying the MaxEnt principle, let us use the classic problem of a six-faced die. If we are only told that after a large number of die rolls, the mean value of the face is \(\left\langle{x}\right\rangle = 4.5\) (note that a fair die has a mean of \(3.5\)), what would the least biased guess for the distribution look like? The MaxEnt principle tells us that our best guess would be of the form \[ P_H(x) = \frac{1}{\mathcal{Z}} \exp \left( \lambda x \right). \] Using any numerical minimization package, we can easily find the value of the Lagrange multiplier \(\lambda\) that satisfies our constraint. Fig. 15 shows two examples of distributions that satisfy the constraint. Panel (A) shows a distribution consistent with the 4.5 average in which only the faces 4 and 5 have nonzero (and equal) probability. Nevertheless, the information we were given about the die never stated that some of the faces were forbidden; in that sense, this distribution commits to information about the process that we do not possess. Panel (B), by contrast, shows the MaxEnt distribution that satisfies this constraint. Since this distribution maximizes Shannon’s entropy, it is guaranteed to be the least biased distribution given the available information.

Figure 15: Maximum entropy distribution of six-face die. (A) Biased distribution consistent with the constraint \left\langle{x}\right\rangle = 4.5. (B) MaxEnt distribution also consistent with the constraint. The Python code (ch5_fig15.py) used to generate this figure can be found on the original paper’s GitHub repository.
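A minimal numerical solution of this die example, finding the Lagrange multiplier by root-finding on the constraint, might look as follows:

```python
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)
target_mean = 4.5

def constraint(lam):
    """Difference between the MaxEnt mean <x> and the target mean."""
    weights = np.exp(lam * faces)
    prob = weights / weights.sum()
    return prob @ faces - target_mean

# Solve for the Lagrange multiplier and evaluate the MaxEnt distribution.
lam = brentq(constraint, -10.0, 10.0)
p_maxent = np.exp(lam * faces) / np.exp(lam * faces).sum()
print(lam, p_maxent)
```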

The mRNA and Protein Joint Distribution

The MaxEnt principle can easily be extended to multivariate distributions. For our particular case, we are interested in the mRNA and protein joint distribution \(P(m, p)\). The definition of a moment \(\left\langle m^x p^y \right\rangle\) is a natural extension of Eq. \(\ref{eq:mom_ref}\) of the form \[ \left\langle m^x p^y \right\rangle = \sum_m \sum_p m^x p^y P(m, p). \]

As a consequence, the MaxEnt joint distribution \(P_H(m, p)\) is of the form \[ P_H(m, p) = \frac{1}{\mathcal{Z}} \exp \left( - \sum_{(x,y)} \lambda_{(x,y)} m^x p^y \right), \label{eq:maxEnt_joint} \] where \(\lambda_{(x,y)}\) is the Lagrange multiplier associated with the moment \(\left\langle m^x p^y \right\rangle\), and again \(\mathcal{Z}\) is the normalization constant, given by \[ \mathcal{Z} = \sum_m \sum_p \exp \left( - \sum_{(x, y)} \lambda_{(x, y)} m^x p^y \right). \] Note that the sum in the exponent is taken over all available \((x, y)\) pairs that define the moment constraints for the distribution.

The Bretthorst Rescaling Algorithm

Determining the Lagrange multipliers suffers from numerical underflow and overflow problems due to the difference in magnitude between the constraints. This becomes a problem when higher moments are taken into account, since the resulting numerical values of the Lagrange multipliers end up separated by several orders of magnitude. For routines such as Newton-Raphson or other minimization algorithms used to find these Lagrange multipliers, these disparate scales become problematic.

To get around this problem, we implemented a variation of the algorithm due to G. Larry Bretthorst, Jaynes’ last student. With a straightforward argument, we can show that linearly rescaling the constraints, the Lagrange multipliers, and the “rules” for computing each of the moments (i.e., each of the individual products that go into the moment calculation) leads to the same MaxEnt distribution. To see this, let us consider a univariate distribution \(P_X(x)\) that we are trying to reconstruct given the first two moments \(\left\langle{x}\right\rangle\) and \(\left\langle{x^2}\right\rangle\). The MaxEnt distribution can be written as \[ P_H(x) = \frac{1}{\mathcal{Z}} \exp \left(- \lambda_1 x - \lambda_2 x^2 \right) = \frac{1}{\mathcal{Z}} \exp \left(- \lambda_1 x \right) \exp \left( - \lambda_2 x^2 \right). \] We can always rescale the terms in any way and obtain the same result. Assume that, for some reason, we want to rescale the quadratic terms by a factor \(a\). We can define a new Lagrange multiplier \(\lambda_2' \equiv \frac{\lambda_2}{a}\) that compensates for the rescaling of the terms, obtaining \[ P_H(x) = \frac{1}{\mathcal{Z}} \exp \left(- \lambda_1 x \right) \exp \left( - \lambda_2' ax^2 \right). \] Computationally, it might be more efficient to find the numerical value of \(\lambda_2'\) rather than \(\lambda_2\), perhaps because it is of the same order of magnitude as \(\lambda_1\). We can then always multiply \(\lambda_2'\) by \(a\) to recover the original Lagrange multiplier for our quadratic term. This means that we can always rescale the MaxEnt problem to make it numerically more stable, and then rescale it back to obtain the values of the Lagrange multipliers. The key to the Bretthorst algorithm lies in selecting the rescaling factor that makes the numerical inference most efficient.

Bretthorst’s algorithm goes even further, transforming the constraints and the variables to make the constraints orthogonal, which makes the computation much more effective. We now explain the algorithm’s implementation for our joint distribution of interest \(P(m, p)\).

Algorithm Implementation

Let the \(M \times N\) matrix \(\mathbf{A}\) contain all the factors used to compute the moments that serve as constraints, where each entry is of the form \[ A_{ij} = m_i^{x_j} \cdot p_i^{y_j}. \label{eq:maxent_rules} \] In other words, recall that to obtain any moment \(\left\langle m^x p^y \right\rangle\), we compute \[ \left\langle m^x p^y \right\rangle = \sum_m \sum_p m^x p^y P(m, p). \] If we have \(M\) possible \((m, p)\) pairs in our truncated sample space (because we cannot include the sample space up to infinity) \(\{(m, p)_1, (m, p)_2, \ldots, (m, p)_M \}\), and we have \(N\) exponent pairs \((x, y)\) corresponding to the \(N\) moments used to constrain the maximum entropy distribution \(\{(x, y)_1, (x, y)_2, \ldots, (x, y)_N \}\), then matrix \(\mathbf{A}\) contains all the possible \(M \times N\) terms of the form described in Eq. \(\ref{eq:maxent_rules}\). Let also \(\mathbf{v}\) be a vector of length \(N\) containing all the constraints with each entry of the form \[ v_j = \left\langle{m^{x_j} p^{y_j}}\right\rangle, \] i.e., the information that we have about the distribution. That means that the constraint equation \(\mathcal{L}\) to be used for this problem takes the form \[ \mathcal{L} = -\sum_i P_i \ln P_i + \lambda_0 \left( 1 - \sum_i P_i \right) + \sum_{j>0} \lambda_j \left( v_j - \sum_i A_{ij} P_i \right), \] where \(\lambda_0\) is the Lagrange multiplier associated with the normalization constraint and \(\lambda_j\) is the Lagrange multiplier associated with the \(j^\text{th}\) constraint. This constraint equation is equivalent to Eq. \(\ref{eq:constraint_eq}\), but now all the details of how to compute the moments are specified in matrix \(\mathbf{A}\).

With this notation in hand, we now proceed to rescale the problem. The first step consists of rescaling the terms used to compute the entries of the matrix \(\mathbf{A}\). As mentioned before, this is the crucial feature of the Bretthorst algorithm; the particular choice of rescaling factor empirically tends to bring the rescaled Lagrange multipliers to the same order of magnitude. The rescaling takes the form \[ A_{ij}' = \frac{A_{ij}}{G_j}, \] where \(G_j\) serves to rescale the moments, providing numerical stability to the inference problem. Bretthorst proposes an empirical rescaling that satisfies \[ G_j^2 = \sum_i A_{ij}^2, \] or, in terms of our particular problem, \[ G_j^2 = \sum_m \sum_p \left( m^{x_j} p^{y_j} \right)^2. \] What this indicates is that each term \(m_i^{x_j} p_i^{y_j}\) is normalized by the square root of the sum of the squares of all terms of the same form.

Since we rescale the factors involved in computing the constraints, the constraints must also be rescaled simply as \[ v_j' = \left\langle{m^{x_j} p^{y_j}}\right\rangle' = \frac{\left\langle{m^{x_j} p^{y_j}}\right\rangle}{G_j}. \] The Lagrange multipliers must compensate for this rescaling since the probability must add up to the same value at the end of the day. Therefore, we rescale the \(\lambda_j\) terms as \[ \lambda_j' = \lambda_j G_j, \] such that any product \(\lambda_j A_{ij} = \lambda_j' A_{ij}'\). If this empirical value for the rescaling factor makes the rescaled Lagrange multipliers \(\lambda_j'\) be of the same order of magnitude, this by itself would already improve the algorithm convergence. Bretthorst proposes another linear transformation to make the optimization routine even more efficient. For this, we generate orthogonal constraints that make Newton-Raphson and similar algorithms converge faster. The transformation is as follows \[ A_{ik}'' = \sum_j {e}_{jk} A_{ij}', \] for the entries of matrix \(\mathbf{A}\), \[ v_k'' = \sum_j {e}_{jk} v_j', \] for the entries of the constraint vector \(\mathbf{v}\), and finally \[ \lambda_k'' = \sum_j {e}_{jk} \lambda_j', \] for the Lagrange multipliers. Here \({e}_{jk}\) is the \(j^\text{th}\) component of the \(k^\text{th}\) eigenvector of the matrix \(\mathbf{E}\) with entries \[ {E}_{kj} = \sum_i {A}_{ik}' {A}_{ij}'. \] This transformation guarantees that the matrix \(\mathbf{A}''\) has the property \[ \sum_i A_{ij}'' A_{ik}'' = \beta_j \delta_{jk}, \] where \(\beta_j\) is the \(j^\text{th}\) eigenvalue of the matrix \(\mathbf{E}\) and \(\delta_{jk}\) is the Kronecker delta function. This means that, as desired, the constraints are orthogonal to each other, improving the algorithm convergence speed.
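A compact numpy sketch of these two steps, the empirical rescaling and the orthogonalization, is shown below. It assumes that the constraint matrix \(\mathbf{A}\) and the constraint vector \(\mathbf{v}\) have already been computed over the truncated sample space; the random inputs in the check are placeholders.

```python
import numpy as np

def bretthorst_rescale(A, v):
    """Rescale and orthogonalize the MaxEnt constraint matrix and vector.
    A: (M, N) array with entries m_i**x_j * p_i**y_j.
    v: (N,) array of moment constraints <m**x_j p**y_j>."""
    # Empirical rescaling factors G_j such that G_j**2 = sum_i A_ij**2.
    G = np.sqrt(np.sum(A**2, axis=0))
    A_prime = A / G
    v_prime = v / G
    # Orthogonalization: project onto the eigenvectors of E = A'^T A'.
    E = A_prime.T @ A_prime
    beta, e_vec = np.linalg.eigh(E)   # eigenvalues beta_j and eigenvectors
    A_pp = A_prime @ e_vec
    v_pp = e_vec.T @ v_prime
    return A_pp, v_pp, G, e_vec, beta

# Sanity check on random inputs: the transformed columns are orthogonal.
rng = np.random.default_rng(1)
A = rng.random((200, 5))
v = rng.random(5)
A_pp, v_pp, *_ = bretthorst_rescale(A, v)
print(np.allclose(A_pp.T @ A_pp, np.diag(np.sum(A_pp**2, axis=0))))
```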

Predicting Distributions for Simple Repression Constructs

Having explained the theoretical background, the practical difficulties, and the workaround strategy proposed by Bretthorst, we implemented the inference using the moments obtained from averaging over the variability along the cell cycle (see Sec. 5.4). Fig. 16 and Fig. 17 present these inferences at the mRNA and protein levels, respectively, for different values of the repressor-DNA binding energy and repressor copy numbers per cell. From these plots, we can appreciate that even though the mean of each distribution changes as the induction level changes, there is substantial overlap between distributions. As a consequence, at the single-cell level, cells cannot perfectly resolve different inputs.

Figure 16: Maximum entropy mRNA distributions for simple repression constructs. mRNA distributions for different biophysical parameters. From left to right, the repressor-DNA affinity decreases as defined by the three lacI operators O1 (-15.3 \; k_BT), O2 (-13.9 \; k_BT), and O3 (-9.7 \; k_BT). From top to bottom, the mean repressor copy number per cell increases. The curves on each plot represent different IPTG concentrations. Each distribution was fitted using the first three moments of the mRNA distribution. The Python code (ch5_fig16.py) used to generate this figure can be found on the original paper’s GitHub repository.
Figure 17: Maximum entropy protein distributions for simple repression constructs. Protein distributions for different biophysical parameters. From left to right, the repressor-DNA affinity decreases as defined by the three lacI operators O1 (-15.3 \; k_BT), O2 (-13.9 \; k_BT), and O3 (-9.7 \; k_BT). From top to bottom, the mean repressor copy number per cell increases. The curves on each plot represent different IPTG concentrations. Each distribution was fitted using the first six moments of the protein distribution. The Python code (ch5_fig17.py) used to generate this figure can be found on the original paper’s GitHub repository.

Comparison with Experimental Data

Now that we have reconstructed an approximation of the probability distribution \(P(m, p)\), we can compare it with our experimental measurements. However, as detailed in the single-cell microscopy section, measurements are given in arbitrary units of fluorescence. Therefore, we cannot directly compare our predicted protein distributions with these values. To get around this issue, we use the fact that the fold-change in gene expression, which we defined as the ratio of the gene expression level in the presence of the repressor to the expression level of a knockout strain, is a non-dimensional quantity. Therefore, we normalize all of our single-cell measurements by the mean fluorescence value of the \(\Delta lacI\) strain, with the proper background fluorescence subtracted as explained for the noise measurements. In the case of the theoretical predictions of the protein distribution, we also normalize each protein value by the predicted mean protein level \(\left\langle p \right\rangle\), obtaining non-dimensional scales that can be directly compared. Fig. 18 shows the experimental (color curves) and theoretical (dark dashed line) cumulative distribution functions for the three \(\Delta lacI\) strains. As in Fig. 12, we do not expect differences between the operators, but we explicitly plot them separately to ensure that this is the case. We can see right away that, as expected given the model’s limitations in predicting the noise and skewness of the distribution, the model does not accurately capture the data. Our model predicts a narrower distribution compared to what we measured with single-cell microscopy.

Figure 18: Experiment vs. theory comparison for \Delta lacI strain. Example fold-change empirical cumulative distribution functions (ECDF) for strains with no repressors and different operators. The color curves represent single-cell microscopy measurements while the dashed black lines represent the theoretical distributions as reconstructed by the maximum entropy principle. The theoretical distributions were fitted using the first six moments of the protein distribution. The Python code (ch5_fig18.py) used to generate this figure can be found on the original paper’s GitHub repository.

The same narrower prediction applies to the regulated promoters. Fig. 19 shows the theory-experiment comparison of the cumulative distribution functions for different repressor binding sites (different figures), repressor copy numbers (rows), and inducer concentrations (columns). In general, the predictions are systematically narrower compared to the actual experimental data.

Figure 19: Experiment vs. theory comparison for regulated promoters. Example fold-change empirical cumulative distribution functions (ECDF) for regulated strains with the three operators (different colors) as a function of repressor copy numbers (rows) and inducer concentrations (columns). The color curves represent single-cell microscopy measurements, while the dashed black lines represent the theoretical distributions as reconstructed by the maximum entropy principle. The theoretical distributions were fitted using the first six moments of the protein distribution. The Python code (ch5_fig19.py) used to generate this figure can be found on the original paper’s GitHub repository.

Gillespie Simulation of the Master Equation

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook.)

So far, we have generated a way to compute an approximation of the joint distribution of protein and mRNA, \(P(m, p)\), as a function of the moments of the distribution \(\left\langle m^x p^y \right\rangle\). This is an unconventional way of working with the distribution that results from the master equation. A more conventional approach for master equations whose closed-form solutions are unknown or intractable is to use stochastic simulations, commonly known as Gillespie simulations. To benchmark the performance of our approach based on distribution moments and maximum entropy, we implemented the Gillespie algorithm. Our implementation, as detailed in the corresponding Jupyter notebook, makes use of just-in-time compilation via the Python package numba.
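
To give a concrete sense of the approach, the sketch below implements a minimal Gillespie loop for a two-state promoter at the mRNA level. It is not the implementation used in the paper: the rate values are placeholders, and features such as gene copy number changes during the cell cycle and numba compilation are omitted.

```python
import numpy as np

def gillespie_mrna(kon, koff, rm, gm, t_max, rng):
    """Minimal Gillespie simulation of a two-state promoter at the mRNA level.

    Returns arrays of times and mRNA counts. State: 0 = inactive, 1 = active.
    """
    t, state, m = 0.0, 1, 0
    times, counts = [t], [m]
    while t < t_max:
        # Propensities: promoter switching, transcription (active state only),
        # and mRNA degradation
        props = np.array([
            koff if state == 1 else kon,
            rm if state == 1 else 0.0,
            gm * m,
        ])
        total = props.sum()
        # Time to the next reaction is exponentially distributed
        t += rng.exponential(1 / total)
        # Choose which reaction fires in proportion to its propensity
        reaction = rng.choice(3, p=props / total)
        if reaction == 0:
            state = 1 - state
        elif reaction == 1:
            m += 1
        else:
            m -= 1
        times.append(t)
        counts.append(m)
    return np.array(times), np.array(counts)

rng = np.random.default_rng(42)
# Placeholder rates (per minute); the paper's values come from the cited data sets
t, m = gillespie_mrna(kon=2.0, koff=1.0, rm=15.0, gm=0.5, t_max=100.0, rng=rng)
print(m[-5:])
```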

mRNA Distribution with Gillespie Simulations

To confirm that our implementation of the Gillespie simulation was correct, we perform the simulation at the mRNA level, for which the closed-form steady-state distribution is known, as detailed in Sec. 5.2. Fig. 20 shows example trajectories of mRNA counts. Each of these trajectories was computed over several cell cycles, with cell division implemented by drawing a binomially distributed random variable from the last mRNA count before the division event.
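
The binomial partitioning step at division is a single random draw; a minimal sketch is shown below, where the symmetric partitioning probability p = 1/2 is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def divide(m_before, p=0.5, rng=rng):
    # Each of the m_before molecules is inherited by the tracked daughter cell
    # independently with probability p
    return rng.binomial(m_before, p)

print(divide(40))
```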

Figure 20: Stochastic trajectories of mRNA counts. 100 stochastic trajectories generated with the Gillespie algorithm for mRNA counts over time for a two-state unregulated promoter. Cells spend a fraction of the cell cycle with a single copy of the promoter (light brown) and the rest of the cell cycle with two copies (light yellow). When trajectories reach a new cell cycle, the mRNA counts undergo binomial partitioning to simulate the cell division. The Python code (ch5_fig20.py) used to generate this figure can be found on the original paper’s GitHub repository.

To check the implementation of our stochastic algorithm, we generated several of these stochastic trajectories to reconstruct the mRNA steady-state distribution. These reconstructed distributions for a single and a double copy of the promoter can be compared with Eq. \(\ref{eq:two_state_mRNA}\)—the steady-state distribution for the two-state promoter. Fig. 21 shows the excellent agreement between the stochastic simulation and the analytical result, confirming that our implementation of the Gillespie simulation is correct.

Figure 21: Comparison of analytical and simulated mRNA distribution. Solid lines show the steady-state mRNA distributions for one copy (light blue) and two copies of the promoter (dark blue) as defined by Eq. \ref{eq:two_state_mRNA}. Shaded regions represent the corresponding distribution obtained using 2500 stochastic mRNA trajectories and taking the last cell cycle to approximate the distribution. The Python code (ch5_fig21.py) used to generate this figure can be found on the original paper’s GitHub repository.

Protein Distribution with Gillespie Simulations

Having confirmed that our implementation of the Gillespie algorithm that includes the binomial partitioning of molecules reproduces analytical results, we extended the implementation to include protein counts. Fig. 22 shows representative trajectories for both mRNA and protein counts over several cell cycles. Especially for the protein, we can see that it takes several cell cycles for counts to converge to the dynamical steady-state observed with the deterministic moment equations. Once this steady-state is reached, the ensemble of trajectories between cell cycles looks very similar.

Figure 22: Stochastic trajectories of mRNA and protein counts. 2500 protein counts over time for a two-state unregulated promoter. Cells spend a fraction of the cell cycle with a single copy of the promoter (light brown) and the rest of the cell cycle with two copies (light yellow). When trajectories reach a new cell cycle, the molecule counts undergo binomial partitioning to simulate the cell division. The Python code (ch5_fig22.py) used to generate this figure can be found on the original paper’s GitHub repository.

From these trajectories, we can compute the steady-state protein distribution, taking into account the cell-age distribution as detailed in Sec. 5.5. Fig. 23 shows the comparison between this distribution and the one generated using the maximum entropy algorithm. Despite noticeable differences between the distributions, the Gillespie simulation and the maximum entropy results are indistinguishable in terms of the mean, variance, and skewness of the distribution. We remind the reader that the maximum entropy result is an approximation of the distribution that improves as more moments are included as constraints. We therefore claim that the approximation works sufficiently well for our purpose. The enormous advantage of the maximum entropy approach comes from the computation time. For the number of distributions needed for our calculations, the Gillespie algorithm proved to be a very inefficient method given the large sample space. Our maximum entropy approach reduces the computation time by several orders of magnitude, allowing us to extensively explore the parameters of different regulatory models.

Figure 23: Comparison of protein distributions. Comparison of the protein distribution generated with Gillespie stochastic simulations (blue curve) and the maximum entropy approach (orange curve). The upper panel shows the probability mass function. The lower panel compares the cumulative distribution functions. The Python code (ch5_fig23.py) used to generate this figure can be found on the original paper’s GitHub repository.

Computational Determination of the Channel Capacity

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook.)

This section details the computation of the channel capacity of the simple genetic circuit shown in Fig. ¿fig:ch3_fig05?. The channel capacity is defined as the mutual information between input \(c\) and output \(p\) maximized over all possible input distributions \(P(c)\)  [33]. In principle, there is an infinite number of input distributions, so the task of finding \(\hat{P}(c)\), the input distribution at channel capacity, requires an algorithmic approach that guarantees the convergence to this distribution. Tkačik, Callan, and Bialek developed an analytical approximation to find the \(\hat{P}(c)\) distribution  [34]. The validity of their so-called small noise approximation requires the standard deviation of the output distribution \(P(p \mid c)\) to be much smaller than the distribution domain. For our particular case, such a condition is not satisfied given the spread of the inferred protein distributions shown in Fig. ¿fig:ch3_fig04?.

Fortunately, a numerical algorithm can approximate \(\hat{P}(c)\) for discrete distributions. In 1972, Blahut and Arimoto independently developed an algorithm that is mathematically guaranteed to converge to \(\hat{P}(c)\)  [35]. To compute both the theoretical and the experimental channel capacity shown in Fig. ¿fig:ch3_fig05?, we implemented Blahut’s algorithm. In the following section, we detail the definitions needed for the algorithm. Then we describe how to compute the experimental channel capacity when the choice of bins is not obvious, given the intrinsically arbitrary units of microscopy fluorescence measurements.

Blahut’s algorithm

Following  [35], we implemented the algorithm to compute the channel capacity. We define \(\mathbf{p_c}\) to be an array containing the probability of each of the input inducer concentrations (twelve concentrations; see Methods). Each entry \(j\) of the array is then of the form \[ p_c^{(j)} = P(c = c_j), \] with \(j \in \{1, 2, \ldots, 12 \}\). The objective of the algorithm is to find the entries \(p_c^{(j)}\) that maximize the mutual information between inputs and outputs. We also define \(\mathbf{Q}\) to be a \(\vert \mathbf{p_c} \vert\) by \(\vert \mathbf{p_{p \mid c}} \vert\) matrix, where \(\vert \cdot \vert\) specifies the length of the array, and \(\mathbf{p_{p \mid c}}\) is an array containing the probability distribution of an output given a specific value of the input. In other words, the matrix \(\mathbf{Q}\) collects all of the individual output distribution arrays \(\mathbf{p_{p \mid c}}\) into a single object. Then each entry of the matrix \(\mathbf{Q}\) is of the form \[ Q^{(i, j)} = P(p = p_i \mid c = c_j). \]

For the case of the theoretical predictions of the channel capacity (solid lines in Fig. ¿fig:ch3_fig05?), the entries of the matrix \(\mathbf{Q}\) are given by the inferred maximum entropy distributions as shown in Fig. ¿fig:ch3_fig04?. In the next section, we will discuss how to define this matrix for the single-cell fluorescence measurements. Having defined these matrices, we proceed to implement the algorithm shown in Figure 1 of  [35].
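
As a concrete reference, below is a minimal sketch of the Blahut-Arimoto iteration for a column-stochastic \(\mathbf{Q}\) matrix. The stopping tolerance, the uniform initial guess for \(\mathbf{p_c}\), and the toy channel are illustrative choices, not necessarily those of the paper’s implementation.

```python
import numpy as np

def blahut_arimoto(Q, tol=1e-9, max_iter=10_000):
    """Channel capacity (in bits) of a discrete channel with transition matrix Q.

    Q[i, j] = P(output = p_i | input = c_j); each column must sum to 1.
    Returns the capacity and the optimal input distribution over the columns.
    """
    n_out, n_in = Q.shape
    p_c = np.ones(n_in) / n_in          # start from a uniform input distribution
    for _ in range(max_iter):
        # Marginal output distribution for the current input distribution
        p_out = Q @ p_c
        # c_j = exp( sum_i Q_ij log(Q_ij / p_out_i) ), with the convention 0 log 0 = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(Q > 0, np.log(Q / p_out[:, None]), 0.0)
        c = np.exp(np.sum(Q * log_ratio, axis=0))
        # Lower and upper bounds on the capacity; stop when they agree to within tol
        i_lower = np.log(p_c @ c)
        i_upper = np.log(c.max())
        p_c = p_c * c / (p_c @ c)       # Blahut-Arimoto update of the input distribution
        if i_upper - i_lower < tol:
            break
    return i_lower / np.log(2), p_c     # convert from nats to bits

# Toy example: a binary symmetric channel with error probability 0.1
Q = np.array([[0.9, 0.1],
              [0.1, 0.9]])
cap, p_opt = blahut_arimoto(Q)
print(f"capacity ≈ {cap:.3f} bits")     # ≈ 0.531 bits
```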

Channel Capacity from Arbitrary Units of Fluorescence

A difficulty when computing the channel capacity between inputs and outputs from experimental data is that, ideally, we would like to compute \[ C(g; c) \equiv \sup_{P(c)} I(g; c), \] where \(g\) is the gene expression level, and \(c\) is the inducer concentration. But in reality, we are computing \[ C(f(g); c) \equiv \sup_{P(c)} I(f(g); c), \] where \(f(g)\) is a function of gene expression that captures our mapping from YFP copy number to the arbitrary fluorescence value computed from the images taken with the microscope. The data processing inequality, as derived by Shannon himself, tells us that for a Markov chain of the form \(c \rightarrow g \rightarrow f(g)\), it must be true that  [33] \[ I(g; c) \geq I(f(g); c), \] meaning that information can only be lost when mapping from the real relationship between gene expression and inducer concentration to a fluorescence value.

On top of that, given the limited number of samples we have access to when computing the channel capacity, our estimate is biased by this undersampling. The definition of accurate, unbiased estimators of mutual information is still an area of active research. For our purposes, we will use the method described in  [36]. The basic idea of the method is to write the mutual information as a series expansion in terms of inverse powers of the sample size, i.e., \[ I_{\text{biased}} = I_\infty + \frac{a_1}{N} + \frac{a_2}{N^2} + \cdots, \] where \(I_{\text{biased}}\) is the biased estimate of the mutual information as computed from experimental data, \(I_\infty\) is the quantity we would like to estimate (the unbiased mutual information we would obtain with access to an infinite number of experimental samples), and the coefficients \(a_i\) depend on the underlying distribution of the signal and the response. This functional form is an empirical choice to be tested. Intuitively, this choice satisfies the limit that, as the number of samples from the distribution grows, the empirical estimate of the mutual information \(I_{\text{biased}}\) should get closer to the actual value \(I_\infty\).

In principle, for a sufficiently large number of data points, the higher-order terms become negligible, so we can write the mutual information as \[ I_{\text{biased}} \approx I_\infty + \frac{a_1}{N} + \mathcal{O}(N^{-2}). \label{eq:mutual_biased} \] This means that if this choice of functional form is a good approximation, then when we compute the mutual information for varying numbers of samples (by taking subsamples of the experimental data), we expect to find a linear relationship as a function of the inverse of the number of data points. From this linear relationship, the intercept is a bias-corrected estimate of the mutual information. Therefore, we can bootstrap the data by taking different sample sizes and then use the Blahut-Arimoto algorithm we implemented earlier to estimate the biased channel capacity. We can then fit a line and extrapolate to \(1/N = 0\), which corresponds to our unbiased estimate of the channel capacity.
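
A minimal sketch of this extrapolation is shown below. To keep it self-contained, the "bootstrapped" channel capacity estimates are synthetic values generated to follow the assumed \(1/N\) form with illustrative parameters; in practice each point would come from subsampling the data and running the Blahut-Arimoto routine.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic biased estimates: I_biased(N) ≈ I_inf + a1 / N (plus noise),
# mimicking what bootstrapped channel-capacity estimates would look like
I_inf_true, a1 = 1.2, 150.0                      # illustrative values, not fitted quantities
sample_sizes = np.array([300, 600, 900, 1200, 1500, 3000])
I_biased = I_inf_true + a1 / sample_sizes + rng.normal(0, 0.005, sample_sizes.size)

# Linear regression of I_biased against 1/N; the intercept is the
# bias-corrected estimate of the mutual information (the 1/N -> 0 limit)
slope, intercept = np.polyfit(1 / sample_sizes, I_biased, deg=1)
print(f"unbiased estimate ≈ {intercept:.3f} bits (true value {I_inf_true})")
```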

Let us go through each of the steps to illustrate the method. Fig. 24 shows a typical data set for a strain with an O2 binding site (\(\Delta\varepsilon_r = -13.9 \; k_BT\)) and \(R = 260\) repressors per cell. Each of the distributions in arbitrary units is binned into a specified number of bins to build the matrix \(\mathbf{Q}\).

Figure 24: Single-cell fluorescence distributions for different inducer concentrations. Fluorescence distribution histogram (A) and cumulative distribution function (B) for a strain with 260 repressors per cell and a binding site with binding energy \Delta\varepsilon_r = -13.9\; k_BT. The different curves show the single-cell fluorescence distributions under the 12 different IPTG concentrations used throughout this work. The triangles in (A) show the mean of each of the distributions. The Python code (ch5_fig24.py) used to generate this figure can be found on the original paper’s GitHub repository.

Given a specific number of bins used to construct \(\mathbf{Q}\), we subsample a fraction of the data and compute the channel capacity of the resulting matrix using the Blahut-Arimoto algorithm. Fig. 25 shows an example where 50% of the data of each distribution from Fig. 24 was sampled and binned into 100 equal bins. The counts in each of these bins are then normalized and used to build the matrix \(\mathbf{Q}\), which is fed to the Blahut-Arimoto algorithm. We can see that for these 200 bootstrap samples, the channel capacity varies by \(\approx\) 0.1 bits. This is not a large variability; nevertheless, we consider it essential to bootstrap the data multiple times to better estimate the channel capacity.
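
The subsampling and binning step that produces \(\mathbf{Q}\) can be sketched as follows; the synthetic fluorescence samples, the 50% subsample fraction, and the 100 equal-width bins mirror the description above but are otherwise placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder data: simulated "fluorescence" samples for each of 12 IPTG concentrations
samples = {c: rng.gamma(shape=2 + c, scale=100, size=1000) for c in range(12)}

def build_Q(samples, frac=0.5, n_bins=100, rng=rng):
    """Subsample each input's measurements and bin them into a column-stochastic Q."""
    pooled = np.concatenate(list(samples.values()))
    bins = np.linspace(pooled.min(), pooled.max(), n_bins + 1)   # common binning
    cols = []
    for x in samples.values():
        sub = rng.choice(x, size=int(frac * x.size), replace=False)
        counts, _ = np.histogram(sub, bins=bins)
        cols.append(counts / counts.sum())        # P(bin | concentration)
    return np.column_stack(cols)                  # shape (n_bins, n_inputs)

Q = build_Q(samples)
print(Q.shape, Q.sum(axis=0))                     # columns sum to 1
```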

Figure 25: Channel capacity bootstrap for experimental data. The cumulative distribution function of the resulting channel capacity estimates obtained by subsampling 200 times 50% of each distribution shown in Fig. 24, binning it into 100 bins, and feeding the resulting \mathbf{Q} matrix to the Blahut-Arimoto algorithm. The Python code (ch5_fig25.py) used to generate this figure can be found on the original paper’s GitHub repository.

Eq. \(\ref{eq:mutual_biased}\) tells us that if we subsample each of the distributions from Fig. 24 at different fractions and plot the resulting estimates as a function of the inverse sample size, we will find a linear relationship if the expansion of the mutual information is valid. To test this idea, we repeated the bootstrap estimate of Fig. 25, sampling 10%, 20%, and so on until taking 100% of the data. We repeated this for different numbers of bins since, for arbitrary units of fluorescence, we have no a priori way to select the optimal number of bins. Fig. 26 shows the result of these estimates. We can see that the linear relationship proposed in Eq. \(\ref{eq:mutual_biased}\) holds for all numbers of bins selected. We also note that the value of the linear regression intercept varies depending on the number of bins.

Figure 26: Inverse sample size vs. channel capacity. As indicated in Eq. \ref{eq:mutual_biased}, if the channel capacity obtained for different subsample sizes of the data is plotted against the inverse sample size, there should exist a linear relationship between these variables. Here, we perform 15 bootstrap samples of the data from Fig. 24, then we bin these samples using different numbers of bins, and finally perform a linear regression (solid lines) between the bootstrap channel capacity estimates and the inverse sample size. The Python code (ch5_fig26.py) used to generate this figure can be found on the original paper’s GitHub repository.

To address the variability in the estimates of the unbiased channel capacity \(I_\infty\), we again follow the methodology suggested in  [36]. We perform the data subsampling and computation of the channel capacity for a varying number of bins. As a control, we perform the same procedure with shuffled data, where the structure that connects the fluorescence distribution to the inducer concentration input is lost. The expectation is that this control should give a channel capacity of zero if the data is not “over-binned.” Once the number of bins is too high, we expect spurious structure to emerge in the data that would cause the Blahut-Arimoto algorithm to return non-zero channel capacity estimates.

Fig. 27 shows the result of the unbiased channel capacity estimates obtained for the data shown in Fig. 24. For the blue curve, we can distinguish three phases:

1. A rapid increment from 0 bits to about 1.5 bits as the number of bins increases.
2. A flat region between \(\approx\) 50 and 1000 bins.
3. A second rapid increment for a large number of bins.

The randomized data presents only two phases:

1. A flat region where, as expected, no information is being processed, since the structure of the data was lost when the data was shuffled.
2. A region of fast growth of the channel capacity, as the over-binning generates separated peaks in the distribution, making it look like there is structure in the data.

We take the flat region of the experimental data (\(\approx\) 100 bins) to be our best unbiased estimate of the channel capacity from this experimental dataset.

Figure 27: Channel capacity as a function of the number of bins. Unbiased channel capacity estimates we obtained from linear regressions as in Fig. 26. The blue curve shows the estimates obtained from the data shown in Fig. 24. The orange curve is generated from estimates where the same data is shuffled, losing the relationship between fluorescence distributions and inducer concentration. The Python code (ch5_fig27.py) used to generate this figure can be found on the original paper’s GitHub repository.

Assumptions Involved in the Computation of the Channel Capacity

An interesting suggestion by Professor Gašper Tkačik was to dissect the different physical assumptions that went into the construction of the input-output function \(P(p \mid c)\) and their relevance when comparing the theoretical channel capacities with the experimental inferences. In what follows, we describe the relevance of four important aspects that all affect the predictions of the information processing capacity.

(i) Cell Cycle Variability.

We think that the inclusion of gene copy number variability during the cell cycle and of non-Poissonian protein degradation is crucial to our estimation of the input-output functions and the channel capacity. This variability in gene copy number is an additional source of noise that systematically decreases the system’s ability to resolve different inputs. Ignoring the effects that gene copy number variability and protein partitioning at cell division have on the information processing capacity leads to an overestimate of the channel capacity, as shown in Fig. 28. When these noise sources are included in our inferences, we capture the experimental channel capacities with no additional fit parameters.

Figure 28: Comparison of channel capacity predictions for single- and multi-promoter models. Channel capacity for the multi-promoter model (solid lines) vs. the single-promoter steady-state model (dot-dashed lines) as a function of repressor copy numbers for different repressor-DNA binding energies. The single-promoter model assumes Poissonian protein degradation (\gamma _p > 0) and steady-state, while the multi-promoter model accounts for gene copy number variability during the cell cycle and has protein degradation as an effect due to dilution as cells grow and divide. The Python code (ch5_fig28.py) used to generate this figure can be found on the original paper’s GitHub repository.

(ii) Non-Gaussian Noise Distributions.

For the construction of the probability distributions used in Chapter 3 (Fig. ¿fig:ch3_fig04?), we utilized the first six moments of the protein distribution. The maximum entropy formalism tells us that the more constraints we include in the inference, the closer the maximum entropy distribution will be to the real distribution. But a priori there is no way of knowing how many moments should be included to capture the essence of the distribution. In principle, two moments could suffice to describe the entire distribution, as happens with the Gaussian distribution. To compare the effect of including more or fewer constraints on the maximum entropy inference, we constructed maximum entropy distributions using an increasing number of moments, from 2 to 6. We then computed the Kullback-Leibler divergence \(D_{KL}\) of the form \[ D_{KL}(P_6(p \mid c) || P_i(p \mid c)) = \sum_p P_6(p \mid c) \log_2 \frac{P_6(p \mid c)}{P_i(p \mid c)}, \] where \(P_i(p \mid c)\) is the maximum entropy distribution constructed with the first \(i\) moments, \(i \in \{2, 3, 4, 5, 6\}\). Since the Kullback-Leibler divergence \(D_{KL}(P || Q)\) can be interpreted as the amount of information lost by assuming the incorrect distribution \(Q\) when the correct distribution is \(P\), we use this metric to quantify how much information we would have lost by using fewer constraints than the six moments used in Chapter 3.
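
For reference, the Kullback-Leibler divergence between two discrete distributions defined on the same support can be computed in a few lines; the example distributions below are arbitrary.

```python
import numpy as np

def kl_divergence_bits(p, q):
    """Kullback-Leibler divergence D_KL(P || Q) in bits for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                     # terms with p = 0 contribute nothing
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Arbitrary example: a distribution and a slightly perturbed version of it
p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
q = np.array([0.12, 0.18, 0.4, 0.22, 0.08])
print(f"D_KL = {kl_divergence_bits(p, q):.4f} bits")
```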

Fig. 29 shows this comparison for different operators and repressor copy numbers. We can see that using fewer moments as constraints gives effectively the same result, since most values of the Kullback-Leibler divergence are significantly smaller than 0.1 bits. The entropy of these distributions is, in general, \(> 10\) bits, so we would lose less than 1% of the information contained in these distributions by utilizing only two moments as constraints. Therefore, the use of non-Gaussian noise is not an essential feature of our inferences.

Figure 29: Measuring the loss of information by using a different number of constraints. The Kullback-Leibler divergence was computed between the maximum entropy distribution constructed using the first six moments of the distribution and a variable number of moments. The Python code (ch5_fig29.py) used to generate this figure can be found on the original paper’s GitHub repository.

(iii) Multi-State Promoter.

This particular point is something that we are still exploring from a theoretical perspective. We have shown that, to capture the single-molecule mRNA FISH data, a single-state promoter does not suffice: such a model predicts a Poisson distribution at steady state, while the data show super-Poissonian noise. Given the bursty nature of gene expression, we opted to use a two-state promoter to reflect effective transcriptionally “active” and “inactive” states. We are currently exploring alternative formulations of this model to turn it into a single state with a geometrically distributed burst size.

(iv) Optimal vs Log-Flat Distributions.

The reason for using the Blahut-Arimoto algorithm to predict the maximum mutual information between inputs and outputs was to understand the best-case scenario. We show the comparison between theoretical and experimental input-output functions \(P(p \mid c)\) in Fig. 19. Given the good agreement between these distributions, we could compute the mutual information \(I(c; p)\) for any arbitrary input distribution \(P(c)\) and obtain a good agreement with the corresponding experimental mutual information.

The reason we opted to report the mutual information at the channel capacity was to put the results in context. By reporting the upper bound in performance of these genetic circuits, we can start to dissect how different molecular parameters such as repressor-DNA binding affinity or repressor copy number affect the ability of this genetic circuit to extract information from the environmental state.

Empirical Fits to Noise Predictions

(Note: The Python code used for the calculations presented in this section can be found in the following link as an annotated Jupyter notebook.)

In Fig. ¿fig:ch3_fig03?(C), we show that our minimal model has a systematic deviation in its gene expression noise predictions compared to the experimental data. This systematic deviation will need to be addressed in an improved version of the minimal model presented in this work. To guide insights into its origins, in this appendix we explore empirical modifications of the model that improve the agreement between theory and experiment.

Multiplicative Factor for the Noise

The first option we will explore is to modify our noise predictions by a constant multiplicative factor. This means that we assume that the relationship between our minimal model predictions and the experimental noise in gene expression is of the form \[ \text{noise}_{\text{exp}} = \alpha \cdot \text{noise}_{\text{theory}}, \] where \(\alpha\) is a dimensionless constant to be fit from the data. The data, especially in Fig. 12, suggests that our predictions are within a factor of \(\approx\) two of the experimental data. To further check that intuition, we performed a weighted linear regression between the experimental and theoretical noise measurements. The weight for each datum was set from the bootstrap error of its noise estimate, so that poorly determined noise values weigh less in the regression. This regression with no intercept shows that a constant multiplicative factor systematically improves the agreement between theory and experiment. Fig. 30 shows the improved agreement when the theoretical noise predictions are multiplied by \(\approx 1.5\).
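
A sketch of such a weighted regression with no intercept is shown below; the synthetic noise values and error bars stand in for the bootstrapped experimental estimates, and the closed-form weighted least-squares solution is used.

```python
import numpy as np

rng = np.random.default_rng(3)

# Placeholder data: theoretical noise values and "experimental" values that are
# roughly a constant factor larger, with heteroscedastic uncertainty
noise_theory = np.linspace(0.1, 1.0, 30)
noise_exp = 1.5 * noise_theory + rng.normal(0, 0.05, noise_theory.size)
sigma = 0.02 + 0.1 * noise_exp          # stand-in for bootstrap error bars

# Weighted least squares with no intercept: minimize sum_i w_i (y_i - alpha x_i)^2
w = 1 / sigma**2
alpha = np.sum(w * noise_theory * noise_exp) / np.sum(w * noise_theory**2)
print(f"multiplicative factor alpha ≈ {alpha:.2f}")
```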

Figure 30: Multiplicative factor in improving theoretical vs. experimental comparison of noise in gene expression. Theoretical vs. experimental noise both in linear (left) and log (right) scale. The dashed line shows the identity line of slope 1 and intercept zero. All data are colored by the corresponding experimental fold-changes in gene expression as indicated by the color bar. The x-axis was multiplied by a factor of \approx 1.5 as determined by linear regression from the data in Fig. 11. Each datum represents a single date measurement of the corresponding strain and IPTG concentration with \geq 300 cells. The points correspond to the median, and the error bars correspond to the 95% confidence interval as determined by 10,000 bootstrap samples. The Python code (ch5_fig30.py) used to generate this figure can be found on the original paper’s GitHub repository.

For completeness, Fig. 31 shows the noise in gene expression as a function of the inducer concentration, including this factor of \(\approx 1.5\). Thus, overall a simple multiplicative factor improves the predictive power of the model.

Figure 31: Protein noise of the regulated promoter with multiplicative factor. Comparison of the experimental noise for different operators ((A) O1, \Delta\varepsilon_r = -15.3 \; k_BT, (B) O2, \Delta\varepsilon_r = -13.9 \; k_BT, (C) O3, \Delta\varepsilon_r = -9.7 \; k_BT) with the theoretical predictions for the multi-promoter model. Linear regression revealed that multiplying the theoretical noise prediction by a factor of \approx 1.5 would improve agreement between theory and data. Points represent the experimental noise as computed from single-cell fluorescence measurements of different E. coli strains under 12 different inducer concentrations. The dotted line indicates the plot in linear rather than logarithmic scale. Each datum represents a single date measurement of the corresponding strain and IPTG concentration with \geq 300 cells. The points correspond to the median, and the error bars correspond to the 95% confidence interval as determined by 10,000 bootstrap samples. White-filled dots are plotted at a different scale for better visualization. The Python code (ch5_fig31.py) used to generate this figure can be found on the original paper’s GitHub repository.

Additive Factor for the Noise

As an alternative way to empirically improve our model’s predictions, we will now test the idea of an additive constant. What this means is that our minimal model underestimates the noise in gene expression as \[ \text{noise}_{\text{exp}} = \beta + \text{noise}_{\text{theory}}, \] where \(\beta\) is an additive constant to be determined from the data. As with the multiplicative constant, we performed a weighted regression between the experimental and theoretical gene expression noise values to determine this empirical additive constant, using the width of the 95% bootstrap confidence interval to set the weight of each datum. Fig. 32 shows the resulting theoretical vs. experimental noise with \(\beta \approx 0.2\). We can see a great improvement in the agreement between theory and experiment with this additive constant.
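
The corresponding additive fit is analogous; a sketch under the same placeholder-data assumptions, with the slope fixed to one so that only the offset \(\beta\) is estimated:

```python
import numpy as np

rng = np.random.default_rng(4)

noise_theory = np.linspace(0.1, 1.0, 30)
noise_exp = noise_theory + 0.2 + rng.normal(0, 0.05, noise_theory.size)
w = 1 / (0.02 + 0.1 * noise_exp) ** 2   # stand-in for inverse-variance weights

# Weighted least squares with slope fixed to 1: minimize sum_i w_i (y_i - x_i - beta)^2
beta = np.sum(w * (noise_exp - noise_theory)) / np.sum(w)
print(f"additive factor beta ≈ {beta:.2f}")
```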

Figure 32: Additive factor in improving theoretical vs. experimental comparison of noise in gene expression. Theoretical vs. experimental noise both in linear (left) and log (right) scale. The dashed line shows the identity line of slope 1 and intercept zero. All data are colored by the corresponding experimental fold-change in gene expression as indicated by the color bar. A value of \approx 0.2 was added to all values in the x-axis as determined by linear regression from the data in Fig. 11. Each datum represents a single date measurement of the corresponding strain and IPTG concentration with \geq 300 cells. The points correspond to the median, and the error bars correspond to the 95% confidence interval as determined by 10,000 bootstrap samples. The Python code (ch5_fig32.py) used to generate this figure can be found on the original paper’s GitHub repository.

For completeness, Fig. 33 shows the noise in gene expression as a function of the inducer concentration, including this additive factor of \(\beta \approx 0.2\). If anything, the additive factor seems to improve the agreement between theory and data even more than the multiplicative factor.

Figure 33: Protein noise of the regulated promoter with an additive factor. Comparison of the experimental noise for different operators ((A) O1, \Delta\varepsilon_r = -15.3 \; k_BT, (B) O2, \Delta\varepsilon_r = -13.9 \; k_BT, (C) O3, \Delta\varepsilon_r = -9.7 \; k_BT) with the theoretical predictions for the multi-promoter model. Linear regression revealed that an additive factor of \approx 0.2 to the theoretical noise prediction would improve agreement between theory and data. Points represent the experimental noise as computed from single-cell fluorescence measurements of different E. coli strains under 12 different inducer concentrations. The dotted line indicates the plot in linear rather than logarithmic scale. Each datum represents a single date measurement of the corresponding strain and IPTG concentration with \geq 300 cells. The points correspond to the median, and the error bars correspond to the 95% confidence interval as determined by 10,000 bootstrap samples. White-filled dots are plotted at a different scale for better visualization. The Python code (ch5_fig33.py) used to generate this figure can be found on the original paper’s GitHub repository.

Correction Factor for Channel Capacity with a Multiplicative Factor

A constant multiplicative factor can reduce the discrepancy between the model predictions and the data concerning the noise (standard deviation/mean) in protein copy numbers. Finding the equivalent correction for the channel capacity requires gaining insights from the so-called small noise approximation  [34]. The small noise approximation assumes that the input-output function can be modeled as a Gaussian distribution in which the standard deviation is small. Using these assumptions, one can derive a closed-form for the channel capacity. Although our data and model predictions do not satisfy the small noise approximation requirements, we can gain some intuition for how the channel capacity would scale given a systematic deviation in the cell-to-cell variability predictions compared with the data.

Using the small noise approximation, one can derive the form of the input distribution at channel capacity \(P^*(c)\). To do this, we use the fact that there is a deterministic relationship between the input inducer concentration \(c\) and the mean output protein value \(\left\langle p \right\rangle\), therefore we can work with \(P(\left\langle p \right\rangle)\) rather than \(P(c)\) since the deterministic relation allows us to write \[ P(c) dc = P(\left\langle p \right\rangle) d\left\langle p \right\rangle. \] Optimizing over all possible distributions \(P(\left\langle p \right\rangle)\) using calculus of variations results in a distribution of the form \[ P^*(\left\langle p \right\rangle) = \frac{1}{\mathcal{Z}} \frac{1}{\sigma_p(\left\langle p \right\rangle)}, \] where \(\sigma_p(\left\langle p \right\rangle)\) is the standard deviation of the protein distribution as a function of the mean protein expression, and \(\mathcal{Z}\) is a normalization constant defined as \[ \mathcal{Z} \equiv \int_{\left\langle{p(c=0)}\right\rangle} ^{\left\langle{p(c\rightarrow \infty)}\right\rangle} \frac{1}{\sigma_p(\left\langle p \right\rangle)} d\left\langle p \right\rangle. \] Under these assumptions, the small noise approximation tells us that the channel capacity is of the form  [34] \[ I = \log_2 \left( \frac{\mathcal{Z}}{\sqrt{2 \pi e}} \right). \]
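
To make the scaling argument concrete, the sketch below evaluates \(\mathcal{Z}\) numerically for an assumed, purely illustrative \(\sigma_p(\left\langle p \right\rangle)\) profile and confirms that doubling \(\sigma_p\) lowers the small-noise channel capacity by exactly one bit.

```python
import numpy as np

# Illustrative noise profile: standard deviation proportional to the mean protein
# level (in the actual calculation sigma_p(<p>) comes from the model's moments)
p_mean = np.linspace(10, 1000, 2000)      # from <p(c=0)> to <p(c -> infinity)>
sigma_theory = 0.3 * p_mean

def small_noise_capacity(sigma, p_mean):
    # Z = integral of 1/sigma_p d<p>, evaluated with a simple trapezoidal rule
    integrand = 1 / sigma
    Z = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(p_mean))
    return np.log2(Z / np.sqrt(2 * np.pi * np.e))

I_theory = small_noise_capacity(sigma_theory, p_mean)
I_exp = small_noise_capacity(2 * sigma_theory, p_mean)    # sigma off by a factor of two
print(f"I_theory - I_exp = {I_theory - I_exp:.3f} bits")  # exactly 1 bit
```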

From the theory-experiment comparison we know that the standard deviation predicted by our model is systematically off by a factor of two compared to the experimental data, i.e., \[ \sigma_p^{\exp} = 2 \sigma_p^{\text{theory}}. \] This then implies that the normalization constant \(\mathcal{Z}\) between theory and experiment must follow a relationship of the form \[ \mathcal{Z}^{\exp} = \frac{1}{2} \mathcal{Z}^{\text{theory}}. \] With this relationship, the small noise approximation would predict that the difference between the experimental and theoretical channel capacity should be of the form \[ I^{\exp} = \log_2 \left( \frac{\mathcal{Z}^{\exp}}{\sqrt{2 \pi e}} \right) = \log_2 \left( \frac{\mathcal{Z}^{\text{theory}}}{\sqrt{2 \pi e}} \right) - \log_2(2). \] Therefore, under the small noise approximation, we would expect our predictions for the channel capacity to be off by a constant of 1 bit (\(\log_2(2)\)) of information. Again, the conditions for the small noise approximation do not apply to our data given the intrinsic level of cell-to-cell variability in the system; nevertheless, what this analysis tells us is that we expect an additive constant to explain the discrepancy between our model predictions and the experimental channel capacity. To test this hypothesis, we performed a “linear regression” between the model predictions and the experimental channel capacity with a fixed slope of 1. The intercept of this regression, -0.56 bits, indicates the systematic deviation we expect should explain the difference between our model and the data. Fig. 34 shows the comparison between the original predictions shown in Fig. ¿fig:ch3_fig05?(A) and the resulting predictions with this shift. Thus, other than the data with zero channel capacity, this shift can correct the systematic deviation for all data. We therefore conclude that our model ends up overestimating the experimentally determined channel capacity by a constant amount of 0.43 bits.

Figure 34: Additive correction factor for channel capacity. Solid lines represent the theoretical predictions of the channel capacity shown in (A). The dashed lines show the resulting predictions with a constant shift of -0.43 bits. Points represent single biological replicas of the inferred channel capacity. The Python code (ch5_fig34.py) used to generate this figure can be found on the original paper’s GitHub repository.

Derivation of the Steady-State mRNA Distribution

In this section, we will derive the two-state promoter mRNA distribution quoted in Sec. 5.2. For this, we will make use of so-called generating functions. Generating functions are mathematical objects in which we can encode an infinite sequence of numbers as the coefficients of a power series. The power of generating functions comes from the fact that we can convert an infinite-dimensional system of coupled ordinary differential equations–in our case, the system of differential equations defining all probabilities \(P(m, t)\) for all values of \(m\)–into a single partial differential equation that we can then solve to extract back the probability distributions.

To motivate the use of generating functions, we will begin with the simplest case: the one-state Poisson promoter.

One-state Poisson promoter

We begin by defining the reaction scheme that defines the one-state promoter. Fig. 35 shows the schematic representation of the Poisson promoter as a simple cartoon (part (A)) and as the Markov chain that defines the state space of the system (part (B)).

Figure 35: One-state Poisson promoter. (A) Schematic of the kinetics of the one-state promoter. mRNA is produced and degraded stochastically with rates r_m and \gamma_m, respectively. (B) Representation of the Markov chain for the state space that the promoter can be in. The distribution P(m, t) represents the probability of having a certain discrete number of mRNA m at time t. The transitions between states depend on the previously mentioned rates.

The dynamics of the probability distribution \(P(m, t)\) are governed by the chemical master equation \[ \frac{d P(m, t)}{dt} = \overbrace{r_m P(m-1, t)}^{m-1 \rightarrow m} - \overbrace{r_m P(m, t)}^{m \rightarrow m+1} + \overbrace{\gamma_m (m+1) P(m+1, t)}^{m+1 \rightarrow m} - \overbrace{\gamma_m m P(m, t)}^{m \rightarrow m-1}. \label{eq:one_state_master} \]

When solving for the distribution, our objective is to obtain the equation that defines \(P(m, t)\) for all possible values of \(m\). The power of generating functions is that these probability values are used as the coefficients of a power series. To make this clear, let us define the generating function \(G(z, t)\) as \[ G(z, t) \equiv \sum_{m=0}^\infty z^m P(m, t), \label{eq:gen_fn_def} \] where \(z\) is a “dummy” variable that we do not care about. The reason this is useful is that if we find the closed-form solution for this generating function, and we are able to split the factor \(z^m\) from its coefficient \(P(m, t)\), then we will have found the solution for the distribution. Furthermore, the generating function allows us to compute the moments of the distribution. For example, for the zeroth moment \(\langle m^0 \rangle\), we know that \[ \langle m^0 \rangle = \sum_{m=0}^\infty m^0 P(m, t) = 1, \] i.e., this is the normalization constraint of the distribution. From the definition of the generating function, we can then see that \[ G(1, t) = \sum_{m=0}^\infty 1^m P(m, t) = 1. \label{eq:generating_norm} \] Furthermore, the first moment of the distribution is defined as \[ \langle m \rangle = \sum_{m=0}^\infty m P(m, t). \] From the definition of the generating function, we can construct this quantity by computing \[ \left. \frac{\partial G(z, t)}{\partial z} \right\vert_{z=1} = \frac{\partial}{\partial z} \left[ \sum_{m=0}^\infty z^m P(m, t) \right]_{z=1} = \sum_{m=0}^\infty m P(m, t). \] Therefore we have that \[ \langle m \rangle = \left. \frac{\partial G(z, t)}{\partial z} \right\vert_{z=1}. \label{eq:first_mom_gen} \] Similar constructions can be built for higher moments of the distribution.

Let us then apply the definition of the generating function to Eq. \(\ref{eq:one_state_master}\). For this, we multiply both sides by \(z^m\) and sum over all values of \(m\), obtaining \[ \begin{split} \sum_{m=0}^\infty z^m \frac{d P(m, t)}{dt} &= \sum_{m=0}^\infty z^m \left[ r_m P(m-1, t) - r_m P(m, t) \right. \\ &+ \left. \gamma_m (m+1) P(m+1, t) - \gamma_m m P(m, t) \right]. \end{split} \] Distributing the sum, we find \[ \begin{split} \frac{d}{dt} \sum_{m=0}^\infty z^m P(m, t) &= \sum_{m=0}^\infty z^m r_m P(m-1, t) - \sum_{m=0}^\infty z^m r_m P(m, t) \\ &+ \sum_{m=0}^\infty z^m \gamma_m (m+1) P(m+1, t) - \sum_{m=0}^\infty z^m \gamma_m m P(m, t). \end{split} \label{eq:one_state_master_sum} \] We see that the terms involving \(z^m P(m, t)\) can be directly substituted with Eq. \(\ref{eq:gen_fn_def}\). For the other terms, we have to be slightly more clever. The first trick will allow us to rewrite the term involving \(z^m m P(m, t)\) as \[ \begin{aligned} \sum_{m} z^{m} \cdot m \cdot P(m, t) &= \sum_{m} z \frac{\partial z^{m}}{\partial z} P(m, t), \\ &=\sum_{m} z \frac{\partial}{\partial z}\left(z^{m} P(m, t)\right), \\ &=z \frac{\partial}{\partial z}\left(\sum_{m} z^{m} P(m, t)\right), \\ &=z \frac{\partial G(z, t)}{\partial z}. \end{aligned} \] Next, let us deal with the term involving \((m+1)\). We first define \(k = m + 1\). With this, we can write \[ \begin{aligned} \sum_{m=0}^{\infty} z^{m} \cdot(m+1) \cdot P(m+1, t) &= \sum_{k=1}^{\infty} z^{k-1} \cdot k \cdot P(k, t), \\ &=z^{-1} \sum_{k=1}^{\infty} z^{k} \cdot k \cdot P(k, t), \\ &=z^{-1} \sum_{k=0}^{\infty} z^{k} \cdot k \cdot P(k, t), \\ &=z^{-1}\left(z \frac{ \partial G(z, t)}{\partial z}\right), \\ &=\frac{\partial G(z, t)}{\partial z}, \end{aligned} \] where for the third step, we reindexed the sum to include \(k=0\) since it does not contribute to the total sum. Finally, for the term involving \(P(m-1, t)\), we define \(k = m-1\). This allows us to rewrite the term as \[ \begin{aligned} \sum_{m=0}^{\infty} z^{m} P(m-1, t) &=\sum_{k=-1}^{\infty} z^{k+1} P(k, t), \\ &=\sum_{k=0}^{\infty} z^{k+1} P(k, t), \\ &=z \sum_{k=0}^{\infty} z^{k} P(k, t), \\ &=z G(z, t). \end{aligned} \] For the second step, we reindexed the sum from \(-1\) to \(0\) since \(P(-1, t) = 0\).

All of this clever reindexing allows us to rewrite Eq. \(\ref{eq:one_state_master_sum}\) as \[ \frac{\partial G(z, t)}{\partial t} = r z G(z, t)-r G(z, t) + \gamma \frac{\partial G(z, t)}{\partial z} - \gamma z \frac{\partial G(z, t)}{\partial z}. \] Factorizing terms, we have \[ \frac{\partial G(z, t)}{\partial t} =-r G(z, t)(1-z) +\gamma \frac{\partial G(z, t)}{\partial z}(1-z). \] Let us appreciate how beautiful this is: we took an infinite-dimensional system of ordinary differential equations—the master equation—and turned it into a single partial differential equation (PDE). All we have to do now is solve this PDE, and then transform the solution into a power series to extract the distribution.

Let us focus on the steady-state case. For this, we set the time derivative to zero. Doing this cancels the \((1-z)\) term, leaving a straightforward ordinary differential equation for \(G(z)\) \[ \frac{dG(z)}{dz} = \frac{r}{\gamma} G(z). \] Solving this equation by separation of variables results in \[ G(z) = C e^{\frac{r}{\gamma}z}. \] To obtain the integration constant, we use the normalization condition of the probability distribution (Eq. \(\ref{eq:generating_norm}\)), obtaining \[ 1 = C e^{\frac{r}{\gamma}} \Rightarrow C = e^{-\frac{r}{\gamma}}. \] This means that the generating function takes the form \[ G(z) = e^{-\frac{r}{\gamma}} e^{\frac{r}{\gamma}z}. \] All we have left is trying to rewrite the generating function as a power series on \(z\). If we succeed in doing so, we will have recovered the probability distribution \(P(m, t)\). For this, we simply use the Taylor expansion of \(e^x\), obtaining \[ G(z) = e^{-\frac{r}{\gamma}} \sum_{m=0}^\infty \frac{\left( \frac{r}{\gamma}z \right)^m}{m!}. \] From this form, it becomes clear how to split the \(z^m\) term from the coefficient that, by the definition of the generating function, is the probability distribution we are looking for. The separation takes the form \[ G(z)=\sum_{m=0}^{\infty} z^{m} \left[\frac{e^{-r / \gamma}\left(\frac{r}{\gamma}\right)^{m}}{m !}\right], \] where we can see that we recover the expected Poisson distribution for this one-state promoter \[ P(m) = e^{-r / \gamma}\frac{\left(\frac{r}{\gamma}\right)^{m}}{m !}. \]
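
As a quick sanity check of this algebra, one can expand the steady-state generating function symbolically and compare its coefficients with the Poisson probability mass function; a short sketch using sympy with symbolic rates:

```python
import sympy as sp

z = sp.symbols('z')
r, g = sp.symbols('r_m gamma_m', positive=True)   # transcription and degradation rates

# Steady-state generating function of the one-state promoter
G = sp.exp(-r / g) * sp.exp(r / g * z)

# The coefficient of z^m in the power series should be the Poisson pmf
series = sp.series(G, z, 0, 6).removeO()
for m in range(6):
    coeff = series.coeff(z, m)
    poisson = sp.exp(-r / g) * (r / g) ** m / sp.factorial(m)
    assert sp.simplify(coeff - poisson) == 0
print("first six coefficients match the Poisson distribution")
```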

Two-state promoter

Having shown the generating function’s power, let us now turn our attention to the relevant equation we are after: the two-state mRNA distribution. This model assumes that the promoter can exist in two discrete states (see Fig. 36(A)): a transcriptionally active state \(A\) from which transcription can take place at a constant rate \(r_m\), and an inactive state \(I\) where no transcription takes place. The mRNA is stochastically degraded with a rate \(\gamma_m\) regardless of the state of the promoter. Fig. 36(B) shows the Markov chain that connects all of the possible states of the promoter. For this particular case, there are not only “horizontal” transitions where the mRNA copy number changes, but also “vertical” transitions where only the promoter’s state changes. Because of this, we need to define two coupled master equations that take the form \[ \begin{aligned} \frac{d P_{A}(m, t)}{d t} &=-k^{(p)}_{\text{off}} P_{A}(m, t) + k^{(p)}_{\text{on}} P_{I}(m, t) \\ &+\gamma_m (m+1) P_{A}(m+1, t) - \gamma_m m P_{A}(m, t) \\ &+r_m P_{A}(m-1, t)-r_m P_{A}(m, t) \\ \end{aligned} \] for the active state, and \[ \begin{aligned} \frac{d P_{I}(m, t)}{dt} &=k^{(p)}_{\text{off}} P_{A}(m, t)- k^{(p)}_{\text{on}} P_{I}(m, t) \\ &+\gamma_m (m+1) P_{I}(m+1, t)-\gamma_m m P_{I}(m, t), \end{aligned} \] for the inactive state.

Figure 36: Two-state Poisson promoter. (A) Schematic of the kinetics of the two-state promoter. The promoter is imagined to exist in two states—a transcriptionally active state A and an inactive state I. The transitions between these states are governed by the rates k^{(p)}_{\text{on}} and k^{(p)}_{\text{off}}. mRNA is produced and degraded stochastically with rates r_m and \gamma_m, respectively. (B) Representation of the Markov chain for the state space that the promoter can be in. The distribution P(m, t) represents the probability of having a certain discrete number of mRNA m at time t. The transitions between states depend on the previously mentioned rates.

Obtaining the partial differential equation for the generating function

The first thing we must do is to transform this infinite-dimensional system of ordinary differential equations in \(m\) into partial differential equations for the generating functions. For this particular case, there are two generating functions of the form \[ G_x(z, t) = \sum_{m=0}^\infty z^m P_x(m, t), \] where \(x \in \{A, I \}\). The probability of having \(m\) mRNA at time \(t\) regardless of the promoter state is given by \[ P(m, t) = P_A(m, t) + P_I(m, t). \] Therefore, the corresponding generating function for the whole system is given by \[ G(z, t) = G_A(z, t) + G_I(z, t). \]

As with the one-state promoter case, let us transform our master equations by multiplying both sides by \(z^m\) and summing over all \(m\). For the active state \(A\), we have \[ \begin{aligned} \sum_{m} z^{m} \frac{d P_{A}(m, t)}{d t} = \sum_{m} z^{m} &\left[-k^{(p)}_{\text{off}} P_{A}(m, t) + k^{(p)}_{\text{on}} P_{I}(m, t)\right.\\ &+ \gamma_m (m+1) P_{A}(m+1, t)-\gamma_m m P_{A}(m, t) \\ &\left. + r_m P_{A}(m-1, t) - r_m P_{A}(m, t)\right]. \end{aligned} \] After distributing the sum, we can use the tricks from the previous section, allowing us to write this as a partial differential equation of the form \[ \begin{aligned} \frac{\partial G_{A}(z, t)}{\partial t} &= -k^{(p)}_{\text{off}} G_{A}(z, t)+k^{(p)}_{\text{on}} G_{I}(z, t) \\ &-\gamma_m(z-1) \frac{\partial G_A(z, t)}{\partial z} +r_m(z-1) G_{A}(z, t). \end{aligned} \label{eq:gn_fn_act} \] An equivalent process can be done for the inactive state \(I\), obtaining \[ \begin{aligned} \frac{\partial G_{I}(z, t)}{\partial t} &= k^{(p)}_{\text{off}} G_{A}(z, t) - k^{(p)}_{\text{on}} G_{I}(z, t) \\ &-\gamma_m(z-1) \frac{\partial G_I(z, t)}{\partial z}. \end{aligned} \label{eq:gn_fn_inact} \] We turned the infinite-dimensional system of ordinary differential equations into a system of two coupled partial differential equations. Let us transform the equations further. Since we have a common term \((z - 1)\), it will be convenient to define \(v \equiv (z -1)\). From the chain rule, it follows that \[ d v=d(z-1)=d z \Rightarrow \frac{\partial G}{\partial v} = \frac{\partial G}{\partial z} \frac{d z}{d v}. \] Making this substitution in Eqs. \(\ref{eq:gn_fn_act}\) and \(\ref{eq:gn_fn_inact}\) results in \[ \begin{aligned} \frac{\partial G_{A}(v, t)}{\partial t} &=-k^{(p)}_{\text{off}} G_{A}(v, t) + k^{(p)}_{\text{on}} G_{I}(v, t) \\ &-\gamma_m v \frac{\partial G_{A}(v, t)}{\partial v} + r_m v G_{A}(v, t) \\ \end{aligned} \] for the active state, and \[ \begin{aligned} \frac{\partial G_{I}(v, t)}{\partial t}=& k^{(p)}_{\text{off}} G_{A}(v, t) - k^{(p)}_{\text{on}} G_{I}(v, t) \\ &-\gamma_m v \frac{\partial G_{I}(v, t)}{\partial v}, \end{aligned} \] for the inactive state.

Since we care about the steady-state distribution, it is at this point that we set the time derivative of both equations to zero. Doing this results in \[ \gamma_{m} v \frac{d G_{A}(v)}{d v}= -k^{(p)}_{\text{off}} G_{A}(v) + k^{(p)}_{\text{on}} G_{I}(v) + r_m v G_{A}(v), \label{eq:steady_act} \] and \[ \gamma_{m} v \frac{d G_{I}(v)}{d v}= k^{(p)}_{\text{off}} G_{A}(v) - k^{(p)}_{\text{on}} G_{I}(v). \label{eq:steady_inact} \] Adding Eqs. \(\ref{eq:steady_act}\) and \(\ref{eq:steady_inact}\) gives a simple result \[ \gamma_m \frac{d G(v)}{dv} = r_m G_A(v). \label{eq:gen_fn_rel} \]

Our objective is now to write Eqs. \(\ref{eq:steady_act}\) and \(\ref{eq:steady_inact}\) each as a function of only one of the generating functions, i.e., we want two independent differential equations. These equations are both functions of \(G_A(v)\) and \(G_I(v)\), but Eq. \(\ref{eq:gen_fn_rel}\) tells us how to relate both generating functions via the first derivative. This suggests that taking another derivative of Eqs. \(\ref{eq:steady_act}\) and \(\ref{eq:steady_inact}\) with respect to \(v\) could be useful. Let us go ahead and compute these derivatives. For the active state, we find \[ \small \gamma_m \frac{d G_{A}(v)}{d v} + \gamma_{m} v \frac{d^{2} G_{A}(v)}{d v^{2}} = -k^{(p)}_{\text{off}} \frac{d G_{A}(v)}{d v} + k^{(p)}_{\text{on}} \frac{d G_{I}(v)}{d v} + r_m G_{A}(v) + r_m v \frac{d G_{A}(v)}{d v}. \] Upon simplification, we can write this equation as \[ \gamma_m v \frac{d^2 G_A}{d v^{2}} + \left(\gamma_m +k^{(p)}_{\text{off}} -r_m v\right) \frac{d G_A}{d v} -k^{(p)}_{\text{on}} \frac{d G_I}{d v} -r_m G_{A}(v)=0. \label{eq:gen_fn_2nd_act} \] From Eq. \(\ref{eq:gen_fn_rel}\), we have that \[ \frac{d G_I}{dv} = \frac{r_m}{\gamma_m} G_A(v) - \frac{d G_A}{dv}. \] Substituting this into Eq. \(\ref{eq:gen_fn_2nd_act}\) results in \[ \gamma_m v \frac{d^{2} G_A}{d v^{2}} + \left(\gamma_m + k^{(p)}_{\text{off}} + k^{(p)}_{\text{on}} - r_m v \right) \frac{d G_A}{d v} - r_m \left(1+\frac{k^{(p)}_{\text{on}}}{\gamma_m}\right) G_A(v) = 0. \label{eq:gen_2nd_act_final} \] For the inactive state, upon taking a derivative with respect to \(v\), we find \[ \gamma_{m} v \frac{d^{2} G_{I}}{d v^{2}} + \left(\gamma_m + k^{(p)}_{\text{on}}\right) \frac{d G_I}{d v} - k^{(p)}_{\text{off}} \frac{d G_A}{d v} = 0. \label{eq:gen_fn_2nd_inact} \] Again from Eq. \(\ref{eq:gen_fn_rel}\), we have that \[ \frac{d G_A}{dv} = \frac{r_m}{\gamma_m} G_A - \frac{d G_I}{dv}. \] Substituting this result into Eq. \(\ref{eq:gen_fn_2nd_inact}\) gives \[ \gamma_{m} v \frac{d^{2} G_{I}}{d v^{2}} + \left(\gamma_m+k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}\right) \frac{d G_I}{d v} - \frac{k^{(p)}_{\text{off}} r_m}{\gamma_m} G_{A}(v)=0. \label{eq:gen_fn_2nd_inact_2} \] So far, we have not removed the dependence on \(G_A(v)\). But we notice that from Eq. \(\ref{eq:steady_inact}\), we have that \[ G_A(v) = \frac{\gamma_m v}{k^{(p)}_{\text{off}}} \frac{d G_I}{d v} + \frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{off}}} G_I. \] Using this identity allows us to write Eq. \(\ref{eq:gen_fn_2nd_inact_2}\) as \[ \gamma_{m} v \frac{d^{2} G_{I}}{d v^{2}} + \left(\gamma_m + k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}} -r_m v\right) \frac{d G_I}{d v} - \frac{k^{(p)}_{\text{on}} r_m}{\gamma_m} G_I = 0. \label{eq:gen_2nd_inact_final} \]

To obtain a single partial differential equation, we add Eqs. \(\ref{eq:gen_2nd_act_final}\) and \(\ref{eq:gen_2nd_inact_final}\), obtaining \[ \gamma_m v \frac{d^{2} G}{d v^{2}} + \left(\gamma_m + k^{(p)}_{\text{off}} + k^{(p)}_{\text{on}} -r_m v\right) \frac{d G}{d v} - \frac{r_m k^{(p)}_{\text{on}}}{\gamma_m} G(v) - r_m G_A(v) = 0, \] where we substituted \(G_A(v) + G_I(v) = G(v)\). To remove the last \(G_A(v)\), we utilize again Eq. \(\ref{eq:gen_fn_rel}\), obtaining \[ \gamma_m v \frac{d^2 G}{dv^2} + \left( k^{(p)}_{\text{off}} + k^{(p)}_{\text{on}} - r_m v \right) \frac{dG}{dv} - \frac{r_m k^{(p)}_{\text{on}}}{\gamma_m} G(v) = 0. \label{eq:gen_2nd} \]

Solving the partial differential equation

Eq. \(\ref{eq:gen_2nd}\) looks almost like the so-called Kummer’s equation, also known as the confluent hypergeometric differential equation—a second order differential equation of the form \[ z \frac{d^2w}{dz^2} + (b - z) \frac{dw}{dz} - aw = 0. \label{eq:kummer} \] The general solution of the Kummer equation can be expressed as a linear combination of two linearly independent solutions (the second of which is commonly written in terms of the Tricomi function). This is written as \[ w(z) = A {}_1F_1(a, b, z) + B z^{1-b} {}_1 F_1(a+1-b, 2-b, z), \label{eq:kummer_sol} \] where \(A\) and \(B\) are constants, and \({}_1F_1\) is the confluent hypergeometric function of the first kind defined as \[ {}_1F_1(a, b, z) = \sum_{m=0}^{\infty} \frac{a^{(m)}z^m}{b^{(m)} m!}, \] where \(a^{(n)}\) and \(b^{(n)}\) are the rising factorials, i.e., \[ a^{(0)} = 1, \] and \[ a^{(n)} = a (a + 1) (a + 2) \cdots (a + n - 1). \]

To write Eq. \(\ref{eq:gen_2nd}\) in the form of Eq. \(\ref{eq:kummer}\), we can define \(s \equiv r_m v / \gamma_m\). The chain rule tells us that \[ ds = \frac{r_m}{\gamma_m} dv \Rightarrow \frac{dG}{ds} = \frac{dG}{dv}\frac{dv}{ds} = \frac{\gamma_m}{r_m} \frac{dG}{dv}. \] From the chain rule, we also conclude that \[ \frac{d^2G}{ds^2} = \frac{d}{dv} \left( \frac{dG}{dv} \frac{dv}{ds} \right) \frac{dv}{ds} = \frac{\gamma_m ^2}{r_m^2} \frac{d^2G}{d v^2}. \] So the three relationships of \(v\) with \(s\) that we have derived take the form \[ v = \frac{\gamma_m}{r_m} s, \; \frac{dG}{dv} = \frac{r_m}{\gamma_m} \frac{dG}{ds}, \; \text{and } \frac{d^2 G}{dv^2} = \frac{r_m^2}{\gamma_m^2} \frac{d^2G}{ds^2}. \] Substituting these definitions results in \[ \gamma_m \left( \frac{\gamma_m}{r_m} s \right) \frac{r_m^2}{\gamma_m^2} \frac{d^2 G}{ds^2} + \left[ k^{(p)}_{\text{off}} + k^{(p)}_{\text{on}} - r_m \left( \frac{\gamma_m}{r_m} s \right) \right] \frac{r_m}{\gamma_m} \frac{dG}{ds} - \frac{r_m k^{(p)}_{\text{on}}}{\gamma_m} G(s) = 0. \] Upon simplifying terms, we find an equation that is now in the form of Eq. \(\ref{eq:kummer}\) \[ s \frac{d^2 G}{ds^2} + \left(\frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m} - s \right) \frac{dG}{ds} - \frac{k^{(p)}_{\text{on}}}{\gamma_m} G(s) = 0. \label{eq:gen_kummer} \]

Having put this in the form of the Kummer Eq., we can use Eq. \(\ref{eq:kummer_sol}\) to write \(G(s)\) as \[ \begin{aligned} G(s) &= A {}_1F_1 \left( \frac{k^{(p)}_{\text{on}}}{\gamma_m}, \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}, s \right) \\ &+ B s^{1 - \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}} {}_1 F_1 \left( \frac{k^{(p)}_{\text{on}}}{\gamma_m} + 1 - \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}, 2 - \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}, s \right). \end{aligned} \] We can write down this solution in terms of the original variable of the generating function. We have that \(s = r_m/\gamma_m v\), and \(v = z - 1\). With this, we then write \[ \begin{aligned} G(z) = &A {}_1F_1 \left( \frac{k^{(p)}_{\text{on}}}{\gamma_m}, \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}, \frac{r_m}{\gamma_m}(z - 1) \right) + B \left[\frac{r_m}{\gamma_m}(z - 1)\right] ^{1 - \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}} \times \\ &{}_1 F_1 \left( \frac{k^{(p)}_{\text{on}}}{\gamma_m} + 1 - \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}, 2 - \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}, \frac{r_m}{\gamma_m}(z - 1) \right). \end{aligned} \label{eq:gen_sol} \]

Finding the Coefficients for the Solution

We can now use the normalization condition for the generating function, that is, \[ G(1) = \sum_{m=0}^\infty 1^m P(m) = 1. \] Evaluating \(z=1\) in Eq. \(\ref{eq:gen_sol}\) results in \[ G(1) = A {}_1F_1 \left( \frac{k^{(p)}_{\text{on}}}{\gamma_m}, \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}, 0 \right). \label{eq:gen_sol_1} \] Let us look at the hypergeometric function evaluated at zero, \({}_1F_1(a, b, 0)\). This takes the form \[ {}_1 F_{1}(a, b, 0)=\sum_{m=0}^{\infty} \frac{a^{(m)} 0^{m}}{b^{(m)} m!}. \] All of the terms but the first one (\(m = 0\)) are zero. That first term involves \(0^0\), which is undefined; taking the limit as \(z \rightarrow 0\) from the positive side, we find \[ {}_1F_{1}(a, b, 0) = \lim _{z \rightarrow 0^{+}} {}_1 F_{1}(a, b, z) = \lim _{z \rightarrow 0^{+}} z^{0} = 1. \] Using this property in Eq. \(\ref{eq:gen_sol_1}\) tells us that \(A = 1\).

We do not have another constraint for \(B\). Nevertheless, recall that Eq. \(\ref{eq:first_mom_gen}\) tells us how to compute the first moment of the distribution from the generating function. For this, we need to compute the derivative of the confluent hypergeometric function. Let us derive this identity. Rather than computing the derivative directly, we will compute \[ z \frac{d}{dz}{}_1F_1 = z \frac{d}{dz} \left[ \sum_{m=0}^{\infty} \frac{a^{(m)} z^{m}}{b^{(m)} m!}\right]. \] Taking the derivative inside the sum gives \[ z \frac{d}{dz}{}_1F_1 = z \left[ \sum_{m=0}^{\infty} \frac{a^{(m)} }{b^{(m)} m!} \frac{d}{dz} z^{m} \right] = \left[ \sum_{m=0}^{\infty} \frac{a^{(m)} }{b^{(m)}} \frac{m z^m}{m!} \right] . \] Simplifying the term \(m/m!\) gives \[ z \frac{d}{dz}{}_1F_1 = \left[ \sum_{m=0}^{\infty} \frac{a^{(m)} }{b^{(m)}} \frac{z^m}{(m-1)!} \right] . \label{eq:confluent_1} \] Note that the rising factorials can be rewritten as \[ \begin{aligned} a^{(m)} &=a(a+1)(a+2) \cdots(a+m-1) \\ &=a \cdot(a+1)[(a+1)+1][(a+1)+2] \cdots[(a+1)+m - 2] \\ &=a \cdot(a+1)^{(m-1)} . \end{aligned} \] Therefore, we can rewrite Eq. \(\ref{eq:confluent_1}\) as \[ \begin{aligned} \sum_{m=0}^{\infty} \frac{a^{(m)}}{b^{(m)}} \frac{z^{m}}{(m-1) !} &= \sum_{m=0}^{\infty} \frac{a \cdot(a+1)^{(m-1)}}{b \cdot(b+1)^{(m-1)}} \frac{z \cdot z^{m-1}}{(m-1) !}, \\ &=\frac{a z}{b} \sum_{m=0}^{\infty} \frac{(a+1)^{(m-1)}}{(b+1)^{(m-1)}} \frac{z^{m-1}}{(m-1) !}. \end{aligned} \] If we define \(m' = m - 1\), we have \[ \frac{a z}{b} \sum_{m=0}^{\infty} \frac{(a+1)^{(m-1)}}{(b+1)^{(m-1)}} \frac{z^{m-1}}{(m-1) !} = \frac{a z}{b} \sum_{m'=-1}^{\infty} \frac{(a+1)^{(m')}}{(b+1)^{(m')}} \frac{z^{m'}}{m'!}. \] The sum on the right-hand side is almost of the form of the confluent hypergeometric function again. The only difference is that the sum starts at \(m' = -1\). This first term of the sum would then involve a term of the form \(1 / (-1)!\). But what does this even mean? To find this out, we can generalize the factorial function using the Gamma function such that \[ (x - 1)! = \Gamma(x). \] The Gamma function diverges as \(x \rightarrow 0\), therefore \(1/\Gamma(x) \rightarrow 0\) as \(x \rightarrow 0\). This means that the first term of the sum is zero, so we can begin the sum at \(m' = 0\), recovering a confluent hypergeometric function. With this, we find that \[ z \frac{d}{d z} {}_1F_1 = \frac{a z}{b} \sum_{m=0}^{\infty} \frac{(a+1)^{(m)}}{(b+1)^{(m)}} \frac{z^{m}}{m !} = \frac{a}{b} z \, {}_1F_{1}(a+1, b+1, z), \] therefore \[ \frac{d}{dz}{}_1F_1 = \frac{a}{b} {}_1F_1(a + 1, b + 1, z). \label{eq:gen_deriv} \]
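
This identity is easy to spot-check numerically. The sketch below (assuming SciPy) compares a centered finite difference of `scipy.special.hyp1f1` against \(\frac{a}{b}\, {}_1F_1(a+1, b+1, z)\); the parameter values and step size are arbitrary.

```python
from scipy.special import hyp1f1

a, b, z, h = 1.7, 3.2, 0.9, 1e-6  # arbitrary test point and finite-difference step

finite_diff = (hyp1f1(a, b, z + h) - hyp1f1(a, b, z - h)) / (2 * h)
identity = a / b * hyp1f1(a + 1, b + 1, z)
print(finite_diff, identity)  # the two values should agree to high precision
```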

After this small but necessary detour, we can come back to computing the first moment of our distribution from the generating function. To evaluate Eq. \(\ref{eq:first_mom_gen}\) on Eq. \(\ref{eq:gen_sol}\), we first compute the derivative of the generating function. This can be easily evaluated using the relationship we derived for derivatives of \({}_1F_1\). The only thing to be aware of is the chain rule. In particular, the third entry of each function is \(\frac{r_m}{\gamma_m}(z - 1)\) rather than simply \(z\) as in Eq. \(\ref{eq:gen_deriv}\). This means that if we define \(u = \frac{r_m}{\gamma_m} (z - 1)\), we have \[ du = \frac{r_m}{\gamma_m} dz \Rightarrow \frac{dG}{dz} = \frac{dG}{du} \frac{du}{dz} = \frac{dG}{du} \frac{r_m}{\gamma_m}. \] So there is an extra factor of \(r_m / \gamma_m\) that comes along when we compute the derivative of our generating function. Computing the derivative of Eq. \(\ref{eq:gen_sol}\) results in \[ \small \begin{aligned} \frac{dG}{dz} &= \frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}} \frac{r_m}{\gamma_m} {}_1F_1 \left( \frac{k^{(p)}_{\text{on}}}{\gamma_m} + 1, \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m} + 1, \frac{r_m}{\gamma_m}(z - 1) \right) \\ & + B \left( 1 - \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m} \right) \frac{r_m}{\gamma_m} \left[\frac{r_m}{\gamma_m}(z - 1)\right] ^{-\frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}} \times \\ &{}_1 F_1 \left( \frac{k^{(p)}_{\text{on}}}{\gamma_m} + 1 - \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}, 2 - \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}, \frac{r_m}{\gamma_m}(z - 1) \right) \\ &+ B \left[\frac{r_m}{\gamma_m}(z - 1)\right] ^{1 - \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}} \left( \frac{\frac{k^{(p)}_{\text{on}}}{\gamma_m} + 1 - \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}}{2 - \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}} \right) \frac{r_m}{\gamma_m} \times \\ &{}_1 F_1 \left( \frac{k^{(p)}_{\text{on}}}{\gamma_m} + 2 - \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}, 3 - \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}, \frac{r_m}{\gamma_m}(z - 1) \right). \end{aligned} \] This rather convoluted result is enormously simplified upon evaluating the derivative at \(z = 1\) (see Eq. \(\ref{eq:first_mom_gen}\)). This results in \[ \left. \frac{dG}{dz} \right\vert_{z = 1} = \frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}} \frac{r_m}{\gamma_m} {}_1F_1 \left( \frac{k^{(p)}_{\text{on}}}{\gamma_m} + 1, \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m} + 1, 0 \right) = \frac{r_m}{\gamma_m} \frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}, \] which is precisely the mean mRNA copy number we derived before. Since \(B\) does not contribute to the mean, we can safely set \(B = 0\). This means that the final result for the generating function takes the much more compact form \[ G(z) = {}_1F_1 \left( \frac{k^{(p)}_{\text{on}}}{\gamma_m}, \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m}, \frac{r_m}{\gamma_m}(z - 1) \right). \label{eq:gen_final} \]
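
As a minimal numerical check of this result, one can build \(G(z)\) from Eq. \(\ref{eq:gen_final}\) with `scipy.special.hyp1f1` and differentiate it numerically at \(z = 1\); the value should match \(\frac{r_m}{\gamma_m} \frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}\). The rate values below are placeholders, not fitted parameters.

```python
from scipy.special import hyp1f1

# placeholder rates, all in units of the mRNA degradation rate (gamma_m = 1)
kon, koff, rm, gm = 2.0, 5.0, 15.0, 1.0

def G(z):
    """Generating function of the two-state promoter mRNA distribution."""
    return hyp1f1(kon / gm, (kon + koff) / gm, rm / gm * (z - 1))

h = 1e-6
mean_from_G = (G(1 + h) - G(1 - h)) / (2 * h)       # dG/dz evaluated at z = 1
mean_analytic = rm / gm * kon / (kon + koff)
print(mean_from_G, mean_analytic)  # the two values should agree closely
```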

Extracting the Steady-State mRNA Distribution

Let us quickly recapitulate where we are. We started with a system of infinitely many ordinary differential equations, one for each promoter state and mRNA copy number, that defined the master equation for our two-state promoter. We then used the generating function to transform this system into a single partial differential equation. The resulting differential equation for the generating function took the form of the so-called Kummer differential equation, whose general solution is a linear combination of the confluent hypergeometric function and the Tricomi function. After imposing the normalization condition on the generating function, we found that the confluent hypergeometric function’s coefficient was \(A=1\). We then used the fact that the mean mRNA copy number \(\langle m \rangle\) exists to show that the Tricomi function’s coefficient is \(B=0\). All that effort led us to Eq. \(\ref{eq:gen_final}\), the generating function for the two-state promoter mRNA steady-state distribution. All we have left is to beat Eq. \(\ref{eq:gen_final}\) into the form of a standard generating function so that we can extract the probability distribution from it.

Let us begin this task by writing down Eq. \(\ref{eq:gen_final}\) with the full definition of the confluent hypergeometric function. This gives us \[ G(z) = \sum_{m=0}^\infty \frac{ \left(\frac{k^{(p)}_{\text{on}}}{\gamma_m}\right)^{(m)} }{ \left(\frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}} {\gamma_m}\right)^{(m)} } \frac{ \left[\frac{r_m}{\gamma_m} (z-1) \right]^m }{ m! }. \] Let us now separate the factor \((z-1)^m\), obtaining \[ G(z) = \sum_{m=0}^\infty \frac{ \left(\frac{k^{(p)}_{\text{on}}}{\gamma_m}\right)^{(m)} }{ \left(\frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}} {\gamma_m}\right)^{(m)} } \frac{ \left(\frac{r_m}{\gamma_m} \right)^m }{ m! } (z - 1)^m. \] We now rewrite this last term \((z-1)^m\) using the binomial expansion. This results in \[ G(z) = \sum_{m=0}^\infty \frac{ \left(\frac{k^{(p)}_{\text{on}}}{\gamma_m}\right)^{(m)} }{ \left(\frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}} {\gamma_m}\right)^{(m)} } \frac{ \left(\frac{r_m}{\gamma_m} \right)^m }{ m! } \left[ \sum_{n=0}^m {m \choose n} z^n (-1)^{m - n} \right]. \] We can take the sum over the index \(n\) out of the brackets, obtaining \[ G(z) = \sum_{m=0}^\infty \sum_{n=0}^m \frac{ \left(\frac{k^{(p)}_{\text{on}}}{\gamma_m}\right)^{(m)} }{ \left(\frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}} {\gamma_m}\right)^{(m)} } \frac{ \left(\frac{r_m}{\gamma_m} \right)^m }{ m! } \left[ {m \choose n} z^n (-1)^{m - n} \right]. \] To make further progress, we must reindex the sum. The trick is to reverse the default order of the sums as \[ \sum_{m=0}^{\infty} \sum_{n=0}^{m} = \sum_{n=0}^{\infty} \sum_{m=n}^{\infty}. \] To see the logic of this reindexing, we point the reader to Fig. 37. The key is to notice that the double sum \(\sum_{m=0}^\infty \sum_{n=0}^m\) is adding all possible pairs \((m, n)\) in the lower triangle, so we can add the terms vertically as the original sum indexing suggests, i.e. \[ \sum_{m=0}^{\infty} \sum_{n=0}^{m} x_{(m, n)}= x_{(0, 0)} + x_{(1, 0)} + x_{(1, 1)} + x_{(2, 0)} + x_{(2, 1)} + x_{(2, 2)} + \ldots, \] where the variable \(x\) is just a placeholder to indicate the order in which the sum is taking place. But we can also add the terms horizontally as \[ \sum_{n=0}^{\infty} \sum_{m=n}^{\infty} x_{(m, n)} = x_{(0, 0)} + x_{(1, 0)} + x_{(2, 0)} + \ldots + x_{(1,1)} + x_{(2, 1)} + \ldots, \] which still adds all of the lower-triangle terms.

Figure 37: Reindexing a double sum. Schematic for reindexing the sum \(\sum_{m=0}^\infty \sum_{n=0}^m\). Blue circles depict the 2D grid of nonnegative integers restricted to the lower triangular part of the \((m, n)\) plane. The trick is that this double sum runs over all \((m, n)\) pairs with \(n \le m\). Summing over \(m\) first instead of \(n\) requires determining the boundary: the upper boundary of the \(n\)-first double sum becomes the lower boundary of the \(m\)-first double sum.
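
To make the reindexing concrete before applying it, here is a small numerical check (with an arbitrary truncation and an arbitrary placeholder summand) that both orderings of the double sum add up exactly the same lower-triangle terms.

```python
def x(m, n):
    # arbitrary placeholder summand; any function of (m, n) works for this check
    return (m + 1) * 0.5 ** n

M = 30  # truncation of the infinite sums

sum_n_inner = sum(x(m, n) for m in range(M) for n in range(m + 1))   # sum_m sum_{n<=m}
sum_m_inner = sum(x(m, n) for n in range(M) for m in range(n, M))    # sum_n sum_{m>=n}
print(sum_n_inner, sum_m_inner)  # identical: both cover exactly the pairs with n <= m
```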

Rewriting the sum in this way results in \[ G(z) = \sum_{n=0}^\infty \sum_{m=n}^\infty \frac{ \left(\frac{k^{(p)}_{\text{on}}}{\gamma_m}\right)^{(m)} }{ \left(\frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}} {\gamma_m}\right)^{(m)} } \frac{ \left(\frac{r_m}{\gamma_m} \right)^m }{ m! } \left[ {m \choose n} z^n (-1)^{m - n} \right]. \] This allows us to separate the variable \(z^n\) from the rest of the equation, leaving the generating function in the standard form from which we can read off the probability distribution \(P(m)\). This reads \[ G(z) = \sum_{n=0}^\infty z^n \left[ \sum_{m=n}^\infty \frac{ \left(\frac{k^{(p)}_{\text{on}}}{\gamma_m}\right)^{(m)} }{ \left(\frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}} {\gamma_m}\right)^{(m)} } \frac{ \left(\frac{r_m}{\gamma_m} \right)^m }{ m! } {m \choose n} (-1)^{m - n} \right]. \] Since the summation indices are “dummy” variables, it does not matter what we call them. We can simply relabel \(m \leftrightarrow n\) and conclude that our distribution takes the form \[ P(m) = \sum_{n=m}^\infty \frac{ \left(\frac{k^{(p)}_{\text{on}}}{\gamma_m}\right)^{(n)} }{ \left(\frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}} {\gamma_m}\right)^{(n)} } \frac{ \left(\frac{r_m}{\gamma_m} \right)^n }{ n! } \frac{n!}{m! (n - m)!} (-1)^{n - m}. \label{eq:prob_ss_1} \]

We can simplify Eq. \(\ref{eq:prob_ss_1}\) further. First, we split the term \((-1)^{n-m} = (-1)^{-m} (-1)^{n}\). Furthermore, we absorb the \((-1)^{n}\) term into the \((r_m / \gamma_m)^n\) term. We also cancel the obvious \(n!/n!\) term, obtaining \[ P(m) = \sum_{n = m}^\infty \frac{(-1)^{-m}}{m!} \frac{ \left( \frac{k^{(p)}_{\text{on}}}{\gamma_m}\right)^{(n)} }{ \left( \frac{k^{(p)}_{\text{on}}+ k^{(p)}_{\text{off}}} {\gamma_m}\right)^{(n)} } \frac{\left( - \frac{r_m}{\gamma_m}\right)^n}{(n - m)!}. \label{eq:prob_ss_2} \] We recognize in Eq. \(\ref{eq:prob_ss_2}\) that we have almost all the terms for a confluent hypergeometric function \({}_1F_1\). The problem is that the sum starts at \(n = m\) rather than \(n = 0\). Since the upper limit of the sum is \(\infty\), we can simply define \(u = n - m \Rightarrow n = m + u\). We can then use the following property of rising factorials \[ \begin{split} a^{(n)} &=a(a+1)(a+2) \cdots(a+n-1), \\ &=a(a+1)(a+2) \cdots(a+(u+m)-1), \\ &=a(a+1) \cdots(a+m-1)(a+m)(a+m+1) \cdots(a+m+u-1), \\ &=a^{(m)}(a+m)^{(u)}. \end{split} \] Making these substitutions results in \[ P(m) = \sum_{u=0}^{\infty} \frac{(-1)^{-m}}{m !} \frac{ \left(\frac{k^{(p)}_{\text{on}}}{\gamma_m} \right)^{(m)} \left(\frac{k^{(p)}_{\text{on}}}{\gamma_m} + m \right)^{(u)} \left(-\frac{r_m}{\gamma_m}\right)^{u} \left(-\frac{r_m}{\gamma_m}\right)^{m} }{ \left( \frac{ k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}} }{ \gamma_m } \right)^{(m)} \left( \frac{ k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}} }{ \gamma_m } + m \right)^{(u)}} \frac{1}{u!}. \] Taking out of the sum the terms that do not depend on \(u\) gives \[ P(m) = \frac{(-1)^{-m}}{m!} \frac{ \left( \frac{ k^{(p)}_{\text{on}} }{ \gamma_m } \right)^{(m)} }{ \left( \frac{ k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}} }{ \gamma_m } \right)^{(m)} } \left(- \frac{r_m}{\gamma_m}\right)^m \left[ \sum_{u=0}^{\infty} \frac{ \left( \frac{ k^{(p)}_{\text{on}} }{ \gamma_m } + m \right)^{(u)} }{ \left( \frac{ k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}} }{ \gamma_m } + m \right)^{(u)} } \frac{ \left(- \frac{r_m}{\gamma_m}\right)^u }{u!} \right]. \] We recognize the term in the square brackets to be the necessary component for a confluent hypergeometric function. We can therefore write the mRNA steady-state distribution as \[ P(m) = \frac{1}{m!} \frac{ \left( \frac{ k^{(p)}_{\text{on}} }{ \gamma_m } \right)^{(m)} }{ \left( \frac{ k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}} }{ \gamma_m } \right)^{(m)} } \left(\frac{r_m}{\gamma_m}\right)^m {}_1F_1 \left( \frac{ k^{(p)}_{\text{on}} }{ \gamma_m } + m, \frac{ k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}} }{ \gamma_m } + m, - \frac{r_m}{\gamma_m} \right). \] For the last ingredient, we remove the rising factorials using the identity \[ \begin{split} a^{(m)} &=(a)(a+1)(a+2) \cdots(a+m-1), \\ &=\frac{(a+m-1) \cdots(a)(a-1) \cdots (1)}{(a-1) \cdots(1)}, \\ &=\frac{(a+m-1) !}{(a-1) !}. \end{split} \] This allows us to write \[ \begin{split} P(m) &= \frac{1}{m!} \frac{ \left( \frac{k^{(p)}_{\text{on}}}{\gamma_m} + m - 1 \right) ! }{ \left( \frac{k^{(p)}_{\text{on}}}{\gamma_m} - 1 \right) ! } \frac{ \left( \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m} - 1 \right) ! }{ \left( \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m} + m - 1 \right) ! } \left( \frac{r_m}{\gamma_m} \right)^m \\ &\times {}_1F_1 \left( \frac{ k^{(p)}_{\text{on}} }{ \gamma_m } + m, \frac{ k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}} }{ \gamma_m } + m, - \frac{r_m}{\gamma_m} \right). 
\end{split} \] Or, in terms of Gamma functions, we obtain the final form of the steady-state mRNA distribution \[ {\scriptstyle P(m) = \frac{1}{\Gamma(m + 1)} \frac{ \Gamma \left( \frac{k^{(p)}_{\text{on}}}{\gamma_m} + m \right) }{ \Gamma \left( \frac{k^{(p)}_{\text{on}}}{\gamma_m} \right) } \frac{ \Gamma \left( \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m} \right) }{ \Gamma \left( \frac{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}{\gamma_m} + m \right) } \left( \frac{r_m}{\gamma_m} \right)^m \times {}_1F_1 \left( \frac{ k^{(p)}_{\text{on}} }{ \gamma_m } + m, \frac{ k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}} }{ \gamma_m } + m, - \frac{r_m}{\gamma_m} \right). } \] This is the equation used to fit the kinetic parameters for the unregulated promoter.
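
As a sketch of how this distribution can be evaluated in practice (assuming SciPy), the snippet below implements the Gamma-function form with `scipy.special.gammaln` and `scipy.special.hyp1f1`, then checks that it normalizes to one and reproduces the mean \(\frac{r_m}{\gamma_m}\frac{k^{(p)}_{\text{on}}}{k^{(p)}_{\text{on}} + k^{(p)}_{\text{off}}}\). The rate values are placeholders, not fitted parameters.

```python
import numpy as np
from scipy.special import gammaln, hyp1f1

# placeholder rates, all in units of the mRNA degradation rate (gamma_m = 1)
kon, koff, rm, gm = 2.0, 5.0, 15.0, 1.0

def log_P(m):
    """Log of the steady-state mRNA distribution for the two-state promoter."""
    a, b = kon / gm, (kon + koff) / gm
    log_prefactor = (gammaln(a + m) - gammaln(a) + gammaln(b) - gammaln(b + m)
                     - gammaln(m + 1) + m * np.log(rm / gm))
    return log_prefactor + np.log(hyp1f1(a + m, b + m, -rm / gm))

m = np.arange(200)
P = np.exp([log_P(mi) for mi in m])
print(P.sum())                                       # should be ~1
print((m * P).sum(), rm / gm * kon / (kon + koff))   # mean vs analytic prediction
```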

Derivation of the Cell Age Distribution

In 1956, E. O. Powell first derived the cell age distribution for a cell population growing steadily in exponential phase [15]. This distribution is of the form \[ P(a) = \ln(2) \cdot 2^{1 - a}, \] where \(a \in [0, 1]\) is the fraction of the cell cycle, 0 being the moment right after the mother cell divides, and 1 being the end of the cell cycle, just before cell division. In this section, we reproduce this derivation, expanding on the details of each step.

For an exponentially growing bacterial culture, the cells satisfy the growth law \[ \frac{dn}{dt} = \mu n, \] where \(n\) is the number of cells, and \(\mu\) is the growth rate in units of time\(^{-1}\). We begin by defining \(P(a)\) to be the probability density function of a cell having age \(a\). At time zero of a culture in exponential growth, i.e., the moment we begin observing the culture rather than the moment it was inoculated, there are \(NP(a)da\) cells with ages in the range \([a, a + da]\). In other words, for \(N \gg 1\) and \(da \ll a\), \[ N P(a \leq x \leq a + da) \approx N P(a)da. \] We now define \[ F(\tau) = \int_\tau^\infty f(\xi) d\xi \] as the fraction of cells whose division time is greater than \(\tau\). In principle, not all cells divide exactly \(\tau\) minutes after birth; rather, the time from birth to division follows a distribution \(f(\tau)\). Empirically, a generalized Gamma distribution has been observed to fit experimental data on cell division times well, but we will worry about this specific point later on.
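
For concreteness, if one models the division-time distribution \(f(\tau)\) with a plain Gamma distribution (a stand-in for the generalized Gamma mentioned above), the fraction \(F(\tau)\) is simply its survival function. A minimal sketch with arbitrary shape and scale parameters, assuming SciPy is available:

```python
import numpy as np
from scipy.stats import gamma

# arbitrary Gamma-distributed division time with mean shape * scale = 30 min
f = gamma(a=20.0, scale=1.5)

tau = np.linspace(0, 80, 5)
F = f.sf(tau)  # F(tau) = integral from tau to infinity of f(xi) dxi
print(dict(zip(tau.round(1), F.round(3))))
```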

From the definition of \(F(\tau)\), we can see that if a cell reaches an age \(a\), the probability of surviving to an age \(a + t\) without dividing is given by \[ P(\text{age} = (a + t) \mid \text{age} = a) = F(a + t \mid a) = \frac{F(a + t)}{F(a)}. \label{eq:p_age_cond} \] This result comes simply from the definition of conditional probability. Since \(F(a)\) is the probability of surviving \(a\) or more minutes without dividing, we have that \[ F(a + t \mid a) = \frac{F(a, a + t)}{F(a)}, \] where \(F(a, a + t)\) is the joint probability of surviving \(a\) minutes and \(a + t\) minutes. But surviving \(a + t\) minutes or more implies that the cell already survived \(a\) minutes; the information is therefore redundant, and we have \[ F(a, a + t) = F(a + t). \] This explains Eq. \(\ref{eq:p_age_cond}\). From this equation, we can find that out of the \(N P(a)da\) cells with age \(a\), only a fraction \[ \left[ NP(a)da \right] F(a + t \mid a) = NP(a) \frac{F(a + t)}{F(a)} da \] will survive without dividing until time \(a + t\). During that time interval \(t\), the culture has grown from \(N\) cells to \(N e^{\mu t}\) cells, given the assumption of exponential growth. The survivors \(NP(a)F(a + t \mid a)da\) then represent a fraction of the total number of cells \[ \frac{\text{\# survivors}}{\text{\# total cells}} = \frac{\left[ NP(a)da \right] F(a + t \mid a)}{Ne^{\mu t}} = P(a)\frac{F(a + t)}{F(a)}da \frac{1}{e^{\mu t}}, \] and their ages lie in the range \([a+t, a+t+da]\). Since we assume that the culture is at steady state, it follows that the fraction of cells that transitioned from age \(a\) to age \(a + t\) must be \(P(a + t)da\). Therefore, we have a difference equation (the discrete analog of a differential equation) of the form \[ P(a + t) da = P(a) \frac{F(a + t)}{F(a)}e^{-\mu t} da. \] This equation connects the probability of having a lifetime of \(a + t\) with the probability of having a shorter lifetime \(a\) and the growth of the population. If we take \(t\) to be very small, specifically if we assume \(t \ll \mu^{-1}\), we can Taylor expand around \(a\) the following terms: \[ F(a + t) \approx F(a) + \frac{dF}{da} t, \] \[ P(a + t) \approx P(a) + \frac{dP}{da} t, \] and \[ e^{-\mu t} \approx 1 - \mu t. \] Substituting these expansions gives \[ P(a) + \frac{dP}{da} t = P(a) \left( \frac{F(a) + \frac{dF}{da}t}{ F(a)} \right) (1 - \mu t). \] This can be rewritten as \[ \frac{1}{P(a)} \frac{dP}{da} = \frac{1}{F(a)} \frac{dF}{da} - \mu - \frac{\mu t}{F(a)} \frac{dF}{da}. \] Since we assumed \(t \ll \mu^{-1}\), we approximate the last term to be negligible. We can then simplify this result into \[ \frac{1}{P(a)} \frac{dP}{da} = \frac{1}{F(a)} \frac{dF}{da} - \mu. \] Integrating both sides of the equation with respect to \(a\) gives \[ \ln P(a) = \ln F(a) - \mu a + C, \] where \(C\) is the integration constant. Exponentiating both sides gives \[ P(a) = C' F(a)e^{-\mu a}, \] where \(C' \equiv e^C\). To obtain the unknown constant, we recall that \(F(0) = 1\), since the probability of having a division time greater than or equal to zero must be one. Therefore, we have that \(P(0) = C'\). This gives \[ P(a) = P(0) e^{-\mu a} F(a). \] Substituting the definition of \(F(a)\) gives \[ P(a) = P(0) e^{-\mu a} \int_a^\infty f(\xi) d\xi. 
\] The last step of the derivation involves writing \(P(0)\) and the growth rate \(\mu\) in terms of the cell cycle length distribution \(f(\tau)\).

The growth rate of the population cell number (not the growth of cell mass) is defined as the number of cell doublings per unit time divided by the number of cells. This is clearer to see if we write it as a finite difference \[ \frac{N(t + \Delta t) - N(t)}{\Delta t} = \mu N(t). \] If the time \(\Delta t\) is the time interval it takes to go from \(N\) to \(2N\) cells, we have \[ \frac{2N - N}{\Delta t} = \mu N. \] Solving for \(\mu\) gives \[ \mu = \overbrace{\frac{2N - N}{\Delta t}} ^{\text{\# doubling events per unit time}} \overbrace{\frac{1}{N}}^{\frac{1}{\text{population size}}}. \] We defined \(F(a)\) to be the probability of a cell reaching an age \(a\) or greater. For a cell to reach an age \(a + da\), we can then write \[ F(a + da) = \int_{a + da}^{\infty} f(\xi) d\xi = \int_a^{\infty} f(\xi) d\xi - \int_a^{a + da} f(\xi) d\xi. \] We can approximate the second term on the right-hand side to be \[ \int_a^{a + da} f(\xi) d\xi \approx f(a) da, \] for \(da \ll a\), obtaining \[ F(a + da) \approx F(a) - f(a)da. \] What this means is that from the original fraction of cells \(F(a)\) with age \(a\) or greater, a fraction \(f(a)da / F(a)\) will not reach age \(a + da\) because they will divide. So, out of the \(NP(a)\) cells that reached exactly age \(a\), the number of doubling events on a time interval \(da\) is given by \[ {\text{\# doublings of cells of age } a \text{ on interval } da} = \overbrace{NP(a)}^{\text{\# cells of age }a} \overbrace{\frac{f(a) da}{F(a)}}^{\text{fraction that divide in } da}. \] The growth rate is then just the sum (integral) of each age's contribution to the total number of doublings. This is \[ \mu = \frac{1}{N} \int_0^\infty NP(a) \frac{f(a)da}{F(a)}. \] Substituting \(P(a) = P(0) e^{-\mu a} F(a)\) gives \[ \mu = \int_0^\infty [P(0) e^{-\mu a} F(a)] \frac{f(a)da}{F(a)} = \int_0^\infty P(0) e^{-\mu a} f(a)da. \] We now have the growth rate \(\mu\) written in terms of the cell cycle length probability distribution \(f(a)\) and the probability \(P(0)\). Since \(P(a)\) is a probability distribution, it must be normalized, i.e., \[ \int_0^\infty P(a) da = 1. \] Substituting into this normalization constraint gives \[ \int_0^\infty P(0) e^{-\mu a} F(a) da = 1. \] From here, we can integrate the left-hand side by parts. We note that given the definition of \(F(a)\), the derivative with respect to \(a\) is \(-f(a)\) rather than \(f(a)\). This is because if we write the derivative of \(F(a)\), we have \[ \frac{dF(a)}{da} \equiv \lim_{da \rightarrow 0} \frac{F(a + da) - F(a)}{da}. \] Substituting the definition of \(F(a)\) gives \[ \frac{dF(a)}{da} = \lim_{da \rightarrow 0} \frac{1}{da} \left[\int_{a + da}^\infty f(\xi) d\xi - \int_a^\infty f(\xi) d\xi \right]. \] This difference of integrals can be simplified to \[ \lim_{da \rightarrow 0} \frac{1}{da} \left[ \int_{a + da}^\infty f(\xi) d\xi - \int_a^\infty f(\xi) d\xi \right]\approx \frac{-f(a)da}{da} = -f(a). \] Taking this into account, we now perform the integration by parts, obtaining \[ P(0) \left[ \frac{e^{-\mu a}}{-\mu} F(a) \right]^\infty_0 - P(0) \int_0^\infty \frac{e^{-\mu a}}{-\mu} (-f(a)) da = 1. \] For the first term on the left-hand side, we have that, as \(a \rightarrow \infty\), both \(e^{-\mu a}\) and \(F(a)\) go to zero. We also have that \(e^{-\mu \cdot 0} = 1\) and \(F(0) = 1\). This results in \[ \frac{P(0)}{\mu} - P(0) \int_0^\infty \frac{e^{-\mu a}}{\mu} f(a) da = 1. 
\] The second term on the left-hand side is equal to \(1\), since \[ \mu = \int_0^\infty P(0) e^{-\mu a} f(a)da \Rightarrow 1 = \int_0^\infty P(0) \frac{e^{-\mu a}}{\mu} f(a)da. \label{eq:ch5_eq202} \] This implies that \[ \frac{P(0)}{\mu} - 1 = 1 \Rightarrow P(0) = 2 \mu. \] With this result in hand, we can write the age distribution as \[ P(a) = 2\mu e^{-\mu a} \int_a^\infty f(\xi) d\xi. \] We can also rewrite the result for the growth rate \(\mu\) as \[ \mu = 2 \mu \int_0^\infty e^{-\mu a} f(a) da \Rightarrow 2 \int_0^\infty e^{-\mu a} f(a) da = 1. \]
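
These relations can be checked numerically for a non-degenerate division-time distribution. The sketch below (assuming SciPy) reuses the illustrative Gamma-distributed \(f(\tau)\) from above, solves \(2\int_0^\infty e^{-\mu a} f(a)\,da = 1\) for \(\mu\) with a root finder, and verifies that \(P(a) = 2\mu e^{-\mu a} F(a)\) integrates to one.

```python
import numpy as np
from scipy.stats import gamma
from scipy.integrate import quad
from scipy.optimize import brentq

f = gamma(a=20.0, scale=1.5)  # illustrative division-time distribution, mean 30 min

def condition(mu):
    # 2 * E[exp(-mu * tau)] - 1; its root is the population growth rate
    return 2 * quad(lambda a: np.exp(-mu * a) * f.pdf(a), 0, np.inf)[0] - 1

mu = brentq(condition, 1e-4, 1.0)

# P(a) = 2 mu exp(-mu a) F(a) should be a normalized probability density
norm, _ = quad(lambda a: 2 * mu * np.exp(-mu * a) * f.sf(a), 0, np.inf)
print(mu, np.log(2) / f.mean(), norm)  # mu is close to ln(2)/tau_d; norm is ~1
```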

As mentioned before, the distribution \(f(a)\) has been empirically fit to a generalized Gamma distribution. But if we assume that our distribution has negligible dispersion around the mean doubling time \(a = \tau_d\), we can approximate \(f(a)\) as \[ f(a) = \delta(a - \tau_d), \] a Dirac delta function. Applying this to Eq. \(\ref{eq:ch5_eq202}\) results in \[ 2 \int_0^\infty e^{-\mu a} \delta(a - \tau_d) da = 1 \Rightarrow 2 e^{-\mu \tau_d} = 1. \] Solving for \(\mu\) gives \[ \mu = \frac{\ln 2}{\tau_d}. \] This delta function approximation for \(f(a)\) has as a consequence that \[ F(a) = \begin{cases} 1 \text{ for } a \in [0, \tau_d],\\ 0 \text{ for } a > \tau_d. \end{cases} \] Finally, we can write \[ P(a) = 2 \left( \frac{\ln 2}{\tau_d} \right) e^{- \frac{\ln 2}{\tau_d} a} \int_a^\infty \delta(\xi - \tau_d) d\xi = \frac{2 \ln 2}{\tau_d} \, 2^{-\frac{a}{\tau_d}} \;\; \text{for } a \in [0, \tau_d]. \] Simplifying this, we obtain \[ P(a) = \begin{cases} \frac{\ln 2}{\tau_d} \cdot 2^{1 - \frac{a}{\tau_d}} \text{ for } a \in [0, \tau_d],\\ 0 \text{ otherwise}. \end{cases} \] Measuring the age \(a\) as a fraction of the cell cycle (equivalently, setting \(\tau_d = 1\)) gives \(P(a) = \ln 2 \cdot 2^{1 - a}\) for \(a \in [0, 1]\). This is the result we aimed to derive: the distribution of cell ages over the cell cycle.
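
A one-line check of this delta-function limit (assuming SciPy, with an arbitrary placeholder doubling time): with \(\mu = \ln 2 / \tau_d\), the density \(\frac{\ln 2}{\tau_d} 2^{1 - a/\tau_d}\) integrates to one over a cell cycle, and \(2 e^{-\mu \tau_d} = 1\) holds identically.

```python
import numpy as np
from scipy.integrate import quad

tau_d = 30.0  # placeholder doubling time (minutes)
mu = np.log(2) / tau_d

norm, _ = quad(lambda a: np.log(2) / tau_d * 2 ** (1 - a / tau_d), 0, tau_d)
print(norm, 2 * np.exp(-mu * tau_d))  # both values should equal 1
```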

References

[1]
A. Sanchez, S. Choubey, and J. Kondev, Stochastic models of transcription: From single molecules to single cells, Methods 62, 13 (2013).
[2]
J. Peccoud and B. Ycart, Markovian modeling of gene-product synthesis, Theoretical Population Biology 48, 222 (1995).
[3]
V. Shahrezaei and P. S. Swain, Analytical distributions for stochastic gene expression, Proceedings of the National Academy of Sciences 105, 17256 (2008).
[4]
D. L. Jones, R. C. Brewster, and R. Phillips, Promoter architecture dictates cell-to-cell variability in gene expression, Science 346, 1533 (2014).
[5]
H. G. Garcia and R. Phillips, Quantitative dissection of the simple repression input-output function, Proceedings of the National Academy of Sciences 108, 12173 (2011).
[6]
R. C. Brewster, F. M. Weinert, H. G. Garcia, D. Song, M. Rydenfelt, and R. Phillips, The transcription factor titration effect dictates level of gene expression, Cell 156, 1312 (2014).
[7]
M. Razo-Mejia, S. L. Barnes, N. M. Belliveau, G. Chure, T. Einav, M. Lewis, and R. Phillips, Tuning transcriptional regulation through signaling: A predictive theory of allosteric induction, Cell Systems 6, 456 (2018).
[8]
D. J. MacKay, Information theory, inference and learning algorithms (Cambridge University Press, 2003).
[9]
R. C. Brewster, D. L. Jones, and R. Phillips, Tuning promoter strength through RNA polymerase binding site design in Escherichia coli, PLoS Computational Biology 8, e1002811 (2012).
[10]
L. Bintu, N. E. Buchler, H. G. Garcia, U. Gerland, T. Hwa, J. Kondev, and R. Phillips, Transcriptional regulation by the numbers: Models, Current Opinion in Genetics & Development 15, 116 (2005).
[11]
S. Klumpp and T. Hwa, Growth-rate-dependent partitioning of RNA polymerases in bacteria, Proceedings of the National Academy of Sciences 105, 20245 (2008).
[12]
M. K. Transtrum, B. B. Machta, K. S. Brown, B. C. Daniels, C. R. Myers, and J. P. Sethna, Perspective: Sloppiness and emergent theories in physics, biology, and beyond, The Journal of Chemical Physics 143, 010901 (2015).
[13]
J. R. Peterson, J. A. Cole, J. Fei, T. Ha, and Z. A. Luthey-Schulten, Effects of DNA replication on mRNA noise, Proceedings of the National Academy of Sciences 112, 15886 (2015).
[14]
H. Bremer and P. P. Dennis, Modulation of chemical composition and other parameters of the cell by growth rate, Escherichia coli and Salmonella: Cellular and Molecular Biology (n.d.).
[15]
E. O. Powell, Growth rate and generation time of bacteria, with special reference to continuous culture, Journal of General Microbiology 15, 492 (1956).
[16]
D. F. Browning and S. J. W. Busby, The regulation of bacterial transcription initiation, Nature Reviews Microbiology 2, 57 (2004).
[17]
J. Yu, Probing gene expression in live cells, one protein molecule at a time, Science 311, 1600 (2006).
[18]
R. Phillips, Napoleon is in equilibrium, Annual Review of Condensed Matter Physics 6, 85 (2015).
[19]
N. E. Buchler, U. Gerland, and T. Hwa, On schemes of combinatorial transcription logic, Proceedings of the National Academy of Sciences 100, 5136 (2003).
[20]
J. M. G. Vilar and L. Saiz, Control of gene expression by modulated self-assembly, Nucleic Acids Research 39, 6854 (2011).
[21]
J. L. Radzikowski, S. Vedelaar, D. Siegel, Á. D. Ortega, A. Schmidt, and M. Heinemann, Bacterial persistence is an active \(\sigma^S\) stress response to metabolic flux limitation, Molecular Systems Biology 12, 882 (2016).
[22]
P. Hammar, M. Walldén, D. Fange, F. Persson, Ö. Baltekin, G. Ullman, P. Leroy, and J. Elf, Direct measurement of transcription factor dissociation excludes a simple operator occupancy model for gene regulation, Nature Genetics 46, 405 (2014).
[23]
U. Moran, R. Phillips, and R. Milo, SnapShot: key numbers in biology, Cell 141, 1262 (2010).
[24]
A. Meurer, C. P. Smith, M. Paprocki, O. Čertík, S. B. Kirpichev, M. Rocklin, A. Kumar, S. Ivanov, J. K. Moore, S. Singh, T. Rathnayake, S. Vig, B. E. Granger, R. P. Muller, F. Bonazzi, H. Gupta, S. Vats, F. Johansson, F. Pedregosa, M. J. Curry, A. R. Terrel, Š. Roučka, A. Saboo, I. Fernando, S. Kulal, R. Cimrman, and A. Scopatz, SymPy: Symbolic Computing in Python, PeerJ Computer Science 3, e103 (2017).
[25]
R. Phillips, N. M. Belliveau, G. Chure, H. G. Garcia, M. Razo-Mejia, and C. Scholes, Figure 1 theory meets figure 2 experiments in the study of gene expression, Annual Review of Biophysics 48, 121 (2019).
[26]
S. L. Barnes, N. M. Belliveau, W. T. Ireland, J. B. Kinney, and R. Phillips, Mapping DNA sequence to transcription factor binding energy in vivo, PLoS Computational Biology 15, e1006226 (2019).
[27]
A. Ale, P. Kirk, and M. P. H. Stumpf, A general moment expansion method for stochastic kinetic models, The Journal of Chemical Physics 138, 174101 (2013).
[28]
A. Andreychenko, L. Bortolussi, R. Grima, P. Thomas, and V. Wolf, Modeling Cellular Systems, Vol. 11 (Cham, 2017).
[29]
F. Fröhlich, P. Thomas, A. Kazeroonian, F. J. Theis, R. Grima, and J. Hasenauer, Inference for stochastic chemical kinetics using moment equations and system size expansion, PLoS Computational Biology 12, e1005030 (2016).
[30]
D. Schnoerr, G. Sanguinetti, and R. Grima, Approximation and inference methods for stochastic biochemical kinetics—a tutorial review, Journal of Physics A: Mathematical and Theoretical 50, 093001 (2017).
[31]
P. Smadbeck and Y. N. Kaznessis, A closure scheme for chemical master equations, Proceedings of the National Academy of Sciences 110, 14261 (2013).
[32]
E. T. Jaynes, Information theory and statistical mechanics, Physical Review 106, 620 (1957).
[33]
C. E. Shannon, A mathematical theory of communication, Bell System Technical Journal 27, 379 (1948).
[34]
G. Tkačik, C. G. Callan, and W. Bialek, Information capacity of genetic regulatory elements, Physical Review E 78, 011910 (2008).
[35]
R. Blahut, Computation of channel capacity and rate-distortion functions, IEEE Transactions on Information Theory 18, 460 (1972).
[36]
R. Cheong, A. Rhee, C. J. Wang, I. Nemenman, and A. Levchenko, Information transduction capacity of noisy biochemical signaling networks, Science 334, 354 (2011).