Chapter IV

Supporting Information for Tuning Transcriptional Regulation through Signaling: A Predictive Theory of Allosteric Induction


A version of this chapter originally appeared as Razo-Mejia, M.†, Barnes, S.L.†, Belliveau, N.M.†, Chure, G.†, Einav, T.†, Lewis, M., and Phillips, R. (2018). Tuning transcriptional regulation through signaling: A predictive theory of allosteric induction. Cell Systems 6, 456-469.e10. DOI:https://doi.org/10.1016/j.cels.2018.02.004.

Abstract

Allosteric regulation is found across all domains of life, yet we still lack simple, predictive theories that directly link the experimentally tunable parameters of a system to its input-output response. To that end, we present a general theory of allosteric transcriptional regulation using the Monod-Wyman-Changeux model. We rigorously test this model using the ubiquitous simple repression motif in bacteria by first predicting the behavior of strains that span a large range of repressor copy numbers and DNA binding strengths, and then constructing and measuring their response. Our model not only accurately captures the induction profiles of these strains, but also enables us to derive analytic expressions for key properties such as the dynamic range and \([EC_{50}]\). Finally, we derive an expression for the free energy of allosteric repressors which enables us to collapse our experimental data onto a single master curve that captures the diverse phenomenology of the induction profiles.

Inferring Allosteric Parameters from Previous Data

The fold-change profile described by Eq. 2.5 features three unknown parameters \(K_A\), \(K_I\), and \(\Delta\varepsilon_{AI}\). In this section, we explore different conceptual approaches to determining these parameters. We first discuss how the induction titration profile of the simple repression constructs used in this paper are not sufficient to determine all three MWC parameters simultaneously, since multiple degenerate sets of parameters can produce the same fold-change response. We then utilize an additional data set from  [1] to determine the parameter \(\Delta\varepsilon_{AI} = 4.5~k_BT\), after which the remaining parameters \(K_A\) and \(K_I\) can be extracted from any induction profile with no further degeneracy.

Degenerate Parameter Values

In this section, we discuss how multiple sets of parameters may yield identical fold-change profiles. More precisely, we shall show that if we try to fit the data into the fold-change and extract the three unknown parameters (\(K_A\), \(K_I\), and \(\Delta\varepsilon_{AI}\)), then multiple degenerate parameter sets would yield equally good fits. In other words, this data set alone is insufficient to determine the actual physical parameter values of the system uniquely. This problem persists even when fitting multiple data sets simultaneously, as we will see later.

In Fig. 1(A), we fit the \(R=260\) data by fixing \(\Delta\varepsilon_{AI}\) to the value shown on the \(x\)-axis and determine the parameters \(K_A\) and \(K_I\) given this constraint. We use the fold-change function, but with \(\beta \Delta\varepsilon_{RA}\) modified to the form \(\beta \Delta\tilde{\varepsilon}_{RA}\) in Eq. 2.5 to account for the underlying assumptions used when fitting previous data (see Section 3.2 for a full explanation of why this modification is needed).

The best-fit curves for several different values of \(\Delta\varepsilon_{AI}\) are shown in Fig. 1(B). Note that these fold-change curves are nearly overlapping, demonstrating that different sets of parameters can yield nearly equivalent responses. Without more data, the relationships between the parameter values shown in Fig. 1(A) represent the maximum information about the parameter values that can be extracted from the data. Additional experiments which independently measure any of these unknown parameters could resolve this degeneracy. For example, NMR measurements could be used to directly measure the fraction \((1 + e^{-\beta \Delta\varepsilon_{AI}})^{-1}\) of active repressors in the absence of IPTG  [2,3].

Figure 1: Multiple sets of parameters yield identical fold-change responses. (A) The data for the O2 strain (\Delta\varepsilon_{RA} = -13.9~k_BT) with R=260 in Fig. 2.4(D) was fit using Eq. \ref{eq:ch4_eq05} with n=2. \Delta\varepsilon_{AI} is forced to take on the value shown on the x-axis, while the K_A and K_I parameters are fit freely. (B) The resulting best-fit functions for several value of \Delta\varepsilon_{AI} all yield nearly identical fold-change responses.

Computing \(\boldsymbol{\Delta\varepsilon_{AI}}\)

As shown in the previous section, the fold-change response of a single strain is not sufficient to determine the three MWC parameters (\(K_A\), \(K_I\), and \(\Delta\varepsilon_{AI}\)), since degenerate sets of parameters yield nearly identical fold-change responses. To circumvent this degeneracy, we now turn to some previous data from the lac system to determine the value of \(\Delta\varepsilon_{AI}\) in Eq. 2.5 for the induction of the Lac repressor. Specifically, we consider two previous sets of work from (1)  [4] and (2)  [1], both of which measured fold-change with the same simple repression system in the absence of inducer (\(c=0\)) but at various repressor copy numbers \(R\). The original analysis for both data sets assumed that in the absence of inducer, all of the Lac repressors were in the active state. As a result, the effective binding energies they extracted were a convolution of the DNA binding energy \(\Delta\varepsilon_{RA}\) and the allosteric energy difference \(\Delta\varepsilon_{AI}\) between the Lac repressor’s active and inactive states. We refer to this convoluted energy value as \(\Delta \tilde{\varepsilon}_{RA}\). We first disentangle the relationship between these parameters in Garcia and Phillips, and then use this relationship to extract the value of \(\Delta\varepsilon_{AI}\) from the Brewster et al. dataset.

Garcia and Phillips determined the total repressor copy numbers \(R\) of different strains using quantitative Western blots. Then they measured the fold-change at these repressor copy numbers for simple repression constructs carrying the O1, O2, O3, and Oid lac operators integrated into the chromosome. These data were then fit to the following thermodynamic model to determine the repressor-DNA binding energies \(\Delta\tilde{\varepsilon}_{RA}\) for each operator, \[ \text{fold-change}(c=0) = \left( 1+\frac{R}{N_{NS}}e^{-\beta \Delta\tilde{\varepsilon}_{RA}} \right)^{-1}. \label{eq:ch4_eq01} \] Note that this functional form does not exactly match our fold-change in the limit \(c=0\), \[ \text{fold-change}(c=0) = \left( 1+\frac{1}{1+e^{-\beta \Delta \varepsilon_{AI}}} \frac{R}{N_{NS}}e^{-\beta \Delta\varepsilon_{RA}} \right)^{-1}, \label{eq:ch4_eq02} \] since it is missing the factor \(\frac{1}{1+e^{-\beta \Delta\varepsilon_{AI}}}\) which specifies what fraction of repressors are in the active state in the absence of inducer, \[ \frac{1}{1+e^{-\beta \Delta\varepsilon_{AI}}} = p_A(0). \label{eq:ch4_eq03} \] In other words, Garcia and Phillips assumed that in the absence of inducer, all repressors were active. In terms of our notation, the convoluted energy values \(\Delta\tilde{\varepsilon}_{RA}\) extracted by Garcia and Phillips (namely, \(\Delta\tilde{\varepsilon}_{RA}=-15.3~k_B T\) for O1 and \(\Delta\tilde{\varepsilon}_{RA}=-17.0~k_B T\) for Oid) represent \[ \beta \Delta\tilde{\varepsilon}_{RA} = \beta \Delta\varepsilon_{RA} - \log \left( \frac{1}{1 + e^{-\beta \Delta \varepsilon_{AI}}} \right). \label{eq:ch4_eq04} \] Note that if \(e^{-\beta \Delta \varepsilon_{AI}} \ll 1\), then nearly all of the repressors are active in the absence of inducer so that \(\Delta\tilde{\varepsilon}_{RA} \approx \Delta\varepsilon_{RA}\). In simple repression systems where we definitively know the value of \(\Delta \varepsilon_{RA}\) and \(R\), we can use Eq. \(\ref{eq:ch4_eq02}\) to determine the value of \(\Delta \varepsilon_{AI}\) by comparing with experimentally determined fold-change values. However, the binding energy values that we use from  [4] are effective parameters \(\Delta\tilde{\varepsilon}_{RA}\). In this case, we are faced with an undetermined system in which we have more variables than equations, and we are thus unable to determine the value of \(\Delta \varepsilon_{AI}\). To obtain this parameter, we must turn to a more complex regulatory scenario which provides additional constraints that allow us to fit for \(\Delta \varepsilon_{AI}\).

A variation on simple repression in which multiple copies of the promoter are available for repressor binding (for instance, when the simple repression construct is on a plasmid) can be used to circumvent the problems that arise when using \(\Delta \tilde{\varepsilon}_{RA}\). This is because the behavior of the system is distinctly different when the number of active repressors \(p_A(0) R\) is less than or greater than the number of available promoters \(N\). Repression data for plasmids with known copy number \(N\) allows us to perform a fit for the value of \(\Delta\varepsilon_{AI}\).

To obtain an expression for a system with multiple promoters \(N\), we follow  [5], writing the fold-change in terms of the grand canonical ensemble as \[ \text{fold-change} = \frac{1}{1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}}, \label{eq:ch4_eq05} \] where \(\lambda_r = e^{\beta \mu}\) is the fugacity, and \(\mu\) is the chemical potential of the repressor. The fugacity will enable us to enumerate the possible states available to the repressor easily.

To determine the value of \(\lambda_r\), we first consider that the total number of repressors in the system, \(R_{\text{tot}}\), is fixed and given by \[ R_{\text{tot}} = R_S + R_{NS}, \label{eq:ch4_eq06} \] where \(R_S\) represents the number of repressors specifically bound to the promoter and \(R_{NS}\) represents the number of repressors non-specifically bound throughout the genome. The value of \(R_S\) is given by \[ R_S = N \frac{\lambda_r e^{-\beta \Delta \varepsilon_{RA}}} {1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}}, \label{eq:ch4_eq07} \] where \(N\) is the number of available promoters in the cell. Note that in counting \(N\), we do not distinguish between promoters that are on plasmid or chromosomally integrated provided that they both have the same repressor-operator binding energy  [5]. The value of \(R_{NS}\) is similarly give by \[ R_{NS} = N_{NS} \frac{\lambda_r}{1 + \lambda_r}, \label{eq:ch4_eq08} \] where \(N_{NS}\) is the number of non-specific sites in the cell (recall that we use \(N_{NS} = 4.6 \times 10^6\) for E. coli).

Substituting Eq. \(\ref{eq:ch4_eq07}\) and \(\ref{eq:ch4_eq08}\) in Eq. \(\ref{eq:ch4_eq06}\) into the modified \(R_A\) yields the form \[ p_A(0) R_{\text{tot}} = \frac{1}{1 + e^{-\beta \Delta \varepsilon_{AI}}} \left( N \frac{\lambda_r e^{-\beta \Delta \varepsilon_{RA}}} {1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}} + N_{NS} \frac{\lambda_r}{1 + \lambda_r} \right), \label{eq:ch4_eq09} \] where we recall from Eq. \(\ref{eq:ch4_eq04}\) that \(\beta \Delta \varepsilon_{RA} = \beta \Delta \tilde\varepsilon_{RA} + \log{\left(\frac{1}{1 + e^{-\beta \Delta \varepsilon_{AI}}}\right)}.\) Numerically solving for \(\lambda_r\) and plugging the value back into Eq. \(\ref{eq:ch4_eq05}\) yields a fold-change function in which the only unknown parameter is \(\Delta \varepsilon_{AI}\).

With these calculations in hand, we can now determine the value of the \(\Delta \varepsilon_{AI}\) parameter. Fig. 5(A) shows how different values of \(\Delta\varepsilon_{AI}\) lead to significantly different fold-change response curves. Thus, analyzing the specific fold-change response of any strain with a known plasmid copy number \(N\) will fix \(\Delta\varepsilon_{AI}\). Interestingly, the inflection point of Eq. \(\ref{eq:ch4_eq09}\) occurs near \(p_A(0) R_{\text{tot}} = N\) (as shown by the triangles in Fig. 5(A)), so that merely knowing where the fold-change response transitions from concave down to concave up is sufficient to obtain a rough value for \(\Delta\varepsilon_{AI}\). We note, however, that for \(\Delta\varepsilon_{AI} \geq 5\; k_BT\), increasing \(\Delta\varepsilon_{AI}\) further does not affect the fold-change because essentially every repressors will be in the active state in this regime. Thus, if the \(\Delta\varepsilon_{AI}\) is in this regime, we can only bound it from below.

Figure 2: Fold-change of multiple identical genes. (A) In the presence of N=10 identical promoters, the fold-change Eq. \ref{eq:ch4_eq06} depends strongly on the allosteric energy difference \Delta\varepsilon_{AI} between the Lac repressor’s active and inactive states. The vertical dotted lines represent the number of repressors at which R_A = N for each value of \Delta \varepsilon_{AI}. (B) Using fold-change measurements from  [1] for the operators and gene copy numbers shown, we can determine the most likely value \Delta\varepsilon_{AI} = 4.5~k_BT for LacI.

We now analyze experimental induction data for different strains with known plasmid copy numbers to determine \(\Delta\varepsilon_{AI}\). Fig. 5(B) shows experimental measurements of fold-change for two O1 promoters with \(N=64\) and \(N=52\) copy numbers and one Oid promoter with \(N=10\) from  [1]. By fitting these data to Eq. \(\ref{eq:ch4_eq05}\), we extracted the parameter value \(\Delta\varepsilon_{AI} = 4.5~k_B T\). Substituting this value into Eq. \(\ref{eq:ch4_eq03}\) shows that 99% of the repressors are in the active state in the absence of inducer and \(\Delta\tilde{\varepsilon}_{RA} \approx \Delta\varepsilon_{RA}\) so that all of the previous energies and calculations made by  [1,4] were accurate.

Induction of Simple Repression with Multiple Promoters or Competitor Sites

We made the choice to perform all of our experiments using strains in which a single copy of our simple repression construct had been integrated into the chromosome. This stands in contrast to the methods used by a number of other studies  [613], in which reporter constructs are placed on a plasmid, meaning that the number of constructs in the cell is not precisely known. It is also common to express repressor on plasmid to boost its copy number, which results in an uncertain value for repressor copy number. Here we show that our treatment of the MWC model has broad predictive power beyond the single-promoter scenario we explore experimentally. Indeed, we can account for systems in which multiple promoters compete for the repressor of interest. Additionally, we demonstrate the importance of precise control over these parameters, as they can significantly affect the induction profile.

Chemical Potential Formulation to Calculate Fold-Change

This section discusses a simple repression construct that we generalize in two ways from the scenario discussed in the text. First, we will allow the repressor to bind to \(N_S\) identical specific promoters whose fold-change we are interested in measuring. Each promoter contains a single repressor binding site (\(N_S = 1\) in Chapter 2). Second, we consider \(N_C\) identical competitor sites which do not regulate the promoter of interest, but whose binding energies are substantially stronger than non-specific binding (\(N_C = 0\) in Chapter 2). As in Chapter 2, we assume that the rest of the genome contains \(N_{NS}\) non-specific binding sites for the repressor. We can write the fold-change in the grand canonical ensemble as \[ \text{fold-change} = \frac{1}{1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}}, \label{eq:ch4_eq10} \] where \(\lambda_r\) is the fugacity of the repressor and \(\Delta \varepsilon_{RA}\) represents the energy difference between the repressor’s binding affinity to the specific operator of interest relative to the repressor’s non-specific binding affinity to the rest of the genome.

We now expand our definition of the total number of repressors in the system, \(R_{\text{tot}}\), so that it is given by \[ R_{\text{tot}} = R_S + R_{NS} + R_C, \label{eq:ch4_eq11} \] where \(R_S\), \(R_{NS}\), and \(R_C\) represent the number of repressors bound to the specific promoter, a non-specific binding site, or a competitor binding site, respectively. The value of \(R_S\) is given by \[ R_S = N_S \frac{\lambda_r e^{-\beta \Delta \varepsilon_{RA}}} {1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}}, \label{eq:ch4_eq12} \] where \(N_S\) is the number of specific binding sites in the cell. The value of \(R_{NS}\) is similarly given by \[ R_{NS} = N_{NS} \frac{\lambda_r}{1 + \lambda_r}, \qquad(1)\] where \(N_{NS}\) is the number of non-specific sites in the cell (recall that we use \(N_{NS} = 4.6 \times 10^6\) for E. coli), and \(R_C\) is given by \[ R_C = N_C \frac{\lambda_r e^{-\beta \Delta \varepsilon_C}} {1 + \lambda_r e^{-\beta \Delta \varepsilon_C}}, \label{eq:ch4_eq14} \] where \(N_C\) is the number of competitor sites in the cell and \(\Delta \varepsilon_C\) is the binding energy of the repressor to the competitor site relative to its non-specific binding energy to the rest of the genome.

To account for the induction of the repressor, we replace the total number of repressors \(R_{\text{tot}}\) in Eq. \(\ref{eq:ch4_eq11}\) by the number of active repressors in the cell, \(p_A(c) R_{\text{tot}}\). Here, \(p_A\) denotes the probability that the repressor is in the active state (Eq. \(\ref{eq:ch4_eq13}\)), \[ p_A(c) = \frac{\left(1+\frac{c}{K_A}\right)^n} {\left(1+\frac{c}{K_A}\right)^n+e^{-\beta \Delta \varepsilon_{AI} } \left(1+\frac{c}{K_I}\right)^n}. \label{eq:ch4_eq15} \] Substituting Eq. \(\ref{eq:ch4_eq15}\) in Eqs. \(\ref{eq:ch4_eq12}\)-\(\ref{eq:ch4_eq14}\) into the modified Eq. \(\ref{eq:ch4_eq11}\) yields the form \[ p_A(c) R_{\text{tot}} = N_S \frac{\lambda_r e^{-\beta \Delta \varepsilon_{RA}}} {1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}} + N_{NS} \frac{\lambda_r}{1 + \lambda_r} + N_C \frac{\lambda_r e^{-\beta \Delta \varepsilon_C}} {1 + \lambda_r e^{-\beta \Delta \varepsilon_C}}. \label{eq:ch4_eq16} \] For systems where the number of binding sites \(N_S\), \(N_{NS}\), and \(N_C\) are known, together with the binding affinities \(\Delta \varepsilon_{RA}\) and \(\Delta \varepsilon_C\), we can solve numerically for \(\lambda_r\), and then substitute it into \(\ref{eq:ch4_eq10}\) to obtain a fold-change at any concentration of inducer \(c\). In the following sections, we will theoretically explore the induction curves given by Eq. \(\ref{eq:ch4_eq16}\) for a number of different combinations of simple repression binding sites, thereby predicting how the system would behave if additional specific or competitor binding sites were introduced.

Variable Repressor Copy Number (\(\boldsymbol{R}\)) with Multiple Specific Binding Sites (\(\boldsymbol{N_S > 1}\))

In Chapter 2, we consider the induction profiles of strains with varying \(R\) but a single, specific binding site \(N_S = 1\) (see Fig. ¿fig:ch2_fig05?). Here we predict the induction profiles for similar strains in which \(R\) is varied, but \(N_S > 1\), as shown in Fig. 3. The top row shows induction profiles in which \(N_S = 10\) and the bottom row shows profiles in which \(N_S = 100\), assuming three different choices for the specific operator binding sites given by the O1, O2, and O3 operators. These values of \(N_S\) were chosen to mimic the common scenario in which a promoter construct is placed on either a low or high copy number plasmid. A few features stand out in these profiles. First, as the magnitude of \(N_S\) surpasses the number of repressors \(R\), the leakiness begins to increase significantly since there are no longer enough repressors to regulate all copies of the promoter of interest. Second, in the cases where \(\Delta \varepsilon_{RA} = -15.3~k_B T\) for the O1 operator or \(\Delta \varepsilon_{RA} = -13.9~k_B T\) for the O2 operator, the profiles where \(N_S = 100\) are notably sharper than the profiles where \(N_S = 10\), and it is possible to achieve dynamic ranges approaching 1. Finally, it is interesting to note that the profiles for the O3 operator where \(\Delta \varepsilon_{RA} = -9.7~k_B T\) are nearly indifferent to the value of \(N_S\).

Figure 3: Induction with variable \boldsymbol{R} and multiple specific binding sites. Induction profiles are shown for strains with variable R and \Delta \varepsilon_{RA} = -15.3, -13.9, or -9.7~k_B T. The number of specific sites, N_S, is held constant at ten as R and \Delta \varepsilon_{RA} are varied. N_S is held constant at 100 as R and \Delta \varepsilon_{RA} are varied. These situations mimic the common scenario in which a promoter construct is placed on either a low or high copy number plasmid.

Variable Number of Specific Binding Sites \(\boldsymbol{N_S}\) with Fixed Repressor Copy Number (\(\boldsymbol{R}\))

The second set of scenarios we consider is when the repressor copy number \(R=260\) is held constant while the number of specific promoters \(N_S\) is varied (see Fig. 4). Again we see that leakiness is increased significantly when \(N_S > R\), though all profiles for \(\Delta \varepsilon_{RA} = -9.7~k_BT\) exhibit high leakiness, making the effect less dramatic for this operator. Additionally, we find again that adjusting the number of specific sites can produce induction profiles with maximal dynamic ranges. In particular, the O1 and O2 profiles with \(\Delta \varepsilon_{RA} = -15.3\) and \(-13.9~k_BT\), respectively, have dynamic ranges approaching 1 for \(N_S = 50\) and 100.

Figure 4: Induction with variable specific sites and fixed \boldsymbol{R}. Induction profiles are shown for strains with R=260 and \Delta \varepsilon_{RA} = -15.3~k_BT, \Delta \varepsilon_{RA} = -13.9~k_BT, or \Delta \varepsilon_{RA} = -9.7~k_BT. The number of specific sites N_S is varied from 1 to 500.

Competitor Binding Sites

An intriguing scenario is presented by the possibility of competitor sites elsewhere in the genome. This serves as a model for situations in which a promoter of interest is regulated by a transcription factor that has multiple targets. This is highly relevant, as the majority of transcription factors in E. coli have at least two known binding sites, with approximately 50 transcription factors having more than ten known binding sites  [14,15]. If the number of competitor sites and their average binding energy is known, they can be accounted for in the model. Here, we predict the induction profiles for strains in which \(R=260\) and \(N_S=1\), but a variable number of competitor sites \(N_C\) with strong binding energy \(\Delta \varepsilon_C = -17.0~k_BT\). In the presence of such a strong competitor, when \(N_C > R\), the leakiness is greatly increased, as many repressors are siphoned into the pool of competitor sites. This is most dramatic for the case where \(\Delta \varepsilon_{RA} = -9.7~k_B T\), in which it appears that no repression occurs at all when \(N_C = 500\). Interestingly, when \(N_C < R\), the effects of the competitor are not especially notable.

Figure 5: Induction with variable competitor sites, a single specific site, and fixed \boldsymbol{R}. Induction profiles are shown for strains with R=260, N_s=1, and \Delta \varepsilon_{RA} = -15.3~k_B T for the O1 operator, \Delta \varepsilon_{RA} = -13.9~k_B T for the O2 operator, or \Delta \varepsilon_{RA} = -9.7~k_B T for the O3 operator. The number of specific sites, N_C, is varied from 1 to 500. This mimics the common scenario in which a transcription factor has multiple binding sites in the genome.

Properties of the Induction Response

As discussed in the main body of the paper, our treatment of the MWC model allows us to predict key properties of induction responses. Here, we consider the leakiness, saturation, and dynamic range (see Fig. ¿fig:ch2_fig01?) by numerically solving Eq. \(\ref{eq:ch4_eq16}\) in the absence of inducer, \(c=0\), and in the presence of saturating inducer \(c \to \infty\). Using Eq. \(\ref{eq:ch4_eq15}\), the former case is given by \[ R_{\text{tot}} \frac{1}{1 + e^{-\beta \Delta \varepsilon_{AI}}} = N_S \frac{\lambda_r e^{-\beta \Delta \varepsilon_{RA}}} {1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}} + N_{NS} \frac{\lambda_r}{1 + \lambda_r} + N_C \frac{\lambda_r e^{-\beta \Delta \varepsilon_C}} {1 + \lambda_r e^{-\beta \Delta \varepsilon_C}}, \label{eq:ch4_eq17} \] whereupon substituting in the value of \(\lambda_r\) into Eq. \(\ref{eq:ch4_eq10}\) will yield the leakiness. Similarly, the limit of saturating inducer is found by determining \(\lambda_r\) from the form \[ R_{\text{tot}} \frac{1}{1 + e^{-\beta \Delta \varepsilon_{AI}} \left(\frac{K_A}{K_I} \right)^2} = N_S \frac{\lambda_r e^{-\beta \Delta \varepsilon_{RA}}} {1 + \lambda_r e^{-\beta \Delta \varepsilon_{RA}}} + N_{NS} \frac{\lambda_r}{1 + \lambda_r} + N_C \frac{\lambda_r e^{-\beta \Delta \varepsilon_C}} {1 + \lambda_r e^{-\beta \Delta \varepsilon_C}}. \label{eq:ch4_eq18} \]

In Fig. 6, we show how the leakiness, saturation, and dynamic range vary with \(R\) and \(\Delta \varepsilon_{RA}\) in systems with \(N_S =10\) or \(N_S = 100\). An inflection point occurs where \(N_S = R\), with leakiness and dynamic range behaving differently when \(R < N_S\) than when \(R > N_S\). This transition is more dramatic for \(N_S = 100\) than for \(N_S = 10\). Interestingly, the saturation values consistently approach 1, indicating that full induction is easier to achieve when multiple specific sites are present. Moreover, dynamic range values for O1 and O2 strains with \(\Delta \varepsilon_{RA} = -15.3\) and \(-13.9~k_B T\) approach 1 when \(R > N_S\), although when \(N_S = 10\), there is a slight downward dip owing to saturation values of less than 1 at high repressor copy numbers.

Figure 6: Phenotypic properties of induction with multiple specific binding sites. The leakiness (A, D), saturation (B, E), and dynamic range (C, F) are shown for systems with a number of specific binding sites N_S = 10 or N_S = 100 . The dashed vertical line indicates the point at which N_S = R.

In Fig. 7, we similarly show how the leakiness, saturation, and dynamic range vary with \(R\) and \(\Delta \varepsilon_{RA}\) in systems with \(N_S =1\) and multiple competitor sites \(N_C = 10\) or \(N_C = 100\). Each of the competitor sites has a binding energy of \(\Delta \varepsilon_C = -17.0~k_BT\). The phenotypic profiles are very similar to those for multiple specific sites shown in Fig. 7, with sharper transitions at \(R = N_C\) due to the greater binding strength of the competitor site. This indicates that introducing competitors has much the same effect on the induction phenotypes as introducing additional specific sites. In either case, the influence of the repressors is dampened when there are insufficient repressors to interact with all of the specific binding sites.

Figure 7: Phenotypic properties of induction with a single specific site and multiple competitor sites. The leakiness, saturation, and dynamic range are shown for systems with a single specific binding site N_S = 1 and a number of competitor sites N_C = 10 or N_C = 100 . All competitor sites have a binding energy of \Delta \varepsilon_C = -17.0~k_BT. The dashed vertical line indicates the point at which N_C = R.

This section of the appendix gives a quantitative analysis of the nuances imposed on induction response in the case of systems involving multiple gene copies as are found in the vast majority of studies on induction. In these cases, the intrinsic parameters of the MWC model get entangled with the parameters describing gene copy number.

Flow Cytometry

In this section, we provide information regarding the equipment used to make experimental measurements of the fold-change in gene expression in the interests of transparency and reproducibility. We also provide a summary of our unsupervised method of gating the flow cytometry measurements for consistency between experimental runs.

Equipment

Due to past experience using the Miltenyi Biotec MACSQuant flow cytometer during the Physiology summer course at the Marine Biological Laboratory, we used the same flow cytometer for the formal measurements in this work graciously provided by the Pamela Björkman lab at Caltech. All measurements were made using an excitation wavelength of \(488\,\text{nm}\) with an emission filter set of 525/\(50\,\text{nm}\). This excitation wavelength provides approximately 40% of the maximum YFP absorbance  [16], which was sufficient for these experiments. A useful feature of modern flow cytometry is the high-sensitivity signal detection through the use of photomultiplier tubes (PMT), whose response can be tuned by adjusting the voltage. Thus, the voltage for the forward-scatter (FSC), side-scatter (SSC) and gene expression measurements were tuned manually to maximize the dynamic range between autofluorescence signal and maximal expression without losing the details of the population distribution. Once these voltages were determined, they were used for all subsequent measurements. The extremely low signal-producing particles were discarded before data storage by setting a basal voltage threshold, thus removing the majority of spurious events. The various instrument settings for data collection are given in tbl. 1.

Table 1: Instrument settings for data collection using the Miltenyi Biotec MACSQuant flow cytometer. All experimental measurements were collected using these values.
Laser Channel Sensor Voltage
\(488\,\text{nm}\) Forward-Scatter (FSC) \(423\,\text{V}\)
\(488\,\text{nm}\) Side-Scatter (SSC) \(537\,\text{V}\)
\(488\,\text{nm}\) Intensity (B1 Filter, 525/50nm) \(790\,\text{V}\)
\(488\,\text{nm}\) Trigger (debris threshold) \(24.5\,\text{V}\)

Experimental Measurement

Before each day’s experiments, the analyzer was calibrated using MACSQuant Calibration Beads (Cat. No. 130-093-607) such that day-to-day experiments would be comparable. A single data set consisted of seven bacterial strains, all sharing the same operator, with varying repressor copy numbers (\(R = 0\), 22, 60, 124, 260, 1220, and 1740), in addition to an autofluorescent strain, under twelve IPTG concentrations. Data collection took place over two to three hours. During this time, the cultures were held at approximately 4\(^\circ\)C by placing the 96-well plate on a MACSQuant ice block. Because the ice block thawed over the course of the experiment, the samples measured last were approximately at room temperature. This means that samples may have grown slightly by the end of the experiment. To confirm that this continued growth did not alter the measured results, a subset of experiments were run in reverse, meaning that the fully induced cultures were measured first and the uninduced samples last. The plate arrangements and corresponding fold-change measurements are shown in Fig. 8(A) and (B), respectively. The measured fold-change values in the reverse ordered plate appear to be drawn from the same distribution as those measured in the forward order, meaning that any growth that might have occurred during the experiment did not significantly affect the results. Both the forward and reverse data sets were used in our analysis.

Figure 8: Plate arrangements for flow cytometry. (A) Samples were measured primarily in the forward arrangement with a subset of samples measured in reverse. The black arrow indicates the order in which samples were processed by the flow cytometer. (B) The experimentally measured fold-change values for the two sets of plate arrangements show that samples measured in the forward arrangement appear to be indistinguishable from those measured in reverse order.

Unsupervised Gating

Flow cytometry data will frequently include a number of spurious events or other undesirable data points such as cell doublets and debris. The process of restricting the collected data set to those determined to be “real” is commonly referred to as gating. These gates are typically drawn manually  [17] and restrict the data set to those points which display a high degree of linear correlation between their forward-scatter (FSC) and side-scatter (SSC). The development of unbiased and unsupervised methods of drawing these gates is an active area of research  [18,19].

For this study, we used an automatic unsupervised gating procedure to filter the flow cytometry data based on the front and side-scattering values returned by the MACSQuant flow cytometer. We assume that the region with the highest density of points in these two channels corresponds to single-cell measurements. Therefore, everything extending outside of this region was discarded to exclude sources of error such as cell clustering, particulates, or other spurious events.

To define the gated region, we fit a two-dimensional Gaussian function to the \(\log_{10}\) forward-scattering (FSC) and the \(\log_{10}\) side-scattering (SSC) data. We then kept a fraction \(\alpha \in [0, 1]\) of the data by defining an elliptical region given by \[ \left(\boldsymbol{x} - \boldsymbol{\mu} \right)^T \boldsymbol{\Sigma}^{-1} \left(\boldsymbol{x} - \boldsymbol{\mu} \right) \leq \chi^2_\alpha(p), \label{eq:ch4_eq19} \] where \(\boldsymbol{x}\) is the \(2 \times 1\) vector containing the \(\log(\text{FSC})\) and \(\log(\text{SSC})\), \(\boldsymbol{\mu}\) is the \(2 \times 1\) vector representing the mean values of \(\log(\text{FSC})\) and \(\log(\text{SSC})\) as obtained from fitting a two-dimensional Gaussian to the data, and \(\boldsymbol{\Sigma}\) is the \(2\times 2\) covariance matrix also obtained from the Gaussian fit. \(\chi^2_\alpha(p)\) is the quantile function for probability \(p\) of the chi-squared distribution with two degrees of freedom. Fig. 9 shows an example of different gating contours that would arise from different values of \(\alpha\) in Eq. \(\ref{eq:ch4_eq19}\). In this work, we chose \(\alpha = 0.4\), which we deemed as a sufficient constraint to minimize the noise in the data. As explained in Section 4.6, we compared our high throughput flow cytometry data with single-cell microscopy, confirming that the automatic gating did not introduce systematic biases to the analysis pipeline. The specific code where this gating is implemented can be found in GitHub repository.

Figure 9: Representative unsupervised gating contours. Points indicate individual flow cytometry measurements of forward scatter and side scatter. Colored points indicate arbitrary gating contours ranging from 100% (\alpha = 1.0) to 5% (\alpha = 0.05). All measurements for this work were made computing the mean fluorescence from the 40^\text{th} percentile (\alpha = 0.4), shown as orange points.

Comparison of Flow Cytometry with Other Methods

Previous work from our lab experimentally determined fold-change for similar simple repression constructs using a variety of different measurement methods  [1,20]. Garcia and Phillips used the same background strains as the ones used in this work, but gene expression was measured with Miller assays based on colorimetric enzymatic reactions with the LacZ protein  [4]. Ref.  [1] used a LacI dimer with the tetramerization region replaced with an mCherry tag, where the fold-change was measured as the ratio of the gene expression rate rather than a single snapshot of the gene output.

Fig. 10 shows the comparison of these methods along with the flow cytometry method used in this work. The consistency of these three readouts validates the quantitative use of flow cytometry and unsupervised gating to determine the fold-change in gene expression. However, one crucial caveat revealed by this figure is that the sensitivity of flow cytometer measurements is not sufficient to accurately determine the fold-change for the high repressor copy number strains in O1 without induction. Instead, a method with an extensive dynamic range such as the Miller assay is needed to resolve the fold-change at such low expression levels accurately.

Figure 10: Comparison of experimental methods to determine the fold-change. The fold-change in gene expression for equivalent simple-repression constructs has been determined using three independent methods: flow cytometry (this work), colorimetric miller assays  [4], and video microscopy  [1]. All three methods give consistent results, although flow cytometry measurements lose accuracy for fold-change less than 10^{-2}. Note that the repressor-DNA binding energies \Delta\varepsilon_{ra} used for the theoretical predictions were determined in  [4].

Single-Cell Microscopy

In this section, we detail the procedures and results from single-cell microscopy verification of our flow cytometry measurements. Our previous measurements of fold-change in gene expression have been measured using bulk-scale Miller assays  [4] or through single-cell microscopy  [1]. In this work, flow cytometry was an attractive method due to the ability to screen through many different strains at different concentrations of inducer in a short amount of time. To verify our results from flow cytometry, we examined two bacterial strains with different repressor-DNA binding energies (\(\Delta\varepsilon_{RA}\)) of \(-13.9~k_BT\) and \(-15.3~k_BT\) with \(R = 260\) repressors per cell using fluorescence microscopy and estimated the values of the parameters \(K_A\) and \(K_I\) for direct comparison between the two methods. For a detailed explanation of the Python code implementation of the processing steps described below, please see this paper’s GitHub repository. An outline of our microscopy workflow can be seen in Fig. 11.

Strains and Growth Conditions

Cells were grown identically to those used for measurement via flow cytometry (see Methods). Briefly, cells were grown overnight (between 10 and 13 hours) to saturation in rich media broth (LB) with \(100\,\mu\text{g} \cdot \text{mL}^{-1}\) spectinomycin in a deep-well 96-well plate at \(37^\circ \text{C}\). These cultures were then diluted 1000-fold into \(500\,\mu\text{L}\) of M9 minimal medium supplemented with 0.5% glucose and the appropriate concentration of the inducer IPTG. Strains were allowed to grow at \(37^\circ \text{C}\) with vigorous aeration for approximately 8 hours. The cultures were diluted 10-fold into M9 glucose minimal medium without IPTG before mounting for microscopy. Each construct was measured using the same range of inducer concentration values as was performed in the flow cytometry measurements (between \(100\,\text{nM}\) and \(5\,\text{mM}\) IPTG). Each condition was measured in triplicate in microscopy, whereas approximately ten measurements were made using flow cytometry.

Imaging Procedure

Figure 11: Experimental workflow for single-cell microscopy. For comparison with the flow cytometry results, the cells were grown in an identical manner to those described in Chapter 2. Once cells had reached mid to late exponential growth, the cultures were diluted and placed on agarose substrates and imaged under 100\times magnification. Regions of interest representing cellular mass were segmented, and average single-cell intensities were computed. The means of the distributions were used to compute the fold-change in gene expression.

During the last hour of cell growth, an agarose mounting substrate was prepared to contain the appropriate concentration of the IPTG inducer. This mounting substrate was composed of M9 minimal medium supplemented with 0.5% glucose and 2% agarose (Life Technologies UltraPure Agarose, Cat. No. 16500100). This solution was heated in a microwave until molten, followed by the addition of the IPTG to the appropriate final concentration. This solution was then thoroughly mixed, and a \(500\,\mu\text{L}\) aliquot was sandwiched between two glass coverslips and was allowed to solidify.

Once solid, the agarose substrates were cut into approximately \(10\,\text{mm}\times 10\,\text{mm}\) squares. An aliquot of one to two microliters of the diluted cell suspension was then added to each pad. For each concentration of inducer, a sample of the autofluorescence control, the \(\Delta lacI\) constitutive expression control, and the experimental strain were prepared, yielding a total of thirty-six agarose mounts per experiment. These samples were then mounted onto two glass-bottom dishes (Ted Pella Wilco Dish, Cat. No. 14027-20) and sealed with parafilm.

All imaging was performed on a Nikon Ti-Eclipse inverted fluorescent microscope outfitted with a custom-built laser illumination system operated by the open-source MicroManager control software  [21]. The YFP fluorescence was imaged using a CrystaLaser \(514\,\text{nm}\) excitation laser coupled with a laser-optimized (Semrock Cat. No. LF514-C-000) emission filter.

For each sample, between fifteen and twenty positions were imaged, allowing for the measurement of several hundred cells. At each position, a phase-contrast image, an mCherry image, and a YFP image were collected in that order with exposures on a time scale of ten to twenty milliseconds. Thus, each channel used the same exposure time across all samples in a given experiment. All images were collected and stored in ome.tiff format. All microscopy images are available on the CaltechDATA online repository under DOI: 10.22002/D1.229.

Image Processing

Correcting Uneven Illumination

Figure 12: Correction for uneven illumination. A representative image of the illumination profile of the 512\,\text{nm} excitation beam on a homogeneously fluorescent slide is shown in the left panel. This is corrected for using Eq. \ref{eq:ch4_eq20} and is shown in the right panel.

The excitation laser has a two-dimensional gaussian profile. To minimize non-uniform illumination of a single field of view, the excitation beam was expanded to illuminate an area larger than that of the camera sensor. While this allowed for an entire field of view to be illuminated, there was still approximately a 10% difference in illumination across both dimensions. This non-uniformity was corrected for in post-processing by capturing twenty images of a homogeneously fluorescent plastic slide (Autofluorescent Plastic Slides, Chroma Cat. No. 920001) and averaging to generate a map of illumination intensity at any pixel \(I_\text{YFP}\). To correct for shot noise in the camera (Andor iXon+ 897 EMCCD), twenty images were captured in the absence of illumination using the exposure time used for the experimental data. Averaging over these images produced a map of background noise at any pixel \(I_\text{dark}\). To perform the correction, each fluorescent image in the experimental acquisition was renormalized with respect to these average maps as \[ I_\text{flat} = \frac{I - I_\text{dark}}{I_\text{YFP} - I_\text{dark}}\langle I_\text{YFP} - I_\text{dark} \rangle, \label{eq:ch4_eq20} \] where \(I_\text{flat}\) is the renormalized image and \(I\) is the original fluorescence image. An example of this correction can be seen in Fig. 12.

Cell Segmentation

Each bacterial strain constitutively expressed an mCherry fluorophore from a low copy-number plasmid. This served as a volume marker of cell mass, allowing us to segment individual cells through edge detection in fluorescence. We used the Marr-Hildreth edge detector  [22], which identifies edges by taking the second derivative of a lightly Gaussian blurred image. Edges are identified as those regions which cross from highly negative to highly positive values or vice-versa within a specified neighborhood. Bacterial cells were defined as regions within an intact and closed identified edge. All segmented objects were then labeled and passed through a series of filtering steps.

To ensure that primarily single cells were segmented, we imposed area and eccentricity bounds. We assumed that single cells projected into two dimensions are roughly \(2\,\mu\text{m}\) long and \(1\,\mu\text{m}\) wide, so that cells are likely to have an area between \(0.5\,\mu\text{m}^2\) and \(6\,\mu\text{m}\). To determine the eccentricity bounds, we assumed that a single cell could be approximated by an ellipse with semi-major (\(a\)) and semi-minor (\(b\)) axis lengths of \(0.5\,\mu\text{m}\) and \(0.25\,\mu\text{m}\), respectively. The eccentricity of this hypothetical cell can be computed as \[ \text{eccentricity} = \sqrt{1 - \left(\frac{b}{a}\right)^2}, \label{eq:ch4_eq21} \] yielding a value of approximately 0.8. Any objects with an eccentricity below these values were not considered to be single cells. After imposing both an area (Fig. 13(A)) and eccentricity filter (Fig. 13(B)), the remaining objects were considered cells of interest (Fig. 13(C)), and the mean fluorescence intensity of each cell was extracted.

Figure 13: Segmentation of single bacterial cells. Objects were selected if they had an eccentricity greater than 0.8 and an area between 0.5\,\mu\text{m}^2 and 6\,\mu\text{m}^2. Highlighted in blue are the regions considered to be representative of single cells. The black lines correspond to the empirical cumulative distribution functions for the parameter of interest. A representative final segmentation mask is shown in which segmented cells are depicted in cyan over the phase contrast image.

Calculation of Fold-Change

Cells exhibited background fluorescence even in the absence of an expressed fluorophore. We corrected this autofluorescence contribution to the fold-change calculation by subtracting the mean YFP fluorescence of cells expressing only the mCherry volume marker from each experimental measurement. The fold-change in gene expression was, therefore, calculated as \[ \text{fold-change} = \frac{\langle I_{R > 0} \rangle - \langle I_\text{auto} \rangle} {\langle I_{R = 0} \rangle - \langle I_\text{auto} \rangle}, \label{eq:ch4_eq22} \] where \(\langle I_{R > 0}\rangle\) is the mean fluorescence intensity of cells expressing LacI repressors, \(\langle I_\text{auto}\rangle\) is the mean intensity of cells expressing only the mCherry volume marker, and \(\langle I_{R = 0}\rangle\) is the mean fluorescence intensity of cells in the absence of LacI. These fold-change values were very similar to those obtained through flow cytometry and were well described using the thermodynamic parameters used in Chapter 2. With these experimentally measured fold-change values, the best-fit parameter values of the model were inferred and compared to those obtained from flow cytometry.

Parameter Estimation and Comparison

Figure 14: Comparison of measured fold-change between flow cytometry and single-cell microscopy. Experimentally measured fold-change values obtained through single-cell microscopy and flow cytometry are shown as white-filled and solid-colored circles, respectively. Solid and dashed lines indicate the predicted behavior using the most likely parameter values of K_A and K_I inferred from flow cytometry data and microscopy data, respectively. The red and blue plotting elements correspond to the different operators O1 and O2 with binding energies \Delta\varepsilon_{RA} of -13.9~k_BT and -15.3~k_BT, respectively  [4]. The marginalized posterior distributions for K_A and K_I are shown in the top and bottom panels, respectively. The posterior distribution determined using the microscopy data is wider than that computed using the flow cytometry data due to a smaller fig collection of data sets (three for microscopy and ten for flow cytometry).

To confirm quantitative consistency between flow cytometry and microscopy, the parameter values of \(K_A\) and \(K_I\) were also estimated from three biological replicates of IPTG titration curves obtained by microscopy for strains with \(R=260\) and operators O1 and O2. Fig. 14(A) shows the data from these measurements (orange circles) and the ten biological replicates from our flow cytometry measurements (blue circles), along with the fold-change predictions from each inference. In comparison with the values obtained by flow cytometry, each parameter estimate overlapped with the 95% credible region of our flow cytometry estimates, as shown in Fig. 14(B). Specifically, these values were \(K_A=142^{+40}_{-34}\,\mu\text{M}\) and \(K_I=0.6^{+0.1}_{-0.1}\,\mu\text{M}\) from microscopy and \(K_A = 149^{+14}_{-12}\,\mu\text{M}\) and \(K_I = 0.57^{+0.03}_{-0.02}\,\mu\text{M}\) from the flow cytometry data. We note that the credible regions from the microscopy data shown in Fig. 14(B) are much broader than those from flow cytometry due to the fewer number of replicates performed.

Fold-Change Sensitivity Analysis

In Fig. ¿fig:ch2_fig05?, we found that the width of the credible regions varied widely depending on the repressor copy number \(R\) and repressor operator binding energy \(\Delta \varepsilon_{RA}\). More precisely, the credible regions were much narrower for low repressor copy numbers \(R\) and weak binding energy \(\Delta\varepsilon_{RA}\). In this section, we explain how this behavior comes about. We focus our attention on the maximum fold-change in the presence of saturating inducer given by Eq. 2.7. While it is straightforward to consider the width of the credible regions at any other inducer concentration, it shows that the credible region is widest at saturation.

The width of the credible regions corresponds to how sensitive the fold-change is to the fit values of the dissociation constants \(K_A\) and \(K_I\). To be quantitative, we define \[ \Delta \text{fold-change}_{K_A} \equiv \text{fold-change}(K_A,K_I^\text{fit}) - \text{fold-change}(K_A^\text{fit},K_I^\text{fit}), \label{eq:ch4_eq23} \] the difference between the fold-change at a particular \(K_A\) value relative to the best-fit dissociation constant \(K_A^\text{fit}=139 \times 10^{-6} \, \text{M}\). For simplicity, we keep the inactive state dissociation constant fixed at its best-fit value \(K_I^\text{fit}=0.53 \times 10^{-6}\, \text{M}\). A larger difference \(\Delta \text{fold-change}_{K_A}\) implies a wider credible region. Similarly, we define the analogous quantity \[ \Delta \text{fold-change}_{K_I} = \text{fold-change}(K_A^{\text{fit}},K_I) - \text{fold-change}(K_A^{\text{fit}},K_I^{\text{fit}}) \label{eq:ch4_eq24} \] to measure the sensitivity of the fold-change to \(K_I\) at a fixed \(K_A^{\text{fit}}\). Fig. 15 shows both of these quantities in the limit \(c \to \infty\) for different repressor-DNA binding energies \(\Delta\varepsilon_{RA}\) and repressor copy numbers \(R\). See our GitHub repository for the code that reproduces these plots.

To understand how the width of the credible region scales with \(\Delta\varepsilon_{RA}\) and \(R\), we can Taylor expand the difference in fold-change to first order, \(\Delta \text{fold-change}_{K_A} \approx \frac{\partial \text{fold-change}}{\partial K_A} \left( K_A - K_A^{\text{fit}} \right)\), where the partial derivative has the form \[ {\scriptstyle \frac{\partial \text{fold-change}}{\partial K_A} = \frac{e^{-\beta \Delta \varepsilon_{AI}} \frac{n}{K_I}\left(\frac{K_A}{K_I}\right)^{n-1}} {\left( 1 + e^{-\beta \Delta \varepsilon_{AI}} \left(\frac{K_A}{K_I}\right)^n \right)^2}\frac{R}{N_{NS}} e^{-\beta \Delta\varepsilon_{RA}} \left( 1+\frac{1}{1+e^{-\beta \Delta \varepsilon_{AI} } \left(\frac{K_A}{K_I}\right)^n } \frac{R}{N_{NS}}e^{-\beta \Delta\varepsilon_{RA}} \right)^{-2}. } \label{eq:ch4_eq25} \] Similarly, the Taylor expansion \(\Delta \text{fold-change}_{K_I} \approx \frac{\partial \text{fold-change}}{\partial K_I} \left( K_I - K_I^{\text{fit}} \right)\) features the partial derivative \[ {\scriptstyle \frac{\partial \text{fold-change}}{\partial K_I} = -\frac{e^{-\beta \Delta \varepsilon_{AI}} \frac{n}{K_I}\left(\frac{K_A}{K_I}\right)^{n}} {\left( 1 + e^{-\beta \Delta \varepsilon_{AI}} \left(\frac{K_A}{K_I}\right)^n \right)^2}\frac{R}{N_{NS}} e^{-\beta \Delta\varepsilon_{RA}} \left( 1+\frac{1}{1+e^{-\beta \Delta \varepsilon_{AI} } \left(\frac{K_A}{K_I}\right)^n }\frac{R}{N_{NS}} e^{-\beta \Delta\varepsilon_{RA}} \right)^{-2}. } \label{eq:ch4_eq26} \] From Eqs. \(\ref{eq:ch4_eq25}\) and \(\ref{eq:ch4_eq26}\), we find that both \(\Delta \text{fold-change}_{K_A}\) and \(\Delta \text{fold-change}_{K_I}\) increase in magnitude with \(R\) and decrease in magnitude with \(\Delta\varepsilon_{RA}\). Accordingly, we expect that the O3 strains (with the least negative \(\Delta\varepsilon_{RA}\)) and the strains with the smallest repressor copy number will lead to partial derivatives with smaller magnitude and hence to tighter credible regions. Indeed, this prediction is carried out in Fig. 15.

Lastly, we note that Eqs. \(\ref{eq:ch4_eq25}\) and \(\ref{eq:ch4_eq26}\) enable us to quantify the scaling relationship between the width of the credible region and the two quantities \(R\) and \(\Delta\varepsilon_{RA}\). For example, for the O3 strains, where the fold-change at saturating inducer concentration is \(\approx 1\), the right-most term in both equations which equal the fold-change squared is roughly one. Therefore, we find that both \(\frac{\partial \text{fold-change}}{\partial K_A}\) and \(\frac{\partial \text{fold-change}}{\partial K_I}\) scale linearly with \(R\) and \(e^{-\beta \Delta\varepsilon_{RA}}\). Thus the width of the \(R=22\) strain will be roughly 1/1000 as large as that of the \(R=1740\) strain; similarly, the width of the O3 curves will be roughly 1/1000 the width of the O1 curves.

Figure 15: Determining how sensitive the fold-change values are to the fit values of the dissociation constants. (A) The difference \Delta \text{fold-change}_{K_A} in fold change when the dissociation constant K_A is slightly offset from its best-fit value K_A=139^{+29}_{-22} \times 10^{-6} \, \text{M}, as given by Eq. \ref{eq:ch4_eq23}. Fold-change is computed in the limit of saturating inducer concentration (c \to \infty, see Eq. 2.7) where the credible regions in Fig. ¿fig:ch2_fig04? are the widest. The O3 strain (\Delta\varepsilon_{RA} = -9.7~k_B T) is about 1/1000 as sensitive as the O1 operator to perturbations in the parameter values, and hence its credible region is roughly 1/1000 as wide. All curves were made using R = 260. (B) As in Panel (A), but plotting the sensitivity of fold-change to the K_I parameter relative to the best-fit value K_I=0.53^{+0.04}_{-0.04} \times 10^{-6}\, \text{M}. Note that only the magnitude, and not the sign of this difference, describes the sensitivity of each parameter. Hence, the O3 strain is again less sensitive than the O1 and O2 strains. (C) As in Panel (A), but showing how the fold-change sensitivity for different repressor copy numbers. The strains with lower repressor copy numbers are less sensitive to changes in the dissociation constants, and hence their corresponding curves in Fig. ¿fig:ch2_fig04? have tighter credible regions. All curves were made using \Delta\varepsilon_{RA} = -13.9~k_B T. (D) As in Panel (C), the sensitivity of fold-change with respect to K_I is again smallest (in magnitude) for the low repressor copy number strains.

Alternate Characterizations of Induction

In this section, we discuss a different way to describe the induction data, namely, through using the conventional Hill approach. We first demonstrate how using a Hill function to characterize a single induction curve enables us to extract features (such as the midpoint and sharpness) of that single response, but precludes any predictions of the other seventeen strains. We then discuss how a thermodynamic model of simple repression coupled with a Hill approach to the induction response can both characterize an induction profile and predict the response of all eighteen strains, although we argue that such a description provides no insight into the allosteric nature of the protein and how mutations to the repressor would affect induction. We conclude the section by discussing the differences between such a model and the statistical mechanical model used in Chapter 2.

Fitting Induction Curves using a Hill Function Approach

The Hill equation is a phenomenological function commonly used to describe data with a sigmoidal profile  [2325]. Its simplicity and ability to estimate the cooperativity of a system (through the Hill coefficient) has led to its widespread use in many domains of biology  [26]. Nevertheless, the Hill function is often criticized as a physically unrealistic model and the extracted Hill coefficient is often difficult to contextualize in the physics of a system  [27]. In the present work, we note that a Hill function, even if it is only used because of its simplicity, presents no mechanism to understand how a regulatory system’s behavior will change if physical parameters such as repressor copy number or operator binding energy are varied. In addition, the Hill equation provides no foundation to explore how mutating the repressor (e.g., at its inducer-binding interface) would modify its induction profile, although statistical mechanical models have proved capable of characterizing such scenarios  [2830].

Consider the general Hill equation for a single induction profile given by \[ \text{fold-change} = (\text{leakiness}) + (\text{dynamic range}) \frac{\left( \frac{c}{K} \right)^n}{1 + \left( \frac{c}{K} \right)^n}, \label{eq:ch4_eq27} \] where, as in Chapter 2, the leakiness represents the minimum fold-change, the dynamic range represents the difference between the maximum and minimum fold-change, \(K\) is the repressor-inducer dissociation constant, and \(n\) denotes the Hill coefficient that characterizes the sharpness of the curve (\(n > 1\) signifies positive cooperativity, \(n = 1\) denotes no cooperativity, and \(n < 1\) represents negative cooperativity). Fig. 16 shows how the individual induction profiles can be fit (using the same Bayesian methods as described in Sec. 4.8 to this Hill response, yielding a similar response to that shown in Fig. ¿fig:ch2_fig05?. However, characterizing the induction response in this manner is unsatisfactory because each curve must be fit independently, thus removing our predictive power for other repressor copy numbers and binding sites.

The fitted parameters obtained from this approach are shown in Fig. 17. These are rather unsatisfactory because they do not reflect the properties of the physical system under consideration. For example, the dissociation constant \(K\) between LacI and inducer should not be affected by either the copy number of the repressor or the DNA binding energy. Yet, we see upward trends as \(R\) is increased or the binding energy is decreased. Here, the \(K\) parameter ultimately describes the midpoint of the induction curve and, therefore, cannot strictly be considered a dissociation constant. Similarly, the Hill coefficient \(n\) does not directly represent the cooperativity between the repressor and the inducer. The molecular details of the copy number and DNA binding strength are subsumed in this parameter. While the leakiness and dynamic range describe important phenotypic properties of the induction response, this Hill approach leaves us with no means to predict them for other strains. In summary, the Hill equation (Eq. \(\ref{eq:ch4_eq27}\)) cannot predict how an induction profile varies with repressor copy number, operator binding energy, or how mutations alter the induction profile. To that end, we turn to a more sophisticated approach where we use the Hill function to describe the available fraction of repressor as a function of inducer concentration.

Figure 16: Hill function and MWC analysis of each induction profile. Data for each individual strain was fit to the general Hill function in Fig. ¿fig:ch2_fig05?. (A) strains with O1 binding site, (B) strains with O2 binding site, and (C) strains with O3 binding site. Shaded regions indicate the bounds of the 95% credible region.
Figure 17: Parameter values for the Hill equation fit to each individual titration. The resulting fit parameters from the Hill function fits of Fig. 16 are summarized. The large parameter intervals for many of the O3 strains are due to the flatter induction profile (as seen by its smaller dynamic range) and the ability for a large range of K and n values to describe the data.

Fitting Induction Curves using a Combination Thermodynamic Model and Hill Function Approach

Motivated by the inability in the previous section to characterize all eighteen strains using the Hill function with a single set of parameters, here we combine the Hill approach with a thermodynamic model of simple repression to garner predictive power. More specifically, we will use the thermodynamic model in Fig. ¿fig:ch2_fig02?(A), but substitute the statistical model in Fig. ¿fig:ch2_fig02?(B) with the phenomenological Hill function (Eq. \(\ref{eq:ch4_eq27}\)).

Following Eqs. \(\ref{eq:p_bound_definition}\), \(\ref{eq:fold_change_definition}\), \(\ref{eq:fold_change_approx}\) and fold-change is given by \[ \text{fold-change} = \left( 1 + p_A(c) \frac{R}{N_{NS}}e^{-\beta \Delta\varepsilon_{RA}} \right)^{-1} \label{eq:ch4_eq28} \] where the Hill function \[ p_A(c) = p_A^{\text{max}} - p_A^{\text{range}} \frac{\left( \frac{c}{K_D} \right)^n}{1 + \left( \frac{c}{K_D} \right)^n} \label{eq:ch4_eq29} \] represents the fraction of repressors in the allosterically active state, with \(p_A^{\text{max}}\) denoting the fraction of active repressors in the absence of inducer and \(p_A^{\text{max}} - p_A^{\text{range}}\) the minimum fraction of active repressors in the presence of saturating inducer. The Hill function characterizes the inducer-repressor binding while the thermodynamic model with the known constants \(R\), \(N_{NS}\), and \(\Delta\varepsilon_{RA}\) describes how the induction profile changes with repressor copy number and repressor-operator binding energy.

As in Chapter 2, we can fit the four Hill parameters—the vertical shift and stretch parameters \(p_A^{\text{max}}\) and \(p_A^{\text{range}}\), the Hill coefficient \(n\), and the inducer-repressor dissociation constant \(K_D\)—for a single induction curve and then use the fully characterized Eq. \(\ref{eq:ch4_eq27}\) to describe the response of each of the eighteen strains. Fig. 18 shows this process carried out by fitting the O2 \(R=260\) strain (white circles in Panel (B)) and predicting the behavior of the remaining seventeen strains.

Figure 18: A thermodynamic model coupled with a Hill analysis can characterize induction. Combining a thermodynamic model of simple repression with the Hill function to characterize the repressor-inducer binding successfully characterizes the induction profiles of all eighteen strains. As in the main text, data was only fit for the O2 R=260 strain using Eq. \ref{eq:ch4_eq27} and the parameters p_A^{\text{max}} =0.90^{+0.03}_{-0.01}, p_A^{\text{range}} = -0.90^{+0.02}_{-0.03}, n = 1.6_{-0.1}^{+0.2}, and K_D = 4^{+2}_{-1} \times 10^{-6}\,\text{M}. Shaded regions indicate bounds of the 95% credible region.

Although the curves in Fig. 18 are nearly identical to those in Fig. ¿fig:ch2_fig05? (which were made using the MWC model), we stress that the Hill function approach is more complex than the MWC model (containing four parameters instead of three) and obscures the relationships to the physical parameters of the system. For example, it is not clear whether the fit parameter \(K_D = 4^{+2}_{-1} \times 10^{-6}\,\text{M}\) relays the dissociation constant between the inducer and active-state repressor, between the inducer and the inactive-state repressor, or some mix of the two quantities.

In addition, the MWC model naturally suggests further quantitative tests for the fold-change relationship. For example, mutating the repressor’s inducer binding site would likely alter the repressor-inducer dissociation constants \(K_A\) and \(K_I\), and it would be interesting to find out if such mutations also modify the allosteric energy difference \(\Delta\varepsilon_{AI}\) between the repressor’s active and inactive conformations. For our purposes, the Hill function falls short of the connection to the physics of the system and provides no intuition about how transcription depends upon such mutations. For these reasons, we present the thermodynamic model coupled with the statistical mechanical MWC model approach in the paper.

Global Fit of All Parameters

In Chapter 2, we used the repressor copy numbers \(R\) and repressor-DNA binding energies \(\Delta\varepsilon_{RA}\) as reported by  [4]. However, any error in these previous measurements of \(R\) and \(\Delta\varepsilon_{RA}\) will necessarily propagate into our own fold-change predictions. This section takes an alternative approach to fitting the system’s physical parameters to that used in Chapter 2. First, rather than fitting only a single strain, we fit the entire data set in Fig. ¿fig:ch2_fig05? along with microscopy data for the synthetic operator Oid (see Sec. 4.9). In addition, we also simultaneously fit the parameters \(R\) and \(\Delta\varepsilon_{RA}\) using the prior information given by the previous measurements. By using the entire data set and fitting all of the parameters, we obtain the best possible characterization of the statistical mechanical parameters of the system, given our current state of knowledge. As a point of reference, we state all of the parameters of the MWC model derived in the text in tbl. 2.

To fit all of the parameters simultaneously, we follow a similar approach to the one detailed in the Methods section. Briefly, we perform a Bayesian parameter estimation of the dissociation constants \(K_A\) and \(K_I\), the six different repressor copy numbers \(R\) corresponding to the six lacI ribosomal binding sites used in our work, and the four different binding energies \(\Delta \varepsilon_{RA}\) characterizing the four distinct operators used to make the experimental strains. As in Chapter 2, we fit the logarithms \(\tilde{k}_A = -\log \frac{K_A}{1\,\text{M}}\) and \(\tilde{k}_I = -\log \frac{K_I}{1\,\text{M}}\) of the dissociation constants, which grants better numerical stability.

As in Eqs. \(\ref{eq:ch4_eq28}\) and \(\ref{eq:ch4_eq29}\), we assume that deviations of the experimental fold-change from the theoretical predictions are normally distributed with mean zero and standard deviation \(\sigma\). We begin by writing Bayes’ theorem, \[ P(\tilde{k}_A, \tilde{k}_I, \mathbf{R}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{RA}}}, \sigma \mid D) = \frac{P(D \mid \tilde{k}_A, \tilde{k}_I, \mathbf{R}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{RA}}}, \sigma) P(\tilde{k}_A, \tilde{k}_I, \mathbf{R}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{RA}}}, \sigma)}{P(D)}, \label{eq:ch4_eq30} \] where \(\textbf{R}\) is an array containing the six different repressor copy numbers to be fit, \(\mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{RA}}}\) is an array containing the four binding energies to be fit, and \(D\) is the experimental fold-change data. The term \(P(\tilde{k}_A, \tilde{k}_I, \mathbf{R}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{RA}}}, \sigma \mid D)\) gives the probability distributions of all of the parameters given the data. The term \({\small P(D \mid \tilde{k}_A, \tilde{k}_I, \mathbf{R}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{RA}}}, \sigma)}\) represents the likelihood of having observed our experimental data given some value for each parameter. \(P(\tilde{k}_A, \tilde{k}_I, \mathbf{R}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{RA}}}, \sigma)\) contains all the prior information on the values of these parameters. Lastly, \(P(D)\) serves as a normalization constant and hence can be ignored.

Given \(n\) independent measurements of the fold-change, the first term in Eq. \(\ref{eq:ch4_eq30}\) can be written as \[ {\scriptstyle P(D \mid \tilde{k}_A, \tilde{k}_I, \mathbf{R}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{RA}}}, \sigma) = \frac{1}{(2\pi\sigma^2)^{\frac{n}{2}}}\prod\limits_{i=1}^n \exp \left[-\frac{(\text{fc}^{(i)}_{\exp} - \text{fc}(\tilde{k}_A, \tilde{k}_I, R^{(i)}, \Delta\varepsilon_{RA}^{(i)}, c^{(i)}))^2}{2\sigma^2}\right], } \label{eq:ch4_eq31} \] where \(\text{fc}^{(i)}_{\text{exp}}\) is the \(i^{\text{th}}\) experimental fold-change and \(\text{fc}(\cdot\cdot\cdot)\) is the theoretical prediction. Note that the standard deviation \(\sigma\) of this distribution is not known and hence needs to be included as a parameter to be fit.

The second term in Eq. \(\ref{eq:ch4_eq30}\) represents the prior information of the parameter values. We assume that all parameters are independent of each other so that \[ P(\tilde{k}_A, \tilde{k}_I, \mathbf{R}, \mathbf{\Delta\boldsymbol{\varepsilon}_{\textbf{RA}}}, \sigma) = P(\tilde{k}_A ) \cdot P(\tilde{k}_I ) \cdot \prod_i P(R^{(i)}) \cdot \prod_j P(\Delta\varepsilon_{RA}^{(j)}) \cdot P(\sigma), \label{eq:ch4_eq32} \] where the superscript \((i)\) indicates the repressor copy number of index \(i\) and the superscript \((j)\) denotes the binding energy of index \(j\). As above, we note that a prior must also be included for the unknown parameter \(\sigma\).

Because we knew nothing about the values of \(\tilde{k}_A\), \(\tilde{k}_I\), and \(\sigma\) before performing the experiment, we assign maximally uninformative priors to each of these parameters. More specifically, we assign uniform priors to \(\tilde{k}_A\) and \(\tilde{k}_I\) and a Jeffreys prior to \(\sigma\), indicating that \(K_A\), \(K_I\), and \(\sigma\) are scale parameters  [31]. We do, however, have prior information for the repressor copy numbers and the repressor-DNA binding energies from  [4]. This prior knowledge is included within our model using an informative prior for these two parameters, which we assume to be Gaussian. Hence each of the \(R^{(i)}\) repressor copy numbers to be fit satisfies \[ P(R^{(i)}) = \frac{1}{\sqrt{2\pi\sigma_{R_i}^2}} \exp \left(-\frac{(R^{(i)} - \bar{R}^{(i)})^2}{2 \sigma_{R_i}^2} \right), \label{eq:ch4_eq33} \] where \(\bar{R}^{(i)}\) is the mean repressor copy number and \(\sigma_{R_i}\) is the variability associated with this parameter as reported in  [4]. Note that we use the given value of \(\sigma_{R_i}\) from previous measurements rather than leaving this as a free parameter.

Similarly, the binding energies \(\Delta\varepsilon_{RA}^{(j)}\) are also assumed to have a Gaussian informative prior of the same form. We write it as \[ P(\Delta\varepsilon_{RA}^{(j)}) = \frac{1}{\sqrt{2\pi\sigma_{\varepsilon_j}^2}} \exp \left(- \frac{(\Delta\varepsilon_{RA}^{(j)} - \Delta\bar{\varepsilon}_{RA}^{(j)})^2}{2 \sigma_{\varepsilon_j}^2} \right), \label{eq:ch4_eq34} \] where \(\Delta\bar{\varepsilon}_{RA}^{(j)}\) is the binding energy and \(\sigma_{\varepsilon_j}\) is the variability associated with that parameter around the mean value as reported in  [4].

The \(\sigma_{R_i}\) and \(\sigma_{\varepsilon_j}\) parameters will constrain the range of values for \(R^{(i)}\) and \(\Delta\varepsilon_{RA}^{(j)}\) found from the fitting. For example, if for some \(i\) the standard deviation \(\sigma_{R_i}\) is very small, it implies strong confidence in the previously reported value. Mathematically, the exponential in Eq. \(\ref{eq:ch4_eq33}\) will ensure that the best-fit \(R^{(i)}\) lies within a few standard deviations of \(\bar{R}^{(i)}\). Since we are interested in exploring which values could give the best fit, the errors are taken to be wide enough to allow the parameter estimation to explore parameter space in freely the vicinity of the best estimates. Putting all these terms together, we use Markov chain Monte Carlo to sample the posterior distribution \(P(\tilde{k}_A, \tilde{k}_I, \mathbf{R}, \mathbf{\Delta \boldsymbol{\varepsilon}_{\textbf{RA}}}, \sigma \mid D)\), enabling us to determine both the most likely value for each physical parameter as well as its associated credible region (see the GitHub repository for the implementation).

Fig. 19 shows the result of this global fit. When compared with Fig. ¿fig:ch2_fig05?, we can see that fitting for the binding energies and the repressor copy numbers improve the agreement between the theory and the data. tbl. 3 summarizes the values of the parameters as obtained with this MCMC parameter inference. We note that even though we allowed the repressor copy numbers and repressor-DNA binding energies to vary, the resulting fit values were very close to the previously reported values. The fit values of the repressor copy numbers were all within one standard deviation of the previously reported values provided in  [4]. And although some of the repressor-DNA binding energies differed by a few standard deviations from the reported values, the differences were always less than \(1~k_BT\), representing a small change in the biological scales we are considering. The biggest discrepancy between our fit values and the previous measurements arose for the synthetic Oid operator, which we discuss in more detail in Sec. 4.9.

Fig. 20 shows the same key properties as in Fig. ¿fig:ch2_fig06?, but uses the parameters obtained from this global fitting approach. We note that even by increasing the number of degrees of freedom in our fit, the result does not change substantially due to only minor improvements between the theoretical curves and data. For the O3 operator data, again, the agreement between the predicted \([EC_{50}]\) and the effective Hill coefficient remains poor due to the theory being unable to capture the steepness of the response curves.

Table 2: Key model parameters for induction of an allosteric repressor.
Parameter Description
\(c\) Concentration of the inducer
\(K_A, K_I\) Dissociation constant between an inducer and the repressor in the active/inactive state
\(\Delta \varepsilon_{AI}\) The difference between the free energy of repressor in the inactive and active states
\(\Delta\varepsilon_{P}\) Binding energy between the RNAP and its specific binding site
\(\Delta\varepsilon_{RA}, \Delta\varepsilon_{RI}\) Binding energy between the operator and the active/inactive repressor
\(n\) Number of inducer binding sites per repressor
\(P\) Number of RNAP
\(R_A, R_I, R\) Number of active/inactive/total repressors
\(p_A = \frac{R_A}{R}\) Probability that a repressor will be in the active state
\(p_{\text{bound}}\) Probability that an RNAP is bound to the promoter of interest, assumed to be proportional to gene expression
\(\text{fold-change}\) Ratio of gene expression in the presence of repressor to that in the absence of repressor
\(F\) Free energy of the system
\(N_{NS}\) The number of non-specific binding sites for the repressor in the genome
\(\beta = \frac{1}{k_B T}\) The inverse product of the Boltzmann constant \(k_B\) and the temperature \(T\) of the system
Figure 19: Global fit of dissociation constants, repressor copy numbers, and binding energies. Theoretical predictions resulting from simultaneously fitting the dissociation constants K_A and K_I, the six repressor copy numbers R, and the four repressor-DNA binding energies \Delta\varepsilon_{RA} using the entire data set from Fig. ¿fig:ch2_fig05? as well as the microscopy data for the Oid operator. Error bars of experimental data show the standard error of the mean (eight or more replicates), and shaded regions denote the 95% credible region. Where error bars are not visible, they are smaller than the point itself. All of the data points are shown for the Oid operator since a smaller number of replicates were taken. The shaded regions are significantly smaller than in Fig. ¿fig:ch2_fig05? because this fit was based on all data points, and hence the fit parameters are much more tightly constrained. The dashed lines at 0 IPTG indicate a linear scale, whereas solid lines represent a log scale.
Figure 20: Key properties of induction profiles as predicted with a global fit using all available data. Data for the (A) leakiness, (B) saturation, and (C) dynamic range are obtained from fold-change measurements in Fig. ¿fig:ch2_fig05? in the absence and presence of IPTG. All prediction curves were generated using the parameters listed in tbl. 3. Both the (D) [EC_{50}] and (E) effective Hill coefficient are inferred by individually fitting all parameters–K_A,\, K_I,\, R,\, \Delta\varepsilon_{RA}–to each operator-repressor pairing in Fig. ¿fig:ch2_fig04?(A)-(C) separately to Eq. \ref{eq:fold_change_full} to smoothly interpolate between the data points. Note that the error bars are smaller than some of the points.
Table 3: Global fit of all parameter values using the entire data set in Fig. ¿fig:ch2_fig05?. In addition to fitting the repressor inducer dissociation constants \(K_A\) and \(K_I\) as was done in the text, we also fit the repressor DNA binding energy \(\Delta\varepsilon_{RA}\) as well as the repressor copy numbers \(R\) for each strain. The middle columns show the previously reported values for all \(\Delta\varepsilon_{RA}\) and \(R\) values, with \(\pm\) representing the standard deviation of three replicates. The right column shows the global fits from this work, with the subscript and superscript notation denoting the 95% credible region. Note that there is overlap between all of the repressor copy numbers and that the net difference in the repressor-DNA binding energies is less than \(1~k_B T\). The logarithms \(\tilde{k}_A = -\log \frac{K_A}{1\,\text{M}}\) and \(\tilde{k}_I = -\log \frac{K_I}{1\,\text{M}}\) of the dissociation constants were fit for numerical stability.
Reported Values  [4] Global Fit
\(\tilde{k}_A\) \(-\) \(-5.33^{+0.06}_{-0.05}\)
\(\tilde{k}_I\) \(-\) \(0.31^{+0.05}_{-0.06}\)
\(K_A\) \(-\) \(205^{+11}_{-12}\,\mu\text{M}\)
\(K_I\) \(-\) \(0.73^{+0.04}_{-0.04}\,\mu\text{M}\)
\(R_{22}\) \(22 \pm 4\) \(20^{+1}_{-1}\)
\(R_{60}\) \(60 \pm 20\) \(74^{+4}_{-3}\)
\(R_{124}\) \(124 \pm 30\) \(130^{+6}_{-6}\)
\(R_{260}\) \(260 \pm 40\) \(257^{+9}_{-11}\)
\(R_{1220}\) \(1220 \pm 160\) \(1191^{+32}_{-55}\)
\(R_{1740}\) \(1740 \pm 340\) \(1599^{+75}_{-87}\)
O1 \(\Delta\varepsilon_{RA}\) \(-15.3 \pm 0.2~k_BT\) \(-15.2^{+0.1}_{-0.1}~k_BT\)
O2 \(\Delta\varepsilon_{RA}\) \(-13.9 \pm 0.2~k_BT\) \(-13.6^{+0.1}_{-0.1}~k_BT\)
O3 \(\Delta\varepsilon_{RA}\) \(-9.7 \pm 0.1~k_BT\) \(-9.4^{+0.1}_{-0.1}~k_BT\)
Oid \(\Delta\varepsilon_{RA}\) \(-17.0 \pm 0.2~k_BT\) \(-17.7^{+0.2}_{-0.1}~k_BT\)

Applicability of Theory to the Oid Operator Sequence

In addition to the native operator sequences (O1, O2, and O3) considered in Chapter 2, we were also interested in testing our model predictions against the synthetic Oid operator. In contrast to the other operators, Oid is one base pair shorter in length (\(20\,\text{bp}\)), is fully symmetric, and is known to provide stronger repression than the native operator sequences considered so far. While the theory should be similarly applicable, measuring the lower fold-changes associated with this YFP construct was expected to be near the sensitivity limit for our flow cytometer due to the especially strong binding energy of Oid (\(\Delta \varepsilon_{RA}=-17.0 ~k_BT\))  [20]. Accordingly, fluorescence data for Oid were obtained using microscopy, which is more sensitive than flow cytometry. Sec. 4.6 gives a detailed explanation of how microscopy measurements were used to obtain induction curves.

We follow the approach of Chapter 2 and make fold-change predictions based on the parameter estimates from our strain with \(R=260\) and an O2 operator. These predictions are shown in Fig. 21(A), where we also plot data taken in triplicate for strains containing \(R=\) 22, 60, and 124, obtained by single-cell microscopy. We find that the data are systematically below the theoretical predictions. We also considered our global fitting approach (see Sec. 4.8) to see whether we might find better agreement with the observed data. Interestingly, we find that the majority of the parameters remain essentially unchanged, but our estimate for the Oid binding energy \(\Delta \varepsilon_{RA}\) is shifted to \(-17.7~k_BT\) instead of the value \(-17.0~k_BT\) found by  [4]. In Fig. 21(B), we again plot the Oid fold-change data with theoretical predictions using the new estimate for the Oid binding energy from our global fit. This parameter modification gives substantially better agreement between theory and data.

Figure 21: Predictions of fold-change for strains with an Oid binding sequence versus experimental measurements with different repressor copy numbers. Experimental data is plotted against the parameter-free predictions that are based on our fit to the O2 strain with R=260. Here we use the previously measured binding energy \Delta\varepsilon_{RA}=-17.0~k_BT  [4]. The same experimental data is plotted against the best-fit parameters using the complete O1, O2, O3, and Oid data sets to infer K_A, K_I, repressor copy numbers, and the binding energies of all operators (see Sec. 4.8). Here the major difference in the inferred parameters is a shift in the binding energy for Oid from \Delta\varepsilon_{RA}=-17.0~k_BT to \Delta\varepsilon_{RA}=-17.7~k_BT, which now shows agreement between the theoretical predictions and experimental data. Shaded regions from the theoretical curves denote the 95% credible region. These are narrower in Panel because the inference of parameters was performed with much more data, and hence the best-fit values are more tightly constrained. Individual data points are shown due to the small number of replicates. The dashed lines at 0 IPTG indicate a linear scale, whereas solid lines represent a log scale.

Fig. 22 shows the cumulative data from  [4] and  [1], as well as our data with \(c=0 \, \mu \text{M}\), which all measured fold-change for the same simple repression architecture utilizing different reporters and measurement techniques. We find that the binding energies from the global fit, including \(\Delta \varepsilon_{RA}=-17.7~k_BT\), compare reasonably well with all previous measurements.

Figure 22: Comparison of fold-change predictions based on binding energies from Garcia and Phillips and those inferred from this work. Fold-change curves for the different repressor-DNA binding energies \Delta\varepsilon_{RA} are plotted as a function of repressor copy number when IPTG concentration c=0. Solid curves use the binding energies determined from  [4], while the dashed curves use the inferred binding energies we obtained when performing a global fit of K_A, K_I, repressor copy numbers, and the binding energies using all available data from our work. Fold-change measurements from our experiments (outlined circles)  [4] (solid circles), and  [1] (diamonds) show that the small shifts in binding energy that we infer are still in agreement with prior data. Note that only a single flow cytometry data point is shown for Oid from this study, since the R=60 and R=124 curves from Fig. 21 had extremely low fold-change in the absence of inducer (c=0) to be indistinguishable from autofluorescence, and their fold-change values in this limit were negative and hence do not appear on this plot.

Comparison of Parameter Estimation and Fold-Change Predictions across Strains

The inferred parameter values for \(K_A\) and \(K_I\) in Chapter 2 were determined by fitting to induction fold-change measurements from a single strain (\(R=260\), \(\Delta\varepsilon_{RA} = -13.9~k_BT\), \(n=2\), and \(\Delta\varepsilon_{AI}=4.5~k_BT\)). After determining these parameters, we were able to predict the fold-change of the remaining strains without any additional fitting. However, the theory should be independent of the specific strain used to estimate \(K_A\) and \(K_I\); using any alternative strain to fit \(K_A\) and \(K_I\) should yield similar predictions. For the sake of completeness, here we discuss the values for \(K_A\) and \(K_I\) that are obtained by fitting to each of the induction data sets individually. These fit parameters are shown in Fig. ¿fig:ch2_fig04?(D), where we find close agreement between strains, but with some deviation and poorer inferences observed with the O3 operator strains. Overall, we find that regardless of which strain is chosen to determine the unknown parameters, the predictions laid out by the theory closely match the experimental measurements. Here we present a comparison of the strain-specific predictions and measured fold-change data for each of the three operators considered.

We follow the approach taken in Chapter 2 and use Eq. \(\ref{eq:fold_change_full}\) to infer values for \(K_A\) and \(K_I\) by fitting to each combination of binding energy \(\Delta \varepsilon_{RA}\) and repressor copy number \(R\). We then use these fitted parameters to predict the induction curves of all other strains. In Fig. 23, we plot these fold-change predictions along with experimental data for each of our strains that contain an O1 operator. To make sense of this plot, consider the first row as an example. In the first row, \(K_A\) and \(K_I\) were estimated using data from the strain containing \(R=22\) and an O1 operator (top leftmost plot, shaded in gray). The remaining plots in this row show the predicted fold-change using these values for \(K_A\) and \(K_I\). In each row, we then infer \(K_A\) and \(K_I\) using data from a strain containing a different repressor copy number (\(R=60\) in the second row, \(R=124\) in the third row, and so on). In Fig. 24 and Fig. 25, we similarly apply this inference to our strains with O2 and O3 operators, respectively. We note that the overwhelming majority of predictions closely match the experimental data. The notable exception is that using the \(R=22\) strain provides poor predictions for the strains with large copy numbers (especially \(R=1220\) and \(R=1740\)), though it should be noted that predictions made from the \(R=22\) strain have considerably broader credible regions. This loss in predictive power is due to the poorer estimates of \(K_A\) and \(K_I\) for the \(R=22\) strain as shown in Fig. ¿fig:ch2_fig04?(D).

Figure 23: O1 strain fold-change predictions based on strain-specific parameter estimation of \boldsymbol{K_A} and \boldsymbol{K_I}. Fold-change in expression is plotted as a function of IPTG concentration for all strains containing an O1 operator. The solid points correspond to the mean experimental value. The solid lines correspond to Eq. \ref{eq:fold_change_full} using the parameter estimates of K_A and K_I. Each row uses a single set of parameter values based on the strain noted on the left axis. The shaded plots along the diagonal are those where the parameter estimates are plotted along with the data used to infer them. Values for repressor copy number and operator binding energy are from  [4]. The shaded region on the curve represents the uncertainty from our parameter estimates and reflects the 95% highest probability density region of the parameter predictions.
Figure 24: O2 strain fold-change predictions based on strain-specific parameter estimation of \boldsymbol{K_A} and \boldsymbol{K_I}. Fold-change in expression is plotted as a function of IPTG concentration for all strains containing an O2 operator. The plots and data shown are analogous to Fig. 23, but for the O2 operator.
Figure 25: O3 strain fold-change predictions based on strain-specific parameter estimation of \boldsymbol{K_A} and \boldsymbol{K_I}. Fold-change in expression is plotted as a function of IPTG concentration for all strains containing an O3 operator. The plots and data shown are analogous to Fig. 23, but for the O3 operator. We note that when using the R=22 O3 strain to predict K_A and K_I, the large uncertainty in the estimates of these parameters (see Fig. ¿fig:ch2_fig04?(D)) leads to correspondingly wider credible regions.

Properties of Induction Titration Curves

In this section, we expand on the phenotypic properties of the induction response that were explored in Chapter 2 (see Fig. ¿fig:ch2_fig01?). We begin by expanding on our discussion of dynamic range and then show the analytic form of the \([EC_{50}]\) for simple repression.

As stated in Chapter 2, the dynamic range is defined as the difference between the maximum and minimum system response, or equivalently, as the difference between the saturation and leakiness of the system. Using Eqs. \(\ref{eq:leakiness}\), \(\ref{eq:saturation}\), and \(\ref{eq:dynamic_range_def}\), the dynamic range is given by \[ {\scriptstyle \text{dynamic range} = \left( 1+\frac{1}{1+e^{-\beta \Delta \varepsilon_{AI} } \left(\frac{K_A}{K_I}\right)^n }\frac{R}{N_{NS}} e^{-\beta \Delta\varepsilon_{RA}} \right)^{-1} - \left( 1+\frac{1}{1+e^{-\beta \Delta \varepsilon_{AI} }} \frac{R}{N_{NS}}e^{-\beta \Delta\varepsilon_{RA}} \right)^{-1}. } \label{eq:ch4_eq35} \] The dynamic range, saturation, and leakiness were plotted with our experimental data in Fig. ¿fig:ch2_fig06?(A)-(C) as a function of repressor copy number. Fig. 26 shows how these properties are expected to vary as a function of the repressor-operator binding energy. Note that the resulting curves for all three properties have the same shape as in Fig. ¿fig:ch2_fig06?(A)-(C), since the dependence of the fold-change upon the repressor copy number and repressor-operator binding energy are both contained in a single multiplicative term, \(R e^{-\beta \Delta\varepsilon_{RA}}\). Hence, increasing \(R\) on a logarithmic scale (as in Fig. ¿fig:ch2_fig06?(A)-(C)) is equivalent to decreasing \(\Delta\varepsilon_{RA}\) on a linear scale (as in Fig. 26).

An interesting aspect of the dynamic range is that it exhibits a peak as a function of either the repressor copy number (or equivalently of the repressor-operator binding energy). Differentiating the dynamic range Eq. \(\ref{eq:ch4_eq35}\) and setting it equal to zero, we find that this peak occurs at \[ \frac{R^*}{N_{NS}} = e^{-\beta (\Delta\varepsilon_{AI} - \Delta\varepsilon_{RA})} \sqrt{e^{\Delta\varepsilon_{AI}} + 1} \sqrt{e^{\Delta\varepsilon_{AI}} + \left( \frac{K_A}{K_I} \right)^n}. \label{eq:ch4_eq36} \] The magnitude of the peak is given by \[ \text{max dynamic range} = \frac{\left( \sqrt{e^{\Delta\varepsilon_{AI}} + 1} - \sqrt{e^{\Delta\varepsilon_{AI}} + \left( \frac{K_A}{K_I} \right)^n} \right)^2}{\left( \frac{K_A}{K_I} \right)^n - 1}, \label{eq:ch4_eq37} \] which is independent of the repressor-operator binding energy \(\Delta\varepsilon_{RA}\) or \(R\), and will only cause a shift in the location of the peak but not its magnitude.

Figure 26: Dependence of leakiness, saturation, and dynamic range on the operator binding energy and repressor copy number. Increasing the repressor copy number or decreasing the repressor-operator binding energy suppresses gene expression and decreases the leakiness and saturation. The dynamic range retains its shape but shifts right as the repressor copy number increases. The peak in the dynamic range can be understood by considering the two extremes for \Delta \varepsilon_{RA}: for small repressor-operator binding energies, the leakiness is small, but the saturation increases with \Delta \varepsilon_{RA}; for large repressor-operator binding energies, the saturation is near unity, and the leakiness increases with \Delta \varepsilon_{RA}, thereby decreasing the dynamic range. Repressor copy number does not affect the maximum dynamic range. Circles, diamonds, and squares represent \Delta \varepsilon_{RA} values for the O1, O2, and O3 operators, respectively, demonstrating the expected values of the properties using those strains.

We now consider the two remaining properties, the \([EC_{50}]\) and effective Hill coefficient, which determine the horizontal properties of a system—that is, they determine the range of inducer concentration in which the system’s response goes from its minimum to maximum values. The \([EC_{50}]\) denotes the inducer concentration required to generate fold-change halfway between its minimum and maximum value and was defined implicitly in Eq. \(\ref{eq:ec50}\). For the simple repression system, the \([EC_{50}]\) is given by \[ \frac{[EC_{50}]}{K_A} = \frac{\frac{K_A}{K_I} - 1} {\frac{K_A}{K_I} - \left( \frac{\left( 1 + \frac{R}{N_{NS} } e^{-\beta \Delta\varepsilon_{RA}} \right) + \left( \frac{K_A}{K_I} \right)^n \left( 2 e^{-\beta \Delta \varepsilon_{AI}} + \left( 1 + \frac{R}{N_{NS}} e^{-\beta \Delta\varepsilon_{RA}} \right) \right) } { 2 \left( 1 + \frac{R}{N_{NS}} e^{-\beta \Delta\varepsilon_{RA}} \right) + e^{-\beta \Delta \varepsilon_{AI}} + \left( \frac{K_A}{K_I} \right)^n e^{-\beta \Delta \varepsilon_{AI}} } \right)^{\frac{1}{n}}} - 1. \label{eq:ch4_eq38} \] Using this expression, we can then find the effective Hill coefficient \(h\), which equals twice the log-log slope of the normalized fold-change evaluated at \(c = [EC_{50}]\) (see Eq. \(\ref{eq:effective_Hill}\)). In Fig. ¿fig:ch2_fig06?(D)-(E), we show how these two properties vary with repressor copy number, and in Fig. 27 we demonstrate how they depend on the repressor-operator binding energy. Both the \([EC_{50}]\) and \(h\) vary significantly with repressor copy numbers for sufficiently strong operator binding energies. Interestingly, for weak operator binding energies on the order of the O3 operator, it is predicted that the effective Hill coefficient should not vary with repressor copy number. In addition, the maximum possible Hill coefficient is roughly 1.75, which stresses the point that the effective Hill coefficient should not be interpreted as the number of inducer binding sites, which is precisely 2.

Figure 27: \boldsymbol{[EC_{50}]} and effective Hill coefficient depend strongly on repressor copy number and operator binding energy. [EC_{50}] values range from very small and tightly clustered at weak operator binding energies (e.g. O3) to relatively large and spread out for stronger operator binding energies (O1 and O2). The effective Hill coefficient generally decreases with increasing repressor copy number, indicating a flatter normalized response. The maximum possible Hill coefficient is roughly 1.75 for all repressor-operator binding energies. Circles, diamonds, and squares represent \Delta \varepsilon_{RA} values for the O1, O2, and O3 operators, respectively.

Applications to Other Regulatory Architectures

This section discusses how the theoretical framework presented in this work is sufficiently general to include various regulatory architectures outside of simple repression by LacI. We begin by noting that the same formula for fold-change given in Eq. \(\ref{eq:fold_change_full}\) can also describe corepression. We then demonstrate how our model can be generalized to include other architectures, such as a coactivator binding to an activator to promote gene expression. In each case, we briefly describe the system and describe its corresponding theoretical description. For further details, we invite the interested reader to read  [32,33].

Corepression

Consider a regulatory architecture where binding of a transcriptional repressor impedes the binding of RNAP to the DNA. A corepressor molecule binds to the repressor and shifts its allosteric equilibrium towards the active state in which it binds more tightly to the DNA, thereby decreasing gene expression (in contrast, an inducer shifts the allosteric equilibrium towards the inactive state where the repressor binds more weakly to the DNA). As in Chapter 2, we can enumerate the states and statistical weights of the promoter and the allosteric states of the repressor. We note that these states and weights exactly match Fig. ¿fig:ch2_fig02? and yield the same fold-change equation as Eq. \(\ref{eq:fold_change_full}\), \[ \text{fold-change} \approx \left(1 + \frac{\left(1 + \frac{c}{K_A}\right)^n}{ \left(1 + \frac{c}{K_A}\right)^n + e^{\beta\Delta\varepsilon_{AI}} \left(1 + \frac{c}{K_I}\right)^n} \frac{R}{N_{NS}} e^{-\beta\Delta\varepsilon_{RA}}\right)^{-1}, \label{eq:ch4_eq39} \] where \(c\) now represents the concentration of the corepressor molecule. Mathematically, the difference between these two architectures can be seen in the relative sizes of the dissociation constants \(K_A\) and \(K_I\) between the inducer and repressor in the active and inactive states, respectively. The corepressor is defined by \(K_A < K_I\) since the corepressor favors binding to the repressor’s active state; an inducer must satisfy \(K_I < K_A\), as was found in Chapter 2 from the induction data (see Fig. ¿fig:ch2_fig04?). Much as was performed in Chapter 2, we can make some predictions about the response of a corepressor. In Fig. 28(A), we show how varying the repressor copy number \(R\) and the repressor-DNA binding energy \(\Delta\varepsilon_{RA}\) influence the response. We draw the reader’s attention to the decrease in fold-change as the concentration of the effector is increased.

Activation

We now turn to the case of activation. While this architecture was not studied in this work, we wish to demonstrate how the framework presented here can be extended to include transcription factors other than repressors. To that end, we consider a transcriptional activator that binds to DNA and aids in the binding of RNAP through energetic interaction term \(\varepsilon_{AP}\). Note that in this architecture, binding of the activator does not occlude binding of the polymerase. The binding of a coactivator molecule shifts its allosteric equilibrium towards the active state (\(K_A < K_I\)), where the activator is more likely to be bound to the DNA and promote expression. Enumerating all of the states and statistical weights of this architecture and making the approximation that the promoter is weak generates a fold-change equation of the form \[ \text{fold-change} = \frac{1 + \frac{\left(1 + \frac{c}{K_A}\right)^n}{ \left( 1 + \frac{c}{K_A}\right)^n + e^{\beta\Delta\varepsilon_{AI}} \left(1 + \frac{c}{K_I}\right)^n}\frac{A}{N_{NS}} e^{-\beta\Delta\varepsilon_{AA}}e^{-\beta\varepsilon_{AP}}}{ 1 + \frac{\left(1 + \frac{c}{K_A}\right)^n}{ \left(1 + \frac{c}{K_A}\right)^n + e^{\beta\Delta\varepsilon_{AI}} \left(1 + \frac{c}{K_I}\right)^n}\frac{A}{N_{NS}} e^{-\beta\Delta\varepsilon_{AA}}}, \label{eq:ch4_eq40} \] where \(A\) is the total number of activators per cell, \(c\) is the concentration of a coactivator molecule, \(\Delta\varepsilon_{AA}\) is the binding energy of the activator to the DNA in the active allosteric state, and \(\varepsilon_{AP}\) is the interaction energy between the activator and the RNAP. Unlike in the cases of induction and corepression, the fold-change formula for activation includes terms from when the RNAP is bound by itself on the DNA and when both RNAP and the activator are simultaneously bound to the DNA. Fig. 28(B) explores predictions of the fold-change in gene expression by manipulating the activator copy number, DNA binding energy, and the polymerase-activator interaction energy. Note that with this activation scheme, the fold-change must necessarily be greater than one. An interesting feature of these predictions is the observation that even small changes in the interaction energy (\(< 0.5~k_BT\)) can dramatically increase fold-change.

As in the case of induction, the Eq. \(\ref{eq:ch4_eq40}\) is straightforward to generalize. For example, the relative values of \(K_I\) and \(K_A\) can be switched such that \(K_I < K_A\) in which the secondary molecule drives the activator to assume the inactive state represents induction of an activator. Thus, while these cases might be viewed as separate biological phenomena, they can all be described by the same underlying formalism mathematically.

Figure 28: Representative fold-change predictions for allosteric corepression and activation. (A) Contrary to the case of induction described in Chapter 2, the addition of a corepressor decreases fold-change in gene expression. The left and right panels demonstrate how varying the values of the repressor copy number R and repressor-DNA binding energy \Delta\varepsilon_{RA}, respectively, change the predicted response profiles. (B) In the case of inducible activation, binding of an effector molecule to an activator transcription factor increases the fold-change in gene expression. Note that for activation, the fold-change is greater than 1. The left and center panels show how changing the activator copy number A and activator-DNA binding energy \Delta\varepsilon_{AA} alter the response, respectively. The right panel shows how varying the polymerase-activator interaction energy \varepsilon_{AP} alters the fold-change. Relatively small perturbations to this energetic parameter drastically change the level of activation and play a major role in dictating the dynamic range of the system.

Definition of the non-specific background \(N_{NS}\)

In this section, we will explore the definition of the non-specific background \(N_{NS}\). As raised by an anonymous reviewer, the nature of this parameter seems to raise some controversy on what the right value should be, or whether or not the arbitrary definition of its value should also be applied to the \(\Delta\varepsilon_{AI}\) parameter.

Specifically, during the first round, a reviewer did not like the idea that the value of \(N_{NS} = 4.6 \times 10^6\) assumed that the entirety of the genome was available for non-specific binding of the repressor. We will consider how reasonable this is at the end of the section. However, as we will show first, the specific value of \(N_{NS}\) is analogous to the zero potential energy or the reference concentration state. Thus, it is only the free energy differences that matter at the end of the day. For the second round of reviews, the same reviewer was willing to agree on our point if and only if we were to acknowledge that other parameters such as \(\Delta\varepsilon_{AI}\), the free energy difference between the active and inactive state of the repressor, also had an arbitrary definition that could be set to any value. In this section, we will show that such a statement is an erroneous interpretation of the parameters. This free energy difference value cannot be re-defined to take any value if one is consistent with the experimental data.

Let us start by showing why the specific value of \(N_{NS}\) is not the critical variable. Under the weak promoter approximation, the fold-change equation is equivalent to a two-state Fermi function of having the promoter occupied by a repressor or having an empty promoter. This is \[ \text{fold-change} \rightarrow p^r_{bound} = \frac{1}{ 1 + \frac{R}{N_{NS}} e^{-\beta\Delta\varepsilon_{RA}}}. \label{eq:ch4_eq41} \] This expression can be rewritten as \[ p^r_{bound} = \frac{1}{ 1 + e^{-\beta\Delta E}}, \label{eq:ch4_eq42} \] where \(\Delta E\) is the free energy difference between the empty and occupied promoter. This definition implies that \[ \Delta E\equiv \overbrace{\Delta\varepsilon_{RA}}^{\text{enthalpic term}} - \overbrace{k_BT \ln \left( \frac{R}{N_{NS}} \right)}^{\text{entropic term}}. \label{eq:ch4_eq43} \]

Given that the parameter \(\Delta\varepsilon_{RA}\) is inferred rather than directly measured, this puts us in the position of being able to re-define \(N_{NS}\) at will as long as \(\Delta E\) is in accordance with the experimental data. In other words, the parameter that matters is the free energy difference rather than its components. For example, if for a given operator and a given repressor copy number we choose a different value of \(N_{NS}\), it still should hold that \[ \Delta E= \Delta\varepsilon_{RA}' - k_BT \ln \left( \frac{R}{N_{NS}'} \right), \label{eq:ch4_eq44} \] where \(N_{NS}'\) is the changed value of the non-specific background and \(\Delta\varepsilon_{RA}'\) is a different value for the repressor binding energy that compensates for the difference in the non-specific background.

Let \(N_{NS}' \equiv \alpha N_{NS}\). Since the value of \(\Delta E\) has to be preserved, it should be true that \[ \Delta E= \Delta\varepsilon_{RA}' - k_BT \ln \left( \frac{R}{\alpha N_{NS}} \right) = \Delta\varepsilon_{RA} - k_BT \ln \left( \frac{R}{N_{NS}} \right). \label{eq:ch4_eq45} \] Solving Eq. \(\ref{eq:ch4_eq45}\) for \(\Delta\varepsilon_{RA}'\) gives \[ \begin{split} \Delta\varepsilon_{RA}' &= \Delta\varepsilon_{RA} + k_BT \ln \left( \frac{N_{NS}}{\alpha N_{NS}} \right)\\ &= \Delta\varepsilon_{RA} - k_BT \ln \alpha. \end{split} \label{eq:ch4_eq46} \]

Eq. \(\ref{eq:ch4_eq46}\) implies that we can redefine \(N_{NS}\) to be any value as long as \(\Delta\varepsilon_{RA}\) compensates to maintain the value of \(\Delta E\). This statement holds true whether we are considering a single promoter or multiple promoters. The same cannot be said about the \(\Delta\varepsilon_{AI}\) parameter. The parameter \(\Delta\varepsilon_{AI}\) by itself sets the fraction of inactive repressors in the absence of inducer via \[ p_{act} = \frac{1}{1 + e^{-\beta\Delta\varepsilon_{AI}}}, \label{eq:ch4_eq47} \] where we have a Fermi function for a two-state system in which the repressor can be in an active or inactive state again.

As shown before, the reason why we could define \(N_{NS}\) to be any value is that the parameter that matters is itself \(\Delta E\) the free energy difference. Therefore the repressor binding energy \(\Delta\varepsilon_{RA}\) could compensate for changes in the value of \(N_{NS}\). For the case of \(\Delta\varepsilon_{AI}\) Eq. \(\ref{eq:ch4_eq47}\) tells us that \(\Delta\varepsilon_{AI}\) has no entropic term that can be compensated with an enthalpic term, or vice versa.

One could argue that for the case of a single promoter, the fold-change equation does allow this parameter to be re-defined arbitrarily since the full equation in the absence of inducer can be written as \[ \text{fold-change} = \frac{1}{ 1 + \left( \frac{1}{1 + e^{-\beta\Delta\varepsilon_{AI}}} \right) \frac{R}{N_{NS}} e^{-\beta\Delta\varepsilon_{RA}}}. \label{eq:ch4_eq48} \] So when we define the free energy \(\Delta E\), we would include an extra term of the form \[ \Delta E= \Delta\varepsilon_{RA} - k_BT \left[ \ln \left( \frac{R}{N_{NS}} \right) + \ln \left( \frac{1}{1 + e^{-\beta\Delta\varepsilon_{AI}}} \right) \right]. \label{eq:ch4_eq49} \] If we were only to use Eq. \(\ref{eq:ch4_eq49}\), the statement brought up by the anonymous reviewer would be true since changes in \(\Delta\varepsilon_{AI}\) could be compensated by changes in \(\Delta\varepsilon_{RA}\) or \(N_{NS}\). But as specified in Sec. 4.2, this is not the case for cells with multiple promoters.

The case of multiple promoters can be handled using the Canonical ensemble as is or using the Grand Canonical ensemble as detailed in  [5]. Our point is more clearly seen in the case of the Canonical ensemble. Under this formalism, the fold-change equation is given by  [1] \[ \text{fold-change} = \frac{ \sum_{m=0}^{\min (N,R)} \frac{R!}{(N_{NS})^m (R - m)!} {N \choose m} e^{-\beta m \Delta\varepsilon_{RA}}(N - m) }{ N \sum_{m=0}^{\min (N,R)} \frac{R!}{(N_{NS})^m (R - m)!} {N \choose m} e^{-\beta m \Delta\varepsilon_{RA}} }, \label{eq:ch4_eq50} \] where \(N\) is the number of promoters. Notice that we can group the terms including \(N_{NS}\) and \(\Delta\varepsilon_{RA}\) as \[ \text{fold-change} = \frac{ \sum_{m=0}^{\min (N,R)} \frac{R!}{(R - m)!} {N \choose m} \left( \frac{e^{-\beta \Delta\varepsilon_{RA}}}{N_{NS}}\right)^m (N - m) }{ N \sum_{m=0}^{\min (N,R)} \frac{R!}{(R - m)!} {N \choose m} \left(\frac{e^{-\beta \Delta\varepsilon_{RA}}}{N_{NS}}\right)^m}, \label{eq:ch4_eq51} \] to highlight that it is a combination of these two parameters that matter, rather than their individual values. For the case of the \(\Delta\varepsilon_{AI}\) parameter, this is not the case. Every term containing \(R\) on Eq. \(\ref{eq:ch4_eq51}\) is effectively multiplied by Eq. \(\ref{eq:p_active}\). Since these terms are included inside the factorials, it is not true that a simple compensation by the other parameters allows us to define \(\Delta\varepsilon_{AI}\) to be any value. Therefore as defined in Sec. 4.3, the parameter \(\Delta\varepsilon_{AI}\) can be independently inferred using multiple promoter measurements of fold change.

As a final note, we can also check whether \(N_{NS} = 4.6 \times 10^6\) is at all a reasonable value to use. One potential point of concern is whether the chromosomal DNA is occupied by other transcription factors that may reduce the availability of the DNA for repressor or RNAP to bind. Here we consider data from a recent census of protein abundance across the E. coli genome. In that work, Schmidt et al.  [15] measured the protein copy number across more than half the coding genes (greater than 95% by total protein mass). During exponential growth in M9 minimal media with 0.5 % glucose, they found that about 6 % of the protein mass, or 311,000 monomer copies per cell, are proteins such as transcription factors that will be bound to the DNA (about two-thirds of these are nucleoid-associated proteins such as HNS and HU).

To make a simple estimate of DNA occupancy, let us assume that all transcription factors bind DNA as dimers and occupy a DNA length of 15 bp (this appears to vary from 7 bp to 38 bp in E. coli on RegulonDB  [34]), we find that about 2.3 kbp or about half of the genome will be occupied. In the most extreme case, we could assume that this fraction is inaccessible, which would reduce \(N_NS\) by a factor of about \(2\). Applying this to Eq. \(\ref{eq:ch4_eq46}\), we see that this has a negligible effect on the actual binding energy that we would infer and only corresponds to a change in energy \(\varepsilon_{RA}\) by about 0.7 \(k_B T\).

Measurement of Steady State

All measurements have been performed with cells in an exponential growth phase, where we expect an average expression to be maintained across the cell population. Here we wanted to use one of our strains (O2 \(\Delta\varepsilon_{RA} = -13.9\ k_BT\), \(R=260\)) to show that gene expression is under steady-state for our experimental conditions. As a reminder, we begin by growing an overnight culture in Lysogeny Broth for each of the required strains under our standard protocol. After approximately \(12\) hours, the saturated cultures are diluted 1000-fold into a \(2\,\text{mL}\) 96-deep-well plate. Each well contains \(500\,\mu\text{L}\) of M9 minimal media supplemented with 0.5% w/v glucose and the appropriate IPTG inducer concentration.

Here we follow the protocol as noted above but take measurements in one-hour increments after the 1000-fold dilution. We performe this in triplicate with our O2 \(\Delta\varepsilon_{RA} = -13.9\ k_BT\), \(R=260\) strain (IPTG inducer concentration \(c=50\,\mu\text{M}\)), and also include an autofluorescence strain and O2 \(\Delta lacI\) strain. In Fig. 29(A) we plot the optical density (OD\(_{600nm}\)) as a function of time and see that growth is reasonably consistent between strains and their replicates. The shaded gray bar indicates an OD 0.3, which is the density at which we typically make our measurements. In Fig. 29(B), we show the associated fold-change measurements (using flow cytometry). While it does look like there is a steady increase in fold-change from 0 to 4 hours, it seems to level off past this time point. However, there was also a large degree of variation in our measurements, making it difficult to say that the fold-change is not changing over time.

In Fig. 31, we also plot the raw fluorescence values against the measured OD\(_{600nm}\) values. The variation is rather large, but it does appear that the overall expression is relatively constant across two decades of OD\(_{600nm}\).

Figure 29: Time course measurement of single-cell fluorescence by flow cytometry—data set 1. Flow cytometry measurements were performed at different time points following a 1000-fold dilution of an overnight culture. Cell strains were grown in M9 minimal media supplemented with 0.5% w/v glucose and IPTG c=50\,\mu\text{M}. OD_{600nm} measurements are shown for the three strains. (B) The fold-change is calculated for each measurement shown in Panel (A). Note that each measurement represents a different culture grown in a 96-deep-well plate.
Figure 30: Time course measurement of single-cell fluorescence versus OD_{600nm}—data set 1. Fluorescence measurements used to calculate fold-change from Fig. 29(B) are plotted against their OD_{600nm}. Error bars represent standard deviation from the triplicate culture measurements from growth in a 96-deep-well plate.

In a separate set of replicates, we observed more consistent fold-change measurements over these later time points. Fig. 31(A) shows the average single-cell fluorescence from these measurements. While there does appear to be a downward trend in both the \(R=260\) strain and the \(\Delta\)lacI strain, this is perhaps due to cultures leaving exponential growth (mistakenly, OD\(_{600nm}\) was not measured in this attempt). In contrast to the data in Fig. 30(B), we found that fold-change did not appreciably change across these measurements (Fig. 31(B)). Given the differences across these two sets of experiments, it will be essential to perform more experiments before drawing any definite opinions about the above results.

Figure 31: Time course measurement of single-cell fluorescence by flow cytometry - data set 2. Flow cytometry measurements were performed at different time points following a 1000-fold dilution of an overnight culture. Cell strains were grown in M9 minimal media supplemented with 0.5% w/v glucose and IPTG c=50\,\mu\text{M}. Mean fluorescence values are shown for strain O2 \Delta\varepsilon_{RA} = -13.9~k_BT with R=260, O2 \Delta lacI, and an autofluorescence strain. Data points represent measurements from separate 500\,\mu\text{L} cell cultures. The fold change is calculated for each measurement shown in Panel .

E. coli Primer and Strain List

Here we provide additional details about the strains’ genotypes and the primer sequences used to generate them. E. coli strains were derived from K12 MG1655. For those containing \(R=22\), we used strain HG104, which additionally has the lacYZA operon deleted (positions 360,483 to 365,579) but still contains the native lacI locus. All other strains used strain HG105, where both the lacYZA and lacI operons have both been deleted (positions 360,483 to 366,637).

All 25x+11-yfp expression constructs were integrated at the galK locus (between positions 1,504,078 and 1,505,112) while the 3*1x-lacI constructs were integrated at the ybcN locus (between positions 1,287,628 and 1,288,047). Integration was performed with \(\lambda\) Red recombineering  [35] as described in  [4]. We follow the notation of Lutz and Bujard  [36] for the nomenclature of the different constructs used. Specifically, the first number refers to the antibiotic resistance cassette that is present for selection (2 = kanamycin, 3 = chloramphenicol, and 4 = spectinomycin), and the second number refers to the promoter used to drive the expression of either YFP or LacI (1 = \(P_{LtetO-1}\), and 5 = lacUV5). Note that in 25x+11-yfp, x refers to the LacI operator used, which is centered at +11 (or begins at the transcription start site). For the different LacI constructs, 3*1x-lacI, x refers to the different ribosomal binding site modifications that provide different repressor copy numbers and follows from  [4]. The asterisk refers to the presence of FLP recombinase sites flanking the chloramphenicol resistance gene that can be used to lose this resistance. However, we maintained the resistance gene in our constructs. A summary of the final genotypes of each strain is listed in tbl. 4. In addition, each strain also contained the plasmid pZS4*1-mCherry and provided constitutive expression of the mCherry fluorescent protein. This pZS plasmid is a low copy (SC101 origin of replication) where like with 3*1x-lacI, mCherry is driven by a \(P_{LtetO-1}\) promoter.

Table 4: Each strain contains a unique operator-yfp construct for measurement of fluorescence, and \(R\) refers to the dimer copy number as measured by  [4].
Strain Genotype
O1, \(R = 0\) HG105::galK <>25O1+11-yfp
O1, \(R = 22\) HG104::galK <>25O1+11-yfp
O1, \(R = 60\) HG105::galK <>25O1+11-yfp, ybcN <>3*1RBS1147-lacI
O1, \(R = 124\) HG105::galK <>25O1+11-yfp, ybcN <>3*1RBS1027-lacI
O1, \(R = 260\) HG105::galK <>25O1+11-yfp, ybcN <>3*1RBS446-lacI
O1, \(R = 1220\) HG105::galK <>25O1+11-yfp, ybcN <>3*1RBS1-lacI
O1, \(R = 1740\) HG105::galK <>25O1+11-yfp, ybcN <>3*1-lacI (RBS1L)
O2, \(R = 0\) HG105::galK <>25O2+11-yfp
O2, \(R = 22\) HG104::galK <>25O2+11-yfp
O2, \(R = 60\) HG105::galK <>25O2+11-yfp, ybcN <>3*1RBS1147-lacI
O2, \(R = 124\) HG105::galK <>25O2+11-yfp, ybcN <>3*1RBS1027-lacI
O2, \(R = 260\) HG105::galK <>25O2+11-yfp, ybcN <>3*1RBS446-lacI
O2, \(R = 1220\) HG105::galK <>25O2+11-yfp, ybcN <>3*1RBS1-lacI
O2, \(R = 1740\) HG105::galK <>25O2+11-yfp, ybcN <>3*1-lacI (RBS1L)
O3, \(R = 0\) HG105::galK <>25O3+11-yfp
O3, \(R = 22\) HG104::galK <>25O3+11-yfp
O3, \(R = 60\) HG105::galK <>25O3+11-yfp, ybcN <>3*1RBS1147-lacI
O3, \(R = 124\) HG105::galK <>25O3+11-yfp, ybcN <>3*1RBS1027-lacI
O3, \(R = 260\) HG105::galK <>25O3+11-yfp, ybcN <>3*1RBS446-lacI
O3, \(R = 1220\) HG105::galK <>25O3+11-yfp, ybcN <>3*1RBS1-lacI
O3, \(R = 1740\) HG105::galK <>25O3+11-yfp, ybcN <>3*1-lacI (RBS1L)
Oid, \(R = 0\) HG105::galK <>25Oid+11-yfp
Oid, \(R = 22\) HG104::galK <>25Oid+11-yfp
Oid, \(R = 60\) HG105::galK <>25Oid+11-yfp, ybcN <>3*1RBS1147-lacI
Oid, \(R = 124\) HG105::galK <>25Oid+11-yfp, ybcN <>3*1RBS1027-lacI
Oid, \(R = 260\) HG105::galK <>25Oid+11-yfp, ybcN <>3*1RBS446-lacI
Oid, \(R = 1220\) HG105::galK <>25Oid+11-yfp, ybcN <>3*1RBS1-lacI
Oid, \(R = 1740\) HG105::galK <>25Oid+11-yfp, ybcN <>3*1-lacI (RBS1L)

References

[1]
R. C. Brewster, F. M. Weinert, H. G. Garcia, D. Song, M. Rydenfelt, and R. Phillips, The transcription factor titration effect dictates level of gene expression, Cell 156, 1312 (2014).
[2]
A. K. Gardino, B. F. Volkman, H. S. Cho, S.-Y. Lee, D. E. Wemmer, and D. Kern, The NMR Solution structure of BeF3-activated Spo0F reveals the conformational switch in a phosphorelay system, Journal of Molecular Biology 331, 245 (2003).
[3]
S. Boulton and G. Melacini, Advances in NMR methods to map allosteric sites: From models to translation, Chemical Reviews 116, 6267 (2016).
[4]
H. G. Garcia and R. Phillips, Quantitative dissection of the simple repression input-output function, Proceedings of the National Academy of Sciences 108, 12173 (2011).
[5]
F. M. Weinert, R. C. Brewster, M. Rydenfelt, R. Phillips, and W. K. Kegel, Scaling of gene expression with transcription-factor fugacity, Physical Review Letters 113, 1 (2014).
[6]
S. Oehler, M. Amouyal, P. Kolkhof, B. von Wilcken-Bergmann, and B. Müller-Hill, Quality and position of the three lac operators of E. coli define efficiency of repression, The EMBO Journal 13, 3348 (1994).
[7]
Y. Setty, A. E. Mayo, M. G. Surette, and U. Alon, Detailed map of a cis-regulatory input function, Proceedings of the National Academy of Sciences 100, 7702 (2003).
[8]
S. Oehler, S. Alberti, and B. Müller-Hill, Induction of the lac promoter in the absence of DNA loops and the stoichiometry of induction, Nucleic Acids Research 34, 606 (2006).
[9]
R. Daber, K. Sharp, and M. Lewis, One is not enough, Journal of Molecular Biology 392, 1133 (2009).
[10]
R. Daber, M. A. Sochor, and M. Lewis, Thermodynamic analysis of mutant Lac repressors, Journal of Molecular Biology 409, 76 (2011).
[11]
J. M. G. Vilar and L. Saiz, Reliable prediction of complex phenotypes from a modular design in free energy space: An extensive exploration of the lac operon, ACS Synthetic Biology 2, 576 (2013).
[12]
D. L. Shis, F. Hussain, S. Meinhardt, L. Swint-Kruse, and M. R. Bennett, Modular, multi-input transcriptional logic Gating with orthogonal LacI/GalR family chimeras, ACS Synthetic Biology 3, 645 (2014).
[13]
M. A. Sochor, In vitro transcription accurately predicts lac repressor phenotype in vivo in Escherichia coli, PeerJ 2, e498 (2014).
[14]
M. Rydenfelt, H. G. Garcia, R. S. Cox, and R. Phillips, The influence of promoter architectures and regulatory motifs on gene expression in Escherichia coli, PLoS ONE 9, 1 (2014).
[15]
A. Schmidt, K. Kochanowski, S. Vedelaar, E. Ahrné, B. Volkmer, L. Callipo, K. Knoops, M. Bauer, R. Aebersold, and M. Heinemann, The quantitative and condition-dependent Escherichia coli proteome, Nature Biotechnology 34, 104 (2015).
[16]
Chroma2ptTechnology2ptCorporation, Chroma spectra viewer, (2016).
[17]
H. T. Maecker, A. Rinfret, P. D’Souza, J. Darden, E. Roig, C. Landry, P. Hayes, J. Birungi, O. Anzala, M. Garcia, A. Harari, I. Frank, R. Baydo, M. Baker, J. Holbrook, J. Ottinger, L. Lamoreaux, C. L. Epling, E. Sinclair, M. A. Suni, K. Punt, S. Calarota, S. El-Bahi, G. Alter, H. Maila, E. Kuta, J. Cox, C. Gray, M. Altfeld, N. Nougarede, J. Boyer, L. Tussey, T. Tobery, B. Bredt, M. Roederer, R. Koup, V. C. Maino, K. Weinhold, G. Pantaleo, J. Gilmour, H. Horton, and R. P. Sekaly, Standardization of cytokine flow cytometry assays, BMC Immunology 6, 13 (2005).
[18]
K. Lo, R. R. Brinkman, and R. Gottardo, Automated gating of flow cytometry data via robust model-based clustering, Cytometry Part A 73A, 321 (2008).
[19]
N. Aghaeepour, G. Finak, The2ptFlowCAP2ptConsortium, The2ptDREAM2ptConsortium, H. Hoos, T. R. Mosmann, R. Brinkman, R. Gottardo, and R. H. Scheuermann, Critical assessment of automated flow cytometry data analysis techniques, Nature Methods 10, 228 (2013).
[20]
H. G. Garcia, H. J. Lee, J. Q. Boedicker, and R. Phillips, Comparison and calibration of different reporters for quantitative analysis of gene expression, Biophysical Journal 101, 535 (2011).
[21]
A. D. Edelstein, M. A. Tsuchida, N. Amodaj, H. Pinkard, R. D. Vale, and N. Stuurman, Advanced methods of microscope control using \(\mu\)Manager software, Journal of Biological Methods 1, 10 (2014).
[22]
D. Marr and E. Hildreth, Theory of edge detection, Proceedings of the Royal Society B: Biological Sciences 207, 187 (1980).
[23]
K. F. Murphy, G. Balázsi, and J. J. Collins, Combinatorial promoter design for engineering noisy gene expression, Proceedings of the National Academy of Sciences 104, 12726 (2007).
[24]
K. F. Murphy, R. M. Adams, X. Wang, G. Balázsi, and J. J. Collins, Tuning and controlling gene expression noise in synthetic gene networks, Nucleic Acids Research 38, 2712 (2010).
[25]
J. K. Rogers, C. D. Guzman, N. D. Taylor, S. Raman, K. Anderson, and G. M. Church, Synthetic biosensors for precise gene control and real-time monitoring of metabolites, Nucleic Acids Research 43, 7648 (2015).
[26]
S. Frank, Input-output relations in biological systems: measurement, information and the Hill equation, Biology Direct 8, 31 (2013).
[27]
J. N. Weiss, The Hill equation revisited: uses and misuses, The FASEB Journal 11, 835 (1997).
[28]
J. E. Keymer, R. G. Endres, M. Skoge, Y. Meir, and N. S. Wingreen, Chemosensing in Escherichia coli: Two regimes of two-state receptors, Proceedings of the National Academy of Sciences 103, 1786 (2006).
[29]
L. R. Swem, D. L. Swem, N. S. Wingreen, and B. L. Bassler, Deducing receptor signaling parameters from in vivo analysis: LuxN/AI-1 quorum sensing in Vibrio harveyi, Cell 134, 461 (2008).
[30]
T. Einav, L. Mazutis, and R. Phillips, Statistical mechanics of allosteric enzymes, The Journal of Physical Chemistry B 121, (2016).
[31]
D. Sivia and J. Skilling, Data analysis: a Bayesian tutorial (OUP Oxford, 2006).
[32]
L. Bintu, N. E. Buchler, H. G. Garcia, U. Gerland, T. Hwa, J. Kondev, T. Kuhlman, and R. Phillips, Transcriptional regulation by the numbers: Applications, Current Opinion in Genetics & Development 15, 125 (2005).
[33]
S. Marzen, H. G. Garcia, and R. Phillips, Statistical mechanics of Monod-Wyman-Changeux (MWC) models, Journal of Molecular Biology 425, 1433 (2013).
[34]
S. Gama-Castro, H. Salgado, A. Santos-Zavaleta, D. Ledezma-Tejeida, L. Muñiz-Rascado, J. S. García-Sotelo, K. Alquicira-Hernández, I. Martínez-Flores, L. Pannier, J. A. Castro-Mondragón, A. Medina-Rivera, H. Solano-Lira, C. Bonavides-Martínez, E. Pérez-Rueda, S. Alquicira-Hernández, L. Porrón-Sotelo, A. López-Fuentes, A. Hernández-Koutoucheva, V. D. Moral-Chávez, F. Rinaldi, and J. Collado-Vides, RegulonDB version 9.0: high-level integration of gene regulation, coexpression, motif clustering and beyond, Nucleic Acids Research 44, D133 (2016).
[35]
S. K. Sharan, L. C. Thomason, S. G. Kuznetsov, and D. L. Court, Recombineering: a homologous recombination-based method of genetic engineering, Nature Protocols 4, 206 (2009).
[36]
R. Lutz and H. Bujard, Independent and tight regulation of transcriptional units in Escherichia coli via the LacR/O, the TetR/O and AraC/I1-I2 regulatory elements, Nucleic Acids Research 25, 1203 (1997).