Chapter II

A Predictive Theory of Allosteric Induction

Published as …

A version of this chapter originally appeared as Razo-Mejia, M.†, Barnes, S.L.†, Belliveau, N.M.†, Chure, G.†, Einav, T.†, Lewis, M., and Phillips, R. (2018). Tuning transcriptional regulation through signaling: A predictive theory of allosteric induction. Cell Systems 6, 456-469.e10. DOI:https://doi.org/10.1016/j.cels.2018.02.004. † M.R.M, S.L.B, N.M.B, G.C., and T.E. contributed equally to this work from the theoretical underpinnings to the experimental design and execution. M.R.M, S.L.B, N.M.B, G.C, T.E., and R.P. wrote the paper. M.L. provided guidance and advice.

Abstract

Allosteric regulation is found across all domains of life. Yet, we still lack simple, predictive theories that directly link the experimentally tunable parameters of a system to its input-output response. To that end, we present a general theory of allosteric transcriptional regulation using the Monod-Wyman-Changeux model. We rigorously test this model using the ubiquitous simple repression motif in bacteria by first predicting the behavior of strains that span a large range of repressor copy numbers and DNA binding strengths and then constructing and measuring their response. Our model not only accurately captures the induction profiles of these strains, but also enables us to derive analytic expressions for key properties such as the dynamic range and \([EC_{50}]\). Finally, we derive an expression for the free energy of allosteric repressors, which enables us to collapse our experimental data onto a single master curve that captures the diverse phenomenology of the induction profiles.

Introduction

Understanding how organisms sense and respond to changes in their environment has long been a central theme of biological inquiry. At the cellular level, this interaction is mediated by a diverse collection of molecular signaling pathways. A pervasive mechanism of signaling in these pathways is allosteric regulation, in which the binding of a ligand induces a conformational change in some target molecule, triggering a signaling cascade  [1]. One of the most important examples of such signaling is offered by transcriptional regulation, where a transcription factor’s propensity to bind to DNA will be altered upon binding to an allosteric effector.

Despite allostery’s ubiquity, we lack a formal, rigorous, and generalizable framework for studying its effects across the broad variety of contexts in which it appears. A key example of this is transcriptional regulation, in which allosteric transcription factors can be induced or corepressed by binding to a ligand. An allosteric transcription factor can adopt multiple conformational states, each of which has its own affinity for the ligand and its DNA target site. In vitro studies have rigorously quantified the equilibria of different conformational states for allosteric transcription factors and measured the affinities of these states to the ligand  [2,3]. In spite of these experimental observations, the lack of a coherent quantitative model for allosteric transcriptional regulation has made it impossible to predict the behavior of even a simple genetic circuit across a range of regulatory parameters.

The ability to predict circuit behavior robustly—across both broad ranges of parameters and regulatory architectures—is important for multiple reasons. First, in the context of a specific gene, accurate prediction demonstrates that all components relevant to the gene’s behavior have been identified and characterized to sufficient quantitative precision. Second, in the context of genetic circuits in general, robust prediction validates the model that generated the prediction. Possessing a validated model also has implications for future work. For example, when we have sufficient confidence in the model, a single data set can be used to extrapolate a system’s behavior in other conditions accurately. Moreover, there is an essential distinction between a predictive model, which is used to predict a system’s behavior given a set of input variables, and a retroactive model, which describes the behavior of data that has already been obtained. We note that even some of the most careful and rigorous analysis of transcriptional regulation often entails only a retroactive reflection on a single experiment. This raises the fear that each regulatory architecture may require a unique analysis that cannot carry over to other systems, a worry that is exacerbated by the prevalent use of phenomenological functions (e.g., Hill functions or ratios of polynomials) that can analyze a single data set, but cannot be used to extrapolate a system’s behavior in other conditions  [48].

This work explores what happens when theory takes center stage; namely, we first write down the equations governing a system and describe its expected behavior across a wide array of experimental conditions, and only then do we set out to experimentally confirm these results. Building upon previous work  [911] and the work of Monod, Wyman, and Changeux  [12], we present a statistical mechanical rendering of allostery in the context of induction and corepression (shown schematically in Fig. 1 and henceforth referred to as the MWC model) and use it as the basis of parameter-free predictions, which we then test experimentally. More specifically, we study the simple repression motif —a widespread bacterial genetic regulatory architecture in which binding of a transcription factor occludes binding of an RNA polymerase, thereby inhibiting transcription initiation. The MWC model stipulates that an allosteric protein fluctuates between two distinct conformations —an active and inactive state— in thermodynamic equilibrium  [12]. During induction, for example, effector binding increases the probability that a repressor will be in the inactive state, weakening its ability to bind to the promoter and resulting in increased expression. To test the predictions of our model across a wide range of operator binding strengths and repressor copy numbers, we design an E. coli genetic construct in which the binding probability of a repressor regulates gene expression of a fluorescent reporter.

In total, the work presented here demonstrates that one extremely compact set of parameters can be applied self-consistently and predictively to different regulatory situations including simple repression on the chromosome, cases in which decoy binding sites for repressor are put on plasmids, cases in which multiple genes compete for the same regulatory machinery, cases involving multiple binding sites for repressor leading to DNA looping, and induction by signaling  [9,10,1316]. Thus, rather than viewing the behavior of each circuit as giving rise to its unique input-output response, the MWC model provides a means to characterize these seemingly diverse behaviors using a single unified framework governed by a small set of parameters.

Figure 1: Transcription regulation architectures involving an allosteric repressor. We consider a promoter regulated solely by an allosteric repressor. When bound, the repressor prevents RNAP from binding and initiating transcription. Induction is characterized by the addition of an effector which binds to the repressor and stabilizes the inactive state (defined as the state which has a low affinity for DNA), thereby increasing gene expression. In corepression, the effector stabilizes the repressor’s active state and thus further reduces gene expression. We list several characterized examples of induction and corepression that support different physiological roles in E. coli  [17,18]. A schematic regulatory response of the two architectures shown in Panel plotting the fold-change in gene expression as a function of effector concentration, where fold-change is defined as the ratio of gene expression in the presence versus the absence of repressor. We consider the following key phenotypic properties that describe each response curve: the minimum response (leakiness), the maximum response (saturation), the difference between the maximum and minimum response (dynamic range), the concentration of ligand which generates a fold-change halfway between the minimal and maximal response ([EC_{50}]), and the log-log slope at the midpoint of the response (effective Hill coefficient). (C) Over time, we have refined our understanding of simple repression architectures. A first round of experiments used colorimetric assays and quantitative Western blots to investigate how single-site repression is modified by the repressor copy number and repressor-DNA binding energy  [9]. A second round of experiments used video microscopy to probe how the copy number of the promoter and presence of competing repressor binding sites affect gene expression and we use this data set to determine the free energy difference between the repressor’s inactive and active conformations  [11]. Here we used flow cytometry to determine the inducer-repressor dissociation constants and demonstrate that with these parameters, we can predict a priori the behavior of the system for any repressor copy number, DNA binding energy, gene copy number, and inducer concentration.

Results

Characterizing Transcription Factor Induction using the Monod-Wyman-Changeux (MWC) Model

We begin by considering a simple repression genetic architecture in which the binding of an allosteric repressor occludes the binding of RNA polymerase (RNAP) to the DNA  [19,20]. When an effector (hereafter referred to as an "inducer" for the case of induction) binds to the repressor, it shifts the repressor’s allosteric equilibrium towards the inactive state as specified by the MWC model  [12]. This causes the repressor to bind more weakly to the operator, which increases gene expression. Simple repression motifs in the absence of inducer have been previously characterized by an equilibrium model where the probability of each state of repressor and RNAP promoter occupancy is dictated by the Boltzmann distribution  [9,10,1922] (we note that non-equilibrium models of simple repression have been shown to have the same functional form that we derive below  [23]). We extend these models to consider allostery by accounting for the equilibrium state of the repressor through the MWC model.

Thermodynamic models of gene expression begin by enumerating all possible states of the promoter and their corresponding statistical weights. As shown in Fig. 2(A), the promoter can either be empty, occupied by RNAP, or occupied by either an active or inactive repressor. The probability of binding to the promoter will be affected by the protein copy number, which we denote as \(P\) for RNAP, \(R_{A}\) for active repressor, and \(R_{I}\) for inactive repressor. We note that repressors fluctuate between the active and inactive conformation in thermodynamic equilibrium, such that \(R_{A}\) and \(R_{I}\) will remain constant for a given inducer concentration  [12]. We assign the repressor a different DNA binding affinity in the active and inactive state. In addition to the specific binding sites at the promoter, we assume that there are \(N_{NS}\) non-specific binding sites elsewhere (i.e., on parts of the genome outside the simple repression architecture) where the RNAP or the repressor can bind. All specific binding energies are measured relative to the average non-specific binding energy. Thus, \(\Delta\varepsilon_{P}\) represents the energy difference between the specific and non-specific binding for RNAP to the DNA. Likewise, \(\Delta\varepsilon_{RA}\) and \(\Delta\varepsilon_{RI}\) represent the difference in specific and non-specific binding energies for repressor in the active or inactive state, respectively.

Figure 2: States and weights for the simple repression motif. RNAP (light blue) and a repressor compete for binding to a promoter of interest. There are R_A repressors in the active state (red) and R_I repressors in the inactive state (purple). The difference in energy between a repressor bound to the promoter of interest versus another non-specific site elsewhere on the DNA equals \Delta\varepsilon_{RA} in the active state and \Delta\varepsilon_{RI} in the inactive state; the P RNAP have a corresponding energy difference \Delta\varepsilon_{P} relative to non-specific binding on the DNA. N_{NS} represents the number of non-specific binding sites for both RNAP and repressor. A repressor has an active conformation (red, left column) and an inactive conformation (purple, right column), with the energy difference between these two states given by \Delta \varepsilon_{AI}. The inducer (blue circle) at concentration c can bind to the repressor with dissociation constants K_A in the active state and K_I in the inactive state. The eight states for a dimer with n=2 inducer binding sites are shown along with the sums of the active and inactive states.

Thermodynamic models of transcription  [911,1922,2426] posit that gene expression is proportional to the probability that the RNAP is bound to the promoter \(p_{\text{bound}}\), which is given by \[ p_\text{bound} = \frac{\frac{P}{N_{NS}}e^{-\beta \Delta\varepsilon_{P}}}{1+\frac{R_A}{N_{NS}} e^{-\beta \Delta\varepsilon_{RA}} + \frac{R_I}{N_{NS}}e^{-\beta \Delta\varepsilon_{RI}} + \frac{P}{N_{NS}}e^{-\beta\Delta\varepsilon_{P}}}, \label{eq:p_bound_definition} \] with \(\beta = \frac{1}{k_BT}\) where \(k_B\) is the Boltzmann constant and \(T\) is the temperature of the system. As \(k_BT\) is the natural unit of energy at the molecular length scale, we treat the products \(\beta \Delta\varepsilon_{j}\) as single parameters within our model. Measuring \(p_{\text{bound}}\) directly is fraught with experimental difficulties, as determining the exact proportionality between expression and \(p_{\text{bound}}\) is not straightforward. Instead, we measure the fold-change in gene expression due to the presence of the repressor. We define fold-change as the ratio of gene expression in the presence of repressor relative to expression in the absence of repressor (i.e., constitutive expression), namely, \[ \text{fold-change} \equiv \frac{p_\text{bound}(R > 0)}{p_\text{bound}(R = 0)}. \label{eq:fold_change_definition} \] We can simplify this expression using two well-justified approximations: (1) the RNAP binds weakly to the promoter, implying that \(\frac{P}{N_{NS}}e^{-\beta\Delta\varepsilon_{P}}\ll 1\) (\(N_{NS} = 4.6 \times 10^6\), \(P \approx 10^3\)  [27], \(\Delta\varepsilon_{P} \approx -2 \,\, \text{to} \, -5~k_B T\)  [14], so that \(\frac{P}{N_{NS}}e^{-\beta\Delta\varepsilon_{P}} \approx 0.01\)) and (2) \(\frac{R_I}{N_{NS}}e^{-\beta \Delta\varepsilon_{RI}} \ll 1 + \frac{R_A}{N_{NS}} e^{-\beta\Delta\varepsilon_{RA}}\) which reflects our assumption that the inactive repressor binds weakly to the promoter of interest. Using these approximations, the fold-change reduces to the form \[ \text{fold-change} \approx \left(1+\frac{R_A}{N_{NS}}e^{-\beta \Delta\varepsilon_{RA}}\right)^{-1} \equiv \left( 1+p_A(c) \frac{R}{N_{NS}}e^{-\beta \Delta\varepsilon_{RA}} \right)^{-1}, \label{eq:fold_change_approx} \] where in the last step, we have introduced the fraction \(p_A(c)\) of repressors in the active state given a concentration \(c\) of inducer, such that \(R_A(c)=p_A(c) R\). Since inducer binding shifts the repressors from the active to the inactive state, \(p_A(c)\) grows smaller as \(c\) increases  [28].

We use the MWC model to compute the probability \(p_A(c)\) that a repressor with \(n\) inducer binding sites will be active. The value of \(p_A(c)\) is given by the sum of the weights of the active repressor states divided by the sum of the weights of all possible repressor states (see Fig. 2(B)), namely, \[ p_A(c) = \frac{\left(1+\frac{c}{K_A}\right)^n}{\left(1+\frac{c}{K_A}\right)^n + e^{-\beta \Delta \varepsilon_{AI} }\left(1+\frac{c}{K_I}\right)^n}, \label{eq:p_active} \] where \(K_A\) and \(K_I\) represent the dissociation constant between the inducer and repressor in the active and inactive states, respectively, and \(\Delta \varepsilon_{AI} = \varepsilon_{I} - \varepsilon_{A}\) is the free energy difference between a repressor in the inactive and active state (the quantity \(e^{-\Delta \beta \varepsilon_{AI}}\) is sometimes denoted by \(L\)  [12,28] or \(K_{\text{RR}*}\)  [26]). In this equation, \(\frac{c}{K_A}\) and \(\frac{c}{K_I}\) represent the change in free energy when an inducer binds to a repressor in the active or inactive state, respectively, while \(e^{-\beta \Delta \varepsilon_{AI} }\) represents the change in free energy when the repressor changes from the active to the inactive state in the absence of inducer. Thus, a repressor that favors the active state in the absence of inducer (\(\Delta \varepsilon_{AI} > 0\)) will be driven towards the inactive state upon inducer binding when \(K_I < K_A\). The specific case of a repressor dimer with \(n=2\) inducer binding sites is shown in Fig. 2(B).

Substituting \(p_A(c)\) from Eq. \(\ref{eq:p_active}\) into Eq. \(\ref{eq:fold_change_approx}\) yields the general formula for induction of a simple repression regulatory architecture  [23], namely, \[ \text{fold-change} = \left( 1+\frac{\left(1+\frac{c}{K_A}\right)^n}{\left(1+\frac{c}{K_A}\right)^n + e^{-\beta \Delta \varepsilon_{AI} }\left(1+\frac{c}{K_I}\right)^n} \frac{R}{N_{NS}}e^{-\beta \Delta\varepsilon_{RA}} \right)^{-1}. \label{eq:fold_change_full} \] While we have used the specific case of simple repression with induction to craft this model, the same mathematics describe the case of corepression in which binding of an allosteric effector stabilizes the active state of the repressor and decreases gene expression (see Fig. 1(B)). Interestingly, we shift from induction (governed by \(K_I < K_A\)) to corepression (\(K_I > K_A\)) as the ligand transitions from preferentially binding to the inactive repressor state to stabilizing the active state. Furthermore, this general approach can be used to describe a variety of other motifs such as activation, multiple repressor binding sites, and combinations of activator and repressor binding sites  [10,11,24].

The formula presented in Eq. \(\ref{eq:fold_change_full}\) enables us to make precise quantitative statements about induction profiles. Motivated by the broad range of predictions implied by Eq. \(\ref{eq:fold_change_full}\), we designed a series of experiments using the lac system in E. coli to tune the control parameters for a simple repression genetic circuit. As discussed in Fig. 1(C), previous studies from our lab have provided well-characterized values for many of the parameters in our experimental system, leaving only the values of the MWC parameters (\(K_A\), \(K_I\), and \(\Delta \varepsilon_{AI}\)) to be determined. We note that while previous studies have obtained values for \(K_A\), \(K_I\), and \(L=e^{-\beta \Delta \varepsilon_{AI}}\)  [26,29], they were either based upon biochemical experiments or in vivo conditions involving poorly characterized transcription factor copy numbers and gene copy numbers. These differences relative to our experimental conditions and fitting techniques led us to believe that it was important to perform our own analysis of these parameters. After inferring these three MWC parameters (see Sec. 4.2 for details regarding the inference of \(\Delta \varepsilon_{AI}\), which was fitted separately from \(K_A\) and \(K_I\)), we were able to predict the input/output response of the system under a broad range of experimental conditions. For example, this framework can predict the response of the system at different repressor copy numbers \(R\), repressor-operator affinities \(\Delta\varepsilon_{RA}\), inducer concentrations \(c\), and gene copy numbers.

Experimental Design

We test our model by predicting the induction profiles for an array of strains that could be made using previously characterized repressor copy numbers and DNA binding energies. Our approach contrasts with previous studies that have parameterized induction curves of simple repression motifs, as these have relied on expression systems where proteins are expressed from plasmids, resulting in highly variable and unconstrained copy numbers  [26,3033]. Instead, our approach relies on a foundation of previous work as depicted in Fig. 1(C). This includes work from our laboratory that used E. coli constructs based on components of the lac system to demonstrate how the Lac repressor (LacI) copy number \(R\) and operator binding energy \(\Delta\varepsilon_{RA}\) affect gene expression in the absence of inducer  [9].  [34] extended the theory used in that work to the case of multiple promoters competing for a given transcription factor, which was validated experimentally by  [10], who modified this system to consider expression from multiple-copy plasmids as well as the presence of competing repressor binding sites.

The present study extends this body of work by introducing three additional biophysical parameters—\(\Delta\varepsilon_{AI}\), \(K_A\), and \(K_I\)—which capture the allosteric nature of the transcription factor and complement the results shown by  [9] and  [10]. Although the current work focuses on systems with a single site of repression, in Sec. 2.5, we utilize data from  [10], in which multiple sites of repression are explored to characterize the allosteric free energy difference \(\Delta\varepsilon_{AI}\) between the repressor’s active and inactive states. As explained in that section, this additional data set is critical because multiple degenerate sets of parameters can characterize an induction curve equally well, with the \(\Delta\varepsilon_{AI}\) parameter compensated by the inducer dissociation constants \(K_A\) and \(K_I\) (see Sec. 4.2). After fixing \(\Delta\varepsilon_{AI}\) as described in the Sec. 2.5, we can use data from single-site simple repression systems to determine the values of \(K_A\) and \(K_I\).

We determine the values of \(K_A\) and \(K_I\) by fitting to a single induction profile using Bayesian inferential methods  [35]. We then use Eq. \(\ref{eq:fold_change_full}\) to predict gene expression for any concentration of inducer, repressor copy number, and DNA binding energy and compare these predictions against experimental measurements. To obtain induction profiles for a set of strains with varying repressor copy numbers, we used modified lacI ribosomal binding sites from  [9] to generate strains with mean repressor copy number per cell of \(R = 22 \pm 4\), \(60 \pm 20\), \(124 \pm 30\), \(260 \pm 40\), \(1220 \pm 160\), and \(1740 \pm 340\), where the error denotes the standard deviation of at least three replicates as measured by  [9]. We note that \(R\) refers to the number of repressor dimers in the cell, which is twice the number of repressor tetramers reported by  [9]; since both heads of the repressor are always assumed to be either specifically or non-specifically bound to the genome, the two repressor dimers in each LacI tetramer can be considered independently. Gene expression was measured using a Yellow Fluorescent Protein (YFP) gene, driven by a lacUV5 promoter. Each of the six repressor copy number variants were paired with the native O1, O2, or O3 lac operator  [36] placed at the YFP transcription start site, thereby generating eighteen unique strains. The repressor-operator binding energies (O1 \(\Delta\varepsilon_{RA} = -15.3 \pm 0.2~k_BT\), O2 \(\Delta\varepsilon_{RA} = -13.9~k_BT \pm 0.2\), and O3 \(\Delta\varepsilon_{RA} = -9.7 \pm 0.1~k_BT\)) were previously inferred by measuring the fold-change of the lac system at different repressor copy numbers, where the error arises from model fitting  [9]. Additionally, we were able to obtain the value \(\Delta \varepsilon_{AI} = 4.5\; k_BT\) by fitting to previous data as discussed in Sec. 2.5. We measure fold-change over a range of known IPTG concentrations \(c\), using \(n=2\) inducer binding sites per LacI dimer and approximating the number of non-specific binding sites as the length in base-pairs of the E. coli genome, \(N_{NS} = 4.6 \times 10^6\).

Our experimental pipeline for determining fold-change using flow cytometry is shown in Fig. 3. Briefly, cells were grown to exponential phase, in which gene expression reaches steady-state  [37], under concentrations of the inducer IPTG ranging between 0 and \(5\,\text{mM}\). We measure YFP fluorescence using flow cytometry and automatically gate the data to include only single-cell measurements (see Sec. 2.5). To validate the use of flow cytometry, we also measured the fold-change of a subset of strains using the established method of single-cell microscopy (see Sec. 4.5). We found that the fold-change measurements obtained from microscopy were indistinguishable from that of flow-cytometry and yielded values for the inducer binding constants \(K_A\) and \(K_I\) that were within error.

Figure 3: An experimental pipeline for high-throughput fold-change measurements. Cells are grown to an exponential steady-state, and their fluorescence is measured using flow cytometry. Automatic gating methods using forward- and side-scattering are used to ensure that all measurements come from single cells (see Sec. 2.5). Mean expression is then quantified at different IPTG concentrations (top, blue histograms) and for a strain without repressor (bottom, green histograms), which shows no response to IPTG as expected. Fold-change is computed by dividing the mean fluorescence in the presence of repressor by the mean fluorescence in the absence of repressor.

Determination of the in vivo MWC Parameters

The three parameters that we tune experimentally are shown in Fig. 4(A), leaving the three allosteric parameters (\(\Delta \varepsilon_{AI}\), \(K_A\), and \(K_I\)) to be determined by fitting. We used previous LacI fold-change data  [10] to infer that \(\Delta\varepsilon_{AI} = 4.5~k_BT\) (see Sec. 4.2). Rather than fitting \(K_A\) and \(K_I\) to our entire data set of eighteen unique constructs, we performed Bayesian parameter estimation on data from a single strain with \(R=260\) and an O2 operator (\(\Delta\varepsilon_{RA}=-13.9~k_BT\)  [9]) shown in Fig. 4(D) (white circles). Using Markov Chain Monte Carlo, we determine the most likely parameter values to be \(K_A=139^{+29}_{-22} \times 10^{-6} \, \text{M}\) and \(K_I=0.53^{+0.04}_{-0.04} \times 10^{-6}\, \text{M}\), which are the modes of their respective distributions, where the superscripts and subscripts represent the upper and lower bounds of the \(95^\text{th}\) percentile of the parameter value distributions (see Fig. 4(B)). Unfortunately, we cannot make a meaningful value-for-value comparison of our parameters to those of earlier studies  [26,31] because of uncertainties in gene copy number and transcription factor copy numbers in these studies. We then predicted the fold-change for the remaining seventeen strains with no further fitting (see Fig. 4(C)-(E)) together with the specific phenotypic properties described in and discussed in detail below (see Fig. 4(F)-(J)). The shaded regions denote the 95% credible regions. Factors determining the width of the credible regions are explored in Sec. 4.6.

We stress that the entire suite of predictions is based upon a single strain’s induction profile. Our ability to make such a broad range of predictions stems from the fact that our parameters of interest—such as the repressor copy number and DNA binding energy—appear as distinct physical parameters within our model. While the single data set in Fig. 4(D) could also be fit using a Hill function, such an analysis would be unable to predict any of the other curves in the figure (see Sec. 4.7). Phenomenological expressions such as the Hill function can describe data but lack predictive power and are thus unable to build our intuition, help us design de novo input-output functions, or guide future experiments  [25,30].

Figure 4: Predicting induction profiles for different biological control parameters. (A) Schematic representation of experimentally accessible variables. Repressor copy number R is tuned by changing the sequence of the ribosomal binding site (RBS), DNA binding energy \Delta\varepsilon_{RA} is controlled via the sequence of the operator, and the inducer concentration c is controlled via a dilution series. (B) Markov Chain Monte Carlo (MCMC) sampling of the posterior distribution of K_A and K_I. Each point corresponds to a single MCMC sample. Distribution on top and right represent the marginal posterior probability distribution over K_A and K_I, respectively. (C)-(E) Predicted induction profiles for strains with various repressor copy numbers and DNA binding energies. White-faced points represent those to which the inducer binding constants K_A and K_I were determined. (F)-(J) Predicted properties of the induction profiles in (C) using parameter values known a priori. The shaded regions denote the 95% credible region. Region between 0 and 10^{-2}\, \muM is scaled linearly with log scaling elsewhere. The Python code (ch2_fig04.py) used to generate this figure can be found on the original paper’s GitHub repository.

Comparison of Experimental Measurements with Theoretical Predictions

We tested the predictions shown in Fig. 4 by measuring fold-change induction profiles in strains with a broad range of repressor copy numbers and repressor binding energies as characterized in  [9]. With a few notable exceptions, the results shown in Fig. 5 demonstrate agreement between theory and experiment. We note that there was an apparently systematic shift in the O3 \(\Delta\varepsilon_{RA} = -9.7\ k_BT\) strains (Fig. 5(C)) and all of the \(R=1220\) and \(R =1740\) strains. This may be partially due to imprecise previous determinations of their \(\Delta\varepsilon_{RA}\) and \(R\) values. By performing a global fit where we infer all parameters, including the repressor copy number \(R\) and the binding energy \(\Delta\varepsilon_{RA}\), we found a better agreement for these strains. However, a discrepancy in the steepness of the response for all O3 strains remains (see Sec. 4.8). We considered a number of hypotheses to explain these discrepancies, such as including other states (e.g. non-negligible binding of the inactive repressor), relaxing the weak promoter approximation, and accounting for variations in gene and repressor copy number throughout the cell cycle, but none explained the observed discrepancies. As an additional test of our model, we considered strains using the synthetic Oid operator, which exhibits an especially strong binding energy of \(\Delta\varepsilon_{RA}=-17~k_B T\)  [9]. The global fit agrees well with the Oid microscopy data, though it asserts a stronger Oid binding energy of \(\Delta\varepsilon_{RA}=-17.7~k_B T\) (see Sec. 4.8).

To ensure that the agreement between our predictions and data is not an accident of the strain we used to perform our fitting, we also inferred \(K_A\) and \(K_I\) from each of the other strains. As shown in Sec. 4.10 and Fig. 5(D), the inferred values of \(K_A\) and \(K_I\) depend minimally upon which strain is chosen, indicating these parameter values are highly robust. We also performed a global fit using the data from all eighteen strains in which we fitted for the inducer dissociation constants \(K_A\) and \(K_I\), the repressor copy number \(R\), and the repressor DNA binding energy \(\Delta\varepsilon_{RA}\) (see Sec. 4.8). The resulting parameter values were nearly identical to those fitted from any single strain. We continue using parameters fitted from the strain with \(R=260\) repressors and an O2 operator for the remainder of the text.

Figure 5: Comparison of predictions against measured and inferred data. (A-C) Flow cytometry measurements of fold-change over a range of IPTG concentrations for O1, O2, and O3 strains at varying repressor copy numbers overlaid on the predicted responses. Error bars for the experimental data show the standard error of the mean (eight or more replicates). As discussed in Fig. 4, all predicted induction curves were generated prior to measurement by inferring the MWC parameters using a single data set (O2 R=260, shown by white circles in Panel (B)). The predictions may therefore depend upon which strain is used to infer the parameters. (D) The inferred parameter values of the dissociation constants K_A and K_I using any of the eighteen strains instead of the O2 R=260 strain. Nearly identical parameter values are inferred from each strain, demonstrating that the same set of induction profiles would have been predicted regardless of which strain was chosen. The points show the mode, and the error bars denote the 95\% credible region of the parameter value distribution. Error bars not visible are smaller than the size of the marker. The Python code (ch2_fig05.py) used to generate this figure can be found on the original paper’s GitHub repository.

Predicting the Phenotypic Traits of the Induction Response

A subset of the properties shown in Fig. 1 (i.e., the leakiness, saturation, dynamic range, \([EC_{50}]\), and effective Hill coefficient) are of significant interest to synthetic biology. For example, synthetic biology is often focused on generating large responses (i.e., a large dynamic range) or finding a strong binding partner (i.e., a small \([EC_{50}]\))  [38,39]. While these properties are all individually informative, they capture the essential features of the induction response when taken together. We reiterate that a Hill function approach cannot predict these features a priori and requires fitting each curve individually. The MWC model, on the other hand, enables us to quantify how each trait depends upon a single set of physical parameters as shown by Fig. 4(F-J).

We define these five phenotypic traits using expressions derived from the model, Eq. \(\ref{eq:fold_change_full}\). These results build upon extensive work in  [40], where many such properties were computed for ligand-receptor binding within the MWC model. We begin by analyzing the leakiness, which is the minimum fold-change observed in the absence of ligand, given by \[ \begin{split} \text{leakiness} &= \text{fold-change}(c=0) \\ &= \left( 1+\frac{1}{1+e^{-\beta \Delta \varepsilon_{AI} }} \frac{R}{N_{NS}}e^{-\beta \Delta\varepsilon_{RA}} \right)^{-1}, \end{split} \label{eq:leakiness} \] and the saturation, which is the maximum fold change observed in the presence of saturating ligand, \[ \begin{split} \text{saturation} &= \text{fold-change}(c \to \infty) \\ &= \left( 1+\frac{1}{1+e^{-\beta \Delta \varepsilon_{AI} } \left(\frac{K_A}{K_I}\right)^n }\frac{R}{N_{NS}} e^{-\beta \Delta\varepsilon_{RA}} \right)^{-1}. \end{split} \label{eq:saturation} \]

Systems that minimize leakiness repress strongly in the absence of the effector, while systems that maximize saturation have high expression in the presence of the effector. Together, these two properties determine the dynamic range of a system’s response, which is given by the difference \[ \text{dynamic range} = \text{saturation} - \text{leakiness}. \label{eq:dynamic_range_def} \] These three properties are shown in Fig. 4(F-H). We discuss these properties in greater detail in Sec. 4.11. Fig. 6(A-C) shows that the measurements of these three properties, derived from the fold-change data in the absence of IPTG and the presence of saturating IPTG closely match the predictions for all three operators.

Figure 6: Predictions and experimental measurements of key properties of induction profiles. Data for the leakiness, saturation, and dynamic range are obtained from fold-change measurements in Fig. 5 in the absence of IPTG and at saturating concentrations of IPTG. The three repressor-operator binding energies in the legend correspond to the O1 operator (-15.3~k_B T), O2 operator (-13.9~k_B T), and O3 operator (-9.7~k_B T). Both the [EC_{50}] and effective Hill coefficient are inferred by individually fitting each operator-repressor pairing in Fig. 5(A-C) separately to Eq. \ref{eq:fold_change_full} to smoothly interpolate between the data points. Error bars for (A-C) represent the standard error of the mean for eight or more replicates; error bars for (D-E) represent the 95% credible region for the parameter found by propagating the credible region of our estimates of K_A and K_I into Eq. \ref{eq:ec50} and Eq. \ref{eq:effective_Hill}. The Python code (ch2_fig06.py) used to generate this figure can be found on the original paper’s GitHub repository.

Two additional properties of induction profiles are the \([EC_{50}]\) and effective Hill coefficient, which determine the range of inducer concentration in which the system’s output goes from its minimum to maximum value. The \([EC_{50}]\) denotes the inducer concentration required to generate a system response halfway between its minimum and maximum value, \[ \text{fold-change}(c = [EC_{50}]) = \frac{\text{leakiness} + \text{saturation}}{2}. \label{eq:ec50} \] The effective Hill coefficient \(h\), which quantifies the steepness of the curve at the \([EC_{50}]\)  [28], is given by \[ h = \left( 2 \frac{d}{d \log c} \left[ \log \left( \frac{ \text{fold-change}(c) - \text{leakiness}}{\text{dynamic range}} \right) \right] \right)_{c = [EC_{50}]}. \label{eq:effective_Hill} \] Fig. 4(I),(J) shows how the \([EC_{50}]\) and effective Hill coefficient depend on the repressor copy number. Sec. 4.11 discusses the analytic forms of these two properties and their dependence on the repressor-DNA binding energy.

Fig. 6(D) and Fig. 6(E) shows the estimated values of the \([EC_{50}]\) and the effective Hill coefficient overlaid on the theoretical predictions. Both properties were obtained by fitting Eq. \(\ref{eq:fold_change_full}\) to each individual titration curve and computing the \([EC_{50}]\) and effective Hill coefficient using Eq. \(\ref{eq:ec50}\) and Eq. \(\ref{eq:effective_Hill}\), respectively. We find that the predictions made with the single strain fit closely match those made for each of the strains with O1 and O2 operators, but the predictions for the O3 operator are markedly off. Sec. 4.10 shows that the large, asymmetric error bars for the O3 \(R=22\) strain arise from its nearly flat response, where the lack of dynamic range makes it impossible to determine the value of the inducer dissociation constants \(K_A\) and \(K_I\), as can be seen in the uncertainty of both the \([EC_{50}]\) and effective Hill coefficient. Discrepancies between theory and data for O3 are improved but not fully resolved by performing a global fit or fitting the MWC model individually to each curve (see Sec. 4.8). It remains an open question on how to account for discrepancies in O3, particularly regarding the significant mismatch between the predicted and fitted effective Hill coefficients.

Data Collapse of Induction Profiles

Our primary interest heretofore was to determine the system response at a specific inducer concentration, repressor copy number, and repressor-DNA binding energy. However, the cell does not necessarily “care about” the precise number of repressors in the system or the binding energy of an individual operator. The relevant quantity for cellular function is the fold-change enacted by the regulatory system. This raises the question: given a specific value of the fold-change, what combination of parameters will give rise to this desired response? In other words, what trade-offs between the parameters of the system will give rise to the same mean cellular output? These are key questions for understanding how the system is governed and for engineering specific responses in a synthetic biology context. To address these questions, we follow the data collapse strategy used in a number of previous studies  [4143], and rewrite Eq. \(\ref{eq:fold_change_full}\) as a Fermi function, \[ \text{fold-change}= \frac{1}{1 + e^{-F(c)}}, \label{eq:free_energy_definition} \] where \(F(c)\) is the free energy of the repressor binding to the operator of interest relative to the unbound operator state in \(k_B T\) units  [23,42,43], which is given by \[ F(c) = \frac{\Delta\varepsilon_{RA}}{k_BT} - \log \frac{\left(1+\frac{c}{K_A}\right)^n}{\left(1+\frac{c}{K_A}\right)^n + e^{-\beta \Delta\varepsilon_{AI} }\left(1+\frac{c}{K_I}\right)^n} - \log \frac{R}{N_{NS}}. \label{eq:free_energy_MWC_parameters} \] The first term in \(F(c)\) denotes the repressor-operator binding energy, the second the contribution from the inducer concentration, and the last the effect of the repressor copy number. We note that elsewhere, this free energy has been dubbed the Bohr parameter since such families of curves are analogous to the shifts in hemoglobin binding curves at different pHs known as the Bohr effect  [23,44,45].

Instead of analyzing each induction curve individually, the free energy provides a natural means to simultaneously characterize the diversity in our eighteen induction profiles. Fig. 7(A) demonstrates how the various induction curves from Fig. 4(C-E) all collapse onto a single master curve, where points from every induction profile that yield the same fold-change are mapped onto the same free energy. Fig. 7(B) shows this data collapse for the 216 data points in Fig. 5(A-C), demonstrating the close match between the theoretical predictions and experimental measurements across all eighteen strains.

Many different combinations of parameter values can result in the same free energy as defined in Eq. \(\ref{eq:free_energy_MWC_parameters}\). For example, suppose a system initially has a fold-change of 0.2 at a specific inducer concentration, and then operator mutations increase the \(\Delta\varepsilon_{RA}\) binding energy  [46]. While this initially increases both the free energy and the fold-change, a subsequent increase in the repressor copy number could bring the cell back to the original fold-change level. Such trade-offs hint that there need not be a single set of parameters that evoke a specific cellular response, but rather that the cell explores a large but degenerate space of parameters with multiple, equally valid paths.

Figure 7: Fold-change data from a broad collection of different strains collapse onto a single master curve. (A) Any combination of parameters can be mapped to a single physiological response (i.e., fold-change) via the free energy, which encompasses the parametric details of the model. (B) Experimental data from collapse onto a single master curve as a function of the free energy Eq. \ref{eq:free_energy_MWC_parameters}. The free energy for each strain was calculated from Eq. \ref{eq:free_energy_MWC_parameters} using n=2, \Delta\varepsilon_{AI}=4.5~k_BT, K_A=139 \times 10^{-6} \, \text{M}, K_I=0.53 \times 10^{-6}\, \text{M}, and the strain-specific R and \Delta\varepsilon_{RA}. All data points represent the mean, and error bars are the standard error of the mean for eight or more replicates. The Python code (ch2_fig07.py) used to generate this figure can be found on the original paper’s GitHub repository.

Discussion

Since the early work by Monod, Wyman, and Changeux  [12,47], an array of biological phenomena has been tied to the existence of macromolecules that switch between inactive and active states. Examples can be found in a wide variety of cellular processes, including ligand-gated ion channels  [48], enzymatic reactions  [45,49], chemotaxis  [42], quorum sensing  [43], G-protein coupled receptors  [50], physiologically important proteins  [51,52], and beyond. One of the most ubiquitous examples of allostery is in the context of gene expression, where an array of molecular players bind to transcription factors to influence their ability to regulate gene activity  [17,18]. A number of studies have focused on developing a quantitative understanding of allosteric regulatory systems.  [28,40] analytically derived fundamental properties of the MWC model, including the leakiness and dynamic range described in this work, noting the inherent trade-offs in these properties when tuning the model’s parameters. Work in the Church and Voigt labs, among others, has expanded on the availability of allosteric circuits for synthetic biology  [7,8,53,54]. Recently, Daber et al. theoretically explored the induction of simple repression within the MWC model  [31] and experimentally measured how mutations alter the induction profiles of transcription factors  [26]. Vilar and Saiz analyzed a variety of interactions in inducible lac-based systems, including the effects of oligomerization and DNA folding on transcription factor induction  [6,55]. Other work has attempted to use the lac system to reconcile in vitro and in vivo measurements  [33,56].

Although this body of work has done much to improve our understanding of allosteric transcription factors, there have been few attempts to connect quantitative models to experiments explicitly. Here, we generate a predictive model of allosteric transcriptional regulation and then test the model against a thorough set of experiments using well-characterized regulatory components. Specifically, we used the MWC model to build upon a well-established thermodynamic model of transcriptional regulation  [9,24], allowing us to compose the model from a minimal set of biologically meaningful parameters. This model combines both theoretical and experimental insights; for example, rather than considering gene expression directly, we analyze the fold-change in expression, where the weak promoter approximation circumvents uncertainty in the RNAP copy number. The resulting model depended upon experimentally accessible parameters, namely, the repressor copy number, the repressor-DNA binding energy, and inducer concentration. We tested these predictions on a range of strains whose repressor copy number spanned two orders of magnitude and whose DNA binding affinity spanned 6 \(k_BT\). We argue that one would not generate such a wide array of predictions by using a Hill function, which abstracts away the biophysical meaning of the parameters into phenomenological parameters  [57].

More precisely, we tested our model in the context of a lac-based simple repression system by first determining the allosteric dissociation constants \(K_A\) and \(K_I\) from a single induction data set (O2 operator with binding energy \(\Delta \varepsilon_{RA} = -13.9~k_BT\) and repressor copy number \(R = 260\)), and then using these values to make parameter-free predictions of the induction profiles for seventeen other strains where \(\Delta \varepsilon_{RA}\) and \(R\) were varied significantly. We next measured the induction profiles of these seventeen strains using flow cytometry and found that our predictions consistently and accurately captured the primary features for each induction data set, as shown in Fig. 5. Importantly, we find that fitting \(K_A\) and \(K_I\) to data from any other strain would have resulted in nearly identical predictions (see Sec. 4.10 for further details). This suggests that a few carefully chosen measurements can lead to a deep quantitative understanding of how simple regulatory systems work without requiring an extensive sampling of strains that span the parameter space. Moreover, the fact that we could consistently achieve reliable predictions after fitting only two free parameters stand in contrast to the common practice of fitting several free parameters simultaneously, which can nearly guarantee an acceptable fit provided that the model roughly resembles the system response, regardless of whether the details of the model are tied to any underlying molecular mechanism.

Beyond observing changes in fold-change as a function of effector concentration, our application of the MWC model allows us to predict the values of explicitly the induction curves’ key parameters, namely, the leakiness, saturation, dynamic range, \([EC_{50}]\), and the effective Hill coefficient. We are consistently able to accurately predict the leakiness, saturation, and dynamic range for each of the strains. For both the O1 and O2 data sets, our model also accurately predicts the effective Hill coefficient and \([EC_{50}]\), though these predictions for O3 are noticeably less accurate. While performing a global fit for all model parameters marginally improves the prediction for O3 (see Sec. 4.8), we are still unable to predict the effective Hill coefficient or accurately the \([EC_{50}]\). We further tried including additional states (such as allowing the inactive repressor to bind to the operator), relaxing the weak promoter approximation, accounting for changes in gene and repressor copy number throughout the cell cycle  [58], and refitting the original binding energies from  [13], but we were still unable to account for the O3 data. It remains an open question as to how the discrepancy between the theory and measurements for O3 can be reconciled.

The dynamic range, which is of considerable interest when designing or characterizing a genetic circuit is revealed to have an interesting property: although changing the value of \(\Delta \varepsilon_{RA}\) causes the dynamic range curves to shift to the right or left, each curve has the same shape and in particular the same maximum value. This means that strains with strong or weak binding energies can attain the same dynamic range when the value of \(R\) is tuned to compensate for the binding energy. This feature is not immediately apparent from the IPTG induction curves, which show very low dynamic ranges for several of the O1 and O3 strains. Without the benefit of models that can predict such phenotypic traits, efforts to engineer genetic circuits with allosteric transcription factors must rely on trial and error to achieve specific responses  [7,8].

Despite the diversity observed in the induction profiles of each of our strains, our data are unified by their reliance on fundamental biophysical parameters. In particular, we have shown that our model for fold-change can be rewritten in terms of the free energy Eq. \(\ref{eq:free_energy_MWC_parameters}\), which encompasses all of the physical parameters of the system. This has proven to be an illuminating technique in a number of studies of allosteric proteins  [4143]. Although it is experimentally straightforward to observe system responses to changes in effector concentration \(c\), framing the input-output function in terms of \(c\) can give the misleading impression that changes in system parameters lead to fundamentally altered system responses. Alternatively, suppose one can find the “natural variable” that enables the output to collapse onto a single curve. In that case, it becomes clear that the system’s output is not governed by individual system parameters but rather the contributions of multiple parameters that define the natural variable. When our fold-change data are plotted against each construct’s respective free energies, they collapse cleanly onto a single curve. This enables us to analyze how parameters can compensate for each other. For example, rather than viewing strong repression as a consequence of low IPTG concentration \(c\) or high repressor copy number \(R\), we can now observe that strong repression is achieved when the free energy \(F(c) \leq -5 k_BT\), a condition which can be reached in a number of ways.

While our experiments validated the theoretical predictions in the case of simple repression, we expect the framework presented here to apply much more generally to different biological instances of allosteric regulation. For example, we can use this model to study more complex systems, such as when transcription factors interact with multiple operators  [24]. We can further explore different regulatory configurations such as corepression, activation, and coactivation, each of which are found in E. coli (see Sec. 4.12). This work can also serve as a springboard to characterize not just the mean but the full gene expression distribution and thus quantify the impact of noise on the system  [59]. Another extension of this approach would be to theoretically predict and experimentally verify whether the repressor-inducer dissociation constants \(K_A\) and \(K_I\) or the energy difference \(\Delta \varepsilon_{AI}\) between the allosteric states can be tuned by making single amino acid substitutions in the transcription factor  [23,26]. Finally, we expect that the rigorous quantitative description of the allosteric phenomenon provided here will make it possible to construct biophysical models of fitness for allosteric proteins similar to those already invoked to explore the fitness effects of transcription factor binding site strengths and protein stability  [6062].

To conclude, we find that our application of the MWC model provides an accurate, predictive framework for understanding simple repression by allosteric transcription factors. To reach this conclusion, we analyzed the model in the context of a well-characterized system, in which each parameter had a clear biophysical meaning. As many of these parameters had been measured or inferred in previous studies, this gave us a minimal model with only two free parameters, which we inferred from a single data set. We then accurately predicted the behavior of seventeen other data sets in which repressor copy number and repressor-DNA binding energy were systematically varied. In addition, our model allowed us to understand how key properties such as the leakiness, saturation, dynamic range, \([EC_{50}]\), and effective Hill coefficient depended upon the small set of parameters governing this system. Finally, we show that by framing inducible simple repression in terms of free energy, the data from all of our experimental strains collapse cleanly onto a single curve, illustrating the many ways in which a particular output can be targeted. In total, these results show that a thermodynamic formulation of the MWC model supersedes phenomenological fitting functions for understanding transcriptional regulation by allosteric proteins.

Materials & Methods

Bacterial Strains and DNA Constructs

All strains used in these experiments were derived from E. coli K12 MG1655 with the lac operon removed, adapted from those created and described in  [9,13]. Briefly, the operator variants and YFP reporter gene were cloned into a pZS25 background which contains a lacUV5 promoter that drives expression, as is shown schematically in Fig. 2. These constructs carried a kanamycin resistance gene and were integrated into the galK locus of the chromosome using \(\lambda\) Red recombineering  [63]. The lacI gene was constitutively expressed via a P\(_\mathrm{LtetO\hbox{-}1}\) promoter  [53], with ribosomal binding site mutations made to vary the LacI copy number as described in  [64] using site-directed mutagenesis (Quickchange II; Stratagene), with further details in  [9]. These lacI constructs carried a chloramphenicol resistance gene and were integrated into the ybcN locus of the chromosome. Final strain construction was achieved by performing repeated P1 transduction  [65] of the different operator and lacI constructs to generate each combination used in this work. Integration was confirmed by PCR amplification of the replaced chromosomal region and by sequencing. Primers and final strain genotypes are listed in Sec. 4.15.

It is important to note that the rest of the lac operon (lacZYA) was never expressed. The LacY protein is a transmembrane protein that actively transports lactose as well as IPTG into the cell. As LacY was never produced in our strains, we assume that the extracellular and intracellular IPTG concentration was approximately equal due to diffusion across the membrane into the cell, as suggested by previous work  [66].

To make this theory applicable to transcription factors with any number of DNA binding domains, we used a different definition for repressor copy number than has been used previously. We define the LacI copy number as the average number of repressor dimers per cell, whereas in  [9], the copy number is defined as the average number of repressor tetramers in each cell. To motivate this decision, we consider that the LacI repressor molecule exists as a tetramer in E. coli  [67] in which a single DNA binding domain is formed from dimerization of LacI proteins so that wild-type LacI might be described as dimer of dimers. Since each dimer is allosterically independent (i.e., either dimer can be allosterically active or inactive, independent of the configuration of the other dimer)  [31], a single LacI tetramer can be treated as two functional repressors. Therefore, we have multiplied the number of repressors reported in  [9] by a factor of two. This factor is included as a keyword argument in the numerous Python functions used to perform this analysis, as discussed in the code documentation.

A subset of strains in these experiments was measured using fluorescence microscopy to validate the flow cytometry data and results. To aid in the high-fidelity segmentation of individual cells, the strains were modified to express an mCherry fluorophore constitutively. This reporter was cloned into a pZS4*1 backbone  [53] in which mCherry is driven by the lacUV5 promoter. All microscopy and flow cytometry experiments were performed using these strains.

Growth Conditions for Flow Cytometry Measurements

All measurements were performed with E. coli cells grown to mid-exponential phase in standard M9 minimal media (M9 5X Salts, Sigma-Aldrich M6030; \(2\,\text{mM}\) magnesium sulfate, Mallinckrodt Chemicals 6066-04; \(100\,\mu\text{M}\) calcium chloride, Fisher Chemicals C79-500) supplemented with 0.5% (w/v) glucose. Briefly, \(500\,\mu\text{L}\) cultures of E. coli were inoculated into Lysogeny Broth (LB Miller Powder, BD Medical) from a 50% glycerol frozen stock (-80\(^\circ\)C) and were grown overnight in a \(2\,\text{mL}\) 96-deep-well plate sealed with a breathable nylon cover (Lab Pak - Nitex Nylon, Sefar America Inc. Cat. No. 241205) with rapid agitation for proper aeration. After approximately \(12\) to \(15\) hours, the cultures had reached saturation and were diluted 1000-fold into a second \(2\,\text{mL}\) 96-deep-well plate where each well contained \(500\,\mu\text{L}\) of M9 minimal media supplemented with 0.5% w/v glucose (anhydrous D-Glucose, Macron Chemicals) and the appropriate concentration of IPTG (Isopropyl \(\beta\)-D-1 thiogalactopyranoside Dioxane Free, Research Products International). These were sealed with a breathable cover and were allowed to grow for approximately eight hours. Cells were then diluted ten-fold into a round-bottom 96-well plate (Corning Cat. No. 3365) containing \(90\,\mu\text{L}\) of M9 minimal media supplemented with 0.5% w/v glucose along with the corresponding IPTG concentrations. For each IPTG concentration, a stock of 100-fold concentrated IPTG in double distilled water was prepared and partitioned into \(100\,\mu\text{L}\) aliquots. The same parent stock was used for all experiments described in this work.

Flow Cytometry

Unless explicitly mentioned, all fold-change measurements were collected on a Miltenyi Biotec MACSquant Analyzer 10 Flow Cytometer graciously provided by the Pamela Björkman lab at Caltech. Detailed information regarding the voltage settings of the photo-multiplier detectors can be found in Sec. 4.4. Prior to each day’s experiments, the analyzer was calibrated using MACSQuant Calibration Beads (Cat. No. 130-093-607) such that day-to-day experiments would be comparable. All YFP fluorescence measurements were collected via \(488\,\text{nm}\) laser excitation coupled with a 525/\(50\,\text{nm}\) emission filter. Unless otherwise specified, all measurements were taken over two to three hours using automated sampling from a 96-well plate kept at approximately \(4^\circ \, \hbox{-} \, 10^\circ\)C on a MACS Chill 96 Rack (Cat. No. 130-094-459). Cells were diluted to a final concentration of approximately \(4\times 10^{4}\) cells per \(\mu\text{L}\) which corresponded to a flow rate of 2,000-6,000 measurements per second, and acquisition for each well was halted after 100,000 events were detected. Once completed, the data were extracted and immediately processed using the following methods.

Unsupervised Gating of Flow Cytometry Data

Flow cytometry data will frequently include a number of spurious events or other undesirable data points such as cell doublets and debris. The process of restricting the collected data set to those determined to be “real” is commonly referred to as gating. These gates are typically drawn manually  [68] and restrict the data set to those points which display a high degree of linear correlation between their forward-scatter (FSC) and side-scatter (SSC). The development of unbiased and unsupervised methods of drawing these gates is an active area of research  [69,70]. For our purposes, we assume that the fluorescence level of the population should be log-normally distributed about some mean value. With this assumption in place, we developed a method that allows us to restrict the data used to compute the mean fluorescence intensity of the population to the smallest two-dimensional region of the \(\log(\mathrm{FSC})\) vs. \(\log(\mathrm{SSC})\) space in which 40% of the data is found. This was performed by fitting a bivariate Gaussian distribution and restricting the data used for the calculation to those that reside within the 40th percentile. This procedure is described in more detail in the supplementary information as well as in a Jupyter notebook located in this paper’s Github repository.

Experimental Determination of Fold-Change

For each strain and IPTG concentration, the fold-change in gene expression was calculated by taking the ratio of the population mean YFP expression in the presence of LacI repressor to that of the population mean in the absence of LacI repressor. However, the measured fluorescence intensity of each cell also includes the autofluorescence contributed by the weak excitation of the myriad protein and small molecules within the cell. To correct for this background, we computed the fold change as \[ \text{fold-change} = \frac{\langle I_{R > 0} \rangle - \langle I_\text{auto}\rangle}{\langle I_{R = 0} \rangle - \langle I_\text{auto}\rangle}, \label{eq:fold_change_fluor} \] where \(\langle I_{R > 0}\rangle\) is the average cell YFP intensity in the presence of repressor, \(\langle I_{R = 0}\rangle\) is the average cell YFP intensity in the absence of repressor, and \(\langle I_\text{auto} \rangle\) is the average cell autofluorescence intensity, as measured from cells that lack the lac-YFP construct.

Bayesian Parameter Estimation

In this work, we determine the most likely parameter values for the inducer dissociation constants \(K_A\) and \(K_I\) of the active and inactive state, respectively, using Bayesian methods. We compute the probability distribution of the value of each parameter given the data \(D\), which by Bayes’ theorem is given by \[ P(K_A, K_I \mid D) = \frac{P(D \mid K_A, K_I)P(K_A, K_I)}{P(D)}, \label{eq:bayes_theorem} \] where \(D\) is all the data composed of independent variables (repressor copy number \(R\), repressor-DNA binding energy \(\Delta\varepsilon_{RA}\), and inducer concentration \(c\)) and one dependent variable (experimental fold-change). \(P(D \mid K_A, K_I)\) is the likelihood of having observed the data given the parameter values for the dissociation constants, \(P(K_A, K_I)\) contains all the prior information on these parameters, and \(P(D)\) serves as a normalization constant, which we can ignore in our parameter estimation. Our model assumes a deterministic relationship between the parameters and the data, so to construct a probabilistic relationship as required by Eq. \(\ref{eq:bayes_theorem}\), we assume that the experimental fold-change for the \(i^\text{th}\) datum given the parameters is of the form \[ \text{fold-change} _{\exp}^{(i)} = \left( 1 + \frac{\left(1 + \frac{c^{(i)}}{K_A}\right)^2}{\left( 1 + \frac{c^{(i)}}{K_A}\right)^2 + e^{-\beta \Delta \varepsilon_{AI}} \left(1 + \frac{c^{(i)}}{K_I} \right)^2} \frac{R^{(i)}}{N_{NS}} e^{-\beta \Delta \varepsilon_{RA}^{(i)}}\right)^{-1} + \epsilon^{(i)}, \label{eq:fold_change_exp} \] where \(\epsilon^{(i)}\) represents the departure from the deterministic theoretical prediction for the \(i^\text{th}\) data point. If we assume that these \(\epsilon^{(i)}\) errors are normally distributed with mean zero and standard deviation \(\sigma\), the likelihood of the data given the parameters \(P(D \mid K_A, K_I, \sigma)\) is of the form \[ {\scriptstyle P(D \vert K_A, K_I, \sigma) = \frac{1}{(2\pi\sigma^2)^{\frac{n}{2}}} \prod\limits_{i=1}^n \exp \left[-\frac{(\text{fold-change}^{(i)}_{\exp} - \text{fold-change}(K_A, K_I, R^{(i)}, \Delta\varepsilon_{RA}^{(i)}, c^{(i)}))^2}{2\sigma^2}\right], } \label{eq:likelihood} \] where \(\text{fold-change}^{(i)}_{\text{exp}}\) is the experimental fold-change and \(\text{fold-change}(\,\cdots)\) is the theoretical prediction. The product \(\prod_{i=1}^n\) captures the assumption that the \(n\) data points are independent. Note that the likelihood and prior terms now include the extra unknown parameter \(\sigma\). In applying Eq. \(\ref{eq:likelihood}\), a choice of \(K_A\) and \(K_I\) that provides a better agreement between theoretical fold-change predictions and experimental measurements will result in a more probable likelihood.

Both mathematically and numerically, it is convenient to define \(\tilde{k}_A = -\log \frac{K_A}{1\,\text{M}}\) and \(\tilde{k}_I = -\log \frac{K_I}{1\,\text{M}}\) and fit for these parameters on a log scale. Dissociation constants are scale invariant, so that a change from \(10\,\mu\text{M}\) to \(1\,\mu\text{M}\) leads to an equivalent increase in affinity as a change from \(1\,\mu\text{M}\) to \(0.1\,\mu\text{M}\). With these definitions, we assume for the prior \(P(\tilde{k}_A, \tilde{k}_I, \sigma)\) that all three parameters are independent. In addition, we assume a uniform distribution for \(\tilde{k}_A\) and \(\tilde{k}_I\) and a Jeffreys prior  [35] for the scale parameter \(\sigma\). This yields the complete prior \[ P(\tilde{k}_A, \tilde{k}_I, \sigma) \equiv \frac{1}{(\tilde{k}_A^{\max} - \tilde{k}_A^{\min})} \frac{1}{(\tilde{k}_I^{\max} - \tilde{k}_I^{\min})}\frac{1}{\sigma}. \label{eq:ka_ki_prior} \] These priors are maximally uninformative, meaning that they imply no prior knowledge of the parameter values. We defined the \(\tilde{k}_A\) and \(\tilde{k}_A\) ranges uniform on the range of \(-7\) to \(7\), although we note that this particular choice does not affect the outcome provided the chosen range is sufficiently wide.

Putting all these terms together, we can now sample from \(P(\tilde{k}_A, \tilde{k}_I, \sigma \mid D)\) using Markov chain Monte Carlo (see GitHub repository) to compute the most likely parameter as well as the error bars (given by the 95% credible region) for \(K_A\) and \(K_I\).

Data Curation

All of the data used in this work and all relevant code can be found at this dedicated website. Data were collected, stored, and preserved using the Git version control software combined with off-site storage and hosting website GitHub. Code is used to generate all figures and complete all processing steps, and analyses are available on the GitHub repository. Many analysis files are stored as instructive Jupyter Notebooks. The scientific community is invited to fork our repositories and open constructive issues on the GitHub repository.

References

[1]
J. E. Lindsley and J. Rutter, Whence cometh the allosterome?, Proceedings of the National Academy of Sciences 103, 10533 (2006).
[2]
J. G. Harman, Allosteric regulation of the cAMP receptor protein, Biochimica Et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology 1547, 1 (2001).
[3]
M. F. Lanfranco, F. Gárate, A. J. Engdahl, and R. A. Maillard, Asymmetric configurations in a reengineered homodimer reveal multiple subunit communication pathways in protein allostery, The Journal of Biological Chemistry 292, 6086 (2017).
[4]
Y. Setty, A. E. Mayo, M. G. Surette, and U. Alon, Detailed map of a cis-regulatory input function, Proceedings of the National Academy of Sciences 100, 7702 (2003).
[5]
F. J. Poelwijk, M. G. J. deVos, and S. J. Tans, Tradeoffs and optimality in the evolution of gene regulation, Cell 146, 462 (2011).
[6]
J. M. G. Vilar and L. Saiz, Reliable prediction of complex phenotypes from a modular design in free energy space: An extensive exploration of the lac operon, ACS Synthetic Biology 2, 576 (2013).
[7]
J. K. Rogers, C. D. Guzman, N. D. Taylor, S. Raman, K. Anderson, and G. M. Church, Synthetic biosensors for precise gene control and real-time monitoring of metabolites, Nucleic Acids Research 43, 7648 (2015).
[8]
J. Rohlhill, N. R. Sandoval, and E. T. Papoutsakis, Sort-Seq approach to engineering a formaldehyde-inducible promoter for dynamically regulated Escherichia coli growth on methanol, ACS Synthetic Biology (2017).
[9]
H. G. Garcia and R. Phillips, Quantitative dissection of the simple repression input-output function, Proceedings of the National Academy of Sciences 108, 12173 (2011).
[10]
R. C. Brewster, F. M. Weinert, H. G. Garcia, D. Song, M. Rydenfelt, and R. Phillips, The transcription factor titration effect dictates level of gene expression, Cell 156, 1312 (2014).
[11]
F. M. Weinert, R. C. Brewster, M. Rydenfelt, R. Phillips, and W. K. Kegel, Scaling of gene expression with transcription-factor fugacity, Physical Review Letters 113, 1 (2014).
[12]
J. Monod, J. Wyman, and J.-P. Changeux, On the nature of allosteric transitions: A plausible model, Journal of Molecular Biology 12, 88 (1965).
[13]
H. G. Garcia, H. J. Lee, J. Q. Boedicker, and R. Phillips, Comparison and calibration of different reporters for quantitative analysis of gene expression, Biophysical Journal 101, 535 (2011).
[14]
R. C. Brewster, D. L. Jones, and R. Phillips, Tuning promoter strength through RNA polymerase binding site design in Escherichia coli, PLoS Computational Biology 8, e1002811 (2012).
[15]
J. Q. Boedicker, H. G. Garcia, and R. Phillips, Theoretical and experimental dissection of DNA loop-mediated repression, Physical Review Letters 110, 1 (2013).
[16]
S. J. James Boedicker Hernan Garcia, DNA sequence-dependent mechanics and protein-assisted bending in repressor-mediated loop formation, Physical Biology 10, 066005 (2013).
[17]
Z. Huang, L. Zhu, Y. Cao, G. Wu, X. Liu, Y. Chen, Q. Wang, T. Shi, Y. Zhao, Y. Wang, W. Li, Y. Li, H. Chen, G. Chen, and J. Zhang, ASD: A comprehensive database of allosteric proteins and modulators, Nucleic Acids Research 39, D663 (2011).
[18]
G.-W. Li, D. Burkhardt, C. Gross, and J. S. Weissman, Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources, Cell 157, 624 (2014).
[19]
G. K. Ackers, A. D. Johnson, and M. A. Shea, Quantitative model for gene regulation by lambda phage repressor, Proceedings of the National Academy of Sciences 79, 1129 (1982).
[20]
N. E. Buchler, U. Gerland, and T. Hwa, On schemes of combinatorial transcription logic, Proceedings of the National Academy of Sciences 100, 5136 (2003).
[21]
J. M. G. Vilar and S. Leibler, DNA looping and physical constraints on transcription regulation, Journal of Molecular Biology 331, 981 (2003).
[22]
L. Bintu, N. E. Buchler, H. G. Garcia, U. Gerland, T. Hwa, J. Kondev, and R. Phillips, Transcriptional regulation by the numbers: Models, Current Opinion in Genetics & Development 15, 116 (2005).
[23]
R. Phillips, Napoleon is in equilibrium, Annual Review of Condensed Matter Physics 6, 85 (2015).
[24]
L. Bintu, N. E. Buchler, H. G. Garcia, U. Gerland, T. Hwa, J. Kondev, T. Kuhlman, and R. Phillips, Transcriptional regulation by the numbers: Applications, Current Opinion in Genetics & Development 15, 125 (2005).
[25]
T. Kuhlman, Z. Zhang, M. H. Saier, and T. Hwa, Combinatorial transcriptional control of the lactose operon of Escherichia coli, Proceedings of the National Academy of Sciences 104, 6043 (2007).
[26]
R. Daber, M. A. Sochor, and M. Lewis, Thermodynamic analysis of mutant Lac repressors, Journal of Molecular Biology 409, 76 (2011).
[27]
S. Klumpp and T. Hwa, Growth-rate-dependent partitioning of RNA polymerases in bacteria, Proceedings of the National Academy of Sciences 105, 20245 (2008).
[28]
S. Marzen, H. G. Garcia, and R. Phillips, Statistical mechanics of Monod-Wyman-Changeux (MWC) models, Journal of Molecular Biology 425, 1433 (2013).
[29]
R. B. O’Gorman, J. M. Rosenberg, O. B. Kallai, R. E. Dickerson, K. Itakura, A. D. Riggs, and K. S. Matthews, Equilibrium binding of inducer to lac repressor-operator DNA complex, Journal of Biological Chemistry 255, 10107 (1980).
[30]
K. F. Murphy, G. Balázsi, and J. J. Collins, Combinatorial promoter design for engineering noisy gene expression, Proceedings of the National Academy of Sciences 104, 12726 (2007).
[31]
R. Daber, K. Sharp, and M. Lewis, One is not enough, Journal of Molecular Biology 392, 1133 (2009).
[32]
K. F. Murphy, R. M. Adams, X. Wang, G. Balázsi, and J. J. Collins, Tuning and controlling gene expression noise in synthetic gene networks, Nucleic Acids Research 38, 2712 (2010).
[33]
M. A. Sochor, In vitro transcription accurately predicts lac repressor phenotype in vivo in Escherichia coli, PeerJ 2, e498 (2014).
[34]
M. Rydenfelt, R. S. Cox, H. Garcia, and R. Phillips, Statistical mechanical model of coupled transcription from multiple promoters due to transcription factor titration, Physical Review E 89, 012702 (2014).
[35]
D. Sivia and J. Skilling, Data analysis: a Bayesian tutorial (OUP Oxford, 2006).
[36]
S. Oehler, M. Amouyal, P. Kolkhof, B. von Wilcken-Bergmann, and B. Müller-Hill, Quality and position of the three lac operators of E. coli define efficiency of repression, The EMBO Journal 13, 3348 (1994).
[37]
M. Scott, C. W. Gunderson, E. M. Mateescu, Z. Zhang, and T. Hwa, Interdependence of cell growth and gene expression: origins and consequences, Science 330, 1099 (2010).
[38]
J. A. N. Brophy and C. A. Voigt, Principles of genetic circuit design, Nature Methods 11, 508 (2014).
[39]
D. L. Shis, F. Hussain, S. Meinhardt, L. Swint-Kruse, and M. R. Bennett, Modular, multi-input transcriptional logic Gating with orthogonal LacI/GalR family chimeras, ACS Synthetic Biology 3, 645 (2014).
[40]
B. M. C. Martins and P. S. Swain, Trade-offs and constraints in allosteric sensing, PLoS Computational Biology 7, 1 (2011).
[41]
V. Sourjik and H. C. Berg, Receptor sensitivity in bacterial chemotaxis, Proceedings of the National Academy of Sciences 99, 123 (2002).
[42]
J. E. Keymer, R. G. Endres, M. Skoge, Y. Meir, and N. S. Wingreen, Chemosensing in Escherichia coli: Two regimes of two-state receptors, Proceedings of the National Academy of Sciences 103, 1786 (2006).
[43]
L. R. Swem, D. L. Swem, N. S. Wingreen, and B. L. Bassler, Deducing receptor signaling parameters from in vivo analysis: LuxN/AI-1 quorum sensing in Vibrio harveyi, Cell 134, 461 (2008).
[44]
L. A. Mirny, Nucleosome-mediated cooperativity between transcription factors, Proceedings of the National Academy of Sciences 107, 22534 (2010).
[45]
T. Einav, L. Mazutis, and R. Phillips, Statistical mechanics of allosteric enzymes, The Journal of Physical Chemistry B 121, (2016).
[46]
H. G. Garcia, A. Sanchez, J. Q. Boedicker, M. Osborne, J. Gelles, J. Kondev, and R. Phillips, Operator sequence alters gene expression independently of transcription factor occupancy in bacteria, Cell Reports 2, 150 (2012).
[47]
J. Monod, J.-P. Changeux, and F. Jacob, Allosteric proteins and cellular control systems, Journal of Molecular Biology 6, 306 (1963).
[48]
A. Auerbach, Thinking in cycles: MWC is a good model for acetylcholine receptor-channels, The Journal of Physiology 590, 93 (2012).
[49]
A. Velyvis, Y. R. Yang, H. K. Schachman, and L. E. Kay, A solution NMR study showing that active site ligands and nucleotides directly perturb the allosteric equilibrium in aspartate transcarbamoylase, Proceedings of the National Academy of Sciences 104, 8815 (2007).
[50]
M. Canals, J. R. Lane, A. Wen, P. J. Scammells, P. M. Sexton, and A. Christopoulos, A Monod-Wyman-Changeux mechanism can explain G protein-coupled receptor (GPCR) allosteric modulation, Journal of Biological Chemistry 287, 650 (2012).
[51]
R. Milo, J. H. Hou, M. Springer, M. P. Brenner, and M. W. Kirschner, The relationship between evolutionary and physiological variation in hemoglobin, Proceedings of the National Academy of Sciences 104, 16998 (2007).
[52]
M. Levantino, A. Spilotros, M. Cammarata, G. Schirò, C. Ardiccioni, B. Vallone, M. Brunori, and A. Cupane, The Monod-Wyman-Changeux allosteric model accounts for the quaternary transition dynamics in wild type and a recombinant mutant human hemoglobin, Proceedings of the National Academy of Sciences 109, 14894 (2012).
[53]
R. Lutz and H. Bujard, Independent and tight regulation of transcriptional units in Escherichia coli via the LacR/O, the TetR/O and AraC/I1-I2 regulatory elements, Nucleic Acids Research 25, 1203 (1997).
[54]
T. S. Moon, C. Lou, A. Tamsir, B. C. Stanton, and C. A. Voigt, Genetic programs constructed from layered logic gates in single cells, Nature 491, 249 (2012).
[55]
L. Saiz and J. M. G. Vilar, Ab initio thermodynamic modeling of distal multisite transcription regulation, Nucleic Acids Research 36, 726 (2008).
[56]
S. Tungtur, H. Skinner, H. Zhan, L. Swint-Kruse, and D. Beckett, In vivo tests of thermodynamic models of transcription repressor function, Biophysical Chemistry 159, 142 (2011).
[57]
S. Forsén and S. Linse, Cooperativity: over the Hill, Trends in Biochemical Sciences 20, 495 (1995).
[58]
D. L. Jones, R. C. Brewster, and R. Phillips, Promoter architecture dictates cell-to-cell variability in gene expression, Science 346, 1533 (2014).
[59]
A. Eldar and M. Elowitz, Functional roles for noise in genetic circuits, Nature 467, 167 (2010).
[60]
U. Gerland and T. Hwa, On the selection and evolution of regulatory DNA motifs, Journal of Molecular Evolution 55, 386 (2002).
[61]
J. Berg, S. Willmann, and M. Lässig, Adaptive evolution of transcription factor binding sites, BMC Evolutionary Biology 4, 42 (2004).
[62]
K. B. Zeldovich and E. I. Shakhnovich, Understanding protein evolution: From protein physics to Darwinian selection, Annual Review of Physical Chemistry 59, 105 (2008).
[63]
S. K. Sharan, L. C. Thomason, S. G. Kuznetsov, and D. L. Court, Recombineering: a homologous recombination-based method of genetic engineering, Nature Protocols 4, 206 (2009).
[64]
H. M. Salis, E. A. Mirsky, and C. A. Voigt, Automated design of synthetic ribosome binding sites to control protein expression, Nature Biotechnology 27, 946 (2009).
[65]
L. C. Thomason, N. Costantino, and D. L. Court, E. coli genome manipulation by P1 transduction, Current Protocols in Molecular Biology Chapter 1, Unit 1.17 (2007).
[66]
A. Fernández-Castané, C. E. Vine, G. Caminal, and J. López-Santín, Evidencing the role of lactose permease in IPTG uptake by Escherichia coli in fed-batch high cell density cultures, Journal of Biotechnology 157, 391 (2012).
[67]
M. Lewis, G. Chang, N. C. Horton, M. A. Kercher, H. C. Pace, M. A. Schumacher, R. G. Brennan, and P. Lu, Crystal structure of the lactose operon repressor and its complexes with DNA and inducer, Science 271, 1247 (1996).
[68]
H. T. Maecker, A. Rinfret, P. D’Souza, J. Darden, E. Roig, C. Landry, P. Hayes, J. Birungi, O. Anzala, M. Garcia, A. Harari, I. Frank, R. Baydo, M. Baker, J. Holbrook, J. Ottinger, L. Lamoreaux, C. L. Epling, E. Sinclair, M. A. Suni, K. Punt, S. Calarota, S. El-Bahi, G. Alter, H. Maila, E. Kuta, J. Cox, C. Gray, M. Altfeld, N. Nougarede, J. Boyer, L. Tussey, T. Tobery, B. Bredt, M. Roederer, R. Koup, V. C. Maino, K. Weinhold, G. Pantaleo, J. Gilmour, H. Horton, and R. P. Sekaly, Standardization of cytokine flow cytometry assays, BMC Immunology 6, 13 (2005).
[69]
K. Lo, R. R. Brinkman, and R. Gottardo, Automated gating of flow cytometry data via robust model-based clustering, Cytometry Part A 73A, 321 (2008).
[70]
N. Aghaeepour, G. Finak, The2ptFlowCAP2ptConsortium, The2ptDREAM2ptConsortium, H. Hoos, T. R. Mosmann, R. Brinkman, R. Gottardo, and R. H. Scheuermann, Critical assessment of automated flow cytometry data analysis techniques, Nature Methods 10, 228 (2013).