Gene Editing, Synthesis, and Assembly

Gene editing, synthesis, and assembly focuses on the development and advancement of tools to enable the production of chromosomal DNA and the engineering of entire genomes. Concurrently, advancements are needed in the design and construction of functional gene sequences, perhaps entirely synthetic. This section of the Roadmap focuses on these goals and the expansion and progression of tools and methods to make gene editing, synthesis, and assembly a precise and robust process.

Introduction & Impact


Transformative Tools & Technologies

Oligonucleotide synthesis technologies

Currently, phosphoramidite-based chemistry is the predominant approach for synthesizing oligonucleotides. Even after significant optimization, per-cycle synthesis yields are about 99.5%, so synthesis of a 200-nucleotide oligonucleotide has a yield of only about 35%. New technologies seek to improve this process by (a) synthesizing thousands of oligonucleotides in parallel, either on chip supports or within tiny microtiter wells; or (b) improving synthesis processivity by replacing the phosphoramidite-based chemistry, for example, by using enzyme catalysis (e.g. terminal deoxynucleotidyl transferases) to extend primers with defined nucleotides. A number of start-up companies are developing template-free oligonucleotide synthesis methods, including DNA Script, Nuclera Nucleics, Molecular Assemblies, and Ansa. Achieving picomole production of 1,000-mer oligonucleotides with error-free sequences would significantly improve the overall DNA assembly protocol.
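The yield figures above follow from compounding the per-cycle efficiency over the length of the oligonucleotide. A minimal Python sketch, assuming every coupling cycle succeeds independently with the same probability and ignoring purification losses (the function name is illustrative and not part of the Roadmap):

```python
def full_length_yield(cycle_efficiency: float, length_nt: int) -> float:
    """Fraction of strands expected to reach full length.

    Assumption: each of the length_nt coupling cycles succeeds
    independently with probability cycle_efficiency.
    """
    if not 0.0 <= cycle_efficiency <= 1.0:
        raise ValueError("cycle_efficiency must be between 0 and 1")
    if length_nt < 0:
        raise ValueError("length_nt must be non-negative")
    return cycle_efficiency ** length_nt


if __name__ == "__main__":
    # Compare a few oligonucleotide lengths at 99.5% per-cycle efficiency.
    for n in (60, 200, 1000):
        print(f"{n:>5} nt: {full_length_yield(0.995, n):.2%}")
```

Under this simple model a 200-mer at 99.5% per cycle comes out slightly above one third, consistent with the roughly 35% quoted above, and the yield falls off exponentially as the length grows.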

Technologies for oligonucleotide assembly into non-clonal DNA fragments

Currently, multiple 60- to 200-mer oligonucleotides are assembled into non-clonal DNA fragments using a combination of annealing, ligation, and/or polymerase chain reaction. The cost of synthesizing non-clonal DNA fragments is $0.10 to $0.30 per base pair, depending on size and complexity. DNA fragments between 300 and 1800 bp can be synthesized by multiple providers. DNA fragments up to 5800 bp can be synthesized by select providers at increased cost. Errors are introduced whenever two oligonucleotides form undesired base pairings, when two oligonucleotides are incorrectly ligated together, or when DNA polymerases extend a synthesized DNA fragment with an incorrect nucleotide. Certain sequence determinants will increase the error rate, resulting in a mixture of undesired fragments. Computational sequence design can reduce the frequency of these errors. Mismatch repair enzymes may be added (with added cost) to eliminate DNA fragments with mis-paired nucleotides, for example, as a result of mis-annealing or DNA polymerase errors. This process has been scaled up to assemble thousands of non-clonal DNA fragments per day. The purification of full-length, error-free DNA fragments remains a challenge. Utilizing longer oligonucleotides (see Oligonucleotide synthesis technologies) would enable the synthesis of longer non-clonal DNA fragments with the same error rate. New technologies utilizing nanopore sequencing have the potential to couple sequencing and purification at single-molecule resolution.

Multi-fragment DNA assembly techniques for clonal genetic systems and genomes

Currently, multiple DNA fragments (300 to 3000 bp long) are assembled into large genetic systems (10,000 to 1,000,000 base pairs long) using single-pot DNA assembly techniques that combine cocktails of bioprospected and/or engineered enzymes, including exonucleases, endonucleases, DNA polymerases, ligases, and/or recombinases. Enzyme costs are currently about $25 per assembly. Assembled DNA is then introduced into cells for clonal separation and replication. Most assembly techniques have essential sequence determinants, for example, regions of overlapping homology or flanking Type IIS restriction sites. Errors are introduced when two fragments anneal together at incorrect overlap regions, when two fragments are mis-ligated at incorrect ligation junctions, or when DNA polymerases incorporate incorrect nucleotides during DNA synthesis. Computational sequence design can limit the frequency of errors. A major challenge for DNA assembly is the trial-and-error identification of a full-length, error-free genetic system. For example, an optimized assembly technique with a per-junction efficiency of 90% will assemble a 10-part (3000 bp/part) system with 35% yield. At the same per-junction efficiency, assembling a 1,000,000 bp genome from 3000 bp DNA fragments will have a minuscule yield of 5.2×10^-14 %. This limitation of DNA assembly has motivated the synthesis of longer non-clonal DNA fragments (see Technologies for oligonucleotide assembly into non-clonal DNA fragments). For example, 1,000,000 bp genomes could be assembled from 10,000 bp, 30,000 bp, or 50,000 bp DNA fragments with a 0.002%, 2.7%, or 11% efficiency, respectively. If longer non-clonal DNA fragments are unavailable, then hierarchical approaches to DNA assembly are required, which increases the number of DNA assembly reactions and verification costs.
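The assembly yields quoted above can be approximated by raising the per-junction efficiency to the number of junctions. The sketch below assumes independent junctions and counts one junction per fragment, rounded up (as for a circular product); both the counting convention and the function names are assumptions made for illustration, so the printed values may differ slightly from the percentages in the text.

```python
import math


def junction_count(genome_bp: int, fragment_bp: int) -> int:
    """Number of junctions, assuming one junction per fragment (rounded up)."""
    if genome_bp <= 0 or fragment_bp <= 0:
        raise ValueError("lengths must be positive")
    return math.ceil(genome_bp / fragment_bp)


def assembly_yield(junction_efficiency: float, n_junctions: int) -> float:
    """Fraction of assemblies with every junction correct (independent junctions)."""
    if not 0.0 <= junction_efficiency <= 1.0:
        raise ValueError("junction_efficiency must be between 0 and 1")
    return junction_efficiency ** n_junctions


if __name__ == "__main__":
    # Scan the fragment sizes discussed in the text at 90% per-junction efficiency.
    for fragment_bp in (3_000, 10_000, 30_000, 50_000):
        n = junction_count(1_000_000, fragment_bp)
        y = assembly_yield(0.90, n)
        print(f"{fragment_bp:>6} bp fragments: {n:>3} junctions, yield {y * 100:.2e} %")
```

The exponential dependence on the junction count is the point of the exercise: moving from 3000 bp to 30,000 bp fragments cuts the number of junctions tenfold and raises the expected yield by many orders of magnitude.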

Sequencing costs become significant once assembled genetic systems are large and/or assembly yields are exceedingly small. For example, after assembling a 30,000 bp genetic system with a 35% yield, it is necessary to sequence at least 7 clonal isolates to achieve at least a 95% chance of identifying a fully correct one. At low throughput, this cost is about $1000 (using Sanger sequencing). Using next-generation sequencing, this cost can be greatly reduced to about $0.70, but only when a large amount of DNA (2 billion base pairs) is sequenced at the same time. Similarly, if a 1,000,000 bp genome is assembled from 30,000 bp fragments with a 2.7% yield, then it would be necessary to sequence 100 clonal isolates to achieve a 93% chance of identifying a fully correct one (about $275 in sequencing costs). Finally, hierarchical DNA assembly can be performed by first assembling and purifying smaller genetic systems (e.g. 30,000 bp) and then using them to perform a multi-fragment assembly to build larger genetic systems (33 × 30,000 bp). Hierarchical DNA assembly increases sequencing costs by a multiplier roughly equal to the number of hierarchical cycles. Overall, DNA assembly costs are greatly reduced by utilizing longer non-clonal DNA fragments and by parallelizing operations such that at least 2 billion base pairs of DNA are verified across multiple DNA assembly reactions.
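The number of clonal isolates to sequence follows from treating each isolate as an independent draw that is correct with probability equal to the assembly yield. A minimal sketch under that assumption (the function names are illustrative):

```python
import math


def chance_of_correct_clone(assembly_yield: float, n_clones: int) -> float:
    """Probability that at least one of n_clones isolates is fully correct."""
    return 1.0 - (1.0 - assembly_yield) ** n_clones


def clones_to_screen(assembly_yield: float, confidence: float) -> int:
    """Smallest n such that 1 - (1 - yield)^n >= confidence."""
    if not 0.0 < assembly_yield <= 1.0:
        raise ValueError("assembly_yield must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if assembly_yield == 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - assembly_yield))


if __name__ == "__main__":
    print(clones_to_screen(0.35, 0.95))
    print(f"{chance_of_correct_clone(0.027, 100):.3f}")
```

With a 35% yield and a 95% target this gives 7 isolates, and 100 isolates at a 2.7% yield give a chance of roughly 93%, matching the two examples in the paragraph above.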

Goal 1:

Ability to manufacture thousands of very long oligonucleotides with high fidelity

Existing synthesis chemistries manufacture oligonucleotides up to 200 nucleotides long with cycle efficiencies of 99.5% and yields of 35%. Parallel synthesis of oligonucleotides is carried out on solid supports, producing up to 300,000 oligonucleotides with defined sequences.

Significant improvements in oligonucleotide synthesis to increase the number, length, and fidelity of oligonucleotides

Robustly synthesize 1,000,000 200-mer oligonucleotides with an error rate of 1/500 nucleotides (2 years)

Scaling up the production of chip-based or semiconductor-based oligonucleotide synthesis chemistries. A better understanding of nanofluidics.
Microfabrication of nanotiter plates and patterned nanometer-scale chips.
Improved process dynamics taking into account inherent stochasticity. Improved electronic control of reaction chemistries.

Robustly synthesize 1,000-mer oligonucleotides at 1/1000 error rate (5 years)

Current phosphoramidite-based chemistries have peaked at 99.5% per-nucleotide efficiencies, resulting in only 0.66% yields when producing 1,000-mers. Efficiencies must be 99.9% to achieve more than 35% yields. Cycle times must also be reduced for commercial scalability.
Enzyme-based non-templated synthesis has the potential to achieve greater than 99.9% per-nucleotide efficiencies and synthesis rates exceeding 1 nucleotide per second (e.g. terminal deoxynucleotidyl transferases).
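The efficiency target in this milestone can be checked by inverting the yield relationship: if yield equals efficiency raised to the oligonucleotide length, the required efficiency is the length-th root of the target yield. A short sketch, assuming independent and identical coupling cycles (the function name is hypothetical):

```python
def required_cycle_efficiency(target_yield: float, length_nt: int) -> float:
    """Per-cycle efficiency needed to reach target_yield for a given length.

    Assumption: yield = efficiency ** length_nt, so the required
    efficiency is target_yield ** (1 / length_nt).
    """
    if not 0.0 < target_yield <= 1.0:
        raise ValueError("target_yield must be in (0, 1]")
    if length_nt <= 0:
        raise ValueError("length_nt must be positive")
    return target_yield ** (1.0 / length_nt)


if __name__ == "__main__":
    # Efficiency needed for a 35% full-length yield at several lengths.
    for n in (200, 1_000, 10_000):
        print(f"{n:>6}-mer: {required_cycle_efficiency(0.35, n):.5%}")
```

For a 1,000-mer and a 35% target the result is just under 99.9% per cycle, in line with the requirement stated above; each tenfold increase in length adds roughly another "nine" to the efficiency that is needed.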

Reduce error rate of 1,000-mer oligonucleotide synthesis to 1/5,000 (10 years)

Non-templated DNA synthesis is currently slow with lower fidelity than templated synthesis. Improvements in enzyme substrate selectivity or substrate availability are needed to control sequence-specific synthesis.
Significant bioprospecting, rational design, and directed evolution of enzymes responsible for non-templated DNA synthesis can improve selectivity and increase catalytic efficiencies.

Synthesize 10,000-mer oligonucleotides at 99.99% cycle efficiency within 1 minute with an error rate of 1/30,000 (20 years)

Multiple synergistic improvements are needed, including improved non-templated DNA polymerases, fast substrate switching at the nanoliter scale, multi-nucleotide same-cycle addition, and electronic control of substrate selection.
Inspiration from natural DNA polymerases, ligases, recombinases, and helicases, working together in a dynamic molecular machine.

Goal 2:

Many-fragment DNA assembly with simultaneous, high-fidelity sequence validation

Oligonucleotides are assembled into double-stranded DNA fragments up to 6000 base pairs long using in vitro techniques (e.g. polymerase cycling assembly, ligation cycling) as well as in vivo techniques (yeast-mediated homologous recombination), producing non-clonal DNA fragments. Clonal (isogenic) fragments are then identified using a combination of enzyme-based removal of mismatched base pairs (e.g. MutS) and DNA sequencing (Sanger or NGS). Multiple verified DNA fragments are then assembled together into longer fragments (10,000 to 100,000 base pairs long) using hierarchical approaches employing DNA assembly techniques (e.g. Gibson assembly, ligation cycling reaction, Golden Gate). Megabase length DNA is then assembled from 100,000 base pair fragments using yeast-mediated homologous recombination.

More detailed descriptions of commonly used techniques are below:

Polymerase Cycling Assembly (PCA) is a method to assemble larger DNA constructs from shorter oligonucleotides. The process is similar to PCR but utilizes a set of overlapping “seed” oligonucleotides that are designed to hybridize to one another, leaving gaps that are then filled in using a thermostable DNA polymerase. The oligonucleotides are generally 50 to 100 nucleotides in length to ensure uniqueness in the hybridization with their complement. The reactions are cycled between ~60 °C and ~95 °C for 15 to 30 cycles. The full-length assembled product is then usually amplified by PCR using two terminal-specific primers. PCA is an efficient method for assembling constructs between 200 and 1,000 base pairs in length and can be performed in individual tubes or multiplexed using microtiter well plates.
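The overlapping oligonucleotide layout used in PCA can be illustrated with a simple tiling routine. This is a simplified sketch, not a production design tool: it assumes fixed oligonucleotide and overlap lengths, alternates between the top and bottom strands, and ignores melting-temperature balancing and secondary structure, which real designs take into account. The function names and default lengths are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_pca_oligos(target: str, oligo_len: int = 60, overlap: int = 20) -> list[str]:
    """Split target into overlapping oligos on alternating strands.

    Consecutive windows share `overlap` nucleotides; even-numbered oligos
    are top strand, odd-numbered oligos are bottom strand, so neighbours
    can anneal at the shared region and a polymerase can fill the gaps.
    """
    if not target:
        raise ValueError("target must not be empty")
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start, index = 0, 0
    while True:
        window = target[start:start + oligo_len]
        oligos.append(window if index % 2 == 0 else reverse_complement(window))
        if start + oligo_len >= len(target):
            break  # this window reached the end of the target
        start += step
        index += 1
    return oligos


if __name__ == "__main__":
    demo_target = ("ACGTTGCA" * 20)[:140]
    for i, oligo in enumerate(tile_pca_oligos(demo_target)):
        print(i, len(oligo), oligo)
```

The last oligonucleotide may be shorter than the others when the target length is not a multiple of the step size; a practical design would adjust window boundaries instead of accepting a short terminal oligonucleotide.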

Emulsion PCA is a method developed by Sriram Kosuri for highly multiplexing the assembly of larger constructs from small amounts of shorter DNA fragments (Plesa et al. 2018). In this method, the oligonucleotides required for a given construct are designed with a unique barcode on the terminus, which specifically hybridizes with a complementary barcode attached to a bead, capturing them from a complex pool of oligonucleotides. The bead mixture is then emulsified into picoliter-sized droplets containing a Type IIS restriction endonuclease (RE), dNTPs, and a thermostable DNA polymerase. The oligonucleotides are released from the bead by the Type IIS RE and then assembled by PCA through thermal cycling of the emulsion. Using this method, thousands of specific constructs can be assembled in a single emulsion tube, depending upon the number of uniquely barcoded beads.

Ligase Cycling Assembly (LCA) is a method to assemble larger DNA constructs from shorter oligonucleotides or double-stranded DNA fragments. LCA is an efficient method for assembling constructs between 500 and 10,000 base pairs in length. LCA uses shorter single-stranded bridging oligonucleotides that are complementary to the termini of adjacent DNA fragments, which are then joined using a thermostable ligase. Like PCA, LCA utilizes multiple rounds of temperature cycling to denature, re-anneal, and then ligate the fragments to assemble the larger DNA construct, and can be performed in individual tubes or multiplexed using microtiter well plates.

Gibson Assembly is a method to assemble larger DNA constructs from shorter oligonucleotides or double-stranded DNA fragments. Gibson assembly is an efficient method for assembling constructs up to many tens of kilobase pairs in length. This method, which is isothermal, utilizes up to 15 double-stranded DNA fragments having ~20-40 base-pair overlaps with the adjacent DNA fragments. The DNA fragments are first incubated with a 5′ to 3′ exonuclease, resulting in single-stranded regions on the adjacent DNA fragments that can anneal in a base-pair-specific manner. The gaps are then filled in with a DNA polymerase and the final nicks are sealed with a DNA ligase. This method can be performed in individual tubes or multiplexed using microtiter well plates.
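Before ordering fragments for an overlap-based assembly, it can help to confirm that each fragment ends with the sequence that begins the next one. The following sketch checks adjacent fragments of a linear assembly for exact end overlaps within the ~20-40 bp range mentioned above. It assumes all fragments are given as top-strand sequences in assembly order and uses exact string matching only; the function names and thresholds are illustrative assumptions.

```python
def shared_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    limit = min(len(left), len(right))
    for k in range(limit, 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def check_overlaps(fragments: list[str], min_len: int = 20, max_len: int = 40):
    """Return (junction index, overlap length, within range?) for each junction."""
    report = []
    for i in range(len(fragments) - 1):
        k = shared_overlap(fragments[i].upper(), fragments[i + 1].upper())
        report.append((i, k, min_len <= k <= max_len))
    return report


if __name__ == "__main__":
    shared = "GATTACAGGCCTTAACGTACGTCA"  # 24 nt shared end, made up for the demo
    parts = ["T" * 10 + shared, shared + "A" * 10, "C" * 20]
    for junction, length, ok in check_overlaps(parts):
        print(f"junction {junction}: overlap {length} nt, ok={ok}")
```

In the demo, the first junction shares a 24-nucleotide overlap and passes, while the second has no overlap and is flagged. A circular assembly would additionally need the last fragment checked against the first.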

Importantly, the final fidelity (error rate) of constructs assembled by the above methods depends on the quality of the input oligonucleotides. The above methods usually incorporate some type of error reduction or correction. Options include removing error-containing duplexes (mismatches and insertions) with the MutS protein after denaturation and reannealing of the construct, or degrading the error-containing DNA using T7 or CEL endonuclease. For a review of these methods, see Ma et al. 2012.

Predictive design of DNA sequences for improved assembly of longer DNA fragments.

 

Coupled design of DNA sequences to remove problematic elements, while maintaining genetic system function (2 years)  

Many genetic systems contain polymeric sequences, long repeats, and non-canonical DNA structures that inhibit the assembly process.
Genetic systems can be rationally designed to eliminate problematic sequence elements, while maintaining their function, thus reducing their “synthesis complexity”.
Toolboxes of highly non-repetitive genetic parts can be designed and characterized to enable design of non-repetitive genetic systems.
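Two of the problematic elements named above, polymeric (homopolymer) runs and long repeats, can be flagged with a simple scan. The sketch below is illustrative only: the thresholds (runs of 8 or more identical bases, exact repeats of 20 nucleotides) are assumptions chosen for the example and are not values from the Roadmap, and it does not detect non-canonical structures or reverse-complement repeats.

```python
import re
from collections import defaultdict


def find_homopolymers(seq: str, min_run: int = 8) -> list[tuple[int, str]]:
    """Return (start index, run) for each run of one base at least min_run long."""
    if min_run < 2:
        raise ValueError("min_run must be at least 2")
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    return [(m.start(), m.group(0)) for m in pattern.finditer(seq.upper())]


def find_repeated_kmers(seq: str, k: int = 20) -> dict[str, list[int]]:
    """Return each exact k-mer that occurs more than once, with its start positions."""
    if k <= 0:
        raise ValueError("k must be positive")
    s = seq.upper()
    positions = defaultdict(list)
    for i in range(len(s) - k + 1):
        positions[s[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


if __name__ == "__main__":
    demo = "GATTACAGGC" + "A" * 9 + "CTGACTG" + "GATTACAGGC"
    print(find_homopolymers(demo, min_run=8))
    print(find_repeated_kmers(demo, k=10))
```

A design workflow could run such a scan on candidate sequences and then recode the flagged regions, for example with synonymous codons, while checking that function is preserved.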

Incorporate machine learning to identify poorly understood problematic sequences and process conditions (5 years)

The complete list of sequence elements that inhibit DNA assembly is not fully known. The process conditions leading to undesired byproducts are not well understood.
Machine learning algorithms have the ability to identify problematic DNA sequences and undesired process conditions that lead to inefficient DNA assembly.

Design algorithms that identify optimal synthesis strategies for assembling megabase-length genetic systems (10 years)

The functions of some genetic system components are more strictly reliant on problematic sequences. Trade-offs between design for function versus design for synthesis are likely.
Design algorithms can identify regions with problematic sequences, and identify optimal strategies for mixing and matching megabase-length assembly strategies accounting for these regions.

Design algorithms for optimal one-pot assembly of billions of unique genomic/chromosomal variants with defined sequences (20 years)

Mixtures of oligonucleotides can be used to construct combinatorial libraries of DNA fragments, though assembling those fragment libraries into diversified mega-base genetic systems has not been achieved.
Parallel evaluation of sequence design criteria across billions (trillions) of potential sequence variants can be carried out. As the diversification of libraries increases, the number of sequence variants increases combinatorially.

Methods for one-step, simultaneous assembly and sequence-verification of long DNA fragments

Reliable assembly of 10,000 base pair non-clonal DNA fragments (2 years)

The availability of high-fidelity long oligonucleotides, the optimization of process conditions, and the presence of problematic sequences.
Higher fidelity 100-mers and 200-mers [See Goal 1].
Identification of optimal process conditions and removal (by design) of problematic sequence elements [See Goal 2].

Reliable assembly and verification of 10,000 base pair clonal DNA fragments (5 years)

Low assembly yields and decoupled sequencing lead to more costly hierarchical processes with higher failure rates.
Enzyme-based selection (e.g. MutS) can eliminate DNA fragments containing errors.
Approaches using simultaneous DNA synthesis and sequencing can rapidly sort DNA fragments, excluding ones with errors (e.g. using nanopore-based sequencing and dynamic pore flicking).

Reliable assembly and verification of 100,000 base pair clonal DNA fragments (10 years)

Reliable, low-cost assembly of clonal 10,000 base pair fragments.
Higher efficiency 10-part assemblies using lower-cost, clonal 10,000 base pair DNA fragments.
Extra long read sequencing for verification of 100,000 base pair fragments (e.g. nanopore sequencing).

Reliable assembly and verification of 1,000,000 to 10,000,000 base pair clonal DNA fragments (20 years)

Reliable, low-cost assembly of clonal 100,000 base pair fragments.
In vivo yeast-mediated assembly of clonal 100,000 base pair fragments into megabase-length genetic systems.
Extra long read sequencing for verification of 1,000,000 base pair fragments (e.g. nanopore sequencing).

Pipelined synthesis, assembly, and functional testing of engineered genetic systems.

Achieving desired functionalities in lower fidelity, error-prone genetic systems (2 years)

Unpredictable relationship between synthesis & assembly errors versus undesired functional outcomes.
Elimination of problematic sequences via rational design. Incorporating robust, mutation-invariant design into genetic systems.
Routine application of low-cost omics technologies to verify the functions of genetic systems (e.g. DNA-Seq, RNA-Seq, Ribo-Seq, metabolomics).

Achieving reliable Design-for-Testing in engineered genetic systems (5 years)

Costly to assay diverse genetic functions to verify desired behaviors.
Synthesized & assembled genetic systems can directly incorporate a suite of sensors and genetic circuits for self-testing of genetic system function. Sensor-circuit outputs could be tailored for desired high-throughput assays, including surface display, Flow-Seq, and RNA-Seq.

Achieving readily swappable modules within large genetic systems (10 years)

Megabase-length genetic systems may contain commonly used and re-used genetic modules.
Previously synthesized and assembled genetic modules (>100,000 base pair fragments) can be re-used in downstream processes. Models can be developed to predict inter-module interactions and overall system function.

Achieving 1 month Design-to-Test Cycles for megabase-length genetic systems (20 years)

Design algorithms, synthesis chemistries, assembly techniques, simultaneous sequencing, and functional testing must be seamlessly integrated within a commercially viable suite of services with fast turnaround times.
A combination of well-behaved horizontal service providers and well-integrated vertical service providers operating within a healthy commercial ecosystem.

Goal 3:

Precision gene editing at multiple sites simultaneously with no off-target effects.

TALEN- or CRISPR-based genome engineering techniques introduce site-specific nicks or double-stranded breaks, which are then repaired using natural repair pathways. Up to 6 distinct sites and up to 15,000 identical sites have been targeted simultaneously with efficiencies ranging from 2% to 90%. Site-specific DNA-binding proteins (ZFs, TALEs, CRISPR) are fused to gene regulatory domains to carry out activation or repression of desired genes. Up to 6 distinct genes have been targeted for regulation with repression magnitudes up to 300-fold (knock-down) and activation magnitudes up to 20-fold (knock-up).

Goal 4:

Fast and cheap design, synthesis, and assembly of functional gene sequences.

There are commonly used predictive models that can design short genetic parts to control gene expression levels. These parts are then combined into larger genetic systems (operons, regulons) to create desired cellular functions, including sensors, genetic circuits, transporters, multi-enzyme metabolic pathways, organelle compartments, and orthogonal expression systems. There are many coupled interactions, between adjacent parts or between distant genetic modules, that alter system function in unpredictable and undesired ways. Therefore, new approaches are needed to correctly design large genetic systems, taking into account these poorly understood mechanisms, within an even larger genomic background.
