Currently, phosphoramidite-based chemistry is the predominant approach for synthesizing oligonucleotides. Even after significant optimization, per-cycle synthesis yields are about 99.5%, so the synthesis of a 200-nucleotide oligonucleotide has a yield of only about 35%. New technologies seek to improve this process by (a) synthesizing thousands of oligonucleotides in parallel, using either on-chip supports or tiny microtiter wells; or (b) improving synthesis processivity by replacing the phosphoramidite-based chemistry, for example, by using enzyme catalysis (e.g. terminal deoxynucleotidyl transferases) to extend primers with defined nucleotides. A number of start-up companies are developing template-free oligo synthesis methods, including DNA Script, Nuclera Nucleics, Molecular Assemblies, and Ansa. Achieving picomole production of 1000mer oligonucleotides with error-free sequences would significantly improve the overall DNA assembly protocol.
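The relationship between per-cycle yield and full-length yield can be sketched as a simple power law. This is a minimal illustration, assuming one coupling cycle per nucleotide and an identical, independent yield at every cycle; the function name `full_length_yield` is hypothetical. With exactly 99.5% per cycle it gives roughly 37% for a 200mer, in the neighborhood of the approximate figure quoted above, and well under 1% for a 1000mer.

```python
def full_length_yield(per_cycle_yield: float, length_nt: int) -> float:
    """Estimate the fraction of full-length product.

    Assumption (not from the source): one coupling cycle per nucleotide,
    each succeeding independently with the same probability.
    """
    return per_cycle_yield ** length_nt


# Compare a few oligonucleotide lengths at a 99.5% per-cycle yield.
for length in (60, 200, 1000):
    print(f"{length:>5} nt: {full_length_yield(0.995, length):.2%}")
```

The steep drop at 1000 nucleotides is one way to see why longer oligonucleotides would likely require a higher per-cycle yield than current phosphoramidite chemistry delivers.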
Currently, multiple 60 to 200mer oligonucleotides are assembled into non-clonal DNA fragments using a combination of annealing, ligation, and/or polymerase chain reaction. The cost of synthesizing non-clonal DNA fragments is $0.10 to $0.30 per base pair, depending on size and complexity. DNA fragments between 300 and 1800 bp can be synthesized by multiple providers. DNA fragments up to 5800 bp can be synthesized by select providers at increased cost. Errors are introduced whenever two oligonucleotides form undesired base pairings, when two oligonucleotides are incorrectly ligated together, or when DNA polymerases extend a synthesized DNA fragment with an incorrect nucleotide. Certain sequence determinants increase the error rate, resulting in a mixture of undesired fragments. Computational sequence design can reduce the frequency of these errors. Mismatch repair enzymes may be added (at additional cost) to eliminate DNA fragments with mis-paired nucleotides that arise, for example, from mis-annealing or DNA polymerase errors. This process has been scaled up to assemble thousands of non-clonal DNA fragments per day. The purification of full-length, error-free DNA fragments remains a challenge. Utilizing longer oligonucleotides (see Oligonucleotide synthesis technologies) would enable the synthesis of longer non-clonal DNA fragments with the same error rate. New technologies utilizing nanopore sequencing have the potential to couple sequencing and purification at single-molecule resolution.
Currently, multiple DNA fragments (300 to 3000 bp long) are assembled into large genetic systems (10,000 to 1,000,000 base pairs long) using single-pot DNA assembly techniques that combine cocktails of bioprospected and/or engineered enzymes, including exonucleases, endonucleases, DNA polymerases, ligases, and/or recombinases. Enzyme costs are currently about $25 per assembly. Assembled DNA is then introduced into cells for clonal separation and replication. Most assembly techniques have essential sequence determinants, for example, regions of overlapping homology or flanking Type IIS restriction sites. Errors are introduced when two fragments anneal at incorrect overlap regions, when two fragments are mis-ligated at incorrect ligation junctions, or when DNA polymerases incorporate incorrect nucleotides during DNA synthesis. Computational sequence design can limit the frequency of these errors. A major challenge for DNA assembly is the trial-and-error identification of a full-length, error-free genetic system, because the overall yield is roughly the per-junction efficiency raised to the number of junctions. For example, an optimized assembly technique with a per-junction efficiency of 90% will assemble a 10-part (3000 bp/part) system with 35% yield. At the same per-junction efficiency, assembling a 1,000,000 bp genome from 3000 bp DNA fragments will have a minuscule yield of 5.2×10⁻¹⁴%. This limitation to DNA assembly has motivated the synthesis of longer non-clonal DNA fragments (see Technologies for oligonucleotide assembly into non-clonal DNA fragments). For example, 1,000,000 bp genomes could be assembled from 10,000 bp, 30,000 bp, or 50,000 bp DNA fragments with a 0.002%, 2.7%, or 11% efficiency, respectively. If longer non-clonal DNA fragments are unavailable, then hierarchical approaches to DNA assembly are required, which increase the number of DNA assembly reactions and verification costs.
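The yield figures above can be approximated with a short calculation. The sketch below assumes one junction per fragment (as in a circular construct, which matches the 10-part, 35% example) and independent junctions with identical efficiency; the function name `assembly_yield` and the rounding-up of the fragment count are illustrative assumptions, not the source's stated method. Because the junction-counting convention is not specified in the text, the results land close to, but not always exactly on, the quoted values (for instance, about 12% rather than 11% for 50,000 bp fragments).

```python
def assembly_yield(total_bp: int, fragment_bp: int,
                   junction_efficiency: float = 0.90) -> float:
    """Estimate the fraction of assemblies that are full-length and correct.

    Assumptions (not from the source): the number of junctions equals the
    number of fragments, rounded up, and every junction forms independently
    with the same efficiency.
    """
    n_junctions = (total_bp + fragment_bp - 1) // fragment_bp  # integer ceiling
    return junction_efficiency ** n_junctions


# A 10-part system built from 3000 bp parts.
print(f"30,000 bp from 3000 bp parts: {assembly_yield(30_000, 3_000):.1%}")

# A 1,000,000 bp genome built from fragments of increasing length.
for fragment_bp in (3_000, 10_000, 30_000, 50_000):
    y = assembly_yield(1_000_000, fragment_bp)
    print(f"1,000,000 bp from {fragment_bp:>6,} bp fragments: {y * 100:.2g}%")
```

The main point survives any reasonable counting convention: yield falls geometrically with the number of junctions, so lengthening the input fragments raises it by many orders of magnitude.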
Sequencing costs become significant once assembled genetic systems are large and/or assembly yields are exceedingly small. For example, after assembling a 30,000 bp genetic system with a 35% yield, it is necessary to sequence at least 7 clonal isolates to achieve at least a 95% chance of identifying a fully correct one. At low throughput, this cost is about $1000 (using Sanger sequencing). Using next-generation sequencing, this cost can be greatly reduced to about $0.70, but only when a large amount of DNA (2 billion base pairs) is sequenced at the same time. Similarly, if a 1,000,000 bp genome is assembled from 30,000 bp fragments with a 2.7% yield, then it would be necessary to sequence 100 clonal isolates to achieve a 93% chance of identifying a fully correct one (about $275 in sequencing costs). Finally, hierarchical DNA assembly can be performed by first assembling and purifying smaller genetic systems (e.g. 30,000 bp) and then using them to perform a multi-fragment assembly to build larger genetic systems (33 × 30,000 bp). Hierarchical DNA assembly increases sequencing costs by a multiplier roughly equal to the number of hierarchical cycles. Overall, DNA assembly costs are greatly reduced by utilizing longer non-clonal DNA fragments and by parallelizing operations such that at least 2 billion base pairs of DNA are verified across multiple DNA assembly reactions.
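The number of clonal isolates to sequence follows from treating each isolate as an independent trial that is correct with probability equal to the assembly yield, so that the chance of at least one correct isolate among n is 1 - (1 - yield)^n. The sketch below works under that assumption; the function names `clones_needed` and `chance_of_success` are hypothetical.

```python
import math


def chance_of_success(assembly_yield: float, n_clones: int) -> float:
    """Probability that at least one of n_clones isolates is fully correct,
    assuming each isolate is independently correct with probability
    assembly_yield (an assumption, not stated in the source)."""
    return 1 - (1 - assembly_yield) ** n_clones


def clones_needed(assembly_yield: float, confidence: float = 0.95) -> int:
    """Smallest number of isolates whose chance of success reaches confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - assembly_yield))


# 35% yield: how many isolates for a 95% chance of finding a correct one?
print(clones_needed(0.35))

# 2.7% yield: chance of success when sequencing 100 isolates.
print(f"{chance_of_success(0.027, 100):.1%}")
```

Under the same assumptions, reaching a full 95% chance at a 2.7% yield would take somewhat more than 100 isolates, which is consistent with the slightly lower confidence quoted for that case.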