Release date: 2016-05-18
Abstract: Since the first generation of DNA sequencing technology (the Sanger method) appeared in 1977, sequencing technology has advanced considerably over more than 30 years. From the first generation through the third and even the fourth generation, read lengths have gone from long to short and then from short back to long. Although second-generation short-read sequencing technology still holds an absolute advantage in the global sequencing market, third- and fourth-generation sequencing technologies have also developed rapidly over the past two years. Every change in sequencing technology has played a huge role in genomic research, medical research on disease, drug development, breeding and other fields. Here I give a simple summary of the current sequencing technologies and their sequencing principles.
Figure 1: The development of sequencing technology
The rapid acquisition of genetic information from living organisms is of great significance for life science research. Figure 1 above traces the development of sequencing technology since Watson and Crick established the DNA double helix model in 1953.
First generation sequencing technology
The first generation of DNA sequencing technology comprises the chain termination method pioneered by Sanger and Coulson in 1975 and the chemical (chain degradation) method invented by Maxam and Gilbert in 1976-1977. In 1977, Sanger determined the first genome sequence, that of phage phiX174, with a total length of 5375 bases [1]. From then on, humans gained the ability to look into the nature of genetic differences between living things, and this marked the starting point of the genomics era. Researchers continued to improve the Sanger method over many years of practice, and the first human genome map, completed in 2001, was based on an improved Sanger method. The core principle of the Sanger method is that a ddNTP lacks hydroxyl groups at both the 2' and 3' positions, so it cannot form a phosphodiester bond during DNA synthesis and therefore interrupts the synthesis reaction. A certain proportion of radioisotope-labeled ddNTPs (ddATP, ddCTP, ddGTP and ddTTP) is added to four separate DNA synthesis reactions, one kind of ddNTP per reaction. After gel electrophoresis and autoradiography, the DNA sequence of the molecule under test can be determined from the positions of the electrophoresis bands (Fig. 2). This website has produced a short video on the Sanger sequencing method that illustrates it vividly.
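The chain termination idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `strand` stands for the strand being synthesized, the function names, the 20% ddNTP fraction and the molecule count are all made up for illustration, and the "gel" is simply the set of fragment lengths seen in each of the four reactions.

```python
import random

def sanger_fragments(strand, dd_base, n_molecules=2000, dd_fraction=0.2, rng=None):
    """Fragment lengths produced in the reaction that contains dd<dd_base>.

    Each molecule is extended base by base; whenever the next base equals
    dd_base there is a dd_fraction chance that a ddNTP is incorporated,
    which terminates the chain at that length.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    lengths = set()
    for _ in range(n_molecules):
        for pos, base in enumerate(strand, start=1):
            if base == dd_base and rng.random() < dd_fraction:
                lengths.add(pos)
                break
    return lengths

def read_gel(strand):
    """Run the four reactions and read the bands from shortest to longest."""
    length_to_base = {}
    for base in "ACGT":
        for n in sanger_fragments(strand, base):
            length_to_base[n] = base  # the lane a band sits in gives the base
    return "".join(length_to_base[n] for n in sorted(length_to_base))

print(read_gel("GATTACAGATTC"))
```

Reading the bands in order of fragment length recovers the sequence because every position is a termination point in exactly one of the four reactions.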
It is worth noting that in the period when sequencing technology was starting to develop, there were other sequencing technologies besides the Sanger method, such as pyrosequencing and ligase sequencing. Pyrosequencing is the method later used in Roche's 454 technology [2-4], and ligase sequencing is the method later used by ABI's SOLiD technology [2,4]. Their common core approach is to use the dNTPs from the Sanger method [1] that can interrupt the DNA synthesis reaction.
Figure 2: Sanger method sequencing principle
Second generation sequencing technology
In general, the main features of first-generation sequencing technology are read lengths of up to 1000 bp and accuracy of up to 99.999%. However, its high cost and low throughput have seriously limited its use at truly large scale, so first-generation sequencing is not the ideal sequencing method. After continuous technical development and improvement, second-generation sequencing was born, marked by Roche's 454 technology, Illumina's Solexa and HiSeq technology, and ABI's SOLiD technology. Second-generation sequencing greatly reduces sequencing cost and greatly increases sequencing speed while maintaining high accuracy: sequencing a human genome took 3 years with first-generation technology but takes only 1 week with second-generation technology. However, its read length is much shorter than that of first-generation technology. Table 1 and Figure 3 give a simple comparison of the characteristics and sequencing costs of first- and second-generation technologies [5]. Below I briefly introduce the main principles and features of these three major second-generation sequencing technologies.
Figure 3. Changes in sequencing costs
1. Illumina
Illumina's Solexa and HiSeq are arguably the most widely used second-generation sequencing machines in the world, and the core principles of the two series are the same [2,4]. Both series use sequencing by synthesis, and the sequencing process is divided into the following four steps, as shown in Figure 4.
(1) Construction of the DNA library to be sequenced
The DNA sample is sheared into small fragments by sonication. At present, apart from assembly and some other special requirements, it is mainly sheared into fragments 200-500 bp long. Different adapters are then added to the two ends of these fragments to construct a single-stranded DNA library.
(2) Flowcell
The flowcell is the channel in which the DNA fragments are adsorbed as they flow through. Once the library has been built, the DNA in it attaches randomly to the channels on the flowcell surface as it passes through. Each flowcell has 8 channels, and the surface of each channel carries many attached adapters. These can pair with the adapters added to the ends of the DNA fragments during library construction (which is why the flowcell can adsorb the library DNA), and they support bridge PCR amplification of the DNA on the surface.
(3) Bridge PCR amplification and denaturation
Bridge PCR uses the adapters fixed on the flowcell surface as the basis for bridge amplification, as shown in Figure 4.a. After repeated cycles of amplification and denaturation, each DNA fragment ends up as a cluster at its own position, each cluster containing many copies of a single DNA template. The purpose of this process is to amplify the signal intensity of each base to the level required for sequencing.
(4) Sequencing
Sequencing uses the sequencing-by-synthesis approach. DNA polymerase, adapter primers, and four dNTPs carrying base-specific fluorescent labels are added to the reaction system at the same time (as in Sanger sequencing). The 3'-OH of these dNTPs is chemically blocked, so only one dNTP can be added per cycle. After a dNTP has been added to the growing strand, all unused free dNTPs and DNA polymerase are washed away. Next, the buffer needed for fluorescence excitation is added, the fluorescent signal is excited with a laser and recorded by optical equipment, and computer analysis converts the optical signal into a base call. Once the signal has been recorded, a chemical reagent is added to quench the fluorescence and remove the 3'-OH protecting group so that the next round of the sequencing reaction can proceed. Because Illumina's technology adds only one dNTP per cycle, it solves the problem of accurately measuring homopolymer length. Its main source of sequencing error is base substitution, and its current sequencing error rate is between 1% and 1.5%. As an example of turnaround time, resequencing a human genome to 30x depth takes about 1 week.
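The cycle described above can be pictured with a very small Python sketch. It is an idealized illustration, not Illumina's software: the function name is hypothetical, strand orientation and all error sources are ignored, and one cluster is assumed to behave like a single error-free template.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sequence_by_synthesis(template, n_cycles):
    """Read up to n_cycles bases from one cluster that shares `template`."""
    read = []
    for cycle in range(min(n_cycles, len(template))):
        # Incorporation: the 3'-blocked, labeled dNTP that pairs with the
        # next template base is added, and the block stops further extension.
        incorporated = COMPLEMENT[template[cycle]]
        # Imaging: the dye color identifies the base that was just added.
        read.append(incorporated)
        # Cleavage of the dye and the 3' block is implicit here: the loop
        # simply moves on to the next template position.
    return "".join(read)

print(sequence_by_synthesis("AAAACG", 6))  # TTTTGC
```

Because exactly one base is added per cycle, the homopolymer AAAA in the template takes four separate cycles and is read as four separate T calls, which is why homopolymer length is not a problem for this chemistry.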
Figure 4. Illumina sequencing process
2. Roche 454
The Roche 454 sequencing system was the first platform to commercialize second-generation sequencing technology. Its main sequencing principle is as follows (Fig. 5 a-c) [2]:
(1) DNA library preparation
The library construction method of the 454 sequencing system differs from Illumina's. It uses nebulization to break the DNA to be tested into small fragments 300-800 bp long and adds different adapters to the two ends of each fragment; alternatively, the DNA is denatured, amplified by PCR with hybridization primers, and ligated to a vector. Either way, a single-stranded DNA library is constructed (Fig. 5a).
(2) Emulsion PCR (emulsion PCR is essentially a special water-in-oil process)
The DNA amplification process of 454 is, of course, also very different from Illumina's: the single-stranded DNA fragments are bound to magnetic beads about 28 um in diameter inside oil-coated water droplets, where they are incubated and annealed.
The biggest feature of emulsion PCR is that it creates a huge number of independent reaction spaces for DNA amplification. The key technique is "water-in-oil": before the PCR reaction, an aqueous solution containing all the PCR reaction components is injected onto the surface of mineral oil spinning at high speed, where it instantly breaks up into countless small water droplets wrapped in mineral oil. Each droplet forms an independent PCR reaction space. Ideally, each droplet contains only one DNA template and one magnetic bead.
The magnetic beads enclosed in the droplets carry DNA sequences complementary to the adapters, so the single-stranded DNA fragments bind specifically to the beads. The incubation system also contains the PCR reagents, so each fragment bound to a bead is amplified independently by PCR, and the amplified products remain bound to the bead. When the reaction is complete, the emulsion is broken and the DNA-carrying beads are enriched. Each fragment is amplified about 1 million times, which yields the amount of DNA required for the subsequent sequencing.
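Two numbers in this step lend themselves to a quick back-of-the-envelope check. The sketch below assumes, purely for illustration, that templates are distributed over droplets according to a Poisson distribution and that every PCR cycle doubles the product perfectly; neither assumption comes from the original text, and the function names are hypothetical.

```python
import math

def poisson(k, lam):
    """Probability that a droplet holds exactly k templates at mean load lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def clonal_fraction(lam):
    """Among droplets holding at least one template, the share holding exactly one."""
    return poisson(1, lam) / (1.0 - poisson(0, lam))

def cycles_for_fold(fold):
    """Number of ideal doubling cycles needed to reach a given amplification."""
    return math.ceil(math.log2(fold))

for lam in (0.1, 0.5, 1.0):
    print(lam, round(poisson(1, lam), 3), round(clonal_fraction(lam), 3))
print(cycles_for_fold(1_000_000))
```

Under these assumptions a dilute load keeps most occupied droplets clonal at the price of many empty droplets, and about 20 ideal doubling cycles are enough for the roughly 1 million-fold amplification mentioned above.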
(3) Pyrosequencing
Prior to sequencing, the DNA-carrying magnetic beads are first treated with a polymerase and single-strand binding protein and then loaded onto a PTP plate. The plate is specially made with many small wells about 44 um in diameter, each of which can hold only one bead. This fixes the position of each bead so that the subsequent sequencing reactions can be monitored.
Sequencing uses pyrosequencing: beads smaller than the wells of the PTP plate are placed into the wells to start the sequencing reaction. The reaction uses the many copies of single-stranded DNA amplified on each bead as the template, and one kind of dNTP is added in each reaction step. If the dNTP pairs with the template, a pyrophosphate group is released upon incorporation. The released pyrophosphate is converted to ATP by the ATP sulfurylase in the reaction system. The ATP then drives luciferase to oxidize luciferin, which emits light; the light is recorded by a CCD camera on the other side of the PTP plate, and the optical signal is processed by computer to obtain the final sequencing result. Because only one kind of dNTP is present in each step, the base is identified from which dNTP was added when the light was emitted. At the end of each step, apyrase degrades the free dNTPs and ATP, which quenches the light so that the sequencing reaction can proceed to the next cycle. Because each 454 sequencing reaction takes place in its own well on the PTP plate, mutual interference and sequencing bias are greatly reduced. The biggest advantage of 454 technology is its long read length: the current average is up to 400 bp. Unlike Illumina's Solexa and HiSeq technologies, however, one of its main drawbacks is that it cannot accurately measure homopolymer length. For example, when the sequence contains a run such as poly(A), several T's are added in a single reaction step, and the number added can only be estimated from the light intensity, which may give inaccurate results. For this reason, 454 technology introduces insertion and deletion errors during sequencing.
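The homopolymer problem is easy to see in a toy flowgram. The sketch below is an illustration only: the flow order "TACG", the function names and the noisy intensity value are assumptions, `read` stands for the strand being synthesized, and the light signal is taken to be exactly proportional to the number of bases incorporated in a flow.

```python
def flowgram(read, flow_order="TACG"):
    """Ideal signal per flow: how many bases are incorporated in that flow."""
    if set(read) - set(flow_order):
        raise ValueError("read may only contain bases present in flow_order")
    signals, pos, flow = [], 0, 0
    while pos < len(read):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # A whole homopolymer run is incorporated within a single flow.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals

def call_bases(signals):
    """Turn (base, intensity) pairs back into a sequence by rounding intensity."""
    return "".join(base * max(0, round(intensity)) for base, intensity in signals)

ideal = flowgram("TTTTAC")
print(ideal)                      # [('T', 4), ('A', 1), ('C', 1)]
print(call_bases(ideal))          # TTTTAC
noisy = [("T", 4.6), ("A", 1.0), ("C", 0.9)]
print(call_bases(noisy))          # TTTTTAC
```

A modest error in the measured intensity of the four-base run is enough to call five T's instead of four, which is an insertion error of exactly the kind described above, while the single-base flows are still called correctly.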
Figure 5. Roche 454 sequencing process
3. SOLiD technology
SOLiD sequencing technology was introduced by ABI in 2007 for commercial sequencing applications. It is based on ligation: DNA ligase is used, and sequencing takes place during ligation (Figure 6) [2,4]. Its principle is:
Figure 6-a. Solid sequencing technology
(1) DNA library construction
The DNA is fragmented, sequencing adapters are added to both ends of the fragments, and the fragments are ligated to a vector to construct a single-stranded DNA library.
(2) Emulsion PCR
SOLiD's PCR process is similar to that of 454 and also uses emulsion (droplet) PCR, but the beads are much smaller than in the 454 system, only 1 um. The 3' end of the amplification product is modified during amplification in preparation for the following sequencing step. The beads carrying 3'-modified products are deposited on a slide. During bead loading, the deposition chamber divides each slide into 1, 4 or 8 sequencing areas (Fig. 6-a). The biggest advantage of the SOLiD system is that each slide can hold a higher density of beads than 454, making it easy to achieve higher throughput in the same system.
(3) Ligase sequencing
This step is unique to SOLiD sequencing. Instead of the DNA polymerase commonly used in sequencing, it uses a ligase. The substrate for the SOLiD ligation reaction is a mixture of 8-base single-stranded fluorescent probes, abbreviated here as 3'-XXnnnzzz-5'. In the ligation reaction, these probes pair with the single-stranded DNA template according to the rules of base complementarity. The 5' ends of the probes are labeled with fluorescent dyes of four colors: CY5, Texas Red, CY3 and 6-FAM (Fig. 6-a). In each 8-base probe, the bases at positions 1 and 2 (XX) are defined, and a different fluorescent label is attached at positions 6-8 (zzz) depending on which two bases they are. This is SOLiD's unique sequencing method: two bases determine one fluorescent signal, which is equivalent to interrogating two bases at a time, so the method is also called two-base sequencing. When a probe pairs with the template strand, it emits the fluorescent signal representing bases 1 and 2; the color charts in Fig. 6-a and Fig. 6-b show the relationship between the different combinations of these two bases and the fluorescent colors. After the signal is recorded, the probe is chemically cleaved between bases 5 and 6, which removes the fluorescent label so that the next position can be sequenced. Note, however, that with this method successive interrogated positions are 5 bases apart: the first ligation reads positions 1 and 2, the second reads positions 6 and 7, and so on. When the end has been reached, the newly synthesized strand is denatured and washed off, and a second round of sequencing is performed with primer n-1. Primer n-1 differs from primer n by one base in the position at which it pairs with the adapter (Fig. 6-a. 8).
That is, primer n-1 shifts the sequencing positions by one base toward the 3' end relative to primer n, so that positions 0 and 1, 5 and 6, and so on are determined in the second round. By analogy, after the fifth round of sequencing the bases at all positions have been covered, and each position has been interrogated twice. The read length of this technique is 2 × 50 bp, and subsequent sequence assembly is correspondingly more complicated. Thanks to the double interrogation, the raw sequencing accuracy of this technology is as high as 99.94%, and accuracy at 15x coverage reaches 99.999%, arguably the highest among second-generation sequencing technologies. However, because two bases determine one fluorescent signal, a single error at the color decoding stage easily causes a chain of decoding errors.
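Two-base encoding and the chained decoding error can be reproduced in a short Python sketch. It uses the commonly described color-space scheme in which each color is the XOR of the 2-bit codes of two adjacent bases; colors are written as the numbers 0-3 rather than as dye names, and the function names are made up for illustration.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"

def encode(seq):
    """One color per adjacent base pair: the XOR of the two 2-bit base codes."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]

def decode(first_base, colors):
    """Rebuild the bases from a known first base and the list of colors."""
    bases = [first_base]
    for color in colors:
        # Each base depends on the previous decoded base plus the color.
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)

colors = encode("ATGGCA")
print(colors)                         # [3, 1, 0, 3, 1]
print(decode("A", colors))            # ATGGCA

miscalled = [3, 2, 0, 3, 1]           # the second color was read wrongly
print(decode("A", miscalled))         # ATCCGT
```

Since every decoded base depends on the previous one, a single wrong color changes every base after it, which is the chained decoding error mentioned above. The same property helps in the other direction: a true single-base variant changes two adjacent colors, so an isolated color change is more likely to be a measurement error.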
Figure 6-b. Solid sequencing technology
Third generation sequencing technology
Sequencing technology has reached new milestones in the last two or three years. PacBio's SMRT and Oxford Nanopore Technologies' nanopore single-molecule sequencing are known as third-generation sequencing technologies. Compared with the previous two generations, their defining feature is single-molecule sequencing: the sequencing process does not require PCR amplification.
PacBio SMRT technology also applies the idea of sequencing by synthesis and uses the SMRT chip as the sequencing carrier. The basic principle is as follows: DNA polymerase binds the template, and the four bases (i.e., dNTPs) are labeled with four fluorescent colors. During base pairing, each kind of base added emits different light, and the type of base incorporated can be judged from the wavelength and peak of the light. The DNA polymerase is one of the keys to achieving ultra-long read length: read length depends mainly on how long the enzyme stays active, which is mainly affected by damage from the laser. Another key to PacBio SMRT technology is how to distinguish the reaction signal from the strong fluorescent background of the surrounding free bases. For this it uses the ZMW (zero-mode waveguide) principle, which can be seen in the many dense small holes in the wall of a microwave oven. The diameter of these holes is carefully chosen. If it were larger than the wavelength of the microwaves, energy would leak through the panel by diffraction and interfere with the surrounding holes. If the hole is smaller than the wavelength, the energy does not radiate outward but stays confined (the principle of light diffraction), which provides shielding. Similarly, a reaction cell (SMRT Cell: single-molecule real-time reaction cell) contains many such circular nanoscale holes, the ZMWs (zero-mode waveguides). Their outer diameter is a little over 100 nanometers, smaller than the wavelength of the detection laser (several hundred nanometers), so laser light shone in from the bottom cannot pass through the hole into the solution above.
The energy is confined to a tiny volume (20 × 10^-21 L), just enough to cover the part to be detected, so that signal comes only from this small reaction zone, while the excess free nucleotides outside the hole stay in the dark, minimizing the background. In addition, some base modifications can be detected from the time between the signals of two adjacent bases. If a base is modified, the polymerase slows down as it passes, and the distance between the two adjacent peaks increases; this can be used to detect information such as methylation (Figure 7). SMRT sequencing is fast, at about 10 dNTPs per second. At the same time, however, its sequencing error rate is relatively high (an almost universal problem of single-molecule sequencing), reaching 15%. Fortunately the errors are random, without the systematic bias seen in second-generation sequencing technologies, so they can be corrected effectively by sequencing multiple times.
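Why random errors can be corrected by repeated sequencing can be illustrated with a simple binomial calculation. The model is an assumption made for this sketch, not PacBio's actual consensus algorithm: each pass is taken to be wrong at a given position independently with probability 15%, and the consensus is treated as wrong whenever more than half of the passes are wrong, which is pessimistic because wrong passes would also have to agree with each other. The function name is hypothetical.

```python
from math import comb

def majority_error(p, n):
    """Probability that more than half of n independent passes are wrong."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

for n in (1, 3, 5, 9, 15):
    print(n, f"{majority_error(0.15, n):.6f}")
```

With a 15% per-pass error rate the bound is 15% for a single pass, falls below 3% at five passes, and keeps shrinking as passes are added. A systematic bias would not average out in this way, which is why the randomness of the errors matters.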
Figure 7. Principle of PacBio SMRT sequencing
The nanopore single-molecule sequencing technology developed by Oxford Nanopore Technologies differs from all previous sequencing technologies in that it is based on electrical rather than optical signals. One key to the technology is a specially designed nanopore with a molecular adapter covalently bound inside it. As DNA bases pass through the nanopore, they change the charge and briefly alter the strength of the current flowing through the pore (each base changes the current by a different amount). Sensitive electronics detect these changes and thereby identify the bases that pass through (Fig. 8).
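The idea that each base shifts the current by a different amount can be pictured as a nearest-level classifier. The sketch below is deliberately simplistic and every number in it is hypothetical: the four current levels are invented for illustration, each measurement is assumed to correspond to exactly one base, and real base callers are far more elaborate.

```python
# Hypothetical mean current levels (in pA) for each base; not real device values.
LEVELS_PA = {"A": 50.0, "C": 60.0, "G": 70.0, "T": 80.0}

def call_base(current_pa):
    """Pick the base whose reference level is closest to the measured current."""
    return min(LEVELS_PA, key=lambda base: abs(LEVELS_PA[base] - current_pa))

def call_sequence(trace_pa):
    """Call one base per measurement in an already segmented current trace."""
    return "".join(call_base(x) for x in trace_pa)

print(call_sequence([51.2, 78.9, 69.5, 60.4]))  # ATGC
```

The closer together the levels lie relative to the measurement noise, the more often a reading lands nearer to the wrong level, which gives an intuition for why single-molecule electrical readouts have a noticeable raw error rate.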
The company launched the first commercial nanopore sequencer at the annual Advances in Genome Biology and Technology (AGBT) meeting, which attracted great attention from the scientific community. Nanopore sequencing (and other third-generation sequencing technologies) is expected to overcome the shortcomings of current sequencing platforms. Its main features are: long read lengths of tens of kb or even 100 kb; an error rate currently between 1% and 4%, with errors that are random rather than clustered at the two ends of a read; data that can be read in real time; high throughput (a 30x human genome is expected to be completed in one day); starting DNA that is not destroyed during sequencing; and sample preparation that is simple and inexpensive. In theory, it can also sequence RNA directly.
Another feature of nanopore single-molecule sequencing is that it can read methylated cytosine directly, without the bisulfite treatment of the genome required by traditional methods. This is of great help for studying epigenetic phenomena directly at the genome level. The sequencing accuracy of the improved method can reach 99.8%, and sequencing errors are easy to correct once found. However, there do not yet seem to be any reports of it being applied.
Figure 8. Nanopore sequencing
Other sequencing technologies
There is also a revolutionary new generation of sequencing technology based on semiconductor chips: Ion Torrent [6]. The technology uses a high-density semiconductor chip covered with small wells, each of which is a sequencing reaction cell. When DNA polymerase adds a nucleotide to the growing DNA strand, a hydrogen ion is released and the pH in the reaction cell changes. Ion sensors beneath the well detect the H+ signal, which is converted directly into a digital signal from which the DNA sequence is read (Fig. 9). The inventor of this technology, Jonathan Rothberg, is also one of the inventors of 454 sequencing. Its library and sample preparation are very similar to those of 454, so much so that it could be called a copy of 454, except that sequence information is obtained not by detecting the light produced from pyrophosphate during sequencing but by detecting changes in the H+ signal. Compared with other sequencing technologies, Ion Torrent needs no expensive optical imaging equipment, so it is relatively cheap and compact, and simpler and faster to operate. Apart from the time needed for library preparation, a whole run on the machine can be completed in 2-3.5 hours. The throughput of a whole chip is not high, currently about 10G, but the technology is very well suited to sequencing small genomes and to exon validation.
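How a pH signal becomes a sequence can be sketched as follows. This is an illustrative toy, not Ion Torrent's signal processing: the flow order, the function name and the assumed pH change of 0.02 per incorporated base are all hypothetical, and the signal is assumed to scale linearly with the number of hydrogen ions released in a flow.

```python
def decode_flows(flow_bases, delta_ph, ph_per_base=0.02):
    """Convert the pH change measured in each nucleotide flow into bases.

    flow_bases  -- the nucleotide supplied in each flow, e.g. "TACGTACG"
    delta_ph    -- measured pH change for each flow (same length)
    ph_per_base -- assumed pH change caused by one incorporation (hypothetical)
    """
    seq = []
    for base, change in zip(flow_bases, delta_ph):
        # One H+ is released per incorporated nucleotide, so the size of the
        # signal gives the number of identical bases added in this flow.
        count = max(0, round(change / ph_per_base))
        seq.append(base * count)
    return "".join(seq)

signal = [0.02, 0.0, 0.041, 0.0, 0.0, 0.059, 0.0, 0.02]
print(decode_flows("TACGTACG", signal))  # TCCAAAG
```

As with pyrosequencing, the length of a homopolymer run has to be inferred from signal size, so longer runs are harder to call exactly than single bases.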
Figure 9. Ion Torrent
Summary
The above briefly describes the principles of each generation of sequencing technology. The characteristics of the three generations are summarized in Tables 1 and 2 below. Sequencing cost, read length and throughput are the three important indicators for judging how advanced a sequencing technology is. Apart from differences in throughput and cost, first- and second-generation sequencing technologies share the same core principle of sequencing by synthesis (except SOLiD, which sequences by ligation). The advantages of second-generation sequencing are a much lower cost and much higher throughput than the previous generation; its disadvantages are that the PCR step it introduces raises the sequencing error rate to some extent and adds systematic bias, and that read lengths are short. Third-generation sequencing was developed to address these shortcomings. Its fundamental feature is single-molecule sequencing without any PCR step, which avoids the systematic errors caused by PCR bias while increasing read length and retaining the high-throughput, low-cost advantages of second-generation technology.
Table 1: Comparison of sequencing technologies
Table 2: Cost Sequencing Comparison of Mainstream Sequencing Machines
Figure 10 below shows the current global distribution of sequencers. The hot spots on the map are mainly in Shenzhen, China (mainly Huada, i.e. BGI), Southern Europe, Western Europe and the United States.
Source: Peptide time bound