The core idea of second-generation sequencing technology is sequencing by synthesis: the DNA sequence is determined by detecting the label carried by each nucleotide newly added to the end of the growing strand. The main existing platforms are the Roche/454 FLX, the Illumina/Solexa Genome Analyzer, and the SOLiD system from Applied Biosystems (ABI).
The main strengths of first-generation sequencing technology are a read length of up to 1000 bp and an accuracy of up to 99.999%. However, its high cost and low throughput severely limit its use in truly large-scale applications, so first-generation sequencing is not the best method for such work.
After continuous technical development and improvement, second-generation sequencing technology emerged, represented by Roche's 454 technology, Illumina's Solexa and HiSeq technology, and ABI's SOLiD technology.
Second-generation sequencing technology greatly reduces sequencing cost and greatly increases sequencing speed while maintaining high accuracy. Sequencing a human genome used to take three years, whereas second-generation technology needs only one week. The trade-off is that its reads are much shorter than those of first-generation technology.
The Illumina/Solexa sequencing procedure consists of the following steps:
1) Sequencing library construction.
First, genomic DNA is prepared. Although sequencing companies require a sample of 200 ng, the Genome Analyzer system can work with as little as 100 ng, which makes it usable in many experiments with limited sample material. The DNA is then randomly fragmented into small pieces of several hundred bases or shorter, and special adapters are added to both ends. For transcriptome sequencing, library construction is somewhat more involved: either the RNA is fragmented, reverse-transcribed into cDNA, and then ligated to adapters, or the RNA is first reverse-transcribed into cDNA, which is then fragmented and ligated to adapters. The fragment size (insert size) affects later data analysis and can be chosen as needed. For genome sequencing, libraries with several different insert sizes are usually prepared to provide more information during assembly.
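The fragmentation and adapter steps can be pictured with a small simulation. This is only a sketch under stated assumptions: the adapter sequences, the function names, and the normal distribution of insert sizes are hypothetical choices made for illustration, and real shearing acts on many genome copies at once rather than cutting one copy end to end.

```python
import random

# Hypothetical adapter sequences, for illustration only (not real Illumina adapters).
ADAPTER_5 = "ACACGT"
ADAPTER_3 = "TGCATG"

def fragment_genome(genome, mean_insert=300, spread=50, seed=0):
    """Cut a sequence into consecutive pieces whose sizes vary around mean_insert."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    fragments = []
    pos = 0
    while pos < len(genome):
        # Assumed model: insert sizes are roughly normally distributed.
        size = max(1, int(rng.gauss(mean_insert, spread)))
        fragments.append(genome[pos:pos + size])
        pos += size
    return fragments

def add_adapters(fragments):
    """Attach one adapter to each end of every fragment."""
    return [ADAPTER_5 + frag + ADAPTER_3 for frag in fragments]

def build_library(genome, mean_insert=300, spread=50, seed=0):
    """Fragment the genome, then add adapters: a toy sequencing library."""
    return add_adapters(fragment_genome(genome, mean_insert, spread, seed))
```

Calling `build_library` with different `mean_insert` values mimics preparing several libraries with different insert sizes.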
2) Surface attachment and bridge amplification.
The Solexa sequencing reaction is carried out in a glass flow cell, which is subdivided into eight lanes. The inner surface of each lane is densely coated with immobilized single-stranded adapters. The adapter-ligated DNA fragments from the previous step are denatured into single strands, which then hybridize to the immobilized adapter primers in the lanes and form bridge structures for the subsequent pre-amplification.
3) Pre-amplification (denaturation and complete amplification).
Unlabeled dNTPs and ordinary Taq polymerase are added for solid-phase bridge PCR amplification, which converts each single-stranded bridge fragment into a double-stranded bridge fragment. Denaturation then releases the complementary single strands, which anchor to the nearby solid surface. Repeating this cycle produces millions of clusters of double-stranded fragments on the solid surface of the flow cell.
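The growth of a single cluster over repeated cycles can be approximated with simple arithmetic. The sketch below assumes an idealized model in which every strand is copied with a fixed per-cycle efficiency and growth stops once the local surface is saturated; the function name, the `efficiency` parameter, and the `max_copies` cap are hypothetical and not taken from any instrument specification.

```python
def cluster_size(cycles, efficiency=1.0, max_copies=1000):
    """Estimate the number of strands in one cluster after a number of bridge PCR cycles.

    efficiency: fraction of strands successfully copied per cycle (1.0 = perfect doubling).
    max_copies: assumed cap standing in for the limited number of nearby surface primers.
    """
    copies = 1.0  # a cluster starts from a single attached template
    for _ in range(cycles):
        copies = min(max_copies, copies * (1.0 + efficiency))
    return copies
```

With perfect efficiency, three cycles give 8 copies, and the count stops rising once it reaches the assumed cap.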
4) Single base extension and sequencing.
Four fluorescently labeled dNTPs, DNA polymerase, and adapter primers are added to the flow cell for extension. Each labeled dNTP also carries a removable blocking group, so only one base is added to each strand per cycle. As each cluster extends its complementary strand, the incorporated labeled dNTP emits its characteristic fluorescence. The sequencer captures the fluorescence signal, and computer software converts the optical signal into sequencing peaks, yielding the sequence of the fragment. This process of deriving sequence information from the fluorescence signal is called base calling; it is performed by Illumina's Genome Analyzer Sequencing Control Software and Pipeline Analysis Software. Read length is limited by many factors that cause signal decay, such as incomplete cleavage of the fluorescent labels. As read length increases, the error rate also rises.
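Base calling can be illustrated with a deliberately simplified model: in each cycle, the base whose fluorescence channel is brightest is reported. The sketch below is an assumption-laden toy, not Illumina's actual algorithm; the function name, the A, C, G, T channel order, and the "purity" score are all hypothetical.

```python
CHANNELS = "ACGT"  # assumed channel order for this sketch

def call_bases(intensities):
    """Call one base per cycle from per-channel fluorescence intensities.

    intensities: list of cycles, each a list of four floats in A, C, G, T order.
    Returns the called read and, per cycle, the share of total signal held by
    the brightest channel (a rough confidence measure for this toy model).
    """
    read = []
    purity = []
    for cycle in intensities:
        total = sum(cycle)
        best = max(range(4), key=lambda i: cycle[i])  # index of brightest channel
        read.append(CHANNELS[best])
        purity.append(cycle[best] / total if total > 0 else 0.0)
    return "".join(read), purity
```

In this model, signal decay in later cycles shows up as a flatter intensity profile and therefore a lower purity value, which is one way to picture why errors become more frequent toward the end of a read.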
5) Data analysis.
Strictly speaking, this step is not part of the sequencing procedure itself, but the preceding steps are meaningful only once it is done. The raw data produced by sequencing are reads only a few tens of bases long. Bioinformatics tools must assemble these short reads into contigs or even a whole genome, or align them to an existing genome or to the genome of a closely related species, before further analysis can yield biologically meaningful results.
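How short reads are joined into contigs can be sketched with a greedy overlap merge. This is a minimal illustration, assuming error-free reads from a single strand; the function names and the `min_len` threshold are hypothetical, and real assemblers use far more elaborate methods to handle sequencing errors, repeats, and both strands.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap until none overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # no remaining overlaps: these are the final contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
```

For example, the three reads `ATGGCGT`, `GCGTACG`, and `TACGTTA` overlap by four bases each and merge into a single contig.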
References:
Baidu Encyclopedia-Second Generation DNA Sequencing Technology