High-throughput sequencing is a revolutionary change from traditional sequencing: it can sequence hundreds of thousands to millions of DNA molecules at a time. For this reason some of the literature calls it next generation sequencing (NGS). High-throughput sequencing also makes it possible to analyze the transcriptome and genome of a species in detail, so it is also called deep sequencing.
Development history
First generation sequencing: mid-1980s
The traditional chemical degradation method, the dideoxy chain termination method, and the sequencing technologies based on them are collectively called first generation sequencing. They played an important role in molecular biology research, for example in the Human Genome Project.
Second generation sequencing: started in 2005
It mainly includes Roche 454 sequencing, Illumina Solexa sequencing and Life Technologies Ion Torrent sequencing. The most remarkable feature of second generation sequencing is its high throughput: it can sequence hundreds of thousands to millions of DNA molecules at a time.
Third generation sequencing: started in 2008
Third generation DNA sequencing is characterized by single-molecule sequencing. Examples include the Helicos BioSciences single-molecule sequencer, Pacific Biosciences single-molecule real-time (SMRT) DNA sequencing, and Oxford Nanopore single-molecule sequencing.
At present, high-throughput sequencing usually refers to sequencing with second generation technology.
Third generation sequencing (single-molecule detection) has the advantage of long read length, but the disadvantages of a high error rate and high cost.
Comparison of NGS test platforms
Explanation of common terms
Reads: the sequence fragments generated by a high-throughput sequencing platform are called reads.
Reference: Reference genome sequence
Sequencing depth: the ratio of the total number of bases obtained by sequencing to the size of the genome to be measured.
Coverage: the proportion of the whole genome that is covered by the sequences obtained by sequencing.
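The two definitions above can be illustrated with a minimal Python sketch. It assumes reads are already aligned and are given as hypothetical (start, length) pairs with 0-based start positions; the function names are illustrative and do not come from any particular tool.

```python
def per_base_depth(genome_size, reads):
    """Count how many reads overlap each genome position.

    reads: list of (start, length) tuples, 0-based start (an assumed format).
    """
    depth = [0] * genome_size
    for start, length in reads:
        # Clip each read to the genome boundaries before counting.
        for pos in range(max(start, 0), min(start + length, genome_size)):
            depth[pos] += 1
    return depth


def sequencing_depth(total_bases, genome_size):
    """Sequencing depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def coverage(depth):
    """Coverage = fraction of genome positions covered by at least one read."""
    if not depth:
        raise ValueError("depth must not be empty")
    covered = sum(1 for d in depth if d > 0)
    return covered / len(depth)


# Toy example: a 10 bp genome and three aligned reads.
reads = [(0, 4), (2, 4), (6, 2)]
depth = per_base_depth(10, reads)
total_bases = sum(length for _, length in reads)
print(sequencing_depth(total_bases, 10))  # average depth
print(coverage(depth))                    # fraction of positions covered
```

In this toy case the total number of sequenced bases equals the genome size, so the depth is 1x, yet two positions receive no reads. Depth and coverage therefore measure different things: the first is an average, the second a proportion.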
SNP: single nucleotide polymorphism
An SNP is a polymorphism caused by a single nucleotide variation (substitution, insertion or deletion) at the same position in the genomic DNA sequence of different individuals. In other words, the nucleotide at a given position differs between species or individuals. In the human genome, roughly one SNP may occur in every 1,000 nucleotides. Some SNPs may be related to diseases, but most probably are not. SNPs are an important basis for studying genetic variation in human families and in animal and plant strains.
SNV: single nucleotide variant
In tumor genome studies, a single nucleotide mutation that is found in the tumor but not in the matched normal tissue is a somatic mutation, and is called an SNV.
Indel:
Insertion or deletion of a small fragment in the genome (< 50 bp).
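As a rough illustration of how SNVs and indels differ in practice, the sketch below classifies a variant from its reference and alternate alleles. It assumes VCF-style allele strings, and the function name and category labels are hypothetical. The 50 bp cutoff is the one given in the definition above.

```python
INDEL_MAX_BP = 50  # indels are defined in the text as smaller than 50 bp


def classify_small_variant(ref, alt):
    """Classify a variant from its reference and alternate allele strings."""
    if not ref or not alt:
        raise ValueError("ref and alt must be non-empty allele strings")
    if len(ref) == 1 and len(alt) == 1:
        # Single base replaced by a different single base.
        return "SNV" if ref.upper() != alt.upper() else "no change"
    size = abs(len(ref) - len(alt))
    if size == 0:
        # Same-length multi-base substitution: neither SNV nor indel.
        return "other"
    if size < INDEL_MAX_BP:
        return "insertion" if len(alt) > len(ref) else "deletion"
    # Too long to count as an indel under the 50 bp definition.
    return "larger variant"


print(classify_small_variant("A", "G"))    # single-base substitution
print(classify_small_variant("A", "ATG"))  # 2 bp inserted
print(classify_small_variant("ATG", "A"))  # 2 bp deleted
```

Whether an SNV found this way is a germline SNP or a somatic SNV cannot be decided from the alleles alone; that depends on comparing against population data or matched normal tissue.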
CNV: copy number variation
An important component of structural variation (SV), CNV is caused by genome rearrangement. It generally refers to a gain or loss in the copy number of large genomic fragments longer than 1 kb. In the figure, A shows a loss and B shows a gain.
SV: structural variation
SV refers to the variation of large segments on chromosomes. It mainly includes the insertion (Figure A) and deletion (Figure B) of large chromosome fragments, the inversion and translocation of a region within a chromosome (Figures D and E), and recombination between two chromosomes (Figure F). Although SVs are far less numerous than SNVs and indels, they affect more bases. The literature reports that as many as 13% of bases are affected by SVs. SV is highly correlated with disease risk and phenotypic variation.