A detailed explanation of NGS adapters

In NGS, several samples are usually pooled and sequenced together in a single run. How are the reads from the different samples told apart afterwards? The solution is to add an "adapter code" to the adapter during library construction and to sort the reads by that code after sequencing. This adapter code is the sample label, known as the Index or Barcode.
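The sorting step described above can be sketched as follows. This is a minimal illustration, not how any particular demultiplexing tool works: the function name `demultiplex`, the index sequences, and the reads are all made up, and only exact index matches are accepted, which is a simplification.

```python
from collections import defaultdict

def demultiplex(reads, sample_by_index):
    """Group (index, sequence) read pairs by the sample that owns the index."""
    bins = defaultdict(list)
    for index_seq, insert_seq in reads:
        # Reads whose index is not in the sample sheet go to a catch-all bin.
        sample = sample_by_index.get(index_seq, "undetermined")
        bins[sample].append(insert_seq)
    return dict(bins)

# Hypothetical 6 nt indexes and reads, invented for illustration only.
sample_by_index = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
reads = [
    ("ACGTAC", "GATTACAGATTACA"),
    ("TGCATG", "CCGGAATTCCGGAA"),
    ("ACGTAC", "TTTTGGGGCCCCAA"),
    ("GGGGGG", "ACACACACACACAC"),
]

bins = demultiplex(reads, sample_by_index)
for sample, seqs in sorted(bins.items()):
    print(sample, len(seqs))
```

Each read carries its index with it, so a lookup on the index sequence is enough to assign the read to a sample.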

An adapter is essentially a short base sequence with three basic parts: the P5/P7 segment, which is identical or complementary to the oligonucleotides on the flow cell; the sequencing primer binding site (Read 1/Read 2); and the Index/Barcode used to distinguish different samples. The adapter is the bridge between the DNA fragment to be sequenced and the flow cell: once it is ligated to adapters, the target fragment can be amplified and sequenced on the flow cell.

Adapters are mainly classified in two ways: by the position of the Index, and by whether they are compatible with PCR-free library construction.

(1) By Index position, adapters are divided into single-index and dual-index adapters. A single-index adapter carries an Index at only one end, P5 or P7 (usually P7); a dual-index adapter carries an Index at both P5 and P7 (as shown in Figure 2). The number of indexes directly determines how many samples can be pooled: dual indexes can accommodate more samples than single indexes. In recent years, to meet the demand for sequencing more samples per run, dual-index adapters have come into wide use.
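The gain in pooling capacity from dual indexes is simple counting, sketched below. The function name `max_samples` and the set sizes are hypothetical, and the result is an upper bound that assumes every P7/P5 pairing is used for a different sample.

```python
def max_samples(n_p7_indexes, n_p5_indexes=0):
    """Upper bound on distinguishable samples for the given index set sizes."""
    if n_p5_indexes == 0:
        # Single index: one sample per index.
        return n_p7_indexes
    # Dual index: one sample per (P7 index, P5 index) pair.
    return n_p7_indexes * n_p5_indexes

# Hypothetical set sizes, chosen only to show the scaling.
single = max_samples(12)
dual = max_samples(12, 8)
print(single, dual)
```

With these assumed sizes, 12 indexes label at most 12 samples on their own, while pairing them with 8 indexes at the other end labels up to 96.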

Figure 2: Single-index and dual-index adapters, classified by Index position, with a schematic of the library fragment after each type of adapter has been ligated.

(2) By compatibility with PCR-free library construction, adapters are divided into long adapters and short adapters (see Figure 3). The long adapter, also called the complete adapter, contains the P5/P7 sequence, the Index, and the Read 1/Read 2 sequence. After the complete adapter is attached to the DNA fragment by TA ligation, the library can be loaded onto the sequencer directly, without PCR amplification, provided there is enough DNA; if there is not, PCR is still needed to bring the product up to the required amount before sequencing. After the short adapter is attached to the DNA fragment by TA ligation, PCR must be performed with primers complementary to the short adapter, and the amplified product is the DNA fragment carrying the complete adapter (see Figure 4). In other words, a short adapter must be extended into a complete adapter by PCR before the library can be sequenced.

What makes the Index such an important part of the adapter? Put simply, the Index is the "ID card" of each sample in a pooled library. It is itself a base sequence, generally 6 nt or 8 nt long, and reading this "ID card" lets the data of a single sample be picked out of the pool. This raises a question: the four bases can be combined in a great many ways, so can any of these sequences serve as an Index? On what basis are Index sequences chosen?

Index selection should follow two principles: base balance and laser balance.

a) Base balance refers to the complexity and balance of the Index sequences. Complexity means diversity of base types; balance means an even distribution among the bases. Note that base balance is assessed across the set of Indexes used together, not within a single Index. Ideally, every position of the Index set contains all four bases, A, T, C and G, each at a proportion close to 25%, as shown in Figure 5.
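A base-balance check of this kind can be sketched by tabulating, for each position, the fraction of each base across the whole Index set. The function name `base_fractions` and the four indexes below are hypothetical and were chosen so that the set is perfectly balanced.

```python
from collections import Counter

def base_fractions(indexes):
    """Per-position fraction of A, C, G, T across equal-length indexes."""
    length = len(indexes[0])
    if any(len(ix) != length for ix in indexes):
        raise ValueError("all indexes must have the same length")
    fractions = []
    # zip(*indexes) yields one column per position across the whole set.
    for column in zip(*indexes):
        counts = Counter(column)
        fractions.append({b: counts[b] / len(indexes) for b in "ACGT"})
    return fractions

# Hypothetical index set, invented for illustration only.
index_set = ["ACGTAC", "CGTACG", "GTACGT", "TACGTA"]
fractions = base_fractions(index_set)
for position, frac in enumerate(fractions, start=1):
    print(position, frac)
```

Here every position shows each base at a fraction of 0.25, even though no single index has that composition on its own, which is the point about balance being a property of the set.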

b) Laser balance (must be considered) means that at every base position of an Index set, A + C = G + T should hold. In Illumina sequencers, the bases A and C are excited by a red laser with a wavelength of 660 nm, while G and T are excited by a green laser with a wavelength of 532 nm. Note that laser balance is a fallback for cases where base balance cannot be achieved: it improves the quality of base calling during index sequencing to some extent and reduces the risk of problems when the data are separated by sample, as shown in Figure 6. If the number of samples is odd, base balance and laser balance inevitably cannot both be met. In that case, choose two Indexes that are complementary at every position, plus one additional Index, which preserves sequencing quality as far as possible.
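The A + C = G + T rule and the odd-sample workaround can be sketched as below. The channel assignment follows the text (A and C red, G and T green); all function names and index sequences are hypothetical, and `both_channels_present` is an assumed, weaker check used here only to show what the complementary pair guarantees.

```python
RED = set("AC")    # bases excited by the red laser, per the text
GREEN = set("GT")  # bases excited by the green laser, per the text

def laser_counts(indexes):
    """Return (red_count, green_count) for each position of an index set."""
    return [
        (sum(b in RED for b in column), sum(b in GREEN for b in column))
        for column in zip(*indexes)
    ]

def is_laser_balanced(indexes):
    """True when A + C equals G + T at every position."""
    return all(red == green for red, green in laser_counts(indexes))

def both_channels_present(indexes):
    """True when every position has at least one red and one green base."""
    return all(red > 0 and green > 0 for red, green in laser_counts(indexes))

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def positionwise_complement(index):
    """Complement each base without reversing, so red and green swap."""
    return index.translate(COMPLEMENT)

# Hypothetical indexes, invented for illustration only.
first = "ACGTAC"
second = positionwise_complement(first)
third = "AACCGG"

print(second)
print(is_laser_balanced([first, second]))
print(is_laser_balanced([first, second, third]))
print(both_channels_present([first, second, third]))
```

Because A pairs with T and C pairs with G, complementing every position swaps the laser channel, so the complementary pair is balanced by construction. Adding a third index breaks strict equality, but both channels still carry signal at every position.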

Reposted from: A comprehensive analysis of the mystery of NGS connectors-business trends-information-biological online (bioon.com.cn)