How to find the pathway according to the results of gene sequencing analysis

De novo genome sequencing of animals and plants refers to sequencing a species without any reference sequence, assembling its genome with current bioinformatics methods, and then carrying out downstream analyses such as structural annotation, functional annotation, and comparative genomics. Third-generation sequencing technologies (represented by PacBio and Nanopore) produce long reads and have been applied to animal and plant genomes since 2015. The results of such sequencing can be widely used in agriculture, forestry, fisheries, animal husbandry, medicine, and marine research.

Figure 1 Read length, accuracy, and genome contiguity of different sequencing technologies

Principle of third generation sequencing technology

PacBio sequencing principle

PacBio uses sequencing by synthesis: one DNA strand serves as the template while DNA polymerase synthesizes the complementary strand, and the fluorescence signals are converted into base calls. PacBio has also upgraded its CCS (circular consensus sequencing) mode to produce high-fidelity (HiFi) reads of about 15 kb, which combine long read length with high accuracy and thus improve the accuracy of genome assembly.
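The reason repeated passes over the same molecule raise accuracy can be illustrated with a toy consensus. The sketch below is only a simplified illustration, not PacBio's actual CCS algorithm: it assumes the passes are already aligned and of equal length, and the function name `majority_consensus` and the example reads are hypothetical.

```python
from collections import Counter


def majority_consensus(passes):
    """Column-wise majority vote over pre-aligned, equal-length passes."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("this toy example assumes equal-length, pre-aligned passes")
    consensus = []
    for column in zip(*passes):
        # Pick the most frequent base observed at this position.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five hypothetical passes over the same molecule, each with at most one random error.
passes = ["ACGTTAGC", "ACGATAGC", "ACGTTAGC", "ACGTTTGC", "TCGTTAGC"]
print(majority_consensus(passes))  # ACGTTAGC
```

Because the errors in individual passes fall at different positions, the vote at each column recovers the underlying base even though three of the five passes contain a mistake.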

Fig. 2 Principle of PacBio third-generation sequencing

Principle of nanopore sequencing

When a single-stranded DNA molecule passes through a nanopore, each nucleotide produces a characteristic current signal. The change in ionic current through each pore is recorded and converted into a base sequence using a Markov model or a recurrent neural network. Ultra-long reads (ULRs) are another important feature of the ONT (Oxford Nanopore Technologies) platform and have the potential to facilitate the assembly of large genomes.

Information analysis content

De novo research content

Genome assembly: assembly with multiple software tools and evaluation of the assembly results

Gene prediction and annotation: protein-coding gene prediction; repeat sequence annotation and transposable element classification; non-coding RNA annotation; pseudogene annotation, etc.

Hi-C assisted genome assembly: evaluation of valid data; contig clustering, ordering, and orientation; evaluation of the anchoring results
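Evaluating an assembly, as listed above, usually includes reporting contiguity with contig N50: the length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal sketch of that calculation follows; the function name `n50` and the contig lengths are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths (in kb); total is 300, so half is 150.
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))  # 70
```

Here the two longest contigs (80 + 70 = 150) already cover half of the 300 kb total, so the N50 is 70.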

Analysis of biological problems

Comparative genomics research

Gene family clustering;

Construction of phylogenetic tree;

Analysis of gene family expansion and contraction;

Estimation of species divergence time;

Estimation of LTR insertion time;

Whole-genome duplication events;

Selection pressure analysis
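The LTR insertion (formation) time in the list above is commonly estimated as T = K / (2r), where K is the sequence divergence between the 5' and 3' LTRs of one element and r is the substitution rate per site per year. The sketch below assumes pre-aligned LTR sequences and a Jukes-Cantor distance; real pipelines often use other substitution models. The rate `ASSUMED_RATE` is a placeholder that must be replaced with a value appropriate to the species, and the sequences and function names are hypothetical.

```python
import math


def jukes_cantor_distance(ltr5: str, ltr3: str) -> float:
    """Jukes-Cantor corrected divergence K between two aligned sequences."""
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a in "ACGT" and b in "ACGT":  # skip gaps and ambiguous bases
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = mismatches / compared
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ltr_insertion_time(k: float, rate: float) -> float:
    """T = K / (2r): both LTRs accumulate substitutions independently."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return k / (2.0 * rate)


ASSUMED_RATE = 1.3e-8  # substitutions per site per year; placeholder assumption

# Hypothetical 100 bp aligned LTR pair differing at 2 sites.
ltr5 = "ACGT" * 25
ltr3 = "TCGT" + "ACGT" * 23 + "ACGA"
k = jukes_cantor_distance(ltr5, ltr3)
years = ltr_insertion_time(k, ASSUMED_RATE)
print(f"K = {k:.4f}, T = {years:,.0f} years")
```

The factor of 2 reflects that the two LTRs are identical at insertion and then diverge along two independent lineages.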

Analysis of specific biological problems: omics research methods are combined to investigate in depth the biological questions particular to a species.

Cluster analysis of strawberry gene family

Whole-genome duplication event analysis of Coix lachryma-jobi

Phylogenetic tree and gene family expansion and contraction analysis of pistachio

Collinearity analysis of the subgenomes of upland cotton

Technical service process

Sample delivery

Library construction and sequencing

Data analysis

Report delivery

After-sales support

Product advantages

Founded in 2009, the company has worked in the field of genome sequencing for 11 years and has long been committed to becoming an expert in accurate genome assembly.

It operates the world's mainstream third-generation sequencing platforms (PacBio and Nanopore) and has rich experience in dual-platform assembly and in genome assembly for tens of thousands of species.

Its Hi-C chromatin conformation capture libraries yield a high proportion of valid data, with an anchoring rate as high as 99%. Drawing on rich experience with polyploid species, chromosome-level genomes are obtained by combining Hi-C with third-generation assembly, which further improves assembly quality.

With leading, independently developed genome sequencing and analysis technology, the company holds 23 invention patents and more than 150 core software copyrights.

Example of project experience

Collaborative publication cases

Case 1

Study of important agronomic traits in 243 diploid cotton accessions based on the updated Asian cotton A genome

Resequencing of 243 diploid cotton accessions based on an updated A genome identifies the genetic basis of key agronomic traits

Journal: Nature Genetics

Impact factor: 27.125

Institutions: Institute of Cotton Research, Chinese Academy of Agricultural Sciences; Beijing Biomarker Technologies Co., Ltd.; and others

Publication date: May 2018

Research background:

Cotton is a valuable resource for studying plant polyploidy. The ancestors of Asian cotton (Gossypium arboreum) and G. herbaceum are the donors of a subgenome of modern allotetraploid cotton. In this study, PacBio third-generation sequencing and Hi-C were used to reassemble a high-quality Asian cotton genome, the population structure and genome differentiation trends of 243 diploid cotton accessions were analyzed, and candidate loci useful for the genetic improvement of cotton lint yield were identified.

Research results:

1. Third-generation genome assembly of Asian cotton;

The Asian cotton genome was assembled by combining third-generation sequencing with Hi-C. A total of 142.54 Gb of data was obtained and assembled into a 1.71 Gb Asian cotton genome with a contig N50 of 1.1 Mb; the length of the longest contig is not given. Using Hi-C, 1573 Mb of assembled sequence was anchored to 13 chromosomes. When the Hi-C data are mapped to the updated genome, off-diagonal inconsistencies are clearly reduced compared with the previously published genome (Figure 1a-b).
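As a quick check on the figures above, 1573 Mb anchored out of a 1.71 Gb (1710 Mb) assembly corresponds to an anchoring rate of roughly 92%. The short sketch below does this arithmetic; the function name `anchoring_rate` is illustrative, and the inputs are simply the numbers quoted in the paragraph.

```python
def anchoring_rate(anchored_mb: float, assembly_mb: float) -> float:
    """Fraction of the assembled sequence that was placed on chromosomes."""
    if assembly_mb <= 0:
        raise ValueError("assembly size must be positive")
    if not 0 <= anchored_mb <= assembly_mb:
        raise ValueError("anchored length must lie between 0 and the assembly size")
    return anchored_mb / assembly_mb


# Figures quoted in the case study: 1573 Mb anchored, 1.71 Gb (1710 Mb) assembled.
rate = anchoring_rate(1573, 1710)
print(f"{rate:.1%}")  # 92.0%
```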

Figure 1 Comparison of Hi-C data for the two versions of the Asian cotton genome

2. Genetic evolution analysis of diploid cotton population;

A total of 230 Asian cotton (G. arboreum) accessions and 13 G. herbaceum accessions were resequenced. Genome alignment, phylogenetic tree, population structure, PCA, LD, and selective sweep analyses showed that Asian cotton and G. herbaceum (A genome) diverged at the same time as G. raimondii. Asian cotton originated in southern China and then spread to the Yangtze River and Yellow River basins. Most accessions with domestication-related traits experienced geographic isolation (Figure 2).

Fig. 2 Population evolution and population structure analysis of diploid cotton.

3. Genome-wide association study (GWAS) of Asian cotton:

GWAS was conducted on 11 important traits measured in different environments, and 98 significantly associated loci were identified for these 11 important agronomic traits of Asian cotton. A nonsynonymous substitution in GaKASIII (a cysteine-to-arginine substitution) leads to changes in fatty acid composition in cottonseed (C16:0 and C16). Resistance to Fusarium wilt in cotton was found to be associated with activation of GaGSTF9 gene expression. In addition, 158 linted and 57 lintless Asian cotton accessions were selected for GWAS, which identified signals related to epidermal hair (trichome) and fiber development (Figure 3).

Population evolution and structure analysis of diploid cotton.

Research conclusions:

The Asian cotton genome was reassembled using third-generation sequencing and Hi-C, raising the contig N50 from 72 kb to 1.1 Mb and laying the foundation for subsequent population genetic research on Asian cotton. Population genetic and evolutionary analyses found that Asian cotton and G. herbaceum (A genome) and G. raimondii (D genome) diverged at the same time, and supported the conclusion that Asian cotton originated in southern China and was later introduced into the Yangtze River and Yellow River basins. By integrating GWAS and QTL analyses, genes related to fatty acid content, disease resistance, and lint growth and development in Asian cotton were located and their functions verified, which advances the improvement of complex agronomic traits in Asian cotton.

Case 2

Comparative genomic analysis of diploid, wild tetraploid, and cultivated tetraploid peanuts reveals asymmetric subgenome evolution and improvement

Comparison of Arachis monticola with Diploid and Cultivated Tetraploid Genomes Reveals Asymmetric Subgenome Evolution and Improvement of Peanut

Journal: Advanced Science

Impact factor: 15.804

Institutions: Henan Agricultural University; Beijing Biomarker Technologies Co., Ltd.

Publication date: November 2019

Research background:

Peanut is an important cash crop in China and a major source of protein and oil. The genus Arachis includes 30 diploid species, one wild allotetraploid peanut (A. monticola), and one cultivated peanut (A. hypogaea). As an important wild donor resource for improving the agronomic traits of cultivated peanut, the wild tetraploid peanut has long been a research focus in China and abroad. This study investigated the genome of the only wild allotetraploid peanut in Arachis.