Problems and Prospects of ChIP Sequencing

I came across an in-depth interpretation of this topic in a WeChat official account post.

/s/7 bqcpr 1 bmjazhv 408-sv6Q

Chromatin immunoprecipitation sequencing (ChIP-seq) is a genome-wide technique for profiling DNA-binding proteins, histone modifications, and nucleosomes. As sequencing costs have fallen, ChIP-seq has become an indispensable tool for studying gene regulation and epigenetic mechanisms. This article summarizes earlier material and analyzes the problems that need attention when using ChIP-seq at this stage, as well as how to make better use of the technique to obtain research results.

Although formaldehyde is a highly permeable cross-linking agent, its cross-linking efficiency is low because its reactivity is limited to amines. In mammalian cells, the maximum cross-linking efficiency is only 1%. Proteins with a residence time of less than 5 seconds on DNA cannot be cross-linked by formaldehyde. In addition, formaldehyde causes many unrelated proteins to cross-link to DNA, which affects downstream data analysis. Formaldehyde cross-linking has also been reported to trigger the DNA damage response, which changes chromatin composition and biases the ChIP results. Because the cross-linking reaction is reversed by heat and low pH, the stability of the DNA-protein cross-linked complex is another issue that deserves attention.

Depending on whether a formaldehyde cross-linking step is included, ChIP can be divided into two types. One is X-ChIP, which combines formaldehyde cross-linking with mechanical shearing. The other is ChIP without cross-linking, that is, native ChIP (N-ChIP). Compared with X-ChIP, N-ChIP has several advantages: (1) higher resolution; (2) no enrichment of nonspecific proteins on DNA caused by formaldehyde cross-linking; (3) no masking of antigenic epitopes by formaldehyde cross-linking; (4) reduced sample loss. Because it relies on MNase, N-ChIP is suitable only for studying histone modifications, not transcription factors.

The commonly used enzyme is MNase (micrococcal nuclease), which degrades the linker DNA between nucleosomes; MNase digestion of chromatin therefore releases individual nucleosomes. MNase digestion has several limitations: (1) it preferentially cuts at A/T sites, so A/T-rich nucleosomal regions are under-represented relative to the real situation; (2) it does not cut precisely at nucleosome boundaries, so the inferred positions of open chromatin differ from the real situation; (3) it tends to digest fragile nucleosomes; (4) the DNA fragments it produces are relatively short, which makes subsequent PCR amplification and detection of the samples difficult.

Some studies suggest that sonication is not as gentle as enzymatic digestion, and that uneven fragmentation leads to high background noise in the sequencing results, which affects subsequent data analysis. When choosing a fragmentation method: (1) if the protein under study is highly abundant and tightly bound to DNA, such as a histone, the sample does not need to be cross-linked and enzymatic digestion can be used; (2) if the protein is of low abundance or is not tightly bound to DNA, such as a transcription factor, it is best to fix the sample with a cross-linking agent to stabilize the protein-DNA complex. In this case, sonication is the better choice.

ChIP-seq data from different cell types can be analyzed together, and the resulting information can be used to infer genome dynamics or to annotate the epigenomic landscape of cell types for which only some experimental data are available. A growing number of studies show that epigenetic information is highly correlated with gene expression and chromosome conformation, and can be used to predict both. This section briefly introduces advanced tools for the analysis of histone-modification ChIP-seq data.

Various machine-learning methods have been developed to quantitatively infer gene expression levels from epigenetic information obtained in ChIP-seq experiments. For example, (1) a linear regression model was applied to histone modification enrichment at promoter sites to predict gene expression in CD4+ T cells; the authors used 19 histone modifications and showed that only three were needed at promoter sites to model gene expression [1]. (2) Nonlinear models, such as multivariate adaptive regression splines (MARS) and random forests, were applied to 11 histone modifications and DNase I hypersensitivity in seven human cell lines [2]. These models consider only the epigenetic pattern at promoter sites and ignore information from enhancer sites. In contrast, DeepExpression [3] uses data from HiChIP [4], a high-throughput technique that captures protein-centric chromatin loops, to take enhancers and their interactions with promoters into account. Other tools use convolutional neural networks (CNNs) to predict gene expression [5] or differential gene regulation patterns [6].
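To make the linear-regression idea concrete, here is a minimal sketch that fits expression as a weighted sum of histone-mark enrichment at promoters. It is not the model from [1]: the three marks, the coefficients, and the toy data generator (`make_toy_promoters`) are invented for illustration, and ordinary least squares is solved with a small hand-written routine so that the example needs only the Python standard library.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(features, target):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, feature_row):
    """Predicted expression: intercept plus weighted mark enrichment."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], feature_row))


def make_toy_promoters(n_genes=200, seed=0):
    """Hypothetical data: three mark enrichments per promoter (invented)."""
    rng = random.Random(seed)
    true_coefs = [0.5, 1.2, -0.8, 0.3]  # intercept + three marks, made up
    features, expression = [], []
    for _ in range(n_genes):
        marks = [rng.uniform(0.0, 5.0) for _ in range(3)]
        features.append(marks)
        expression.append(predict(true_coefs, marks) + rng.gauss(0.0, 0.05))
    return features, expression, true_coefs


features, expression, true_coefs = make_toy_promoters()
coefs = fit_ols(features, expression)
print("fitted coefficients:", [round(c, 2) for c in coefs])
```

On real data the features would be enrichment values computed from ChIP-seq reads around each promoter, and a held-out set of genes would be needed to judge how well the model predicts.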

Many studies have shown that single-nucleotide polymorphisms in enhancers can lead to genetic diseases and cancer [7], so methods are needed to define the state of enhancers in different cell lines. Chromosome conformation capture (3C) has been extended into several newer technologies, Hi-C [8], HiChIP [4], and ChIA-PET [9], which can capture the spatial proximity between enhancers and their target genes. Hariprakash and Ferrari divided the methods for exploring gene-enhancer interactions into four categories: (1) correlation-based methods estimate the interaction strength of all enhancer-promoter pairs; (2) regression-based methods assume that multiple enhancers contribute to a single gene; (3) supervised-learning-based and (4) score-based methods can integrate multiple ChIP-seq data sets and other information types. These tools focus on enhancer-promoter interactions, but there are many other types of chromatin interaction, such as enhancer-enhancer loops and weak chromatin aggregation caused by phase separation [10]. CITD [11] and the method of Qi and Zhang [12] used a wavelet transform and a potential energy function, respectively, to analyze three-dimensional genome organization comprehensively from epigenetic data.
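The correlation-based category can be illustrated with a short sketch: for each candidate enhancer-gene pair, compute the Pearson correlation between the enhancer signal and the gene's expression across cell types. The enhancer names, gene names, and values below are hypothetical, and published tools add distance constraints and significance testing that are omitted here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def score_pairs(enhancer_signal, gene_expression, candidate_pairs):
    """Score each (enhancer, gene) pair by correlation across cell types."""
    return {
        (enh, gene): pearson(enhancer_signal[enh], gene_expression[gene])
        for enh, gene in candidate_pairs
    }


# Hypothetical values: one entry per cell type, in the same order everywhere.
enhancer_signal = {
    "enh1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "enh2": [5.0, 1.0, 4.0, 2.0, 3.0],
}
gene_expression = {
    "geneA": [2.0, 4.0, 6.0, 8.0, 10.0],
    "geneB": [3.0, 3.0, 3.0, 3.0, 3.0],
}
pairs = [("enh1", "geneA"), ("enh2", "geneA"), ("enh2", "geneB")]

scores = score_pairs(enhancer_signal, gene_expression, pairs)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A gene that is constant across cell types carries no information for this approach, which is why the sketch returns 0.0 in that case rather than dividing by zero.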

Bias and batch effects in ChIP-seq data strongly influence the analysis. Because machine-learning methods are sensitive to noise in the training data, some ChIP-seq samples are flagged as medium quality or rejected as low quality, which results in data loss. When biological samples are precious (for example, primary cells and clinical samples) and large numbers of samples are hard to collect, "data imputation" methods may be applied. These methods use epigenetic data from other closely related cell types for data denoising or reconstruction. "Data denoising" aims to improve the quality of existing ChIP-seq samples by identifying and eliminating noise in the data. Coda [13] encodes the noise-generating process and uses a convolutional neural network to recover the signal in ChIP-seq data. "Data reconstruction" aims to generate missing ChIP-seq data in silico from a large collection of data sets. ChromImpute [14] is a tool that uses regression trees to impute the signal of each missing experiment from the ten most closely related cell types. PREDICTD [15] and Avocado [16] use tensor decomposition to impute multiple ChIP-seq data sets simultaneously. These imputation methods are potential computational alternatives to actual ChIP-seq experiments, and may open the way to collecting epigenome data for all cell types and environmental conditions, which is not feasible experimentally. Although the approach is computationally challenging, the availability of high-quality data from a variety of cell types encourages efforts toward this goal.
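The idea of reconstructing a missing track from closely related cell types can be sketched very simply. The example below is not how ChromImpute, PREDICTD, or Avocado work internally; it only ranks reference cell types by how well a shared mark correlates with the target, then averages the missing mark over the top `k` of them. All cell-type names and signal values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def impute_missing_track(target_shared, references, k=2):
    """Average the missing mark over the k most similar reference cell types.

    target_shared: binned signal of a mark that WAS measured in the target.
    references: name -> (shared-mark track, missing-mark track), both binned
                over the same genomic bins as target_shared.
    """
    ranked = sorted(
        references,
        key=lambda name: pearson(target_shared, references[name][0]),
        reverse=True,
    )
    nearest = ranked[:k]
    imputed = [
        sum(references[name][1][i] for name in nearest) / len(nearest)
        for i in range(len(target_shared))
    ]
    return imputed, nearest


# Hypothetical binned signals over four genomic bins.
target_shared = [1.0, 5.0, 2.0, 8.0]
references = {
    "cellA": ([1.0, 5.0, 2.0, 8.0], [0.0, 4.0, 0.0, 6.0]),
    "cellB": ([2.0, 6.0, 3.0, 9.0], [2.0, 6.0, 0.0, 8.0]),
    "cellC": ([8.0, 2.0, 5.0, 1.0], [9.0, 9.0, 9.0, 9.0]),
}

imputed, nearest = impute_missing_track(target_shared, references, k=2)
print("nearest cell types:", sorted(nearest))
print("imputed track:", imputed)
```

Here `cellC` is anti-correlated with the target and is therefore excluded, so its uniformly high signal does not leak into the imputed track.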

Recent studies show that many cell types (including normal immune cells) play important supporting roles in complex tissues and tumors. To resolve this cellular heterogeneity and the fate trajectories of cells during development, various single-cell assays have been developed. scChIP-seq can profile histone modifications and other chromatin-binding proteins genome-wide from low-input samples at single-cell resolution. Many methods for single-cell labeling and ChIP-seq library preparation have recently been developed; they use microfluidic systems, Tn5 transposase tagmentation, and ChIP-free strategies.

The first scChIP-seq method, scDrop-ChIP [17], used a microfluidic system to label cells and, combined with a standard ChIP protocol, generated about 800 non-duplicate reads per cell. A more recently developed droplet microfluidics method [18] provides higher resolution, yielding about 10,000 non-duplicate reads per cell. The limitation of these methods is that most laboratories do not have access to specialized microfluidic devices.
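The per-cell read counts quoted for these methods are counts of distinct fragments after duplicates are removed. A minimal sketch of that bookkeeping follows, assuming each aligned read has already been reduced to a cell barcode plus a mapping position; the tuple layout and the toy reads are assumptions for illustration, not the output format of any particular pipeline.

```python
from collections import defaultdict


def count_non_duplicate_reads(reads):
    """Count distinct (chrom, start, strand) positions per cell barcode.

    reads: iterable of (barcode, chrom, start, strand) tuples. Reads from the
    same cell that map to the same position and strand are treated as
    duplicates and counted once.
    """
    unique_positions = defaultdict(set)
    for barcode, chrom, start, strand in reads:
        unique_positions[barcode].add((chrom, start, strand))
    return {barcode: len(pos) for barcode, pos in unique_positions.items()}


# Hypothetical aligned reads from two cells.
reads = [
    ("AAAC", "chr1", 100, "+"),
    ("AAAC", "chr1", 100, "+"),  # duplicate of the previous read
    ("AAAC", "chr1", 250, "-"),
    ("GGTT", "chr2", 40, "+"),
    ("GGTT", "chr2", 40, "+"),   # duplicate of the previous read
]

counts = count_non_duplicate_reads(reads)
print(counts)
```

With only a few hundred to a few thousand such reads per cell, the signal from any single cell is sparse, which is why these data sets are usually analyzed by aggregating cells.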

Tagmentation-based library preparation using Tn5 transposase has been widely used in various NGS assays, including ChIP-seq. sc-itChIP-seq [19] prepares a library by tagmenting single cells before the classic ChIP experiment. This method yields 9,000 non-duplicate reads per cell. Because the experimental procedure is similar to the standard ChIP-seq protocol, this method is easier to adopt than scDrop-ChIP.

Several ChIP-free methods have been developed for single-cell analysis: single-cell chromatin immunocleavage sequencing (scChIC-seq) [20] and single-cell uliCUT&RUN [21]. Both are based on the CUT&RUN method [22], in which a fusion protein of MNase and protein A cleaves the sites targeted by a specific antibody. These methods yield about 4,100 non-duplicate reads per cell, and library preparation then requires exacting experimental steps. Their drawback is a relatively low mapping rate (~6%). In addition, three similar methods have been developed: CUT&Tag [23], ACT-seq [24], and CoBATCH [25], which use a fusion protein of Tn5 transposase and protein A. During library preparation, after the antibody has bound the target protein on chromatin, the fusion protein binds the primary antibody, and Tn5 transposase is then activated to tagment the protein binding sites. The advantage of these methods is that detection of protein binding sites and library preparation are carried out at the same time, which greatly reduces the number of experimental steps and the time required. These methods are also less affected by errors introduced by the immunoprecipitation step. Moreover, they show a mapping rate of about 97% and yield about 12,000 non-duplicate reads per cell. This ChIP-free approach therefore has the potential to enable high-throughput, high-quality scChIP-seq analysis. Finally, chromatin integration labeling and sequencing (ChIL-seq) [26] is another ChIP-free method, based on immunostaining instead of ChIP. It uses a secondary antibody probe conjugated with dsDNA that contains a T7 RNA polymerase promoter, an NGS adapter sequence, and a Tn5 binding sequence. After the probe captures the primary antibody, the probe DNA sequence is integrated at the target binding site by Tn5 transposase. The integrated region is then amplified by transcription, followed by RNA purification and library preparation. This method can be used for single-cell analysis, but it may need further optimization to achieve high-throughput sequencing. Other scChIP-seq methods will be developed in the future, for example for the simultaneous detection of multiple histone modifications and other chromatin-binding proteins. Such studies will make it possible to capture the gene regulatory factors on the chromatin of individual cells and to understand how they interact.

[1] R. Karlic, H.R. Chung, J. Lasserre, K. Vlahovicek, M. Vingron, Histone modification levels are predictive for gene expression, Proc. Natl. Acad. Sci. USA 107 (7) (2010).

[2] X. Dong, M.C. Greven, A. Kundaje, S. Djebali, J.B. Brown, C. Cheng, T.R. Gingeras, M. Gerstein, R. Guigo, E. Birney, Z. Weng, in various cellular environments.

[3] W. Zeng, Y. Wang, R. Jiang, Integrating distal and proximal information to predict gene expression via a densely connected convolutional neural network, Bioinformatics 36 (2) (2020) 496-503.

[4] M.R. Mumbach, A.J. Rubin, R.A. Flynn, C. Dai, P.A. Khavari, W.J. Greenleaf, H.Y. Chang, HiChIP: efficient and sensitive analysis of protein-directed genome architecture, Nat. Methods.

[5] R. Singh, J. Lanchantin, G. Robins, Y. Qi, DeepChrome: deep-learning for predicting gene expression from histone modifications, Bioinformatics 32 (17) (2016) i639-i648.

[6] A. Sekhon, R. Singh, Y. Qi, DeepDiff: deep-learning for predicting differential gene expression from histone modifications, Bioinformatics 34 (17) (2018) i891-i900.

[7] H. Chen, C. Li, X. Peng, Z. Zhou, J.N. Weinstein, Cancer Genome Atlas Research Network, H. Liang, A pan-cancer analysis of enhancer expression in nearly 9000 patient samples, Cell 173 (2).

[8] E. Lieberman-Aiden, N.L. van Berkum, et al., Comprehensive mapping of long-range interactions reveals folding principles of the human genome, Science 326 (5950) (2009) 289-293.

[9] M.J. Fullwood, M.H. Liu, [...], E.T. Liu, C.L. Wei, E. Cheung, Y. Ruan, An oestrogen-receptor-α-bound human chromatin interactome, Nature 462 (7269) (2009).

[10] B.R. Sabari, A. Dall'Agnese, A. Boija, I.A. Klein, E.L. Coffey, K. Shrinivas, B.J. Abraham, N.M. Hannett, A.V. Zamudio.

[11] Chen, Wang, Xuan, Chen, Zhang, De novo deciphering three-dimensional chromatin interaction and topological domains by wavelet transformation of epigenetic profiles, Nucleic Acids Res. 44 (11) (2016).

[12] Qi, Zhang, Predicting three-dimensional genome organization with chromatin states, PLoS Comput. Biol. 15 (6) (2019) e1007024.

[13] P.W. Koh, E. Pierson, A. Kundaje, Denoising genome-wide histone ChIP-seq with convolutional neural networks, Bioinformatics 33 (14) (2017) i225-i233.

[14] J. Ernst, M. Kellis, Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues, Nat. Biotechnol. 33 (4) (2015) 364-376.

[15] T.J. Durham, M.W. Libbrecht, J.J. Howbert, J. Bilmes, W.S. Noble, PREDICTD: parallel epigenomics data imputation with cloud-based tensor decomposition, Nat. Commun. 9 (1).

[16] J. Schreiber, T. Durham, J. Bilmes, W.S. Noble, Multi-scale deep tensor factorization learns a latent representation of the human epigenome, bioRxiv (2019).

[17] A. Rotem, O. Ram, N. Shoresh, R.A. Sperling, A. Goren, D.A. Weitz, B.E. Bernstein, Single-cell ChIP-seq reveals cell subpopulations defined by chromatin state, Nat. Biotechnol. 33.

[18] K. Grosselin, A. Durand, J. Marsolier, A. Poitou, E. Marangoni, F. Nemati, A. Dahmani, S. Lameiras, F. Reyal, O. Frenoy, Y. Pousse, M. Reichen, A. Woolfe, C. Brenan, A.D. Griffiths, C. Vallot, A. Gerard, High-throughput single-cell ChIP-seq identifies heterogeneity of chromatin states in breast cancer, Nat. Genet. 560.

[19] S. Ai, H. Xiong, C.C. Li, Y. Luo, Q. Shi, Y. Liu, X. Yu, C. Li, A. He, Profiling chromatin states using single-cell itChIP-seq, Nat. Cell Biol. 21.

[20] W.L. Ku, K. Nakamura, W. Gao, K. Cui, G. Hu, Q. Tang, B. Ni, K. Zhao, Single-cell chromatin immunocleavage sequencing (scChIC-seq) to profile histone modification, Nat. Methods 16 (4).

[21] S.J. Hainer, A. Boskovic, K.N. McCannell, O.J. Rando, T.G. Fazzio, Profiling of pluripotency factors in single cells and early embryos, Cell 177 (5).

[22] P.J. Skene, S. Henikoff, An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites, eLife 6 (2017).

[23] H.S. Kaya-Okur, S.J. Wu, C.A. Codomo, E.S. Pledger, T.D. Bryson, J.G. Henikoff, K. Ahmad, S. Henikoff, CUT&Tag for efficient epigenomic profiling of small samples and single cells, Nat. Commun.

[24] B. Carter, W.L. Ku, J.Y. Kang, G. Hu, J. Perrie, Q. Tang, K. Zhao, Mapping histone modifications in low cell number and single cells using antibody-guided chromatin tagmentation (ACT-seq), Nat. Commun. 10.

[25] Wang, Xiong, Ai, Yu, Liu, Zhang, He, CoBATCH for high-throughput single-cell epigenomic profiling, Mol. Cell 76 (1) (2019) 206-216.e7.

[26] A. Harada, K. Maehara, T. Handa, Y. Arimura, J. Nogami, Y. Hayashi-Takanaka, K. Shirahige, H. Kurumizaka, H. Kimura, Y. Ohkawa, A chromatin integration labelling method enables epigenomic profiling with lower input.