Molecular biology 3

Transcription level regulation of eukaryotic gene expression

There are the following differences between eukaryotic cells and prokaryotic cells in gene transcription, translation and spatial structure of DNA:

1. In eukaryotic cells, a mature mRNA chain can be translated into only one polypeptide chain, and the multigene operon that is common in prokaryotes is rare in eukaryotic cells.

2. The DNA of eukaryotic cells is bound to histones and a large number of non-histone proteins, and only a small part of the DNA is naked.

3. Most of the DNA in higher eukaryotic cells is not transcribed, and some eukaryotic cells contain DNA sequences of several or dozens of bases that are repeated hundreds or even millions of times across the genome. In addition, most eukaryotic genes contain untranslated introns.

4. Eukaryotes can rearrange DNA fragments in an orderly way according to the needs of each stage of growth and development, and can also increase the copy number of certain genes when necessary. Both are extremely rare in prokaryotes.

5. In prokaryotes, the regulatory regions of transcription are very small, and most of them lie a short distance upstream of the transcription initiation site. The binding of regulatory proteins to these sites can directly promote or inhibit the binding of RNA polymerase. In eukaryotes, the regulatory regions of gene transcription are much larger, and they may be hundreds or even thousands of base pairs away from the core promoter. Although these regulatory regions also bind proteins, they do not directly affect the accessibility of the promoter region to RNA polymerase. Instead, they affect the promoter's ability to bind RNA polymerase by changing the DNA conformation of the whole 5' upstream region of the controlled gene.

6. Eukaryotic RNA is synthesized in the nucleus and can be translated into protein only after it has been transported across the nuclear membrane into the cytoplasm. Prokaryotes have no such strict spatial separation.

7. Many eukaryotic genes can be successfully translated into protein only after complicated maturation and splicing.

The basic elements of gene transcription regulation are cis-acting elements, trans-acting factors and RNA polymerase.

Cis-acting elements are the promoters and regulatory sequences of genes, mainly promoters, enhancers and silencers. Trans-acting factors are proteins or RNAs that bind to cis-acting elements to regulate gene expression. Prokaryotes have only one RNA polymerase, whereas eukaryotes have three, each catalyzing the transcription of different RNA products. Of the three eukaryotic RNA polymerases, only RNA polymerase II transcribes messenger RNA precursors, which after processing are translated into protein products according to the triplet code. Here we mainly discuss transcription by RNA polymerase II and its regulation.

A eukaryotic gene promoter is composed of a core promoter and an upstream promoter. It is a group of functionally independent DNA sequence elements located at the transcription start site (+1) and within 100 to 200 bp of its 5' upstream region. Each element is 7 to 20 bp long, and together they are the key elements that determine the transcription start site and transcription frequency of RNA polymerase II.

1. The core promoter is the minimum DNA sequence necessary for RNA polymerase II to start transcription normally. It includes the transcription start site and the TATA region at -25 to -30 bp upstream of the start site. Acting alone, the core promoter can only determine the transcription start site and produce a basal level of transcription.

2. The upstream promoter element (UPE) includes the CAAT region (CCAAT) and the GC region (GGGCGG), which are usually located near -70 bp. It regulates the frequency of transcription initiation and improves transcription efficiency through the TFIID complex.

The transcription template includes all DNA sequences from the transcription initiation site to the site where RNA polymerase II terminates transcription.

RNA polymerase II is a protein that can bind, directly or indirectly, to the TATA region of the core promoter sequence and start transcription. It forms the transcription initiation complex with the help of transcription factors. RNA polymerase II consists of at least 10 to 12 subunits with relative molecular masses ranging from 1 × 10^4 to 2.4 × 10^5, and some of these subunits are shared with polymerases I and III.

4. Protein factors required for basal transcription by RNA polymerase II (denoted "TFII"). Under physiological conditions, these factors form a transcription initiation complex. TFIID, TFIIB and TFIIF can form a primary complex with RNA polymerase II on the promoter and begin to transcribe mRNA. After TFIIE and TFIIH are added, a complete transcription complex forms and long-chain RNA can be transcribed. Adding TFIIA further improves transcription efficiency. When RNA polymerase II slides along the template, TFIID and TFIIA stay at the transcription start site, while the other factors move with the polymerase toward the 3' end of the template DNA. There are currently two hypotheses about how RNA polymerase II is incorporated into the transcription initiation complex: one holds that it joins in a single step, the other that it joins stepwise.
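The promoter elements described above (TATA region, CAAT region and GC region) can be located in a sequence with a simple motif scan. The following is only a minimal sketch under stated assumptions: the function name and the input convention are hypothetical, "TATAAA" is an assumed simplified consensus for the TATA region, and real promoters tolerate variants that an exact-match search will miss.

```python
import re

# Motifs for the elements named in the text. "TATAAA" is an assumed,
# simplified TATA consensus; CCAAT and GGGCGG are quoted from the text.
MOTIFS = {
    "TATA": "TATAAA",
    "CAAT": "CCAAT",
    "GC": "GGGCGG",
}


def find_promoter_elements(upstream: str) -> list[tuple[str, int]]:
    """Return (element, position) pairs sorted by position.

    `upstream` is the sense-strand sequence immediately 5' of the
    transcription start site, so its last base is at position -1.
    Positions are reported relative to the start site (+1).
    """
    seq = upstream.upper()
    hits = []
    for name, motif in MOTIFS.items():
        # A lookahead lets overlapping occurrences be reported too.
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((name, match.start() - len(seq)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical 100 bp upstream region with the three elements placed
# near the positions given in the text ("N" is filler).
example = "N" * 25 + "CCAAT" + "N" * 10 + "GGGCGG" + "N" * 24 + "TATAAA" + "N" * 24
print(find_promoter_elements(example))
```

A scan like this only reports candidate sites; whether a site is functional depends on the trans-acting factors that bind it.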

An enhancer is a DNA sequence that can significantly increase the transcription frequency of the gene linked to it.

Eukaryotic promoters and enhancers are composed of several DNA sequence elements. Because these elements are usually linked to specific functional genes, they are called cis-acting elements. They constitute the regulatory region of gene transcription and affect gene expression. Transcription regulation requires trans-acting factors in addition to the regulatory region. By function, trans-acting factors are usually divided into three categories: basal transcription factors, which recognize promoter elements; transcriptional regulatory factors, which recognize enhancers or silencers; and co-regulators, which take part in transcriptional regulation without DNA-protein interaction.

In practice, the first two types of trans-acting factors are often referred to as transcription factors (TF), which include transcriptional activators and transcriptional repressors. These regulatory proteins recognize and bind sequences upstream of the transcription initiation site or distant enhancer elements, regulate transcription activity through DNA-protein interaction, and determine the temporal and spatial specificity of the expression of different genes.

A co-regulator itself has no DNA-binding activity. It mainly influences the molecular conformation of transcription factors through protein-protein interaction, and thereby regulates transcription activity. Co-regulators that act synergistically with transcriptional activators are often called co-activators.

All transcriptional activators can recognize a target site (promoter or enhancer), and the specificity for the target site is determined by the specific sequence of the DNA-binding domain. The DNA-binding domain binds to a specific sequence and thereby brings the transcriptional activation domain of the activator close to the basal transcription machinery.

In plant research, these principles are used to construct tagged, inducible transformation vectors. First, a special promoter is designed to ensure constitutive expression of a fusion transcription factor (containing a DNA-binding domain, a transcription activation domain and a receptor regulatory domain). When a small chemical molecule is added to the experiment, it binds to the receptor regulatory domain, which changes the conformation of the fusion transcription factor and causes it to move from the cytoplasm into the nucleus. There the DNA-binding domain of the fusion protein specifically recognizes and binds its target DNA sequence, and the transcription activation domain drives high-level expression of the associated genes. (Simply put, this is a transcription factor with three domains: one that binds DNA, one that binds RNA polymerase, and one that binds a chemical regulator. Adding the small chemical molecule activates the transcription factor, which in turn activates gene expression.)

Examples of transcription factors include TFIID, which binds the TATA region; CTF, which binds the CAAT region; SP1, which binds the GGGCGG region; and HSF, which binds the promoter region of heat shock protein genes. Each cell is known to contain about 60,000 SP1 molecules, and the content of CTF is as high as 300,000 per cell.

Trans-acting factors are proteins that directly or indirectly recognize or bind the core sequences of various cis-acting elements and take part in regulating the transcription efficiency of target genes. Their common functional domains include the basic amino acid DNA-binding domain, the acidic activation domain, the glutamine (Q)-rich domain and the proline (P)-rich domain. Ligand-regulated receptors usually have both a DNA-binding domain and a transcription activation domain. Steroid receptors are usually themselves transcription factors, with a conserved DNA-binding domain toward the N-terminus and a hormone-binding domain at the C-terminus.

1. Helix-turn-helix (HTH) structure. Proteins with this DNA-binding structure have at least two α helices, with short-side-chain amino acid residues forming a "turn" between them. Substitution of amino acid residues in the α helix nearer the carboxyl terminus affects the binding of the protein in the major groove of the DNA double helix. Proteins encoded by homeobox genes, such as the yeast mating-type MAT locus and the regulatory genes that control segment development in Drosophila (antp, ftz, ubx), all have the HTH structure. When a homeodomain protein interacts with DNA, the first and second helices tend to lie on the outside, while the third helix binds in a DNA groove; the extended arm at its N-terminus also contacts a DNA groove.

The homeobox is a DNA fragment encoding a conserved sequence of 60 amino acids (the homeodomain), and it is widely found in eukaryotic genomes. It is named for the Drosophila homeotic loci from which it was first cloned (the gene products of these loci determine body development). Homeotic genes are closely related to the growth, development and differentiation of organisms. Many genes containing a homeobox have transcriptional regulation functions, and the amino acid sequence of the homeodomain probably takes part in forming the DNA-binding region. For example, the DNA sequences recognized by the homeodomain-containing transcription factors Oct-1 and Oct-2 differ by only one base from the core sequence recognized by Pit-1/GHF-1, while the Drosophila en, ftz and ubx gene products recognize exactly the same DNA sequence. The eve gene product recognizes not only that same sequence but also another target sequence.

Oct-1 and Oct-2 bind specifically to an 8-base region in the promoter, and both contain a POU region of 75 amino acids and a homeodomain of 60 amino acids. Although the homeodomain in Oct-1 and Oct-2 differs considerably from the classical Drosophila homeodomain (only 20 of 60 amino acids are identical, plus 8 conservative substitutions), the region is highly conserved between these two proteins (53 of 60 amino acids are identical).

2. Zinc finger structure. Proteins of the zinc finger family can be roughly divided into zinc finger, zinc twist and zinc cluster structures. The number of amino acid residues between the cysteine and histidine residues is essentially constant, and the structure has transcription regulation activity only when zinc is present. In the repeated zinc finger structure, an α helix and an antiparallel β sheet are held together at their base around a zinc atom, which forms coordination bonds with a pair of cysteines and a pair of histidines. Lysine and arginine residues protruding from the zinc finger loop take part in DNA binding.

Each α helix in a zinc finger structure can specifically recognize 3 to 4 bases. By combining the ability of different zinc fingers to recognize specific DNA sequences with the ability of a nuclease to cut target DNA, researchers have obtained a new class of engineered endonucleases called zinc-finger nucleases (ZFN). Because different DNA sequences can be recognized by changing the seven X residues in the general zinc finger sequence, α helices that recognize chosen DNA sequences are designed artificially, with TGEK used as the linker between fingers. A pair of such artificial zinc finger domains, each fused to Fok I (the ZFN), can then cut double-stranded DNA in a designated region.
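The constant spacing between the cysteine and histidine residues means that candidate zinc fingers can be found in a protein sequence by pattern matching. The sketch below is only illustrative. It assumes the widely cited C2H2 spacing C-X(2-4)-C-X(12)-H-X(3-5)-H, which is not stated in the text above, and the function name and example peptide are hypothetical.

```python
import re

# Assumed C2H2 consensus spacing: C-X(2-4)-C-X(12)-H-X(3-5)-H.
# The two Cys and two His are the residues that coordinate zinc.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_fingers(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each candidate finger."""
    seq = protein.upper()
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(seq)]


# Hypothetical single-finger peptide built to fit the assumed spacing.
print(find_c2h2_fingers("YKCPECGKSFSQKSNLQKHQRTH"))
```

Replacing either cysteine or histidine in the example removes the match, which mirrors the statement that the structure is active only when zinc can be coordinated. A match is a candidate only; it does not show that the segment folds or binds DNA.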

3. Basic leucine zipper, that is, bZIP structure. A large class of C/EBP family proteins is found in liver, small intestinal epithelium, fat cells and some brain cells, and is characterized by the ability to bind the CCAAT region and viral enhancers. The 35 amino acid residues at the carboxyl end of C/EBP family proteins can form an α helix in which leucine occurs at every seventh position (six residues lie between successive leucines), so that the leucine residues all appear on the same side of the helix.

4. Basic helix-loop-helix, that is, bHLH structure. In E12 and E47, the enhancer-binding proteins of the immunoglobulin κ light chain gene, 1~288 amino acid residues at the carboxyl end can form two amphipathic α helices separated by a non-helical loop.
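The heptad spacing described for the leucine zipper above can be checked directly on a sequence. An α helix has about 3.6 residues per turn, so residues seven positions apart are separated by almost exactly two turns and fall on nearly the same face of the helix. The following is a minimal sketch: the function name, the default of four repeats and the test peptide are assumptions for illustration, not taken from the text.

```python
def leucine_heptad_runs(seq: str, min_repeats: int = 4) -> list[int]:
    """Return 0-based start indices where Leu recurs at every 7th residue.

    A start index i is reported when positions i, i+7, i+14, ... are all
    leucine for `min_repeats` consecutive heptads.
    """
    if min_repeats < 1:
        raise ValueError("min_repeats must be at least 1")
    seq = seq.upper()
    span = 7 * (min_repeats - 1)  # distance from first to last leucine
    starts = []
    for i in range(len(seq) - span):
        if all(seq[i + 7 * k] == "L" for k in range(min_repeats)):
            starts.append(i)
    return starts


# Hypothetical helix: leucine followed by six other residues, four times.
print(leucine_heptad_runs("LAAAAAA" * 4))
```

A sequence with leucine at every sixth position instead would return an empty list, because those leucines would rotate around the helix rather than line up on one face.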

Transcription activation domain

The function of trans-acting factors is complicated by regulation through protein-protein interaction. Complete transcription regulation is usually carried out by a complex, which means that not every transcription factor binds DNA directly. The presence of a transcriptional activation domain is therefore the only structural feature common to trans-acting factors. Trans-acting factors have diverse functions, and their transcriptional activation domains usually depend on 30 to 100 amino acid residues outside the DNA-binding domain. Different transcriptional activation domains generally have the following characteristic structures:

1. Negatively charged helical structure

2. Glutamine-rich structure

3. Proline-rich structure

Based on the interaction between cis-acting elements and trans-acting factors, researchers have developed many technologies for regulating gene expression. The latest application is TALEN (TALEN = transcription activator-like effector + FokI nuclease fusion protein). It exploits the fact that the 12th and 13th amino acids in each 34-amino-acid repeat of the transcription activator-like effector specifically recognize a DNA base. TALE repeats are assembled in tandem to build a protein that recognizes the target base sequence. When this protein is expressed as a fusion with the endonuclease FokI, it cuts 9 to 13 bp downstream of the specific recognition sequence and thereby knocks out the specified endogenous gene.
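Because each 34-amino-acid repeat reads one base through its 12th and 13th residues, designing a TALE array for a target sequence amounts to a lookup, one repeat per base. The sketch below assumes the commonly cited pairings of those two residues with bases (NI for A, HD for C, NN for G, NG for T), which are not given in the text above. The function names are hypothetical, and the sketch ignores practical details of real TALEN design.

```python
# Commonly cited pairings between residues 12 and 13 of a TALE repeat
# and the base it recognizes (an assumption, not stated in the text).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_tale_repeats(target: str) -> list[str]:
    """Return the residue-12/13 pair for each base of `target`, in order."""
    target = target.upper()
    unknown = set(target) - set(RVD_FOR_BASE)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [RVD_FOR_BASE[base] for base in target]


def repeat_array_length(target: str, repeat_len: int = 34) -> int:
    """Amino acids in the tandem array, at one 34-residue repeat per base."""
    return len(target) * repeat_len


print(design_tale_repeats("ACGT"))
print(repeat_array_length("ACGT"))
```

A TALEN pair would use two such arrays flanking the cut site, each fused to FokI.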

In eukaryotes, the structural adjustment at chromatin level before transcription is called epigenetic regulation of gene expression. It mainly includes DNA modification (DNA methylation) and histone modification (