Volcano plots give an overview of gene expression changes. The x-axis represents log2(fold change) and the y-axis represents -log10(P value). Each dot represents a gene, and color distinguishes whether the gene is differentially expressed: orange dots are differentially expressed genes, and blue dots are genes that are not differentially expressed.
Cluster diagram
A cluster diagram measures the similarity of expression between samples or between genes.
In the cluster diagram, the x-axis represents sample clustering, and each column represents a sample. Samples are clustered by the similarity of their gene expression: the more similar the expression profiles of two samples, the closer together they are placed.
The y-axis represents gene clustering, and each row represents a gene. Genes are clustered by the similarity of their expression across samples: the more similar the expression of two genes, the closer together they are placed.
The color scale represents expression abundance: the redder the cell, the stronger the up-regulation, and the greener the cell, the stronger the down-regulation.
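As a rough illustration of the clustering described above, here is a minimal sketch in base R. The expression matrix and its sample and gene names are made up for the example, and the green-to-red palette is an assumption chosen to match the color description; real data would come from a normalized expression table.

```r
# Toy expression matrix: rows are genes, columns are samples (values are made up).
expr <- matrix(
  c(10, 11,  2,  1,
     9, 10,  1,  2,
     1,  2, 10, 11,
     2,  1, 11,  9),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("ctrl1", "ctrl2", "treat1", "treat2"))
)

# Cluster samples (columns): dist() works on rows, so transpose first.
sample_tree <- hclust(dist(t(expr)))
# Cluster genes (rows).
gene_tree <- hclust(dist(expr))

# Cut each tree into two groups to see which samples and genes fall together.
sample_groups <- cutree(sample_tree, k = 2)
gene_groups <- cutree(gene_tree, k = 2)
print(sample_groups)
print(gene_groups)

# Draw the heatmap with both dendrograms; green = low, red = high (assumed palette).
heatmap(expr,
        Rowv = as.dendrogram(gene_tree),
        Colv = as.dendrogram(sample_tree),
        scale = "row",
        col = colorRampPalette(c("green", "black", "red"))(50))
```

In this toy matrix the two control samples end up in one branch and the two treated samples in the other, which is the "similar profiles sit close together" behavior described above.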
First, the volcano plot:
A volcano plot is a figure used to show differences between groups. When an organism changes, the expression of most genes changes little or not at all, and only a few genes change significantly. Volcano plots are therefore often used to analyze RNA expression profiles and microarray data, most commonly to show differential gene expression. In recent years they have also been applied to other omics data, which I won't go into here.
A volcano plot is essentially a scatter plot built on two important concepts:
1) Significance, that is, the p value from a statistical test of the difference between the two groups of samples. Its negative logarithm, -log10(P value), is used as the y-axis;
2) Fold change: log2(fold change) is used as the x-axis. Together these give the volcano plot. Under chosen screening conditions (for example, a fold change greater than 2 and a P value less than 0.05), genes with significant differences in expression can be selected for subsequent research.
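The two transformations above can be worked through on a few numbers. This is only a sketch: the data frame, its column names (mean_ctrl, mean_trt, pvalue) and its values are hypothetical and stand in for real per-gene statistics.

```r
# Hypothetical per-gene summary: mean expression in two groups and a raw p-value.
genes <- data.frame(
  gene      = c("geneA", "geneB", "geneC", "geneD"),
  mean_ctrl = c(100, 100, 100, 100),
  mean_trt  = c(400, 25, 110, 200),
  pvalue    = c(0.001, 0.01, 0.5, 0.2)
)

# x-axis: log2 of the fold change (treatment over control).
genes$log2FC <- log2(genes$mean_trt / genes$mean_ctrl)
# y-axis: negative log10 of the p-value, so smaller p-values sit higher.
genes$neg_log10_p <- -log10(genes$pvalue)

# Example screen from the text: fold change > 2 in either direction and p < 0.05.
genes$significant <- abs(genes$log2FC) > log2(2) & genes$pvalue < 0.05
print(genes)
```

Taking the logarithm makes up- and down-regulation symmetric around zero: a fourfold increase gives log2FC = 2 and a fourfold decrease gives log2FC = -2.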
If DESeq2 is used to analyze the RNA expression data, the results table includes the column log2FoldChange, which is the log2(fold change) of expression, and the column padj, which is the adjusted p-value. These are the two columns we need to draw the volcano plot.
First, we convert the DESeq2 output to a data frame with as.data.frame() and view the first six rows with head(), as shown below:
df <- as.data.frame(res)  # res: the DESeq2 results object
head(df)
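Next, set the grouping according to padj and log2FoldChange and assign it to the column color. Genes with padj < 0.05 and |log2FoldChange| >= 2 (at least a fourfold change) are marked red if up-regulated and blue if down-regulated; all other genes are gray: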
df$color <- ifelse(df$padj < 0.05 & abs(df$log2FoldChange) >= 2, ifelse(df$log2FoldChange >= 2, "red", "blue"), "gray")
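To try the grouping rule without running DESeq2, the sketch below applies it to a small made-up data frame. The values and gene names are invented, and treating NA in padj as "not significant" is an assumption of this example (DESeq2 can report NA for padj, and a plain ifelse() would then return NA for the color).

```r
# Made-up stand-in for a DESeq2 results table; only the two columns used here.
df <- data.frame(
  log2FoldChange = c(3.1, -2.5, 0.4, 2.8, -0.2, 2.0),
  padj           = c(0.001, 0.01, 0.8, 0.2, NA, 0.03),
  row.names      = paste0("gene", 1:6)
)

# The rule described above, with NA padj values treated as not significant
# (an assumption for this sketch).
sig <- !is.na(df$padj) & df$padj < 0.05 & abs(df$log2FoldChange) >= 2
df$color <- ifelse(sig, ifelse(df$log2FoldChange >= 2, "red", "blue"), "gray")

# Count genes per group.
print(table(df$color))
```

Checking the counts with table() before plotting is a quick way to confirm that the thresholds select a sensible number of genes.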
With the groups set, you also need to specify a color for each group:
color <- c(red = "red", gray = "gray", blue = "blue")
The complete drawing code is here:
library(ggplot2)

p <- ggplot(df, aes(log2FoldChange, -log10(padj), col = color)) +
  geom_point() +
  theme_bw() +
  # map each group name to its color
  scale_color_manual(values = color) +
  labs(x = "log2 (fold change)", y = "-log10 (q-value)") +
  # significance threshold (horizontal) and fold-change thresholds (vertical)
  geom_hline(yintercept = -log10(0.05), lty = 4, col = "grey", lwd = 0.6) +
  geom_vline(xintercept = c(-2, 2), lty = 4, col = "grey", lwd = 0.6) +
  theme(legend.position = "none",
        panel.grid = element_blank(),
        axis.title = element_text(size = 16),
        axis.text = element_text(size = 14))
p
Key points to note in the code:
1) The q-value (padj) is transformed with -log10 before plotting.
2) The threshold line on the vertical axis must therefore also be drawn at -log10(0.05), not at 0.05.
3) The other plotting parameters and concepts are the same as for an ordinary scatter plot.