First, the composition of the data chart
1. Composition of a data chart
Take a data chart apart and you will find that it is assembled from many small, predefined components, each with its own name and purpose: the title, axes, legend, labels, tooltip, and graphics. In practice, some of these components are modified or trimmed to suit the scenario. This reduces redundant information, so that readers can reach their goal quickly with the most suitable data-ink ratio and get the most information in the least _ _ _ _.
Title: states the theme of the chart (including the main title and subtitle).
Label: annotates the information carried by the current data.
Axis: defines where each data point projects in the plane coordinate system.
Legend: the key that explains the graphics.
Tooltip: on tap or hover, shows the detailed data for that point in an interactive pop-up.
Graphic: the chart's visual channels rendered as marks.
Below I will go through these components one by one so that you can use them effectively. First, though, let's cover one concept, the data-ink ratio, which makes the rest easier to follow.
2. Data-ink ratio
The data-ink ratio is a concept introduced in 1983 by Edward Tufte, a pioneer of data visualization, in The Visual Display of Quantitative Information: most of the ink in a chart should be used to present data, and the ink should change when the data changes. He defined the data-ink ratio as the amount of ink used to display data divided by the total ink used to print the chart. Data ink is the core of a chart that cannot be erased. For example, you can remove the legend, the vertical axis, or the gridlines without seriously hurting a reader's ability to extract information from the chart; but if you remove the core data elements, such as the bars of a column chart or the slices of a pie chart, the chart loses its content.
Personally, I prefer to think of it as a signal-to-noise ratio, signal / (signal + noise), because a visualization conveys not only data but also business insight, and opinions and conclusions in particular usually have to be presented as text. Whichever term you use, the goal is the same: highlight the information and remove the noise that distracts from it.
Therefore, the larger the share of data ink in a chart, the less redundant information it carries and the more effectively it communicates. When building charts, our goal should be to maximize the data-ink ratio within reasonable limits.
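To make the ratio concrete, here is a minimal sketch that treats each chart element as having an "ink amount" and a flag saying whether it is data ink. The ink numbers are hypothetical, since real ink is hard to measure; the point is only to show how erasing non-data elements raises the ratio. The same arithmetic covers the signal-to-noise view if data ink is the signal and everything else is the noise.

```python
def data_ink_ratio(elements: dict[str, tuple[float, bool]]) -> float:
    """Return data ink / total ink.

    `elements` maps an element name to (ink_amount, is_data_ink).
    Read as signal / (signal + noise), this is the same formula.
    """
    total = sum(ink for ink, _ in elements.values())
    if total == 0:
        raise ValueError("chart has no ink")
    data = sum(ink for ink, is_data in elements.values() if is_data)
    return data / total


# Hypothetical ink amounts in arbitrary units.
chart = {
    "bars": (60.0, True),         # data ink: erasing it loses the content
    "gridlines": (15.0, False),   # non-data ink
    "border": (10.0, False),
    "background": (15.0, False),
}
print(data_ink_ratio(chart))  # 0.6

# Erase the border and background fill: the ratio rises.
trimmed = {k: v for k, v in chart.items() if k not in ("border", "background")}
print(data_ink_ratio(trimmed))  # 0.8
```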
Second, a detailed look at chart elements
1. Title
A clear, accurate title lets readers understand immediately what the chart is meant to express. A chart's title is usually determined by what the chart needs to say, and many people find naming it difficult. When the chart's conclusion is single and unambiguous, consider putting a summarizing conclusion into the title. This reduces the chance that readers misunderstand your intent and keeps their attention on the data you want to emphasize.
2. Axes
2.1 Definition
An axis is a positioning system that lets every data point find its projected position in the coordinate space; the concept leans toward mathematics and physics. In other words, an axis puts the visualized data on a common standard and then calibrates and measures it like a ruler. In data visualization, axes usually appear in Cartesian (rectangular) coordinates and in polar coordinates. Breaking the axis down into its "molecular" parts gives the following elements: the axis line, tick marks, axis labels, the axis title (with unit), and gridlines.
2.2 Classification
Depending on whether the mapped variable is continuous or discrete, axes fall into three types: category axes, time axes, and continuous (value) axes.
2.3 Usage recommendations
2.3.1 Axis line
Whether to show the axis line depends mainly on the data-ink ratio discussed above. When gridlines are present, column charts and line charts usually hide the Y-axis line, while bar charts hide the X-axis line. This reduces noise and keeps the visual focus on the data.
2.3.2 Tick marks
Tick marks are short lines on the axis line that mark the positions of values. There are three styles: inside, cross (straddling the axis line), and outside. Ticks should be placed outside, on the same side as the value labels; cross and inside ticks are not recommended.
Tick marks strengthen the link between a mark and its value, helping readers match data points to values quickly. Category axes appear mostly in column and bar charts, where each bar already corresponds naturally to its category, so tick marks are usually hidden on category axes.
2.3.3 Gridlines
Gridlines help readers relate marks to values. They improve the readability of the data in two ways: first, they extend the tick marks across the plot, making values easier to read; second, they help readers relate marks in the middle of the plot to one another, which aids comparison.
Gridlines generally extend from the value axis: a column chart uses horizontal gridlines, and a bar chart uses vertical gridlines. Keep a clear hierarchy when using gridlines: the axis line is primary and gridlines are secondary, so gridlines can be drawn as dashed or dotted lines. To keep them from looking too heavy, avoid pure black or pure white, and never let them compete visually with the data.
2.3.4 Axis title
The axis title (with its unit) explains what the data on the domain (category) axis and the range (value) axis mean. When other parts of the chart (title, legend, axis labels, etc.) already convey the meaning of the data fully, then by Occam's razor the axis title can be omitted, further raising the data-ink ratio and reducing interface elements.
2.3.5 Axis labels
Axis label design is complex and involves many details. Here we focus on the X and Y axes of rectangular coordinates.
X-axis labels
The key to X-axis label design is the display strategy. In chart design we often run into labels that are too long; when space is limited, they overlap. Below are solutions for each type of axis.
A. Continuous and time axis labels
On continuous and time axes, we can sample the labels (show only every few) to avoid overlap. Rotating labels to reduce their width is not recommended here. First, aesthetically, rotation tends to break the overall harmony of the page. Second, a continuous or time axis does not need to show every label: by the Gestalt principle of continuity, even when not every label is shown, readers fill in the missing ones mentally and still perceive the axis as continuous.
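A minimal sketch of label sampling, assuming a fixed average character width and a minimum gap between labels (both hypothetical defaults, not real font metrics): work out how much room the widest label needs, compare it with the space each label gets, and keep every k-th label.

```python
import math


def sample_time_labels(labels, axis_px, char_px=7, gap_px=8):
    """Keep every k-th label so labels on a time/continuous axis don't overlap."""
    if not labels:
        return []
    slot_px = axis_px / len(labels)  # horizontal space available per label
    needed_px = max(len(s) for s in labels) * char_px + gap_px
    step = max(1, math.ceil(needed_px / slot_px))
    return labels[::step]


months = [f"2020-{m:02d}" for m in range(1, 13)]
print(sample_time_labels(months, axis_px=360))
# ['2020-01', '2020-03', '2020-05', '2020-07', '2020-09', '2020-11']
```

On a 720 px axis the same call keeps all twelve labels, because each label then has enough room.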
B. Category axis labels
On a category axis, the labels have no logical sequence, so readers cannot infer a hidden label from its neighbors. Sampling the labels and hiding some of them therefore makes it harder to read the chart, which is undesirable. For category axes, it is better to rotate the labels or switch to another chart type (such as a horizontal bar chart) to reduce the width the labels need.
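The decision for category axes can be sketched as a simple rule that never drops a category. The thresholds below are hypothetical: keep labels horizontal if the widest one fits its slot, rotate if it is at most about twice too wide, and otherwise switch to a horizontal bar chart, where each label gets a full row.

```python
def category_label_strategy(labels, axis_px, char_px=7, gap_px=8, max_rotate_ratio=2.0):
    """Choose 'horizontal', 'rotate' or 'bar chart' for category-axis labels.

    char_px, gap_px and max_rotate_ratio are illustrative assumptions.
    """
    if not labels:
        raise ValueError("need at least one label")
    slot_px = axis_px / len(labels)
    needed_px = max(len(s) for s in labels) * char_px + gap_px
    ratio = needed_px / slot_px  # > 1 means the widest label overflows its slot
    if ratio <= 1:
        return "horizontal"
    if ratio <= max_rotate_ratio:
        return "rotate"
    return "bar chart"


cities = ["Beijing", "Shanghai", "Guangzhou", "Shenzhen", "Nanjing"]
print(category_label_strategy(cities, axis_px=400))  # horizontal
print(category_label_strategy(cities, axis_px=250))  # rotate
print(category_label_strategy(cities, axis_px=150))  # bar chart
```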
Y-axis labels
The key points of Y-axis label design are the number of labels, their value range, and their number format. The label area is usually sized to the width of the widest label. If the magnitude of the data is known in advance, hard-coding a fixed width saves the chart from measuring labels and speeds up rendering.
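A small sketch of the two options, assuming a fixed average character width and padding (hypothetical values, not real text metrics): either return a caller-supplied fixed width and skip measuring, or size the area to the longest label.

```python
def y_label_area_width(labels, char_px=7, padding_px=8, fixed_px=None):
    """Width in pixels reserved for Y-axis labels."""
    if fixed_px is not None:
        return fixed_px  # magnitude known in advance: no measuring needed
    return max(len(s) for s in labels) * char_px + padding_px


print(y_label_area_width(["0", "600", "1200"]))  # 36
print(y_label_area_width(["0", "600", "1200"], fixed_px=40))  # 40
```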
A. Number of axis labels
Avoid too many labels. More labels inevitably mean more horizontal gridlines, which crowd the chart and weaken the message of the graphic. Following the 7 ± 2 rule, try to keep the number of Y-axis labels within that range (five to nine).
B. Value range of axis labels
Generally, Y-axis labels should start from a zero baseline so that values are represented accurately. A truncated axis can mislead readers into wrong judgments. For example, data that actually fluctuates only slightly can look like "a bumper harvest is in sight" once the upper and lower limits are tightened and the scale is stretched.
So never take a chart at face value. When reading the trend in a column chart, always check where the value axis starts and ends, in case it has been truncated to exaggerate the result.
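To see why a truncated value axis misleads, here is a sketch that maps values to bar lengths on a linear axis. With a zero baseline, bar lengths are proportional to the values; move the baseline up and the same data looks far more dramatic. The sales figures are hypothetical.

```python
def bar_lengths(values, axis_min, axis_max, plot_px):
    """Map values to bar lengths in pixels on a linear value axis.

    Bars grow from axis_min, so lengths are proportional to the
    values only when axis_min is 0.
    """
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return [(v - axis_min) / (axis_max - axis_min) * plot_px for v in values]


sales = [50, 100]  # hypothetical: the second value is twice the first
full = bar_lengths(sales, 0, 200, 300)
cut = bar_lengths(sales, 40, 200, 300)
print(full[1] / full[0])  # 2.0: bar ratio matches the data ratio
print(cut[1] / cut[0])    # 6.0: the truncated axis exaggerates the gap
```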
Still, every practice exists for a reason, so I won't criticize that approach further. Here I will describe my usual approach: compute the Y-axis upper limit (the maximum) dynamically from the data. For example, if the largest value in a series is 1,190, the top axis label is 1,200; if the largest value is 1,210, the top axis label is 1,400. The values 1,200 and 1,400 depend on the number of segments on the axis.
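One way to reproduce the 1,190 → 1,200 and 1,210 → 1,400 examples is a "nice number" step rule. This is an assumption, not necessarily the author's exact rule: split the data maximum into about six segments, round that step to the nearest 1, 2, or 5 times a power of ten, then round the maximum up to a multiple of the step. The axis starts at zero, and in both examples the label count (7 and 8) stays within the 7 ± 2 range.

```python
import math


def nice_y_axis(data_max, target_segments=6):
    """Return (axis_max, step, ticks) for a zero-based Y axis."""
    if data_max <= 0:
        raise ValueError("data_max must be positive")
    raw_step = data_max / target_segments
    magnitude = 10 ** math.floor(math.log10(raw_step))
    # Round the raw step to the nearest "nice" value: 1, 2, 5 or 10 x 10^k.
    candidates = [m * magnitude for m in (1, 2, 5, 10)]
    step = min(candidates, key=lambda c: abs(c - raw_step))
    axis_max = math.ceil(data_max / step) * step
    ticks = [i * step for i in range(round(axis_max / step) + 1)]
    return axis_max, step, ticks


print(nice_y_axis(1190)[:2])  # (1200, 200)
print(nice_y_axis(1210)[:2])  # (1400, 200)
```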
Others suggest a different way to set the Y-axis range: in line charts, choose the range so that the graphic occupies about 2/3 of the plot area; in column charts, keep the tallest column at about 85% of the chart height.
I find this approach too contrived and its rules too fine-grained. Admittedly, charts made this way look good, but the rule cannot fit all real data: real data varies, with some values much larger and others much smaller, so a fixed proportion becomes a constraint. To make sure readers get accurate information from the chart, accuracy should not be sacrificed for aesthetics. I therefore do not recommend this approach.
C. Number format of axis labels
Here are some easily overlooked design points about the format of Y-axis labels. First, keep the number of decimal places consistent across labels; if the labels are whole numbers, there is no need to show decimals.
Second, when the Y-axis labels include both positive and negative values, the "-" sign on negative numbers throws the labels out of alignment, which is especially noticeable on the right-hand Y axis of a dual-axis chart. I suggest prefixing positive values with "+" so that the labels stay visually balanced.
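Both formatting rules fit in one helper. This sketch assumes the labels are given as Python numbers: it picks the fewest decimals that represent every value exactly (none for whole numbers), and when any value is negative it adds an explicit "+" to positive values, leaving zero unsigned. It uses only the built-in format specification.

```python
def format_y_labels(values, max_decimals=6):
    """Format Y-axis labels with a shared decimal count and balanced signs."""
    decimals = next(
        (d for d in range(max_decimals + 1) if all(round(v, d) == v for v in values)),
        max_decimals,
    )
    signed = any(v < 0 for v in values)  # only add "+" when negatives appear
    labels = []
    for v in values:
        if signed and v > 0:
            labels.append(f"{v:+.{decimals}f}")
        else:
            labels.append(f"{v:.{decimals}f}")
    return labels


print(format_y_labels([-1.5, 0, 2, 3.25]))  # ['-1.50', '0.00', '+2.00', '+3.25']
print(format_y_labels([0, 500, 1000]))      # ['0', '500', '1000']
```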
3. Legend
3.1 Definition
A legend is the key to a chart's graphics and is an auxiliary element of the chart. It gives readers a reference for telling apart the categories of the visualized data. It consists of graphic symbols and text.
3.2 Types
Depending on the type of the underlying data, legends are continuous or categorical; depending on the scenario, a legend can be static or interactive.
4. Labels
4.1 Definition
In a data chart, a label annotates the current data. It can include data points, leader lines, text values, and other elements; which ones to use depends on the chart type.
5. Tooltips
5.1 Definition
A tooltip is the part of a chart that, on tap or hover, interactively reveals the details of the data at that point, helping users explore the data in more depth. It usually consists of three elements: a visual marker, a text label, and the value.
5.2 Types
Tooltips can be presented in four ways, depending on the chart type: floating, fixed, fixed to the axis, and fixed to the graphic.
6. Graphics
6.1 Definition
People take in information from graphics more efficiently than from text; arguably we have entered an age of reading images. Graphics are how a chart's visual channels appear on its marks. They are the essential element of a data chart, carrying the information behind the data. Following the idea of atomic composition, today's many kinds of charts can be broken down into six basic forms: line, area, scatter, bubble, pie/donut, and column/bar.
Third, choosing the right chart
In data analysis, as the saying goes, "a picture is worth a thousand words." Charts are an important way to present data, and choosing the right one helps us convey information more quickly and intuitively.
So how do you choose the right chart? I think it breaks down into roughly three steps:
1. Clarify the message: decide the key information the chart should convey;
2. Identify the relationship: determine the type of relationship in the data (such as share, magnitude, or trend);
3. Choose the chart type: select a matching chart (such as a pie chart, column chart, or line chart).
Many people are at a loss when choosing a chart type, but you only need to remember one sentence: it is not the data that determines the chart, but the message you want to convey.
1. Clarify the message
The same data tells different stories depending on the angle you view it from. Consider the following dataset:
Looking at May's data from a different angle, you might focus on each product's share of total sales. The message of your chart would then be "In May, product A accounted for the largest share of the company's total sales."
In short, the first and most important step in choosing a suitable chart is deciding the key message the chart should convey.
2. Identify the relationship
In practice, all kinds of data need to be charted, but classified by the relationships within the data, there are only a few categories. Here are some examples:
"Total sales are expected to grow over the next 10 years" is a trend;
"Most employees earn between 30,000 and 35,000 dollars" is a distribution;
"A higher gasoline grade does not necessarily mean a higher price" is a correlation;
"Sales in the six regions were roughly equal in September" is a comparison;
"The marketing manager spends only 15% of his time on his own area of work" is a composition (share).
3. Choose the chart type
Andrew Abela, a well-known chart expert, compiled a chart-selection guide (shown below). He divides the relationships in data into four categories to help us choose the right chart.
In practice, however, given my own work experience and the day-to-day analysis scenarios in companies, some of the charts in that guide are rarely used. So I consulted other diagrams, summarized them, filled in some missing chart types, and reorganized the result. Overall, I think it fits business charting better and is more practical, and you may find it a useful reference.
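As a rough illustration, the "relationship first, chart second" workflow can be written as a lookup table. The mapping is a hypothetical simplification drawn from the chart types covered in this article plus common practice (histograms for distributions, scatter and bubble charts for correlations); it is not a reproduction of Abela's chart.

```python
# Hypothetical mapping from relationship type to candidate charts.
CHARTS_BY_RELATIONSHIP = {
    "trend": ["line chart", "area chart"],
    "comparison": ["column chart", "bar chart", "grouped column chart"],
    "composition": ["pie chart", "donut chart", "stacked column chart"],
    "distribution": ["histogram"],
    "correlation": ["scatter plot", "bubble chart"],
}


def suggest_charts(relationship: str) -> list[str]:
    """Return candidate chart types for a relationship, most common first."""
    key = relationship.strip().lower()
    if key not in CHARTS_BY_RELATIONSHIP:
        raise KeyError(f"unknown relationship: {relationship!r}")
    return CHARTS_BY_RELATIONSHIP[key]


print(suggest_charts("Trend"))  # ['line chart', 'area chart']
```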
Fourth, common chart types
1. Line chart
1.1 Definition
A line chart shows how continuous data changes over time, across space, or across ordered categories through the rise and fall of a line. It is commonly used to analyze trends over time.
1.2 Use cases
A line chart is suitable when the X axis is continuous (such as time) and the emphasis is on trend analysis and forecasting.
For example, suppose I want to look at product sales in the first half of 2020 to analyze the market. Monthly sales are connected: they are values of the same quantity at successive points in time, so a line chart can join them.
However, if we want to look at sales in Beijing, Shanghai, Guangzhou, Shenzhen, and Nanjing in the first half of 2020, the cities' sales are not connected to one another, so we should use a column chart rather than casually substituting a line chart.
2. Area chart
2.1 Definition
An area chart shows how values change along an ordered variable; its principle is similar to that of a line chart. It essentially adds a filled area beneath the line, and that area can convey a sense of "accumulation" (when the X axis is continuous).
2.2 Use cases
An area chart suits cases where you care about both the trend and the total value.
For example, to track daily product sales in June 2020 while also comparing monthly totals, an area chart is a good choice. When the X variable is not ordered, however, an area chart is not appropriate.
3. Stacked area chart
3.1 Definition
Like the area chart, a stacked area chart builds on the line chart by filling the area between each line and the axis.
The difference is that a stacked area chart contains several data series stacked on top of one another: each series starts from the top of the series below it.
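The stacking rule can be sketched directly: each series' bottom edge is the running total of the series below it at the same x position, and its top edge adds its own values. The input is a hypothetical list of equal-length series.

```python
def stack_series(series):
    """Return a (bottom, top) pair of lists for each series in a stacked area chart."""
    if not series:
        return []
    n = len(series[0])
    bottom = [0] * n  # the first series starts at the zero baseline
    layers = []
    for values in series:
        if len(values) != n:
            raise ValueError("all series need the same number of points")
        top = [b + v for b, v in zip(bottom, values)]
        layers.append((bottom, top))
        bottom = top  # the next series starts where this one ends
    return layers


print(stack_series([[1, 2], [3, 4], [5, 6]]))
# [([0, 0], [1, 2]), ([1, 2], [4, 6]), ([4, 6], [9, 12])]
```

The top edge of the last layer is the overall total, which is why the chart shows both the overall trend and each part.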
3.2 Use cases
A stacked area chart is suitable for observing how several variables change: it shows both the overall trend and the composition of each part.
4. Column chart
4.1 Definition
A column chart uses rectangular bars to compare values across categories. The lengths of the vertical or horizontal bars are compared to judge magnitude: one axis shows the categories being compared, and the other shows the corresponding values.
In a column chart, each value of the category variable is drawn as a rectangle (a "bar"), and the value determines the bar's length. In a vertical column chart, the bars stand upright.
When the bars run horizontally, the chart is called a bar chart.
4.2 Use cases
Column charts suit categorical data best, especially when values are close, because the human eye judges length more accurately than other visual channels (such as area or angle).
As shown in the figure below, the five groups have very close values. A pie chart cannot show the differences, while the column chart on the right conveys them much better.
5. Stacked column chart
5.1 Definition
A stacked column chart is used to break a whole into parts and compare those parts.
It extends the column chart: where a regular column chart places data side by side, a stacked column chart stacks it cumulatively. It shows the total for each category along with the size and share of each subcategory within it; subcategories are usually distinguished by color.
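Here is a minimal sketch of what a stacked column encodes: the total height of each column and the share each colored segment takes of it. The input format (category → {subcategory: value}) and the sales figures are hypothetical.

```python
def stacked_totals_and_shares(data):
    """For each category, return (total, {subcategory: share of total})."""
    result = {}
    for category, parts in data.items():
        total = sum(parts.values())  # height of the whole stacked column
        shares = {k: (v / total if total else 0.0) for k, v in parts.items()}
        result[category] = (total, shares)
    return result


sales = {  # hypothetical units sold per city and product
    "Beijing": {"cream": 30, "serum": 10},
    "Shanghai": {"cream": 20, "serum": 20},
}
print(stacked_totals_and_shares(sales)["Beijing"])
# (40, {'cream': 0.75, 'serum': 0.25})
```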
5.2 Use cases
Use it to compare totals across categories while also comparing the composition and size of the subcategories within each category.
For example, the figure below shows the sales of each skin care product in each city. With the stacked chart, we can clearly see in which city a given product sells better.
6. Grouped column chart
6.1 Definition
A grouped column chart, also called a clustered column chart, works like a column chart: bar lengths encode and compare values. Bars within each group are distinguished by different colors or by different transparencies of the same color, and a gap must be kept between groups.
6.2 Use cases
Use it to compare the same series across different groups and different series within the same group. Keep the number of groups to no more than 12 and the number of series in each group to no more than 6.
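Laying out a grouped column chart comes down to offsetting each bar from its group's center. In this sketch, group centers sit at x = 0, 1, 2, and so on; bars in a group touch; and the remaining fraction of each unit (1 − group_width, a hypothetical default) forms the gap between groups. The limits of 12 groups and 6 series follow the guideline in the text.

```python
def grouped_bar_layout(n_groups, n_series, group_width=0.8):
    """Return (bar_width, offsets) relative to each group's centre."""
    if n_series < 1:
        raise ValueError("need at least one series")
    if n_groups > 12 or n_series > 6:
        raise ValueError("too many groups or series for a readable chart")
    bar_width = group_width / n_series
    # Centre the bars around the group position.
    offsets = [(i - (n_series - 1) / 2) * bar_width for i in range(n_series)]
    return bar_width, offsets


print(grouped_bar_layout(4, 2))  # (0.4, [-0.2, 0.2])
```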
7. Bidirectional column chart
7.1 Definition
A bidirectional column chart compares values across categories using bars that extend in the positive and negative directions. The category axis shows the categories being compared, and the continuous axis shows the corresponding values. There are two cases: in the first, the scales on the two sides are mirror images and both show positive values; in the second, the scale values on one side are the negatives of those on the other.
Bidirectional column charts can be vertical or horizontal; the horizontal version is also called a positive-negative bar chart.
7.2 Use cases
A bidirectional column chart is generally used to compare positive and negative data, for example a week's income and expenditure, where income is positive and expenditure is negative.
It lets you compare income against expenditure and, within each single series, analyze the magnitude and fluctuation of the values.
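For the symmetric case, a sketch of the axis limits: take the largest absolute value across both series and use it on both sides, so zero sits in the middle and bars in the two directions are directly comparable. The weekly figures are hypothetical.

```python
def symmetric_limits(income, expenditure):
    """Axis limits centred on zero for a bidirectional column chart."""
    bound = max(abs(v) for v in list(income) + list(expenditure))
    return -bound, bound


income = [120, 80, 95]          # hypothetical daily income (positive)
expenditure = [-150, -60, -90]  # hypothetical daily expenditure (negative)
print(symmetric_limits(income, expenditure))  # (-150, 150)
```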
8. Pie chart
8.1 Definition
A pie chart is a circular chart divided into sectors. The arc length of each sector (and therefore its central angle and area) shows its proportion of the whole, and together the sectors form a complete circle.
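The mapping from values to slices is simple: each slice's central angle is its share of the total times 360 degrees. This sketch also shows why 25% and 50% are easy to read on a pie: they become a right angle and a straight line.

```python
def pie_angles(values):
    """Central angle in degrees for each slice: share of the total x 360."""
    total = sum(values)
    if total <= 0 or any(v < 0 for v in values):
        raise ValueError("pie slices need non-negative values with a positive total")
    return [v / total * 360 for v in values]


print(pie_angles([25, 25, 50]))  # [90.0, 90.0, 180.0]
```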
8.2 Use cases
Pie charts mainly show the share of each category in a total, especially when you want to highlight that one part accounts for about 25% or 50% of the whole.
9. Donut chart
9.1 Definition
A donut chart (ring chart) is formed by stacking two or more pie charts of different sizes and cutting out the middle.
9.2 Use cases
Like a pie chart, it suits showing each category's share, but it uses space more efficiently than a pie chart: for example, its hollow center can hold text such as the chart title.