How to analyze the data after entering the questionnaire?

The method of analyzing questionnaire data with SPSS

Once the completed questionnaires come back, the survey data must be processed with statistical software. Here we use SPSS and briefly walk through the workflow, which can be divided into four steps: defining variables, entering data, running the statistical analysis, and saving the results. Each step is described in detail below.

SPSS processing:

Step 1: Define variables

In most cases, we need to define variables from scratch. After opening SPSS, we see an interface similar to Excel. In the lower left corner there are two tabs, Data View and Variable View. Click the Variable View tab to switch to the variable definition interface and start defining new variables. Across the top of the table are the properties to set for each variable: Name (variable name), Type (variable type), Width (width of the variable's values), Decimals (number of decimal places), Label (variable label), Values (labels for specific values of the variable), Missing (missing-value definitions), Columns (display column width), Align (display alignment) and Measure (level of measurement).

In SPSS, each question in a questionnaire can be set up as one variable, so the questionnaire has as many variables as it has questions, and the answer to each question is the value of the corresponding variable. Let's take the first question of the questionnaire as an example of how to set up a variable. For the sake of explanation, assume the question is:

1. Which age group do you belong to?

A: 20-29 years old B: 30-39 years old C: 40-49 years old D: 50-59 years old.

The variable can then be set up as follows.

Name: the variable name, here identifying question 1 (for example Q1, since SPSS variable names must begin with a letter).

Type: set according to the kind of answer. Since we can code the answers A, B, C, D as 1, 2, 3, 4, choose Numeric, with a width of 4.

Decimals: the number of decimal places, set to 0 here because the answers have no decimal part.

Values: labels for specific values of the variable. Click the ellipsis button at the right of the Values cell to open the Value Labels dialog box. Enter 1 in the first text box and 20-29 in the second, then click Add. Repeat for the other options, giving 1 = 20-29, 2 = 30-39, 3 = 40-49 and 4 = 50-59.

Missing: defines the missing values of the variable. Click the ellipsis button at the right of the Missing cell to open the Missing Values dialog box, which has three radio buttons. The first, "No missing values", is the default. The second, "Discrete missing values", accepts at most three values. The third is "Range plus one optional discrete missing value". No missing values are defined here, so select the first option.

Columns: the display column width, set as needed.

Align: the display alignment: left, right or center.

Measure: the level of measurement of the variable: scale (continuous), ordinal (ordered categories) or nominal (unordered categories).
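The settings described for the age-group question can be pictured as a small record. The following is only a sketch in Python, not SPSS itself: the class name VariableSpec, its field names and the variable name Q1 are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VariableSpec:
    """Hypothetical stand-in for one row of the SPSS Variable View."""
    name: str
    var_type: str = "numeric"
    width: int = 4
    decimals: int = 0
    label: str = ""
    values: dict = field(default_factory=dict)  # value labels: code -> text
    missing: tuple = ()                         # empty = no missing values defined
    measure: str = "ordinal"

    def describe(self, code):
        """Return the value label for a code, or the code as text if unlabeled."""
        return self.values.get(code, str(code))


# The age-group question from the text; "Q1" is an assumed variable name.
age_group = VariableSpec(
    name="Q1",
    label="Which age group do you belong to?",
    values={1: "20-29", 2: "30-39", 3: "40-49", 4: "50-59"},
)

print(age_group.describe(3))  # 40-49
```

Keeping the codes (1 to 4) separate from their labels mirrors what the Values property does: the data stay numeric while the output can show readable text.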

The above covers the variable settings for an ordinary single-choice question. The settings for some special cases are explained below.

1. Open-ended questions: fill-in-the-blank items such as "Your province is ______" are open-ended questions. For these variables, Values and Missing can simply be left unset.

2. Multiple-choice questions (more than one answer allowed): there are two ways to set up this kind of question, the multiple dichotomy method and the multiple category method. Only the multiple dichotomy method is introduced here. Its basic idea is to treat each option of the question as a separate variable with two possible values, checked and unchecked. The following example shows the specific operation in SPSS:

How do you usually get news?

1 newspaper 2 magazine 3 TV 4 radio 5 network

When setting variables in SPSS, define five variables for this question. If it is the third question of the questionnaire, name them after the question and the option, for example Q3_1, Q3_2, Q3_3, Q3_4 and Q3_5 (SPSS variable names must begin with a letter). Each variable takes one of two values, checked or unchecked: in Values, set 1 = checked and 0 = unchecked.
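The multiple dichotomy idea can be sketched outside SPSS as a small encoding function. This is an illustrative Python sketch only; the function name encode_dichotomies and the prefix Q3 are assumptions, and 0 is used for an unchecked option.

```python
OPTIONS = ["newspaper", "magazine", "TV", "radio", "network"]


def encode_dichotomies(selected, prefix="Q3"):
    """Return {variable_name: 0 or 1}, one variable per option of the question."""
    unknown = set(selected) - set(OPTIONS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return {
        f"{prefix}_{i}": int(option in selected)
        for i, option in enumerate(OPTIONS, start=1)
    }


# A respondent who ticked "newspaper" and "TV".
row = encode_dichotomies({"newspaper", "TV"})
print(row)
# {'Q3_1': 1, 'Q3_2': 0, 'Q3_3': 1, 'Q3_4': 0, 'Q3_5': 0}
```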

In the Variable View window, all the questions in the questionnaire can be defined as variables in one pass.

At this point the work of defining variables is essentially complete. The next step is data entry. First return to the data entry window by clicking the Data View tab at the bottom left of the window.

Step 2: Data Entry

There are several ways to get data into SPSS, roughly as follows:

1. Read data in SPSS format.

2. Read data in Excel and other formats.

3. Read text data (fixed-width or delimited).

4. Reading data in database format (divided into the following two steps)

(1) Configure ODBC (2) Use ODBC and database in SPSS.

For a questionnaire, however, data entry is very simple: type the answers directly into the Data View window. A few points deserve attention.

1. The Data View window shows a table. Each row of the table represents one questionnaire, also called a case.

2. At the top of the table are the column headings, which are the variable names assigned to the questions in Step 1 (for example Q1 for the first question, Q2 for the second, and so on). To enter a questionnaire, type the answer to each question under the corresponding variable name. For example, if option A is ticked for the first question, enter 1 under that variable (remember that we code A, B, C, D as 1, 2, 3, 4).

Since one row represents one questionnaire, there are as many rows of data as there are returned questionnaires.
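The layout of the data table (one row per questionnaire, one column per variable) is the same layout used by a delimited text file. As a rough sketch, assuming made-up answers and the hypothetical variable names Q1 to Q3, such a table can be read in Python like this:

```python
import csv
import io

# Made-up coded answers: one row per returned questionnaire (case),
# one column per variable. The names Q1..Q3 are assumed.
raw = """Q1,Q2,Q3
1,2,4
3,1,2
2,2,1
"""

cases = [
    {name: int(value) for name, value in row.items()}
    for row in csv.DictReader(io.StringIO(raw))
]

print(len(cases))      # number of questionnaires: 3
print(cases[0]["Q1"])  # answer to question 1 on the first questionnaire: 1
```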

Once data entry is complete, the questionnaire data is in the software and we can move on to the key part of the work, the statistical analysis.

Step 3: Statistical analysis

With the data in place, we can apply the various analysis methods of SPSS. Choosing the right statistical method, that is, calling the right statistical procedure, is the key to getting correct results, and the choice depends on the purpose of the questionnaire and the kind of results we want. SPSS offers two kinds of analysis: numerical analysis and graphical analysis.

1. Graphical analysis:

In SPSS, apart from the survival plots used in survival analysis, which are integrated into the Analyze menu, the statistical charting functions are located in the Graphs menu. The menu is divided into the following sections:

(1) Gallery: It is equivalent to a self-study guide, which briefly introduces the statistical drawing function, and beginners can have a general understanding of the drawing ability of SPSS through it.

(2) Interactive: interactive statistical chart.

(3) Maps: statistical maps.

(4) The following other menu items are our most commonly used general statistical charts, specifically:

Bar chart

Scatter plot

Line chart

Histogram

Pie chart

Area chart

Box plot

Normal Q-Q plot

Normal P-P plot

Quality control chart

Pareto chart

Autocorrelation plot

High-low chart

Cross-correlation plot

Sequence chart

Spectral plot

Error bar chart

Graphical analysis is easy to understand and clear at a glance, and we can choose whichever charts suit our needs. The most commonly used are bar charts, histograms, normal probability plots, scatter plots and pie charts. The operations are simple and are covered in the relevant books. Graphical analysis is usually combined with numerical analysis when analyzing questionnaires, which gives better results.

2. Numerical analysis:

The numerical statistical analysis process of SPSS is carried out in the Analyze menu, including:

(1) Reports and Descriptive Statistics: also known as basic statistical analysis. Basic statistical analysis is the prerequisite for deeper statistical analysis. It gives users a more accurate grasp of the overall characteristics of the data, so that they can choose a suitable method for further study. The functions under the Reports and Descriptive Statistics menu items are univariate descriptive analyses.

Descriptive statistics include the following statistical functions:

Frequencies (frequency analysis): shows the distribution of a variable's values.

Descriptives: gives the basic statistical characteristics of the data and can standardize the values of the specified variables.

Explore: examines outliers and the distribution characteristics of the data.

Crosstabs: analyzes the relationships between variables.

Reports includes the following statistical functions:

OLAP Cubes: calculates totals, means and other statistics for each group defined by the grouping variables. The output summarizes the statistics of the variables within each group.

Case Summaries: views or prints the values of the selected variables.

Report Summaries in Rows: outputs the report in row form.

Report Summaries in Columns: outputs the report in column form.

(2) Compare Means: Can the sample mean be used to estimate the population mean? Do two samples with similar means come from the same population? Conversely, when two samples have different means, is the difference statistically significant, and does it reflect a difference between the populations? These are common questions in all kinds of research, and answering them requires a comparison of means.

The procedures for comparing means are as follows:

Means procedure: descriptive statistics by level (group), such as the average salary of men and women or the average salary for each type of job, for the purpose of comparison. Terminology: levels (the number of values of a categorical variable; for example, a gender variable has two values and therefore two levels), cells (the values of the dependent variable grouped by the values of the categorical variables), and level combinations.

T-test procedures: t tests on samples.

One-sample t test: tests whether the mean of a single variable differs from a given constant.

Independent-samples t test: tests whether two unrelated samples come from populations with the same mean (for example, whether the average incomes of men and women differ significantly).

Paired-samples t test: tests whether two related samples come from populations with the same mean (before-and-after comparisons, such as the effect of training or treatment).

One-way ANOVA: tests whether several (three or more) independent groups come from populations with the same mean.
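To make the independent-samples t test more concrete, here is a minimal Python sketch of the classical Student's t statistic with equal variances assumed. It is not how SPSS is driven and it does not compute the significance level; the function name independent_t and the income figures are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Student's t statistic for two independent samples (equal variances assumed).

    Returns (t, degrees_of_freedom).
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Made-up monthly incomes (in thousands) for two groups of respondents.
men = [5.1, 4.8, 6.0, 5.5, 5.2]
women = [4.9, 4.7, 5.0, 5.3, 4.6]
t, df = independent_t(men, women)
print(round(t, 3), df)
```

The resulting t value would then be compared with the t distribution on df degrees of freedom to judge whether the difference in means is significant.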

(3) Analysis of variance: analysis of variance (ANOVA) tests whether the differences among the means of several groups of samples are statistically significant. For example, medical researchers study the efficacy of several drugs on a disease, agricultural researchers study the effects of soil, fertilizer, hours of sunshine and other factors on the yield of a crop, and animal scientists study the effect of different feeds on the weight gain of livestock. All of these problems can be addressed with analysis of variance.
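The core of a one-way analysis of variance is the F ratio of between-group to within-group variability. The sketch below computes it in plain Python for three made-up groups (say, weight gain under three feeds); the function name one_way_anova_f is an assumption, and the p-value lookup is left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up weight gains under three feeds.
feed_a, feed_b, feed_c = [1, 2, 3], [2, 3, 4], [3, 4, 5]
print(one_way_anova_f([feed_a, feed_b, feed_c]))  # (3.0, 2, 6)
```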

(4) Correlation analysis: a commonly used statistical method for studying how closely variables are related. Common types include:

1. Linear correlation analysis: studies the strength of the linear relationship between two variables, described by the correlation coefficient r.

2. Partial correlation analysis: describes the correlation between two variables while controlling for the influence of one or more other variables, for example estimating the correlation between wage income and education level while controlling for age and work experience.

3. Similarity (distance) measures: the relationship between two or more variables, or between two or more groups of observations, can sometimes be described by similarity or dissimilarity. For a similarity measure, a large value means the objects are very similar. Dissimilarity is described by distance, and a large value means the objects are far apart.
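As an illustration of the linear correlation coefficient r, the following Python sketch computes the Pearson coefficient directly from its definition. The function name pearson_r and the data (years of education against income) are made up for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up data: years of education and monthly income (in thousands).
education = [9, 12, 12, 16, 19]
income = [3.0, 3.8, 4.1, 5.5, 6.9]
print(round(pearson_r(education, income), 3))
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or none.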

(5) Regression analysis: finds the relationship between related variables. The Regression submenu includes: Linear (linear regression); Curve Estimation; Binary Logistic (binary logistic regression); Multinomial Logistic (multinomial logistic regression); Ordinal (ordinal regression); Probit (probit regression); Nonlinear (nonlinear regression); Weight Estimation (weighted estimation); 2-Stage Least Squares (two-stage least squares); and Optimal Scaling (regression with optimal scaling). The first three are the most commonly used.
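For the simplest case, linear regression with one predictor, the fitted line can be computed by ordinary least squares. This is a minimal Python sketch under that assumption; the function name fit_line and the income and spending figures are invented for illustration.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x for a single predictor."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: income and spending on a commodity (same units).
income = [2, 3, 4, 5, 6]
spending = [1.1, 1.4, 2.1, 2.4, 3.0]
intercept, slope = fit_line(income, spending)
print(round(intercept, 2), round(slope, 2))
```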

(6) Nonparametric tests: methods for testing hypotheses, such as whether samples come from the same population, when the population does not follow a normal distribution or its distribution is unknown. They are so named because they generally do not involve population parameters.

The nonparametric test procedures are as follows:

1. Chi-Square: chi-square test

2. Binomial: binomial distribution test

3. Runs: runs test

4. 1-Sample K-S: one-sample Kolmogorov-Smirnov test

5. 2 Independent Samples: tests for two independent samples

6. K Independent Samples: tests for several independent samples

7. 2 Related Samples: tests for two related samples

8. K Related Samples: tests for several related samples

(7) Data reduction (factor analysis)

(8) Classification (cluster analysis and discriminant analysis), and so on.

The above is a brief introduction to the numerical analysis methods under the Analyze menu. Once variable definition and data entry are complete, we can choose among these methods according to our needs to analyze the questionnaire data and obtain the desired results.

Step 4: Save the results.

SPSS collects the results of our statistical analyses in a separate window, the output window. Because SPSS supports copy and paste, we can copy the results we want and paste them into our report, and we can also choose File > Save from the menu to save the results. In general, we recommend saving the data rather than the results, because as long as we have the data we can reproduce the results at any time.

Summary:

The above are the four steps for processing a questionnaire with SPSS. After these four steps, the work that requires SPSS is essentially done, and the next task is to write the statistical report. SPSS is widely used in social statistics, and learning it well is valuable for future work and study.

In SPSS questionnaire analysis, one questionnaire is one case. First, variables should be defined according to the questionnaire questions. Two points deserve attention when defining variables. First, set the measurement level (Measure) of each variable correctly: Scale for quantitative variables, Ordinal for ordered categories and Nominal for unordered categories. Second, take care to define the appropriate data type for each variable.

Questionnaire questions can be roughly divided into four types: single-choice, multiple-choice, ranking and open-ended. Their variables are defined and handled in different ways, which the following examples describe in detail:

1 Single-choice question: only one option can be chosen as the answer.

Example 1 Does your organization currently have an organization-oriented career planning system?

A. Yes  B. It is being set up  C. No  D. There was one, but it has been discontinued

Code: only one variable is defined, with the values 1, 2, 3 and 4 representing options A, B, C and D respectively.

Input: enter the value corresponding to the chosen option. If C is chosen, enter 3.

2 Multiple-choice questions: more than one option can be chosen. These include questions with no fixed number of choices and questions with a fixed number of choices.

(1) Method 1 (dichotomy):

Example 2 Which groups does your career planning system cover? Please tick all that apply.

A. Monthly-paid staff  B. Daily-paid staff  C. Hourly-paid staff

Code: each option is defined as a separate variable, and the values of each variable are defined as 1 = selected, 0 = not selected.

Input: enter 1 for each option the respondent selected and 0 for each option not selected. If the respondent chooses A and C, the three variables are entered as 1, 0 and 1 respectively.

(2) Method 2:

Example 3 What do you think are the three most important goals of the educational campaign to maintain the advanced nature of Communist Party members?

1 ( )  2 ( )  3 ( )

A. Improve the quality of Party members  B. Strengthen grass-roots organizations  C. Persist in promoting democracy

D. Stimulate the enthusiasm of entrepreneurs  E. Serve the people  F. Promote all work

Code: three variables are defined, one for each of the brackets 1, 2 and 3 in the question. All three share the same value labels, one per option: 1 = A, 2 = B, 3 = C, 4 = D, 5 = E and 6 = F.

Input: the values 1, 2, 3, 4, 5 and 6 represent options A, B, C, D, E and F, and are entered under the variable for the corresponding bracket. If the respondent fills the three brackets with A, C and F, enter 1, 3 and 6 under the three variables.

Note: multiple-choice questions that can be coded by Method 2 can also be coded by Method 1, but multiple-choice questions with no fixed number of choices can only be coded by the dichotomy method. In other words, Method 1 is the general method for multiple-choice questions.
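The relationship between the two codings can be shown with a small conversion. The Python sketch below turns answers coded by Method 2 (option codes 1 to 6 in three bracket variables) into the Method 1 coding (one 0/1 variable per option); the function name categories_to_dichotomies is made up for illustration.

```python
LETTERS = "ABCDEF"


def categories_to_dichotomies(codes):
    """Convert Method 2 codes (e.g. [1, 3, 6]) to six 0/1 values for options A-F."""
    chosen = set(codes)
    if not chosen <= set(range(1, len(LETTERS) + 1)):
        raise ValueError("codes must be between 1 and 6")
    return [int(i in chosen) for i in range(1, len(LETTERS) + 1)]


# A respondent who filled the brackets with A, C and F (codes 1, 3, 6).
print(categories_to_dichotomies([1, 3, 6]))  # [1, 0, 1, 0, 0, 1]
```

The conversion works in this direction for any answer, which is why the dichotomy method is the more general of the two.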

3 Ranking question: the options are ranked by importance.

Example 4 When you buy goods, in what order do you pay attention to the following? (Please fill in the codes in order.)

First ___  Second ___  Third ___  Fourth ___  Fifth ___

Code: five variables are defined, representing first place through fifth place. The value labels of each variable are: 1 = brand, 2 = popularity, 3 = quality, 4 = practicality and 5 = price.

Input: the numbers 1, 2, 3, 4 and 5 represent the five options. If the respondent ranks quality first, enter 3 under the variable representing first place.

4 Select-and-rank question:

Example 5 The question in Example 3 is changed to: "Which three goals do you think are the most important in the educational campaign to maintain the advanced nature of Communist Party members? List them in order of importance, from high to low." The options remain the same.

Code: six variables are defined, one for each of the six options A to F. The values of each variable are defined as: 1 = not selected, 2 = ranked first, 3 = ranked second and 4 = ranked third.

Input: enter the value that applies to each variable. For example, if the three brackets are filled with E, C and F in that order, the six variables for this question are entered as 1 (option A not selected), 1, 3 (option C ranked second), 1, 2, 4.
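The coding rule for a select-and-rank question can be written down as a short function. This Python sketch follows the value definitions given in the text (1 = not selected, 2 to 4 = ranked first to third); the function name encode_select_and_rank is an assumption.

```python
OPTIONS = "ABCDEF"


def encode_select_and_rank(ranked):
    """Code a select-and-rank answer such as "ECF" into six values for A-F.

    1 = not selected, 2 = ranked first, 3 = ranked second, 4 = ranked third.
    """
    if len(set(ranked)) != len(ranked) or not set(ranked) <= set(OPTIONS):
        raise ValueError("ranked must contain distinct letters from A-F")
    codes = {letter: 1 for letter in OPTIONS}
    for code, letter in enumerate(ranked, start=2):
        codes[letter] = code
    return [codes[letter] for letter in OPTIONS]


print(encode_select_and_rank("ECF"))  # [1, 1, 3, 1, 2, 4]
```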

Note: this method combines multiple-choice and ranking coding, and it can also be applied to ordinary ranking questions (Example 4). The two codings call for different analysis methods (frequency analysis for Example 4, descriptive analysis for Example 5), and the outputs reflect the importance of the options from different angles: the former reads the ranking from the frequencies of the variables, the latter from their descriptive statistics.

5 Open-ended numerical questions and scale questions: These questions require respondents to fill in their own numerical values or score.

Example 6 Your age (actual age): _ _ _ _ _ _

Code: one variable, with no value labels defined.

Input: enter the actual value filled in by the respondent.

6 Open-ended text questions:

If possible, answers with similar meanings can be coded and converted into closed options for analysis. If the answers are too varied to classify, such questions are analyzed qualitatively instead.

III. General analysis of the questionnaire

The following describes the general methods for processing questionnaires in SPSS. The operations use SPSS 13.0 as an example, and all the menu items mentioned below are under the Analyze main menu.

1 Frequency analysis (Frequencies): this procedure produces a frequency distribution table for a single variable, shows how often each value of the user-specified variables occurs in the data file, and provides statistics describing the range of values.

Scope of application: single-choice questions (Example 1), ranking questions (Example 4) and multiple-choice questions coded by Method 2 (Example 3).

Frequency analysis is also the most commonly used method in questionnaire analysis.

Implementation: Descriptive Statistics > Frequencies
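What a frequency table contains can be sketched in a few lines of Python. The function name frequency_table and the answers are made up; the value labels reuse the age-group coding from Step 1.

```python
from collections import Counter


def frequency_table(values, labels=None):
    """Return rows of (label, count, percent), sorted by code."""
    counts = Counter(values)
    total = len(values)
    labels = labels or {}
    return [
        (labels.get(code, code), n, round(100 * n / total, 1))
        for code, n in sorted(counts.items())
    ]


# Made-up coded answers to the age-group question.
answers = [1, 2, 2, 3, 1, 2, 4, 2]
labels = {1: "20-29", 2: "30-39", 3: "40-49", 4: "50-59"}
for row in frequency_table(answers, labels):
    print(row)
# ('20-29', 2, 25.0)
# ('30-39', 4, 50.0)
# ('40-49', 1, 12.5)
# ('50-59', 1, 12.5)
```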

2 Descriptive analysis (Descriptives): this procedure calculates univariate descriptive statistics, including the mean, sum, standard deviation, maximum, minimum, variance, range and standard error of the mean.

Scope of application: select-and-rank questions (Example 5) and numerical questions (Example 6).

Implementation: Descriptive Statistics > Descriptives, then click the Options button to choose the statistics required.
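The statistics listed for descriptive analysis can be reproduced with Python's standard library. This is only a sketch: the function name descriptives and the ages are invented, and the sample (n - 1) formulas are used for the variance and standard deviation.

```python
from math import sqrt
from statistics import mean, stdev, variance


def descriptives(values):
    """Univariate descriptive statistics for a numeric variable."""
    n = len(values)
    sd = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return {
        "n": n,
        "mean": mean(values),
        "sum": sum(values),
        "std_dev": sd,
        "variance": variance(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "se_mean": sd / sqrt(n),  # standard error of the mean
    }


# Made-up answers to the open-ended age question.
ages = [23, 35, 41, 29, 52]
for name, value in descriptives(ages).items():
    print(name, round(value, 3))
```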

3 Frequency analysis of multiple responses:

Scope of application: multiple-choice questions coded by the dichotomy method (Example 2).

Implementation: Step 1, under Multiple Response > Define Sets, group all the variables defined for one multiple-choice question into a set, give the new set a name, and enter 1 as the counted value for Dichotomies. Step 2, run the frequency analysis under Multiple Response > Frequencies.
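The idea behind a multiple-response frequency table can be sketched as follows: the dichotomy variables of one question are treated as a set, and each option is counted against both the total number of responses and the number of cases that gave at least one response. The function name, the variable names Q2_A to Q2_C and the data are assumptions for illustration.

```python
def multiple_response_frequencies(cases, variables, counted_value=1):
    """Counts and percentages for a set of dichotomy variables."""
    counts = {
        v: sum(1 for case in cases if case[v] == counted_value)
        for v in variables
    }
    n_responses = sum(counts.values())
    # Cases that ticked at least one option in the set.
    n_valid_cases = sum(
        1 for case in cases
        if any(case[v] == counted_value for v in variables)
    )
    if n_responses == 0:
        raise ValueError("no variable has the counted value in any case")
    return {
        v: {
            "count": c,
            "pct_of_responses": round(100 * c / n_responses, 1),
            "pct_of_cases": round(100 * c / n_valid_cases, 1),
        }
        for v, c in counts.items()
    }


# Three made-up questionnaires for Example 2 (1 = selected, 0 = not selected).
cases = [
    {"Q2_A": 1, "Q2_B": 0, "Q2_C": 1},
    {"Q2_A": 1, "Q2_B": 0, "Q2_C": 0},
    {"Q2_A": 0, "Q2_B": 1, "Q2_C": 1},
]
result = multiple_response_frequencies(cases, ["Q2_A", "Q2_B", "Q2_C"])
print(result["Q2_A"])
# {'count': 2, 'pct_of_responses': 40.0, 'pct_of_cases': 66.7}
```

Because a respondent can tick several options, the percentages of cases can add up to more than 100.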

4 Crosstab analysis: handles frequency analysis across the combined levels of several variables.

Scope of application: contingency tables formed by cross-classifying two or more variables, used to analyze the association between the variables. For example, to find out how people in different kinds of jobs travel to work, cross analysis produces a two-way frequency table that is clear at a glance.

Implementation: Step 1, decide which variables to cross according to the purpose of the analysis, and identify the control variable and the explanatory variable (for example, type of job as the control variable and mode of transport as the explanatory variable). Step 2, choose Descriptive Statistics > Crosstabs.
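A two-way frequency table of this kind can be built by counting pairs of values. The following Python sketch uses made-up cases with the hypothetical variables job and transport; the function name crosstab is an assumption.

```python
from collections import Counter


def crosstab(cases, row_var, col_var):
    """Two-way frequency table as {row_value: {col_value: count}}."""
    counts = Counter((case[row_var], case[col_var]) for case in cases)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


# Made-up cases: type of job and mode of transport to work.
cases = [
    {"job": "office", "transport": "bus"},
    {"job": "office", "transport": "car"},
    {"job": "factory", "transport": "bus"},
    {"job": "office", "transport": "bus"},
    {"job": "factory", "transport": "bicycle"},
]
table = crosstab(cases, "job", "transport")
for job, row in table.items():
    print(job, row)
# factory {'bicycle': 1, 'bus': 1, 'car': 0}
# office {'bicycle': 0, 'bus': 2, 'car': 1}
```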

IV. Brief introduction to simple charts

Charts can be produced directly while running the frequency and descriptive analyses above, which is simple and convenient, or they can be drawn separately. SPSS has powerful charting functions, and the charts under the Graphs menu are clear and attractive. The common chart types are briefly introduced below.

1 Pie chart: also known as a circle chart, it uses the area of a circle to represent the whole population under study and divides the circle into sectors in proportion to each component's share, showing the relationship between the parts and the whole. The results of frequency analysis are well suited to pie charts.

2 Line chart: shows changes in the data through the rise and fall of line segments. It is mainly used to show trends over time, the distribution of a phenomenon and the dependence between two phenomena.

3 Area chart: emphasizes changes in a phenomenon by shading the area under the line.

4 Bar chart: shows the size of and changes in statistical data by the length or height of bars of equal width.

V. In-depth analysis of the questionnaire

In addition to the simple analyses above, we can use the powerful functions of SPSS for in-depth analysis of the questionnaire, such as cluster analysis, cross analysis, factor analysis, comparison of means (parametric tests), correlation analysis and regression analysis. Because these involve specialized statistical knowledge, only the scope of application and the purpose of the methods I have found useful are briefly introduced below:

1 Cluster analysis

Clustering of cases classifies the respondents according to selected attributes and gives the proportion of each class, so that the groups of interest can be studied clearly. For example, respondents can be clustered according to their consumption characteristics.

2 Correlation analysis

Correlation analysis examines whether two or more variables are correlated, and the correlation measure should be chosen according to the characteristics of the variables. Most variables in questionnaire analysis are categorical, often ordered categories, for which the Spearman rank correlation coefficient is appropriate.

The chi-square test can also be used. It tests whether there is a significant association between two variables.
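To show what the chi-square test of association measures, the sketch below computes the Pearson chi-square statistic for a contingency table of observed counts. The function name chi_square_statistic and the counts are made up, and the significance lookup is left out.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table.

    table is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Made-up counts: rows = men / women, columns = bought / did not buy.
observed = [[10, 20], [20, 10]]
chi2, df = chi_square_statistic(observed)
print(round(chi2, 3), df)  # 6.667 1
```

A large statistic relative to the chi-square distribution on df degrees of freedom suggests that the two variables are associated.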

3 Comparison of means and tests

(1) Means procedure: describes the specified variables as a whole, calculates the means by group and compares them. For example, the cases can be split into men and women by the gender variable to study whether there is a gap in income between them.

(2) T test:

The independent-samples t test is used to test whether unrelated samples come from the same population. For example, it can be used to study whether income differs significantly between customers who buy the product and customers who do not.

If the samples are not independent, the paired-samples t test should be used. For example, it can be used to study whether work efficiency improves after vocational training.
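For the paired case, the test works on the differences within each pair. The following Python sketch computes the paired t statistic for made-up efficiency scores before and after training; the function name paired_t is an assumption, and the p-value is not computed.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-samples t statistic computed on the differences after - before.

    Returns (t, degrees_of_freedom).
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two samples of the same length (at least 2)")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Made-up efficiency scores for four employees before and after training.
before = [1, 2, 3, 4]
after = [2, 4, 5, 5]
t, df = paired_t(before, after)
print(round(t, 3), df)  # 5.196 3
```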

4 Regression analysis

In questionnaire analysis, regression usually relies on discrete-choice regression models, most often the logistic model, to explain the influence of one variable on another. For example, one can study the effect of income on the consumption of a commodity.