Advanced Econometrics 14: Binary Choice Model (Basics)

Because this series follows the needs of my personal project, I study topics in the order the project requires. Sorry if I don't follow the usual order!

This article contains my notes on "Advanced Econometrics and STATA Application", written by Teacher Chen Qiang and published by Higher Education Press.

I only took notes on the knowledge that I would use personally, and further elaborated on the difficult-to-understand parts of the textbook. In order to make it easier to understand, I have also modified some parts of the textbook (including proofs and text).


If the explanatory variable is discrete (for example, a dummy variable), this does not affect the regression. But sometimes the explained variable is discrete rather than continuous, which causes a headache.

This type of model is called a discrete choice model or a qualitative response model. In addition, sometimes the explained variables can only take non-negative integers, such as the number of patents obtained by a company within a certain period of time. This type of data is called count data, and its explained variables are also discrete.

Given the characteristics of discrete explained variables, OLS is usually not a suitable estimation method.

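Assume that an individual has only two choices, such as $y = 1$ (take the postgraduate entrance examination) and $y = 0$ (do not take it). Whether to take the examination depends on the graduate's expected income after graduation, personal interests, and so on. Assume these explanatory variables are collected in the vector $x$. The simplest model is then the Linear Probability Model (LPM):

$$y_i = x_i'\beta + \varepsilon_i \quad (i = 1, \dots, n)$$

Consistent estimation of $\beta$ requires $\mathrm{Cov}(x_i, \varepsilon_i) = 0$ (no endogeneity). However, the LPM has several well-known problems: since $y_i$ only takes the values 0 and 1, the error term $\varepsilon_i$ follows a two-point distribution rather than a normal distribution; the error term is heteroskedastic, with $\mathrm{Var}(\varepsilon_i \mid x_i) = x_i'\beta\,(1 - x_i'\beta)$; and the predicted value $\hat{y}$ may fall below 0 or above 1.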
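One shortcoming of the LPM is that its fitted values are not restricted to the unit interval. The sketch below illustrates this with closed-form OLS on a single regressor. The data are hypothetical and chosen only for illustration; they do not come from the textbook.

```python
# Hypothetical data: x is a single regressor, y is a binary outcome.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 0, 0, 1, 1, 1, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS for the linear probability model y = a + b * x + e.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Fitted "probabilities" at each observed x.
fitted = [a + b * xi for xi in x]
print(fitted[0], fitted[-1])  # the smallest and largest fitted values
```

On these data the fitted value at the smallest x is negative and the one at the largest x exceeds 1, so neither can be read as a probability.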

Although the LPM has the shortcomings mentioned above, its advantages are that it is easy to compute and its economic meaning is easy to interpret. To make the predicted value always lie in $[0, 1]$, we extend the LPM: given $x$, consider the two-point distribution of $y$:

$$P(y = 1 \mid x) = F(x, \beta), \qquad P(y = 0 \mid x) = 1 - F(x, \beta)$$

The function $F(x, \beta)$ is called the link function because it links the explanatory variable $x$ with the explained variable $y$. Since $y$ is either 0 or 1, $y$ must follow a two-point distribution.

There is some flexibility in choosing the link function. An appropriate choice guarantees $0 \le \hat{y} \le 1$, and $\hat{y}$ can be interpreted as the probability that $y = 1$ occurs, because:

$$E(y \mid x) = 1 \cdot P(y = 1 \mid x) + 0 \cdot P(y = 0 \mid x) = P(y = 1 \mid x)$$

In particular, if $F(x, \beta)$ is the cumulative distribution function (cdf) of the standard normal distribution:

$$P(y = 1 \mid x) = F(x, \beta) = \Phi(x'\beta) = \int_{-\infty}^{x'\beta} \phi(t)\, dt$$

then the model is called the probit model. If $F(x, \beta)$ is the cdf of the logistic distribution, that is:

$$P(y = 1 \mid x) = F(x, \beta) = \Lambda(x'\beta) = \frac{\exp(x'\beta)}{1 + \exp(x'\beta)}$$

then the model is called the logit model.
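The two link functions can be evaluated with the Python standard library alone. This is a minimal sketch: the function names `normal_cdf` and `logistic_cdf` are chosen here for illustration, and the normal cdf is computed through the error function `math.erf`.

```python
import math

def normal_cdf(z):
    """Standard normal cdf, the probit link, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Logistic cdf, the logit link: exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Both links map any value of the index x'beta into the interval (0, 1).
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(z, round(normal_cdf(z), 4), round(logistic_cdf(z), 4))
```

Both functions are increasing and equal 0.5 at an index of zero. The logistic cdf has heavier tails, so it approaches 0 and 1 more slowly.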

Since the logistic cdf has a closed-form expression while the standard normal cdf does not, the logit model is usually more convenient to compute than the probit model. Both are clearly nonlinear models and can be estimated by maximum likelihood (MLE).

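Taking the logit model as an example, the probability density of the $i$-th observation is:

$$f(y_i \mid x_i, \beta) = \begin{cases} \Lambda(x_i'\beta), & y_i = 1 \\ 1 - \Lambda(x_i'\beta), & y_i = 0 \end{cases}$$

It can be written without cases:

$$f(y_i \mid x_i, \beta) = [\Lambda(x_i'\beta)]^{y_i}\, [1 - \Lambda(x_i'\beta)]^{1 - y_i}$$

Taking logarithms, we have:

$$\ln f(y_i \mid x_i, \beta) = y_i \ln \Lambda(x_i'\beta) + (1 - y_i) \ln[1 - \Lambda(x_i'\beta)]$$

Assuming the individuals in the sample are independent of each other, the log-likelihood function (LLF) of the entire sample is:

$$\ln L(\beta \mid y, x) = \sum_{i=1}^{n} y_i \ln \Lambda(x_i'\beta) + \sum_{i=1}^{n} (1 - y_i) \ln[1 - \Lambda(x_i'\beta)]$$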

This nonlinear maximization problem can be solved using numerical methods.
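As one possible numerical method, the sketch below applies Newton-Raphson to a logit model with a constant and a single regressor. The data and the function names (`fit_logit`, `score`, `log_likelihood`) are hypothetical and for illustration only; statistical packages use their own, more robust optimizers.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, x, y):
    """Sample log-likelihood of the logit model with index b0 + b1 * x."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic_cdf(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def score(b0, b1, x, y):
    """Gradient of the log-likelihood: sum of (y_i - p_i) * (1, x_i)."""
    g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        r = yi - logistic_cdf(b0 + b1 * xi)
        g0 += r
        g1 += r * xi
    return g0, g1

def fit_logit(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson starting from zero; stops when the score is near zero."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1 = score(b0, b1, x, y)
        if max(abs(g0), abs(g1)) < tol:
            break
        # Negative Hessian: sum of p_i (1 - p_i) * [[1, x_i], [x_i, x_i^2]].
        h00 = h01 = h11 = 0.0
        for xi in x:
            p = logistic_cdf(b0 + b1 * xi)
            w = p * (1.0 - p)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (negative Hessian)^{-1} * score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data in which the outcomes are not perfectly separated by x.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logit(x, y)
print(b0_hat, b1_hat, log_likelihood(b0_hat, b1_hat, x, y))
```

Because the logit log-likelihood is globally concave, Newton-Raphson usually converges in a few iterations. If the data were perfectly separated, no finite maximizer would exist and the iterations would not settle.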

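Note that in this nonlinear model the estimator $\hat{\beta}_{MLE}$ is not the marginal effect. Taking probit as an example:

$$\frac{\partial P(y = 1 \mid x)}{\partial x_k} = \frac{\partial \Phi(x'\beta)}{\partial (x'\beta)} \cdot \frac{\partial (x'\beta)}{\partial x_k} = \phi(x'\beta)\, \beta_k$$

The chain rule of differentiation is used here, and $x_k$ is assumed to be a continuous variable. Since probit and logit use different distribution functions, their parameters cannot be compared directly. Instead, the marginal effects of the two models need to be calculated separately and then compared. However, in a nonlinear model the marginal effect itself is not constant and changes with the explanatory variables. Commonly used concepts of marginal effects are: (1) the average marginal effect, obtained by computing the marginal effect at each sample observation and then averaging; (2) the marginal effect at the sample mean, $x = \bar{x}$; (3) the marginal effect at a representative value, $x = x^*$.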

The calculation results of the above three marginal effects may be different. Traditionally, it is simpler to calculate the marginal effect at the sample mean; however, in nonlinear models, the individual behavior at the sample mean usually does not represent the average behavior of individuals (average behavior of individuals differs from behavior of the average individual). For policy analysis, the average marginal effect is more meaningful and is Stata's default method.
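The difference between the average marginal effect and the marginal effect at the sample mean can be checked numerically. The sketch below uses the probit formula $\phi(x'\beta)\beta_k$ with one regressor and a constant. The coefficients `b0`, `b1` and the data are hypothetical values chosen for illustration, not estimates.

```python
import math

def normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Hypothetical probit coefficients (constant and slope) and regressor values.
b0, b1 = -1.0, 0.5
x = [0, 1, 2, 3, 4, 5, 6, 7]

# Average marginal effect: evaluate phi(x'b) * b1 at every observation, then average.
ame = sum(normal_pdf(b0 + b1 * xi) * b1 for xi in x) / len(x)

# Marginal effect at the sample mean: evaluate phi(x'b) * b1 once, at x = x_bar.
x_bar = sum(x) / len(x)
mem = normal_pdf(b0 + b1 * x_bar) * b1

print(ame, mem)
```

With these values the two numbers differ noticeably, which is the point made above: the behavior of the average individual is not the average behavior of individuals.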

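Since $\hat{\beta}_{MLE}$ is not a marginal effect, what economic meaning does it have? For the logit model, let $p \equiv P(y = 1 \mid x)$, then $1 - p = P(y = 0 \mid x)$. Since $p = \frac{\exp(x'\beta)}{1 + \exp(x'\beta)}$ and $1 - p = \frac{1}{1 + \exp(x'\beta)}$, then:

$$\frac{p}{1 - p} = \exp(x'\beta), \qquad \ln\left(\frac{p}{1 - p}\right) = x'\beta$$

Here $\frac{p}{1 - p}$ is called the odds ratio or relative risk (strictly speaking, this quantity is the odds; the textbook's term "odds ratio" is kept below). If the odds ratio is 2, the probability of $y = 1$ is twice the probability of $y = 0$. Differentiating the right-hand side of the second equation shows the meaning of $\hat{\beta}_k$: if $x_k$ increases by a small amount, the odds ratio changes at a rate of $100\hat{\beta}_k$ percent per unit of $x_k$. Therefore $\hat{\beta}_k$ can be regarded as a semi-elasticity, that is, the proportional change in the odds ratio caused by a one-unit increase in $x_k$.

There is another interpretation that is especially popular in biostatistics. Suppose $x_k$ increases to $x_k + 1$, so that $p$ becomes $p^*$. The ratio of the new odds ratio to the original one can be written as:

$$\frac{p^* / (1 - p^*)}{p / (1 - p)} = \frac{\exp(x'\beta + \beta_k)}{\exp(x'\beta)} = \exp(\beta_k)$$

Therefore $\exp(\hat{\beta}_k)$ is the factor by which the odds ratio changes when $x_k$ increases by one unit.

In fact, if $\hat{\beta}_k$ is small, then $\exp(\hat{\beta}_k) - 1 \approx \hat{\beta}_k$ (Taylor expansion), so the two interpretations are equivalent. However, if $x_k$ can only change by at least one unit (such as gender or marital status), then $\exp(\hat{\beta}_k)$ should be used. In addition, the probit coefficients have no similar interpretation, which is a disadvantage of the probit model.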
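Both interpretations of a logit coefficient can be verified numerically. In the sketch below the coefficient values `b0` and `b_k` and the starting point `x_k` are hypothetical; the helper `odds` returns p / (1 - p).

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """p / (1 - p), the quantity the text calls the odds ratio."""
    return p / (1.0 - p)

# Hypothetical logit coefficients and a starting value of the regressor.
b0, b_k = -1.0, 0.1
x_k = 2.0

p_old = logistic_cdf(b0 + b_k * x_k)
p_new = logistic_cdf(b0 + b_k * (x_k + 1.0))  # x_k increased by one unit

# The ratio of new to old odds equals exp(b_k), whatever the starting x_k.
ratio = odds(p_new) / odds(p_old)
print(ratio, math.exp(b_k))

# For a small coefficient, exp(b_k) - 1 is close to b_k (first-order Taylor expansion).
print(math.exp(b_k) - 1.0, b_k)
```

For a larger coefficient the approximation deteriorates (for example, exp(1) - 1 is about 1.72, not 1), which is why $\exp(\hat{\beta}_k)$ is preferred for one-unit changes.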

How can the goodness of fit of a nonlinear model be measured? Without the sum-of-squares decomposition, $R^2$ cannot be calculated. However, Stata still reports a pseudo $R^2$ (Pseudo R2), proposed by McFadden (1974), which is defined as:

$$\text{Pseudo } R^2 = \frac{\ln L_0 - \ln L_1}{\ln L_0}$$

Here $\ln L_1$ is the maximized LLF of the original model, and $\ln L_0$ is the maximized LLF of the model with the constant term as the only explanatory variable. Since $y$ follows a discrete two-point distribution, the largest possible value of the likelihood function is 1, so the largest possible value of the LLF is 0, denoted $\ln L_{\max}$. Therefore $0 \ge \ln L_1 \ge \ln L_0$, and so $0 \le \text{Pseudo } R^2 \le 1$.
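McFadden's pseudo R-squared can be computed directly from the two log-likelihoods. In the sketch below, `p_hat` is a hypothetical vector of fitted probabilities standing in for the output of an estimated model. Because it is not an actual MLE fit, the bound between 0 and 1 holds here by construction of the example, not by the argument above. The constant-only model fits every observation with the sample mean of y.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y given fitted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
               for yi, pi in zip(y, p))

# Hypothetical outcomes and hypothetical fitted probabilities from a full model.
y = [0, 0, 1, 0, 1, 0, 1, 1]
p_hat = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

ll_1 = log_likelihood(y, p_hat)

# Constant-only model: the fitted probability is the sample share of ones.
p_bar = sum(y) / len(y)
ll_0 = log_likelihood(y, [p_bar] * len(y))

pseudo_r2 = (ll_0 - ll_1) / ll_0
print(ll_1, ll_0, pseudo_r2)
```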

Another way to judge the goodness of fit is to calculate the percentage of correct predictions. I also think that evaluation metrics commonly used in machine learning, such as MSE and MAPE, could be used.
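The percentage of correct predictions can be sketched as follows. The cutoff of 0.5 is a common convention assumed here (the source does not state one), and the outcomes and fitted probabilities are hypothetical.

```python
# Hypothetical outcomes and hypothetical fitted probabilities.
y = [0, 0, 1, 0, 1, 0, 1, 1]
p_hat = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

# Predict y = 1 when the fitted probability is at least the cutoff.
cutoff = 0.5
y_pred = [1 if p >= cutoff else 0 for p in p_hat]

share_correct = sum(int(a == b) for a, b in zip(y, y_pred)) / len(y)
print(share_correct)  # 0.75
```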

This section mainly reviews the content of Advanced Econometrics 12 and Advanced Econometrics 13.

In general, to perform statistical inference on the Probit and Logit models, the following assumptions need to be made:

Below we consider two tests: the joint significance test of all coefficients and the significance test of a single coefficient.

(1) The joint significance of all coefficients

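When using Stata, an LR test statistic is reported that tests the joint significance of all coefficients other than the constant. In Advanced Econometrics 13 we derived the LR statistic for MLE coefficients. Applied to this test, it is:

$$LR = 2\,(\ln L_1 - \ln L_0) \xrightarrow{d} \chi^2(m)$$

where $\ln L_1$ and $\ln L_0$ are the maximized LLFs of the full model and of the constant-only model, and $m$ is the number of restrictions (here, the number of slope coefficients).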

The result above relies on only two conditions: the sample is i.i.d. and the likelihood function is correctly specified. The former is needed to apply the law of large numbers and the central limit theorem; the latter is needed to use the information matrix equality.
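A joint LR test of this kind can be sketched with the standard library. The statistic is twice the gap between the maximized log-likelihoods of the full model and of the constant-only model. The two log-likelihood values below are hypothetical, and the example assumes a single slope coefficient, so the reference distribution is chi-squared with 1 degree of freedom, whose upper tail probability can be written with `math.erfc`.

```python
import math

# Hypothetical maximized log-likelihoods.
ll_full = -4.0866   # model with a constant and one regressor
ll_null = -5.5452   # model with a constant only

# LR statistic: twice the difference in log-likelihoods.
lr = 2.0 * (ll_full - ll_null)

# For 1 degree of freedom, P(chi2 > lr) = P(|Z| > sqrt(lr)) = erfc(sqrt(lr / 2)).
p_value = math.erfc(math.sqrt(lr / 2.0))
print(lr, p_value)
```

With more than one restriction the erfc shortcut no longer applies, and a chi-squared distribution function from a statistics library would be needed.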

For probit and logit models, if the distribution function is misspecified, the estimator is a quasi-maximum likelihood estimator (QMLE), in which case we should pay attention to the following:

(2) The significance of a single coefficient

When using Stata, the Std. err. for each coefficient is also reported. If you want to infer the significance of a single coefficient, you need to use the derivation in Section 6.5.2 of Advanced Econometrics 12:

a. Assuming the sample is i.i.d., the law of large numbers and the central limit theorem give:

b. Assuming the distribution function is correctly specified (so that Proof 3 of Advanced Econometrics 11 can be used), it can be further deduced:

As mentioned before, even if the distribution function is misspecified, as long as $E(y \mid x) = F(x, \beta)$ holds, then under i.i.d. sampling the robust standard error equals the ordinary standard error of the MLE. So the equation above can be used as long as $E(y \mid x) = F(x, \beta)$ holds.

c. If $E(y \mid x) \ne F(x, \beta)$, then the probit and logit models cannot give consistent estimates of the coefficient $\beta$, and statistical inference is meaningless in that case.
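For reference, results a and b are usually written as follows in standard MLE notation. The symbols $\beta_0$ (true parameter), $s_i$ (score of observation $i$), $H_i$ (Hessian of observation $i$), $A$ and $B$ are assumptions made here and may differ from the notation used in the earlier articles of this series. This is a sketch of the standard results, not a quotation of those derivations.

```latex
% Definitions (assumed notation):
%   s_i(\beta) = \partial \ln f(y_i \mid x_i, \beta) / \partial \beta
%   H_i(\beta) = \partial^2 \ln f(y_i \mid x_i, \beta) / \partial \beta \, \partial \beta'

% a. i.i.d. sample (law of large numbers + central limit theorem): sandwich form
\sqrt{n}\,(\hat{\beta}_{MLE} - \beta_0) \xrightarrow{d} N\left(0,\; A^{-1} B A^{-1}\right),
\qquad A \equiv -E\left[H_i(\beta_0)\right], \quad B \equiv E\left[s_i(\beta_0)\, s_i(\beta_0)'\right]

% b. correctly specified likelihood: the information matrix equality gives A = B
\sqrt{n}\,(\hat{\beta}_{MLE} - \beta_0) \xrightarrow{d} N\left(0,\; A^{-1}\right)
```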

The formula above depends on the unknown true parameters, so it cannot be used directly to test a single coefficient. This can be handled with the method in Section 6.6 of Advanced Econometrics 12, which we won't repeat here.