Why does the goodness of fit use the average value of y?

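Goodness of fit, usually measured by the coefficient of determination R², is one of the key indicators of how well a linear regression model fits the data. Its calculation uses the average value of y as a reference. The average is the simplest possible prediction: a model that ignores x and always predicts the average value of y. The sum of squared differences between each actual value and the average, called the total sum of squares, measures how much variation there is to explain. The sum of squared differences between each actual value and the model's predicted value, called the residual sum of squares (the sum of squares of errors), measures how much variation the model leaves unexplained. R² is one minus the ratio of the residual sum of squares to the total sum of squares, so it reports the share of the variation around the average that the model accounts for.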
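The calculation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the data values are made up, the variable names are hypothetical, and it assumes the usual definition of R² as one minus the residual sum of squares divided by the total sum of squares around the average value of y.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Fit y = slope * x + intercept by ordinary least squares.
# np.polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept

# Total sum of squares: variation of y around its average.
y_mean = y.mean()
ss_total = np.sum((y - y_mean) ** 2)

# Residual sum of squares: variation left over after the fit.
ss_residual = np.sum((y - y_pred) ** 2)

# Share of the variation around the average explained by the model.
r_squared = 1.0 - ss_residual / ss_total
print(f"SS_total={ss_total:.2f}, SS_residual={ss_residual:.2f}, R^2={r_squared:.2f}")
```

For this data the total sum of squares is 6.0 and the residual sum of squares is 2.4, so R² is 0.6: the fitted line accounts for 60% of the variation around the average.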

The average value of y also helps us understand how the data are distributed. If every actual value were equal to the average value of y, the data would show no variation at all, and there would be nothing for a model to explain. Real data usually vary around the average, and part of that variation follows a trend or pattern related to x. A regression model tries to capture that pattern. The more of the variation around the average it accounts for, the smaller the residual sum of squares is relative to the total sum of squares, and the more useful the model is likely to be for predicting future data.
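The role of the average as a baseline can be checked numerically. The sketch below uses a hypothetical helper, `r_squared`, and made-up data; it assumes the usual definition of R² and is meant only to show how different predictions score against the average.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 relative to the baseline that always predicts the average of y_true.

    Undefined when every value in y_true is identical, because the
    total sum of squares is then zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_total = np.sum((y_true - y_true.mean()) ** 2)
    ss_residual = np.sum((y_true - y_pred) ** 2)
    return 1.0 - ss_residual / ss_total

# Hypothetical observations, chosen only for illustration.
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Always predicting the average explains none of the variation.
r2_baseline = r_squared(y, np.full_like(y, y.mean()))

# Predicting every value exactly explains all of it.
r2_perfect = r_squared(y, y)

# Predictions worse than the average give a negative value.
r2_worse = r_squared(y, np.zeros_like(y))

print(r2_baseline, r2_perfect, r2_worse)
```

Predicting the average scores 0, a perfect prediction scores 1, and a prediction that is worse than the average scores below 0, which is why the average works as the natural zero point of the scale.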

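When calculating goodness of fit, we also need to consider degrees of freedom. Degrees of freedom are the number of values in a calculation that remain free to vary. With n samples, the total sum of squares has n - 1 degrees of freedom: the differences from the average must add up to zero, so once n - 1 of them are known the last one is fixed, and estimating the average value of y from the data uses up one degree of freedom. The residual sum of squares has n - p - 1 degrees of freedom for a model with p predictors and an intercept. Taking these degrees of freedom into account allows the fitting degree of models with different numbers of predictors to be compared more fairly.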
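One common way to take degrees of freedom into account is the adjusted R², which divides each sum of squares by its degrees of freedom before forming the ratio. The text above does not name this statistic, so treat the sketch below as an illustrative assumption; the function name, argument names, and data are hypothetical.

```python
import numpy as np

def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R^2 for a model with an intercept.

    Total sum of squares:    n_samples - 1 degrees of freedom.
    Residual sum of squares: n_samples - n_predictors - 1 degrees of freedom.
    """
    df_total = n_samples - 1
    df_residual = n_samples - n_predictors - 1
    if df_residual <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1.0 - (1.0 - r2) * df_total / df_residual

# Hypothetical observations, chosen only for illustration.
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# The differences from the average always add up to zero (up to rounding),
# which is why only n - 1 of them are free to vary.
deviation_sum = np.sum(y - y.mean())

# Example: R^2 of 0.6 from 5 samples and 1 predictor.
adj = adjusted_r_squared(0.6, n_samples=5, n_predictors=1)
print(f"sum of deviations={deviation_sum:.1f}, adjusted R^2={adj:.4f}")
```

With only five samples the adjustment is large: an R² of 0.6 drops to roughly 0.47 once the degrees of freedom are taken into account.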

In short, introducing the average value of y is a necessary step in calculating the goodness of fit. It serves as the benchmark against which the model's errors are measured, shows how much variation in the data there is to explain, and accounts for the degree of freedom that the total sum of squares loses, so the fitting degree of the model can be evaluated accurately.