On the other hand, the mean of y also helps us understand how the data are distributed. If the mean predicts the actual values as well as any model can, the data show no trend: they simply scatter randomly around the mean. Real-world data are rarely like this; they usually follow some trend or regularity. A model captures that trend, and comparing its sum of squared errors with the sum of squared deviations from the mean tells us how much of the trend it captures, and hence how far we can rely on it to predict future data.
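To make the role of the mean as a benchmark concrete, here is a minimal sketch in plain Python. The data values and the helper names `fit_line` and `r_squared` are illustrative assumptions, not part of the original text. It compares a least-squares line against the "always predict the mean" baseline.

```python
# Illustrative data (an assumption for this sketch): y rises roughly linearly with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def r_squared(y, y_pred):
    """Goodness of fit: 1 - SSE / SST, with the mean of y as the benchmark."""
    y_mean = sum(y) / len(y)
    sst = sum((yi - y_mean) ** 2 for yi in y)             # error of the mean baseline
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))  # error of the model
    return 1.0 - sse / sst


slope, intercept = fit_line(x, y)
y_line = [slope * xi + intercept for xi in x]

# Baseline "model": predict the mean of y for every sample.
y_mean_only = [sum(y) / len(y)] * len(y)

r2_line = r_squared(y, y_line)       # close to 1: the line captures the trend
r2_mean = r_squared(y, y_mean_only)  # 0: the baseline explains nothing beyond itself

print(f"R^2 of fitted line:   {r2_line:.4f}")
print(f"R^2 of mean baseline: {r2_mean:.4f}")
```

A model that does no better than the mean scores 0, and a model that follows the trend closely scores near 1, which is why the mean serves as the reference point.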
When calculating goodness of fit, we also need to consider degrees of freedom. Degrees of freedom are the number of values in a calculation that are free to vary independently. In linear regression with n samples, the total sum of squares around the mean of y has n - 1 degrees of freedom: estimating the mean from the data uses up one degree of freedom, because the deviations from the mean must sum to zero. Accounting for this lets us evaluate the model's fit more accurately.
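The sketch below illustrates the lost degree of freedom and one common way to account for it, the adjusted R-squared. The text above does not name this statistic, so treat it as an assumed illustration; the data, the predictions, and the helper name `adjusted_r_squared` are likewise hypothetical.

```python
# Illustrative observations and predictions from a one-predictor linear model
# (both lists are assumptions made for this sketch).
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_pred = [2.04, 4.03, 6.02, 8.01, 10.0]

n = len(y)
y_mean = sum(y) / n

# The deviations from the mean always sum to zero (up to rounding), so only
# n - 1 of them can vary freely: the mean has used up one degree of freedom.
deviation_sum = sum(yi - y_mean for yi in y)


def adjusted_r_squared(y, y_pred, num_predictors):
    """Adjusted R^2: each sum of squares is divided by its degrees of freedom."""
    n = len(y)
    y_mean = sum(y) / n
    sst = sum((yi - y_mean) ** 2 for yi in y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    df_total = n - 1                      # one lost to estimating the mean
    df_residual = n - num_predictors - 1  # one more lost per fitted slope
    return 1.0 - (sse / df_residual) / (sst / df_total)


adj_r2 = adjusted_r_squared(y, y_pred, num_predictors=1)

print(f"sum of deviations from the mean: {deviation_sum:.2e}")
print(f"adjusted R^2: {adj_r2:.4f}")
```

Dividing by degrees of freedom rather than by the raw sample count keeps the measure from looking better simply because the model has more parameters relative to the number of samples.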
Introducing the mean of y is therefore a necessary step in calculating goodness of fit. It serves as the benchmark for the sum of squared errors, helps us see how much structure the data contain, and lets us evaluate the model's fit more accurately.