Method of Least Squares

Consider the points (x1, y1), (x2, y2), (x3, y3), ..., (xn, yn) shown on the graph.

The blue line has the equation y = m x + b

We will find m and b so that the total squared error E, the sum of the squared vertical distances from the points to the line, is least.

E = (m*x1 + b - y1)^2 + (m*x2 + b - y2)^2 + (m*x3 + b - y3)^2 + ...
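As a minimal sketch, the error E can be evaluated in Python for any candidate line. The function name squared_error and the sample points are made up for illustration; they are not part of the original notes.

```python
def squared_error(m, b, xs, ys):
    """Sum of squared vertical distances between the line y = m*x + b and the points."""
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample points that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(squared_error(2.0, 1.0, xs, ys))  # the exact line: prints 0.0
print(squared_error(1.0, 1.0, xs, ys))  # a worse line: prints 14.0
```

A line that misses the points gives a larger E, which is why minimizing E selects the best-fitting m and b.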


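At the minimum, the partial derivatives of E with respect to m and with respect to b must both be zero. We must have:

2(m*x1 + b - y1)*x1 + 2(m*x2 + b - y2)*x2 + 2(m*x3 + b - y3)*x3 + ... = 0

and

2(m*x1 + b - y1) + 2(m*x2 + b - y2) + 2(m*x3 + b - y3) + ... = 0


where we used the chain rule. Dividing by 2 and collecting the terms in m and b, the equations can be written as: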

m (x1*x1 + x2*x2 + x3*x3 + ...) + b (x1 + x2 + x3 + ...) = x1*y1 + x2*y2 + x3*y3 + ...

m (x1 + x2 + x3 + ...) + b (1 + 1 + 1 + ...) = y1 + y2 + y3 + ...
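The two conditions can be checked numerically. The sketch below evaluates the derivative of E with respect to m and with respect to b for a given line; the function name error_gradient and the sample points are assumptions made for illustration.

```python
def error_gradient(m, b, xs, ys):
    """Return (dE/dm, dE/db) for E = sum of (m*x + b - y)^2."""
    dE_dm = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys))
    dE_db = sum(2 * (m * x + b - y) for x, y in zip(xs, ys))
    return dE_dm, dE_db


# Hypothetical sample points that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(error_gradient(2.0, 1.0, xs, ys))  # at the minimum: prints (0.0, 0.0)
print(error_gradient(1.0, 1.0, xs, ys))  # away from it: prints (-28.0, -12.0)
```

Both derivatives vanish only at the best-fitting line; anywhere else at least one of them is nonzero.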


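Using the summation notation, with each sum running over i = 1 to n, we can write:

m * Σ(xi*xi) + b * Σ(xi) = Σ(xi*yi)

m * Σ(xi) + b * n = Σ(yi)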

Solving for m we get:

m = ( n * Σ(xi*yi) - Σ(xi) * Σ(yi) ) / ( n * Σ(xi*xi) - Σ(xi) * Σ(xi) )
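Solving the two equations for m and then b can be sketched in Python as follows. This is an illustration, not the book's method: the function name fit_line, the guard against identical x values, and the sample data are all assumptions. The intercept b is obtained from the second equation once m is known.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b from the sums over the data."""
    n = len(xs)
    sx = sum(xs)                                # sum of xi
    sy = sum(ys)                                # sum of yi
    sxx = sum(x * x for x in xs)                # sum of xi*xi
    sxy = sum(x * y for x, y in zip(xs, ys))    # sum of xi*yi

    denom = n * sxx - sx * sx
    if denom == 0:
        # All x values are equal, so no unique slope exists.
        raise ValueError("x values must not all be equal")

    m = (n * sxy - sx * sy) / denom
    b = (sy - m * sx) / n    # from m * sum(xi) + b * n = sum(yi)
    return m, b


# Hypothetical sample data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
m, b = fit_line(xs, ys)
print(m, b)  # roughly m = 1.9 and b = 0
```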

If you can show that this is the same equation as the one in the book you will get 10 points added to your score of 1000.

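The correlation coefficient (using the same notation as above) is:

r = ( n * Σ(xi*yi) - Σ(xi) * Σ(yi) ) / { ( n * Σ(xi*xi) - Σ(xi) * Σ(xi) )^0.5 * ( n * Σ(yi*yi) - Σ(yi) * Σ(yi) )^0.5 }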

If we use the notation

< x > for x average

< y > for y average

< xy > for the average of the product

< xx > for the average of squares of x

< yy > for the average of squares of y


then we have:

m = ( < xy > - < x > < y > ) / ( < xx > - < x > < x > )

< y > = m < x > + b

and

r = ( < xy > - < x > < y > ) / { ( < xx > - < x > < x > )^0.5 * ( < yy > - < y > < y > )^0.5 }
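The average-based formulas translate almost directly into code. The following is a minimal sketch; the names mean and slope_intercept_r, the zero-spread check, and the sample data are assumptions chosen for illustration.

```python
import math


def mean(values):
    """Plain arithmetic average."""
    return sum(values) / len(values)


def slope_intercept_r(xs, ys):
    """Return (m, b, r) using the averages <x>, <y>, <xy>, <xx>, <yy>."""
    ax = mean(xs)                                   # < x >
    ay = mean(ys)                                   # < y >
    axy = mean([x * y for x, y in zip(xs, ys)])     # < xy >
    axx = mean([x * x for x in xs])                 # < xx >
    ayy = mean([y * y for y in ys])                 # < yy >

    cov = axy - ax * ay      # < xy > - < x > < y >
    var_x = axx - ax * ax    # < xx > - < x > < x >
    var_y = ayy - ay * ay    # < yy > - < y > < y >
    if var_x == 0 or var_y == 0:
        # With no spread in x or in y the formulas divide by zero.
        raise ValueError("x and y must each take more than one value")

    m = cov / var_x
    b = ay - m * ax          # from < y > = m < x > + b
    r = cov / (math.sqrt(var_x) * math.sqrt(var_y))
    return m, b, r


# Hypothetical points lying exactly on y = 2x + 1, so r should be close to 1.
m, b, r = slope_intercept_r([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(m, b, r)
```

Computing the slope through averages gives the same value as the sum-based formula, since numerator and denominator are both divided by n squared.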