next home previous table of contents chap 3 toc

3.3.4 Linear Least Squares

In the method of least squares (also referred to as the best l2 fit), a curve is found that fits the data in such a way that the sum of the squares of the error at each point is minimized.

Example 1: Fit of the Data with a Linear Function.

Consider again the SAT I data in Table 3.7.


Income Level 1 2 3 4 5 6 7 8 9 10
Average SAT I Score 444 458 478 493 506 516 523 531 543 571

Table 3.7

We want to find the linear function f(x) of the form

f(x) = c0 + c1x (3.5)

that best represents the relationship between x (Income Level) and y(Average SAT I Score) for this set of data, according to the method of least squares. In order to find this line, we must find the optimal values for c0 and c1. To do this, we first create a system of linear equations by substituting each SAT data point from Table 3.7 into (3.5). The system of equations is

system of equations (3.6)

Each data point provides one of the equations in the system. Note that we have ten equations and two unknowns. Systems with more equations than unknowns are called overdetermined; they are not expected to have solutions. The least-squares solution of the overdetermined system (3.6) is the value (c0*,c1*) that gives the smallest value for the error measure


We say that (c0*,c1*) solves problem (3.6) “in the least-squares sense.”

The linear system (3.6) can be written in matrix form as

Xc = y (3.7)

where

X, c and y

If we multiply the system

Xc = y

on both sides by XT, we get

XTXc = XTy.

Since the matrix X is 10 x 2, its transpose, XT, is 2 x 10. Hence the matrix XTX is a 2 x 2 square matrix.

The equations represented by

XTXc = XTy (3.8)

are called the normal equations.

Computational scientists faced with solving a new type of problem frequently try to convert it into the form of a problem they already know how to solve. However, they must always verify that the two problem formulations have the same solutions. Although the proof is beyond the scope of this text, it can be shown that a least squares solution of the overdetermined linear system (3.6) must satisfy the normal equations (3.8). Conversely, any solution of the normal equations is a least squares solution of the overdetermined linear system (3.6).

For the problem at hand, the normal equations are

normal equations


(3.9)

We can solve for c0 and c1 using Gaussian elimination. The coefficients that solve problem (3.9) are

coefficients

These coefficients solve problem (3.6) in the least squares sense and specify f(x) = 436.2 + 12.7455 x as the line of best fit. Figure 3.5 shows the graph of this line as well as the original data points. If we evaluate this function at the values of x (income level) given in Table 3.7, then we see that the values calculated do not match the SAT scores from Table 3.7 exactly.


Figure 3.5 Best l2 fit for data points.


Look at the data points in Figure 3.5. If we were to follow the data in a connect-the-dot fashion, we would see curvature. This leads us to wonder if a straight line is the best function to use to fit this data. Perhaps a parabola or cubic function would follow the curve of the data more closely. In the next section, we will investigate the application of the least squares technique to the fitting of nonlinear curves.

Example 2: Fitting the Data with a Nonlinear Function.

Consider again the SAT I data.

Income Level 1 2 3 4 5 6 7 8 9 10
Average SAT Score 427 449 476 493 505 514 520 527 539 559

Table 3.8


Figure 3.6 Income Level vs. Average SAT I Math Score


We will again attempt to fit this data, but this time we will use a parabola. The general equation for a parabola is

y = c0 + c1x + c2x2 .

First we substitute each of the SAT data points into the equation for a parabola and obtain the following overdetermined (ten equations; three unknowns) system of linear equations.

overdetermined system of linear equations (3.10)

Notice that the problem we have stated is linear, even though we are fitting data based on a nonlinear model. The unknowns in our problem are the ci, and they appear linearly. Since the problem is linear, we call the technique “linear least squares.”

In matrix form, the system is Xc = y, where

X, c and y

The normal equations are XTXc = XTy, or

normal equations


normal equations

which simplifies to

normal equation

Using Gaussian elimination and back substitution as in the previous example, we find the solution of the normal equations to be

solution

Our least squares parabola is f(x) = -0.2008x2 + 14.9538x + 431.7833. The graph of this parabola is shown in Figure 3.7.

Linear Least Squares in the General Case

The method of linear least squares can be used any time the relationship among data points is known to have the following functional form:


f(x)=c0f0(x) + c1f1(x) + c2f2(x) …+ cnfn(x)
(3.11)

Each ci ( i = 1,...,n) is an unknown coefficient that is to be determined, and each fi ( i = 1,...,n) is some function. The functions fi need not be linear. We will solve for the values of the ci by setting up a least squares problem in which each equation is obtained by evaluating (3.11) at a different point xj. Once we choose the value of xj, each term fi(xj) is a constant.

In our first example, we predicted that the relationship between the data points could be described by the function f(x) = c0 + c1x. This is a specific example of (3.11) using n = 1, f0(x) = 1, and f1(x) = x. We determined the value of f1(x) by substituting into it the x values from the SAT I data points. In the second example, we described the relationship between the SAT I data points by the function f(x) = c0 + c1x + c2x2. This is a specific example of (3.11) using n = 2, f0(x) = 1, f1(x) = x, and f2(x) = x2. If we fit the same data using a cubic (n = 3, f0(x) = 1, f1(x) = x, f2(x) = x2, and f3(x) = x3), then we get the curve shown in Figure 3.8.

Figure 3.7 Parabola of best fit to the SAT I data points.

 

Table 3.3.4 summarizes the results of fitting the SAT data with a linear function, a quadratic, and a cubic.


Degree Function Used to Fit the Data Sum of
Squared Errors
1 428.8          + 13.109091x
1023.9
2 409.966667 + 22.525758x - 0.856061x2
 863.7210
3 384.066667 + 45.497669x - 5.836830x2 + 0.301865x3
 593.8309

next home previous go to the top table of contents chap 3 toc

Send comments on material to Richard Tapia

These pages are maintained by Hilena Vargas (hvargas@rice.edu)
Updated: December 17, 2003

 Copyright © 2001 Richard Tapia and Cynthia Lanius