|
In the method of least squares (also referred to as the best l2 fit), a curve is found that fits the data in such a way that the sum of the squares of the error at each point is minimized.
Example 1: Fit of the Data with a Linear Function.Consider again the SAT I data in Table 3.7.
We want to find the linear function f(x) of the form
that best represents the relationship between x (Income Level) and y(Average SAT I Score) for this set of data, according to the method of least squares. In order to find this line, we must find the optimal values for c0 and c1. To do this, we first create a system of linear equations by substituting each SAT data point from Table 3.7 into (3.5). The system of equations is
Each data point provides one of the equations in the system. Note that we have ten equations and two unknowns. Systems with more equations than unknowns are called overdetermined; they are not expected to have solutions. The least-squares solution of the overdetermined system (3.6) is the value (c0*,c1*) that gives the smallest value for the error measure We say that (c0*,c1*) solves problem (3.6) in the least-squares sense. The linear system (3.6) can be written in matrix form as
where
If we multiply the system
Xc = y
on both sides by XT, we get XTXc = XTy.
Since the matrix X is 10 x 2, its transpose, XT, is 2 x 10. Hence the matrix XTX is a 2 x 2 square matrix.
The equations represented by
are called the normal equations. Computational scientists faced with solving a new type of problem frequently try to convert it into the form of a problem they already know how to solve. However, they must always verify that the two problem formulations have the same solutions. Although the proof is beyond the scope of this text, it can be shown that a least squares solution of the overdetermined linear system (3.6) must satisfy the normal equations (3.8). Conversely, any solution of the normal equations is a least squares solution of the overdetermined linear system (3.6).
For the problem at hand, the normal equations are
We can solve for c0 and c1 using Gaussian elimination. The coefficients that solve problem (3.9) are These coefficients solve problem (3.6) in the least squares sense and specify f(x) = 436.2 + 12.7455 x as the line of best fit. Figure 3.5 shows the graph of this line as well as the original data points. If we evaluate this function at the values of x (income level) given in Table 3.7, then we see that the values calculated do not match the SAT scores from Table 3.7 exactly.
Look at the data points in Figure 3.5. If we were to follow the data in a connect-the-dot fashion, we would see curvature. This leads us to wonder if a straight line is the best function to use to fit this data. Perhaps a parabola or cubic function would follow the curve of the data more closely. In the next section, we will investigate the application of the least squares technique to the fitting of nonlinear curves. Example 2: Fitting the Data with a Nonlinear Function.
Consider again the SAT I data.
We will again attempt to fit this data, but this time we will use
a parabola.
The general equation for a parabola is
y = c0 + c1x
+ c2x2 .
First we substitute each of the SAT data points into the equation
for a parabola and obtain the following overdetermined (ten equations;
three unknowns) system of linear equations.
Notice that the problem we have stated is linear, even though we are fitting data based on a nonlinear model. The unknowns in our problem are the ci, and they appear linearly. Since the problem is linear, we call the technique linear least squares.
In matrix form, the system is Xc = y, where
The normal equations are XTXc = XTy,
or which simplifies to Using Gaussian elimination and back substitution as in the previous example, we find the solution of the normal equations to be Our least squares parabola is f(x) = -0.2008x2 + 14.9538x + 431.7833. The graph of this parabola is shown in Figure 3.7. Linear Least Squares in the General CaseThe method of linear least squares can be used any time the relationship among data points is known to have the following functional form:
Each ci ( i = 1,...,n) is an unknown coefficient that is to be determined, and each fi ( i = 1,...,n) is some function. The functions fi need not be linear. We will solve for the values of the ci by setting up a least squares problem in which each equation is obtained by evaluating (3.11) at a different point xj. Once we choose the value of xj, each term fi(xj) is a constant. In our first example, we predicted that the relationship between the data points could be described by the function f(x) = c0 + c1x. This is a specific example of (3.11) using n = 1, f0(x) = 1, and f1(x) = x. We determined the value of f1(x) by substituting into it the x values from the SAT I data points. In the second example, we described the relationship between the SAT I data points by the function f(x) = c0 + c1x + c2x2. This is a specific example of (3.11) using n = 2, f0(x) = 1, f1(x) = x, and f2(x) = x2. If we fit the same data using a cubic (n = 3, f0(x) = 1, f1(x) = x, f2(x) = x2, and f3(x) = x3), then we get the curve shown in Figure 3.8.
Table 3.3.4 summarizes the results of fitting the SAT data with a linear function, a quadratic, and a cubic.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Send comments on material to Richard Tapia
These pages are maintained by Hilena
Vargas (hvargas@rice.edu)
Updated: December 17, 2003
Copyright © 2001 Richard Tapia and Cynthia Lanius