next home previous table of contents chap 3 toc

3.3.5 Data Fitting Using Other Measures of Aggregate Error

In Section 3.3.4, we focused on the method of least squares. This method uses the sum of the squares of the individual errors at the data points as a definition of aggregate error and attempts to make this total error as small as possible. Now we will consider some other definitions of aggregate error: the largest individual error and the sum of the errors.

Figure 3.8 Cubic of best fit through SAT I data points.

Suppose that a function f(x) is used to fit n data points, and let (xi, yi) be one of these points. Then the error at this point is given by ei = |yi-f(xi)|. The line that minimizes sum_{i = 1}^{n} e_i, the sum of the individual errors, is called the best l1 fit. The line that minimizes sum_{i = 1}^{n} e_{i}^2, the sum of the squares of the individual errors, is called the best l2 fit. This is the least squares fit that we discussed earlier. The line that minimizes max{e1, e2, …,en}, the largest of the individual errors, is called the best l_infinity fit or best minmax fit.

In general, we define the best lp fit to be the function that minimizes sum_{i = 1}^{n} e_{i}^p. The different ways of defining aggregate error can lead to different “best” lines. Each has its own basic characteristics.

Consider the data points in Figure 3.9. All but one of the points seem to lie in a straight line; one of the points is far away from the others. This point is commonly referred to as an ``outlier.'' These points may be part of the graph of a function with a spike in one area. If so, a straight line will not do a very good job of fitting the function. Another possibility is that the faraway point reflects an error in measurement or transcribing of the data.

scatter plot

Figure 3.9 Illlustrative scatter plot

Figure 3.10 shows the lines resulting from performing the best l1, l2, and l_infinity fits of the data.

Figure 3.10 Representation of various lines of best fit.


The best l1 fit ignores the outlier and defines a line passing through the remaining points. This is one of the chief characteristics of an l1 fit. It is not heavily influenced by isolated points far from the main grouping of the data. If the isolated point is really in error, then a fit that ignores it is just what we want. However, if the isolated point is a good value, then we may have thrown away some important information.

The best l_infinity fit tries to make the maximum individual error as small as possible. In our case, it determines a line located midway between the row of lined-up data points and the isolated point. The best l2 fit places the line between these two extremes, but much closer to the grouped points.

Unless there is some particularly compelling reason to choose a different type of fit, computational scientists use a least squares fit. This is the only line of best fit whose coefficients can be computed by solving a linear system of equations, and linear systems can be solved efficiently. The other types of fit involve computational difficulties that make them less attractive.

Suppose that we are trying to find the constant function f(x) = c that best fits a set of data points, y1, y2, …, yn . We will do this by finding a value c* that makes as small as possible. (We can include the l_infinity case in here by agreeing that means max{|y1 - c|, |y2 - c|, …, |yn - c|}.) For now, we will assume that the value of p has been selected, but we won't specify what it is. To determine c*, we could plot as a function of c and find the lowest point, or minimum. Methods for determining minima often look for points at which the line tangent to the function has slope zero (that is, places where the function has derivative zero). This type of method is not appropriate for , since E_infinity is not a continuous function of c. E1 is a continuous function of c, but its graph has sharp corners for values of c that coincide with any of the data values. The slope of the tangent line at these points is not defined. So the cases p = 1 and p = infinity will have to be handled by other types of methods. For the other cases, let's let gp be the function defined by gp(c) = the slope of Ep at c. For the case p = 2 (least squares), g2 is a line. Finding the value of c for which g2(c) = 0 is easy. For higher values of p, gp is a nonlinear function, and it will be much harder to find a point at which gp(c) is zero (see Chapter 4).

For problems that require us to determine more than one coefficient, using the least squares error function allows us to determine the minimum error by solving a linear system (the normal equations). Using Ep, with p > 2, leads to a nonlinear system of equations, which is significantly more difficult to solve.

next home previous go to the top table of contents chap 3 toc

Send comments on material to Cynthia Lanius

These pages are maintained by Hilena Vargas (hvargas@rice.edu)
Updated: March 1, 2001

 Copyright © 2001 Richard Tapia and Cynthia Lanius