|
We will assume that ATA is invertible. This is a reasonable assumption
because there are more equations than unknowns. The length of the vector x is defined to be
so the length of the vector squared is
The square of the length of the vector Ax-b is the sum of the
squares of the errors since each element of Ax - b is the error at a point.
We want to minimize this quantity. We claim that the solution to the normal
equations, which we will denote as x* = (ATA)-1ATb, minimizes the sum of the squares of the errors.
We claim, and will prove, that
||Ax - b||2 = ||Ax* - b||2 + ||A (x - x*)||2.
Since we are minimizing over x, ||Ax - b||2 will be minimized when x = x* so that ||A (x - x*)||2 is zero. This is true because ||A (x - x*)||2 can never be negative.
To prove the claim, we will work with the right side of the equation and
prove that it is equal to the left side. These equations are numbered. There
are short explanations of these steps following the proof.
| |
= |
||Ax* - b||2 + ||A (x - x*)||2
|
(1) |
| |
= |
(Ax* - b)T (Ax* - b) + (A(x - x*))T A(x - x*)
|
(2) |
| |
= |
(A (ATA)-1ATb - b)T (A (ATA)-1ATb - b)T +
|
|
| |
|
(A(x - (ATA)-1ATb))T A(x - (ATA)-1ATb)
|
(3) |
| |
= |
(A (ATA)-1 ATb - b)T (A (ATA)-1 ATb - b) +
|
|
| |
|
(Ax - A (ATA)-1 ATb)T (Ax - A (ATA)-1 ATb)
|
(4) |
| |
= |
(bTA (ATA)-1 AT - bT) (A (ATA)-1 ATb - b) +
|
|
| |
|
(xTAT - bTA (ATA)-1 AT)(Ax - A (ATA)-1 ATb)
|
(5) |
| |
= |
(bTA (ATA)-1 ATA (ATA)-1ATb - 2bTA (ATA)-1ATb + bTb +
|
|
| |
|
(xTATAx - 2bTA (ATA)-1 ATAx +
bTA (ATA)-1 ATA (ATA)-1ATb)
|
(6) |
| |
= |
(bTA (ATA)-1 ATb - 2bTA (ATA)-1 ATb + bTb) +
|
|
| |
|
(xTATAx - 2bTAx + bTA (ATA)-1 ATb)
|
(7) |
| |
= |
bTb + xTATAx - 2bTAx
|
(8) |
Now, if we look at the left side of the equation, we can see that it equals
the right side because
||Ax - b||2 = (Ax - b)T (Ax - b) = (xTAT - bT) (Ax - b) = xTATAx - 2bTAx + bTb.
This proves that x = x* = (ATA)-1ATb
uniquely minimizes
the sum of the squares of the error for the equation Ax - b.
The following are comments on the steps of the proof:
- This is the right hand side of the equation from the claim.
- ||y||2 = yTy
- Replace x* with
(ATA)-1ATb.
- Distribute A in the second term.
- (M + N)T = MT + NT, (RS)T = STR T, and (A-1)T = (AT)-1
- Because each term is a vector and xTy = yTx, which can also be
written as
,
we can multiply in a manner similar to polynomial multiplication. However,
we must remember that matrix multiplication is not commutative, so the order
of multiplication of the matrices matters.
- (ATA)-1(ATA) = I
- Combine like terms.
|