Least-Squares Fitting


The detail of the mathematics given in this page is to be treated as extra-curricular material.

Least-Squares Fitting

Least-squares fitting is an essential element of structure refinement, so some understanding of the concepts involved is needed for a full appreciation of the structure refinement process. However, you do not need a detailed understanding of the mathematics unless you wish to write your own computer programs for least-squares fitting.

Linear Least-Squares

Before considering the complexities of crystallographic least-squares refinement, it is instructive to consider the simple case of linear least-squares for the two-parameter straight-line function y = m x + c. For the case of just two observed data points (x1,y1) and (x2,y2), we have two equations and two unknowns, m and c (see figure A). The equations are:

y1 = m x1 + c
y2 = m x2 + c

which may be written in matrix form as:

\[ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \end{pmatrix} \begin{pmatrix} m \\ c \end{pmatrix} \]

The parameters m and c are obtained by inverting the 2 × 2 square matrix and multiplying it by the column matrix of y values as shown below:

\[ \begin{pmatrix} m \\ c \end{pmatrix} = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \end{pmatrix}^{-1} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \]

This demonstrates that a simple matrix inversion method can be used to determine values for the two unknown parameters, m and c. The 2 × 2 square matrix used here is an example of a design matrix.
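
As a small numerical sketch of this two-point case, the inversion can be carried out directly (the data values are hypothetical, and NumPy is assumed to be available):

```python
import numpy as np

# Two hypothetical observed points; with two equations and two
# unknowns, the fitted line passes exactly through both points.
x1, y1 = 1.0, 3.0
x2, y2 = 4.0, 9.0

# Design matrix and column of observations, as in the matrix equation above.
A = np.array([[x1, 1.0],
              [x2, 1.0]])
y = np.array([y1, y2])

# Invert the 2 x 2 design matrix and multiply by the observations.
m, c = np.linalg.inv(A) @ y
print(m, c)  # slope and intercept of the line through both points
```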

Now suppose we have more observations than parameters as shown in figure B below:

[Figure A: a straight line through 2 points. Figure B: 5 points scattered about a straight line.]

We can now write 5 equations of the type y = m x + c (as shown below in matrix form), but they cannot all be satisfied simultaneously with the same values for the unknown parameters, m and c. What we are saying, in effect, is that the points do not all lie on the straight line in figure B.

\[ \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{pmatrix} = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ x_3 & 1 \\ x_4 & 1 \\ x_5 & 1 \end{pmatrix} \begin{pmatrix} m \\ c \end{pmatrix} \]

Any pair of the above equations can yield values for m and c, but the values will depend on which pair of equations is picked, i.e. the values of m and c will not be unique. We might then ask: what are the best/optimum values for m and c? (Note that this makes the tacit assumption that a straight-line function provides an appropriate model for the data.) The problem is solved by defining the best values as those for which the sum of the squares of the differences y(obs) − y(calc) is a minimum: hence the expression least-squares minimization. For our straight-line function, this involves minimization of:

\[ S = \sum_i \left( y_i - (m x_i + c) \right)^2 \]
The function S has a minimum when the partial derivative with respect to each of the parameters p_j is zero, i.e. when

\[ \left( \frac{\partial S}{\partial p_j} \right)_{p_k} = 0 \]
For the straight-line function, S is a minimum when:

\[ \left( \frac{\partial S}{\partial m} \right)_c = 2m \sum_i x_i^2 + 2c \sum_i x_i - 2 \sum_i x_i y_i = 0 \]

\[ \left( \frac{\partial S}{\partial c} \right)_m = 2m \sum_i x_i + 2c \sum_i 1 - 2 \sum_i y_i = 0 \]

These two linear equations with two unknown parameters may be solved to give the "best" values for m and c directly. The approach may be generalized to the fitting of M parameters with N observations (where M < N) as long as the function is linear in the M parameters.
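
As a sketch with hypothetical data (NumPy assumed), the two normal equations can be assembled from the sums above and solved directly, and the answer checked against NumPy's own least-squares routine:

```python
import numpy as np

# Five hypothetical data points scattered about a straight line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# The two normal equations, written directly from dS/dm = 0 and dS/dc = 0:
#   m*sum(x_i^2) + c*sum(x_i) = sum(x_i*y_i)
#   m*sum(x_i)   + c*sum(1)   = sum(y_i)
A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    float(len(x))]])
b = np.array([np.sum(x * y), np.sum(y)])
m, c = np.linalg.solve(A, b)

# Cross-check against NumPy's built-in least-squares solver.
(m_ref, c_ref), *_ = np.linalg.lstsq(np.column_stack([x, np.ones_like(x)]),
                                     y, rcond=None)
```

Both routes give the same "best" m and c because the function is linear in the parameters: no iteration is needed.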

Iterative Methods

Crystallographic least-squares refinements require a quite different approach, because the functions, whether they be for unit-cell or structure refinement, are very non-linear in all (or nearly all) of the parameters. In consequence, a non-linear least-squares refinement procedure must be used, and for this a numerical iteration process is required. Before explaining non-linear least-squares, a short digression on the subject of iteration is given.

Suppose that we want to know a solution to the equation ƒ(x) = 0, where the function ƒ(x) = y is shown (in magenta) in the figure below:

[Figure: a non-linear function y = ƒ(x) shown in magenta, with successive estimates of the root marked in red, green, and orange.]

We might guess that one solution to the equation corresponds to the x value of the point shown in red. Newton's method gives an improved value for the solution to the equation using the gradient, dƒ/dx, at this point:

\[ x_2 = x_1 - \frac{f(x_1)}{\left( \frac{df}{dx} \right)_{x_1}} \]

Using this equation, the x value of the point shown in green is obtained. The process can be repeated yet again, and it should be obvious from the figure that the next iteration will give a value of x (indicated by the third point in orange) that is very close to one solution of the equation ƒ(x) = 0.
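
A minimal sketch of the iteration, using an illustrative cubic ƒ(x) = x³ − 2x − 5 (an assumed example, not the function shown in the figure):

```python
# Newton's method for f(x) = 0, applied to an illustrative cubic.
def f(x):
    return x**3 - 2*x - 5

def df(x):
    return 3*x**2 - 2     # gradient df/dx

x = 2.0                   # initial guess (the "red" point)
for _ in range(6):
    x = x - f(x) / df(x)  # Newton step: x2 = x1 - f(x1)/(df/dx)

# x is now very close to the single real root, approximately 2.0946
```

Each cycle roughly doubles the number of correct digits, so only a handful of iterations are needed once the estimate is close to the root.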

When considering crystallographic refinements it is important to understand the necessity of iteration: in general several (or sometimes many) refinement cycles are required to obtain best-fit parameter values, in contrast to the linear least-squares process described earlier, where the results are obtained directly and without the requirement for initial values of the parameters.

The figure also illustrates another point. Suppose that the initial value chosen for x was negative: the procedure would still converge, but to a different value of x. In this instance, the alternative value for x is an equally valid solution to the equation ƒ(x) = 0. However, the functions used in crystallography are not as simple as the one shown here, and poor initial values, e.g. a poor initial crystallographic model, can result in the iterative least-squares refinement converging to a false solution, often called a local minimum. Crystallographers need to be able to spot local minima and escape from them in order to find the global minimum.

Non-linear Least-Squares

It was stated earlier that the function we want to minimize in the Rietveld method is

\[ \Delta = \sum_i w_i \left\{ y_i(\mathrm{obs}) - y_i(\mathrm{calc}) \right\}^2 \]
where the calculated function is non-linear in terms of the various parameters, pj. As with linear least-squares, this function will be at a minimum when the partial derivative with respect to each parameter is zero, i.e.

\[ \left( \frac{\partial \Delta}{\partial p_j} \right) = \sum_{i=1}^{N} 2 w_i \left( y_i(\mathrm{obs}) - y_i(\mathrm{calc}) \right) \left( \frac{\partial y_i}{\partial p_j} \right) = 0 \qquad [1] \]

where N is the total number of observations i. If there are M parameters p_j, then we have M equations of this type. Suppose now that we have a crystallographic model that is close to satisfying the above equations, but whose parameters differ from those of the true minimum by small amounts δp_j.

The value of a function at x + Δx given its value at x can be calculated using the Taylor series:

\[ f(x + \Delta x) = f(x) + \Delta x \, \frac{df(x)}{dx} + \frac{(\Delta x)^2}{2!} \, \frac{d^2 f(x)}{dx^2} + \frac{(\Delta x)^3}{3!} \, \frac{d^3 f(x)}{dx^3} + \cdots \]

As the difference Δx tends to zero, the higher terms (Δx)^n in this expansion may be ignored for n > 1.
It is upon this assumption that the whole process of non-linear least-squares refinement is based. If the initial crystallographic model is too approximate, then the assumptions made in the least-squares procedure will not be valid. As a result of this approximation, all crystallographic refinements are of an iterative nature.

Applying the Taylor expansion to our multi-parameter function y(calc) gives:

\[ y_i(\mathrm{calc}) = y_i(p_1 \ldots p_M) + \sum_{k=1}^{M} \delta p_k \left( \frac{\partial y_i}{\partial p_k} \right) \qquad [2] \]

Substituting y(calc) from equation [2] into equation [1] (and rearranging) results in the jth of the so-called normal equations:

\[ \sum_{i=1}^{N} \sum_{k=1}^{M} w_i \, \delta p_k \left( \frac{\partial y_i}{\partial p_k} \right) \left( \frac{\partial y_i}{\partial p_j} \right) = \sum_{i=1}^{N} w_i \left( y_i(\mathrm{obs}) - y_i(p_1 \ldots p_M) \right) \left( \frac{\partial y_i}{\partial p_j} \right) \]

These are more easily visualized in the matrix form A × S = V,

where A is an M × M symmetric square matrix and S and V are M × 1 column matrices. The column matrix S contains the shifts δpj that will be added to the parameters as part of the least-squares refinement procedure. In order to obtain the matrix S, it is necessary to invert the square matrix A, i.e. S = A−1 × V. The elements of the matrix A are the sum of the products of the partial derivatives:

\[ A(j,k) = \sum_{i=1}^{N} w_i \left( \frac{\partial y_i}{\partial p_j} \right) \left( \frac{\partial y_i}{\partial p_k} \right) \]

while the elements of the matrix V contain information regarding the observed data:

\[ V(j) = \sum_{i=1}^{N} w_i \left( y_i(\mathrm{obs}) - y_i(p_1 \ldots p_M) \right) \left( \frac{\partial y_i}{\partial p_j} \right) \]

From this we can see that if there are problems in inverting the matrix A, then it is our crystallographic model that is at fault and not the data. Furthermore, if the model is good (so that the matrix inverts correctly), but the data are poor, then the calculated shifts to the parameters will be ill-defined.
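
The whole cycle (build A and V from the partial derivatives, solve for the shifts S = A^-1 V, apply them, and repeat) can be sketched for a hypothetical non-linear model y = a·exp(bx); the model, data, and unit weights here are illustrative assumptions, not a crystallographic function, and NumPy is assumed:

```python
import numpy as np

# Hypothetical model and data: the calculated function
# y = a * exp(b * x) is non-linear in the parameters a and b.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
a_true, b_true = 2.0, 0.8
y_obs = a_true * np.exp(b_true * x)      # noise-free synthetic "observations"
w = np.ones_like(x)                      # unit weights w_i

a, b = 1.0, 1.0                          # approximate starting model
for cycle in range(30):                  # refinement cycles (iteration)
    y_calc = a * np.exp(b * x)
    # Partial derivatives of y_i(calc) with respect to each parameter:
    d = np.column_stack([np.exp(b * x),           # dy/da
                         a * x * np.exp(b * x)])  # dy/db
    A = d.T @ (w[:, None] * d)           # A(j,k) = sum_i w_i (dy_i/dp_j)(dy_i/dp_k)
    V = d.T @ (w * (y_obs - y_calc))     # V(j) = sum_i w_i (y_obs - y_calc)(dy_i/dp_j)
    shifts = np.linalg.solve(A, V)       # S = A^-1 V
    a, b = a + shifts[0], b + shifts[1]  # apply the shifts delta-p_j
```

With noise-free data and a reasonable starting model this loop converges to a = 2, b = 0.8 in a few cycles; with a poor starting model the same procedure can diverge or settle into a local minimum, as discussed above.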



© Copyright 1997-2006.  Birkbeck College, University of London. Author(s): Jeremy Karl Cockcroft