A review of ideas and methods

It is often difficult to work with non-linearity in data. In this work we use a basic procedure for handling it. When developing linear models we found a score vector t, t = Xw, that is in some sense good to work with. In Vol. 1 several methods are presented that show how we can find the weight vector w. In the non-linear extensions we work with the score vector t, powers of t, and their cross products with previously selected score vectors. The result of the analysis is schematically illustrated as follows:

X → T = [t1, t2, ...] → H = [t1, t1^2, t2, t2^2, t2·t1, t1^2·t2, ...]

The first score vector is found such that [t, t^2, t^3, t^4] is in some sense good to work with. In a regression context we can, e.g., maximize the term

|Y't|^2 + |Y't^2|^2 + |Y't^3|^2 + |Y't^4|^2

with respect to w. When w has been found, each term t^n is examined more closely. If it is not found significant, it is dropped from the analysis and the computation of w is revised. Suppose that only the first and second powers of t are significant. The resulting terms are t1 and t1^2. Then X is adjusted for this choice of score vector. At the next step we find t2 = Xw such that t2, the powers of t2, and their cross products with t1 and t1^2 are good to work with. When w has been found, each term is again studied closely and removed if not significant. This is necessary, because otherwise the number of terms increases very rapidly. The result of the analysis is the matrix H. The regression analysis is carried out by projecting Y onto H. If w is allowed to vary freely in the maximization tasks, we obtain a non-linear extension of PLS regression. If all coordinates of w are restricted to be zero except for one coordinate that is equal to one, we are finding variables such that polynomials in these variables are good to work with. If only linear terms are significant, the procedure reduces to different types of linear analysis (of PCA or of regression type). This approach is used within many areas of modelling. In this book we consider more closely the following areas:
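The expansion of a score vector into powers and cross products, and the criterion maximized above, can be sketched in a few lines. This is only an illustration under simplifying assumptions: the data are plain Python lists, the function names (`powers`, `cross_products`, `criterion`) are ours, and the search over w and the significance testing are omitted.

```python
# Sketch of the score-expansion step; names and data are illustrative only.

def powers(t, max_power=4):
    """Return the columns t, t^2, ..., t^max_power (element-wise powers)."""
    return [[x ** p for x in t] for p in range(1, max_power + 1)]

def cross_products(new_cols, kept_cols):
    """Element-wise products of each new column with each kept column."""
    return [[a * b for a, b in zip(c1, c2)]
            for c1 in new_cols for c2 in kept_cols]

def criterion(y, cols):
    """Sum of squared inner products |y' t^n|^2 over the candidate columns."""
    return sum(sum(yi * ci for yi, ci in zip(y, c)) ** 2 for c in cols)

# Toy example: a response that is a pure quadratic in the score vector t1.
t1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [x * x for x in t1]

cols = powers(t1, 4)
print(criterion(y, cols))  # the t^2 and t^4 columns carry all of the value
```

In the full procedure this criterion would be maximized over w, the insignificant columns dropped, and the surviving columns appended to H before the next score vector is sought.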

1) Finding a non-linear low-dimensional surface in a high-dimensional (variable) space

2) Estimation of parameters in non-linear functions

3) Simultaneous estimation of linear parameters and covariance matrix.

The advantage of the approach is that linear methods of dimension analysis, sensitivity analysis, variable selection and similar methods can be transferred to the non-linear analysis. This means that we have essentially the same tools for analysing non-linear data as for linear data. These methods are efficient when the non-linearity is smooth, because we are expecting a polynomial behaviour in the data. They generally perform better than traditional methods, partly because they take into account the reduced rank of the data and partly because the model variation is controlled. It is recommended not to use powers higher than the fourth, unless there is a clear curvature in the data that suggests higher-order terms. We shall now briefly look at the applications within the areas mentioned above.


Finding a non-linear low-dimensional surface in a high-dimensional (variable) space

We shall consider here the NIR data, see p. 321 in Vol. 1. There are 19 x-variables, which are reflections of NIR light at different places on the meat pieces. The y-variable is the fat content of the meat pieces. 25 pieces of meat have been measured. The figure shows a plot of the fat content versus the first PLS component.

It shows a clear curvature in the relationship between the fat content and the first PLS score vector. If we carry out a linear analysis, we find that three components are significant. The next figure shows the plot of estimated y-values versus the observed y-values.

There we do not detect any non-linearity. The fit is relatively good: 95.65% of the variation in the fat content is explained by the model. The third score vector, which is relatively small, smooths out the non-linearity. Should we use the linear model or try a non-linear one? If we know that all possible values of the fat content lie within the range of the present data, we can use the linear model. The fat content varies between 11.16% and 17.04%. But what if we may encounter a fat content of 20%? In that case we cannot expect the linear model to function well. There is a clear curvature in the data, and it is always risky to explain non-linearity by linear models with extra components. If we use the approach presented above, we find that the vectors t1, t1^2 and t2 are significant. If we experiment and leave out samples with fat content close to 17.04% or to 11.16%, we find that the non-linear model is more robust than the linear one. The non-linear model with the vectors t1, t1^2 and t2 explains 94.77% of the variation. It explains less variation than the linear model, but it takes the curvature in the data into account.

Estimation of parameters in non-linear functions

A frequent task is to estimate unknown parameters in a non-linear function. Consider the following sum of three Gaussian curves

y(x) = c1 exp(-(x - a1)^2/b1) + c2 exp(-(x - a2)^2/b2) + c3 exp(-(x - a3)^2/b3).

The function contains nine unknown parameters. It is often difficult to estimate the parameters if there is variation in the sample values. The next figure shows the sample values and a curve estimated by a traditional approach.
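For concreteness, the model above can be written out directly in code. The parameter values below are arbitrary illustrations chosen by us, not estimates from the data discussed here.

```python
import math

def gauss_sum(x, params):
    """y(x) = sum_i c_i * exp(-(x - a_i)^2 / b_i), params = [(a_i, b_i, c_i)]."""
    return sum(c * math.exp(-(x - a) ** 2 / b) for a, b, c in params)

# Three illustrative peaks: (centre a_i, width b_i, height c_i).
peaks = [(2.0, 0.5, 1.0), (5.0, 1.0, 0.8), (8.0, 0.7, 1.2)]
print(gauss_sum(2.0, peaks))  # at x = a_1 the first term contributes c_1 fully
```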

We see that there is a lot of 'noise' in the sample values. We also see that the first peak is not estimated satisfactorily. The problem here is that the estimated matrix X of derivatives is close to singular for the present data. Our approach takes appropriate care of the singularity of X. It gives us the following result.
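The near-singularity can be made concrete. The derivative of y with respect to each height c_i is exp(-(x - a_i)^2/b_i), so when two peak centres nearly coincide, the corresponding columns of the derivative matrix X are almost collinear and ordinary least squares becomes unstable. A small sketch (our own illustration, with arbitrary grid and parameter values) measures this collinearity by the cosine between two such columns:

```python
import math

# Illustration of near-singular Jacobian columns for overlapping Gaussians.

def gauss_col(xs, a, b):
    """Derivative column d y / d c for a peak with centre a and width b."""
    return [math.exp(-(x - a) ** 2 / b) for x in xs]

def cosine(u, v):
    d = sum(p * q for p, q in zip(u, v))
    return d / math.sqrt(sum(p * p for p in u) * sum(q * q for q in v))

xs = [0.1 * i for i in range(100)]
well_separated = cosine(gauss_col(xs, 2.0, 1.0), gauss_col(xs, 7.0, 1.0))
overlapping = cosine(gauss_col(xs, 4.9, 1.0), gauss_col(xs, 5.0, 1.0))
print(well_separated, overlapping)  # overlapping centres give cosine near 1
```

A cosine near one means the two columns carry nearly the same information, which is exactly the situation where a method that respects the reduced rank of X is needed.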

We see that all three peaks are correctly found by the procedure. All estimated parameter values are of approximately the correct size, and we can compute the statistical uncertainties of the parameter estimates. An important aspect of this approach is that we can identify parameters that are redundant.