Graphics in data analysis


1. Introduction. Graphic methods.  
2. Introductory analysis of data. 
3. Algorithms of the H-method.
4. Observed vs computed response values.
5. Graphic analysis of residuals.
6. Score space and vectors.
7. Loading vectors.
8. Weight vectors.
9. Loading weight vectors.
10. Plots along internal trend.
11. Regression coefficients.
12. Importance of variables/samples.
13. Cross-validation.
14. Modelling console.
15. Performance measures.
16. Bootstrapping and confidence intervals.
17. Blindfolding parts of data.
18. Subdivision of samples.
19. S
20. C
21. Guidelines for presentation of results.

An important aspect of the H-method is the possibility of graphically analysing the inherent variation of the latent structure. These graphic procedures are valid for both linear and non-linear modelling of data. Here the focus is on procedures that can, in principle, be used in every situation where the H-method has been applied.

When we start analysing data, we typically have some expectations about them. As the analysis proceeds, we learn about the special features of the data. We may find that the first part of the data differs from the last part or from recent samples. We may also detect an 'internal' trend in the data; for instance, data collected over one year may show different linear models for winter and summer. Often the data can be divided into parts defined by some of the variables, as happens in many experimental designs.

In such cases it is natural to formulate a maximal model and carry out statistical tests to reduce the model and find significant effects. This approach (which is suggested by program packages) is often not appropriate. There may be too few 'degrees of freedom' for the analysis of the residuals. Also, as is often seen with industrial data, the residuals of the full model are at the same level as the 'noise' of the measurement instruments. The H-method gives a more reliable analysis and, supplemented with graphic procedures, helps to secure the best possible model with respect to prediction ability. Significance testing becomes more reliable when it is carried out in a model that has predictive performance.

By experimenting with these different graphic procedures, the user gets to know the basic features of the data. This helps him or her to choose between different models.
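
The 'internal trend' example mentioned above, where winter and summer follow different linear models, can be made concrete with a small sketch. It is not the H-method itself: it uses ordinary least squares on synthetic data, and the season boundaries, coefficients, noise level, and the helper name `fit_line` are all assumptions chosen for illustration. It compares the residual spread of one pooled model with that of two seasonal models and with the assumed instrument noise.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit y ~ a + b*x; returns (a, b) and the residuals."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, y - A @ coef

rng = np.random.default_rng(0)
noise_sd = 0.1  # assumed instrument noise level (hypothetical)

# Synthetic data: one year of daily samples with a season-dependent model.
day = np.arange(365)
summer = (day >= 91) & (day < 274)  # assumed split, roughly April to September
x = rng.uniform(0.0, 10.0, size=day.size)
y = np.where(summer, 1.0 + 2.0 * x, 3.0 + 0.5 * x)
y = y + rng.normal(0.0, noise_sd, size=day.size)

# One model for the whole year versus one model per season.
_, res_pooled = fit_line(x, y)
coef_w, res_w = fit_line(x[~summer], y[~summer])
coef_s, res_s = fit_line(x[summer], y[summer])

# Residual standard deviations, corrected for the number of fitted parameters.
sd_pooled = np.sqrt(np.sum(res_pooled**2) / (day.size - 2))
sd_split = np.sqrt((np.sum(res_w**2) + np.sum(res_s**2)) / (day.size - 4))

print(f"pooled model residual sd:    {sd_pooled:.3f}")
print(f"seasonal models residual sd: {sd_split:.3f}")
print(f"assumed instrument noise sd: {noise_sd:.3f}")
```

In this constructed case the seasonal models bring the residuals down to roughly the level of the assumed instrument noise, while the pooled model does not. Plotting `res_pooled` against `day` would show the same thing graphically, as a systematic shift between the two seasons.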