Large-scale modelling and optimization procedures

 

A unified algorithmic approach to modelling and optimization

 

 

Resumé. Algorithms are presented that combine methods of regression analysis and optimization into a single unified framework.

 

 

The given data. We suppose here that there is given an N times K matrix X that contains the technical data. In a regression context the rows of X are often viewed as objects and the columns of X as variables. In an optimization context X can contain the technical specifications that are given in the concrete situation. The N times M matrix Y is, in a regression context, often viewed as the response matrix that we want to describe. In an optimization context it often represents the resources that are available. Typically, we want the solution matrix B not to exceed the available resources Y, which is formulated as XB ≤ Y. Finally, we have the L times K matrix C. In an optimization context it can represent the costs that we want to minimize. In a regression context it is typically an index of quality, environment or costs by which we want to judge our solution B. Schematically, the three matrices X, Y and C can be shown as follows; the schematic shows that the number of columns of X and C is the same, and likewise the number of rows of X and Y. In a regression context we usually do not have the matrix C. We are then typically only interested in finding the solutions B such that XB describes Y as well as possible.
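
As an illustration of the matrix shapes only, here is a minimal Python sketch with synthetic data; all dimensions and values are hypothetical:

```python
import numpy as np

# Hypothetical dimensions: N objects, K variables, M responses/resources, L cost indices.
N, K, M, L = 8, 5, 3, 2
rng = np.random.default_rng(0)

X = rng.normal(size=(N, K))   # technical data / specifications      (N x K)
Y = rng.normal(size=(N, M))   # responses / available resources      (N x M)
C = rng.normal(size=(L, K))   # cost, quality or environment indices (L x K)

# A candidate solution B is K x M; the resource requirement XB <= Y
# is an element-wise comparison.
B = rng.normal(size=(K, M))
feasible = bool(np.all(X @ B <= Y))
print(feasible)
```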

 

 

The decomposition of X. The matrix X is decomposed into rank one parts as follows,

 

(1)          X = λ1 t1 p1′ + λ2 t2 p2′ + ...

 


 

We call the set (λa, ta, pa) a component; it is the set that is selected at each step. The vectors (ta) are called the score vectors, (pa) the loading vectors and (λa) the scaling constants. These vectors are determined by certain criteria that reflect the problem in question. The score vector ta is determined by finding a weight vector wa for the columns of X and computing ta = Xwa. Similarly, the loading vector pa is computed as pa = X′va, where va is found by an appropriate criterion.

We often use the language that ta is to describe Y as well as possible and pa is to report on C as well as possible. It is clear that there is a symmetry between Y and C: a criterion used for finding a good weight vector wa to compute ta can also be used to find the weight vector va for the rows of X that is used to compute pa. But usually there is a different emphasis on C and Y. A typical objective concerning C is to get as ‘cheap’ a solution as possible, because C represents costs or related terms. The objective concerning Y, on the other hand, is to obtain a solution B such that XB is ‘close’ to Y. Thus, the two types of objective can be very different. Also, although both Y and C are present, the modelling task may be concerned with only one of them. The roles of the different vectors and matrices are illustrated by the figure below.
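
A minimal sketch of a single component step in Python. The scaling convention λa = 1/(va′ta) used below is an assumption (it is not fixed by the text); it makes the rank-one term a projection of X onto ta whenever va is proportional to ta:

```python
import numpy as np

def one_component(X, w, v):
    """One rank-one component: score t = Xw, loading p = X'v.
    The scaling lam = 1/(v't) is an assumed convention; the deflated
    matrix X - lam * t p' is what the next component is computed from."""
    t = X @ w                          # score vector
    p = X.T @ v                        # loading vector
    lam = 1.0 / float(v @ t)           # assumed scaling constant
    X_next = X - lam * np.outer(t, p)  # deflation for the next step
    return lam, t, p, X_next
```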

 

The algorithm automatically computes the generalized inverse, X+, given by

 

(2)          X+ = λ1 r1 s1′ + λ2 r2 s2′ + ...

 

This generalized inverse satisfies XX+X=X. The vectors (ra) and (sa) satisfy the orthogonality relationships,

 

ra′pb = 0,      sa′tb = 0,      a ≠ b;             ra′pa = 1/λa,      sa′ta = 1/λa.

 

The solution B is computed as B = X+Y. If only A components are selected, we work with the truncated expressions for X and X+, keeping only the first A terms in (1) and (2).
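
A sketch of how the truncated solution can be assembled, assuming the component algorithm has stored the triples (λa, ra, sa); the routine below is illustrative only:

```python
import numpy as np

def truncated_solution(lams, rs, ss, Y, A):
    """Assemble the truncated generalized inverse
    X+ = sum_{a<=A} lam_a r_a s_a' and return B = X+ Y.
    lams is a sequence of scaling constants; rs and ss hold the stored
    (r_a) and (s_a) vectors of lengths K and N respectively."""
    K, N = rs[0].shape[0], ss[0].shape[0]
    X_plus = np.zeros((K, N))
    for lam, r, s in zip(lams[:A], rs[:A], ss[:A]):
        X_plus += lam * np.outer(r, s)
    return X_plus @ Y
```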

 

Requirements on the solution B. We often have certain requirements that the solution B must satisfy. In fact, the algorithm distinguishes between the following situations:

1) B can vary freely

2) The values of B must be non-negative, B ≥ 0.

3) Linear constraints on B. The rows of X are arranged in such a way that the constraints are of the following types (an illustrative feasibility check is sketched after the list):

X0B                      Free equations

X1B = Y1              Equality constraints

X2B ≤ Y2              Inequality constraints, upper limit equations

X3B ≥ Y3              Inequality constraints, lower limit equations
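
An illustrative feasibility check of a candidate B against the three constrained blocks, assuming the row blocks X1, X2, X3 and the corresponding parts of Y have already been split out (the free block X0 imposes no restriction and is omitted):

```python
import numpy as np

def satisfies_constraints(B, X1, Y1, X2, Y2, X3, Y3, tol=1e-8):
    """Return True if X1 B = Y1, X2 B <= Y2 and X3 B >= Y3 hold
    (within a small numerical tolerance)."""
    eq    = np.allclose(X1 @ B, Y1, atol=tol)
    upper = np.all(X2 @ B <= Y2 + tol)
    lower = np.all(X3 @ B >= Y3 - tol)
    return bool(eq and upper and lower)
```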

 


 

Orthogonality.  If there are two objectives in the analysis, i.e., the weight vectors wa and va are determined according to separate criteria and independently of each other, neither the score vectors (ta) nor the loading vectors (pa) will be orthogonal. On the other hand, we get numerically more stable computations if orthogonality is used. Also, it is easier to make interpretations when there is orthogonality. Therefore, it is often desirable to make either (ta) or (pa) an orthogonal set of vectors. If we want orthogonal score vectors, we choose the weight vectors va as va = ta/|ta|. Similarly, if we want orthogonal loading vectors, we choose the weight vectors wa = pa/|pa|.
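
A small numerical check, under the same assumed scaling convention as in the component sketch above, that the choice va = ta/|ta| makes the next score vector orthogonal to ta:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))

w1 = rng.normal(size=4)
t1 = X @ w1
v1 = t1 / np.linalg.norm(t1)                     # choice giving orthogonal scores
p1 = X.T @ v1
X1 = X - (1.0 / (v1 @ t1)) * np.outer(t1, p1)    # deflation (assumed convention)

t2 = X1 @ rng.normal(size=4)                     # any score from the deflated matrix
print(abs(t1 @ t2))                              # numerically zero
```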

 

Multiplicative criteria for double objectives. If one criterion is given to determine wa and another to determine va, the algorithm uses the product form of the two criteria. If the two vectors are found independently of each other, there are no problems. If one of the weight vectors is expressed in terms of the other one, the algorithm uses a product of the two criteria. Therefore, both criteria must be expressed as maximization tasks. If one criterion has become very small, only the other criterion is used. For example, if the squared covariance is used as the criterion for the weight vector wa and the size of the costs as the criterion for va, only the cost criterion is used once the squared covariance has reached the zero level.
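
A hypothetical sketch of the combination rule; the function names f_w, f_v (standing for the two non-negative maximization criteria) and the threshold eps are illustrative only:

```python
def combined_criterion(f_w, f_v, w, v, eps=1e-12):
    """Product of the two criterion values, falling back to a single
    criterion when the other has effectively reached zero."""
    a, b = f_w(w), f_v(v)
    if a < eps:
        return b
    if b < eps:
        return a
    return a * b
```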

 

PLS regression. In PLS regression the matrix C is not used in the computations. The weight vector wa is computed as the eigenvector belonging to the largest eigenvalue λ of the eigen system

 

X′YY′X wa = λ wa.

 

The criterion that leads to this is to maximize |Y′ta|2 = |Y′Xwa|2 over weight vectors wa of unit length.
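
A minimal NumPy sketch of this computation, taking the eigenvector of X′YY′X that belongs to the largest eigenvalue:

```python
import numpy as np

def pls_weight(X, Y):
    """Weight vector w maximizing |Y'Xw|^2 over unit-length w,
    i.e. the dominant eigenvector of the symmetric matrix X'YY'X."""
    M = X.T @ Y @ Y.T @ X
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    return eigvecs[:, -1]                  # eigenvector of the largest eigenvalue
```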

 

Double PLS regression. C can be treated in the same way as Y. Thus, we can determine the weight vector va that maximizes |Cpa|2 = |CX′va|2. In case we want orthogonal score vectors, the following three terms are considered:

 

a) |CX′va|2,           b) |Y′Xwa|2,           c)  |CX′Xwa|2 |Y′Xwa|2

 

The wa found by maximizing c) is used, and va = ta/|ta|. In case a) is small, wa is found by maximizing b). Similarly, if b) is small, a) is used, and if orthogonal score vectors are wanted, the term maximized is |CX′Xwa|2.
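
A numerical sketch of criterion c), maximizing |CX′Xwa|2 |Y′Xwa|2 over unit-length wa. The use of a general-purpose optimizer (scipy.optimize.minimize) and the starting point are assumptions, not the algorithm of the text:

```python
import numpy as np
from scipy.optimize import minimize

def double_pls_weight(X, Y, C):
    """Maximize |C X'X w|^2 * |Y'X w|^2 over unit-length w by
    general-purpose numerical optimization (illustrative only)."""
    def neg_crit(w):
        u = w / np.linalg.norm(w)
        return -float(np.sum((C @ (X.T @ (X @ u))) ** 2) *
                      np.sum((Y.T @ (X @ u)) ** 2))

    # Start from the ordinary PLS weight (criterion b).
    w0 = np.linalg.eigh(X.T @ Y @ Y.T @ X)[1][:, -1]
    res = minimize(neg_crit, w0)
    return res.x / np.linalg.norm(res.x)
```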

 

Linear programming. In case C is a vector and we want to minimize c′b subject to Xb ≤ y, b ≥ 0, we get linear programming, where a linear function is minimized subject to linear constraints. Thus, the algorithm can be used to obtain solutions to linear programming tasks. This special way of obtaining the solution follows the PLS methodology. In this way the algorithm, the PLS linear programming algorithm, obtains a balance between the optimization and the associated precision of the solution. The algorithm identifies the subspace of the column space of X that provides a stable solution. In PLS regression we frequently do not need many components for an appropriate description of Y. In optimization tasks we typically need many more components. E.g., if X contains 100 rows and 200 columns and is densely covered with nonzero values, we may need, say, 60 components to describe the optimal value appropriately. In a regression context this is a high value. On the other hand, the exact solution requires 100 components. By working with only 60 components, we get a considerably more stable solution than the exact one.
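
For reference only, the task min c′b subject to Xb ≤ y, b ≥ 0 can be stated and solved with an off-the-shelf LP solver; the data below are synthetic, and the call shows the formulation rather than the component-based PLS solution:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data; the call encodes exactly the task
#   minimize c'b  subject to  Xb <= y,  b >= 0.
rng = np.random.default_rng(3)
X = rng.uniform(0.1, 1.0, size=(10, 6))
y = rng.uniform(5.0, 10.0, size=10)
c = rng.uniform(1.0, 2.0, size=6)

res = linprog(c, A_ub=X, b_ub=y, bounds=[(0, None)] * 6)
b_exact = res.x   # exact LP solution, to compare against a truncated PLS solution
```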

Organization of the computations.