
Context Integrated Data Analysis
  • does not depend on a priori assumptions, unlike Regression Analysis.
  • can employ various goal definitions without bias towards larger deviations.
  • can cope with large numbers of input variables and, unlike Artificial Neural Networks, has no static structure.
  • does not disguise internal influences and connections between variables.
  • regularly achieves decisive improvements over other approaches used to establish correlations between variables and goal values.
Our approach is not just context based: it is Context Integrated Data Analysis.

The animated graphic below shows an example of Context Integrated Data Analysis (CIDA) compared with Regression Analysis (RA) in a multivariate measurement system. Such a system can be used, for example, to replace destructive testing for determining the yield point and tensile strength of steel bands with online-capable, non-destructive measurement. The table on the right side of the graphic shows the statistics of both evaluation methods. Determining more than 10,000 measurements, each based on the evaluation of 4 variables, in less than 1 second, the CIDA-Evaluator of CDS Data Analysis Inc. is online-capable even within high-speed measurement facilities.

All empirical data concerning yield point and tensile strength originated from various steel manufacturing plants and were obtained by destructive testing of samples.

Each blue dot in the left graphic above represents the ratio between the value calculated from non-destructive testing and the empirically (destructively) determined value. The higher the accuracy of the calculated value, the closer the blue dots are to the red line.

Context Integrated Data Analysis (CIDA)

Based on years of research in the controlling of highly complex systems, CDS Data Analysis Inc. has developed a new approach to Data Analysis.

This development began with systems composed of several different domains that are interconnected by non-linear functions.
Initially, the internal structures were completely unknown to the controller. Step by step, the systems had to be learned and understood. To solve this problem, CDS Data Analysis developed the Context Integrated System Optimizer. This optimizer is capable of analyzing the behaviour of complex systems, drawing reliable conclusions about the internal structures of the systems to be controlled, and developing improvement strategies, all without any human interaction or assistance. The ability to analyze internal structures of unknown systems was then successfully used to forecast values that correlated with actual outcomes with unmatched precision.
In our approach, the observed variables are regarded as describing scenarios, that is, states an underlying complex system can be in. The strength and influence of the variables on each other, as well as on the goal values, are analyzed within the context of the different scenarios.
Without filtering or isolating variables, the goal values are considered consequences of the respective states. As with Artificial Neural Networks (ANNs), an underlying complex system is simulated. However, CIDA does not require predefined and therefore static structures and is not limited to small numbers of input variables, as ANNs often are.

How CIDA is employed

CIDA is used in the same steps as RA. The process essentially consists of two phases: optimization and application. During optimization, the values to be predicted are referred to as goal values.

During the optimization phase, a set of training data is used to optimize the parameters of the respective approach. In RA it is necessary to define a formula that approximately reflects the behaviour of the training data. To facilitate the procedure, a simple linear equation is often employed. This, however, predetermines the outcome of the regression equations as only the regression parameters can then be optimized.
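The two phases for RA with a predefined linear formula can be sketched as follows. This is only an illustration: the function names `fit_line` and `predict` and the training data (generated from y = x²) are assumptions made for this example and do not come from any CDS product. It shows that once the formula y = a + b·x is fixed, only the parameters a and b are optimized, so the result stays a straight line even when the data are curved.

```python
from statistics import mean

def fit_line(xs, ys):
    """Optimization phase: least-squares estimate of a and b in y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def predict(params, x):
    """Application phase: evaluate the fixed formula with the fitted parameters."""
    a, b = params
    return a + b * x

# Hypothetical training data; the true relationship is y = x**2.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [x ** 2 for x in train_x]

params = fit_line(train_x, train_y)   # a = -2, b = 4
prediction = predict(params, 6.0)     # 22, although the true value is 36
print(params, prediction)
```

The gap between the prediction and the true value comes from the a priori choice of the formula, not from the parameter optimization itself.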

Determining the underlying relationship between the variables and the goal value by relying on the intuition of the analyst is tricky at best. CIDA, on the other hand, does not make any a priori assumptions. The optimization process does not exclude any possible mutual interaction of the involved variables. The strength of the influence a variable has on the goal value can vary depending on the current values of other variables as well as on its own position within its interval of training data.
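What a varying strength of influence means can be illustrated with a small numerical sketch. The function `goal` below is a purely hypothetical system chosen for this example; it is not CIDA and not a model of any real measurement. The sketch estimates the local influence of one variable on the goal value by a finite difference and shows that it changes with the other variable and with the variable's own value.

```python
def goal(x1, x2):
    """Hypothetical system in which x1 interacts with x2 and with itself."""
    return x1 * x2 + 0.5 * x1 ** 2

def sensitivity_to_x1(x1, x2, h=1e-6):
    """Central finite difference: local influence of x1 on the goal value."""
    return (goal(x1 + h, x2) - goal(x1 - h, x2)) / (2 * h)

# Same x1, different x2: the influence of x1 changes with the other variable.
s_a = sensitivity_to_x1(1.0, 0.0)   # about 1
s_b = sensitivity_to_x1(1.0, 5.0)   # about 6

# Same x2, different x1: the influence also changes within x1's own range.
s_c = sensitivity_to_x1(3.0, 0.0)   # about 3

print(s_a, s_b, s_c)
```

A single fixed coefficient for x1, as in a purely additive linear equation, cannot reproduce all three values at once.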

The graphic to the left shows the procedures required for the two approaches. The left path shows RA while the right path shows the steps involved in CIDA. RA can employ linear or non-linear regression equations. The parameters for the respective equations can be determined with standard procedures. The quality of the values forecast by the regression equations is commonly assessed by the Standard Error (deviation) and the Confidence Value. CIDA requires a more time-consuming optimization process than RA. Once the optimization process has been completed, however, more than 10,000 goal values can be predicted in less than 1 second.
In different variations of 16,596 sets of variables where RA and CIDA were compared, the improvement in the standard errors achieved by CIDA was up to 57%. At the same time, the confidence values of the CIDA-based results were between 30% and 60% higher than the RA-based values.
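How a relative improvement in standard error can be computed is sketched below. Two things are assumptions here: the standard error is taken as the root mean squared deviation between observed and predicted values (one common definition, which the text does not spell out), and all numbers are made up for illustration. They are not the results reported above.

```python
import math

def standard_error(observed, predicted):
    """Root mean squared deviation (assumed definition of the standard error)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def relative_improvement(se_baseline, se_new):
    """Percentage by which se_new is smaller than se_baseline."""
    return (se_baseline - se_new) / se_baseline * 100.0

# Made-up observed values and two made-up sets of predictions.
observed = [100.0, 110.0, 120.0, 130.0]
pred_baseline = [104.0, 106.0, 124.0, 126.0]   # every deviation is 4
pred_improved = [102.0, 108.0, 122.0, 128.0]   # every deviation is 2

se_baseline = standard_error(observed, pred_baseline)        # 4.0
se_improved = standard_error(observed, pred_improved)        # 2.0
improvement = relative_improvement(se_baseline, se_improved) # 50.0
print(se_baseline, se_improved, improvement)
```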

The stability of CIDA has been shown by optimizing our CIDA Evaluator and then employing it with a new set of data. The animated graphic below shows the data used in the optimization process and compares the forecast values with the results based on Multiple Regression.

Here we applied the CIDA Evaluator, optimized with the data from the previous example, to a new set of data. Likewise, the RA-based results employ the equations optimized with the data from the previous example.





Regression Analysis

RA is a statistical method used to determine the relationship between one or more variables and a goal value.

The fundamental assumption made when employing RA is that the goal value can be determined by the right combination of variables. Unfortunately, the structure of the combination has to be defined a priori, and is then fixed for the entire process.

RA comprises two main approaches: Linear Regression and Non-Linear Regression. Often the term Linear Regression is used solely to describe linear equations in a single variable. If the regression equation contains more than one variable, the term Multiple Regression is used.

In either case, Linear Regression assumes that the goal value can be determined by a linear combination of variables. The parameters of the linear combination are chosen to minimize the distance between the empirically determined values and the values given by the regression equation. Generally, the parameter optimization employs the Least Squares Method, which minimizes the sum of the squared distances between the calculated and the observed or measured values. A major drawback of this method is that the values with the biggest deviation have the strongest influence on the parameters.
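The influence of large deviations on a least-squares fit can be illustrated with a minimal sketch. The helper `fit_line` and the data are assumptions made for this example only. Five points lie exactly on the line y = x; changing a single observation by 10 units moves both fitted parameters substantially.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares estimate of intercept a and slope b in y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]

# Hypothetical clean data: all points lie on y = x.
clean_ys = [0.0, 1.0, 2.0, 3.0, 4.0]
clean_a, clean_b = fit_line(xs, clean_ys)        # a = 0, b = 1

# The same data with one large deviation in the last observation.
outlier_ys = [0.0, 1.0, 2.0, 3.0, 14.0]
outlier_a, outlier_b = fit_line(xs, outlier_ys)  # a = -2, b = 3

print(clean_a, clean_b, outlier_a, outlier_b)
```

Because the deviations enter the objective squared, the single deviating point outweighs the four points that agree with each other, and the fitted slope triples.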

Non-Linear RA comprises the basic structures shown in the picture above. Unless the analyzed data show obvious characteristics such as periodicity, Non-Linear Regression is often avoided. Highly complex combinations of variables can increase the challenge of parameter optimization to such a degree that standard procedures fail entirely.

CIDA was developed to cope with the challenges of optimizing the parameters for highly complex combinations of variables.

These combinations of variables are often essential in Multivariate Measurement Systems.

Multivariate Measurement Systems

Multivariate measurement systems exploit several measurements, often of entirely different types such as temperature, humidity, or thickness, to determine a particular value of interest.
On their own, these measurements might not have any significance at all; they acquire a concrete meaning only through a specific combination.
Multivariate measurement systems are often employed when the actual value of interest cannot be determined directly, or when its determination has unacceptable consequences such as the destruction of the measured entity or unjustifiably high costs.

Example:

The traditional method of determining the yield point and tensile strength of steel bands requires destructive testing. Aside from being costly and time-consuming, these tests cannot be applied to the continuous testing of entire coils of steel bands. Several research projects have been dedicated to finding combinations of measurements that can be obtained without destructive testing and are suitable for completely replacing these tests. Unfortunately, standard procedures such as RA have rarely succeeded in correlating surrogate measurements to the respective goal values with acceptable accuracy and confidence values. Currently, there are multivariate measurement systems in industrial use that require a confidence value of 75%. For combinations of variables where the RA-based values deviated from the empirically determined values so much that the confidence value fell short of this threshold by 7%, the CIDA-determined values exceeded the threshold by more than 20%.