
Context Integrated Data Analysis
  • correlates variables and goal values when the influence of the variables on the goal value and possibly on each other is unknown.
  • unlike Regression Analysis, does not depend on a priori assumptions.
  • can employ various goal definitions without bias towards larger deviations.
  • can cope with large numbers of input variables and, unlike Artificial Neural Networks, does not require a static structure.
  • does not disguise internal influences and connections between variables.
  • regularly achieves decisive improvement over other approaches used to establish correlations between variables and goal values.
Our approach is not merely context based; it is Context Integrated Data Analysis.

The animated graphic below shows an example of Context Integrated Data Analysis compared with Regression Analysis in a multivariate measurement system. Such a system is used, for example, to replace destructive testing for the determination of the yield point and tensile strength of steel bands with online-capable, non-destructive measurement. The table on the right side of the graphic shows the statistics of both evaluation methods. The CIDA-Evaluator of CDS Data Analysis Inc. determines more than 10,000 measurements, each based on the evaluation of 4 variables, in less than 1 second, which makes it online-capable even within high-speed measurement facilities.

Each blue dot in the left graphic above represents the ratio between the value calculated from the non-destructive measurements and the empirically (destructively) determined value. The higher the accuracy of the calculated value, the closer the blue dots are to the red line.

Context Integrated Data Analysis (CIDA)

Based on years of research in the controlling of highly complex systems, CDS Data Analysis Inc. has developed a new approach to Data Analysis.
This development began with systems that are defined by several different domains interconnected by several non-linear functions.
Initially, the internal structures were completely unknown to the controller. Step by step, the systems had to be learned and understood. To solve this problem, CDS Data Analysis developed the Context Integrated System Optimizer. This optimizer is capable of analysing the behaviour of complex systems, drawing reliable conclusions about the internal structures of the systems to be controlled, and developing improvement strategies, all without any human interaction or assistance. The ability to analyse the internal structures of unknown systems was then successfully used to determine correlations of unmatched quality. In our approach, the observed variables are regarded as describing scenarios, that is, states an underlying complex system can be in. The strength and influence of the variables on each other as well as on the goal values are analysed within the context of the different scenarios.
Without filtering or isolating variables, the goal values are considered consequences of the respective states. Similar to Artificial Neural Networks (ANNs), an underlying complex system is simulated. However, Context Integrated Data Analysis does not require predefined and therefore static structures and is not limited to only very small numbers of input variables as ANNs are.

How CIDA is employed

Using Context Integrated Data Analysis follows the same steps as Regression Analysis. It essentially consists of two phases, the optimization phase and the application phase.
During the optimization phase, a set of training data is used to optimize the parameters of the respective approach. In Regression Analysis it is necessary to define a formula that approximately reflects the behaviour of the training data. To make things easy, a simple linear equation is very often assumed. This assumption, however, predetermines the outcome of the regression equations as only the regression parameters can then be optimized.
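The two phases can be illustrated for the Regression Analysis path with a minimal Python sketch. It assumes the simple linear formula mentioned above; the function names `fit_line` and `predict` and the toy data are hypothetical and serve only as an illustration. The sketch does not show the CIDA optimization process, which is not detailed on this page.

```python
from statistics import mean


def fit_line(xs, ys):
    """Optimization phase: least-squares parameters for y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(params, xs):
    """Application phase: evaluate the fixed formula with the fitted parameters."""
    intercept, slope = params
    return [intercept + slope * x for x in xs]


# Toy training data (hypothetical): the true relationship is quadratic.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [x * x for x in train_x]

params = fit_line(train_x, train_y)

# New data: the assumed linear structure cannot follow the curvature.
new_x = [6.0, 7.0]
predicted = predict(params, new_x)
actual = [x * x for x in new_x]

for x, p, a in zip(new_x, predicted, actual):
    print(f"x={x}: predicted={p:.1f}, actual={a:.1f}")
```

Only the intercept and the slope are optimized here. The linear form itself stays fixed, so the predictions for the new inputs fall increasingly short of the quadratic values.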

Determining the underlying relationship between the variables and the goal value by relying on the intuition of the analyst is tricky at best. Context Integrated Data Analysis, on the other hand, does not use any a priori assumptions. The optimization process does not exclude any possibility of mutual interaction between the involved variables. The strength of the influence a variable has on the goal value can vary depending on the current value of other variables as well as within its own interval of training data.

The graphic to the left shows the procedures required for the two approaches. The left path shows Regression Analysis while the right path shows the steps involved in Context Integrated Data Analysis. Regression Analysis can employ linear or non-linear regression equations. The parameters for the respective equations can be determined with standard procedures. The quality of the goal values determined by the regression equations is commonly determined by the Standard Error (deviation) and the Confidence Value. Context Integrated Data Analysis requires a more time-consuming optimization process than Regression Analysis. Once the optimization process has been completed, however, more than 10,000 goal values can be determined in less than 1 second.
Across different variations of 16,596 sets of variables in which Regression Analysis and Context Integrated Data Analysis were compared, CIDA improved the standard errors by up to 57%. At the same time, the confidence values of the CIDA-based results were between 30% and 60% higher than those of the Regression Analysis based results.
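How a standard error and its relative improvement might be computed can be sketched as follows. The exact definition used in the comparison is not stated here, so the sketch assumes the root-mean-square deviation between observed and calculated goal values. The function names and all numbers are hypothetical toy values and are not taken from the comparison above.

```python
from math import sqrt


def standard_error(observed, calculated):
    """Root-mean-square deviation (assumed definition of the standard error)."""
    n = len(observed)
    return sqrt(sum((o - c) ** 2 for o, c in zip(observed, calculated)) / n)


def improvement_percent(reference_error, new_error):
    """Relative reduction of the error compared with a reference method."""
    return 100.0 * (reference_error - new_error) / reference_error


# Hypothetical toy values, for illustration only.
observed = [100.0, 110.0, 120.0, 130.0]
method_a = [104.0, 106.0, 124.0, 126.0]  # every value is off by 4
method_b = [102.0, 108.0, 122.0, 128.0]  # every value is off by 2

se_a = standard_error(observed, method_a)
se_b = standard_error(observed, method_b)
gain = improvement_percent(se_a, se_b)

print(f"standard error A: {se_a:.2f}")
print(f"standard error B: {se_b:.2f}")
print(f"improvement of B over A: {gain:.0f}%")
```

In this toy case the second method halves the standard error of the first, which corresponds to an improvement of 50%.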

The stability of Context Integrated Data Analysis has been shown by optimizing our CIDA Evaluator and then employing it with a new set of data. The animated graphic below shows the data used in the optimization process and compares the goal values with the results based on Multiple Regression.

Here we applied the CIDA Evaluator, optimized with the data of the previous example, to a new set of data. Likewise, the Regression Analysis based results employ the equations optimized with the data of the previous example.

Regression Analysis

Regression Analysis is a statistical method used to determine the relationship between one or more variables and a goal value.

The fundamental assumption made when employing Regression Analysis is that the goal value can be determined by the right combination of variables. Unfortunately, the structure of the combination has to be defined a priori, and is then fixed for the entire process.

Regression Analysis comprises two main approaches: Linear Regression and Non-Linear Regression. The term Linear Regression is often used solely to describe linear equations with a single variable. If the regression equation contains more than one variable, the term Multiple Regression is used.

In either case, Linear Regression assumes that the goal value can be determined by a linear combination of variables. The parameters of the linear combination are approximated with the goal of minimizing the distance between the actual goal values and the values determined by the regression equation. Generally, the parameter optimization employs the Least Squares Method, which minimizes the sum of the squared distances between the calculated and the observed or measured values. The biggest drawback of this method is that the values with the biggest deviation have the strongest influence on the parameters.
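The influence of large deviations on least-squares parameters can be made concrete with a small Python sketch. It fits a single-variable linear equation twice, once to clean data and once to the same data with one strongly deviating value. The helper `fit_line` and the toy data are hypothetical and serve only as an illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Ten points that lie exactly on y = 2 * x (hypothetical toy data).
xs = [float(i) for i in range(10)]
clean_ys = [2.0 * x for x in xs]

# The same data with a single strongly deviating value at the last point.
outlier_ys = list(clean_ys)
outlier_ys[-1] += 30.0

clean_intercept, clean_slope = fit_line(xs, clean_ys)
outlier_intercept, outlier_slope = fit_line(xs, outlier_ys)

print(f"clean data:   slope={clean_slope:.2f}, intercept={clean_intercept:.2f}")
print(f"with outlier: slope={outlier_slope:.2f}, intercept={outlier_intercept:.2f}")
```

Because the deviations enter the objective squared, one deviating point out of ten is enough to move the fitted slope from 2 to roughly 3.6 in this example.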

Non-Linear Regression Analysis comprises the basic structures shown in the picture above. Unless the analyzed data show obvious characteristics such as periodicity, Non-Linear Regression is often avoided. Highly complex combinations of variables can increase the challenge of parameter optimization to such a degree that standard procedures fail entirely.

Context Integrated Data Analysis was developed to cope with the challenges of optimizing the parameters for highly complex combinations of variables.

These combinations of variables are often essential in Multivariate Measurement Systems.

Multivariate Measurement Systems

Multivariate measurement systems exploit several measurements, often of entirely different types such as temperature, humidity, and thickness, to determine a particular value of interest.
By themselves, these measurements might not have any significance at all and yield a concrete meaning only through a specific combination.
Multivariate measurement systems are often employed when the actual value of interest cannot be determined directly, or when its determination would have unacceptable consequences such as the destruction of the measured entity or unjustifiably high costs.

Example:

The traditional method of determining the yield point and tensile strength of steel bands requires destructive testing. Aside from being costly and time consuming, these tests cannot be applied for continuous testing of entire coils of steel bands. Several research projects have been dedicated to finding combinations of measurements that can be determined without destructive testing and are suitable for replacing these tests completely. Unfortunately, standard procedures such as Regression Analysis have rarely been successful in correlating surrogate measurements with acceptable accuracy and confidence values to respective goal values. Currently, there are multivariate measurement systems in industrial use that require a confidence value of 75%. For combinations of variables for which the Regression Analysis based confidence value fell short of the threshold by 7%, the confidence value exceeded the threshold by more than 20% when Context Integrated Data Analysis was used.