The character of knowledge-driven models
For more than 60 years, knowledge-driven models have proven very useful in describing the behavior of chemical and biological processes and in helping to optimize and control them. They can be built when the inner workings of a process are known in some detail. These models consist of algebraic, ordinary differential, and/or partial differential equations. They express steady-state or dynamic mass balances of a species or process component, or total mass and energy balances over a section of a processing unit. To be complete, they require a quantitative understanding of all local rate phenomena, whether these involve mass and energy transfer or the dependence of local reaction rates on composition.
These models have also been called “fundamental” or “first-principles” models.
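As an illustration of the kind of balance described above, consider a dynamic mass balance for a species A in a well-mixed tank. This example is an assumption made for illustration, not a model taken from the text: the tank has constant volume V and volumetric flow rate F, the feed concentration is C_{A,in}, and A is consumed by a first-order reaction with rate constant k.

```latex
\frac{dC_A}{dt} = \frac{F}{V}\left(C_{A,\mathrm{in}} - C_A\right) - k\,C_A
```

Setting the left-hand side to zero gives the corresponding steady-state balance, which is an algebraic equation. The rate constant k is the kind of parameter that typically has to be estimated from data.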
Necessary steps in model development
The development of a knowledge-driven model starts with our understanding of the inner workings of the process. This understanding is translated into mathematical equations expressing material and energy balances around process units. Because the values of many model parameters are not known, they must be estimated by fitting the model to available experimental data. This task is called least-squares estimation; it minimizes the sum of squares (SS) of the differences between the model predictions and the data.
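The estimation step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a prescribed method: the model is a hypothetical first-order decay C(t) = exp(-k t) with a single unknown rate constant k, the data values are invented for illustration, and a golden-section search stands in for whatever optimizer one would normally use.

```python
import math

def model(t, k):
    # Hypothetical first-order decay with initial concentration fixed at 1.0.
    return math.exp(-k * t)

def sum_of_squares(k, times, observed):
    # SS: sum of squared differences between data and model predictions.
    return sum((c - model(t, k)) ** 2 for t, c in zip(times, observed))

def fit_rate_constant(times, observed, lo=0.0, hi=5.0, tol=1e-8):
    # Golden-section search for the k in [lo, hi] that minimizes SS.
    # Assumes SS has a single minimum on the interval.
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sum_of_squares(c, times, observed) < sum_of_squares(d, times, observed):
            b = d
        else:
            a = c
    return (a + b) / 2

# Invented illustrative data (time, measured concentration).
times = [0.0, 1.0, 2.0, 4.0, 8.0]
observed = [1.02, 0.73, 0.56, 0.29, 0.10]

k_hat = fit_rate_constant(times, observed)
ss_min = sum_of_squares(k_hat, times, observed)
print(f"estimated k = {k_hat:.4f}, SS = {ss_min:.6f}")
```

For models with several parameters, a general-purpose nonlinear least-squares routine would replace the one-dimensional search, but the objective being minimized is the same SS.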
Testing the accuracy of the model
Minimizing SS is necessary but not sufficient, a point many modelers fail to recognize. Two additional tasks must be completed before one can claim that the model is accurate: a test of significance for the parameters and a Lack-of-Fit (LoF) test for the model as a whole. The latter is also called a Goodness-of-Fit (GoF) test. Both tests need an estimate of the pure-error sum of squares, SSpe, of the process. This value represents the normal variability of the process and is calculated from replicated experiments.
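The pure-error calculation from replicates can be sketched as follows. The function name, the condition labels, and the measurements are hypothetical; the calculation itself is the usual one, in which each group of replicated runs contributes the squared deviations from its own mean and one fewer degree of freedom than its number of runs.

```python
def pure_error_ss(replicates):
    # replicates: dict mapping an experimental condition to the list of
    # responses measured in repeated runs at that condition.
    ss_pe = 0.0
    df_pe = 0
    for values in replicates.values():
        mean = sum(values) / len(values)
        ss_pe += sum((y - mean) ** 2 for y in values)
        df_pe += len(values) - 1  # a single run adds no degrees of freedom
    return ss_pe, df_pe

# Invented illustrative data.
replicates = {
    "T=300K": [10.1, 9.9, 10.0],
    "T=320K": [12.2, 11.8],
    "T=340K": [14.0],  # not replicated, so it contributes nothing
}

ss_pe, df_pe = pure_error_ss(replicates)
print(f"SSpe = {ss_pe:.3f} with {df_pe} degrees of freedom")
```

Without at least one replicated condition, the degrees of freedom are zero and no pure-error estimate is available.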
Are all model parameters significant?
With the SSpe value in hand, one can calculate the confidence interval (CI) of each estimated model parameter, which expresses the uncertainty in its estimated value. It makes a difference whether the estimate of a parameter is 25.7 ± 15.2 rather than 25.7 ± 2.4. In the former case, the model's predictions will be more uncertain. Parameters whose confidence interval includes zero are declared insignificant and should be removed from the model. The SS minimization is then repeated with the reduced model until all remaining parameters are significant.
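A minimal sketch of the significance check, under simplifying assumptions that are not from the text: the model is a hypothetical one-parameter straight line through the origin, y = theta * x, the error variance is estimated as SSpe divided by its degrees of freedom, and the critical t value is read from a standard table and passed in by hand (3.182 is the two-sided 95% value for 3 degrees of freedom). All data values are invented.

```python
import math

def slope_confidence_interval(x, y, ss_pe, df_pe, t_crit):
    # Least-squares estimate of theta for the model y = theta * x.
    sxx = sum(xi * xi for xi in x)
    theta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    # Error variance estimated from the pure-error sum of squares.
    s2 = ss_pe / df_pe
    # Half-width of the confidence interval: t * standard error of theta.
    half_width = t_crit * math.sqrt(s2 / sxx)
    return theta, half_width

def is_significant(theta, half_width):
    # Significant if the interval theta +/- half_width excludes zero.
    return abs(theta) > half_width

# Invented illustrative data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

theta, half_width = slope_confidence_interval(
    x, y, ss_pe=0.10, df_pe=3, t_crit=3.182
)
print(f"theta = {theta:.3f} +/- {half_width:.3f}")
print("significant:", is_significant(theta, half_width))
```

For a model with several parameters, the same idea applies to each parameter in turn, with the standard error taken from the parameter covariance matrix.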
Does the model represent all non-random data?
We need to test, through an F-test, whether the SS value is significantly larger than the SSpe value once each is scaled by its degrees of freedom. This is the Lack-of-Fit (LoF) test. If the test finds, usually at the 95% confidence level, that SS is not significantly larger than SSpe, the modeling task has been successfully concluded: the model has represented all the non-random information in the data.
If instead SS is found to be significantly larger than SSpe, the model has failed to represent all the non-random information in the data. It must be modified and the modeling task restarted.
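The Lack-of-Fit decision can be sketched as below. The text does not spell out the arithmetic, so this sketch assumes the common textbook form of the test: the residual SS is split into a pure-error part (SSpe) and a lack-of-fit part (SS minus SSpe), and the ratio of their mean squares is compared with a critical F value. The function name and all numbers are hypothetical, and the critical value is taken from a standard table (6.59 is the 95% value for 3 and 4 degrees of freedom) rather than computed.

```python
def lack_of_fit_test(ss_res, n_obs, n_params, ss_pe, df_pe, f_crit):
    # Degrees of freedom of the residual and of its lack-of-fit part.
    df_res = n_obs - n_params
    df_lof = df_res - df_pe
    if df_lof <= 0 or df_pe <= 0:
        raise ValueError("need positive degrees of freedom for both terms")
    # Lack-of-fit sum of squares: what remains after removing pure error.
    ss_lof = ss_res - ss_pe
    # F statistic: ratio of lack-of-fit to pure-error mean squares.
    f_stat = (ss_lof / df_lof) / (ss_pe / df_pe)
    return f_stat, f_stat > f_crit

# Invented case 1: the residual is close to pure error, so the model is adequate.
f_ok, lof_ok = lack_of_fit_test(
    ss_res=0.25, n_obs=8, n_params=1, ss_pe=0.10, df_pe=4, f_crit=6.59
)

# Invented case 2: the residual far exceeds pure error, so the model must be revised.
f_bad, lof_bad = lack_of_fit_test(
    ss_res=1.60, n_obs=8, n_params=1, ss_pe=0.10, df_pe=4, f_crit=6.59
)

print(f"case 1: F = {f_ok:.2f}, lack of fit detected: {lof_ok}")
print(f"case 2: F = {f_bad:.2f}, lack of fit detected: {lof_bad}")
```

A result of True for the second return value corresponds to the failure case described above, in which the model is modified and the fitting repeated.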