
Validating machine-learning models

While machine-learning techniques can improve business processes, predict future outcomes, and save money, they also increase modeling risk because of their complexity and opacity. In this article, Milliman’s Jonathan Glowacki and Martin Reichhoff discuss how model validation techniques can mitigate the potential pitfalls of machine-learning algorithms.

Here is an excerpt:

An independent model validation carried out by knowledgeable professionals can mitigate the risks associated with new modeling techniques. In spite of the novelty of machine-learning techniques, there are several methods to safeguard against overfitting and other modeling flaws. The most important requirement is that the team performing the validation understand the algorithm. A validator who does not understand the theory and assumptions behind the model is unlikely to perform an effective validation. After demonstrating an understanding of the model theory, the following procedures are helpful in performing the validation.

Outcomes analysis refers to comparing modeled results to actual data. For advanced modeling techniques, outcomes analysis is a simple yet useful approach to understanding model interactions and pitfalls. One way to understand model results is to plot the range of an independent variable against both the actual and predicted outcomes, along with the number of observations. This allows the user to visualize the univariate relationship within the model and see whether the model is overfitting to sparse data. To evaluate possible interactions, cross plots can also be created to examine results in two dimensions rather than one. Beyond two dimensions, evaluation becomes difficult, but looking at simple interactions does provide a useful initial understanding of how the model behaves across its independent variables….
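To make the univariate plot described above concrete, here is a minimal Python sketch. It assumes a pandas DataFrame with hypothetical column names for the feature, the actual outcome, and the model's prediction; the binning scheme and names are illustrative only, not taken from the article.

```python
import pandas as pd
import matplotlib.pyplot as plt

def univariate_outcomes_plot(df, feature, actual_col, predicted_col, bins=20):
    """Plot mean actual vs. predicted outcome across bins of one
    independent variable, with observation counts on a second axis."""
    binned = pd.cut(df[feature], bins=bins)
    summary = df.groupby(binned, observed=True).agg(
        actual=(actual_col, "mean"),
        predicted=(predicted_col, "mean"),
        n_obs=(actual_col, "size"),
    )
    midpoints = [interval.mid for interval in summary.index]

    fig, ax1 = plt.subplots()
    ax1.plot(midpoints, summary["actual"], marker="o", label="Actual")
    ax1.plot(midpoints, summary["predicted"], marker="s", label="Predicted")
    ax1.set_xlabel(feature)
    ax1.set_ylabel("Mean outcome")
    ax1.legend(loc="upper left")

    # Observation counts on a second axis reveal where the fitted
    # relationship rests on sparse data.
    ax2 = ax1.twinx()
    width = (midpoints[1] - midpoints[0]) if len(midpoints) > 1 else 1.0
    ax2.bar(midpoints, summary["n_obs"], alpha=0.2, width=width)
    ax2.set_ylabel("Observations")
    plt.show()
```

A spike in predicted values over a bin with few observations is the visual signature of overfitting to sparse data that the excerpt describes.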

…Cross-validation is a common strategy to help ensure that a model isn’t overfitting the sample data it’s being developed with. Cross-validation has long been used to help ensure the integrity of other statistical methods, and with the rising popularity of machine-learning techniques it has become even more important. In cross-validation, a model is fitted using only a portion of the sample data. The model is then applied to the remaining portion of the data to test performance. Ideally, a model will perform equally well on both portions of the data. If it doesn’t, it’s likely that the model has been overfit.
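As an illustration of the holdout idea in this excerpt, here is a short, hypothetical Python sketch using scikit-learn on synthetic data. The specific model and metric are assumptions for demonstration, not the authors' method.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the development sample.
X, y = make_classification(n_samples=5000, n_features=10, random_state=0)

# Fit on one portion of the data, then test on the held-out portion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# A large gap between in-sample and out-of-sample performance
# suggests the model has been overfit.
print(f"Train AUC: {train_auc:.3f}  Test AUC: {test_auc:.3f}")
```

The same idea extends to k-fold cross-validation, where the fit-and-test split is repeated across several partitions of the sample.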

Financial model validation

Increased computing capabilities, advanced modeling techniques, and commensurate increases in the complexity of financial products have resulted in greater reliance on mathematical models within the banking and insurance industries. These financial models are instrumental in developing robust risk management frameworks; however, the misuse or failure of these models can present large risks as well.

In his new paper, Jonathan Glowacki offers perspective on the model validation and governance policies necessary to mitigate financial model risk.

Here is an excerpt from the paper:

Use of the model

A model validation generally starts the same way you would start building a financial model: by understanding the use of the model. This shapes the level of detail of the validation and allows the validation group to focus on key areas of the model throughout the review. For example, if you are reviewing an economic model used for stress testing, then it is critical that the results produced by the model are reasonable and stable in stressful environments. If the model is used for pricing, where the goal is to develop an average cost, then the results the model produces in extremely stressful scenarios may be less important to the validation. The model validation should identify the use of the model, assess whether the model is consistent with and applicable to that intended use, and ensure that the model is not being used for exercises outside its capabilities.

Review of data

A second step of a model validation is to review the data used to develop the model. The validation group should start with the same data that was used to develop the model. The review of the data could include univariate analysis to independently identify potential variables to include in the model, a review of the range of the response being modeled (e.g., the minimum and maximum default rate in the data by calendar quarter), a review of the number and magnitude of stressful events included in the data, and more. External data not considered in the development process could be appended to the validation dataset to review other variables that may be influential for the modeling objective but were not considered during development. The intent of this segment of the validation is to understand any implications or limitations the development data may impose on the estimates produced by the model. For example, data used to develop mortgage credit models in the early 2000s generally did not include severe stress environments in the housing market. Even today, the ultimate resolution of a stressful environment is not included in mortgage data, as losses are still developing. Users of mortgage credit risk models must be aware of this limitation.
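As a rough illustration of the response-range review mentioned above, the Python sketch below summarizes a hypothetical quarterly default rate by calendar quarter and flags candidate stress periods. The column names and the stress threshold are invented for demonstration, not drawn from the paper.

```python
import pandas as pd

def review_response_range(df, response="default_rate", date_col="as_of_date"):
    """Summarize the range of the modeled response by calendar quarter,
    showing how many stressful periods the development data covers.

    Assumes `date_col` is a datetime column and `response` is numeric;
    both names are hypothetical placeholders."""
    quarter = df[date_col].dt.to_period("Q")
    by_quarter = df.groupby(quarter)[response].agg(["min", "mean", "max", "size"])

    # Flag quarters whose mean response exceeds twice the overall mean
    # as candidate stress periods (the threshold is illustrative only).
    stress_threshold = 2 * df[response].mean()
    by_quarter["stress_period"] = by_quarter["mean"] > stress_threshold
    return by_quarter
```

If the resulting table shows few or no flagged quarters, that is exactly the kind of data limitation the excerpt warns about: the model may never have seen a severe stress environment.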

To read the entire paper, click here.