Glossary

Overfitting

When your model knows the training data perfectly -- but fails on new projects. The classic learning trap.

Definition

Overfitting describes a state in which a machine learning model has learned the training data too precisely. It recognizes not only relevant patterns but also noise, outliers, and random peculiarities of the training dataset. The result: excellent performance on training data, but poor generalization to new, unseen data.

The causes are varied: too little training data, too many model parameters, training for too long without regularization, or choosing a model that is too complex for the available data. A decision tree with 1,000 leaves trained on 200 records will memorize every single record -- and fail on new data.
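The memorization trap above can be sketched in a few lines. The data, the noise rate, and the two models below are all hypothetical, chosen only to illustrate the effect: a "memorizer" (like a tree with one leaf per record) scores perfectly on its own training data, while a simple rule generalizes better.

```python
import random

random.seed(0)

# Hypothetical toy data: feature x in [0, 1), true rule "label 1 if x > 0.5",
# with 20% label noise standing in for outliers in the training set.
def make_data(n):
    data = []
    for _ in range(n):
        x = random.random()
        label = 1 if x > 0.5 else 0
        if random.random() < 0.2:  # random noise flips the label
            label = 1 - label
        data.append((x, label))
    return data

train, test = make_data(200), make_data(200)

# "Overfit" model: memorizes every training record and falls back to the
# nearest memorized point on unseen inputs -- so it learns the noise too.
memory = dict(train)
def memorizer(x):
    if x in memory:
        return memory[x]
    nearest = min(memory, key=lambda m: abs(m - x))
    return memory[nearest]

# Simple model: the underlying threshold rule, which ignores the noise.
def simple(x):
    return 1 if x > 0.5 else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print(accuracy(memorizer, train))  # 1.0 -- perfect recall of training data
print(accuracy(memorizer, test))   # clearly below its own training score
print(accuracy(simple, test))      # close to the best achievable under noise
```

The memorizer's gap between training and test accuracy is the signature of overfitting; the simple rule trades training accuracy for generalization.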

Ben Kraiem et al. (2023) addressed this problem through cross-validation: the 99 projects were split into training and test sets, so the model was never trained and evaluated on the same data. The 94.4% accuracy on the test set shows that the Gradient Boosting model generalized -- it did not merely memorize.
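The paper's exact evaluation protocol is not reproduced here, but the underlying idea of k-fold cross-validation is easy to sketch in plain Python. The helper names (`k_fold_indices`, `cross_validate`) and the majority-class baseline are illustrative assumptions, not the study's actual setup.

```python
import random

def k_fold_indices(n, k, seed=42):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(records, labels, train_fn, k=5):
    """Train on k-1 folds, score on the held-out fold, average accuracy."""
    folds = k_fold_indices(len(records), k)
    scores = []
    for i, test_fold in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = train_fn([records[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        correct = sum(model(records[j]) == labels[j] for j in test_fold)
        scores.append(correct / len(test_fold))
    return sum(scores) / len(scores)

# Hypothetical usage: 99 "projects" scored with a majority-class baseline.
def majority_baseline(xs, ys):
    majority = max(set(ys), key=ys.count)
    return lambda x: majority

records = list(range(99))
labels = [0] * 80 + [1] * 19
score = cross_validate(records, labels, majority_baseline, k=5)
```

Because every record serves as test data exactly once, the averaged score estimates performance on unseen data rather than recall of memorized data.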

Why it matters

Overfitting is particularly dangerous in risk management: a model that has merely memorized past projects delivers confident-looking predictions that do not hold up on the very projects it is meant to assess.

The most important countermeasures are cross-validation, regularization (L1/L2), early stopping, dropout in neural networks, and choosing a simpler model when the data allows it.
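Of these countermeasures, early stopping is the easiest to sketch generically: halt training once validation loss stops improving. The function below is a minimal, framework-agnostic illustration; `step_fn` and `val_loss_fn` are assumed callbacks, not part of any specific library.

```python
def train_with_early_stopping(step_fn, val_loss_fn, max_epochs=100, patience=5):
    """Stop training when validation loss has not improved for `patience`
    consecutive epochs. step_fn() runs one training epoch; val_loss_fn()
    returns the current loss on held-out validation data."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        step_fn()
        loss = val_loss_fn()
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return epoch + 1, best_loss
```

The key design choice is measuring loss on validation data, not training data: training loss keeps falling even while the model starts fitting noise, whereas validation loss turns upward at exactly that point.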

Aversight and Overfitting

Aversight combats overfitting on multiple levels. Technically, through cross-validation and regularization in all models. Practically, through the concept of "living models": every week, new project data flows in, the model is retrained, and performance drift is detected automatically. If test accuracy drops, the model is reset or retrained. Additionally, we use ensemble methods -- not individual decision trees, but combinations of many weak learners that together are more robust than any single one.
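Why combining weak learners helps can be shown with a toy majority vote. This is a simplified sketch, not Aversight's actual implementation: each hypothetical learner is right only 60% of the time, yet the vote of many independent learners is right far more often.

```python
import random
from collections import Counter

def majority_vote(learners, x):
    """Combine learners by majority vote: robust as long as individual
    errors are not perfectly correlated."""
    votes = Counter(learner(x) for learner in learners)
    return votes.most_common(1)[0][0]

random.seed(1)

# Hypothetical weak learner: returns the true label with probability 0.6,
# independently of the others (the input x is ignored in this toy setup).
def make_weak_learner(true_label, p_correct=0.6):
    def learner(x):
        return true_label if random.random() < p_correct else 1 - true_label
    return learner

learners = [make_weak_learner(true_label=1) for _ in range(101)]
trials = 200
hits = sum(majority_vote(learners, x=None) == 1 for _ in range(trials))
print(hits / trials)  # well above the 0.6 of any single learner
```

This is the intuition behind boosting and other ensemble methods: individual trees may overfit in different ways, but their errors partly cancel out in aggregate.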
