Glossary

Overfitting

When your model knows the training data perfectly -- but fails on new projects. The classic learning trap.

Definition

Overfitting describes a state in which a machine learning model has learned the training data too precisely. It recognizes not only relevant patterns but also noise, outliers, and random peculiarities of the training dataset. The result: excellent performance on training data, but poor generalization to new, unseen data.

The causes are varied: too little training data, too many model parameters, training for too long without regularization, or choosing a model that is too complex for the available data. A decision tree with 1,000 leaves trained on 200 records will memorize every single record -- and fail on new data.
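The memorization trap above can be sketched in a few lines. The data, the noise rate, and the two models below are all hypothetical, chosen only to illustrate the effect: a "memorizer" (like a tree with one leaf per record) scores perfectly on its own training data, while a simple rule generalizes better.

```python
import random

random.seed(0)

# Hypothetical toy data: feature x in [0, 1), true rule "label 1 if x > 0.5",
# with 20% label noise standing in for outliers in the training set.
def make_data(n):
    data = []
    for _ in range(n):
        x = random.random()
        label = 1 if x > 0.5 else 0
        if random.random() < 0.2:  # random noise flips the label
            label = 1 - label
        data.append((x, label))
    return data

train, test = make_data(200), make_data(200)

# "Overfit" model: memorizes every training record and falls back to the
# nearest memorized point on unseen inputs -- so it learns the noise too.
memory = dict(train)
def memorizer(x):
    if x in memory:
        return memory[x]
    nearest = min(memory, key=lambda m: abs(m - x))
    return memory[nearest]

# Simple model: the underlying threshold rule, which ignores the noise.
def simple(x):
    return 1 if x > 0.5 else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print(accuracy(memorizer, train))  # 1.0 -- perfect recall of training data
print(accuracy(memorizer, test))   # clearly below its own training score
print(accuracy(simple, test))      # close to the best achievable under noise
```

The memorizer's gap between training and test accuracy is the signature of overfitting; the simple rule trades training accuracy for generalization.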

Ben Kraiem et al. (2023) addressed this problem through cross-validation: the 99 projects were split into training and test sets, so the model was never trained and evaluated on the same data. The 94.4% accuracy on the test set shows that the Gradient Boosting model generalized -- it did not merely memorize.
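The paper's exact evaluation protocol is not reproduced here, but the underlying idea of k-fold cross-validation is easy to sketch in plain Python. The helper names (`k_fold_indices`, `cross_validate`) and the majority-class baseline are illustrative assumptions, not the study's actual setup.

```python
import random

def k_fold_indices(n, k, seed=42):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(records, labels, train_fn, k=5):
    """Train on k-1 folds, score on the held-out fold, average accuracy."""
    folds = k_fold_indices(len(records), k)
    scores = []
    for i, test_fold in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = train_fn([records[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        correct = sum(model(records[j]) == labels[j] for j in test_fold)
        scores.append(correct / len(test_fold))
    return sum(scores) / len(scores)

# Hypothetical usage: 99 "projects" scored with a majority-class baseline.
def majority_baseline(xs, ys):
    majority = max(set(ys), key=ys.count)
    return lambda x: majority

records = list(range(99))
labels = [0] * 80 + [1] * 19
score = cross_validate(records, labels, majority_baseline, k=5)
```

Because every record serves as test data exactly once, the averaged score estimates performance on unseen data rather than recall of memorized data.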

Why it matters

Overfitting is particularly dangerous in risk management: a model that has merely memorized past projects delivers confident-looking predictions that do not hold up on the very projects it is meant to assess.

The most important countermeasures are cross-validation, regularization (L1/L2), early stopping, dropout in neural networks, and choosing a simpler model when the data allows it.
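Of these countermeasures, early stopping is the easiest to sketch generically: halt training once validation loss stops improving. The function below is a minimal, framework-agnostic illustration; `step_fn` and `val_loss_fn` are assumed callbacks, not part of any specific library.

```python
def train_with_early_stopping(step_fn, val_loss_fn, max_epochs=100, patience=5):
    """Stop training when validation loss has not improved for `patience`
    consecutive epochs. step_fn() runs one training epoch; val_loss_fn()
    returns the current loss on held-out validation data."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        step_fn()
        loss = val_loss_fn()
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return epoch + 1, best_loss
```

The key design choice is measuring loss on validation data, not training data: training loss keeps falling even while the model starts fitting noise, whereas validation loss turns upward at exactly that point.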

Aversight and Overfitting

Aversight combats overfitting on multiple levels. Technically, through cross-validation and regularization in all models. Practically, through the concept of "living models": every week, new project data flows in, the model is retrained, and performance drift is detected automatically. If test accuracy drops, the model is reset or retrained. Additionally, we use ensemble methods -- not individual decision trees, but combinations of many weak learners that together are more robust than any single one.
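Why combining weak learners helps can be shown with a toy majority vote. This is a simplified sketch, not Aversight's actual implementation: each hypothetical learner is right only 60% of the time, yet the vote of many independent learners is right far more often.

```python
import random
from collections import Counter

def majority_vote(learners, x):
    """Combine learners by majority vote: robust as long as individual
    errors are not perfectly correlated."""
    votes = Counter(learner(x) for learner in learners)
    return votes.most_common(1)[0][0]

random.seed(1)

# Hypothetical weak learner: returns the true label with probability 0.6,
# independently of the others (the input x is ignored in this toy setup).
def make_weak_learner(true_label, p_correct=0.6):
    def learner(x):
        return true_label if random.random() < p_correct else 1 - true_label
    return learner

learners = [make_weak_learner(true_label=1) for _ in range(101)]
trials = 200
hits = sum(majority_vote(learners, x=None) == 1 for _ in range(trials))
print(hits / trials)  # well above the 0.6 of any single learner
```

This is the intuition behind boosting and other ensemble methods: individual trees may overfit in different ways, but their errors partly cancel out in aggregate.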
