When 95% of your projects are green and only 5% are red, the model learns to always say "green." That is not a bug -- it is math.
Imbalanced data describes a machine learning problem where the target classes in a dataset are very unevenly distributed. A classic example: In a portfolio of 100 projects, 95 are stable and 5 are critical. A model that always predicts "stable" achieves 95% accuracy -- and is still worthless because it does not recognize a single critical situation.
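The accuracy paradox above is easy to reproduce. The following sketch uses the hypothetical 95/5 portfolio from the example; all names and numbers are illustrative:

```python
# The "always stable" baseline on a 95/5 portfolio (numbers from the
# example above, purely illustrative).
labels = ["stable"] * 95 + ["critical"] * 5   # ground truth
predictions = ["stable"] * 100                # model always says "stable"

# Overall accuracy: fraction of predictions that match the truth.
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)

# How many of the 5 critical projects did the model actually flag?
critical_found = sum(
    p == "critical" for p, t in zip(predictions, labels) if t == "critical"
)

print(f"Accuracy: {accuracy:.0%}")                       # 95%
print(f"Critical projects found: {critical_found} of 5")  # 0 of 5
```

95% accuracy, zero critical projects detected -- exactly the failure mode the text describes.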
The problem is widespread. Credit card fraud (0.1% of all transactions), machine failures (2% of all devices), project escalations (5% of all projects) -- in every case, the minority class is the actually interesting one but statistically underrepresented.
Ben Kraiem et al. (2023) had a similar problem: 61 Traditional projects vs. 38 Agile projects. Without countermeasures, the model would systematically favor Traditional -- regardless of actual project characteristics.
Imbalanced data leads to three practical problems: accuracy becomes a misleading metric (the "always stable" model scores 95%), the minority class -- the cases you actually care about -- goes undetected, and the model develops a systematic bias toward the majority class regardless of actual input characteristics.
Solutions are varied: SMOTE (synthetic data generation), cost-sensitive learning (higher penalty for errors in the minority class), or simply choosing the right evaluation metrics (precision, recall, F1-score instead of accuracy).
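The metric side of this is worth making concrete. The sketch below computes precision, recall, and F1 from a hypothetical confusion matrix (100 projects, 5 critical, the model flags 4 and gets 3 right); the counts are invented for illustration:

```python
# Hypothetical confusion-matrix counts: TP=3 critical correctly flagged,
# FP=1 false alarm, FN=2 critical cases missed, TN=94 stable correctly passed.
tp, fp, fn, tn = 3, 1, 2, 94

accuracy  = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # how many alerts were justified
recall    = tp / (tp + fn)   # how many critical cases were caught
f1        = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Accuracy still reads a comfortable 0.97, while recall (0.60) reveals that two of five critical projects slipped through -- which is why precision, recall, and F1 are the right lenses here. (For cost-sensitive learning, many libraries expose this directly, e.g. scikit-learn's `class_weight="balanced"` option.)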
Aversight addresses imbalanced data on three levels: First, through cost-sensitive learning -- a missed budget alert is weighted more heavily than a false alarm. Second, through dynamic threshold adjustment: when the escalation rate rises in a quarter, the system automatically lowers the alert threshold. Third, through continuous retraining: every new escalation event immediately flows into the model, so the minority class steadily grows and is better learned.
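The second lever -- dynamic threshold adjustment -- can be sketched as a simple rule. This is a hypothetical illustration of the idea, not Aversight's actual implementation; the function name, rates, and the proportional formula are all assumptions:

```python
def adjust_threshold(base_threshold: float,
                     baseline_rate: float,
                     current_rate: float,
                     sensitivity: float = 0.5,
                     floor: float = 0.1) -> float:
    """Lower the alert threshold when escalations rise above baseline.

    Illustrative rule: the threshold drops in proportion to the relative
    rise in the escalation rate, never below a hard floor.
    """
    if current_rate <= baseline_rate:
        return base_threshold            # quiet quarter: keep the threshold
    rise = (current_rate - baseline_rate) / baseline_rate
    return max(floor, base_threshold * (1 - sensitivity * rise))

# Baseline quarter: 5% escalation rate; current quarter: 8%.
# The 60% relative rise lowers a 0.5 threshold to 0.35.
print(adjust_threshold(0.5, 0.05, 0.08))
```

The design choice here is deliberate: a rising escalation rate means missed alerts are getting more expensive, so the system trades a few extra false alarms for higher recall -- the same asymmetry that cost-sensitive learning encodes during training.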
30 seconds -- and we will get back to you within 24 hours.
Start Free Maturity Check →