Two numbers that determine whether your risk model is useful -- or just annoying.
Precision and recall are two of the most important evaluation metrics for classification models, especially when the classes are imbalanced. Each answers a different question, and each kind of error carries a different cost.
Precision = Of all cases predicted as positive, how many were actually positive?
Formula: TP / (TP + FP)
Question: When the model sounds an alarm, how likely is it that something is actually happening?
Recall = Of all actually positive cases, how many were detected by the model?
Formula: TP / (TP + FN)
Question: How many of the actual escalations did the model catch, and how many slipped through?
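As a quick illustration, here is a minimal Python sketch applying both formulas to confusion-matrix counts. The numbers (tp, fp, fn) are made up for the example:

```python
# Minimal sketch: precision and recall straight from the formulas above.
# The confusion-matrix counts are invented for illustration.
tp = 80   # true positives:  escalations the model flagged correctly
fp = 20   # false positives: alarms with nothing behind them
fn = 40   # false negatives: escalations the model missed

precision = tp / (tp + fp)   # 0.80 -- when it alarms, it is usually right
recall    = tp / (tp + fn)   # 0.67 -- but it misses a third of escalations
print(f"precision={precision:.2f}, recall={recall:.2f}")
```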
Precision and recall trade off against each other: if you maximize recall (find every escalation), precision drops (more false alarms); if you maximize precision (alarm only when certain), recall drops (more missed escalations).
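The trade-off is easiest to see by sweeping the decision threshold. The sketch below does this on synthetic data (a 5% positive rate, with positives scoring higher on average); the exact numbers depend on the random scores, but the direction of the trade-off does not:

```python
# Sweep the decision threshold on synthetic scores: as the threshold rises,
# precision goes up and recall goes down. All data here is simulated.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
y = rng.binomial(1, 0.05, size=n)                        # ~5% true escalations
scores = np.clip(rng.normal(0.3 + 0.4 * y, 0.15), 0, 1)  # positives score higher

for t in (0.3, 0.5, 0.7):
    pred = scores >= t
    tp = int(np.sum(pred & (y == 1)))
    fp = int(np.sum(pred & (y == 0)))
    fn = int(np.sum(~pred & (y == 1)))
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    print(f"threshold={t:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```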
Ben Kraiem et al. (2023) used accuracy as the main metric (94.4%), but in risk management practice, precision and recall are far more meaningful. A model with 94% accuracy can still miss 50% of critical cases at a 5% escalation rate -- if the minority class is systematically classified worse.
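To make that arithmetic concrete, here is a worked example with made-up counts for 1,000 projects at a 5% escalation rate:

```python
# Worked example (invented counts): at a 5% escalation rate, a model can
# reach 94% accuracy while still missing half of all critical cases.
total = 1000                      # 1,000 projects, 50 of them escalate (5%)
tp, fn, fp = 25, 25, 35           # half of the escalations are missed
tn = total - tp - fn - fp         # 915 correctly quiet cases

accuracy = (tp + tn) / total      # 0.94 -- looks great on paper
recall = tp / (tp + fn)           # 0.50 -- half the escalations slip through
precision = tp / (tp + fp)        # 0.42 -- only 4 in 10 alarms are real
print(accuracy, recall, precision)
```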
In risk management, the choice between precision and recall is a strategic decision, and the right balance depends on context: a fire alarm should have high recall (better to ring once too often than once too rarely), while a drug test should have high precision (a false positive has serious consequences for the person tested).
Aversight does not optimize for a single metric but for business value. Our models are calibrated so that recall for critical escalations stays above 90% -- we do not want to miss budget overruns. At the same time, we keep precision at a level that does not overwhelm the operations team. Users can shift the balance with a slider -- more safety (high recall) or more efficiency (high precision) -- and the model adjusts its decision threshold dynamically; a sketch of how such a slider could work follows below.
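Aversight's internal implementation is not shown here; the following is a minimal sketch under stated assumptions: the slider value is mapped to a recall target, which is then enforced against scores on a labeled validation set. The function name choose_threshold, the safety parameter, and the recall_floor default are all illustrative, not Aversight's actual API.

```python
# Hypothetical sketch of a safety/efficiency slider: pick the highest
# decision threshold whose validation recall still meets the target
# derived from the slider position. Names and defaults are illustrative.
import numpy as np

def choose_threshold(scores, y_true, safety=0.5, recall_floor=0.90):
    """safety=1 enforces the full recall floor (maximum safety);
    safety=0 drops the floor and picks the most precise threshold."""
    required = recall_floor * safety           # interpolated recall target
    best = None
    for t in np.unique(scores):                # candidate thresholds, ascending
        pred = scores >= t
        tp = np.sum(pred & (y_true == 1))
        fn = np.sum(~pred & (y_true == 1))
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if recall >= required:
            best = t                           # keep the highest qualifying t
    return best

# Usage (val_scores / val_labels are assumed validation-set arrays):
# threshold = choose_threshold(val_scores, val_labels, safety=0.8)
# alerts = new_scores >= threshold
```

Because recall can only fall as the threshold rises, taking the highest threshold that still meets the recall target yields the fewest alarms -- and thus the best precision -- compatible with the chosen safety level.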
It takes 30 seconds -- and we will get back to you within 24 hours.
Start Free Maturity Check →