Two numbers that determine whether your risk model is useful -- or just annoying.
Precision and recall are two of the most important evaluation metrics for classification models, especially when the classes are imbalanced. Each answers a different question, and each kind of error carries a different cost.
Precision = Of all cases predicted as positive, how many were actually positive?
Formula: TP / (TP + FP)
Question: When the model sounds an alarm, how likely is it that something is actually happening?
Recall = Of all actually positive cases, how many were detected by the model?
Formula: TP / (TP + FN)
Question: How many of the actual escalations did the model catch, and how many slipped through?
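As a quick illustration, here is a minimal Python sketch applying both formulas to confusion-matrix counts. The numbers (tp, fp, fn) are made up for the example:

```python
# Minimal sketch: precision and recall straight from the formulas above.
# The confusion-matrix counts are invented for illustration.
tp = 80   # true positives:  escalations the model flagged correctly
fp = 20   # false positives: alarms with nothing behind them
fn = 40   # false negatives: escalations the model missed

precision = tp / (tp + fp)   # 0.80 -- when it alarms, it is usually right
recall    = tp / (tp + fn)   # 0.67 -- but it misses a third of escalations
print(f"precision={precision:.2f}, recall={recall:.2f}")
```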
Precision and recall trade off against each other: if you maximize recall (find every escalation), precision drops (more false alarms); if you maximize precision (alarm only when certain), recall drops (more missed escalations).
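The trade-off is easiest to see by sweeping the decision threshold. The sketch below does this on synthetic data (a 5% positive rate, with positives scoring higher on average); the exact numbers depend on the random scores, but the direction of the trade-off does not:

```python
# Sweep the decision threshold on synthetic scores: as the threshold rises,
# precision goes up and recall goes down. All data here is simulated.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
y = rng.binomial(1, 0.05, size=n)                        # ~5% true escalations
scores = np.clip(rng.normal(0.3 + 0.4 * y, 0.15), 0, 1)  # positives score higher

for t in (0.3, 0.5, 0.7):
    pred = scores >= t
    tp = int(np.sum(pred & (y == 1)))
    fp = int(np.sum(pred & (y == 0)))
    fn = int(np.sum(~pred & (y == 1)))
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    print(f"threshold={t:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```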
Ben Kraiem et al. (2023) used accuracy as the main metric (94.4%), but in risk management practice, precision and recall are far more meaningful. A model with 94% accuracy can still miss 50% of critical cases at a 5% escalation rate -- if the minority class is systematically classified worse.
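To make that arithmetic concrete, here is a worked example with made-up counts for 1,000 projects at a 5% escalation rate:

```python
# Worked example (invented counts): at a 5% escalation rate, a model can
# reach 94% accuracy while still missing half of all critical cases.
total = 1000                      # 1,000 projects, 50 of them escalate (5%)
tp, fn, fp = 25, 25, 35           # half of the escalations are missed
tn = total - tp - fn - fp         # 915 correctly quiet cases

accuracy = (tp + tn) / total      # 0.94 -- looks great on paper
recall = tp / (tp + fn)           # 0.50 -- half the escalations slip through
precision = tp / (tp + fp)        # 0.42 -- only 4 in 10 alarms are real
print(accuracy, recall, precision)
```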
In risk management, the choice between precision and recall is a strategic decision, and the right balance depends on context: a fire alarm should have high recall (better to ring once too often than once too rarely), while a drug test should have high precision (a false positive has serious consequences for the person tested).
Aversight does not optimize for a single metric but for business value. Our models are calibrated so that recall for critical escalations stays above 90% -- we do not want to miss budget overruns. At the same time, we keep precision at a level that does not overwhelm the operations team. Users can shift the balance with a slider -- more safety (high recall) or more efficiency (high precision) -- and the model adjusts its decision threshold dynamically; a sketch of how such a slider could work follows below.
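Aversight's internal implementation is not shown here; the following is a minimal sketch under stated assumptions: the slider value is mapped to a recall target, which is then enforced against scores on a labeled validation set. The function name choose_threshold, the safety parameter, and the recall_floor default are all illustrative, not Aversight's actual API.

```python
# Hypothetical sketch of a safety/efficiency slider: pick the highest
# decision threshold whose validation recall still meets the target
# derived from the slider position. Names and defaults are illustrative.
import numpy as np

def choose_threshold(scores, y_true, safety=0.5, recall_floor=0.90):
    """safety=1 enforces the full recall floor (maximum safety);
    safety=0 drops the floor and picks the most precise threshold."""
    required = recall_floor * safety           # interpolated recall target
    best = None
    for t in np.unique(scores):                # candidate thresholds, ascending
        pred = scores >= t
        tp = np.sum(pred & (y_true == 1))
        fn = np.sum(~pred & (y_true == 1))
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if recall >= required:
            best = t                           # keep the highest qualifying t
    return best

# Usage (val_scores / val_labels are assumed validation-set arrays):
# threshold = choose_threshold(val_scores, val_labels, safety=0.8)
# alerts = new_scores >= threshold
```

Because recall can only fall as the threshold rises, taking the highest threshold that still meets the recall target yields the fewest alarms -- and thus the best precision -- compatible with the chosen safety level.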
It takes 30 seconds -- and we will get back to you within 24 hours.
Start Free Maturity Check →