Six levers for managing AI challenges. The master framework for the course.
From model metrics to business impact. Connecting FP/FN to dollar consequences.
Class discussion on podcast insights: types of bias, where they enter the pipeline, who bears the cost.
Case discussion: AI deployment in lending, fairness tradeoffs, regulatory landscape.
Dr. Addas's framework organizes AI challenges and risks into six categories, the six levers. Every AI risk discussion in class should be mapped to one or more of these levers. Know them cold: they'll appear in cases, in class discussion, and likely on the individual assignment report.
For any binary classifier (one that predicts yes/no), every prediction falls into one of four cells: true positive (TP), false positive (FP), false negative (FN), or true negative (TN).
This example from class shows how each confusion matrix cell maps to a real dollar outcome:
| Outcome | What it means | Business Cost |
|---|---|---|
| TP — Predicted default, actually defaults | Loan correctly denied to a defaulter | −$1,000 (processing costs) |
| FP — Predicted default, actually repays | Good customer wrongly denied loan | −$13,000 (lost interest income) |
| FN — Predicted repay, actually defaults | Defaulter incorrectly given loan | −$1,013,000 (principal + lost interest) |
| TN — Predicted repay, actually repays | Good customer correctly approved | +$11,000 (interest income) |
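To see why the dollar view can disagree with accuracy, the sketch below multiplies each cell's count by its cost from the table. The per-cell costs come from the class example; the two sets of counts (`cautious`, `lenient`) and the 1,000-applicant portfolio are hypothetical numbers invented for illustration, not results from any real model.

```python
# Per-decision dollar outcomes taken from the table above.
# "Positive" here means "predicted default".
COST = {"TP": -1_000, "FP": -13_000, "FN": -1_013_000, "TN": 11_000}


def portfolio_value(counts):
    """Sum the dollar outcome over all four confusion matrix cells."""
    return sum(COST[cell] * n for cell, n in counts.items())


# Hypothetical counts for two models scoring the same 1,000 applicants
# (950 repayers, 50 defaulters). Illustrative only.
cautious = {"TP": 45, "FP": 100, "FN": 5, "TN": 850}
lenient = {"TP": 30, "FP": 20, "FN": 20, "TN": 930}

for name, counts in [("cautious", cautious), ("lenient", lenient)]:
    total = sum(counts.values())
    accuracy = (counts["TP"] + counts["TN"]) / total
    print(f"{name}: accuracy={accuracy:.1%}, value=${portfolio_value(counts):,}")
```

Under these assumed counts the lenient model is the more accurate of the two (96.0% vs. 89.5%) yet loses money overall, because each false negative costs far more than any other cell.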
These metrics tell different stories. Know which one the business actually cares about.
| Metric | Formula | What it measures |
|---|---|---|
| Accuracy | (TP + TN) / Total | Overall correctness. Misleading with class imbalance. |
| Precision | TP / (TP + FP) | Of all positive predictions, how many were right? |
| Recall (Sensitivity) | TP / (TP + FN) | Of all actual positives, how many did we catch? |
| F1 Score | 2 × Precision × Recall / (Precision + Recall) | Harmonic mean of Precision and Recall. Balance metric when both FP and FN matter. |
| FPR | FP / (FP + TN) | Rate of false alarms among innocents. Key fairness metric. |
| FNR | FN / (FN + TP) | Rate of missed catches among true positives. |
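The formulas in the table can be computed directly from the four counts. The following is a minimal sketch: the function name `classification_metrics` is hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something the course specifies. The example counts are invented to show why accuracy misleads under class imbalance.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the six metrics in the table from raw confusion matrix counts.

    Assumption: a metric with a zero denominator is reported as 0.0.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical imbalanced data: 10 positives among 1,000 cases.
# A model that always predicts "negative" never catches a positive.
always_negative = classification_metrics(tp=0, fp=0, fn=10, tn=990)
print(always_negative["accuracy"])  # 0.99
print(always_negative["recall"])    # 0.0
```

A model that never flags anyone scores 99% accuracy here while catching none of the positives, which is why recall and FNR need to be read alongside accuracy.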
The four error rates in the table below (FPR, FNR, FOR, FDR) are often used to evaluate whether an AI model is "fair" across demographic groups. The fundamental problem: when base rates differ between groups, an imperfect classifier cannot equalize all four across groups at once.
| Metric | Formula | Plain English | Who it protects |
|---|---|---|---|
| FPR (False Positive Rate) | FP / (FP + TN) | Of people who DON'T reoffend, what share is wrongly flagged high-risk? | The innocent: protects from wrongful classification |
| FNR (False Negative Rate) | FN / (FN + TP) | Of people who DO reoffend, what share is wrongly cleared? | Public safety: catches the dangerous |
| FOR (False Omission Rate) | FN / (FN + TN) | Of people cleared, what share actually reoffends? | Reliability of low-risk label |
| FDR (False Discovery Rate) | FP / (FP + TP) | Of people flagged high-risk, what share is actually innocent? | Reliability of high-risk label |
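A small numeric sketch can make the base-rate problem concrete. The counts below are hypothetical (two groups of 1,000 people each, not COMPAS data) and were chosen so that both groups have identical FPR and FNR while their base rates differ (50% vs. 20%). The function name `error_rates` is likewise just an illustrative helper.

```python
def error_rates(tp, fp, fn, tn):
    """Return the four error rates from the table for one group."""
    return {
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
        "FOR": fn / (fn + tn),
        "FDR": fp / (fp + tp),
    }


# Hypothetical counts, 1,000 people per group; not COMPAS data.
# Group A base rate: 500/1000 = 50%. Group B base rate: 200/1000 = 20%.
# Both groups are constructed to share FPR = 20% and FNR = 30%.
group_a = error_rates(tp=350, fp=100, fn=150, tn=400)
group_b = error_rates(tp=140, fp=160, fn=60, tn=640)

for name, rates in [("A", group_a), ("B", group_b)]:
    print(name, {k: round(v, 3) for k, v in rates.items()})
```

With FPR and FNR held equal, FOR and FDR come apart: in this constructed example a high-risk flag is wrong far more often in the lower-base-rate group B than in group A, while a low-risk label is less reliable in group A.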
ProPublica (2016): COMPAS is racially biased. Black defendants are nearly twice as likely to be wrongly flagged as high-risk (FPR: Black ~45%, White ~23%). False alarms systematically harm Black defendants through harsher bail/parole decisions.
Northpointe's response: COMPAS is calibrated fairly. Within any given score level, the actual reoffense rate is the same for Black and White defendants. Predictive parity holds — the score means the same thing regardless of race.
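Both sides can be right at once because of an identity that follows from the metric definitions: FPR = p / (1 − p) × (1 − PPV) / PPV × (1 − FNR), where p is a group's base rate and PPV = 1 − FDR is the share of high-risk flags that are correct. The sketch below plugs in hypothetical values (not the COMPAS figures) and, for simplicity, assumes PPV and FNR are identical across the two groups; the function name `implied_fpr` is an illustrative helper.

```python
def implied_fpr(base_rate, ppv, fnr):
    """FPR forced by a group's base rate once PPV and FNR are fixed.

    Follows from PPV = TP / (TP + FP) with TP = p * (1 - FNR) and
    FP = (1 - p) * FPR, where p is the base rate.
    """
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * (1 - fnr)


# Hypothetical values chosen for illustration; not the COMPAS figures.
PPV, FNR = 0.6, 0.4
for group, base_rate in [("higher base rate", 0.5), ("lower base rate", 0.3)]:
    print(f"{group}: FPR = {implied_fpr(base_rate, PPV, FNR):.3f}")
```

When the score "means the same thing" in both groups (equal PPV), the group with the higher base rate ends up with the higher false positive rate, so predictive parity and equal FPR cannot both hold unless base rates match or the classifier is perfect.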
In class: Be ready to argue both sides and then pivot to: "So what should a responsible leader do?" The answer isn't a single correct metric — it's transparency about the tradeoffs being made and accountability for the choice.
ML bias doesn't come from a single source. Knowing where each type enters the pipeline determines how to address it.
| Bias Type | Category | Where It Enters | Example |
|---|---|---|---|
| Historical bias | Data | Training data reflects past injustice | AI trained on historical loan approvals inherits redlining patterns |
| Label bias | Data | Target variable is a proxy, not the true outcome | Re-arrest ≠ re-offense; COMPAS predicts re-arrest, which is affected by policing intensity |
| Sampling bias | Data | Training data over/under-represents groups | Facial recognition trained mostly on lighter-skinned faces; fails on darker-skinned faces |
| Proxy variable bias | Algorithm | Excluded protected attribute leaks through correlated features | Removing "race" from COMPAS does not remove its influence; zip code and priors_count act as proxies |
| Feedback loop bias | Algorithm | Model output influences future data collection | Predictive policing sends more cops to flagged areas → more arrests there → "validates" the model |
| Automation bias | User | Humans defer to AI without critical scrutiny | Judges accepting COMPAS scores without examining underlying factors |
| Confirmation bias | User | AI confirms existing beliefs; contradicting evidence ignored | Hiring AI confirms existing workforce demographics as "ideal candidate" |
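The feedback loop row in the table can be illustrated with a deliberately simple toy simulation. Everything in it is an assumption made for illustration: two areas with the same true offense rate, a fixed number of patrols, recorded arrests proportional to patrol presence, and next round's patrols allocated in proportion to recorded arrests. It is not a model of any real predictive policing system.

```python
def simulate_patrols(rounds=5, initial_share_a=0.6, true_rate=0.1, patrols=100):
    """Toy loop: patrols follow recorded arrests, arrests follow patrols.

    Both areas have the SAME true offense rate; only the starting
    patrol allocation differs. All parameters are hypothetical.
    """
    share_a = initial_share_a
    history = []
    for _ in range(rounds):
        # Recorded arrests depend on where officers are, not only on crime.
        arrests_a = patrols * share_a * true_rate
        arrests_b = patrols * (1 - share_a) * true_rate
        # Next round's allocation is "learned" from recorded arrests.
        share_a = arrests_a / (arrests_a + arrests_b)
        history.append((arrests_a, arrests_b))
    return history


for round_no, (a, b) in enumerate(simulate_patrols(), start=1):
    print(f"round {round_no}: area A arrests={a:.1f}, area B arrests={b:.1f}")
```

In this toy setup the initial 60/40 imbalance never corrects itself: area A keeps producing more recorded arrests every round even though its true offense rate equals area B's, so the data appears to confirm the original allocation.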
This case examines how a financial institution navigates AI deployment in consumer lending — credit scoring, loan approvals, fraud detection. The central tension: AI improves efficiency and profitability, but fair lending laws (ECOA, Fair Housing Act) impose constraints that pure accuracy optimization would violate.