MBUS 854 — AI For Leaders | Queen's Smith AMBA 2026

Session 2: Managing AI Challenges & Risks — Part 1

Dr. Shamel Addas  |  April 19, 2026
Topics: Addas Big 6 | Confusion Matrix | Fairness Metrics | COMPAS Case | ML Bias
Session Agenda
Pre-Class Submission Required: Check the syllabus. Session 2 may require an individual submission before class (case prep or reading response). Complete it before arriving.
1. Addas Big 6 Framework
Six levers for managing AI challenges. The master framework for the course.

2. Confusion Matrix → KPIs → Model P&L
From model metrics to business impact: connecting FP/FN to dollar consequences.

3. ML Bias Podcast Debrief
Class discussion on podcast insights: types of bias, where they enter the pipeline, who bears the cost.

4. Managing AI Risks in Consumer Banking Case
Case discussion: AI deployment in lending, fairness tradeoffs, regulatory landscape.

The Addas Big 6 Framework

Overview

Dr. Addas's framework organizes AI challenges and risks into six categories. Every AI risk discussion in class should be mapped to one or more of these levers. Know them cold: they'll appear in cases, in class discussion, and likely in the individual assignment report.

01
Data Quality
Garbage in, garbage out. Missing values, mislabeled data, historical bias baked into training sets. The most upstream source of AI failure.
02
Algorithm Design
Choice of model, optimization objective, and constraints. A model optimized for overall accuracy can still be discriminatory if its errors are unequally distributed across groups.
03
Human Oversight
Who reviews AI decisions, how often, with what authority to override? Automation bias — humans rubber-stamping AI — is a governance failure.
04
Transparency & Explainability
Can the model explain its decisions in terms stakeholders trust? Black-box models face regulatory risk and erode trust when things go wrong.
05
Accountability
When the AI is wrong, who is responsible — the vendor, the org, the user, the regulator? Accountability gaps enable harm at scale.
06
Fairness & Equity
Does the model's error rate vary systematically across demographic groups? Equal overall accuracy does not guarantee equal impact.
Key insight: The Big 6 are interdependent. Improving transparency (lever 4) without accountability (lever 5) means stakeholders can see the problem but nobody has the authority to fix it.
Confusion Matrix → Performance KPIs → Model P&L

The Confusion Matrix — Four Outcomes

Session 2 Slides — mbus854_02_sv_v2.pdf

For any binary classifier (predicts yes/no), every prediction falls into one of four cells:

Cell | Predicted | Actual | Meaning
True Positive (TP) | Positive | Positive | Correct catch.
False Positive (FP) | Positive | Negative | False alarm: penalizes the innocent.
False Negative (FN) | Negative | Positive | Missed catch: the case that slips through.
True Negative (TN) | Negative | Negative | Correct clearance.

Worked Example: Customer Loan Default Prediction

Session 2 — Predicting Customer Loan Default model

This example from class shows how each confusion matrix cell maps to a real dollar outcome:

Outcome | What it means | Business cost
TP: predicted default, actually defaults | Loan correctly denied to a defaulter | −$1,000 (processing costs)
FP: predicted default, actually repays | Good customer wrongly denied a loan | −$13,000 (lost interest income)
FN: predicted repay, actually defaults | Defaulter incorrectly given a loan | −$1,013,000 (principal + lost interest)
TN: predicted repay, actually repays | Good customer correctly approved | +$11,000 (interest income)
Leader takeaway: In this example, a False Negative costs about 78× as much as a False Positive ($1,013,000 vs $13,000). Raw accuracy weights every error equally, so a model tuned to minimize the cheaper error type (FP) at the expense of the expensive one (FN) can score well on accuracy and still produce negative expected value. Always ask: "What does each error type cost us?"
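To make the model P&L concrete, here is a minimal Python sketch that multiplies each confusion-matrix count by the per-outcome dollar figures from the class example. The two models and their counts (out of 1,000 applicants) are hypothetical and invented for illustration; only the dollar payoffs come from the table above.

```python
from dataclasses import dataclass

# Dollar payoff per outcome, taken from the class loan-default example.
# "Positive" means the model predicts (or the customer actually does) default.
PAYOFF = {"TP": -1_000, "FP": -13_000, "FN": -1_013_000, "TN": 11_000}


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int
    fp: int
    fn: int
    tn: int

    def accuracy(self) -> float:
        """Share of all predictions that were correct."""
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def pnl(self) -> int:
        """Model P&L: each cell's count times that cell's dollar payoff."""
        return (self.tp * PAYOFF["TP"] + self.fp * PAYOFF["FP"]
                + self.fn * PAYOFF["FN"] + self.tn * PAYOFF["TN"])


# Hypothetical results for two models scored on the same 1,000 applicants.
model_a = ConfusionMatrix(tp=20, fp=10, fn=30, tn=940)  # few false alarms, many misses
model_b = ConfusionMatrix(tp=45, fp=60, fn=5, tn=890)   # more false alarms, few misses

for name, cm in [("Model A", model_a), ("Model B", model_b)]:
    print(f"{name}: accuracy {cm.accuracy():.1%}, P&L ${cm.pnl():,}")
```

With these made-up counts, Model A is more accurate (96.0% vs 93.5%) yet loses $20.2M, while Model B earns $3.9M, because a missed defaulter costs roughly 78 times as much as a wrongly denied good customer.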

Derived KPIs from the Confusion Matrix

These metrics tell different stories. Know which one the business actually cares about.

Metric | Formula | What it measures
Accuracy | (TP + TN) / Total | Overall correctness. Misleading with class imbalance.
Precision | TP / (TP + FP) | Of all positive predictions, how many were right?
Recall (Sensitivity) | TP / (TP + FN) | Of all actual positives, how many did we catch?
F1 Score | 2 × Precision × Recall / (Precision + Recall) | Harmonic mean of precision and recall; a balance metric when both FP and FN matter.
FPR | FP / (FP + TN) | Rate of false alarms among actual negatives ("innocents"). Key fairness metric.
FNR | FN / (FN + TP) | Rate of missed catches among actual positives.
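The formulas in the table translate directly into code. The Python sketch below uses hypothetical counts to show how a model can post 95% accuracy on an imbalanced dataset while catching only one positive case in ten. Returning NaN when a denominator is zero is a design choice of this sketch, not a standard.

```python
import math


def ratio(num: float, den: float) -> float:
    """Safe division: NaN when the denominator is zero (metric undefined)."""
    return num / den if den else math.nan


def kpis(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Derived KPIs from confusion-matrix counts, using the table's formulas."""
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),  # harmonic mean
        "fpr": ratio(fp, fp + tn),
        "fnr": ratio(fn, fn + tp),  # equals 1 - recall
    }


# Hypothetical imbalanced dataset: 50 actual positives, 950 actual negatives.
metrics = kpis(tp=5, fp=5, fn=45, tn=945)
for name, value in metrics.items():
    print(f"{name:>9}: {value:.3f}")
```

Accuracy of 0.950 looks strong, but recall of 0.100 (FNR 0.900) means nine in ten actual positives are missed.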
Fairness Metrics — The ProPublica vs Northpointe Debate

Four Fairness Metrics — and Why They Can't All Be Satisfied Simultaneously

COMPAS Recidivism Analysis — Session 2 / Individual Assignment

These four metrics are often used to evaluate whether an AI model is "fair" across demographic groups. The fundamental problem: when base rates differ between groups, you cannot equalize all four at once, except in degenerate cases such as a perfect classifier.

Metric | Formula | Plain English | Who/what it protects
FPR (False Positive Rate) | FP / (FP + TN) | Of people who DON'T reoffend, what share is wrongly flagged high-risk? | The innocent: protects against wrongful high-risk classification
FNR (False Negative Rate) | FN / (FN + TP) | Of people who DO reoffend, what share is wrongly cleared? | Public safety: catches those who will reoffend
FOR (False Omission Rate) | FN / (FN + TN) | Of people cleared as low-risk, what share actually reoffends? | Reliability of the low-risk label
FDR (False Discovery Rate) | FP / (FP + TP) | Of people flagged high-risk, what share does not actually reoffend? | Reliability of the high-risk label
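A minimal Python sketch of computing the four metrics separately for each group, given actual outcomes (1 = reoffended), model flags (1 = high-risk), and a group label per person. The eleven-person dataset and the group names "A" and "B" are hypothetical; the data is built so both groups have the same FDR while one group's FPR is double the other's, echoing the pattern in the ProPublica/Northpointe debate.

```python
from collections import Counter


def confusion_counts(y_true, y_pred) -> Counter:
    """Tally TP/FP/FN/TN. 1 = reoffended (actual) or flagged high-risk (predicted)."""
    counts = Counter()
    for actual, flagged in zip(y_true, y_pred):
        if flagged and actual:
            counts["TP"] += 1
        elif flagged:
            counts["FP"] += 1
        elif actual:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def ratio(num: int, den: int) -> float:
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def fairness_metrics(y_true, y_pred) -> dict[str, float]:
    c = confusion_counts(y_true, y_pred)
    return {
        "FPR": ratio(c["FP"], c["FP"] + c["TN"]),
        "FNR": ratio(c["FN"], c["FN"] + c["TP"]),
        "FOR": ratio(c["FN"], c["FN"] + c["TN"]),
        "FDR": ratio(c["FP"], c["FP"] + c["TP"]),
    }


def metrics_by_group(y_true, y_pred, groups) -> dict[str, dict[str, float]]:
    """Compute the four metrics separately for each group label."""
    results = {}
    for g in sorted(set(groups)):
        idx = [i for i, label in enumerate(groups) if label == g]
        results[g] = fairness_metrics([y_true[i] for i in idx],
                                      [y_pred[i] for i in idx])
    return results


# Hypothetical data: group A has 5 people, group B has 6.
y_true = [1, 1, 0, 0, 0,   1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0,   1, 1, 0, 1, 1, 0]
groups = ["A"] * 5 + ["B"] * 6

results = metrics_by_group(y_true, y_pred, groups)
for g, m in results.items():
    print(g, {k: round(v, 2) for k, v in m.items()})
```

Here group B's FPR (about 0.67) is double group A's (about 0.33) even though FDR is 0.5 in both groups; the groups' base rates also differ (40% vs 50%).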

ProPublica vs Northpointe — The Core Argument

ProPublica (2016): COMPAS is racially biased. Among defendants who did not reoffend, Black defendants were nearly twice as likely as White defendants to be wrongly flagged as high-risk (FPR: Black 44.9%, White 23.5%). These false alarms systematically harm Black defendants through harsher bail and parole decisions.

Northpointe's response: COMPAS is calibrated fairly. Within any given score level, the actual reoffense rate is the same for Black and White defendants. Predictive parity holds — the score means the same thing regardless of race.

The impossibility: When base reoffense rates differ between groups (as they do in the COMPAS data, partly reflecting systemic factors such as policing patterns), you CANNOT simultaneously achieve equal FPR, equal FNR, and equal calibration, except in degenerate cases such as perfect prediction. Choosing which fairness criterion to satisfy is a values decision, not a math problem.
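The impossibility follows directly from the confusion-matrix definitions. Writing p for a group's base rate and PPV = 1 − FDR, the counts satisfy TP = (1 − FNR)·p·N and FP = TP·(1 − PPV)/PPV, which forces FPR = p/(1 − p) · (1 − PPV)/PPV · (1 − FNR) (see Chouldechova, 2017). This identity uses predictive parity (equal PPV), the threshold-level counterpart of calibration. The Python sketch below holds PPV and FNR equal across groups (the 0.60 and 0.35 values are hypothetical) and plugs in the approximate COMPAS base rates quoted in this guide.

```python
def implied_fpr(base_rate: float, ppv: float, fnr: float) -> float:
    """FPR forced by a group's base rate once PPV and FNR are fixed.

    From TP = (1 - FNR) * p * N and FP = TP * (1 - PPV) / PPV,
    FPR = FP / ((1 - p) * N) = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR).
    """
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Hypothetical: the model has the SAME precision (PPV) and SAME FNR in both groups.
PPV, FNR = 0.60, 0.35

# Approximate base rates from the COMPAS discussion in this guide.
base_rates = {"Black": 0.52, "White": 0.39}

for group, p in base_rates.items():
    print(f"{group}: base rate {p:.0%} -> implied FPR {implied_fpr(p, PPV, FNR):.1%}")
```

With these inputs the implied FPRs come out near 47% and 28%: equal PPV and equal FNR, combined with unequal base rates, mechanically produce unequal false positive rates. The only ways out are equal base rates or a degenerate model (PPV = 1 or FNR = 1).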

In class: Be ready to argue both sides and then pivot to: "So what should a responsible leader do?" The answer isn't a single correct metric — it's transparency about the tradeoffs being made and accountability for the choice.

COMPAS Recidivism — Key Discussion Points
Q: How can a model be simultaneously "fair" and "biased" depending on how you measure it?

The Setup

  • COMPAS produces a recidivism risk score (1–10) used in bail and parole decisions
  • Northpointe built it; courts use it; defendants don't know how it works
  • Dataset: ~7,000 defendants in Broward County, Florida

ProPublica's Evidence

  • Black defendants: FPR = 44.9% (nearly 1-in-2 innocents wrongly flagged high-risk)
  • White defendants: FPR = 23.5%
  • ProPublica highlighted individual cases in which White defendants with prior violent offenses were scored lower-risk than Black defendants with minor or no prior records

Northpointe's Defense

  • Calibration holds: defendants with the same score (e.g., 7) reoffend at roughly the same rate whether they are Black or White; the 1–10 decile score is a risk ranking, not a literal probability
  • Their definition of fairness: predictive accuracy should be equal across groups
  • Different base rates (Black: ~52% reoffend; White: ~39%) explain the FPR gap mathematically
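The two positions can be checked side by side. Below is a minimal Python sketch, assuming records with keys named after columns in ProPublica's public compas-scores-two-years.csv (`race`, `decile_score`, `two_year_recid`; verify these against your copy of the dataset) and assuming scores of 5 or above count as "high-risk". The 24 records are hypothetical, built so that each score carries the same reoffense rate in both groups (Northpointe's calibration check passes) while base rates differ, and an FPR gap of the kind ProPublica measured still appears.

```python
from collections import defaultdict


def make_records(race: str, score: int, outcomes: list[int]) -> list[dict]:
    """One record per defendant; outcome 1 = reoffended within two years."""
    return [{"race": race, "decile_score": score, "two_year_recid": y} for y in outcomes]


# Hypothetical, deliberately calibrated toy data (NOT the real COMPAS data).
records = (
    make_records("Black", 7, [1, 1, 1, 0, 1, 1, 1, 0])    # 6 of 8 reoffend
    + make_records("Black", 3, [0, 1, 0, 0])              # 1 of 4 reoffend
    + make_records("White", 7, [1, 1, 0, 1])              # 3 of 4 reoffend
    + make_records("White", 3, [0, 0, 1, 0, 0, 0, 1, 0])  # 2 of 8 reoffend
)


def calibration_table(rows) -> dict[tuple[str, int], float]:
    """Northpointe-style check: observed reoffense rate per (race, score) cell."""
    totals, reoffended = defaultdict(int), defaultdict(int)
    for r in rows:
        key = (r["race"], r["decile_score"])
        totals[key] += 1
        reoffended[key] += r["two_year_recid"]
    return {k: reoffended[k] / totals[k] for k in sorted(totals)}


def base_rates(rows) -> dict[str, float]:
    """Share of each group that reoffended."""
    totals, reoffended = defaultdict(int), defaultdict(int)
    for r in rows:
        totals[r["race"]] += 1
        reoffended[r["race"]] += r["two_year_recid"]
    return {g: reoffended[g] / totals[g] for g in sorted(totals)}


def fpr_at_threshold(rows, threshold: int = 5) -> dict[str, float]:
    """ProPublica-style check: share of non-reoffenders scored at or above threshold."""
    negatives, flagged = defaultdict(int), defaultdict(int)
    for r in rows:
        if r["two_year_recid"] == 0:
            negatives[r["race"]] += 1
            flagged[r["race"]] += int(r["decile_score"] >= threshold)
    return {g: flagged[g] / negatives[g] for g in sorted(negatives)}


print("Calibration:", calibration_table(records))
print("Base rates: ", base_rates(records))
print("FPR:        ", fpr_at_threshold(records))
```

In this toy data both groups reoffend at 75% at score 7 and 25% at score 3, yet the FPR is 0.40 for one group and about 0.14 for the other, because more of one group's members sit in the high-score bin. That is the debate in miniature.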

The Leadership Question

  • If you were the county judge using COMPAS: what would you need to know before you trusted this score?
  • If you were the vendor: what disclosure obligations exist? What governance would you build?
  • If you were a regulator: which fairness definition do you enshrine in law?
ML Bias Taxonomy — Podcast Debrief Prep

Three Categories of ML Bias

ML Bias Podcast — Pre-class prep; also directly relevant to individual assignment Q6 (7 pts)

ML bias doesn't come from a single source. Knowing where each type enters the pipeline determines how to address it.

Bias type | Category | Where it enters | Example
Historical bias | Data | Training data reflects past injustice | AI trained on historical loan approvals inherits redlining patterns
Label bias | Data | Target variable is a proxy, not the true outcome | Re-arrest ≠ re-offense; COMPAS predicts re-arrest, which is affected by policing intensity
Sampling bias | Data | Training data over- or under-represents groups | Facial recognition trained mostly on lighter-skinned faces fails more often on darker-skinned faces
Proxy variable bias | Algorithm | Excluded protected attribute leaks through correlated features | COMPAS excludes race as an input, yet correlated features (e.g., priors_count, or location-based variables such as zip code) can act as proxies
Feedback loop bias | Algorithm | Model output influences future data collection | Predictive policing sends more officers to flagged areas → more arrests there → the data "validates" the model
Automation bias | User | Humans defer to AI without critical scrutiny | Judges accepting COMPAS scores without examining underlying factors
Confirmation bias | User | AI confirms existing beliefs; contradicting evidence is ignored | Hiring AI treats existing workforce demographics as the "ideal candidate"
Insight for your assignment report (Q6): The strongest answers don't just list bias types — they trace each bias to a specific source in the COMPAS workflow and explain what mitigation was or wasn't applied.
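One rough way to screen a feature for proxy risk is to check how strongly it tracks the protected attribute that was removed. The Python sketch below computes group means and the Pearson correlation between a 0/1 group indicator and a candidate feature. The eight-person data is hypothetical, and a low correlation does not prove a feature is safe: proxies can be nonlinear or emerge only from combinations of features.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def group_means(values, groups) -> dict[int, float]:
    """Average feature value within each group."""
    out = {}
    for g in sorted(set(groups)):
        vals = [v for v, label in zip(values, groups) if label == g]
        out[g] = sum(vals) / len(vals)
    return out


# Hypothetical data: 1 = member of the protected group, 0 = not.
protected = [0, 0, 0, 0, 1, 1, 1, 1]
priors_count = [0, 1, 1, 2, 3, 4, 2, 5]  # candidate proxy feature (made-up values)

print("Group means:", group_means(priors_count, protected))
print("Correlation:", round(pearson(protected, priors_count), 3))
```

A correlation near 0.8 means the feature carries much of the group information even though the protected attribute itself was dropped from the model's inputs.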
Managing AI Risks in Consumer Banking — Case Prep

Case Context

This case examines how a financial institution navigates AI deployment in consumer lending: credit scoring, loan approvals, fraud detection. The central tension: AI improves efficiency and profitability, but fair lending laws (ECOA, Fair Housing Act) impose constraints that pure accuracy optimization can violate.

Q: How should a bank balance AI-driven efficiency with fairness obligations and regulatory risk?

Key Tensions to Argue

  • Higher model accuracy may come at the cost of disparate impact on protected classes
  • Explainability requirements (regulators want reasons for denial) conflict with black-box model complexity
  • Over-reliance on alternative data (social media, purchase history) risks reproducing socioeconomic inequality
  • Third-party AI vendors create accountability gaps — who is liable when the model discriminates?
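To see why interpretable models ease the explainability tension, consider a logistic-regression scorecard: each feature's contribution to the score is its coefficient times how far the applicant sits from a reference point, so the most negative contributions can be reported as the principal reasons for a denial. The Python sketch below uses hypothetical coefficients and feature names on a standardized scale, with an all-zero baseline standing in for an average reference applicant (an assumption). It illustrates one common scorecard technique, not a method prescribed by ECOA or its regulations.

```python
import math

# Hypothetical scorecard: coefficients on standardized (z-score) features.
COEFS = {
    "debt_to_income": -1.2,
    "credit_history_years": 0.8,
    "recent_delinquencies": -1.5,
    "income": 0.6,
}
INTERCEPT = 0.5


def approval_probability(applicant: dict[str, float]) -> float:
    """Logistic model: P(approve) = 1 / (1 + exp(-z))."""
    z = INTERCEPT + sum(COEFS[f] * applicant[f] for f in COEFS)
    return 1 / (1 + math.exp(-z))


def reason_codes(applicant: dict[str, float], baseline: dict[str, float],
                 top_n: int = 2) -> list[str]:
    """Features that pulled the score down most relative to the baseline applicant."""
    contributions = {f: COEFS[f] * (applicant[f] - baseline[f]) for f in COEFS}
    negatives = [f for f, c in contributions.items() if c < 0]
    return sorted(negatives, key=lambda f: contributions[f])[:top_n]


# Baseline of 0 on every standardized feature = an average reference applicant (assumption).
baseline = {f: 0.0 for f in COEFS}
applicant = {"debt_to_income": 1.5, "credit_history_years": -0.5,
             "recent_delinquencies": 2.0, "income": 0.2}

print(f"P(approve) = {approval_probability(applicant):.3f}")
print("Principal reasons:", reason_codes(applicant, baseline))
```

Because the contributions come straight from the model's own coefficients, the stated reasons describe exactly how this applicant's score differs from the baseline; a black-box model typically needs a separate post-hoc explanation method, which only approximates its behavior.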

Frameworks to Apply

  • Addas Big 6: map each risk to its lever
  • Confusion matrix economics: what does a false positive cost in this context? (Denied a creditworthy customer)
  • Fairness metrics: which definition of fairness does banking regulation require?
  • Automation vs augmentation: should loan officers have override authority?

Strong Position to Take

  • AI in lending is not optional long-term — the productivity gains are too large
  • But black-box models are hard to defend under existing law: ECOA adverse action rules require specific reasons for each denial, so explainability is not optional
  • The answer: interpretable models (logistic regression, decision trees) with human-in-the-loop override and regular disparate impact audits
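A disparate impact audit can start as simply as comparing approval rates across groups. The Python sketch below computes each group's approval rate and its ratio to a reference group, flagging ratios under 0.8. That cutoff is the "four-fifths rule" from US employment-selection guidelines, borrowed here as an assumed screening threshold rather than a lending-law standard; the group names and counts are hypothetical.

```python
# Hypothetical loan decisions: group -> (approved, total applicants).
decisions = {"Group A": (400, 500), "Group B": (168, 300)}

FOUR_FIFTHS = 0.8  # screening threshold borrowed from US employment guidelines (assumption)


def approval_rates(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Approval rate per group."""
    return {g: approved / total for g, (approved, total) in counts.items()}


def adverse_impact_ratios(counts, reference: str) -> dict[str, float]:
    """Each group's approval rate divided by the reference group's rate."""
    rates = approval_rates(counts)
    return {g: rate / rates[reference] for g, rate in rates.items()}


def audit(counts, reference: str, threshold: float = FOUR_FIFTHS) -> list[str]:
    """Groups whose adverse impact ratio falls below the threshold."""
    ratios = adverse_impact_ratios(counts, reference)
    return [g for g, r in ratios.items() if r < threshold]


print(approval_rates(decisions))
print(adverse_impact_ratios(decisions, reference="Group A"))
print("Flagged for review:", audit(decisions, reference="Group A"))
```

Group B is approved at 56% versus 80% for Group A, a ratio of 0.70, so it is flagged. A flag is a prompt for investigation, not a legal conclusion on its own.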
Pre-Class Checklist
  • Listened to the ML bias podcast — can name at least 3 bias types from it
  • Read the Managing AI Risks in Consumer Banking case — have an opinion on the central question
  • Know all six elements of the Addas Big 6 framework by name
  • Can explain FPR, FNR, FOR, FDR in plain English without looking at notes
  • Understand why ProPublica and Northpointe can both be "right" — the fairness impossibility theorem
  • Completed any pre-class submission required by the syllabus
  • Have a 2-sentence position ready on the banking case central question
  • Started the individual assignment — have the dataset open in Colab
Smart Questions to Ask in Class
Conceptual
"If two definitions of fairness are mathematically incompatible when base rates differ — who gets to decide which fairness definition wins, and on what basis?"
Forces a normative discussion beyond the technical. Shows you've absorbed the impossibility theorem and are thinking about governance, not just math.
Practical
"In the banking case, if regulators require model explainability, does that mean accurate black-box models are effectively illegal in consumer lending?"
Sharp regulatory question that has a real answer (ECOA adverse action notice requirements). Shows you're connecting the case to actual law.
Ethical
"The label 'recidivism' in COMPAS is actually re-arrest — but policing intensity differs by neighborhood. Does that make COMPAS fundamentally invalid as a risk score, regardless of its accuracy?"
Attacks the validity of the target variable itself — a more sophisticated critique than just pointing at FPR disparities. Directly relevant to individual assignment Q6.
Strategic
"For a bank deploying AI in credit scoring — what's the minimum governance structure you'd want to see before you'd sign off as the accountable executive?"
Puts you in the decision-maker seat. Shows leadership instinct, not just analytical critique.
MBUS 854 AI For Leaders — Session 2 Prep Guide  |  Generated May 19, 2026  |  Queen's Smith School of Business