MBUS 854 — Session 2 Prep

Session Agenda

Pre-Class Submission Required: Check the syllabus. Session 2 may require an individual submission before class (case prep or reading response). Complete it before arriving.

1. Addas Big 6 Framework

Six levers for managing AI challenges. The master framework for the course.

2. Confusion Matrix → KPIs → Model P&L

From model metrics to business impact: connecting FP/FN to dollar consequences.

3. ML Bias Podcast Debrief

Class discussion on podcast insights: types of bias, where they enter the pipeline, who bears the cost.

4. Managing AI Risks in Consumer Banking Case

Case discussion: AI deployment in lending, fairness tradeoffs, regulatory landscape.

The Addas Big 6 Framework

Overview

Dr. Addas's framework organizes AI challenges and risks into six categories. Every AI risk discussion in class should be mapped to one or more of these levers. Know them cold: they'll appear in cases, in class discussion, and likely in the individual assignment report.

1. Data Quality

Garbage in, garbage out. Missing values, mislabeled data, historical bias baked into training sets. The most upstream source of AI failure.

2. Algorithm Design

Choice of model, optimization objective, and constraints. A model optimized for overall accuracy can still be discriminatory if its errors are unequally distributed across groups.

3. Human Oversight

Who reviews AI decisions, how often, and with what authority to override? Automation bias (humans rubber-stamping AI output) is a governance failure.

4. Transparency & Explainability

Can the model explain its decisions in terms stakeholders trust? Black-box models face regulatory risk and erode trust when things go wrong.

5. Accountability

When the AI is wrong, who is responsible: the vendor, the organization, the user, or the regulator? Accountability gaps enable harm at scale.

6. Fairness & Equity

Does the model's error rate vary systematically across demographic groups? Equal overall accuracy does not guarantee equal impact.

Key insight: The Big 6 are interdependent. Improving transparency (lever 4) without accountability (lever 5) means stakeholders can see the problem but nobody has the authority to fix it.

Confusion Matrix → Performance KPIs → Model P&L

The Confusion Matrix — Four Outcomes

Session 2 Slides — mbus854_02_sv_v2.pdf

For any binary classifier (predicts yes/no), every prediction falls into one of four cells:

	Predicted: Positive	Predicted: Negative
Actual: Positive	True Positive (TP)	False Negative (FN)
Actual: Negative	False Positive (FP)	True Negative (TN)

True Positive (TP): Predicted positive. Actually positive. Correct catch.

False Positive (FP): Predicted positive. Actually negative. False alarm; penalizes the innocent.

False Negative (FN): Predicted negative. Actually positive. Missed catch; the miss that gets through.

True Negative (TN): Predicted negative. Actually negative. Correct clearance.
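The four cells can be counted directly from paired actual/predicted labels. A minimal Python sketch, assuming labels are encoded as 1 (positive, e.g. "will default") and 0 (negative); the function name `confusion_counts` and the toy labels are hypothetical:

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            counts["TP"] += 1  # correct catch
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1  # false alarm
        elif predicted == 0 and actual == 1:
            counts["FN"] += 1  # missed catch
        elif predicted == 0 and actual == 0:
            counts["TN"] += 1  # correct clearance
        else:
            raise ValueError(f"labels must be 0 or 1, got {actual!r}, {predicted!r}")
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


# Hypothetical toy labels for eight cases.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1]
print(confusion_counts(y_true, y_pred))  # {'TP': 3, 'FP': 2, 'FN': 1, 'TN': 2}
```

In Colab, scikit-learn's `confusion_matrix(y_true, y_pred).ravel()` returns the same four counts for 0/1 labels, in the order TN, FP, FN, TP.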

Worked Example: Customer Loan Default Prediction

Session 2 — Predicting Customer Loan Default model

This example from class shows how each confusion matrix cell maps to a real dollar outcome:

Outcome	What it means	Business Cost
TP — Predicted default, actually defaults	Loan correctly denied to a defaulter	−$1,000 (processing costs)
FP — Predicted default, actually repays	Good customer wrongly denied loan	−$13,000 (lost interest income)
FN — Predicted repay, actually defaults	Defaulter incorrectly given loan	−$1,013,000 (principal + lost interest)
TN — Predicted repay, actually repays	Good customer correctly approved	+$11,000 (interest income)

Leader takeaway: In lending, a False Negative costs about 78× more than a False Positive ($1,013,000 vs $13,000). Raw accuracy counts both errors the same, so a model optimized for accuracy can still produce negative expected value if it avoids the cheap error (FP) by letting the expensive one (FN) through. Always ask: "What does each error type cost us?"
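A model P&L is each confusion-matrix count multiplied by its dollar payoff, summed. The sketch below reuses the payoffs from the worked example; the two models and their counts are hypothetical, chosen so both see the same 1,000 applicants with 50 actual defaulters:

```python
# Payoff per outcome (USD), taken from the loan-default worked example.
# "Positive" means the model predicts default.
PAYOFF = {"TP": -1_000, "FP": -13_000, "FN": -1_013_000, "TN": 11_000}


def model_pnl(counts):
    """Total dollar outcome: each confusion-matrix count times its payoff."""
    return sum(counts[cell] * PAYOFF[cell] for cell in PAYOFF)


def accuracy(counts):
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


# Two hypothetical models scored on the same 1,000 applicants (50 actual defaulters).
model_a = {"TP": 10, "FP": 10, "FN": 40, "TN": 940}   # rarely predicts default
model_b = {"TP": 45, "FP": 100, "FN": 5, "TN": 850}   # predicts default more often

for name, counts in (("A", model_a), ("B", model_b)):
    print(f"Model {name}: accuracy={accuracy(counts):.3f}, P&L={model_pnl(counts):+,} USD")
# Model A: accuracy=0.950, P&L=-30,320,000 USD
# Model B: accuracy=0.895, P&L=+2,940,000 USD
```

Model A wins on accuracy but loses about $30M, because its 40 missed defaulters cost far more than Model B's 100 wrongly denied customers.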

Derived KPIs from the Confusion Matrix

These metrics tell different stories. Know which one the business actually cares about.

Metric	Formula	What it measures
Accuracy	(TP + TN) / Total	Overall correctness. Misleading with class imbalance.
Precision	TP / (TP + FP)	Of all positive predictions, how many were right?
Recall (Sensitivity)	TP / (TP + FN)	Of all actual positives, how many did we catch?
F1 Score	2 × Precision × Recall / (Precision + Recall), the harmonic mean of the two	Balance metric when both FP and FN matter.
FPR	FP / (FP + TN)	Rate of false alarms among innocents. Key fairness metric.
FNR	FN / (FN + TP)	Rate of missed catches among true positives.
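To see why accuracy misleads under class imbalance, here is a small sketch that computes the table's KPIs from raw counts. The counts are hypothetical (1,000 applicants, 50 actual positives), and returning NaN for an undefined ratio is a design choice of this sketch:

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def kpis(tp, fp, fn, tn):
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "fpr": _ratio(fp, fp + tn),
        "fnr": _ratio(fn, fn + tp),
    }


# A "lazy" model that never predicts positive vs. one that actually flags cases.
lazy = kpis(tp=0, fp=0, fn=50, tn=950)
useful = kpis(tp=40, fp=20, fn=10, tn=930)

for name, metrics in (("lazy", lazy), ("useful", useful)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

The lazy model still scores 95% accuracy while catching zero positives (recall 0, FNR 1), which is exactly the class-imbalance trap noted in the table.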

Fairness Metrics — The ProPublica vs Northpointe Debate

Four Fairness Metrics — and Why They Can't All Be Satisfied Simultaneously

COMPAS Recidivism Analysis — Session 2 / Individual Assignment

These four metrics are often used to evaluate whether an AI model is "fair" across demographic groups. The fundamental problem: when base rates differ between groups and the model is not perfect, you cannot equalize all four across groups at the same time.

Metric	Formula	Plain English	Who it protects
FPR (False Positive Rate)	FP / (FP + TN)	Of people who DON'T reoffend, what share is wrongly flagged high-risk?	The innocent — protects from wrongful classification
FNR (False Negative Rate)	FN / (FN + TP)	Of people who DO reoffend, what share is wrongly cleared?	Public safety — catches the dangerous
FOR (False Omission Rate)	FN / (FN + TN)	Of people cleared, what share actually reoffends?	Reliability of low-risk label
FDR (False Discovery Rate)	FP / (FP + TP)	Of people flagged high-risk, what share is actually innocent?	Reliability of high-risk label
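Computing the four rates per group makes the tension visible. A minimal sketch with hypothetical counts for two groups of 1,000 people each (illustrative only, not the COMPAS data):

```python
def error_rates(tp, fp, fn, tn):
    """The four fairness-relevant error rates from one group's confusion matrix."""
    return {
        "FPR": fp / (fp + tn),  # share of actual negatives flagged positive
        "FNR": fn / (fn + tp),  # share of actual positives cleared
        "FOR": fn / (fn + tn),  # share of cleared people who are actually positive
        "FDR": fp / (fp + tp),  # share of flagged people who are actually negative
    }


# Hypothetical counts (not COMPAS data).
groups = {
    "Group A": {"tp": 300, "fp": 200, "fn": 200, "tn": 300},  # base rate 50%
    "Group B": {"tp": 150, "fp": 100, "fn": 150, "tn": 600},  # base rate 30%
}

rates = {name: error_rates(**counts) for name, counts in groups.items()}
for name, r in rates.items():
    print(name, {metric: round(value, 3) for metric, value in r.items()})
# Group A {'FPR': 0.4, 'FNR': 0.4, 'FOR': 0.4, 'FDR': 0.4}
# Group B {'FPR': 0.143, 'FNR': 0.5, 'FOR': 0.2, 'FDR': 0.4}
```

Both groups have the same FDR (the high-risk label is equally reliable), yet their FPR, FNR, and FOR all differ.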

ProPublica vs Northpointe — The Core Argument

ProPublica (2016): COMPAS is racially biased. Black defendants are nearly twice as likely to be wrongly flagged as high-risk (FPR: Black ~45%, White ~23%). False alarms systematically harm Black defendants through harsher bail/parole decisions.

Northpointe's response: COMPAS is calibrated fairly. Within any given score level, the actual reoffense rate is the same for Black and White defendants. Predictive parity holds — the score means the same thing regardless of race.

The impossibility: When base reoffense rates differ between groups (they do, due in part to systemic factors like policing patterns) and the model makes any errors, you CANNOT simultaneously achieve equal FPR, equal FNR, and equal calibration. Choosing which fairness criterion to satisfy is a values decision, not a math problem.
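A standard identity, the one Chouldechova (2017) used in her analysis of COMPAS, shows why. Let p be a group's base rate, p = (TP + FN) / (TP + FP + FN + TN). Using the FPR, FNR, and FDR definitions from the metrics table:

```latex
\frac{p}{1-p}\cdot\frac{\mathrm{FDR}}{1-\mathrm{FDR}}\cdot(1-\mathrm{FNR})
  = \frac{TP+FN}{FP+TN}\cdot\frac{FP}{TP}\cdot\frac{TP}{TP+FN}
  = \frac{FP}{FP+TN}
  = \mathrm{FPR}
```

If two groups share the same FDR (predictive parity) and the same FNR but have different base rates p, the factor p/(1-p) forces their FPRs apart. The only ways out are equal base rates or a perfect classifier (FDR = FNR = 0, which makes FPR = 0 for everyone).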

In class: Be ready to argue both sides and then pivot to: "So what should a responsible leader do?" The answer isn't a single correct metric — it's transparency about the tradeoffs being made and accountability for the choice.

COMPAS Recidivism — Key Discussion Points

Q: How can a model be simultaneously "fair" and "biased" depending on how you measure it?

The Setup

COMPAS produces a recidivism risk score (1–10) used in bail and parole decisions
Northpointe built it; courts use it; defendants don't know how it works
Dataset: ~7,000 defendants in Broward County, Florida

ProPublica's Evidence

Black defendants: FPR = 44.9% (nearly 1-in-2 innocents wrongly flagged high-risk)
White defendants: FPR = 23.5%
ProPublica highlighted individual cases in which White defendants with prior violent offenses were scored lower than Black defendants with far less serious records

Northpointe's Defense

Calibration holds: at any given score level (e.g., a score of 7), the observed reoffense rate is roughly the same for Black and White defendants
Their definition of fairness: predictive parity, meaning a given score should carry the same reoffense risk regardless of group
Different base rates (Black: ~52% reoffend; White: ~39%) explain the FPR gap mathematically
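The base rates above are enough to size the forced FPR gap. The sketch assumes, hypothetically, that the score had identical FDR and FNR for both groups (0.40 and 0.35 are illustrative values, not measured COMPAS figures) and applies the identity FPR = p/(1-p) × FDR/(1-FDR) × (1-FNR):

```python
def implied_fpr(base_rate, fdr, fnr):
    """FPR implied by a group's base rate, FDR, and FNR."""
    return (base_rate / (1 - base_rate)) * (fdr / (1 - fdr)) * (1 - fnr)


# Hypothetical shared error rates: equal FDR (predictive parity) and equal FNR.
SHARED_FDR = 0.40
SHARED_FNR = 0.35

# Approximate base rates from the notes: Black ~52%, White ~39%.
fpr_black = implied_fpr(0.52, SHARED_FDR, SHARED_FNR)
fpr_white = implied_fpr(0.39, SHARED_FDR, SHARED_FNR)

print(f"FPR Black: {fpr_black:.3f}, FPR White: {fpr_white:.3f}, ratio: {fpr_black / fpr_white:.2f}")
# FPR Black: 0.469, FPR White: 0.277, ratio: 1.69
```

The ratio of about 1.69 depends only on the two base rates, so no choice of shared FDR and FNR closes the gap.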

The Leadership Question

If you were the county judge using COMPAS: what would you need to know before you trusted this score?
If you were the vendor: what disclosure obligations exist? What governance would you build?
If you were a regulator: which fairness definition do you enshrine in law?

ML Bias Taxonomy — Podcast Debrief Prep

Three Categories of ML Bias

ML Bias Podcast — Pre-class prep; also directly relevant to individual assignment Q6 (7 pts)

ML bias doesn't come from a single source. Knowing where each type enters the pipeline determines how to address it.

Bias Type	Category	Where It Enters	Example
Historical bias	Data	Training data reflects past injustice	AI trained on historical loan approvals inherits redlining patterns
Label bias	Data	Target variable is a proxy, not the true outcome	Re-arrest ≠ re-offense; COMPAS predicts re-arrest, which is affected by policing intensity
Sampling bias	Data	Training data over/under-represents groups	Facial recognition trained mostly on lighter-skinned faces; fails on darker-skinned faces
Proxy variable bias	Algorithm	Excluded protected attribute leaks through correlated features	Even with "race" excluded from COMPAS inputs, features like zip code and priors_count can act as proxies
Feedback loop bias	Algorithm	Model output influences future data collection	Predictive policing sends more cops to flagged areas → more arrests there → "validates" the model
Automation bias	User	Humans defer to AI without critical scrutiny	Judges accepting COMPAS scores without examining underlying factors
Confirmation bias	User	AI confirms existing beliefs; contradicting evidence ignored	Hiring AI confirms existing workforce demographics as "ideal candidate"
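One quick heuristic for proxy variable bias is to ask how well a supposedly neutral feature predicts the protected attribute on its own. A minimal sketch, assuming a categorical feature; the function names and the toy zip codes are hypothetical, and a low score does not rule out proxies formed by combinations of features:

```python
from collections import Counter, defaultdict


def proxy_accuracy(feature_values, group_labels):
    """Share of people whose group is guessed correctly from one feature alone,
    by predicting the most common group within each feature value."""
    by_value = defaultdict(Counter)
    for value, group in zip(feature_values, group_labels):
        by_value[value][group] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(group_labels)


def baseline_accuracy(group_labels):
    """Accuracy of always guessing the overall most common group."""
    return Counter(group_labels).most_common(1)[0][1] / len(group_labels)


# Hypothetical toy data: zip code and protected-group label for ten applicants.
zips = ["10001", "10001", "10001", "10002", "10002",
        "10002", "10002", "10003", "10003", "10003"]
groups = ["A", "A", "B", "B", "B", "B", "B", "A", "A", "A"]

print(f"Guess group from zip: {proxy_accuracy(zips, groups):.0%}")
print(f"Guess without zip:    {baseline_accuracy(groups):.0%}")
# Guess group from zip: 90%
# Guess without zip:    50%
```

A large gap between the two numbers suggests the feature carries group information even after the protected attribute is dropped.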

Insight for your assignment report (Q6): The strongest answers don't just list bias types — they trace each bias to a specific source in the COMPAS workflow and explain what mitigation was or wasn't applied.

Managing AI Risks in Consumer Banking — Case Prep

Case Context

This case examines how a financial institution navigates AI deployment in consumer lending: credit scoring, loan approvals, fraud detection. The central tension: AI improves efficiency and profitability, but fair lending laws (ECOA, Fair Housing Act) impose constraints that pure accuracy optimization could violate.

Q: How should a bank balance AI-driven efficiency with fairness obligations and regulatory risk?

Key Tensions to Argue

Higher model accuracy may come at the cost of disparate impact on protected classes
Explainability requirements (regulators want reasons for denial) conflict with black-box model complexity
Over-reliance on alternative data (social media, purchase history) risks reproducing socioeconomic inequality
Third-party AI vendors create accountability gaps — who is liable when the model discriminates?
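The explainability tension is easier to reason about with an interpretable model in hand. A minimal sketch of turning a logistic-regression scorecard into ranked denial reasons; the coefficients, the reference applicant, and the feature names are all hypothetical, and this is one simple approach rather than a compliance-approved adverse-action method:

```python
import math

# Hypothetical scorecard: log-odds of default per unit of each feature.
COEFFICIENTS = {
    "debt_to_income": 3.0,          # higher DTI raises default risk
    "missed_payments_12m": 0.8,     # each missed payment raises risk
    "years_credit_history": -0.15,  # longer history lowers risk
}
INTERCEPT = -3.0
# Hypothetical reference applicant used as the comparison point for reasons.
REFERENCE = {"debt_to_income": 0.30, "missed_payments_12m": 0, "years_credit_history": 10}


def default_probability(applicant):
    z = INTERCEPT + sum(COEFFICIENTS[f] * applicant[f] for f in COEFFICIENTS)
    return 1 / (1 + math.exp(-z))


def reason_codes(applicant, top_n=2):
    """Features that pushed this applicant's log-odds of default above the
    reference applicant's, largest push first."""
    contributions = {f: COEFFICIENTS[f] * (applicant[f] - REFERENCE[f]) for f in COEFFICIENTS}
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return [feature for feature, push in ranked if push > 0][:top_n]


applicant = {"debt_to_income": 0.55, "missed_payments_12m": 2, "years_credit_history": 3}
print(f"P(default) = {default_probability(applicant):.2f}")  # P(default) = 0.45
print(reason_codes(applicant))  # ['missed_payments_12m', 'years_credit_history']
```

Because each coefficient times a feature difference is additive in log-odds, the reasons fall straight out of the model; a black-box model needs a separate explanation layer to produce the same thing.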

Frameworks to Apply

Addas Big 6: map each risk to its lever
Confusion matrix economics: what does a false positive cost in this context? (Denied a creditworthy customer)
Fairness metrics: which definition of fairness does banking regulation require?
Automation vs augmentation: should loan officers have override authority?
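For the confusion-matrix economics question, the worked loan-default payoffs imply a break-even default probability: approve only when the predicted probability of default is below the point where approving and denying have equal expected value. A minimal sketch, assuming the model outputs a calibrated probability of default and reusing the payoffs from the worked example:

```python
# Payoff per outcome (USD) from the loan-default worked example; "positive" = default.
PAYOFF = {"TP": -1_000, "FP": -13_000, "FN": -1_013_000, "TN": 11_000}


def expected_value(p_default, approve):
    """Expected dollar outcome of one decision, given a probability of default."""
    if approve:
        # Approving a defaulter is an FN; approving a repayer is a TN.
        return p_default * PAYOFF["FN"] + (1 - p_default) * PAYOFF["TN"]
    # Denying a defaulter is a TP; denying a repayer is an FP.
    return p_default * PAYOFF["TP"] + (1 - p_default) * PAYOFF["FP"]


def break_even_probability():
    """Default probability at which approving and denying are worth the same."""
    gain_if_repays = PAYOFF["TN"] - PAYOFF["FP"]    # 24,000 better to approve a repayer
    loss_if_defaults = PAYOFF["TP"] - PAYOFF["FN"]  # 1,012,000 better to deny a defaulter
    return gain_if_repays / (gain_if_repays + loss_if_defaults)


threshold = break_even_probability()
print(f"Approve only if P(default) < {threshold:.2%}")  # Approve only if P(default) < 2.32%
```

Under these payoffs even a 3% default risk is a decline. Applicants scored close to the threshold are natural candidates for loan-officer review, which is one way to frame the override-authority question.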

Strong Position to Take

AI in lending is not optional long-term — the productivity gains are too large
But black-box models are untenable under existing law — explainability is not optional
The answer: interpretable models (logistic regression, decision trees) with human-in-the-loop override and regular disparate impact audits
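A disparate impact audit can start as simply as comparing approval rates across groups. A minimal sketch with hypothetical approval counts; the 0.8 cutoff is the "four-fifths rule" from US EEOC employment guidelines, borrowed here as an assumed screening threshold rather than a lending-specific standard:

```python
def adverse_impact_ratio(approved_protected, total_protected, approved_reference, total_reference):
    """Approval rate of the protected group divided by that of the reference group."""
    protected_rate = approved_protected / total_protected
    reference_rate = approved_reference / total_reference
    return protected_rate / reference_rate


# Assumed screening threshold: the EEOC "four-fifths rule", used here only as a heuristic.
FOUR_FIFTHS = 0.8

# Hypothetical audit counts (not from the case): 60% vs 80% approval.
ratio = adverse_impact_ratio(300, 500, 560, 700)
flagged = ratio < FOUR_FIFTHS
print(f"Adverse impact ratio: {ratio:.2f}, flag for review: {flagged}")
# Adverse impact ratio: 0.75, flag for review: True
```

A flag is a prompt for investigation (which features drive the gap, and are they business-justified?), not a verdict on its own.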

Pre-Class Checklist

Listened to the ML bias podcast — can name at least 3 bias types from it
Read the Managing AI Risks in Consumer Banking case — have an opinion on the central question
Know all six elements of the Addas Big 6 framework by name
Can explain FPR, FNR, FOR, FDR in plain English without looking at notes
Understand why ProPublica and Northpointe can both be "right" — the fairness impossibility theorem
Completed any pre-class submission required by the syllabus
Have a 2-sentence position ready on the banking case central question
Started the individual assignment — have the dataset open in Colab

Smart Questions to Ask in Class

Conceptual

"If two definitions of fairness are mathematically incompatible when base rates differ — who gets to decide which fairness definition wins, and on what basis?"

Forces a normative discussion beyond the technical. Shows you've absorbed the impossibility theorem and are thinking about governance, not just math.

Practical

"In the banking case, if regulators require model explainability, does that mean accurate black-box models are effectively illegal in consumer lending?"

Sharp regulatory question that has a real answer (ECOA adverse action notice requirements). Shows you're connecting the case to actual law.

Ethical

"The label 'recidivism' in COMPAS is actually re-arrest — but policing intensity differs by neighborhood. Does that make COMPAS fundamentally invalid as a risk score, regardless of its accuracy?"

Attacks the validity of the target variable itself — a more sophisticated critique than just pointing at FPR disparities. Directly relevant to individual assignment Q6.

Strategic

"For a bank deploying AI in credit scoring — what's the minimum governance structure you'd want to see before you'd sign off as the accountable executive?"

Puts you in the decision-maker seat. Shows leadership instinct, not just analytical critique.

Session 2: Managing AI Challenges & Risks — Part 1