MBUS 854 — Session 3 Prep

Session Agenda

Active Day — Prepare for Multiple Activities Session 3 includes: the Deltacore interactive case (65 min), AI in a Minute team presentations (two rounds), and AWS PartyRock hands-on exercise. This is the highest-activity session of the course.

1

Gen AI Landscape & Trends

Who the players are, what's changed since 2023, Stanford AI Index 2026 highlights.

2

How LLMs Work

Tokenization, next-token prediction, temperature. What you can and can't control.

3

AI in a Minute Presentations — Round 1

Teams MSA, TO-E, VE-A present. Class votes and scores.

4

Deltacore Analytics Case (65 min)

Interactive Ivey case. Role-play as Deltacore leadership navigating a GenAI strategy decision.

5

The Gen AI Process Framework

Five pillars: model choice, context design, prompt design, data quality, safeguards.

6

AI in a Minute — Round 2 + AWS PartyRock

Teams VC, TO-C, OTT present. Hands-on PartyRock prototype exercise.

Gen AI Player Landscape

The Major Players — Know Who Does What

The Gen AI market has consolidated around a small number of dominant frontier model providers, with a larger ecosystem of fine-tuned and open-source models around them. Know the key players and their strategic positioning.

OpenAI

GPT-4o, o1, o3

Closed

Microsoft-backed. API leader. ChatGPT = dominant consumer product. DALL-E for images. Sora for video.

Anthropic

Claude 3.5 / Claude 4

Closed

Amazon + Google-backed. Safety-focused. Strong at long documents, coding, reasoning. Powers AWS Bedrock's premier model option.

Google DeepMind

Gemini 1.5 / 2.0

Closed

Integrated in Google Workspace, Colab, Search. Gemini Ultra is the frontier model. 1M token context window.

The Three Counter-Intuitive Trends

Stanford HAI AI Index 2026 — referenced in Session 3 slides (mbus854_03_sv_v1.1.pdf)

The Stanford AI Index tracks AI progress annually. Three 2026 findings are particularly relevant for leaders:

Trend	What's Happening	Leader Implication
Training cost: up	Frontier model training costs have grown 10-100× since 2020. GPT-4-scale training runs cost $50–100M+. Only a handful of companies globally can afford frontier model development.	Your org will be a consumer of frontier models, not a builder. The strategic question is which provider to bet on, not whether to build your own.
Inference cost: down	Cost per 1M tokens dropped 10-100× in 2024 alone. Running AI is becoming close to free. This is what enables mass enterprise deployment.	Cost is no longer the barrier to AI adoption at scale. Speed of integration, governance, and change management are the new bottlenecks.
Emissions: up sharply	AI training and inference are now material contributors to tech company emissions. A single GPT-4 training run emits as much CO2 as ~300 transatlantic flights.	ESG-conscious boards and regulators are starting to ask about AI's carbon footprint. Factor this into vendor selection and model sizing decisions.

How LLMs Work — The Mechanics Leaders Need

Tokenization

Session 3 Slides

LLMs don't read words — they read tokens. A token is roughly 4 characters or 0.75 words. "Unbelievable" might be 3-4 tokens; "AI" is 1 token. This matters because:

Context window limits are measured in tokens, not words ("128K context" = ~96,000 words)
API costs are priced per token — verbose prompts are more expensive
Rare words and technical jargon cost more tokens than common words
The model processes tokens, not semantic meaning — it has no "understanding" in the human sense

Next-Token Prediction

Session 3 Slides

At its core, an LLM is a system trained to predict: "given all the tokens so far, what token comes next?" It does this by learning a probability distribution over its vocabulary (~50,000+ tokens) and sampling from it.

This means:

The model doesn't "plan" a response — it generates one token at a time, left to right
It can't go back and revise — each token commits to the next
Hallucination is a natural failure mode: the model generates the most probable next tokens, even if they're factually wrong
Longer context = more conditioning information = generally better outputs

Temperature — The Creativity Dial

Session 3 Slides

Temperature controls how much randomness the model introduces when sampling from the probability distribution:

Temperature	Behavior	Best For
0	Always picks the highest-probability token. Deterministic — same input → same output every time.	Factual extraction, code generation, classification
0.3–0.7	Slightly varied. More natural language, some creativity.	General business writing, summarization
1.0+	High randomness. Creative but potentially incoherent or factually wrong.	Brainstorming, creative writing, ideation

The Gen AI Process — 5 Pillars

Framework Overview

This is Dr. Addas's framework for structuring how organizations should think about deploying Gen AI. The five pillars are interdependent — weakness in one undermines the others. This will be directly relevant to the group project.

1

Model Choice

Which foundation model fits the task? Cost, capability, latency, data privacy.

2

Context Design

System prompt, retrieved documents, conversation history. What the model knows before you ask.

3

Prompt Design

How you frame the query. Role assignment, few-shot examples, chain-of-thought instructions.

4

Data Quality

Retrieval-augmented generation (RAG) quality. Garbage context produces garbage answers.

5

Safeguards

Output filters, human review checkpoints, jailbreak resistance, hallucination detection.

Pillar Deep Dives — What Leaders Need to Know

Session 3 Slides + Group Project Relevance

Model Choice: Bigger ≠ better for every task. A smaller, cheaper model fine-tuned on your domain often outperforms GPT-4 on specific tasks at 1/100th the cost. The tradeoff is: fine-tuning requires data, expertise, and maintenance. For most MBA projects → use a frontier model via API.

Context Design: The system prompt is the most powerful lever most people don't use well. A well-crafted system prompt (role, constraints, output format, examples) can eliminate 80% of prompt engineering from individual queries.

Prompt Design: Key techniques: zero-shot (just ask), few-shot (give examples), chain-of-thought (ask it to think step-by-step), role prompting (you are a...). Chain-of-thought dramatically improves performance on reasoning tasks.

Data Quality (RAG): Retrieval-Augmented Generation = give the model your documents at query time instead of fine-tuning. Faster to deploy, easier to update, more auditable. The #1 enterprise pattern for custom AI applications.

Safeguards: Output filters (block toxic content), constitutional AI (model trained to refuse harmful requests), human-in-the-loop for high-stakes outputs, and logging/audit trails for compliance.

In-Context Learning vs Fine-Tuning

In-Context Learning (ICL)

No model weight updates — the model learns from examples in the prompt at inference time
Zero-shot: just ask without examples
Few-shot: provide 3–10 examples in the prompt
Fast to iterate — change the prompt, change the behavior
No proprietary data leaves your prompt
Limited by context window size
Works well for: summarization, classification, format conversion

Fine-Tuning

Model weights are updated on your specific dataset
Creates a custom model with domain-specific knowledge baked in
More consistent, reliable outputs for specialized tasks
Requires labeled training data (expensive to create)
Higher upfront cost; maintenance burden when base model updates
Data privacy risk if using third-party fine-tuning services
Works well for: specialized writing style, domain Q&A, structured extraction

Decision rule: Start with in-context learning (it's free to try). Only invest in fine-tuning if ICL consistently fails after serious prompt engineering effort, AND you have 1,000+ high-quality labeled examples, AND the task is high-enough volume to justify the cost.

Deltacore Analytics Case — Interactive Session Prep

Case Overview

Deltacore Analytics is an Ivey Publishing interactive case. The format is different from a standard case — you'll be navigating decisions in real-time, not discussing a static narrative. The session is 65 minutes and involves role-playing as leadership navigating a Gen AI strategy decision.

The case explores: When should a B2B analytics company build proprietary Gen AI capabilities vs. integrating existing LLM APIs? And within that: automation vs augmentation — does Gen AI replace Deltacore's analysts or make them more valuable?

Q: Should Deltacore build its own LLM, fine-tune an existing model, or use APIs — and how does this affect its competitive moat?

Core Tension: Cost Efficiency vs Differentiation

Using GPT-4 API is fast and cheap — but any competitor can do the same. No moat.
Fine-tuning on proprietary client data creates a moat — but raises data governance questions and requires ongoing maintenance
Building a foundation model is reserved for companies with billions in compute budget — not a realistic option for Deltacore
The real differentiation isn't the model — it's the data, the workflow integration, and the client trust

Core Tension: Automation vs Augmentation

Full automation: Gen AI replaces junior analysts → short-term cost savings, long-term talent pipeline problem
Augmentation: Gen AI accelerates analyst output → each analyst handles 3× the workload → revenue per headcount improves
Client trust depends on human experts. If clients learn "it's just ChatGPT," perceived value drops.
Augmentation preserves the human brand promise while capturing AI productivity gains

Positions to Prepare

The case will ask you to recommend a build/buy/partner decision — have a defensible position before you walk in
Apply the Gen AI Process framework: what does Deltacore's ideal context design look like? What safeguards are needed for client-facing outputs?
Think about which fairness/ethics considerations apply to an analytics firm (client confidentiality, model hallucination risk in reports)

Interactive Case Format You may be asked to vote, role-play a specific executive persona, or make real-time decisions with limited information. Prepare by having a clear point of view on the core question — wishy-washy middle grounds tend to lose in interactive case formats.

AI in a Minute — Presentation Format

Format Overview

Each team presents a 60-second summary of a current AI article (HBR, MIT Sloan, or similar). The goal: distill a complex AI topic into something a non-technical executive would find immediately actionable. Class provides scores and feedback.

The grading criteria typically reward: clarity, relevance to business leaders, one sharp insight (not a Wikipedia summary), and time discipline.

Round 1 Teams

Team MSA
Team TO-E
Team VE-A

Round 2 Teams

Team VC
Team TO-C
Team OTT

For presenting teams One sharp insight beats three generic ones. End with a question or provocation, not a summary. 60 seconds is shorter than you think — rehearse with a timer.

For non-presenting teams Good peer feedback scores points. Prepare 1 genuine question or push-back for each presenting team. Engage; don't coast.

AWS Bedrock / PartyRock — Course Tool

What You Need to Know Before Session 3

PartyRock is AWS's no-code Gen AI builder (bedrock.aws.amazon.com/partyrock). You can build a working AI app — chatbot, document analyzer, recommendation engine — in under 30 minutes with zero code. It's the fastest way to understand what Bedrock can do.

Bedrock is the enterprise layer: a managed service that gives API access to 30+ models from Anthropic, Amazon, Meta, Mistral, and others through a single endpoint, with enterprise security and compliance controls.

Models Available on Bedrock

Anthropic Claude — Premier option for reasoning, writing, long docs
Amazon Titan — Cost-effective for embeddings and basic tasks
Amazon Nova — Amazon's newer multimodal models
Meta Llama — Open-source option, lower cost
Mistral — Efficient for structured tasks
Stability AI — Image generation

Why Bedrock for the Group Project

Single API, multiple models — switch models without rewriting code
Enterprise data privacy — inputs don't train models
RAG built-in (Bedrock Knowledge Bases)
Guardrails feature for content moderation
Agents capability for multi-step workflows
AWS Free Tier covers course-level usage

Pre-Class Checklist

Know the major Gen AI players and what differentiates each (OpenAI, Anthropic, Google, Meta, DeepSeek)
Can explain in plain English: what is a token, what is temperature, how does next-token prediction work
Know the five pillars of the Gen AI Process framework
Read the Deltacore case (Ivey Publishing) — have a recommendation ready on the build/buy/partner question
If presenting in AI in a Minute — practiced to under 60 seconds with a timer
If not presenting — have 1 question ready for each presenting team
AWS account created and PartyRock accessible (bedrock.aws.amazon.com/partyrock)
Can articulate the difference between in-context learning and fine-tuning
Individual assignment notebook open in Colab — EDA section underway

Smart Questions to Ask in Class

Strategic

"If inference costs keep dropping toward zero, does that change the build vs buy calculus for a company like Deltacore — or does it actually make the API route more attractive, not less?"

Applies the Stanford AI Index trends directly to the Deltacore decision. Shows you're connecting macro trends to specific strategic choices.

Technical

"Temperature controls randomness at inference time — but what controls factual accuracy? Is there a way to make an LLM more reliable, or is hallucination an irreducible property of the architecture?"

Gets at RAG and grounding — the honest answer is that RAG reduces hallucination but doesn't eliminate it. Shows you've thought past the surface mechanics.

Deltacore

"If Deltacore's value proposition is expert human analysis — and Gen AI makes that analysis 3× faster — do they raise prices, lower prices, or hire fewer analysts? All three have different strategic implications."

Forces the class to work out the economics of augmentation. There's no clean answer — that's the point. Shows sharp strategic thinking about the augmentation vs automation spectrum.

Ethical

"DeepSeek matches GPT-4 performance at a fraction of the cost and is open source — should North American companies use it, or does the China origin create data security concerns that override the cost savings?"

Geopolitical dimension of model selection — real consideration in enterprise vendor selection. Shows you're thinking beyond the classroom into actual deployment decisions.

Session 3: Working & Building with AI — Generative AI

Gen AI Landscape & Trends

How LLMs Work

AI in a Minute Presentations — Round 1

Deltacore Analytics Case (65 min)

The Gen AI Process Framework

AI in a Minute — Round 2 + AWS PartyRock

The Major Players — Know Who Does What

The Three Counter-Intuitive Trends

Tokenization

Next-Token Prediction

Temperature — The Creativity Dial

Framework Overview

Pillar Deep Dives — What Leaders Need to Know

In-Context Learning (ICL)

Fine-Tuning

Case Overview

Core Tension: Cost Efficiency vs Differentiation

Core Tension: Automation vs Augmentation

Positions to Prepare

Format Overview

Round 1 Teams

Round 2 Teams

What You Need to Know Before Session 3

Models Available on Bedrock

Why Bedrock for the Group Project