Big Idea 5 – Computing Bias

Mar 17, 2025 • Avika, Gabi, Zoe


What is Computing Bias?

Bias: A prejudice in favor of or against a person or group in a way that is usually unfair.

Computing Bias occurs when algorithms or systems produce results that disadvantage certain groups. It often arises from:

  • Biased or incomplete data
  • Flawed design
  • Unintended consequences of programming choices

Example: Netflix Recommendation Bias

Netflix uses algorithms to recommend content, but those algorithms can introduce bias by:

Majority Preference Bias

  • Favors shows that are already popular, which pushes niche or diverse options out of view.

Filtering Bias

  • Filters out content based on limited viewing history.
  • If you mostly watch rom-coms, you may never see documentaries or foreign films.
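
To make majority preference bias concrete, here is a minimal sketch of a popularity-only recommender. It is not how Netflix actually works; the watch log, the titles, and the function name `recommend_most_popular` are all invented for illustration.

```python
from collections import Counter

# Hypothetical watch log of (user, title) pairs, invented for illustration.
watch_log = [
    ("u1", "Hit Drama"), ("u2", "Hit Drama"), ("u3", "Hit Drama"),
    ("u1", "Big Comedy"), ("u2", "Big Comedy"),
    ("u3", "Indie Documentary"),
    ("u4", "Foreign Film"),
]

def recommend_most_popular(log, top_n=2):
    """Return the top_n most-watched titles and ignore everything else."""
    counts = Counter(title for _, title in log)
    return [title for title, _ in counts.most_common(top_n)]

recommended = recommend_most_popular(watch_log)

# Titles that exist in the catalog but are never surfaced to anyone.
all_titles = {title for _, title in watch_log}
never_shown = sorted(all_titles - set(recommended))

print("Recommended:", recommended)
print("Never shown:", never_shown)
```

Because the ranking looks only at total views, the two niche titles never appear in the recommendations, no matter who is watching.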

How Does Computing Bias Happen?

  1. Unrepresentative or Incomplete Data
    • Models trained on limited datasets don’t reflect real-world diversity.
  2. Flawed or Biased Data
    • If existing data includes prejudice (e.g., historical hiring patterns), the system learns and repeats those biases.
  3. Biased Data Labeling
    • Human annotators may unconsciously inject cultural or personal bias during labeling.
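
The second cause can be sketched in a few lines: a toy "model" that only memorizes past hire rates will repeat whatever pattern is in its history. The records, group names, and the 0.5 cutoff below are assumptions made up for this example, not real hiring data.

```python
from collections import defaultdict

# Hypothetical historical records of (group, was_hired); values are invented.
history = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 2 + [("B", False)] * 8
)

def learn_hire_rates(records):
    """'Train' by memorizing the fraction of each group that was hired."""
    totals = defaultdict(int)
    hires = defaultdict(int)
    for group, hired in records:
        totals[group] += 1
        hires[group] += int(hired)
    return {group: hires[group] / totals[group] for group in totals}

rates = learn_hire_rates(history)

def predict_hire(group, cutoff=0.5):
    """Recommend hiring only if the group's past hire rate meets the cutoff."""
    return rates[group] >= cutoff

print(rates)
print(predict_hire("A"), predict_hire("B"))
```

Nothing in the code mentions prejudice, yet group B is rejected simply because it was rejected in the past.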

Explicit vs. Implicit Data

Type          | Definition                       | Netflix Example
Explicit Data | Data directly provided by users  | Entering your name, age, or rating a movie
Implicit Data | Data inferred from user behavior | Viewing history, time spent watching, click patterns

Why It Matters:

  • Implicit data can reinforce user habits, creating feedback loops that limit discovery.
  • Explicit data can still be biased when a form limits the choices users can give or when users misunderstand what is being asked.
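
The feedback loop described above can be simulated with a short sketch. The profile fields, genre counts, and the assumption that the user always watches whatever is recommended are all hypothetical simplifications.

```python
from collections import Counter

# Explicit data: typed in directly by the user (hypothetical values).
explicit_profile = {"name": "Sam", "age": 16, "rating": {"Movie X": 5}}

# Implicit data: inferred from behavior, here a count of views per genre.
implicit_views = Counter({"rom-com": 3, "documentary": 2})

def simulate_feedback_loop(views, rounds):
    """Recommend the most-viewed genre each round; assume the user watches it."""
    views = Counter(views)  # copy so the original counts are untouched
    for _ in range(rounds):
        top_genre = views.most_common(1)[0][0]
        views[top_genre] += 1
    return views

after = simulate_feedback_loop(implicit_views, rounds=5)
print(after)
```

A small initial lead for rom-coms grows every round, while the documentary count never changes: the habit is reinforced and discovery stalls.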

Popcorn Hack #1

Question: What is an example of Explicit Data?
Options:
A) Netflix recommends shows based on your viewing history
B) You provide your name, age, and preferences when creating a Netflix account
C) Netflix tracks the time you spend watching certain genres

Answer: B – This is explicit data, because it’s provided directly by the user.


Types of Bias

Algorithmic Bias

  • Comes from system logic that learns or encodes a discriminatory pattern and then repeats it.
    Example: Amazon’s experimental hiring tool favored men because it was trained on past hiring data that was male-dominated.

Data Bias

  • Arises when training data is incomplete or unbalanced.
    Example: A health AI system underestimates disease risk for underrepresented groups.
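
One way to spot data bias like this is to measure accuracy separately for each group instead of as a single overall number. The sketch below uses invented results, where 1 means "high risk" and 0 means "low risk"; the group names and values are assumptions for illustration only.

```python
# Hypothetical (group, actual, predicted) results; 1 = high risk, 0 = low risk.
results = [
    ("majority", 1, 1), ("majority", 0, 0), ("majority", 1, 1), ("majority", 0, 0),
    ("minority", 1, 0), ("minority", 1, 0), ("minority", 0, 0), ("minority", 1, 1),
]

def accuracy_by_group(rows):
    """Return the fraction of correct predictions for each group."""
    total = {}
    correct = {}
    for group, actual, predicted in rows:
        total[group] = total.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + int(actual == predicted)
    return {group: correct[group] / total[group] for group in total}

per_group = accuracy_by_group(results)
overall = sum(actual == predicted for _, actual, predicted in results) / len(results)

print("Overall:", overall)
print("Per group:", per_group)
```

The overall accuracy looks acceptable, but the per-group view shows that every error falls on the minority group, and both errors are high-risk cases predicted as low risk.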

Cognitive Bias

  • Introduced by researchers or developers due to personal assumptions.
    Example: A researcher only selects data supporting their belief about screen time affecting grades.

Popcorn Hack #2

Question: What is an example of Data Bias?
Options:
A) A hiring algorithm favors men due to biased past resumes
B) A dataset underrepresents people with darker skin tones
C) A researcher selects data that supports their screen time theory

Answer: B – Underrepresentation in the data leads to worse performance for the underrepresented groups.


Intentional vs. Unintentional Bias

Intentional Bias

  • Purposefully embedding prejudice to favor one group.
    Example: A hiring algorithm is designed to rank resumes from certain schools or companies higher, favoring specific demographics.

Unintentional Bias

  • Occurs accidentally, usually because of flawed or unrepresentative datasets.
    Example: A facial recognition tool trained on mostly light-skinned faces struggles to recognize people with darker skin tones, not because of intent but because of poor data variety.

Popcorn Hack #3

Activity: Describe a biased scenario. Have classmates guess: was it intentional or unintentional?


Mitigation Strategies

To reduce bias in algorithms, apply these techniques at every phase:

1. Pre-processing (Planning & Data Collection)

  • Check for data diversity and completeness
  • Remove irrelevant or biased variables

Goal: Prepare balanced data to avoid bias in training.
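
As a rough sketch of both pre-processing checks, the code below measures each group's share of a dataset and drops a column that should not influence the result. The rows, the field names, the 30% cutoff, and the choice of `zip_code` as the column to drop are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical training rows; field names and values are invented.
rows = (
    [{"group": "A", "zip_code": "00001", "years_experience": i} for i in range(8)]
    + [{"group": "B", "zip_code": "00002", "years_experience": i} for i in range(2)]
)

def group_shares(data, key):
    """Return each group's fraction of the dataset."""
    counts = Counter(row[key] for row in data)
    return {group: count / len(data) for group, count in counts.items()}

def drop_columns(data, columns):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in data]

shares = group_shares(rows, "group")

# Assumed cutoff: flag any group that makes up less than 30% of the data.
underrepresented = [group for group, share in shares.items() if share < 0.3]

cleaned = drop_columns(rows, {"zip_code"})

print(shares, underrepresented)
print(cleaned[0])
```

A flagged group is a signal to collect more data before training, and dropping a column is only safe when you are confident it is irrelevant to the task.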


2. In-processing (Training & Validation)

  • Use cross-validation
  • Add synthetic data to represent minorities

Goal: Ensure fairness during model development.
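
True synthetic data generation creates new, realistic samples. As a simpler stand-in, the sketch below balances groups by randomly duplicating existing rows from the smaller group (random oversampling). The dataset, the function name `oversample_to_balance`, and the fixed seed are assumptions for illustration.

```python
import random
from collections import Counter

def oversample_to_balance(rows, key, seed=0):
    """Duplicate random rows from smaller groups until all groups are equal in size."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    by_group = {}
    for row in rows:
        by_group.setdefault(row[key], []).append(row)
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        # Top up the group with randomly chosen copies of its own rows.
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced

# Hypothetical dataset: 8 rows from group A, only 2 from group B.
data = [{"group": "A", "x": i} for i in range(8)] + [{"group": "B", "x": i} for i in range(2)]

balanced = oversample_to_balance(data, "group")
counts = Counter(row["group"] for row in balanced)
print(counts)
```

Duplicating rows evens out the group sizes but adds no new information, so it complements collecting more representative data rather than replacing it.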


3. Post-processing (Deployment & Real-World Use)

  • Monitor system performance
  • Adjust output if unfair results appear

Goal: Maintain equity as the model operates in real settings.
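
Monitoring can be as simple as comparing how often each group receives a favorable outcome. The sketch below flags any group whose approval rate falls well below the best-treated group's rate. The decisions, group names, and the 0.8 threshold are assumptions for illustration.

```python
def selection_rates(decisions):
    """Return the fraction of favorable outcomes for each group."""
    totals = {}
    selected = {}
    for group, approved in decisions:
        totals[group] = totals.get(group, 0) + 1
        selected[group] = selected.get(group, 0) + int(approved)
    return {group: selected[group] / totals[group] for group in totals}

def flag_disparity(rates, threshold=0.8):
    """List groups whose rate is below threshold times the highest rate."""
    best = max(rates.values())
    if best == 0:
        return []
    return sorted(group for group, rate in rates.items() if rate / best < threshold)

# Hypothetical logged decisions of (group, approved) from a deployed system.
decisions = (
    [("A", True)] * 6 + [("A", False)] * 4
    + [("B", True)] * 3 + [("B", False)] * 7
)

rates = selection_rates(decisions)
flagged = flag_disparity(rates)
print(rates, flagged)
```

A flagged group does not prove the system is unfair, but it tells the team where to investigate and whether the output needs adjusting.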


Homework Questions

Multiple Choice

(Each worth 0.1 point)

  1. Which phase includes inserting synthetic samples?
  2. What is an example of cognitive bias?
  3. What’s the key difference between implicit and explicit data?
  4. Which type of bias occurs due to flawed system logic?

(More questions provided in class or online)


Short-Answer

Prompt:
Explain the difference between implicit and explicit data. Give an example of each.

Scoring Rubric (Total: 1.0 point):

Criteria                  | Description                    | Points
Multiple-Choice (7 total) | 0.1 point each                 | 0.7
Short-Answer - Clarity    | Clear explanation              | 0.15
Short-Answer - Examples   | Two accurate examples provided | 0.15

Suggested File Name