Friday, January 30, 2009

Probabilities and Innumeracy

As part of the National Too Much Information Week, I'm explaining some of my blogging habits.  I have a few searches looking for interesting posts in the blogosphere, and a feed reader across a number of programming-relevant sites, and I glean interesting posts from them as fodder for my posts.  It's my small part of making sure the blogs are a self-referential recursive rabbit hole.

I read this a few weeks ago, and having taken a number of math and biology classes in my years, immediately saw that the correct answer was 2/3s.  But reading the comments really depressed me.  Not because so many people jumped to the naive conclusion of 1/2, but because they are seemingly resistant to learning why the naive answer is wrong.

I'm not talking about the people who take the "commonsense" approach that the father saying "One of them is a girl" means "only one" is a girl - most of them seem to agree that is a semantic issue.  But those who continue to insist that, with 4 outcomes, the odds of 2 of them don't add up are just dumbfounding.

For those not wanting to go through the long discussion, here's the short-form correct answer:

Dad says "I have 2 kids, and one of them is a girl.  What are the odds that I have one son?"

the grid:
             1st B     1st G
2nd B        BB        GB
2nd G        BG        GG

That is the sum total of possibilities for the family makeup.

Now, the odds of a given child being male are 50% (overall, ignoring the slight differences of survival, etc), so each "box" of the grid has a 25% chance of being the "real" family makeup.  But Dad says he's got at least one girl, so we can exclude the upper left box (BB).  That leaves 3 boxes that started with an equal 25% chance, so each of them is now equally likely at 1/3.  Now, note that there are 2 boxes with mixed kids, and one with only girls, so the odds of having a boy are 2/3s.
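The counting argument above can be checked by brute-force enumeration.  What follows is only a minimal sketch, assuming boys and girls are equally likely and the two births are independent; the names `families`, `with_girl` and `with_boy` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All four equally likely (1st child, 2nd child) combinations: BB, BG, GB, GG.
families = list(product("BG", repeat=2))

# "One of them is a girl" rules out only the BB family.
with_girl = [f for f in families if "G" in f]

# Of the families that remain, keep those that also include a boy.
with_boy = [f for f in with_girl if "B" in f]

p_boy = Fraction(len(with_boy), len(with_girl))
print(p_boy)  # 2/3
```

Using exact fractions instead of random draws keeps sampling noise out of the picture, so the result is the 2/3 itself rather than an estimate of it.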

I think I have found the problem that everyone who says 50% is running into - they are flipping the meaning of "one of them is a girl" - they are taking this to mean they get to pick one of the BG or GG cells, but they can't do that with an even probability - all they can really do is exclude the BB cell.  In other words "one of them is a girl" only uniquely identifies a single cell for exclusion.
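To make the difference concrete, here is a hedged sketch that contrasts the two readings.  It assumes the 50% answer comes from treating the statement as information about one particular child (arbitrarily, the first one); that is my interpretation of the mistake, not something the commenters spelled out.  The helper `p_boy_given` is hypothetical, and the cells are written 1st child then 2nd child, as in the grid.

```python
from fractions import Fraction
from itertools import product

# All four equally likely (1st child, 2nd child) combinations.
families = list(product("BG", repeat=2))

def p_boy_given(condition):
    """Probability that a family has a boy, given that it satisfies condition."""
    kept = [f for f in families if condition(f)]
    return Fraction(sum("B" in f for f in kept), len(kept))

# Reading 1: "at least one of them is a girl" excludes only the BB cell.
at_least_one_girl = p_boy_given(lambda f: "G" in f)

# Reading 2 (assumed source of the 1/2 answer): "this particular child is a
# girl", which throws away the BG cell as well as BB.
first_is_girl = p_boy_given(lambda f: f[0] == "G")

print(at_least_one_girl, first_is_girl)  # 2/3 1/2
```

The second reading answers a different question: it discards a mixed family that Dad's statement never ruled out.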

[later edit]
Here's a small Python program that details the logic, and shows how the bad assumptions work.  It's especially useful because the bad assumptions play directly to the numbers of boys and girls in the generated families. (Some lines got folded, so if you cut&paste this, be aware.)

# 2kids - run simulations to test
# generate 1000 random 2-kid families
# 0 = boy, 1 = girl
# (loop bodies that were lost to line folding are filled in here to match
#  the printed labels; print calls use the Python 3 form)

import random

# generate child: 0 (boy) or 1 (girl), each with probability 0.5
def genChild():
    sex = random.random()
    if sex < 0.5:
        return 0
    return 1

# define Family class
class Family:
    def __init__(self):
        self.genFamily()

    # generate family
    def genFamily(self):
        self.child1 = genChild()
        self.child2 = genChild()

    # see if there is at least one girl
    def hasAGirl(self):
        if self.child1 == 1 or self.child2 == 1:
            return 1
        return 0

    # see if there is a boy
    def hasABoy(self):
        if self.child1 == 0 or self.child2 == 0:
            return 1
        return 0

familyList = []
for f in range(1000):
    familyList.append(Family())

# count the all-boy and all-girl families; the rest are mixed
allBoys = 0
allGirls = 0
for family in familyList:
    if family.hasABoy() == 0:
        allGirls += 1
    if family.hasAGirl() == 0:
        allBoys += 1
print("Family breakdown: BB = ", allBoys, " GG = ", allGirls, " mixed = ", 1000 - (allBoys + allGirls))

# families with at least one girl that also have a boy (the mixed families)
brothers1 = 0
for family in familyList:
    if family.hasAGirl() == 1 and family.hasABoy() == 1:
        brothers1 += 1
print("filter by hasAGirl first, then hasABoy, number of families with boys = ", brothers1)

# families with no girl are BB, so every one of them has a boy
brothers2 = 0
for family in familyList:
    if family.hasAGirl() == 0:
        brothers2 += 1

print("filter by !hasAGirl first, then hasABoy, number of families with boys = ", brothers2)

print("filtering by eliminating all-boy families first")
brothers3 = 0
familiesWithGirls = []
for family in familyList:
    if family.hasAGirl() == 1:
        familiesWithGirls.append(family)
print("# of families that have 1 or more girls = ", len(familiesWithGirls))

# of the families with a girl, count those that also have a boy
for family in familiesWithGirls:
    if family.hasABoy() == 1:
        brothers3 += 1
print("# of families with 1 or more girls that have one boy = ", brothers3)
print("% of families with 1 or more girls that have one boy = ", ((1.0 * brothers3) / len(familiesWithGirls)) * 100)
