A Girl Named Florida

I've been reading Leonard Mlodinow's The Drunkard's Walk: How Randomness Rules our Lives.  The book covers the Monty Hall problem, Bayes's Theorem, availability bias, the illusion of control and so forth.  If these are unfamiliar, look no further for an entertaining account.

On the other hand, I can't say that I learned much I didn't already know. Nevertheless, I still enjoyed reading the book: it's well written and filled with interesting nuggets (did you know that the great mathematician Paul Erdos refused to believe that you should switch doors?). If you teach probability theory or intro stats, you will find lots of good examples to brighten up your lectures.

One problem did intrigue me. Suppose that a family has two children. What is the probability that both are girls? Ok, easy. The probability of a girl is one half, and the two children's sexes are independent, so the probability of two girls is 1/2*1/2 = 1/4.

Now what is the probability of having two girls if at least one of the children is a girl? A little bit harder. The temptation is to say that if one is a girl, the probability of the other being a girl is 1/2, so the answer is 1/2. That's wrong, because you are not told which of the two children is a girl, and that makes a difference. A better approach is to note that without any additional information there are four equally likely possibilities for the sexes of two children: (B,B), (G,B), (B,G), (G,G). If we know that at least one is a girl, we can remove (B,B), so three equally likely possibilities remain: (G,B), (B,G), (G,G). Of these, only one has two girls, so the answer is 1/3.
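
Both answers so far can be checked by brute force. A minimal sketch in Python, assuming, as the post does, that each child is independently a boy or a girl with probability 1/2; conditioning is done simply by discarding the excluded families (`prob`, `both_girls` and `at_least_one_girl` are illustrative names):

```python
from fractions import Fraction
from itertools import product

# The four equally likely (older, younger) sex combinations: BB, BG, GB, GG.
FAMILIES = list(product("BG", repeat=2))


def prob(event, given=lambda fam: True):
    """P(event | given) over the uniform two-child sample space."""
    kept = [fam for fam in FAMILIES if given(fam)]
    hits = [fam for fam in kept if event(fam)]
    return Fraction(len(hits), len(kept))


def both_girls(fam):
    return fam == ("G", "G")


def at_least_one_girl(fam):
    return "G" in fam


print(prob(both_girls))                     # 1/4
print(prob(both_girls, at_least_one_girl))  # 1/3
```

Dropping (B,B) shrinks the denominator from 4 to 3 while leaving the single (G,G) family in the numerator, which is the whole difference between the two questions.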

Ok, now here is the stumper.  What is the probability of a family having two girls if one of the children is a girl named Florida?

At first it seems impossible that knowing the name should make a difference. Surely, the answer is 1/3 just as before? After all, every child has a name. But knowing the name does make a difference. Here's a hint: Florida is a rare name.

Posted by Alex Tabarrok on July 9, 2008 at 07:25 AM in Science

Comments

Is it that, because Florida is a rare name, the girl named Florida is more likely to be the second girl? This makes (G,G) more likely than (B,G) or (G,B).

I hope I am not embarrassing myself with this argument.

Posted by: londenio at Jul 9, 2008 7:37:57 AM

I suspect that the answer to this question is in the wording.

"if one of the girls is named Florida"

That implies that there are two girls. Ergo, the probability that the family has two girls is 100%.

Posted by: at Jul 9, 2008 7:46:05 AM

Half of one of course.

But I'm going to name both of my girls Florida just to spite him!

Posted by: Jefferson Davis at Jul 9, 2008 7:50:35 AM

I guess a better wording would be:

"What is the probability that a family has two girls, if you know that they have at least one daughter named Florida?"

Then you have to consider these options:

They have one girl named Florida and a boy.

They have one girl named Florida and another girl.

Thus, if you can remember Bayes' formula (which I would have to look up, how embarrassing), you can use the probability of a girl being called Florida and the probability for the sex of the child...

Posted by: c8to at Jul 9, 2008 7:54:18 AM

Actually, I'm sure there's something missing there... heh =)

Posted by: c8to at Jul 9, 2008 7:56:25 AM

The girl named Florida problem (spoiler) was also discussed in a WSJ blog recently.

Posted by: at Jul 9, 2008 8:00:21 AM

What if the family's last name is 'Florida'?

Posted by: outback at Jul 9, 2008 8:05:33 AM

Presumably the rarity of the name means that you can rule out the possibility of both girls being called Florida. Sorry I'm too lazy to go beyond that...

Posted by: Bonapart O Cunasa at Jul 9, 2008 8:23:09 AM

P(G=2 | G=Florida) = P(G=Florida | G=2) * P(G=2) / P(G=Florida). I guess the probability is pretty big if P(G=Florida) is small.

Posted by: steve at Jul 9, 2008 8:51:55 AM
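
Bayes' rule here needs three ingredients: the prior P(two girls) = 1/4, the likelihood P(Florida | two girls), and the evidence P(Florida). A minimal sketch, assuming each girl is independently named Florida with some probability x (an assumption for illustration; real names are not handed out at random), with the boy-girl and girl-boy families lumped together as "mixed":

```python
from fractions import Fraction


def p_two_girls_given_florida(x):
    """Bayes' rule for P(two girls | at least one girl named Florida).

    x is the assumed probability that any given girl is named Florida,
    independently of her sibling's name.
    """
    prior = {"GG": Fraction(1, 4), "mixed": Fraction(1, 2), "BB": Fraction(1, 4)}
    # Likelihood of "at least one girl named Florida" for each family type.
    likelihood = {
        "GG": 1 - (1 - x) ** 2,  # either or both sisters named Florida
        "mixed": x,              # the single girl is named Florida
        "BB": 0,                 # no girls, so no Florida
    }
    evidence = sum(prior[k] * likelihood[k] for k in prior)  # P(Florida)
    return prior["GG"] * likelihood["GG"] / evidence


for x in (Fraction(1), Fraction(1, 2), Fraction(1, 10), Fraction(1, 1000)):
    print(x, float(p_two_girls_given_florida(x)))
```

The posterior climbs toward 1/2 as x shrinks and drops to exactly 1/3 when every girl is named Florida (x = 1), which is just the original "at least one girl" puzzle.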

Shouldn't the first "easy" question actually be slightly > 1/4 due to identical twins?

Posted by: Nate at Jul 9, 2008 9:02:16 AM
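
Nate's point can be made concrete with a toy model. Assume, hypothetically, that a fraction t of two-child families are identical twins, who always share a sex, and that all other pairs are independent 50/50 births (t is an illustrative parameter, not a measured twin rate):

```python
from fractions import Fraction


def p_two_girls(t):
    """P(both children are girls) when a fraction t of pairs are identical twins.

    Identical twins: both girls with probability 1/2.
    Everyone else: independent births, both girls with probability 1/4.
    """
    return t * Fraction(1, 2) + (1 - t) * Fraction(1, 4)


print(p_two_girls(Fraction(0)))       # 1/4
print(p_two_girls(Fraction(1, 100)))  # 101/400
```

In this model P(GG) = 1/4 + t/4, so any positive share of identical twins pushes the answer above 1/4.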

Does her dad have a scar on his cheek and an evil eye?

Posted by: david at Jul 9, 2008 9:03:11 AM

"What is the probability of a family having two girls if one of the girls is named Florida?"
Did you mean to ask 'what is the probability of a family having two girls conditional on their having a girl named Florida'/'Florida has one sibling, what is its sex?'

Suppose that a fraction 1/x of girls are named Florida, and that the naming of each girl is independent of the others. With four family types (G,B; B,G; B,B; G,G), half of all girls are found within (G,G) families, so the probability of a family having two girls conditional on their having a girl named Florida is 50% less an adjustment for 'clumping' when two sisters are both named Florida. That adjustment is driven by 1/x^2, so when x is large (the name is rare), the probability approximates 50%. We might also neglect the clumping term on the grounds that parents would have to be very weird to give their daughters exactly the same name.

If we interpret the question to mean 'what is the probability that a family with two kids has two girls, one of whom is named Florida,' that probability will be approximately 1/(2x).

Posted by: Carl Shulman at Jul 9, 2008 9:27:43 AM
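
Carl Shulman's two quantities can be computed exactly. A minimal sketch under his assumptions (each girl is independently named Florida with probability 1/x); `florida_stats` is an illustrative name:

```python
from fractions import Fraction


def florida_stats(x):
    """Each girl is independently named Florida with probability 1/x.

    Returns (P(two girls | a girl named Florida),
             P(two girls and at least one named Florida)).
    """
    p = Fraction(1, x)
    p_florida_in_gg = 1 - (1 - p) ** 2        # 2p - p^2, includes the "clumping"
    joint = Fraction(1, 4) * p_florida_in_gg  # P(GG and a Florida)
    evidence = joint + Fraction(1, 2) * p     # add the mixed-sex families
    return joint / evidence, joint


for x in (10, 100, 10_000):
    conditional, joint = florida_stats(x)
    print(x, float(conditional), float(joint), 1 / (2 * x))
```

Working the algebra through, the conditional probability is exactly 1/2 - 1/(2(4x - 1)) and the joint probability is 1/(2x) - 1/(4x^2), so both approximations tighten as the name gets rarer.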

My colleagues and I have spent lots of time arguing over this question, as one of us insists on using it in interviews. I claim the problem is ill-defined, i.e., it is not yet a well-posed mathematical problem, because even though the English sounds correct, the "preparation procedure" for the situation is ambiguous.

For example, is "Florida" a constant or a random variable? If I searched for a family with a girl of this name, making "Florida" a constant, then the answer is 1/2. If I took any family such that there is at least one girl and asked for a name and reported the answer "Florida" (making "Florida" a random variable), then this name provides no new information (after all every child DOES have a name) and the answer is 1/3.

Rarity of name plays no role here. We may as well assume all names are unique, for example perhaps names are DNA sequences.

In fact, one can extend this reasoning to the "easy" part of the problem, the first part. How did I choose this family before presenting the problem? Perhaps I choose the family to have EXACTLY one girl or EXACTLY two girls. Then the probabilities are 0 and 1 respectively. The reason that Part 1 seems unambiguous is that there seems to be a "canonical" preparation procedure that everyone assumes, i.e. the family is uniformly randomly chosen from all families having two children and at least one girl.

Posted by: AntiAntiCamper at Jul 9, 2008 9:29:26 AM
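
The two preparation procedures can be compared directly by writing down how likely each family is to produce the observation. A minimal sketch, assuming (as in the other comments) that each girl is independently named Florida with a small probability p; the function names are illustrative labels for the two procedures:

```python
from fractions import Fraction

# Uniform prior over (older, younger) sex combinations.
PRIOR = {("B", "B"): Fraction(1, 4), ("B", "G"): Fraction(1, 4),
         ("G", "B"): Fraction(1, 4), ("G", "G"): Fraction(1, 4)}


def posterior_two_girls(likelihood):
    """P(GG | observation), given P(observation | family) for each family."""
    evidence = sum(PRIOR[f] * likelihood(f) for f in PRIOR)
    return PRIOR[("G", "G")] * likelihood(("G", "G")) / evidence


def searched_for_florida(p):
    """Procedure 1: we went looking for a family with a girl named Florida."""
    return lambda fam: 1 - (1 - p) ** fam.count("G")


def asked_a_random_girl(p):
    """Procedure 2: from a family with at least one girl, one girl (chosen at
    random) is asked her name, and she answers "Florida"."""
    return lambda fam: p if "G" in fam else Fraction(0)


p = Fraction(1, 1000)
print(posterior_two_girls(searched_for_florida(p)))  # 1999/3999, about 1/2
print(posterior_two_girls(asked_a_random_girl(p)))   # 1/3
```

The families are the same in both cases; only the likelihood of the observation changes. Searching for a Florida favours two-girl families (two chances), giving (2 - p)/(4 - p), which tends to 1/2 as names become unique, while a randomly chosen girl is equally likely to be a Florida in every family with a girl, so the answer stays at 1/3.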

I think the probability of having a girl is not strictly 50:50, but more like 17/36.

Posted by: Tom at Jul 9, 2008 9:30:03 AM

Thanks to c8to and others for pointing out that my wording was slightly off. The question, as most people understood, is indeed "What is the probability of a family having two girls if one of the children is a girl named Florida?"

I will post an answer later today.

Posted by: Alex Tabarrok at Jul 9, 2008 9:53:29 AM

Although I am just an organic chemist, and we never use statistics or probability (thank God), I agree with AntiAntiCamper. Everyone is rare along some axis, so what is true of the Florida girl is true of any sibling under a slightly different wording of the problem. In other words, as AntiAntiCamper points out, since "rare" is not numerically defined, one can use DNA data to arrive at the limit of uniqueness (or close to it, say 1 out of 80 billion). Since every human who is not an identical twin is genetically unique, we can make a "Florida"-like statement about any sibling.

Posted by: rhodium at Jul 9, 2008 9:53:48 AM

"Ok, now here is the stumper. What is the probability of a family having two girls if one of the children is a girl named Florida?"

100%. The answer was revealed on a re-run of Good Times.

Posted by: Mike Moffatt at Jul 9, 2008 10:01:11 AM

I have a comment/question for those here who are more versed in probability than I am.

I claim that it does not matter if you know or do not know which child (older/younger) was revealed to you UNLESS which child is to be revealed is part of the conditions of the test. It does you no good to be told afterward which child was revealed to you. Here is my reasoning:

You have a family with "at least one girl." As we have seen above, this means that there is a 1/3 possibility of the other child being a girl. The reason for this is that only one of the four cases can be absolutely rejected (BB). The other three remain and are all equally likely. This appears to me to be totally sound.

However, if we are now told "the older child is a girl," the claim is that the probability changes to 1/2 that the other child is also a girl because we can now also reject BG, leaving two equally likely possibilities.

But if this claim is true, then I believe that the following line of reasoning is valid:

At least one of the children is a girl, and therefore there must be a girl child that is the older or younger child. We have no information about which child she may be, so she is 50% likely to be the older child and 50% likely to be the younger child. If the above claim is true, then if we know that she is the older child, the younger child is a 50% shot to be another girl, and if we know she is the younger child, then the older child is a 50% shot to be another girl. Since 50% * 50% + 50% * 50% = 50%, I claim that even before we know whether the child is the older or younger child, the probability is 50% that the unknown child is a girl. But this is a contradiction, since it has been established that the probability before we know which child it is is only 1/3. Therefore, I claim that it does no good to know which child is the girl, UNLESS that requirement is stipulated as part of the test.

In other words, if you consider all cases where the older child is a girl, half are immediately rejected because the older child is a boy and the probability is 1/2 that the other child is a girl. However, if either the older or younger child or both could be a girl and you are simply told which it is after knowing that the family has "at least one girl," meaning that you could be told either "older" or "younger," you have not actually acquired any new information and the probability remains 1/3.

I feel that I am wrong about this. I feel that this is likely because I am moving from "at least one child" to positively selecting one of the children to be the "at least one child," and that this is probably an error. Nonetheless, I would like feedback. I'm not scared of math, so if you need to use it, by all means, do so.

Posted by: John Lynch at Jul 9, 2008 10:01:51 AM
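
John Lynch's closing intuition can be checked by writing down how likely each family is to produce the statement actually heard. A minimal sketch of two hypothetical reporting protocols (assumed for illustration, not taken from the comment): in A the question "is the older child a girl?" is fixed in advance; in B we hear "at least one girl" and are then told which one, with a two-girl family naming the older or the younger at random:

```python
from fractions import Fraction

FAMILIES = [("B", "B"), ("B", "G"), ("G", "B"), ("G", "G")]  # (older, younger)
PRIOR = Fraction(1, 4)


def posterior_two_girls(likelihood):
    """P(GG | what we were told), with likelihood = P(told that | family)."""
    evidence = sum(PRIOR * likelihood(f) for f in FAMILIES)
    return PRIOR * likelihood(("G", "G")) / evidence


def asked_about_older(fam):
    """Protocol A: we asked in advance "is the older child a girl?" -> yes."""
    return Fraction(1) if fam[0] == "G" else Fraction(0)


def told_which_girl_afterwards(fam):
    """Protocol B: told "at least one girl", then told "it's the older one".
    If both are girls, the teller names the older or the younger at random."""
    if fam == ("G", "G"):
        return Fraction(1, 2)
    return Fraction(1) if fam[0] == "G" else Fraction(0)


print(posterior_two_girls(asked_about_older))           # 1/2
print(posterior_two_girls(told_which_girl_afterwards))  # 1/3
```

Under protocol B a two-girl family produces "it's the older one" only half the time, and that halving is what the 50% * 50% + 50% * 50% calculation leaves out; the answer stays at 1/3, as his last paragraph suspects.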

I am totally in AntiAntiCamper's camp. This is a famous problem that turns strongly on the wording and the assumptions made. Frustratingly, people often refuse to see through their own assumptions, and think that people who disagree with them don't understand the math.

Rarity of name plays no role, completely agreed. To ascribe "rarity" to "Florida" is to ignore that I could say something rare or unique about anyone. First-middle-last. SS#. "A man has two children, at least one of which is a girl who stands precisely 5'3" tall, weighs 115 lbs 2 oz, and has precisely 103,618 hairs on her head." If you could say those things about anyone, it doesn't define rarity in a probabilistically useful way. It's a false distinction.

Although I should also say that I don't necessarily agree with 1/3, for many of the reasons that AntiAntiCamper gets at. It's the assumptions behind how the problem is presented. When you are told "at least one of whom is a girl" how do you know that you might not have been told "at least one of whom is a boy" if the circumstances were different? You could say "at least one of whom is x" about ANY family with two children, and therefore the statement without some other context (say, a girl scout picnic or a study of ovarian cancer) loses all informational value.

In the absence of more context, distinguishing between 1/3 and 1/2 with righteous mathematical certainty is a dangerous game of overconfidence and miscalibration.

Posted by: addisonbr at Jul 9, 2008 10:03:22 AM

If a family already has a girl named Florida, why would they also name their other girl Florida? You'd think this is a trivial point but it completely affects the general case of Mlodinow's solution, which he gave in a comment in the WSJ link above.

He writes,

"According to the rules of conditional probability the chances a family has two girls if it has a girl named Florida are thus:

(Total Probability of GF-GN, GN-GF, and GF-GF) / (Total Probability of all 5 of the above events) =
= [.25x*x + 2*.25x*(1-x)] / [.25x*x + 2*.25x*(1-x) + 2*.25x]
= [2x - x*x] / [4x - x*x]"

x is the probability that a girl would be named Florida. So if x is very small, then the solution approaches 1/2. However, as x increases, the solution decreases.

But what if the question was,

"What is the probability of a family having two girls if one of the girls is named Jen?"

Assuming that 5% of girls are named Jen, the answer would be 49.4%. Close to 50%, but not a trivial difference.

If 10% of girls are named Jen, then the answer would be 48.7%. But this assumes, per Mlodinow's solution, that 1% of all families with two girls would name both of their girls Jen!

So Mlodinow's solution doesn't really work for cases like "girls named Florida" where names aren't given out randomly. But it still works if "girls named Florida" is replaced with a trait that is truly random, even if the solution is somewhat counterintuitive.

*And yes, regardless of whether a family would name both of their daughters Florida the answer is still approximately 1/2, but the way Mlodinow arrives at his solution assumes names are given out randomly.

Posted by: Hei Lun Chan at Jul 9, 2008 10:06:13 AM
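
Mlodinow's algebra can be checked mechanically: add up the five events he lists and compare the result with his simplified expression. A minimal sketch, assuming (as his solution does) that names are assigned independently at random with probability x; GF and GN follow his labels, and B for a boy is an added label. It also reproduces the "Jen" figures above:

```python
from fractions import Fraction


def by_events(x):
    """Sum Mlodinow's five events. GF = girl named Florida, GN = girl not
    named Florida, B = boy; each (older, younger) sex pair has probability 1/4."""
    quarter = Fraction(1, 4)
    gf_gn = gn_gf = quarter * x * (1 - x)
    gf_gf = quarter * x * x
    gf_b = b_gf = quarter * x
    two_girls = gf_gn + gn_gf + gf_gf
    return two_girls / (two_girls + gf_b + b_gf)


def closed_form(x):
    """Mlodinow's simplified expression (2x - x^2) / (4x - x^2)."""
    return (2 * x - x * x) / (4 * x - x * x)


for share in (Fraction(5, 100), Fraction(10, 100)):  # hypothetical "Jen" shares
    assert by_events(share) == closed_form(share)
    print(f"{float(share):.0%} named Jen -> {float(closed_form(share)):.1%}")
```

Both routes agree exactly, giving 49.4% for a 5% name and 48.7% for a 10% name.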

"Suppose that a family has two children. What is the probability that both are girls?"

Imagine a country in which parents are particularly desirous of having at least one male child. Then parents who have two girls will be more likely to have a third child than will be parents of two kids who have at least one boy. So if we look at all two-child families, we should find that fewer than 1/4 of them will consist of two girls. (Maybe?)

Posted by: Jim at Jul 9, 2008 10:07:50 AM

One half. Because families with two girls have two chances to name one "Florida" while those with only one girl have only one chance. It doesn't quite work out because there is some chance that both will be named "Florida" (in a strict probability sense), but that's why it's important that the name is rare, so the chance of that is negligible.

Posted by: Bernard Yomtov at Jul 9, 2008 10:08:43 AM

Okay, now that I'm done with my nitpick, let me defend Mlodinow's solution. I was in AntiAntiCamper's camp until I spent way too long working out the math. Assuming names are truly random...

Consider a population of 4 million families with two children each, and that girls have a 1/1000 chance of being named Florida. There will be 1 million families with an older girl and younger boy, 1 million with an older boy and younger girl, 1 million with 2 girls, and 1 million with 2 boys. From there, we will have:

1000 families with a Florida and a younger brother
1000 families with a Florida and an older brother
999 families with a Florida and a younger non-Florida sister
999 families with a Florida and an older non-Florida sister
1 family with two Floridas

Of these 3999 families, 1999 families will have two girls. 1999/3999 = 0.49987 ~ 0.5.

Note that as the name Florida gets more common, the number of families with one Florida and one non-Florida girl decreases in relation to the number of families with a Florida and a boy (while the number of families with two Floridas increases at a slower rate), so the rarity of the name does affect the probability. Yes, it doesn't seem to make much sense in a way, but it's still true.

For the rigorous math, see Mlodinow's solution in the comments on the WSJ blog.

Posted by: Hei Lun Chan at Jul 9, 2008 10:31:22 AM
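
The head count above can be reproduced and then rerun for more common names. A minimal sketch using expected counts, assuming 1 million families of each (older, younger) sex type and independent random naming with probability x; `florida_counts` and `share_two_girls` are illustrative names:

```python
from fractions import Fraction


def florida_counts(families_per_type, x):
    """Expected family counts in the 4-million-family example.

    families_per_type: number of families of each (older, younger) sex type.
    x: probability that any given girl is named Florida (assumed independent).
    Returns a list of (description, expected count, has two girls).
    """
    n = families_per_type
    return [
        ("a Florida and a younger brother", n * x, False),
        ("a Florida and an older brother", n * x, False),
        ("a Florida and a younger non-Florida sister", n * x * (1 - x), True),
        ("a Florida and an older non-Florida sister", n * x * (1 - x), True),
        ("two Floridas", n * x * x, True),
    ]


def share_two_girls(rows):
    total = sum(count for _, count, _ in rows)
    girls = sum(count for _, count, both in rows if both)
    return girls / total


rows = florida_counts(1_000_000, Fraction(1, 1000))
for label, count, _ in rows:
    print(f"{int(count):>5}: {label}")
print(share_two_girls(rows))  # 1999/3999

# A more common name pulls the share further below 1/2.
for x in (Fraction(1, 1000), Fraction(1, 100), Fraction(1, 10)):
    print(x, round(float(share_two_girls(florida_counts(1_000_000, x))), 4))
```

The two "sister" rows shrink by the factor (1 - x) while the "brother" rows do not, which is why the share falls as the name gets more common.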

I have Bruce Schechter's book on Erdos, and he says that Erdos was convinced a few days later by Ron Graham at Bell Labs.

Posted by: Michael Webster at Jul 9, 2008 10:39:53 AM