Sunday, March 18, 2012

A Statistics Puzzle

March 18, 2012
A Fun DIY Science Goodie: Proof Yourself Against Sensationalized Stats

For example, in his very good monthly column Devlin’s Angle, he quotes the following problem, originally designed by puzzle master Gary Foshee: “I tell you that I have two children, and that (at least) one of them is a boy born on Tuesday. What probability should you assign to the event that I have two boys?”

13 out of 27. No joke. Tuesday really does matter. I offer you a table as proof.



I used the often neglected but almost always useful "When In Doubt, Brute Force It" method. ;)

Note that there are 27 open possibilities in the table. 13 out of 27 are cases where there are two boys.
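For anyone who wants to repeat the brute-force count without the picture, here is a minimal Python sketch. It assumes, as the table does, that boys and girls are equally likely, that every day of the week is equally likely, and therefore that all 196 combinations for child #1 and child #2 are equally likely. The names (DAYS, FAMILIES, poll_count) are made up for this example.

```python
from fractions import Fraction
from itertools import product

GENDERS = ("boy", "girl")
DAYS = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# Each child is a (gender, day) pair; a family is an ordered pair (child #1, child #2).
CHILDREN = list(product(GENDERS, DAYS))
FAMILIES = list(product(CHILDREN, repeat=2))  # 14 * 14 = 196 equally likely cells


def poll_count(gender="boy", day="Tue"):
    """Count the families that would answer 'yes' if asked about a (gender, day) child,
    and how many of those have two children of that gender."""
    matches = [fam for fam in FAMILIES if (gender, day) in fam]
    two_of_gender = [fam for fam in matches if all(g == gender for g, _ in fam)]
    return len(matches), len(two_of_gender)


open_cells, two_boys = poll_count()
print(open_cells, two_boys, Fraction(two_boys, open_cells))  # 27 13 13/27
```

Asking about a different day, or about a girl, gives the same 27 and 13, which is a quick sanity check that nothing is special about Tuesday itself.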

Hat tip to a friend for sending me this link. It took longer than I would like to admit for me to prove this to myself. Great puzzle.


Update:

My "proof" is now in doubt. JeffJo has made some interesting points in the comments and I'm leaning his way now.

Update #2:

I'm now convinced that the correct answer is not 13 out of 27 but is instead 1 out of 2. The 13 out of 27 only works if we poll the group. It does not work if the parents freely offer us information.

Here's the extended reasoning. We will limit our group to parents who would offer us information in the following form.

I tell you that I have two children, and that (at least) one of them is a [gender of child] born on a [day of the week].

Each parent has a choice to make. They can only tell us about one of their two children using this sentence structure. The following chart shows the impact of the parent's choice.



As an example, let's discuss parents whose child #1 is a boy born on a Tuesday and whose child #2 is a girl born on a Friday. These parents must choose how to tell us this information. Half of them will tell us they have at least one boy born on a Tuesday. The other half will tell us they have at least one girl born on a Friday. That's represented as 50% in the chart.

Here's where it gets interesting. In one set of parents, child #1 is a boy born on a Tuesday and child #2 is also a boy born on a Tuesday. They will *always* tell us that they have at least one boy born on a Tuesday. That's represented as 100% on the chart.

Now that we've made the chart, let's do a real world example. Let's say there are 196,000 parents. If distribution is perfect, then each cell in the chart above holds exactly 1,000 parents.

Going back to my previous example, 1,000 parents will have a child #1 who is a boy born on a Tuesday and a child #2 who is a girl born on a Friday. However, only 500 of these parents would tell us about the boy. The other 500 would tell us about the girl. Symmetry demands this.

1,000 parents have a child #1 who is a boy born on a Tuesday and a child #2 who is also a boy born on a Tuesday. All 1,000 of these parents will therefore tell us that they have at least one boy born on a Tuesday.

If we now add up all the parents who would tell us the original statement (using the table) then we will find the following.

14,000 parents out of 196,000 parents would tell us that they have at least one boy born on a Tuesday. Of those 14,000 parents, 7,000 have two boys.
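The tally above can be sketched the same way, with each cell weighted by the chance that its parents would make the statement. This assumes, as in the update, that a parent picks one of the two children at random (50/50) and describes that child's gender and day, and that there are 1,000 parents per cell. The helper tell_weight is a hypothetical name for illustration.

```python
from fractions import Fraction
from itertools import product

GENDERS = ("boy", "girl")
DAYS = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
CHILDREN = list(product(GENDERS, DAYS))
FAMILIES = list(product(CHILDREN, repeat=2))  # 196 equally likely cells
PARENTS_PER_CELL = 1000  # 196,000 parents spread perfectly evenly


def tell_weight(family, statement):
    """Probability that this family's parent makes `statement`, assuming the parent
    picks one of the two children at random (50/50) and describes that child."""
    return Fraction(sum(child == statement for child in family), 2)


statement = ("boy", "Tue")
tellers = sum(PARENTS_PER_CELL * tell_weight(fam, statement) for fam in FAMILIES)
two_boys = sum(
    PARENTS_PER_CELL * tell_weight(fam, statement)
    for fam in FAMILIES
    if all(gender == "boy" for gender, _ in fam)
)
print(tellers, two_boys, two_boys / tellers)  # 14000 7000 1/2
```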

“I tell you that I have two children, and that (at least) one of them is a boy born on Tuesday. What probability should you assign to the event that I have two boys?”

The article in Scientific American is wrong. The answer really is 1 out of 2.

I have added the "my personal blunders" tag to this post. Fair is fair. I missed a key aspect of this puzzle when I first tried to solve it.

51 comments:

Stagflationary Mark said...

I should add that when you see an "x" on the chart, that means that it cannot happen.

Troy said...

IIRC there's a countervailing probability out in the real-world distribution of sexes & birth dates that returns the probability to the expected 1/2, both for the Tuesday case and gender in general.

A priori, we have to assume Foshee is more likely to have at least one boy in his family (out of the distribution BB, BG, GB, GG), and less likely to have a child born on a Tuesday.

Or something like that.

Stagflationary Mark said...

Troy,

I think that does alter the 13/27 math in a significant way. It wouldn't balance the 1/3 math though.

Human Sex Ratio

Some of that balancing is undone by the higher death rates of male children. I guess we really need to know the age of the children.

Stagflationary Mark said...

I remember giving birth to my "boy" on that Tuesday like it was yesterday. It was SO exciting!

In that event, I'm going to lean towards the other child being a girl. ;)

Fun with statistics!

Troy said...

It wouldn't balance the 1/3 math though.

"There are only two possibilities when I win the lottery. Either I win, or I lose. Therefore my chances of winning are 50%!"

Doing more reading, I like this argument:

The correct distribution of the 2nd child question is:

The known boy has a younger sister.
The known boy has an older sister.
The known boy has an older brother.
The known boy has a younger brother.

1/2 not 1/3

The known boy (born on Tuesday) has a younger sister.
The known boy (born on Tuesday) has an older sister.
The known boy (born on Tuesday) has an older brother.
The known boy (born on Tuesday) has a younger brother.

1/2 not 13/27

Stagflationary Mark said...

Troy,

The correct distribution of the 2nd child question is:

The known boy has a younger sister.
The known boy has an older sister.
The known boy has an older brother.
The known boy has a younger brother.


The only thing we can safely rule out is two girls. There can be BG, GB, or BB and they are all equally likely.

The reason is that we don't really have a known boy. We have an any boy. That's a subtle but important difference to me.

This is a bit similar to the Monty Hall problem.

Give that one a shot too. Conditional probabilities are definitely tricky.

Stagflationary Mark said...

Many readers refused to believe that switching is beneficial. After the Monty Hall problem appeared in Parade, approximately 10,000 readers, including almost 1,000 with PhDs, wrote to the magazine claiming that vos Savant was wrong (Tierney 1991). Even when given explanations, simulations, and formal mathematical proofs, many people still do not accept that switching is the best strategy. (There was a small loophole in Ms. vos Savant's wording, as detected by Monty Hall: the game show host didn't have to open an empty door and offer the switch. But the people who got the answer wrong didn't even notice the loophole, according to Ms. vos Savant.[1])

It is the best strategy to switch. I believe that with every fiber of my being.

Stagflationary Mark said...

For what it is worth, I do not have that same level of confidence in this puzzle. It feels so wrong on an intuitive level.

I thought the Monty Hall one was a bit easier.

You pick Door #1. He shows you the goat behind Door #2. He asks if you want to switch to Door #3. I say switch.

You had a 1/3rd chance of winning originally. He offered you new information that was absolutely useless at face value (he can always show you the contents behind a crappy door since there are two of them and he knows where they are).

If that's the case, then you still have a 1/3rd chance of winning if you stick with Door #1. We've ruled out Door #2. That means that Door #3 must have a 2/3rds chance of holding the good item.

That's my way of thinking about it anyway, and it has been since the day I first saw that puzzle in Parade.
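A quick simulation agrees with this reasoning. This Python sketch assumes the usual rules described here: the contestant always starts with Door #1, and the host knows where the car is, always opens a goat door other than the contestant's, and always offers the switch. When he has two goat doors to choose from, the sketch has him pick one at random, which is an assumption. The seed and trial count are arbitrary.

```python
import random


def play(switch, rng):
    """Play one round: contestant picks door 1; the host, who knows where the car is,
    always opens a different door hiding a goat and offers the switch."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = 1
    # The host never reveals the car and never opens the contestant's door.
    goat_doors = [d for d in doors if d != pick and d != car]
    opened = rng.choice(goat_doors)
    if switch:
        # Move to the one door that is neither the original pick nor the opened door.
        pick = next(d for d in doors if d not in (pick, opened))
    return pick == car


def win_rate(switch, trials=100_000, seed=2012):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


# Expect values close to 2/3 (switch) and 1/3 (stay).
print(win_rate(switch=True), win_rate(switch=False))
```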

AllanF said...

Not sure the Monty Hall and conditional probabilities apply here. With Monty Hall, opening the door precludes that door, thereby changing the probabilities of the remaining doors.

In this case, a boy born on Tuesday does not preclude the other child also being born on Tuesday.

... which is why I always hated prob (no one can ever agree on what to "count") & could never understand why they lumped it in with stat.

Stagflationary Mark said...

AllanF,

Let's try this from another direction and let's assume boys and girls are equally likely (as the puzzle intended).

Let's say there are 1,000,000 mothers with 2 children each.

We talk to one of the mothers at random.

She tells us that she has two children and at least one of them is a boy.

250,000 mothers can't say this. They have two girls.

250,000 mothers can say this because they have a first born boy and a younger girl.

250,000 additional mothers can say this because they have a first born girl and a younger boy.

250,000 additional mothers can say this because they have 2 boys.

There are 250,000 mothers out of 750,000 mothers that can claim that they have a boy "and" actually have 2 of them.

It's the "and" that makes this tricky. It's the conditional part.

I hope this helps. It took me a long time to think up another way to look at it.
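Mark's head count translates into a few lines of Python. This sketch assumes, as the comment does, that the four birth orders are equally likely and that a mother "can say" she has at least one boy exactly when she has one; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

MOTHERS = 1_000_000
# Four equally likely birth orders: (first born, second born).
families = list(product("BG", repeat=2))
per_family = MOTHERS // len(families)  # 250,000 mothers each

can_say_boy = [fam for fam in families if "B" in fam]  # rules out GG only
two_boys = [fam for fam in can_say_boy if fam == ("B", "B")]

print(per_family * len(can_say_boy), per_family * len(two_boys))  # 750000 250000
print(Fraction(len(two_boys), len(can_say_boy)))  # 1/3
```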

AllanF said...

OK, I guess. Not sure what's tricky about the "and". But, um, what does that have to do with Tuesday?

Stagflationary Mark said...

By the way, my confidence level is 100% regarding the 1 in 3 chance.

It is about 99% on the Tuesday part, just because it is complex enough for me to have doubts.

I'm still pretty darned sure it is right though and I think I could get to 100% by simply repeating what I just did in my last comment and adding in days of the week.

Stagflationary Mark said...

Here's why the "and" makes it tricky and nonintuitive.

I have a 50% chance of flipping a coin and having it be heads (or close enough for government work).

I do not have a 100% chance of getting heads at least once if I flip it once "and" flip it again though. I only have a 75% chance.
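The 75% figure can be confirmed by listing the four equally likely outcomes of two flips of a fair coin (fairness is assumed here):

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair coin flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))
p_one_flip = Fraction(1, 2)
p_at_least_one_head = Fraction(sum("H" in o for o in outcomes), len(outcomes))
print(p_one_flip, p_at_least_one_head)  # 1/2 3/4
```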

Is it any wonder that two children, at least one boy, and born on a Tuesday would be tricky?

I must have worked on this for hours before I even felt comfortable posting what I did. No joke.

AllanF said...

Nevermind. I read the link which explained the "Tuesday" effect.

Addendum to my first post: I said I never understood why they always teach Prob & Stat together. They don't do it this way, but one good reason I can think of is to show that it's all BS, so don't believe anyone's stats until they've given you their list of qualifiers, such as, "don't need to count Tuesday twice, yuk, yuk." Of course they don't intend it that way. They intend it as "we're smarter than you, so shut up and do what we experts tell you." "Home prices never go down, yuk, yuk."

Troy said...

It makes no logical sense that adding the Tuesday information would change the probability of the 2nd child's sex.

Therefore the error is with the model used to calculate the probability.

My distribution of outcomes above is completely valid, AFAICT.

The BG, BB, GB distribution is essentially missing another BB. This is similar to the '50% chance of winning the lottery, either I win or I lose' -- GIGO.

Stagflationary Mark said...

For what it is worth, I'm up to 100% on the Tuesday part. I'm absolutely convinced now. Any doubt is gone.

Let's assume that there are 196,000 mothers with 2 children each, that a boy and girl are equally likely, and that so is the day each child is born on.

Now look at my chart within this post. There are 196 slots on the chart. There should be 1,000 mothers per slot (assuming perfect random distribution).

For example, we would expect 1,000 of the mothers to have Child #1 be a boy born on a Sunday and Child #2 be a boy born on a Sunday.

We can rule those mothers out right off. Neither child was born on a Tuesday.

If we continue this process through the entire table we will find that there are 27,000 mothers who could have made such a statement to us.

Of those 27,000 mothers, 13,000 of them have two boys.

Stagflationary Mark said...

Troy,

If you can, find a fault with my last comment. I don't think you will be able to do it.

AllanF said...

Therefore the error is with the model used to calculate the probability.

"Home prices never go down, yuk, yuk."

Ha. We cross-posted the same point. :-)

Troy said...

OK, out of a sample of 196,000 there are 27,000 mothers with at least 1 son born on Tuesday in that population:

1,000 with two boys, both born on Tu
6,000 with two boys, first born on Tu
6,000 with two boys, last born on Tu
7,000 with one older boy, born on Tu
7,000 with one younger boy, born on Tu

While I reserve the right to object to bringing information into the problem that wasn't specified, this is a losing battle I guess.

I did read elsewhere that we get p = 13/27 when we specify 1 day (e.g., Tu), 12/26 for 2 days, 11/25 for 3 days, 10/24 for 4 days, 9/23 for 5 days, 8/22 for 6 days, and 7/21 = 1/3 when we specify 7 days (e.g., "I have a boy").

http://helives.blogspot.com/2010/07/tuesday-child-puzzle.html
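Troy's list can be checked by brute force over the same 196 cells. Counting by hand, k(14 - k) two-boy families and 14k one-boy families include a boy born on one of k named days, so the ratio simplifies to (14 - k)/(28 - k); the sketch below compares the enumeration with that closed form, which is derived here rather than taken from the linked post. Python's Fraction prints reduced values, so 12/26 shows up as 6/13.

```python
from fractions import Fraction
from itertools import product

DAYS = range(7)


def p_two_boys(k):
    """P(two boys | at least one boy born on one of k named days), counting the
    196 equally likely (gender, day) x (gender, day) cells as in the table."""
    named = set(range(k))  # any k days; which ones does not matter by symmetry
    children = list(product("BG", DAYS))
    hits = [fam for fam in product(children, repeat=2)
            if any(g == "B" and d in named for g, d in fam)]
    both = [fam for fam in hits if all(g == "B" for g, _ in fam)]
    return Fraction(len(both), len(hits))


for k in range(1, 8):
    # Enumerated value next to the closed form (14 - k) / (28 - k).
    print(k, p_two_boys(k), Fraction(14 - k, 28 - k))
```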

Thank you for damaging my brain.

Stagflationary Mark said...

Troy,

Thank you for damaging my brain.

You know what they say. What doesn't kill it makes it stronger. Well, I think they say that. And when I say they, I mean mad scientists in a secret lab, lol.

Truth be known, it damaged my brain too. I even dreamed of it.

JeffJo said...

Let's try applying Stagflationary Mark's argument to Monty Hall: Assume 3,000 games were played: In 1,000 of them the car is behind Door #3, so Monty Hall can't open that door and we eliminate these cases. In another 1,000, the car is behind Door #2, and switching wins. In the last 1,000, the car is behind Door #1 (the door the contestant chose), and switching loses. So switching wins in 1,000 out of 2,000 cases remaining, and loses in 1,000. The chances are even.

Yet we know that result is wrong. And the reason it is wrong is because, in 500 of that last group of 1,000, Monty Hall will open Door #2 instead of Door #3. Since that would not match the circumstances in the original question, we must eliminate those cases as well. Switching wins in 1,000 of the 1,500 cases that do match those circumstances, so the probability is 2/3.

The reason people find this answer non-intuitive is because you have to account for the probability of something you know didn't happen. What they do to get the 1/2 answer is effectively requiring Monty Hall to open Door #3 in every case where it has a goat.

Now look at the "I have two children and at least one is a boy" question that Stagflationary Mark is so confident is answered 1/3. Out of 1,000 mothers, 250 have two girls and would have to say "at least one is a girl," so we eliminate them. Another 250 have two boys, and can't say anything else. The remaining 500 have a boy and a girl, BUT ONLY 250 WOULD MENTION THE BOY THIS WAY. The other 250 would mention the girl - something we know didn't happen, but we still have to account for that possibility. Just like in Monty Hall. The answer is 1/2, unless there is some implied reason why the mother was required to mention a boy.

And interestingly enough, adding "born on Tuesday" does not change the answer when it is solved this way. The change from 1/3 to 13/27 can happen only if the parent is required to mention a boy born on a Tuesday, which is not implied in any way. And that's why it seems absurd. In fact, all of these problems are variations on Bertrand's Box Paradox.
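JeffJo's point can be reduced to a single assumption: how often the parent of a boy and a girl chooses to mention the boy. This minimal sketch makes that rate an explicit parameter (the function name is hypothetical) and weights the four equally likely families by it:

```python
from fractions import Fraction
from itertools import product

families = list(product("BG", repeat=2))  # BB, BG, GB, GG, equally likely


def p_two_boys_given_says_boy(p_mention_boy_if_mixed):
    """P(two boys | the parent says 'at least one is a boy').

    p_mention_boy_if_mixed is the assumed chance that a parent of one boy and
    one girl mentions the boy rather than the girl."""
    says_boy = Fraction(0)
    two_boys = Fraction(0)
    for fam in families:
        if fam == ("B", "B"):
            weight = Fraction(1)  # can only mention a boy
        elif fam == ("G", "G"):
            weight = Fraction(0)  # can only mention a girl
        else:
            weight = p_mention_boy_if_mixed
        says_boy += weight
        if fam == ("B", "B"):
            two_boys += weight
    return two_boys / says_boy


print(p_two_boys_given_says_boy(Fraction(1, 2)))  # 1/2: mention chosen at random
print(p_two_boys_given_says_boy(Fraction(1)))     # 1/3: a boy must be mentioned
```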

Stagflationary Mark said...

JeffJo,

Interesting take on it. I would offer this.

For parents who would not willingly show favoritism between their two children the answer would be 100%.

If I tell you that I have at least one boy born on a Tuesday then I must have two boys born on a Tuesday. Otherwise, I am telling you about one of my children while ignoring my other one. Most parents would not willingly do that. Right?

That said, this is an assumption and is not technically part of the math story problem puzzle. We cannot assume we know why the parent has worded it the way it was. We can only stick to the facts.

Stagflationary Mark said...

One more thought.

Thanks for posting a comment. :)

JeffJo said...

But you are assuming you know why the parent worded as indicated. In fact, any answer you give makes an assumption about why. You are assuming every parent of a boy and a girl will always tell you about the boy, and never the girl. The better assumption is that such a parent will, for reasons unknown, choose “boy” only half the time.

You also seem to have missed the part where “just the facts” produces the wrong answer for the Monty Hall Problem. It isn’t facts that dictate probability, it is the random process that produces whatever facts you could learn. Try it this way: I tell you that I have exactly two children. What is the probability that they have the same gender if:

1) I tell you nothing else.
2) I tell you that at least one is a boy.
3) I tell you that at least one is a girl.
4) I write the gender of at least one on a piece of paper that I show to you.
5) I write the gender of at least one on a piece of paper that I don’t show to you; so you learn nothing.

The answer to #1 is clearly A1=1/2.

If you say the answer to #2 is A2=1/3, based on “just the facts,” then you must also say the answer to #3 is A3=1/3. The different facts have equivalent effects in the two questions, and the same reasoning gives A4=1/3. But since “boy” and “girl” are the only two things I could have written in #4, the Law of Total Probability says A5 is A2*P(I wrote “boy”)+A3*P(I wrote “girl”) = (1/3)*[P(I wrote “boy”)+P(I wrote “girl”)] = (1/3)*(1)=1/3. In fact, that would have to be the answer even if I didn’t tell you that I wrote a gender.

This is what is known as Bertrand’s Box Paradox. A5 has to be the same as A1 because, as far as your information is concerned, they are the same question. But A5 also has to be the same, working backwards, as A4, A3, and A2. The resolution of the paradox is that A2 cannot be based on “just the facts;” that is, what I told you. It also has to be based on what I could have told you, but didn’t.

In fact, we can generalize Bertrand’s Box Paradox: N boxes are divided into M that have two gold coins hidden inside (with M<N/2), M that have two bronze coins (the change here is intentional), and N-2M that have one of each. If you choose a box, and I look in it and tell you it has a bronze coin, what are the chances it has two bronze coins?

The “just the facts” answer, since M boxes have two bronze coins (i.e., the BB boxes) and N-M have at least one (all except the GG boxes), is M/(N-M). In 1889 Joseph Bertrand proved that is wrong by using the paradox I outlined above. The correct answer, since M boxes have two and you can only assume I would tell you about a bronze coin in M+(N-2M)/2=N/2 of them, is 2M/N. He used N=3 and M=1, but it works the same with N=4 and M=1, which makes it the Two Child Problem. The answer is not M/(N-M)=1/3, it is 2M/N=1/2. This is a fact long established in Mathematical History, and acknowledged by Martin Gardner himself when he retracted his original answer of 1/3.
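The two answers to the generalized box question can be computed side by side. This sketch assumes, as in the comment, that the informant looks in the chosen box and names one of its two coins at random; the function name bertrand_box is made up for illustration.

```python
from fractions import Fraction


def bertrand_box(n, m):
    """Generalized Bertrand's Box: m gold-gold boxes, m bronze-bronze boxes, and
    n - 2m mixed boxes. The informant names one coin of the chosen box at random.
    Returns (just-the-facts answer, random-report answer) for
    P(two bronze | told 'bronze')."""
    mixed = n - 2 * m
    just_the_facts = Fraction(m, n - m)  # every box with a bronze coin counts fully
    # Bronze-bronze boxes always yield "bronze"; mixed boxes yield it half the time.
    told_bronze = m + Fraction(mixed, 2)
    random_report = Fraction(m) / told_bronze  # simplifies to 2m/n
    return just_the_facts, random_report


print(bertrand_box(3, 1))  # Bertrand's three boxes: (Fraction(1, 2), Fraction(2, 3))
print(bertrand_box(4, 1))  # Two Child Problem: (Fraction(1, 3), Fraction(1, 2))
```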

Stagflationary Mark said...

JeffJo,

I think you are making a compelling case and yet I still cannot spot the flaw in my own logic.

Let's say there are 4 parents in a room and they each have 2 children (perfect distribution). When in doubt, let's start naming them.

Tom has a first born boy and a second born boy.
Fred has a first born boy and a second born girl.
Todd has a first born girl and a second born boy.
Jim has a first born girl and a second born girl.

And now, one of these people slides me a note under the door. (I don't know which one.)

If it says that they have at least one boy then I definitely know the person isn't Jim. The answer would be 1/3.

If it says that they have at least one girl then I definitely know the person isn't Tom. The answer would be 1/3.

There were 4 parents in the room and yet I was definitely able to rule one out the instant I was slid the note. Correct?

I am sticking to the only facts I know. I am making no assumptions about the motivation of the parent who wrote me the note. So where is the flaw in my logic?

Stagflationary Mark said...

As a side note, it took me more than an hour to form my thoughts on that.

This is definitely a good puzzle.

Stagflationary Mark said...

One more thought.

But since “boy” and “girl” are the only two things I could have written in #4...

You will note that in my example there is no "I".

Tom could not have written girl.
Jim could not have written boy.

I think that's important.

Stagflationary Mark said...

Yet another thought.

If you were the intermediary between me and the parents then you could have taken all their notes and decided which note to pass me. In that event, you could have basically chosen boy or girl. In that case, I would not be given any extra information and my answer should be 1/2.

I am not assuming you are there to do that though. Right?

Stagflationary Mark said...

Let's try applying Stagflationary Mark's argument to Monty Hall: Assume 3,000 games were played: In 1,000 of them the car is behind Door #3, so Monty Hall can't open that door and we eliminate these cases. In another 1,000, the car is behind Door #2, and switching wins. In the last 1,000, the car is behind Door #1 (the door the contestant chose), and switching loses. So switching wins in 1,000 out of 2,000 cases remaining, and loses in 1,000. The chances are even.

This puzzle is not the Monty Hall problem. We do know Monty Hall's motivations. In sharp contrast, we do not know the parent's motivations.

3000 games played. We always choose door #1.

In 1000 games, the car is behind door #1. Monty Hall will show us door #2 or #3. We will lose when we switch.

In 1000 games, the car is behind door #2. Monty Hall knows this and will therefore show us door #3. We win when we switch.

In 1000 games, the car is behind door #3. Monty Hall knows this and will therefore show us door #2. We win when we switch.

We win 2/3rds of the time by switching. I posted my thoughts on this earlier in these comments. Perhaps I was not clear.

Stagflationary Mark said...

JeffJo,

I think I finally see the flaw!

If it says that they have at least one boy then I definitely know the person isn't Jim. The answer would be 1/3.

I know more than that though.

Tom has a first born boy and a second born boy.
Fred has a first born boy and a second born girl.
Todd has a first born girl and a second born boy.

I know for sure that it is one of those three. Tom is more likely than either Fred or Todd though. There are 4 boys out there and he's got half of them!

Tom's talking about his first born.
Tom's talking about his second born.
Fred's talking about his first born.
Todd's talking about his second born.

50% chance!

Correct?

Stagflationary Mark said...

At the very least, I am no longer confident. I need to spend more time thinking through this example and how it would affect the original problem.

Stagflationary Mark said...

If I continue this line of reasoning for the original problem then the 13 out of 27 turns into 14 out of 28 (50%).

The parent with two boys both born on Tuesdays could be talking about *either* child.

Here's the part I'm still wrestling with. If I poll the 196 perfectly distributed parents in my table asking if they have at least one boy born on a Tuesday, then 27 will say yes. Of those, only 13 have two boys. That math is fairly clear.

So why does it matter if they tell me or I poll them?
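One way to see why polling and being told differ is to run both protocols on the same simulated families. In this Monte Carlo sketch, "poll" keeps every family that has a boy born on Tuesday, while "tell" keeps only the families whose parent happens to describe a boy born on Tuesday, assuming each parent picks which child to describe uniformly at random (the assumption JeffJo argues for). The seed and trial count are arbitrary, so the printed rates will only be close to 13/27 and 1/2.

```python
import random

DAYS = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")


def random_child(rng):
    # Gender and weekday are assumed independent and uniform.
    return (rng.choice(("boy", "girl")), rng.choice(DAYS))


def simulate(trials=200_000, seed=18):
    """Compare two ways of learning 'at least one boy born on Tuesday'.

    poll: we ask every parent that exact question and keep the ones who say yes.
    tell: each parent volunteers a fact about one child chosen at random
          (an assumption), and we keep the ones who happen to say 'boy, Tuesday'."""
    rng = random.Random(seed)
    target = ("boy", "Tue")
    poll_yes = poll_two_boys = tell_yes = tell_two_boys = 0
    for _ in range(trials):
        family = (random_child(rng), random_child(rng))
        two_boys = family[0][0] == family[1][0] == "boy"
        if target in family:
            poll_yes += 1
            poll_two_boys += two_boys
        if rng.choice(family) == target:
            tell_yes += 1
            tell_two_boys += two_boys
    return poll_two_boys / poll_yes, tell_two_boys / tell_yes


poll_rate, tell_rate = simulate()
print(f"poll: {poll_rate:.3f} (13/27 = {13 / 27:.3f}), tell: {tell_rate:.3f}")
```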

Stagflationary Mark said...

Please forgive my typos. It's been a long night.

JeffJo said...

"...yet I still cannot spot the flaw in my own logic." Conditional probability depends on the event that gives you the information. And "A family HAS at least one boy" is not the same event as "A parent TELLS YOU he has one boy."

Use your 4-parent analogy, but expand it to 100. There are exactly 25 in each category. Ask each parent to write down, on a slip of paper, a set of facts that applies to at least one of their children and that includes gender.

Of the 100 slips of paper, how many do you expect will include "girl?" How many do you expect will include "boy?" These answers can't be different, and they must add up to 100, so they must be 50 each. 75 could have written about boys, but only 50 will.

Now, pick a slip at random from the 100. If I look at it, and tell you that what it says includes "boy," what are the chances that it was written by a parent of two boys? That depends on how many BB families are represented by the 50 slips that say "boy." Well, the 25 BB parents must be represented, and the GG ones can't be, so the other 25 must be one-boy families. That makes the chances 25/50=1/2.

What you are missing, essentially, is that the "facts of the problem" are that "boy" was written, not that the family has one boy. There are 25 families that have one boy, but wrote "girl."

And it can't matter what other facts were written in addition to "boy," such as "born on a Tuesday." Which is exactly what everybody expects at first, for good reason. These additional facts can't affect the gender of a different child.

+++++
You talk about "knowing motivations." There are three kinds of motivations here, but only one is critical:

1) Why would a parent be cryptic and tell you a fact that could apply to one or two children, and why would Monty Hall open a door? The first is unknowable, which is why we postulate these slips of paper. The second is reasonable to assume: to add interest to the game. But neither motivation is really important - it did happen, so we can assume there was a reason.

2) What are the possible sets of information? Monty Hall could open either of two doors, a fact you need to recognize to answer the problem as you do. Because you always see what door he opens, you can't count both as you did in your 3,000-game example. A parent could mention either of two genders, but you insist that assuming both are possible is a motivation. It isn't - both are, indeed, possible. And if you ask "what is the probability of two of the same gender," I can do the same thing with the Two Child Problem as you did with the Monty Hall Problem. The parent will say either "boy" or "girl," and they will have two of that gender in 50% of the cases.

3) How does the informant choose between the possibilities? This is the only important motivation. If you aren't told how, the problem is ambiguous to some degree. But probability is the only field where that may not be a problem. If you don't know how, you can assume that all functionally-equivalent options were given equal chances. This isn’t assuming a motivation, it is assuming the lack of one.

+++++

"If I poll the 196 perfectly distributed parents in my table asking if they have at least one boy born on a Tuesday, then 27 will say yes." Correct - but where did the original problem mention such a poll? If you ask these 196 parents to make a statement about gender and day, but you don't specify "boy" or "Tuesday," then only 14 will say "boy born on Tuesday." And 7 will have two boys.

Stagflationary Mark said...

JeffJo,

Monty Hall could open either of two doors, a fact you need to recognize to answer the problem as you do. Because you always see what door he opens, you can't count both as you did in your 3,000-game example.

I stand by my Monty Hall example. The chances are definitely not even. You are wrong. You are missing why his motivations matter.

When playing that game, Monty Hall *never* shows contestants the car before they are asked to switch. Never.

He does not pick a door at random to show us. He picks one intentionally that does not have a car. This is a very important part of the puzzle.

So once again, we play 3 games. I pick door #1 each time.

In the first game, the car is behind door #1. He shows what's behind another door. I switch. I lose.

In the second game, the car is behind door #2. He intentionally doesn't show me door #2. He shows me door #3 instead. I'm asked if I want to switch to door #2. I do. I have picked the car. I win!

In the third game, the car is behind door #3. He intentionally doesn't show me door #3. He shows me door #2 instead. I'm asked if I want to switch to door #3. I do. I have picked the car. I win!

We played 3 times. I won the car twice. I won because I switched every single time he asked me. And the reason I switched, is because I knew that he would never show me the car. He had extra information and I used that to my advantage.

The math doesn't lie here. I played 3 times. I won twice. That's all there is to it.

As for the parents, that's different. I'm still thinking that through. I find your arguments more compelling on this puzzle.

Stagflationary Mark said...

I have turned off moderation so that you don't have to wait for me to approve your comments.

I had it set to moderate comments on posts older than 30 days to cut down on SPAM.

I'll turn it back on when we're done discussing this topic.

On the one hand, I am enjoying the discussion. This is a complicated puzzle and you have made some very compelling arguments.

On the other hand, it is not helping my problems with insomnia. ;)

Stagflationary Mark said...

I should probably offer an additional assumption I'm making about the Monty Hall problem.

I assume that he always shows you a door and asks if you wish to switch.

If he doesn't always do that, then all bets are off.

For example...

If he only asks you to switch if he knows you picked the car, then clearly you should never switch.

JeffJo said...

Monty Hall's "motivations" matter only to establish that it is a problem that can be characterized as a Generalized Bertrand's Box Problem, like the Two Child Problem. He always opens a door, and he always opens one with a goat. So he always has two possible responses, from your point of view; but either one or two from his, depending on where the car is. Similarly, a parent who wants to tell you a fact that applies to at least one child has two choices about gender, from your point of view; but either one or two from his. The only difference is in the number of cases where there is a choice. And to show you how similar they are, I'll do the same thing you did: ignore the actual choices which were reported in the problem, and treat it as a generic answer.

Pick four parents, like Tom, Fred, Todd, and Jim. My original bet is that a random selection has two children of the same gender. You will pick one at random, have him tell me a gender, and then I can switch my bet if I want, based on this new information.

Tom gets picked first. He tells me he has a boy. I win if I keep my bet, and lose if I switch.

Jim gets picked second. He tells me he has a girl. I win if I keep my bet, and lose if I switch.

Fred and Todd get picked third and fourth. Each tells me a gender - it doesn't matter what. I lose if I keep my bet, and win if I switch.

We played four times. I won twice and lost twice, either way. The answer is 1/2.

Now, try your game, but more like the original problem statement. You choose door #1, and Monty Hall opened door #3. In the first game, the car is behind door #1 and you lose by switching. In the second, it is behind door #2 and you win by switching. In the third, it is behind door #3 and ... wait a minute; just like Jim couldn't tell you he had a boy (which wouldn't match your original Two Child Problem), Monty Hall can't open Door #3 here. Just like you ignore Jim, you also have to ignore the case where the car is behind door #3. So only two of the possible games count, and your chances of winning are 1/2 whether or not you switch.

What you are missing, about motivation, is that how you apply it depends on whether you treat the problem generically (ignoring Monty Hall's choice, or the gender mentioned) or specifically. In the generic case, it doesn’t matter how the choice is made when there is one. But in the specific case, it does. If you assume the chooser was motivated to make the choice you actually observed - Monty Hall opened door #3, or the parent told you about a boy - you have to ignore one case and you get the incorrect answers of 1/2 and 1/3, respectively, for the two problems. If you assume they were unmotivated, and so choose randomly, you also ignore half of the cases where there was a choice, and get the correct answers of 2/3 and 1/2. More specifically, in six games, you ignore three where Monty Hall opens door #2 (two when the car was behind door #3, and one when it was behind door #1), and end up winning two of the other three when you switch.

Stagflationary Mark said...

JeffJo,

I understand where you are coming from on this puzzle and as I said previously, I find your arguments compelling. What I don't understand is why you said that I was wrong about Monty Hall. You said...

Monty Hall could open either of two doors, a fact you need to recognize to answer the problem as you do. Because you always see what door he opens, you can't count both as you did in your 3,000-game example.

Perhaps we have a miscommunication here? You aren't suggesting that I was wrong about the Monty Hall problem. You are instead suggesting I cannot apply the Monty Hall problem to this puzzle's solution. Is that correct?

JeffJo said...

Out of 300 games where you pick door #1, you will win by switching 200 times. We agree about that. It comes from dividing the 300 games evenly into the three places where the car could be.

Out of 300 games where you pick door #1, Monty Hall can open door #2 200 times, and he can open door #3 200 times. That's 400 games. You win by switching in 100 of the 200 games, or 50%. But something is wrong here - we double counted 100 games!

Out of 300 games where you pick door #1, Monty Hall must open door #2 only 100 times (1/3), and door #3 100 times. That's 200 games. But you will win by switching in every one of the 100 games, or 100%. But again, something is wrong - 100 games were left out.

The number of games you should count is somewhere in between 100 and 200 games, representing those where Monty Hall does open a particular door. And if he opens door #2 100+X times, he opens door #3 200-X times. You counted both options together, essentially averaging the two, but only one can apply.

Of course, lacking any other information, you can only assume he picks randomly, so X=50. That returns the probabilities to the same values as the average. But the point is, it is an assumption about how Monty Hall chooses. In the Two Child Problem, you insist that the equivalent assumption means assuming a motivation. This is probably because you aren't recognizing that choosing to "tell" about a boy can be a choice between two possibilities in exactly the same way as Monty Hall's choice is. The two problems really are identical, except in the number of cases. If you assume Monty Hall chooses randomly between door #2 and door #3 when both have a goat, you have to assume Fred chooses randomly between "boy" and "girl."
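The counting above can be written as a function of X. In 300 games where you pick door #1, Monty Hall opens door #3 in 200 - X games. These are the 100 - X games with the car behind door #1 where he chose door #3, plus the 100 forced games with the car behind door #2. Switching wins only in those 100 forced games. The function name `switch_wins_given_door3` is hypothetical.

```python
from fractions import Fraction

def switch_wins_given_door3(x, games=300):
    """Fraction of 'Monty opened door #3' games won by switching.

    You always pick door #1. x is how many of the games with the car behind
    door #1 see Monty open door #2 instead of door #3 (0 <= x <= games // 3).
    """
    third = games // 3
    if not 0 <= x <= third:
        raise ValueError("x must be between 0 and games // 3")
    opens_door3 = (third - x) + third  # car behind #1 (his choice) + car behind #2 (forced)
    switch_wins = third                # car behind #2: switching to it wins
    return Fraction(switch_wins, opens_door3)

for x in (0, 50, 100):
    print(x, switch_wins_given_door3(x))
```

The extremes X=0 and X=100 give the 50% and 100% figures from the two miscounts above. The random-choice assumption X=50 gives 2/3.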

Stagflationary Mark said...

JeffJo,

I'm glad we cleared up the Monty Hall problem. We were just having a communication problem there and that was distracting me from this problem.

As I said earlier, I found your arguments compelling about this recent puzzle involving parents. I still do. Yesterday, I finally saw the flaw that you've seen all along. I now lean heavily towards your arguments. There does seem to be a difference between me polling the group and someone from the group offering me information.
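The difference between polling and being offered information can be made concrete for the Tuesday puzzle. This is a minimal sketch under the usual assumptions. Each child is equally likely to be a boy or a girl and to be born on any of the seven days, independently. When a parent volunteers information, they describe one of their two children chosen at random, as in the 50% chart in the post. The names `polled` and `volunteered` are hypothetical.

```python
from fractions import Fraction
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
CHILDREN = [(g, d) for g in "BG" for d in DAYS]  # 14 equally likely kinds of child
FAMILIES = list(product(CHILDREN, repeat=2))     # 196 equally likely ordered pairs
TARGET = ("B", "Tue")

def is_two_boys(family):
    return family[0][0] == "B" and family[1][0] == "B"

def polled():
    """We ask: 'Do you have a boy born on Tuesday?' and keep the 'yes' families."""
    yes = [f for f in FAMILIES if TARGET in f]
    return Fraction(sum(is_two_boys(f) for f in yes), len(yes))

def volunteered():
    """Each parent describes one randomly chosen child; keep 'boy born on Tuesday'."""
    said_target = Fraction(0)
    two_boys = Fraction(0)
    for family in FAMILIES:
        for child in family:  # each child is the one described with probability 1/2
            if child == TARGET:
                said_target += Fraction(1, 2)
                if is_two_boys(family):
                    two_boys += Fraction(1, 2)
    return two_boys / said_target

print(polled())       # 13/27
print(volunteered())  # 1/2
```

Polling selects every family that contains a Tuesday boy, which gives 13 of 27. When parents volunteer, families with a Tuesday boy and some other child only say "Tuesday boy" half the time, and that brings the answer back to 1/2.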

I'm going to give this some more thought in the coming week and try to come up with an example that makes it far clearer. Or perhaps you can come up with one that really highlights the flaw. I'm pretty sure this will end with me posting a retraction, but I want to be able to prove to others why it is broken. It's still too complicated for that, I think.

Here's what I've done with the Monty Hall problem to convince others.

Let's say there are a million doors. Let's say I pick door #187,215.

Monty Hall then shows me 999,998 other doors because he knows the car isn't behind them.

There's one door he won't show me though. That's door #748,321.

The car is either behind door #187,215 that I picked at random or behind door #748,321 that Monty Hall won't show me.

Would I like to switch? Absolutely!
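The million-door version can be simulated directly. This is a minimal Monte Carlo sketch, assuming the car and your pick are independent and uniformly random. It also assumes Monty Hall knows where the car is and leaves exactly one other door shut. The function `switch_win_rate` and its `seed` parameter are hypothetical.

```python
import random

def switch_win_rate(n_doors, trials, seed=0):
    """Estimate how often switching wins when Monty opens all but two doors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(n_doors)
        pick = rng.randrange(n_doors)
        if pick == car:
            # You got lucky: Monty leaves a random other (goat) door shut.
            shut = (pick + 1 + rng.randrange(n_doors - 1)) % n_doors
        else:
            # Monty knows where the car is, so he must leave its door shut.
            shut = car
        if shut == car:
            wins += 1
    return wins / trials

print(switch_win_rate(3, 30_000))           # close to 2/3
print(switch_win_rate(1_000_000, 100_000))  # close to 999,999/1,000,000
```

Switching loses only when the first pick was already right, which happens 1 time in n_doors. With a million doors, that almost never happens.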

Stagflationary Mark said...

JeffJo,

I have updated this post to say that my "proof" is in doubt and I'm leaning your way now. I have also started a new post so that regular readers see what you've written.

June 6, 2012

A Statistics Puzzle Revisited

AllanF said...

Would I like to switch? Absolutely!

Nice one. :-)

dearieme said...

I had a heated argument with an actuary about Monty Hall. I pointed out that, like many probability problems I remember from university days, the problem was under-specified.

In other words, my conclusion was consistent with "...lacking any other information, you can only assume... But the point is, it is an assumption about how Monty Hall chooses." Except that I disagree with "...you can only assume ..." because making arbitrary assumptions won't do at all. Insofar as this is a real problem, you need real evidence. In its absence, the problem is insoluble.

It took me some time as a freshman to realise that many of the problems put before me were bogus, being either physically meaningless or incomplete. It didn't help that mathematicians seemed incapable of precision when describing what was intended to be a real-world problem, while they were capable of great precision in doing pure maths. Very odd.

Stagflationary Mark said...

AllanF,

There is definitely something satisfying about extrapolating to the extremes, unless one is investing in exponential trends moments before they utterly collapse. ;)

Stagflationary Mark said...

dearieme,

It is impossible to prove that Monty Hall actually did know where the car was. He could just be amazingly lucky. It therefore seems a bit odd for that to be part of a math problem's proof.

Further, I don't even know if I am remembering the show correctly. That means even more assumptions.

Stagflationary Mark said...

JeffJo,

I've had more time to think about it. Your arguments have convinced me. I've also thought up a new replacement chart which may help convince others. I'll be posting it soon (within 24 hours more than likely).

Stagflationary Mark said...

JeffJo,

I've updated the post with the correction.

I'm hoping it meets with your approval, if for no other reason than that this puzzle has seriously added to my insomnia over the past week. :)

JeffJo said...

dearieme:

If we adopted your attitude, nearly all probability puzzles would be insoluble. We don't know, for example, that a child has a 50% chance of being a boy (and in fact, that is wrong in the real world). We don't know that a coin or a die is fair, that a card is not missing from a deck, or that an informant is not lying.

If we lack the information to assign probabilities, it is not only acceptable in puzzles, but required, to apply the Principle of Indifference. It says that any two possibilities that differ only in the label we apply to them must be assigned equal probabilities. And it is required because, if you re-state the problem with the labels changed, you have to use the same answer without changing those probabilities.

But you are wrong about Monty Hall being a real-world problem. It is entirely made up. There was never a game like this played on Let's Make A Deal, and Monty Hall himself said a contestant would never be given a second chance at the same prize. It is called that only because the motivations you are supposed to assume (that he always opens a door to show it did not have the prize) are similar to the style he actually used. He would lead people on without ever revealing any conclusive information. So yes, we can assume he always opens a door, offers a switch, and never reveals the car. We also must assume, by the Principle of Indifference, that he chooses his door as randomly as possible.

A probability puzzle has to be ambiguous to some degree; it's about what might happen, not what did. This causes people to look for any and every ambiguity they can find to defend an answer they arrived at by intuition. And I'm not criticizing them for it. Many famous mathematicians, including a Nobel Prize winner, have done it with the Monty Hall Problem. It takes a lot of integrity to get past the desire to defend one's answer, and to change it.

Stagflationary Mark said...

JeffJo,

The epiphany moment for me was when I finally saw that there was a difference between being offered information and polling for it.

That made all the difference. I would never have gotten there without your help. Thank you.