I was recently trying to explain Bayesian logic to a friend, and came up with th...

klodolph · on Feb 12, 2014

Right. And the frequentist version is also useful.

If you test one person and the test is positive, then that person is the president (p=0.00001).

If you test a thousand people, and the test is positive for one of them, then that person is the president (p=0.01).

So you don't really need Bayesian logic to reason that you should test fewer people if you want a more significant result. (Note I'm not saying you don't need Bayes' Theorem, which everyone uses.)
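The two p-values quoted above can be roughly reproduced from the per-person false-positive rate of 0.00001 used in this thread. The following is only a sketch: it assumes the tests are independent and that nobody in the tested group is actually the president, and the function name is made up for illustration.

```python
def p_at_least_one_false_positive(n_tested, false_positive_rate=0.00001):
    """Chance of one or more positives among n_tested non-presidents.

    Assumes independent tests (an assumption, not something the thread states).
    """
    return 1.0 - (1.0 - false_positive_rate) ** n_tested


for n in (1, 1000):
    # Roughly 0.00001 for one person and just under 0.01 for a thousand.
    print(n, p_at_least_one_false_positive(n))
```

Under these assumptions, testing more people raises the chance of a spurious positive almost linearly while the rate is small.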

Edit: I think most people on HN get their knowledge of frequentist and bayesian statistics from XKCD #1132. That's sad.

jey · on Feb 12, 2014

> So you don't really need Bayesian logic to reason that you should test fewer people if you want a more significant result.

That's misleading because of the word "significant", which apparently means something different in frequentism than it does in normal speech. As a non-frequentist I certainly wouldn't use "significant" in that way; I would instead rephrase what you said as:

> So you don't really need Bayesian logic to reason that you should test fewer people if you want more confirmation bias in your result.

And that's a statement I can definitely get behind!

_delirium · on Feb 12, 2014

It can definitely be used misleadingly, but it's not too out of line with normal scientific usage of the term. The "significant" in "significant figures" is the same: if a number has "11 significant figures", it doesn't mean the 11th digit is significant in the sense of being important or having a big impact, just that the 11th digit is within the measurement precision (as propagated through any subsequent calculations).

klodolph · on Feb 12, 2014

That's a good point. Conversely, when you run a test and get a result which is "not statistically significant", that doesn't mean that the measured effect is not significant. For example, if I test a drug and say "it does not cause cancer (p=0.17)", that's wrong. Yes, the correlation with cancer is not statistically significant, but it is significant in the normal sense of the word.

Houshalter · on Feb 12, 2014

The point is not the number of people that are tested, but the prior probability that the person tested is the president. The person wandering around the White House already has a decent probability of being president. Some guy found in the middle of the country wearing everyday clothes, not so much.

ronaldx · on Feb 13, 2014

> If you test one person and the test is positive, then that person is the president (p=0.00001).
>
> If you test a thousand people, and the test is positive for one of them, then that person is the president (p=0.01).

I am frequently a frequentist, but this particular test doesn't make sense to me. Without any prior belief, the person you tested is still no more likely to be the president than any of the other 3199 who would test positive.

dougabug · on Feb 18, 2014

If you test one person at random and the test is positive, the probability that the person is the President equals the probability of a true positive divided by the probability that the test gives a positive result (i.e. the sum of the probabilities of true and false positives). The probability of a true positive is the probability that the test subject is the President times the conditional probability that the subject tests positive given that they are the President. If the subject is chosen at random from the population of the US, this is the a priori probability of about 1/314M times a conditional probability of testing positive of 1. The probability of a false positive is the a priori probability that a random person is not the President, times the conditional probability of a non-President testing positive. By hypothesis, this is (1-1/314M)*0.00001. So the probability that a positive-testing subject chosen at random from the overall US population is the President must be approximately (1/314M)/((1/314M)+(1-1/314M)*0.00001), which simplifies to 1/(1+(314M-1)*0.00001), or about 1/3141.

The ~99.97% chance that the test result is a false positive for a subject chosen at random is a consequence of the prior probability of that subject being the President in the first place being over 3000 times lower than the probability of a positive test result. In other words, though the sensitivity of the test is perfect, the specificity of the test is insufficient to isolate a condition as rare as the Presidency. It has nothing to do with testing one person being inherently more decisive than testing more than one.

Only narrowing the pool of candidates in a way which will increase the prior probability that the subject is the President will improve the situation, not random decimation to a single test subject. The GP gave several correct examples such as limiting the test to those who were much more likely to be the President in the first place (eg. a person randomly found sitting in the President's chair at the Oval Office might have a prior probability of being the President of greater than 0.01, millions of times greater than that of a randomly selected person in the US).
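The calculation in this comment can be checked with a few lines of Python. This is a minimal sketch using the numbers from the thread (population of 314M, perfect sensitivity, false-positive rate of 0.00001, and a prior of 0.01 for the Oval Office case); the function and variable names are invented for illustration.

```python
def posterior_president(prior, sensitivity=1.0, false_positive_rate=0.00001):
    """Bayes' theorem: P(president | positive test) for a given prior."""
    true_positive = prior * sensitivity
    false_positive = (1.0 - prior) * false_positive_rate
    return true_positive / (true_positive + false_positive)


# A subject drawn at random from a US population of about 314 million.
random_citizen = posterior_president(1 / 314e6)

# A subject found in the President's chair, with the prior of 0.01 used above.
oval_office = posterior_president(0.01)

print(f"random citizen: {random_citizen:.6f}")  # about 1/3141
print(f"oval office:    {oval_office:.6f}")     # about 0.999
```

The same test goes from nearly useless to nearly conclusive purely because the prior changed, which is the point being made here.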

dougabug · on Feb 13, 2014

The parent poster is wrong, for the reason you surmise.

nkurz · on Feb 13, 2014

It seems awkward that p-value does not account for the setting in which the test is performed.

That is, if I test 1000 people at a high school football game in Peoria, Illinois, and one of them comes out positive (p=0.01), there is a very different likelihood that I have found the true president than if I test 1000 people at an official dinner in the White House and find a match (also p=0.01). In fact, I think I'd be willing to wager more on the chance that the individual tested in the White House is actually the president than the random football fan in Peoria, even if only a single person in Peoria was tested (p=0.00001).

Of course, this is only an issue if p-value is being used to express relative confidence in a conclusion, which it shouldn't (?) be. Still, how does a frequentist account for a choice of venue like this?

klodolph · on Feb 13, 2014

The frequentist approach would be to design an experiment, determine rules for interpreting data, and then consider the probability that the experiment led to the correct conclusion.

In this case, the experiment is really "who is president?" and not "is person X president?" The "who is president?" question does not really have a null hypothesis, so it makes no sense to talk about p-values. Instead, we can talk about two different results: identified the president correctly, and identified the president incorrectly.

We then have to design the experiment in such a way that probabilities can be calculated. But this is not possible if you say the repeated experiment is testing a bunch of people to see if one is president--because the probability depends on who actually is president, and that's not known.

So you consider the experiment to be "the president goes about their business and ends up in some random place, we get amnesia, and then run the test." We can simulate this experiment because we can come up with some probability distribution for where the president is. We can't come up with a probability for who the president is, because that's unknown and either 0% or 100%--only Bayesians would let you do that.

Then you can choose the order of people to test to maximize the probability that the president will be identified correctly.

And then you say things like,

* The test had a 92% chance of succeeding.

* We only had to test 14 people, and in such cases, the test had a 98% chance of succeeding.

vbs_redlof · on Feb 13, 2014

P-values are not comparable across different null hypotheses, they are a property of a particular sample given a null hypothesis. You can compare p-values only by making very strong assumptions that the samples come from the same data-generating process. In almost all cases this is unlikely.

A frequentist usually builds a more descriptive model by adding more variables (accounting for obvious factors that contribute to differences in observed frequencies) and increasing sample size to increase statistical power (reduce false negatives). This is not applicable in this case because there is only one president and the tests have low power. The test is bogus because you can't 'sample' and determine the probability of X==president effectively.

comicjk · on Feb 13, 2014

I think you're mistaken about what frequentist p-values mean. The definition you're using is the one we would like to have - the probability that X is president given the data. But what a p-value actually gives you is the chance of seeing the given data given that X is president. And if X is the president, we have p=0, since there are no false negatives. So a p-value is useless in this example, although better frequentist measures can handle it.

klodolph · on Feb 13, 2014

> seeing the given data given that X is president

You've got it backwards. The p-value is the chance that we see the given data given that X is not president.

vbs_redlof · on Feb 13, 2014

Less wrong, but still off. The p-value is the probability of committing a type-1 error (false positive: we declare X to be president when he really isn't) given a sample and a null hypothesis. A p-value is not the 'chance' of seeing the given data; it is a test statistic that describes how reliable your probability estimates are for a given sample and null.

onetwofiveten · on Feb 13, 2014

Actually I think we're still a little off. The p-value is the probability of seeing a test statistic at least as extreme as the observed sample statistic, under the assumption that the null hypothesis is true. This can be restated in terms of tests and errors. The p-value is the probability of committing a type-1 error for a statistical test, where the threshold of that test is chosen to be equal to the sample statistic. In more intuitive terms, it's the probability of a type-1 error under the null hypothesis for the most extreme test that the sample data is able to pass.

In your explanation, where you said "given a sample" you should say "given a sample size", to make it clear that the type-1 error probability is not conditional on the sample data. If you condition on the observed data, then the probability of passing a statistical test is going to be 1 or 0 depending on whether or not the data passes the test. It is the test itself, not the error probability, that depends on the sample data. Which is also something you should have specified.
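To make the "at least as extreme" definition concrete, here is a small sketch with a made-up example that is not from this thread: a coin flipped 20 times that lands heads 15 times, with the null hypothesis that the coin is fair. The function name is hypothetical and only the standard library is used.

```python
from math import comb


def one_sided_p_value(n_flips, observed_heads, p_null=0.5):
    """P(heads >= observed_heads) when the null hypothesis (bias p_null) is true."""
    return sum(
        comb(n_flips, k) * p_null**k * (1 - p_null) ** (n_flips - k)
        for k in range(observed_heads, n_flips + 1)
    )


# Probability of 15 or more heads in 20 flips of a fair coin.
p = one_sided_p_value(20, 15)
print(f"p = {p:.4f}")
```

The same number is also the type-1 error rate of the test "reject the null if there are 15 or more heads", which is the restatement in terms of tests and errors given above.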

vbs_redlof · on Feb 13, 2014

This is the clearest answer I've read so far. The p-value is the minimum probability of ending up with a test statistic that lies in the rejection region (for a given null hypothesis and critical value).

Paradigma11 · on Feb 13, 2014

No, that is alpha: http://statistics.about.com/od/Inferential-Statistics/a/What...

vbs_redlof · on Feb 14, 2014

Please re-read onetwofiveten's explanation. All I've done is paraphrase it. Alpha is just an arbitrary cutoff p-value corresponding to a threshold t-value for a given null hypothesis.

pcrh · on Feb 12, 2014

Furthermore, practicing biomedical researchers rarely, if ever, take a single study as "proof" of a hypothesis, no matter what the P-value is.

Replication and plausibility in the context of other studies are taken into account, however these additional parameters are difficult to reduce to numbers.

A recent blog post (link below) describes the actual practice of (good) biomedical research fairly well.

http://www.tcpinnovations.com/drugbaron/the-all-new-good-old...

GFK_of_xmaspast · on Feb 13, 2014

oh that's not fair, lots of people here read that lesswrong site (http://lesswrong.com/lw/qa/the_dilemma_science_or_bayes/)

pcrh · on Feb 12, 2014

A big problem with Bayesian statistics (as I see it) is that it is not always possible to have any sense of what the prior probability is.

Say you are looking for genes that might influence the rate of occurrence of a particular disease. There might be genes that influence this rate, or there might not, it could be entirely environmental, or it could be entirely genetic, or something in between. In any case, you do genome-wide studies, and find that certain gene variants occur more often in your diseased population than in your control population. You apply frequentist statistics, using some corrections for multiple hypothesis testing, and get some kind of "significant" result. This gets published in Nature (you lucky thing!).

Are your conclusions correct? Do the genes you identified really modify the course of the disease you studied? Bayesian statistics won't give you the answer.

The only way to get the answer is to do experimental science, i.e. deliberately modify the gene(s) in question and show that your modifications change the occurrence or course of the disease.

Unfortunately, that is not always feasible, for either technical or ethical reasons, so we have to fall back on the poor cousin of experimental science that is population statistics.

noelwelsh · on Feb 12, 2014

Both frequentist and Bayesian decision making ultimately rest on arbitrary prior assumptions. In Bayesian statistics it is explicit. In frequentist stats it is less acknowledged, but still there. Where do your cutoff values for alpha and beta (p-value and power) come from? Same place the Bayesians get their priors.

pcrh · on Feb 12, 2014

My point was that in most cases of medical research statistics alone will not get you the answer to the question of cause-and-effect.

noelwelsh · on Feb 12, 2014

Ok, but your first sentence said something very different.

pcrh · on Feb 13, 2014

I don't see how?

If you are hypothesizing that genetics influences the incidence of a disease, without any prior knowledge as to whether there is any influence of genetics on a disease, how can you have a prior probability that there is a genetic influence on the disease in question?

upquark · on Feb 13, 2014

- Most trivially, your prior distribution can assign equal probabilities to different outcomes, if you have no reason to do otherwise. Beta(1, 1) for modeling a coin's bias, for example, if you have no prior information about its bias.

- There are more advanced tools in Bayesian analysis such as Jeffreys prior (known as uninformative priors, look it up).

- As was mentioned in other responses the same "big problems" exist in every other statistical and mathematical modeling approach, namely that you have to make assumptions and your results are going to be crap if your assumptions are crap.

- Generally, Bayesian stats got a late start due to high computational resource costs, not some theoretical limitations. The issue with priors that gets repeated by philosophers and some statisticians does not stop the huge, monumental progress Bayesian statistics has had in a ton of applied fields, from computer science / machine learning all the way to economics and political science.

Edit: language
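As a rough illustration of the Beta(1, 1) example, the sketch below updates a uniform prior on a coin's bias with some observed flips. The flip counts are invented, and the helper names are hypothetical; the update rule itself is the standard Beta-Binomial conjugate update.

```python
def update_beta(alpha, beta, heads, tails):
    """Conjugate update: a Beta(alpha, beta) prior plus coin flips gives
    a Beta(alpha + heads, beta + tails) posterior."""
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Start from the flat Beta(1, 1) prior and observe 7 heads and 3 tails.
a, b = update_beta(1, 1, heads=7, tails=3)
print(a, b, beta_mean(a, b))  # Beta(8, 4), posterior mean 2/3
```

With more data the posterior mean moves toward the observed frequency and the initial choice of prior matters less.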

noelwelsh · on Feb 13, 2014

You said "A big problem with Bayesian statistics (as I see it) is that it is not always possible to have any sense of what the prior probability is." This is a common complaint about Bayesian stats -- the choice of priors seems arbitrary. That's what I was responding to.

dllthomas · on Feb 13, 2014

You could look at the portion of similar diseases that have had genetic influences.

dllthomas · on Feb 13, 2014

My loose understanding is that you can get at causality without intervention, although I'm not very far into Pearl's Causality yet...

judofyr · on Feb 12, 2014

> A big problem with Bayesian statistics (as I see it) is that it is not always possible to have any sense of what the prior probability is.

Reminds me of this quote: (These values are known as priors, which is ironic because Bayesians inevitably pull them out of their posteriors.)

http://plover.net/~bonds/cultofbayes.html

cossatot · on Feb 12, 2014

If you learn about Bayes from crazy people, what is the likelihood that you will think Bayesian reasoning is always used in crazy ways?

Last week I did some Bayesian (!!) work on finding a pdf for the direction of a certain quantity. My priors were... [0,2*pi).

pcrh · on Feb 12, 2014

Thanks for that, it was entertaining to read.

Gravityloss · on Feb 12, 2014

I'm not sure I follow. Say you have

  P(gene|healthy) = 0.010 and P(no_gene|healthy) = 0.99
  P(gene|sick) = 0.011 and P(no_gene|sick) = 0.989

Then you do large population studies (testing for sick vs healthy is cheaper than testing for genes). You get

  P(healthy) = 0.90 and P(sick) = 0.10.

From the small sample you get the average of the gene's expression frequency:

  P(gene) = 0.0101 and P(no_gene) = 0.9899.

You are interested in what is the probability of a person being sick when they have the gene

  P(sick|gene) = 
  P(sick,gene) / P(gene) = 
  P(gene|sick) * P(sick) / P(gene) =
  0.011 * 0.10 / 0.0101 = 
  0.109

And conversely, odds of being sick without the gene:

  P(sick|no_gene) =
  P(no_gene|sick) * P(sick) / P(no_gene) =
  0.989 * 0.10 / 0.9899 =
  0.0999

So, based on this data, having the gene increases your probability of having the disease by some 9% (0.109/0.0999) or 0.9 percentage points (0.109-0.0999).
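The arithmetic is easy to get wrong by hand, so here is a short sketch that redoes it in Python from the inputs given in this comment. One assumption differs from the comment: P(gene) is derived from the other inputs with the law of total probability instead of being taken from a separate sample. All variable names are illustrative.

```python
# Inputs taken from the comment above.
p_gene_given_sick = 0.011
p_gene_given_healthy = 0.010
p_sick = 0.10
p_healthy = 1.0 - p_sick

# Law of total probability (assumption: derived here, not measured separately).
p_gene = p_gene_given_sick * p_sick + p_gene_given_healthy * p_healthy
p_no_gene = 1.0 - p_gene

# Bayes' formula for both groups.
p_sick_given_gene = p_gene_given_sick * p_sick / p_gene
p_sick_given_no_gene = (1.0 - p_gene_given_sick) * p_sick / p_no_gene

relative_risk = p_sick_given_gene / p_sick_given_no_gene
risk_difference = p_sick_given_gene - p_sick_given_no_gene

print(f"P(gene)           = {p_gene:.4f}")
print(f"P(sick | gene)    = {p_sick_given_gene:.4f}")
print(f"P(sick | no gene) = {p_sick_given_no_gene:.4f}")
print(f"relative risk     = {relative_risk:.3f}")
print(f"risk difference   = {risk_difference:.4f}")
```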

Or you might want to calculate some other figures from these. If you just want the ratio, then P(sick) does not have a bearing on it. You could perform some sensitivity analysis etc...

(I am not a statistician by far)

pcrh · on Feb 13, 2014

I am not a statistician either, apart from a few courses, but your argument starts with:

P(gene|healthy) = 0.010 and P(no_gene|healthy) = 0.99

P(gene|sick) = 0.011 and P(no_gene|sick) = 0.989

My point is that in many cases there is no justification for placing any number whatsoever on a prior hypothesis. You can't simply say that the prior probability of a particular gene being involved in a disease is 1%, or 0.00001% or 10%, or whatever.

Edit: I'm not saying that Bayesian statistics is without uses, it is very useful in epidemiology, for example. However it is not appropriate for determining molecular and genetic mechanisms.

Gravityloss · on Feb 14, 2014

I think you misunderstand.

P(gene|healthy), P(gene) and P(healthy) are something we can often measure.

P(healthy|gene) is something we can calculate with the Bayes formula from the above values.

More generic version:

We want to know P(model|data). That's what we always want. What is the probability of some model, based on this data that we measured.

But we only have P(model), P(data) and P(data|model). So we use the Bayes formula to get the answer to the interesting conditional probability. It needs all three inputs. We must estimate if we don't have some.

Just presenting P(data|model) and not constraining P(model) or P(data) in any way means that we can't say anything about P(model|data).

It's like if we start with x + a + b = 5.

If we don't know or estimate a and b, it's impossible to say anything about x. Bayes' formula is like saying then, solve it like this: x = 5 - a - b.

So, if you say there is no justification for placing any constraint on a or b - then it's just saying we clearly do not have enough data to say anything about x either. There is no way out of it.

pdonis · on Feb 13, 2014

it is not always possible to have any sense of what the prior probability is

And in those cases, frequentist statistics is no better off, because you have no sense of what the sample space is.

However, there are cases where there is no well-defined sample space, but you can still assign reasonable priors; so Bayesian statistics covers a range of cases that is a superset of the range of cases that frequentist statistics covers. E. T. Jaynes goes into this in some detail in his book Probability Theory: The Logic of Science.

skybrian · on Feb 13, 2014

As I understand it, using Bayesian statistics, you can report the posterior probability given various reasonable priors. It seems useful to know how sensitive the conclusion is to the choice of prior. (With a strong result, it should make relatively little difference.)
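A quick way to see this is to compute the posterior for a range of priors and compare weak evidence with strong evidence. The sketch below uses the odds form of Bayes' theorem; the priors and likelihood ratios are arbitrary values chosen for illustration, and the function name is hypothetical.

```python
def posterior(prior, likelihood_ratio):
    """Odds form of Bayes' theorem: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# A spread of "reasonable" priors (an assumption about what counts as reasonable).
priors = (0.1, 0.3, 0.5, 0.7, 0.9)

for likelihood_ratio in (3.0, 1000.0):
    results = [round(posterior(p, likelihood_ratio), 3) for p in priors]
    print(f"likelihood ratio {likelihood_ratio}: {results}")
```

With a likelihood ratio of 3 the posteriors are spread widely across these priors, while with a ratio of 1000 they all land close to 1. This only holds for priors that are not extreme: the president example earlier in the thread shows a tiny prior overwhelming even a very accurate test.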

an_opabinia · on Feb 13, 2014

>A scientist comes up with a test to determine if someone is the President.

It's a poor analogy, because it's not clear to people why such a test is "natural." It's not clear how your specific test could be broken in the peculiar way that it would have 99.999% chance to confirm that someone who isn't the president is indeed not the president.

And people would get caught up with what you mean by "If they are" and "If they are not," since it's not clear how you would know the error of your test without a real president around to identify.

False positives or false negatives are not at all intuitive to people who have never done experimental design. Most people would get stuck at percentages anyhow.

reyan · on Feb 12, 2014

This is the correct xkcd for this analogy http://xkcd.com/882/, not #1132.

joe_the_user · on Feb 12, 2014

OK,

So just to get a handle on this stuff, the two problems you have if you do a test and only look at a low p are:

1) Unusual things do occur. If a million people do the same test, it's obvious they'll come up with some wrong values. It's less obvious that a similar number of wrong values will come if a million people do a million different tests with a similar small chance of bogus results.

2) The pattern of results may indeed be unusual, but not necessarily in the way you think it is. The data may contain a non-random pattern that has nothing to do with your particular hypothesis, yet a "this is not random" result may seem to say your hypothesis does explain the data.

Does that characterize the problem?

tlarkworthy · on Feb 12, 2014

yeah it's similar to a classic rare disease screening example in text books:

http://www.math.hmc.edu/funfacts/ffiles/30002.6.shtml (sorry it's a bit informal, it was first google hit, but I have seen that in real textbooks for sure)

Although it's a bit confusing associating the prior with geometric proximity to the oval chair.

tedsanders · on Feb 12, 2014

xkcd's version sounds similar, but simpler: http://xkcd.com/1132/

trainfromkansas · on Feb 12, 2014

This has probably been addressed millions of times, but just to give a response to why this is a misleading comic:

If you were equally mocking of both Bayesian and Frequentist, the Bayesian would arguably come up with basically the same conclusion. In this scenario where the Frequentist ignores previous data based on human history and our understanding of the lifecycle of the sun based on physics, the Bayesian should do the same if we are fair. Then his prior distribution would likely be 50% belief that the sun would explode (Why favor either outcome? Thus the prior distribution is split down the middle between both outcomes.). With the new information given from the detector, his posterior would be 35/36 probability that the sun has exploded.

Maybe the Bayesian approach provides a more explicit way of incorporating previous information, and without that, it's cause for some misuse with a Frequentist approach, but that doesn't mean Frequentists need be fundamentally ignorant in their approach.

The comic's good for a laugh. Just don't take it too seriously as a criticism.

klodolph · on Feb 12, 2014

That's the most famous comparison between frequentist and bayesian statistics, and it's a shame because the frequentist interpretation depicted in the comic is really a straw man argument. See: http://stats.stackexchange.com/questions/43339/whats-wrong-w...

haberman · on Feb 12, 2014

I'm not sure it's a straw man, there seem to be plenty of examples of people publishing results based on p < 0.05 findings even when it wasn't justified... just like the person mentioned in the Nature article almost did.

jamesaguilar · on Feb 12, 2014

In a world where all statisticians were Bayesian, you'd still see results like this published. It would simply read differently: "We saw something that seems pretty unlikely, care to confirm?"

haberman · on Feb 12, 2014

Perhaps, or perhaps the scientist would try to replicate the results themselves first (as the person in the article did).

But even if they published them with the verbiage you mentioned, that would be a better world because reporters would be less likely to turn it into a story with a headline like "Scientists say ...". The prevalence of articles like this erodes trust in science, and fills the public's head with misconceptions.