A big problem with Bayesian statistics (as I see it) it that it not always possi...

noelwelsh · on Feb 12, 2014

Both frequentist and Bayesian decision making ultimately rest on arbitrary prior assumptions. In Bayesian statistics it is explicit. In frequentist stats it is less acknowledged, but still there. Where do your cutoff values for alpha and beta (significance level and type II error rate, i.e. one minus power) come from? Same place the Bayesians get their priors.

pcrh · on Feb 12, 2014

My point was that in most cases of medical research statistics alone will not get you the answer to the question of cause-and-effect.

noelwelsh · on Feb 12, 2014

Ok, but your first sentence said something very different.

pcrh · on Feb 13, 2014

I don't see how?

If you are hypothesizing that genetics influences the incidence of a disease, without any prior knowledge as to whether there is any influence of genetics on a disease, how can you have a prior probability that there is a genetic influence on the disease in question?

upquark · on Feb 13, 2014

- Most trivially, your prior distribution can assign equal probabilities to different outcomes, if you have no reason to do otherwise. Beta(1, 1) for modeling a coin's bias, for example, if you have no prior information about its bias.

- There are more advanced tools in Bayesian analysis, such as the Jeffreys prior (one of the so-called uninformative priors, look it up).

- As was mentioned in other responses the same "big problems" exist in every other statistical and mathematical modeling approach, namely that you have to make assumptions and your results are going to be crap if your assumptions are crap.

- Generally, Bayesian stats got a late start due to high computational resource costs, not some theoretical limitations. The issue with priors that gets repeated by philosophers and some statisticians does not stop the huge, monumental progress Bayesian statistics has had in a ton of applied fields, from computer science / machine learning all the way to economics and political science.

Edit: language
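To make the Beta(1, 1) point concrete, here is a minimal sketch of the standard conjugate update for a coin's bias. The flip counts are made up for illustration, and `beta_update` / `beta_mean` are hypothetical helper names, not from any library.

```python
from fractions import Fraction

def beta_update(a, b, heads, tails):
    """Conjugate update: Beta(a, b) prior plus coin flips gives Beta(a + heads, b + tails)."""
    return a + heads, b + tails

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, kept exact with Fraction."""
    return Fraction(a, a + b)

# Flat Beta(1, 1) prior: every bias in [0, 1] is treated as equally plausible.
prior = (1, 1)

# Hypothetical data (an assumption for this example): 7 heads and 3 tails.
posterior = beta_update(*prior, heads=7, tails=3)

print("prior mean:", float(beta_mean(*prior)))
print("posterior:", posterior, "mean:", float(beta_mean(*posterior)))
```

With a flat prior the posterior is driven entirely by the counts, which is the sense in which it encodes "no prior information about the bias".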

noelwelsh · on Feb 13, 2014

You said "A big problem with Bayesian statistics (as I see it) it that it not always possible to have any sense of what the prior probability is." This is a common complaint about Bayesian stats -- the choice of priors seems arbitrary. That's what I was responding to.

dllthomas · on Feb 13, 2014

You could look at the proportion of similar diseases that have had genetic influences.

dllthomas · on Feb 13, 2014

My loose understanding is that you can get at causality without intervention, although I'm not very far into Pearl's Causality yet...

judofyr · on Feb 12, 2014

> A big problem with Bayesian statistics (as I see it) it that it not always possible to have any sense of what the prior probability is.

Reminds me of this quote: (These values are known as priors, which is ironic because Bayesians inevitably pull them out of their posteriors.)

http://plover.net/~bonds/cultofbayes.html

cossatot · on Feb 12, 2014

If you learn about Bayes from crazy people, what is the likelihood that you will think Bayesian reasoning is always used in crazy ways?

Last week I did some Bayesian (!!) work on finding a pdf for the direction of a certain quantity. My priors were... [0,2*pi).

pcrh · on Feb 12, 2014

Thanks for that, it was entertaining to read.

Gravityloss · on Feb 12, 2014

I'm not sure I follow. Say you have

  P(gene|healthy) = 0.010 and P(no_gene|healthy) = 0.99
  P(gene|sick) = 0.011 and P(no_gene|sick) = 0.989

Then you do large population studies (testing for sick vs healthy is cheaper than testing for genes). You get

  P(healthy) = 0.90 and P(sick) = 0.10.

From the small sample you get the overall frequency of the gene (the average over healthy and sick, weighted by P(healthy) and P(sick)):

  P(gene) = 0.0101 and P(no_gene) = 0.9899.

You are interested in what is the probability of a person being sick when they have the gene

  P(sick|gene) = 
  P(sick,gene) / P(gene) = 
  P(gene|sick) * P(sick) / P(gene) =
  0.011 * 0.10 / 0.0101 = 
  0.109

And conversely, the probability of being sick without the gene:

  P(sick|no_gene) =
  P(no_gene|sick) * P(sick) / P(no_gene) =
  0.989 * 0.10 / 0.9899 =
  0.0999

So, based on this data, having the gene increases your probability of having the disease by some 9% (0.109/0.0999) or 0.9 percentage points (0.109 - 0.0999).

Or you might want to calculate some other figures from these. If you just want the odds ratio, then P(sick) does not have a bearing on it (the plain ratio of the two probabilities above does shift a little with P(sick)). You could perform some sensitivity analysis etc...
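The arithmetic and the suggested sensitivity analysis can be sketched in a few lines. This is only an illustration: the two conditional gene frequencies are the ones quoted in this comment, the extra P(sick) values of 0.01 and 0.30 are my own assumptions, and the function names are hypothetical. It prints both the plain ratio of probabilities and the odds ratio, so you can see which one moves with P(sick).

```python
# Conditional gene frequencies taken from the comment above.
p_gene_given_sick = 0.011
p_gene_given_healthy = 0.010

def p_sick_given(p_sick, has_gene):
    """P(sick | gene status) via Bayes' rule, for a given base rate P(sick)."""
    p_healthy = 1.0 - p_sick
    if has_gene:
        like_sick, like_healthy = p_gene_given_sick, p_gene_given_healthy
    else:
        like_sick, like_healthy = 1 - p_gene_given_sick, 1 - p_gene_given_healthy
    # Law of total probability gives P(gene) or P(no_gene).
    evidence = like_sick * p_sick + like_healthy * p_healthy
    return like_sick * p_sick / evidence

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

# 0.10 is the base rate from the comment; 0.01 and 0.30 are assumed alternatives.
for p_sick in (0.01, 0.10, 0.30):
    with_gene = p_sick_given(p_sick, True)
    without_gene = p_sick_given(p_sick, False)
    print(f"P(sick)={p_sick:.2f}  P(sick|gene)={with_gene:.4f}  "
          f"P(sick|no_gene)={without_gene:.4f}  "
          f"risk ratio={with_gene / without_gene:.4f}  "
          f"odds ratio={odds(with_gene) / odds(without_gene):.4f}")
```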

(I am not a statistician by far)

pcrh · on Feb 13, 2014

I am not a statistician either, apart from a few courses, but your argument starts with:

P(gene|healthy) = 0.010 and P(no_gene|healthy) = 0.99

P(gene|sick) = 0.011 and P(no_gene|sick) = 0.989

My point is that in many cases there is no justification for placing any number whatsoever on a prior hypothesis. You can't simply say that the prior probability of a particular gene being involved in a disease is 1%, or 0.00001% or 10%, or whatever.

Edit: I'm not saying that Bayesian statistics is without uses; it is very useful in epidemiology, for example. However, it is not appropriate for determining molecular and genetic mechanisms.

Gravityloss · on Feb 14, 2014

I think you misunderstand.

P(gene|healthy), P(gene) and P(healthy) are something we can often measure.

P(healthy|gene) is something we can calculate with the Bayes formula from the above values.

More generic version:

We want to know P(model|data). That's what we always want. What is the probability of some model, based on this data that we measured.

But we only have P(model), P(data) and P(data|model). So we use the Bayes formula to get the answer to the interesting conditional probability. It needs all three inputs. We must estimate if we don't have some.

Just presenting P(data|model) and not constraining P(model) or P(data) in any way means that we can't say anything about P(model|data).

It's like if we start with x + a + b = 5.

If we don't know or estimate a and b, it's impossible to say anything about x. Bayes' formula is then like saying: solve it like this, x = 5 - a - b.

So, if you say there is no justification for placing any constraint on a or b, then that is just saying we clearly do not have enough data to say anything about x either. There is no way out of it.
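A small sketch of the same point in code, assuming the simplest possible setup: one model and its complement, with P(data) expanded by the law of total probability. The likelihood values are made up for illustration and `posterior` is a hypothetical helper name.

```python
def posterior(prior_m, lik_m, lik_not_m):
    """P(model | data) for a model and its complement.

    P(data) = P(data|model) P(model) + P(data|not model) P(not model).
    """
    evidence = lik_m * prior_m + lik_not_m * (1 - prior_m)
    return lik_m * prior_m / evidence

# Hypothetical likelihoods (assumed for this example): the data are four
# times as probable under the model as under its complement.
lik_m, lik_not_m = 0.8, 0.2

# Same P(data|model) every time; only P(model) changes.
for prior_m in (0.001, 0.1, 0.5, 0.9):
    print(f"P(model)={prior_m:<5}  P(model|data)={posterior(prior_m, lik_m, lik_not_m):.4f}")
```

The likelihoods are identical on every line of output, yet P(model|data) ranges from well under 1% to well over 90%, so P(data|model) on its own really does not pin down the answer.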

pdonis · on Feb 13, 2014

> it not always possible to have any sense of what the prior probability is

And in those cases, frequentist statistics is no better off, because you have no sense of what the sample space is.

However, there are cases where there is no well-defined sample space, but you can still assign reasonable priors; so Bayesian statistics covers a range of cases that is a superset of the range of cases that frequentist statistics covers. E. T. Jaynes goes into this in some detail in his book Probability Theory: The Logic of Science.

skybrian · on Feb 13, 2014

As I understand it, using Bayesian statistics, you can report the posterior probability given various reasonable priors. It seems useful to know how sensitive the conclusion is to the choice of prior. (With a strong result, it should make relatively little difference.)
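Here is a rough sketch of that kind of report, assuming a simple rate estimated with Beta priors and binomial data. The three priors and both data sets are my own made-up choices, and `posterior_mean` / `spread` are hypothetical helper names.

```python
def posterior_mean(a, b, successes, failures):
    """Posterior mean of a rate under a Beta(a, b) prior and binomial data."""
    return (a + successes) / (a + b + successes + failures)

# Three Beta priors standing in for "various reasonable priors" (assumed).
priors = {
    "flat Beta(1, 1)": (1, 1),
    "sceptical Beta(2, 8)": (2, 8),
    "optimistic Beta(8, 2)": (8, 2),
}

def spread(successes, failures):
    """Gap between the largest and smallest posterior mean across the priors."""
    means = [posterior_mean(a, b, successes, failures) for a, b in priors.values()]
    return max(means) - min(means)

# Hypothetical data sets with the same 70% success rate but different sizes.
datasets = {"weak (7 of 10)": (7, 3), "strong (700 of 1000)": (700, 300)}

for data_name, (s, f) in datasets.items():
    print(data_name)
    for prior_name, (a, b) in priors.items():
        print(f"  {prior_name:<22} posterior mean = {posterior_mean(a, b, s, f):.4f}")
    print(f"  spread across priors = {spread(s, f):.4f}")
```

With the small data set the posterior mean swings a lot between priors; with the large one the three priors end up within a fraction of a percentage point of each other.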