A big problem with Bayesian statistics (as I see it) is that it is not always possible to have any sense of what the prior probability is.
Say you are looking for genes that might influence the rate of occurrence of a particular disease. There might be genes that influence this rate, or there might not; the disease could be entirely environmental, entirely genetic, or something in between. In any case, you do genome-wide studies and find that certain gene variants occur more often in your diseased population than in your control population. You apply frequentist statistics, using some corrections for multiple hypothesis testing, and get some kind of "significant" result. This gets published in Nature (you lucky thing!).
Are your conclusions correct? Do the genes you identified really modify the course of the disease you studied? Bayesian statistics won't give you the answer.
The only way to get the answer is to do experimental science, i.e. deliberately modify the gene(s) in question and show that your modifications change the occurrence or course of the disease.
Unfortunately, that is not always feasible, for either technical or ethical reasons, so we have to fall back on the poor cousin of experimental science that is population statistics.
Both frequentist and Bayesian decision making ultimately rest on arbitrary prior assumptions. In Bayesian statistics this is explicit. In frequentist stats it is less acknowledged, but still there. Where do your cutoff values for alpha and beta (the significance level and the type II error rate, i.e. one minus the power) come from? Same place the Bayesians get their priors.
If you are hypothesizing that genetics influences the incidence of a disease, without any prior knowledge as to whether there is any influence of genetics on a disease, how can you have a prior probability that there is a genetic influence on the disease in question?
- Most trivially, your prior distribution can assign equal probabilities to different outcomes, if you have no reason to do otherwise. Beta(1, 1) for modeling a coin's bias, for example, if you have no prior information about its bias.
- There are more advanced tools in Bayesian analysis, such as the Jeffreys prior (one of the so-called uninformative priors, look it up).
- As was mentioned in other responses, the same "big problems" exist in every other statistical and mathematical modeling approach, namely that you have to make assumptions and your results are going to be crap if your assumptions are crap.
- Generally, Bayesian stats got a late start due to high computational resource costs, not some theoretical limitations. The issue with priors that gets repeated by philosophers and some statisticians does not stop the huge, monumental progress Bayesian statistics has had in a ton of applied fields, from computer science / machine learning all the way to economics and political science.
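To make the first two bullets concrete, here is a minimal sketch of updating a coin-bias estimate from a flat Beta(1, 1) prior and from the Jeffreys prior, which for a binomial proportion is Beta(1/2, 1/2). The flip counts (7 heads, 3 tails) and the function names are made up for illustration.

```python
def beta_posterior(prior_a, prior_b, heads, tails):
    """Conjugate update: a Beta(a, b) prior plus binomial data
    gives a Beta(a + heads, b + tails) posterior."""
    return prior_a + heads, prior_b + tails


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 7 heads and 3 tails.
flat = beta_posterior(1.0, 1.0, heads=7, tails=3)      # uniform prior
jeffreys = beta_posterior(0.5, 0.5, heads=7, tails=3)  # Jeffreys prior

print("flat prior     -> posterior", flat, "mean", beta_mean(*flat))
print("Jeffreys prior -> posterior", jeffreys, "mean", beta_mean(*jeffreys))
```

Both priors lead to similar posterior means here, and the gap shrinks further as more flips are observed.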
You said "A big problem with Bayesian statistics (as I see it) is that it is not always possible to have any sense of what the prior probability is." This is a common complaint about Bayesian stats: the choice of priors seems arbitrary. That's what I was responding to.
So, based on this data, having the gene increases your probability of having the disease by some 11% (0.109/0.0979 ≈ 1.11) or 1.1 percentage points (0.109 - 0.0979 ≈ 0.011).
Or you might want to calculate some other figures from these. If you just want the ratio (strictly speaking, the odds ratio), then P(sick) does not have a bearing on it. You could perform some sensitivity analysis, etc.
I am not a statistician either, apart from a few courses, but your argument starts with:
P(gene|healthy) = 0.010 and P(no_gene|healthy) = 0.99
P(gene|sick) = 0.011 and P(no_gene|sick) = 0.989
My point is that in many cases there is no justification for placing any number whatsoever on a prior hypothesis. You can't simply say that the prior probability of a particular gene being involved in a disease is 1%, or 0.00001% or 10%, or whatever.
Edit: I'm not saying that Bayesian statistics is without uses; it is very useful in epidemiology, for example. However, it is not appropriate for determining molecular and genetic mechanisms.
P(gene|healthy), P(gene) and P(healthy) are something we can often measure.
P(healthy|gene) is something we can calculate with the Bayes formula from the above values.
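As a rough sketch of that calculation, the snippet below plugs the two conditional frequencies quoted in this thread into Bayes' formula. The prevalence P(sick) = 0.10 is purely an assumption for illustration (it is not given anywhere above), and the variable names are made up.

```python
def bayes(likelihood, prior, evidence):
    """Bayes' formula: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Frequencies quoted in the thread.
P_GENE_GIVEN_HEALTHY = 0.010
P_GENE_GIVEN_SICK = 0.011

# ASSUMPTION: prevalence of the disease, chosen only for illustration.
P_SICK = 0.10
P_HEALTHY = 1.0 - P_SICK

# Law of total probability gives the overall gene frequency P(gene).
p_gene = P_GENE_GIVEN_SICK * P_SICK + P_GENE_GIVEN_HEALTHY * P_HEALTHY

p_healthy_given_gene = bayes(P_GENE_GIVEN_HEALTHY, P_HEALTHY, p_gene)
p_sick_given_gene = bayes(P_GENE_GIVEN_SICK, P_SICK, p_gene)

print(f"P(gene)         = {p_gene:.4f}")
print(f"P(healthy|gene) = {p_healthy_given_gene:.4f}")
print(f"P(sick|gene)    = {p_sick_given_gene:.4f}")
```

Changing the assumed P_SICK changes both posteriors, which is exactly why that input has to be measured or estimated rather than left out.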
More generic version:
We want to know P(model|data). That's what we always want: what is the probability of some model, based on the data that we measured?
But we only have P(model), P(data) and P(data|model). So we use the Bayes formula, P(model|data) = P(data|model) P(model) / P(data), to get the interesting conditional probability. It needs all three inputs, so we must estimate any that we don't have.
Just presenting P(data|model) and not constraining P(model) or P(data) in any way means that we can't say anything about P(model|data).
It's like if we start with x + a + b = 5.
If we don't know or estimate a and b, it's impossible to say anything about x. Bayes' formula is then like saying: solve it as x = 5 - a - b.
So, if you say there is no justification for placing any constraint on a or b, then you are just saying that we clearly do not have enough data to say anything about x either. There is no way out of it.
"it is not always possible to have any sense of what the prior probability is"
And in those cases, frequentist statistics is no better off, because you have no sense of what the sample space is.
However, there are cases where there is no well-defined sample space, but you can still assign reasonable priors; so Bayesian statistics covers a range of cases that is a superset of the range of cases that frequentist statistics covers. E. T. Jaynes goes into this in some detail in his book Probability Theory: The Logic of Science.
As I understand it, using Bayesian statistics, you can report the posterior probability given various reasonable priors. It seems useful to know how sensitive the conclusion is to the choice of prior. (With a strong result, it should make relatively little difference.)
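A minimal sketch of such a sensitivity check, written in odds form: the posterior is computed for several priors, once with the weak likelihood ratio implied by the frequencies quoted earlier in the thread (0.011 / 0.010) and once with a much stronger ratio. The priors and the strong ratio of 1e6 are hypothetical values picked for illustration.

```python
def posterior(prior, likelihood_ratio):
    """Posterior P(H|data) from a prior P(H) and the likelihood ratio
    P(data|H) / P(data|not H), using posterior odds = prior odds * LR."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


priors = [0.01, 0.1, 0.5]   # hypothetical "reasonable" priors
weak_lr = 0.011 / 0.010     # from the frequencies quoted in the thread
strong_lr = 1e6             # hypothetical very strong evidence

weak = [posterior(p, weak_lr) for p in priors]
strong = [posterior(p, strong_lr) for p in priors]

for p, w, s in zip(priors, weak, strong):
    print(f"prior={p:<5} weak evidence -> {w:.4f}   strong evidence -> {s:.6f}")

# Spread of the posterior across the priors: a large spread means the
# conclusion is driven mostly by the choice of prior.
print("spread (weak):  ", max(weak) - min(weak))
print("spread (strong):", max(strong) - min(strong))
```

With the weak ratio the posterior mostly echoes whichever prior was fed in, while with the strong ratio the three priors end up at nearly the same place.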