
That thread is seriously wrong.

Here is an excerpt from an email discussion I had recently that touched on http://www.evanmiller.org/how-not-to-run-an-ab-test.html.

Evan Miller has a point, but not as good a one as he thinks.

It is true that multiple peeks mean that eventually any test will find significance at any level you want. However, in A/B tests the peeks are not independent: each peek looks at all the data from the previous peek plus a little more. This greatly weakens the effect he is talking about.

Section 7 of my presentation, starting at http://elem.com/~btilly/effective-ab-testing/#slide59, is about the question of how long it takes for a test to complete. For that I ran numerical experiments with constant peeking: literally every time you add one to A and one to B, you peek again. You can see graphs of how many errors there were and how long it took to get an answer.
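For anyone who wants to try this kind of experiment, here is a rough sketch of a constant-peeking simulation. It is not the code behind the slides. The two-proportion z-test, the 10% conversion rate, and the names z_score, run_test, false_positive_rate and min_pairs are all assumptions made for illustration. It runs A/A tests (no real difference) and counts how often peeking after every pair ever crosses the significance threshold.

```python
import math
import random


def z_score(conv_a, conv_b, n):
    """Two-proportion z statistic with n visitors in each arm (assumed test)."""
    pooled = (conv_a + conv_b) / (2 * n)
    var = pooled * (1 - pooled) * 2 / n
    if var == 0:
        return 0.0  # no conversions, or all conversions: nothing to compare
    return (conv_a / n - conv_b / n) / math.sqrt(var)


def run_test(rate_a, rate_b, max_pairs, z_crit, rng, min_pairs=1):
    """Add one visitor to A and one to B, then peek. Repeat.

    Returns the number of pairs at which the test stopped,
    or None if it never reached significance within max_pairs.
    """
    conv_a = conv_b = 0
    for n in range(1, max_pairs + 1):
        conv_a += rng.random() < rate_a
        conv_b += rng.random() < rate_b
        if n >= min_pairs and abs(z_score(conv_a, conv_b, n)) >= z_crit:
            return n
    return None


def false_positive_rate(trials, max_pairs, z_crit=1.96, rate=0.1,
                        min_pairs=1, seed=0):
    """Fraction of A/A tests (identical rates) that ever declare a winner."""
    rng = random.Random(seed)
    stopped = sum(
        run_test(rate, rate, max_pairs, z_crit, rng, min_pairs) is not None
        for _ in range(trials)
    )
    return stopped / trials


if __name__ == "__main__":
    # Compare peeking from the very first pair against refusing to stop early.
    print(false_positive_rate(trials=500, max_pairs=2000))
    print(false_positive_rate(trials=500, max_pairs=2000, min_pairs=500))
```

Varying max_pairs, z_crit and min_pairs shows how the error rate depends on test length, on how strict the threshold is, and on whether very early stops are allowed.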

Here are key points:

- Be suspicious of tests that end quickly. Run them a bit longer on general principle. (In general I'd call 500 people a very small test.)

- Nobody can predict how long a test will take. Even if you know the actual improvement, you still can't predict time to within an order of magnitude.

- If a test has been running for a long time, you know the true difference is small, so there is no harm in accepting whatever answer it gives.


