In many of these cases, the "A/B test" may have been accidental.
Software rollouts are frequently done slowly, datacenter by datacenter, and during that time some people might see one version while others see another.
From the user's perspective it looks the same as an A/B test, but the difference is that nobody was looking at the results...
On my team, almost any user-noticeable change (that is not a bugfix) is run as an A/B test for a few weeks to verify that it has the intended impact. I'd be surprised if there aren't teams at Google, Amazon, Netflix and similar organizations that work similarly.
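To make "verify that it has the intended impact" concrete, here is a minimal sketch of one common way to compare two variants: a two-sided two-proportion z-test on conversion counts. This is an assumption about how such a check could look, not a description of what any of the teams mentioned actually run; the function name and the numbers in the usage lines are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variant A (control) and B (treatment).

    Returns (absolute lift of B over A, z statistic, two-sided p-value).
    Hypothetical helper for illustration only.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Made-up counts: 500 of 10,000 users convert on A, 560 of 10,000 on B.
lift, z, p_value = two_proportion_z_test(500, 10_000, 560, 10_000)
print(f"lift={lift:.4f} z={z:.2f} p={p_value:.3f}")
```

Note that a fixed-sample test like this assumes the sample size was chosen up front; checking it repeatedly while the test runs and stopping at the first significant result inflates the false-positive rate.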
I don’t think that in every case it necessarily comes down to user rejections. I can’t believe that in the Netflix case users actually prefer to have to log in with two postbacks, one for the login and one for the password.
It wouldn't surprise me if people with active Netflix accounts tried to log in on that page, which is why they changed how it works. Also, as with any A/B testing, it's not always clear what user behaviour they were seeking to change or reinforce, and it's usually more than one.
For example, the "buy now" vs. "add to cart only" test on Amazon might have been looking at more than just how many products are sold. They might also have been trying to see whether they could, say, reduce impulse buys that result in returns without lowering purchases that do not. In fact, the reason they've kept "buy now" might be that it actually reduced the return rate, as people interacted less with the site and didn't buy additional items that they returned later.
That's actually not a login but a signup form. More people will probably enter anything at all with only one field showing. And since they've already interacted with the site, they might feel inclined to complete the process even after they find out it's multi-step. Some will of course bounce because of it, but it should be net positive overall.
Where is the data on results? I looked at an Airbnb “experiment” where they moved an action button above the fold (duh). But there were no details on how much more effective the move was.
I am all for A/B testing, but the devil is in the details. You can get more users tapping the purchase button by moving it to where users are more prone to tap it accidentally. That doesn’t mean you get more purchases, or that the move was a positive change.
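The taps-versus-purchases point can be sketched by tracking both metrics per variant and comparing them side by side. Everything here is hypothetical: the `VariantStats` fields, the `compare` helper, and the numbers are made up for illustration and do not come from any real experiment.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Hypothetical per-variant counters for a purchase-button test."""
    visitors: int
    button_taps: int
    completed_purchases: int

    @property
    def tap_rate(self) -> float:
        return self.button_taps / self.visitors

    @property
    def purchase_rate(self) -> float:
        return self.completed_purchases / self.visitors


def compare(control: VariantStats, treatment: VariantStats) -> dict:
    """Absolute lift of treatment over control on both metrics."""
    return {
        "tap_rate_lift": treatment.tap_rate - control.tap_rate,
        "purchase_rate_lift": treatment.purchase_rate - control.purchase_rate,
    }


# Made-up numbers: the moved button gets many more taps,
# but completed purchases do not follow.
control = VariantStats(visitors=10_000, button_taps=800, completed_purchases=400)
treatment = VariantStats(visitors=10_000, button_taps=1_200, completed_purchases=390)
print(compare(control, treatment))
```

In this invented scenario the tap rate rises while the purchase rate slips slightly, which is the case where judging the test on taps alone would pick the wrong variant.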
I don't think they have results. It looks like they are regularly scraping sites and looking for diffs across users. So you can say what the test was, how long it ran for roughly, and whether they kept or rejected it, but you have no way of knowing what the quantitative results are (aside from whatever inferences you can make from estimating the tested _n_ and then assuming they are using an efficient testing procedure with optional stopping / bandits plus the final choice to infer upper/lower bounds on the effect size).