Amazon’s Mechanical Turk Has Reinvented Research (jstor.org)
178 points by DmenshunlAnlsis on May 18, 2018 | 67 comments


Mechanical Turk is great for "open", public research. We used to use it a lot for machine learning tasks (data cleanup, model comparisons, label annotation), along with similar services like CrowdFlower / Figure Eight. We saw two primary issues when applied to "non-open" (commercial) projects:

- business-related data too sensitive to share with strangers (contractual obligations, too much risk)

- some tasks required non-trivial subject matter expertise and context to annotate properly (quality control issues)

For this reason, we gradually moved to an in-house team of long-term annotators. It's not much more expensive (moms on maternity leave, students…), but infinitely more flexible and safer for our purposes. YMMV.


CrowdFlower / Figure Eight works with a lot of annotators who have signed NDAs and work in secure locations where they can't duplicate materials etc.


Alex from Scale (www.scaleapi.com) here! We've taken an extremely quality-first approach and built out large workforces for datasets with high quality requirements and complexity. For example, we do a bunch of LIDAR / 3D labeling (https://www.scaleapi.com/sensor-fusion-annotation), which is very complex and labor-intensive, and provide extremely high quality that would not otherwise be possible.


Deepak from Playment (https://playment.io/). We help companies label data. We have dedicated project managers who take care of training annotators and ensuring quality output. We have a large pool of annotators who are pre-trained on different annotation types, and we train them for specific business use cases. Reach out to us if you are looking for fully managed, quality annotated data.

Links:

https://playment.io/image-annotation/

https://playment.io/playment-vs-mechanical-turk/

https://playment.io/playment-vs-crowdflower/


I work directly with several top-tier research universities that use us mostly for public opinion polling, as we offer a representative sample at a great price point.

Lucid is the creator of the world's largest programmatic survey sample marketplace. We have API integrations with hundreds of US and international panels that allow us to target very specific audiences across dozens of sources, ensuring excellent feasibility.

Happy to discuss any needs the YC community has.

- Cullen Wheatley

Articles for more info: https://techcrunch.com/2017/04/10/lucid-60-million/

https://greenbookblog.org/2017/05/16/ceo-series-how-lucid-ce...


Seconded. Quality control is a nightmare with Turk, often requiring stimuli to be labeled multiple times and a variety of judging approaches to "crown a winner". Companies like DefinedCrowd (https://www.definedcrowd.com/) have taken a quality-first approach which gives much better data, but of course, at a cost.


Try PYBOSSA, an open-source crowdsourcing platform: you host it yourself, so you stay in control of your data.


I think the problem is more that you give the data to the turkers than that you give it to Amazon.


Moving to in-house annotators is probably the smart strategy. However, for tasks that are easily done by naive annotators, Prolific.ac might be a great source. I've had good luck getting quality data there, and they enforce a minimum hourly wage that, while not truly livable, is still heading in the right direction.


MTurk is a godsend for ML research and is a huge game-changer. For every other project where the problem is "that sounds cool but we don't have enough labeled data" the answer nowadays is "just turk it". Sentiment labeling, qualitative comparison, error identification, and tons of other traditionally data-scarce tasks are made trivially easy (at the cost of some money) with MTurk, and it's pretty much a win-win for everyone involved too!

Now the ethics as far as exploitation are definitely important, but I think the design of the site handles things quite well and makes everything fair for all parties. If you feel a task is underpaid, there are enough alternatives that you can just not do it. It's also true that there are many international turkers for whom $8/hour or less is still solid pay. Then there are also many third-party tools that allow turkers to see which HIT (task) requesters have good track records (low rejection ratio, good pay, etc.), and the site's own tools allow requesters to avoid turkers with bad track records. In my experience just browsing through tasks, heavily underpaid tasks don't tend to get done (for example, writing a 100-word summary for $0.50).


> It's also true that there are many international turkers for whom $8/hour or less is still solid pay

> and it's pretty much a win-win for everyone involved too!

It is not; it is a way to pay below minimum wage, and to have simple labor paid at the lowest-common-denominator wage in the world. Hence people from countries such as Germany, the Netherlands, or the United Kingdom would be underpaid at $8/hour. The problem is that these people also have higher expenses than those in poorer countries. My rent alone is 600+ EUR/month, and that's cheap (for NL).


You can't receive any actual money (only amazon money, which is much less useful) as a worker outside of the US as far as I know.


You can in India as well, which must be where most of the work is coming from.


Except Amazon now also has access to your dataset.


If you are hosting your own data, Amazon knows the answer but not the question.

So in, say, a photo classification task, they know "turker xxx said green", but they don't know the photo.

Seems like not too big of a deal, right? I could share all of my answer datasets with you; they are useless without the questions.


The turks access your data (the questions) through Amazon, and I don't see how this will happen without Amazon having access to that data.


External Questions

https://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/A...

This seems to be one of the more popular ways to run Turk tasks. You send people off to your own site to show them stuff. All you really provide is a URL, like www.mysite.com/questionId=123456..., and render whatever you want when that appears.
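As a sketch of how this might look in Python: the helper below builds the ExternalQuestion XML payload the MTurk API expects around a URL to your own site (the URL, reward, and title values are made up; the commented-out boto3 call needs real AWS credentials):

```python
import xml.etree.ElementTree as ET

# Schema URI MTurk requires for externally hosted questions.
SCHEMA = ("http://mechanicalturk.amazonaws.com/"
          "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd")

def build_external_question(url: str, frame_height: int = 600) -> str:
    """Wrap a URL to your own site in the XML payload create_hit expects."""
    return (
        f'<ExternalQuestion xmlns="{SCHEMA}">'
        f"<ExternalURL>{url}</ExternalURL>"
        f"<FrameHeight>{frame_height}</FrameHeight>"
        "</ExternalQuestion>"
    )

question = build_external_question("https://www.example.com/?questionId=123456")

# Hedged: posting the HIT itself would look roughly like this with boto3:
# client = boto3.client("mturk")
# client.create_hit(Title="Classify a photo", Description="...",
#                   Reward="0.10", AssignmentDurationInSeconds=600,
#                   LifetimeInSeconds=86400, MaxAssignments=3,
#                   Question=question)
```

The worker then sees your page rendered in an iframe of the given height, and your server decides what to display for each question ID.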


This doesn't have to be the case. For example, if you're paying MTurk workers to fill out a survey, you can give them a link to some third party (e.g., Qualtrics) which will generate a code upon survey completion, which the turker enters as the answer to the only question on Amazon.

If you're really concerned about a third party having access to your data, you could just host the survey yourself.
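One common way to make such a completion code verifiable, if you host the survey yourself, is to derive it from the worker ID with an HMAC so you can later check that a submitted code really came from your survey. A minimal sketch (the secret and the helper names are mine, not part of any MTurk API):

```python
import hmac
import hashlib

# Assumption: this secret is shared only between your survey server
# and whatever script later checks the submitted codes.
SECRET = b"replace-with-your-own-secret"

def completion_code(worker_id: str) -> str:
    """Code shown to the worker at the end of the survey."""
    return hmac.new(SECRET, worker_id.encode(), hashlib.sha256).hexdigest()[:10]

def verify(worker_id: str, code: str) -> bool:
    """Check a code the worker pasted back into the MTurk HIT."""
    return hmac.compare_digest(completion_code(worker_id), code)
```

A worker who never reached the end of the survey cannot guess a valid code, and you don't need to store anything per worker to verify it.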


OK, but Amazon also checks turker performance, e.g. to keep out bots and people who just submit wrong answers to make a quick buck. It doesn't seem trivial to build on that by returning a code.

(A related question: would it be possible to use Amazon Turk to train an algorithm that answers Amazon Turk questions?)


You would still use mTurk to recruit participants, so you still benefit from their filters for acceptance rate etc. You would just implement the survey (probably with some attention checks to weed out people who just randomly answer questions), and at survey completion you give them a code to submit to complete the mTurk hit.

Then you would only accept the submissions that have passed the attention checks, and pay those mTurkers.
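That filtering step can be sketched like this (the response fields and expected answers are hypothetical; `approve_assignment` is the real boto3 MTurk call, shown commented out since it needs credentials):

```python
# Each survey response records the MTurk assignment ID plus the
# answers the participant gave to the embedded attention checks.
responses = [
    {"assignment_id": "AS1", "attention": ["blue", "7"]},  # hypothetical data
    {"assignment_id": "AS2", "attention": ["red", "7"]},
    {"assignment_id": "AS3", "attention": ["blue", "3"]},
]
EXPECTED = ["blue", "7"]  # the answers an attentive participant must give

def passed(resp: dict) -> bool:
    return resp["attention"] == EXPECTED

approved = [r["assignment_id"] for r in responses if passed(r)]

# Hedged: paying only the workers who passed would look roughly like:
# for assignment_id in approved:
#     mturk.approve_assignment(AssignmentId=assignment_id)
```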


Data cleanup is also a great use. For example, we needed to parse a batch of address data; we just needed country, state (or equivalent), and city. We gave out each address three times, and whatever result was the same at least twice was accepted. We had over 92% where all three were the same, another 7% with a 2-1 split (required review), and less than 1% needed either manual cleanup before re-Turking or just manually entering some of the gnarlier cases. We considered it a truly massive success: price-efficient and unbelievably quick.
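The 2-of-3 agreement rule described above can be sketched as a tiny aggregation function (the function and status names are mine):

```python
from collections import Counter

def aggregate(labels):
    """Given the three turker answers for one address, return
    (status, winning_label): 'accepted' if all three agree,
    'review' on a 2-1 split, 'manual' if all three differ."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    if votes == 3:
        return ("accepted", label)
    if votes == 2:
        return ("review", label)
    return ("manual", None)
```

Run over the whole batch, this directly reproduces the three buckets the comment describes: unanimous, majority-needs-review, and fully disagreeing cases that go back for manual cleanup.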


You outsourced personal data to Mechanical Turk? Can I have the name of your company?


Is address data without a name/email or phone number to link it to really personal data?


They extracted it, so it means it was in a context. Address + context is very much personal.

Also I note that the comment author prefers not to answer.


Because he was asleep :)

The data set was of businesses, not individuals, so this concern didn't arise. I would guess that if there had been PII potential, we would've wiped out the numbers first; they're not needed for city/state/country anyway.


> They extracted it, so it means it was in a context. Address + context is very much personal.

That doesn't mean it was available to the MT workers. Why would it be?

> Also I note that the comment author prefers not to answer.

They haven't made any new post since that one; how do you know they even saw yours?


Even if it's just an ID, if you know the company you work for is a specialised online shop and you see your neighbor's address on there (assume rural area), you know they ordered with them. Depending on the kind of store that can already be critical.


In the case of a list of addresses, just knowing the company or person associated with the Mturk task could be enough context to reveal information about the person(s) living at that address.


I don't know. If it's just an address, you can get it from Google Maps Street View. Finding out who is at a specific address is what makes it personal.


A list of addresses usually has some context* - the combination of the address and the context could be personal.

For example, a list of customers' addresses for a sex toy shop would be personal even though it was a subset of the non-personal list of all addresses.

*unless it's just a list of every address in the country, obviously


I disagree. The turkers don't need to know that they're extracting the "country, state (or equivalent), city," from a list of adult store customers in order to extract that data. Sharing the context is superfluous to requirements in this instance, so likely is not included in the task mandate.


That information wouldn't be directly included, but some account has to post the hit. That account is either associated with some company or some person who works for some company.


Why would Amazon pass that information to turkers?

"Find city, county, state in this list and post in the three boxes "City" "County" "State" in the web form." is literally all the information required for this task. All of the rest can be handled by Amazon's staff, or more likely automated processes. The turker need not know for whom they do the work, or why. They could be processing marketing communications, or census returns, or credit card applications, or literally anything else where an address may be used.


mTurkers get to pick and choose which hits they want to work on. They see a list of available hits, and each has a title, description, submitter, estimated time, and reward. The submitter matters to mTurkers because this allows them to see the hit acceptance rate (how often the submitter pays out on submissions, or how often they reject submissions).

So it's simply a fact of the platform that mTurkers know who they are working for.


Does the submitter have to use their or their company's real name? As long as it does the job of a name (stable, recognizable) it could be any name and still accomplish the goal of letting the MT worker sift through hits.


I don't think there is any requirement to use real names.


Did not know that. Seems... Weird. Thanks.


I know of one application that at least used to use MTurk to validate OCR translation of business cards. I'd guess that if you hand out a business card, you don't have an expectation that the data on there would be private.

I played with MTurk as a worker about 6 years ago. Lots of the better paying jobs were either doing audio transcriptions or quality checking OCR jobs.


Addresses are public data.


It doesn't seem to be just addresses since they are extracted from something.


This comment makes no sense to me and it seems like you’re pushing this point with a lot of force.

Of course there is context in how the addresses are used. But the reviewers may not have had access to the context. Since addresses are public data, the question remains.. what is the harm?


I've mentioned this in a few places but it seems to be ignored:

The account that posts the addresses is associated with some company. If that company isn't just 'Generic Consulting Company', then there is some context derived from the account name.

ie: This account is owned by 'XXX Toys', and they're asking me to clean up this address information. I guess someone at this address ordered some adult toys.


> If that company isn't just 'Generic Consulting Company'

But it is. I am just a small consulting company here, working for various clients and turkers have no idea who I work for, especially because most of the time even that is under NDA. (And as I noted above, this was a B2B sort of thing, so people got themselves into a tizzy over the addresses of businesses all around the world.)


You are making a ton of assumptions with absolutely zero evidence and using that as the basis of an argument for attacking the company. Fortunately he replied stating that it is a small consulting company, which is exactly what you'd expect, thus making your entire argument moot.


I'm not making any assumptions about anything, and I'm not attacking any company. I'm addressing the point that jamra made above, which is a general comment stating that publishing a list of addresses is harmless because addresses are public data.


Is this not reCAPTCHA-able?


It's an interesting question whether or not the data is representative (or of what demographic the data is representative).

Are there any studies done of the demographic distribution of Turkers?


Several of the referenced papers from the article seek to address this question:

"Evaluating Online Labor Markets for Experimental Research: Amazon.com's Mechanical Turk"

https://www.jstor.org/stable/23260322?mag=amazons-mechanical...

"Socially Mediated Internet Surveys: Recruiting Participants for Online Experiments"

https://www.jstor.org/stable/43284764?mag=amazons-mechanical...


I'm told yes by my colleagues in political science. They seem to think highly of it. Don't have the citations myself sorry


Please fix misleading title, it should be: Reinvented _Survey_ Research.

> thirty to forty minute survey ... paid $1.10

What sort of results do you expect from this "research"? Do you really expect people to read the questions and answer truthfully?

You can prove pretty much anything, by reordering fields or manipulating the questions.


Prolific (http://prolific.ac/) is designed specifically for scientific research, but I've used it for two different use cases: 1. Product research, quickly testing appetite for an idea. 2. Micro tasks (a "survey" with at least one participant, where basically the only question was "have you done the task (on the external page)?"). The task in question was editing a public transcript, so confidentiality wasn't even a concern.

Each time I was impressed with the quality of the responses. In the optional open-ended text fields for the research survey, in particular, I was impressed with how people took time to free-write their thoughts, rather than just rushing on to the next question / to complete the survey. Point is I do think you get what you pay for...

From my experience, it seems Prolific's ethical approach (e.g., insisting on decent minimum payment rates) leads to overall higher quality of participants as well as responses (vs. race-to-the-bottom micro-task platforms like MTurk). [Disclaimer: I'm friends with the founders -- so I'm quite familiar with the amount of resources devoted to ensuring truthful/quality responses.]


It can also be used for cognitive science research, for example to collect reading times, reaction times, vocabulary learning trajectories, etc. Published work suggests that it provides data of similar quality to in-lab studies, once you do some due diligence (browser fingerprinting, etc.).


I've been turking once, made a $100 worth of Amazon credits and bought a first version of Kindle :) good old days


How many hours? Was it 30-50 hours as some reports suggest or a decent hourly wage?


To paraphrase The Simpsons: "In the future, computer programs will be built by labeled data sets. And our job will be to build and maintain those labeled data sets."


I can't help but imagine unscrupulous bot programmers going through every possible survey and answering them quickly with garbage. Or if they monitor expected times doing parallel instances while waiting long enough to look like a human. High volume low value junk ruining things has a long history with the internet.


Sadly there is disincentive to report bad workers on MTurk. You can block them through the MTurk system which rightfully puts their account at risk for termination. BUT Amazon sends them an email identifying you as the blocker, which causes them to go write terrible reviews of you on worker sites, which reduces the supply of workers for your HITs. So the ideal way to handle it is block them from your HIT server so at least they can't do YOUR HITs again and never report them.


Have anyone used Mechanical Turk for tasks that involve non-English languages like Chinese or Vietnamese?


I tried a batch for some data entry involving Chinese characters. Nobody did it. Maybe it wasn't paid enough, maybe there were no skilled people willing to do it. I'm also interested in other people's experience with these two languages.


That's not too surprising - most of Amazon's IP ranges are blocked by the Great Firewall.


My friend has used this extensively for all kinds of interesting things.

I interviewed him about his work combining mturk & AI to help Trump / Clinton supporters better understand each other.

https://huffpost.com/us/entry/us_581a4825e4b0f1c7d77c9555

With regard to pay, it seems reasonable to adopt a standard where the average hourly rate is disclosed in research papers. This alone may provide social pressure for academics to adopt payment in line with local norms.

Also, this brings up some questions for me. Is it unethical for a researcher in a locale with low wages to post on MTurk looking for work at comparable rates? Should posters offer rates comparable to their own country's minimum-wage laws? Is there another standard? Could requiring some researchers (from richer countries) to increase their rates result in researchers from other countries being priced out or having their research deprioritized by turkers?


'Mechanical Turk' is a sweatshop run by a notorious skinflint.


Do people who participate in MTurk get any money for their time?


Not outside the US, so I won't be so confident for non-English data.


Why not? Are they afraid of users from low-income countries flooding the site?


Yes, and that you get a lot of tasks whose origin you can't trace. For example (off the top of my head): solving captchas, cracking passwords, doxing, hacking offers, etc. I think there was also the spectre of organized crime enslaving people to do the tasks and taking the profits.


What, non-US workers don't get paid??

