Mechanical Turk is great for "open", public research. We used to use it a lot for machine learning tasks (data cleanup, model comparisons, label annotations), along with similar services like CrowdFlower / Figure Eight. We saw two primary issues when applying it to "non-open" (commercial) projects:
- business-related data too sensitive to share with strangers (contractual obligations, too much risk)
- some tasks required non-trivial subject matter expertise and context to annotate properly (quality control issues)
For this reason, we gradually moved to an in-house team of long-term annotators. It's not much more expensive (moms on maternity leave, students…), but infinitely more flexible and safer for our purposes. YMMV.
Alex from Scale (www.scaleapi.com) here! We've taken an extremely quality-first approach and built out large workforces for datasets with high quality requirements and complexity. For example, we do a bunch of LIDAR / 3D labeling (https://www.scaleapi.com/sensor-fusion-annotation), which is very complex and labor intensive, and provide extremely high quality that would not be possible otherwise.
Deepak from Playment (https://playment.io/). We help companies label data. We have dedicated project managers who take care of training annotators and ensuring quality output. We have a large pool of annotators who are pre-trained on different annotation types, and we train them further for specific business use-cases. Reach out to us if you are looking for fully managed, quality annotated data.
I work directly with several top-tier research universities that use us mostly for public opinion polling, as we offer a representative sample at a great price point.
Lucid is the creator of the world's largest programmatic survey sample marketplace. We have API integrations with hundreds of US and international panels that allow us to target very specific audiences across dozens of sources, ensuring excellent feasibility.
Seconded. Quality control is a nightmare with Turk, often requiring stimuli to be labeled multiple times and a variety of judging approaches to "crown a winner". Companies like DefinedCrowd (https://www.definedcrowd.com/) have taken a quality-first approach which gives much better data, but of course at a cost.
Moving to in-house annotators is probably the smart strategy. However, for tasks that are easily done by naive annotators, Prolific.ac might be a great source. I've had good luck getting quality data there, and they enforce a minimum hourly wage that, while not truly livable, is still heading in the right direction.
MTurk is a godsend for ML research and is a huge game-changer. For every other project where the problem is "that sounds cool but we don't have enough labeled data" the answer nowadays is "just turk it". Sentiment labeling, qualitative comparison, error identification, and tons of other traditionally data-scarce tasks are made trivially easy (at the cost of some money) with MTurk, and it's pretty much a win-win for everyone involved too!
Now, the ethics around exploitation are definitely important, but I think the design of the site handles things quite well and makes everything fair for all parties. If you feel a task is underpaid, there are enough alternatives that you can just not do it. It's also true that there are many international turkers for whom $8/hour or less is still solid pay. There are also many third-party tools that let turkers see which HIT (task) requesters have good track records (low rejection ratio, good pay, etc.), and the site's own tools let requesters avoid turkers with bad track records. In my experience just browsing through tasks, heavily underpaid tasks don't tend to get done (for example, writing a 100-word summary for $0.50).
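For what it's worth, that requester-side filtering is just a qualification requirement on the HIT. A minimal boto3 sketch (the 95% threshold and all HIT details here are placeholder assumptions of mine, not anything MTurk prescribes):

```python
# Sketch: create a HIT visible only to workers with a >= 95% approval rate.
# Requires boto3 and AWS credentials; every HIT detail below is a placeholder.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

hit = mturk.create_hit(
    Title="Label the sentiment of 10 short reviews",
    Description="Read each review and pick positive / negative / neutral.",
    Keywords="sentiment, labeling",
    Reward="0.50",                          # dollars, as a string
    MaxAssignments=3,
    AssignmentDurationInSeconds=600,
    LifetimeInSeconds=86400,
    Question=open("question.xml").read(),   # your ExternalQuestion/HTMLQuestion XML
    QualificationRequirements=[{
        # System qualification: percentage of assignments approved
        "QualificationTypeId": "000000000000000000L0",
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [95],
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    }],
)
print(hit["HIT"]["HITId"])
```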
> It's also true that there are many international turkers for whom $8/hour or less is still solid pay
> and it's pretty much a win-win for everyone involved too!
It is not; it is a way to pay below minimum wage, and to have simple labor paid at the lowest-common-denominator wage in the world. Hence, people from countries such as Germany, the Netherlands, or the United Kingdom would be underpaid at $8/hour. The problem is that these people also have higher expenses than people in poorer countries. My rent alone is 600+ EUR/month, and that's cheap (for NL).
This seems to be one of the more popular ways to run Turk tasks. You send people off to your own site to show them stuff. All you really provide is a URL, like www.mysite.com/questionId=123456..., and you render whatever you want when that page is requested.
This doesn't have to be the case. For example, if you're paying mturk workers to fill out a survey, you can give them a link to some third party (e.g., Qualtrics) which will generate a code upon survey completion, which the mturker enters as the answer to the only question on Amazon.
If you're really concerned about a third party having access to your data, you could just host the survey yourself.
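If you do host it yourself, the moving parts are small. A minimal sketch, assuming Flask and a made-up HMAC completion-code scheme (the route name and secret are my own inventions):

```python
# Sketch: self-hosted survey that hands each worker a verifiable one-time
# completion code. The signing scheme and route are illustrative assumptions.
import hashlib
import hmac

from flask import Flask, request

app = Flask(__name__)
SECRET = b"rotate-me"  # server-side secret; placeholder

def completion_code(worker_id: str) -> str:
    # Deterministic and verifiable later: HMAC of the worker ID.
    return hmac.new(SECRET, worker_id.encode(), hashlib.sha256).hexdigest()[:10]

@app.route("/survey")
def survey():
    # MTurk external HITs append workerId/assignmentId/hitId as query params.
    worker_id = request.args.get("workerId", "")
    return f"<p>... survey goes here ... Your code: {completion_code(worker_id)}</p>"
```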
Ok, but Amazon also checks turker performance, e.g. to keep out bots and people who just submit wrong answers to make a quick buck. It doesn't seem trivial to keep those benefits when all that comes back to Amazon is a code.
(A related question: would it be possible to use Amazon Turk to train an algorithm that answers Amazon Turk questions?)
You would still use mTurk to recruit participants, so you still benefit from their filters for acceptance rate etc. You would just implement the survey (probably with some attention checks to weed out people who just randomly answer questions), and at survey completion you give them a code to submit to complete the mTurk hit.
Then you would only accept the submissions that have passed the attention checks, and pay those mTurkers.
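Concretely, that pay-out step might look like the sketch below. It assumes a verifiable code scheme like the HMAC idea sketched earlier in the thread, that you track attention-check failures yourself, and that the answer comes back in MTurk's standard QuestionFormAnswers XML; the HIT ID is a placeholder:

```python
# Sketch: approve only submissions whose completion code verifies and whose
# worker passed the attention checks; reject the rest.
import hashlib
import hmac
import xml.etree.ElementTree as ET

import boto3

SECRET = b"rotate-me"  # must match the survey server's secret
NS = {"m": ("http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/"
            "2005-10-01/QuestionFormAnswers.xsd")}

def completion_code(worker_id: str) -> str:
    return hmac.new(SECRET, worker_id.encode(), hashlib.sha256).hexdigest()[:10]

mturk = boto3.client("mturk", region_name="us-east-1")
failed_attention = set()  # worker IDs caught by your own attention checks

resp = mturk.list_assignments_for_hit(
    HITId="YOUR_HIT_ID", AssignmentStatuses=["Submitted"])
for a in resp["Assignments"]:
    answers = ET.fromstring(a["Answer"])
    code = answers.find(".//m:FreeText", NS).text.strip()
    if (code == completion_code(a["WorkerId"])
            and a["WorkerId"] not in failed_attention):
        mturk.approve_assignment(AssignmentId=a["AssignmentId"])
    else:
        mturk.reject_assignment(
            AssignmentId=a["AssignmentId"],
            RequesterFeedback="Completion code did not verify.")
```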
Data cleanup is also a great use. For example, we needed to parse a batch of address data; we just needed country, state (or equivalent), and city. We gave out each address three times, and whatever results matched at least twice were accepted. In over 92% of cases all three were the same, another 7% went 2-1 (required review), and less than 1% needed either manual cleanup before re-Turking or just manually entering some of the gnarlier cases. We considered it a truly massive success: price-efficient and unbelievably quick.
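For anyone curious, the agreement rule is only a few lines. A sketch of the 2-out-of-3 scheme described above (the data and tuple layout are made up):

```python
# Sketch of the 2-out-of-3 agreement rule: accept unanimous or 2-1 majorities,
# flag the rest for manual handling. Field layout is illustrative.
from collections import Counter

def aggregate(labels):
    """labels: the three (country, state, city) tuples for one address."""
    (top, votes), = Counter(labels).most_common(1)
    if votes == 3:
        return top, "unanimous"            # ~92% of our batch
    if votes == 2:
        return top, "needs review"         # the 2-1 cases, ~7%
    return None, "re-Turk or hand-enter"   # all three disagree, <1%

print(aggregate([("US", "WA", "Seattle"),
                 ("US", "WA", "Seattle"),
                 ("US", "WA", "Tacoma")]))
# (('US', 'WA', 'Seattle'), 'needs review')
```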
The data set was of businesses, not individuals, so this concern didn't apply. I'd guess that if there had been PII potential, we would have stripped out the street numbers first; they're not needed for city/state/country anyway.
Even if it's just an ID, if you know the company you work for is a specialised online shop and you see your neighbor's address on there (assume a rural area), you know they ordered with them. Depending on the kind of store, that can already be critical.
In the case of a list of addresses, just knowing the company or person associated with the Mturk task could be enough context to reveal information about the person(s) living at that address.
I don't know. If it's just an address, you can get it from Google Maps Street View. Finding out who is at a specific address is what makes it personal.
A list of addresses usually has some context* - the combination of the address and the context could be personal.
For example, a list of customers' addresses for a sex toy shop would be personal even though it was a subset of the non-personal list of all addresses.
*unless it's just a list of every address in the country, obviously
I disagree. The turkers don't need to know that they're extracting the "country, state (or equivalent), city," from a list of adult store customers in order to extract that data. Sharing the context is superfluous to requirements in this instance, so likely is not included in the task mandate.
That information wouldn't be directly included, but some account has to post the hit. That account is either associated with some company or some person who works for some company.
Why would Amazon pass that information to turkers?
"Find city, county, state in this list and post in the three boxes "City" "County" "State" in the web form." is literally all the information required for this task. All of the rest can be handled by Amazon's staff, or more likely automated processes. The turker need not know for whom they do the work, or why. They could be processing marketing communications, or census returns, or credit card applications, or literally anything else where an address may be used.
mTurkers get to pick and choose which hits they want to work on. They see a list of available hits, and each has a title, description, submitter, estimated time, and reward. The submitter matters to mTurkers because this allows them to see the hit acceptance rate (how often the submitter pays out on submissions, or how often they reject submissions).
So it's simply a fact of the platform that mTurkers know who they are working for.
Does the submitter have to use their or their company's real name? As long as it does the job of a name (stable, recognizable) it could be any name and still accomplish the goal of letting the MT worker sift through hits.
I know of one application that at least used to use MTurk to validate OCR transcription of business cards. I'd guess that if you hand out a business card, you don't have an expectation that the data on it will be private.
I played with MTurk as a worker about 6 years ago. Lots of the better paying jobs were either doing audio transcriptions or quality checking OCR jobs.
This comment makes no sense to me and it seems like you’re pushing this point with a lot of force.
Of course there is context in how the addresses are used. But the reviewers may not have had access to that context. Since addresses are public data, the question remains: what is the harm?
I've mentioned this in a few places but it seems to be ignored:
The account that posts the addresses is associated with some company. If that company isn't just 'Generic Consulting Company', then there is some context derived from the account name.
e.g.: This account is owned by 'XXX Toys', and they're asking me to clean up this address information. I guess someone at this address ordered some adult toys.
> If that company isn't just 'Generic Consulting Company'
But it is. I am just a small consulting company here, working for various clients and turkers have no idea who I work for, especially because most of the time even that is under NDA. (And as I noted above, this was a B2B sort of thing, so people got themselves into a tizzy over the addresses of businesses all around the world.)
You are making a ton of assumptions with absolutely zero evidence and using that as the basis of an argument for attacking the company. Fortunately he replied stating that it is a small consulting company, which is exactly what you'd expect, thus making your entire argument moot.
I'm not making any assumptions about anything, and I'm not attacking any company. I'm addressing the point that jamra made above, which is a general comment stating that publishing a list of addresses is harmless because addresses are public data.
Prolific (http://prolific.ac/) is designed specifically for science research, but I've used it for two different use cases:
1. Product research, quickly testing appetite for an idea.
2. Micro tasks (a "survey" with at least one participant, where basically the only question was "have you done the task (on the external page)?"). The task in question was editing a public transcript, so confidentiality wasn't even a concern.
Each time I was impressed with the quality of the responses. In the optional open-ended text fields for the research survey, in particular, I was impressed with how people took time to free-write their thoughts, rather than just rushing on to the next question / to complete the survey. Point is I do think you get what you pay for...
From my experience, it seems Prolific's ethical approach (e.g., insisting on decent minimum payment rates) leads to overall higher quality of participants as well as responses (vs. race-to-the-bottom micro-task platforms like MTurk).
[Disclaimer: I'm friends with the founders -- so I'm quite familiar with the amount of resources devoted to ensuring truthful/quality responses.]
It can also be used for cognitive science research, for example to collect reading times, reaction times, vocabulary learning trajectories, etc. Published work suggests that it provides data of similar quality to in-lab studies, once you do some due diligence (browser fingerprinting, etc.).
To paraphrase The Simpsons: "In the future, computer programs will be built by labeled data sets. And our job will be to build and maintain those labeled data sets."
I can't help but imagine unscrupulous bot programmers going through every possible survey and answering quickly with garbage. Or, if expected completion times are monitored, running parallel instances while waiting long enough to look human. High-volume, low-value junk ruining things has a long history on the internet.
Sadly, there is a disincentive to report bad workers on MTurk. You can block them through the MTurk system, which rightfully puts their account at risk of termination. BUT Amazon sends them an email identifying you as the blocker, which causes them to go write terrible reviews of you on worker sites, which reduces the supply of workers for your HITs. So the ideal way to handle it is to block them from your own HIT server so at least they can't do YOUR HITs again, and never report them.
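If you serve external HITs yourself, that quiet server-side block is only a few lines. A sketch (the Flask route and in-memory blocklist are my assumptions; in practice you'd persist the list):

```python
# Sketch: silently refuse your own HITs to problem workers instead of using
# Amazon's block (which, per the above, identifies you to the worker).
from flask import Flask, request

app = Flask(__name__)
BLOCKED = {"A1EXAMPLEWORKERID"}  # worker IDs you no longer want; placeholder

@app.route("/hit")
def hit():
    worker_id = request.args.get("workerId", "")
    if worker_id in BLOCKED:
        # Show a bland "no tasks" page rather than the task itself.
        return "No tasks are available right now. Please check back later."
    return "... task HTML ..."
```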
I tried a batch for some data typing involving Chinese characters. Nobody did it. Maybe it wasn't paid well enough; maybe there were no skilled people willing to do it. I'm also interested in other people's experiences with tasks involving two languages.
With regard to pay, it seems reasonable to adopt a standard where the average per-hour rate is disclosed in the research papers. This alone may provide social pressure for academics to adopt payment in line with local norms.
Also, this brings up some questions for me. Is it unethical for a researcher in a locale with low wages to post on mturk looking for work at comparable rates? Should posters be offering rates comparable to their own country's minimum wage laws? Is there another standard? Could requiring some researchers (from richer countries) to increase their rates result in researchers from other countries being priced out or having their research deprioritized by turkers?
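On the disclosure idea above: the figure is easy to compute from the accept/submit timestamps the API already returns. A sketch (the reward amount and HIT ID are placeholders):

```python
# Sketch: estimate the effective hourly rate of a HIT from the AcceptTime /
# SubmitTime stamps on its assignments. Reward and HIT ID are placeholders.
import statistics

import boto3

REWARD = 0.50  # dollars per assignment

mturk = boto3.client("mturk", region_name="us-east-1")
resp = mturk.list_assignments_for_hit(HITId="YOUR_HIT_ID")

minutes = [(a["SubmitTime"] - a["AcceptTime"]).total_seconds() / 60
           for a in resp["Assignments"]]
median_min = statistics.median(minutes)
print(f"median time: {median_min:.1f} min, "
      f"effective rate: ${REWARD * 60 / median_min:.2f}/hour")
```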
Yes, and you get a lot of tasks whose purpose or origin you can't trace. For example (off the top of my head): solving captchas, cracking passwords, doxing, hacking offers, etc. I think there was also the spectre of organized crime enslaving people to do the tasks and taking the profits.