The public is almost fully to blame, and gets the government it deserves. I only hedge a little because education is controlled by the state, so to some degree people don't choose whether to be educated on the relevant matters.
It may be familiarity breeding contempt but I find members of the British public in particular very myopic in obtaining benefits for 'their group'. There's very little interest in society as a whole.
Politicians simply bend in order not to upset any of the key voting blocs. But you understand that's a selection bias: you wouldn't exist as a successful politician if you didn't do this. All those who go another path are doomed to obscurity.
More training data at this point leads to marginal improvements, curve is flattening. So advantage is low. Especially when Anthropic definitely has the budget and talent to carry out the same study.
On the other hand, having it leak that you train on your customers' data, ignoring the opt-out, is probably existential when close alternatives exist in the market.
You probably also thought Anthropic did not use pirated PDFs. You don't know how these companies actually operate & you don't know what weasel language they use in their contracts to get away w/ exactly what I assume to be the case.
There is no AI, all these companies have is the chat logs so unless you have further evidence on what they do or don't do behind the scenes I recommend you use a more conservative approach in your assumptions about what they use or don't use for training.
No, why would they care about using pirated PDFs? Did you actually read/understand what I wrote? Violating their customers comes with risk for them. Violating the copyright of unrelated textbook authors does not. If that's even what they did.
They are currently paying book authors over a billion dollars in damages. You're out of your depth in this discussion so further engagement is not going to be fruitful for anyone involved. Good luck.
I'm always confused by the conspiratorial takes that think there's some service out there _not_ bound by the legal system where it resides. Obviously Proton obeys the law and gives up data when it has to. Where are the services that don't do that? Somalia?
I think the key difference is the amount of data the service can offer when it is asked to do so by some legal entity. Signal famously claims to barely have any useful data to turn over when ordered to do so [1]. If some provider states they are privacy-focused and protect your data from governments, but can still offer loads of your private data when ordered to, that damages their privacy claim.
EDIT: "some provider like Proton" -> "some provider", never wanted to imply Proton specifically did or does this.
> If some provider like Proton states they are pricacy-focused and protect your data from governments, but can still offer loads of your private data when ordered to, that damages their privacy claim.
"Loads" of private data? When has this allegedly happened or how would it technically even be possible?
Well, Proton themselves say they will provide information about who has contacted a ransomware attacker to law enforcement. https://proton.me/legal/law-enforcement
So that probably has happened. Whether they've ever provided other private data I don't know, but
> how would it technically even be possible
Well, it's not possible if you trust their claims about E2EE, but that is just a claim. How's that any different from a non-encrypted email provider saying they won't provide your emails to others? It all comes down to trust in the end.
They don't claim email is E2EE. Of course they need to know email metadata to route messages. That's unavoidable if you are using email. It's not encapsulated like that.
Edit: A reply to your misunderstanding and accusation below:
What do you mean? By "provide your emails to others" I obviously mean the email *contents*, not the email *address*. (Which I also clarified with "the storage of your emails on their servers"). You know, the very thing that is almost the whole selling point of Proton: that they keep the contents of your emails encrypted so "only you" can access them.
> Proton Mail protects the contents of all your messages with zero-access encryption, meaning no one can read them except you and your recipients. Messages you send to other Proton Mail accounts are always end-to-end encrypted, as are emails sent to non-Proton Mail accounts when you use Password-protected Emails.
Also, what in the SMTP protocol requires Proton to *store* that metadata? Could they not simply delete it after using it (or, crazy idea, encrypt it in the same way the message contents are encrypted in storage), so they would be unable to respond to law enforcement requests the next week, say? They did also previously claim that they didn't log users' IP addresses. Why would they claim something like that, if it's "obvious to anyone who knows" that it's a false claim? Marketing aimed towards their not so technically savvy userbase?
Let me also remind you that I was replying to a question about "how would it technically even be possible" to "offer loads of your private data when ordered". My reply was, it's easily possible for them to offer your metadata, and you still need to trust their claims about their implementation of E2EE to believe they won't offer your message contents.
You're very quick to accuse people of spreading misinformation. Let me hit back with an accusation of my own, which is that Proton's PR team have a habit of regularly trying to discredit any critique as "misinformation". Perhaps you've just read too many of their rebuttals?
> Account Activity: Due to limitations of the SMTP protocol, we have access to the following email metadata: sender and recipient email addresses, the IP address incoming messages originated from, attachment name, message subject, and message sent and received times.
This would be obvious to anyone who knows how email works. It would be very silly for them to claim otherwise.
Keyword is "like": a service like Proton. No idea if and what data they have offered to their government. I was merely trying to offer an explanation to the parent commenter, who was wondering how people can critique privacy-focused services offering data when required by law.
Fair enough, I agree. In Proton's case, I'm biased because I used to work there ~2019-2022 and the company was basically printing money from subscriptions alone (covid likely helped with that), while fighting (pretty successfully) every request to avoid providing even that limited metadata, because the alternative of ruining your core strength - privacy - meant the death of the business. I don't know if anything changed, but I'd bet the goals remain largely the same - providing good-enough privacy any commercial company can realistically give you. Unencrypted user data in this business is poison, and they're well aware fwiw.
But don't they have both the encrypted data and the decryption keys? I don't remember giving them my keys to use, and I can look at my stuff from multiple devices so the keys aren't stored on my device.
So they must have the ability to look at all that encrypted data anyway?
You seem to be hiding behind this "like" while writing into comments about Proton - making accusations and theories that imply it's Proton that actually does that.
I mean, is it really a conspiracy theory to want or believe that there are services (based in Europe) that don't hand over any and all user data to the USA government when asked? It's probably wrong to believe it to be the case, but just because it's wrong doesn't make it "conspiratorial".
It's quite hypocritical of Proton to claim that they protect against government surveillance when they do things like this though [0]. Their legal team has probably ensured they don't claim anything strictly false, but the implication and the reality are wildly different.
Proton's marketing definitely makes it sound like they are fully anonymous and wouldn't even have anything to hand over to law enforcement. Look at the wording they use to describe the product.
> Proton has always-on end-to-end encryption and zero‑access encryption, meaning even we do not have access to your data.
> [...]
> Based in Europe, Proton ensures your data is protected by some of the world’s strongest privacy laws. Because Proton isn’t a US‑based company, we can’t be compelled by laws such as the US CLOUD Act to hand over your data to the US government or terminate your services. [1]
Obviously as we have seen, they 100% can and will hand over your data to the US government. Yes, it's in the privacy policy/ToS & they're complying with local laws. But that's clearly not how that reads.
[In 2021, the Switzerland-based vendor provided local police with the IP address and device details of a netizen the cops were trying to identify. That individual – a French climate activist who was already known to police – was later arrested.
Shortly after that kerfuffle, Proton removed the claim that it didn't track user IP addresses from its website. Proton has also previously been accused of offering real-time surveillance of users to authorities.] [2]
The last line of GP's comment is key here: "Who do I sue if Palantir decides I am an illegal?"
This shouldn't make as much of a difference as it does, but due to how our legal system works, it's much harder to get meaningful legal satisfaction when an algorithm (or other inhuman distributed system) commits a crime against a person than when a person does so.
I think you're confused about the mechanism involved. It's hard to get satisfaction due to e.g. qualified immunity. The fact they use technology is largely irrelevant. You couldn't sue the NSA for spying on you before AI either.
If we assume they are on quotas then what difference does technology make? They had quotas before the technology, qualified immunity too. 100 false arrests with no recourse are 100 false arrests with no recourse.
If anything I would expect technology favors the victim of false arrest because it gives the cop a face-saving get out. Previously, a cop who false-arrested you would have been incentivized to take it all the way, because you getting justice for it is intrinsically tied to impugning their word and/or reputation.
Options have some minor value in signalling that you're a true believer. You should in fact care only about base salary, but not telling the people doing the hiring that can be quite useful. Doing a fake come-down on base in exchange for options shows you are invested and surely worth hiring.
Why don't we just pagerank github contributors? Merged PRs approved by other quality contributors improves rank. New PRs tagged by a bot with the rank of the submitter. Add more scoring features (account age? employer?) as desired.
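For what it's worth, a rough sketch of what that could look like, assuming you've already scraped (approver, author) pairs for merged PRs. The function name `contributor_rank`, the input shape and the sample names are all made up for illustration; this is plain PageRank by power iteration, not anything GitHub actually exposes, and it ignores the extra scoring features (account age, employer).

```python
def contributor_rank(approvals, damping=0.85, iterations=50):
    """Rank contributors from (approver, author) pairs, one per approved merged PR.

    An approval is treated as a link from the approver to the PR author,
    so being approved by highly ranked people raises your own rank.
    """
    nodes = sorted({name for pair in approvals for name in pair})
    outgoing = {name: [] for name in nodes}
    for approver, author in approvals:
        outgoing[approver].append(author)

    rank = {name: 1.0 / len(nodes) for name in nodes}
    for _ in range(iterations):
        # Rank held by people who approved nothing is spread evenly over everyone.
        dangling = sum(rank[n] for n in nodes if not outgoing[n])
        new_rank = {
            n: (1 - damping) / len(nodes) + damping * dangling / len(nodes)
            for n in nodes
        }
        for n in nodes:
            for target in outgoing[n]:
                new_rank[target] += damping * rank[n] / len(outgoing[n])
        rank = new_rank
    return rank


# Hypothetical data: alice and bob each approved a PR by carol, carol approved one by alice.
sample = [("alice", "carol"), ("bob", "carol"), ("carol", "alice")]
for name, score in sorted(contributor_rank(sample).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

The obvious weakness is the same as with web PageRank: a ring of accounts approving each other's PRs inflates everyone in the ring, so you'd need some trusted seed set on top.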
Plus-addressing is built in to most email services. There's no 'fancy' set up to break; it just works. That is, there's no way me@gmail.com works but me+someservice@gmail.com doesn't, unless you explicitly configure it not to work. Similarly for custom domains on most services.
If you use a catch-all on a domain, i.e. someservice@somedomain.com, I guess in theory that might break. But it seems about as likely as messing up the overall domain setup.
Also, my account on your service is likely much more disposable to me than my email address/domain. Anything I care about, I'd back up. Not just assume some random website is going to preserve it for me forever.
If you're only passing the address in private to some service, you can just use [some-string-unique-to-that-service]@yourdomain.com. Or, more classically, plus addressing to do the same. Then you just block that recipient.
That solution doesn't apply to the use case in the article.
Surely spammers just turn `me+leaked/sold@mail.com` into `me@mail.com` as well as `me+apple@mail.com`, `me+softbank@mail.com` etc. The cost of stripping any `+postfix` must be about zero even at volume.
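To put a shape on "about zero": a minimal sketch of the stripping step, assuming the common convention that everything from the first `+` in the local part is a tag. `strip_plus_tag` is a name invented here, and real providers may handle subaddressing differently.

```python
def strip_plus_tag(address: str) -> str:
    """Drop a '+tag' suffix from the local part of an email address."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # no '@' at all, so leave the string untouched
    base = local.split("+", 1)[0]
    return f"{base}@{domain}"


leaked = ["me+leaked@mail.com", "me+apple@mail.com", "me+softbank@mail.com"]
print({strip_plus_tag(a) for a in leaked})
```

One string split per address, so the cost really is negligible at any list size.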
Some people block all mail to non-plus-addressed emails on that inbox, so a plus address is required to be received at all. You could say then spammers will just add a random one, but they wouldn't be getting bounces and would have to guess as much. Still, even stripping the +'ed part is beyond what most of them even bother to do. That dropoff plus normal spam filters works well enough.
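A sketch of that kind of inbox rule, for the curious. In practice this would be a filter in your mail provider or server config rather than Python; `accept_recipient` and the optional list of issued tags are assumptions added here for illustration.

```python
def accept_recipient(recipient: str, issued_tags=None) -> bool:
    """Accept mail only if the recipient address carries a plus tag.

    If issued_tags is given, the tag must also be one that was handed out.
    """
    local, sep, _domain = recipient.rpartition("@")
    if not sep or "+" not in local:
        return False  # bare address: rejected outright
    tag = local.split("+", 1)[1]
    if not tag:
        return False  # 'me+@mail.com' has an empty tag
    return issued_tags is None or tag in issued_tags


print(accept_recipient("me@mail.com"))                         # bare address
print(accept_recipient("me+apple@mail.com"))                   # any tag accepted
print(accept_recipient("me+random@mail.com", {"apple"}))       # tag never issued
```

With the stricter variant, a spammer who strips the tag gets dropped, and one who invents a tag has to guess a value you actually gave out.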
Of course, the technical term for that setup is 'catch all', you can set this up with your email provider. You can send your email to "ghywertelling@gregegan.net", for example.
A friend gave out an email gmail@hisname.com (he owns the domain). He says it's incredible how many people "corrected" him, and how persistent some of them were. :-)
The _any_ part is not clear to me. Obfuscation is an arms race. Reverse engineers have always been tool-assisted. Now they just have new tools and the obfuscators need to catch up.