Your company’s data isn’t nearly as valuable for training models as you might th...

blibble · on Aug 23, 2024

> Certainly not as valuable as the revenue you can make from companies that would instantly cancel their Copilot 365 subscriptions

companies already are cancelling their copilot subscriptions as it's "high cost and low value"

https://www.businessinsider.com/pharma-cio-cancelled-microso...

> Convincing people that you don’t train on their data remains one of the hardest problems:

we attempted to protect our valuable data with copyright

they disregarded these terms, trained on it anyway and claim wholesale reproduction of our work is "fair use"

why wouldn't they do the same with Teams/Sharepoint/Word/everything on Azure

because the contract with a company 10000x our size says they won't? HAHAHAHAHA

the only way to protect your data from entities that have previously disregarded terms in this way is to not let them get their dirty hands on it in the first place

simonw · on Aug 23, 2024

Did you read https://simonwillison.net/2023/Dec/14/ai-trust-crisis/ ? Because your comment here is a textbook example of what I was talking about there, right up to the bit where you say "you can't trust them because they've already shown they'll train on unlicensed scraped copyrighted data" (a very reasonable point to argue).

(Update: actually I didn't make that point in the original post, it's from the talk version of this I gave https://simonwillison.net/2024/Jun/27/ai-worlds-fair/#slide.... )

muglug · on Aug 23, 2024

This exactly. If Microsoft had created its own LLM fine-tuned on MSFT data and seen vastly better results on internal tasks, then they’d be publishing papers about it, and also packaging that up & selling it to customers.