The data governance concerns are not that Microsoft has access to the data (that...

toomuchtodo · on Aug 23, 2024

Recommend the Zenity Blackhat materials for a deep dive on the risk.

https://labs.zenity.io/p/links-materials-living-off-microsof...

https://www.youtube.com/playlist?list=PLM_RIPYi59BN6BeHyJQ_9...

gxd · on Aug 23, 2024

I don't think this is the core of the concern; other aspects are much worse IMHO.

For instance, the article says "Microsoft positions its Copilot tool as a way to make users more creative and productive by capturing all the human labor latent in the data used to train its AI models and reselling it." The Copilot data could make it a lot easier to steal sensitive business procedures and intellectual property: the data would allow third parties to fully inspect the company's procedures and sensitive data at a scale that we've never seen. It would be next to impossible to manage, categorize and protect this data. It's an intellectual property nightmare.

A good version of the technology, which we don't have yet, would allow competitors to create a copy of every employee (in the business sense) and perhaps much more efficiently compete with that company.

simonw · on Aug 23, 2024

Are you talking here about the idea that Microsoft are training their models on private customer data in a way that could later expose details of it to people outside the company?

That’s not happening. Microsoft Copilot doesn’t train on data it has access to. https://learn.microsoft.com/en-us/power-platform/faqs-copilo...

blibble · on Aug 23, 2024

Microsoft of course having an excellent reputation for respecting the law and their customers whenever there's significant money to be made by doing the exact opposite

simonw · on Aug 23, 2024

Your company’s data isn’t nearly as valuable for training models as you might think.

Certainly not as valuable as the revenue you can make from companies that would instantly cancel their Copilot 365 subscriptions if they heard any hint of data being used for training without permission.

Convincing people that you don’t train on their data remains one of the hardest problems: https://simonwillison.net/2023/Dec/14/ai-trust-crisis/

blibble · on Aug 23, 2024

> Certainly not as valuable as the revenue you can make from companies that would instantly cancel their Copilot 365 subscriptions

companies already are cancelling their copilot subscriptions as it's "high cost and low value"

https://www.businessinsider.com/pharma-cio-cancelled-microso...

> Convincing people that you don’t train on their data remains one of the hardest problems:

we attempted to protect our valuable data with copyright

they disregarded these terms, trained on it anyway and claim wholesale reproduction of our work is "fair use"

why wouldn't they do the same with Teams/Sharepoint/Word/everything on Azure

because the contract with a company 10000x our size says they won't? HAHAHAHAHA

the only way to protect your data from entities that have previously disregarded terms in this way is to not let them get their dirty hands on it in the first place

simonw · on Aug 23, 2024

Did you read https://simonwillison.net/2023/Dec/14/ai-trust-crisis/ ? Because your comment here is a textbook example of what I was talking about there, right up to the bit where you say "you can't trust them because they've already shown they'll train on unlicensed scraped copyrighted data" (a very reasonable point to argue).

(Update: actually I didn't make that point in the original post, it's from the talk version of this I gave https://simonwillison.net/2024/Jun/27/ai-worlds-fair/#slide.... )

muglug · on Aug 23, 2024

This exactly. If Microsoft had created its own fine-tuned-MSFT-data LLM and seen vastly better results on internal tasks, then they’d be publishing papers about it, and also packaging that up & selling to customers.

muglug · on Aug 23, 2024

There’s significant money to be lost (lawsuits, massive reputational damage) if it were discovered they were deliberately training on data they had guaranteed that they were not training on.

blibble · on Aug 23, 2024

when has the threat of lawsuits or massive reputational damage ever previously stopped Microsoft?

bongodongobob · on Aug 23, 2024

You're confusing corporate enterprise plans with home users. Completely different ballgames.