
The data governance concerns are not that Microsoft has access to the data (that is no doubt covered by contract clauses).

The concern is that most corporate implementations of network roles and permissions are out of date or inaccurate, so Copilot will show an employee data that they should not be allowed to see. Salary info is an example.

Basically, Copilot is following “the rules” (technical settings), but corporate IT teams have not kept the technical rules in line with the business rules. So they need to pause Copilot until they get their own rules straight.

Edit to add: if your employer has Copilot turned on, maybe try asking for sensitive stuff and see what you get. ;-)




I don't think this is the core of the concern; other aspects are much worse IMHO.

For instance, the article says "Microsoft positions its Copilot tool as a way to make users more creative and productive by capturing all the human labor latent in the data used to train its AI models and reselling it." The Copilot data could make it a lot easier to steal sensitive business procedures and intellectual property: it would allow any third party to fully inspect the company's procedures and sensitive data at a scale that we've never seen. It would be next to impossible to manage, categorize and protect this data. It's an intellectual property nightmare.

A good version of the technology, which we don't have yet, would allow competitors to create a copy of every employee (in the business sense) and perhaps compete with that company much more efficiently.


Are you talking here about the idea that Microsoft are training their models on private customer data in a way that could later expose details of it to people outside the company?

That’s not happening. Microsoft Copilot doesn’t train on data it has access to. https://learn.microsoft.com/en-us/power-platform/faqs-copilo...


Microsoft of course having an excellent reputation for respecting the law and their customers whenever there's significant money to be made by doing the exact opposite


Your company’s data isn’t nearly as valuable for training models as you might think.

Certainly not as valuable as the revenue you can make from companies that would instantly cancel their Copilot 365 subscriptions if they heard any hint of data being used for training without permission.

Convincing people that you don’t train on their data remains one of the hardest problems: https://simonwillison.net/2023/Dec/14/ai-trust-crisis/


> Certainly not as valuable as the revenue you can make from companies that would instantly cancel their Copilot 365 subscriptions

companies already are cancelling their copilot subscriptions as it's "high cost and low value"

https://www.businessinsider.com/pharma-cio-cancelled-microso...

> Convincing people that you don’t train on their data remains one of the hardest problems:

we attempted to protect our valuable data with copyright

they disregarded these terms, trained on it anyway and claim wholesale reproduction of our work is "fair use"

why wouldn't they do the same with Teams/Sharepoint/Word/everything on Azure

because the contract with a company 10000x our size says they won't? HAHAHAHAHA

the only way to protect your data from entities that have previously disregarded terms in this way is to not let them get their dirty hands on it in the first place


Did you read https://simonwillison.net/2023/Dec/14/ai-trust-crisis/ ? Because your comment here is a textbook example of what I was talking about there, right up to the bit where you say "you can't trust them because they've already shown they'll train on unlicensed scraped copyrighted data" (a very reasonable point to argue).

(Update: actually I didn't make that point in the original post, it's from the talk version of this I gave https://simonwillison.net/2024/Jun/27/ai-worlds-fair/#slide.... )


This exactly. If Microsoft had created its own fine-tuned-MSFT-data LLM and seen vastly better results on internal tasks, then they’d be publishing papers about it, and also packaging that up & selling to customers.


There’s significant money to be lost (lawsuits, massive reputational damage) if it were discovered they were deliberately training on data they had guaranteed that they were not training on.


when has the threat of lawsuits or massive reputational damage ever previously stopped Microsoft?


You're confusing corporate enterprise plans with home users. Completely different ballgames.



