Whether training a model on text constitutes copyright infringement is an unresolved legal question. The closest precedent would be search engines using automated processes to build an index and links, which is generally not seen as infringing (in the US).
No, they have not done that. Presumably they believe that the model training was done in fair use and no court has said otherwise yet.
It will take years for that stuff to settle out in court, and by that time none of that will matter, and the winners of the AI race will be those who didn't wait for this question to be settled.
It's not just the big companies you have to think about, lol.
Sure you can sue OpenAI.
But will you be able to sue every single AI startup working on open-source AI tech that was all trained this way? Absolutely not. It's simply not feasible. The cat is out of the bag.
My point stands. That's like one guy. That's not "an entire industry gets shut down by the government".
That was my point. Sure, they might go after like one guy or one company. They aren't going to take out half of the tech startups in all of the US though. They also aren't going to confiscate everyone's gamer PCs.
I also think it's funny that you literally posted a Wikipedia page, and the page itself contains the "illegal" numbers.
So that proves my entire point. Your best example is apparently one where I can access the "illegal" information on a literal public Wikipedia page!
So they're admitting to copyright violations and theft?