> OpenAI has presumably already ingested and trained on these publishers’ archiv...

mdavidn · on May 9, 2024

Whether training a model on text constitutes copyright infringement is an unresolved legal question. The closest precedent would be search engines using automated processes to build an index and links, which is generally not seen as infringing (in the US).

beeboobaa3 · on May 9, 2024

https://www.rvo.nl/onderwerpen/octrooien-ofwel-patenten/vorm...

stale2002 · on May 9, 2024

No, they have not done that. Presumably they believe that the model training was fair use, and no court has said otherwise yet.

It will take years for that stuff to settle out in court, and by that time none of that will matter, and the winners of the AI race will be those who didn't wait for this question to be settled.

beeboobaa3 · on May 9, 2024

They believe a lot of things, I'm sure.

> and the winners of the AI race will be those who didn't wait for this question to be settled.

Hopefully they'll be in jail.

stale2002 · on May 9, 2024

It's not just the big companies you have to think about, lol.

Sure you can sue OpenAI.

But will you be able to sue every single AI startup that happens to be working on open source AI tech that was all trained this way? Absolutely not. It's simply not feasible. The cat is out of the bag.

beeboobaa3 · on May 9, 2024

The US government has worked hard to make the lives of copyright infringers miserable for years, even driving them to suicide.

stale2002 · on May 9, 2024

> The US government has worked hard to make the lives of copyright infringers miserable for years

They really have not. The fact that I can download any movie in the world right now, and use all of the open source models on my home PC proves that.

I am sure there are some random one-off cases of infringers being punished, but it mostly doesn't happen.

Especially if we are talking about the entire tech industry.

The government isn't going to shut down every single tech startup in the US, because they are all using these open source AI models.

The government isn't going to be able to confiscate everyone's gamer PCs. The weights can already be run locally.

beeboobaa3 · on May 9, 2024

https://en.wikipedia.org/wiki/Aaron_Swartz

https://en.wikipedia.org/wiki/Illegal_number

stale2002 · on May 10, 2024

My point stands. That's like one guy. That's not "an entire industry gets shut down by the government".

That was my point. Sure, they might go after like one guy or one company. They aren't going to take out half of the tech startups in all of the US though. They also aren't going to confiscate everyone's gamer PCs.

I also think it's funny that you literally posted a Wikipedia page where the page itself contains the "illegal" numbers.

So that proves my entire point. Your best example is apparently one where I can access the "illegal" information on a literal public Wikipedia page!

beeboobaa3 · on May 11, 2024

> That's like one guy

Also known as an example

> So that proves my entire point

Your point is that you can't use it commercially? Great! We're aligned, then.