Shouldn't have licenced Golang BSD if that's the attitude.
Everybody for years including here on HN denigrated GPLv3 and other "viral" licences, because they were a hindrance to monetisation. Well, you got what you wished for. Someone else is monetising the be*jesus out of you so complaining now is just silly.
All of a sudden copyleft may be the only licences actually able to force models to account, hopefully with huge fines and/or forcibly open sourcing any code they emit (which would effectively kill them). And I'm not so pessimistic that this won't get used in huge court cases because the available penalties are enormous given these models' financial resources.
I tend to agree, but I wonder… if you train an LLM on only GPL code, and it generates non-deterministic predictions derived from those sources, how do you prove it’s in violation?
You don't because it isn't, unless it actually copies significant amounts of text.
Algorithms can not be copyrighted. Text can be copyrighted, but reading publicly available text and then learning from it and writing your own text is just simply not the sort of transformation that copyright reserves to the author.
Now, sometimes LLMs do quote GPL sources verbatim (if they're trained wrong). You can prove this with a simple text comparison, same as any other copyright violation.
All of a sudden copyleft may be the only licences actually able to force models to account, hopefully with huge fines and/or forcibly open sourcing any code they emit (which would effectively kill them). And I'm not so pessimistic that this won't get used in huge court cases because the available penalties are enormous given these models' financial resources.