
If those weights are a derivative of GPL'd code in a different form, and the results generate things derived from that derivative, then the generated code is still under license. "How much change is enough" has always been a gray area for courts and humans to decide.

If you can get a decent facsimile of licensed code out the other end, how is it really any different from lossy compression? I doubt the courts would consider a lossy re-encode of a Disney movie free from copyright.



If the output is substantially similar to GPL’d training data it may be infringing. Nobody disputes this.

However, copyright isn’t cooties. If the output is not similar, then it is not infringing regardless of how much GPL’d training data was used to generate it.


Suspend all knowledge of copyright law as it exists today for a moment and approach this hypothetical on first principles: a lot of GPL copyleft data is used to make an AI tool that, when asked, can recreate code similar to its input. The creator of that AI tool then reaps all the profits without giving a single penny, or even recognition, to the creators of the original copyleft data whose value it guzzled. Is this fair? What do your scruples tell you?

No, of course not. We should probably revisit copyright law, given that it was written at a time when no one foresaw modern AI tools, their capabilities, and their effects on creators and societies.


Have you used Copilot? It is generally not creating code similar to GPL code; it is creating code similar to the surrounding context file.

Transformers predict the most likely next token, and the most likely next token is usually related to the surrounding context.

So yes, it can create code similar to GPL code, but it can only do that consistently when the GPL code is included in the context. So don't do that.
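A toy sketch of the "most likely next token" idea above, using a bigram model instead of a transformer (real models condition on the whole context window, not just the previous token, but the principle that output follows context is the same):

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    # Count how often each token follows each other token.
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    # Greedy decoding: pick the single most frequent continuation.
    return counts[prev].most_common(1)[0][0]

corpus = "for i in range ( n ) : total += i".split()
model = train_bigram(corpus)
print(predict_next(model, "for"))  # -> "i"
```

The prediction is entirely determined by what appeared in the context the model was conditioned on, which is the commenter's point: put GPL code in the context, and GPL-like code comes out.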


The GPL was never about money, recognition, or even about the creators at all. Copyleft was created "to promote computer user freedom".

Free Software already views all proprietary software as inherently immoral. So from that angle, there is no need to take a detour through what went into making the software to reach that conclusion.


Indeed, that's why I said

>"How much change is enough" has always been a gray area for courts and humans to decide.

But Copilot has been shown to generate chunks of sufficient size and specificity that, as a layman, it very much feels like "copied GPL code" to me. My boss agrees: we have a blanket ban on generative AI tools at work because the risk isn't considered worth it.


> has been shown to generate chunks of sufficient size and specificity

Only when given chunks of copyrighted code as input. I don't think anyone has demonstrated large chunks of copyrighted code in the output when copyrighted code isn't present in the query/context.

In fact, I suspect Microsoft specifically filters the output for that.
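The kind of filter speculated about here could be as simple as rejecting a completion that shares a long verbatim token run with known licensed code. This is purely illustrative, not Microsoft's actual mechanism (which, if it exists, is not public):

```python
def shares_long_run(completion, corpus, min_tokens=20):
    # Collect every run of min_tokens consecutive tokens from the
    # known licensed corpus, then flag the completion if any of its
    # own runs matches verbatim.
    comp = completion.split()
    corp = corpus.split()
    runs = {tuple(corp[i:i + min_tokens])
            for i in range(len(corp) - min_tokens + 1)}
    return any(tuple(comp[i:i + min_tokens]) in runs
               for i in range(len(comp) - min_tokens + 1))
```

A real filter would need token-level rather than whitespace splitting and a scalable index over the training set, but the idea, "does the output reproduce a suspiciously long exact span?", is the same.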


>If those weights are a derivative of GPL'd code in a different form, and the results generate things derived from that derivative, then the generated code is still under license.

That's not really Microsoft's problem as long as people aren't afraid of using Copilot to generate (potentially GPL'd) code. And from what I've generally seen from genAI discussions at work, people think very little about any legal implications.



