GPLed code on github (given the copilot controversy)

jahoti jahoti at envs.net
Mon Jul 12 11:22:00 UTC 2021


Hi,

you've certainly raised some interesting and important questions, as 
well as some deep philosophical comments. Unfortunately they're 
(obviously) not all things I can help with; however, some clarifications 
might be useful:

* The corpus of software is not part of the copilot "software" as such; 
it is only ever used as data, which then gets processed and reprocessed. 
I'm pretty sure that means it never runs afoul of any version of the 
(A/L/)GPL.

The main scenario where those licenses are involved is if someone asks 
copilot for some code and then puts that in a proprietary program.

* There are most definitely sizeable chunks of output reproduced 
verbatim from elsewhere (notably the fast inverse square root function, 
comments and all). Whether or not any are sufficiently large to attract 
copyright is, as you say, the interesting question.

* Simply sharing large amounts of material with a copyrighted work is 
not in itself copyright infringement: every English-language book 
published in the last three decades shares a large portion of its 
vocabulary with any other! The question, really, is what the "chunks" 
that define code are.

* Copyright law does indeed differ quite a lot between countries, and 
you are right to think this could lead to different verdicts (that said, 
I'm not sure if it will; IANAL).

* GitHub's terms of service allow them to do a variety of things with 
user submissions only "as necessary to provide the Service, including 
improving the Service over time" (archival is allowed for some other 
purposes); it seems, depending on future developments, that there might 
be a case that copilot is not covered by these criteria.

That said, I believe the (A/L/)GPL permit GitHub to do what they did 
anyway; this case might only apply to projects that don't grant 
those rights separately under a license.

* It is arguably prudent to avoid hosting any code on GitHub, even 
perhaps after any legal issues are settled: 
https://www.gnu.org/software/repo-criteria-evaluation.html#GitHub

On 7/10/21 8:58 AM, marc wrote:
> Hi
> 
> The way I understand this (I welcome corrections), is that
> copilot is a piece of proprietary software which was built
> using a corpus of software hosted on github.
> 
> And if I understand it properly, this corpus of software
> includes code nominally released under the GPL and similar
> licenses. Is my understanding correct so far ?
> 
> I am told that copilot reproduces verbatim identifiable
> chunks of the code that was loaded into its model. This
> presents two interesting questions - is the model (which
> is an algorithmic transformation of copyrighted material)
> a new, rather than derivative work ? Are the fragments of
> code reproduced by it sufficiently small to be considered
> fair use ?
> 
> If I had to guess, I would answer no to both of these
> questions: The pasting in is verbatim and automated,
> and while individual pieces might not be large, there are
> many. Fundamentally, I also would like to think that being
> creative is something only a human/real intelligence
> can be, otherwise the guy who wrote a program to enumerate
> all melodies shorter than N would be owed sampling fees on
> all new music... and if a few notes of music require a royalty
> then a snippet of code might too ? If somebody writes an
> "autocompose" equivalent to copilot, trained on existing
> music, then they could expect legal action from record
> labels ? And not only for the output of the "autoimprovise"
> but also the model that forms the core of "autoplay" ?
> 
> But I am not qualified to provide a final answer on
> that - I suppose judges will be the ones to make that
> determination. Note the plural there - I am told fair
> use and its equivalents differ nontrivially between
> countries...
> 
> But assuming a no to those two points, github would then
> (I think) rely on the terms and conditions on its site
> which state that by uploading code to its site, you give
> it the license to re-use your work, even in closed-source
> products, apparently ?
> 
> Now: If you upload somebody else's GPLed code to github
> that then implies that you (and maybe github) have
> contravened the license, right ? And possibly that means
> you lose your own permission to hold a copy of the GPLed
> code ? Alternatively, if you are the legitimate owner of
> the code, you have just dual licensed your code (with a
> commercial grant to github, owned by microsoft) which is
> probably not what you intended ?
> 
> Note the question marks everywhere - have I gotten my
> facts wrong ? Is my reasoning incorrect ? Would it be
> prudent to avoid using github for (A/L/)GPLed code until
> the legalities have been settled ?
> 
> regards
> 
> marc
