GPLed code on github (given the copilot controversy)

marc marcxfe at welz.org.za
Sat Jul 10 08:58:59 UTC 2021


Hi

The way I understand this (I welcome corrections), is that
copilot is a piece of proprietary software which was built
using a corpus of software hosted on github.

And if I understand it properly, this corpus of software
includes code nominally released under the GPL and similar
licenses. Is my understanding correct so far ?

I am told that copilot reproduces verbatim identifiable
chunks of the code that was loaded into its model. This
presents two interesting questions - is the model (which
is an algorithmic transformation of copyrighted material)
a new work, rather than a derivative one ? Are the fragments of
code reproduced by it sufficiently small to be considered
fair use ?

If I had to guess, I would answer no to both of these
questions: The pasting in is verbatim and automated,
and while individual pieces might not be large, there are
many. Fundamentally, I also would like to think that being
creative is something only a human/real intelligence
can be, otherwise the guy who wrote a program to enumerate
all melodies shorter than N would be owed sampling fees on
all new music... And if a few notes of music require a
royalty, then a snippet of code might too ? If somebody writes an
"autocompose" equivalent to copilot, trained on existing
music, then they could expect legal action from record
labels ? And not only for the output of "autocompose"
but also for the model that forms its core ?

But I am not qualified to provide a final answer on
that - I suppose judges will be the ones to make that
determination. Note the plural there - I am told fair
use and its equivalents differ nontrivially between
countries...

But assuming a no to those two points, github would then
(I think) rely on the terms and conditions on its site
which state that by uploading code to its site, you give
it a license to re-use your work, even in closed-source
products, apparently ?

Now: If you upload somebody else's GPLed code to github,
that then implies that you (and maybe github) have
contravened the license, right ? And possibly that means
you lose your own permission to hold a copy of the GPLed
code ? Alternatively, if you are the legitimate owner of
the code, you have just dual licensed your code (with a
commercial grant to github, owned by microsoft) which is
probably not what you intended ?

Note the question marks everywhere - have I gotten my
facts wrong ? Is my reasoning incorrect ? Would it be
prudent to avoid using github for (A/L/)GPLed code until
the legalities have been settled ?

regards

marc

