Questioning The Original Analysis On The Bionic Debate

Created
Sat, 19/03/2011 - 06:52
Updated
Sat, 19/03/2011 - 06:52

I was hoping to avoid having to comment further on this problematic story. I figured a comment as a brief identi.ca statement was enough when it was just a story on the Register. But, it's now hit a major tech news outlet, and I feel that, given that I'm typically the first person everyone in the Free Software world comes to ask if something is a GPL violation, I'm going to get asked about this soon, so I might as well preempt the questions with a blog post, so I can answer any questions about it with this URL.

In short, the question is: Does Bionic (the Android/Linux default C library developed by Google) violate the GPL by importing “scrubbed” headers from Linux? For those of you seeking TL;DR version: You can stop now if you expect me to answer this question; I'm not going to. I'm just going to show that the apparent original analysis material that started this brouhaha is a speculative hypothesis which would require much more research to amount to anything of note.

Indeed, the kind of work needed to answer these questions typically requires the painstaking work of a talented developer working very closely with legal counsel. I've done analysis like this before for other projects. The only one I can easily talk about publicly is the ath5k situation. (If you want to hear more on that, you can listen to an old oggcast where I discussed this with Karen Sandler or read papers that were written on the subject back where I used to work.)

Anyway, most of what's been written about this subject of the Linux headers in Bionic has been poorly drafted speculation. I suppose some will say this blog post is no better, since I am not answering any questions, but my primary goal here is to draw attention that absolutely no one, as near as I can tell, has done the incredibly time consuming work to figure out anything approaching a definitive answer! Furthermore, the original article that launched this debate (Naughton's paper, The Bionic Library: Did Google Work Around the GPL?) is merely a position paper for a research project yet to be done.

Naughton's full paper gives some examples that would make a good starting point for a complete analysis. It's disturbing, however, that his paper is presented as if it's a complete analysis. At best, his paper is a position statement of a hypothesis that then needs the actual experiment to figure things out. That rigorous research (as I keep reiterating) is still undone.

To his credit, Naughton does admit that only the kind of analysis I'm talking about would yield a definitive answer. You have to get almost all the way through his paper to get to:

Determining copyrightability is thus a fact-specific, case-by-case exercise. … Certainly, sorting out what is and isn’t subject to GPLv2 in Bionic would require at least a file-by-file, and most likely line-by-line, analysis of Bionic — a daunting task[.]
Of course, in that statement, Naughton makes the mistake of subtly including an assumption in the hypothesis: he fails to acknowledge clearly that it's entirely possible the set of GPLv2-covered work found in Bionic could be the empty set; he hasn't shown it's not the empty set (even notwithstanding his very cursory analysis of a few files).

Yet, even though Naughton admits full analysis (that he hasn't done) is necessary, he nevertheless later makes sweeping conclusions:

The 750 Linux kernel header files … define a complex overarching structure, an application programming interface, that is thoughtfully and cleverly designed, and almost assuredly protected by copyright.
Again, this is a hypothesis, that would have be tested and proved with evidence generated by the careful line-by-line analysis Naughton himself admits is necessary. Yet, he doesn't acknowledge that fact in his conclusions, leaving his readers (and IMO he's expecting to dupe lots of readers unsophisticated on these issues) with the impression he's shown something he hasn't. For example, one of my first questions would be whether or not Bionic uses only parts of Linux headers that are required by specification to write POSIX programs, a question that Naughton doesn't even consider.

Finally, Naughton moves from the merely shoddy analysis to completely alarmist speculation with:

But if Google is right, if it has succeeded in removing all copyrightable material from the Linux kernel headers, then it has unlocked the Linux kernel from the restrictions of GPLv2. Google can now use the “clean” Bionic headers to create a non-GPL’d fork of the Linux kernel, one that can be extended under proprietary license terms. Even if Google does not do this itself, it has enabled others to do so. It also has provided a useful roadmap for those who might want to do the same thing with other GPLv2-licensed [sic] programs, such as databases.

If it turns out that Google has succeeded in making sure that the GPLv2 does not apply to Bionic, then Google's success is substantially more narrow. The success would be merely the extraction of the non-copyrightable facts that any C library needs to know about Linux to make a binary run when Linux happens to be the kernel underneath. Now, it should be duly noted that there already exist two libraries under the LGPL that have already implemented that (namely, glibc, and uClibc — the latter of which Naughton's cursory research apparently didn't even turn up). As it stands, anyone who wants to write user-space applications on a Linux-based system already can; there are multiple C library choices available under the weak copyleft license, LGPL. Google, for its part, believes they've succeed at is to make a permissively licensed third alternative, which is an outcome that would be no surprise to us who have seen something like it done twice before.

In short, everyone opining here seems to be conflating a lot of issues. There are many ways to interface with Linux. Many people, including me, believe quite strongly that there is no way to make a subprogram in kernel space (such as a device driver) without the terms of the GPLv2 applying to it. But writing a device driver is a specialized task that's very different from what most Linux users do. Most developers who “use Linux” — by which they typically mean write a user space program that runs on a GNU/Linux operating system — have (at most) weak copyleft (LGPL) terms to follow due to glibc or uClibc. I admit that I sometimes feel chagrin that proprietary applications can be written for GNU/Linux (and other Linux-based) systems, but that was a strategic decision that RMS made (correctly) at the start of the GNU project one that the Linux project, for its part, has also always sought.

I'm quite sure no one — including hard-core copyleft advocates like me — expects nor seeks the GPLv2 terms to apply to programs that interface with Linux solely as user-space programs that runs on an operating system that uses Linux as its kernel. Thus, I'd guess that even if it turned out that Google made some mistakes in this regard for Bionic, we'd all work together to rectify those mistakes so that the outcome everyone intended could occur.

Moreover, to compare the specifics of this situation to other types of so-called “copyleft circumvention techniques” is just link-baiting that borders on trolling. Google wasn't seeking to circumvent the GPL at all; they were seeking to write and/or adapt a permissively licensed library that replaced an LGPL'd one. I'm of course against that task on principle (I think Google should have just used glibc and/or uClibc and required LGPL-compliance by applications). But, to deny that it's possible to rewrite a C library for Linux under a license that isn't GPLv2 would also imply immediately the (incorrect) conclusion that uClibc and glibc are covered by the GPLv2, and we are all quite sure they aren't; even Naughton himself admits that (regarding glibc).

Google may have erred; no one actually knows for sure at this time. But the task they sought to do has been done before and everyone intended it to be permitted. The worst mistake of which we might ultimately accuse Google is inadvertently taking a copyright-infringing short-cut. If someone actually does all the research to prove that Google did so, I'd easily offer a 1,000-to-1 bet to anyone that such a copyright infringement could be cleared up easily, that Bionic would still work as a permissively licensed C library for Linux, and the implications of the whole thing wouldn't go beyond: “It's possible to write your own C library for Linux that isn't covered by the GPLv2” — a fact which we've all known for a decade and a half anyway.

Update (2011-03-20): Many people, including slashdot, have been linking to this comment by RMS on LKML about .h files. It's important to look carefully at what RMS is saying. Specifically, RMS says that sometimes #include'ing a .h file creates a copyright derivative work, and sometimes it doesn't; it depends on the details. Then, RMS goes to talk on some rules of thumb that can help determine the outcome of the question. The details are what matters; and those are, as I explain in the main post above, what requires careful analysis done jointly and in close collaboration between a developer and a lawyer. There is no general rule of thumb that always immediately leads one to the right answer on this question.