I've watched the game show, Jeopardy!, regularly since its Trebek-hosted relaunch on 1984-09-10. I even remember distinctly the Final Jeopardy question that night as "This date is the first day of the new millennium". At the age of 11, I got the answer wrong, falling for the incorrect "What is 2000-01-01?", but I recalled this memory eleven years ago during the debates regarding when the millennium turnover happened.
I had periods of life where I watched Jeopardy! only rarely, but in recent years, as I've become more of a student of games (in part because of poker), I've watched Jeopardy! almost nightly over dinner with my wife. I've learned that I'm unlikely to excel as a Jeopardy! player myself because (a) I read slowly and (b) my recall of facts, while reasonably strong, is not instantaneous. I thus haven't tried out for the show, but I'm nevertheless a fan of strong players.
Jeopardy! isn't my only spectator game. Right after college, even though I'm a worse-than-mediocre chess player, I watched with excitement as Deep Blue played and defeated Kasparov. Kasparov has disputed the results and how much humans were actually involved, but even so, such interference was minimal (between matches) and the demonstration still showed computer algorithmic mastery of chess.
Of course, the core algorithms that Deep Blue used were well known and often implemented. I learned α-β pruning in my undergraduate AI course and it was clear that a sufficiently fast computer, given a few strong heuristics, could beat most any full information game with a reasonable branching factor. And, computers typically do these days.
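For anyone who hasn't seen it, here is a minimal sketch of α-β pruning in Python. It is only an illustration of the textbook algorithm, not anything taken from Deep Blue: the toy game tree (nested lists with integer leaf scores) and the `evaluate` and `children` helpers are assumptions I made up for the example.

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing, evaluate, children):
    """Minimax value of `node`, skipping branches that cannot change the result."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)  # leaf, or search cut off at the depth limit
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, evaluate, children))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizing opponent will never allow this line
        return value
    value = math.inf
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                     True, evaluate, children))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizing player already has a better option
    return value

# Hypothetical toy game: inner lists are positions, integers are final scores.
tree = [[3, 5], [2, 9], [0, 1]]
visited = []  # records which positions were actually evaluated

def evaluate(node):
    visited.append(node)
    # Placeholder heuristic: real scores at leaves, a neutral 0 at cut-off positions.
    return node if isinstance(node, int) else 0

def children(node):
    return node if isinstance(node, list) else []

best = alphabeta(tree, 10, -math.inf, math.inf, True, evaluate, children)
print(best)     # 3
print(visited)  # [3, 5, 2, 0]
```

Only four of the six leaves are ever evaluated; the 9 and the 1 are pruned because the first branch already guarantees a score of 3. A real chess program replaces the toy `evaluate` with a strong positional heuristic and searches as deep as the hardware allows.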
I suppose I never really thought about whether Deep Blue should be released as Free Software. First, I was not as involved with Free Software then as I am now. Second, as near as anyone could tell, Deep Blue's software was probably not useful for anything other than playing chess, and its primary power was in its ability to go very deep (hence the name, I guess) in the search tree. In short, Deep Blue was primarily a hardware, not a software, success story.
It was, nevertheless, impressive, and last month I saw the next installment in this IBM story. I watched with interest as IBM's Watson defeated two champion Jeopardy! players. Ken Jennings, for one, even welcomed our new computer overlords.
Watson beating Jeopardy! is, frankly, a lot more innovative than Deep Blue beating chess. Most don't know this about me, but I came very close to focusing my career on PhD work in Natural Language Processing; I believe fundamentally it's the area of AI most in need of attention and research. Watson is a shining example of success in modern NLP, and I actually believe some of the IBM hype about how Watson's technology can be applied elsewhere, such as medical information systems. Indeed, IBM has announced a deal with Columbia University Medical Center to adapt the system for medical diagnostics. (Perhaps Watson's next TV appearance will be on House.)
This all sounds great to most people, but my real concern is the freedom of the software. We've shown in the software freedom community that to advance software and improve it, sharing the software is essential. Technology locked up in a vaulted cave doesn't allow all the great minds to collaborate. Just as we don't lock up libraries so that only the guilded overlords have access, neither should the best software technology be locked up as proprietary.
Indeed, Eric Brown, at his Linux Foundation End User Linux Summit talk, told us that Watson relied heavily on the publicly available software freedom codebase, such as GNU/Linux, Hadoop, and other FLOSS components. They clearly couldn't do their work without building upon the work we shared with IBM, yet IBM apparently ignores its moral obligation to reciprocate.
So, I just point-blank asked Brown why Watson is proprietary. Of course, I long ago learned to never ask a confrontational question from the crowd at a technical talk without knowing what the answer is likely to be. Brown answered in the way I expected: "We're working with Universities to provide a framework for their research". I followed up, asking when he would actually release the sources and what the license would be. He dodged the question, and instead speculated about what licenses IBM sometimes likes to use when it does choose to release code; he did not indicate if Watson's sources will ever be released. In short, the answer from IBM is clear: Watson's general ideas will be shared with academics, but the source code won't be.
This point is precisely one of the reasons I didn't pursue a career in academic Computer Science. Since most jobs (including professorships at Universities) for PhDs in Computer Science require that any code written be kept proprietary, most Computer Science researchers have convinced themselves that code doesn't matter; only publishing ideas does. This belief is so pervasive that I knew something like this would be Brown's response to my query. (I was even so sure that I wrote almost this entire blog post before I asked the question.)
I'd easily agree that publishing papers is better than the technology being only a trade secret. At least we can learn a little bit about the work. But in all but the purely theoretical areas of Computer Science, code is written to exemplify, test, and exercise the ideas. Merely publishing papers and not the code is akin to a chemist publishing final results but nothing about the methodologies or raw data. Science, in such cases, is unverifiable and unreproducible. If we accepted such practice in fields other than CS, we'd have accepted the idea that cold fusion was discovered in 1989.
I don't think I'm going to convince IBM to release Watson's sources as Free Software. What I do hope is that perhaps this blog post convinces a few more people that we just shouldn't accept that Computer Science is advanced by researchers who give us flashy demos and code-less research papers. I, for one, welcome our computer overlords…but only if I can study and modify their source code.