## Archive for **August 2010**

## ICALP 2010 Merchandise and Awards

## Merchandise

Usually I’m not a big fan of conference merchandise—university branded ball point pens, T-shirts, bags, notepads… But boy, did I change my mind after ICALP 2010!

I saw little reason to unpack the coffee mug we got at the registration desk, and stuffed it in the conference bag. As did everybody else, I think. Only when I came home did I realise that the cup *rattled*. Was it broken? Intrigued, I opened the box, only to find a piece of chalk! The mug was a blackboard! We could have spent every coffee break eagerly scribbling proofs on our mugs.

Brilliant.

## Presburger Award

The Presburger Award was awarded for the first time at the Bordeaux ICALP. The award goes to a young scientist, and the 2010 recipient is Mikolaj Bojanczyk from Warsaw for his work in automata theory, logic, and algebra.

The EATCS awards session at ICALP included this award and two more: The Gödel prize and the EATCS award. Each recipient gave a talk, and young Mikolaj faced the thankless task of appearing in a session with some of the best speakers in our field, Sanjeev Arora, Joe Mitchell, and Kurt Mehlhorn, who all gave splendid presentations.

Well, Mikolaj’s speech blew me away. Instead of explaining his research contributions, he devoted the talk to the life and work of fellow Varsovian Mojzesz Presburger (1904–1943).

Not only was this graceful, interesting, and moving, it was also extremely well presented. I say this as somebody who obsesses about presentations.

Mikolaj’s visual aids break the slides metaphor we are used to, no matter which medium—35 mm, overhead transparency, computer projector. You can see Mikolaj’s Presburger Award presentation at his website. Check out his other ICALP 2010 presentation as well: his “slides” pan and zoom about on a single, huge, dynamic canvas. I’ve tried to do something similar a few times, but this is better by orders of magnitude.

## Sci-Fi FilmRank

Sweet circumstance! The huge positronic brain that I built in my cellar to compute the algorithmic movie ranking FilmRank is still warm. So when I saw this list of

how could I resist submitting the question to the electronic abomination? After all, left to its own devices, the machine might otherwise become sentient in its idle time and decide to take over the Internet. And then. The. World.

“Can you give me the 25 most important sci-fi movies, ranked by FilmRank?”

There was a moment’s expectant pause whilst panels slowly came to life on the front of the console. Lights flashed on and off experimentally and settled down into a businesslike pattern. A soft low hum came from the communication channel.

“Difficult. What is this Sci-Fi of which you speak?”

To decide what is Sci-Fi and what isn’t, I again fell back on IMDB, namely on the genre labelling. Now, many movies list dozens of genres, so I arbitrarily chose to only recognise the two or three that make it to the front page.

This leads to some decisions that I disagree with: For example, the hive-mind that is IMDB’s users doesn’t classify *Der Golem, wie er in die Welt kam* (1920) as Sci-Fi, so it doesn’t make the list. On the other hand, *Superman* (1978) is in there. Even more unbearably, *The Rocky Horror Picture Show* would have made a respectable 24th place, had it not been classified as Comedy/Musical.

But of course, the conceit of this whole project is to minimise my influence, so the list is what it is. I defer completely to my algorithmic overlord, and the human life-forms it has absorbed into its computational matrix.

So here it is, presented with infinite majesty and calm. Those movies that appear both on the FilmRank list and on io9’s I have marked with a star.

- Star Wars (1977)
- 2001: A Space Odyssey (1968) [*]
- Metropolis (1927) [*]
- Frankenstein (1931)
- A Clockwork Orange (1971)
- The Terminator (1984)
- E.T.: The Extra-Terrestrial (1982) [*]
- Star Wars: Episode V – The Empire Strikes Back (1980) [*]
- Alien (1979) [*]
- The Matrix (1999) [*]
- THX 1138 (1971)
- Blade Runner (1982) [*]
- Aliens (1986)
- Superman (1978)
- Back to the Future (1985) [*]
- Gojira (1954)
- Bride of Frankenstein (1935)
- Terminator 2: Judgment Day (1991) [*]
- Star Wars: Episode VI – Return of the Jedi (1983)
- Planet of the Apes (1968) [*]
- Jurassic Park (1993)
- The Day the Earth Stood Still (1951) [*]
- Close Encounters of the Third Kind (1977)
- RoboCop (1987) [*]
- Invasion of the Body Snatchers (1956)

Comments? The FilmRank list is a lot more “classical,” which is an artefact of the algorithm. (The damping factor is 0.50.) For example, the spectacularly good *District 9* doesn’t have a chance. I note with surprise that *Back to the Future* appears on both lists, even though it probably wouldn’t make my own. Oh, and FilmRank has *Gojira*, so it’s full of win. Now, I wonder which parameters I must tweak to get *Reptilicus* (1961) on the list…

## P versus NP versus PR

In August 2010, an enormous public relations opportunity fell into Computer Science’s lap, when Vinay Deolalikar’s attempt to separate P from NP resulted in unprecedented internet interest and media coverage.

I think it was a big success. (The outreach. Not the proof.) There are lots and lots of people who are interested in this stuff, and we should tell them more about it. Here are some reflections on my own tiny involvement.

## Timeline

- On Sunday, 8 August, email from a former student pointed me to Greg Baker’s blog, which had announced Deolalikar’s manuscript the day before. Within two days, and perhaps largely due to Richard Lipton’s enthusiasm, hundreds of blog comments are written, and it is clear to me that this story could make it into regular news. Not because of the proof, but because of the collaborative model.
- Tuesday, 10 August, I write a crisp but comprehensive summary to the communications department at ITU Copenhagen, explaining the case, and my reasoning for why and how this could attract the attention of journalists. Luckily, they bite, and start pitching the story to Danish media.
- Wednesday, 11 August. The communication department gets back to me. Both venues I had in mind accepted: Danish public radio DR P1 Morgen and the weekly paper Weekendavisen. (I have appeared in both of these places previously, listen to the former and subscribe to the latter, so they are aimed at the demographic I am in.)
Meanwhile, the story itself got even better: a wiki page at polymath has been created to discuss the proof. On the other hand, Deolalikar’s proof doesn’t fare too well because of problems with the structure of random k-SAT.

- Thursday, 12 August. I spend well over an hour in the morning talking to a journalist from DR P1 Morgen on the phone, to prepare for a radio interview for Saturday. This is how it works: she spends a lot of time and energy asking very good questions and exploring various angles, actually really trying to understand what’s going on, scientifically. She then writes a brief intro and a list of possible questions and angles to the host of the radio show, who will conduct the broadcast interview.
In the afternoon, I do basically the same hour-long phone interview with a journalist from Weekendavisen. WA’s deadline is Wednesday evening, so we aim for a longer piece in the paper’s science section next week.

Meanwhile, Neil Immerman points to big problems with the finite model theory part of Deolalikar’s proof.

- Saturday, 14 August. I take an early train to Copenhagen, and enter the DR building. I’ve done this part once before, so I already feel like a jaded member of the chattering classes, going through the motions like a frequent airplane traveller through security. I announce my presence at the reception, receive a badge, and an assistant picks me up to lead me to a waiting area. The assistant gives me a brief chance to summarise what I’m going to talk about, probably just to loosen my tongue. There’s coffee and morning papers. (Nothing about Deolalikar. I had hoped some major newspaper would falsely run the angle that “Outsider solves major math riddle, embarrassing reluctant ivory tower academics.” But no such luck.) After half an hour of waiting I’m ushered into the studio and get a bit more than the previously agreed 8 minutes to say the word algorithms as often as I can.
I think it went well, though I still haven’t had the guts to hear it.

- Tuesday, 17 August. I should have had the Weekendavisen article for review by now, but WA has changed its overall design, including the software, so everything is running very late. (But they did change the body typeface to Garamond, making WA officially the best typeset paper on the planet. So all is forgiven, and my subscription renewal is guaranteed.)
- Wednesday, 18 August. I get the WA article, and I’m quite impressed. The angle is via Wiles and Perelman, then to collaborative software, outsourcing, and what I call the triumph of amateurism. Even some of my concerns about basic research funding are briefly touched upon, and mathoverflow and polymath are mentioned. I know full well that I can’t change the angle anyway, but there are of course numerous problems with the details. Predictably, the article writes “mathematics” whenever it should write “computer science.” I spend some more time editing and send my changes back, fingers crossed.
- Thursday, 19 August. I get WA with my regular Swedish paper in the morning. The article is the front piece for the science section. (WA has no electronic edition I can link to. You have to buy it.)

Even though this all took a lot of time and energy, I’m pretty happy. The angle, i.e., collaborative math, turned out to be just right, and is the same that several other media have chosen. But remember this is all just a platform to say P, NP, algorithms, and complexity in public media as often as possible.

## But it’s Wrong!

When doing something like this, know from the beginning that *you have no bandwidth*. You cannot explain anything. You have no time, and you have no visual aids. News media are neither very good at, nor even interested in, explanation of concepts. So don’t. And refrain from unnecessary precision. I wish I would learn.

The summary of the radio show includes the following (in my translation):

P versus NP is a scientific hypothesis that can be translated, very simplified, to the claim that there are problems that can be solved, and problems that can’t be solved.

Now, this is of course *false*. There *are* problems that cannot be solved, whole classes of them: everything undecidable remains unsolvable no matter how much time you invest, and the EXPTIME-complete problems are infeasible in the sense that even though algorithms exist, none of them runs in polynomial time; we know P differs from EXPTIME by the time hierarchy theorem.

Would it have been worth correcting? No. The whole P versus NP problem loses a lot of its appeal once you start with “even though this claim is proved for some problems, there is a certain class of problems called NP for which this is not yet known.” Of course, the trick is instead to establish NP via verification versus computation, and I tried that both in the radio and the paper using Sudoku.
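The Sudoku example works because verification really is fast: checking a completed 9×9 grid takes a single pass over its rows, columns, and boxes, whereas *finding* a solution may require extensive search. A minimal checker as a sketch (the grid encoding, a list of nine rows of digits, is my own choice for illustration):

```python
def is_valid_sudoku(grid):
    """Verify a completed 9x9 grid: every row, column, and
    3x3 box must contain each digit 1..9 exactly once."""
    digits = set(range(1, 10))
    rows = [list(row) for row in grid]
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [[grid[3 * br + r][3 * bc + c] for r in range(3) for c in range(3)]
             for br in range(3) for bc in range(3)]
    return all(set(unit) == digits for unit in rows + cols + boxes)
```

The check runs in time linear in the number of cells; no search anywhere. That asymmetry, easy to check and (apparently) hard to find, is exactly the NP story.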

But the important thing is that none of this is the point. My mission is not to ventilate the correct definition of NP; anybody really interested can look it up on Wikipedia. I’ve had students who took whole classes from me who blissfully live in this misunderstanding. Those I try to correct, and sometimes successfully. But for the general public, I think this is unnecessary complication that only weakens the story.

I did get them to sneak “very simplified” in the summary, out of scientific integrity, and left it at that.

I can see that the New York Times currently has the same problem. A pretty good article from 16 August now has the following retraction:

Correction: August 21, 2010.

An article on Tuesday about a proposed solution of a classic mathematical formula called P versus NP described NP incorrectly and misidentified the location of the CERN physics research center. NP stands for the class of problems for which solutions can be verified relatively quickly, not the class of problems that are difficult to solve but easy to verify. And the CERN laboratory is near Geneva, not in Zurich.

Got it? I don’t.

The WA article contains a bigger mistake, and one I feel bad about. In my translation:

A 9×9 Sudoku is just big enough to entertain us during our morning coffee. But if we played Sudoku with letters instead of numbers, the number of combinations would be so huge, that all supercomputers on the planet would need until the end of the Universe to find a solution. “Unless, of course, there’s a really clever way to solve Sudokus that we just haven’t discovered yet,” adds Thore Husfeldt.

The usual Sudoku is a P-problem: it’s possible to write a recipe, an algorithm, to solve it. The big Sudoku is an NP-problem, one of the tasks mathematicians called Herculean, where you can use infinite time to check all combinations without ever finding the solution, but where you could immediately verify one if you had found it.

Vader voice: *Noooo!* This is the scientifically most ambitious part of the article, and contains crippling mistakes. All I can say is that it was worse when I got the first draft. Some of my edits made it (for example, it now no longer says that an 18×18 Sudoku contains twice as many cells as a 9×9 Sudoku.) Others, as you can see, didn’t. For completeness:

- Sudoku is always NP-hard. The small ones are merely small enough that systematic exploration is still feasible. But for NP-hard problems, solution time presumably grows exponentially with problem size, so even the seemingly harmless increase to 25×25 alphabet Sudoku destroys any hope of solving it. (There’s more to say about the hardness of Sudoku. Maybe for another post.)
- Even a 25×25 Sudoku still does not use *infinite* time. It’s a finite problem, after all. To a mathematician, 25×25 Sudoku is an easy problem, because the number of steps it takes is a small integer “close to zero”. But to a computer scientist, it’s important that this finite number of steps exceeds the computational power that remains in the universe for the rest of its lifetime. Yes, even if you install Ubuntu.
- “Herculean” was one of the terms suggested during Donald Knuth’s Herculean efforts to give NPC a better name.

Would another round of edits have made a difference? Probably. But because of the narrow schedule imposed by WA’s new layout engine, it was not to be. On the other hand, the new layout means I get to see my name in beautiful Garamond. I retain a weak spot for Aestheticism from my younger days, so I guess I will let the pleasures of form triumph over the constraints of content.

## What I Should have Done

Next time, I will try with Impagliazzo’s five worlds, or rather two of them.

When trying to explain P versus NP, talk about *Algorithmica* versus *Cryptomania* instead. Morally, it’s the same problem, though you need to worry about average-case complexity to see the difference. But it’s so much nicer to talk about. Algorithmica is a country in which you have Star Trek computers (or Skynet, depending on how much of an optimist you are), no mathematicians, etc. In Cryptomania you have crypto. We know that we live in one of these worlds (there are a few more), but we don’t know which. I think that’s a good way of putting it, because it conjures *images*. So that’s what I’ll try next time.

I just have to wait until the next proof attempt.

## The Most Important Movies Of All Time

Does this sound familiar to you? You are watching a Simpsons episode with friends. Bart, waking up from an out-of-body experience where he visited Hell, says,

“I was miles and miles and miles away, writhing in agony in the pits of Hell! And you were there, and you and you and you.”

Everybody snickers knowingly and lets out a satisfied sigh. Ah, the subtext!

But you? You just missed another movie reference! Social ostracism awaits.

But fret not! There are not *that* many movies, and in fact most of them tend to cite the same classics over and over again. So, as a special service, here is the list of the most important movies ever made, ordered by a carefully constructed principle I call FilmRank. The ranking is the result of an algorithm, similar to what Google uses to rank web pages, and based on the Internet Movie Database’s list of movie references. I haven’t tinkered with the outcome. If you want to read more about the methodology, look at my related post

So, without further ado, here’s the list:

- The Wizard of Oz (1939)
- Shadow of a Doubt (1943)
- King Kong (1933)
- The Merry Widow (1934)
- Psycho (1960)
- Cabiria (1914)
- Star Wars (1977)
- Metropolis (1927)
- Snow White and the Seven Dwarfs (1937)
- Casablanca (1942)
- 2001: A Space Odyssey (1968)
- The Birth of a Nation (1915)
- Snow White (1916/I)
- Chang: A Drama of the Wilderness (1927)
- Frankenstein (1931)
- Vertigo (1958)
- Jaws (1975)
- The Godfather (1972)
- Gone with the Wind (1939)
- The Lost World (1925)
- Dr. No (1962)
- The Searchers (1956)
- A Clockwork Orange (1971)
- Citizen Kane (1941)
- Nosferatu, eine Symphonie des Grauens (1922)
- Rosemary’s Baby (1968)
- Night of the Living Dead (1968)
- Bronenosets Potyomkin (1925)
- The Exorcist (1973)
- The Terminator (1984)
- La dolce vita (1960)
- The Scarecrow (1920)
- Un chien andalou (1929)
- Tarzan the Ape Man (1932)
- Taxi Driver (1976)
- The Wizard of Oz (1925)
- The Poor Little Rich Girl (1917)
- The Battle at Elderbush Gulch (1913)
- Dracula (1931)
- Il buono, il brutto, il cattivo. (1966)
- Shichinin no samurai (1954)
- College (1927)
- Apocalypse Now (1979)
- His Majesty, the Scarecrow of Oz (1914)
- La grande illusion (1937)
- The General (1926)
- The Lost Patrol (1934)
- Singin’ in the Rain (1952)
- E.T.: The Extra-Terrestrial (1982)
- The Great Train Robbery (1903)

Amazingly, it’s pretty good! There are some really old movies in there because the methodology will tend to favour them in an ever-decreasing pool of available references. But, if I use the same damping factor that Google uses for ranking web pages, this is what you get.

For fun, we can decrease this mysterious factor from .85 to .45. This will decrease the value of long chains of references, giving films like *Star Wars* more of a chance. *Star Wars* has tons of references (half a thousand other movies reference it, many more than *Oz*), but most of them are recent, and haven’t yet amassed the authority that comes from having generations of other movies refer to you. So here’s the second, more modern list:

- The Wizard of Oz (1939)
- Star Wars (1977)
- Psycho (1960)
- King Kong (1933)
- Shadow of a Doubt (1943)
- Casablanca (1942)
- 2001: A Space Odyssey (1968)
- The Godfather (1972)
- Snow White and the Seven Dwarfs (1937)
- Metropolis (1927)
- Jaws (1975)
- Gone with the Wind (1939)
- Frankenstein (1931)
- Cabiria (1914)
- The Birth of a Nation (1915)
- The Merry Widow (1934)
- Vertigo (1958)
- A Clockwork Orange (1971)
- Night of the Living Dead (1968)
- Citizen Kane (1941)
- The Exorcist (1973)
- Taxi Driver (1976)
- The Terminator (1984)
- Rosemary’s Baby (1968)
- Apocalypse Now (1979)
- E.T.: The Extra-Terrestrial (1982)
- Il buono, il brutto, il cattivo. (1966)
- Dr. No (1962)
- Snow White (1916/I)
- Nosferatu, eine Symphonie des Grauens (1922)
- Bronenosets Potyomkin (1925)
- The Searchers (1956)
- Dracula (1931)
- Chang: A Drama of the Wilderness (1927)
- Tarzan the Ape Man (1932)
- Scarface (1983)
- Un chien andalou (1929)
- Shichinin no samurai (1954)
- La dolce vita (1960)
- The Shining (1980)
- Singin’ in the Rain (1952)
- The Texas Chain Saw Massacre (1974)
- Star Wars: Episode V – The Empire Strikes Back (1980)
- The Lost World (1925)
- Pulp Fiction (1994)
- Rocky (1976)
- Halloween (1978)
- It’s a Wonderful Life (1946)
- The Wizard of Oz (1925)
- Deliverance (1972)

As you see, little Dorothy still wins, but Darth Vader breathes down her neck! I like this list a bit better than the previous, so .45 seems to be the right damping factor for this particular network. Of course, there are many other PageRank-like algorithms in the literature, each with obscure constants to fiddle with. (Google spent a long time fiddling with these things before they went public.) But I assume you’ll end up with pretty much the same movies as on the FilmRank lists.

I find the FilmRank lists just as inspiring and authoritative as many similar lists compiled by movie critics or fan votes. But this one is produced completely by machine, no human heart has acted as a mediator. Maybe *that’s* why *Metropolis* scores so high?

## How FilmRank Works

Ranking—in particular, the ranking of information such as web pages—is one of the most visible effects of algorithms on people’s lives. Imagine navigating the internet like we did in 1994, before Google’s PageRank algorithm!

I enjoy explaining how Google works; in fact I have a popular science presentation that I use quite a lot.

I think I’ve found a new way of explaining the essence of the PageRank idea, using movies. The outcome, a ranking of all films ever made, is interesting in itself.

Here’s how it works.

Movies can refer to other movies. When Marty McFly in *Back to the Future III* finds himself alone with a gun and a mirror, he quotes De Niro’s character in *Taxi Driver*,

“You talkin’ to me? You talkin’ to me? You talkin’ to me? Then who the hell else are you talkin’ to? You talkin’ to me? Well I’m the only one here. Who the fuck do you think you’re talking to?”

In fact, the line has an entire Wikipedia article.

And De Niro’s Army jacket has a patch reading “King Kong Company 1968–1970”, referencing a classic 1933 movie about another monster on the rampage in New York. (This reference is quite deliberate; the patch is described on page 1 of the script, rather than being a random detail inserted by a costume designer.)

As in the basic idea of PageRank, we can use these references as a measure of *quality*. A film must be important if other filmmakers want to reference it. Like all measures of quality, it is open to ridicule and skepticism. But it is different from other measures, such as box office success or even critical reception. With FilmRank, I put my trust in the good taste of other filmmakers, just as PageRank puts its trust in other web pages: PageRank is based on incoming links, not on the number of visitors.

If you just want to see the results and don’t care about the methodology, read my companion post

## Weighted References

What I like about FilmRank is that I can explain the *weighting*. Assume first that I just count the number of references to a movie. Which film wins? What is the most referenced film of all time? Well, according to the Internet Movie Database, here are the top three:

- Star Wars (1977), 570 references
- The Wizard of Oz (1939), 554 references
- Psycho (1960), 360 references

In this list, every reference counts the same.

*E.T.*’s reference to *Star Wars* counts just as much as *Eragon*’s, even though the former is a much more important movie than the latter.

This is where the weighting comes in. FilmRank works in rounds, as follows. Initially, all films (there are well over 60,000) receive the same rank. Then, each film inherits rank from the films referencing it. For example, if A references B and C, it confers rank in equal measures to B and C, increasing their rank. If these films in turn refer to other films, the rank gets passed on and on and on.
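One round of this rank-sharing can be sketched in a few lines (a hypothetical three-film graph, invented for illustration, and ignoring damping and dangling films for the moment):

```python
refs = {"A": ["B", "C"], "B": ["C"], "C": []}   # A references B and C; B references C
rank = {film: 1 / 3 for film in refs}           # initially, everyone is equal

# one round: each film passes its rank on, in equal shares, to the films it references
new_rank = {film: 0.0 for film in refs}
for film, referenced in refs.items():
    for target in referenced:
        new_rank[target] += rank[film] / len(referenced)
```

After one round, C, referenced by both other films, holds half of the shared rank, while B holds only a sixth.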

An equivalent way of seeing this is the *random moviegoer* model. Start viewing some random movie. After it’s done, randomly select one of the movies it referenced and go see that one. Keep doing that hundreds of thousands of times. (If there were no references in the last movie, start over.) In the end, you end up watching certain movies again and again. This frequency is the FilmRank. There is also a certain probability that you grow tired of your current project and start all over anyway; the probability of *not* doing so is called the *damping factor* (Google uses 85% for PageRank).
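The random moviegoer can also be simulated directly; the visit frequencies then approach the FilmRank values. A sketch on a hypothetical three-film graph (the names are invented, not the real data):

```python
import random

refs = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
alpha = 0.85               # damping factor: probability of following a reference
visits = {film: 0 for film in refs}

random.seed(0)             # reproducible run
current = "A"
steps = 100_000
for _ in range(steps):
    visits[current] += 1
    if refs[current] and random.random() < alpha:
        current = random.choice(refs[current])   # follow a random reference
    else:
        current = random.choice(list(refs))      # grow tired; restart anywhere

frequencies = {film: count / steps for film, count in visits.items()}
```

In this toy graph, C, which both other films reference, ends up watched far more often than B.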

More formally, the behaviour is a Markov chain modelling a random walk on the directed graph whose vertices are movies and whose edges are references. The FilmRank is the stationary distribution of that random process.

## Comparison with PageRank

Remember that this is only for fun, and to explain how PageRank works. (Though I like the results.) The huge difference between PageRank and FilmRank is that PageRank uses *no human manipulation*. The links between web pages are already there, and not subject to interpretation. In contrast, it is a matter of opinion whether a reference to another movie is intentional or not. “You’re not in Kansas anymore” is certainly deliberate, but are all quotes of “I am your father” really meant to remind you of Darth Vader?

Thus, FilmRank loses some of its purity, as well as its efficiency. Moreover, it can be gamed by maliciously editing the underlying database. FilmRank is also gloriously unnecessary: there are not very many films (only tens of thousands, with only a few added every day), so human intervention suffices to rank them and make authoritative lists of top films based on individual movie critics’ *taste*. Not so with the web: its size had made it impossible to rank by the middle of the 1990s, so a fully automatic system like PageRank is essential if we are to be able to navigate it.

Finally, there’s a huge topological difference between the network of linked web pages and the network of film references: The web is very cyclic, whereas films cannot refer to films “in the future”. You can refer to *The Time Machine*, but you can’t build one. This means that FilmRank will tend to favour older movies, simply by the fact that they are old enough to have assembled lots of references. However, I find this quite refreshing; many other formal measures of quality (such as box office success) will artificially favour newer movies because of inflation.

## Implementation

So here’s what I did. The Internet Movie Database maintains exactly the list of movie references we need, and it is freely available for download. I decided to throw out TV shows and video games, and to honour the connections “references”, “features”, and “remake of”. Then it was a simple matter of coding a mindless implementation of the algorithm in Python:

```
alpha = .85  # damping factor
n = len(G)   # G: for each film, the list of films it references
p = {v: 1 / n for v in G}  # start from the uniform distribution
for i in range(20):
    q = {v: (1 - alpha) / n for v in G}  # everyone gets the restart share
    r = 0  # will be sum p[v] over all dangling vertices
    for v in G:
        if dout[v] == 0:  # dangling v? then all w get some..
            r += p[v]
        else:             # ..else v's successors share
            delta = alpha * p[v] / dout[v]
            for w in out[v]:
                q[w] += delta
    for v in G:
        q[v] += alpha * r / n  # add contribution of all dangling vertices
    p = q.copy()
```

Probably not very good Python; comments are welcome. The reason to accumulate the contributions of the dangling vertices (instead of spreading them out over the rest of the graph immediately) is of course to avoid quadratic running time. Also, PageRank seems to stabilise to its final distribution after a handful of iterations, so 20 iterations should be enough for our relatively small graph. So I think there’s no need for fancy-shmancy sparse matrix algebra.
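To sanity-check the loop, here is a self-contained run on a hypothetical four-film toy graph (the film names and references are invented for illustration; the real input is of course the IMDB data):

```python
refs = {
    "Oz": [],                  # dangling: references nothing
    "StarWars": ["Oz"],
    "ET": ["StarWars", "Oz"],
    "Eragon": ["StarWars"],
}
alpha = 0.85
n = len(refs)
p = {v: 1 / n for v in refs}                 # start from the uniform distribution

for _ in range(50):
    q = {v: (1 - alpha) / n for v in refs}   # restart ("grow tired") share
    r = 0.0                                  # rank mass sitting on dangling films
    for v, targets in refs.items():
        if not targets:
            r += p[v]
        else:
            for w in targets:
                q[w] += alpha * p[v] / len(targets)
    for v in refs:
        q[v] += alpha * r / n                # dangling mass spread uniformly
    p = q
```

After 50 rounds the ranks sum to one, and Oz, referenced by two films, one of which is itself well-referenced, comes out on top.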