Here’s a puzzle for you: which eight-letter word describes the events of April 24, 2021, at the annual American Crossword Puzzle Tournament?
Did you guess historic?
For the first time in crossword history, the tournament winner was not human, but instead a puzzle-solving algorithm developed in part by a team of UC Berkeley scientists. This tournament was certainly not the first time that a computer had beaten us at our own games. Deep Blue shocked the worlds of both artificial intelligence and chess when it defeated world champion Garry Kasparov in 1997, and in 2011 IBM’s Watson won first place on the trivia game show Jeopardy! But both of these competitions play to the strengths of computers, such as rapidly calculating finite permutations. Chess is a strategy game with a limited number of outcomes, and Jeopardy! tasks contestants with recalling facts. In contrast, crosswords are often full of double meanings and wordplay that require a sophisticated understanding of human language.
The idea of teaching crosswords to computers was first taken up a decade ago by Dr. Matt Ginsberg, a programmer and mathematician who lives in Oregon. Ginsberg designed a computer program with a massive database and the lightning-fast ability to read a crossword clue, search its memory for similar clues it has encountered before, and then figure out which possible answers fit best into the surrounding puzzle. He debuted his program, creatively christened “Dr. Fill,” at the American Crossword Puzzle Tournament in 2012. On its first try, Dr. Fill placed 141st out of 650 entrants. It slowly inched up the leaderboard each successive year, reaching 14th place by 2019, but never quite managed to edge out its top competitors. While Dr. Fill was far faster than the best human solvers, it made too many mistakes.
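The lookup-and-fit idea described above can be illustrated with a toy sketch. This is not Ginsberg's code: the tiny clue database, the word-overlap score, and the names `SEEN_CLUES`, `fits`, and `candidates` are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical mini-database of previously seen clues and their answers.
SEEN_CLUES = {
    "Feline pet": ["CAT"],
    "Pet that purrs": ["CAT"],
    "Canine pet": ["DOG"],
    "Barnyard bleater": ["GOAT", "EWE"],
}


def words(text):
    """Split a clue into a set of lowercase words."""
    return set(text.lower().split())


def fits(answer, pattern):
    """Check an answer against a grid pattern; '.' marks an empty square."""
    return len(answer) == len(pattern) and all(
        p == "." or p == a for a, p in zip(answer, pattern)
    )


def candidates(clue, pattern):
    """Rank answers from past clues that share words with `clue` and fit `pattern`."""
    scores = Counter()
    clue_words = words(clue)
    for seen, answers in SEEN_CLUES.items():
        overlap = len(clue_words & words(seen))
        if overlap == 0:
            continue  # no shared words, so this past clue offers no evidence
        for answer in answers:
            if fits(answer, pattern):
                scores[answer] += overlap
    return [answer for answer, _ in scores.most_common()]
```

For example, `candidates("Purring pet", "C.T")` returns `["CAT"]`: two past clues share the word "pet" and point to an answer that matches the letters already in the grid, while "DOG" is ruled out by the pattern.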
Then, in 2021, Dr. Fill showed up with a teammate in tow for the first time. Ginsberg had partnered with the Natural Language Processing Group at UC Berkeley to improve the algorithm. The UC Berkeley scientists noted that Dr. Fill was already excellent at searching through its memory of crossword clues and placing answers within the grid. According to Nicholas Tomlin, a Department of Electrical Engineering and Computer Science graduate student on the team, Dr. Fill could even handle some types of wordplay, such as “mixing up letters within words, reversed words, or even answers that jump over black squares in the grid.” But if the computer was presented with a clue that didn’t use any of the same words as the clues it had already seen, it was stumped.
To help Dr. Fill, the Berkeley team turned to a different subfield of artificial intelligence. They developed a neural network—a complex model that mapped millions of crossword clues and answers by how close together they were in meaning. With this model, Tomlin said, “If two clues have very similar meanings but don’t use any of the same words, the algorithm can pick up on that and predict that they’re likely to have similar answers.”
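To make the idea of "closeness in meaning" concrete, here is a minimal sketch that compares clues by the angle between vectors rather than by shared words. It is not the Berkeley team's model: the three-number vectors below are hand-written stand-ins for what a trained neural network would produce, and `CLUE_VECTORS`, `ANSWERS`, `cosine`, and `nearest_answer` are hypothetical names.

```python
import math

# Hypothetical "meaning" vectors. A real system would compute these with a
# trained neural network; these numbers are invented for illustration.
CLUE_VECTORS = {
    "Feline pet": (0.9, 0.1, 0.0),
    "Canine pet": (0.1, 0.9, 0.0),
    "Red planet": (0.0, 0.0, 1.0),
}
ANSWERS = {"Feline pet": "CAT", "Canine pet": "DOG", "Red planet": "MARS"}


def cosine(u, v):
    """Cosine similarity: near 1.0 for vectors pointing the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_answer(query_vector):
    """Return the answer of the known clue closest in meaning to the query."""
    best_clue = max(
        CLUE_VECTORS, key=lambda clue: cosine(query_vector, CLUE_VECTORS[clue])
    )
    return ANSWERS[best_clue]


# Assumed vector for the unseen clue "Household tabby", which shares no
# words with "Feline pet" but is placed close to it in this toy space.
household_tabby = (0.8, 0.2, 0.1)
print(nearest_answer(household_tabby))
```

In this toy space the unseen clue lands nearest to "Feline pet," so the sketch suggests "CAT" even though the two clues have no words in common.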
In essence, Dr. Fill had been trying to solve crosswords like a computer: assigning probabilities to millions of possible answers in the blink of an eye, without quite understanding the meanings of the clues beyond their individual words. The Natural Language Processing Group trained Dr. Fill to think more like a human, a task that requires an understanding of the emergent meaning of human language beyond the definitions of individual words. Dr. Fill’s new design finally catapulted the computer into the ranks of crossword-solving elites. In a nerve-wracking end to the 2021 tournament, the algorithm racked up a score of 12,825, edging out the top human competitor by 15 points.
What’s next for Dr. Fill? After the victory, Ginsberg announced that he would be retiring Dr. Fill from crossword competitions. But training a computer to meaningfully understand human language has implications far beyond puzzle solving. Language interpretation algorithms are already transforming many other aspects of our society, from translation and speech-to-text services to digital assistants like Alexa and Siri. Tomlin notes that computers can still struggle to understand meaning in the same way as humans, describing them as “increasingly convincing, but not always increasingly competent” over the years. He says that the field’s current models “do a great job of producing fluent English, but it’s not always meaningful. We trust artificial intelligence more than we used to, but we probably shouldn’t.” Games themselves show where that gap remains. In games like Two Truths and a Lie, players often benefit from personal knowledge of one another. In other games, such as Monopoly, they must negotiate with their competitors. These are tasks that continue to be difficult for computers. So don’t worry about robots beating you at all your favorite games just yet. They still have a lot to learn!
Reena Debray is a graduate student in integrative biology.
Design by Shannon O'Brien
This article is part of the Spring 2022 issue.