Will deep learning really live up to its promise? We don’t actually know. But if it’s going to, it will have to assimilate how classical computer science algorithms work. This is what DeepMind is working on, and its success is important to the eventual uptake of neural networks in wider commercial applications.

Founded in 2010 with the goal of creating artificial general intelligence (AGI), a general-purpose AI that truly mimics human intelligence, DeepMind is at the forefront of AI research. Its early backers included industry heavyweights such as Elon Musk and Peter Thiel.

Acquired by Google in 2014, DeepMind has made headlines for projects such as AlphaGo, a program that beat the world champion at the game of Go in a five-game match, and AlphaFold, which found a solution to a 50-year-old grand challenge in biology.

Now DeepMind has set its sights on another grand challenge: bridging the worlds of deep learning and classical computer science to enable deep learning to do everything. If successful, this approach could revolutionize AI and software as we know them.

Petar Veličković is a senior research scientist at DeepMind. His entry into computer science came through algorithmic reasoning and algorithmic thinking using classical algorithms. Since he started doing deep learning research, he has wanted to reconcile deep learning with the classical algorithms that initially got him excited about computer science.

Meanwhile, Charles Blundell is a research lead at DeepMind who is interested in getting neural networks to make much better use of the huge quantities of data they’re exposed to. Examples include getting a network to tell us what it doesn’t know, to learn much more quickly, or to exceed expectations.

When Veličković met Blundell at DeepMind, something new was born: a line of research that goes by the name of Neural Algorithmic Reasoning (NAR), after a position paper the duo recently published.

NAR traces the roots of the fields it touches upon and branches out to collaborations with other researchers. And unlike much pie-in-the-sky research, NAR has some early results and applications to show for itself.

## Algorithms and deep learning: the best of both worlds

Veličković was in many ways the person who kickstarted the algorithmic reasoning direction at DeepMind. With his background in both classical algorithms and deep learning, he realized that the two are strongly complementary. What one of these methods tends to do really well, the other doesn’t do that well, and vice versa.

“Usually when you see these kinds of patterns, it’s a good indicator that if you can do anything to bring them a little bit closer together, then you could end up with an awesome way to fuse the best of both worlds, and make some really strong advances,” Veličković said.

When Veličković joined DeepMind, Blundell said, their early conversations were a lot of fun because they have very similar backgrounds: both trained in theoretical computer science. Today, both work a lot with machine learning, where a long-standing fundamental question has been how to generalize: how do you work beyond the data examples you’ve seen?

Algorithms are a really good example of something we all use every day, Blundell noted. In fact, he added, there aren’t many algorithms out there. If you look at standard computer science textbooks, there are maybe 50 or 60 algorithms that you learn as an undergraduate. And everything people use to connect over the internet, for example, uses just a subset of those.

“There’s this very nice basis for very rich computation that we already know about, but it’s completely different from the things we’re learning. So when Petar and I started talking about this, we saw clearly there’s a nice fusion that we can make here between these two fields that has actually been unexplored so far,” Blundell said.

The key thesis of NAR research is that algorithms possess fundamentally different qualities from deep learning methods. This suggests that if deep learning methods were better able to mimic algorithms, then generalization of the sort seen with algorithms would become possible with deep learning.

To approach the topic for this article, we asked Blundell and Veličković to lay out the defining properties of classical computer science algorithms compared to deep learning models. Figuring out the ways in which algorithms and deep learning models are different is a good start if the goal is to reconcile them.

## Deep learning can’t generalize

For starters, Blundell said, algorithms in most cases don’t change. Algorithms consist of a fixed set of rules that are executed on some input, and good algorithms usually have well-known properties. For any kind of input the algorithm gets, it gives a sensible output in a reasonable amount of time. You can usually change the size of the input and the algorithm keeps working.

The other thing you can do with algorithms is plug them together. Algorithms can be strung together because of this guarantee they have: Given some kind of input, they only produce a certain kind of output. That means we can connect algorithms, feeding the output of one into the input of another and building a whole stack.
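As a minimal sketch of that kind of composition, the Python below chains a sort into a binary search. The function names are hypothetical and chosen only for illustration; the point is that the guarantee on the first function's output is exactly the precondition the second one relies on.

```python
from bisect import bisect_left


def sort_ascending(values):
    """Postcondition: the returned list is in non-decreasing order."""
    return sorted(values)


def contains(sorted_values, target):
    """Precondition: sorted_values is in non-decreasing order."""
    i = bisect_left(sorted_values, target)
    return i < len(sorted_values) and sorted_values[i] == target


def member(values, target):
    """Compose the two: the sort's guarantee makes the search valid."""
    return contains(sort_ascending(values), target)


found = member([7, 3, 9, 1], 9)
missing = member([7, 3, 9, 1], 4)
```

Calling `contains` directly on the unsorted list would give no such assurance, which is the sense in which output guarantees are what make stacking possible.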

People have been looking at running algorithms in deep learning for a while, and it’s always been quite difficult, Blundell said. As trying out simple tasks is a good way to debug things, Blundell referred to a trivial example: the input copy task, in which the algorithm’s output is just a copy of its input.

It turns out that this is harder than expected for deep learning. You can learn to do this up to a certain length, but if you increase the length of the input past that point, things start breaking down. If you train a network on the numbers 1-10 and test it on the numbers 1-1,000, many networks will not generalize.

Blundell explained, “They won’t have learned the core idea, which is you just need to copy the input to the output. And as you make the process more complicated, as you can imagine, it gets worse. So if you think about sorting through various graph algorithms, actually the generalization is far worse if you just train a network to simulate an algorithm in a very naive fashion.”
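The copy task described above can be set up in a few lines. The sketch below is an illustration under assumptions, not DeepMind's benchmark: it reads "1-10" and "1-1,000" as sequence lengths, and the names `make_copy_examples`, `copy_algorithm` and `accuracy` are hypothetical. It builds a short-sequence training set and a long-sequence test set, and scores any candidate model by exact match.

```python
import random


def make_copy_examples(num_examples, max_len, vocab_size=10, seed=0):
    """Generate (input, target) pairs where the target equals the input."""
    rng = random.Random(seed)
    examples = []
    for _ in range(num_examples):
        length = rng.randint(1, max_len)
        seq = [rng.randrange(vocab_size) for _ in range(length)]
        examples.append((seq, list(seq)))
    return examples


def copy_algorithm(seq):
    """The classical solution: independent of input length."""
    return list(seq)


def accuracy(model, examples):
    """Fraction of examples the model reproduces exactly."""
    correct = sum(model(x) == y for x, y in examples)
    return correct / len(examples)


# Train on short inputs, evaluate on much longer ones.
train = make_copy_examples(1000, max_len=10, seed=0)
test = make_copy_examples(100, max_len=1000, seed=1)

reference_score = accuracy(copy_algorithm, test)
```

The classical algorithm scores perfectly on the long test set by construction. A learned model would be passed to `accuracy` in place of `copy_algorithm`, and the gap between its training and test scores measures how well it generalizes past the lengths it saw.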

Fortunately, it’s not all bad news.

“[T]here’s something very nice about algorithms, which is that they’re basically simulations. You can generate a lot of data, and that makes them very amenable to being learned by deep neural networks,” he said. “But it requires us to think from the deep learning side. What changes do we need to make there so that these algorithms can be well represented and actually learned in a robust fashion?”

Of course, answering that question is far from simple.

“When using deep learning, usually there isn’t a very strong guarantee on what the output is going to be. So you might say that the output is a number between zero and one, and you can guarantee that, but you couldn’t guarantee something more structural,” Blundell explained. “For example, you can’t guarantee that if you show a neural network a picture of a cat and then you take a different picture of a cat, it will definitely be classified as a cat.”

With algorithms, you could develop guarantees that this wouldn’t happen. This is partly because the kind of problems algorithms are applied to are more amenable to these kinds of guarantees. So if a problem is amenable to these guarantees, then maybe we can bring classical algorithmic tasks that allow such guarantees across into deep neural networks.

Those guarantees usually concern generalization: across input sizes, across the kinds of inputs you have, and across types. For example, if you have a sorting algorithm, you can sort a list of numbers, but you could also sort anything you can define an ordering for, such as letters and words. However, that’s not the kind of thing we see at the moment with deep neural networks.
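To make the sorting example concrete, here is a small merge sort sketched in Python. It is an illustration rather than anything from the NAR work: the only thing it asks of its elements is a `<` comparison, so the same code handles numbers, letters, words, and inputs of very different sizes.

```python
def merge_sort(items):
    """Sort any iterable whose elements support the < comparison."""
    items = list(items)
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    # Repeatedly take the smaller head element of the two sorted halves.
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


numbers = merge_sort([3, 1, 2])
letters = merge_sort("deepmind")
words = merge_sort(["sort", "copy", "flow"])
long_input = merge_sort(range(1000, 0, -1))
```

Nothing in the function mentions a type or a maximum length, which is the kind of generalization the guarantees above refer to.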

## Algorithms can lead to suboptimal solutions

Another difference, which Veličković noted, is that algorithmic computation can usually be expressed as pseudocode that explains how you go from your inputs to your outputs. This makes algorithms trivially interpretable. And because they operate over abstracted inputs that conform to some preconditions and postconditions, it’s much easier to reason theoretically about them.

That also makes it much easier to find connections between different problems that you might not see otherwise, Veličković added. He cited the example of MaxFlow and MinCut as two problems that are seemingly quite different, but where the solution to one is necessarily the solution to the other. That’s not obvious unless you study it through a very abstract lens.
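The MaxFlow and MinCut connection can be checked on a toy graph. The sketch below uses the Edmonds-Karp method (breadth-first augmenting paths), a standard textbook choice and not necessarily the formulation Veličković had in mind; the graph, node names and function name are made up for illustration, and the source and sink are assumed to be distinct. After computing the maximum flow, it reads a minimum cut off the final residual graph, and the two values coincide.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Return (max flow value, edges of a minimum cut).

    capacity maps directed edges (u, v) to non-negative capacities.
    """
    residual = {}
    neighbors = {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)  # reverse edge for undoing flow
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    def bfs_parents():
        """Breadth-first search over edges with spare residual capacity."""
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbors.get(u, ()):
                if v not in parents and residual[(u, v)] > 0:
                    parents[v] = u
                    queue.append(v)
        return parents

    flow = 0
    while True:
        parents = bfs_parents()
        if sink not in parents:
            break  # no augmenting path left: the flow is maximal
        # Walk back from the sink to find the path's bottleneck.
        bottleneck = float("inf")
        v = sink
        while parents[v] is not None:
            bottleneck = min(bottleneck, residual[(parents[v], v)])
            v = parents[v]
        # Push the bottleneck amount along the path.
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck

    # Nodes still reachable from the source form one side of a minimum cut.
    reachable = set(parents)
    cut_edges = [(u, v) for (u, v) in capacity
                 if u in reachable and v not in reachable]
    return flow, cut_edges


# Hypothetical toy network from source "s" to sink "t".
capacity = {
    ("s", "a"): 4, ("s", "b"): 3, ("a", "b"): 1,
    ("a", "c"): 2, ("b", "c"): 2, ("c", "t"): 5,
}
flow, cut_edges = max_flow_min_cut(capacity, "s", "t")
cut_capacity = sum(capacity[e] for e in cut_edges)
```

On this graph the flow is limited by the two edges into `c`, and those same two edges make up the cut: solving one problem hands you the answer to the other.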

“There’s a lot of benefits to this kind of elegance and constraints, but it’s also the potential shortcoming of algorithms,” Veličk