2020-01-12

Computers can now drive cars, beat world champions at board games such as chess and Go, and even write prose. The revolution in artificial intelligence stems largely from the power of a certain type of artificial neural network, the design of which is inspired by the connected layers of neurons in the visual cortex of mammals. These ‘convolutional neural networks’ (CNNs) have proven to be surprisingly proficient in learning patterns in two-dimensional data – especially in computer vision tasks such as recognizing handwritten words and objects in digital images.

But when applied to data sets without a built-in flat geometry, for example the models of irregular shapes used in 3D computer animation, or the point clouds generated by self-driving cars to map their environment, this powerful machine learning architecture does not work well. Around 2016, a new discipline called geometric deep learning emerged with the aim of lifting CNNs out of flatland.

Now researchers have provided a new theoretical framework for building neural networks that can learn patterns on any kind of geometric surface. These “gauge-equivariant convolutional neural networks”, or gauge CNNs, developed at the University of Amsterdam and Qualcomm AI Research by Taco Cohen, Maurice Weiler, Berkay Kicanaoglu and Max Welling, can detect patterns not only in 2D arrays of pixels, but also on spheres and asymmetrically curved objects. “This framework is a fairly definitive answer to this deep learning problem on curved surfaces,” Welling said.

Gauge CNNs have already outperformed their predecessors at learning patterns in simulated global climate data, which is naturally mapped onto a sphere. The algorithms may also prove useful for improving the vision of drones and autonomous vehicles that see objects in 3D, and for detecting patterns in data collected from the irregularly curved surfaces of hearts, brains or other organs.

Taco Cohen, a machine learning researcher at Qualcomm and the University of Amsterdam, is one of the lead architects of gauge-equivariant convolutional neural networks. Photo: Ork de Rooij

The researchers’ solution for making deep learning work outside of flatland also has deep connections with physics. Physical theories that describe the world, such as Albert Einstein’s general theory of relativity and the standard model of particle physics, exhibit a property called “gauge equivariance”. This means that quantities in the world and their relationships do not depend on arbitrary frames of reference (or “gauges”); they remain consistent regardless of whether an observer is moving or standing still, and regardless of how far apart the numbers are on a ruler. Measurements made in those different gauges must be convertible into each other in a way that preserves the underlying relationships between things.

For example, imagine that you measure the length of a soccer field in yards and then measure it again in meters. The numbers will change, but in a predictable way. Similarly, two photographers who take a picture of an object from two different vantage points will produce different images, but these images can be related to each other. Gauge equivariance ensures that physicists’ models of reality remain consistent, regardless of their perspective or units of measurement. And gauge CNNs make the same assumption about data.
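The unit-change idea can be sketched in a few lines of Python. This is only an illustration of the analogy, not part of the researchers' method: the field and penalty-area lengths are hypothetical values chosen for the example, and the helper name `yards_to_meters` is invented here. The only fixed fact is that one yard is defined as exactly 0.9144 meters.

```python
# One yard is defined as exactly 0.9144 meters.
YARD_IN_METERS = 0.9144


def yards_to_meters(length_in_yards):
    """Convert a length from yards to meters (hypothetical helper)."""
    return length_in_yards * YARD_IN_METERS


# Hypothetical measurements of the same pitch, taken in yards.
field_yd = 100.0
penalty_area_yd = 18.0

# The same two lengths expressed in the other "gauge" (meters).
field_m = yards_to_meters(field_yd)
penalty_area_m = yards_to_meters(penalty_area_yd)

# The raw numbers differ between the two unit systems...
print(field_yd, field_m)

# ...but the relationship between the two lengths does not.
ratio_yd = field_yd / penalty_area_yd
ratio_m = field_m / penalty_area_m
print(ratio_yd, ratio_m)
```

Every number changes by the same known factor, so anything that depends only on the relationship between the lengths, such as their ratio, comes out the same in both systems.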

“The same idea (from physics) that there is no special orientation – they wanted that in neural networks,” said Kyle Cranmer, a physicist at New York University who applies machine learning to particle physics data. “And they figured out how to do it.”

Escape from Flatland

Michael Bronstein, a computer scientist at Imperial College London, coined the term ‘geometric deep learning’ in 2015 to describe burgeoning efforts to escape flatland and design neural networks that can learn patterns in non-planar data. The term – and the research effort – caught on quickly.

Bronstein and his collaborators knew that going beyond the Euclidean plane would require them to rethink one of the basic computational procedures that made neural networks so effective at 2D image recognition in the first place. With this procedure, called “convolution,” a layer of the neural network performs a mathematical operation on small patches of the input data and then passes the results to the next layer in the network.

“You can roughly consider convolution as a sliding window,” Bronstein explained. A convolutional neural network slides many of these “windows” over the data like filters, each of which is designed to detect a certain type of pattern in the data. In the case of a cat photo, a trained CNN can use filters that detect low-level features in the raw input pixels, such as edges. These features are passed on to other layers in the network, which perform additional convolutions and extract higher-level features, such as eyes, tails or triangular ears. A CNN trained to recognize cats will eventually use the results of these layered convolutions to assign a label – say “cat” or “no cat” – to the entire image.
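The sliding-window picture can be made concrete with a minimal Python sketch. It is an illustration under simplifying assumptions, not how any particular library implements convolution: the function name `slide_filter`, the tiny image and the hand-written edge filter are all invented for this example. A real CNN learns its filter values during training and stacks many such layers.

```python
def slide_filter(image, kernel):
    """Slide `kernel` over `image` and return the map of filter responses.

    Both arguments are lists of rows. At each window position the
    overlapping values are multiplied and summed. (Strictly speaking this
    is cross-correlation, the operation deep learning calls convolution.)
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1

    output = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 3x4 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A hand-made filter that responds where brightness jumps from left to right.
vertical_edge = [[-1, 1]]

response = slide_filter(image, vertical_edge)
for row in response:
    print(row)
```

The response is nonzero only in the column where dark pixels meet bright ones, which is the sense in which a filter "detects" an edge. Later layers would apply further filters to response maps like this one rather than to the raw pixels.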

