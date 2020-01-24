Google’s DeepMind works to improve AI

Machine-learning forms of artificial intelligence are increasingly running up against the limits of computer hardware, and that is causing scientists to rethink how they design neural networks.

That much was clear in last week’s research offering from Google, called Reformer, which aimed to fit a natural-language program onto a single graphics processing chip instead of eight.

And this week brought another offering from Google focused on efficiency, something called Sideways. With this invention, the scientists have borrowed a page from computer architecture, creating a pipeline that gets more work done at every moment.

What is Sideways? During their training phase, most machine-learning neural nets use a forward pass, a transmission of a signal through the layers of the network, followed by back-propagation, a backward pass through the same layers in reverse order, to gradually adjust the weights of the neural network until they are just right. Sideways is an alternative to that traditional rule for performing the forward and backward passes.
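The forward-then-backward routine described above can be sketched on a toy network. This is a minimal illustration in plain Python, not code from the paper: the two-weight network, the squared-error loss, the learning rate and all function names are assumptions chosen to keep the example small.

```python
# Toy network (illustrative only): y_hat = w2 * (w1 * x), loss = 0.5 * (y_hat - y) ** 2

def forward(x, w1, w2):
    """Forward pass: push the input through both layers, keeping the hidden activation."""
    h = w1 * x        # activation of layer 1, stored for the backward pass
    y_hat = w2 * h    # output of layer 2
    return h, y_hat

def backward(x, h, y_hat, y, w2):
    """Backward pass: visit the layers in reverse order to get the weight gradients."""
    d_out = y_hat - y      # gradient of the loss at the output
    grad_w2 = d_out * h    # layer 2 needs the stored activation h
    d_h = d_out * w2       # error signal handed back to layer 1
    grad_w1 = d_h * x
    return grad_w1, grad_w2

def train(x, y, w1=0.5, w2=0.5, lr=0.1, steps=200):
    """Repeat forward pass, backward pass and weight update until the fit is close."""
    for _ in range(steps):
        h, y_hat = forward(x, w1, w2)
        grad_w1, grad_w2 = backward(x, h, y_hat, y, w2)
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return w1, w2

w1, w2 = train(x=1.0, y=2.0)
print("prediction after training:", w2 * w1 * 1.0)
```

Note that the backward pass cannot start until the forward pass has finished, and that it reuses the activation `h` saved on the way forward. That dependency is what Sideways sets out to relax.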

The authors of Sideways, Mateusz Malinowski, Grzegorz Świrszcz, João Carreira and Viorica Pătrăucean, all with Google’s DeepMind unit, noted that a deep-learning neural network does less than it could at any given moment during the forward pass and the backward pass.

The neural net works through samples of data in batches, one batch at a time, with all other batches of data waiting until both the forward and backward passes have been computed. The reason is that the activations produced in each layer of the network during the forward pass must be retained by the computer so that it can use them when it comes to computing back-propagation.

Think of it as an assembly line where only one car is built at a time, so that the people at the front of the line have to wait until the people at the back have done their part.

It should be possible, they reasoned, to fill those idle moments with other batches of signals to get more work done. So they built what is called a pipeline. In computer chips, pipelining has been a way to get more work done for decades. Intel famously developed very deep pipelines to pass many instructions through its microprocessors instead of just one at a time. By splitting operations into more and more stages, more pieces of code can be in flight in the processor at any time.
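The arithmetic behind pipelining can be sketched in a few lines. This is a simplified model under stated assumptions, not a description of any real processor: every stage is assumed to take exactly one clock tick, there are no stalls, and the function names are made up for illustration.

```python
def sequential_ticks(num_items, num_stages):
    """One item owns the whole line: the next item starts only after the last stage is done."""
    return num_items * num_stages

def pipelined_ticks(num_items, num_stages):
    """A new item enters every tick: the first takes num_stages ticks, each later one finishes a tick after."""
    if num_items == 0:
        return 0
    return num_stages + num_items - 1

def utilization(num_items, num_stages, ticks):
    """Fraction of all stage-ticks that were spent doing useful work."""
    busy = num_items * num_stages
    total = num_stages * ticks
    return busy / total

items, stages = 8, 4
for name, ticks in [("sequential", sequential_ticks(items, stages)),
                    ("pipelined", pipelined_ticks(items, stages))]:
    print(name, ticks, "ticks, utilization", round(utilization(items, stages, ticks), 2))
```

In this idealized model, the sequential line keeps only one stage in four busy, while the pipelined line approaches full utilization as the number of items grows.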

An illustration of the typical forward and backward calculations of a neural net, left, and the revised form, “Sideways”, developed by Mateusz Malinowski and colleagues at DeepMind. In the old way, a single batch of data must traverse the entire neural net and back before anything else can be done, like an assembly line that works on one unit at a time. With Sideways, new batches of data enter the pipeline at every step, so that the different stages of the network stay full at all times.

Malinowski et al.

With Sideways, each layer of the neural net takes in a new batch of data at every step while the previous batch moves on to the next layer. No part of the network sits idle. They call the result a pseudo-gradient because it is assembled differently from what usually happens with gradient descent, the general term for that forward-and-backward-pass combination.
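To picture how the layers stay busy, the sketch below lists which batch sits in which layer at each step. It is a simplified illustration, not the schedule from the paper: it models only the forward direction, assumes each layer takes one step per batch, and the function names are hypothetical.

```python
def occupancy(num_layers, num_batches, pipelined):
    """Return one row per step; row[i] is the batch index in layer i, or None if the layer is idle."""
    rows = []
    if pipelined:
        # A new batch enters layer 0 at every step and earlier batches shift one layer along.
        for step in range(num_layers + num_batches - 1):
            row = []
            for layer in range(num_layers):
                batch = step - layer
                row.append(batch if 0 <= batch < num_batches else None)
            rows.append(row)
    else:
        # One batch travels through all layers before the next batch may enter.
        for batch in range(num_batches):
            for layer in range(num_layers):
                row = [None] * num_layers
                row[layer] = batch
                rows.append(row)
    return rows

def idle_slots(rows):
    """Count the layer-steps in which a layer had nothing to do."""
    return sum(cell is None for row in rows for cell in row)

for mode in (False, True):
    rows = occupancy(num_layers=3, num_batches=4, pipelined=mode)
    print("pipelined" if mode else "one at a time", "idle slots:", idle_slots(rows))
    for row in rows:
        print("  ", row)
```

In the pipelined table, idle slots appear only while the pipeline fills and drains. In between, every layer holds a different batch at the same moment.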

To do that, something of course has to give, because those stored activations will be overwritten, which is exactly what was supposed to be avoided in the first place. The key insight at the core of the paper is that with some types of data the activations can afford to be overwritten without causing a problem. If the batches of data are close together in some respect, it is fine to overwrite the activations of one batch with those of another, because you don’t really change much.
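Why overwriting can be harmless is easy to check on a toy example. The sketch below is an illustration under loud assumptions, not the update rule from the paper: it uses a made-up two-weight network with a tanh layer, combines the error signal from an older input with the activation of a newer input, and compares that mixed "pseudo-gradient" with the exact one. All names and numbers are hypothetical.

```python
import math

def exact_gradient(w1, w2, x, target):
    """Ordinary gradient of 0.5 * (w2 * tanh(w1 * x) - target) ** 2 with respect to w1."""
    h = math.tanh(w1 * x)
    d_out = w2 * h - target
    return d_out * w2 * (1.0 - h * h) * x

def pseudo_gradient(w1, w2, x_old, x_new, target):
    """Mixed gradient: error signal from the old input, layer-1 activation from the new input."""
    h_old = math.tanh(w1 * x_old)
    d_out = w2 * h_old - target      # the error signal still belongs to the old input
    h_new = math.tanh(w1 * x_new)    # the stored activation was overwritten by the new input
    return d_out * w2 * (1.0 - h_new * h_new) * x_new

def relative_error(w1, w2, x_old, x_new, target):
    """How far the mixed gradient is from the exact gradient of the old input."""
    exact = exact_gradient(w1, w2, x_old, target)
    pseudo = pseudo_gradient(w1, w2, x_old, x_new, target)
    return abs(pseudo - exact) / abs(exact)

# Consecutive inputs that barely differ, as in neighbouring video frames.
smooth_err = relative_error(0.8, 1.5, x_old=0.5, x_new=0.51, target=0.3)
# Consecutive inputs that differ a lot, as in unrelated samples.
jumpy_err = relative_error(0.8, 1.5, x_old=0.5, x_new=2.5, target=0.3)
print("relative error, similar inputs:  ", smooth_err)
print("relative error, dissimilar inputs:", jumpy_err)
```

When the old and new inputs are nearly the same, the mixed gradient stays close to the exact one. When they differ sharply, the two drift apart, which is why the approach depends on the data changing slowly.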

Sideways applied to an encoder-decoder task, a slightly more complex setup. Again, multiple signals enter the network one after another before a signal is calculated, keeping the computing resources fully occupied.

Malinowski et al.

And that’s what they found in video. Video, in the form of frames of image data, contains a lot of redundancy from one frame to the next, because most of the scene usually does not change.

As Malinowski and colleagues put it: “The flexibility of the input space is the most important underlying assumption behind the Sideways algorithm.”

The authors found that when they tried to classify actions in video, a classic task for machine learning, using a typical convolutional neural network, their test results were just as good as with traditional back-propagation. Nothing was undone by overwriting those activations. “Sideways training achieves competitive accuracy to BP [back-propagation],” they write.

In some cases, accuracy was even better with Sideways, which they attribute to the possibility that overwriting activations serves the additional purpose of acting as a form of regularization, which can be desirable.

They also tried an encoder-decoder task, in which the neural network has to faithfully reconstruct video frames. Here they actually found much better results than with traditional back-propagation. They write that back-propagation usually has to drop video frames to keep pace with the computation, while the Sideways pipeline is efficient enough to process every frame.

The big payoff of all this is a large speedup in computation. They train the convolutional neural network in the classifier case five times faster than with back-propagation.

They also suggest that computer memory can be used more efficiently: “By placing different Sideways modules in different GPUs, the memory requirements for training large neural networks will also be considerably reduced.”

That last point raises an interesting question that is not addressed in the work. Numerous companies, such as Cerebras Systems, are producing new types of chips aimed at improving the performance of deep learning. Where is the dividing line between pipeline approaches of the kind that Sideways represents and what these chip makers do? Will neural network designers and chip makers converge on something totally new through their respective efforts? The future looks very interesting.