The popular image of robots is that of slaves: creations that can be forced to do what humans want.

Researchers at the Massachusetts Institute of Technology have asked an interesting question about robots that is less about slavery and more about cooperation. They observed that language is a function of human cooperation, and investigated how robots could use language when working with humans to achieve some result.

The word “team” is used prominently at the top of the paper, “Decision-Making for Bidirectional Communication in Sequential Human-Robot Collaborative Tasks,” written by Vaibhav V. Unhelkar, Shen Li and Julie A. Shah of MIT’s Computer Science and Artificial Intelligence Laboratory and published on MIT’s website on March 31.

The use of the word “team” is significant, given the structure of the experiment the scientists designed.

Generally speaking, the work shows that it is possible to use language to help train a robotic arm for a task, such as helping to prepare meals in a kitchen. The approach relies on what is called “reinforcement learning” (RL), which has surged in popularity in recent years. Google’s DeepMind unit used RL to train computer programs that defeated humans at chess and Go.

The CommPlan program uses a reinforcement learning approach called a “Markov decision process” to decide when to utter statements, such as announcing a planned action, to inform a human partner and avoid conflicts. The experiment is a kitchen team in which a human makes sandwiches and a robot pours cups of juice, and both parties need to share the workspace as efficiently as possible.

Unhelkar et al. 2020

What is different here is the addition of verbal options for the robot (verbal in the sense that the computer that controls the robot arm speaks the statements through a speaker using a commercial text-to-speech program). The robot program can query a human or make requests of a human. The experimental setup designed by Unhelkar and his colleagues is a kitchen where a person makes sandwiches while a robot arm pours juice into cups. The robot and the human have to share the space, which means they have to negotiate what each one will do at each moment so that they don’t collide.

As a team, they receive a reward whose value depends on how efficiently they perform the task. Put another way, from the point of view of robot programming, the robot must seek the maximum effectiveness of its actions in conjunction with what the human decides to do; it must take the human’s intentions into account. In other words, the machine is programmed to cooperate.

To the best of their knowledge, write Unhelkar and his co-authors, theirs is the first decision-making approach to cover various types of communication while anticipating the teammate’s behavior and latent state.

The scientific question that Unhelkar and his colleagues posed is whether a robot’s ability to cooperate with a person and optimize a task as part of a team is enhanced through verbal communication.

This simple question marks an important turn in robotics. Previous research has explored robot-human communication, but generally not in the context of a shared task.

The algorithm developed by Unhelkar and colleagues, called “CommPlan”, uses machine learning to develop not only the machine’s actions, as DeepMind did for its game-playing programs, but also its communications.

“CommPlan jointly reasons about robot actions and communication to reach its policy.”

In their experiments, Unhelkar and his colleagues used a “UR10 Collaborative Robot,” a robotic arm that can pivot, rotate, and bend in various ways. It is made by Universal Robots, based in Odense, Denmark. Attached to it is a three-finger gripper made by the startup Robotiq, based in Lévis, Quebec, Canada. The authors compared CommPlan’s performance to “baseline” approaches in which no statements pass between robot and person, or in which statements follow a rigid, hand-crafted policy.

In contrast, CommPlan solves, in real time, the equations for which statements to use, which one could call “learning” to communicate. The hypothesis was that the learned approach would do better than a rigid protocol or silence.

Indeed it did. They report that CommPlan outperformed both baselines. It earned “higher cumulative rewards and shorter task completion times compared to silence policy.” Compared to the rigidly programmed approach, “Despite only communicating more (on average) than hand-crafted policy, CommPlan receives substantially higher rewards.”

A video of the experiment is posted on the MIT website.

No matter how inspiring a robot-human “team” is, the work raises another question: Who calls the shots? In any collaboration, including collaborations among humans, there can sometimes be one party that tells the others what to do, taking the lead and acting as a kind of boss, even in seemingly collective, egalitarian work.

If the robot is not going to be a slave to human beings, the opposite is also true: it is probably not desirable to build robots that enslave people. Therefore, it is important to look at where dominance and submission emerge.

Unhelkar and his colleagues address this by assigning the robot’s communications with humans a cost that affects the ultimate reward and can be learned. This gives the programmer an indirect way of influencing how the robot acts in relation to human choices.

In an email exchange, ZDNet asked Unhelkar what happens if a human decides not to follow the robot’s requests. Unhelkar told ZDNet that by adjusting the “cost function”, the variables that affect the team’s final reward, the robot can be made to modify its communications to be more deferential to a human.

“If we want a more ‘polite’ robot, in the cost model we can say that a request is less expensive than a command,” Unhelkar wrote. “Our model also captures that the human can follow a command differently during different steps of the task. For example, a human may not heed the robot’s suggestion if they have committed to a decision, but may be open to suggestions while still deciding.”

“If the human being declines the request, the robot reframes and adapts to the human behavior.”
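The effect of pricing a request below a command can be illustrated with a minimal sketch. It is not CommPlan's actual cost model: the `Utterance` class, the yield probabilities, and the reward values below are all hypothetical numbers chosen for illustration. The robot simply picks the utterance with the highest expected team reward after subtracting the communication cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    name: str
    cost: float     # hypothetical penalty subtracted from the team reward
    p_yield: float  # assumed probability that the human yields the shared space


def expected_reward(u, reward_clear=10.0, reward_conflict=2.0):
    """Expected team reward for one utterance, net of its communication cost."""
    return u.p_yield * reward_clear + (1.0 - u.p_yield) * reward_conflict - u.cost


def make_options(request_cost, command_cost):
    """Build the robot's three choices; all probabilities are illustrative."""
    return [
        Utterance("silent", 0.0, 0.3),
        Utterance("request", request_cost, 0.7),
        Utterance("command", command_cost, 0.9),
    ]


def best_utterance(options):
    """Pick the utterance that maximizes expected reward minus cost."""
    return max(options, key=expected_reward)


# A "polite" robot: commands are priced well above requests.
polite = best_utterance(make_options(request_cost=0.5, command_cost=3.0))
# A "bossy" robot: commands cost no more than requests.
bossy = best_utterance(make_options(request_cost=0.5, command_cost=0.5))
print(polite.name, bossy.name)
```

Under these assumed numbers, raising the price of a command is enough to make the robot ask rather than order, and making every utterance expensive pushes it toward silence.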

One can take this line of inquiry a step further: might the best interest of the human nevertheless be trampled by a machine that is trying to optimize its activity to obtain some reward?

This question has been posed elegantly by Stuart Russell of the Center for Human-Compatible AI at UC Berkeley. Russell argues that the goals of artificial intelligence must be ones that respect the primacy of human life. According to him, this means understanding what a human may want but does not express, which returns us to the question of communication.

Russell has suggested altering the typical objective specification for an intelligent machine. Instead of saying, “Machines are intelligent to the extent that their actions can be expected to achieve their objectives,” he proposes, “Machines are beneficial to the extent that their actions can be expected to achieve our objectives,” where the emphasis is Russell’s.

Asked about Russell’s view, Unhelkar told ZDNet that he shares Russell’s concern and said that specifying the correct objective was “both difficult and critical for the design of AI systems”.

The challenge is that, in some sense, a machine system must infer what a human wants. The CommPlan program is only partially capable of doing this. It infers a human’s “latent state of decision-making” by asking questions and observing the person’s answers. But more work is needed to figure out what a person’s intentions are and how they are communicated, Unhelkar told ZDNet.
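Inferring a hidden state from a person's answers can be sketched as a simple Bayesian belief update. This is only an illustration of the general idea, not the model used in CommPlan: the two latent states (“deciding” and “committed”), the responses, and the likelihood values are assumptions made up for the example.

```python
# Hypothetical likelihoods: P(response | latent state of the human).
# A person who is still deciding is assumed more likely to accept a suggestion.
LIKELIHOOD = {
    "deciding": {"accepts": 0.8, "declines": 0.2},
    "committed": {"accepts": 0.2, "declines": 0.8},
}


def update_belief(belief, response, likelihood=LIKELIHOOD):
    """Apply Bayes' rule: posterior is proportional to prior times likelihood."""
    unnormalized = {
        state: prior * likelihood[state][response]
        for state, prior in belief.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("response is impossible under every latent state")
    return {state: weight / total for state, weight in unnormalized.items()}


# Start undecided about the human's state, then observe two refusals in a row.
belief = {"deciding": 0.5, "committed": 0.5}
after_one = update_belief(belief, "declines")
after_two = update_belief(after_one, "declines")
print(after_one)
print(after_two)
```

Each refusal shifts the robot's belief toward the person having already committed to a decision, which is the kind of evidence a planner could use before deciding whether another suggestion is worth its cost.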

“The CommPlan learning component can be expanded to learn human latent preferences for communications,” Unhelkar wrote in an email.

“In the future, we intend to explore this extension,” added Unhelkar. He noted that the main challenge in inferring people’s intentions from their communications is that statements in a task setting are often sparse. This means that it is difficult to gather enough examples of human statements to create a data set from which a program can learn.

“I assumed that this adjustment would be best suited to scenarios where the robot is interacting and learning over a longer period of time (that is, long-term interactions),” said Unhelkar, as that would allow enough data on human statements to be collected for learning.

This poses another interesting challenge: how to safely train robots when they perform tasks around humans. Robots learn by trial and error, and no one wants those errors to be dangerous.

Generally, “trial-and-error errors must be thoroughly understood and secure before letting the robot train with humans,” Unhelkar told ZDNet. There is also the fact that substantial training with a person takes up that person’s time, which was not the case with DeepMind’s chess program, which learned through self-play.

To speed things up, for the moment, CommPlan is not pure machine learning. Only part of the program is “learned”.

CommPlan is an example of what is called a “Markov decision process”, in which the state of affairs and the possible actions are evaluated at every step of the task, to calculate which actions lead to the states that maximize future returns. (This is similar to a method DeepMind used, “Monte Carlo tree search”.)
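To make the idea of a Markov decision process concrete, here is a minimal sketch that solves a toy one with value iteration. It is not the CommPlan model: the states, actions, transition probabilities, and rewards are invented for illustration, and value iteration is just one standard way of solving such a process.

```python
# Toy MDP, invented for illustration.
# TRANSITIONS[state][action] = list of (probability, next_state, reward).
TRANSITIONS = {
    "far": {
        "wait": [(1.0, "far", -1.0)],
        "move": [(0.8, "near", -1.0), (0.2, "far", -1.0)],
    },
    "near": {
        "wait": [(1.0, "near", -1.0)],
        "pour": [(0.9, "done", 10.0), (0.1, "near", -1.0)],
    },
    "done": {},  # terminal state: no actions available
}


def action_value(outcomes, values, gamma):
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)


def value_iteration(transitions, gamma=0.95, tol=1e-8):
    """Repeat Bellman updates until the state values stop changing."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:
                continue  # terminal states keep a value of zero
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma=0.95):
    """In each non-terminal state, choose the action with the highest value."""
    return {
        state: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for state, actions in transitions.items()
        if actions
    }


values = value_iteration(TRANSITIONS)
policy = greedy_policy(TRANSITIONS, values)
print(policy)
```

The resulting policy maps each state to the action expected to yield the most future reward, which is the sense in which actions are “evaluated at every step of the task”.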

Only some of the parameters of the Markov process are learned from data; others are programmed by the developer “by hand”. Replacing these hand-coded parameters with learned ones is a complex task that will take time, Unhelkar told ZDNet. Unhelkar proposes to leverage “the domain expertise when available, as it speeds up learning and gives us a better understanding of why a robot / agent makes a certain decision.”

There is also “enormous potential for work in designing algorithms that can digest human domain expertise more seamlessly (for example, by learning from high-level instructions, instead of from low-bandwidth labels in the sense of supervised learning),” Unhelkar told ZDNet. He cited the example of work by MIT colleagues in which robots learn from task descriptions.

It is too early to talk about robots as master or slave. Today’s robots are automated mechanical structures with limited degrees of freedom, capable of only the simplest of repetitive routines. But as a society, we’re obsessed with how to communicate with a really sophisticated robot.

The CommPlan work suggests that we may need to think in terms of teamwork and collaboration to prepare for that day.