2020-02-08

With the coronavirus growing more deadly in China, artificial intelligence researchers are applying machine learning techniques to social media, web and other data, looking for subtle signs that the disease is spreading elsewhere.

The new virus emerged in December in Wuhan, China, and has triggered a global health emergency. It remains uncertain how deadly or contagious the virus is and how widely it has already spread. Infections and deaths continue to rise. More than 31,000 people have now contracted the disease in China, and 630 people have died, according to figures released by the authorities on Friday.

John Brownstein, chief innovation officer at Harvard Medical School and an expert on using social media information to track health trends, is part of an international team that uses machine learning to comb through social media posts, news reports, data from official public health channels and information provided by doctors for warning signs that the virus is taking hold in countries outside China.

The program looks for social media posts that mention specific symptoms, such as breathing problems and fever, from a geographic area where doctors have reported potential cases. Natural language processing is used to analyze the text of those posts, for example to distinguish between someone discussing the news and someone complaining about how they feel. A company called BlueDot used a similar approach, minus the social media, to spot the coronavirus at the end of December, before the Chinese authorities acknowledged the emergency.
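The filtering described above can be illustrated with a toy example. The sketch below is not the team's system: it swaps the natural language processing models for a simple keyword heuristic, and every name in it (`Post`, `flag_posts`, the word lists, the sample region and the invented posts) is a hypothetical chosen only for illustration. It keeps a post only if it comes from a watched region, mentions a symptom, and reads like a first-person complaint rather than news commentary.

```python
import re
from dataclasses import dataclass

# Hypothetical vocabularies for illustration; a real system would learn
# these signals from labeled data rather than rely on fixed word lists.
SYMPTOM_TERMS = {"fever", "cough", "short of breath", "trouble breathing"}
NEWS_CUES = {"reports", "according to", "officials", "breaking", "news"}
FIRST_PERSON = re.compile(r"\b(i|my|me)\b", re.IGNORECASE)


@dataclass
class Post:
    text: str
    region: str  # where the post was written, assumed to be known


def mentions_symptom(text: str) -> bool:
    """True if any symptom term appears in the text."""
    lowered = text.lower()
    return any(term in lowered for term in SYMPTOM_TERMS)


def looks_personal(text: str) -> bool:
    """Crude stand-in for an NLP classifier: first-person wording
    and no wording that suggests the author is discussing the news."""
    lowered = text.lower()
    has_first_person = FIRST_PERSON.search(text) is not None
    has_news_cue = any(cue in lowered for cue in NEWS_CUES)
    return has_first_person and not has_news_cue


def flag_posts(posts, watch_regions):
    """Keep posts from watched regions that describe the author's own symptoms."""
    return [
        p for p in posts
        if p.region in watch_regions
        and mentions_symptom(p.text)
        and looks_personal(p.text)
    ]


# Invented sample posts; "Seattle" stands in for an area with reported cases.
posts = [
    Post("I have had a fever and trouble breathing since Monday", "Seattle"),
    Post("Officials say the new virus causes fever and cough, according to reports", "Seattle"),
    Post("My cough is getting worse", "Denver"),
    Post("I watched the news about the fever outbreak", "Seattle"),
]
flagged = flag_posts(posts, {"Seattle"})
for post in flagged:
    print(post.region, "|", post.text)
```

A keyword filter like this would misfire often in practice, which is why the researchers rely on trained language models instead; the sketch only shows the shape of the decision being made for each post.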

“We are moving to surveillance efforts in the US,” says Brownstein. Determining where the virus may surface is crucial if the authorities are to allocate resources and block its spread effectively. “We are trying to understand what is happening in the population as a whole,” he says.

The number of new infections has declined slightly in recent days, from 3,900 new cases on Wednesday to 3,700 on Thursday and 3,200 on Friday, according to the World Health Organization. However, it is not clear whether the spread is really slowing or whether new infections are simply becoming harder to trace.

So far, other countries have reported far fewer cases of the coronavirus. But there is still great concern about the spread of the virus. The US has imposed a travel ban on China, although experts question the effectiveness and ethics of such a move. Researchers at Johns Hopkins University have created a visualization of the virus's worldwide progress based on official figures and confirmed cases.

Health experts did not have access to such quantities of social, web and mobile data when tracking earlier outbreaks, such as severe acute respiratory syndrome (SARS). But finding signs of the new virus in a huge soup of speculation, rumors and posts about ordinary cold and flu symptoms is a formidable challenge. “The models need to be retrained to think about the terms that people will use and the slightly different symptoms,” says Brownstein.

Yet the approach has proven capable of finding a coronavirus needle in a haystack of big data. Brownstein says that colleagues who were following Chinese social media and news sources were alerted on December 30 to a cluster of reports of a flu-like outbreak. This was shared with the WHO, but it took a while to confirm the seriousness of the situation.

In addition to identifying new cases, Brownstein says the technology could help experts learn how the virus behaves. It may be possible to determine the age, gender and location of those most at risk more quickly than official medical sources allow.

Alessandro Vespignani, a professor at Northeastern University who specializes in modeling contagion in large populations, says it will be a particular challenge to identify new coronavirus cases through social media messages, even using the most advanced AI tools, because the characteristics of the disease are still not entirely clear. “It’s something new. We don’t have historical data,” Vespignani says. “There are very few cases in the US and most of the activity is driven by the media, by people’s curiosity.”

