Detecting Disease Outbreaks in Mass Gatherings Using Internet Data

Researchers from Microsoft, UCL and i-sense have developed algorithms that can flag possible outbreaks of infectious diseases at mass gatherings using internet data, specifically Twitter posts and search engine queries, as part of a study published in the Journal of Medical Internet Research.

Mass gatherings, such as music festivals and religious events, pose a threat to public health because of the risk of transmission of infectious diseases. This risk is increased further by the movement of participants, who disperse soon after the gathering, potentially spreading disease within their communities. This rapid dispersion means that participants are very hard to track and presents a challenge for traditional surveillance methods.

The constant, widespread use of the Internet during and shortly after these events, in particular social media and search engines, provides an opportunity to rapidly monitor the spread of disease and potential outbreaks.

The researchers extracted all Twitter postings and queries made to the Bing search engine by users who repeatedly mentioned one of nine major music festivals held in the United Kingdom, or one religious event (the Hajj in Mecca), during 2012. Tweets were collected over a period beginning 30 days before each festival and ending 30 days after it. On average, the data for each festival comprised 7.5 million tweets made by 12,163 users and 32,143 queries made by 1,756 users.

Researchers analysed this data using methods that compared how often keywords associated with disease symptoms appeared before and after each festival, and compared the frequency of those keywords with their frequency among other users in the United Kingdom in the days following the festivals.
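The before-and-after comparison described above can be illustrated with a small Python sketch. It is not the study's method: the keyword list, the function names and the use of a simple two-proportion z-test with a 1.96 threshold are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical keyword list; the study's actual symptom vocabulary is not given here.
SYMPTOMS = {"cough", "fever", "vomiting"}


def symptom_counts(posts):
    """Count how many posts mention each symptom keyword at least once."""
    counts = Counter()
    for text in posts:
        words = set(text.lower().split())
        for symptom in SYMPTOMS & words:
            counts[symptom] += 1
    return counts


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z-statistic for the difference in proportions (group b minus group a)."""
    if n_a == 0 or n_b == 0:
        return 0.0
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    return (hits_b / n_b - hits_a / n_a) / se


def flag_symptoms(before, after, threshold=1.96):
    """Return symptoms whose mention rate rose after the event (assumed threshold)."""
    counts_before = symptom_counts(before)
    counts_after = symptom_counts(after)
    flagged = {}
    for symptom in SYMPTOMS:
        z = two_proportion_z(counts_before[symptom], len(before),
                             counts_after[symptom], len(after))
        if z > threshold:
            flagged[symptom] = z
    return flagged


# Made-up example posts, not data from the study.
before = ["great lineup this year"] * 95 + ["bad cough today"] * 5
after = ["still have a cough"] * 30 + ["back at work"] * 70
flagged = flag_symptoms(before, after)
```

The same function could be reused for the second comparison by passing posts from other users in the same period as the baseline group in place of the pre-festival posts.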

Using innovative data extraction and analytic methods, researchers were able to detect a statistically significant appearance of a disease symptom at two of the nine festivals they studied. For example, 'cough' was detected at higher than expected levels following the Wakestock festival. Where a significant symptom was detected, there was statistically significant agreement between methods and across data sources. This evidence suggests that the symptoms detected are indeed indicative of a disease that some users attributed to attending the festival.

By combining multiple data sources and analysis methods, researchers were able to reduce the error often present in studies that look at keywords associated with disease symptoms. Further studies will be necessary to validate these findings against data from public health authorities, but this work demonstrates the feasibility of creating a public health surveillance system for mass gatherings based on Internet data.

Such a system could help fill the gaps in traditional surveillance methods and support the rapid detection of potential outbreaks, even before people visit their doctor. Early detection and faster treatment would ease the burden of infectious diseases on patients and public healthcare systems.
