Talk and walk

by Andrea Apolloni, Leon Danon

We aim to compare previous analyses of human mobility, based on CDR data, with more detailed data from the OpenPaths community, in order to establish the similarities and discrepancies between the two datasets.

The use of cell phone data to study human mobility has gained increasing traction in the last few years. Applications of this type of study are numerous, but the application to disease transmission is of particular interest. Human mobility is considered the main driver of the geographical spread of infectious disease, and commuting patterns are responsible for transmission at local scales. In many cases commuting patterns can be detected using Call Data Records (CDR). However, these data are discontinuous in time and space: individuals make calls at only a few points during the day, and we cannot infer the path taken between different antennae. This can result in missing data on co-location. Detailed information on co-location is key to understanding transmission routes: if two people are in close proximity, they can pass infection to each other; otherwise, transmission is very unlikely.
In this project we would like to examine the reliability of co-location estimates based on coarse-grained CDR relative to those based on fine-scale OpenPaths data.
How will the discrepancies between them affect predictions drawn from epidemic models based on these two data sources? We aim to answer these questions by studying a specific case in which CDR are already available to us.

One of the two researchers has worked on CDR data from Iceland to infer patterns of co-location. In the first instance, we would like to request OpenPaths data from members in Iceland up to 2012, to be compared with cell phone records from the same period. We will extract commuting patterns, aggregate the data at the level of cell tower and urban area, and compare them with cell phone data from the same area. Statistical analysis will be conducted to determine the discrepancies between the datasets, their source and their extent. Where possible, we will use statistical inference techniques (in particular Markov chain Monte Carlo) to infer detailed paths for individuals whose mobility is known only from CDR. Finally, we will adapt our existing disease transmission models to assess the effect on predictions drawn from the data, and compare the two scenarios. The project is expected to provide new insight into the appropriateness of CDR for epidemic modeling.
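
As a rough illustration of the comparison described above, the following Python sketch counts co-located pairs of users in two ways: at fine scale (two position fixes within a given radius in the same time bin) and at coarse scale (both fixes assigned to the same cell tower in the same time bin). It is only a minimal sketch under stated assumptions, not the analysis pipeline of the project. The function names, the input layout, the 50 m radius and the one-hour time bin are all hypothetical choices made for the example, and assigning each fix to its nearest tower is a simplification of how handsets actually attach to antennae.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def nearest_tower(lat, lon, towers):
    """Return the id of the closest tower (assumption: nearest-tower assignment)."""
    return min(towers, key=lambda t: haversine_m(lat, lon, towers[t][0], towers[t][1]))


def colocated_pairs(fixes, towers, radius_m=50.0, bin_s=3600):
    """Compare fine-scale and tower-level co-location.

    fixes:  list of (user, unix_time, lat, lon) tuples (hypothetical layout)
    towers: dict mapping tower id -> (lat, lon)
    Returns two sets of (user_a, user_b, time_bin) tuples: (fine, coarse).
    """
    # Group position fixes by time bin.
    by_bin = {}
    for user, t, lat, lon in fixes:
        by_bin.setdefault(t // bin_s, []).append((user, lat, lon))

    fine, coarse = set(), set()
    for b, rows in by_bin.items():
        for (u1, la1, lo1), (u2, la2, lo2) in combinations(rows, 2):
            if u1 == u2:
                continue  # skip repeated fixes from the same user
            pair = (min(u1, u2), max(u1, u2), b)
            # Fine scale: the two fixes lie within radius_m of each other.
            if haversine_m(la1, lo1, la2, lo2) <= radius_m:
                fine.add(pair)
            # Coarse scale: the two fixes map to the same tower.
            if nearest_tower(la1, lo1, towers) == nearest_tower(la2, lo2, towers):
                coarse.add(pair)
    return fine, coarse


# Made-up coordinates, for illustration only.
towers = {"A": (64.1466, -21.9426), "B": (64.1000, -21.8000)}
fixes = [
    ("u1", 0, 64.1466, -21.9426),
    ("u2", 100, 64.1467, -21.9426),
    ("u3", 200, 64.1500, -21.9426),
    ("u4", 300, 64.1000, -21.8000),
]
fine, coarse = colocated_pairs(fixes, towers)
```

Pairs that appear in only one of the two sets are the kind of discrepancy the statistical analysis would need to quantify: tower-level aggregation can report contacts between people who were hundreds of metres apart, and can miss contacts between people standing close together on either side of a tower boundary.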

To further ensure anonymity and security, we will anonymise the OpenPaths data a second time and destroy the identifiers. Data will be stored on a server accessible only through SSH, behind the firewall at Queen Mary University of London. The network at QM is administered by central IT services, which has a solid record on data security.

The results of this project will form the basis of at least one scientific paper to be published in an open-access, peer-reviewed journal. Unless otherwise indicated, OpenPaths will be cited as the data provider and thanked in the acknowledgements.

We would be delighted if OpenPaths staff were interested in a closer collaboration that goes beyond providing data. In that case, it would be fantastic to start a dialogue on how our expertise could be of use to the OpenPaths project.