What's in a crowd? Analysis of face-to-face behavioral networks
Introduction
Access to large data sets on human activities and interactions has long been limited by the difficulty and cost of gathering such information. Recently, the ever-increasing availability of digital traces of human actions has made it possible to represent and analyze massive amounts of information on human behavior. The representation of this information in terms of complex networks (Anon, 2009, Dorogovtsev and Mendes, 2003, Newman, 2003, Pastor-Satorras and Vespignani, 2004, Caldarelli, 2007, Barrat et al., 2008, Wasserman and Faust, 1994, Watts, 2007) has prompted many research efforts, because these new data sources are naturally interlinked.
Tracing human behavior in a variety of contexts has become possible at very different spatial and temporal scales: from mobility of individuals inside a city (Chowell et al., 2003) and between cities (De Montis et al., 2007), to mobility and transportation in an entire country (Brockmann et al., 2006), all the way to planetary-scale travel (Barrat et al., 2004, Balcan et al., 2009). Mobile devices such as cell phones make it possible to investigate mobility patterns and their predictability (González et al., 2008, Song et al., 2010). On-line interactions occurring between individuals can be monitored by logging instant messaging or email exchange (Eckmann et al., 2004, Kossinets and Watts, 2006, Golder et al., 2007, Leskovec and Horvitz, 2008, Rybski et al., 2009, Malmgren et al., 2009). Recent technological advances further support mining real-world interactions by means of mobile devices and wearable sensors, opening up new avenues for gathering data on human and social interactions. Bluetooth and Wi-Fi technologies give access to proximity patterns (Hui et al., 2005, Eagle and Pentland, 2006, O’Neill et al., 2006, Pentland, 2008, Clauset and Eagle, 2007), and even face-to-face presence can be resolved with high spatial and temporal resolution (http://www.sociopatterns.org; Cattuto et al., 2010, Alani et al., 2009, Van den Broeck et al., 2010). The combination of these technological advances and of heterogeneous data sources allows researchers to gather longitudinal data that have traditionally been scarce in social network analysis (Padgett and Ansell, 1993, Lubbers et al., 2010). A dynamical perspective on interaction networks paves the way to investigating interesting problems such as the interplay of the network dynamics with dynamical processes taking place on these networks.
In this paper, we capitalize on recent efforts (http://www.sociopatterns.org; Cattuto et al., 2010, Alani et al., 2009, Van den Broeck et al., 2010) that made it possible to mine behavioral networks of face-to-face interactions between individuals in a variety of real-world settings and in a time-resolved fashion. We present an in-depth analysis of the data we collected at two widely different events. The first event was the INFECTIOUS exhibition (http://www.sciencegallery.com/infectious) held at the Science Gallery in Dublin, Ireland, from April 17th to July 17th, 2009. The second event was the ACM Hypertext 2009 conference (http://www.ht2009.org/) hosted by the Institute for Scientific Interchange Foundation in Turin, Italy, from June 29th to July 1st, 2009. In the following, we will refer to these events as SG and HT09, respectively. Intuitively, interactions among conference participants differ from interactions among museum visitors, and the individuals concerned have very different goals in the two settings. The study of the corresponding networks of proximity and interactions, both static and dynamic, indeed reveals strong differences but also interesting similarities. We take advantage of the availability of time-resolved data to show how dynamical processes on the close proximity network, such as the propagation of a piece of information or the spreading of an infectious agent, unfold in very different ways in the investigated settings. In the epidemiological literature, processes of this kind have traditionally been studied either using aggregated data or under assumptions of stationarity for the interaction networks: here we leverage the time-resolved nature of our data to assess the role of network dynamics on the outcome of spreading processes.
At a more fundamental level, simulating simple spreading processes over the recorded interaction networks allows us to expose several properties of their dynamical structure as well as to probe their causal structure.
The paper is organized as follows: first, we briefly describe the data collection platform and our data sets in Section 2; in Section 3 we discuss the salient features of the networks of interactions aggregated on time windows of one day. These networks are static objects, carrying only information about the cumulative time that each pair of individuals has spent in face-to-face proximity on a given day. Section 4 analyzes the dynamical properties of face-to-face interactions between conference participants and museum visitors. Section 5 further characterizes the aggregated network structures by investigating the effect of incremental link removal. Finally, Section 6 investigates the role played by causality in information spreading along the proximity network, and Section 7 concludes the paper and outlines a number of open questions.
Data
The data collection infrastructure uses active Radio-Frequency IDentification (RFID) devices embedded in conference badges to mine face-to-face proximity relations of persons wearing the badges. RFID devices exchange ultra-low power radio packets in a peer-to-peer fashion, as described in http://www.sociopatterns.org, Cattuto et al. (2010), Alani et al. (2009), and Van den Broeck et al. (2010). Exchange of radio packets between badges is only possible when two persons are at close range (1–1.5
The static interaction network
We start by analyzing aggregated networks of interaction obtained by aggregating the raw proximity data over one day. This aggregation yields a social graph where nodes represent individuals, and an edge is drawn between two nodes if at least one contact was detected between those nodes during the interval of aggregation. Therefore, every edge is naturally weighted by the total duration of the contact events that occurred between the tags involved, i.e., by the total time during which the
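The aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration only: the record layout `(day, tag_a, tag_b, duration_seconds)` and the function names `aggregate_daily` and `degree_and_strength` are assumptions made for this example, not the format or tooling of the data collection platform.

```python
from collections import defaultdict


def aggregate_daily(contacts):
    """Build one weighted, undirected network per day.

    `contacts` is an iterable of (day, tag_a, tag_b, duration_seconds)
    tuples (a hypothetical layout). An edge exists if at least one
    contact was recorded that day; its weight is the summed duration.
    Returns {day: {(u, v): total_duration}} with u < v.
    """
    networks = defaultdict(lambda: defaultdict(float))
    for day, a, b, duration in contacts:
        if a == b:
            continue  # ignore self-contacts
        edge = (a, b) if a < b else (b, a)  # undirected: canonical order
        networks[day][edge] += duration
    return {day: dict(edges) for day, edges in networks.items()}


def degree_and_strength(edges):
    """Return (degree, strength) dictionaries for one daily network.

    Degree counts distinct neighbours; strength sums edge weights,
    i.e. the total time a tag spent in contact with anyone.
    """
    degree = defaultdict(int)
    strength = defaultdict(float)
    for (u, v), w in edges.items():
        for node in (u, v):
            degree[node] += 1
            strength[node] += w
    return dict(degree), dict(strength)
```

For instance, two contact events between the same pair on the same day collapse into a single edge whose weight is the sum of both durations, which is what makes the daily network a static object.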
Temporal features
The availability of time-resolved data gives much more insight into the salient features of the social interactions taking place during the deployments than would be possible from knowing only “who has been in face-to-face proximity of whom”.
We first investigated the presence duration distribution in both settings. For the conference case, the distribution is rather trivial, as it essentially counts the number of conference participants spending one, two or three days
Percolation analysis
The vulnerability of networks to successive node removal has attracted considerable interest in recent years, starting from the pioneering works of Albert et al. (2000) and Cohen et al. (2000), which showed that complex networks typically retain their integrity when nodes are removed randomly but are very fragile with respect to targeted removal of the most connected nodes. While the concepts of node failures and targeted attacks are pertinent for infrastructure networks, successive
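To make the idea of incremental link removal concrete, the following Python sketch removes the links of a weighted network one at a time and records the size of the largest connected component after each removal. Ordering links by cumulative contact duration is an assumption chosen for illustration, since the text shown here does not specify the removal criteria; the function names are likewise hypothetical.

```python
def largest_component_size(nodes, edges):
    """Size of the largest connected component, via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    sizes = {}
    for n in nodes:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)


def link_removal_curve(weighted_edges, weakest_first=True):
    """Remove links one by one and track the largest component.

    `weighted_edges` maps (u, v) -> weight. Links are removed in
    order of weight (an assumed criterion). Returns a list whose
    first entry is the size before any removal.
    """
    nodes = {n for edge in weighted_edges for n in edge}
    order = sorted(weighted_edges, key=weighted_edges.get,
                   reverse=not weakest_first)
    remaining = set(weighted_edges)
    curve = [largest_component_size(nodes, remaining)]
    for edge in order:
        remaining.discard(edge)
        curve.append(largest_component_size(nodes, remaining))
    return curve
```

Comparing the curve obtained with `weakest_first=True` against the one obtained with `weakest_first=False` indicates whether weak or strong links are the ones holding the network together. The component size is recomputed from scratch after every removal, which is simple but quadratic in the number of links; this is adequate for a sketch on small daily networks.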
Dynamical spreading over the network
Aggregated networks often represent the most detailed information that is available on social interactions. In the present case, they would correspond to information obtained through ideal surveys in which respondents remember every single person they encountered and the overall duration of the contacts they had with that person. While such a static representation is already informative, it lacks information about the time ordering of events, and it is unable to encode causality. The data from
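The difference between a static representation and a time-ordered one can be illustrated with a small Python sketch. It compares which individuals are reachable from a seed on the aggregated network with those reached by a deterministic spreading process that follows contacts in time order. The event layout `(time, tag_a, tag_b)`, the rule that every contact transmits, and the function names are assumptions made for this example; they are not the spreading model of the paper.

```python
def spread_over_contacts(contacts, seed, start_time=0):
    """Deterministic spreading along time-ordered contacts.

    `contacts` is an iterable of (time, tag_a, tag_b) events (a
    hypothetical layout). Every contact between an informed and an
    uninformed tag transmits. A tag informed at time t can transmit
    only at strictly later times; the seed can transmit from
    `start_time` onwards. Returns {tag: time_informed}.
    """
    informed_at = {seed: start_time}
    for t, a, b in sorted(contacts, key=lambda c: c[0]):
        if t < start_time:
            continue
        for src, dst in ((a, b), (b, a)):
            if dst in informed_at or src not in informed_at:
                continue
            if src == seed or informed_at[src] < t:
                informed_at[dst] = t
    return informed_at


def reachable_static(contacts, seed):
    """Tags reachable from `seed` when time ordering is ignored."""
    neighbors = {}
    for _, a, b in contacts:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, frontier = {seed}, [seed]
    while frontier:
        node = frontier.pop()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen
```

With the events `[(1, "B", "C"), (2, "A", "B"), (3, "B", "D")]` and seed `"A"`, the static network connects A to C through B, but the time-ordered process never reaches C, because the B-C contact happened before B was informed. This is the sense in which an aggregated network cannot encode causality.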
Conclusions
In this paper we have shown that the analysis of time-resolved network data can unveil interesting properties of behavioral networks of face-to-face interaction between individuals. We considered data collected in two very different settings, representative of two types of social gatherings: the HT09 conference is a “closed” system in which a group of individuals gathers and interacts in a repeated fashion, while the SG museum deployment is an “open” environment with a flux of individuals
Acknowledgments
Data collections of the scale reported in this manuscript are only possible with the collaboration and support of many dedicated individuals. We gratefully thank the Science Gallery in Dublin for inspiring ideas and for hosting our deployment. Special thanks go to Michael John Gorman, Don Pohlman, Lynn Scarff, Derek Williams, and all the staff members and facilitators who helped to communicate the experiment and engage the public. We thank the organizers of the ACM Hypertext 2009 conference and
References (53)
- et al. Longitudinal analysis of personal networks. Soc. Networks (2010)
- et al. Detection of topological patterns in complex networks: correlation profile of the Internet. Physica A (2004)
- Alani, H., Szomsor, M., Cattuto, C., Van den Broeck, W., Correndo, G., Barrat, A., 2009. Live social semantics. In: 8th...
- et al. Error and attack tolerance of complex networks. Nature (2000)
- et al. Infectious Diseases of Humans: Dynamics and Control (1992)
- Anon, 2009. Special issue of Science on complex networks and systems. Science 325,...
- et al. Modern Information Retrieval (1999)
- et al. Multiscale mobility networks and the spatial spreading of infectious diseases. Proc. Natl. Acad. Sci. USA (2009)
- et al. The architecture of complex weighted networks. Proc. Natl. Acad. Sci. USA (2004)
- et al. Dynamical Processes on Complex Networks (2008)