Use of social network analysis and global sensitivity and uncertainty analyses to better understand an influenza outbreak

In the summer of 2014, an influenza A(H3N2) outbreak occurred in Yichang city, Hubei province, China. A retrospective study was conducted to collect and interpret hospital and epidemiological data on the outbreak using social network analysis and global sensitivity and uncertainty analyses. Results for degree (χ2=17.6619, P<0.0001) and betweenness (χ2=21.4186, P<0.0001) centrality suggested that the selection of sampling objects differed between traditional epidemiological methods and newer statistical approaches. Clique and network diagrams demonstrated that the outbreak actually consisted of two independent transmission networks. Sensitivity analysis showed that the contact coefficient (k) was the most important factor in the dynamic model. Using uncertainty analysis, we were able to better understand the properties of the outbreak and its variation over space and time. We concluded that the newer approaches were significantly more efficient for managing and controlling infectious disease outbreaks, saved time and public health resources, and could be widely applied to similar local outbreaks.


INTRODUCTION
Public health events occur frequently in China. In 2013, for example, a total of 1,077 public health emergencies occurred. [1] In the past, analysis of these emergencies relied only on traditional epidemiological methods. Field epidemiological investigation skill has been identified as one of the five top weaknesses in national health emergency response skills and techniques. [2] Traditional epidemiological methods do not consistently provide reliable evidence on how to objectively identify the correct patients, how to select the correct sampling objects for laboratory tests, and how to understand and describe outbreak characteristics. [3,4] Social network analysis (SNA) and global sensitivity and uncertainty analyses (GSUA) are relatively new tools that can be used to address these problems. [5,6] In SNA, the nodes and ties of a network diagram represent patients and the connections between them. Through centrality analysis and connectedness measurement, important patients can be identified, and the propagation of outbreaks can be described more accurately. [7][8][9][10] With these network analyses and graphics, SNA can be used to study outbreak structures and characteristics. [5,11,12] GSUA is the study of how uncertainty in the output of a model can be apportioned to different sources of uncertainty among the model inputs. It is a variance-based method for analyzing data and models using an objective function. [6,13,14] GSUA can be used to rank parameters such as infection coefficient, contact coefficient, recovery rate and death rate based on their relative influence on the dynamics of simulated epidemics. [15] It can also inform researchers on the dynamics of investigation processes, and can potentially play an important role in outbreak management.

Clinical Research Paper
Oncotarget 43418 www.impactjournals.com/oncotarget

From Wednesday, July 16, 2014, to Monday, August 4, 2014, an influenza A(H3N2) outbreak with 63 cases, including nine laboratory-confirmed positive cases, occurred in an isolated compulsory detoxification center. The Yichang Center for Disease Control and Prevention managed the outbreak using traditional and molecular epidemiological methods, and reported it as a general public health emergency (grade IV) in the China Information System for Disease Control and Prevention.

Selection of sampling objects
Normalized centrality measures (degree and betweenness) of all 72 index cases were analyzed using SNA (Table 1). A total of 14 nodes (six from platoon A; [...]) (Tables 2 and 3). Kruskal-Wallis tests showed statistically significant differences in degree centrality (χ2 = 17.6619, P < 0.0001, Table 2) and betweenness centrality (χ2 = 21.4186, P < 0.0001, Table 3), indicating that the two methods of selecting sampling objects (the SNA approach and the traditional method) differed significantly.
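The comparison above can be illustrated with a minimal sketch of a two-group Kruskal-Wallis test. The centrality scores below are hypothetical placeholders, not the study data, and the function names are our own. With two groups the statistic has one degree of freedom, so the P value can be taken from the chi-square distribution via the complementary error function.

```python
import math
from collections import Counter

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_two_groups(a, b):
    """Tie-corrected Kruskal-Wallis H and its P value for two groups (1 df)."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    sum_a = sum(ranks[:len(a)])
    sum_b = sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (sum_a ** 2 / len(a) + sum_b ** 2 / len(b)) - 3 * (n + 1)
    # Tie correction; it is zero (and H undefined) only if all values are equal.
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    h = max(h / correction, 0.0)
    p = math.erfc(math.sqrt(h / 2))  # chi-square survival function for 1 df
    return h, p

# Hypothetical normalized centrality scores for two selection methods.
traditional = [0.10, 0.20, 0.30]
sna_selected = [0.40, 0.50, 0.60]
h_stat, p_value = kruskal_wallis_two_groups(traditional, sna_selected)
print(f"H = {h_stat:.4f}, P = {p_value:.4f}")
```

With more than two groups, the P value would need the chi-square distribution with k - 1 degrees of freedom, which this sketch does not implement.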

Outbreak characteristics
A total of 134 drug abstainers were strictly separated into two platoons; the 75 abstainers in platoon A resided on the second floor, and the other 59 abstainers in platoon B resided on the third floor. Each floor had its own workshop. The assigned exercise, work-break and dining areas were also separate. The mealtime of platoon A was 5 minutes earlier than that of platoon B. Therefore, there was no close contact between the two platoons. The resulting cliques also suggested that the outbreak may have contained two or more networks. Figure 1 demonstrates the cliques and structure of the networks. Figures 2 and 3 show schematic diagrams of the possible propagation chains in the two platoons.
A total of 21 contacts were recorded in the dormitory, six in the refectory and eight in other places from platoon A, and 16 in the dormitory, three in the refectory and four in other places from platoon B (χ2 = 0.5403, P = 0.7633). A total of 31 direct and six indirect contacts were found in platoon A, whereas 20 direct and 10 indirect contacts were found in platoon B. More direct contacts than indirect contacts were found among the patients, although the difference between platoons (χ2 = 2.6704, P = 0.1022) was not statistically significant.
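As a worked check of the direct versus indirect contact comparison, the sketch below computes a Pearson chi-square statistic for the 2 x 2 table of platoon by contact type. It assumes no continuity correction, which is our assumption rather than a method stated in the text; under that assumption the result agrees with the reported χ2 = 2.6704 and P = 0.1022. The function name is illustrative.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function for 1 df
    return chi2, p

# Rows: platoon A, platoon B; columns: direct, indirect contacts (from the text).
chi2, p = pearson_chi2_2x2(31, 6, 20, 10)
print(f"chi2 = {chi2:.4f}, P = {p:.4f}")
```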

Global sensitivity and uncertainty analyses
Factor k (contact coefficient) was assigned three different distributions. With k ~ beta (2, 7), the first-order indices of factors v, k and r were 0.4144, 0.4450, and 0.0002, respectively, and the total-order indices were 0.5549, 0.5855, and 0.0001, respectively. Other indices are shown in Table 4. Figure 4 shows three cases of the infection dynamics exercise based on uncertainty analysis. The plots show that the infection propagated in 97.19% (case 1), 66.80% (case 2), and 31.56% (case 3) of model runs, respectively.

DISCUSSION
Following the principles of efficiency and effectiveness, investigators may neglect quantitative analysis of the interrelations between cases, contacts, and places. This could result in suboptimal selection of sampling objects and cases and may lead investigators to overlook infection propagation characteristics. [16,17] Although some studies have focused on cases, [18][19][20][21] places, [22][23][24] and contact networks, [10,25,26] few studies have employed quantitative and graphic analyses, [15,27] and no studies to date have combined SNA and GSUA approaches for quantitative and graphic analyses in field epidemiology for selecting sampling objects and characterizing infectious disease transmission in similar local outbreaks.
Nodes with high density or centrality are key to controlling and preventing disease outbreaks. [7,8,28] Selecting sampling objects on the basis of clinical symptoms alone may miss important vectors of transmission among those with subclinical or latent infections. [29,30] SNA and GSUA provide quantitative methods for selecting sampling objects and so help avoid missing important patients. During the outbreak, 20 samples were selected in an unbalanced manner (17 from platoon A, only three from B), and the positive rate of the two rounds of sampling was poor (nine of 20 samples were positive). Therefore, the 14 nodes recommended by the SNA approach should be a priority for disease control, even in the absence of laboratory support. It should be noted that one limitation of this method is its inability to detect patients with latent infection; accordingly, we suggest collecting other samples as well.
The second problem pertained to mastering the characteristics of disease transmission. The strict security procedures of the detoxification center forbid all drug abstainers from bringing any electronic devices inside, including mobile phones, watches, and wearable sensors. We also could not obtain surveillance video data because of privacy considerations. Therefore, rather than state-of-the-art approaches such as SocioPatterns, [9,20,26,31] we had to resort to direct and indirect individual data collection. Obtaining the data used to create Figures 2 and 3 required considerable investigation of three-dimensional distribution, contact information, and hospital data. In contrast, procuring the data used to create Figure 1 merely required collection of contact information. Accordingly, the SNA approach was more efficient for grasping the features of the disease outbreak. Furthermore, the government and society paid close attention to the outbreak, and all subjects investigated were able to recall details of the events, even 10 months later, during our retrospective investigation in May 2015.
The most important factor in our dynamic model was k (contact coefficient), whose contribution grew rapidly and consistently. Across the time span of the outbreak, we found a clear downward trend in the contribution of v (infection coefficient), whereas r (recovery rate) played almost no role. A sum of total-order indices that differs from 1 indicates the presence of interactions among factors in the model.
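For reference, the first-order and total-order indices discussed here are conventionally defined as follows in variance-based sensitivity analysis. The notation (X_i for an input factor, X_{\sim i} for all factors except X_i, V for variance, E for expectation) is the standard textbook one and is an assumption about the form used by the authors.

```latex
S_i = \frac{V\left[E\left(Y \mid X_i\right)\right]}{V(Y)},
\qquad
S_{T_i} = 1 - \frac{V\left[E\left(Y \mid X_{\sim i}\right)\right]}{V(Y)}
```

For a purely additive model, S_{T_i} = S_i for every factor and the indices sum to 1. With the values reported for k ~ beta (2, 7), the first-order indices sum to 0.4144 + 0.4450 + 0.0002 = 0.8596 and the total-order indices to 0.5549 + 0.5855 + 0.0001 = 1.1405, which is consistent with interactions among the factors.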
In our study, 55 patients had already been infected when the CDC was informed on July 20, 2014, four days after the first case fell ill. By the time disinfection and quarantine began on July 22, a total of 61 patients had been infected. We also noted that oseltamivir was not used for treatment and prevention until July 23. Because the effective contacts among 96.83% of patients (61/63) were not influenced by disinfection, quarantine or treatment measures, the contribution of the contact coefficient increased continuously across the time span of the epidemic. This may explain why the change in the sum of the first-order effects was relatively small. Figure 4 was generated following the guidance in Global Sensitivity Analysis: The Primer. [6] The analysis then focused, via transformation, on Y (a concept similar to the basic reproductive rate, R0), where Y = vk(S+I+R)-r. Here, Y > 0 indicates spread of the infection, and Y < 0 indicates subsidence of the infection. The X values (36, 425, and 877) mark the model run (of 1,280 model runs in total) at which Y was extremely close to zero (Y could be positive or negative); percentages were then used to determine the degree of infection spread or subsidence. For example, in case 3 of Figure 4 the infection propagated in 31.56% (1 - 877/1280 - 1/1280) of the model runs. Overall, to control an outbreak effectively and cause the infection rate to drop sharply, the most stringent control measures should be implemented immediately, especially when dealing with highly infectious diseases in a relatively closed community.
Figure 4 (caption, partially recovered): The model outputs for each case are sorted in ascending order, so that each plot is a monotonic curve, and the Y axis is cut (scaled from -3 to +5) to visualize the plot around zero.
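The uncertainty analysis on Y can be sketched as a simple Monte Carlo exercise: sample the factors, evaluate Y = vk(S+I+R) - r for each run, sort the outputs, and report the share of runs with Y > 0. The sketch below uses the three beta distributions for k given in the text and a population of 134. The means and standard deviations of v and r are hypothetical, because the text does not report them, so the printed percentages are illustrative only and are not expected to match Figure 4.

```python
import random

POPULATION = 134  # total number of abstainers reported in the text (S + I + R)

# Hypothetical normal parameters for v and r; the text does not report them.
V_MEAN, V_SD = 0.10, 0.02
R_MEAN, R_SD = 0.50, 0.10

def simulate_y(beta_a, beta_b, n_runs=1280, seed=0):
    """Return sorted samples of Y = v*k*N - r and the share of runs with Y > 0."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_runs):
        v = rng.gauss(V_MEAN, V_SD)
        r = rng.gauss(R_MEAN, R_SD)
        k = rng.betavariate(beta_a, beta_b)
        outputs.append(v * k * POPULATION - r)
    outputs.sort()  # ascending order gives a monotonic curve when plotted
    share_spreading = sum(y > 0 for y in outputs) / n_runs
    return outputs, share_spreading

# The three configurations of k described in the text.
configs = {"outbreak start": (2, 7), "quarantine": (0.5, 10), "oseltamivir": (0.2, 15)}
results = {}
for label, (a, b) in configs.items():
    sorted_y, share = simulate_y(a, b)
    results[label] = share
    print(f"k ~ beta({a}, {b}) [{label}]: Y > 0 in {share:.2%} of runs")
```

Fixing the seed makes the runs reproducible; a full GSUA would additionally estimate the first-order and total-order indices from a dedicated sampling design rather than from plain random draws.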
In this study, data were collected primarily from interviews with affected patients; hence, the findings may be affected by recall bias or non-response bias. Use of state-of-the-art methods would mitigate these biases to some degree. The conclusions of our study would also be more powerful and persuasive if it were possible to obtain virological data on all important nodes selected by our approaches. [32,33] Previous studies have demonstrated that both the humoral and cellular immune systems are abnormal in drug abusers. [34,35] Thus, this population is theoretically more susceptible to influenza than the general population. Moreover, all patients included in our study were male drug abstainers with a median age of 34 (interquartile range: 28 to 40), so differences by age and gender could not be assessed.
We suggest that SNA and GSUA approaches may be widely used for quantitative and graphic analyses of similar infectious disease outbreaks and that additional prospective studies using molecular biological techniques be undertaken.

Data sources
Data were collected from patients and staff members involved in the outbreak. First, we specified the case definitions. Clinically diagnosed cases were defined as those with sudden onset of high fever, an axillary temperature of 38°C (100.4°F) or above, and at least three other clinical symptoms within the past week (dry cough, headache, muscle and joint pain, severe malaise, sore throat or runny nose). Laboratory-confirmed cases were defined as clinically diagnosed cases with influenza virus identified in respiratory specimens. Exclusion criteria included an axillary temperature below 38°C (100.4°F), fewer than three of the symptoms, a negative laboratory test result before initiation of antiviral treatment, or no epidemic history. We also reinvestigated and recorded the time of onset (accurate to the hour), extent (direct contact defined as close contact within 1 m; indirect contact defined as touching objects that patients had used), and place (dormitory, refectory, or other) of the close contacts of all 72 preliminary screening cases (index case: body temperature of 37.5°C (99.5°F) or above together with one of the symptoms, but without laboratory diagnosis). Here, we defined a pair of close contacts as two persons in contact irrespective of the time, extent, frequency, or place of contact. Second, we performed in-depth research on medical records, the field environment, disease control and prevention measures, and the management of the outbreak. Finally, we interviewed all medical staff members and administrators to verify the information that had been previously collected.
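The case definitions above can be expressed as a small classification routine. The sketch below is a simplified illustration under stated assumptions: it treats sore throat and runny nose as separate symptoms, returns the most specific category that applies, and ignores the sudden-onset, one-week window, antiviral timing, and epidemic history criteria. The `Patient` type and `classify` function are hypothetical names.

```python
from dataclasses import dataclass

# Symptom list from the case definition; counting "sore throat" and
# "runny nose" as separate symptoms is an assumption.
SYMPTOMS = frozenset({
    "dry cough", "headache", "muscle and joint pain",
    "severe malaise", "sore throat", "runny nose",
})

@dataclass(frozen=True)
class Patient:
    temp_c: float                   # body (axillary) temperature in degrees Celsius
    symptoms: frozenset             # reported symptoms
    virus_identified: bool = False  # influenza virus found in a respiratory specimen

def classify(patient):
    """Return the most specific case category that applies (simplified)."""
    n_symptoms = len(patient.symptoms & SYMPTOMS)
    if patient.temp_c >= 38.0 and n_symptoms >= 3:
        if patient.virus_identified:
            return "laboratory-confirmed"
        return "clinically diagnosed"
    if patient.temp_c >= 37.5 and n_symptoms >= 1:
        return "index case"
    return "excluded"

example = Patient(38.6, frozenset({"dry cough", "headache", "sore throat"}), True)
print(classify(example))
```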
This study was approved by the institutional review board of Tongji Medical College of Huazhong University of Science and Technology.

Data processing
We collected contact information from all 72 index cases and established a matrix referred to as "whole72". We calculated the degree centrality and betweenness centrality of all index cases (Table 1). Degree and betweenness centrality were compared between the traditional method and the SNA approach using the Kruskal-Wallis test (Tables 2 and 3).
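To make the two centrality measures concrete, the following sketch computes normalized degree and betweenness centrality for an undirected, unweighted contact network. It is not the software used in the study. Betweenness is computed with Brandes' algorithm, and the five-node chain of contacts is a hypothetical toy example.

```python
from collections import deque

def degree_centrality(adj):
    """Normalized degree: number of neighbours divided by (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def betweenness_centrality(adj):
    """Normalized betweenness for an undirected, unweighted graph (n >= 3)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies in order of decreasing distance.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair is visited from both endpoints, so dividing the raw
    # sum by (n - 1)(n - 2) normalizes the score to the range [0, 1].
    scale = 1 / ((n - 1) * (n - 2))
    return {v: b * scale for v, b in bc.items()}

# Hypothetical chain of contacts: P1 - P2 - P3 - P4 - P5.
contacts = {
    "P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P2", "P4"},
    "P4": {"P3", "P5"}, "P5": {"P4"},
}
deg = degree_centrality(contacts)
btw = betweenness_centrality(contacts)
for node in contacts:
    print(f"{node}: degree = {deg[node]:.2f}, betweenness = {btw[node]:.4f}")
```

In this toy chain the middle node has the highest betweenness because every path between the two ends passes through it, which is the property that makes such nodes candidates for sampling.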
Based on clinical diagnosis criteria, laboratory test results and epidemic history, nine nodes were excluded (see Table 1), and 63 nodes were recognized as patients who then formed the matrix "adjust63". We then added two types of relationships (platoon A, platoon B) to create a new matrix, "adjust65". We generated a structure and distribution diagram of outbreak networks (Figure 1). Based on onset time, contact time and spatial distribution (dormitory), we drew two diagrams of the possible transmission pattern (Figures 2 and 3).
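The step from the full contact matrix to the patient-only network, and the check for independent networks, can be sketched as follows. The contact pairs, node labels, and function names are hypothetical; the sketch drops excluded nodes and then finds connected components, which is a simpler structure than the clique analysis used for Figure 1.

```python
def build_adjacency(pairs, excluded=()):
    """Build an undirected adjacency map, skipping pairs that involve excluded nodes."""
    excluded = set(excluded)
    adj = {}
    for a, b in pairs:
        if a in excluded or b in excluded:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def connected_components(adj):
    """Return the list of connected components as sets of nodes."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return components

# Hypothetical close-contact pairs; "A4" stands for a screened case later excluded.
pairs = [("A1", "A2"), ("A2", "A3"), ("A3", "A4"), ("B1", "B2"), ("B2", "B3")]
network = build_adjacency(pairs, excluded={"A4"})
components = connected_components(network)
print(f"{len(components)} independent networks:", [sorted(c) for c in components])
```

Note that a node whose only contacts were excluded disappears from the adjacency map in this sketch; a fuller version would keep such isolated patients as single-node components.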
Parameter I represented the number of infected individuals at time t, parameter S the number of individuals susceptible to infection at time t, and parameter R the number of recovered individuals at time t. Factors v and r represented the "infection coefficient" and "recovery rate", respectively; both followed a normal distribution. Factor k represented the "contact coefficient", which was distributed as k ~ beta (2,7) at the beginning of the influenza outbreak, as k ~ beta (0.5, 10) during the period of quarantine, and as k ~ beta (0.2, 15) when all patients and susceptible individuals had received oseltamivir. The dynamic equations of our retrospective study are: [...] We calculated sensitivity indices for the three factors. Results for the three configurations of k are shown in Table 4. We then performed uncertainty analysis with the model simulation outputs (Figure 4).
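The dynamic equations themselves do not appear in this text. One form consistent with the definitions above and with the threshold quantity Y = vk(S+I+R) - r used in the Discussion is the standard susceptible-infected-recovered (SIR) system. This is an assumption offered for orientation, not the authors' confirmed equations.

```latex
\frac{dS}{dt} = -vkSI, \qquad
\frac{dI}{dt} = vkSI - rI, \qquad
\frac{dR}{dt} = rI
```

Under this assumed form, at the start of an outbreak S is approximately equal to the total population S+I+R, so dI/dt \approx [vk(S+I+R) - r] I = YI: the number of infected individuals grows when Y > 0 and declines when Y < 0.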