This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mHealth and uHealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.
The COVID-19 epidemic is still spreading globally. Contact tracing is a vital strategy in epidemic emergency management; however, traditional contact tracing faces many limitations in practice. The application of digital technology provides an opportunity for local governments to trace the contacts of individuals with COVID-19 more comprehensively, efficiently, and precisely.
Our research aimed to provide new solutions to overcome the limitations of traditional contact tracing by introducing the organizational process, technical process, and main achievements of digital contact tracing in Hainan Province.
A graph database algorithm, which can efficiently process complex relational networks, was applied in Hainan Province; this algorithm relies on a governmental big data platform to analyze multisource COVID-19 epidemic data and build networks of relationships among high-risk infected individuals, the general population, vehicles, and public places to identify and trace contacts. We summarized the organizational and technical process of digital contact tracing in Hainan Province based on interviews and data analyses.
An integrated emergency management command system and a multi-agency coordination mechanism were formed during the emergency management of the COVID-19 epidemic in Hainan Province. The collection, storage, analysis, and application of multisource epidemic data were realized on the government's big data platform using a centralized model. The graph database algorithm is compatible with this platform and can analyze multisource, heterogeneous big data related to the epidemic. These practices were used to quickly and accurately identify and trace 10,871 contacts among hundreds of thousands of epidemic data records, including the 378 contacts with the highest risk of infection, and to identify a number of public places with high risk of infection. After all contacts were quarantined, one confirmed patient was identified.
During the emergency management of the COVID-19 epidemic, Hainan Province used a graph database algorithm to trace contacts in a centralized model, which can identify infected individuals and high-risk public places more quickly and accurately. This practice can provide support to government agencies to implement precise, agile, and evidence-based emergency management measures and improve the responsiveness of the public health emergency response system. Strengthening data security, improving tracing accuracy, enabling intelligent data collection, and improving data-sharing mechanisms and technologies are directions for optimizing digital contact tracing.
The COVID-19 epidemic is still spreading rapidly worldwide. Tens of millions of people have been infected, and the number of infections continues to grow. Most countries and regions remain in a state of public health emergency. Although China, the United States, and other countries have developed COVID-19 vaccines, large gaps remain in vaccine production and deployment. Moreover, no effective drug treatment for COVID-19 has yet been developed. Therefore, all countries and regions need to quickly identify and trace individuals infected with COVID-19 and their contacts and to adopt active emergency management measures, such as travel restrictions, health monitoring, and home-based quarantine, to overcome the COVID-19 epidemic [
Contact investigation has always been an important public health strategy and a key process in epidemic emergency management. The scale of COVID-19 infection poses a major challenge to contact investigation [
Digital contact tracing uses electronic information to identify exposures to infection; it has the potential to address the limitations of traditional contact tracing, such as scalability, notification delays, recall errors, and contact identification in public spaces [
The data storage and processing models of digital contact tracing adopted by some countries and regions are generally divided into centralized and decentralized models [
South Korea and China have adopted the centralized model for digital contact tracing [
The decentralized model was adopted in Europe, North America, and Singapore, among other countries and regions [
App-based digital contact tracing under the decentralized model can be effective only when it is used by 40%-70% of smartphone users [
An empirical study [
Therefore, this study aimed to address the limited uptake and objectivity of existing digital contact tracing practices and to provide new solutions that further improve the effectiveness and reliability of digital contact tracing. Hainan Province, China, was selected as the case in this study. Hainan Province adopted a centralized model to conduct contact tracing during the COVID-19 epidemic. This model gathered multisource epidemic data through the government's big data public service platform, which enabled government agencies to apply graph database algorithms, data visualization, and other digital technologies to identify and trace contacts from hundreds of thousands of epidemic records. This study describes the organizational process, technical process, application prospects, and possible obstacles of digital contact tracing in Hainan Province, which may offer a more effective solution and technical support for other countries and regions.
Hainan Province is one of the most popular tourist destinations for both Chinese and international tourists because of its tropical island scenery. Because of the massive influx of tourists, population mobility in Hainan Province is very high. Additionally, because Hainan is an island, imported infections are the main challenge in the emergency management of the COVID-19 epidemic. As of August 7, 2020, Hainan Province had reported a total of 171 confirmed cases of COVID-19 [
Hainan Province issued the “Regulations on the Development and Application of Big Data in Hainan Province” in October 2019 [
We conducted semistructured interviews with the project manager and technical staff of the Big Data Administration who were in charge of Hainan's COVID-19 epidemic digital contact tracing project to understand the details of the whole process. We explained the purpose and content of the interview to the interviewees beforehand, took notes during the interview, and transcribed the audio recordings into text afterward.
Under the leadership and coordination of the Command of Hainan Provincial Epidemic Prevention and Control, the Hainan Provincial Big Data Administration collected epidemic data from different agencies, including epidemic investigation records, confirmed infected individuals’ information and their spatiotemporal trajectory, high-risk population information, information on close contacts and patients with fever, and the mobile phone signaling data of imported residents or travelers with a history of residence in Hubei Province (patients with COVID-19 were first reported in Wuhan, Hubei Province, and Hubei Province accounted for the majority of cases in China). The information in the specific databases is shown in
Data collection list of COVID-19 epidemic digital contact tracing in Hainan Province.
Database | Data provider | Description | Sample size (n) | Collection method |
Resident trajectory information database from Hubei Province | Health Committee | Used to understand where citizens have gone and who they have contacted | 205,833 | The health committee organizes the disease prevention and control center, community health service agencies, and community (village) resident committees to conduct household surveys and telephone verifications. |
Confirmed Person Information Database | Public Security Department; Health Committee | Used to understand which patients have died or have recovered and been discharged | 163 | Reported by medical and health institutions |
Information database of high-risk groups and close contacts | Epidemic Prevention and Control Headquarters | Used to understand the close contacts of high-risk individuals in confined spaces | 2,269 | Summary of Center for Disease Control information |
Hospital fever information database | Epidemic Prevention and Control Headquarters | Used to understand who was infected, where the person was treated, whether the infected person recovered, and where the infected person lives | 113,606 | Hospitals report through the health committee information system |
Mobile phone signaling database | Communications operator branch | Used to understand where people moved to and are staying | 231,296 | Directly provided by China Mobile, China Unicom, China Telecom, and China Broadcasting Network Corp, Ltd |
A graph database is a new type of database system based on graph theory and algorithms that efficiently processes complex relational networks. A graph database can efficiently process large-scale, complex, interconnected, and changeable data, and its computational efficiency is far higher than that of a traditional relational database [
Most graph databases provide a query language suited to representing graph structures and expressing graph queries. Neo4j (Neo4j, Inc) is a Java-based open-source graph database with high performance, high reliability, and strong scalability [
Our research used Cypher, the query language of Neo4j version 3.4.15, to construct an association graph among the high-risk population, the general population, vehicles, public places, and other key information; reveal the hidden network of relationships; and identify the risks within those relationships to identify and trace the contacts of COVID-19 cases.
Graph database model based on Neo4j.
ECharts (Apache Software Foundation) is an open-source JavaScript visualization library that runs smoothly on personal computers and mobile devices and is compatible with most current web browsers. It is built on ZRender, a lightweight vector graphics library, and provides intuitive, interactive, and highly customizable data visualization [
Our research used ECharts to visualize basic data and the results of the graph database algorithms as well as to develop web portals and interactive operating systems to intuitively and dynamically query and analyze epidemic data.
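As an illustration of how graph analysis results might be handed to ECharts, the Python sketch below converts a list of edges into an ECharts option object with a force-directed `graph` series, which a web page could then pass to the chart instance's `setOption` method. The function name and the category labels are hypothetical; the option keys (`series`, `type: "graph"`, `layout: "force"`, `categories`, `data`, `links`) follow the public ECharts option format, but this is a sketch rather than the portal built in Hainan.

```python
import json


def to_echarts_graph_option(edges, categories):
    """Build an ECharts option dict containing one force-layout 'graph' series.

    edges: list of (source_name, target_name) pairs from the graph analysis.
    categories: dict mapping node name -> category label (e.g. 'case', 'contact', 'place').
    """
    category_names = sorted(set(categories.values()))
    index = {name: i for i, name in enumerate(category_names)}
    # In an ECharts graph series, a node's 'category' is an index into 'categories'.
    nodes = [{"name": n, "category": index[c]} for n, c in sorted(categories.items())]
    links = [{"source": s, "target": t} for s, t in edges]
    return {
        "legend": {"data": category_names},
        "series": [{
            "type": "graph",
            "layout": "force",
            "roam": True,  # allow zooming and panning in the browser
            "categories": [{"name": c} for c in category_names],
            "data": nodes,
            "links": links,
        }],
    }


option = to_echarts_graph_option(
    [("CASE1", "P1"), ("P1", "MALL-A")],
    {"CASE1": "case", "P1": "contact", "MALL-A": "place"},
)
payload = json.dumps(option)  # JSON string a web page could pass to setOption
```

Generating the option on the server and serializing it as JSON keeps the browser side to a single `setOption` call, which suits a web portal that only displays precomputed results.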
Based on the data planning, data collection, data storage, data analysis, and data application processes involved in the data life cycle, we used interview data to summarize the activities performed by various participants in the digital contact tracing process of the COVID-19 epidemic in Hainan Province. The specific organizational process is as follows.
Data planning includes organization, assessment of the situation and demand, and the formulation of strategic goals. This project was initiated by the Hainan Provincial Command of COVID-19 Epidemic Prevention and Control on January 29, 2020. The governor of Hainan Province, as the person in charge, instructed various government departments to cooperate with the Big Data Administration to apply big data technology to COVID-19 epidemic emergency management–related work. That is, the Hainan Provincial Command of COVID-19 Epidemic Prevention and Control played a leading and coordinating role, the Hainan Big Data Administration played a leading role in implementation, and other government departments played cooperative roles. Thus, an integrated and flat organizational structure and coordination mechanism were formed.
The Hainan Big Data Administration immediately assessed the COVID-19 epidemic situation and emergency management measures, coordinated communication with the Command of COVID-19 Epidemic Prevention and Control and other relevant government departments, and organized epidemiologists and big data technicians to formulate an implementation plan. The plan specified how multisource epidemic big data would be used for digital contact tracing and set out achievable data requirements, which were submitted to the Command of COVID-19 Epidemic Prevention and Control. The Command then coordinated the relevant departments, which provided the specified data.
The Hainan Provincial Big Data Administration collected the first batch of data, in Excel format (Microsoft Corporation), from the Command of COVID-19 Epidemic Prevention and Control, the Public Security Department, the Health Commission, communication operators, and other departments on February 10, 2020.
The technicians of the Hainan Provincial Big Data Administration cleaned and merged the data under the guidance of officials and epidemiologists, uploaded all databases to Hainan's big data public service platform, and used the Apache Hive data warehouse tool to manage, extract, query, and analyze the data.
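The cleaning and merging step can be illustrated with a minimal Python sketch. It assumes that every source record carries an ID number usable as a join key, and it uses in-memory dictionaries in place of Hive tables; the field names (`id`, `phone`, `city`), the source names, and the normalization rules are hypothetical, not the Administration's actual schema.

```python
def normalize_id(raw_id):
    """Trim whitespace and upper-case letters so the same ID number matches across sources."""
    return raw_id.strip().upper()


def normalize_phone(raw_phone):
    """Keep digits only, so '138-0000-0000' and '13800000000' compare equal."""
    return "".join(ch for ch in raw_phone if ch.isdigit())


def merge_sources(sources):
    """Merge records from several named sources into one profile per ID number.

    sources: dict mapping a source name to a list of record dicts, each with an 'id' key.
    The first non-empty value seen for a field is kept; later sources only fill gaps.
    Each profile also records which sources contributed to it.
    """
    merged = {}
    for source_name, records in sources.items():
        for record in records:
            key = normalize_id(record["id"])
            profile = merged.setdefault(key, {"id": key, "sources": set()})
            profile["sources"].add(source_name)
            for field, value in record.items():
                if field == "id" or value in (None, ""):
                    continue  # skip the join key and empty cells
                if field == "phone":
                    value = normalize_phone(value)
                profile.setdefault(field, value)
    return merged


# Hypothetical records from two of the source databases.
profiles = merge_sources({
    "hospital_fever": [{"id": " id-0001 ", "name": "Person A", "phone": "138-0000-0000"}],
    "phone_signaling": [
        {"id": "ID-0001", "phone": "13800000000", "city": "Haikou"},
        {"id": "ID-0002", "phone": ""},
    ],
})
```

Keeping the first non-empty value and recording contributing sources is one simple conflict policy; a real pipeline would also need rules for genuinely conflicting values.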
The data technicians of the Hainan Provincial Big Data Administration retrieved data from Hainan's big data public service platform; used the Neo4j graph database to perform association analysis on key populations, vehicles, and public places; and then used ECharts to visualize the analysis results and determine contacts and high-risk public places.
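The association analysis in this step can be illustrated with a plain-Python sketch of a co-presence graph. It is not the Neo4j implementation used in Hainan: it assumes hypothetical `(person_id, location_id, time_slot)` records, where a location is a community, shopping mall, private car, train, or flight, and it links two individuals whenever they share a location in the same time slot.

```python
from collections import defaultdict
from itertools import combinations


def build_copresence_graph(visits):
    """Link individuals who appeared at the same location in the same time slot.

    visits: iterable of (person_id, location_id, time_slot) tuples; time_slot is any
    hashable discretized time, such as a date string.
    Returns {person: {neighbor: set of (location_id, time_slot) they shared}}.
    """
    present = defaultdict(set)  # (location, slot) -> individuals seen there
    for person, location, slot in visits:
        present[(location, slot)].add(person)

    graph = defaultdict(dict)
    for (location, slot), people in present.items():
        # Every pair of individuals at the same place and time becomes an edge.
        for a, b in combinations(sorted(people), 2):
            graph[a].setdefault(b, set()).add((location, slot))
            graph[b].setdefault(a, set()).add((location, slot))
    return dict(graph)


visits = [
    ("P1", "FLIGHT-1", "2020-02-01"),
    ("P2", "FLIGHT-1", "2020-02-01"),
    ("P2", "MALL-A", "2020-02-03"),
    ("P3", "MALL-A", "2020-02-03"),
    ("P4", "MALL-A", "2020-02-04"),  # same mall, different day: no edge
]
graph = build_copresence_graph(visits)
```

Grouping records by location and time slot first keeps that step linear in the number of records; the pairwise step is quadratic only within each location and slot.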
The Hainan Provincial Big Data Administration regularly wrote COVID-19 epidemic contact tracing reports based on the results of data analysis and submitted them to the Command of COVID-19 Epidemic Prevention and Control. The Command promptly released early warning information to the public; strengthened cross-departmental information sharing; and guided the Health Commission, CDC, medical and health institutions, and grassroots residents' committees in implementing measures such as isolation, health monitoring, and nucleic acid testing of contacts. Measures were also implemented to limit the flow of people and to close public places with high risk of infection.
The technicians of the Hainan Provincial Big Data Administration defined confirmed, suspected, and asymptomatic infected individuals, as well as individuals with a history of residence in Hubei Province, as population nodes; private cars, trains, and flights as transportation nodes; and communities and shopping malls as public place nodes. They defined a relationship as appearance in the same public place or vehicle at the same time. Based on these nodes and relationships, Neo4j and its Cypher query language were used to build an association graph to analyze and display the associations among key populations, vehicles, public places, and other key pieces of information. The core algorithm of the graph database used in our research is shown in
Hainan provincial digital contact tracing core algorithm based on a graph database.
Algorithm function | Specific algorithm | Algorithm description |
Tracing the travel companions of confirmed, suspected, and asymptomatic infected individuals | MATCH p=(n:Individual)-[r:sameTransportation]-(n1)-[rr:sameTransportation]-(n2) | “Individual” means the individuals included in the analysis; “sameTransportation” means taking the same mode of transportation at the same time; “Name” means the name of the analyzed objects; “ID” is the ID number, where “ID/ID1/ID2/ID3/ID4/ID...” are the ID numbers of confirmed, suspected, and asymptomatic infected individuals; “phoneNo” is the mobile phone number. |
Tracing individuals who have had contact with at least 2 confirmed infected individuals | MATCH p=(n:Individual)-[r1]->()<-[r2]-(nm)-[r21]->()<-[r22]-(m:Individual) | “confirmedindividual” means a confirmed individual; “sameID” means the same ID number. |
Tracing contacts through transportation modes with high risk of infection | MATCH p=(n:Transportation)-[*..4]-() WHERE n.name='CarNo1' RETURN p; | “Transportation” means the transportation nodes included in the analysis (private car, train, airplane); “CarNo1” is the license plate number of a certain private car; “[*..4]” matches paths of 1 to 4 relationships of any type. |
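To make the three query patterns concrete, the following Python sketch reproduces their traversal logic on small in-memory graphs. It approximates the Cypher queries rather than reproducing them: the helper names, the adjacency-dictionary representation, and the rule that second-degree companions exclude first-degree ones are assumptions made for illustration.

```python
from collections import deque


def add_edge(adj, a, b, rel):
    """Store an undirected, typed relationship in an adjacency dict."""
    adj.setdefault(a, {})[b] = rel
    adj.setdefault(b, {})[a] = rel


def travel_companions(adj, case):
    """Approximate (n)-[:sameTransportation]-(n1)-[:sameTransportation]-(n2) from one case.

    Returns (first-degree companions, second-degree companions); second-degree
    companions exclude the case itself and anyone already first-degree.
    """
    first = {n1 for n1, rel in adj.get(case, {}).items() if rel == "sameTransportation"}
    second = set()
    for n1 in first:
        for n2, rel in adj.get(n1, {}).items():
            if rel == "sameTransportation" and n2 != case and n2 not in first:
                second.add(n2)
    return first, second


def linked_to_multiple_cases(visits, cases, min_cases=2):
    """Approximate (n)-->()<--(nm)-->()<--(m): non-cases sharing places with >= min_cases cases.

    visits: {person: set of place or vehicle IDs}; cases: set of confirmed-case IDs.
    """
    cases_at = {}  # place -> confirmed cases seen there
    for case in cases:
        for place in visits.get(case, ()):
            cases_at.setdefault(place, set()).add(case)
    result = {}
    for person, places in visits.items():
        if person in cases:
            continue
        linked = set()
        for place in places:
            linked |= cases_at.get(place, set())
        if len(linked) >= min_cases:
            result[person] = linked
    return result


def within_hops(adj, start, max_hops=4):
    """Approximate (n)-[*..4]-() for one start node: nodes 1..max_hops hops away, with distances."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nbr in adj.get(node, {}):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    del dist[start]
    return dist
```

Breadth-first search returns each reachable node once with its shortest hop count, whereas the Cypher query returns every matching path; for listing contacts to follow up, the node set is usually what matters.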
The representative analysis results of the graph database algorithm are shown in
Contact tracing based on the algorithm of the Neo4j software package for the graph database: (A) contacts associated with travel, (B) contacts associated with at least 2 confirmed cases, and (C) contacts associated with private cars.
The Hainan Provincial Big Data Administration applied ECharts data visualization tools to compute and visualize the distribution of the population with a residence history in Hubei Province in all cities of Hainan Province. As shown in
The Hainan Provincial Big Data Administration applied the graph database algorithm to rank shopping malls and communities by how frequently confirmed, suspected, and asymptomatic infected individuals and their contacts appeared in them. The results were used to infer and predict public places with high risk of infection.
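A frequency ranking of this kind can be sketched with Python's `collections.Counter`. The sketch assumes hypothetical `(person_id, place_id)` appearance records and counts every appearance by a flagged individual; weighting by the number of distinct individuals or by visit duration would be equally plausible choices and may be closer to what was done in practice.

```python
from collections import Counter


def rank_places(appearances, flagged_people, top_n=None):
    """Rank public places by how often flagged individuals appeared there.

    appearances: iterable of (person_id, place_id) records, one per appearance.
    flagged_people: set of IDs for confirmed, suspected, and asymptomatic cases
    and their contacts. Returns (place_id, count) pairs, most frequent first.
    """
    counts = Counter(place for person, place in appearances if person in flagged_people)
    return counts.most_common(top_n)


appearances = [
    ("C1", "MALL-A"), ("C1", "MALL-A"), ("C2", "MALL-A"),
    ("C2", "COMM-1"), ("K1", "COMM-2"), ("K1", "COMM-2"),
    ("X9", "COMM-3"),  # not flagged, so ignored
]
ranking = rank_places(appearances, {"C1", "C2", "K1"})
# [('MALL-A', 3), ('COMM-2', 2), ('COMM-1', 1)]
```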
Distribution of the population of residents of Hainan Province who have a history of residence in Hubei Province.
The Hainan Provincial Big Data Administration used ECharts to visualize the basic data and analysis results and also designed an interactive operating system that can be accessed through a web page on a browser. The system interface is shown in
Data visualization and interactive system interface designed with ECharts.
The Hainan Provincial Big Data Administration identified 61,439 analysis objects from hundreds of thousands of records and identified 10,871 contacts. The authorities isolated, transported, and monitored these contacts. Digital contact tracing in Hainan Province identified 378 individuals with the highest infection risk (that is, those with the highest numbers of contacts and exposures), including 106 close contacts and 154 second-degree contacts who traveled on the same vehicles, 110 contacts within the same communities, and 8 contacts within the same malls. The Hainan Provincial Big Data Administration also identified 6 high-risk communities and a number of high-risk shopping malls. On the basis of the list of highest-risk individuals and high-risk shopping malls and communities, the Command of Hainan Provincial COVID-19 Epidemic Prevention and Control directed health committees, the CDC, medical and health institutions, grassroots governments, and community residents' committees to take mandatory measures (such as nucleic acid testing, isolation, and intensive treatment) for the highest-risk individuals and to implement emergency shutdowns of high-risk public places. As a result, digital contact tracing identified a patient who had not been detected by traditional contact tracing, providing vital information to help curb the spread of the COVID-19 epidemic at its source.
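The four subgroup counts reported above sum exactly to the 378 highest-risk individuals, which is consistent with each individual being counted in a single subgroup:

```latex
\underbrace{106}_{\text{close, same vehicle}}
+ \underbrace{154}_{\text{second-degree, same vehicle}}
+ \underbrace{110}_{\text{same community}}
+ \underbrace{8}_{\text{same mall}}
= 378
```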
We summarized the organizational process, technical process, and main achievements of the graph database algorithm in tracing the contacts of COVID-19 cases in Hainan Province, China. This approach has practical importance in overcoming the limitations of traditional emergency management measures and existing digital contact tracing methods. From the perspective of the organizational process, our research found that Hainan Province formed a scientific and effective organizational structure and operating mechanism for digital contact tracing during the COVID-19 epidemic. The success of the organizational process is attributed to (1) the establishment of a special high-level administrative leadership agency to coordinate the entire process, namely, the Hainan Provincial Command of COVID-19 Epidemic Prevention and Control; (2) the clarification of the rights and obligations of relevant emergency management agencies in the digital contact tracing of the COVID-19 epidemic; (3) the formation of a flat and integrated organizational structure and operating mechanism for multiple agencies to communicate effectively in a timely manner and achieve collaborative governance; (4) the establishment of a dedicated big data management department to provide technical support required for data lifecycle management; and (5) the establishment of a government big data public service platform that supports the storage, retrieval, and analysis of multisource data according to the centralized model, and the application of the platform in digital contact tracing during the COVID-19 epidemic.
From the perspective of the technical process, our research is based on the use of the Neo4j graph database algorithm and ECharts data visualization tool to mine and analyze multisource COVID-19 epidemic big data. Hidden contacts and public places associated with confirmed patients, suspected patients, and asymptomatic infections were discovered and traced so that contacts and high-risk public places could be accurately identified. The results show that the digital contact tracing in Hainan Province is compatible with multisource heterogeneous epidemic big data; it can quickly and accurately find close, second-degree, or third-degree contacts, and it can identify public places with high risk of infection. Visualization technology is of great importance in optimizing public decision-making [
Our research is expected to overcome the main difficulties faced in emergency management of the COVID-19 epidemic in different countries and regions due to their unique political, cultural, and civic concepts [
The main limitation of our research is the limited coverage and incompleteness of the COVID-19 epidemic data. As a result, high-risk infected individuals and their contacts could not be traced across the entire population of Hainan Province. The participants in digital contact tracing in Hainan Province were limited to people with a history of residence in Hubei Province, which greatly reduced the scope of digital contact tracing. In addition, WeChat positioning–based GPS data, UnionPay consumption data, and railway and flight passenger information, which can accurately reflect residents' trajectories, are owned by enterprises. To protect their users' privacy, these enterprises did not agree to provide users' spatiotemporal trajectory data to the Hainan Provincial Big Data Administration.
Hainan Province needs to further improve the accuracy of digital contact tracing using a graph database algorithm. Authorities need to adopt more comprehensive, accurate, and dynamic spatiotemporal trajectory data on the population to combat the epidemic; such data are key to improving the accuracy of digital contact tracing. In addition, when confirmed, suspected, and asymptomatic infected individuals come into contact with the general population, the distance and duration of exposure are the main indicators for assessing the risk of cross-infection [
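One way to incorporate distance and duration is to accumulate, for each contact, the time spent within a distance threshold of an infected individual. The Python sketch below does this; the input format, the function name, and the 2-meter and 15-minute defaults are hypothetical illustrations and are not taken from the Hainan system.

```python
from collections import defaultdict


def close_contacts(exposures, max_distance_m=2.0, min_minutes=15.0):
    """Return contacts whose cumulative time within max_distance_m reaches min_minutes.

    exposures: iterable of (contact_id, distance_m, minutes) tuples, one per encounter
    with an infected individual. Thresholds are illustrative defaults only.
    Returns {contact_id: total minutes spent within the distance threshold}.
    """
    near_minutes = defaultdict(float)
    for contact_id, distance_m, minutes in exposures:
        if distance_m <= max_distance_m:
            near_minutes[contact_id] += minutes  # only nearby time counts
    return {cid: total for cid, total in near_minutes.items() if total >= min_minutes}


exposures = [
    ("A", 1.0, 10), ("A", 1.5, 6),  # two short, close encounters add up
    ("B", 5.0, 60),                 # long but distant
    ("C", 0.5, 5),                  # close but brief
]
flagged = close_contacts(exposures)
```

Summing across encounters, rather than judging each encounter alone, captures repeated short exposures that a per-encounter rule would miss.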
Furthermore, security issues in the digital contact tracing of epidemics based on a digital government platform under the centralized model need to be properly resolved in two respects: governance and technology. In terms of governance, the laws, systems, and data management structures that govern how the government obtains citizens' personal information in public health emergencies must be improved. Data-sharing standards, authority management, and data security management in epidemic data analysis and application should be clarified, as they form the basis on which the government can use data to respond to epidemics while ensuring citizens' privacy. In addition, the rules for the government's use of data from enterprises, social organizations, and other nongovernmental organizations to respond to the epidemic need to be further clarified and improved. In terms of technology, data security technologies, such as cryptographic methods, data encryption, system vulnerability monitoring and repair, antivirus protection, and automatic data deletion, must be comprehensively used in the digital platform for epidemic emergency management.
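As one concrete example of the cryptographic methods mentioned above, ID numbers could be pseudonymized with a keyed hash (HMAC-SHA256) before records reach analysts, so that records can still be linked across databases without exposing raw identifiers. This technique is not described in the Hainan case; the function name and the key-handling arrangement below are assumptions.

```python
import hashlib
import hmac
import secrets


def pseudonymize(id_number: str, key: bytes) -> str:
    """Replace an ID number with a keyed SHA-256 hash (hex string).

    The same ID and key always yield the same token, so records from different
    databases can still be joined; without the key, tokens cannot be recomputed.
    """
    return hmac.new(key, id_number.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical: the key is generated and held by the data custodian, not by analysts.
CUSTODIAN_KEY = secrets.token_bytes(32)
token = pseudonymize("ID-0001", CUSTODIAN_KEY)
```

Unlike encryption, a keyed hash cannot be decrypted; only the key holder can recompute the token for a known ID number and thereby re-identify a record when follow-up is required.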
Finally, the methods of collecting epidemic data were not sufficiently intelligent. For example, epidemiological investigation data were mainly collected by health personnel through face-to-face interviews with the cooperation of community residents' committees, and these data were manually entered into the information system by the staff of the CDC or medical institutions. This data collection method was relatively inefficient and required substantial staffing. Furthermore, onsite investigation by staff increases the risk of cross-infection. Therefore, more intelligent means of data collection need to be developed for digital contact tracing. Authorities should follow the development trend of governance in the digital age; develop an epidemic emergency management big data platform with the aid of governmental big data platforms; and make full use of cloud transmission, cloud storage, and cloud computing technology to collect epidemic data in real time from the transaction information systems of public and private agencies through network interfaces, replacing the traditional methods of collecting multisource epidemic data.
Hainan Province, China, was selected as the case for this study. We summarized how Hainan Province used a graph database algorithm and ECharts visualization tools under a centralized model to trace contacts and identify high-risk public places during the emergency management of the COVID-19 epidemic. The organizational and technical processes of this case can help government agencies implement precise emergency management measures and provide agile, evidence-based decision support that improves the responsiveness of the public health emergency response system. The organizational arrangements and technology described in our research could be applied in other countries and regions, and at different stages of the COVID-19 epidemic. Future research should focus on solving the problems of insufficient data coverage, insufficient contact tracing accuracy, insufficiently intelligent data collection methods, security concerns, and the limited real-time performance of data sharing caused by the tradeoff between citizen privacy and public health security, in order to optimize digital contact tracing.
QR: quick response
Wi-Fi: Wireless Fidelity
CDC: Center for Disease Control and Prevention
The authors would like to thank the National Natural Science Foundation of China, the National Social Science Fund of China and the Fundamental Research Funds for the Central Universities for funding this study; we also thank all the officers and professionals from the Hainan Provincial Government who were involved in this study. We also acknowledge the reviewers and editors of the Journal of Medical Internet Research for improving this study. This work was supported by the National Natural Science Foundation of China (grant numbers: 71734002, 72042016), the National Social Science Fund of China (grant number: 20ZDA038), and the Fundamental Research Funds for the Central Universities, HUST (grant numbers: 2020kfyXGYJ022, 2020JYCXJJ035). The funders had no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.
ZM contributed to the conceptualization, methodology, and writing of the original draft. HY contributed to the data curation, investigation, visualization, and writing of the original draft. QZ was involved in the investigation, methodology, writing, reviewing, and editing. WZ was involved in the writing, reviewing, and project administration. YD contributed to the writing and editing and supervised the study. All authors revised the manuscript.
None declared.