Forming a COVID-19 Map From Diverse Data Sources and Machine Learning

Who could benefit from a geographical map that ranks counties by the risk of residents getting COVID-19? Certainly, such detail is valuable to public health agencies, but also to municipal governments, hospitals, clinics, and even private businesses.

Japan illustrates the powerful benefits of identifying geographic clusters, which it did early enough in the spread of COVID-19 to avoid a total lockdown of the country and the economic devastation that would accompany it. Taiwan implemented a cluster-based approach too, avoiding a lockdown, and Vietnam needed a national lockdown for only 15 days. Of course, other measures were instituted in these countries too (controls over people entering, masks, etc.), but this article focuses on how geographic distinctions can help regions provide locally appropriate responses. Laws that prohibit targeted measures by local authorities violate common sense and the facts about public health.

In the United States, we have seen enormous disparities in the rate of infection and death due to COVID-19. Typical barriers to care arise from race and economic status. These injustices can be viewed geographically. For instance, a graph in an L.A. Times article shows consistently large gaps between neighborhoods in percentages of COVID-19 infection. Massachusetts researchers found that one city, “where two-thirds of residents identify as Latino, have a COVID-19 infection rate six times higher than the state average.”

Businesses can also “titrate” their responses to the pandemic based on local information. For instance, a restaurant can selectively open or close its indoor seating based on the severity of infections. Businesses can demand masks, just as they keep people from coming in barefoot or bringing in guns.

Naturally, geographic hotspotting is helpful for more than pandemics. For instance, a hospital or city might use information on the risk of disease to open a new clinic or send a visiting nurse to local community centers.

Experian Health provides a map of people at high risk for developing COVID-19, based on a wide range of data. I talked recently about this interactive heat map, called the COVID Outlook & Response Evaluator Model (CORE), with Karly Rowe, Vice President Patient Access, Identity, and Care Management Product at Experian Health.

Some of the data is medical, measuring the incidence of pre-existing conditions (chronic lung disease, immunocompromised state, obesity, etc.) based on CDC guidelines. Experian also measured demographic indicators such as age, gender, income, race, and ethnicity, and social indicators such as mobility, household density, and use of public transportation.

Data came from Experian’s own collections, other companies, and public data sources. A few examples of the input for the map are Experian’s Census Area Projections and Estimates (CAPE) attributes, USAFacts COVID case listings, the open source OpenStreetMap, and healthcare data on pre-existing conditions.

Experian also took in billions of location records from mobile devices, along with consumer attributes for 130 million households in more than 3,000 U.S. counties. For each county, Experian accumulated approximately 1 million records each month.

Experian derives attributes from mobile sighting data and healthcare data to capture the social behavior and health risks at a county level. They then use these attributes, along with COVID data and other public information, in a machine learning model to predict the COVID risk level in each county.
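To make that two-step flow concrete, here is a minimal sketch in Python. It is not Experian's model, whose features and algorithm the article does not describe. The sample records, the two features (average trips per day and share of residents with a chronic condition), the labels, and the choice of logistic regression are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical record-level inputs: (county, trips_per_day, has_chronic_condition)
records = [
    ("A", 6.0, 1), ("A", 5.0, 1), ("A", 7.0, 0),
    ("B", 1.0, 0), ("B", 2.0, 0), ("B", 1.5, 1),
    ("C", 5.5, 1), ("C", 6.5, 1),
    ("D", 1.0, 0), ("D", 0.5, 0),
]
# Hypothetical training labels: 1 if the county later saw a high case rate.
labels = {"A": 1, "B": 0, "C": 1, "D": 0}


def county_features(rows):
    """Step 1: average record-level values into one feature vector per county."""
    grouped = defaultdict(list)
    for county, trips, chronic in rows:
        grouped[county].append((trips, chronic))
    return {
        county: [
            sum(v[0] for v in vals) / len(vals),  # mean trips per day
            sum(v[1] for v in vals) / len(vals),  # share with a chronic condition
        ]
        for county, vals in grouped.items()
    }


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Step 2: fit logistic regression weights with batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_risk(features, w, b):
    """Return a risk score between 0 and 1 for each county."""
    return {
        county: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        for county, x in features.items()
    }


features = county_features(records)
counties = sorted(features)
w, b = fit_logistic([features[c] for c in counties], [labels[c] for c in counties])
risk = predict_risk(features, w, b)
ranking = sorted(risk, key=risk.get, reverse=True)  # highest-risk counties first
```

The `ranking` list is the kind of ordering a heat map could color by. A production system would use far more features, held-out validation, and probably a different model family.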

Because the map is updated constantly, governments and clinicians can apply its information to decisions and make course corrections. To prevent the re-identification of residents, Experian removed counties with low populations. Several healthcare and government organizations have used CORE to plan reopenings, allocate personnel and equipment, and procure funds from the CARES Act.
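Dropping low-population counties is a common form of small-cell suppression. The sketch below shows the idea in Python. The cutoff of 20,000 residents, the function name, and the sample data are hypothetical, since the article does not state the threshold Experian applied.

```python
# Hypothetical cutoff; the article does not state the threshold Experian used.
MIN_POPULATION = 20_000


def suppress_small_counties(risk_by_county, population_by_county,
                            min_population=MIN_POPULATION):
    """Keep only counties whose population meets the threshold.

    Counties with no known population are treated as too small and dropped.
    """
    return {
        county: score
        for county, score in risk_by_county.items()
        if population_by_county.get(county, 0) >= min_population
    }


# Made-up example data for three counties.
scores = {"Alpha": 0.82, "Beta": 0.35, "Gamma": 0.61}
populations = {"Alpha": 250_000, "Beta": 4_200, "Gamma": 58_000}

published = suppress_small_counties(scores, populations)
```

Here the county "Beta" is withheld from the published map because its 4,200 residents fall below the assumed cutoff.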

Rowe says, “As a risk modeling tool, it can help government, nonprofits and healthcare organizations use data and insights to inform communication and outreach strategies aimed at high-risk populations. It can also help them align various social service programs to better serve these people and can even assist in the decision making around allocation of CARES stimulus funds.”

Regarding disparities based on social determinants of health (SDoH), Rowe says, “As the industry moves toward value-based care and risk-based delivery and payment models, socio-economic data beyond the four walls of a doctor’s office are increasingly important to create a 360-degree view of a patient. SDOH-like barriers to transportation, technology or financial means hinder a patient’s ability to follow treatment plans, take medication, or attend important follow-up visits.”

About the author

Andy Oram

Andy is a writer and editor in the computer field. His editorial projects have ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. A correspondent for Healthcare IT Today, Andy also writes often on policy issues related to the Internet and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM (Brussels), DebConf, and LibrePlanet. Andy participates in the Association for Computing Machinery's policy organization, named USTPC, and is on the editorial board of the Linux Professional Institute.

   
