Blood-Red Tape: How Redundant Data Collection Leads to Scandal

Throughout the health care field, clinical and administrative staff complain about the burden of collecting data required by government regulations–often with no idea what purpose the data serves. Many of these regulatory requirements are desperate stabs at filling the gaps left by a lack of data standards and interoperability–this, a decade into the U.S. government’s goal of making data exchange simple and universal in health care.

But now, ill-considered data collection requirements have led to a lurid headline on the front page of the Sunday New York Times on March 14: “Maggots, Rape and Yet Five Stars: How U.S. Ratings of Nursing Homes Mislead the Public.” This extensive examination of the five-star system offered by the Centers for Medicare & Medicaid Services (CMS) was a marathon exercise in big data, where reporters “combed through 373,000 reports by state inspectors and examined financial statements submitted to the government by more than 10,000 nursing homes.”

The results were predictable. When the data self-reported by nursing homes was checked against the facts–hospitalizations, inspection reports–it turned out that a huge number of nursing homes underreported incidents, overstated staffing levels, and made other adjustments to reality. These administrators violated not only the law but the ancient Deuteronomic injunction: “Do not have two differing weights in your bag–one heavy, one light.” Like merchants who buy using one set of weights and sell using another, the nursing homes were gaming the system.

A follow-up article shows how the investigation is leading to legal action.

The one big question that the reporters failed to ask in this long article–and the question we all should ask–is: If the New York Times could get accurate data on nursing home safety, why can’t CMS?

Put another way, why did CMS ignore accurate sources of data and instead force the nursing homes to create another data set–in a process that was ripe for inaccuracy, whether intentional or accidental?

This scandal is just a particularly shocking outcome of a system that has resisted modern data-gathering and analysis for decades. Let’s take a look.

More Time on Forms Than on Patients

It’s well-known that screen time is contributing to doctor burnout. Computers speed up some tasks and prevent some kinds of errors–but spending more than half of their (very long) workday on the computer is clearly not the best use of doctors’ time.

Meaningful Use, over the course of the 2010s, imposed more and more data collection requirements. It has been reported to me that much of the data was already in records, collected during clinical visits, but that the electronic record vendors didn’t bother investing the time to repurpose that data. Instead, dutifully responding to regulations, they added new fields that a clinician or administrator had to fill out.

Meanwhile, the rest of the computer field was striving to make data collection sleeker and more consistent. In finance, retail, airlines, and elsewhere, data was being curated, cleaned, and stored in data lakes. The watchword was “a single source of truth.” For fast retrieval, duplication was sometimes necessary, but it was done through rigorously defined flows–a completely different world from the haphazard data storage practices of health care.

The nursing home report in the New York Times should cause us all to re-examine not just the CMS rating system, but attitudes toward data collection and storage throughout the health care system. An example of forward-looking thinking is a recent recommendation for more centralized data collection on COVID-19 vaccinations.

Privacy Concerns

Of course, when health care reformers call on institutions to share data more freely, the old guard puts up a privacy shield. I’m not referring to the devious “privacy shield” that the European Union recently discarded in its dealings with U.S. companies, but a shield to protect hospitals, insurers, and others who want to hoard data. Business secrets are another oft-used parry against sharing data.

And it’s true that patient data is hard to protect. Professional, sophisticated anonymization/deidentification techniques work. The experts know how to create data sets in which the risk of reidentifying a patient is negligible (though never totally zero). But this expertise is in short supply.

HIPAA provided a “safe harbor” for deidentifying patient data: a checklist of 18 identifiers to remove. In 1996, when the law was passed, compute power was less available and anonymization techniques were less sophisticated, so perhaps simple steps like removing fields were the best recourse available. The safe harbor has long been understood to be too lax in some ways and too strict in others.
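Safe-harbor deidentification of this sort amounts to little more than dropping a fixed list of fields. A minimal sketch in Python, with illustrative field names (the real safe harbor enumerates 18 categories of identifiers, such as names, geographic subdivisions smaller than a state, and Social Security numbers):

```python
# A crude safe-harbor-style pass: drop any field whose name appears on a
# fixed blocklist of identifiers. Field names here are hypothetical.
SAFE_HARBOR_FIELDS = {"name", "street_address", "phone", "email", "ssn"}

def strip_identifiers(record):
    """Return a copy of the record with blocklisted fields removed."""
    return {k: v for k, v in record.items() if k not in SAFE_HARBOR_FIELDS}

patient = {"name": "Jane Doe", "ssn": "000-00-0000", "diagnosis": "asthma"}
deidentified = strip_identifiers(patient)  # only "diagnosis" survives
```

The weakness is exactly what the article notes: the blocklist is static, so it removes too much from some data sets and leaves reidentification risk in others.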

Good data anonymization currently includes analyzing a population and determining which values are relatively rare in each field. You have to be more careful to fuzz the data for people with a rare disease than people with a common condition. The calculation is different for each data set, meaning it has to be calculated by each institution for its particular data. And it must be recalculated periodically, because data sets change and attacks on data change.
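The rarity analysis described above can be sketched in a few lines of Python: count how often each value occurs in a field, then suppress (or fuzz) values that fall below a threshold, since a rare diagnosis can single out a patient. This is a simplified, k-anonymity-style suppression, not any particular institution’s method, and the data is invented:

```python
from collections import Counter

def suppress_rare_values(records, field, min_count=5):
    """Replace values appearing fewer than min_count times in `field`
    with a generic placeholder, so rare attributes cannot single out
    an individual. A crude stand-in for real fuzzing techniques."""
    counts = Counter(r[field] for r in records)
    return [
        {**r, field: r[field] if counts[r[field]] >= min_count else "OTHER"}
        for r in records
    ]

# Toy data set: one rare diagnosis among several common ones.
records = [{"diagnosis": "hypertension"}] * 6 + [{"diagnosis": "fabry disease"}]
safe = suppress_rare_values(records, "diagnosis", min_count=5)
```

Note that the threshold is a property of the particular data set, which is why, as the article says, the calculation must be redone per institution and revisited as the data and the attacks evolve.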

But calculations throughout the computer field are getting easier over time. Many businesses that used to assign analytics to programming staff now have tools that let “citizen data scientists” accomplish the tasks. The health care field could standardize tools to help their members create robust anonymized data sets.

Data sharing requires an investment. But how many billions of dollars will we save by freeing clinicians and administrators from redundant, error-prone data entry? How many billions of dollars would we save if patients in unsafe nursing homes had been removed before they got sick?

About the author

Andy Oram

Andy is a writer and editor in the computer field. His editorial projects have ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. A correspondent for Healthcare IT Today, Andy also writes often on policy issues related to the Internet and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM (Brussels), DebConf, and LibrePlanet. Andy participates in the Association for Computing Machinery's policy organization, named USTPC, and is on the editorial board of the Linux Professional Institute.