AI Adoption in Healthcare Requires Better Approaches to Patient Data

The following is a guest article by Vanessa Braunstein, Healthcare Product Marketing Lead at NVIDIA.

Building great AI models in healthcare and life sciences requires large volumes of data that are diverse, well labeled, and span different patient types.

However, as AI gains traction, a number of bottlenecks still slow the development of robust AI models: patient privacy concerns, limited access to data, and a lack of clinical expertise to annotate data for training.

To overcome these barriers, data scientists and developers have created new solutions such as federated learning paradigms, AI models that require less labeled data to reach state-of-the-art performance, and AI models that generate synthetic clinical data, which can be used to understand disease progression across age, gender, and ethnicity.

Considerations in Ethics and Governance

Europe’s landmark GDPR is a template for healthcare AI regulation, but governments will need to go much further. Considerations include educating patients on how their data is de-identified, stored, and used to build AI models. Patients would feel more at ease if they understood both how their data is secured and the role it plays in improving patient care and treatment.

An example of AI’s benefit is in colonoscopy. A recent study led by the Mayo Clinic found that AI reduced the miss rate of precancerous polyps in colorectal cancer screening. The AI-based colonoscopy detected more polyps that were smaller, flatter, and located in the proximal and distal colon. Colonoscopy plays an important role in colorectal cancer prevention because polyps can be detected early with proper screening. The disease is almost entirely preventable when caught early, and screening starts at age 50 for both men and women.

Each algorithm use case needs to factor in how the AI will be used and whether the training data reflects the diversity of the population. Healthcare institutions must make ethics a key initiative, because this work takes time; it should proceed in parallel with algorithm development so that neither is neglected.

The Tribulations of Labeling Clinical Data

In healthcare, data needs to be annotated by experts in the area, such as radiologists, pathologists, and other clinical domain experts. The cost of hiring experts, plus the time it takes to annotate a large volume of data, is substantial and a hurdle to developing models quickly. As the amount of hospital data continues to skyrocket, there needs to be a better way to put this data to use in building valuable AI models.

Two advances in AI are helping. The first is the development of tools that help clinicians label data right in their environment as they work. The labeling tool is integrated into their PACS and automatically labels the other slices in a study, such as a CT study in radiology: the clinical annotator simply clicks on the area of interest in one image, and the remaining slices are labeled automatically.
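To make the idea concrete, here is a minimal sketch of how a single click could be propagated into a 3D label across neighboring CT slices. This is a hypothetical illustration, not the actual algorithm inside any PACS tool; it grows a region from the clicked voxel by flood-filling connected voxels of similar intensity, including voxels on adjacent slices.

```python
from collections import deque

def propagate_label(volume, seed, threshold):
    """Grow a 3D label from one clicked voxel.

    volume: nested lists [slice][row][col] of voxel intensities
    seed: (z, y, x) voxel the annotator clicked
    threshold: include connected neighbors with intensity >= threshold
    Returns the set of labeled (z, y, x) voxels across all slices.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    labeled = {seed}
    queue = deque([seed])
    # 6-connected neighborhood: in-slice neighbors plus adjacent slices
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in steps:
            zz, yy, xx = z + dz, y + dy, x + dx
            if (0 <= zz < nz and 0 <= yy < ny and 0 <= xx < nx
                    and (zz, yy, xx) not in labeled
                    and volume[zz][yy][xx] >= threshold):
                labeled.add((zz, yy, xx))
                queue.append((zz, yy, xx))
    return labeled
```

Because the region grows across the slice axis as well as within each slice, one click in one image yields labels on every slice the structure touches, which is the annotation savings the paragraph above describes.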

The second advance is in the types of AI models being created. Many typical architectures, such as convolutional neural networks (CNNs), require vast amounts of labeled data for pretraining and training. In contrast, transformer-based architectures ingest vast amounts of unlabeled data during pretraining and need only a small amount of labeled data afterward. This reduces the time and money needed to label, say, 5,000 CT studies, without compromising model performance. The approach has been shown to work well in natural language processing models that read pathology reports, patient medical records, and adverse event reports, as well as in medical imaging and in drug discovery to identify therapeutics for diseases.
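The two-phase workflow can be sketched in miniature. The toy below stands in for a transformer: its "pretraining" learns only word statistics (IDF weights) from unlabeled reports, and its "fine-tuning" builds a classifier from just a couple of labeled examples. The functions, corpus, and labels are all invented for illustration; a real transformer learns far richer representations, but the division of labor — no labels needed for the big phase, few labels for the small one — is the same.

```python
import math
from collections import Counter

def pretrain(unlabeled_docs):
    """'Pretraining' stand-in: learn IDF weights from unlabeled text.
    This phase needs no labels at all."""
    n = len(unlabeled_docs)
    df = Counter()
    for doc in unlabeled_docs:
        df.update(set(doc.split()))
    return {w: math.log(n / df[w]) + 1.0 for w in df}

def embed(doc, idf):
    """Represent a document using the pretrained weights."""
    counts = Counter(doc.split())
    return {w: c * idf.get(w, 0.0) for w, c in counts.items()}

def fine_tune(labeled_docs, idf):
    """'Fine-tuning' stand-in: average the embeddings of a handful of
    labeled examples into one centroid per class."""
    centroids = {}
    for doc, label in labeled_docs:
        agg = centroids.setdefault(label, Counter())
        for w, v in embed(doc, idf).items():
            agg[w] += v
    return centroids

def predict(doc, centroids, idf):
    """Classify by cosine similarity to the nearest class centroid."""
    vec = embed(doc, idf)
    def cosine(a, b):
        dot = sum(a[w] * b.get(w, 0.0) for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))
```

Here, thousands of unlabeled reports could feed `pretrain` for free, while an expert only needs to label the few examples passed to `fine_tune` — the cost structure the paragraph above describes.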

Federated Learning to Build More Robust AI Models

Federated learning is a privacy-preserving technique that shares model weights rather than the data itself. Patient data stays inside each institution, while the exchanged weights let a model train on diverse data across institutions, geographies, patient types, and instrument types. The net result is an AI model that is much more generalizable and works at various institutions, even ones whose data it was never trained on.
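A minimal sketch of one federated averaging round makes the weights-not-data point concrete. Each "hospital" below takes a local gradient step on a toy linear model using data that never leaves the site, and only the resulting weights are pooled into the shared model. The model, data, and learning rate are illustrative assumptions, not any production federated system.

```python
def local_step(weights, data, lr=0.1):
    """One round of local training at a single hospital: a gradient
    step for a linear model y ~ w.x, on data that stays on-site."""
    grad = [0.0] * len(weights)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / len(data)
    return [w - lr * g for w, g in zip(weights, grad)]

def federated_round(global_weights, site_datasets, lr=0.1):
    """Federated averaging: every site trains locally, then only the
    weights are averaged -- no patient records are ever exchanged."""
    local = [local_step(list(global_weights), data, lr)
             for data in site_datasets]
    return [sum(ws) / len(ws) for ws in zip(*local)]
```

Running repeated rounds over, say, two sites whose data follows the same underlying relationship drives the shared weights toward a model that fits both — the generalizability benefit described above, achieved without either site seeing the other's records.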

An example of a federated learning approach is the EXAM model, a global collaboration across 20 hospitals to build an AI model that predicts the oxygen needs of a COVID-19 patient in an emergency room. Researchers at individual hospitals around the world used data from chest X-rays, patient vitals, and lab reports to train the model in a privacy-preserving federated paradigm. In just two weeks, the model achieved a 0.94 area under the curve (AUC). Federated learning matters not only for building AI models that can be used across institutions, but also for easing the data scarcity that arises when training models for rare diseases or pediatrics. Often a single hospital does not have enough data to build its own model, so working across institutions provides a great solution.

Institutions like the American College of Radiology are utilizing federated learning to bring AI to radiology images for breast cancer and COVID-19 applications.

Synthetic Data to Examine Disease Trends

Synthetic data has become practically indistinguishable from real data and offers a great opportunity to examine disease trends across various patient types without raising patient privacy issues. An initial AI model is trained on actual medical images or patient clinical notes; its output is synthetic data with no patient information attached. Clinical teams and researchers then have a corpus of synthetic data that can be shared to study diseases.
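The train-on-real, emit-synthetic pipeline can be illustrated with a deliberately simplified example. The sketch below fits per-field value frequencies from a handful of invented tabular records and samples new records from them; real systems such as the generative models behind synthetic MRI images learn far more structure (and add privacy safeguards), but the flow is the same: the real records go in once, and only statistically similar fabricated records come out.

```python
import random
from collections import Counter

def fit_marginals(records):
    """'Training' stand-in: learn per-field value frequencies from real
    records. (Production systems use generative models such as GANs;
    this toy samples each field independently from its marginal.)"""
    fields = records[0].keys()
    return {f: Counter(r[f] for r in records) for f in fields}

def sample_synthetic(marginals, n, rng=random):
    """Emit n synthetic records; none corresponds to a real patient."""
    out = []
    for _ in range(n):
        rec = {}
        for field, counts in marginals.items():
            values = list(counts.keys())
            weights = list(counts.values())
            rec[field] = rng.choices(values, weights=weights, k=1)[0]
        out.append(rec)
    return out
```

One caveat this toy makes visible: sampling fields independently destroys cross-field correlations, and naive approaches can still leak rare values, which is why production synthetic-data systems pair generation with privacy and fidelity checks.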

For example, King’s College London and the London AI Centre are making 100,000 synthetic MRI brain images available free to healthcare researchers to better understand dementia, aging, and other brain diseases. Lack of data is no longer an obstacle, since the synthetic data is representative, freely available, and by construction not associated with any real patients. AI researchers can now build models of their own as well.

Another example of synthetic data comes from the University of Florida’s academic health center, UF Health, which developed SynGatorTron. This AI model generates synthetic patient data and profiles that can be used to train AI models for many healthcare applications; it contains no identifying patient information and is freely available to researchers.

Delivering the Promise of AI

To drive wide adoption of AI in healthcare, collaboration and coordination across governments, industry, academia, and technology companies are needed to address the challenges with data: how to get it, how to label it, how to educate patients, and how to preserve patient confidentiality.

The smart hospitals of the future, the robots performing surgery from a remote location, the virtual assistants monitoring patient safety, and more, all require a medical ecosystem that supports the development of AI.

These tools will start to accelerate AI past the challenges, democratizing AI to deliver the future of healthcare that patients deserve.

About Vanessa Braunstein

Vanessa Braunstein leads healthcare and life science product marketing at NVIDIA for Clara products in drug discovery, genomics, medical imaging, medical devices, NLP, and smart hospitals. Previously, she was in product development, business development and marketing for radiology, genomics, pharmaceutical, chemistry, and bioinformatics companies using AI. She studied molecular and cell biology, public health, and business at UC Berkeley and UCLA.
