How Augmented Intelligence and NLP Can Help Researchers Identify Rare Diseases

By Elizabeth Marshall, MD

“When you hear hoofbeats behind you, think of horses, not zebras.” —Theodore Woodward, University of Maryland School of Medicine professor

Medical students are often taught that when making a diagnosis, they should consider the more commonplace explanation (the horse) before the rare and more exotic disease (the zebra). This is sage advice, but on occasion the hooves we hear do belong to a zebra, and a patient does, in fact, have a rare disease.

As a physician, I have experienced the challenges of diagnosing a rare disease. Often relevant clues about a patient’s condition are hidden in the free text section of a chart note, in an old lab report, or as an unstructured comment within a message in the patient portal. To find the vital clues, a clinician must manually search through a patient’s chart—a time-consuming process that is prone to error. Sometimes critical information is overlooked, and a rare disease is not diagnosed until all other more common diagnoses have been ruled out. Unfortunately, a late diagnosis can make a substantial—even life-or-death—difference in a patient’s outcome.

Defining rare disease

According to the advocacy group Rare Action Network, a rare disease is considered to be any disease, disorder, illness, or condition that affects fewer than 200,000 people in the U.S. An estimated 25 to 30 million Americans—nearly one in 10—have at least one of approximately 7,000 identified rare diseases.

Because of the difficulties diagnosing rare diseases, many healthcare organizations now use artificial intelligence (AI) technologies to aid in the process. AI tools such as natural language processing (NLP) create an augmented workflow that helps users search through unstructured data quickly, even if the information is stored in multiple data sources, such as EHRs, patient portals, or other clinical systems. Using NLP tools, clinicians can rapidly get a more accurate, 360-degree view of an individual patient, or even a population of patients.

Faster diagnosis, better treatment
To treat patients with rare diseases as soon as possible, rapid diagnosis is critical. Important patient information related to signs and symptoms of rare disease can be found in both structured (discrete fields) and unstructured (free text) EHR data, and both are important for diagnosis. NLP plays a key role with unstructured data, enabling clinicians to discover and analyze hidden information that may be essential to diagnosing rare diseases.

By leveraging the right data, NLP-driven analytics can deliver a clearer diagnostic picture not only for individual patients but also for whole disease populations, allowing for the creation or enhancement of rare disease registries.

Precision medicine is advanced, but the manual processes are not
Clinical genomic testing involves the analysis of an individual’s DNA to diagnose disease. At the University of Iowa Stead Family Children’s Hospital, physicians and genetic researchers are advancing their precision medicine efforts and disease diagnosis by utilizing chromosomal microarray (CMA) testing to identify “copy number variants,” including extra (duplicated) or missing (deleted) chromosomal segments. However, research efforts have traditionally been time-consuming and inefficient, relying heavily on manual processes.

With CMA testing, researchers would manually review medical records to determine the clinical relevancy of each copy number variant to a patient’s phenotype. They would search the EHR for specific terms (e.g., abnormally large head, seizures, structural heart defects) to identify any observations that might correlate with a specific copy number variant.
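The term search described above can be illustrated with a few lines of Python. This is only a minimal sketch, not the hospital's workflow or any vendor's product: the term list, the function name `find_phenotype_terms`, and the sample note are all hypothetical, and plain case-insensitive matching ignores the negation, synonyms, and misspellings that a real clinical NLP tool has to handle.

```python
import re

# Hypothetical term list, taken from the examples mentioned in the text.
PHENOTYPE_TERMS = ["abnormally large head", "seizures", "structural heart defects"]


def find_phenotype_terms(note, terms=PHENOTYPE_TERMS):
    """Return the terms that appear in a free-text note (case-insensitive)."""
    found = []
    for term in terms:
        # Word boundaries keep a term from matching inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, note, flags=re.IGNORECASE):
            found.append(term)
    return found


# Invented sample note, for illustration only.
sample_note = "Parent reports two seizures last month. Exam: abnormally large head."
print(find_phenotype_terms(sample_note))
# prints ['abnormally large head', 'seizures']
```

Note that a naive matcher like this would also flag a note that says "no seizures," which is one reason simple keyword search falls short of NLP for chart review.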

CMA test outcomes are classified into three categories: normal, abnormal, or VUS (variant of uncertain clinical significance). These outcomes, VUS in particular, are not always informative, so an additional review of the medical record is often required to determine the relevancy of each copy number variant to the clinical phenotype. The CMA test identifies, on average, 20–30 copy number variants per patient. Even though about half of the variants are benign, a lab technician must review each identified copy number variant. In the past, the technician would manually search through the EHR to determine if a variant was supported by the observed phenotype.

NLP boosts efficiency and output
For researchers at Stead Family Children’s Hospital, the analysis of variants was a very time-consuming and inefficient process, primarily because of the vast amounts of EHR data that had to be searched manually. To reduce this burden, the hospital deployed an NLP text mining solution, using AI to reduce search times and improve search results. After implementing NLP, researchers documented the following:

  • The manual extraction of phenotypes for 100 patients took over 34 hours, compared to just 10 minutes using NLP
  • Stead Family Children’s Hospital ran 700 CMAs; extracting phenotypes for them would have taken nearly 240 hours manually vs. just 1.2 hours with NLP

The bottom line: Phenotype curation with NLP was roughly 200 times faster than manual curation.
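The speedup can be checked directly from the reported figures. The short calculation below is a sketch that uses only the numbers quoted above; the variable names are illustrative, and "over 34 hours" and "nearly 240 hours" are treated as exactly 34 and 240 for the purposes of the arithmetic.

```python
# Figures as reported in the text (rounded values assumed exact here).
manual_hours_100 = 34    # manual extraction for 100 patients
nlp_minutes_100 = 10     # NLP extraction for 100 patients
manual_hours_700 = 240   # estimated manual time for 700 CMAs
nlp_hours_700 = 1.2      # NLP time for 700 CMAs

# Speedup = manual time / NLP time, in matching units.
speedup_100 = manual_hours_100 * 60 / nlp_minutes_100
speedup_700 = manual_hours_700 / nlp_hours_700

# Scaling the 100-patient rates to 700 patients should land near
# the reported 240 hours and 1.2 hours.
scaled_manual_700 = manual_hours_100 * 700 / 100
scaled_nlp_700 = nlp_minutes_100 * 700 / 100 / 60

print(f"Speedup for 100 patients: {speedup_100:.0f}x")
print(f"Speedup for 700 CMAs: {speedup_700:.0f}x")
print(f"Scaled manual estimate: {scaled_manual_700:.0f} hours")
print(f"Scaled NLP estimate: {scaled_nlp_700:.2f} hours")
```

Both sets of figures give a ratio of about 200, and the 100-patient rates scale to roughly 238 hours and 1.17 hours for 700 patients, which agrees with the rounded values in the second bullet.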

At Stead Family Children’s Hospital, as with many other organizations, researchers and clinicians have successfully deployed NLP to deliver significantly faster and more accurate results when analyzing clinical records, as compared to manual chart reviews. For patients with rare diseases, time is often of the essence. AI tools such as NLP can facilitate more rapid and precise research, resulting in a quicker diagnosis and hopefully a better quality of life.

Elizabeth Marshall, MD, is associate director of clinical analytics at Linguamatics, an IQVIA company.