Accurate Identification of Fatty Liver Disease in Data Warehouse Utilizing Natural Language Processing

Dig Dis Sci. 2017 Oct;62(10):2713-2718. doi: 10.1007/s10620-017-4721-9. Epub 2017 Aug 31.

Abstract

Introduction: Natural language processing is a powerful machine learning technique capable of maximizing data extraction from complex electronic medical records.

Methods: We utilized this technique to develop algorithms capable of "reading" full-text radiology reports to accurately identify the presence of fatty liver disease. Abdominal ultrasound, computerized tomography, and magnetic resonance imaging reports were retrieved from the Veterans Affairs Corporate Data Warehouse from a random national sample of 652 patients. Radiographic fatty liver disease was determined by manual review by two physicians and verified with an expert radiologist. A split validation method was utilized for algorithm development.
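The abstract does not describe the algorithms themselves, so the following is only a minimal illustrative sketch of the general idea: a rule-based check that flags a report when a steatosis term appears without a preceding negation in the same sentence, plus a random split of reports into development and validation halves. The term lists, the function names `mentions_fatty_liver` and `split_reports`, and the 50/50 split are all assumptions made for illustration, not the authors' method.

```python
import random
import re

# Hypothetical term lists; the abstract does not state which terms were used.
STEATOSIS = re.compile(
    r"\b(fatty (liver|infiltration)|hepatic steatosis|steatosis)\b", re.I
)
# A negation cue anywhere earlier in the same sentence (assumed heuristic).
NEGATION = re.compile(r"\b(no|without|negative for)\b[^.]*$", re.I)


def mentions_fatty_liver(report):
    """Return True if any sentence has a non-negated steatosis term."""
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        match = STEATOSIS.search(sentence)
        if match and not NEGATION.search(sentence[:match.start()]):
            return True
    return False


def split_reports(reports, dev_fraction=0.5, seed=0):
    """Randomly split reports into development and validation sets."""
    shuffled = list(reports)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


# Invented example sentences, not text from any real report.
print(mentions_fatty_liver("The liver shows diffuse fatty infiltration."))
print(mentions_fatty_liver("No evidence of hepatic steatosis."))
```

In a split validation design, rules like these would be tuned only on the development half and then scored once on the held-out half against the physician-reviewed labels.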

Results: For all three imaging modalities, the algorithms could identify fatty liver disease with >90% recall and precision, with F-measures >90%.
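For readers less familiar with these metrics, the sketch below shows how recall, precision, and the F-measure (here assumed to be the balanced F1 score, the harmonic mean of the two) are computed from counts of true positives, false positives, and false negatives. The function name `precision_recall_f1` and the counts are hypothetical and are not results from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    # Precision: share of reports flagged by the algorithm that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of truly positive reports that the algorithm flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts chosen for illustration only.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=10)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```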

Discussion: These algorithms could be used to rapidly screen patient records to establish a large cohort to facilitate epidemiological and clinical studies and examine the clinical course and outcomes of patients with radiographic hepatic steatosis.

Keywords: Electronic health records; Epidemiology; Fatty liver; Natural language processing; Nonalcoholic fatty liver disease; Triglycerides.

MeSH terms

  • Algorithms
  • Data Mining / methods*
  • Databases, Factual*
  • Electronic Health Records*
  • Fatty Liver / diagnostic imaging*
  • Fatty Liver / epidemiology
  • Fatty Liver / therapy
  • Humans
  • Magnetic Resonance Imaging*
  • Natural Language Processing*
  • Predictive Value of Tests
  • Prognosis
  • Tomography, X-Ray Computed*
  • Ultrasonography*
  • United States / epidemiology
  • United States Department of Veterans Affairs
  • Veterans Health