Predicting NAFLD prevalence in the United States using National Health and Nutrition Examination Survey 2017–2018 transient elastography data and application of machine learning

Mazen Noureddin; Fady Ntanios; Deepa Malhotra; Katherine Hoover; Birol Emir; Euan McLeod; Naim Alkhouri

doi:10.1002/hep4.1935

Research output: Contribution to journal › Article › peer-review

32 Scopus citations

Abstract

This cohort analysis investigated the prevalence of nonalcoholic fatty liver disease (NAFLD) and NAFLD with fibrosis at different stages, associated clinical characteristics, and comorbidities in the general United States population and a subpopulation with type 2 diabetes mellitus (T2DM), using the National Health and Nutrition Examination Survey (NHANES) database (2017–2018). Machine learning was explored to predict NAFLD identified by transient elastography (FibroScan®). Adults ≥20 years of age with valid transient elastography measurements were included; those with high alcohol consumption, viral hepatitis, or human immunodeficiency virus were excluded. Controlled attenuation parameter ≥302 dB/m using Youden’s index defined NAFLD; vibration-controlled transient elastography liver stiffness cutoffs were ≤8.2, ≤9.7, ≤13.6, and >13.6 kPa for F0–F1, F2, F3, and F4, respectively. Predictive modeling, using six different machine-learning approaches with demographic and clinical data from NHANES, was applied. Age-adjusted prevalence of NAFLD and of NAFLD with F0–F1 and F2–F4 fibrosis was 25.3%, 18.9%, and 4.4%, respectively, in the overall population and 54.6%, 32.6%, and 18.3% in those with T2DM. The highest prevalence was among Mexican American participants. Test performance for all six machine-learning models was similar (area under the receiver operating characteristic curve, 0.79–0.84). Machine learning using logistic regression identified male sex, hemoglobin A1c, age, and body mass index among significant predictors of NAFLD (P ≤ 0.01). Conclusion: Data show a high prevalence of NAFLD with significant fibrosis (≥F2) in the general United States population, with greater prevalence in participants with T2DM. Using readily available, standard demographic and clinical data, machine-learning models could identify subjects with NAFLD across large data sets.
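The cutoffs quoted in the abstract can be read as a simple two-step rule: the controlled attenuation parameter (CAP) decides whether NAFLD is present, and liver stiffness then assigns a fibrosis stage. The following is a minimal sketch of that reading, not the authors' code. The function and constant names are hypothetical, and it assumes that each stiffness cutoff is an upper bound applied in ascending order (so F2 means above 8.2 and at most 9.7 kPa).

```python
from typing import Optional

# Cutoffs as quoted in the abstract; the names are illustrative only.
CAP_NAFLD_CUTOFF_DB_M = 302.0   # CAP >= 302 dB/m defines NAFLD
F0_F1_MAX_KPA = 8.2             # stiffness <= 8.2 kPa  -> F0-F1
F2_MAX_KPA = 9.7                # stiffness <= 9.7 kPa  -> F2
F3_MAX_KPA = 13.6               # stiffness <= 13.6 kPa -> F3, above -> F4


def has_nafld(cap_db_m: float) -> bool:
    """Return True when CAP meets the NAFLD threshold."""
    return cap_db_m >= CAP_NAFLD_CUTOFF_DB_M


def fibrosis_stage(stiffness_kpa: float) -> str:
    """Map a liver stiffness measurement to a fibrosis stage label.

    Assumption: cutoffs are checked in ascending order, so each stage
    covers the interval above the previous cutoff.
    """
    if stiffness_kpa <= F0_F1_MAX_KPA:
        return "F0-F1"
    if stiffness_kpa <= F2_MAX_KPA:
        return "F2"
    if stiffness_kpa <= F3_MAX_KPA:
        return "F3"
    return "F4"


def classify(cap_db_m: float, stiffness_kpa: float) -> Optional[str]:
    """Return the fibrosis stage for a NAFLD case, or None without NAFLD."""
    if not has_nafld(cap_db_m):
        return None
    return fibrosis_stage(stiffness_kpa)
```

For example, `classify(310.0, 7.0)` falls in the F0–F1 group, while a CAP below 302 dB/m returns `None` regardless of stiffness. The sketch does not cover the eligibility criteria (age, alcohol consumption, viral hepatitis, HIV) or the survey weighting needed for age-adjusted prevalence.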

Original language	English (US)
Pages (from-to)	1537-1548
Number of pages	12
Journal	Hepatology Communications
Volume	6
Issue number	7
DOIs	https://doi.org/10.1002/hep4.1935
State	Published - Jul 2022

ASJC Scopus subject areas

Hepatology

Cite this

Predicting NAFLD prevalence in the United States using National Health and Nutrition Examination Survey 2017–2018 transient elastography data and application of machine learning. / Noureddin, Mazen; Ntanios, Fady; Malhotra, Deepa et al.
In: Hepatology Communications, Vol. 6, No. 7, 07.2022, p. 1537-1548.

@article{c289d672b3bf481c811978c2bdc71ad9,

title = "Predicting NAFLD prevalence in the United States using National Health and Nutrition Examination Survey 2017–2018 transient elastography data and application of machine learning",

abstract = "This cohort analysis investigated the prevalence of nonalcoholic fatty liver disease (NAFLD) and NAFLD with fibrosis at different stages, associated clinical characteristics, and comorbidities in the general United States population and a subpopulation with type 2 diabetes mellitus (T2DM), using the National Health and Nutrition Examination Survey (NHANES) database (2017–2018). Machine learning was explored to predict NAFLD identified by transient elastography (FibroScan{\textregistered}). Adults ≥20 years of age with valid transient elastography measurements were included; those with high alcohol consumption, viral hepatitis, or human immunodeficiency virus were excluded. Controlled attenuation parameter ≥302 dB/m using Youden{\textquoteright}s index defined NAFLD; vibration-controlled transient elastography liver stiffness cutoffs were ≤8.2, ≤9.7, ≤13.6, and >13.6 kPa for F0–F1, F2, F3, and F4, respectively. Predictive modeling, using six different machine-learning approaches with demographic and clinical data from NHANES, was applied. Age-adjusted prevalence of NAFLD and of NAFLD with F0–F1 and F2–F4 fibrosis was 25.3%, 18.9%, and 4.4%, respectively, in the overall population and 54.6%, 32.6%, and 18.3% in those with T2DM. The highest prevalence was among Mexican American participants. Test performance for all six machine-learning models was similar (area under the receiver operating characteristic curve, 0.79–0.84). Machine learning using logistic regression identified male sex, hemoglobin A1c, age, and body mass index among significant predictors of NAFLD (P ≤ 0.01). Conclusion: Data show a high prevalence of NAFLD with significant fibrosis (≥F2) in the general United States population, with greater prevalence in participants with T2DM. Using readily available, standard demographic and clinical data, machine-learning models could identify subjects with NAFLD across large data sets.",

author = "Mazen Noureddin and Fady Ntanios and Deepa Malhotra and Katherine Hoover and Birol Emir and Euan McLeod and Naim Alkhouri",

note = "Funding Information: Medical writing support, under the guidance of the authors, was provided by Claire Cairney PhD, Neil Cockburn BSc, and Eric Comeau PhD, CMC Connect, McCann Health Medical Communications, and was funded by Pfizer Inc, New York, NY, USA, in accordance with Good Publication Practice (GPP3) guidelines (Ann Intern Med. 2015;163:461–4). Funding Information: Dr. Noureddin has advised 89BIO, Abbott, Allergan, Blade, EchoSens, Fractyl, Gilead, Intercept, Novartis, Novo Nordisk, OWL, Roche Diagnostics, Siemens, and Terns; he received research support from Allergan, Bristol‐Myers Squibb, Conatus, Enanta, Galectin, Galmed, Genfit, Gilead, Madrigal, Novartis, Shire, Viking, and Zydus; he is a shareholder of or has stock in Anaetos and Viking. Dr. Ntanios, Ms. Malhotra, Dr. Hoover, Dr. Emir, and Mr. McLeod are stockholders and employees of Pfizer Inc. Dr. Alkhouri participated in a speakers{\textquoteright} bureau for and received grants/research funding from Gilead and Intercept; he received grants/research funding from Akero, Allergan, Bristol‐Myers Squibb, Corcept, Galectin, Genfit, Madrigal, NGM, Pfizer Inc, Poxel, and Zydus. Publisher Copyright: {\textcopyright} 2022 The Authors. Hepatology Communications published by Wiley Periodicals LLC on behalf of the American Association for the Study of Liver Diseases.",

year = "2022",

month = jul,

doi = "10.1002/hep4.1935",

language = "English (US)",

volume = "6",

pages = "1537--1548",

journal = "Hepatology Communications",

issn = "2471-254X",

publisher = "Wiley-Blackwell Publishing Ltd",

number = "7",

}

TY - JOUR

T1 - Predicting NAFLD prevalence in the United States using National Health and Nutrition Examination Survey 2017–2018 transient elastography data and application of machine learning

AU - Noureddin, Mazen

AU - Ntanios, Fady

AU - Malhotra, Deepa

AU - Hoover, Katherine

AU - Emir, Birol

AU - McLeod, Euan

AU - Alkhouri, Naim

N1 - Funding Information: Medical writing support, under the guidance of the authors, was provided by Claire Cairney PhD, Neil Cockburn BSc, and Eric Comeau PhD, CMC Connect, McCann Health Medical Communications, and was funded by Pfizer Inc, New York, NY, USA, in accordance with Good Publication Practice (GPP3) guidelines (Ann Intern Med. 2015;163:461–4). Funding Information: Dr. Noureddin has advised 89BIO, Abbott, Allergan, Blade, EchoSens, Fractyl, Gilead, Intercept, Novartis, Novo Nordisk, OWL, Roche Diagnostics, Siemens, and Terns; he received research support from Allergan, Bristol‐Myers Squibb, Conatus, Enanta, Galectin, Galmed, Genfit, Gilead, Madrigal, Novartis, Shire, Viking, and Zydus; he is a shareholder of or has stock in Anaetos and Viking. Dr. Ntanios, Ms. Malhotra, Dr. Hoover, Dr. Emir, and Mr. McLeod are stockholders and employees of Pfizer Inc. Dr. Alkhouri participated in a speakers’ bureau for and received grants/research funding from Gilead and Intercept; he received grants/research funding from Akero, Allergan, Bristol‐Myers Squibb, Corcept, Galectin, Genfit, Madrigal, NGM, Pfizer Inc, Poxel, and Zydus. Publisher Copyright: © 2022 The Authors. Hepatology Communications published by Wiley Periodicals LLC on behalf of the American Association for the Study of Liver Diseases.

PY - 2022/7

Y1 - 2022/7

N2 - This cohort analysis investigated the prevalence of nonalcoholic fatty liver disease (NAFLD) and NAFLD with fibrosis at different stages, associated clinical characteristics, and comorbidities in the general United States population and a subpopulation with type 2 diabetes mellitus (T2DM), using the National Health and Nutrition Examination Survey (NHANES) database (2017–2018). Machine learning was explored to predict NAFLD identified by transient elastography (FibroScan®). Adults ≥20 years of age with valid transient elastography measurements were included; those with high alcohol consumption, viral hepatitis, or human immunodeficiency virus were excluded. Controlled attenuation parameter ≥302 dB/m using Youden’s index defined NAFLD; vibration-controlled transient elastography liver stiffness cutoffs were ≤8.2, ≤9.7, ≤13.6, and >13.6 kPa for F0–F1, F2, F3, and F4, respectively. Predictive modeling, using six different machine-learning approaches with demographic and clinical data from NHANES, was applied. Age-adjusted prevalence of NAFLD and of NAFLD with F0–F1 and F2–F4 fibrosis was 25.3%, 18.9%, and 4.4%, respectively, in the overall population and 54.6%, 32.6%, and 18.3% in those with T2DM. The highest prevalence was among Mexican American participants. Test performance for all six machine-learning models was similar (area under the receiver operating characteristic curve, 0.79–0.84). Machine learning using logistic regression identified male sex, hemoglobin A1c, age, and body mass index among significant predictors of NAFLD (P ≤ 0.01). Conclusion: Data show a high prevalence of NAFLD with significant fibrosis (≥F2) in the general United States population, with greater prevalence in participants with T2DM. Using readily available, standard demographic and clinical data, machine-learning models could identify subjects with NAFLD across large data sets.

UR - http://www.scopus.com/inward/record.url?scp=85127440262&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85127440262&partnerID=8YFLogxK

U2 - 10.1002/hep4.1935

DO - 10.1002/hep4.1935

M3 - Article

C2 - 35365931

AN - SCOPUS:85127440262

SN - 2471-254X

VL - 6

SP - 1537

EP - 1548

JO - Hepatology Communications

JF - Hepatology Communications

IS - 7

ER -
