TY - JOUR
T1 - Machine Learning Estimation of Low-Density Lipoprotein Cholesterol in Women with and Without HIV
AU - Dong, Tony
AU - Rana, Mariam N.
AU - Longenecker, Chris T.
AU - Rajagopalan, Sanjay
AU - Kim, Chang H.
AU - Al-Kindi, Sadeer G.
N1 - Publisher Copyright: © 2022 Lippincott Williams and Wilkins. All rights reserved.
PY - 2022/3/1
Y1 - 2022/3/1
N2 - Introduction: Low-density lipoprotein cholesterol (LDL-C) is typically estimated from total cholesterol, high-density lipoprotein cholesterol, and triglycerides. The Friedewald, Martin-Hopkins, and National Institutes of Health equations are widely used but may estimate LDL-C inaccurately in certain patient populations, such as those with HIV. We sought to investigate the utility of machine learning for LDL-C estimation in a large cohort of women with and without HIV. Methods: We identified 7397 direct LDL-C measurements (5219 from HIV-infected individuals, 2127 from uninfected controls, and 51 from seroconvertors) from 2414 participants (age 39.4 ± 9.3 years) in the Women's Interagency HIV Study and estimated LDL-C using the Friedewald, Martin-Hopkins, and National Institutes of Health equations. We also optimized 5 machine learning methods [linear regression, random forest, gradient boosting, support vector machine (SVM), and neural network] using 80% of the data (training set). We compared the performance of each method using root mean square error, mean absolute error, and coefficient of determination (R²) in the holdout (20%) set. Results: SVM outperformed all 3 existing equations and other machine learning methods, achieving the lowest root mean square error and mean absolute error, and the highest R² (11.79 and 7.98 mg/dL, 0.87, respectively, compared with those obtained using the Friedewald equation: 12.45 and 9.14 mg/dL, 0.87). SVM performance remained superior in subgroups with and without HIV, with nonfasting measurements, in LDL <70 mg/dL and triglycerides >400 mg/dL. Conclusions: In this proof-of-concept study, SVM is a robust method that predicts directly measured LDL-C more accurately than clinically used methods in women with and without HIV. Further studies should explore the utility in broader populations.
AB - Introduction: Low-density lipoprotein cholesterol (LDL-C) is typically estimated from total cholesterol, high-density lipoprotein cholesterol, and triglycerides. The Friedewald, Martin-Hopkins, and National Institutes of Health equations are widely used but may estimate LDL-C inaccurately in certain patient populations, such as those with HIV. We sought to investigate the utility of machine learning for LDL-C estimation in a large cohort of women with and without HIV. Methods: We identified 7397 direct LDL-C measurements (5219 from HIV-infected individuals, 2127 from uninfected controls, and 51 from seroconvertors) from 2414 participants (age 39.4 ± 9.3 years) in the Women's Interagency HIV Study and estimated LDL-C using the Friedewald, Martin-Hopkins, and National Institutes of Health equations. We also optimized 5 machine learning methods [linear regression, random forest, gradient boosting, support vector machine (SVM), and neural network] using 80% of the data (training set). We compared the performance of each method using root mean square error, mean absolute error, and coefficient of determination (R²) in the holdout (20%) set. Results: SVM outperformed all 3 existing equations and other machine learning methods, achieving the lowest root mean square error and mean absolute error, and the highest R² (11.79 and 7.98 mg/dL, 0.87, respectively, compared with those obtained using the Friedewald equation: 12.45 and 9.14 mg/dL, 0.87). SVM performance remained superior in subgroups with and without HIV, with nonfasting measurements, in LDL <70 mg/dL and triglycerides >400 mg/dL. Conclusions: In this proof-of-concept study, SVM is a robust method that predicts directly measured LDL-C more accurately than clinically used methods in women with and without HIV. Further studies should explore the utility in broader populations.
KW - human immunodeficiency virus
KW - low-density lipoprotein
KW - machine learning
KW - measurement/estimation
KW - support vector machine
UR - http://www.scopus.com/inward/record.url?scp=85124620926&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85124620926&partnerID=8YFLogxK
U2 - 10.1097/QAI.0000000000002869
DO - 10.1097/QAI.0000000000002869
M3 - Article
C2 - 34813572
AN - SCOPUS:85124620926
SN - 1525-4135
VL - 89
SP - 318
EP - 323
JO - Journal of Acquired Immune Deficiency Syndromes
JF - Journal of Acquired Immune Deficiency Syndromes
IS - 3
ER -