TY - JOUR
T1 - Predicting Hospital Readmission for Campylobacteriosis from Electronic Health Records
T2 - A Machine Learning and Text Mining Perspective
AU - Zhou, Shang Ming
AU - Lyons, Ronan A.
AU - Rahman, Muhammad A.
AU - Holborow, Alexander
AU - Brophy, Sinead
N1 - Publisher Copyright:
© 2022 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2022/1/10
Y1 - 2022/1/10
N2 - (1) Background: This study investigates influential risk factors for predicting 30-day readmission to hospital for Campylobacter infections (CI). (2) Methods: We linked general practitioner and hospital admission records of 13,006 patients with CI in Wales (1990–2015). An approach called TF-zR (term frequency-zRelevance) technique was presented to evaluates how relevant a clinical term is to a patient in a cohort characterized by coded health records. The zR is a supervised term-weighting metric to assign weight to a term based on relative frequencies of the term across different classes. Cost-sensitive classifier with swarm optimization and weighted subset learning was integrated to identify influential clinical signals as predictors and optimal model for readmission prediction. (3) Results: From a pool of up to 17,506 variables, 33 most predictive factors were identified, including age, gender, Townsend deprivation quintiles, comorbidities, medications, and procedures. The predictive model predicted readmission with 73% sensitivity and 54% specificity. Variables associated with readmission included male gender, recurrent tonsillitis, non-healing open wounds, operation for in-gown toenails. Cystitis, paracetamol/codeine use, age (21–25), and heliclear triple pack use, were associated with a lower risk of readmission. (4) Conclusions: This study gives a profile of clustered variables that are predictive of readmission associated with campylobacteriosis.
AB - (1) Background: This study investigates influential risk factors for predicting 30-day readmission to hospital for Campylobacter infections (CI). (2) Methods: We linked general practitioner and hospital admission records of 13,006 patients with CI in Wales (1990–2015). An approach called TF-zR (term frequency-zRelevance) technique was presented to evaluates how relevant a clinical term is to a patient in a cohort characterized by coded health records. The zR is a supervised term-weighting metric to assign weight to a term based on relative frequencies of the term across different classes. Cost-sensitive classifier with swarm optimization and weighted subset learning was integrated to identify influential clinical signals as predictors and optimal model for readmission prediction. (3) Results: From a pool of up to 17,506 variables, 33 most predictive factors were identified, including age, gender, Townsend deprivation quintiles, comorbidities, medications, and procedures. The predictive model predicted readmission with 73% sensitivity and 54% specificity. Variables associated with readmission included male gender, recurrent tonsillitis, non-healing open wounds, operation for in-gown toenails. Cystitis, paracetamol/codeine use, age (21–25), and heliclear triple pack use, were associated with a lower risk of readmission. (4) Conclusions: This study gives a profile of clustered variables that are predictive of readmission associated with campylobacteriosis.
KW - Campylobacter infections
KW - Electronic health records
KW - Feature selection
KW - Hospitalisation
KW - Machine learning
KW - Readmission
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=85122754923&partnerID=8YFLogxK
U2 - 10.3390/jpm12010086
DO - 10.3390/jpm12010086
M3 - Article
AN - SCOPUS:85122754923
SN - 2075-4426
VL - 12
JO - Journal of Personalized Medicine
JF - Journal of Personalized Medicine
IS - 1
M1 - 86
ER -