Predicting Heart Disease Hospitalizations Using Supervised Learning

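This paper examines the use of supervised learning methods to predict hospitalizations caused by heart disease. The authors include researchers from the Department of Electrical and Computer Engineering and the Division of Systems Engineering at Boston University, from the Department of Pediatrics at the Boston University School of Medicine, and from the Electrophysiology Laboratory/Arrhythmia Service at Massachusetts General Hospital. The paper notes that U.S. health care spending in 2008 amounted to 15.5% of GDP, 31% of which went to hospital care, and that nearly $30.8 billion in hospital costs were preventable, with heart disease accounting for 31% of that amount. Accurate and efficient prediction of heart-related hospitalizations therefore matters for reducing health care spending.

Methods: The goal of the study is to predict heart-related hospitalizations accurately from each patient's personal medical history. The authors may have used supervised learning algorithms such as decision trees, random forests, support vector machines, or neural networks, which learn patterns from large numbers of past cases and use them to predict whether a new patient will be hospitalized for heart disease. Supervised learning typically involves data preprocessing (for example cleaning, standardization, and feature selection), model training and validation, and performance evaluation (for example precision, recall, and F1 score). When building the data set, the researchers may have collected factors such as patient age, sex, blood pressure, cholesterol level, diabetes history, smoking status, and family medical history, any of which may affect the prediction noticeably. During training, the algorithm learns how these features relate to hospitalization risk and then makes predictions on new data. The paper may also discuss model optimization, for instance hyperparameter tuning, ensemble strategies (such as bagging and boosting), or regularization to prevent overfitting and improve generalization. To validate performance, the researchers may have used cross-validation, splitting the data into training and test sets to check that the model performs well on unseen data.

Conclusions: The results matter to public health policy makers and health care providers, who can use such prediction tools to identify high-risk patients, intervene early, and reduce heart-related hospitalization rates and the associated costs. Preventive measures and early treatment can substantially reduce the burden on the health care system while improving patients' quality of life.

In short, the paper explores how machine learning can predict hospitalizations caused by heart disease, with the aim of reducing unnecessary medical spending through early identification of high-risk patients and of supporting medical decisions with data.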

Details will be discussed in the next subsection. We will refer to the summarized information of the medical factors over a specific time interval as features. Each feature related to Diagnoses, Procedures CPT, Procedures ICD9 and Visits to the Emergency Room is an integer count of such records for a specific patient during the specific time interval. Zero indicates the absence of any record. Blood pressure and lab test features are continuous-valued. Missing values are replaced by the average of the values of patients with a record in the same time interval. Features related to tobacco use are indicators of current or past smoking in the specific time interval. Admission features contain the total number of days of hospitalization over the time interval the feature corresponds to. Admission records are used both to form the Admission features (past admission records) and to calculate the prediction variable (existence of admission records in the target year). We treat our problem as a classification problem and assign each patient a label: 1 if there is a heart-related hospitalization in the target year and 0 otherwise.
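The imputation and labeling rules described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `impute_with_block_mean` and `label_patient`, the use of `None` to mark a missing record, and the sample values are all assumptions made for the example.

```python
from statistics import mean


def impute_with_block_mean(values):
    """Replace None entries with the mean of the observed values.

    `values` holds one continuous feature (e.g. a lab test in one time
    block) across patients; None marks a patient with no record.
    """
    observed = [v for v in values if v is not None]
    if not observed:  # no patient has a record, so there is nothing to average
        return list(values)
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def label_patient(admission_years, target_year):
    """Return 1 if a heart-related admission falls in the target year, else 0."""
    return 1 if target_year in admission_years else 0


# Hypothetical data: one blood pressure feature for four patients.
systolic_bp = [120.0, None, 140.0, None]
print(impute_with_block_mean(systolic_bp))  # [120.0, 130.0, 140.0, 130.0]
print(label_patient({2008, 2010}, 2010))    # 1
print(label_patient(set(), 2010))           # 0
```

Averaging only over patients who have a record keeps the filled value on the same scale as the observed measurements; count features need no such step, because zero already encodes the absence of records.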

2.2 Data Preprocessing

In this subsection we discuss several data organization and preprocessing choices we make. For each patient, a target year is fixed (the year in which a hospitalization prediction is sought) and all past patient records are organized as follows.

• Summarization of the medical factors in the history of a patient: Based on experimentation, an effective way to summarize each patient's medical history is to form four time blocks for each medical factor, with all corresponding records summarized over one, two, and three years before the target year and all earlier records summarized in a fourth block. For blood pressure and tobacco use, only the year before the target year is kept. This process results in a vector of 212 features for each patient.

• Selection of the target year: As a result of the nature of the data, the two classes are highly imbalanced. When we fix the target year for all patients to be 2010, the number of hospitalized patients is about 2% of the total number of patients, which makes the classification problem much more challenging. Thus, to increase the number of hospitalized patient examples, if a patient had only one hospitalization throughout 2007-2010, the year of hospitalization is set as the target year for that patient. If a patient had multiple hospitalizations, a target year between the first and the last hospitalization is randomly selected.

• Setting the target time interval to be a year: A year proved to be an appropriate time interval for prediction on our data set. We conducted trials setting the time interval for prediction to 1, 3, 6 and 12 months, using a Support Vector Machine classifier (a method described later in more detail). Setting the target time interval to one year yielded the best results. Moreover, given that hospitalization occurs roughly uniformly within a year, we take the prediction time interval to be a calendar year.
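The time-block summarization and target-year selection described in the list above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the helper names are hypothetical, records are assumed to carry only a calendar year, "between the first and the last hospitalization" is read as a uniform draw with both ends included, and patients without any hospitalization are assumed to fall back to 2010.

```python
import random


def block_index(record_year, target_year):
    """Map a record's year to one of four time blocks.

    0, 1, 2 = one, two, three years before the target year;
    3 = everything earlier. Returns None for records in or after
    the target year, so they cannot leak into the features.
    """
    gap = target_year - record_year
    if gap < 1:
        return None
    return min(gap, 4) - 1


def summarize_counts(record_years, target_year):
    """Count the records of one medical factor in each time block."""
    counts = [0, 0, 0, 0]
    for year in record_years:
        idx = block_index(year, target_year)
        if idx is not None:
            counts[idx] += 1
    return counts


def choose_target_year(hospitalization_years, default_year=2010, rng=random):
    """Pick the target year from a patient's hospitalization years.

    Assumptions: a patient with no hospitalization gets `default_year`;
    with several hospitalizations, the draw is uniform over the calendar
    years from the first to the last one, both inclusive.
    """
    years = sorted(hospitalization_years)
    if not years:
        return default_year
    if len(years) == 1:
        return years[0]
    return rng.randint(years[0], years[-1])


# Hypothetical diagnosis record years for one patient, target year 2010.
print(summarize_counts([2009, 2009, 2008, 2005, 2003, 2010], 2010))  # [2, 1, 0, 2]
print(choose_target_year([2008]))                                    # 2008
```

Passing a seeded `random.Random` instance as `rng` makes the target-year draw reproducible, which helps when the same patient set has to be rebuilt for repeated experiments.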

Dai et al.

Int J Med Inform. Author manuscript; available in PMC 2016 March 01.
