Journal of Advanced Artificial Intelligence
Foundation of Computer Science (FCS), NY, USA
Volume 2 - Issue 4
Published: January 2026
Authors: Phuong Luong-Thi-Bich, Quan Nguyen-Minh, Hung Vo-Tri
DOI: 10.5120/jaai202658
Phuong Luong-Thi-Bich, Quan Nguyen-Minh, Hung Vo-Tri. Comparative Analysis of Classical ML Baselines on Noisy and Imbalanced Occupational Lung Disease Data in Vietnam. Journal of Advanced Artificial Intelligence. 2, 4 (January 2026), 20-26. DOI=10.5120/jaai202658
@article{ 10.5120/jaai202658,
author = { Phuong Luong-Thi-Bich and Quan Nguyen-Minh and Hung Vo-Tri },
title = { Comparative Analysis of Classical ML Baselines on Noisy and Imbalanced Occupational Lung Disease Data in Vietnam },
journal = { Journal of Advanced Artificial Intelligence },
year = { 2026 },
volume = { 2 },
number = { 4 },
pages = { 20--26 },
doi = { 10.5120/jaai202658 },
publisher = { Foundation of Computer Science (FCS), NY, USA }
}
%0 Journal Article
%D 2026
%A Phuong Luong-Thi-Bich
%A Quan Nguyen-Minh
%A Hung Vo-Tri
%T Comparative Analysis of Classical ML Baselines on Noisy and Imbalanced Occupational Lung Disease Data in Vietnam
%J Journal of Advanced Artificial Intelligence
%V 2
%N 4
%P 20-26
%R 10.5120/jaai202658
%I Foundation of Computer Science (FCS), NY, USA
Occupational lung disease is one of the most serious health problems affecting the global workforce, and early prediction of disease risk is important for medical prevention and intervention. This study compares four classical machine learning models, Random Forest (RF), XGBoost, Logistic Regression (LR), and Support Vector Machine (SVM), on the same occupational lung disease dataset, which was manually processed and encoded. The experimental results show that XGBoost achieves the best performance, with an accuracy of 98.34% and a Macro F1-score of 0.7996, followed by LR, RF, and SVM. In addition, the feature analysis shows that each model relies on different factors, suggesting that combining multiple models could improve predictive performance.
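
The gap between the reported accuracy and Macro F1-score is what one would expect on imbalanced data, where the majority class dominates accuracy while Macro F1 weights every class equally. The following is a minimal sketch of how the two metrics diverge. The labels, class sizes, and error counts are hypothetical toy values chosen for illustration; they are not taken from the study's dataset, and the helper functions `accuracy` and `macro_f1` are illustrative names rather than the authors' code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 980 negative cases (0) and 20 positive cases (1).
y_true = [0] * 980 + [1] * 20
# Hypothetical classifier: 2 false alarms, and only 8 of 20 positives detected.
y_pred = [0] * 978 + [1] * 2 + [1] * 8 + [0] * 12

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

In this toy case accuracy stays high because almost all majority-class cases are classified correctly, while the low F1 of the minority class pulls the macro average down. This is why Macro F1 is usually the more informative of the two figures when classes are imbalanced.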