Document classification (DC) is one of the broadly investigated natural language processing tasks. Medical document classification can support doctors in making decision and improve medical services. Since the data in document classification often appear in raw form such as medical discharge notes, extracting meaningful information to use as features is a challenging task. There are many specialized words and expressions in medical documents which make them more challenging to analyze. The classification accuracy of available methods in medical field is not good enough. This work aims to improve the quality of the input feature sets to increase the accuracy. A new three-stage approach is proposed. In the first stage, the Unified Medical Language System (UMLS) which is a medical-specific dictionary is used to extract the meaningful phrases by considering disease or symptom concepts. In the second stage, all the possible pairs of the extracted concepts are created as new features. In the third stage, Particle Swarm Optimisation (PSO) is employed to select features from the extracted and constructed features in the previous stages. The experimental results show that the proposed three-stage method achieved substantial improvement over the existing medical DC approaches.
History
Preferred citation
Abdollahi, M., Gao, X., Mei, Y., Ghosh, S. & Li, J. (2019, January). Stratifying Risk of Coronary Artery Disease Using Discriminative Knowledge-Guided Medical Concept Pairings from Clinical Notes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (11672 LNAI pp. 457-473). Springer International Publishing. https://doi.org/10.1007/978-3-030-29894-4_37