iPseU-NCP: Identifying RNA pseudouridine sites using random forest and NCP-encoded features
journal contribution
posted on 2021-03-26, 10:12 authored by TH Nguyen-Vo, QH Nguyen, TTT Do, TN Nguyen, S Rahardja, Binh Nguyen
© 2019 Nguyen-Vo et al.
Background: Pseudouridine modification is the most common of the RNA modifications found in both prokaryotes and eukaryotes. It has been shown to occur in multiple types of RNA, including rRNA, mRNA, tRNA, and nuclear/nucleolar RNA. Hence, a holistic understanding of pseudouridine modification can contribute to drug discovery and gene therapy. Although some laboratory techniques identify pseudouridine sites with moderately good results, they are costly and require skilled operators. We propose iPseU-NCP, an efficient computational framework that predicts pseudouridine sites using the Random Forest (RF) algorithm combined with nucleotide chemical properties (NCP) derived from RNA sequences. The benchmark dataset of Chen et al. (2016) was used to develop iPseU-NCP and to compare its performance fairly with that of other methods.
Results: Under the same experimental settings, compared with three state-of-the-art methods (iPseU-CNN, PseUI, and iRNA-PseU), the Matthews correlation coefficient (MCC) of our model increased by about 20.0%, 55.0%, and 109.0%, respectively, when tested on the H. sapiens (H_200) dataset and by about 6.5%, 35.0%, and 150.0%, respectively, when tested on the S. cerevisiae (S_200) dataset. This substantial increase in MCC is important because it supports the stability and performance of our model. On these two independent test datasets, our model also achieved higher accuracy, with the success rate improved by 7.0%, 13.0%, and 20.0% and by 2.0%, 9.5%, and 25.0% compared with iPseU-CNN, PseUI, and iRNA-PseU, respectively. For the majority of the other evaluation metrics, iPseU-NCP demonstrated superior performance as well.
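For readers unfamiliar with the metric, the MCC figures above can be reproduced in principle from confusion-matrix counts. The following is a minimal sketch, not the authors' code: the function names `mcc` and `relative_increase` are hypothetical, and the counts in the usage lines are made up for illustration rather than taken from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def relative_increase(new, old):
    """Percentage by which `new` exceeds `old` (e.g. one model's MCC vs another's)."""
    return (new - old) / old * 100.0


# Illustrative counts only (NOT results from the paper).
ours = mcc(tp=80, tn=70, fp=30, fn=20)
baseline = mcc(tp=70, tn=65, fp=35, fn=30)
print(f"MCC ours={ours:.3f}, baseline={baseline:.3f}, "
      f"relative increase={relative_increase(ours, baseline):.1f}%")
```

Because the improvements are presumably relative to each baseline's own MCC, a weak baseline can yield a very large percentage gain from a modest absolute difference.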
Conclusions: iPseU-NCP, which combines RF with NCP-encoded features, performed better than existing state-of-the-art methods in identifying pseudouridine sites. This result is also encouraging for addressing biological problems related to human diseases.
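To make the NCP encoding mentioned in the abstract concrete, here is a minimal sketch. It assumes the commonly used three-bit scheme (ring structure, functional group, hydrogen bonding), in which A = (1, 1, 1), C = (0, 1, 0), G = (1, 0, 0), and U = (0, 0, 1). The abstract does not spell out the exact mapping or window length used by iPseU-NCP, so the table and the helper name `encode_ncp` are assumptions.

```python
# Assumed NCP table: (ring structure, functional group, hydrogen bonding).
# purine = 1 / pyrimidine = 0; amino = 1 / keto = 0; weak H-bond = 1 / strong = 0.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_ncp(seq):
    """Flatten an RNA sequence into a binary vector of length 3 * len(seq)."""
    seq = seq.upper().replace("T", "U")  # tolerate DNA-style input
    features = []
    for base in seq:
        if base not in NCP:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        features.extend(NCP[base])
    return features


# Hypothetical short window centred on a candidate uridine.
vector = encode_ncp("GCAUUCG")
print(len(vector), vector)
```

Vectors of this form could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`; the hyperparameters used by the authors are not given in this record.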
Preferred citation
Nguyen-Vo, T. H., Nguyen, Q. H., Do, T. T. T., Nguyen, T. N., Rahardja, S. & Nguyen, B. P. (2019). iPseU-NCP: Identifying RNA pseudouridine sites using random forest and NCP-encoded features. BMC Genomics, 20(S10), 971. https://doi.org/10.1186/s12864-019-6357-y
Publisher DOI
https://doi.org/10.1186/s12864-019-6357-y
Journal title
BMC Genomics
Volume
20
Issue
S10
Publication date
2019-12-30
Pagination
971
Publisher
Springer Science and Business Media LLC
Publication status
Published
Online publication date
2019-12-30
ISSN
1471-2164
eISSN
1471-2164
Article number
971
Language
en