Open Access Te Herenga Waka-Victoria University of Wellington
Browse

IPseU-NCP: Identifying RNA pseudouridine sites using random forest and NCP-encoded features

journal contribution
posted on 2021-03-26, 10:12 authored by TH Nguyen-Vo, QH Nguyen, TTT Do, TN Nguyen, S Rahardja, Binh NguyenBinh Nguyen
© 2019 Nguyen-Vo et al. Background: Pseudouridine modification is most commonly found among various kinds of RNA modification occurred in both prokaryotes and eukaryotes. This biochemical event has been proved to occur in multiple types of RNAs, including rRNA, mRNA, tRNA, and nuclear/nucleolar RNA. Hence, gaining a holistic understanding of pseudouridine modification can contribute to the development of drug discovery and gene therapies. Although some laboratory techniques have come up with moderately good outcomes in pseudouridine identification, they are costly and required skilled work experience. We propose iPseU-NCP - an efficient computational framework to predict pseudouridine sites using the Random Forest (RF) algorithm combined with nucleotide chemical properties (NCP) generated from RNA sequences. The benchmark dataset collected from Chen et al. (2016) was used to develop iPseU-NCP and fairly compare its performances with other methods. Results: Under the same experimental settings, comparing with three state-of-the-art methods including iPseU-CNN, PseUI, and iRNA-PseU, the Matthew's correlation coefficient (MCC) of our model increased by about 20.0%, 55.0%, and 109.0% when tested on the H. sapiens (H_200) dataset and by about 6.5%, 35.0%, and 150.0% when tested on the S. cerevisiae (S_200) dataset, respectively. This significant growth in MCC is very important since it ensures the stability and performance of our model. With those two independent test datasets, our model also presented higher accuracy with a success rate boosted by 7.0%, 13.0%, and 20.0% and 2.0%, 9.5%, and 25.0% when compared to iPseU-CNN, PseUI, and iRNA-PseU, respectively. For majority of other evaluation metrics, iPseU-NCP demonstrated superior performance as well. Conclusions: iPseU-NCP combining the RF and NPC-encoded features showed better performances than other existing state-of-the-art methods in the identification of pseudouridine sites. This also shows an optimistic view in addressing biological issues related to human diseases.

History

Preferred citation

Nguyen-Vo, T. H., Nguyen, Q. H., Do, T. T. T., Nguyen, T. N., Rahardja, S. & Nguyen, B. P. (2019). IPseU-NCP: Identifying RNA pseudouridine sites using random forest and NCP-encoded features. BMC Genomics, 20(S10), 971-. https://doi.org/10.1186/s12864-019-6357-y

Journal title

BMC Genomics

Volume

20

Issue

S10

Publication date

2019-12-30

Pagination

971

Publisher

Springer Science and Business Media LLC

Publication status

Published

Online publication date

2019-12-30

ISSN

1471-2164

eISSN

1471-2164

Article number

971

Language

en

Usage metrics

    Journal articles

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC