File(s) stored somewhere else

Please note: Linked content is NOT stored on Open Access Victoria University of Wellington | Te Herenga Waka and we can't guarantee its availability, quality, security or accept any liability.

IPseU-NCP: Identifying RNA pseudouridine sites using random forest and NCP-encoded features

journal contribution
posted on 26.03.2021, 10:12 by TH Nguyen-Vo, QH Nguyen, TTT Do, TN Nguyen, S Rahardja, Binh NguyenBinh Nguyen
© 2019 Nguyen-Vo et al. Background: Pseudouridine modification is most commonly found among various kinds of RNA modification occurred in both prokaryotes and eukaryotes. This biochemical event has been proved to occur in multiple types of RNAs, including rRNA, mRNA, tRNA, and nuclear/nucleolar RNA. Hence, gaining a holistic understanding of pseudouridine modification can contribute to the development of drug discovery and gene therapies. Although some laboratory techniques have come up with moderately good outcomes in pseudouridine identification, they are costly and required skilled work experience. We propose iPseU-NCP - an efficient computational framework to predict pseudouridine sites using the Random Forest (RF) algorithm combined with nucleotide chemical properties (NCP) generated from RNA sequences. The benchmark dataset collected from Chen et al. (2016) was used to develop iPseU-NCP and fairly compare its performances with other methods. Results: Under the same experimental settings, comparing with three state-of-the-art methods including iPseU-CNN, PseUI, and iRNA-PseU, the Matthew's correlation coefficient (MCC) of our model increased by about 20.0%, 55.0%, and 109.0% when tested on the H. sapiens (H_200) dataset and by about 6.5%, 35.0%, and 150.0% when tested on the S. cerevisiae (S_200) dataset, respectively. This significant growth in MCC is very important since it ensures the stability and performance of our model. With those two independent test datasets, our model also presented higher accuracy with a success rate boosted by 7.0%, 13.0%, and 20.0% and 2.0%, 9.5%, and 25.0% when compared to iPseU-CNN, PseUI, and iRNA-PseU, respectively. For majority of other evaluation metrics, iPseU-NCP demonstrated superior performance as well. Conclusions: iPseU-NCP combining the RF and NPC-encoded features showed better performances than other existing state-of-the-art methods in the identification of pseudouridine sites. This also shows an optimistic view in addressing biological issues related to human diseases.


Preferred citation

Nguyen-Vo, T. H., Nguyen, Q. H., Do, T. T. T., Nguyen, T. N., Rahardja, S. & Nguyen, B. P. (2019). IPseU-NCP: Identifying RNA pseudouridine sites using random forest and NCP-encoded features. BMC Genomics, 20(S10), 971-.

Journal title

BMC Genomics





Publication date





Springer Science and Business Media LLC

Publication status


Online publication date






Article number