Title:
Effect of varying degree of resampling on prediction accuracy for observed peptide count in protein mass spectrometry data

Publisher

IEEE Computer Society

Abstract

Class imbalance impairs classifier learning and is almost ubiquitous in biological data sets. Resampling is a common way to balance such data sets, and SMOTE (Synthetic Minority Over-sampling Technique) is one of the intelligent oversampling methods. This study examines how machine learning algorithms perform at different balancing ratios of positive to negative samples in a training set consisting of peptides observed and peptides absent in an MS experiment. Applying SMOTE at different rates, we achieved the best result with optimal balancing on a boosted random forest: sensitivity of 92.1%, specificity of 94.7%, overall accuracy of 93.4%, MCC of 0.869, and AUC of 0.982, all better than previously reported results. These experiments indicate that suitably modifying the class distribution can enhance the performance of machine learning algorithms on classification tasks. © 2015 IEEE.
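The resampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain SMOTE-style interpolation between a minority sample and one of its nearest minority neighbours, written with the Python standard library only. The function names (smote, oversample_to_ratio, confusion_metrics), the toy points, and the confusion-matrix counts are all hypothetical and chosen for illustration; the paper's peptide features and boosted random forest are not reproduced here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, SMOTE-style (needs >= 2 minority points).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of base, excluding base itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def oversample_to_ratio(minority, n_majority, ratio, k=5, seed=0):
    """Oversample the minority class up to ratio * n_majority samples.

    ratio is an assumed definition: desired minority size / majority size,
    so ratio=1.0 means a fully balanced training set.
    """
    target = round(ratio * n_majority)
    n_new = max(0, target - len(minority))
    return list(minority) + smote(minority, n_new, k=k, seed=seed)


def confusion_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical 2-D minority samples; real peptide features would differ.
    toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
                    (1.0, 1.0), (0.5, 0.5), (0.2, 0.8)]
    for ratio in (0.5, 0.75, 1.0):
        balanced = oversample_to_ratio(toy_minority, n_majority=20, ratio=ratio)
        print(ratio, len(balanced))
    # Hypothetical confusion counts, not results from the paper.
    print(confusion_metrics(tp=90, fn=10, tn=95, fp=5))
```

In an experiment of the kind described, one would presumably train the same classifier on the output of each ratio and compare the resulting metrics on a held-out set that has not been resampled.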