Browsing by Author "Karthikeyan Subbiah"

A Deep Learning Approach for Classification of Medicare Beneficiaries Based on Gender and being Affected with Cancer
(Elsevier B.V., 2022) Manish K. Pandey; Karthikeyan Subbiah
With the advent of the Third computing platform of Social Mobility Analytics and Cloud (SMAC), data is getting generated in huge amounts. This huge amount of data is collected for domain-specific information to process them to get required domain-specific information as in real-time health analytics, financial frauds, real-time automated car driving, vital information of patients undergoing robotic surgery, handling cyber threats etc. This huge data, also known as Big Data, is highly unstructured and imbalanced that is not possible for traditional techniques to handle and process. Advancements in computing power, speedy data storage and convergence of SMAC technologies have also contributed to the swift acceptance of the technology. This led to innovative analytical techniques that are data as well as computation intensive. One such technique is Deep Learning which originated from the artificial neural network and found its use in handling many real-life problems involving multidimensional features. The advantage of Feature Learning or Representational Learning makes Deep Learning a wonderful tool for big data analytics. The previous level of hierarchy transfers the feature learning to the next levels and thus complex features are learned through the learning of simpler features at different levels of abstraction. For efficient learning of these features, tuning of hyper-parameters is a mandatory step. The current work incorporates Grid Search for classification to find the best classifier for the classification of Medicare beneficiaries based on two scenarios. The first Scenario is beneficiaries who are affected by cancer and the Second Scenario is where Medicare beneficiaries are provided Gender wise (being a Female beneficiary). 
By experimenting using these algorithms at 10-fold cross-validation, the best results were achieved in the sensitivity of 99.17 %, Specificity of 97.68 % and accuracy of 98.8 % with Deep Learning Neural Network with Dropout for First Scenario and achieved the best results in the sensitivity of 82.97 %, Specificity of 68.71 % and accuracy of 75.05 % with Random Forest for Second Scenario. © 2023 The Authors. Published by Elsevier B.V.
A novel storage architecture for facilitating efficient analytics of health informatics big data in cloud
(Institute of Electrical and Electronics Engineers Inc., 2017) Manish Kumar Pandey; Karthikeyan Subbiah
Analytics of health big data are very crucial for providing cost effective quality health care. Over recent years, the analytics on healthcare big data has evolved into a challenging task for getting insights into a very large data set for improving the health services. This enormous amount of data, which is being generated incessantly over a long period of time, has put a great deal of stress on the write performance as well as on scalability. Moreover, there is a requirement of efficient storage and meaningful processing of these data, which is another challenging issue. The traditional relational databases, which were used in the storage of health data, are now unable to handle these data due to their massive and varied nature. Besides, these databases have some inherent weaknesses in terms of scalability, storing varied data formats, etc. So there is a necessity for a new kind of data storage management system. This paper proposes a new big data storage architecture consisting of an application cluster and a storage cluster to facilitate read/write/update speedup as well as data optimization. The application cluster is used to provide efficient storage and retrieval functions to the users. The storage services will be provided through the storage cluster. © 2016 IEEE.
An insight into the molecular basis for convergent evolution in fish antifreeze Proteins
(2013) Abhigyan Nath; Radha Chaube; Karthikeyan Subbiah
Antifreeze proteins (AFPs) prevent the growth of ice-crystals in order to enable certain organisms to survive under sub-zero temperature surroundings. These AFPs have evolved from different types of proteins without having any significant structural and sequence similarities among them. However, all the AFPs perform the same function of anti-freeze activity and are a classical example of convergent evolution. We have analyzed fish AFPs at the sequence level, the residue level and the physicochemical property group composition to discover molecular basis for this convergent evolution. Our study on amino acid distribution does not reveal any distinctive feature among AFPs, but comparative study of the AFPs with their close non-AFP homologs based on the physicochemical property group residues revealed some useful information. In particular (a) there is a similar pattern of avoidance and preference of amino acids in Fish AFP subtypes II, III and IV-Aromatic residues are avoided whereas small residues are preferred, (b) like other psychrophilic proteins, AFPs have a similar pattern of preference/avoidance for most of the residues except for Ile, Leu and Arg, and (c) most of the computed amino acids in preferred list are the key functional residues as obtained in previous predicted model of Doxey et al. For the first time this study revealed common patterns of avoidance/preference in fish AFP subtypes II, III and IV. These avoidance/preference lists can further facilitate the identification of key functional residues and can shed more light into the mechanism of antifreeze function. © 2013 Elsevier Ltd.
An intuitionistic fuzzy-rough set model and its application to feature selection
(IOS Press, 2019) Anoop Kumar Tiwari; Shivam Shreevastava; Karthikeyan Subbiah; T. Som
Due to the development of modern internet-based technology, the electronically stored information is growing exponentially with time. It is highly challenging to select relevant and non-redundant features of the real-valued high dimensional datasets. Feature selection, a preprocessing technique, refers to the process of reducing the dimension of the input data in order to extract the most meaningful features for processing and analysis. One of the numerous useful applications of rough set theory is the attribute or feature selection, but it has certain limitations as it cannot be applied on real-valued data sets directly because rough set based feature selection can handle discrete data only. In order to deal with real-valued data sets, discretization method is applied to convert dataset from real-valued to discrete, which usually leads to information loss. Fuzzy rough set theory is profitably applied to address this problem and retain the semantics of real-valued datasets. However, intuitionistic fuzzy set can deal with uncertainty in a much better way when compared to fuzzy set theory as it considers membership, non-membership and hesitancy degree of an object simultaneously. In this paper, an intuitionistic fuzzy rough set model is established by combining intuitionistic fuzzy set and rough set. Furthermore, we propose a novel approach of feature selection derived from this model. Moreover, we develop an algorithm based on our proposed concept. Finally, our approach is applied to some benchmark data sets and compared with the existing fuzzy rough set based technique. The performed experiments show the superiority of our approach. © 2019 - IOS Press and the authors.
Comparative study on machine learning techniques in predicting the QoS-values for web-services recommendations
(Institute of Electrical and Electronics Engineers Inc., 2015) Sunil Kumar; Manish Kumar Pandey; Abhigyan Nath; Karthikeyan Subbiah; Manoj Kumar Singh
This is an era of Internet computing, and computing delivered as a service over the internet is called cloud computing. Three main services, SaaS (applications), PaaS, and IaaS, are accessed through the internet on demand on a pay-per-usage basis. Quality of Service (QoS), described by user-dependent as well as user-independent QoS parameters, is the main issue in internet-based computing for service providers. In the current work we compared different machine learning algorithms for predicting the response time and throughput QoS values using past usage data. Bagging and support vector machines are found to be better performing prediction methods in comparison with other learning algorithms. © 2015 IEEE.
Effect of varying degree of resampling on prediction accuracy for observed peptide count in protein mass spectrometry data
(IEEE Computer Society, 2016) Anoop Kumar Tiwari; Abhigyan Nath; Karthikeyan Subbiah; Kaushal Kumar Shukla
Class imbalance affects the learning of classifiers and it is almost ubiquitous in biological data sets. Resampling is one of the common methods for balancing imbalanced data sets. SMOTE (Synthetic Minority Oversampling Technique) is one of the intelligent methods of oversampling. This study examines the learning performance of machine learning algorithms at different balancing ratios of positive and negative samples in the training set, consisting of the observed peptides and absent peptides in an MS experiment. Using SMOTE at different rates we achieved the best result with optimal balancing on boosted random forest, which resulted in sensitivity of 92.1%, specificity value of 94.7%, overall accuracy of 93.4%, MCC of 0.869 and AUC of 0.982, values that are better than previously reported results. From the results of current experiments, it can be inferred that by suitably modifying the class distribution, the performance of machine learning algorithms on the classification tasks can be enhanced. © 2015 IEEE.
Enhanced Prediction for Observed Peptide Count in Protein Mass Spectrometry Data by Optimally Balancing the Training Dataset
(World Scientific Publishing Co. Pte Ltd, 2017) Anoop Kumar Tiwari; Abhigyan Nath; Karthikeyan Subbiah; Kaushal Kumar Shukla
Imbalanced dataset affects the learning of classifiers. This imbalance problem is almost ubiquitous in biological datasets. Resampling is one of the common methods to deal with the imbalanced dataset problem. In this study, we explore the learning performance by varying the balancing ratios of training datasets, consisting of the observed peptides and absent peptides in the Mass Spectrometry experiment on the different machine learning algorithms. It has been observed that the ideal balancing ratio has yielded better performance than the imbalanced dataset, but it was not the best as compared to some intermediate ratio. By experimenting using Synthetic Minority Oversampling Technique (SMOTE) at different balancing ratios, we obtained the best results by achieving sensitivity of 92.1%, specificity value of 94.7%, overall accuracy of 93.4%, MCC of 0.869, and AUC of 0.982 with boosted random forest algorithm. This study also identifies the most discriminating features by applying the feature ranking algorithm. From the results of current experiments, it can be inferred that the performance of machine learning algorithms for the classification tasks can be enhanced by selecting optimally balanced training dataset, which can be obtained by suitably modifying the class distribution. © 2017 World Scientific Publishing Company.
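The idea of oversampling the minority class up to a chosen balancing ratio can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `smote_oversample`, the toy points, and the tested ratios are all hypothetical, and the interpolation step is a simplified rendering of SMOTE (a synthetic point is placed on the segment between a minority sample and one of its nearest minority neighbours).

```python
import random

def smote_oversample(minority, n_majority, ratio, k=3, seed=0):
    """Create synthetic minority points until
    len(minority) + len(synthetic) == round(ratio * n_majority).

    `ratio` is the desired minority:majority balancing ratio (1.0 = fully balanced).
    """
    rng = random.Random(seed)
    target = round(ratio * n_majority)
    n_new = max(0, target - len(minority))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (the base itself excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical toy data: 4 minority samples against 20 majority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
for ratio in (0.5, 0.75, 1.0):
    new_points = smote_oversample(minority, n_majority=20, ratio=ratio)
    print(ratio, len(minority) + len(new_points))
```

Sweeping `ratio` over intermediate values, and evaluating a classifier at each one, is how an intermediate ratio could turn out to perform better than the fully balanced 1:1 case.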
Enhanced prediction for piezophilic protein by incorporating reduced set of amino acids using fuzzy-rough feature selection technique followed by SMOTE
(Springer New York LLC, 2018) Anoop Kumar Tiwari; Shivam Shreevastava; Karthikeyan Subbiah; Tanmoy Som
In this paper, the learning performance of different machine learning algorithms is investigated by applying fuzzy-rough feature selection (FRFS) technique on optimally balanced training and testing sets, consisting of the piezophilic and nonpiezophilic proteins. By experimenting using FRFS technique followed by Synthetic Minority Over-sampling Technique (SMOTE) at optimal balancing ratios, we obtain the best results by achieving sensitivity of 79.60%, specificity of 74.50%, average accuracy of 77.10%, AUC of 0.841, and MCC of 0.542 with random forest algorithm. The ranking of input features according to their differentiating ability of piezophilic and nonpiezophilic proteins is presented by using fuzzy-rough attribute evaluator. From the results, it is observed that the performance of classification algorithms can be improved by selecting the reduced optimally balanced training and testing sets. This can be obtained by selecting the relevant and non-redundant features from training sets using FRFS approach followed by suitably modifying the class distribution. © Springer Nature Singapore Pte Ltd. 2018.
Improved Carpooling Experience through Improved GPS Trajectory Classification Using Machine Learning Algorithms
(MDPI, 2022) Manish Kumar Pandey; Anu Saini; Karthikeyan Subbiah; Nalini Chintalapudi; Gopi Battineni
Globally, smart cities, infrastructure, and transportation have led to a rise in vehicle numbers, resulting in an increasing number of problems. These include air pollution, noise pollution, high energy consumption, and effects on people’s health. A viable solution to these problems is carpooling, which involves sharing vehicles between people going to the same location. As carpooling solutions become more popular, they need to be implemented efficiently. Data analytics can help people make informed decisions when selecting a ride (Car or Bus). We applied machine learning algorithms to select the desired ride (Car or Bus) and used feature ranking algorithms to identify the foremost traits for selecting the desired ride. Based on the performance evaluation metric, 11 classifiers were used for the experiment. In terms of selecting the desired ride, Random Forest performs best. Using ten-fold cross-validation, we obtained a sensitivity of 87.4%, a specificity of 73.7%, and an accuracy of 81.0%; using leave-one-out cross-validation, we obtained a sensitivity of 90.8%, a specificity of 77.6%, and an accuracy of 84.7%. To identify the most favorable characteristics of the Ride (Car or Bus), the recursive elimination of features algorithm was applied. By identifying the factors contributing to users’ experience, the service providers will be able to rectify those factors to increase business. It has been determined that the weather can make or break the user experience. This model will be used to quantify and map intrinsic and extrinsic sentiments of the people and their interactions with locality, socio-economic conditions, climate, and environment. © 2022 by the authors.
Inferring biological basis about psychrophilicity by interpreting the rules generated from the correctly classified input instances by a classifier
(Elsevier Ltd, 2014) Abhigyan Nath; Karthikeyan Subbiah
Organisms thriving in extremely cold surroundings are called psychrophiles, and they present a wealth of knowledge about the sequence adjustments in proteins that occurred during adaptation to low temperatures. In this paper, we propose a new cascading model to investigate the basis for psychrophilicity. In this model, a superior classifier was used to discriminate psychrophilic from mesophilic protein sequences, and then the PART rule generating algorithm was applied on the input instances that are correctly classified by the classifier, to generate human interpretable rules. These derived rules were further validated on a structural dataset and finally analyzed to discover the underlying biological basis of psychrophilicity. In this study, we have used one of the key features of psychrophilic proteins accountable for remaining functional in extremely cold surroundings, i.e., global patterns of amino acid composition, as the input features. The rotation forest classifier outperformed all the other classifiers with maximum accuracy of 70.5% and maximum AUC of 0.78. The effect of sequence length on the classification accuracy was also investigated. The analysis of the derived rules and interpretation of the analyzed results revealed some interesting phenomena, such as that the amino acids A, D, G, F, and S are over-represented, and T is under-represented, in psychrophilic proteins. These findings augment the existing domain knowledge for psychrophilic sequence features. © 2014 Elsevier Ltd.
Insights into the molecular basis of piezophilic adaptation: Extraction of piezophilic signatures
(Academic Press, 2016) Abhigyan Nath; Karthikeyan Subbiah
Piezophiles are the organisms which can successfully survive at extreme pressure conditions. However, the molecular basis of piezophilic adaptation is still poorly understood. Analysis of the protein sequence adjustments that had taken place during evolution can help to reveal the sequence adaptation parameters responsible for protein functional and structural adaptation at such high pressure conditions. In this current work we have used SVM classifier for filtering strong instances and generated human interpretable rules from these strong instances by using the PART algorithm. These generated rules were analyzed for getting insights into the molecular signature patterns present in the piezophilic proteins. The experiments were performed on three different temperature ranges piezophilic groups, namely psychrophilic-piezophilic, mesophilic-piezophilic, and thermophilic-piezophilic for the detailed comparative study. The best classification results were obtained as we move up the temperature range from psychrophilic-piezophilic to thermophilic-piezophilic. Based on the physicochemical classification of amino acids and using feature ranking algorithms, hydrophilic and polar amino acid groups have higher discriminative ability for psychrophilic-piezophilic and mesophilic-piezophilic groups along with hydrophobic and nonpolar amino acids for the thermophilic-piezophilic groups. We also observed an overrepresentation of polar, hydrophilic and small amino acid groups in the discriminatory rules of all the three temperature range piezophiles along with aliphatic, nonpolar and hydrophobic groups in the mesophilic-piezophilic and thermophilic-piezophilic groups. © 2015 Elsevier Ltd.
Maximizing lipocalin prediction through balanced and diversified training set and decision fusion
(Elsevier Ltd, 2015) Abhigyan Nath; Karthikeyan Subbiah
Lipocalins are short in sequence length and perform several important biological functions. These proteins are having less than 20% sequence similarity among paralogs. Experimentally identifying them is an expensive and time consuming process. The computational methods based on the sequence similarity for allocating putative members to this family are also far elusive due to the low sequence similarity existing among the members of this family. Consequently, the machine learning methods become a viable alternative for their prediction by using the underlying sequence/structurally derived features as the input. Ideally, any machine learning based prediction method must be trained with all possible variations in the input feature vector (all the sub-class input patterns) to achieve perfect learning. A near perfect learning can be achieved by training the model with diverse types of input instances belonging to the different regions of the entire input space. Furthermore, the prediction performance can be improved through balancing the training set as the imbalanced data sets will tend to produce the prediction bias towards majority class and its sub-classes. This paper is aimed to achieve (i) the high generalization ability without any classification bias through the diversified and balanced training sets as well as (ii) enhanced the prediction accuracy by combining the results of individual classifiers with an appropriate fusion scheme. Instead of creating the training set randomly, we have first used the unsupervised Kmeans clustering algorithm to create diversified clusters of input patterns and created the diversified and balanced training set by selecting an equal number of patterns from each of these clusters. 
Finally, probability based classifier fusion scheme was applied on boosted random forest algorithm (which produced greater sensitivity) and K nearest neighbour algorithm (which produced greater specificity) to achieve the enhanced predictive performance than that of individual base classifiers. The performance of the learned models trained on Kmeans preprocessed training set is far better than the randomly generated training sets. The proposed method achieved a sensitivity of 90.6%, specificity of 91.4% and accuracy of 91.0% on the first test set and sensitivity of 92.9%, specificity of 96.2% and accuracy of 94.7% on the second blind test set. These results have established that diversifying training set improves the performance of predictive models through superior generalization ability and balancing the training set improves prediction accuracy. For smaller data sets, unsupervised Kmeans based sampling can be an effective technique to increase generalization than that of the usual random splitting method. © 2015 Elsevier Ltd. All rights reserved.
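A probability-based fusion of two classifiers can be sketched as follows. The abstract does not state the exact fusion rule, so the weighted average used here is an assumption, and the function names and probability values are hypothetical stand-ins for the outputs of a more sensitive model and a more specific model.

```python
def fuse_probabilities(prob_a, prob_b, weight_a=0.5):
    """Weighted average of two classifiers' positive-class probabilities.

    The averaging rule is an assumed fusion scheme, chosen for illustration.
    """
    return [weight_a * pa + (1.0 - weight_a) * pb for pa, pb in zip(prob_a, prob_b)]

def predict(probabilities, threshold=0.5):
    """Turn fused probabilities into 0/1 class labels."""
    return [1 if p >= threshold else 0 for p in probabilities]

# Hypothetical positive-class probabilities for four test sequences.
p_forest = [0.90, 0.60, 0.40, 0.10]  # stands in for the more sensitive model
p_knn = [0.80, 0.20, 0.70, 0.05]     # stands in for the more specific model

fused = fuse_probabilities(p_forest, p_knn)
labels = predict(fused)
print(fused, labels)
```

In this toy case the second sequence is accepted by one model alone and rejected after fusion, while the third is rejected by one model alone and accepted after fusion, which is the kind of trade-off between sensitivity and specificity that fusion is meant to settle.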
Missing QoS-values predictions using neural networks for cloud computing environments
(Institute of Electrical and Electronics Engineers Inc., 2016) Sunil Kumar; Manish Kumar Pandey; Abhigyan Nath; Karthikeyan Subbiah
Cloud computing environment is influenced by user-dependent quality of service (QoS) parameters in evaluating the performance of Web services apart from others factors. Among the performance QoS parameters, mainly response-time and throughput could be modulated to provide very efficient services for cloud users. As per user's requirement, the service provider's recommendation of appropriate Web services to the end-users with proper QoS satisfaction is one of the critical issues. This can be recommended to end-users in the Service Level Agreement (SLA) under Web Service Modeling Ontology (WSMO) of WS-Policy. Generally, the matrix of collected QoS parameter values is sparse and the accurate prediction of the missing QoS values is important for the recommendation of appropriate web services to the end users. To address this issue, we worked out an artificial neural network model for the prediction of missing QoS-values using past QoS performance parameter data. In this current work, the performances of different learning algorithms of ANN are analyzed for enhanced prediction of QoS performance values. The ANN model with Bayesian-Regularization is found to be better performing when compared to other learning algorithms. © 2015 IEEE.
New approaches to intuitionistic fuzzy-rough attribute reduction
(IOS Press, 2018) Anoop Kumar Tiwari; Shivam Shreevastava; K.K. Shukla; Karthikeyan Subbiah
Technological advancement in the area of computing has led to production of huge amount of structured as well as unstructured data. This high dimensional data is very complex to process. Feature selection is one of the widely used techniques for preprocessing of this huge data in predictive analytics. Rough set based feature selection is an approach for handling the vagueness in data and works fine on discrete data but struggles in the continuous case as it requires discretization. This process of discretization leads to information loss. Solution for this problem was given by various authors in form of fuzzy rough set as well as intuitionistic fuzzy rough set based approaches for feature selection. Intuitionistic fuzzy set has certain benefits over the theory of traditional fuzzy sets such as its ability in a better expression of underlying information as well as its aptness to recite fragile ambiguities of the uncertainty of the objective world. The benefits offered by Intuitionistic fuzzy sets is due to the concurrent contemplation of positive, negative and hesitancy degrees for an object to belong to a set. In this paper, three novel approaches of feature reduction based on intuitionistic fuzzy rough set are presented. For this, a new intuitionistic fuzzy rough set model is established by defining a pair of lower and upper approximations. Furthermore, three new approaches of feature selection based on the degree of dependency by using score function, membership grade and cardinality of intuitionistic fuzzy numbers are introduced. Moreover, the basic results on lower and upper approximations based on rough sets are extended for intuitionistic fuzzy rough sets and analogous results are established. Moreover, a suitable algorithm is given based on our proposed approaches. Finally, the proposed algorithm is applied to an arbitrary example data set and comparison has been made with the previous fuzzy rough set based technique. 
The proposed algorithm is found to be better performing in terms of selected features. © 2018-IOS Press and the authors. All rights reserved.
Optimal balancing & efficient feature ranking approach to minimize credit risk
(Elsevier Ltd, 2021) Manish Kumar Pandey; Mamta Mittal; Karthikeyan Subbiah
The banking industries are struggling with massive growth in the Non-Performing Assets (NPAs) that is raising the concerns of the financial institutions across the world. For gaining sustainable competitive advantages: detection, prediction, and prevention of credit Risks are becoming the foremost priorities for the banks. This data is vast, highly unstructured and imbalanced; thus, optimal balancing and efficient feature ranking are required, to predict the Credit Risk customers using Machine Learning techniques. Further, feature ranking algorithms are applied to identify the most vital characteristics of triggering the Credit Risk. The experiments have been conducted on credit Risk data set from a German bank, downloaded from the standard data repository of the UCI. Random Forest at optimal balancing ratio of 1:1.1335 has been found to be the best performing with a sensitivity of 81.6%, specificity value of 85.3%, the accuracy of 83.4%, MCC of 0.669 and AUC of 0.914. © 2021
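The reported measures (sensitivity, specificity, accuracy, and MCC) all follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not taken from the paper.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # true positive rate
    specificity = tn / (tn + fp)                # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Hypothetical counts: 100 positive and 100 negative test instances.
metrics = binary_metrics(tp=80, fn=20, tn=90, fp=10)
print(metrics)
```

Unlike accuracy, MCC uses all four cells symmetrically, which is why it is often reported alongside accuracy for imbalanced data.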
Performance Analysis of Ensemble Supervised Machine Learning Algorithms for Missing Value Imputation
(Institute of Electrical and Electronics Engineers Inc., 2016) Sunil Kumar; Manish Kumar Pandey; Abhigyan Nath; Karthikeyan Subbiah
In this era of cloud computing, web services based solutions are gaining popularity. The applications running in a distributed environment seek new parameters for them to perform efficiently to satisfy end users' requirements. Finding these parameters for increasing efficiency has become a topic of discussion among researchers nowadays. Non-functional performance of a web service is described through user-dependent QoS properties. These QoS parameters are generally described in WS-Policy in the Service Level Agreement (SLA). Usually in web service QoS datasets, some web service QoS values are missing, which makes missing value imputation an important job while working with cloud web services. In the current work we compared the prediction accuracy of two groups of supervised machine learning ensemble based Meta learners, bagging and additive regression (boosting), with a fusion of the seven base learners in both. Random forest is found to perform better than the other learning algorithms in both Meta learners, bagging and boosting. © 2016 IEEE.
Performance analysis of time series forecasting using machine learning algorithms for prediction of ebola casualties
(Springer Verlag, 2019) Manish Kumar Pandey; Karthikeyan Subbiah
There is an immense concern about our vigilance in controlling the spread of pandemics such as Ebola, Zika, and H1N1 through state-of-the-art technology. The dynamics of epidemics sweeping through a population become very complex. Efficient descriptive, predictive, preventive and prescriptive analyses on the huge data generated by SMAC are very crucial for valuable arrangement and associated responsive tactics. In this paper, we have proposed the use of machine learning techniques for performance evaluation of time series forecasting of Ebola casualties. By experimenting without lag creation, we achieved the best results in the MAE of 7.85%, RMSE value of 61.14%, and Direction Accuracy of 85.99% with Random Tree Classifier. Thus we can conclude that using these models for forecasting epidemic spread and developing public health policies helps the health authorities to ensure the appropriate actions for the control of the outbreak. © Springer Nature Singapore Pte Ltd. 2018.
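The three forecast measures named in the abstract can be computed as in the sketch below. MAE and RMSE follow their standard definitions. The abstract does not define direction accuracy, so the version here (the share of steps where the forecast moves in the same direction as the observed series) is an assumed, commonly used definition, and the two series are hypothetical.

```python
import math

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def forecast_errors(actual, predicted):
    """Return (MAE, RMSE, direction accuracy) for two equally long series."""
    n = len(actual)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    # Assumed definition: a step counts as a hit when the observed change and
    # the forecast change between consecutive time points share a sign.
    hits = sum(
        sign(actual[i] - actual[i - 1]) == sign(predicted[i] - predicted[i - 1])
        for i in range(1, n)
    )
    return mae, rmse, hits / (n - 1)

# Hypothetical weekly case counts and forecasts.
actual = [10, 12, 15, 14, 18]
predicted = [11, 13, 14, 15, 17]
mae, rmse, direction_accuracy = forecast_errors(actual, predicted)
print(mae, rmse, direction_accuracy)
```

Here the forecast misses only the one week in which observed cases fell, so three of the four steps count as direction hits.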
Probing an optimal class distribution for enhancing prediction and feature characterization of plant virus-encoded RNA-silencing suppressors
(Springer Verlag, 2016) Abhigyan Nath; Karthikeyan Subbiah
To counter the host RNA silencing defense mechanism, many plant viruses encode RNA silencing suppressor proteins. These groups of proteins share very low sequence and structural similarities among them, which consequently hamper their annotation using sequence similarity-based search methods. Alternatively the machine learning-based methods can become a suitable choice, but the optimal performance through machine learning-based methods is being affected by various factors such as class imbalance, incomplete learning, selection of inappropriate features, etc. In this paper, we have proposed a novel approach to deal with the class imbalance problem by finding the optimal class distribution for enhancing the prediction accuracy for the RNA silencing suppressors. The optimal class distribution was obtained using different resampling techniques with varying degrees of class distribution starting from natural distribution to ideal distribution, i.e., equal distribution. The experimental results support the fact that optimal class distribution plays an important role to achieve near perfect learning. The best prediction results are obtained with Sequential Minimal Optimization (SMO) learning algorithm. We could achieve a sensitivity of 98.5 %, specificity of 92.6 % with an overall accuracy of 95.3 % on a tenfold cross validation and is further validated using leave one out cross validation test. It was also observed that the machine learning models trained on oversampled training sets using synthetic minority oversampling technique (SMOTE) have relatively performed better than on both randomly undersampled and imbalanced training data sets. Further, we have characterized the important discriminatory sequence features of RNA-silencing suppressors which distinguish these groups of proteins from other protein families. © 2016, The Author(s).
Social networking and big data analytics assisted reliable recommendation system model for internet of vehicles
(Springer Verlag, 2016) Manish Kumar Pandey; Karthikeyan Subbiah
The devices are becoming ubiquitous and interconnected due to rapid advancements in computing and communication technology. The Internet of Vehicles (IoV) is one such example which consists of vehicles that converse with each other as well as with the public networks through V2V (vehicle-to-vehicle), V2P (vehicle-to-pedestrian) and V2I (vehicle-to-infrastructure) communications. The social relationships amongst vehicles create a social network where the participants are intelligent objects rather than the human beings and this leads to emergence of Social Internet of Vehicles (SIoV). The big data generated from these networks of devices are needed to be processed intelligently for making these systems smart. The security and privacy issues such as authentication and recognition attacks, accessibility attacks, privacy attacks, routing attacks, data genuineness attacks etc. are to be addressed to make these cyber physical network systems very reliable. This paper presents a comprehensive survey on SIoV and proposes a novel social recommendation model that could establish links between social networking and SIoV for reliable exchange of information and intelligently analyze the information to draw authentic conclusions for making right assessment. The future Intelligent IoV system which should be capable to learn and explore the cyber physical system could be designed. © Springer International Publishing AG 2016.
The role of pertinently diversified and balanced training as well as testing data sets in achieving the true performance of classifiers in predicting the antifreeze proteins
(Elsevier B.V., 2018) Abhigyan Nath; Karthikeyan Subbiah
Antifreeze proteins (AFPs) are those proteins, which inhibit the ice nucleation process and thereby enabling certain organisms to survive under sub-zero temperature habitats. AFPs are supposed to be evolved from different types of protein families to perform the unique function of antifreeze activity and turn out to be the classical example of convergent evolution. The common sequence similarity search methods have failed to predict putative AFPs due to poor sequence and structural similarity that exists among the different sub-types of AFP. The machine learning techniques are the viable alternative approaches to predict putative AFPs. In this paper, we have discussed about the criteria (like apposite feature selection, balanced data sets and complete learning) that are needed to be taken into account for successful application of machine learning methods and implemented these criteria by using a clustering procedure in order to achieve the true performance of the learning algorithms. Diversified and representative training and testing data sets are very crucial for perfect learning as well as true testing of machine learning based prediction methods for two reasons: first is that a training dataset that lacks definable subset of input patterns makes prediction of patterns belonging to this subset either difficult or unfeasible (thus resulting in incomplete learning) and secondly a testing data set that lacks definable subset of input patterns does not tell about whether this subset of patterns can be correctly predicted by the classifier or not (thus resulting in incomplete testing). Moreover, balanced training and testing data sets are equally important for achieving the true (robust) performance of classifiers because a well-balanced training set eliminates bias of the classifier toward particular class/sub-class due to over-representation or under-representation of input patterns belonging to those classes/sub-classes. 
We have used K-means clustering algorithm for creating the diversified and balanced training as well as testing data sets, to overcome the shortcoming of random splitting, which cannot guarantee representative training and testing sets. The current clustering based optimal splitting criteria proved to be better than random splitting for creating training and testing set in terms of superior generalization and robust evaluation. © 2017 Elsevier Ltd
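Clustering-based splitting, as opposed to random splitting, can be sketched as below: cluster the input patterns, then draw the same number of training patterns from every cluster and leave the remainder for testing. This is a simplified illustration under stated assumptions, not the authors' procedure. The function names, the toy points, the farthest-first initialisation (used only to keep the run deterministic), and the fixed `per_cluster` quota are all hypothetical choices.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_labels(points, k, iterations=20):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    # Farthest-first initialisation keeps this sketch deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def cluster_balanced_split(points, k, per_cluster, seed=0):
    """Pick `per_cluster` training indices from every cluster; the rest are test indices."""
    rng = random.Random(seed)
    labels = kmeans_labels(points, k)
    train, test = [], []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(members)
        train.extend(members[:per_cluster])
        test.extend(members[per_cluster:])
    return sorted(train), sorted(test)

# Hypothetical feature vectors forming three well-separated groups of four.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
]
train, test = cluster_balanced_split(points, k=3, per_cluster=2)
print(train, test)
```

Because every cluster contributes to both sets, no region of the input space is absent from training or from testing, which a purely random split cannot guarantee.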