Browsing by Author "Vedika Gupta"

A fuzzy rule-based system with decision tree for breast cancer detection
(John Wiley and Sons Inc, 2023) Vedika Gupta; Harshit Gaur; Srishti Vashishtha; Uttirna Das; Vivek Kumar Singh; D. Jude Hemanth
Breast cancer is possibly the deadliest illness in the world and the risks are gradually increasing. One in eight women is likely to be diagnosed with breast cancer in her lifetime. The main cause of the higher fatality rates is the delay in detecting breast cancer. The focus of this study is therefore to develop a better fuzzy expert system for the detection of breast cancer, using decision tree analysis to derive the rule base. For this classification problem, the input features of the dataset are converted into human-understandable terms (linguistic variables). The Mamdani Fuzzy Rule-Based System (FRBS) is deployed as the main inference engine, with the centroid method used for defuzzification to convert the final fuzzy score into class labels: benign (not cancerous) or malignant (cancerous). A decision tree algorithm is applied to create a novel set of 27 fuzzy rules, which are fed into the FRBS. The investigation is performed on the publicly available Wisconsin Breast Cancer Dataset. The accuracy obtained by the proposed system is about 97%, recall is 99.58% and precision is about 93%. The experiments on this dataset yield higher performance as compared to the state of the art. © 2023 The Authors. IET Image Processing published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.
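The inference step described in the abstract above (clipping each output fuzzy set by its rule strength, aggregating, then taking the centroid) can be illustrated with a minimal sketch. The triangular membership functions, the [0, 1] output range, the 0.5 decision threshold, and the two hard-coded rule strengths are all illustrative assumptions; they are not the paper's 27 rules or its actual membership functions.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def centroid_defuzzify(rule_strengths, output_sets, lo=0.0, hi=1.0, steps=1000):
    """Mamdani-style output: clip each output set at its rule strength,
    aggregate with max, and return the centroid of the aggregated shape."""
    numerator = 0.0
    denominator = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        mu = max(
            min(strength, triangular(x, *output_sets[label]))
            for label, strength in rule_strengths.items()
        )
        numerator += x * mu
        denominator += mu
    # If no rule fired, fall back to the midpoint of the output range.
    return numerator / denominator if denominator > 0 else (lo + hi) / 2


# Hypothetical output sets on a 0..1 "malignancy score" axis.
OUTPUT_SETS = {"benign": (-0.5, 0.0, 0.5), "malignant": (0.5, 1.0, 1.5)}

# Hypothetical aggregated rule strengths for one patient record.
score = centroid_defuzzify({"benign": 0.2, "malignant": 0.8}, OUTPUT_SETS)
label = "malignant" if score >= 0.5 else "benign"
print(round(score, 3), label)
```

In this sketch the crisp score is pulled toward whichever output set fired more strongly, and the threshold turns it into a class label.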
A linguistic rule-based approach for aspect-level sentiment analysis of movie reviews
(Springer Verlag, 2017) Rajesh Piryani; Vedika Gupta; Vivek Kumar Singh; Udayan Ghose
Aspect-level sentiment analysis refers to sentiment polarity detection from unstructured text at a fine-grained feature or aspect level. This paper presents our experimental work on aspect-level sentiment analysis of movie reviews. Movie reviews generally contain user opinion about different aspects such as acting, direction, choreography, cinematography, etc. We have devised a linguistic rule-based approach which identifies the aspects from movie reviews, locates opinion about each aspect and computes the sentiment polarity of that opinion using linguistic approaches. The system generates an aspect-level opinion summary. The experimental design is evaluated on datasets of two movies. The results show good accuracy and promise for deployment in an integrated opinion profiling system. © Springer Nature Singapore Pte Ltd. 2017.
A quantitative and text-based characterization of big data research
(IOS Press, 2019) Vedika Gupta; Vivek Kumar Singh; Udayan Ghose; Pankaj Mukhija
This paper tries to map the research work carried out in the field of Big Data through a detailed analysis of scholarly articles published on the theme during 2010-16, as indexed in Scopus. We have collected and analyzed all relevant publications on Big Data, as indexed in Scopus, through a quantitative as well as textual characterization. The analysis attempts to delve into parameters like research productivity, growth of research and citations, thematic trends, top publication sources and emerging topics in this field. The analytical study also investigates country-wise publication output and impact in terms of average citations per paper, country-level collaboration patterns, authorship and leading contributors (countries, institutions), etc. The scholarly publication data is also subjected to a detailed textual analysis method to identify key themes in Big Data research, disciplinary variations and thematic trends and patterns. The results produce interesting inferences. Quantitative measures show that there has been a tremendous increase in the number of publications related to Big Data during the last few years. Research work in Big Data, though primarily considered a sub-discipline of Computer Science, is now carried out by researchers in many disciplines. Thematic analysis of publications in Big Data shows that it is a discipline involving research interest from fields as diverse as Medicine and Social Sciences. The paper also identifies major keywords now associated with Big Data research such as Cloud Computing, Deep Learning, Social Media and Data Analytics. This helps in a thorough understanding and visualization of the Big Data research area. © 2019 - IOS Press and the authors. All rights reserved.
Abusive comment detection in Tamil using deep learning
(Elsevier, 2024) Deepawali Sharma; Vedika Gupta; Vivek Kumar Singh
In recent years, online social media have expanded in volume and coverage and have become a significant source of information for different groups of people. The comments posted on social media can be emotion-laden and hence can have an impact on the mental health of an individual or a group of individuals. One such category of posts includes comments that are abusive or hateful in nature. The comments that spread hate and are abusive in nature usually target certain individuals or some specific communities. It is, therefore, very important to know about them and perhaps be able to detect such content in time. While there exist methods for automated detection of hate speech from posts in the English language, there is relatively less research on low-resource languages, such as Tamil. This chapter presents an overview of research on detecting hate speech in low-resource languages and explores the application of various deep learning models for the task. The abusive comments are classified into different categories: Homophobia, Xenophobia, Transphobic, Misandry, Misogyny, Counter-speech, and Hope speech, from Tamil and Tamil–English code-mixed language. Those comments that are not in the Tamil language are categorized as “Not-Tamil.” The following deep learning models are applied to the task: recurrent neural network, long short-term memory (LSTM), and bidirectional LSTM. Experimental results are presented along with an analysis of the quality of results. © 2024 Elsevier Inc. All rights reserved.
Aspect-based sentiment analysis of mobile reviews
(IOS Press, 2019) Vedika Gupta; Vivek Kumar Singh; Pankaj Mukhija; Udayan Ghose
E-commerce websites provide an easy platform for users to put forth their viewpoints on different topics, ranging from a news item to any product in the market. Such online content encourages authors to express opinions on various aspects of an entity. Aspect-based sentiment analysis deals with analyzing this textual content to look for the aspect in question. After locating the aspects, the corresponding sentiment-bearing words are looked for. This paper describes an integrated system that generates opinionated aspect-based graphical and extractive summaries from a large set of mobile reviews. The system focuses on three tasks: (a) identification of aspects in a given field, (b) computation of the sentiment polarity of each aspect, and (c) generation of opinionated aspect-based graphical and extractive summaries. The system has been evaluated on three mobile-review datasets and obtains better precision and recall than the baseline approach. The system generates summaries from reviews without any training. © 2019 - IOS Press and the authors.
Bibliographic Coupling and Conceptual Similarity: Are the Bibliographically Coupled Papers also Conceptually Similar?
(Phcog.Net, 2024) Abhirup Nandy; Aakash Singh; Vedika Gupta; Vivek Kumar Singh
Bibliographic coupling, over the years, has been referred to and used in different contexts related to scientific and technical literature. It is often believed that research papers that have bibliographic coupling deal with similar concepts and hence there may be high conceptual similarity between them. This study attempts to empirically assess this notion. To conduct this research, the study utilizes data obtained from the Dimensions database and employs advanced machine learning algorithms to extract weighted keywords that better capture the conceptual content of documents. The Jaccard similarity measure is used to compute bibliographic and conceptual coupling matrices for different sets of research papers. The results show that even though bibliographic coupling is widely used to assess relationships between research papers, it often falls short of identifying actual conceptual similarities within documents. This study's findings carry important implications for areas such as information retrieval, interdisciplinary research and evaluation metrics, calling for a more refined understanding of how research documents relate to one another beyond their shared references. © Author (s) 2024.
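The comparison described in the abstract above can be sketched with the Jaccard measure, |A ∩ B| / |A ∪ B|, applied once to reference lists and once to keyword sets. The paper ids, references, and keywords below are invented toy data, and the sketch uses plain unweighted sets, whereas the abstract mentions weighted keywords; it is an illustration of the measure, not the authors' pipeline.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(items):
    """Map each unordered pair of paper ids to the Jaccard similarity of their sets."""
    return {
        (p, q): jaccard(items[p], items[q])
        for p, q in combinations(sorted(items), 2)
    }


# Hypothetical papers: cited references and extracted keywords.
references = {
    "P1": {"R1", "R2", "R3"},
    "P2": {"R2", "R3", "R4"},
    "P3": {"R5"},
}
keywords = {
    "P1": {"fuzzy logic", "classification"},
    "P2": {"citation analysis", "scientometrics"},
    "P3": {"fuzzy logic", "classification", "diagnosis"},
}

bibliographic = pairwise_jaccard(references)
conceptual = pairwise_jaccard(keywords)

for pair in bibliographic:
    print(pair, round(bibliographic[pair], 2), round(conceptual[pair], 2))
```

In this toy data, P1 and P2 share half of their combined references but no keywords, while P1 and P3 share no references but most keywords, which is the kind of mismatch the study examines.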
BongHope: An annotated corpus for Bengali hope speech detection
(Springer Science and Business Media B.V., 2025) Tanusree Nath; Vivek Kumar Singh; Vedika Gupta
The exponential growth of social media has fostered the spread of both negativity (hate speech) and supportive content, with the latter often categorized as hope speech—text promoting peace and a hopeful outlook. Recently, a few research works have been conducted on automatic detection of hope speech in different languages, including English, Tamil, Malayalam, Spanish and Kannada. However, to the best of our knowledge, there is no research on hope speech in Bengali language text. Despite Bengali’s significant presence on social media, hope speech detection in this language remains unexplored. Therefore, it is important to develop appropriate computational methods for the automatic detection of hope and non-hope speech in Bengali text. One possible reason for the lack of hope speech research in Bengali may be the unavailability of a suitable dataset or corpus for this purpose. This study presents the first curated and annotated dataset for hope speech in Bengali, enabling effective detection and analysis. Several state-of-the-art computational models are applied to the created dataset and the results obtained confirm the suitability of the dataset for hope speech research. Overall, this research provides a foundational resource for Bengali hope speech detection, contributing to multilingual social media analysis. © Bharati Vidyapeeth's Institute of Computer Applications and Management 2025.
Book impact assessment: A quantitative and text-based exploratory analysis
(IOS Press, 2018) Rajesh Piryani; Vedika Gupta; Vivek Kumar Singh; David Pinto
Books are an important source of knowledge to disseminate information. Researchers and academicians write books to propagate their innovative research or teachings amongst academic as well as non-academic audiences. The number of books written every year is increasing rapidly. According to the International Publisher Association (IPA) annual report 2015-2016, around 150 million different books were published worldwide in 2014-2015. Many e-commerce websites are also involved in selling books. A recent addition to the book publishing world is e-books, which have made it very simple to publish. While the availability of a large number of books is good for readers, it is also challenging to find a good book, particularly in scholarly settings. Researchers in the area of Scientometrics have attempted to assess the quality of a scholarly book by measuring the citations that it receives. However, citations alone are not a true measure of a book's impact. Many times people use the knowledge in a book without actually citing it. Also, use of books in classroom settings or for general reading often is not reflected in terms of citations. Therefore, it is important to obtain users' opinions about a book from other forms of data. Fortunately, some data of this sort are now available in the form of reviews, downloads, social media mentions, etc. Amazon and Goodreads, both of which provide the readers' views about a book, are two good examples. This paper presents an exploratory research work on using these non-traditional data about books to assess the impact of a book. A set of Scopus-indexed computer science books with good citations as well as some other popular books in the computer science domain are used for analysis. The reviews of books have been crawled in an automated fashion from Amazon and Goodreads. Thereafter, sentiment analysis is carried out on the text of the reviews. Results of sentiment analysis are compared and correlated with traditional impact assessment metrics. The experimental analysis does not show a coherent relationship between citations and online reviews. Also, the majority of the online reviews are found to be positive for a large number of books in the dataset. As a related exercise, the Scopus citation data and Google Scholar citation data for books are also compared. A high value of correlation is observed between the two. Overall, the exploratory analysis provides a useful insight into the problem of book impact assessment. © 2018-IOS Press and the authors. All rights reserved.
Detection of Homophobia & Transphobia in Malayalam and Tamil: Exploring Deep Learning Methods
(Springer Science and Business Media Deutschland GmbH, 2023) Deepawali Sharma; Vedika Gupta; Vivek Kumar Singh
The increase in abusive content on online social media platforms is impacting the social life of online users. Use of offensive and hate speech has been making social media toxic. Homophobia and transphobia constitute offensive comments against the LGBT+ community. It becomes imperative to detect and handle these comments, so as to flag them in time or issue a warning to users indulging in such behaviour. However, automated detection of such content is a challenging task, more so in Dravidian languages, which are identified as low-resource languages. Motivated by this, the paper attempts to explore the applicability of different deep learning models for classification of social media comments in the Malayalam and Tamil languages as homophobic, transphobic and non-anti-LGBT+ content. Popular deep learning models, Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) using GloVe embedding, and transformer-based learning models (Multilingual BERT and IndicBERT) are applied to the classification problem. Results obtained show that IndicBERT outperforms the other implemented models, with weighted average F1-scores of 0.86 and 0.77 for Malayalam and Tamil, respectively. Therefore, the present work confirms higher performance of IndicBERT on the given task on selected Dravidian languages. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Generating aspect-based extractive opinion summary: Drawing inferences from social media texts
(Instituto Politecnico Nacional, 2018) Rajesh Piryani; Vedika Gupta; Vivek Kumar Singh
This paper presents an integrated framework to generate extractive aspect-based opinion summary from a large volume of free-form text reviews. The framework has three major components: (a) aspect identifier to determine the aspects in a given domain; (b) sentiment polarity detector for computing the sentiment polarity of opinion about an aspect; and (c) summary generator to generate opinion summary. The framework is evaluated on SemEval-2014 dataset and obtains better results than several other approaches. © 2018 Instituto Politecnico Nacional. All rights reserved.
Hate Speech Detection Research in South Asian Languages: A Survey of Tasks, Datasets and Methods
(Association for Computing Machinery, 2025) Deepawali Sharma; Tanusree Nath; Vedika Gupta; Vivek Kumar Singh
Social media has over the years emerged as a powerful platform for communicating and sharing views, thoughts, and opinions. However, at the same time it is being abused by certain individuals to spread hate against individuals, communities, religions, and so on. Such content can lead to serious issues of mental health, online well-being, and social order. Therefore, it is very important to have automated methods and approaches for detecting such content from the large volume of posts in social media. Recently there have been several efforts to develop computational approaches toward this end. However, most of these efforts are directed toward content in the English language. Only recently have studies started focusing on low-resource languages, including those from South Asia. This article attempts to present a detailed and comprehensive survey of hate speech related research in South Asian languages. The various definitions and terms related to hate speech in different social media platforms are discussed first. The different tasks in hate speech research, available datasets, and the popular computational approaches used in South Asian languages are surveyed in detail. Major patterns identified and the practical implications are presented and discussed, along with a discussion of challenges and opportunities of further research in the area. © 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Movie Prism: A novel system for aspect level sentiment profiling of movies
(IOS Press, 2017) Rajesh Piryani; Vedika Gupta; Vivek Kumar Singh
This paper describes an integrated aspect-level opinion summary generation system for movie reviews. The system, named Movie Prism, analyses each movie review, locates aspect terms in it, identifies opinion about those aspects and then generates a visual aspect-based opinion summary of the movie in question. At present, the movie reviews and other related information are automatically fetched from IMDb for all the movies released during the years 2010 to 2014. The system has an integrated crawler for this purpose. Further, an ontology for the movie domain is created for better aspect identification. We have evaluated the system on three annotated movie review datasets. The system obtains good accuracy. Overall, the designed system is capable of producing visual aspect-level opinion summaries from unstructured textual reviews without any need for training, and the results have a reasonable degree of accuracy. © 2017 - IOS Press and the authors. All rights reserved.
Should we stay silent on violence? An ensemble approach to detect violent incidents in Spanish social media texts
(Cambridge University Press, 2025) Deepawali Sharma; Vedika Gupta; Vivek Kumar Singh; David E. Pinto
There has been a steep rise in user-generated content on the Web and social media platforms during the last few years. While the ease of content creation allows anyone to create content, at the same time it is difficult to monitor and control the spread of detrimental content. Recent research in natural language processing and machine learning has shown some hope for the purpose. Approaches and methods are now being developed for the automatic flagging of problematic textual content, namely hate speech, cyberbullying, or fake news, though mostly for English language texts. This paper presents an algorithmic approach based on deep learning models for the detection of violent incidents from tweets in the Spanish language (binary classification) and categorizes them further into five classes: accident, homicide, theft, kidnapping, and none (multi-label classification). The performance is evaluated on the recently shared benchmark dataset, and it is found that the proposed approach outperforms the various deep learning models, with a weighted average precision, recall, and F1-score of 0.82, 0.81, and 0.80, respectively, for the binary classification. Similarly, for the multi-label classification, the proposed model reports weighted average precision, recall, and F1-score of 0.54, 0.79, and 0.64, respectively, which is also superior to the existing results reported in the literature. The study thus presents a meaningful contribution to the detection of violent incidents in Spanish language social media posts. © The Author(s), 2024.
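The weighted-average metrics quoted in the abstract above are commonly computed by taking per-class precision, recall, and F1 and averaging them with weights proportional to each class's number of true instances. The sketch below assumes that common definition; the label lists are invented toy data, not the benchmark dataset, and the function name is hypothetical.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return support-weighted average precision, recall, and F1."""
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    w_precision = w_recall = w_f1 = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / n
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        weight = n / total
        w_precision += weight * precision
        w_recall += weight * recall
        w_f1 += weight * f1
    return w_precision, w_recall, w_f1


# Toy labels: two classes with two true instances each.
y_true = ["violent", "violent", "none", "none"]
y_pred = ["violent", "none", "none", "none"]

p, r, f = weighted_scores(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f, 3))
```

Because the weighted F1 is an average of per-class F1 scores rather than the harmonic mean of the averaged precision and recall, it can fall below both of them, as it does in this toy example.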
TABHATE: A Target-based hate speech detection dataset in Hindi
(Springer, 2024) Deepawali Sharma; Vivek Kumar Singh; Vedika Gupta
Social media has become a platform for expressing opinions and emotions, but some people also use it to spread hate, targeting individuals, groups, communities, or countries. Therefore, there is a need to identify such content and take corrective action. During the last few years, several techniques have been developed to automatically detect and identify hate speech and offensive and abusive content on social media platforms. However, the majority of the studies have focused on hate speech detection in English language texts only. The non-availability of suitable datasets is a major reason for the lack of research work in other languages. Hindi is one such widely spoken language for which such datasets are not available. This work attempts to bridge this research gap by presenting a curated and annotated dataset for target-based hate speech (TABHATE) in the Hindi language. The suitability of the dataset is explored by applying some standard deep learning and transformer-based models for the task of hate speech detection. The experimental results obtained show that the dataset can be used for experimental work on hate speech detection in Hindi language texts. © The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2024.