Browsing by Author "Deepawali Sharma"

Abusive comment detection in Tamil using deep learning
(Elsevier, 2024) Deepawali Sharma; Vedika Gupta; Vivek Kumar Singh
In recent years, online social media have expanded in volume and coverage and have become a significant source of information for different groups of people. Comments posted on social media can be emotion-laden and can therefore affect the mental health of an individual or a group of individuals. One such category of posts consists of comments that are abusive or hateful in nature. Such comments usually target particular individuals or specific communities. It is therefore important to understand them and, ideally, to detect such content in time. While methods exist for the automated detection of hate speech in English-language posts, relatively little research has been done on low-resource languages such as Tamil. This chapter presents an overview of research on detecting hate speech in low-resource languages and explores the application of various deep learning models to the task. The abusive comments, drawn from Tamil and Tamil–English code-mixed text, are classified into the following categories: Homophobia, Xenophobia, Transphobia, Misandry, Misogyny, Counter-speech, and Hope speech. Comments that are not in Tamil are categorized as “Not-Tamil.” Three deep learning models are applied to the task: a recurrent neural network (RNN), long short-term memory (LSTM), and bidirectional LSTM. Experimental results are presented along with an analysis of their quality. © 2024 Elsevier Inc. All rights reserved.

Detection of Homophobia & Transphobia in Malayalam and Tamil: Exploring Deep Learning Methods
(Springer Science and Business Media Deutschland GmbH, 2023) Deepawali Sharma; Vedika Gupta; Vivek Kumar Singh
The increase in abusive content on online social media platforms is affecting the social lives of online users. The use of offensive language and hate speech has been making social media toxic. Homophobic and transphobic comments are offensive comments directed against the LGBT+ community. It is therefore imperative to detect and handle such comments, so that they can be flagged in time or users engaging in such behaviour can be warned. However, automated detection of such content is a challenging task, even more so in Dravidian languages, which are considered low-resource languages. Motivated by this, the paper explores the applicability of different deep learning models for classifying social media comments in Malayalam and Tamil as homophobic, transphobic, or non-anti-LGBT+ content. Popular deep learning models, namely a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) with GloVe embeddings, as well as transformer-based models (Multilingual BERT and IndicBERT), are applied to the classification problem. The results show that IndicBERT outperforms the other implemented models, with weighted average F1-scores of 0.86 for Malayalam and 0.77 for Tamil. The present work thus shows the higher performance of IndicBERT on this task for the selected Dravidian languages. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Hate Speech Detection Research in South Asian Languages: A Survey of Tasks, Datasets and Methods
(Association for Computing Machinery, 2025) Deepawali Sharma; Tanusree Nath; Vedika Gupta; Vivek Kumar Singh
Over the years, social media has emerged as a powerful platform for communicating and sharing views, thoughts, and opinions. At the same time, however, it is being abused by certain individuals to spread hate against individuals, communities, religions, and so on. Such content can lead to serious problems for mental health, online well-being, and social order. It is therefore very important to have automated methods and approaches for detecting such content within the large volume of social media posts. Recently, there have been several efforts to develop computational approaches toward this end; however, most of these efforts are directed at English-language content. Only recently have studies started focusing on low-resource languages, including those of South Asia. This article presents a detailed and comprehensive survey of hate speech research in South Asian languages. It first discusses the various definitions and terms related to hate speech on different social media platforms. It then surveys in detail the different tasks in hate speech research, the available datasets, and the popular computational approaches used for South Asian languages. The major patterns identified and their practical implications are presented and discussed, along with the challenges and opportunities for further research in the area. © 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Misogynistic attitude detection in YouTube comments and replies: A high-quality dataset and algorithmic models
(Academic Press, 2025) Aakash Singh; Deepawali Sharma; Vivek Kumar Singh
Social media platforms are now not only a medium for users to express their views, feelings, emotions, and sentiments, but are also being abused to propagate unpleasant and hateful content. Consequently, research efforts have been made to develop techniques and models for automatically detecting and identifying hateful, abusive, vulgar, and offensive content on different platforms. Although significant progress has been made on this task, research on the design of methods to detect misogynistic attitudes in non-English and code-mixed languages is not well developed. One main reason for this is the non-availability of suitable datasets and resources. This paper therefore attempts to bridge this research gap by presenting a high-quality curated dataset in Hindi–English code-mixed language. The dataset includes 12,698 YouTube comments and replies, each annotated at two levels: first as optimistic or pessimistic, and then, at the second level, into different types based on the content. The inter-annotator agreement is 0.84 for the first subtask and 0.79 for the second, indicating reasonably high annotation quality. Different algorithmic models are explored for automatically detecting the misogynistic attitude expressed in the comments, with the mBERT model performing best on both subtasks (macro-average F1 scores of 0.59 and 0.52, and weighted-average F1 scores of 0.66 and 0.65, respectively). The analysis and results suggest that the dataset can be used for further research on the topic and that the developed algorithmic models can be applied to automatically detect misogynistic attitudes in social media conversations and posts. © 2024 Elsevier Ltd

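The abstract above reports both macro-average and weighted-average F1 scores, which can differ noticeably for the same model. As a hedged illustration (not the authors' evaluation code, and the abstract does not say which tooling they used), the sketch below computes both from scratch. Macro-F1 is the plain mean of the per-class F1 scores, so every class counts equally; weighted-F1 weights each class's F1 by its support (its number of true instances). The function names and the toy labels "A" and "B" are hypothetical. To our understanding, scikit-learn's `sklearn.metrics.f1_score` with `average="macro"` or `average="weighted"` uses the same definitions.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label (seen in y_true or y_pred) to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by support (true instances per class)."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)  # labels absent from y_true get weight 0
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Hypothetical toy data: 8 instances of class "A", 2 of class "B".
y_true = ["A"] * 8 + ["B"] * 2
# The model labels every "A" correctly but misses one of the two "B" instances.
y_pred = ["A"] * 8 + ["A", "B"]

print(f"macro F1:    {macro_f1(y_true, y_pred):.3f}")
print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
```

On this toy data the per-class F1 scores are 16/17 for "A" and 2/3 for "B", so the script prints a macro F1 of 0.804 and a weighted F1 of 0.886. In general, weighted F1 exceeds macro F1 when a model performs worse on the smaller classes; the abstract alone does not say which classes account for the gap in the reported scores.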
Should we stay silent on violence? An ensemble approach to detect violent incidents in Spanish social media texts
(Cambridge University Press, 2025) Deepawali Sharma; Vedika Gupta; Vivek Kumar Singh; David E. Pinto
There has been a steep rise in user-generated content on the Web and social media platforms over the last few years. While the ease of content creation allows anyone to create content, it also makes it difficult to monitor and control the spread of harmful content. Recent research in natural language processing and machine learning has shown some promise for this purpose. Approaches and methods are now being developed for automatically flagging problematic textual content, such as hate speech, cyberbullying, or fake news, though mostly for English-language texts. This paper presents an algorithmic approach based on deep learning models that detects violent incidents in Spanish-language tweets (binary classification) and further categorizes them into five classes: accident, homicide, theft, kidnapping, and none (multi-label classification). Performance is evaluated on a recently shared benchmark dataset, and the proposed approach is found to outperform various deep learning models, with weighted average precision, recall, and F1-score of 0.82, 0.81, and 0.80, respectively, for binary classification. Similarly, for multi-label classification, the proposed model achieves weighted average precision, recall, and F1-score of 0.54, 0.79, and 0.64, respectively, which also exceed the existing results reported in the literature. The study thus makes a meaningful contribution to the detection of violent incidents in Spanish-language social media posts. © The Author(s), 2024.

TABHATE: A Target-based hate speech detection dataset in Hindi
(Springer, 2024) Deepawali Sharma; Vivek Kumar Singh; Vedika Gupta
Social media has become a platform for expressing opinions and emotions, but some people also use it to spread hate targeting individuals, groups, communities, or countries. There is therefore a need to identify such content and take corrective action. Over the last few years, several techniques have been developed to automatically detect and identify hate speech, offensive, and abusive content on social media platforms. However, the majority of studies have focused on hate speech detection in English-language texts only. The non-availability of suitable datasets is a major reason for the lack of research in other languages. Hindi is one such widely spoken language for which such datasets are not available. This work attempts to bridge this research gap by presenting a curated and annotated dataset for target-based hate speech (TABHATE) in Hindi. The suitability of the dataset is explored by applying several standard deep learning and transformer-based models to the task of hate speech detection. The experimental results show that the dataset can be used for experimental work on hate speech detection in Hindi-language texts. © The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2024.