A Machine Learning Approach for Moderating Toxic Hinglish Comments of YouTube Videos

Akash Singh, Kumar Vaibhav, Mamta Arora

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the increasing digitization of our world comes a new challenge: toxic comments on online platforms. Comment sections were originally intended for meaningful discussion, but they are now plagued by spam, trolls, and offensive messages. To address this problem, researchers have explored automatic detection methods that apply deep learning models and feature extraction techniques to datasets ranging from English Wikipedia to Roman Urdu social media comments. Although these approaches have achieved impressive results, they still face limitations, such as misspelled offensive words and obfuscation, and they often struggle in regions with multilingual societies. The objective of this study is to develop a moderation system that filters YouTube comments written in Hinglish, a hybrid language that combines Hindi and English. The proposed system uses the Text Vectorization technique to screen out toxic Hinglish comments, using a self-curated dataset tailored to this language. The developed system can classify toxic comments and automatically delete them from a YouTube video. This study also outlines several challenges and open problems in this area, providing insights and a roadmap for future work. Although the developed system may misclassify a few comments due to the limited size of the dataset, it has the potential to enhance the user experience for Hinglish-speaking users on YouTube.

Original language: English (US)
Title of host publication: Data Science and Applications - Proceedings of ICDSA 2023
Editors: Satyasai Jagannath Nanda, Rajendra Prasad Yadav, Amir H. Gandomi, Mukesh Saraswat
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 173-187
Number of pages: 15
ISBN (Print): 9789819978168
State: Published - 2024
Externally published: Yes
Event: 4th International Conference on Data Science and Applications, ICDSA 2023 - Jaipur, India
Duration: Jul 14 2023 to Jul 15 2023

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 820
ISSN (Print): 2367-3370
ISSN (Electronic): 2367-3389

Conference

Conference: 4th International Conference on Data Science and Applications, ICDSA 2023
Country/Territory: India
City: Jaipur
Period: 7/14/23 to 7/15/23

Keywords

  • Hinglish language
  • Natural language processing
  • Sequential model
  • YouTube

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Signal Processing
  • Computer Networks and Communications
