Cyberbullying Sentiment Analysis with Word2Vec and One-Against-All Support Vector Machine

  • Lionel Reinhart Halim Universitas Multimedia Nusantara
  • Alethea Suryadibrata Universitas Multimedia Nusantara

Abstract

Depression and social anxiety are the two main negative impacts of cyberbullying. Unfortunately, a survey conducted by UNICEF on 3 September 2019 showed that 1 in 3 young people in 30 countries had been victims of cyberbullying. This research applies sentiment analysis to detect comments that contain cyberbullying. The dataset is the Toxic Comment Classification Challenge dataset, obtained from the Kaggle website. Pre-processing consists of four stages: comment generalization (converting text to lowercase and removing punctuation), tokenization, stop-word removal, and lemmatization. Word2Vec is used as the word embedding for the sentiment analysis. The One-Against-All (OAA) method with a Support Vector Machine (SVM) model is then used to make multi-label predictions. The SVM model is tuned using Randomized Search CV. Evaluation uses the micro-averaged F1 score to assess prediction accuracy and Hamming loss to assess the proportion of sample-label pairs that are incorrectly classified. The Word2Vec and OAA SVM model gives its best result on data pre-processed with comment generalization, tokenization, stop-word removal, and lemmatization, and represented with 100 features in the Word2Vec model. The tuned model achieves a micro-averaged F1 score of 83.40% and a Hamming loss of 15.13%.
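As an illustration of the two evaluation measures named in the abstract, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes multi-label predictions are stored as rows of 0/1 flags, one column per label. The function names and the toy data are hypothetical and are not taken from the paper or its dataset.

```python
from typing import List

# One row per comment, one 0/1 column per label (assumed layout).
Labels = List[List[int]]


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Micro-averaged F1: pool TP, FP and FN over every (sample, label) pair."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    # With no positive labels and no positive predictions, return 0.0 by convention.
    return 2 * tp / denom if denom else 0.0


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of (sample, label) pairs whose predicted flag is wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total if total else 0.0


# Hypothetical toy data: 3 comments, 3 labels.
y_true = [[1, 0, 1], [0, 0, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]

f1 = micro_f1(y_true, y_pred)       # TP=3, FP=1, FN=1
loss = hamming_loss(y_true, y_pred)  # 2 wrong flags out of 9
print(f"micro F1 = {f1:.4f}, Hamming loss = {loss:.4f}")
```

On the toy data, 3 true positives, 1 false positive and 1 false negative give a micro-averaged F1 of 2·3 / (2·3 + 1 + 1) = 0.75, and 2 wrong flags out of 9 give a Hamming loss of about 0.2222. Because both measures count over all sample-label pairs, frequent labels weigh more heavily than rare ones.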

 

Index Terms— Sentiment Analysis; Word Embedding; Word2Vec; One-Against-All; Support Vector Machine; Toxic Comment Classification Challenge; Multi Labelling

Published
2021-06-27
How to Cite
Halim, L., & Suryadibrata, A. (2021). Cyberbullying Sentiment Analysis with Word2Vec and One-Against-All Support Vector Machine. IJNMT (International Journal of New Media Technology), 8(1), 57-64. https://doi.org/10.31937/ijnmt.v8i1.2047