Deteksi Komentar Spam Bahasa Indonesia Pada Instagram Menggunakan Naive Bayes (Detecting Indonesian-Language Spam Comments on Instagram Using Naive Bayes)
Abstract
Instagram is one of the most popular web and mobile applications for sharing pictures and videos. Users publish picture posts on which their followers can comment. Indonesian public figures such as actors, actresses, and musicians use Instagram to promote their activities to their followers. Unfortunately, many Instagram comments are spam that needs to be detected and removed. This research collects Instagram comments from Indonesian public figures who have more than one million followers and builds a dataset from them. After preprocessing (tokenization, stop word removal, and stemming) and TF-IDF weighting, the Naive Bayes method, a supervised learning algorithm, is used to detect spam comments written in Indonesian. Naive Bayes achieves an accuracy of 74.31% on unbalanced datasets and 77.25% on balanced datasets. These results show that Naive Bayes can be used to build an automatic detector of Indonesian spam comments on Instagram with a high accuracy rate. The novelty of this research is the demonstration that Naive Bayes can detect spam comments in our Indonesian Instagram comments dataset.
Index Terms—Instagram, Naive Bayes, Indonesian spam comments, spam comments detection.
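The pipeline named in the abstract (tokenization, stop word removal, stemming, TF-IDF weighting, Naive Bayes) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The stop word list, the suffix-stripping "stemmer", the smoothed IDF formula, the use of TF-IDF weights as fractional counts in a multinomial Naive Bayes, the names `preprocess` and `TfidfNaiveBayes`, and the four training comments are all assumptions made up for illustration. The abstract does not state which stemmer or Naive Bayes variant was used.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, tiny stop word list; a real system would use a full Indonesian list.
STOP_WORDS = {"yang", "di", "dan", "ini", "itu", "ke", "ya"}
# Hypothetical suffix stripping, standing in for a real Indonesian stemmer.
SUFFIXES = ("nya", "kan", "lah")


def preprocess(text):
    """Tokenize, drop stop words, and apply a crude suffix-stripping stem."""
    stems = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return stems


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights with add-one smoothing."""

    def fit(self, docs, labels):
        token_lists = [preprocess(d) for d in docs]
        self.n_docs = len(token_lists)
        doc_freq = Counter(t for toks in token_lists for t in set(toks))
        # Smoothed IDF (an assumed variant) so every known term has a positive weight.
        self.idf = {t: math.log((1 + self.n_docs) / (1 + df)) + 1
                    for t, df in doc_freq.items()}
        self.class_counts = Counter(labels)
        self.weights = defaultdict(Counter)  # label -> term -> summed TF-IDF
        for toks, label in zip(token_lists, labels):
            for term, tf in Counter(toks).items():
                self.weights[label][term] += tf * self.idf[term]
        return self

    def predict(self, doc):
        # Terms never seen in training are ignored.
        counts = Counter(t for t in preprocess(doc) if t in self.idf)
        vocab_size = len(self.idf)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.weights[label].values())
            score = math.log(n / self.n_docs)  # log prior
            for term, tf in counts.items():
                likelihood = (self.weights[label][term] + 1) / (total + vocab_size)
                score += tf * self.idf[term] * math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up comments for illustration only; not taken from the paper's dataset.
train_docs = [
    "Promo pemutih wajah murah, cek bio kami",
    "Jual followers murah, order sekarang",
    "Cantik banget kak, sukses selalu",
    "Filmnya bagus, ditunggu karya berikutnya",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = TfidfNaiveBayes().fit(train_docs, train_labels)
print(model.predict("followers murah promo"))
print(model.predict("sukses selalu kak"))
```

On this toy data the first comment is scored as spam and the second as not spam, because each shares terms with only one class. A realistic experiment would need a full stop word list, a proper Indonesian stemmer, and a held-out test set to measure accuracy.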
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike International License (CC-BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Copyright without Restrictions
The journal allows the author(s) to hold the copyright without restrictions and will retain publishing rights without restrictions.
Submitted papers are assumed to contain no proprietary material unprotected by patent or patent application; responsibility for technical content and for protection of proprietary material rests solely with the author(s) and their organizations and is not the responsibility of ULTIMATICS or its Editorial Staff. The main (first/corresponding) author is responsible for ensuring that the article has been seen and approved by all the other authors. It is the responsibility of the author to obtain all necessary copyright release permissions for the use of any copyrighted materials in the manuscript prior to submission.