K-Nearest Neighbors Algorithm to Student Opinion of the Online Learning Method at Wira Wacana Sumba Christian University

The education sector is one of the areas most affected by the Covid-19 pandemic. As a result, teaching and learning must be carried out from home using online learning methods. This approach has drawn a variety of responses from students, which motivates this analysis of their views, both positive and negative. The analysis applies sentiment analysis (opinion mining) to comments collected from Facebook; the text is preprocessed and each comment is labeled as positive or negative. The labeled data are then classified using the K-Nearest Neighbors (K-NN) algorithm. RapidMiner is used to run the K-NN experiments on the text data in order to obtain accuracy, precision, and recall values. The results show an accuracy of 87.00% and an AUC of 0.916. These values are high enough that the classification of student opinion in this research is categorized as Excellent Classification.


I. INTRODUCTION
The outbreak of Coronavirus Disease 2019, also known as Covid-19, has had a huge impact on the teaching and learning process at all levels of education in all parts of the world. In Indonesia specifically, teaching and learning is carried out from home. This is based on government regulations and recommendations issued through the Indonesian Minister of Education and Culture (Mendikbud) for all students to study from home (BDR); even the 2020 national exam was cancelled. The study-from-home policy is intended to limit and reduce physical contact as an effort to prevent transmission of the virus [1].
Wira Wacana Sumba Christian University (Unkriswina) is one of the universities affected by the Covid-19 outbreak, so all student activities have been moved home and teaching and learning is carried out from home using an online learning model. In March 2020, following the government's recommendation on social distancing during the Covid-19 pandemic, Unkriswina issued Chancellor's Circular Number 019/EDR-R/2020, which explains that measures to prevent the spread of Covid-19 in the campus environment are necessary and that lectures are therefore carried out online (Unkriswina, 2020). Under this system, every learning activity is carried out virtually through media available on the internet. This has an impact on students, whether supportive or harmful. The positive impacts are safety from the Covid-19 outbreak, more practical and flexible learning, savings in time and energy, and a more personal learning approach (Pakpahan, R., & Fitriani, 2020). However, several problems arise from learning at home through online media. One example is unstable and inadequate network access, given that Indonesia is an archipelagic country whose internet infrastructure is unevenly developed. School and university students from low-income backgrounds also face obstacles, since accessing the internet costs money. The effectiveness of online learning further depends on the maturity and readiness of the school, in this case the teachers. Many teachers are not able to convey material effectively through online learning systems [2].
Based on these impacts, this study focuses on students and uses sentiment analysis, a process for finding out whether a person's view or opinion of an event is positive or negative. Students' opinions and views can be in written or oral form. K-Nearest Neighbors is the algorithm used to classify the opinions resulting from the sentiment analysis. The advantage of the K-Nearest Neighbors algorithm is its high accuracy, which has been proven and applied in applications [3]. The aim is to classify student opinions as positive or negative and to calculate the accuracy value using the K-NN algorithm. The results can be used as evaluation material regarding online learning models during the pandemic.
ISSN 2085-4579
Previous research related to sentiment classification includes the classification of Wikipedia articles by Hardiyanto and Rahutomo (2016). That work classifies text articles on the Indonesian Wikipedia website by applying a text preprocessing model followed by TF-IDF weighting. Based on this weighting, the Indonesian Wikipedia articles are classified using the K-Nearest Neighbors algorithm. The manual test results show an accuracy of 60% [4]. In contrast to the present research, that classification process was carried out manually, without the help of classification tools.
Siti Ernawati and Risa Wati (2018) applied the K-Nearest Neighbors algorithm to the sentiment analysis of travel agent reviews, processing 100 positive and negative reviews with the K-Nearest Neighbors (K-NN) algorithm. Their experiments show that the K-NN algorithm achieves high accuracy, with a best accuracy value of 87.00% and an AUC of 0.916 [5]. A shortcoming of that research is that the K-NN algorithm is applied only theoretically: the equations for calculating distance or proximity between data, which form part of the classification process, are not applied.
Research on the use of online learning models by Toni Limbong (2020) shows the following. To support the Government's decision regarding the spread of the Corona Virus, the Catholic University of Santo Thomas Medan applied an online learning model. The researcher applied the Multi Attribute Utility Theory method to a case study of the effectiveness of online learning using the Zoom and Edmodo applications at the Faculty of Computer Science, Santo Thomas Catholic University, Medan. The results give Theory courses the highest assessment (0.88), followed by Theory and Practicum courses (0.70), Practicum courses (0.42), and Field courses (0.20). The researcher's conclusion, intended as a reference for the university leadership should it adopt an online exam policy, is that the questions and the exam should be theoretical in nature, such as multiple choice, essay, and analysis [6]. That study uses pseudocode and a combination of opinions, so the conclusions reflect only the views of the researchers.
The present research applies sentiment analysis and the K-Nearest Neighbors algorithm to classify Unkriswina students' views on the use of online learning during the lockdown period caused by the Corona Virus outbreak. Pseudocode or algorithms for calculating distance and proximity are applied as part of the classification process. The main goal is to find out the views of students while they take part in online learning. Because these views vary, they need to be classified and separated into positive and negative opinions.

A. Sentiment Analysis
Sentiment analysis, also called opinion mining, is the computational study of finding and identifying the opinions, attitudes, emotions, sentiments, evaluations, subjectivity, and judgments contained in a text. Sentiment analysis is intended to find the percentage of positively labeled and negatively labeled sentiment towards a person, an object, or a certain condition. Three sentiment values are generally used: positive, negative, and neutral [7].
The steps of sentiment classification on text-mining data are as follows [8]:
1) Initial stage: collecting datasets such as public opinion, ratings of restaurants or products, and others.
2) Preprocessing: the text-mining stage that converts raw data into useful information, including Tokenization, Stopword Removal, and Stemming.
3) Transformation: weighting the text data.
4) Feature Selection: limiting and reducing data that is not needed.
5) Classification: applying a classifier such as Naive Bayes, K-Nearest Neighbors, Support Vector Machine, and others.
6) Interpretation/Evaluation: calculating the accuracy value and the Area Under the Curve value.

B. K-Nearest Neighbors
In the book Data Mining Algorithm, Kusrini explains that the K-Nearest Neighbors algorithm is an approach to finding cases by calculating the proximity between a new case and old cases through a weight matching process over a number of available features [9]. In another view, K-Nearest Neighbors is an algorithm for classifying objects based on the data that is closest to the object. Data is represented in a multi-dimensional space, where each dimension reflects a feature of the object. The best value of k for this algorithm depends on the data; in general, a higher k reduces the effect of noise on the classification [10].
The main purpose of this algorithm is to classify an object based on its attributes and the training samples. The K-NN model applies a classification that uses the proximity of the points of existing objects as the approximate value of the new sample [11].
The method used is to observe the discussions and comments in a Facebook group, which are then used as the dataset for the research. The stages of the research are as follows:

C. Collection of Data Sets
The first stage is collecting data by creating a Facebook group consisting of active Unkriswina students and providing space to answer questions related to online learning in the midst of the Covid-19 pandemic. The data in question are the opinions or views of Wira Wacana Sumba Christian University students, who are the target of the study. The data obtained is still a raw collection of opinions, so it must first be processed into a dataset.

D. Initial Data Processing
The training sample consists of 200 data items. The initial data processing applies the following preprocessing steps:

i. Case Folding: the process of converting all letters of the text data to lowercase.
ii. Tokenization: the stage of separating words, phrases, punctuation marks, or symbols.
iii. Stopword Removal: the process of eliminating words on the stopword list, which is a list of connecting words between sentences.
iv. Stemming: the process of changing tokens that have affixes into their base words [12].
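The four preprocessing steps above can be illustrated with a minimal Python sketch. This is not the RapidMiner process used in the study. The stopword list, the suffix list, and the function names are hypothetical and chosen only for illustration, the tokenizer simply keeps alphanumeric runs, and the example uses English text, whereas a real pipeline would use a stopword list and stemmer suited to the language of the comments.

```python
import re

# Hypothetical, deliberately tiny stopword list (assumption for illustration).
STOPWORDS = {"the", "is", "are", "a", "an", "and", "to", "of", "very"}

# Hypothetical suffixes for a toy stemmer; this is not a real stemming algorithm.
SUFFIXES = ("ing", "s")


def case_fold(text):
    """Case folding: convert all letters to lowercase."""
    return text.lower()


def tokenize(text):
    """Tokenization: keep runs of letters and digits, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text)


def remove_stopwords(tokens):
    """Stopword removal: drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Toy stemming: strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run the four steps in order and return the list of base-word tokens."""
    tokens = tokenize(case_fold(text))
    return [stem(t) for t in remove_stopwords(tokens)]


print(preprocess("Online lectures are VERY inconvenient!"))
# ['online', 'lecture', 'inconvenient']
```

Keeping each step as a separate function mirrors the order of the stages listed above and makes it easy to swap in a proper stopword list or stemmer.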

E. Modeling
The experiments in processing text data in this study use RapidMiner 8.2. The training data are opinions of Unkriswina Sumba students obtained from Facebook and grouped into two classes: positive opinions and negative opinions. In general, similarity takes a value between 0 and 1: a value of 0 means that the two objects are not similar at all, while a value of 1 means that the two objects are identical [13].
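As an illustration of a similarity value that ranges from 0 to 1, the sketch below computes cosine similarity between the term counts of two tokenized documents. The text above does not name the similarity measure used, so cosine similarity is an assumption here, and the function name and sample tokens are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two token lists, based on raw term counts.

    For non-negative counts the result lies between 0 (no shared terms)
    and 1 (identical term distribution).
    """
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both documents contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical token lists for illustration only.
doc_1 = ["online", "network", "unstable"]
doc_2 = ["online", "lecture", "network"]
doc_3 = ["cellphone", "laptop"]

print(cosine_similarity(doc_1, doc_2))  # shares two of three terms
print(cosine_similarity(doc_1, doc_3))  # shares no terms
```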

G. Validation and Evaluation
The validation stage applies 10-fold cross-validation. The validation process has two sub-processes: training and testing. The training sub-process builds the model in RapidMiner, which is then tested in the testing sub-process. The results of the K-NN classification are evaluated using a confusion matrix.
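The validation and evaluation steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the RapidMiner operators used in the study: the fold assignment is a plain round-robin split without shuffling or stratification, and all function names and sample labels are hypothetical.

```python
def k_fold_indices(n_samples, n_folds=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Samples are assigned to folds round-robin; each fold is used once for testing.
    """
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def confusion_matrix(actual, predicted, positive="positive"):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) computed from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical labels for illustration only.
actual = ["positive", "negative", "negative", "positive", "negative"]
predicted = ["positive", "negative", "positive", "negative", "negative"]
print(metrics(*confusion_matrix(actual, predicted)))  # (0.6, 0.5, 0.5)
```

With 200 samples and 10 folds, each iteration of `k_fold_indices` trains on 180 samples and tests on the remaining 20.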

III. RESULT AND DISCUSSION
The data were obtained from comments in a Facebook group discussion (https://www.Facebook.com/groups/465615134612746) with more than 200 respondents and 291 comments. The comments were then labeled positive or negative, yielding 200 labeled comments that were used as the research dataset. RapidMiner version 8.2 is used to obtain a model that suits the research needs.

A. Opinion Document Collection
The opinion documents in Table 1 are processed with the preprocessing approach and then classified, with priority given to the data normalization stage.

No.  Opinion
     Online learning is not at all good, I mostly don't understand
3    BASICALLY if this online learning model is implemented seriously by the lecturer and we are STUDENTS… I think everything will be fine and we also get SCIENCE with EFFECTIVE !!!
4    Online lectures are very inconvenient, costs a lot to buy packages, the network is so unstable here
5    I don't have a cellphone let alone a laptop, online lectures for me have to find more money to buy a cellphone
6    More of us are required to learn on our own actually
7    The lectures are ok but I don't know the lecturers and friends in class, I can only see they have photos
8    Online lectures but if there is no internet is the same as lying, it is difficult.
9    Most of the lecturers teach not clear, suddenly give assignments. Few materials are taught, a myriad of tasks are given
10   I don't concentrate when studying online, not to mention if the network has been disrupted, it's already bad

Table 3 shows the initial opinion documents in the training data before preprocessing. The following is the preprocessing stage, using case folding, tokenizing, and stopword removal, applied to the opinion documents in Table 3.

B. Pre-processing Comment Data
Before the dataset is classified using the K-Nearest Neighbors method, preprocessing is carried out as an initial stage, with the following results:

No.  Preprocessed opinion
3    the basic online learning model is applied seriously by lecturers and students, all will be good and get effective knowledge
4    online college is a hassle to buy an unstable network package
5    I don't have a laptop, I have to find money to buy a cellphone
6    we need to learn on our own
7    college is ok, don't know the lecturers, friends, see their photos
8    online college no internet is the same as a lie
9    The teaching lecturer is not clear, suddenly gives material assignments, a little teaching warehouse assignments
10   lack of concentration online lectures, network, severe interference

Next, the term frequency is determined for the training data resulting from preprocessing, as shown in Table 5. The result of the term frequency calculation is a set of word tokens, which are then passed to the classification stage. First, however, each opinion is given a class label so that the tools used can identify the class of each student opinion document. Table 6 shows the labeling of the student opinion training data, which is then tested with the K-Nearest Neighbors algorithm.
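The term-frequency step described above can be sketched in Python. This is a minimal illustration rather than the RapidMiner process used in the study; the function name and the two tokenized documents are hypothetical and only loosely modeled on the preprocessed opinions.

```python
from collections import Counter


def term_frequencies(documents):
    """Build a document-term matrix of raw term counts.

    `documents` is a list of token lists. The function returns the sorted
    vocabulary and one row of counts per document, in vocabulary order.
    """
    vocabulary = sorted({token for doc in documents for token in doc})
    rows = []
    for doc in documents:
        counts = Counter(doc)
        rows.append([counts[token] for token in vocabulary])
    return vocabulary, rows


# Hypothetical tokenized documents for illustration only.
docs = [
    ["online", "college", "no", "internet"],
    ["network", "unstable", "online", "online"],
]
vocabulary, matrix = term_frequencies(docs)
print(vocabulary)  # ['college', 'internet', 'network', 'no', 'online', 'unstable']
print(matrix)      # [[1, 1, 0, 1, 1, 0], [0, 0, 1, 0, 2, 1]]
```

Each row of the matrix can then be paired with its class label (positive or negative) to form the training data for the classifier.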

C. Classification Using the K-Nearest Neighbors Method
Before the classification is carried out, the distance between documents is calculated using the existing equations. The comparison is made between sample data and test data. As an example, the sample data is id_document = 1 and the test data is id_document = x. Applying the formula gives:

$$d_1 = \sqrt{(1)^2 + (-0.07175)^2 + (0.5)^2 + (1)^2 + (0.2222222)^2 + (1)^2} = 1.83494708 \qquad (3)$$

The distances to the test data shown in Table 7 can then be sorted from nearest to farthest. The following are the stages of data processing in RapidMiner, starting from the results of the preprocessing stage.

Table 9 shows the result of the K-Nearest Neighbors sentiment training process, with an accuracy value of 13% for positive sentiment and a much larger accuracy value of 87% for negative sentiment, out of a total of 100%. The precision is 1 for the positive sentiment class and 1 for the negative sentiment class, so the precision results are very accurate. The recall results are the same as the precision results, with a value of 1 for both the positive and negative sentiment classes. This means that the sentiment results on the training data are correct for all sentiment classes.
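The worked calculation above has the form of a Euclidean distance (the square root of a sum of squared differences). Assuming that this is the distance measure intended, the sketch below shows how the distances to all training documents can be sorted and the k nearest labels combined by majority vote. The function names, the feature vectors, and the choice k = 3 are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def euclidean_distance(x, y):
    """Square root of the sum of squared differences between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_vectors, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    # Sort training documents from nearest to farthest.
    distances = sorted(
        (euclidean_distance(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical term-count vectors and labels for illustration only.
train_vectors = [[1, 0, 1], [1, 1, 0], [0, 0, 2], [0, 1, 2]]
train_labels = ["negative", "negative", "positive", "positive"]

print(knn_predict(train_vectors, train_labels, query=[1, 0, 0], k=3))  # negative
```

In this toy example the two nearest training vectors are both at distance 1 and both labeled negative, so the vote among the three nearest neighbors is negative.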

D. Change in k Value
The following experiment changes the value of k to determine the accuracy, precision, recall, and AUC values.

Based on the accuracy values, an AUC graph can be drawn. Figure 5 shows the AUC graph for the K-Nearest Neighbors method, with an Area Under the Curve (AUC) of 0.916. From these values, the classification in this study falls into the Excellent Classification category according to the following guide to AUC values:

i. 0.90-1.00: Excellent Classification

This research classified comment data from Facebook, taking into account the views of Wira Wacana Sumba Christian University students on the use of online learning models during the Covid-19 pandemic, using the K-Nearest Neighbors method. A dataset of 26 positive reviews and 174 negative reviews was classified, giving an accuracy of 87.00% and an AUC of 0.916. These results place the classification in the Excellent Classification group. The study thus obtained a very good accuracy value and a fairly large AUC value.

In relation to the online learning process, the percentage of negative opinions, 87.00%, indicates that most of the student population considers the online learning implemented so far to be ineffective. This percentage of negative opinions is supported by an accuracy value of 87% from the K-Nearest Neighbors algorithm, which means that the percentage has a very high validity.
It can therefore be concluded that the online learning model at Unkriswina Sumba needs to be evaluated, including the competence of lecturers in teaching online, the handling of uneven internet access for students, and solutions for students who are economically constrained and unable to study online.