Aspect-Based Sentiment Analysis on Application Review using CNN

As a mandatory application for Indonesians during the COVID-19 pandemic, PeduliLindungi should provide outstanding service quality to its users. However, as of December 2021, user sentiment toward the quality and service of the PeduliLindungi application was still low, with an application rating of 3.6 out of 5 on the Google Play Store. This study applies text mining techniques to Aspect-Based Sentiment Analysis (ABSA) of PeduliLindungi application reviews, that is, sentiment analysis based on the aspect categories of the application. The study aims to classify user sentiment on aspects of the application and to provide insight for improving its quality. The ABSA method used is the classification of aspects and sentiments with the Convolutional Neural Network (CNN) algorithm. The CNN model achieved an F1 score of 92.23% in aspect classification and 95.13% in sentiment classification. The user sentiment modelling showed that negative sentiment dominates all eight aspects of the application, namely Visual Experience, Scan - Check-in/out, Vaccine Certificate, eHac, COVID Test, Register/Login, Performance and Stability, and Privacy, Data, and Security.


I. INTRODUCTION
COVID-19 (Coronavirus Disease), first identified in Wuhan, China, in December 2019, has since spread throughout the world [1]. To identify cases and prevent the spread of the virus, many types of mobile applications have been developed. The first COVID-19 mobile application to be developed and widely published was a contact tracing application created to notify users if they had met another person infected with COVID-19 [2]. In Indonesia, the application developed to assist the government in tracking and preventing the spread of COVID-19 is PeduliLindungi [3].
First released on March 28, 2020, the PeduliLindungi application provides a tracking function that relies on community involvement in sharing location data, so that contact history with COVID-19 patients, patients under supervision, and people under supervision can be traced.
Beyond contact tracing, PeduliLindungi has continued to grow and has gained many additional features. In September 2021, in response to the policy for the Implementation of Restrictions on Community Activities, commonly known as PPKM, PeduliLindungi became a mandatory application for public access, according to the Instruction of the Ministry of Home Affairs Number 42 of 2021 [4]. This rule has led to an increase in the use of the application.
As of December 2021, PeduliLindungi is the number 1 application in Indonesia in the medical category on the Google Play Store. It has been downloaded by more than 50,000,000 people and has a rating of 3.6 out of 5 on the Google Play Store [5]. The rating is still relatively low, considering that PeduliLindungi, as a mandatory application, should provide excellent quality and service to its users.
Reviews, both good and bad, are inevitable. However, they can be used to improve the quality of the application through analysis of user reviews. By knowing the sentiment toward each aspect reviewed by users, developers can improve the quality of the relevant aspects of the application.
Sentiment analysis extracts sentiments, opinions, or judgments on products or services [6]. Most sentiment analysis is carried out at the sentence level, so it does not provide sufficiently detailed information for decision-making. However, this information can be obtained by conducting sentiment analysis at the sub-sentence level or aspect level [6].
When a reviewer reviews a product, the review relates to specific aspects of that product. It does not mean that the reviewer likes or dislikes the product as a whole, but rather certain aspects of it. This concept gave rise to Aspect-Based Sentiment Analysis (ABSA), which aims to discover people's sentiment about aspects of an entity [6]. The ABSA process is mainly done by classifying aspects and sentiments. The model classifies the text into aspect categories and then determines the sentiment [7].
The study in [8] compared several deep learning algorithms for ABSA on hotel reviews, with the aspect classes price, hotel, room, location, service, and restaurant, and the sentiment classes positive and negative. The study showed that the CNN model has an accuracy of 90.4% for sentiment classification and 87.2% for aspect classification.
The study in [9] compared the CNN model with Naïve Bayes for ABSA on online marketplace reviews, with the aspect classes accuracy, quality, service, price, packaging, and delivery, and the sentiment classes positive and negative. The CNN algorithm has a higher average accuracy of 91.98% for aspect classification and 93.07% for sentiment classification. To our knowledge, no other journal publication addresses ABSA for the PeduliLindungi application.
This research uses the CNN algorithm for the text classification task and aims to build CNN models that classify aspects and sentiments in PeduliLindungi application reviews, to measure the models' performance, and to compare the sentiment per aspect of the application in versions 4.0.2 and 4.0.5.
In contrast to previous research, this research uses PeduliLindungi application reviews as the research object. It also classifies unlabeled data with the trained CNN model to compare the sentiment toward each aspect in reviews of different application versions. Both [8] and [9] used more general aspect classes, such as quality, service, and price. This research instead follows the standard aspect categories defined by Android and adds curated aspects directly related to the functions of the application.

A. Text Mining
Text mining is the process of transforming text data from an unstructured format into a structured format to identify existing patterns [10]. The main goal is to obtain and extract useful information from the text for use in further tasks. Because the input text is unstructured, text mining requires it to be structured first. Therefore, text pre-processing must be carried out to clean the text and convert it into a structured format.
The pre-processing stages are divided as follows:

1) Case Folding
A common approach to dealing with inconsistent capitalization in text is to convert all characters to the same case, namely lowercase [11]. In addition, punctuation, numbers, extra spaces, and single characters are removed to reduce noise.

2) Tokenization
Tokenization breaks long sentence text into words, called tokens [10]. This process examines each sentence and creates a list of tokens that can be used as input for the subsequent algorithm [12]. The main objective is to examine the words in a sentence [11].

3) Normalization
This process normalizes non-standard words to the appropriate words in the KBBI (Kamus Besar Bahasa Indonesia, the official Indonesian dictionary).

4) Filtering
This process removes words that carry no information or are unnecessary (stop words). In this way, the dimensionality of the text can be reduced without reducing its content [12].

5) Stemming
This process finds the stem of each word by reducing words that have affixes, such as prefixes or suffixes, to their root words [12].
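The five stages above can be sketched as a small pipeline. This is a minimal illustration, not the implementation used in the study: the slang dictionary, stop-word set, and affix lists below are hypothetical stand-ins, and the stemmer is a naive affix stripper rather than a real Indonesian stemmer.

```python
import re

# Hypothetical resources for illustration only; a real pipeline would load a
# full slang dictionary, a full stop-word list, and a proper Indonesian stemmer.
SLANG = {"gak": "tidak", "udah": "sudah", "ngulang": "mengulang"}
STOP_WORDS = {"yang", "di", "dan", "setiap"}
PREFIXES = ("meng",)
SUFFIXES = ("nya",)


def case_fold(text):
    """Lowercase, then drop punctuation, digits, single characters, extra spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)     # punctuation and numbers
    text = re.sub(r"\b[a-z]\b", " ", text)    # single characters
    return re.sub(r"\s+", " ", text).strip()  # extra spaces


def tokenize(text):
    """Split a cleaned sentence into word tokens."""
    return text.split()


def normalize(tokens):
    """Map non-standard words to their standard form when one is known."""
    return [SLANG.get(tok, tok) for tok in tokens]


def filter_stop_words(tokens):
    """Remove words that carry no information."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def stem(tokens):
    """Naively strip one known prefix and one known suffix per token."""
    stems = []
    for tok in tokens:
        for prefix in PREFIXES:
            if tok.startswith(prefix) and len(tok) - len(prefix) >= 4:
                tok = tok[len(prefix):]
                break
        for suffix in SUFFIXES:
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 4:
                tok = tok[:-len(suffix)]
                break
        stems.append(tok)
    return stems


def preprocess(text):
    """Run the five stages in the order described in the text."""
    return stem(filter_stop_words(normalize(tokenize(case_fold(text)))))


review = "Setiap mau klaim sertifikat vaksin gak bisa, captcha ngulang terus"
print(preprocess(review))
```

Each stage is a separate function so that a stage can be swapped out (for example, replacing the naive stemmer with a dictionary-based one) without touching the others.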

B. Aspect-Based Sentiment Analysis
Aspect-Based Sentiment Analysis, or ABSA, is a type of sentiment analysis that aims to determine the sentiment toward each specified aspect [8]. ABSA processes information at the sub-sentence level or aspect level.
In several studies, the ABSA process is divided into two tasks, namely aspect extraction and polarity/rating estimation [6]. Aspect extraction aims to extract aspect words from product reviews and to group synonyms for each aspect, because different people can use different phrases to refer to the same aspect [6]. The second task, polarity estimation, aims to determine whether the sentiment toward an aspect is positive, negative, or neutral [6]. In this method, aspects are extracted first and then classified as positive or negative [13].
ABSA can also be carried out with another method, because the aspect extraction process requires a lot of resources [8]. In this method, ABSA is done by classifying aspects and sentiments. The model classifies text documents into aspect categories and sentiment tendencies [7]. For example, given the review sentence "The food price is quite high", the ABSA model classifies the sentence into the price aspect and the negative sentiment class [7]. This method requires labeled text data to train the model used in ABSA.

C. Convolutional Neural Network
Commonly used in computer vision and image processing tasks such as image classification and object detection, the Convolutional Neural Network (CNN) has also proven effective in Natural Language Processing (NLP) and has achieved good results in semantic text classification tasks [14].

ISSN 2085-4579
The following is an explanation of each layer used in a CNN for text classification:

1) Embedding layers
This layer maps the input, given as vocabulary indices, into low-dimensional vectors [16]. The vocabulary size and the maximum sentence length determine the dimensions of this layer's input. After the words are transformed into vectors, they are fed to the convolutional layer [15].

2) Convolutional layers
This layer is the main processing layer of the model; it convolves the input with filters [9]. When the input enters this layer, a convolution operation involving a filter is applied to a window of words to generate a new feature. The filter is applied repeatedly to each word window in the sentence to produce a feature map [14].

3) Pooling Layer
This layer gradually reduces the number of parameters and the computational complexity of the model, and it controls overfitting [16]. Max-over-time pooling is often applied to the feature maps to retrieve the most important feature (the feature with the highest value) from each map [14].

4) Fully-connected Layer or Dense Layer
This layer forms a one-dimensional layer of neurons, each connected to the neurons in the previous and subsequent layers [9]. In this layer, regularization can be applied with dropout, which keeps each neuron active with a probability between 0 and 1 and helps in classifying the output classes [17]. This layer also outputs the specified number of classes using softmax activation.
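To make the flow through the four layers concrete, the following sketch runs one forward pass in plain Python. It is an illustration under assumptions, not the model from this study: the embedding table, filters, and dense weights are hand-picked toy values, the ReLU activation is an assumed choice, and dropout is omitted because it is only active during training. Each feature is computed as c_i = ReLU(w · x_{i:i+h-1} + b) for a window of h words.

```python
import math


def embed(indices, table):
    """Embedding layer: look up the vector for each vocabulary index."""
    return [table[i] for i in indices]


def conv1d(vectors, filt, bias):
    """Convolutional layer: slide one filter over every window of len(filt) words."""
    h = len(filt)
    feature_map = []
    for start in range(len(vectors) - h + 1):
        window = vectors[start:start + h]
        total = bias
        for w_row, x_row in zip(filt, window):
            total += sum(w * x for w, x in zip(w_row, x_row))
        feature_map.append(max(0.0, total))  # ReLU (assumed activation)
    return feature_map


def max_over_time(feature_map):
    """Pooling layer: keep only the highest value of a feature map."""
    return max(feature_map)


def dense(features, weights, biases):
    """Fully-connected layer: one weighted sum per output class."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Turn class scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy values (hypothetical): 4 vocabulary entries, 2-dimensional embeddings,
# index 0 is the padding vector.
table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),    # (weights over a 2-word window, bias)
    ([[-1.0, 0.0], [0.0, -1.0]], 0.5),
]
dense_weights = [[1.0, 0.0], [0.0, 1.0]]
dense_biases = [0.0, 0.0]

sentence = [1, 2, 3, 0]  # a padded sequence of vocabulary indices
vectors = embed(sentence, table)
feature_maps = [conv1d(vectors, f, b) for f, b in filters]
pooled = [max_over_time(fm) for fm in feature_maps]
probs = softmax(dense(pooled, dense_weights, dense_biases))
print(feature_maps, pooled, probs)
```

Because pooling keeps one value per filter, the dense layer receives a vector whose length equals the number of filters, regardless of sentence length.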

A. Overview of Research Object
The object of this research is user reviews of the PeduliLindungi application. PeduliLindungi is an application developed to help government agencies track and prevent the spread of COVID-19 [3]. Since September 2021, PeduliLindungi has been a mandatory application for several activities and for public access.
This study uses reviews of two application versions, namely versions 4.0.2 and 4.0.5. These versions were selected because they have an adequate amount of review data for this study and fall within the data collection period.

B. Research Flow
The research flow used in this study is adapted from [7] and [8] with several adjustments. The following steps describe the flow of this research:

Functionality: e-Hac
The functions in the e-Hac menu: creating and viewing travel documents.

Positive
The reviews contain kind words, positive emotions, and support, both implicitly and explicitly.

Negative
The reviews contain bad words, negative emotions, and a lack of support, either implicitly or explicitly.

c) Data Pre-processing
At this stage, we clean the review data of versions 4.0.2 and 4.0.5 and convert it from unstructured text into a structured format.

e) Classification Modelling
Modeling is divided into two parts: aspect classification and sentiment classification. A separate CNN model is created for each, using the parameters that yield the best accuracy in hyperparameter tuning. After the model is initialized, it is trained on the training data.

f) Model Evaluation
At this stage, we evaluate the model's accuracy and loss on the validation data. We then test the model on the test data and evaluate classification performance using accuracy, precision, recall, and F1 score.
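The four metrics can all be derived from a confusion matrix. The sketch below is a minimal illustration with a hypothetical 2-class matrix; support-weighted averaging is an assumption, since the text does not state which averaging scheme is used for the overall scores.

```python
def classification_metrics(cm):
    """Compute metrics from cm, where cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total

    per_class = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))  # column sum
        actual = sum(cm[k])                          # row sum (support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append(
            {"precision": precision, "recall": recall, "f1": f1, "support": actual}
        )

    # Support-weighted averages (assumed averaging scheme).
    weighted = {
        name: sum(c[name] * c["support"] for c in per_class) / total
        for name in ("precision", "recall", "f1")
    }
    return accuracy, per_class, weighted


# Hypothetical confusion matrix, not taken from the study.
cm = [[90, 10],
      [5, 45]]
accuracy, per_class, weighted = classification_metrics(cm)
print(accuracy, weighted)
```

With support weighting, the averaged recall always equals the overall accuracy, because both reduce to the number of correct predictions divided by the number of samples.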

A. Data Collection and Selection
We collect user reviews from the Google Play Store using google_play_scraper. There are 2,320 reviews of version 4.0.2 and 1,031 reviews of version 4.0.5.
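A collection step along these lines could look like the sketch below. It is an assumption about how the scraper might be used, not the script from this study: the language and country settings, the review count, and the helper names `fetch_reviews` and `select_version` are hypothetical, and the package ID must be looked up for the real application. The scraper is imported lazily so that the selection helper can be run without network access.

```python
def fetch_reviews(app_id, count=200):
    """Fetch the newest reviews for one app.

    Requires the third-party google-play-scraper package and network access.
    """
    from google_play_scraper import Sort, reviews  # imported only when called

    result, _continuation_token = reviews(
        app_id,
        lang="id",        # assumed: Indonesian-language reviews
        country="id",     # assumed: Indonesian store
        sort=Sort.NEWEST,
        count=count,
    )
    return result


def select_version(review_list, version):
    """Keep only reviews written for the given application version."""
    return [r for r in review_list if r.get("reviewCreatedVersion") == version]


# Hand-made sample records (hypothetical) in the shape returned by the scraper.
sample = [
    {"content": "Tidak bisa login", "score": 1, "reviewCreatedVersion": "4.0.2"},
    {"content": "Captcha ngulang terus", "score": 1, "reviewCreatedVersion": "4.0.5"},
    {"content": "Bagus", "score": 5, "reviewCreatedVersion": None},
]
print(len(select_version(sample, "4.0.2")), len(select_version(sample, "4.0.5")))
```

Reviews without a recorded version are dropped by the filter, which is one possible reason the per-version counts can be smaller than the total number of scraped reviews.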

B. Data Labelling
Table III shows examples of version 4.0.2 review data with their aspect and sentiment labels.

C. Data Pre-processing
This stage cleans the unstructured text data and converts it into a structured format by filtering out unnecessary terms and normalizing the rest into a more uniform form. The result of data pre-processing is shown in Figure 4.

b. Vectorization
This stage converts the text into a list of integers in which each integer represents a unique word in the dictionary. Figure 5 shows the results of vectorization. The vocabulary size is 1,911, which indicates that there are 1,911 unique words in the data.
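Vectorization can be illustrated with a few lines of plain Python. This is a sketch under assumptions: indices are assigned in order of first appearance and start at 1 so that 0 stays free for padding, whereas library tokenizers may order words differently (for example by frequency). The sample reviews are hypothetical.

```python
def build_vocabulary(token_lists):
    """Give every unique word an integer index, starting at 1 (0 is for padding)."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def vectorize(tokens, vocab):
    """Replace each known word with its index; unknown words are skipped."""
    return [vocab[tok] for tok in tokens if tok in vocab]


# Hypothetical pre-processed reviews.
reviews_tokens = [
    ["aplikasi", "tidak", "bisa", "login"],
    ["scan", "tidak", "bisa"],
]
vocab = build_vocabulary(reviews_tokens)
sequences = [vectorize(tokens, vocab) for tokens in reviews_tokens]
print(len(vocab), sequences)
```

The vocabulary size is simply the number of keys in the dictionary, which is how a figure such as 1,911 unique words arises.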

d. Encode Label and Array Convert
We perform the label encoding at this stage to convert the string type of aspect and sentiment label to a unique integer.The aspect label is then converted into a binary matrix.
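The two conversions can be sketched as follows. This is an illustration, not the code from the study: classes are indexed in sorted order, which is an assumed convention, and the sample labels are hypothetical.

```python
def encode_labels(labels):
    """Map each distinct string label to an integer, using sorted class order."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


def to_one_hot(encoded, num_classes):
    """Convert integer labels to rows of a binary matrix."""
    return [[1 if k == i else 0 for k in range(num_classes)] for i in encoded]


# Hypothetical labels.
aspects = ["eHac", "Privacy", "eHac", "COVID Test"]
sentiments = ["negative", "positive", "negative"]

aspect_ids, aspect_classes = encode_labels(aspects)
aspect_matrix = to_one_hot(aspect_ids, len(aspect_classes))
sentiment_ids, sentiment_classes = encode_labels(sentiments)
print(aspect_classes, aspect_ids, aspect_matrix, sentiment_ids)
```

Only the aspect labels are expanded into a binary matrix here, matching the text; the two-class sentiment labels stay as single integers.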

E. Classification Modelling
1) Aspect Classification Modelling
The CNN model in this study is a sequential CNN, with layers stacked linearly from end to end. The layers and parameters used are listed in Table V. We compile the aspect model with the Adam optimizer, a learning rate of 0.0001, and categorical cross-entropy as the loss. We train the initialized aspect model on the training data and validate it on the validation data.

2) Sentiment Classification Modelling
The sentiment classification model is also a sequential CNN. The layers and parameters used are listed in Table VI. We compile the sentiment model with the Adam optimizer, a learning rate of 0.001, and binary cross-entropy as the loss.
We then train the initialized sentiment model on the training data and validate it on the validation data.
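The two loss types named above differ only in how the label is represented. The sketch below computes both for a single sample in plain Python; the probabilities are hypothetical and the small `eps` clamp is an added safeguard against log(0), not something specified in the text.

```python
import math


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for a multi-class sample: minus the log-probability of the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


def binary_cross_entropy(label, prob, eps=1e-12):
    """Loss for a two-class sample with label 0 or 1 and predicted P(label = 1)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Hypothetical predictions.
aspect_loss = categorical_cross_entropy([0, 1, 0], [0.1, 0.8, 0.1])
sentiment_loss_right = binary_cross_entropy(1, 0.8)
sentiment_loss_wrong = binary_cross_entropy(0, 0.8)
print(aspect_loss, sentiment_loss_right, sentiment_loss_wrong)
```

A confident correct prediction gives a loss near 0, while a confident wrong prediction is penalized heavily, which is what the optimizer minimizes during training.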

F. Model Evaluation
1) Evaluate and Test Aspect Classification Model
Figure 9 shows the accuracy and loss during training and validation of the CNN model for aspect classification. The model is well trained, with a training accuracy of 98.5% and a validation accuracy of 90.2%. No overfitting is indicated, as the validation loss continues to decrease in each epoch.
We then apply the model to the test data to classify the aspect of each review. Figure 10 shows the confusion matrix of the aspect classification on the test data.

Fig. 10. Aspect Classification Confusion Matrix
The CNN aspect model achieved an overall accuracy of 0.9224, precision of 0.9234, recall of 0.9224, and F1 score of 0.9223. The performance for each label is shown in Table VII. The Scan - Check-in/out aspect has the highest F1 score of 0.9489, while the Privacy, Data, and Security aspect has the lowest at 0.8710.

2) Evaluate and Test Sentiment Classification Model
The CNN model for sentiment classification is also well trained, with a training accuracy of 97.4% and a validation accuracy of 93.7%. Again, no overfitting is indicated.
We apply the sentiment model to the test data to classify the sentiment of each user review. Figure 12 shows the confusion matrix of the sentiment classification on the test data.

G. Implementation
From the model evaluation, we conclude that the CNN models for both aspect and sentiment classification perform well. We then use both models to classify the unlabeled data.

H. Interpretation
User sentiment on both versions of the PeduliLindungi application is dominated by negative sentiment. Figure 15 shows that, per aspect, almost all aspects experienced an increase in negative sentiment in version 4.0.5, except for the Visual Experience aspect. The Vaccine Certificate aspect experienced the highest increase in negative sentiment in version 4.0.5, at 17.21%. This was caused by an added CAPTCHA that did not function well when claiming a vaccine certificate, as in the review "Setiap mau klaim sertifikat vaksin gak bisa, captcha ngulang terus" (every time (I) want to claim a vaccine certificate, it fails; the captcha keeps repeating).
The COVID-19 Test aspect also experienced an increase of 13.70% because many test results were not available in the application, as in the review "Udah test PCR di RS AK Gani Palembang, tapi hasil di PeduliLindungi belum keluar padahal besok terbang" ((I) had a PCR test at AK Gani Hospital in Palembang, but the results have not appeared in PeduliLindungi, even though (I) fly tomorrow).
Furthermore, the aspect with the third-highest increase is eHac, at 10.05%, because the eHac creation flow became more complex, as in the review "Pengisian eHac yang terbaru tidak praktis." (The latest (version of) eHac filling is impractical).
From the comparison of sentiment per aspect between versions 4.0.2 and 4.0.5, it can be concluded that the application update increased negative sentiment in 7 aspects of the application. The PeduliLindungi application needs to be improved according to these causes. This analysis can also be used to prioritize aspects for corrective action in the next application update.
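A per-aspect comparison of this kind can be computed as sketched below. The counts are hypothetical, and the calculation assumes that the reported differences are percentage-point changes in the share of negative reviews; the text does not state the exact formula.

```python
def negative_share(counts):
    """Convert {aspect: (negative, positive)} counts to the % of negative reviews."""
    return {
        aspect: 100.0 * neg / (neg + pos)
        for aspect, (neg, pos) in counts.items()
    }


def share_change(old_counts, new_counts):
    """Percentage-point change in negative share for aspects present in both versions."""
    old_share = negative_share(old_counts)
    new_share = negative_share(new_counts)
    return {
        aspect: new_share[aspect] - old_share[aspect]
        for aspect in old_share
        if aspect in new_share
    }


# Hypothetical counts of (negative, positive) reviews per aspect.
v402 = {"Vaccine Certificate": (60, 40), "Visual Experience": (50, 50)}
v405 = {"Vaccine Certificate": (80, 20), "Visual Experience": (45, 55)}

change = share_change(v402, v405)
ranked = sorted(change.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

Sorting the changes in descending order gives a simple priority list of aspects for corrective action.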
The studies in [8] and [9] showed that CNN performs well for the ABSA task of classifying aspects and sentiments of user reviews. In this research, we additionally classified unlabeled data and compared the sentiment per aspect across reviews of different application versions to reveal whether an application update significantly changed user sentiment. Furthermore, the CNN model in this research reached an accuracy of 92.24% for aspect classification and 95.10% for sentiment classification, higher than the accuracies reported in [8] and [9], with no overfitting indicated. The targeted aspects also give better insight into application usage, since they are more detailed than those in the previous research.

V. CONCLUSION
This study obtained good results for Aspect-Based Sentiment Analysis (ABSA) using CNN models for aspect and sentiment classification of review data. The CNN model achieved an F1 score of 92.23% in aspect classification and 95.13% in sentiment classification.
User sentiment on the eight aspects of the application (Visual Experience, Scan - Check-in/out, Vaccine Certificate, eHac, COVID-19 Test, Register/Login, Performance and Stability, and Privacy, Data, and Security) is dominated by negative sentiment. In application version 4.0.5, negative sentiment increased in every aspect except Visual Experience.
In version 4.0.5, negative sentiment on the Vaccine Certificate aspect increased by 17.21% due to the CAPTCHA feature that did not function properly. It was followed by the COVID-19 Test aspect at 13.70%, due to the large number of test results not released in the application, and the eHac aspect at 10.05%, due to the impractical flow of eHac filling.
Since the CNN model in this study has been shown to perform well, future work can develop applications that facilitate monitoring of user sentiment on every aspect of the application reviews. Further research can also explore pre-trained word embeddings such as Word2Vec or GloVe to improve the semantic representation of words.

Fig. 2. Data Pre-processing Flow

d) Data Preparation
This stage prepares the text data so that it is acceptable as input to the neural network.

Fig. 3. Data Preparation Flow

g) Implementation
This stage classifies the unlabeled data (reviews of version 4.0.5) using the CNN models for aspect classification and sentiment classification. The output is the classified aspects and sentiments of the 4.0.5 reviews.

h) Interpretation
This stage explains the results of the aspect and sentiment classification and compares the sentiment in each aspect between version 4.0.2 and version 4.0.5. The comparison is intended to determine whether the sentiment in each aspect changed significantly with the version update.

Fig. 4. Data Pre-processing Result

D. Data Preparation
We perform four steps in this stage:

a. Splitting Data
We divide the reviews of version 4.0.2 into 70% training data, 15% validation data, and 15% test data with a random_state of 42. This yields 1,624 training samples, 348 validation samples, and 348 test samples.
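The split sizes can be checked with a short standard-library sketch. It is not the splitting routine used in the study: the seeded shuffle below does not reproduce the ordering of any particular library's random_state, and the integer items stand in for the 2,320 reviews.

```python
import random


def split_data(items, train=0.70, val=0.15, seed=42):
    """Shuffle with a fixed seed, then cut into train, validation, and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    train_part = shuffled[:n_train]
    val_part = shuffled[n_train:n_train + n_val]
    test_part = shuffled[n_train + n_val:]  # whatever remains
    return train_part, val_part, test_part


reviews_402 = list(range(2320))  # placeholders for the version 4.0.2 reviews
train_set, val_set, test_set = split_data(reviews_402)
print(len(train_set), len(val_set), len(test_set))
```

Fixing the seed makes the split reproducible, so repeated runs train and evaluate on the same partitions.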

Fig. 5. Vectorization Result of Review Data

c. Pad Sequences
We transform each review to the same length so that it can be fed into the neural network. The max sentence length parameter determines the length of each sequence, based on the longest sentence in the review data. To give all inputs the same length, we fill the empty slots in each sequence with 0.
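Padding can be sketched in a few lines. The helper name `pad_to_length` is hypothetical, and placing the zeros at the end of each sequence is an assumption, since the text does not say on which side the empty slots are filled.

```python
def pad_to_length(sequences, max_len=None, pad_value=0):
    """Pad (or cut) every sequence to max_len; default is the longest sequence."""
    if max_len is None:
        max_len = max(len(seq) for seq in sequences)
    return [
        list(seq[:max_len]) + [pad_value] * (max_len - len(seq))
        for seq in sequences
    ]


# Hypothetical vectorized reviews of different lengths.
sequences = [[5, 2, 3], [1, 2, 3, 4], [7]]
padded = pad_to_length(sequences)
print(padded)
```

Index 0 is safe to use as the filler only if the vocabulary starts numbering real words at 1.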

Fig. 9. Accuracy and Loss of CNN Model of Aspect

Fig. 11. Accuracy and Loss of CNN Model of Sentiment

Fig. 12. Confusion Matrix of Sentiment Classification

The CNN sentiment model achieved an overall accuracy of 0.9510, precision of 0.9514, recall of 0.9510, and F1 score of 0.9513. The performance for each label is shown in Table VIII.

Fig. 13. Classified Aspect and Sentiment of 4.0.5 Review

Table IX shows the distribution of the classified aspects and sentiments in the 4.0.5 application reviews.

Figure 14 compares the sentiment in each aspect in versions 4.0.2 and 4.0.5, where red represents negative sentiment and blue represents positive sentiment.

Fig. 14. Comparison of Sentiment in Each Aspect

Fig. 15. Sentiment Percentage Difference in Version 4.0.2 to 4.0.5

PeduliLindungi was updated to version 4.0.5 on November 19, 2021. Several things were updated in this version, as listed on the PeduliLindungi page on Google Play, including:
- Changes in the UI/UX.
- Added Chinese, Japanese, Russian, Korean, and Spanish language options.
- Improved flow of the E-Hac menu.
- Added a CAPTCHA in certificate claim.
- Added FAQ regarding zoning color status.
- Eradication of bugs (errors).

TABLE I.

As for the sentiment, there are two categories, positive and negative.

TABLE II. SENTIMENT CATEGORIES

TABLE III. EXAMPLE OF REVIEW DATA

The distribution of data labels, as seen in Table IV, shows that the labels are imbalanced. The aspect with the most reviews is Performance and Stability, with 408 reviews, while the aspect with the fewest is COVID-19 Test, with only 73 reviews. Most reviews are negative (1,840 reviews), while 480 are positive.

TABLE IV. ASPECT AND SENTIMENT LABEL DISTRIBUTION

TABLE V. LAYER STRUCTURES OF ASPECT CNN MODEL

TABLE VI. LAYER STRUCTURES OF SENTIMENT CNN MODEL

TABLE VII. ASPECT CLASSIFICATION REPORT

Table VIII shows that negative sentiment has the highest F1 score of 0.97, while positive sentiment has an F1 score of 0.8682.

TABLE VIII. SENTIMENT CLASSIFICATION REPORT

TABLE IX. LABEL DISTRIBUTION OF CLASSIFIED 4.0.5 REVIEWS