Analysis Of UMN Student Graduation Timeliness Using Supervised Learning Method

Abstract: Education is one of the most important things in human life. However, many students still do not graduate on time. The purpose of this study is to identify the factors that influence graduation timeliness, and to analyze and visualize the data on whether UMN graduates from 2018 to 2020 graduated on time or not. The approach comprises data collection, definition of the independent and dependent variables, and the CRISP-DM framework. SQLYog is used to store the data and RapidMiner to clean it; prediction accuracy is then calculated in RapidMiner using the Naïve Bayes and logistic regression algorithms with 10-fold cross-validation, and the data are visualized with Tableau. The results show that the project can handle simple data storage in Pentaho MySQL. For data mining, the suggestion is to use the model with the greatest accuracy in each semester for the Information Systems study program: the IPS - Cross Validation Logistic Regression model for Semester 1, and the GPA-NaiveBayes (Normal) or GPA-NaiveBayes (Training With CrossValidation) model for Semesters 2 to 7. For data visualization, several insights are discussed further in this thesis.


I. INTRODUCTION
Education is one of the most important things in human life. Education means the knowledge and skills of a group of people that are passed on from generation to generation through teaching or research. It is important for generational transfer because the future is determined by the new generation, which is replacing many of today's workers [1]. Higher education institutions in Indonesia can take the form of an institute, polytechnic, academy, university, or college (sekolah tinggi). They can organize academic, vocational, and professional education through Diploma, Bachelor, Master, Doctoral, and Specialist programs [2].
Multimedia Nusantara University (UMN) was established in 2005. In 2020 it had 4 faculties with 12 study programs at the undergraduate level (S1) and 1 study program, Hospitality, at the Diploma level (D3). UMN is located in Kelapa Dua, Summarecon Serpong, Tangerang Regency. Students in the Bachelor (S1) program must complete a minimum of 144 semester credit units (credits), and the maximum length of study is 7 years; the normal length of study according to the curriculum is 8 semesters, or 4 years. However, many students take longer than this standard to complete their studies and therefore fall into the category of not graduating on time. In education, especially in Indonesia, quality must be improved so that graduates can be useful in the world of work and in service to the country [3]. Data obtained from the Academic Information Bureau (BIA) of Multimedia Nusantara University give the number and percentage of graduates per study program who did not graduate on time from 2018 to 2020; over these 3 years, the study programs whose percentage increased were Accounting, Film and Television, Communication Studies, and Information Systems [4]. However, a search of the UMN Knowledge Center website found a lack of research analyzing the graduation category of UMN graduates [4]. From this background, the problem formulation emerges: what factors affect the punctuality of graduation for students at Multimedia Nusantara University in the 2018-2020 graduation years?

A. Object of research
The object of research in this thesis proposal is UMN students. Many UMN students want to graduate quickly or want to improve their grades in the intermediate semester. The Bachelor program is an academic education level with a study load of 144 to 160 semester credit units (credits), a curriculum of 8 semesters, and a program length of 7 to 14 semesters [5]. The Diploma Three Program (D-3) is an academic education level with a study load of 108 to 120 semester credit units (credits), a curriculum of 6 semesters, and a program duration of 6 to 10 semesters [6].
ISSN 2354-0082
The research method is quantitative because it measures a data problem through numbers, with the outcome described in words as graduating on time or not. The data can be converted into statistical form and taken into account when forming a solution. The method does not rely on questionnaires, surveys, polls, or interviews, and the number of participants in quantitative methods tends to be larger than in qualitative methods [7].

1) Business Understanding
Business Understanding in this thesis is an analysis of UMN student graduation per study program and per semester. The rate of not graduating on time was still large and increasing in the 2018-2020 period at UMN; chapter 4 discusses graduation as defined in the UMN academic guide. A student passes after completing all course credits in their study program, including the internship and thesis, and taking the IELTS English exam. A Bachelor (S1) student graduates on time if they finish in 3.5 or 4 years; finishing in more than 4 years counts as not on time, as does dropping out [5].
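The on-time rule described above can be sketched as a small labeling function. This is a minimal sketch under stated assumptions: the function name, the label strings, and the idea of counting study length in semesters (8 semesters for 4 years) are hypothetical and are not taken from the UMN academic guide.

```python
def graduation_category(semesters_taken, dropped_out=False, on_time_limit=8):
    """Label one student as "On Time" or "Not On Time".

    Assumption: 3.5 and 4 years correspond to 7 and 8 semesters, so any
    study length up to on_time_limit semesters is treated as on time.
    Students who drop out are counted as not on time.
    """
    if dropped_out:
        return "Not On Time"
    if semesters_taken <= on_time_limit:
        return "On Time"
    return "Not On Time"


# Hypothetical examples, not rows from the BIA dataset.
examples = [graduation_category(7), graduation_category(8), graduation_category(9)]
```

A label produced this way plays the role of the dependent variable, Category Of Graduates, in the later modeling steps.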

2) Data Understanding
The data used are records of undergraduate (S1) students who graduated from Multimedia Nusantara University from 2018 to 2020. The data were obtained from the Academic Information Bureau (BIA) of Multimedia Nusantara University and comprise 3625 rows. The dependent variable is the variable that is influenced by the independent variables; in this dataset it is Category Of Graduates. Independent variables are variables that can affect other variables. In the dataset, the independent variables are as follows:

3) Data Preparation
The data must be clean and free from missing values. Rows with a missing label or missing attributes are therefore eliminated and duplicates are removed so that the resulting data are more valid, and the data are connected to Pentaho MySQL. This project stores the data in Pentaho MySQL, using SQLYog to create tables, with a data type and length for each column, and to input data into them. The tables are built from a CSV file with a total of 3625 rows. The data are input into Pentaho MySQL per study program so that they are stored optimally: the thesis database contains the data_graduate_2018 to 2020 table and, for smaller parts of the data, 13 study program tables, namely Accounting, Architecture, DKV, FTV, Hospitality, Communication Studies, Journalism, Management, SI, SK, TE, TF, and IT. Fig. 4 shows the data cleaning applied to all tables: connect to MySQL, select attributes, set the label (the graduate category), filter out rows with missing labels or missing attributes so that the data are clean, and remove duplicates.
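The cleaning steps above (no missing label, no missing attributes, remove duplicates) were carried out in RapidMiner. As an illustration only, the same logic can be sketched in plain Python. The column names and sample rows below are hypothetical and do not come from the BIA dataset.

```python
def is_missing(value):
    """Treat None and blank strings as missing values."""
    return value is None or str(value).strip() == ""


def clean_rows(rows, label="graduate_category"):
    """Drop rows with a missing label or attribute, then drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Filter 1: the label (graduate category) must be present.
        if is_missing(row.get(label)):
            continue
        # Filter 2: every other attribute must be present.
        if any(is_missing(v) for k, v in row.items() if k != label):
            continue
        # Filter 3: keep only the first copy of identical rows.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Hypothetical sample rows.
sample = [
    {"student": "1", "ips_1": "3.2", "graduate_category": "On Time"},
    {"student": "1", "ips_1": "3.2", "graduate_category": "On Time"},
    {"student": "2", "ips_1": "", "graduate_category": "Not On Time"},
    {"student": "3", "ips_1": "2.9", "graduate_category": None},
    {"student": "4", "ips_1": "3.5", "graduate_category": "Not On Time"},
]
cleaned = clean_rows(sample)
```

In this sample, the duplicate of student 1, the row with a blank attribute, and the row without a label are removed, leaving two rows.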

4) Modeling
Two popular algorithms have been applied to this topic. The first is Naïve Bayes, with reported accuracies of 96.67% [9] and 80% [10]. The second is logistic regression, with a reported accuracy of 90.2% [11]. Naïve Bayes was chosen because, unlike neural networks, it does not require all predictor variables to be numerical; it can be used for quantitative and qualitative data; it does not require a large amount of data; its code is simple when implemented in a programming language; it can be used for binary or multiclass classification, whereas logistic regression requires a binary (yes or no) dependent variable; and the more variables are used, the more precise it becomes. Logistic regression was chosen because its independent variables (attributes) do not all have to be numeric to predict the dependent variable, it uses the logistic (log-odds) function, and it suits a dependent variable with two choices, such as true or false.
The modeling task is classification, using the Naïve Bayes and logistic regression data mining algorithms in RapidMiner. This study combines Naïve Bayes and logistic regression with the normal model, cross-validation, and reduction of the IPS attribute per semester, with 70% training data and 30% testing data. After a conclusion is reached, data visualizations are built using Tableau. Table I explains the selection of data mining software for this research based on the advantages and disadvantages of each option. RapidMiner was chosen because it is open source, can use 10,000 data rows, and is simpler to use, since the data mining and data cleansing processes are done by drag and drop. Table II explains the selection of data visualization software based on the advantages and disadvantages of each option. Tableau was chosen because it offers interactive visual options and moving graphics, is user friendly, does not need much hard coding, provides mobile-friendly dashboards, can process data on mobile devices, can connect to many types of databases, and has story and dashboard features.
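To illustrate why Naïve Bayes handles qualitative attributes directly, the following is a minimal sketch of a categorical Naïve Bayes classifier with add-one (Laplace) smoothing. It is not the RapidMiner operator used in this study; the class name, attribute values, and toy rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes with add-one smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # value_counts[(attribute index, class)] counts each attribute value.
        self.value_counts = defaultdict(Counter)
        # values[attribute index] is the set of values seen in training.
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total)
            for i, value in enumerate(row):
                seen = self.value_counts[(i, label)][value]
                score += math.log((seen + 1) / (count + len(self.values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: (semester index band, school origin).
rows = [("high", "public"), ("high", "private"), ("low", "public"),
        ("low", "private"), ("high", "public"), ("low", "public")]
labels = ["On Time", "On Time", "Not On Time",
          "Not On Time", "On Time", "Not On Time"]
model = CategoricalNaiveBayes().fit(rows, labels)
```

With this toy data, `model.predict(("high", "private"))` returns "On Time" and `model.predict(("low", "public"))` returns "Not On Time". No numeric encoding of the attributes is needed, which is the property cited as a reason for choosing the algorithm.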

5) Evaluation
The evaluation in this study compares the accuracy of the Naïve Bayes and logistic regression models under the normal model, cross-validation, and reduction of the IPS (semester grade point index) attribute per semester, with 70% training data and 30% testing data.
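The evaluation settings named above, a 70%/30% split, 10-fold cross-validation, and accuracy as the comparison metric, can be sketched as follows. This is an illustrative sketch, not the RapidMiner process used in the study; the function names, the shuffling seed, and the way rows are assigned to folds are assumptions.

```python
import random


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n_rows, k=10):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_rows))
    for fold in range(k):
        # Every k-th row, starting at the fold number, is held out for testing.
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


# 3625 is the number of rows reported for the dataset.
train_rows, test_rows = train_test_split(list(range(3625)))
folds = list(k_fold_indices(3625, k=10))
```

In cross-validation, each row is used for testing exactly once, and the accuracy reported for a model is the average of the accuracies over the folds.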

6) Deployment
In this study, the model was used only for learning. The deployment stage was not carried out because no system model was deployed to UMN students.

III. RESULT AND DISCUSSIONS
Fig. 9 shows the dashboard of UMN undergraduate (S1) graduates over a 3-year period, with the percentage, year, number, and category of graduates. The dashboard shows that the percentage of UMN graduates who did not graduate on time increased over the 3 years. Blue indicates the number of students who graduated on time, and orange the number who did not. The data were obtained from the Academic Information Bureau (BIA) of Universitas Multimedia Nusantara, with the approval of the head of the Information Systems study program, Mrs. Ririn Ikana Desanti. Fig. 10 shows another problem addressed in this research: the average GPA at graduation for undergraduate (S1) students at UMN decreased every year in the 2018-2020 period. These data were also obtained from the BIA with the approval of the head of the Information Systems study program.
Table III compares the output accuracy of the winning model in each semester for each study program that experienced an increase in untimely graduation over the 3 years from 2018 to 2020. In the data mining for this thesis, the most influential variables with the greatest accuracy are class, school origin, IPS Semester 1, IPS Semester 2, GPA Semester 2, IPS Semester 1, IPS Semester 3, IPS Semester 4, IPS Semester 2, IPS Semester 5, IPS Semester 6, IPS Semester 3, IPS Semester 7.

A. Evaluation
In conclusion, the suggestion is to use the model with the greatest accuracy in each semester for the Information Systems study program: the IPS - Cross Validation Logistic Regression model for Semester 1, and the GPA-NaiveBayes (Normal) or GPA-NaiveBayes (Training With CrossValidation) model for Semesters 2 to 7. Table IV describes the results of the comparison of data analysis across 3 categories, namely data storage in Pentaho MySQL, data mining, and data visualization.

A. Conclusions
From the results of the practical work in this thesis report, the goal was achieved: the factors that affect the timeliness of graduation for Bachelor (S1) students at Multimedia Nusantara University in the 2018-2020 graduation years are known. The variables of the winning algorithm in each semester for the Information Systems study program are generation, origin school, IPS Semester 1, IPS Semester 2, GPA Semester 2, IPS Semester 1, IPS Semester 3, IPS Semester 4, IPS Semester 2, IPS Semester 5, IPS Semester 6, IPS Semester 3, IPS Semester 7. For data processing, MySQL data input using Pentaho and SQLYog was successful. For data visualization, the percentage of UMN graduates who did not graduate on time increased over the 3-year period, and the average GPA at graduation at UMN decreased every year in the 2018-2020 period. A larger percentage of women than men graduated on time, and among 2020 graduates, architecture, hospitality, journalism, electrical engineering, and physics engineering had 100% of graduates on time. By study program, the highest average graduation GPA for 2018 to 2020 was in physics engineering, at 3.61. By number and percentage of graduates per study program in 2018-2020, ILKOM was the largest with 1025 students (28.28%), while the diploma program had only 17 graduates (0.47%).
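The reported shares can be checked with simple arithmetic. The sketch below assumes that the denominator is the 3625 rows reported for the dataset; the function name is hypothetical.

```python
TOTAL_GRADUATES = 3625  # assumption: the number of rows reported for the dataset


def share_percent(count, total=TOTAL_GRADUATES):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


ilkom_share = share_percent(1025)    # 28.28
diploma_share = share_percent(17)    # 0.47
```

Under that assumption, both values agree with the percentages stated in the conclusions.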

B. Suggestions
The findings of this model can be used as input for creating a database or a system for analyzing the timely graduation rate of UMN students, so that with a directly connected database the system can issue reports and complete the analysis.