Teakwood Grade Identification with GLCM and K-NN with Adaboost Optimization (Case Study at KPH Cepu)

— Teak is a tree species with many functions and uses. Teak wood has very high quality, making it a preferred raw material for home furniture such as tables, chairs, and cabinets. However, the middle testers (Perhutani staff) who grade wood quality are limited when classification relies on visual inspection alone, and many furniture entrepreneurs are often mistaken in assessing teak wood quality. This results in a lack of quality-graded teak wood available as raw material for home appliances, furniture, and commerce at Perum Perhutani, especially KPH Cepu. In this study, teak wood image data are acquired and preprocessed until ready for processing. GLCM is used for image feature extraction on both the training data and the testing data. After the image characteristics are obtained, the images are classified with the K-Nearest Neighbor method optimized with AdaBoost. The final result is a wood grade quality classification, namely grades A, B, C and D, according to class.


INTRODUCTION
Teak is a type of woody tree of high quality. It is a large tree with a straight trunk that can grow, straight and towering, to a height of 30-40 m. Teak has the biological name Tectona grandis L.f. In today's industrial business, teak wood is mostly made into veneers to coat the outer layer of expensive plywood; parquet for floor coverings is also made from it, and even twigs that cannot be used as raw material for processed furniture can still be used as fuel, because teak produces a degree of heat high enough that it was once used as fuel in steam locomotives. Beyond domestic commodities, Indonesia also exports processed teak wood in the form of outdoor furniture [1]. The largest teak production in Central Java is at Perum Perhutani KPH Cepu, which covers an area of 33,017.29 ha. The teak forest area spans Blora, Tuban and Bojonegoro Regencies. Production management in the KPH Cepu forest area is divided into two Forest Management Sub Units (SKPH), namely SKPH North Cepu and SKPH South Cepu. The demand for teak wood production at KPH Cepu is the highest to date. KPH Cepu carries out production activities in accordance with the legal basis of central regulations. The production function focuses on logging and testing the quality of teak wood. Teak wood grouping serves to assign grades A, B, C and D. This test is carried out by the middle examiner staff of Perum Perhutani KPH Cepu. So far, the middle examiners have graded manually, relying on visual inspection of teak logs that have already been classified by length and diameter by the loggers. Grouping or classifying teak wood quality aims to facilitate the sale of teak on the Perum Perhutani trading portal at Tokoperhutani.com and to ease the fulfillment of buyer needs.
Identification of teak wood quality based on the image of the vertical cross-section is still done manually and takes a long time, so teak wood production for sale is not optimal. Many studies have addressed the classification of record data taken through measurements during felling [2]. For processed image data, texture feature extraction is the most recommended approach, given that the data features are homogeneous and still in image form. Feature extraction can use the Gray Level Co-occurrence Matrix (GLCM), a statistical method for texture feature extraction. Once feature extraction is complete, the classification process groups the images characterized with GLCM into predetermined classes by processing the extracted information. One specific problem with texture images is that image edges are treated as texture when searching for image features.
R. Qayyum, K. Kamal, T. Zafar and S. Mathava used the Particle Swarm Optimization algorithm to train a feed-forward neural network for classifying wood defects using GLCM-based features. The proposed technique shows promising results: the Mean Square Error of the network for the training dataset was 0.3483, whereas for the test dataset the accuracy was found to be 78.26% [3].

ISSN 2085-4552
Another study, by Stefanus Santosa, R. A. Pramunendar, D. P. Prabowo, and Yonathan P. Santosa, showed that GLCM captures the adjacency relationship between pixels in the image. The angles used were 0°, 45°, 90° and 135°. The experiments achieved 99.14% accuracy with the combination of BPNN and GA [4].
From previous research it can be concluded that the Gray Level Co-occurrence Matrix (GLCM), with Euclidean Distance as the measure of distance between images, can produce accurate results. In this study, the texture feature extraction chosen for teak wood is GLCM with 5 features, namely Contrast, Energy, Correlation, Homogeneity, and Entropy. The classifier used is K-NN, because the amount of training data to be tested is fairly large, so a sufficient and good level of accuracy is expected. To cover the weakness of the K-NN algorithm, AdaBoost optimization is used to reduce the influence of weak data or outliers so that all data can be processed optimally. The data to be classified are teak wood texture images, which by themselves carry no information that can be processed for classification; moreover, with the K-NN method alone there are still missing values in the K-NN classification model, so the classification algorithm's performance is not optimal. This study therefore processes the teak texture image data into classifiable information, using the Gray Level Co-occurrence Matrix for feature extraction on teak board texture images, and builds a grade classification model for teak wood boards using K-NN with AdaBoost optimization to improve the performance of the classification algorithm.

A. Texture Feature Extraction
Rizky Andhika Surya, Abdul Fadlil, and Anton Yudhana explained that image extraction through the Gray Level Co-occurrence Matrix method could provide the raw material for batik image classification. This was done because the results of GLCM can show differences between images with different textures [5].

B. Gray Level Co-occurrence Matrix
Gray Level Co-occurrence Matrix (GLCM) is a matrix whose elements count the pairs of pixels with particular brightness levels, where the pixel pairs are separated by a given distance and angle. The angle orientation follows four directions, namely 0°, 45°, 90° and 135°, with a distance of 1 pixel between pixels [5]. The input to GLCM is a matrix representing the grayscale image; the output is a co-occurrence matrix from which characteristics can be extracted based on second-order statistical feature parameters such as contrast, correlation, homogeneity and energy.
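As a rough illustration of these definitions, the sketch below builds a symmetric, normalized GLCM for the 0° direction at distance 1 on a tiny 4-level toy image and derives the usual second-order features (contrast, energy, correlation, homogeneity, entropy). The image values and helper names are invented for illustration; a real pipeline would typically use a library such as scikit-image.

```python
import numpy as np

# Tiny 4-level grayscale patch (values 0..3), standing in for a
# preprocessed teak board image.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
levels = 4

def glcm(image, dx, dy, levels):
    """Co-occurrence counts for pixel pairs at offset (dy, dx)."""
    m = np.zeros((levels, levels), dtype=float)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[image[r, c], image[r2, c2]] += 1
    m += m.T              # make symmetric, as in the usual GLCM definition
    return m / m.sum()    # normalize to joint probabilities

# Offset (dx=1, dy=0) corresponds to the 0° direction at distance 1.
p = glcm(img, dx=1, dy=0, levels=levels)

i, j = np.meshgrid(np.arange(levels), np.arange(levels), indexing="ij")
contrast    = np.sum(p * (i - j) ** 2)
energy      = np.sum(p ** 2)
homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
entropy     = -np.sum(p[p > 0] * np.log2(p[p > 0]))
mu_i, mu_j  = np.sum(i * p), np.sum(j * p)
sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
correlation = np.sum(p * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
```

In this study the same five features would be computed at each of the four angles (0°, 45°, 90°, 135°) by changing the (dx, dy) offset.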

C. Data Mining
Data mining has an important function in this research: it processes the teak wood image data to produce the information needed and to increase the knowledge of users of this research. Basically, data mining has four main functions in processing the teak wood images that have been converted through GLCM: to predict, including where the testing data has been taken; to describe what information is in the processed data; to classify the data into predetermined groups; and finally to associate the data with data of similar characteristics.
Classification is the process of finding a model or function that describes and differentiates data into classes. Classification involves the process of examining the characteristics of an object and assigning the object to one of the predefined classes.

D. Adaboost in KNN
The K-Nearest Neighbor classifier works by analogy: test and training data are compared, and conclusions are drawn from the similarity produced by the comparison [6]. The calculation is based on the distance (closeness of the data), known as the Euclidean Distance:

d(C, Ck) = √( Σ (Cij − Ckij)² )

where Cij is the training data and Ckij is the testing data. One of the supervised data mining algorithms with a classification model function is the AdaBoost method. Initially this algorithm was applied to regression models; along with the rapid development of computer technology, the method can also be applied to other statistical models. AdaBoost is an ensemble technique that uses an exponential loss function to improve prediction accuracy. Basically, the boosting method (AdaBoost) can increase accuracy in classification and prediction by generating a combination of models, where the result chosen is the model with the greatest weight value. The adaptive boosting method has also been reported as a meta-technique for overcoming the class imbalance problem.
AdaBoost and its variants have been very successful across application domains because of their strong theoretical basis, accurate predictions and great simplicity. In theory, AdaBoost functions as a performance optimizer for the K-NN classification algorithm so that its performance can be maximized. In the case of an unbalanced dataset, applying boosting does not change the structure of the dataset, meaning the dataset remains unbalanced.

A. Data Collection
The main data in this study are images of teak wood boards acquired by middle examiners (experts): 20 boards each for grades A, B, C and D, each photographed from 4 angles, namely 0°, 45°, 90° and 135°. In total there are 316 records of training data, plus 20 boards as testing data which, with the 4 angles, produce 80 test records. There are 4 classes to be tested. Grade A indicates wood with the best durability, density and moisture, which must be priced highest by the seller, in this case Perum Perhutani KPH Cepu. Grade B indicates fairly good durability, density and moisture, with no wood defects or cambium sources. Grade C is a board grade with sufficient durability, density and moisture, in the category of being usable as raw material for furniture or equipment. The last grade, D, has the lowest durability, density and moisture; it can still be used as raw material, provided the condition of the wood is matched to the need (the load to be borne by the strength of the wood). Data are collected directly through what is commonly referred to as Image Acquisition: capturing or scanning an analog image with a device to obtain a digital image according to the standard determined by PUSBANGHUT. When taking photos, the teak wood is placed parallel to and 15 cm from the recording device (camera). The data are then collected in the form of images [7].

B. Preprocessing
The first step is eliminating the background: the original image data, obtained with different backgrounds and slightly different angles, have the background cropped away to simplify calculations in the feature extraction process. The second step is equalizing the pixel size piece by piece: the background-uniformed image data are standardized to 400 x 406 pixels, again to simplify calculations during feature extraction. The third step is converting the image to grayscale. This is done by taking every pixel in the image and retrieving the information of its 3 basic colors, namely red, green and blue (via a color-to-RGB function); these three values are summed and divided by three to obtain the average. This average value then colors the pixel, so the image becomes grayscale.
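A minimal sketch of the last two preprocessing steps, assuming a random array stands in for the photographed board and that the 400 x 406 target means 400 columns by 406 rows (the orientation is not stated above); the nearest-neighbour resize and channel-averaging helpers are illustrative, not the authors' actual code.

```python
import numpy as np

def to_grayscale(rgb):
    """Average the R, G and B channels, as described above."""
    return rgb.mean(axis=2)

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize; stands in for the 400 x 406 standardization."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows][:, cols]

# Stand-in photo of a teak board (values 0..255, 3 color channels).
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(300, 500, 3))

# Standardize to 406 rows x 400 columns, then convert to grayscale.
gray = to_grayscale(resize_nearest(rgb, 406, 400))
```

The resulting `gray` array is what would be fed to the GLCM feature extraction.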

C. Classify with Knn
The K-NN calculation is an algorithm that classifies data based on learning or training data. The training data in this study are images that have been extracted to produce image characteristics in numerical form. The numerical feature-extraction data are compared against the test data. The training sample used in this manual calculation contains 19 training records and 1 record whose class/label is unknown, called the testing data. The training data consist of 4 class A records, 5 class B records, 6 class C records and 4 class D records. Because the number of training records is even, an odd k is recommended. In the criteria records, c90 means correlation at the 90° angle, and so on.

Finding the Euclidean Distance
Calculating the Euclidean distance means, for each training record, subtracting the training data from the test data, squaring the differences, summing them, and taking the root. This calculation produces the table below.

Sorting data by 5 nearest neighbors
The value of K is set to k = 5, meaning the 5 data records with the closest distance values will be taken.
3. After ranking the 5 smallest distances, the class of each nearest neighbor is checked. From this data processing, the closest neighbor is the record with id 166C, which has label C; second is the record with id 320D, label D; third and fourth closest to the test data are records labeled A, with ids 2A and 4A; and last is record 162C, with label C. Thus the test data is closest, in the majority, to records labeled A and C.
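The manual procedure above (Euclidean distance, ranking, majority vote among the 5 nearest neighbors) can be sketched as follows; the feature vectors and labels are toy values, not the actual teak measurements.

```python
import numpy as np
from collections import Counter

# Toy GLCM feature vectors (contrast, energy, correlation, homogeneity,
# entropy); the values and labels are illustrative only.
train_X = np.array([
    [2.0, 0.30, 0.90, 0.70, 4.1],   # grade A
    [2.1, 0.31, 0.89, 0.71, 4.0],   # grade A
    [1.9, 0.29, 0.91, 0.69, 4.2],   # grade A
    [3.5, 0.20, 0.80, 0.60, 5.0],   # grade B
    [3.6, 0.21, 0.79, 0.61, 5.1],   # grade B
    [5.0, 0.10, 0.70, 0.50, 6.0],   # grade C
    [5.1, 0.11, 0.69, 0.51, 6.1],   # grade C
    [7.0, 0.05, 0.60, 0.40, 7.0],   # grade D
])
train_y = np.array(["A", "A", "A", "B", "B", "C", "C", "D"])

def knn_predict(x, X, y, k=5):
    # Euclidean distance: root of the summed squared feature differences.
    d = np.sqrt(((X - x) ** 2).sum(axis=1))
    nearest = y[np.argsort(d)[:k]]               # the k closest records
    return Counter(nearest).most_common(1)[0][0]  # majority vote

test_x = np.array([2.05, 0.30, 0.90, 0.70, 4.05])
pred = knn_predict(test_x, train_X, train_y, k=5)   # -> "A"
```

Here 3 of the 5 nearest neighbors carry label A, so the test record is assigned grade A.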

D. Optimation with Adaboost
In the adaboost optimization process, the same sample data will be used with the data above in the knn process [8].
• The first stage is to determine the sample weight of the data. The sample weight is 1 divided by the total number of records in the dataset; with 20 records, the weight of each sample record is 1/20.
• The next stage is to determine the total error that can occur, the sum of the weights of the misclassified records; here the possible error is 1/20.
• The third stage is to calculate the performance value of the weak learner.
• The last stage is normalization of the weights.
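The four stages can be illustrated with a toy run, assuming one of the 20 records is misclassified by the weak learner; the performance formula α = ½ ln((1−ε)/ε) and the exponential re-weighting are the standard AdaBoost update, which the text does not spell out explicitly.

```python
import math

n = 20
weights = [1.0 / n] * n                  # stage 1: initial sample weights, 1/20 each
misclassified = [0]                      # assumed: record 0 is classified wrongly
error = sum(weights[i] for i in misclassified)   # stage 2: total error = 1/20

# stage 3: performance ("amount of say") of this weak learner
performance = 0.5 * math.log((1.0 - error) / error)

# stage 4: increase the weights of wrong records, decrease the others,
# then normalize so the weights again sum to 1
new_w = [w * math.exp(performance) if i in misclassified
         else w * math.exp(-performance)
         for i, w in enumerate(weights)]
total = sum(new_w)
new_w = [w / total for w in new_w]
```

After normalization the misclassified record carries half of the total weight, so the next weak learner is forced to concentrate on it.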

E. Modeling Process
1. Experiment without Adaboost
The next stage is to apply the KNN algorithm in the workspace. First, drag the training data and the test data onto the blank page, then search for the KNN algorithm, the Apply Model operator, and the Performance operator (for the confusion matrix). Searching can be done by typing in the toolbar. Once everything is in place, connect the training data to the KNN operator as shown below.

2. Experiment with Adaboost
In implementing AdaBoost, special attention needs to be paid to the RapidMiner steps. The important part is applying AdaBoost to optimize the training data rule attributes and the K-NN algorithm in the workspace. To improve the accuracy of the KNN algorithm in classifying the quality of wood board grades at KPH Perhutani Cepu, whose results will be forwarded to the Research and Development Center in Semarang, data processing is carried out with the KNN process optimized by AdaBoost, because AdaBoost can improve accuracy in data processing with the KNN algorithm. The K-Nearest Neighbor algorithm alone produces 80.31% accuracy, while AdaBoost optimization can raise the accuracy to almost 100%, an increase of about 19.9%. From these data, the boosting method, AdaBoost, has been proven to increase the accuracy of the KNN algorithm on the wood sample data.
In the cross-validation experiment, trials were carried out with different numbers of folds, from 10 to 30, with k = 5 fixed for KNN. The resulting differences in accuracy are conveyed in the following chart.
We can conclude that adding AdaBoost optimization to K-NN processing increases accuracy, so that the testing data input process provides more precise results than using the KNN classification algorithm alone.