Allergen Recognition in Food Ingredients with Computer Vision

Recognizing and classifying food is important, particularly for consumers who must be careful about the foods they eat, since some food ingredients are allergens that can cause allergic reactions in some people. This paper aims to design and build an Android-based system that detects food ingredients and helps consumers obtain information about all allergens contained in a food product. The application uses Optical Character Recognition (OCR) to read the ingredient list and the Boyer-Moore algorithm for word matching (string matching). Experiments tested OCR, Boyer-Moore, different light sources, and technical (uncommon) words. The experiments show accuracy above 90% across the different scenarios.


I. INTRODUCTION
Food packaging plays an important role in marketing a product, serving as an advertising medium from producers to consumers. Packaging was originally made and designed to protect the product from contamination and damage, but in practice it now also makes it easier for customers to recognize the product [1].
Packaging is also a means of conveying food information, such as the composition and nutritional value of the food, to consumers. This information is important for consumers who follow dietary restrictions. It is the result of calculations and research conducted by nutritionists.
The term food allergy describes an adverse immune response to food. An allergy is a person's immune response to a substance that the body considers harmful. Allergy is an important issue because it occurs at all levels of society. In the first year of life, a child's immune system is relatively immature and highly vulnerable. A child with an atopic predisposition is easily sensitized and may develop allergic diseases to certain allergens, such as foods and inhalants [2]. Table 1 shows the estimated prevalence of food allergies in children and adults. Based on the table, the ingredients that are commonly found in packaged foods and most often cause allergies are milk, eggs, nuts, wheat, and soy [3].
Public awareness of allergic diseases is currently relatively low. Many people consider allergies an ordinary ailment, even though allergies can lead to substantial costs and health threats if left unchecked and not handled properly [4].
For allergy sufferers, the ingredient information on the food label is very important. However, allergenic ingredients are not always easy to spot on food labels because they are often listed under other names, such as casein, a protein found in milk. We therefore need to understand the labels on the packaged foods we consume [5]. A computer vision-based application is a natural fit for allergen recognition because it mimics the way people read an ingredient list. Such an application helps consumers find ingredients that may cause allergies, including the most common allergens (milk, eggs, nuts, wheat, and soybeans) and their processed derivatives [6].

ISSN 2355-3286
Ozlem Durmaz Incel et al. [7] proposed a computer vision-based solution: a mobile application that scans barcodes to identify food products. Their application requires fairly complicated steps to reach the final result, although the study reports that it makes it easy to monitor the contents of packaged food products. Unlike that work, we use a direct approach: we scan the food label, extract its text, and compare the extracted text with our allergen database. We test the system's performance under several scenarios, including different types of food packaging and different illumination.
The allergen data used by the application comes from the Asthma and Allergy Foundation of America (AAFA), an organization dedicated to finding a cure for and controlling asthma, food allergies, nasal allergies, and other allergic diseases [8]. The AAFA data is then entered into our database.

II. IMPLEMENTATION
Our application was developed for and runs on Android OS. It uses the smartphone's rear camera to capture the food package. Several stages must be performed to obtain the classification result; Fig. 1 shows the overall process implemented in the program.

A. Application Start
When the application is first opened, the user is presented with the main page, which contains a button for taking a picture to check its allergen content.

B. Select Image
When this button is selected, a dialog titled "Select Image" appears with two options: Camera and Gallery (Fig. 3). The main page also shows a placeholder titled "Image Preview", which is displayed while no image has been selected.

C. Crop Image
After the user selects one of the Select Image options, the selected image enters the cropping step. The crop page (Fig. 4) also offers image rotation and flipping. The user crops manually around the food composition section so that the subsequent OCR step does not detect text in irrelevant parts of the package.
The cropping feature supports zooming (image magnification), rotation/flip, and window aspect ratios of 1:1, 4:3, 16:9, or custom. The cropped image is returned as a bitmap or an Android URI.
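To make the fixed aspect-ratio options concrete, the sketch below computes the largest centered crop rectangle with a given ratio (for example 4:3) that fits inside an image. It is a plain-Java illustration under our own assumptions: the class and method names (`CropWindow`, `centeredCrop`) are hypothetical, and in the application the user positions the window manually rather than having it centered.

```java
final class CropWindow {
    final int left, top, width, height;

    CropWindow(int left, int top, int width, int height) {
        this.left = left;
        this.top = top;
        this.width = width;
        this.height = height;
    }

    /**
     * Largest rectangle with aspect ratio ratioW:ratioH that fits inside an
     * imageW x imageH image, centered in the image (hypothetical helper).
     */
    static CropWindow centeredCrop(int imageW, int imageH, int ratioW, int ratioH) {
        if (imageW <= 0 || imageH <= 0 || ratioW <= 0 || ratioH <= 0) {
            throw new IllegalArgumentException("dimensions and ratio must be positive");
        }
        int w;
        int h;
        // Compare imageW/imageH with ratioW/ratioH by cross-multiplication (no floating point).
        if ((long) imageW * ratioH > (long) imageH * ratioW) {
            // Image is wider than the target ratio: the image height limits the crop.
            h = imageH;
            w = (int) ((long) imageH * ratioW / ratioH);
        } else {
            // Image is taller than (or equal to) the target ratio: the width limits the crop.
            w = imageW;
            h = (int) ((long) imageW * ratioH / ratioW);
        }
        return new CropWindow((imageW - w) / 2, (imageH - h) / 2, w, h);
    }

    @Override
    public String toString() {
        return "CropWindow[left=" + left + ", top=" + top + ", width=" + width + ", height=" + height + "]";
    }

    public static void main(String[] args) {
        // A 4000x3000 photo cropped to 1:1 gives a 3000x3000 window starting at x = 500.
        System.out.println(centeredCrop(4000, 3000, 1, 1));
    }
}
```

On Android, a rectangle like this could be applied with `Bitmap.createBitmap(source, x, y, width, height)`; integer cross-multiplication is used so that ratios such as 16:9 do not suffer from floating-point rounding.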

D. Text Recognition
OCR performs text recognition on the image the user provides after cropping; recognition runs on the cropped image as a bitmap. The OCR result is a collection of detected text items, and the application loops over these items until it reaches the size of the collection. The detected text is stored in an array, which is then passed to the string matching process.
We implemented text recognition with the Google Play services library. We chose this library because it provides interfaces to each of Google's services and an API for resolving issues at runtime, such as a Google Play services APK that is missing, disabled, or out of date. In addition, newly added features or products can be accessed by upgrading the library when a new version is released. We used the computer vision release, play-services-vision version 16.2.0, which is intended for Android development.
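The hand-off from OCR to string matching can be sketched in plain Java. In the Google Play services Vision API, `TextRecognizer.detect(Frame)` returns a `SparseArray<TextBlock>` whose entries are read by index up to `size()`, which corresponds to the loop described above. The sketch assumes the block strings have already been extracted (for example with `TextBlock.getValue()`) and shows one hypothetical way to normalize them into the array passed to the matcher. The class name `OcrTextCollector`, the lower-casing, and the choice of separators are our assumptions, not details stated by the authors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class OcrTextCollector {

    /**
     * Collects OCR text blocks into a list of lower-case ingredient entries.
     * Entries are split on commas, semicolons, colons, parentheses, and line breaks.
     */
    static List<String> collect(List<String> blocks) {
        List<String> entries = new ArrayList<>();
        // Iterate over the detected items until the item count is reached.
        for (int i = 0; i < blocks.size(); i++) {
            String block = blocks.get(i);
            if (block == null) {
                continue;
            }
            for (String part : block.split("[,;:()\\n]+")) {
                // Collapse internal whitespace and normalize case for matching.
                String cleaned = part.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
                if (!cleaned.isEmpty()) {
                    entries.add(cleaned);
                }
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical OCR output for an ingredient list.
        List<String> blocks = List.of("Komposisi: Gula, Tepung Terigu,\nLactalbumin (Susu)", "E412");
        System.out.println(collect(blocks));
    }
}
```

Splitting on list separators rather than on spaces keeps multi-word ingredient names together, which matters when a database entry contains more than one word.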

E. String Matching
The Boyer-Moore algorithm performs word matching (string matching) between the text recognized by OCR and the allergen data in the database. It uses two terms: the content, which is the whole recognized text, and the pattern, which is the word being searched for. The algorithm first checks that the strings are valid: if the content is null or empty, or if the content is shorter than the pattern, the matcher returns false, meaning that no match can be made.
The algorithm searches the content (the text recognized by OCR) for each pattern, that is, each allergen entry from the database. Comparison starts from the rightmost character of the pattern. If the pattern is found in the content, the algorithm returns the matched pattern.
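A minimal Java sketch of the matching step follows. It implements Boyer-Moore with only the bad-character rule (the full algorithm also applies a good-suffix rule), includes the validity check described above, and compares characters starting from the rightmost position of the pattern. The class and method names (`BoyerMooreMatcher`, `indexOf`, `contains`) are hypothetical; the authors' implementation is not reproduced in the paper.

```java
import java.util.HashMap;
import java.util.Map;

final class BoyerMooreMatcher {

    /** Bad-character table: last index of each character in the pattern. */
    private static Map<Character, Integer> lastOccurrence(String pattern) {
        Map<Character, Integer> last = new HashMap<>();
        for (int i = 0; i < pattern.length(); i++) {
            last.put(pattern.charAt(i), i);
        }
        return last;
    }

    /** Returns the first index of pattern in content, or -1 if absent or input is invalid. */
    static int indexOf(String content, String pattern) {
        // Validity check: empty content, empty pattern, or content shorter than pattern.
        if (content == null || content.isEmpty() || pattern == null || pattern.isEmpty()
                || content.length() < pattern.length()) {
            return -1;
        }
        Map<Character, Integer> last = lastOccurrence(pattern);
        int n = content.length();
        int m = pattern.length();
        int shift = 0; // position of the pattern's first character in the content
        while (shift <= n - m) {
            int j = m - 1;
            // Compare from the rightmost character of the pattern leftwards.
            while (j >= 0 && pattern.charAt(j) == content.charAt(shift + j)) {
                j--;
            }
            if (j < 0) {
                return shift; // full match
            }
            // Align the mismatched content character with its last occurrence in the pattern.
            int lastIdx = last.getOrDefault(content.charAt(shift + j), -1);
            shift += Math.max(1, j - lastIdx);
        }
        return -1;
    }

    /** Boolean form matching the description: false when no match can be made. */
    static boolean contains(String content, String pattern) {
        return indexOf(content, pattern) >= 0;
    }

    public static void main(String[] args) {
        String content = "komposisi: gula, tepung terigu, lactalbumin, garam";
        System.out.println(contains(content, "lactalbumin")); // true
        System.out.println(contains(content, "kasein"));      // false
    }
}
```

When the mismatched content character does not occur in the pattern at all, the pattern jumps past it entirely, which is where Boyer-Moore saves comparisons on long ingredient lists.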

F. Database
In developing the application, we used the Firebase platform, a Google service that simplifies application development. One of its features is the Realtime Database, which we use to store allergen data; it can store application data and synchronize it within milliseconds.
The database is hosted in the cloud. Data is stored as JSON and synchronized in real time to every connected client, so all clients automatically receive the latest updates.
In this study, each database record stores the allergen's id, composition, and general name. Fig. 5 shows an example of the storage format. "Komposisi" (composition) is the name of the allergen, possibly a processed or technical name, as it may appear on food packaging; "NamaGeneral" (general name) is the main allergen from which the ingredient is derived. For example, the ingredient in Fig. 5, "lactalbumin", is derived from milk ("susu"), so its general name is "susu".
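The record layout described above can be mirrored by a small Java class. The sketch below is an assumption-laden illustration: the class name `AllergenRecord`, the id values, and the `toLookup` helper are hypothetical, and the only sample entries used ("lactalbumin" to "susu", "E412" to "kacang") are the examples given in this paper. It builds a map from lower-cased composition name to general name, which is convenient for the prediction step.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** One allergen record, mirroring the id / Komposisi / NamaGeneral fields. */
final class AllergenRecord {
    final String id;
    final String komposisi;   // technical or processed name as printed on packaging
    final String namaGeneral; // main (general) allergen name

    AllergenRecord(String id, String komposisi, String namaGeneral) {
        this.id = id;
        this.komposisi = komposisi;
        this.namaGeneral = namaGeneral;
    }

    /** Builds a lookup from lower-cased composition name to general allergen name. */
    static Map<String, String> toLookup(List<AllergenRecord> records) {
        Map<String, String> lookup = new HashMap<>();
        for (AllergenRecord r : records) {
            lookup.put(r.komposisi.toLowerCase(Locale.ROOT), r.namaGeneral);
        }
        return lookup;
    }

    public static void main(String[] args) {
        List<AllergenRecord> records = List.of(
                new AllergenRecord("1", "lactalbumin", "susu"),
                new AllergenRecord("2", "E412", "kacang"));
        Map<String, String> lookup = toLookup(records);
        System.out.println(lookup.get("lactalbumin")); // susu
        System.out.println(lookup.get("e412"));        // kacang
    }
}
```

Lower-casing the keys lets "E412" on a label and "e412" in normalized OCR output resolve to the same record.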

G. Allergen Prediction
If the selected image contains allergens, the application displays a list of them (Fig. 6). If no allergens are detected, the result states that the packaging contains no allergens (Fig. 7). If the packaging lists ingredients derived from a main allergen (milk, beans, chocolate, wheat, or eggs), the application displays only the name of that main allergen.
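The prediction behavior (list the main allergens found, report none when nothing matches, and show each main allergen once even if several of its derivatives appear) can be sketched as follows. This is a hypothetical illustration: `AllergenPredictor` is our name, and `String.contains` stands in for the Boyer-Moore search the application actually uses, so that the block is self-contained.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class AllergenPredictor {

    /**
     * Returns the distinct general allergen names whose composition names occur
     * in the OCR text. An empty set means "no allergens detected".
     */
    static Set<String> predict(String ocrText, Map<String, String> compositionToGeneral) {
        Set<String> found = new TreeSet<>(); // sorted, duplicates removed
        if (ocrText == null || ocrText.isEmpty()) {
            return found;
        }
        String text = ocrText.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : compositionToGeneral.entrySet()) {
            // Stand-in for the Boyer-Moore search used by the application.
            if (text.contains(e.getKey().toLowerCase(Locale.ROOT))) {
                found.add(e.getValue()); // only the main allergen name is kept
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> db = Map.of("lactalbumin", "susu", "casein", "susu", "e412", "kacang");
        System.out.println(predict("Gula, Lactalbumin, Casein, E412", db)); // [kacang, susu]
        System.out.println(predict("Gula, garam", db).isEmpty());          // true
    }
}
```

Collecting results in a set is what makes lactalbumin and casein collapse into a single "susu" entry, matching the behavior described above.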

III. EXPERIMENT AND RESULT
A trial was conducted to test the performance and accuracy of the allergen recognition application. Experiments were carried out under several scenarios, such as using images from the camera and from the gallery. The mobile application was developed in Android Studio on a computer with an Intel Core i7 CPU, and the smartphone used in the experiments was a Redmi Note 4.

A. Experiment Scenario
Text recognition was performed on 10 different packages, with 10 trials for each package.
In the first lighting scenario, the input image was captured with the smartphone camera, using the camera flash as the light source.
In the second lighting scenario, the light came from a 5 W LED lamp (220-240 V, 50/60 Hz, 38 mA) in a 3 m × 3 m room.
A test is judged successful based on the allergens the application outputs. This experiment aims to determine which light source best supports the application in producing accurate results.
Another trial ran the application on images containing unfamiliar allergen words: names of ingredients that are hard to understand, especially additives, or alternative names for certain foods that consumers may not recognize, for example food additives identified by an "E" code, such as emulsifiers. This trial measures how accurately the application detects such allergens. Ten processed or chemical allergen words were taken as samples, and the experiment was repeated 10 times.

B. Evaluation Metrics
For each trial we measure the application's accuracy, which indicates how well the implemented algorithms suit the allergen recognition task. The metrics are defined as follows. Total Error is the error rate over the 10 repeated trials: Total False, the total number of detection errors over all trials, divided by the number of allergens (words) that should be detected multiplied by n, the number of trials (here n = 10). Accuracy (%) is the accuracy for one package, computed as 100% minus Total Error expressed as a percentage.
Average Accuracy is the mean of the Accuracy (%) values over the experiments conducted. It is the final value reported as the accuracy of each test scenario.
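The metric definitions above amount to a few lines of arithmetic, sketched below in Java. This is our reading of the prose definitions; the paper's formulas (1)-(4) are not reproduced in this section, so the exact mapping to those numbers is an assumption. The class and method names are hypothetical, and the example counts (5 expected allergens, 3 misses over 10 trials) are illustrative, not results from the study.

```java
import java.util.List;

final class AccuracyMetrics {

    /** Total Error = Total False / (expected allergens per trial * number of trials). */
    static double totalError(int totalFalse, int expectedPerTrial, int trials) {
        if (expectedPerTrial <= 0 || trials <= 0) {
            throw new IllegalArgumentException("expectedPerTrial and trials must be positive");
        }
        return (double) totalFalse / ((double) expectedPerTrial * trials);
    }

    /** Accuracy (%) = 100 - Total Error expressed in percent. */
    static double accuracyPercent(int totalFalse, int expectedPerTrial, int trials) {
        return 100.0 - 100.0 * totalError(totalFalse, expectedPerTrial, trials);
    }

    /** Average Accuracy = mean of the per-package Accuracy (%) values. */
    static double averageAccuracy(List<Double> accuracies) {
        if (accuracies.isEmpty()) {
            throw new IllegalArgumentException("no accuracy values");
        }
        double sum = 0.0;
        for (double a : accuracies) {
            sum += a;
        }
        return sum / accuracies.size();
    }

    public static void main(String[] args) {
        // Hypothetical package: 5 allergens expected, 10 trials, 3 misses in total.
        double acc = accuracyPercent(3, 5, 10);
        System.out.println(acc);                                      // 94.0
        System.out.println(averageAccuracy(List.of(acc, 100.0)));     // 97.0
    }
}
```

With 3 misses out of 5 × 10 = 50 expected detections, Total Error is 0.06 (6%), giving an accuracy of 94% for that package.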

C. Result
After detecting and logging the technical (hidden) allergen words, we calculated the accuracy level. Table III shows a sample of the test data for difficult allergen words.
In Table III, each technical allergen word was tested 10 times. The column "Alergen Kata Sulit Terdeteksi" (difficult allergen word detected) takes one of two values: TRUE means the application detected the allergen, and FALSE means it failed to detect it.

IV. DISCUSSION
After the word detection and logging experiments with the smartphone camera flash and the 5 W LED lamp, we calculated the accuracy level. Table II shows the trial data obtained with the camera flash.
The accuracy under each lighting condition was calculated using formulas (1)-(4). The average accuracy was 95.7% with the camera flash and 97% with the 5 W LED lamp. This indicates that, in our tests, the application produced more accurate results under the 5 W LED lamp than under the camera flash.
Using the same formulas (1)-(4), the average accuracy for the difficult allergen word test was 100%.
When the packaging lists a technical word or a processed allergen ingredient, the application automatically translates it into the name of the main allergen it contains. The main allergen names are milk, eggs, nuts, wheat, and soy. Fig. 10 shows packaging containing the technical word "E412", and Fig. 11 shows the application's output, which translates "E412" into its main allergen ingredient, "kacang" (nuts/beans).

A. Conclusion
Based on the testing and research conducted, we draw the following conclusions.

1). An allergen recognition application, called Allergen Recognition, has been designed and built using OCR (Optical Character Recognition) and the Boyer-Moore algorithm.
The application runs on mobile devices with the Android operating system and was written in Java using the Android Studio Integrated Development Environment (IDE). Text recognition is implemented with OCR, and string matching is implemented with the Boyer-Moore method.
2). The application was evaluated through several experiments measuring its accuracy: the OCR experiment, the Boyer-Moore experiment, different light sources, and difficult words. Compared with the application by Ozlem Durmaz Incel and Mustafa Incel, the main advantage of our application is a simpler usage flow, with an overall average accuracy of 97.9%.

B. Suggestion
Based on this research, we suggest the following directions for future development of allergen recognition applications.
2). Add a text-to-speech feature so that people who cannot read can also use the application; the text-to-speech component could be based on [11]. Evidence from [10] also suggests that real-time inference is feasible.