Classification of Metagenome Fragments With Agglomerative Hierarchical Clustering

  • Alex Kurniadi, Universitas Multimedia Nusantara
  • Marlinda Vasty Overbeek, Universitas Multimedia Nusantara

Abstract

Unlike genomics, which specifically studies culturable microorganisms, metagenomics studies microbial samples retrieved directly from the environment. When sequenced, such samples produce widely varying fragments, many of which are still unidentified or unknown. Assembling these fragments to identify the species they contain is therefore difficult, so binning techniques are needed to classify the mixed fragments at a given level of the phylogenetic tree. This research implements k-mers for feature extraction, linear discriminant analysis (LDA) for dimensionality reduction, and agglomerative hierarchical clustering (AGNES) for taxonomic classification to the genus level. Experiments are run across several settings, including observed metagenome fragment lengths from 0.5 Kbp to 10 Kbp, for both the 3-mer and 4-mer contexts (k = 3 and k = 4). The average validity scores of the clusters generated from the training and test sets, computed with the silhouette index, are 0.6945 and 0.0879 respectively for the 3-mer context, and 0.5219 and 0.1884 respectively for the 4-mer context.
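
As an illustration of the k-mer feature extraction step named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It counts every overlapping k-mer in a fragment and returns one value per possible k-mer, giving 64 features for k = 3 and 256 for k = 4. The function name `kmer_frequencies`, the normalization to relative frequencies, and the skipping of windows containing ambiguous bases are assumptions made for this example.

```python
from itertools import product


def kmer_frequencies(fragment, k):
    """Return relative frequencies of all 4**k DNA k-mers in ACGT order.

    Hypothetical helper: normalizing by the total count and ignoring
    windows with non-ACGT characters are assumptions, not details
    taken from the paper.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = fragment.upper()
    # Slide a window of width k across the fragment, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skips windows with ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


features_3 = kmer_frequencies("ACGTACGTNACGT", 3)
features_4 = kmer_frequencies("ACGTACGTNACGT", 4)
print(len(features_3), len(features_4))  # prints 64 256
```

Vectors of this kind could then be passed to a dimensionality reduction step and a clustering step; normalizing the counts keeps fragments of different lengths on a comparable scale.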

Published
2022-01-23
How to Cite
Kurniadi, A., & Overbeek, M. (2022). Classification of Metagenome Fragments With Agglomerative Hierarchical Clustering. Ultimatics : Jurnal Teknik Informatika, 13(2), 114-119. https://doi.org/10.31937/ti.v13i2.2180