Performance Analysis of X3D Architecture for Cross-Domain Real-World Violence Detection

Authors

  • Beril Berekhya Mutia Hasibuan Institut Teknologi Nasional Bandung
  • Irma Amelia Dewi Institut Teknologi Nasional Bandung
  • Muhammad Ichwan

DOI:

https://doi.org/10.31937/ijnmt.v13i1.4505

Abstract

Real-time violence detection systems require models that are computationally efficient and that generalize to unfamiliar environments. This study evaluates the X3D-XS architecture with a focus on cross-domain generalization. The model was trained on three controlled source domains (AVD, HockeyFight, and MovieFight) and then tested on the more diverse Real-Life Violence Situations (RLVS) dataset. The experimental results show that X3D-XS is highly efficient, reaching inference speeds of up to 191 FPS, which makes it suitable for edge deployment. However, the model is strongly affected by domain shift: training on a single source domain yielded accuracy between 49.5% and 61.1% on the real-world data, depending on the domain. This indicates that staged and cinematic violence differ substantially from real-life situations. Importantly, combining the source domains improved the model's sensitivity, leading to a Recall of 94.28%. These findings show that although X3D offers the speed needed for real-time monitoring, training solely on staged data is not sufficient for real-world effectiveness, which underscores the need for data diversity in surveillance applications.
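
The abstract reports both accuracy and Recall, and the two can diverge: a classifier that flags nearly every clip as violent misses few violent clips (high Recall) while still mislabeling many non-violent ones (modest accuracy). The following is a minimal sketch of how the two metrics are computed from binary predictions. The labels are toy values invented for illustration, not results from the study, and the names ConfusionCounts, count_outcomes, recall, and accuracy are hypothetical helpers rather than the authors' code.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # violent clips predicted as violent
    fp: int  # non-violent clips predicted as violent
    tn: int  # non-violent clips predicted as non-violent
    fn: int  # violent clips predicted as non-violent


def count_outcomes(y_true, y_pred):
    """Tally confusion-matrix cells for binary labels (1 = violent)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def recall(c):
    """Share of truly violent clips that were detected."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


def accuracy(c):
    """Share of all clips that were labeled correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


# Toy labels (assumed, illustrative only): every violent clip is caught,
# but three of four non-violent clips are also flagged.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0]

counts = count_outcomes(y_true, y_pred)
print(f"recall={recall(counts):.3f} accuracy={accuracy(counts):.3f}")
```

In a surveillance setting, a missed violent event (false negative) is usually costlier than a false alarm, which is why sensitivity is reported alongside accuracy.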

Published

2026-06-30

How to Cite

Hasibuan, B. B. M., Dewi, I. A., & Ichwan, M. (2026). Performance Analysis of X3D Architecture for Cross-Domain Real-World Violence Detection. IJNMT (International Journal of New Media Technology), 13(1), 17–23. https://doi.org/10.31937/ijnmt.v13i1.4505