Real-Time Detection and Multi-Class Classification of DGAs With HybridBERT

Aimen Mahmood, Haider Abbas, Muhammad Faisal Amjad, Waleed Shahid, Syed Qaisar Ali Shah

Research output: Contribution to journal; Article; peer-review

Abstract

Domain fluxing, a technique attackers use to evade conventional Command and Control (C&C) detection, presents a significant challenge for cybersecurity. It relies on Domain Generation Algorithms (DGAs) to generate domain names dynamically, often producing nonsensical character sequences. This paper presents a real-time DGA detection framework that analyzes Non-existent (NX) domain responses and applies a statistical anomaly detection approach to identify malicious activity. The detected DGAs are then classified into 56 families by HybridBERT, a framework that integrates Bidirectional Encoder Representations from Transformers (BERT) with an attention mechanism and statistical features. The dataset, comprising approximately 0.3 million samples collected from various online sources, was pre-processed to remove redundant entries (approximately 25% of the total) and then split into training, validation, and test sets in a 60:20:20 ratio. The BERT model was fine-tuned with its first five layers frozen and trained for 20 epochs with early stopping, achieving an overall precision of 96%. Despite significant class imbalance, the framework performed robustly on both word-based and pseudorandom DGAs, and per-class precision, recall, and F1-score provide a comprehensive evaluation. The proposed framework improves the ability of cybersecurity systems to detect zero-day DGAs and offers a scalable solution for real-time DGA classification.
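The abstract does not say which statistic the detector applies to NX responses. A minimal sketch, assuming a per-client count of NXDOMAIN answers (DNS RCODE 3) within one time window and a simple "mean plus k standard deviations" threshold learned from earlier windows assumed to be benign. The function names, the `(client, rcode)` input format, and `k = 3` are hypothetical choices for illustration, not the paper's method.

```python
from collections import Counter
from statistics import mean, pstdev

NXDOMAIN = 3  # DNS RCODE "Name Error": the queried name does not exist (RFC 1035)


def nx_counts(responses):
    """Count NXDOMAIN responses per client in one time window.

    `responses` is an iterable of (client, rcode) pairs (hypothetical input format).
    """
    return Counter(client for client, rcode in responses if rcode == NXDOMAIN)


def flag_clients(window_counts, history, k=3.0):
    """Return clients whose NX count exceeds mean + k * std of past per-client counts.

    `history` holds per-client NX counts from earlier windows assumed to be benign;
    this threshold rule is an illustrative assumption, not the paper's detector.
    """
    limit = mean(history) + k * pstdev(history)
    return sorted(client for client, n in window_counts.items() if n > limit)


# Usage: one client produces a burst of NXDOMAIN answers, another behaves normally.
responses = [("10.0.0.5", 3)] * 40 + [("10.0.0.7", 3)] * 2 + [("10.0.0.7", 0)] * 10
history = [1, 2, 3, 2, 1, 2, 3, 2]
print(flag_clients(nx_counts(responses), history))  # ['10.0.0.5']
```

Only the clients flagged this way would need their NX-generating domains passed on to the family classifier, which keeps the expensive model off most of the traffic.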
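HybridBERT combines BERT with statistical features, but the abstract does not list them. Lexical statistics commonly used in DGA work include label length, Shannon entropy, and the ratios of digits and vowels; the sketch below computes these purely as an assumed, hypothetical feature set. It also naively treats the second-to-last label as the registered name, which is wrong for multi-label public suffixes such as `co.uk`.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(s):
    """Shannon entropy of the character distribution of s, in bits per character."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def domain_features(domain):
    """Illustrative lexical features for one domain name (hypothetical feature set)."""
    labels = domain.lower().rstrip(".").split(".")
    # Naive assumption: the second-to-last label is the registered name.
    label = labels[-2] if len(labels) >= 2 else labels[0]
    if not label:
        raise ValueError(f"empty label in {domain!r}")
    n = len(label)
    return {
        "length": n,
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n,
    }


print(domain_features("google.com"))
print(domain_features("xj4k9qz2.net"))
```

Pseudorandom DGA names tend to show higher entropy and digit ratios than dictionary words, while word-based DGAs can look lexically normal, which is one reason to pair such features with a contextual model like BERT.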
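The pre-processing step (drop redundant samples, then split 60:20:20) can be sketched as follows. This assumes that "redundant" means a repeated domain name and that the first family label seen for a domain is kept; the abstract states neither, so both are assumptions, as are the function name and the fixed seed.

```python
import random


def dedup_and_split(samples, seed=0):
    """Drop duplicate domains, then shuffle and split 60:20:20.

    `samples` is a list of (domain, family) pairs. Duplicates are detected on the
    lower-cased domain and the first label seen is kept (an assumption; the abstract
    does not say how conflicting labels were resolved).
    """
    unique = {}
    for domain, family in samples:
        unique.setdefault(domain.lower(), family)
    items = list(unique.items())
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 60 // 100  # integer arithmetic avoids float rounding at boundaries
    n_val = n * 20 // 100
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


samples = [(f"d{i}.com", "fam") for i in range(100)] + [("D0.com", "fam")] * 25
train, val, test = dedup_and_split(samples)
print(len(train), len(val), len(test))  # 60 20 20
```

With 56 heavily imbalanced families, a plain random split can leave rare families with few or no validation or test samples; splitting each family separately (stratification) is a common alternative.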
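The abstract gives a 20-epoch budget with early stopping but not the monitored metric or the patience. A minimal sketch of the stopping rule, assuming validation loss is monitored with a hypothetical patience of 3 epochs; it operates on a precomputed list of losses so that it runs without any training framework.

```python
def early_stopping(val_losses, max_epochs=20, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve on the best value
    by more than `min_delta` for `patience` consecutive epochs, or after `max_epochs`.
    """
    best, best_epoch, wait = float("inf"), 0, 0
    epochs = val_losses[:max_epochs]
    for epoch, loss in enumerate(epochs, start=1):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(epochs)


print(early_stopping([0.9, 0.7, 0.6, 0.62, 0.61, 0.65, 0.5]))  # (3, 6)
```

In a real training loop, the model weights saved at `best_epoch` would typically be restored after stopping, so the late improvement at epoch 7 in this example is never seen.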
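The abstract reports an "overall precision of 96%" without naming the averaging scheme. Under class imbalance, the macro average (every family counts equally) and the support-weighted average can differ substantially. The sketch below computes per-class precision, recall, and F1 from label lists and both averages; the function names and toy labels are illustrative only.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed from true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the sample belonged to another class
            fn[t] += 1  # missed a sample of class t
    report = {}
    for c in sorted(set(y_true) | set(y_pred)):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[c] = (prec, rec, f1)
    return report


def averaged_precision(report, y_true):
    """Macro average (every class counts equally) and support-weighted average."""
    support = Counter(y_true)
    macro = sum(p for p, _, _ in report.values()) / len(report)
    weighted = sum(report[c][0] * support[c] for c in report) / len(y_true)
    return macro, weighted


y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
report = per_class_metrics(y_true, y_pred)
print(report)
print(averaged_precision(report, y_true))  # (0.75, 0.875)
```

Here the minority class "b" pulls the macro precision down to 0.75 while the weighted figure stays at 0.875, the same kind of gap that can arise across 56 imbalanced DGA families.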

Original language: English
Pages (from-to): 160393-160410
Number of pages: 18
Journal: IEEE Access
Volume: 13
Publication status: Published, 10 Sept 2025

Keywords

  • Malware
  • Feature extraction
  • Servers
  • Hidden Markov models
  • Classification algorithms
  • Vectors
  • Real-time systems
  • Domain Name System
  • Semantics
  • Convolutional neural networks
