Deep Learning-Based Models for Malicious File Segregation in Naval Networks

Gosain, A; Gosain, R K

doi:10.24868/11235

Abstract

Background: The naval industry faces escalating cyber threats from sophisticated malware attacks such as denial-of-service, spyware, and ransomware, which jeopardise confidential information and operational security. Traditional security measures often struggle to detect encrypted or obfuscated malicious files within network traffic accurately, especially in real time. The legacy entropy-based file segregation model (Gosain & Gosain, 2022) deployed aboard combatants detects encrypted malware but suffers from high false positive rates and manual throughput limits. A fully automated Malware Arresting and Recommendation System is proposed to address these challenges. The system fine-tunes deep learning models, such as CNN (Convolutional Neural Network), LSTM (Long Short-Term Memory) network, and BERT (Bidirectional Encoder Representations from Transformers) and its variants, to analyse network file traffic. By treating files as sequences akin to natural language, it applies natural language processing techniques to extract semantic embeddings, enabling the effective segregation of suspected malicious files from benign ones. The deep neural networks learn byte-level and sequential patterns of benign and malicious Portable Executable (PE) files.
Methodology: The proposed model uses deep learning models to generate high-dimensional embeddings that capture the intricate patterns and dependencies within file sequences. Training is conducted on comprehensive datasets comprising malicious and benign files, including encrypted and polymorphic malware samples. Four candidate models, namely a CNN, a three-layer LSTM, BERT-Large and CodeBERT, are trained and evaluated on an 11,000-file corpus of 1,000 benign and 10,000 malware files (DikeDataset 2022). Class imbalance is mitigated through random subsampling of the majority class.
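As an illustration of the class-imbalance step, random subsampling of the majority class might look like the sketch below. It is not the authors' code: the function name `undersample_majority`, the fixed seed, and the choice of a 1:1 target ratio are assumptions, since the abstract does not state the ratio reached after subsampling. Only the class sizes (1,000 benign, 10,000 malware) come from the abstract.

```python
import random


def undersample_majority(samples, labels, seed=0):
    """Randomly keep as many items per class as the smallest class has.

    Assumption: classes are balanced 1:1; the abstract does not state
    the target ratio.
    """
    rng = random.Random(seed)

    # Group samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is cut down to the size of the smallest one.
    target = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        kept = rng.sample(items, target)  # sampling without replacement
        balanced.extend((sample, label) for sample in kept)

    rng.shuffle(balanced)
    return balanced


# Placeholder file identifiers with the class sizes given in the abstract.
file_ids = list(range(11_000))
labels = ["benign"] * 1_000 + ["malware"] * 10_000
balanced = undersample_majority(file_ids, labels)
```

Subsampling discards most of the malware samples, so a different seed yields a different training subset; repeating the experiment over several seeds is one way to check that the reported metrics do not depend on a particular draw.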
Results and Observations: The proposed deep-network models achieve a much higher detection accuracy than the traditional entropy-based segregation model while also reducing the false positive rate. The CodeBERT-based model attained the highest balanced accuracy, 95.4%. The CNN-based model had the lowest inference time, 0.1 milliseconds per sample, outperforming the earlier entropy-based model and imposing a compute load 20 times lower than the transformer baselines.
Conclusion and Applications: Integrating a hybrid CNN-CodeBERT recommendation model into network traffic analysis represents a significant advance in naval cybersecurity defence. With fast preliminary screening by the CNN head and deep inspection of the flagged files by the CodeBERT body, the hybrid model will be able to detect malicious files accurately and efficiently within both encrypted and regular network traffic, enhancing the security of command and control communications. The model's real-time capabilities and adaptability make it suitable for various applications, including secure communications in weapon systems, unmanned vehicle coordination, hypersonic glide vehicles, and missile guidance systems. Operation in denied, degraded or disrupted SATCOM scenarios is preserved because all inference executes locally without cloud offloading. By leveraging advanced natural language processing and deep learning techniques, this research contributes to strengthening cyber defence measures within naval operations and can be extended to other critical infrastructure sectors requiring robust security solutions.
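The two-stage arrangement described above (a fast screen followed by deep inspection of flagged files only) and the balanced-accuracy metric can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: `two_stage_screen`, both thresholds, and the toy scorers are hypothetical stand-ins for the CNN and CodeBERT models, which the abstract does not specify in detail. Balanced accuracy is computed as the mean of the recall on each class.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[bytes], float]  # returns a malware score in [0, 1]


def two_stage_screen(files: Sequence[bytes], fast_scorer: Scorer,
                     deep_scorer: Scorer, screen_threshold: float = 0.2,
                     final_threshold: float = 0.5) -> List[bool]:
    """Return True for each file judged malicious.

    Both thresholds are hypothetical; the abstract does not give values.
    """
    verdicts = []
    for data in files:
        if fast_scorer(data) < screen_threshold:
            # Cleared by the cheap first stage; the deep model never runs.
            verdicts.append(False)
        else:
            # Flagged files get the slower, more accurate second look.
            verdicts.append(deep_scorer(data) >= final_threshold)
    return verdicts


def balanced_accuracy(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Mean of recall on the malicious class and recall on the benign class."""
    pos = sum(1 for t in y_true if t)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return 0.5 * (tp / pos + tn / neg)


# Toy stand-ins (NOT the CNN or CodeBERT models from the paper).
def toy_fast(data: bytes) -> float:
    return len(set(data)) / 256  # byte diversity as a crude screen


def toy_deep(data: bytes) -> float:
    return 1.0 if b"EVIL" in data else 0.0


files = [b"hello", bytes(range(256)), bytes(range(256)) + b"EVIL"]
verdicts = two_stage_screen(files, toy_fast, toy_deep)
score = balanced_accuracy([False, True, True], verdicts)
```

Because the deep scorer runs only on files that pass the screen threshold, the average cost per file depends on how many files the first stage flags; a lower screen threshold trades compute for fewer missed detections.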