Computer Network Traffic Analysis Framework for Anomaly Detection using Unsupervised Machine Learning
Authors
- Ge, T., Department of Mathematics and Computer Science, Benue State University, Makurdi
- Agaji, I., Department of Computer Science, Joseph Sarwuan Tarka University, Makurdi
- Blamah, N.V., Department of Computer Science, University of Jos, Jos
- Ogala, E., Department of Computer Science, Joseph Sarwuan Tarka University, Makurdi
Abstract
Numerous novel traffic classification methods have been conceived and put into practice. The ability to recognize and categorize network traffic from diverse domains is of substantial importance in network management, but no existing framework addresses this demand using unsupervised learning techniques. This study developed a K-means-based framework capable of both identifying and classifying network traffic data originating from various domains using unsupervised machine learning. A mathematical model for the K-means-based network analysis framework was designed and implemented in Python to evaluate the key performance metrics of V-measure, mutual information, normalized mutual information, Rand index and adjusted Rand index using diverse datasets representative of different domains. To detect patterns suggestive of typical network behavior and deviations indicating possible security breaches, the framework was trained using the UNSW-NB15 dataset and tested using the Diverse Network Traffic Dataset. Additionally, the framework adopted dynamic thresholding and sliding-window techniques that allowed it to adjust to changing threat environments and maintain anomaly detection efficacy over time. For each window, the sum of malicious flows was calculated; if this sum exceeded the defined threshold, the window was classified as malicious, otherwise it was classified as normal. The threshold and window size were the key factors affecting the accuracy of this classification (66.22% anomalous and 33.78% benign for the UNSW-NB15 dataset; 54.44% anomalous and 45.56% benign for the Diverse dataset). The framework also places a strong emphasis on scalability and interpretability, allowing for easy integration into current network infrastructures in a variety of organizational contexts.
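The window classification rule described above can be illustrated with a minimal Python sketch. It assumes each flow has already been flagged as malicious (1) or benign (0), for example from its cluster assignment, and it uses a fixed threshold; the abstract does not specify how the dynamic threshold is updated, so that part is not modeled. The function name `classify_windows` and the `step` parameter are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence


def classify_windows(
    flags: Sequence[int],
    window_size: int,
    threshold: int,
    step: int = 1,
) -> List[str]:
    """Label each sliding window as 'malicious' or 'normal'.

    flags       -- per-flow indicators, 1 = malicious, 0 = benign (assumed input)
    window_size -- number of consecutive flows in one window
    threshold   -- a window is malicious if its malicious-flow sum exceeds this
    step        -- how many flows the window advances each time (assumption)
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")

    labels: List[str] = []
    # Slide the window over the flow sequence; incomplete trailing windows are skipped.
    for start in range(0, len(flags) - window_size + 1, step):
        malicious_count = sum(flags[start:start + window_size])
        labels.append("malicious" if malicious_count > threshold else "normal")
    return labels


# Illustrative flags only, not data from the study.
flow_flags = [0, 1, 1, 0, 0, 0, 1, 1, 1, 0]
window_labels = classify_windows(flow_flags, window_size=4, threshold=2, step=2)
print(window_labels)  # ['normal', 'normal', 'normal', 'malicious']
```

Changing `window_size` or `threshold` changes which windows are flagged, which is consistent with the abstract's remark that these two parameters drive the anomalous/benign split.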
The results of this study indicated that, for both the training and test datasets, V-measure equaled normalized mutual information (0.7356 on the training dataset and 0.065 on the test dataset), and the two metrics tracked each other, reaching their lowest and highest values at the same number of clusters. The Rand index, adjusted Rand index and mutual information reached their maximum values at different numbers of clusters but decreased as the number of clusters increased beyond the optimal point.
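The reported equality of V-measure and normalized mutual information is expected when normalized mutual information is normalized by the arithmetic mean of the two label entropies, since both quantities then reduce to 2·MI / (H(truth) + H(pred)). The following pure-Python sketch illustrates this under that assumption; the abstract does not state which normalization the study used, and the function names and toy labels are hypothetical. scikit-learn's `v_measure_score` and `normalized_mutual_info_score` offer library equivalents.

```python
from collections import Counter
from math import log
from typing import Hashable, Sequence

Labels = Sequence[Hashable]


def entropy(labels: Labels) -> float:
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(truth: Labels, pred: Labels) -> float:
    """Mutual information between reference labels and cluster labels."""
    n = len(truth)
    joint = Counter(zip(truth, pred))
    t_counts, p_counts = Counter(truth), Counter(pred)
    return sum(
        (c / n) * log(c * n / (t_counts[t] * p_counts[p]))
        for (t, p), c in joint.items()
    )


def normalized_mutual_information(truth: Labels, pred: Labels) -> float:
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    denom = (entropy(truth) + entropy(pred)) / 2
    return mutual_information(truth, pred) / denom if denom > 0 else 1.0


def v_measure(truth: Labels, pred: Labels) -> float:
    """Harmonic mean of homogeneity and completeness."""
    mi = mutual_information(truth, pred)
    h_truth, h_pred = entropy(truth), entropy(pred)
    homogeneity = mi / h_truth if h_truth > 0 else 1.0
    completeness = mi / h_pred if h_pred > 0 else 1.0
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Toy labels for illustration only, not data from the study.
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
pred = [0, 0, 1, 1, 1, 1, 2, 2, 0]

v_score = v_measure(truth, pred)
nmi_score = normalized_mutual_information(truth, pred)
print(abs(v_score - nmi_score) < 1e-12)  # True: the two metrics coincide
```

Because both metrics depend only on the contingency between the two labelings, they are also unaffected by how cluster indices are numbered, which suits K-means output where cluster IDs are arbitrary.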
Keywords: Cybersecurity, detection, machine learning, network anomaly, network security and threat.