Arguments Against Using the 1998 DARPA Dataset for Cloud IDS Design and Evaluation and Some Alternative - Machine Learning for Networking
Conference Papers, Year: 2020

Abstract

Due to the lack of adequate public datasets, the proponents of many existing cloud intrusion detection systems (IDS) have relied on the DARPA dataset to design and evaluate their models. In this paper, we show empirically that the DARPA dataset fails to match important statistical characteristics of real-world cloud data center traffic and is therefore inadequate for evaluating cloud IDS. As an alternative, we present a new public dataset, collected in cooperation between our lab and a non-profit cloud service provider, which contains benign data and a wide variety of attack data. We also present a new hypervisor-based cloud IDS that uses an instance-oriented feature model and supervised machine learning techniques. We investigate three classifiers: Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM). Experimental evaluation on a diversified dataset yields a detection rate of 92.08% and a false positive rate of 1.49% for Random Forest, the best performing of the three classifiers.
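The abstract reports a detection rate and a false positive rate without defining them. The sketch below assumes the usual definitions, detection rate = TP / (TP + FN) and false positive rate = FP / (FP + TN), with 1 marking an attack and 0 marking benign traffic. The function name `detection_metrics` and the toy labels are hypothetical and are not taken from the paper or its dataset.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_positive_rate) for binary labels.

    Assumed convention (not stated in the abstract): 1 = attack, 0 = benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed

    # Guard against empty classes to avoid division by zero.
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


# Toy labels for illustration only: 4 attacks, 6 benign samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

dr, fpr = detection_metrics(y_true, y_pred)
print(f"detection rate = {dr:.2%}, false positive rate = {fpr:.2%}")
```

Under these assumed definitions, the same function could be applied to the predictions of each of the three classifiers to compare them on a held-out test set.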
Main file
487577_1_En_21_Chapter.pdf (1.02 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03266464, version 1 (21-06-2021)

Cite

Paulo Faria Quinan, Issa Traore, Isaac Woungang, Abdulaziz Aldribi, Onyekachi Nwamuo. Arguments Against Using the 1998 DARPA Dataset for Cloud IDS Design and Evaluation and Some Alternative. 2nd International Conference on Machine Learning for Networking (MLN), Dec 2019, Paris, France. pp.315-332, ⟨10.1007/978-3-030-45778-5_21⟩. ⟨hal-03266464⟩