net-int-det

Dataset short description: The dataset contains 1885519 labeled connections in total, of which 1816609 are normal connections and 68910 (~3.65%) are attack connections. Attack types with number of instances: Bruteforce (12 June: 2086 + 16 June: 14), Infiltrating the network from inside (13 June: 20358), HTTP DoS (14 June: 3777), DDoS using IRC Botnet (15 June: 37461), Bruteforce SSH (17 June: 5261).

1 -- xmlparser.sh: This script extracts the connection-wise attributes from the labeled network flows of the ISCX datasets. Run the following in a terminal:

bash xmlparser.sh .xml

The output is saved in a CSV file named "input-file-name".csv.

xmlstarlet is used to parse the XML files and extract the attribute values. For instance, to extract the labels, we may use:

xmlstarlet sel -t -v "//Tag"
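For readers without xmlstarlet, the same kind of extraction can be sketched in Python with the standard library. This is only an illustration, not part of the repository: the sample document and the `extract_tag_values` helper are hypothetical, and every element name other than `Tag` is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a labeled-flows file. Element names other than
# "Tag" are assumptions made purely for illustration.
SAMPLE_XML = """<dataroot>
  <flow><appName>HTTPWeb</appName><Tag>Normal</Tag></flow>
  <flow><appName>SSH</appName><Tag>Attack</Tag></flow>
</dataroot>"""


def extract_tag_values(xml_text, tag="Tag"):
    """Return the text of every element named `tag` at any depth (like //Tag)."""
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.iter(tag)]


labels = extract_tag_values(SAMPLE_XML)
print(labels)
```

For large XML files, `ET.iterparse` may be preferable because it avoids loading the whole document into memory.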

2 -- seq_features.py: This Python script takes the payloads (source and destination) of each connection and the respective labels as file1 and file2. It then generates n-grams of the selected size, builds a dictionary of them, and finally maps the "text" payloads to a vector space.

How to run: python seq_features.py file1 file2 k
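The n-gram pipeline described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code in seq_features.py: it assumes character-level n-grams and simple count vectors, and the helper names (`char_ngrams`, `build_vocabulary`, `vectorize`) and the sample payloads are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_vocabulary(payloads, n):
    """Map each distinct n-gram to a column index, in sorted order."""
    grams = sorted({g for p in payloads for g in char_ngrams(p, n)})
    return {g: idx for idx, g in enumerate(grams)}


def vectorize(payload, vocab, n):
    """Count vector of `payload` over `vocab`; unseen n-grams are ignored."""
    counts = Counter(char_ngrams(payload, n))
    vec = [0] * len(vocab)
    for gram, count in counts.items():
        if gram in vocab:
            vec[vocab[gram]] = count
    return vec


payloads = ["GET /a", "GET /b"]  # made-up payload strings
vocab = build_vocabulary(payloads, 3)
vectors = [vectorize(p, vocab, 3) for p in payloads]
print(vocab)
print(vectors)
```

In practice the vocabulary should be built on the training payloads only, so that the test payloads are mapped into the same vector space without leaking information.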

Datasets description:

1. The train-test-with-seeds.tar.gz archive contains the training set (90%), the test set (10%), and the seed (a binary array) used to generate them. All experiments need to be done on the training set, which should be treated as the only available data for the time being.
2. The training set from (1) needs to be further split into development and test sets with an 80-20 ratio. Since each experiment is to be repeated five times and the scores averaged, use the seeds provided in "developement-sets-seeds.gz" for the split, one column per iteration.
3. For each train-test split in the development set, training is to be done using 5-fold cross-validation. The "cross-val-seeds.tar.gz" archive contains five binary arrays, one for each development set in (2). Each binary array contains five columns, where each column corresponds to a different 80-20 split for cross-validation.
4. Overall, there are five experiments with training and testing, and each training is done using 5-fold cross-validation.
5. The labels are of six types; for binary classification, all labels other than "Normal" can be merged together.
6. For the split in each scenario, all samples whose corresponding seed value is 0 go into training, and those with seed value 1 are used for testing. For instance: X_train=data[seed==0,:-1], X_test=data[seed==1,:-1], Y_train=data[seed==0,-1], Y_test=data[seed==1,-1].
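The seed-based splitting and the label merging can be sketched with NumPy as follows. This is a hedged illustration rather than the repository's code: the `split_by_seed` and `to_binary` helpers, the toy data, and the toy seed array are all made up, and the sketch assumes the label sits in the last column of `data` as in the indexing example above.

```python
import numpy as np


def split_by_seed(data, seed):
    """Rows with seed 0 go to training, rows with seed 1 go to testing.

    The last column of `data` is taken to be the label.
    """
    seed = np.asarray(seed)
    train, test = data[seed == 0], data[seed == 1]
    return train[:, :-1], test[:, :-1], train[:, -1], test[:, -1]


def to_binary(labels, normal="Normal"):
    """Merge every label other than `normal` into a single attack class (1)."""
    return (np.asarray(labels) != normal).astype(int)


# Toy data: 5 samples, 2 features, numeric label in the last column (made up).
data = np.array([
    [0.1, 0.2, 0],
    [0.3, 0.4, 1],
    [0.5, 0.6, 0],
    [0.7, 0.8, 2],
    [0.9, 1.0, 0],
])
# Toy seed array with two columns, one column per split (made up).
seeds = np.array([[0, 1], [0, 0], [1, 0], [0, 0], [0, 1]])

# Use one seed column per iteration, as described for the provided seed files.
for col in range(seeds.shape[1]):
    X_train, X_test, Y_train, Y_test = split_by_seed(data, seeds[:, col])
    print(col, X_train.shape, X_test.shape)

binary = to_binary(["Normal", "HTTP DoS", "Normal", "Bruteforce SSH"])
print(binary)
```

The same `split_by_seed` call applies at every level (90-10, 80-20 development split, and each cross-validation column), since all of them use the 0 = train, 1 = test convention.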

Experiments with binary and multi-class classification: gbc_multi.py, rfc_binary.py, rfc_multi.py, svm_binary.py, svm_multi.py: These Python scripts are used to train the classifiers and classify the network payloads. The detailed results are given in the "Results -- raw" text file. In all the experiments, the models fit almost perfectly.

