This README describes the Malware Family Classifier, developed as part of a research project at the University of Oxford.
Submission date: 15/06/2018
Created by: Munir Geden

*************************************
Path definitions
*************************************
1. A "training data path" folder must be specified in config.properties; place the Cuckoo report files of the training samples there.
2. A "test data path" folder must be specified in config.properties; place the Cuckoo report files of the test samples there.
3. A "reports" folder must be specified in config.properties; the distinctive features and the other classification output are written there.

*****************************************************************************
How to configure the application?
*****************************************************************************
1. Specify the training data path (the folder containing the report files), either by editing config.properties or on the command line:
	-tp TRAINING_DATA_FOLDER_PATH
2. Specify the validation (test) data path in the same way, either by editing config.properties or on the command line:
	-vp TEST_DATA_FOLDER_PATH
3. Specify the "reports" directory path by editing config.properties.

You can adjust the remaining parameters in the configuration file (config.properties) to suit your needs.
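
Before running the application, it can help to confirm that the configured folders exist and are not empty. The following is a minimal shell sketch and not part of the project: the folder names data/train and data/test are hypothetical placeholders, and only the -tp and -vp options are taken from this README.

```shell
#!/bin/sh
# Hypothetical folder names; replace them with the paths used in config.properties.
TRAIN_DIR="${TRAIN_DIR:-data/train}"
TEST_DIR="${TEST_DIR:-data/test}"

# Print "ok" if the folder exists and holds at least one entry,
# otherwise print "missing-or-empty".
check_dir() {
    if [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
        echo "ok"
    else
        echo "missing-or-empty"
    fi
}

echo "training data: $TRAIN_DIR ($(check_dir "$TRAIN_DIR"))"
echo "test data:     $TEST_DIR ($(check_dir "$TEST_DIR"))"

# The same paths expressed as command-line options.
echo "path options: -tp $TRAIN_DIR -vp $TEST_DIR"
```

The script only reads the folders and prints a status, so it is safe to run at any time.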

******************************************************************************
ATTENTION: JVM MEMORY SIZE!
******************************************************************************
Some feature models consume a lot of memory during analysis, so remember to increase the maximum heap size of the JVM:
	(ex:$java -Xms1G -Xmx6G -jar familyclassifier.jar ….)
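
A heap limit larger than the physical memory of the machine tends to cause heavy swapping. The sketch below compares the requested -Xmx value with the installed memory. It is an illustration only: the 6G value is copied from the example above, the to_mib helper is hypothetical, and the memory lookup assumes a Linux system that exposes /proc/meminfo.

```shell
#!/bin/sh
# Convert a JVM size such as 6G or 512M to mebibytes (hypothetical helper).
to_mib() {
    case "$1" in
        *[gG]) echo $(( ${1%[gG]} * 1024 )) ;;
        *[mM]) echo "${1%[mM]}" ;;
        *) echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

XMX="6G"                      # value taken from the README example
want_mib=$(to_mib "$XMX")

# Linux only: MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
    have_mib=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
    if [ "$have_mib" -lt "$want_mib" ]; then
        echo "warning: -Xmx$XMX ($want_mib MiB) exceeds physical memory ($have_mib MiB)"
    else
        echo "ok: -Xmx$XMX ($want_mib MiB) fits in physical memory ($have_mib MiB)"
    fi
else
    echo "cannot read /proc/meminfo; skipping the check"
fi
```

If the warning appears, a smaller -Xmx value or a machine with more memory is probably the safer choice.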


******************************************************************************
How to extract distinctive features from training data?
******************************************************************************
1. To construct the distinctive features, pass:
	-xf
	(ex:$java -Xms1G -Xmx6G -jar familyclassifier.jar -xf )


******************************************************************************
How to write feature matrices to Weka files for training and test samples using the distinctive feature sets?
******************************************************************************
1. To generate Weka ARFF files for the training and test samples, pass:
	-gwf -df DISTINCTIVE_FEATURES_FILE_PATTERN
	(ex:$java -Xms1G -Xmx6G -jar familyclassifier.jar -gwf -df reports/json_distinctive_apicallwithargs_by_ )


******************************************************************************
How to classify with Weka classifiers?
******************************************************************************
1. To classify with Weka using a given algorithm and the given Weka training and test *.arff files, pass the following on the command line
   (knn: k-nearest neighbours (k value configurable in config.properties),
    svm: support vector machines,
    rf: random forest,
    nn: neural networks):
	-cwa ALGORITHM -wt TRAINING_ARFF_FILE -wv TEST_ARFF_FILE
	(ex:$java -Xms1G -Xmx6G -jar familyclassifier.jar -cwa all -wt reports/train_class_ngram_4_1000.arff -wv reports/test_class_ngram_4_1000.arff)
	This writes the classification results to the reports folder.
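
The three steps described in this README can be chained in one script. The sketch below assumes they are meant to run in the order in which the sections appear; the run_step helper, the DRY_RUN switch and the default ARFF file names are hypothetical, while the options and the distinctive-features pattern are copied from the examples above. By default it only prints the commands.

```shell
#!/bin/sh
JAVA_CMD="java -Xms1G -Xmx6G -jar familyclassifier.jar"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, anything else = run them

# Pattern from the README example; the ARFF names are placeholders to fill in.
DF_PATTERN="${DF_PATTERN:-reports/json_distinctive_apicallwithargs_by_}"
TRAIN_ARFF="${TRAIN_ARFF:-reports/train.arff}"
TEST_ARFF="${TEST_ARFF:-reports/test.arff}"

# $1 is a label; the remaining arguments are passed to the classifier.
run_step() {
    label="$1"; shift
    echo "[$label] $JAVA_CMD $*"
    if [ "$DRY_RUN" != "1" ]; then
        # JAVA_CMD is left unquoted on purpose so it splits into words.
        $JAVA_CMD "$@" || { echo "step '$label' failed" >&2; exit 1; }
    fi
}

run_step extract  -xf
run_step arff     -gwf -df "$DF_PATTERN"
run_step classify -cwa all -wt "$TRAIN_ARFF" -wv "$TEST_ARFF"
```

Running the script with DRY_RUN=0 executes the steps and stops at the first one that fails.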

***************************************
Access to source code
***************************************
Link to project's repository:
https://bitbucket.org/msgeden/familyclassifier

To clone the repository:
git clone https://bitbucket.org/msgeden/familyclassifier.git
