Malware Detection By Machine Learning Techniques

Research Aim, Objectives and Questions

Malware is software designed to infiltrate and damage a computer system, often in order to steal data from it, while the owner of the system remains unaware of the attack. A simple classification divides malware into file infectors and standalone malware (Varma, Raj & Raju, 2017). Another way of classifying malware distinguishes worms, backdoors, Trojans, spyware and adware. Malware detection methods therefore need to be advanced enough to detect these different types of malware. This research discusses various techniques for detecting malware. Detecting malware with standard methods is difficult (Kolosnjaji et al., 2016): malware applications use polymorphic layers to avoid detection and use mechanisms to update themselves automatically to newer versions within a short period. However, a few machine-learning methods discussed in this research might help in detecting malware in a system.

As the diversity of malware increases all over the world, anti-virus software is not able to provide full protection to computer users, whether companies or individuals. According to Kaspersky Lab, 6,563,145 different systems have been attacked and 400,000 unique malware programs have been detected (Anderson & McGrew, 2017). There is therefore a pressing need for techniques that reduce these numbers. At the same time, the expertise available for handling attacks, at both company and individual level, has been decreasing, which makes the use of machine learning increasingly important. The number of attacking tools is growing on a daily basis. Anti-malware techniques for detecting malware over the internet are widely available, and there remains a market opportunity for anti-malware products (Narudin et al., 2016).

This research focuses on identifying new techniques for detecting malware in a system. The importance of machine learning for detecting malware on the internet will be analyzed. The primary goal of the research is the detection of malware and the identification of the issues involved in implementing this technique.

The accuracy of anti-malware software has been decreasing, while the amount of malware has been increasing, as discussed earlier. As commented by Avasarala, Day & Steiner (2016), there has been an 8.7% increase in cyber-attacks over the internet. This has created a high level of risk for companies and for home computer users.

Literature Review

Recently, Kaspersky Lab has reported that companies are suffering data loss due to cyber-attacks. According to Gandotra, Bansal & Sofat (2014), cyber-attacks over the internet have increased, and the use of anti-malware software has increased with them. However, few anti-malware products are available, and they are often unable to detect new malware, which makes the problem difficult to solve (Kumar, Gao, Welch & Mansoori, 2016).

This research examines the use of machine learning for classifying malware and protecting against it. The various machine learning techniques involved are discussed in the study. The research will help in maintaining the security of data and information in computer systems.

The aim of the research is to classify malware with the help of machine learning methods.

The research objectives are as follows:

  • To identify different types of malware over the internet
  • To understand the need for machine learning in detecting malware
  • To implement proper strategies in order to mitigate issues related to malware

The research questions are as follows:

  • What are the different types of malware over the internet?
  • What is the need for machine learning in detecting malware?
  • How can these issues be mitigated using different strategies?

Malware found on the internet can be classified into various types.

Virus: A virus is the simplest form of malware, yet it can be among the most dangerous. It enters the user's system without permission and damages it (Allix et al., 2016).

Worm: A worm is similar to a virus. However, it can spread over a network and damage all the systems connected to that network (Kolosnjaji et al., 2016).

Trojan: This malware masquerades as legitimate software. It acts as a general spreading vector that relies on social engineering, so that users are misled into installing it in the belief that it is legitimate software.

Adware: This malware only displays fake advertisements on the computer screen. Adware is present on many websites across the internet (Friedrichs, Huger & O’Donnell, 2015).

Spyware: This malware acts as an agent that conveys details of one computer to another user. It breaches the computer in order to steal data and information; for example, it inspects the search history and sends personal details over the internet. The use of different algorithms might help in minimizing the threat and risk of malware attacks on a computer system (Milosevic, Dehghantanha & Choo, 2017).

Backdoor: A backdoor provides a secret entrance into the computer for other malware, which gets into the system with its help. It therefore does not attack independently, but serves to let other malware enter the computer.

Ransomware: This type of malware encrypts the data and information stored on the computer, locking the user out of it. A ransom is then demanded for decrypting the data (Dash et al., 2016).

Remote Administration Tools (RAT): This malware allows an attacker to gain access to the computer and change its settings. It can even change the passwords of accounts stored on the computer (Chen, Ye & Bourlai, 2017).

Detection Methods

As commented by Sethi et al. (2018), malware detection methods are either signature-based or behavior-based. Various detection methods are discussed below:

File Format Inspection: File metadata can provide useful information about a file. For example, Windows Portable Executable (PE) files carry information such as the compile time and the exported functions (Yerima, Sezer & Muttik, 2015).
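As an illustration of file format inspection, the following minimal sketch reads the compile timestamp from the header of a PE image using only the Python standard library. The function names `pe_compile_time` and `make_fake_pe` are hypothetical, and the demonstration image is a synthetic header-only stub rather than a real executable; a full analysis would normally rely on a dedicated PE parsing library.

```python
import struct
from datetime import datetime, timezone


def pe_compile_time(data: bytes) -> datetime:
    """Return the COFF TimeDateStamp of a PE image as a UTC datetime."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew: 4-byte little-endian offset of the PE header, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The COFF header follows the signature: Machine (2 bytes),
    # NumberOfSections (2 bytes), then TimeDateStamp (4 bytes).
    (timestamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def make_fake_pe(timestamp: int) -> bytes:
    """Build a synthetic, header-only image (illustration only)."""
    image = bytearray(0x80 + 24)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x80)        # e_lfanew -> 0x80
    image[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<HHI", image, 0x84, 0x14C, 1, timestamp)
    return bytes(image)


if __name__ == "__main__":
    print(pe_compile_time(make_fake_pe(1_500_000_000)))
```

Note that the timestamp is written by the linker and can be forged, so it is better treated as one feature among many than as reliable evidence.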

String Extraction: This method refers to the examination of the software output, for example status and error messages, in order to infer information about the operation of the malware.
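A simple way to approximate string extraction is to scan the raw bytes of a file for runs of printable characters, similar to what the Unix `strings` utility does. The sketch below assumes ASCII text only; the helper name `extract_strings`, the minimum length of 4 and the sample bytes are illustrative choices, not part of the cited work.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return all runs of printable ASCII characters of at least min_length."""
    # \x20-\x7e covers the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    # Hypothetical binary blob with two embedded messages.
    blob = (b"\x00\x01Error: cannot open file\x00\xff\x10ab\x00"
            b"http://example.test/payload\x00")
    for text in extract_strings(blob):
        print(text)
```

Strings recovered this way, such as error messages or URLs, can then be used as features or examined manually.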

Fingerprinting: This method computes a cryptographic hash of a file, which acts as an identifier that can be compared against the hashes of known samples. Different artifacts can also be detected through this method (Meidan et al., 2017).
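The hash-based part of fingerprinting can be sketched in a few lines. The example below uses SHA-256 from the Python standard library; the set `KNOWN_MALWARE_HASHES` and its single entry are hypothetical placeholders standing in for a real signature database.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the file contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical signature database: hashes of previously identified samples.
KNOWN_MALWARE_HASHES = {fingerprint(b"hypothetical malicious payload")}


def is_known_malware(data: bytes) -> bool:
    """Check whether the file hash matches a known sample exactly."""
    return fingerprint(data) in KNOWN_MALWARE_HASHES


if __name__ == "__main__":
    print(is_known_malware(b"hypothetical malicious payload"))
    print(is_known_malware(b"hypothetical malicious payload!"))
```

Because changing a single byte changes the whole digest, this kind of lookup only recognizes exact copies, which is one reason polymorphic malware is hard to detect with signatures alone.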

K-nearest neighbors (KNN) is one of the simplest machine learning algorithms. As a non-parametric algorithm, it does not make any assumption about the underlying data. The algorithm can be used for both regression and classification problems. The prediction is based on the K training instances closest to the sample being classified. In KNN classification, the output class is predicted by a majority vote among the K nearest neighbors (Bekerman et al., 2015).

As commented by Chumachenko (2017), Euclidean distance works well for problems in which the features are of the same type. The value of k plays an important role in the prediction accuracy of the algorithm. A small value of k tends to give lower accuracy, because each individual neighbor carries more weight in the decision. A larger value of k, however, lowers the performance of the algorithm.
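The majority vote described above can be sketched as follows. This is a minimal, unoptimized illustration, assuming numeric features on a comparable scale; the function names and the tiny two-feature training set are made up for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training records. `train` is a list of (features, label) pairs."""
    neighbors = sorted(train, key=lambda rec: euclidean(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two numeric features per file.
TRAIN = [
    ((1.0, 1.0), "clean"), ((1.5, 2.0), "clean"), ((2.0, 1.0), "clean"),
    ((6.0, 7.0), "malware"), ((7.0, 6.0), "malware"), ((6.5, 6.5), "malware"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, (6.0, 6.0), k=3))
    print(knn_predict(TRAIN, (1.0, 2.0), k=3))
```

Choosing an odd k for a two-class problem avoids tied votes; in practice k is usually selected by validation on held-out data.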

This research will help in detecting malware through the use of machine learning algorithms, and different algorithms will be used for this purpose. The perceptron algorithm will be used in this study for correctly detecting malware (Chumachenko, 2017).

  • F = (fa_1, fa_2, ..., fa_n) is an array representing the feature values associated with a file, where the fa_i are file features.
  • R_i = (F_i, label_i) is a record, where F_i is an array of file feature values as above, and label_i is a boolean tag. The value of label_i identifies the file characterised by the array of feature values F_i as being either a malware file or a clean file.
  • R = (R_1, R_2, ..., R_m) is the set of records associated with the training files.

One Side Perceptron (OSP) algorithms

NumberOfIterations ← 0
MaxIterations ← 100
repeat
    Train(R, 1, -1)
    while FP(R) > 0 do
        Train(R, 0, -1)
    end while
    NumberOfIterations ← NumberOfIterations + 1
until (TP(R) = NumberOfMalwareFiles) or (NumberOfIterations = MaxIterations)
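The pseudocode above can be turned into a small runnable sketch. The Train subroutine is not defined in this text, so the version below is an assumption: Train(R, a, b) is taken to be one batch perceptron pass in which misclassified malware records update the weights with rate a and misclassified clean records with rate b. The bias term, the class and function names, and the toy records are likewise illustrative assumptions, and features are assumed to be binary.

```python
from typing import List, Tuple

Record = Tuple[List[int], bool]  # (feature values F_i, label_i); True = malware


class OneSidePerceptron:
    def __init__(self, n_features: int) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0  # bias term (an assumption, not shown in the pseudocode)

    def predict(self, f: List[int]) -> bool:
        return sum(wi * xi for wi, xi in zip(self.w, f)) + self.b > 0

    def train(self, records: List[Record], lr_malware: float, lr_clean: float) -> None:
        """Assumed Train(R, a, b): one batch update from misclassified records."""
        dw = [0.0] * len(self.w)
        db = 0.0
        for f, is_malware in records:
            if self.predict(f) != is_malware:
                gamma = lr_malware if is_malware else lr_clean
                for j, x in enumerate(f):
                    dw[j] += gamma * x
                db += gamma
        self.w = [wi + di for wi, di in zip(self.w, dw)]
        self.b += db

    def false_positives(self, records: List[Record]) -> int:
        return sum(1 for f, m in records if not m and self.predict(f))

    def true_positives(self, records: List[Record]) -> int:
        return sum(1 for f, m in records if m and self.predict(f))


def fit_osp(records: List[Record], max_iterations: int = 100) -> OneSidePerceptron:
    model = OneSidePerceptron(len(records[0][0]))
    n_malware = sum(1 for _, m in records if m)
    iterations = 0
    while True:  # repeat ... until
        model.train(records, 1, -1)
        while model.false_positives(records) > 0:
            model.train(records, 0, -1)  # only clean files drive this update
        iterations += 1
        if model.true_positives(records) == n_malware or iterations == max_iterations:
            return model


# Hypothetical toy records with three binary features each.
TOY_RECORDS: List[Record] = [
    ([1, 1, 0], True), ([1, 0, 1], True),
    ([0, 0, 1], False), ([0, 1, 0], False), ([0, 0, 0], False),
]

if __name__ == "__main__":
    m = fit_osp(TOY_RECORDS)
    print(m.true_positives(TOY_RECORDS), m.false_positives(TOY_RECORDS))
```

The inner loop is what makes the perceptron "one-sided": training continues on clean files alone until no clean file is flagged, so the model trades some detection rate for zero false positives on the training set.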

Algorithms 1 and 2 will be used in what follows as building blocks in cascade classification stages.

NumberOfIterations ← 0
MaxIterations ← 100
repeat
    Train(R, 1, -1)
    R′ = R − {all malware samples}
    while FP(R′) > 0 do
        Train(R′, 0, -1)
        R′ = R′ − {all samples correctly classified}
    end while
    NumberOfIterations ← NumberOfIterations + 1
until (TP(R) = NumberOfMalwareFiles) or (NumberOfIterations = MaxIterations)

Algorithm 3 above is the first main optimization of the OSP algorithm. It reduces the size of the training set used in the inner loop and thereby increases the speed of training. Using this optimized version of the OSP algorithm therefore speeds up the building of the malware detector.
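The effect of the optimization can be illustrated with a short sketch in which the inner loop works on a shrinking list of clean samples. As the Train subroutine is not given in this text, it is again assumed to be a batch perceptron update with separate rates for malware and clean records; the bias term, all function names and the toy records are hypothetical, and features are assumed to be binary.

```python
from typing import List, Tuple

Record = Tuple[List[int], bool]  # (feature values, label); True = malware


def score(w: List[float], b: float, f: List[int]) -> float:
    return sum(wi * xi for wi, xi in zip(w, f)) + b


def train(w, b, records, lr_malware, lr_clean):
    """Assumed Train(R, a, b): one batch update from misclassified records."""
    dw, db = [0.0] * len(w), 0.0
    for f, is_malware in records:
        if (score(w, b, f) > 0) != is_malware:
            gamma = lr_malware if is_malware else lr_clean
            dw = [d + gamma * x for d, x in zip(dw, f)]
            db += gamma
    return [wi + di for wi, di in zip(w, dw)], b + db


def fit_osp_reduced(records: List[Record], max_iterations: int = 100):
    w, b = [0.0] * len(records[0][0]), 0.0
    n_malware = sum(1 for _, m in records if m)
    for _ in range(max_iterations):
        w, b = train(w, b, records, 1, -1)
        # R' = R - {all malware samples}
        remaining = [(f, m) for f, m in records if not m]
        while any(score(w, b, f) > 0 for f, _ in remaining):  # FP(R') > 0
            w, b = train(w, b, remaining, 0, -1)
            # R' = R' - {all samples correctly classified}
            remaining = [(f, m) for f, m in remaining if score(w, b, f) > 0]
        true_positives = sum(1 for f, m in records if m and score(w, b, f) > 0)
        if true_positives == n_malware:
            break
    return w, b


# Hypothetical toy records with three binary features each.
TOY_RECORDS: List[Record] = [
    ([1, 1, 0], True), ([1, 0, 1], True),
    ([0, 0, 1], False), ([0, 1, 0], False), ([0, 0, 0], False),
]

if __name__ == "__main__":
    print(fit_osp_reduced(TOY_RECORDS))
```

Each pass of the inner loop only touches clean samples that are still misclassified, so the cost of a pass falls as training proceeds instead of staying proportional to the full training set.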

The data analysis was carried out using the random forest method. Random forest is a bootstrapping algorithm based on the decision tree (CART) model. Each tree is built from a different sample of the data and a different subset of the variables. This process is repeated many times, and the samples collected in this way serve as the data for the analysis. Fine-tuning of the analysis was carried out on the collected data samples. 5-fold cross-validation was performed across 3 experiments applying machine learning to cyber security.
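The two sampling ideas mentioned here, bootstrapping for the individual trees and 5-fold cross-validation for evaluation, can be sketched without any machine learning library. The code below only produces index sets; it does not train trees. The function names are hypothetical, and the interleaved fold assignment is a simplification (real experiments would normally shuffle the data first).

```python
import random
from typing import List, Tuple


def bootstrap_sample(n_samples: int, rng: random.Random) -> List[int]:
    """Draw n_samples indices with replacement (training set for one tree)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features: int, k: int, rng: random.Random) -> List[int]:
    """Pick k distinct feature indices to consider for a tree or split."""
    return sorted(rng.sample(range(n_features), k))


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs whose test folds are disjoint."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print(bootstrap_sample(10, rng))
    print(random_feature_subset(8, 3, rng))
    for train_idx, test_idx in k_fold_indices(10, k=5):
        print(test_idx, len(train_idx))
```

With 5 folds, every sample is used for testing exactly once and for training four times, which gives a more stable accuracy estimate than a single train/test split.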

After the completion of the research, it can be presumed that detecting malware on the internet is possible. The machine learning techniques discussed in the study might help in detecting and removing malware from computer systems. Malware applications use polymorphic layers to avoid detection and use mechanisms to update themselves automatically to newer versions within a short period. However, the machine-learning methods discussed in this research might help in detecting such malware. Machine learning techniques can thus take on the role of antivirus software in detecting malware in computer systems.

References

Allix, K., Bissyandé, T. F., Jérome, Q., Klein, J., & Le Traon, Y. (2016). Empirical assessment of machine learning-based malware detectors for Android. Empirical Software Engineering, 21(1), 183-211.

Anderson, B., & McGrew, D. (2017, August). Machine learning for encrypted malware traffic classification: accounting for noisy labels and non-stationarity. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1723-1732). ACM.

Avasarala, B. R., Day, J. C., & Steiner, D. (2016). U.S. Patent No. 9,292,688. Washington, DC: U.S. Patent and Trademark Office.

Bekerman, D., Shapira, B., Rokach, L., & Bar, A. (2015, September). Unknown malware detection using network traffic classification. In Communications and Network Security (CNS), 2015 IEEE Conference on (pp. 134-142). IEEE.

Chen, L., Ye, Y., & Bourlai, T. (2017, September). Adversarial Machine Learning in Malware Detection: Arms Race between Evasion Attack and Defense. In Intelligence and Security Informatics Conference (EISIC), 2017 European (pp. 99-106). IEEE.

Chumachenko, K. (2017). Machine Learning Methods for Malware Detection and Classification.

Dash, S. K., Suarez-Tangil, G., Khan, S., Tam, K., Ahmadi, M., Kinder, J., & Cavallaro, L. (2016, May). Droidscribe: Classifying android malware based on runtime behavior. In Security and Privacy Workshops (SPW), 2016 IEEE (pp. 252-261). IEEE.

Friedrichs, O., Huger, A., & O’Donnell, A. J. (2015). U.S. Patent No. 9,088,601. Washington, DC: U.S. Patent and Trademark Office.

Gandotra, E., Bansal, D., & Sofat, S. (2014). Malware analysis and classification: A survey. Journal of Information Security, 5(02), 56.

Kolosnjaji, B., Zarras, A., Lengyel, T., Webster, G., & Eckert, C. (2016, July). Adaptive semantics-aware malware classification. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (pp. 419-439). Springer, Cham.

Kolosnjaji, B., Zarras, A., Webster, G., & Eckert, C. (2016, December). Deep learning for classification of malware system call sequences. In Australasian Joint Conference on Artificial Intelligence (pp. 137-149). Springer, Cham.

Kumar, S., Gao, X., Welch, I., & Mansoori, M. (2016, March). A machine learning based web spam filtering approach. In Advanced Information Networking and Applications (AINA), 2016 IEEE 30th International Conference on (pp. 973-980). IEEE.

Meidan, Y., Bohadana, M., Shabtai, A., Guarnizo, J. D., Ochoa, M., Tippenhauer, N. O., & Elovici, Y. (2017, April). ProfilIoT: a machine learning approach for IoT device identification based on network traffic analysis. In Proceedings of the Symposium on Applied Computing (pp. 506-509). ACM.

Milosevic, N., Dehghantanha, A., & Choo, K. K. R. (2017). Machine learning aided android malware classification. Computers & Electrical Engineering, 61, 266-274.

Narudin, F. A., Feizollah, A., Anuar, N. B., & Gani, A. (2016). Evaluation of machine learning classifiers for mobile malware detection. Soft Computing, 20(1), 343-357.

Sethi, K., Chaudhary, S. K., Tripathy, B. K., & Bera, P. (2018, January). A Novel Malware Analysis Framework for Malware Detection and Classification using Machine Learning Approach. In Proceedings of the 19th International Conference on Distributed Computing and Networking (p. 49). ACM.

Varma, P. R. K., Raj, K. P., & Raju, K. S. (2017, February). Android mobile security by detecting and classification of malware based on permissions using machine learning algorithms. In I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), 2017 International Conference on (pp. 294-299). IEEE.

Vatamanu, C., Cosovan, D., Gavrilut, D., & Luchian, H. (2015). A comparative study of malware detection techniques using machine learning methods. World Academy of Science, Engineering and Technology, International Journal of Computer and Information Engineering, 2(5).

Yerima, S. Y., Sezer, S., & Muttik, I. (2015). High accuracy android malware detection using ensemble learning. IET Information Security, 9(6), 313-320.