AndroMalPack: enhancing the ML-based malware classification by detection and removal of repacked apps for Android systems.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
14 11 2022
History:
received: 15 06 2022
accepted: 04 11 2022
entrez: 14 11 2022
pubmed: 15 11 2022
medline: 18 11 2022
Status:
epublish
Abstract
Because Android smartphones are so widely used, Android malware has become a grave security concern. The research community relies on publicly available datasets to keep pace with evolving malware. However, many apps in those datasets are mere clones of previously identified malware: instead of creating novel versions, malware authors generally repack existing malicious applications to create clones with minimal effort and expense. This paper investigates three benchmark Android malware datasets to quantify repacked malware using package-name-based similarity. We analyse 5560 apps from the Drebin dataset, 24,533 apps from the AMD dataset and 695,470 apps from the AndroZoo dataset. Our analysis reveals that 52.3% of the apps in Drebin, 29.8% of the apps in AMD and 42.3% of the apps in AndroZoo are repacked malware. Furthermore, we present AndroMalPack, an Android malware detector trained on clone-free datasets and optimized using nature-inspired algorithms. Although trained on reduced versions of the datasets, AndroMalPack classifies novel and repacked malware with a detection accuracy of up to 98.2% and low false-positive rates. Finally, we publish a dataset of the cloned apps in Drebin, AMD and AndroZoo to foster research on repacked malware analysis.
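The abstract does not spell out how package-name similarity is computed, so the following is only a minimal sketch of the general idea, assuming the simplest possible rule: apps that share an identical package name are treated as one group, and every app after the first in a group is counted as a likely clone. The function names, the (app_id, package_name) input format and the sample data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def find_repacked(apps):
    """Return the IDs of apps whose package name was already seen.

    `apps` is an iterable of (app_id, package_name) pairs. This input
    format is an assumption made for illustration only.
    """
    groups = defaultdict(list)
    for app_id, package_name in apps:
        # Strip stray whitespace; package names are otherwise compared exactly.
        groups[package_name.strip()].append(app_id)

    clones = []
    for ids in groups.values():
        # Keep the first app of each group as the representative sample.
        clones.extend(ids[1:])
    return clones


def repacked_ratio(apps):
    """Fraction of apps flagged as clones (0.0 for an empty collection)."""
    apps = list(apps)
    if not apps:
        return 0.0
    return len(find_repacked(apps)) / len(apps)


# Hypothetical sample: three apps share one package name, one is unique.
sample = [
    ("a1", "com.example.game"),
    ("a2", "com.example.game"),
    ("a3", "com.example.game "),
    ("a4", "org.demo.tool"),
]

print(find_repacked(sample))
print(f"{repacked_ratio(sample):.1%}")
```

Removing the flagged IDs from a dataset would give a clone-free training set in the sense used above. A real pipeline would likely need a softer similarity measure than exact matching, which this sketch does not attempt.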
Identifiers
pubmed: 36376412
doi: 10.1038/s41598-022-23766-w
pii: 10.1038/s41598-022-23766-w
pmc: PMC9663591
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
19534
Copyright information
© 2022. The Author(s).