Submitted: 14 September 2023
Posted: 15 September 2023
Abstract
Keywords:
1. Introduction
2. Related research
2.1. Static analysis of website security risks
2.2. Dynamic analysis of website security risks
2.3. Analysis of website security risk based on similarity hash
2.4. Machine Learning (ML)-based website security threat analysis
3. Proposed method for website risk information collection
3.1. Technique for website risk information collection
3.2. Technical analysis of website risk information collection
3.3. How to collect website risk disclosure information
3.3.1. Collecting website DNS information
3.3.2. Collecting website IP information
3.3.3. Collection of website history information
4. Experiment
4.1. Security risk measurement results for the website domain
4.2. Security risk measurement results for the website IP
4.3. Security risk measurement results for the website reputation inquiry
4.4. Expected effect
5. Conclusion











| Comparison item | Static analysis | Dynamic analysis | Similarity hash analysis | ML-based analysis |
|---|---|---|---|---|
| Inspection method | Check website sources against registered malicious code patterns | Access websites from virtualized PCs to analyze risky behavior | Check hash values and files collected from the website for similarity to malicious code datasets | Train an ML model on malicious code and benign files to predict the risk of files collected from the website |
| Inspection technology | Web crawling | Virtualized PC | Similarity hash against a malicious code dataset | ML trained on malicious and regular code |
| Inspection speed | High | Low | High | High |
| Zero-day detection | Not possible | Possible | Possible | Possible |
| Accuracy | Good | Good | Medium | Good |

| No. | Collection item | Information collected | Collection method |
|---|---|---|---|
| 1 | Domain Country | Registrant Country | Whois search |
| 2 | Domain Registrar | Registrant Name, Registrant Organization, Registrant | Whois search |
| 3 | Domain registration date | Creation Date, Registered Date, Registered on | Whois search |
| 4 | Domain Expiry Date | Expiry Date, Expiration Date | Whois search |
| 5 | Domain Registration Agent | Authorized Agency, Registrar, Organisation | Whois search |
| 6 | Domain Name Server | Name Server, Host Name | Whois search |
| 7 | IP Country | country | Whois search |
| 8 | IP allocation agency | organization, netname | Whois search |
| 9 | Country Comparison | Country comparison of domain and IP | |
| 10 | Global Traffic Rank | Global traffic rank search | Amazon Alexa search |
| 11 | Number of occurrences | Malicious code occurrence count | Google VirusTotal search |
| 12 | Time of occurrence | Malicious code occurrence date | Google VirusTotal search |
| 13 | HTTP response code | Website access status check | HTTP response code search |
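
The keyword variants listed for collection items 1 to 6 suggest a simple lookup over the text of a Whois record. The following is a minimal sketch of that idea, not the authors' implementation: the function name `extract_whois_fields`, the dictionary keys, the "key: value" line layout, and the sample record are all assumptions made for illustration.

```python
# Keyword variants per collection item, copied from the table above.
# The dictionary keys are hypothetical labels chosen for this sketch.
WHOIS_KEYWORDS = {
    "domain_country": ["Registrant Country"],
    "registrant": ["Registrant Name", "Registrant Organization", "Registrant"],
    "registration_date": ["Creation Date", "Registered Date", "Registered on"],
    "expiry_date": ["Expiry Date", "Expiration Date"],
    "registration_agent": ["Authorized Agency", "Registrar", "Organisation"],
    "name_server": ["Name Server", "Host Name"],
}


def extract_whois_fields(whois_text):
    """Return one value per collection item, or "unknown" if no keyword matches."""
    # Index the record by lower-cased key, keeping the first non-empty value.
    # Assumes each field sits on its own line as "Key: value".
    record = {}
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if sep and value and key not in record:
            record[key] = value

    fields = {}
    for item, keywords in WHOIS_KEYWORDS.items():
        # Try the keyword variants in table order; fall back to "unknown".
        fields[item] = next(
            (record[k.lower()] for k in keywords if k.lower() in record),
            "unknown",
        )
    return fields


# Hypothetical Whois excerpt, used only to exercise the function.
sample = """Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Creation Date: 1995-08-14T04:00:00Z
Registrant Country: US
Name Server: A.IANA-SERVERS.NET
"""
print(extract_whois_fields(sample))
```

Items with no matching keyword come back as "unknown", which mirrors how the measurement tables report an unknown rate for each item.
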

| No. | Collection item | Extraction keyword (regular website) | Extraction keyword (malicious website) |
|---|---|---|---|
| 7 | IP Country | Unknown: 12%. Top 5: KR (387); US (341); AU (42); CN (31); DE (9) | Unknown: 48%. Top 5: US (230); AU (53); RU (25); NL (21); GB (21) |
| 8 | IP allocation agency | Unknown: 12%. Top 5: IRT-KRNIC-KR (389); IRT-APNIC-AP (100); Google LLC (82); Cloudflare, Inc (62); RIPE Network Centre (48) | Unknown: 48%. Top 5: IRT-APNIC-AP (153); RIPE Network Centre (123); Cloudflare, Inc (53); Internet Assigned Authority (13); Google LLC (12) |
| 9 | Country Comparison | Unknown: 4%. FALSE: 67%. TRUE: 29% | Unknown: 45%. FALSE: 53%. TRUE: 2% |
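
Collection item 9 compares the country of the domain with the country of the IP address. The text available here does not define how TRUE, FALSE, and unknown are assigned, so the sketch below is only one possible reading: it assumes TRUE means the two country codes match and that a missing value on either side yields unknown. The function name `compare_countries` is hypothetical.

```python
def compare_countries(domain_country, ip_country):
    """Return "TRUE", "FALSE", or "unknown" for the country comparison.

    Assumption: TRUE means the two codes are equal, ignoring case and
    surrounding whitespace; a missing or "unknown" value gives "unknown".
    """
    def normalize(code):
        return code.strip().upper() if code else ""

    domain, ip = normalize(domain_country), normalize(ip_country)
    if not domain or not ip or "UNKNOWN" in (domain, ip):
        return "unknown"
    return "TRUE" if domain == ip else "FALSE"


print(compare_countries("kr", "KR"))
print(compare_countries("US", "NL"))
print(compare_countries(None, "US"))
```
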

| No. | Collection item | Extraction keyword (regular website) | Extraction keyword (malicious website) |
|---|---|---|---|
| 10 | Global Traffic Rank | Unknown: 13%. Top 5: 1 (18); 13 (8); 233 (5); 11 (4); 572 (3) | Unknown: 89%. Top 5: 369628 (9); 2645 (4); 6942 (3); 2149797 (3); 6722 (3) |
| 11 | Malicious code occurrence count | Unknown: 96%. Top 5: 1 (23); 2 (4); 460 (1); 29 (1); 7 (1) | Unknown: 1%. Top 5: 1 (825); 2 (76); 3 (38); 4 (17); 5 (9) |
| 12 | Malicious code occurrence date | Unknown: 96%. Top 5: 2021.03.28 (1); 2021.03.27 (1); 2020.01.13 (1); 2018.09.20 (1); 2018.07.27 (1) | Unknown: 1%. Top 5: 2021.01.13 (6); 2021.01.11 (6); 2021.01.18 (5); 2021.01.11 (5); 2021.01.14 (5) |
| 13 | HTTP response code | Unknown: 11%. Top 5: 301 (488); 302 (198); 200 (126); 400 (19); 403 (17) | Unknown: 59%. Top 5: 200 (165); 301 (134); 302 (48); 403 (35); 404 (12) |
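
Each cell in the measurement tables reports an unknown rate followed by the five most frequent values. A summary of that shape can be produced with a frequency count, as in the sketch below. This is an illustration under stated assumptions rather than the authors' tooling: the function name `summarize`, the set of markers treated as unknown, and the sample values are hypothetical.

```python
from collections import Counter


def summarize(values, top_n=5):
    """Return (unknown_percent, top_values) for one collection item.

    Assumption: None, an empty string, and the literal "unknown" all
    count as unknown; everything else is tallied as a known value.
    """
    total = len(values)
    if total == 0:
        return 0, []
    known = [v for v in values if v not in (None, "", "unknown")]
    unknown_percent = round(100 * (total - len(known)) / total)
    # most_common returns (value, count) pairs, highest count first.
    return unknown_percent, Counter(known).most_common(top_n)


# Hypothetical HTTP response codes collected for eight websites.
codes = ["301", "301", "200", None, "302", "301", "unknown", "200"]
print(summarize(codes))
```
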

Website security risk utilization of public information

| Search contents | Collection item | Unknown (regular) | Unknown (malicious) | Significant | Unknown rate | Risk |
|---|---|---|---|---|---|---|
| Domain information | Domain Country | 64% | 75% | × | - | - |
| Domain information | Domain Registrar | 62% | 77% | × | - | - |
| Domain information | Domain registration date | 12% | 66% | ○ | ↑ | ↑ |
| Domain information | Domain Expiry Date | 11% | 66% | ○ | ↑ | ↑ |
| Domain information | Domain Registration Agent | 10% | 66% | ○ | ↑ | ↑ |
| Domain information | Domain Name Server | 10% | 66% | ○ | ↑ | ↑ |
| IP information | IP Country | 12% | 48% | ○ | ↑ | ↑ |
| IP information | IP allocation agency | 12% | 48% | ○ | ↑ | ↑ |
| IP information | Country Comparison of DNS and IP | 4% | 45% | ○ | ↑ | ↑ |
| Reputation information | Global Traffic Rank | 13% | 89% | ○ | ↑ | ↑ |
| Reputation information | Malicious code occurrence count | 96% | 1% | ○ | ↑ | ↓ |
| Reputation information | Malicious code occurrence date | 96% | 1% | ○ | ↑ | ↓ |
| Reputation information | HTTP response code | 11% | 59% | ○ | ↑ | ↑ |
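
The table marks, for each significant item, whether a higher unknown rate goes with higher or lower risk. One way to read those arrows for a single website is to count the items whose value points toward risk. The sketch below does only that. It is not the authors' scoring method, and no weighting or threshold is taken from the source. The item keys and the function name `count_risk_signals` are hypothetical.

```python
# Significant items for which an unknown value points toward higher risk
# (rows marked with both arrows up). Keys are hypothetical labels.
UNKNOWN_RAISES_RISK = [
    "registration_date", "expiry_date", "registration_agent", "name_server",
    "ip_country", "ip_allocation_agency", "country_comparison",
    "global_traffic_rank", "http_response_code",
]
# Items for which an unknown value points toward lower risk, i.e. a known
# value means malicious code has been recorded for the site.
UNKNOWN_LOWERS_RISK = ["malicious_code_count", "malicious_code_date"]


def count_risk_signals(fields):
    """Count significant items whose value points toward higher risk."""
    def is_unknown(key):
        return fields.get(key) in (None, "", "unknown")

    signals = sum(is_unknown(key) for key in UNKNOWN_RAISES_RISK)
    signals += sum(not is_unknown(key) for key in UNKNOWN_LOWERS_RISK)
    return signals


# Hypothetical record: only the IP country and one detection are known.
print(count_risk_signals({"ip_country": "US", "malicious_code_count": "3"}))
```

Domain Country and Domain Registrar are left out because the table marks them as not significant.
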

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).