Submitted:
30 July 2024
Posted:
30 July 2024
Abstract
Keywords:
1. Introduction
1.1. Searching Appropriate Information
1.2. Knowledge Discovery from WWW
1.3. Data Personalization
1.4. Analysis of Individual User Preferences
2. Related Work
3. Approaches of Web Content Mining
3.1. Unstructured Data Mining
3.1.1. Information Extraction

3.1.2. Topic Tracking
3.1.3. Summarization
- The extractive approach selects a subset of the words, phrases, and sentences in the original text to form the summary.
- The abstractive approach builds an internal semantic representation and uses NLP-based techniques to generate the summary, which may contain words that do not appear in the original document.
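
The extractive approach described above can be illustrated with a minimal sketch. It assumes a simple word-frequency scoring rule, which is only one of many possible selection criteria and is not necessarily the method the reviewed works use. The stopword list, the function names, and the parameter `k` are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "that", "on", "for", "from", "are", "was"}


def split_sentences(text):
    """Split text after sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, kept in their original order.

    A sentence's score is the average document-wide frequency of its words,
    so sentences built from frequently repeated terms rank higher.
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    sample = ("Web content mining extracts useful information from web pages. "
              "Web pages contain text, images, and links. "
              "The weather was pleasant yesterday. "
              "Mining web content helps users find information quickly.")
    for sentence in extractive_summary(sample, k=2):
        print(sentence)
```

Because every output sentence is copied verbatim from the input, the result is extractive by construction. An abstractive summarizer would instead generate new wording, which typically requires a trained language model and is outside the scope of this sketch.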

3.1.4. Categorization
3.1.5. Clustering
3.1.6. Information Visualization
3.2. Structured Data Mining
3.2.1. Web Crawler
3.2.2. Web Page Content Mining
3.2.3. Wrapper Generation
3.3. Semi-Structured Data Mining
3.3.1. Object Exchange Model (OEM)
3.3.2. Web Data Extraction
3.3.3. Top Down Extraction
4. Comparison of Web Content Mining Tools
4.1. Web Info Extractor
4.2. Mozenda
4.3. Screen Scraper
5. Results

6. Conclusion

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).