Submitted:
29 July 2024
Posted:
29 July 2024
Abstract
Keywords:
1. Introduction
1.1. Searching Appropriate Information
1.2. Knowledge Discovery from WWW
1.3. Data Personalization
1.4. Analysis of Individual User Preferences
2. Related Work
3. Approaches of Web Content Mining
3.1. Unstructured Data Mining
3.1.1. Information Extraction

3.1.2. Topic Tracking

3.1.3. Summarization
- The extractive approach selects a subset of the words, phrases, and sentences in the source text and assembles the summary from them.
- The abstractive approach builds an internal semantic representation of the text and uses NLP techniques to generate the summary, which may contain words that do not appear in the source document.
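The extractive approach can be illustrated with a minimal sketch. It assumes a simple word-frequency heuristic for scoring sentences, which is one common choice and not necessarily the method used by the tools surveyed here. The function names, the stopword list, and the parameter `k` are hypothetical and introduced only for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
    "it", "that", "this", "for", "on", "with", "as", "by", "from",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, in their original order.

    A sentence's score is the mean document-wide frequency of its words,
    so sentences built from frequently used terms rank higher.
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    # Restore document order so the summary reads naturally.
    return [sentences[i] for i in sorted(ranked[:k])]
```

Every sentence returned by `extractive_summary` is copied verbatim from the input, which is the defining property of an extractive summary. An abstractive summarizer would instead generate new sentences, typically with a trained language model, and is not sketched here.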

3.1.4. Categorization
3.1.5. Clustering
3.1.6. Information Visualization
3.2. Structured Data Mining
3.2.1. Web Crawler
3.2.2. Web Page Content Mining
3.2.3. Wrapper Generation
3.3. Semi-Structured Data Mining
3.3.1. Object Exchange Model (OEM)
3.3.2. Web Data Extraction
3.3.3. Top-Down Extraction
4. Comparison of Web Content Mining Tools
4.1. Web Info Extractor
4.2. Mozenda
4.3. Screen Scraper
5. Results

6. Conclusions
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).