Preprint Article Version 1 This version is not peer-reviewed

Addressing Complexities of Machine Learning in Big Data: Principles, Trends and Challenges from Systematical Perspectives

Version 1 : Received: 11 October 2017 / Approved: 12 October 2017 / Online: 12 October 2017 (04:55:57 CEST)
Version 2 : Received: 16 October 2017 / Approved: 17 October 2017 / Online: 17 October 2017 (03:47:41 CEST)

How to cite: Wang, Q.; Zhao, X.; Huang, J.; Feng, Y.; Liu, Z.; Su, J.; Luo, Z.; Cheng, G. Addressing Complexities of Machine Learning in Big Data: Principles, Trends and Challenges from Systematical Perspectives. Preprints 2017, 2017100076 (doi: 10.20944/preprints201710.0076.v1).

Abstract

The concept of ‘big data’ has been widely discussed, and its value has been demonstrated across a variety of domains. To mine potential value quickly and to cope with the ever-increasing volume of information, machine learning is playing an increasingly important role and faces more challenges than ever. Because few studies exist on how to adapt machine learning techniques to big data environments, we provide a comprehensive overview of the evolution of big data, the foundations of machine learning, and the bottlenecks and trends of machine learning in the big data era. More specifically, starting from learning principles, we discuss how regularization enhances generalization. We reduce the data quality challenges of big data to the curse of dimensionality, class imbalance, concept drift, and label noise, and we introduce the underlying causes of these challenges and the mainstream methodologies for addressing them. Learning model development has been driven by domain specifics, dataset complexities, and the presence or absence of human involvement. In this paper, we propose a robust learning paradigm that aggregates these factors. We believe that, over the next few decades, these perspectives will lead to novel ideas and encourage more studies aimed at incorporating knowledge and establishing data-driven learning systems that account for both data quality and human interaction.

Subject Areas

big data; machine learning; regularization; data quality; robust learning framework
