1. Summary
With the rapid advancement of satellite design, digital technology, and cloud computing, an increasing amount of data is generated by digital equipment and sensors, such as satellite constellations, smartphones, computers, and advanced measuring infrastructures [1]. For example, the size of data on the internet is now measured in exabytes (10^18 bytes), zettabytes (10^21 bytes) [2], and yottabytes (10^24 bytes). Moreover, collected ground data and estimated data are growing exponentially, and their structure is becoming much more complicated. The processing and analysis of these large-volume data pose both a new challenge and an opportunity, captured by the concept of "big data" [3, 4]. Indeed, in recent years, big data has gained increasing importance in tech services, data engineering, production systems, and environmental monitoring, leading to significant changes in data management and in the development of analysis techniques.
Database Management Systems (DBMS) have evolved to handle the growing demand for data storage, processing, and analysis. Traditional data is structured data managed by computer programming experts within organizations. A traditional database system uses a centralized architecture to store and maintain data in a fixed format or in fixed fields of a file, and Structured Query Language (SQL) is used to manage and access the data. The high degree of organization and structure of traditional data makes it easier to store, manage, and analyze, and such data is crucial for decision-makers. Big data can be considered an advanced version of traditional data: it deals with datasets so large or complex that they are challenging to manage with traditional data-processing application software, and it involves large volumes of structured, semi-structured, and unstructured data.
Big datasets are increasingly relevant in the decision-making process. Concepts like precision farming, digital agriculture, and machine learning drive innovation in production chains. Traditional production systems are grappling with issues such as water shortages, drought, and soil exhaustion. Smart grids offer effective solutions to accelerate sustainability in production systems, enhancing the preservation of natural resources. Although awareness of sustainable development as a response to the effects of climate change is growing, putting it into practice presents significant challenges to smart farming and the stable operation of food production chains. In this context, soil data are critical.
In a collaborative effort coordinated by the Food and Agriculture Organization of the United Nations (FAO), the best available (newer) soil information for central and southern Africa, China, Europe, northern Eurasia, and Latin America was combined into a new product known as the Harmonized World Soil Database (HWSD) [5]. Until recently, the HWSD was the primary digital map annex database available for global analyses. However, it has several limitations [6, 7, 8, 9]. Some of these limitations relate to partly outdated soil geographic data and the use of a two-layer model (0–30 and 30–100 cm) for deriving soil properties. Others concern the derived attribute data themselves, particularly their unquantified uncertainty, and the use of three different versions of the FAO legend (i.e., FAO74, FAO85, and FAO90). These issues have been addressed to varying degrees in various new global soil datasets [10, 11, 12] that still largely draw on a traditional soil mapping approach [13].
In the last decade, digital soil mapping (DSM) has become a widely used approach to obtain maps of soil information [14]. DSM primarily involves building a quantitative numerical model that relates soil observations to environmental information acting as proxies for the soil-forming factors [13, 14]. DSM can also integrate direct information as proxies for soil properties, such as proximal sensing measurements. The number of studies using DSM to produce maps of soil properties is continually growing. Numerous modeling approaches are considered, from linear models to geostatistics, machine learning, and artificial intelligence (e.g., deep learning). Ref. [16] provides a recent review of methods and applications in the field of DSM.
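The modeling step at the core of DSM can be illustrated with the simplest of the model families listed above, a linear model with a single environmental covariate. The following is a minimal sketch, not the method used to build any particular soil product: the choice of covariate (elevation), the target property (soil organic carbon), all numerical values, and the function names `fit_linear` and `predict` are assumptions made purely for illustration.

```python
from statistics import mean


def fit_linear(covariate, observed):
    """Ordinary least squares fit of observed = intercept + slope * covariate."""
    x_bar, y_bar = mean(covariate), mean(observed)
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, observed))
    if sxx == 0:
        raise ValueError("covariate must vary across the observation sites")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, covariate):
    """Apply a fitted (intercept, slope) model to covariate values."""
    intercept, slope = model
    return [intercept + slope * x for x in covariate]


# Invented calibration data: elevation (m) at sampled sites and the soil
# organic carbon content (g/kg) measured at those sites.
elevation_sites = [100.0, 400.0, 900.0, 1500.0, 2200.0]
soc_sites = [8.0, 11.0, 16.0, 22.0, 29.0]

# Step 1: relate the soil observations to the environmental covariate.
model = fit_linear(elevation_sites, soc_sites)

# Step 2: predict the property where only the covariate is known,
# for example at the centroids of unsampled grid cells (invented values).
elevation_grid = [250.0, 1200.0, 2000.0]
soc_grid = predict(model, elevation_grid)

for elev, soc in zip(elevation_grid, soc_grid):
    print(f"elevation {elev:7.1f} m -> predicted SOC {soc:5.2f} g/kg")
```

In practice, DSM studies typically use many covariates and more flexible models (geostatistics, machine learning), but the two steps stay the same: calibrate on the observation sites, then predict over the covariate grid.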
The significant progress in information and communication technology (ICT) provides a new vision for farmers, academics, researchers, and policymakers to perceive and promote the transformation of traditional production systems into smart production systems. Critical data associated with crops’ water demand and with biotic and abiotic stress must be available in near real time, which requires fast and comprehensive transmission, storage, and analysis of field measurements and control instructions. This paper aims to discuss the concepts of gridded soil data framing and its applications. The intent of this paper is threefold. First, the gridded soil data are embedded in one specific data frame. Next, the paper briefly reviews the concepts of web services, user-friendly navigation, and hierarchical query. Finally, the manuscript illustrates the detailed applications of soil data analytics in smart grids and point centroids in the Mexico domain.
The objective of this study was to create a continuous dataset of soil properties specific to Mexico’s domain, referred to as MSMx (Mesh Soils Mexico). This database includes regional predictions of physical, chemical, and derived soil properties. These properties encompass soil organic carbon content, total nitrogen, coarse fragments, pH (water), cation exchange capacity, bulk density, and texture fractions at six standard depths (up to 200 cm; see Table 1). Soil physical properties pertain to the structural organization of soil and are indicative of its architecture [17]. Chemical properties depend on the soil’s chemical composition. Derived properties, such as porosity, air capacity, water capacity, compaction, effective root depth, carbon sequestration, and carbon sequestration potential, necessitate the application of response models to interrelate physical and chemical properties.
Table 1 provides a complete list of soil properties included in the MSMx geodatabase.
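To make the idea of a derived property concrete, the sketch below computes total porosity from bulk density with the textbook relation porosity = 1 − bulk density / particle density, and aggregates a layered profile by thickness-weighted averaging. It is an illustration under stated assumptions, not the response model used for MSMx: the particle density of 2.65 g/cm^3, the six depth intervals (the text only specifies six standard depths up to 200 cm; the intervals below follow the common GlobalSoilMap convention), the bulk density values, and the function names are all hypothetical.

```python
# Assumed particle density of mineral soil in g/cm^3 (a common textbook
# value, not a figure taken from the MSMx documentation).
PARTICLE_DENSITY = 2.65

# Assumed depth intervals in cm (GlobalSoilMap convention); the actual
# MSMx intervals are those listed in Table 1.
DEPTH_INTERVALS_CM = [(0, 5), (5, 15), (15, 30), (30, 60), (60, 100), (100, 200)]


def total_porosity(bulk_density, particle_density=PARTICLE_DENSITY):
    """Return total porosity (volume fraction) from bulk density in g/cm^3."""
    if not 0.0 < bulk_density <= particle_density:
        raise ValueError("bulk density must be positive and not exceed particle density")
    return 1.0 - bulk_density / particle_density


def depth_weighted_mean(values, intervals, top, bottom):
    """Thickness-weighted mean of per-layer values over the range top..bottom (cm)."""
    weighted_sum = 0.0
    total_thickness = 0.0
    for value, (upper, lower) in zip(values, intervals):
        # Thickness of this layer that falls inside the requested range.
        overlap = min(lower, bottom) - max(upper, top)
        if overlap > 0:
            weighted_sum += value * overlap
            total_thickness += overlap
    if total_thickness == 0:
        raise ValueError("requested range does not overlap any layer")
    return weighted_sum / total_thickness


# Invented bulk densities (g/cm^3) for one grid cell, one value per layer.
bulk_density_profile = [1.20, 1.30, 1.40, 1.45, 1.50, 1.55]

# Derived property per layer.
porosity_profile = [total_porosity(bd) for bd in bulk_density_profile]

# Aggregate the three uppermost layers into a single 0-30 cm value.
topsoil_bulk_density = depth_weighted_mean(
    bulk_density_profile, DEPTH_INTERVALS_CM, 0, 30
)
topsoil_porosity = total_porosity(topsoil_bulk_density)

print(f"0-30 cm bulk density: {topsoil_bulk_density:.3f} g/cm^3")
print(f"0-30 cm total porosity: {topsoil_porosity:.3f}")
```

The same aggregation can collapse a six-layer profile into the 0–30 and 30–100 cm layers of a two-layer model, which is useful when comparing against products such as the HWSD.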