3.2. About R
Effective data analysis requires a basic understanding of statistics and access to appropriate computational tools. While Excel is widely used, it is not well-suited for statistical analysis. Fortunately, many statistical packages are available. I recommend
R, a free and open-source environment supported by a vibrant and helpful community. The latest version can always be downloaded from the CRAN website [
14].
Using R does require some programming, but I encourage learning it in parallel with statistics—this is how I teach my students. Statistical study often involves tedious calculations, which R can handle efficiently, allowing learners to focus on interpretation rather than arithmetic. And if you get stuck, just ask online—someone will almost certainly help. That’s the power of a supportive community.
That said, Excel remains convenient for data handling. You can prepare your dataset in Excel, then export it as a text-based file and import it into R using:
data <- read.table(file = "xxx.txt", sep = "\t", header = TRUE)
This command stores the file contents in an object called data. In R, although the equals sign (=) is also accepted, assignment is typically indicated with <- or ->. To treat the data as a matrix, simply apply:
data <- as.matrix(data)
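After importing, a quick check that the file was parsed as intended is worthwhile:
head(data)   # first few rows of the imported table
dim(data)    # number of rows and columns
A wrong separator or a missing header shows up immediately in the first rows or in the column count.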
All necessary R code is available on Zenodo [15].
3.3. On Exploratory Data Analysis (EDA)
Here, I advocate for exploratory data analysis (EDA) as a statistical approach [
11,
12]. EDA seeks to understand the nature of data by examining its properties without preconceptions, making it a branch of statistics that aligns closely with the scientific method [
11,
The process begins by acknowledging a lack of prior knowledge about the data’s structure. Analytical methods are selected based on the data itself, with a healthy scepticism toward any predefined mathematical model. As a result, the first focus becomes the data’s distribution. Since all distribution types are mathematically defined, one can generate idealised data accordingly. By directly comparing the quantiles of this ideal data with those of the observed data, it becomes possible to assess whether the empirical data conforms to the assumed distribution. This forms the basis of the quantile–quantile (Q–Q) plot technique. If the plotted relationship is linear, the data follow that distribution.
There exists a mathematical theorem known as the Central Limit Theorem. It states that the sum of many independent random variables tends toward a normal distribution. Such additive phenomena are extremely common. Therefore, those employing EDA first investigate whether the data are normally distributed. R provides a remarkably straightforward solution for this purpose.
qqnorm(data)
Just this one line. If the object data holds magnitude data, the result will indeed be a straight line (Figure 1A). This is the simplest way to produce a Q–Q plot [
10,
11]. To do this a little more carefully,
ideal <- qnorm(ppoints(length(data)))
This means: prepare probability points (ppoints) equal to the length of the data, find the corresponding quantiles from the normal distribution (qnorm), and store them in an object called ideal. By comparing this with the sorted data and plotting it, we can compare the quantiles of the data with those of the normal distribution.
plot(ideal, sort(data))
Adding
z <- line(sort(data) ~ ideal)
abline(coef(z))
draws a line of best fit (abline). Using coef(z) to extract the coefficients gives the slope as an estimate of the data’s scale σ, and the intercept as an estimate of the location μ. If you forget what an R function does, execute `?line`; the help page appears immediately.
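Putting these steps together, here is a minimal, self-contained sketch; the rnorm() call simply generates synthetic, magnitude-like values (with arbitrary mean and standard deviation) so the example runs without a real catalogue:
set.seed(1)
data <- rnorm(1000, mean = 4.5, sd = 0.8)   # synthetic stand-in for magnitudes
ideal <- qnorm(ppoints(length(data)))       # quantiles of the ideal normal
plot(ideal, sort(data))                     # Q–Q plot against the ideal
z <- line(sort(data) ~ ideal)               # Tukey's resistant line
abline(coef(z))                             # line of best fit
coef(z)                                     # intercept ≈ location, slope ≈ scale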
By leveraging the normally distributed nature of the data, one can construct a grid with one-degree intervals in both latitude and longitude, and estimate the scale of magnitude within each cell to identify regions exhibiting anomalous behaviour (
Figure 1B) [
9]. As expected, the scale or location of magnitudes tends to be larger in certain regions [
6]. This approach alone reveals how seismic magnitude is spatially distributed—try it yourself. Sometimes, magnitudes tend to increase in several regions. Such patterns serve as important indicators of potential seismic hazards [
9].
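As a sketch of this gridding step, assuming a data frame eq with columns lon, lat, and mag (the object and column names are illustrative, not from the original code):
cell <- interaction(floor(eq$lon), floor(eq$lat), drop = TRUE)  # 1-degree cells
sigma_by_cell <- tapply(eq$mag, cell, function(m) {
  if (length(m) < 30) return(NA)            # skip sparsely populated cells
  ideal <- qnorm(ppoints(length(m)))
  coef(line(sort(m) ~ ideal))[2]            # slope of the Q–Q line = scale
})
Cells whose estimates stand out from their neighbours are the candidates for anomalous behaviour.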
As a general principle, plotting a histogram of normally distributed random numbers produces the characteristic bell-shaped curve, rising and falling symmetrically (
Figure 1C). This familiar form is widely recognised. When the same data are displayed on a semi-logarithmic scale, the curve assumes a cannonball-like shape (
Figure 1D). In this representation, the right-hand tail appears linear; however, this is an illusion. In reality, the slope steepens with larger values, and the apparent linearity lacks mathematical significance—it is merely an artefact of the scaling. This phenomenon reflects the true nature of the GR law. Notably, the GR law holds only when the smaller data points are disregarded; even the median falls outside its range.
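This contrast is easy to reproduce with synthetic normal data:
x <- rnorm(1e5)                                # normally distributed random numbers
h <- hist(x, breaks = 100, plot = FALSE)       # bin the data without plotting
plot(h$mids, h$counts)                         # linear axes: the familiar bell
keep <- h$counts > 0                           # a log axis cannot display zeros
plot(h$mids[keep], h$counts[keep], log = "y")  # semi-log: the cannonball-like shape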
Which do you believe: the GR law or the normal distribution? My resolve to undertake research in this field crystallised precisely when I executed `qqnorm(data)`.
Magnitude is a logarithmic representation of earthquake energy. A normal distribution of magnitude therefore implies that energy follows a log-normal distribution—a form that naturally arises when multiple factors interact synergistically to determine an outcome. This, I propose, is the true essence of earthquakes.
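This relationship is easy to demonstrate. In the sketch below, 10^m serves as an illustrative energy-like quantity; the actual magnitude–energy relation involves constants of its own, but they do not change the shape of the distribution:
m <- rnorm(1e4, mean = 4.5, sd = 0.8)  # normally distributed magnitudes
e <- 10^m                              # illustrative energy-like quantity
qqnorm(e)                              # strongly curved: e itself is not normal
qqnorm(log10(e))                       # straight: e is log-normally distributed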
3.5. Number of Aftershocks
The decay of aftershock frequency over time has long been described by Omori’s formula, later modified by Utsu [
19,
20]. These models propose an inverse proportionality to time. However, this assumption does not hold up under scrutiny. You can verify this yourself. Simply plot the number of aftershocks at each time point t using:
plot(t, number, log = "y")
This procedure yields a semi-logarithmic plot, on which an exponential decay appears as a straight line. Given that earthquake frequency is often approximated by a log-normal distribution, I initially plotted the data on a semi-log scale and observed a clear linear decline (
Figure 2A). However, introducing an inverse proportionality term—as in the Omori–Utsu law—causes the plot to deviate from linearity (
Figure 2B) [
7]. Although the Omori–Utsu formula includes three freely adjustable parameters, tuning any one of them fails to restore linearity, because the underlying assumption of inverse proportionality is itself flawed.
A linear decrease on a semi-log plot suggests a first-order reaction with a characteristic half-life. This is typical of systems where the rate of change is proportional to the remaining quantity—such as radioactive decay or the damping of an oscillating spring. Earthquakes, too, appear to settle in this manner. When statistical properties are clarified, the underlying mechanism begins to emerge. This, in turn, allows for the construction of a parsimonious model—the very goal of analysis in Exploratory Data Analysis (EDA) [
11].
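The half-life itself can be read off the semi-log plot. A minimal sketch, assuming the vectors t and number from the plot() call above:
pos <- number > 0                       # log() requires positive counts
fit <- lm(log(number[pos]) ~ t[pos])    # straight-line fit in semi-log space
lines(t[pos], exp(fitted(fit)))         # overlay the fit on the log = "y" plot
log(2) / (-coef(fit)[2])                # the characteristic half-life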
Immediately after an earthquake, the location parameter of the Q–Q plot shown in
Figure 1A increases (
Figure 2B). This increase subsequently decays over time, following a half-life pattern. Notably, the duration of this half-life is considerably shorter than that observed for the decay in earthquake frequency. Since both the magnitude and frequency of earthquakes follow a log-normal distribution, it is likely that the timing and size of earthquakes are governed by the synergistic effects of multiple factors. However, the discrepancy in half-lives suggests that the final determining factor—the proverbial last card—differs between the two. If we could identify and quantify these last cards, we might be able to predict with considerable accuracy not only the scale of earthquakes, but also when and where they will occur.
Figure 2D displays the locations of several sites referenced in this study, along with their respective seismic hazard levels. The figure incorporates J-SHIS data provided by the National Research Institute for Earth Science and Disaster Resilience [
18].
3.6. Position of the Boundary
Although focal depths are routinely measured, the Japan Meteorological Agency (JMA) has continued to rely on outdated models of the tectonic plates surrounding Japan, submitted years ago [
21,
22]. This persistence highlights a lack of understanding of the current three-dimensional tectonic configuration.
Fortunately, this issue can be addressed with relative ease using R. While R’s core functionality supports many types of analysis, it also offers a rich ecosystem of specialised add-on packages for more advanced tasks. One particularly useful package for 3D visualisation is rgl [
23]. To install it, simply run:
install.packages("rgl")
This downloads the package from a CRAN mirror and integrates it into your R environment—a testament to the community’s generosity, as maintaining such packages requires significant effort. Once installed, load the library with:
library(rgl)
To visualize three-dimensional data (e.g., a matrix with three column vectors representing x, y, and z coordinates), use:
plot3d(data)
This opens an interactive 3D plot in a separate graphics window (
Figure 3A). To export it as an interactive HTML widget:
rglwidget()
This allows you to save and share the visualisation with others [
7].
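For hypocentre data, the whole workflow might look like the following sketch, assuming a three-column matrix xyz of longitude, latitude, and depth (the name is illustrative); saving the widget additionally requires the htmlwidgets package:
library(rgl)
plot3d(xyz[, 1], xyz[, 2], -xyz[, 3],        # plot depth downward
       xlab = "lon", ylab = "lat", zlab = "depth")
htmlwidgets::saveWidget(rglwidget(), "hypocentres.html")  # shareable HTML file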
Using this approach, I was able to visualise the distribution of earthquake hypocentres, revealing the actual position of the plate boundary—a structure that had long been misrepresented [
7].
3.7. A Single Boundary as a Tilted Plane
The plate boundary can be represented as a single tilted plane in three-dimensional space—naturally described by the standard equation of a plane. Earthquake hypocentres located on this surface can also be visualised (
Figure 3B).
To achieve this, I employed Principal Component Analysis (PCA) [
24,
25]. PCA is a multivariate analysis technique commonly used for dimensionality reduction. Its strength lies in its objectivity—it yields consistent results regardless of the analyst—making it particularly well-suited for scientific applications. A detailed introduction to this method is provided in the appendix of [
7], which I encourage you to read. It may prove useful in many other contexts as well. The implementation in R is remarkably concise, requiring only a few lines of code [
15].
When applied to earthquake hypocentre data—typically described by latitude, longitude, and depth—PCA can reveal the dominant spatial patterns of seismicity. Each hypocentre occupies a unique position in this three-dimensional space, and PCA describes the directions of variation with orthogonal vectors. PC1 captures the greatest variance and passes through the centroid of the data; PC2 captures the second greatest variance, orthogonal to PC1. Together, PC1 and PC2 define a single plane—referred to here as the boundary plane—that captures the dominant spatial structure of the hypocentre distribution. PC3, orthogonal to both, is the normal vector of this plane, along which the finer variations in the data and the curvature of the plane are expressed. Using this normal vector, an equation for the boundary plane can be readily found, allowing the position of the boundary to be clearly and objectively mapped. Furthermore, by projecting the hypocentres onto the PC1–PC2 plane, their spatial clustering along the boundary can be visualised objectively (Figure 3BC). In this case, PC1 primarily represents depth, while PC2 represents direction on the map.
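A minimal sketch of this procedure, again assuming a three-column hypocentre matrix xyz:
pca <- prcomp(xyz)             # PCA; data are centred by default
n <- pca$rotation[, 3]         # PC3: normal vector of the boundary plane
d <- -sum(n * pca$center)      # plane equation: n[1]x + n[2]y + n[3]z + d = 0
plot(pca$x[, 1], pca$x[, 2])   # hypocentres projected onto the PC1–PC2 plane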
Thus, as the position of the boundary became clearer, a shallow, belt-shaped region prone to frequent earthquakes was found running parallel to the line where the boundary meets the surface; I refer to these as shallow seismic zones (Figure 3CD). This area is likely where crustal material from each plate is compressed, accumulating stress within the plates. For instance, the Seto boundary may represent such a zone; here too, there is no plate subduction, and earthquake hypocentres are concentrated at shallow depths.
From the precise positions of the boundaries obtained in this manner, the locations of the plates can be estimated (
Figure 3D).
Grid-mesh analysis and 3D viewing provide two complementary perspectives for observing earthquakes, though some practice is needed to use these tools effectively. Regional variability is an important consideration: for example, the Noto Peninsula is a high-hazard area located near the convergence of the Eurasian, Pacific, and Philippine Sea plates, a setting that favours the development of shallow seismic zones [
26].
The M7.6 earthquake on the Noto Peninsula on 1 January 2024 [27] was preceded by a precursor swarm on 5 May 2023 (M6.5), during which both magnitude parameters μ and σ increased. However, the observed increases largely reflect the influence of the swarm itself. After this swarm, seismicity declined; from October to December 2023, when event counts had fallen sufficiently, a magnitude anomaly was detectable only in the locator, that is, the location parameter μ (Figure 4AB). How should risk be assessed under these conditions?
First, proximity to a plate boundary makes the region inherently seismically active, so even an anomaly confined to the locator warrants concern. Second, the locator increase during the 2023 swarm was very small: although event counts decreased along two linear trends, magnitudes did not clearly follow that pattern (
Figure 4C). Typical mainshock locator rises span several days and amount to roughly 2–3 units; here the rise was under 1 unit and largely returned to baseline within a day [
7]. This implies that the conditions for accumulating the energy needed to trigger a large mainshock were not fully met, and the swarm may not have released energy as a conventional mainshock. By October, however, magnitudes began a gradual increase, culminating in values that exceeded those recorded during the 5 May event (
Figure 4C).
Volcanic activity likely affects the adjacent offshore region. This sector of the Sea of Japan is anomalous, with elevated locators [
9]. During volcanic episodes at Mount Aso and the Tokara Islands, σ has been observed to decrease [
8,
10], and a similar σ reduction was detected west of the Noto Peninsula (
Figure 4D). Other regional signatures—such as a fine-silt bank and low Bouguer anomalies—are consistent with volcanic-type processes [
28]. Immediately after the M7.6 quake on 1 January 2024, seismicity initiated in this offshore area and, unlike the 2023 pattern (
Figure 4E), expanded broadly around the peninsula (
Figure 4F).
A plausible interpretation is that 2024 seismicity was modulated by residual energy accumulated during 2023, which interacted with volcanic-type processes offshore. This interaction may have contributed to the generation of the large, damaging January 2024 earthquake. Thus, the region functions both as a tectonic convergence zone and as an area with volcanic features, both of which are linked to plate dynamics.
Moreover, even when an abnormality is detected, its interpretation may remain unclear. For instance, recent data reveal clusters of large, lump-like hypocentres along the Seto boundary (
Figure 5A, green), scattered across Wakayama Prefecture at depths of approximately 10–15 km and around 30 km (
Figure 5B). This region has long exhibited high seismic activity, with consistently elevated earthquake counts recorded since the 1990s. Yet no anomalies are apparent in the magnitude parameters, as large earthquakes are rare. How, then, should this swarm be understood? Although the local meteorological observatory is aware of the phenomenon, no satisfactory explanation for its underlying cause has been provided to date [
29]. Clarifying how such patterns should be interpreted remains an open challenge for future research.
Furthermore, in December 2025, a noticeable increase in very deep earthquakes was observed along the boundaries in Ogasawara (
Figure 5CD) and Hokkaido (
Figure 5EF). For comparison, data from 2010 are presented; however, this phenomenon was particularly prominent in December 2025. The activity was dispersed across the entire region, with no clear concentration in any specific area. Due to the considerable depth of these events, no significant surface damage was reported. Could this signal, for instance, heightened activity within the Pacific Plate? Is this a phenomenon that merits closer attention? These questions remain open, underscoring the need for continued monitoring and investigation.
Incidentally, the concentration of seismic activity observed at shallow depths on the Hokkaido boundary (
Figure 5F, green) may represent precursors to a larger earthquake. This cluster is located at E147.7, N44.4, in the offshore waters near Iturup (Etorofu) Island—a shallow region situated some distance from the Kuril (Chishima)–Kamchatka Trench. This region is seismically active, with a history of frequent earthquakes, including a notable event in 2020 [
30]. Should a major earthquake occur in this area, its proximity to major cities such as Nemuro, Kushiro, and Abashiri means that any resulting tsunami could arrive rapidly. Moreover, the geographic features of the Nemuro and Shiretoko peninsulas, which form prominent capes, may enhance the risk of wave refraction, potentially amplifying tsunami impacts. At present, the magnitude anomaly indicates a notably elevated location parameter, although the scale remains unchanged. Even so, continued vigilance is essential.