Research data recycling through open sharing and reuse: A case study of sustainable digital good consumption in the sharing economy

Abstract: To meet the needs of an increasingly complex research landscape, researchers engage in “collaborative prosumption” through open data sharing and reuse. Although significant gains have been achieved in this regard because of growing requirements from funding agencies, governments and journals, the question of how the reuse of openly available data for new research contributes to sustainability is yet to be appropriately addressed in the literature. Therefore, relying on a three-stage stratified clustered random sampling of the Journal of Applied Econometrics data archive (JAEDA), the present research provides a case study of the value of research data recycling for sustainable research and economic development. More specifically, our analysis shows that openly shared equity price index data on eleven European countries, extracted from JAEDA, reformatted from wide to long format, and augmented with country-level geospatial metadata, provide a new basis for interesting descriptive analytics and spatio-temporal econometric modeling and inference. Given the ever-increasing volume of openly available research data, our study provides firsthand insight into open data reuse, which should benefit all stakeholders in the research community as they seek sustainable solutions for scientific productivity and progress.


1) Introduction
Although scholars have produced a plethora of literature on the value of recycling in general, there is a paradoxical paucity of literature on the value of data recycling in particular.
With recent developments in computational simulation and modeling, automated data acquisition, and communication technologies, the concept of data has evolved to be more inclusive (Curty et al, 2017; Chawinga & Zinn, 2019). In addition to digital manifestations of literature, data also refers to forms of databases that generally require the assistance of computational machinery and software in order to be useful (Zhang et al, 2017). Consequently, data have become a key infrastructural component of science (Nielsen, 2020), at the root of a new era of research called "the fourth paradigm: data-intensive scientific discovery" (Tolle et al, 2011; Pasquetto et al, 2017; Wu et al, 2019). A cornerstone of the fourth industrial revolution, and a follow-up to the previous research paradigms (including theoretical and experimental), the fourth paradigm seeks to have all of the scientific literature and all scientific data openly inter-operating with each other online (Borst & Limani, 2020). Hence the emergence of "open research data" (Mancini et al, 2020), which has been defined as structured, machine-readable data that are actively published on the internet for public re-use, and that are freely accessible, usable, modifiable, and sharable by academic researchers (Allen & Mehler, 2019; Mayernik, 2017; Powers & Hampton, 2019; Turki et al, 2019; Zuiderwijk & Hinnant, 2019).
In this context, "data sharing" is better understood as a problem of optimizing scarce research resources. Indeed, researchers typically produce large amounts of data, part of which may be useful to others for the reproduction and verification of previous research results (Hardwicke et al, 2019; Powers & Hampton, 2019), the formulation of new research questions (Boté & Térmens, 2019), and the advancement of research and innovation. However, making potentially useful data available to others requires substantial investment in documentation and data curation, beyond the actual conduct of the initial research for which the data were primarily collected (Wallis et al, 2013). Economically, this creates a misalignment of incentives, since the originating investigator bears the full cost (time and effort) of making the data openly available, while unknown and often non-existent re-users reap the benefits. From the perspective of the researcher contemplating sharing, a valid question then is: "Which data are worth the investment effort, as they might get reused by others?" Given that specific reuse intentions by other researchers only arise when the data are discoverable, accessible and usable (Stall et al, 2019), one can ask an equally valid question: How can demand for data reuse be built if potential users of the data are unaware of its existence and utility?
Hence the need for studies (including data descriptors) that showcase the re-use potential of various data sources, so as to raise awareness of their utility for prospective investigations. Moreover, pointing out the collaborative effort needed to address data sharing and reuse for the well-being of all stakeholders in any economic system, Hanson et al (2011) declared: "We must all accept that science is data and that data are science, and thus provide for, and justify the need for the support, of much-improved data curation"; while Borgman (2012) suggested: "if the rewards of the data deluge are to be reaped, then researchers who produce those data must share them and do so in such a way that the data are interpretable and reusable by others".
The present research is therefore part of this dynamic. Although previous authors have provided insights about the motivating factors of open data sharing and re-use in other fields including archaeobotany (Lodwick, 2019), astrophysics (Zuiderwijk & Spiers, 2019), biomedicine (Park et al, 2018; Heller et al, 2019), ecology (Zimmerman, 2008), psychology (Houtkoop et al, 2019), and science in general (Chawinga & Zinn, 2019; Federer et al, 2018; Tenopir et al, 2015), the economics literature is yet to properly fill this void. Indeed, although Waugh (2010) discusses the value of digital assets for sustainable economic development, to date only a limited literature covers the specific value of data sharing and re-use within the field of economics (Cavallo & Rigobon, 2016; Mullainathan & Spiess, 2017). Even fewer studies address the idea of open research data sharing and re-use for sustainable economic development (Hilbert, 2016; Jetzek et al, 2019).
Therefore, the present study aims to fill this gap in the scientific discourse by investigating the potential of the Journal of Applied Econometrics data archive to contribute to sustainable scientific research and economic development.
As a case study of the sharing economy, within the specific context of digital goods/assets sharing and re-using, the present study is also in line with the United Nations (UN) global agenda for Sustainable Development. Indeed, goal 12 of the UN sustainable development goals (SDGs) is broadly concerned with ensuring sustainable consumption and production patterns by 2030, while its target 12.5 specifically addresses the substantial reduction of waste generation through prevention, reduction, recycling, and reuse. In addition, SDG target 12.8 is specifically concerned with ensuring that people everywhere have the relevant information and awareness for sustainable development and lifestyles in harmony with nature, while SDG target 12.a looks at supporting developing nations to strengthen their scientific and technological capacity to move towards more sustainable patterns of consumption and production. Given that open data sharing in general has the potential to (i) facilitate the use of digital assets for scientific (academic) research; (ii) contribute to the expansion of the market for sharable "digital assets"; (iii) encourage multiple perspectives, discourage fraud, and help identify errors; and (iv) provide useful information for training new researchers and increase the efficient use of funding and respondent population resources by avoiding duplicate data collection, the key question addressed in the present study, in line with our research objective above, is: How can data sharing and re-use through the Journal of Applied Econometrics data archive contribute to the promotion of sustainable scientific research and economic development?
To address this question, we organize the rest of the paper as follows: in section 2 we review the literature on open data sharing and re-use; in section 3 we present the adopted methodology to bring the research to fruition; in section 4 we present and discuss the results; and finally in section 5 we conclude the analysis with policy recommendations and future directions for research.

2) Literature review
The topic of open data sharing and reuse has been an important and growing part of contemporary scholarly debates (Chapman et al, 2020; Pasquetto et al, 2019; Slota et al, 2020). The prospect of widely available data for all has garnered the attention of scholars from various fields of specialization (Chawinga & Zinn, 2019; Wiggins et al, 2018), and featured in earlier publications in Nature (Campbell, 2008; 2009), Science (Kum et al, 2011; Einav & Levin, 2014), as well as the Journal of the American Society for Information Science & Technology (Borgman, 2012). Testament to its ongoing significance among scholars of diverse backgrounds is the recently published comment in Nature's journal Scientific Data (Lin et al, 2020).
Motivations for research data sharing and reuse are diverse and reflect the interests of many stakeholders, including researchers and funders (Stall et al, 2019; Wu et al, 2019; Chapman et al, 2020). The arguments for open data sharing include time efficiency gains (Pronk, 2019), providing others the ability to reproduce and verify past research (Allen & Mehler, 2019; Gray & Marwick, 2019; Hardwicke et al, 2019), allowing others to ask new questions using the data (Whitlock, 2011; Boté & Térmens, 2019; Pronk, 2019), advancing research and innovation (Elsayed & Saleh, 2018), boosting faculty impact in their field of specialization through higher citations (Drachen et al, 2016; Colavizza et al, 2019; Park et al, 2018; Zeng et al, 2020), and finally making the outputs of government funded research available to the general public (Mayernik, 2017; Sholler et al, 2019; Zuiderwijk & Hinnant, 2019).
Despite these and other arguments for the value of scientific research data beyond their original purpose (Faniel & Zimmerman, 2011; Stall et al, 2019; Wu et al, 2019), researchers are often protective of their data and may be reluctant to share (Piwowar & Chapman, 2010; Tenopir et al, 2011; Fecher et al, 2015; Houtkoop et al, 2018). Indeed, looking at general trends in data sharing among 1329 scientists from North America, Europe and Asia, Tenopir et al (2011) found that despite the overwhelming support for data sharing and reuse, only 46% of the respondents made their data available openly, 36% agreed that their own data are readily accessible when needed, and less than 6% made their data available to others. Similarly, focusing on 337 researchers from three Arab universities in Egypt, Jordan, and Saudi Arabia, Elsayed & Saleh (2018) reported that 64.4% of study participants shared their data, motivated by the potential of increased research visibility and citation, along with the drive to contribute to scientific progress.
In a subsequent investigation 4 years later, Tenopir et al (2015), looking at the changes in data sharing and reuse practices and perceptions among scientists worldwide, found increased acceptance of and willingness to engage in data sharing behaviors, although specific barriers to data sharing still persisted. Among the many barriers to data sharing reported by researchers recently are increased perceived risk associated with data sharing (Bektaş & Tayauova, 2019; Zipper et al, 2019), publication rights concerns (Meyer, 2018; Federer et al, 2018; Ross et al, 2018), and the level of time and effort required to make data easily findable and reusable by others (Mayernik, 2017; Stall et al, 2019). Among the most important factors reported by researchers to hinder data re-use are trust issues regarding the content and quality of others' data (Boté & Térmens, 2019; Curty et al, 2017; Lin et al, 2020), the ability to interpret the data (Zimmerman, 2008; Boté & Térmens, 2019), and the applicability of data to the problem at hand (Hartswood et al, 2012; Gregory et al, 2019a). In their follow-up study, Tenopir et al (2015) also reported geographic differences in data sharing and reuse worldwide, which were attributed to collectivist vs individualist cultural differences. They further found significant cross-discipline heterogeneity in perceived constraints and enablers of data sharing and reuse.
Despite the variety and richness of the literature on the topic, only a limited number of studies address how and when researchers reuse data they obtain from other researchers (Curty et al, 2017;Gregory et al, 2019b;Koesten et al, 2019;Borst & Limani, 2020), and even fewer studies address how such data recycling practices could potentially contribute to sustainability (Jetzek et al, 2019;Turki et al, 2019).
While the nature of the value generated through openly shared data may vary depending on the particular use case, value creation is sustainable only if the data are used and reused again and again to create long-lasting value that benefits society at large. That is, the generated value can simultaneously benefit (i) private enterprises through new funding or profits; (ii) citizens who derive utility from the provided information, products or services; and (iii) society through happier and healthier citizens, better living environments, and more efficient and sustainable economic markets. Therefore, building on Hart and Milstein's (2003) definition of sustainable value as "a contribution that simultaneously delivers both short and long-term economic, social, and environmental benefits", our present analysis contributes to the literature by looking at how data sharing and re-use through the Journal of Applied Econometrics data archive could contribute to promoting sustainable scientific research and economic development.

3) Methodology
3.1. Data
The idea of digital data archives as knowledge infrastructures mediating data sharing and reuse among researchers has been established in the field of information science and technology (Lin et al, 2020). One of the key digital data archives in the field of economics is the Journal of Applied Econometrics data archive (JAEDA), which is a secondary data source of peer reviewed publications. Its current version, hosted online (see http://qed.econ.queensu.ca/jae/), contains data for all papers accepted for publication after January 1994. The data used in the current analysis, sampled from this archive, are described in the subsections below.

Data source and sampling frame
As of March 2020, JAEDA had 35 published volumes, with an average of 7 issues per volume.
For the purposes of our study, we define each publication volume as a stratum (35 strata in total) of clustered publication issues (132 clusters in total). Within each clustered issue, research publications and thus research data are ordered alphabetically by authors' last name (about 10 publications in each cluster).

Secondary data sampling procedure
For the purposes of our study, we relied on a three-stage stratified clustered random sampling of JAEDA. In the first stage, volume 34 was randomly selected among all 35 volumes (strata). In the second stage, issue number 1 was selected at random among the 7 issues (clusters) in the volume. In the third stage, within issue 1, the shared data corresponding to the second alphabetically ranked published data source "bernardi-catania" (Bernardi & Catania, 2019) was selected at random.
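The three-stage draw described above can be sketched in R as follows. This is only an illustration under simplifying assumptions: every volume is given 7 issues and every issue 10 data sets (the averages reported above), and the seed is hypothetical, so the code does not reproduce the actual draw (volume 34, issue 1, second data set).

```r
# Illustrative three-stage random draw from a simplified sampling frame.
set.seed(2020)              # hypothetical seed, for reproducibility of the sketch only

n_volumes         <- 35     # strata: published volumes
issues_per_volume <- 7      # clusters per stratum (reported average, assumed constant)
papers_per_issue  <- 10     # data sets per cluster (approximate, assumed constant)

volume <- sample(seq_len(n_volumes), size = 1)          # stage 1: pick a volume
issue  <- sample(seq_len(issues_per_volume), size = 1)  # stage 2: pick an issue in it
paper  <- sample(seq_len(papers_per_issue), size = 1)   # stage 3: pick a data set by alphabetical rank

draw <- c(volume = volume, issue = issue, paper = paper)
print(draw)
```

Under these assumptions each data set has the same selection probability, 1 / (35 x 7 x 10).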

Brief Data content description
Our randomly sampled, openly shared data "bernardi-catania" contains a collection of time series representing European market equity price index changes, along with those of the eleven biggest European economies. These daily fluctuations in stock market returns are recorded between September 8th 1999 and October 16th 2015, across national equity markets in Austria, Belgium, Denmark, France, Germany, Hungary, Italy, the Netherlands, Spain, Sweden, and the United Kingdom. In addition to the individual national market fluctuations, the data also contain daily fluctuations of the STOXX Europe 600 Index, which is used as a proxy for the overall European equity market.
In the original use of the data, Bernardi & Catania (2019) were interested in the first differences of the log returns of the various equity indexes. These were used to study the impact of the global financial crisis of 2007-2008, as well as the impact of the European sovereign debt crisis that followed in 2010. Relying on the Conditional Value-at-Risk (CoVaR) and the Conditional Expected Shortfall (CoES) as systemic risk measures, the sharing authors analyzed the systemic spread of risks among the different European countries through an examination of stock market co-movements over time.

Data generating process
In the initial application, Bernardi & Catania (2019) looked at the data as a collection of time series, and therefore relied on time series methods of analysis to achieve their research objectives. In comparison, in our present analysis, we adopt a different perspective, looking at the data as a collection of spatio-temporal processes, which we analyze using dynamic statistical modelling methods. In our inherently conditional view, integrated stock markets are assumed to produce spatially dependent returns that evolve over time. As a spatio-temporal phenomenon, the future evolution of stock market returns depends on their past and present values. This feature gives spatio-temporal dynamic models a better chance of establishing answers to the "why" questions (causality), which is the ultimate goal in any scientific inquiry.
A comparison between our adopted dynamic spatio-temporal modelling approach and the multivariate time series modelling approach of the previous investigation shows close features, yet two fundamental differences. The first difference is that, while not all relationships in a multivariate time series model make economic sense, a dynamic spatio-temporal model has to represent realistically the kind of spatio-temporal interactions between the various stock market returns under investigation. The second distinguishing feature deals with dimensionality. Most often in spatio-temporal applications, the dimensionality of the spatial component of the model prohibits standard inferential methods, as used in multivariate time series representations of the same phenomenon. In such cases, special care is needed in model parametrization in order to obtain realistic yet parsimonious dynamics. The development and introduction of basis functions within hierarchical statistical modeling has allowed analysts to leapfrog the computational bottleneck caused by inverting very large covariance matrices of spatially dependent data. Therefore, in our current view of stock market returns following spatio-temporal processes, we represent these processes as mixed (linear and non-linear) models with known covariates whose coefficients are unknown and non-random, together with known basis functions whose coefficients are unknown and random. The basis functions, which are usually functions of space, have coefficients defined as multivariate time series random vectors, and provide computational advantages through reduced dimensionality of the covariance matrix.
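The basis-function representation described in words above can be written generically as follows. The notation is an assumption introduced here for illustration (it is not taken from the original analysis): $Y_t(\mathbf{s})$ is the return at location $\mathbf{s}$ and time $t$, $\mathbf{x}_t(\mathbf{s})$ the known covariates, $\phi_j(\cdot)$ the $J$ known spatial basis functions, and $\boldsymbol{\alpha}_t$ the random coefficient vector.

```latex
% Mixed model: fixed covariate effects plus random basis-function coefficients
Y_t(\mathbf{s}) = \mathbf{x}_t(\mathbf{s})^{\top}\boldsymbol{\beta}
                + \sum_{j=1}^{J} \phi_j(\mathbf{s})\,\alpha_{j,t}
                + \varepsilon_t(\mathbf{s}),
\qquad \varepsilon_t(\mathbf{s}) \sim N(0, \sigma^2_{\varepsilon})

% Stacking the m locations at time t, with \Phi the m x J matrix of basis evaluations:
\mathbf{Y}_t = \mathbf{X}_t\boldsymbol{\beta} + \boldsymbol{\Phi}\boldsymbol{\alpha}_t + \boldsymbol{\varepsilon}_t,
\qquad
\operatorname{Cov}(\mathbf{Y}_t) = \boldsymbol{\Phi}\,\mathbf{C}_{\alpha}\,\boldsymbol{\Phi}^{\top} + \sigma^2_{\varepsilon}\mathbf{I}_m
```

Here $\boldsymbol{\beta}$ is unknown and non-random, while $\boldsymbol{\alpha}_t = (\alpha_{1,t}, \ldots, \alpha_{J,t})^{\top}$ is a random vector evolving as a multivariate time series. The computational gain comes from working with the $J \times J$ matrix $\mathbf{C}_{\alpha}$ rather than a full $m \times m$ spatial covariance matrix when $J$ is much smaller than $m$.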

Data preparation for re-use
Because of the open source nature of the R statistical software (R Core Team, 2015), we relied on it to perform all required reformatting and data preparation for reuse. The code used for all data treatments and analyses in this paper is provided in the supplementary materials of this manuscript. As shown in the supplementary materials, the data originally extracted from the Journal of Applied Econometrics Data Archive, volume 34, issue 1 (Bernardi & Catania, 2019), were provided in Excel format. As a wide format collection of time series, the data represented European market equity price index changes, along with those of 11 European countries, between September 8th 1999 and October 16th 2015.
Since we depart from the time series perspective by looking at the data as a collection of spatio-temporal processes, which we analyze using dynamic statistical modelling methods, we transform the wide format time series into long format panel data. We achieve this initial transformation by using the "melt()" function from the R library "reshape2" (Hadley, 2007).
To facilitate graphical mappings in our spatio-temporal analysis, following the long transformation of the data, we downloaded the polygon shape files for all 11 countries in our data sample from the open source GADM database of global administrative areas (Hijmans et al, 2018). After appropriate treatments (see supplementary R codes), these shape files were used along with the long reformatted data for spatial analytics of country level aggregates, in addition to modelling the conditional mean and variance functions of the spatio-temporal process of equity price index fluctuations among the 11 countries in the European market, between 1999 and 2015.
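As a minimal sketch of the wide-to-long step, the toy example below applies `melt()` from "reshape2" to a three-day, three-country table. The values, the column names `date`, `country` and `eqreturn`, and the object names are assumptions made for illustration; the actual code is in the supplementary materials.

```r
library(reshape2)

# Toy wide-format table: one row per trading day, one column per national index.
# The numbers are made up for illustration only.
wide <- data.frame(
  date    = as.Date(c("1999-09-08", "1999-09-09", "1999-09-10")),
  Austria = c(0.12, -0.05, 0.30),
  Belgium = c(-0.20, 0.10, 0.07),
  France  = c(0.05, 0.02, -0.11)
)

# Long format: one row per (date, country) pair, with the return in a single column.
long <- melt(wide,
             id.vars       = "date",
             variable.name = "country",
             value.name    = "eqreturn")

print(long)
```

The long table has one row per day and country (here 3 x 3 = 9 rows), which is the panel layout needed to treat `country` as a spatial factor.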
Since the data are high frequency daily fluctuations of equity index values, we define three hierarchical temporal dimensions (daily, monthly, and yearly). To ease our analysis using conventional statistical methods, we aggregate the data at various levels along these three hierarchical temporal dimensions. Our first data aggregation "CountryOutcDatm" averages over the days and computes the moments (mean, standard deviation, kurtosis and skewness) of monthly fluctuations in the index values for the 11 countries, over 1999 to 2015. The second level of aggregation "CountryOutcDaty" averages over the days and months and computes the moments (mean, standard deviation, kurtosis and skewness) of annual fluctuations in the index values for the 11 countries, observed between 1999 and 2015. We also provide a third level of aggregation "CountryOutcDat", which averages across days within months, within years, and computes the moments of the panel wide aggregated fluctuations in index values for all 11 European countries. All data aggregations and empirical moment computations are done using functionalities from the R library "dplyr" (Hadley et al, 2019), as shown in the attached supplementary R codes.
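The three aggregation levels can be sketched with "dplyr" as follows. The data are simulated, the helper functions `skew()` and `kurt()` and the column names are assumptions introduced for this illustration, and only the three output names are taken from the description above.

```r
library(dplyr)

# Moment helpers based on central moments (hypothetical names, not from the paper).
skew <- function(x) { m <- mean(x); mean((x - m)^3) / mean((x - m)^2)^1.5 }
kurt <- function(x) { m <- mean(x); mean((x - m)^4) / mean((x - m)^2)^2 }

# Simulated long-format panel: 119 days (3 Jan to 30 Apr 2000) for two countries.
set.seed(1)
dates <- seq(as.Date("2000-01-03"), by = "day", length.out = 119)
long <- data.frame(
  date    = rep(dates, times = 2),
  country = rep(c("Austria", "Italy"), each = length(dates))
)
long$eqreturn <- rnorm(nrow(long))
long <- mutate(long, year = format(date, "%Y"), month = format(date, "%m"))

# One summary routine reused at each level of the temporal hierarchy.
moments <- function(grouped) {
  summarise(grouped,
            mean_ret = mean(eqreturn),
            sd_ret   = sd(eqreturn),
            skewness = skew(eqreturn),
            kurtosis = kurt(eqreturn),
            .groups  = "drop")
}

CountryOutcDatm <- moments(group_by(long, country, year, month))  # monthly moments
CountryOutcDaty <- moments(group_by(long, country, year))         # annual moments
CountryOutcDat  <- moments(group_by(long, country))               # panel-wide moments

print(CountryOutcDatm)
```

Reusing a single summary function keeps the definition of the moments identical across the monthly, annual and panel-wide tables.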

Descriptive Statistics of the reformatted Data
Table (1) summarizes the empirical distributional properties of the 12 series (daily fluctuations in equity index price), using their first four empirical moments and the coefficient of variation. The mean results suggest that, except for Italy, which records an overall reduction in the average value of its national stock market index, all remaining 10 countries record an improvement between 1999 and 2015. Similarly, regarding the variance and standard deviations of equity price changes, among all 11 national indices, that of Italy shows the greatest variation. The skewness and kurtosis results are in line with the characteristics of extreme value distributions, to which daily stock market fluctuations typically conform. As for the coefficient of variation, the daily fluctuations in index value show the greatest variation above the mean value in France, while Italy records the greatest variation below its mean value.
Table (2) below shows a 12 x 12 correlation matrix, summarizing the results of the test of cross-country correlation in daily equity price index fluctuations between the eleven European economies in the studied sample. The two-tailed test results at the 1% significance level suggest the statistical significance of all the correlation coefficients in the table, further suggesting a significant positive inter-dependence in equity price index fluctuations between the 11 European equity markets under investigation. In other words, the results suggest that the 11 European equity markets are significantly integrated.
Figure (1) below provides a full graphical characterization of the process of equity index value fluctuations across the eleven European countries in our studied sample. As the figure shows, the European equity index market evolves heterogeneously across both time and space, and therefore could be reasonably considered a spatio-temporal process in line with Dette et al (2020).
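The coefficient of variation and the pairwise correlation tests discussed above can be computed along the following lines. The series are simulated with a shared factor, and all object names are assumptions made for this sketch; nothing here reproduces the values in the tables.

```r
# Simulated daily returns sharing a common "European" factor (illustration only).
set.seed(42)
n <- 500
common <- rnorm(n)
returns <- data.frame(
  Market  = common,
  Austria = 0.7 * common + rnorm(n, sd = 0.5),
  Italy   = 0.6 * common + rnorm(n, sd = 0.8)
)

# Coefficient of variation: standard deviation relative to the mean of each series.
cv <- sapply(returns, function(x) sd(x) / mean(x))

# Pearson correlation matrix and two-sided p-values for every pair of series.
R <- cor(returns)
pvals <- matrix(NA_real_, ncol(returns), ncol(returns), dimnames = dimnames(R))
for (i in seq_len(ncol(returns))) {
  for (j in seq_len(ncol(returns))) {
    if (i != j) {
      pvals[i, j] <- cor.test(returns[[i]], returns[[j]],
                              alternative = "two.sided")$p.value
    }
  }
}

significant <- pvals < 0.01   # significance at the 1% level
print(round(R, 2))
print(significant)
```

Because mean daily returns are close to zero, the coefficient of variation can be very large in absolute value and takes the sign of the mean, which is worth keeping in mind when comparing countries.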

3.2. Econometric Model
Economically, to be sustainable any society needs to take care of the needs and wants of its contemporary members, without compromising the ability of its future members to do so. Financial resource needs in modern society are usually met through financial markets where savers (investors) and borrowers (entrepreneurs) meet to trade. Each financial market is characterized by an equilibrium trade price, which evolves over time based on both exogenous and endogenous market influences. Because of the typical integration of financial markets at various spatial locations (including regionally and internationally), equilibrium trade prices within integrated financial markets could be reasonably assumed to follow inter-related random spatio-temporal processes (RSTPs) (Dette et al, 2020), as previously discussed in the data generating process above. The mean and covariance functions of each national RSTP are assumed to depend on the regional average, but also on temporal influences (daily, monthly, and annual) as well as spatial influences (fixed and random). The fixed spatial influences reflect national market fundamentals, while the random influences are spatial random walks, which we capture using Markov random field smoothing over the spatial random parameter space.

Model specifications
As previously discussed in the data generating process section, we represent the conditional mean and variance functions of the spatio-temporal processes of stock market returns as mixed (linear and non-linear) models with known covariates whose coefficients are unknown and non-random, together with known basis functions whose coefficients are unknown and random.
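We adopt a generalized modeling perspective that accounts for both linear and non-linear model representations with spatial random effects. Our adopted framework builds on generalized additive models for location, scale and shape (GAMLSS) introduced by Rigby & Stasinopoulos (2005), where model parameters can be specified as functions of additive predictors with several types of covariate effects (including linear, non-linear, random and spatial effects). Given the nature of our empirical application, GAMLSS provide a suitable framework for modeling the mean and variance functions of the spatio-temporal processes of stock market returns. To this end, we define a generic predictor as:
$$\eta_i = \beta_0 + \sum_{k=1}^{5} s_k(\mathbf{z}_{ki}) \qquad (1)$$
Or more explicitly:
$$\eta_i = \beta_0 + s_1(\mathbf{z}_{1i}) + s_2(\mathbf{z}_{2i}) + s_3(\mathbf{z}_{3i}) + s_4(\mathbf{z}_{4i}) + s_5(\mathbf{z}_{5i}), \quad \forall\, i = 1, \ldots, n \qquad (2)$$
Where $\beta_0$ is the overall model intercept, and $\mathbf{z}_{ki}$ denotes the $k^{th}$ sub-vector of the complete covariate vector $\mathbf{z}_i$, which contains the three temporal predictors (day, month, year), one regional stock index (market) and one spatial predictor (country) as previously described. The five functions $s_k(\mathbf{z}_{ki})$ are generic smoothed effects chosen based on the type of each of the five covariates under consideration. Each $s_k(\mathbf{z}_{ki})$ is approximated as a linear combination of $J_k$ basis functions $b_{kj}(\mathbf{z}_{ki})$ and regression coefficients $\beta_{kj} \in \mathbb{R}$, that is:
$$s_k(\mathbf{z}_{ki}) = \sum_{j=1}^{J_k} \beta_{kj}\, b_{kj}(\mathbf{z}_{ki}) \qquad (3)$$
In this form, equation (3) implies that the vector of evaluations $(s_k(\mathbf{z}_{k1}), \ldots, s_k(\mathbf{z}_{kn}))^{\top}$ can be written as $\mathbf{Z}_k\boldsymbol{\beta}_k$, with $\boldsymbol{\beta}_k = (\beta_{k1}, \ldots, \beta_{kJ_k})^{\top}$ and design matrix $\mathbf{Z}_k$ whose $(i,j)^{th}$ element is $b_{kj}(\mathbf{z}_{ki})$, so that the predictor vector $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_n)^{\top}$ becomes:
$$\boldsymbol{\eta} = \beta_0 \mathbf{1}_n + \mathbf{Z}_1\boldsymbol{\beta}_1 + \mathbf{Z}_2\boldsymbol{\beta}_2 + \mathbf{Z}_3\boldsymbol{\beta}_3 + \mathbf{Z}_4\boldsymbol{\beta}_4 + \mathbf{Z}_5\boldsymbol{\beta}_5 \qquad (4)$$
Taking into account the actual temporal and spatial predictors, equation (4) can be written specifically as:
$$\boldsymbol{\eta} = \beta_0 \mathbf{1}_n + \mathbf{Z}_{day}\boldsymbol{\beta}_{day} + \mathbf{Z}_{month}\boldsymbol{\beta}_{month} + \mathbf{Z}_{year}\boldsymbol{\beta}_{year} + \mathbf{Z}_{market}\boldsymbol{\beta}_{market} + \mathbf{Z}_{country}\boldsymbol{\beta}_{country} \qquad (5)$$
Where $\mathbf{1}_n$ is an $n$-dimensional vector of ones. In a more compact notation, equations (4) and (5) can be rewritten as $\boldsymbol{\eta} = \mathbf{Z}\boldsymbol{\beta}$, with $\boldsymbol{\beta} = (\beta_0, \boldsymbol{\beta}_1^{\top}, \ldots, \boldsymbol{\beta}_5^{\top})^{\top}$ and $\mathbf{Z} = (\mathbf{1}_n, \mathbf{Z}_1, \ldots, \mathbf{Z}_5) = (\mathbf{1}_n, \mathbf{Z}_{day}, \mathbf{Z}_{month}, \mathbf{Z}_{year}, \mathbf{Z}_{market}, \mathbf{Z}_{country})$. In this representation, the smooth functions may represent linear, non-linear, random and spatial effects. Moreover, each $\boldsymbol{\beta}_k$ has an associated quadratic penalty $\lambda_k \boldsymbol{\beta}_k^{\top}\mathbf{D}_k\boldsymbol{\beta}_k$, which plays the role of enforcing specific properties on the $k^{th}$ function, such as smoothness. The smoothing parameter $\lambda_k \in [0, \infty)$ controls the trade-off between fit and smoothness, and plays the role of determining the shape of $\hat{s}_k(\mathbf{z}_{ki})$.
The overall penalty can be defined as $\boldsymbol{\beta}^{\top}\mathbf{D}\boldsymbol{\beta}$, with $\mathbf{D} = \mathrm{diag}(0, \lambda_1\mathbf{D}_1, \ldots, \lambda_5\mathbf{D}_5)$. For identification purposes, the smooth functions are mean centered following the procedure in Wood (2017).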
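To illustrate how a smoothing parameter trades fit against smoothness, the sketch below solves a penalized least-squares problem for a single smooth term, using a design matrix Z (intercept plus spline basis), a penalty matrix D, and a smoothing parameter lambda. Several choices are assumptions made only for this illustration: the data are simulated, the basis is a cubic B-spline from the base "splines" package, the penalty is a second-order difference penalty in the spirit of Eilers & Marx (1996), and the smoothing parameter is fixed by hand rather than estimated.

```r
library(splines)   # ships with base R

# Simulated covariate and response (illustration only).
set.seed(7)
n <- 200
z <- sort(runif(n))
y <- sin(2 * pi * z) + rnorm(n, sd = 0.3)

B   <- bs(z, df = 10)                                     # basis evaluations b_j(z_i)
Z   <- cbind(1, B)                                        # design matrix with intercept column
D_k <- crossprod(diff(diag(ncol(B)), differences = 2))    # difference penalty on the spline coefficients
D   <- rbind(0, cbind(0, D_k))                            # the intercept is left unpenalised

# Minimise ||y - Z beta||^2 + lambda * beta' D beta for a given lambda.
penalised_fit <- function(lambda) {
  beta <- solve(crossprod(Z) + lambda * D, crossprod(Z, y))
  list(beta      = beta,
       rss       = sum((y - Z %*% beta)^2),
       roughness = drop(t(beta) %*% D %*% beta))
}

wiggly <- penalised_fit(1e-4)   # small lambda: close fit, rougher curve
smooth <- penalised_fit(1e2)    # large lambda: smoother curve, larger residuals

print(c(rss_wiggly = wiggly$rss, rss_smooth = smooth$rss))
print(c(rough_wiggly = wiggly$roughness, rough_smooth = smooth$roughness))
```

As lambda grows, the residual sum of squares increases while the roughness term shrinks, which is the fit-versus-smoothness trade-off governed by the smoothing parameter.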
For variables with fully linear parametric effects, there is no assigned penalty, such that $\mathbf{D}_k = \mathbf{0}$, and equation (3) becomes $\mathbf{Z}_k\boldsymbol{\beta}_k$, with the design matrix $\mathbf{Z}_k$ obtained by stacking all covariate vectors $\mathbf{z}_{ki}$. This is typically the case for categorical variables such as the temporal predictors (day, month, year). To improve identifiability of the coefficients of such predictors, a ridge penalty could be used (Wood, 2017), such that $\mathbf{D}_k = \mathbf{I}$ (the identity matrix). We achieve this latter specification in the present study by entering the temporal effects as random effects. This is done in the R statistical software using the following representation for the three temporal predictors in the mean and variance functions:
$\mu$: eqreturn ~ s(day, bs = "re") + s(month, bs = "re") + s(year, bs = "re") (6)
$\sigma^2$: ~ s(day, bs = "re") + s(month, bs = "re") + s(year, bs = "re") (7)
Where "eqreturn" is the dependent variable, and more specifically the average index value or equity return; "day", "month" and "year" are the categorical temporal predictors; "bs" is an argument specifying the type of spline basis used, which in this case is "re" for random effect.
For continuous predictors of the mean and variance functions of the spatio-temporal processes of stock market returns, such as the proxied European stock index return (market), the smooth functions are represented using the regression spline approach (Eilers & Marx, 1996). In this case, the $i^{th}$ row of the design matrix $\mathbf{Z}_k$ is $(b_{k1}(z_{ki}), \ldots, b_{kJ_k}(z_{ki}))$, where the $b_{kj}(z_{ki})$ are low rank thin plate regression spline basis function evaluations for each daily market return $z_{ki}$, and are numerically stable with convenient mathematical properties (Wood, 2003). This specification allows us to avoid arbitrary modeling decisions, such as selecting the appropriate degree of a polynomial or specifying cutpoints, which could induce bias. Extending the mean and variance functions in equations (6) and (7), this specification is implemented in the R statistical software using the following representations:
$\mu$: eqreturn ~ s(day, bs = "re") + s(month, bs = "re") + s(year, bs = "re") + s(market, bs = "tp", k = 10) (8)
$\sigma^2$: ~ s(day, bs = "re") + s(month, bs = "re") + s(year, bs = "re") + s(market, bs = "tp", k = 10) (9)
Where "market" is the continuous covariate capturing the European stock index, with the basis spline argument "bs" set to "tp" (for penalized low rank thin plate spline) and 10 basis functions since "k" = 10. Although not specifically used here, other available continuous basis spline functions include the cubic regression spline "cr" and the P-spline "ps".
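A specification with random-effect temporal terms and a thin plate smooth of the market index, in both the location and scale equations, can be sketched as below. Note the substitutions: this sketch uses the "mgcv" package with its Gaussian location-scale family `gaulss()` as a stand-in for the estimation routine used in the paper, the data are simulated, and `gaulss()` parameterizes the second equation through the standard deviation rather than the variance. Convergence messages may appear on simulated data where some random effects are truly zero.

```r
library(mgcv)

# Simulated panel with categorical time indices and a continuous market return.
set.seed(123)
n <- 800
d <- data.frame(
  day    = factor(sample(1:5, n, replace = TRUE)),        # hypothetical weekday index
  month  = factor(sample(1:12, n, replace = TRUE)),
  year   = factor(sample(1999:2003, n, replace = TRUE)),
  market = rnorm(n)
)
year_eff   <- rnorm(nlevels(d$year), sd = 0.3)            # true random year effects
d$eqreturn <- year_eff[as.integer(d$year)] + 0.8 * d$market +
  rnorm(n, sd = exp(-0.5 + 0.3 * d$market))               # scale varies with the market

# First formula: location (mean). Second formula: scale.
fit <- gam(
  list(
    eqreturn ~ s(day, bs = "re") + s(month, bs = "re") + s(year, bs = "re") +
      s(market, bs = "tp", k = 10),
    ~ s(day, bs = "re") + s(month, bs = "re") + s(year, bs = "re") +
      s(market, bs = "tp", k = 10)
  ),
  family = gaulss(),
  data   = d
)

print(summary(fit))
```

The temporal predictors must be factors for the "re" basis to treat their levels as random effects; `k = 10` sets the basis dimension of the thin plate smooth before the identifiability constraint is applied.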
To incorporate the spatial effects into the regression model, the 11 European countries covered in the study are treated as discrete contiguous geographic units, with spatial information exploited through a Markov random field approach. This latter approach relies on the information contained in neighboring stock market returns from each country. In this case, equation (3) becomes $\mathbf{Z}_k\boldsymbol{\beta}_k$, with $\boldsymbol{\beta}_k = (\beta_{k1}, \ldots, \beta_{kR})^{\top}$ representing the vector of spatial effects. $R = 11$ denotes the total number of countries, and $z_{ki}$ takes its values in a set of country labels. The design matrix linking each daily stock return to the corresponding spatial effect is defined for $r = 1, \ldots, 11$ by:
$$\mathbf{Z}_k[i, r] = \begin{cases} 1 & \text{if observation } i \text{ belongs to country } r \\ 0 & \text{otherwise} \end{cases} \qquad (10)$$
The smoothing penalty is based on the neighborhood structure of the discrete contiguous geographic units, so that stock market returns from spatially adjacent countries share similar effects. Specifically, the matrix associated with the quadratic penalty is given by:
$$\mathbf{D}_k[r, q] = \begin{cases} -1 & \text{if } r \neq q \text{ and } r \sim q \\ 0 & \text{if } r \neq q \text{ and } r \nsim q \\ N_r & \text{if } r = q \end{cases} \qquad (11)$$
Where $r \sim q$ indicates that the two countries $r$ and $q$ are adjacent neighbors, and $N_r$ is the total number of neighbors for country $r$. In a stochastic interpretation, the resulting quadratic penalty is equivalent to assuming that the spatial effects follow a Gaussian Markov random field (Rue & Held, 2005). Finally, extending the mean and variance functions in equations (8) and (9), this specification is implemented in the R statistical software using the following representations:
$\mu$: eqreturn ~ s(day, bs = "re") + s(month, bs = "re") + s(year, bs = "re") + s(market, bs = "tp", k = 10) + s(country, bs = "mrf", xt = xt) (12)
$\sigma^2$: ~ s(day, bs = "re") + s(month, bs = "re") + s(year, bs = "re") + s(market, bs = "tp", k = 10) + s(country, bs = "mrf", xt = xt) (13)
Where "country" is a factor variable with 11 levels, indexing each country in the data, and "mrf" is the Markov random field smoother. All described smoothers are available from the R package "GJRM", whose documentation can be consulted for more details (see Wojtys & Marra, 2018).
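The neighborhood-based penalty and the country design matrix described above can be constructed in base R as sketched below. The adjacency list is an assumption made for illustration: it uses shared land borders among the 11 countries, which leaves Sweden and the United Kingdom without neighbors, whereas the neighborhood structure in the paper is derived from the GADM shape files and may differ.

```r
countries <- c("Austria", "Belgium", "Denmark", "France", "Germany", "Hungary",
               "Italy", "Netherlands", "Spain", "Sweden", "United Kingdom")

# Assumed land borders among the sampled countries (one row per adjacent pair).
borders <- rbind(
  c("Austria", "Germany"), c("Austria", "Hungary"), c("Austria", "Italy"),
  c("Belgium", "France"),  c("Belgium", "Germany"), c("Belgium", "Netherlands"),
  c("Denmark", "Germany"), c("France", "Germany"),  c("France", "Italy"),
  c("France", "Spain"),    c("Germany", "Netherlands")
)

# Penalty matrix: -1 for adjacent pairs, number of neighbours on the diagonal.
D <- matrix(0, length(countries), length(countries),
            dimnames = list(countries, countries))
D[borders]          <- -1        # index by (row name, column name) pairs
D[borders[, 2:1]]   <- -1        # symmetric counterpart
diag(D)             <- -rowSums(D)

# Design matrix: one indicator column per country, one row per observation.
obs <- factor(c("Italy", "France", "Italy"), levels = countries)  # toy observations
Z <- 1 * outer(as.integer(obs), seq_along(countries), "==")
colnames(Z) <- countries

print(diag(D))
```

In the "mgcv" package, a penalty matrix of this form, with row and column names matching the factor levels, can be supplied to the "mrf" smoother through `xt = list(penalty = D)`; whether the paper's `xt` object was built this way or from a neighbor list is not stated in the text.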
We estimate the parameters of the above described model specification within the R statistical software using the "gamlss()" function from the R library "SemiParBIVProbit" (see Marra & Radice, 2017).

Results
Relying on the underlying philosophy of empirical process theory (Dehling & Philipp, 2002), our empirical analysis of the reformatted data jointly models the mean function and the variance function of the column vector of concatenated time series. In this model representation, the full vector is seen as a random structure (Merlevède et al, 2019) describing the process of daily changes in the equity price index across the eleven integrated European stock markets.
Making distributional assumptions about the empirical moment generating function (Cabaña & Quiroz, 2005; Collender & Chalfant, 1986) for this random vector, we are able to study the properties of its first and second order moments. These are represented by the mean and variance equations, which capture respectively the conditional expected fluctuations in the equity price index and the conditional variance of those fluctuations over the period from September 8th, 1999 to October 16th, 2015, across the 11 national equity markets in Austria, Belgium, Denmark, France, Germany, Hungary, Italy, the Netherlands, Spain, Sweden, and the United Kingdom.
Note to table (2): Edf is the estimated effective degrees of freedom; C.I. is the confidence interval on the estimated coefficient. Significance codes: *** for the 0.1% level, ** for the 1% level, and * for the 5% level.
For sensitivity analysis, we estimate a linear process (parametric) model and a non-linear process (semi-parametric) model, both with random country effects. The performances of these two specifications are subsequently contrasted based on the Bayesian and the Akaike information criteria. Table (2) above summarizes the results of these estimations. Based on the model performance measures, the non-linear process model is adopted as the preferred specification, and its results are discussed in the sub-sections below.
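As a reminder of how such a comparison works, the sketch below computes both criteria from a model's maximized log-likelihood and selects the specification with the lower value. The log-likelihoods, parameter counts and sample size are hypothetical placeholders, not the values behind table (2); for a semi-parametric model the parameter count would typically be the total edf.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: k log(n) - 2 log L."""
    return math.log(n_obs) * n_params - 2.0 * loglik


def preferred(fits, n_obs, criterion="bic"):
    """Return the name of the fit with the lowest value of the chosen criterion."""
    scores = {}
    for name, fit in fits.items():
        if criterion == "aic":
            scores[name] = aic(fit["loglik"], fit["n_params"])
        else:
            scores[name] = bic(fit["loglik"], fit["n_params"], n_obs)
    return min(scores, key=scores.get)


# Hypothetical fits (placeholders, not the paper's estimates).
fits = {
    "parametric": {"loglik": -5200.0, "n_params": 12.0},
    "semi-parametric": {"loglik": -5000.0, "n_params": 45.0},
}
n_obs = 4000

print(preferred(fits, n_obs, "aic"), preferred(fits, n_obs, "bic"))
```

Both criteria trade goodness of fit against complexity; the Bayesian criterion penalizes additional parameters more heavily whenever log(n) exceeds 2, so the two can disagree when the gain in likelihood is small.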

Conditional Mean equation results for equity price index fluctuations
Recall that in semi-parametric models with smooth function estimates, when the effective degrees of freedom (edf) of a smooth term is close to 1, the respective estimated effect is linear, and hence the covariate can enter the model parametrically. Conversely, the higher the edf value, the more complex the estimated curve, and the corresponding covariate cannot be assumed to have a linear relationship with the outcome variable.
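The link between the edf and the shape of the estimated curve can be illustrated with a toy penalized regression. The sketch below is an assumption-laden illustration, not the thin plate spline used in the study: it uses a centered polynomial basis (x, x², x³, x⁴) in which only the non-linear terms are penalized, and it computes the edf as the trace of $(X^{\top}X + \lambda S)^{-1}X^{\top}X$. All function names and the grid of x values are hypothetical.

```python
def solve(b, a):
    """Solve B M = A for M by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(b[i]) + list(a[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def edf(lam, n_obs=200):
    """Effective degrees of freedom of the penalized fit for smoothing parameter lam."""
    xs = [-1.0 + 2.0 * i / (n_obs - 1) for i in range(n_obs)]
    cols = [[x ** p for x in xs] for p in (1, 2, 3, 4)]
    cols = [[v - sum(c) / n_obs for v in c] for c in cols]  # center each column
    k = len(cols)
    xtx = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(k)]
           for i in range(k)]
    penalty = [0.0, 1.0, 1.0, 1.0]  # the linear term is left unpenalized
    b = [[xtx[i][j] + (lam * penalty[i] if i == j else 0.0) for j in range(k)]
         for i in range(k)]
    m = solve(b, xtx)
    return sum(m[i][i] for i in range(k))


for lam in (0.0, 1.0, 1e9):
    print(lam, round(edf(lam), 4))
```

With no penalty the edf equals the number of basis functions; as the penalty grows, the non-linear coefficients are shrunk toward zero and the edf approaches 1, which corresponds to a straight-line effect.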
As shown in the lower portion of table (2) under "Semi-parametric model", the edfs for the smooth components of day, month, and year are all equal to 1, with statistically non-significant p-values. Conversely, the overall European regional market effect is significant at the 5% level. The edf value of the random country effects is 2.832, thus above 1 and suggesting a non-linear influence on the individual markets' fluctuations; however, since its corresponding p-value is 0.0626, this effect only reaches statistical significance at the 10% level. Together, the above suggests that expectations about future changes in the equity price index, by economic actors within the individual national markets, are not significantly influenced by daily, monthly or annual perturbations in each national economy. Instead, the mean function of the daily fluctuations in equity price index value appears to depend solely, but weakly, on the fluctuations in the overall European average index price. These results are further confirmed by the smooth function plots in figure (2) below, which are estimated after fitting the spatio-temporal model.

Figure (2): Smooth function plots for the effects of daily, monthly, yearly, and market-level perturbations on expected equity index price changes across the eleven financial markets.

Conditional Variance equation results for equity price index fluctuations
Also shown in the lower portion of table (2) under "Semi-parametric model", the edfs for the smooth components of daily (3.415), monthly (8.657), yearly (8.955), European market level (8.052), and individual country level (9.992) fluctuations are all well above 1, with p-values that are statistically significant at the 0.1% level. These indicate the non-linear and significant effects of all the above-mentioned covariates on the variances of the individual European equity price index fluctuations. Indeed, daily variations in price index fluctuations are seen to depend not only on overall European average index fluctuations, but also on daily, monthly and annual perturbations taking place in each individual market. These algebraic results are further confirmed by the smooth function plots in figure (3).
On the monthly scale, variations in equity price index changes seem to exhibit a cascading, but overall convex, shape over the course of the year. The smooth function plot in the top-right panel of figure (3) shows that the variance in daily price index fluctuations starts the year by increasing between January and February. From February, it decreases, reaching its minimum in mid-year (June), from which point it starts to increase again, reaching its maximum in November. It then decreases as the year ends in December, before spiking back up at the beginning of the next year.
On the annual scale, variations in equity price index changes seem to exhibit a cyclical pattern over the 15 years covered by the data sample, as seen in the lower-left panel of figure (3). It can be noted that the variation in equity index price changes is concave between 1999 and 2005, reaching a maximum in the early 2000s. Although a local maximum appears to characterize the period of the 2008 financial crisis, the level of variation in daily equity price fluctuations during this period remained below that observed in the early 2000s. A relatively steeper reduction in the variance of equity price index fluctuations is observed between 2008 and 2010. This reduction appears to have persisted between 2010 and 2015, although at a slower pace.
Finally, the lower-right panel of figure (3) suggests that changes in the overall European market average index value exert a convex effect on index price fluctuations in the individual national markets. Indeed, variations in the individual national stock market fluctuations are minimized at overall European market index price changes of zero, while strictly increasing for overall European stock market index price changes above or below zero. This further validates the above-described algebraic properties in table (2).

3.5. Empirical results assessment
Assessing our results through the lens of time series modelling, it can be noted that they corroborate the theoretical predictions of stochastic mixing processes, which Meir (2000, p. 13) shows converge uniformly under some regularity conditions. Indeed, starting with the work of Yu (1994) on dependent sequences, scholars have shown that among time series there exists a special class of processes for which the "future" depends only weakly on the "past"; these have been referred to as "mixing processes" (Dehling & Philipp, 2002; Rio, 2017). In our analysis, the fact that the conditional expectation of equity index price changes depends only weakly on the overall changes in the European regional average index value, while being free from individual markets' daily, monthly, and annual perturbations, suggests that European stock market index prices evolve stochastically as mixing processes. This is further reaffirmed by the results of the conditional variance equation, which show the variations in daily equity index price in the European market to depend significantly on daily, monthly, and annual perturbations.
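The notion of weak dependence on the past can be illustrated with a simple simulated example. The sketch below uses a stationary first-order autoregressive process, a textbook case of a mixing process, and tracks how its sample autocorrelation fades as the lag grows. The autoregressive coefficient, sample size, seed and function names are assumptions chosen for illustration; the sample autocorrelation is only a rough proxy for mixing, and nothing here is estimated from the equity index data.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal innovations."""
    rng = random.Random(seed)
    x, series = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def autocorr(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


series = simulate_ar1(phi=0.5, n=5000)
acf = {lag: autocorr(series, lag) for lag in (1, 5, 20)}
for lag, value in acf.items():
    print(lag, round(value, 3))
```

For such a process the theoretical autocorrelation at lag h is phi to the power h, so the influence of the past on the future vanishes geometrically as the horizon lengthens.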
These results have significant implications for the stability of the European stock market and financial sector, and should provide financial regulators in the region with additional evidence for their policy efforts. In line with the vision of the European Institute of Innovation and Technology (EIT) 1, which was created in 2008 by the European Commission as an open innovation program in charge of translating the knowledge of open science into platforms, architectures and systems that bring significant value to society, our empirical investigation points to the potential value of creating a unique index, along the lines of the existing "ethical stock market indices", which would track the equity values of European companies involved in open innovation. Doing so would give open innovation investors key market updates for their portfolio investment decisions, and may help to attract the funds needed to support European open innovation initiatives, such as ATTRACT 2. This latter initiative, by a group of large European research laboratories including CERN, the European Molecular Biology Laboratory, and the European Synchrotron Radiation Facility, aims to translate the outputs of open science into open innovations, in partnership with universities, SMEs, multinationals and private investors. According to the quadruple helix framework (Miller et al, 2018; Yun, 2019), such open science based innovation practices are more sustainable than the previous closed innovation paradigm.

Discussion
The sharing economy, an umbrella concept that encompasses several ICT developments and technologies (Acquier et al, 2017), including collaborative consumption of goods and services through online platforms, has emerged as an economic-technological phenomenon (Laurenti et al, 2019). Its recent growth has been fueled by developments in ICT, growing consumer awareness, the proliferation of collaborative web communities (Ranjbari, 2019), and increasing concerns over ecological, societal and developmental impact (Hamari et al, 2016). Debates nevertheless persist as to whether it could be a successful pathway to sustainability in times of rapid technological development but constrained financial resources (Martin, 2016; Geissinger, 2019; Murillo, 2020). Although others have focused on how the data deluge, along with artificial intelligence, is reshaping the field of economics altogether (Athey, 2018), in this study we explored instead how sharing and reusing scientific research data, as digital goods, could contribute to sustainable research output production and economic growth. We achieved this by relying on a three stage stratified clustered random sample from the Journal of Applied Econometrics Data Archive, along with descriptive analytics and spatio-temporal econometric modelling and inference.
We find that despite the strong deductive reasoning requirement for the successful reuse of openly shared data 3 in subsequent analyses, such practice does provide a viable solution for the sustainability of research output production, innovation and economic development. Sustainable value creation is reflected not only in direct economic value, but also in better social and environmental outcomes (Geissinger, 2019). This is even more apparent if we adopt a "data value chain" perspective to link openly shared data, as raw material, to the digital contents, goods and services produced (De Reuver et al, 2018). Such a perspective is underpinned, however, by the interplay of two fundamental mechanisms (Jetzek et al, 2019). The first is the information sharing mechanism, which relates to how openly shared data are used to create informational content that creates value for society through increased transparency, reduced information asymmetry, and improved decision-making. The second is the market mechanism, which relates to how openly shared data help make processes more efficient, and often satisfy previously unmet needs by providing the raw material for the production of digital goods and services that are subsequently sold in markets.
Sustainable value generation through openly shared data is based on creating an opportunity for anyone to reuse data beyond the organizational boundaries of the data custodian and the technical boundaries of the originating system (Murillo, 2020). Making the value creation itself sustainable requires re-users to complement open data with proprietary data sources, and to use the enriched data in combination with specialized algorithms and technical infrastructures for the development of digital content, products, and services (Welle Donker & van Loenen, 2017). Although the idea of "data recycling" remains under-researched at present, its connection with "collaborative consumption" through open data sharing and reuse is increasingly regarded as a practice that engages environmentally and ecologically conscious consumers in particular (Jetzek et al, 2019).
Our analysis further supports the idea that viewing research data recycling as a sustainable practice, like any other recycling activity, can lead to an increase in data sharing and reuse, especially when adopting this view leads to more positive attitudes towards participation (Turki et al, 2019). Because aspirations to sustainability do not always translate strongly into action, expectations of a wider diffusion of data sharing and reuse within the research community might be deflated. It may be that, opportunistically, people seeking economic benefits end up adopting data sharing and reuse as an alternative mode of scientific research production (Houtkoop et al, 2018). Or, in a worst case scenario, some researchers might altruistically share their research data openly, while other researchers reap most of the benefits of such sharing. This situation, however, would undermine the sustainability of data sharing and reuse if proper compensating mechanisms are not put in place. As initiatives such as the "Research Data Alliance" 4, and new journals such as Elsevier's "Data in Brief", MDPI's "Data", and Nature's "Scientific Data" are created with an inter-disciplinary focus, allowing researchers to openly publish and get full citable credit for their research data, the practice of data recycling through open data sharing and reuse should find its way to making scientific research production more sustainable. It is also our hope that initiatives such as the "Open Science Grid" 5 will successfully contribute to fostering wider, faster, and cheaper access to new knowledge, promoting more rapid understanding and use of science.

Conclusion
The sharing economy has been emerging as one of the key paradigms in support of the Fourth Industrial Revolution (Kim & Lee, 2019). As a result, in March 2017, the U.S. National Academies of Sciences, Engineering, and Medicine (NASEM) appointed an expert committee to evaluate more fully the benefits and challenges of broadening access to the results of scientific research, described as "open science". The committee was charged with focusing on how to move toward open science as the default for scientific research results, and with indicating both the benefits of open science and the barriers to achieving it. The resulting consensus report published by the NASEM pointed out significant benefits of open science moving forward (National Academies of Sciences, Engineering, and Medicine, 2018a). Since then, there has been a growing consensus in the scientific community that the transition to data-driven open science is best achieved by establishing a globally interoperable research infrastructure (National Academies of Sciences, Engineering, and Medicine, 2018b). This has led the Board on Research Data and Information (BRDI) of the NASEM to organize a follow-up symposium on November 1st, 2017 to explore the issues of making research data findable, accessible, interoperable, and reusable (abbreviated as FAIR). The current investigation was carried out within this general context, with the understanding that the value-generating mechanisms of openly shared data remain mostly unobservable, and that research on the underlying topic of data recycling for sustainability is still in a nascent state (Meijer & Grimmelikhuijsen, 2017). Relying therefore on a resourceful research approach that focused on showcasing the re-use potential of a key digital data archive, we hoped to provide potential reusers with a case study example that could assist in further prospective investigations using similar open data sources.
This practice contributes to the literature on how and when researchers reuse data they obtain from openly shared sources, thereby easing the transition into the era of "citizen science" or "crowdscience" (Teo, 2020). As suggested in the recently published comment in Nature's "Scientific Data" journal (Lin et al, 2020), the issue of open data sharing and reuse, and its "trust" requirement between the different stakeholders, including data repositories, needs to be at the center of ongoing scientific discourse. We hope that our current treatment, as a step in that direction, will draw further interest in the topic.