ARTICLE | doi:10.20944/preprints202305.0390.v1
Subject: Public Health And Healthcare, Other Keywords: exploratory data analysis; non-parametric statistics; skewed data; survival analysis; repeated measures.
Online: 6 May 2023 (08:32:28 CEST)
Outliers can influence regression model parameters and change the direction of the estimated effect, over-estimating or under-estimating the strength of the association between a response variable and an exposure of interest. Identifying visit-level outliers from longitudinal data with continuous time-dependent covariates is important, especially when the distribution of such a variable is highly skewed at follow-up visits. The primary objective was to identify potential outliers at follow-up visits using the interquartile range (IQR) statistic, motivated by a large TEDDY dietary longitudinal and time-to-event dataset with continuous time-varying vitamin B12 intake as the exposure of interest and time to developing Islet Autoimmunity (IA) as the response variable. The IQR method was also applied to simulated data. To assess the impact of outliers detected by the IQR method, data were analyzed using a Cox proportional hazards model with a robust sandwich estimator. Partial residual diagnostic plots were used to detect highly influential outliers. Results showed that some of the detected outliers had a large influence on the Cox regression model and changed both the direction of hazard ratios and the strength of association with the risk of developing IA. In conclusion, the IQR method is useful for identifying potential visit-level outliers, which can then be further investigated.
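As an illustration of the visit-level IQR idea described in this abstract, the following minimal sketch flags values that fall outside the IQR fences computed separately for each visit. The function name, the record layout and the fence multiplier k = 1.5 (the common Tukey convention) are assumptions made here for illustration; the abstract does not state the exact rule the authors used.

```python
from collections import defaultdict
from statistics import quantiles


def flag_visit_outliers(records, k=1.5):
    """Return the (subject, visit, value) records outside their visit's IQR fences.

    k = 1.5 is an assumed multiplier, not a value taken from the abstract.
    """
    # Group the observed values by visit.
    by_visit = defaultdict(list)
    for _subject, visit, value in records:
        by_visit[visit].append(value)

    # Compute lower and upper fences per visit.
    fences = {}
    for visit, values in by_visit.items():
        q1, _median, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        fences[visit] = (q1 - k * iqr, q3 + k * iqr)

    # Keep only the records that fall outside their visit's fences.
    return [
        (subject, visit, value)
        for subject, visit, value in records
        if not fences[visit][0] <= value <= fences[visit][1]
    ]


# Hypothetical intake values for six subjects at two follow-up visits.
records = [
    ("s1", 1, 2.0), ("s2", 1, 2.5), ("s3", 1, 3.0),
    ("s4", 1, 3.5), ("s5", 1, 4.0), ("s6", 1, 40.0),
    ("s1", 2, 2.2), ("s2", 2, 2.8), ("s3", 2, 3.1),
    ("s4", 2, 3.3), ("s5", 2, 3.9), ("s6", 2, 4.1),
]
flagged = flag_visit_outliers(records)
```

Flagged records are candidates for further investigation, in line with the abstract's conclusion, rather than values to be dropped automatically.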
BRIEF REPORT | doi:10.20944/preprints202010.0082.v1
Subject: Engineering, Automotive Engineering Keywords: exploratory analysis; model selection; MLR; K fold cross validation
Online: 5 October 2020 (12:16:38 CEST)
In this project, we use multiple linear regression to study the impact of eight predictors (relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area, glazing area distribution) and to estimate the cooling load (CL) energy efficiency of residential buildings. We analyze and visualize the effect of each predictor on the response variable using classical statistical tools for describing linear models, so that we can identify the most strongly related predictor variables. Before doing so, we perform model selection by stepwise regression, compare the AIC of the candidate models, and identify the best model among them. Then, using a classical linear regression approach, simulations on 768 diverse residential buildings show that we can predict CL with low mean absolute error. Using ANOVA, we determine the variation in the residuals, and we use a non-constant variance test to verify it. Furthermore, we check for leverage points, influential points and outliers, and compute Cook's distance for the influential points. Using a Box-Cox transformation and weights, we also fit the model with the weighted least squares (WLS) technique to obtain better results, and carry out the other important analyses needed to understand energy efficiency. Finally, we use 5-fold cross-validation to verify our model.
ARTICLE | doi:10.20944/preprints201905.0073.v1
Subject: Social Sciences, Education Keywords: VET, Government Plan, effectiveness, exploratory survey, Malaysia
Online: 7 May 2019 (10:32:49 CEST)
In Malaysia, as in many parts of the world, vocational education and training (VET) is frequently perceived as the solution to improving the opportunities of youths who lack the resources, skills or motivation to continue with higher education. This study focuses on the effectiveness of the apprenticeship scheme during the 10th Malaysia Plan. It may provide an opportunity to find out how the related parties react to the apprenticeship scheme. This is important, as feedback is central to the success or failure of any scheme. It is expected that the relevant government bodies, private sectors, trainers and trainees will gain valuable insight into the progress thus far and what needs to be done in the future based on the outcome of the research. This study follows the snowball sampling method and gathers information from apprentices from a variety of industrial sectors. The findings indicate effectiveness in some types of training but a lack of comprehensiveness, efficient use of resources and future direction, especially during the 10th Malaysia Plan period from 2011 to 2016. This exploratory research is the first chapter of a deeper study in this niche.
REVIEW | doi:10.20944/preprints202205.0004.v1
Subject: Medicine And Pharmacology, Epidemiology And Infectious Diseases Keywords: COVID-19; Exploratory Search; Machine Learning; Document Retrieval
Online: 4 May 2022 (12:20:15 CEST)
The urgency of the COVID-19 pandemic caused a surge in related scientific literature. This surge made the manual exploration of scientific articles time-consuming and inefficient. Therefore, a range of exploratory search applications have been created to facilitate access to the available literature. In this survey, we give a short description of certain efforts in this direction and explore the different approaches that they used.
ARTICLE | doi:10.20944/preprints202301.0546.v1
Subject: Biology And Life Sciences, Anatomy And Physiology Keywords: motor activity; orienting exploratory activity; epigenetic inheritance; conception; rats
Online: 30 January 2023 (07:53:43 CET)
The aim of this work was to study the, so far, unexplored possibility that non-genetic inheritance of animal behavioral characteristics could depend on the state of the parents at the time of conception, by studying inheritance of high and low motor and exploratory activity in the first generation of rats. In this study, we measured the levels of motor and exploratory activity in rats at the ages of 2 and 5 months. Male and female rats were mated at the age of 5 months. The following groups were used: male and female rats with high motor activity at ages of 2 and 5 months (ACT+); male and female rats with high activity at the age of 2 months, but low activity at the age of 5 months (ACT–); male and female rats with low activity at the ages of 2 and 5 months (PAS–); male and female rats with low activity at the age of 2 months, but high activity at the age of 5 months (PAS+). Significant differences in the severity of exploratory activity were found between the offspring of ACT+ and ACT– rats. Moreover, these differences were observed only in males, and not in females. Differences between the offspring of PAS+ and PAS– rats were observed in both the male and female rats. The motor activity of animals in the period from 20 minutes after the start of registration did not differ between groups. Thus, it can be considered that individual characteristics of general motor activity are due to genetically inherited factors, while differences in the level of exploratory activity, apparently, are formed due to non-genetic influences from parents during mating.
ARTICLE | doi:10.20944/preprints202311.1796.v1
Subject: Business, Economics And Management, Finance Keywords: Dividend optimization; entropy regularization; distributional control; exploratory HJB
Online: 28 November 2023 (10:23:46 CET)
This paper studies the dividend optimization problem in the entropy regularization framework by following the same continuous-time reinforcement learning setting as in Wang et al. (2020). The exploratory HJB is established and the optimal exploratory dividend policy is a truncated exponential distribution. We show that, for suitable choices of the maximal dividend paying rate and the temperature parameter, the value function of the exploratory dividend optimization problem could be significantly different from the value function in the classical dividend optimization problem. In particular, the value function of the exploratory dividend optimization problem could be classified into three cases based on its monotonicity. Numerical examples are also presented to show the impact of temperature parameter on the solution.
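To make the notion of a truncated exponential policy concrete, the sketch below gives the quantile (inverse CDF) function of a distribution on [0, upper] whose density is proportional to exp(rate * x). The parameter names `rate` and `upper` are hypothetical: in the paper the shape presumably depends on the value function and the temperature parameter, and that dependence is not given in the abstract, so it is not reproduced here.

```python
import math


def truncated_exp_quantile(u, rate, upper):
    """Quantile of a density proportional to exp(rate * x) on [0, upper].

    u is a probability in [0, 1]; rate and upper are illustrative parameters.
    """
    if rate == 0:
        # The density is flat, so the distribution is uniform on [0, upper].
        return u * upper
    # Invert F(x) = (exp(rate * x) - 1) / (exp(rate * upper) - 1).
    return math.log1p(u * math.expm1(rate * upper)) / rate


# With a negative rate, probability mass concentrates near zero.
median_low = truncated_exp_quantile(0.5, rate=-2.0, upper=1.0)
# With a positive rate, mass concentrates near the upper bound.
median_high = truncated_exp_quantile(0.5, rate=2.0, upper=1.0)
```

Feeding uniform random numbers through this function yields samples from the truncated exponential distribution, which is one way a randomized (exploratory) dividend rate could be drawn.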
ARTICLE | doi:10.20944/preprints202203.0189.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Smart device; Users behavior; human computer interaction; exploratory analysis; statistical methods
Online: 14 March 2022 (12:28:44 CET)
Purpose: The use of smart devices has increased greatly in the last ten years, with users seeking to do more with them, especially in networking. In this context, there is a need to understand the connection between users’ socio-demographic factors and the way they relate to their smart devices. Objective: This study was designed to evaluate the sense of belonging to a community in order to evaluate the intangible benefits that employees may gain from a more immersive relationship with their devices. Method: We used a dataset of 586 anonymous respondents to an existing survey designed to capture the relationships that humans develop with their smart devices. In particular, we investigated the relationships between smart devices and particular background variables of the respondents using a chi-square test. Results: The study showed that there is a significant relationship between users’ sex and smart device type and their dependency on smart devices. Males tend to think that smart devices (in general) enable them to connect with a larger community. At the same time, females using smartphones feel more connected to a large community than when using other smart devices. Conclusion: This study provided several significant findings that confirm and strengthen the previous literature on the subject. In addition, socio-demographic variables (like gender) as well as the type of smart device are correlated with users’ tendency to stay in touch with a larger community.
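The chi-square test of independence named in the Method can be sketched as below for a contingency table of counts. The counts are hypothetical and do not come from the survey; the p-value helper covers only the one-degree-of-freedom case (a 2x2 table), where the chi-square tail probability reduces to a complementary error function.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_dof1(statistic):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical 2x2 table: rows = sex, columns = agrees / disagrees with a statement.
table = [[30, 10], [10, 30]]
statistic, dof = chi_square_independence(table)
p_value = p_value_dof1(statistic)
```

A small p-value indicates that the two categorical variables are unlikely to be independent; it does not by itself indicate the direction or size of the association.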
ARTICLE | doi:10.20944/preprints202307.1577.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: deep learning, transformer, drug use detection, exploratory data analysis, natural language processing.
Online: 24 July 2023 (10:57:47 CEST)
Social media platforms are increasingly enabling the propagation of content from groups related to drug use, thus posing risks for the wider population and, in particular, individuals who are amenable to drug use and drug addiction. The detection of drug use content on social media platforms is a priority for governments, technology companies, and drug law enforcement organizations. To counter this issue, various techniques have been developed to identify and promptly remove drug use content, while also blocking its creators from network access. In this paper, we introduce a manually annotated Twitter dataset, comprising 156,521 tweets published between 2008 and 2022, specifically compiled for the purpose of drug use detection. The dataset was annotated by several groups of expert annotators, who classified the tweets as either drug use or non-drug use. Exploratory data analysis was conducted to understand the dataset's characteristics. Various classification algorithms, including SVM, XGBoost, RF, NB, LSTM, and BERT, were applied to the dataset. Among the traditional machine learning models, SVM using term frequency-inverse document frequency features achieved the highest F1-Score (0.9017). However, BERT with textual features concatenated with numerical and categorical features in an ensemble method surpassed the performance of the traditional models, attaining an F1-Score of 0.9112. To facilitate future research and enhance the accuracy of English online drug use classification, the dataset will be made publicly available.
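For readers unfamiliar with the metric reported above, the F1-Score is the harmonic mean of precision and recall for the positive class. The sketch below computes it for binary labels; the label encoding (1 = drug use, 0 = non-drug use) and the toy labels are assumptions, not data from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1-Score of the positive class for two equal-length label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = drug use, 0 = non-drug use.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
```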
ARTICLE | doi:10.20944/preprints202104.0636.v1
Subject: Business, Economics And Management, Accounting And Taxation Keywords: collaborative economy; sharing economy; switchover; obtainer; provider; values; learning; mutuality; exploratory research
Online: 23 April 2021 (12:05:17 CEST)
The collaborative economy comprises resource circulation systems where consumers can act as both obtainers and providers of products and services. Despite considerable research on collaborative economies, there is a dearth of understanding of how individuals switch from being an obtainer to a provider. We address this void by drawing on 31 in-depth semi-structured interviews with collaborative economy obtainers. The findings suggest that personal values, learning experience, social benefits, mutuality, and peer influence drive obtainers to become providers. In contrast, distrusting strangers, a sense of intimacy, a lack of resources to share, and a lack of skills inhibit the switchover process. Our findings contextualize the drivers and inhibitors idiosyncratically to convert obtainers into providers, offer important implications for managers, contribute to the collaborative economy and sharing economy literature, and suggest compelling avenues for future research.
ARTICLE | doi:10.20944/preprints201810.0662.v1
Subject: Engineering, Energy And Fuel Technology Keywords: renewable energy; future perspectives; renewable energy sources; Romania energy structure; exploratory study
Online: 29 October 2018 (07:22:02 CET)
In 2015, Romania was the first country in Europe to achieve the EU targets regarding the share of renewables in the generation mix, far ahead of the 2020 deadline. Starting with the energy structure in Romania, the paper: (1) analyses the evolution of the main indicators in the renewable energy sector, (2) discloses the perspectives of renewable energy in Romania, synthesizing the main trends of development in the field, and (3) analyses the challenges facing the development of renewable energy in Romania. Based on an analysis of the exploratory data, the paper makes a preliminary prediction of the development of the sector for the coming decades and proposes targeted countermeasures and suggestions. Romania still has unexploited potential concerning renewable energy sources. Because Romania has registered continuous economic growth, the demand for electricity is steadily growing, and this trend is expected to continue. Romania could also introduce a support mechanism for developing this unexploited potential. The results of the present study may be useful for further research regarding public policies for the development of renewable energy. It can also serve as a useful analysis for identifying the future trends of renewable energy in Romania.
REVIEW | doi:10.20944/preprints202308.1679.v1
Subject: Medicine And Pharmacology, Medicine And Pharmacology Keywords: in vivo; exploratory; confirmatory; preclinical research; preprocedural planning; animal model; study endpoints; pitfalls
Online: 24 August 2023 (08:11:01 CEST)
During the preclinical research process, multiple factors can be difficult to implement without the careful consideration and planning of each step. As research has become more advanced with the use of increasingly complex technology, animal models have also become essential for understanding the potential impact of devices, drug therapies, and surgical techniques on humans before clinical trials are conducted. The use of an in vivo animal model is a key and necessary step in the progression of preclinical research studies that will lead to future medical inventions and innovation. Here, we describe the three phases of effectively designing a preclinical research protocol: the research, preprocedural planning, and experimental phases. Furthermore, we provide researchers with guidance through these phases and discuss important considerations.
ARTICLE | doi:10.20944/preprints202005.0323.v1
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: exploratory regression analysis; built environment; influencing factors; incidence rate for female lung cancer
Online: 20 May 2020 (09:06:39 CEST)
Objective: To apply exploratory regression analysis (ERA) methods to investigate the atmospheric pollution and built environment factors influencing the lung cancer incidence rate in Chinese women. Methods: Lung cancer incidence rates among Chinese women at 339 cancer registries were obtained from the China Cancer Registry Annual Report 2017; air quality and built environment data were obtained from Greenpeace and the China Construction Yearbook. After multiple covariates were eliminated, an exploratory regression analysis was performed using the world standardized population incidence rate as the dependent variable and air quality and built environment factors as the independent variables. Results: Shandong Peninsula, Hebei and Liaoning are high incidence rate areas of female lung cancer in China, with significant regional aggregation. In addition to air quality factors such as industrial smoke emission data, the association between built environment factors such as urbanization rate, development LUI, population density and greening coverage of built-up areas and the female lung cancer incidence rate is statistically significant. Conclusion: In addition to air quality factors, urban spatial factors can also significantly affect respiratory health. The LUI is positively correlated, while urbanization rate and population density are negatively correlated, with the incidence rate of lung cancer. The role of green space in respiratory health has not been proven. In addition, there is little relationship between income and health, and similar findings hold for indicators such as public transportation and the road network.
ARTICLE | doi:10.20944/preprints202311.0703.v1
Subject: Business, Economics And Management, Marketing Keywords: influencing factors on purchase; purchase of dairy products; exploratory factors; mixed research; structural validity
Online: 10 November 2023 (12:06:13 CET)
This mixed (qualitative and quantitative) study aimed to identify and categorize the factors influencing the purchase of dairy products among customers of dairy companies in Mashhad city. The theoretical foundations were examined in the initial step, after which the customers and their needs and demands were addressed in terms of the company and its products. In the qualitative part, focus group interviews (23 people in 5 groups) were conducted to identify and extract these factors using the theme analysis technique. The reliability of the interviews was confirmed by retesting and by the reliability between two coders. Sixty themes (factors) that affected the purchase of dairy products were collected. In the quantitative part, a questionnaire was designed using the themes extracted in the qualitative stage to assess customers’ feedback, and 517 questionnaires were collected from 13 regions of Mashhad city to perform statistical tests. The validity of the questionnaire was confirmed by content and construct validity (exploratory and confirmatory factor analysis), and its reliability was checked by Cronbach’s alpha. Fourteen components were obtained by exploratory and confirmatory factor analyses, after which the variables were labeled as intrinsic, psychological, and personality factors, the identity of the company, production power of the company, competitive power of the company, competitive prices, social awareness, and store capabilities. The construct validity was confirmed using confirmatory factor analysis.
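The reliability check named in this abstract, Cronbach’s alpha, compares the sum of the individual item variances with the variance of the respondents' total scores. The sketch below uses the standard formula with sample variances; the response matrix is hypothetical and unrelated to the 517 questionnaires of the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of questionnaire items
    # Sample variance of each item across respondents.
    item_variances = [variance(item) for item in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to three Likert-type items.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 3, 2],
    [1, 2, 1],
]
alpha = cronbach_alpha(responses)
```

Values closer to 1 indicate that the items vary together; the threshold regarded as acceptable is a convention that differs between fields and is not stated in the abstract.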
TECHNICAL NOTE | doi:10.20944/preprints202209.0404.v1
Subject: Engineering, Energy And Fuel Technology Keywords: Recurrent Neural Network; Renewable Energy; Power consumption; Open Power System Data; Multivariate Exploratory; Time series forecasting
Online: 27 September 2022 (02:44:29 CEST)
The environmental issues we are currently facing require long-term prospective efforts for sustainable growth. Renewable energy sources seem to be one of the most practical and efficient alternatives in this regard. Understanding a nation's pattern of energy use and renewable energy production is crucial for developing strategic plans. No previous study has explored the dynamics of power consumption with the change in renewable energy production on a country-wide scale. At the same time, a number of deep learning algorithms have demonstrated acceptable performance in handling sequential data in the era of data-driven predictions. In this study, we developed a scheme to investigate and predict total power consumption and renewable energy production time series for eleven years of data using a Recurrent Neural Network (RNN). The dynamics of the interaction between the total annual power consumption and renewable energy production are investigated through extensive Exploratory Data Analysis (EDA) and a feature engineering framework. The performance of the model is found to be satisfactory through comparison of the predicted data with the observed data, visualization of the distribution of the errors, and a Root Mean Squared Error (RMSE) value of 0.084. Higher performance is achieved by increasing the number of epochs and by hyperparameter tuning. The proposed framework can be used and transferred to investigate the trend of renewable energy production and power consumption and to predict future scenarios for different communities. Incorporating a cloud-based platform into the proposed pipeline may lead to real-time forecasting.
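The RMSE used above to judge the forecasts is the square root of the mean squared difference between observed and predicted values. A minimal sketch follows; the sample numbers are hypothetical, and the abstract does not say on which scale (raw or normalized) the reported 0.084 was computed.

```python
import math


def rmse(observed, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed and forecast consumption values.
observed = [0.50, 0.62, 0.58, 0.71]
predicted = [0.48, 0.65, 0.60, 0.66]
error = rmse(observed, predicted)
```

Because RMSE carries the units of the series, it is only comparable across models evaluated on the same scale.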
ARTICLE | doi:10.20944/preprints202009.0495.v2
Subject: Computer Science And Mathematics, Information Systems Keywords: agency; smart devices; IoT; device agency; user agency; human computer interaction; HCI; questionnaire; exploratory analysis; anova
Online: 22 September 2020 (03:45:51 CEST)
In this paper, we investigate the relationship people have with their smart devices. We use the concept of agency to capture aspects of users’ sense of mastery as they relate to their device. This study gives preliminary evidence of the existence of two independent dimensions of agency for modeling the interaction between humans and smart devices: (i) user agency and (ii) device agency. These constructs emerged from an exploratory factor analysis conducted on survey data collected from 587 participants. In addition, we investigate the correlation of user agency and device agency with background variables of the respondents. Finally, we argue that mapping the users’ dynamics with their device into user agency and device agency fosters a better understanding of the needs of the users and helps in designing interfaces tailored to the specific capabilities and expectations of the users.
ARTICLE | doi:10.20944/preprints202203.0093.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: 6-hydroxydopamine; rotenone; in vitro neurotoxicity; mitochondrial dysfunction; exploratory data analysis; applied computational statistics; unsupervised and supervised machine learning
Online: 7 March 2022 (09:16:28 CET)
With the increase in life expectancy and consequent aging of the world’s population, the prevalence of many neurodegenerative diseases is increasing, without concomitant improvement in diagnostics and therapeutics. These diseases share neuropathological hallmarks, including mitochondrial dysfunction. In fact, as mitochondrial alterations appear prior to neuronal cell death at an early phase of the disease onset, the study and modulation of mitochondrial alterations emerge as promising strategies to predict and prevent neurotoxicity and neuronal cell death before the onset of cell viability alterations. In this work, differentiated SH-SY5Y cells were treated with the mitochondrial-targeted neurotoxicants 6-hydroxydopamine and rotenone. These compounds were used at different concentrations and for different time points to understand the similarities and differences in their mechanisms of action. To accomplish this, data on mitochondrial parameters were acquired and analyzed using unsupervised (hierarchical clustering) and supervised (decision tree) machine learning methods. Both biochemical and computational analyses resulted in an evident distinction between the neurotoxic effects of 6-hydroxydopamine and rotenone, specifically for the highest concentrations of both compounds.
ARTICLE | doi:10.20944/preprints202103.0059.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Metallothionein (MT); Scientific discovery; Scientific pursuit; Research strategies; upward looking research; Exploratory research; Protein function; Compensation; Moonlighting; multifunctional proteins; Vestiges
Online: 2 March 2021 (09:39:39 CET)
In the mid-1950s, Bert L. Vallee and his colleague Marvin Margoshes discovered a molecule known today as metallothionein (MT). Meanwhile, MTs have been shown to be common in many biological organisms. Despite their prevalence, however, it remains unclear to date what exactly MTs do and how they contribute to the biological function of an organism or organ. Honoring Dr. Vallee’s sometimes innovative approach to research, this contribution sets out to show how philosophy of science can help us gain a clearer picture of biochemical research. We shall look into both the discovery of as well as recent research on Dr. Vallee’s beloved family of MT proteins to illustrate (i) how exploratory and upward-looking research play important roles in biochemical discoveries although they do not fit the paradigmatic approach of decomposition and structure-function mapping. Besides, we shall suggest (ii) that while other biochemical molecules exhibit a clearly identifiable function, other research hypotheses might be worthy of pursuit in the case of MTs.
ARTICLE | doi:10.20944/preprints202311.0302.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: intrusion detection system; machine learning techniques; Exploratory Data Analysis; Performance Evaluation; feature selection; CSE-CIC-IDS-2018 dataset; Three phase models
Online: 6 November 2023 (07:46:31 CET)
In this paper, intrusion detection systems are thoroughly investigated utilizing the CSE-CIC-IDS-2018 dataset. The research is divided into three key phases: first, applying Data Cleaning, Exploratory Data Analysis, and Data Normalization techniques (min-max and z-score) for preparing data across distinct classifiers. Second, feature importance is reduced using a combination of Principal Component Analysis (PCA) and Random Forest (RF), with the goal of improving processing speed and decreasing model complexity. This stage comprises a comparison with the entire dataset. Finally, machine learning algorithms (XGBoost, CART, DT, KNN, MLP, RF, LR, and Bayes) are applied to specific features and preprocessing approaches. Surprisingly, the XGBoost, DT, and RF models outperform in both ROC values and CPU runtime. Following evaluation, which includes PCA and RF feature selection, an optimal set is produced.
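The two normalization techniques named in the first phase, min-max and z-score, can be sketched for a single numeric feature as follows. The function names and the handling of constant features (mapped to zeros) are choices made here for illustration; the paper's own preprocessing code is not shown in the abstract.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Center values at zero mean and scale to unit population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical values of one flow feature.
feature = [2, 4, 6]
scaled_min_max = min_max_scale(feature)
scaled_z = z_score_scale(feature)
```

Min-max scaling preserves the shape of the distribution within fixed bounds but is sensitive to extreme values, whereas z-scores are unbounded and express each value in standard deviations from the mean.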
ARTICLE | doi:10.20944/preprints201910.0151.v1
Subject: Social Sciences, Safety Research Keywords: exploratory spatial data analysis; LISA; temporary assistance for needy families (TANF); tanf responsiveness to great recession; spatial clusters; TANF policy choices; TANF maximum aid
Online: 13 October 2019 (16:35:09 CEST)
During the 2008 Great Recession, many families with children relied on cash assistance from the Temporary Assistance for Needy Families (TANF) program. The present study applied Exploratory Spatial Data Analysis (ESDA) tools to analyze geographically varying spatial clusters of states’ unemployment rates, TANF caseload growth rates, TANF policy choices such as benefit levels, and TANF responsiveness rates to the recession. We analyzed 45 contiguous states and Washington D.C. A standardized TANF responsiveness index was developed to compare states’ TANF growth rates relative to their labor market conditions. The western states were found to be very responsive to the recession, with ratios greater than 1. In contrast, Texas and Arizona, with ratios below 1, were unresponsive to the recession. Strong spatial clusters were found in unemployment rate and TANF maximum aid. In the case of maximum aid, there was a strong presence of Low-Low spatial clusters in Southern States and High-High clusters in Northeastern States. The findings suggest that several neighboring states in the northeast and some in the south had similar levels of financial commitment during the 2008 recessionary period as those found by earlier research conducted during non-recessionary periods. The findings have implications for future federal actions and for state-level collaboration.