Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

# Testing RNN-LSTM Forecasting with Simulated Astronomical Lightcurves

Version 1: Received 19 July 2019 / Approved 22 July 2019 / Online 22 July 2019 (10:41:25 CEST)

How to cite: Chakraborty, N. Testing RNN-LSTM Forecasting with Simulated Astronomical Lightcurves. Preprints 2019, 2019070241 (doi: 10.20944/preprints201907.0241.v1).

## Abstract

With an explosion of data expected in the near future from observatories spanning radio to gamma-ray wavelengths, we have entered the era of time-domain astronomy. Historically, modeling the temporal structure of sources with time-series simulations has been confined to energy ranges blessed with excellent statistics, such as X-rays. In addition to the ever-increasing volume and variety of astronomical lightcurves, a plethora of transient types is being detected not only across the electromagnetic spectrum but also across multiple messengers, for example as counterparts of neutrino and gravitational-wave sources. As a result, precise and fast forecasting and modeling of lightcurves, or time series, will play a crucial role both in understanding the underlying physical processes and in coordinating multiwavelength and multimessenger campaigns. In this regard, deep learning algorithms such as recurrent neural networks (RNNs) should prove extremely powerful for forecasting, as they have in several other domains. Here we test the performance of a very successful class of RNNs, Long Short-Term Memory (LSTM) networks, on simulated lightcurves. We focus on univariate forecasting of the types of lightcurves typically found in observations of active galactic nuclei (AGN). Specifically, we explore the sensitivity of training and test losses to key parameters of the LSTM network and to characteristics of the data, namely gaps and complexity, the latter measured by the number of Fourier components. We find that LSTMs typically perform better for pink (flicker) noise sources. Performance depends most strongly on the LSTM batch size and on the percentage of gaps in the lightcurves. A batch size of $10-30$ appears optimal, and the lowest test and training losses are obtained when less than $10\%$ of the data are missing, for both periodic and random gaps in pink noise. Performance is far worse for red noise, which compromises the detectability of transients.
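
The pink and red noise lightcurves with random and periodic gaps described above can be illustrated with a short simulation. The Python below is a minimal, standard-library sketch under stated assumptions, not the preprint's own simulation code: it builds a power-law noise lightcurve as a random-phase sum of Fourier components with amplitudes proportional to $f^{-\beta/2}$, so the power spectrum scales as $f^{-\beta}$ ($\beta = 1$ for pink or flicker noise, $\beta = 2$ for red noise), and then marks samples as missing to mimic random and periodic gaps. The function names `powerlaw_lightcurve`, `random_gaps` and `periodic_gaps`, the fixed (non-random) amplitudes, and the choice of blanking a block at the start of each period are hypothetical illustrations.

```python
import math
import random


def powerlaw_lightcurve(n_points, beta, n_components=200, seed=None):
    """Evenly sampled lightcurve with power spectrum P(f) ~ f**(-beta).

    Hypothetical sketch: a random-phase sum of Fourier components whose
    amplitudes scale as f**(-beta / 2). beta=1 gives pink (flicker) noise,
    beta=2 gives red noise. Time is measured in units of the sampling step.
    """
    rng = random.Random(seed)
    # Harmonics of the lowest frequency 1/n_points (cycles per sample);
    # keep n_components <= n_points // 2 to stay at or below Nyquist.
    freqs = [(k + 1) / n_points for k in range(n_components)]
    amps = [f ** (-beta / 2.0) for f in freqs]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    flux = [sum(a * math.sin(2.0 * math.pi * f * t + p)
                for f, a, p in zip(freqs, amps, phases))
            for t in range(n_points)]
    # Standardize to zero mean and unit variance so losses are comparable.
    mean = sum(flux) / n_points
    std = math.sqrt(sum((x - mean) ** 2 for x in flux) / n_points)
    return [(x - mean) / std for x in flux]


def random_gaps(flux, gap_fraction, seed=None):
    """Mark a random gap_fraction of the samples as missing (None)."""
    rng = random.Random(seed)
    n_missing = int(round(gap_fraction * len(flux)))
    missing = set(rng.sample(range(len(flux)), n_missing))
    return [None if i in missing else x for i, x in enumerate(flux)]


def periodic_gaps(flux, gap_fraction, period):
    """Mark the first gap_fraction of every `period` samples as missing."""
    gap_len = int(round(gap_fraction * period))
    return [None if i % period < gap_len else x for i, x in enumerate(flux)]


pink = powerlaw_lightcurve(1000, beta=1.0, seed=1)
red = powerlaw_lightcurve(1000, beta=2.0, seed=1)
pink_random = random_gaps(pink, 0.10, seed=2)
pink_periodic = periodic_gaps(pink, 0.10, period=100)
print(sum(x is None for x in pink_random))    # 100
print(sum(x is None for x in pink_periodic))  # 100
```

Fixed amplitudes keep the sketch short; more complete simulation schemes also randomize the amplitude of each Fourier component, which this sketch does not do.
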
Performance degrades monotonically with increasing data complexity, measured by the number of Fourier components; this is especially relevant for complicated quasi-periodic signals buried in noise. Thus, we show that time-series simulations are excellent guides for the use of RNN-LSTMs in forecasting.
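
To probe the dependence on complexity and batch size discussed above, one might generate signals with a controllable number of Fourier components and frame them as supervised windows for an LSTM. The Python below is a hedged, standard-library sketch of that data pipeline only (no network is trained). The names `multi_fourier_signal`, `make_windows` and `batches`, the lookback of 20 samples, the period and amplitude ranges, the white-noise level, and the rule of skipping windows that overlap gaps are illustrative assumptions, not details taken from the preprint.

```python
import math
import random


def multi_fourier_signal(n_points, n_components, noise_std=0.5, seed=None):
    """Quasi-periodic test signal: n_components sinusoids plus white noise.

    Hypothetical sketch: periods are drawn uniformly between 10 and 200
    samples and amplitudes uniformly in [0.5, 1.0]; n_components is the
    'complexity' knob.
    """
    rng = random.Random(seed)
    comps = [(rng.uniform(0.5, 1.0),             # amplitude
              1.0 / rng.uniform(10.0, 200.0),    # frequency, cycles per sample
              rng.uniform(0.0, 2.0 * math.pi))   # phase
             for _ in range(n_components)]
    return [sum(a * math.sin(2.0 * math.pi * f * t + p) for a, f, p in comps)
            + rng.gauss(0.0, noise_std)
            for t in range(n_points)]


def make_windows(series, lookback):
    """Frame a univariate series as (lookback window, next value) pairs.

    Windows that touch a missing sample (None) are skipped, so a larger
    gap percentage leaves fewer training pairs.
    """
    pairs = []
    for i in range(len(series) - lookback):
        window, target = series[i:i + lookback], series[i + lookback]
        if target is None or any(x is None for x in window):
            continue
        pairs.append((window, target))
    return pairs


def batches(pairs, batch_size):
    """Yield consecutive mini-batches of at most batch_size pairs."""
    for start in range(0, len(pairs), batch_size):
        yield pairs[start:start + batch_size]


for k in (1, 5, 20):
    lc = multi_fourier_signal(500, n_components=k, seed=0)
    pairs = make_windows(lc, lookback=20)
    n_batches = sum(1 for _ in batches(pairs, batch_size=16))
    print(k, len(pairs), n_batches)  # 480 pairs, 30 batches for every k
```

The mini-batches could then be fed to any LSTM implementation; here the `batch_size` argument simply plays the role of the LSTM batch size whose influence on the losses is discussed above.
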

## Subject Areas

recurrent neural networks, LSTM, lightcurves, simulation, variability
