VIRAC Maser Data Processing Suite

Modern scientific research, and radio astronomical spectroscopic observations in particular, cannot be imagined without an appropriate software suite. This is necessary for two reasons: 1) the data are in digital form and 2) data processing is a complex multi-step process. In this paper the data processing suite created at VIRAC for single radio telescope observations of space masers is presented. Processing of data from the Software Defined Radio (SDR) backend is described. The implementation of the frequency switching algorithm, the acquisition of observation data and the format used to store them are discussed. Multiple ways to display the observation results are highlighted.


Introduction
In 2017 the Ventspils International Radio Astronomy Centre (VIRAC) team of astronomers and engineers started to work on a maser monitoring programme (Aberfelds et al., 2017), which included technical and scientific development. The technical development consisted of creating the radio telescope backend and a software package for data processing. The scientific development consisted of creating a source catalogue, developing the mathematical background for data processing, creating an observation planning method and studying the variability of the observed radio sources. Observations have been done with the VIRAC 16 m and 32 m radio telescopes RT16 and RT32. Initially, both telescopes had a Digital Base Band Converter (DBBC) (Tuccari, 2003) backend; recently an SDR-based backend (Bleiders et al., 2020) has been developed. Subsequently, a tool complex for processing single-dish radio telescope observations of space masers, named the Maser Data Processing Suite (MDPS), was created. MDPS, discussed in this paper, was developed at VIRAC. It is open-source software with a user-friendly Graphical User Interface (GUI), distributed via the GitHub repository https://github.com/sklandrausis/Maser-Data-Processing-Suite.
There exist many other software packages for single-dish data processing, such as the Common Astronomy Software Applications (CASA), GILDAS (used to reduce all data acquired with the IRAM 30 m telescope and the NOrthern Extended Millimetre Array, NOEMA) and GBTIDL (an interactive package for reduction and analysis of spectral line data taken with the Robert C. Byrd Green Bank Telescope), but none of the existing software packages are compatible with the VIRAC SDR-based backend setup. For example, the maser data reduction software used for the Effelsberg telescope is not publicly available, and its pipeline code is very tailored to Effelsberg, so it would be a major effort to adapt it to VIRAC needs. It is also tied to the Multi-Beam FITS Raw Data Format (MBFITS), and if the MBFITS format is not used one would be better off with a complete re-write anyway (information obtained in personal communication with Benjamin Winkel). The VIRAC setup produces a specific data format that these packages do not support, and these packages do not have a monitoring module. Additionally, the frequency switching algorithm is not a widespread maser calibration method, so MDPS is one of the first software packages of this kind.

Overview of MDPS
MDPS currently allows 1) processing the SDR output, 2) displaying the variability of maser source spectral components over time (monitoring) and 3) visualising data for publications. MDPS uses several Python libraries. The most important of them are PyQt5 (riverbankcomputing.com, 2020), Astropy (Robitaille et al., 2013; Günther et al., 2018), Matplotlib (Hunter, 2007), Numpy (Oliphant, 2006), H5py (Collette, 2013), SciPy (Virtanen, 2020), jplephem (Rhodes, 2011) and experimentsLogReader (Šteinbergs, 2020). It also needs the planetary ephemeris parameter file de435.bsp, which can be acquired from https://naif.jpl.nasa.gov/pub/naif/generic_kernels/spk/planets/de435.bsp. The processing suite also uses two configuration files: the Python library Matplotlib configuration file, to modify the plotting view, and a custom-made configuration file specifying the used directories and hard-coded variables. The most important of them are 1) velocity line parameters, 2) signal cut values, 3) observed source parameters (right ascension, declination, epoch), 4) base frequencies of the observed maser species and 5) station coordinates. The velocity line parameters are used to compute the local maxima of the spectrum, and the signal cut values are used to compute the signal-to-noise ratio. The base frequencies of the species are the laboratory-determined maser frequencies for a specific chemical species and are used to compute the Doppler effect. The station coordinates are three geocentric values in metres (x, y, z) used to compute the local standard of rest.
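As an illustration of the custom configuration file described above, the sketch below defines a small hypothetical excerpt and reads it with Python's configparser. All section names, keys and values are placeholders and do not reproduce the actual MDPS configuration layout.

```python
from configparser import ConfigParser

# Hypothetical excerpt of the custom configuration file; the real MDPS file
# defines its own section and key names, these are placeholders only.
EXAMPLE_CONFIG = """
[paths]
; directories used by the processing scripts
data_dir = /data/sdr
output_dir = /data/output

[example_source]
; observed source parameters
ra = 22 56 17.9
dec = +62 01 49
epoch = J2000
; velocities of known spectral components (velocity line parameters)
velocity_lines = -4.5;-3.8
; velocity range containing the signal (signal cut values)
cuts = -15;5

[base_frequencies]
; laboratory rest frequency of the maser species, MHz
ch3oh_6.7 = 6668.519

[stations]
; geocentric x;y;z coordinates in metres (placeholders)
irbene_rt32 = X;Y;Z
"""

config = ConfigParser()
config.read_string(EXAMPLE_CONFIG)
print(config.get("example_source", "cuts"))
```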

SDR backend description
As backend, an off-the-shelf SDR, the Ettus Research USRP X300 equipped with a TwinRX daughterboard, is used (Bleiders et al., 2020). It is a multi-channel SDR transceiver that allows obtaining complex sample rates, and since the USRP X300 is used together with the TwinRX front-end daughterboard, it allows dual-polarisation spectroscopy (Bleiders et al., 2020). The optimal linearity of the receiver is found using the calibration noise injection function (Bleiders et al., 2020). Since the power is measured with the noise source on and off, the result should not depend on the input signal level, which allows finding optimum attenuation values for the USRP unit, the telescope receiver and the IF signal chain itself (Bleiders et al., 2020).

Implementation of the frequency switching algorithm

Mathematical implementation of the frequency switching algorithm
Observations are done in a frequency-switching mode described by Winkel et al. (2012), with four, typically 15 seconds long, integration stages. The first two stages are with the noise diode turned off, the last two stages with the noise diode turned on.
These stages are marked as $ref_{off}$, $sig_{off}$, $ref_{on}$ and $sig_{on}$. The definitions of ref and sig correspond to the shift of the local oscillator frequency up and down from the calculated central frequency. Shifting the local oscillator frequency allows removing the bandpass dependence. The indexes on and off correspond to the noise diode turned on and off, respectively. The SDR backend sequentially creates a file for each of these stages. Each of these files contains a matrix with three columns: the first is frequency, the second and third are the amplitudes of the left and right polarisation. The system also generates a log file. It is parsed and the information necessary for calibration is extracted using the Python library experimentsLogReader. The first step of telescope data processing is to rearrange the data by shifting the centre-frequency component to the centre of the array; after that the calibration process can start. The calibration is a multi-step process beginning with a conversion from the telescope backend's relative amplitude units to Kelvins. Thereafter the system temperature for the noise diode turned off must be computed. This is done for both the sig and ref stages. For the first, it is done with the help of formula 1. For the next one, the formula is similar, only with $P_{ref}$ instead of $P_{sig}$.
Here $T_{cal}$ is the temperature of the noise diode obtained from the observation log file. When the observation log file is generated, the $T_{cal}$ value is written as 3.820791 K for both telescopes, which is the temperature of the noise diode corresponding to the value of its generated signal. $P_{sig_{on}}$ and $P_{sig_{off}}$ are the sig stage amplitudes with the noise diode on and off. $(P_{sig_{on}} - P_{sig_{off}})$ is the average value of the subtraction of $P_{sig_{on}}$ and $P_{sig_{off}}$. Next, the calibrated amplitude for all four stages must be computed. Since the computation for the sig and ref stages is similar, it will be shown only for the sig stage. First $T_{cal\,sig_{off}}$ for the off stage is computed using formula 2, thereafter $T_{cal\,sig_{on}}$ for the on stage using formula 3,
where $P_{sig_{off}}$ is the sig stage amplitude with the noise diode off.
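Since formulas 1 to 3 are not reproduced in the text, the following sketch only illustrates the system-temperature step, assuming the usual noise-diode relation $T_{sys} = T_{cal}\,\langle P_{off}\rangle / \langle P_{on} - P_{off}\rangle$; it is not the exact MDPS implementation.

```python
import numpy as np

def system_temperature(p_on, p_off, t_cal=3.820791):
    """System temperature for the noise-diode-off case (cf. formula 1),
    assuming the standard relation T_sys = T_cal * <P_off> / <P_on - P_off>.
    p_on and p_off are the diode-on and diode-off spectra of one stage."""
    return t_cal * np.mean(p_off) / np.mean(p_on - p_off)

# The same relation applied to the ref stage uses P_ref instead of P_sig.
```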
The next step is to compute the average amplitude from the on and off stage values, which can be done with formula 4.
After that the data are shifted by a shifting factor, computed with formula 5. In the sig stage a positive shifting factor is used; in the ref stage a negative shifting factor is used. This is done using the Python library Numpy function roll,
where $f_{step}$ is the distance between frequency points and $f_{shift}$ is computed with formula 6,
where $bw$ is the bandwidth and $df_{div}$ is the division factor. Both values are obtained from the observation log. After that the average of the sig and ref stages is computed using formula 7,
where $T_{ref}$ is the calibrated ref stage amplitude, obtained similarly to $T_{sig}$. Finally, the conversion from amplitude in Kelvins to amplitude in Janskys is done with formula 8,
where $S$ is the amplitude in Janskys, $T$ is the calibrated amplitude, DPFU is the degrees-per-flux-unit factor and $\mathrm{Polyval}(g_{el}, \mathrm{elevation})$ is the evaluated ordinary polynomial, where $g_{el}$ are the polynomial coefficients. Elevation, $g_{el}$ and DPFU are obtained from separately performed calibration sessions targeting well-known continuum sources (Perley and Butler, 2013) and are also stored in log files.
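The sketch below ties together the shifting, averaging and flux-density conversion steps (formulas 5 to 8). Where the text does not spell out an expression, in particular how $f_{shift}$ follows from $bw$ and $df_{div}$ and the exact form of the sig/ref average, the relation used here is an assumption and is marked as such in the comments.

```python
import numpy as np

def shift_average_and_convert(t_sig, t_ref, f_step, bw, df_div,
                              dpfu, g_el, elevation):
    """Sketch of formulas 5-8; not the verbatim MDPS implementation."""
    f_shift = bw / df_div                      # assumption for formula 6
    n = int(round(f_shift / f_step))           # shifting factor (formula 5)
    t_sig = np.roll(t_sig, n)                  # positive shift for the sig stage
    t_ref = np.roll(t_ref, -n)                 # negative shift for the ref stage
    t_avg = (t_sig + t_ref) / 2.0              # sig/ref average (formula 7, assumed form)
    gain = dpfu * np.polyval(g_el, elevation)  # elevation-dependent gain
    return t_avg / gain                        # amplitude in Janskys (formula 8)
```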
The polynomial is evaluated using the Python library Numpy function polyval. The final part is to extract the valuable part of the data, which is done by computing two indexes. These indexes, i and j, are at 1/4 and 3/4 of the number of sampling points. An example screenshot of a frequency switching algorithm result is shown in figure 1.

Each SDR backend data file contains a matrix with three columns: the first is frequency, the second and third are the amplitudes of the left and right polarisation; there is no header information. These files are read using the Python library Numpy function loadtxt. Before reading the data files, they are sorted for each scan and it is checked whether every scan has all four files; if not, the scan is deleted and the user is warned about that. The SDR backend file naming scheme is source_f<frequency>_<station>_<iteration_no>_<no_and_stage>.dat, where <no_and_stage> is the scan number and calibration stage index, for example 001r0. Every scan number and calibration stage index consists of two parts: the scan number, in the example 001, and the calibration stage index, r0. The calibration stage index is a string with two parts: the identification of the sig and ref stages, letters r and s, and the identification of the on and off stages, numbers 1 and 0. These files are read as Numpy arrays and all mathematical computations mentioned above are done using matrix operations.
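A minimal sketch of the scan grouping and completeness check described above, using numpy.loadtxt to read the four stage files of each scan; the glob pattern and regular expression only illustrate the naming scheme and are not taken from the MDPS source.

```python
import glob
import os
import re
from collections import defaultdict
import numpy as np

# Expected stage indexes: sig/ref (s/r) combined with diode off/on (0/1).
STAGES = {"s0", "s1", "r0", "r1"}
# Matches the trailing "<scan number><stage index>.dat" part of a file name,
# e.g. "..._001r0.dat" (illustrative only).
pattern = re.compile(r"(\d+)([sr][01])\.dat$")

scans = defaultdict(dict)
for path in glob.glob("*.dat"):
    match = pattern.search(os.path.basename(path))
    if match:
        scan_no, stage = match.groups()
        scans[scan_no][stage] = path

for scan_no, files in sorted(scans.items()):
    if set(files) != STAGES:
        print(f"warning: scan {scan_no} is incomplete and will be skipped")
        continue
    # each file: frequency, left-polarisation amplitude, right-polarisation amplitude
    data = {stage: np.loadtxt(files[stage]) for stage in STAGES}
```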

Computation of Doppler effect
After the amplitudes are calibrated, the next step of data processing is to compute the Doppler effect and transform the data from the frequency domain to the velocity domain, which is done using formula 9,
where $f_o$ is the observed frequency, $f_b$ is the laboratory-determined maser frequency, $c$ is the speed of light and $v_r$ is the velocity of the receiver.
The observed frequency is the frequency from the data file plus the local oscillator frequency. The velocity of the receiver in the Local Standard of Rest frame is the sum of the Sun's apex velocity, Earth's orbital velocity and Earth's rotation.
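A minimal sketch of the frequency-to-velocity conversion (formula 9), assuming the radio definition of the Doppler shift; since formula 9 itself is not reproduced here, the sign convention for the receiver-velocity term is an assumption.

```python
from astropy import constants as const

def frequency_to_velocity(f_obs, f_rest, v_receiver):
    """Observed frequency (data-file frequency plus local oscillator frequency)
    to LSR velocity; f_obs and f_rest in the same units, result in km/s."""
    c = const.c.to("km/s").value
    return c * (f_rest - f_obs) / f_rest + v_receiver
```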

Observation data quality estimation
Data quality is determined by the signal-to-noise ratio, which is computed in the velocity domain. To extract the signal from the data, the MDPS configuration file has a section cuts that contains the velocity ranges where the signal is located. This information is used to compute the index ranges for the signal. Knowing where the signal is in the data, it is also known where the noise is. The signal-to-noise ratio is then computed as the maximum value of the signal divided by three times the standard deviation of the noise.
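A minimal sketch of the signal-to-noise estimate described above; the velocity ranges from the cuts section mark the signal, and everything else is treated as noise.

```python
import numpy as np

def signal_to_noise(velocity, amplitude, cut_ranges):
    """Maximum of the signal divided by three times the standard deviation
    of the noise; cut_ranges is a list of (v_min, v_max) velocity intervals
    containing the signal, as given in the configuration file."""
    signal = np.zeros_like(velocity, dtype=bool)
    for v_min, v_max in cut_ranges:
        signal |= (velocity >= v_min) & (velocity <= v_max)
    return np.max(amplitude[signal]) / (3.0 * np.std(amplitude[~signal]))
```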
An additional option to estimate the quality of an observation is to inspect the system temperature over time. In a normal situation it will be stable. An example result screenshot is shown in figure 2.

Data normalisation and monitoring values extraction
After the transformation to the velocity domain is done, the scans of an observation are aligned so that the index of the maximum value is the same for all scans; then all scans are averaged to avoid velocity drift during the observation time. When a single spectrum has been created from all scans in one observation, the data must be normalised. This is necessary because during observations system instabilities lead to a non-flat noise floor in the spectrum. Normalisation is done by fitting an ordinary polynomial to the noise part of the spectrum and subtracting this polynomial from the spectrum. By default MDPS uses a polynomial of order 3. An example screenshot is shown in figure 3.
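A minimal sketch of this baseline normalisation step: an ordinary polynomial (order 3 by default) is fitted to the noise channels and subtracted from the whole spectrum. The signal mask would come from the same cuts information used for the signal-to-noise estimate.

```python
import numpy as np

def normalise_spectrum(velocity, amplitude, signal_mask, order=3):
    """Fit a polynomial to the noise channels and subtract it from the whole
    spectrum to flatten the noise floor."""
    coeffs = np.polyfit(velocity[~signal_mask], amplitude[~signal_mask], order)
    return amplitude - np.polyval(coeffs, velocity)
```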

Output data products of SDR backend data processing and their formats
After processing the SDR backend data MDPS creates two outputs: 1) calibrated spectrum data and 2) data for monitoring the variability of spectra with time. For the calibrated spectrum data file format the following specifications were selected: 1) it must have read and write libraries in the most popular programming languages (output files can be distributed to other researchers and they can use any tools they like to process these files); 2) it should be possible to store data for each data processing stage in one file (this is necessary to avoid complicated file naming schemes and it allows reprocessing data from any data processing stage). Many data formats are used to store and process astronomical data; in this paper the authors discuss only the following data file formats: CSV, FITS (Wells et al., 1979), HDF5 (Folk et al., 2011), NPY (community, 2020) and MS (Kemball and Wieringa, 2000). Comparing these formats against the criteria shows that only two of them, FITS and HDF5, satisfy both. Although the FITS file format has support in more programming languages, we favour HDF5 because of its advantages: improved I/O speed, compression (Price et al., 2015), and the fact that the HDF5 data format is used in new modern telescopes like LOFAR, CCAT and MeerKAT (Price et al., 2015) and will be used in the software of the future telescope SKA (Comrie et al., 2020). The output file contains the tables amplitude, amplitude_corrected, amplitude_corrected_not_smooht, specie and system_temperature.

For the variability monitoring data the following specifications were selected: 1) it must allow the monitoring meta-data to change over time; 2) it must have reading and writing libraries in the most popular programming languages. Therefore JavaScript Object Notation (JSON) was chosen as the data format. Power as a function of period, as seen in figure 6, is obtained by computing the Lomb-Scargle periodogram as described in (VanderPlas and Ivezic, 2015). To compute the periodogram, the function LombScargle from the Python library Astropy, module timeseries, was used. The monitoring module can compute amplitude changes not only for a specific maser velocity component, but also the amplitude change as a function of time for the whole spectrum. That is done by computing a triangulation between the maser velocities and the observed times. After that the contours of the created triangular grid are drawn, with the contour colours corresponding to the observed fluxes. The result can be seen in figure 7.
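As an illustration of the HDF5 output, the sketch below writes the tables named above with h5py; the array shapes, file name and species label are placeholders, and any grouping or compression used by the real MDPS output is not reproduced.

```python
import h5py
import numpy as np

# Placeholder arrays standing in for the processed spectra
# (left and right polarisation, 4096 points each).
amplitude = np.zeros((2, 4096))
amplitude_corrected = np.zeros((2, 4096))
system_temperature = np.zeros(2)

with h5py.File("calibrated_spectrum.h5", "w") as out:            # placeholder name
    out.create_dataset("amplitude", data=amplitude)
    out.create_dataset("amplitude_corrected", data=amplitude_corrected)
    out.create_dataset("amplitude_corrected_not_smooht", data=amplitude)
    out.create_dataset("specie", data=b"ch3oh_6.7")               # placeholder label
    out.create_dataset("system_temperature", data=system_temperature)
```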

Performance testing
Running Python scripts with the option -m cProfile prints the profile info (function calls, primitive calls, execution time in seconds) in the terminal; alternatively, importing the Python library cProfile into a script allows obtaining profile info from any part of the script. Figure 9 shows the execution time of SDR backend output processing for different numbers of scans, where all observations had 4096 points. From figure 9 we can see that the execution time lies between the functions O(log n) and O(n log n), which means that the complexity of the SDR backend data processing algorithms is approximately O(n).
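A minimal sketch of profiling part of a processing script with cProfile and pstats, equivalent in spirit to running the scripts with -m cProfile; the profiled function is a placeholder.

```python
import cProfile
import pstats

def process_sdr_output():
    # placeholder for the SDR backend processing code being profiled
    sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
process_sdr_output()
profiler.disable()

# report the ten calls with the largest cumulative time
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```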
A similar analysis is shown for the spectrogram computation; the results can be seen in figure 10. The execution speed of the spectrogram computation is determined by the number of observations in the monitoring and the number of spectral components.

Conclusions
This paper presented the developed maser line monitoring processing suite, MDPS. It enables processing the SDR output, displaying monitoring results and visualising data for publications and presentations. The suite was written entirely in the programming language Python. For data storage, modern and efficient technologies such as HDF5 were used. The implementation of the frequency switching algorithm was described in detail. The variability monitoring module was described and different ways to display the monitoring results were shown. Data formats for storing the processed data were also discussed.