Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Proximal Policy Optimization for Radiation Source Search

Version 1 : Received: 31 July 2021 / Approved: 2 August 2021 / Online: 2 August 2021 (11:14:24 CEST)

How to cite: Proctor, P.; Teuscher, C.; Hecht, A.; Osiński, M. Proximal Policy Optimization for Radiation Source Search. Preprints 2021, 2021080018 (doi: 10.20944/preprints202108.0018.v1).

Abstract

Rapid search for and localization of nuclear sources can be important in preventing human harm from illicit material in dirty bombs or from contamination. A single mobile radiation detector faces numerous challenges, such as weak source intensity, multiple sources, background radiation, and the presence of obstructions, i.e., a non-convex environment. In this work, we investigate the sequential decision-making capability of deep reinforcement learning in the nuclear source search context. We propose a novel neural network architecture (RAD-A2C) based on the actor-critic (A2C) framework and a particle filter gated recurrent unit for localization. Performance is studied in randomized 20 m × 20 m convex and non-convex environments across a range of signal-to-noise ratios (SNRs) for a single detector and a single source. RAD-A2C is compared to both an information-driven controller that uses a bootstrap particle filter and a gradient search (GS) algorithm. We find that RAD-A2C performs comparably to the information-driven controller across SNRs in a convex environment, at lower computational complexity per action. RAD-A2C far outperforms the GS algorithm in the non-convex environment, with a median completion rate greater than 95% for up to seven obstructions.

Keywords

deep reinforcement learning; source search and localization; active search; gamma radiation; source parameter estimation; sequential decision making; non-convex environment
