Remote photoplethysmography (rPPG) has emerged as a promising, non-intrusive physiological sensing capability in HCI research, with applications gradually extending to health monitoring and clinical care. Advances in machine learning models, together with recent datasets collected under real-world conditions, have steadily improved the performance of rPPG methods in recovering heart rate and heart rate variability metrics. However, the signal quality of the reference ground-truth PPG data in existing datasets is largely neglected, even though poor-quality references negatively affect the models that rely on them. This work introduces a new imaging blood volume pulse (iBVP) dataset of synchronized RGB and thermal infrared videos with ground-truth PPG signals acquired from the ear and, for the first time, high-resolution signal quality labels for those signals. Participants perform rhythmic breathing, head-movement, and stress-inducing tasks, which help reflect real-world variations in psycho-physiological states. This work conducts dense (per-sample) signal quality assessment to discard noisy segments of the ground-truth signals and the corresponding video frames. This work further presents a novel end-to-end machine learning framework, iBVPNet, which features efficient and effective spatio-temporal feature aggregation for reliable estimation of BVP signals. Finally, this work examines the feasibility of extracting BVP signals from thermal video frames, which remains underexplored. The iBVP dataset and source code are publicly available for research use.
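To make the idea of dense signal quality filtering concrete, the following is a minimal sketch of how per-sample quality labels on a reference PPG signal could be mapped to a keep/discard mask over video frames. It is an illustration under stated assumptions, not the procedure used to build the iBVP dataset: the function name `keep_clean_frames`, the quality scale, the `threshold` value, and the sampling rates are all hypothetical.

```python
import numpy as np


def keep_clean_frames(quality, ppg_rate_hz, video_rate_hz, n_frames, threshold=0.5):
    """Return a boolean mask over video frames (True = keep).

    Hypothetical helper: assumes `quality` holds one score per PPG sample,
    that higher scores mean cleaner signal, and that the PPG signal and the
    video start at the same instant.
    """
    quality = np.asarray(quality, dtype=float)
    clean = quality >= threshold  # per-sample decision (assumed threshold)
    samples_per_frame = ppg_rate_hz / video_rate_hz
    mask = np.zeros(n_frames, dtype=bool)
    for frame in range(n_frames):
        # PPG samples that fall inside this frame's time window
        start = int(round(frame * samples_per_frame))
        if start >= len(quality):
            break  # no labels for the remaining frames: leave them discarded
        stop = int(round((frame + 1) * samples_per_frame))
        stop = min(max(stop, start + 1), len(quality))
        # keep the frame only if every overlapping PPG sample is clean
        mask[frame] = clean[start:stop].all()
    return mask
```

For example, with a reference signal sampled at twice the video frame rate, `keep_clean_frames([1, 1, 1, 1, 0, 1, 1, 1], 4, 2, 4)` marks the third frame for removal because one of its two overlapping PPG samples falls below the threshold. Requiring every overlapping sample to be clean is a conservative choice; a majority rule would retain more frames at the cost of noisier references.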