1. Introduction
Image and video compression is essential in medical and computer vision applications, where compressing depth video data optimizes bandwidth usage while preserving the integrity of the original scene. Effective video compression supports healthcare delivery, which motivates further research to improve the downstream tasks that operate on compressed video data. Many image and video compression standards have been developed over the past decades. These standards encode signals using the available pixel information while optimizing for compression efficiency and performance.
The basic concept of image compression is to reduce the size of the original image by representing the input with the fewest possible bits. Image compression techniques can be broadly categorized into two types based on the presence of error: lossless and lossy (or near-lossless) compression [1]. In lossless compression, the original image is perfectly reconstructed at the decoder, so there is no error between the original and reconstructed images. In lossy compression, some data is approximated or discarded to achieve a significantly higher compression ratio, at the cost of a potential loss of image quality.
Several studies provide theoretical reviews of image compression techniques with a particular focus on lossless compression in medical applications [2,3], while others have concentrated on practical aspects [4,5,6,7]. The practical studies mainly use grayscale CT or MRI images. In [8], a spatial-domain image codec is introduced. This technique divides the image into multiple blocks and calculates the difference between the minimum and maximum pixel values within each block. The minimum pixel value of the block is then subtracted from each pixel, and the resulting values are stored and encoded for the new block. The authors of [9] use the same spatial-domain idea but apply the Lempel-Ziv-Welch (LZW) algorithm to encode the subtracted values. Both methods are designed only for lossless compression of images. Lastly, a comprehensive review of classical and novel image compression methods based on the region of interest (ROI) [10] is presented in [11], which emphasizes the growing importance of image compression in various aspects of digital life.
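The block-based scheme described for [8] can be illustrated with a minimal sketch. It is not the implementation from [8]: the function names, the block size, and the use of the per-block range to derive a bit count per residual are assumptions made here for illustration only.

```python
def encode_blocks(image, block=2):
    """Split a 2D list of ints into block x block tiles and keep, per tile,
    the minimum, the range (max - min), and the min-subtracted residuals."""
    h, w = len(image), len(image[0])
    tiles = []
    for r in range(0, h, block):
        for c in range(0, w, block):
            pixels = [image[y][x]
                      for y in range(r, min(r + block, h))
                      for x in range(c, min(c + block, w))]
            lo, hi = min(pixels), max(pixels)
            tiles.append({
                "origin": (r, c),
                "min": lo,
                "range": hi - lo,
                "residuals": [p - lo for p in pixels],
                # Assumption: the range fixes how many bits each residual needs.
                "bits": (hi - lo).bit_length(),
            })
    return h, w, block, tiles


def decode_blocks(encoded):
    """Invert encode_blocks by adding each tile minimum back to its residuals."""
    h, w, block, tiles = encoded
    image = [[0] * w for _ in range(h)]
    for tile in tiles:
        r, c = tile["origin"]
        residuals = iter(tile["residuals"])
        for y in range(r, min(r + block, h)):
            for x in range(c, min(c + block, w)):
                image[y][x] = next(residuals) + tile["min"]
    return image


# Hypothetical 2 x 4 depth patch with two locally smooth regions.
patch = [[100, 101, 50, 52],
         [102, 103, 51, 53]]
encoded = encode_blocks(patch, block=2)
restored = decode_blocks(encoded)
```

Because only the block minimum is removed, the round trip is exact, which is consistent with the lossless-only design noted above. In this toy patch each residual fits in 2 bits, whereas the raw values need 7 bits.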
Depth images encode a depth value in each pixel, representing the distance from the depth camera to the scene point imaged at that pixel. A depth map is thus a two-dimensional grayscale representation of a scene. Such data have already found applicability in numerous domains, including the medical one, where major hospitals and medical centers worldwide commonly use cameras for patient monitoring. In this respect, various algorithms have been proposed in the literature for compressing depth map images. For instance, the work of Tabus et al. [12] encodes both the contours and the depth values from the depth map. Their method, referred to as CERV (crack-edge-region-value), starts from an initial representation of the image that uses binary vertical and horizontal edges of the region contours, along with the depth value assigned to each region. This approach is then used in [13] to introduce the Anchor Points Coding (APC) technique, in which the anchor points of contours are encoded using a context tree algorithm. Schiopu and Tabus also built on their Greedy rate-distortion Slope Optimization (GSOm) work [14] by proposing a progressive coding technique for lossy-to-lossless compression [15]. Furthermore, [16] proposes the single depth intra mode to efficiently encode smooth areas within a depth image. The idea is to reconstruct the current Coding Unit (CU) as a smooth area with a single depth sample value, thereby improving the coding efficiency of that area within the depth map by incorporating a leaf merging of the pixel values. The authors of [17] proposed a bit-depth compression method that estimates the packet size based on the sensor data pattern; this algorithm operates only in lossless mode.
For video coding, many standards and algorithms have been developed. The foundational video compression standards have evolved over the decades, beginning with the introduction of the Advanced Video Coding (AVC/H.264) standard in 2003 [18], followed by the release of the improved High Efficiency Video Coding (HEVC/H.265) standard in 2012 [19], and culminating with the completion of the latest standard, Versatile Video Coding (VVC/H.266), in 2021 [20]. The standards employ both intra-coding and inter-coding. In intra-coding, each frame in the video is encoded independently, without reference to other frames. In inter-coding, by contrast, frames are encoded using prediction mechanisms based on previously decoded reference frames. Inter-coding improves compression performance compared to intra-coding at the expense of increased computational complexity. Existing survey papers show that compression performance has steadily improved over the years through the adoption of increasingly complex prediction and compression mechanisms [21,22,23].
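The difference between intra-coding and inter-coding can be illustrated with a toy predictor on a single row of depth samples. This sketch is far simpler than any of the standards above: it assumes left-neighbour prediction for the intra case and co-located prediction without motion compensation for the inter case, and all names and sample values are hypothetical.

```python
def intra_residuals(frame):
    """Intra: predict each sample from its left neighbour in the same frame."""
    return [frame[0]] + [frame[i] - frame[i - 1] for i in range(1, len(frame))]


def inter_residuals(frame, reference):
    """Inter: predict each sample from the co-located sample of a
    previously decoded reference frame (no motion compensation)."""
    return [cur - ref for cur, ref in zip(frame, reference)]


def reconstruct_intra(residuals):
    """Rebuild the row by accumulating the left-neighbour differences."""
    out = []
    for i, res in enumerate(residuals):
        out.append(res if i == 0 else out[-1] + res)
    return out


def reconstruct_inter(residuals, reference):
    """Rebuild the row by adding the residuals back to the reference."""
    return [res + ref for res, ref in zip(residuals, reference)]


def residual_energy(residuals):
    """Sum of absolute residuals, a rough proxy for coding cost."""
    return sum(abs(v) for v in residuals)


# Hypothetical consecutive rows of a mostly static depth scene (values in mm).
prev_row = [1200, 1210, 800, 805, 1500, 1490]
curr_row = [1201, 1210, 801, 805, 1499, 1490]

intra = intra_residuals(curr_row)
inter = inter_residuals(curr_row, prev_row)
```

For this nearly static scene the inter residuals are much smaller than the intra residuals, which is the source of the compression gain. The price is that the decoder must hold the reference frame and that real encoders must search for good predictions.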
Typical lossy compression systems, such as those cited above, employ the L2 metric to drive the compression process. L2 is a global metric: the coding system minimizes the L2 distortion, that is, the mean square error between the original and reconstructed images, for any given bit budget. However, a solution that is optimal with respect to such a global metric may still exhibit large local errors. In depth image and video data, the pixel values correspond to the actual depth from the scene to the camera plane. Controlling the coding error at the level of every pixel is thus of critical importance in order to prevent locally large depth errors that may affect subsequent processing tasks and medical decisions. What is of interest in medical applications is therefore to minimize the maximum per-pixel coding error for a given bit budget, which corresponds to using the L-infinite metric to drive the coding process.
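The contrast between the two metrics can be made concrete with a small numerical sketch. The sample values below are hypothetical and were chosen so that two reconstructions have the same mean square error but different maximum errors.

```python
def mse(original, reconstructed):
    """Mean square error, the quantity minimized under the L2 metric."""
    n = len(original)
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / n


def max_abs_error(original, reconstructed):
    """Largest per-pixel error, the quantity bounded under the L-infinite metric."""
    return max(abs(o - r) for o, r in zip(original, reconstructed))


# Hypothetical flat depth region of 16 pixels.
original = [1000] * 16

# Reconstruction A: every pixel is off by 1.
spread = [o + 1 for o in original]

# Reconstruction B: a single pixel is off by 4, all others are exact.
spike = list(original)
spike[5] += 4

mse_spread, mse_spike = mse(original, spread), mse(original, spike)
max_spread, max_spike = max_abs_error(original, spread), max_abs_error(original, spike)
```

Both reconstructions have a mean square error of 1.0, so an L2-driven encoder has no reason to prefer one over the other. Their maximum errors are 1 and 4 respectively, and only the L-infinite metric exposes the local depth spike in the second one.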
This paper proposes a novel lossless and near-lossless L-infinite codec for video processing in medical applications, with a focus on elderly monitoring and fall detection. The compression module is of critical importance in reducing the size of the streamed video, but it is essential that the encoding process introduces either no errors (lossless) or minimal, controlled errors (near-lossless). The work described herein addresses L-infinite compression, thus guaranteeing a bounded distortion (error) in the reconstructed video. The proposed L-infinite codec is computationally lightweight and is implemented on an embedded platform (depth camera sensor), demonstrating the practical applicability of the proposed system on embedded devices.
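To illustrate what a bounded distortion means in practice, the sketch below uses the classical uniform near-lossless quantizer with step 2δ + 1, of the kind used in the near-lossless mode of JPEG-LS. It is not the quantizer proposed in this paper; the function names and the parameter `delta` (the error bound δ) are assumptions made for illustration.

```python
def quantize(residual, delta):
    """Map an integer prediction residual to a quantization index such that
    the reconstruction error never exceeds delta."""
    step = 2 * delta + 1
    if residual >= 0:
        return (residual + delta) // step
    return -((-residual + delta) // step)


def dequantize(index, delta):
    """Reconstruct the residual at the centre of its quantization bin."""
    return index * (2 * delta + 1)


def worst_case_error(delta, lo=-50, hi=50):
    """Largest reconstruction error over all integer residuals in [lo, hi]."""
    return max(abs(e - dequantize(quantize(e, delta), delta))
               for e in range(lo, hi + 1))


# delta = 0 gives lossless coding; delta > 0 gives near-lossless coding.
errors = {delta: worst_case_error(delta) for delta in range(5)}
```

With δ = 0 the step is 1 and every residual is reproduced exactly. With δ > 0 each bin covers 2δ + 1 integers centred on the reconstruction value, so the per-sample error is at most δ regardless of the signal.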
The primary contributions of this work are as follows:
A novel L-infinite codec for depth video data compression that preserves the semantic interpretation of the scene.
An original quantizer that targets the L-infinite norm for sparse (discontinuous) residual distributions.
A lightweight encoder optimized for real-time deployment on embedded platforms used in medical applications.
The remainder of this paper is organized as follows. Section 2 reviews the related work. Section 3 details the methodology and describes the architecture of our codec. Section 4 presents the experimental results following the deployment of the codec. Section 5 discusses the results. Finally, the conclusion is presented in Section 6.