Submitted: 22 March 2026
Posted: 24 March 2026
Abstract

Keywords:
1. Introduction
- A complete 3D-DCT-based coding framework: A simple and fully functional end-to-end codec is developed by extending a JPEG-like 2D pipeline to three dimensions, including 3D transform computation, quantization, serialization, and entropy coding.
- Low-complexity video compression without motion compensation: The proposed approach exploits temporal redundancy through the 3D-DCT and achieves compression performance comparable to MPEG-4 without motion estimation or compensation, which significantly reduces computational complexity.
- Editing-friendly video coding with configurable GOP sizes: The use of fixed-size 3D blocks enables short and flexible GOPs, allowing direct frame access and making the method well suited for video editing workflows and camera firmware.
- Generalization to non-cubic 3D-DCT blocks: Non-cubic (parallelepipedic) 3D-DCT transforms are introduced and evaluated, demonstrating that increasing the third-dimension size improves compression efficiency across applications.
- Novel 3D quantization and serialization strategies: Two original methods for 3D quantization matrix construction and two alternative coefficient ordering schemes are proposed and analyzed, showing that optimal choices depend on the redundancy structure of the data.
- Application to volumetric medical image compression: The proposed coder is successfully applied to real CT datasets, achieving very high compression ratios while maintaining diagnostically acceptable image quality.
2. Materials and Methods
2.1. Starting Point
2.2. 3D-DCT Computation by Separability
- First, apply the DCT-2D to each 2D layer of x: x[m,n,p0] (p0 constant).
- Then, in the resulting array, apply the DCT-1D (the dct function in MATLAB) to each vector T2d[m0,n0,p] (m0 and n0 constant).
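The two separability steps can be sketched in plain Python as follows. This is a minimal illustration, not the authors' MATLAB code: it assumes an orthonormal DCT-II (the scaling used by MATLAB's dct), stores a block as nested lists indexed x[m][n][p], and uses hypothetical helper names (dct1, dct2, dct3_separable). Because M, N, and P are independent, the same code also covers non-cubic (parallelepipedic) blocks.

```python
import math


def dct1(v):
    """Orthonormal 1D DCT-II of a list (same scaling as MATLAB's dct)."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(v[p] * math.cos(math.pi * (2 * p + 1) * k / (2 * n))
                               for p in range(n)))
    return out


def dct2(layer):
    """2D DCT of an M x N layer: 1D DCT along each row, then along each column."""
    rows = [dct1(r) for r in layer]
    M, N = len(rows), len(rows[0])
    cols = [dct1([rows[m][n] for m in range(M)]) for n in range(N)]
    return [[cols[n][m] for n in range(N)] for m in range(M)]


def dct3_separable(x):
    """3D DCT of an M x N x P block x[m][n][p], computed by separability."""
    M, N, P = len(x), len(x[0]), len(x[0][0])
    t2d = [[[0.0] * P for _ in range(N)] for _ in range(M)]
    # Step 1: DCT-2D of every layer x[m, n, p0] with p0 fixed.
    for p0 in range(P):
        d = dct2([[x[m][n][p0] for n in range(N)] for m in range(M)])
        for m in range(M):
            for n in range(N):
                t2d[m][n][p0] = d[m][n]
    # Step 2: DCT-1D of every vector T2d[m0, n0, p] with m0 and n0 fixed.
    return [[dct1(t2d[m0][n0]) for n0 in range(N)] for m0 in range(M)]
```

For example, a constant 4×4×2 block of ones yields a single nonzero coefficient, X[0][0][0] = sqrt(32), because the orthonormal DCT maps a constant vector of length L to sqrt(L) times its value in the DC term. The direct summation is used here only for clarity; a practical encoder would use a fast transform such as the algorithm of [7].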
2.3. From the Quantization Matrix to the Quantization Parallelepiped
- Method 1, called the geometrical method. To compute an operational 3D matrix easily, the following procedure has been designed:
  - Set the matrices corresponding to the Cartesian planes, that is, Q3(i,j,1), Q3(i,1,k), and Q3(1,j,k), equal to the original JPEG matrix (Equation 5). If the Z dimension (index k) is greater than 8, the remaining values are filled with the maximum value of the 2D matrix.
  - Compute the remaining values according to the plane i+j+k=c0 to which they belong.
  - For each plane i+j+k=c0, set its not-yet-assigned coefficients Q3(i,j,k) equal to the average of the coefficients already assigned on that plane (the assigned values are those on the intersection lines of the plane with the Cartesian planes).
  - If a plane has no such intersection (the highest frequencies), set Q3(i,j,k) equal to the maximum coefficient of the 2D matrix.
- Method 2, called the polynomial method. This method builds a polynomial function that computes the matrix coefficients. The steps are:
  1. Assume that there exists a function that computes Q2(i,j) as:
  2. Compute A, B, and C by solving an overdetermined system of linear equations with the matrix formulation of the least-squares method (normal equations) [22].
  3. Extend the polynomial so that it produces a 3D matrix Q3(i,j,k):
  4. Correct the coefficients using the following empirical equations:
  5. Run a simple loop to build the 3D Q matrix (computed coefficients are rounded to the nearest integer). Note that, in principle, any matrix is valid; its ability to achieve high PSNR and high compression must be tested.
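Method 1 (the geometrical method) can be sketched in plain Python as shown below. This is a hypothetical illustration under stated assumptions, not the authors' implementation. It assumes that the JPEG matrix of Equation 5 is the standard luminance table of ITU-T T.81, Annex K [23], and it uses 0-based indices, so the planes Q3(i,j,1), Q3(i,1,k), and Q3(1,j,k) become k=0, j=0, and i=0. Two details are not fixed by the description and are assumptions here: the orientation of the JPEG matrix on the planes j=0 and i=0, and the write order on the shared axes. Where the non-symmetric JPEG matrix gives conflicting values, the spatial plane k=0 is written last so that it equals the JPEG matrix exactly. Plane averages are left unrounded.

```python
# Standard JPEG luminance quantization table (ITU-T T.81, Annex K, Table K.1).
# ASSUMPTION: this is the "original JPEG matrix" of Equation 5.
JPEG_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def geometric_q3(q2, z):
    """Build an n x n x z quantization parallelepiped q3[i][j][k] from q2.

    Hypothetical sketch of Method 1 with 0-based indices.
    """
    n = len(q2)
    qmax = max(max(row) for row in q2)
    q3 = [[[None] * z for _ in range(n)] for _ in range(n)]

    def from_q2(a, b):
        # Positions beyond the 2D matrix (k >= n) get its maximum value.
        return q2[a][b] if a < n and b < n else qmax

    # Step 1: Cartesian planes. ASSUMPTION: orientation and write order;
    # plane k = 0 is written last so it equals the JPEG matrix exactly.
    for i in range(n):
        for k in range(z):
            q3[i][0][k] = from_q2(i, k)  # plane j = 0
    for j in range(n):
        for k in range(z):
            q3[0][j][k] = from_q2(k, j)  # plane i = 0
    for i in range(n):
        for j in range(n):
            q3[i][j][0] = q2[i][j]  # plane k = 0

    # Steps 2-4: visit each diagonal plane i + j + k = s.
    for s in range(2 * (n - 1) + z):
        cells = [(i, j, s - i - j) for i in range(n) for j in range(n)
                 if 0 <= s - i - j < z]
        assigned = [q3[i][j][k] for (i, j, k) in cells if q3[i][j][k] is not None]
        # Average of the values already set on this plane, or the maximum if none.
        fill = sum(assigned) / len(assigned) if assigned else qmax
        for (i, j, k) in cells:
            if q3[i][j][k] is None:
                q3[i][j][k] = fill
    return q3
```

With Z = 8, every plane with i+j+k ≥ 15 (0-based) has no intersection with the Cartesian planes, so those high-frequency entries take the maximum JPEG value, 121. Calling geometric_q3(JPEG_LUMA, 16) gives a 16-deep parallelepiped in which the Cartesian-plane entries with k ≥ 8 are also set to 121.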
2.4. Serialization of the Quantized Block
- Method 1: Based on the fact stated above, serialization is performed layer by layer over horizontal planes T3d[i,j,k0], with constant k0. Here, horizontal means that the layers are parallel planes orthogonal to the vector (0,0,1). Within each layer, the zig-zag path inherited from the standard JPEG method is applied.
- Method 2: In theory, indexes i, j, and k are proportional to the transform frequencies (horizontal, vertical, and temporal), so the sum i+j+k can be used as a measure of how high the frequency associated with a given coefficient is. It is therefore reasonable to order the coefficients along parallel planes orthogonal to the vector (1,1,1), that is, the planes i+j+k=c0.
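Both scan orders can be sketched in plain Python as follows (hypothetical function names, with i treated as the row index of each layer). Method 1 is fully specified: the JPEG zig-zag path inside each layer k. Method 2 only specifies the grouping by planes i+j+k=c0; the order of coefficients inside a plane is not stated, so this sketch assumes a tie-break by the layer index k and then by the JPEG zig-zag position of (i, j).

```python
def zigzag(n):
    """JPEG zig-zag scan of an n x n layer, as a list of (row, col) pairs."""
    order = []
    for s in range(2 * n - 1):  # anti-diagonal row + col = s
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:  # even diagonals run from bottom-left to top-right
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def scan_layers(n, z):
    """Method 1: horizontal layers k = 0..z-1, zig-zag inside each layer."""
    zz = zigzag(n)
    return [(i, j, k) for k in range(z) for (i, j) in zz]


def scan_diagonal_planes(n, z):
    """Method 2: planes i + j + k = c0 in increasing order of c0.

    ASSUMPTION: inside a plane, order by k, then by zig-zag position of (i, j).
    """
    pos = {ij: idx for idx, ij in enumerate(zigzag(n))}
    cells = [(i, j, k) for i in range(n) for j in range(n) for k in range(z)]
    return sorted(cells, key=lambda c: (c[0] + c[1] + c[2], c[2], pos[(c[0], c[1])]))


def serialize(block, order):
    """Flatten a quantized block[i][j][k] into a 1D list in the given order."""
    return [block[i][j][k] for (i, j, k) in order]
```

Either order can be passed to serialize to turn a quantized 8×8×Z block into the 1D sequence that is then handed to variable-length encoding; the two orders visit the same coefficients and differ only in where the zero runs fall.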
2.5. Variable Length Encoding
2.6. Video Encoding with 3D-DCT
2.7. Test Videos
3. Results and Discussion
3.1. Monochrome Video Encoding
3.2. Color Video Encoding
3.3. Medical Images Coding
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| JPEG | Joint Photographic Experts Group (still image coding standard) |
| MPEG | Moving Picture Experts Group (video coding standard) |
| DCT | Discrete Cosine Transform |
| PSNR | Peak Signal-to-Noise Ratio |
| CT | Computed Tomography |
| DICOM | Digital Imaging and Communications in Medicine (medical image file format) |
Appendix A. Description of the databases used
| Name | Resolution | FPS | Duration (s) | Color Space | Scene Type |
|---|---|---|---|---|---|
| akiyo_cif | CIF (352×288) | 30 | 10 | YCbCr | HS |
| akiyo_qcif | QCIF (176×144) | 30 | 10 | YCbCr | HS |
| bowing_cif | CIF (352×288) | 30 | 10 | YCbCr | HS |
| bowing_qcif | QCIF (176×144) | 30 | 10 | YCbCr | HS |
| bridge_close_cif | CIF (352×288) | 30 | 67 | YCbCr | UL |
| bridge_close_qcif | QCIF (176×144) | 30 | 67 | YCbCr | UL |
| bridge_far_cif | CIF (352×288) | 30 | 70 | YCbCr | UL |
| bridge_far_qcif | QCIF (176×144) | 30 | 70 | YCbCr | UL |
| bus_cif | CIF (352×288) | 30 | 5 | YCbCr | UT |
| bus_qcif_15fps | QCIF (176×144) | 15 | 5 | YCbCr | UT |
| car_phone_qcif | QCIF (176×144) | 30 | 13 | YCbCr | HS |
| city_4cif | 4CIF (704×576) | 60 | 10 | YCbCr | UL (aerial) |
| city_cif | CIF (352×288) | 30 | 10 | YCbCr | UL (aerial) |
| city_qcif_15fps | QCIF (176×144) | 15 | 10 | YCbCr | UL (aerial) |
| claire_qcif | QCIF (176×144) | 30 | 16.50 | YCbCr | HS |
| foreman_cif | CIF (352×288) | 30 | 10 | YCbCr | HS+UL |
| foreman_qcif | QCIF (176×144) | 30 | 10 | YCbCr | HS+UL |
| hall_monitor_cif | CIF (352×288) | 30 | 10 | YCbCr | IO |
| hall_monitor_qcif | QCIF (176×144) | 30 | 10 | YCbCr | IO |
| miss_am_qcif | QCIF (176×144) | 30 | 5 | YCbCr | HS |
| news_cif | CIF (352×288) | 30 | 10 | YCbCr | TV (news) |
| news_qcif | QCIF (176×144) | 30 | 10 | YCbCr | TV (news) |
| tt_sif | SIF (352×240) | 30 | 4 | YCbCr | IS |
| Name | Resolution | Number of Slices |
|---|---|---|
| dental | 512×512 | 166 |
| head | 512×512 | 460 |
| spine01 | 512×512 | 436 |
| spine02 | 512×512 | 515 |
| spine03 | 512×512 | 226 |
| spine04 | 512×512 | 631 |
| spine05 | 512×512 | 322 |
| spine06 | 512×512 | 556 |
| spine07 | 512×512 | 258 |
| spine08 | 512×512 | 202 |
| spine09 | 512×512 | 299 |
| spine10 | 512×512 | 328 |
| Name | Resolution | Number of Slices |
|---|---|---|
| Lumbar01 | 640×640 | 25 |
| Lumbar02 | 256×256 | 47 |
| Lumbar03 | 768×768 | 25 |
Appendix B. Computational load study
References
1. M. K. Ibraheem, A. V. Dvorkovich, I. M. Abdalameer Al-khafaji, A Comprehensive Literature Review on Image and Video Compression: Trends, Algorithms, and Techniques, Ingénierie des Systèmes d’Information, pp. 863-876. [CrossRef]
2. Technical report: https://www.sciencedirect.com/topics/computer-science/discrete-cosine-transform.
3. A. K. Jain, Fundamentals of Digital Image Processing, Englewood Cliffs, New Jersey: Prentice Hall, 1989.
4. S. Saponara, Real-time and low-power processing of 3D direct/inverse discrete cosine transform for low-complexity video codec, J Real-Time Image Proc, 7, 43–53 (2012). [CrossRef]
5. Jeoong Sung Park, Tokunbo Ogunfunmi, A 3D-DCT video encoder using advanced coding techniques for low power mobile device, Journal of Visual Communication and Image Representation 2017, 48, 122-135. [CrossRef]
6. Xue, J.; Yin, L.; Lan, Z.; Long, M.; Li, G.; Wang, Z.; Xie, X. 3D DCT Based Image Compression Method for the Medical Endoscopic Application. Sensors 2021, 21, 1817. [CrossRef]
7. O. Alshibami and S. Boussakta, Fast algorithm for the 3D DCT, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT, USA, 2001, pp. 1945-1948 vol.3. [CrossRef]
8. A. Skodras, C. Christopoulos and T. Ebrahimi, “The JPEG 2000 still image compression standard,” in IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 36-58, Sept. 2001. [CrossRef]
9. M. J. Weinberger, G. Seroussi and G. Sapiro, “From LOCO-I to the JPEG-LS standard,” Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348), Kobe, Japan, 1999, pp. 68-72 vol.4. [CrossRef]
10. D. A. Adjeroh and S. D. Sawant, Error-Resilient Transmission for 3D DCT Coded Video, in IEEE Transactions on Broadcasting, vol. 55(2), pp. 178-189 (2009). [CrossRef]
11. T. Haiyan, S. Wenbang, G. Bingzhe and Z. Fengjing, Research on Quantization and Scanning Order for 3-D DCT Video Coding, 2012 International Conference on Computer Science and Electronics Engineering, Hangzhou, China, 2012, pp. 200-204. [CrossRef]
12. J. Li, J. Takala, M. Gabbouj, H. Chen, Modeling of 3D-DCT coefficients for fast video encoding, 3rd International Symposium on Communications, Control and Signal Processing, 2008 (ISCCSP 2008), pp. 634-648. [CrossRef]
13. Jin Li, Moncef Gabbouj, Jarmo Takala and Hexin Chen, Simplified video coding for digital mobile devices, 2008 9th International Conference on Signal Processing, Beijing, China, 2008, pp. 1247-1250. [CrossRef]
14. Guofeng Tong, Sixuan Liu, Yang Lv, Hanyu Pei, Feng-Lei Fan, “A Survey on Medical Image Compression: From Traditional to Learning-Based”, preprint on arXiv. [CrossRef]
15. H. M. Luu, et al., “Efficiently compressing 3D medical images for teleinterventions via CNNs and anisotropic diffusion,” Medical Physics, vol. 48, no. 6, pp. 2877–2890, 202. [CrossRef]
16. A. B. Watson, J. A. Solomon, A. J. Ahumada Jr., and A. Gale, Discrete cosine transform (DCT) basis function visibility: effects of viewing distance and contrast masking, Proc. SPIE 2179, Human Vision, Visual Processing, and Digital Display V, (1 May 1994). [CrossRef]
17. H. A. Peterson, H. Peng, J. H. Morgan, and W. B. Pennebaker, Quantization of color image components in the DCT domain, Proc. SPIE 1453, Human Vision, Visual Processing, and Digital Display II, (1 June 1991). [CrossRef]
18. A. J. Ahumada Jr. and H. A. Peterson, Luminance-model-based DCT quantization for color image compression, Proc. SPIE 1666, Human Vision, Visual Processing, and Digital Display III, (27 August 1992). [CrossRef]
19. H. A. Peterson, A. J. Ahumada Jr., and A. B. Watson, Improved detection model for DCT coefficient quantization, Proc. SPIE 1913, Human Vision, Visual Processing, and Digital Display IV, (8 September 1993). [CrossRef]
20. A. B. Watson, DCT quantization matrices visually optimized for individual images, Proceedings of SPIE 1993 (Human Vision, Visual Processing, and Digital Display IV). [CrossRef]
21. G. K. Wallace, The JPEG still picture compression standard, in IEEE Transactions on Consumer Electronics, vol. 38, no. 1, pp. xviii-xxxiv, Feb. 1992. [CrossRef]
22. E. W. Weisstein, Normal Equation, MathWorld (Wolfram). https://mathworld.wolfram.com/NormalEquation.html (accessed on 2nd of February, 2026).
23. ITU-T Rec. T.81 (1993) PDF, legacy JPEG specification: https://www.w3.org/Graphics/JPEG/itu-t81.pdf (accessed on 2nd of February, 2026).
24. E. Oma, “YUV4MPEG reader”, MATLAB Central, 2015. https://es.mathworks.com/matlabcentral/fileexchange/50690-yuv4mpeg-reader (accessed on 2nd of February, 2026).
25. https://media.xiph.org/video/derf/ (accessed on 2nd of February, 2026).
26. https://www.ffmpeg.org/ (accessed on 2nd of February, 2026).
27. https://www.dicomlibrary.com/ (accessed on 2nd of February, 2026).
28. www.kaggle.com (accessed on 2nd of February, 2026).
29. https://www.cse.fau.edu/~borko/paper_procarch.pdf (accessed on 2nd of February, 2026).
30. O. Lehtoranta and T. D. Hamalainen, “Complexity analysis of spatially scalable MPEG-4 encoder,” Proceedings. 2003 International Symposium on System-on-Chip (IEEE Cat. No.03EX748), Tampere, Finland, 2003, pp. 57-60. [CrossRef]






| Coder | Z (GOP) | Q Coefficient Computation Method | Serialization Method | PSNR (dB) mean/std | Compression Ratio mean/std | Processing Time per Frame |
|---|---|---|---|---|---|---|
| 3D-DCT, v0 Baseline | 8 | 1 | 1 | 37.91/1.37 | 29.50/14.93 | 0.96 |
| 3D-DCT, v1 | 8 | 1 | 2 | 37.65/1.01 | 24.53/10.71 | 0.93 |
| 3D-DCT, v2 | 8 | 2 | 1 | 37.20/1.11 | 40.36/22.73 | 0.92 |
| 3D-DCT, v3 | 8 | 2 | 2 | 37.20/1.11 | 31.90/15.71 | 0.92 |
| 3D-DCT, v4 | 16 | 1 | 1 | 37.93/1.06 | 43.38/23.41 | 0.50 |
| 3D-DCT, v5 | 16 | 1 | 2 | 38.06/1.10 | 37.39/17.58 | 0.51 |
| 3D-DCT, v6 | 16 | 2 | 1 | 37.83/1.13 | 56.88/35.79 | 0.50 |
| 3D-DCT, v7 | 16 | 2 | 2 | 37.83/1.13 | 46.47/25.25 | 0.49 |
| 3D-DCT, v8 | 32 | 1 | 1 | 38.25/1.13 | 61.79/25.24 | 0.29 |
| 3D-DCT, v9 | 32 | 1 | 2 | 38.34/1.21 | 54.01/29.16 | 0.28 |
| 3D-DCT, v10 | 32 | 2 | 1 | 38.21/1.20 | 79.09/54.74 | 0.28 |
| 3D-DCT, v11 | 32 | 2 | 2 | 38.21/1.20 | 66.16/38.65 | 0.28 |
| MPEG-4, MATLAB baseline | 90 | NA | NA | 42.32/0.85 | 27.35/13.77 | NA |
| H.265 (ffmpeg) | 8 | NA | NA | 37.37/1.29 | 56.37/30.45 | NA |
| H.265 (ffmpeg) | 16 | NA | NA | 37.62/1.33 | 81.21/47.34 | NA |
| H.265 (ffmpeg) | 32 | NA | NA | 37.47/1.31 | 112.81/74.73 | NA |
| Coder | Z (GOP) | Q Coefficient Computation Method | Serialization Method | PSNR (dB) mean/std | Compression Ratio mean/std | Processing Time per Frame |
|---|---|---|---|---|---|---|
| 3D-DCT, v0 Baseline | 8 | 1 | 1 | 33.34/1.29 | 77.44/36.88 | 1.38 |
| 3D-DCT, v1 | 8 | 1 | 2 | 33.00/0.64 | 65.44/25.96 | 1.37 |
| 3D-DCT, v2 | 8 | 2 | 1 | 32.69/0.74 | 105.30/55.07 | 1.36 |
| 3D-DCT, v3 | 8 | 2 | 2 | 32.69/0.74 | 84.78/37.54 | 1.36 |
| 3D-DCT, v4 | 16 | 1 | 1 | 33.42/1.18 | 115.20/59.97 | 0.74 |
| 3D-DCT, v5 | 16 | 1 | 2 | 33.16/0.69 | 100.42/43.91 | 0.73 |
| 3D-DCT, v6 | 16 | 2 | 1 | 33.00/0.74 | 149.94/89.64 | 0.73 |
| 3D-DCT, v7 | 16 | 2 | 2 | 33.00/0.74 | 123.98/62.12 | 0.72 |
| 3D-DCT, v8 | 32 | 1 | 1 | 33.17/0.75 | 164.96/98.98 | 0.42 |
| 3D-DCT, v9 | 32 | 1 | 2 | 33.20/0.77 | 145.47/74.55 | 0.41 |
| 3D-DCT, v10 | 32 | 2 | 1 | 33.22/0.99 | 210.86/141.57 | 0.41 |
| 3D-DCT, v11 | 32 | 2 | 2 | 33.12/0.77 | 177.79/98.58 | 0.41 |
| MPEG-4, MATLAB | 90 | NA | NA | 35.65/0.74 | 66.59/22.62 | NA |
| H.265 (ffmpeg) | 8 | NA | NA | 70.33/2.40 | 47.35/22.66 | NA |
| H.265 (ffmpeg) | 16 | NA | NA | 70.35/33.25 | 66.30/33.25 | NA |
| H.265 (ffmpeg) | 32 | NA | NA | 70.28/2.40 | 87.60/47.59 | NA |
| Coder | Z | Q Coefficient Computation Method | Serialization Method | PSNR (dB) mean/std | Absolute Compression mean/std | Relative Compression mean/std | Processing Time per Frame |
|---|---|---|---|---|---|---|---|
| 3D-DCT, v0 Baseline | 8 | 1 | 1 | 37.56/3.28 | 315.63/236.84 | 285.56/237.77 | 2.38 |
| 3D-DCT, v1 | 8 | 1 | 2 | 37.56/3.28 | 356.58/256.54 | 319.93/262.03 | 2.42 |
| 3D-DCT, v2 | 8 | 2 | 1 | 36.90/2.74 | 476.84/358.60 | 429.74/356.01 | 2.46 |
| 3D-DCT, v3 | 8 | 2 | 2 | 36.90/2.74 | 521.75/388.11 | 466.84/380.16 | 2.46 |
| 3D-DCT, v4 | 16 | 1 | 1 | 36.90/3.01 | 415.63/307.66 | 377.13/311.04 | 1.27 |
| 3D-DCT, v5 | 16 | 1 | 2 | 36.90/3.02 | 523.70/393.39 | 472.37/390.77 | 1.26 |
| 3D-DCT, v6 | 16 | 2 | 1 | 36.59/2.71 | 668.22/517.00 | 606.13/516.16 | 1.26 |
| 3D-DCT, v7 | 16 | 2 | 2 | 36.59/2.71 | 819.66/641.15 | 739.44/631.14 | 1.25 |
| 3D-DCT, v8 | 32 | 1 | 1 | 36.13/2.71 | 537.49/394.29 | 488.48/401.28 | 0.73 |
| 3D-DCT, v9 | 32 | 1 | 2 | 36.13/2.71 | 726.67/554.28 | 659.36/555.27 | 0.75 |
| 3D-DCT, v10 | 32 | 2 | 1 | 36.09/2.67 | 863.22/676.02 | 787.82/679.64 | 0.75 |
| 3D-DCT, v11 | 32 | 2 | 2 | 36.09/2.67 | 1166.40/946.93 | 1064.40/943.18 | 0.71 |
| Coder | Z | Q Coefficient Computation Method | Serialization Method | PSNR (dB) mean/std | Absolute Compression mean/std | Relative Compression mean/std | Processing Time per Frame |
|---|---|---|---|---|---|---|---|
| 3D-DCT, v0 Baseline | 8 | 1 | 1 | 39.97/2.26 | 44.20/5.9 | 34.31/2.54 | 4 |
| 3D-DCT, v1 | 8 | 1 | 2 | 39.97/2.26 | 58.43/6.94 | 45.37/2.60 | 4 |
| 3D-DCT, v2 | 8 | 2 | 1 | 38.47/2.03 | 63.59/6.56 | 49.41/2.01 | 4 |
| 3D-DCT, v3 | 8 | 2 | 2 | 38.47/2.03 | 88.68/8.52 | 68.93/2.32 | 4 |
| 3D-DCT, v4 | 16 | 1 | 1 | 37.17/1.43 | 60.09/3.80 | 46.79/0.71 | 2.1 |
| 3D-DCT, v5 | 16 | 1 | 2 | 37.18/1.44 | 81.31/7.37 | 63.23/1.83 | 2.1 |
| 3D-DCT, v6 | 16 | 2 | 1 | 36.97/1.58 | 75.29/3.67 | 58.65/1.00 | 2 |
| 3D-DCT, v7 | 16 | 2 | 2 | 36.97/1.58 | 101.73/6.90 | 79.18/0.32 | 2.1 |
| 3D-DCT, v8 | 32 | 1 | 1 | 35.43/1.22 | 82.29/13.12 | 63.80/6.45 | 1.2 |
| 3D-DCT, v9 | 32 | 1 | 2 | 35.47/1.25 | 104.52/17.60 | 81.00/8.91 | 1.2 |
| 3D-DCT, v10 | 32 | 2 | 1 | 35.55/1.30 | 97.90/15.47 | 75.90/7.54 | 1.2 |
| 3D-DCT, v11 | 32 | 2 | 2 | 35.55/1.30 | 122.88/20.15 | 95.24/10.03 | 1.1 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).