Human Age Estimation from Face Images with Deep Convolutional Neural Networks Using Transfer Learning

Submitted: 03 January 2024
Posted: 04 January 2024

Abstract
In recent years, there has been growing interest in facial age prediction because of its diverse applications in fields such as security, entertainment, and healthcare. Motivated by the problem of underage voting in Nigeria, this research investigates facial age prediction using four well-known convolutional neural network (CNN) architectures, namely VGG16, ResNet50, MobileNet, and VGG19, to predict chronological age from facial images. The objective is precise age estimation from facial images, using two datasets: UTKFace and the CASIA African face dataset. The VGG16 model achieved a mean absolute error (MAE) of 1.76 years on the UTKFace dataset, and MobileNet achieved an MAE of 1.10 years on the CASIA African face dataset. Notably, this is the first use of the CASIA dataset for facial age estimation with CNNs, and the approach yielded the lowest MAE among all the studies reviewed.
Subject: Computer Science and Mathematics  -   Computer Vision and Graphics

1. Introduction

Automatic age estimation has gained popularity because of the many industries to which it can be applied, particularly social media and e-commerce. E-commerce websites currently recommend products to users based on their historical preferences; knowing a user's apparent age could make such recommendations more relevant. Facial age prediction is challenging because facial appearance varies with factors such as genetics, lifestyle, and environment. Convolutional neural networks (CNNs) have shown remarkable success in various computer vision tasks, including facial age prediction. This paper applies transfer learning with VGG16 [1], VGG19 [2], ResNet50 [3] and MobileNet [5], deep CNN architectures widely adopted for image recognition, to improve the accuracy of facial age prediction. Age can be expressed as an integer or a floating-point number, but it also has some coherence [4]: facial features change little across neighbouring ages, so faces can be compared within narrow age groups (for instance, ages 20-22). Hence, age estimation can be treated either as regression, where a specific age is predicted, or as classification, where images are assigned to age groups. We treat age estimation as a regression problem in this study. We use transfer learning to estimate age from facial images on two datasets, UTKFace and the CASIA African face dataset, and show that the approach performs well on both. Transfer learning is applied to several pre-trained deep CNNs: VGG16, ResNet50, MobileNet and VGG19. The only layers trained for age estimation are the fully connected (FC) layers added on top of the pre-trained model. Underage voting occurs when a person below the approved voting age, i.e. 18 years, takes part in voting. This work aims to develop a facial age detector that can be used to prevent underage voting and other age-related vices [18]. The remainder of the paper is organized as follows: Section 2 describes the datasets and the proposed method, Section 3 presents the results, Section 4 discusses them and compares our findings with state-of-the-art approaches, and Section 5 concludes.

2. Materials and Methods

2.1. Dataset

Facial age prediction requires a dataset of facial images with corresponding age labels. We selected the UTK and CASIA African facial datasets, both of which are openly available (see the Data Availability Statement).
The datasets can be preprocessed by applying standard techniques such as face detection, alignment, and normalization. Data augmentation techniques such as rotation, scaling, and flipping were also employed to increase the diversity of the training set.

2.1.1. UTK facial dataset

The UTK image benchmark is one of the most widely used datasets for age estimation from facial photos. It is a large-scale facial dataset with a wide age range (0 to 116 years) and contains over 20,000 facial photos annotated with age, gender, and ethnicity. The images cover a wide range of poses, facial expressions, lighting conditions, occlusion and resolution. The dataset supports several tasks, including face detection, age estimation, age progression and regression, and landmark localization. We used images for ages 5-30, with 450 images per age, for a total of 11,700 images.
Figure 1. Example facial images from the UTK dataset.
Figure 2. Distribution of the images for ages 5-30 from the UTK facial dataset used for the experiments.
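The balanced selection described above (450 images for each age from 5 to 30) can be sketched as follows. This is only an illustration under stated assumptions: it relies on the UTKFace convention that a file name begins with the age followed by an underscore, and the function names, the fixed seed, and the synthetic file names are hypothetical rather than taken from the authors' code.

```python
import random
from collections import defaultdict


def age_from_utk_filename(filename):
    """Read the age from a UTKFace-style name such as '25_0_1_20170116.jpg'."""
    return int(filename.split("_", 1)[0])


def balanced_subset(filenames, min_age=5, max_age=30, per_age=450, seed=0):
    """Pick up to `per_age` files for every age in [min_age, max_age]."""
    by_age = defaultdict(list)
    for name in filenames:
        age = age_from_utk_filename(name)
        if min_age <= age <= max_age:
            by_age[age].append(name)

    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    subset = []
    for age in range(min_age, max_age + 1):
        files = by_age[age]
        rng.shuffle(files)
        subset.extend(files[:per_age])
    return subset


# Synthetic file names standing in for the real dataset (hypothetical).
demo_names = [f"{age}_0_0_{i}.jpg" for age in range(1, 41) for i in range(500)]
demo_subset = balanced_subset(demo_names)
print(len(demo_subset))  # 26 age labels x 450 images = 11700
```

With 26 age labels (5 to 30 inclusive) and 450 images per label, the subset contains 11,700 images.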

2.1.2. CASIA African facial dataset

The images in the database were captured in Kaduna, Nigeria [27]. Approximately 1150 individuals took part in the capture sessions. The dataset covers different Nigerian ethnic groups; for this experiment, we used images of the Hausa ethnic group. The database comprises a total of 38,546 images from 1,183 subjects. We used a total of 10,921 facial images distributed across ages 10-30; Table 2 shows the age distribution of the dataset.
Figure 3. Example facial images from the CASIA African dataset.
Figure 4. Distribution of the images for ages 5-30 from the WIKI facial dataset used for the experiments.
  • Training
The input images were rescaled to 120 × 120 pixels. We used the Adam optimizer and applied a dropout rate of 0.2 to the fully connected layers to regularize the network during training. We downloaded the pre-trained deep learning models from the TensorFlow Keras library, froze their convolutional layers, and added our own fully connected layers for prediction: three dense layers with 50, 20 and 1 outputs, with a linear activation on the last dense layer. We also used early stopping to halt training once performance stopped improving. The RGB image is passed through the pre-trained convolutional layers for feature extraction, then through the fully connected layers, and finally to the output layer, where the predicted age is computed. We used a total of 11,700 images from the UTK dataset, with 450 images for each age label from 5 to 30, and a total of 10,921 images from the CASIA dataset [27]. We then used the scikit-learn function train_test_split() to split the images and labels into training and test sets with a ratio of 70:30 for both facial datasets.
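The 70:30 split mentioned above can be illustrated with a small dependency-free stand-in. It is a sketch only: the text uses train_test_split(), whereas the function below (`split_70_30`, a hypothetical name) simply shuffles indices with a fixed seed, and its rounding of the test-set size may differ from the library's.

```python
import random


def split_70_30(samples, labels, test_fraction=0.3, seed=42):
    """Shuffle paired samples/labels and split them into train and test parts."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")

    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable

    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    x_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Placeholder data: integer ids stand in for images, ages 5..30 for labels.
demo_x = list(range(11700))
demo_y = [5 + (i % 26) for i in demo_x]
x_tr, x_te, y_tr, y_te = split_70_30(demo_x, demo_y)
print(len(x_tr), len(x_te))  # 8190 3510
```

The return order mirrors the scikit-learn convention (train and test samples first, then train and test labels), so the sketch can be swapped for the library call without changing the surrounding code.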
  • Transfer Learning Workflow
The transfer learning workflow involves the following steps.
a. Preparing the dataset. Splitting the dataset into training, validation, and testing sets.
b. Pre-training. Using a pre-trained model (e.g. VGG16), trained on a large-scale image classification task such as ImageNet, to extract features from facial images.
c. Fine-tuning. Modifying the last few layers of the pre-trained model or adding new fully connected layers to adapt the model for age prediction. These layers are initialized randomly and trained on the specific age prediction task, as shown in Figure 6.
d. Training and Evaluation. Training the modified model using the training set and evaluating its performance on the validation set. Iterative optimization techniques such as gradient descent and backpropagation are applied.
e. Testing. Assessing the final model's accuracy by evaluating it on the testing set.
Metrics such as mean absolute error (MAE) and mean squared error (MSE) can be used to quantify the prediction accuracy. In this work, we utilize the MAE as our evaluation metric.
Figure 5. The transfer learning concept.
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$
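As a worked illustration of the MAE, the average absolute difference between true and predicted ages, alongside the MSE mentioned earlier, a minimal sketch in plain Python follows. The ages in the example are made up for illustration and are not results from the experiments.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (true - predicted)^2 over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only.
true_ages = [20, 25, 30, 18]
predicted_ages = [21.5, 24, 33, 18]

print(mean_absolute_error(true_ages, predicted_ages))  # 1.375
print(mean_squared_error(true_ages, predicted_ages))   # 3.0625
```

The absolute errors here are 1.5, 1, 3 and 0 years, so the MAE is 5.5 / 4 = 1.375 years. The MSE penalizes the 3-year miss more heavily, which is one reason the two metrics can rank models differently.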
In a regression model, R-squared (R², also known as the coefficient of determination) is a statistical metric that quantifies how much of the variance in the dependent variable is accounted for by the independent variables. As a measure of goodness of fit, it indicates how well the data match the regression model: a larger R² means the model explains more of the variability. R² typically ranges from 0 to 1.
$$R^2 = 1 - \frac{SS_r}{SS_t}$$
where $SS_t$ represents the total sum of squares and $SS_r$ represents the residual sum of squares. We use both MAE and R² to assess how well the models performed. The proposed age prediction system using deep CNNs and transfer learning is shown in Figure 7. One of the deep learning models serves as the base model, to which fully connected layers and an output are attached. In Figure 7, a deep CNN model is first imported with its pre-trained weights. Because the architectures were developed for the ImageNet object classification problem, each deep CNN has 1000 units in its last layer. Hence, we removed the original fully connected layers and added our custom fully connected layers with one output for age prediction, using either a ReLU or a linear activation function.
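The R² metric described earlier in this section can be computed from the residual and total sums of squares. The sketch below uses the standard definition, one minus the ratio of the two sums; the function name and example ages are illustrative assumptions, not values from the experiments.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
true_ages = [20, 25, 30, 18]
predicted_ages = [21.5, 24, 33, 18]
print(round(r_squared(true_ages, predicted_ages), 3))
```

A perfect prediction gives 1, and always predicting the mean age gives 0. With this definition the value can also fall below 0 when a model fits worse than the mean predictor, so the 0 to 1 range holds only for models that do at least as well as that baseline.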
  • Experimental setup
We implemented the proposed approach in Jupyter Notebook. The networks were trained on a 12th-generation Intel Core i7 processor with 10 cores. For regression, the learning rate was set to 0.001 and the momentum to 0.9, and MAE was used as the performance metric. Human age exhibits some coherence: facial shape changes only slightly between neighbouring ages, often too little for the human eye to notice. For instance, the age patterns of people who are 21 and 23 years old may be similar, and in some instances it may be difficult for a human to tell them apart. In the regression setting, we minimized the MAE.
  • Training
The input images were rescaled to 224 × 224 pixels. We used the Adam optimizer, with a dropout rate of 0.2 for the convolutional layers and 0.2 for the fully connected layers to regularize the network during training. We also used early stopping to halt training once performance stopped improving. The RGB image is passed through the convolutional layers for feature extraction, then through the fully connected layers, and finally to the output layer, where the predicted age is computed. For the UTKFace dataset we used a total of 11,700 images, with 450 images for each age label from 5 to 30. The CASIA dataset was unbalanced because some ages had few images. We then used the scikit-learn function train_test_split() to split the images and labels into training and test sets with a ratio of 70:30. The last layer had one output for the age value, with a linear activation function.
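The early-stopping rule mentioned above can be sketched in plain Python. This is an assumption-laden illustration: the text does not state the patience or the monitored quantity, so `patience`, `min_delta` and the validation MAE values below are hypothetical. Keras provides a comparable callback, `tf.keras.callbacks.EarlyStopping`, with `monitor` and `patience` parameters.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None.

    Training stops once the monitored loss has failed to improve on the
    best value seen so far for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None  # ran through every epoch without triggering


# Hypothetical validation MAE per epoch.
history = [5.0, 4.0, 3.5, 3.6, 3.7, 3.4, 3.5, 3.6, 3.7]
print(early_stopping_epoch(history, patience=3))  # 8
print(early_stopping_epoch(history, patience=2))  # 4
```

A larger patience tolerates short plateaus, such as the one at epochs 3 and 4 in the example, at the cost of a few extra epochs of training.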
In Figure 6, a deep CNN model is first imported with its pre-trained weights. Because the architectures were developed for the ImageNet object classification problem, each deep CNN has 1000 units in its last layer.
The next step is to freeze its convolutional layers and add three fully connected layers. The first FC layer has 1254450 units, the second has 1024 units, and the last has 1 unit; the number of output units, normally the number of classes, is 1 here because we are dealing with a regression problem. After the final three layers are trained in (b), the entire network is fine-tuned.
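To give a sense of how many weights are trained when the convolutional base is frozen, the sketch below counts the parameters of a dense head. It rests on assumptions not stated at this point in the text: a VGG16 base with a 224 × 224 input, whose final feature map is 7 × 7 × 512, and the 50, 20 and 1 unit dense layers described in the first training paragraph. The function names are hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


def head_params(feature_dim, layer_units=(50, 20, 1)):
    """Parameter count of each dense layer stacked on a flattened feature map."""
    counts = []
    n_in = feature_dim
    for units in layer_units:
        counts.append(dense_params(n_in, units))
        n_in = units  # the next layer consumes this layer's outputs
    return counts


# Assumption: VGG16 conv base with a 224 x 224 input -> 7 x 7 x 512 features.
feature_dim = 7 * 7 * 512
counts = head_params(feature_dim)
print(feature_dim)          # 25088
print(counts, sum(counts))  # [1254450, 1020, 21] 1255491
```

Under these assumptions the first dense layer has 1,254,450 parameters, which coincides with the figure quoted above for the first FC layer. That may indicate the quoted number is a parameter count rather than a number of units, although the text does not confirm this.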
Figure 7. The transfer learning model with fine-tuning.
  • Contribution to Knowledge
Our work has examined the use of transfer learning for age prediction from facial images. It shows that transfer learning can be used to build ready-to-use models that can be adopted in the real world, specifically in Nigeria, for processes such as underage detection during elections and other access control functions.

3. Results

Table 1 shows the MAE and R² of our experiments using the various transfer learning models.
Figure 8. Performance of the models on the UTK face dataset.
Figure 9. Performance of the models on the CASIA facial dataset.
Figure 10. A test image with the VGG16 predicted result on the UTK image dataset and the MobileNet predicted result on the CASIA facial dataset.

4. Discussion

Our experiments produced strong MAE and R² values. On the UTK facial dataset we obtained an MAE of 1.76 years using VGG16, while on the CASIA facial dataset we obtained an MAE of 1.1 years using MobileNet, which was the best-performing model. We also compare our work with some of the literature using transfer learning models in Table 2.
| Authors | Method | Result | Dataset(s) |
| --- | --- | --- | --- |
| Akhand et al. (2020) | Transfer learning (ResNet-18, ResNet-34, ResNet-50, InceptionNet and DenseNet) | CACD: 5-year grouping (85%), 10-year grouping (93%), regression (MAE of 5.17). UTKFace: 5-year grouping (97%), 10-year grouping (99%), regression (MAE of 9.19). FG-Net: 5-year grouping (100%), 10-year grouping (100%), regression (MAE of 2.64) | UTKFace, CACD and FGNet datasets |
| [19] Fang et al. (2019) | Transfer learning, VGG19 | Adience: 94% on classification, regression 1.84 MAE. CACD: 95% on classification, regression 5.38 MAE | Adience, CACD |
| [20] Irhebhude et al. (2021) | Principal component analysis and support vector machine | 10-year grouping (95% and 96%) (4 classes) | Local dataset and FG-Net |
| [21] Jiang et al. (2018) | Caffe DL framework, CNN | MAE: 2.94 | FG-NET and MORPH |
| [22] Ahmed & Viriri (2020) | Transfer learning and Bayesian optimization | MAE of 1.2 and 2.67 | FERET and FG-NET |
| [23] Ahmed & Viriri (2020) | CNN and Bayesian optimization | MAE of 2.88, 1.3 and 3.01 | MORPH, FG-NET and FERET |
| [24] Dagher & Barbara (2021) | Transfer learning (pre-trained CNNs, namely VGG, ResNet, GoogLeNet, and AlexNet) | GoogLeNet: 5-year grouping 74%, 10-year grouping 85%, 15-year grouping 87%, 20-year grouping 89% | FGNET and MORPH |
| [25] Ito & Kawai (2018) | Transfer learning (AlexNet, VGG16, ResNet152, WideResNet-16-8), single-task learning (STL) and multi-task learning (MTL) | WRN + STL: MAE 7.3. WRN + MTL: MAE 7.2 | IMDB Wide ResidualNe |
| Reference | Model | MAE | Dataset |
| --- | --- | --- | --- |
| [8] | ResNet18 | 2.66 | UTKFace |
| [8] | ResNet34 | 2.64 | FGNet |
| [8] | Inceptionv3 | 5 | Cross-Age-Celebrity-Dataset |
| [8] | DenseNet | 3.19 | UTKFace |
| [8] | ResNet50 | 3.94 | FGNet |
| [9] | GoogleNet | 2.94 | MORPH |
| [9] | GoogleNet | 2.97 | FGNET |
| [10] | CRCNN | 3.74 | MORPH |
| [10] | CRCNN | 4.13 | FGNET |
| [11] | RED+SVM | 6.33 | |
| Our work | VGG19 | 2.22 | CASIA African Facial Dataset |
| Our work | ResNet 152 | 4.08 | CASIA African Facial Dataset |
| Our work | MobileNet | 1.10 | CASIA African Facial Dataset |
| Our work | VGG16 | 1.76 | CASIA African Facial Dataset |
| Our work | ResNet50 | 4.60 | UTK Facial |
| Our work | MobileNet | 2.01 | UTK Facial |
| Our work | VGG16 | 1.75 | UTK Facial |
| Our work | VGG19 | 2.22 | UTK Facial |

5. Conclusion

The study showcases the effectiveness of employing transfer learning with popular models such as VGG16, ResNet50, Mobile Net, and VGG19 for facial age prediction. Leveraging pre-trained models along with fine-tuning techniques enhances generalization and accuracy, even when dealing with limited data. The research paper concludes by highlighting potential applications and future avenues for further refining facial age prediction through transfer learning.
The results show a regression mean absolute error (MAE) of 1.76 for VGG16 on the UTKFace dataset and an MAE of 1.1 for MobileNet on the CASIA facial dataset. Predicting age with deep learning is a challenging yet important task. This study used transfer learning with deep convolutional neural network (CNN) models to devise an efficient approach for age prediction from facial photographs. Age can be represented either as an integer or grouped into age brackets; in this work, regression with deep learning was employed.
On the UTKFace and CASIA African face datasets, the latter of which had not previously been used for age prediction in the existing literature, our proposed method demonstrated noteworthy performance, surpassing the other approaches reviewed. The method has significant potential for extension to real-time age estimation by extracting facial images from live recordings, such as those from webcams or security cameras, which is a promising avenue for future research. It performs consistently well on the benchmark datasets evaluated.

Funding

This research received no external funding.

Informed Consent Statement

Informed consent was obtained from all human subjects by the respective researchers who developed the datasets.

Data Availability Statement

The data used in this work can be publicly accessed and downloaded: the UTK Face dataset from Kaggle at https://www.kaggle.com/datasets/abhikjha/utk-face-cropped, and the CASIA Africa Face dataset at http://biometrics.idealtest.org/dbDetailForUser.do?id=6#/datasetDetail/24.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. C. Alippi, S. Disabato, M. R.-2018 17th ACM/IEEE, and undefined 2018, “Moving Convolutional Neural Networks to Embedded Systems: The AlexNet and VGG-16 Case,” ieeexplore.ieee.org, Accessed: May 07, 2023. [Online]. Available: https://ieeexplore.ieee.org/Abstract/Document/8480072/.
  2. S. Mascarenhas, M. A.-2021 I. C. On, and undefined 2021, “A Comparison Between VGG16, VGG19 and ResNet50 Architecture Frameworks for Image Classification,” ieeexplore.ieee.org, Accessed: May 07, 2023. [Online]. Available: https://ieeexplore.ieee.org/Abstract/Document/9687944/.
  3. D. Theckedath, R. S.-S. C. Science, and undefined 2020, “Detecting Affect States Using VGG16, ResNet50 and SE-ResNet50 Networks,” Springer, Accessed: May 07, 2023. [Online]. Available: https://link.springer.com/Article/10.1007/S42979-020-0114-9.
  4. A. Singh, N. Rai, P. Sharma, P. Nagrath, and R. Jain, “Age, Gender Prediction and Emotion Recognition Using Convolutional Neural Network.” [Online]. Available: https://ssrn.com/Abstract=3833759.
  5. Z. A. S Karen, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” Inf Softw Technol, vol. 51, pp. 769–784, 2015.
  6. D. Sinha and M. El-Sharkawy, “Thin MobileNet: An Enhanced MobileNet Architecture,” 2019 IEEE 10th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference, UEMCON 2019, pp. 0280–0285, Oct. 2019.
  7. J. Mahadeokar and G. Pesavento, “Open Sourcing a Deep Learning Solution for Detecting NSFW Images,” Yahoo Engineering, vol. 24, 2016.
  8. M. A. H. Akhand, Md. Ijaj Sayim, S. Roy, and N. Siddique, “Human Age Prediction from Facial Image Using Transfer Learning in Deep Convolutional Neural Networks,” pp. 217–229, 2020.
  9. I. Dagher and D. Barbara, “Facial Age Estimation Using Pre-Trained CNN and Transfer Learning,” Multimed Tools Appl, vol. 80, no. 13, pp. 20369–20380, May 2021.
  10. F. S. Abousaleh, T. Lim, W. H. Cheng, N. H. Yu, M. A. Hossain, and M. F. Alhamid, “A Novel Comparative Deep Learning Framework for Facial Age Estimation,” EURASIP J Image Video Process, vol. 2016, no. 1, 2016.
  11. K. Y. Chang, C. S. Chen, and Y. P. Hung, “A Ranking Approach for Human Age Estimation Based on Face Images,” in Proceedings - International Conference on Pattern Recognition, 2010.
  12. G. George, S. Adeshina, and M. M. Boukar, “Development of Android Application for Facial Age Group Classification Using TensorFlow Lite,” Int J Intell Syst Appl Eng, vol. 11, no. 4, pp. 11–17, Sep. 2023.
  13. G. George, S. Adeshina, and M. M. Boukar, “Age Estimation from Facial Images Using Custom Convolutional Neural Network (CNN),” ICFAR, vol. 1, pp. 134–137, Feb. 2023.
  14. K. Mohammed and G. George, “Identification and Mitigation of Bias Using Explainable Artificial Intelligence (XAI) for Brain Stroke Prediction,” OJPS, vol. 4, no. 1, pp. 19–33, Apr. 2023.
  15. G. George and C. Uppin, “A Proactive Approach to Network Forensics Intrusion (Denial of Service Flood Attack) Using Dynamic Features, Selection and Convolution Neural Network,” OJPS, vol. 2, no. 2, pp. 01–09, Aug. 2021.
  16. F. Peter, G. George, K. Mohammed, and U. B. Abubakar, “Evaluation of Classification Algorithms on Locky Ransomware Using Weka Tool,” OJPS, vol. 3, no. 2, pp. 23–34, Sep. 2022.
  17. C. Uppin and G. George, “Analysis of Android Malware Using Data Replication Features Extracted by Machine Learning Tools,” International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, vol. 5, no. 5, pp. 193–201, Sep.-Oct. 2019.
  18. Journal URL: https://ijsrcseit.com/Cseit195532.
  19. Olufemiajasa, “Alleged Underage Voters: How S’west Lost Voting Strength to N’west,” Vanguard News, https://www.vanguardngr.com/2023/02/Alleged-Underage-Voters-How-Swest-Lost-Voting-Strength-To-Nwest/ (accessed Nov. 22, 2023).
  20. J. Fang, Y. Yuan, X. Lu, and Y. Feng, “Muti-stage learning for gender and age prediction,” Neurocomputing, vol. 334, pp. 114–124, Mar. 2019.
  21. M. E. Irhebhude, A. O. Kolawole, and F. Abdullahi, “Northern Nigeria Human Age Estimation From Facial Images Using Rotation Invariant Local Binary Pattern Features with Principal Component Analysis,” Egyptian Computer Science Journal, 2021. Accessed: Aug. 06, 2023. [Online]. Available: http://ecsjournal.org/Archive/Volume45/Issue1/2.pdf.
  22. F. Jiang, Y. Zhang, and G. Yang, “Facial Age Estimation Method Based on Fusion Classification and Regression Model,” in MATEC Web of Conferences, EDP Sciences, Nov. 2018.
  23. M. Ahmed and S. Viriri, “Deep learning using bayesian optimization for facial age estimation,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2019.
  24. M. Ahmed and S. Viriri, “Facial Age Estimation using Transfer Learning and Bayesian Optimization based on Gender Information,” Signal Image Process, vol. 11, no. 6, 2020.
  25. I. Dagher and D. Barbara, “Facial age estimation using pre-trained CNN and transfer learning,” Multimed Tools Appl, vol. 80, no. 13, pp. 20369–20380, May 2021.
  26. K. Ito, H. Kawai, T. Okano, and T. Aoki, “Age and gender prediction from face images using convolutional neural network,” 2018 Asia-Pacific Signal and Information Processing Association, 2018. Accessed: Jul. 22, 2023. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/8659655/.
  27. J. Muhammad, Y. Wang, C. Wang, K. Zhang, and Z. Sun, “CASIA-Face-Africa: A Large-Scale African Face Image Database,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 3634–3646, 2021.
Figure 6. The initial transfer learning model without fine-tuning.
Table 1. Regression results obtained from the experiments on the UTK and CASIA facial datasets.
| Method | MAE | R² | Dataset |
| --- | --- | --- | --- |
| ResNet50 | 4.60 | 0.38 | UTK |
| MobileNet | 2.01 | 0.80 | UTK |
| VGG16 | 1.75 | 0.85 | UTK |
| VGG19 | 2.22 | 0.80 | UTK |
| VGG19 | 1.34 | 0.19 | CASIA |
| ResNet50 | 3.19 | 0.28 | CASIA |
| MobileNet | 1.10 | 0.88 | CASIA |
| VGG16 | 1.21 | 0.85 | CASIA |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.