For facial age prediction, a dataset comprising facial images with corresponding age labels is required. For this, we selected the UTK and CASIA African facial datasets, which are openly available from the following links.
The datasets were preprocessed by applying standard techniques such as face detection, alignment, and normalization. Data augmentation techniques such as rotation, scaling, and flipping were also employed to enhance the diversity of the training set.
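The augmentation step above can be sketched as follows. This is a minimal illustration using plain NumPy (the paper does not specify which augmentation library was used); the flip and 90-degree rotations stand in for the rotation/scaling/flipping pipeline described.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped/rotated copy of an (H, W, C) image array."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)   # horizontal flip
    k = int(rng.integers(0, 4))
    out = np.rot90(out, k)     # rotate by 0/90/180/270 degrees (illustrative)
    return out

rng = np.random.default_rng(0)
img = np.zeros((120, 120, 3), dtype=np.uint8)  # dummy 120x120 face image
aug = augment(img, rng)
```

In practice, small-angle rotations and mild rescaling (rather than quarter turns) are typical for face images, since faces are roughly upright.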
2.1.1. UTK facial dataset
One of the most widely used datasets for age determination from facial photos is the UTK image benchmark. It is one of the largest-scale facial datasets, with a wide age range (0 to 116 years old) and over 20,000 facial photos annotated with age, gender, and ethnicity. The images cover a wide range of poses, facial expressions, lighting conditions, occlusions, and resolutions. Several tasks, including face detection, age estimation, age progression and regression, and landmark localization, can be performed using this dataset. We used pictures from ages 5-30, with 450 images per age, for a total of 11,700 images.
Figure 1.
Example facial images from the UTK dataset.
Figure 2.
Distribution of ages 5-30 in the UTK facial dataset used for the experiments.
2.1.2. CASIA African facial dataset
The images in the database were taken in Kaduna [27], Nigeria, an African country. Approximately 1,150 individuals took part in the capture process. The dataset covers different Nigerian ethnic groups; for this experiment, we used the Hausa ethnic group. The database comprises a total of 38,546 images from 1,183 subjects. We used a total of 10,921 facial images distributed across ages 10-30; Table 2 shows the age distribution of the dataset.
Figure 3.
Example facial images from the CASIA African dataset.
Figure 4.
Distribution of ages 10-30 in the CASIA African facial dataset used for the experiments.
The input image was rescaled to 120 × 120 pixels; we used the Adam optimizer and a dropout rate of 0.2 on the fully connected layers to regularize the network during training. We downloaded the various pre-trained deep learning models from the TensorFlow Keras library, froze their convolutional layers, and added our own fully connected layers for prediction: three dense layers with 50, 20, and 1 output units, respectively. The activation function of the last dense layer was linear for these experiments. We also used early stopping to halt training once accuracy stopped improving. The RGB image is passed to the (pre-trained) convolutional stage for feature extraction, then to the fully connected layers, and finally to the output layer, where the prediction is computed. We used a total of 11,700 images, 450 per age label (ages 5-30), for the UTK dataset, and a total of 10,921 images from the CASIA dataset [27]. We then used the Python function train_test_split() to split the images X and their labels into training and test sets, using a 70:30 ratio for the facial datasets.
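The 70:30 split described above can be sketched with scikit-learn's `train_test_split()`. Small dummy arrays stand in for the real 11,700 UTK images and their age labels (26 ages × 450 images each).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the real image tensors and age labels:
# 26 age labels (5-30) with 450 images per age = 11,700 samples.
X = np.zeros((11700, 8, 8, 3), dtype=np.uint8)   # tiny placeholder images
y = np.repeat(np.arange(5, 31), 450)             # age label per image

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

print(len(X_train), len(X_test))  # 8190 3510
```

With 11,700 samples, a 70:30 split yields 8,190 training and 3,510 test images.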
The transfer learning workflow involves the following steps.
a. Preparing the dataset. Splitting the dataset into training, validation, and testing sets.
b. Pre-training. Utilizing a pre-trained model (e.g., VGG16), trained on a large-scale image classification task such as ImageNet, to extract features from facial images.
c. Fine-tuning. Modifying the last few layers of the pre-trained model or adding new fully connected layers to adapt the model for age prediction. These layers are initialized randomly and trained on the specific age prediction task, as seen in Fig. 6.
d. Training and Evaluation. Training the modified model using the training set and evaluating its performance on the validation set. Iterative optimization techniques such as gradient descent and backpropagation are applied.
e. Testing. Assessing the final model's accuracy by evaluating it on the testing set.
Metrics such as mean absolute error (MAE) and mean squared error (MSE) can be used to quantify the prediction accuracy. In this work, we utilize the MAE as our evaluation metric.
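As a concrete illustration of the two metrics, both can be computed directly from the true and predicted ages (a minimal NumPy sketch; in practice library implementations such as scikit-learn's may be used instead):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average of |true age - predicted age|."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def mse(y_true, y_pred):
    """Mean squared error: average of squared age errors."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

ages_true = np.array([21, 23, 30, 15])
ages_pred = np.array([20, 25, 28, 15])
print(mae(ages_true, ages_pred))  # 1.25
print(mse(ages_true, ages_pred))  # 2.25
```

MAE is read directly in years (here, the model is off by 1.25 years on average), which is why it is a natural choice for age prediction.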
Figure 5.
The transfer learning concept.
In a regression model, R-squared (R², also known as the coefficient of determination) is a statistical metric that quantifies how much of the variance in the dependent variable can be accounted for by the independent variable. R-squared, or the "goodness of fit," measures how well the data match the regression model. A greater R-squared shows that the model can explain more variability. R-squared typically ranges from 0 to 1 and is computed as

R² = 1 - SSr / SSt,
where SSt represents the total sum of squares and SSr represents the residual sum of squares. We use both MAE and R² to determine how well the models performed. The proposed age prediction system using deep CNNs and transfer learning is shown in Fig. 7. One of the deep learning models is attached as the base model, with fully connected layers and an output. In Fig. 7, a deep CNN model is initially imported with its pre-trained weights. Since the architectures were developed for the ImageNet object classification problem, each deep CNN has 1000 units in its last layer. Hence, we removed its fully connected layers and added our custom fully connected layers with one output for age prediction, using either a ReLU or a linear activation function.
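The R² definition above can be computed directly from SSr and SSt (a minimal NumPy sketch of the formula; the example ages are illustrative):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SSr / SSt."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # SSr: residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # SSt: total sum of squares
    return float(1.0 - ss_res / ss_tot)

ages_true = [10, 20, 30, 40]
ages_pred = [12, 18, 31, 39]
print(round(r_squared(ages_true, ages_pred), 3))  # 0.98
```

Here SSt = 500 and SSr = 10, so R² = 1 - 10/500 = 0.98, i.e., the model explains 98% of the variance in age.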
We implemented the proposed approach in a Jupyter Notebook. An Intel 12th-generation Core i7 with 10 cores was used to train the networks. For regression, the learning rate was set to 0.001 and the momentum to 0.9. We used mean absolute error (MAE) as the performance metric for regression. Human age exhibits some coherence: facial shape can vary slightly with age, but not enough for the human eye to notice. For instance, the age patterns of people who are 21 and 23 years old may be similar, and in some cases it may be difficult for a human to tell them apart. For regression, we therefore minimized MAE.
The input image was rescaled to 224 × 224 pixels and we used the Adam optimizer, with a dropout rate of 0.2 on the convolutional stage and 0.2 on the fully connected layers to regularize the network during training. We also used early stopping to halt training once there was no further improvement in accuracy. The RGB image is passed to the convolutional stage for feature extraction, then to the fully connected layers, and finally to the output layer, where the prediction is computed. We used a total of 11,700 images, 450 per age label (ages 5-30), for the UTK-Face dataset; for the CASIA dataset, the data were unbalanced because some ages had few images. We then used the Python function train_test_split() to split the images X and their labels into training and test sets using a 70:30 ratio; the last layer had one output for the age value with a linear activation function.
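The frozen-backbone setup described above can be sketched in Keras as follows. This is a minimal sketch, not the exact training script: VGG16 stands in for "one of the deep learning models", and `weights=None` is used here only to avoid downloading the ImageNet weights (in practice `weights="imagenet"` would be used, as the text describes).

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Pre-trained backbone without its 1000-unit ImageNet classification head.
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional layers

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dropout(0.2),                    # regularization, as in the text
    layers.Dense(50, activation="relu"),
    layers.Dense(20, activation="relu"),
    layers.Dense(1, activation="linear"),   # single output: predicted age
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="mae", metrics=["mae"])

# Stop training once the validation loss stops improving.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)
# model.fit(X_train, y_train, validation_split=0.1, callbacks=[early_stop])
```

The single linear output unit makes this a regression head, so the network predicts a continuous age value rather than class probabilities.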
In Figure 6, a deep CNN model is initially imported with its pre-trained weights. Since the architectures were developed for the ImageNet object classification problem, each deep CNN has 1000 units in its last layer.
The next step is to freeze its convolutional layers and add three more fully connected layers. The first FC layer has 1,254,450 units, the second layer has 1,024 units, and the last fully connected layer has 1 unit, since we are dealing with a regression problem with a single output. After training the final three layers in (b), the entire design is fine-tuned.
Figure 7.
The transfer learning model with fine-tuning.
Our work has examined the use of a transfer learning approach for age prediction from facial images. It shows that, with the aid of transfer learning, we can build ready-to-use models that can be adopted in the real world, specifically in Nigeria, for processes such as underage detection during elections and other access-control functions.