With applications in psychology, security, and human-computer interaction, facial expression recognition (FER) has become an essential tool for analysing non-verbal communication. Current research often categorizes expressions into micro and macro types, yet existing datasets suffer from inconsistent class labelling, limited subject diversity, and insufficient scale. To address these gaps, this work proposes a novel framework that combines a diffusion model with pre-trained CNNs. Leveraging original images from the established CASME II dataset, we generate synthetic facial expressions to augment the training data, mitigating bias and label inconsistency. The synthetic dataset is evaluated using the ResNet-50, VGG-16, and Inception V3 architectures. Inception V3 trained on the proposed AI-generated dataset and tested on CASME II achieved the highest accuracy of 99.48%. VGG-16 with data augmentation, trained on CASME II and tested on the proposed AI-generated dataset, achieved 99.54%. With 30% of layers frozen, Inception V3 trained on the proposed AI-generated dataset and tested on CASME II obtained an accuracy of 99.53%. The data augmentation and layer-freezing approaches significantly improved model performance. Our proposed approaches achieve state-of-the-art performance, outperforming most existing approaches benchmarked in this study.