This paper presents a convolutional neural network (CNN) for multi-class image classification on the Fashion-MNIST dataset. The model achieves a test accuracy of 92.44% and a test loss of 0.2533, the highest accuracy among comparable studies using similar architectures. The architecture comprises three convolution-pooling blocks, a dense layer with dropout regularization (rate 0.3), and a softmax output layer. Training and validation curves show mild overfitting in the later epochs: the validation loss begins to rise while the training loss continues to decrease. A detailed analysis using the confusion matrix and classification report identifies specific patterns of misclassification between visually similar categories. The paper also discusses implications for batch normalization, data augmentation, and Vision Transformer architectures.
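
As a rough illustration of the architecture summarized above, the following sketch traces feature-map shapes and parameter counts through three convolution-pooling blocks, a dense layer with dropout, and a softmax output. Only the number of blocks, the dropout rate of 0.3, and the softmax output come from the text. The filter counts (32, 64, 128), 3x3 kernels with "same" padding, 2x2 max pooling, and the 128-unit dense layer are assumptions chosen for illustration and may differ from the model actually used.

```python
# Shape and parameter trace for a CNN of the kind described above.
# Assumed (not from the paper): filter counts, kernel size, padding,
# pooling size, and dense-layer width.
INPUT_SHAPE = (28, 28, 1)   # Fashion-MNIST images are 28x28 grayscale
NUM_CLASSES = 10            # Fashion-MNIST has 10 categories
FILTERS = (32, 64, 128)     # assumed filters per convolution block
KERNEL = 3                  # assumed 3x3 kernels with "same" padding
POOL = 2                    # assumed 2x2 max pooling with stride 2
DENSE_UNITS = 128           # assumed width of the dense layer
DROPOUT_RATE = 0.3          # stated in the text


def trace_shapes(input_shape=INPUT_SHAPE):
    """Return a list of (layer name, output shape, parameter count)."""
    h, w, c = input_shape
    layers = []
    for i, f in enumerate(FILTERS, start=1):
        # "same" padding keeps height and width; weights plus one bias per filter.
        params = (KERNEL * KERNEL * c + 1) * f
        layers.append((f"conv{i}", (h, w, f), params))
        # Pooling without padding floors odd sizes (7 -> 3).
        h, w, c = h // POOL, w // POOL, f
        layers.append((f"pool{i}", (h, w, c), 0))
    flat = h * w * c
    layers.append(("flatten", (flat,), 0))
    layers.append(("dense", (DENSE_UNITS,), (flat + 1) * DENSE_UNITS))
    # Dropout has no trainable parameters and leaves the shape unchanged.
    layers.append((f"dropout({DROPOUT_RATE})", (DENSE_UNITS,), 0))
    layers.append(("softmax", (NUM_CLASSES,), (DENSE_UNITS + 1) * NUM_CLASSES))
    return layers


if __name__ == "__main__":
    for name, shape, params in trace_shapes():
        print(f"{name:<14}{str(shape):<16}{params:>8}")
```

Under these assumed settings most of the trainable parameters sit in the dense layer, which is consistent with placing dropout there to limit overfitting.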