Biometric identification has become a key element in modern security and surveillance applications; however, traditional systems based on a single biometric trait often suffer from noise, distortion, and vulnerability to manipulation, which limits their reliability in real-world environments. To overcome these challenges, this study proposes a multimodal biometric identification model that integrates facial features and hand gesture information extracted from video data for remote identity verification. The proposed system captures real-time video of an individual approaching a sensor, selects relevant frames, extracts features from both the facial and hand modalities, and fuses the resulting feature representations during the evaluation stage. Identity classification is performed by a time-delay neural network (TDNN), and the model is evaluated on diverse multimedia datasets containing static facial images and dynamic hand gestures, including American Sign Language samples. Experimental results demonstrate that the multimodal approach significantly outperforms single-modal systems, achieving an accuracy of 0.98, recall of 0.98, and an F1 score of 0.97, compared with lower performance when facial or hand features are used independently. These findings indicate that combining multiple biometric traits enhances robustness, reduces ambiguity, and improves recognition accuracy, making the proposed approach suitable for practical biometric verification under varying environmental conditions.
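The fusion-then-TDNN pipeline described above can be sketched in miniature. The snippet below is an illustrative toy, not the paper's implementation: all dimensions, layer sizes, and weights are hypothetical, feature-level fusion is assumed to be simple concatenation of per-frame face and hand embeddings, and a TDNN layer is modeled as a dilated 1-D convolution over the frame sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

def tdnn_layer(x, w, b, dilation=1):
    """One time-delay layer (dilated 1-D convolution over time).
    x: (T, D_in) per-frame features; w: (K, D_in, D_out); b: (D_out,)."""
    T = x.shape[0]
    K = w.shape[0]
    span = (K - 1) * dilation
    out = []
    for t in range(T - span):
        # Gather K context frames spaced `dilation` apart.
        ctx = x[t : t + span + 1 : dilation]            # (K, D_in)
        out.append(np.einsum("kd,kdo->o", ctx, w) + b)  # contract context window
    return np.maximum(np.stack(out), 0.0)               # ReLU, shape (T', D_out)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical per-frame embeddings: 20 selected frames,
# a 64-dim face vector and a 32-dim hand-gesture vector per frame.
face = rng.normal(size=(20, 64))
hand = rng.normal(size=(20, 32))
fused = np.concatenate([face, hand], axis=1)            # feature-level fusion, (20, 96)

# Two TDNN layers with growing temporal context, then mean-pool and classify.
h = tdnn_layer(fused, rng.normal(scale=0.1, size=(3, 96, 48)), np.zeros(48), dilation=1)
h = tdnn_layer(h,     rng.normal(scale=0.1, size=(3, 48, 48)), np.zeros(48), dilation=2)
pooled = h.mean(axis=0)                                 # pool over remaining frames
logits = pooled @ rng.normal(scale=0.1, size=(48, 10))  # 10 enrolled identities (assumed)
probs = softmax(logits)
pred = int(np.argmax(probs))                            # predicted identity index
```

Concatenating modalities before the temporal network lets the TDNN learn cross-modal dependencies across its context window, which is one common way such early fusion is realized; score-level fusion of two separate classifiers would be an alternative design.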