Word Sense Disambiguation (WSD) remains a persistent challenge in Natural Language Processing (NLP). Although various NLP packages exist, the Lesk algorithm implemented in the NLTK library, a widely used tool, achieves only modest accuracy. Deep neural networks, by contrast, offer higher classification accuracy, but their practical utility is constrained by demanding memory requirements. This paper introduces a method that addresses WSD by reducing memory usage without sacrificing state-of-the-art accuracy. The proposed approach enables a WSD system that integrates seamlessly into NLP pipelines, much like the functionality offered by the NLTK library. Furthermore, the paper advocates treating the BERT language model as a gold standard, proposing modifications to manually annotated datasets and semantic dictionaries such as WordNet to improve WSD accuracy. A series of experiments validates the effectiveness of the proposed method, which achieves state-of-the-art performance on multiple WSD datasets. This contribution advances the mitigation of the challenges associated with WSD and offers a practical solution for integration into NLP applications.