Generative artificial intelligence tools are flourishing across the sciences, particularly in education. ChatGPT, in its versions 3.5 and 4.0, stands out for its multilingual capabilities, serving both students and educators. This research aimed to evaluate the performance differences between these versions and, importantly, to compare the accuracy of ChatGPT's responses in English versus Arabic. We presented ChatGPT with 39 chemistry problems drawn from the 6th-7th grade curriculum, comprising 25 open questions and 14 multiple-choice questions. Each response was categorized as accurate, partially accurate, or inaccurate. Our analysis, focused on trends across versions and languages, revealed significant improvements in version 4.0, particularly in its handling of Arabic. Despite these improvements, however, responses in English consistently outperformed those in Arabic in accuracy. In light of this discrepancy, we suggest that ChatGPT's developers either incorporate additional Arabic data into its training or implement a pipeline that follows this strategy to ensure more accurate responses in science education: a question is first translated from Arabic to English, answered in English, and the answer is then translated back into Arabic before being presented to the user. This approach leverages the higher accuracy of English responses for the benefit of Arabic-speaking users, and it may also improve outcomes for speakers of other languages who receive less precise answers in chemistry or science education.
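As a rough illustration of the proposed translate-answer-retranslate strategy (not an implementation used in the study), the sketch below chains three calls through the OpenAI Python client: translate the Arabic question into English, answer it in English, and translate the answer back into Arabic. The model name, prompt wording, and helper names (`ask`, `answer_in_arabic`) are illustrative assumptions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

MODEL = "gpt-4"  # assumed model choice; any ChatGPT model could be substituted


def ask(prompt: str) -> str:
    """Send a single prompt to the model and return its text reply."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def answer_in_arabic(arabic_question: str) -> str:
    """Translate Arabic -> English, answer in English, then translate back to Arabic."""
    english_question = ask(
        "Translate the following chemistry question from Arabic to English, "
        "returning only the translation:\n" + arabic_question
    )
    english_answer = ask(english_question)
    arabic_answer = ask(
        "Translate the following answer from English to Arabic, "
        "returning only the translation:\n" + english_answer
    )
    return arabic_answer
```

The key design point is that the reasoning step runs entirely on the English text, where accuracy was observed to be higher, while the user only ever sees Arabic input and output.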