The mandibular cortical index (MCI) is a valuable screening tool for osteoporosis on dental panoramic radiographs; however, inter-examiner variability remains a significant challenge. This study aimed to evaluate the diagnostic performance and reproducibility of a closed-type generative AI (NotebookLM, Google) compared with those of eight dentists of varying experience levels. One hundred radiographs were evaluated in two sessions separated by an interval of at least two weeks. The intra-examiner reliability of the AI was exceptionally high (κ = 0.987), and its processing speed was approximately six times faster than that of the dentists. However, agreement between the AI and the dentists remained at "slight agreement" or lower (κ < 0.2), statistically rejecting the null hypothesis of diagnostic equivalence. Notably, a "two-level discrepancy" was observed, in which the AI interchanged Class 1 (normal) and Class 3 (severe) in over 10% of cases. In contrast, the dentists demonstrated a significant learning effect, with inter-examiner agreement improving between sessions. These results suggest that although generative AI offers superior speed and reproducibility, its current decision-making logic deviates fundamentally from human expert criteria. Future integration should focus on hybrid models in which AI serves as a standardized feedback tool while dentists provide the final confirmatory diagnosis.
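The agreement values above (κ = 0.987 for intra-examiner reliability; κ < 0.2 for AI–dentist agreement) are Cohen's kappa coefficients. As a minimal illustrative sketch (not the study's actual analysis code or data), unweighted kappa between two raters' MCI class labels can be computed from the observed and chance-expected agreement:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa between two raters' categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is agreement expected by chance from the marginal frequencies.
    """
    assert len(ratings_a) == len(ratings_b), "raters must score the same cases"
    n = len(ratings_a)
    # Observed agreement: fraction of cases where the two raters match.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    labels = set(ratings_a) | set(ratings_b)
    p_e = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical MCI classifications (Class 1-3) by two raters for six radiographs.
rater_1 = [1, 1, 2, 2, 3, 3]
rater_2 = [1, 1, 2, 3, 3, 3]
kappa = cohens_kappa(rater_1, rater_2)
```

In practice a weighted kappa is often preferred for ordinal scales like the MCI, since it penalizes the two-level Class 1/Class 3 confusions noted above more heavily than adjacent-class disagreements; library implementations such as scikit-learn's `cohen_kappa_score` support this via a `weights` argument.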