The amount of data that is necessary and sufficient to validate models that predict the risk of adverse events in patients, or that classify the presence or absence of pathological features, has been raised repeatedly in the literature. In this study, we propose a new approach for determining the necessary and sufficient number of studies for validating artificial intelligence-based medical software whose main task is to classify medical X-ray images as "normal" or "pathology". We show that the number of studies in a dataset at which AUC ROC reaches maximum heterogeneity depends on the balance of the "normal" and "pathology" classes. With 90% "normal" and 10% "pathology", maximum heterogeneity is reached at 190 studies; with 80%/20%, at 80 studies; with 70%/30%, at 120 studies; with 60%/40%, at 110 studies; and with 50%/50%, at 70 studies. These results are in good agreement with previous results and make it possible to determine a necessary and sufficient number of studies in a dataset for an unbiased assessment of AUC ROC.
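The dependence of AUC ROC variability on dataset size and class balance can be explored with a small simulation. The sketch below is not the procedure used in this study. It assumes Gaussian classifier scores with a fixed separation between the classes, and it uses the standard deviation of AUC ROC across repeated draws as a simple stand-in for heterogeneity. The function names `auc_roc` and `simulate_auc_spread`, the separation value, and the dataset sizes are hypothetical choices made for illustration.

```python
import random
import statistics


def auc_roc(labels, scores):
    """AUC ROC computed as the Mann-Whitney statistic (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def simulate_auc_spread(n_studies, pathology_share, n_resamples=100, seed=0):
    """Return (mean, standard deviation) of AUC ROC over simulated datasets.

    Assumption: "normal" scores follow N(0, 1) and "pathology" scores
    follow N(1.5, 1); real software output will be distributed differently.
    """
    rng = random.Random(seed)
    n_pos = max(1, round(n_studies * pathology_share))
    n_neg = n_studies - n_pos
    labels = [1] * n_pos + [0] * n_neg
    aucs = []
    for _ in range(n_resamples):
        scores = [rng.gauss(1.5, 1.0) for _ in range(n_pos)]
        scores += [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
        aucs.append(auc_roc(labels, scores))
    return statistics.mean(aucs), statistics.stdev(aucs)


if __name__ == "__main__":
    # Illustrative grid only; these sizes are not the values reported above.
    for share in (0.1, 0.3, 0.5):
        for n in (40, 80, 160):
            mean_auc, sd_auc = simulate_auc_spread(n, share)
            print(f"pathology={share:.0%} n={n:>3} "
                  f"mean AUC={mean_auc:.3f} SD={sd_auc:.3f}")
```

Under these assumptions the spread of AUC ROC shrinks as the dataset grows and is wider for strongly imbalanced classes at the same dataset size. Reproducing the thresholds reported here would require the study's own definition of heterogeneity and its data.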