Preprint
Article

This version is not peer-reviewed.

Prioritizing Data Quality Governance for AI in Prostate Cancer: A Methodological Proof-of-Concept Using Neural Networks for Risk Stratification

Submitted: 15 March 2026
Posted: 17 March 2026


Abstract
Background: Accurate D’Amico risk stratification is essential for prostate cancer (PCa) management. The purpose of this proof-of-concept study was to establish a methodological framework for integrating validated clinical nomograms with strict data-quality governance in order to generate reliable artificial neural networks (ANNs), even with small sample sizes. Methods: We retrospectively analyzed a curated single-center cohort of 49 patients. A multilayer perceptron (MLP) was trained on 11 variables, including the ISUP biopsy grade and the Briganti nomogram. Model development was guided by a proactive data-quality protocol based on FAIR principles, with stringent checks for accuracy, consistency, and validity to ensure the data were “AI-ready”. A sensitivity analysis was conducted across three data-partitioning scenarios (20/80, 34/66, and 39/61). Results: From a starting pool of 76 patients, the FAIR-based data-governance architecture yielded a highly selected cohort of 49 patients. The MLP trained on this “AI-ready” dataset achieved mathematically perfect but clinically uninterpretable discrimination (AUC 1.000) between the High- and Intermediate-risk groups on a small internal test set (N=9 for the 20/80 split). This perfect accuracy is a best-case scenario reflecting the high data quality rather than proof of generalizable clinical utility, as highlighted by the wide confidence interval (66.4-100%) and by the need to exclude cases with unusual attributes during model validation (as described in the Methods). Conclusions: The main contribution of this proof-of-concept study is the demonstration of a strict, reproducible data-governance approach for producing “AI-ready” urological datasets. Although the MLP showed a robust internal signal for risk discrimination, its flawless accuracy represents an ideal, non-generalizable scenario. The key deliverable requiring external validation is the framework, not the model’s performance metrics.
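The following Python sketch illustrates the evaluation pipeline summarized above: an MLP trained on 11 variables, a 20/80 train/test partition, ROC AUC on a 9-patient test set, and an exact (Clopper-Pearson) 95% confidence interval for accuracy, whose lower bound for 9/9 correct predictions is 0.025^(1/9) ≈ 66.4%, consistent with the interval quoted in the Results. It is not the authors’ code; the synthetic data, hidden-layer size, and other hyperparameters are illustrative assumptions.

# Minimal sketch of the described pipeline (assumed settings, synthetic data).
from scipy.stats import beta
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the curated 49-patient, 11-variable "AI-ready" dataset.
X, y = make_classification(n_samples=49, n_features=11, n_informative=6,
                           random_state=0)

# 20/80 partition scenario: 9 patients held out for internal testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=9, stratify=y, random_state=0)

# Multilayer perceptron; the hidden-layer size is an assumption, not the study setting.
model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                                    random_state=0))
model.fit(X_train, y_train)

# Discrimination on the small internal test set.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test AUC (N={len(y_test)}): {auc:.3f}")

# Exact (Clopper-Pearson) 95% CI for accuracy: with 9/9 correct the lower bound
# is 0.025**(1/9) ≈ 0.664, i.e. the 66.4-100% interval quoted in the abstract.
correct, n = int((model.predict(X_test) == y_test).sum()), len(y_test)
lo = beta.ppf(0.025, correct, n - correct + 1) if correct > 0 else 0.0
hi = beta.ppf(0.975, correct + 1, n - correct) if correct < n else 1.0
print(f"Accuracy {correct}/{n}, exact 95% CI: [{lo:.3f}, {hi:.3f}]")

The exact interval is used deliberately: with only nine test cases, a normal-approximation interval would be misleading, and the wide exact bound makes explicit why a perfect point estimate on such a small set cannot be read as generalizable performance.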
Keywords: 
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.