The adoption of Machine Learning (ML) for building emulators for complex physical processes has seen an exponential rise in the recent years. While neural networks are good function approximators, optimizing the hyper-parameters of the network to reach a global minimum is not trivial, and often needs human knowl- edge and expertise. In this light, automatic ML or autoML methods have gained large interest as they automate the process of network hyper-parameter tuning. In addition, Neural Architecture Search (NAS) has shown promising outcomes for improving model performance. While autoML methods have grown in popularity for image, text and other applications, their effectiveness for high-dimensional, complex scientific datasets remains to be investigated. In this work, a data driven emulator for turbulence closure terms in the context of Large Eddy Simulation (LES) models is trained using Artificial Neural Networks and an autoML frame- work based on Bayesian Optimization, incorporating priors to jointly optimize the hyper-parameters as well as conduct a full neural network architecture search to converge to a global minima, is proposed. Additionally, we compare the effect of using different network weight initializations and optimizers such as ADAM, SGDM and RMSProp, to explore the best performing setting. Weight and function space similarities during the optimization trajectory are investigated, and critical differences in the learning process evolution are noted and compared to theory. We observe ADAM optimizer and Glorot initialization consistently performs better, while RMSProp outperforms SGDM as the latter appears to have been stuck at a local minima. Therefore, this autoML BayesOpt framework provides a means to choose the best hyper-parameter settings for a given dataset.