The Greater Caribbean manatee is classified as vulnerable, yet the lack of population-status data, and the resulting gaps in ecological knowledge, for the Costa Rican Caribbean severely hinders conservation policy. This study addresses this challenge by refining an automated manatee counting pipeline to improve classification robustness and efficiency for accurate spatial and temporal density estimation. The bioacoustic analysis comprises a deep learning manatee call detector and an unsupervised stage for counting individual manatees. Methodologically, we implemented an offline feature extraction strategy to avoid the substantial initial computational bottleneck, measured at almost 13 h, of converting 43,031 audio samples into labeled images. To mitigate the high risk of overfitting associated with class imbalance, which is common in bioacoustic databases, a bootstrapping method was applied after data splitting, generating a labeled dataset of 100,000 spectrograms. Transfer learning with the VGG-16 architecture yielded the best results, achieving a robust mean 10-fold cross-validation accuracy of 98.94% (±0.10%) and normalized F1-scores of 0.99. Furthermore, the optimized fine-tuning completed in just 22 min 36 s. Subsequently, the unsupervised counting stage applied k-means clustering, combined with dimensionality reduction, to the top three music information retrieval descriptors, segregating the detected calls into three acoustically distinct clusters, likely representing three individuals. This result was supported by a silhouette coefficient of 79.03%. Together, these findings establish the refined automated manatee counting method as a robust and scalable framework, ready for deployment on Costa Rican passive acoustic monitoring data to generate crucial scientific evidence for species conservation.
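
As an illustrative sketch only, not the study's published code, the transfer-learning step of the call detector could be set up as below. The TensorFlow/Keras framework, the 224x224 input size, the frozen convolutional base, and the binary call/noise classification head are all assumptions, since the abstract states only that VGG-16 was fine-tuned on labeled spectrograms.

```python
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# Load VGG-16 pretrained on ImageNet without its classification head and
# freeze the convolutional base so that only the new head is fine-tuned.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

# Hypothetical binary head: manatee call vs. background spectrogram.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_spectrograms, train_labels, validation_data=..., epochs=...)
```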
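
Similarly, a minimal sketch of the unsupervised counting stage is given below, assuming scikit-learn, PCA as the dimensionality-reduction step, and randomly generated placeholder descriptor values; the abstract specifies only k-means on the top three music information retrieval descriptors, dimensionality reduction, and a silhouette coefficient.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for the per-call feature matrix: one row per detected
# call, one column per music information retrieval (MIR) descriptor.
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 3))

# Standardize the descriptors, then reduce dimensionality before clustering.
scaled = StandardScaler().fit_transform(features)
reduced = PCA(n_components=2).fit_transform(scaled)

# k-means with k=3, matching the three acoustically distinct clusters reported.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(reduced)

# Silhouette coefficient lies in [-1, 1]; the paper reports it as a percentage.
sil = silhouette_score(reduced, labels)
print(f"Silhouette coefficient: {sil * 100:.2f}%")
```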