Coconut is an important multipurpose crop in which different stages of maturity are used for different products. The traditional classification approach relies on the sound produced by tapping the fruit, with the tapper judging maturity by ear. Because this procedure is highly subjective, other researchers have attempted to automate it and classify coconut maturity objectively. Nowadays, deep learning techniques are being applied to such classification problems. One such method is the Convolutional Neural Network (CNN), which takes raw pixel data, learns to extract features, and ultimately classifies the input. In this study, a portable device was developed that captures the audio signal generated by tapping the fruit mechanically. Each audio clip was converted to a spectrogram and used as input to a LeNet CNN architecture. We argue that a CNN classifier achieves higher accuracy than the non-deep-learning system. Our evaluation confirmed that using a CNN improved coconut classification accuracy by about 15%.
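The audio-to-spectrogram preprocessing step described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the sampling rate, window size, and the synthetic decaying-sinusoid "tap" signal are all assumptions made for demonstration, and the resulting 2-D array stands in for the spectrogram image that would be fed to the LeNet CNN.

```python
import numpy as np
from scipy import signal

def tap_to_spectrogram(audio, sr=22050, nperseg=256):
    """Convert a 1-D tap recording into a log-power spectrogram.

    Hypothetical preprocessing step; sr and nperseg are illustrative
    choices, not values taken from the study.
    """
    _, _, Sxx = signal.spectrogram(audio, fs=sr, nperseg=nperseg)
    # Log compression stabilizes the wide dynamic range of the tap sound.
    return np.log1p(Sxx)

# Synthetic stand-in for a recorded tap: a decaying 440 Hz tone plus noise.
sr = 22050
dur = 0.5
t = np.linspace(0, dur, int(sr * dur), endpoint=False)
tap = np.exp(-8 * t) * np.sin(2 * np.pi * 440 * t)
tap += 0.01 * np.random.randn(t.size)

spec = tap_to_spectrogram(tap, sr=sr)
print(spec.shape)  # 2-D (frequency x time) array, usable as CNN input
```

In a full system, each such spectrogram would be resized to a fixed shape and passed to the CNN as a single-channel image.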