A CRITICAL ANALYSIS OF SPEECH RECOGNITION OF TAMIL AND MALAY LANGUAGES THROUGH ARTIFICIAL NEURAL NETWORKS

Human-computer interaction has become part of our day-to-day life. Speech is one of the most essential and convenient ways of interacting with devices as well as with other people. Devices, particularly smartphones, have multiple sensors such as cameras and microphones. Speech recognition is the process of converting an acoustic signal captured by such a device into a set of words. An efficient speech recognition system greatly enhances the interaction between humans and machines by making machines more responsive to user needs. The recognized words can be used in many applications, such as command and control, data entry, and document preparation. This research paper highlights speech recognition through ANNs (Artificial Neural Networks). In addition, a hybrid model is proposed for audio-visual speech recognition of the Tamil and Malay languages using a SOM (Self-Organizing Map) and an MLP (Multilayer Perceptron). The effectiveness of the different NN (Neural Network) models used in speech recognition is examined.


Introduction
Speech recognition (SR) is the process of converting an acoustic signal, captured by a telephone or microphone, into a set of words. The recognized words can then be used for data entry, for document preparation, and as commands. Speech recognition systems can be categorized according to several parameters. Phonetic variability can be illustrated by the acoustic differences of the /t/ sound in words such as true, butter, it, and into.
A good SR model first tries to model the sources of variability in several ways. At the signal-representation level, researchers have developed representations that emphasize speaker-independent features of the signal, as well as methods to analyze speaker-dependent characteristics. At the acoustic-phonetic level, speaker variability is modeled with statistical methods trained on large amounts of data. At the word level, variability is handled by allowing alternate pronunciations of words through a pronunciation network.
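A pronunciation network can be pictured as a sequence of slots, each holding one or more alternative phone strings; expanding the network lists every pronunciation the recognizer will accept for a word. The following is a minimal sketch under stated assumptions: the word "tomato", the ARPAbet-style phone symbols, and the function name `expand_network` are illustrative choices, not taken from this paper.

```python
from itertools import product

# Hypothetical pronunciation network for "tomato": a list of slots, where each
# slot lists the alternative phone sequences allowed at that position.
TOMATO = [
    [("t",)],
    [("ah",), ("ow",)],   # reduced or full first vowel
    [("m",)],
    [("ey",), ("aa",)],   # regional variation in the stressed vowel
    [("t", "ow")],
]

def expand_network(network):
    """Enumerate every phone sequence (path) that the network allows."""
    pronunciations = []
    for choice in product(*network):
        phones = [p for segment in choice for p in segment]
        pronunciations.append(" ".join(phones))
    return pronunciations

for pron in expand_network(TOMATO):
    print(pron)
```

In an actual recognizer the paths would be scored against the acoustics during decoding rather than listed explicitly; the enumeration only makes the allowed variants visible.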
The HMM (Hidden Markov Model) has been the predominant model used in SR for the past 15 years. In an HMM, both the word sequence and its frame-by-frame surface acoustic realization are represented probabilistically as Markov processes.
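How an HMM scores an utterance can be illustrated with the forward algorithm, which sums over all state paths to compute the likelihood of an observation sequence. The sketch below assumes discrete (vector-quantized) observation symbols and a toy two-state left-to-right model whose probabilities are invented for illustration; practical recognizers usually use continuous output densities and log-domain arithmetic.

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """P(obs | model) for a discrete-output HMM, via the forward algorithm.

    pi  : (N,) initial state probabilities
    A   : (N, N) transitions, A[i, j] = P(state j at t+1 | state i at t)
    B   : (N, M) emissions,   B[i, k] = P(symbol k | state i)
    obs : sequence of symbol indices, one per frame
    """
    alpha = pi * B[:, obs[0]]              # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        # alpha_{t+1}(j) = sum_i alpha_t(i) * A[i, j] * b_j(o_{t+1})
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())

# Toy two-state left-to-right model with two output symbols (values invented).
pi = np.array([1.0, 0.0])
A = np.array([[0.6, 0.4],
              [0.0, 1.0]])
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])
print(forward_likelihood(pi, A, B, [0, 1]))
```

Replacing the sum over predecessor states with a maximum gives the Viterbi recursion, which recovers the single best state path instead of the total likelihood.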
Neural networks (NNs) provide an alternative way of estimating the frame-by-frame scores; these scores are then combined with an HMM, an approach also referred to as a hybrid model (Nilsson & Ejnarsson, 2002). One important application of SR is assisting people with partial disabilities in their daily activities: by voice, they could control electrical appliances such as fans, lights, and machines that they use at home. The architecture of the audiovisual speech recognition engine is shown in Figure 1.
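In the hybrid formulation commonly described in the NN/HMM literature, the network outputs per-frame state posteriors P(q | x_t), and dividing them by the state priors P(q) yields scaled likelihoods that stand in for the HMM's emission probabilities. The paper does not specify which hybrid variant it uses, so the function below (`scaled_log_likelihoods`, a hypothetical name) is only a minimal sketch of this common variant, with invented example numbers.

```python
import numpy as np

def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Convert NN posteriors P(q | x_t) into log scaled likelihoods.

    By Bayes' rule, p(x_t | q) / p(x_t) = P(q | x_t) / P(q). The factor p(x_t)
    is shared by all states at frame t, so it does not change which HMM path
    scores best, and these values can replace log p(x_t | q) in decoding.

    posteriors : (T, Q) network outputs, one row per frame
    priors     : (Q,) state priors, e.g. relative frequencies in training data
    """
    posteriors = np.asarray(posteriors, dtype=float)
    priors = np.asarray(priors, dtype=float)
    # Floor both terms to avoid log(0) for states the network rules out.
    return np.log(np.maximum(posteriors, floor)) - np.log(np.maximum(priors, floor))

post = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.6, 0.3]])
prior = np.array([0.5, 0.3, 0.2])
print(np.exp(scaled_log_likelihoods(post, prior)))
```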

Review of the approached method
An ANN is a powerful computational model whose massive parallelism makes it very effective. It can learn from the given training data and generalize beyond it, it is highly fault tolerant, and it adapts to noise. ANNs can perform both logical and symbolic operations. There are many types of ANN.
The most widely used ANNs built from interconnected neurons are the Self-Organizing Map (SOM), the Multi-Layer Perceptron (MLP), and the Hopfield network. The main goal of these networks is to learn the relationship between input and output patterns, which is achieved by modifying the connection weights between units.
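Learning by modifying connection weights can be shown in its simplest form with the delta (Widrow-Hoff) rule for a single linear unit: each weight moves in proportion to the output error and to the input on that connection. This is a generic illustration, not the specific learning rule of the SOM, MLP, or Hopfield network (the MLP generalizes it through backpropagation); the training data and learning rate are invented.

```python
import numpy as np

def delta_rule_step(w, x, target, lr=0.1):
    """One delta-rule update for a single linear unit.

    The change to each weight is lr * (target - output) * input.
    """
    y = w @ x
    return w + lr * (target - y) * x

def train_linear_unit(X, t, lr=0.1, epochs=200):
    """Present every (input, target) pair once per epoch, updating after each."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            w = delta_rule_step(w, x, target, lr)
    return w

# Illustrative data generated by target = 2*x0 - x1
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
t = np.array([2.0, -1.0, 1.0])
w = train_linear_unit(X, t)
print(np.round(w, 3))
```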

Overview of Tamil Language
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 5 February 2021 doi:10.20944/preprints202102.0156.v1

Tamil is a member of the Dravidian language family and is predominantly spoken in southern India. It is the official language of the state of Tamil Nadu and the union territory of Puducherry, and it is also an official language of Singapore and Sri Lanka. Also, many Tamil

Overview of Malay Language
Malay is a member of the Austronesian language family. It is spoken by about 33 million people in Sumatra, Borneo, and the Malay Peninsula, and is widely spoken in Malaysia and Indonesia.
Malay most closely resembles the languages of Sumatra, but it is also related to other Austronesian languages such as Javanese,

Brief description of the phonemes of the Tamil and Malay languages:
The phoneme is the basic theoretical unit for describing how speech, as formed in the mind, carries linguistic meaning.

PHONEMES OF TAMIL LANGUAGE:
The principal Tamil phonemes can be represented in chart form as given below:

Phonemes of Malay language
The principal Malay phonemes can be represented in chart form as given below:

Reliable and accurate phoneme segmentation is a common requirement of speech applications. Many methods have been used for phoneme segmentation, and some of them perform better because they incorporate phonetic knowledge. However, the performance of rule-based methods is very difficult to optimize, and segmentation performance degrades in real-time applications. To overcome this disadvantage, an NN-based approach should be combined with the conventional rule-based method. This combination gives significantly better performance in the presence of disturbance and noise.
The MLP used for phoneme segmentation has one hidden layer and one output layer. Its input consists of 72 feature parameters covering four consecutive inter-frame differences. The output layer has a single node, which decides whether the second frame is a phoneme boundary. Both the hidden and output layers apply an activation function.
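The segmenter can be sketched as a forward pass through a network with 72 inputs, one hidden layer, and a single output node. This sketch assumes the 12 hidden nodes reported in the experiments, logistic sigmoid activations, random untrained weights, and a 0.5 decision threshold; apart from the layer sizes these are illustrative assumptions, and training (for example by backpropagation) is omitted.

```python
import numpy as np

N_INPUTS = 72   # 4 inter-frame differences x 18 elements
N_HIDDEN = 12   # hidden-node count reported in the experiments

def sigmoid(z):
    """Logistic activation (assumed; the paper does not name the function)."""
    return 1.0 / (1.0 + np.exp(-z))

def init_segmenter(rng):
    """Randomly initialize an untrained 72-12-1 MLP (weights are illustrative)."""
    return {
        "W1": rng.uniform(-0.5, 0.5, size=(N_HIDDEN, N_INPUTS)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.uniform(-0.5, 0.5, size=N_HIDDEN),
        "b2": 0.0,
    }

def boundary_score(params, features):
    """Forward pass: a value in (0, 1) read as the boundary score of the frame."""
    h = sigmoid(params["W1"] @ features + params["b1"])
    return sigmoid(params["W2"] @ h + params["b2"])

def is_boundary(params, features, threshold=0.5):
    """Binary boundary decision from the single output node."""
    return bool(boundary_score(params, features) >= threshold)

rng = np.random.default_rng(0)
params = init_segmenter(rng)
window_features = rng.standard_normal(N_INPUTS)  # stand-in for a real 72-element input
print(boundary_score(params, window_features), is_boundary(params, window_features))
```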
The number of nodes in the hidden layer is varied according to the experimental performance. In the SOM, each input node is connected to the output nodes of the network through a weight vector, and the weight vectors are arranged according to the topological order of the nodes in the network.
The SOM preserves the topological order of the original input space [17]. The aim of this model is to apply the self-organizing map to the output of the speech-processing block, reducing the feature vectors while preserving their original behavior.
Next, an appropriate number of neurons for the SOM is determined, so that the optimally sized map has a sufficient number of neurons.
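The topology preservation of the SOM comes from its update rule: the best-matching unit (BMU) and its neighbors on the map grid are all pulled toward the input, with a strength that decays with grid distance. The sketch below shows a single update on a 3 x 3 grid with a Gaussian neighborhood; the grid size, learning rate, and neighborhood width are illustrative assumptions.

```python
import numpy as np

def som_update(weights, grid, x, lr=0.5, sigma=1.0):
    """Apply one Kohonen update and return (new_weights, bmu_index).

    weights : (K, D) weight vector of each map node
    grid    : (K, 2) fixed row/column position of each node on the map
    x       : (D,) input vector
    """
    # Best-matching unit: the node whose weight vector is closest to x.
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # Gaussian neighborhood measured on the map grid, not in input space,
    # so nodes that are adjacent on the map end up with similar weights.
    grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
    h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
    new_weights = weights + lr * h[:, None] * (x - weights)
    return new_weights, bmu

rows, cols = 3, 3
grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
rng = np.random.default_rng(0)
weights = rng.uniform(-1.0, 1.0, size=(rows * cols, 2))
x = np.array([0.9, 0.9])
new_weights, bmu = som_update(weights, grid, x)
print(bmu, new_weights[bmu])
```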

SOM ARCHITECTURE:
The

Experimentation for speech recognition
In this model, the feature vector for phoneme segmentation is obtained from five consecutive input frames.
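The 72-element segmenter input can be assembled from a window of five frames. The sketch assumes each frame is already described by 18 coefficients (the type of per-frame parameter is not stated in this section) and takes each inter-frame difference as an element-wise subtraction of consecutive frames; the names `segmenter_input` and `sliding_inputs` are hypothetical.

```python
import numpy as np

N_FRAMES = 5   # consecutive frames per window
N_COEFFS = 18  # elements per frame, and per inter-frame difference

def segmenter_input(frames):
    """Build the MLP input from a (5, 18) window of frame features.

    The 4 inter-frame differences (frame[i+1] - frame[i]) are concatenated
    into one 4 * 18 = 72-element vector.
    """
    frames = np.asarray(frames, dtype=float)
    if frames.shape != (N_FRAMES, N_COEFFS):
        raise ValueError(f"expected shape {(N_FRAMES, N_COEFFS)}, got {frames.shape}")
    diffs = np.diff(frames, axis=0)   # shape (4, 18)
    return diffs.reshape(-1)          # shape (72,)

def sliding_inputs(features):
    """Slide a 5-frame window over a (T, 18) utterance, one input vector per position."""
    features = np.asarray(features, dtype=float)
    return np.stack([segmenter_input(features[t:t + N_FRAMES])
                     for t in range(len(features) - N_FRAMES + 1)])

utterance = np.random.default_rng(0).standard_normal((20, N_COEFFS))  # placeholder features
inputs = sliding_inputs(utterance)
print(inputs.shape)
```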
An inter-frame difference is computed between each pair of consecutive frames, and the four inter-frame differences are placed in a range between 40 and 5. For every five consecutive frames, the inter-frame differences are obtained; each of the four inter-frame differences contains 18 elements, and these 72 elements serve as the input to the MLP phoneme segmenter. In the present research, 12 hidden nodes were used in the MLP. The MLP model obtained an accuracy of 65 percent for a 10 ms duration and 83 percent for a 20 ms duration. The feature vectors of the individual phonemes are then isolated.

The features obtained from the MLP then undergo a time-warping step that makes them equal in length, after which they are passed to a Kohonen SOM with 6 clusters. The input for this model is taken from the speech input pattern and stored as an array, which is normalized so that the network works efficiently. The weights are initialized in the range -1 to +1 and stored in a weight array sized according to the dimensions of the SOM. Training is carried out until the maximum number of epochs is reached: input vectors are selected at random from the input array, and for each one the winner node is determined as the node whose weight vector lies at the closest distance to the input. The weight vector of this winning node is then updated toward the input, and the process repeats until the maximum epoch is reached.
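The second stage can be sketched as follows: each variable-length sequence of feature vectors is linearly resampled to a fixed number of frames (a simple stand-in for the time-warping step, whose exact method is not specified), normalized, and used to train a six-node Kohonen map with weights initialized in [-1, +1], inputs visited in random order, and a fixed maximum number of epochs. The target length, unit-length normalization, learning-rate and radius schedules, 1-D map layout, and synthetic data are all illustrative assumptions.

```python
import numpy as np

def time_normalize(seq, n_frames=10):
    """Linearly resample a (T, D) feature sequence to (n_frames, D), then flatten."""
    seq = np.asarray(seq, dtype=float)
    src = np.linspace(0.0, 1.0, num=len(seq))
    dst = np.linspace(0.0, 1.0, num=n_frames)
    resampled = np.stack([np.interp(dst, src, seq[:, d]) for d in range(seq.shape[1])],
                         axis=1)
    return resampled.reshape(-1)

def normalize_rows(X):
    """Scale each input vector to unit Euclidean length (assumed normalization)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)

def train_som(X, n_nodes=6, max_epochs=100, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D Kohonen map with n_nodes clusters; returns (n_nodes, D) weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(n_nodes, X.shape[1]))  # weights in [-1, +1]
    positions = np.arange(n_nodes)
    for epoch in range(max_epochs):
        frac = epoch / max_epochs
        lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5) # shrinking neighborhood
        for i in rng.permutation(len(X)):         # inputs visited in random order
            x = X[i]
            winner = np.argmin(np.linalg.norm(W - x, axis=1))
            h = np.exp(-((positions - winner) ** 2) / (2.0 * radius ** 2))
            W += lr * h[:, None] * (x - W)        # winner and neighbors move toward x
    return W

def assign_clusters(X, W):
    """Index of the closest SOM node (cluster) for each input vector."""
    dists = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

# Synthetic stand-in data: 12 sequences of 18-dim features with varying lengths.
rng = np.random.default_rng(1)
seqs = [rng.normal(loc=1.0 if k % 2 else -1.0, scale=0.1,
                   size=(rng.integers(8, 15), 18)) for k in range(12)]
X = normalize_rows(np.stack([time_normalize(s) for s in seqs]))
W = train_som(X)
labels = assign_clusters(X, W)
print(labels)
```

Linear resampling is chosen here only because it is the simplest way to equalize lengths; dynamic time warping against a reference template would be a closer match to the term "time warping" but requires choosing that template.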

Conclusion
The neural network is a promising technique for speech recognition. Research directions in this area are fairly diverse, and none of the existing approaches clearly dominates the others. Indeed, speech recognition is well known as a complex pattern-recognition problem that can usually be divided into several sub-problems, and the neural network approach is among the most widely relied upon. In addition, some fundamentals of neural networks were reviewed, based on their topology and type of learning. It is expected that the present study contributes towards the recognition of Tamil and Malay words using neural networks. The proposed model combines two neural networks, namely SOM and MLP, and its performance is evaluated through its recognition accuracy.