TY - JOUR
T1 - Enhancing Audio Classification Through MFCC Feature Extraction and Data Augmentation with CNN and RNN Models
AU - Rezaul, Karim Mohammed
AU - Jewel, Md.
AU - Islam, Md Shabiul
AU - Siddiquee, Kazy Noor e Alam
AU - Barua, Nick
AU - Rahman, Muhammad Azizur
AU - Shan-A-Khuda, Mohammad
AU - Sulaiman, Rejwan Bin
AU - Shaikh, Md Sadeque Imam
AU - Hamim, Md Abrar
AU - Tanmoy, F.M
AU - Haque, Afraz Ul
AU - Nipun, Musarrat Saberin
AU - Dorudian, Navid
AU - Kareem, Amer
AU - Farid, Ahmmed Khondokar
AU - Mubarak, Asma
AU - Jannat, Tajnuva
AU - Asha, Umme Fatema Tuj
PY - 2024/7
AB - Sound classification is a multifaceted task that requires gathering and processing large quantities of data and building machine learning models that can accurately distinguish between various sounds. In this project, we implemented a novel methodology for classifying both musical instruments and environmental sounds using convolutional and recurrent neural networks. We extracted audio features with the Mel-Frequency Cepstral Coefficient (MFCC) method, which emulates the human auditory system and produces highly distinctive features. Recognizing the importance of data processing, we applied a range of data augmentation and cleaning techniques to achieve an optimized solution. The outcomes were noteworthy: both the convolutional and recurrent neural network models achieved a commendable level of accuracy. As machine learning and deep learning continue to revolutionize image classification, it is timely to develop equally adaptable models for audio classification. Despite the challenges posed by a small dataset, we successfully developed our models using convolutional and recurrent neural networks. Overall, our approach to sound classification has significant implications for diverse domains, including speech recognition, music production, and healthcare. We believe that, with further research and progress, our work can pave the way for breakthroughs in audio data classification and analysis.
DO - 10.14569/IJACSA.2024.0150704
M3 - Article
SN - 2158-107X
VL - 15
JF - International Journal of Advanced Computer Science and Applications
IS - 7
ER -