Feature Analysis and Convolutional Neural Networks for Speech Accent Recognition

Dwi Sari Widyowaty


Every country has its own characteristics and culture, and one of these characteristics is the accent of speech. By listening to a speaker's accent, we can often identify their country of origin. Accent recognition falls within Automatic Speech Recognition (ASR), a technology under active development; one example of an ASR application is the virtual assistant, which this line of research can make more intelligent by enabling it to infer a speaker's accent. In this study, the authors classify accents from five countries (5 classes): English, Spanish, Mandarin, French, and Arabic. The dataset consists of 627 English, 220 Spanish, 132 Mandarin, 80 French, and 172 Arabic audio recordings, all of the same sentence spoken in English. The audio features used are Mel-Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), and Energy (called RMS in librosa). Feature extraction produces an array for each audio file, and these arrays become the input to a Convolutional Neural Network (CNN) that classifies the accent. The research achieved 51.30% accuracy with the MFCC feature, 48.05% with ZCR, and 51.95% with Energy; Energy obtained the best accuracy, followed by MFCC and ZCR.
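As an illustration of the frame-based features the abstract describes, the zero crossing rate and energy (RMS) of a signal can be computed directly in NumPy. The sketch below is a minimal example, not the authors' exact pipeline: the frame length of 2048 and hop length of 512 are assumed defaults that match librosa's conventions, and MFCC extraction (which additionally needs a mel filter bank and a DCT) is typically delegated to `librosa.feature.mfcc` rather than written by hand.

```python
import numpy as np

def frame_signal(y, frame_length=2048, hop_length=512):
    """Slice a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return y[idx]  # shape: (n_frames, frame_length)

def zero_crossing_rate(y, frame_length=2048, hop_length=512):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    frames = frame_signal(y, frame_length, hop_length)
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def rms_energy(y, frame_length=2048, hop_length=512):
    """Root-mean-square amplitude per frame (librosa calls this 'rms')."""
    frames = frame_signal(y, frame_length, hop_length)
    return np.sqrt(np.mean(frames ** 2, axis=1))

# Example: one second of a 440 Hz tone sampled at 22050 Hz.
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)

zcr = zero_crossing_rate(y)   # one value per frame
rms = rms_energy(y)           # one value per frame
# Stacking such per-frame arrays for each audio clip yields the
# fixed-size feature array that serves as the CNN classifier's input.
```

A 440 Hz sine crosses zero 880 times per second, so the per-sample ZCR is about 880/22050 ≈ 0.04, and the RMS of a unit-amplitude sine is 1/√2 ≈ 0.707 — quick sanity checks for the two functions.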


Keywords: Accent Recognition; MFCC; Zero Crossing Rate; Energy; CNN.








This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.




Jurnal EXPLORE STMIK Mataram
Jalan Kampus STMIK - ASM Mataram Kekalik Jaya Kota Mataram Prov. NTB - 83126
Telp: 0370-635007, 0370-628418
