Automatic pronunciation clustering using a World English archive and pronunciation structure analysis

H. P. Shen, N. Minematsu, T. Makino, S. H. Weinberger, T. Pongkittiphan, Chung-Hsien Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

English is the only language available for global communication. Due to the influence of speakers' mother tongues, however, speakers from different regions inevitably have different accents when they pronounce English. The ultimate goal of our project is to create a global pronunciation map of World Englishes on an individual basis, which speakers can use to locate similar English pronunciations. A speaker who is a learner can also see how their pronunciation compares with other varieties. Creating the map mathematically requires a matrix of pronunciation distances among all the speakers considered. This paper investigates invariant pronunciation structure analysis and Support Vector Regression (SVR) to predict the inter-speaker pronunciation distances. In experiments, the Speech Accent Archive (SAA), which contains speech data of worldwide accented English, is used for both training and testing. IPA narrow transcriptions in the archive are used to prepare reference pronunciation distances, which are then predicted from structural analysis and SVR rather than from the IPA transcriptions. The correlation between the reference distances and the predicted distances is calculated. The experimental results are very promising, and our proposed method far outperforms a baseline system built on an HMM-based phoneme recognizer.
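The evaluation described above compares two sets of inter-speaker distances: reference distances derived from transcriptions and distances predicted by the model. The sketch below illustrates only that comparison step, under stated assumptions. It assumes (1) reference distances computed as a plain Levenshtein edit distance between phone sequences, which is a stand-in and not necessarily the paper's IPA-based measure, and (2) Pearson's correlation coefficient as the correlation measure. The speaker IDs, transcriptions, and "predicted" distances are hypothetical toy values, not SAA data or SVR output.

```python
from itertools import combinations
from math import sqrt


def edit_distance(a, b):
    """Levenshtein distance between two phone sequences (lists of symbols).

    Assumption: a stand-in for the paper's reference distance measure.
    """
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def reference_distances(transcriptions):
    """Distance for every unordered speaker pair, keyed by (speaker_i, speaker_j)."""
    return {
        (s, t): edit_distance(transcriptions[s], transcriptions[t])
        for s, t in combinations(sorted(transcriptions), 2)
    }


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


# Hypothetical phone transcriptions for three toy speakers.
transcriptions = {
    "spk_a": ["p", "l", "i", "z"],
    "spk_b": ["p", "l", "i", "s"],
    "spk_c": ["b", "l", "i", "s"],
}
ref = reference_distances(transcriptions)

# Hypothetical model predictions for the same speaker pairs.
predicted = {("spk_a", "spk_b"): 0.9, ("spk_a", "spk_c"): 2.2, ("spk_b", "spk_c"): 1.1}

pairs = sorted(ref)
r = pearson([ref[p] for p in pairs], [predicted[p] for p in pairs])
print(f"Pearson r = {r:.3f}")
```

Correlating over the unordered speaker pairs (the upper triangle of the distance matrix) avoids counting each pair twice and excludes the trivial zero diagonal.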

Original language: English
Title of host publication: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings
Pages: 222-227
Number of pages: 6
DOI: 10.1109/ASRU.2013.6707733
Publication status: Published - 2013 Dec 1
Event: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Olomouc, Czech Republic
Duration: 2013 Dec 8 to 2013 Dec 13

Publication series

Name: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings

Other

Other: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013
Country: Czech Republic
City: Olomouc
Period: 2013-12-08 to 2013-12-13

All Science Journal Classification (ASJC) codes

  • Speech and Hearing

Cite this

Shen, H. P., Minematsu, N., Makino, T., Weinberger, S. H., Pongkittiphan, T., & Wu, C-H. (2013). Automatic pronunciation clustering using a World English archive and pronunciation structure analysis. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings (pp. 222-227). [6707733] (2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings). https://doi.org/10.1109/ASRU.2013.6707733
Shen, H. P. ; Minematsu, N. ; Makino, T. ; Weinberger, S. H. ; Pongkittiphan, T. ; Wu, Chung-Hsien. / Automatic pronunciation clustering using a World English archive and pronunciation structure analysis. 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings. 2013. pp. 222-227 (2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings).
@inproceedings{7099e3533ddf4762ba10d5096d065eb8,
title = "Automatic pronunciation clustering using a World English archive and pronunciation structure analysis",
abstract = "English is the only language available for global communication. Due to the influence of speakers' mother tongue, however, those from different regions inevitably have different accents in their pronunciation of English. The ultimate goal of our project is creating a global pronunciation map of World Englishes on an individual basis, for speakers to use to locate similar English pronunciations. If the speaker is a learner, he can also know how his pronunciation compares to other varieties. Creating the map mathematically requires a matrix of pronunciation distances among all the speakers considered. This paper investigates invariant pronunciation structure analysis and Support Vector Regression (SVR) to predict the inter-speaker pronunciation distances. In experiments, the Speech Accent Archive (SAA), which contains speech data of worldwide accented English, is used as training and testing samples. IPA narrow transcriptions in the archive are used to prepare reference pronunciation distances, which are then predicted based on structural analysis and SVR, not with IPA transcriptions. Correlation between the reference distances and the predicted distances is calculated. Experimental results show very promising results and our proposed method outperforms by far a baseline system developed using an HMM-based phoneme recognizer.",
author = "Shen, {H. P.} and N. Minematsu and T. Makino and Weinberger, {S. H.} and T. Pongkittiphan and Chung-Hsien Wu",
year = "2013",
month = dec,
day = "1",
doi = "10.1109/ASRU.2013.6707733",
language = "English",
isbn = "9781479927562",
series = "2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings",
pages = "222--227",
booktitle = "2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings",

}

Shen, HP, Minematsu, N, Makino, T, Weinberger, SH, Pongkittiphan, T & Wu, C-H 2013, Automatic pronunciation clustering using a World English archive and pronunciation structure analysis. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings, 6707733, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings, pp. 222-227, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013, Olomouc, Czech Republic, 8 December 2013. https://doi.org/10.1109/ASRU.2013.6707733

Automatic pronunciation clustering using a World English archive and pronunciation structure analysis. / Shen, H. P.; Minematsu, N.; Makino, T.; Weinberger, S. H.; Pongkittiphan, T.; Wu, Chung-Hsien.

2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings. 2013. p. 222-227, 6707733 (2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings).

TY  - GEN
T1  - Automatic pronunciation clustering using a World English archive and pronunciation structure analysis
AU  - Shen, H. P.
AU  - Minematsu, N.
AU  - Makino, T.
AU  - Weinberger, S. H.
AU  - Pongkittiphan, T.
AU  - Wu, Chung-Hsien
PY  - 2013/12/1
Y1  - 2013/12/1
N2  - English is the only language available for global communication. Due to the influence of speakers' mother tongue, however, those from different regions inevitably have different accents in their pronunciation of English. The ultimate goal of our project is creating a global pronunciation map of World Englishes on an individual basis, for speakers to use to locate similar English pronunciations. If the speaker is a learner, he can also know how his pronunciation compares to other varieties. Creating the map mathematically requires a matrix of pronunciation distances among all the speakers considered. This paper investigates invariant pronunciation structure analysis and Support Vector Regression (SVR) to predict the inter-speaker pronunciation distances. In experiments, the Speech Accent Archive (SAA), which contains speech data of worldwide accented English, is used as training and testing samples. IPA narrow transcriptions in the archive are used to prepare reference pronunciation distances, which are then predicted based on structural analysis and SVR, not with IPA transcriptions. Correlation between the reference distances and the predicted distances is calculated. Experimental results show very promising results and our proposed method outperforms by far a baseline system developed using an HMM-based phoneme recognizer.
AB  - English is the only language available for global communication. Due to the influence of speakers' mother tongue, however, those from different regions inevitably have different accents in their pronunciation of English. The ultimate goal of our project is creating a global pronunciation map of World Englishes on an individual basis, for speakers to use to locate similar English pronunciations. If the speaker is a learner, he can also know how his pronunciation compares to other varieties. Creating the map mathematically requires a matrix of pronunciation distances among all the speakers considered. This paper investigates invariant pronunciation structure analysis and Support Vector Regression (SVR) to predict the inter-speaker pronunciation distances. In experiments, the Speech Accent Archive (SAA), which contains speech data of worldwide accented English, is used as training and testing samples. IPA narrow transcriptions in the archive are used to prepare reference pronunciation distances, which are then predicted based on structural analysis and SVR, not with IPA transcriptions. Correlation between the reference distances and the predicted distances is calculated. Experimental results show very promising results and our proposed method outperforms by far a baseline system developed using an HMM-based phoneme recognizer.
UR  - http://www.scopus.com/inward/record.url?scp=84893687605&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84893687605&partnerID=8YFLogxK
U2  - 10.1109/ASRU.2013.6707733
DO  - 10.1109/ASRU.2013.6707733
M3  - Conference contribution
SN  - 9781479927562
T3  - 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings
SP  - 222
EP  - 227
BT  - 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings
ER  - 

Shen HP, Minematsu N, Makino T, Weinberger SH, Pongkittiphan T, Wu C-H. Automatic pronunciation clustering using a World English archive and pronunciation structure analysis. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings. 2013. p. 222-227. 6707733. (2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings). https://doi.org/10.1109/ASRU.2013.6707733