Abstract
This paper proposes an automatic approach to generating speech with fluency at the prosodic word level based on a small-sized speech database of the target speaker, consisting of read and fluent speech. First, an auto-segmentation algorithm is employed to automatically segment and label the database of the target speaker. A pre-trained average voice model is adapted to the voice model of the target speaker by using the auto-segmented data. For synthesizing fluent speech, a prosodic model is proposed to smooth the prosodic word-level parameters to improve the fluency in a prosodic word. Finally, a postfilter method based on the modulation spectrum is adopted to alleviate the over-smoothing problem of the synthesized speech and thus improve the speaker similarity. Experimental results showed that the proposed method can effectively improve the speech fluency and speaker similarity of the synthesized speech for a target speaker compared to the MLLR-based model adaptation method.
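The abstract does not spell out how the modulation-spectrum postfilter is computed. As a rough illustration only, the sketch below shows one generic form of such a postfilter: the log-power modulation spectrum of a synthesized parameter trajectory is shifted from the statistics of synthesized speech toward those of natural speech, while the phase is left untouched. The function names, the per-bin statistics arguments, the emphasis weight `k`, and the flooring constant are all assumptions made for this example, not details taken from the paper.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spec):
    """Inverse DFT; returns the real part, since trajectories are real."""
    n = len(spec)
    return [
        sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def ms_postfilter(traj, syn_mean, syn_std, nat_mean, nat_std, k=1.0, floor=1e-12):
    """Hypothetical modulation-spectrum postfilter for one parameter trajectory.

    traj               : synthesized trajectory of one parameter over time
    syn_mean, syn_std  : per-bin mean/std of the log modulation spectrum
                         of synthesized speech (assumed precomputed)
    nat_mean, nat_std  : the same statistics for natural speech
    k                  : emphasis weight in [0, 1]; 0 leaves traj unchanged
    """
    out = []
    for c, sm, ss, nm, ns in zip(dft(traj), syn_mean, syn_std, nat_mean, nat_std):
        # Log-power modulation spectrum of this bin (floored to avoid log(0)).
        log_pow = math.log(max(abs(c) ** 2, floor))
        # Normalize with synthetic statistics, rescale with natural ones.
        converted = (log_pow - sm) / ss * ns + nm
        # Interpolate between the original and the converted spectrum.
        blended = (1.0 - k) * log_pow + k * converted
        # Rebuild the complex bin: new magnitude, original phase.
        out.append(cmath.rect(math.exp(0.5 * blended), cmath.phase(c)))
    return idft(out)
```

In practice the four statistics vectors would be estimated per modulation-frequency bin from natural and synthesized trajectories of equal length. Keeping `k` below 1 is a common way to trade restored detail against artifacts, though the paper's own setting is not given here.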
Original language | English |
---|---|
Pages (from-to) | 294-298 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2015-January |
Publication status | Published - 2015 Jan 1 |
Event | 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany; Duration: 2015 Sep 6 → 2015 Sep 10 |
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation
Cite this
Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation. / Huang, Yi Chin; Wu, Chung Hsien; Shie, Ming Ge.
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2015-January, 01.01.2015, p. 294-298.
Research output: Contribution to journal › Conference article
TY - JOUR
T1 - Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation
AU - Huang, Yi Chin
AU - Wu, Chung Hsien
AU - Shie, Ming Ge
PY - 2015/1/1
Y1 - 2015/1/1
N2 - This paper proposes an automatic approach to generating speech with fluency at the prosodic word level based on a small-sized speech database of the target speaker, consisting of read and fluent speech. First, an auto-segmentation algorithm is employed to automatically segment and label the database of the target speaker. A pre-trained average voice model is adapted to the voice model of the target speaker by using the auto-segmented data. For synthesizing fluent speech, a prosodic model is proposed to smooth the prosodic word-level parameters to improve the fluency in a prosodic word. Finally, a postfilter method based on the modulation spectrum is adopted to alleviate the over-smoothing problem of the synthesized speech and thus improve the speaker similarity. Experimental results showed that the proposed method can effectively improve the speech fluency and speaker similarity of the synthesized speech for a target speaker compared to the MLLR-based model adaptation method.
UR - http://www.scopus.com/inward/record.url?scp=84959173118&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959173118&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84959173118
VL - 2015-January
SP - 294
EP - 298
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN - 2308-457X
ER -