Hierarchical prosodic pattern selection based on Fujisaki model for natural mandarin speech synthesis

Yi Chin Huang, Chung-Hsien Wu, Sz Ting Weng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

In this paper, a novel hierarchical prosodic unit selection method is proposed based on pitch contour pattern retrieval, in order to obtain a natural pitch contour for the personalized synthetic voice. In this framework, a hierarchical prosodic unit based on the Fujisaki model is used to take both the local pitch contour variation and the global intonation of the utterance into account. Furthermore, novel ways of integrating the pitch contour patterns of prosodic units into the prosodic model are devised in order to improve the mechanism for selecting the appropriate pitch contour. A novel prosodic unit selection method is proposed based on sentence retrieval, which uses not only the traditional linguistic cues but also the shape of the pitch contour as selection criteria. In addition, codewords of the pitch patterns in the training corpus and the synthesized corpus are constructed by the proposed method and used to map the relation between the training codewords and the synthesized corpus. Finally, a language model of pitch patterns is adopted to find the proper pitch pattern sequence for the input text. The evaluation results demonstrate that the proposed prosodic model substantially improves the naturalness of the intonation of the synthesized speech compared with that of the model-based method.
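
For orientation, the Fujisaki model named in the abstract is commonly written as a baseline log-F0 value plus slowly varying phrase components and local accent (or, for tonal languages, tone) components. The sketch below illustrates that standard formulation only; it is not the authors' implementation. The function names, the command tuples, and the default values of `alpha`, `beta`, and `gamma` are assumptions chosen for illustration.

```python
import math


def phrase_response(t, alpha=3.0):
    """Phrase control impulse response: Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control step response: Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """Log-F0 at time t (seconds).

    fb      -- baseline frequency in Hz
    phrases -- list of (onset, amplitude) phrase commands
    accents -- list of (onset, offset, amplitude) accent/tone commands
    """
    value = math.log(fb)
    # Global intonation: sum of phrase components.
    for onset, amplitude in phrases:
        value += amplitude * phrase_response(t - onset, alpha)
    # Local variation: each command is a step switched on at onset and off at offset.
    for onset, offset, amplitude in accents:
        value += amplitude * (
            accent_response(t - onset, beta, gamma)
            - accent_response(t - offset, beta, gamma)
        )
    return value


# Illustrative (made-up) commands: one phrase command and one accent command.
contour = [
    math.exp(log_f0(0.01 * i, 100.0, [(0.0, 0.5)], [(0.2, 0.5, 0.4)]))
    for i in range(100)
]
```

The phrase term captures the global intonation of the utterance and the accent term the local pitch variation, which is the split the hierarchical prosodic unit in the abstract relies on.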

Original language: English
Title of host publication: 2012 8th International Symposium on Chinese Spoken Language Processing, ISCSLP 2012
Pages: 79-83
Number of pages: 5
DOI: https://doi.org/10.1109/ISCSLP.2012.6423536
Publication status: Published - 2012 Dec 1
Event: 2012 8th International Symposium on Chinese Spoken Language Processing, ISCSLP 2012 - Hong Kong, China
Duration: 2012 Dec 5 to 2012 Dec 8

Publication series

Name: 2012 8th International Symposium on Chinese Spoken Language Processing, ISCSLP 2012

Other

Other: 2012 8th International Symposium on Chinese Spoken Language Processing, ISCSLP 2012
Country: China
City: Hong Kong
Period: 12-12-05 to 12-12-08

Fingerprint

Pitch Contour
Speech Synthesis
linguistics
language
evaluation
Prosodic Units
Intonation
Naturalness
Language Model
Utterance

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language

Cite this

Huang, Y. C., Wu, C-H., & Weng, S. T. (2012). Hierarchical prosodic pattern selection based on Fujisaki model for natural mandarin speech synthesis. In 2012 8th International Symposium on Chinese Spoken Language Processing, ISCSLP 2012 (pp. 79-83). [6423536] (2012 8th International Symposium on Chinese Spoken Language Processing, ISCSLP 2012). https://doi.org/10.1109/ISCSLP.2012.6423536
@inproceedings{446d2ab100424c98a8c2f0b2a1e47593,
title = "Hierarchical prosodic pattern selection based on Fujisaki model for natural mandarin speech synthesis",
abstract = "In this paper, a novel hierarchical prosodic unit selection method is proposed based on pitch contour pattern retrieval, in order to obtain a natural pitch contour for the personalized synthetic voice. In this framework, a hierarchical prosodic unit based on the Fujisaki model is used to take both the local pitch contour variation and the global intonation of the utterance into account. Furthermore, novel ways of integrating the pitch contour patterns of prosodic units into the prosodic model are devised in order to improve the mechanism for selecting the appropriate pitch contour. A novel prosodic unit selection method is proposed based on sentence retrieval, which uses not only the traditional linguistic cues but also the shape of the pitch contour as selection criteria. In addition, codewords of the pitch patterns in the training corpus and the synthesized corpus are constructed by the proposed method and used to map the relation between the training codewords and the synthesized corpus. Finally, a language model of pitch patterns is adopted to find the proper pitch pattern sequence for the input text. The evaluation results demonstrate that the proposed prosodic model substantially improves the naturalness of the intonation of the synthesized speech compared with that of the model-based method.",
author = "Huang, {Yi Chin} and Wu, Chung-Hsien and Weng, {Sz Ting}",
year = "2012",
month = "12",
day = "1",
doi = "10.1109/ISCSLP.2012.6423536",
language = "English",
isbn = "9781467325059",
series = "2012 8th International Symposium on Chinese Spoken Language Processing, ISCSLP 2012",
pages = "79--83",
booktitle = "2012 8th International Symposium on Chinese Spoken Language Processing, ISCSLP 2012",

}
