A Novel Prosodic-Information Synthesizer Based on Recurrent Fuzzy Neural Network for the Chinese TTS System

Chin Teng Lin, Rui Cheng Wu, Jyh Yeong Chang, Sheng Fu Liang

Research output: Contribution to journalArticle

13 Citations (Scopus)

Abstract

In this paper, a new technique for the Chinese text-to-speech (TTS) system is proposed. Our major effort focuses on the prosodic information generation. New methodologies for constructing fuzzy rules in a prosodic model simulating human's pronouncing rules are developed. The proposed Recurrent Fuzzy Neural Network (RFNN) is a multilayer recurrent neural network (RNN) which integrates a Self-cOnstructing Neural Fuzzy Inference Network (SONFIN) into a recurrent connectionist structure. The RFNN can be functionally divided into two parts. The first part adopts the SONFIN as a prosodic model to explore the relationship between high-level linguistic features and prosodic information based on fuzzy inference rules. As compared to conventional neural networks, the SONFIN can always construct itself with an economic network size in high learning speed. The second part employs a five-layer network to generate all prosodic parameters by directly using the prosodic fuzzy rules inferred from the first part as well as other important features of syllables. The TTS system combined with the proposed method can behave not only sandhi rules but also the other prosodic phenomena existing in the traditional TTS systems. Moreover, the proposed scheme can even find out some new rules about prosodic phrase structure. The performance of the proposed RFNN-based prosodic model is verified by imbedding it into a Chinese TTS system with a Chinese monosyllable database based on the time-domain pitch synchronous overlap add (TD-PSOLA) method. Our experimental results show that the proposed RFNN can generate proper prosodic parameters including pitch means, pitch shapes, maximum energy levels, syllable duration, and pause duration. Some synthetic sounds are on-line available for demonstration.

Original languageEnglish
Pages (from-to)309-324
Number of pages16
JournalIEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Volume34
Issue number1
DOIs
Publication statusPublished - 2004 Feb 1

Fingerprint

Recurrent neural networks
Fuzzy neural networks
Fuzzy inference
Fuzzy rules
Network layers
Multilayer neural networks
Linguistics
Electron energy levels
Demonstrations
Acoustic waves
Neural networks
Economics

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Software
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications
  • Electrical and Electronic Engineering

Cite this

@article{d485114cb4d240a19d16da31c20baaf8,
title = "A Novel Prosodic-Information Synthesizer Based on Recurrent Fuzzy Neural Network for the Chinese TTS System",
abstract = "In this paper, a new technique for the Chinese text-to-speech (TTS) system is proposed. Our major effort focuses on the prosodic information generation. New methodologies for constructing fuzzy rules in a prosodic model simulating human's pronouncing rules are developed. The proposed Recurrent Fuzzy Neural Network (RFNN) is a multilayer recurrent neural network (RNN) which integrates a Self-cOnstructing Neural Fuzzy Inference Network (SONFIN) into a recurrent connectionist structure. The RFNN can be functionally divided into two parts. The first part adopts the SONFIN as a prosodic model to explore the relationship between high-level linguistic features and prosodic information based on fuzzy inference rules. As compared to conventional neural networks, the SONFIN can always construct itself with an economic network size in high learning speed. The second part employs a five-layer network to generate all prosodic parameters by directly using the prosodic fuzzy rules inferred from the first part as well as other important features of syllables. The TTS system combined with the proposed method can behave not only sandhi rules but also the other prosodic phenomena existing in the traditional TTS systems. Moreover, the proposed scheme can even find out some new rules about prosodic phrase structure. The performance of the proposed RFNN-based prosodic model is verified by imbedding it into a Chinese TTS system with a Chinese monosyllable database based on the time-domain pitch synchronous overlap add (TD-PSOLA) method. Our experimental results show that the proposed RFNN can generate proper prosodic parameters including pitch means, pitch shapes, maximum energy levels, syllable duration, and pause duration. Some synthetic sounds are on-line available for demonstration.",
author = "Lin, {Chin Teng} and Wu, {Rui Cheng} and Chang, {Jyh Yeong} and Liang, {Sheng Fu}",
year = "2004",
month = "2",
day = "1",
doi = "10.1109/TSMCB.2003.811518",
language = "English",
volume = "34",
pages = "309--324",
journal = "IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics",
issn = "1083-4419",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "1",

}

A Novel Prosodic-Information Synthesizer Based on Recurrent Fuzzy Neural Network for the Chinese TTS System. / Lin, Chin Teng; Wu, Rui Cheng; Chang, Jyh Yeong; Liang, Sheng Fu.

In: IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 34, No. 1, 01.02.2004, p. 309-324.

Research output: Contribution to journalArticle

TY - JOUR

T1 - A Novel Prosodic-Information Synthesizer Based on Recurrent Fuzzy Neural Network for the Chinese TTS System

AU - Lin, Chin Teng

AU - Wu, Rui Cheng

AU - Chang, Jyh Yeong

AU - Liang, Sheng Fu

PY - 2004/2/1

Y1 - 2004/2/1

N2 - In this paper, a new technique for the Chinese text-to-speech (TTS) system is proposed. Our major effort focuses on the prosodic information generation. New methodologies for constructing fuzzy rules in a prosodic model simulating human's pronouncing rules are developed. The proposed Recurrent Fuzzy Neural Network (RFNN) is a multilayer recurrent neural network (RNN) which integrates a Self-cOnstructing Neural Fuzzy Inference Network (SONFIN) into a recurrent connectionist structure. The RFNN can be functionally divided into two parts. The first part adopts the SONFIN as a prosodic model to explore the relationship between high-level linguistic features and prosodic information based on fuzzy inference rules. As compared to conventional neural networks, the SONFIN can always construct itself with an economic network size in high learning speed. The second part employs a five-layer network to generate all prosodic parameters by directly using the prosodic fuzzy rules inferred from the first part as well as other important features of syllables. The TTS system combined with the proposed method can behave not only sandhi rules but also the other prosodic phenomena existing in the traditional TTS systems. Moreover, the proposed scheme can even find out some new rules about prosodic phrase structure. The performance of the proposed RFNN-based prosodic model is verified by imbedding it into a Chinese TTS system with a Chinese monosyllable database based on the time-domain pitch synchronous overlap add (TD-PSOLA) method. Our experimental results show that the proposed RFNN can generate proper prosodic parameters including pitch means, pitch shapes, maximum energy levels, syllable duration, and pause duration. Some synthetic sounds are on-line available for demonstration.

AB - In this paper, a new technique for the Chinese text-to-speech (TTS) system is proposed. Our major effort focuses on the prosodic information generation. New methodologies for constructing fuzzy rules in a prosodic model simulating human's pronouncing rules are developed. The proposed Recurrent Fuzzy Neural Network (RFNN) is a multilayer recurrent neural network (RNN) which integrates a Self-cOnstructing Neural Fuzzy Inference Network (SONFIN) into a recurrent connectionist structure. The RFNN can be functionally divided into two parts. The first part adopts the SONFIN as a prosodic model to explore the relationship between high-level linguistic features and prosodic information based on fuzzy inference rules. As compared to conventional neural networks, the SONFIN can always construct itself with an economic network size in high learning speed. The second part employs a five-layer network to generate all prosodic parameters by directly using the prosodic fuzzy rules inferred from the first part as well as other important features of syllables. The TTS system combined with the proposed method can behave not only sandhi rules but also the other prosodic phenomena existing in the traditional TTS systems. Moreover, the proposed scheme can even find out some new rules about prosodic phrase structure. The performance of the proposed RFNN-based prosodic model is verified by imbedding it into a Chinese TTS system with a Chinese monosyllable database based on the time-domain pitch synchronous overlap add (TD-PSOLA) method. Our experimental results show that the proposed RFNN can generate proper prosodic parameters including pitch means, pitch shapes, maximum energy levels, syllable duration, and pause duration. Some synthetic sounds are on-line available for demonstration.

UR - http://www.scopus.com/inward/record.url?scp=0742307301&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0742307301&partnerID=8YFLogxK

U2 - 10.1109/TSMCB.2003.811518

DO - 10.1109/TSMCB.2003.811518

M3 - Article

C2 - 15369074

AN - SCOPUS:0742307301

VL - 34

SP - 309

EP - 324

JO - IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics

JF - IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics

SN - 1083-4419

IS - 1

ER -