Speech Synthesis Related Papers

This post collects papers in the field of speech synthesis, covering both classic work and future directions.

Acoustic Model

Tacotron1: Tacotron: Towards End-to-End Speech Synthesis (Interspeech 2017)
Tacotron2: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (ICASSP 2018)
FastSpeech1: FastSpeech: Fast, Robust and Controllable Text to Speech (NeurIPS 2019)
FastSpeech2: FastSpeech 2: Fast and High-Quality End-to-End Text to Speech (arXiv 2020)
Glow-TTS: Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search (NeurIPS 2020)
EfficientTTS: EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture (arXiv 2020)
BVAE-TTS: Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (ICLR 2021)

Vocoder

WaveNet: WaveNet: A Generative Model for Raw Audio (ISCA SS Workshop 2016)
FFTNet: FFTNet: a Real-Time Speaker-Dependent Neural Vocoder (ICASSP 2018)
WaveRNN: Efficient Neural Audio Synthesis (ICML 2018) [Code]
WaveGlow: WaveGlow: A Flow-based Generative Network for Speech Synthesis (ICASSP 2019)
MelGAN: MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis (NeurIPS 2019) [Code]
HiFi-GAN: HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis (NeurIPS 2020) [Code]

Prosody Modeling

Prosody-Tacotron: Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron (ICML 2018)
GST-Tacotron: Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis (ICML 2018)
VAE-Tacotron: Learning Latent Representations for Style Control and Transfer in End-to-End Speech Synthesis (ICASSP 2019)
VAE-Flow: Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech (ICASSP 2020)
Fine-grained-Attention: Robust and Fine-Grained Prosody Control of End-to-End Speech Synthesis (ICASSP 2019)
Manual-feature-based: Fine-grained robust prosody transfer for single-speaker neural text-to-speech (Interspeech 2019)
CopyCat: CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech (Interspeech 2020)
