TTS WebUI

Home | Pipeline | Text-to-Speech | Audio/Music Generation | Audio Conversion | Outputs | Tools

Welcome to the TTS WebUI!

This web interface lets you generate audio with the text-to-speech, audio/music generation, and audio conversion models listed below.

To get started, select a tab above to choose a model and generate some audio.

Text-to-Speech Models:

Kokoro

Kokoro is a fast, lightweight TTS model with 82 million parameters. Despite its small size, its quality is comparable to that of larger models.

Chatterbox

Expressive text-to-speech model with reference audio support for voice cloning.

Bark

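Bark is a transformer-based text-to-audio model from Suno that generates multilingual speech as well as non-speech sounds such as laughter, music, and background noise.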

Tortoise

Tortoise is a multi-voice text-to-speech model built for highly realistic prosody and intonation.

Maha TTS

Maha TTS is a text-to-speech model that supports many Indian languages.

MMS

Fairseq-based text-to-speech model that supports 1000+ languages.

VALL-E X

Multilingual TTS model that speaks three languages (English, Chinese, and Japanese) with natural, expressive speech synthesis.

Audio/Music Generation Models:

MusicGen

MusicGen is a state-of-the-art controllable text-to-music model.

MAGNeT

A state-of-the-art non-autoregressive model for text-to-music and text-to-sound.

Stable Audio

Stable Audio is a latent diffusion model from Stability AI that generates music and sound effects from text prompts.

ACE-Step

ACE-Step: A Step Towards Music Generation Foundation Model.

Audio Conversion Models:

RVC

An easy-to-use voice conversion framework based on VITS.

Demucs

Demucs is a music source separation model that splits a track into stems such as vocals, drums, and bass.

Vocos Wav

Vocos Wav is a post-processing model that can refine the output of a text-to-speech model.

Vocos NPZ

Vocos NPZ is a post-processing model that can refine the output of Bark.
