Leveraging Speaker Embeddings from Speaker Verification for Controllable Multispeaker Text-to-Speech

Authors

  • Nilam Thakkar
  • Shruti Yagnik
  • Tripti Sharma

Abstract

We present a hybrid multispeaker text-to-speech (TTS) system that combines the Librosa library, Gaussian mixture models (GMMs), and neural TTS. The neural TTS component synthesizes speech in the voices of many different speakers, including speakers not seen during training. The system consists of three independently trained components: (1) a speaker encoder, pre-trained on a speaker-verification task using a standalone dataset of untranscribed speech from thousands of speakers, which can verify a target speaker from only a short reference clip and produces a fixed-length embedding vector; (2) a Tacotron 2-based sequence-to-sequence synthesizer that generates a mel-spectrogram from text, conditioned on the speaker embedding; and (3) an autoregressive WaveNet vocoder that converts the mel-spectrogram into a time-domain waveform. We demonstrate that discriminative pre-training of the speaker encoder on large-scale, speaker-diverse data transfers knowledge of speaker variability to the multispeaker TTS task, enabling high-quality synthesis even for unseen speakers. We also measure the benefit of large, heterogeneous speaker datasets for improved generalization. Moreover, conditioning the synthesizer on randomly sampled speaker embeddings yields novel voices distinct from those in the training set, suggesting that the model has learned robust speaker representations. Finally, our custom Librosa-based layers extract the required acoustic features, improving the efficiency of feature computation, and our custom GMM layers suppress noise in the raw audio by removing noisy features.
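The abstract describes a speaker encoder that produces fixed-length embedding vectors, which can be compared against a stored reference embedding to verify a target speaker. A minimal sketch of such a comparison, assuming cosine similarity and an illustrative acceptance threshold (neither is specified in the abstract):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two fixed-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def same_speaker(emb_enrolled, emb_test, threshold=0.75):
    """Accept the test clip as the enrolled speaker if the similarity
    of the two embeddings clears the threshold.

    The 0.75 threshold is a placeholder for illustration, not a value
    taken from the paper; in practice it would be tuned on held-out
    verification trials.
    """
    return cosine_similarity(emb_enrolled, emb_test) >= threshold
```

In a full pipeline the same embedding that passes verification would then condition the Tacotron 2 synthesizer, so the verification and synthesis stages share a single speaker representation.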

Published

2025-09-25

How to Cite

Nilam Thakkar, Shruti Yagnik, & Tripti Sharma. (2025). Leveraging Speaker Embeddings from Speaker Verification for Controllable Multispeaker Text-to-Speech. Utilitas Mathematica, 122(2), 1269–1300. Retrieved from https://utilitasmathematica.com/index.php/Index/article/view/2858
