Some advanced features #871

NiKola-UE · 2024-05-14T15:42:54Z


Hello to all.

Here I would like to present some of my revised proposals. I tried to send them through the official RHVoice email address, but not only did the message not arrive, my Outlook account was also temporarily blocked by Microsoft afterwards, which really shocked me.

I know you are very busy and have a lot to do (which is good anyway), so anyone who wants to is welcome to join the discussion. Maybe I've gone too far with my requests, but all criticism is welcome.

Because it uses less compression than the Microsoft OneCore voices, RHVoice turns out to be better and more flexible, and it's also nice that it takes up much less space than Nuance's commercial Vocalizer Expressive. The voices for Polish, Czech and Slovak read texts in Croatian and Serbian Latin very well, and they do it better than the voices of the products mentioned. However, I think that all current and future voices should be better at detecting and reading other languages, especially English, which is still the lingua franca.

Although I don't speak that many languages, I tested many of the voices to see how they work. The Nepali voice Dina, which is available on Loader Pages, is still not on the RHVoice site, while the Turkmen voice Dunya works flawlessly as a Windows SAPI 5 application, but has a problem with the NVDA add-on: nothing is heard, i.e. there is no speech. Although I can't get into the reasons for it, I admit it confuses me that the Brazilian Portuguese male voice Henrique is proprietary, available separately on the site rhvoice.com only as an app for Windows, and requires registration where Microsoft email addresses are blocked (and I use exactly such an address). Only that page has a newer version of the NVDA RHVoice driver (2.0.19), while all the others still point to the old one (1.14.1).

Furthermore, as a user with two disabilities who likes to experiment with various types of voices and speech (and let's not forget that screen readers are used by dyslexics and people with speech, motor and hearing impairments in addition to the blind and visually impaired), I would really like other adjustable parameters to be added besides language, rate, pitch and volume, such as language synchronization between voices, modulation, acceleration, inflection, depth, sharpness, quality of speech, roughness, transparency, loudness, shrillness, seriousness, clarity, pauses after phrases, etc. The more settings there are, the better, more customizable and more accessible the synthesizer becomes. Some FOSS NVDA add-ons like Newfon Speech, IBMTTS, WorldVoice and even eSpeak already have some of these interesting features, which can certainly be put to good use.

It would also be good if all these and other options were implemented directly in the driver, where they could be adjusted separately, because NVDA is a FOSS application in which each synthesizer and add-on can have its own "autonomy". The commercial, closed-source JAWS does not allow this, so for it the additional settings could be changed once they are added to the RHVoice application already installed on Windows. In both cases, the list of all available voices and notifications of new versions should be built into the applications and drivers, where voices can be checked, selected and downloaded directly, as is already done in the Android app, although not all voices are there yet. All of this will no doubt appeal to users who want simplicity, as well as to those of us who like complexity and a variety of advanced functions and additional commands...

When it becomes possible, I would like to see a gradual move from the statistical parametric system to a deep learning method using neural networks and artificial intelligence. I know that this is very difficult, demanding and expensive, but there are some good and more easily available FOSS models, bots and networks, and the data could be stored on IPFS and SeaweedFS, which can complement each other for this purpose. The only thing is that it requires a lot of resources and time, a great deal of knowledge and a huge database. Of course, there should also be a page for voluntary donations, because without them it is very difficult, if not impossible, to survive and continue with the good work and further improvements.

Finally, I would like to inform you that I have agreed with your colleague Zvonimir Stanečić, whose work, experience and perseverance amaze me more and more, to send him good and suitable recordings of the texts that are already prepared and finished, which will be used to develop new voices. However, this will go much more slowly and with more difficulty than I thought, because in these conditions it is really hard and not cheap to find a suitable, high-quality studio, equipment, experienced engineers and other collaborators willing to help wholeheartedly and with understanding, so that everything is done carefully and in detail as it should be. I cannot promise anything soon, although of course I have a strong and firm desire to contribute as much as I can, because this is a project for the common good.

If I'm not being too intrusive or arrogant, I'm convinced that it would be a really big deal if Mr. Hrvoje Katić (@hkatic) would get involved as well (Zvonimir certainly knows who I'm talking about). He is older than me, more experienced, knows a lot more and has a better command and knowledge of speech synthesis, screen readers and technology in general. He is also a professional programmer, musician, audio and recording engineer and producer, so he has better working conditions and can contribute much more in a much shorter time than me...

Thanks in advance.

zstanecic · 2024-05-14T17:23:05Z


Hi,
Let me answer all your points inline.

> Hello to all.
>
> Here I would like to present some of my revised proposals. I tried to send them through the official RHVoice email address, but not only did the message not arrive, my Outlook account was also temporarily blocked by Microsoft afterwards, which really shocked me.

Don't use an Outlook e-mail address. Switch to Gmail instead. I am not, and never will be, responsible for someone's mail provider. I know this may sound harsh, but it is true.

> I know you are very busy and have a lot to do (which is good anyway), so anyone who wants to is welcome to join the discussion. Maybe I've gone too far with my requests, but all criticism is welcome.
>
> Because it uses less compression than the Microsoft OneCore voices, RHVoice turns out to be better and more flexible, and it's also nice that it takes up much less space than Nuance's commercial Vocalizer Expressive. The voices for Polish, Czech and Slovak read texts in Croatian and Serbian Latin very well, and they do it better than the voices of the products mentioned. However, I think that all current and future voices should be better at detecting and reading other languages, especially English, which is still the lingua franca.
>
> Although I don't speak that many languages, I tested many of the voices to see how they work.

The Polish voices support foreign characters because Polish people, for example, follow sports events in which Croats and/or Serbs compete, smiles...
Current and new voices already do just what you want, but based on the alphabet. Sorry.

> The Nepali voice Dina, which is available on Loader Pages, is still not on the RHVoice site, while the Turkmen voice Dunya works flawlessly as a Windows SAPI 5 application, but has a problem with the NVDA add-on: nothing is heard, i.e. there is no speech. Although I can't get into the reasons for it, I admit it confuses me that the Brazilian Portuguese male voice Henrique is proprietary, available separately on the site rhvoice.com

What do Dina and Henrique have in common? Why are you mixing unrelated things?

> only as an app for Windows, and requires registration where Microsoft email addresses are blocked (and I use exactly such an address). Only that page has a newer version of the NVDA RHVoice driver (2.0.19), while all the others still point to the old one (1.14.1).
>
> Furthermore, as a user with two disabilities who likes to experiment with various types of voices and speech (and let's not forget that screen readers are used by dyslexics and people with speech, motor and hearing impairments in addition to the blind and visually impaired), I would really like other adjustable parameters to be added besides language, rate, pitch and volume, such as language synchronization between voices, modulation, acceleration, inflection, depth, sharpness, quality of speech, roughness, transparency, loudness, shrillness, seriousness, clarity, pauses after phrases, etc. The more settings there are, the better, more customizable and more accessible the synthesizer becomes. Some FOSS NVDA add-ons like Newfon Speech, IBMTTS, WorldVoice and even eSpeak already have some of these interesting features, which can certainly be put to good use.
>
> It would also be good if all these and other options were implemented directly in the driver, where they could be adjusted separately, because NVDA is a FOSS application in which each synthesizer and add-on can have its own "autonomy". The commercial, closed-source JAWS does not allow this, so for it the additional settings could be changed once they are added to the RHVoice application already installed on Windows. In both cases, the list of all available voices and notifications of new versions should be built into the applications and drivers, where voices can be checked, selected and downloaded directly, as is already done in the Android app, although not all voices are there yet. All of this will no doubt appeal to users who want simplicity, as well as to those of us who like complexity and a variety of advanced functions and additional commands...
>
> When it becomes possible, I would like to see a gradual move from the statistical parametric system to a deep learning method using neural networks and artificial intelligence. I know that this is very difficult, demanding and expensive, but there are some good and more easily available FOSS models, bots and networks, and the data could be stored on IPFS and SeaweedFS, which can complement each other for this purpose. The only thing is that it requires a lot of resources and time, a great deal of knowledge and a huge database.

Forget about DNN and AI here. The priority is the responsiveness of the TTS.

> Of course, there should also be a page for voluntary donations, because without them it is very difficult, if not impossible, to survive and continue with the good work and further improvements.
>
> Finally, I would like to inform you that I have agreed with your colleague Zvonimir Stanečić, whose work, experience and perseverance amaze me more and more, to send him good and suitable recordings of the texts that are already prepared and finished, which will be used to develop new voices. However, this will go much more slowly and with more difficulty than I thought, because in these conditions it is really hard and not cheap to find a suitable, high-quality studio, equipment, experienced engineers and other collaborators willing to help wholeheartedly and with understanding, so that everything is done carefully and in detail as it should be. I cannot promise anything soon, although of course I have a strong and firm desire to contribute as much as I can, because this is a project for the common good.

I appreciate it, and I will be glad to see your recordings, along with suggestions for the language.

> If I'm not being too intrusive or arrogant, I'm convinced that it would be a really big deal if Mr. Hrvoje Katić (@hkatic) would get involved as well (Zvonimir certainly knows who I'm talking about). He is older than me, more experienced, knows a lot more and has a better command and knowledge of speech synthesis, screen readers and technology in general. He is also a professional programmer, musician, audio and recording engineer and producer, so he has better working conditions and can contribute much more in a much shorter time than me...
>
> Thanks in advance.

Hrvoje is busier than me. He works at a professional company. I don't think he has the time.


rmcpantoja · 2024-05-14T17:57:14Z


Hi @NiKola-UE.

> I would really like other adjustable parameters to be added besides language, rate, pitch and volume, such as language synchronization between voices, modulation, acceleration, inflection, depth, sharpness, quality of speech, roughness, transparency, loudness, shrillness, seriousness, clarity, pauses after phrases, etc.

If I'm not wrong, this is possible using Flite, or by modifying some features in the HTS system core to support some custom common voice params.
Fun fact: setting voicing to 0.0 on any voice makes the voice have a perfect whisper effect.
The inflection is possible if the language has special features where the pitch must be strictly modified (pitch features, pitch modifications). A clear example of this is the Portuguese language. The inflection can be changed from the configuration file of a voice in this case.


NiKola-UE (Author) · 2024-05-14T20:39:20Z

Thanks for the replies, I didn't expect them so soon.

Zvonimir, you don't have to quote every message of mine in its entirety; it is enough to refer only to what you consider controversial or problematic.

Right now I can't just change my address because everything is here, and I've already changed a lot of them. But I'm already thinking about some different FOSS alternatives like Mailpile or Forward Email that are more private and interesting.

I understand that RHVoice is still in the early stages of development, but as it progresses, there will definitely be improvements. Of course, West Slavic and South Slavic languages have the same or similar diacritical characters, but, unlike the RHVoice voices, the voices of the commercial synthesizers I mentioned read them worse or cannot read them at all.

As I said, I experimented with various voices because I was interested in how they work, and this discussion is about voices in general, without dwelling on a specific point. You have trained and developed voices for many languages, including Uzbek, which I don't know if you speak, and that is truly commendable.

By the neural network method I mean this, which is also used.

Although I'm not a developer, I'm interested in the whole of something I want to use, not just specific parts (in this particular case voices or languages). To repeat, the different versions of the NVDA driver for RHVoice on the .com site and in all other places are not very clear to me, but well...

As for the recording, I will fulfill my promise at some point, but I ask for your patience and understanding because it may take a very long time, and the more people agree to participate, the more expensive it will be. Some people are even trying to persuade me to give up or postpone, fearing the high prices, but giving up after everything we have agreed on is the last thing on my mind, although I understand their concern, which is justified... At worst, I can record in my temporarily soundproofed room with a borrowed boom mic, but I have no idea what those recordings will sound like.

I'm glad you know hkatic whose TTS sketches literally made me take a more serious interest in this and delve deeper into it, although there is still a lot to learn and understand. I know he is very busy because I read an interview with him on Telegram.hr and his YouTube channels have been inactive for a long time. If he has more time, I suppose he will get involved as much as he can and wants, but of course we will not disturb him; especially not me.

Mateo, I like your observations and suggestions. This is all well and good, but I still think that it should somehow be incorporated directly into the settings themselves, either for the sake of those who are bad with technical details, or for better stability and more levels of RHVoice itself.

Let me ask you: are you involved in any way in developing voices for Italian (since I assume that's your language)? I don't know whether that language is planned, but it would be really good if it is.

1 reply

rmcpantoja May 14, 2024

> Mateo, I like your observations and suggestions. This is all well and good, but I still think that it should somehow be incorporated directly into the settings themselves, either for the sake of those who are bad with technical details, or for better stability and more levels of RHVoice itself.

It may be possible to control these settings through NVDA itself by sending those parameters. The inflection change, however, is possible only in some languages and not globally, so we can't track it unless the HTS system that supports it is modified to work globally (for all voices).
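To illustrate the point above about parameters that exist only for some voices, here is a minimal Python sketch of how a driver could accept only the settings a given voice supports. It does not use the real NVDA or RHVoice APIs: `Voice`, `GLOBAL_PARAMS`, `supported_params` and `apply_settings` are hypothetical names, and the two example voices are made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical: parameters assumed to be available for every voice.
GLOBAL_PARAMS = {"rate", "pitch", "volume"}


@dataclass
class Voice:
    """A made-up voice description, not an RHVoice data structure."""
    name: str
    language: str
    # Extra parameters only this voice supports (e.g. inflection).
    extra_params: set = field(default_factory=set)


def supported_params(voice):
    """Return the sorted names of all settings this voice can expose."""
    return sorted(GLOBAL_PARAMS | voice.extra_params)


def apply_settings(voice, requested):
    """Split requested settings into those the voice supports and those it ignores."""
    allowed = GLOBAL_PARAMS | voice.extra_params
    applied = {name: value for name, value in requested.items() if name in allowed}
    ignored = sorted(set(requested) - allowed)
    return applied, ignored


# Example: one voice with language-specific inflection, one without.
with_inflection = Voice("example-pt", "Portuguese", {"inflection"})
without_inflection = Voice("example-other", "Other")

print(supported_params(with_inflection))
print(apply_settings(without_inflection, {"rate": 50, "inflection": 70}))
```

With something like this, the settings panel would show inflection only for voices that declare it, instead of requiring a global change in the HTS layer.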

> Let me ask you: are you involved in any way in developing voices for Italian (since I assume that's your language)? I don't know whether that language is planned, but it would be really good if it is.

No, sorry. I'm currently developing TTS speech corpora for Spanish designed for screen reader use. A Castilian Spanish male speaker is almost done. :)

hkatic · 2024-05-15T09:37:12Z


Hello,

Let me clarify:

1. Although I have a job, free time is not a problem for me. The problem is that I'm not a speech synthesis expert, as someone suggested, nor am I an expert in C++ coding. I am a Python and Android developer.
2. Since I have more than solid studio equipment, I can possibly contribute by lending my voice, but nothing more than that, I'm afraid. In general, I am not currently willing to participate in large projects, but I am willing to give feedback where I feel competent to do so.
3. And finally, just a small remark: since we are living in a time when AI voices are most certainly taking over (just look at ElevenLabs and OpenAI), I'm afraid that classic speech synthesis will soon become a thing of the past, unfortunately, especially considering that an AI voice speaks whatever language you want with no extra effort. Nevertheless, I truly support RHVoice as a compact and free HQ speech synthesizer.

Best regards,
HKatic


NiKola-UE (Author) · 2024-05-15T14:51:03Z

Hi, Hrvoje. Thank you for reaching out - I'm the one who called you out after all.

I recently sent you an email, but I don't know if it reached you (it's also possible that I made a mistake when typing the address). Apart from what we are talking about here, I wanted to add to one of your videos, in which you compare the voices of the announcers with the speech synthesizers built from them, and I referred to some voices for AlfaNum's products, but I won't go into that here. Unlike you, I'm just an enthusiast, and I want to do a lot of things, but I don't know how. The knowledge you have can certainly be applied to RHVoice. I agree with everything you said, which is why I mentioned the neural network method. Unfortunately, I don't know how voices are developed and trained, so if you are interested, discuss the details with uncle Zvonimir, whom you certainly know well.

Greetings.


NiKola-UE (Author) · 2024-05-15T16:04:27Z

Mateo, I don't know Italian, but based on your name and surname I thought you were Italian, so I apologize if I was wrong. Anyway, it's nice that we are getting the first Spanish voice for RHVoice, and I applaud your work and effort. I hope you won't run into problems like those of the aforementioned (not open source) Portuguese voice Henrique, for which different rules apply and whose authors need funds.

The good thing about FOSS speech synths like RHVoice is that they can be changed and adjusted again and again, so we'll see how it progresses. But in contrast to the synthesizers I mentioned, with RHVoice each language has its own voices, so the settings of each of them cannot be exactly the same and equally suitable for all languages, if you know what I'm trying to say.

Regarding what @hkatic said, artificial intelligence is indeed progressing faster and faster, but it still cannot be fully integrated with screen readers, and it is unlikely that someone will just release something that is cross-platform, open source and accessible to everyone.


NiKola-UE (Author) · 2024-05-16T14:32:36Z

Here are two more of my ideas: demo and browser extensions.

The demo could be on the website or a portable application (online or offline). It would serve for quick and simple testing, and in appearance and purpose it would not differ from similar demos. The text to be spoken is entered in the first field, followed by the options for choosing the language, voice, rate, pitch, volume and all those parameters I mentioned, then the buttons "Listen to entered text" and "Export as audio", and that's all. This is a good way to test the voices before downloading, and I'll also get scolded less by Dr. Zvonimir for downloading voices for languages I don't speak...

Browser extensions would read entire pages or selected text on them, and would be a great alternative to the likes of "Read Aloud", which is not easy to use and in which not all voices are equally accessible. Even sighted people would use such an extension because it is simple and easy.

1 reply

NiKola-UE (Author) May 17, 2024

I'm sorry, I completely forgot about this page, but the demo is only available in Russian, it doesn't have all the voices and other parameters, and only two formats are available: MP3 and OGG Vorbis. I have no doubt that it can become even better; it just takes a little more patience.
