Replies: 7 comments 2 replies
-
Hi,
-
Hi @NiKola-UE.
If I'm not wrong, this is possible using Flite, or by modifying some features in the HTS system core to support some custom common voice parameters.
-
Thanks for the replies; I didn't expect them so soon. Zvonimir, you don't have to quote every message of mine in its entirety; it is enough to refer only to what you consider controversial or problematic. Right now I can't just change my address because everything is tied to it, and I've already changed a lot of them. But I'm already considering FOSS alternatives such as Mailpile or Forward Email, which are more private and interesting.

I understand that RHVoice is still in the early stages of development, but as it progresses, there will definitely be improvements. Of course, West Slavic and South Slavic languages have the same or similar diacritical characters, but, unlike the RHVoice voices, the voices of the commercial synthesizers I mentioned read them worse or cannot read them at all. As I said, I experimented with various voices because I was interested in how they worked, and I opened this discussion to talk about voices in general, without dwelling on a specific point. You have trained and developed voices for many languages, including Uzbek, which I don't know whether you speak, and that is truly commendable.

By the method of neural networks I mean

Although I'm not a developer, I'm interested in the whole of something I want to use, not just specific parts (in this particular case voices or languages). To repeat, the different versions of the NVDA driver for RHVoice on the .com site and in all other places are not very clear to me, but well...

As for the recording, I will fulfill my promise at some point, but I ask for your patience and understanding because it may take a very long time, and the more people agree to participate, the more expensive it will be. Some even try to persuade me to give up or postpone, fearing the high prices, but giving up after everything we have agreed on is the last thing on my mind, although I understand their concern, which is justified...
At worst, I can record in my temporarily soundproofed room with a borrowed boom mic, but I have no idea what those recordings will sound like. I'm glad you know hkatic, whose TTS sketches literally made me take a more serious interest in this and delve deeper into it, although there is still a lot to learn and understand. I know he is very busy because I read an interview with him on Telegram.hr, and his YouTube channels have been inactive for a long time. If he has more time, I suppose he will get involved as much as he can and wants to, but of course we will not disturb him; especially not me.

Mateo, I like your observations and suggestions. This is all well and good, but I still think that it should somehow be incorporated directly into the settings themselves, either for the sake of those who are bad with technical details, or for better stability and more levels in RHVoice itself. Let me ask you: are you involved in any way in developing voices for Italian (since I assume that's your language)? I don't know how far along that language is in the plans, but it would be really good if it is.
-
Hello,
Let me clarify:
1. Although I have a job, free time is not a problem for me. The problem
is that I'm not a speech synthesis expert, as someone suggested, nor am I
an expert in C++ coding. I am a Python and Android developer.
2. Since I have more than solid studio equipment, I can possibly contribute
by lending my voice, but nothing more than that, I'm afraid. In general, I
am not currently willing to participate in large projects, but I am willing
to give feedback where I feel competent to give feedback.
3. And finally, just a small remark: Since we are living in a period of
time when AI voices are most certainly taking over (just look at ElevenLabs
and OpenAI), I'm afraid that classic speech synthesis will soon become a
thing of the past, unfortunately. Especially if we consider the fact that
the AI voice speaks in whatever language you want with no extra effort. But
nevertheless, I truly support RHVoice as a compact and free high-quality
speech synthesizer.
Best regards,
HKatic
…On Tue, May 14, 2024 at 7:23 PM Zvonimir Stanečić ***@***.***> wrote:
Hi,
Let me answer all your messages inline here.
Hello to all.
Here I would present some of my revised proposals that I tried to send
through the official RHVoice email address, but the message not only did
not arrive, but my Outlook account was temporarily blocked directly from
Microsoft after this, which really shocked me.
Don't use an Outlook email address. Switch to Gmail instead. I am not, and
never will be, responsible for someone's mail provider. I know that it may
sound harsh, but it is true.
I know you are too busy and have a lot to do (which is good anyway), so
anyone who wants to is welcome to join the discussion. Maybe I've gone too
far with my requests, but all criticism is welcome.
Because it uses less compression than the Microsoft One Core Voices,
RHVoice turns out to be better and more flexible, and it's also nice that
it takes up much less space than the commercial Nuance's Vocalizer
Expressive. Voices for Polish, Czech and Slovak read texts in Croatian and
Serbian Latin very well, and they also do it better than the voices of the
mentioned products. However, I think that all current and future voices
should better discover and recognize other languages, especially English,
which is still the lingua franca.
Although I don't speak that many languages, I tested many of the voices
to see how they work.
Polish voices support foreign characters because Polish people, for
example, follow sporting events in which Croats and/or Serbs compete,
smiles...
Current and new voices already do what you want, but based on the
alphabet. Sorry.
Thus, the Nepali voice Dina that is available on Loader Pages is still not
on the RHVoice site, while the Turkmen voice Dunya works flawlessly as the
Windows SAPI 5 application, but there are some problems with the NVDA
add-on because nothing is heard, i.e. there is no speech. Although I can't get into
the reasons for that, I admit that it confuses me that the Brazilian
Portuguese male voice Henrique is proprietary, available separately on the
site rhvoice.com,
What do Dina and Henrique have in common? Why are you mixing unrelated
things?
only as an app for Windows, and requires registration where Microsoft
e-addresses are blocked (and I use one just like that). Only that page has
a newer version of the NVDA RHVoice driver (2.0.19), while all others still
point to the old one (1.14.1).
Furthermore, as a user with two disabilities who likes experimenting with
various kinds of voices and speech (and let's not forget that screen
readers are used by dyslexics and people with speech, motor and hearing
impairments in addition to the blind and visually impaired), I would
really like other adjustable parameters to be added besides language,
rate, pitch and volume, such as language synchronization between voices,
modulation, acceleration, inflection, depth, sharpness, quality of speech,
roughness, transparency, loudness, shrillness, seriousness, clarity, pauses
after phrases, etc. The more settings of all kinds, the better, more
customizable and more accessible it is. Some FOSS NVDA add-ons, such as
Newfon Speech, IBMTTS, WorldVoice, ewen ESpeak and others, already have
some of these interesting features, which can certainly be put to good use.
It would also be good if all these and other options were implemented
directly in the driver, where they could be adjusted separately, because
NVDA is a FOSS application in which each synthesizer and add-on can have
its own "autonomy", while the commercial, closed-source JAWS does not allow
this, so with JAWS the additional settings could be changed once they are
added to the RHVoice application that is already installed on Windows. In both cases, all
available voices and notifications of the new versions should be
implemented in applications and drivers where they can be checked/selected
and downloaded directly, as is already done in the Android app, although
not all voices are there yet. All of this will no doubt appeal to users who
want simplicity, as well as to those of us who like complexity and a
variety of advanced functions and additional commands...
When it is possible, I would like to slowly move from a statistical
parametric system to a deep learning method using neural networks and
artificial intelligence. I know that it is very difficult, demanding and
expensive, but still there are some good and more easily available FOSS
models, bots and networks, and the data could be stored on IPFS and
SeaweedFS, which can be used together in a complementary way for this
purpose. The only catch is that it requires a lot of resources, time,
extensive knowledge and a huge database.
Forget about DNN and AI here. The priority is the responsiveness of the TTS.
Of course, there should also be a page for voluntary donations, because
without that it is very difficult, if not impossible, to survive and
continue with good work and further improvements.
Finally, I would like to inform you that I have agreed with your colleague
Zvonimir Stanečić, whose work, experience and perseverance amaze me more
and more, to send him good and appropriate recordings of the texts that are
already prepared and finished, which will be used to develop new voices.
However, it will go much more slowly and with more difficulty than I
thought, because in these conditions it is really hard and not so cheap to
find a suitable, high-quality studio, equipment, experienced engineers and
other collaborators willing to help wholeheartedly and with understanding,
and to do everything as carefully and thoroughly as it should be done. I
cannot promise anything soon, although of course I have a strong and firm
desire to contribute as much as I can, because this is a project for the
common good.
I appreciate it, and I will be glad to receive your recordings, along with
suggestions for the language.
If I'm not being too intrusive or arrogant, I'm convinced that it
would be a really big deal if Mr. Hrvoje Katić
(@hkatic <https://github.com/hkatic>)
(Zvonimir certainly knows who I'm talking about), who is older than me,
more experienced, knows a lot more and has a better command and knowledge
of speech synthesis, screen readers and technology in general, would get
involved as well. He is also a professional programmer, musician, audio and
recording engineer and producer, so he has better working conditions and
can do and contribute much more in a much shorter time than me...
Thanks in advance.
Hrvoje is busier than me. He works at a professional company. I
don't think he has the time.
-
Hi, Hrvoje. Thank you for reaching out - I'm the one who brought you into this, after all. I recently sent you an email, but I don't know if it reached you (it's also possible that I made a mistake when typing the address). Apart from what we are talking about here, I wanted to add to one of your videos in which you compare the voices of the announcers with the speech synthesizers derived from them, and I referred to some voices for AlfaNum's products, but I would rather not talk about that here. Unlike you, I'm just an enthusiast and I want to do a lot of things, but I don't know how. The knowledge you have can certainly be applied to RHVoice. I agree with everything you said, which is why I mentioned the neural network method. Unfortunately, I don't know how voices are developed and trained, so if you are interested, discuss the details with uncle Zvonimir, whom you certainly know well. Greetings.
-
Mateo, I don't know Italian, but based on your name and surname, I thought you were Italian; I apologize if I was wrong. Anyway, it's nice that we are getting the first Spanish voice for RHVoice, and I applaud your work and effort. I hope you won't run into problems like those of the aforementioned (non-open-source) Portuguese voice Henrique, for which different rules apply and whose authors need funds. The good thing about FOSS speech synthesizers like RHVoice is that they can be changed and adjusted again and again, so we'll see how it progresses. But in contrast to the synthesizers I mentioned, in RHVoice each language has its own voices, so the settings of each of them cannot be exactly the same and equally suitable for all languages, if you know what I'm trying to say. Regarding what @hkatic said, artificial intelligence is indeed progressing faster and faster, but it still cannot be fully integrated with screen readers, and it is unlikely that someone will just release something that is cross-platform, open source and accessible to everyone.
-
Here are two more of my ideas: a demo and browser extensions. The demo could be on the website or a portable application (online or offline) and would serve for quick and simple testing; in appearance and purpose it would not differ from similar ones. The text to be spoken is entered in the first field, followed by the options for choosing the language, voice, rate, pitch, volume and all the parameters I mentioned, then "Listen entered text" and "Export as audio", and that's all. This is a good way to test the voices before downloading, and I'll also be scolded less by Dr. Zvonimir for downloading voices for languages I don't speak... Browser extensions would read entire pages or selected text on them, and would be a great alternative to the likes of "Read Aloud", which is not easy to use and in which not all voices are equally accessible. Even sighted people would use it because it is simple and easy.
-
Hello to all.
Here I would like to present some of my revised proposals, which I tried to send to the official RHVoice email address; not only did the message not arrive, but my Outlook account was temporarily blocked by Microsoft afterwards, which really shocked me.
I know you are too busy and have a lot to do (which is good anyway), so anyone who wants to is welcome to join the discussion. Maybe I've gone too far with my requests, but all criticism is welcome.
Because it uses less compression than the Microsoft OneCore voices, RHVoice turns out to be better and more flexible, and it's also nice that it takes up much less space than the commercial Nuance Vocalizer Expressive. Voices for Polish, Czech and Slovak read texts in Croatian and Serbian Latin very well, and they also do it better than the voices of the mentioned products. However, I think that all current and future voices should better detect and recognize other languages, especially English, which is still the lingua franca.
Although I don't speak that many languages, I tested many of the voices to see how they work. The Nepali voice Dina that is available on Loader Pages is still not on the RHVoice site, while the Turkmen voice Dunya works flawlessly as the Windows SAPI 5 application, but there are some problems with the NVDA add-on because nothing is heard, i.e. there is no speech. Although I can't get into the reasons for that, I admit that it confuses me that the Brazilian Portuguese male voice Henrique is proprietary, available separately on the site rhvoice.com only as an app for Windows, and requires registration where Microsoft email addresses are blocked (and I use exactly such an address). Only that page has a newer version of the NVDA RHVoice driver (2.0.19), while all others still point to the old one (1.14.1).
Furthermore, as a user with two disabilities who likes experimenting with various kinds of voices and speech (and let's not forget that screen readers are used by dyslexics and people with speech, motor and hearing impairments in addition to the blind and visually impaired), I would really like other adjustable parameters to be added besides language, rate, pitch and volume, such as language synchronization between voices, modulation, acceleration, inflection, depth, sharpness, quality of speech, roughness, transparency, loudness, shrillness, seriousness, clarity, pauses after phrases, etc. The more settings of all kinds, the better, more customizable and more accessible it is. Some FOSS NVDA add-ons, such as Newfon Speech, IBMTTS, WorldVoice, ewen ESpeak and others, already have some of these interesting features, which can certainly be put to good use.
It would also be good if all these and other options were implemented directly in the driver, where they could be adjusted separately, because NVDA is a FOSS application in which each synthesizer and add-on can have its own "autonomy", while the commercial, closed-source JAWS does not allow this, so with JAWS the additional settings could be changed once they are added to the RHVoice application that is already installed on Windows. In both cases, all available voices and notifications of new versions should be built into the applications and drivers, where they can be checked, selected and downloaded directly, as is already done in the Android app, although not all voices are there yet. All of this will no doubt appeal to users who want simplicity, as well as to those of us who like complexity and a variety of advanced functions and additional commands...
When it becomes possible, I would like RHVoice to slowly move from a statistical parametric system to a deep learning method using neural networks and artificial intelligence. I know that it is very difficult, demanding and expensive, but there are some good and more easily available FOSS models, bots and networks, and the data could be stored on IPFS and SeaweedFS, which can be used together in a complementary way for this purpose. The only catch is that it requires a lot of resources, time, extensive knowledge and a huge database. Of course, there should also be a page for voluntary donations, because without that it is very difficult, if not impossible, to survive and continue with good work and further improvements.
Finally, I would like to inform you that I have agreed with your colleague Zvonimir Stanečić, whose work, experience and perseverance amaze me more and more, to send him good and appropriate recordings of the texts that are already prepared and finished, which will be used to develop new voices. However, it will go much more slowly and with more difficulty than I thought, because in these conditions it is really hard and not so cheap to find a suitable, high-quality studio, equipment, experienced engineers and other collaborators willing to help wholeheartedly and with understanding, and to do everything as carefully and thoroughly as it should be done. I cannot promise anything soon, although of course I have a strong and firm desire to contribute as much as I can, because this is a project for the common good.
If I'm not being too intrusive or arrogant, I'm convinced that it would be a really big deal if Mr. Hrvoje Katić (@hkatic) (Zvonimir certainly knows who I'm talking about), who is older than me, more experienced, knows a lot more and has a better command and knowledge of speech synthesis, screen readers and technology in general, would get involved as well. He is also a professional programmer, musician, audio and recording engineer and producer, so he has better working conditions and can do and contribute much more in a much shorter time than me...
Thanks in advance.