Handle homophones (similar sounding words) #1493
Comments
You can't do that. Whatever the STT engine understands (whichever STT engine you use), that's what gets passed on in the pipeline to the intent recognition engine, not the other way around. That's only fixable in one of two ways:
I am not sure the second option is available in the underpinnings of the Nabu Casa STT engine, but maybe it's something @synesthesiam wants to take a look at.
What is the use case here? I mean... if you don't say the words and the recognized words get passed on to the intent recognition service (e.g. |
I'm aware; I'm not suggesting an intent -> STT engine flow. Put another way: confidence scores, if they exist, are probably the better way, I agree. But implementing that solution is probably more complex, further in the future, and more STT-engine dependent than a simple rewrite of the displayed text.
Yes, the intent has already happened, and the action too. Note: I'm convinced that in the future we will say fewer words (laziness) and yet have a full sentence written (more satisfying); this was a way to have that without much additional work. |
Written where? In the Assist dialog box? That gets displayed before any intent matching is done. Also, the plan is to only recognize grammatically correct sentences in order to train a recognition model that can "catch" more sentences than just those which were manually defined, so recognizing |
My two cents on this. I recommend doing something like this:
This accomplishes two things:
|
I handled this kind of issue in Rhasspy with a ":" operator in sentence templates, so "mais:mets" would match "mais" but output "mets". There were two output sentences too, one with the literally recognized text (mais) and one with the transformed text (mets). I could see adding this to hassil, but we need to make a clear case for it. I don't want to mask STT errors, but we also want to be robust to them. |
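To make the "match one word, output another" idea concrete, here is a minimal Python sketch. It is not Rhasspy's or hassil's implementation: the `heard:output` rule format is borrowed from the comment above, the function names are hypothetical, and the substitution is applied to every word of the sentence rather than at a specific position in a template, which is a simplification.

```python
def parse_rule(rule: str) -> tuple:
    """Split a 'heard:output' rule; a bare word maps to itself."""
    heard, sep, output = rule.partition(":")
    return heard, (output if sep else heard)


def transform(recognized: str, rules: list) -> tuple:
    """Return (raw_text, transformed_text) for a recognized sentence.

    Both outputs are kept, so the literal STT result is never hidden.
    """
    mapping = dict(parse_rule(r) for r in rules)
    words = recognized.split()
    transformed = [mapping.get(w, w) for w in words]
    return " ".join(words), " ".join(transformed)


raw, fixed = transform("mais le volume à 10 %", ["mais:mets"])
print(raw)    # mais le volume à 10 %
print(fixed)  # mets le volume à 10 %
```

Returning both strings mirrors the two output sentences described above: one can be logged to spot STT errors, the other shown to the user.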
Another example of the same issue, this time in German #1373 (comment) |
For information, another similar issue concerns numbers (1 or one). Edit: I finally found a better workaround using only one sentence, based on the default value defined by the slot. |
@Kelesis that specific problem regarding numbers has been addressed and will be included in the following releases |
How hard would it be to match intents not by their text but by their SOUNDEX code, or an equivalent algorithm?
What do you think? |
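For reference, a small self-contained sketch of classic American Soundex in Python follows. It only illustrates what "matching by sound code" would compare; nothing here is part of hassil or Home Assistant. Soundex was designed for English names, so the French pair from this issue does not collapse to the same code.

```python
# Consonant classes of American Soundex; vowels and y carry no code.
CODES = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of a word ('' if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate repeated codes
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code is kept
    return (result + "000")[:4]


print(soundex("their"), soundex("there"))  # T600 T600
print(soundex("mets"), soundex("mais"))    # M320 M200
```

The English homophones share a code, while "mets" and "mais" do not, so a phonetic key would have to be language-aware to help here.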
This is not wanted since it creates exponential growth in the number of potential sentences. In French (and probably other languages) there are multiple spellings for the same sound (like "Ouvrer / Ouvrez / Ouvré / Ouvrés / Ouvrée / Ouvrait / Ouvraient / ..."), so a simple sentence such as "Ouvrez les volets roulants" could be written as "Ouvrer les volets roulants" (which is perfectly correct grammatically and semantically) or even "Ouvre haie lait veau lé roue lent" (incorrect grammatically and semantically). The STT engine can't decide between the former and the latter since there's no context, so it is perfectly entitled to choose either one (and it's 100% correct in doing so). So it's wrong to blame STT here. In YAML, you can't list every possibility, and it would be impossible to match against them even if you did. |
That sounds a lot like this suggestion, doesn't it? |
Exactly like this. Thanks for linking it! |
The discussion @tetele linked has more info, but in short the plan is to have HA attempt matches first without and then with fuzzy recognition enabled in hassil. This is happening in text, though, so it's not as ideal as using something like SOUNDEX. However, we need to support many more languages than just English. |
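The "strict first, then fuzzy" order can be sketched in a few lines of Python. This is only an illustration under assumptions: hassil's actual fuzzy matching is not described in this thread, so the standard library's `difflib` stands in for it, and the sentence list, function name and `cutoff` value are hypothetical.

```python
import difflib

# Hypothetical list of fully expanded sentences known to the matcher.
SENTENCES = ["mets le volume à 10 %", "ouvrez les volets roulants"]


def match(text: str, sentences: list, cutoff: float = 0.8) -> tuple:
    """Return (matched_sentence, was_fuzzy); (None, False) if nothing fits."""
    # Pass 1: exact text match, no tolerance for STT errors.
    if text in sentences:
        return text, False
    # Pass 2: text-level similarity, used only when pass 1 fails.
    close = difflib.get_close_matches(text, sentences, n=1, cutoff=cutoff)
    if close:
        return close[0], True
    return None, False


print(match("mets le volume à 10 %", SENTENCES))  # exact hit
print(match("mais le volume à 10 %", SENTENCES))  # fuzzy hit
print(match("bonjour", SENTENCES))                # no match
```

Because the comparison works on text rather than on sound, it tolerates "mais" for "mets" only because the spellings are close, which is the limitation noted above.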
Hey @synesthesiam, please have a look at my tinkering here, and more specifically the tests (run with I've used Epitran to support many languages (close to a hundred) for G2P and implemented fuzzy intent matching on top, based on an IPA mapping. The intents are built using a tree where you have either a simple Basic node (simple text that must be present), an Optional node (text that can be missing), a greedy Parametric node (a value, like I think it should more or less match the HA intent types that exist. Yet, it's able to match sentences like: |
I have a custom sentence in French like this:
[mets] [le] volume [à|a] {volume} [pourcent|%]
(it means "put the volume to x percent")
Often, Nabu Casa STT understands "mais" instead of "mets", and the sentence does not match. To make it work, I changed it to:
[mets|mais] [le] volume [à|a] {volume} [pourcent|%]
In [mets|mais], the first word means "put" and the second means "but". Both are similar-sounding words; confusing them is a dyslexia symptom.
We (probably) can't fix what Nabu Casa STT says it understands, but maybe we can change the sentence displayed in the prompt?
I thought of something like this:
[mets!|mais] [le] volume [à|a] {volume} [pourcent|%]
with a ! after the word that is the correct one. That way, even if Nabu Casa STT understands
mais volume à 10 %
it will write it as
mets volume à 10 %
This could also work in expansion rules.
To go even further, we could even imagine displaying
mets le volume à 10 %
even if
mais volume 10
was understood/said, with a sentence written like this:
[mets!|mais]! [le]! volume [à|a]! {volume} [pourcent|%!]!
Notice the ! after the ] as a way to allow the word not to be said while still keeping it in the displayed sentence. When using the []! syntax, if there is no ! inside the [], it will keep the first option.
Side note: I do feel like this is more of a https://github.com/home-assistant/hassil issue, but so are most of the others in this project.
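To show how the proposed `!` markers could behave, here is a rough Python sketch. The syntax is only the proposal from this issue and is not supported by hassil; the function name, the regular expression and the handling of edge cases are my own assumptions. In particular, I assume a group written `[...]!` always displays its marked option (or its first option if none is marked), and that a slot consumes exactly one word.

```python
import re
from typing import Optional

# One token is: an optional group (maybe followed by !), a {slot}, or a word.
TOKEN = re.compile(r"\[([^\]]*)\](!?)|\{(\w+)\}|(\S+)")


def display_text(template: str, heard: str) -> Optional[str]:
    """Return the sentence to display, or None if `heard` does not match."""
    words = heard.split()
    pos = 0
    shown = []
    for m in TOKEN.finditer(template):
        alts_raw, forced, slot, literal = m.groups()
        nxt = words[pos] if pos < len(words) else None
        if alts_raw is not None:  # optional group such as [mets!|mais]
            raw = alts_raw.split("|")
            alts = [a.rstrip("!") for a in raw]
            marked = [a.rstrip("!") for a in raw if a.endswith("!")]
            said = nxt in alts
            if said:
                pos += 1
            if forced:  # "]!": always displayed, even when not said
                shown.append(marked[0] if marked else alts[0])
            elif said:  # displayed only when said; prefer the marked word
                shown.append(marked[0] if marked else nxt)
        elif slot is not None:  # {volume}: assumed to be a single word
            if nxt is None:
                return None
            shown.append(nxt)
            pos += 1
        else:  # required literal word
            if nxt != literal:
                return None
            shown.append(literal)
            pos += 1
    return " ".join(shown) if pos == len(words) else None


simple = "[mets!|mais] [le] volume [à|a] {volume} [pourcent|%]"
full = "[mets!|mais]! [le]! volume [à|a]! {volume} [pourcent|%!]!"
print(display_text(simple, "mais volume à 10 %"))  # mets volume à 10 %
print(display_text(full, "mais volume 10"))        # mets le volume à 10 %
```

The matching itself stays purely textual; only the displayed string changes, which is what the proposal asks for.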