Add languages icons #351

Open
irina060981 opened this issue Apr 29, 2021 · 26 comments
Labels: bug, enhancement

Comments

@irina060981
Member

From the online meeting with @abrasax:

  • Add a "no language detected" icon, shown when the service did not detect the language or when there are too few characters for detection (fewer than 5).
  • Add the language code above the text box.
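
A minimal sketch of the icon decision described above, assuming the detection result arrives as a language code or as null. All names here are hypothetical; only the 5-character minimum comes from the meeting notes.

```typescript
// Hypothetical names; only the 5-character minimum comes from the notes above.
const MIN_CHARS_FOR_DETECTION = 5;

type LanguageIconState =
  | { kind: "detected"; langCode: string }
  | { kind: "not-detected"; reason: "too-short" | "no-result" };

// detectedCode is whatever the detection step produced, or null if it produced nothing.
function languageIconState(text: string, detectedCode: string | null): LanguageIconState {
  // Trim first so that surrounding whitespace does not count toward the minimum.
  if (text.trim().length < MIN_CHARS_FOR_DETECTION) {
    return { kind: "not-detected", reason: "too-short" };
  }
  if (detectedCode === null) {
    return { kind: "not-detected", reason: "no-result" };
  }
  return { kind: "detected", langCode: detectedCode };
}
```

In the "detected" case the same langCode could also be shown above the text box.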

@irina060981
Member Author

image

@monzug

monzug commented Apr 29, 2021

Need to retest as Harry and I saw this icon after uploading files with the right languages and also after we said yes to all in the language pop-up.

@abrasax

abrasax commented Apr 29, 2021 via email

@monzug

monzug commented Apr 30, 2021

Tested in Alpheios Translation Alignment editor 1.3.3 (build development.20210430.74).
I uploaded a file with English text and eng in the filename (the set language is English). Then, in the translation, I entered Italian text, and the set language is still English. I expected the language icon on the Translation text (the words are Italian but the language is set to English) and not on the Original, where the words are English and the language is set to English. Surprisingly, I got the opposite: the language icon appears on the Original (English) and there is no icon on the Translation (Italian words with English as the set language). See attachment.

Screen Shot 2021-04-30 at 2 52 27 PM

@monzug monzug assigned irina060981 and unassigned monzug Apr 30, 2021
@monzug monzug added bug Something isn't working and removed waiting-verification labels Apr 30, 2021
@irina060981
Member Author

Monica, I was not able to reproduce your result.
I recorded my attempts: https://www.dropbox.com/s/cxnk3ohxgajsznl/Monica-351-lang%20detection.wmv?dl=0

Could you point out what you are doing differently to get the failing result?

@monzug

monzug commented May 3, 2021

See below a few examples of the language icon failing, tested in Alpheios Translation Alignment editor 1.3.3 (build development.20210503268)

  1. In both Original and Translation it does not recognize that the language is Latin, and I do not get the "no language detected" icon.
    Screen Shot 2021-05-03 at 3 46 59 PM

  2. Here the Translation text is missing relevant info (it is not ready for alignment, so I have to start the Translation over).
    Screen Shot 2021-05-03 at 8 33 57 AM

  3. Here the languages in both Original and Translation are correct, but for some reason I still get the "no language detected" icon.
    Screen Shot 2021-05-03 at 8 31 24 AM

  4. Here the Translation language is set to English but it should be Italian, and I still do not get the "no language detected" icon.
    Screen Shot 2021-05-03 at 8 29 54 AM

@irina060981
Member Author

Monica, could you describe the exact clicks for these cases?
I couldn't reproduce your failing examples, as I showed in the video.

@monzug

monzug commented May 4, 2021

One more: the same text shows as French in the Original and as English in the Translation.
Screen Shot 2021-05-04 at 2 01 35 PM

@irina060981
Member Author

I could not reproduce it in the updated application.
Monica, could you recheck?

@monzug

monzug commented May 20, 2021

I entered text in both text boxes as per example 1), and the language was Eng, EPO or Dan respectively. I also tried amo, amas, amat, amamus, amatis, amant, but the language was set to French, not Latin.
I do not think it works properly with text entered directly in the text box. It works better with uploaded files.

  1. not reproducible

  2. not reproducible

  3. Please find below the file to upload. This is the case in which the same Latin text has the language detected icon in the Original but not in the Translation. Why?
    20-05_10-53-alignment-lat-lat.zip

lat-to-be-checked

@monzug

monzug commented May 21, 2021

One more case with the language icon missing. See attachment.

Screen Shot 2021-05-21 at 10 15 49 AM

This was referenced May 25, 2021
@irina060981
Member Author

Will check later

@irina060981 irina060981 reopened this May 28, 2021
@monzug

monzug commented May 28, 2021

to repro the missing language icons as per previous comment:

  1. generate an error by uploading an unsupported file type (as json file from choose file)
  2. enter text or copy text in Original and Translations boxes.
  3. you will notice that the Original box is missing the Language icon and Text or Tei icon

@alpheios-project alpheios-project deleted a comment from abrasax May 28, 2021
@alpheios-project alpheios-project deleted a comment from balmas Jun 1, 2021
@monzug

monzug commented Jun 1, 2021

Arabic and Greek texts have both the no language is detected icon (this is a full json file that has been uploaded after passing a prior check on languages and saved with grc and ara in the file name)

Screen Shot 2021-06-01 at 10 04 09 AM

see zip file below
01-06_10-05-full-alignment-ara-grc.json.zip

@monzug

monzug commented Jun 2, 2021

In the following scenario, the icon is missing in Original but it should be there as language is Italian

Screen Shot 2021-06-02 at 2 40 59 PM

@monzug

monzug commented Jun 13, 2021

from @balmas 's email:
3) Autodetecting Ge'ez text -- I think we don't do this well right now. When I enter the Ge'ez text, I get it detected as Amharic. Pietro was (maybe) getting Serbian. I am not sure what library Irina ended up using for the language detection, but I think it would be nice if we could get this to work better. Maybe we could ask Pietro for the Unicode range of the Ge'ez (fidal) text and do this a bit better?
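
For reference, a range check of the kind suggested here might look like the sketch below. The block boundaries are the Ethiopic blocks from the Unicode standard; the function names are hypothetical. Note the limitation: Ge'ez and Amharic are written in the same script, so a range check can only tell that a text is in Ethiopic script, not which of the two languages it is.

```typescript
// Unicode blocks for Ethiopic script (shared by Ge'ez, Amharic, Tigrinya and others).
const ETHIOPIC_RANGES: ReadonlyArray<[number, number]> = [
  [0x1200, 0x137f], // Ethiopic
  [0x1380, 0x139f], // Ethiopic Supplement
  [0x2d80, 0x2ddf], // Ethiopic Extended
  [0xab00, 0xab2f], // Ethiopic Extended-A
];

function isEthiopicCodeUnit(code: number): boolean {
  return ETHIOPIC_RANGES.some(([lo, hi]) => code >= lo && code <= hi);
}

// Share of non-whitespace characters that fall in the Ethiopic blocks (0 for empty input).
// All four blocks are in the Basic Multilingual Plane, so UTF-16 code units are sufficient.
function ethiopicRatio(text: string): number {
  let total = 0;
  let ethiopic = 0;
  for (let i = 0; i < text.length; i++) {
    if (/\s/.test(text.charAt(i))) continue;
    total++;
    if (isEthiopicCodeUnit(text.charCodeAt(i))) ethiopic++;
  }
  return total === 0 ? 0 : ethiopic / total;
}
```

A high ratio could be used to flag the text for a manual language choice (gez or another Ethiopic-script language) rather than to pick the language automatically.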

@irina060981
Member Author

@balmas, @monzug , @abrasax

  1. Autodetecting Ge'ez text -- I think we don't do this well right now. When I enter the Ge'ez text, I get it detected as Amharic. Pietro was (maybe) getting Serbian. I am not sure what library Irina ended up using for the language detection, but I think it would be nice if we could get this to work better. Maybe we could ask Pietro for the Unicode range of the Ge'ez (fidal) text and do this a bit better?

I think it is not as easy as you describe in the comment.
We have the following steps:

  • Some text is pasted in or uploaded from a text file.
  • The application checks whether it is TEI; if not, it continues to the next step.
  • It sends a request to the remote service to detect the language (max 200 characters).
  • It retrieves the detection result, which is a list of candidate languages.
  • It chooses the most reliable candidate and shows it.
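
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the application's code: the Detection shape is inferred from the fields named in this thread (isReliable, confidence), the request to the service is passed in as a plain function (the real request would be asynchronous), and all function names are hypothetical.

```typescript
// Shape assumed from the fields mentioned in this thread.
interface Detection {
  language: string;
  isReliable: boolean;
  confidence: number;
}

const MAX_DETECTION_CHARS = 200; // limit stated in the steps above

// A reliable candidate beats an unreliable one; otherwise higher confidence wins.
function isBetter(a: Detection, b: Detection): boolean {
  if (a.isReliable !== b.isReliable) return a.isReliable;
  return a.confidence > b.confidence;
}

function pickMostReliable(candidates: Detection[]): Detection | null {
  let best: Detection | null = null;
  for (const candidate of candidates) {
    if (best === null || isBetter(candidate, best)) best = candidate;
  }
  return best;
}

// detect stands in for the call to the remote service.
function detectLanguage(
  text: string,
  isTei: boolean,
  detect: (sample: string) => Detection[]
): Detection | null {
  if (isTei) return null; // TEI texts are not sent for detection
  const sample = text.slice(0, MAX_DETECTION_CHARS);
  return pickMostReliable(detect(sample));
}
```

Returning the whole Detection rather than only the language code keeps isReliable and confidence available to whatever step decides if the result can be trusted.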

At which step should the application decide that the final result is not correct and fall back to a check by character code?

I should also point out again that we check only plain texts; we don't try to detect the language of TEI texts.
The only exception is DTS API links, which have predefined languages: for Betamasaheft, gez is defined manually and not detected by the service.

@irina060981
Member Author

@balmas , @abrasax , @monzug

I don't have enough information to check the detection service and the icon.
There could be several cases here:

  • You could give me reproducible text samples (as actual text in the correct character codes, not screenshots) that produce a detection result you disagree with. I would check the full response from the service (isReliable, confidence), and then we could discuss these exceptions.

  • Such problems might come from a service availability failure (I haven't encountered one myself). In that case I need a screenshot of the error from the browser console / network tab, and we could decide how to handle such a failure.

For now I can check only the following scenarios described by Monica (when I have priority for this):

to repro the missing language icons as per previous comment:

  1. generate an error by uploading an unsupported file type (as json file from choose file)
  2. enter text or copy text in Original and Translations boxes.
  3. you will notice that the Original box is missing the Language icon and Text or Tei icon

Arabic and Greek texts have both the no language is detected icon (this is a full json file that has been uploaded after passing a prior check on languages and saved with grc and ara in the file name)

In this case I should suppress the icon, because we don't run detection on text uploaded from JSON.


To be honest, from my point of view the detection service works well and I like it. I haven't run into problems with it in the normal workflow: paste or enter text, and detection works.
That's why we should check the exceptions first, and only after that decide that this icon doesn't work properly.

@monzug

monzug commented Jun 14, 2021 via email

@irina060981
Member Author

The detection service gives the following result
(we use https://detectlanguage.com/, which has a demo text area for testing texts manually):

image

And you are right, this service doesn't detect the text correctly: it shows very low confidence and isReliable === false

So if we get such a result from the service, we could do an additional check.
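
A minimal sketch of such a gate, assuming the top candidate carries the isReliable and confidence fields mentioned above. The threshold value and the function name are hypothetical; the thread does not say what counts as "very low" confidence.

```typescript
// Shape assumed from the fields mentioned in this thread.
interface Detection {
  language: string;
  isReliable: boolean;
  confidence: number;
}

// Hypothetical threshold; would need tuning against real samples.
const MIN_CONFIDENCE = 1.0;

// True when the top candidate is missing, flagged unreliable, or below the threshold.
function needsAdditionalCheck(top: Detection | null): boolean {
  if (top === null) return true;
  return !top.isReliable || top.confidence < MIN_CONFIDENCE;
}
```

What the additional check does is left open here; it could be a character-range test or simply a prompt asking the user to confirm the language.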

But @monzug, how could we get Serbian from such a text? Do you have such text samples?

@monzug

monzug commented Jun 14, 2021 via email

@irina060981
Member Author

No, I didn't change the service. We have always used https://detectlanguage.com/
It has a good reputation and good reviews.
We use a free account, which limits the number of detections.

I also don't think that manual detection by character code, or another service, would be better in most cases.
But there could be exceptions, such as a less widely used language like Ge'ez.

According to the list of supported languages, https://detectlanguage.com/languages
Ge'ez cannot be detected because it is not supported.
If I understand correctly, Amharic and Ge'ez both belong to the Ethiopian group, which is why the service may find some words in common with Amharic.

We could test this service more widely and more rigorously, checking the confidence and reliable properties in the response for different text samples.
If we want to spend time testing the service, collecting exceptions and handling them inside our own service, then I believe we could improve the result. I don't think it would be much better, though, and I am sure it could take up all our free time.
That's why the user can set the language manually, and detection is only an additional helpful feature. From my point of view, it is not a feature whose failure would prevent a user from using the application.

We discussed it with @abrasax, and he said that we don't want to spend much time on language detection.
Anyway, I would be glad to get suggestions to make this better!

@monzug

monzug commented Jun 15, 2021

issue 1) as reported above is still reproducible

Screen Shot 2021-06-15 at 12 22 50 PM

@monzug

monzug commented Jul 20, 2021

what's left on this issue is:
no icons are generated when there is the unsupported file type error on screen

to repro the missing language icons as per previous comment:

  1. generate an error by uploading an unsupported file type (as json file from choose file)
  2. enter text or copy text in Original and Translations boxes.
  3. you will notice that the Original box is missing the Language icon and Text or Tei icon

I was not able to reproduce 'ergo sum magistris' being set to English; it was Latin in my latest tests.

@abrasax

abrasax commented Aug 31, 2021 via email

@monzug

monzug commented Aug 31, 2021

when I went back to the align screen all the remaining portions and tiles were gone. Definitely a bug. Definitely has to be addressed before we give it to anyone.

I think this comment belongs to issue #503


3 participants