feat: add swiss german as a language #164

Open: wants to merge 2 commits into main


bweben commented Jan 25, 2023

Hello

I added Swiss German as another language.
To do that, I had to move the training files into subfolders named after the ISO 639-3 code, because the ISO 639-1 code is not unique between German and Swiss German. For the same reason, I also had to rename the test files.
If this change is not OK, I am open to suggestions on how to "fix" this problem :)
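
To illustrate the collision, here is a minimal Python sketch. It is not code from this PR: the mapping, the `training` root folder, and the helper names `training_dir` and `colliding_codes` are all hypothetical, and the assumption that both languages resolve to the same ISO 639-1 code is taken from the description above.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical mapping from ISO 639-3 to the ISO 639-1 code each language is
# filed under. That German and Swiss German share "de" is an assumption based
# on the PR description, not something read from the repository.
ISO_639_1_BY_ISO_639_3 = {"deu": "de", "gsw": "de", "fra": "fr"}


def training_dir(root, iso_639_3):
    """Return the per-language subfolder, keyed by the unique ISO 639-3 code."""
    return PurePosixPath(root) / iso_639_3


def colliding_codes(mapping):
    """Return the ISO 639-1 codes that are shared by more than one language."""
    groups = defaultdict(list)
    for iso3, iso1 in mapping.items():
        groups[iso1].append(iso3)
    return {iso1: sorted(langs) for iso1, langs in groups.items() if len(langs) > 1}


if __name__ == "__main__":
    print(colliding_codes(ISO_639_1_BY_ISO_639_3))
    for code in ISO_639_1_BY_ISO_639_3:
        print(training_dir("training", code))
```

Keying the folders by ISO 639-3 gives each language its own directory, while keying by ISO 639-1 would put both languages in the same place.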

The accuracy is not that great, but this was somewhat expected, as Swiss German is quite similar to German. Better training data might fix this. However, because of the "grouping" by ISO 639-1 code, it is probably possible to get a prediction for Swiss German and German simultaneously, thus "improving" the accuracy, as far as I understand.
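
A rough sketch of what I mean by grouping, in Python. This is only an illustration under my own assumptions, not how the library actually does it: the scores, the `GROUP_OF` mapping, and the function names `group_scores` and `best_group` are made up for the example.

```python
from collections import defaultdict

# Hypothetical mapping from ISO 639-3 code to the group (ISO 639-1 code)
# that a language is reported under.
GROUP_OF = {"deu": "de", "gsw": "de", "nld": "nl"}


def group_scores(scores, group_of):
    """Sum the per-language scores of all languages that share a group code."""
    grouped = defaultdict(float)
    for lang, score in scores.items():
        grouped[group_of[lang]] += score
    return dict(grouped)


def best_group(scores, group_of):
    """Return the group code with the highest summed score."""
    grouped = group_scores(scores, group_of)
    return max(grouped, key=grouped.get)


if __name__ == "__main__":
    # Made-up scores: the German vote is split between two close languages.
    scores = {"deu": 0.3, "gsw": 0.3, "nld": 0.4}
    print(max(scores, key=scores.get))   # best single language
    print(best_group(scores, GROUP_OF))  # best group after summing
```

With these made-up numbers, the best single language is `nld`, but after summing the two similar languages the `de` group wins. That is the sense in which grouping could "improve" the accuracy: confusion between German and Swiss German no longer counts as an error at the group level.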

I got all the data from here. I used the 2021 Wikipedia 100k for training and the 2017 Web 100k for testing.

Thanks for your feedback :)

@bweben bweben mentioned this pull request Jan 25, 2023