Support for Wiktionary files #47

Open
page200 opened this issue Sep 19, 2020 · 6 comments

Comments

@page200

page200 commented Sep 19, 2020

Support for Wiktionary would be great. Dumps are available for different languages; here, for example, are the Portuguese files: https://dumps.wikimedia.org/ptwiktionary/20200901/

@wtetsu
Owner

wtetsu commented Sep 26, 2020

Thank you for your suggestion.

I checked the data, but I think it is fine for users to convert the data into TSV themselves and then import it into Mouse Dictionary.
(Wiktionary -> TSV -> Mouse Dictionary)

That's because, as far as I can tell from the Wiktionary data, each entry can be very large.
Also, the format is not necessarily suitable for the Mouse Dictionary view.
For instance, it contains a lot of Wikimedia-specific markup that Mouse Dictionary doesn't handle.

For that reason, having users convert the XML file into a TSV file however they like, and then import it, is a good solution for the moment.

@page200
Author

page200 commented Sep 26, 2020

Thanks for having a look!

Where can I find a script to convert Wiktionary to TSV, or an example of what the TSV format should look like?

The Wikimedia-specific markup can probably be ignored for now.

I'm looking forward to using Mouse Dictionary with my languages. :)

@wtetsu
Owner

wtetsu commented Oct 12, 2020

I don't know of such a tool, but I may develop one in the future.
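
For illustration only, the Wiktionary -> TSV step discussed above could be sketched roughly as follows. This is not an existing tool and not part of Mouse Dictionary. It assumes the dump is a standard MediaWiki XML export (`<page>` elements containing `<title>`, `<ns>` and `<text>`), and it assumes a two-column TSV layout of headword, tab, description. The function names (`local_name`, `strip_markup`, `dump_to_rows`) and the 300-character limit are made up for this sketch, and the markup cleanup is deliberately crude.

```python
import io
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def strip_markup(wikitext, max_len=300):
    """Very rough cleanup; this is not a real wikitext parser."""
    text = re.sub(r"\{\{[^{}]*\}\}", "", wikitext)                  # drop non-nested {{templates}}
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)  # [[a|b]] -> b, [[a]] -> a
    text = re.sub(r"'{2,}", "", text)                              # bold/italic quote runs
    text = re.sub(r"\s+", " ", text).strip()                       # tabs/newlines would break TSV
    return text[:max_len]                                          # entries can be very large


def dump_to_rows(xml_file):
    """Yield (headword, description) for each main-namespace page in the dump."""
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        fields = {}
        for child in elem.iter():
            # Keep the first occurrence of each tag (title, ns, text, ...).
            fields.setdefault(local_name(child.tag), child.text or "")
        elem.clear()  # release the page's children to keep memory use down
        if fields.get("ns") == "0" and fields.get("title"):
            description = strip_markup(fields.get("text", ""))
            if description:
                yield fields["title"].replace("\t", " "), description


# Tiny in-memory stand-in for a real dump file (hypothetical content).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>casa</title><ns>0</ns><revision><text>{{-pt-}}
'''casa''' [[moradia|habitação]] de uma [[família]]</text></revision></page>
  <page><title>Ajuda:Início</title><ns>12</ns><revision><text>help page</text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    for headword, description in dump_to_rows(io.BytesIO(SAMPLE.encode("utf-8"))):
        print(f"{headword}\t{description}")
```

With a real dump, the same function could be fed a decompressed stream, for example `bz2.open(path, "rb")` from the standard library, and the rows written to a `.tsv` file. Whether the resulting descriptions look good in the Mouse Dictionary view depends on how much of the markup is cleaned up, which this sketch barely attempts.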

@GrimPixel

I know PyGlossary https://github.com/ilius/pyglossary, which supports the Zim format for Kiwix https://www.kiwix.org/.

@yuis-ice

+1 and the Wikipedia dump would be an amazing enhancement too.

I agree that users might be able to do that on their own, ofc. But as someone who actually did that for Spanish, parsing a dictionary dump, ugh, that was a hell of a lot of work. I'm sure many people would be happy if a plugin or something similar were available in the Mouse Dictionary space. For example, a quick setup that walks users through a series of configuration steps in a friendly interface to import a dataset, such as the Wiktionary or Wikipedia dataset we are discussing here. Those plugins would open up the true potential to the world.

@wtetsu
Owner

wtetsu commented May 19, 2023

It may be a good idea to leave to plug-ins what I cannot provide as standard functionality for various reasons, but as far as I know, Chrome extensions (Manifest V3) are not allowed to execute any code other than what is in the package 😕

https://developer.chrome.com/docs/extensions/mv3/intro/mv3-overview/#remotely-hosted-code

4 participants