Add more API languages. #81

Open
ariedov opened this issue Aug 24, 2019 · 39 comments

@ariedov

ariedov commented Aug 24, 2019

Hey, love this repo!

I play DnD in Ukraine, and we use Russian as our primary language for the parties, so I would love to have this API available in different languages.

Maybe creating folders like en, ru and just storing the different json files there would be a good option?

I am ready to contribute, just don't want to deploy my own API :)

@bagelbits
Collaborator

bagelbits commented Sep 5, 2019

That seems like a pretty interesting idea. I'm a little concerned about changing the folder structure (or in this case adding a folder structure) for the existing files. My other main concern is that we would need someone to maintain the different languages as well as make sure changes in files in one language end up propagating to the other languages. I, unfortunately, do not speak or write in Russian.

@bagelbits
Collaborator

It might be better to somehow store the different languages together, but that still wouldn't handle the divergence that comes with maintaining more than one language.

@bagelbits
Collaborator

Were you thinking of just the descriptions, or all fields converted to Russian and, inevitably, other languages?

@ariedov
Author

ariedov commented Nov 2, 2019

Yeah, pretty much. And also having the ability to request a specific language in a GET param.

@benjaminapetersen

Does the SRD exist in any other languages currently? There is probably a risk of copyright infringement as well: someone would need to validate that translations are not pulling from proprietary (D&D) material. Just pointing that out as something to think about.

@ogregoire
Collaborator

ogregoire commented Feb 28, 2020

I am not a lawyer.

@benjaminapetersen yes, the SRD exists in other languages. According to the OGL, translating the SRD is allowed as long as the translation is also released under the OGL. So yes, you may translate the SRD if you want. You shouldn't be able to get sued for that.

But currently there are no official translations of the SRD. (By "official", I mean WotC-approved.) There are official translations of the books, but none of the SRD.

Therefore I don't see why this should be included as part of this project's goals. Plus, if things are translated from the D&D books and not the SRD, we have no way to know, and this project could end up hosting copyright-infringing material without anyone noticing.

It's already hard to keep non-SRD monsters, spells, races, and subclasses out of this repo; I don't see why we should take on the burden of doing the same in languages we don't understand.

@carloslancha

carloslancha commented Mar 24, 2020

Hi guys!

I'd love to have this API available in Spanish. I understand the problems @ogregoire is pointing out, as well as the others about changing the project structure or maintaining the languages. My proposal would be:

  • Any PR adding a new language must be sent together with a link to the published SRD translation, in order to check that it follows the OGL (e.g. http://srd.nosolorol.com/DD5/index.html).

  • About the structure: all texts could be replaced with language keys (e.g. "Barbarian" to "barbarian", "Skill: Animal Handling" to "skill-animal-handling") and a new .json per language created (enUS.json, esES.json, ruRU.json...), linking each language key with the translated text:

enUS.json

{
  "barbarian": "Barbarian",
  "skill-animal-handling": "Skill: Animal Handling",
  ...
}

Then, during the build process and before refreshing the database, automatically create new localized files (5e-SRD-Classes-enUS.json, 5e-SRD-Classes-esES.json...), replacing those keys with the translated text and updating all the URLs to include the language (/api/proficiencies/skill-animal-handling becomes /api/en/proficiencies/skill-animal-handling or /api/es/proficiencies/skill-animal-handling), and then deploy them to the DB.

  • About maintenance: following my proposal there will be multiple language files with the translations. If you add a new key to the enUS one, you don't need to worry about the other languages being updated right away: the replacement process described in the previous step can be configured to take the key from the enUS.json file whenever it is not found in the language being processed (see the sketch after this list).
    This way, if a translation is missing, at least the original English text will appear, waiting for the key to be added in that language.

:)
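
To make the fallback idea concrete, here's a minimal sketch of what that build-time replacement step could look like (the file names follow the proposal above; the translate/localize helper names are just illustrative):

import * as fs from 'fs';

type Locale = Record<string, string>;

// English is the "master" locale; missing keys in other locales fall back to it.
const english: Locale = JSON.parse(fs.readFileSync('enUS.json', 'utf8'));
const spanish: Locale = JSON.parse(fs.readFileSync('esES.json', 'utf8'));

function translate(key: string, locale: Locale): string {
  return locale[key] ?? english[key] ?? key;
}

// Walk a source file and replace any string that is a known language key.
function localize(value: unknown, locale: Locale): unknown {
  if (typeof value === 'string') {
    return value in english ? translate(value, locale) : value;
  }
  if (Array.isArray(value)) {
    return value.map((item) => localize(item, locale));
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, localize(v, locale)])
    );
  }
  return value;
}

const source = JSON.parse(fs.readFileSync('5e-SRD-Classes.json', 'utf8'));
fs.writeFileSync(
  '5e-SRD-Classes-esES.json',
  JSON.stringify(localize(source, spanish), null, 2)
);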

@ogregoire
Collaborator

ogregoire commented Mar 24, 2020

This could work, indeed. But that'd require a lot of work to do the mapping, and I don't know how to do that. @bagelbits, any insight on how to do so?

However, I wouldn't go as far as including the country yet (enUS, esES), because there are simply not enough translations. Also, I don't like the file naming: usually, in the programming languages I know, locales are named _<language> or _<language>_<country>, where <language> is the two-character ISO 639-1 code for the language and <country> is the two-character ISO 3166-1 code for the country.

So basically, I'd recommend using:

5e-SRD-Classes_en.json
5e-SRD-Classes_es.json

@carloslancha

I'm OK with that naming; I was just following the pattern of the current files (using -) and adding the country to be prepared for the future, but you're right, there aren't enough translations yet.

For the mapping I was thinking of a simple replacement script that takes the value of the name keys in each .json (that's the language key), looks the key up in the language file, and replaces the original value with the translated one.

I can try later to write a POC for this.

The more tedious part will be replacing the current values with the keys, but I think I can automate that too: replace the original files' values with the keys and generate the first language (en) at the same time.
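
For the key generation itself, a rough sketch (assuming keys are just slugified versions of the current name values, as in the examples above):

// Rough sketch of generating a language key from an existing value,
// e.g. "Skill: Animal Handling" -> "skill-animal-handling".
function toLanguageKey(value: string): string {
  return value
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse spaces and punctuation into hyphens
    .replace(/^-+|-+$/g, '');    // trim leading/trailing hyphens
}

// Bootstrapping the first language file (en): every original value becomes
// the translation for its generated key.
const en: Record<string, string> = {};
for (const value of ['Barbarian', 'Skill: Animal Handling']) {
  en[toLanguageKey(value)] = value;
}
// en => { "barbarian": "Barbarian", "skill-animal-handling": "Skill: Animal Handling" }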

@ogregoire
Collaborator

How would you deal with an incomplete or in-progress translation?

@carloslancha

carloslancha commented Mar 24, 2020

Taking any language keys not found in the incomplete or in-progress translation from the "master" language, English.

What I've seen in several projects I've worked on is to use that master language and append "(Copy from English)" to the end of the text.

@carloslancha

Here you can find the POC: #158

@bagelbits
Collaborator

I have some thoughts but I'll have to come back to this in a day or two. I haven't been exactly in the best headspace the last few days. Though I do like the direction this is going.

@bagelbits
Collaborator

Sorry that took so long. I left a comment on that PR just for how we're keying everything. I think we could probably clean it up a bit based on that, but I really like this approach. It's simple and elegant, my favorite way to solve a problem. :D

@bagelbits
Collaborator

If we don't want values to be lists, I'd suggest keys along the lines of LANG-KEY-strength-description-1 instead. What do y'all think? The only thing I can see going wrong with this is key collisions.
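
Something like this, with placeholder text standing in for the actual paragraphs (the key names are made up):

// Illustrative only: flattening a multi-paragraph description into numbered
// keys instead of storing a list as the value.
const locale: Record<string, string> = {
  'LANG-KEY-strength-description-1': 'First paragraph of the Strength description.',
  'LANG-KEY-strength-description-2': 'Second paragraph of the Strength description.',
};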

@Javrd

Javrd commented Nov 8, 2020

@carloslancha @bagelbits @fergcb
Hi, I asked last week about collaborating on other languages and found this ticket.
I've made a first JSON version of the monsters from the Spanish SRD. It has a slightly different schema, but I think it could be useful for this ticket. You can find it at:

https://github.com/Javrd/spanish-srd5.1-crawl/releases/tag/v0.1.0

@bagelbits
Collaborator

@Javrd That's really useful! I think the first step is to pick up the work from the POC. It would break all of the English text out into a separate doc that could then be hot-swapped for alternative language files. I think the current state is that the POC is sound, but the naming conventions need to be updated, and there are a bunch of merge conflicts, so the language file would probably have to be started over.

@Redmega
Contributor

Redmega commented Jul 27, 2021

I'm worried about how this will actually get stored in the backend. Our json <-> mongodb pipeline would need to be altered a bit.

Do we make one database per language? Keep languages as separate collections? Do we include translations in the documents themselves?

https://stackoverflow.com/questions/23802834/multilingual-data-modeling-on-mongodb

There are a few good approaches in this SO question that are worth exploring or feeling out.

@bagelbits
Collaborator

Hmmmm. I think either a separate db per language or separate collections? I'm trying to think about how to support this from a GraphQL standpoint.

I guess it also raises the question of how we want to distinguish the language on the API side. Would that be in the URL or as a param?

@Redmega
Contributor

Redmega commented Jul 28, 2021

I think the API side can be flexible. We can default to something like /api/:lang/[...], and have redirects in place from a middleware detecting the Accept-Language header. Idk if a query param is the right call here.
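
As a rough sketch, assuming an Express-style app (the actual API may be wired differently, and the supported-language list here is made up):

import express from 'express';

// Assumed set of supported languages, for illustration only.
const SUPPORTED = ['en', 'es', 'ru'];

const app = express();

// Redirect bare /api/... requests to a language-prefixed path based on the
// Accept-Language header, defaulting to English.
app.use('/api', (req, res, next) => {
  const [first] = req.path.split('/').filter(Boolean);
  if (first && SUPPORTED.includes(first)) {
    return next(); // already language-prefixed
  }
  const preferred = req.acceptsLanguages(...SUPPORTED) || 'en';
  res.redirect(`/api/${preferred}${req.path}`);
});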

@djurnamn

Hey!

I just created a pull request (#445) for another approach to multilingual support. It allows us to parse the source data and separate it into what should be translatable (locale) and what shouldn't be (templates). It also allows us to build the source data back together with an altered locale file, resulting in a translated version of the database.

Would love to hear your thoughts on it!

@bagelbits
Collaborator

Oh dang. I completely forgot to encapsulate the alternative design we came up with in the Discord. I should do that here. I'll take a look at your PR though.

@djurnamn

I didn't know there was a Discord 😅, I'll check that out and get up to speed. Okay cool, let me know if you have any questions about it!

@bagelbits
Collaborator

@djurnamn Right. So. Here's my alternative suggestion:

I've been thinking about the multi-language support for the API a little bit more, and I think the design of one set of collections per language might be flawed/does not scale. On the one hand, it means you can just copy the file of all text from one language and translate it inline. However, I don't think the models in the API will easily support hot-swapping which collection you're talking to based on the incoming language request. And I don't want to add a new set of models for each new supported language. The API should not care about new languages that get added after we start supporting them.

We can handle this one of two ways.

Option A

Convert any string or array of strings to a hash where the key is the ISO language code and the value is the string/array in that language:

{
  "description": {
    "en_us": "something",
    "pt_br": "algo",
    "ja_jp": "なにか"
  }
}

or

{
  "description": {
    "en_us": ["something"],
    "pt_br": ["algo"],
    "ja_jp": ["なにか"]
  }
}

Option B

Option B is similar to Option A, except backwards compatible. Namely, we keep strings and arrays of strings the same, but we add an additional key for each. The key would be the same, with ::localization appended to it. For example:

{
  "description": "something",
  "description::localization": {
    "en_us": "something",
    "pt_br": "algo",
    "ja_jp": "なにか"
  }
}

or

{
  "description": ["something"],
  "description::localization": {
    "en_us": ["something"],
    "pt_br": ["algo"],
    "ja_jp": ["なにか"]
  }
}

Either one is a pretty massive change, but this is an exceptionally complicated feature. I'm honestly leaning towards Option A, but I could be convinced of B.

@bagelbits
Collaborator

You can find the original post in Discord here.

And if you haven't joined the Discord server yet, here's the invite.

@djurnamn

Okay, that's cool! My populate templates script could fairly easily be modified to put the data back together in either of those shapes. And I could extend the part currently reading from one locale file to iterate through a locales folder, allowing us to rebuild the source files with any set of languages we like.

Thanks for the invite! :)

@bagelbits
Collaborator

Excellent! Yeah, my thoughts are that you would basically build two scripts. One is a throwaway script that just coerces the data into the new shape. The second is a helper/tool script that prepares the database for a new language, like adding "pt_br": "" into every localization map.
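
A minimal sketch of that second helper, assuming the Option A shape (the heuristic for spotting a localization map is a simplification on my part):

// Walk an Option A-shaped document and add an empty entry for a new locale
// to every localization map. A "localization map" is assumed to be an object
// whose keys all look like locale codes (e.g. en_us, pt_br).
const LOCALE_CODE = /^[a-z]{2}_[a-z]{2}$/;

function isLocalizationMap(value: unknown): value is Record<string, unknown> {
  if (typeof value !== 'object' || value === null || Array.isArray(value)) {
    return false;
  }
  const keys = Object.keys(value);
  return keys.length > 0 && keys.every((key) => LOCALE_CODE.test(key));
}

function addLocale(doc: unknown, code: string): void {
  if (doc === null || typeof doc !== 'object') {
    return;
  }
  for (const value of Object.values(doc)) {
    if (isLocalizationMap(value)) {
      if (!(code in value)) {
        value[code] = ''; // empty string, waiting for a translator
      }
    } else {
      addLocale(value, code); // recurse into nested objects and arrays
    }
  }
}

// Usage: addLocale(someClassDocument, 'pt_br');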

@djurnamn

Yeah, that sounds good. Let me know how I can help! I think at least the logic for distinguishing between translatable and non-translatable values in my script could be useful for that.

It would be cool to have the locales separately in some standardized format (like WebExtensions json) so that they can be pulled into, and maintained in, a translation management system. And perhaps then, the second script you mention could optionally parse the locale files and add their values in the localization map.
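
For reference, this is roughly what a _locales/<lang>/messages.json entry looks like in the WebExtensions format (shown here as an object literal; the keys and descriptions are just examples):

// Each entry has a "message" (the translatable text) and an optional
// "description" that gives translators context.
const messages = {
  barbarian: {
    message: 'Barbarian',
    description: 'Class name',
  },
  skill_animal_handling: {
    message: 'Skill: Animal Handling',
    description: 'Proficiency name',
  },
};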

@bagelbits
Collaborator

@djurnamn Sorry for taking so long to respond. However, we now have semantic versioning for the docker images that get built for the DB, so I feel way more comfortable with the breaking change this will cause.

It would be cool to have the locales separately in some standardized format (like WebExtensions json) so that they can be pulled into, and maintained in, a translation management system. And perhaps then, the second script you mention could optionally parse the locale files and add their values in the localization map.

Can you say more about this? Are you saying the locale files would be separate from the rest of the data, similar to your initial proposal? That is technically doable if it all gets stitched together before getting shoved into the DB.

I think I'm still leaning towards Option A if we go that route. Thoughts?

@djurnamn

Hey @bagelbits! Oh, that's cool!

Yeah, I guess that just felt like a more manageable way to maintain the translated content. The compiled version would still be what you outlined in Option A. If the build script that combines the translatable and non-translatable content into the preferred format is outside the scope of what this repo should be, I could just maintain it separately.

I'll start working on a new version of the build script that outputs the compiled data in the Option A format.

@bagelbits
Collaborator

@djurnamn I think Option A as the final stitched product makes a lot of sense. I can also see how splitting the locales into their own files makes it a lot easier to manage and work within the repo itself. It also means if you want to translate to a new language, you just copy from the language you feel comfortable translating from. Does a mix of both sound good? (Assuming that made sense)

@bagelbits
Collaborator

Also, thank you for doing the legwork on this one!

@djurnamn

Sounds great!

Yeah, no problem! Thanks for creating and maintaining this, it opens up so many cool possibilities. :)

@djurnamn

djurnamn commented Apr 5, 2022

I don't know if you get notifications from changes made in the pull request, but I managed to get this working the other day if you wanna try it out! :)

@bagelbits
Collaborator

Excellent! I'll try to take a look this week!

@patrickelectric

Hey, any update on this?

@djurnamn

I rewrote the parser and broke it out into its own repo: https://github.com/sospodd/5e-srd-translations

It identifies the translatable content and creates a separate locale file for it. It can also create templates based on the structure of the 5e-database JSON files, where all translated content is represented by placeholders (paths into the generated locale JSON structure).
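
Roughly, the idea is something like this (the placeholder syntax here is just illustrative, not necessarily what the repo actually emits):

// A template keeps the structure and non-translatable data; translatable
// values become placeholder paths into the generated locale file.
const template = {
  index: 'barbarian',
  name: '{{classes.barbarian.name}}',
  hit_die: 12,
};

const locale = {
  classes: {
    barbarian: {
      name: 'Barbarian',
    },
  },
};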

@patrickelectric

That's awesome @djurnamn! But is it a standalone project, or would it be integrated into this repository?

@djurnamn

I think it can make sense to maintain the parser and generated source locale separately. And anybody who wants to create a translation in their own language can just fork that and get to work.

I made a version of the template population script in my original pull request that took values from multiple locales and generated the 5e-database json files in the "Option A" format (where each translatable property is an object with locale keys and translated values). It wasn't super pretty, but I believe it worked. I'll try to revisit that soon and add it to the 5e-srd-translations repo as well. Not really sure how to proceed from there but we'd at least be able to generate the data in the desired shape.
