Add more API languages. #81

Open
ariedov opened this issue Aug 24, 2019 · 39 comments

@ariedov

ariedov commented Aug 24, 2019

Hey, love this repo!

I play DnD in Ukraine, and we use Russian as our primary language for the parties, so I would love to have this API available in different languages.

Maybe creating folders like en, ru and just storing the different json files there would be a good option?

I am ready to contribute, just don't want to deploy my own API :)

@bagelbits
Collaborator

bagelbits commented Sep 5, 2019

That seems like a pretty interesting idea. I'm a little concerned about changing the folder structure (or in this case adding a folder structure) for the existing files. My other main concern is that we would need someone to maintain the different languages as well as make sure changes in files in one language end up propagating to the other languages. I, unfortunately, do not speak or write in Russian.

@bagelbits
Collaborator

It might be better to somehow store the different languages together, but that still wouldn't handle the divergence that comes with maintaining more than one language.

@bagelbits
Collaborator

Were you thinking of just the descriptions, or all fields converted to Russian and, inevitably, other languages?

@ariedov
Author

ariedov commented Nov 2, 2019

Yeah, pretty much. And also having the ability to request a specific language in a GET param.

@benjaminapetersen

Does the SRD exist in any other languages currently? There is probably a risk of copyright infringement as well: someone would need to validate that translations are not pulling from proprietary (D&D) material. Just pointing that out as something to think about.

@ogregoire
Collaborator

ogregoire commented Feb 28, 2020

I am not a lawyer.

@benjaminapetersen yes, the SRD exists in other languages. According to the OGL, translating the SRD is allowed as long as the translation is also released under the OGL. So yes, you may translate the SRD if you want. You shouldn't be able to get sued for that.

But currently there are no official translations of the SRD. (By "official", I mean WotC-approved.) There are official translations of the books, but none of the SRD.

Therefore I don't see why this should be included as part of this project's goals. Plus, if things are translated from the D&D books and not the SRD, we have no way to know, and this project could end up hosting copyright-infringing material without anyone noticing.

It's already hard to keep non-SRD monsters, spells, races, and subclasses out of this repo; I don't see why we should take on the burden of doing the same in languages we don't understand.

@carloslancha

carloslancha commented Mar 24, 2020

Hi guys!

I'd love to have this API available in Spanish. I understand the problems @ogregoire is pointing out, as well as the others about changing the project structure or maintaining the languages. My proposal would be:

  • Any PR adding a new language must be sent together with a link to the published SRD translation, in order to check that it follows the OGL (e.g. http://srd.nosolorol.com/DD5/index.html).

  • About the structure: all texts could be replaced with language keys (e.g. "Barbarian" to "barbarian", "Skill: Animal Handling" to "skill-animal-handling") and a new .json per language created (enUS.json, esES.json, ruRU.json...), linking each language key with the translated text:

enUS.json

{
  "barbarian": "Barbarian",
  "skill-animal-handling": "Skill: Animal Handling",
  ...
}

Then, during the build process and before refreshing the database, automatically create new localized files (5e-SRD-Classes-enUS.json, 5e-SRD-Classes-esES.json...), replacing those keys with the translated text and updating all the URLs to include the language (/api/proficiencies/skill-animal-handling becomes /api/en/proficiencies/skill-animal-handling or /api/es/proficiencies/skill-animal-handling), and then deploy them to the DB.

  • About maintenance: following my proposal there will be multiple language files with the translations. If you add a new key to the enUS one, you don't need to worry about the other languages being updated right away: the replacement process described in the previous step can be configured to take the key from the enUS.json file whenever it is not found in the language being processed (see the sketch after this list).
    This way, if a translation is missing, at least the original English text will appear, waiting for the key to be added in that language.

:)
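
To make the fallback idea concrete, here's a minimal sketch of what that build-time replacement step could look like (the file names follow the proposal above; the translate/localize helper names are just illustrative):

import * as fs from 'fs';

type Locale = Record<string, string>;

// English is the "master" locale; missing keys in other locales fall back to it.
const english: Locale = JSON.parse(fs.readFileSync('enUS.json', 'utf8'));
const spanish: Locale = JSON.parse(fs.readFileSync('esES.json', 'utf8'));

function translate(key: string, locale: Locale): string {
  return locale[key] ?? english[key] ?? key;
}

// Walk a source file and replace any string that is a known language key.
function localize(value: unknown, locale: Locale): unknown {
  if (typeof value === 'string') {
    return value in english ? translate(value, locale) : value;
  }
  if (Array.isArray(value)) {
    return value.map((item) => localize(item, locale));
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, localize(v, locale)])
    );
  }
  return value;
}

const source = JSON.parse(fs.readFileSync('5e-SRD-Classes.json', 'utf8'));
fs.writeFileSync(
  '5e-SRD-Classes-esES.json',
  JSON.stringify(localize(source, spanish), null, 2)
);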

@ogregoire
Collaborator

ogregoire commented Mar 24, 2020

This could work, indeed. But that'd require a lot of work to do the mapping, and I don't know how to do that. @bagelbits, any insight on how to do so?

However, I wouldn't go as far as including the country yet (enUS, esES), because there are simply not enough translations. Also, I don't like the file naming: usually, in the programming languages I know, locales are named _<language> or _<language>_<country>, where <language> is the two-character ISO 639-1 code for the language and <country> is the two-character ISO 3166-1 code for the country.

So basically, I'd recommend using:

5e-SRD-Classes_en.json
5e-SRD-Classes_es.json

@carloslancha

I'm OK with that naming; I was just following the pattern of the current files (using -) and adding the country to be prepared for the future, but you're right, there aren't enough translations yet.

For the mapping I was thinking of a simple replacement script that takes the value of the name keys in each .json (that's the language key), looks the key up in the language file, and replaces the original value with the translated one.

I can try later to write a POC for this.

The more tedious part will be replacing the current values with the keys, but I think I can automate that too: replace the original files' values with the keys and generate the first language (en) at the same time.
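
For the key generation itself, a rough sketch (assuming keys are just slugified versions of the current name values, as in the examples above):

// Rough sketch of generating a language key from an existing value,
// e.g. "Skill: Animal Handling" -> "skill-animal-handling".
function toLanguageKey(value: string): string {
  return value
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse spaces and punctuation into hyphens
    .replace(/^-+|-+$/g, '');    // trim leading/trailing hyphens
}

// Bootstrapping the first language file (en): every original value becomes
// the translation for its generated key.
const en: Record<string, string> = {};
for (const value of ['Barbarian', 'Skill: Animal Handling']) {
  en[toLanguageKey(value)] = value;
}
// en => { "barbarian": "Barbarian", "skill-animal-handling": "Skill: Animal Handling" }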

@ogregoire
Collaborator

How would you deal with an incomplete or in-progress translation?

@carloslancha

carloslancha commented Mar 24, 2020

Taking any language keys not found in the incomplete or in-progress translation from the "master" language, English.

What I've seen in several projects I've worked on is to use that master language and append "(Copy from English)" to the end of the text.

@carloslancha

Here you can find the POC: #158

@bagelbits
Collaborator

I have some thoughts but I'll have to come back to this in a day or two. I haven't been exactly in the best headspace the last few days. Though I do like the direction this is going.

@bagelbits
Collaborator

Sorry that took so long. I left a comment on that PR just for how we're keying everything. I think we could probably clean it up a bit based on that, but I really like this approach. It's simple and elegant, my favorite way to solve a problem. :D

@bagelbits
Collaborator

If we don't want values to be lists, I'd suggest keys along the lines of LANG-KEY-strength-description-1 instead. What do y'all think? The only thing I can see going wrong with this is key collisions.
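
Something like this, with placeholder text standing in for the actual paragraphs (the key names are made up):

// Illustrative only: flattening a multi-paragraph description into numbered
// keys instead of storing a list as the value.
const locale: Record<string, string> = {
  'LANG-KEY-strength-description-1': 'First paragraph of the Strength description.',
  'LANG-KEY-strength-description-2': 'Second paragraph of the Strength description.',
};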

@Javrd

Javrd commented Nov 8, 2020

@carloslancha @bagelbits @fergcb
Hi, I asked last week about collaborating on other languages and found this ticket.
I've made a first JSON version of the monsters from the Spanish SRD. It has a slightly different schema, but I think it could be useful for this ticket. You can find it at:

https://github.com/Javrd/spanish-srd5.1-crawl/releases/tag/v0.1.0

@bagelbits
Collaborator

@Javrd That's really useful! I think the first step is to pick up the work from the POC. It would break all of the English text out into a separate doc that could then be hot-swapped for alternative language files. I think the current state is that the POC is sound, but the naming conventions need to be updated, and there are a bunch of merge conflicts, so the language file would probably have to be started over.

@Redmega
Contributor

Redmega commented Jul 27, 2021

I'm worried about how this will actually get stored in the backend. Our json <-> mongodb pipeline would need to be altered a bit.

Do we make one database per language? Keep languages as separate collections? Do we include translations in the documents themselves?

https://stackoverflow.com/questions/23802834/multilingual-data-modeling-on-mongodb

There are a few good approaches in this SO question that are worth exploring or feeling out.

@bagelbits
Collaborator

Hmmmm. I think either a separate db per language or separate collections? I'm trying to think about how to support this from a GraphQL standpoint.

I guess it also raises the question of how we want to distinguish the language on the API side. Would that be in the URL or as a param?

@Redmega
Contributor

Redmega commented Jul 28, 2021

I think the API side can be flexible. We can default to something like /api/:lang/[...], and have redirects in place from a middleware detecting the Accept-Language header. Idk if a query param is the right call here.
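
As a rough sketch, assuming an Express-style app (the actual API may be wired differently, and the supported-language list here is made up):

import express from 'express';

// Assumed set of supported languages, for illustration only.
const SUPPORTED = ['en', 'es', 'ru'];

const app = express();

// Redirect bare /api/... requests to a language-prefixed path based on the
// Accept-Language header, defaulting to English.
app.use('/api', (req, res, next) => {
  const [first] = req.path.split('/').filter(Boolean);
  if (first && SUPPORTED.includes(first)) {
    return next(); // already language-prefixed
  }
  const preferred = req.acceptsLanguages(...SUPPORTED) || 'en';
  res.redirect(`/api/${preferred}${req.path}`);
});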

@djurnamn

Hey!

I just created a pull request (#445) for another approach to multilingual support. It allows us to parse the source data and separate it into what should be translatable (locale) and what shouldn't be (templates). It also allows us to build the source data back together with an altered locale file, resulting in a translated version of the database.

Would love to hear your thoughts on it!

@bagelbits
Collaborator

Oh dang. I completely forgot to encapsulate the alternative design we came up with in the Discord. I should do that here. I'll take a look at your PR though.

@djurnamn

I didn't know there was a Discord 😅, I'll check that out and get up to speed. Okay cool, let me know if you have any questions about it!

@bagelbits
Collaborator

@djurnamn Right. So. Here's my alternative suggestion:

I've been thinking about the multi-language support for the API a little bit more, and I think the design of one set of collections per language might be flawed/does not scale. On the one hand, it means you can just copy the file of all text from one language and translate it inline. However, I don't think the models in the API will easily support hot-swapping which collection you're talking to based on the incoming language request. And I don't want to add a new set of models for each new supported language. The API should not care about new languages that get added after we start supporting them.

We can handle this one of two ways.

Option A

Convert any string or array of strings to a hash where the key is the ISO language code and the value is the string/array in that language:

{
  "description": {
    "en_us": "something",
    "pt_br": "algo",
    "ja_jp": "なにか"
  }
}

or

{
  "description": {
    "en_us": ["something"],
    "pt_br": ["algo"],
    "ja_jp": ["なにか"]
  }
}

Option B

Option B is similar to Option A, except backwards compatible. Namely, we keep strings and arrays of strings the same, but we add an additional key for each. The key would be the same, with ::localization appended to it. For example:

{
  "description": "something",
  "description::localization": {
    "en_us": "something",
    "pt_br": "algo",
    "ja_jp": "なにか"
  }
}

or

{
  "description": ["something"],
  "description::localization": {
    "en_us": ["something"],
    "pt_br": ["algo"],
    "ja_jp": ["なにか"]
  }
}

Either one is a pretty massive change, but this is an exceptionally complicated feature. I'm honestly leaning towards Option A, but I could be convinced of B.

@bagelbits
Collaborator

You can find the original post in Discord here.

And if you haven't joined the Discord server yet, here's the invite.

@djurnamn

Okay, that's cool! My populate templates script could fairly easily be modified to put the data back together in either of those shapes. And I could extend the part currently reading from one locale file to iterate through a locales folder, allowing us to rebuild the source files with any set of languages we like.

Thanks for the invite! :)

@bagelbits
Collaborator

Excellent! Yeah, my thoughts are that you would basically build two scripts. One is a throwaway script that just coerces the data into the new shape. The second is a helper/tool script that prepares the database for a new language, like adding "pt_br": "" into every localization map.
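
A minimal sketch of that second helper, assuming the Option A shape (the heuristic for spotting a localization map is a simplification on my part):

// Walk an Option A-shaped document and add an empty entry for a new locale
// to every localization map. A "localization map" is assumed to be an object
// whose keys all look like locale codes (e.g. en_us, pt_br).
const LOCALE_CODE = /^[a-z]{2}_[a-z]{2}$/;

function isLocalizationMap(value: unknown): value is Record<string, unknown> {
  if (typeof value !== 'object' || value === null || Array.isArray(value)) {
    return false;
  }
  const keys = Object.keys(value);
  return keys.length > 0 && keys.every((key) => LOCALE_CODE.test(key));
}

function addLocale(doc: unknown, code: string): void {
  if (doc === null || typeof doc !== 'object') {
    return;
  }
  for (const value of Object.values(doc)) {
    if (isLocalizationMap(value)) {
      if (!(code in value)) {
        value[code] = ''; // empty string, waiting for a translator
      }
    } else {
      addLocale(value, code); // recurse into nested objects and arrays
    }
  }
}

// Usage: addLocale(someClassDocument, 'pt_br');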

@djurnamn

Yeah, that sounds good. Let me know how I can help! I think at least the logic for distinguishing between translatable and non-translatable values in my script could be useful for that.

It would be cool to have the locales separately in some standardized format (like WebExtensions json) so that they can be pulled into, and maintained in, a translation management system. And perhaps then, the second script you mention could optionally parse the locale files and add their values in the localization map.
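
For reference, this is roughly what a _locales/<lang>/messages.json entry looks like in the WebExtensions format (shown here as an object literal; the keys and descriptions are just examples):

// Each entry has a "message" (the translatable text) and an optional
// "description" that gives translators context.
const messages = {
  barbarian: {
    message: 'Barbarian',
    description: 'Class name',
  },
  skill_animal_handling: {
    message: 'Skill: Animal Handling',
    description: 'Proficiency name',
  },
};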

@bagelbits
Collaborator

@djurnamn Sorry for taking so long to respond. However, we now have semantic versioning for the docker images that get built for the DB, so I feel way more comfortable with the breaking change this will cause.

It would be cool to have the locales separately in some standardized format (like WebExtensions json) so that they can be pulled into, and maintained in, a translation management system. And perhaps then, the second script you mention could optionally parse the locale files and add their values in the localization map.

Can you say more about this? Are you saying the locale files would be separate from the rest of the data, similar to your initial proposal? That is technically doable if it all gets stitched together before getting shoved into the DB.

I think I'm still leaning towards Option A if we go that route. Thoughts?

@djurnamn

Hey @bagelbits! Oh, that's cool!

Yeah, I guess that just felt like a more manageable way to maintain the translated content. The compiled version would still be what you outlined in Option A. If the build script that combines the translatable and non-translatable content into the preferred format is outside the scope of what this repo should be, I could just maintain it separately.

I'll start working on a new version of the build script that outputs the compiled data in the Option A format.

@bagelbits
Collaborator

@djurnamn I think Option A as the final stitched product makes a lot of sense. I can also see how splitting the locales into their own files makes it a lot easier to manage and work within the repo itself. It also means if you want to translate to a new language, you just copy from the language you feel comfortable translating from. Does a mix of both sound good? (Assuming that made sense)

@bagelbits
Collaborator

Also, thank you for doing the legwork on this one!

@djurnamn

Sounds great!

Yeah, no problem! Thanks for creating and maintaining this, it opens up so many cool possibilities. :)

@djurnamn

djurnamn commented Apr 5, 2022

I don't know if you get notifications from changes made in the pull request, but I managed to get this working the other day if you wanna try it out! :)

@bagelbits
Collaborator

Excellent! I'll try to take a look this week!

@patrickelectric

Hey, any update on this?

@djurnamn

I rewrote the parser and broke it out into its own repo: https://github.com/sospodd/5e-srd-translations

It identifies the translatable content and creates a separate locale file for it. It can also create templates based on the structure of the 5e-database JSON files, where all translated content is represented by placeholders (paths into the generated locale JSON structure).
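
Roughly, the idea is something like this (the placeholder syntax here is just illustrative, not necessarily what the repo actually emits):

// A template keeps the structure and non-translatable data; translatable
// values become placeholder paths into the generated locale file.
const template = {
  index: 'barbarian',
  name: '{{classes.barbarian.name}}',
  hit_die: 12,
};

const locale = {
  classes: {
    barbarian: {
      name: 'Barbarian',
    },
  },
};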

@patrickelectric

That's awesome @djurnamn! But is it a standalone project, or would it be integrated into this repository?

@djurnamn

I think it can make sense to maintain the parser and generated source locale separately. And anybody who wants to create a translation in their own language can just fork that and get to work.

I made a version of the template population script in my original pull request that took values from multiple locales and generated the 5e-database json files in the "Option A" format (where each translatable property is an object with locale keys and translated values). It wasn't super pretty, but I believe it worked. I'll try to revisit that soon and add it to the 5e-srd-translations repo as well. Not really sure how to proceed from there but we'd at least be able to generate the data in the desired shape.
