Adding three-letter ISO code for languages #17

Open
CasperWSchmidt opened this issue Oct 4, 2022 · 10 comments

Comments

@CasperWSchmidt

Hi there
We currently validate language codes in our system with the regex ^[a-z]{3}$, but I would like to tighten the validation to actual ISO codes. From what I can see in this repo, only the two-letter ISO codes are part of the translations. Is it feasible to add the three-letter ISO codes as well?

Also, language info is part of Nager.Country.Translation rather than the main package Nager.Country. Isn't it relevant to have the spoken language(s) of a country in the main package, just like currencies? The translations could then stay in a separate package to keep the size down (as noted in #2).
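
A minimal sketch of what the tightened validation could look like, assuming the three-letter codes are available as a set. `LanguageCodeValidator` and its sample list are hypothetical and not part of Nager.Country; a real list would come from the ISO 639-3 code tables.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Nager.Country.
public static class LanguageCodeValidator
{
    // Deliberately tiny sample. A real set would hold every code of the chosen standard.
    private static readonly HashSet<string> KnownCodes =
        new HashSet<string>(StringComparer.Ordinal) { "dan", "deu", "eng", "fra", "spa" };

    // Accepts only codes that are in the set, instead of any three lowercase letters.
    public static bool IsValid(string code)
    {
        return code != null && KnownCodes.Contains(code);
    }
}
```

With this, a string such as "abc" passes the regex but is rejected by `IsValid`, because it is not in the sample set.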

@tinohager
Member

I think we only need a dictionary with the mapping
https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

Do you think ISO 639-3 is the best language code standard to use?

@CasperWSchmidt
Author

Well, according to https://iso639-3.sil.org/about/relationships, ISO 639-3 was devised to provide a comprehensive set of identifiers for all languages for use in a wide range of applications, including linguistics, lexicography and internationalization of information systems. The page also describes the differences between 639-1, 639-2 and 639-3. So basically the languages covered by the two-letter ISO 639-1 standard are a subset of those covered by the three-letter ISO 639-3 standard.

I believe the best standard depends on what it will be used for. Is it simply a set of "overall"/"main" languages spoken in the country, or is it necessary to have more fine-grained options (Arabic is an example)?

@IngBertolini

I think that would be useful too! It would be nice to also have three-letter codes for languages, just as there are for countries. It would make this library even more complete and robust!

@tinohager
Member

tinohager commented Oct 17, 2022

Can someone validate the data? Datasource: https://datahub.io/core/language-codes

using System;
using System.Collections.Generic;

public class Program {
    public static void Main() {
        // Key: two-letter ISO 639-1 code, value: three-letter code from the data source above.
        var items = new Dictionary<string, string>();
        items.Add("aa", "aar");
        items.Add("ab", "abk");
        items.Add("af", "afr");
        items.Add("ak", "aka");
        items.Add("sq", "alb");
        items.Add("am", "amh");
        items.Add("ar", "ara");
        items.Add("an", "arg");
        items.Add("hy", "arm");
        items.Add("as", "asm");
        items.Add("av", "ava");
        items.Add("ae", "ave");
        items.Add("ay", "aym");
        items.Add("az", "aze");
        items.Add("ba", "bak");
        items.Add("bm", "bam");
        items.Add("eu", "baq");
        items.Add("be", "bel");
        items.Add("bn", "ben");
        items.Add("bh", "bih");
        items.Add("bi", "bis");
        items.Add("bs", "bos");
        items.Add("br", "bre");
        items.Add("bg", "bul");
        items.Add("my", "bur");
        items.Add("ca", "cat");
        items.Add("ch", "cha");
        items.Add("ce", "che");
        items.Add("zh", "chi");
        items.Add("cu", "chu");
        items.Add("cv", "chv");
        items.Add("kw", "cor");
        items.Add("co", "cos");
        items.Add("cr", "cre");
        items.Add("cs", "cze");
        items.Add("da", "dan");
        items.Add("dv", "div");
        items.Add("nl", "dut");
        items.Add("dz", "dzo");
        items.Add("en", "eng");
        items.Add("eo", "epo");
        items.Add("et", "est");
        items.Add("ee", "ewe");
        items.Add("fo", "fao");
        items.Add("fj", "fij");
        items.Add("fi", "fin");
        items.Add("fr", "fre");
        items.Add("fy", "fry");
        items.Add("ff", "ful");
        items.Add("ka", "geo");
        items.Add("de", "ger");
        items.Add("gd", "gla");
        items.Add("ga", "gle");
        items.Add("gl", "glg");
        items.Add("gv", "glv");
        items.Add("el", "gre");
        items.Add("gn", "grn");
        items.Add("gu", "guj");
        items.Add("ht", "hat");
        items.Add("ha", "hau");
        items.Add("he", "heb");
        items.Add("hz", "her");
        items.Add("hi", "hin");
        items.Add("ho", "hmo");
        items.Add("hr", "hrv");
        items.Add("hu", "hun");
        items.Add("ig", "ibo");
        items.Add("is", "ice");
        items.Add("io", "ido");
        items.Add("ii", "iii");
        items.Add("iu", "iku");
        items.Add("ie", "ile");
        items.Add("ia", "ina");
        items.Add("id", "ind");
        items.Add("ik", "ipk");
        items.Add("it", "ita");
        items.Add("jv", "jav");
        items.Add("ja", "jpn");
        items.Add("kl", "kal");
        items.Add("kn", "kan");
        items.Add("ks", "kas");
        items.Add("kr", "kau");
        items.Add("kk", "kaz");
        items.Add("km", "khm");
        items.Add("ki", "kik");
        items.Add("rw", "kin");
        items.Add("ky", "kir");
        items.Add("kv", "kom");
        items.Add("kg", "kon");
        items.Add("ko", "kor");
        items.Add("kj", "kua");
        items.Add("ku", "kur");
        items.Add("lo", "lao");
        items.Add("la", "lat");
        items.Add("lv", "lav");
        items.Add("li", "lim");
        items.Add("ln", "lin");
        items.Add("lt", "lit");
        items.Add("lb", "ltz");
        items.Add("lu", "lub");
        items.Add("lg", "lug");
        items.Add("mk", "mac");
        items.Add("mh", "mah");
        items.Add("ml", "mal");
        items.Add("mi", "mao");
        items.Add("mr", "mar");
        items.Add("ms", "may");
        items.Add("mg", "mlg");
        items.Add("mt", "mlt");
        items.Add("mn", "mon");
        items.Add("na", "nau");
        items.Add("nv", "nav");
        items.Add("nr", "nbl");
        items.Add("nd", "nde");
        items.Add("ng", "ndo");
        items.Add("ne", "nep");
        items.Add("nn", "nno");
        items.Add("nb", "nob");
        items.Add("no", "nor");
        items.Add("ny", "nya");
        items.Add("oc", "oci");
        items.Add("oj", "oji");
        items.Add("or", "ori");
        items.Add("om", "orm");
        items.Add("os", "oss");
        items.Add("pa", "pan");
        items.Add("fa", "per");
        items.Add("pi", "pli");
        items.Add("pl", "pol");
        items.Add("pt", "por");
        items.Add("ps", "pus");
        items.Add("qu", "que");
        items.Add("rm", "roh");
        items.Add("ro", "rum");
        items.Add("rn", "run");
        items.Add("ru", "rus");
        items.Add("sg", "sag");
        items.Add("sa", "san");
        items.Add("si", "sin");
        items.Add("sk", "slo");
        items.Add("sl", "slv");
        items.Add("se", "sme");
        items.Add("sm", "smo");
        items.Add("sn", "sna");
        items.Add("sd", "snd");
        items.Add("so", "som");
        items.Add("st", "sot");
        items.Add("es", "spa");
        items.Add("sc", "srd");
        items.Add("sr", "srp");
        items.Add("ss", "ssw");
        items.Add("su", "sun");
        items.Add("sw", "swa");
        items.Add("sv", "swe");
        items.Add("ty", "tah");
        items.Add("ta", "tam");
        items.Add("tt", "tat");
        items.Add("te", "tel");
        items.Add("tg", "tgk");
        items.Add("tl", "tgl");
        items.Add("th", "tha");
        items.Add("bo", "tib");
        items.Add("ti", "tir");
        items.Add("to", "ton");
        items.Add("tn", "tsn");
        items.Add("ts", "tso");
        items.Add("tk", "tuk");
        items.Add("tr", "tur");
        items.Add("tw", "twi");
        items.Add("ug", "uig");
        items.Add("uk", "ukr");
        items.Add("ur", "urd");
        items.Add("uz", "uzb");
        items.Add("ve", "ven");
        items.Add("vi", "vie");
        items.Add("vo", "vol");
        items.Add("cy", "wel");
        items.Add("wa", "wln");
        items.Add("wo", "wol");
        items.Add("xh", "xho");
        items.Add("yi", "yid");
        items.Add("yo", "yor");
        items.Add("za", "zha");
        items.Add("zu", "zul");
    }
}
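
A rough sketch of a structural sanity check that could be run over a mapping like the one above. It only checks the shape of the entries (two lowercase letters mapped to three lowercase letters, no three-letter code used twice) and does not confirm that the codes exist in any ISO standard. `MappingChecks` and `FindProblems` are made-up names for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of Nager.Country.
public static class MappingChecks
{
    // Returns a list of human-readable problems; an empty list means the shape looks fine.
    public static List<string> FindProblems(IDictionary<string, string> items)
    {
        var problems = new List<string>();

        foreach (var pair in items)
        {
            if (!Regex.IsMatch(pair.Key, "^[a-z]{2}$"))
                problems.Add("Bad two-letter key: " + pair.Key);
            if (!Regex.IsMatch(pair.Value, "^[a-z]{3}$"))
                problems.Add("Bad three-letter value: " + pair.Value);
        }

        // A three-letter code that appears for more than one key is suspicious in this direction.
        var duplicates = items.Values
            .GroupBy(value => value)
            .Where(group => group.Count() > 1)
            .Select(group => group.Key);

        foreach (var duplicate in duplicates)
            problems.Add("Duplicate three-letter value: " + duplicate);

        return problems;
    }
}
```

Calling `MappingChecks.FindProblems(items)` at the end of `Main` would list any malformed entries. Checking the codes against the standard itself still needs a comparison with an authoritative list.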

@IngBertolini

IngBertolini commented Oct 17, 2022

Hello! I tried to validate them and they are correct, but it seems that they use three-letter codes from the ISO 639-2 standard, which uses English-like codes, instead of ISO 639-3, which I think is more international and standard. @CasperWSchmidt what do you think?

In addition, according to Wikipedia, the code "bh" is deprecated and no longer used (it is also present in the LanguageCode enum).

These are the codes in the ISO 639-3 standard (without "bh"):

var items = new Dictionary<string, string>();
items.Add("aa", "aar");
items.Add("ab", "abk");
items.Add("af", "afr");
items.Add("ak", "aka");
items.Add("sq", "sqi");
items.Add("am", "amh");
items.Add("ar", "ara");
items.Add("an", "arg");
items.Add("hy", "hye");
items.Add("as", "asm");
items.Add("av", "ava");
items.Add("ae", "ave");
items.Add("ay", "aym");
items.Add("az", "aze");
items.Add("ba", "bak");
items.Add("bm", "bam");
items.Add("eu", "eus");
items.Add("be", "bel");
items.Add("bn", "ben");
items.Add("bi", "bis");
items.Add("bs", "bos");
items.Add("br", "bre");
items.Add("bg", "bul");
items.Add("my", "mya");
items.Add("ca", "cat");
items.Add("ch", "cha");
items.Add("ce", "che");
items.Add("zh", "zho");
items.Add("cu", "chu");
items.Add("cv", "chv");
items.Add("kw", "cor");
items.Add("co", "cos");
items.Add("cr", "cre");
items.Add("cs", "ces");
items.Add("da", "dan");
items.Add("dv", "div");
items.Add("nl", "nld");
items.Add("dz", "dzo");
items.Add("en", "eng");
items.Add("eo", "epo");
items.Add("et", "est");
items.Add("ee", "ewe");
items.Add("fo", "fao");
items.Add("fj", "fij");
items.Add("fi", "fin");
items.Add("fr", "fra");
items.Add("fy", "fry");
items.Add("ff", "ful");
items.Add("ka", "kat");
items.Add("de", "deu");
items.Add("gd", "gla");
items.Add("ga", "gle");
items.Add("gl", "glg");
items.Add("gv", "glv");
items.Add("el", "ell");
items.Add("gn", "grn");
items.Add("gu", "guj");
items.Add("ht", "hat");
items.Add("ha", "hau");
items.Add("he", "heb");
items.Add("hz", "her");
items.Add("hi", "hin");
items.Add("ho", "hmo");
items.Add("hr", "hrv");
items.Add("hu", "hun");
items.Add("ig", "ibo");
items.Add("is", "isl");
items.Add("io", "ido");
items.Add("ii", "iii");
items.Add("iu", "iku");
items.Add("ie", "ile");
items.Add("ia", "ina");
items.Add("id", "ind");
items.Add("ik", "ipk");
items.Add("it", "ita");
items.Add("jv", "jav");
items.Add("ja", "jpn");
items.Add("kl", "kal");
items.Add("kn", "kan");
items.Add("ks", "kas");
items.Add("kr", "kau");
items.Add("kk", "kaz");
items.Add("km", "khm");
items.Add("ki", "kik");
items.Add("rw", "kin");
items.Add("ky", "kir");
items.Add("kv", "kom");
items.Add("kg", "kon");
items.Add("ko", "kor");
items.Add("kj", "kua");
items.Add("ku", "kur");
items.Add("lo", "lao");
items.Add("la", "lat");
items.Add("lv", "lav");
items.Add("li", "lim");
items.Add("ln", "lin");
items.Add("lt", "lit");
items.Add("lb", "ltz");
items.Add("lu", "lub");
items.Add("lg", "lug");
items.Add("mk", "mkd");
items.Add("mh", "mah");
items.Add("ml", "mal");
items.Add("mi", "mri");
items.Add("mr", "mar");
items.Add("ms", "msa");
items.Add("mg", "mlg");
items.Add("mt", "mlt");
items.Add("mn", "mon");
items.Add("na", "nau");
items.Add("nv", "nav");
items.Add("nr", "nbl");
items.Add("nd", "nde");
items.Add("ng", "ndo");
items.Add("ne", "nep");
items.Add("nn", "nno");
items.Add("nb", "nob");
items.Add("no", "nor");
items.Add("ny", "nya");
items.Add("oc", "oci");
items.Add("oj", "oji");
items.Add("or", "ori");
items.Add("om", "orm");
items.Add("os", "oss");
items.Add("pa", "pan");
items.Add("fa", "fas");
items.Add("pi", "pli");
items.Add("pl", "pol");
items.Add("pt", "por");
items.Add("ps", "pus");
items.Add("qu", "que");
items.Add("rm", "roh");
items.Add("ro", "ron");
items.Add("rn", "run");
items.Add("ru", "rus");
items.Add("sg", "sag");
items.Add("sa", "san");
items.Add("si", "sin");
items.Add("sk", "slk");
items.Add("sl", "slv");
items.Add("se", "sme");
items.Add("sm", "smo");
items.Add("sn", "sna");
items.Add("sd", "snd");
items.Add("so", "som");
items.Add("st", "sot");
items.Add("es", "spa");
items.Add("sc", "srd");
items.Add("sr", "srp");
items.Add("ss", "ssw");
items.Add("su", "sun");
items.Add("sw", "swa");
items.Add("sv", "swe");
items.Add("ty", "tah");
items.Add("ta", "tam");
items.Add("tt", "tat");
items.Add("te", "tel");
items.Add("tg", "tgk");
items.Add("tl", "tgl");
items.Add("th", "tha");
items.Add("bo", "bod");
items.Add("ti", "tir");
items.Add("to", "ton");
items.Add("tn", "tsn");
items.Add("ts", "tso");
items.Add("tk", "tuk");
items.Add("tr", "tur");
items.Add("tw", "twi");
items.Add("ug", "uig");
items.Add("uk", "ukr");
items.Add("ur", "urd");
items.Add("uz", "uzb");
items.Add("ve", "ven");
items.Add("vi", "vie");
items.Add("vo", "vol");
items.Add("cy", "cym");
items.Add("wa", "wln");
items.Add("wo", "wol");
items.Add("xh", "xho");
items.Add("yi", "yid");
items.Add("yo", "yor");
items.Add("za", "zha");
items.Add("zu", "zul");
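
Comparing the two lists in this thread, they differ in 20 entries (for example "ger" versus "deu"). The English-like codes in the first list match what ISO 639-2 calls its bibliographic codes. A small sketch of converting one form to the other, using only the pairs that differ between the two lists; `BibliographicCodes` and `Normalize` are hypothetical names.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Nager.Country.
public static class BibliographicCodes
{
    // The 20 pairs that differ between the two lists posted in this thread.
    private static readonly Dictionary<string, string> Overrides =
        new Dictionary<string, string>
        {
            { "alb", "sqi" }, { "arm", "hye" }, { "baq", "eus" }, { "bur", "mya" },
            { "chi", "zho" }, { "cze", "ces" }, { "dut", "nld" }, { "fre", "fra" },
            { "geo", "kat" }, { "ger", "deu" }, { "gre", "ell" }, { "ice", "isl" },
            { "mac", "mkd" }, { "mao", "mri" }, { "may", "msa" }, { "per", "fas" },
            { "rum", "ron" }, { "slo", "slk" }, { "tib", "bod" }, { "wel", "cym" },
        };

    // Returns the code from the second list when the input is one of the
    // differing codes from the first list, otherwise the input unchanged.
    public static string Normalize(string code)
    {
        string replacement;
        return Overrides.TryGetValue(code, out replacement) ? replacement : code;
    }
}
```

Codes that are identical in both lists, such as "eng", pass through unchanged.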

@CasperWSchmidt
Author

IMHO the ISO 639-3 standard might as well be used from the beginning if the three-letter codes are added. This will require the opposite relation between two- and three-letter codes though, as multiple ISO 639-3 codes map to the same ISO 639-1 code. Hence the ISO 639-3 codes must be the keys of the dictionary :)
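
A minimal sketch of the reversed dictionary, with ISO 639-3 codes as keys. It assumes that an individual language takes the two-letter code of its macrolanguage, which is a design choice for this example rather than something the standard prescribes. `Iso639Lookup` is a hypothetical name and the entries are a small sample.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Nager.Country.
public static class Iso639Lookup
{
    // Key: ISO 639-3 code, value: ISO 639-1 code. Several keys share one value.
    private static readonly Dictionary<string, string> ThreeToTwo =
        new Dictionary<string, string>
        {
            { "ara", "ar" }, // Arabic (macrolanguage)
            { "aeb", "ar" }, // Tunisian Arabic
            { "arz", "ar" }, // Egyptian Arabic
            { "zho", "zh" }, // Chinese (macrolanguage)
            { "cmn", "zh" }, // Mandarin Chinese
        };

    // Returns false when the three-letter code is not in the sample table.
    public static bool TryGetTwoLetterCode(string iso6393Code, out string iso6391Code)
    {
        return ThreeToTwo.TryGetValue(iso6393Code, out iso6391Code);
    }
}
```

Keeping the two-letter code as the key would not work here, since "ar" would need to point at several three-letter codes at once.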

@IngBertolini

So we need this kind of mapping, where one of the ISO 639-1 languages can have multiple local languages. (site for reference)

Do you think that the library should also manage every single local language, or is it enough if the mapping simply returns the macrolanguage? Example:

ILanguageTranslation language = new TranslationProvider().GetLanguage("aeb");

Should this return an instance of TunisianArabicLanguageTranslation, or is it sufficient that it returns an instance of ArabicLanguageTranslation?

I think that the second alternative should be fine!
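
A sketch of how the second alternative might work: resolve a local language code to its macrolanguage first, then look up the translation for the result. `MacrolanguageResolver` is a hypothetical name, the table is a small sample, and nothing here reflects how the library is actually implemented.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Nager.Country.
public static class MacrolanguageResolver
{
    // Sample entries only: individual ISO 639-3 code -> macrolanguage code.
    private static readonly Dictionary<string, string> Macrolanguages =
        new Dictionary<string, string>
        {
            { "aeb", "ara" }, // Tunisian Arabic -> Arabic
            { "arz", "ara" }, // Egyptian Arabic -> Arabic
            { "cmn", "zho" }, // Mandarin Chinese -> Chinese
        };

    // Returns the macrolanguage code when one is known, otherwise the input unchanged.
    public static string Resolve(string iso6393Code)
    {
        string macro;
        return Macrolanguages.TryGetValue(iso6393Code, out macro) ? macro : iso6393Code;
    }
}
```

Under this approach, a lookup for "aeb" would first be resolved to "ara" and would then end up at the Arabic translation.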

@CasperWSchmidt
Author

I'm not really into the translation stuff; all I care about are the language codes for each country :) But I believe the answer to your question depends on how much each "local" language differs from the macrolanguage (e.g. Portuguese and Spanish are spoken in both Europe and South America, so the differences can be significant).

tinohager added a commit that referenced this issue Apr 11, 2023
@tinohager
Member

Hi, does anyone want to make a suggestion for an implementation? Otherwise I will close the issue.

@CasperWSchmidt
Author

I would love to, but I'm afraid I have other tasks at hand with hard deadlines ATM :( If you keep it open, I might be able to take a stab at it in a few months though...
