Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spanish Implementation #29

Open
wants to merge 57 commits into
base: master
Choose a base branch
from
Open

Conversation

Laifsyn
Copy link

@Laifsyn Laifsyn commented Apr 7, 2024

Submitting as draft because
1-) I'm unclear how pull requests actually works
2-) I haven't properly made an Integration Test because I found it extremely un-ergonomic to constantly build the Num2Words settings after every .to_word() call

I also submitted a .devcontainer configuration because I found it extremely helpful to be able to test stuffs in a virtual environment specially so when I didn't have available a beefy-enough computer (I might've forgotten to add Rust Docs Viewer in the extensions though)

@Laifsyn
Copy link
Author

Laifsyn commented Apr 12, 2024

Alright, this should be about it. (except for mod.rs because I was too lazy to undo rustfmt on that. Shame on me)
But everything should be the most minimal for the spanish implementation feature.

Rustfmt should be easily be doable in another pull request by just re-adding the rustfmt.toml file and running rustfmt against the project.

the undonde derives could be additionally done in another PR.

and optionally fix clippy warnings in another PR

Copy link
Author

@Laifsyn Laifsyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misread the commit content. This commit should've been add missing docs

@Ballasi
Copy link
Owner

Ballasi commented Apr 12, 2024

Sorry for making you go through all of this trouble! and thank you very much for taking the time to work on it, it's very kind :)

I'll see the PR more in depth this weekend! If you're not planning on redoing the derives or rustfmt, I'll take on that job once this is merged.

@Laifsyn
Copy link
Author

Laifsyn commented Apr 12, 2024

I was thinking about figuring what rustfmt config was the one that onelined the chained methods and see if it was alright to disable it.

The reason is because if the method chaining gets one-lined, then the type-hint for each method return on the chain will not be hinted by rust-analyser.

but anyways, the chainings tends to be short more often than not, so the loss of the type-hint which can only be seen in an IDE isn't that relevant

@Laifsyn
Copy link
Author

Laifsyn commented Apr 14, 2024

Hi, I was thinking about making a PR Benching with Criterion with another branch for benchmarks. Mainly I want to see what issues can arises when I expect merge conflicts( expecting so because I had to add some derives to do the benchmark)

The reason was to see how the spanish implementation fared because I did a fix to the unneeded 0 spam that the split_thousands() method created. (Ref: link)

Copy link
Owner

@Ballasi Ballasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again for taking the time to participate! I'm sorry I am relatively slow, I've got a lot going on lately.

Feel free to take a look at the comments I've put :)

tests/lang_es_test.rs Show resolved Hide resolved
README.md Show resolved Hide resolved
@@ -25,6 +25,7 @@ AVAILABLE LANGUAGES:
fr: French (France and Canada)
fr_BE: French (Belgium and the Democratic Republic of the Congo)
fr_CH: French (Swiss Confederation and Aosta Valley)
es: Spanish
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto the pokemon?

src/lang/es.rs Show resolved Hide resolved

let word = to_words(BigFloat::from(21.01), Outputs::Currency, &[]);
assert_eq!(word.unwrap(), "veintiún US dollars con un centavo");
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, compared the results from es.rs to python num2words and see differences

I don't mind much the differences in megas as it depends a lot on the source. however mil trillones makes me wonder, is this what should actually be the number?

10000000000000
diez billones
diez trillones

100000000000000
cien billones
cien trillones

1000000000000000000
un trillón
un quintillón

1000000000000000000000 <=== here
mil trillones
un sextillón

10000000000000000000000000
diez cuatrillones
diez septillones

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's different?. Is Mil Trillones from Python?

I followed the standard of Each thousands changes the Millard because It was a bit confusing that a Billion*1000 is called the equivalent in spanish of "thousand billions" instead of "trillion"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://es.wikipedia.org/wiki/Anexo:Nombres_de_los_n%C3%BAmeros_en_espa%C3%B1ol
So according to this source, it starts to jump by 1_000_000 instead of 1_000 after the trillion

As summary, English's Sextillion is 10^21
Spanish's Sextillion (according to this particular page in wiki) is 10^36

I guess it would still best to discuss with someone more familiarized with spanish, because if this really is the spanish standard, I'm pretty sure almost none will get spanish's quadrillon right (Because it actually is "A Million Trillion")

tests/lang_es_test.rs Show resolved Hide resolved
based on [Real Academia Española](https://www.rae.es/dpd/ordinales) standards, the TENS of 2 have no space between itself and its units.
`"la primera y a la segunda decena se pueden escribir en una o en dos palabras, siendo hoy mayoritaria y siempre preferible la grafía univerbal"`
From Ballasi#29 (comment),
fix ordinal representation based on the rules defined in 2.b && 2.d @
https://www.rae.es/dpd/ordinales
@Laifsyn
Copy link
Author

Laifsyn commented Apr 15, 2024

Alright. That should be it

@Laifsyn
Copy link
Author

Laifsyn commented Apr 15, 2024

Oh, btw. I'm working on translating the currencies. Might submit a commit by the end of the day

Fix the wrong semantics that was used.
Spanish semantics follows this pattern of for each big milliard (Million, Billion, Trillion) grows at a pace of `1 000 000`^n

This means that 1 million is `1 000 000`^1
1 billion => `1 000 000`^2 [1 000_000 000_000]
1 trillion => `1 000 000`^3 [1 000_000 000_000 000_000]
1 quadrillion => `1 000 000`^4 [......]
...... etc

This semantic differs from english's
1 billion => `1 000`^3 [1 000 000_000] 
1 trillion => `1 000 000`^4 [1 000_000 000_000]
1 quadrillion => `1 000 000`^4 [......]
Laifsyn added a commit to Laifsyn/num2words_spanish that referenced this pull request Apr 17, 2024
* Fix Tens edge case for Ordinals

based on [Real Academia Española](https://www.rae.es/dpd/ordinales) standards, the TENS of 2 have no space between itself and its units.
`"la primera y a la segunda decena se pueden escribir en una o en dos palabras, siendo hoy mayoritaria y siempre preferible la grafía univerbal"`

* Fix Ordinals Representation
From Ballasi#29 (comment),
fix ordinal representation based on the rules defined in 2.b && 2.d @
https://www.rae.es/dpd/ordinales

* Fix Tests & remove comments

* Currency Translation for spanish

* Cardinal Fix - Mistakenly used English semantics (#1)

Fix the wrong semantics that was used.
Spanish semantics follows this pattern of for each big milliard (Million, Billion, Trillion) grows at a pace of `1 000 000`^n

This means that 1 million is `1 000 000`^1
1 billion => `1 000 000`^2 [1 000_000 000_000]
1 trillion => `1 000 000`^3 [1 000_000 000_000 000_000]
1 quadrillion => `1 000 000`^4 [......]
...... etc

This semantic differs from english's
1 billion => `1 000`^3 [1 000 000_000] 
1 trillion => `1 000 000`^4 [1 000_000 000_000]
1 quadrillion => `1 000 000`^4 [......]
@Laifsyn
Copy link
Author

Laifsyn commented Apr 25, 2024

I have a question if you can get the answer

I'll be leaving it as the first meanwhile, but should it be like the second case?

assert_eq!(
    ordinal(124_001_091_000_000_000_001),
    "ciento veinticuatrotrillonésimos milnoventa y unobillonésimos primeros"
//  "ciento veinticuatrotrillonésimos mil noventa y unobillonésimos primeros"
);

* Staging changes for ordinal Fix

Some data to take into consideration
Todo: Fix accent on monomorphized words i.e. VigesimoSegundo

Si el ordinal se escribe en dos palabras, el primer elemento mantiene la tilde que le corresponde como palabra independiente: vigésimo segundo, vigésima cuarta, trigésimo octavo, cuadragésima quinta; pero, si se escribe en una sola palabra, el compuesto resultante, al ser ser una voz llana terminada en vocal, debe escribirse sin tilde, pues no le corresponde llevarla según las reglas de acentuación (v. cap. II, § 3.4.5.1.1): vigesimosegundo (no ⊗‍vigésimosegundo).

Los ordinales complejos escritos en una sola palabra solo presentan variación de género y número en el segundo componente: vigesimoprimero, vigesimoprimera, vigesimoprimeros, vigesimoprimeras; pero, si se escriben en dos palabras, ambos componentes son variables: vigésimo primero, vigésima primera, vigésimos primeros, vigésimas primeras. No se consideran correctas las grafías en dos palabras si se mantiene invariable el primer componente: ⊗‍vigésimo segundos, ⊗‍vigésimo cuarta, ⊗‍vigésimo octavas.

* temporarily add rustfmt back

* Finish Ordinal Fix.

Reference: https://www.rae.es/dpd/ordinales

* remove accent on composites of `Vigesimo`

* remove rustfmt for merge
@Laifsyn
Copy link
Author

Laifsyn commented May 26, 2024

I couldn't redo the Cardinal implementation.
Performance gains were dirt low. at about x1.02-x1.25 faster, and also the implementation looked a bit ugly. So I have no changes to the current implementation of Spanish.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants