Create binaries to support stand-alone usage #68

mgeisler · 2023-08-29T19:48:11Z

The mdbook-xgettext plugin can currently only be used from mdbook since it reads a special JSON format on standard input. However, it would be quite easy to use the same code to extract text from any Markdown file on disk. We should create a small binary which does this.

Similarly for mdbook-gettext: we should create a small binary which will translate a Markdown file into another language using a PO file.

Having these binaries would make it slightly easier to test our own code:

echo '> foo **bar** *baz*' | mdbook-i18n-xgettext

should be enough to produce a PO file on standard output (it would show that *baz* is normalized to _baz_, for example). I could have used this kind of ad-hoc testing when I was battling corner cases in #33.


friendlymatthew · 2023-09-26T13:53:26Z

Howdy @mgeisler -- I would like to work on this :)

Just to clarify - we want to read an MD file from std input, normalize the markdown, and convert to a PO file.

As I am new to the codebase, should I use pulldown-cmark to parse the MD or is there a preferred method? I looked for previous PRs but had a hard time finding examples to follow.
I'm currently reading mdbook-i18n-normalize.rs, as it normalizes the Markdown in a PO file. Once I create a catalog, I would call normalize(catalog) and convert the result to a PO file.

Please let me know if I am misunderstanding or need clarification!

mgeisler · 2023-09-26T15:20:11Z

While building the translation machinery here, I realized that we don't need to tie it to mdbook. We are able to do the .md -> .pot text extraction on any Markdown file. And we are able to do the .md + .po -> .md translation on any pair of Markdown and PO file.

> Just to clarify - we want to read an MD file from std input, normalize the markdown, and convert to a PO file.

My thinking is even simpler: the extraction binary should do what mdbook-xgettext does, but operate on a file chosen by the user (or stdin).

As you can see, mdbook-xgettext calls extract_messages, which internally calls pulldown-cmark as you suggest. However, extract_messages is the central function that takes a Markdown file and finds the translatable text (it knows to strip unimportant Markdown syntax, so ## Foo becomes just Foo in the PO file). These lines are the ones I think you can copy to the new binary:

            for (lineno, msgid) in extract_messages(&chapter.content) {
                let source = format!("{}:{}", path.display(), lineno);
                add_message(&mut catalog, &msgid, &source);
            }

You'll then need a bit of boilerplate around it to parse command line arguments. You can probably find inspiration in the xgettext command line arguments for input and output.
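
The shape of such a stand-alone extraction binary could be sketched roughly as follows, using only the standard library. Everything here is an assumption made for illustration: `extract_messages_stub` is a hypothetical stand-in that merely splits on blank lines (the real `extract_messages` parses Markdown and strips syntax such as `## `), `render_pot` and `escape_po` are made-up helpers that hand-write a few PO entries rather than building a proper catalog, and the single optional positional argument is just one possible command line.

```rust
use std::env;
use std::fs;
use std::io::{self, Read};

/// Hypothetical stand-in for `extract_messages`: returns (line number, text)
/// for each blank-line-separated paragraph. It does NOT parse Markdown; it
/// only keeps this sketch runnable without external crates.
fn extract_messages_stub(content: &str) -> Vec<(usize, String)> {
    let mut messages = Vec::new();
    let mut start = 0usize; // 1-based line number where the current paragraph began
    let mut buf: Vec<&str> = Vec::new();
    for (idx, line) in content.lines().enumerate() {
        if line.trim().is_empty() {
            if !buf.is_empty() {
                messages.push((start, buf.join("\n")));
                buf.clear();
            }
        } else {
            if buf.is_empty() {
                start = idx + 1;
            }
            buf.push(line);
        }
    }
    if !buf.is_empty() {
        messages.push((start, buf.join("\n")));
    }
    messages
}

/// Escape backslashes, double quotes and newlines for a PO string literal.
fn escape_po(text: &str) -> String {
    text.replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

/// Hand-rolled rendering of PO entries (no header); a simplification of
/// what a real catalog writer would produce.
fn render_pot(path: &str, content: &str) -> String {
    let mut out = String::new();
    for (lineno, msgid) in extract_messages_stub(content) {
        out.push_str(&format!(
            "#: {}:{}\nmsgid \"{}\"\nmsgstr \"\"\n\n",
            path,
            lineno,
            escape_po(&msgid)
        ));
    }
    out
}

fn main() -> io::Result<()> {
    // Read the file named by the first argument, or fall back to stdin.
    let (path, content) = match env::args().nth(1) {
        Some(path) => {
            let content = fs::read_to_string(&path)?;
            (path, content)
        }
        None => {
            let mut content = String::new();
            io::stdin().read_to_string(&mut content)?;
            (String::from("<stdin>"), content)
        }
    };
    print!("{}", render_pot(&path, &content));
    Ok(())
}
```

In a real binary the stub would be replaced by the library's `extract_messages` and the loop quoted above, with the catalog written out by whatever PO library the project already uses.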

Similarly for a translation program: it would take a Markdown file (or perhaps a directory of files?) plus a PO file and produce a translated output. That is essentially what mdbook-gettext does today, except that mdbook-gettext is tightly tied to mdbook.
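
The .md + .po -> .md direction can be illustrated with a similarly rough, standard-library-only sketch. The assumptions are loud ones: `Catalog` is a hypothetical stand-in for a parsed PO file (a plain msgid to msgstr map), and `translate_paragraphs` is a made-up helper that matches whole blank-line-separated paragraphs instead of working on parsed Markdown. The one gettext convention it does follow is that an empty msgstr means "untranslated", so the original text is kept.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed PO catalog: msgid -> msgstr.
type Catalog = HashMap<String, String>;

/// Replace each blank-line-separated paragraph with its translation when
/// the catalog has a non-empty msgstr for it; otherwise keep the original.
fn translate_paragraphs(content: &str, catalog: &Catalog) -> String {
    content
        .split("\n\n")
        .map(|para| match catalog.get(para.trim()) {
            // A non-empty msgstr is a real translation.
            Some(msgstr) if !msgstr.is_empty() => msgstr.as_str(),
            // Missing or empty msgstr: fall back to the source text.
            _ => para,
        })
        .collect::<Vec<_>>()
        .join("\n\n")
}

fn main() {
    let mut catalog = Catalog::new();
    catalog.insert("Hello world".to_string(), "Hej verden".to_string());
    catalog.insert("Untranslated".to_string(), String::new());

    let markdown = "Hello world\n\nUntranslated\n\nNot in catalog";
    println!("{}", translate_paragraphs(markdown, &catalog));
}
```

A real translation binary would load the catalog from the PO file given on the command line and reuse the library's own extraction logic so that msgids line up exactly with what the extraction step produced.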

See also https://po4a.org/. I've only skimmed the documentation for that project, but my understanding is that it can produce PO files for anything, including Markdown. It would be very interesting to compare our extraction mechanism with the one they have — I don't know how they compare at all right now. That comparison would go into a README or help text for the new binaries.

We might even create a new crate for this, which could depend on the library part of mdbook-i18n-helpers. Users who install the new crate would avoid building the mdbook-specific parts which they don't need.

mgeisler added the enhancement, good first issue, and help wanted labels on Aug 29, 2023

friendlymatthew linked a pull request on Sep 27, 2023 that will close this issue: markdown file xgettext + gettext #92 (Open)

mgeisler assigned friendlymatthew Nov 6, 2023
