Create binaries to support stand-alone usage #68

Open
mgeisler opened this issue Aug 29, 2023 · 2 comments · May be fixed by #92
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@mgeisler
Collaborator

The mdbook-xgettext plugin can currently only be used from mdbook since it reads a special JSON format on standard input. However, it would be quite easy to use the same code to extract text from any Markdown file on disk. We should create a small binary which does this.

Similarly for mdbook-gettext: we should create a small binary which will translate a Markdown file into another language using a PO file.

Having these binaries would make it slightly easier to test our own code:

    echo '> foo **bar** *baz*' | mdbook-i18n-xgettext

should be enough to produce a PO file on standard output (it would show that *bar* is normalized to _bar_, for example). I could have used this kind of ad-hoc testing when I was battling corner-cases in #33.

@mgeisler mgeisler added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed labels Aug 29, 2023
@friendlymatthew
Contributor

friendlymatthew commented Sep 26, 2023

Howdy @mgeisler -- I would like to work on this :)

Just to clarify - we want to read an MD file from std input, normalize the markdown, and convert to a PO file.

  • As I am new to the codebase, should I use pulldown-cmark to parse the MD or is there a preferred method? I looked through previous PRs but had a hard time finding good examples to follow.

  • I'm currently reading mdbook-i18n-normalize.rs as it normalizes the markdown in a po file. Once I create a catalog, I would normalize(catalog) and convert to a po file.

Please let me know if I am misunderstanding or need clarification!

@mgeisler
Collaborator Author

While building the translation machinery here, I realized that we don't need to tie it to mdbook. We are able to do the .md -> .pot text extraction on any Markdown file. And we are able to do the .md + .po -> .md translation on any pair of Markdown and PO files.

> Just to clarify - we want to read an MD file from std input, normalize the markdown, and convert to a PO file.

My thinking is even simpler: the extraction binary should do what mdbook-xgettext does, but operate on a file chosen by the user (or stdin).

As you can see, mdbook-xgettext calls extract_messages, which internally calls pulldown-cmark as you suggest. However, extract_messages is the central function: it takes a Markdown file and finds the translatable text (it knows to strip unimportant Markdown syntax, so ## Foo becomes just Foo in the PO file). These lines are the ones I think you can copy to the new binary:

            for (lineno, msgid) in extract_messages(&chapter.content) {
                let source = format!("{}:{}", path.display(), lineno);
                add_message(&mut catalog, &msgid, &source);
            }

You'll then need a bit of boilerplate around it to parse command line arguments. You can probably find inspiration in the xgettext command line arguments for input and output.
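As a rough illustration of that boilerplate, here is a minimal sketch using only the standard library. It assumes xgettext-style conventions: `-o FILE` or `--output=FILE` for the output, a positional input file, and `-` meaning stdin or stdout. The names `Options`, `parse_args`, `render_pot` and `extract_lines_stub` are hypothetical. The stub only stands in for the real `extract_messages` (it does not strip any Markdown syntax), and a real binary would presumably build a proper catalog instead of formatting PO entries by hand.

```rust
use std::env;
use std::fs;
use std::io::{self, Read, Write};

/// Parsed command line: `None` means stdin (input) or stdout (output).
#[derive(Debug, PartialEq)]
struct Options {
    input: Option<String>,
    output: Option<String>,
}

/// Map the conventional `-` file name to `None`.
fn dash_to_none(name: String) -> Option<String> {
    if name == "-" { None } else { Some(name) }
}

/// Parse `[-o FILE | --output=FILE] [INPUT]`. The last positional argument wins.
fn parse_args<I: Iterator<Item = String>>(mut args: I) -> Result<Options, String> {
    let mut opts = Options { input: None, output: None };
    while let Some(arg) = args.next() {
        if arg == "-o" {
            let file = args
                .next()
                .ok_or_else(|| "missing file name after -o".to_string())?;
            opts.output = dash_to_none(file);
        } else if let Some(file) = arg.strip_prefix("--output=") {
            opts.output = dash_to_none(file.to_string());
        } else {
            opts.input = dash_to_none(arg);
        }
    }
    Ok(opts)
}

/// Hypothetical stand-in for `extract_messages`: every non-blank line
/// becomes one message, paired with its 1-based line number.
fn extract_lines_stub(content: &str) -> Vec<(usize, String)> {
    content
        .lines()
        .enumerate()
        .filter(|(_, line)| !line.trim().is_empty())
        .map(|(idx, line)| (idx + 1, line.trim().to_string()))
        .collect()
}

/// Format the messages as bare PO entries with a `#: file:line` reference.
fn render_pot(source_name: &str, content: &str) -> String {
    let mut out = String::new();
    for (lineno, msgid) in extract_lines_stub(content) {
        // Escape backslashes and double quotes inside the PO string.
        let escaped = msgid.replace('\\', "\\\\").replace('"', "\\\"");
        out.push_str(&format!(
            "#: {}:{}\nmsgid \"{}\"\nmsgstr \"\"\n\n",
            source_name, lineno, escaped
        ));
    }
    out
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let opts = parse_args(env::args().skip(1))?;
    let (name, content) = match &opts.input {
        Some(path) => (path.clone(), fs::read_to_string(path)?),
        None => {
            let mut buf = String::new();
            io::stdin().read_to_string(&mut buf)?;
            ("<stdin>".to_string(), buf)
        }
    };
    let pot = render_pot(&name, &content);
    match &opts.output {
        Some(path) => fs::write(path, pot)?,
        None => io::stdout().write_all(pot.as_bytes())?,
    }
    Ok(())
}
```

Keeping `parse_args` and `render_pot` free of I/O means both can be unit tested without touching the file system.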

Similarly for a translation program: it would take a Markdown file (or perhaps a directory of files?) plus a PO file and produce a translated output. That is essentially what mdbook-gettext does today, except that mdbook-gettext is tightly tied to mdbook.
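To make the shape of that translation step concrete, here is a toy sketch. It is not how mdbook-gettext is implemented: the `Catalog` type is a hypothetical stand-in for a parsed PO file, and splitting on blank lines is a simplification of the real message extraction. It assumes the usual gettext behavior where a missing or empty `msgstr` falls back to the original text.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed PO catalog: msgid -> msgstr.
type Catalog = HashMap<String, String>;

/// Translate blank-line-separated blocks, keeping the source text when
/// the catalog has no entry or the entry is empty (untranslated).
fn translate(markdown: &str, catalog: &Catalog) -> String {
    markdown
        .split("\n\n")
        .map(|block| {
            let msgid = block.trim();
            match catalog.get(msgid) {
                Some(msgstr) if !msgstr.is_empty() => msgstr.as_str(),
                _ => msgid,
            }
        })
        .collect::<Vec<_>>()
        .join("\n\n")
}

fn main() {
    let mut catalog = Catalog::new();
    catalog.insert("Hello".to_string(), "Hej".to_string());
    // An empty msgstr marks an untranslated message.
    catalog.insert("World".to_string(), String::new());

    println!("{}", translate("Hello\n\nWorld", &catalog));
}
```

A real binary would load the catalog from the PO file given on the command line and reuse the library's own grouping of messages, so that extraction and translation agree on what counts as one message.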

See also https://po4a.org/. I've only skimmed the documentation for that project, but my understanding is that it can produce PO files for anything, including Markdown. It would be very interesting to compare our extraction mechanism with the one they have — I don't know how they compare at all right now. That comparison would go into a README or help text for the new binaries.

We might even create a new crate for this; it could depend on the library part of mdbook-i18n-helpers. Users who install the new crate would avoid building the mdbook-specific parts which they don't need.
