Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

markdown file xgettext + gettext #92

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

friendlymatthew
Copy link
Contributor

@friendlymatthew friendlymatthew commented Sep 27, 2023

fixes: #68

markdown-gettext binary

You can execute this binary by the following:

markdown-gettext --dir=(directory path) --po=(lang.po path) --out=(output path)

or

markdown-gettext --f=(file path) --po=(lang.po path) --out=(output path)

* Note: --dir or -f must be specified, -po is always required, and --out defaults to '/translated_md_files`

Suppose in comprehensive-rust I want to translate all md files in the concurrency section to japanese:

❯ markdown-gettext --dir=src/concurrency --po=po/ja.po --out=concurrency_ja_md_files

❯ cd concurrency_ja_md_files

❯ ls 
channels.md       scoped-threads.md send-sync.md      shared_state.md   threads.md

threads.md
Screen Shot 2023-09-28 at 2 27 56 PM

__

markdown-xgettext binary

You can execute this binary by the following:

markdown-xgettext (input.md) (output.po) --lang=(language) --out=(output path)

or

 echo `> ..md content..` | markdown-xgettext - output.po --lang=(language) --out=(output path)

* Note: If --lang is not specified, the .pot default to English.

Example command:

echo '> # Exercises

To practice your Async Rust skills, we have again two exercises for you:

* Dining philosophers: we already saw this problem in the morning. This time
  you are going to implement it with Async Rust.

* A Broadcast Chat Application: this is a larger project that allows you
  experiment with more advanced Async Rust features.

<details>

After looking at the exercises, you can look at the [solutions] provided.

[solutions]: solutions-afternoon.md

</details>
' | markdown-xgettext - output.po --lang=fr

output.po
Screen Shot 2023-09-27 at 5 03 15 PM

@friendlymatthew friendlymatthew marked this pull request as draft September 27, 2023 20:26
@friendlymatthew friendlymatthew marked this pull request as ready for review September 27, 2023 20:51
@friendlymatthew friendlymatthew changed the title markdown-xgettext - binary that converts markdown text to .po file Markdown -> .po binaries Sep 28, 2023
@friendlymatthew friendlymatthew changed the title Markdown -> .po binaries markdown file xgettext + gettext Sep 29, 2023
.gitignore Outdated Show resolved Hide resolved

let (input, content) =
ingest_input(matches.get_one::<String>("input_dir").map(|d| d.as_str()))?;
let lang_option = matches.get_one::<String>("language").unwrap().as_str();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make --lang optional? Simply use an empty string if it is not given. Then people can add a language by hand later in the generated POT file (if they like).

.arg(Arg::new("input_dir").short('f').long("file"))
.arg(Arg::new("language").short('l').long("lang").required(true))
.arg(Arg::new("pot_title").short('t').long("title"))
.arg(Arg::new("output_dir").short('o').long("out"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please align the options with xgettext from the Gettext utilities. We should only use a few options, I think something like

  • -o/--output: output file, if not given, we use stdout.
  • inputfile: this is not a flag (no - or --), this is simply the positional arguennts given after the flags. If none are given, we use stdin.

Actually, it seems like xgettext does not set a title or a lang in the header entry (see Filling in the Header Entry). So I would suggest leaving them blank as well: less code for us to maintain! 😄

Ok(catalog)
}

fn ingest_input(input_dir: Option<&str>) -> anyhow::Result<(&str, String)> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this function do the following:

  • read files given explicitly on the command line (even if they don't end in .md)
  • recursively read files from directories given on the command line (filtering for files matching *.md: https://docs.rs/glob/ can help with this)
  • if a file named - is given, or if there are no files listed, then read from stdin.

I think the function is almost doing this already, only the handling of multiple inputs and recursive reading of directories is missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the writeup for xgettext. (I'll put this in a readme once I get the green light @mgeisler)

md-extract

This tool extracts translatable text from a given Markdown file or stdin and outputs it to a .po file or stdout.

Usage

md-extract [inputfile] -o/--output

Arguments

  1. inputfile
    • a specified file path
    • if not provided, the program reads from stdin

Options

  1. -o/--output
    • specifies the output file name
    • if not provided, the program writes to stdout

Notes:

  • for simplicity sake, this tool doesn't handle multiple inputs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgeisler do you like this? Happy to implement once I get your approval :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a great start, thanks!

Looking at this again, I think we'll eventually want to shorten and simplify the names here. I'm okay with mdbook-xgettext and mdbook-gettext as plugins for mdbook, but these programs will be run by users on a command line. So they should probably be shorter and more clear. Perhaps md-extract and md-translate? We can move things around after we merge the PR, so don't worry about this too much right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense -- will update my files accordingly

}
}

fn build_path(output_dir: Option<&str>, pot_title: &str) -> anyhow::Result<PathBuf> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like this function to be simpler:

  • if output is "-", then write to stdout
  • otherwise write to the path given as output. People can create the necessary directories themselves.

Ok(path)
}

fn main() -> anyhow::Result<()> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit in doubt about the best command line interface for this...

One idea would be to allow

markdown-gettext xx.po foo.md -o foo-xx.md

which would translate foo.md using xx.po and write the output to foo-xx.md.

Here, foo.md could be left out or it could be - and then we read from stdin (like for markdown-xgettext below). Similar, -o can be left out or be - and then we write to stdout.

If there are multiple inputs, then I guess the user cannot write to stdout and must instead give an output directory:

markdown-gettext xx.po foo.md bar.md --output-dir xx

In this case it would be an error to attempt to read from stdin since we don't really know what to name the output file in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly, we want optional flags for output file title and directory.

Since markdown-gettext xx.po foo.md -o foo-xx.md, does this mean if one was to add an additional wef.md, would the command be: markdown-gettext xx.po foo.md wef.md -o foo-xx.md wef-xx.md?

Output directory makes sense to me 👍. I would just like some clarification on file naming conventions, especially when one wants to translate multiple files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly, we want optional flags for output file title and directory.

Yeah, I'm trying to figure out how we can model the behavior here to be similar to existing Unix commands.

We should also keep it simple! Less code for all of us to write and maintain 😄

I'm thinking of using cp as a basis:

  • cp can copy a single path to a new name or
  • cp can copy multiple path into a single directory

The paths which are copied can be files or directories (or a mix of both). If there is a directory, then you must add -r to do a recursive copy.

I now see

markdown-gettext xx.po source.md dest.md

as

cp source.md dest.md

but with the translation happening as well.

Similarly,

markdown-gettext xx.po foo.md bar.md dest/

as

cp foo.md bar.md dest/

Finally, the rule out -r should be implemented: if you want to recursively translate a directory, you have to add a -r flag.

This new thinking is different from what I had before where I thought we should use the flags from the GNU Gettext utilities. Back then, I hadn't realized that there isn't a command line program which mirrors what our markdown-gettext program will do 😄

Note that these are just ideas, please let me know if you have better ones!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgeisler I'll write up a doc that outlines what the CLI should do for both xgettext and gettext. Once we get everything to your liking, I'll change the binaries to follow those requirements!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great! That document could simultaneously become the manual for the tools, so your work here will be reused right away.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the writeup for gettext

markdown-gettext

This tool copies source paths to destination paths while performing translations.

Usage

For a single file translation:

md-gettext lang.po source.md dest.md

note for single file translations:

  • if source.md is not specified, the program reads from stdin
  • if dest.md is not specified, the program outputs to stdout

For multiple paths:

md-gettext lang.po foo.md bar.txt wef.md dest/

For directories, you must use the -r flag for a recursive translation:

md-gettext lang.po sourcedir/ destdir/ -r

Arguments

  1. xx.po

    • the translation file in .po format
  2. source.md foo.md bar.txt sourcedir/ etc...

    • source paths which can be a mix of files and directories
  3. dest.md, dest/

    • destination paths

Options

  1. -r
    • enables recursive translation for directories
    • must be added if you want to translate contents of a directory and its subdirectories

Please let me know what you think! @mgeisler @djmitche. I tried to be as simple as possible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @friendlymatthew, sorry for the slow reply. The above looks great, thank you very much for writing it up! I think it follow the behavior from the cp manual page closely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries @mgeisler, I appreciate the opportunity to work on this 😁.

When I have some time off from official work, I'll get on this ASAP

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create binaries to support stand-alone usage
3 participants