Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addressing problems raised in sphinx_issues #10104 #10379

Closed
wants to merge 2 commits into from

Conversation

hoangduytranuk
Copy link

Subject: Solving problems raised in issues #10104

  • Breaking changes: master
  • Feature

Purpose

  • Allowing message locations to be sorted uniquely in alphabetic order. This will remove duplicates locations and enable easier search and identify locations for developers

Detail

  • remove duplicates locations
  • enabled easier search and identify locations for developers

Relates

Copy link
Member

@AA-Turner AA-Turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments:

A

@@ -32,7 +32,7 @@ class Message:
"""An entry of translatable message."""
def __init__(self, text: str, locations: List[Tuple[str, int]], uuids: List[str]):
self.text = text
self.locations = locations
self.locations = list(set(locations))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use sorted here -- set doesn't guarantee any sort order at a language level.

Suggested change
self.locations = list(set(locations))
self.locations = sorted({*locations})

Comment on lines 59 to 60
for message in self.messages:
positions = [(source, line) for source, line, uuid in self.metadata[message]]
positions = [(os.path.relpath(source, start=os.getcwd()), line) for source, line, uuid in self.metadata[message]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would call os.getcwd() for every item in the loop. Does ommitting start= still give a usable result? Otherwise you could use cache os.getcwd() in a local.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understandable change.

But this should be moved to the outside of this class because this class does not have appropriate directory information. I think what directory you want is the base directory of the document (a.k.a srcdir). But it seems you're using os.getcwd(). But it does not equal to srcdir. It usually equals to the confdir that contains conf.py, but not promised. It sometimes returns a different directory if 3rd party extension changes it.

BTW, please use sphinx.util.osutil:relpath() instead of os.path.relpath(). It will not be crashed even if path and start are on different drives on Windows.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the issue #10104 after the code reviews from Adam Turner and Takeshi Komiya:

  • Added the class variable 'builder' to Catalog class, so any instances of Catalog can access the information within the builder, including things such as app, config, srcdir, outdir etc.. see MessageCatalogBuilder for details.
  • Initialise this Catalog.builder as the instance of MessageCatalogBuilder is created, ie. in the init()
  • As Message class is created, list of locations passing to it will be made singular, using set(), sorted alphabetically for easier searching and identifying origins where the message can be found.
  • source dir is used within the iter of the Catalog, using sphinx.util.osutil.relpath, for safer functionality, to work out the relative path for each location entry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the class variable 'builder' to Catalog class

Modifying global state like this is a bad idea, please could you refactor?

A

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, adding a class variable is not a good idea. Catalog class is a data class that represents message catalog. It is not related to the builder at all. So the path manipulation using builder should be called outside of the catalog class. How about processing just before calling GettextRenderer?

@@ -1,5 +1,5 @@
"""The MessageCatalogBuilder class."""

import os
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep a blank line here

Suggested change
import os
import os

@tk0miya tk0miya added this to the 5.0.0 milestone Apr 24, 2022
@hoangduytranuk
Copy link
Author

hoangduytranuk commented Apr 24, 2022 via email

…ner and Takeshi Komiya:

- Added the class variable 'builder' to Catalog class, so any instances of Catalog can access the information within the builder, including things such as app, config, srcdir, outdir etc.. see MessageCatalogBuilder for details.
- Initialise this Catalog.builder as the instance of MessageCatalogBuilder is created, ie. in the init()
- As Message class is created, list of locations passing to it will be made singular, using set(), sorted alphabetically for easier searching and identifying origins where the message can be found.
- source dir is used within the __iter__ of the Catalog, using sphinx.util.osutil.relpath, for safer functionality, to work out the relative path for each location entry.
@@ -32,13 +32,14 @@ class Message:
"""An entry of translatable message."""
def __init__(self, text: str, locations: List[Tuple[str, int]], uuids: List[str]):
self.text = text
self.locations = locations
self.locations = list(sorted(list(set(locations)))) #make list unique and sorted alphabetically
Copy link
Member

@AA-Turner AA-Turner Apr 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorted returns a list and can take a set as input

Suggested change
self.locations = list(sorted(list(set(locations)))) #make list unique and sorted alphabetically
self.locations = sorted(set(locations)) # sort and de-duplicate the locations list

@tk0miya tk0miya modified the milestones: 5.0.0, 5.x May 7, 2022
@tk0miya tk0miya modified the milestones: 5.x, 5.1.0 May 15, 2022
@tk0miya tk0miya modified the milestones: 5.1.0, 5.0.0 May 22, 2022
@tk0miya
Copy link
Member

tk0miya commented May 26, 2022

Finally, I reimplemented this as #10104 and merged it. Thank you for your contribution!

@tk0miya tk0miya closed this May 26, 2022
@ghost
Copy link

ghost commented Jun 4, 2022

I would like to add a little comment as I only found out today after using

catalog.dump_po(out_path, data, line_width=4069, sort_output=self.is_sorted)
I found that user comments are repeated and I had to enter the following block of code to change the behaviour:

def _write_comment(comment, prefix=''):
        # NEVER wrap comments, this observation: "xgettext always wraps comments even if --no-wrap is passed;" is FALSE. There seemed to be a bug in the xgettext code, because wrapping doesn't always occur
        # Make sure comments are unique and sorted alphabetically so locations can be easily searched and identify
        has_comment = (bool(comment) and len(comment) > 0)
        if not has_comment:
            return

        is_list_type = isinstance(comment, list)
        comment_list = (list(comment) if not is_list_type else comment)
        comment_set_list = list(set(comment_list))
        comment_set_list.sort()

        for line in comment_set_list:
            _write('#%s %s\n' % (prefix, line.strip()))

to remove duplicated user comments

@ghost
Copy link

ghost commented Jun 5, 2022

Sorry the above solution is NOT perfect yet. I have just found another incident where the

message.user_comments 
message.auto_comments

are duplicated, by running

msgmerge vi.po inkscape.pot -o test_merge.po

these two files are in the following links:

https://gitlab.com/inkscape/inkscape/-/jobs/2544777172/artifacts/file/po/inkscape.pot
https://gitlab.com/hoangduytran1960/inkscape/-/blob/master/po/vi.po

The output file

test_merge.po

The using the Sphinx code to flat out the comment sections and msgid, msgstr using:

data = c.load_po(self.po_path)
c.dump_po(out_path, data, line_width=4069, sort_output=self.is_sorted)

The result file shown this line is repeated twice:

# TRANSLATORS: "Swatches" -> color samples

I have to use the PyCharmIDE to debug and found out, so the correction code is this:

sort_by = None
    if sort_output:
        sort_by = "message"
    elif sort_by_file:
        sort_by = "location"

    for message in _sort_messages(catalog, sort_by=sort_by):
        if not message.id:  # This is the header "message"
            if omit_header:
                continue
            comment_header = catalog.header_comment
            if width and width > 0:
                lines = []
                for line in comment_header.splitlines():
                    lines += wraptext(line, width=width,
                                      subsequent_indent='# ')
                comment_header = u'\n'.join(lines)
            _write(comment_header + u'\n')

        **comment_list = []
        comment_list.extend(message.user_comments)
        comment_list.extend(message.auto_comments)
        comment_list_unique = list(set(comment_list))
        comment_list_unique.sort()**

        _write_comment(comment_list_unique)

the section surrounded by '**". This is an extraction from the file

site-packages/babel/messages/pofile.py

routine:

def _write_message(message, prefix=''):

Ok, I'm out.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants