Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wc_rjust() doesn't work for non-printables #54

Open
wjandrea opened this issue Aug 14, 2021 · 4 comments
Open

wc_rjust() doesn't work for non-printables #54

wjandrea opened this issue Aug 14, 2021 · 4 comments

Comments

@wjandrea
Copy link

Firstly, this isn't a problem for me, I just wanted to let you know about it.

Using wc_rjust() from the readme, if text contains any non-printable characters, the result is longer than length, which should never happen.

For example, '\n' is non-printable:

>>> wc_rjust('\n', 2, '.')
'...\n'

For reference, the width-naive version:

>>> '\n'.rjust(2, '.')
'.\n'

The problem is because of the math here:

max(0, (length - wcswidth(text)))

If wcswidth(text) is negative, the max is length + 1.

The simple solution is to just add a note in the readme warning about this situation, but if you wanted, you could expand the function to raise an error:

>>> def wc_rjust(text, length, padding=' '):
...    from wcwidth import wcswidth
...    width = wcswidth(text)
...    if width < 0:
...        raise ValueError('text contains non-printable characters')
...    return padding * max(0, (length - width)) + text
...
>>> wc_rjust('\n', 2, '.')
Traceback (most recent call last):
  ...
ValueError: text contains non-printable characters

Ultimately, it seems like the problem is using -1 as a sentinel return value instead of raising an error, but it looks like that's inherited from the C function, so fixing that would be a lot of work.

@jquast
Copy link
Owner

jquast commented Aug 20, 2021

I was thinking of a new function wcwidth.length() in next release that just "does its best", but never returns -1. In this case, string containing \n would measure as 0, not -1, and not raise any exception. What do you think of that?

@jquast
Copy link
Owner

jquast commented Oct 19, 2023

Issue #79 proposes new width() function

@jquast
Copy link
Owner

jquast commented Oct 30, 2023

See also #93 to just offer a proper rjust/etc. as part of the API

@jquast jquast added the bug label Nov 7, 2023
@jquast
Copy link
Owner

jquast commented Dec 15, 2023

I just want to also add that this cannot be fixed in the wcwidth() and wcswidth() functions, as they intend to exactly match function signature and behavior of the POSIX functions.

The reason that C0 and C1 control characters return -1, is that the intended application, a terminal emulator especially, should handle these characters in a stream and remove them from the string before passing on to wcswidth. Especially items like \n, \b, and \t. They become complicated, it depends on the current position of the cursor, and also terminal settings, for example \b can wrap to previous row if it is located at column 0, and the number of spaces incurred by '\t' are dependent on the tab stop setting and the current cursor position. C1 characters like '\x1b' may begin a terminal escape sequence, and that too should be processed before sending to wcswidth, etc.

So I won't really consider this a bug, but a feature request for an alternate function that simply ignores these things. Although these characters should not be sent to wcswidth, developers are free to have them silently ignored. I would suggest all control characters have width of 0 for a new function, wcwidth.width

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants