Support all Unicode Versions #23

jquast · 2017-03-10T19:00:35Z

Support all versions of Unicode, using the UNICODE_VERSION environment variable, when defined, or, for non-shells, explicitly by passing argument unicode_version to the wcwidth family of functions.

A demonstration utility that determines the Terminal's Unicode Version is made available as a separate package, https://github.com/jquast/ucs-detect/ which contains a Problem and Solution statement, copied here:

Problem

Chinese, Japanese, Korean, and Emoticon characters are "double-wide", occupying 2 cells, instead of 1, and some other special characters are "zero-width".

Any terminal application that formats and displays these characters may have trouble determining how they will be displayed to the end user.

This problem happens often because the Unicode Consortium releases new versions of the Unicode Standard periodically, but the source code of libraries and applications is not updated at the same time, or at all!

Many languages and libraries continue to conform only to Unicode 5.0, the version implemented by the last wcwidth.c, released by Markus Kuhn in 2007.

Solution

The most important factor is to determine: what version of Unicode is the terminal emulator using?

This program, ucs-detect, is able to automatically detect the version of Unicode that the connecting terminal supports. The python wcwidth library supports all Unicode versions, 4.1.0 through 12.1.0 at the time of this writing, and so it is able to select and match the correct return value by using the given value of the UNICODE_VERSION environment variable.
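As a rough illustration of "select and match", the sketch below maps a requested version string onto the nearest supported table version at or below it. The function names, the short `SUPPORTED` list, and the round-down rule are all assumptions made for this example; the `_wcmatch_version` function in this PR may behave differently.

```python
def parse_version(text):
    """Turn '9.0.0' or '9' into a comparable tuple such as (9, 0, 0)."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad short forms like '9' or '9.0'
    return tuple(parts)


# Hypothetical subset of supported table versions, in ascending order.
SUPPORTED = ("4.1.0", "5.0.0", "9.0.0", "12.1.0")


def match_version(given, supported=SUPPORTED):
    """Return the highest supported version that is <= the given version.

    Assumption for this sketch: a request older than every table falls
    back to the oldest table rather than raising an error.
    """
    wanted = parse_version(given)
    best = supported[0]
    for candidate in supported:
        if parse_version(candidate) <= wanted:
            best = candidate
    return best
```

With this rule, a request for "10.0.0" would use the 9.0.0 tables from the hypothetical list, and a request newer than every table would use the newest one. A non-numeric string raises ValueError from int(), which a real implementation would probably want to handle more gracefully.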

NOT FINISHED, DO NOT EXECUTE :)

jayvdb · 2019-10-02T07:22:28Z

Hi @jquast , I am wondering if you might be interested in the "multiple unicodedata versions" problem being solved in a separate library. I created fonttools/unicodedata2#28 about this, as I know that project is already partially solving that problem.

jquast · 2020-03-01T02:08:19Z

More than anything, I've been mulling over the idea, "How best should users select their unicode version support level?"

And recently, woah! iTerm2 supports a way to switch versions, see "Unicode Version" in https://iterm2.com/documentation-escape-codes.html

And I think I can devise a way to determine the supported version by introspection of the terminal: display one double-width character that is new for each Unicode support level, and use report-cursor-position to determine what support level the connected terminal is at.
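A minimal sketch of that probing idea follows. It is not the ucs-detect implementation. The `write` and `read` callables, the function names, and the "8.0.0" fallback are hypothetical, and the probe characters are emoji believed to have been introduced in the listed Unicode versions, which should be verified against the Unicode data files before relying on them.

```python
import re

# Cursor Position Report: a terminal answers ESC [ 6 n with ESC [ row ; col R.
_CPR = re.compile(r"\x1b\[(\d+);(\d+)R")

# Illustrative probes in ascending version order (an assumption to verify):
# one wide character believed to be new in each Unicode version.
PROBES = (
    ("9.0.0", "\U0001F923"),
    ("10.0.0", "\U0001F929"),
    ("11.0.0", "\U0001F970"),
    ("12.0.0", "\U0001F971"),
)


def parse_cursor_report(response):
    """Extract (row, col), both 1-based, from a cursor position report."""
    match = _CPR.search(response)
    if match is None:
        raise ValueError("no cursor position report in %r" % (response,))
    return int(match.group(1)), int(match.group(2))


def measure_width(char, write, read):
    """Print char at column 1, then ask the terminal where the cursor is."""
    write("\r" + char + "\x1b[6n")
    _row, col = parse_cursor_report(read())
    return col - 1  # cells advanced from the first column


def detect_level(write, read, probes=PROBES, default="8.0.0"):
    """Return the last probed version whose character advanced 2 cells."""
    level = default
    for version, char in probes:
        if measure_width(char, write, read) != 2:
            break
        level = version
    return level
```

In real use, `write` and `read` would wrap a terminal in raw mode with a read timeout, since not every terminal answers the report request. Passing them in as parameters keeps the logic testable against a fake terminal.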

So, in the years since I first developed wcwidth for python, there have been some enhancements to the general ecosystem for determining or setting the version level, but nothing particularly universal or portable/common.

I waited for a few years to add 24-bit color support for https://github.com/jquast/blessed because there was no way to determine whether the terminal would support it, and I couldn't decide how to expose an easy API to select 24-bit color support. Over the years, all terminals implementing 24-bit colors added a COLORTERM environment variable definition to announce their support, http://jdebp.eu/Softwares/nosh/guide/TerminalCapabilities.html

So now the code is perfectly clear and straightforward for me as a library author, and downstream applications, and even users, do not have to specify this terminal support level. Even existing applications that use the library can support 24-bit colors without changes by users or application developers.
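For comparison, the COLORTERM convention mentioned above can be checked in a few lines. This is a sketch of the general convention, not the code in blessed; the function name is hypothetical, and it assumes the commonly advertised values "truecolor" and "24bit".

```python
import os


def supports_truecolor(environ=None):
    """Return True when COLORTERM advertises 24-bit color support.

    Assumption: the terminal exports COLORTERM as 'truecolor' or '24bit'.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("COLORTERM", "").strip().lower()
    return value in ("truecolor", "24bit")
```

The appeal of this approach is that the terminal announces its own capability, so neither the library nor the application needs a configuration option.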

So anyway, I do think an environment variable is the best way to go, at least from a terminal support level perspective.

jquast · 2017-10-22T21:29:51Z

bin/wcwidth-browser.py

@@ -92,26 +97,48 @@ def flushout():
    assert 'narrow Python build' in err.args[0], err.args
    LIMIT_UCS = 0x10000

-#: printable length of highest unicode character
+#: printable length of highest unicode character description


Mistaken comment, revert

jquast · 2017-10-22T21:30:18Z

bin/wcwidth-browser.py

+        if inp.code == term.KEY_ENTER:
+            break
+        elif inp.code == term.KEY_ESCAPE or inp == chr(3):
+            text = None


Should not return None

jquast · 2017-10-22T21:31:55Z

bin/wcwidth-browser.py

+        for version, boundaries in ZERO_WIDTH.items():
+            for (begin, end) in boundaries:
+                if version == _wcmatch_version(unicode_version):
+                    for val in [_val for _val in


jquast · 2017-10-22T21:33:16Z

docs/api.rst

+
+.. autofunction:: wcwidth._get_package_version
+
+.. autofunction:: wcwidth._wcmatch_version


Duplicate. Should make function public.

jquast · 2020-03-01T02:39:56Z

Documentation for the wcwidth and wcswidth functions will show a default value of None for the unicode_version argument, which means that the value of the environment variable UNICODE_VERSION will be used, or 8 if unspecified.

And in the README, we will be clear to spell out this transitional time of terminal support, and how to set the environment variable for version level 9, if you like, for terminals like iTerm2, to see the results magically appear in any downstream programs like bpython without changes.

And that's the real goal here: if terminal applications or power users can start exporting this variable, we can have a language-independent solution for Unicode version level selection.
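The precedence described in this comment (explicit argument first, then UNICODE_VERSION, then 8) could be sketched as below. The helper name and the `environ` parameter are hypothetical and exist only to make the rule explicit and testable; this is not the function shipped in the package.

```python
import os

# Fallback level named in this comment, used when nothing else is given.
DEFAULT_UNICODE_VERSION = "8"


def resolve_unicode_version(unicode_version=None, environ=None):
    """Pick the Unicode version level to use for width lookups.

    Order of precedence: explicit argument, then the UNICODE_VERSION
    environment variable, then the default. An empty variable is treated
    as unset (an assumption of this sketch).
    """
    environ = os.environ if environ is None else environ
    if unicode_version is not None:
        return str(unicode_version)
    return environ.get("UNICODE_VERSION") or DEFAULT_UNICODE_VERSION
```

Under this rule, a user who exports UNICODE_VERSION=9 in their shell profile would change the result for every downstream program that leaves the argument at None, which is the language-independent behavior described above.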

codecov · 2020-06-01T15:29:03Z

Codecov Report

❗ No coverage uploaded for pull request base (master@ce8acd8).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master      #23   +/-   ##
=========================================
  Coverage          ?   97.84%           
=========================================
  Files             ?        3           
  Lines             ?       93           
  Branches          ?       18           
=========================================
  Hits              ?       91           
  Misses            ?        1           
  Partials          ?        1

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ce8acd8...5177f98.

Jeff Quast and others added 12 commits August 29, 2016 09:06

Intermittent commit: towards unicode version levels

dfeca2f

NOT FINISHED, DO NOT EXECUTE :)

Some-commit!

59912d0

intermittent

2109a32

Some some some

ad69737

SOmx

8c9b336

xnay on versioney

cca794c

nayer

757e12f

rolling up

e8e8656

meh

b53b4a6

sort unicode tables in ascending order!

c96cb3d

some effort towards next minor release

59d14f9

ready for CI by disabling lots of things

c6b82c1

jquast changed the title ~~draft: towards unicode version levels~~ review: towards unicode version levels Oct 22, 2017

jquast mentioned this pull request Oct 22, 2017

Wrong width for emoji chars #19

Closed

jayvdb mentioned this pull request Oct 2, 2019

Add old tables fonttools/unicodedata2#28

Open

jquast mentioned this pull request Oct 22, 2019

some emojis return width of 1 #27

Closed

jquast commented Mar 1, 2020

jquast added 10 commits May 29, 2020 02:02

Getting therey

0327f6b

iterating priority tree

c2e7ecf

indexing associative processes

f2ffaf4

reallocating circular heaps

70503c7

setting root arrays

e4bd85d

recalibrating associative map structures

5f2a97d

reallocating binomial callbacks

e1f2fa8

building multi-dimensional stackframes

50e5032

iterating lookup procedures

f1bdb18

searching bit processes

d73e488

jquast added 10 commits June 1, 2020 10:55

refactoring cartesian map structures

38cd149

setting multi-dimensional macros

b154cce

searching cyclic structures

81c7d13

positioning first-order tables

72d76a3

iterating in-order jobs

b9b6152

iterating root namespaces

e44acae

reallocating inverse sectors

c9eda9b

allocating unique tables

837b261

setting parallel map structures

f086e38

freeing b-tree callbacks

ae80724

jquast added 3 commits June 1, 2020 11:35

Merge remote-tracking branch 'origin/master' into towards-versioned-u…

1138ba8

…nicodes

configuring cartesian queues

9fa147c

building multi-dimensional graph structures

b1f968e

jquast mentioned this pull request Jun 1, 2020

Optimize wcwidth() #35

Merged

jquast added 8 commits June 1, 2020 11:42

preparing cartesian procedures

83e47bb

reallocating acyclic graph structures

797e735

parsing trinomial jobs

105cd2d

recalibrating storage stackframes

a277ba6

reordering cartesian tree

1468577

reordering threaded graph structures

0a14ca6

modeling bi-directional splines

92212d7

initializing decision macros

0833758

jquast changed the title ~~review: towards unicode version levels~~ Support *all* Unicode Versions Jun 1, 2020

jquast changed the title ~~Support *all* Unicode Versions~~ Support all Unicode Versions Jun 1, 2020

jquast added 2 commits June 1, 2020 12:27

configuring compute queues

edacd4c

testing crypto jobs

5177f98

jquast merged commit 16a762f into master Jun 1, 2020

jquast deleted the towards-versioned-unicodes branch June 1, 2020 16:48

laixintao mentioned this pull request Jun 23, 2020

upgrade denpendency, fix binary build laixintao/iredis#350

Merged
