Skip to content

Commit

Permalink
doc,deps: document how to maintain ICU in Node.js
Browse files Browse the repository at this point in the history
- update v8 guide to mention ICU
- move content from the tools/icu/README.md but leave a pointer

Fixes: #26108
Co-Authored-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
PR-URL: #30607

Reviewed-By: Ujjwal Sharma <usharma1998@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Myles Borins <myles.borins@gmail.com>
  • Loading branch information
srl295 authored and MylesBorins committed Nov 27, 2019
1 parent 7f94fe0 commit f36331c
Show file tree
Hide file tree
Showing 3 changed files with 255 additions and 115 deletions.
4 changes: 4 additions & 0 deletions doc/guides/maintaining-V8.md
Expand Up @@ -313,6 +313,10 @@ Node.js keeps a vendored copy of V8 inside of the deps/ directory. In addition,
Node.js may need to float patches that do not exist upstream. This means that
some care may need to be taken to update the vendored copy of V8.

V8 builds against the version of ICU supplied by Node.js,
see [maintaining-icu.md](./maintaining-icu.md) for special considerations.
Specifically, a V8 update may necessitate an ICU update.

### Minor updates (patch level)

Because there may be floating patches on the version of V8 in Node.js, it is
Expand Down
233 changes: 233 additions & 0 deletions doc/guides/maintaining-icu.md
@@ -0,0 +1,233 @@
# Maintaining ICU in Node.js

## Background

International Components for Unicode ([ICU4C][ICU]) is used both by V8
and also by Node.js directly to provide internationalization
functionality. To quote from icu-project.org:

> ICU is a mature, widely used set of C/C++ and Java libraries providing
> Unicode and Globalization support for software applications. ICU is
> widely portable and gives applications the same results on all platforms
> and between C/C++ and Java software.
## Data dependencies

ICU consumes and includes:

* Extracted locale data from [CLDR][]
* Extracted [Unicode][] data.
* Time zone ([tz][]) data

The current versions of these items can be viewed for node with `node -p process.versions`:

```shell
$ node -p process.versions

{
cldr: '35.1',
icu: '64.2',
tz: '2019a',
unicode: '12.1'
}
```

## Release Schedule

ICU typically has >1 release a year, particularly coinciding with a major
release of [Unicode][]. The current release schedule is available on the [ICU][]
website on the left sidebar.

### V8 depends on ICU

V8 will aggressively upgrade to a new ICU version, due to requirements for
features/bugfixes needed for [Ecma402][] support. The minimum required version
of ICU is specified within the V8 source tree. If the ICU version is too old,
V8 will not compile.

```c
// deps/v8/src/objects/intl-objects.h
#define V8_MINIMUM_ICU_VERSION 64
```
V8 in Node.js depends on the ICU version supplied by Node.js.
The file `tools/icu/icu_versions.json` contains the current minimum
version of ICU that Node.js is known to work with. This should be
_at least_ the same version as V8, so that users will find out
earlier that their ICU is too old. A test case validates this when
Node.js is built.
## How to upgrade ICU
* Make sure your Node.js workspace is clean (`git status`
should be sufficient).
* Configure Node.js with the specific [ICU version](http://icu-project.org/download)
you want to upgrade to, for example:
```shell
./configure \
--with-intl=full-icu \
--with-icu-source=http://download.icu-project.org/files/icu4c/58.1/icu4c-58_1-src.tgz
make
```

> _Note_ in theory, the equivalent `vcbuild.bat` commands should work also,
> but the commands below are makefile-centric.

* If there are ICU version-specific changes needed, you may need to make changes
in `tools/icu/icu-generic.gyp` or add patch files to `tools/icu/patches`.
* Specifically, look for the lists in `sources!` in the `tools/icu/icu-generic.gyp` for
files to exclude.

* Verify the Node.js build works:

```shell
make test-ci
```

Also running

<!-- eslint-disable strict -->

```js
new Intl.DateTimeFormat('es', { month: 'long' }).format(new Date(9E8));
```

…Should return `enero` not `January`.

* Now, copy `deps/icu` over to `deps/icu-small`

> :warning: Do not modify any source code in `deps/icu-small` !
> See section below about floating patches to ICU.
```shell
python tools/icu/shrink-icu-src.py
```

* Now, do a clean rebuild of Node.js to test:

```shell
make -k distclean
./configure
make
```

* Test this newly default-generated Node.js

<!-- eslint-disable strict -->

```js
process.versions.icu;
new Intl.DateTimeFormat('es', { month: 'long' }).format(new Date(9E8));
```

(This should print your updated ICU version number, and also `January` again.)

You are ready to check in the updated `deps/icu-small`. This is a big commit,
so make this a separate commit from the smaller changes.

> :warning: Do not modify any source code in `deps/icu-small` !
> See section below about floating patches to ICU.
* Now, rebuild the Node.js license.

```shell
# clean up - remove deps/icu
make clean
tools/license-builder.sh
```

* Update the URL and hash for the full ICU file in `tools/icu/current_ver.dep`.
It should match the ICU URL used in the first step. When this is done, the
following should build with small ICU.

```shell
# clean up
rm -rf out deps/icu deps/icu4c*
./configure --with-intl=small-icu --download=all
make
make test-ci
```

* commit the change to `tools/icu/current_ver.dep` and `LICENSE` files.

* Note: To simplify review, I often will “pre-land” this patch, meaning that
I run the patch through `curl -L https://github.com/nodejs/node/pull/xxx.patch
| git am -3 --whitespace=fix` per the collaborator’s guide… and then push that
patched branch into my PR's branch. This reduces the whitespace changes that
show up in the PR, since the final land will eliminate those anyway.

## Floating patches to ICU

Floating patches are applied at `configure` time. The "patch" files
are used instead of the original source files. The patch files are
complete `.cpp` files replacing the original contents.

Patches are tied to a specific ICU version. They won’t apply to a
future ICU version. We assume that you filed a bug against [ICU][] and
upstreamed the fix, so the patch won’t be needed in a later ICU
version.

### Example

For example, to patch `source/tools/toolutil/pkg_genc.cpp` for
ICU version 63:

```shell
# go to your Node.js source directory
cd <node>

# create the floating patch directory
mkdir -p tools/icu/patches/63

# create the subdirectory for the file(s) to patch:
mkdir -p tools/icu/patches/63/source/tools/toolutil/

# copy the file to patch
cp deps/icu-small/source/tools/toolutil/pkg_genc.cpp \
tools/icu/patches/63/source/tools/toolutil/pkg_genc.cpp

# Make any changes to this file:
(edit tools/icu/patches/63/source/tools/toolutil/pkg_genc.cpp )

# test
make clean && ./configure && make
```

You should see a message such as:

```shell
INFO: Using floating patch "tools/icu/patches/63/source/tools/toolutil/pkg_genc.cpp" from "tools/icu"
```

### Clean Up

Any patches older than the minimum version given in `tools/icu/icu_versions.json`
ought to be deleted, because they will never be used.

### Why not just modify the ICU source directly?

Especially given the V8 dependencies above, there may be times when a floating
patch to ICU is required. Though it seems expedient to simply change a file in
`deps/icu-small`, this is not the right approach for the following reasons:

1. **Repeatability.** Given the complexity of merging in a new ICU version,
following the steps above in the prior section of this document ought to be
repeatable without concern for overriding a patch.

2. **Verifiability.** Given the number of files modified in an ICU PR,
a floating patch could easily be missed — or dropped altogether next time
something is landed.

3. **Compatibility.** There are a number of ways that ICU can be loaded into
Node.js (see the top level README.md). Only modifying `icu-small` would cause
the patch not to be landed in case the user specifies the ICU source code
another way.

[ICU]: http://icu-project.org
[Unicode]: https://unicode.org
[tz]: https://www.iana.org/time-zones
[CLDR]: https://unicode.org/cldr
[Ecma402]: https://github.com/tc39/ecma402
133 changes: 18 additions & 115 deletions tools/icu/README.md
@@ -1,8 +1,9 @@
# Notes about the `tools/icu` subdirectory

This directory contains tools, data, and information about the [ICU](http://icu-project.org)
(International Components for Unicode) integration. ICU is used to provide
internationalization functionality.
This directory contains tools, data, and information about the
International Components for Unicode integration. [ICU][] is used
both by V8 and also by
Node.js itself to provide internationalization functionality.

* `patches/` are one-off patches, actually entire source file replacements,
organized by ICU version number.
Expand All @@ -18,119 +19,21 @@ internationalization functionality.
* `README.md` — you are here
* `shrink-icu-src.py` — this is used during upgrade (see guide below)

## How to upgrade ICU
Note:
> The files in this directory were written for the Node.js v0.12 effort.
> The original intent was to merge the tools such as `icutrim.py` and `iculslocs.cc`
> back into ICU. ICU has gained its own “data slicer” tool.
> There is an issue open, https://github.com/nodejs/node/issues/25136
> for replacing `icutrim.py` with the [ICU data slicer][].
* Make sure your Node.js workspace is clean (clean `git status`) should be
sufficient.
* Configure Node.js with the specific [ICU version](http://icu-project.org/download)
you want to upgrade to, for example:
## See Also

```shell
./configure \
--with-intl=full-icu \
--with-icu-source=http://download.icu-project.org/files/icu4c/58.1/icu4c-58_1-src.tgz
make
```
* [docs/guides/maintaining-icu.md](../../doc/guides/maintaining-icu.md) for
information on maintaining ICU in Node.js

> _Note_ in theory, the equivalent `vcbuild.bat` commands should work also,
> but the commands below are makefile-centric.
* [docs/api/intl.md](../../doc/api/intl.md) for information on the
internationalization-related APIs in Node.js
* [The ICU Homepage][ICU]

* If there are ICU version-specific changes needed, you may need to make changes
in `icu-generic.gyp` or add patch files to `tools/icu/patches`.
* Specifically, look for the lists in `sources!` in the `icu-generic.gyp` for
files to exclude.

* Verify the Node.js build works:

```shell
make test-ci
```

Also running

<!-- eslint-disable strict -->

```js
new Intl.DateTimeFormat('es', {month: 'long'}).format(new Date(9E8));
```

…Should return `enero` not `January`.

* Now, copy `deps/icu` over to `deps/icu-small`

```shell
python tools/icu/shrink-icu-src.py
```

* Now, do a clean rebuild of Node.js to test:

```shell
make -k distclean
./configure
make
```

* Test this newly default-generated Node.js

<!-- eslint-disable strict -->

```js
process.versions.icu;
new Intl.DateTimeFormat('es', {month: 'long'}).format(new Date(9E8));
```

(This should print your updated ICU version number, and also `January` again.)

You are ready to check in the updated `deps/small-icu`. This is a big commit,
so make this a separate commit from the smaller changes.

* Now, rebuild the Node.js license.

```shell
# clean up - remove deps/icu
make clean
tools/license-builder.sh
```

* Update the URL and hash for the full ICU file in `tools/icu/current_ver.dep`.
It should match the ICU URL used in the first step. When this is done, the
following should build with small ICU.

```shell
# clean up
rm -rf out deps/icu deps/icu4c*
./configure --with-intl=small-icu --download=all
make
make test-ci
```

* commit the change to `tools/icu/current_ver.dep` and `LICENSE` files.

* Note: To simplify review, I often will “pre-land” this patch, meaning that
I run the patch through `curl -L https://github.com/nodejs/node/pull/xxx.patch
| git am -3 --whitespace=fix` per the collaborator’s guide… and then push that
patched branch into my PR's branch. This reduces the whitespace changes that
show up in the PR, since the final land will eliminate those anyway.

-----

## Postscript about the tools

The files in this directory were written for the Node.js effort.
It was the intent of their author (Steven R. Loomis / srl295) to
merge them upstream into ICU, pending much discussion within the
ICU-TC.

`icu_small.json` is somewhat node-specific as it specifies a "small ICU"
configuration file for the `icutrim.py` script. `icutrim.py` and
`iculslocs.cpp` may themselves be superseded by components built into
ICU in the future. As of this writing, however, the tools are separate
entities within Node.js, although theyare being scrutinized by interested
members of the ICU-TC. The “upstream” ICU bugs are given below.

* [#10919](http://bugs.icu-project.org/trac/ticket/10919)
(experimental branch - may copy any source patches here)
* [#10922](http://bugs.icu-project.org/trac/ticket/10922)
(data packaging improvements)
* [#10923](http://bugs.icu-project.org/trac/ticket/10923)
(rewrite data building in python)
[ICU data slicer]: https://github.com/unicode-org/icu/blob/master/docs/userguide/icu_data/buildtool.md
[ICU]: http://icu-project.org

0 comments on commit f36331c

Please sign in to comment.