Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry on failed ICU data load, ignoring NODE_ICU_DATA or --icu-data-dir #32466

Open
srl295 opened this issue Mar 24, 2020 · 8 comments
Open

Retry on failed ICU data load, ignoring NODE_ICU_DATA or --icu-data-dir #32466

srl295 opened this issue Mar 24, 2020 · 8 comments
Assignees
Labels
cli Issues and PRs related to the Node.js command line interface. i18n-api Issues and PRs related to the i18n implementation.

Comments

@srl295
Copy link
Member

srl295 commented Mar 24, 2020

Related to #30825 (I had the idea during its discussion) and, from the icebox, #3460

Is your feature request related to a problem? Please describe.
Currently, Node with an invalid/missing ICU data directory ( set with NODE_ICU_DATA or --icu-data-dir ) just fails.

$ nvm i --lts
Installing latest LTS version.
v12.16.1 is already installed.
$ node --icu-data-dir=/tmp
node: could not initialize ICU (check NODE_ICU_DATA or --icu-data-dir parameters)

Describe the solution you'd like

It might be possible to fall back as if the --icu-data-dir was not specified (or the default data dir in #30825 was not present.)

As early as that error message is generated, it might be possible to call u_cleanup() to unload ICU and then reinitialize it using the baked-in data or other defaults.

This could be done completely within InitializeICUDirectory(), although it might be advantageous to allow programmatic detection of the fact that this fallback happened. Ideas there:

  1. expose const char *u_getDataDirectory() via the ICU process binding.
  2. provide the eventual path string via some kind of variable
  3. print out a warning (not a good idea)
  4. some other flag
  5. do nothing - this would probably be OK if there isn't a strong use case (other than perhaps test cases)

FYI @sam-github @sgallagher @nodejs/i18n-api

Describe alternatives you've considered
There's really no alternative to restarting the node process at this point.

@srl295 srl295 added the i18n-api Issues and PRs related to the i18n implementation. label Mar 24, 2020
@srl295 srl295 self-assigned this Mar 24, 2020
@srl295 srl295 added the cli Issues and PRs related to the Node.js command line interface. label Mar 24, 2020
@sgallagher
Copy link
Contributor

So, just to clarify, the issue here is when Node has been called with a specific --icu-data-dir argument or the NODE_ICU_DATA set in the environment and that directory is empty, nonexistent or otherwise inaccessible (e.g. permissions)?

It seems to me that this would then be a case of "operator error" and that reporting the error back to the user makes sense. Maybe it's appropriate just to make the error message more detailed, such as:

node: could not initialize ICU: NODE_ICU_DATA or --icu-data-dir specifies a path that does not contain valid ICU data files.

@srl295
Copy link
Member Author

srl295 commented Mar 24, 2020

So, just to clarify, the issue here is when Node has been called with a specific --icu-data-dir argument or the NODE_ICU_DATA set in the environment and that directory is empty, nonexistent or otherwise inaccessible (e.g. permissions)?

It could be several things: (and note that the "dir" is really a search path):

  • The path doesn't contain data of the correct endianness and matching ICU version
  • The path contains a data file, but it is somehow corrupt

It seems to me that this would then be a case of "operator error" and that reporting the error back to the user makes sense. Maybe it's appropriate just to make the error message more detailed, such as:

node: could not initialize ICU: NODE_ICU_DATA or --icu-data-dir specifies a path that does not contain valid ICU data files.

(This is probably a good change to make in the wording anyway.)

It could be classified as operator error. V8 certainly thinks so, and used to fail with fatal errors for related problems.

But it could also be the situation where the user has upgraded their Node version, or deployed the same directory to a different hardware architecture, etc. Now, Node.js as packaged by default, has a built in full data file. However, that data file is not on the "search path" if the user specifies one.

Put another way: You can't deploy Node.js unconditionally calling it with NODE_ICU_DATA presuming that you might install override files (timezone anyone?) or the like there at some future point. It's all or nothing. You either get:
a.) the baked-in data from Node.js build time
b.) you're fully responsible as the user for providing all data.

This PR provides an in-between. There could be additional controls, such as a build option to "not bake in data", or a runtime/build option to "fail if --icu-data-dir does not contain data", but I think for many users, this fallback might be an improvement.

A data point: 7,000 hits on SO for node: could not initialize ICU (check NODE_ICU_DATA or --icu-data-dir parameters — perhaps many of the users would be benefitted by "fail fast", but some might benefit from some kind of a warning. Perhaps require('full-icu').assertHaveData() or similar could actually do something, and fail if data isn't loaded?

@sgallagher
Copy link
Contributor

My concern here is violating the Principle of Least Surprise. If I specify --icu-data-dir or I set NODE_ICU_DATA in my environment, I've made a conscious choice that I expect to use the data from those directories. If Node.js silently decides "Oh, hey. That doesn't exist, so I'll use the fallback instead", it's going to surprise me. Worse, I may not even notice right away.

Maybe it would be better for Node.js upstream to behave the way the Fedora packaging does. Instead of building full-icu, we build small-icu and then ship the matching icuXXy.dat file in a well-known location and set the --with-icu-default-data-dir to that path. Absent a NODE_ICU_DATA or --icu-data-dir argument, it will use the data file. Then users could also elect for themselves to set NODE_ICU_DATA=/my/custom/path:/the/well/known/path if they want to fall back to the built-in ICU data or they can just specify NODE_ICU_DATA=/my/custom/path if they want it to fail outright.

@srl295
Copy link
Member Author

srl295 commented Mar 25, 2020

My concern here is violating the Principle of Least Surprise

👍 and I definitely hear that. That's why this isn't a PR.

we build small-icu and then ship the matching icuXXy.dat file in a well-known location

That was somewhat of the idea in #3460, the problem is that Node.js is shipped as a tarball, so a well-known location can be harder to come by.

Let me see if I can distill some thoughts and specific action items here :

A. First, I think the default of building full-icu for upstream is generally simpler. You can still override the data, but by default it just works. I can see the model you mentioned where Node.js has an 'installer' (dnf, Mac, nvm, brew), but it seems we are getting fewer issues after this change.

B. Perhaps we can reduce user surprise by making some things explicit:

  • The error message should improve. Perhaps it should even have a URL linking off to a short guide.
  • Documentation should improve to explain that NODE_ICU_DATA / --icu-data-dir is a search path and not a directory. (This was an oversight, probably going back to the ICU API that should have been u_setDataPath not u_setDataDirectory.)
  • Since ICU itself doesn't know about the Node-builtin data, (it usually has its own builtin data…) there's no "path" that specifies it. Perhaps a convention such as NODE_ICU_DATA=BUILTIN (which would be a no-op but would document) or NODE_ICU_DATA=/my/custom/path:/the/well/known/path:BUILTIN (it would have to be at the end) could explicitly request fallback to the builtin data. I could see a use case for this. Related to that, (per above) there's no way to say, "use the Node-builtin data, but override it with directory X."
  • From the user side, again as above, it could be good to give the user some API (perhaps on full-icu or on Node.js itself) to definitely assert that the runtime environment is as expected.

C. In rethinking through the ICU-builtin vs Node-builtin data, actually, if the Node-builtin data is the full set, it should really be linked into ICU as ICU-builtin. That would allow the partial override scenario I mentioned above. You couldn't add a locale, but you could override specific data items. (I'm not as sure this is a good idea, I think it might prevent wholesale replacement of the builtin data—because it does not work the way Node's data works today.)

@mhdawson
Copy link
Member

My current thinking is that I'd prefer that if we can't do what you ask it results in an error like it does now. In terms of being able to additionally specify to fall back I wonder if it is a common enough use case to warrant the extra work/code?

@srl295
Copy link
Member Author

srl295 commented Mar 25, 2020

My current thinking is that I'd prefer that if we can't do what you ask

Consider the scenario:

  • app is developed on, say, Node 10
  • app installs full-icu and sets the path
  • environment upgrades to Node 12. Full data is included, full-icu is no longer needed.
  • If full-icu is not updated, the app will still fail.

In any event, I think I am fine with closing this if it's going to cause more confusion than not.

The error message clearly needs work… Perhaps the error message could suggest removing the path if it is not needed (especially if baked in full data is detected)?

@sam-github
Copy link
Contributor

I haven't played with icu, so let me see if I understand:

  • app uses node10
  • it depends on full-icu, which is npm installed
  • using full-icu means running as node --icu-data-dir=node_modules/full-icu/... app.js
  • node is changed to 12
  • app is run with node12 and the same command line (probably from package.json's start:).

Do this, and the app fails, because full-icu installed with node10 isn't compatible with node12?

If that's the situation, it seems the same as node addons. Would the same solution, rebuilding, not work? You can't generally take an installed node_modules folder and keep it between node majors.

Or am I missing something more unique to icu that makes this worse than usual for users?

@mhdawson
Copy link
Member

@srl295 I like the idea of the error message for the case where the requested data cannot be found indicating that you may not need the command line option anymore for versions where the data is bundle in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli Issues and PRs related to the Node.js command line interface. i18n-api Issues and PRs related to the i18n implementation.
Projects
None yet
Development

No branches or pull requests

4 participants