
ansible: use gcc 4.9 on CentOS 6 #809

Merged: 1 commit merged into nodejs:master from seishun:centos6-gcc49 on Nov 13, 2017

Conversation

@seishun (Contributor) commented Jul 22, 2017

Currently CentOS 6 machines use gcc 4.8 while Node.js requires "gcc and g++ 4.9.4 or newer". See #762.

devtoolset-3 has gcc 4.9.1, too old.
devtoolset-5 has "gcc (GCC) 5.3.1 20160406"; that snapshot still predates the gcc 4.9.4 release.
devtoolset-6 has "gcc (GCC) 6.3.1 20170216", good enough.

The CERN repo doesn't have devtoolset newer than 2, so I replaced that with Software Collections.
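
For anyone reproducing this setup by hand, a minimal sketch of installing devtoolset-6 from Software Collections on CentOS 6 (standard SCL package names; not the exact ansible change in this PR):

    # hedged sketch: install devtoolset-6 from Software Collections on CentOS 6
    yum install -y centos-release-scl        # adds the SCL repos
    yum install -y devtoolset-6-gcc devtoolset-6-gcc-c++
    scl enable devtoolset-6 bash             # shell with gcc/g++ 6.3.1 on PATH
    gcc --version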

@rvagg (Member) commented Aug 11, 2017

Heads-up that we're dealing with a glibc bug on CentOS 6 for the 8.x builds in the Linux distributions (nodesource/distributions#513); waiting for Chris to shave that yak so we can understand more.

@chrislea

Here is what I know thus far:

If you compile 8.3.0 on CentOS 6 with gcc = 4.8.2 from devtoolset-2, the binary works fine.

If you compile 8.3.0 on CentOS 6 with gcc = 4.9.2 from devtoolset-3, the binary does not work and gives the error:

./node: error while loading shared libraries: cannot allocate memory in static TLS block

You see the same bad behavior if you use the compilers from devtoolset-4 or devtoolset-6. The issue is constrained specifically to CentOS 6. CentOS 7 does not have this problem, nor does any version of Fedora, Debian or Ubuntu that we tend to care about.
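
A rough sketch of the failing build as described, assuming devtoolset-3 was installed via SCL and a node 8.3.0 source tree (flags illustrative):

    # hedged sketch: reproduce the TLS failure with gcc 4.9.2 on CentOS 6
    scl enable devtoolset-3 bash
    gcc --version        # gcc (GCC) 4.9.2 ...
    ./configure && make -j4
    ./node --version     # fails: cannot allocate memory in static TLS block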

The error appears to be due to glibc bug 14898 (https://sourceware.org/bugzilla/show_bug.cgi?id=14898), which wasn't actually fixed until a newer version of glibc, meaning that fix is not at this point going to make it back to CentOS 6.

This is problematic as the compiler requirement recently moved to "4.9.x" basically. So the options seem to be:

  1. Lower the compiler requirement back down to 4.8.x. I'm not sure this will remain technically possible going forward if V8 starts using language features that gcc 4.8.x doesn't have.
  2. Stop supporting CentOS 6. Rather undesirable considering the number of enterprise deployments still using it.
  3. Some technical (or other) fix for this that's past my understanding.

Happy to provide more info if I have it, just let me know.

@seishun (Contributor, Author) commented Aug 11, 2017

Lower the compiler requirement back down to 4.8.x.

In other words, continue working around old compiler bugs because of a glibc bug that was itself fixed 4.5 years ago.

I'd say we should just raise the minimum supported glibc version to 2.17 (which came out on 25 Dec 2012). Enterprises still using CentOS 6 would need to upgrade to CentOS 7 or get a newer version of glibc.
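
(For context, checking which glibc a system ships is trivial; on CentOS 6 both of these report 2.12:)

    # check the installed glibc version
    ldd --version | head -n1     # ldd ships with glibc and reports its version
    getconf GNU_LIBC_VERSION     # e.g. "glibc 2.12" on CentOS 6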

cc @nodejs/build

@chrislea

I'd say we should just raise the minimum supported glibc version to 2.17 (which came out on 25 Dec 2012). Enterprises still using CentOS 6 would need to upgrade or get a newer version of glibc.

Just speaking from experience, that won't happen. If we do that, we should understand that we are dropping support for CentOS 6 / RHEL 6.

I'm not saying that to imply it's the wrong course of action, but we shouldn't kid ourselves about the consequences.

@gibfahn (Member) commented Aug 11, 2017

Enterprises still using CentOS 6 would need to upgrade or get a newer version of glibc.

AIUI successfully upgrading glibc is basically impossible (or at least not something you'd ever contemplate for a production system).

@seishun (Contributor, Author) commented Aug 11, 2017

It's possible to install a newer version in parallel: https://unix.stackexchange.com/a/299665/164374
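
(Concretely, using a parallel glibc usually means invoking its dynamic loader directly rather than replacing the system one; a sketch assuming an install prefix of /opt/glibc-2.17:)

    # hedged sketch: run node against a parallel glibc, system glibc untouched
    /opt/glibc-2.17/lib/ld-linux-x86-64.so.2 \
        --library-path /opt/glibc-2.17/lib:/usr/lib64:/lib64 \
        ./node --version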

@chrislea commented Aug 11, 2017

@seishun It's possible yes, but that seems like a really bad thing for the Node Foundation to suggest for end users for several reasons.

  1. As soon as you do that, you're putting the Node project in the line of sight for support for something other than installing Node itself, which is definitely not desirable.
  2. Even worse: the inevitable case will happen where somebody installs an out-of-band version of glibc, tweaks their system to use it for Node (adjusting LD_LIBRARY_PATH or somesuch), forgets they did that, and then some other application stops working properly as a result. The Node project will be seen as a bad actor for having suggested doing it in the first place.
  3. Probably worst of all, one of the reasons we're having the discussion is that we don't want to back the compiler requirement back down to 4.8.x. So if we suggest this, we're basically saying "We aren't willing to work around issues with old glibc implementations anymore because it's just too painful for us. So instead we are pushing that pain point onto our end users." This is not a good strategy for making end users happy. You'd never see Oracle suggesting such a thing for a Java build, for example. If you're at the point where you're telling an end user to compile system level libraries from source just to use the application software you're shipping, you've gone down a bad road.

There's a bit of a pickle here already of course, because 8.3.0 is already out in the wild, and was built with a compiler that doesn't meet the current compiler requirements. But that's probably a small issue in the grand scheme of things.

I still think that if we're not willing to use the older compiler, and there's no way to code around this, then it's time we simply say CentOS 6 / RHEL 6 is too old and it's not supported anymore. That day is going to come eventually, and it's going to come before those distros reach EOL, so if we're at that point so be it.

@seishun (Contributor, Author) commented Aug 11, 2017

the Node project will be seen as a bad actor for having suggested doing it in the first place.

The Node project shouldn't suggest anything. It should just state the glibc and compiler requirements and let the users (or repo maintainers) figure out how to satisfy them.

So if we suggest this, we're basically saying "We aren't willing to work around issues with old glibc implementations anymore because it's just too painful for us. So instead we are pushing that pain point onto our end users." This is not a good strategy for making end users happy. You'd never see Oracle suggesting such a thing for a Java build, for example.

Java is maintained by Oracle, so they can follow whatever policy they like. But Node is mostly maintained by unpaid volunteers and having to work around old compiler bugs will turn people away. Requiring a glibc version that's less than 5 years old is not an unreasonable demand.

@chrislea

Hey, like I said, I'm not in any way trying to suggest that saying "EL6 is too old, sorry" is the wrong answer here. Nor is it my decision to make. All I was trying to point out was that if that's what's happening, I think the project should be clear about it.

@refack (Contributor) commented Aug 11, 2017

I'd like to restate the working assumption that "a feature-frozen system is a feature-frozen system": if a user decided to freeze their OS for stability's sake, it is fairly safe to assume they have also frozen their Node version.
So the combination of an old OS + a new major Node version is highly unlikely.
Hence we should only support old OSs for old Node versions.
Ref: nodejs/node#12672

@chrislea

Oh man @refack my life would be so much easier if ^^^ was true. ;-)

@refack (Contributor) commented Aug 11, 2017

Oh man @refack my life would be so much easier if ^^^ was true. ;-)

Well then, you are our gatekeeper; personally I haven't seen many "minor version upgrade broke my system" bugs in the core repo. So thank you, and sorry 🤗
But then again, I'm gonna sue you for attempting to penetrate my system with
curl -sL https://deb.nodesource.com/setup_8.x | sudo -E bash - 😉
Maybe I've been preoccupied with "windows installer broke my system" bugs.

@rvagg (Member) commented Aug 12, 2017

It's hard to overstate how poorly this will go down: "Enterprises still using CentOS 6 would need to upgrade to CentOS 7". Folks are still using EL5 in production out there. In very large companies, these kinds of moves are very complicated and therefore move very slowly. We're a little too comfortable catering to nimble startups and early-adopter types that are more used to working on the cutting edge, or close to it. As you go out further than that we hit real world constraints where these things are really painful.

Does clang offer us a get-out-of-jail here perchance?

@bnoordhuis (Member)

There might be a simple workaround. Can someone try this patch?

diff --git a/deps/v8/src/trap-handler/handler-shared.cc b/deps/v8/src/trap-handler/handler-shared.cc
index 7b399f5eea..e4f0136be7 100644
--- a/deps/v8/src/trap-handler/handler-shared.cc
+++ b/deps/v8/src/trap-handler/handler-shared.cc
@@ -23,7 +23,9 @@ namespace v8 {
 namespace internal {
 namespace trap_handler {
 
-THREAD_LOCAL bool g_thread_in_wasm_code = false;
+// Note: sizeof(g_thread_in_wasm_code) must be > 1 to work around
+// https://sourceware.org/bugzilla/show_bug.cgi?id=14898
+THREAD_LOCAL int g_thread_in_wasm_code = false;
 
 size_t gNumCodeObjects = 0;
 CodeProtectionInfoListEntry* gCodeObjects = nullptr;
diff --git a/deps/v8/src/trap-handler/trap-handler.h b/deps/v8/src/trap-handler/trap-handler.h
index 5494c5fdb3..ed9459918b 100644
--- a/deps/v8/src/trap-handler/trap-handler.h
+++ b/deps/v8/src/trap-handler/trap-handler.h
@@ -65,7 +65,7 @@ inline bool UseTrapHandler() {
   return FLAG_wasm_trap_handler && V8_TRAP_HANDLER_SUPPORTED;
 }
 
-extern THREAD_LOCAL bool g_thread_in_wasm_code;
+extern THREAD_LOCAL int g_thread_in_wasm_code;
 
 inline bool IsThreadInWasm() { return g_thread_in_wasm_code; }
 

@bnoordhuis (Member)

Does clang offer us a get-out-of-jail here perchance?

Unlikely. It's a glibc bug; gcc is not to blame.

@seishun (Contributor, Author) commented Aug 12, 2017

@bnoordhuis Builds and runs fine with the patch.

@rvagg Without undermining your point about large and slow companies, comparing "working on the cutting edge" with "using an EOL distro that was superseded 6 years ago" is a bit of a false dichotomy.
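
For anyone else wanting to verify, a sketch of applying and testing the patch above (filename and job count assumed):

    # hedged sketch: apply Ben's diff to a node 8.3.0 tree and rebuild
    cd node-v8.3.0
    patch -p1 < tls-workaround.patch      # the diff above, saved locally
    ./configure && make -j4
    ./node --version                      # should now load on CentOS 6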

@bnoordhuis (Member)

Thanks. I'll check if upstream is amenable to this change.

@bnoordhuis (Member)

https://chromium-review.googlesource.com/c/612351 - let's wait and see.

@seishun (Contributor, Author) commented Aug 12, 2017

I'm curious why this isn't an issue with gcc 4.8.2 though.

@FireBurn

It's an issue for me with GCC 4.8.2 on RHEL6

@chrislea

Confirming (also) that 8.3.0 builds and runs on EL6 using that patch with the compilers from

devtoolset-3: gcc = 4.9.2
devtoolset-4: gcc = 5.3.1
devtoolset-6: gcc = 6.3.1

As seems often the case, a great big thanks to @bnoordhuis for saving the day.

A question (again I suppose for you Ben) since I'm not very familiar with how quickly things move through V8: is it reasonable to assume that they'll include this patch and that it will make it to the mainstream V8 that gets shipped with Node Soon (tm)? I ask because on the RPM packages side, it's fairly easy for me to use this patch and put in logic like "if el6 use the patch else don't". But if this is going to land in mainline next week or something, it's probably not worth it to do that, rebuild everything, and then back that out when it's no longer needed.

Thanks again!

@refack (Contributor) commented Aug 12, 2017

But if this is going to land in mainline next week or something, it's probably not worth it to do that, rebuild everything, and then back that out when it's no longer needed.

V8 CLs can land very fast, if upstream is "amenable" to the actual change.
P.S. @bnoordhuis, didn't they give you Dry Run permissions?

@bnoordhuis (Member)

@chrislea What Refael said. If I get a LGTM, I can land the patch and cherry-pick it into node. That would then go out in the next release.

@refack I have, but this is one of those it-either-compiles-or-it-doesn't things. I didn't feel it was worth spending Watts on a dry run.

@chrislea

Okay, thanks Ben. If that's the case I'll do a one-off rebuild just for EL6 RPM packages and this should be settled.

gibfahn added a commit to gibfahn/node that referenced this pull request Aug 12, 2017
Also updates glibc version to 2.17. This means the minimum supported
distros are RHEL/CentOS 7 and Ubuntu 14.04.

| Distro       | Kernel | glibc |
|  ---         |  ---   |  ---  |
| Current      | 2.6.32 | 2.12  |
| RHEL6        | 2.6.32 | 2.12  |
| New          | 3.10.0 | 2.17  |
| RHEL7        | 3.10.0 | 2.17  |
| Ubuntu 14.04 | 3.13   | 2.19  |
| Ubuntu 16.04 | 4.4    | 2.23  |
| Latest       | 4.12.5 | 2.26  |

Refs: nodejs/build#809
kisg pushed a commit to paul99/v8mips that referenced this pull request Aug 17, 2017
glibc before 2.17 has a bug that makes it impossible to execute binaries
that have single-byte thread-local variables:

    % node --version
    node: error while loading shared libraries: cannot allocate memory
    in static TLS block

Work around that by making the one instance in the V8 code base an int.

See: https://sourceware.org/bugzilla/show_bug.cgi?id=14898
See: nodesource/distributions#513
See: nodejs/build#809
Change-Id: Iefd8009100cd93e26cf8dc5dc03f2d622b423385
Reviewed-on: https://chromium-review.googlesource.com/612351
Commit-Queue: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-by: Eric Holk <eholk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#47400}
@refack (Contributor) commented Oct 25, 2017

@seishun since ATM we don't have a CentOS6 x32 test machine and only one release machine, we probably don't want this to run on x32 machines, just in case.

To sum up the discussion, we need these bot setups:

| CentOS6 | x64       | x32 |
|  ---    |  ---      | --- |
| Release | 4.8 & 4.9 | 4.8 |
| Test    | 4.8 & 4.9 | 4.8 |

@seishun (Contributor, Author) commented Oct 25, 2017

@refack Why do we need 4.8 on 64-bit?

@refack (Contributor) commented Oct 25, 2017

@refack Why do we need 4.8 on 64-bit?

Testing & Building node v6

@seishun (Contributor, Author) commented Oct 25, 2017

Makes sense. Does that mean the repo should have separate ansible scripts for 4.8 and >=4.9?

@seishun (Contributor, Author) commented Oct 26, 2017

(edited because I realized there is no 32-bit CentOS 6 test machine)

On the other hand, we'll already have gcc 4.8 on CentOS 5 if we want to make sure node 6.x builds with gcc 4.8. Is there really a need to use gcc 4.8 on all other test machines for node 6.x?

@refack (Contributor) commented Oct 26, 2017

@sgallagher hi, I just saw you commenting on the c-ares thread and was wondering if you have any input on this?

@sgallagher

@refack Sorry, I've scanned the messages, but I'm not sure what question you want me to answer. Could you summarize, please?

@refack (Contributor) commented Oct 26, 2017

@sgallagher we want to upgrade to GCC 4.9.4, but we couldn't find an RPM that will cross-compile on CentOS6 x64 for a 32-bit target.
I was wondering if you know of a simple way to achieve that (the optimal scenario being a four-way side-by-side: one host that can use 4.9.4 for x32 and x64, and 4.8 for x32 and x64)?

@codonell

I'm sure @sgallagher would agree with me here...

As an upstream senior developer for glibc, and the glibc team lead for Red Hat Enterprise Linux, I strongly suggest that issues like this be taken upstream for your distribution. If you are supporting CentOS 6, take it upstream to RHEL6, and file a bug. That way my team can evaluate the direct impact of the problem, perhaps even providing a workaround (if one exists). Your bug always adds to the data we have in our tracker, and that helps us evaluate risk vs. reward for a fix in RHEL6 (in production phase 3, accepting only urgent bug fixes).

The upstream glibc bug 14898 was a trivial fix of a single internal constant (changed to -1) to allow a single-byte TLS variable to exist. It is a fix with limited scope and little risk, and it would fix this use case. The question to really answer is: is this the problem you are seeing? If it is, then a trivial workaround is to increase the size of the object beyond 1 byte; that safely changes the size of the allocated object so it no longer clashes with the internal constant (which is also 1).

So you have a workable solution to keep running on RHEL6 IMO.

Lastly, I do not think this is a compiler issue, but it might be; perhaps the compiler was able to optimize away a TLS variable that was never used, and now you're down to just one variable of size 1 byte. It's hard for me to evaluate this in this ticket (there is a lot of noise here). Again, filing a ticket upstream would help us get to the bottom of this particular use case.
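
To make that workaround concrete, a tiny illustration (not the V8 code; just the 1-byte TLS pattern and the size-padding fix discussed above):

    # hedged sketch: a binary with a single 1-byte TLS variable can trip glibc bug 14898
    printf '__thread char g_flag = 0;\nint main(void) { return g_flag; }\n' > tls.c
    gcc -o tls tls.c && ./tls   # with a newer gcc on glibc < 2.17 this may fail to load
    # workaround per the discussion: declare it as "__thread int g_flag" (size > 1 byte)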

@sgallagher

@refack I would suggest you take a look at the mock tool in CentOS. Its whole purpose is to work with yum to set up a chroot environment for doing clean compilations. (It's what all of the builders in Fedora use to create RPMs.)

You can use both the epel-6-x86_64 and epel-6-i386 configs to build for 64-bit and 32-bit, respectively.

Feel free to reach out to me privately if you need help setting it up.
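
A sketch of what that workflow might look like (SRPM name hypothetical):

    # hedged sketch: clean EL6 builds for both arches with mock
    yum install -y mock
    mock -r epel-6-x86_64 --rebuild node-x.y.z-1.el6.src.rpm   # 64-bit chroot
    mock -r epel-6-i386   --rebuild node-x.y.z-1.el6.src.rpm   # 32-bit chroot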

@refack (Contributor) commented Oct 26, 2017

@codonell thanks for the reply. The TLS issue was indeed a blocker, but as you state it was resolved. IMHO what we are facing now is more of a configuration/ABI-compatibility issue: we are now stuck at the link stage:

  g++ -pthread -rdynamic -m32  -o /home/centos/node/out/Release/openssl-cli -Wl,--start-group ... /home/centos/node/out/Release/obj.target/deps/openssl/libopenssl.a -ldl -Wl,--end-group
/opt/rh/devtoolset-6/root/usr/libexec/gcc/x86_64-redhat-linux/6.2.1/ld: skipping incompatible /opt/rh/devtoolset-6/root/usr/lib/gcc/x86_64-redhat-linux/6.2.1/libstdc++_nonshared.a when searching for -lstdc++_nonshared
/opt/rh/devtoolset-6/root/usr/libexec/gcc/x86_64-redhat-linux/6.2.1/ld: cannot find -lstdc++_nonshared
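
(The missing -lstdc++_nonshared suggests the 32-bit devtoolset runtime pieces aren't installed; if the SCL repo carries i686 variants, something like this might resolve it, package name assumed:)

    # hedged sketch: pull in the 32-bit devtoolset-6 C++ runtime bits, if packaged
    yum install -y devtoolset-6-libstdc++-devel.i686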

@refack (Contributor) commented Oct 26, 2017

P.S. @codonell @sgallagher are there any obvious caveats to building on CentOS/RHEL6 with a non-"canonical" GCC toolchain (i.e. 4.9.4)?

@sgallagher

@refack Do you mean a custom-made toolchain or one of those from Red Hat's Developer Toolset releases? Those at least are fully supported by Red Hat for production use (with a Red Hat subscription, of course).

@refack (Contributor) commented Oct 26, 2017

Do you mean a custom-made toolchain or one of those from Red Hat's Developer Toolset releases? Those at least are fully supported by Red Hat for production use (with a Red Hat subscription, of course).

AFAIK, since Node is an OSS project there's a tendency to use FOSS, so until now we used the devtoolset-2 built by SLC, and now we're considering the devtoolset-6 built by mlampe on copr.
I'm assuming any answers are general observations and not official RH support.

@codonell

@refack Your question about building with a non-canonical toolchain is a very good one. In the case of devtoolset, the tooling is designed specifically to layer a new compiler (and parts of the runtime) on top of system runtimes. If you build a new toolchain from whole cloth, you are in a very different position, a position where you become responsible for bundling (a) the compiler runtime, (b) the language runtimes and (c) any new dependencies therein. Therefore using devtoolset (developer toolset) vs. using a freshly built gcc (which may need a new binutils, and whose generated code may or may not need a new glibc) are very different questions. I would suggest staying with devtoolset as much as you can since the results are predictable and well studied.

@seishun (Contributor, Author) commented Nov 6, 2017

@refack ping... what's the status of this?

refack merged commit f6f19ab into nodejs:master on Nov 13, 2017
@refack (Contributor) commented Nov 14, 2017

Summary:

  1. uninstalled devtoolset-2
  2. removed centos6-64-gcc44 label
  3. updated git to rh-git29 (added to profile.d and sysconfig/jenkins; see the sketch below)
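
For anyone replicating item 3, enabling an SCL collection for login shells is typically a one-line profile.d hook (exact file contents are not in this PR; /opt/rh/<collection>/enable is the standard SCL path):

    # hedged sketch: enable rh-git29 for all login shells
    echo 'source /opt/rh/rh-git29/enable' > /etc/profile.d/rh-git29.sh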

seishun deleted the centos6-gcc49 branch on Nov 14, 2017
@seishun (Contributor, Author) commented Nov 14, 2017

Ran CI with a commit that fails on gcc < 4.9. Looks good: https://ci.nodejs.org/job/node-test-commit-linux/14092/nodes=centos6-64/

Just curious, how did you solve the problem of testing and releasing on Node versions < 9.x?

Also, while I appreciate your work here, upgrading gcc on select machines doesn't help much if it's not going to be upgraded on all machines used for CI on master, which is why I'm going to wait until #797 (comment) is resolved before continuing with the other machines that require upgrading.

@refack (Contributor) commented Nov 14, 2017

Just curious, how did you solve the problem of testing and releasing on Node versions < 9.x?

I tested the binary output by devtoolset-6 on vanilla CentOS 6 and it "just works" (as @codonell commented in #809 (comment)).

Also, while I appreciate your work here, upgrading gcc on select machines doesn't help much if it's not going to be upgraded on all machines used for CI on master.

👍 I'll use https://ci.nodejs.org/job/node-test-commit/14000/ as the basis for an upgrade task list
