
Heavy memory usage #5072

Closed
Banou26 opened this issue Aug 25, 2020 · 55 comments · Fixed by #9300

Comments

@Banou26
Contributor

Banou26 commented Aug 25, 2020

🐛 bug report/question

Trying to import ffmpeg.js results in a process taking up 5 GB+ of RAM within a few seconds.
The imported file is 10 MB, so I can understand it using a fair amount of RAM, but that much?

This has been a recurring theme across my projects: Parcel processes taking up a lot of RAM.
Do you have any plans to reduce the memory footprint?

@DeMoorJasper
Member

DeMoorJasper commented Aug 26, 2020

We're definitely working on addressing this; it's currently one of our top priorities. We're mainly focusing on stability and performance at this point, so that we can release a stable version of Parcel 2.

I'm able to reproduce this and will figure out exactly why it's happening; I'll report back in this issue.

@download13

I've been able to mitigate this problem by passing PARCEL_WORKERS=1 to it. It seems like the memory usage has something to do with how many worker threads it's trying to run. Maybe they're duplicating work?
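
For anyone wanting to try the same workaround: in a POSIX shell the variable is set inline before whatever command you normally run (on Windows, set it via set or a helper like cross-env); the entry path here is just a placeholder for your own.

PARCEL_WORKERS=1 parcel build src/index.html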

@roman-petrov

I have (it seems) exactly the same issue with ffmpeg.js and Parcel 1.12.4: it runs out of memory when I import ffmpeg and start parcel serve. Is there any existing workaround, maybe some Parcel CLI arguments?

@Banou26
Contributor Author

Banou26 commented Sep 13, 2020

@roman-petrov You should try Parcel 2 with the workaround from the comment directly above yours:

I've been able to mitigate this problem by passing PARCEL_WORKERS=1 to it.

@roman-petrov

@Banou26, thank you. I will try to upgrade to Parcel 2 and use PARCEL_WORKERS=1.


@mischnic
Member

mischnic commented Jan 9, 2021

if the workers are implemented using "processes"

They aren't. Node's WorkerThreads are used when available (you could set PARCEL_WORKER_BACKEND=process to force using processes).

Could you check what

const os = require('os');
const cores = os
  .cpus()
  .filter((cpu, index) => !cpu.model.includes('Intel') || index % 2 === 1)
  .length;

returns on your machine? That is used as the default PARCEL_WORKERS value on Windows.

I think we should cap that count to 4.
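
Roughly, the cap would amount to something like this on top of the snippet above (a sketch of the idea, not the actual Parcel code):

const workers = Math.min(cores, 4); // never default to more than 4 workers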

@mischnic
Member

mischnic commented Jan 9, 2021

Also, could you test this with a bigger project (one that takes something like 30s to build) to determine whether this is just the overhead of starting the workers themselves?

@aminya
Contributor

aminya commented Jan 9, 2021

Could you check what

cores = os
  .cpus()
  .filter((cpu, index) => !cpu.model.includes('Intel') || index % 2 === 1)
  .length;

returns on your machine? That is used as the default PARCEL_WORKERS value on Windows.
I think we should cap that count to 4.

It returns 16, which is correct, but I don't think you should spawn all 16 threads, because of the overhead.

I need to do more testing, but it seems that more than 4 is counterproductive.

@mischnic
Member

mischnic commented Jan 9, 2021

It returns 16 which is correct

(It should return 8 because that function is supposed to determine the number of real cores and not threads.)

@aminya
Contributor

aminya commented Jan 9, 2021

It returns 16 which is correct

(It should return 8 because that function is supposed to determine the number of real cores and not threads.)

I fixed it in #5617

@devongovett
Member

I don't think hard coding to 4 is a good idea. It doesn't make sense to me that there is a limit regardless of hardware. We currently base it on the number of available cores, which seems to make sense in regards to the amount of parallelism that is possible. If it slows down after a certain number, there must be a bottleneck somewhere that we can potentially solve. I had looked into this somewhat before but couldn't determine what the problem was. Maybe someone will have better luck.

@aminya
Contributor

aminya commented Jan 9, 2021

Wrong benchmark (superseded by the corrected numbers below)

Running the benchmark with Parcel 2.0.0-nightly.520 gives a different result: using workers is slower altogether, no matter how many.

// disclaimer: these fluctuate. Running the command the second time gives a better result (despite running `npm run clean` in between)
1 worker: 384ms
2 workers: 450ms
4 workers: 698ms
8 workers: 746ms

Ran on solid-simple-table. The command was npm run build with && npm run style deleted from the end.

I don't think hard coding to 4 is a good idea. It doesn't make sense to me that there is a limit regardless of hardware. We currently base it on the number of available cores, which seems to make sense in regards to the amount of parallelism that is possible. If it slows down after a certain number, there must be a bottleneck somewhere that we can potentially solve. I had looked into this somewhat before but couldn't determine what the problem was. Maybe someone will have better luck.

Parallelism is only helpful when the overhead is low; that's when it can actually deliver performance benefits.
https://youtu.be/9hJkWwHDDxs?t=1016

@mischnic
Member

mischnic commented Jan 9, 2021

In your ~300 ms example, running with no workers might be faster (by the way, you can also set PARCEL_WORKERS=0 to actually use zero workers).

But if your build takes more than a few seconds, the workers are beneficial.


@aminya
Contributor

aminya commented Jan 9, 2021

OK, here is the correct benchmark. Still, increasing the number of workers has no effect.

Workers   Time (s)
0         9.72
1         10.63
2         10.18
4         10.42
6         10.37
8         10.36

Ran on solid-simple-table. The command was npm run build with && npm run style deleted from the end.

@devongovett
Member

Interesting. To me that indicates there is some kind of bug; more workers definitely shouldn't be slower. Can you run with the --profile flag and upload the results? You can open the profile in Chrome DevTools to view the results yourself.
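
For example, something along these lines, where the entry path is whatever your project uses and the PARCEL_WORKERS prefix is only there to compare worker counts:

PARCEL_WORKERS=8 parcel build src/index.html --profile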

@aminya
Contributor

aminya commented Jan 9, 2021

Here is the profile:
profile-20210109-161111.zip

What I see is that a lot of synchronous Node fs methods are used instead of async methods that would let Windows manage the disk reads and writes.

This load function, for example, calls sync functions:

load(resolved: FilePath, from: FilePath): any {
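
To illustrate the difference (a minimal sketch, not Parcel's code): the synchronous call blocks the thread's event loop for the entire read, while the promise-based call hands the read to libuv's thread pool and lets other pending work proceed.

const fs = require('fs');

// Blocking: nothing else runs on this thread until the read returns.
const pkgSync = fs.readFileSync('package.json', 'utf8');

// Non-blocking: the read happens in libuv's thread pool while the event loop stays free.
fs.promises.readFile('package.json', 'utf8').then((pkgAsync) => {
  console.log(pkgSync.length === pkgAsync.length); // same contents either way
});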

Sorted by total-time: (profiler screenshot)

Sorted by self-time: (profiler screenshot)

By the way, the profiler seems to have issues writing the files to disk. I have to run the profiler a couple of times (with a clean in between) to get a working one; otherwise it exits with a crash code:

@parcel/core: Starting profiling...
/ Optimizing SimpleTable.js...
npm ERR! code 3221225477

@devongovett
Member

Based on the profile I see a couple things:

  1. Was this profile done without workers? I only see the "Master" thread.
  2. It looks like a majority of the time (58%) was spent in cssnano, specifically loading (requiring) a preset. This seems extremely excessive to me. It seems it spent a ton of time waiting for I/O, as you can see based on the time spent in open, lstat, and realpathSync. Not sure what would explain that.

That said, one thing that could explain workers not being faster for some cases is if a majority of the build time is spent in minification of one large bundle, for example. This is not parallelizable, so we'd expect the times to be similar in this case. In your profile, transformation only accounts for 1.4s of the total build time, whereas minification accounts for 11s. During that time, only a single thread will be active.
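
Rough arithmetic with those numbers: even if the 1.4s of transformation parallelized perfectly across 8 workers, it would shrink to roughly 0.2s, saving about 1.2s, while the ~11s of single-threaded minification is untouched, so the wall-clock time barely moves no matter how many workers are configured.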

@mischnic
Member

mischnic commented Jan 9, 2021

Results I get with your project:

macOS (4 core i5), last number is wall time

PARCEL_WORKERS=0: 6.74s user, 0.72s system, 150% cpu, 4.948 total
PARCEL_WORKERS=1: 6.88s user, 0.72s system, 155% cpu, 4.893 total
PARCEL_WORKERS=2: 8.42s user, 1.04s system, 188% cpu, 5.011 total
PARCEL_WORKERS=3: 7.58s user, 0.95s system, 190% cpu, 4.467 total
PARCEL_WORKERS=4: 7.75s user, 0.98s system, 191% cpu, 4.571 total

Windows 10 (much older 4 core i5):

PARCEL_WORKERS=0: 6.78s
PARCEL_WORKERS=1: 6.56s
PARCEL_WORKERS=2: 6.78s
PARCEL_WORKERS=3: 6.68s
PARCEL_WORKERS=4: 6.82s

Not sure why it's slower in your case. Tested with Yarn & pnpm on macOS and with Yarn on Windows.

@aminya
Contributor

aminya commented Jan 9, 2021

Based on the profile I see a couple things:

  1. Was this profile done without workers? I only see the "Master" thread.

No. The worker number was set to 8.

  2. It looks like a majority of the time (58%) was spent in cssnano, specifically loading (requiring) a preset. This seems extremely excessive to me.

I have only one small Less file! I'm not sure what minification it is doing there!
https://github.com/aminya/solid-simple-table/blob/master/src/SimpleTable.less

It seems it spent a ton of time waiting for I/O, as you can see based on the time spent in open, lstat, and realpathSync. Not sure what would explain that.

Yes. Disk I/O seems to be the bottleneck here; it is unrelated to the CPU.

@aminya
Contributor

aminya commented Jan 9, 2021

not sure why it's slower in your case. Tested with Yarn & pnpm on macOS and Yarn on Windows.

I used PowerShell for timing. Parcel itself reports almost 4 seconds less:

√ Built in 6.07s

@ranisalt
Contributor

Just to add some extra data, on my Ryzen 7 4800H with 16 GB of RAM, running parcel@2.0.0-nightly.535 on node v15.6.0 takes:

  • 1 worker: 50.56s, 1994M resident memory
  • 2 workers: 33.80s, 1926M
  • 3 workers: 30.80s, 2723M
  • 4 workers: 31.88s, 2915M
  • 8 workers: 34.88s, 4426M
  • 16 workers: fills up my RAM, proceeds to fill up my swap and annihilate my computer

I did the 3 workers test after testing with 2, 4 and 8 just to check for any potential sweet spot.

I noticed that rebuilding with a non-empty dist folder (that is, serve, kill, then serve again without cleaning) is slower: 35.56s, with 2383M resident memory. However, parcel memory usage is as inconsistent as it gets, so I attribute it to sheer luck.

@devongovett
Member

One interesting thing we haven't explored yet is whether this is specific to worker threads or whether it also applies to processes. Is there a single memory limit across all threads or is it per thread? This could affect the frequency of garbage collection. Could people in this thread also run it with the PARCEL_WORKER_BACKEND=process environment variable and compare the results with different worker counts?
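
For reference, that would look something like this on the command line (the entry path is just an example):

PARCEL_WORKER_BACKEND=process PARCEL_WORKERS=4 parcel build src/index.html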

@ranisalt
Contributor

@devongovett with 4 workers, using the process backend, it takes 35.73s (though I have more software running simultaneously), but the resident memory usage dropped to 834M.

@devongovett
Member

@aminya have you looked into why disk access on your machine is so slow? I don't think anyone else has been able to reproduce the cssnano issue. Are you running off a network drive by chance?

@aminya
Contributor

aminya commented Jan 17, 2021

@aminya have you looked into why disk access on your machine is so slow? I don't think anyone else has been able to reproduce the cssnano issue. Are you running off a network drive by chance?

My drive is a fast SSD with high bandwidth. The issue is not my hardware.


require is slow, and the more of it there is, the more it punishes the user. cssnano requires many files, and Parcel's decision not to limit require calls by bundling this huge library into a single file makes things worse.
#5671 (comment)

There is also the problem of not using realpath.native, which makes some package managers, like pnpm, slower.
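
For context, this is the difference in question (a minimal sketch using Node's fs module; the non-native version falls back to a JavaScript implementation that stats each path segment, while .native makes a single call into the operating system):

const fs = require('fs');

const dir = process.cwd();
const viaJs = fs.realpathSync(dir);         // JS fallback, walks every path segment
const viaOs = fs.realpathSync.native(dir);  // one native call, much cheaper on deep or symlinked trees
console.log(viaJs === viaOs);               // same result, different cost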

@devongovett
Member

Sure, I agree re require being slow. But there's still the question of why it takes 3s on your machine and milliseconds on other machines, so I'm trying to get to the bottom of it...

@AndyOGo

AndyOGo commented Feb 14, 2022

@devongovett @mischnic

Is there any update on this?
We are also observing high memory consumption, in contrast to Parcel 1.

@mischnic
Member

@AndyOGo Could you share more details about your situation (and ideally a reproduction)? Do you also have many cores, like some of the other commenters above? Does running PARCEL_WORKERS=4 yarn parcel build help?

@Lazerbeak12345

Lazerbeak12345 commented Feb 24, 2022

Can confirm heavy memory usage. A basic 200-line TypeScript file (32K in file size according to du) that compiles in 2 seconds with tsc takes 15+ minutes and at least 4 GB of RAM plus 8 GB of swap. I've never once gotten Parcel to compile this simple file, as my computer all but crashes once all of its memory (RAM plus swap) is completely filled.

No special settings, config, or anything; exactly the getting-started demo, but with different TypeScript (TypeScript that tsc handles just fine). (Running Fedora Linux, if that matters; I'm not using Windows.)

(It's also really hard to kill; it spawns 20-30 node processes. Note that my system is a meager 4-core mobile Intel Core i5, not some sort of Threadripper. There's no reason to have that many workers for 200 lines of TypeScript on such low-grade hardware.)

@mischnic
Member

@Lazerbeak12345 Can you share your project (or some version of that typescript file which still causes this problem)?

@Lazerbeak12345

Sure, https://github.com/Lazerbeak12345/pixelmanipulator/tree/v5-alpha is the closest thing, but you're going to have to remove these files (and replace them with the appropriate matching files from the getting-started guide):

  • package.json
  • yarn.lock
  • tsconfig.json
  • gulpfile.ts

Another note: you might need to remove src/demo, as recent changes to that branch have included further TypeScript files that should not be part of the library build itself.

Sorry - I had actually planned on making a separate branch to make this easier to recreate, but I don't have much time on my hands these days.

Alternatively, as the TypeScript file is (currently) completely standalone, copying the content of src/lib/pixelmanipulator.ts into the demo project (with TypeScript adjustments) should recreate it just fine as well.

I suspect this might not actually be easy to recreate without hardware as old as mine, though (a more-or-less stock Lenovo ThinkPad T420 running a more-or-less stock Fedora Linux 35 Workstation).

@mischnic
Member

mischnic commented Mar 1, 2022

Yeah, it builds fine for me on macOS (also a 4-core i5). But I'm not sure whether that's because I didn't modify your repo correctly or whether it's actually caused by some hardware/OS difference.

@Lazerbeak12345

I'll make a branch with the specific changes made then and link it here. This might take some time.

@Lazerbeak12345

I'm actively working on this right now, and here's an interesting finding: if I give yarn parcel build an entry point, this issue does not happen, but if I run it without one (yet with entry points provided in package.json), the issue still happens. I'll post further information later.

@mischnic
Member

I'm guessing this is caused by a bad entry root/project root calculation then:

let entryRoot = getRootDir(entries);
let projectRootFile =
(so that the root is too high up in your filesystem, and Parcel then does some unnecessary work, causing the memory usage)
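
Conceptually, that calculation boils down to finding the deepest directory that is an ancestor of every entry; a hypothetical illustration of the idea (not the actual getRootDir implementation):

const path = require('path');

function commonRootDir(entries) {
  return entries
    .map((entry) => path.dirname(path.resolve(entry)))
    .reduce((root, dir) => {
      // climb until `root` is an ancestor of (or equal to) `dir`
      while (path.relative(root, dir).startsWith('..')) root = path.dirname(root);
      return root;
    });
}

With a single entry like src/index.html this is simply the src directory, but if the entries resolve somewhere unexpected, the root can end up far higher in the filesystem.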

Which should get fixed by #7537

@Lazerbeak12345

Lazerbeak12345 commented Mar 18, 2022

Alright, I've figured it out. (I actually solved it half an hour after my last post, but I was offline so I couldn't post this.)

@mischnic Your hunch was correct.

Parcel seems to search for the "root" from / down to $(pwd), looking for a file that indicates which package manager to use when auto-installing things. The problem was that this file was present: ~/yarn.lock. For my purposes I don't need a lockfile of any sort in my home directory, so I removed it, and since then Parcel has worked for me.

I don't really like that that was the problem - I expected Parcel to search for the "root" from the current directory upward, as this is Node's behavior when searching for a package in a node_modules folder (if node_modules isn't in the current folder, or the package is not present in it, try ../ until ../ is the same as ./).
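
The upward search I expected would look roughly like this (a hypothetical sketch, not Parcel's actual code):

const fs = require('fs');
const path = require('path');

function findProjectRoot(start = process.cwd()) {
  const lockfiles = ['yarn.lock', 'package-lock.json', 'pnpm-lock.yaml'];
  let dir = start;
  while (true) {
    // stop at the first directory, walking upward, that contains a lockfile
    if (lockfiles.some((f) => fs.existsSync(path.join(dir, f)))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) return start; // reached the filesystem root, give up
    dir = parent;
  }
}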

@Lazerbeak12345

Lazerbeak12345 commented Mar 18, 2022

It is also interesting to note, however, that while this is resolved for me, it still used a ton of threads and ran out of memory.

This implies that Parcel might fail on huge projects.

Running parcel build in such a large directory should still succeed (even if, as in my case, that wasn't the intended result). I suspect that each thread (or group of threads) is associated with one or more files. On light hardware like mine, it would be better to use a queue system and postpone files once the maximum number of threads has been reached.
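
Something along these lines is what I have in mind (a rough sketch of the idea, not Parcel internals): cap how many files are in flight at once and let the rest wait in a queue.

async function processWithLimit(files, limit, processFile) {
  const queue = [...files];
  // `limit` workers each pull the next file off the shared queue until it is empty.
  const workers = Array.from({ length: limit }, async () => {
    while (queue.length > 0) {
      await processFile(queue.shift());
    }
  });
  await Promise.all(workers);
}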

Also unexpected was that --log-level verbose didn't say which directory it believed the root to be. I learned that from an odd error message I eventually triggered by accident. (This verbosity gap perhaps warrants its own issue, if one doesn't already exist.)

@jpcaparas

The PARCEL_WORKERS=1 setting did wonders for me on my MacBook M1. I am currently running Parcel v1, and my machine would stutter to the point of being unusable whenever the serve process started bundling my changes.

@lorenzogrv

In our case, Parcel was failing to build for ARM64 on a Graviton2 instance due to insane memory usage.

  • We did not test the PARCEL_WORKERS=1 workaround. We want parcel to do parallel work if it can
  • We did not test the PARCEL_WORKER_BACKEND=process mitigation
  • Specifying the entry point in the build command (parcel build src/index.html instead of parcel build) worked like a charm. It also builds much faster (about 50 seconds).

NOTE: We had to disable image optimization as it's currently broken on ARM, but see #8790 and #7402

@lorenzogrv

We are having the memory usage problem again, even when specifying the entry point.

We tried PARCEL_WORKERS=1 without luck.

Still researching a workaround; we can't build natively on arm64 (aarch64) right now.

@mischnic
Member

Does it really fail only on ARM64, while other archs build fine with lower memory usage?

@lorenzogrv

Our build works fine on amd64; the abnormal memory usage happens under arm64 (a native build on an AWS EC2 Graviton2-based instance).
