
fix: change ASAR archive cache to per-process to fix leak #29293

Merged
merged 5 commits into main on Jun 4, 2021

Conversation

dsanders11
Member

@dsanders11 dsanders11 commented May 23, 2021

Description of Change

Close #29292.

This PR changes the ASAR archive cache in shell/common/asar/asar_util.cc to be process-wide instead of thread-local, to prevent asar::Archive instances from being leaked on worker thread exit. See issue #29292 for the technical details of the bug.

Changing the cache to be process-wide means thread-safety had to be added to asar::Archive and asar::GetOrCreateAsarArchive. This is (unfortunately?) accomplished with locks, since asar::GetOrCreateAsarArchive is used in both async (AsarURLLoader) and sync (NativeImage) contexts. I know locks are frowned upon in Chromium, but using them here allows the code to keep being used from the existing synchronous call sites.

Other changes include making asar::IsDirectoryCached thread-safe (it wasn't before), and removing all current usages of asar::ClearArchives, since they were in spots that weren't really accomplishing anything (the processes are being torn down anyway). asar::ClearArchives was left in the code base, though, for potential future use in manually clearing the cache. The asar::Archive::header method was also removed since it is not currently used, and it could be used to modify the parsed header, which would make asar::Archive not thread-safe.
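
For illustration, here is a minimal sketch of the process-wide cache idea. It is not the exact code in this PR; the GetArchiveCacheLock()/GetArchiveCache() helper names and the Archive::Init() usage are assumptions made for the sketch.

```cpp
// Minimal sketch (not the code that landed): one process-wide ArchiveMap
// guarded by a base::Lock, replacing the previous thread-local map.
#include <map>
#include <memory>

#include "base/files/file_path.h"
#include "base/no_destructor.h"
#include "base/synchronization/lock.h"
#include "shell/common/asar/archive.h"

namespace asar {

namespace {

using ArchiveMap = std::map<base::FilePath, std::shared_ptr<Archive>>;

base::Lock& GetArchiveCacheLock() {
  static base::NoDestructor<base::Lock> lock;
  return *lock;
}

ArchiveMap& GetArchiveCache() {
  static base::NoDestructor<ArchiveMap> cache;
  return *cache;
}

}  // namespace

std::shared_ptr<Archive> GetOrCreateAsarArchive(const base::FilePath& path) {
  // Both the async (AsarURLLoader) and sync (NativeImage) callers take this
  // lock, so the single map can be shared safely across threads.
  base::AutoLock guard(GetArchiveCacheLock());
  ArchiveMap& cache = GetArchiveCache();

  auto iter = cache.find(path);
  if (iter != cache.end())
    return iter->second;

  auto archive = std::make_shared<Archive>(path);
  if (!archive->Init())  // assumed to open the file and parse the ASAR header
    return nullptr;

  cache[path] = archive;
  return archive;
}

}  // namespace asar
```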

Alternatives Considered

Quite a few alternatives were considered, in an attempt to avoid locking for thread-safety, in no particular order:

Use base::ThreadLocalOwnedPointer To Fix Leak

This was initially attempted as the most concise fix for the leak, as it destructs the cache on thread exit. However, the change led to a crash on thread exit, through the File::Close code path. This is due to another piece of thread-local storage in the Chromium code base, which appears to have already been destructed by the time the archive cache is being destructed, yet the code in File::Close still needs it. Ensuring the order of destruction of disconnected thread-local storage seems fragile and bug-prone. Per-thread caches are also redundant, and they make it very difficult to clear the cache.

Eliminate Archive Cache Altogether

The bug could be fixed by simply eliminating the archive cache and letting each interaction with an ASAR archive load and parse the header. However, parsing the header can be rather slow; in my testing I was seeing ~40ms with an ASAR archive header of about 850KB. Eliminating the cache would lead to a lot of redundant processing and more resource thrash, since each request would temporarily store the parsed header in memory and open a new file handle.

Don't Use Archive Cache for Sync Code Paths, Use A Sequence Instead of Locks

This has the same shortcomings as eliminating the cache, but isolated to the sync code (like NativeImage). The sync code path is accessed from JavaScript, so it likely benefits from the cache consistently today, since all access happens on the same thread. As such, not using the archive cache would be a performance regression for those code paths.

Use base::SequenceLocalStorageSlot Instead of Thread-Local Storage

base::SequenceLocalStorageSlot is like thread-local storage, but per-sequence. That means all worker threads running the same sequence would share the same archive cache, and it would be destructed when the sequence is destructed. This approach isn't viable because AsarURLLoader runs on a new sequence for each request, and running all the loaders on the same sequence could be a bottleneck.

cc @nornagon, @zcbenz

Checklist

Release Notes

Notes: Fixed memory leak when requesting files in ASAR archive from renderer

@electron-cation electron-cation bot added the new-pr 🌱 PR opened in the last 24 hours label May 23, 2021
@zcbenz zcbenz added semver/patch backwards-compatible bug fixes target/12-x-y labels May 24, 2021
@electron-cation electron-cation bot removed the new-pr 🌱 PR opened in the last 24 hours label May 24, 2021
@deepak1556
Member

Nice analysis of the issue!

Putting aside the process-wide archive cache solution for a moment, regarding the ThreadLocalOwnedPointer crash: isn't it a bug in the Chromium code to use the TLS value that has already been freed?

The call to File::Close runs a ScopedBlockingCall, which relies on some TLS keys; the one we are interested in for the crash is the BlockingObserver. This observer slot value is basically a delegate to the worker thread, which is filled in when the worker thread is run and is also owned by the worker thread; however, the slot value is not cleared when the worker thread exits. I think ThreadGroupImpl::WorkerThreadDelegateImpl::OnMainExit should make a call to ClearBlockingObserverForCurrentThread() in order to avoid the bad access.
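
A rough sketch of where that call could go (the file, signature, and surrounding code are assumptions, not a verbatim Chromium patch):

```cpp
// Sketch only: clear the BlockingObserver TLS slot when a thread pool
// worker's main function exits, so TLS destructors that run afterwards and
// end up in a ScopedBlockingCall (e.g. via File::Close) don't dereference a
// delegate the worker has already released.
void ThreadGroupImpl::WorkerThreadDelegateImpl::OnMainExit(
    WorkerThread* worker) {
  // ... existing OnMainExit cleanup ...

  // Clear the slot that was filled in when this worker started running.
  ClearBlockingObserverForCurrentThread();
}
```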

@dsanders11
Member Author

Putting aside the process-wide archive cache solution for a moment, regarding the ThreadLocalOwnedPointer crash: isn't it a bug in the Chromium code to use the TLS value that has already been freed?

It does seem like one since it's a bad access crash, yea.

I think ThreadGroupImpl::WorkerThreadDelegateImpl::OnMainExit should make a call to ClearBlockingObserverForCurrentThread() in order to avoid the bad access.

I've added that change locally, and it does indeed fix the bad access crash on thread exit. (Maybe submit that fix to Chromium?)

Interestingly enough, while using base::ThreadLocalOwnedPointer along with that Chromium patch does fix the leak, the memory usage remains at a higher level in my testing than it does with the change in this PR. I would have expected a little more memory usage since there could be multiple copies of the cache, but given that the worker pool dwindles down to two workers when idle, I wouldn't have thought the extra memory usage would be much more. So, initially unclear why the memory usage differs noticeably between the two after running requests. (Even with either fix there's still a slow memory usage growth when running batches of 1500 requests, but it's very small compared to this leak).

Here's some data on memory usage: the first table is the fix in this PR, the second table is base::ThreadLocalOwnedPointer with the Chromium patch. Each iteration was recorded once the total number of threads hit 27-29. Iteration zero is with dev tools open to the console.

| Iteration | Memory | Real Memory |
| --- | --- | --- |
| 0 | 43.3 MB | 146.6 MB |
| 1 | 60.8 MB | 172.3 MB |
| 2 | 60.9 MB | 176.5 MB |
| 3 | 61.6 MB | 179.5 MB |
| 4 | 61.6 MB | 179.6 MB |
| 5 | 61.7 MB | 180.1 MB |

| Iteration | Memory | Real Memory | Open File Handles to ASAR |
| --- | --- | --- | --- |
| 0 | 44.4 MB | 145.4 MB | 0 |
| 1 | 89.9 MB | 240.6 MB | 2 |
| 2 | 86.3 MB | 236.5 MB | 1 |
| 3 | 91.8 MB | 251.7 MB | 1 |
| 4 | 94.0 MB | 267.6 MB | 1 |
| 5 | 89.7 MB | 274.8 MB | 1 |

Here's the `base::ThreadLocalOwnedPointer` patch for completeness.
```diff
diff --git a/shell/common/asar/asar_util.cc b/shell/common/asar/asar_util.cc
index 65ddbf284..f43d9340d 100644
--- a/shell/common/asar/asar_util.cc
+++ b/shell/common/asar/asar_util.cc
@@ -10,6 +10,7 @@
 #include "base/files/file_path.h"
 #include "base/files/file_util.h"
 #include "base/lazy_instance.h"
+#include "base/memory/ptr_util.h"
 #include "base/stl_util.h"
 #include "base/threading/thread_local.h"
 #include "base/threading/thread_restrictions.h"
@@ -21,7 +22,7 @@ namespace {
 
 // The global instance of ArchiveMap, will be destroyed on exit.
 typedef std::map<base::FilePath, std::shared_ptr<Archive>> ArchiveMap;
-base::LazyInstance<base::ThreadLocalPointer<ArchiveMap>>::Leaky
+base::LazyInstance<base::ThreadLocalOwnedPointer<ArchiveMap>>::Leaky
     g_archive_map_tls = LAZY_INSTANCE_INITIALIZER;
 
 const base::FilePath::CharType kAsarExtension[] = FILE_PATH_LITERAL(".asar");
@@ -41,7 +42,7 @@ bool IsDirectoryCached(const base::FilePath& path) {
 
 std::shared_ptr<Archive> GetOrCreateAsarArchive(const base::FilePath& path) {
   if (!g_archive_map_tls.Pointer()->Get())
-    g_archive_map_tls.Pointer()->Set(new ArchiveMap);
+    g_archive_map_tls.Pointer()->Set(base::WrapUnique(new ArchiveMap));
   ArchiveMap& map = *g_archive_map_tls.Pointer()->Get();
 
   // if we have it, return it
@@ -61,8 +62,9 @@ std::shared_ptr<Archive> GetOrCreateAsarArchive(const base::FilePath& path) {
 }
 
 void ClearArchives() {
-  if (g_archive_map_tls.Pointer()->Get())
-    delete g_archive_map_tls.Pointer()->Get();
+  if (g_archive_map_tls.Pointer()->Get()) {
+    g_archive_map_tls.Pointer()->Set(nullptr);
+  }
 }
 
 bool GetAsarArchivePath(const base::FilePath& full_path,
```

@deepak1556
Member

I've added that change locally, and it does indeed fix the bad access crash on thread exit. (Maybe submit that fix to Chromium?)

Yup, it is a good candidate. Please give upstreaming it a try.

So, initially unclear why the memory usage differs noticeably between the two after running requests.

The Heap Profiler from Chromium should help identify the difference here. You would need this change to start the profiler.

I do like the thread-scoped archive map for its simple lifetime and lock-free code in I/O sections. But the process-wide map does show its advantage in memory usage. I am open to either solution; will wait for other maintainers to chime in.

@dsanders11
Member Author

dsanders11 commented May 24, 2021

Yup, it is a good candidate. Please give upstreaming it a try.

I was suggesting you be the one to upstream it. 😄 I just randomly stuck the call into ThreadGroupImpl::WorkerThreadDelegateImpl::OnMainExit on your advice to see if it worked; I think you've got a better handle on that part of the Chromium code base, I just know enough to hurt myself.

The Heap Profiler from Chromium should help identify the difference here. You would need this change to start the profiler.

Thanks for the info. I'm not inclined to spend any more time on this for now; I've already sunk a good amount of time into debugging the issue and making sure I fully understood it for the bug report write-up. Anyone else, feel free; all the info needed and the patches are here.

I do like the thread-scoped archive map for its simple lifetime and lock-free code in I/O sections. But the process-wide map does show its advantage in memory usage. I am open to either solution; will wait for other maintainers to chime in.

I like the lock-free nature of the thread-scoped map, which is why I examined quite a few alternatives before settling on this PR's final form; I would have preferred to avoid the explicit locking.

However, as mentioned in the original comment, the lifetime of the threads where the cache would be most useful (thread pool workers) significantly diminishes cache hits. Those threads are constantly destroyed, so cache misses happen often and the ASAR needs to be parsed again on a request, meaning you don't gain anything from the cache in those situations. As the second table in my previous comment shows (open file handles are a proxy for how many per-thread caches exist), after 30 or so seconds there's only one cache remaining (I even saw it drop back to 0 at one point in testing). A single request might benefit from the cache, but if a handful of requests are sent at once (fairly common usage), only one would benefit; the others will all parse the ASAR, using memory and file handles in the meantime.

It's also important to consider the code paths which use this cache. The JavaScript-land fs support for ASARs does use asar::Archive, but it doesn't use asar::GetOrCreateAsarArchive, so it doesn't use this cache; instead it caches archives in the same style (a map keyed by path) in JavaScript land. The code paths which do use this cache are AsarURLLoader and some sync code, namely nativeImage.createFromPath. I believe that last case executes on the main thread (correct me if I'm wrong), so it wouldn't be accessing the thread-scoped caches created by AsarURLLoader, and AsarURLLoader can't access the thread-scoped cache nativeImage.createFromPath creates.

Finally, the distributed nature of the thread-scoped cache makes it difficult (if not impossible?) to ever provide manual control over the caching. This appears to be a use case some developers want (see #28368 and node-fs/#876), because the open file handles to the ASAR file prevent them from doing certain things. With a centralized cache there's at least hope of providing a way to free those file handles.
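
For example, with a process-wide cache a manual ClearArchives() could look roughly like the following; this reuses the hypothetical GetArchiveCacheLock()/GetArchiveCache() helpers from the sketch in the PR description and is not the code in this PR.

```cpp
// Hypothetical sketch: clearing the process-wide cache on demand releases the
// cached Archive objects, and with them the open ASAR file handles, once no
// in-flight request still holds a shared_ptr to them.
void ClearArchives() {
  base::AutoLock guard(GetArchiveCacheLock());
  GetArchiveCache().clear();
}
```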

Member

@zcbenz zcbenz left a comment

Since it does not affect the fs module, there should be no performance penalty; I'm good with making the cache per-process.

Member

@deepak1556 deepak1556 left a comment

LGTM 👍

Tested AsarURLLoader thread ticks before and after this change with a batch of requests from a relatively large ASAR file; the PR didn't regress either the sync or async scenario, and timing is better with this change.

Trace code
```diff
diff --git a/shell/browser/net/asar/asar_url_loader.cc b/shell/browser/net/asar/asar_url_loader.cc
index 394d0f9db..3aca3e6ae 100644
--- a/shell/browser/net/asar/asar_url_loader.cc
+++ b/shell/browser/net/asar/asar_url_loader.cc
@@ -9,6 +9,7 @@
 #include <utility>
 #include <vector>
 
+#include "base/trace_event/typed_macros.h"
 #include "base/strings/stringprintf.h"
 #include "base/task/post_task.h"
 #include "base/task/thread_pool.h"
@@ -90,6 +91,7 @@ class AsarURLLoader : public network::mojom::URLLoader {
            mojo::PendingReceiver<network::mojom::URLLoader> loader,
            mojo::PendingRemote<network::mojom::URLLoaderClient> client,
            scoped_refptr<net::HttpResponseHeaders> extra_response_headers) {
+    TRACE_EVENT_BEGIN("electron", "AsarURLLoader::Start");
    auto head = network::mojom::URLResponseHead::New();
    head->request_start = base::TimeTicks::Now();
    head->response_start = base::TimeTicks::Now();
@@ -264,6 +266,7 @@ class AsarURLLoader : public network::mojom::URLLoader {
  }
 
  void MaybeDeleteSelf() {
+    TRACE_EVENT_END("electron");
    if (!receiver_.is_bound() && !client_.is_bound())
      delete this;
  }
```

trace_logs.zip

@dsanders11
Member Author

Great, thanks @deepak1556 for the extra testing with tracing. I'd like @nornagon to get a chance to comment, since he was recently in this part of the code base with the ASAR memory-mapped file work.

@dsanders11
Member Author

PR rebased and deleted some #include "shell/common/asar/asar_util.h" lines which are now unused.

@zcbenz zcbenz changed the title fix: change ASAR archive cache to per-process to fix leak (#29292) fix: change ASAR archive cache to per-process to fix leak Jun 1, 2021
Resolved review threads: shell/common/asar/archive.h, shell/common/asar/archive.cc, shell/common/asar/asar_util.cc
@dsanders11
Member Author

@nornagon, ready for re-review. 👍

@zcbenz zcbenz merged commit b1d1ac6 into electron:main Jun 4, 2021
@release-clerk

release-clerk bot commented Jun 4, 2021

Release Notes Persisted

Fixed memory leak when requesting files in ASAR archive from renderer

@trop
Contributor

trop bot commented Jun 4, 2021

I was unable to backport this PR to "12-x-y" cleanly;
you will need to perform this backport manually.

@trop
Contributor

trop bot commented Jun 4, 2021

I have automatically backported this PR to "13-x-y", please check out #29535

@trop trop bot removed the target/13-x-y label Jun 4, 2021
@trop
Contributor

trop bot commented Jun 4, 2021

I have automatically backported this PR to "14-x-y", please check out #29536

dsanders11 added a commit to dsanders11/electron that referenced this pull request Jun 5, 2021
…9293)

* fix: change ASAR archive cache to per-process to fix leak (electron#29292)

* chore: address code review comments

* chore: tighten up thread-safety

* chore: better address code review comments

* chore: more code review changes
@trop
Contributor

trop bot commented Jun 5, 2021

@dsanders11 has manually backported this PR to "12-x-y", please check out #29548

Successfully merging this pull request may close these issues.

[Bug]: Main Process Can Leak Memory/File Handles/Temp Files When Renderer Requests Files From ASAR