
Fix the atomicity of the language definition cache #605

Closed
Xophmeister opened this issue Aug 24, 2023 · 3 comments · Fixed by #583
Assignees
Labels
P2 major: an upcoming release type: bug

Comments

@Xophmeister
Member

Describe the bug
The language definition cache (introduced in #583) caches the components that the Topiary API needs for a given input file type. Since it is used concurrently, it was made thread-safe with a RwLock: first, the read lock is acquired to retrieve the cached value, if it exists; otherwise, the read lock is released and a write lock is acquired to update the cache and return the appropriate value.

This works and is thread-safe, but it doesn't exhibit the desired behaviour. When multiple consumers of the cache (in our case, async futures) query it more-or-less simultaneously, they (almost) all miss in the read branch of the code -- because the cache is still empty -- and fall through to the write branch. This defeats the point of the cache, which is to reduce writes (each of which involves disk IO to produce the value).

To Reproduce

  1. Create a directory named, for example, test. In that directory, create a simple JSON file and reproduce it 200 times:

    $ mkdir test
    $ jq -n "{test:123}" > test/test0.json
    $ for i in $(seq 200); do cp test/test0.json test/test${i}.json; done
  2. Format the contents of this directory, with the logging level turned up, and search for Cache in the output:

    $ RUST_LOG=debug topiary fmt test 2>&1 | grep Cache

You will see 201 lines that almost all look like this:

[2023-08-24T09:58:20Z DEBUG topiary::language] Cache 0x7fffffff6af0: Insert at 0xb503fcb641e7c545 (json, json.scm)

You may get lucky, and a few of the lines will look like this:

[2023-08-24T09:58:20Z DEBUG topiary::language] Cache 0x7fffffff6af0: Hit at 0xb503fcb641e7c545 (json, json.scm)

Expected behavior
In the logging output, we should see the first line look something like this:

[2023-08-24T09:58:20Z DEBUG topiary::language] Cache 0x7fffffff6af0: Insert at 0xb503fcb641e7c545 (json, json.scm)

Then we should see 200 lines that look like this:

[2023-08-24T09:58:20Z DEBUG topiary::language] Cache 0x7fffffff6af0: Hit at 0xb503fcb641e7c545 (json, json.scm)

Environment

  • OS name + version: NixOS 23.05
  • Version of the code: 004c4c3

Additional context
The obvious solution would be to wrap the cache (a HashMap) in a Mutex, rather than a RwLock; i.e., lock the whole thing on access. Unfortunately, this isn't as straightforward as it seems, because we have to use an async scope to placate the borrow checker. That imposes a Send constraint, which this simple change does not satisfy.

💡 This issue may conflict with the lazy loading effort (see XXX).

@Xophmeister
Member Author

@ErinvanderVeen Is there a branch/PR/issue for the lazy loading, so we can cross-reference it here?

@Xophmeister
Member Author

Tokio's Mutex may be the fix to the problem...

@Xophmeister Xophmeister self-assigned this Aug 24, 2023
@Xophmeister
Member Author

...it totally worked 🎉

@Xophmeister Xophmeister added the P2 major: an upcoming release label Aug 24, 2023
@Xophmeister Xophmeister mentioned this issue Aug 25, 2023