Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] feat(gc): record workspace manifest and target dir in global cache tracker #13846

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

baby230211
Copy link
Contributor

@baby230211 baby230211 commented May 2, 2024

What does this PR try to resolve?

Fix #13136
This PR aims to track of the following data in GC for further garbage collection use.

  1. root-manifest path
  2. target directory path

There will be three new table for this features.

  1. worksapce_manifest_index table - index and record the root manifest path
id workspace_manifest timestamp
? /foo/Cargo.toml ?
  1. target_dir_index table - index and record the target_dir path
id target_dir timestamp
? /foo/target ?
  1. workspace_src table - combine the previous two table
id worksapce_id target_dir_id timestamp
? ? ? ?

How should we test and review this PR?

  1. Run compile (cargo run build or else)
  2. inspect your root gc file (~/.cargo/.global-cache)
  3. see if the three table is record correctly.

@rustbot
Copy link
Collaborator

rustbot commented May 2, 2024

r? @ehuss

rustbot has assigned @ehuss.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@baby230211 baby230211 marked this pull request as draft May 2, 2024 12:48
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 2, 2024
@rustbot rustbot added the A-infrastructure Area: infrastructure around the cargo repo, ci, releases, etc. label May 2, 2024
@baby230211 baby230211 force-pushed the feat-13136 branch 3 times, most recently from 5f32f71 to 15e2863 Compare May 2, 2024 12:52
Comment on lines +328 to +350
basic_migration(
"CREATE TABLE workspace_manifest_index (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL,
timestamp INTEGER NOT NULL
)",
),
basic_migration(
"CREATE TABLE target_dir_index (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT UNIQUE NOT NULL,
timestamp INTEGER NOT NULL
)",
),
basic_migration(
"CREATE TABLE workspace_src (
workspace_id INTEGER NOT NULL,
target_dir_id INTEGER NOT NULL,
timestamp INTEGER NOT NULL,
PRIMARY KEY (workspace_id, target_dir_id),
FOREIGN KEY (workspace_id) REFERENCES workspace_manifest_index (id) ON DELETE CASCADE,
FOREIGN KEY (target_dir_id) REFERENCES target_dir_index (id) ON DELETE CASCADE
)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the motivation for

  • Tracking this as 3 separate tables
  • Having timestamps for workspace dir, target dir, and workspace dir + target dir

Comment on lines 1482 to 1523
/// New workspace manifest entries to insert.
workspace_db_timestamps: HashMap<WorkspaceManifestIndex, Timestamp>,
/// New target dir entries to insert.
target_dir_db_timestamps: HashMap<TargetDirIndex, Timestamp>,
/// New workspace src entries to insert.
workspace_src_timestamps: HashMap<WorkspaceSrc, Timestamp>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does db mean in this context?

I think for git

  • git_db is the .git
  • git_checkout is a work dir of a shared .git

Comment on lines 230 to 231
pub encoded_workspace_manifest_name: InternedString,
pub encoded_target_dir_name: InternedString,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the others refer to "encoded" they are referring to how we translate package names to file names.

That doesn't quite make sense here and just naming these target_dir and workspace_manifest_path should work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll rename it with target_dir_path and workspace_manifest_path for these naming.
Thanks.

/// Indicates the given [`WorkspaceManifest`] has been used right now.
///
/// Also implicitly marks the workspace manifest used, too.
pub fn mark_workspace_src_used(&mut self, workspace_src: WorkspaceSrc) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I wonder why we exposed these in the API, rather than taking parameters. Seems like it creates more boilerplate for the callers

@@ -264,6 +264,16 @@ pub fn create_bcx<'a, 'gctx>(
HasDevUnits::No
}
};
let _ = &gctx.deferred_global_last_use()?.mark_workspace_src_used(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why let _ = &?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove let _ = &, rustc would be warning as follow

rustc: unused borrow that must be used (unused_must_use)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it locally and didn't get any message...

    gctx.deferred_global_last_use()?
        .mark_workspace_src_used(global_cache_tracker::WorkspaceSrc {
            workspace_manifest_name: InternedString::new(ws.root_manifest().to_str().unwrap()),
            target_dir_name: InternedString::new(
                ws.target_dir().as_path_unlocked().to_str().unwrap(),
            ),
        });

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1632,6 +1734,32 @@ impl DeferredGlobalLastUse {
);
}

// Flushes all of the `target_dir_db_timestamps` to the database,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review note: I've not yet dug into all of the DB operations

@baby230211 baby230211 force-pushed the feat-13136 branch 3 times, most recently from a1492d3 to 7d4fbbe Compare May 22, 2024 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-infrastructure Area: infrastructure around the cargo repo, ci, releases, etc. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Garbage collect whole target/
4 participants