Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a way to remove old commits #681

Open
wants to merge 19 commits into
base: main
Choose a base branch
from
Open

Provide a way to remove old commits #681

wants to merge 19 commits into from

Conversation

minwoox
Copy link
Member

@minwoox minwoox commented Feb 23, 2022

Motivation:
Central Dogma uses jGit to store data. Due to the nature of Git that stores unlimited history,
Central Dogma will eventually get in trouble managing disk usage.
We can handle this by maintaining the primary and secondary Git repositories internally.
This works in this way:

  • Commits are pushed to the primary Git repository.
  • If the number of commits exceeds the threshold (minRetentionCommits), then the secondary Git repository is created.
  • Commits are pushed to both primary and secondary Git repositories.
  • If the secondary Git repository has the number of commits more than the threshold;
    • The secondary Git repository is promoted to the primary Git repository.
    • The primary Git repository is removed completely.
    • Another secondary Git repository is created.
  • Back to the third.

Modifications:

  • Add CreateRollingRepositoryCommand that creates the rolling repository by the CommitRetentionManagementPlugin.
  • Add GitRepositoryV2 that manages the rolling jGit repositories.
    • The name of internal repositories will be: foo_0000000000, foo_0000000001, foo_0000000002 and so on
      • So that we can only store the suffix of the primary repo to maintain the repository. (i.e. We don't have to maintain the directory information of the secondary repo.)
      • RepositoryMetadataDatabase has the suffix in its file database.
    • GitRepository is not removed for the migration test.
  • Add InternalRepository that has jGit repository and CommitIdDatabase.
  • What happens if the revision of an operation(such as diff, watch, history, etc.) is lower than the firstRevision of the current primary repo?
    • If Revision.INIT(1) is used, the firstRevision is used instead.
      • e.g. diff(INIT, new Revision(100) ...) is equals to diff(new Revision(firstRevisionNumber), new Revision(100) ...)
    • If the Revision between Revision.INIT(1) and the firstRevision is used, a RevisionNotFoundException is raised.
      • except watch and findLatestRevision.

Result:

Todo:

  • Provide a way to set minRetentionCommits and minRetentionDay for each repository.
  • Unused internal repositories are removed by purging service.
  • Support mirroring from CD to external Git.

Motivation:
Central Dogma uses jGit to store data. Due to the nature of Git that stores unlimited history,
Central Dogma will eventually get in trouble managing disk usage.
We can handle this by maintaing the primary and secondary Git repository internally.
This works in this way:
1 Commits are pushed to the primary Git repository.
2 If the number of commits exceed the threshold (`minRetentionCommits`), then the secondary Git repository is created.
3 Commits are pushed to the both primary and secondary Git repositories.
4 If the secondary Git repository has the number of commits more than the threshold;
  - The secondary Git repository is promoted to the primary Git repository.
  - The primary Git repository is removed completely.
  - Another secondary Git repository is created.
5 Back to 3.

Modifications:
- TBD

Result:
- Close 575
- TBD

Todo:
- Provide a way to set `minRetentionCommits` and `minRetentionDay` for each repository.
- Support mirroring from CD to external Git.
@codecov
Copy link

codecov bot commented Feb 23, 2022

Codecov Report

Patch coverage: 70.19% and project coverage change: -2.17% ⚠️

Comparison is base (079daeb) 65.69% compared to head (84cb6cd) 63.53%.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #681      +/-   ##
============================================
- Coverage     65.69%   63.53%   -2.17%     
- Complexity     3351     3473     +122     
============================================
  Files           358      366       +8     
  Lines         13893    15093    +1200     
  Branches       1496     1687     +191     
============================================
+ Hits           9127     9589     +462     
- Misses         3916     4591     +675     
- Partials        850      913      +63     
Files Changed Coverage Δ
...ecorp/centraldogma/server/CentralDogmaBuilder.java 53.38% <0.00%> (-0.82%) ⬇️
...orp/centraldogma/server/CommitRetentionConfig.java 0.00% <0.00%> (ø)
...p/centraldogma/server/command/AbstractCommand.java 47.61% <ø> (ø)
.../linecorp/centraldogma/server/command/Command.java 74.35% <0.00%> (-13.52%) ⬇️
...server/command/CreateRollingRepositoryCommand.java 0.00% <0.00%> (ø)
...ogma/server/command/StandaloneCommandExecutor.java 82.25% <0.00%> (-3.46%) ⬇️
...internal/storage/DirectoryBasedStorageManager.java 63.31% <ø> (ø)
...ver/internal/storage/repository/CacheableCall.java 65.21% <ø> (ø)
...internal/storage/repository/RepositoryWrapper.java 22.85% <0.00%> (-2.15%) ⬇️
...al/storage/repository/cache/CachingRepository.java 92.36% <0.00%> (-2.95%) ⬇️
... and 16 more

... and 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@trustin
Copy link
Member

trustin commented Mar 8, 2023

If Revision.INIT(1) is used, the firstRevision is used instead.

This somewhat sounds like a surprising behavior to me. This means a diff command that once worked as expected starts to return completely different result once rolling occurs.

Can we instead treat any revision between INIT (inclusive) and firstRevision (exclusive) as an 'empty' state where the repository contains no files at all?

@aidanbae
Copy link

This issue seems to be a necessary feature to keep the cluster running reliably.

@minwoox
Copy link
Member Author

minwoox commented Aug 24, 2023

@aidanbae Sorry for making you wait for this. I'm working on it now. Please stay tuned.

@minwoox
Copy link
Member Author

minwoox commented Aug 25, 2023

Can we instead treat any revision between INIT (inclusive) and firstRevision (exclusive) as an 'empty' state where the repository contains no files at all?

That makes sense. Let me apply that approach.

minwoox added a commit to minwoox/centraldogma that referenced this pull request Aug 28, 2023
Motivation:
Before we apply line#681 that removes old commits,
it would be nice if we can find any usages that access data with the INIT Revision or
revisions that are more than 5000 (default minimum number of commit retentions) commits ahead from the head.

Modifications:
- Add a counter that is incresed when INIT Revision or revisions that are more than 5000 ahead from the head are used.

Result:
- Track the number of usages.
- This commit will be reverted after we find the usage and fix it.
@minwoox
Copy link
Member Author

minwoox commented Aug 29, 2023

Can we instead treat any revision between INIT (inclusive) and firstRevision (exclusive) as an 'empty' state where the repository contains no files at all?

Had a chat with the team, and we decided to raise an exception instead of treating it as an empty state.

This PR is ready, so PTAL. 😉

minwoox added a commit that referenced this pull request Sep 4, 2023
Motivation:
Before we apply #681 that removes old commits, it would be nice if we could find any usages that access data with the INIT Revision or revisions that are more than 5000 (default minimum number of commit retentions) commits ahead from the head.

Modifications:
- Add a counter that is increased when INIT Revision or revisions that are more than 5000 ahead from the head are used.

Result:
- Track the number of usages.
- This commit will be reverted after we find the usage and fix it.
Copy link
Contributor

@jrhee17 jrhee17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall! Left some minor questions 🙇

requireNonNull(initialRevision, "initialRevision");
checkArgument(minRetentionCommits > 0, "minRetentionCommits: %s (expected: > 0)",
minRetentionCommits);
checkArgument(minRetentionDays > 0, "minRetentionDays: %s (expected: > 0)", minRetentionDays);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) isn't it possible for minRetentionDays to be 0?

Suggested change
checkArgument(minRetentionDays > 0, "minRetentionDays: %s (expected: > 0)", minRetentionDays);
checkArgument(minRetentionDays >= 0, "minRetentionDays: %s (expected: >= 0)", minRetentionDays);

this.parent = requireNonNull(parent, "parent");
originalRepoName = requireNonNull(repositoryDir, "repositoryDir").getName();
this.repositoryWorker = requireNonNull(repositoryWorker, "repositoryWorker");
requireNonNull(author, "author");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we can remove the assignment at L171

Suggested change
requireNonNull(author, "author");
this.author = requireNonNull(author, "author");

}
}

private int applyChanges(@Nullable Revision baseRevision, @Nullable ObjectId baseTreeId,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would honestly be very helpful if these kind of methods were deduped instead of copy-pasted so that we don't have to review them again 😅

Anyways, let me just assume that these methods were copy-pasted since I don't think it's realistic to invest more time into looking in detail through this file

Comment on lines +253 to +255
if (!primaryRepoDir.mkdirs()) {
throw new StorageException("failed to create " + primaryRepoDir + " while migrating to V2.");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be safer to do this before L249 so that we run less of a risk. (If this step fails, then repoDir doesn't exist and only tmpRepoDir exists at this point)

continue;
}

final boolean fetchContent = FindOption.FETCH_CONTENT.get(options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can compute this once at L441

this.headRevision = res.revision;
final InternalRepository secondaryRepo = this.secondaryRepo;
if (secondaryRepo != null) {
assert headRevision.equals(secondaryRepo.headRevision());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) Can this be guaranteed?

For instance, if a CentralDogma server is shut down forcefully due to a jvm crash before the secondaryRepo.commit is invoked, will the server be recoverable? or do we need to manually delete the secondaryRepo folder?

This scenario could also be possible if secondaryRepo == null while a commit is occurring. (due to secondary repo rolling)

Rather, would a mechanism which catches up to the primary repo be a better idea?

assert headRevision.equals(secondaryRepo.headRevision());

// Push the same commit to the secondary repo.
secondaryRepo.commit(headRevision, headRevision.forward(1), commitTimeMillis,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a StorageException is thrown for whatever reason (e.g. insufficient disk), should the overall call to the commit also fail?

Suggested change
secondaryRepo.commit(headRevision, headRevision.forward(1), commitTimeMillis,
try {
secondaryRepo.commit(headRevision, headRevision.forward(1), commitTimeMillis,
} (catch ...) {}

Comment on lines +158 to +159
primarySuffix = newPrimarySuffix;
writeSuffix(newPrimarySuffix);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should update the suffix in case writeSuffix fails (primaryRepoDir will return the wrong value)

Suggested change
primarySuffix = newPrimarySuffix;
writeSuffix(newPrimarySuffix);
writeSuffix(newPrimarySuffix);
primarySuffix = newPrimarySuffix;

}

@Override
public void createRollingRepository(Revision rollingRepositoryInitialRevision,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) I'm assuming that we don't have a way to synchronize two CreateRollingRepositoryCommand being issued on a single repository simultaneously. (and judging from the current implementation I don't think it should be allowed)

Do you think this scenario is impossible? or do we introduce a separate lock for this command?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide a way to remove the old commits
4 participants