consolidate: disable vfull duplicate job check #1739

Open
wants to merge 8 commits into
base: master

Conversation

SamuelBoerlin
Contributor

@SamuelBoerlin SamuelBoerlin commented Mar 18, 2024

Thank you for contributing to the Bareos Project!

This PR sets IgnoreDuplicateJobChecking = true for consolidate vfulls (i.e. just like migration/copy jobs).
Currently consolidate vfulls also take part in the "Allow Duplicate Job" logic which can end up causing other jobs to be cancelled. In my opinion consolidation should not affect other jobs like that, and I can't currently think of a case where you would want it to.
I ran into this problem when my incremental jobs were being cancelled due to a long running consolidation job: https://groups.google.com/g/bareos-users/c/iqx4JSjSBxE

Please check

  • Short description and the purpose of this PR is present above this paragraph
  • Your name is present in the AUTHORS file (optional)

If you have any questions or problems, please give a comment in the PR.

Helpful documentation and best practices

Checklist for the reviewer of the PR (will be processed by the Bareos team)

Make sure you check/merge the PR using devtools/pr-tool to have some simple automated checks run and a proper changelog record added.

General
  • Is the PR title usable as CHANGELOG entry?
  • Purpose of the PR is understood
  • Commit descriptions are understandable and well formatted
  • Check backport line
  • Required backport PRs have been created
Source code quality
  • Source code changes are understandable
  • Variable and function names are meaningful
  • Code comments are correct (logically and spelling)
  • Required documentation changes are present and part of the PR
Tests
  • Decision taken that a test is required (if not, then remove this paragraph)
  • The choice of the type of test (unit test or systemtest) is reasonable
  • Testname matches exactly what is being tested
  • On a fail, output of the test leads quickly to the origin of the fault

@sebsura
Contributor

sebsura commented Mar 28, 2024

Are you still working on this PR as it's labeled as draft?

@SamuelBoerlin
Contributor Author

> Are you still working on this PR as it's labeled as draft?

Yes. I wanted to add systemtests to ensure that ignoreduplicatecheck is set for consolidate/migrate/copy jobs and works as intended, but haven't had the time yet.

@SamuelBoerlin SamuelBoerlin marked this pull request as ready for review April 10, 2024 13:55
@SamuelBoerlin
Contributor Author

Hi @sebsura, the PR would now be ready for review whenever you have time.

@sebsura sebsura removed the draft label Apr 15, 2024
@sebsura
Contributor

sebsura commented Apr 23, 2024

While this fixes the problem that you cannot start a virtual full job while another normal job is running, this does not fix the reverse: if you have a virtual full running, you still cannot start a normal job.

I think the best approach to fix the second issue is to ignore jobs that ignore duplicates when checking for duplicates.

Do you want to try to fix this?

@SamuelBoerlin
Contributor Author

> While this fixes the problem that you cannot start a virtual full job while another normal job is running, this does not fix the reverse: if you have a virtual full running, you still cannot start a normal job.
>
> I think the best approach to fix the second issue is to ignore jobs that ignore duplicates when checking for duplicates.
>
> Do you want to try to fix this?

Thanks for taking a look!

Hm, not quite sure I follow. From my understanding IgnoreDuplicateJobChecking already goes both ways, no?
If it is set then the job is ignored during the duplicate checks: https://github.com/bareos/bareos/blob/master/core/src/dird/job.cc#L885-L904

What you're describing is what I'm testing in the added system tests: a consolidate job is started and then a normal job is started while the consolidate virtual full is still running. Usually the normal job would get cancelled. But after this change it is now no longer cancelled.
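
To illustrate the two-way behaviour described above, here is a minimal C++ sketch. It is a simplified model and not the code in job.cc: the `Job` struct, its fields, and `MayStart` are hypothetical names chosen for illustration, and the cancel directives (Cancel Queued/Running/Lower Level Duplicates) are left out, so a detected duplicate is simply refused.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a director job record.
struct Job {
  std::string name;             // job resource name
  bool allow_duplicates;        // models "Allow Duplicate Jobs"
  bool ignore_duplicate_check;  // models IgnoreDuplicateJobChecking
};

// Returns true if `candidate` may start given the jobs that are running.
inline bool MayStart(const Job& candidate, const std::vector<Job>& running)
{
  // Direction 1: a job that carries the flag never takes part in the check.
  if (candidate.allow_duplicates || candidate.ignore_duplicate_check) {
    return true;
  }

  for (const Job& other : running) {
    // Direction 2: running jobs that carry the flag are skipped, so they
    // cannot cause the candidate to be treated as a duplicate.
    if (other.ignore_duplicate_check) { continue; }

    if (other.name == candidate.name) { return false; }  // duplicate found
  }
  return true;
}
```

In this model, a normal job started while a flagged virtual full with the same job name is running is allowed to start, and so is a flagged virtual full started while the normal job is running.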

@SamuelBoerlin
Contributor Author

Just realized that this change might actually cause an issue if you have consolidate jobs that run for a long time. Currently, if a consolidate VF is still running when a new duplicate consolidation VF is started, the duplicate is cancelled if you have something like this:

  Allow Duplicate Jobs = no
  Cancel Lower Level Duplicates = yes
  Cancel Queued Duplicates = no
  Cancel Running Duplicates = no

After this change this would of course no longer be the case because duplicate job checking is disabled. I guess duplicate consolidations could still be mitigated by setting MaxConcurrentJobs = 1 in the Consolidate job.

It would probably be better to prevent duplicate/conflicting consolidation jobs in the first place, though. What do you think?
We could perhaps check at this point here https://github.com/bareos/bareos/blob/master/core/src/dird/consolidate.cc#L276 whether there is already another always-incremental VF job running with overlapping vf_jobids.
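
A rough sketch of what such an overlap check could look like, under stated assumptions: it assumes vf_jobids is available as a well-formed, comma-separated list of job ids, and `ParseJobIds` and `JobIdsOverlap` are hypothetical helpers, not existing Bareos functions. Locking and jobs that start at nearly the same time are not addressed here; an empty list is treated as overlapping with nothing.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Parses a comma-separated job id list such as "12,13,14".
// Assumes every non-empty token is a valid unsigned number.
inline std::set<unsigned long> ParseJobIds(const std::string& list)
{
  std::set<unsigned long> ids;
  std::stringstream stream(list);
  std::string token;
  while (std::getline(stream, token, ',')) {
    if (token.empty()) { continue; }  // skip empty tokens, e.g. from "1,,2"
    ids.insert(std::stoul(token));
  }
  return ids;
}

// Returns true if the two job id lists share at least one job id.
inline bool JobIdsOverlap(const std::string& a, const std::string& b)
{
  const std::set<unsigned long> ids_a = ParseJobIds(a);
  for (unsigned long id : ParseJobIds(b)) {
    if (ids_a.count(id) != 0) { return true; }
  }
  return false;
}
```

With this helper, a new consolidation VF would be considered conflicting with a running one only if both would read at least one common job.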

@sebsura
Contributor

sebsura commented Apr 24, 2024

> Hm, not quite sure I follow. From my understanding IgnoreDuplicateJobChecking already goes both ways, no?

You are right!

> We could perhaps check at this point here https://github.com/bareos/bareos/blob/master/core/src/dird/consolidate.cc#L276 whether there is already another always-incremental VF job running with overlapping vf_jobids.

I'll have to think about that for a bit. The code is currently not set up in a way where you can inspect another job's vf_jobids, e.g. there is no lock protecting this member, and this might fail if vf_jobids is null (which is a special case, see GetVfJobids() in vbackup.cc) or if two of these jobs are started almost at the same time.

@sebsura
Contributor

sebsura commented Apr 29, 2024

I would suggest that we check inside AllowDuplicateJob() whether there are any other always incremental jobs that "run" on the same client and have the same fileset (dir_impl->res.client/fileset).
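
The suggested check could be sketched roughly as follows. This is an illustrative model only: `RunningJob`, its fields, and `HasConflictingJob` are hypothetical names, and plain strings stand in for the client and fileset resources referenced via dir_impl->res.client/fileset.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of a job known to the director.
struct RunningJob {
  std::string client;       // stands in for the client resource
  std::string fileset;      // stands in for the fileset resource
  bool always_incremental;  // true for always-incremental jobs
};

// Returns true if another always-incremental job is already working on
// the same client and fileset as `candidate`.
inline bool HasConflictingJob(const RunningJob& candidate,
                              const std::vector<RunningJob>& running)
{
  for (const RunningJob& other : running) {
    if (!other.always_incremental) { continue; }
    if (other.client == candidate.client
        && other.fileset == candidate.fileset) {
      return true;
    }
  }
  return false;
}
```

Comparing client and fileset avoids inspecting another job's vf_jobids, at the cost of also treating jobs as conflicting when their job id lists would not actually overlap.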

@SamuelBoerlin
Contributor Author

I've now also added a systemtest for the duplicate consolidation job cancellation.

@SamuelBoerlin
Contributor Author

Ah whoops, only just now saw that the always-incremental-consolidate tests have been rewritten. I'll rebase the changes.

@SamuelBoerlin SamuelBoerlin force-pushed the consolidate-ignoreduplicatecheck branch from 9aba2a4 to 5f2b9c1 Compare May 21, 2024 10:03
@sebsura
Contributor

sebsura commented May 23, 2024

The changes look good. I split up your test as well. There is one thing I'm currently looking into, and afterwards I'll push my changes.

@sebsura sebsura force-pushed the consolidate-ignoreduplicatecheck branch from 3207b4c to facd0bc Compare May 23, 2024 12:06
@sebsura
Contributor

sebsura commented May 23, 2024

I squashed the fixup commits and fixed the copyright year on your new test (it was 2021-2024 before).
Let me know what you think of the split up test. Otherwise I would be happy to get this merged.

@SamuelBoerlin
Contributor Author

Looks good to me, thanks!


2 participants