BP-63: Force Reschedule Auditor tasks #4025

Open
wenbingshen opened this issue Jul 10, 2023 · 1 comment

BP

This is the master ticket for tracking BP-63.
Proposal PR: #3964

Motivation

Currently, the Bookie can reschedule Auditor check tasks in several ways. The auditorBookieTask is out of scope here because it already has a separate mechanism for triggering re-execution. This BP covers three tasks: AuditorCheckAllLedgersTask, AuditorPlacementPolicyCheckTask, and AuditorReplicasCheckTask.

1: The Bookie keeps three execution times in ZooKeeper: checkallledgersctime, placementpolicycheckctime, and replicascheckctime. Updating these times dynamically adjusts the execution frequency of the auditor tasks, but the change only triggers task execution after the Auditor process is restarted or the Auditor election is reopened.

2: The ForceAuditorChecksCmd tool relies on the same underlying logic as the first method, so it also requires restarting the Auditor or performing an election to trigger task execution.

3: The Decommission and RecoveryBookie tools tend to focus on executing recovery logic and only check and recover a specific subset of Bookie services.

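None of these methods offers a simple, reliable way to reschedule the Auditor check tasks in a running cluster: the first two require an Auditor restart or re-election, and the third covers only a subset of Bookies.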
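To illustrate why updating a ctime alone has no immediate effect, here is a minimal sketch. It assumes the Auditor derives each task's initial delay from the stored ctime once, at the moment the task is scheduled; the class and method names are hypothetical and are not taken from the BookKeeper code base.

```java
import java.util.concurrent.TimeUnit;

public class CtimeDelaySketch {
    /**
     * Initial delay, in seconds, before a periodic check first runs.
     * Assumption: this is computed once, when the task is scheduled,
     * so a later ctime update is not seen until the task is rescheduled.
     */
    static long initialDelaySeconds(long lastRunMillis, long nowMillis, long intervalSeconds) {
        long elapsedSeconds =
                TimeUnit.MILLISECONDS.toSeconds(Math.max(0, nowMillis - lastRunMillis));
        return Math.max(0, intervalSeconds - elapsedSeconds);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long interval = TimeUnit.DAYS.toSeconds(7);

        // Last run one day ago: the task waits for the rest of the interval.
        System.out.println(initialDelaySeconds(now - TimeUnit.DAYS.toMillis(1), now, interval));

        // ctime moved far into the past: the task is due immediately,
        // but only once something reschedules it.
        System.out.println(initialDelaySeconds(0L, now, interval));
    }
}
```

Under this assumption, the running scheduler keeps using the delay it computed earlier, which is why a restart or re-election is currently needed.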

Proposal

I therefore propose the following mechanism for rescheduling Auditor tasks:

1: The Auditor watches the persistent znode path /ZK_LEDGERS_ROOT_PATH/underreplication/scheduleAuditor.
2: Users modify the task ctime with the ForceAuditorChecksCmd tool and, by passing the force parameter, create the above znode.
3: The Auditor's callback on scheduleAuditor reschedules the three tasks named above.
4: After the Auditor finishes rescheduling the tasks, it deletes the scheduleAuditor node.
5: When the Auditor starts, it deletes any old scheduleAuditor node so that a stale flag does not trigger an unintended reschedule.

This way, we can trigger the scheduling and execution of Auditor tasks through an online interface without relying on service restart or re-election.
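The five steps above can be sketched as follows. This is only an illustration of the flag lifecycle, not the proposed implementation: FakeZk is a hypothetical in-memory stand-in for ZooKeeper, the /ledgers root is an assumed value for ZK_LEDGERS_ROOT_PATH, and the Auditor class here just records which tasks it would reschedule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ScheduleAuditorSketch {
    // Assumed root path; the real one is whatever ZK_LEDGERS_ROOT_PATH resolves to.
    static final String FLAG_PATH = "/ledgers/underreplication/scheduleAuditor";

    /** Hypothetical in-memory stand-in for ZooKeeper: a set of existing paths. */
    static final class FakeZk {
        private final Set<String> nodes = ConcurrentHashMap.newKeySet();
        private String watchedPath;
        private Runnable onCreate = () -> { };

        void watch(String path, Runnable callback) {
            watchedPath = path;
            onCreate = callback;
        }

        void create(String path) {
            // Fire the callback only when the watched node is newly created.
            if (nodes.add(path) && path.equals(watchedPath)) {
                onCreate.run();
            }
        }

        void delete(String path) { nodes.remove(path); }

        boolean exists(String path) { return nodes.contains(path); }
    }

    /** Records rescheduled tasks instead of running real checks. */
    static final class Auditor {
        final List<String> rescheduled = new ArrayList<>();
        private final FakeZk zk;

        Auditor(FakeZk zk) { this.zk = zk; }

        void start() {
            zk.delete(FLAG_PATH);                    // step 5: drop any stale flag
            zk.watch(FLAG_PATH, this::onFlagCreated); // step 1: watch the flag path
        }

        private void onFlagCreated() {
            // step 3: reschedule the three check tasks
            rescheduled.add("AuditorCheckAllLedgersTask");
            rescheduled.add("AuditorPlacementPolicyCheckTask");
            rescheduled.add("AuditorReplicasCheckTask");
            zk.delete(FLAG_PATH);                    // step 4: clear the flag
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        zk.create(FLAG_PATH);                        // stale flag from an earlier run
        Auditor auditor = new Auditor(zk);
        auditor.start();
        zk.create(FLAG_PATH);                        // step 2: the tool forces the flag
        System.out.println(auditor.rescheduled + " flagExists=" + zk.exists(FLAG_PATH));
    }
}
```

Deleting the flag both on startup and after each reschedule keeps the node a one-shot trigger: every forced creation produces exactly one reschedule.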

Compatibility, Deprecation, and Migration Plan

There are no compatibility issues. This BP introduces a new trigger flag that does not affect the original logic and does not involve any changes to other existing public APIs. There is no deprecation or migration plan.

dlg99 commented May 30, 2024

Does this new functionality do anything that is not covered by the REST API?
https://bookkeeper.apache.org/docs/4.10.0/admin/http/#endpoint-apiv1autorecoverytrigger_audit
