Possibility for repairs to never be triggered #264
Comments
Possible TestCase:
But if you're running in a container you can just execute
Some considerations: I believe the case of ecChronos restarting during the interval is valid, but I don't see how all ecChronos instances could always be restarting. If one is, the others will still be running repairs on their nodes, and repair_history will receive data.
Where can this jar be found? It does not exist in my environment.
Should the restart that triggers the scenario be the same as starting ecc via ./bin/ecctool start -f, stopping it with Ctrl-C, and then starting it again with the same command? And I assume the change from days to minutes is made in /conf/ecc.yml.
Yes, I found out that it really does not exist. You can try it the way you've suggested.
After discussion with the author, this is how to reproduce the issue: run "ecctool schedules" and note the "completed at" and "next repair" values.
epkdaek@elx721027t9:~/cassandra/ecchronos-binary-5.0.0-SNAPSHOT$ ./bin/ecctool schedules
The repair jobs seem to execute just fine. I changed the repair schedule to 10 min in ecc.yml before starting.
epkdaek@elx721027t9:~/cassandra/ecchronos-binary-5.0.0-SNAPSHOT$ ./bin/ecctool start -f
11:46:28.754 [main] INFO c.e.b.c.e.a.spring.SpringBooter - Starting SpringBooter using Java 11.0.21 on elx721027t9 with PID 1066218 (/home/epkdaek/cassandra/ecchronos-binary-5.0.0-SNAPSHOT/lib/application-5.0.0-SNAPSHOT.jar started by epkdaek in /home/epkdaek/cassandra/ecchronos-binary-5.0.0-SNAPSHOT)
Since ecchronos always assumes a repair was successful if the history is empty, I don't see why this would be considered a bug if ecchronos crashes or is restarted once every interval, before the interval is reached. It sounds to me like the actual bug to investigate is why ecchronos crashes/restarts all the time. ;-)
Whether you consider this a bug or an enhancement doesn't matter. This is a scenario that can occur in the real world, and you won't even get any alarms, since ecChronos still thinks everything is repaired because the repair history is empty.
What is the expected behaviour?
12:21:13.488 [RepairScheduler-0] INFO c.e.b.c.e.c.r.state.RepairStateImpl - Assuming the table is new, next repair .
@masokol
Yes, and that's masokol's point; i.e. the next repair date will keep moving forward if the history is empty and ecchronos is restarted.
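To make the moving date concrete, here is a minimal sketch of the behaviour described in this thread. It is a simplified model, not ecChronos code: `next_repair` and `simulate_restarts` are hypothetical names, and the only rule assumed is the one discussed here (empty history means "treat the table as repaired now"). The 10 minute interval matches the reproduction above; the 9 minute restart period is an arbitrary value chosen to be shorter than the interval.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def next_repair(last_repaired: Optional[datetime], now: datetime,
                interval: timedelta) -> datetime:
    # Assumed model: with no history, the table is treated as repaired "now".
    baseline = last_repaired if last_repaired is not None else now
    return baseline + interval


def simulate_restarts(start: datetime, restart_every: timedelta,
                      interval: timedelta, restarts: int) -> List[datetime]:
    # History stays empty, so every boot recomputes the date from its own clock.
    planned = []
    for i in range(restarts):
        boot = start + i * restart_every
        planned.append(next_repair(None, boot, interval))
    return planned


start = datetime(2024, 1, 1, 12, 0)
planned = simulate_restarts(start, timedelta(minutes=9), timedelta(minutes=10), 5)
for i, due in enumerate(planned):
    next_boot = start + (i + 1) * timedelta(minutes=9)
    print(f"boot {i}: next repair {due:%H:%M}, restarted at {next_boot:%H:%M}")
```

In this model every planned repair falls after the following restart, so no repair ever runs and nothing is ever written to the history.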
From what I've understood, the assumption that everything is repaired if the repair history is empty was made to avoid option 1. So maybe option 2, or some completely new solution. One could argue that having an explicit delay, like repair_delay for each schedule, might be a better option. Anyway, whichever solution you choose, I think ecChronos should know whether it's the first time it's starting or not.
The working assumption that was decided on is that when a new table is found, ecchronos should update the history to record that a repair was done now, without actually doing the repair, so that there is history information the next time it starts.
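A rough sketch of that working assumption, under the same simplified model: a plain dict stands in for the persisted repair history, and `next_repair_with_marker` and the table name `ks.tbl` are hypothetical. It only illustrates the idea of writing a marker entry the first time a table is seen; it does not show how ecchronos actually stores or reads its history.

```python
from datetime import datetime, timedelta
from typing import Dict


def next_repair_with_marker(history: Dict[str, datetime], table: str,
                            now: datetime, interval: timedelta) -> datetime:
    last = history.get(table)
    if last is None:
        # New table: record "repaired now" without running a repair,
        # so a later restart finds history instead of starting over.
        last = now
        history[table] = last
    return last + interval


history: Dict[str, datetime] = {}  # survives the simulated restarts
first_boot = datetime(2024, 1, 1, 12, 0)
interval = timedelta(minutes=10)

# Three boots, 9 minutes apart, all sharing the same persisted history.
dates = [
    next_repair_with_marker(history, "ks.tbl",
                            first_boot + i * timedelta(minutes=9), interval)
    for i in range(3)
]
for i, due in enumerate(dates):
    print(f"boot {i}: next repair {due:%H:%M}")
```

Because the marker is written on the first boot, later boots compute the same next repair date instead of pushing it forward, so the repair eventually becomes due even if restarts continue.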
more to come
ECCTOOL_EXAMPLES.md remains to be updated
Since ecchronos assumes tables are repaired when there's no repair history, it's possible that repairs will never be triggered if ecchronos restarts/crashes once every repair interval before repair is actually triggered.