Delete model #17208
Conversation
// other models.
modelUUID := coremodel.UUID(stModel.UUID())

// TODO (stickupkid): We need to delete the model info when
This comment doesn't make sense. In any case, could we not in this instance delete and recreate the table? I don't think anything has referential integrity (RI) constraints to it...
This is because we need to implement the life cycle for a model. We're currently going from Alive -> Dead
in the model manager, instead of Alive -> Dying -> Dead
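The life-cycle progression discussed here can be sketched as a minimal state machine; a hedged illustration (the `Life` type and constant names are assumptions for this sketch, not juju's actual definitions):

```go
package main

import "fmt"

// Life models the progression discussed above: Alive -> Dying -> Dead.
type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// validNext enforces one step at a time, disallowing the current
// Alive -> Dead shortcut taken by the model manager.
func validNext(from, to Life) bool {
	return to == from+1
}

func main() {
	fmt.Println(validNext(Alive, Dying)) // true: the correct next step
	fmt.Println(validNext(Alive, Dead))  // false: skips Dying
}
```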
@@ -221,12 +224,20 @@ func NewWorker(cfg WorkerConfig) (*dbWorker, error) {
	// that case we do want to cause the dbaccessor to go down. This
	// will then bring up a new dqlite app.
	IsFatal: func(err error) bool {
		// If a database is dead we should not kill the worker.
		if errors.Is(err, database.ErrDBDead) {
So this says that if we kill the worker with ErrDBDead, the worker is not restarted and it is not fatal to us, the parent, right?
Correct.
The intermittent test failures are fixed by #17341
This is good as a progression. All QA is solid.
We do however have to be very careful regarding the proper progression of model life-cycle, and cleanup/teardown when we come to it.
A note in the risk register wouldn't hurt - we don't ever want this path recruited by accident or prematurely.
apiserver/shared.go
@@ -49,6 +50,7 @@ type sharedServerContext struct {
	logger corelogger.Logger
	charmhubHTTPClient facade.HTTPClient
	dbGetter changestream.WatchableDBGetter
	dbDeleter database.DBDeleter
I'd like a comment here regarding the fact that it ultimately only ends up in a migration context, and is under no circumstances to have expanded availability.
@@ -85,6 +88,7 @@ type ModelInfoService interface {
	// CreateModel is responsible for creating a new read only model
	// that is being imported.
	CreateModel(context.Context, uuid.UUID) error
	DeleteModel(context.Context) error
Comment this. Say exactly what it does, which DB(s) it acts on.
The model migration service needs to be able to delete models if it fails to correctly import them. Removal of the model is required so that another attempt can be made. This mostly involves threading the model deleter through the apiserver to pass to the model migration coordinator.
As we can't drop a database, when a migration fails we have to delete the contents of the database. This includes all the schema that was added to the database. The rationale for deleting the schema is that, if you migrate src:A to dst:A, but that fails, and then you perform an upgrade to dst via a patch release, there isn't a guarantee that the dst:A schema is completely updated. So removing everything leaves us with a blank slate.

There have been thoughts around aliasing tables instead of a clean slate, but this is fraught with danger if you accidentally write to the wrong DB (i.e. the aliased one).

In addition to this, we also can't clear the DB in one query, as we get a segfault. The code batches up the statements, so we're not running a query for every statement, but batching should be enough to prevent a segfault.

The way that we drop everything from the DB needs to happen in a specific order to prevent constraint violations. In addition, we ensure that the foreign key param is set to _false_ to also prevent any issues in the ordering of deletions.
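The batching described above can be sketched as a small helper; a hedged illustration only (the `batchStatements` helper, the table names, and the batch size are assumptions, not juju's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// batchStatements groups individual SQL statements into multi-statement
// batches of at most batchSize, so we issue a handful of execs rather
// than one per statement, while staying under the size that segfaults.
func batchStatements(stmts []string, batchSize int) []string {
	var batches []string
	for i := 0; i < len(stmts); i += batchSize {
		end := i + batchSize
		if end > len(stmts) {
			end = len(stmts)
		}
		batches = append(batches, strings.Join(stmts[i:end], "; "))
	}
	return batches
}

func main() {
	// Hypothetical table names; deletion order matters, and disabling
	// foreign keys first guards against any ordering mistakes.
	stmts := []string{
		"PRAGMA foreign_keys = OFF",
		"DELETE FROM unit",
		"DELETE FROM application",
		"DELETE FROM model",
		"DROP TABLE unit",
		"DROP TABLE application",
		"DROP TABLE model",
	}
	for _, batch := range batchStatements(stmts, 3) {
		fmt.Println(batch)
	}
}
```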
As the read-only model has some immutable triggers around it to prevent deletion and updates, we need to destroy it before deleting the DB.
The worker needs to be stopped upon deletion. We want to ensure it's dead; we don't want to allow the worker to spring back up. So having a sentinel error propagate through the stack when we want to delete a model allows others to know when to give up.
Until we fully implement tearing down of a model, we can't currently remove a model until the very last moment. This code just makes deletion optional, so that we can state that during a model migration we do want to delete the model so we can retry. In addition, I've also fixed the dbaccessor to correctly terminate transactions based on whether the tomb is dying. See the context sourcing.
To prevent all trackers from polling the db at the same time, causing problems, introduce a jitter to give more random access patterns to the requests.
We need to remove the model info (read-only model), but we can't do it until everything has been removed from the database. Attempting to remove it early causes everything to lock up.
The comments weren't quite correct; improve them to better explain, e.g., why we're tying the tomb errors to the context.
// If the db should not be deleted then we can return early.
if !options.DeleteDB() {
	s.logger.Infof("skipping model deletion, model database will still be present")
Do we really need logging in here? Surely this would be better from the caller's side, along with updating the comment of the function to explain the expected default behaviour.
My reason for asking is I am really not a fan of logging in the services layer. The services layer has nice, well defined behaviour that is documented, and strong errors. Logging can be pushed up a layer with all of that done, I think.
So this only exists because we can't delete a model outside of model migration in its current state. The key part we're missing is going from alive -> dying -> dead. We should delete these options when models are correctly progressing through the life states.
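The opt-in deletion seen in the `options.DeleteDB()` check above is the functional-options pattern; a hedged sketch (the option and constructor names here are assumptions for illustration):

```go
package main

import "fmt"

// deleteOptions carries the opt-in flag: deletion is off by default, so
// only the migration retry path ever requests it.
type deleteOptions struct {
	deleteDB bool
}

// Option mutates deleteOptions; hypothetical name for this sketch.
type Option func(*deleteOptions)

// WithDeleteDB opts in to removing the model database.
func WithDeleteDB() Option {
	return func(o *deleteOptions) { o.deleteDB = true }
}

func newOptions(opts ...Option) *deleteOptions {
	o := &deleteOptions{} // default: keep the database
	for _, fn := range opts {
		fn(o)
	}
	return o
}

// DeleteDB reports whether the caller asked for the DB to be removed.
func (o *deleteOptions) DeleteDB() bool { return o.deleteDB }

func main() {
	fmt.Println(newOptions().DeleteDB())              // default: false
	fmt.Println(newOptions(WithDeleteDB()).DeleteDB()) // migration retry: true
}
```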
We currently do not delete the model during a destroy-model
command, as that completely locks up the model. That will
have to be tackled later.
Checklist
QA steps
Apply patch
We need to apply this patch to ensure that the migration will fail:
Create the src:
Create the dst:
This should fail (check the logs).
Using the juju models command, find the default model-uuid; we'll need it later.
Login to dqlite. Ensure to replace <model-uuid> with the valid one. The result should be empty.
Revert patch
Revert the patch and upgrade the src and dst controllers.
Should succeed.
Links
Jira card: JUJU-5877