rerun feeder should be able to re-run payloads with a terminal state #199

jkeifer opened this issue Nov 5, 2022 · 3 comments

@jkeifer
Collaborator

jkeifer commented Nov 5, 2022

Payloads in the COMPLETED and INVALID states cannot be re-run by the current implementation of the rerun feeder unless they already had the "replace": true key in their process definition, as the feeder has no facility for setting that key on the re-run payload. This is because it currently just resolves the payload URL and sends that along to process to resolve from S3.

Two options for fixing this oversight:

  • Resolve the payloads from S3 in rerun, apply the replace key, then send the message to process
  • Allow process to use a replace key in the url payload to set that on the resolved payload

The former requires more logic in rerun that effectively duplicates operations already supported in process, and for large payloads would require pulling them down, modifying them, pushing them back up to S3, then pulling them down again in process.

The latter adds additional logic to process to understand a sort of ad hoc and perhaps not well-considered interface for replacements, and seems a slippery slope to having process support for modifying payloads in other ways. We just pulled such features out of process to keep its logic simple.

@jkeifer jkeifer added this to the 0.9.0 milestone Nov 5, 2022
@matthewhanson
Member

I don't like the option to add the replace key to a payload, as modifying the payload and saving it to S3 seems like it could cause problems down the road. That payload may be referenced again in the future, such as by the rerun feeder.

Is there a case where a user could ask to rerun a batch of jobs, some of which are already in the COMPLETED state? Let's say some of those completed payloads have replace: true and others have replace: false. Would this lead to some of them getting run again and others not? That would not be deterministic from the user's perspective, as they wouldn't know what the behavior would be for any individual payload.

Having replace in process seems like the better solution. As it stands now I really have no way of rerunning an INVALID workflow unless I duplicate some logic from the process function.

@matthewhanson
Member

Actually, it looks like ProcessPayloads.process already takes a replace keyword. But the rerun feeder can't use it right now, since it just adds things to the queue and isn't calling process directly. I think this is what you were getting at with using replace in the URL payload?

So in the case of a URL payload, you'd add replace here:

```json
{
   "url": "s3://yada/yada/yada",
   "replace": true
}
```

So then the process Lambda would set the replace keyword of ProcessPayloads.process to true if it were included in the payload. In the case where the payload does not have a "url" field, it would use the normal replace in the process definition?
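To make that shape concrete, here's a minimal sketch of how the process Lambda could honor a top-level replace on a URL payload and fall back to the process definition otherwise. The names `resolve_message` and `fetch_from_s3` are made up for illustration, not the actual cirrus API:

```python
def fetch_from_s3(url: str) -> dict:
    # Stand-in for the real S3 fetch; returns a canned payload here so
    # the sketch is self-contained.
    return {"process": {"workflow": "example", "replace": False}}


def resolve_message(message: dict):
    """Resolve a message into (payload, replace), honoring a top-level
    'replace' override on URL payloads."""
    if "url" in message:
        # None means "no override; defer to the resolved payload"
        replace = message.get("replace")
        payload = fetch_from_s3(message["url"])
    else:
        replace = None
        payload = message
    if replace is None:
        # Fall back to the normal replace in the process definition
        replace = payload.get("process", {}).get("replace", False)
    return payload, replace
```

With this, a URL payload's top-level replace wins when present, and all other payloads behave exactly as they do today.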

@jkeifer
Collaborator Author

jkeifer commented Feb 7, 2023

Yes.

I'd propose changing ProcessPayload instantiation to pop ProcessPayload.process.replace and set a new property, ProcessPayload.replace, to the popped value. ProcessPayload instantiation should also take a new replace parameter, which can take the value of replace from the top-level payload in the case of URL payloads.
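Roughly, assuming the current dict-subclass design, something like this (a sketch of the proposal, not the actual implementation):

```python
class ProcessPayload(dict):
    """Sketch: pop 'replace' out of the process definition at init."""

    def __init__(self, *args, replace=None, **kwargs):
        super().__init__(*args, **kwargs)
        # Remove replace from the process definition so it is never
        # persisted back to S3 inside "process".
        popped = self.get("process", {}).pop("replace", None)
        # An explicit replace argument (e.g. from a URL payload's
        # top-level key) takes precedence over the process definition.
        # Stored here as a plain attribute; in the real dict-subclass
        # it might still end up persisted, per the discussion.
        self.replace = bool(replace if replace is not None else popped)
```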

Popping replace from ProcessPayload.process will mitigate the concern about payloads being persisted to S3 with replace set. Of course, due to the way ProcessPayload is its own dict, ProcessPayload.replace would be persisted to S3. This is not ideal, but also not the end of the world, as it won't be used to trigger replacement.

That said, a likely better solution here is to compose the payload into ProcessPayload using one of these solutions:

  • define well-known top-level keys for those properties to be persisted to S3 (I did something like this in the cirrus-mgmt plugin to extract persisted properties into a parent data class, which had the json-conversion methods on it using dataclasses.asdict(self))
  • put the entire payload data structure under a single key, say as payload or raw
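As a sketch of the second option (names and structure here are assumptions, not a worked-out design), composing the raw payload under a single attribute would keep persistence concerns out of the payload data entirely:

```python
from dataclasses import dataclass, field


@dataclass
class ProcessPayload:
    """Sketch: the raw payload lives under one attribute instead of
    ProcessPayload subclassing dict."""

    payload: dict = field(default_factory=dict)
    replace: bool = False

    @classmethod
    def from_message(cls, message: dict) -> "ProcessPayload":
        # Shallow copy for the sketch; a real implementation might
        # deep-copy to avoid mutating the caller's message.
        payload = dict(message)
        # Pop replace from the process definition so only the wrapped
        # payload (without it) would ever be persisted to S3.
        replace = bool(payload.get("process", {}).pop("replace", False))
        return cls(payload=payload, replace=replace)

    @property
    def process(self) -> dict:
        # One canonical accessor, removing the ProcessPayload.process
        # vs ProcessPayload["process"] ambiguity.
        return self.payload.get("process", {})
```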

I'd hazard that while the latter might be a more invasive refactor, it would probably be the best long-term solution, and would eliminate some weirdness we already have, e.g., ProcessPayload.process vs ProcessPayload["process"].
