CWL-aligned design/implementation #14

Open
1 task
mih opened this issue May 15, 2024 · 0 comments

mih commented May 15, 2024

This issue replaces or concludes a number of previously expressed ideas, aiming to reduce complexity and make it more obvious where we stand right now. Replaced are:

The present concept is to think of a recomputation as a three-step process, where each step can be represented as a node in a CWL workflow:

  • Provision: Establish the environment required for a computation
  • Compute: Run a computation
  • Extract: Pick relevant outputs and present them as outputs of the computation in a particular context
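A minimal sketch of how the three steps could be chained as nodes of a CWL `Workflow`, built here as a Python dict and serialized to JSON (CWL documents can be written in JSON as well as YAML). The tool descriptions `provision.cwl`, `compute.cwl` and `extract.cwl`, and all input/output port names, are hypothetical placeholders. Only the overall `cwlVersion`/`class`/`inputs`/`outputs`/`steps` layout follows the CWL specification.

```python
import json


def three_step_workflow() -> dict:
    """Build a CWL Workflow with provision -> compute -> extract steps.

    Tool file names and port names are hypothetical placeholders.
    """
    return {
        "cwlVersion": "v1.2",
        "class": "Workflow",
        "inputs": {
            "provision_params": "File",  # parameters for establishing the environment
            "compute_spec": "File",      # specification of the computation to run
            "extract_spec": "File",      # output filter for a particular context
        },
        "outputs": {
            # The workflow's output is whatever the extract step selects.
            "result": {"type": "Directory", "outputSource": "extract/selected"},
        },
        "steps": {
            "provision": {
                "run": "provision.cwl",
                "in": {"params": "provision_params"},
                "out": ["environment"],
            },
            "compute": {
                "run": "compute.cwl",
                # The compute step consumes the provisioned environment.
                "in": {"environment": "provision/environment", "spec": "compute_spec"},
                "out": ["raw_outputs"],
            },
            "extract": {
                "run": "extract.cwl",
                # The extract step filters the raw compute outputs.
                "in": {"raw": "compute/raw_outputs", "spec": "extract_spec"},
                "out": ["selected"],
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(three_step_workflow(), indent=2))
```

Keeping each step as its own node means any one of them can be swapped or re-parameterized without touching the other two.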

Each step needs critical information that must be stored and supplied. The steps also differ in scope:

  • Provision: the exact same parameterization can yield suitable inputs for more than one computation
  • Compute: the exact same compute specification can be combined with a broad range of inputs and yield different outputs
  • Extract: one and the same compute output can be filtered in many ways to yield desired outputs in a particular context
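Because the scopes differ, one option is to store the three parameter sets as separate records and let a recompute merely reference them. One provision record can then be shared by several computations, and one compute output can feed several extract filters. A minimal sketch, assuming a hypothetical `Recompute` record and identifiers derived from content (SHA-256 over canonical JSON via a hypothetical `spec_id` helper); nothing here is an existing DataLad API.

```python
import hashlib
import json
from dataclasses import dataclass


def spec_id(params: dict) -> str:
    """Hypothetical helper: stable ID from the canonical JSON form of a parameter set."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Recompute:
    """Hypothetical record that only references the three parameter sets by ID."""
    provision: str
    compute: str
    extract: str


# One provision parameterization ...
provision = {"dataset": "inputs", "version": "v1"}
# ... combined with two different compute specs ...
compute_a = {"cmd": ["tool", "--mode", "a"]}
compute_b = {"cmd": ["tool", "--mode", "b"]}
# ... and compute output filtered in two different ways.
extract_all = {"glob": "out/*"}
extract_csv = {"glob": "out/*.csv"}

records = [
    Recompute(spec_id(provision), spec_id(compute_a), spec_id(extract_all)),
    Recompute(spec_id(provision), spec_id(compute_b), spec_id(extract_all)),
    Recompute(spec_id(provision), spec_id(compute_a), spec_id(extract_csv)),
]

# The provision parameter set is stored once but referenced three times.
print(len({r.provision for r in records}))  # 1
print(len({r.compute for r in records}))    # 2
print(len({r.extract for r in records}))    # 2
```

Deriving the ID from the canonical content means that two identical parameterizations collapse to the same record regardless of key order.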

The steps also differ in whether their parameters are fixed or variable for a particular recompute:

  • Provision: exact for reproducing (I want the same) vs. variable for reevaluation (I want to see how different it is, e.g. datalad rerun --onto)
  • Compute: recompute exactly vs. recompute exactly with the new version of the tool
  • Extract: mostly in conjunction with a change in the compute specification or implementation, since output filters may need to be adjusted to keep delivering the same output (e.g., after an output name/location change)
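The fixed-vs-variable distinction could be captured as a per-step resolution policy. The sketch below assumes a simple in-memory mapping from each step to its known versions (oldest first). It pins every step for an exact reproduction, lets only the provision step float for a reevaluation (in the spirit of `datalad rerun --onto`), and moves compute and extract together for a new tool version. The intent names, the `Pin` enum, and the version strings are all hypothetical.

```python
from enum import Enum


class Pin(Enum):
    EXACT = "exact"    # use the recorded version
    LATEST = "latest"  # use the newest available version of the same concept


# Hypothetical per-step policies for different recompute intents.
POLICIES = {
    "reproduce": {"provision": Pin.EXACT, "compute": Pin.EXACT, "extract": Pin.EXACT},
    "reevaluate": {"provision": Pin.LATEST, "compute": Pin.EXACT, "extract": Pin.EXACT},
    # A new tool version may rename or move outputs, so the filter moves with it.
    "new-tool": {"provision": Pin.EXACT, "compute": Pin.LATEST, "extract": Pin.LATEST},
}


def resolve(intent: str, recorded: dict, available: dict) -> dict:
    """Pick a version per step: the recorded one, or the last entry in `available`."""
    plan = {}
    for step, pin in POLICIES[intent].items():
        plan[step] = recorded[step] if pin is Pin.EXACT else available[step][-1]
    return plan


recorded = {"provision": "p1", "compute": "c1", "extract": "e1"}
available = {"provision": ["p1", "p2"], "compute": ["c1", "c2"], "extract": ["e1", "e2"]}
print(resolve("reevaluate", recorded, available))
# {'provision': 'p2', 'compute': 'c1', 'extract': 'e1'}
```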

Taken together, these requirements determine where and how the parameters of all three steps can be stored and, importantly, how they need to be referenced. In general, this means that we want to be able to identify every parameter set simultaneously by precise version (exact parameters) and by concept (or latest version).
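One way to support both kinds of reference at once, sketched under assumptions: every parameter set gets an exact version identifier derived from its content (a SHA-256 of its canonical JSON, shortened to 12 hex characters for readability), and is also filed under a concept name whose most recently added version counts as "latest". The `SpecRegistry` class and the `concept@version` reference syntax are hypothetical, not an existing DataLad or CWL feature.

```python
import hashlib
import json


class SpecRegistry:
    """Hypothetical store addressing parameter sets by concept and by exact version."""

    def __init__(self) -> None:
        self._versions: dict[str, list[str]] = {}  # concept -> version IDs, oldest first
        self._specs: dict[str, dict] = {}          # version ID -> parameter set

    def add(self, concept: str, params: dict) -> str:
        """Register a parameter set under a concept; return its exact version ID."""
        canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
        version = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
        self._specs[version] = params
        history = self._versions.setdefault(concept, [])
        if version not in history:
            history.append(version)  # re-adding a known version does not make it latest
        return version

    def resolve(self, ref: str) -> dict:
        """'concept@version' -> that exact set; bare 'concept' -> latest version."""
        concept, _, version = ref.partition("@")
        history = self._versions[concept]
        if not version:
            version = history[-1]
        elif version not in history:
            raise KeyError(f"{version} is not a version of {concept}")
        return self._specs[version]


registry = SpecRegistry()
v1 = registry.add("compute", {"image": "tool:1.0"})
v2 = registry.add("compute", {"image": "tool:2.0"})
print(registry.resolve(f"compute@{v1}"))  # {'image': 'tool:1.0'}  (exact)
print(registry.resolve("compute"))        # {'image': 'tool:2.0'}  (latest)
```

A recorded recompute would store the exact form, while a "give me the current state" request would use the bare concept name.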

TODO:

  • anticipatory walkthrough for the use case "recompute git-annex key"