-
-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
channels concept #939
Comments
In my understanding, it is to support lazy range of strings, Files and other objects. How about extending a concept of |
I agree that channels and streamable contents have something in common, and this is why we probably should call them
In my opinion, we could think to extend |
The The channel proposal is a concept within the workflow runner, operating at the level of communication between workflow steps, which is a level of abstraction above files and pipes and invoking concrete commands. Because it's different from |
The CWL Channel typeResuming briefly, this PR advocates the introduction of a new CWLType called
Channel sourcesAs stated by @tetron in the initial proposal, I propose to reuse the
Note that the
Note that if a Channel portsSince
For these reasons, I suggest disallow Furthermore, I suggest to disallow the possibility to return a Channel sinksSince The first way to manipulate a The proposed syntax can also transform a Indeed, there could be steps that do not need the entire
The Intuitively, the |
I propose that a channel object is represented at the CWL level (in expressions) this way:
Where To model "get" operations, I propose a "get" operation consists of a "channel id" and a "reader id". Each distinct reader id starts at the beginning, so two readers on the same channel do not interfere with one another (the "broadcast" pattern). I'd like to suggest that a CommandLineTool can be allowed to accept a channel as input, but only in the plain json data form above, it can't do anything with it except print it out or pass it through. By analogy with File objects, where the file path is just a handle that you use to request data from the operating system, in the future we could introduce some kind of API where you could exchange the channel id for a pipe or socket where you read or write streaming data, but that should be out of scope for this initial design. The window concept is not something I had thought about. I do think being able to go back and forth between arrays and channels is going to be important, and the window concept seems like a useful generalization of "collect all channel items into an array" or "emit each array item into a channel". |
@kinow and I had a discussion about using CWL for climate models, and the need to be able to send events between processes that haven't actually finished yet. This means, building on the channel concept, a running process itself should be able to send and receive channel messages to/from the workflow engine. I wanted to make sure I recorded a couple of ideas for communication protocols:
|
Related, for fetching/posting dynamic channel events, an implementation-specific program with a standard name like |
Design sketch for a CWL "channel" type.
Motivation
Model
A channel is an asynchronous queue of items. It has two logical variables
The channel has three operations
Channels are never explicitly created, closed, written to or read within the workflow. This model exists only to inform the behavior of workflow steps that produce or consume channel objects.
Passing a channenl to a workflow step
When a channel is passed to a workflow step, it checks the type of the parameter on the run step.
Use in scattering
A channel object can be produced by a scatter operation:
When "stepOutputType: channel", then a scattered workflow step immediately produces channels for all its outputs. Each individual scatter step puts its result into the channel as it becomes available. When all scatter steps complete, the channel is closed.
A workflow step can "scatter" over a channel.
This will start a scatter instance for each item in the channel, until the channel is closed and empty.
Scattering over multiple channels follow similar dot product / cross product behavior as for arrays.
Channels could also be produced by the proposed iterative loop feature.
Single item channels
If singleItem is true, the channel will only ever have at most one item. The first "put" closes the channel.
A non-scattered step can produce channels with
stepOutputType: "channel"
On a non-scattered workflow step, this produces results immediately as channels with
singleItem: true
When a channel with
singleItem: true
is passed to a workflow step, it checks the type of the parameter on the run step.TBD
What happens if a channel is passed to two downstream workflow steps. Are the items copied to both, do they race to get the items, or is this an error?
The deferred input case described here https://cwl.discourse.group/t/deferred-input-for-scattered-subworkflow/484 favors the "items copied" pattern.
Proposed model for items copied pattern: when calling
get()
you provide a "get id" for that reader instance. Each unique id starts at position 0 and reads to close.This also more explicitly models the channel as an array whose final size is not known when it is created, but facilitates the "automatic behavior" conversion of a channel to an array when connected to an array type input.
The text was updated successfully, but these errors were encountered: