Proposal: allow assigning outputs to more than one collection. #228

Open
matthewhanson opened this issue Aug 17, 2023 · 3 comments

@matthewhanson
Member

Currently, the upload_options.collections dictionary assigns an item to the first collection whose expression matches it.

For example, a typical collections dict might be:

"collections": {
    "landsat-8": "$[?(@.id =~ 'LC08.*')]",
    "sentinel-2": "$[?(@.id =~ 'S2.*')]"
}
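
To make the first-match behaviour concrete, here is a minimal sketch. It is not the Cirrus implementation: `first_matching_collection` is a hypothetical helper, and plain regular expressions on the item `id` stand in for evaluating the JSONPath expressions.

```python
import re

# Stand-in for the "collections" dict above: each JSONPath filter on @.id
# is replaced by the regex it contains (an assumption for illustration).
COLLECTIONS = {
    "landsat-8": r"LC08.*",
    "sentinel-2": r"S2.*",
}


def first_matching_collection(item, collections):
    """Return the first collection (in dict order) whose pattern matches item["id"]."""
    for name, pattern in collections.items():
        if re.match(pattern, item["id"]):
            return name
    return None


# Hypothetical item id, used only to exercise the helper.
item = {"type": "Feature", "id": "LC08_EXAMPLE_SCENE"}
print(first_matching_collection(item, COLLECTIONS))
```

Because only one name is returned, dict order decides the outcome whenever two expressions overlap.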

However, we have a case where we want to publish an item to multiple collections. It's likely not a common case, but explicitly allowing multiple matches would also require payload writers to be more exact in how they write their JSONPath expressions.

In this case we may have:

"collections": {
    "landsat-8": "$[?(@.id =~ 'LC08.*')]",
    "sentinel-2": "$[?(@.id =~ 'S2.*')]",
    "landsat-8-legacy": "$[?(@.id =~ 'LC08.*')]"
}

With this, Landsat-8 items would be put in both the "landsat-8" and "landsat-8-legacy" collections.
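
A rough sketch of what the proposed behaviour could look like: collect every matching collection instead of stopping at the first. `matching_collections` is a hypothetical name, and regexes on the item `id` again stand in for the JSONPath expressions.

```python
import re

# Regex stand-ins for the three JSONPath expressions in the example above.
COLLECTIONS = {
    "landsat-8": r"LC08.*",
    "sentinel-2": r"S2.*",
    "landsat-8-legacy": r"LC08.*",
}


def matching_collections(item, collections):
    """Return every collection whose pattern matches item["id"], in dict order."""
    return [
        name
        for name, pattern in collections.items()
        if re.match(pattern, item["id"])
    ]


# Hypothetical item id, used only to exercise the helper.
item = {"type": "Feature", "id": "LC08_EXAMPLE_SCENE"}
print(matching_collections(item, COLLECTIONS))
```

With this shape, an overly broad expression silently adds an item to extra collections, which is why the expressions would need to be written more precisely.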

@jkeifer
Collaborator

jkeifer commented Aug 17, 2023

I understand the need and desire to produce items that belong to multiple collections, but I am uncertain how this would work in practice. Maybe you can help me understand?

Specifically, in your example, a Landsat 8 item matches both collections, but STAC items can have only one collection defined, correct? If so, does this mean we need to duplicate this item in the workflow output for each additional matched collection? Or does one of these collections take precedence for the workflow output (first matched?) and we only consider multiple collection matches when publishing (so we publish more items than are in the workflow output)?

I don't love either of these solutions. The former seems preferable for some reasons, but in either case you end up with multiple items representing the same (meta)data, and that seems problematic from both a data consistency and catalog management perspective.

Does this situation potentially point at a gap in the STAC spec, that supporting only a single collection per item is too limiting? Should the spec allow items to have multiple collections? Or do we need a new concept to cover such "item aliases"?

I think we've discussed the idea of dynamic collections before. Would that perhaps be an idea here?

@matthewhanson
Member Author

The intention is that the item would be published separately to each collection, so yes, it would be duplicated.
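
As an illustration only, duplication per collection might look like the sketch below. `duplicate_for_collections` is a hypothetical helper. It sets only the item's top-level `collection` field, and a real publisher would presumably also need to update any collection link.

```python
import copy


def duplicate_for_collections(item, collection_names):
    """Return one independent copy of the item per collection name."""
    copies = []
    for name in collection_names:
        clone = copy.deepcopy(item)  # deep copy so the copies share no state
        clone["collection"] = name
        copies.append(clone)
    return copies


# Hypothetical item, used only to exercise the helper.
item = {"type": "Feature", "id": "LC08_EXAMPLE_SCENE", "properties": {}}
for published in duplicate_for_collections(item, ["landsat-8", "landsat-8-legacy"]):
    print(published["id"], published["collection"])
```

The copies keep the same `id` and differ only in `collection`, which is the duplication concern raised earlier in the thread.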

@ircwaves
Member

If the item is to be duplicated, it seems like some metadata should be injected into the copy destined for the -legacy collection, so that the JSONPath expressions can differentiate the two copies and send each to its respective collection. The expression can work over the whole item, so the IDs can remain the same while the destination collections differ.
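
A sketch of that idea, under stated assumptions: the marker property name `example:legacy` is invented for illustration (the thread does not name one), and Python predicates stand in for JSONPath expressions evaluated over the whole item.

```python
import copy
import re

# Hypothetical marker property; the thread does not specify one.
LEGACY_FLAG = "example:legacy"


def is_landsat8(item):
    return re.match(r"LC08.*", item["id"]) is not None


# Each rule inspects the whole item, not just its id, so the two copies
# can share an id and still match exactly one collection each.
COLLECTIONS = {
    "landsat-8": lambda item: is_landsat8(item)
    and not item["properties"].get(LEGACY_FLAG, False),
    "landsat-8-legacy": lambda item: is_landsat8(item)
    and item["properties"].get(LEGACY_FLAG, False),
}


def assign_collection(item, collections):
    """Return the first collection whose predicate accepts the item."""
    for name, predicate in collections.items():
        if predicate(item):
            return name
    return None


original = {"type": "Feature", "id": "LC08_EXAMPLE_SCENE", "properties": {}}
legacy = copy.deepcopy(original)
legacy["properties"][LEGACY_FLAG] = True  # the injected metadata

print(assign_collection(original, COLLECTIONS))
print(assign_collection(legacy, COLLECTIONS))
```

Since each copy satisfies exactly one rule, this approach works with single-match assignment and does not depend on match order.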

Side note: until #226 merges, the last matching collection is the one assigned to the item. After that, it will be the first match.
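
The difference between the two behaviours can be sketched as follows. This is one plausible way a last-match result could arise (a loop with no early exit). It is not taken from the actual code or from #226, and both function names are hypothetical.

```python
import re

# Regex stand-ins for JSONPath expressions; two rules overlap on purpose.
COLLECTIONS = {
    "landsat-8": r"LC08.*",
    "sentinel-2": r"S2.*",
    "landsat-8-legacy": r"LC08.*",
}


def assign_last_match(item_id, collections):
    """No early exit: each later match overwrites the earlier one."""
    assigned = None
    for name, pattern in collections.items():
        if re.match(pattern, item_id):
            assigned = name
    return assigned


def assign_first_match(item_id, collections):
    """Stop at the first rule that matches."""
    for name, pattern in collections.items():
        if re.match(pattern, item_id):
            return name
    return None


print(assign_last_match("LC08_EXAMPLE_SCENE", COLLECTIONS))
print(assign_first_match("LC08_EXAMPLE_SCENE", COLLECTIONS))
```

With overlapping expressions the two strategies disagree, so the order of entries in the dict matters in both cases.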
