-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
missing detail in grid documentation #973
Comments
don't know if there is an issue for it, but would be great to remove the numbers altogether with a flag, especially when we want to send data to other processes and need to know the filenames. |
Yeah, this makes sense. the filenames should be predictable, I'm deciding between two solutions:
tasks:
- source: random-forest.py
name: random-forest-
product: 'n_estimators=[[n_estimators]]/criterion=[[criterion]].html'
grid_number_suffix: false
grid:
n_estimators: [5, 10, 20]
criterion: [gini, entropy]
tasks:
- source: random-forest.py
name: random-forest-
# no numbering here
product: 'n_estimators=[[n_estimators]]/criterion=[[criterion]].html'
grid_number_suffix: false
grid:
n_estimators: [5, 10, 20]
criterion: [gini, entropy]
tasks:
- source: random-forest.py
name: random-forest-
# numbering here
product: 'report.html'
grid_number_suffix: false
grid:
n_estimators: [5, 10, 20]
criterion: [gini, entropy]
option 2 is nice since users won't need to read docs, but it'll often lead to issues if the tasks in the grid produce non-unique products (ploomber does not allow this since this will cause tasks to overwrite their results hence it raises an error) thoughts? |
I think the second option with a reasonable error is better. Although I assume you meant to remove
|
Ah, great catch! Yeah, for the second scenario I meant removing the grid_number_suffix, but I like your approach of having that option as well and showing it as part of the error message, so users don't have to dig into the docs. I'll add this to the backlog. thanks a lot for your feedback! |
Ok so what's left to do here? Are we settle on an option yet? |
let's go with this @Wxl19980214 |
So what we want to do is to take in another parameter |
Let me just clarify things a bit. So this functionality only applies to grid usage? And the premise is that user doesn't look at our docs so they don't know placeholders, all they do is something like
So if they do this, we output error saying you can either give placeholders like report-param=[param].html, or add the |
What about this case:
So the products will have different names, but the shouldnt we add number suffix to names: like random-forest-0, etc? |
Yeah, you got it right. Let's start with the main motivation for this: pipeline task's should produce unique output files, so if I do: tasks:
- source: random-forest.py
name: random-forest-
# no numbering here
product: output.html
grid:
n_estimators: [5, 10, 20]
criterion: [gini, entropy] ploomber will complain since all tasks generated from the grid will create an "output.html". so the initial solution was to automatically number outputs: output-1.html, output-2.html... Recently we added the tasks:
- source: random-forest.py
name: random-forest-
product: 'n_estimators=[[n_estimators]]/criterion=[[criterion]].html'
grid_number_suffix: false
grid:
n_estimators: [5, 10, 20]
criterion: [gini, entropy] but this wont work if there are missing parameters: tasks:
- source: random-forest.py
name: random-forest-
# n_estimators is missing!
product: 'criterion=[[criterion]].html'
grid_number_suffix: false
grid:
n_estimators: [5, 10, 20]
criterion: [gini, entropy] So the proposed solution is: if we detect non-unique products, then we tell the user to either turn |
in the docs we have an example that looks like this:
link
the `[[placeholders]]`` are replaced at runtime by the parameter values, however, to ensure each file has a different name we also append a number at the end; but this isn't documented.
right below the snippet that I put above, we should add a little note saying that we number the filenames to prevent conflicting paths.
The text was updated successfully, but these errors were encountered: