Take JSON in CLI #63

Open
alistaire47 opened this issue Sep 16, 2022 · 0 comments

@alistaire47
Collaborator

Given

a. the number of options for dataset formatting will continue to grow and that having a CLI flag for each will get ungainly, and
b. this package will primarily be used by machines,

it seems like a good idea to accept a JSON blob containing all the bits. Indeed, we pretty much have to in order to accept schemas anyway.

There are some options here:

  1. switch out the generate interface entirely (i.e. remove the current one): datalogistik generate '<blob>'
  2. add a separate interface in addition: datalogistik generate --json '<blob>' or datalogistik generate-json '<blob>'
  3. accept blobs for certain parameters that can get complicated: datalogistik generate -d fanniemae -f '<blob>' or datalogistik generate -d fanniemae --format-json '<blob>'
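
To make the comparison concrete, here is a minimal sketch of option 2 (a --json flag alongside the existing flags). It uses the standard library's argparse purely for illustration and is not the package's actual CLI code. The long flag names --dataset and --format, the helper names, and the rule that blob values override individual flags are all assumptions.

```python
import argparse
import json


def parse_blob(text: str) -> dict:
    """Parse a JSON blob from the command line and require a JSON object."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError as err:
        raise argparse.ArgumentTypeError(f"invalid JSON: {err}")
    if not isinstance(value, dict):
        raise argparse.ArgumentTypeError("expected a JSON object")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="datalogistik")
    sub = parser.add_subparsers(dest="command", required=True)
    gen = sub.add_parser("generate")
    # -d and -f appear in the issue; the long names are hypothetical.
    gen.add_argument("-d", "--dataset")
    gen.add_argument("-f", "--format")
    # Option 2: a single flag carrying every setting as one JSON object.
    gen.add_argument("--json", dest="json_blob", type=parse_blob)
    return parser


def resolve_options(argv: list) -> dict:
    """Merge individual flags and the blob into one options dict."""
    args = build_parser().parse_args(argv)
    options = {}
    if args.dataset is not None:
        options["name"] = args.dataset
    if args.format is not None:
        options["format"] = args.format
    # Assumed precedence: values in the blob override individual flags.
    if args.json_blob is not None:
        options.update(args.json_blob)
    return options
```

Under this sketch, resolve_options(["generate", "--json", '{"name": "fanniemae"}']) and resolve_options(["generate", "-d", "fanniemae"]) yield the same dict, which keeps the human-friendly flags while giving machines a single entry point.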

There are tradeoffs in maintenance burden and human usability. Regardless, the JSON schemas should

i. be well documented in a fashion that will stay in sync as they evolve
ii. not require all fields to be specified where they don't matter (e.g. chunk size for CSVs) or where the defaults are fine (e.g. chunk size for Parquet most of the time)

Given the new Dataset class in #62, it probably makes sense to accept part or all of its JSON-serialized form, so we could just unpack it with Dataset(**<blob>) or Dataset(name="fanniemae", **<blob>).
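
A rough sketch of what that unpacking could look like, assuming Dataset is (or behaves like) a dataclass whose fields have defaults. The fields shown here (file_type, chunk_size), the nested Format class, and the from_json helper are hypothetical stand-ins, not the actual class from #62.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Format:
    # Hypothetical fields; every one has a default so a blob can omit it.
    file_type: str = "parquet"
    chunk_size: Optional[int] = None  # may be irrelevant, e.g. for CSVs


@dataclass
class Dataset:
    name: str
    format: Format = field(default_factory=Format)

    @classmethod
    def from_json(cls, blob: str) -> "Dataset":
        """Build a Dataset from a full or partial JSON-serialized form."""
        data = json.loads(blob)
        # Unpack the nested format object separately; missing keys fall
        # back to defaults, unknown keys raise TypeError from __init__.
        fmt = Format(**data.pop("format", {}))
        return cls(format=fmt, **data)
```

With this shape, Dataset.from_json('{"name": "fanniemae"}') fills in every format default, while a format-only blob can still be applied as Dataset(name="fanniemae", format=Format(**json.loads(blob))). Misspelled or unsupported keys fail loudly rather than being silently ignored.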

Before picking up work on this, we should decide which option we prefer.
