Change kgtkreader and kgtkwriter to process Pandas dataframes #684

Open
CraigMiloRogers opened this issue Nov 17, 2022 · 6 comments

Comments

@CraigMiloRogers
Collaborator

CraigMiloRogers commented Nov 17, 2022

To better integrate with the Python Notebook environment, change kgtkreader and kgtkwriter to process Pandas dataframes directly.

@CraigMiloRogers CraigMiloRogers added the enhancement New feature or request label Nov 17, 2022
@CraigMiloRogers CraigMiloRogers self-assigned this Nov 17, 2022
@CraigMiloRogers CraigMiloRogers added this to To do in KGTK Development via automation Nov 17, 2022
@CraigMiloRogers
Collaborator Author

Adding this support at such a low level will help make it pervasive, but it will make kgtk depend on pandas more fundamentally than it does at present.

@CraigMiloRogers
Collaborator Author

Alternatively, we can handle dataframe processing in the API wrappers, much as functions.py does, but that might be a bit verbose.

@CraigMiloRogers
Collaborator Author

Changes committed to dev.

KGTK Development automation moved this from To do to Done Dec 2, 2022
@CraigMiloRogers
Collaborator Author

Need an option to convert KGTK strings to native datatypes (int, float, date/time?) when writing a DataFrame.

KGTK Development automation moved this from Done to In progress Dec 2, 2022
@CraigMiloRogers
Collaborator Author

Need Documentation.

@CraigMiloRogers
Collaborator Author

There are now two dataframe output formats for KgtkWriter.

  • OUTPUT_FORMAT_DATAFRAME_STRING writes a dataframe containing KGTK-formatted string values. It is fast and no information is lost.
  • OUTPUT_FORMAT_DATAFRAME_NATIVE writes a dataframe with Python strings, ints, floats, and bools. It is slower, information is lost, and the conversion is incomplete (e.g., dates could be converted to a native type, but that is not yet implemented).
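
To illustrate the trade-off between the two formats, here is a rough sketch of what a string-to-native conversion might look like. This is not the KgtkWriter implementation: `kgtk_to_native` is a hypothetical helper, the rows are made-up illustrative data, and the rules assumed here (double-quoted KGTK strings, `True`/`False` booleans, dates prefixed with `^`) are a simplification. It uses plain lists of dicts to stay dependency-free; either list could be passed to `pandas.DataFrame(...)`.

```python
import re
from typing import Union

Native = Union[str, int, float, bool]

# Simplified number patterns (assumption: decimal ints and floats only).
_INT_RE = re.compile(r"[-+]?\d+")
_FLOAT_RE = re.compile(r"[-+]?(\d+\.\d*|\.\d+|\d+)([eE][-+]?\d+)?")


def kgtk_to_native(value: str) -> Native:
    """Hypothetical converter from a KGTK-formatted string to a Python value."""
    if value == "True":
        return True
    if value == "False":
        return False
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        # Quoted KGTK string: drop the quotes and unescape embedded quotes.
        return value[1:-1].replace('\\"', '"')
    if _INT_RE.fullmatch(value):
        return int(value)
    if _FLOAT_RE.fullmatch(value):
        return float(value)
    # Symbols, dates (e.g. "^2020-01-01T00:00:00Z/11"), and anything else
    # are left as strings, mirroring the "incomplete" conversion noted above.
    return value


# Made-up rows as they might appear in the "string" style: every cell is a str.
rows_string = [
    {"node1": "n1", "label": "count", "node2": "42"},
    {"node1": "n1", "label": "name", "node2": '"abc"'},
    {"node1": "n1", "label": "ratio", "node2": "783.8"},
    {"node1": "n1", "label": "flag", "node2": "False"},
    {"node1": "n1", "label": "when", "node2": "^2020-01-01T00:00:00Z/11"},
]

# The "native" style: each cell converted to a Python value where possible.
rows_native = [
    {column: kgtk_to_native(cell) for column, cell in row.items()}
    for row in rows_string
]
```

In this sketch the information loss is easy to see: the quoted string `"abc"` and the symbol `abc` both end up as the Python string `abc`, so the native form cannot be mapped back to the original KGTK values unambiguously.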
