Multiple rows and columns introduce false index items #573

Open
1 of 3 tasks
irm-codebase opened this issue Feb 21, 2024 · 7 comments · May be fixed by #596
Labels: bug, has-workaround, v0.7


irm-codebase commented Feb 21, 2024

What happened?

Loading a CSV file with multiple rows and multiple columns sometimes adds fake index items.

See the file:

vintagesteps,2020,2030,2040,2050,2030,2040,2050,2040,2050,2050
investsteps,2020,2020,2020,2020,2030,2030,2030,2040,2040,2050
techs,,,,,,,,,,
geothermal,1,1,1,1,1,1,1,1,1,1
hydropower,1,1,1,1,1,1,1,1,1,1
waste,1,1,0.0,0,1,1,0.0,1,1,1
bioenergy,1,1,0.8,0,1,1,0.8,1,1,1
oil,1,1,1,1,1,1,1,1,1,1
coal,1,1,1,1,1,1,1,1,1,1
ccgt,1,1,0.0,0,1,1,0.0,1,1,1
wind,1,1,0.0,0,1,1,0.0,1,1,1
pv,1,1,1,0.2,1,1,1,1,1,1
battery_li,1,1,0.5,0,1,1,0.5,1,1,1
battery_phs,1,1,1,1,1,1,1,1,1,1

Loaded via:

  vintage_availability_techs:
    source: data_sources/investstep_series/available_vintages_techs.csv
    rows: techs
    columns: [vintagesteps, investsteps]
    add_dimensions:
      parameters: available_vintages

In this case, a fake index item called techs is added.
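
To make the layout of this file concrete, here is a minimal sketch using only Python's standard `csv` module (not Calliope's loader, which goes through pandas). It assumes the first two lines are the column levels named by `columns: [vintagesteps, investsteps]` and that the third line, `techs,,,...`, only names the row index. The variable names are illustrative. If a reader treated that third line as data, `techs` would show up as a spurious row label, which would match the symptom described here.

```python
import csv
import io

# First lines of the file from the report (truncated to three techs).
raw = """\
vintagesteps,2020,2030,2040,2050,2030,2040,2050,2040,2050,2050
investsteps,2020,2020,2020,2020,2030,2030,2030,2040,2040,2050
techs,,,,,,,,,,
geothermal,1,1,1,1,1,1,1,1,1,1
waste,1,1,0.0,0,1,1,0.0,1,1,1
pv,1,1,1,0.2,1,1,1,1,1,1
"""

lines = list(csv.reader(io.StringIO(raw)))
header_rows, name_row, data_rows = lines[:2], lines[2], lines[3:]

# Names of the two column levels come from the first cell of each header row.
column_levels = [row[0] for row in header_rows]
# Each data column is identified by a (vintagestep, investstep) pair.
column_keys = list(zip(*(row[1:] for row in header_rows)))

# The third line carries only the name of the row index: all other cells are empty.
row_index_name = name_row[0]
name_row_is_empty = all(cell == "" for cell in name_row[1:])

# Intended row labels versus what appears if the name line is read as data.
intended_labels = [row[0] for row in data_rows]
labels_if_name_row_is_data = [row[0] for row in [name_row] + data_rows]

print(column_levels)
print(column_keys[:3])
print(intended_labels)
print(labels_if_name_row_is_data)
```

The last two printed lists differ only by the leading `techs` entry, which is the kind of false item this report is about.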

Which operating systems have you used?

  • macOS
  • Windows
  • Linux

Version

v0.7

Relevant log output

No response

@irm-codebase
Author

Another example:

This one fails. Removing the header fixes the issue. Interestingly, the behavior changes depending on whether or not you are using a debugger.

nodes,techs,parameters,values
NORD,ccgt,initial_flow_cap,20000000
NORD,hydropower,initial_flow_cap,11191600
NORD,wind,initial_flow_cap,115600
NORD,pv,initial_flow_cap,8319100
NORD,battery_phs,initial_flow_cap,5064300
NORD,battery_phs,initial_storage_cap,469050200
NORD,waste,initial_flow_cap,384700
NORD,bioenergy,initial_flow_cap,2159900
CNOR,ccgt,initial_flow_cap,2000000
CNOR,hydropower,initial_flow_cap,1100900
CNOR,wind,initial_flow_cap,133600
CNOR,pv,initial_flow_cap,2270800
CNOR,waste,initial_flow_cap,23000

data_sources:
  # Initial setup
  initial_tech_capacity_params:
    source: data_sources/initial_capacity_techs_kw.csv
    rows: [nodes, techs, parameters]
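
A minimal sketch of one way a header line can turn into false index items, using only Python's standard `csv` module rather than Calliope's pandas-based loader. It assumes, as the fix of removing the header suggests, that the file is parsed as if it had no header when only `rows` is given. `build_index` is a hypothetical helper, not part of Calliope.

```python
import csv
import io

# A few lines of the file from this comment.
raw = """\
nodes,techs,parameters,values
NORD,ccgt,initial_flow_cap,20000000
NORD,hydropower,initial_flow_cap,11191600
CNOR,ccgt,initial_flow_cap,2000000
"""


def build_index(text, n_index_cols, has_header):
    """Map the first n_index_cols cells of each line to the remaining cell.

    Hypothetical helper: when has_header is False, every line, including
    the first, is treated as data.
    """
    lines = list(csv.reader(io.StringIO(text)))
    if has_header:
        lines = lines[1:]  # consume the header line instead of storing it
    return {tuple(line[:n_index_cols]): line[n_index_cols] for line in lines}


without_header = build_index(raw, n_index_cols=3, has_header=False)
with_header = build_index(raw, n_index_cols=3, has_header=True)

# Parsed as header-less, the header line becomes a false index entry.
print(("nodes", "techs", "parameters") in without_header)  # True
print(("nodes", "techs", "parameters") in with_header)  # False
```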

@brynpickering brynpickering added the v0.7 (upcoming) version 0.7 label Feb 21, 2024
@brynpickering
Member

brynpickering commented Feb 21, 2024

OK, so this is a limitation of what we can ask of pandas.

A workaround:

data_sources:
  # Initial setup
  initial_tech_capacity_params:
    source: data_sources/initial_capacity_techs_kw.csv
    rows: [nodes, techs, parameters]
    columns: [values]
    drop: values

@brynpickering
Member

I can only reproduce this issue with your second example. The first one loads just fine.

@brynpickering brynpickering added the has-workaround The issue describes a valid workaround until the primary issue is solved label Feb 21, 2024
@irm-codebase
Author

> I can only reproduce this issue with your second example. The first one loads just fine.

Odd, that's the one I saw as most problematic. I'll give an update if I can reproduce it...

For the second: I would like to propose that this type of "dropping" should be the standard, to ensure the files given to the model are "stand alone". Otherwise, you'd need to always consult two files. This way, you have good data practices "baked in".

What do you think?

@brynpickering
Member

brynpickering commented Feb 21, 2024 via email

@irm-codebase
Author

Hmmm, that is true...
The only way to make it possible would be to force one type of table (i.e. rows only), which would make the input very inflexible.

@brynpickering
Member

Plan: require every CSV to have a header, even if it is only one row. Internally, we will set header=0 as the bare minimum.
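
The plan could be sketched roughly as follows. This is an illustration, not the code from the linked pull request: `read_csv_kwargs` is a hypothetical helper, and the only pandas details relied on are that `pandas.read_csv` accepts `header` and `index_col` as an integer or a list of integers.

```python
def read_csv_kwargs(rows=None, columns=None):
    """Build `header` and `index_col` keyword arguments for pandas.read_csv.

    Hypothetical helper: `rows` and `columns` mirror the data source keys
    of the same name and may be a single name, a list of names, or None.
    """

    def as_list(value):
        if value is None:
            return []
        return [value] if isinstance(value, str) else list(value)

    row_dims, column_dims = as_list(rows), as_list(columns)

    # One header line per column dimension, but never fewer than one:
    # the first line of the file is always consumed as a header.
    header = list(range(len(column_dims))) if len(column_dims) > 1 else 0
    # One leading index column per row dimension; None if there are none.
    index_col = list(range(len(row_dims))) if row_dims else None
    return {"header": header, "index_col": index_col}


# Second example from this issue: only `rows` is given, header is still line 0.
print(read_csv_kwargs(rows=["nodes", "techs", "parameters"]))
# First example: one row dimension and two column dimensions.
print(read_csv_kwargs(rows="techs", columns=["vintagesteps", "investsteps"]))
```

The resulting dictionary would then be unpacked into the reader call, for example `pandas.read_csv(path, **read_csv_kwargs(...))`.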

@brynpickering brynpickering added this to the 0.7.0 milestone Apr 11, 2024
@brynpickering brynpickering self-assigned this Apr 11, 2024
@brynpickering brynpickering linked a pull request May 10, 2024 that will close this issue