Multiple rows and columns introduce false index items #573

irm-codebase · 2024-02-21T16:37:14Z

What happened?

Loading a file with multiple rows and multiple columns sometimes adds fake index items.

See the file:

vintagesteps,2020,2030,2040,2050,2030,2040,2050,2040,2050,2050
investsteps,2020,2020,2020,2020,2030,2030,2030,2040,2040,2050
techs,,,,,,,,,,
geothermal,1,1,1,1,1,1,1,1,1,1
hydropower,1,1,1,1,1,1,1,1,1,1
waste,1,1,0.0,0,1,1,0.0,1,1,1
bioenergy,1,1,0.8,0,1,1,0.8,1,1,1
oil,1,1,1,1,1,1,1,1,1,1
coal,1,1,1,1,1,1,1,1,1,1
ccgt,1,1,0.0,0,1,1,0.0,1,1,1
wind,1,1,0.0,0,1,1,0.0,1,1,1
pv,1,1,1,0.2,1,1,1,1,1,1
battery_li,1,1,0.5,0,1,1,0.5,1,1,1
battery_phs,1,1,1,1,1,1,1,1,1,1

Loaded via:

  vintage_availability_techs:
    source: data_sources/investstep_series/available_vintages_techs.csv
    rows: techs
    columns: [vintagesteps, investsteps]
    add_dimensions:
      parameters: available_vintages

In this case, a fake index called techs will be added.
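For reference, a minimal sketch of how plain pandas reads this layout, using a subset of the file above. It assumes the loader ends up calling something equivalent to `pandas.read_csv` with one header row per entry in `columns`; that is an assumption about the internals, not something confirmed in this report. Under that assumption, pandas treats the `techs,,,` row as the name of the row index rather than as data:

```python
import io

import pandas as pd

# Subset of the CSV from the report (first five value columns, three techs).
CSV_TEXT = """vintagesteps,2020,2030,2040,2050,2030
investsteps,2020,2020,2020,2020,2030
techs,,,,,
geothermal,1,1,1,1,1
waste,1,1,0.0,0,1
pv,1,1,1,0.2,1
"""

# Two header rows (one per column level) and one index column (the techs).
df = pd.read_csv(io.StringIO(CSV_TEXT), header=[0, 1], index_col=0)

print(df.index.name)           # name pandas assigns to the row index
print(list(df.columns.names))  # names of the two column levels
print(list(df.index))          # row labels actually loaded
```

If `techs` shows up among the row labels instead of as the index name, the file is probably being read with fewer header rows than it actually has.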

Which operating systems have you used?

macOS
Windows
Linux

Version

v0.7

Relevant log output

No response

irm-codebase · 2024-02-21T16:46:54Z

Another example:

This one fails. Removing the header fixes the issue. Interestingly, the behavior changes depending on whether or not you are using a debugger.

nodes,techs,parameters,values
NORD,ccgt,initial_flow_cap,20000000
NORD,hydropower,initial_flow_cap,11191600
NORD,wind,initial_flow_cap,115600
NORD,pv,initial_flow_cap,8319100
NORD,battery_phs,initial_flow_cap,5064300
NORD,battery_phs,initial_storage_cap,469050200
NORD,waste,initial_flow_cap,384700
NORD,bioenergy,initial_flow_cap,2159900
CNOR,ccgt,initial_flow_cap,2000000
CNOR,hydropower,initial_flow_cap,1100900
CNOR,wind,initial_flow_cap,133600
CNOR,pv,initial_flow_cap,2270800
CNOR,waste,initial_flow_cap,23000

data_sources:
  # Initial setup
  initial_tech_capacity_params:
    source: data_sources/initial_capacity_techs_kw.csv
    rows: [nodes, techs, parameters]

brynpickering · 2024-02-21T17:26:36Z

OK, so this is a limitation of what we can ask of pandas.

A workaround:

data_sources:
  # Initial setup
  initial_tech_capacity_params:
    source: data_sources/initial_capacity_techs_kw.csv
    rows: [nodes, techs, parameters]
    columns: [values]
    drop: values
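A rough pandas analogue of the workaround above, using a few rows of the second example. The first read assumes that a table without `columns` is parsed with no header row (`header=None`); this is an inference from the discussion, not confirmed behaviour. The second read treats the top row as a header and then reduces the single `values` column to a Series, which discards that label much as `drop: values` does:

```python
import io

import pandas as pd

# First rows of the CSV from the second example.
CSV_TEXT = """nodes,techs,parameters,values
NORD,ccgt,initial_flow_cap,20000000
NORD,hydropower,initial_flow_cap,11191600
CNOR,ccgt,initial_flow_cap,2000000
"""

# Assumption: with no `columns` declared, the top row is read as data.
as_data = pd.read_csv(io.StringIO(CSV_TEXT), header=None, index_col=[0, 1, 2])

# Workaround analogue: the top row is a header, and the only column is
# selected by name so that its label no longer appears in the result.
as_header = pd.read_csv(io.StringIO(CSV_TEXT), header=0, index_col=[0, 1, 2])
series = as_header["values"]

# Has the header text leaked into the first index level?
print("nodes" in as_data.index.get_level_values(0))
print("nodes" in series.index.get_level_values(0))
print(list(series.index.names))
```

In the first read the header text becomes an extra entry in every index level, which matches the "false index items" described in the title.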

brynpickering · 2024-02-21T17:29:13Z

I can only reproduce this issue with your second example. The first one loads just fine.

irm-codebase · 2024-02-21T19:35:39Z

> I can only reproduce this issue with your second example. The first one loads just fine.

Odd, that's the one I saw as most problematic. I'll give an update if I can reproduce it...

For the second: I would like to propose that this type of "dropping" should be the standard, to ensure the files given to the model are "stand alone". Otherwise, you'd need to always consult two files. This way, you have good data practices "baked in".

What do you think?

brynpickering · 2024-02-21T20:15:47Z

It's impossible to make it standard as we have to know whether the top row is data or not. As soon as you say it isn't data (columns: [...]) then it becomes an index / column and you wouldn't want that data deleted automatically if you actually _wanted_ it to be part of your model. So the only way to handle it consistently is to force the user to be explicit or to tell them that there must _always_ be a header and that if they miss it then their top row of data may be silently(?) lost...

irm-codebase · 2024-02-21T21:12:43Z

Hmmm, that is true...
The only way to make it possible would be to force one type of table (i.e. rows only), which would make the input very inflexible.

brynpickering · 2024-04-11T10:01:07Z

Plan: enforce a header to always exist in a CSV, even if it is just one row. We will set header=0 as the bare minimum internally.
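A minimal sketch of what that plan could look like, assuming the table is read with `pandas.read_csv`. The helper names `header_arg` and `load_table` are hypothetical and only illustrate the rule "never fall back to `header=None`"; they are not taken from the codebase or from the linked pull request:

```python
import io

import pandas as pd


def header_arg(columns):
    """Hypothetical helper: pick the `header` argument for pandas.read_csv."""
    n_levels = len(columns) if columns else 0
    # With zero or one declared column level, still consume row 0 as a header.
    return list(range(n_levels)) if n_levels > 1 else 0


def load_table(text, rows, columns=None):
    """Hypothetical loader: `rows` and `columns` mirror the YAML keys."""
    return pd.read_csv(
        io.StringIO(text),
        header=header_arg(columns),
        index_col=list(range(len(rows))),
    )


# Rows taken from the second example in this thread.
CSV_TEXT = """nodes,techs,parameters,values
NORD,wind,initial_flow_cap,115600
CNOR,wind,initial_flow_cap,133600
"""

table = load_table(CSV_TEXT, rows=["nodes", "techs", "parameters"])
print(list(table.index.names))  # index levels named from the header row
print(list(table.columns))      # remaining column label
```

The trade-off is the one raised earlier in the thread: a file without a header row would have its first data row consumed as the header.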

irm-codebase added the bug label Feb 21, 2024

brynpickering added the v0.7 label Feb 21, 2024

brynpickering added the has-workaround label Feb 21, 2024

brynpickering added this to the 0.7.0 milestone Apr 11, 2024

brynpickering self-assigned this Apr 11, 2024

brynpickering linked a pull request May 10, 2024 that will close this issue

Force minimum of one header row; Add to docs #596

Open

4 tasks
