Catch and report duplicate column names in header line #327

Open
cowtowncoder opened this issue Aug 21, 2022 · 2 comments
@cowtowncoder
Member

(note: offshoot of a comment in #285)

It looks like the code does not currently check that the column names in the header line are unique: a name may occur more than once, in which case the last occurrence is used.
This should not be allowed: column names in the header line should not contain duplicates.
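
The kind of check being asked for could be sketched as follows. This is a minimal illustration in plain Java, not the library's implementation: the class `HeaderDuplicateCheck`, the method `requireUniqueColumnNames`, and the choice of `IllegalArgumentException` are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of jackson-dataformat-csv.
final class HeaderDuplicateCheck {
    private HeaderDuplicateCheck() { }

    /** Throws if any column name occurs more than once in the header line. */
    static void requireUniqueColumnNames(List<String> headerColumns) {
        // Maps each name to the index of its first occurrence.
        Map<String, Integer> firstIndexByName = new HashMap<>();
        for (int i = 0; i < headerColumns.size(); i++) {
            String name = headerColumns.get(i);
            // putIfAbsent returns the earlier index if the name was already seen.
            Integer previous = firstIndexByName.putIfAbsent(name, i);
            if (previous != null) {
                throw new IllegalArgumentException(
                        "Duplicate column name \"" + name + "\" in header line: columns #"
                                + previous + " and #" + i);
            }
        }
    }
}
```

Calling `requireUniqueColumnNames` with the columns `ID`, `Name` passes silently, while `ID`, `Name`, `ID` fails with a message naming both positions instead of silently keeping the last column.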

@workingenius

I have a case that may change your mind.
This is a real case from work, somewhat simplified.

We receive CSV files from other companies and parse them:

ID,Name
1,"Bob"
2,"David"

using a schema like this:

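import com.fasterxml.jackson.annotation.JsonAlias;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonPropertyOrder;

@JsonPropertyOrder({"id", "name"})
class A {
  // Bound from a column named "Id", or from either alias
  @JsonProperty("Id")
  @JsonAlias({"Identification", "ID"})
  public int id;

  // Bound from a column named "Name", or from either alias
  @JsonProperty("Name")
  @JsonAlias({"Caption", "Title"})
  public String name;
}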

This worked fine. Note the aliases: they are there because the column names come in several variations.

Then one day their format changed, and a second column named "ID" was added.
The CSV files can now look like this:

ID,Name,ID
1,"Bob","UA2940"
2,"David","IM3592"

Although there are now two columns named "ID", we can tell them apart:
we know exactly what the new column means.
We now want the schema to look like this:

@JsonPropertyOrder({"idNumber", "name", "idString"})
class A2 {
  @JsonProperty("Id")
  @JsonAlias({"Identification", "ID"})
  public int idNumber;

  @JsonProperty("Name")
  @JsonAlias({"Caption", "Title"})
  public String name;

  // The alias "ID" is also declared on idNumber above
  @JsonProperty("IdString")
  @JsonAlias({"IDString", "ID"})
  public String idString;
}

This causes exceptions: "ID" always maps to idNumber, but the new column does not hold int values.

Because two columns share the same name, the order of the headers now matters.
We specified the order with @JsonPropertyOrder, hoping it would take effect, but it did not.

The layout of the headers is out of our control; we can only adapt to it.
For now, our only option is to preprocess the header in advance and rename the conflicting column names.
Is there a better solution for this situation?
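
For reference, the header preprocessing mentioned above might look roughly like this. It is only a sketch in plain Java: the class `HeaderRenamer`, the method `renameDuplicates`, and the `_2`, `_3` suffix convention are assumptions made for the example, not anything the library provides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical preprocessing helper; not part of jackson-dataformat-csv.
final class HeaderRenamer {
    private HeaderRenamer() { }

    /**
     * Keeps the first occurrence of each column name unchanged and appends
     * "_2", "_3", ... to later occurrences of the same name.
     * Note: this sketch does not guard against a generated name (e.g. "ID_2")
     * clashing with a column that already has that exact name.
     */
    static List<String> renameDuplicates(List<String> headerColumns) {
        Map<String, Integer> occurrences = new HashMap<>();
        List<String> renamed = new ArrayList<>(headerColumns.size());
        for (String name : headerColumns) {
            // merge returns the updated count for this name (1 on first sight).
            int count = occurrences.merge(name, 1, Integer::sum);
            renamed.add(count == 1 ? name : name + "_" + count);
        }
        return renamed;
    }
}
```

With the header `ID,Name,ID` this yields `ID`, `Name`, `ID_2`, so the second column could then be matched by an alias such as `"ID_2"` on `idString` instead of the ambiguous `"ID"`.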

As Jackson CSV is a general-purpose library, I don't think the order of the headers should be ignored entirely.
There should be some elegant way to support this.

BTW, we are using jackson-dataformat-csv 2.10.3, which is not the latest version.
I'm going to catch up with the newer changes and check again.

@workingenius

Sorry, we ran into this issue because we incorrectly used mapper.schemaWithHeader(). After changing it to mapper.schemaFor(xxx), our annotated class worked fine for the case I described.
