Record overlap: rewe-shop, rewe-group-com #2150

Open
mal-tee opened this issue Mar 11, 2023 · 7 comments

Comments

@mal-tee
Member

mal-tee commented Mar 11, 2023

Both records have "Rewe Markt GmbH" in their runs array. That seems like a mistake we should resolve?

@WebworkrNet
Contributor

Thank you for opening this issue (based on my email).

@mal-tee
Member Author

mal-tee commented Mar 24, 2023

Should we turn this into a test? @baltpeter

@baltpeter
Member

I haven't looked into that particular case yet. Are we sure that that is a mistake?

But, either way, we can't generally forbid two records having identical runs entries. There are already valid records where that is the case, e.g. the Amazon records for different companies:

https://github.com/datenanfragen/data/blob/master/companies/amazon-de.json
https://github.com/datenanfragen/data/blob/master/companies/amazon-es.json

@mal-tee
Member Author

mal-tee commented Mar 24, 2023

> I haven't looked into that particular case yet. Are we sure that that is a mistake?

Haven't looked either. 😅

> But, either way, we can't generally forbid two records having identical runs entries. There are already valid records where that is the case, e.g. the Amazon records for different companies:
>
> https://github.com/datenanfragen/data/blob/master/companies/amazon-de.json
> https://github.com/datenanfragen/data/blob/master/companies/amazon-es.json

Yeah, we should only do that test if there is no overlap in the countries. 🤔

@baltpeter
Member

> Yeah, we should only do that test if there is no overlap in the countries. 🤔

If there is overlap in the countries, you mean, right?

But even then, I'm not sure whether there can never be a case where that is valid…
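The country condition discussed here could be sketched roughly as below. This is only an illustration, not a check that exists in the repository: it assumes that `relevant-countries` is a list of country codes, that the value `"all"` matches every country, and that a record without the field overlaps with nothing. The function name `countries_overlap` and the sample records are hypothetical.

```python
def countries_overlap(a, b):
    """Return True if two company records share at least one relevant country."""
    # Assumption: a missing field means "no countries", so it never overlaps.
    countries_a = set(a.get("relevant-countries", []))
    countries_b = set(b.get("relevant-countries", []))
    if not countries_a or not countries_b:
        return False
    # Assumption: "all" is a wildcard that overlaps with any non-empty list.
    if "all" in countries_a or "all" in countries_b:
        return True
    return bool(countries_a & countries_b)


# Hypothetical, heavily trimmed records for illustration only.
shop_de = {"slug": "example-shop-de", "relevant-countries": ["de"]}
shop_es = {"slug": "example-shop-es", "relevant-countries": ["es"]}
group = {"slug": "example-group", "relevant-countries": ["all"]}

print(countries_overlap(shop_de, shop_es))  # False
print(countries_overlap(shop_de, group))    # True
```

Under these assumptions, two records with identical runs entries but disjoint countries (like the Amazon records mentioned earlier) would not be flagged, while a country-specific record and an "all" record would be.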

@mal-tee
Member Author

mal-tee commented Mar 24, 2023

> If there is overlap in the countries, you mean, right?

Yes, oops.

I wrote a little script to implement this:

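from collections import defaultdict
import json
import os

COMPANY_DIR = "companies"

# Map every company name and every runs entry to the slugs of the records using it.
companies = {}
slugs_by_name = defaultdict(list)

for file in os.listdir(COMPANY_DIR):
    if not file.endswith(".json"):
        continue
    with open(os.path.join(COMPANY_DIR, file), "r", encoding="utf-8") as f:
        company = json.load(f)
    slug = company["slug"]
    companies[slug] = company  # cache the record so it is not read from disk twice
    slugs_by_name[company["name"]].append(slug)
    for run in company.get("runs", []):
        slugs_by_name[run].append(slug)

# Names that occur more than once, regardless of country.
simple_overlap = {k: v for k, v in slugs_by_name.items() if len(v) > 1}
print("simple", len(simple_overlap))

for name, slugs in simple_overlap.items():
    used_rvs = defaultdict(list)  # country code -> slugs of records listing that country
    alls = set()  # contains `name` if one of its records is relevant for all countries
    for slug in slugs:
        countries = companies[slug].get("relevant-countries", [])
        if countries == ["all"]:
            alls.add(name)
        else:
            for rv in countries:
                used_rvs[rv].append(slug)
    # Keep countries shared by more than two records, or every listed country
    # if one of the records is relevant for all countries.
    filtered_overlap = {k: v for k, v in used_rvs.items() if len(v) > 2 or name in alls}
    if filtered_overlap:
        print(name, filtered_overlap, alls)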

Output:

simple 38
REWE Markt GmbH {'de': ['rewe-shop']} {'REWE Markt GmbH'}
Ideawise Limited {'de': ['gay-de', 'fetisch-de', 'poppen-de', 'kaufmich-com']} set()
Seven.One Entertainment Group GmbH {'de': ['sat1gold', 'prosieben', 'kabeleinsdoku', 'kabeleins']} set()
cpx online active AG {'de': ['optivel'], 'ch': ['optivel'], 'fr': ['optivel'], 'at': ['optivel']} {'cpx online active AG'}
Ingenico Payment Services GmbH {'de': ['ingenico-de']} {'Ingenico Payment Services GmbH'}
Ingenico Healthcare GmbH {'de': ['ingenico-de']} {'Ingenico Healthcare GmbH'}

  1. The initial case for this issue. Seems legit, since the websites are different.
  2. The websites are different.
  3. Same.
  4. ...

Yeah, we'd also have to check if the websites are different. And probably every other key as well.
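A website comparison of the kind mentioned here might look like the sketch below. It assumes that each record stores its website URL under a `web` key and that comparing normalized host names is good enough; both are assumptions for illustration, and the helper names `normalized_host` and `same_website` as well as the sample records are hypothetical.

```python
from urllib.parse import urlsplit


def normalized_host(url):
    """Return the lower-cased host of a URL without a leading 'www.', or None."""
    if not url:
        return None
    host = urlsplit(url).hostname  # None if the URL has no scheme/netloc part
    if host is None:
        return None
    return host[4:] if host.startswith("www.") else host


def same_website(a, b):
    """Return True if both records have a website and the hosts match."""
    host_a = normalized_host(a.get("web"))  # assumption: URL is stored under "web"
    host_b = normalized_host(b.get("web"))
    return host_a is not None and host_a == host_b


# Hypothetical records for illustration only.
shop = {"slug": "example-shop", "web": "https://shop.example.com/"}
group = {"slug": "example-group", "web": "https://www.example.com"}
imprint = {"slug": "example-imprint", "web": "https://EXAMPLE.com/imprint"}

print(same_website(shop, group))     # False
print(same_website(group, imprint))  # True
```

Comparing only the host is a deliberate simplification. Two records on the same host but with different paths would count as the same website here, which may or may not be what a real test should do.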


However, we can close this issue: The rewe group collision is okay, since the webpages are different.

@mal-tee mal-tee closed this as completed Mar 24, 2023
@WebworkrNet
Contributor

I see my original concern as unresolved. The database currently lists two responsible parties for REWE Markt GmbH:

  • REWE Markt GmbH
  • REWE Zentralfinanz eG

As I understand it, this cannot be correct, because the assignment is then ambiguous.
Which sources indicate that REWE Zentralfinanz eG is also responsible for REWE Markt GmbH? I have not been able to verify this so far.

@mal-tee mal-tee reopened this Mar 29, 2023