Rdflib 7.0.0: Inadequate Support for Importing Multiple Prefixes with the Same IRI and Base IRI #2768

Open
hsekol-hub opened this issue Apr 16, 2024 · 2 comments

Comments

@hsekol-hub

This issue concerns loading .ttl files that define multiple prefixes for the same IRI. The practice doesn't violate any W3C standard and is common in real data. The simplest illustration of the problem is the following code, where only the last-bound prefix survives:

from rdflib import Graph, Namespace
EX = Namespace("http://example.org/")
EX1 = Namespace("http://example.org/")
g = Graph(bind_namespaces="none")
g.bind("ex", EX)
g.bind("ex1", EX1)
print(list(g.namespaces()))
# Prints: [('ex1', rdflib.term.URIRef('http://example.org/'))]

A second issue arises with @base: when parsing a file that declares it, Graph() doesn't automatically pick up the base IRI from the file:

import rdflib

# Not able to fetch @base (filepath is a pathlib.Path pointing at a Turtle file)
graph: rdflib.Graph = rdflib.Graph()
with filepath.open(encoding="utf-8") as file:
    graph.parse(file)

print(graph.base)  # Outputs `None`

Instead, I have to implement something like this (sub-optimal, and it calls into question using rdflib in the first place):

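base_uri = None  # stays None if the file declares no @base
with open(filepath, encoding="utf-8") as file:
    for line in file:
        if line.strip().startswith("@base"):
            # Extract the base IRI from a line such as: @base <http://example.org/> .
            base_uri = rdflib.URIRef(
                line.split(" ")[-2].replace("<", "").replace(">", "")
            )
            break
    file.seek(0)  # rewind so the parser reads the whole file
    graph = rdflib.Graph(base=base_uri)
    graph.parse(file, format="turtle")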
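
For anyone stuck with the same manual extraction, a regular expression is a bit less fragile than splitting on spaces. The following is only a sketch: `find_base` is a hypothetical helper, not part of rdflib, and it assumes the directive sits on a single line and is not hidden inside a comment or a string literal. It is a text scan, not a Turtle parser.

```python
import re
from typing import Optional

# Matches Turtle's "@base <iri> ." and the SPARQL-style "BASE <iri>" form
# (the latter is case-insensitive, the former is not).
_BASE_RE = re.compile(
    r"^\s*(?:@base\s+<([^>]*)>\s*\.|(?i:BASE)\s+<([^>]*)>)",
    re.MULTILINE,
)


def find_base(turtle_text: str) -> Optional[str]:
    """Return the first declared base IRI, or None if none is declared."""
    match = _BASE_RE.search(turtle_text)
    if match is None:
        return None
    # Exactly one of the two groups is set, depending on which form matched.
    return match.group(1) if match.group(1) is not None else match.group(2)


sample = (
    "@prefix ex: <http://example.org/> .\n"
    "@base <http://example.org/base/> .\n"
    "<thing> ex:p ex:o .\n"
)
print(find_base(sample))
```

The returned string can then be passed as `rdflib.Graph(base=...)`, as in the snippet above.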

The lack of these fundamental features leads to inconsistencies in ontology files read via rdflib. Can someone please suggest an alternative library or provide a fix at the earliest convenience?

@WhiteGobo
Contributor

As a workaround you can write your second prefix directly into the default memory store.

from rdflib import Graph, Namespace, URIRef
EX = Namespace("http://example.org/")
g = Graph(bind_namespaces="none")
g.bind("ex", EX)
g.store._Memory__namespace["ex1"] = URIRef("http://example.org/")
print(list(g.namespaces()))
#[('ex', rdflib.term.URIRef('http://example.org/')), ('ex1', rdflib.term.URIRef('http://example.org/'))]

I haven't tested this any further, so I don't know whether it causes any problems or whether other stores behave the same way.
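
If touching a private store attribute feels too risky, another option might be to record the prefix declarations yourself before parsing and keep them next to the graph. This is a rough sketch using only the standard library; `collect_prefix_aliases` is a made-up helper name, and the regex only looks at `@prefix` lines, so it would be fooled by declarations inside comments or string literals.

```python
import re
from collections import defaultdict

# One "@prefix name: <iri> ." declaration per match; the prefix name may be empty.
_PREFIX_RE = re.compile(r"^\s*@prefix\s+([^\s:]*):\s*<([^>]*)>\s*\.", re.MULTILINE)


def collect_prefix_aliases(turtle_text):
    """Map each namespace IRI to every prefix declared for it, in file order."""
    aliases = defaultdict(list)
    for prefix, iri in _PREFIX_RE.findall(turtle_text):
        if prefix not in aliases[iri]:
            aliases[iri].append(prefix)
    return dict(aliases)


sample = """@prefix ex: <http://example.org/> .
@prefix ex1: <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
"""
print(collect_prefix_aliases(sample))
# {'http://example.org/': ['ex', 'ex1'], 'http://xmlns.com/foaf/0.1/': ['foaf']}
```

The graph itself still carries one prefix per namespace; the dictionary just preserves the aliases so they can be written back out by hand if needed.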

@nicholascar
Member

The base issue could be a real one and I will look into it when I'm back from holidays in a few weeks.

But the multi-prefix one is not! Sure, it's not a violation to want multiple prefixes for the same namespace, and you do see it in data, but prefixes are just presentation conveniences, and I think that catering too much for every possible use of them overemphasises their role. They are not real data things, and a single prefix for a namespace will always work, even when multiple are originally supplied in the data or defined as per your code above.

So I'm not motivated to solve this one. And there is indeed a workaround.

If it's really important to you, @hsekol-hub, please feel free to create a Pull Request to address it yourself.
