Support PROV-Dictionary? #129

stain opened this issue Oct 19, 2018 · 0 comments

We've been wanting to use the PROV-Dictionary extension with prov.py, but it gets tricky when we want to serialize to multiple formats.

Our current workaround is to record the regular prov:Collection membership, which prov.py supports, and additionally to type the collection as a prov:Dictionary:

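import prov.model as provM
from prov.constants import PROV

# `document` is an existing provM.ProvDocument with the "ex" namespace registered
entity = document.entity("ex:someFile")
coll = document.entity("ex:someDirectory", [
    (provM.PROV_TYPE, PROV["Collection"]),
    (provM.PROV_TYPE, PROV["Dictionary"]),
])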

Then regular membership is easy:

document.membership(coll, entity)

However, prov.py does not have a dictionaryMembership method. To express the PROV-Dictionary membership we use PROV-O-compatible attributes:

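import uuid

# Membership relation: one prov:KeyEntityPair entity per dictionary entry.
# Assumes a namespace covering "urn:uuid:" is registered on the document
# (the output below uses the prefix "id" for it).
m_entity = document.entity(uuid.uuid4().urn, [
    (provM.PROV_TYPE, PROV["KeyEntityPair"]),
])
m_entity.add_attributes({
    PROV["pairKey"]: entry["basename"],
    PROV["pairEntity"]: entity,
})
# Link the pair to the dictionary (the prov:hadDictionaryMember statement below)
coll.add_attributes({PROV["hadDictionaryMember"]: m_entity})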

This workaround produces PROV-O statements that are correct according to section 5 of PROV-Dictionary:

ex:someDirectory a 
        prov:Collection,
        prov:Dictionary,
        prov:Entity ;
    prov:hadMember ex:someFile ;
    prov:hadDictionaryMember <urn:uuid:25d8fc8b-2b63-45dc-9e33-276e9839a0a8> .

<urn:uuid:25d8fc8b-2b63-45dc-9e33-276e9839a0a8> a 
        prov:Entity,
        prov:KeyEntityPair ;
    prov:pairEntity ex:someFile ;
    prov:pairKey "filename.txt"^^xsd:string .

However, the PROV-N output does not match section 4 of PROV-Dictionary:

 entity(ex:someDirectory, [prov:type='prov:Dictionary', prov:type='prov:Collection', prov:hadDictionaryMember='id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8'])

  hadMember(ex:someDirectory, ex:someFile)

  entity(id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8, [prov:type='prov:KeyEntityPair', prov:pairKey="filename.txt", prov:pairEntity='ex:someFile'])

If this were supported, the membership should appear in PROV-N as:

prov:hadDictionaryMember(ex:someDirectory, ex:someFile, "filename.txt")

Is there a way to add such namespaced statements to PROV-N with prov.py?
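
To make the target concrete, here is a minimal sketch in plain Python of the statement we would like to see emitted. It does not go through prov.py; the helper name `dictionary_membership_provn` is hypothetical, and it assumes both identifiers are already prefixed names and that the key is a plain string.

```python
def dictionary_membership_provn(dictionary: str, entity: str, key: str) -> str:
    """Format one PROV-Dictionary membership as a PROV-N statement.

    Hypothetical helper, not part of prov.py. `dictionary` and `entity` are
    assumed to be prefixed names already (e.g. "ex:someDirectory").
    """
    # Escape backslashes and double quotes so the key stays a valid string literal.
    escaped = key.replace("\\", "\\\\").replace('"', '\\"')
    return f'prov:hadDictionaryMember({dictionary}, {entity}, "{escaped}")'


statement = dictionary_membership_provn(
    "ex:someDirectory", "ex:someFile", "filename.txt"
)
print(statement)
```

The formatting itself is trivial; the open question is where prov.py would let us hook such a statement into its PROV-N serializer and parser.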

Similarly, in PROV-XML we would expect something like the following, according to section 6 of PROV-Dictionary:

<prov:collection prov:id="ex:someDirectory" />
<prov:hadMember>
    <prov:collection prov:ref="ex:someDirectory"/>
    <prov:entity prov:ref="ex:someFile"/>
</prov:hadMember>


<prov:dictionary prov:id="ex:someDirectory" />

<prov:hadDictionaryMember>
    <prov:dictionary prov:ref="ex:someDirectory"/>
    <prov:keyEntityPair>
        <prov:key>filename.txt</prov:key>
        <prov:entity prov:ref="ex:someFile"/>
    </prov:keyEntityPair>
</prov:hadDictionaryMember>

but with our workaround we get:

  <prov:collection prov:id="ex:someDirectory">
    <prov:type xsi:type="xsd:QName">prov:Dictionary</prov:type>
    <prov:hadDictionaryMember xsi:type="xsd:QName">id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8</prov:hadDictionaryMember>
  </prov:collection>

<prov:hadMember>
    <prov:collection prov:ref="ex:someDirectory"/>
    <prov:entity prov:ref="ex:someFile"/>
</prov:hadMember>

  <prov:entity prov:id="id:25d8fc8b-2b63-45dc-9e33-276e9839a0a8">
    <prov:type xsi:type="xsd:QName">prov:KeyEntityPair</prov:type>
    <prov:pairEntity xsi:type="xsd:QName">id:aa96fdb4-ecb6-4488-9a9b-00f0c17a1fbd</prov:pairEntity>
    <prov:pairKey>rsem_reference.seq</prov:pairKey>
  </prov:entity>

Note that this style seems to survive a round-trip from PROV-O via PROV-XML over to PROV-O again.
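
For comparison, the expected `prov:hadDictionaryMember` element from section 6 can be built with the standard library alone. This is only a sketch of the target structure, not how prov.py serializes anything; `prov_tag` and `dictionary_membership_xml` are hypothetical helpers, and the declaration of the `ex` prefix is assumed to live on the enclosing document.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"
# Serialize the PROV namespace with the conventional "prov" prefix.
ET.register_namespace("prov", PROV_NS)


def prov_tag(local_name: str) -> str:
    """Return the ElementTree-style qualified name for a PROV element or attribute."""
    return f"{{{PROV_NS}}}{local_name}"


def dictionary_membership_xml(dictionary: str, entity: str, key: str) -> ET.Element:
    """Build one <prov:hadDictionaryMember> element (hypothetical helper)."""
    member = ET.Element(prov_tag("hadDictionaryMember"))
    ET.SubElement(member, prov_tag("dictionary"), {prov_tag("ref"): dictionary})
    # The key and the entity reference are wrapped in a single keyEntityPair.
    pair = ET.SubElement(member, prov_tag("keyEntityPair"))
    ET.SubElement(pair, prov_tag("key")).text = key
    ET.SubElement(pair, prov_tag("entity"), {prov_tag("ref"): entity})
    return member


element = dictionary_membership_xml(
    "ex:someDirectory", "ex:someFile", "filename.txt"
)
print(ET.tostring(element, encoding="unicode"))
```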

Obviously we can blame the PROV-Dictionary spec for not also using this PROV-O style in PROV-XML and PROV-N (which would then have been backwards compatible with all PROV syntaxes).

This issue, however, asks for some prov.py API support for making PROV-Dictionary statements across all syntaxes.

Ideally this would need some hacks to get consistent serialization and parsing, but as a first attempt I would suggest adding support for our approach, as it would not cause issues in loading/saving. I also think that a Dictionary being a Collection should be implied, for compatibility with consumers that do not understand PROV-Dictionary. I understand that this can be harder to maintain in a mutable in-memory PROV model (e.g. multiple keys could have the same value).
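
A small sketch of that last point, again in plain Python rather than prov.py: if the plain Collection membership is derived from the key/entity pairs, an entity reachable under several keys must still be listed only once, and it must stay a member until its last key is removed. The function `implied_members` and the extra keys and entities in the example are made up for illustration.

```python
from typing import List, Mapping


def implied_members(pairs: Mapping[str, str]) -> List[str]:
    """Return the entities a Dictionary implies as plain Collection members."""
    # Several keys may map to the same entity; hadMember needs it only once.
    return sorted(set(pairs.values()))


pairs = {
    "filename.txt": "ex:someFile",
    "alias.txt": "ex:someFile",  # second key for the same entity
    "other.txt": "ex:otherFile",
}
members = implied_members(pairs)
for member in members:
    print(f"hadMember(ex:someDirectory, {member})")

# Dropping one of the two keys must not drop the implied membership.
del pairs["alias.txt"]
members_after_removal = implied_members(pairs)
```

Recomputing the member set from the pairs avoids keeping two mutable structures in sync, at the cost of not being able to tell implied memberships apart from ones that were stated explicitly.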
