PROV-N deserialization? #122

MarcelPa · 2018-09-05T16:11:40Z

Hello, is a PROV-N deserializer anywhere on the implementation roadmap? If not, I would like to contribute one, provided that is of interest, of course.

trungdong · 2018-09-05T20:49:49Z

Thanks, @MarcelPa. That'd be fab!

I've been thinking about doing this, but haven't found the time for it. @TomasKulhanek recently wrote an ANTLR grammar for PROV-N, which I believe can be used to build a PROV-N parser.

Are you interested in working on that? I'd be very happy to help with testing/integration when needed.

MarcelPa · 2018-09-06T09:39:14Z

Great, I would like to work on that then. The ANTLR grammar will surely be really helpful for that, thanks @trungdong (and @TomasKulhanek of course).
I just forked the repo and will start looking into ANTLR; I hope to start pushing to the development branch of my fork next week.
Will keep you posted :-)

trungdong · 2018-09-06T22:30:32Z

Excellent! Thanks a lot, @MarcelPa.

MarcelPa · 2018-09-12T17:30:03Z

Quick question regarding testing: is there a pattern for creating test files like the ones under tests/rdf? My approach would be to copy the RDF documents and translate them into PROV-N documents step by step.

FYI: ANTLR works fine so far. I removed the raising of NotImplementedError and all test cases now run successfully in my Python virtual environment.

trungdong · 2018-09-13T10:24:54Z

There is an extensive suite of round-trip conversion tests that you can use right away. See test_json.py for an example. The following test code will do:

class RoundTripPROVNTests(RoundTripTestCase, AllTestsBase):
    FORMAT = 'provn'

BTW, could you develop from the dev branch, please? I've been reorganising the directory structure there and will update it in the next release. Cheers!

MarcelPa · 2018-11-17T12:52:25Z

It has been quite some time since I last pushed to my forked repo, so here is an update via this issue:
Right now, the ANTLR grammar seems to be erroneous in some cases, such as langtags. Unfortunately, I am not an expert in grammars, but I hope to get my head around them soon. I think these errors can be solved by reordering the lexer rules of the grammar; I will test soon whether this helps.
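Why reordering lexer rules can matter: an ANTLR lexer picks the longest match at each position and, when two rules match the same text, the rule defined first wins. The toy lexer below is a minimal sketch of that behaviour in plain Python. It is not ANTLR, and the rule names and patterns (STRING, LANGTAG, AT_WORD) are hypothetical, not taken from the actual PROV-N grammar.

```python
import re


def tokenize(text, rules):
    """Toy lexer: longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            match = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two matches are equally long.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos:]!r}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# Hypothetical rules for illustration only.
STRING = ("STRING", r'"[^"]*"')
LANGTAG = ("LANGTAG", r"@[a-zA-Z]+(-[a-zA-Z0-9]+)*")
AT_WORD = ("AT_WORD", r"@\w+")  # an overly broad rule that also matches "@fr"

source = '"un lieu"@fr'
print(tokenize(source, [STRING, AT_WORD, LANGTAG]))  # "@fr" is claimed by AT_WORD
print(tokenize(source, [STRING, LANGTAG, AT_WORD]))  # "@fr" becomes a LANGTAG
```

If a broad rule sits above a more specific one, the specific token type never appears and the parser rules that expect it fail, which is consistent with the langtag symptoms described here.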

trungdong · 2018-11-20T21:14:06Z

Thanks for the update, @MarcelPa. Unfortunately, I won't be of much help on ANTLR.

The PROV-N spec does define grammar rules, though, which you might find useful.

MarcelPa · 2019-03-27T09:31:43Z

Hey, after quite a while I finally had some spare time to spend on this. I modified the grammar a little bit (basically just reordered some rules), and now it seems to work properly :-)
I am down to 13 test cases that fail or error. My next step will be to rebase onto the newest commit of the dev branch and keep developing.
For now, the failing test cases seem to be caused by incorrectly parsed float values in typed literals. What would you expect to be inserted into the attributes of an expression? Something like

Literal(somevalue, datatype="xsd:float", langtag=someLangtag)

or a parsed native value, like

float(somevalue)

I think I have used those a little inconsistently so far. I will need to refactor this one way or another ;-)

trungdong · 2019-03-27T11:10:40Z

Thank you for the update and the work, @MarcelPa!

float(value) should work, I think. Do you have an example of a problematic case?

BTW, a Python float value is mapped to xsd:double by the package though.

MarcelPa · 2019-03-27T12:07:06Z

I do: running test_entity_with_multiple_attribute fails. The two outputs are almost identical.
Parsed data from a debug print:

document
  prefix ex <http://example.org/>
  prefix ex_1 <http://example4.org/>
  
  entity(ex:emov, [ex:v_0="un lieu", ex:v_1="un lieu"@fr, ex:v_2="a place"@en, ex:v_3=1, ex:v_4=1, ex:v_5="1" %% xsd:short, ex:v_6="2" %% xsd:float, ex:v_7="1" %% xsd:float, ex:v_8="10" %% xsd:decimal, ex:v_9="1" %% xsd:boolean, ex:v_10="0" %% xsd:boolean, ex:v_11="10" %% xsd:byte, ex:v_12="10" %% xsd:unsignedInt, ex:v_13="10" %% xsd:unsignedLong, ex:v_14="10" %% xsd:integer, ex:v_15="10" %% xsd:unsignedShort, ex:v_16="10" %% xsd:nonNegativeInteger, ex:v_17="-10" %% xsd:nonPositiveInteger, ex:v_18="10" %% xsd:positiveInteger, ex:v_19="10" %% xsd:unsignedByte, ex:v_20="http://example.org" %% xsd:anyURI, ex:v_21="http://example.org" %% xsd:anyURI, ex:v_22='ex:abc', ex:v_23='ex:abcd', ex:v_24='ex_1:zabc', ex:v_25='ex_1:zabcd', ex:v_26="2019-03-27T12:52:02.266484" %% xsd:dateTime, ex:v_27="2019-03-27T12:52:02.266486" %% xsd:dateTime])
endDocument

versus the testcase data:

document
  prefix ex <http://example.org/>
  prefix ex_1 <http://example4.org/>
  
  entity(ex:emov, [ex:v_0="un lieu", ex:v_1="un lieu"@fr, ex:v_2="a place"@en, ex:v_3=1, ex:v_4=1, ex:v_5="1" %% xsd:short, ex:v_6="2" %% xsd:float, ex:v_7="1.0" %% xsd:float, ex:v_8="10" %% xsd:decimal, ex:v_9="1" %% xsd:boolean, ex:v_10="0" %% xsd:boolean, ex:v_11="10" %% xsd:byte, ex:v_12="10" %% xsd:unsignedInt, ex:v_13="10" %% xsd:unsignedLong, ex:v_14="10" %% xsd:integer, ex:v_15="10" %% xsd:unsignedShort, ex:v_16="10" %% xsd:nonNegativeInteger, ex:v_17="-10" %% xsd:nonPositiveInteger, ex:v_18="10" %% xsd:positiveInteger, ex:v_19="10" %% xsd:unsignedByte, ex:v_20="http://example.org" %% xsd:anyURI, ex:v_21="http://example.org" %% xsd:anyURI, ex:v_22='ex:abc', ex:v_23='ex:abcd', ex:v_24='ex_1:zabc', ex:v_25='ex_1:zabcd', ex:v_26="2019-03-27T12:52:02.266484" %% xsd:dateTime, ex:v_27="2019-03-27T12:52:02.266486" %% xsd:dateTime])
endDocument

The difference is noticeable at ex:v_7="1.0" %% xsd:float, which is parsed as a float but returned as ex:v_7="1" %% xsd:float.

So far, I have not noticed any conversion from float to double.
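The mismatch above is between lexical forms, not values: "1" and "1.0" denote the same xsd:float. The sketch below uses plain (lexical form, datatype) tuples rather than the package's own Literal class, and the helper same_typed_literal is hypothetical. The thread does not identify where the ".0" is dropped; the "g" format is shown only as one way a float can lose it when written back out.

```python
def same_typed_literal(a, b):
    """Compare two (lexical_form, datatype) pairs.

    xsd:float and xsd:double are compared by numeric value;
    every other datatype is compared by its exact text.
    """
    (lex_a, type_a), (lex_b, type_b) = a, b
    if type_a != type_b:
        return False
    if type_a in ("xsd:float", "xsd:double"):
        return float(lex_a) == float(lex_b)
    return lex_a == lex_b


parsed = ("1", "xsd:float")      # what the parser round trip produced
expected = ("1.0", "xsd:float")  # what the test case contains

print(parsed == expected)                    # False: the text differs
print(same_typed_literal(parsed, expected))  # True: the value is the same

# One possible way the ".0" can disappear when a float is serialized again:
print(format(float("1.0"), "g"))  # 1
print(repr(float("1.0")))         # 1.0
```

Whether the round-trip tests should compare by value or by exact text is a design decision for the package; this only shows that the two comparisons give different answers for this attribute.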

pohutukawa · 2020-06-21T21:29:56Z

@MarcelPa Just a quick question: what is the status of the PROV-N deserialiser? It's been a good year, and it looked like things weren't far off.

MarcelPa · 2020-06-22T06:35:28Z

Oh my, I completely lost track of this issue, thanks for the reminder @pohutukawa! I will rebase later today and give a status update. If I recall correctly, I was "stuck" editing the ANTLR PROV-N grammar. Will keep you posted :-)

MarcelPa · 2020-06-22T20:16:12Z

I am back to figuring out how antlr4 works (any help is appreciated!). For reasons I do not yet understand, langtags and some int_literals fail to parse, which gives me 57 failures out of 185 unit tests. Once I figure out how to fix that, PROV-N deserialization should be close to completion.

pohutukawa · 2020-06-30T22:50:56Z

That looks promising!
Even if there are some "glitches" as in the comment above (where a float is parsed to ex:v_7="1" %% xsd:float), I'd be fully happy, as the value and its type are still preserved, and only the formatting (as 1.0) is lost.

ChrisJMacdonald · 2020-12-01T19:48:24Z

Hi @MarcelPa, I'm wondering whether you've made any progress on this deserializer. I'd like to work more with PROV-N but am quite limited without the ability to store to and extract from PROV-N strings. I've been looking at the code and the tools to see if I could help, but it's a little beyond me at this stage.
Thanks!

ChrisJMacdonald · 2020-12-03T01:42:03Z

I also found a mildly hacky way to convert to and from PROV-N using the Java ProvToolbox and provconvert:
save the file as .provn, use provconvert to write it out as .json, and then use the Python deserialiser to get it back as a ProvDocument.
Luc Moreau has also put up some information about the ANTLR3 grammar for PROV.
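The provconvert round trip described above can be sketched as follows. This is a minimal sketch, assuming the provconvert executable from ProvToolbox is on the PATH and the prov package is installed; the helper names build_provconvert_command and load_provn_via_provconvert are hypothetical.

```python
import subprocess
from pathlib import Path


def build_provconvert_command(provn_path, json_path):
    """Build the provconvert call that converts a .provn file to .json."""
    return ["provconvert", "-infile", str(provn_path), "-outfile", str(json_path)]


def load_provn_via_provconvert(provn_path):
    """Convert PROV-N to PROV-JSON with provconvert, then load it with prov."""
    # Imported here so the command builder can be used without prov installed.
    from prov.model import ProvDocument

    provn_path = Path(provn_path)
    json_path = provn_path.with_suffix(".json")
    # check=True raises CalledProcessError if provconvert reports a failure.
    subprocess.run(build_provconvert_command(provn_path, json_path), check=True)
    return ProvDocument.deserialize(str(json_path), format="json")


print(build_provconvert_command("example.provn", "example.json"))
```

Going the other way needs no external tool, since the package can already serialize a document to PROV-N.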

pohutukawa · 2020-12-10T03:32:30Z

We've been trying out @MarcelPa's feature branch, which can parse PROV-N, with decent results so far. It's not based on the current 2.0 version yet, though, which is a bit of a pity.

If the ANTLR3 grammar by Luc is more complete, would that be an option for moving forward? (Even though it may be more "sexy" to use a current ANTLR4 grammar.) After all, there is an antlr3_python_runtime Python module as well.

I'm just looking for ways to avoid inconsistencies between the individual approaches, in case the ANTLR4 grammar differs from the ANTLR3 one ...

trungdong added the prov-n label May 4, 2021
