The text in the JSON file actually comes from a text file and not the XML.
For context, please refer to this pull request for the tool that generates the JSON and website from the XML and test data: spdx/LicenseListPublisher#83
If the JSON data is incorrect, then the test data is incorrect.
BTW, there is a flag in the LicenseListPublisher tool to generate the JSON file from the XML instead of the test data. If we change the switch, it will reopen many issues raised in the above-mentioned pull request.
- the AGPL-1.0 XML data is correct (or "more correct")
- the AGPL-1.0 JSON data is incorrect
- thus the AGPL-1.0 test data is incorrect
Obviously, no one is really using the AGPL 1.0 for new work right now, indeed as far as I am aware it was never very popular, and then the AGPL 3.0 happened only a few years later. But that was why I chose it as an initial test case: it's fairly easy to reference its canonical version, and I had, at the time, figured its lack of popularity meant there wouldn't be as much dispute over its exact contents, which is an issue that plagues e.g. MIT, the various BSD-N-clauses, etc.
THE VERY SHORT VERSION: Translating XML to JSON seems to result in significant differences between the JSON and rendered website text.
I printed the JSON text data from https://github.com/spdx/license-list-data/blob/main/json/details/AGPL-1.0.json using a Rust program after transforming the `\u2007` escape sequence into the Rust-recognized `\u{2007}` sequence. Later experiments with JS REPLs seem to yield exactly matching text output. I acquired this: LICENSE.txt. Yet this is different from what the website renders, because the website's rendered version looks like:
However, the JSON-tripped version is:
Note that both get the first line right and then start on the same second line, but then disagree on the next three. The JSON data for `"licenseText"` up to that point is the following:
The XML data looks like:
That is, it includes a pair of `<br/>`s here, one in each `<p></p>` pair, which I believe accounts for the rendered spacing on the website. As a result, copying the version from the website produces a LICENSE-RIGHTCLICK.txt, and running that through tools like askalono returns an inexact match, despite it being, as far as I know, an exact copy!
Note that the AGPL 1.0 has the clause:
"Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed."
I have excerpted this quote in a standard citational form but I have not added emphasis because, as the license says... changing it is not allowed. This suggests one of the two forms, the XML-encoded text, or the JSON string, is meaningfully incorrect, as they render to substantively different displayed text by typical renderers for their encoding.
I have no idea if this actually matters, of course. I am not a lawyer, this is not legal advice, etc. etc. etc. However, it seems that the generation of the JSON data from the XML masters may be dropping important formatting details, and it would not seem strange to me if a legal case, however frivolous-seeming, hinged on this difference, given how many cases have been decided on the presence or absence of commas.
This seems to have fellow issues in, but does not seem to be an exact duplicate of,
The reason why it does not seem to be an exact copy of #1924 is that it seems like all the data necessary to achieve a replication of the website's formatting is there in the XML, but not in the JSON, and that the checked-in test data seems to be derived from a JSONified-first form?
This could also be, say, an HTML vs. XML difference.
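For anyone who wants to reproduce the comparison, here is a minimal sketch in Rust of the two steps described above: rewriting a JSON-style `\u2007` escape into the Rust-recognized `\u{2007}` form, and finding the first line at which two renderings of a text disagree. The function names `json_escapes_to_rust` and `first_differing_line` are hypothetical, and the sample strings are made-up placeholders rather than the actual license text. This is not the program I originally used, only an illustration of the idea.

```rust
/// Rewrites JSON-style `\uXXXX` escapes as Rust-style `\u{XXXX}` escapes.
/// Assumption: only the four-hex-digit form is handled, and surrogate
/// pairs (two consecutive `\uXXXX` escapes for one character) are not
/// merged, so they would not yield valid Rust escapes.
fn json_escapes_to_rust(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::with_capacity(input.len() + 2);
    let mut i = 0;
    while i < chars.len() {
        if chars[i] == '\\' && i + 1 < chars.len() {
            let is_unicode_escape = chars[i + 1] == 'u'
                && i + 6 <= chars.len()
                && chars[i + 2..i + 6].iter().all(|c| c.is_ascii_hexdigit());
            if is_unicode_escape {
                out.push_str("\\u{");
                out.extend(chars[i + 2..i + 6].iter());
                out.push('}');
                i += 6;
            } else {
                // Any other escape, including an escaped backslash, is
                // copied through as a pair so `\\u2007` is left alone.
                out.push(chars[i]);
                out.push(chars[i + 1]);
                i += 2;
            }
            continue;
        }
        out.push(chars[i]);
        i += 1;
    }
    out
}

/// Returns the 1-based number of the first line at which `a` and `b`
/// differ, or `None` if they have identical lines.
fn first_differing_line(a: &str, b: &str) -> Option<usize> {
    let mut lines_a = a.lines();
    let mut lines_b = b.lines();
    let mut line_number = 1;
    loop {
        match (lines_a.next(), lines_b.next()) {
            (None, None) => return None,
            (x, y) if x == y => line_number += 1,
            _ => return Some(line_number),
        }
    }
}

fn main() {
    // Raw string: the backslash is literal, as it would be in the JSON file.
    println!("{}", json_escapes_to_rust(r"Version 1,\u200729 March 2002"));

    // Placeholder texts: the first keeps a blank line that the second drops.
    let from_website = "Heading\n\nfirst line\n\nsecond line\n";
    let from_json = "Heading\n\nfirst line\nsecond line\n";
    match first_differing_line(from_website, from_json) {
        Some(n) => println!("texts first differ at line {}", n),
        None => println!("texts match line for line"),
    }
}
```

Comparing line by line rather than byte by byte makes it easier to see whether the mismatch is a dropped line break, which is what the `<br/>` difference would produce, rather than a changed word.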