Question: Non-ASCII quotations mishmash #2417

Closed
jb2170 opened this issue Mar 3, 2024 · 4 comments
Comments

@jb2170

jb2170 commented Mar 3, 2024

Hi! My issue / question revolves mainly around the CC-BY-NC-SA-4.0 license, though I guess it applies generally. It's about the (un?)importance of non-ASCII characters in international licenses.

Context

Up until a few weeks ago, Arch Linux's licenses package used to include two main folders for /usr/share/licenses/. One is the spdx folder which persists today, and sources its files from the spdx/license-list-data repo. The other, named common, now removed since a commit to the Arch package on GitLab by @dvzrv, used to consist of some common licenses like the GPLs. What was nice about some files under common is that they consisted of only ASCII codepoints. Indeed the version of GPL-3.0 that I use in all of my programming repositories is that version, and is the same (md5 checksum-wise) version still offered on GNU's website. For example on line four it uses (C) instead of © which spdx uses.

In particular, my issue focuses on the spdx CC that I seek to use, which seems to have a chaotic mix of “ (U+201C) (line 13), ” (U+201D) (line 13), and " (U+0022) (line 45), as well as ’ (U+2019) (line 32) and ' (U+0027) (line 63), and some non-ASCII dashes mixed with the ASCII -. The Creative Commons version likewise seems to use both types of double quotes, but in different places, using one set of non-ASCII double quotes towards the end.
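A quick way to audit a license file for this kind of mixing is to list every non-ASCII character along with its line number and codepoint. The following is only a minimal sketch: the function name `find_non_ascii` and the sample string are made up for illustration and are not taken from any SPDX or Arch tooling.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, char, codepoint, name) for each non-ASCII character.

    Line and column numbers are 1-based. `find_non_ascii` is a hypothetical
    helper written for this example only.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                codepoint = f"U+{ord(ch):04X}"
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, ch, codepoint, name))
    return hits


# Illustrative sample only, not the actual license text.
sample = "\u201cLicensed Material\u201d means the \"work\".\nLicensor\u2019s rights"
for line_no, col, ch, codepoint, name in find_non_ascii(sample):
    print(f"line {line_no}, col {col}: {codepoint} {name}")
```

To check a real file, read it with `open(path, encoding="utf-8")` and pass the contents to the function.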

Question / remarks

My question is therefore "If the quotes differ between license vendors, does it even matter? Can I carefully swap all the non-ASCII codepoints for their ASCII version, and does it have a legal effect?" I am not a lawyer, and I am not seeking strict legal advice, I'm just wondering if I'm being an extreme pedant, or if the mix of quoting styles is there for a reason. It seems unchanged from the second ever commit to this repository (if it's not broken don't fix it). I am naively suspicious that slight punctuation differences could have consequences, therefore I am sticking with the spdx version for now.

I hope this doesn't come across as too rudely English-centric either, yearning for simple ASCII-only files. I can appreciate internationali(s|z)ed versions having say guillemets, and hey, I'm British so I have a non-ASCII £ on my keyboard! It's just about simplicity, Unix / KISS philosophy, important files being uncomplicated, with all glyphs reproducible with a US keyboard, and displayable on a limited tty with a 7/8-bit character set.

See also #1893

(I am very aware that the maintenance of Arch Linux packages is orthogonal to spdx's work, it's just for context 😄 )

@richardfontana
Contributor

Just an aside, it is interesting to learn about Arch's approach to inclusion of license files since at some point the Fedora Project will need to reassess how it handles that. 🤓

@xsuchy
Collaborator

xsuchy commented Mar 25, 2024

IMO this should be implemented in software that matches the data while honoring https://spdx.github.io/spdx-spec/v2.3/license-matching-guidelines-and-templates/#b63-guideline-hyphens-dashes, i.e., treating all kinds of hyphens, dashes, and quotes as equivalent.

But the "(C) vs. ©" case is a bit different. I would rather not solve it with a general rule, but rather with 'alt' markup.
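A minimal sketch of what such matching software could do, assuming a hand-picked set of quote and dash codepoints (the guidelines describe the equivalence but do not enumerate codepoints, so `QUOTE_CHARS` and `DASH_CHARS` below are assumptions, as are the function names). The normalized text is meant only for comparison, not as a replacement license file, and "(C)" vs. "©" is deliberately left alone.

```python
# Assumed equivalence classes; extend as needed.
QUOTE_CHARS = "'\"\u2018\u2019\u201C\u201D"
DASH_CHARS = "-\u2010\u2011\u2012\u2013\u2014\u2015\u2212"

# Map every quote variant to "'" and every dash variant to "-".
_TABLE = {ord(c): "'" for c in QUOTE_CHARS}
_TABLE.update({ord(c): "-" for c in DASH_CHARS})


def normalize_for_matching(text):
    """Collapse quote and dash variants so they compare as equal."""
    return text.translate(_TABLE)


def equivalent(a, b):
    """True if two texts differ only in quote or dash style."""
    return normalize_for_matching(a) == normalize_for_matching(b)


print(equivalent("\u201cLicensed Material\u201d", '"Licensed Material"'))
print(equivalent("Copyright (C) 2007", "Copyright \u00a9 2007"))
```

The first comparison is reported as equivalent; the second is not, because the copyright symbol is not covered by the general rule in this sketch.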

@jlovejoy
Member

Hi @jb2170 Thanks for the question!

SPDX has matching guidelines for the purposes of defining a license "match" - see https://spdx.github.io/spdx-spec/v2.3/license-matching-guidelines-and-templates/

Regarding your question about different styles of quote, that is covered in B.6.4 Guideline: Quotes
Any variation of quotations (single, double, curly, etc.) should be considered equivalent.
As @xsuchy already noted, we have a similar guideline for hyphens, dashes, etc.

Those differences are not legally significant in my opinion ;)

The reason that other punctuation needs to match is that the placement of a comma or period can change the meaning. We have recently used markup to indicate a specific comma is optional where we determined it did not/could not change the meaning. (how's that for pedantic!)

jlovejoy added this to the 3.24 milestone Apr 11, 2024
@jlovejoy
Member

I'm going to close this now, as I don't think there is any action associated with this.
