Apache-2.0 license text without appendix is a closer match for "Pixar" than "Apache-2.0" since v3.23 #2418

Closed
decathorpe opened this issue Mar 9, 2024 · 5 comments · Fixed by #2480
Comments

@decathorpe

Many projects in Fedora Linux are using askalono to match license files against SPDX. Since we updated the SPDX data to version 3.23, we're getting somewhat unexpected results for files that are definitely Apache-2.0 license texts. If the "APPENDIX" is present, then matches are reported with a score of 1.000.

But if the appendix (which appears to be optional, judging by https://spdx.org/licenses/Apache-2.0) is not present in the file, the text is now reported as a closer match to the newly added "Pixar" license (with a score of 0.983), despite being a perfect match for the Apache-2.0 license text without the appendix.

For example, the LICENSE-APACHE file from the Rust programming language (which is definitely Apache-2.0, not Pixar) is now classified as "Pixar" because it's apparently a closer match: https://github.com/rust-lang/rust/blob/master/LICENSE-APACHE

So it appears that there's some issue with the matching data, either for Apache-2.0 or the newly added Pixar license.

@swinslow
Member

Hi @decathorpe, thanks for sharing this!

I'm not familiar with the askalono tool, so I'm making some assumptions here. Note that (to my knowledge) the SPDX project doesn't really have a concept of "closeness" matching of licenses. The SPDX matching guidelines define whether something does or doesn't match as a binary matter, but don't define a percentage match, etc.
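The difference between a binary match and a "closeness" score can be sketched as follows. This is an illustration only: `difflib.SequenceMatcher` is a stand-in similarity measure chosen for convenience, not the algorithm askalono uses, and the helper names and sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text):
    # Collapse whitespace and case so that only the wording is compared.
    return " ".join(text.lower().split())


def binary_match(template, candidate):
    # Yes/no answer: the candidate either matches the template or it does not.
    return normalize(template) == normalize(candidate)


def closeness(template, candidate):
    # A score between 0.0 and 1.0 (stand-in measure, not askalono's).
    matcher = SequenceMatcher(None, normalize(template), normalize(candidate))
    return matcher.ratio()


# Placeholder texts, not real license wording.
template_text = "Example license terms.\nEND OF TERMS\nAPPENDIX: how to apply"
candidate_text = "Example license terms.\nEND OF TERMS"

print(binary_match(template_text, candidate_text))
print(closeness(template_text, candidate_text))
```

A score-based tool ranks templates by such a number, so a text can end up "closest" to a license it does not match in the binary sense.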

I'm not sure whether askalono is implementing SPDX's matching guidelines or some other process for comparison. If it isn't using the matching guidelines, then this might be more of a question or issue for that tool's author.

That said, I do have a couple of thoughts for how we could improve this for you:

As currently structured, Apache-2.0 has three separate bits in the "optional" section at the end of the license:

  1. the "END OF TERMS AND CONDITIONS" lead-in;
  2. the next two paragraphs after "APPENDIX"; and
  3. the <standardLicenseHeader> section.

These are currently all contained within one <optional> block. That means that (under the SPDX matching guidelines) either all of these would need to be present or else all would need to be absent, in order for a license text to match Apache-2.0. I'm guessing that's the problem for something like the rust-lang example; it has 1 but neither 2 nor 3.

I think we can fix this by splitting 1 into its own <optional> block, separate from 2+3 together.
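The all-or-nothing behaviour of a single <optional> block, and the effect of splitting it, can be illustrated with a toy matcher. This is a sketch under stated assumptions: the strings are shortened placeholders rather than the real license text, the helpers `optional`, `template`, and `matches` are hypothetical, and regular expressions merely stand in for the SPDX matching guidelines.

```python
import re

# Shortened placeholders for the license body and the three trailing pieces.
BODY = "terms and conditions body"
END_OF_TERMS = "END OF TERMS AND CONDITIONS"
APPENDIX = "APPENDIX: How to apply the Apache License to your work."
HEADER = "Licensed under the Apache License, Version 2.0"


def optional(*parts):
    # Models one <optional> block: all parts appear together, or none do.
    inner = r"\s*".join(re.escape(part) for part in parts)
    return rf"(?:\s*{inner})?"


def template(*pieces):
    # Concatenate already-escaped pieces into one whole-text pattern.
    return re.compile(r"\s*" + "".join(pieces) + r"\s*")


def matches(tmpl, text):
    # Binary result: the whole text must fit the template.
    return tmpl.fullmatch(text) is not None


# Current structure: pieces 1, 2 and 3 share one optional block.
single_block = template(
    re.escape(BODY), optional(END_OF_TERMS, APPENDIX, HEADER)
)
# Proposed structure: piece 1 in its own block, 2 and 3 together.
split_blocks = template(
    re.escape(BODY), optional(END_OF_TERMS), optional(APPENDIX, HEADER)
)

full_text = "\n".join([BODY, END_OF_TERMS, APPENDIX, HEADER])
no_appendix = "\n".join([BODY, END_OF_TERMS])  # like the rust-lang file

print(matches(single_block, full_text))    # all three pieces present
print(matches(single_block, no_appendix))  # only piece 1 present
print(matches(split_blocks, no_appendix))  # only piece 1, split template
```

With the single block, a text that has piece 1 but not pieces 2 and 3 fails to match; with the split blocks, the same text matches, while the full text matches in both cases.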

@jlovejoy do you have any concerns with this approach?

@decathorpe As described above, this should help with the problem you're seeing from an SPDX matching guidelines perspective; but I'll have to defer to the askalono tool author in terms of how their tool makes use of the license list data and templates.

@swinslow swinslow added the "XML markup change" label (potential change or addition to XML markup in license) Mar 13, 2024
@swinslow swinslow added this to the 3.24 milestone Mar 13, 2024
@swinslow swinslow self-assigned this Mar 13, 2024
@decathorpe
Author

Thank you for the detailed response!

Indeed, it looks like the problem might be because it's a single block, but I would need to look at how askalono does its matching / scoring to be able to tell for sure.

I didn't know that there was no "canonical" definition for matching license texts to SPDX data and that askalono apparently defines its own "closeness" score. I can try reaching out to the author to see if they have any further insights here.

@jlovejoy
Member

Thanks, both. Agreed, @swinslow, with the idea of splitting 1 into its own block, separate from 2+3 together. I believe we have other situations with separate sections along these lines.

@jlovejoy
Member

@swinslow - were you going to make a PR for this?

@swinslow
Member

Hi @jlovejoy, yes, I'll put together a PR shortly.
