Apache-2.0 license text without appendix is a closer match for "Pixar" than "Apache-2.0" since v3.23 #2418
Comments
Hi @decathorpe, thanks for sharing this! I'm not familiar with the askalono tool, so I'm making some assumptions here. Note that (to my knowledge) the SPDX project doesn't really have a concept of "closeness" matching of licenses. The SPDX matching guidelines define whether something does or doesn't match as a binary matter, but don't define a percentage match, etc. I'm not sure whether askalono is implementing SPDX's matching guidelines or some other process for comparison. If it isn't using the matching guidelines, then this might be more of a question or issue for that tool's author. That said, I do have a couple of thoughts for how we could improve this for you: As currently structured, Apache-2.0 has three separate bits in the "optional" section at the end of the license:
These are currently all contained within one block. I think we can fix this by splitting 1 into its own block. @jlovejoy do you have any concerns with this approach? @decathorpe As described above, this should help with the problem you're seeing from an SPDX matching guidelines perspective, but I'll have to defer to the askalono tool author in terms of how their tool makes use of the license list data and templates.
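To illustrate why the grouping of optional text matters under a binary (match / no match) comparison, here is a minimal sketch. It is not the SPDX matching guidelines implementation and not how askalono works; the `normalize` and `matches` helpers, the placeholder strings, and the representation of a template as `(text, optional)` pairs are all assumptions made for this example.

```python
import re

def normalize(text):
    """Lowercase the text and emit each word followed by exactly one space."""
    return "".join(word + " " for word in text.lower().split())

def matches(segments, candidate):
    """Binary match: each optional segment may be present in full or absent."""
    pattern = ""
    for text, optional in segments:
        piece = re.escape(normalize(text))
        pattern += f"(?:{piece})?" if optional else piece
    return re.fullmatch(pattern, normalize(candidate)) is not None

# Hypothetical placeholder strings, not the real license wording.
BODY = "license terms go here"
PART_1 = "first optional part"
PART_2 = "second optional part"
PART_3 = "third optional part"

# Parts 1, 2 and 3 inside a single optional block.
single_block = [(BODY, False), (f"{PART_1} {PART_2} {PART_3}", True)]
# Part 1 in its own optional block, parts 2 and 3 together in another.
split_blocks = [(BODY, False), (PART_1, True), (f"{PART_2} {PART_3}", True)]

# A file that keeps part 1 but omits parts 2 and 3.
candidate = f"{BODY}\n\n{PART_1}\n"

print(matches(single_block, candidate))  # the single block is all-or-nothing
print(matches(split_blocks, candidate))  # part 1 can match on its own
```

With a single optional block, a file that carries only some of the optional text fails the binary match; with separate blocks, each piece can be present or absent independently.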
Thank you for the detailed response! Indeed, it looks like the problem might be because it's a single block, but I would need to look at how askalono does its matching / scoring to be able to tell for sure. I didn't know that there was no "canonical" definition for matching license texts to SPDX data and that askalono apparently defines its own "closeness" score. I can try reaching out to the author to see if they have any further insights here.
Thanks both - agreed @swinslow with the idea of splitting 1 into its own block, separate from 2+3 together. I believe we have other situations with separate sections along these lines.
@swinslow - were you going to make a PR for this?
Hi @jlovejoy, yes, I'll put together a PR shortly.
Many projects in Fedora Linux are using askalono to match license files against SPDX. Since we updated the SPDX data to version 3.23, we're getting somewhat unexpected results for files that are definitely Apache-2.0 license texts. If the "APPENDIX" is present, then matches are reported with a score of 1.000. But if the (optional? looking at https://spdx.org/licenses/Apache-2.0) appendix is not present in the file, it is now reported as a closer match to the newly added "Pixar" license (with score 0.983) despite being a perfect match to the Apache-2.0 license text without appendix.
For example, the LICENSE-APACHE file from the Rust programming language (which is definitely Apache-2.0, not Pixar) is now classified as "Pixar" because it's apparently a closer match: https://github.com/rust-lang/rust/blob/master/LICENSE-APACHE
So it appears that there's some issue with the matching data, either for Apache-2.0 or the newly added Pixar license.
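A toy sketch of how a similarity score could produce this kind of ranking. The thread does not say how askalono computes its score, so `difflib.SequenceMatcher` is only a stand-in here, and the template names and texts are hypothetical placeholders. The sketch assumes the scorer compares against the full template text, appendix included.

```python
from difflib import SequenceMatcher

def score(candidate, template):
    """Word-level similarity in [0, 1]; a stand-in, not askalono's scoring."""
    return SequenceMatcher(None, candidate.split(), template.split()).ratio()

# Hypothetical placeholder texts, not real license wording.
BODY = " ".join(f"clause{i}" for i in range(20))
APPENDIX = " ".join(f"appendix{i}" for i in range(10))

templates = {
    # Full text including the appendix, as if optional text were required.
    "with-appendix": f"{BODY} {APPENDIX}",
    # A near-copy of the body with one word changed and no appendix.
    "variant": BODY.replace("clause7", "modified7"),
}

# A file containing the body only, with no appendix.
candidate = BODY

for name, text in templates.items():
    print(name, round(score(candidate, text), 3))

best = max(templates, key=lambda name: score(candidate, templates[name]))
print("closest:", best)
```

In this toy setup, the missing appendix costs more similarity than the single changed word, so the near-copy ranks higher. Whether that is what actually happens inside askalono would need to be confirmed with its author.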