feat: Add complete License-Text to cyclonedx bom #570

Open
andife opened this issue Aug 28, 2023 · 9 comments · May be fixed by #674
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@andife
Contributor

andife commented Aug 28, 2023

We have a requirement to include the complete license text in the CycloneDX JSON.

My idea would be to take this directly from the wheel, which ships this information in the *.dist-info directory as a LICENSE file.

This information can also be accessed via "pip-licenses --with-license-file --format=json".

It would be nice if the designated area could be filled in the cyclonedx format (https://cyclonedx.org/docs/1.4/json/#components_items_licenses_items_license_text_content) for the license file.
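As a rough illustration of the idea above, the license files of an installed distribution can be listed with the standard library's `importlib.metadata`. This is only a sketch: the file name pattern and the helper names `is_license_like` and `license_texts` are assumptions made for illustration, not something the thread or the project defines.

```python
import re
from importlib import metadata
from typing import Dict

# Assumed file name pattern; the thread does not specify which names count.
_LICENSE_NAME = re.compile(r"^(LICEN[CS]E|COPYING|NOTICE)", re.IGNORECASE)


def is_license_like(path: str) -> bool:
    """True if the file sits below a *.dist-info directory and is named like a license."""
    parts = path.replace("\\", "/").split("/")
    in_dist_info = any(p.endswith(".dist-info") for p in parts[:-1])
    return in_dist_info and bool(_LICENSE_NAME.match(parts[-1]))


def license_texts(dist_name: str) -> Dict[str, str]:
    """Map each license-like file of an installed distribution to its text."""
    dist = metadata.distribution(dist_name)
    found = {}
    for file in dist.files or []:  # files is None when no RECORD is available
        if is_license_like(str(file)):
            found[str(file)] = file.read_text(encoding="utf-8")
    return found
```

Calling `license_texts("mkdocs")` in an environment where that package is installed would return one entry per matching file, which also covers packages that ship several license files.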

@jkowalleck jkowalleck added the enhancement New feature or request label Oct 24, 2023
@nejch

nejch commented Nov 27, 2023

I've implemented something like this, although I store the license texts in ComponentEvidence since it's more a result of analysis rather than a guaranteed license. Some packages contain multiple license files, for example. I'll try to upstream this here at some point if it's accepted.

I used pip-licenses as well in the past but moved to this approach for exactly the same reason as stated above :)

@jkowalleck
Member

@nejch better wait with an implementation until the following were properly merged to master:

@jkowalleck jkowalleck changed the title Feature Request: Add complete License-Text to cyclonedx bom feat: Add complete License-Text to cyclonedx bom Jan 6, 2024
@jkowalleck
Member

see #567

@jkowalleck jkowalleck added the help wanted Extra attention is needed label Jan 6, 2024
@jkowalleck
Member

jkowalleck commented Feb 2, 2024

since v4 was published and released, feel free to contribute this feature.

as explained, the target of the "detected" license texts shall be component.evidence.licenses[] (https://cyclonedx.org/docs/1.5/json/#metadata_component_evidence_licenses)

example outcome:

{
  // ...
  "evidence": {
    "licenses": [
      {
        "name": "detected license text from file XZY",
        "text": {
           "contentType": "text/markdown",
           "encoding": "base64",
           "content": "IyBNSVQgTm8gQXR0cmlidXRpb24KCkNvcHlyaWdodCAyMDI0IEphbmUgRG9lCgpQZXJtaXNzaW9uIGlzIGhlcmVieSBncmFudGVkLCBbLi4uXQ=="
        }
      }
    ]
  } 
}
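A minimal sketch of how such an entry could be produced with the Python standard library. The function name `license_evidence_entry` is hypothetical, and wrapping the entry in a `license` object is an assumption that goes beyond the abbreviated example above.

```python
import base64


def license_evidence_entry(name: str, text: str, content_type: str = "text/plain") -> dict:
    """Build one evidence.licenses[] item with base64-encoded text."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return {
        "license": {  # assumed wrapper object around name/text
            "name": name,
            "text": {
                "contentType": content_type,
                "encoding": "base64",
                "content": encoded,
            },
        }
    }
```

Decoding the `content` value with `base64.b64decode` gives back the original UTF-8 bytes of the license file.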

@jkowalleck jkowalleck added the good first issue Good for newcomers label Feb 2, 2024
@nejch

nejch commented Feb 2, 2024

Thanks a lot, that's almost exactly what I have now although so far I didn't encode it:

            "evidence": {
                "licenses": [
                    {
                        "license": {
                            "name": "mkdocs-1.5.3.dist-info/licenses/LICENSE",
                            "text": {
                                "contentType": "text/plain",
                                "content": "Copyright \u00a9 2014-present, Tom Christie. All rights reserved.\n\nRedistribution and use in source and binary forms, with or\nwithout modification, are permitted provided that the following\nconditions are met:\n\nRedistributions of source code must retain the above copyright\nnotice, this list of conditions and the following disclaimer.\nRedistributions in binary form must reproduce the above copyright\nnotice, this list of conditions and the following disclaimer in\nthe documentation and/or other materials provided with the\ndistribution.\n\nTHIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND\nCONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES,\nINCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF\nMERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE\nDISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR\nCONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,\nSPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT\nLIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF\nUSE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED\nAND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT\nLIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN\nANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE\nPOSSIBILITY OF SUCH DAMAGE.\n"
                            }
                        }
                    }
                ]
            }

@jkowalleck just wondering, does the current implementation have a mechanism to try and guess the license (spdx or generic name) from a text? I'd then potentially try to reuse it, might be useful to add in the name, even if not really guaranteed.

@nejch

nejch commented Feb 2, 2024

Also note to self: Just noticed this package no longer has a public API. So I guess this functionality should actually go into https://github.com/CycloneDX/cyclonedx-python-lib, if we as users want to use it programmatically.

Edit: maybe not, as that only has the models etc 😅 hmm.

@jkowalleck
Member

jkowalleck commented Feb 2, 2024

re #570 (comment)
@nejch

[...] although so far I didn't encode it

encoding the license text is not an option, it is mandatory, AFAIK.
you should get your implementation fixed.

PS: Edit: I need to dig into the SBOM guide and check if it is actually required. At least that is what I thought, and I still encourage encoding the content, to prevent issues when embedding the text in the transport media (XML/JSON/Protobuf).
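One way to see the kind of transport issue meant here: a raw form feed character, which some license files contain as a page break, is not allowed in XML 1.0, while the base64 form of the same text passes through unchanged. The sample text and the helper `xml_roundtrip_ok` below are made up for illustration.

```python
import base64
import xml.etree.ElementTree as ET

# Made-up sample text containing a form feed (U+000C) as a page break.
raw_text = "Section 1\x0cSection 2"
encoded_text = base64.b64encode(raw_text.encode("utf-8")).decode("ascii")


def xml_roundtrip_ok(content: str) -> bool:
    """Serialize content into an XML element and check that it parses back intact."""
    element = ET.Element("text")
    element.text = content
    serialized = ET.tostring(element)
    try:
        return ET.fromstring(serialized).text == content
    except ET.ParseError:
        return False


raw_survives = xml_roundtrip_ok(raw_text)
encoded_survives = xml_roundtrip_ok(encoded_text)
```

Here the raw text fails to parse back, whereas the encoded text survives and can be decoded to the original afterwards.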

[...] a mechanism to try and guess the license (spdx or generic name) from a text?

Nope, not exactly. Guessing license identifiers from text snippets is not planned for any Python implementation: it would bloat the library or depend on external services. However, there are tools that can do this already, e.g. https://github.com/CycloneDX/license-scanner
What does exist is detection of license names -- see the library here: cyclonedx.factory.license.LicenseFactory.make_from_string() -- https://cyclonedx-python-library.readthedocs.io/en/latest/autoapi/cyclonedx/factory/license/index.html#cyclonedx.factory.license.LicenseFactory.make_from_string

@nejch

nejch commented Feb 2, 2024

@jkowalleck sure, this was mostly for internal use to display the SBOM angular-style, will ensure it's encoded before going upstream!

Thanks for the hint there, since some license contents start with the name itself as the first line, I'll see if that could be of some use but not 100%.
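The first-line idea could look roughly like the sketch below. The helper `first_line_candidate` and its length limit are hypothetical. The result is only a candidate string that might then be handed to a name detector such as the `make_from_string()` factory mentioned above; it is not a reliable license identifier.

```python
from typing import Optional


def first_line_candidate(text: str, max_len: int = 80) -> Optional[str]:
    """Return the first non-empty line, without Markdown heading marks, as a name candidate."""
    for line in text.splitlines():
        stripped = line.strip().lstrip("#").strip()
        if stripped:
            # Very long first lines are unlikely to be a license name.
            return stripped if len(stripped) <= max_len else None
    return None
```

For a text starting with `# MIT No Attribution` this yields a usable name, but for a text that opens with a copyright line it yields that copyright line instead, which is why the approach cannot be trusted 100%.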

@jkowalleck
Member

jkowalleck commented Mar 13, 2024

Acceptance criteria

  • the feature to add license texts should be enabled by a CLI switch called --gather-license-evidence (name to be discussed)
  • the feature is disabled by default
  • only if the feature is enabled:
    • license text detection
    • for all components, meta-components, root-components and nested components:
      regardless of SPDX license ID, SPDX license expression or named license, the license texts should be added as ...
      • for declared ones in wheel: the elements declared license -> .licenses[] - via fix: declared license texts as such, not as license name #694
      • for concluded ones in wheel: search for the usual file patterns -> .evidence.licenses[]
      • for sdist: the elements concluded from -> .evidence.licenses[]
        Examples:
        {
          //...
          "evidence": { 
            "licenses": [
              {"id":"Apache-2.0", "text": {
                "contentType": "text/plain",
                "encoding": "base64",
                // base64 of content of file `LICENSE`
                "content": "bG9yZW0gaXBzdW0="
              }},
              {"name":"file: NOTICE", "text": {
                "contentType": "text/plain",
                "encoding": "base64",
                // base64 of content of file `NOTICE`
                "content": "bG9yZW0gaXBzdW0="
              }}
            ]
          },
          // ...
        }
    • if a license text is detected with the package, it would be added to Component's @.evidence.licenses
      • @.name would be 'License of : '
      • @.text would hold the text
        • the content type is to be derived from file extension
        • the content SHOULD be base64 encoded
    • if no license text is shipped with a package, no license text is added as evidence.
      Nope, no license template is derived from the package's declared SPDX license ID.
      Reason: license templates (like BSD 3-Clause) are designed to be modified, unlike others such as Apache-2.0, which is a complete text rather than a template.
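The criteria above could be sketched as follows, using only the standard library. The switch name is the one proposed in the criteria (still to be discussed). Everything else is an assumption for illustration: the program name, the extension table, the `text/plain` fallback, the `license` wrapper object, and the helper names.

```python
import argparse
import base64
from pathlib import PurePosixPath
from typing import Dict, List

# Hypothetical extension table; unknown or missing extensions fall back
# to text/plain (an assumption, not part of the criteria).
CONTENT_TYPES = {".md": "text/markdown", ".rst": "text/x-rst", ".txt": "text/plain"}


def content_type_for(file_name: str) -> str:
    """Derive the content type from the file extension."""
    suffix = PurePosixPath(file_name).suffix.lower()
    return CONTENT_TYPES.get(suffix, "text/plain")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sbom-sketch")  # hypothetical program name
    parser.add_argument(
        "--gather-license-evidence",
        action="store_true",  # stays off unless the switch is given
        help="add shipped license texts as license evidence",
    )
    return parser


def gather_evidence(license_files: Dict[str, bytes], enabled: bool) -> List[dict]:
    """Turn shipped license files into evidence entries, but only when enabled."""
    if not enabled:
        return []
    return [
        {"license": {"name": f"file: {name}", "text": {
            "contentType": content_type_for(name),
            "encoding": "base64",
            "content": base64.b64encode(data).decode("ascii"),
        }}}
        for name, data in license_files.items()
    ]
```

An empty `license_files` mapping produces no entries, matching the rule that nothing is added when a package ships no license text.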

3 participants