Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

License header check #364

Merged
merged 1 commit into from May 23, 2024
Merged

Conversation

wuchunfu
Copy link
Member

@wuchunfu wuchunfu commented Mar 2, 2024

License header check

@wuchunfu wuchunfu mentioned this pull request Mar 3, 2024
@ashvina
Copy link
Contributor

ashvina commented Mar 3, 2024

I found this discussion thread related to this header audit tool: https://lists.apache.org/thread/mcw0l9g71yzvcftkppnrxspz7skdgglz

I am familiar with Apache Rat as well and would love to know its limitations.

.github/workflows/license-check.yaml Outdated Show resolved Hide resolved
@wuchunfu
Copy link
Member Author

wuchunfu commented Mar 3, 2024

I found this discussion thread related to this header audit tool: https://lists.apache.org/thread/mcw0l9g71yzvcftkppnrxspz7skdgglz

I am familiar with Apache Rat as well and would love to know its limitations.

  • 1、Limited degree of automation:

Apache Rate is mainly based on predefined rules and regular expressions to search for license text, which may not accurately recognize complex or non-standard license statements.

  • 2、Relying on manual judgment:

Rat cannot automatically determine whether a file indeed needs to contain a certain authorization statement, and in some cases, it may need to be manually reviewed and decided based on specific circumstances.

  • 3、Not supported for all file types:

Rat may not be able to parse and check the content of all types of files, especially for some special encoding formats or binary files.

  • 4、Configuration complexity:

For large-scale projects or projects with special authorization requirements, configuring the Rate to adapt to all situations may be complex and require the writing of additional metadata or rule files.

  • 5、Continuous integration difficulty:

When integrating Rat into the CI/CD process, additional work may be required to handle report results and ensure that violations can be detected and fixed in a timely manner.

@wuchunfu wuchunfu requested a review from ashvina March 3, 2024 06:12
Copy link
Contributor

@zabetak zabetak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project uses spotless-maven-plugin to enforce/check the LICENSE header as part of the build. Why do we need a separate CI action and additional tooling?

@wuchunfu
Copy link
Member Author

wuchunfu commented Mar 4, 2024

The project uses spotless-maven-plugin to enforce/check the LICENSE header as part of the build. Why do we need a separate CI action and additional tooling?

@zabetak Because some non Java files, spotless-maven-plugin, cannot automatically add license headers

@wuchunfu wuchunfu requested a review from zabetak March 4, 2024 09:16
@zabetak
Copy link
Contributor

zabetak commented Mar 4, 2024

@wuchunfu Spotless is not limited to Java as far as I know. Indeed the current configuration of the plugin only deals with Java files but this can change I believe.

The current PR adds almost all non-java files to paths-ignore so I am not sure what extra checks are added as part of this proposal.

@wuchunfu
Copy link
Member Author

wuchunfu commented Mar 4, 2024

@wuchunfu Spotless is not limited to Java as far as I know. Indeed the current configuration of the plugin only deals with Java files but this can change I believe.

The current PR adds almost all non-java files to paths-ignore so I am not sure what extra checks are added as part of this proposal.

@zabetak I have tried several methods, but have not been successful. May I know which projects have been done in this way? Can you provide some example projects?

@zabetak
Copy link
Contributor

zabetak commented Mar 5, 2024

@wuchunfu I don't have some sample open-source project in mind but it appears that the license-header step is a generic step that could be applied in a variety of formats according to the documentation.

For instance for .md files adding the following configuration should do the trick:

<markdown>
  <includes>
      <include>**/*.md</include>
  </includes>
  <licenseHeader>
      <file>style/md-license-header</file>
      <delimiter>#|---</delimiter>
  </licenseHeader>
</markdown>

It's probably necessary to create appropriate configurations for different file types and some things could obviously be grouped together.

I see the same discussion spanning under #359 so not sure where is the best place to continue discussing.

@wuchunfu
Copy link
Member Author

wuchunfu commented Mar 6, 2024

@wuchunfu I don't have some sample open-source project in mind but it appears that the license-header step is a generic step that could be applied in a variety of formats according to the documentation.

For instance for .md files adding the following configuration should do the trick:

<markdown>
  <includes>
      <include>**/*.md</include>
  </includes>
  <licenseHeader>
      <file>style/md-license-header</file>
      <delimiter>#|---</delimiter>
  </licenseHeader>
</markdown>

It's probably necessary to create appropriate configurations for different file types and some things could obviously be grouped together.

I see the same discussion spanning under #359 so not sure where is the best place to continue discussing.

@zabetak I agree with your suggestion, thank you for your guidance. I will use spotless-maven-plugin later to implement the automatic addition of license header function

@wuchunfu
Copy link
Member Author

@zabetak Although #359 pr has already done this with spotless-maven-plugin, I think it is still necessary to use skywalking-eyes to check because spotless-maven-plugin has already added a license header to the existing files. If new types of files are added later to prevent omissions

@jcamachor
Copy link
Contributor

@xtable-bot run azure

@zabetak
Copy link
Contributor

zabetak commented Mar 11, 2024

@wuchunfu Guarding against missing licenses in new file types is a good idea. If the CI action is light enough then we can revisit this PR once #359 is finalized. We should decide though if we want to go with RAT and enforce it also at build time or with skywalking-eyes and enforce it only in CI.

Copy link
Contributor

@the-other-tim-brown the-other-tim-brown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spotless plugin will fail if the header is missing, right? We are running that as part of our regular CI so we shouldn't need this anymore?

@wuchunfu
Copy link
Member Author

The spotless plugin will fail if the header is missing, right? We are running that as part of our regular CI so we shouldn't need this anymore?

In some known rules, we can check through mvn spotless:check. If we add files other than those we already know, this plugin seems unable to check

Copy link
Contributor

@the-other-tim-brown the-other-tim-brown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wuchunfu Can you add a quick section to the README or something that gives some quick pointers to how to update the spotless plugin if we come across a new file type we need to incorporate into that auto-formatter?

@the-other-tim-brown
Copy link
Contributor

@wuchunfu Can you add a quick section to the README or something that gives some quick pointers to how to update the spotless plugin if we come across a new file type we need to incorporate into that auto-formatter?

@wuchunfu could you add this?

@ashvina do you have any other feedback for the PR?

@wuchunfu
Copy link
Member Author

@wuchunfu Can you add a quick section to the README or something that gives some quick pointers to how to update the spotless plugin if we come across a new file type we need to incorporate into that auto-formatter?

@wuchunfu could you add this?

@ashvina do you have any other feedback for the PR?

@the-other-tim-brown Thank you for your reminder. I will organize it and write it in readme later

@wuchunfu
Copy link
Member Author

@the-other-tim-brown PTAL, thanks

@@ -99,6 +99,36 @@ The custom hadoop configurations can be passed in with the `--hadoopConfig [cust
The config in custom hadoop config file will override the default hadoop configurations. For an example
of a custom hadoop config file, see [hadoop.xml](https://xtable.apache.org/docs/fabric#step-2-translate-source-table-to-delta-lake-format-using-apache-xtable-incubating).

# New file type spotless automatic formatting and modification operation

When a new type of file is merged into the warehouse, it is possible that the `spotless-maven-plugin` will not add the license header by default, and when it is detected by ci and prompted to add the `license header`, what we have to do is to modify the project. The modified part of `pom.xml` under the root path is as follows:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When a new type of file is merged into the warehouse, it is possible that the `spotless-maven-plugin` will not add the license header by default, and when it is detected by ci and prompted to add the `license header`, what we have to do is to modify the project. The modified part of `pom.xml` under the root path is as follows:
When a new type of file is added to the repo, it is possible that the `spotless-maven-plugin` will not add the license header by default. This is detected by the `License Check` GitHub Action. When this happens the developer should must modify part of `pom.xml` under the root path is as follows:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wuchunfu here are a few reasons for my suggestion:

  1. Warehouse is a commonly used term in the data space for a Data Warehouse (like Snowflake and BigQuery). It would be best to make sure we avoid confusion.
  2. Adding the name of the check in CI can help people know what to look for.

@wuchunfu
Copy link
Member Author

@xtable-bot run azure

@wuchunfu
Copy link
Member Author

@the-other-tim-brown PTAL, thanks

@the-other-tim-brown
Copy link
Contributor

@ashvina any feedback?

@wuchunfu
Copy link
Member Author

@the-other-tim-brown PTAL, thanks

@ashvina
Copy link
Contributor

ashvina commented May 22, 2024

I have no significant concerns with SkyWalking Eyes. In conjunction with Spotless, it could assist in maintaining code compliance. We might consider revisiting this plugin if the combination of the two begins to pose a problems or overheads.

Copy link
Contributor

@the-other-tim-brown the-other-tim-brown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wuchunfu can you squash the commits for us?

@wuchunfu
Copy link
Member Author

@wuchunfu can you squash the commits for us?

@the-other-tim-brown done

@xtable-bot
Copy link

CI report:

Bot commands @xtable-bot supports the following commands:
  • @xtable-bot run azure re-run the last Azure build

@the-other-tim-brown the-other-tim-brown merged commit ae37054 into apache:main May 23, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants