Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub links semi-regularly fail incorrectly #138

Closed
choldgraf opened this issue Jul 1, 2022 · 8 comments
Closed

GitHub links semi-regularly fail incorrectly #138

choldgraf opened this issue Jul 1, 2022 · 8 comments

Comments

@choldgraf
Copy link

Something that I have noticed with this action is that links to GitHub issues / people / searches / etc seem to unreliably fail, even though clicking the link manually resolves just fine.

As an example, the summary in this action returned Failed: Network Error for all of the following valid GitHub usernames:

I've run into similar issues when linking to GitHub issue search queries, or specific issues.

I'm running the action with --insecure via the configuration here:

https://github.com/executablebooks/meta/blob/ba8d5f908381b6256dca53afc85ad50c7d8bc405/.github/workflows/linkcheck.yml#L21-L37

Do you have any idea why this is happening or a way around it?

@mre
Copy link
Member

mre commented Jul 1, 2022

You're probably running into the Github rate limiter.
I'm working on a proper solution which will respect rate limit headers and wait accordingly, but in the meantime here are a few workarounds you can try: lycheeverse/lychee#634 (comment).

@mre
Copy link
Member

mre commented Jul 1, 2022

On a related note, we really should add the status code back to the Markdown output to make debugging cases like this easier in the future. I broke that during refactoring.

@choldgraf
Copy link
Author

ahhh that would totally make sense!

hmmm this is another reason to just run this job once a day or something instead of in every PR...I have been holding off on doing that until #74 is resolved, since the other actions for creating issues don't update a pre-existing issue, they create new issues only, and I don't want like 30 issues open in a month that we get lazy and don't have the time to fix our links 😅

@polarathene
Copy link

polarathene commented Jul 2, 2022

You could also use --cache which would make a big difference if you're experiencing rate-limiting from high frequency of running the action with PR activity. But presently --cache is broken with v0.10.0, you'd need to specify an older release for the time being.


You can also detect changes introduced by the PR and minimize requests by only check those specific related inputs (it's better to check the whole input/file, rather than only changed lines). For http / https links, that should be fairly reliable 👍

Presently, you need to do some extra transforms but it should be simpler in future:

Examples

Manually generating the filepaths to check:

Click to view YAML
jobs:
  build:
    runs-on: ubuntu-latest
    name: Test changed-files
    steps:
      # Compares the default generated "test merge commit" by Github (HEAD) to the base branch (HEAD^1)
      - uses: actions/checkout@v3
        with:
          fetch-depth: 2

      # Gets a list of files from the PR with a git status of Added (A) or Modified (M)
      - name: 'Example - Get a list of changed files to process'
        id: changed-files
        run: |
          CHANGED_FILES=$(git diff-tree --name-only --diff-filter 'AM' -r HEAD^1 HEAD | sed -z "s/\n$//;s/\n/' '/g")
          echo "::set-output name=all_changed_files::${CHANGED_FILES}"

      - name: 'Check Links'
        id: lychee
        uses: lycheeverse/lychee-action@v1.5.0
        with:
          args: --verbose --no-progress -- '${{ steps.changed-files.outputs.all_changed_files }}'

Or delegate to tj-actions/changed-files action to get the file paths, in future the post-processing step can likely be dropped and the action output given to a separate lychee-action input:

Click to view YAML
jobs:
  build:
    runs-on: ubuntu-latest
    name: Test changed-files
    steps:
      # Compares the default generated "test merge commit" by Github of the PR branch into the base branch
      - uses: actions/checkout@v3
        with:
          fetch-depth: 2

      # Gets a list of files from the PR with a git status of Added (A) or Modified (M) | (via outputs.all_changed_files)
      - name: 'Example - Get a list of changed files to process'
        id: get-changed-files
        uses: tj-actions/changed-files@v23
        with:
          separator: ','

      # lychee-action requires pre-processing for file paths with spaces (boundary quotes added in lychee-action input):
      - name: 'Compatibility with eval: Quote wrap paths to support spaces'
        id: changed-files
        run: |
          QUOTE_WRAPPED="$( sed "s/,/' '/g" <<< '${{ steps.get-changed-files.outputs.all_changed_files }}' )"
          echo "::set-output name=all_changed_files::${QUOTE_WRAPPED}"

      - name: 'Check Links'
        id: lychee
        uses: lycheeverse/lychee-action@v1.5.0
        with:
          args: --verbose --no-progress -- '${{ steps.changed-files.outputs.all_changed_files }}'

If you are certain that PRs won't contain file paths with spaces, you can already skip the post-processing step with the action (manual example must still convert \n to space):

Click to view YAML
jobs:
  build:
    runs-on: ubuntu-latest
    name: Test changed-files
    steps:
      # Compares the default generated "test merge commit" by Github of the PR branch into the base branch
      - uses: actions/checkout@v3
        with:
          fetch-depth: 2

      # Gets a list of files from the PR with a git status of Added (A) or Modified (M) | (via outputs.all_changed_files)
      - name: 'Example - Get a list of changed files to process'
        id: get-changed-files
        uses: tj-actions/changed-files@v23

      - name: 'Check Links'
        id: lychee
        uses: lycheeverse/lychee-action@v1.5.0
        with:
          args: --verbose --no-progress -- ${{ steps.changed-files.outputs.all_changed_files }}

@choldgraf
Copy link
Author

this is super helpful, thanks so much for putting the time to provide these examples :-)

I'll experiment with using only the files changed in the PR. Though one thing I've found is that, more often than not, this action finds failures due to link targets changing, not an incorrectly-changed link in our documentation source, so it is often quite helpful to check all of the links now and then. Maybe the best approach is like once a month check all the links, and in PRs only check links for the changed files.

@polarathene
Copy link

more often than not, this action finds failures due to link targets changing, not an incorrectly-changed link in our documentation source, so it is often quite helpful to check all of the links now and then.

I have been holding off on doing that until #74 is resolved, since the other actions for creating issues don't update a pre-existing issue, they create new issues only, and I don't want like 30 issues open in a month

That issue is about lychee-action integrating peter-evans/create-issue-from-file under the hood as a composite action. You could still use it individually though as one of the examples show for the "before" approach.

The composite approach as shown for the "after" version would expose an arg to create/update an issue with the results, just optionally opt-in for using that action internally. You can still use it separately without waiting on lychee-action to integrate it, to update an existing issue you would provide it to that actions issue-number arg.

Looking up the issue number depends how you manage the reports. If it's a static issue that is updated and opened/closed with reports, that's fairly simple, otherwise if you close the issue and want a new issue opened for the next report (until it's closed, your workflow keeps updating that open issue), then you'd need a way to lookup/identify it. That could be by title, label, and/or other metadata, some actions provide this functionality, here's an action that focuses on just that task and shows an example paired with peter-evans/create-issue-from-file.

@mre
Copy link
Member

mre commented Jul 27, 2022

FYI: The cache issue is resolved in master and v1.5.1.

@mre
Copy link
Member

mre commented Aug 11, 2022

So I guess we can close this unless I'm missing something, in which case feel free to reopen. Thanks for the collaboration everyone. 🎉

@mre mre closed this as completed Aug 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants