
[#147] Update readme #216

Merged: dcastro merged 2 commits into master on Nov 18, 2022


@dcastro (Member) commented on Nov 7, 2022

Description

Improved the readme and fixed several problems:

  • Mention support for GitLab - this is important and wasn't mentioned anywhere.
  • Add a FAQ clarifying how xrefcheck behaves in some important situations.
  • We don't need to get into a lot of detail about the syntax of the `xrefcheck: ignore` annotations, or where they're allowed and where they're not. A general idea and a couple of examples are more than enough.
  • Added the backlink [↑](#xrefcheck) where it was missing.
  • Fixed inconsistent header levels: we were using `###` where we should have been using `##`.
  • `nix run` should now be `nix shell`.
  • Add a link to `tests/configs/github-config.yaml`, which contains a list of all supported config options.
  • Instead of mentioning GitHub Actions in the "usage" section and nix in a separate section, mention everything in the "usage" section.
  • Fixed link to `stack2cabal`.
  • Fixed typos and rephrased some bits.

I also noticed an issue with the Docker Hub tags, so I fixed it while I was at it:

Problem: We have a pipeline step to tag docker images on Docker Hub whenever a new version is released (https://github.com/serokell/xrefcheck/blob/7dd5c4c3c954a531b5cad89857f31b27245f0ef9/.buildkite/pipeline.yml#L51-L56):

```yaml
- command:
    - nix-build docker
    - nix run -f ci.nix pkgs.skopeo -c ./scripts/upload-docker-image.sh "docker-archive:$(readlink result)" "docker://docker.io/serokell/xrefcheck:${BUILDKITE_BRANCH}"
  label: Push release to dockerhub
  if: |
    build.branch =~ /^v[0-9]+.*/
```

However, this doesn't seem to be working, since Docker Hub only contains the `latest` tag (https://hub.docker.com/r/serokell/xrefcheck/tags).

The problem seems to be that the CI step is only triggered when it builds a branch whose name matches the regex `/^v[0-9]+.*/`. But we never use that format for branch names, so it's never triggered.
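To see why the step never fires, one can test candidate names against the same pattern. The sketch below uses `grep -E` as a stand-in for Buildkite's regex matching (an assumption, though the two should agree on a pattern this simple), and the ref names in the loop are made-up examples:

```shell
#!/bin/sh
# Succeeds if the given ref name matches the release pattern used by
# the pipeline step: /^v[0-9]+.*/
matches_release_pattern() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+.*'
}

# Hypothetical ref names: two typical branch names and one version tag.
for name in master dcastro/147-update-readme v0.2.2; do
  if matches_release_pattern "$name"; then
    echo "$name: step runs"
  else
    echo "$name: step skipped"
  fi
done
```

Ordinary branch names never start with `v` followed by a digit, so in practice only a version tag would ever satisfy the condition.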

Solution:

  1. Change the CI step to trigger when it detects a tag with a version number
  2. Enable the "Build tags" option in buildkite: https://buildkite.com/serokell/xrefcheck/settings/repository
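In the pipeline, step 1 would presumably mean testing `build.tag` instead of `build.branch` in the `if:` condition, and pushing under the tag name (Buildkite exposes it as `BUILDKITE_TAG`) rather than `${BUILDKITE_BRANCH}`. The shell sketch below shows that decision logic only. The helper name is hypothetical, and it echoes the destination instead of calling the real upload script:

```shell
#!/bin/sh
# Hypothetical helper: prints the Docker Hub destination for a build,
# or fails if the build was not triggered by a version tag.
docker_destination() {
  tag="$1"   # e.g. the value of $BUILDKITE_TAG; empty for branch builds
  case "$tag" in
    v[0-9]*) echo "docker://docker.io/serokell/xrefcheck:${tag}" ;;  # same as /^v[0-9]+.*/
    *) return 1 ;;
  esac
}

if dest=$(docker_destination "${BUILDKITE_TAG:-}"); then
  # The real step would run ./scripts/upload-docker-image.sh here.
  echo "Would push image to $dest"
else
  echo "Not a version tag; skipping the Docker Hub push."
fi
```

Keying on the tag rather than the branch also means the pushed image is named after the released version. That only helps if Buildkite builds tags at all, hence step 2.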

Related issue(s)

Fixes #147

✅ Checklist for your Pull Request

Ideally a PR has all of the checkmarks set.

If something in this list is irrelevant to your PR, you should still set this
checkmark indicating that you are sure it is dealt with (be that by irrelevance).

Related changes (conditional)

  • Tests

    • If I added new functionality, I added tests covering it.
    • If I fixed a bug, I added a regression test to prevent the bug from
      silently reappearing again.
  • Documentation

    • I checked whether I should update the docs and did so if necessary:
  • Public contracts

    • Any modifications of public contracts comply with the Evolution
      of Public Contracts
      policy.
    • I added an entry to the changelog if my changes are visible to the users
      and
    • provided a migration guide for breaking changes if possible

Stylistic guide (mandatory)

✓ Release Checklist

  • I updated the version number in package.yaml.
  • I updated the changelog and moved everything
    under the "Unreleased" section to a new section for this release version.
  • (After merging) I edited the auto-release.
    • Change the tag and title using the format vX.Y.Z.
    • Write a summary of all user-facing changes.
    • Deselect the "This is a pre-release" checkbox at the bottom.
  • (After merging) I updated xrefcheck-action.
  • (After merging) I uploaded the package to hackage.

@dcastro marked this pull request as ready for review on November 7, 2022 at 14:46
Review comment on `README.md` (outdated):

```diff
 It is able to check multiple repositores at once if they are gathered in one folder.
-Being written on JavaScript, it is fairly slow on large repositories.
+Being written in JavaScript, it is fairly slow on large repositories.
```

@dcastro (Member, Author):

@Martoon-00 do we have any reference for this claim? Was this something you tried yourself (way back)?

Member:

Yep, that was my personal observation at that moment.

However, given that we are planning a public release, we may need to verify this again 🤔

@dcastro (Member, Author):

I see. Would you mind if we delete this particular sentence? Even if we conclude that the application is slow, I don't think we have enough confidence to assert that it's slow because it's written in JS :/ Also, "slow" can be subjective, and I think this section should be more objective when describing other people's projects.

Member:

I agree that making such statements doesn't sound quite right 🤔

On the other hand, if we are just left with the mention of this alternative solution without mentioning any weak point, then it's not clear why another tool (xrefcheck) was necessary.

From my experience, at least one of those JavaScript-based tools was noticeably slow. I had an impression that it didn't do any parallelization (thus the claim about JS being the reason, but such a claim is really invalid), and it took a dozen seconds to check a small-to-middle-sized repository that contained only local (!) links.

Given that, maybe we could form more objective statements here instead?

@dcastro (Member, Author), Nov 16, 2022:

So, I tried using remark on the xrefcheck repo, here are the steps to reproduce my experiment:

```console
$ npm init
$ npm install remark-cli remark-lint-no-dead-urls remark-validate-links --save-dev
```

Edit `package.json` and add this:

```json
  "remarkConfig": {
    "plugins": [
      "remark-lint-no-dead-urls",
      "remark-validate-links"
    ]
  }
```

Run it and time it:

```console
$ time npm run env -- remark . --frail --ignore-pattern 'tests/markdowns/**/*' --ignore-pattern 'tests/golden/**/*'
```

It seems to take about 8s on average to verify the xrefcheck repo.

xrefcheck takes about 6~8s, so it's not a huge difference :/
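Because the two tools land within a couple of seconds of each other, single runs are easily dominated by noise, particularly network latency. A small harness that repeats a command and reports the mean wall-clock time makes such comparisons somewhat more reliable. This is only a sketch. It assumes GNU `date` (the `%N` nanoseconds format is missing from, e.g., macOS's BSD `date`). It also deliberately ignores exit codes, since `remark --frail` exits non-zero whenever it reports a warning:

```shell
#!/bin/sh
# Usage: mean_runtime RUNS COMMAND [ARGS...]
# Runs COMMAND RUNS times and prints the mean wall-clock time in seconds.
mean_runtime() {
  runs="$1"
  shift
  total_ns=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s%N)        # GNU date: seconds followed by nanoseconds
    "$@" >/dev/null 2>&1       # exit status intentionally ignored
    end=$(date +%s%N)
    total_ns=$((total_ns + end - start))
    i=$((i + 1))
  done
  awk -v ns="$total_ns" -v n="$runs" 'BEGIN { printf "%.2f\n", ns / n / 1e9 }'
}
```

For example, run `mean_runtime 5 npm run env -- remark . --frail --ignore-pattern 'tests/markdowns/**/*' --ignore-pattern 'tests/golden/**/*'` and `mean_runtime 5 xrefcheck --ignore 'tests/markdowns/**/*' --ignore 'tests/golden/**/*'` back to back on the same checkout.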

However, I did notice that it does not handle "429 Too Many Requests" - so eventually links to github.com start failing and the tool reports false positives.
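Handling "429 Too Many Requests" gracefully mostly comes down to waiting before retrying, ideally for as long as the server asks via the `Retry-After` header. The `curl`-based sketch below only illustrates that idea and is not how xrefcheck implements it. The function names are hypothetical, and the 30-second fallback simply mirrors the default waiting interval mentioned in the review below:

```shell
#!/bin/sh
# Reads HTTP response headers on stdin and prints the Retry-After delay
# in seconds, or DEFAULT if the header is missing or not a plain number
# (Retry-After may also be an HTTP date, which this sketch doesn't parse).
retry_after_seconds() {
  default="$1"
  value=$(tr -d '\r' | awk -F': *' 'tolower($1) == "retry-after" { print $2; exit }')
  case "$value" in
    '' | *[!0-9]*) echo "$default" ;;
    *) echo "$value" ;;
  esac
}

# Sends HEAD requests to URL, making at most MAX_ATTEMPTS requests
# (default 3) and sleeping between them while the server answers 429.
# Prints the final HTTP status code; fails if it is still 429.
check_url() {
  url="$1"
  attempts="${2:-3}"
  while :; do
    headers=$(curl -sS -I "$url") || return 1
    status=$(printf '%s\n' "$headers" | tr -d '\r' | awk 'NR == 1 { print $2 }')
    attempts=$((attempts - 1))
    if [ "$status" != "429" ] || [ "$attempts" -le 0 ]; then
      echo "$status"
      [ "$status" != "429" ]
      return
    fi
    sleep "$(printf '%s\n' "$headers" | retry_after_seconds 30)"
  done
}
```

A checker that skips this wait and reports the 429 as a broken link produces exactly the false positives described above.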

I replaced our claim about it being slow with a new claim. And also added "Resilience" as one of xrefcheck's aims.

Please have a look at the latest fixup and let me know what you think ^^

Contributor:

LGTM (learned new English words, e.g. resilience). As I understand it, there was no waiting for 429s during the speed comparison (since the default waiting interval is 30s), so it was a fair test. Maybe one could compare speed on a repository with many local links, since your test looks like a competition between network libraries.

@dcastro (Member, Author), Nov 17, 2022:

Created a file with 240 local links, re-ran the experiment and this time also ignored `CHANGES.md` (where most of our external links are). xrefcheck took 1.5-2s, remark took 2.5-3s. So not a huge difference here either :/
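To repeat the local-links experiment, a similar file can be generated with a few lines of shell. This is a hypothetical generator, not the file used above. It writes COUNT list items that link to anchors in the same file, followed by the matching headings (GitHub derives the anchor `#section-1` from the heading `Section 1`):

```shell
#!/bin/sh
# Usage: gen_local_links COUNT > a.md
# Emits a Markdown file with COUNT links to anchors within the same file.
gen_local_links() {
  count="$1"
  echo "# Local links benchmark"
  echo
  i=1
  while [ "$i" -le "$count" ]; do
    echo "- [Section $i](#section-$i)"
    i=$((i + 1))
  done
  i=1
  while [ "$i" -le "$count" ]; do
    printf '\n## Section %s\n' "$i"
    i=$((i + 1))
  done
}
```

For example, `gen_local_links 240 > a.md`. As pointed out later in the thread, remember to `git add a.md` before running the check.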

Member:

Thanks for this exploration!

I also wanted to see a check for local links only, but it turns out not to be faster...

So yeah, the rewrite of the claim looks good 👍

> xrefcheck took 1.5-2s

This is quite suspicious though; to me it looks like there is nothing that should force it to take more than a fraction of a second.

Initially, I wanted to make xrefcheck instant when checking local links (so that the several seconds taken by other tools would seem really long). Most users' mistakes come from local links (external links are usually copy-pasted), and a shorter time would make writing documentation a lot easier.

Created #219 to investigate this later.

If, after optimizations, xrefcheck takes something like 0.4s on your example, do you think it would be correct to mention that xrefcheck is faster? Not in vague and incorrect terms as before, but by referring to our experiment showing that remark takes X seconds and xrefcheck takes Y?

@dcastro (Member, Author), Nov 18, 2022:

> This is quite suspicious though; to me it looks like there is nothing that should force it to take more than a fraction of a second.

Oh, I just remembered I compiled xrefcheck without optimizations (`stack install --fast`) ^^ I just tried again with `stack install`, but got very similar results (1.3~1.5 secs) :/

Another interesting data point: I re-ran the experiment, this time with `--mode local-only`. Like before, I added this file with 242 local links to the xrefcheck repo (don't forget to `git add a.md`), and then ran:

```console
$ time xrefcheck --ignore 'tests/markdowns/**/*' --ignore 'tests/golden/**/*' --ignore CHANGES.md --mode local-only
1.05s user 0.28s system 370% cpu 0.358 total
```

0.3 secs 😮
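The `time` output also hints at parallelism. CPU utilization is total CPU time (user plus system) divided by wall-clock time, so with the rounded figures above:

$$
\frac{t_\text{user} + t_\text{sys}}{t_\text{total}} = \frac{1.05\,\text{s} + 0.28\,\text{s}}{0.358\,\text{s}} = \frac{1.33\,\text{s}}{0.358\,\text{s}} \approx 3.7
$$

That is roughly 3.7 cores busy on average, consistent with the reported 370%. This suggests the local-only check is spread across several cores rather than running sequentially.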

Even the 0.75 seconds you observed in morley in #219 is in my opinion really fast 😅

Regarding #219: I'm tempted to say that xrefcheck is pretty much as fast as it can be.
I suspect (but am not 100% sure) the primary bottleneck is in the network when checking external links (and is the reason for the 1.5-2 secs I saw earlier) and the secondary bottleneck in disk reads, and we can't do anything about either of those :/
Any performance gains we can squeeze out will (I think) be very marginal, and unnoticeable by a real world user.


> If, after optimizations, xrefcheck takes something like 0.4s on your example, do you think it would be correct to mention that xrefcheck is faster? Not in vague and incorrect terms as before, but by referring to our experiment showing that remark takes X seconds and xrefcheck takes Y?

I'd be okay with that, but if we decide to do that, then I think it would be fair to construct a proper reproducible benchmark and compare xrefcheck against the other alternatives we mentioned in the readme (not just remark), and then publish a table with the results.

Member:

I agree that I/O likely takes most of the time and that we won't be able to introduce any significant performance improvement, unless some place in the verification logic turns out to be very suboptimal.
If I fail to find such places, it would be reasonable to abort the issue.

> then I think it would be fair to construct a proper reproducible benchmark and compare xrefcheck against the other alternatives we mentioned in the readme

Really, agreed.

@dcastro merged commit a534c5f into master on Nov 18, 2022

Successfully merging this pull request may close these issues.

Revise README
3 participants