[#147] Update readme #216

dcastro · 2022-11-07T14:41:46Z

Description

Improved the readme and fixed several problems:

- Mention support for GitLab: this is important and wasn't mentioned anywhere.
- Add a FAQ clarifying how xrefcheck behaves in some important situations.
- We don't need to get into a lot of detail about the syntax of the `xrefcheck: ignore` annotations, where they're allowed and where they're not. A general idea and a couple of examples are more than enough.
- Added the backlink `[↑](#xrefcheck)` where it was missing.
- Fixed inconsistent header levels: we were using `###` where we should be using `##`.
- `nix run` should now be `nix shell`.
- Add a link to `tests/configs/github-config.yaml`, which contains a list of all supported config options.
- Instead of mentioning GitHub Actions in the "usage" section and nix in a separate section, mention everything in the "usage" section.
- Fixed link to `stack2cabal`.
- Fixed typos and rephrased some bits.

Also noticed an issue with the dockerhub tags, so I fixed it while I'm here:

Problem: We have a pipeline step to tag docker images on dockerhub whenever a new version is released:

xrefcheck/.buildkite/pipeline.yml, lines 51 to 56 in 7dd5c4c (https://github.com/serokell/xrefcheck/blob/7dd5c4c3c954a531b5cad89857f31b27245f0ef9/.buildkite/pipeline.yml#L51-L56):

    - command:
      - nix-build docker
      - nix run -f ci.nix pkgs.skopeo -c ./scripts/upload-docker-image.sh "docker-archive:$(readlink result)" "docker://docker.io/serokell/xrefcheck:${BUILDKITE_BRANCH}"
      label: Push release to dockerhub
      if: |
        build.branch =~ /^v[0-9]+.*/

However, this doesn't seem to be working; dockerhub only contains the `latest` tag: https://hub.docker.com/r/serokell/xrefcheck/tags

The problem *seems* to be that the CI step is only triggered when it builds a branch with a name matching the regex `/^v[0-9]+.*/`. But we never use that format for branch names, so it's never triggered.
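To illustrate why the step never fires, here is a minimal sketch that applies the same pattern to a few ref names. It uses Python's `re` module purely for illustration (Buildkite evaluates conditionals with its own engine), and `would_trigger` is a hypothetical helper. `master` and `diogo/#147-update-readme` are branch names from this PR; `v0.2.2` is a made-up version in the `vX.Y.Z` format.

```python
import re

# Same pattern as in the pipeline's `if:` condition.
# Assumption: Python's regex semantics are close enough for this simple pattern.
VERSION_PATTERN = re.compile(r"^v[0-9]+.*")


def would_trigger(ref_name: str) -> bool:
    """Return True if the ref name matches the release pattern."""
    return VERSION_PATTERN.search(ref_name) is not None


# Two branch names used in this repository, plus a hypothetical version tag.
candidates = ["master", "diogo/#147-update-readme", "v0.2.2"]
for name in candidates:
    print(f"{name!r}: {would_trigger(name)}")
```

Only the version-like name matches, which is consistent with the step never running as long as the condition is evaluated against branch names.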

Solution:

1. Change the CI step to trigger when it detects a tag with a version number.
2. Enable the "Build tags" option in buildkite: https://buildkite.com/serokell/xrefcheck/settings/repository

Related issue(s)

Fixes #147

✅ Checklist for your Pull Request

Ideally a PR has all of the checkmarks set.

If something in this list is irrelevant to your PR, you should still set this
checkmark indicating that you are sure it is dealt with (be that by irrelevance).

Related changes (conditional)

Tests
- If I added new functionality, I added tests covering it.
- If I fixed a bug, I added a regression test to prevent the bug from
  silently reappearing again.
Documentation
- I checked whether I should update the docs and did so if necessary:
  - README
  - Haddock
Public contracts
- Any modifications of public contracts comply with the Evolution
  of Public Contracts policy.
- I added an entry to the changelog if my changes are visible to the users
  and
- provided a migration guide for breaking changes if possible

Stylistic guide (mandatory)

My commits comply with the policy used in Serokell.
My code complies with the style guide.

✓ Release Checklist

I updated the version number in package.yaml.
I updated the changelog and moved everything
under the "Unreleased" section to a new section for this release version.
(After merging) I edited the auto-release.
- Change the tag and title using the format vX.Y.Z.
- Write a summary of all user-facing changes.
- Deselect the "This is a pre-release" checkbox at the bottom.
(After merging) I updated xrefcheck-action.
(After merging) I uploaded the package to hackage.

dcastro · 2022-11-07T14:49:45Z

README.md

  It is able to check multiple repositores at once if they are gathered in one folder.
-  Being written on JavaScript, it is fairly slow on large repositories.
+  Being written in JavaScript, it is fairly slow on large repositories.


@Martoon-00 do we have any reference for this claim? Was this something you tried yourself (way back)?

Yep, that was my personal observation at that moment.

However given that we are planning a public release, we may need to verify this again 🤔

I see. Would you mind if we delete this particular sentence? It's just that, even if we conclude that the application's slow, I don't think we have enough confidence to assert that it's slow because of being written in js :/ Also, "slow" can be subjective, and I think this section should be more objective when describing other people's projects.

I agree that making such statements does not sound too proper 🤔

On the other hand, if we are just left with the mention of this alternative solution without mentioning any weak point, then it's not clear why another tool (xrefcheck) was necessary.

From my experience, at least one of those JavaScript-based tools was noticeably slow. I had an impression that it didn't do any parallelization (thus the claim about JS being the reason, but such a claim is really invalid), and it took a dozen seconds to check a small-to-middle-sized repository that contained only local (!) links.

Given that, maybe we could form more objective statements here instead?

So, I tried using remark on the xrefcheck repo, here are the steps to reproduce my experiment:

$ npm init
$ npm install remark-cli remark-lint-no-dead-urls remark-validate-links --save-dev

Edit package.json and add this:

"remarkConfig": {
  "plugins": [
    "remark-lint-no-dead-urls",
    "remark-validate-links"
  ]
}

Run it and time it:

$ time npm run env -- remark . --frail --ignore-pattern 'tests/markdowns/**/*' --ignore-pattern 'tests/golden/**/*'

It seems to take about 8s on average to verify the xrefcheck repo.

The xrefcheck tool takes about 6-8s, so it's not a huge difference :/

However, I did notice that remark does not handle "429 Too Many Requests", so eventually links to github.com start failing and the tool reports false positives.

I replaced our claim about it being slow with a new claim. And also added "Resilience" as one of xrefcheck's aims.

Please have a look at the latest fixup and let me know what you think ^^
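For readers unfamiliar with the 429 handling discussed above, here is a minimal sketch of the general technique: wait, then retry, when a server answers "429 Too Many Requests". This is not xrefcheck's actual implementation. `check_with_retry`, the `(status, headers)` response shape, and the parameter names are all assumptions made for the example; the 30-second fallback mirrors the default waiting interval mentioned in this thread.

```python
import time
from typing import Callable, Mapping, Tuple

# A response is modelled as (status_code, headers). This shape is an
# assumption for the sketch, not a real HTTP client API.
Response = Tuple[int, Mapping[str, str]]


def check_with_retry(
    fetch: Callable[[], Response],
    max_retries: int = 3,
    default_wait: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `fetch`, waiting and retrying while the server answers 429.

    Returns the final status code, which may still be 429 if retries run out.
    """
    status, headers = fetch()
    for _ in range(max_retries):
        if status != 429:
            break
        # Honour a numeric Retry-After header when present; otherwise fall
        # back to the default. (Simplification: header lookup is
        # case-sensitive and HTTP-date values are ignored.)
        retry_after = headers.get("Retry-After", "")
        wait = float(retry_after) if retry_after.isdigit() else default_wait
        sleep(wait)
        status, headers = fetch()
    return status


# Usage with a fake server that rate-limits twice, then succeeds.
# `waits.append` stands in for `time.sleep` so nothing actually blocks.
responses = iter([(429, {"Retry-After": "2"}), (429, {}), (200, {})])
waits = []
final_status = check_with_retry(lambda: next(responses), sleep=waits.append)
print(final_status, waits)
```

A checker that skips this step would report the rate-limited link as broken, which is the false positive described above.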

LGTM (learned new English words, e.g. resilience). As I understand it, there was no waiting for 429s during the speed comparison (since the default waiting interval is 30s), so it was a fair test. Maybe one could compare speed on a repository with many local links, since your test looks like a competition of network libraries.

Created a file with 240 local links, re-ran the experiment and this time also ignored CHANGES.md (where most of our external links are). xrefcheck took 1.5-2s, remark took 2.5-3s. So not a huge difference here either :/

Thanks for this exploration!

I also wanted to see a check for local links only, but it turns out to be not faster...

So yeah, the rewrite of the claim looks good 👍

> xrefcheck took 1.5-2s

This is quite suspicious though; to me it looks like there is nothing that should force the time to be more than a fraction of a second.

Initially, I wanted to make xrefcheck instant when checking local links (so that the several seconds taken by the other tools would seem really long). Most users' mistakes come from local links (external links are usually copy-pasted), and a shorter time would make writing documentation a lot easier.

Created #219 to investigate this later.

If after optimizations xrefcheck takes like 0.4s on your example, do you think it would be correct to mention xrefcheck being faster? Not in vague and incorrect terms as before, but referring to our experiment showing that remark takes X seconds and xrefcheck takes Y?

> This is quite suspicious though; to me it looks like there is nothing that should force the time to be more than a fraction of a second.

Oh, I just remembered that I compiled xrefcheck with no optimizations (`stack install --fast`) ^^ Just tried again with `stack install`, but got very similar results (1.3~1.5 secs) :/

Another interesting data point: I re-ran the experiment, this time with `--mode local-only`. Like before, I added this file with 242 local links to the xrefcheck repo (don't forget `git add a.md`), and then ran:

$ time xrefcheck --ignore 'tests/markdowns/**/*' --ignore 'tests/golden/**/*' --ignore CHANGES.md --mode local-only
1.05s user 0.28s system 370% cpu 0.358 total

0.3 secs 😮

Even the 0.75 seconds you observed in morley in #219 is in my opinion really fast 😅

Regarding #219: I'm tempted to say that xrefcheck is pretty much as fast as it can be.
I suspect (but am not 100% sure) that the primary bottleneck is in the network when checking external links (which is the reason for the 1.5-2 secs I saw earlier) and the secondary bottleneck is in disk reads, and we can't do anything about either of those :/
Any performance gains we can squeeze out will (I think) be very marginal, and unnoticeable by a real world user.

> If after optimizations xrefcheck takes like 0.4s on your example, do you think it would be correct to mention xrefcheck being faster? Not in vague and incorrect terms as before, but referring to our experiment showing that remark takes X seconds and xrefcheck takes Y?

I'd be okay with that, but if we decide to do that, then I think it would be fair to construct a proper reproducible benchmark and compare xrefcheck against the other alternatives we mentioned in the readme (not just remark), and then publish a table with the results.
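As a starting point for such a benchmark, here is a minimal sketch of a timing harness that runs a command several times and prints one table row per tool. `time_command` and `summarize` are hypothetical helpers, and the placeholder command is only there so the sketch runs anywhere; the real xrefcheck and remark invocations from this thread would have to be substituted in.

```python
import statistics
import subprocess
import sys
import time
from typing import List, Sequence


def time_command(cmd: Sequence[str], runs: int = 5) -> List[float]:
    """Run `cmd` `runs` times and return the wall-clock duration of each run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=False: a link checker may exit non-zero when it finds broken
        # links, and the timing is still wanted in that case.
        subprocess.run(
            cmd,
            check=False,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(name: str, durations: Sequence[float]) -> str:
    """Format one table row: tool name, then median, min and max in seconds."""
    return (
        f"| {name} | {statistics.median(durations):.2f} "
        f"| {min(durations):.2f} | {max(durations):.2f} |"
    )


if __name__ == "__main__":
    # Placeholder command (assumption): replace with the real tool invocations.
    tools = {"placeholder": [sys.executable, "-c", "pass"]}
    print("| tool | median (s) | min (s) | max (s) |")
    print("|---|---|---|---|")
    for name, cmd in tools.items():
        print(summarize(name, time_command(cmd, runs=3)))
```

Reporting the median over several runs, rather than a single `time` invocation, reduces the effect of network and disk-cache noise, which matters here because the measured differences are small.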

I agree that I/O likely takes most of the time and that we won't be able to introduce any significant performance improvement, unless there is some place in the verification logic that turns out to be very suboptimal.
If I fail to find such places, it would be reasonable to close the issue.

> then I think it would be fair to construct a proper reproducible benchmark and compare xrefcheck against the other alternatives we mentioned in the readme

Really, agreed.

Martoon-00 · 2022-11-07T15:35:53Z

README.md

  It is able to check multiple repositores at once if they are gathered in one folder.
-  Being written on JavaScript, it is fairly slow on large repositories.
+  Being written in JavaScript, it is fairly slow on large repositories.


Yep, that was my personal observation at that moment.

However given that we are planning a public release, we may need to verify this again 🤔

dcastro marked this pull request as ready for review November 7, 2022 14:46

dcastro requested review from Sorokin-Anton and Martoon-00 November 7, 2022 14:48

Martoon-00 approved these changes Nov 7, 2022

Sorokin-Anton approved these changes Nov 8, 2022

dcastro force-pushed the diogo/#147-update-readme branch from bec5bf3 to 0641f8f Compare November 16, 2022 11:40

dcastro added 2 commits November 18, 2022 11:36

dcastro force-pushed the diogo/#147-update-readme branch from 0641f8f to 2fd11bf Compare November 18, 2022 11:39

dcastro merged commit a534c5f into master Nov 18, 2022
