Near real-time updates to crawled data #486

Open
dontcallmedom opened this issue Feb 2, 2022 · 1 comment

Comments

@dontcallmedom
Member

In a variety of contexts (CI in particular, but likely also in the context of the data re-used by spec authoring tools), it would be ideal if the content in webref reflected changes in the underlying documents in close to real-time.

One way we could enable this (at least partially) is by having spec repos trigger a webref update for a given spec whenever that spec's main source file is updated. This could typically be achieved with a webhook installed at the repo level or (more likely, for scaling) at the org level.
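
As a rough illustration of the webhook side, the sketch below inspects a GitHub `push` event payload and decides whether a spec needs re-crawling. It assumes the payload carries `ref`, `repository.full_name`, `repository.default_branch` and per-commit `added`/`modified` lists, as GitHub's push payloads do. The `MAIN_SOURCE_FILES` mapping, the repository name in it and the function name are hypothetical, and how the update would then be triggered in webref is left out.

```python
# Hypothetical mapping from a spec repository to its main source file.
MAIN_SOURCE_FILES = {
    "w3c/example-spec": "index.bs",
}


def spec_needs_update(payload, main_source_files=MAIN_SOURCE_FILES):
    """Return the repository name if the push touched its main source file
    on the default branch, otherwise None."""
    repo = payload["repository"]
    name = repo["full_name"]
    source = main_source_files.get(name)
    if source is None:
        return None  # not a repository we track
    # Ignore pushes to branches other than the default one.
    if payload.get("ref") != "refs/heads/" + repo["default_branch"]:
        return None
    for commit in payload.get("commits", []):
        touched = commit.get("added", []) + commit.get("modified", [])
        if source in touched:
            return name
    return None
```

An org-level webhook would feed every push in the organization through the same check, so the mapping (or an equivalent lookup) is what keeps unrelated pushes from triggering updates.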

One issue is that if several updates are processed at the same time, they would likely trigger an error when pushing the results. This could be avoided either by changing the timing of how checkouts and crawls are organized, or by doing a full crawl (with HTTP caching optimizations to reduce the time and network impact).
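
To make the push-time conflict concrete, here is a small simulation, not webref's actual workflow. `FakeRemote` is a hypothetical stand-in for the data repository that rejects any push not based on its current head, much like a non-fast-forward rejection. `publish` shows one possible reading of "different timing": crawl first, check out the latest head only when the result is ready, and retry if the head moved in the meantime.

```python
class FakeRemote:
    """Hypothetical stand-in for the data repository: a push is accepted
    only if it is based on the current head."""

    def __init__(self):
        self.head = 0
        self.data = {}

    def fetch(self):
        return self.head, dict(self.data)

    def push(self, base, data):
        if base != self.head:
            return False  # comparable to a non-fast-forward rejection
        self.head += 1
        self.data = data
        return True


def publish(remote, spec, result, max_attempts=5):
    """Apply an already-computed crawl result on top of the latest head,
    retrying if another update was pushed in between."""
    for attempt in range(1, max_attempts + 1):
        base, data = remote.fetch()
        data[spec] = result
        if remote.push(base, data):
            return attempt
    raise RuntimeError(f"could not publish {spec} after {max_attempts} attempts")


remote = FakeRemote()
# Two jobs check out the same head before either of them pushes.
base_a, data_a = remote.fetch()
base_b, data_b = remote.fetch()
data_a["spec-a"] = "crawl result A"
data_b["spec-b"] = "crawl result B"
first_push = remote.push(base_a, data_a)   # accepted
second_push = remote.push(base_b, data_b)  # rejected, the head has moved
# Re-applying the second result on a fresh checkout succeeds.
attempts = publish(remote, "spec-b", "crawl result B")
```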

@dontcallmedom
Member Author

So it looks like solving w3c/reffy#850 will get us to ~1min30 as a baseline for a no-update workflow run, and updating one spec is probably on the order of ~10s, so running a full crawl might be a reasonable approach, although we should expect the baseline to grow in proportion to the number of specs being crawled.
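
For a back-of-the-envelope view of these numbers, the sketch below combines the two estimates quoted above (~90s for a run with no updates, ~10s per updated spec). Both figures are rough, and `reference_count`, the number of specs the ~90s figure corresponds to, is a hypothetical parameter since the thread does not state it.

```python
def estimate_run_seconds(updated_specs, base_seconds=90, per_spec_seconds=10):
    """Estimated duration of a full-crawl run in which `updated_specs` changed."""
    return base_seconds + updated_specs * per_spec_seconds


def scaled_base_seconds(spec_count, reference_count, reference_base_seconds=90):
    """No-update run time if it grows in proportion to the number of specs."""
    return reference_base_seconds * spec_count / reference_count
```

With these assumptions, a run triggered by a single spec update would take about `estimate_run_seconds(1)` seconds, and doubling the number of crawled specs would double the no-update time.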

For the more efficient single-spec update approach, we might be able to use https://github.com/softprops/turnstyle as a way to ensure trigger events are processed sequentially - see also https://github.community/t/race-condition-possible-from-rapidly-executed-concurrent-github-actions/137411/3
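
The idea of processing trigger events one at a time can be illustrated with an in-process analogy. This is not how turnstyle is implemented (it operates at the level of workflow runs); the sketch only shows why serializing the checkout, update and push steps removes the conflict. The `SequentialUpdater` class and `process_events` helper are hypothetical.

```python
import threading


class SequentialUpdater:
    """Runs the checkout/update/push sequence for one event at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.head = 0   # stands in for the repository head
        self.log = []   # (spec, head the update was based on)

    def handle(self, spec):
        # Later events wait here until the current one has finished.
        with self._lock:
            base = self.head                # "checkout"
            self.log.append((spec, base))   # "update"
            self.head = base + 1            # "push", always on top of the base


def process_events(specs):
    """Deliver all events concurrently and wait until each has been handled."""
    updater = SequentialUpdater()
    threads = [threading.Thread(target=updater.handle, args=(s,)) for s in specs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return updater
```

Because every update is based on the head left by the previous one, no push is ever rejected; the cost is that a burst of events is handled in a queue rather than in parallel.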
