Towards tweaselORG/meta#36: Add auto-archiving script #54

Merged
merged 7 commits into from
Oct 26, 2023

Contributor

@zner0L zner0L commented Sep 23, 2023

This is based on #32, which needs to be merged first.

@zner0L zner0L marked this pull request as ready for review September 28, 2023 14:09
Member

@baltpeter baltpeter left a comment


This is working better than I expected it to. :D Nice!

Member

@baltpeter baltpeter left a comment


Something's wrong with the dates. When adding a new entry to the CSV, it produces a UNIX timestamp instead of an ISO 8601 string. And for the existing entries, it replaces their timestamp with NaN.
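A minimal sketch of the distinction described above, not the script's actual code: `toCsvTimestamp` and `parseCsvTimestamp` are hypothetical helper names. It shows how a `Date` yields either a UNIX timestamp or an ISO 8601 string depending on the method called, and how an unparseable value can be caught on read instead of being written back as `NaN`.

```typescript
// Hypothetical helpers for illustration; the real script may be structured differently.

/** Format a date for the CSV as an ISO 8601 string (not a UNIX timestamp). */
function toCsvTimestamp(date: Date): string {
    // `date.getTime()` would return milliseconds since the epoch instead.
    return date.toISOString();
}

/** Parse a timestamp read from the CSV, failing loudly instead of propagating NaN. */
function parseCsvTimestamp(value: string): Date {
    const date = new Date(value);
    // An unparseable string yields an "Invalid Date" whose getTime() is NaN.
    if (Number.isNaN(date.getTime())) {
        throw new Error(`Invalid timestamp in CSV: ${value}`);
    }
    return date;
}

const mergedAt = new Date(Date.UTC(2023, 9, 26, 14, 57, 0)); // months are zero-based
console.log(toCsvTimestamp(mergedAt)); // 2023-10-26T14:57:00.000Z
console.log(parseCsvTimestamp(toCsvTimestamp(mergedAt)).getTime() === mergedAt.getTime()); // true
```

Round-tripping every existing entry through a pair of functions like these is one way to check that old rows survive a rewrite of the CSV unchanged.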

baltpeter previously approved these changes Oct 26, 2023
Member

@baltpeter baltpeter left a comment


Other than the typo, this looks good now. You can merge yourself after fixing.

This was surprisingly hard to do. Because Node.js caches imports, we need to somehow clear the import cache.
This can be done naively (https://ar.al/2021/02/22/cache-busting-in-node.js-dynamic-esm-imports/#cache-invalidation-in-esm-with-dynamic-imports) for plain JS files, but that approach leaks memory and fails for TypeScript files, since the loader doesn’t support it.
So, we run each archiving task in a new worker thread that has its own import context. But since the adapters we are importing are written in TypeScript, we need to register a loader in the worker context, because the worker doesn’t have one loaded by default (unlike the main thread, which gets it via `tsx`).
Luckily, this feature has just been added to Node.js, but we need to bump our Node.js version to use it. Now we can dynamically import modules and pick up all changes made to them.
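
The worker-thread approach can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `importInFreshWorker` and `demo` are hypothetical names, and the worker and adapter are plain `.mjs` files written to a temporary directory so that the sketch runs without a TypeScript loader. For TypeScript adapters, the worker would additionally have to register a loader before importing, presumably via `register()` from `node:module`, which is omitted here.

```typescript
import { mkdtempSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { pathToFileURL } from 'node:url';
import { Worker } from 'node:worker_threads';

// Worker entry point as plain ESM. Each worker starts with an empty import cache,
// so this import always reads the module's current contents from disk.
const workerSource = `
import { parentPort, workerData } from 'node:worker_threads';
const mod = await import(workerData.moduleUrl);
parentPort.postMessage(mod.default);
`;

/** Import a module's default export inside a new worker thread (hypothetical helper). */
function importInFreshWorker(workerPath: string, modulePath: string): Promise<unknown> {
    return new Promise((resolve, reject) => {
        const worker = new Worker(workerPath, {
            workerData: { moduleUrl: pathToFileURL(modulePath).href },
        });
        worker.once('message', resolve);
        worker.once('error', reject);
        // Has no effect if the promise was already resolved by a message.
        worker.once('exit', (code) => reject(new Error(`Worker exited early with code ${code}`)));
    });
}

async function demo(): Promise<unknown[]> {
    const dir = mkdtempSync(join(tmpdir(), 'fresh-import-'));
    const workerPath = join(dir, 'worker.mjs');
    const adapterPath = join(dir, 'adapter.mjs');
    writeFileSync(workerPath, workerSource);

    writeFileSync(adapterPath, 'export default "version 1";\n');
    const first = await importInFreshWorker(workerPath, adapterPath);

    // Change the module on disk; a repeated import() in one thread would hit the cache.
    writeFileSync(adapterPath, 'export default "version 2";\n');
    const second = await importInFreshWorker(workerPath, adapterPath);

    return [first, second];
}

demo().then((results) => console.log(results));
```

Because each worker is discarded after posting its result, its module cache goes with it, which avoids the memory growth of the query-string cache-busting trick.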
@zner0L zner0L merged commit 5ce69c4 into main Oct 26, 2023
@zner0L zner0L deleted the z_archive-script branch October 26, 2023 14:57