sitediff fails with "Not a directory @ apply2files" if crawl only produces one page #149

Open
fgerards opened this issue Dec 16, 2022 · 6 comments

Comments

@fgerards

Sitediff fails when comparing two single-page URLs/sites: the before/after entries in the snapshot directory are created as files rather than as directories containing other entries, and sitediff should handle this case as well.

The error occurs on an Ubuntu Linux laptop and on a 2020 MacBook Air (M1) running macOS Ventura 13.1, with the latest version of sitediff installed via Homebrew.

@jgam

jgam commented Mar 1, 2023

Is there any follow-up on this?

@kirk-brown-ew
Collaborator

Which version of Ruby are you using?
Can you provide an example of what you're doing?
On what line does the error happen?

@jgam

jgam commented Mar 2, 2023

I'm using Ruby 3.1.3.

When I run sitediff crawl, it finds only a single path, '/', and outputs the error above.

@jgam

jgam commented Mar 2, 2023

For instance, this is what the output looks like:

Jimmyui-MacBook-Pro:~ jimmygam$ sitediff init https://mentree.club/
[success] Created /Users/jimmygam/sitediff/sitediff.yaml
Jimmyui-MacBook-Pro:~ jimmygam$ sitediff crawl
Reading config file: /Users/jimmygam/sitediff/sitediff.yaml
Visited https://mentree.club/, cached.
[error] Unknown parsing error for https://mentree.club/: Not a directory @ apply2files - sitediff/snapshot/before/timestamp  From page: {:referrer=>"/"}

1 page(s) found.
[done] Created /Users/jimmygam/sitediff/paths.txt.

@kirk-brown-ew
Collaborator

We've been able to reproduce this issue by:

  1. Creating a static web page with no links.
  2. Running sitediff crawl.
  3. Adding a link to the page, pointing either to an anchor within the page or to a newly created page at the same level.
  4. Running sitediff crawl.

The first crawl creates the before site as a file. The second crawl wants to create before as a directory.

The workaround that we see is to remove the before directory and re-run the crawl.
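
For anyone who wants to see the underlying clash without running sitediff, here is a minimal sketch that simulates it in a scratch directory. It is an illustration only: the `sitediff/snapshot/before/timestamp` layout is assumed from the error message above, and `touch` merely stands in for whatever sitediff does internally.

```shell
#!/bin/sh
# Illustration only: this does not run sitediff. The directory layout is an
# assumption taken from the path in the error message.
workdir=$(mktemp -d)
mkdir -p "$workdir/sitediff/snapshot"

# Simulate the first (single-page) crawl: "before" ends up as a regular file.
: > "$workdir/sitediff/snapshot/before"

# Simulate the second crawl: something is written *under* "before".
# This fails with "Not a directory" because "before" is a file.
if touch "$workdir/sitediff/snapshot/before/timestamp" 2>/dev/null; then
  clash="no"
else
  clash="yes"
fi
echo "clash reproduced: $clash"

# Workaround from this thread: remove the stale entry so it can be recreated
# as a directory. With sitediff you would then re-run the crawl.
rm -rf "$workdir/sitediff/snapshot/before"
mkdir -p "$workdir/sitediff/snapshot/before"
if touch "$workdir/sitediff/snapshot/before/timestamp"; then
  fixed="yes"
else
  fixed="no"
fi
echo "works after cleanup: $fixed"

# Remove the scratch directory.
rm -rf "$workdir"
```

Note that removing the existing before entry discards whatever was cached there, so the next crawl has to fetch the page again.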

@tjhaygood

Adding on to @kirk-brown-ew's comment, I've been able to resolve this issue entirely by running the following before running the crawl command:

mkdir -p sitediff/snapshot/before ; mkdir -p sitediff/snapshot/after

Your path before the /snapshot directory may differ. I'm running this via the Docker image.
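
One caveat: a plain `mkdir -p` fails if before or after already exists as a regular file from an earlier single-page crawl. A slightly more defensive sketch is below. The `ensure_dir` helper and the `SNAPSHOT_DIR` variable are hypothetical names of my own, not part of sitediff, and the default path is an assumption you may need to adjust.

```shell
#!/bin/sh
# Hypothetical helper, not part of sitediff: make sure a snapshot entry
# exists as a directory before running the crawl.
ensure_dir() {
  # If a regular file is in the way (e.g. left by a single-page crawl),
  # remove it first; this discards that cached snapshot.
  if [ -f "$1" ]; then
    rm -f "$1"
  fi
  mkdir -p "$1"
}

# Assumed location; change it to match where your sitediff.yaml lives.
SNAPSHOT_DIR="${SNAPSHOT_DIR:-sitediff/snapshot}"

ensure_dir "$SNAPSHOT_DIR/before"
ensure_dir "$SNAPSHOT_DIR/after"
```

Run it from the directory that contains the sitediff folder, or set `SNAPSHOT_DIR` explicitly, and then run the crawl as usual.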
