Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find a way to prevent broken book images #1391

Open
pedrorijo91 opened this issue Oct 30, 2019 · 9 comments
Open

Find a way to prevent broken book images #1391

pedrorijo91 opened this issue Oct 30, 2019 · 9 comments

Comments

@pedrorijo91
Copy link
Member

As reported on #1390 , book images get broken from time to time.

I see 2 possible solutions:

  • download book images to this repo (we may need to update them from time to time?)
  • actively check for broken images (cron rake task to check links are not broken?)
@smithmf
Copy link

smithmf commented Oct 30, 2019

Took a stab at this one in pull request #1392. A simple rake task that will parse each img element from _ext_books.erb and if the URL cannot be retrieved, a warning is printed to the console. A cron job could be setup to execute this rake task

@peff
Copy link
Member

peff commented Oct 30, 2019

One problem with checking via rake task is that these are all run via Heroku's scheduler, which doesn't provide any mechanism for feedback (i.e., to tell us that some links are broken). After searching a bit, the usual proposed solution seems to be using the exception-notifier gem to send an email. But that implies configuring the site to send email, which is notoriously annoying.

I wonder if we could instead have it open an issue in this repository. That would require setting up a bot GitHub account, but once done, it would be pretty reliable.

Just downloading the images into the repo does side-step all of this, but it might be nice to get to the root of the problem. In particular:

  • we could be doing a full-on link check, not just checking for broken image links

  • we currently get no feedback from the existing cron jobs (e.g., to build the book or manpages)

@smithmf
Copy link

smithmf commented Oct 30, 2019

I was addressing the case of a successful request with a non-image body and ran into some URLs returning a 301. We could follow the eventual location from the response header, but then there is the potential risk of infinite redirects. Adding all that logic seems a bit overkill for this solution, seeing as like @peff mentioned, this isn't a likely long term solution.

It might make more sense here to punt and take the low-tech solution of downloading the images to the repo and updating the src attributes. A more automated is definitely preferable, but takes more design as @peff suggests.

@peff
Copy link
Member

peff commented Oct 30, 2019

I think any kind of link-checking will have to handle redirects at some point. It would be OK to just put a hard limit on looping (say, 10). I do wonder if there's a higher-level gem than net/http we could use that would do this sort of thing for us (likewise the https handling, which I notice you had to do manually).

@JeremiahKamama
Copy link

I would like to tackle this as my "good first issue"

@jnareb
Copy link
Member

jnareb commented Jan 9, 2021

All cover images for Packt books are broken, as they lead to the images in (long expired) cache.

For example, the cover image for "Mastering Git" book is now

https://www.packtpub.com/media/catalog/product/cache/e4d64343b1bc593f1c5348fe05efa4a6/b/0/b03537_cover.png

"Mastering Git" cover

instead of

https://static.packt-cdn.com/products/9781783553754/cover/smaller

"Mastering Git" cover

@pedrorijo91
Copy link
Member Author

added #1566 as a quick fix. but maybe we should just keep the images in the repo as we have with a few others (https://github.com/git/git-scm.com/tree/master/public/images/books)

@peff
Copy link
Member

peff commented Jan 11, 2021

Thanks for jumping on this, @pedrorijo91. I'm not opposed to importing book images. They might get out of date with the source images, but if the publishers/authors care, they can open a PR. I'd rather have slightly old images than broken ones.

But I'm also OK to punt on that to see if this actually happens again. It looks like our packt links were probably not the right ones in the first place here.

@js1darren
Copy link

As reported on #1390 , book images get broken from time to time.

I see 2 possible solutions:

  • download book images to this repo (we may need to update them from time to time?)
  • actively check for broken images (cron rake task to check links are not broken?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants
@jnareb @peff @smithmf @pedrorijo91 @JeremiahKamama @js1darren and others