
Make sure Google can see lazy-loaded content #277

Closed
jimmleon opened this issue Nov 26, 2018 · 29 comments

Comments

@jimmleon

I tried to run a test for my website as suggested in this link: https://developers.google.com/search/docs/guides/lazy-loading
by using a puppeteer script but the test results are: "Lazy images loaded correctly: Failed".

I then ran the test for the two responsive-image lazyload demos provided in the docs (one using srcset, one using the picture tag), e.g. https://www.andreaverlicchi.eu/lazyload/demos/with_srcset_lazy_sizes.html, and the result was once again "Lazy images loaded correctly: Failed".

However, running the same test for the plain img demo (no srcset, no picture), the result was "Passed".

Any idea what is wrong?

verlok (Owner) commented Dec 7, 2018

Hi @DimLeon,
I'm sorry I'm so late in replying, but I've been very busy over the last few days.

That's a good point, and a good question, thank you for asking it.

I didn't get which demo the correctly working "normal img demo" is, but I'll run some tests myself using Puppeteer too. I'll keep you updated.


verlok commented Dec 7, 2018

Hey @DimLeon,
I don't understand why, but I did the following to test the "simple" demo...

node lazyimages_without_scroll_events.js -h --url https://www.andreaverlicchi.eu/lazyload/demos/simple.html

...and the result from Puppeteer, which FAILED, says:

"If there are more images in the screenshot below, the page is using scroll events to lazy load images. Instead, consider using another approach like IntersectionObserver."

My objections:

  • the version of LazyLoad used on that page already uses IntersectionObserver
  • to check if the browser is a bot, I check the user agent for "googlebot" or "bingbot", which works in the real world but won't work under Puppeteer

I copied the way of detecting a bot from LazySizes, which is the first script that the Google Developers page recommends. :)

So help me out here, I don't understand what to do next to fix this issue.
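For illustration, here is a minimal sketch of the two-token check described above (not the shipped code; the two UA strings are real-world examples). Headless Chrome identifies itself as "HeadlessChrome", so neither token matches under Puppeteer's defaults:

```javascript
// Sketch of the "googlebot"/"bingbot" check described above (not the shipped code).
const isNamedBot = (ua) => /googlebot|bingbot/i.test(ua);

// Real-world UA strings for comparison:
const googlebotUA =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
const headlessChromeUA =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/72.0.3582.0 Safari/537.36";

console.log(isNamedBot(googlebotUA));      // true
console.log(isNamedBot(headlessChromeUA)); // false -- Puppeteer's default UA never matches
```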

jimmleon (Author) commented Dec 10, 2018

Thanks for the reply @verlok. Also thank you for this plugin.
By "normal img demo" I meant the "simple" lazyload demo, my bad 🙂.
I tried running the script shown on this page: https://github.com/GoogleChromeLabs/puppeteer-examples/blob/master/lazyimages_without_scroll_events.js.
I once again attempted to test all the demo URLs found in the lazyload "Recipes" section.
The only demos that 'passed' were "dynamic content", "Lazy Lazyload" and "Iframes".
The result I get for the failed ones is:

Lazy images loaded correctly: Failed
Found 1971192 pixels differences.
Dimension image A: 250x2874
Dimension image B: 250x2874

The result in the "passed" demos is:

Lazy images loaded correctly: Passed
Found 0 pixels differences.
Dimension image A: 250x500
Dimension image B: 250x500

Doesn't really make sense to me. I don't know how to help, since there are no scroll events in your code and Intersection Observer is used instead.
Please let me know if you figure anything out.

I will try to use the Googlebot Images crawler as indicated here: https://support.google.com/webmasters/answer/6066468?hl=en
and see the results.
Cheers!


rasulovdev commented Feb 4, 2019

I'm experiencing the same issues. I have ordinary img elements with data-src, plus elements with data-bg. The page doesn't pass the Puppeteer tests.

I've tried:

  • v8.17.0 (it won't pass of course, but I tried)
  • v10.19.1
  • v8.17.0 / v10.19.1 conditional load
  • v10.19.1 with the polyfill suggested on the Puppeteer output page.

It never passes 🤷‍♂️

In the first screenshot ("Page without being scrolled") there are no images; in the last one ("Page after scrolling") some of them appear.

EDIT: In my case this was caused by the first screen being 100vh. After removing that rule, the Puppeteer output screenshots are the same.

The test still says "FAILED", but that's because I use some parallax, so when the two screenshots are compared the images are misaligned after scrolling.

@dan-ding

Setting small dimensions (400 x 300) in that Puppeteer script, I have yet to find a lazyload example (from this or other libraries/scripts) that passes.
Medium articles pass -- however, Medium articles also appear fine with JavaScript disabled entirely...

testing:
https://rawgit.com/GoogleChromeLabs/puppeteer-examples/master/html/lazyload.html
fails with the small viewport

which makes me wonder if the script is a good example;
Puppeteer is creating a page and taking a screenshot of it in its entirety -- that function appears not to be triggering the IntersectionObserver (https://github.com/GoogleChromeLabs/puppeteer-examples/blob/59355609ecb3c2e396a289b28f34d5116fc89b8e/lazyimages_without_scroll_events.js#L131)?

I think using a placeholder -- actually having a src -- is key to making everyone happy.
Besides, not having a src isn't great for accessibility.


verlok commented Feb 24, 2019

Would you try with version 11.0.0?


otterslide commented Feb 24, 2019

Would you try with version 11.0.0?

I just tried this version and used Fetch as Google on my site. Only the first image shows; the rest don't load. On top of that, Google ends up failing to even run Masonry, as if it hits some error.
I'm also using srcset. If I take the data-src and data-srcset out of my images, Fetch as Google loads the page fine and Masonry runs OK.

I'm not sure what the issue could be. Does LazyLoad() have to run after document.ready(), or can it be called anywhere? I called it right after loading the script.

I tried with data-src only and no srcset, and still the same problem. On my end the page loads fine, but Google seems to run into some issue.


verlok commented Feb 24, 2019

Yes, the DOM needs to be ready.
To be sure of that, you can put the script that creates the LazyLoad instance at the end of the <body> tag, just before its closing tag.

From the recently updated README (which I invite you to read and give feedback on, especially the "Getting started" section):

Be sure that DOM is ready when you instantiate LazyLoad. If you can't be sure, or other content may arrive in a later time via AJAX, you'll need to call lazyLoadInstance.update(); to make LazyLoad check the DOM again.

Hope this helps!
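A minimal placement sketch (file names are placeholders; `elements_selector` and `update()` are the option and method mentioned in this thread):

```html
<!-- Sketch: instantiate LazyLoad at the end of <body>, after the lazy images -->
  <img class="lazy" data-src="photo.jpg" alt="A lazily loaded photo">

  <script src="lazyload.min.js"></script>
  <script>
    var lazyLoadInstance = new LazyLoad({ elements_selector: ".lazy" });
    // If more content arrives later via AJAX, make LazyLoad re-check the DOM:
    // lazyLoadInstance.update();
  </script>
</body>
```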


verlok commented Feb 24, 2019

Moreover, is there any chance that JS might be disabled in the Chrome run by Puppeteer?


dan-ding commented Mar 2, 2019

It runs, @verlok, as far as I can see -- except when a screenshot is taken.
It's easy to replicate with Chrome dev tools:

  • get a page with images loaded late (the puppeteer example works)
  • open dev tools
  • change the device to something small, like mobile
  • ensure caching is disabled and reload the page
  • without scrolling, print the page

The print preview is missing all the images that haven't been scrolled to; they don't show because they don't have a src set.


verlok commented Mar 3, 2019

Thank you @dan-ding for your explanation, but that's not the case, because the LazyLoad script reads the user agent string and, if it identifies a bot like Googlebot or Bingbot, triggers the loadAll() method immediately. So the page behaves differently when a regular browser visits it than when a search engine crawler does.


dan-ding commented Mar 4, 2019

Yeah -- just trying to help people understand what Puppeteer is doing.

I'm not a fan of the UA sniffing, but I'm not raising an issue with it.


verlok commented Mar 4, 2019

I'm not a fan of the ua sniffing

Me neither, but I didn't find another way to do it.

If anyone has any suggestions, you're welcome to share them.


verlok commented Mar 4, 2019

[screenshot: the isBot check evaluated in the browser console]


verlok commented Mar 4, 2019

This is the code LazyLoad uses to detect whether or not the browser is a bot.

export const runningOnBrowser = typeof window !== "undefined";

export const isBot =
	(runningOnBrowser && !("onscroll" in window)) ||
	(typeof navigator !== "undefined" &&
		/(gle|ing|ro)bot|crawl|spider/i.test(navigator.userAgent));

First of all, it checks that we're running in a browser; then, whether the onscroll event is missing from the window object; and finally it falls back to "UA sniffing" to check whether the user agent looks like a crawler.

In my previous comment I tested that code in the browser console: as you can see, if onscroll is missing from window, isBot becomes true.

Other lazyload libraries use this same technique, so I guess it works, but:

  1. I cannot test this using Puppeteer
  2. I'm not sure whether Puppeteer's window is missing onscroll, or whether its user agent contains a Googlebot token.

Can you help me out with this?
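The UA half of that check can at least be exercised standalone; here is a quick sketch running the same regular expression against a real Googlebot UA string and a desktop Chrome one:

```javascript
// The same regex from the snippet above, tested outside the browser.
const botPattern = /(gle|ing|ro)bot|crawl|spider/i;

const googlebotUA =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
const desktopChromeUA =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";

console.log(botPattern.test(googlebotUA));     // true  ("gle" + "bot" in "Googlebot")
console.log(botPattern.test(desktopChromeUA)); // false
```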


dan-ding commented Mar 5, 2019

@verlok I wanted to avoid the question of the UA, since I understand why you put it in. It wasn't meant as a complaint.
I have only one possible idea, but I haven't tested it yet, so it may be useless...

Also -- the example Puppeteer script does not set the UA to Googlebot without modification.

I'll write up some better tests (since that example doesn't use the lazyload we care about ;) and get back to you.


verlok commented Mar 5, 2019

Thank you 🙏🏼

@otterslide

Google scrolls down just like any other browser. I am using another jQuery script that does no UA testing, and Fetch as Google was loading every image fully.
At least for Google, and probably for the major bots, the check is not necessary. In fact, Google will visit the website with multiple user agents, without announcing itself as Googlebot, just to make sure you're not cloaking your site or serving something different only to bots.

I should have tested this plugin better, but now the good Fetch as Google tool, which fetched the first 10,000 pixels, has been taken offline, and it's impossible to test lazy image loading any more because the new tool only shows the first image. It's very sad that Google has chosen to do this.


verlok commented Apr 7, 2019

Thank you for your contribution @otterslide. I really don't know how to test this as Google.

Just to make it clearer: LazyLoad first checks whether the onscroll event is present in the window object (if it isn't, the visitor is likely a crawler), then checks whether the user agent matches any known bot using a regular expression (just in case). If either check matches, it loads all images immediately.


otterslide commented Apr 7, 2019 via email


knoxcard commented May 16, 2019

I believe I solved this issue tonight; it finally dawned on me...

new LazyLoad({
    elements_selector: '.lazy'
})
$('img').one('error', function(err) {
    $(this).remove()
}).one('load', function() {
    $(this).attr('draggable', 'false')
    $(this).attr('src', $(this).attr('data-src'))
    $(this).removeAttr('data-src')
})


verlok commented May 16, 2019

@knoxcard thanks for your comment.

It just seems to add two event listeners to each image:

  • the error handler removes the img element that caused the error,
  • the load handler loads the image from data-src by setting it as the src

What does this kind of code solve?


jimmleon commented Jun 5, 2019

Some conclusions I came to:

I used Google's Search Console and ran URL inspection on my website (the old "Fetch as Google").

Unfortunately, my lazy images are not shown in the screenshot preview.
However, when I did a Google search for a random URL of my website and clicked "view cached", the lazy images in the cached preview of that page were displayed properly. The cached page, as far as I know, is a snapshot of the page taken by Googlebot, and the content in that snapshot is the content Google can crawl.

Related to the above, some SEO best-practice guides advise developers using image lazy-loading techniques to add a <noscript> tag containing an <img> with a real src.

Any thoughts on this, and on whether it is indeed a good practice, would be more than helpful.
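That advice usually takes a shape like this (a sketch; the file name is a placeholder):

```html
<!-- Lazy image plus a <noscript> fallback, so non-JS crawlers still see a src -->
<img class="lazy" data-src="photo.jpg" alt="A photo">
<noscript>
  <img src="photo.jpg" alt="A photo">
</noscript>
```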


knoxcard commented Jun 5, 2019

@verlok - HTML standards require <img> tags to have a src attribute. When I ran the source code produced by this library through the W3C validator (https://validator.w3.org/#validate_by_input), it displayed warnings and errors.


jimblue commented Jun 22, 2019

Some ideas for checking whether the user is a bot and then loading all images:

Very powerful but heavy:
https://github.com/JefferyHus/es6-crawler-detect
https://github.com/gorangajic/isbot

A faster version:
https://github.com/mahovich/isbot-fast

@juanmardefago (Contributor)

@verlok As stated in the PR for the Puppeteer script (puppeteer/examples#30) that is referenced in the Google docs, adding the UA to the script makes it work.

Even so, the results should always be interpreted by a human. I used that script to test the site I'm working on right now, and the test "failed", but the images were almost exactly the same; the only exception was that the images at the far bottom had a different opacity, which in my case means they were the last ones scrolled to and their 1s opacity animation was just finishing. Of course the script can't know that, but since those were the only pixels that varied, I can take the result as a good one nonetheless.

I'll ask our client if we can upload those images to better exemplify it, since it might be somewhat confusing, haha.
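The UA change amounts to something like the following (a sketch, not the exact PR diff; page.setUserAgent and the other calls are Puppeteer's documented API, and the target URL and output path are placeholders):

```javascript
// Sketch: force a Googlebot UA in Puppeteer so UA-sniffing lazyload code
// takes its "load everything" path before the full-page screenshot.
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setUserAgent(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
  );
  await page.goto("https://example.com/", { waitUntil: "networkidle0" });
  await page.screenshot({ path: "page.png", fullPage: true });
  await browser.close();
})();
```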

@juanmardefago

Also, since adding the UA to that script made it work, I guess the onscroll event and the window object do exist in that Puppeteer run, which might not exactly match what happens when Googlebot crawls the site.

Also, the script is supposed to tell whether a lazyload approach uses scroll events or the Intersection Observer API, but I'm guessing that part isn't working as expected, since this solution does use the IO API.

For the time being, I wouldn't worry too much about this.

@juanmardefago

Here's the picture of how it ended up

[screenshot: page_diff, the before/after comparison with differing pixels highlighted]

The script returned a Failed result, since there are lots of pixels highlighted in the diff, but I consider it a pass.


verlok commented Nov 21, 2019

Thank you all!
