refactor(theme): use JSON-LD instead of microdata for blog structured data #9669

Merged
merged 23 commits into facebook:main on Feb 15, 2024

johnnyreilly
Contributor

@johnnyreilly johnnyreilly commented Dec 26, 2023

Pre-flight checklist

Motivation

I originally contributed Structured Data support for blog posts back in 2021: #5322

@lex111 subsequently submitted a PR to migrate the approach to use microdata instead: #5355

I had reservations which I voiced at the time, but left it at that. Since then I've had something of a baptism of fire in the world of SEO, and I've been working with some excellent folk in the SEO industry to improve my own ranking. A suggestion that comes up repeatedly is to use JSON-LD instead of microdata, as that is what Google prefers: https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data#supported-formats

In general, Google recommends using JSON-LD for structured data if your site's setup allows it, as it's the easiest solution for website owners to implement and maintain at scale (in other words, less prone to user errors).

I raised #9274 to discuss this and received some good feedback.

I've now implemented JSON-LD support for the blog, covering both individual posts and the blog listing page. With this change in place, the Structured Data for each can be configured separately by swizzling the two new components:

  • BlogListPage/StructuredData
  • BlogPostPage/StructuredData

From @Josh-Cena:

Swizzability does seem desirable. I also wonder if there are cases in the wild where people swizzle blog component and inadvertently broke microdata. This sounds reasonable to me.

The default behaviour for these components is to produce JSON-LD structured data that aligns with Schema.org and Google's Rich Results guidelines.

Let's talk for a moment about each of these components.

BlogListPage/StructuredData

This component is responsible for generating the Structured Data for the blog list page. It renders JSON-LD structured data that aligns with the https://schema.org/Blog schema. (Please note the examples at the bottom of the page which this implementation aligns with.)
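To make the shape concrete, here is a minimal sketch of a https://schema.org/Blog object of the kind described above. The `PostSummary` interface, the `buildBlogJsonLd` helper and the example.com URLs are hypothetical names chosen for illustration; they are not taken from the PR, and the property selection is an assumption rather than the component's actual output.

```typescript
// Hypothetical input shape for illustration; not the types used in the PR.
interface PostSummary {
  title: string;
  url: string; // absolute URL of the post
  datePublished: string; // ISO 8601 date
}

// Builds a schema.org Blog object that lists its posts under `blogPost`.
function buildBlogJsonLd(
  blogUrl: string,
  blogTitle: string,
  posts: PostSummary[],
): Record<string, unknown> {
  return {
    '@context': 'https://schema.org',
    '@type': 'Blog',
    '@id': blogUrl,
    mainEntityOfPage: blogUrl,
    headline: blogTitle,
    blogPost: posts.map((post) => ({
      '@type': 'BlogPosting',
      '@id': post.url,
      url: post.url,
      headline: post.title,
      datePublished: post.datePublished,
    })),
  };
}

const blogJsonLd = buildBlogJsonLd('https://example.com/blog', 'Example blog', [
  {
    title: 'Hello world',
    url: 'https://example.com/blog/hello-world',
    datePublished: '2024-01-01T00:00:00.000Z',
  },
]);

console.log(JSON.stringify(blogJsonLd, null, 2));
```

The resulting object can be pasted into the schema.org validator to check that it is recognised as a Blog with nested BlogPosting entries.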

BlogPostPage/StructuredData

This component is responsible for generating the Structured Data for the blog post page. It renders JSON-LD structured data that aligns with the https://schema.org/BlogPosting schema. (Please note the examples at the bottom of the page which this implementation aligns with.)

The BlogPosting schema is one of the structured data types that Google explicitly supports for Rich Results: https://developers.google.com/search/docs/appearance/structured-data/article#structured-data-type-definitions

All the Google-supported properties are included in the Structured Data generated by this component, apart from dateModified, which is optional. A number of other properties documented in the BlogPosting schema are included as well.
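As a rough illustration of that property set, the sketch below builds a BlogPosting object with the properties Google documents for articles (headline, image, datePublished, author) and leaves out dateModified, matching the description above. The `PostMetadata` and `PostAuthor` interfaces and the `buildBlogPostingJsonLd` helper are hypothetical; the real component's inputs and exact output may differ.

```typescript
// Hypothetical input shapes for illustration; not the metadata types used in the PR.
interface PostAuthor {
  name: string;
  url?: string;
}

interface PostMetadata {
  title: string;
  description: string;
  permalink: string; // absolute URL of the post
  date: string; // ISO 8601 publication date
  image?: string; // absolute URL of a representative image
  authors: PostAuthor[];
}

// Builds a schema.org BlogPosting object; dateModified is deliberately omitted.
function buildBlogPostingJsonLd(post: PostMetadata): Record<string, unknown> {
  return {
    '@context': 'https://schema.org',
    '@type': 'BlogPosting',
    '@id': post.permalink,
    mainEntityOfPage: post.permalink,
    url: post.permalink,
    headline: post.title,
    description: post.description,
    datePublished: post.date,
    // Only emit `image` when the post actually has one.
    ...(post.image ? {image: post.image} : {}),
    author: post.authors.map((author) => ({
      '@type': 'Person',
      name: author.name,
      ...(author.url ? {url: author.url} : {}),
    })),
  };
}

const postJsonLd = buildBlogPostingJsonLd({
  title: 'Example post',
  description: 'A short example post.',
  permalink: 'https://example.com/blog/example-post',
  date: '2024-01-01T00:00:00.000Z',
  authors: [{name: 'Jane Doe', url: 'https://example.com/jane'}],
});

console.log(JSON.stringify(postJsonLd, null, 2));
```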

Test Plan

I will use the pull request preview on this PR to demonstrate that the Structured Data is generated as expected. I will also use structured data testing tools to verify that the Structured Data is valid.

Expect screenshots to be added to this PR.

Test links

Deploy preview: https://deploy-preview-9669--docusaurus-2.netlify.app/

BlogListPage/StructuredData

If we go to the test preview of the /blog page: https://deploy-preview-9669--docusaurus-2.netlify.app/blog

We can validate with schema.org that the Blog structured data is valid: https://validator.schema.org/#url=https%3A%2F%2Fdeploy-preview-9669--docusaurus-2.netlify.app%2Fblog

image

BlogPostPage/StructuredData

If we go to the test preview of the /blog/releases/2.4/ page: https://deploy-preview-9669--docusaurus-2.netlify.app/blog/releases/2.4/

We can validate with schema.org that the BlogPosting structured data is valid: https://validator.schema.org/#url=https%3A%2F%2Fdeploy-preview-9669--docusaurus-2.netlify.app%2Fblog%2Freleases%2F2.4%2F

image

And we can also test this type with the Rich Results tool: https://search.google.com/test/rich-results

image

You can also see this in the Ahrefs Chrome extension: https://chromewebstore.google.com/detail/ahrefs-seo-toolbar-on-pag/hgmoccdbjhknikckedaaebbpdeebhiei?pli=1

image

Related issues/PRs

#9274

@facebook-github-bot facebook-github-bot added the "CLA Signed" label (Signed Facebook CLA) Dec 26, 2023

netlify bot commented Dec 26, 2023

[V2]

Built without sensitive environment variables

Name Link
🔨 Latest commit e0da5cf
🔍 Latest deploy log https://app.netlify.com/sites/docusaurus-2/deploys/658a901701f7a80008a486f9
😎 Deploy Preview https://deploy-preview-9669--docusaurus-2.netlify.app

netlify bot commented Dec 26, 2023

[V2]

Name Link
🔨 Latest commit 96073e8
🔍 Latest deploy log https://app.netlify.com/sites/docusaurus-2/deploys/65ce26f897d19b0008565473
😎 Deploy Preview https://deploy-preview-9669--docusaurus-2.netlify.app

github-actions bot commented Dec 26, 2023

⚡️ Lighthouse report for the deploy preview of this PR

| URL | Performance | Accessibility | Best Practices | SEO | PWA | Report |
|---|---|---|---|---|---|---|
| / | 🟠 66 | 🟢 98 | 🟢 96 | 🟢 100 | 🟠 88 | Report |
| /docs/installation | 🟢 90 | 🟢 96 | 🟢 100 | 🟢 100 | 🟠 88 | Report |
| /docs/category/getting-started | 🟠 77 | 🟢 100 | 🟢 100 | 🟢 90 | 🟠 88 | Report |
| /blog | 🟠 71 | 🟢 100 | 🟢 100 | 🟢 90 | 🟠 88 | Report |
| /blog/preparing-your-site-for-docusaurus-v3 | 🟠 66 | 🟢 96 | 🟢 100 | 🟢 100 | 🟠 88 | Report |
| /blog/tags/release | 🟠 70 | 🟢 100 | 🟢 100 | 🟠 80 | 🟠 88 | Report |
| /blog/tags | 🟠 77 | 🟢 100 | 🟢 100 | 🟢 90 | 🟠 88 | Report |

@johnnyreilly
Contributor Author

Hi @Josh-Cena and @slorber!

I was wondering if there were any thoughts about this PR? There have been no comments on it, so I'm not sure whether you're aware it's here. I've been checking back every week or so for a while, but there appears to be no activity.

It's possible you're not interested in the PR - if so, would you be able to let me know and I'll close it for tidiness' sake?

Collaborator

@Josh-Cena Josh-Cena left a comment

Sorry! I'm indeed aware of this. However, my repo access hasn't been renewed, so there isn't much I can do. If you've tested it yourself and it works, I'm personally happy to try it out and improve it where necessary.

johnnyreilly and others added 2 commits January 29, 2024 05:36
…turedData/index.tsx

Co-authored-by: Joshua Chen <sidachen2003@gmail.com>
…turedData/index.tsx

Co-authored-by: Joshua Chen <sidachen2003@gmail.com>
@johnnyreilly
Contributor Author

Yeah this PR was a Christmas project for me - I think it's a really good piece of work actually! (Of course I'm biased 😀)

I think it puts Docusaurus's structured data story in a really great place: it offers a good JSON-LD default, and gives users the freedom to straightforwardly control the structured data produced through the magic of swizzling. (In fact, if they wanted to, they could easily use the same mechanism to stop producing structured data altogether.)

If you've tested it yourself and it works, I'm personally happy to try it out and improve it where necessary

I have indeed and I'm happy to take feedback to improve it as necessary.

Collaborator

@slorber slorber left a comment

Thanks, that seems reasonable to use the solution recommended by Google 👍

Review:

  • I'd like to get rid of the 2 meta attributes you added
  • We can probably reduce code duplication

@johnnyreilly
Contributor Author

Thanks for the review @slorber - useful points, will address them soon!

Collaborator

@slorber slorber left a comment

Some additional changes requested and a few questions

If we merge this, should this be considered as a breaking change? 🤷‍♂️

packages/docusaurus-plugin-content-blog/src/index.ts (outdated, resolved)
Comment on lines 16 to 18
// We're using dangerouslySetInnerHTML because we want to avoid React
// transforming quotes into &quot; which upsets parsers.
// The entire contents is a stringified JSON object so it is safe
Collaborator

can you explain how to reproduce that problem here?

Was the code we documented before affected by any issue?

        <script type="application/ld+json">
          {JSON.stringify({
            '@context': 'https://schema.org/',
            '@type': 'Organization',
            name: 'Meta Open Source',
            url: 'https://opensource.fb.com/',
            logo: 'https://opensource.fb.com/img/logos/Meta-Open-Source.svg',
          })}
        </script>

Can you show side-by-side examples in a repro, before/after, rendering differently in practice? And explain how it upsets parsers?

Contributor Author

Was the code we documented before affected by any issue?

Yes.

So this was a curious one. The issue surfaces in the Google Search Console, and relates to the unsuccessful parsing of the JSON when it is rendered directly as a child of the <script type="application/ld+json"> element. The Google Search Console sends a notification asking you to fix this issue:

Parsing error: Missing '}' or object member name.

image

This happens because, without the dangerouslySetInnerHTML approach, the " characters in the JSON-LD are rendered as &quot;, which is not valid JSON. So you get something like this:

<script type="application/ld+json">
  {
    &quot;@context&quot;: &quot;https://schema.org/&quot;,
    &quot;@type&quot;: &quot;Organization&quot;,
    &quot;name&quot;: &quot;Meta Open Source&quot;,
    &quot;url&quot;: &quot;https://opensource.fb.com/&quot;,
    &quot;logo&quot;: &quot;https://opensource.fb.com/img/logos/Meta-Open-Source.svg&quot;
  }
</script>

Rather than:

<script type="application/ld+json">
  {
    "@context": "https://schema.org/",
    "@type": "Organization",
    "name": "Meta Open Source",
    "url": "https://opensource.fb.com/",
    "logo": "https://opensource.fb.com/img/logos/Meta-Open-Source.svg"
  }
</script>

Curiously, Google will sometimes parse the &quot; style successfully. But more often it won't (TBH I'm surprised it ever succeeds). When I migrated to the dangerouslySetInnerHTML approach instead it always parsed successfully and this fixed the issue being logged in the Google Search Console:

image

For reference, this is when I implemented the fix on my own site: https://github.com/johnnyreilly/blog.johnnyreilly.com/pull/664/files#diff-c2bd2d1e0092d85d7acaff15ce9223d0202ef706c2497f7500b1a24db9bc0366
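The failure mode described in this comment can be reproduced without a browser. The sketch below is an illustration under stated assumptions: `escapeHtmlText` is a simplified stand-in for the entity escaping applied to text children (it is not React's actual implementation), and `canParse` and the `description` field are hypothetical additions. The `\u003c` replacement is an optional hardening step that is not discussed in the thread: `JSON.stringify` does not escape `<`, so a string value containing `</script>` would otherwise end the script element early.

```typescript
// Simplified stand-in for the HTML escaping applied to text children.
// Assumption for illustration only; this is not React's implementation.
function escapeHtmlText(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Reports whether a script body is valid JSON, as a structured data parser needs.
function canParse(body: string): boolean {
  try {
    JSON.parse(body);
    return true;
  } catch (_error) {
    return false;
  }
}

const data = {
  '@context': 'https://schema.org/',
  '@type': 'Organization',
  name: 'Meta Open Source',
  url: 'https://opensource.fb.com/',
  // Hypothetical field, present only to exercise the optional hardening below.
  description: 'Contains </script> inside a string value',
};

const json = JSON.stringify(data);

// Body as it ends up when the JSON is rendered as an escaped text child.
const escapedBody = escapeHtmlText(json);

// Body as it would be passed through as raw HTML, with "<" written as a
// JSON unicode escape so the string cannot close the script element.
const rawBody = json.replace(/</g, '\\u003c');

console.log(canParse(escapedBody)); // false: &quot; is not a JSON token
console.log(canParse(rawBody)); // true
```

In a React component, `rawBody` is the string that would be handed to `dangerouslySetInnerHTML={{__html: rawBody}}` on the script element.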

website/docs/seo.mdx (outdated, resolved)
@slorber slorber added the "pr: polish" label (This PR adds a very minor behavior improvement that users will enjoy.) Feb 10, 2024
@slorber slorber changed the title from "feat: JSON-LD structured data implementation for blog" to "refactor(theme): use JSON-LD instead of microdata for blog structured data" Feb 10, 2024
@johnnyreilly
Contributor Author

johnnyreilly commented Feb 10, 2024

If we merge this, should this be considered as a breaking change? 🤷‍♂️

No - I can't think of any reason why it would be

Some additional changes requested and a few questions

Cool - I've addressed these. See my responses above!

@slorber slorber merged commit 60d9346 into facebook:main Feb 15, 2024
31 checks passed
@johnnyreilly
Contributor Author

Hey @slorber,

It's been two weeks - just wanted to check in and see how the "amélioration" section of Docusaurus Google Search Console is looking? Does it look okay?

@slorber
Collaborator

slorber commented Mar 4, 2024

So far it doesn't seem to affect SEO much.

CleanShot 2024-03-04 at 12 56 21@2x

But I'll keep monitoring this for a few more weeks to be sure. Impressions (purple) have slightly decreased, but it could be seasonality, Google algorithm changes, or something else 🤷‍♂️

Surprisingly the number of clicks (blue) remains as high as before so maybe the search just became more relevant?

Do you observe similar behavior on your site?


We still have the same breadcrumbs suggestions being reported:

CleanShot 2024-03-04 at 13 00 28

@johnnyreilly
Contributor Author

But I'll keep monitoring this for a few more weeks to be sure. Impressions (purple) have slightly decreased, but it could be seasonality, Google algorithm changes, or something else 🤷‍♂️

Surprisingly the number of clicks (blue) remains as high as before so maybe the search just became more relevant?

I suspect this is just slight variability, so SEO is essentially unaffected. A massive change would be a concern; slight variance is likely just fine. (SEO will always vary slightly over time; that's mostly out of our control and nothing to worry about.)

We still have the same breadcrumbs suggestions being reported:

Have you done anything to remedy this? I didn't spot a PR but I might have missed it.

TL;DR - so far it sounds fine

@slorber
Collaborator

slorber commented Mar 4, 2024

Agree 👍 I still want to work on a few things for v3.2 so maybe we'll include this PR in v3.2 in a few weeks.

have you done anything to remedy this? I didn't spot a PR but I might have missed.

Not a high priority for me to investigate atm, I'll get back to it later so if you know how to fix the problem go ahead.

We have this being reported for docs and blog posts too. Not sure why I can't get this UI in English easily 😅

Missing "position" field (in "itemListElement")
Items with this problem are invalid. Invalid items cannot appear in the enhanced Google search results.
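For context on the warning above, here is a minimal sketch of a BreadcrumbList expressed as JSON-LD in which every ListItem carries the `position` field that the report says is missing. The `Crumb` interface, the `buildBreadcrumbJsonLd` helper and the example.com URLs are hypothetical; this is not a description of how Docusaurus generates its breadcrumb markup, only of what a valid item looks like.

```typescript
// Hypothetical input shape for illustration.
interface Crumb {
  name: string;
  url: string; // absolute URL of the breadcrumb target
}

// Builds a schema.org BreadcrumbList where each item has a 1-based position.
function buildBreadcrumbJsonLd(crumbs: Crumb[]) {
  return {
    '@context': 'https://schema.org',
    '@type': 'BreadcrumbList',
    itemListElement: crumbs.map((crumb, index) => ({
      '@type': 'ListItem',
      position: index + 1,
      name: crumb.name,
      item: crumb.url,
    })),
  };
}

const breadcrumbJsonLd = buildBreadcrumbJsonLd([
  {name: 'Docs', url: 'https://example.com/docs'},
  {name: 'Getting started', url: 'https://example.com/docs/getting-started'},
]);

console.log(JSON.stringify(breadcrumbJsonLd, null, 2));
```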

CleanShot 2024-03-04 at 15 26 18

@johnnyreilly
Contributor Author

Not a high priority for me to investigate atm, I'll get back to it later so if you know how to fix the problem go ahead.

I'm pretty snowed under right now, but I might see if I can take a look in a couple of weeks when things quiet down (I hope).

@slorber
Collaborator

slorber commented Mar 21, 2024

Today's SEO results:

CleanShot 2024-03-21 at 09 15 52

Still fewer impressions, but more clicks, so maybe Google is just targeting the impressions better 🤷‍♂️

Anyway, this doesn't destroy our SEO so it looks relatively safe to release.

@johnnyreilly
Contributor Author

Looks good - ship it!

@FixTheAdmin

Great work on this @johnnyreilly. Is the change live? I don’t see it in the changelog.

@johnnyreilly
Contributor Author

I think it went live with 3.2

https://github.com/facebook/docusaurus/releases/tag/v3.2.0

Labels
CLA Signed Signed Facebook CLA pr: polish This PR adds a very minor behavior improvement that users will enjoy.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: migrate blog structured data back to JSON-LD
5 participants