Migrate git-scm.com to a static site, generated via Hugo, served via GitHub Pages #1804

Open
wants to merge 219 commits into base: main

Conversation

dscho
Member

@dscho dscho commented Oct 16, 2023

Changes

This Pull Request adjusts the existing files such that the site is no longer served via a Rails App, but by GitHub Pages instead. A preview can be seen here: https://dscho.github.io/git-scm.com/ (which is generated and deployed from this Pull Request's branch, and will be updated via automation whenever that branch changes).

It is the culmination of a very long, and large, effort that started in February 2017 with the first attempt to migrate the site to Jekyll. Several years, and a substantial effort by @spraints, @vdye and myself, later, here is the result: no longer a Jekyll site but a Hugo site (because of render times: 20 minutes vs 30 seconds), with search implemented using Pagefind.

The main themes of the subsequent migration from the Rails App to a Hugo-generated static site are:

  • We move the original Rails App files that contain Rails code mixed into HTML to content/, where the files defining the pages live in the Hugo world, then modify them to drop the Rails code and replace it with Hugo constructs. More often than not, we separate the commits that move the files from the commits that adjust the contents, to help Git realize that there has been a move (as opposed to a delete/add). This allows for noticing upstream changes that need to be reflected in moved & modified files when rebasing to upstream.

  • In Hugo setups, the files live in the following locations:

    • hugo.yml

      This is the central configuration file that tells Hugo how to render the site.

    • content/

      This defines the content of the pages that are served. Only a subset of Hugo's functionality is available here (the idea is to leave the complicated stuff to the layout used to render the pages).

      Most, but not all, of the files living in this directory tree are HTML files that are generated (and then committed) using external repositories, e.g. the ProGit book and its translations.

    • layouts/

      This is where the "boilerplate" is defined that ties the site together, i.e. the header, the footer and the sidebar as well as the main scaffolding in which the pages' content is to be rendered.

      This is the location where most of Hugo's functionality is available and complex stuff can happen such as looping or accessing site parameters.

    • layouts/partials/

      This directory contains recurring templates, i.e. reusable partial layouts that are used to structure the elements of the site. This includes the sidebar, how videos are rendered, etc.

    • layouts/shortcodes/

      This directory contains so-called "shortcodes", i.e. reusable elements similar to partial layouts. The major difference is that shortcodes can be used within content/ while partial layouts can only be used from within layouts/.

      See https://gohugo.io/content-management/shortcodes/ for more information on this topic.

    • static/

      These files are not processed by Hugo, but copied as-is. Good for images, for example.

    • assets/

      These files are processed in specific ways. That is where the SASS-based style sheets live, for example.

    • data/

      These files define metadata that can be used in Hugo's functions. For example, it contains the list of documentation categories that are rendered in various ways.

  • In contrast to most Hugo-managed sites, we will refrain from using a Hugo theme, and instead stick with the existing style sheets.

    Likewise, we refrain from using Markdown at all: The existing site did not use it, therefore it makes little sense to start using it now.

  • In addition to Hugo's directories, we also have these:

    • script/

      This directory contains scripts to perform recurring tasks such as rendering Git's manual pages into HTML files that are then stored inside content/docs/.

      For historical reasons, these are Ruby scripts for the most part, as it is easier to follow the development when that functionality is extracted from the current Rails App and turned into Ruby scripts that can be run stand-alone.

    • .github/workflows/ and .github/actions/

      The latter directory contains a file that defines a custom GitHub Action that compensates for the lack of Hugo support in GitHub Pages: By default, only Jekyll sites are supported out of the box, but Hugo sites require a custom GitHub workflow to deploy the site.

      The former directory contains files that define GitHub workflows that are typically run on a schedule, updating the various parts that are generated from external sources: the Git version, the ProGit Book, manual pages, etc. These workflows essentially keep the rendered HTML files in content/ up to date with the respective external repositories.

      These workflows can be seen in action (pun intended) here: https://github.com/dscho/git-scm.com/actions

    • _generated-asciidoc/

      This directory serves as a cache of "expanded AsciiDoc": many of Git's manual pages include content from other files, and therefore it is non-trivial to determine whether or not a manual page has changed and needs to be re-rendered (essentially, the only way is to expand them by inlining the included files). Caching this content speeds up updating the manual pages drastically.

  • Most of the core logic lives in layouts/. Hugo distinguishes between logic that is allowed in layouts/ and logic that is allowed in content/; the latter can only access so-called "shortcodes" (https://gohugo.io/content-management/shortcodes/). These shortcodes, in turn, are free to use the entire set of Hugo's functionality.

    tl;dr whenever we need to do something complicated that is confined to only a few pages, we have to implement it in layouts/shortcodes/ and insert the corresponding {{< shortcode-name >}} in the page itself. Whenever we need to do something complicated that is used in more places, it is implemented elsewhere in layouts/.

  • Some of the logic that cannot be performed statically (such as telling the user how long ago the latest macOS installer was released, or adjusting the Windows downloads to reflect the CPU architecture indicated by the current user agent) is implemented using JavaScript instead.

  • The site search needs to move to the client side, as there is no longer a server that can perform that functionality. Luckily, Pagefind (https://pagefind.app/) has matured in the meantime: it is a very performant client-side search solution implemented in JavaScript that relies on a search index that is generated at build time and served incrementally, as needed, via static files. This is what we use.
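As a rough illustration of the client-side logic mentioned in the list above, here is a minimal sketch of guessing the Windows CPU architecture from a user-agent string. The function name `guessWindowsArch` and the tokens it looks for are assumptions made for this example, not the code in this Pull Request; browsers may hide or misreport the architecture, so the result can only ever be a hint.

```javascript
// Hypothetical helper (not taken from the PR): guess the Windows CPU
// architecture from a user-agent string. Returns 'arm64', 'x64', 'x86',
// or null when the user agent does not look like Windows at all.
function guessWindowsArch(userAgent) {
  const ua = String(userAgent || '').toLowerCase();
  if (!ua.includes('windows')) return null;
  // Check ARM first, since such user agents may also carry other tokens.
  if (ua.includes('arm64') || ua.includes('aarch64')) return 'arm64';
  // "Win64"/"x64" indicate a 64-bit browser, "WOW64" a 32-bit browser
  // running on 64-bit Windows; in both cases a 64-bit installer fits.
  if (ua.includes('win64') || ua.includes('x64') || ua.includes('wow64')) return 'x64';
  return 'x86';
}
```

In a browser, this could be called as `guessWindowsArch(navigator.userAgent)` to decide which download link to highlight, while still listing all installers as a fallback.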

Context

Changes required to finalize the migration in addition to this Pull Request

  • This Pull Request is not actually meant to be merged, not to the main branch at least, but to the (not-yet-existing) gh-pages branch.

  • To successfully deploy to GitHub Pages, the Pages configuration needs to be switched from "Deploy from a branch" to "GitHub Actions":

    image

  • Once everything is golden in this Pull Request and the decision to move to GitHub Pages is final, git-scm.com needs to be pointed to GitHub Pages (read: CNAME needs to be configured to make use of the GitHub Pages-deployed site).

  • The Pull Request branch could actually be pushed to gh-pages already way before closing this Pull Request, as https://git-scm.github.io/ would be serving a different site than https://git-scm.com/ before the CNAME entry is adjusted.

Why make these changes?

  • Heroku stopped their free tier and ever since https://git-scm.com/ has required sponsorship whose funding could be put to better use elsewhere.
  • Static sites are much easier to manage, and to develop. With this Pull Request, developing the site locally is as easy as checking out the repository and running hugo serve -w, then editing the files to your heart's content.

@dscho dscho force-pushed the hugo branch 6 times, most recently from 1db01e4 to bd332cc Compare October 16, 2023 21:11
dscho added a commit to dscho/git-scm.com that referenced this pull request Oct 17, 2023
In the current effort to migrate https://git-scm.com/ to a static Hugo
site (see git#1804), we saw a bogus
tag that would confuse Hugo. We also saw a now-unused banner that we
probably do not want to bother migrating to Hugo.

So let's drop both.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho force-pushed the hugo branch 2 times, most recently from 4bd3b3f to 7c5e7c5 Compare October 17, 2023 12:00
@spraints
Contributor

🎉 This is great! Thank you so much for picking this up! The demo site looks great!

@bglw

bglw commented Oct 18, 2023

👋 Sneaking in here with some thoughts from the search side!

On first interactions, the search has some notable issues compared to the production rails search, for a few reasons on both sides of the fence.

  1. All tagged releases are indexed, so a search for rebase returns /docs/git-rebase/ and /docs/git-rebase/2.41.0/ and /docs/git-rebase/2.23.0/ and ...
    • The best fix here would be for you to omit the data-pagefind-body attribute from the numbered release pages, so that only /docs/git-rebase/ is indexed and returned
  2. Titles definitely need stronger affinity here. A search for list on the rails site returns rev-list-description, git-rev-list, and rev-list-options as the top results. Pagefind's search is significantly more varied, with a lot of results for mailing lists and related items.
  3. Typing rebase into the live search and hitting enter does not show the rebasing book result. Typing the query in does.
  4. The rails site live search has a nice Reference / Book split that would be great to recreate with filters, if possible.

(Amazing work migrating this to Hugo! ❤️)

dscho added a commit to dscho/git-scm.com that referenced this pull request Oct 18, 2023
@dscho dscho self-assigned this Oct 18, 2023
@dscho
Member Author

dscho commented Oct 18, 2023

Oh wow, Mr Pagefind himself! I'm honored to meet you, @bglw!

  • The best fix here would be for you to omit the data-pagefind-body attribute from the numbered release pages, so that only /docs/git-rebase/ is indexed and returned

I kind of wanted to be able to find stuff in old versions that is no longer present in current versions. That's why I added dscho@e9fa963.

  • Titles definitely need stronger affinity here. A search for list on the rails site returns rev-list-description, git-rev-list, and rev-list-options as the top results. Pagefind's search is significantly more varied, with a lot of results for mailing lists and related items.

Excellent!

Heh, thank you for that!

  • The rails site live search has a nice Reference / Book split that would be great to recreate with filters, if possible.

Right, I had not worked on that because I hoped that the sorting by relevance would be "good enough"...
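One conceivable way to approximate the Reference / Book split on the client is to group already-loaded results by their URL. The sketch below assumes that each result object carries a `url` property (as Pagefind's loaded result data does) and that manual pages live under `/docs/` and book chapters under `/book/`; the function name `groupSearchResults` is made up for this example.

```javascript
// Hypothetical helper: split search results into "Reference" (manual pages),
// "Book" (ProGit chapters) and "Other", based purely on the result URL.
// The "/docs/" and "/book/" path segments are assumptions about the layout.
function groupSearchResults(results) {
  const groups = { Reference: [], Book: [], Other: [] };
  for (const result of results) {
    if (/\/docs\//.test(result.url)) {
      groups.Reference.push(result);
    } else if (/\/book\//.test(result.url)) {
      groups.Book.push(result);
    } else {
      groups.Other.push(result);
    }
  }
  return groups;
}
```

An alternative would be to tag the pages with Pagefind's `data-pagefind-filter` attribute at build time and run one filtered search per category, which avoids guessing from URLs.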

@rimrul
Contributor

rimrul commented Oct 20, 2023

About Heroku

That is true, but there has been an update since that 2022 mail.

https://lore.kernel.org/git/ZRHTWaPthX%2FTETJz@nand.local/

Heroku has a new (?)
program for giving credits to open-source projects. The details are
below:

https://www.heroku.com/open-source-credit-program

I applied on behalf of the Git project on 2023-09-25, and will follow-up
on the list if/when we hear back from them.

It does seem like the PLC is still in favor of moving to a static solution, though.

https://lore.kernel.org/git/ZRrfAdX0eNutTSOy@nand.local/

  • Biggest expense is Heroku - Fusion has been covering the bill
  • Dan Moore from FusionAuth has been providing donations
  • Ideally we are able to move away from using Heroku, but in the meantime
    we'll have coverage either from (a) FusionAuth, or (b) Heroku's new
    open-source credit system

About the preview:

Search

All tagged releases are indexed, so a search for rebase returns /docs/git-rebase/ and /docs/git-rebase/2.41.0/ and /docs/git-rebase/2.23.0/ and ...

That is true. And in both the search results page as well as the little preview (<div id="search-results">) it's not visually obvious which result is the current version and which results are older versions. Maybe that could be improved by adding the version number to the page title for non-current versions? Or maybe a filter in the search results to exclude historical documentation?
If we don't want to mangle the titles, pagefind would show the version number below the result if we configured it as metadata.

Minor issues

There are some broken links in the preview on https://dscho.github.io/git-scm.com/docs/ that lead to https://dscho.github.io/docs/ <topic>

There's a broken link on https://dscho.github.io/git-scm.com/about/free-and-open-source/ to https://dscho.github.io/git-scm.com/trademark. On the live site that redirects from https://git-scm.com/trademark to https://git-scm.com/about/trademark (dscho#1)

The "Setup and Config" headline on https://dscho.github.io/git-scm.com/docs/ is blue in the preview, but not in the live site. This is not happening for me in local testing.

There's some redirect that swallows anchors. https://dscho.github.io/git-scm.com/docs/ links to https://dscho.github.io/git-scm.com/docs/git#_git_commands , which redirects to https://dscho.github.io/git-scm.com/docs/git/ instead of https://dscho.github.io/git-scm.com/docs/git/#_git_commands
Looks like the slash-free version isn't possible with the GitHub Pages/Hugo combination (gohugoio/hugo#492). We should update these links to contain the slash from the beginning to avoid the redirect. (dscho#3)
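To illustrate the kind of link rewrite meant here, the following sketch inserts the trailing slash before any query string or anchor, so that `/docs/git#_git_commands` becomes `/docs/git/#_git_commands`. The helper name `withTrailingSlash` and the file-extension heuristic are assumptions for this example; the actual fix in the branch may well be done in the Hugo templates or the Ruby scripts instead.

```javascript
// Hypothetical helper: add a trailing slash to the path part of a link,
// keeping any "?query" or "#anchor" suffix intact.
function withTrailingSlash(href) {
  // Split into the path (everything before the first "?" or "#") and the rest.
  const match = /^([^?#]*)([\s\S]*)$/.exec(href);
  const path = match[1];
  const rest = match[2];
  const lastSegment = path.slice(path.lastIndexOf('/') + 1);
  // Heuristic: a final segment ending in ".ext" (letters first) is a file,
  // whereas something like "2.41.0" is treated as a directory name.
  const looksLikeFile = /\.[a-z][a-z0-9]*$/i.test(lastSegment);
  if (path === '' || path.endsWith('/') || looksLikeFile) return href;
  return `${path}/${rest}`;
}
```

Links that already end in a slash, pure anchors such as `#top`, and file-like paths such as `/images/logo.png` are returned unchanged.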

https://dscho.github.io/git-scm.com/downloads/mac/ has an odd grammar issue that https://git-scm.com/download/mac doesn't. (dscho#2) It says

which was released about 2 year, on 2021-08-30.

https://git-scm.com/download/mac correctly says

which was released about 2 years ago, on 2021-08-30.
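For illustration, a relative-time helper that produces the second wording could look roughly like the sketch below. The function name `timeAgo` and the rounding rules (30-day months, 365-day years) are assumptions for this example and not necessarily what the site's script does.

```javascript
// Hypothetical helper: describe how long ago a release happened,
// e.g. "about 2 years ago". Uses simple 30-day months and 365-day years.
function timeAgo(releaseDate, now = new Date()) {
  const msPerDay = 24 * 60 * 60 * 1000;
  const days = Math.floor((now - releaseDate) / msPerDay);
  // Append the plural "s" for every count except exactly 1,
  // and always append the trailing "ago".
  const phrase = (n, unit) => `${n} ${unit}${n === 1 ? '' : 's'} ago`;
  if (days < 1) return 'today';
  if (days < 30) return phrase(days, 'day');
  if (days < 365) return 'about ' + phrase(Math.floor(days / 30), 'month');
  return 'about ' + phrase(Math.floor(days / 365), 'year');
}
```

Building the unit, the plural suffix and the word "ago" in one place makes it harder to drop one of them, which is the kind of slip visible in the preview's wording.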

Also note the slight URL change there from download to downloads. There is a redirect for that, though, so that should be fine.

@rimrul
Contributor

rimrul commented Oct 20, 2023

One additional note: There is a commit about porting the old 404 page, 18a3ac2, but I've only seen the generic GitHub pages 404 page on the preview in my testing.

@rimrul
Contributor

rimrul commented Oct 21, 2023

Switching to pagefind also changed search behaviour in another way.

The rails site always searches the English content. Pagefind defaults to what they call multilingual search, i.e. searching only pages in the same language as the one you're searching from. That's theoretically a usability improvement, but with the partial nature of our non-English content (availability of any given language can vary from man page to man page, the book exists in languages that don't have any man pages, everything else only exists in English), we might need a fallback to English here. Pagefind offers an option to force all pages to be indexed as English, but I think we can slightly abuse mergeIndex with language set to en for a better result.
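A minimal sketch of that idea, assuming the `pagefind` module object has already been imported and that `bundlePath` points at the site's own Pagefind bundle (both are placeholders here, as is the function name `addEnglishFallback`):

```javascript
// Hypothetical helper: when the current page is not English, additionally
// merge the English part of the site's own index into the active search.
// `pagefind` is the imported Pagefind module; `bundlePath` is an assumed
// location of the bundle, e.g. "/pagefind/".
async function addEnglishFallback(pagefind, pageLanguage, bundlePath) {
  if (pageLanguage === 'en') return false; // English is already searched
  await pagefind.mergeIndex(bundlePath, { language: 'en' });
  return true;
}
```

Whether the merged English results should be ranked below the results in the page's own language is a separate question that this sketch does not address.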

Contributor

@rimrul rimrul left a comment

Partial review. Only looked at the first 47 commits.

application.js (outdated, resolved)
app/assets/javascripts/modernize.js (outdated, resolved)
layouts/_default/baseof.html (outdated, resolved)
script/cibuild (outdated, resolved)
static/js/application.js (outdated, resolved)
app/views/about/index.html.erb (outdated, resolved)
content/404.html (outdated, resolved)
content/about/branching-and-merging.html (outdated, resolved)
content/community/_index.html (outdated, resolved)
script/book.rb (outdated, resolved)
Gemfile (resolved)
dscho added a commit to dscho/git-scm.com that referenced this pull request Oct 24, 2023
This addresses that part of
git#1804 (comment):

	There are some broken links in the preview on
	https://dscho.github.io/git-scm.com/docs/ that lead to
	https://dscho.github.io/docs/ <topic>

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho
Member Author

dscho commented Oct 24, 2023

The "Setup and Config" headline on https://dscho.github.io/git-scm.com/docs/ is blue in the preview, but not in the live site. This is not happening for me in local testing.

I managed to fix it via 2d0f6c8

@dscho
Member Author

dscho commented Oct 24, 2023

All tagged releases are indexed, so a search for rebase returns /docs/git-rebase/ and /docs/git-rebase/2.41.0/ and /docs/git-rebase/2.23.0/ and ...

That is true. And in both the search results page as well as the little preview (<div id="search-results">) it's not visually obvious which result is the current version and which results are older versions.

Hmm. The more I think about it, the more I get convinced that the older versions of the manual pages should be excluded from the search. I thought it was a feature, but it looks as if it incurs more downsides than upsides.
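To make "older versions of the manual pages" concrete, here is a small sketch of a predicate that recognizes versioned paths such as /docs/git-rebase/2.41.0/. The names `isVersionedDocPath` and `currentDocsOnly` are invented for this example, and the pattern assumes that versions always start with dotted numbers. In the actual site, the exclusion would more likely happen at build time, by leaving out the data-pagefind-body attribute on those pages as suggested earlier in this thread.

```javascript
// Hypothetical pattern: "/docs/<name>/<dotted version>" with an optional
// suffix (e.g. "-rc1") and an optional trailing slash.
const VERSIONED_DOC = /\/docs\/[^/]+\/\d+(\.\d+)+[^/]*\/?$/;

// True for "/docs/git-rebase/2.41.0/", false for "/docs/git-rebase/".
function isVersionedDocPath(path) {
  return VERSIONED_DOC.test(path);
}

// Keep only results that point at the current version of a manual page
// (or at something that is not a manual page at all).
function currentDocsOnly(results) {
  return results.filter((result) => !isVersionedDocPath(result.url));
}
```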

@pedrorijo91
Member

this was a major effort @dscho , thank you very much! sorry for the silence, but i've been busy with other stuff. in the meanwhile, and to ensure this effort won't be wasted, can you summarize what you need to make this merge-ready?

what do you still need to tackle? where do you need help from other people? :)

@dscho
Member Author

dscho commented Nov 6, 2023

can you summarize what do you need to make this merge-ready?

@pedrorijo91 Yes.

  • The search needs some love:
    • exclude the manual pages of previous versions from the search instead of trying to demote them; it's just too confusing
    • in the "live search" (i.e. when typing in the search box on any page other than the search results page), we will want to reinstate the "Reference"/"Book" separation of the search results. I'm currently unsure how we can accomplish that.
  • to make the URLs nicer by having no trailing slash (just like the existing Rails App), we will need to uglify the URLs.
  • general QA:
    • ensure that current URLs would work after migration
      • e.g. /about#branching-and-merging, /about#staging-area etc
    • add test -z "$(git grep "\\(href\|src\) *= *[\"']/")" to CI
  • rebase to the latest main
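The git grep check in the QA list above looks for href or src attributes whose value starts with a slash, presumably because such root-relative links break when the site is served from a subdirectory, as in the preview. The same pattern can be sketched in JavaScript as follows; the function name `findRootRelativeLines` is an assumption, and this is only an illustration of the check, not a replacement for the shell one-liner.

```javascript
// Same idea as the grep pattern: href or src, optional spaces around "=",
// then a quote immediately followed by "/".
const ROOT_RELATIVE = /(href|src) *= *["']\//;

// Hypothetical helper: return the lines of `text` that contain a
// root-relative href/src attribute. An empty array means the check passes.
function findRootRelativeLines(text) {
  return text.split('\n').filter((line) => ROOT_RELATIVE.test(line));
}
```

Like the grep version, this also flags protocol-relative URLs (those starting with two slashes), which may or may not be desired.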

The big blocker is the "live search" one.

@dscho
Member Author

dscho commented Nov 6, 2023

Oh, and there's a ton of work still needed to address @rimrul's excellent feedback.

dscho added a commit to dscho/git-scm.com that referenced this pull request Nov 7, 2023
dscho added 29 commits May 26, 2024 13:17
Updated via the `update-git-version-and-manual-pages.yml` GitHub workflow.
Updated via the `update-download-data.yml` GitHub workflow.
9 participants