Move all documentation hosting to stable URLs #2797

Open
6 tasks
daveverwer opened this issue Dec 19, 2023 · 25 comments · May be fixed by #3078

@daveverwer
Member

daveverwer commented Dec 19, 2023

We currently host documentation on a variety of URLs:

  • [owner]/[repo]/main/documentation/package
  • [owner]/[repo]/0.1.0/documentation/package
  • [owner]/[repo]/0.1.0-pre1/documentation/package

and every time there is a new release, instead of redirecting to /0.1.0/documentation/package we now start redirecting to /0.2.0/documentation/package. We always set the canonical URLs to the new version and update sitemaps to point at the new version, but this is causing a huge amount of churn in what we are asking Google to index and is contributing to our ongoing search index issues.

We need to host documentation like this:

[owner]/[repo]/documentation/package

This is the canonical URL for a package's documentation, and is not a redirect. This should have a canonical URL meta tag and header pointing to itself and be marked for indexing.

[owner]/[repo]/[reference]/documentation/package

This is a canonical path to a reference specific documentation set, but should not be marked as canonical and should be excluded from Google indexing with a noindex tag and header. These pages should all point their canonical URL to the page above.

Notes:

  • We should host the most recent documentation on the canonical URL and the reference specific URL.

Steps

  • Host the canonical documentation for a package on the [owner]/[repo]/documentation/package URL
  • Set the canonical URL of all documentation pages to be the [owner]/[repo]/documentation/package URL
  • Set noindex via a meta tag and HTTP header on every documentation page apart from the canonical page
  • Update the sitemap to only ever include the canonical documentation URLs, never reference specific URLs
  • Change the link on the package page to be the package's canonical documentation URL
  • Add a permalink icon to the documentation header that copies a link to a reference specific URL for the page's content
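
The canonical and noindex rules in the steps above can be sketched as follows. This is a minimal illustration in Python, not code from the site: `seo_directives` is a hypothetical helper, and it assumes that a reference is a single path segment, which would not hold for branch names that contain slashes.

```python
def seo_directives(path: str) -> dict:
    """Return the canonical URL and indexing rule for a documentation path.

    Hypothetical helper. Assumes paths look like
    /{owner}/{repo}/documentation/... (stable URL) or
    /{owner}/{repo}/{reference}/documentation/... (reference-specific URL).
    """
    parts = path.strip("/").split("/")
    if len(parts) < 3:
        raise ValueError(f"not a documentation path: {path}")
    owner, repo, rest = parts[0], parts[1], parts[2:]

    if rest[0] == "documentation":
        # Stable URL: canonical points at itself and the page is indexable.
        return {"canonical": "/" + "/".join(parts), "noindex": False}

    if len(rest) >= 2 and rest[1] == "documentation":
        # Reference-specific URL: drop the reference to get the canonical
        # URL, and exclude this page from indexing.
        canonical = "/" + "/".join([owner, repo] + rest[1:])
        return {"canonical": canonical, "noindex": True}

    raise ValueError(f"not a documentation path: {path}")


print(seo_directives("/owner/repo/documentation/package"))
print(seo_directives("/owner/repo/0.1.0/documentation/package"))
```

Both calls yield the same canonical URL; only the reference-specific page is flagged `noindex`.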
@daveverwer daveverwer self-assigned this Dec 19, 2023
@daveverwer
Member Author

Just to write up some progress on this. I started a branch that removes the “no-reference” redirects and attempts to host documentation from those URLs. So, for example:

/SwiftPackageIndex/SemanticVersion/documentation/semanticversion

Hosts the documentation from the files in S3 at:

s3://{bucket}/swiftpackageindex/semanticversion/0.4.0/

Unfortunately, the hosting-base-path that we specify when we build documentation inserts a hard-coded /0.4.0/ reference into all of the generated files.

We need to generate a new set of documentation for the default documentation set, with a hosting-base-path that does not include a reference and store it in a “special” latest (or similar) directory in S3.

The issue here is that we don’t know what the “latest” version is at build time. It could be a default branch version, a pre-release, or a stable release.

@daveverwer
Member Author

Progress is in the stable-url-doc-hosting branch.

@finestructure
Member

finestructure commented Jan 30, 2024

As discussed, I've had a look into how many packages opt in to generate docs but don't have any releases, and the number is quite small: 14.

However, there's a different, significantly larger set of packages that would currently be affected if we only ask Google to index packages with release docs: packages that generate docs but have not had a release since they opted into doc generation. They also don't have release docs. There are 95 of those, which is ~15% of all packages requesting docs (95 of 629).

While not ideal, I think as a first stab it'd be OK if we didn't support having those indexed by Google off the bat, since it's going to be quite tricky to do so. They're no worse off than they are currently, where we don't have them indexed by Google on any version, and the remedy is actually quite easy: simply tag a release.

Queries:

-- packages that generate docs on def branch and have no releases whatsoever
select p.url from
packages p join (
	select v.package_id
	from versions v
	where v.package_id in (
		-- has docs on default branch
		select distinct v.package_id
		from versions v
		where
		v.spi_manifest::text like '%documentation_targets%'
		and latest is not null
	)
	group by v.package_id
	having count(*) = 1
) t on p.id = t.package_id
order by p.url

-- packages that generate docs but have no latest release docs
select distinct p.url
from packages p
join versions v on v.package_id = p.id
where v.spi_manifest::text like '%documentation_targets%'
  and latest is not null
group by p.url
having count(*) = 1
order by p.url

@daveverwer
Member Author

> packages that generate docs but have not had a release since they opted-into doc generation

We could kick off re-builds of these for the latest stable release.

@finestructure
Member

Unfortunately that won't work, because they're not opted into doc generation on those old tags. Only a new release would actually have an .spi.yml file with the doc targets set.

@daveverwer
Member Author

> Unfortunately that won't work, because they're not opted into doc generation on those old tags. Only a new release would actually have an .spi.yml file with the doc targets set.

Ah, of course. That's a shame.

@finestructure
Member

I've thought of a pretty simple way for us to determine from within the builder itself whether a default branch build with docs should generate the docs for "latest release" or not. We can just list the tags and run them through SemanticVersion, just like we do in analysis, and if there are none, the branch build is a "latest release" doc set.

It's perhaps not 100% ideal in that the data doesn't come from the server but then again the source of truth is actually the repository we've checked out, so we're definitely looking in the right place.
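
A rough sketch of that check, in Python for illustration only. The regular expression is an approximation written for this example and is not SemanticVersion's actual grammar; the optional `v` prefix and the function name `branch_build_is_latest_release` are assumptions.

```python
import re

# Rough approximation of a semantic version tag with an optional "v" prefix.
# Assumption for illustration; not the grammar SemanticVersion implements.
SEMVER_TAG = re.compile(
    r"^v?\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$"
)


def branch_build_is_latest_release(tags: list[str]) -> bool:
    """True when no tag parses as a semantic version, i.e. the package has
    no releases and the default branch docs stand in for "latest release"."""
    return not any(SEMVER_TAG.match(tag) for tag in tags)


# Tags would come from the checked-out repository, e.g. `git tag --list`.
print(branch_build_is_latest_release(["nightly", "build-42"]))
print(branch_build_is_latest_release(["0.3.0", "0.4.0"]))
```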

@daveverwer
Member Author

Yes that would work.

One thing we should consider is sending back with the API call, and storing in our database, whether the build generated a latest version. We will need to know definitively whether to add a nofollow and do a redirect, or a direct link to the latest documentation.

We should also not use the word latest as the directory name in the AWS bucket, either. That's a perfectly valid name for a default branch, and I can see projects using that as a branch name. Maybe !!!default or ___default or !!!latest or ___latest or something like that? Still all valid branch names but the chance of a collision is much lower.

@daveverwer
Member Author

From https://git-scm.com/docs/git-check-ref-format

  1. They cannot have ASCII control characters (i.e. bytes whose values are lower than \040, or \177 DEL), space, tilde ~, caret ^, or colon : anywhere.

So we could do ~latest to be safe
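
To make the reasoning concrete, here is a small Python sketch that checks a name against the single rule quoted above. It deliberately ignores every other rule in git-check-ref-format, so passing it does not mean a name is a valid ref; the function name is made up for this example.

```python
# Characters named explicitly in the quoted rule.
FORBIDDEN_CHARS = set(" ~^:")


def passes_quoted_rule(name: str) -> bool:
    """Check only the quoted rule: no ASCII control characters
    (below octal 040, or octal 177 DEL), space, tilde, caret, or colon."""
    for ch in name:
        if ord(ch) < 0o40 or ord(ch) == 0o177 or ch in FORBIDDEN_CHARS:
            return False
    return True


for candidate in ["latest", "___latest", "!!!default", "~latest"]:
    print(candidate, passes_quoted_rule(candidate))
```

Only `~latest` fails the check, which is the point: no branch or tag can carry that name, so a directory called `~latest` cannot collide with a reference.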

@finestructure
Member

I've looked into this a bit this morning on the builder side and there are a few tricky parts we need to consider:

  • The builder part is going to be fiddly, because we need to essentially run doc generation twice for tags. The complication is that we need to interleave this with uploading and doc reporting. We also need to figure out on which doc gen to report back (probably just the existing reference one).

  • Running two doc builds will increase the run time and make us move closer to our 10min build time budget. I've had a look and the slowest total build duration we've currently recorded is 6mins. That is probably OK (most of that time is probably the build itself and doubling the doc gen will likely be OK).

    Normally, it would be a problem to only look at the "survivors" (i.e. those that managed to report back within the 10min deadline) but in this case those that take longer are already out of scope, so making it take even longer is not a problem. However, given that we're planning some infra changes we may actually pull them (if there are any, it's hard to tell) back into the 10min window and then we'd potentially knock them out again. Perhaps not a huge issue but just something to be aware of.

  • My biggest concern was the linkable-paths.json file that we report back. I was worried that it might contain references to the reference we're building. If that was the case we would have to report the one with the real reference (i.e. say 1.2.3 vs ~latest), because otherwise all past release versions would have the same references in the file. Luckily the file doesn't contain any reference to reference so we're OK here. (It might also not be a problem in the first place due to the way we're using linkable-paths.json - it would be used to point to the ~latest anyway.)

    However just generally we have to be careful in deciding for which tag doc set we report back. While they're equivalent in terms of doc content they will contain different variants and we're going to save these in the Version object.

Overall this is quite the change to how we're generating docs and I wish there was an alternative way to duplicate a doc set that we could simply tack on to the existing process.

It's really unfortunate that the doc archives have structural components embedded that don't make them re-hostable. I don't know if this is fundamentally impossible to change but I wonder if it'd be worth at least bringing up with the docc folks. Maybe there's an upstream change possible such that the doc gen complexities on our end could at least be only temporary if not outright avoided.

@finestructure
Member

Another route worth exploring: Right now we generate docs as follows:

  • SwiftPM based
    • swift package generate-documentation
  • Xcodebuild based
    • xcodebuild docbuild
    • docc process-archive

It is my understanding that both processes essentially call out to docc under the hood.

I'm pretty sure we could also generate docs as follows in case of SwiftPM based builds:

  • docc convert
  • docc process-archive

The advantage would be that both now have the same second stage, docc process-archive and I believe that's the stage that actually takes the hosting base path parameter. In fact we know this, because xcodebuild docbuild doesn't take one. The info in the doc archive after xcodebuild docbuild must be free of base path parameters.

If docc convert is equivalent to xcodebuild docbuild and both produce the same input archive for docc process-archive, we could make this whole process easier.

Both paths of doc generation would generate a doc archive and then we run two passes of docc process-archive to write out two different doc sets with adjusted base paths. It would save us having to re-run doc generation for each target and do all the merging.

The downside of this process is that we might be diverging from how users generate docs and therefore make it harder to compare results in case there are problems (probably not a huge downside, tbh).
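
The two-pass idea could look roughly like this. The Python sketch only assembles the command lines and runs nothing. The helper name, the `checkout/.docs` output root, and the `latest` reference are placeholders, and whether `docc process-archive` produces a usable result from a given input archive is exactly the open question here.

```python
def process_archive_commands(
    archive: str,
    owner: str,
    repo: str,
    references: list[str],
    output_root: str,
) -> list[list[str]]:
    """Build one `docc process-archive` invocation per hosting base path.

    Hypothetical helper: owner and repo are lowercased to match the
    lowercase paths seen in the bucket; the reference is left as-is.
    """
    commands = []
    for reference in references:
        base_path = f"{owner.lower()}/{repo.lower()}/{reference}"
        commands.append([
            "docc", "process-archive", "transform-for-static-hosting",
            archive,
            "--output-path", f"{output_root}/{base_path}",
            "--hosting-base-path", base_path,
        ])
    return commands


for cmd in process_archive_commands(
    "SemanticVersion.doccarchive",  # placeholder archive name
    "SwiftPackageIndex",
    "SemanticVersion",
    ["0.4.0", "latest"],
    "checkout/.docs",
):
    print(" ".join(cmd))
```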

@finestructure
Member

finestructure commented Jan 31, 2024

It still feels like we're fighting a downstream problem that could perhaps be better addressed upstream in docc. For example, I've generated docs for the same package twice purely with different base paths. The only difference in the output was the base paths in the index.html files. All other files were the same (ignoring JSON key order randomness).

If somehow instead of taking full base paths the index.html files were based on a configurable variable, we would be able to drive any hosted archive off of any base path we choose either by injecting it at generation time and duplicating it or, ideally, dynamically at runtime by injecting a base path parameter when we fetch the docs. (I'm not going to suggest rewriting the index.html files on the fly 😬) I'm not sure how routing works in the JS app but maybe it'd be flexible enough to read the base path from a single location and augment all paths with it?

Pinging @ethan-kusters and @franklinsch et al - is this something worth discussing?

@finestructure
Member

docc process-archive does something with a doc archive generated by our builder but it's not a result we would be able to rely on. It seems to be rewriting urls but it's also trimming most of the link and script tag content. It also doesn't rewrite the top level index.html file in documentation.

For reference, I ended up running

xcrun docc process-archive transform-for-static-hosting checkout/.docs/swiftpackageindex/semanticversion/0.4.0 --output-path checkout/.docs/swiftpackageindex/semanticversion/~release --hosting-base-path swiftpackageindex/semanticversion/~release

on an archive I created via

swift run builder generate-docs -s 5.9 -p macos-spm -c https://github.com/SwiftPackageIndex/SemanticVersion.git -r 0.4.0 --targets SemanticVersion -f

and the diff of one of the index.html files looked as follows (after running each file through tidy for easier diffing):

11c11
< "/swiftpackageindex/semanticversion/0.4.0/favicon.ico">
---
> "/swiftpackageindex/semanticversion/~release/favicon.ico">
13c13
< "/swiftpackageindex/semanticversion/0.4.0/favicon.svg" color=
---
> "/swiftpackageindex/semanticversion/~release/favicon.svg" color=
18c18
< var baseUrl = "/swiftpackageindex/semanticversion/0.4.0/"
---
> var baseUrl = "/swiftpackageindex/semanticversion/~release/"
19a20,27
> <script defer="defer" src=
> "/swiftpackageindex/semanticversion/~release/js/chunk-vendors.bdb7cbba.js"
> type="text/javascript">
> </script>
> <script defer="defer" src=
> "/swiftpackageindex/semanticversion/~release/js/index.2871ffbd.js"
> type="text/javascript">
> </script>
21,135c29
< "/swiftpackageindex/semanticversion/0.4.0/css/chunk-c0335d80.10a2f091.css"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/css/documentation-topic.1d1eec04.css"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/css/documentation-topic~topic.b6287bcf.css"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/css/documentation-topic~topic~tutorials-overview.d6f5411c.css"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/css/topic.d8c126f3.css"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/css/tutorials-overview.c249c765.css"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/chunk-2d0d3105.cd72cc8e.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/chunk-c0335d80.76a68cc5.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/documentation-topic.57e91f8a.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/documentation-topic~topic.1679ec90.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/documentation-topic~topic~tutorials-overview.90c61522.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-bash.1b52852f.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-c.d1db3f17.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-cpp.eaddddbe.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-css.75eab1fe.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-custom-markdown.7cffc4b3.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-custom-swift.5cda5c20.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-diff.62d66733.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-http.163e45b6.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-java.8326d9d8.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-javascript.acb8a8eb.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-json.471128d2.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-llvm.6100b125.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-markdown.90077643.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-objectivec.bcdf5156.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-perl.757d7b6f.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-php.cc8d6c27.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-python.c214ed92.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-ruby.f889d392.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-scss.62ee18da.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-shell.dd7f411f.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-swift.84f3e88c.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/highlight-js-xml.9c3688c7.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/topic.8cd0c0c4.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/tutorials-overview.2a32cd6f.js"
< rel="prefetch">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/css/index.038e887c.css"
< rel="preload" as="style">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/chunk-vendors.ba2dd0cb.js"
< rel="preload" as="script">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/js/index.e8a5d294.js"
< rel="preload" as="script">
< <link href=
< "/swiftpackageindex/semanticversion/0.4.0/css/index.038e887c.css"
---
> "/swiftpackageindex/semanticversion/~release/css/index.ff036a9e.css"
137,139d30
< <style type="text/css">
< .noscript{font-family:"SF Pro Display","SF Pro Icons","Helvetica Neue",Helvetica,Arial,sans-serif;margin:92px auto 140px auto;text-align:center;width:980px}.noscript-title{color:#111;font-size:48px;font-weight:600;letter-spacing:-.003em;line-height:1.08365;margin:0 auto 54px auto;width:502px}@media only screen and (max-width:1068px){.noscript{margin:90px auto 120px auto;width:692px}.noscript-title{font-size:40px;letter-spacing:0;line-height:1.1;margin:0 auto 45px auto;width:420px}}@media only screen and (max-width:735px){.noscript{margin:45px auto 60px auto;width:87.5%}.noscript-title{font-size:32px;letter-spacing:.004em;line-height:1.125;margin:0 auto 35px auto;max-width:330px;width:auto}}#loading-placeholder{display:none}
< </style>
142,148c33
< <noscript>
< <div class="noscript">
< <h1 class="noscript-title">This page requires JavaScript.</h1>
< <p>Please turn on JavaScript in your browser and refresh the page
< to view its content.</p>
< </div>
< </noscript>
---
> <noscript>[object Module]</noscript>
150,156d34
< <script src=
< "/swiftpackageindex/semanticversion/0.4.0/js/chunk-vendors.ba2dd0cb.js"
< type="text/javascript">
< </script><script src=
< "/swiftpackageindex/semanticversion/0.4.0/js/index.e8a5d294.js"
< type="text/javascript">
< </script>

@finestructure
Member

finestructure commented Feb 5, 2024

I've tested the overhead of generating a ~release doc set in addition to a normal set for a tag for our largest doc set: swift-syntax:

[Screenshot: CleanShot 2024-02-05 at 18 39 25@2x]

It adds around 2 minutes of additional time, including all doc generation and uploading the 128 MB of docs. Subsequent processing doesn't impact our time limit and it happens asynchronously.

The total time for swift-syntax is just under 6 minutes, so we're well clear of the 10min limit here. NB: swift-syntax is likely one of the most critical packages but there's a chance that a package with a slower build time might be more at risk of going over the limit.

I'll try to get typical build times for packages with docs.

@finestructure
Member

FYI, I've chosen ~release as the name in case we want to at some point manage refs to the latest docs for any of the other significant versions as well: ~release, ~preRelease (~pre-release), ~defaultBranch (~default-branch).

@finestructure
Member

The ~, however, poses a problem when pushing the files to S3. We'll either need to figure out how to properly encode it or choose some other way to avoid branch name collisions:

[Screenshot: CleanShot 2024-02-05 at 19 01 13@2x]

@finestructure
Member

New run timed out, this is going to be a problem: https://gitlab.com/finestructure/swiftpackageindex-builder/-/jobs/6100021479

@finestructure
Member

We could of course increase the timeout but the problem with that is that it'll then cause more trouble when we hit a slow build and make the delays worse. However, it should be possible to set the timeout dynamically based on package details such that we could give only the packages that are generating docs more time.

That in combination with the new machines should prevent us from running into timeout problems here.
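
A dynamic timeout along those lines might be as simple as the following Python sketch. The 10 minute base is the budget mentioned in this thread; the extra 5 minutes for doc-generating packages is an arbitrary placeholder, not a measured or agreed value.

```python
BASE_TIMEOUT_SECONDS = 10 * 60   # the 10min budget mentioned in this thread
DOC_ALLOWANCE_SECONDS = 5 * 60   # assumed extra time; placeholder value


def build_timeout(generates_docs: bool) -> int:
    """Return the job timeout in seconds, granting extra time only to
    packages that generate docs."""
    extra = DOC_ALLOWANCE_SECONDS if generates_docs else 0
    return BASE_TIMEOUT_SECONDS + extra


print(build_timeout(generates_docs=False))
print(build_timeout(generates_docs=True))
```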

@finestructure
Member

Looking at the slowest doc builds, swift-syntax isn't actually in the top 10:

| build_duration | platform | swift version | runner_id | builder_version | package_name | reference | latest | job_url |
|---|---|---|---|---|---|---|---|---|
| 360.6515439748764 | macos-spm | 5.9 | J1XnyXFH | 4.28.8 | AppStoreConnect | 0.4.1 | release | https://gitlab.com/finestructure/swiftpackageindex-builder/-/jobs/6036436810 |
| 282.830687046051 | macos-spm | 5.9 | TDmZkXJm | 4.28.8 | AppStoreConnect | main | default_branch | https://gitlab.com/finestructure/swiftpackageindex-builder/-/jobs/6036436784 |
| 197.28208303451538 | macos-spm | 5.9 | J1XnyXFH | 4.28.9 | Vercel | main | default_branch | https://gitlab.com/finestructure/swiftpackageindex-builder/-/jobs/6039196540 |
| 158.17895805835724 | macos-spm | 5.9 | J1XnyXFH | 4.28.8 | MetaCodable | main | default_branch | https://gitlab.com/finestructure/swiftpackageindex-builder/-/jobs/6036285445 |
| 148.69907307624817 | macos-spm | 5.9 | TDmZkXJm | 4.28.7 | Verge | main | default_branch | https://gitlab.com/finestructure/swiftpackageindex-builder/-/jobs/5970370087 |
| 133.36819994449615 | ios | 5.9 | J1XnyXFH | 4.28.9 | Sublimation | 2.0.0-alpha.1 | pre_release | https://gitlab.com/finestructure/swiftpackageindex-builder/-/jobs/6092051981 |
| 131.16940808296204 | macos-spm | 5.9 | J1XnyXFH | 4.28.9 | swift-composable-architecture | 1.7.3 | release | https://gitlab.com/finestructure/swiftpackageindex-builder/-/jobs/6100964488 |
| 128.8557449579239 | macos-spm | 5.9 | J1XnyXFH | 4.28.9 | swift-openapi-request-dl | 1.0.0 | release | https://gitlab.com/finestructure/swiftpackageindex-builder/-/jobs/6069665455 |
| 128.55133306980133 | macos-spm | 5.9 | TDmZkXJm | 4.28.9 | swift-composable-architecture | main | default_branch | https://gitlab.com/finestructure/swiftpackageindex-builder/-/jobs/6090102780 |
| 120.5449548959732 | macos-spm | 5.9 | J1XnyXFH | 4.28.9 | swift-otel | main | default_branch | https://gitlab.com/finestructure/swiftpackageindex-builder/-/jobs/6112471909 |

The slowest at 6min is actually already sitting at 7.5min job duration (due to cloning, reporting etc overhead), so we're awfully close or over if we duplicate doc generation.

@finestructure
Member

FYI, I've had to change the url fragment to _release in order to work around the S3 upload issues.

@finestructure
Member

I've looked into the documentation routing issue we discussed on Monday and unless I'm mistaken (which I hope 😅), the suggested solution outlined in the Custom Routing docs and in David's WWDC video won't work for us.
The problem is that the example deals with routing to a single doc archive on a site. For example, translated to our site for the package SemanticVersion, we have the following incoming request:

[ INFO ] GET /SwiftPackageIndex/SemanticVersion/0.4.0/documentation/semanticversion [component: server, request-id: D637AAA3-284B-46D5-BDBF-AEC5C03F842D]

If I route this to a doc archive without a base path (i.e. generated simply via xcodebuild docbuild), the webapp tries to make subsequent requests from /js, /css etc:

[ INFO ] GET /js/chunk-vendors.bdb7cbba.js [component: server, request-id: 35F8B45D-EF47-45E8-B8CB-44F930DCB576]

Now in the case of a single doc site, the custom routing docs simply route all /js, etc requests to the doc archive:

# Route files within the documentation archive.
RewriteRule ^(css|js|data|images|downloads|favicon\.ico|favicon\.svg|img|theme-settings\.json|videos)\/.*$ SlothCreator.doccarchive/$0 [L]

However, we can't do that, because we're hosting hundreds of archives, and different versions, and so we need the base path to know which doc archive to route to.

I've cross-posted this to the DocWG's slack here: https://swift-open-source.slack.com/archives/C04PCMXMBD0/p1707998191445999
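
The routing problem described above can be shown with a short Python sketch. The resource directory names are taken from the RewriteRule quoted earlier in this comment; the function name and the assumption that the base path is exactly `/{owner}/{repo}/{reference}` are for illustration only.

```python
# Directory names from the RewriteRule in the custom routing docs.
DOCC_RESOURCE_DIRS = {"css", "js", "data", "images", "downloads", "img", "videos"}


def archive_for_request(path: str):
    """Return (owner, repo, reference, resource) for an asset request, or
    None when the path carries no base path that identifies the archive."""
    parts = path.strip("/").split("/")
    if parts[0] in DOCC_RESOURCE_DIRS:
        # e.g. /js/chunk-vendors.js: nothing says which package or version
        # this asset belongs to.
        return None
    if len(parts) >= 4 and parts[3] in DOCC_RESOURCE_DIRS:
        owner, repo, reference = parts[:3]
        return owner, repo, reference, "/".join(parts[3:])
    return None


print(archive_for_request("/js/chunk-vendors.bdb7cbba.js"))
print(archive_for_request(
    "/swiftpackageindex/semanticversion/0.4.0/js/chunk-vendors.bdb7cbba.js"
))
```

With a single archive the first case can be routed blindly; with hundreds of archives only the second form carries enough information.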

@finestructure
Member

I've created a branch no-redirect based on the doc re-writing changes in rewrite-doc-index-html and stable-url-doc-hosting that eliminates the redirects off the "canonical url", i.e. http://localhost:8080/SwiftPackageIndex/SemanticVersion/documentation is not a redirect anymore.

The rewriting seems to kick in ok (looking at the source), however the Vue app ends up in an error state for some reason:

[Screenshot: CleanShot 2024-03-11 at 11 27 01@2x]

Not sure what's going on there. There are no errors in the console (unless I'm looking in the wrong place) and there are no 404s or anything in the server logs either. Needs more investigation.

@finestructure
Member

What's interesting is that

curl -s http://localhost:8080/SwiftPackageIndex/SemanticVersion/0.4.0/documentation/semanticversion

and

curl -s http://localhost:8080/SwiftPackageIndex/SemanticVersion/documentation/semanticversion

return the same html except for

  <link rel="canonical" href="/SwiftPackageIndex/SemanticVersion/0.4.0/documentation/semanticversion" />

and the former is displaying correctly.
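
For comparisons like this, a few lines of Python with `difflib` are enough to isolate the differing lines once both responses are saved. The two HTML strings below are made-up stand-ins, not the actual responses.

```python
import difflib


def changed_lines(a: str, b: str) -> list[str]:
    """Return only the added/removed lines between two documents."""
    diff = list(difflib.unified_diff(
        a.splitlines(), b.splitlines(), lineterm="", n=0
    ))
    # The first two entries are the '---' / '+++' file headers; after that,
    # keep the '-' and '+' lines and skip the '@@' hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# Hypothetical sample pages that differ only in the canonical link.
page_a = """<head>
<link rel="canonical" href="/Owner/Repo/0.4.0/documentation/pkg" />
</head>"""
page_b = """<head>
<link rel="canonical" href="/Owner/Repo/documentation/pkg" />
</head>"""

for line in changed_lines(page_a, page_b):
    print(line)
```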

@finestructure
Member

I have a working doc hosting setup now from a stable URL via rewrites that doesn't require us to regenerate docs nor redirect. The one downside is that we need an additional "anchor" in the doc url in order to distinguish doc routes and make them routable in our DocC proxy.

Doc urls with references are unchanged:

http://localhost:8080/SwiftPackageIndex/SemanticVersion/0.4.0/documentation/semanticversion

Doc index.html snippet:

    var baseUrl = "/swiftpackageindex/semanticversion/0.4.0/"
    </script>
    <link href="/swiftpackageindex/semanticversion/0.4.0/css/chunk-c0335d80.10a2f091.css" rel="prefetch"/>

Default docs could be hosted as

http://localhost:8080/SwiftPackageIndex/SemanticVersion/current/documentation/semanticversion

Doc index.html snippet:

    var baseUrl = "/swiftpackageindex/semanticversion/current/"
    </script>
    <link href="/swiftpackageindex/semanticversion/current/css/chunk-c0335d80.10a2f091.css" rel="prefetch"/>
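
The rewrite shown in the two snippets above amounts to swapping the reference segment of the base path. A minimal Python sketch, assuming a plain string replacement is sufficient and that owner and repo appear lowercased in the generated files; the function name is hypothetical and this is not the proxy's actual implementation.

```python
def rewrite_base_path(
    html: str, owner: str, repo: str, reference: str, anchor: str
) -> str:
    """Replace the reference-specific base path with the anchored one."""
    prefix = f"/{owner.lower()}/{repo.lower()}/"
    return html.replace(f"{prefix}{reference}/", f"{prefix}{anchor}/")


snippet = '''var baseUrl = "/swiftpackageindex/semanticversion/0.4.0/"
</script>
<link href="/swiftpackageindex/semanticversion/0.4.0/css/chunk-c0335d80.10a2f091.css" rel="prefetch"/>'''

print(rewrite_base_path(
    snippet, "SwiftPackageIndex", "SemanticVersion", "0.4.0", "current"
))
```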

I was hoping to get

http://localhost:8080/SwiftPackageIndex/SemanticVersion/documentation/semanticversion

to work, but it doesn't. The problem here is that documentation cannot be part of the base path as it's part of the docc url. For example, there's also tutorial/.... So if we wanted documentation to be the "anchor" we'd actually have urls like

http://localhost:8080/SwiftPackageIndex/SemanticVersion/documentation/documentation/semanticversion

This does work but feels like an odd url.

In general, any path element of our choosing will do. For instance _ would also work

http://localhost:8080/SwiftPackageIndex/SemanticVersion/_/documentation/semanticversion

The reason we can't really drop the "anchor" is that it would overlay resource paths with our existing resources. For example, the list of DocC resource paths is

/documentation/**
/tutorial/**
index.html
/css/**
/data/**
/images/**
/img/**
/js/**

Some of these collide with our static resources. I have not tried but I could imagine we might be able to make this work if we either moved our resources to another path or in our routes checked for resources among both docc and our static resources when for example serving css/**.

If we did the latter, we'd be mapping {owner}/{package}/index.html to be the doc page, which should work because we don't really reference package pages via index.html.

We'd also have to ensure that the actual resources have different file names (likely but hard to control since we don't control DocC resource file names).

However, we'd now be mixing the rather messy DocC proxy routes with our existing routes, creating a bigger mess. Figuring out if a js/** 404 is due to a missing static resource, a missing docc resource, or a messed up route suddenly becomes much more difficult to tell. Or rather impossible without debugging into it. It just doesn't feel like a good solution.

Unless I'm missing another option I think we'd have to move our static resources to another base path if we wanted to avoid an additional anchor in our doc urls. Given that, I think I'd opt for

http://localhost:8080/SwiftPackageIndex/SemanticVersion/_/documentation/semanticversion

as the canonical doc url. _ is not too bad, and it is even less likely to collide with a branch name than current, latest, release, or something similar.

Finally, there is some value in being able to tell from the url which part of the routing handles it. I.e. we'd know that any _ route comes from our DocC proxy. That's a separation we'd lose even if we avoided the messier resource overlay by moving our static resources out of the way.

@finestructure
Member

PR #2961 is in preparation for this change. I've run the following additional manual tests to ensure all doc urls keep working:

  • generate restfile via rester-sitemap.swift https://swiftpackageindex.com/{owner}/{repo}/sitemap.xml (i.e. from PROD) using Rester-sitemap - this generates a restfile for each url listed in the sitemap and should give us full coverage of all doc urls
  • lightly edited the resulting file to make them take a ${base_url} parameter and update the tested return codes (200 instead of 301 where applicable etc)
  • split out a partial file to test the redirects
  • generated files for both SemanticVersion (a simple doc set) and HandySwift (a larger doc set with tutorials)

These files (attached below) can then be run against DEV via

env base_url=https://staging.swiftpackageindex.com \
    rester doctest-SemanticVersion-partial.restfile

env base_url=https://staging.swiftpackageindex.com/SwiftPackageIndex/SemanticVersion/0.4.0 \
    rester doctest-SemanticVersion.restfile

env base_url=https://staging.swiftpackageindex.com/SwiftPackageIndex/SemanticVersion/~ \
    rester doctest-SemanticVersion.restfile

env base_url=https://staging.swiftpackageindex.com \
    rester doctest-HandySwift-partial.restfile

env base_url=https://staging.swiftpackageindex.com/FlineDev/HandySwift/4.0.1 \
    rester doctest-HandySwift.restfile

env base_url=https://staging.swiftpackageindex.com/FlineDev/HandySwift/~ \
    rester doctest-HandySwift.restfile

restfiles.zip

@daveverwer daveverwer linked a pull request May 20, 2024 that will close this issue