Proposal: Website 2020

This will hopefully serve as the documentation for the website once/if executed.

High-level plan

Kill all use and trace of the Apache CMS
Publish html directly to git
Allow for several sources to publish html

The result will be several sources, that can be run and managed independently, feeding content into the git repo housing our live html website.

This is a pragmatic perspective that sets us up to get a best-of-breed outcome acknowledging trends in all our website endevors:

All tools we’ve used have been heavily extended
Content takes a hit each tool change
All tools have limitations (strenghts/weaknesses)
Filling gaps involves extensions (bullet one)
Tools last on average 2-5 years
Many types of content actually exist: javadoc, release notes, download pages
We will always be in a hybrid situation

Think of it as "microservices for content" and avoiding a monolith.

Ideally this sets us up to acknowledge and embrace evolving our website tech without many of the above disadvantages. If we have a clean CSS and simple menu, we should be able to take HTML from anywhere.

When we want to add a new content source we do not need to figure out how to get it to work "through" the existing generator or redo everything that already works, we simply have it generate content directly to html directly to our site git.

As long as we maintain a common CSS and look and feel, we’re good.

DJ

That’s one point of view. I agree with a lot of it, but I’m worried… Perhaps I can characterize it as thinking that more options are better. However, nothing stopped anyone from migrating all the content from CMS to an external .md based system, or from .md to .adoc. But no one did it. Instead, people piled system on top of system, duplicating content in a partially intelligable state, and reducing the amount of organization and coherence at every step. I think reasonable priorities are:

as few systems involved in the website as possible
avoid extensions and writing our own systems, we don’t have time to maintain them let alone document them.
organize the input to the website generation process.
discourage adding more systems.

My most profound teacher said to pay attention to the balance between structure and behavior. We need simple and fairly rigid structure here so behavior can go towards improving the documentation rather than wondering how or why.

I’m very worried about having more than one process push content to the git "website" repo. Right now, we effectively have that process with the svn repo, and there’s some content that seems to have been added and abandoned. No one will tell me about it, including the person who put it there.

As a result, a requirement in my mind is that publishing the website needs to start with deleting everything previously there. The next step is adding the entire current content.

Also, "publishing straight to git" actually means committing to a local repository (with tooling this could be bare). This gives a preview opportunity, followed by pushing to the apache repo.

Kill all use and trace of the Apache CMS

TODO

DJ

I think that my antora preview is 99% there. Having a second opinion would be valuable, but that could be a time consuming endeavor.

Publish html directly to git

Apache allows a project to designate a git repository as their "website." All files in that repository are published as-is to the internet as the project’s website. HTML must be committed to this repository as it does not offer any generation of any kind.

TODO: what is the process for getting one of these repos?

TODO: can we get Infra to do a svn-git migration of our current flat-html?

DJ

IMO having the git repo as anything but "staging" will make the process 100% headed for disaster. What is this html content?

Allow for several sources to publish html

In the new architecture each content generator publishes rendered html directly to the site git.

DJ

Perhaps…. I think that something more like a maven build is more likely to be useful. If there needs to be more than one content generator, then there should be some sort of process to build the entire site and push it to git. Without this, no one will be able to understand how the site is built.

For Antora, it’s easy to set up a gitlab ci pipeline that builds a site and publishies it on netlify. I don’t know if anything similar is possible for github, but I thnk it’s worth investigating. My idea is something like any commit results in updating a preview site. If you like it you can push to "master", so asf shows it.

The following is a rough outline of the types of content:

Versioned documentation for a software distribution
Community/Developer documentation
Website front-page and "marketing" pages such as major features, benefits, etc
Examples
Javadoc
Release notes and download pages
Contributors page

DJ

Another possible division would be

asciidoc content
- human written
- generated
non-asciidoc content
- human written
- generated

A perpendicular categorization is:

Stuff that fits well in Antoras’s information architecture
Stuff that is outside the scope of Antrora’s infomration architecture.

While it’s easy to regard Antora as an asciidoc processor, IMO its more important contribution is providing a consistent and easy to understand information architecture framework.

Versioned documentation for a software distribution

All of our "product documentation" efforts to date have been in some way wiki-like in nature. They allow any kind of content to take any shape and do not encourage structure.

As a result our content is all miscellaneous odds and ends that do not fit together in any significant chapters or flow. Said another way we’re all "blog" and no "book."

The proposal for this is to use Antora tied to an effort to create a documentation outline that encourages contribution on-rails. Gaps in the documentation should be obvious, which hopefully encourages contribution

DJ

I suggest we heavily use Antora’s structuring abilities, using components, modules, and topics as much as poosible. I suggest that we drop any auto-generated TOC or navigation pages and think carefully about organization.

Community/Developer documentation

Learning how our community works and how to contribute (be a developer) is also an experience that really needs to be on-rails.

The proposal for this is to use Antora tied to an effort to create a deliberately smaller outline of how to get involved.

This content should be very focused on "developer onboarding", something all open source projects must nail to grow.

DJ

My idea is that the "common" component ought to be this, but right now it contains "all the old stuff". Some of this indeed appears to be community focused, but a lot is earlier versions of what’s also in the versioned component.

Website front-page and "marketing" pages, features, etc

When people come to the website they must get a human-perfect orientation that gives them the most important information in highlighted form with the least clicking.

There is no proven structure for gaining someone’s immediate attention and not losing them. They need to know "why TomEE", ideally with some pictures or video. There also needs to be a very small handful of pages to highlight features and further pull people in.

The proposal for this is to use the existing Jbake setup as it is free-form and enforces no structure. These pages must be enabled to continuously discard/reinvent (revolve vs evolve) and keep trying different ways to get people’s attention.

DJ

I don’t know enough about what you have in mind for this content to know what to think. There’s the front page, which I think is not generated. Otherwise, what is this content? I’d think if people write it, it will be in asciidoc, and can be rendered with Antora.

Examples

The examples section of the website are arguably the only truly successful part of the site in its current form. Both the Front-page and product-documentation parts of the site fall short of accomplishing what they should do.

The current library of examples is 180 and growing as the #1 place where new contributors find success contributing to TomEE. After improvements made in Dec 2018, contributions over the next 12 months doubled bringing in over 40 contributors all the examples.

The proposal for this is to continue the existing Jbake setup as it has proven to be very successful for this application and more enhancements are planned, such as:

Adding contributors faces to each example page
Automatically linking code to related online javadoc
Automatically suggesting related examples

DJ

I’m biased, but how is the jboake setup better than the Antora setup, which I’ve spent about 0 time on yet? Running everything through Antora will assure a uniform appearance and unified navigation.

Javadoc

The current "tomee-site-generator" will clone 34 repositories and branches across TomEE, Jakarta EE and MicroProfile to generate clean javadoc trees of each one.

The Javadoc tree for TomEE is created taking all modules and combining them into one tree so people get a single, fully-linked javadoc tree and do not need to be burdened by several small modules.

The Javadoc tree for Jakarta EE is created in the same spirit, grabbing the correct release branch of each API and version in Jakarta EE 8 and combining it together into one fully-linked "jakartaee-8.0" tree spanning the full platform.

The Javadoc tree for MicroProfile is created in the same spirit, grabbing the correct release branch of each API and version in MicroProfile 2.0 and combining it together into one fully-linked "microprofile-2.0" tree spanning the full MicroProfile umbrella spec.

Several motivations exist to grabbing the Jakarta EE and MicroProfile javadoc and publishing it on the TomEE site.

Oracle will no longer publish "javaee" docs. There is no plan current in the Jakarta EE side of the fence to publish unified javadoc. There is an industry gap we can fill that will generate website traffic to TomEE.
MicroProfile does not current publish fully-combined javadoc. There is a gab currently. We can fill this as well to provide value to the industry and generate traffic to TomEE.
A future plan for our examples is to link code to javadoc. Linking to javadoc on our own site has the advantage that they never leave the site and links are guaranteed stable.
Reverse linking. The javadoc itself can have links to the relevant examples that show how that class is used. This can be done having an index of each example, what api classes it uses and then inserting multiple @see links in the source prior to javadoc generation.

The proposal is to decouple this code from the current tomee-site-generator code as it is a separate concern, does take a very long time to generate, and following the spirit of this overall proposal should be fully independent and not be mixed in with anything JBake-related.

DJ

I applaud this. Once the javadoc is generated, which I would expect only really needs to happen for a release, there’s the question of how to get it into the site. An advantage of including it in the Antora processed content is that links into it will be checked by Antora while building the site. However, at the moment including pregenerated javadoc is not built into Antora although it’s an experiment I plan to make soon, and definitely in Antora’s future.

Release notes and download pages

The release notes and download page data at one point came entirely from https://svn.apache.org/repos/asf/tomee/sandbox/release-tools/

When this process was working at its best, release notes and download page entries were generated automatically as part of the release process.

Release cadence slowed and these tools decayed due to lack of knowledge transfer in their existence and how to maintain them.

As we increase our release cadence we have renewed need to automate the release overhead of updating download pages and creating release notes.

The proposal is to move this code from svn "sandbox" to a proper git repo and employ automation techniques to cause download pages and release notes to be automatically updated. This time not by a tool run by the person doing the release, but by a CI job based on the same technique we will need to automate publishing of docs or examples when they are updated.

The automated job will run on a timer and simply check dist.apache.org for a new release. It can also be manually triggered and re-run at any time via the corresponding CI job.

DJ

I wondered where those came from, I’ll have to look into this. Another workflow would be to have the tool generate asciidoc and commit it to a repository that is an Antora site source, so site builds will automatically include it and it will have consistent appearance.

Contributors page

We have had several attempts at maintaining a contributors page, none of them successful.

Manual attempts only reflected some individuals. Automated attempts were too clever and have broken over time.

The proposal is to create code to run via a CI job triggered via a git webhook that simply screen-scrapes this page when the TomEE repo is updated:

https://github.com/apache/tomee/graphs/contributors

This will allow us to ensure all 98 and growing contributors are listed and the page is updated when the contributor list changes as PRs are merged.

In the future we can potentially do more to encourage contributors by highlighting them on the TomEE website.

DJ

A better contributors page would be great! Screen scraping that github page strikes me as exceedingly fragile. Linking to it might be an option! I suspect easier than screen-scraping would be extracting the info from git ourselves.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WEBSITE-2020.adoc

WEBSITE-2020.adoc

Proposal: Website 2020

DJ

Kill all use and trace of the Apache CMS

DJ

Publish html directly to git

DJ

Allow for several sources to publish html

DJ

DJ

Versioned documentation for a software distribution

DJ

Community/Developer documentation

DJ

Website front-page and "marketing" pages, features, etc

DJ

Examples

DJ

Javadoc

DJ

Release notes and download pages

DJ

Contributors page

DJ

Files

WEBSITE-2020.adoc

Latest commit

History

WEBSITE-2020.adoc

File metadata and controls

Proposal: Website 2020

DJ

Kill all use and trace of the Apache CMS

DJ

Publish html directly to git

DJ

Allow for several sources to publish html

DJ

DJ

Versioned documentation for a software distribution

DJ

Community/Developer documentation

DJ

Website front-page and "marketing" pages, features, etc

DJ

Examples

DJ

Javadoc

DJ

Release notes and download pages

DJ

Contributors page

DJ