File tree diff API #11319

Open
stsewd opened this issue May 8, 2024 · 1 comment
Labels
Feature New feature Needed: design decision A core team decision is required

Comments

@stsewd
Member

stsewd commented May 8, 2024

What's the problem this feature will solve?

Currently we can't link users to a specific page that changed in a PR preview, and we can't suggest redirects for files that were renamed or deleted.

Describe the solution you'd like

File tree diff (FTD) is a feature that lets users see the differences between the file trees of the generated documentation of two versions. It shows which files were added or deleted, and it can also list the files that were changed, sorted by the number of lines changed in each file.

I haven't done much research on this, but diffing two file trees should be a solved problem. In the worst case, we can do a manual diff, using sets to compare the file trees and the Unix diff command to compare the files. We could expose this as an API.
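A minimal sketch of the manual approach, assuming both builds are available as local directories. It uses Python's `difflib` in place of the Unix diff command, and the function names (`list_files`, `count_changed_lines`, `diff_trees`) are hypothetical, not existing Read the Docs code:

```python
import difflib
from pathlib import Path


def list_files(root, pattern="*.html"):
    """Return the set of relative POSIX paths under root that match pattern."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob(pattern) if p.is_file()}


def count_changed_lines(old_file, new_file):
    """Count changed lines; a replaced block counts as its longer side."""
    old_lines = Path(old_file).read_text(encoding="utf-8", errors="replace").splitlines()
    new_lines = Path(new_file).read_text(encoding="utf-8", errors="replace").splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed += max(i2 - i1, j2 - j1)
    return changed


def diff_trees(old_root, new_root, pattern="*.html"):
    """Compare two build directories using set operations on their file lists."""
    old_files = list_files(old_root, pattern)
    new_files = list_files(new_root, pattern)
    modified = {}
    # Only files present in both trees need a line-level comparison.
    for path in sorted(old_files & new_files):
        changed = count_changed_lines(Path(old_root) / path, Path(new_root) / path)
        if changed:
            modified[path] = changed
    return {
        "added": sorted(new_files - old_files),
        "deleted": sorted(old_files - new_files),
        "modified": modified,
    }
```

The set difference gives added and deleted files cheaply, so the more expensive line comparison only runs on files that exist in both versions.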

Note that this is a different product from the diff feature for pull requests. That one focuses on diffing the HTML content itself; here we only care about which files changed and how many lines changed in each.

A basic example would be:

Current content:

  • index.html
  • api.html
  • guides.html

New content:

  • index.html (9 lines changed)
  • api.html
  • guides.html (deleted)
  • guides/index.html (added)

Our API will list the files that were removed and the ones that were added. We can scope it to track only HTML files to start, and maybe limit the number of files returned.
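As a rough illustration of that scoping, here is a sketch of how a response payload could be built from two lists of paths. The field names (`added`, `deleted`, `truncated`) and the default `limit` are assumptions for illustration only; no response format has been decided:

```python
def build_response(old_paths, new_paths, limit=100):
    """Build a hypothetical API payload limited to HTML files."""
    # Scope the comparison to HTML files only.
    old_html = {p for p in old_paths if p.endswith(".html")}
    new_html = {p for p in new_paths if p.endswith(".html")}
    added = sorted(new_html - old_html)
    deleted = sorted(old_html - new_html)
    return {
        "added": added[:limit],
        "deleted": deleted[:limit],
        # Tell the client when a list was cut short by the limit.
        "truncated": len(added) > limit or len(deleted) > limit,
    }
```

A `truncated` flag (or something like it) would let clients know that the lists are incomplete when the limit is hit.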

Some tools may change all the pages on each build (for example, by updating the commit on each file). That's where sorting by the number of lines changed comes into play.
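A small sketch of that sorting, assuming we already have a mapping of file path to number of lines changed (the `rank_by_lines_changed` name is hypothetical):

```python
def rank_by_lines_changed(lines_changed, limit=None):
    """Sort files by lines changed (descending), breaking ties by path."""
    ranked = sorted(lines_changed.items(), key=lambda item: (-item[1], item[0]))
    return ranked if limit is None else ranked[:limit]


# Pages where only the commit hash changed sink to the bottom.
ranked = rank_by_lines_changed({"index.html": 9, "api.html": 1, "about.html": 1})
```

Here `ranked` starts with `index.html`, so pages with real content changes surface first even when every page was touched.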

Some feature ideas that we can build on top:

  • Suggest redirects
  • "Go to preview of changed documents"
  • Stats of files/lines that were changed
  • Permalinks that were changed (#foo -> #bar)
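To make the first idea concrete, here is one possible heuristic for suggesting redirects from the lists of deleted and added files. Both matching rules are assumptions chosen for illustration (a page moving to a directory index, and a file with a unique name moving to another directory); a real implementation would likely need more signals:

```python
from pathlib import PurePosixPath


def suggest_redirects(deleted, added):
    """Pair deleted paths with likely replacements among the added paths."""
    added_set = set(added)
    by_name = {}
    for path in added:
        by_name.setdefault(PurePosixPath(path).name, []).append(path)

    suggestions = []
    for old in sorted(deleted):
        old_path = PurePosixPath(old)
        # Rule 1 (assumed): guides.html -> guides/index.html
        candidate = (old_path.with_suffix("") / "index.html").as_posix()
        if candidate in added_set:
            suggestions.append((old, candidate))
            continue
        # Rule 2 (assumed): same file name, added in exactly one other place.
        same_name = by_name.get(old_path.name, [])
        if old_path.name != "index.html" and len(same_name) == 1:
            suggestions.append((old, same_name[0]))
    return suggestions
```

With the basic example above, the deleted `guides.html` would be paired with the added `index.html` under a `guides/` directory.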

Alternative solutions

We have discussed other solutions for this in the past, but they rely on the source files, not on the generated files. That's a problem, since our serving and redirects work over the generated HTML files.

Alternative names

  • RTFD: Read the Docs Tree File Diff
  • FDD: File and directory diff

Additional context

@stsewd stsewd added Feature New feature Needed: design decision A core team decision is required labels May 8, 2024
@humitos
Member

humitos commented May 10, 2024

All of this sounds like a really good idea to me. I see how we can implement some nice features on top of this data 👍🏼

> Haven't done much research about this, but doing a diff over two file trees should be a problem that's solved, worst case we can just do a manual diff using a set for the file tree, and using the unix diff command for the files

We may need to do this at S3, if that's possible. Otherwise, we will need to rclone the two versions first, which may take a lot of time (however, we may be able to filter by *.html). In my mind, something along these lines would be a good first step to research.
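One way this could work without downloading everything is sketched below. It assumes the storage backend can list each version's objects together with a per-object checksum (such as an ETag), and that the listing has already been loaded into a dict of key to checksum; the `diff_listings` name and the input shape are hypothetical:

```python
from fnmatch import fnmatchcase


def diff_listings(old_listing, new_listing, pattern="*.html"):
    """Diff two {key: checksum} listings without downloading any file."""
    old = {k: v for k, v in old_listing.items() if fnmatchcase(k, pattern)}
    new = {k: v for k, v in new_listing.items() if fnmatchcase(k, pattern)}
    return {
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        # Same key with a different checksum: the content changed.
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

This only tells us which files changed, not how many lines changed, but it would narrow down the set of files that need to be downloaded for a line-level diff.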
