Make sitemap.xml.gz slightly more reproducible #3460
The gzip format stores a timestamp inside it, but there's no real point to it being correct. If a site is rebuilt twice with exactly the same input, the timestamp *metadata* of the files will differ, sure, but this gzip file was the only one whose *actual content* also differed each time.
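To make the point concrete, here is a minimal sketch using only Python's standard `gzip` module. It is not code from this PR; the helper names `gzip_bytes` and `header_mtime` are made up for illustration. It compresses the same payload with two different timestamps and shows that only the 4-byte MTIME field of the gzip header (offset 4, little-endian, per RFC 1952) differs.

```python
import gzip
import io
import struct


def gzip_bytes(data: bytes, mtime: float) -> bytes:
    """Compress data into a gzip stream with an explicit header timestamp."""
    buf = io.BytesIO()
    # filename="" keeps the optional file-name field out of the header.
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


def header_mtime(blob: bytes) -> int:
    """Read the MTIME field: 4 bytes, little-endian, at offset 4."""
    return struct.unpack("<I", blob[4:8])[0]


payload = b"<urlset></urlset>"
first = gzip_bytes(payload, mtime=1_700_000_000)
second = gzip_bytes(payload, mtime=1_700_000_060)

assert first != second          # same content, different bytes on disk
assert first[:4] == second[:4]  # magic, method, and flags are identical
assert first[8:] == second[8:]  # everything after MTIME is identical
assert header_mtime(first) == 1_700_000_000
```

If `mtime` is left unset, `GzipFile` falls back to the current time, which is why two otherwise identical builds produce different archives.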
Aren't you worried that setting static metadata on a file could cause unexpected behavior in third-party implementations? Off the top of my head, indexing robots might look at it to decide whether to refresh a sitemap and re-index, but I don't really know, just sharing a thought.
Valid point. And I don't know how to be fully sure. There are like 4 places where timestamps could come into question:
Since code is sometimes faster than words, I'd like to humbly propose another approach with #3468. If you like the idea, I can work on fixing the tests and improving the code, of course.
I had an even better idea: the reproducible value won't be something fake, but instead the max of all the dates mentioned in the sitemap. I'll try to rework this PR around that idea.
Oh what, the sitemap populates |
Now instead the date of the gzip file will change only once per day, based on the pages' update date. The sitemap.xml itself also changes once per day already.
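A rough sketch of that idea, assuming the per-page update dates are available as `datetime.date` objects. The function name `reproducible_gzip` and the choice of midnight UTC are illustrative assumptions, not the actual implementation in this PR.

```python
import gzip
import io
from datetime import date, datetime, timezone


def reproducible_gzip(data: bytes, lastmod_dates: list[date]) -> bytes:
    """Compress data, stamping the gzip header with the newest page date.

    Hypothetical helper: the timestamp is midnight UTC of the latest date,
    so the output only changes when the content or that date changes.
    """
    newest = max(lastmod_dates)
    mtime = datetime(
        newest.year, newest.month, newest.day, tzinfo=timezone.utc
    ).timestamp()
    buf = io.BytesIO()
    # filename="" keeps the optional file-name field out of the header.
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


sitemap = b"<urlset></urlset>"
dates = [date(2024, 1, 1), date(2024, 1, 2)]

# Two "builds" with the same content and dates give identical bytes.
assert reproducible_gzip(sitemap, dates) == reproducible_gzip(sitemap, dates)
```

With this approach the header still carries a meaningful date for any consumer that reads it, while rebuilding an unchanged site leaves the archive untouched.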
Sounds like a nice bargain indeed
Are there plans to release a new version that includes this change? Avoiding needless changes to the compressed archive in static websites maintained on GitHub would be nice.