Page resources duplication for translations (even in 0.123) #11959

Closed
TiGR opened this issue Feb 1, 2024 · 19 comments

TiGR commented Feb 1, 2024

Issue #11453 already deals with a somewhat similar problem. It is marked as solved in 0.123.0, but that solution only works with Markdown (.md) files. In our case, all translations are in HTML format, and all resources are duplicated into every language, with no way to prevent it. For us this takes dozens of gigabytes of extra space, creating all sorts of problems. We even tried to avoid this by loading the resources only from the base language, but it makes no difference: Hugo still duplicates all the resources into every language.

What version of Hugo are you using (hugo version)?

$ hugo version
hugo v0.123.0-DEV+extended linux/amd64 BuildDate=unknown

Does this issue reproduce with the latest release?

Yes.

@bep bep self-assigned this Feb 1, 2024
@bep bep added this to the v0.123.0 milestone Feb 1, 2024
Member

bep commented Feb 1, 2024

I'm not sure I follow.

I may be misunderstanding this, but then you need to provide me with a full and running example project.

Author

TiGR commented Feb 1, 2024

We have something like this:

content/
└── news/
    └── 2023/
        └── 01/
            └── 1/
                ├── image1.jpg
                ├── image2.jpg
                ├── index.md
                ├── index.es.html
                └── index.fr.html

We get this built:

news/
└── 2023/
    └── 01/
        ├── image1.jpg
        ├── image2.jpg
        └── 1.html
es/
└── news/
    └── 2023/
        └── 01/
            ├── image1.jpg
            ├── image2.jpg
            └── 1.html
fr/
└── news/
    └── 2023/
        └── 01/
            ├── image1.jpg
            ├── image2.jpg
            └── 1.html

All images are duplicated in all languages.

Member

bep commented Feb 1, 2024

Is this a multihost setup? As in: Do you have multiple baseURLs?

Author

TiGR commented Feb 1, 2024

No, we have only one baseURL

Author

TiGR commented Feb 1, 2024

BTW, you tagged it as a regression, but it is not: resources were always duplicated for us. We hoped that 0.123 would fix this, but it does not.

Member

bep commented Feb 1, 2024

> We have something like this:

Again, we have certainly tested this. This behaviour is disabled if you have either

  • Multihost setup
  • Or markup.goldmark.duplicateResourceFiles config is set to true.

If you can give me a failing and running test case, I can look at it. But I cannot guess from the above.

@gohugoio gohugoio deleted a comment from XoL1507 Feb 1, 2024
bep added a commit to bep/hugo that referenced this issue Feb 1, 2024
bep added a commit to bep/hugo that referenced this issue Feb 1, 2024
@bep bep closed this as completed in #11969 Feb 1, 2024
bep added a commit that referenced this issue Feb 1, 2024
Author

TiGR commented Feb 2, 2024

@bep this issue is still there with the latest commits.

I've created a reproduction case for this issue.

The problem is that we use a lot of images through shortcodes, since we need complicated markup to be generated (carousels, complicated images, etc.). All of these images are duplicated into every language. The reproduction shows it: the file featured.png from news/1 is published both as public/news/1/featured.png and as public/es/news/1/featured.png.
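
One way to quantify the duplication described above is to hash every file in the build output and group files with identical content. The following is only a minimal sketch, assuming the site was built into a directory such as public (as in the paths above). The names file_digest and find_duplicates are hypothetical helpers, not part of Hugo.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files under root by content; return only groups with more than one file.

    Paths in the result are relative to root and use forward slashes.
    """
    root = Path(root)
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path.relative_to(root).as_posix())
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling find_duplicates("public") after a build of the reproduction should, if the behaviour is as described, report a group containing both copies of featured.png. Hashing only detects byte-identical copies, so files that Hugo processed differently per language would not be grouped.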

@bep bep reopened this Feb 2, 2024
Member

bep commented Feb 2, 2024

Yea, sorry, I closed the wrong issue. Thanks for the repro, I will look at it later today.

Member

bep commented Feb 2, 2024

OK, I have found the culprit. I had a head-scratching moment with this one. @jmooring if I could borrow your brain for a minute.

We added a markup.goldmark.duplicateResourceFiles config option, which defaults to false for Markdown/Goldmark, but

  • is true in multihost sites
  • is always true for other markup formats (e.g. HTML)

The last bullet is this issue. I guess @TiGR's challenge is that there's no way for him to configure it his way. We could always push this problem to another day, since it behaves like before for HTML and friends.
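
For readers following along, the setting named above sits under the Goldmark section of the site configuration. A minimal sketch, assuming a TOML site config file (the file name hugo.toml is an assumption here). The value shown is the default described in this comment, and per the bullets above it has no effect on HTML content or multihost sites:

```toml
# hugo.toml (assumed file name)
[markup]
  [markup.goldmark]
    # false: page resources of Markdown content are not copied into every language
    duplicateResourceFiles = false
```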

@bep bep added the NotSure label Feb 2, 2024

artch commented Feb 2, 2024

I'm watching this thread and my 2 cents is that it's also an issue for me. Having the same localized page in different formats (md/html) makes perfect sense for me. Page resources management should not be related to page markup format, it feels like an unexpected behavior.

Member

bep commented Feb 2, 2024

> Page resources management should not be related to page markup format, it feels like an unexpected behavior.

It behaves the same as before, which I guess is not unexpected. The reasoning behind the current behaviour is that for Markdown/Goldmark we have render hooks, which provide "portable links". You have fixed it with a shortcode, but this isn't the case for all HTML users.


artch commented Feb 2, 2024

> It behaves the same as before, which I guess is not unexpected.

Before this change it worked the same way for Markdown and HTML: a non-optimized way, but at least the same way. After this change it will behave differently depending on file format, which will cause confusion. The fact that it's tied to Goldmark configuration is somewhat hidden and takes effort to discover.

In my opinion if there is no way to make it work for both Markdown and HTML pages identically, this option should be disabled by default. However, for me personally proper full support of this feature would be a game changer in regard to resources consumption optimization in CI/CD pipelines.

Member

bep commented Feb 2, 2024

@artch yea, I don't think we disagree about this; it's just that we (mostly I and @jmooring) have spent a fair amount of time thinking about ways to do this (and some other changes) with the least amount of noise from people shouting that we somehow broke their site. We can still fix this, though.

We could possibly add a duplicateResourceFiles (default false) to the top level, so the defaults would be:

markup.duplicateResourceFiles=true
markup.goldmark.duplicateResourceFiles=false

Author

TiGR commented Feb 2, 2024

That would be perfect!

Member

jmooring commented Feb 2, 2024

> We could possibly add a duplicateResourceFiles (default false) to the top level, so the defaults would be:
>
> markup.duplicateResourceFiles=true
> markup.goldmark.duplicateResourceFiles=false

The proposed default value of markup.duplicateResourceFiles is not clear to me.

Member

bep commented Feb 2, 2024

> The proposed default value of markup.duplicateResourceFiles is not clear to me.

Thinking about it, it's not clear to me either. I'm closing this for now, as this certainly works as designed and this isn't breaking behaviour vs how it behaved.

We need to finish this release and get it out the door without creating new problems.

Please create a new proposal for the above use case outlining how this should work.

@bep bep closed this as completed Feb 2, 2024

artch commented Feb 2, 2024

@bep I understand that you have put effort into designing this. You surely have reasons to implement things as you see fit. I am just providing feedback as an outsider: linking page resource management features to formatting configuration options looks highly counterintuitive to a Hugo site maintainer, even though it seems logical to you. If there is a chance that it will have to be redesigned later, it's better to spend additional time now and make it right from the beginning.

Member

jmooring commented Feb 2, 2024

> linking page resource management features to formatting configuration options looks highly counterintuitive

Well, we could put config options anywhere we want. But it would not change the underlying relationship with content format. Markdown is the only content format that provides render hooks, the critical components that allow the de-duplicated, multilingual, single-host approach to work without shortcodes. And the chances of creating render hooks for html, adoc, pdc, rst, and org are zero.

What we have now is backwards compatible.

You're certainly welcome to suggest changes, but you'll need to create a new issue (proposal) for that.

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 24, 2024