Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The service worker cache is causing hard to debug issues mainly in production environments #12830

Open
ahukkanen opened this issue May 6, 2024 · 1 comment
Labels
type: bug Issues that describe a bug

Comments

@ahukkanen
Copy link
Contributor

ahukkanen commented May 6, 2024

Describe the bug

Decidim is currently using a service worker caching strategy that tries to fetch most user-facing pages from the cache as defined at:

registerRoute(
({ request }) => request.mode === "navigate",
new NetworkFirst({
networkTimeoutSeconds: 3,
cacheName: "pages",
plugins: [
new CacheableResponsePlugin({
statuses: [0, 200]
}),
new ExpirationPlugin({
maxAgeSeconds: 60 * 60
})
]
})
);

This is causing hard to replicate and hard to debug issues mainly in production environments where there is typically more load involved, as the timeout is set to so low.

Generally speaking, this caching strategy is pretty bad because Decidim involves a lot of dynamic functionality, such as submitting a new proposal, voting in a PB, answering a survey or commenting on a topic. If the service worker sees a page load taking too long time, it will automatically revert to the cached page which causes issues with many dynamic functionalities, especially those ones that are trying to load a page causing a redirect.

On the other hand, it is really nice that this strategy exists for the ActiveStorage resources because they can no longer be served through direct file URLs as was the case with CarrierWave. It is a shame because nowadays all file requests are causing some Ruby processing time for the CPU as the files have to be resolved dynamically using a blob identifier. There are ways how we can fix this problem when using external storages (check #12576) but there is currently no way to improve the performance significantly when using a direct file storage, i.e. the default Decidim configuration. With CarrierWave it was rather easy as these files could be directly served by the web server, causing less load on the Ruby backend.

To Reproduce

  1. Create a new budgets component with a single budget in it
  • Configure "vote in all" workflow and use rule "minimum projects" with 1 project at minimum
  1. Create some projects inside the newly created budget
  2. Publish the component
  3. Create a manual/artificial delay at the "checkout" controller that is over the configured cache timeout (i.e. add sleep 3 inside the checkout action)
  4. Go to the participant view and to the newly created component
  5. Make sure you have developer's console closed so that the cache won't be bypassed
  6. If you had developer's console open before, reload the page after closing it
  7. Perform the PB voting process
  8. Click "Vote budget"
  9. See that user gets no feedback for their action because the server took a longer time to process the request
  10. The user is left confused what just happened and if their vote has been cast or not (and the user sees the view as if they didn't vote)
  11. Get tons of messages from citizens who are confused why the system is not working

Expected behavior

Caching is hard. Therefore, we should not do it by default and only in cases where the caching strategy can be planned properly for the actual instance and the actual use case being cached.

Currently there is even no way to control the service worker caching options other than customizing the whole sw.js file producing the service worker entrypoint, as defined here:

// https://developers.google.com/web/tools/workbox/modules/workbox-recipes#pattern_3
registerRoute(
({ request }) => request.mode === "navigate",
new NetworkFirst({
networkTimeoutSeconds: 3,
cacheName: "pages",
plugins: [
new CacheableResponsePlugin({
statuses: [0, 200]
}),
new ExpirationPlugin({
maxAgeSeconds: 60 * 60
})
]
})
);

Instead, the thinking should be the other way around: default to what works the best (i.e. no caching) and then let individual implementers to add caching in case they feel comfortable with it. And only in some specific cases, e.g. where there is no dynamic functionality expected to happen on those pages.

As a starting point, I suggest limiting the caching only to the active storage routes as this generally shouldn't cause any issues, as the information stored through ActiveStorage is rather static. This would mean changing the lines pointed above to:

registerRoute(
  ({ url }) => url.pathname.startsWith("/rails/active_storage/"),
  new NetworkFirst({
    networkTimeoutSeconds: 3,
    cacheName: "storage",
    plugins: [
      new CacheableResponsePlugin({
        statuses: [0, 200]
      }),
      new ExpirationPlugin({
        maxAgeSeconds: 60 * 60
      })
    ]
  }),
);

Screenshots

No response

Stacktrace

No response

Extra data

  • Device: (any)
  • Device OS: (any)
  • Browser: (any, supporting service workers)
  • Decidim Version: 0.27.0+
  • Decidim installation: (any)

Additional context

No response

@ahukkanen
Copy link
Contributor Author

Regarding the caching in general in Decidim, I think there hasn't been any planning on the proper caching strategies because there are many such oversights regarding the caching strategies currently being used.

I believe the thinking process has been that performance is bad -> let's add caching everywhere as a band-aid.

It should be rather:

  • Performance is bad
  • How can we improve performance?
  • How can we test it under actual high loads?
  • How can we validate the effect of performance improvement changes?
  • Implement performance improvements
  • Test the effect of the improvements
  • Repeat until there is nothing more to fix
  • Once all performance issues are fixed, start planning a proper caching strategy
    • What does caching mean in the context of the application?
    • What we can cache?
    • What we definitely cannot cache?
    • How do we implement caching? Front-end, back-end or both?
    • Are there any potential issues with the nit-picked places for caching? How does it work under different parameters, such as different users, different languages, different times, different locations, different user roles, etc.
  • Once you know what can be done regarding caching, implement the defined strategy
  • Test the caching functionality thoroughly

If the goal is to improve the load times for the pages, there are also other alternatives than just caching. E.g. turbolinks (or nowadays hotwired/turbo) or similar technologies can improve the load times in a significant manner apart from the first page load. I know it was never implemented because it was seen as "too hard". The matter of the fact is that caching is harder to implement properly.

@carolromero carolromero added the type: bug Issues that describe a bug label May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Issues that describe a bug
Projects
Status: 🆕 New
Development

No branches or pull requests

2 participants