Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thoughts on thread-unsafe plugin usage patterns #5

Open
yyx990803 opened this issue Apr 1, 2024 · 9 comments
Open

Thoughts on thread-unsafe plugin usage patterns #5

yyx990803 opened this issue Apr 1, 2024 · 9 comments

Comments

@yyx990803
Copy link

Context: https://github.com/sapphi-red/parallel-js-plugin-experiment/blob/main/research-for-design.md

  • It's great that many common caes can be solved with module.meta!

  • Agree that we should make chunk.meta work like module.meta and maybe also need something like currentBuild.meta (exposed via plugin this context maybe?), essentially officially supported mechanisms for storing thread-safe data.

  • Regarding plugin-vue - we do have the option to re-design plugin-vue to be more threading-friendly. But for now, cache-miss is not the end of the world.

  • We don't need to worry about @rollup/plugin-node-resolve or @rollup/plugin-commonjs - these are built-in features of Rolldown so they are not going to be needed anyway. Similar for @rollup/plugin-replace, vite:esbuild, vite:esbuild-transpile etc.


How can we handle (A7) pattern in parallel plugins? Also for builtin rust plugins.

This seems to be the biggest blocker. I think we may need to identify cases of:

  1. Using api for exposing / mutating serializable options
  2. Using api for injecting functions (e.g. sveltePreprocess)

(1) Seems to be straightforward, but (2) is going to be problematic. Can you link to some examples of how Qwik uses this pattern?

How can we handle (A8) pattern?

I think we'll have to enforce serializable config formats (I believe both PostCSS and Babel supports that, i.e. specifying plugins via package names so the loading of the plugin is deferred to be controlled by the tools) - luckily should be less of an issue when users move towards Lightning CSS and Oxc for the transforms?

Re simple functions in configs - we might be able to serialize that? But maybe this will just be a hard limitation.

How can we handle (A9) pattern?

vite-plugin-inspect may have to be implemented as a built-in. This should not be common.

How can Vite use Rolldown? ((A10) pattern)

Not sure if I understand the issue with (A10a) - couldn't Rolldown just ignore those hooks like Rollup does? For example, configResolved is called by Vite during config resolution and don't really need to be parallelized.

My general thought on Vite internal plugins is that Rolldown will expose a "plugin container" API - initially this can be a Node.js API where Vite core use to replace the current PluginContainer implementation, but at a later stage we introduce a small Rust core in Vite which uses Rolldown as a dependency - the PluginContainer will then be used via Rust APIs and have as many internal plugins Rustified as possible.

How well does vitejs/vite#16089 work with parallel plugins?

We will need input from @patak-dev and @sheremet-va on this.

@patak-dev
Copy link
Sponsor

Context: https://github.com/sapphi-red/parallel-js-plugin-experiment/blob/main/research-for-design.md

This is an incredible thorough research! It is really useful too for the Environment API work.

  • It's great that many common caes can be solved with module.meta!

It would be good if we start moving what we can in Vite core to use it too. One issue here is that the current implementation during dev is buggy (for example, meta is shared between client and ssr). We'll work on this for Vite 6 so we can recommend this pattern to the ecosystem.

  • Agree that we should make chunk.meta work like module.meta and maybe also need something like currentBuild.meta (exposed via plugin this context maybe?), essentially officially supported mechanisms for storing thread-safe data.

We are now exposing the current environment (during dev and build) in the plugin context. See this.environment PR for context.
During build, the environment is of type BuildEnvironment and it is basically an abstraction of the current build. We can add this.environment.meta there. Something interesting here is that we may try to push for aligning dev and build regarding having a single plugin pipeline during build too. We want to be able to run rollup/rolldown builds in parallel, and use this.environment in plugins to make it safe to use the same plugin concurrently in several builds. This is the same we already do during dev (with client/ssr, and that we are lifting now to any number of environments). See this thread for context.
We were also already discussing with @sheremet-va if we should expose an environment cache during dev, but concluded we better avoid giving an easy way to have a global cache and better ask plugin authors to use environment-keyed WeakMaps (we are going to replace all internal WeakMap<ResolvedConfig, Data> caches with this pattern in core):

const countResolveIdCallsPlugin = () => {
  const cache = new WeakMap<Environment, Data>()
  return {
    name: 'count-resolve-id-calls',
    buildStart() {
      cache.set(this.environment, initData())
    },
    resolveId() {
      cache.get(this.environment).ids++
    }
    buildEnd() {
      console.log(this.environment, cache.get(this.environment).ids)
    }
  }
}

The idea here is that this cache is only for the plugin to consume as @sapphi-red described in the doc, so it is thread safe. Sometimes the plugins that need to share a cache are also a small group of related plugins, separated because it is easier to implement as separate plugins. I think thinking on running groups of plugins in the same thread instead of a thread per plugin is good. Maybe plugins could tag themselves with a group name, and all with the same name are run together so they can easily share their internal caches?

My general thought on Vite internal plugins is that Rolldown will expose a "plugin container" API - initially this can be a Node.js API where Vite core use to replace the current PluginContainer implementation, but at a later stage we introduce a small Rust core in Vite which uses Rolldown as a dependency - the PluginContainer will then be used via Rust APIs and have as many internal plugins Rustified as possible.

I'm lately thinking that an option is also for PluginContainer to end up living in Vite core. The same idea, just that it may be more Vite related than Rolldown related and if we already have Vite internal plugins in Rust, having PluginContainer in core may not be a stretch.

How well does vitejs/vite#16089 work with parallel plugins?

We'll continue to open discussions like the this.environment PR where possible. I think we shouldn't release Vite 6 with the Environment API until we are sure that it is future proof for Rolldown. For what we're seeing so far, there are a lot of core vite plugins that are reducing their dependency on the whole Vite server when we migrate them to using the Environment API. My current thinking is that Rolldown may end up having the concept of Environment internally (or at least a partial Environment that Vite can then extend). For Rolldown, a build will be single Environment, but we will need the concept to be there if we want this.environment.meta for example.

@yyx990803
Copy link
Author

I think thinking on running groups of plugins in the same thread instead of a thread per plugin is good. Maybe plugins could tag themselves with a group name, and all with the same name are run together so they can easily share their internal caches?

Yes - I thought of this too. In a way they are meant to be run in the same thread like sub-plugins of a bigger one.

I'm lately thinking that an option is also for PluginContainer to end up living in Vite core. The same idea, just that it may be more Vite related than Rolldown related and if we already have Vite internal plugins in Rust, having PluginContainer in core may not be a stretch.

I'm thinking more about consistency and avoiding duplication, since Rolldown already implements logic that loads and applies plugins - if we implement the container in Vite, how can we leverage the underlying parallelization logic implemented by Rolldown?

@patak-dev
Copy link
Sponsor

I'm thinking more about consistency and avoiding duplication, since Rolldown already implements logic that loads and applies plugins - if we implement the container in Vite, how can we leverage the underlying parallelization logic implemented by Rolldown?

Yes, you're right here. Anything that could be shared with build and exposed to Vite would be great. Some of the Vite specific things:

  • We have a scan flag we add to the context while scanning. Rolldown could provide a better way to extend the context and forward flags. Or we can rework things on Vite's side. pluginContainer.resolveId() currently passes this._scan as a parameter to the resolveId hooks. The resolve plugin could have directly read that from the context as this.scan.
  • We use Vite's server moduleGraph to implement things like getModuleIds(), we call moduleGraph.ensureEntryFromUrl when we do a load(), and in other places. We'll need to do the same, or move the dev module graph to Rolldown too. Maybe it is something that will be needed anyways to implement the core plugins
  • We also use the server watcher.
  • Environments are also going to be an integral part of pluginContainer, but I think some concept of it will end up in Rolldown anyways.

So yes, maybe this should be in Rolldown and I'm just not seeing where to draw the line right now, especially for things like the dev ModuleGraph. If we move that to rolldown, it may be an opportunity to see if we could have make the module graph in dev and the module structures in build more aligned. I think there were projects running a dev server during build just to get the module graph populated and use it to do optimizations (I think Astro did this at one point, but they moved away from it).

@sapphi-red
Copy link
Owner

sapphi-red commented Apr 3, 2024

How can we handle (A7) pattern in parallel plugins? Also for builtin rust plugins.

This seems to be the biggest blocker. I think we may need to identify cases of:

  1. Using api for exposing / mutating serializable options
  2. Using api for injecting functions (e.g. sveltePreprocess)

(1) Seems to be straightforward, but (2) is going to be problematic. Can you link to some examples of how Qwik uses this pattern?

Here's the places qwik uses api.

It seems qwik uses to communicate between qwik and qwik city and qwik city's internal plugins and most of them are (1). createOptimizer is not serializable. But maybe it doesn't need to use the same optimizer inistance for all threads.

How can we handle (A8) pattern?

Re simple functions in configs - we might be able to serialize that? But maybe this will just be a hard limitation.

Ah, yeah, given that rolldown can process the config file itself, maybe Rolldown can do some variable inlining so that users can use variables outside the function scope in some simple cases.

How can Vite use Rolldown? ((A10) pattern)

Not sure if I understand the issue with (A10a) - couldn't Rolldown just ignore those hooks like Rollup does? For example, configResolved is called by Vite during config resolution and don't really need to be parallelized.

Many plugins expect configResolved hook to be called at least once to get the config value. Vite needs to call the hooks for plugins running in a non-main thread. But there's no way to do that unless Rolldown lets Vite to call arbitrary hooks for plugins running in threads.

A similar issue exists with ssr flag of resolveId/load/transform hooks (It'll be replaced with this.environment by the environment API PR, but that also has the same issue.). scan flag has the same issue, too. Vite injects that by replacing all plugins hooks. This is no longer possible if a plugin runs in a non-main thread.

@sapphi-red
Copy link
Owner

sapphi-red commented Apr 3, 2024

It would be good if we start moving what we can in Vite core to use it too. One issue here is that the current implementation during dev is buggy (for example, meta is shared between client and ssr).

I guess it should be fine for plugins I put in the examples. In those plugins, it will be only used by client modules. That said, It'd be nice to fix it.

I think thinking on running groups of plugins in the same thread instead of a thread per plugin is good.

That's an interesting idea. I was only thinking about running each plugin in all threads like the figure below. The problem with that is if a plugin that processes many files were slow, the perf won't improve.

image

We were also already discussing with @sheremet-va if we should expose an environment cache during dev, but concluded we better avoid giving an easy way to have a global cache and better ask plugin authors to use environment-keyed WeakMaps (we are going to replace all internal WeakMap<ResolvedConfig, Data> caches with this pattern in core):
The idea here is that this cache is only for the plugin to consume as @sapphi-red described in the doc, so it is thread safe.

Actually, counting the number of resolve id calls matches (A5). Because I was thinking of running a single plugin in all threads, that each WeakMap only has the information of that thread and needs to aggregate the number across all threads to get the total count.

@sapphi-red
Copy link
Owner

I checked how api is used. Most of them were exposing serializable values or updating by serializable values. api.sveltePreprocess expects a function to be exposed but that function only receives serializable values and returns serializable values.
I added how to handle api in the proposal (e1813fa). This should cover most of the cases.

@sapphi-red
Copy link
Owner

I also checked whether meta works (#6). It was able to implement it. Something like this worked fine.

const plugins = [
  {
    name: 'worker1',
    renderChunk(code, context) {
      const meta = context.getModuleInfo('foo').meta
      meta.bar = name // set thread id
    }
  },
  {
    name: 'worker2',
    renderChunk(code, context) {
      const meta = context.getModuleInfo('foo').meta
      return '' + meta.bar // random value is returned because all threads run in parallel
    }
  }
]

@underfin
Copy link

underfin commented Apr 9, 2024

The SharedArrayBuffer may be a better solution for sharing data across the js threads. Here also has some data structure using SharedArrayBuffer allocate memory, ShareMap and SharedMemoryDatastructures.

@sapphi-red
Copy link
Owner

SharedArrayBuffer cannot be accessed from napi (at least now), so I guess we cannot use it for meta. Also it is more difficult to handle for plugin authors. But for context.createSharedState, it might be better to use it instead of having it in rust as it won't require accessing data from rust side in that case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants