Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for emitting modules into a subdirectory #4742

Open
shadaj opened this issue Oct 17, 2022 · 28 comments
Open

Support for emitting modules into a subdirectory #4742

shadaj opened this issue Oct 17, 2022 · 28 comments
Labels
enhancement Feature request (that does not concern language semantics, see "language")

Comments

@shadaj
Copy link

shadaj commented Oct 17, 2022

Was: Linker crashes when emitting modules into a subdirectory

If exporting a method/value under a module name that contains a /, the linker crashes because it attempts to pass the entire name directly to createTempFile. Having a slash in the module name is useful when integrating with frameworks such as Next.js, which use directories to map components to application routes.

For example, running fastLinkJS with a project containing the following export:

object Next {
  @JSExportTopLevel(name = "component", moduleID = "foo/bar")
  def component(): js.Any = ...
}

results in the following crash

[error] java.lang.IllegalArgumentException: Invalid prefix or suffix
[error]         at java.base/java.nio.file.TempFileHelper.generatePath(TempFileHelper.java:60)
[error]         at java.base/java.nio.file.TempFileHelper.create(TempFileHelper.java:125)
[error]         at java.base/java.nio.file.TempFileHelper.createTempFile(TempFileHelper.java:159)
[error]         at java.base/java.nio.file.Files.createTempFile(Files.java:878)
[error]         at org.scalajs.linker.PathOutputDirectory$Impl.$anonfun$writeAtomic$2(PathOutputDirectory.scala:71)
[error]         ...
@sjrd
Copy link
Member

sjrd commented Oct 17, 2022

Uh oh, that was never meant to be allowed. The infrastructure of OutputDirectory is only meant to represent a single, flat directory. We should disallow this at compile time instead of crashing at link time.

Having a slash in the module name is useful when integrating with frameworks such as Next.js, which use directories to map components to application routes.

I sympathize, but IMO this falls under the category of "frameworks that want a specific shape for the output code rather than semantics". We don't guarantee that we will support every possible framework's idea of how the target .js files should "look like".

Have you considered defining the structure separately, as a hierarchy of .js files that import the relevant files of the Scala.js output?

@shadaj
Copy link
Author

shadaj commented Oct 17, 2022

Yeah, so for now I've just been maintaining a bunch of glue JS files that just import and re-export the Scala.js modules. Because in Next.js' case they are so boilerplate-y though (https://github.com/shadaj/slinky/blob/main/docs/pages/docs/%5Bid%5D.js), was thinking it might not be too hard to support this use case in Scala.js since we don't have any restrictions on how the code inside the module is laid out, just where it is placed.

But also see your point that this does introduce some more responsibilities that Scala.js may not want to handle.

@gzm0
Copy link
Contributor

gzm0 commented Oct 19, 2022

My 2cts:

  • My original idea was to not restrict module IDs in the IR specification, very similar to how ESModules do not define what the module string means.
  • Per consequence, how modules are mapped to files is specific to the linker implementation.
  • In the current linker implementation the mapping from module ID to filename is defined via OutputPatterns (and arguably very crude).

I could see the value of extending OutputPatterns to something more general. Challenges:

  • OutputPatterns needs to be serializable, so it's the usual flexibility / complexity tradeoff (we can't just make it a function).
  • The full output universe needs to be understood, otherwise the incremental linker cannot remove unnecessary files anymore.

@keynmol
Copy link

keynmol commented Oct 19, 2022

@armanbilge asked me to reference my own usecase for this - a while back I attempted to do the same for Cloudflare Pages. Quoting my Discord message:

Cloudflare pages can work it's magic on files like /admin/api/get_[[[id]].js but iirc I get errors trying to use / in the name of the module when using ES modules

Current thinking is to output filenames in admin__api__get_[[id]] format and externally rewrite to paths - perhaps there's a better way?

@gzm0
Copy link
Contributor

gzm0 commented Oct 20, 2022

There's one more challenge I forgot: How to synthesize internal module names. If they need to go into special directories, things might get more complicated (OTOH, a simple prefix config might be enough).

@gzm0
Copy link
Contributor

gzm0 commented Oct 23, 2022

@shadaj @keynmol in your use cases, how would you expect internal modules (modules with shared code between the modules you declare) to be named / structured?

@shadaj
Copy link
Author

shadaj commented Oct 23, 2022

For Next.js, we would probably want to dump all internal modules into a components folder, but don't have any naming restrictions on the actual modules.

@keynmol
Copy link

keynmol commented Oct 28, 2022

For cloudflare, I believe only modules that explicitly export a function with a certain name (e.g. onRequest or onRequestGet) are considered by CF for routing.

Therefore other files and their names (intermediate modules) are likely to be ignored (*), so I would personally not expect any structure from them - they can all be dumped top level for all I care.

If I have time, I will test the (*) assumption to make sure it works as I described.

@sjrd sjrd added the enhancement Feature request (that does not concern language semantics, see "language") label Dec 11, 2022
@gzm0
Copy link
Contributor

gzm0 commented Dec 17, 2022

For cloudflare, I believe only modules that explicitly export a function with a certain name (e.g. onRequest or onRequestGet) are considered by CF for routing.

Therefore other files and their names (intermediate modules) are likely to be ignored (*), so I would personally not expect any structure from them - they can all be dumped top level for all I care.

FWIW, this is a problem: We cannot guarantee that an internal module does not export a function with a name that is special to cloudflare.

@sjrd sjrd changed the title Linker crashes when emitting modules into a subdirectory Support for emitting modules into a subdirectory Jan 23, 2023
@armanbilge
Copy link
Member

Just stumbled into this again. I understand that ESM-style npm packages are typically organized into subdirectories e.g.

import '@shoelace-style/shoelace/dist/components/button/button.js';

So it would be nice to be able to make such things with Scala.js as well.

@gzm0
Copy link
Contributor

gzm0 commented Apr 22, 2023

Maybe, but do we have examples of where this is actually useful? For shoelace:

import '@shoelace-style/shoelace/dist/components/button/button.js';
        --------------- -------- --------------- ------ ------
        scope           package  build artifacts module redundant

Even what the doc lists as utilities (e.g. include) is under the components "namespace".

@micheal-hill
Copy link

I'd like to add another voice for supporting structured directories. It's pretty tedious to manually setup the file structure for Next.js or Remix by hand.

Is this something where a PR would be considered? Alternatively, is there a mechanism that would allow extending the plugin to have this functionality within build.sbt?

For cloudflare, I believe only modules that explicitly export a function with a certain name (e.g. onRequest or onRequestGet) are considered by CF for routing.
Therefore other files and their names (intermediate modules) are likely to be ignored (*), so I would personally not expect any structure from them - they can all be dumped top level for all I care.

FWIW, this is a problem: We cannot guarantee that an internal module does not export a function with a name that is special to cloudflare.

I don't think it's important to make this guarantee for CloudFlare or any other tooling. Assuming that generated modules can have some user-defined directory structure + internal modules live somewhere else, I would expect it to be easy to post-process this into whatever structure the user needs/wants.

@gzm0
Copy link
Contributor

gzm0 commented Jul 2, 2023

Is this something where a PR would be considered?

IMO, yes, for sure. However, I'd urge you to first propose and discuss the exact changes you'd like to do (in this issue). Especially for internal modules, it is not clear to me how the interface could look like.

Alternatively, is there a mechanism that would allow extending the plugin to have this functionality within build.sbt?

You could move the modules around after they are built by Scala.js. But that would also involve adjusting the import paths (since they are relative). So, yes, a mechanism could be built, but to the best of my knowledge, there isn't any at the moment.

@gzm0
Copy link
Contributor

gzm0 commented Jul 3, 2023

A couple of thoughts I had:

The following might get us quite far:

  • Interpret / in module IDs as paths
  • Add a setting internalModuleIDPrefix
  • (optional) specify that generated internal module IDs never contain a slash (IIUC already the case)

Unclear with this approach is how to deal with module ID equality: Right now, the following are considered different module IDs: a/b, a//b, a/./b, a/c/../b`.

@sjrd
Copy link
Member

sjrd commented Jul 3, 2023

Unclear with this approach is how to deal with module ID equality: Right now, the following are considered different module IDs: a/b, a//b, a/./b, a/c/../b`.

Given that any slash today results in a crash, and that a single . or .. likely does as well, I think we can forbid leading, trailing and double slashes, as well as path elements that are . or ...

@micheal-hill
Copy link

Unclear with this approach is how to deal with module ID equality: Right now, the following are considered different module IDs: a/b, a//b, a/./b, a/c/../b`.

Given that any slash today results in a crash, and that a single . or .. likely does as well, I think we can forbid leading, trailing and double slashes, as well as path elements that are . or ...

Wouldn't it be simpler to just use something like java.nio.Path.normalize instead of trying to be clever about it?

@sjrd
Copy link
Member

sjrd commented Jul 13, 2023

Our linker cross-compiles to JS. In Scala.js we don't have access to Path, so we cannot do that.

@gzm0
Copy link
Contributor

gzm0 commented Jul 15, 2023

Also FWIW: Paths are platform specific (at least if we're using the default FileSystem). So we need to be careful: We do not want the linker's output to depend on the platform it is running on.

@micheal-hill
Copy link

micheal-hill commented Jul 17, 2023

Apologies that I don't have a better understanding of how the plugin works internally - I've not had a lot of time to do some digging just yet. I've also never done anything with cross compilation, so maybe my understanding is all garbage!

Our linker cross-compiles to JS. In Scala.js we don't have access to Path, so we cannot do that.

At some point it needs to execute platform-specific code though, no? ie. there is some write(path: String, contents: ???) function somewhere along the way, which has a different implementation in JS to JVM? Given this assumption:

  • Could modify this function to create intermediate directories if required
  • Create a new normalize function using the same convention; ie. dispatching to separate implementations based on the platform being executed.

@sjrd
Copy link
Member

sjrd commented Jul 17, 2023

There are two notions of "platform" at play here. Usually we mean JVM v JS, indeed. But here we're talking about the OS and its filesystem. We want the linker to produce identical outputs regardless of the OS and of the filesystem(s) being used. Therefore, we cannot rely on normalize (whether on the JVM or JS).

@micheal-hill
Copy link

micheal-hill commented Jul 25, 2023

I did some experimenting here; the linker already has runtime-specific (ie. Node, JVM) implementations and appears to require little change to (see the sketch here; note that this is pretty scrappy code that I'll clean up and test if people are happy with the approach).

There are two notions of "platform" at play here. Usually we mean JVM v JS, indeed. But here we're talking about the OS and its filesystem. We want the linker to produce identical outputs regardless of the OS and of the filesystem(s) being used. Therefore, we cannot rely on normalize (whether on the JVM or JS).

I'm not sure that there's anything that we need to do to support various OS here. The underlying runtime can deal with this for us.

@gzm0
Copy link
Contributor

gzm0 commented Jul 27, 2023

I'm not sure that there's anything that we need to do to support various OS here. The underlying runtime can deal with this for us.

As far as actually writing the modules to the filesystem is concerned, this is correct. However, we also need to consider the usage sites of the modules (i.e. the imports):

When the linker splits off some code into an internal module, it will need to import that code. Currently, it will emit:

import * as <internal name> from "./<moduleID>.<ext>"

Where the specifics are determined by the OutputPatterns.

So far, the assumption was that moduleID does not contain any special characters for the underlying filesystem (i.e. no / or \). If we start supporting subdirectories, we must change/break this assumption.

At this point, we need to take care, that:

  1. The code we generate is the same on all platforms/operating systems (e.g. it is not OK if you get back slashes in imports when you build on windows but you get forward slashes when you build on *nix).
  2. The code we generate works on all JS platforms. Since (at least last time I checked), what a module identifier means is not specified in the ESM spec, this is necessarily trade-offy. Given that we already assume that ./ denotes a relative import, it is IMO OK to assume further slashes mean subdirectories.

You might correctly observe at this point, that (apart from the changes you're already working on in your branch), there is nothing to do for the success path. However, we need to make sure that we are not generating broken code (e.g. `import ... from './a/../b\foo.js'). This is what @sjrd and I have been discussing about.

HTH

@micheal-hill
Copy link

When the linker splits off some code into an internal module, it will need to import that code. Currently, it will emit:

This makes sense. It seems that I didn't do enough testing; I can replicate this once I start writing non-trivial code. Are you able to point to where this code is emitted from the linker?

it is not OK if you get back slashes in imports when you build on windows but you get forward slashes when you build on *nix

I don't understand why this is important, but a simple solution here is to use only / as the path separator. If a path contains only /, Windows will understand it correctly (I did an experiment to test this, some sample code here).

But more importantly I think upholding this assumption will end up with broken code if any form of directory nesting is allowed, regardless of the separator used. For example:

  • Assuming that / is used as the path separator character and have some modules emitted to <build root>/util/foo.
// If I'm working on a *nix system, I'm likely to have an export that looks like
@JSExportTopLevel("baz", moduleID = "app/bar")
// This would import modules from `foo` using `import <names> from "../util/foo"`; which would work correctly

// If I'm working on a Windows system, I'm likely to have an export that looks like
@JSExportTopLevel("baz", moduleID = "app\\bar")
// This would import modules from `foo` using `import <names> from "../util\\foo"`; which I believe _would break_ because of the mixed characters - Though I have not tested this assumption.
  • A mirror of this situation occurs if \ is used as the path separator: code generated on Windows will be fine, but anything generated on *nix will die.

Note that in both of these cases, the break occurs because the user followed the conventions of their OS but scala-js uses only a single character for the separator. One might argue that the user could be educated to follow an alternative pattern (eg. always use *nix paths, even on Windows), but that is a situation that I expect would be ripe for confusion.

So if this is a hard requirement; I don't think that there's any simple way to both support any OS and support emitting into subdirectories.

@sjrd
Copy link
Member

sjrd commented Jul 28, 2023

We definitely have to standardize on / on all systems, even Windows. We will need to forbid any other special characters of file systems, notably \ and :. The rationale for Windows users is that those are URI relative paths. Not file system ones.

@micheal-hill
Copy link

We definitely have to standardize on / on all systems, even Windows. We will need to forbid any other special characters of file systems, notably \ and :. The rationale for Windows users is that those are URI relative paths. Not file system ones.

Just to be clear; you're saying that the user must always use something that looks like @JSExportTopLevel("baz", moduleID = "app/bar"), right?
This seems reasonable to me. I'm worried that it's a little bit awkward for Windows users, but it looks like it's the easiest to implement. Perhaps also improves consistency for anyone collaborating between Windows/*nix 🤷

Assuming you're happy with the direction I've started looking at earlier (the link here), I can start figuring out how to get the linker to emit import statements with the correct relative paths.
I believe that org.scalajs.linker.standard.ModuleSet.Module is at the center of this, but I haven't figured out how the paths are being populated yet - it would be much appreciated if anyone has any pointers

@gzm0
Copy link
Contributor

gzm0 commented Jul 30, 2023

Just to be clear; you're saying that the user must always use something that looks like @JSExportTopLevel("baz", moduleID = "app/bar"), right?

Yes.

Perhaps also improves consistency for anyone collaborating between Windows/*nix

Yes. In fact, this is a requirement: The same input with the same options must produce the same output on all systems/OSs.

Assuming you're happy with the direction I've started looking at earlier (the link here), I can start figuring out how to get the linker to emit import statements with the correct relative paths.

That looks like a good start to me.

I believe that org.scalajs.linker.standard.ModuleSet.Module is at the center of this, but I haven't figured out how the paths are being populated yet - it would be much appreciated if anyone has any pointers

HTH

@gzm0
Copy link
Contributor

gzm0 commented Oct 12, 2023

Nikita's comment on Discord just made me realize that when building this, we need to be extremely careful to keep relative path imports meaningful.

@micheal-hill
Copy link

When I have some time I'll aim to come back to this; however for now I've had some life/work stuff get in the way

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature request (that does not concern language semantics, see "language")
Projects
None yet
Development

No branches or pull requests

6 participants