This repository has been archived by the owner on Aug 17, 2022. It is now read-only.

Unclear how adapting imports and acyclic instantiation works #129

Open
alexcrichton opened this issue Jan 14, 2021 · 12 comments

Comments

@alexcrichton

I was thinking a bit more today about how you would take an imported function that takes/returns strings and connect it to a core wasm module that takes/returns pointers/lengths. The current explainer has an example which it claims shows how to do this:

(adapter_module
  (import "print" (adapter_func $print (param string)))
  (adapter_module $ADAPTER
    (import "print" (adapter_func $originalPrint (param string)))
    (adapter_func $print (export "print") (param i32 i32)
      ...
      call_adapter $originalPrint
    )
  )
  (adapter_instance $adapter (instantiate $ADAPTER (adapter_func $print)))
  (module $CORE
    (import "print" (func $print (param i32 i32) (result i32 i32)))
    (func $run (export "run")
      ;; core code
    )
  )
  (instance $core (instantiate $CORE (adapter_func $adapter.$print)))
  (adapter_func (export "run")
    call $core.$run
  )
)

I don't think this example works, however. The $ADAPTER module receives a pointer/length, but it has no memory to load values from. Similarly, if adapter_func $print wanted to return a string, there's no way for $ADAPTER to call malloc to allocate space for the returned string.

I think this may have been a mistake introduced when rebasing on top of module linking? I may also be missing something about how this is expected to work. For adapting imports, though, it seems like you need to first instantiate the module in question to have access to memory/malloc/etc., and only afterwards can you create adapters referencing those items. The problem is that instantiating the module requires the imported functions (e.g. "print" in this case), which is a cyclic dependency.
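
The cycle can be made concrete with a small dependency model. The sketch below is not WebAssembly; it is a Python illustration using the standard-library `graphlib`, and the node names (`host_print`, `adapter`, `core`) are labels chosen here for the pieces of the example, not identifiers from the proposal.

```python
from graphlib import TopologicalSorter, CycleError

# Each key must be instantiated after everything in its value set.
# Node names are illustrative labels, not identifiers from the proposal.
needs = {
    "adapter": {"host_print", "core"},  # must read (ptr, len) out of core's memory
    "core": {"adapter"},                # imports the adapted "print"
}


def instantiation_order(graph):
    """Return a valid instantiation order, or None if the graph is cyclic."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


order = instantiation_order(needs)
print(order)  # None: no acyclic instantiation order exists
```

Under these assumptions there is no order in which `core` and `adapter` can both be instantiated, which is the problem described above.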

@lukewagner
Member

lukewagner commented Jan 15, 2021

Great question! Indeed, pondering this question is what led me to initially file module-linking/#12. What I ultimately realized is that one can break this cycle by splitting out memory+alloc into some form of $LIBC module that gets instantiated before $adapter. You can see this in the worked-out example. Thus, the final instance-import DAG would be:

$libc <-- $adapter <-- $core <-- $the_outer_adapter_module_instance
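
As a rough check of that DAG, the sketch below models it in Python with the standard-library `graphlib`. It is an illustration rather than WebAssembly; the short node names stand for `$libc`, `$adapter`, `$core`, and the outer adapter module instance, and only the edges drawn in the line above are included.

```python
from graphlib import TopologicalSorter

# "X <-- Y" in the DAG means Y imports X, so X must be instantiated first.
imports = {
    "libc": set(),        # owns memory and the allocator
    "adapter": {"libc"},  # lifts strings out of libc's memory
    "core": {"adapter"},  # imports the adapted print
    "outer": {"core"},    # re-exports core's "run"
}

order = list(TopologicalSorter(imports).static_order())
print(order)  # ['libc', 'adapter', 'core', 'outer']
```

Because the graph is a simple chain, this is the only valid instantiation order.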

Now, it's natural to ask: what if $libc needs one of its imports to be adapted? Given that wasm has built-in memory.grow, the only example I could think of here is some error logging. Thinking about the $libc concretely, though, it seems like $libc shouldn't be forcibly putting itself into the public import of every libc-based adapter module; that should be up to the libc client. Thus, I think $libc should use table/funcref-based callbacks to let its client app decide what to do (which conveniently breaks the cycle). I'll admit this isn't a 100% feel-good answer, but I feel like it's the lesser of two evils for now, and "libc" is a rather special case anyhow.

@fgmccabe
Contributor

Hmm. Originally, adapter fusion was something that was done by the host as part of the import process. In that world, one would not need to specially break out malloc/free because the host has access to both the modules being linked and the adapter function being generated.
Unless I am missing something?

@lukewagner
Member

The question here is more at the spec level: if we're specifying the way that adapter code is linked with core code in terms of Module Linking, and Module Linking requires acyclicity between instances (for good reasons, esp. once you add type imports/exports), then how do we resolve the fundamental cycle pointed out above? The engine sees everything at instantiation time, of course, but that's more an implementation detail.

@alexcrichton
Author

The motivation for this came up when thinking about compiling interface types to normal wasm instructions (e.g. producing a module that uses module linking but no interface types at all), and when dealing with imports I wasn't quite sure what to do. I think the example in the readme will need an update one way or another because, as written, I don't think it can adapt the import.

For now I settled on the table/funcref scheme where an elem segment initializes a table, but if you call imports as part of the start function it will trap since it will call a null funcref.
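
A toy model of that scheme, and of the start-function trap, might look like the following. This is a Python sketch under stated assumptions, not wasm and not the actual lowering: `Table`, `Trap`, `PRINT_SLOT`, `instantiate_core`, and `make_print_adapter` are all hypothetical names introduced here, and filling a slot by assignment stands in for the elem segment.

```python
class Trap(Exception):
    """Stands in for a wasm trap."""


class Table:
    """A toy funcref table: every slot starts out null (None)."""

    def __init__(self, size):
        self.slots = [None] * size

    def call_indirect(self, index, *args):
        target = self.slots[index]
        if target is None:
            raise Trap(f"null funcref at index {index}")
        return target(*args)


PRINT_SLOT = 0  # hypothetical table index reserved for the adapted "print"


def instantiate_core(table, start=None):
    """Core depends only on the table, so it can exist before the adapter."""
    memory = bytearray(64)

    def run():
        msg = b"hello"
        memory[0:len(msg)] = msg
        table.call_indirect(PRINT_SLOT, 0, len(msg))  # pass (ptr, len)

    if start is not None:
        start(table)  # the start function runs during instantiation
    return memory, run


def make_print_adapter(memory, host_print):
    """Built after core, so it can read core's memory."""

    def print_adapter(ptr, length):
        host_print(bytes(memory[ptr:ptr + length]).decode("utf-8"))

    return print_adapter


# Normal path: instantiate core, then fill the table, then call.
printed = []
table = Table(1)
memory, run = instantiate_core(table)
table.slots[PRINT_SLOT] = make_print_adapter(memory, printed.append)
run()
print(printed)  # ['hello']

# Start-function path: the slot is still null while core is instantiating.
trapped = False
try:
    instantiate_core(Table(1), start=lambda t: t.call_indirect(PRINT_SLOT, 0, 0))
except Trap:
    trapped = True
print(trapped)  # True
```

The indirection breaks the cycle because core no longer imports the adapter directly, at the cost that any import call made before the table is populated fails.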

@lukewagner
Member

@alexcrichton Would splitting out libc, as I suggested above, also work?

@alexcrichton
Author

It would, yeah, although the use case I was originally working with was emulating shared-nothing linking between two modules as produced today by toolchains, none of which currently have libc split out. If we were to reorganize the two binaries, though, it would mean that there are two libc modules (one for each shared-nothing module) and we'd have to make sure all libcs come first.

I was hoping that with module linking and interface types we wouldn't have to worry too much about the structure of modules (e.g. libc and such), but rather just be able to fuse any import/export adapter as necessary (and also compile to something without interface types if necessary, too).

@lukewagner
Member

If we were to reorganize the two binaries, though, it would mean that there are two libc modules (one for each shared-nothing module) and we'd have to make sure all libcs come first.

FWIW, if both happened to depend on the same (or compatible) versions of libc, there could be just 1 libc module, 2 libc instances. Also, if we're talking about two shared-nothing modules A and B where A imports B, then after fusion, you'd have:

B_libc <-- fused B+? adapters <-- B_core <-- fused A+B adapters <-- A_core <-- fused A+? adapters
                                  A_libc <---'----------------------'

(That is, only fused A+B adapters and A_core depend on A_libc so A_libc can be instantiated after B_core.)
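
The claim about `A_libc` can be checked mechanically. The sketch below transcribes the diagram into a Python dictionary (an illustration only; `transitive_deps` is a helper written here, and the `B+?` and `A+?` labels are kept as written because the other side of those adapters is unspecified).

```python
# "X <-- Y" in the diagram means Y depends on X.
deps = {
    "B_libc": set(),
    "fused B+? adapters": {"B_libc"},
    "B_core": {"fused B+? adapters"},
    "A_libc": set(),
    "fused A+B adapters": {"B_core", "A_libc"},
    "A_core": {"fused A+B adapters", "A_libc"},
    "fused A+? adapters": {"A_core"},
}


def transitive_deps(node, graph):
    """Everything that must be instantiated before `node`."""
    seen = set()
    stack = list(graph[node])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep])
    return seen


# Direct dependents of A_libc, as stated in the parenthetical above.
direct = sorted(n for n, d in deps.items() if "A_libc" in d)
print(direct)  # ['A_core', 'fused A+B adapters']

# B_core never reaches A_libc, so A_libc may be instantiated after it.
print("A_libc" in transitive_deps("B_core", deps))  # False
```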

I was hoping that with module linking and interface types we wouldn't have to worry too much about the structure of modules (e.g. libc and such), but rather just be able to fuse any import/export adapter as necessary (and also compile to something without interface types if necessary, too).

If we're talking about a toolchain that generates a single shared-nothing module, then yes, the toolchain would have to worry about libc insofar as it has to wire it up to its own internal import/export adapters. But if we're talking about shared-nothing-linking of separate shared-nothing modules, then I think libc details are completely encapsulated and so we don't have to worry. E.g., in a future with wasm GC, there may be no need for import adapters to call libc, but shared-nothing linking would still work the same. Does that match what you're thinking?

@lukewagner
Member

Oh, and one addendum: my hope is that the core toolchain can mostly not care about interface types by:

  1. starting off with libc separate from the main module, expressed via module linking
  2. when targeting non-interface-typed output, libc can be statically fused with the main module, producing a single module as today (or not)
  3. when targeting interface-typed output, the synthesized import adapter can simply import libc, with the outer adapter module wiring everything up

which keeps all the interface types logic in some isolated late stage.

@alexcrichton
Author

Nah what you're saying all makes sense and sounds reasonable to me, I'm just trying to rationalize it with the current state of the proposal. I'm trying to figure out a way where we can transition from what we have today to interface types that doesn't involve scaling a cliff in one go, so an incremental step is to take today's monolithic modules which try to do things like import a function to print a string and adapt it (or take a string). That can't generally be done with the proposal as-written super easily because the monolithic module's malloc/free/memory can't be referenced when the import adapter is inserted.

I would prefer if we could dogfood interface types without requiring everything to transition to module-linking first (since that will likely take quite some time), but I think the table-initialized-via-elem-segment is probably the way to go for that.

@lukewagner
Member

Gotcha, that makes sense.

@RossTate

For what it's worth, I've been exploring how to design JS-specialized adapters for GC, and I ran into a similar problem. WebAssembly's kind of module system is known to not admit certain forms of decomposition, and consequently can't express things like .class files (even putting aside the more advanced dynamic linking aspects of Java or efficiency concerns like pre-imports and just considering static linking). It might be the case that adapters (whether for interface types or for JS) are too intimately coupled with the wasm that they're adapting for the "core" and "adapter" portions to be reliably decomposable into separate modules. But that's still a "might" at the moment.

@titzer

titzer commented Mar 23, 2021

I'm a bit late to this discussion because I don't follow the discussion on either module linking or interface types (but perhaps I should). In implementing Jawa I added a simple extension to imports called import arguments where imports can take any exportable entity as a direct argument. With relaxed section order, this allows expressing mutual dependencies between modules. I've validated that this works for expressing .class files because I built an actual (simplified, but working) JVM on top of it.

I explained that in a document linked in the presentation I gave, but I fear the mechanism may have been lost in the larger point I was trying to make about Wasm expressing other languages, which many people seemed to regard as kooky. As such, the mechanism didn't receive as much discussion as I was expecting (none, in fact). It was hard for me to tell whether the idea was lost in the noise, or just bad.
