Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node package improvements #2403

Open
hoodmane opened this issue Apr 18, 2022 · 35 comments
Open

Node package improvements #2403

hoodmane opened this issue Apr 18, 2022 · 35 comments

Comments

@hoodmane
Copy link
Member

hoodmane commented Apr 18, 2022

It would be good to release an alpha version with #2394 to help the node people test whether #2393 and related issues are fixed. We also have problems with webpack which probably aren't fixed, maybe we should try to improve the webpack situation first too.

I'm not sure how to version this stuff. I wouldn't normally want to include #2387 and #2391 in a patch release, but we could backport everything else. Or we could throw them into 0.20.1 they aren't humongous changes. Or we could plan to do a very small v0.21.0 release.

@alexmojaki @mneil @rth @ryanking13

@ryanking13
Copy link
Member

I think v0.20.1 would be good because I recently found that we didn't version pyodide-build v0.20.0 correctly in PyPI (https://pypi.org/project/pyodide-build/#history), so we can bump the version together.

@hoodmane
Copy link
Member Author

I think it would be good to release an alpha first e.g., 0.20.1a1 so that if there are more dumb mistakes with the node/webpage stuff we can iron those out first.

#2405 would also be good to include.

@ryanking13
Copy link
Member

I think it would be good to release an alpha first e.g., 0.20.1a1 so that if there are more dumb mistakes with the node/webpage stuff we can iron those out first.

#2405 would also be good to include.

I'm +1 with it

@alexmojaki
Copy link
Contributor

Seems like this release could be made now? I'm keen to try it out.

@ryanking13
Copy link
Member

ryanking13 commented Apr 29, 2022

I made an alpha release v0.20.1a1, though we didn't fix webpack situation yet... #2387 and #2391 are included to prevent conflicts with recent changes and I didn't want to mess up. Excluded commits are #2411, #2407, #2389, #2305.

@rth @hoodmane could you publish JS package to npm registry? I don't have access to our npm registry.

@rth
Copy link
Member

rth commented Apr 29, 2022

Thanks @ryanking13 ! Just curious where did push 07be1f4 for that release, as I can't see it on main.

If you send me your npm username, I'll add you.

@ryanking13
Copy link
Member

ryanking13 commented Apr 29, 2022

Thanks @ryanking13 ! Just curious where did push 07be1f4 for that release, as I can't see it on main.

@rth I just pushed it. But... commit hashes are different. I think I am misunderstanding something about git cherry-pick.

If you send me your npm username, I'll add you.

It's ryanking13.

@rth
Copy link
Member

rth commented Apr 29, 2022

Thanks! No worries, as long as it worked. I sent you the invite.

@ryanking13
Copy link
Member

ryanking13 commented May 2, 2022

I find out that I published the wrong stuff (TypeScript files, not JavaScript) in npm pyodide@0.20.1-alpha.1.

It looks like npm does not allow removing and republishing the same version. So I deprecated pyodide@0.20.1-alpha.1 for now. We should publish again in another version name (Maybe, pyodide@0.20.1-alpha.2?). @rth, should we republish other stuffs (pyodide-build and the main repository) in a new version as well or just the npm package?

Sorry about the mistake. FYI, this happened because I called npm publish in src/js not in dist, which was changed at #2394.

@rth
Copy link
Member

rth commented May 2, 2022

No, worries, let's push a pyodide@0.20.1-alpha.2 and I think you should be able to yank the other one.

@ryanking13
Copy link
Member

No, worries, let's push a pyodide@0.20.1-alpha.2 and I think you should be able to yank the other one.

Pushed to pyodide@0.20.1-alpha.2.

It seems like unpublishing npm package is forbidden when there is a dependent package (npm/cli#1686 (comment)).

@alexmojaki
Copy link
Contributor

Sorry about the mistake.

Being able to make mistakes is the entire point of this issue! Thanks for making a release so that it can be tested. I tried it out and here's what I found:

  • Many of the types/functions in pyodide.d.ts are declared but not exported, particularly PyodideInterface and its methods like unpackArchive and setInterruptBuffer.
  • unpackArchive doesn't accept ArrayBuffer for the first argument, I think it should.
  • I get a lot of errors about missing files in pyodide/node_modules/ws/lib, e.g:
ERROR in ../node_modules/pyodide/node_modules/ws/lib/receiver.js 3:21-38
Module not found: Error: Can't resolve 'stream' in '/home/alex/work/pyodide-worker-runner/node_modules/pyodide/node_modules/ws/lib'
Did you mean './stream'?
Requests that should resolve in the current directory need to start with './'.
Requests that start with a name are treated as module requests and resolve within module directories (node_modules).
If changing the source code is not an option there is also a resolve options called 'preferRelative' which tries to resolve these kind of requests in the current directory too.

BREAKING CHANGE: webpack < 5 used to include polyfills for node.js core modules by default.
This is no longer the case. Verify if you need this module and configure a polyfill for it.

If you want to include a polyfill, you need to:
        - add a fallback 'resolve.fallback: { "stream": require.resolve("stream-browserify") }'
        - install 'stream-browserify'
If you don't want to include a polyfill, you can use an empty module like this:
        resolve.fallback: { "stream": false }
 @ ../node_modules/pyodide/node_modules/ws/wrapper.mjs 2:0-41 7:0-79
 @ ../node_modules/pyodide/pyodide.mjs 318:20-32
 @ ../lib/index.ts 13:0-38 17:11-22
 @ ./index.ts 11:0-39 34:27-40

@rth
Copy link
Member

rth commented May 2, 2022

Thanks a lot for testing it @alexmojaki . Maybe @mneil would be interested in having a look.

@hoodmane
Copy link
Member Author

hoodmane commented May 2, 2022

I think the webpack issue is tricky. Ideally webpack should not try to bundle pyodide.asm.js. But pyodide.js is the output of rollup so it's not clear how to get a /* webpackIgnore: true */ comment into it (ideally at the same time we could remove the Microsoft copyright statement...)

@mneil
Copy link
Contributor

mneil commented May 2, 2022

I have tried this with webpack and have not got it working the way I'd like yet enough to propose both the correct webpack config or changes to the distribution itself.

When loading in the browser you need pyodide.js, pyodide.asm.js, pyodide.data, and the wasm file. It makes me wonder why these files are all separate in the first place? I can see holding off on wasm loading and keeping it separate for a worker but is there a reason the other files are separated?

Those files assume loading via fetch which means a couple things in the browser...

  • they must be copied by the user out of node_modules after install into a location the web server is serving
  • webpack doesn't bundle files loaded with fetch (without complex loaders or replacement) so it's not simple to use webpack to combine them later

Nodejs loading is all synchronous and has access to node modules. This isn't really an issue in node. But, the package itself could distribute files that make it easier to use in browser with no impact on node.

@hoodmane
Copy link
Member Author

hoodmane commented May 2, 2022

These are separate files because Emscripten generates them that way. But we could switch to rolling up (or hopefully esbuilding soon) pyodide.asm.js into pyodide.js. I think Emscripten's loading logic expects pyodide.asm.data to be a separate file with that name. Also, it is encoded in binary which is not possible in a JS file. So in any case that file should stay separate. I doubt any bundler is smart enough to understand how Emscripten tries to load it anyways.

We really want pyodide.js to be code split from the rest of a webpack bundle. Running webpack over the Emscripten JavaScript output is not a good idea IMO. Even without the resolution problems, it could still create confusing bugs.

@mneil
Copy link
Contributor

mneil commented May 2, 2022

Not the best for for bundle size but the data file could be bases64 encoded and inlined into the asm.js. THe benefit of that is loading / setup on the browser becomes simpler. Downside is you get about a 30% bump in filesize on the encoded file.

Running webpack over the Emscripten JavaScript output is not a good idea IMO

Are there specific concerns about this? One other issue I have right now with the pyodide.js file is that it forces the use of some globals. It is also checking for "IS_NODE" in a way that fails in webpack since process, fs, require, can all be mocked or replaced entirely causing pyodide.js to believe it is in node when it is in the browser.

Bringing pyodide.asm.js and pyodide.js into a single file (and potentially encoding the data file as well into a single file) and removing the hard requirement of globals for the file system in particular would allow users like myself to choose their own filesystem replacement, their own fetch implementation, etc...

@hoodmane
Copy link
Member Author

hoodmane commented May 2, 2022

bases64 encoded ... a 30% bump in filesize

Also an increase in load time and ram usage. See the Emscripten docs:

Embedding files is much less efficient than preloading them. You should only use it for small files, in small numbers.

I don't think we are interested in going this route.

Are there specific concerns about this?

Emscripten tries to import ~4 different node modules only in node, not in browser. So these would need to be added to webpack externals, or we would need to patch Emscripten not to do this or... something? But also, it's 2.7 Mb of Javascript which has already been bundled and treeshaken and completely prepared for use by Emscripten, having webpack try to parse and optimize that again seems like it would be unnecessarily bad for build times. And it is completely possible webpack could mess things up.

Another thing to consider is that we get better load times by splitting these files the way we do, because pyodide.js is small and so it can initiate other requests we need for packages.json, pyodide.asm.js, and pyodide_py.tar all at the same time.

Also, splitting the files significantly improves the speed of partial rebuilds, particularly when only working on the Python source files or the Typescript source files and not the C code. rollup is really slow, though with esbuild it would be less of a problem.

You may be able convince us, I am not very knowledgeable about the Javascript ecosystem.

would allow users like myself to choose their own filesystem replacement, their own fetch implementation, etc...

We are interested in allowing those features, but I'm not sure the normal import system is the best way. We could for instance add a loading setting that tells Pyodide whether it is in the browser or not and overrides the guess.

@bollwyvl
Copy link
Contributor

bollwyvl commented May 3, 2022

add a loading setting that tells Pyodide whether it is in the browser

👍 explicit is better than implicit, yadda yadda: if loadPyodide's ConfigType included a fetchImpl, etc. it would be very interesting for the jupyterlite case, where we're already implementing a fake web server that has interesting things that should appear to be coming from it, and various other servers. Among them: move some of the micropip customization stuff out of the python-in-js side, and work with it a little more directly in the bootstrap lifecycle before we get access to the pyodide instance.

@bollwyvl
Copy link
Contributor

bollwyvl commented May 3, 2022

30% bump in filesize on the encoded file.

oof! no, thanks!

@alexmojaki
Copy link
Contributor

I think the webpack issue is tricky. Ideally webpack should not try to bundle pyodide.asm.js. But pyodide.js is the output of rollup so it's not clear how to get a /* webpackIgnore: true */ comment into it (ideally at the same time we could remove the Microsoft copyright statement...)

@hoodmane are you saying that errors like Module not found: Error: Can't resolve 'stream' in '/home/alex/work/pyodide-worker-runner/node_modules/pyodide/node_modules/ws/lib' come from trying to bundle pyodide.asm.js, which should only be loaded at runtime?

@hoodmane
Copy link
Member Author

hoodmane commented May 9, 2022

Yeah more or less. The other issue is that webpack thinks that node apis should be shimmed in the browser but most modules we load are node shims for browser apis.

@mneil
Copy link
Contributor

mneil commented May 13, 2022

I've put together a sample webpack application that loads pyodide. I avoided processing the pyodide.js file at all as well as handled copying the files out of node modules into the distribution folder. This way minimizes modifying pyodide at all. One exception is that I deleted the globalThis.loadPyodide as it's not needed to have that global.

I mentioned in #2549 that to make this work pyodide needs to exports all the files that it requires. I changed the package.json exports to:

"exports": {
    "./distutils.tar": "./distutils.tar",
    "./pyodide.asm.wasm": "./pyodide.asm.wasm",
    "./pyodide.asm.js": "./pyodide.asm.js",
    "./pyodide.asm.data": "./pyodide.asm.data",
    "./pyodide_py.tar": "./pyodide_py.tar",
    "./packages.json": "./packages.json",
    ".": {
       "require": "./pyodide.js",
       "import": "./pyodide.mjs"
    },
    "./pyodide.mjs": "./pyodide.mjs",
    "./pyodide.js": "./pyodide.js",
    "./package.json": "./package.json"
 },

I am using 0.20.1-alpha.2. I tried using that version with and without webpack. I am unable to load any packages like micropip. The build seems broken.

Example code, copying the files manually out of the node modules folder and not using webpack.

<script src="pyodide.js"></script>
<script>
  async function init() {
    const pyodide = await loadPyodide();
    console.log(await pyodide.runPythonAsync("1+1"));
    await pyodide.loadPackage(["micropip"]);
  }

  init();
</script>

Error:

pyodide.asm.js:10 Error: Failed to load 'http://localhost:9000/micropip-0.1-py3-none-any.whl': request failed.
    at pyodide.asm.js:10:178043
    at Generator.next (<anonymous>)
    at fulfilled (pyodide.asm.js:10:176792)

@hoodmane
Copy link
Member Author

Well I think the issue is that we didn't deploy all the packages to npm to reduce download size -- we didn't want to deploy 200+ mb to npm. Try using:

pyodide.loadPackage({indexURL : "https://cdn.jsdelivr.net/pyodide/v0.20.1a1/full/" });

@rth
Copy link
Member

rth commented May 13, 2022

I think the issue is that we didn't deploy all the packages to npm to reduce download size

I think that's what we want, though? So there is an issue in v0.20.1a1 that packages.json doesn't include all packages (while it should), however aside from that last time I tried to use it in node it does download and cache the packages node_modules when needed (even, without specifying the indexURL). That was without webpack though. So if packages need to be present in node_modules already, maybe we could have some CLI tool (or webpack plugin to do that) but IMO it would be better not to bundle all packages into the pyodide NPM package as most users shouldn't need that, and this apporach doesn't scale as we get more packages.

But then again I have little experience with the JS packaging ecosystem, so if someone is sure there is no way around it, we can certainly change it.

@mneil
Copy link
Contributor

mneil commented May 13, 2022

I just pulled the v20 off the cdn, to see how it resolves these and I'm realizing that it's not broken. It's that the other wheels that pyodide has internally are on the cdn and not in the node module. Once I've installed it locally and run it in the browser I'm not sure how to proceed. I am not using webpack when doing any of this. Just trying to use pyodide on my own page without specifying an indexURL.

The indexURL is automatically determined. It knows I'm at http://localhost:9000 for example. It uses that url to load all the other files like pyodide.asm.data, the wasm, etc.... But these extra packages like micropip, or anything on this list https://pyodide.org/en/stable/usage/packages-in-pyodide.html are not accessible at that url.

If I set indexURL to be the cdn prefix then I can get it to load those packages except that now it also tries to load all my other files from the cdn (like the wasm) - which is not what I would expect or want. If this is the solution then the node module only needs to have pyodide.js in it and no other files.

As an end user this is confusing / difficult to manage. However, I also understand that all those "pyodide" packages are likely large and there were already concerns about the size of the application on npm. I'll think about how what I would want to happen vs. the constraints I can think of and try to enumerate them. Please help expand on this.

  1. Some end users may have restrictions or be behind proxies that do not allow them to pull data off a CDN
  2. In node I may be running behind a firewall that restricts outside requests or blocks outbound requests on specific ports
  3. We want to limit the size of the package on npm
  4. Not all users will want all packages available from pyodide and may need only n packages
  5. Users may have no restrictions and want to try out pyodide in a development type experience that transitions to production easily

Publish pyodide specific modules to npm

We could publish each of the packages that pyodide loads with loadPackage as individual node modules. Create a namespace on npm and do npm i @pyodide/pyodide @pyodide/micropip. Change pyodide to handle loading from individual modules. I think this solves 1,2,3,4 but may complicate the 5th point.

Add new configuration to loadPyodide

indexURL should be reserved for the pyodide index and not used for loading core files like the asm.data, wasm, etc.... Add a new option to configure where the core files come from. By default it could be indexURL for backwards compatability. Takes care of 3,4,5 but still complicates 1 and 2

Document the need to download these individually

Require users download individual pyodide packages like micropip when using them with npm. Similar to "Publish pyodide specific modules to npm" except that the files stay on a cdn or similar. Document how to do this and or make it easy for users to get the files. This solves 1,2,3,4,5. I think this is the most complicated solution in terms of usability but it handles the most cases. Use pyodide as it is today, have a way to get it all locally easily and document the requirement. It's also low level of effort to implement.

@ryanking13
Copy link
Member

Thanks for the experiments and for opening a discussion @mneil. Maybe we need to use a separate issue for this conversation.

Document the need to download these individually

I was thinking of providing a simple command (#2402) to install wheel files from CDN, which can be done automatically or manually. I'm not sure how these downloaded packages can be easily shipped in bundlers though.

@wenq1
Copy link

wenq1 commented May 22, 2022

Two issues still with 1.20.1-a2:

  1. Doc says it should be imported as below
let pyodide_pkg = await import("pyodide/pyodide.js");

However, in reality if using ESM, one should import as below:

import    { loadPyodide }                                       from 'pyodide/pyodide.mjs';

Note that the filename should be 'pyodide.mjs' not 'pyodide.js', and there's no default export from 'pyodide.mjs'

  1. In nodejs 16, it fails with below
RuntimeError: abort(undefined). Build with -s ASSERTIONS=1 for more info.
    at process.abort (/tmp/project_root/node_modules/pyodide/pyodide.asm.js:10:234679)

Code used is:

        let     artifacts_dir           = path.join (project_root, 'node_modules/pyodide');
        let     pyodide         = await loadPyodide ({
                indexURL:       artifacts_dir,
        });

@hoodmane
Copy link
Member Author

Thanks for the info about the docs being wrong.

In node v16.14.0 I don't get RuntimeError: abort(undefined):

import { loadPyodide } from "pyodide/pyodide.mjs";

async function main(){
    let pyodide = await loadPyodide({"indexURL" : "./node_modules/pyodide"});
    pyodide.runPython("print('1+1 is', 1+1)");
    console.log(pyodide.version);
}
main();

prints:

warning: no blob constructor, cannot create blobs with mimetypes
warning: no BlobBuilder
Loading distutils
Loaded distutils
Python initialization complete
1+1 is 2
0.20.1a1

@wenq1
Copy link

wenq1 commented May 22, 2022

For the same code, i get

warning: no blob constructor, cannot create blobs with mimetypes
warning: no BlobBuilder
Loading distutils
Loaded distutils
Python initialization complete
undefined
/root/project_root/node_modules/pyodide/pyodide.asm.js:10
var Module=typeof _createPyodideModule!=="undefined"?_createPyodideModule:{};var readyPromiseResolve,readyPromiseReject;Module["ready"]=new Promise(function(resolve,reject){readyPromiseResolve=resolve;readyPromiseReject=reject});if(!Module.expectedDataFileDownloads){Module.expectedDataFileDownloads=0}Module.expectedDataFileDownloads++;(function(){var loadPackage=function(metadata){var PACKAGE_PATH="";if(typeof window==="object"){PACKAGE_PATH=window["encodeURIComponent"](window.location.pathname.toString().substring(0,window.lo

...

xit"](code);ABORT=true}quit_(code,new ExitStatus(code))}if(Module["preInit"]){if(typeof Module["preInit"]=="function")Module["preInit"]=[Module["preInit"]];while(Module["preInit"].length>0){Module["preInit"].pop()()}}var shouldRunNow=true;if(Module["noInitialRun"])shouldRunNow=false;run();
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            

RuntimeError: abort(undefined). Build with -s ASSERTIONS=1 for more info.
    at process.abort (/root/project_root/node_modules/pyodide/pyodide.asm.js:10:234679)
    at process.emit (node:events:526:28)
    at emit (node:internal/process/promises:140:20)
    at processPromiseRejections (node:internal/process/promises:274:27)
    at processTicksAndRejections (node:internal/process/task_queues:97:32)


Node version:

$ node -v
v16.14.2

$ file $(which node)
/usr/local/opt/node@16/bin/node: Mach-O 64-bit executable x86_64

Oops... Sorry I didn't mention that I'm using an Intel version of node on on M1 mac... maybe that's an issue

@hoodmane
Copy link
Member Author

I'm using an Intel version of node on on M1 mac... maybe that's an issue

Hard to know. It is even possible that there could be a regression from v16.14.0 to v16.14.2 though this is probably even less likely. Do you have nvm? Do other versions of node have the problem too?

@hoodmane
Copy link
Member Author

We appreciate the bug report even if it isn't obvious how to reproduce it.

@wenq1
Copy link

wenq1 commented May 22, 2022

Thanks I can confirm that with node arm version, it works as expected. It might well just be the issue with mixed m1, rosetta2, and nodejs/wasm.

EDIT: the issue with the RuntimeError is also resolved on Intel. It's the issue with my own setup.

drwxr-xr-x  10 wc  staff   320 May 22 12:58 node_modules
-rw-r--r--   1 wc  staff  4893 May 22 12:58 package-lock.json
-rw-r--r--   1 wc  staff  1204 May 22 13:00 package.json
-rw-r--r--   1 wc  staff  1993 May 22 13:00 test-pyodide.js
wc@wcAir test-pyodide % node test-pyodide.js 
warning: no blob constructor, cannot create blobs with mimetypes
warning: no BlobBuilder
Loading distutils
Loaded distutils
Python initialization complete
1+1 is 2
0.20.1a1
wc@wcAir test-pyodide % which node
/opt/homebrew/opt/node@16/bin/node
wc@wcAir test-pyodide % file $(which node)
/opt/homebrew/opt/node@16/bin/node: Mach-O 64-bit executable arm64

@rth rth changed the title Release alpha version with node fixes Node package improvements Jul 17, 2022
@TheLostLambda
Copy link

Sorry @hoodmane for pulling this issue back up out of the grave, but your "./node_modules/pyodide" option for the indexURL is something I spent literally hours trying to find.

import { loadPyodide } from "pyodide/pyodide.mjs";

async function main(){
    let pyodide = await loadPyodide({"indexURL" : "./node_modules/pyodide"});
    pyodide.runPython("print('1+1 is', 1+1)");
    console.log(pyodide.version);
}
main();

My only problem is that that breaks when the application is bundled for deployment (the node_modules directory isn't available) — do you have any idea how to actually ship Pyodide with a bundler? Or should I give up and just pull it in from a CDN?

@hoodmane
Copy link
Member Author

Well there's the https://github.com/pyodide/pyodide-webpack-plugin which I believe can bundle Pyodide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants