[Bug]: Yarn requires state files even when they aren't necessary #6197

robbyemmert · 2024-04-01T22:22:29Z

Self-service

I'd be willing to implement a fix

Describe the bug

I'm setting up GitHub Actions to work with yarn. I am using my package.json to store commands useful for deployment, so I can simply run things like "yarn dockerize" in my actions. However, yarn looks for non-source-controlled files before it runs commands. This was never the case with yarn 1.x.

So far, the only solution I can find is: don't use yarn as a script runner.

There's no way I should have to run a full yarn install just to alias some commands. Is there a way to disable these warnings?

To reproduce

Try running a yarn script alias (for example, yarn dockerize) without first running a full yarn install.

Environment

System:
    OS: macOS 13.2.1
    CPU: (10) arm64 Apple M1 Pro
  Binaries:
    Node: 20.12.0 - /private/var/folders/b2/jpj_glzn7zz0l5q9tvgbgm900000gn/T/xfs-0a07125a/node
    Yarn: 4.1.1 - /private/var/folders/b2/jpj_glzn7zz0l5q9tvgbgm900000gn/T/xfs-0a07125a/yarn
    npm: 10.5.0 - ~/.nvm/versions/node/v20.12.0/bin/npm
  npmPackages:
    jest: ^29.7.0 => 29.7.0

Additional context

I miss the days when yarn was intuitive and easy to use. I feel like requiring stateful, non-source-controlled files is an antipattern. I've had a lot of issues with yarn 2+ that fall into this category.

robbyemmert · 2024-04-01T22:22:43Z

First, I got this error. Even though my source code explicitly says I use yarn 4, yarn didn't know to use v4.

After enabling corepack in my github action, I'm getting this error:

robbyemmert · 2024-04-01T22:29:44Z

A workaround is to just use shell scripts or NPM to run your scripts, instead of yarn.

However, I would expect the following test cases:

Yarn should intuit that I have 4.x enabled from my source code, without manual setup. It should attempt to run any yarn commands with the version of yarn that I have specified in my source code.
When running commands without a full yarn install, I would expect yarn to run the commands anyway (maybe with a warning, at most), with the understanding that if my commands contain references to local packages, the command will fail with an error such as "command not found: babel" since the packages don't exist locally, yet.

clemyan · 2024-04-02T02:21:22Z

> Yarn should intuit that I have 4.x enabled from my source code, without manual setup. It should attempt to run any yarn commands with the version of yarn that I have specified in my source code.

The Node TSC is discussing whether to enable Corepack by default -- the ball is in their court now. Follow the discussion at nodejs/TSC#1518.

Yarn 1 could mimic Corepack and read the packageManager field, but there are a lot of pitfalls there:

Having a stable (and frozen) version of Yarn suddenly download stuff simply by running it is a huge change and would come as a surprise to a lot of users.
If Yarn 1 uses Corepack's cache to do that, it would be exposed to new Corepack versions changing the cache location and structure (and yes, that has happened over Corepack's history), and it could accidentally corrupt Corepack's cache.
If Yarn 1 does not use Corepack's cache, then the user will download duplicate copies of Yarn modern when they later switch to Corepack.

If you have ideas how to make Yarn 1 safely adopt packageManager, you can open an issue at the Yarn 1 repo.

> When running commands without a full yarn install, I would expect yarn to run the commands anyway (maybe with a warning, at most), with the understanding that if my commands contain references to local packages, the command will fail with an error such as "command not found: babel" since the packages don't exist locally, yet.

Essentially duplicate of #2701.

tl;dr: Without an install, Yarn cannot determine whether a command you run comes from a dependency package, so it cannot safely proceed.

Say you have a script "clean": "del cache/*". Is del a dependency bin or a built-in command? Should the answer change before and after an install? Should Yarn proceed without being sure of the answer, given the command is potentially destructive?
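To make the ambiguity concrete, here is a minimal sketch of the lookup being described. It is an illustrative model only, not Yarn's actual resolution code; `resolveCommand`, `Resolution`, and the `node_modules/.bin/del` path are hypothetical names chosen for the example.

```typescript
// Illustrative model only; not Yarn's actual resolution code.
type Resolution =
  | { kind: "dependency-bin"; path: string }
  | { kind: "external"; command: string };

// `dependencyBins` maps bin names to paths. It is empty before an install,
// because nothing has been linked into the project yet.
function resolveCommand(
  command: string,
  dependencyBins: Record<string, string>,
): Resolution {
  if (Object.prototype.hasOwnProperty.call(dependencyBins, command)) {
    return { kind: "dependency-bin", path: dependencyBins[command] };
  }
  // No matching bin: the shell or OS decides what the name means.
  return { kind: "external", command };
}

// The same script word resolves differently before and after an install.
const beforeInstall = resolveCommand("del", {});
const afterInstall = resolveCommand("del", { del: "node_modules/.bin/del" });

console.log(beforeInstall.kind); // external
console.log(afterInstall.kind); // dependency-bin
```

With an empty bin table the lookup cannot tell "not installed yet" apart from "not a dependency at all", which is the uncertainty described above.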

robbyemmert · 2024-04-02T22:17:42Z

Until corepack is enabled by default, here are a few options:

You can infer from package.json what version of yarn to use. The commands to enable corepack/switch yarn versions are pretty standard. It should be pretty easy to automatically enable it based on package.json
You are correct that you cannot guarantee commands will run without a yarn install. However, why do you need to check? It strikes me that checking creates more errors than it's worth. Just attempt to run the script. There are already errors to describe most of the scenarios where a script requires an install first (i.e. command not found: babel). That will be an obvious indicator to the dev to run install first. We are used to these errors with literally every other package manager (including yarn 1.x).
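As a rough sketch of the first option, reading the requested version out of package.json could look like the following. This is an assumption-laden illustration, not how Yarn or Corepack does it: `parsePackageManager` and `PackageManagerSpec` are hypothetical names, and the only format assumed is the `packageManager` field's `name@version` shape with an optional `+<hash>` suffix.

```typescript
interface PackageManagerSpec {
  name: string;
  version: string;
}

// Parses a manifest's `packageManager` field, e.g. "yarn@4.1.1", optionally
// followed by a "+<hash>" suffix. Returns null when absent or malformed.
function parsePackageManager(manifestJson: string): PackageManagerSpec | null {
  const manifest: { packageManager?: unknown } = JSON.parse(manifestJson);
  const field = manifest.packageManager;
  if (typeof field !== "string") {
    return null;
  }
  const match = /^([a-z]+)@([^+]+)(?:\+.+)?$/.exec(field);
  if (match === null) {
    return null;
  }
  return { name: match[1], version: match[2] };
}

const spec = parsePackageManager(
  '{"name": "app", "packageManager": "yarn@4.1.1"}',
);
console.log(spec); // { name: 'yarn', version: '4.1.1' }
```

Parsing the field is the easy part; the sketch says nothing about downloading or caching that version, which is where the pitfalls listed earlier in the thread come in.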

My concern is that yarn 2+ is currently breaking the following principles (pending future functionality that we have no idea when it will be released).

Principle 1 - Source controlled dependencies: It's widely considered best practice to source control your development environment in a declarative manner. Even linters, runners, and cli tools are typically referenced in package.json, and installed automatically on setup. Most of the node ecosystem values this principle in some fashion. Yarn 2+ currently breaks it: Joe has Yarn 4, Jill has Yarn 2, and Jane has Yarn 3. They all run yarn install and get different behavior, even though the repository specifies which version of yarn to use.

Principle 2 - Modular design: It's great that yarn can check if all of your dependencies are installed, and it's great that yarn can run scripts, but yarn should not presume the order in which developers plan to run their scripts, or the order in which yarn features will be used. For example, if a developer wants to use a custom yarn script to verify that all system dependencies are properly installed before running yarn install, that is currently not possible. Or if developers want to script some of their own package management features (such as automatically logging in to a private registry), that's also not possible. An easy way around this is to make it possible for developers to choose which features they want to use in which order. For example, if there were a yarn verify command (maybe one where you could pass yarn verify --pattern @my-company), and scripts were otherwise executed without asking questions, then the developer could do something like:

"scripts": {
  ...
  "prepare": "npm login ...etc && docker login ...etc",
  "start-infra": "docker compose up -d",
  "dev": "yarn verify && nodemon"
}

Principle 3 - Don't make assumptions: I would highly recommend equipping developers, whatever their workflow, with an excellent package manager and command runner, rather than building an opinionated (i.e. facebook specific) tool around a default workflow or use case. In practice, this means giving developers a suite of options for building any type of node application. We know that developers are going to want to version control dependencies, install them, verify they are installed, and run application-specific scripts. But we don't know what order developers plan to run these things in. Therefore, it doesn't make sense for yarn to couple itself to a specific workflow (which is really a business decision), otherwise it is essentially making business decisions on behalf of its users, and risks alienating everyone who wants to make a different business decision (totally unnecessary IMO, yarn has historically been the most versatile node tool out there).

Sorry for the TLDR. I hope this offers a helpful perspective. I would be happy to provide feedback on any potential solutions or give more info if it would help.

clemyan · 2024-04-03T04:56:48Z

> Just attempt to run the script. There are already errors to describe most of the scenarios where a script requires an install first (i.e. command not found: babel).

If the bin name matches an external command, it would not error. It would run that command instead. Go back to my example of "clean": "del cache/*". Without an install, this will run the OS command del, which can behave differently from the del bin provided by a dependency.

If Yarn did not throw without an install, running the same script could do two completely different things before and after an install. And given that most users would not be aware this can happen, throwing should be the only correct default behavior when attempting to run a script without an install.

IMO a flag to opt-into the less safe behavior (i.e. "just run the script") or explicitly declare all commands to be external (e.g. yarn run --no-bins script) is acceptable, but I'll let the core team chime in on that.

> it doesn't make sense for yarn to couple itself to a specific workflow (which is really a business decision), otherwise it is essentially making business decisions on behalf of its users

No, it is a correctness concern. Whether a command used in a script refers to a dependency bin or an external command should not depend on whether an install was performed. Let me know if you disagree with that.

> (i.e. facebook specific)

Facebook has not been involved in Yarn for a long time; see https://yarnpkg.com/getting-started/qa#is-yarn-operated-by-facebook. AFAICT the last time a Facebook employee made a commit to Yarn was more than 4 years ago.

robbyemmert · 2024-04-08T17:25:52Z

> No, it is a correctness concern. Whether a command used in a script refers to a dependency bin or an external command should not depend on whether an install was performed. Let me know if you disagree with that.

I guess the question is what "correctness" is. From a UX/DX perspective, I would consider the standard of correctness to be expected functionality. Up until now, yarn, npm, and to my knowledge most package managers would run a different command based on whether or not an install had been run. Therefore, I would say this is expected functionality. I would expect to run a different command based on whether or not a yarn install has been performed.

However, I do like the idea of having a command (or maybe some file system indicator) to verify whether or not a yarn install has successfully completed. This could maintain expected functionality, AND give developers the ability to build in checks where appropriate.

Here is how others (including yarn) have solved this type of issue in the past

Many unix systems throw a verification prompt when installing packages, notifying the user that changes will be made to their filesystem, and asking if the amount of storage space they will occupy is ok. However, we obviously don't want this functionality in build systems. So some systems provide a flag to skip this check.

Example: apt install node -y // The -y flag automatically selects "Yes" on all prompts.

Yarn has used similar functionality to differentiate intent in safe and intuitive ways:

yarn global add X would install a dependency globally instead of installing it to the project folder.
yarn add X would assume you want to install the dependency to the local project folder.
yarn add X in a workspace enabled project root would prompt a confirmation asking for a flag override if you seriously want to add a dependency to the root instead of installing it from a workspace folder.

Here are a few ideas to use these existing paradigms to handle the use cases we've suggested:

Use case 1: Legacy functionality — Running a command in the terminal, respecting whatever happens to be in the local project bin folder:

As a developer, I want to run a script in my development environment that may or may not require node_modules, and may or may not require global binaries.

yarn my-script will run scripts, local bin binaries, or global binaries, as it always has. Yes, this would produce different results potentially, based on whether or not yarn install has run.

Use case 2: Safely execute local, version controlled utils:

As a developer, I want to be sure my commands reference local binaries or scripts.

I could run something like the following:
yarn local my-script (this feels like a very yarny way to do it)
or yarn my-script --verify-install (this would imply a narrow scope: ensuring that a yarn install has run)
or yarn my-script --local-only (this would imply a broader scope: ensuring that only local, version-controlled assets are used, and erroring if yarn install hasn't run).

Use case 3: Create a workspace alias for (potentially) external commands, to improve and standardize the developer experience (and IDE integration). (My specific use case)

As a developer, I want to create a yarn script to run my build script so that IDEs will automatically pick it up, allowing my team mates to run it from their editor GUI, and for easier integration with CI/CD pipelines.

I want to alias yarn dockerize to something like ./build-docker-image.sh

If yarn install hasn't run, it would be fair to fail the command with a warning something to the effect of "You haven't installed your project dependencies yet. If you want to execute this in a global context, please use XXXXXXXXX"

I could accomplish this by running yarn global dockerize (the yarny way), or
yarn dockerize --no-verify (narrowly telling yarn not to verify the install first), or
yarn dockerize --global-only (telling yarn to use only global resources and to ignore the local bin folder).
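The three use cases above can be sketched as one decision function. Everything here is hypothetical: the mode names mirror the flags proposed in this thread and are not existing yarn options, and `planRun`, `ProjectState`, and `Plan` are names invented for the illustration. The model is also simplified to a single executable name rather than a full script line.

```typescript
// All three modes mirror proposals from this thread; none is an existing
// yarn option.
type Mode = "legacy" | "local-only" | "global-only";

interface ProjectState {
  installed: boolean; // has an install completed?
  localBins: readonly string[]; // bin names provided by installed dependencies
}

type Plan =
  | { action: "run-local"; command: string }
  | { action: "run-external"; command: string }
  | { action: "fail"; reason: string };

function planRun(command: string, mode: Mode, state: ProjectState): Plan {
  const hasLocalBin =
    state.installed && state.localBins.indexOf(command) !== -1;

  if (mode === "global-only") {
    // Use case 3: ignore the local bin folder entirely.
    return { action: "run-external", command };
  }
  if (mode === "local-only") {
    // Use case 2: refuse to fall back to anything outside the project.
    if (!state.installed) {
      return { action: "fail", reason: "dependencies are not installed" };
    }
    return hasLocalBin
      ? { action: "run-local", command }
      : { action: "fail", reason: `no local bin named ${command}` };
  }
  // Use case 1 (legacy): prefer a local bin, otherwise run whatever exists.
  return hasLocalBin
    ? { action: "run-local", command }
    : { action: "run-external", command };
}

const notInstalled: ProjectState = { installed: false, localBins: [] };
console.log(planRun("del", "legacy", notInstalled).action); // run-external
console.log(planRun("del", "local-only", notInstalled).action); // fail
```

Only the legacy mode can give different results before and after an install; the other two make the caller's intent explicit, which is the trade-off under discussion.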

Summary

I think that all of the above suggestions would meet every use case we've discussed (as of now, in this thread). My personal preference would be having yarn my-script run as it always has, and adding either flags or command options (i.e. global/local) to improve safety, and only proactively throwing errors there.

However, if you want to go for a safety-first (IMHO convenience-second) approach, then I would recommend providing a way to override the default functionality with a flag.

Something to the effect of yarn my-script --no-verify or yarn my-script --force

Which approach do you think is most feasible?

Hopefully this is helpful!

Thanks,

robbyemmert added the bug Something isn't working label Apr 1, 2024
