Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unwanted files are committed to the repository #2598

Closed
cy6erskunk opened this issue May 26, 2020 · 26 comments
Closed

Unwanted files are committed to the repository #2598

cy6erskunk opened this issue May 26, 2020 · 26 comments

Comments

@cy6erskunk
Copy link

Unwanted files are committed to a repository after updating Lerna to version 3.22.0 (which contains the change added in #2445).

At least this change does not feel like a minor one, it definitely should be explicitly documented, not only in that PR description.

Expected Behavior

If temporary files are generated during build, they should not be committed back to the repository.
At least this beahviour should not be default, but rather enabled with a switch.

Current Behavior

All untracked files are committed by default.

Possible Solution

Gitignore all the stuff would work, but it should not be.

Steps to Reproduce (for bugs)

  1. Introduce changes to a package.
  2. Generate some secrets and/or trash.
  3. Publish new version with Lerna
  4. All the secrets and trash are committed to the repository

Your Environment

Executable Version
lerna --version 3.22.0
@akoenig
Copy link

akoenig commented May 26, 2020

^^ We also faced this situation today, where a Google Cloud Platform service account keyfile has been committed and pushed to master 🙂 – We were relieved somehow that the issue has been raised here. I also think that this is not a minor change and therefore "should have been" introduced in a major release.

@simllll
Copy link
Contributor

simllll commented May 27, 2020

We are also experiencing this, I don't see a valid reason why these files should be added to git after release.

Maybe this one helps, we could add the "-u" flag to only add modified pathes, but do not add "new" files generated during build process:
https://stackoverflow.com/questions/7124726/git-add-only-modified-changes-and-ignore-untracked-files

In the meantime we have to downgrade lerna, as this is is a breaking change for us.

@alasdairhurst
Copy link

alasdairhurst commented May 27, 2020

Agreed this isn't great, and posted in the original PR with my comment before noticing this. I mentioned that I hoped someone wouldn't have sensitive info published into their repo/package but it looks like it's already happened!

@sunlanterns @evocateur

@fictivecreations
Copy link
Contributor

@simllll Unfortunately using the "-u" flag doesn't solve the original issue this change was for as it would mean that lerna wouldn't correctly add any new packages you may have created between the last release and the current one.

As for the actual code solution - It seems correct to me that if you have any files that show up in your repo that you don't want pushed/published they should be gitignored, and then lerna should respect the gitignore. This way it's a defined config you're relying on and not just a side-effect of the way lerna happens to do things.

As for the versioning/release documentation, I'm not a proper maintainer, just a contributor, so @evocateur made the decisions there, and might have some explanation for why it was released the way it was.

@ediabal
Copy link

ediabal commented May 27, 2020

I'm seeing comments about this being broken for days. This was merged 4 days ago and only released yesterday. Why would someone work directly off of master / auto updating without checking merges for breaking changes. Seems more likely to keep the .gitignore up to date and use package versioning to protect projects from this.

@cy6erskunk
Copy link
Author

I'm seeing comments about this being broken for days. This was merged 4 days ago and only released yesterday.

Version 3.22.0 was published 3 days ago (Sun May 24 2020 23:05:55 GMT+0000 (UTC))

Why would someone work directly off of master / auto updating without checking merges for breaking changes.

Because some people may (wrongly?) believe that lerna uses semver and minor change should not introduce breaking changes?

Seems more likely to keep the .gitignore up to date

I would agree with this one, especially if there would be some notice about possible consequences

and use package versioning to protect projects from this.

Not sure, that I understand this part, @ediabal

@ediabal
Copy link

ediabal commented May 27, 2020

@cy6erskunk I was mistaken with the timeline with the release & I agree that it would have been best to follow semantic versioning.

The last part you have me quoted for is to try to get your package.json set to a version and not auto upgrade your versions (best to commit a .lock file to do this). That could be messed up with not following semantic versioning but a .lock file should protect you.

Overall this wouldn't be an issue if .gitignore and .lock files were correctly set in place to collect automated/env/build files and capture what versions you are using.

@simllll
Copy link
Contributor

simllll commented May 27, 2020

@sunlanterns I tried to read all related threads, but I still don't get it, why should lerna be responsible to add build artifacts? E.g. you have a simple typescript project, but also some plain .js files in it... Now tsc builds type defs and translates ts files to js files. Perfectly fine, this is what I want to deploy, but I definitely do not want these files to get checked in. I see .gitignore to protect general things to not get checked in,but making hundrets of exceptions for all ts files that are compiled to js files and still letting other "plain" js files "be committed" doesn't sound like a very good plan. There are a lot of other scenarios too, e.g. like people said before: adding credentials, temporary files etc..it's a build pipeline we are talking about.

Isn't it like it should be the general approach to not auto add all build artifacts and as it is mentioned in the original bug issue that the opposite (of checking in all build artifacts) is a very special case?

@fictivecreations
Copy link
Contributor

@simllll
You seem to have gotten the wrong idea. I also had build artifacts I didn't want checked in as a part of the original error - the problem was I had these build artifacts added to my .gitignore and lerna wasn't expecting them to be there, so when it tried to publish packages from these additional build artifacts the git commands errored out, because it was trying to push files that were technically in .gitignore.

To further explain, the basic use case that wasn't working was:

  1. Run a build script (like yarn build) that generates several packages from one root source such that you end up with two build folders at the root that you want lerna to publish. Let's say BuildOptionA and BuildOptionB
  2. Do a lerna publish such that it pushes packages for both BuildOptionA and BuildOptionB
  3. Lerna was (before the fix) erring out because it was trying to push versioned package.json files from my generated BuildOptionA and BuildOptionB that had been added to my .gitignore. Git saw those specified files being attempted to be pushed up and was like 'wait! you said to ignore these in gitignore, why are you listing them here?'

When you add all current files instead with git, it automatically works around the files you've already added to your .gitignore as intended. Hopefully that explanation makes sense.

To be honest, I don't understand how you wouldn't get frustrated with not adding all your build artifacts to .gitignore - after lerna has finished wouldn't you have to go delete all those extra files so that you don't commit them to your git repo as you continue to work?

@simllll
Copy link
Contributor

simllll commented May 28, 2020

Thanks @sunlanterns for explaining your scenario. I guess I need to set this up in a demo to see the whole picture, your build is building artifacts that you want to deploy, but these artifacts are "complete" packages (they have their own package.json's). You still ignore these artifacts through gitignore, but lerna wasn't able to publish these packages somehow.
So we agree on:

  • we do not want to get build artifacts to get checked in
  • newly created packages through build process should get picked up and published by lerna

What I'm still struggling with, is the gitignore approach:
All our new setups have a distinct dist folder and therefore it's not an issue at all, but there are thousands of old code bases out there and also we have especially one that comes from the "good old times" where the whole code base was in JS. The package structure is as simple as: "package-folder/folders/someJSFile.js". So far so good, but now we have several files converted to typescript. So the file looks like "package-folder/folderB/folderC/someTSFile.ts" besides this ts file, some other js files live in this folder too. So the only way to get this working with a gitignore file would be to add ALL possible build artifacts of someTSFile.ts (e.g. full path to someTSFile.js and someTSFile.d.ts). Now imagine this scenario with several hundret files. It's also not really safe, as soon as one developer changes one js file to ts they struggle begins again.
So I agree in general having them covered in gitignore would be the better approach, but it's not always doable and not the "default" mode of e.g. tsc (outputPath is the same as the src file). In addition to that there are still some other scenarios. Therefore I assume this approach breaks a lot of setups.

Back to the original issue, can you confirm my two points above? If I understand correctly it would be sufficient if we find a way to let lerna publish these newly created packages without any error message (e.g. maybe https://codeclimber.net.nz/archive/2016/12/19/ignoring-files-on-git-but-just-for-a-while-with-assume-unchanged/)

@cy6erskunk
Copy link
Author

The last part you have me quoted for is to try to get your package.json set to a version and not auto upgrade your versions (best to commit a .lock file to do this). That could be messed up with not following semantic versioning but a .lock file should protect you.

@ediabal well, we do have fixed versions in package.json and also use .lock file. However, we use automation to create PRs with the updates of the project dependencies, and in this case that PR was merged after a quick review of the description of the changes for version 3.22.0.

Overall this wouldn't be an issue if .gitignore and .lock files were correctly set in place to collect automated/env/build files and capture what versions you are using.

Once again, I do agree, that it is a great idea to have a proper .gitignore file.
However, it was quite natural not to mention some files which were added to the project folder during production build which happens only on CI.

In my opinion, adding an option to the publish command (e.g., --commit-anything) enabling current behavior would help people to achieve what was asked in #2151 (i.e., have projects that are entirely generated and be able to publish them without committing to a repo) but would not affect the others.

To be honest, I don't understand how you wouldn't get frustrated with not adding all your build artifacts to .gitignore - after lerna has finished wouldn't you have to go delete all those extra files so that you don't commit them to your git repo as you continue to work?

@sunlanterns One would not be frustrated with not adding the build artifacts (or even build environment stuff), which are only generated/used in CI, especially when a clean environment is created for every build.
Also, there're cases when these artifacts are required to let Lerna do its job.

@fictivecreations
Copy link
Contributor

fictivecreations commented May 28, 2020

@simllll
I have a couple of thoughts for your use case.

  • If you aren't already adding those compiled files to the .gitignore are you relying on your developers to know not to push those compiled files up to github while they're working locally? To me that seems like more trouble than getting the files added to the .gitignore but I acknowledge that's a personal preference.
  • So I would look into maybe naming your compiled .ts files a certain way, or the ts folders a certain way, whether that's putting an _ in front of their name, or adding an additional suffix like .comp.js. Then they would be easily targetable from the .gitignore file
  • EDIT:: And I don't think that solution you linked would work. It would require lerna knowing which files shouldn't be pushed up and lerna just doesn't have that kind of knowledge about your repo without something like the .gitignore file

@cy6erskunk
I understand your use case, and I definitely understand why you might be upset that the change wasn't labeled a breaking change (I feel the need to reiterate that this part of the issue was out of my control).

But I really do believe that if you have "files I don't want merged to github" the correct solution is to add them to the gitignore since that's it's express purpose.

@cy6erskunk
Copy link
Author

@sunlanterns I don't disagree with you on the matter of .gitignore file, but I would also love to have an option to avoid some unwanted swap/tmp/whatever files being committed to the repo (they could appear due to failure of some other tool, for example).
I mean, --do-not-commit-all-the-world or something like that.
Having this behavior properly documented would help people in the future, I guess.

@simllll
Copy link
Contributor

simllll commented May 28, 2020

I very much agree with @cy6erskunk comment, in my case there are workarounds to get a "more" valid npmignore, but it's a lot of effort and several changes are needed to get this running. And there are only "workarounds" because typescript allows me to define some stuff. There are other build tools out there which do not do that though in this fashion. Committing (and especially the side effect of it: adding unversioned files) still doesn't seem as a valid default for me.

Thank you for your thoughts how we could solve it @sunlanterns ! I looked into several ways, unfortunately I couldn't find any easy & any kind of bullet proof fix. Last approach I tried was that I basically modify the gitignore during "lerna build" and dynamically get untracked files (via git status), commit this .gitignore file and try to overcome with this somehow.. but still this is hacky and super unreliable. In the end I probobaly would need to run "assume untracked" manually too. It just doesn't feel right.
Other approach with changing filenames would break a lot of other dependencies as this package is used via "deep"-imports as well.

Anyway, regarding your last point:
Running "git status" will show untracked files, combining this output with an assume unchanged command. Couldn't that solve the original issue?

@fictivecreations
Copy link
Contributor

@simllll
I don't think that would work with git status, you'd run into the same error you would with the -u solution. There are cases where the files that lerna does need to push up to your github are also untracked. You'd be marking those as assume unchanged as well, and would break lerna's process as a result. I wish I could explain that with a more specific example but it's been a few months at this point since I originally did all my research into fixing this issue.

I also agree there should be an easier way for you to add these files to .gitignore. I had the thought that maybe you could use git status to more easily add the correct files to gitignore and found this: https://stackoverflow.com/questions/15862598/how-to-add-all-currently-untracked-files-folders-to-git-ignore

@fubar
Copy link

fubar commented May 28, 2020

I had a couple of directories in my repo that were not meant to be committed. Lerna committed them. This is an obvious security risk that should not need explanation.

@fubar
Copy link

fubar commented May 29, 2020

@evocateur this is a major security incident that needs intervention by a maintainer. Note that we have also reported this to npm.

@fubar
Copy link

fubar commented May 29, 2020

@evocateur to extend on this, the suggestions to explicitly exclude files in .gitignore are impractical and outright unrealistic. They would require any and every file that could possibly be created or moved to the working directory knowingly or unknowingly by me or by any program that I run to be added to .gitignore, or else they WILL get published without prompt for review or confirmation. The lack of sensitivity to the severity of this issue is very surprising to me. People have their private credentials published without their knowledge as we speak.

@simllll
Copy link
Contributor

simllll commented Jun 3, 2020

Is there any progress on this? I just ran into similar troubles in another "older" project, there are Gulpfile tasks that create files, with a random chunk in it (based on the filecontent, it uses "usemin" and other optimizations tasks), they lie in the same folder than the original file though, and therefore it's basically impossible to find a valid gitignore rule. Had to downgrade lerna there too to get things running again.

@openjck
Copy link

openjck commented Jun 8, 2020

I'm familiar with a project that leaked secrets due to this bug.

For whatever it's worth, I think that leaks caused by this bug are going to be under-reported. Developers don't want to draw attention to them.

@fubar
Copy link

fubar commented Jun 8, 2020

@openjck thanks for reporting and agreed, people don't go advertising that their credentials have leaked.

@fubar
Copy link

fubar commented Jun 8, 2020

The lack of attention to this issue is rather disheartening. Let me try tagging all of the maintainers @evocateur @gigabo @hzoo @ndelangen @xtuc

@evocateur
Copy link
Member

I'd appreciate some empathy as a burnt-out maintainer that attempted to solve an issue for a random stranger on the internet and made a mistake.

Very close to quitting for real (instead of burnout-induced intentionally-ignoring) thanks to all of your "helpful" comments in this thread. But don't let me stop you from feeling entitled to my volunteer labor, by all means please continue to attack me.

@sunlanterns Thank you for being patient, more patient than I have energy for right now, in explaining the fact (that I agree with wholeheartedly) that a properly-specified .gitignore is the solution for these problems.

I just cannot believe people can't be bothered to add files containing secrets to their bloody ignore files, as if Lerna made a contract with them read their minds and always do the right thing. Just boggling.

Yes, I probably should have put this behavior behind a flag, but there wasn't exactly a lot of activity on the PR beyond the original author, and it seemed fine to me at the time. That's why I felt it didn't merit a major.

SemVer isn't the law. It's a convention implemented by humans. Humans can make mistakes.

@fubar
Copy link

fubar commented Jun 8, 2020

I feel you @evocateur, I know how it feels. Nobody is attacking anyone. You sound like a capable individual so I'm sure you understand that people get a bit freaked out when they unexpectedly have their secrets published and then feel helpless when nobody responds for two weeks. While I wholeheartedly believe that the only safe security policy to publishing is to whitelist all safe files and not blacklist all unsafe files (practicality issues aside, like you said, humans can make mistakes, and gitignore is for git which involves a multi-step stage, commit and push process), this isn't something you need to solve personally. If you feel close to burnout then by all means take time for yourself and pass the torch to maintain this repo to someone else! We're certainly grateful for this tool and for all your efforts. Having over 40 thousand repos that use lerna requires a certain level of responsiveness to security issues, and I'm sure you know of people who are qualified and happy to help out. So again, thanks for your contributions and let's see what consensus can be reached here!

@evocateur
Copy link
Member

Released as v3.22.1. Thanks all for your patience.

@fubar
Copy link

fubar commented Jun 9, 2020

Thanks @evocateur, appreciate the quick turn around in the end! As for making this the new default in the next major release, it may be worth asking the community for their vote on the preferred default behaviour. Cheers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants