Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically detecting smart Git hosts #1628

Merged
merged 4 commits into from Mar 26, 2015

Conversation

nwinkler
Copy link
Contributor

Added logic to automatically detect smart Git hosts that allow shallow cloning. This is done by sending an ls-remote request to the server and then evaluating the returned HTTP header fields. For this, Curl
verbose logging is enabled for the ls-remote request, since Curl verbose logging sends the returned HTTP headers to stderr.

If the stderr output contains the desired header

Content-Type: application/x-git-upload-pack-advertisement

then the server supports shallow cloning.

This approach uses Git and Curl for the heavy lifting. Instead of implementing the request to the server using a simple HTTP client, Git is used, since it takes care of authentication using stored credentials.

The used approach should also work for BitBucket, which only sends the Content-Type header when a specific user agent is used. Using Git to make the request enables this behavior.

The function to detect the smart Git host (GitRemoteResolver.prototype._supportsShallowCloning) returns a promise that is resolved when the server's request is evaluated. The promise handling required an addition to GitHubResolver.js - to always resolve the promise to true, since GitHub supports shallow cloning.

This should fix the issues #1558, #1559 and #1568 - and provide a better solution than #1393, which disabled shallow cloning for everyone.

I don't have an instance of GitHub Enterprise that I could test this with - but I'm pretty confident that it is working. It would be great if someone with access to a GitHub Enterprise instance could verify this.

If you run Bower with -V (for verbose logging), you should see a message indicating whether the host in question (for non-GitHub hosts) supports shallow cloning, i.e. is a smart host.

The code could be optimized to cache information about the hosts - I haven't added this yet since I wanted to get feedback on whether the approach I've taken is sensible/acceptable.

var stderrString;
var isSmartServer;

stderrString = stderr.toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moot as stderr is already a string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right, of course. I copied this from here: https://github.com/bower/bower/blob/master/lib/core/resolvers/GitRemoteResolver.js#L180 - toString() is used on stdout.

@sheerun
Copy link
Contributor

sheerun commented Jan 11, 2015

  • Travis is failing for some reason
  • We probably need at least one test for this feature
  • Can you squash commits?

@nwinkler
Copy link
Contributor Author

Sure - I'll take a look.

@sheerun
Copy link
Contributor

sheerun commented Jan 12, 2015

Btw. there helpers.require helper that allows for mocking of requires in tests. It may be helpful.

@nwinkler
Copy link
Contributor Author

Do you have an example for using helpers.require to mock a spawned command (via the cmd function)? That would be helpful.

I checked the existing test cases, but I couldn't find any instance where cmd was mocked.

@sheerun
Copy link
Contributor

sheerun commented Jan 12, 2015

Something like this should work:

var gitRemoteResolverFactory = function (handler) {
  return helpers.require('lib/core/resolvers/GitRemoteResolver', {
    '../../util/cmd': handler
  });
};

var gitRemoteResolver = gitRemoteResolverFactory(function (cmd) {
  return Q.all(["stdout", "stderr"]);
});

EDIT: fixed few things

@nwinkler
Copy link
Contributor Author

Thanks - this looks helpful indeed! I'll give it a try.

@nwinkler
Copy link
Contributor Author

@sheerun Thanks a lot! Your code example was really helpful.

  • Added unit tests to verify the smart host detection and the --depth 1 parameter when cloning.
  • Squashed commits
  • Rebased onto master

Please let me know what you think.

@nwinkler
Copy link
Contributor Author

Sorry for being such a nuisance, but is there anything I can do to get this moving?

@sheerun sheerun mentioned this pull request Mar 19, 2015
16 tasks
@sheerun
Copy link
Contributor

sheerun commented Mar 23, 2015

@nwinkler After I rebase it on master, the tests are failing :( Could you look at it?

@nwinkler
Copy link
Contributor Author

Sure, I'll take a look at it either tonight or tomorrow morning.

// negotiation that needs to take place.
//
// The above should cover most cases, including BitBucket.
GitRemoteResolver.prototype._supportsShallowCloning = function () {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow this is so excellent!

Almost seems like this method deserves its own npm lib

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - appreciate the feedback!

@sheerun
Copy link
Contributor

sheerun commented Mar 24, 2015

this._shallowClone probably needs to be a lazy promise, instead of eager one.

Also, sometimes parsing url fails, and this._remote is null.

@sheerun
Copy link
Contributor

sheerun commented Mar 24, 2015

Also, probably result of this should be cached between class instances (as far as I see GitRemoteResolver is instantiated for each source in bower.json; instead there should be only one remote call per host).

@nwinkler
Copy link
Contributor Author

Yes, I'm aware that the result should probably be cached per host. I wanted to add that once I'm sure that the approach taken is acceptable.

Can you give me an example under which circumstances you see the remote parsing fails?

And I'm not sure I'm following with regards to eager/lazy promises - can you give an example?

@nwinkler
Copy link
Contributor Author

I think I've found the culprit with regards to the failing test cases. It's in test/core/resolverFactory.js. The it('should recognize git remote endpoints correctly') test is creating several instances of GitRemoteResolver (one per URL), and they are all executing git ls-remote (through _supportsShallowCloning).

It looks like the way to solve this would be to mock the GitRemoteResolver._supportsShallowCloning call for these invocations. I've tried using helpers.require, but I'm not sure how to inject the mocked instance into the resolverFactory in the test.

This test (and the one after it) is essentially "spamming" the cmd throttle queue with calls to git ls-remote to detect whether the host supports shallow cloning. These queued requests are causing the timeouts in the other test cases, causing them to fail.

@sheerun
Copy link
Contributor

sheerun commented Mar 24, 2015

I feel the tests should pass even before you implement mocks. That's why I suggested lazy promise (i.e. check shallow cloning support only when _shallowClone is actually used, instead executing it right away in constructor. This can be done by using _shallowClone to cache promise, and calling _supportsShallowCloning directly. Something like:

_supportsShallowCloning: =>
  return @_cachedClone if @_cachedClone
  @cachedClone = Q.resolve(true)

After that you'd need to properly handle failure case of shallow clone checking, and default it to false.

So for example if ls-remote takes more than two seconds / returns non-zero code you assume that host doesn't support shallow cloning.

After that point probably mocks are OK, but I'm not sure how to handle it. Probably we'd need to use https://github.com/mfncooper/mockery and intercept cmd calls selectively (or even introduce new abstraction layer on commands, so we can mock them more easily).

@nwinkler
Copy link
Contributor Author

Yes, that makes sense. The lazy promise should be easy to add. Thanks for the hint!

nwinkler and others added 2 commits March 24, 2015 17:13
Added logic to automatically detect smart Git hosts that allow shallow
cloning. This is done by sending an `ls-remote` request to the server
and then evaluating the returned HTTP header fields. For this, Curl
verbose logging is enabled for the `ls-remote` request, since Curl
verbose logging sends the returned HTTP headers to `stderr`.

If the `stderr` output contains the desired header

  Content-Type: application/x-git-upload-pack-advertisement

then the server supports shallow cloning.

This approach uses Git and Curl for the heavy lifting. Instead of
implementing the request to the server using a simple HTTP client, Git
is used, since it takes care of authentication using stored credentials.

The used approach should also work for BitBucket, which only sends the
Content-Type header when a specific user agent is used. Using Git to
make the request enables this behavior.

The function to detect the smart Git host
(`GitRemoteResolver.prototype._supportsShallowCloning`) returns a
promise that is resolved when the server's request is evaluated. The
promise handling required an addition to `GitHubResolver.js` - to always
resolve the promise to `true`, since GitHub supports shallow cloning.

Added test cases to verify the new functionality.
@nwinkler
Copy link
Contributor Author

I've changed the implementation to only call git ls-remote when needed, and not from the constructor. Please take a look at the updated code and let me know whether you think I should something else.

The tests run fine for me locally now - everything rebased on master.

Let me know - I can spend some more time on this tonight/tomorrow if required.

@nwinkler
Copy link
Contributor Author

Put some more work into this and added the following:

  • Caching for hosts that support shallow cloning. They should only be requested once, minimizing the number of requests.
  • Verfication that the this._remote is not empty. In this case, false is returned.

Added several unit tests to verify the above.

@nwinkler
Copy link
Contributor Author

Let me know if I should squash commits at one point...

@sheerun
Copy link
Contributor

sheerun commented Mar 24, 2015

We probably need to add caching, as without it this feature can slow down things instead ;/

We can probably extract this functionality it as util. Maybe even npm module.

I can squash commits for you when merging. You'll still be present as an author.

@nwinkler
Copy link
Contributor Author

I've added caching in my last commit, it will only try once per host.

Looking forward to seeing this merged! And thanks for all the help!

@nwinkler
Copy link
Contributor Author

Just a note to show how significant change is. For one of our typical projects, the time for bower install with v1.3.9 was two minutes. With the change in v1.3.10 (disabling shallow cloning), this time increased to seven minutes. With the above changes, it's down to two minutes again.

Really looking forward to having this in the official version.

@sheerun
Copy link
Contributor

sheerun commented Mar 26, 2015

Looks good to me now :) Thank you for this amazing work!

I'll release it as soon as I finish tests for the login command: #1732

sheerun added a commit that referenced this pull request Mar 26, 2015
Automatically detecting smart Git hosts
@sheerun sheerun merged commit 4d59d26 into bower:master Mar 26, 2015
@sheerun
Copy link
Contributor

sheerun commented Mar 26, 2015

@nwinkler For some reason few tests are failing on master after merging (only on travis).

Could you look at it?

@nwinkler
Copy link
Contributor Author

I'm taking a look now. It's strange, since in the PR Travis build, everything was fine: https://travis-ci.org/bower/bower/builds/55695755

@nwinkler
Copy link
Contributor Author

I don't think it's my code that caused the unit test issue. I created a new branch off master before you merged in my PR, and the same error happens there: https://travis-ci.org/nwinkler/bower/jobs/55915627

It looks like something changed in the environment or dependencies between yesterday and today.

@nwinkler
Copy link
Contributor Author

@sheerun I found the culprit for the failing tests: The request module updated from v2.53.0 to v2.54.0. The semver in bower's package.json has ^2.51.0, which leads to an update to v2.54.0.

If you set the version of request to 2.53.0 in package.json, the tests will start to work again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants