Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getProxyFromURI Extraction Refactor #1346

Merged
merged 1 commit into from Jan 13, 2015
Merged

getProxyFromURI Extraction Refactor #1346

merged 1 commit into from Jan 13, 2015

Conversation

seanstrom
Copy link
Contributor

The getProxyFromURI function seems like a good candidate to remove from the
request.js file. That file is bloated and this fits well as a module.

This is still in need of a few more changes, but I figured people can start reviewing.

@simov @FredKSchott @nylen review?

@FredKSchott
Copy link
Contributor

YES!!! Love the concept of refactoring this out. I'm not sure I like the implementation though. Granted the code you're refactoring was previously a complete mess, but the use of shortCircuit is equally confusing to me. Is there a way to document shortCircuit and its use, or possibly implement this without the need for a whole new mechanism?

@nylen
Copy link
Member

nylen commented Jan 9, 2015

I like the concept, but I agree with Fred, just use if statements instead of shortCircuit.

I'm also not sold on the amount of functional decomposition going on for relatively simple operations that are only done once - can you explain why you took this approach, and maybe we can find a better balance between the current code and the proposed changes.

I want to see some more tests of the proxy behavior before we merge this. I'll add that to a PR I'm working on which should be ready early next week.

@seanstrom
Copy link
Contributor Author

The idea of shortCircuit is to take in data that you'll want to return early on, if it meets a given conditional. Similar to the multiple if statements in procedural order. Except you chain conditions with the condition function by passing in a function to compose with.
When one returns true, it doesn't run the rest of the condition function callbacks.

Overall the shortCircuit thing is me having fun, I was just trying to reduce multiple if statements in a clever way ;). I think I accomplished that, but it is pretty adhoc.

The rest of the code I can explain in more detail.

Any function that I have prefixed with get, aside from getProxyFromURI, is used to extract away little implementation specific details. I then use them with decent names, in order to make the code a little more expressive. Functions with the format also have the same principle applied to them.

The noProxyZoneObject function tries to encapsulate the formatting and breaking apart of a noProxyZone, a domain in the noProxy list.
I use that so I don't need branching logic around formatting specific pieces and etc.

matchNoProxyZoneHost and matchNoProxyZonePort are functions around the boolean operations of matching a hostname in the noProxyList. Again this is another way of extracting away implementation details of how its matching, and tries to give an expressive name to the operation.
matchNoProxyZoneHostWithPort exists to do both operations.

matchNoProxyZone wraps the branching logic around picking which matching operation we want to use, one with a port or one without. This could be inlined into the matchInProxyList function, but I wanted that function to just call out to one function. Reason being is that I wanted extract more branching logic from original for loop. It also reads nice to me :)

matchInProxyList just kickstarts the looping process. Though I removed the for loop in favor of the some iterator. I believe it has the same behavior where it will return early if it finds a single match.

hasWildCard, insideNoProxy are functions I only made to be used with shortCircuit.
They can be inlined if/when we move away from that.

Naming is subject to change for better, more expressive names, but what's there is okay.
Also comments are not finished, I just moved all the comments into single area for reference.

@seanstrom
Copy link
Contributor Author

Overall what I did here was method extraction, from the large body of code that was getProxyFromURI.
Unless there's a big performance issue that arises from the changes, I would consider this just as quick.
Additionally I would think this is more maintainable, a bunch of small functions that compose to bigger operations. More specifically I tried to make each function to have Single Responsibility, to do one thing and to do it well.

@nylen
Copy link
Member

nylen commented Jan 9, 2015

The idea is a good one: request.js is too long and unwieldy, and we definitely want to break off pieces like this both to make it shorter and to make unit testing easier.

However:

  • shortCircuit is clever, but the readability gained from using existing language constructs like if is more important in the long run.
  • The existing code is easier to read and maintain: it has a straightforward flow of execution, appropriate comments, and it is shorter.
  • I like the use of .some() instead of the for loop, but I don't think it's a good idea to have functions for basically every line of code. Breaking code into functions is great when the same fairly generic logic needs to be used in multiple places, and that's where the SRP really shines, but that's not the case here.

Right now I'd take a PR to split out the existing logic into a new file with minimal changes (and I'd probably name it lib/proxy.js and call it as proxy.getProxyFromURI, in case we break out other proxy-related functions in the future).

@seanstrom
Copy link
Contributor Author

I agree shortCircuit isn't the right answer for everyone, just a fun example of code.
That will be removed shortly since I am not crazy about it.

The existing code is monolithic and just nests branching logic when it needs to.
To me, that's by no means easier to maintain, rather more obvious because it doesn't use any functions at all. Because of that it sacrifices readability and organization for this straightforward-ness.
Out of original and current refactor, I would imagine the second to be easier to understand and change overall. If you look at the original, it gives off the sense that you need large code comments to explain what the big function is even doing.

Though it doesn't need to do it that way, and that's why I started breaking things out.
Extract method refactoring is a natural refactor of something that is too big, later we can conflate highly related functions into one since that can be also apart of the refactor (Inline Method Refactoring).
This refactor isn't finished, so I understand why anyone would be on the fence, since it favors Extract Method over Inline Method heavily.

That being said, I rather decouple things by default, sometimes a little over-engineered ;), than just leave things as they were. Overall, the whole point of breaking out the code is to refactor it to a "better" state.
I know that is highly opinionated goal, but I don't think I am that far off.

My suggestion would be to:

  • Inline more functionality into the some callback.
  • Remove shortCircuit in favor of more readable, predictable code.
  • Leave some if not most of the helper functions that add expressive names to implementation specific code. Meaning stuff like get or format prefixed methods.

@nylen thoughts?

@nylen
Copy link
Member

nylen commented Jan 9, 2015

Yeah, shortCircuit was a cool idea which reminds me of your soc but I'd rather not create a new pattern here since it doesn't save much.

I agree there is a too much if/else nesting in the existing logic, that can be broken up a bit. I would say to inline as much as you can stand to, that way we try to preserve a straightforward execution flow.

@seanstrom
Copy link
Contributor Author

Yeah we can all agree that shortCircuit and soc were a stroke of genius, but they're too ahead of their time, ahah ;) just kidding. To be honest, there's not much we can do but do a little nesting of if statements, just because we return null in the very end.

Im in the middle of applying the changes we just agreed on, I think the next revision will be much nicer, now that I can see everything in little pieces I can notice more similarities and conflate things.

Thank you for all the feedback!

@seanstrom
Copy link
Contributor Author

@nylen @FredKSchott made some changes. Tell me what you think.

@seanstrom
Copy link
Contributor Author

No love :(. @simov @nylen @FredKSchott anyone?

@simov
Copy link
Member

simov commented Jan 12, 2015

I really enjoyed this conversation 👍

I really like the fact that all of the code now is in one file, and I'm not sure that the shortCircut stuff in utils was ever going to be used by someone else.

I think your code gets a bit overly verbose, most of your functions (and they are alot), I think except two of them, are used only once. Do you really thing that this peace of code is ever going to be re-used in some way?

Without being familiar with that part of the code (the old function) have too much local variables, also three levels of nesting if-for-if-else but on the other hand it's only 50 lines of code, so I can't imagine it transforming into more than 3 functions.

Having too much function requires too much context switching and remembering what each and every function and tiny scope in it do. So you must strike the balance. Again I'm not familiar with that part of the code but try to look at it in terms of what's get done, not in terms of each and every single line.

@seanstrom
Copy link
Contributor Author

I appreciate the feedback, but I have a couple of concerns.

I don't think there's a "Lines of Code to Proper Amount of Functions" equation and I really don't think that shortness of code justifies that its better.
For example I could code golf an answer but it doesn't mean its the best answer.
On the other hand, extracting to many things can add an overhead.

But I will contend that the only complexity I added to the getProxyFromURI function, aside from shortCircuit which is removed now, were function helpers.
The reason some of them are short one off functions is because they give a name to a process that is being performed.
For example formatNoProxyZone is a function that gets used once, but it extracts the workings of how a noProxyZone is formatted and gives it a name to what it is semantically doing.
Now if we change the way we format a noProxyZone, that process is still encapsulated in that function.
Another example would be the getHttpProxy and getHttpsProxy functions.
These or only used once, but I feel that they give a name to us looking for the HTTP(S) Proxy.
I think it makes the code look more declarative.

In conclusion I would say that each function is a good bite size, and can be debugged and understood easily. The true beauty is that the names to the functions are hopefully expressive enough to where one can read the the program and know what it is trying to do. Without knowing each internal implementation detail, which the original was almost the opposite, you would see all the details and then translate that to what's trying to do.

What are your thoughts on that? Does that make sense? I could totally be wrong, but I feel that I am coming from a good place.

@simov
Copy link
Member

simov commented Jan 12, 2015

The functions per code was exactly as your statement a few post above, when you said that you sometimes overly engineer things and rather decouple them - just a point of view.

So how the added complexity can be justified?

As for explaining me the encapsulation and stuff - do we really need that in this case?

Hopefully I'm not getting too annoying with this, but you asked me in the first place :)

@nylen
Copy link
Member

nylen commented Jan 12, 2015

These simple operations don't need names.

We're not trying to prescribe coding style mathematically. However, I think this is a good rule of thumb: if a function is only referenced once and it is one line long, then it should always be inlined!

I hope this will help explain why. Here's another way to give the operation of "get the http proxy" a name:

if (uri.protocol === 'http:') {
  // get the http proxy setting
  return process.env.HTTP_PROXY || process.env.http_proxy
}

This comment is unnecessary because it is already pretty obvious what is going on from the code. I could see adding a link to some spec or explanation that these environment variables are commonly used to specify proxies, but really, someone who doesn't know that has no business changing this code.

Pulling this logic out into its own function has an additional major drawback: it requires the human reader of the code to switch context. Multiply this by 10 or so times of using this technique, and you have a piece of code that (to me) is very difficult to reason about because it now requires simulating a fairly complicated call stack in your head.

I was the one who reviewed and merged the original PR for this change (#1096). I caught a bug with the line hostname.indexOf(noProxyHost) === hostname.length - noProxyHost.length: requests to google.com will not be proxied if no_proxy contains oogle.com. I can tell you definitively that had this code been written in your proposed style, I would not have caught this bug because it is many times easier to reason about code that executes mostly linearly and fits on a single page.

I get what you're going for here: decomposing this logic into functionally pure units which clearly describe what they are doing. We all know that decomposing things into functions is a good thing, and has lots of other benefits like testability, but this is too much decomposition.

Practicality (the ability for a human like me with limited brainpower to quickly review an unfamiliar piece of code) trumps beauty (functional purity and declarative style). So, I'd accept a PR which makes minimal changes to the existing code, and really I'm not even sold on that - 3-4 levels of if/for nesting is not really even approaching WTF levels in my opinion.

@seanstrom
Copy link
Contributor Author

To be honest, I feel like you have good points @nylen.
Would you agree that this is more of a matter of a opinion than anything else?
If so I can make the minimal changes to please the other contributors and the project.
I'm here to help not argue aha.

But let me pick your brain a bit more.
Most of the arguments made against more functions is that there's mental overhead from having to read the functions and following indirection from function to function.
I can see the argument against the indirection, but I can't imagine any of us to be un-familiar with that, and I really don't think it's any kind of anti-pattern. To many functions can be a problem if there not helping much, but I think that's a per incident scenario to really decide that.
For example, any function that is used once doesn't always means that it doesn't deserve to be a function. A very small function can be considered pointless, though I would say if its doing somethings that are a little obscure, they could be reasoned to be in a function.

That's overall the decision I made, that we're at a disagreement with, right?
I have several one line functions, that I think are nice, and others feel don't deserve to be functions.
Which is fine, I'll inline those things. That to me is still a better state then when we started.
The original code IMO looked more like C written in Javascript than well organized Javascript.
I tried my best to get it to a better state. :)

I'll make more minimal changes to the original code and get back to you guys.
My opinion may still be disagreed with but I'm going to continue to push for that since I feel that will help request, and that's what I am here to do.

Thanks again @nylen @simov for your patience, I know I am stubborn, but I won't hold this up anymore.
Hopefully this experience hasn't left a sour taste in anyone's mouth, just because of my opinion and style.

@FredKSchott
Copy link
Contributor

@seanstrom I like what you're trying to do here, but as a reader jumping into this new file (its changed a bit since I last saw it) I have trouble navigating and understanding it. Even though my only concern is with the main, bottom function it was still a frustrating experience having to jump up and down across the file to see what's going on.

I wouldn't block this from being merged, in many ways this is good and clean code, and its certainly not bad code. But I did want to share my experience trying to understand it.

@seanstrom
Copy link
Contributor Author

@FredKSchott did you review the latest code, I just updated it a second ago

@FredKSchott
Copy link
Contributor

Whoops! No I did not, looking now :)

@FredKSchott
Copy link
Contributor

New solution looks perfect, imo. Awesome stuff @seanstrom!

@seanstrom
Copy link
Contributor Author

Hooray! @FredKSchott your cheers put me in a good mood!

@seanstrom
Copy link
Contributor Author

@nylen @simov comments?

@nylen
Copy link
Member

nylen commented Jan 13, 2015

Ah, yeah this is good stuff, thanks. Leaving a couple notes now, none of which will block me from merging (assuming you will squash these commits together when we're done).

}

function noProxyZoneObject(zone) {
zone = zone.trim().toLowerCase()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about calling this parseNoProxyZone, then you can do ...

@nylen
Copy link
Member

nylen commented Jan 13, 2015

A couple of things I really like about this:

  • The tricky piece of logic (uriInNoProxy) is exposed in a way that makes it very easy to unit test later.
  • Pulling formatHostname into a function improves clarity rather than sacrificing it.


// if the noProxy is a wildcard then return null

if(noProxyIsWildCard) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add space in if (

@simov
Copy link
Member

simov commented Jan 13, 2015

@seanstrom this code is looking good to me 👍 good job everyone involved.

EDIT: Oh and btw, the end result is exactly 3 functions besides the exported one 😛

@nylen
Copy link
Member

nylen commented Jan 13, 2015

I wasn't sold on the refactoring originally but this is some pretty code. Thanks @seanstrom.

nylen added a commit that referenced this pull request Jan 13, 2015
@nylen nylen merged commit 5d515b0 into request:master Jan 13, 2015
@seanstrom seanstrom deleted the support/extract-getProxyFromURI branch January 13, 2015 16:34
@seanstrom
Copy link
Contributor Author

@simov hahahahaha i didn't even realize it mapped to what you mentioned early.
hilarious 😺

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants