Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Same hash after re-assigning the value of a function variable #118

Closed
oscard0m opened this issue Jun 25, 2022 · 3 comments
Closed

Same hash after re-assigning the value of a function variable #118

oscard0m opened this issue Jun 25, 2022 · 3 comments

Comments

@oscard0m
Copy link

Hi, object-hash team!

I have the following scenario:

  • 2 objects with the same shape
  • both of them have a property whose value is taken from a variable.
    • that variable is a function
    • I redefine the return value of that function before assigning it to the second object

Because of the changes applied to that function I would expect a different hash but I'm obtaining the same hash.

here's an example that reflects the scenario I just described: https://codesandbox.io/s/object-hash-with-variables-0bskw

Is this correct? Would be reasonable to try to cover this use case?


You can find the full context in this other issue: ezolenko/rollup-plugin-typescript2#228

@agilgur5
Copy link

agilgur5 commented Jun 25, 2022

Hi 👋 , by way of introduction, I help maintain rollup-plugin-typescript2 which uses this library (and used to solo maintain TSDX, which is downstream of rpt2, so have been in the issues before).

fn.toString() doesn't consider many use-cases

@oscard0m provided a great simple reproduction of this use-case, which reminded me of some of the behavior I've seen from this library's toString(), so I did some investigation.

Indeed, it seems that this line in the code calls fn.toString() to hash against. The problem therein, as @oscard0m points out, is that two equivalent function strings may not actually be equivalent functions, as they can use variables etc that can reference different things. Or, in other words, a function's scope, arguments, closure, etc impact its behavior and therefore should be hashed as well.

So the toString() implementation will consider two functions with equivalent code to be equal even if they have entirely different contexts. A more complicated example would be in the case of polymorphism, where two equivalently written functions could have entirely different behaviors.

previous issue history

I dug into the issues a bit and it seems like the topic of function equivalence has actually been brought up a few times in different ways: #98 about higher-order functions, #46 about equivalent anonymous functions, and perhaps the core issue, #21 about closures.

next steps and possible workarounds

Thinking about how to workaround this though, I came up with a similar blank as #21 (comment) did some years ago; I'm actually not sure if this is possible to properly detect from the perspective of the single argument given to the library, an object. One might not have access to the whole closure, variable stack, etc from that perspective.
A debugger could make that accessible, but indeed I'm not sure that the JS runtime itself makes that available. Unless there are some newer developments now? But that would also likely be runtime-dependent as well 😕

This does remind me of a similar problem with React hooks, where useCallback must be passed all of the variables that the closure "depends" on.
The exhaustive-deps ESLint rule created for hooks (as mentioned in the link) could theoretically be adapted for this use-case, but it's a lot more complicated in this scenario as a whole object is passed, not just a function. And, in the case of rollup-plugin-typescript2, we're pretty deep into the call stack and don't necessarily even have a way of accessing that either 😕

There is an alternative hooks solution, hooks.macro, which seems potentially feasible to modify here; as it's a Babel macro, one actually could have access to the scope, closure, etc at compile-time. So, a similar macro could potentially be created as an add-on solution for users with this kind of use-case.


Understandably though, this is a pretty high complexity problem, and as this library is more in maintenance mode with only one maintainer, I wouldn't expect something like this to be taken on anytime soon (if ever).

In light of these challenges, from the perspective of rollup-plugin-typescript2 (and perhaps other folks who stumble upon this issue), our best option might just be to add an additionalCacheDependencies option (or something better named -- cache invalidation and naming are the two hardest problems, right?) that would work similar to useCallback's dependencies array and just be added to the hash as a workaround for this problem.

@agilgur5
Copy link

agilgur5 commented Jun 26, 2022

Did some more thinking about this problem and ended up making a trip down memory lane to the pre-ES6 days and remembering how prototypal inheritance works in JS 😅
These two StackOverflow questions and all their answers (and duplicates) were really a throwback. I'll try to summarize them below.

yea this isn't possible...

I'm actually not sure if this is possible to properly detect from the perspective of the single argument given to the library, an object. One might not have access to the whole closure, variable stack, etc from that perspective.

So this is, practically speaking, impossible. Closures are literally designed so that they're not introspectable. The ES spec does not expose this to the end-user. And back in the ES5 days, closures are how we made "classes" with "private" properties (and ES private fields are a pretty recent feature that came after ES6 too). So this isn't meant to be possible.

(... aside from a fun answer about brute forcing)

there are some debugging tools and newer features, but not really a solution

A debugger could make that accessible, but indeed I'm not sure that the JS runtime itself makes that available. Unless there are some newer developments now? But that would also likely be runtime-dependent as well 😕

Indeed, some of the answers point to using console.dir on a function to get [[Scopes]]... but that is literally a debugging feature of the runtime. While object-hash does have some "runtime detection", e.g. detecting native functions, console.dir doesn't return anything per the spec, so we can't even use that output on a compatible runtime (I also confirmed this in Node too). It's a debugging feature through and through.

There's one answer about using ES6 Proxies which would qualify as a "newer development", but that would require the user knowingly wrapping things with Proxies, which well, is an unrealistic expectation.
MobX (a fantastic library), which uses Proxies in ES6+ compatible environments for observables, is also known for having its fair share of gotchas, as, basically, a user sometimes needs to understand how observables work in some use-cases for proper tracking.

compile-time knowledge is more or less required

at compile-time

This is a pretty important caveat I made in my previous comment, compile-time vs. runtime. Some of the answers get complicated enough to mention parsing the JS code returned from toString, doing reimplementation, and that JS is Turing complete, but even those come with the massive caveats of having access to all source code etc, which obviously object-hash does not have access to from its perspective.

While a Babel macro or ESLint rule is technically possible, that's still a workaround, and not applicable in all situations.
For instance, in rpt2's use-case, that macro would have to be run on your rollup.config.js, which is a config file for your bundler that runs things like rpt2 on your TypeScript source code... so, while possible, that adds another transpilation step of having to transpile your own Rollup config (this is slightly more simplified with configPlugins and I do use rpt2 on my own TS Rollup config), so not particularly viable.

This does remind me of a similar problem with React hooks, where useCallback must be passed all of the variables that the closure "depends" on.

Of course, I think it's a pretty big statement in and of itself that the React team, with Facebook/Meta's backing and some compiler/runtime subject matter experts working on advanced features like hooks, scheduling, etc, couldn't solve this problem and had to settle for a dependencies array plus an ESLint rule.

I also remembered another similar scenario, that of babel.config.js's api.cache functionality which is literally necessary for proper cache invalidation of dynamic, function-based configs, instead of static, JSON files. And Babel literally is a compiler, but it doesn't compile its own config (well, not yet at least) in order to detect things like scope changes (and there's a chicken-and-egg problem with that too, which TS has run into in the past too).

next steps

So I think if neither React nor Babel has solved this, it might be a bit presumptuous to think of solving this.

Indeed, as I mentioned previously, I think a Babel macro or ESLint rule might be the only real "solution" to this, which are not particularly practical for this use-case.

That being said, given that this is like at least the 4th time a similar issue has cropped up in this library, a warning similar to Babel's might be good to add to the docs with regard to any objects with "dynamic" properties like functions. Or perhaps object-hash could (should?) warn when it detects something dynamic like a function, stating the impossibility of properly hashing dynamic properties and possible brittleness therein.
Or could straight up error upon detecting a function, but that gets into a bit of a rabbit hole of what this library is supposed to to hash and what it attempts to.

But yea, I think that's about all that can be done about this, and there isn't really a "bugfix" or "feature" that can really be created for this problem.

@addaleax
Copy link
Collaborator

addaleax commented Jul 4, 2022

There’s a lot of text here, but yeah, ultimately it boils down to:

  • This is a hashing library
  • Hashing algorithms require serialization the of its input into byte sequences
  • There is no cross-platform, standard way of consistently serializing functions in a way that distinguish different instances
  • Consequently, either the hashes for similarly defined functions are the same, or it is not possible to hash functions at all

Given the popularity of this project, and the lack of maintenance resources, my primary concern for this library is keeping compatibility with existing versions, and so I don’t see a reasonable way to take action here. (The real problem is that a lot of people don’t think about what this library inherently cannot do before starting to use it – but that’s not an easy problem to solve.)

@addaleax addaleax closed this as completed Jul 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants