-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for hashing symbols #1753
Conversation
This is actually a bug that is introduced by minification. The minified version changes if (typeof o.toString === 'function') {
return hashString(o.toString());
} into if ("function" == typeof t.toString)
return B("" + t); This expression |
Hi @leidegre. Those of us who want to see immutable maintained are currently working in a fork while we wait for any further news from Lee. I'd like to incorporate a fix for your issue, but this code seems like a bad idea since it it can easily cause hash collisions. I think it will be possible to leverage the existing mechanisms to hash Symbol objects, particularly since there's no need to worry about the method of extracting hashes that is in place specifically for Internet Explorer, which does not have a native Symbol implementation. |
@conartist6 hash collision? Only private symbols without a description cause hash collisions! I think that's acceptable. This is a non-issue for me atm. I needed this at one point but I had to design around it but it would be great if symbols were supported in the future since they are part of the language now. |
OK, but that's not true. With your code Really we want symbols to have object semantics so that two symbols have the same hash code only if they are the same symbol. We have to use the symbol primitive, because The only downside is that Symbols do leak this way. Once created and inserted into an immutable structure they'll never be GC'd. But I think for 99% of code this is fine. Private symbols are the area of concern, and my impression is that private symbols are generally global or module-level references, held for the life of a program. That said, it would certainly be possible to construct a leak: let m = new Map();
while(true) {
const s = Symbol();
m = m.set(s, null).delete(s);
// s is now gone from m, but the hash cash prevents the GC collecting s
} As of yet I can't imagine any legitimate usage for that pattern. |
Right. It's difficult to come up with a general solution that works all the time. Think of the cost of what you are suggesting. Immutable.js is slower than it needs to be because it tries to edge as close as possible to whimsical JavaScript semantics when all you really care about are strict value semantics. It's okay that a couple of strings hash to the same hash code. It is not okay that most things hash to the same hash code. If you care about hash collisions you should review the hash function used by Immutable.js because it's a really bad hash function. It fails every conceivable statistical test there is for hash functions. I would fix that before worrying about strings and symbols with the same description having the same hash code. |
I take your point that there's lots that could be done to improve here. I'm interested in what it is, but for the moment I'm accepting the current implementation as a given. Given the current implementation, I don't see that what I'm proposing incurs any significant perf penalty so long as there are not an indefinite number of symbols in use as keys. |
Also even that repo says, "See e.g. A Seven-Dimensional Analysis of Hashing Methods and its Implications on Query Processing for a concise overview of the best hash table strategies, confirming that the simplest Mult hashing (bernstein, FNV*, x17, sdbm) always beat "better" hash functions (Tabulation, Murmur, Farm, ...) when used in a hash table." |
@conartist6 I thought you'd point that out. Did you actually read the paper tough? Have you run the benchmarks? The thing about the HAMT is that it benefits from a good hash function. If you benchmark the results you'll note that the hash quality does make a difference and it's proportional to the size of the hash table. I've done extensive testing on this and I'm just pointing out some stuff that I have learned implementing persistent data structures. I still use Immutable.js but I have written both a persistent vector and persistent (hash) map implementation in JavaScript that out performs Immutable.js in almost all situations with gains between 2x to 6x and as much as up to 50x in the bizarre case. |
I did not read the paper. I'm really trying not to go down a rabbit hole here. I'm have more than enough pet projects to keep me busy. So I'm sure you're right that there's a better way to hash strings, and it's great to know that you have expertise in that area. Can we circle back around to how we got into this with Symbols though? Here's how I see it. Hashing
Caching an autoincrementing ID:
Do I have the considerations right? |
My experience with symbols have been that they all have descriptions. I think it's a reasonable constraint.
I just dislike this approach altogether. I don't think objects as keys makes any sense but that's not how Immutable.js works. And these opinions that I have about how I think it should work is also why I wrote my own implementation which puts greater emphasis on value semantics. I just limit the set of allowed keys to JavaScript primitive types (including Date). And I treat arrays as tuples, so you can do But again, It's just my opinion. I understand perfectly well that you're probably best served by keeping most of Immutable.js as-is. I just don't agree with some of the design decision Lee took with Immutable.js. |
Yeah immutable decided to keep faith with the decisions made by TC-39 here and I don't blame Lee for that. What you're building sounds interesting too though. Is it published anywhere? The composite keys would be very neat. You're not wrong that objects as keys are generally a bad sign, and that a symbol with a description would almost always make more sense. My goal (with iter-tools) is to abstract operations over data structures so that a library like Immutable can be smaller and more focused, and hopefully permitting more data structures libraries with different opinions to flourish and interoperate. What I really want to see happen is actually something Lee took up with TC-39 -- reverse iterators. Otherwise there can never be an efficient implementation-independent way to write a |
To be clear, I don't blame Lee. He's done a great job with Immutable.js, I simply make different tradeoffs for performance reasons. The stuff I've written is part of a system that I cannot share atm but I can share some of the relevant parts in a gist. I use xxhash32 which I've found to be about 30% slower than the djb2/Bernstein hash function used by Immutable.js but it's not a bottleneck and has proper hash function properties which passes the smasher test suite. I use this as a basis of all my equivalence testing (again, for good HAMT hash map performance you need a uniform distributions of bits in all the bits). |
While interesting to some. This is the kind of stuff I don't really care about. I have very stringent performance requirements and all these protocols tend to be just too slow. Iterators are orders of magnitude slower than plain old for loops in JavaScript which is really a shame because it is a nice abstraction but it's a no go for me. Also stuff like async iterators and overhead from promise creation. All the garbage generated just grinds stuff to a halt. I would be much happier if I was able to exert more control over my JavaScript execution. I'm really hoping that it will be possible to use WebAssembly and JavaScript seamlessly with very little overhead to be able to get more performance out. The performance characteristics of things matter a lot to me. |
I ran into this error trying to insert a symbol into a Map.
Alternatively, it is possible to use the
String
constructor to create a string from a symbol, assuming it has a description, otherwise it will behave as an empty string (hash collision wise).Can we merge this or fix this some other way?