Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature request] Expose tokensMap as public #1975

Open
matthew-dean opened this issue Aug 15, 2023 · 6 comments
Open

[Feature request] Expose tokensMap as public #1975

matthew-dean opened this issue Aug 15, 2023 · 6 comments

Comments

@matthew-dean
Copy link
Contributor

There are a number of useful class properties / methods on the BaseParser / CstParser that aren't exposed via TypeScript, so it's hard to know if they're stable / safe to use. One of those is tokensMap. I'd been creating this exact type of map for use when parsing, so I was surprised to find the same structure on the instance itself when debugging. I'd like to eliminate creating this map, but I'm not sure if tokensMap is safe to use.

@bd82
Copy link
Member

bd82 commented Aug 24, 2023

In theory it could be changed to a Map instead of a dictionary at some point.
I am not sure its that important to expose it, particularly when it is trivial to create it...

e.g. something like:

import { createToken as orgCreateToken } from "chevrotain";

const tokensMap = {}

function createToken(opts) {
  const newToken = orgCreateToken.call(null, opts);
  tokensMap[opts.name] = newToken
  return newToken;
}

@matthew-dean
Copy link
Contributor Author

@bd82 Your code is precisely what I am doing. I just would prefer to use internal structures if they are stable. I think you're probably right though that Maps can be slightly faster for lookups than object keys (although V8 and other engine internals are constantly changing).

@msujew
Copy link
Collaborator

msujew commented Aug 24, 2023

FYI, object key lookups in v8 (and most other engines as well) are multiple times faster than map lookups. Noticed that while experimenting with #1793.

@matthew-dean
Copy link
Contributor Author

matthew-dean commented Sep 3, 2023

@msujew

FYI, object key lookups in v8 (and most other engines as well) are multiple times faster than map lookups.

Do you have data / benchmarks to back that up? Why would a structure that is specifically designed for fast mapping (Map) be much slower than an object used as a map?

An initial Google search returns a lot of evidence of maps not being faster in almost every regard, but also much more memory efficient.

However, you have to be careful with testing performance (I guess?) because thousands of repeated object writes / lookups may be optimized with inline caching to profile and return the same internal shape for an object, which would not happen as frequently with a real-world case with limited writes / reads. So a test of "object vs. map" can show results that are actually in-applicable (unless, I think, you use a fresh set of keys or an isolated / fresh v8 process per single run of a single test case).

But I'm happy to see more data, if you have it, because I think this is something that is not 100% clear to most JS developers. It seems like a Map would be theoretically faster in every way, because it's an optimized structure for, well, mapping. So it would be counter-intuitive if it was otherwise.

@matthew-dean
Copy link
Contributor Author

matthew-dean commented Sep 3, 2023

@msujew Here is what a V8 developer said in response to some on this issue who said:

I'll have to revert back to the old-school "object as map" coding style.

The V8 developer said:

If you do that, you will have fallen victim to a misleading microbenchmark.

In the very special case of using consecutive integers as keys, a plain Object will be faster, yes. Nothing beats a contiguous array in that scenario. So if the "indexed accesses everywhere in your codebase" that you mentioned are indeed using index sets like the integers from 0 to 1M, then using an Object or Array is a good idea. But that's a special case. If the index space is sparse, things will already look different.

In the general case of using arbitrary strings in random order, a Map will perform significantly better than an Object. Even more importantly, the way such object property accesses are handled (in V8, and quite possibly in other engines too) has non-local effects: if one function puts excessive stress on the slow path of the object property lookup handling system, then that will likely slow down some other functions relying on that same slow path for their property accesses.

The fundamental reason is that engines optimize different things for different usage patterns. An engine could implement Objects and Maps pretty much the same under the hood; but that wouldn't be the ideal behavior, because different usage patterns benefit from different internal representations and implementation choices. So engines allow you to provide them with a hint: if you use a Map, the engine will know that you're planning to use the thing as a map (duh!), where random keys will come and go. If you use an Object, then the engine will (at least at first) assume that you want the set of optimizations that work best for your average object, where the set of properties is fairly small and static. If you use an Array (or Object with only integer properties, which is nearly the same thing in JS), then you're making it easy for the engine to give you fast integer-indexed accesses.

Using "x" + i as key is a good suggestion to demonstrate how quickly a microbenchmark can be changed so it appears to produce opposite results. But here's a spoiler: if you do (only) this modification, then a large part of what you're measuring will be number-to-string conversion and string internalization, not Map/Object access performance itself.

So that's from two sources: one is a v8 developer, and another is someone profiling with some awareness of v8 internals. Both with the same takeaways:

  • Objects as maps are faster only in very niche cases, specifically with linear object numerical keys (0, 1, 2)
  • In almost all other cases, such as in the case with something like a CSTChildren dictionary, or a tokensMap in this case, where the key names are arbitrary and non-linear, the evidence suggests a Map will outperform an object-as-map and use less memory

But, again, if you have evidence that contradicts these results, it would probably be worth sharing!

@matthew-dean
Copy link
Contributor Author

One more point: something like an IToken object in Chevrotain is probably best to stay as an object map (or, alternatively, class instances) because it is a fixed shape and that shape is re-used a bunch of times. So that could benefit from inline caching where individual maps could not. So I think you have to think about what is the "map" you're trying to store and how many "copies" of that map shape you might consume. But this gets into nuance where a JS engine dev would know better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants