Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Share the strings representing scopes (alternative approach) #9896

Merged
merged 2 commits into from
Dec 5, 2020

Conversation

alainfrisch
Copy link
Contributor

An alternative to #9748 (see that PR for the discussion).

@gasche
Copy link
Member

gasche commented Sep 10, 2020

If I understand correctly, this implementation has the downside that we recompute the whole output each time we grow the scope, which gets us in the base quadratic-complexity case of string concatenation: we share the outputs, but we compute them less efficiently than with the previous String.concat approach. I suppose that this is less of a problem than the previous approach (the number of time each scope is reprinted would be much larger than the typical depth of scopes), but this is still not the aha-fully-satisfying PR.

@alainfrisch
Copy link
Contributor Author

Again, I know little about those scope strings, but it seems that all intermediate strings will actually be needed at some point (i.e. if we build "X.Y.x", we will also need to build "X" and "X.Y"). At least this is what I observe empirically on examples I tried.

@gasche
Copy link
Member

gasche commented Sep 10, 2020

Again, I know little about those scope strings, but it seems that all intermediate strings will actually be needed at some point (i.e. if we build "X.Y.x", we will also need to build "X" and "X.Y"). At least this is what I observe empirically on examples I tried.

Ah! That would be nice.

@stedolan could you maybe confirm that this sounds likely?

@alainfrisch
Copy link
Contributor Author

Some results, compiling typing/typecore.ml on trunk and the two PRs (allocated words returned by OCAMLRUNPARAM=v=1024):

trunk 9896 9748
size of cmo 1_127_159 997_176 993_979
allocated words 45_941_649 46_112_767 46_142_730

So this actually allocates a bit less than #9748 (on this example), but it does not fully remove the duplication of strings (to be investigated).

@alainfrisch
Copy link
Contributor Author

The lack of sharing in this PR is likely caused by the anonymous function case.

@alainfrisch
Copy link
Contributor Author

alainfrisch commented Sep 10, 2020

The last commit shares the computation of the xxx.(fun) strings:

trunk 9896 9748
size of cmo 1_127_159 994_174 993_979
allocated words 45_941_649 45_941_602 46_142_730

@alainfrisch
Copy link
Contributor Author

Remaining lack of sharing is related to multiple values with the same name. Keeping the sharing there would require memoization/hash-consing, but this might not be worth the trouble.

@alainfrisch
Copy link
Contributor Author

Adding hash-consing (through a weak table):

trunk 9896 9748
size of cmo 1_127_159 993_973 993_979
allocated words 45_941_649 45_960_516 46_142_730

@xavierleroy
Copy link
Contributor

Mr. Caution speaking here: ephemerons and weak tables are not very stable yet, with known issues (#9424) and alternate APIs being considered, so I'd rather not use them in the compiler, at least in the bootstrap cycle. I know there is already one use in typing/btypes.ml, but I'm hoping it's just for error recovery. Besides, the gain from using a weak table appears tiny.

@alainfrisch
Copy link
Contributor Author

Thanks @xavierleroy . Do you have a preference between this PR (minus the commit that introduces hash-consing) and #9748 (+ something to free the memoization table between compilation units?)?

@gasche
Copy link
Member

gasche commented Sep 14, 2020

I have a preference for the overall approach here, but it contains larger API changes as well that I have not reviewed yet.

@xavierleroy xavierleroy added this to the 4.12 milestone Oct 7, 2020
Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the API changes, they make the type of scopes opaque which is fine.

I think that the implementation is too complex and slightly ugly at places, Alain got a bit enthusiastic about beating a benchmark target and we don't need all the complexity. I would be in favor of simplifying a bit (accepting to lose small amounts of sharing in the process). I think the general approach is good and I would be in favor of merging eventually.


module WeakString = Weak.Make(struct include String let hash = Hashtbl.hash end)

let memo_str = WeakString.merge (WeakString.create 128)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be in favor of avoiding weak pointers to simplify the runtime surface fo the code, especially given that the sharing gains reported are rather small.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed during today's meeting, I've reverted this hash-consing.

Note that if the concern is just the use of Weak tables, one could just use a regular table, and clear it after each compilation unit / toplevel phrase. But it was agreed that the tiny lack of sharing is not worth the trouble.

lambda/debuginfo.ml Outdated Show resolved Hide resolved
@alainfrisch
Copy link
Contributor Author

I think all comments have been addressed.

@gasche
Copy link
Member

gasche commented Dec 1, 2020

The history is kind of a mess right now. We could squash in a single commit, but I think it would be nicer if you could keep the " Make Scoped_location.scopes abstract" commit, and then aggregate all other changes in a single commit.

@alainfrisch
Copy link
Contributor Author

Done. I think parts of the second commit should be part of the first one, but honestly, I think it's a bit pointless to spent time on that. (I'd be in favor of squash-merging anyway.)

@gasche
Copy link
Member

gasche commented Dec 1, 2020

Ah, indeed, the separation does not really work. Let's squash-merge then.

This is good to merge when the CI is green, and we want to include it in 4.12 as it fixes a cmo-size regression that only appears there.

@gasche gasche added the merge-me label Dec 1, 2020
@xavierleroy
Copy link
Contributor

CI is positive: the AppVeyor failure is unrelated (the wretched "weaklifetime" test). Cannot resist the "merge-me" label...

@xavierleroy xavierleroy merged commit 764e2da into ocaml:trunk Dec 5, 2020
xavierleroy added a commit that referenced this pull request Dec 5, 2020
Share the strings representing scopes.

Closes: #9748
(cherry picked from commit 764e2da)
@xavierleroy
Copy link
Contributor

Cherry-picked to 4.12 (8bc0461).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants