Add value kinds on blocks #10716

Open
wants to merge 16 commits into trunk from more_value_kind_upstream_rebased
Conversation

chambart
Contributor

This PR adds one more case to the Lambda.value_kind type, Blocks, and populates it when the type of a variable is obviously a block.

This is not used in the compiler, but is useful for flambda2. In particular it helps in some complex unboxing situations.

Since types can be quite large, and to avoid problems with recursive types that would not be detected by the crude check, a fuel counter prevents producing deep value_kinds. In practice, keeping that limit low loses little information: when accessing a deeply nested structure, there are usually multiple points with type annotations, which makes it possible to recover the full type where it is needed.

I'm not completely sure that the Transient_expr thing is correct or that it is the way it is supposed to be used.
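
To make the mechanism concrete, here is a rough, standalone sketch in the same spirit. It is not the PR's actual code: the toy ty type and the constructor names (Pblockval in particular) are illustrative only, and the real implementation works on the type-checker's type_expr, not a toy type.

```ocaml
(* Toy stand-in for source types. *)
type ty =
  | Int
  | Float
  | Tuple of ty list                 (* a block carrying a list of fields *)

(* Toy stand-in for Lambda.value_kind, with a hypothetical block case. *)
type value_kind =
  | Pgenval                          (* "don't know": the existing catch-all *)
  | Pintval
  | Pfloatval
  | Pblockval of value_kind list     (* the new, block-shaped case *)

(* [fuel] bounds how deep we look inside nested blocks; when it runs out we
   fall back to the imprecise Pgenval, exactly as if no information were
   available. *)
let rec value_kind ~fuel (t : ty) : value_kind =
  if fuel <= 0 then Pgenval
  else
    match t with
    | Int -> Pintval
    | Float -> Pfloatval
    | Tuple fields ->
      Pblockval (List.map (value_kind ~fuel:(fuel - 1)) fields)

let () =
  (* With fuel 2, the inner tuple is still recognized as a block, but its
     own fields already degrade to Pgenval. *)
  match value_kind ~fuel:2 (Tuple [ Int; Tuple [ Float; Int ] ]) with
  | Pblockval [ Pintval; Pblockval _ ] -> print_endline "kept two levels"
  | _ -> print_endline "approximated away"
```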

@chambart force-pushed the more_value_kind_upstream_rebased branch from 23af545 to 9e66b7b on November 3, 2021, 10:57
@chambart force-pushed the more_value_kind_upstream_rebased branch from 7d3bf5d to c713b20 on November 4, 2021, 13:19
No branches are cut by it that wouldn't be cut by the fuel limit
@chambart force-pushed the more_value_kind_upstream_rebased branch from dfb181d to 50979e2 on November 4, 2021, 15:46
@lthls
Contributor

lthls commented Nov 5, 2021

Note: the scary Transient_expr stuff has been removed (it wasn't doing what we wanted anyway).
We would still like someone with knowledge of the type-checker to review the code in typeopt that computes value kinds.

@garrigue
Contributor

garrigue left a comment

The code in Typeopt looks fine to me.
It could of course do more, but at least what it infers seems correct.

On a side note, I am a bit surprised that scrape_ty calls correct_levels for every expansion. This should not be strictly needed (type definitions in the environment are supposed to only contain types at generic level), but maybe it's better to stay on the safe side.

@gasche
Member

gasche commented Nov 6, 2021

Note that the call to correct_levels has been linked to performance issues with the generalized usage of scrape_ty. It was changed recently to be used only on Tconstr nodes rather than all types, for performance reasons: see #8776 and #10624.

@gasche
Member

gasche commented Nov 6, 2021

I'm disheartened by the addition of fuel in yet another place. Fuel in general marks that we gave up on understanding the termination properties of our algorithms, and I suspect that having fuel in more places is a form of technical debt that we are building up. Some remarks on this.

Doing without fuel?

In #10479 we propose to rewrite get_unboxed_type_representation to avoid fuel. It may be that the same technique lets us do without fuel here; I'm not sure in which situations your fuel is needed to avoid non-termination. The consensus on #10479 is that we want to merge it eventually, but it still requires more work, in particular to measure the performance impact, and I don't have time to push on this right now. So I'm not suggesting that the present PR should wait on #10479, but I would prefer to understand whether we are talking about the same sort of non-termination or something different.

Factoring the code?

There are many analyses in the compiler that work on type expressions; they all require termination control due to cyclic and/or recursive types, and that logic is subtle. We could think of factoring out the termination control as a reusable service, but could we also factor the analysis code? This would also be useful for our work in progress on constructor unboxing, where we compute the "shape" of datatypes in a way that seems related to the present analysis (we approximate blocks by their tags, with no information on the leaves except possibly the arity, but we have a finer-grained approximation of unions, keeping a set of tags). Is there a mother-analysis that could be implemented once, and then partially erased to derive simpler approximations? A purely hypothetical sketch of what that could look like follows; none of the names below exist in the compiler.
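
```ocaml
(* Hypothetical sketch of the "mother-analysis" idea: compute one rich
   shape once, then erase it to the coarser approximations each client
   needs. All type and function names here are invented for illustration. *)

type rich_shape =
  | Any
  | Immediate
  | Tagged_block of int * rich_shape list   (* tag and field shapes *)

(* Erasure 1: a value_kind-style view, keeping field information. *)
type value_kind = Pgenval | Pintval | Pblockval of value_kind list

let rec to_value_kind = function
  | Any -> Pgenval
  | Immediate -> Pintval
  | Tagged_block (_, fields) -> Pblockval (List.map to_value_kind fields)

(* Erasure 2: a constructor-unboxing-style view, keeping only the head tag
   and the arity. *)
type head = Imm | Tag of int * int          (* tag, arity *)

let to_head = function
  | Any -> None
  | Immediate -> Some Imm
  | Tagged_block (tag, fields) -> Some (Tag (tag, List.length fields))
```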

Explaining the code

What's above are what-ifs for the medium term. In the short term, if we are to have another kind of fuel in the compiler codebase, could you please explain in comments:

  • what the fuel is for (what is the recursive structure of the function, and what are the known sources of non-termination?)
  • how the fuel limit was chosen?

Different kinds of fuel

There seem to be two different usage modes of things called "fuel" in the compiler:

  • One notion of fuel is only present to avoid non-termination, and avoids infinite traversals on circular objects. We typically choose a somewhat-high fuel limit (1000?) to accept all human-written programs, even large ones; we never expect the fuel limit to be reached on natural code. This is the sort of fuel used in get_unboxed_type_representation.
  • One notion of fuel is used to avoid combinatorial explosions in traversal time or space, in situations where we have decided to approximate the final result. We typically choose a low fuel limit (10? less?), and we expect the limit to be reached in typical usage of the compiler. This is the sort of fuel used for wildcard explosion during pattern-matching typing in typecore.

(In both cases, tuning of the fuel limit is important for performance, even though the limits are very different.)

I think that we should be careful to distinguish those, and keep track of which is which. Ideally we would even use different names. For example we could use fuel only for the first situation (avoid non-termination), and cutoff or depth in the second situation (avoid combinatorial explosion).

I'm not sure that I understand, from quickly looking at the code, which of these two sorts of fuel is being added here. It may be that the fuel is doing both things at once. (The combinatorial-explosion fuel can always be used as termination-control fuel due to its low limit, but the converse is not true.) It might be enlightening to actually distinguish two kinds of fuel in the same process, to clarify the situation.
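
As a purely illustrative aside (not compiler code), the naming split could look like the sketch below: a high fuel budget whose only job is to guarantee termination on cyclic values, next to a low depth cutoff that deliberately truncates an approximation. The types and functions here are invented.

```ocaml
(* First kind: [fuel] exists only to guarantee termination on cyclic
   values; on ordinary input it is never reached. *)
type node = { mutable next : node option }

let rec length ~fuel n =
  if fuel <= 0 then None                    (* cycle suspected: give up *)
  else
    match n.next with
    | None -> Some 1
    | Some n' -> Option.map (( + ) 1) (length ~fuel:(fuel - 1) n')

(* Second kind: [depth] is a low cutoff chosen because the result is an
   approximation by design; reaching it is the normal case. *)
type tree = Leaf of int | Node of tree list
type shape = Unknown | Block of shape list

let rec shape ~depth = function
  | Leaf _ -> Unknown
  | Node children ->
    if depth <= 0 then Unknown
    else Block (List.map (shape ~depth:(depth - 1)) children)

let () =
  let a = { next = None } in
  a.next <- Some a;                         (* a cyclic list *)
  assert (length ~fuel:1000 a = None);
  assert (shape ~depth:2 (Node [ Leaf 1; Node [ Leaf 2 ] ])
          = Block [ Unknown; Block [ Unknown ] ])
```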

@gasche
Member

gasche commented Nov 6, 2021

(cc @stedolan and @lpw25, who looked at #10479 and might by now have developed similar deviant interests in fuelology.)

@lthls
Contributor

lthls commented Nov 6, 2021

The fuel variable might have been better named depth here (it fits your second case better). We know that without a limiting factor we could loop forever (e.g. on type 'a t = T of 'a * 'a t), but more importantly, most of the time we are only interested in the "head" shape of the block, so it is better to stop quickly (in addition, no sharing is done with these block shapes, so we could end up allocating a lot if we went deeper).
The current code uses an initial value of 2, which means we go one step further than strictly needed, but this can still be useful in cases like x.foo.bar, where no variable would hold the shape of x.foo.
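
For concreteness, here is a hypothetical source-level example of the two points above (not taken from the PR or its tests):

```ocaml
(* Without a bound, unfolding the shape of this type never terminates:
   each unfolding of ['a t] exposes another ['a t] field. *)
type 'a t = T of 'a * 'a t

(* With a depth of 2, the kind recorded for [x] still describes the block
   stored in [x.foo], so the access [x.foo.bar] can benefit even though no
   intermediate variable carries the type of [x.foo]. *)
type inner = { bar : int; baz : float }
type outer = { foo : inner }

let get (x : outer) = x.foo.bar
```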

@gasche
Member

gasche commented Nov 6, 2021

Out of curiosity: why support x.foo.bar but not x.foo.bar.baz?

If we were serious about the property that we only care about "head" shapes, this could be made clear in the types (type value_kind = Flat of flat_kind | Block of .. * flat_kind list; or possibly polymorphic variants to avoid the Flat constructor), and then the code would be written without fuel/depth, using an auxiliary function that only computes flat shapes.
But if I understand what you are saying, having depth = 2 is just a natural choice for the approximation, not a strong structural property of the system, so maybe the current, more flexible presentation is better.
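
For reference, the head-shape-only alternative sketched above could be spelled out along these lines (hypothetical; the exact constructors of flat_kind and the first component of Block are guesses):

```ocaml
type flat_kind =
  | Genval                      (* unknown / anything *)
  | Intval
  | Floatval

type value_kind =
  | Flat of flat_kind
  | Block of int option * flat_kind list
      (* tag (when known) and the flat kind of each field; nesting is ruled
         out by construction, so no fuel or depth parameter is needed *)
```

An auxiliary function would then compute flat_kinds only, and the main function would handle a single head level.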

@chambart
Contributor Author

chambart commented Nov 9, 2021

Is there a mother-analysis that could be implemented once, and then partially-erased to derive simpler approximations?

If it is possible for other analyses, I'm all for it. We don't need something much more complex than this for a first version, but if there is something that exposes the shape of all kinds of values, we can use that.

As to which fuel/depth value to use, as @lthls said, it is a trade-off between cost and benefit. We don't expect many situations where a depth greater than 2 would be useful, so it might not be worth doubling the cost for those rare cases, though I did notice some ordinary examples where 2 was required. So this is very much a heuristic. If the representation were shared in memory between multiple annotations, it would be fine to build something much deeper. Otherwise, for most situations a depth of 1 would be sufficient, together with an annotation on the getfield, but that would not always be enough.
We might later want to increase that limit in some contexts, in particular to generate optimized versions of polymorphic comparisons.

A general type traversal is not going to be doable soon, so it is not an option right now. What would you suggest changing in that code for it to be acceptable?

@gasche
Member

gasche commented Nov 9, 2021

I suggested doing the "Explaining the code" part for now. (Note: I won't be able to do an actual review of the patchset for a few weeks, so please don't count on me for the approval.)
