Speed up GC by prefetching during marking #10195

Merged
merged 11 commits into from Aug 10, 2021

Conversation

stedolan
Contributor

@stedolan stedolan commented Feb 3, 2021

This PR rewrites the core marking loop of the major GC, using prefetching to make better use of the processor's memory parallelism. This removes essentially all of the cache misses that occur during marking, speeding up GC.

On a microbenchmark with about 800MB of heap, the time for a Gc.full_major improves from about 1.8 seconds to 0.5 seconds.

Of course, most programs don't spend 100% of their time in GC, so improvements are not generally this dramatic. On the few programs it's been tested on, marking time is reduced by 1/3 - 2/3, leading to overall performance improvements of anywhere around 5-20%, depending on how much GC the program does. (More benchmarking is needed!)

I only expect to see any improvements on programs where the heap is much bigger than the cache. Programs that use less than a few tens of megabytes are unlikely to be improved much by this patch, although they shouldn't get slower.

Algorithm

The current marking algorithm is a pure depth-first-search: marking always proceeds with the object on top of the mark stack, and new blocks are pushed to the stack as they are found. Since the objects are unlikely to be in cache, the GC spends much of its time stalling on cache misses as it waits for the header of a new object to be loaded.

The algorithm in this PR uses a small (currently 256-entry) circular buffer as a queue in front of the mark stack, known as the prefetch buffer. During marking, the next object to be scanned is drawn from the prefetch buffer (if it has at least Pb_min elements), or popped from the mark stack if the prefetch buffer is empty or close to empty. When new pointers are discovered during marking, these are prefetched and enqueued in the prefetch buffer rather than being followed immediately. Blocks are only pushed to the mark stack when the prefetch buffer overflows.

This ensures that new objects scanned have generally been prefetched at least Pb_min steps ago, which means they are very likely to already be in cache.
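A minimal sketch of the queueing discipline described above, on a toy object graph rather than the OCaml heap. The `node` type, `mark_from`, `sketch_prefetch` and the value `PB_MIN = 64` are assumptions made for illustration; only the 256-entry buffer size comes from the description. Unlike the real loop, this stores whole nodes (not intervals) on the mark stack and ignores work accounting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Guarded prefetch hint (hypothetical name); a no-op on unknown compilers. */
#if defined(__GNUC__) || defined(__clang__)
#define sketch_prefetch(p) __builtin_prefetch((const void *)(p))
#else
#define sketch_prefetch(p) ((void)(p))
#endif

#define PB_SIZE ((size_t)256)      /* buffer size quoted in the description */
#define PB_MASK (PB_SIZE - 1)
#define PB_MIN  ((size_t)64)       /* assumed threshold, not from the PR */

typedef struct node {
  int marked;
  size_t n_children;
  struct node **children;
} node;

typedef struct { node **items; size_t count, cap; } mark_stack;

static void stack_push(mark_stack *s, node *n) {
  if (s->count == s->cap) {
    s->cap = s->cap ? 2 * s->cap : 64;
    s->items = realloc(s->items, s->cap * sizeof *s->items);
    assert(s->items != NULL);
  }
  s->items[s->count++] = n;
}

/* Marks everything reachable from root; returns how many nodes it marked. */
size_t mark_from(node *root) {
  node *pb[PB_SIZE];               /* circular prefetch buffer */
  size_t enq = 0, deq = 0;         /* buffer holds enq - deq entries */
  mark_stack stk = {0};
  size_t marked = 0;

  if (root == NULL) return 0;
  stack_push(&stk, root);
  for (;;) {
    node *n;
    size_t queued = enq - deq;
    if (queued >= PB_MIN || (stk.count == 0 && queued > 0)) {
      n = pb[deq++ & PB_MASK];     /* oldest entry: prefetched long ago */
    } else if (stk.count > 0) {
      n = stk.items[--stk.count];  /* buffer nearly empty: refill via stack */
    } else {
      break;                       /* both empty: marking is complete */
    }
    if (n->marked) continue;
    n->marked = 1;
    marked++;
    for (size_t i = 0; i < n->n_children; i++) {
      node *c = n->children[i];
      if (c == NULL) continue;
      if (enq - deq == PB_SIZE) {
        stack_push(&stk, c);       /* overflow: spill to the mark stack */
      } else {
        sketch_prefetch(c);        /* start the load now, visit it later */
        pb[enq++ & PB_MASK] = c;
      }
    }
  }
  free(stk.items);
  return marked;
}
```

Calling `mark_from(root)` on a graph whose root has more than 256 children exercises the overflow path; cycles are handled because already-marked nodes are skipped when dequeued.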

Mark stack changes

This PR also contains a small change to the mark stack structure. It can already contain intervals, to indicate that an object is partially scanned, but those intervals are represented by a (value, offset) pair. This patch changes that to a (start, end) pair, where start = value + offset and end = value + Wosize_val(value). This saves a cache miss in determining Wosize_val(value) when traversing the tail end of a long array.
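To make the difference between the two representations concrete, here is a small sketch using a toy block layout (a size word followed by the fields). The type names `entry_offset` and `entry_range` and the helper `toy_wosize` are hypothetical stand-ins, not the runtime's definitions.

```c
#include <assert.h>
#include <stddef.h>

typedef long word;   /* stand-in for an OCaml value */

/* Old style: (value, offset). The end is found by reading the header. */
typedef struct { const word *block; size_t offset; } entry_offset;

/* New style: (start, end). Both bounds are stored in the entry itself. */
typedef struct { const word *start; const word *end; } entry_range;

/* Toy header: the word just before the block holds its size in words. */
static size_t toy_wosize(const word *block) { return (size_t)block[-1]; }

/* start = value + offset, end = value + Wosize_val(value) */
entry_range to_range(entry_offset e) {
  entry_range r = { e.block + e.offset, e.block + toy_wosize(e.block) };
  return r;
}

/* Needs a load from the header, which may be far from the array tail. */
size_t remaining_old(entry_offset e) {
  return toy_wosize(e.block) - e.offset;
}

/* Pure pointer arithmetic on the entry: no heap access at all. */
size_t remaining_new(entry_range e) {
  return (size_t)(e.end - e.start);
}
```

The header is read once, when the entry is created; resuming the scan later only touches the fields that remain.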

Configuration parameters

The testing and benchmarking of this patch has been done on Intel x86_64 processors with a non-default configuration:

./configure --disable-naked-pointers CC='gcc -Wa,-mbranches-within-32B' AS='as -mbranches-within-32B'

The patch should work with naked pointers enabled, although no optimisation work has been done in this configuration and I expect performance to be much worse.

The -mbranches-within-32B option is a workaround for a bug in most recent Intel processors, the Intel JCC Erratum. Intel's microcode workaround for this bug introduces a performance issue in instruction decoding when branch instructions straddle 32-byte boundaries. Most programs are not seriously affected by this as instruction decoding is rarely the bottleneck. However, as the prefetching GC loop generally hits cache, instructions execute quickly (approx 2.9 of them per cycle) and it is possible to become performance-bound by decoding. On the markbench.ml microbenchmark above, removing the -mbranches-within-32B option had a cost of up to 20%. (The numbers for trunk use the same configuration, although it is not as strongly affected).
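For readers unfamiliar with what "straddling a 32-byte boundary" means in practice, the following sketch checks whether an instruction's first and last bytes fall in different 32-byte blocks. The helper name `straddles_32b` is hypothetical, and this only models the straddling condition described above; Intel's erratum document is the authority on the exact set of affected instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if an instruction of `len` bytes starting at `addr` has its first
   and last bytes in different 32-byte aligned blocks. */
bool straddles_32b(uint64_t addr, unsigned len) {
  assert(len > 0);
  return (addr / 32) != ((addr + len - 1) / 32);
}
```

For example, a 4-byte branch at offset 30 straddles a boundary, while the same branch at offset 28 does not. The assembler option pads code so that branches end up in the second situation.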

(We should perhaps consider enabling -mbranches-within-32B by default. It has a 1-2% code size penalty and a performance improvement that's usually irrelevant and occasionally dramatic. It's annoying to ask AMD users to pay the code size penalty for Intel's mistake, though).

Remaining work

  • Code cleanup: There's some commented out test / debug code in this patch, and not much commenting otherwise. It needs some cleanup before merging, but this version should be good for initial discussion / testing.
  • Benchmarking: This has not yet been benchmarked on interestingly-large programs.
  • Forward_tag optimisation: This patch does not currently implement the Forward_tag lazy short-circuiting optimisation, which is somewhat tricky to fit with the prefetching style. Very preliminary testing has suggested that this optimisation might not be very important, as most lazy values seem to be forced early (and so caught by the minor GC's short-circuiting) or not forced at all. More benchmarking is needed, either to show that this optimisation doesn't matter much or to show that it can be implemented efficiently.

[The design and initial implementation of this patch was in collaboration with Will Hasenplaugh, remaining bugs are my own]

@xavierleroy
Contributor

This sounds extremely promising! Thank you!

I'm not the most competent reviewer for this kind of code. Nonetheless I had a look at the diff, and I was put off by lots of commented-out code, and unguarded uses of __builtin_prefetch (we now have caml_prefetch defined in caml/misc.h). Also, all CI checks are failing. So, could you please make this PR ready for review and tell us?
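As an illustration of what a "guarded" prefetch looks like, here is a sketch of a wrapper macro that falls back to a no-op on compilers without `__builtin_prefetch`. The name `sketch_prefetch` is hypothetical, and the actual definition of `caml_prefetch` in caml/misc.h may differ (for example in the locality arguments it passes).

```c
#include <assert.h>
#include <stddef.h>

/* Use the GCC/Clang builtin where available; otherwise just evaluate
   the argument and do nothing. A prefetch is only a hint, so dropping
   it never changes the program's results. */
#if defined(__GNUC__) || defined(__clang__)
#define sketch_prefetch(p) __builtin_prefetch((const void *)(p))
#else
#define sketch_prefetch(p) ((void)(p))
#endif

/* Sums an array while hinting at the element `distance` slots ahead.
   (A sequential scan like this is already handled well by hardware
   prefetchers; it is used here only to show the call pattern.) */
long sum_with_prefetch(const long *a, size_t n, size_t distance) {
  long total = 0;
  for (size_t i = 0; i < n; i++) {
    if (i + distance < n) sketch_prefetch(&a[i + distance]);
    total += a[i];
  }
  return total;
}
```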

@stedolan
Contributor Author

stedolan commented Feb 8, 2021

I'm not the most competent reviewer for this kind of code. Nonetheless I had a look at the diff, and I was put off by lots of commented-out code, and unguarded uses of __builtin_prefetch (we now have caml_prefetch defined in caml/misc.h). Also, all CI checks are failing. So, could you please make this PR ready for review and tell us?

Hmm, there was more of that than I remembered! I've deleted the commented-out code and removed the cause of the CI failures (*), so this should be ready to review now. I ran into #10203 when trying to benchmark, though, so you'll have to wait a bit longer for sandmark numbers.

(*) the issue was an assertion that all zero-length blocks are atoms, which isn't actually true because of how ocamlopt statically allocates empty arrays.

Contributor

@gadmm gadmm left a comment

When you say that with naked pointers you expect performance to be much worse, I understand that you might mean that you just have not spent time on optimising it yet, but for what it is worth here is a comment about that case.

}
caml_prefetch(Hp_val(v));
caml_prefetch(&Field(v, Queue_prefetch_distance - 1));
pb[(pb_enqueued++) & Pb_mask] = v;
Contributor

@gadmm gadmm Feb 13, 2021

For naked pointers, what about:

  • Prefetching and queuing the naked pointer here (replacing Is_major_block with Is_block_and_not_young (which is cool bit hacking by the way))
  • Additionally prefetching the page table entry
  • Checking Is_in_heap when popping the block off the queue at the start of the loop

Member

The page table will be removed soon. Please don't make this code more complex because of it.

uintnat min_pb = Pb_min;
struct mark_stack stk = *Caml_state->mark_stack;

uintnat young_start = (uintnat)Caml_state->young_start;
Contributor

@gadmm gadmm Feb 13, 2021

Should this be + 1 ? The definition of Is_young gives Is_young(caml_young_start) == 0, and although unlikely, caml_young_start - 1 is allowed to contain the header of a 0-sized block with a black gc tag.

Contributor Author

I think you're right! Nice catch.

Member

Should be + sizeof(header_t) or if you want + Whsize_wosize (0).

@stedolan
Contributor Author

When you say that with naked pointers you expect performance to be much worse, I understand that you might mean that you just have not spent time on optimising it yet, but for what it is worth here is a comment about that case.

I mean both that I have not spent time on optimising it, and that I think performance won't be great even with time spent optimising it. The code here often exhausts the available memory parallelism (on the Skylake core I was testing on), so doing more memory accesses is going to take more time, regardless of how well they're prefetched.

Having said that, I think the prefetching strategy you describe is the right one (at least for 64-bit targets, 32-bit ones have a different pagetable).

@gadmm
Contributor

gadmm commented Feb 18, 2021

I meant for both page tables: for the 32-bit page table, let us hope that the first level is in cache already (the size of the first level is 512 words and it might not be randomly distributed), and only prefetch the second level. I am guessing that this would be good enough. Do you have another strategy in mind?

About exhausting memory parallelism (please correct me if I say anything false, I do not consider myself an expert), I am wondering whether you are already doing something against TLB misses. They can be a bottleneck for memory parallelism, and OCaml programs with default settings are susceptible to them since the runtime will not use huge pages. I tested this on your program without the prefetching optimisation: with default settings (in particular transparent hugepages set to enabled=madvise) I get 1.75% of dTLB load misses, against 0.64% with THP enabled=always, and 0.01% with OCAMLRUNPARAM=H. This has a noticeable effect on the gc duration (2.707 s/gc vs. 2.461 s/gc vs. 2.285 s/gc, respectively). Furthermore, it seems that more recent processors tend to offer more memory parallelism than Skylake, but only if your bottleneck is not the TLB.

For reference, THP enabled=always gives a speed-up of about 3% on memory-intensive Coq benchmarks.

(OCAMLRUNPARAM=H requires that you pre-allocate 500 huge pages of size 2MB with hugeadm, which is likely to fail if your computer was not started recently. A simpler solution is to use jemalloc with the option thp:always, which benefits the whole allocated memory and not just the major heap, outperforms the H options in terms of TLB misses (0.00%) and still gives good results when the memory is fragmented (0.20% TLB misses vs. having to reboot). For reference see the command lines I used and the results.)

@stedolan
Contributor Author

Prefetching helps with TLB performance as well. Prefetching allows the processor to handle more memory requests in parallel, which include page walks if necessary.

As you say, hugepages will improve performance due to better TLB utilisation, but I think that's independent of the changes here. Below are some numbers from my laptop, where it seems that hugepages have a smaller effect on the prefetching code than on trunk. (This could be explained by the prefetching code managing to overlap more TLB misses than trunk does) Times are noisy, so only 2 sig figs. Some of the trunk times have even more variance than that.

trunk:

Configuration          dTLB misses   Time
THP enabled=madvise    1.1%          3.8 s/gc
THP enabled=always     0.4%          3.4 s/gc
jemalloc thp:always    0.2%          3.3 s/gc

prefetching:

Configuration          dTLB misses   Time
THP enabled=madvise    0.5%          0.60 s/gc
THP enabled=always     0.2%          0.56 s/gc
jemalloc thp:always    0.02%         0.56 s/gc

Furthermore, it seems that more recent processors tend to offer more memory parallelism than Skylake, but only if your bottleneck is not the TLB.

Interesting! Do you have a reference? (To be specific, the detail I'm most interested in is the number of L1 line fill buffers, as these tend to be the limiting factor on random accesses from a single core. More memory parallelism elsewhere in the system doesn't help GC much, as it can be difficult to saturate using non-sequential accesses from a single core)

@gadmm
Contributor

gadmm commented Feb 22, 2021

What your figures show corresponds indeed to what I understood with "exhausts the available memory parallelism". I understood it to mean: there is just enough memory parallelism without the page table, but by chance there is nothing spared for prefetching the page table. According to this, even with the strategy I proposed, we should expect it to be noticeably slower with the current page table than without. But given that you did not seem to do anything special for the TLB misses, I expect the combination prefetching+page table+THP to be anywhere between no gain compared to without THP (the TLB was not a bottleneck) to having same performance as without the page table (the TLB was the bottleneck, and prefetching removes the cost of the page table). To test the claim that performance will not be great with the page table, we need to make sure to run numbers with huge pages (which is just a question of using jemalloc with LD_PRELOAD as we have seen). Unless I misunderstood and you meant something completely different.

Another implicit question was whether with more memory parallelism you could tweak the parameters to make it go faster, in no-naked-pointers mode, but your answer implicitly seems to say no.

Interesting! Do you have a reference?

As a non-specialist, I enjoyed Lemire's blog, see for instance 1,2,3 for the claims in question. You can easily find other experiments like this showing increased MLP at L1 level for M1 and Zen2.

Regarding your suggestion of adding -mbranches-within-32B, since this is relevant for performance in this patch, how about going ahead and proposing the change (here or in a separate PR?).

@xavierleroy
Contributor

I understood it to mean: there is just enough memory parallelism without the page table, but by chance there is nothing spared for prefetching the page table.

Just to make sure we're on the same page (ha ha): the naked-pointers mode and its companion page table are going to disappear, so new improvements to memory management like the one in this PR do not need to apply to naked-pointers mode. Actually, the less it applies to naked-pointers mode the better, as a performance incentive to go with the no-naked-pointers mode.

(Likewise, 32-bit target architectures are less and less relevant, and OCaml might stop supporting them at some point in the future, so new improvements to memory management should squarely focus on 64-bit architectures.)

On the other hand, analyzing the performance improvements of this PR and the hardware limitations it can run into is fine, of course.

Re: huge pages, that's the matter for another PR. I agree the current support for HP in the OCaml runtime system is not very usable. If jemalloc can help, that's good.

Member

@damiendoligez damiendoligez left a comment

Preliminary review with a few comments.

#if defined(__GNUC__)
#define CAMLnoinline __attribute__ ((noinline))
#elif defined(_MSC_VER)
#define CAMLnoinline __declspec(noinline)
Member

Question for @dra27: is this going to work on all supported versions of MSVC?

Member

It even works on some unsupported ones! (Visual Studio .NET 2002 was when it was introduced; a fact you will I hope be pleased to know I did not have from memory...)

struct mark_stack stk = *Caml_state->mark_stack;

uintnat young_start = (uintnat)Caml_state->young_start;
uintnat half_young_len = ((uintnat)Caml_state->young_end - (uintnat)Caml_state->young_start) >> 1;
Member

Maybe using rotate1 here too would make it clearer?
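For context, here is a sketch of how a rotate can fold the "is a block" test and the "is not in the minor heap" test into a single comparison. This is a reconstruction of the general idea, not the macro from this PR: the names `rotr1`, `naive_check` and `fused_check` are hypothetical, and `first_young` stands for whichever lower bound the review above settles on. It assumes both bounds are word-aligned (even), that the minor heap is smaller than half the address space, and that converting a large unsigned value to `intptr_t` wraps to a negative number, as it does on common platforms.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate right by one bit: the low bit moves to the top (sign) position. */
static inline uintptr_t rotr1(uintptr_t x) {
  return (x >> 1) | (x << (sizeof x * CHAR_BIT - 1));
}

/* Reference version: low bit clear (a block) and outside [first_young, young_end). */
int naive_check(uintptr_t v, uintptr_t first_young, uintptr_t young_end) {
  int is_block = (v & 1) == 0;
  int is_young = v >= first_young && v < young_end;
  return is_block && !is_young;
}

/* Fused version: one subtraction, one rotate, one signed comparison.
   - Immediates have the low bit set, so the rotated value is negative.
   - Young blocks give a small non-negative value below half the length.
   - Every other block gives a non-negative value at or above it. */
int fused_check(uintptr_t v, uintptr_t first_young, uintptr_t young_end) {
  uintptr_t half_young_len = (young_end - first_young) >> 1;
  return (intptr_t)rotr1(v - first_young) >= (intptr_t)half_young_len;
}
```

Addresses below `first_young` wrap around in the subtraction to a large even number, which the rotate turns into a large positive value, so they pass the test as well.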

/* Already black, nothing to do */
continue;
}
// FIXME work accounting
Member

Is anything missing for work accounting?

Contributor Author

I don't think so, but I remember not fully understanding the existing work accounting logic, so this is a note to myself to go back over and compare the two to make sure they count words the same way.

}
} else if (work <= 0 || stk.count == 0) {
if (min_pb > 0) {
// drain pb before quitting
Member

If I understand what's going on, you're draining pb by pushing everything back to the stack. Wouldn't it be faster to do it explicitly here with a small loop rather than go through the generic code below?

Contributor Author

It's slightly more complicated than that. If we reach this code because work <= 0, then you're right.

But if we reach this code because stk.count == 0, we have not yet reached the end. It's possible that there's some bottleneck in the heap layout, and the entire rest of the heap is accessible only through a half-dozen pointers, all of which happen to be in the prefetch buffer leaving the mark stack empty.

So, we should switch to a mode where we continue marking, but try to drain the prefetch buffer. The most likely outcome is that we do a couple more iterations before hitting this code again with min_pb == 0 and quitting. However, it's also possible that we discover a huge amount more marking work by following the pointers in the prefetch buffer.

Member

Maybe the flow would be easier to follow if you separated the two cases:

if (work <= 0) {
  // push everything back to stack and quit
} else if (pb_enqueued > pb_dequeued + min_pb || (stk.count == 0 && pb_enqueued > pb_dequeued)) {
  // dequeue from prefetch buffer
} else if (stk.count > 0) {
  // take something from the stack
} else {
  // we're done
}

Just a suggestion. If you don't like it, I won't insist.

Contributor

I agree with @damiendoligez, that would be much clearer that way.

Contributor Author

I'm looking at this, but I don't currently see any easy way to make this change without duplicating lots of scanning logic for the push everything back to stack and quit path. This path isn't trivial: entries in the prefetch buffer are possibly-unmarked headers, while entries in the mark stack are intervals needing scanning. Pushing an entry from the prefetch buffer to the stack requires darkening, and dealing with No_scan_tag, Infix_tag and closure offsets.

@stedolan
Contributor Author

stedolan commented Mar 1, 2021

@gadmm:

What your figures show corresponds indeed to what I understood with "exhausts the available memory parallelism". I understood it to mean: there is just enough memory parallelism without the page table, but by chance there is nothing spared for prefetching the page table. According to this, even with the strategy I proposed, we should expect it to be noticeably slower with the current page table than without. But given that you did not seem to do anything special for the TLB misses, I expect the combination prefetching+page table+THP to be anywhere between no gain compared to without THP (the TLB was not a bottleneck) to having same performance as without the page table (the TLB was the bottleneck, and prefetching removes the cost of the page table). To test the claim that performance will not be great with the page table, we need to make sure to run numbers with huge pages (which is just a question of using jemalloc with LD_PRELOAD as we have seen). Unless I misunderstood and you meant something completely different.

Another implicit question was whether with more memory parallelism you could tweak the parameters to make it go faster, in no-naked-pointers mode, but your answer implicitly seems to say no.

I think there are some misconceptions here: with more memory parallelism this code should go faster, with no tweaking of parameters required. I'll try to explain why.

The processor's memory subsystem accepts requests to read memory, and after some delay it produces their contents. This delay ranges from 3-4 cycles (L1 hit) to several hundred cycles (main memory access). (In extreme cases, it can be even longer, if there is a TLB miss that causes a page walk to main memory as well as a main memory access).

The memory subsystem can make useful progress on several requests simultaneously, known as "memory level parallelism" or MLP. This is what Lemire is measuring in the blogpost you linked. Most contemporary x86 processors have maximum MLP around 10, some newer ones have more. (MLP is a bit more complicated than just one number: max MLP varies depending on which caches are involved, and some resources are shared between cores while others are not. But one number is enough for here).

As an example, if you issue 20 load instructions on cycles 1 through 20, on a machine with max MLP of 10 and a 300-cycle latency to main memory, then you would expect all of the loads to be completed by about cycle 600: the memory subsystem will accept the first 10 very quickly and begin the loads, but instructions 11-20 will have to queue.
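The arithmetic in this example can be checked with a tiny queueing model. This is a deliberate simplification (fixed latency, one load issued per cycle, requests served in order), and `last_completion_cycle` and `MAX_SLOTS` are hypothetical names introduced for the sketch.

```c
#include <assert.h>

enum { MAX_SLOTS = 64 };  /* upper bound on the modelled MLP */

/* Load i (0-based) is issued on cycle i + 1. At most max_mlp loads are in
   flight at once; each takes `latency` cycles once it starts. Returns the
   cycle on which the last load completes. */
long last_completion_cycle(int n_loads, int max_mlp, long latency) {
  long slot_free_at[MAX_SLOTS] = {0};
  long last = 0;
  assert(max_mlp >= 1 && max_mlp <= MAX_SLOTS);
  for (int i = 0; i < n_loads; i++) {
    long issued = i + 1;
    int slot = i % max_mlp;  /* with fixed latency, slots free up in order */
    long start = issued > slot_free_at[slot] ? issued : slot_free_at[slot];
    slot_free_at[slot] = start + latency;
    if (slot_free_at[slot] > last) last = slot_free_at[slot];
  }
  return last;
}
```

In this model `last_completion_cycle(20, 10, 300)` is 610, matching the "about cycle 600" estimate. With a maximum MLP of 1, which is roughly what a fully dependent chain of loads achieves, the same 20 loads finish at cycle 6001.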

However, most programs do not issue a load instruction on each cycle, and so much of the available MLP often goes unused. Processors have a number of tricks to try to use more MLP: for instance, they will speculate through branches and begin performing loads before knowing for certain which way the branch goes, and they will detect sequential accesses and read ahead of the program. Unfortunately, none of these tricks are particularly effective with GC marking, which does a lot of unpredictable pointer chasing. Pointer chasing is particularly bad for MLP: the address of the next word to be examined is known only when the previous load returns, forcing things to be sequential.
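The contrast between dependent and independent loads can be seen in two ways of visiting the same cells. This is an illustrative sketch only; the `cell` type and both function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct cell { long value; struct cell *next; } cell;

/* Pointer chasing: the address of the next cell is only known once the
   current cell has been loaded, so the misses happen one after another. */
long sum_list(const cell *c) {
  long total = 0;
  while (c != NULL) {
    total += c->value;
    c = c->next;
  }
  return total;
}

/* Independent loads: every address is available up front in `table`, so
   the processor is free to have several of these misses in flight at once. */
long sum_table(const cell *const *table, size_t n) {
  long total = 0;
  for (size_t i = 0; i < n; i++)
    total += table[i]->value;
  return total;
}
```

Both functions compute the same result; the difference is only in how much memory parallelism the hardware can extract from them.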

The prefetching GC in this PR works differently. It immediately issues up to several hundred load instructions, through software prefetch instructions. (The exact number varies. The buffer size is 256, but it's not always full, and even when full many values are not pointers). The number of issued requests vastly exceeds the maximum MLP available on the processor, so most of these requests queue. This tries to ensure that the memory subsystem is always working at full capacity: whenever a load instruction returns data, there is always another one queued ready to go.

This should transparently make use of larger MLPs available on other processors. (Anyone have an M1 Mac lying around to test with? I haven't tested on a processor with more than 10). In some years' time when processors with MLP in the hundreds exist, we should perhaps consider enlarging the prefetch buffer, but 256 should be enough for now.

@gadmm
Contributor

gadmm commented Mar 1, 2021

@stedolan Thanks for the detailed explanation of how your patch uses MLP. This clarifies to me your comment about exhausting memory parallelism. What your figures suggest is indeed that in practice, we should expect diminishing returns from working on huge pages support after this PR (whereas in theory there can be an effect). If I manage to have an AMD Zen2/3 in the near future, I will try to reproduce your test of the TLB bottleneck with your patch (people with M1s need not bother since it works with 16KB pages and does not support larger pages).

@xavierleroy
Contributor

Thanks for the explanations on MLP. One naive question:

The prefetching GC in this PR works differently. It immediately issues up to several hundred load instructions, through software prefetch instructions. (The exact number varies. The buffer size is 256, but it's not always full, and even when full many values are not pointers). The number of issued requests vastly exceeds the maximum MLP available on the processor, so most of these requests queue.

What does "queue" mean? Is it possible for the program to stall because it is issuing too many prefetch requests and exceeding the maximum available MLP? If a stall is possible, isn't that strictly worse than prefetching less or not prefetching at all? I'm confused.

@stedolan
Contributor Author

stedolan commented Mar 2, 2021

@gadmm I'm still interested in M1 benchmarks - not because of TLB issues, but because I believe the M1 has higher MLP than the machines I've been testing on.

@xavierleroy Yes, this GC sometimes stalls from exceeding the maximum available MLP. This is a good thing.

Fundamentally, marking is a process that does many memory accesses, most of which miss cache. Only a trivial amount of computation is done to each word of memory touched, so the process is limited by memory performance, not CPU. One way or another, the CPU is going to spend some time stalled, waiting for memory.

When trunk stalls, it generally stalls while scanning field i of an object A, which points to object B, whose header is not in cache. The hardware continues to speculate past the stall, and will often predict that field i+1 of A is next to be read, and begin fetching that too. However, this is limited to fields of the same object: the address of the next object to be scanned depends on the result of reading B's header, which the processor cannot know (and does not attempt to guess) until the load of B has returned. So, during a stall, there are at most a couple of active memory loads, to the objects in fields i, i+1 (and maybe beyond, but not far beyond). The memory subsystem is underutilised, because it can handle 10 (or more...) active loads.

When the prefetching GC stalls, it is generally not during loads, which tend to hit in L1 cache (because they were prefetched). However, it is continually issuing prefetches for memory that will be needed later, and sometimes we run into the max MLP and stall. During this kind of stall, there are as many active loads as possible, so this stall only occurs when the memory subsystem is fully utilised.

In other words, stalling due to hitting maximum MLP is the ideal state for a marking loop: it means the memory accesses have been scheduled well enough to fully saturate the memory subsystem, at low enough bookkeeping overhead that the CPU has nothing to do but wait for the result.

Here are some performance counter results for trunk vs. prefetching. These results should be taken with a pinch of salt: first, they're from the markbench.ml program posted above, which is something of an ideal case for prefetching, and second, they don't separate the heap-initialisation phase from the actual GC work, so are a bit noisy.

Trunk:

  L2 accesses (unit: number of cacheline transfers):
       162,451,158      l2_rqsts.demand_data_rd_miss
       597,006,374      l2_rqsts.pf_miss
       138,159,976      l2_lines_out.useless_hwpf


  Stalls (unit: cycles spent while stalled):
    21,937,001,731      cycle_activity.stalls_l2_miss
    29,481,074,997      cycle_activity.stalls_total
    27,324,990,838      cycle_activity.stalls_l1d_miss
    28,639,113,821      cycle_activity.stalls_mem_any
    17,972,511,169      cycle_activity.stalls_l3_miss
        73,101,333      l1d_pend_miss.fb_full
    33,092,457,045      resource_stalls.any

  11.569266782 seconds time elapsed

Prefetching:

  L2 accesses (unit: number of cacheline transfers):
         5,666,881      l2_rqsts.demand_data_rd_miss
       534,525,411      l2_rqsts.pf_miss
         6,667,602      l2_lines_out.useless_hwpf

  Stalls (unit: cycles spent while stalled):
       540,698,446      cycle_activity.stalls_l2_miss
     4,489,759,835      cycle_activity.stalls_total
       969,167,512      cycle_activity.stalls_l1d_miss
     2,351,860,341      cycle_activity.stalls_mem_any
       344,528,248      cycle_activity.stalls_l3_miss
     1,622,623,969      l1d_pend_miss.fb_full
     4,430,692,002      resource_stalls.any

  3.702075640 seconds time elapsed

Stalls due to max MLP come under l1d_pend_miss.fb_full - on this Skylake processor, the limiting factor in MLP is the number of L1 line fill buffers available. Note that these essentially never happen with trunk. Despite that, the prefetching GC spends a lot less time stalled, and in particular has massively fewer stalls_{l1d,l2,l3}_miss - that is, load instructions that stall due to missing cache.

@gadmm
Contributor

gadmm commented Mar 2, 2021

Of course @stedolan, this is not what I meant. Thanks again for your explanations.

@xavierleroy
Contributor

I understand you want to maximize MLP. But I still cannot convince myself this will minimize GC time, which is what we actually care about.

A hypothetical scenario: so many locations are prefetched so much in advance (at the cost of multiple stalls) that by the time an actual load is done, the corresponding prefetched data was already flushed out of cache and has to be fetched again.

Another hypothetical scenario: by issuing so many memory requests, the OCaml process is going to starve other processes executing on the same core (e.g. via hyperthreading), or even on other cores.

A third hypothetical scenario: by issuing so many memory requests, the OCaml process is consuming lots of energy and draining your batteries faster.

In other words: an ideal prefetching policy is one that makes sure all loads hit L1 cache, not one that maximizes MLP.

@stedolan
Contributor Author

stedolan commented Mar 4, 2021

I understand you want to maximize MLP. But I still cannot convince myself this will minimize GC time, which is what we actually care about.

OK, I'll try going through your scenarios.

A hypothetical scenario: so many locations are prefetched so much in advance (at the cost of multiple stalls) that by the time an actual load is done, the corresponding prefetched data was already flushed out of cache and has to be fetched again.

This is a real concern, and is one of two factors controlling Pb_size, the size of the prefetch buffer. This controls how far ahead of the GC the prefetching happens. It should be big enough to get a reasonable MLP and hide latency, but also not so big that by the time the actual loads are done the data is flushed out of cache.

On most x86 machines of the last decade, L1 is 32KB (512 lines of 64 bytes each), and 10-20 operations in flight are sufficient to get good MLP. So Pb_size should be somewhere between maybe 20 and 500.

In this patch it is currently 256. If you set it to, say, 2^16, then you will definitely see the issue you describe.
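
To make the role of the buffer size concrete, here is a minimal sketch of a marking loop with a prefetch queue in front of a stack. It is a toy model and not the code in this PR: the `node` type, the function names, and the value of `PB_MIN` are assumptions made for illustration, and only the 256-entry capacity is taken from the discussion above.

```c
#include <stddef.h>

#define PB_SIZE 256              /* ring capacity, a power of two (256 as discussed) */
#define PB_MASK (PB_SIZE - 1)
#define PB_MIN  64               /* assumed threshold, chosen only for this sketch */

/* Hypothetical heap object: a mark bit plus an array of child pointers. */
typedef struct node {
  int marked;
  size_t n_children;
  struct node **children;
} node;

static void prefetch_hint(const void *p)
{
#if defined(__GNUC__) || defined(__clang__)
  __builtin_prefetch(p);
#else
  (void)p;                       /* no-op where no prefetch builtin is known */
#endif
}

/* Marks everything reachable from root. Returns the number of nodes marked,
   or (size_t)-1 if the caller-supplied stack is too small. */
size_t mark_with_prefetch(node *root, node **stack, size_t stack_cap)
{
  node *ring[PB_SIZE];
  size_t enq = 0, deq = 0;       /* ring holds enq - deq pending pointers */
  size_t sp = 0, marked = 0;

  if (root == NULL) return 0;
  if (stack_cap == 0) return (size_t)-1;
  stack[sp++] = root;

  for (;;) {
    size_t pending = enq - deq;
    node *n;
    if (pending >= PB_MIN || (sp == 0 && pending > 0)) {
      n = ring[deq++ & PB_MASK]; /* oldest entry: its prefetch had the most time */
    } else if (sp > 0) {
      n = stack[--sp];           /* buffer nearly empty: take work from the stack */
    } else {
      break;                     /* no work left anywhere */
    }
    if (n->marked) continue;     /* first real load of the object */
    n->marked = 1;
    marked++;
    for (size_t i = 0; i < n->n_children; i++) {
      node *c = n->children[i];
      if (c == NULL) continue;
      prefetch_hint(c);          /* start the load now, look at c later */
      if (enq - deq < PB_SIZE) ring[enq++ & PB_MASK] = c;
      else if (sp < stack_cap) stack[sp++] = c;   /* overflow goes to the stack */
      else return (size_t)-1;
    }
  }
  return marked;
}
```

In this model, a larger `PB_SIZE` lets more loads be in flight before each object is examined, but it also lengthens the delay between a prefetch and the matching load, which is the eviction risk described above.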

> Another hypothetical scenario: by issuing so many memory requests, the OCaml process is going to starve other processes executing on the same core (e.g. via hyperthreading), or even on other cores.

If this mattered, it would already occur in code that naturally achieves high MLP, such as naively summing an array of boxed integers.

> A third hypothetical scenario: by issuing so many memory requests, the OCaml process is consuming lots of energy and draining your batteries faster.

This strategy does not issue any more memory requests than the current GC (assuming, as above, that you haven't set Pb_size to a huge value). It just does a better job of parallelising them.

The power management strategy on most modern processors is "race to idle": since the idle states consume so much less power than the active state, the best strategy is to get the work done as quickly as possible so that you can get back to the idle state. In this model, the most power-efficient approach is whatever's fastest.

> In other words: an ideal prefetching policy is one that makes sure all loads hit L1 cache, not one that maximizes MLP.

I prefer your earlier phrasing: the goal is to minimize GC time.

The marking loop is limited by memory performance, since its bookkeeping overhead is low. It has a fixed number of memory accesses to do, assuming again that Pb_size isn't so large that we do useless ones. The variables we care about are then related by:

GC time = (number of memory accesses) * (average memory latency) / (average memory-level parallelism)

Since we control neither the number of memory accesses that need to be done nor their latency, minimising GC time is the same as maximising MLP.
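
As a worked illustration of the relation above, the following sketch evaluates it for made-up inputs. The function name and the figures (10 million accesses, 100 ns latency) are hypothetical and are not measurements from this PR.

```c
/* GC time = accesses * latency / MLP, with latency given in nanoseconds
   and the result in seconds. */
double gc_time_seconds(double accesses, double latency_ns, double mlp)
{
  return accesses * latency_ns * 1e-9 / mlp;
}
```

With these hypothetical inputs, an MLP of 1 gives about 1 second and an MLP of 4 gives about 0.25 seconds: for a fixed number of accesses and a fixed latency, time scales inversely with the parallelism achieved.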

struct mark_stack stk = *Caml_state->mark_stack;

uintnat young_start = (uintnat)Caml_state->young_start;
uintnat half_young_len = ((uintnat)Caml_state->young_end - (uintnat)Caml_state->young_start) >> 1;
Member

Check-typo says this line is too long.

Suggested change
uintnat half_young_len = ((uintnat)Caml_state->young_end - (uintnat)Caml_state->young_start) >> 1;
uintnat half_young_len =
((uintnat)Caml_state->young_end - (uintnat)Caml_state->young_start) >> 1;

@dra27
Member

dra27 commented Mar 4, 2021

Out of curiosity, I just tested this on msvc64 on a Threadripper 3990X... @stedolan's markbench program built with ocamlopt (averaged over 5 runs, FWIW):

  • On 9f804a2 (I built an old branch by mistake): 2.129s
  • On 5b9789c (this PR's base): 2.368s
  • On 8c3c478 (this PR's present HEAD): 2.709s
  • With #define caml_prefetch(p) _mm_prefetch(p, 3): 0.732s

The caml_prefetch change for MSVC could go in this PR, but I need to work out how to do it properly first (I'm not sure 3 means the same thing here, but it's too late to stare at disassemblies...)

The two slow-downs are worrying.
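
One way to sidestep the question of what the literal 3 means on each compiler is to use the symbolic hint constant. The following is a sketch under that assumption, not the definition adopted in the OCaml runtime: `sketch_prefetch` and `sum_with_prefetch` are hypothetical names, and the loop only demonstrates issuing a hint a fixed distance ahead of the current element.

```c
#include <stddef.h>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <xmmintrin.h>
/* _MM_HINT_T0 requests the line in all cache levels, whatever its numeric value. */
#define sketch_prefetch(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#elif defined(__GNUC__) || defined(__clang__)
/* Arguments: address, 0 = read access, 3 = high temporal locality. */
#define sketch_prefetch(p) __builtin_prefetch((p), 0, 3)
#else
#define sketch_prefetch(p) ((void)(p))   /* unknown compiler: no hint */
#endif

/* Sums an array while hinting the element `ahead` positions further on. */
long sum_with_prefetch(const long *a, size_t n, size_t ahead)
{
  long total = 0;
  for (size_t i = 0; i < n; i++) {
    if (ahead < n - i) sketch_prefetch(&a[i + ahead]);  /* stay inside the array */
    total += a[i];
  }
  return total;
}
```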

@stedolan
Contributor Author

stedolan commented Mar 8, 2021

I can reproduce the difference between the first two even on linux amd64, looking into it now. I'm surprised by the large slowdown to this PR with broken prefetching - there's definitely some overhead for the prefetching bookkeeping if no prefetching is being done, but I'm surprised it's that large. I'll have a look.

@stedolan
Contributor Author

stedolan commented Mar 9, 2021

The difference between 9f804a2 and 5b9789c is explained by #9756 and #9951: #9756 made it go ~45% faster, and then #9951 made it go ~65% slower, leading to a ~20% slowdown overall.

(Don't read too much into the 65% number: this is a microbenchmark. #9951 replaced an optimisation with a different version, but the old version happened to work particularly well on this benchmark).

I'm not sure what's going on between 5b9789c and this PR. One issue is that 5b9789c isn't actually the base commit of this PR, so it's possible that the slowdown was introduced on trunk since 5b9789c. In my testing (limited to x86_64 Linux on Intel), this PR with caml_prefetch disabled has generally outperformed trunk, so I'd like to see if the result persists when testing against 8a90546 or recent trunk.

@dra27
Member

dra27 commented Mar 9, 2021

Today:

  • On 8a90546 (this PR's actual base!): 2.326s
  • On 8c3c478 (this PR's present HEAD): 2.765s

And then with --disable-naked-pointers (sorry about that...):

  • On 9f804a2: 1.806s
  • On 8a90546: 2.054s
  • On 8c3c478: 2.275s
  • With #define caml_prefetch(p) _mm_prefetch(p, 3): 0.633s

@dra27
Member

dra27 commented Mar 9, 2021

I did the same with mingw-w64:

Which hints that the slowdown is down to the C compiler

Contributor

@jhjourdan jhjourdan left a comment

Overall, this PR looks good. I have a few comments, see below.

I am slightly worried about the Forward_tag optimization. I don't buy the argument that most lazy values get forced soon. It depends much on the kind of program. If we decide to get rid of this optimization in the major heap, then we should have strong benchmarks showing that there is no significant performance loss.

One possibility to keep this optimization with the prefetching buffer is to store pointers to values instead of values in the prefetch buffer. This means that we do one more dereferencing per pointer in the heap, but we can hope that this is a cache hit, so this should really be cheap.

runtime/caml/major_gc.h
(Tag_val(v) >= No_scan_tag || !Is_black_val(v))) {
v = Val_hp(Op_val(v) + Wosize_val(v));
}
me.start = Op_val(v);
Contributor

@jhjourdan jhjourdan Mar 25, 2021

At that point, it may be the case that v == end + 1 and that end is the end of the chunk. In this case, aren't we dereferencing a pointer outside of the chunk when we are looking for the header of v?

Member

If v is one past the end of the chunk, then the header of v is the last word of the chunk. Did I misunderstand your question?

Contributor

Sorry, this was misleading. As I explained above, v could be two past the end of the chunk. Then, when we read the header, we read one past the end of the chunk.

Anyway, think about blocks: when we have finished scanning the chunk, then v points to a block which is entirely outside of the chunk, including its header. So, we are effectively reading a header which is outside of the chunk.

Contributor

Probably the reason we are not getting a segfault from this is the padding that caml_stat_alloc_aligned adds.

runtime/caml/major_gc.h
}

p += Whsize_hp(Hp_op(p));
v = Val_hp(me.end);
Contributor

If me.end is the end of the chunk, then it seems like this creates a pointer more than one-past the end of the chunk, which is UB. (Well, ok, it is unlikely that the compiler will optimize this in such a way that this triggers a bug).

Contributor Author

You're right! I'll change this to work on header_t* instead of value.
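
The idea of walking headers rather than values can be sketched as follows. This is a toy model and not the PR's code: the header encoding (the header word is simply the field count) and all names are assumptions, and the chunk is assumed to be tiled exactly by blocks. The point it illustrates is that the loop tests `hp < end` before each header read, and the furthest pointer ever formed is `end` itself.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word;

/* Toy encoding (an assumption): the header is just the number of fields. */
static size_t toy_wosize(word hd) { return (size_t)hd; }

/* Walks the blocks in [start, end), returning how many there are and
   storing the total number of fields in *total_fields. */
size_t count_blocks(const word *start, const word *end, size_t *total_fields)
{
  size_t blocks = 0;
  const word *hp = start;
  *total_fields = 0;
  while (hp < end) {             /* bounds check comes before the header read */
    size_t wosize = toy_wosize(*hp);
    blocks++;
    *total_fields += wosize;
    hp += 1 + wosize;            /* next header, or exactly `end` after the last block */
  }
  return blocks;
}
```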

while (me.end <= end) {
value v;
if (stk->count < stk->size/4) {
stk->stack[stk->count++] = me;
Contributor

There is a slight change in the logic here: the previous code called mark_stack_push while we are pushing manually here.

There are two differences:
1- The "Some(42)" optimization is not performed here
2- No work accounting is done

Could you comment on why you decided to drop these two features here?

Contributor Author

The lack of (1) is an oversight.

For (2), I think the work accounting here was wrong? It looks like trunk takes credit twice for marking the same object if there is a mark stack overflow. (The GC pacing logic relies on there being a fixed amount of work, determined by the shape of the heap at the time marking started).

Contributor

@jhjourdan jhjourdan Mar 29, 2021

@damiendoligez, can you confirm the previous implementation was wrong wrt. work accounting?

Member

Yes, the previous implementation is wrong. It takes credit for the first few words, then pushes the block to the stack. If the stack overflows and the block is evicted, it will take credit again for these few words.

runtime/major_gc.c (outdated)
Comment on lines +649 to +698
mark_entry m = stk.stack[--stk.count];
scan = m.start;
obj_end = m.end;
Contributor

Here, we will be reading the block even though it does not come from the prefetch buffer (and hence it might not be in the cache). It seems like the heuristic here is that we expect blocks at the top of the stack to be in the cache. Is that the idea? Did you run experiments on this? Could you please add a comment?

Member

I think the idea is that the prefetch buffer is at its minimum size, so its entries are likely to be "not yet fetched". So we'll take a cache miss either way. It then makes sense to take something from the stack instead of prematurely emptying the prefetch buffer.

Contributor Author

Yes, as @damiendoligez says. (And yes, I benchmarked this)

Comment on lines +679 to +738
*Caml_state->mark_stack = stk;
realloc_mark_stack(Caml_state->mark_stack);
stk = *Caml_state->mark_stack;
Contributor

Suggested change
*Caml_state->mark_stack = stk;
realloc_mark_stack(Caml_state->mark_stack);
stk = *Caml_state->mark_stack;
realloc_mark_stack(&stk);

Or perhaps you do not want to do that because you want to guarantee that the compiler places stk in registers? If so, then this is worth a comment.

Contributor Author

Yes, most of the variables in this function are about using registers. I'll add some comments.
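
The pattern under discussion (copy the shared struct into a local, write it back before an out-of-line call that needs it, reload afterwards) can be sketched like this. It is an illustration under assumptions, not the runtime's code: `toy_stack`, `shared_stack`, `toy_grow` and `push_range` are hypothetical names standing in for the mark stack and its reallocation routine.

```c
#include <stdlib.h>

struct toy_stack { int *data; size_t count, size; };

static struct toy_stack shared_stack;     /* stands in for the shared mark stack */

/* Out-of-line helper that works on the shared copy through a pointer. */
static void toy_grow(struct toy_stack *s)
{
  size_t new_size = s->size ? 2 * s->size : 16;
  int *p = realloc(s->data, new_size * sizeof *p);
  if (p == NULL) abort();
  s->data = p;
  s->size = new_size;
}

void push_range(int n)
{
  struct toy_stack stk = shared_stack;    /* address never taken: free to live in registers */
  for (int i = 0; i < n; i++) {
    if (stk.count == stk.size) {
      shared_stack = stk;                 /* publish the local state */
      toy_grow(&shared_stack);
      stk = shared_stack;                 /* reload what the helper changed */
    }
    stk.data[stk.count++] = i;
  }
  shared_stack = stk;                     /* write back once at the end */
}
```

Because `&stk` is never passed anywhere, the compiler does not have to assume that a call can modify it, which is what allows its fields to stay in registers across the hot loop without relying on inlining.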

Contributor

Alright. I suspect that a good compiler could figure out that &stk is not escaping and hence keep this in a register, but this is not robust anyway.

Could you use the same pattern in mark_stack_push, and get rid of the parameter of realloc_mark_stack and mark_stack_prune ?

Contributor Author

I'm not aware of any compiler that would make such a change without inlining everything.

We certainly could and should use the same pattern in mark_stack_push, but again I'd prefer to leave the ephemeron codepath as is in this patch.

Contributor

> I'm not aware of any compiler that would make such a change without inlining everything.

Sure. But C/C++ compilers tend to inline a lot these days.

> We certainly could and should use the same pattern in mark_stack_push, but again I'd prefer to leave the ephemeron codepath as is in this patch.

As you prefer. But this conflict is not going to be so complicated to fix. And if we don't do this change here, we will probably forget it.

Member

What about adding a TODO comment?

Comment on lines +413 to 436
/* auxiliary function of mark_ephe_aux */
Caml_inline void mark_ephe_darken(struct mark_stack* stk, value v, mlsize_t i,
int in_ephemeron, int *slice_pointers,
intnat *work)
Contributor

At the very least, the in_ephemeron parameter should be removed and the code simplified accordingly, since this function is now only called from the ephemeron logic.

In addition, I wonder whether it would be possible to avoid having two implementations for essentially the same thing. Do you think we could call do_some_marking from mark_ephe_aux?

Contributor Author

I'm not going to change the ephemeron path in this PR, because there are a number of known bugs being worked on in that code, and I don't want to cause conflicts with #9424. Once both are merged, we can do some cleanup.

work--;

CAML_EVENTLOG_DO({
slice_fields++;
Contributor

This is no longer incremented.

Contributor

In the new version this is now incremented much more often, maybe this user-facing change (fix) deserves to be logged in the changelog.

@stedolan
Contributor Author

Thanks for the review! I'll have a look at this today.

@stedolan
Contributor Author

> I am slightly worried about the Forward_tag optimization. I don't buy the argument that most lazy values get forced soon. It depends much on the kind of program. If we decide to get rid of this optimization in the major heap, then we should have strong benchmarks showing that there is no significant performance loss.

> One possibility to keep this optimization with the prefetching buffer is to store pointers to values instead of values in the prefetch buffer. This means that we do one more dereferencing per pointer in the heap, but we can hope that this is a cache hit, so this should really be cheap.

I was thinking along similar lines, but in order to decide whether this optimisation is worthwhile we really need some programs to test with. This is where I got stuck: I have yet to find anything that puts more than a couple of thousand Lazy values on the major heap. Do you have anything in mind?

@stedolan
Contributor Author

stedolan commented Aug 3, 2021

I just re-ran some benchmarks, comparing the current version to the original. It turns out that one of the refactorings has introduced a slowdown, causing about 10% more instructions to be executed (and a similar, but more variable, slowdown in time). I've restored the old loop logic (the difference is in the order in which the various tests occur), as it is measurably faster. It's a bit less readable, but at least it's commented this time.

@stedolan
Contributor Author

stedolan commented Aug 9, 2021

This has now been "approved" for about 3 weeks, so unless someone strenuously objects I'm going to hit "merge" later today.

@gadmm
Contributor

gadmm commented Aug 9, 2021

I am finishing up my analysis of the inner loop which I intend to write later at the previously-mentioned link, with a very small improvement/simplification to propose regarding Is_block_and_not_young (I was redoing benchmarks after the last performance change). Here is what I found.

Essentially, instead of a single test using the rotation trick (cool as it is), I test two separate conditions, Is_block(v) && Is_not_young(v), where Is_not_young performs the fast range check but without the rotation. This is faster in both synthetic and real-world benchmarks. It is hard to say by how much because of code layout effects, which I have tried to control as best I could following Stephen's advice, but the gain is no more than 1.5% of the marking time; at best, the rotation trick does not improve performance. Your synthetic benchmark shows Is_block_and_not_young going faster, but this is because the heap has a very regular pattern, so it does not show the effects of branch mispredictions. I tested OCaml and Coq runs.

I understand the result as follows: in real programs, Is_block_and_not_young and Is_block cause a lot of branch mispredictions (they are false only 15-40% of the time in the real programs I tested, with very poor predictability). To recover from a branch misprediction, the faster the condition is computed the better, so Is_block performs better (2 cycles less latency). On the other hand, Is_not_young is almost always true, and is thus free thanks to branch prediction and instruction-level parallelism. (Another situation observed with Coq is short slices with 60-80% immediates and good prediction, in which case Is_block should also win. I have asked Coq developers whether they might be needlessly scanning large arrays of immediates somewhere.)
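
For readers who have not looked at the diff, the two variants being compared can be sketched as follows. This is a reconstruction under assumptions, not a copy of the PR's code: the function names are hypothetical, `young_start` and `young_len` are assumed to be even, and the combined test relies on the usual two's-complement result of converting an unsigned value to a signed one (which C leaves implementation-defined).

```c
#include <limits.h>
#include <stdint.h>

/* Rotate right by one bit: the low (tag) bit becomes the sign bit. */
static uintptr_t rotr1(uintptr_t x)
{
  return (x >> 1) | (x << (sizeof x * CHAR_BIT - 1));
}

/* Pointers have a clear low bit, immediates have it set. */
static int is_block(uintptr_t v) { return (v & 1) == 0; }

/* Unsigned range check: false only if young_start <= v < young_start + young_len. */
static int is_not_young(uintptr_t v, uintptr_t young_start, uintptr_t young_len)
{
  return v - young_start >= young_len;
}

/* Two separate, independently predictable conditions. */
int two_tests(uintptr_t v, uintptr_t young_start, uintptr_t young_len)
{
  return is_block(v) && is_not_young(v, young_start, young_len);
}

/* Single comparison: an immediate rotates into a negative number and fails;
   a pointer becomes half its offset, compared against half the range length. */
int rotated_test(uintptr_t v, uintptr_t young_start, uintptr_t young_len)
{
  return (intptr_t)rotr1(v - young_start) >= (intptr_t)(young_len >> 1);
}
```

Both functions give the same answer under the stated assumptions. The difference lies in the dependency chain: the combined form has to finish the subtraction and the rotation before its single branch can resolve, whereas the tag-bit test in the split form is available immediately.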

The improvement is small but in any case it simplifies the code.

The fact that a separate Is_not_young check is free thanks to instruction-level parallelism also explains why it is possible to replace it with an optimised page table check that skips the marking of statics, without suffering from a performance penalty, as I found, contrary to initial claims.

Thank you to Stephen for his help (and patience) in private conversation during the review!

@lpw25
Contributor

lpw25 commented Aug 10, 2021

My understanding is that @gadmm's comment is about a potential further improvement. That can be done in a later PR, so I'm going to click merge. Feel free to keep discussing the improvement here though.

@lpw25 lpw25 dismissed jhjourdan’s stale review August 10, 2021 12:51

My understanding is that this has been addressed

@lpw25 lpw25 merged commit 1dc70d3 into ocaml:trunk Aug 10, 2021
stedolan added a commit to stedolan/ocaml that referenced this pull request Aug 10, 2021
Speed up GC by prefetching during marking

(cherry picked from commit 1dc70d3)
@gadmm
Contributor

gadmm commented Aug 10, 2021

Sure @lpw25 (but it creates more work to open a PR just for this). Let me know @stedolan if you are interested in the improvement/simplification.

For the curious here are some benchmarks. (I offer it without interpretation. @stedolan asked for benchmarks on interestingly-large programs, but for an interpretation it is better to do it with all the details elsewhere.)

  • Before: OCaml 4.12 plus the fix to work accounting
  • After: this PR together with the patch to prefetch during sweeping, rebased on top of 4.12
  • PT: with page table
  • NNP: with --disable-naked-pointers
  • After, PT (1): what is now in OCaml 4.14
  • After, PT (2): the prefetching strategy for the page table that I suggested, which I also tested and which I can propose for review if you let me know you are interested after all.

Things measured:

  • pace (ns/w): time spent inside the marking loop to perform one unit of work (i.e. decide that one word is still live this major cycle).
  • total time (mm:ss): wall-clock time of opam running; this measurement also benefits from the patch to prefetch during sweeping (take into account that there is a lot of fluctuation with this measure)
  • latency (ms): contribution of the marking loop in a mark slice to the tail latency (also fluctuates)

The test is to install some OCaml or Coq packages with opam, which then runs OCaml (in the first case) or Coq (in the second case). This is with 2 physical cores and 2 jobs; the gains with all logical cores busy are likely to be smaller (since hyperthreading helps keep the physical cores busy during stalls), which would be another interesting benchmark to run.

OCaml, compiling OCaml, dune, lablgtk3 and a couple of smaller packages

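| Throughput    | pace (ns/w) | rel. (%) | total time | rel. (%) |
|---------------|-------------|----------|------------|----------|
| Before, PT    | 18.6        |          | 4:09       |          |
| Before, NNP   | 18.0        | -3%      | 4:08       | -0.4%    |
| After, PT (1) | 5.5         | -70%     | 3:12       | -23%     |
| After, PT (2) | 4.9         | -74%     | 3:11       | -23%     |
| After, NNP    | 4.2         | -78%     | 3:09       | -24%     |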

Latency is not an interesting measure for OCaml itself, and the following table aggregates the measures of different programs so the absolute value is not meaningful in itself (for instance ocamlc gives much longer mark slices than ocamlopt even though both have the default GC settings); however the relative improvement is consistent across programs and is going to be interesting.

| Latency       | max ≤99.99% (ms) | rel. (%) |
|---------------|------------------|----------|
| Before, PT    | 219              |          |
| Before, NNP   | 204              | -7%      |
| After, PT (1) | 55               | -75%     |
| After, PT (2) | 51               | -77%     |
| After, NNP    | 42               | -81%     |

Coq, compiling some heavy Mathcomp libraries

(mathcomp-character and mathcomp-odd-order)

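| Throughput    | pace (ns/w) | rel. (%) | total time | rel. (%) |
|---------------|-------------|----------|------------|----------|
| Before, PT    | 19.6        |          | 13:41      |          |
| Before, NNP   | 18.8        | -4%      | 13:30      | -1.3%    |
| After, PT (1) | 7.1         | -64%     | 12:55      | -5.6%    |
| After, PT (2) | 6.0         | -70%     | 12:52      | -6.0%    |
| After, NNP    | 5.0         | -75%     | 12:41      | -7.3%    |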

Coq's GC settings are optimised for throughput (with a minor heap of 256MB and a space overhead of 200%), so a lesser share of the time is spent in the major GC compared to OCaml. Latency is not an interesting measure for Coq, but the gain is as above.

(edit: clarify that the total time also measures the improvement of the patch that introduces prefetching during sweeping)

poechsel pushed a commit to ocaml-flambda/ocaml that referenced this pull request Sep 3, 2021
Speed up GC by prefetching during marking

(cherry picked from commit 1dc70d3)
chambart pushed a commit to chambart/ocaml-1 that referenced this pull request Sep 9, 2021
Speed up GC by prefetching during marking

(cherry picked from commit 1dc70d3)
stedolan added a commit to stedolan/ocaml that referenced this pull request Oct 5, 2021
Speed up GC by prefetching during marking

(cherry picked from commit 1dc70d3)
stedolan added a commit to stedolan/ocaml that referenced this pull request Dec 13, 2021
…efetching

Speed up GC by prefetching during marking

(cherry picked from commit 1dc70d3)
chambart pushed a commit to chambart/ocaml-1 that referenced this pull request Feb 1, 2022
23a7f73 flambda-backend: Fix some Debuginfo.t scopes in the frontend (ocaml#248)
33a04a6 flambda-backend: Attempt to shrink the heap before calling the assembler (ocaml#429)
8a36a16 flambda-backend: Fix to allow stage 2 builds in Flambda 2 -Oclassic mode (ocaml#442)
d828db6 flambda-backend: Rename -no-extensions flag to -disable-all-extensions (ocaml#425)
68c39d5 flambda-backend: Fix mistake with extension records (ocaml#423)
423f312 flambda-backend: Refactor -extension and -standard flags (ocaml#398)
585e023 flambda-backend: Improved simplification of array operations (ocaml#384)
faec6b1 flambda-backend: Typos (ocaml#407)
8914940 flambda-backend: Ensure allocations are initialised, even dead ones (ocaml#405)
6b58001 flambda-backend: Move compiler flag -dcfg out of ocaml/ subdirectory (ocaml#400)
4fd57cf flambda-backend: Use ghost loc for extension to avoid expressions with overlapping locations (ocaml#399)
8d993c5 flambda-backend: Let's fix instead of reverting flambda_backend_args (ocaml#396)
d29b133 flambda-backend: Revert "Move flambda-backend specific flags out of ocaml/ subdirectory (ocaml#382)" (ocaml#395)
d0cda93 flambda-backend: Revert ocaml#373 (ocaml#393)
1c6eee1 flambda-backend: Fix "make check_all_arches" in ocaml/ subdirectory (ocaml#388)
a7960dd flambda-backend: Move flambda-backend specific flags out of ocaml/ subdirectory (ocaml#382)
bf7b1a8 flambda-backend: List and Array Comprehensions (ocaml#147)
f2547de flambda-backend: Compile more stdlib files with -O3 (ocaml#380)
3620c58 flambda-backend: Four small inliner fixes (ocaml#379)
2d165d2 flambda-backend: Regenerate ocaml/configure
3838b56 flambda-backend: Bump Menhir to version 20210419 (ocaml#362)
43c14d6 flambda-backend: Re-enable -flambda2-join-points (ocaml#374)
5cd2520 flambda-backend: Disable inlining of recursive functions by default (ocaml#372)
e98b277 flambda-backend: Import ocaml#10736 (stack limit increases) (ocaml#373)
82c8086 flambda-backend: Use hooks for type tree and parse tree (ocaml#363)
33bbc93 flambda-backend: Fix parsecmm.mly in ocaml subdirectory (ocaml#357)
9650034 flambda-backend: Right-to-left evaluation of arguments of String.get and friends (ocaml#354)
f7d3775 flambda-backend: Revert "Magic numbers" (ocaml#360)
0bd2fa6 flambda-backend: Add [@inline ready] attribute and remove [@inline hint] (not [@inlined hint]) (ocaml#351)
cee74af flambda-backend: Ensure that functions are evaluated after their arguments (ocaml#353)
954be59 flambda-backend: Bootstrap
dd5c299 flambda-backend: Change prefix of all magic numbers to avoid clashes with upstream.
c2b1355 flambda-backend: Fix wrong shift generation in Cmm_helpers (ocaml#347)
739243b flambda-backend: Add flambda_oclassic attribute (ocaml#348)
dc9b7fd flambda-backend: Only speculate during inlining if argument types have useful information (ocaml#343)
aa190ec flambda-backend: Backport fix from PR#10719 (ocaml#342)
c53a574 flambda-backend: Reduce max inlining depths at -O2 and -O3 (ocaml#334)
a2493dc flambda-backend: Tweak error messages in Compenv.
1c7b580 flambda-backend: Change Name_abstraction to use a parameterized type (ocaml#326)
07e0918 flambda-backend: Save cfg to file (ocaml#257)
9427a8d flambda-backend: Make inlining parameters more aggressive (ocaml#332)
fe0610f flambda-backend: Do not cache young_limit in a processor register (upstream PR 9876) (ocaml#315)
56f28b8 flambda-backend: Fix an overflow bug in major GC work computation (ocaml#310)
8e43a49 flambda-backend: Cmm invariants (port upstream PR 1400) (ocaml#258)
e901f16 flambda-backend: Add attributes effects and coeffects (#18)
aaa1cdb flambda-backend: Expose Flambda 2 flags via OCAMLPARAM (ocaml#304)
62db54f flambda-backend: Fix freshening substitutions
57231d2 flambda-backend: Evaluate signature substitutions lazily (upstream PR 10599) (ocaml#280)
a1a07de flambda-backend: Keep Sys.opaque_identity in Cmm and Mach (port upstream PR 9412) (ocaml#238)
faaf149 flambda-backend: Rename Un_cps -> To_cmm (ocaml#261)
ecb0201 flambda-backend: Add "-dcfg" flag to ocamlopt (ocaml#254)
32ec58a flambda-backend: Bypass Simplify (ocaml#162)
bd4ce4a flambda-backend: Revert "Semaphore without probes: dummy notes (ocaml#142)" (ocaml#242)
c98530f flambda-backend: Semaphore without probes: dummy notes (ocaml#142)
c9b6a04 flambda-backend: Remove hack for .depend from runtime/dune  (ocaml#170)
6e5d4cf flambda-backend: Build and install Semaphore (ocaml#183)
924eb60 flambda-backend: Special constructor for %sys_argv primitive (ocaml#166)
2ac6334 flambda-backend: Build ocamldoc (ocaml#157)
c6f7267 flambda-backend: Add -mbranches-within-32B to major_gc.c compilation (where supported)
a99fdee flambda-backend: Merge pull request ocaml#10195 from stedolan/mark-prefetching
bd72dcb flambda-backend: Prefetching optimisations for sweeping (ocaml#9934)
27fed7e flambda-backend: Add missing index param for Obj.field (ocaml#145)
cd48b2f flambda-backend: Fix camlinternalOO at -O3 with Flambda 2 (ocaml#132)
9d85430 flambda-backend: Fix testsuite execution (ocaml#125)
ac964ca flambda-backend: Comment out `[@inlined]` annotation. (ocaml#136)
ad4afce flambda-backend: Fix magic numbers (test suite) (ocaml#135)
9b033c7 flambda-backend: Disable the comparison of bytecode programs (`ocamltest`) (ocaml#128)
e650abd flambda-backend: Import flambda2 changes (`Asmpackager`) (ocaml#127)
14dcc38 flambda-backend: Fix error with Record_unboxed (bug in block kind patch) (ocaml#119)
2d35761 flambda-backend: Resurrect [@inline never] annotations in camlinternalMod (ocaml#121)
f5985ad flambda-backend: Magic numbers for cmx and cmxa files (ocaml#118)
0e8b9f0 flambda-backend: Extend conditions to include flambda2 (ocaml#115)
99870c8 flambda-backend: Fix Translobj assertions for Flambda 2 (ocaml#112)
5106317 flambda-backend: Minor fix for "lazy" compilation in Matching with Flambda 2 (ocaml#110)
dba922b flambda-backend: Oclassic/O2/O3 etc (ocaml#104)
f88af3e flambda-backend: Wire in the remaining Flambda 2 flags (ocaml#103)
678d647 flambda-backend: Wire in the Flambda 2 inlining flags (ocaml#100)
1a8febb flambda-backend: Formatting of help text for some Flambda 2 options (ocaml#101)
9ae1c7a flambda-backend: First set of command-line flags for Flambda 2 (ocaml#98)
bc0bc5e flambda-backend: Add config variables flambda_backend, flambda2 and probes (ocaml#99)
efb8304 flambda-backend: Build our own ocamlobjinfo from tools/objinfo/ at the root (ocaml#95)
d2cfaca flambda-backend: Add mutability annotations to Pfield etc. (ocaml#88)
5532555 flambda-backend: Lambda block kinds (ocaml#86)
0c597ba flambda-backend: Revert VERSION, etc. back to 4.12.0 (mostly reverts 822d0a0 from upstream 4.12) (ocaml#93)
037c3d0 flambda-backend: Float blocks
7a9d190 flambda-backend: Allow --enable-middle-end=flambda2 etc (ocaml#89)
9057474 flambda-backend: Root scanning fixes for Flambda 2 (ocaml#87)
08e02a3 flambda-backend: Ensure that Lifthenelse has a boolean-valued condition (ocaml#63)
77214b7 flambda-backend: Obj changes for Flambda 2 (ocaml#71)
ecfdd72 flambda-backend: Cherry-pick 9432cfdadb043a191b414a2caece3e4f9bbc68b7 (ocaml#84)
d1a4396 flambda-backend: Add a `returns` field to `Cmm.Cextcall` (ocaml#74)
575dff5 flambda-backend: CMM traps (ocaml#72)
8a87272 flambda-backend: Remove Obj.set_tag and Obj.truncate (ocaml#73)
d9017ae flambda-backend: Merge pull request ocaml#80 from mshinwell/fb-backport-pr10205
3a4824e flambda-backend: Backport PR#10205 from upstream: Avoid overwriting closures while initialising recursive modules
f31890e flambda-backend: Install missing headers of ocaml/runtime/caml (ocaml#77)
83516f8 flambda-backend: Apply node created for probe should not be annotated as tailcall (ocaml#76)
bc430cb flambda-backend: Add Clflags.is_flambda2 (ocaml#62)
ed87247 flambda-backend: Preallocation of blocks in Translmod for value let rec w/ flambda2 (ocaml#59)
a4b04d5 flambda-backend: inline never on Gc.create_alarm (ocaml#56)
cef0bb6 flambda-backend: Config.flambda2 (ocaml#58)
ff0e4f7 flambda-backend: Pun labelled arguments with type constraint in function applications (ocaml#53)
d72c5fb flambda-backend: Remove Cmm.memory_chunk.Double_u (ocaml#42)
9d34d99 flambda-backend: Install missing artifacts
10146f2 flambda-backend: Add ocamlcfg (ocaml#34)
819d38a flambda-backend: Use OC_CFLAGS, OC_CPPFLAGS, and SHAREDLIB_CFLAGS for foreign libs (#30)
f98b564 flambda-backend: Pass -function-sections iff supported. (#29)
e0eef5e flambda-backend: Bootstrap (#11 part 2)
17374b4 flambda-backend: Add [@@Builtin] attribute to Primitives (#11 part 1)
85127ad flambda-backend: Add builtin, effects and coeffects fields to Cextcall (#12)
b670bcf flambda-backend: Replace tuple with record in Cextcall (#10)
db451b5 flambda-backend: Speedups in Asmlink (#8)
2fe489d flambda-backend: Cherry-pick upstream PR#10184 from upstream, dynlink invariant removal (rev 3dc3cd7 upstream)
d364bfa flambda-backend: Local patch against upstream: enable function sections in the Dune build
886b800 flambda-backend: Local patch against upstream: remove Raw_spacetime_lib (does not build with -m32)
1a7db7c flambda-backend: Local patch against upstream: make dune ignore ocamldoc/ directory
e411dd3 flambda-backend: Local patch against upstream: remove ocaml/testsuite/tests/tool-caml-tex/
1016d03 flambda-backend: Local patch against upstream: remove ocaml/dune-project and ocaml/ocaml-variants.opam
93785e3 flambda-backend: To upstream: export-dynamic for otherlibs/dynlink/ via the natdynlinkops files (still needs .gitignore + way of generating these files)
63db8c1 flambda-backend: To upstream: stop using -O3 in otherlibs/Makefile.otherlibs.common
eb2f1ed flambda-backend: To upstream: stop using -O3 for dynlink/
6682f8d flambda-backend: To upstream: use flambda_o3 attribute instead of -O3 in the Makefile for systhreads/
de197df flambda-backend: To upstream: renamed ocamltest_unix.xxx files for dune
bf3773d flambda-backend: To upstream: dune build fixes (depends on previous to-upstream patches)
6fbc80e flambda-backend: To upstream: refactor otherlibs/dynlink/, removing byte/ and native/
71a03ef flambda-backend: To upstream: fix to Ocaml_modifiers in ocamltest
686d6e3 flambda-backend: To upstream: fix dependency problem with Instruct
c311155 flambda-backend: To upstream: remove threadUnix
52e6e78 flambda-backend: To upstream: stabilise filenames used in backtraces: stdlib/, otherlibs/systhreads/, toplevel/toploop.ml
7d08e0e flambda-backend: To upstream: use flambda_o3 attribute in stdlib
403b82e flambda-backend: To upstream: flambda_o3 attribute support (includes bootstrap)
65032b1 flambda-backend: To upstream: use nolabels attribute instead of -nolabels for otherlibs/unix/
f533fad flambda-backend: To upstream: remove Compflags, add attributes, etc.
49fc1b5 flambda-backend: To upstream: Add attributes and bootstrap compiler
a4b9e0d flambda-backend: Already upstreamed: stdlib capitalisation patch
4c1c259 flambda-backend: ocaml#9748 from xclerc/share-ev_defname (cherry-pick 3e937fc)
00027c4 flambda-backend: permanent/default-to-best-fit (cherry-pick 64240fd)
2561dd9 flambda-backend: permanent/reraise-by-default (cherry-pick 50e9490)
c0aa4f4 flambda-backend: permanent/gc-tuning (cherry-pick e9d6d2f)

git-subtree-dir: ocaml
git-subtree-split: 23a7f73
d-netto mentioned this pull request Apr 12, 2022
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023
5bf2820278 Merge OCaml 4.14 (#55)
34432bedff Set VERSION to 4.14.0+jst
4e1d21c0cb Re-enable probes (amd64 only, as yet untested)
70840805f7 Regenerate .depend for upstream build system
16cb75b6c2 Fix upstream build system after merge (+Bootstrap)
b635e3c4f9 Enable debug runtime in CI
5029967055 Apply ocaml-flambda/flambda-backend#916
7640f4bcd5 Ensure that tail recursion modulo cons is not used with local allocations
d3b6161b03 Resolve merge conflicts in asmcomp/i386
09ac6e11cc Testsuite fix for Fortran tests
9f04dc41da Avoid having scope side-effects in loosen_ret_modes
c32a0b754e Bugfix for error messages in Includecore (see records_errors_test.ml)
ec525635eb Resolve merge conflicts in testsuite
48e13dabb7 Resolve merge conflicts in tools, debugger, toplevel, etc.
942760fbb6 Bugfix: take an instance before typing Pexp_constraint
4ede4874f6 Resolve merge conflicts in stdlib/ runtime/
df43e0282a Resolve merge conflicts in middle_end/ driver/ asmcomp/ (amd64 only)
b7b12b25a9 Resolve merge conflicts in bytecomp/ driver/ so that ocamlc builds
2fa00c13d3 Resolve merge conflicts in lambda/
627d504f22 Resolve merge conflicts in typing/
2bf19bf270 Resolve conflicts in parsing/ & utils/, and regenerate the parser.
05e2ed4cdd Add [@tail hint] annotation to select default behaviour explicitly (#43)
159d8df6d7 Merge remote-tracking branch 'ocaml/4.14' into ocaml-jst
56818c4c9b jst.dune build fix (after flambda-backend merge)
567b95591a Merge flambda-backend changes
409bdce6c6 Support building with dune (#49)
839c1ccdd8 Merge flambda-backend changes
ab34788a5c Merge flambda-backend changes
baf31dfd04 New uniform treatment for misplaced attribute warnings (#44)
679b5001e3 This test requires systhreads to be available
ba9c5ea7b1 Add ability to bootstrap flexdll on Jenkins CI (#11567)
a73ef7a49c Merge pull request #11556 from FardaleM/doc_typo
adcf2cb2c6 Fix [@deprecated_mutable], which couldn't be triggered. (#11524)
040f05bdf0 Fixup Changes
7bfcdc30b6 Merge pull request #11487 from purplearmadillo77/fma_test
9fac589c2f More prudent deallocation of alternate signal stack (#11496)
5d5c5b61b6 Merge pull request #11468 from dra27/i686-mingw-ipv6
7c16b4b0d9 Merge pull request #11373 from dra27/flexlink-detect
a691ba7f78 Do not elide the whole module type error message (#11416)
ec8dea9a81 Merge pull request #11417 from lpw25/fix-virtual-class-type-constrs
b2c7990ac8 tests/lib-bigarray-2/has-gfortran.sh: don't print anything on stdout
757e0a718a Stop calling ranlib on created / installed libraries (#11184)
9acf32acf8 Document limitation on `caml_callbackN` (#11409)
397925772e Merge pull request #11396 from gasche/fix11392
888e84365c Merge pull request #11397 from Octachron/tast_mapper_fix_for_with_modtype
d2689fced7 Refactor the initialization of bytecode threading (#11378)
ed346d06d5 Merge pull request #11380 from damiendoligez/fix-fortran-test-on-macos
0731e8c81f Better documentation for [string_of_float]. (#11353)
2c2e99049a Merge pull request #11267 from dra27/more-_MSC_VER
ef960fb5f9 Guard more instances of undefined _MSC_VER
b6602b11f7 Changes
70c5a7d3df misc.h: fix preprocessor conditional on _MSC_VER
98fedc5cd2 Merge pull request #11236 from Nymphium/missing-since2
66b63e2f24 Do not trigger warning when calling virtual methods introduced by constraining "self" (#11204)
bfb4b1e608 increment version number after tagging 4.14.0
15553b7717 release 4.14.0
54c6b3fee4 last commit before tagging 4.14.0
80b20ed92c Changes: split highlights
fea7946d27 Merge pull request #11133 from Octachron/warning-man-414
4df514973d `odoc`ify the parsetree comments (#11107)
8311a408b1 Fix bigarray 32bit integer overflow of offset in C imp. (#11118)
cfee1a6f61 increment version number after tagging 4.14.0~rc2
466023cefb release 4.14.0~rc2
ab5e660a9b last commit before tagging 4.14.0~rc2
766b6283ef Fix #11101 by making `occur ty ty` succeed (#11109)
db17a8ecd5 increment version number after tagging 4.14.0~rc1
93c2a7f104 release 4.14.0~rc1
679a9504fe last commit before tagging 4.14.0~rc1
5af2478516 Merge pull request #10740 from gasche/tmc-manual-pr
25478af4e4 Do not pass `-no-pie` to the C compiler on musl/arm64 part 2 (#11036)
56c9302e8e Merge pull request #11031 from fabbing/fp_exn_handler
849218e01b Merge pull request #10850 from dra27/flexdll-0.40
a6369a2e8a Merge branch '4.14' into fp_exn_handler
31ca227f92 With frame-pointers entertrap restores base pointer
ddf99786f8 riscv: Generate frametable in data section to improve code relocatability (#11042)
dee5315ad1 increment version number after tagging 4.14.0~beta1
87abd773d5 release 4.14.0~beta1
2ca77b9c33 last commit before tagging 4.14.0~beta1
cc7de730d4 Merge pull request #10900 from kostikbel/printf
b53c79e59b update Change for #10397
cc606b8cf7 Merge pull request #10835 from hannesm/fix-32bit-relocations
113d1e5b68 Merge pull request #10914 from sanette/search
b9ad032486 Merge pull request #10815 from Julow/odoc-200-fixes
716c17c08d Merge pull request #10397 from mjambon/unsupported-on-windows
433f3f5548 Merge pull request #10794 from wiktorkuchta/warn-57
d480c0c4df Merge pull request #10719 from stedolan/arityfix
99dadf22cc Merge pull request #10828 from madroach/aarch64_openbsd
85039e70ad Env: clear uid_to_loc (#11012)
c8c51a735d Merge pull request #11000 from gasche/translattribute-fix
6aa8b021be Merge pull request #10998 from thierry-martinez/fix.doc.seq_fold_lefti
ea7af37703 fix a minor regression from #10462
f9b3a4edc5 increment version number after tagging 4.14.0~alpha2
e1bcd25f9a release 4.14.0~alpha2
aafa87e87d last commit before tagging 4.14.0~alpha2
0c16ba6740 #10836: recognize unrecoverable errors in the signature inclusion check (#10952)
4d3c53688f Merge pull request #10959 from COCTI/fix10907
35158b280e add reviewer and remove comment closing
b5973cd266 add PR number, start CI
1eb55f575a Enable native code on aarch64-*-openbsd*
98453445b4 In_channel.input_all: fix bug (#10978)
5cc897c5be Merge pull request #10968 from nrolland/patch-1
4a3f29d57a Recommend using quoted string literals for regexes (#10946)
fe1b2183c2 Fix #10907: Wrong type inferred from existential types
ae1a31b019 Merge pull request #10839 from Et7f3/fix_show_regression
3ff3fcc0c7 Introduce the Thread.Exit exception (#10951)
db915bd895 Do not put deprecation warnings on int8, uint8, int16, uint16 (#10937)
15a457bde5 Document exception raised by read_line (#10948)
b2fe7590f3 Merge pull request #10846 from voodoos/objinfo-optional-shape
00ab75652e Merge pull request #10932 from wiktorkuchta/empty-anchors-css
683d80e264 increment version number after tagging 4.14.0~alpha1
7c420e8375 release 4.14.0~alpha1
f4b50329cc last commit before tagging 4.14.0~alpha1
fd39912851 [4.14] Add deprecation warnings on {Int32,Int64,Nativeint}.format (#10922)
7705e92064 Merge pull request #10825 from gasche/shape-strong-call-by-need
36f04706eb Shape.decompose_abs
1dbedf0052 shape: never erase unknown variables into anonymous leaves
9f9017fed3 Changes
6080fbf901 shape: a remark on memoization with non-canonical data structures
d9c5dc87a0 replace the wrapped lazy thunks by pre-memoization inputs
c1ff336cdc typing/shape.ml: memoized, strong call-by-need
5c1f723715 [minor] shape: refactor the printing of shape items
3b7dbe2c31 use an environment instead of repeated substitutions
7038d473ff Shape: nicer handling of fuel
d406f47470 reduce the shapes after type-checking
4f980723bf shape: disable all reductions
8b7d2a04d5 shapes: more sharing in includemod
fff88af334 shape: only refresh vars when necessary
445db99496 Free the alternate signal stack if sigaltstack fails (#10891)
71fd0fbfc4 change manual to include a subsection about 'let exception' (#10848)
29b933a831 add @since on Atomic (#10841)
5798e801e1 fix #show regression in 4.14
b6163a49ee Fix a crash in `Obj.reachable_words` (#10853)
0b6f4b390d Fix more display differences between ocamlnat/ocaml (#10849)
61eeefec00 Merge pull request #10712 from NathanReb/fix-type-var-naming
08eac16563 Merge pull request #10797 from dra27/fix-volatile
2a6df5eb69 Fix #10822: Bad interaction between ambivalent types and subtyping coercions (#10823)
0fdbf79351 Ensure right-to-left evaluation of arguments in cmm_helpers (#10732)
18944505af Merge pull request #10589 from wiktorkuchta/manual-spaces
150874b85c Merge pull request #10820 from Octachron/fix-10781
451cb76c0d Merge pull request #10813 from wiktorkuchta/manual-numbers
c764a35f2e Merge pull request #10726 from xavierleroy/free-alt-sig-stack
3370799429 Merge pull request #10575 from Octachron/dump-dir
9886564f47 Merge pull request #10806 from gasche/ocamlopt-dshape
76040803f6 Merge pull request #10799 from Octachron/yet_another_missing_cmi_error
49e28143a4 Revert "Merge pull request #10736 from xavierleroy/bigger-stack"
b0fb4ab86b Merge pull request #10739 from shindere/fix-install
17275714a6 Merge pull request #10764 from alainfrisch/afrisch_fix_oo_compil
0f152a868f Merge pull request #10783 from MisterDA/unix-create-process-doc-use-unix-stdout-instead-of-stdlib-stdout
3f170f317e Merge pull request #10771 from Octachron/abstract_row_repr_bug
9c247a8c9a change entry for #9444
ae1f37cf76 Merge pull request #9444 from let-def/printtyped-extra
3a2a1415dc Add [@poll error] attribute (#10462)
2a11a0a98c Merge pull request #10718 from voodoos/shapes
e8d560740b Merge pull request #10736 from xavierleroy/bigger-stack
cd2089784d first commit on branch 4.14
5000b93cad last commit before branching 4.14
11cdd82e6f Fix white spaces in Changes
f310f749bd Update magic numbers for 4.14
096ab9cdd3 Some more documentation on C compilers in file INSTALL
db47011e45 More detailed recommendations about C compilers in INSTALL file (#10685)
d0e6520f38 Merge pull request #10642 from MisterDA/win32unix-posix-Sys.remove-Unix.unlink
c5f4866038 Merge branch 'trunk' into win32unix-posix-Sys.remove-Unix.unlink
ed7876a8bf Merge pull request #10752 from wiktorkuchta/common.ml.in-open
e23ec809c3 new ephemeron API, implemented on top of old ephemerons (#10737)
8170460a0d Merge pull request #10751 from stedolan/add-ocamlnat-dlambda
284af7de6e Fix compile error due to unused open
ac1aabd3f7 Add -dlambda to ocamlnat options
718e646cc0 Merge pull request #10746 from Octachron/more_lax_nesting_rule_for_camltex
8d1990b499 caml-tex: allow to nest warnings and errors within an example
2bcef4bc17 Merge pull request #9760 from gasche/new-trmc
35b86df0ae Additions to the module Seq (#10583)
0891d8e819 bow to check-typo
0d80ae3dac [minor] complete the renaming of 'TRMC' into 'TMC'
a862307634 [review] move the main TMC comment to tmc.mli
4a338a825f [review] improve code readability within TMC disambiguation
1b10dd9aef TMC: much thinking about which @tailcall annotations to preserve where
e6fbcc8857 [WIP] TMC: Changes
d2a2dc9a26 [review]: in TMC ambiguity errors, print the ambiguous callsites
955528b22a [review] copyright headers and .mli file
2dcd81886a TMC: do not warn on ambiguous sub-terms of non-ambiguous programs
9725d90cd7 TMC testsuite: ambiguities with many arguments
67251d9295 [review] new TMC test
333806b419 TMC: implement [@tail_mod_cons] for non-recursive lets
9306f658ae [review] TMC: make 'offset' distinct from 'lambda' for clarity
d92ef9898b [review] tmc: use a placeholder value that is better for debugging
9c9952012a [review] interface for Tmc.Constr
7c17ea641e [review] TMC: rename 'return' to 'lambda'
f881fc435a [review] rename 'con' into 'constr'
9242408e05 TMC: support Tupled functions and partial applications
1daea333b6 [refactoring] move Simplif.exact_application to Lambda
41b9afcf1c TMC: some semantic preservation tests
640f24643d TMC: testsuite
1b8771ed6b TMC: warn when a tail-call is broken by the TMC transformation.
0e04aa1b29 TMC: warn if there is no optimization opportunity
7dc0d86d78 TMC: error if several different calls could be optimized
10d756bccf TMC: Constructor composition in direct style: [benefits_from_dps : bool].
e30d162ea8 TMC: [minor] improve dummy name generation for let-bound arguments
4d2d0b0dfa TMC: optimize constructor composition
e9397f6605 TMC: generalize `Choice.t` to use binding operators
90dd724a3e TMC: code-generation tests
e0df0a1517 TMC: product representation of choices
35dff8cacf preliminary implementation of TMC (tail modulo cons)
a84b25b222 prepare for TMC (tail modulo cons) transformation
443be9c2af Buffer: reimplement UTF encoders with the new UTF Bytes encoders. (#10733)
6116cd5798 Fix #10735 as suggested by @lpw25 (#10738)
2d91196a66 Update Changes
7274e23659 Merge pull request #10743 from dra27/be-quiet
d44caf421b Don't display prune information for .gitignore
996dc40113 Merge pull request #10697 from MisterDA/win32unix-WSADuplicateSocket-or-DuplicateHandle
cab43ad3b6 Merge pull request #10715 from dra27/ocamlnat-hooks
d0129e2914 Fix Changes entry for 10717.
d9cfda1e1a Windows: Sys.remove, Unix.unlink now remove symlinks to directories
78e59967d8 Test Sys.remove, Unix.unlink behavior on symlinks and directories
d1d4352a1b Merge win32unix/dup2.c into win32unix/dup.c and factor code
2ecf1ecb20 win32unix: use WSADuplicateSocket in bindings of dup and dup2
7997b65fdc Merge pull request #10710 from dbuenzli/utf-support
910849911b Update changes file.
1df0d3bbee String: add UTF decoders and validations.
230f9c60f9 Bytes: add UTF codecs and validations.
b71489f03e Ensure that functions are evaluated after their arguments (#10728)
d564b171ad Eliminate Topeval.phrase_name entirely
18c4d16b3b Fix performance bug in Obj.reachable_words (#10731)
1067f77656 Merge pull request #10714 from dra27/x86-assembler-hook
60de0907c8 Add X86_proc.with_internal_assembler
4c52549642 Merge pull request #10722 from Octachron/fix_unused_functor_warning
5a80188d5b Merge pull request #10720 from dra27/autoconf-tweak
fe9b014c95 fix the marking of unused values in module type definitions
5915a72545 Fix typos in Changes
0e50ef1029 Merge pull request #10717 from shindere/simplify-man-pages-installation
2250fd8a22 abstract row_field (#10627)
577ccbf11d Fix AC_CONFIG_HEADERS on CRLF-checkouts
304abe94c2 Uchar: add UTF codec tools.
0684867f70 Merge pull request #10672 from wiktorkuchta/webman-disc
54e10a5c7a Merge pull request #10527 from wiktorkuchta/toploop
f8dc1ef370 Merge pull request #10681 from lthls/boolean-lifthenelse
9587b17a97 Merge pull request #10713 from Octachron/babylonian_manual
05faf05a10 Changes
a58ec8866f Add comment on Simplif.split_default_wrapper's pattern matching
848304c9dd Review: Document Switch.Make parameters
ff41359014 Enable if-then-else elimination in Cmmgen
37298b0b9f Enforce boolean Lifthenelse in native mode
5a44941190 Mention the help directive on toplevel startup
bf0859daec webman: Use li::marker for coloring bullets
cd418bfcbd Use babel to allow underscore in labels
e2c2611f53 Simplify the installation of man pages
7fb10211f6 Merge pull request #10541 from COCTI/abstract_fk_and_commu
89e4e1eca0 dune: disable some warnings
cc9ae80bb1 Wrong unmarshaling of function pointers in debugger mode (#10709)
ec872d1f6c add reviewer
056783366a comments
07eddeb6b1 use internal GADTs to ensure that Cunknown and FKprivate do not leak
024007974f make field_kind and commutable abstract types
f44236171d specify that caml_alloc_custom_mem is available since 4.08 (#10704)
a7bf9cbaf3 Improve type variable name generation and recursive type detection when printing type errors (#10488)
d17f6f1a19 Merge pull request #10706 from dra27/changes-needed
87b02aee9d Merge pull request #10702 from MisterDA/win32unix-cast-strictly-aligned-pointer-stat
dbb60dc4ee Fix cast of more strictly aligned pointer
f1d2830b7f Disable the commit message Changes escape hatch
9417670a0d Merge pull request #10705 from wiktorkuchta/dotmerlin
7ef6094f59 Allow the native toplevel assembler to be replaced
dd042539c6 Introduce native-toplevel specific hooks module
85ba4d7494 Make Topeval.phrase_name less exposed
89b4062ac4 HACKING.adoc: Remove mention of .merlin files
542151096a Factor out load/lookup functions in native Topeval
dc90ad4944 Remove duplicated type definition in Topeval
0be3ae695b Merge pull request #10690 from dra27/build-toplevel-lib
294e717a74 Merge pull request #10693 from lpw25/fix-includemod-ident-collision
c937590071 Merge pull request #10692 from gpetiot/expose-parse-module_type
826a6b4e85 Merge pull request #10694 from MisterDA/readme-win32-cygwin-64bits
cd794ec0fb Always build ocamlnat
42aa9631f2 Add Changes entry
301ffcfd8a Add changelog entry
f1c53b8a7c On Windows, advise to build with 64-bit Cygwin
1ccade8d41 Fix ident collision in includemod
5f77554bbe Expose Parse.module_type and Parse.module_expr
0b3f8dd77d Merge pull request #10658 from shindere/ocaml-version
41cac4b5ce Add detailed information about the current version of OCaml to the Sys module
0239407657 Make the documentation of ocaml_version in stdlib/sys.mli more precise
1acc501adc Add a macro to explicitly control whether this is a dev version or not
05f1d2ef96 Move version check from tools/check-parser-uptodate-or-warn.sh to Makefile
38449ab1b0 Let manual/src/html_processing/src/common.ml be generated by configure
47e7dc8e85 Let configure rather than make generate manual/src/version.tex
b64a764a31 Add --enable-native-toplevel
9ab0d9d036 Always build the native toplevel libraries
c6a3496711 Update comment in utils/config.mlp
89b1c6efa7 Remove the reference to the VERSION file from tools/pre-commit-githook
86b9930082 ocamlyacc: rely on runtime/caml/version.h
20924c9e26 Introduce the OCAML_VERSION_EXTRA C preprocessor macro
74ad8eb2b1 Add copyright header to runtime/caml/version.h.in
c71428e7a3 Transform runtime/caml/version.h into a configured header
b9ec96722d Document where the OCaml version is defined and how to update the VERSION file
ce97e67006 Define OCaml's version in build-aux/ocaml_version.m4
e9eb2fb5f0 Restore the -native option in Inria CI's other-configs job
45612f14af Revert "Merge pull request #10682 from gpetiot/fix-letop-binding-loc"
93c99bac88 Merge pull request #10675 from xavierleroy/runtime-macro-deprecation
f0bdb45572 Changes entry for #10675
d57a4bdf55 Remove tests/compatibility
e88608c8e8 Rename field of internal structure from `alloc` to `allocated`
b7ec62b56f Deprecation of C macros, continued
f63965c1e4 Extend CAML_DEPRECATE to VS2019
e165c285ee Deprecation of C macros
9ee7de9278 Merge pull request #10682 from gpetiot/fix-letop-binding-loc
b3b3d35f4a parser.mly: fix locations of letop-bindings
5396b14423 Merge pull request #10676 from shindere/inria-ci-test-pic
d5f5076624 Fix an overflow bug in major GC work computation (#10680)
d5cdbbb775 Documented M, m, n, w, W, t, c and H options of OCAMLRUNPARAM, Fixes #8697 (#10666)
ba2dc5342f Merge pull request #10679 from dra27/absolutely-CC
98e16f0334 Merge pull request #10659 from lpw25/fix-freshening-substs
a112ad8bd6 Add Changes entry
134a55dfee Bootstrap
10074989bb Fix freshening of identifiers
1ab49ceb58 Merge pull request #10678 from lpw25/expose-warnings-description
fc543aa54a Add Changes entry
f23e9745ad Expose descriptions in Warnings module
f1be534e3d Use $cc_basename in configure for detection
f4d6410448 Allow for gcc with prefixes in configure
667a9eb8cb Also test --with-pic in Inria's other-configs CI job
8f713c59f7 Fix script for Inria's other-configs CI job
6e053e0dbb Str library: avoid global state in RE engine (#10670)
082341c298 Do not use obsolete macro 'Modify'
3eb0ab091f Install Changes, LICENSE, and READMEs (#10669)
07dfba8594 Add {In,Out}_channel.with_open_{bin,text,gen} and In_channel.input_all (#10596)
64023f55c6 Ignore `_opam` in git and check-typo. (#10649)
537a707540 Release howto: enumerate all opam packages
adc906e67e remove comments about repr-ing (#10665)
ab0c892cfa Merge pull request #10662 from gasche/pr10661
aa8100ae2e Merge pull request #10656 from dra27/check-typo-fixes
8da8b7e028 Merge pull request #10382 from lpw25/clean-envs-check-type-decl
a20ae64440 Add Changes entry
17294adcdb Bootstrap
60c2126f1d Don't repeat environment entries in Typemod.check_type_decl
c8c3a95a97 Merge pull request #10582 from kit-ty-kate/source-highlighting-tabs
3151f1f138 Add more merge constraint tests
e93f6f8e5f Merge pull request #10504 from dra27/systhreads-configure
da4515c289 Merge pull request #10621 from dra27/simplify-shared-configure
e0d1668e27 fix indentation (#10657)
38b9d45f66 Merge pull request #10644 from wiktorkuchta/typedecl-t
8d3c2e0a18 check-typo: check for executable after pruning
8a3dee3cf0 check-typo: prune directories in .gitignore
3748449025 Merge pull request #10516 from gasche/switch-types
46b5e1d577 [minor] switch.ml: distinguish types for arguments, tests and actions
43af0f8901 [minor] make Bytegen.comp_primitive a robust matching (#10646)
b44c16eaac manual: Fix inconsistent styling in typedecl.html
74f89236ea reorder the 4.13 Changes
0a18ad2597 .mailmap update
9d537af8a8 (very important Changes cleanup)
a517a240b7 Synchronize 4.13 and trunk changes
5158c85e0f Buffer documentation tweaks (#10525)
f406684bb4 Fix a flaky test by refactoring TypePairs (#10638)
359f46f224 Add `Out_channel.{is,set}_buffered` to control buffering of output channels (#10538)
8a3347438b Merge pull request #10635 from dra27/freestanding-sak
0d4065b749 Merge pull request #10632 from dra27/fix-10630
0bcaddb887 Merge pull request #10637 from gasche/outcomdetree-constructor-record
8420375777 [refactoring] Outcometree: introduce a record type for constructors
e20fe18de4 Restore minor heap pointer after a Stack_overflow (#10633)
69ba948203 Allow the C compiler to be overridden for sak
f9fe08c9c1 Expose more Pprintast functions (#10618)
84a5813756 Ensure -fPIC doesn't get passed to an assembler
61ecb0735d add immediate attribute (#10622)
0f2cbf6d3d Merge pull request #10629 from nojb/bigarray_c_elts
531bb0400e AMD64 integer multiply immediate cannot write result to a stack location (#10628)
68cdc38ee4 manual: add missing Bigarray C element kinds
f1b9c23d18 Merge pull request #10421 from sliquister/gc-doc
7ad8c13683 Force normalization on access to `row_desc` (#10474)
7317226e4c Small refactoring in predef.ml when computing the initial environment (#10597)
75680ff87b Fix performance regression introduced in 4.08 (#10624)
a2cef4d71b #10598, fix exponential blow-up with nested module types (#10616)
cffde3184d Merge pull request #10619 from shindere/fix-o-regression
0117428c3e Support more arguments to tail calls by passing them through the domain state (#10595)
817796733f Pack ocamldebug modules to minimize clashes (#9621)
9410b9c91a Limit CSE of integer constants (#10615)
b8efbfb0eb Fix marking of if condition as inconstant in flambda Fix #10603 (#10611)
36940c6dcd Clarify flexdll.h/flexlink warnings
87036673b4 Simplify configure for shared library support
ba4dad634b Merge pull request #10511 from dra27/cygwin-without-flexdll
c1099dff6b Fix regression introduced by #9660
de6be27675 Correctly configure Cygwin when flexdll missing
3e85887d68 Merge pull request #10602 from jmadiot/manual-errormsg-pre
1be4fa768f manual: fewer hevea-generated css classes (#10605)
fa43873b3b Merge pull request #10599 from stedolan/lazysigs
c35fc2cb02 Style changes
335cf1ace8 alldepend & whitespace
ee57207099 Bootstrap
9b1c5bca86 Changes
b7efd52b9b Lazy environment scraping in Mtype
143dadbde5 Add Env.find_strengthened_module
27f1686aa7 Optimise Subst composition with identity
b44e5fdf70 Strengthen module types to aliases
3fb495f9c3 Lazy Env.lookup_modtype and Env.lookup_module
46f803efb6 Lazy strengthening and module type declarations
397ac57f9d Lazy signature substitutions
cd105d99ba Deobfuscate ignored-partial-application and non-unit-statement check (#10606)
93b7f1c73e manual: keep white spaces in error messages
6a12ddb222 Merge pull request #10601 from Octachron/manual_separate_library_tex_files
c96ba796ab Merge pull request #10587 from dra27/remove-SRC
632cb780e0 manual: fix odoc build
e616d72866 Move #10471 to 4.13 section
cd3b2d0035 Correct Makefile include'd in manual
1a5000f418 Merge pull request #10489 from dra27/main_os
3a0b1b1e15 Merge pull request #10451 from dra27/no-scripting-for-4.13
5fe616551e manual: more explicit include of module documentation in latex mode
4e9eb755e6 Add various fast-paths in typechecking. (#10590)
5ba9e0e0d6 Merge pull request #10588 from dra27/manual-mkdir-again
dbce6c200f Merge pull request #10593 from voodoos/fix-untypeast-for-patterns
eccaa452ca Add {In,Out}_channel to Stdlib (#10545)
011e05309a Support debugging multithreaded programs until they create threads (#10594)
8d93e23625 Add Changes entry
e929691fa1 Fix untyping of patterns without named existentials
452a9e1283 Add a test illustrating wrong untyping
b4a6d23f5b Add missing reviewer
2c7c86adce Merge pull request #10586 from dra27/fix-manpages-partial
6905c8986d Clarify: it doesn't need to be that complicated
fa2e1ccf14 Remove unnecessary CAML_LD_LIBRARY_PATH
a4ac8615d9 Remove unnecessary COMPFLAGS in manual
1951ae0fbb Eliminate need for abspath
a9e71bd760 Fix incorrect pattern rule
ffee03bb71 curl quietly
e6d42ce95a Use ROOTDIR not SRC in manual build system
e517c653c9 Remove no longer used SRC from ocamldoc/Makefile
d12ddb5a12 Fix manpages build when libraries are disabled
23c8433d30 More parameter passing registers for Power and s390x (#10578)
7a40c57ed8 Merge pull request #10565 from wiktorkuchta/string-trunc
c644bdfd25 oprint: Truncate strings only after 8 bytes
97d57dedff Add a Changes entry
6711b12463 check-typo: highlight_tabs.ml contains tabs
ef16261ae4 Error source highlighting: align using tabs to match tabs in the source
c7aa1a9009 Add a test for error highlighting in presence of tabs
b5ffb01571 manual: Fix typo in GADT examples (#10581)
600733aa78 Merge pull request #10580 from Fourchaux/typos
2ba87eeb16 Typos
90c1365901 Add the missing license field to the opam file (#10579)
812436fd30 Change inlining cost of flambda switches. (#10458)
f5674ecbc5 Remove Obj.{marshal,unmarshal}. (#10568)
672965bfda Merge pull request #10549 from xavierleroy/arm64-signals
50495f632a Changes entry for #10549
12523517cf Support naked pointer checker on ARM64/Linux and ARM64/macOS
b36a5708c6 Add stack overflow detection for ARM64/Linux and ARM64/macOS
3770a8708e Add signal-handling macros for ARM64/macOS
d9519e0773 Cosmetic: reorder cases in signals_osdep.h
1bd0e663b6 Merge pull request #10558 from Octachron/apply_selective_typing_prim
70750e2038 restore %apply and %revapply for abstract types
3371e635e9 fix the dune build after #10577
091f5d4b64 Merge pull request #10577 from shindere/configure-sys
b88604d8f7 Merge pull request #10574 from wiktorkuchta/remove-htmltransf
718990c2cf Let configure generate stdlib/sys.ml
851d0ff6aa Add naked-pointers flag to `ocamlc -config` output (#10531)
172bec8073 Remove manual/tools/htmltransf
2642557895 Merge pull request #10573 from Octachron/parsetree_typo
1393ccabb1 Fix a typo in parsetree.mli
f0d61fa106 Merge pull request #10497 from wiktorkuchta/webman-css
9671c8769b Add Changes entry
7f58ef0211 Merge pull request #10567 from v0idpwn/patch-1
53659d6fbf Update IRC info
fb99b2be83 Tidy-up the testsuite/manual commands (#10411)
da44d603ab Merge pull request #10380 from dra27/cygpath-locale
1dc70d32da Merge pull request #10195 from stedolan/mark-prefetching
00c84b8a1c Add set_uncaught_exception_handler to systhreads (#10469)
6abc9727d2 Disable colors if NO_COLOR env var set (#10560)
49c81d7089 Move 10442 to 4.12 maintenance section
4b6c210dda Merge pull request #10446 from dra27/fix-10442
80482ed60b parser.mly: location fixes for type constraints and punning (#10555)
53430f21c9 Restore Toploop.use_file (#10557)
3b5812a0a0 Merge pull request #10539 from COCTI/revert_dup_kinds
a5bd913f51 fix bug of non-shared field_kind
01777771e5 Revert "right way to duplicate field kinds"
a6a071c2f8 fix pretty-print of gadt-pattern-with-type-vars (issue #10550) (#10551)
d269e04acf Restore old loop logic in prefetch GC
90f648ca07 Rebase & unbreak naked pointers checker
4a7577d430 Optimise instrumented runtime mode in marking
1a7b84dd53 Fix an off-by-one error
02c4f42efc CAML_INSTR support for mark prefetching
58a25975c2 changes
1d35b2573d typo
ba2fd8ae8a bugfix for redarkening logic
369e2ec70b changes after review
401bf4811e cleanup
3354bab3ab Speed up GC by prefetching during marking (prototype)
8949e28fa1 fix no-flat-float-array tests after #10542
44e473e6a4 Compiling for HaikuOS 64 bits (#10546)
328b0b7ce0 Merge pull request #10542 from lpw25/fix-unboxed-immediate64
3723072576 Add Changes entry
59fae6784c Fix detection of immediate64 types through unboxed types
9c67b253f6 Fix mapping of intervals in Ast_mapper (#10543)
d53908d714 Add doc for ocamldoc's cross-ref with text (#10536)
1271f676e7 Merge pull request #10471 from AltGr/arm32-musl
ae87e89535 Merge pull request #10448 from damiendoligez/fix-bootstrap-ci-script
82755626a4 Merge pull request #10535 from nekketsuuu/nekketsuuu-subsub
d5ae88280d Merge pull request #10537 from nekketsuuu/nekketsuuu-mkdir-webdircomp
c973882e03 Remove comment about HP/UX 9
d9558baf2d Fix private polymorphic variant inclusion (#10529)
7090cc3810 minor fix: mkdir before copying CSS for webman
cc739fde51 webman: Remove chapter number on subsubsection correctly
50deeb2d87 Add explanation for shorthand representation of functor with multiple arguments (#10530)
011d316a3b Merge pull request #1599 from dra27/win-env-tests
cdc32182ae Merge pull request #10526 from xavierleroy/more-random
49b9cd3e30 webman: Center BNF syntax bars `|`
d56cbd0463 webman: Italicize command-line argument variables
27ae0f55e2 webman: Styling for Unix/Windows specific blocks
321a2279e2 webman: Add padding to .cellpadding1
cfb2545d9b webman: Add spacing to tables for aligned exprs
f6355cf1a7 webman: Fix variables being bolded, not italicized
afe24ba70b webman: Make table headers bold
e311dabeeb webman: Center tables
7c1fd96f3a webman/api: Improve constructor color contrast
cfd3c039f6 webman: Remove styling for the ordinal "th"
a50a4d42f8 webman: Set (mathematical) variables in italics
b4eb567b0d webman: Set command-line arguments in monospace
8900ac5521 webman: Use SCSS variables for font-family
61764cc647 webman: Increase heading font-weight
d5fe4293e6 webman: Make display style blocks centered
ef6d0501ae webman: Italicize grammar symbol names
a194d25e0e webman: Follow inline code styling of htmlman
f8390bc460 webman: Remove colored background for inline code
986e8329c5 gc.mli: tweak phrasing
5dc1b2f510 use the Opam way of making "sed -i" portable
5f0cbe101b Merge pull request #9424 from bobot/fix_Ephemeron_segfault_with_missing_write_barrier
4a93ebb12a Random: add functions bits32, bits64, nativebits
0ff72d96d0 Merge pull request #10524 from wiktorkuchta/directive-error
3c97584f46 Show expected and received type on directive error
c329c2c0b7 Merge pull request #10522 from xavierleroy/no-ligatures
fb1af2e229 toplevel: Move running directives to Topcommon
f203a5d45d Run the Polling pass earlier (#10523)
aededd1364 Remove `\usepackage{ae}`, redundant with `\usepackage{lmodern}`
265bba9394 Disable ligatures in typewriter fonts
9f1b1ed637 Merge pull request #10500 from gasche/comment-on-match-exn
ff86950c25 [minor] make Lambda.Not_simple local to its only user
1940cb007b [minor] remove unused exception Switch.Not_simple
1475d99119 [minor] a code comment on match-with-exception compilation
b3d2cdcbe6 Merge pull request #8516 from lpw25/simplify-type-class-structure
73b648e6db Add Changes entry
e8dd0178a2 Add helpers for warning_scope in typeclass.ml
990ea08408 Remove unused csig_inher field
964a13c31b Move method spine generalisation logic into Ctype
dbd5002427 Change warning 13 message
dc351b3c7d Rename Concr
f7e6b79738 Keep class signature row up-to-date
a7f13f9acb Merge pull request #10506 from wiktorkuchta/refman-ext
4716c8e063 [Weak] keep *set_* and *blit_* from breaking marking invariants.
6909cadc72 [Weak] Add a tests for ephemeron setting above not alive keys
a1b917ad7e Merge pull request #10400 from dra27/debian-i686
b7ef616ff7 Set object name of self type
19203f86f1 Fix warning attributes in class signatures
845ac93066 Treat class_signature more like type_expr
8f9b7667ab Remove private_self and public_self
e6aa05bf93 Track dummy methods via their scope
b43ecd16db Represent ancestor variables more directly
1aa65fa967 Give more precise errors for virtual methods
013717de96 Add dummy methods in all cases
37c5cb1d30 Bootstrap
6e5b86355a Change representation of class types
3ca0fad1ba Tweak configure for supporting arm32 archs with musl
6ad7a11a58 Revert "Remove propagation from previous branches (#9811)" in trunk (#10492)
caf5108506 Collect vars and meths without references
b28ce4ac22 Build met_env on second pass
0ca24f9d10 Relax object duplication restrictions
73c8addba2 Make the two-passes of class_field explicit
a734bc6ae5 Add more classes tests
08d93dd02d manual: Move "Private types" into its own file
c5e96ec070 manual: Fix module aliases section appearing twice
e608b7c95a Give caml_send* arguments the correct types (#10498)
a20e0a7efc Merge pull request #10502 from gasche/fix-dune-build
b1f4d7b2ad stdlib/genlex.{ml,mli}: use attributes to turn deprecation alerts off
54ce3bee52 dune: tell dune about asmcomp/polling.ml
91058bd6f3 Disable systhreads if unix isn't built
80a40377dd Use DOES_NOT_EXIST to clarify expectation
24f565bcf1 Always restore alteration to environment string
de11da3096 Use for instead of while for stepwise loops
39d29beb0f Environments.remove -> Environments.unsetenv
551af346f3 Changes
bc527fc4c1 Correct indentation of function clauses
791dd868c1 Use unset in the testsuite to harden tests
30256c0de4 Add unset directive to ocamltest
0fc19c04a4 Harden win-unicode test against putenv failure
e7161fdd1a Eliminate open Unix from testsuite
ddd9ec2a91 Deprecate the Stream and Genlex stdlib modules (#10482)
3e69d9db43 Revert "Give caml_send* arguments the correct types (#10461)"
1b3ffee3f1 Changes entry and comments for Safepoints (#10039)
758fc7ddd6 Safepoints (#10039)
6a42e42306 Give caml_send* arguments the correct types (#10461)
9209a4f757 Merge pull request #10493 from wiktorkuchta/binutils-install
ea7c8bd522 Mention binutils as installation prerequisites
7053a45357 #3959, #7202: in script mode, handle directive errors (#10476)
98a27ddf9d Merge pull request #10487 from nchataing/cstr_res_unfolding
363f8dcf07 Update change entry
1c7e469f21 Compute STDLIB_MODULES with a C auxiliary
7a529fbafa Abstract wmain/main difference with main_os
7087674e08 Update dependencies
679528770f Move function from Types to Btype
a37a7e2d87 Add Changes entry
2953c5be97 Move logic to get the type path from a constructor return type in Types
0c2eb39351 [doc] Fix: String.{starts,ends}_with introduced in 4.13 (not 4.12) (#10486)
005ba108db Add comment to amd64/emit.mlp about caml_c_call (#10481)
cbbb5e025c Allow explicit binders for type variables (#10437)
424a2f6727 Merge pull request #10472 from gasche/caml_sys_random_seed
46ec3beac7 caml_unix_random_seed: clarify fallback logic
54bedf4344 caml_sys_random_seed: split unix-specific part to its own function
e1a37e1a9e Merge pull request #10475 from xavierleroy/doc-undefined
37257fb488 st_stubs.c: fix initialization of thread ID when a thread starts (#10478)
e988cb9843 Merge pull request #10473 from edwintorok/riscv-cfi
e390e6473f Merge pull request #10361 from Octachron/semdiff_type_decl
0692959556 Changes: add entry for RISC-V CFI directives
9136587829 Changes for #10475
570721f186 Documentation of iterators and concurrent modification
e015077bf0 asmcomp/riscv: emit CFI directives
d4b73d36a3 runtime/riscv.S: add DWARF CFI directives
26cc0f65a1 runtime/riscv.S: introduce END_FUNCTION macro
650ba029a5 Remove unused cstr_normal field from the constructor_description type (#10470)
47e5a7acb6 Normalize type_expr nodes on access (#10337)
f68acd1a61 Merge pull request #10192 from MisterDA/windows-unix-domain-sockets-socketpair
11c5f76d29 Split labels and polymorphic variants tutorials; move GADTs into tutorial (#10206)
ec880ee809 Pun labelled arguments with type constraint in function applications (#10434)
fb81f86756 update Changes for #10361: add reviewers
4d5f4f2037 diffing_with_keys: switch to a record
b3d44c38fd Merge pull request #10468 from matthewelse/trunk
3cbf4fab97 Functorized diffing (with improved documentation) (#4)
69560194fc swaps and moves
16cb5722e0 diffing: adjust weight for fields and constructors
24434c4e1c update Changes
b8f0f2f569 more tests
8adc48b8b0 diffing for variant definitions
3cac9dd6c5 diffing for record definition
fe2dab7770 Emulate socketpair of Unix domain sockets of stream type on Windows
566350611b Factorize code setting cloexec/inherit flag on Windows
f40e8996aa Support Unix domain sockets on Windows
adb2dad313 diffing: define few common functions
bc0def2418 Add #10468 to Changes
9adfb96935 Correctly pretty print `Psig_typesubst`
32d25e7d97 Merge pull request #10407 from antalsz/pr-better-error-messages
0c993326e1 Respond to final round of review for #10407
2d62e825ca Make `Errortrace.*_error` only contain nonempty traces
4aacb71cb2 Add some extra periods to some new error messages for consistency
80855e2d2f Respond to more review for the structured error messages (#10407)
1820e785a5 Respond to review for the new structured error messages (#10407)
2abe3e4d3d Use the new structured errors (#10170) for better error messages
abd5490962 Add new test cases for failures of type-specific unification methods
7fad56ed20 Add more new test cases for module inclusion errors
79a60b42bc Add new test cases for module inclusion errors
7849115b07 Bug fix: `equal_private` was being used in too many places
d967318148 Remove Cmm.memory_chunk.Double_u (#10433)
70b03f9210 Filename.chop_suffix: fail cleanly when not a suffix
5bcddeb863 Document what happens on overflow in float -> int conversions
e5e9c5fed5 Merge pull request #10402 from Octachron/manual_ocamldoc_fix
12df60ccce Move #10454 change entry to 4.13 section
8b548de2e5 Merge pull request #10454 from lpw25/nondep-row-more
d5ff640550 fix Changes for #10449
dfc9f19139 fix work accounting in marking phase (#10449)
da57ec08f3 Add Changes entry
c6a50a3f58 Check row_more in nondep_type_rec
fc9a672251 Add another nondep test
aa890e2df4 Merge pull request #10328 from lpw25/better-disambiguation-error
526ea2a188 Add Changes entry
29151abc66 manual: document ocamldoc paragraph insertion
e22cbaf48f Merge pull request #10444 from Fourchaux/typos
40494f0038 Merge pull request #10447 from edwintorok/riscv
063cd86c18 asmcomp/dune: fix build on RISC-V
1768cbcfd5 Give more precise error when disambiguation could not possibly work
3ce12cb967 Fix ordering of dirs in Load_path.prepend_dir
f3d9587c30 Fix #directory in the toplevel
3d52bcad84 Rename Load_path.add to Load_path.append_dir
a741669681 Add repro case from Issue 10442
cff9abaabf Typos
0733be3c7c Changes: merge duplicated `Type system` heading for 4.13.0 (#10443)
83eca0dabc Remove unnecessary surrounding parentheses for immediate objects (#10441)
a208a29bcc Merge pull request #10438 from Zett98/runtop-eval-option
d0d877a9c3 added -e eval option for runtop and natruntop
1dce5eb240 first commit after branching 4.13
f779a50353 last commit before branching 4.13
dd7927e156 bump magic numbers for 4.13 (and trunk)
0ba253df46 stdlib: add Array.fold_left_map (#9961)
c19f9f7588 document @since for Format.stag (#10439)
f9066346fd improve documentation of live_words field of gc stats
1e15c0deaf Avoid overwriting closures while initialising recursive modules (#10205)
970407162b make CI bootstrap script compatible with Macos
da28d0e9bf inria ci: update remove-sinh-primitive.patch
7eaf05b3cf Added some missing C99 float operations (#944)
b5821db337 Add Random.full_int (#9489)
efbd3595c1 changes: fix issue number
cd65210491 Merge pull request #10430 from Drup/print_bytes
ef6b4b0c9a Add missing `Format.pp_print_bytes` function.
400540d393 Make build_other_constrs work with names instead of tags. (#10428)
1037341d8c Merge pull request #10408 from dra27/sys-time
9b0847d3e2 Merge pull request #9919 from dra27/tidy-prims-in-headers
efef92afec Fix caml_sys_time on Windows
756dd27995 Allow CSE of immutable loads across stores (#9562)
0340410346 Inria CI "main" script: do not lower priority on macOS
fd59303165 Simplify tools/Makefile (#10265)
600aad5d29 Merge pull request #10379 from lthls/remove-pidentity
fe96962fd4 Merge pull request #10252 from stedolan/unboxed-as-kind
ce2810b40a review
dc08b85b16 Remove primitives Pdirapply and Prevapply
2bc428664f Remove the Pidentity primitive
4ff6bc40bd review fixes
06ede5fdf5 Bootstrap
1820718aa1 Move type_unboxed.unboxed into type_kind
bd9ec5d198 Better compaction heuristic (#10194)
1a4fda312d dune: add asmcomp/dataflow.ml
f0ecd9b8aa Merge pull request #2245 from chambart/bytelink_order
da5fd2dd32 Improve error message for link order error in bytelink
6e3c90dfea Add %frame_pointers (#10419)
9b887df04b Spilling and reloading: avoid behavior exponential in loop nesting (#10414)
a7a5dbbb4e Merge pull request #10420 from stedolan/filter-arrow-comment
bba647eb93 Merge pull request #10416 from Octachron/detect_rec_in_show
846a2803d7 Fix typo
5eddf636c5 Merge pull request #10417 from dra27/10405-bootstrap
a3c655a00d Bootstrap
22e3e03f82 toplevel: detect recursive definitions in #show
5fbd6e15c8 Restore the tree after checking labelled modules
5e45b2e9fa Add a generic backward dataflow analyzer and use it for liveness analysis (#10404)
a6f9c4461d Merge pull request #10412 from gasche/env-store-datatype
25e376be04 Env: split store_type by adding store_{constructor,label}
9ffd97f3ee Merge pull request #10405 from trefis/subst-locs
7feef2d660 Merge pull request #10401 from Octachron/signature_group_2
8ce8bff9db Merge pull request #9809 from dra27/complete-suspicion
91285e0ad5 signature_group review: more symmetric interface
3ecc6cda20 Merge pull request #10135 from dra27/faster-flexdll
79f6c2228b Add {Int,Int32,Int64,NativeInt}.{min,max} (#10392)
e51a8a92eb review: Changes
5e7bcdb4d8 review: with_constraints delete or replace one item
b6ff5baeff signature_group review: next + unfold
4060c05ea0 Eliminate the ocamlopt not found error
08f9a8660d Use native flexlink during the build
23cae5b8a4 Use ocamlopt-compiled flexlink in testsuite
474255bb0e Control the flexdll bootstrap with configure
c555a5bd6e Allow bootstrapping flexdll for the Cygwin ports
375b7cd4bb Overhaul flexlink binary locations during build
b96040decb Always bootstrap flexdll if the code is present
731440fe59 Automatically bootstrap flexdll
5f87ca619c Bootstrap flexlink without partialclean
ea2b165323 Merge pull request #10406 from kit-ty-kate/fix-ocamldep-help
95e24f2429 Update the changelog
abb50647d2 Partially reindent Makedepend.run_main
a7e97d85ae Fix ocamldep -help
d0a466062f make depend
8fb2de19fb Changes
3b1b916a40 update test
fae4bbc0a4 update locations for destructive substitutions
09f9db599c bugfix: with constraints and #row component
49528a6944 Merge pull request #10307 from nchataing/env_refactoring
bb052b09e0 fix the dune build after #10170
5d26dfc724 Refactor type_descriptions in the typing env
099b86a046 Merge pull request #10170 from antalsz/pr-better-error-types
e48d97fed2 Merge pull request #10289 from Zett98/do_not_print_options_in_usage_message
00dfb07866 Fix formatting issues
e16653e908 Add missing copyright headers to `errortrace.ml{,i}`
7157ce2fc2 Add Changes entry for #10170
c8d8ef7ae2 Fix names of arguments to `Ctype.{,raise_}scope_escape_exn`
1d2ad7742d Replace `report_error` and `trace_format` with 3 separate functions
ee9dd52d42 Return `prepare_expansion_head` to the global scope
16564969f6 Change `Printtyp.trace_format` into a GADT and expose it
bd030c0e12 Address all the excellent reviewer suggestions
e87be39194 Maintain more structural information in type-checking errors
14ff896d91 Add tests before improving the internal type error representation
b5a5f01383 Fix minor bug in `Ctype.full_expand` and document invariant
6da9c31652 dont print options in usage msg
e0745c14e6 more documentation
72a87e1989 update Changes
c52f1d06c2 with constraint: ignore ghost components
9ce0b9b364 Signature_group: replace_in_place
206fbfa516 include shadow all items in a group:
e3af864038 Signature_group: ghost-aware iteration over signatures
2427ba7cb6 [refactoring] Typemod.simplify: use List.filter_map for readability
9107f181ca floats.c: small optim of caml_{frexp,modf}_float (#10398)
07ab35957d A little precheck environment info
c04c072537 Merge pull request #10146 from damiendoligez/ocamltest-move-test-block
c4c637d9f3 Tune i386 test to be 32-bit mode of x86-64
fb4141cb59 change keword for postponing test block
2a4aa7a324 Merge pull request #10396 from Octachron/revert_list_printing
f6edb38adc Revert "Merge pull request #9336 from Octachron/mundane_list_printing_fix"
aec3ecbfc7 Test evaluation order for for loops (#10394)
62d89c046b Merge pull request #10384 from dra27/tabular
aef2562713 Merge pull request #10390 from gretay-js/update_dune_build
fb89e81169 Merge pull request #10385 from Octachron/6985_more
4c6ff22df8 Add emitenv and cmm_invariants to dune file
5776b22b15 partially fix the dune build
257c4d88d1 mtype: remove ghost row types from strengthened signature
6275c0ccda Fix handling of exception-raising specific operations during liveness analysis (#10387)
5152c5249f Refactor Makefile.compilerlibs
265467c1b6 Harden Makefile TAB rules in check-typo
a376cbf8f1 Have update_scope_rec only recurse in principal mode (#10383)
04e9f14aec Convert erroneous tabs to whitespace
f31d422285 Don't emit tabs in prims.c
53890ca007 Refactor one ifeq not to use continuation
118b17afbf Change the configure message not to contain tabs
9280a4b456 The invariants.cmm test should be run only when the native compiler is enabled
159db72136 Merge pull request #1400 from lthls/cmm_invariants
ed17834964 Add David Allsopp as reviewer
49b3574e5f Changes
6ff3561fba ocamltest: add the codegen_exit_status variable
9818f7aaa2 Added fold_{left, right}, exists and for_all to String/Bytes (#882)
b6ef1efa2a Merge pull request #10352 from gasche/Seq.concat
88676f6db6 Use build_config.h instead of -DOCAML_STDLIB_DIR
95d5ce027a Set LC_ALL when calling cygpath in configure
5133961359 Use correct canonical name for Cygwin64
620f2cb93c use a marker at the top when the test block is at the end of the file
ebddee737b ocamltest: allow TEST block at end of file
a672887fe0 Seq.concat: 'a t t -> 'a t
85c2012021 Merge pull request #10376 from dra27/output-complete-obj-msvc
e4b6bfa06c Merge pull request #1819 from shindere/bootstrap-doc-update
90def1a36a BOOTSTRAP.adoc: mention an alternative way to test the bootstrap
60039cd4c1 Modernise Ccomp.expand_libname
0b39a30939 Merge pull request #9632 from lthls/opam-incremental-builds
ca9b5f991e Changes
34fd9d060b Update expected test
ae9555bbb1 Merge pull request #10377 from dra27/ocamltest-quoting
8b1bc01c31 Set up execution environment before launching a CI script
5260392143 Merge pull request #10366 from shindere/ocamlrun-build-variable
7c46b03e04 Update Changes
55b33ba5b3 Build system: simplify the compilation of the caml-tex tool
54a33a3781 Clarify the way `make runtop` works.
2f4f677f59 Add a Changes entry
70e21f77fc Introduce the NEW_OCAMLRUN build variable
e0c77200c3 Build system: use boot/ocamlrun rather than runtime/ocamlrun in some places
2a9c7b61c8 Build system: rename the CAMLRUN variable to OCAMLRUN
ff32380dd6 Build system: replace suffix rules by pattern rules
687a2e8d82 Merge pull request #10373 from Octachron/existence_precedes_essence
98430d1a8b Run tests/output-complete-obj/test.ml on Windows
637b9fca96 Call the partial linker correctly on msvc
d60da30752 Support both quoting styles in ocamltest
64044dd668 tweak error message for unknown constructors or fields
90d52931b5 Updates and fixes
95e55306a3 Update to take opam-custom-install into account
ddd70be4a7 Document incremental build solutions with opam
eaf2aaf96d Fix #10338: Translcore.push_defaults does not respect scoping (#10340)
36042d0623 output-complete-exe: do not generate .cds file (#10371)
f916c7b736 Merge pull request #10360 from EduardoRFS/refactor-format-of-tpackage-merge
4f4e46af44 Remove the availability analysis (#10355)
bbbe9e559b Disable manual build temporarily
c8abad8526 Merge pull request #10303 from dra27/fix-10302
24d7f3bde8 Introduce per-function environment for Emit (#8936)
430c134435 Merge pull request #10368 from dra27/bootstrap-docs
7882a4ee27 Merge pull request #8929 from Octachron/printtyp_fix_nested_recursive_definitions
a09a84f9a7 Merge pull request #10365 from shindere/build-system-cleanup
c0a233a5b5 Update BOOTSTRAP.adoc for changing primitives
b9acbd6711 Merge pull request #10363 from Octachron/ocamldoc_entities
207035bcc7 fix printing of nested recursive definitions
01b32afefa Build system: remove now useless line from Makefile.config.in
2264171349 Merge pull request #10327 from shindere/ocamltest-files-subdirectories
6422902a9c ocamldoc: escape <, > and & in html backend
94faefd5da Add convenience pretty printer for `Either.t` (added in 4.12) (#10242)
f7974c9efa unify field name and type on Tpackage
2217ed8fd5 Merge pull request #10045 from dra27/csharp-mingw
7b44b5323a Add a Changes entry
e5baa87996 Get rid of the setup-links.sh script in the tool-ocamldep-modalias test
9256944766 Use the copy action in the tool-ocamlopt-save-ir/start_from_emit.ml test
d841571cac Use the copy action in the tool-ocamldep-modalias/main.ml test
b16c2f5ff1 Use the copy action in the missing_set_of_closures test
43ede500f9 Use the copy action in the opaque/test.ml test
c5a82d9525 ocamltest: implement a copy action
c9181a1e84 no-alias-deps/aliases.ml: rename b.cmi.invalid to b.cmi
823a8e9e7f Slightly simplify the tool-ocamldep-modalias test
8b565ab16b Rewrite the typing-missing-cmi test to use ocamltest's subdirectories variable
ddd910d96a Rewrite the tool-ocamldep-shadowing test to use ocamltest's subdirectories variable
6d69290746 Rewrite the missing_set_of_closures test to use ocamltest's subdirectories variable
97198d62f7 Rewrite the opaque test to use ocamltest's subdirectories variable
e8a1b21928 Rewrite the lib-dynlink-private test to use ocamltest's subdirectories variable
66297e6d6b Put lib-dynlink-private in a correct form
3804a99a82 Slightly simplify the lib-dynlink-private test
2e5a4d02b3 Rewrite the lib-dynlink-pr4839 test to use ocamltest's subdirectories variable
ad828c2525 Rewrite the lib-dynlink-pr4229 test to use ocamltest's subdirectories variable
da1920e247 Rewrite the lib-dynlink-native test to use ocamltest's subdirectories variable
e8cee9623e ocamltest: introduce the subdirectories variable
7c0d049f85 ocamltest: rename the files variable to readonly_files
48d800cf57 Fix #8575: Surprising interaction between polymorphic variants and constructor disambiguation (#10362)
848d54d45c Merge pull request #10358 from lpw25/tbl-in-load-path
2af9bd9b48 Add Changes entry
0f83d55f25 Use hash tables for the load path
787624ac2a Fix test to avoid building manual in forks
430c60ee71 Test file (WIP)
2a969c01c9 Ensure installed stdlib artefacts have correct case (#10301)
d50339d4a1 Merge pull request #10354 from xavierleroy/specific-operations
94e74742e1 Enable Cmm invariants on some CI runs
7bac5a91f8 Add --enable-cmm-invariants configure flag
846735684f Treat Ialloc_far as not pure and susceptible to raise
15e635462b Fix handling of exception-raising specific operations during spilling
8bf201b8c7 Refactor Proc.op_is_pure and fix Mach.operation_can_raise
b7bc826e4b Merge pull request #10219 from Octachron/printtyp_explicit_syntactic_groups
f823915210 Merge pull request #10357 from dra27/testsuite-cygpath
c46389154e Merge pull request #9957 from stedolan/remove-enforce-constraints
89a489a966 review: comments and constant propagation
1ae91bb361 Add cmm-invariants to OCAMLPARAM
5a83b92072 Remove Ctype.enforce_constraints
9f29f9a67f Remove a call to enforce_constraints when checking GADT patterns
1425135c66 Remove a call to enforce_constraints when checking type declarations
da95d2863a Remove a call to enforce_constraints when constructing types
71fb042b3f Fix .depend
d504b58650 Add Changes entry
9b2c634513 Add an optional check for invariants on the Cmm representation
091bff3376 Merge pull request #10225 from EduardoRFS/fix-dune-build
3c29f13eaa dune: fix main.exe and optmain.exe build
9272ef4ad3 dune: fix main.bc and optmain.bc
8604cbe8b2 dune: fix ocamltest_config.ml
d3643c6813 remove Typecore.create_package_type unused helper (#10356)
f20c30e15d Fix testsuite MKDLL for bootstrapped flexlink
d12738a234 Merge pull request #10243 from dra27/check-configure
eed1110e6a make update_scope recursive while keeping trace_gadt_instances (#10277)
77115193cc Merge pull request #10297 from dra27/ocamltest-rm_rf
178cca1bab Don't follow symlinks in rm_rf
72f76f40b4 Merge pull request #10309 from dra27/win32unix-errno
856b7251a1 Fix ocamltest's rm_rf function w.r.t. symlinks
c0c241e073 Merge pull request #10346 from dra27/fix-makefiles
38fb9e0a94 Fix mingw-w64 DLL support with binutils 2.36+ (#10351)
6da8fd5a33 Remove slightly gratitutous make macro uses
0df6bc9e05 Correct rules in Makefile.compilerlibs
4bcd7e6d0f Merge pull request #10245 from stedolan/remove-cyclic-abbrev-check
50a7877f72 Merge pull request #10310 from dra27/restore-spacetime-option
4dca3c3c6b Merge pull request #9336 from Octachron/mundane_list_printing_fix
adcb23f298 Merge pull request #10350 from dra27/tweak-actions
7931817474 Merge pull request #10341 from gasche/manual-caml-example-in-cmds
be53454367 toplevel: identify list by their types
93a9b2dc91 Fix destroyed_at_c_call on RISC-V (#10349)
60f9d2f177 Fix #10271 using Env.remove_last_open (#10308)
3abccf87f7 Do the homework promised in #10199 (#10347)
3a42a43b15 Fetch from the correct place when deepening
422eb95503 Fetch 50 commits in the full-flambda workflow
e06ba31282 Add missing Iopaque case to Proc.op_is_pure for RISC-V (#10345)
791f8195fc Merge pull request #10217 from damiendoligez/fix-9853
2c85ab7344 warning cli: tweak single-letter warning deprecation (#10312)
b720b583a1 Keep Sys.opaque_identity in Cmm and Mach (#9412)
e3a3d3d4b7 Merge pull request #10344 from dra27/config-var-doc
bb74b8d35e Merge pull request #10336 from dra27/test-manual-in-gha
3047ad8aa9 Fix realpath test (#10326)
2fee7a8e54 Fix bytecode compilation of Lsend(Cached, _) (#10325)
06735ef77e Merge pull request #8732 from Octachron/remove_fixed_type_error_2
505d4ec926 Merge branch 'trunk' into fix-9853
75902a8713 Merge pull request #10140 from gasche/require-full-labels
b92682336d Merge pull request #10097 from gasche/lazy-map
9c28b74749 manual/src/Makefile: simplify the build of warnings-help.etex
d2e6320a24 manual: enable {caml_example} in cmds/
782f6df808 escape @ in etex files
2e25fec094 Merge pull request #10335 from shindere/remove-makefile.tools
d3f6c33059 Correct documentation of -config-var
7c379d3787 review: Typedecl.set_fixed_row -> set_private_row
7d827537fb Limit the automated build on push to ocaml/ocaml
09444ef6a0 Update Changes
08c7d829e0 Labellize Typedecl.transl_with_constraints
62c4b0a2ba explicit and rename Bad_fixed_type errror
3d93b0b3fa more documentation for map_val, suggested by Stephen Dolan
9ee6e6d829 move opportune_map into the already-forced section, rename into map_val
a93d732587 adapt the manual to discourage labels-omitted applications
b3ad2a4921 fix the testsuite
d3fc1261a7 enable warning 6 [labels-omitted] by default
57a1e33b49 Build system: provide a default value for OCAMLLEX
4784566aeb Remove Makefile.tools, now unused.
6130791b9a Stop using Makefile.tools in the makefiles used to build the manual
09c940f579 Lazy.{map,opportune_map} : ('a -> 'b) -> 'a Lazy.t -> 'b Lazy.t
812f5462a0 lazy.mli: create documentation sections, make some explanations more precise
958e2cd60e Ensure PDF manual is tested
359e1c4929 Merge pull request #10334 from stedolan/fix-32b-gc-debug-build
61722caf8b testsuite/Makefile: stop using Makefile.tools
c63cb262ed testsuite/tools/Makefile: stop using Makefile.tools
a9511cedd8 testsuite/lib/Makefile: stop using Makefile.tools
4c77dca743 Makefile.tools: remove the unused OCAML variable
05f03c9504 Makefile.tools: get rid of the unused OCOPTFLAGS variable
768e9568db Makefile.tools: remove the unused DIFF variable
980e598b31 Makefile.tools: remove the unused OCAMLMKLIB variable
2fd5c5408f Makefile.tools: remove the unused DUMPOBJ and OBJINFO variables
a4d01fd716 Makefile.tools: remove the unused UNIXLIBVAR variable
ec409d3da7 Makefile.tools & co: get rid of the TOPDIR variable
e3f567c788 Only build the entire manual if it changed
1dbeaa9033 Remove the unused CTOPDIR build variable
7d892f5893 Just test the web manual
9218345d0d Merge pull request #10333 from dra27/fix-swedish-build
2fd7ef283e distclean should also clean (manual build system)
9f68013abe Plumb in and fix manual/Makefile distclean
20f5a95be1 Missing items from manual/src/Makefile clean
98734ac2cf Fix html_processing Makefile for parallel make
0705393be8 Fix parallel build
9a4c0fad9c Test building the manual in CI
f52adb3011 Fix an assertion in gc_ctrl.c that's wrong in 32-bit builds
064f231421 Fix manual build
5f90caf6e2 Generate lambda/runtimedef.ml correctly in Swedish
44f3e7ac16 Merge pull request #10322 from COCTI/strong_trail
ef296e49b2 No longer mark Windows Unicode runtime as experimental (#10318)
69a573f15d Changes in simplif.ml - removal of try_depth ref (#9827)
2226776f13 documentation: configuration switch for an odoc documentation mode (#9997)
d4dd566e3e Semantic diffings for functor types and applications (#9331)
8f40388ffd Use s_table rather than s_ref
3699d6f8a7 replace backtracking trail by a normal ref
3ef9ce800f Wrong branch name used for deepening fetch in GHA
4764325f1b Merge pull request #10320 from emillon/doc-callbackn-clarification
2126e82e7e Doc: clarify that `caml_callbackN` takes a C array
cce52acc7c separate constraint-solving (unification) part of type_pat into solve_* (#10311)
d10908cc51 Merge pull request #10317 from UnixJunkie/patch-2
470acf6b90 typos in HACKING.adoc
f0a1be6f05 Merge pull request #9407 from Anukriti12/interface_not_found
fd247ba17b added warning for missing mli interface file
bef455a872 Merge pull request #10047 from dbuenzli/unix-realpath
152ea69612 Merge pull request #1636 from oandrieu/stdlib-exn-documentation
9321e28567 Merge pull request #10232 from lpw25/unused-labels
6462df79a8 Merge pull request #10300 from sanette/triangle
eb91b328ea Fix #10298: Incorrect propagation of type equalities in functor application (#10305)
31dcf5b5bd Merge pull request #10306 from MisterDA/sock-wsa-map-error
fcc9a2bd6e Restore the --enable-spacetime option (for error)
364ffedfed Adjust changes.
ea07e45ea6 On Windows realpath strip the \\?\ prefix.
7ca6ecf155 Allow Windows realpath to work on open files.
798f6f1770 Allow Windows realpath to work on inaccessible files (like POSIX).
03a4519c8e Windows: drop the fileapi.h header. It's not needed.
21971ba839 Fix realpathing directories on Windows as suggested by @dra27.
58955bff28 Add a test.
f8feb51992 Address a first round of comments.
62b946efae Add Unix.realpath.
23660a5cfe No system defines WSAENFILE
bb1aa15488 Correct typo in WSAENAMETOOLONG
332c0a9212 Map WSA error code to Unix errno for sockopt and getsockname functions
aa28afb11e Set errno correctly for Win32 Unix.(f)truncate
59007f708c Return EBADF on error in win_filedescr_of_channel
1a2370e333 Add Changes entry
47a14b4a36 Add warning for unused labels
3a128ca0e1 Remove unused labels
f7ae40d9f8 Add tests for unused labels
76859caa15 Make the add_ note clearer
55e2cf9a82 Use @raise
63f7876e61 Adjust documentation of (^) and add to String.cat
68fa762fb2 document some exceptions raised when array/string get too large
24876b0900 typo in the doccomment for Buffer.sub
0bc918135a document some Buffer functions that can raise Invalid_argument
970797fa38 Ensure DS-form offset is a multiple of 4
406e522d92 Revert "Fix lib-string/binary.ml test on ppc64"
414bdec9ae Fix lib-string/binary.ml test on ppc64
b30a7c66b0 Stabilise the output of bigarrcml test
1a59019bb4 Merge pull request #10285 from dra27/coldstart-harder
416100442c change entry for #10295: mention reviewers
5ab5e50a82 adjust bullet size and pos
3fb3bd7fff fix for a row-arity mismatch in pattern-matching compilation (#10295)
6f0a02034c replace triangles for items by disc or diamonds
4c484b2bc1 Merge pull request #10282 from sanette/bug#10254
6e188084b2 use preg_anyspace for detecting Chapter and Part
0d4d90b159 prepare for hevea UTF sequence 2004-202F (PR#61)
98a1212de7 remove \n
4db33fed3a Merge pull request #10207 from Octachron/deprecated_single_letter_warning
89fe8c6273 typo
27eb86f206 add more white-space regexp
619d06823a Changes
2a27200a48 warning deprecation: full normalization of parsed tokens
7cb592530d compiler interface: deprecate single letter warning
c0eea1b229 Fix caml_tex warning setting
a7f80409d6 stop using single letter in warning settings
357654891e Merge pull request #9448 from dra27/missing-bytes-string
cbe61cceef Merge pull request #10221 from dra27/ba-win
694db9d5b9 Changes fix
672f445a0a make webman compatible with Hevea 2.35
6916136563 document some types in typecore.ml (#10292)
da9791ec91 Merge pull request #10288 from maranget/bowtie-for-event
236485bdbd Manual, html, avoid warning. Slight formatting change.
ea0f9f0388 Manual, html, inactive definition for latex-only command.
171bf31825 Manual, html, replace illegible symbol by \bowtie.
b00a79adb4 Preserve evaluation order in simplify_exits (#10284)
c877ef8d02 Remove $(LIBFILES) from boot/ in coldstart
f92c0a73f1 Suppress sanitizer message for known memory leak
bcffaef30f Reflect the status of the naked pointer checker in the exit code (#10171)
fc9534746b Dynamically allocate the alternate signal stack (#10266)
7a9a8ddcee Merge pull request #10247 from johnwhitington/refman-examples
2046041d5e Merge pull request #10274 from garrigue/split-Ppat_or
cb6e79c3fe manual: unicode character declarations
21e741403c indent normally
c4a03dc86c please @gscherer
8981f8d0d7 style
ddb34f0d72 Split Ppat_or case Typecore.type_pat
9feee5b8b7 Merge pull request #10270 from dra27/fix-undefined
229a94ac54 Merge pull request #10269 from shindere/enhance-manual-readme
ba1abcebea Variation on a theme of Makefile.docfiles
b2a0f4bc86 remove assertion that is not always true
db7680db28 Do not preserve fragments when compacting. Duh.
f47a498dec add assertion
861b581894 fix for #9853
dd5455fa9e manual/tests/Makefile
abd1fb3ed5 Move the undefined make variables to other-checks
85842cd8da Detect unused Makefile variables in workflow
05787308d4 manual/README.md enhancements
cb22f293e9 Define $( ) to clear unused variable warning in make
de3a7962c6 Require Makefile.common before stdlib/StdlibModules
8b8168ee09 Typecheck x|>f and f @@ x as (f x) (#10081)
8c92f979ec manual: document unary extension operator (#10263)
297fbe90aa Fix GC message when shrinking mark stack (#10264)
a477306691 Optimise Int32.unsigned_to_int on 64-bit (#10244)
685f14c695 Merge pull request #10260 from dra27/hygiene-fixes
cb0ae7b93c Use GitHub Actions outputs instead of cookies
b2a0c6d551 Merge pull request #10227 from shindere/factorize-ocamllex-ocamlyacc-build-rules
7868f7850e Merge pull request #10213 from damiendoligez/fix-best-fit
be34be3af2 Build system: deduplicate the rules used to generate the lexers and parsers
4c3c947210 Fix computation of FETCH_HEAD in GitHub Actions
6b41eac8d3 refactor initialization code for the allocation policy
9fce0f6fc7 Makefile{,.tools}: make it possible to override ocamllex
499b1b3cb7 tools/Makefile: make it possible to override ocamllex
2f2113223e ocamldoc/Makefile: Use generic rules to generate lexers and parsers
6f647a881c {debugger,lex}/Makefile: make it possible to override lexer andparser generator
fbb18d5c0a ocamltest/Makefile: make it possible to override the lexer and parser generators
8aeb57fcf0 Build system: rename the OCAMLLEX_FLAGS to OCAMLLEXFLAGS
7f937238ba Use CAML_NAME_SPACE
34606a59e2 Add Bytes.{starts,ends}_with
8574ad463e Add binary integer decoding functions to String
011be235e8 Add Bytes.split_on_char
28dc873db7 Add String.cat as dual of Bytes.cat
d63347d248 Add String.{of,to}_bytes
50392b101d Add String.empty as dual of Bytes.empty
500d8dc829 Merge pull request #10150 from dra27/one-with-log
c51af74ad8 Fix oo examples
40caa561bf Merge pull request #10255 from MisterDA/otherlibs-use-option-macros
2ea972b151 Merge pull request #10257 from MisterDA/cloexec-unimplemented-socketpair
daa944c888 Merge pull request #10256 from MisterDA/hacking-doc-remove-travis-add-gha
3b4ae00a59 Use 4.12 convenience option macros in C stubs
041ef0f70a Replace Travis with GitHub Actions in documentation
330b36670d Update reviewers
fff6a16488 Spacing
d1e2e41e59 Add forgotten cloexec parameter to un-implemented socketpair
aced66c380 Merge pull request #10133 from Octachron/with_module_types
6a25084847 update changes
1b851235ff review: merge_constraints remove recursive knot
97e3964aaf review: lift check_modtype calls
d6c0c15732 review: add a parsetree invariant
cfce291c82 review: begin...end for a then
9bac0f1e5a review: iterator extensions
dbcea3c3ee review: error message
f44262484c review: printing parsetree
cc63969644 review: rename Pwith_module_type* to Pwith_modtype*
1cf108957f Final fixes for first round of reviews
8ea2b113ef machine epsilon, missed this one earlier
649345db65 Address some of @gasche's pattern suggestions
f23e382ecf Merge pull request #1020…

9 participants