[4.13 bug] %apply externals require an arrow type as first argument regardless of the actual type #10450

kit-ty-kate · 2021-06-09T15:05:14Z

The following test case fails with OCaml 4.13 but worked perfectly before:

module M : sig
  type t
  val empty : t
end = struct
  type t = unit -> unit
  let empty () = ()
end

external test : M.t -> unit -> unit = "%apply"

let () = test M.empty ()

$ ocamlc test.ml
File "test.ml", line 11, characters 14-21:
11 | let () = test M.empty ()
                   ^^^^^^^
Error: This expression has type M.t
       This is not a function; it cannot be applied.

M.t is an abstract type but internally a function, which to me should be valid. However OCaml 4.13 somehow treats it as a function and rejects this code with a rather cryptic error message.

Looking into the PRs merged in 4.13 so far, it looks like this was introduced in #10081 (cc @alainfrisch @garrigue)

The text was updated successfully, but these errors were encountered:

kit-ty-kate · 2021-06-09T15:06:21Z

This was originally detected in the wild in bap-core-theory:

# File "lib/bap_core_theory/bap_core_theory_basic.ml", line 44, characters 32-37:
# 44 |   let small s x = int s Bitvec.(int x mod modulus (size s))
#                                      ^^^^^
# Error: This expression has type Bitvec.t Bitvec.m
#        This is not a function; it cannot be applied.
# Command exited with code 2.
# make: *** [Makefile:8: build] Error 1

gasche · 2021-06-09T15:18:35Z

For the record, I find that the new behavior that breaks this program is reasonable: %apply is only sound if its first argument is a function, and using the type-system to enforce this is in the spirit of the OCaml language.

The program could be fixed by defining the test function within the module itself, instead of outside the abstraction boundary. (You may then not be able to export the fact that it is an external, only a value -- I'm not sure -- which may have performance application, but I would wait for observable differences in performance before worrying about this issue.)

kit-ty-kate · 2021-06-09T16:31:46Z

@gasche that's a fair point.
@ivg was there some particular performance implication in exporting the external outside of Bitvec?

ivg · 2021-06-09T16:35:22Z

was there some particular performance implication in exporting the external outside of Bitvec?

Yes, significant. The code wasn't inlined without it. With external the code Bitvec.(x + y mod m) together with Zarith was compiled to native machine code, i.e., to two machine operations. Without it it was two calls to OCaml function.

ivg · 2021-06-09T16:51:56Z

For the reference, here is the library that uses the trick

(** [(x <op> y) mod m] applies operation [<op>] modulo [m].

    Example: [(x + y) mod m] returns the sum of [x] and [y] modulo [m].

    Note: the [mod] function is declared as a primitive to enable
    support for inlining in non flambda versions of OCaml. Indeed,
    underneath the hood the ['a m] type is defined as the reader monad
    [modulus -> 'a], however we don't want to create a closure every
    time we compute an operation over bitvectors. With this trick, all
    versions of OCaml no matter the optimization options will inline
    [(x+y) mod m] and won't create any closures, even if [m] is not
    known at compile time.
*)
external (mod) : 'a m -> modulus -> 'a = "%apply"

where type 'a m is modulus -> 'a hidden under the abstraction boundary.

The program could be fixed by defining the test function within the module itself, instead of outside the abstraction boundary.

This won't work in the real application. The whole idea is to hide that 'a m which denotes a computation modulo m. As it was noticed before, exposing the fact that mod is the apply primitive enabled inlining (even without flambda) and generated optimal code. I am currently looking for the black box impact on bap in case if the function is turned from external to a normal val.

lthls · 2021-06-09T18:03:21Z

It's not very satisfying, but you can get around the new restriction by using the "%identity" primitive instead:

type 'a mod_result = modulus -> 'a
external (mod) : 'a m -> 'a mod_result = "%identity"

I think this should give you the same result as with "%apply". The extra type definition is a work around the check for primitive arity (parentheses don't seem to be enough).

ivg · 2021-06-09T18:26:41Z

I think this should give you the same result as with "%apply". The extra type definition is a work around the check for primitive arity (parentheses don't seem to be enough).

Not sure if it will work. I still need to apply as 'a m is a function and I need to apply it. I thinking so far to use

type 'a m = private modulus -> 'a

Which I presume will work. Will it?

ivg · 2021-06-09T20:13:13Z

let sum8 x y = Bitvec.((x + y) mod m)

With external, it is compiled by 4.09.x on amd64 to,

	subq	$8, %rsp
.L106:
	movq	camlExample@GOTPCREL(%rip), %rdi
	movq	(%rdi), %rdi
	movq	%rdi, (%rsp)
	call	camlZ__add_92@PLT
.L100:
	movq	%rax, %rbx
	movq	%rbx, %rdi
	call	ml_z_sign@PLT
	cmpq	$1, %rax
	jl	.L104
	movq	%rbx, %rdi
	movq	(%rsp), %r12
	movq	%r12, %rsi
	call	ml_z_compare@PLT
	cmpq	$1, %rax
	jl	.L105
	movq	%r12, (%rsp)
	jmp	.L104
.L105:
	movq	%rbx, %rax
	addq	$8, %rsp
	ret
.L104:
	movq	%rbx, %rax
	movq	(%rsp), %rbx
	addq	$8, %rsp
	jmp	camlZ__logand_131@PLT

Notice the absence of allocations.
When I remove the external annotation, the compiler is no longer able to see through it and allocates a closure,

camlExample__sum8_205:
	subq	$8, %rsp
.L108:
	movq	camlExample@GOTPCREL(%rip), %rdi
	movq	(%rdi), %rdi
	movq	%rax, %rsi
	movq	camlBitvec@GOTPCREL(%rip), %rax
	movq	592(%rax), %rdx
.L109:
	subq	$48, %r15
	movq	caml_young_limit@GOTPCREL(%rip), %rax
	cmpq	(%rax), %r15
	jb	.L110
	leaq	8(%r15), %rcx
	movq	$5367, -8(%rcx)
	movq	camlExample__fun_220@GOTPCREL(%rip), %rax
	movq	%rax, (%rcx)
	movq	$3, 8(%rcx)
	movq	%rsi, 16(%rcx)
	movq	%rbx, 24(%rcx)
	movq	%rdx, 32(%rcx)
	movq	(%rcx), %rsi
	movq	%rdi, %rax
	movq	%rcx, %rbx
	addq	$8, %rsp
	jmp	*%rsi
.L110:
	call	caml_call_gc@PLT
.L111:
	jmp	.L109

lthls · 2021-06-09T23:05:42Z

I think this should give you the same result as with "%apply". The extra type definition is a work around the check for primitive arity (parentheses don't seem to be enough).

Not sure if it will work. I still need to apply as 'a m is a function and I need to apply it.

a mod b is translated as ((mod) a) b then typechecked.
During typechecking, if (mod) is the apply primitive then (starting from 4.13) the type annotation on (mod) is discarded (I think) and the types for a and b are inferred as if the expression were a b, which requires a to be typable as a function. Otherwise the normal typechecking algorithm is used, matching the type of (mod) from its declaration with the type of its arguments.
After typechecking, applications are uncurrified and the expression is translated into (mod) a b. Then, if mod is the apply primitive, this whole application is translated into a b, while if mod is the identity primitive it is translated first back into ((mod) a) b, then (mod) a is translated into a, so you get back a b.
All of this occurs before any inlining considerations, so both approaches are indistinguishable by the backend and will generate the same code.

I thinking so far to use
type 'a m = private modulus -> 'a
Which I presume will work. Will it?

If I read the manual correctly (I'm not an expert on this), you would still need explicit coercions everywhere, so I think this doesn't count as working.

let sum8 x y = Bitvec.((x + y : 'a m :> modulus -> 'a) mod m)

ivg · 2021-06-10T12:46:20Z

I tried 4.13.0~alpha1 and got perfect results¹ (with no allocations and all functions inlined) when the mod operator is defined as a simple val. So the fix for us would be just to translate external to val. A win-win!

With that said, it looks like that #10081 changed the way how primitives (externals) work. Before that externals were untyped so that we can write external magic : 'a -> 'b = "%identity", but now some of the primitives, e.g., %apply are typed. This introduces some inconsistency into the language. On the other hand, it concerns an expert and unsafe part of the language that partially reduces the impact of this change (most of the users shall not even know about these features). The only documented part of the extrernal concerns C functions, which are still typed the same.

¹⁾) I am really excited with that!

gasche · 2021-06-10T13:17:47Z

Out of curiosity, does using val degrade performance on other recent versions of OCaml?

lthls · 2021-06-10T13:19:17Z

I'm also curious to know whether the perfect results were obtained with Flambda, Closure (the non-flambda mode), or both. I'm surprised that Closure would actually perform that much inlining.

ivg · 2021-06-10T13:38:46Z

Out of curiosity, does using val degrade performance on other recent versions of OCaml?

Yes and no. On microbenchmarking yes, hundred times slower, as every addition turned into allocation, instead of two machine instructions. On my blackbox test, which was /usr/bin/time -v bap /bin/ls I didn't notice any significant changes (there was some observable performance impact that is < 5%). But other tests, e.g., if we will interpret a binary, which more heavily relies on arithmetic operations, will be used might show different results.

And I only tried 4.09 and 4.13. I can't say from what version the inlining started to work without having to rely on external.

I'm also curious to know whether the perfect results were obtained with Flambda

I think that with flambda we always had good results, since the introduction of bitvec, which was 4.04. The perfect results with 4.13.0 were obtained for 4.13.0+alpha1, which, as far as I can tell doesn't include the flambda option. I can say that inlining appears already on clambda (and rawclamba) level (but not yet present on lambda),

(sum8/206
       (closure 
         (fun camlExample__sum8_206 2  x/208 y/209
           (let
             (prim/232 (field 0 (read_symbol camlExample))
              prim/233
                (let
                  (arg/220 y/209
                   arg/219 x/208
                   funct/222 (field 74 (read_symbol camlBitvec)))
                  (closure 
                    (fun camlExample__fun_223 1  arg/221 env/228
                      (let
                        (y/229 (field 3 env/228)
                         x/230 (field 2 env/228)
                         x/231 (apply* camlZ__add_94  x/230 y/229))
                        (if
                          (|| (< (ml_z_sign x/231) 0)
                            (>= (ml_z_compare x/231 arg/221) 0))
                          (apply* camlZ__logand_139  x/231 arg/221) x/231))) 
                    arg/219 arg/220 funct/222)))
             (apply prim/233 prim/232))) ))

lthls · 2021-06-10T14:39:02Z

The output you've shown looks like what I would expect out of Closure. The inner closure means that you still have an allocation at each call to sum8, which is what you were trying to prevent (and you can see that the last apply prim/233 prim/232 is not inlined).
Are you sure that this is the result that you wanted ?
To give you an idea, this is the Clambda code I get with Flambda:

un-anf (camlExample__sum8_84):
(let
  (m/300 (field 0 "camlExample__Popaque_38")
   apply_arg/302 (apply* camlZ__add_238  x/299 y/298))
  (if
    (if (< (ml_z_sign apply_arg/302) 0) 1
      (apply* camlZ__geq_643  apply_arg/302 m/300))
    (apply* camlZ__logand_445  apply_arg/302 m/300) apply_arg/302))

lthls · 2021-06-10T14:44:04Z

And here is the code I get with my "%identity" trick, with Closure:

    (sum8/289
       (closure 
         (fun camlExample__sum8_289 2  x/291 y/292
           (let
             (m/296 (field 0 (read_symbol camlExample))
              x/297 (apply* camlZ__add_94  x/291 y/292))
             (if
               (|| (< (ml_z_sign x/297) 0) (>= (ml_z_compare x/297 m/296) 0))
               (apply* camlZ__logand_139  x/297 m/296) x/297))) ))

ivg · 2021-06-10T16:22:37Z

Are you sure that this is the result that you wanted ?

Yep, indeed, I jumped to conclusion too fast, I looked into the closure, which is perfectly inlined itself, but didn't notice that the toplevel function allocates the closure before calling it. Indeed, there's still an allocation happening. Though still better than it was (before that x + y was allocated as a closure and then applied to the modulus function.

ivg · 2021-06-10T16:27:02Z

And here is the code I get with my "%identity" trick, with Closure:

Could you elaborate (or point me to your version or send a patch) on this trick, as I can't get it work.

lthls · 2021-06-10T16:38:56Z

The initial idea was to apply the following patch both in bitvec.mli and bitvec.ml:

- external (mod) : 'a m -> modulus -> 'a = "%apply"
+ type 'a mod_result = modulus -> 'a
+ external (mod) : 'a m -> 'a mod_result = "%identity"

But with type substitutions (available since 4.08, if I remember correctly), we can get a slightly smaller patch:

diff --git a/lib/bitvec/bitvec.ml b/lib/bitvec/bitvec.ml
index 5276462..d2cd5df 100644
--- a/lib/bitvec/bitvec.ml
+++ b/lib/bitvec/bitvec.ml
@@ -63,7 +63,7 @@ let norm {m} x =
    inlining (with any other operator definition the vanilla
    version of OCaml will create a closure)
 *)
-external (mod) : 'a m -> modulus -> 'a = "%apply"
+external (mod) : 'a m -> 'a m = "%identity"
 
 let modulus w = {
   m = Z.(one lsl w - one);
diff --git a/lib/bitvec/bitvec.mli b/lib/bitvec/bitvec.mli
index f4495d8..2f522e0 100644
--- a/lib/bitvec/bitvec.mli
+++ b/lib/bitvec/bitvec.mli
@@ -41,7 +41,8 @@ val m64 : modulus
     [(x+y) mod m] and won't create any closures, even if [m] is not
     known at compile time.
 *)
-external (mod) : 'a m -> modulus -> 'a = "%apply"
+type 'a mod_result := modulus -> 'a (* Work around arity checks *)
+external (mod) : 'a m -> 'a mod_result = "%identity"
 
 module type S = sig
   (** an abstract representation of an operation modulo some number.*)

lthls · 2021-06-10T17:23:35Z

Apparently my trick requires a fairly recent version of OCaml (probably 4.13.0), otherwise (mod) ((+) x y) is simplified into (+) x y too late, so (mod) ((+) x y) m isn't simplified into (+) x y m but into ((+) x y) m, preventing the expected inlining of (+). So it's not a solution for earlier versions of OCaml (I failed to notice that earlier, sorry about that).

ivg · 2021-06-10T19:18:21Z

So it's not a solution for earlier versions of OCaml (I failed to notice that earlier, sorry about that).

That's not a problem! For versions before 4.13 we can use the apply primitive and for versions >= 13 we will be using the identity trick.

It drives me crazy though that we just change mod from being apply to identity (which clearly have different semantics) without changing anything else and get still the same results :)

lthls · 2021-06-11T10:31:34Z

In a way, apply is the identity specialised to function types and eta-expanded. So it's not completely unreasonable for both primitives to behave similarly.

Octachron · 2021-07-16T09:36:00Z

If I understand correctly the conclusion, since 4.13 the %identity primitive is optimized in such way that one can always replace

type ('a,'b) fn
external app: ('a,'b) fn -> 'a -> 'b = "%apply"

with

type ('a,'b) fn
type ('a,'b) ocaml_fn := 'a -> 'b
external app: ('a,'b) fn -> ('a,'b) ocaml_fn = "%identity"

I imagine that we could improve backward compatibility by providing both a %typed_apply and a %apply primitives with the old behavior. However, since we are speaking of an internal feature with an alternative implementation, the increase in complexity does not seem worthwhile and the issue can be closed.

kit-ty-kate · 2021-08-02T15:09:03Z

I feel a bit uneasy about the conclusion (modifying code bases to change %apply to %identity when compiling with 4.13), it doesn't feel quite right. I'm concerned people outside of the opensource community might have code that does the same so even if it might be fine for the bap suite there might be other things that breaks because of this.

If everyone think it's not worth changing this right now, @ivg would it be possible to have a minor release of bap-core-theory (presumably 2.3.1) with this fix?

ivg · 2021-08-02T17:14:03Z

It is in the bap-bitvector package, but yes, I think we can add a patch file to the opam release with a fix.

lpw25 · 2021-08-02T17:46:15Z

I agree that the conclusion is unsatisfying. I believe that no where else do we insist that a particular primitive be given a particular type. It would seem pretty easy to only have the weird new behaviour when the type matches the expected pattern, and then just fall back to the ordinary behaviour for typing a function application.

Octachron · 2021-08-06T15:46:51Z

I have a tentative implementation in #10558 which only applies the new typing rule when the type of the primitives matches exactly the standard one. This fixes BAP compatibility issue without introducing new primitives, which seem nice. Are there any objections to the idea or implementation?

ivg · 2021-08-06T15:56:50Z

Thanks, Florian, this is awesome!

#10450: restore support for %apply and %revapply with non-translucid types

dra27 added this to the 4.13 milestone Jun 9, 2021

Octachron mentioned this issue Aug 5, 2021

#10450: restore support for %apply and %revapply with non-translucid types #10558

Merged

Octachron closed this as completed in #10558 Aug 24, 2021

Octachron added a commit that referenced this issue Aug 24, 2021

Merge pull request #10558 from Octachron/apply_selective_typing_prim

1bd0e66

#10450: restore support for %apply and %revapply with non-translucid types

Octachron mentioned this issue Aug 25, 2021

OCaml 4.13.0, first beta release ocaml/opam-repository#19323

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[4.13 bug] %apply externals require an arrow type as first argument regardless of the actual type #10450

[4.13 bug] %apply externals require an arrow type as first argument regardless of the actual type #10450

kit-ty-kate commented Jun 9, 2021

kit-ty-kate commented Jun 9, 2021

gasche commented Jun 9, 2021

kit-ty-kate commented Jun 9, 2021

ivg commented Jun 9, 2021 •

edited

ivg commented Jun 9, 2021 •

edited

lthls commented Jun 9, 2021

ivg commented Jun 9, 2021

ivg commented Jun 9, 2021 •

edited

lthls commented Jun 9, 2021

ivg commented Jun 10, 2021

gasche commented Jun 10, 2021

lthls commented Jun 10, 2021

ivg commented Jun 10, 2021

lthls commented Jun 10, 2021

lthls commented Jun 10, 2021

ivg commented Jun 10, 2021

ivg commented Jun 10, 2021

lthls commented Jun 10, 2021 •

edited

lthls commented Jun 10, 2021

ivg commented Jun 10, 2021

lthls commented Jun 11, 2021

Octachron commented Jul 16, 2021

kit-ty-kate commented Aug 2, 2021

ivg commented Aug 2, 2021

lpw25 commented Aug 2, 2021

Octachron commented Aug 6, 2021 •

edited

ivg commented Aug 6, 2021

[4.13 bug] %apply externals require an arrow type as first argument regardless of the actual type #10450

[4.13 bug] %apply externals require an arrow type as first argument regardless of the actual type #10450

Comments

kit-ty-kate commented Jun 9, 2021

kit-ty-kate commented Jun 9, 2021

gasche commented Jun 9, 2021

kit-ty-kate commented Jun 9, 2021

ivg commented Jun 9, 2021 • edited

ivg commented Jun 9, 2021 • edited

lthls commented Jun 9, 2021

ivg commented Jun 9, 2021

ivg commented Jun 9, 2021 • edited

lthls commented Jun 9, 2021

ivg commented Jun 10, 2021

gasche commented Jun 10, 2021

lthls commented Jun 10, 2021

ivg commented Jun 10, 2021

lthls commented Jun 10, 2021

lthls commented Jun 10, 2021

ivg commented Jun 10, 2021

ivg commented Jun 10, 2021

lthls commented Jun 10, 2021 • edited

lthls commented Jun 10, 2021

ivg commented Jun 10, 2021

lthls commented Jun 11, 2021

Octachron commented Jul 16, 2021

kit-ty-kate commented Aug 2, 2021

ivg commented Aug 2, 2021

lpw25 commented Aug 2, 2021

Octachron commented Aug 6, 2021 • edited

ivg commented Aug 6, 2021

ivg commented Jun 9, 2021 •

edited

ivg commented Jun 9, 2021 •

edited

ivg commented Jun 9, 2021 •

edited

lthls commented Jun 10, 2021 •

edited

Octachron commented Aug 6, 2021 •

edited