Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[4.13 bug] %apply externals require an arrow type as first argument regardless of the actual type #10450

Closed
kit-ty-kate opened this issue Jun 9, 2021 · 27 comments · Fixed by #10558
Milestone

Comments

@kit-ty-kate
Copy link
Member

The following test case fails with OCaml 4.13 but worked perfectly before:

module M : sig
  type t
  val empty : t
end = struct
  type t = unit -> unit
  let empty () = ()
end

external test : M.t -> unit -> unit = "%apply"

let () = test M.empty ()
$ ocamlc test.ml
File "test.ml", line 11, characters 14-21:
11 | let () = test M.empty ()
                   ^^^^^^^
Error: This expression has type M.t
       This is not a function; it cannot be applied.

M.t is an abstract type but internally a function, which to me should be valid. However OCaml 4.13 somehow treats it as a function and rejects this code with a rather cryptic error message.

Looking into the PRs merged in 4.13 so far, it looks like this was introduced in #10081 (cc @alainfrisch @garrigue)

@kit-ty-kate
Copy link
Member Author

This was originally detected in the wild in bap-core-theory:

# File "lib/bap_core_theory/bap_core_theory_basic.ml", line 44, characters 32-37:
# 44 |   let small s x = int s Bitvec.(int x mod modulus (size s))
#                                      ^^^^^
# Error: This expression has type Bitvec.t Bitvec.m
#        This is not a function; it cannot be applied.
# Command exited with code 2.
# make: *** [Makefile:8: build] Error 1

@dra27 dra27 added this to the 4.13 milestone Jun 9, 2021
@gasche
Copy link
Member

gasche commented Jun 9, 2021

For the record, I find that the new behavior that breaks this program is reasonable: %apply is only sound if its first argument is a function, and using the type-system to enforce this is in the spirit of the OCaml language.

The program could be fixed by defining the test function within the module itself, instead of outside the abstraction boundary. (You may then not be able to export the fact that it is an external, only a value -- I'm not sure -- which may have performance application, but I would wait for observable differences in performance before worrying about this issue.)

@kit-ty-kate
Copy link
Member Author

@gasche that's a fair point.
@ivg was there some particular performance implication in exporting the external outside of Bitvec?

@ivg
Copy link
Member

ivg commented Jun 9, 2021

was there some particular performance implication in exporting the external outside of Bitvec?

Yes, significant. The code wasn't inlined without it. With external the code Bitvec.(x + y mod m) together with Zarith was compiled to native machine code, i.e., to two machine operations. Without it it was two calls to OCaml function.

@ivg
Copy link
Member

ivg commented Jun 9, 2021

For the reference, here is the library that uses the trick

(** [(x <op> y) mod m] applies operation [<op>] modulo [m].

    Example: [(x + y) mod m] returns the sum of [x] and [y] modulo [m].

    Note: the [mod] function is declared as a primitive to enable
    support for inlining in non flambda versions of OCaml. Indeed,
    underneath the hood the ['a m] type is defined as the reader monad
    [modulus -> 'a], however we don't want to create a closure every
    time we compute an operation over bitvectors. With this trick, all
    versions of OCaml no matter the optimization options will inline
    [(x+y) mod m] and won't create any closures, even if [m] is not
    known at compile time.
*)
external (mod) : 'a m -> modulus -> 'a = "%apply"

where type 'a m is modulus -> 'a hidden under the abstraction boundary.

The program could be fixed by defining the test function within the module itself, instead of outside the abstraction boundary.

This won't work in the real application. The whole idea is to hide that 'a m which denotes a computation modulo m. As it was noticed before, exposing the fact that mod is the apply primitive enabled inlining (even without flambda) and generated optimal code. I am currently looking for the black box impact on bap in case if the function is turned from external to a normal val.

@lthls
Copy link
Contributor

lthls commented Jun 9, 2021

It's not very satisfying, but you can get around the new restriction by using the "%identity" primitive instead:

type 'a mod_result = modulus -> 'a
external (mod) : 'a m -> 'a mod_result = "%identity"

I think this should give you the same result as with "%apply". The extra type definition is a work around the check for primitive arity (parentheses don't seem to be enough).

@ivg
Copy link
Member

ivg commented Jun 9, 2021

I think this should give you the same result as with "%apply". The extra type definition is a work around the check for primitive arity (parentheses don't seem to be enough).

Not sure if it will work. I still need to apply as 'a m is a function and I need to apply it. I thinking so far to use

type 'a m = private modulus -> 'a

Which I presume will work. Will it?

@ivg
Copy link
Member

ivg commented Jun 9, 2021

let sum8 x y = Bitvec.((x + y) mod m)

With external, it is compiled by 4.09.x on amd64 to,

	subq	$8, %rsp
.L106:
	movq	camlExample@GOTPCREL(%rip), %rdi
	movq	(%rdi), %rdi
	movq	%rdi, (%rsp)
	call	camlZ__add_92@PLT
.L100:
	movq	%rax, %rbx
	movq	%rbx, %rdi
	call	ml_z_sign@PLT
	cmpq	$1, %rax
	jl	.L104
	movq	%rbx, %rdi
	movq	(%rsp), %r12
	movq	%r12, %rsi
	call	ml_z_compare@PLT
	cmpq	$1, %rax
	jl	.L105
	movq	%r12, (%rsp)
	jmp	.L104
.L105:
	movq	%rbx, %rax
	addq	$8, %rsp
	ret
.L104:
	movq	%rbx, %rax
	movq	(%rsp), %rbx
	addq	$8, %rsp
	jmp	camlZ__logand_131@PLT

Notice the absence of allocations.
When I remove the external annotation, the compiler is no longer able to see through it and allocates a closure,

camlExample__sum8_205:
	subq	$8, %rsp
.L108:
	movq	camlExample@GOTPCREL(%rip), %rdi
	movq	(%rdi), %rdi
	movq	%rax, %rsi
	movq	camlBitvec@GOTPCREL(%rip), %rax
	movq	592(%rax), %rdx
.L109:
	subq	$48, %r15
	movq	caml_young_limit@GOTPCREL(%rip), %rax
	cmpq	(%rax), %r15
	jb	.L110
	leaq	8(%r15), %rcx
	movq	$5367, -8(%rcx)
	movq	camlExample__fun_220@GOTPCREL(%rip), %rax
	movq	%rax, (%rcx)
	movq	$3, 8(%rcx)
	movq	%rsi, 16(%rcx)
	movq	%rbx, 24(%rcx)
	movq	%rdx, 32(%rcx)
	movq	(%rcx), %rsi
	movq	%rdi, %rax
	movq	%rcx, %rbx
	addq	$8, %rsp
	jmp	*%rsi
.L110:
	call	caml_call_gc@PLT
.L111:
	jmp	.L109

@lthls
Copy link
Contributor

lthls commented Jun 9, 2021

I think this should give you the same result as with "%apply". The extra type definition is a work around the check for primitive arity (parentheses don't seem to be enough).

Not sure if it will work. I still need to apply as 'a m is a function and I need to apply it.

a mod b is translated as ((mod) a) b then typechecked.
During typechecking, if (mod) is the apply primitive then (starting from 4.13) the type annotation on (mod) is discarded (I think) and the types for a and b are inferred as if the expression were a b, which requires a to be typable as a function. Otherwise the normal typechecking algorithm is used, matching the type of (mod) from its declaration with the type of its arguments.
After typechecking, applications are uncurrified and the expression is translated into (mod) a b. Then, if mod is the apply primitive, this whole application is translated into a b, while if mod is the identity primitive it is translated first back into ((mod) a) b, then (mod) a is translated into a, so you get back a b.
All of this occurs before any inlining considerations, so both approaches are indistinguishable by the backend and will generate the same code.

I thinking so far to use

type 'a m = private modulus -> 'a

Which I presume will work. Will it?

If I read the manual correctly (I'm not an expert on this), you would still need explicit coercions everywhere, so I think this doesn't count as working.

let sum8 x y = Bitvec.((x + y : 'a m :> modulus -> 'a) mod m)

@ivg
Copy link
Member

ivg commented Jun 10, 2021

I tried 4.13.0~alpha1 and got perfect results1 (with no allocations and all functions inlined) when the mod operator is defined as a simple val. So the fix for us would be just to translate external to val. A win-win!

With that said, it looks like that #10081 changed the way how primitives (externals) work. Before that externals were untyped so that we can write external magic : 'a -> 'b = "%identity", but now some of the primitives, e.g., %apply are typed. This introduces some inconsistency into the language. On the other hand, it concerns an expert and unsafe part of the language that partially reduces the impact of this change (most of the users shall not even know about these features). The only documented part of the extrernal concerns C functions, which are still typed the same.


1)) I am really excited with that!

@gasche
Copy link
Member

gasche commented Jun 10, 2021

Out of curiosity, does using val degrade performance on other recent versions of OCaml?

@lthls
Copy link
Contributor

lthls commented Jun 10, 2021

I'm also curious to know whether the perfect results were obtained with Flambda, Closure (the non-flambda mode), or both. I'm surprised that Closure would actually perform that much inlining.

@ivg
Copy link
Member

ivg commented Jun 10, 2021

Out of curiosity, does using val degrade performance on other recent versions of OCaml?

Yes and no. On microbenchmarking yes, hundred times slower, as every addition turned into allocation, instead of two machine instructions. On my blackbox test, which was /usr/bin/time -v bap /bin/ls I didn't notice any significant changes (there was some observable performance impact that is < 5%). But other tests, e.g., if we will interpret a binary, which more heavily relies on arithmetic operations, will be used might show different results.

And I only tried 4.09 and 4.13. I can't say from what version the inlining started to work without having to rely on external.

I'm also curious to know whether the perfect results were obtained with Flambda

I think that with flambda we always had good results, since the introduction of bitvec, which was 4.04. The perfect results with 4.13.0 were obtained for 4.13.0+alpha1, which, as far as I can tell doesn't include the flambda option. I can say that inlining appears already on clambda (and rawclamba) level (but not yet present on lambda),

(sum8/206
       (closure 
         (fun camlExample__sum8_206 2  x/208 y/209
           (let
             (prim/232 (field 0 (read_symbol camlExample))
              prim/233
                (let
                  (arg/220 y/209
                   arg/219 x/208
                   funct/222 (field 74 (read_symbol camlBitvec)))
                  (closure 
                    (fun camlExample__fun_223 1  arg/221 env/228
                      (let
                        (y/229 (field 3 env/228)
                         x/230 (field 2 env/228)
                         x/231 (apply* camlZ__add_94  x/230 y/229))
                        (if
                          (|| (< (ml_z_sign x/231) 0)
                            (>= (ml_z_compare x/231 arg/221) 0))
                          (apply* camlZ__logand_139  x/231 arg/221) x/231))) 
                    arg/219 arg/220 funct/222)))
             (apply prim/233 prim/232))) ))

@lthls
Copy link
Contributor

lthls commented Jun 10, 2021

The output you've shown looks like what I would expect out of Closure. The inner closure means that you still have an allocation at each call to sum8, which is what you were trying to prevent (and you can see that the last apply prim/233 prim/232 is not inlined).
Are you sure that this is the result that you wanted ?
To give you an idea, this is the Clambda code I get with Flambda:

un-anf (camlExample__sum8_84):
(let
  (m/300 (field 0 "camlExample__Popaque_38")
   apply_arg/302 (apply* camlZ__add_238  x/299 y/298))
  (if
    (if (< (ml_z_sign apply_arg/302) 0) 1
      (apply* camlZ__geq_643  apply_arg/302 m/300))
    (apply* camlZ__logand_445  apply_arg/302 m/300) apply_arg/302))

@lthls
Copy link
Contributor

lthls commented Jun 10, 2021

And here is the code I get with my "%identity" trick, with Closure:

    (sum8/289
       (closure 
         (fun camlExample__sum8_289 2  x/291 y/292
           (let
             (m/296 (field 0 (read_symbol camlExample))
              x/297 (apply* camlZ__add_94  x/291 y/292))
             (if
               (|| (< (ml_z_sign x/297) 0) (>= (ml_z_compare x/297 m/296) 0))
               (apply* camlZ__logand_139  x/297 m/296) x/297))) ))

@ivg
Copy link
Member

ivg commented Jun 10, 2021

Are you sure that this is the result that you wanted ?

Yep, indeed, I jumped to conclusion too fast, I looked into the closure, which is perfectly inlined itself, but didn't notice that the toplevel function allocates the closure before calling it. Indeed, there's still an allocation happening. Though still better than it was (before that x + y was allocated as a closure and then applied to the modulus function.

@ivg
Copy link
Member

ivg commented Jun 10, 2021

And here is the code I get with my "%identity" trick, with Closure:

Could you elaborate (or point me to your version or send a patch) on this trick, as I can't get it work.

@lthls
Copy link
Contributor

lthls commented Jun 10, 2021

The initial idea was to apply the following patch both in bitvec.mli and bitvec.ml:

- external (mod) : 'a m -> modulus -> 'a = "%apply"
+ type 'a mod_result = modulus -> 'a
+ external (mod) : 'a m -> 'a mod_result = "%identity"

But with type substitutions (available since 4.08, if I remember correctly), we can get a slightly smaller patch:

diff --git a/lib/bitvec/bitvec.ml b/lib/bitvec/bitvec.ml
index 5276462..d2cd5df 100644
--- a/lib/bitvec/bitvec.ml
+++ b/lib/bitvec/bitvec.ml
@@ -63,7 +63,7 @@ let norm {m} x =
    inlining (with any other operator definition the vanilla
    version of OCaml will create a closure)
 *)
-external (mod) : 'a m -> modulus -> 'a = "%apply"
+external (mod) : 'a m -> 'a m = "%identity"
 
 let modulus w = {
   m = Z.(one lsl w - one);
diff --git a/lib/bitvec/bitvec.mli b/lib/bitvec/bitvec.mli
index f4495d8..2f522e0 100644
--- a/lib/bitvec/bitvec.mli
+++ b/lib/bitvec/bitvec.mli
@@ -41,7 +41,8 @@ val m64 : modulus
     [(x+y) mod m] and won't create any closures, even if [m] is not
     known at compile time.
 *)
-external (mod) : 'a m -> modulus -> 'a = "%apply"
+type 'a mod_result := modulus -> 'a (* Work around arity checks *)
+external (mod) : 'a m -> 'a mod_result = "%identity"
 
 module type S = sig
   (** an abstract representation of an operation modulo some number.*)

@lthls
Copy link
Contributor

lthls commented Jun 10, 2021

Apparently my trick requires a fairly recent version of OCaml (probably 4.13.0), otherwise (mod) ((+) x y) is simplified into (+) x y too late, so (mod) ((+) x y) m isn't simplified into (+) x y m but into ((+) x y) m, preventing the expected inlining of (+). So it's not a solution for earlier versions of OCaml (I failed to notice that earlier, sorry about that).

@ivg
Copy link
Member

ivg commented Jun 10, 2021

So it's not a solution for earlier versions of OCaml (I failed to notice that earlier, sorry about that).

That's not a problem! For versions before 4.13 we can use the apply primitive and for versions >= 13 we will be using the identity trick.

It drives me crazy though that we just change mod from being apply to identity (which clearly have different semantics) without changing anything else and get still the same results :)

@lthls
Copy link
Contributor

lthls commented Jun 11, 2021

In a way, apply is the identity specialised to function types and eta-expanded. So it's not completely unreasonable for both primitives to behave similarly.

@Octachron
Copy link
Member

If I understand correctly the conclusion, since 4.13 the %identity primitive is optimized in such way that one can always replace

type ('a,'b) fn
external app: ('a,'b) fn -> 'a -> 'b = "%apply"

with

type ('a,'b) fn
type ('a,'b) ocaml_fn := 'a -> 'b
external app: ('a,'b) fn -> ('a,'b) ocaml_fn = "%identity"

I imagine that we could improve backward compatibility by providing both a %typed_apply and a %apply primitives with the old behavior. However, since we are speaking of an internal feature with an alternative implementation, the increase in complexity does not seem worthwhile and the issue can be closed.

@kit-ty-kate
Copy link
Member Author

I feel a bit uneasy about the conclusion (modifying code bases to change %apply to %identity when compiling with 4.13), it doesn't feel quite right. I'm concerned people outside of the opensource community might have code that does the same so even if it might be fine for the bap suite there might be other things that breaks because of this.

If everyone think it's not worth changing this right now, @ivg would it be possible to have a minor release of bap-core-theory (presumably 2.3.1) with this fix?

@ivg
Copy link
Member

ivg commented Aug 2, 2021

It is in the bap-bitvector package, but yes, I think we can add a patch file to the opam release with a fix.

@lpw25
Copy link
Contributor

lpw25 commented Aug 2, 2021

I agree that the conclusion is unsatisfying. I believe that no where else do we insist that a particular primitive be given a particular type. It would seem pretty easy to only have the weird new behaviour when the type matches the expected pattern, and then just fall back to the ordinary behaviour for typing a function application.

@Octachron
Copy link
Member

Octachron commented Aug 6, 2021

I have a tentative implementation in #10558 which only applies the new typing rule when the type of the primitives matches exactly the standard one. This fixes BAP compatibility issue without introducing new primitives, which seem nice. Are there any objections to the idea or implementation?

@ivg
Copy link
Member

ivg commented Aug 6, 2021

Thanks, Florian, this is awesome!

Octachron added a commit that referenced this issue Aug 24, 2021
#10450: restore support for %apply and %revapply with non-translucid types
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants