Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy.map #10097

Merged
merged 4 commits into from
Apr 14, 2021
Merged

Lazy.map #10097

Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
6 changes: 6 additions & 0 deletions Changes
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,12 @@ Working version
- #9582: Add Array.{find_opt,find_map,split,combine}.
(Nicolás Ojeda Bär, review by Daniel Bünzli and Gabriel Scherer)

- #10097: Lazy.map, Lazy.map_val: ('a -> 'b) -> 'a Lazy.t -> 'b Lazy.t
(map f x) is always (lazy (f (force x))), whereas (map_val f x)
applies f directly if x is already forced.
(Gabriel Scherer, review by Nicolás Ojeda Bär, Alain Frisch, Xavier Leroy,
Daniel Bünzli and Stephen Dolan)

* #10169, #10270: Use capitalized module names in the Standard Library prefixing
scheme to match Dune, e.g. Stdlib__String instead of Stdlib__string. This is a
breaking change only to code which attempted to use the internal names before.
Expand Down
11 changes: 9 additions & 2 deletions stdlib/lazy.ml
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,6 @@ external make_forward : 'a -> 'a lazy_t = "caml_lazy_make_forward"

external force : 'a t -> 'a = "%lazy_force"

(* let force = force *)

let force_val = CamlinternalLazy.force_val

Expand All @@ -64,7 +63,6 @@ let from_fun (f : unit -> 'arg) =
Obj.set_field x 0 (Obj.repr f);
(Obj.obj x : 'arg t)


let from_val (v : 'arg) =
let t = Obj.tag (Obj.repr v) in
if t = Obj.forward_tag || t = Obj.lazy_tag || t = Obj.double_tag then begin
Expand All @@ -81,3 +79,12 @@ let lazy_from_fun = from_fun
let lazy_from_val = from_val

let lazy_is_val = is_val


let map f x =
lazy (f (force x))

let map_val f x =
if is_val x
then lazy_from_val (f (force x))
else lazy (f (force x))
76 changes: 57 additions & 19 deletions stdlib/lazy.mli
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,6 @@ type 'a t = 'a CamlinternalLazy.t

exception Undefined

(* val force : 'a t -> 'a *)
external force : 'a t -> 'a = "%lazy_force"
(** [force x] forces the suspension [x] and returns its result.
If [x] has already been forced, [Lazy.force x] returns the
Expand All @@ -67,36 +66,75 @@ external force : 'a t -> 'a = "%lazy_force"
recursively.
*)

val force_val : 'a t -> 'a
(** [force_val x] forces the suspension [x] and returns its
result. If [x] has already been forced, [force_val x]
returns the same value again without recomputing it.
(** {1 Iterators} *)

If the computation of [x] raises an exception, it is unspecified
whether [force_val x] raises the same exception or {!Undefined}.
@raise Undefined if the forcing of [x] tries to force [x] itself
recursively.
val map : ('a -> 'b) -> 'a t -> 'b t
(** [map f x] returns a suspension that, when forced,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another useful variant would apply f eagerly if the suspension had already been forced (particularly useful when f is very cheap, to avoid building a useless closure and possibly keeping x alive for too long). One could even argue that this function would be a more natural map over lazy values, seen as (mutable) sum types with two constructors (forced value or lazy thunk). Should we provide both variants (or just one function with an optional flag)? I'm not strongly asking for it, just wanted to raise the point in case this affects the naming choice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this optimization makes sense (but in the same function, no need for a separate function or an extra argument). The docstring should be amended to indicate that f could be executed right away if the lazy has already been forced.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting optimization but its semantics is weird, so it should definitely not be the default behavior. By "weird semantics", I mean it's neither lazy nor strict. The lazy behavior is

let map f x = lazy (f (force x))

as in this PR, and the strict behavior is

let map f x = f (force x)

In by-the-book functional programming, there is no other option! The optimized map proposed by @alainfrisch can only be defined if primitives like Lazy.is_val are provided, which again is not by-the-book functional programming.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @xavierleroy that this definitely shouldn't be the default. If f is expensive and the resulting lazy is not forced, then the optimisation can do unbounded unnecessary work, precisely what the user is using Lazy to avoid.

a more natural map over lazy values, seen as (mutable) sum types with two constructors (forced value or lazy thunk).

This is indeed the natural map over sums with two constructors, but I see that as an implementation detail of Lazy rather than a spec. The natural map over computations-evaluated-only-if-forced is @gasche's, because it has:

    map f (lazy e) == lazy (f e)

where == is observational equivalence (assuming you don't get to observe with the "advanced" parts of Lazy). By constrast, the optimised sometimes-eager version:

    ignore (map print_int (lazy (1 + 1)))

makes it observable whether the compiler has performed constant-folding.

forces [x] and applies [f] to its value.

It is equivalent to [lazy (f (Lazy.force x))].

@since 4.13.0
*)

(** {1 Reasoning on already-forced suspensions} *)
gasche marked this conversation as resolved.
Show resolved Hide resolved

val is_val : 'a t -> bool
(** [is_val x] returns [true] if [x] has already been forced and
did not raise an exception.
@since 4.00.0 *)

val from_val : 'a -> 'a t
(** [from_val v] evaluates [v] first (as any function would) and returns
an already-forced suspension of its result.
It is the same as [let x = v in lazy x], but uses dynamic tests
to optimize suspension creation in some cases.
@since 4.00.0 *)

val map_val : ('a -> 'b) -> 'a t -> 'b t
(** [map_val f x] applies [f] directly if [x] is already forced,
otherwise it behaves as [map f x].

When [x] is already forced, this behavior saves the construction of
a suspension, but on the other hand it performs more work eagerly
that may not be useful if you never force the function result.
gasche marked this conversation as resolved.
Show resolved Hide resolved

If [f] raises an exception, it will be raised immediately when
[is_val x], or raised only when forcing the thunk otherwise.

If [map_val f x] does not raise an exception, then
[is_val (map_val f x)] is equal to [is_val x].

@since 4.13.0 *)


(** {1 Advanced}

The following definitions are for advanced uses only; they require
familiary with the lazy compilation scheme to be used appropriately. *)

val from_fun : (unit -> 'a) -> 'a t
(** [from_fun f] is the same as [lazy (f ())] but slightly more efficient.

[from_fun] should only be used if the function [f] is already defined.
It should only be used if the function [f] is already defined.
In particular it is always less efficient to write
[from_fun (fun () -> expr)] than [lazy expr].

@since 4.00.0 *)

val from_val : 'a -> 'a t
(** [from_val v] returns an already-forced suspension of [v].
This is for special purposes only and should not be confused with
[lazy (v)].
@since 4.00.0 *)
val force_val : 'a t -> 'a
(** [force_val x] forces the suspension [x] and returns its
result. If [x] has already been forced, [force_val x]
returns the same value again without recomputing it.

val is_val : 'a t -> bool
(** [is_val x] returns [true] if [x] has already been forced and
did not raise an exception.
@since 4.00.0 *)
If the computation of [x] raises an exception, it is unspecified
whether [force_val x] raises the same exception or {!Undefined}.
@raise Undefined if the forcing of [x] tries to force [x] itself
recursively.
Comment on lines +130 to +133
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring notes some unspecified behaviour, and then specifies it in the next sentence. It seems fairly clear under what circumstances force_val will raise Undefined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the two sentences are talking about different things: (1) what happens if the suspended computation raises an exception (say lazy (raise Exit)), and (2) what happens if the thunk tries to force itself. In the second case it (claims that it) raises Undefined, in the first case it may raise Undefined or re-raise the exception.

In any case, this is unrelated to the change in this PR, so I would propose to deal with any potential change proposals here in a separate issue/PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, that's clearer. I guess I don't understand under what circumstances it's ever safe to raise an exception from within something forced by this function then, as it always has the opportunity to gobble the exception and raise a Undefined instead. Give that, how can an exception ever safely be thrown?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(There's the case of a non-recursive function for which you can inspect all possible exceptions, but then we already know that it can never raised Undefined from the next sentence. But the specification of force_val retains the possibility of raising an Undefined even in that case).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this force_val function is intended to be used for thunks that we know are already forced (for example by checking is_val first), and those thunks will not raise an exception. It's in the Advanced section, so probably not intended for wide usage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't tell about the original intent, but force_val is at most no better than force for already forced chunks (it's usually less efficient because force is a compiler primitive whereas force_val isn't).
It's only marginally better for not-yet-forced thunks because it skips one try-with handler, and unless AFL instrumentation is on (that prevents some optimisations from triggering) it probably still ends up slower anyway.

*)


(** {1 Deprecated} *)

val lazy_from_fun : (unit -> 'a) -> 'a t
[@@ocaml.deprecated "Use Lazy.from_fun instead."]
Expand Down
57 changes: 57 additions & 0 deletions testsuite/tests/lib-lazy/test.ml
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
(* TEST
* expect
*)

(* expect-tests currently do not collect I/O,
so we emulate I/O by collecting output in a "log" *)
let logger () =
let log = ref [] in
let show_log v = List.rev !log, v in
let log v = log := v :: !log in
log, show_log
[%%expect{|
val logger : unit -> ('a -> unit) * ('b -> 'a list * 'b) = <fun>
|}]

let _ =
let log, show_log = logger () in
let x = lazy (log "x"; 41) in
let y =
log "map";
Lazy.map (fun n -> log "y"; n+1) x in
log "force y";
show_log (Lazy.force y)
;;
[%%expect{|
- : string list * int = (["map"; "force y"; "x"; "y"], 42)
|}]

let _ =
let log, show_log = logger () in
let x = lazy (log "x"; 41) in
let y =
log "map_val";
Lazy.map_val (fun n -> log "y"; n+1) x in
assert (not (Lazy.is_val y));
log "force y";
show_log (Lazy.force y)
;;
[%%expect{|
- : string list * int = (["map_val"; "force y"; "x"; "y"], 42)
|}]

let _ =
let log, show_log = logger () in
let x = lazy (log "x"; 41) in
log "force x";
let () = ignore (Lazy.force x) in
let y =
log "map_val";
Lazy.map_val (fun n -> log "y"; n+1) x in
assert (Lazy.is_val y);
log "y is val";
show_log (Lazy.force y)
;;
[%%expect{|
- : string list * int = (["force x"; "x"; "map_val"; "y"; "y is val"], 42)
|}]