Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdlib: add List.is_empty #10464

Merged
merged 4 commits into from
Oct 31, 2022
Merged

stdlib: add List.is_empty #10464

merged 4 commits into from
Oct 31, 2022

Conversation

craigfe
Copy link
Contributor

@craigfe craigfe commented Jun 18, 2021

Adds a fairly self-explanatory function:

val is_empty : _ list -> bool

This can be found in both containers and base.

The rationale for proposing it is that I see it hand-rolled as List.length l = 0 relatively frequently in the wild, which is inefficient. It's possible to use l = [] instead, but I think a dedicated function would encourage users to fall into this more efficient pattern organically. (Plus, some codebases intentionally put barriers in the way of polymorphic equality.)

Notes on consistency:

  • we already have Map.is_empty;
  • there's an outstanding issue regarding {String,Bytes,Array}.is_empty, with the concensus being that these should be implemented in terms of compiler primitives;
  • there's perhaps a case to be made for Seq.is_empty too, though it does force an unused element. (Containers and Base provide it.)

@craigfe
Copy link
Contributor Author

craigfe commented Jun 18, 2021

Implementation switched to use structural equality rather than pattern matching (thanks to a hint by @dra27). It seems the compiler has an easier time eliminating the branch this way:

 |(seq
-|  (let (is_empty/80 = (function l/82 (if l/82 0a 1a)))
+|  (let (is_empty/80 = (function l/82 (== l/82 0a)))
 |    (setfield_ptr(root-init) 0 (global Test!) is_empty/80))
 |  0a)

@lthls
Copy link
Contributor

lthls commented Jun 18, 2021

Note: the structural equality version is better now, but only because the pattern-matching compiler generates sub-optimal code.
In an ideal world the pattern-matching case should compile to l & 1 (one bit comparison) while the equality will be compiled to l == 1 (full-length comparison).
(I believe that this is already the case in Flambda 2, and could be the case in the current compiler if a switch was generated instead of an if-then-else).

@raphael-proust
Copy link
Contributor

there's perhaps a case to be made for Seq.is_empty too, though it does force an unused element. (Containers and Base provide it.)

I would prefer not adding functions that force sequences.

@xavierleroy
Copy link
Contributor

With List.hd, List.tl, List.cons, and now List.is_empty, we can finally write true 1960's Lisp code!

let rec map f l =
  if List.is_empty l then [] else List.cons (f (List.hd l)) (map f (List.tl l)))

Oops, we still miss List.nil ! Quick, a PR !

More seriously: the one true way to work with lists in OCaml is pattern-matching. Every library function on lists that distracts from pattern-matching makes me sad.

@johnwhitington
Copy link
Contributor

johnwhitington commented Jun 18, 2021

I normally write (( = ) []), which is shorter. For example:

let all_empty = List.for_all (( = ) [])
let substantive = List.filter (( <> ) [])

Is that less efficient? Or in poor taste? Or perhaps just hard to read?

@craigfe
Copy link
Contributor Author

craigfe commented Jun 18, 2021

@johnwhitington: it is less efficient indeed, at least with non-flambda compilers. This is partly due to the allocation cost of the ( = ) [] closure, and partly because that closure prevents inlining. For optimal performance you should eta-expand to avoid the partial application.

module All_empty = struct
  let via_inline          l = List.for_all (( = ) []) l
  let via_inline_weak       = List.for_all (( = ) [])
  let via_inline_expanded l = List.for_all (fun x -> x = []) l
  let via_stdlib          l = List.for_all List.is_empty l
end

When benchmarked with ocamlopt.4.12.0 on the test case [ []; [ 1 ]; [ 1; 2 ] ], this gives:

❭ dune exec ./test.exe
╭─────────────────────────────┬───────────────────────────┬───────────────────────────╮
│name                         │  minor-allocated          │  monotonic-clock          │
├─────────────────────────────┼───────────────────────────┼───────────────────────────┤
│  all_empty/inline           │             5.0002 mnw/run│             19.2466 ns/run│
│  all_empty/inline_weak      │             0.0002 mnw/run│             17.7193 ns/run│
│  all_empty/inline_expanded  │             0.0001 mnw/run│              5.6662 ns/run│
│  all_empty/stdlib           │             0.0001 mnw/run│              5.8278 ns/run│
╰─────────────────────────────┴───────────────────────────┴───────────────────────────╯

You may be interested in https://blog.janestreet.com/the-dangers-of-being-too-partial/.

As mentioned above, my main concern w.r.t. performance is not that experienced users can't hand-roll an efficient implementation, but rather that too many people seem to reach for the very inefficient List.length l = 0 instead. (I've observed this very thing crippling performance in multiple cases in the wild.)

I do also happen to think there is a readability argument in favour of is_empty, but clearly others will disagree.

@dra27
Copy link
Member

dra27 commented Jun 18, 2021

FWIW, from my own teaching, I would say that the existence of hd, tl and most especially nth do much more to veer the inexperienced functional programmer from writing decent functions over lists than I would expect is_empty to.

@Octachron
Copy link
Member

Octachron commented Jun 18, 2021

I don't think it is fair to compare is_empty with nth, hd and tl. First, this is_empty function is total. It is even a morphism between ('a list, @) and (bool, &&), thus it does seem to respect the algebraic nature of lists.

@pmetzger
Copy link
Member

Obviously pattern matching handles most of one's daily needs (see the fact that few have complained that this was missing over the years), but I find myself writing similar things often enough that I think it wouldn't be bad to have this in the standard library. And as has been said, is_empty is total.

@alainfrisch
Copy link
Contributor

We already have Option.is_none. The addition seems quite coherent with it.

More seriously: the one true way to work with lists in OCaml is pattern-matching. Every library function on lists that distracts from pattern-matching makes me sad.

And since the one true way to work with recursive data structures are recursive functions, one should discourage the use of iterators like List.fold_left that distract from both recursive functions and pattern matching. 😜

@xavierleroy
Copy link
Contributor

I stand by my comment that the one true way to inspect lists, options, and any other value of inductive type is by pattern-matching.

If there's clear support for this PR, I'll shut up. Otherwise, I'll move to close.

@dbuenzli
Copy link
Contributor

I'd like to add my support then.

When predicates are needed I find the lack of such functions make the code noisy and less obvious than it could be; see for example @johnwhitington's examples.

Also I'd like to be able to write List.is_empty to test my assoc lists for emptiness. That brings them closer to Map.S.

@pmetzger
Copy link
Member

Indeed. To reiterate, sometimes, you just want to pass map or for_all something like is_empty . It's a useful total function in a number of contexts where you need to pass a function in. It's not intended to be used in if statements or the like. I agree that pattern matching is canonical there.

@mshinwell
Copy link
Contributor

I am in favour of adding this.

@lthls
Copy link
Contributor

lthls commented Oct 29, 2021

To follow up on my earlier comment: now that #10681 is merged, we are in the ideal world, so there's no reason to prefer equality to pattern-matching anymore (at least in native mode).

@pmetzger
Copy link
Member

@lthls I don't understand the relevance here. If you want to run map over a collection of lists to get a collection of booleans stating which are empty and which are not empty, you still need is_empty. It shows up in a number of similar circumstances.

@dra27
Copy link
Member

dra27 commented Oct 29, 2021

@pmetzger - the comment's about the implementation, rather than the need for the function. @craigfe originally gave:

let is_empty = function
  | [] -> true
  | _ :: _ -> false

which #10681 in ocamlopt the code compiled down to (amd64):

        cmpq    $1, %rax
        je      .L100
        movl    $1, %eax
        ret
        .align  4
.L100:
        movl    $3, %eax
        ret

vs

        cmpq    $1, %rax
        sete    %al
        movzbq  %al, %rax
        leaq    1(%rax,%rax), %rax
        ret

for the code presently in the PR. However, trunk (i.e. 4.14) now generates better code for the original pattern matching version:

        andq    $1, %rax
        leaq    1(%rax,%rax), %rax
        ret

@craigfe
Copy link
Contributor Author

craigfe commented Oct 29, 2021

Thanks @lthls for the pointer and @dra27 for the summary. I've switched back to the original pattern-matching implementation.

I'm delighted that the most idiomatic implementation is now also the most efficient one!

@pmetzger
Copy link
Member

Ah, indeed, I was confused about what thread of the conversation was being invoked.

Copy link
Member

@dra27 dra27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to merge with second approval.

@dbuenzli
Copy link
Contributor

I thought programming against trunk would give me that :–)

As of ocaml#10681, the native compiler is able to simplify this
pattern-matching to use bitwise operations rather than an explicit
branch (as was previously the case) or an integer comparison (as is the
case for the implementation using `( = )`).
Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally in Xavier's camp that pattern-matching is the nice way to inspect values. This being said, some of the arguments here are convincing, and I agree with the point of @dra27 in #11680 that it's too tempting for beginners to write List.length li = 0 instead. Finally, there are many other containers that provide this function, notably Stack, Queue, Seq, Set, Map, so the change arguably adds consistency. Approved.

stdlib/list.mli Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet