Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Array.{find_opt,find_map,split,combine} #9582

Merged
merged 3 commits into from Nov 28, 2020
Merged

Conversation

nojb
Copy link
Contributor

@nojb nojb commented May 20, 2020

What it says on the label.

@nojb nojb changed the title Add Array.find_{opt,map} Add Array.{find_opt,find_map,split,combine} May 20, 2020
Copy link
Contributor

@dbuenzli dbuenzli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for the lack of changes entry, nothing to say.

@nojb
Copy link
Contributor Author

nojb commented May 20, 2020

Except for the lack of changes entry, nothing to say.

Thanks! Changes entry added. Planning to merge once CI passes.

@nojb
Copy link
Contributor Author

nojb commented May 20, 2020

Added some tests for good measure.

@gasche
Copy link
Member

gasche commented May 20, 2020

For arrays I find it natural to also have versions that return the index at which the element was found: findi : (int -> 'a -> bool) -> 'a array -> int * 'a. (But Extlib and Batteries have a findi function with the much less natural type ('a -> bool) -> 'a array -> int.)

@nojb
Copy link
Contributor Author

nojb commented May 20, 2020

For arrays I find it natural to also have versions that return the index at which the element was found: findi : (int -> 'a -> bool) -> 'a array -> int * 'a. (But Extlib and Batteries have a findi function with the much less natural type ('a -> bool) -> 'a array -> int.)

A more general function is findi_map: (int -> 'a -> 'b option) -> 'a array -> 'b option. What do you think? In that case we should also add findi_opt: (int -> 'a -> bool) -> 'a array -> 'a option.

@gasche
Copy link
Member

gasche commented May 20, 2020

(I think that even findi_opt could have return type (int * 'b) option, even though your version can express it.)

@dbuenzli
Copy link
Contributor

I was a bit obsessed about matching List functions but we could also go for an entirely different design which would be to add:

val find_index_left : ?start:int -> (int -> 'a -> bool) -> 'a t -> int option
val find_index_right : ?start:int -> (int -> 'a -> bool) -> 'a t -> int option
val find_map_left : ?start:int -> (int -> 'a -> 'b option)  -> 'a t -> 'b option
val find_map_right : ?start:int -> (int -> 'a -> 'b option) -> 'a t -> 'b option

The first two avoid an allocation if you are only interested in getting the value (you index the array with the given int, rather than having the tuple index/value being created).

@gasche
Copy link
Member

gasche commented May 20, 2020

I think it is apparent now that I opened a can of design worms with my small point. I don't want to make @nojb's life harder than it should be, and maybe we can discuss those in a separate PR (or do nothing at all about these points). This should be fine as long as my proposal and, for example, @dbuenzli's, both have what is currently proposed as a common subset. Do we agree that this is the case? (Daniel, does your different API suggest not having the functions currently proposed?)

@dbuenzli
Copy link
Contributor

I'm in general not very fond of having a bazillion of different functions but maybe it doesn't hurt to have the minimal set of functions proposed by this PR on the ground of List API compability/uniformity.

@alainfrisch
Copy link
Contributor

@dbuenzli : in your proposal, I guess that start is the index at which to start the search. Any reason not to include a stop index (or a len) ? What about exposing a single function with the two directions (default is left-to-right, an optional flag switches to right-to-left)?

@alainfrisch
Copy link
Contributor

Whatever functions are being added (and once the discussion has settled, of course), I would argue to add the same functions to Bytes and Float.Array, our two specialized array types.

@nojb
Copy link
Contributor Author

nojb commented May 20, 2020

I think it is apparent now that I opened a can of design worms with my small point. I don't want to make @nojb's life harder than it should be, and maybe we can discuss those in a separate PR (or do nothing at all about these points).

I think it makes sense to take some time to discuss the API for "find" functions on arrays.

Also, by the way, Alain pointed out that I had jumped the gun by getting ready to merge without having received an approval from another maintainer, so let us take a step back and take some time to arrive at a good design.

@dbuenzli
Copy link
Contributor

@dbuenzli : in your proposal, I guess that start is the index at which to start the search. Any reason not to include a stop index (or a len) ?

The perspective of this function is to be able to restart a search after you found an element so that you can collect all of them in the array (i.e. make it easy to compose invocations to get your own "find_all").

Of course you an always add more toggles to functions but in the end I prefer functions without too many optional features. Note that to limit the search you can do that with find_map and having it return a Some _ with more metadata.

What about exposing a single function with the two directions (default is left-to-right, an optional flag switches to right-to-left)?

Here again in the end I don't like to add to many toggles to functions. I made a few designs in the past with optional rev arguments to invert the search direction but then switched back to have explicit direction following fold_{left,right} and I think I was generally happier with the result.

@gasche
Copy link
Member

gasche commented May 20, 2020

In the case of find functions, another option would be to have nice find functions in Seq.t and convert to a sequence before finding the element. (Do array offer both 'a Seq.t and (int * 'a) Seq.t conversions, like maps or hashtables?)

@bluddy
Copy link
Contributor

bluddy commented May 20, 2020

(Do array offer both 'a Seq.t and (int * 'a) Seq.t conversions, like maps or hashtables?)

Yes. I think this is a great idea, especially if we get functions like drop and take to replace the need for a start_idx.

The perspective of this function is to be able to restart a search after you found an element so that you can collect all of them in the array (i.e. make it easy to compose invocations to get your own "find_all").

Isn't that just a fold_left? At a certain point, you can just use a fold_right/left with early termination. fold_while is more useful for this and is more general purpose.

@dbuenzli
Copy link
Contributor

In the case of find functions, another option would be to have nice find functions in Seq.t and convert to a sequence before finding the element

This adds a lot of allocations to find something.

Isn't that just a fold_left?

The control flow is inverted and you can stop before. But yes everything is a fold.

@nojb
Copy link
Contributor Author

nojb commented May 26, 2020

I gave this some more thought, and I like the following two functions:

val find_opt: ?start:int -> ?len:int -> ('a -> bool) -> 'a array -> int option
val find_map: ?start:int -> ?len:int -> ('a -> 'b option) -> 'a array -> 'b option

(I still haven't decided if having "left"/"right" versions or a ?rev:bool argument is better though.)

What do you think?

@nojb
Copy link
Contributor Author

nojb commented May 26, 2020

By the way, if you are using arrays chances are that you care about the compact layout it provides. Going through a Seq.t to search for an element does not strike me as coherent with that.

@nojb
Copy link
Contributor Author

nojb commented May 26, 2020

(Sorry I had missed part of the discussion and I see my suggestion is very close to what is proposed in this comment)

@gasche
Copy link
Member

gasche commented May 26, 2020

In Batteries we have a type Substring defined internally as string * offset:int * length:int (and advertised as having constant-time creation), on which many substring-relevant functions are defined (index{,_from}, take/drop/split, fold, iter, etc.). For the user-facing API I think that this is a more elegant solution than adding ?start:int -> ?len:int -> parameters to all relevant functions. (Internally you probably want segment-parametrized functions to be able to implement both the Array function and the Array.Sub function of the same name.)

@nojb
Copy link
Contributor Author

nojb commented May 26, 2020

In Batteries we have a type Substring defined internally as string * offset:int * length:int (and advertised as having constant-time creation), on which many substring-relevant functions are defined (index{,_from}, take/drop/split, fold, iter, etc.). For the user-facing API I think that this is a more elegant solution than adding ?start:int -> ?len:int -> parameters to all relevant functions. (Internally you probably want segment-parametrized functions to be able to implement both the Array function and the Array.Sub function of the same name.)

OK, according to this, if we add "find" function to Array itself, they should act on the whole array, something like:

val find_opt_{left,right}: 'a array -> (int -> 'a -> bool) -> int option
val find_map_{left,right}: 'a array -> (int -> 'a -> 'b option) -> 'b option

What do you think? Would these functions be worthy of addition, even if not the most general ones?

@gasche
Copy link
Member

gasche commented May 26, 2020

Yes, I think they are fine. But then maybe I am being unfair to optional parameters, and we should really consider having them for all functions? (I'm not a very good standard-library-API designer.)

(Consistently with fold_{left,right}, find_leftis for a find that goes right, andfind_right` goes left.)

@nojb nojb added the stdlib label Jun 10, 2020
@nojb
Copy link
Contributor Author

nojb commented Jun 12, 2020

PR stalled without clear consensus. I'm leaning towards closing it, but I'm wondering if it is worth putting Array.{split,combine} (which seem uncontroversial) in a different PR. They are not very commonly used functions anyway.

@gasche
Copy link
Member

gasche commented Jun 12, 2020

I looked at the patch again: personally I find the proposed API (which is exactly the one of List) very reasonable. We diverged when we tried to propose finer-grained APIs for traversing only a sub-array, but if we forget about this, I like the functions that are actually in the PR. They also received an approval from @dbuenzli, which is very high bar for a Stdlib change. So maybe we could decide that we like find_{opt,map} as they are, and try to get the PR merged as-is?

I generally follow the principle of not approving Stdlib PRs myself. Here we have somewhat of an approval resource starvation, because the most active stdlib maintainer is @nojb, and it's his turn to propose something. Maybe some other maintainer would like to chime in?

@nojb
Copy link
Contributor Author

nojb commented Nov 26, 2020

What do we do with this? Anyone willing to voice a vote either way? If no-one chimes in, we will close it.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an exception to my own rule, here is an approval stamp.

@nojb
Copy link
Contributor Author

nojb commented Nov 27, 2020

As an exception to my own rule, here is an approval stamp.

Thanks! I rebased the PR.

@nojb
Copy link
Contributor Author

nojb commented Nov 27, 2020

Both CI errors (full-flambda and AppVeyor) seem unrelated...

@nojb nojb closed this Nov 27, 2020
@nojb nojb reopened this Nov 27, 2020
@nojb nojb merged commit 56c4fca into ocaml:trunk Nov 28, 2020
@nojb nojb deleted the array_find branch November 28, 2020 08:13
dbuenzli pushed a commit to dbuenzli/ocaml that referenced this pull request Mar 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants