-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Seq.range #10480
base: trunk
Are you sure you want to change the base?
Seq.range #10480
Conversation
(should probably add only one function per PR)
xref #9386 |
Hi! While I am in favor of this function addition, I'm not really convinced by the current function signature : I think it is not very much in the OCaml spirit to have optional arguments, and as you brought out, they are rare in the stdlib. What's more the other implementation suggest to have at least For the case of
|
Hi !
Regarding Regarding
I think having that argument is important because it allows for backward ranges, but I don’t know if it is frequent enough to justify having it as a positional argument What do you think about the following signatures (similar to the Base signature, except for the ommited Seq.range: ?step:int -> int -> int -> int Seq.t
Seq.inf_range: ?step:int -> int -> int Seq.t |
Looks good :-) |
For the record: I'm not a fan of
|
What do you think about
|
Let me come back to your remark
First, this isn't quite right: see Second, that's because many functions that create sequences are in other modules, e.g. To be consistent, maybe your |
EDIT :This part of my comment is essentially wrong, see yallop's comment below That’s true. I think this part of my message wasn’t very precise, so let me attempt to clarify it
These are already very useful because they allow to :
But, what I wanted to say by pointing that out, is that in my opinion, none of those ways (except of course for the constructor) make use of the full potential of
This is what I meant by
I hope it is clearer now ^^
Good point, it is true that this implementation is specific to the CCInt.range: t -> t -> t iter Personally, I think it would be a bit confusing. The Moreover, I am not sure this would make the naming more consistent : the Additionally, in other parts of the List.init : int -> (int -> 'a) -> 'a list Another solution would be perhaps to
For example we could provide the function (I’m not sure how to name it, repeat is a very bad name) Seq.repeat: ('a -> 'a) -> 'a -> 'a t such that This would make it easy to recreate the infinite range for e.g. let int32_inf_range ?(step=Int32.one) start = Seq.repeat (Int32.add step) start But also for, for example, for Zarith integers let z_inf_range ?(step=Z.one) start = Seq.repeat (Z.add step) start For the finite range, we could similarly make a finite version of What do you think? |
You should take a look at # let range ~start ~stop ~step =
Seq.unfold (fun i -> if i >= stop then None else Some (i, i+step)) start;;
val range : start:int -> stop:int -> step:int -> int Seq.t = <fun>
# List.of_seq (range ~start:10 ~stop:20 ~step:2);;
- : int list = [10; 12; 14; 16; 18] or like this: # let repeat f x = Seq.unfold (fun x -> Some (x, f x)) x;;
val repeat : ('a -> 'a) -> 'a -> 'a Seq.t = <fun> |
Seems like I have read the docs too fast, I wasn’t aware of Also it makes my earlier statement about the methods for creating |
Well as a recap, before I change the interface, we have two open questions:
(caveats for each are discussed above) I'm personnally more in favor of |
Doesn't it make more sense to
I think this way there is very little surface for argumentation. The end of the I would say this function belongs to either |
OK I split the function into Seq.range : ?step:int -> int -> int -> int Seq.t
Seq.count_from : ?step:int -> int -> int Seq.t As per your recommendations. |
Your last commit makes change in part of the code that is unrelated to this PR, I think you should revert it. |
This reverts commit 9cf0252.
Yes, I ran ocp-indent on the code since it didn’t pass the checks, but I didn’t think about the side effects… I’m reverting it |
I looked at Sequel and oseq codes,
In conclusion, if #9312 is happening soon, we should decide
|
Nitpick: Oseq has |
Just curious, do you have any good justification for that ? Two things that come to mind is that exclusive doesn't allow you to enumerate In another context, most people have been arguing aswell for exclusive bounds, mainly from a proof perspective. But from an ergonomic/code reading perspective I have yet to be convinced. E.g. it seems to me that the Besides the Dijsktra note I mentioned there gives basically three arguments in favor of exclusive, one of which is purely esthetical ("that is ugly") and somehow moot if your datatype allows negative numbers anyway, and the two others are these The PL world seems definitively undecided; and so I am so far. |
I don't have an elaborate argument right now, but :
- it works well for rust (for both iterators and array slices). Ergonomic and nice.
- exclusive ranges mean you can have empty ranges. With inclusive ranges it's awkward (or impossible if you infer the direction from the bounds). It feels wrong to not have empty ranges; and something I don't like with ocaml's for loops.
- the +1 thing I assume refers to consecutive ranges, where the start of the next needs to be n+1 from the previous one if one uses inclusive ranges. I think it's like indexing from 1 in arrays : not wrong, just not convincingly better in my eyes 🤷♀️
|
Sorry, I ctrl-f'd too fast For arguments in favor of exclusive range, I'd add that
|
That's not really compelling… I'm sure most programmers using
You didn't mention why inclusive is ergonomic. I don't find it ergonomic because it's not WYSIWYG. You constantly need interpretative labor (in the form of subtracting 1 to the upper bound) when you read code to understand what you really get. In the particular case of array indexing which is prone to off-by-one errors I would rather shoot for the solution that needs less mental labor from the code reader. Besides, if we need to resort to aesthetics arguments, it puts your upper bound outside the indexing domain of your array which I don't find particularly nice, if not an impedance mismatch.
With inclusive you get empty ranges when your upper bound is strictly smaller than the lower one (e.g. It seems that with inclusive you don't get to express an empty range at But one could argue that not being able to enumerate
The I'm yet to be convinced that using inclusive bounds leads to more So given that I'm mostly presented with what seems to me aesthetics arguments in favor of exclusive, I'm so far still much more drawn to an inclusive bound convention as it needs less interpretative labor – but then I'm a lazy person and pretty bad at indexing puzzles. |
Yes that’s where the tradeoff resides:
In conclusion the situation is like that:
It follows, an important question is: is it better to specify the bounds, or the length ?. Honestly from my experience, I’ve had to specify the length / number of elements way more often. Especially, since I’ve been using them in python, I find it quite rare to have to write I’ve made this comparative array of my experience and what was said above. It contains, for each cases, said advantages, as well as in which cases they are better.
|
It's definitely tied to habits and aesthetics. As someone used to 0-based indexing and arrays, it's just more consistent to me to have ranges be consistent with array indexing and slices indexing.
|
Since the start element is not an optional element in your current proposal I don't really see what's the problem with using So I find some of the advantages for exclusive in your table quite a bit disputable, notably the one about repeating task. And as for iterating over datastructures you will likely rather invoke an Regarding indexing/slicing maybe we are a bit OT. But I still don't see what exclusive bounds makes particularly "more consistent" with For indexed datastructures it's interesting to ponder a bit on trying to caracterize the datatype of your function arguments. With exclusive bounds you need to introduce a new datatype/concept to be able to talk about the behaviour of functions in a meaningful way: see the discussion the about indices and positions of a string in the String module which in my opinion makes the functions of the whole string module much more complicated to understand than it could be – I like my occam razor too :-) |
Regarding indexing/slicing maybe we are a bit OT. But I still don't see what exclusive bounds makes particularly "more consistent" with `0`-based indexing/slicing. Personally I find that in my indexed data structures chopping tasks I'm much more interested in specifying bounds, that is which indices are going to be part of my results.
We are not OT, it also corresponds to List.init/Array.init behavior!
`Array.init n f` is indeed the range `0 .. n` where `n` is excluded. :-)
|
And I answered too quickly, but: slices are very much a thing in OCaml,
even if implicitly. Consider the arguments of Array.sub, Bytes.blit, etc.
They are all length-based, with 0 based indexing. This means that the
range `i .. i+len` describes directly the slice you get from
`Array.sub arr i len`, for example.
|
I think in the absence of a compelling case (and there's not going to be an overly compelling case one way or another), we want to aim for consistency if possible, and if not, then consistency with the majority of cases. It's far more important that the user build up an expectation of how things work than deciding between several fairly similar designs. If ranges are exclusive in the API (and I think they are), then they should stay that way. Of course, the one exception I can think of is baked into the language itself in the |
As I mentioned when I tried to introduce In most chopping tasks you chunks get caracterized by indices (results of finding functions), having to convert them to lengths just introduces more opportunities for off-by-one errors. |
That's only one specific use case though. A lot of other tasks are more amenable to offset + length, in my experience, even the most basic reading/writing with |
Many functions over sequences were just introduced into 4.14, including |
For my reference, this was #10583.
While using 4.14, I find myself reimplementing |
Hi! This is my first PR (ever in an open-source project), so please be patient with me.
Adds
(I am not sure about the interface, see interface)
What does it do?
This is the equivalent of
range
from python, oriota
from lisp / C...It builds a sequence containing a range of integers, starting at
start
(default 0), ending atend
not included (default: infinite), and doing steps ofstep
.More formally, it creates a$s_i$ so that
$$s_i = \text{start} + \text{step} \times i$$
Seq.t
representing the sequenceand so that
stop
is given, that sequence is finite, and contains only the elements that are in the segmentstop
is not given, the sequence is finiteWhy?
I love the concept of the
Seq.t
type, I use python a lot, and I would love to be able to do all that python iterators allow me to do in ocaml, but there is a slight problem with that:The
Seq
module doesn’t have a lot of primitives (it’s got 14 functions. Compare withList
that has 62 (thanks%s/^val//n
)In particular there isn’t any function in that module to create
seq
. Though there are conversions from other types, that means that there is no function for creating infinite sequences, or to build sequences as we traverse themI learnt (from the
stdlib/CONTRIBUTING
file) that anybody could contribute to the stdlib, so here I am.Interface
I’m not very sure about this interface, mainly because all arguments are optional, and optional arguments are rare in the stdlib. (though there are functions that have a lot of optional arguments, for example
Filename.Unix.quote_command
)The name and arguments are mainly inspired from the python
range
function. The fact that they’re named makes them self-explanatory.The Batteries implementation (in BatEnum) is the following
The differences are that the
start
is positional, and there is nostep
. I think removingstep
has no advantage since it is an optional argument anyway.The Containers one is
there is no
step
and all arguments are positional. And stop is included by default (but there is also a--^
operator that doesn’t include it)Janestreet Base has:
It is very clear and self-explanatory (especially the
\
inclusive/
`exclusive` make the function very clear), but because of that it has a lot of arguments, so the interface may be a bit harder to rememberIn conclusion, other sensible options include
stop
a positional argument (like in python, or in BatEnum)--
operator like in ContainersRemarks
'a Seq.t
is actually a shorthand forunit ->'a Seq.node
, this means this function is actually of typeThat explains why there is no (technical) need to have a non-optional arguments here.
Other python iterator functions that would be interesting to have in
Seq
and that I would like to implement
Seq.init
(likeList.init
) though I’m not sure this is necessary withSeq.range
itertools.product
, that calculates the cartesian product betweens two sequences (and returns a sequence of that product in alphabetic order)Seq.map2
(likeList.map2
), or aSeq.zip : 'a Seq.t -> 'b Seq.t -> 'a*'b Seq.t
likepython.zip
(one would be easily derivable from the other)list
but not forseq
such asfor_all
,reduce
,mem
,find
…