New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String suffix and prefix functions. #9533
Conversation
I support it! A couple of remarks:
|
I think for the labeled version it is okay. |
I think a label should be added in I usually use
I would rather not complexify the interface of these simple functions. Lowercasing your datum before testing seems fine. |
A related operation is to check for a given suffix/prefix and return the string with that suffix/prefix dropped (returning a |
Is it that important ? Once you have checked the affix with the predicate function on the (possibly lowercased) datum it's just a matter of trimming the original datum by the length of your affix. I rather have good cutting functions which are really lacking at the moment. |
By the same argument, checking for prefix/suffix is "as simple" as cutting the string and comparing for equality. Avoiding length- and index-related calculation (and also copying) on the call side is the whole point, I think. |
Not really. It doesn't feel very natural to have to cut a string for checking for prefix/suffix and there are corner cases you'll have to deal with (like the datum is shorter than the prefix) if you tell people to go that way -- these corner cases do not exist once you have checked for affixness and say, simply |
I added labels at least to the labeled version, I'm still unsure what the policy about labels for the non labeled modules so I left it out. Concerning the cutting functionality I have no strong opinion. |
stdlib/string.mli
Outdated
@@ -316,6 +316,14 @@ val equal: t -> t -> bool | |||
(** The equal function for strings. | |||
@since 4.03.0 *) | |||
|
|||
val is_prefix : t -> t -> bool | |||
(** [String.is_prefix s pre] tests if [pre] is a prefix of [s] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I'm concerned this is the wrong argument order. If I read String.is_prefix s0 s1
I expect the function to check that s0
is a prefix of s1
. It's also an inconvenient order for partial application.
Containers' is_prefix
has the order I suggest. Batteries and base do not have this order but I don't think it's a good idea to follow them on that one.
The same holds for is_suffix
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the current variant is consistent with the existing check_suffix
function from Filename
and consistent with the other search and check functions in the String
module which all take the string on which to perform the actions are as first argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need at least one label here if we want the function, so that the order cannot be confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the current variant is consistent with the existing
check_suffix
function fromFilename
I don't think there's any need for consistency with that one. The name is sufficently different.
String
module which all take the string on which to perform the actions are as first argument.
There's already split_on_char
that does not follow this. Following what you mention results in something that reads wrong if applied unlabelled, please don't do this. Code reading is more important than the consistency (which never really exists in OCaml) you are looking for.
It doesn't feel very natural to me to lowercase a whole string to check for a given (small) prefix in a case-insensitive way... As for the cutting "corner cases", one could provide cutting functions that don't fail (such as |
I'm in favor of using labels also for the "normal" versions (in |
At least it's a quick way to do it in a compositional way. Adding optional arguments for special casing the lowercase ASCII letters normalisation in a prefix checking function on bytes feels very ad-hoc to me. It's again one of these very limited things you can't use half of the time because you have other normalisations to perform like transforming dashes to underscores or actually need to support the Unicode repertoire in a meaningful way. As a result, what you get is API cruft. |
I believe we should drop the case-insensitivity idea, and agree to use a label in the standard version (let's be frank, no one uses the Personally I'm curious about the idea of having suffix-removing functions. It is natural to think of those functions as returning an augmented boolean, either a failure or the reduced string. So the general interface would be |
I added the label to |
I rebased my branch, added @dbuenzli and @gasche as reviewer (since both are mentioned by Github in the reviewer field) and reverted the change to the Windows Filename suffix, since the Version using the new |
Is there anything missing or can this be merged? |
I'm not sure I deeply stand behind this design but it still do like the regularity of the function signatures for certain higher order usages. This has not been discussed and since we will have to live with the functions forever it might be worth pondering. (Don't take this as me being against the merge @bschommer, I'll be as glad as you to eventually have these functions in the stdlib). |
I have no strong opinion on the signature of the two functions but the current version seems to me verbose enough to make clear how to use the functions. |
Verbosity was not my point. The point is uniformity of signatures for these kind of predicates while retaining reading clarity. |
Okay I understand the problem. A quick look at other languages shows that most prefer the name I would argue that having the suffix or prefix first makes more sense since it seems to me more common to pass such a function with the same prefix or suffix as predicate to for example |
Sorry, I have myself argued above that the affix should be first so I thought that was evident. I should have been more precise: I'm just questioning between having a uniform label ( Regarding |
Batteries has Concerning |
Thanks @bschommer. So to sum up here are mostly equaly sound alternatives to me: (* 1 *)
val contains : string -> string -> bool
val starts_with : string -> string -> bool
val ends_with : string -> string -> bool
(* 2 *)
val is_infix : affix:string -> string -> bool
val is_prefix : affix:string -> string -> bool
val is_suffix : affix:string -> string -> bool
(* 3 *)
val is_infix : infix:string -> string -> bool
val is_prefix : prefix:string -> string -> bool
val is_suffix : suffix:string -> string -> bool I'm a bit unsure and don't have a strong opinion but I may be leaning a bit towards 1. which despite the seemingly less regular naming scheme reads quite well in my opinion. |
I would prefer either 1 or 3, but have no strong opinion. Concerning the contains function, I would welcome the addition of such a function, but I afair there already was a PR for this and it was rejected since the function is already contained in some form in the |
I like the function names in 1, but I dislike the lack of labels to distinguish the argument. As a predicate, it is more natural to have the "affix" string as the first argument ( E.g.: val contains: pattern:string -> string -> bool
val starts_with: pattern:string -> string -> bool
val ends_with: pattern:string -> string -> bool |
|
I think it should be added is @alainfrisch against ?
Funny. We don't read the same way. When I read I don't mind 1) with labels but the more I think about it the more I'm in favour of uniform labelling. I was never really happy with my
(* 5 *)
val contains: needle:string -> string -> bool
val starts_with: needle:string -> string -> bool
val ends_with: needle:string -> string -> bool or if we drop the uniformity: (* 6 *)
val contains: needle:string -> string -> bool
val starts_with: prefix:string -> string -> bool
val ends_with: suffix:string -> string -> bool |
I'm not sure if one does not want to have a val contains: needle:string -> string -> int option Returning |
Well in fact |
I definitely agree with the naturalness of |
I renamed the functions to |
I rebased the PR to avoid the conflict with the Changes file and I think there has been some kind of agreement on the API, so from my side this seems ready to be merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR looks good to me, and indeed the discussion seems to have converged. Unless someone objects, I propose to merge by end of day today.
The functions test if the second argument is a prefix or suffix of the first argument.
Merged. Thank you very much for your efforts @bschommer! |
If someone once wants to add the infix version I'd suggest |
And rereading the whole thing I think maybe we should have gone with the val includes: sub:string -> string -> bool
val starts_with: sub:string -> string -> bool
val ends_with: sub:string -> string -> bool Which I personally find better than: val includes: sub:string -> string -> bool
val starts_with: prefix:string -> string -> bool
val ends_with: suffix:string -> string -> bool |
I'm not sure what the policy is here, but since these function are not officially release one still could change the signature. I agree that |
I don't think there's anything wrong with changing it. Precisely because it's not yet out in the wild. Once the latter occurs we'll be stuck with the names forever.
It may be a little bit less precise but I think that from a code reading perspective it works equally well to disambiguate which is which and is a bit more ergonomic for the code writer (a single consistent name to remember, enables higher-order usages).
|
@bschommer If you don't feel like it I'm not suggesting you do the work for |
Concerning |
let starts_with ~prefix s = | ||
let len_s = length s | ||
and len_pre = length prefix in | ||
let rec aux i = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this will allocate a closure
extlib uses the following implementation : https://github.com/ygrek/ocaml-extlib/blob/master/src/extString.ml#L43
ftr extlib uses |
While updating extlib for 4.12 I notice that for the people used to extlib order of arguments (and/or in the code that uses extlib) this change will definitely cause bugs (esp. given that "label omitted" warning is disabled by default). |
Is it possible to change to |
See my comment #10105 (comment) . Too late to change in 4.12, but perhaps it is still time to not include these functions in 4.12. |
As I mentioned in PR #9446 I think it would be a good idea to have a function to check for prefix and suffix in the String module. Both ocaml-batteries as well as Base contain such a function and the module
Filename
implements a simplified version itself for thecheck_suffix
method.I implemented a first simple version of these functions but there are still some open points:
starts_with
andends_with
and Base usesis_prefix
andis_suffix
, which I personally find better.Bytes
.Any opinions on this?