Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Sequence.first #1139

Open
Deijin27 opened this issue Feb 15, 2023 · 42 comments
Open

[Feature] Sequence.first #1139

Deijin27 opened this issue Feb 15, 2023 · 42 comments

Comments

@Deijin27
Copy link

I really like the fluent sequence api that wren has.
But I feel like a method to extract a single item from a sequence is missing.

Possible implementation

var lst = [6, 7, 8, 9]
lst.first // 6
lst.skip(1).first // 7
lst.first {|n| n > 7 } // 8
lst.first {|n| n > 10 } // null
lst.skip(4).first // null
  first {
    for (element in this) {
      return element
    }
    return null
  }

  first(f) {
    for (element in this) {
      if (f.call(element)) {
        return element
      }
    }
    return null
  }

Do people think this would be a good addition?

@ruby0x1
Copy link
Member

ruby0x1 commented Feb 15, 2023

I've often thought about it, the interesting question is the lack of symmetry with last, wouldn't it feel weird without?

@Deijin27
Copy link
Author

Yeah. While it's much less common, last would be nice too.

The implementation is a bit more tricky though. For a sequence that can be indexed like List, you would want to prefer that to get the value more efficiently.

@mhermier
Copy link
Contributor

mhermier commented Feb 16, 2023 via email

@Deijin27
Copy link
Author

Ah okay, the sequence[index] meets what I want, just to get a single item from a sequence. Is it just not added yet then? I'm not sure exactly what I'm looking for, but I couldnt see any PRs.

I just tested in wren_cli its not in 0.4.0.

@PureFox48
Copy link
Contributor

Sequences aren't in general indexable (for example Range isn't) so you have to convert to a list first:

var lst = [6, 7, 8, 9]
var first = lst.where { |e| e > 6 }.toList[0]
System.print(first) // 7

The trouble with this is that it's not very efficient. You have to iterate through the entire sequence to find out which elements satisfy the predicate, add those to a new list and then extract the first one.

Fot this reason, I would be in favor of your proposal as you only need to iterate the sequence until you find an element which satisfies the predicate and then return that directly.

@mhermier
Copy link
Contributor

mhermier commented Feb 16, 2023 via email

@PureFox48
Copy link
Contributor

Also if first was added, what about second, third...
Also that naming implies ordering which is not implied by the proposed
implementation (which would produce "invalid" results for maps).

Although you have a point about 'first' implying ordering, in my experience languages which have similar fluent APIs to Wren (C#, Kotlin etc.) do still use the word 'first' in the sense that it returns the first element in the iteration which satisfies the predicate. How useful this is will depend on the nature of the sequence though I agree that it may be useless for maps whose iteration order is undefined unless you simply want to extract some random key/value pair which satisfies the predicate.

Also those languages do generally have a 'last' method as well though I can't think of one which has second, third etc. Although you'd need to iterate through the whole sequence to find the last element it would still be more efficient than 'where' because you wouldn't need to do .toList[-1] to extract it.

Possible implementations for Wren:

  last {
    var final = null
    for (element in this) {
       final = element
    }
    return final
  }

  last(a, f) {
    var final = null
    for (element in this) {
      if (f.call(element)) {
        final = element
      }
    }
    return final
  }

@jube
Copy link

jube commented Feb 16, 2023

In C++, it's called front/back instead of first/last.

@mhermier
Copy link
Contributor

@PureFox48 These implementations are O(N). O(1) can be achieved in most case with a reverse().first(), since most reverse() could be implemented lazily. It would also help to guarantee that we don't enter an infinite loop for non terminated Sequences.

@PureFox48
Copy link
Contributor

Yes, I wondered about that - I have a reverse iterator in my own modules, though we don't have one as yet in the core library.

front/back would be reasonable alternatives to first/last in the overloads which don't have a predicate (more or less the C++ usage) but I'm not sure they sound quite right where there is a predicate.

@mhermier
Copy link
Contributor

first(unaryPredicate) really sounds like find(unaryPredicate) returning a value instead of an iterator. So maybe it should be something like findValue(unaryPredicate) instead?

@PureFox48
Copy link
Contributor

I'd avoided suggesting find remembering that you have a PR (#898) open for Sequence.find to return an iterator.

But we could perhaps use findFirst and findLast to return the first and last values.

@jube
Copy link

jube commented Feb 16, 2023

first(predicate) is useless, it's just where(predicate).first

@PureFox48
Copy link
Contributor

You could implement it like that but it would produce an intermediate WhereSequence which @Deijin27's suggested implementation does not do.

On the other hand it would add less code to core and front/back would be fine.

@Deijin27
Copy link
Author

I don't mind what it's called, it's the functionality that matters.

A second or third wouldn't be necessary, as you can combine it with skip(1) skip(2) for this rare case.

I've had a think about my use case, and general use cases in any language
First I'll give some common ones

// if a matching item is found, handle it, otherwise return
// handling it in the loop often leads to pyramid code, so it's common to have
// the 'failures' return, and the 'successful' path be the last thing in the method
// if you need a loop, even the where feels unnecessary when it could be replaced with an if
// the example further down is probably a better case to convey why you might 
// only want or expect a single matching item
// but it's also reasonable that one might want any singlular Frog
var foundFrog = null
for (frog in amphibians.where {|amphibian| amphibian is Frog }) {
  foundFrog = frog
  break
}
if (foundFrog == null) return
handleFrog(firstFrog)

var firstFrog = amphibians.first {|amphibian| amphibian is Frog }
if (foundFrog == null) return
handleFrog(firstFrog)

// you have a list which could be empty, you want
// a single item, in this example to be a default "selected" item
// the indexer throws an exception if the list is empty,
// so you have less simple code
var selectedAmphibian = null
if (amphibians.count > 0) {
   selectedAmphibian = amphibians[0]
}

var selectedAmphibian = amphibians.first

// find an item, where a property is generally unique to an item
// but it's a property on that item, rather than being enforced via
// a map. Or maybe it's multiple items that are unique as a whole. 
// Consider an collection with firstName and surname
var people = [
  Person.new("Kermit", "TheFrog"),
  Person.new("Froggy", "McFrogFace")
]
var kermitTheFrog = null
for (person in people.where {|x| x.firstName == "Kermit" && x.surname == "TheFrog"}) {
  kermitTheFrog = person
  break
}
var kermitTheFrog = people.first {|x| x.firstName == "Kermit" && x.surname == "TheFrog" }

A specific case I have in wren is my xml library.
In the actual implementation of various methods, I use the approach of returning the first value in a loop.

    #doc = "The first and only node in this document that is an XElement. null if there is no XElement in the document."
    root {
        for (node in nodes) {
            if (node is XElement) {
                return node
            }
        }
        return null
    }

And from a user's point of view, an xml element has child nodes that are elements or comments, and thus the elements property is always a WhereSequence to filter out the comments. When processing you often have similar cases to the kermit example, since xml is commonly used for serialization of such data.

// example from readme. Finding a the color of a specific fish in the xml document
var colorOfFishCalledPearl = doc
    .root
    .elements("danio")
    .where {|e| e.attribute("name").value == "pearl" }
    .toList[0]
    .attribute("color")
    .value

I am considering adding various methods that take a function argument, but that wouldn't be necessary wren had the first method; that combined with the already existing where and map would be perfect, and my library would nicely slot into wren's infrastructure.

@mhermier
Copy link
Contributor

The real question is why do you need to handle only the first entry. I mean when you are dealing with a list of item, either:

  • one treat the full list in place.
  • one extract an entry, to treat exhaust the list.
  • one choose an entry from it, so the first one become some default.
    And because peeking the first entry is not stable, since it depends of what the container implementation is, it is not a predictable source, unless it is indexed by any form of unique key.

@jube
Copy link

jube commented Feb 17, 2023

@mhermier It exists in many APIs for a reason. Sometimes, you just want an item in a sequence and you pick the first to come.

@mhermier
Copy link
Contributor

For me, instead it should be name more something like "peekValue" (that throw) with a compagnon "peekValueOr" (that has a default value), since it removes the notion of position in the naming.

@jube
Copy link

jube commented Feb 17, 2023

Well, a Sequence, by definition, has an order: the order in which you can iterate over the elements. So the "first" is well defined: it's the first element in the sequence.

@mhermier
Copy link
Contributor

It has an intrinsic order which is necessary for traversal, not a stable order in the sense that inserting objects in some order guarantee the traversal order in every Sequence species.

@Deijin27
Copy link
Author

I think the examples I gave show how its useful, and they are examples with ordered sequences. Lists or filtered lists. I dont think I've ever used it on the equivalent of a Map in other programming languages, its just there by coincidence.

Skip is comparable to this. Since the Map's enumeration order is not consistent, skip is not useful for it.

@mhermier
Copy link
Contributor

It is true, the API is somehow broken/inconsistent in some aspects. It doesn't take into account some Sequences behaviors like infinite size, ordering and probably some others I don't think of right now.

@PureFox48
Copy link
Contributor

When most programmers see the word peek they probably think about obtaining the first element of some list structure (typically a stack or queue) without removing it. So I think it would be a good word to use here rather than first or front to avoid any ordering connotation.

I see no reason to use peekValue (what else can we be peeking?) or peekValueOr - we can just use peek || default for that.

If we also adopt @jube's idea of using where to do any filtering, there's no need to have a separate overload which takes a predicate. So peek is only going to add a few lines to core and would be a worthwhile addition IMO.

Another advantage of peek is that we needn't bother with adding last/back to maintain symmetry. These can only be implemented efficiently in Wren if we have a reverse iterator which would have to be done on the C side to cover non-indexed sequences properly. I doubt whether it's worth the effort as, in practice, most sequences are indexed and people would just use for (i in seq.count-1..0) or simply reverse them with seq[-1..0] and iterate through that. The last element itself would, of course, be seq[-1].

@mhermier
Copy link
Contributor

I proposed peekValue because I thought it could returns an iterator. But I agree peek make sense.

About peekOr I insist on it. We need to be able to distinguish between a peek that failed and potential valid falsy value that could have been returned by the Sequence. It should probably be discussed in another issue, but I really think we need undefined and operator ?: or alike, to mimic null optional. It would simplify a bunch of problem like these.

@PureFox48
Copy link
Contributor

PureFox48 commented Feb 17, 2023

Well the only way that peek can fail is if the sequence is empty in which case it will return null.

We could check explicitly for that with (peek == null) ? default : peek but I'll grant you that the first element could exist and have a null value and so this isn't a perfect solution.

There are doubtless other methods in core where returning null is ambiguous but, if we start having Or methods for all of them, that may add considerable bloat.

As you suggest, a better solution would be to return an undefined value which can be checked for to resolve any ambiguity - I believe there already is one which is used internally but is not exposed publicly so we could perhaps use that. But that's obviously quite a far reaching change so I think it would be best to open a new issue for that with your preliminary thoughts.

@PureFox48
Copy link
Contributor

Incidentally, a simple way to deal with undefined values would be to have a convention whereby you returned some obscure control code which would never be used in practice these days to denote a failure. A possible candidate would be NAK ('Negative Acknowledgement') which is "\x15" and, if one goes way back, was actually known as ERR at one time.

@PureFox48
Copy link
Contributor

A more OO solution which has just occurred to me would be to add a small singleton class to core:

class Undefined {
    // Returns the singleton. If it hasn't been created, creates it first.
    static value { __instance == null ? __instance = Undefined.new_() : __instance }

    // Private constructor.
    construct new_() {}
}

You could then return Undefined.value instead of null to signify an error and test for it with returnValue is Undefined. So instead of peekOr we could do (peek is Undefined) ? default : peek.

@mhermier
Copy link
Contributor

undefined exist in the VM. It is not exposed to the public, and serve as a tombstone value for the map implementation. It is only a matter of political decision to expose it.

@PureFox48
Copy link
Contributor

Well, if there's no good reason why it can't be exposed, then I think that would be the best solution.

undefined would need to become a keyword and so the change would not be backwards compatible but I think that would be a price worth paying.

Technically and analogous to null, undefined would be the only instance of an Undefined class in core and (along with null and false) would be regarded as 'falsey'.

@Deijin27
Copy link
Author

I see what you're getting at with there needing to be a way to distinguish between a not found and a found null value.

I just wanted to say if you were doing the singleton value approach, you just use the class itself.

class Undefined {}

peek {
  for (value in this) {
    return value
  }
  return Undefined
}

@mhermier
Copy link
Contributor

Yes, we have a discriminatory equality but to be useful as a null optional it should have a dedicated operator.

The question of exposing it was already discussed with @munificent, who didn't wanted to expose another null. It is a complex topic since handling null values is a hot potato. There are many reasons to have them or not mostly based on error handling decisions, like throwing or monad... Monad are interesting because they trigger compile time error, but since we don't have that, it stays a political decision at how to handle error handling and deal with default/fallback values. But if we go monad we can remove null as this and mostly replace it with an enum of error (including a null equivalent).

@PureFox48
Copy link
Contributor

I just wanted to say if you were doing the singleton value approach, you just use the class itself.

Yes, that would work.

Instead of peekOr we could then do (peek == Undefined) ? default : peek.

@Deijin27
Copy link
Author

@datatypevoid Can I have a thumb emoji please? I don't mind which one, I just feel left out. :)

On a more serious note, this has come a long way from the initial suggestion.
The naming of "peek" is good.
Using a new operator certainly sounds like a significant decision that shouldn't be taken lightly.

Would you consider adding a "peek" method to Sequence with a simple implementation, and moving the greater discussion to it's own thread?

Could be a peek that returns null.
Could even only be the peekOr, then future changes wouldn't break code assuming a particular return value of peek.
A peek that throws an exception on failure is significantly less useful, you almost always want to handle the failure case.

@mhermier
Copy link
Contributor

For now the implementation should be something like:

peek() {
  var iterator = iterate(null)
  if (iterator) errorHandlingToBeDefined
  return iteratorValue(iterator)
}

or if we simply throw it can be reduced to:

peek() { iteratorValue(iterate(null)) }

@PureFox48
Copy link
Contributor

If it were up to me (which, thankfully, I hear everyone say it isn't!), then I'd implement Sequence.peek to return null on failure as you had it originally:

peek {
  for (value in this) {
    return value
  }
  return null
}

This at least would be consistent with the way all other methods in core use nulland would also be consistent with the way existing methods in the Sequence class which don't return an iterator are coded (i.e. using a simple for statement).

We can then decide later what if anything to do about undefined for which there should be a separate issue. If we do decide to expose/introduce it in some shape or form, then all methods returning null can be reviewed/changed at the same time.

@Deijin27
Copy link
Author

Specifically in the situation that the predicate variant is included (I'm fine if not, where(fn).peek is fine). Peek sounds odd.

peek(condition), "peek for a value that meets the condition" is a stretch

I thought about "next" "next value that meets the condition", but for the non-predicate one it implies a state in my head, like if you called it twice you would get a different value each time.

"single" might be good. "single value that meets the condition" "get a single value". But it might be too abstract from the fact that in many sequences it will give you a consistent value evem if there is multiple items.

Or of course, just be a different name like findValue, but you lose the assocoation between the two.

@mhermier
Copy link
Contributor

@PureFox48 your statement is false, seeking a non existing element in an array and map throws.

@Deijin27 single may imply to return a Sequence of one element, or test the size() == 1.

@Deijin27
Copy link
Author

Deijin27 commented Feb 18, 2023

It throws if the index is out of range in the List.

It returns null if the key isnt found in a map.

I dont think this is an inconsistency.
With a List, if youre looking for a value at an index the check of list.count is fast, and then youre confident the value exists.
With Map, its common to want a tryGetValueByKey method to to do a contains and get in one. In wren, this is done with null.
The choice of which seems to be based on priorities of the situation.

@PureFox48
Copy link
Contributor

@Deijin27

I agree that peek(predicate) just doesn't sound right and, if we go with peek, then I think we would have to use where(predicate) to do any preliminary filtering.

I'm not keen on next which sounds a bit like you're wanting the second element rather than the first and I think single would be confusing as it's used in a different sense in other languages which have a fluent API. In C#, for example, it's always used with a predicate and returns the only element of a sequence which satisfies that predicate, throwing an exception if there's more than one.

@mhermier

your statement is false, seeking a non existing element in an array and map throws.

Which statement is that?

For the avoidance of any doubt, I'm only suggesting we look at methods which currently return null to indicate an undefined value, not those which throw in this situation. Sure, we can review the latter to ensure that's still the best course of action but changing them now would probably be a huge break from the past.

As @Deijin27 has just said, seeking a non existing element in a map returns null rather than throwing - you can use the containsKey method to check whether the key actually exists.

@PureFox48
Copy link
Contributor

I've just had a quick look through all the core and CLI modules to find out which methods return null if unsuccessful and could only find these 5:

  • List.remove(value) - returns null if value not found (or the removed value otherwise)
  • Map.remove(key) - returns null if key not present (or the removed value otherwise)
  • Map[key] - returns null if key not present (or the value for that key otherwise)
  • Num.fromString(value) - returns null if value can't be parsed (or the number otherwise)
  • Stdin.readLine() - returns null if stdin is closed (or the string read otherwise)

Of these the last two aren't a problem - null is neither a number nor a string and so they aren't ambiguous.

The first three are ambiguous as null could be a value in the list or map but we do have list.contains (or indexof) and map.containsKey to resolve any ambiguity.

This makes me wonder whether we are getting over-excited about introducing some kind of undefined value - particularly if @munificent was against it - and should we therefore just stick with what we have?

User defined modules can, of course, deal with this situation in any way they please and, as already noted in this issue, there are several ways which don't use null available to them.

@mhermier
Copy link
Contributor

If it is ambiguous then it is not well defined. By that I mean, if we use null to encode a valid value, an error, or as an optional place holder, we can't use an API correctly and intuitively.

@PureFox48
Copy link
Contributor

I agree that it's not ideal but there are other things in Wren which, due to the simplicity of the language, we accept - sometimes reluctantly - as 'best efforts'.

Having chewed this over at length, I've come to the conclusion that we should accept the possibly ambiguous use of 'null' as a best effort (at least in core).

There are, after all, ways to check if it's a 'real' value or not and, even with this proposal, you could use Sequence.isEmpty for that purpose.

There doesn't seem to be much appetite for exposing undefined as a 'second null' in any case (it was voted down when I proposed it). This doesn't really surprise me as @munificent's views still carry considerable weight with all of us.

The alternatives - using an obscure sentinel value, using a singleton value or a singleton class object - all suffer from being 'truthy' under the current rules so, if you're using 'falsiness' to check for an undefined value (even in a third party module), you'd need to change your code to check for undefined values explicitly.

Also the 'singleton' solutions are not backwards compatible (Undefined, or whatever, may have previously been used as an ordinary identifier) though the sentinel value approach could be made so by introducing an extra property into core (Object.undefined or Object.error perhaps) to wrap it and make it easier to use. There are plenty of candidates other than NAK - such as: 0x7f (DEL) or even an unused NaN value perhaps.

Out of curiosity, as you obviously feel strongly about this, what is your preferred solution?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants