Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix null convertion in Num::fromString (#1131) #1132

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mhermier
Copy link
Contributor

@mhermier mhermier commented Jan 5, 2023

When strtod returns 0.0, no conversion could have happened. Handle the error case correctly.

When `strtod` returns `0.0`, no conversion could have happened. Handle
the error case correctly.
@PureFox48
Copy link
Contributor

If there's no conversion then strtod will return 0.0 but it would also return 0.0 if passed "0.0" or "0.0suffix" say.

However, your actual patch does handle that correctly.

Personally, I'd like to have seen "1.2suffix" return 1.2 as it does in several other languages I've used including C itself which saves you having to trim off extraneous characters after the numerical part. However, I've just checked and it returns null at present so, unfortunately, that would be a breaking change.

@mhermier
Copy link
Contributor Author

mhermier commented Jan 6, 2023

The statement "C itself which saves you having to trim off extraneous characters after the numerical part" is misleading. The libc API offers incremental parsing. As such it is up to the user to decide/handle error cases, such as extraneous characters. Being lax about extra characters is the lazy route.

But I think I prefer the stricter route for parsing. It is more easier to compose (to parse more diverse inputs) and propagate errors. Relying on these incremental parsing API based on expected usages, always backfire when you have to fight against these expected usages.

@CrazyInfin8
Copy link
Contributor

Personally, I'd like to have seen "1.2suffix" return 1.2 as it does in several other languages I've used including C itself which saves you having to trim off extraneous characters after the numerical part. However, I've just checked and it returns null at present so, unfortunately, that would be a breaking change.

I suppose that #984 could provide this functionality. Though I'd argue that this behavior should be a separate function or using extra, optional parameters on fromString to specify whether we want to parse the whole string or just the first parse-able number. wrenParseNum is also able to return the number of characters in the number which could be useful for consuming strings in wren, as well as providing info of what caused a parse to fail.

I think if there is interest on this topic, we could open a new discussion.

@PureFox48
Copy link
Contributor

PureFox48 commented Jan 6, 2023

@mhermier

Well it's worth noting that Num.fromString is not completely 'pure' in that it does ignore extraneous whitespace (" 123 " works fine) and also parses number forms such as 0X123 with a capital X or 0123 which are not valid Wren though, thankfully, the latter is parsed as a decimal rather than a C-style octal literal.

However, the general vibe of the method seems to be to parse anything that looks in its entirety like a number and return null for anything else so I'm content to leave things as they are.

In most cases that arise in practice you know (or think you know) that the numeric part will either be followed by a space + a word or by a fixed-length non-numeric suffix so it's easy to trim these off using split or by taking a sub-string. Other cases are more awkward to deal with (particularly when the number is a hex literal) and you then need to search for the first non-numeric character. I might add something which does this to my 'str' module though it's already bulging at the seams following the recent addition of some more Unicode-aware versions of lower and upper case conversion methods :)

@CrazyInfin8

Hadn't considered it but, yes, we could provide the above functionality via an extra parameter to Num.fromString. This would be backwards compatible as the current version could then become an overload of it which maintained the status quo by setting the extra parameter to false.

I don't mind having a separate discussion about it though I suspect @mhermier might not be too keen as he's against the idea on principle.

Perhaps it's best to leave it for now until we see what's going to happen about binary/octal literals and digit separators being added to the language which would, of course, necessitate #984 or something like it being seriously entertained.

@mhermier
Copy link
Contributor Author

mhermier commented Jan 6, 2023

@PureFox48 I'm not against it, but if an API is complex and user can fall into a pit of partial success by being lazy, they will do. The problem with strtod is exactly that. The API is simple and meant to be used for incremental parsing, ie the user get back how much was parsed. On the other hand Num.fromString lack the possibility to output how much was parsed. So we can't know if a string parsed today, will be parsed the same way tomorrow, unless we properly check for a complete parsing of the input.
As an example, 16rcafebabe is a valid hexadecimal number in smalltalk. With what you suggest 16 would be returned and rcafebabe would be ignored because it would makes no sense. If tomorrow, we decide to support that syntax, it would return hex cafebabe value, breaking code and there is no sane way to recover from that.

@PureFox48
Copy link
Contributor

Well, if this were done as an extra parameter, then the user would have to explicitly set it. We'd therefore have:

var a = Num.fromString("16rcafebabe")       //> null
var b = Num.fromString("16rcafebabe", true) //> 16

So I don't really accept that this would be an easy pit into which a lazy user could unwittingly fall.

However, I do accept that as we don't know what the future may hold we should stick to the status quo for now as far as the core library is concerned (folks like me can always roll their own) though I'd be very surprised indeed if Smalltalk-type integer literals were ever to be introduced into Wren!

@mhermier
Copy link
Contributor Author

mhermier commented Jan 6, 2023

Anyway, I think the way it is fixed here is acceptable.

@PureFox48
Copy link
Contributor

Yes, so do I and the quicker it's merged the better.

@PureFox48
Copy link
Contributor

PureFox48 commented Jan 6, 2023

OK, this is what I've come up with for my 'str' module:

class Str {
    //  ... (existing stuff)

    // After trimming whitespace from the string 's', takes as many characters as possible
    // to form a valid number and converts it thereto using the Num.fromString method.
    // Returns null if such a conversion is impossible.
    static toNum(s) {
        if (s is Num) return s
        if (!(s is String)) s = "%(s)"
        s = s.trim()
        var n = Num.fromString(s)
        if (n) return n
        if (s.count < 2) return null
        var chars = s.toList
        for (i in chars.count-1..1) {
            chars.removeAt(i)
            if (n = Num.fromString(chars.join())) return n
        }
        return null
    }
}

var strs = ["", "?", "7Z", "3 dog night", "123.45 dollars", "$123.45", "-infiniteCalm", "nansAreGoodToEat", "0x1gen"]
for (s in strs) System.print("\"%(s)\" -> %(Str.toNum(s))")

/* output:

"" -> null
"?" -> null
"7Z" -> 7
"3 dog night" -> 3
"123.45 dollars" -> 123.45
"$123.45" -> null
"-infiniteCalm" -> -infinity
"nansAreGoodToEat" -> nan
"0x1gen" -> 1

*/

This isn't very efficient as I'm just removing characters from the end of the string until I find something that Num.fromString can parse or returning null if nothing can be parsed.

However, it's a heck of a lot less code than trying to parse the string from the front and deal with all the possibilities that can arise.

As the strings would not usually be very long, it should be quick enough in practice. However, if anyone has any suggestions for improvement, I'd be glad to hear them :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants