[Feature] String Comparison Ignore Case #1134

Open
mwasplund opened this issue Jan 8, 2023 · 1 comment

Comments

@mwasplund

I need to be able to compare strings ignoring case. This could be accomplished by introducing a string.compare with direct support for ignoring case, or by introducing toUpper/toLower (it is best to compare in upper case, since some characters do not match after being round-tripped to lowercase). Introducing the conversion methods is more general and supports a broader range of scenarios, but will be slower than a direct comparison. I know this is a can of worms, since it will most likely need to consider culture information. Wren may be able to use culture-invariant comparisons, but I am unsure whether that will meet the needs of all users of the language.
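To make the "direct comparison" option concrete, here is a minimal sketch in Wren that walks both strings with the iterator protocol and folds each code point on the fly, so no intermediate strings are built. The class name IgnoreCase and its methods are hypothetical (nothing like this exists in the core library), and the folding is deliberately ASCII-only, which sidesteps the culture question entirely.

```wren
// Hypothetical helper, not part of Wren's core library.
class IgnoreCase {
  // Map ASCII a-z to A-Z; every other code point is returned unchanged.
  static fold(c) {
    if (c >= 97 && c <= 122) return c - 32
    return c
  }

  // Compare two strings code point by code point without building copies.
  static equals(a, b) {
    var ca = a.codePoints
    var cb = b.codePoints
    var ia = ca.iterate(null)
    var ib = cb.iterate(null)
    // iterate returns false once a sequence is exhausted.
    while (ia && ib) {
      if (fold(ca.iteratorValue(ia)) != fold(cb.iteratorValue(ib))) return false
      ia = ca.iterate(ia)
      ib = cb.iterate(ib)
    }
    // Equal only if both strings ended at the same time.
    return !ia && !ib
  }
}

System.print(IgnoreCase.equals("Wren", "WREN"))  // true
System.print(IgnoreCase.equals("Wren", "Wrens")) // false
```

The early return on the first mismatch is where a direct comparison would be expected to win over converting both strings first.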

@PureFox48
Contributor

Currently, we don't have methods in the core library to convert a string to lower or upper case.

However, there is a PR (#1019) to introduce String.lower based on the following Wren code:

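lower {
    var output = ""
    for (c in codePoints) {
        // A-Z, À-Ö and Ø-Þ: the lower case form is 32 code points higher.
        // 215 (×) sits between the two Latin-1 ranges and is left alone.
        if ((c >= 65 && c <= 90) || (c >= 192 && c <= 214) || (c >= 216 && c <= 222)) {
            c = c + 32
        }
        output = output + String.fromCodePoint(c)
    }
    return output
}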

The PR doesn't include an upper function but the code for that would be:

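upper {
    var output = ""
    for (c in codePoints) {
        // a-z, à-ö and ø-þ: the upper case form is 32 code points lower.
        // 247 (÷) sits between the two Latin-1 ranges and is left alone.
        if ((c >= 97 && c <= 122) || (c >= 224 && c <= 246) || (c >= 248 && c <= 254)) {
            c = c - 32
        }
        output = output + String.fromCodePoint(c)
    }
    return output
}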
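
For anyone who wants to try this before anything lands in core, the upper logic can be wrapped in a static helper, since as far as I know user code cannot add methods to the built-in String class. The class name Latin1 and the method equalsIgnoreCase are hypothetical names chosen for this sketch; the ranges are the same ISO-8859-1 ones used above.

```wren
// Hypothetical wrapper class; the names are illustrative, not part of core.
class Latin1 {
  // Same ranges as the upper getter above, taking the string as an argument.
  static upper(s) {
    var output = ""
    for (c in s.codePoints) {
      if ((c >= 97 && c <= 122) || (c >= 224 && c <= 246) || (c >= 248 && c <= 254)) {
        c = c - 32
      }
      output = output + String.fromCodePoint(c)
    }
    return output
  }

  // Case-insensitive equality by upper-casing both sides first.
  static equalsIgnoreCase(a, b) {
    return upper(a) == upper(b)
  }
}

System.print(Latin1.equalsIgnoreCase("café", "CAFÉ")) // true
System.print(Latin1.equalsIgnoreCase("café", "cafe")) // false
```

This builds two temporary strings per comparison, which is the cost the original request mentions for the conversion approach.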

These are based on the ISO-8859-1 character set (i.e. Unicode code points < 256), which has the merit of providing almost complete coverage of the major Western European languages. A minor problem is that there is no upper case equivalent of the German letter ß and the very rare French letter ÿ within this character set. Although it would be possible to upper case the former as SS, unfortunately this would not round-trip.
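
A small sketch of why the SS mapping fails to round-trip. The lower function here is a hypothetical ASCII-only stand-in, just enough for the demonstration, and the ß to SS replacement is an assumed rule rather than anything in the PR.

```wren
// Hypothetical ASCII-only lower, sufficient for this demonstration.
var lower = Fn.new {|s|
  var out = ""
  for (c in s.codePoints) {
    if (c >= 65 && c <= 90) c = c + 32
    out = out + String.fromCodePoint(c)
  }
  return out
}

var original = "ß"
// Assumed upper-casing rule: ß becomes SS.
var uppered = original.replace("ß", "SS")
var back = lower.call(uppered)

System.print(uppered)          // SS
System.print(back)             // ss
System.print(back == original) // false
```

Once ß has become SS there is no way to tell it apart from a genuine double s, so lower casing cannot recover the original.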

I think myself this is as far as we're likely to go in a simple language such as Wren. It would be much more difficult to extend casing to the full Unicode character set (though I do have methods which produce much greater coverage in my own modules), to provide normalization, or to have locale-specific versions, for the reasons discussed in the PR.
