Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

common/validate.go: redundant check for invalidChar #225

Open
3052 opened this issue Oct 23, 2023 · 5 comments
Open

common/validate.go: redundant check for invalidChar #225

3052 opened this issue Oct 23, 2023 · 5 comments
Assignees

Comments

@3052
Copy link

3052 commented Oct 23, 2023

if unicode.IsControl(j) || unicode.Is(unicode.C, j) {
invalidChar = true
}

dbhub.io/common/validate.go

Lines 181 to 183 in f467849

if unicode.IsControl(j) || unicode.Is(unicode.C, j) {
invalidChar = true
}

the IsControl checks are redundant, because unicode.C includes all control characters.

package main

import (
   "fmt"
   "unicode"
)

func main() {
   fmt.Println(unicode.IsControl(0)) // true
   fmt.Println(unicode.Is(unicode.C, 0)) // true
}
@justinclift
Copy link
Member

justinclift commented Oct 24, 2023

Cool, thanks. In my head I have a ToDo item to better understand the Unicode character er... sets, so we can't be accidentally caught out by things.

Haven't gotten around to it yet (thus our mostly not supporting unicode), but this is good info that will likely help. 😄

@3052
Copy link
Author

3052 commented Oct 24, 2023

another issue with the current code is its not checking for invalid UTF-8. when you iterate a string, with invalid UTF-8 it returns utf8.RuneError (U+FFFD), then on the next iteration advances the input one byte. you can check this in a range loop, but its probably easier to just use the handy utf8.DecodeRune, because it returns the size along with the rune itself. with your current code, you have no way to differ an intentional U+FFFD, and one that indicates an error. here is an example name that your code currently reports as valid, which is not valid UTF-8:

"\xA0\xA1"

improved code:

package unicode

import (
   "unicode"
   "unicode/utf8"
)

func binary(src []byte) bool {
   for len(src) >= 1 {
      r, size := utf8.DecodeRune(src)
      if r == utf8.RuneError {
         if size == 1 {
            return true
         }
      }
      if unicode.Is(unicode.C, r) {
         return true
      }
      src = src[size:]
   }
   return false
}

@justinclift
Copy link
Member

Oh crap. We'd better fix that. I should have time to look into this tonight. 😄

@justinclift justinclift self-assigned this Oct 24, 2023
@justinclift
Copy link
Member

Hmmm:

func binary(src []byte) bool {

Shouldn't a validation function for unicode look at things as runes rather than bytes?

@3052
Copy link
Author

3052 commented Oct 24, 2023

if you prefer you can use this instead:

https://godocs.io/unicode/utf8#DecodeRuneInString

but its basically the same thing. the input is string or []byte, then with either function a single rune is returned from the front, and combined with the returned size value, you can determine if the input is valid. or if you want to cheat, you can just read the entire input at once:

https://godocs.io/unicode/utf8#Valid
https://godocs.io/unicode/utf8#ValidString

but if you need any other processing (which you do to detect $ and similar), then you are looping the entire input twice. but I guess if the inputs are small then it doesn't matter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants