common/validate.go: redundant check for invalidChar #225

3052 · 2023-10-23T23:46:19Z

Lines 91 to 93 in f467849

    if unicode.IsControl(j) || unicode.Is(unicode.C, j) {
    	invalidChar = true
    }

dbhub.io/common/validate.go

Lines 181 to 183 in f467849

    if unicode.IsControl(j) || unicode.Is(unicode.C, j) {
    	invalidChar = true
    }

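the unicode.IsControl checks are redundant, because unicode.C (the Unicode "Other" category) already includes all control characters (Cc), so unicode.Is(unicode.C, j) alone gives the same result.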

package main

import (
   "fmt"
   "unicode"
)

func main() {
   fmt.Println(unicode.IsControl(0)) // true
   fmt.Println(unicode.Is(unicode.C, 0)) // true
}

justinclift · 2023-10-24T03:28:29Z

Cool, thanks. In my head I have a ToDo item to better understand the Unicode character er... sets, so we can't be accidentally caught out by things.

Haven't gotten around to it yet (thus our mostly not supporting unicode), but this is good info that will likely help. 😄

3052 · 2023-10-24T04:06:02Z

another issue with the current code is that it's not checking for invalid UTF-8. when you range over a string that contains invalid UTF-8, the loop yields utf8.RuneError (U+FFFD) for the bad byte, then on the next iteration advances the input by one byte. you can check this in a range loop, but it's probably easier to just use the handy utf8.DecodeRune, because it returns the size along with the rune itself. with your current code, you have no way to distinguish an intentional U+FFFD from one that indicates an error. here is an example name that your code currently reports as valid, which is not valid UTF-8:

"\xA0\xA1"

improved code:

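package unicode

import (
   "unicode"
   "unicode/utf8"
)

// binary reports whether src contains invalid UTF-8 or any rune in unicode.C.
func binary(src []byte) bool {
   for len(src) >= 1 {
      r, size := utf8.DecodeRune(src)
      // RuneError with size 1 marks an invalid encoding;
      // a literal U+FFFD decodes with size 3.
      if r == utf8.RuneError && size == 1 {
         return true
      }
      if unicode.Is(unicode.C, r) {
         return true
      }
      src = src[size:]
   }
   return false
}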

justinclift · 2023-10-24T04:21:36Z

Oh crap. We'd better fix that. I should have time to look into this tonight. 😄

justinclift · 2023-10-24T04:23:19Z

Hmmm:

func binary(src []byte) bool {

Shouldn't a validation function for unicode look at things as runes rather than bytes?

3052 · 2023-10-24T04:30:27Z

if you prefer you can use this instead:

https://godocs.io/unicode/utf8#DecodeRuneInString

but it's basically the same thing. the input is a string or a []byte, and with either function a single rune is decoded from the front. combined with the returned size value, you can determine if the input is valid. or if you want to cheat, you can just validate the entire input at once:

https://godocs.io/unicode/utf8#Valid
https://godocs.io/unicode/utf8#ValidString

but if you need any other processing (which you do to detect $ and similar), then you are looping the entire input twice. but I guess if the inputs are small then it doesn't matter.
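
For comparison, a single-pass sketch on a string using utf8.DecodeRuneInString. The function name invalidName and the '$' rule are assumptions made for illustration; the project's real rules may differ. The loop still works rune by rune, since each iteration decodes exactly one rune from the front of the string, and it folds the UTF-8 check and the character checks into one traversal.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// invalidName reports whether s contains invalid UTF-8, a rune in unicode.C,
// or a '$'. The name and the '$' rule are hypothetical placeholders for
// whatever per-character rules the real validator applies.
func invalidName(s string) bool {
	for len(s) > 0 {
		r, size := utf8.DecodeRuneInString(s)
		if r == utf8.RuneError && size == 1 {
			return true // invalid encoding, not a literal U+FFFD
		}
		if unicode.Is(unicode.C, r) || r == '$' {
			return true
		}
		s = s[size:] // advance past the rune just decoded
	}
	return false
}

func main() {
	for _, s := range []string{"héllo", "\xA0\xA1", "a\x00b", "pri$e", "\uFFFD"} {
		fmt.Printf("%q %v\n", s, invalidName(s))
	}
}
```

The two-pass alternative would call utf8.ValidString(s) first and then keep a plain range loop for the character rules, which is simpler to read and probably fine for short names.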

justinclift self-assigned this Oct 24, 2023
