supplementary unicode characters from Ll and Lu categories are not parsed as identifier together with _ #12485

Closed
unkarjedy opened this issue Nov 6, 2021 · 2 comments · Fixed by scala/scala#9805

@unkarjedy

Scala 2.13.7

According to the spec (https://www.scala-lang.org/files/archive/spec/2.13/13-syntax-summary.html), all of these should parse, except the commented-out lines marked "OK compiler error".

The Scala 2.13.7 compiler reports errors on the lines marked "FALSE compiler error".

//all ok
class A0 {
  val  = 1 // category: So other
  val  = 1 // category: Sm math
  val 𝓅 = 1 // category: Ll math (suppl)
  val 𐐀 = 1 // category: Lu (suppl)

  val ⚕⚕⚕ = 1
  val ∀∀∀ = 1
  val 𝓅𝓅𝓅 = 1
  val 𐐀𐐀𐐀 = 1
}

class A1 {
  val a_⚕ = 1
  val a_∀ = 1

  // FALSE compiler error
  // lower `a` + lower `_` + supplementary upper/lower letter from category `Ll | Lu`
  val a_𝓅 = 1
  val a_𐐀 = 1
}

class A2 {
  // OK compiler error
  // lower `_` + op `⚕ | ∀`: the op must be separated by an extra `_`
  //val _⚕ = 1
  //val _∀ = 1

  // FALSE compiler error
  // lower `_` + supplementary upper/lower letter from category `Ll | Lu`
  val _𝓅 = 1
  val _𐐀 = 1
}

class A3 {
  //OK
  val __⚕ = 1
  val __∀ = 1

  // FALSE compiler error
  // lower `_` + lower `_` + supplementary upper/lower letter from category `Ll | Lu`
  val __𝓅 = 1
  val __𐐀 = 1
}
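To make the expected results concrete, here is a rough sketch of the identifier rules as this issue reads them: `alphaid ::= upper idrest | varid`, `varid ::= lower idrest`, `idrest ::= {letter | digit} ['_' op]`, with `_` counted as `lower`. The character classes below are simplified assumptions, not the exact spec text, and `ScalaIdSketch` is a hypothetical name. The key point is that it walks whole code points, so a supplementary `Ll`/`Lu` letter is treated like any other letter.

```java
// Hypothetical sketch of the spec's identifier rules, checked code point by code point.
// Simplifications (assumptions): lower = a-z, '_' or category Ll; upper = A-Z, '$' or Lu;
// digit = 0-9; opchar = a subset of ASCII operator chars plus categories Sm and So.
class ScalaIdSketch {
    static boolean isLower(int cp) {
        return (cp >= 'a' && cp <= 'z') || cp == '_'
                || Character.getType(cp) == Character.LOWERCASE_LETTER;
    }

    static boolean isUpper(int cp) {
        return (cp >= 'A' && cp <= 'Z') || cp == '$'
                || Character.getType(cp) == Character.UPPERCASE_LETTER;
    }

    static boolean isLetter(int cp) { return isLower(cp) || isUpper(cp); }

    static boolean isDigit(int cp) { return cp >= '0' && cp <= '9'; }

    static boolean isOpChar(int cp) {
        if (cp < 128) return "!#%&*+-/:<=>?@\\^|~".indexOf(cp) >= 0;
        int type = Character.getType(cp);
        return type == Character.MATH_SYMBOL || type == Character.OTHER_SYMBOL;
    }

    // idrest ::= {letter | digit} ['_' op]
    static boolean isIdRest(int[] cps, int from) {
        int i = from;
        while (i < cps.length && (isLetter(cps[i]) || isDigit(cps[i]))) i++;
        if (i == cps.length) return true; // no operator suffix
        // '_' is itself a letter, so the loop above already consumed the separator;
        // an operator suffix is allowed only if the last consumed idrest char was '_'.
        if (i == from || cps[i - 1] != '_') return false;
        for (int j = i; j < cps.length; j++) {
            if (!isOpChar(cps[j])) return false;
        }
        return true;
    }

    // alphaid ::= upper idrest | varid, where varid ::= lower idrest
    static boolean isAlphaId(String s) {
        int[] cps = s.codePoints().toArray(); // whole code points, not UTF-16 chars
        return cps.length > 0 && isLetter(cps[0]) && isIdRest(cps, 1);
    }

    public static void main(String[] args) {
        String p = "\uD835\uDCC5"; // U+1D4C5, category Ll, outside the BMP
        String d = "\uD801\uDC00"; // U+10400, category Lu, outside the BMP
        String staff = "\u2695";   // category So
        String[] names = {
            "a_" + staff, "_" + staff, "__" + staff,
            "a_" + p, "a_" + d, "_" + p, "_" + d, "__" + p, "__" + d, d + d + d
        };
        for (String n : names) System.out.println(n + " -> " + isAlphaId(n));
    }
}
```

Under these assumptions the sketch accepts every name marked "FALSE compiler error" and rejects `_⚕`/`_∀`, which matches the A2 comment.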

All of the names that trigger false compiler errors are valid Java identifiers:

//java compiles just fine
class MyJava {
    public void 𝓅() {}
    public void 𐐀() {}

    public void 𝓅𝓅𝓅() {}
    public void 𐐀𐐀𐐀() {}

    public void a_𝓅() {}
    public void a_𐐀() {}

    public void _𝓅() {}
    public void _𐐀() {}

    public void __𝓅() {}
    public void __𐐀() {}
}
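The same check can be done at runtime. The JLS (§3.8) defines a Java identifier as a code point accepted by `Character.isJavaIdentifierStart(int)` followed by code points accepted by `Character.isJavaIdentifierPart(int)`. The sketch below applies that definition to the names above; `JavaIdCheck` is a hypothetical helper, and it deliberately ignores keyword checks and the reserved single `_`.

```java
// Checks names against the JLS 3.8 definition, one whole code point at a time.
// Keywords and the reserved single `_` are not handled in this sketch.
class JavaIdCheck {
    static boolean isJavaIdentifier(String s) {
        int[] cps = s.codePoints().toArray();
        if (cps.length == 0 || !Character.isJavaIdentifierStart(cps[0])) return false;
        for (int i = 1; i < cps.length; i++) {
            if (!Character.isJavaIdentifierPart(cps[i])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String p = "\uD835\uDCC5"; // U+1D4C5, category Ll
        String d = "\uD801\uDC00"; // U+10400, category Lu
        String[] names = {p, d, p + p + p, d + d + d, "a_" + p, "a_" + d,
                          "_" + p, "_" + d, "__" + p, "__" + d};
        for (String n : names) System.out.println(n + " -> " + isJavaIdentifier(n));
    }
}
```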

relates to:
#12482
#1406
scala/scala#9687

@unkarjedy (author)

I haven't dug into the scalac implementation details, but it looks like Lu and Ll characters are being parsed as part of the `op`/`opchar` rule rather than the `lower` or `upper` rule.
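One plausible way such a misclassification can happen (an assumption; this thread does not identify the actual scalac code path) is classifying a UTF-16 `char` instead of a full code point. A supplementary letter such as `𝓅` is stored as two surrogate chars, and a surrogate on its own is category `Cs`, not `Ll`. The hypothetical `CodeUnitVsCodePoint` class below shows the difference using only `java.lang.Character` and `String`.

```java
// Classifying the first UTF-16 char vs. the first code point of a supplementary letter.
class CodeUnitVsCodePoint {
    static final String P = "\uD835\uDCC5"; // U+1D4C5, category Ll

    // Looks only at the high surrogate.
    static int typeOfFirstChar(String s) { return Character.getType(s.charAt(0)); }

    // Decodes the surrogate pair into one code point first.
    static int typeOfFirstCodePoint(String s) { return Character.getType(s.codePointAt(0)); }

    public static void main(String[] args) {
        System.out.println(P.length());                                            // 2
        System.out.println(P.codePointCount(0, P.length()));                       // 1
        System.out.println(typeOfFirstChar(P) == Character.SURROGATE);             // true
        System.out.println(typeOfFirstCodePoint(P) == Character.LOWERCASE_LETTER); // true
        System.out.println(Character.isLetter(P.charAt(0)));                       // false
    }
}
```

Since the A0 cases compile, scalac evidently decodes these characters correctly on some paths; the sketch only illustrates why a path that looks at single chars could fall through to a non-letter branch.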

@som-snytt commented Nov 6, 2021

Reminder to self: check the caret position in the error message.

+newSource1.scala:24: error: illegal start of simple pattern
+  val a_𝓅 = 1
+           ^

Edit: I think it was due to a bug about consuming chars. I didn't look too closely, and I didn't create tests around error printing. My vim doesn't handle supplementary chars very well (though it did display a wide cursor), and apparently I don't have all the character sets installed either.
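The comment doesn't say exactly what was off about the caret. One generic pitfall worth checking (an assumption, not a diagnosis from this thread) is mixing column units: every supplementary character before the caret makes a UTF-16 column one larger than the code-point column, and terminal display width is yet another measure. The hypothetical `CaretColumn` helper below measures the position of `=` in the line from the error output both ways.

```java
// Column of a character in a source line, counted in two different units.
class CaretColumn {
    // 0-based column in UTF-16 chars (what String.indexOf returns), or -1 if absent.
    static int charColumn(String line, char target) { return line.indexOf(target); }

    // 0-based column in code points (one column per supplementary character), or -1 if absent.
    static int codePointColumn(String line, char target) {
        int idx = line.indexOf(target);
        return idx < 0 ? -1 : line.codePointCount(0, idx);
    }

    public static void main(String[] args) {
        String line = "  val a_\uD835\uDCC5 = 1"; // same text as the erroring line
        System.out.println(charColumn(line, '='));      // 11
        System.out.println(codePointColumn(line, '=')); // 10
    }
}
```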

unkarjedy changed the title from "supplementary unicode characters from Ll and Lo categories are not parsed as identifier together with _" to "supplementary unicode characters from Ll and Lu categories are not parsed as identifier together with _" on Nov 6, 2021
dwijnand added this to the 2.13.8 milestone on Nov 18, 2021
SethTisue changed the milestone from 2.13.8 to 2.13.9 on Dec 15, 2021