From e293992d456f54bf712f64fba2df195e658391e8 Mon Sep 17 00:00:00 2001
From: Martin Atkins
Date: Wed, 16 Nov 2022 09:20:03 -0800
Subject: [PATCH] docs: "How Terraform Uses Unicode" should mention HCL too

I missed this on my first attempt to write this document. Consequently,
we're currently depending on a version of HCL which uses Unicode 9, and
that's significantly lagging behind everything else, which is currently on
Unicode 13.

My goal of adding these docs, then, is to remind us to update HCL to
Unicode 15 once we're updating everything else to Unicode 15 with the
Go 1.20 release, assuming that the Go team completes that Unicode upgrade
as currently planned.
---
 docs/unicode.md | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/docs/unicode.md b/docs/unicode.md
index e6f627d21153..efcb442dc76d 100644
--- a/docs/unicode.md
+++ b/docs/unicode.md
@@ -45,6 +45,31 @@ The other subsystems described below should always be set up to match
 themselves with `unicode.Version` and generate an error if they cannot, but
 that isn't true of all of them.
 
+## Unicode Identifier Rules in HCL
+
+_Identifier and Pattern Syntax_ (TR31) is a Unicode standards annex which
+describes a set of rules for tokenizing "identifiers", such as variable names
+in a programming language.
+
+HCL uses a superset of that specification for its own identifier tokenization
+rules, and so it includes some code derived from the TR31 data tables that
+describe which characters belong to the "ID_Start" and "ID_Continue" classes.
+
+Since Terraform is the primary user of HCL, it's typically Terraform's adoption
+of a new Unicode version which drives HCL to adopt one. To update the Unicode
+tables to a new version:
+* In `hclsyntax/generate.go`, edit the line which runs `unicode2ragel.rb` so
+  that it specifies the URL of the `DerivedCoreProperties.txt` data file for
+  the intended Unicode version.
+* Run `go generate ./hclsyntax` to regenerate both `unicode_derived.rl` and,
+  indirectly, `scan_tokens.go`. (You will need both a Ruby interpreter and
+  the Ragel state machine compiler on your system in order to complete this
+  step.)
+* Run all the tests to check for regressions: `go test ./...`
+* If all looks good, commit all of the changes and open a PR against HCL.
+* Once that PR is merged and released, update Terraform to use the new version
+  of HCL.
+
 ## Unicode Text Segmentation
 
 _Text Segmentation_ (TR29) is a Unicode standards annex which describes
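
For illustration only, and not part of the patch or of HCL's real scanner: a minimal Go sketch of the default TR31 identifier syntax, `<ID_Start> <ID_Continue>*`, built from the property tables in Go's standard `unicode` package. The helper names `excluded`, `isIDStart`, `isIDContinue`, and `isTR31Identifier` are hypothetical. The set formulas in the comments are the derivations used for `ID_Start` and `ID_Continue` in `DerivedCoreProperties.txt`. Because Go's tables follow the toolchain's `unicode.Version` rather than whatever version HCL's generated `unicode_derived.rl` was built from, the two can disagree about recently added characters, which is the kind of drift the commit message describes.

```go
package main

import (
	"fmt"
	"unicode"
)

// excluded reports whether r is Pattern_Syntax or Pattern_White_Space,
// which TR31 removes from both ID_Start and ID_Continue.
func excluded(r rune) bool {
	return unicode.In(r, unicode.Pattern_Syntax, unicode.Pattern_White_Space)
}

// isIDStart approximates ID_Start using Go's tables:
// L + Nl + Other_ID_Start - Pattern_Syntax - Pattern_White_Space
func isIDStart(r rune) bool {
	if excluded(r) {
		return false
	}
	return unicode.In(r, unicode.L, unicode.Nl, unicode.Other_ID_Start)
}

// isIDContinue approximates ID_Continue using Go's tables:
// ID_Start + Mn + Mc + Nd + Pc + Other_ID_Continue
// - Pattern_Syntax - Pattern_White_Space
func isIDContinue(r rune) bool {
	if excluded(r) {
		return false
	}
	return isIDStart(r) ||
		unicode.In(r, unicode.Mn, unicode.Mc, unicode.Nd, unicode.Pc, unicode.Other_ID_Continue)
}

// isTR31Identifier reports whether s matches the default TR31 identifier
// syntax <ID_Start> <ID_Continue>*. It does NOT implement HCL's extensions.
func isTR31Identifier(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		if i == 0 {
			if !isIDStart(r) {
				return false
			}
		} else if !isIDContinue(r) {
			return false
		}
	}
	return true
}

func main() {
	// The Unicode version of Go's own tables, which may differ from the
	// version used to generate HCL's unicode_derived.rl.
	fmt.Println("Go unicode tables:", unicode.Version)
	for _, s := range []string{"foo", "café", "x1", "1x", "foo-bar", "_x"} {
		fmt.Printf("%q is a TR31 identifier: %v\n", s, isTR31Identifier(s))
	}
}
```

Since HCL's rules are a superset of TR31, a name like `foo-bar` is rejected by this sketch but is still a valid HCL identifier, because HCL additionally allows `-` after the first character.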