Add primitive string type. #18973

rlepigre · 2024-04-24T08:17:22Z

Since refactoring Constr.t to combine primitive values into an intermediate type is not a clear win (see #17951), here is an attempt to add ~~primitive char and string types by simply extending Constr.t with corresponding primitive value constructors~~ a primitive string type by simply extending Constr.t with a corresponding value constructor.

Checklist:

Added / updated test-suite.
documentation
Added changelog.
Opened overlay pull requests.

silene · 2024-04-24T11:36:08Z

Two preliminary remarks:

Do not add new VM opcodes (especially some that will cause an instant segfault). Use existing opcodes instead. For example, PString.cat can directly be mapped to CHECKCAMLCALL2.
It seems we do not gain much from having a primitive type char. It seems like using int would work just as well, if not better.

rlepigre · 2024-04-24T11:56:12Z

@silene I don't understand the code related to VM opcodes, so I'd really appreciate help here. By the way, why are many of these opcodes named CHECK*? Are they not meant to correspond to specific instructions? (Your suggestion makes me think they are not.)

About your second point (using int as char), I'm not sure that this is the right thing, since char builds in the fact that the integer value is less than 256. Also, having a specific char type would probably be better when it comes to extraction? In any case, let's see what others think.

silene · 2024-04-24T12:00:55Z

The CHECK part means that they support open terms as input. (It happens that this is the most common behavior, and it would have been better to use a NOCHECK prefix for their uncommon counterparts, but what is done is done.) Some of them correspond to specific instructions, e.g., CHECKADDINT63; some of them work with arbitrary code, e.g., CHECKCAMLCALL.

In your case, the starting point would be to modify Vmbytegen.get_caml_prim, and from there, the other needed changes should flow naturally.
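To illustrate what "supporting open terms as input" means for a CHECK-style call, here is a minimal toy model in OCaml. It is not Coq's actual VM: the `value` type, `check_caml_call2`, and the `Accu` constructor are hypothetical names chosen for this sketch. The idea shown is only that the OCaml primitive runs when both arguments are closed values, and that the call stays stuck otherwise.

```ocaml
(* Toy model, NOT Coq's actual VM: a value is either a closed string or an
   "accumulator" standing for an open (stuck) term. *)
type value =
  | Str of string                (* a closed primitive string *)
  | Accu of string * value list  (* hypothetical: stuck head applied to arguments *)

(* Hypothetical checked binary call: run the OCaml primitive only when both
   arguments are closed; otherwise rebuild a stuck application. *)
let check_caml_call2 (name : string) (prim : string -> string -> string)
    (x : value) (y : value) : value =
  match x, y with
  | Str a, Str b -> Str (prim a b)
  | _ -> Accu (name, [x; y])

(* Concatenation expressed through the generic checked call. *)
let cat = check_caml_call2 "cat" ( ^ )

let closed = cat (Str "foo") (Str "bar")      (* both arguments are closed *)
let stuck = cat (Accu ("x", [])) (Str "bar")  (* first argument is a variable *)
```

In this model, a single generic entry point serves every binary primitive, which is the spirit of mapping PString.cat to an existing call opcode rather than adding a dedicated one.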

silene · 2024-04-24T12:04:46Z

> since char builds in the fact that the integer value is less than 256

Note that this is not even an intrinsic property of your type. It is just a consequence of axiom chr_code_id. If you were to replace char by int, then this axiomatic property would just move from axiom chr_code_id to axiom make_get_spec.

JasonGross · 2024-04-24T17:28:06Z

What is the benefit of having primitive strings over using array int? The downside I'm concerned about is ad-hoc complication of the kernel and tcb. (If the primary upside is extraction of literals, maybe something can be added to extraction in general?)
On the performance side:

Are any of the operations asymptotically more efficient?
What are the constant factor speedups that we see for each operation?

SkySkimmer · 2024-04-24T17:33:58Z

Strings are more space efficient (1 byte / character instead of 1 word / character).
You could recover some efficiency by using all the bits in an int instead of just the lower 8, but since primitive ints are 63 bits, it doesn't fit well (only 7 whole bytes fit in 63 bits).
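The space claim can be observed on the OCaml side with a small sketch. It assumes a 64-bit OCaml runtime (8-byte words) and uses `Obj.reachable_words` from the standard library; the length `n` and the variable names are arbitrary choices for this illustration, and it measures plain OCaml values, not Coq terms.

```ocaml
(* Assumes a 64-bit OCaml runtime, where a word is 8 bytes. *)
let n = 1000

(* Both values are allocated at run time, so they live on the OCaml heap. *)
let as_string = String.make n 'a'
let as_array = Array.make n (Char.code 'a')

(* Size, in words, of the heap blocks reachable from each value
   (headers included). *)
let string_words = Obj.reachable_words (Obj.repr as_string)
let array_words = Obj.reachable_words (Obj.repr as_array)

let () =
  Printf.printf "string: %d words, int array: %d words\n"
    string_words array_words
```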

rlepigre · 2024-04-24T19:12:35Z

@JasonGross using arrays of integers, you are either space-inefficient (by storing one character per word) or you need to work with an encoding which requires computation. We have several use-cases where what we want is a compact representation of names (for variables, functions, ...) from a source program. These names would typically be string literals in a (generated) Coq source file, and they would only be carried around during program verification (and also used as keys for lookups in maps).
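A minimal sketch of such an encoding, written in OCaml under the assumption of a 64-bit runtime (63-bit native ints). The names `pack`, `get`, and `chars_per_word` are hypothetical; the layout (7 bytes per int, lowest byte first) is one possible choice, not something proposed in this thread.

```ocaml
(* Hypothetical encoding: up to 7 bytes packed into one 63-bit OCaml int
   (7 * 8 = 56 bits used), lowest byte first. Assumes a 64-bit runtime. *)
let chars_per_word = 7

let pack (s : string) : int array =
  let n = String.length s in
  let words = (n + chars_per_word - 1) / chars_per_word in
  Array.init words (fun w ->
    let acc = ref 0 in
    (* Fold bytes from the highest position down, so that byte 0 of the
       group ends up in the lowest 8 bits. Missing bytes are padded with 0. *)
    for k = chars_per_word - 1 downto 0 do
      let i = w * chars_per_word + k in
      let byte = if i < n then Char.code s.[i] else 0 in
      acc := (!acc lsl 8) lor byte
    done;
    !acc)

(* Reading character [i] needs a division, a remainder, a shift and a mask. *)
let get (packed : int array) (i : int) : char =
  let word = packed.(i / chars_per_word) in
  Char.chr ((word lsr (8 * (i mod chars_per_word))) land 0xff)
```

Note that this sketch does not store the string length, so the caller has to track it separately. That extra bookkeeping, together with the arithmetic in `get`, is the kind of computation the comment above refers to.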

silene · 2024-04-25T07:04:08Z

> What are the constant factor speedups that we see for each operation?

I performed a small test by repeatedly concatenating strings of various sizes until the total size reached 100,000 characters. For native_compute, the speedup of having native strings would be between 300x and 400x. For vm_compute, it would be between 1000x and 3000x.

To improve the situation when working with strings, we could add a primitive concatenation operation on arrays. This time, the speedup for native_compute would only be between 40x and 50x, while for vm_compute, it would be between 25x and 50x.

So, in addition to the fact that arrays occupy 8x more memory than strings (but this is hardly relevant performance-wise, unless you are actually filling your memory with strings), the things to remember are the following ones:

Arrays in Coq are persistent, so mutating them leaves a trail, which has a non-negligible cost.
Mutating/copying arrays in OCaml triggers the GC barrier, unlike strings (and floating-point arrays), which again has a non-negligible cost.
Iterating in the VM is damn slow.

Obviously, these numbers are only meaningful under the assumption that no mutating operations are ever added to a native Coq string type. If strings start to offer mutations like arrays do, then the performance gains would be a lot less dramatic.
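The shape of such a test can be sketched in plain OCaml. This is a hypothetical micro-benchmark: it does not run inside vm_compute or native_compute, it uses ordinary (non-persistent) OCaml arrays, and it is not the test whose numbers are quoted above, so its timings should not be compared with them. The names `build_string`, `build_array`, and `time` are assumptions of this sketch.

```ocaml
(* Hypothetical micro-benchmark in plain OCaml (not Coq's VM or
   native_compute): append a chunk until the result holds [total] elements. *)
let total = 100_000

let build_string (chunk : string) : string =
  let rec go acc =
    if String.length acc >= total then acc else go (acc ^ chunk) in
  go ""

let build_array (chunk : int array) : int array =
  let rec go acc =
    if Array.length acc >= total then acc else go (Array.append acc chunk) in
  go [||]

(* Processor time, in seconds, spent evaluating [f ()]. *)
let time (f : unit -> 'a) : 'a * float =
  let start = Sys.time () in
  let result = f () in
  (result, Sys.time () -. start)

let () =
  List.iter (fun size ->
    let s, ts = time (fun () -> build_string (String.make size 'a')) in
    let a, ta = time (fun () -> build_array (Array.make size 97)) in
    Printf.printf "chunk %5d: string %.4fs (%d chars), array %.4fs (%d cells)\n"
      size ts (String.length s) ta (Array.length a))
    [100; 1000; 10_000]
```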


silene · 2024-05-06T06:18:47Z

theories/Strings/PString.v

+       \/
+       (i + 1 <? length s1 = true /\
+        i + 1 <? length s2 = true /\
+        get s1 (i+1) <=? get s2 (i+1) = true)%uint63).


I am pretty sure one can derive False from this axiom.

Needless to say, this axiom is way too complicated for its own good.

rlepigre · 2024-05-06

I can't say I'm too happy about this axiom, but do you have an alternative in mind?

SkySkimmer · 2024-05-06

I think the axiom should be on the primitive compare, not the derived lt.
Then maybe define to_list : string -> list char, and have the axiom say forall s s', compare_string s s' = compare_list compare_int (to_list s) (to_list s') (not sure if we have compare_list already defined).

silene · 2024-05-06

I am not sure whether we have compare_list, but we do have String.compare. More generally, I think we should strive to express all our axioms in terms of fully-defined inductive functions. (This is something we had in the old days of retroknowledge and that we unfortunately lost.) And then, properties like transitivity would be trivial corollaries.

rlepigre · 2024-05-06

The axiom was indeed wrong, but it should be fixed now. I'll look into the other possibilities when I have a minute.
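The suggested specification style can be sketched in OCaml as an executable reference model. Everything here is an illustration under stated assumptions: characters are modelled as integer codes, `to_list`, `compare_list`, and `compare_int` are written out by hand, and OCaml's byte-wise `String.compare` stands in for the primitive. The function `agrees` has the shape of the proposed axiom, but it only tests it on given inputs; it proves nothing.

```ocaml
(* Reference model: a string seen as the list of its character codes. *)
let to_list (s : string) : int list =
  List.init (String.length s) (fun i -> Char.code s.[i])

type comparison = Lt | Eq | Gt

let compare_int (x : int) (y : int) : comparison =
  if x < y then Lt else if x > y then Gt else Eq

(* Lexicographic comparison of lists; a strict prefix is smaller. *)
let rec compare_list cmp xs ys =
  match xs, ys with
  | [], [] -> Eq
  | [], _ :: _ -> Lt
  | _ :: _, [] -> Gt
  | x :: xs', y :: ys' ->
    (match cmp x y with
     | Eq -> compare_list cmp xs' ys'
     | c -> c)

(* Stand-in for the primitive: OCaml's byte-wise String.compare. *)
let compare_string (s : string) (s' : string) : comparison =
  let c = String.compare s s' in
  if c < 0 then Lt else if c > 0 then Gt else Eq

(* Shape of the proposed axiom, checked on concrete inputs only. *)
let agrees (s : string) (s' : string) : bool =
  compare_string s s' = compare_list compare_int (to_list s) (to_list s')
```

With a specification of this form, facts such as transitivity of the order would follow from the corresponding facts about `compare_list` instead of needing their own axioms.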

proux01 · 2024-05-13

A few comments. About the specification, I concur with Guillaume that it would be better to have it in terms of equivalence with an inductive implementation, like it's done for int and float.


proux01 · 2024-05-13T08:32:31Z

test-suite/primitive/string/test.v

+Goal make 5 "a" = cat (make 2 "a") (make 3 "a").
+Proof. lazy. syntactic_refl. Qed.
+
+Goal make 5 "a" = cat (make 2 "a") (make 3 "a").
+Proof. cbn. syntactic_refl. Qed.
+
+Goal make 5 "a" = cat (make 2 "a") (make 3 "a").
+Proof. cbv. syntactic_refl. Qed.


One could also have tests for simpl and hnf.

proux01 · 2024-05-13T08:32:41Z

test-suite/primitive/string/test.v

+Goal make 5 "a" = cat (make 2 "a") (make 3 "a").
+Proof. cbv. syntactic_refl. Qed.
+
+(* [vm_compute] *)


Cf. the comment above.

proux01 · 2024-05-13T08:32:53Z

test-suite/primitive/string/test.v

+Goal compare "a" "ab" = Lt.
+Proof. vm_compute. syntactic_refl. Qed.
+
+(* [native_compute] *)


Cf. the comment above.

proux01 · 2024-05-13T08:34:13Z

theories/Strings/PString.v

@@ -0,0 +1,236 @@
+Require Import Uint63.


Please split this file into multiple ones (primitives, specs, axioms, lemmas) with minimal requirements; otherwise it will be a nightmare the day we have to run primitive strings through the debugger.

rlepigre · 2024-05-13

I split the primitives into their own file, and did the same for the axioms. The rest is still in a single file for now, but it is still a work in progress.


proux01 · 2024-05-23T15:37:56Z

test-suite/primitive/string/test.v

+Check (eq_refl : length (make (max_length+1) "a") = max_length).
+
+Check (eq_refl : make 0 "a" = "").
+Check (eq_refl : make 5 "a" = "aaaaa").


BTW, this is fine when it comes to testing the parsing, but a printing test is probably also in order in test-suite/output (cf. test-suite/output/FloatNumberSyntax.v for instance).

proux01 · 2024-05-23T15:41:41Z

So if I understand correctly, the only missing things here are the executable spec and the doc. FWIW, if we want it in the upcoming 8.20, this must be merged before June 17th.

rlepigre · 2024-05-25T23:15:15Z

theories/Strings/PrimString.v

+  Definition print (i : int_wrapper) : option string :=
+    if (i.(int_wrap) <? 256)%uint63 then Some (make 1 i.(int_wrap)) else None.
+  String Notation int_wrapper parse print : char63_scope.
+  Coercion int_wrap : int_wrapper >-> int.


To make the notation work, I had to use this coercion. Is that OK, or is there a better approach here?

proux01 · 2024-05-26

Do you have an example of something that doesn't work? I can't reproduce.

rlepigre · 2024-05-26

For example, the test file I added has failures. Basically, using the notation gives a type error for me.

You can also try Check make 2 "c". at the very end of PrimString.v.

proux01 · 2024-05-26

So that's not a printing issue, but rather a parsing one. The coercion is probably fine; otherwise, you can copy what's done in number notation to string notation so that you can register the string notation for type int rather than int_wrapper.
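For readers unfamiliar with parse/print pairs, here is a hypothetical OCaml analogue of the functions behind such a one-character notation. Characters are modelled as plain integers, as in the snippet under review. The `parse` function is an assumption of this sketch (the PR's own parser is not shown in this thread), and `print` mirrors the `<? 256` test from the diff.

```ocaml
(* Hypothetical OCaml analogue of a parse/print pair for one-character
   literals; characters are modelled as integer codes. *)

(* Parsing: only a string of length exactly 1 denotes a character. *)
let parse (s : string) : int option =
  if String.length s = 1 then Some (Char.code s.[0]) else None

(* Printing: only codes below 256 have a literal, mirroring the [<? 256]
   test in the diff; the lower bound guards against negative OCaml ints. *)
let print (i : int) : string option =
  if 0 <= i && i < 256 then Some (String.make 1 (Char.chr i)) else None
```

The two functions are partial inverses: any code accepted by `print` parses back to itself, while integers outside the byte range simply have no literal form.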

rlepigre requested review from a team as code owners April 24, 2024 08:17

coqbot-app bot added the "needs: full CI" label (The latest GitLab pipeline that ran was a light CI. Say "@coqbot run full ci" to get a full CI.) Apr 24, 2024

rlepigre force-pushed the br/prim-string branch from 1010980 to 2407bb6 April 24, 2024 08:42

rlepigre force-pushed the br/prim-string branch 2 times, most recently from eae1f69 to 044fcbf April 24, 2024 16:09

rlepigre force-pushed the br/prim-string branch 3 times, most recently from b9e9fd0 to d268cf9 April 25, 2024 06:56

silene reviewed Apr 25, 2024

kernel/vmemitcodes.ml (outdated)

kernel/vmemitcodes.ml (outdated)

rlepigre force-pushed the br/prim-string branch from d268cf9 to bd451f8 April 25, 2024 07:39

silene reviewed Apr 25, 2024

kernel/vmvalues.ml (outdated)

rlepigre force-pushed the br/prim-string branch from aff50d5 to 3f5a3f3 May 2, 2024 23:42

rlepigre commented May 2, 2024

theories/Strings/PString.v (outdated)

rlepigre force-pushed the br/prim-string branch 2 times, most recently from 713376b to b057cb6 May 3, 2024 16:03

rlepigre commented May 3, 2024

test-suite/primitive/string/test.v (outdated)

rlepigre force-pushed the br/prim-string branch 3 times, most recently from 12b405b to 6ba97e8 May 4, 2024 09:01

rlepigre commented May 4, 2024

test-suite/primitive/string/test.v

rlepigre force-pushed the br/prim-string branch from 6ba97e8 to 3088631 May 5, 2024 20:33

silene reviewed May 6, 2024

rlepigre force-pushed the br/prim-string branch from 3088631 to d6d1f10 May 6, 2024 21:09

github-actions bot added the "needs: rebase" label (Should be rebased on the latest master to solve conflicts or have a newer CI run.) May 7, 2024

rlepigre force-pushed the br/prim-string branch from d6d1f10 to 017d063 May 8, 2024 18:55

coqbot-app bot removed the "needs: rebase" label May 8, 2024

proux01 reviewed May 13, 2024

rlepigre force-pushed the br/prim-string branch 3 times, most recently from cc54007 to fcd48e8 May 13, 2024 11:47

github-actions bot added the "needs: rebase" label May 22, 2024

proux01 reviewed May 23, 2024

rlepigre added 2 commits May 26, 2024 00:31

Combine string and numbers in [plugin/syntax].

7c2cc15

Uniform handling of numbers and strings in [plugin/syntax].

757b02c

rlepigre force-pushed the br/prim-string branch from fcd48e8 to 5a5c66f May 25, 2024 22:39

coqbot-app bot removed the "needs: rebase" label May 25, 2024

rlepigre added 2 commits May 26, 2024 01:12

Add a primitive string type.

39f8974

WIP

8cd78b8

rlepigre force-pushed the br/prim-string branch from 5a5c66f to 8cd78b8 May 25, 2024 23:12

rlepigre commented May 25, 2024
