New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add primitive string type. #18973
base: master
Are you sure you want to change the base?
Add primitive string type. #18973
Conversation
1010980
to
2407bb6
Compare
Two preliminary remarks:
|
@silene I don't understand the code related to VM opcodes, so I'd really appreciate help here. By the way, why are many of these opcode named About your second point (using |
The In your case, the starting point would be to modify |
Note that this is not even an intrinsic property of your type. It is just a consequence of axiom |
eae1f69
to
044fcbf
Compare
What is the benefit of having primitive strings over using
|
Strings are more space efficient (1 byte / character instead of 1 word / character). |
@JasonGross using arrays of integers, you are either space-inefficient (by storing one character per word) or you need to work with an encoding which requires computation. We have several use-cases where what we want is a compact representation of names (for variables, functions, ...) from a source program. These names would typically be string literals in a (generated) Coq source file, and they would only be carried around during program verification (and also used as keys for lookups in a maps). |
b9e9fd0
to
d268cf9
Compare
I performed a small test by repetitively concatenating strings of various size until the total size reaches 100,000 characters. For To improve the situation when working with strings, we could add a primitive concatenation operation on arrays. This time, the speedup for So, in addition to the fact that arrays occupy 8x more memory than strings (but this is hardly relevant performance-wise, unless you are actually filling your memory with strings), the things to remember are the following ones:
Obviously, these numbers are only meaningful under the assumption that no mutating operations are ever added to a native Coq string type. If strings start offer mutations like arrays do, then the performance gains would be a lot less dramatic. |
d268cf9
to
bd451f8
Compare
713376b
to
b057cb6
Compare
12b405b
to
6ba97e8
Compare
theories/Strings/PString.v
Outdated
\/ | ||
(i + 1 <? length s1 = true /\ | ||
i + 1 <? length s2 = true /\ | ||
get s1 (i+1) <=? get s2 (i+1) = true)%uint63). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure one can derive False
from this axiom.
Needless to say, this axiom is way too complicated for its own good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't say I'm too happy about this axiom, but do you have an alternative in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the axiom should be on the primitive compare
not the derived lt
.
Then maybe define to_list : string -> list char
and the axiom says forall s s', compare_string s s' = compare_list compare_int (to_list s) (to_list s')
(not sure if we have compare_list already defined)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether we have compare_list
, but we do have String.compare
. More generally, I think we should strive to express all our axioms in terms of fully-defined inductive functions. (This is something we had in the old days of retroknowledge and that we unfortunately lost.) And then, properties like transitivity would be trivial corollaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The axiom was indeed wrong, but it should be fixed now. I'll look into the other possibilities when I have a minute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments. About the specification, I concur with Guillaume that it would be better to have it in terms of equivalence with an inductive implementation, like it's done for int and float.
Goal make 5 "a" = cat (make 2 "a") (make 3 "a"). | ||
Proof. lazy. syntactic_refl. Qed. | ||
|
||
Goal make 5 "a" = cat (make 2 "a") (make 3 "a"). | ||
Proof. cbn. syntactic_refl. Qed. | ||
|
||
Goal make 5 "a" = cat (make 2 "a") (make 3 "a"). | ||
Proof. cbv. syntactic_refl. Qed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could also have tests for simpl
and hnf
.
Goal make 5 "a" = cat (make 2 "a") (make 3 "a"). | ||
Proof. cbv. syntactic_refl. Qed. | ||
|
||
(* [vm_compute] *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C.f. above comment
Goal compare "a" "ab" = Lt. | ||
Proof. vm_compute. syntactic_refl. Qed. | ||
|
||
(* [native_compute] *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C.f. above comment
@@ -0,0 +1,236 @@ | |||
Require Import Uint63. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please split this file into multiple ones (primitives, specs, axioms, lemmas) with minimal requirements otherwise it will be a nightmare the day we'll have to run primitive strings through the debuger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split the primitives to their own files, and same for the axioms. The rest is still in a single for for now, but still a work in progress.
cc54007
to
fcd48e8
Compare
Check (eq_refl : length (make (max_length+1) "a") = max_length). | ||
|
||
Check (eq_refl : make 0 "a" = ""). | ||
Check (eq_refl : make 5 "a" = "aaaaa"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, this is fine when it comes to testing the parsing but a printing test is probably also in order in test-suite/output (c.f. test-suite/output/FloatNumberSyntax.v for instance).
So if I understand correctly, the only missing things here are the executable spec and the doc. FWIW if we want it into upcoming 8.20, this must be merged before June 17th. |
Definition print (i : int_wrapper) : option string := | ||
if (i.(int_wrap) <? 256)%uint63 then Some (make 1 i.(int_wrap)) else None. | ||
String Notation int_wrapper parse print : char63_scope. | ||
Coercion int_wrap : int_wrapper >-> int. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make the notation work, I had to use this coercion. Is that OK, or is there a better approach here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an example of something that doesn't work? I can't reproduce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, the test file I added has failures. Basically, using the notation gives a type error for me.
You can also try Check make 2 "c".
at the very end of PrimString.v
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that's not a printing issue, rather a parsing one. The coercion is probably fine, otherwise you can copy what's done in number notation to string notation so that you can register the string notation for type int
rather than int_wrapper
.
Since refactoring
Constr.t
to combine primitive values into an intermediate type is not a clear win (see #17951), here is an attempt to addprimitivea primitivechar
andstring
types by simply extendingConstr.t
with corresponding primitive value constructorsstring
type by simply extendingConstr.t
with a corresponding value constructor.Check list: