-
-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added string support and first class characters #685
base: master
Are you sure you want to change the base?
Conversation
…StringValue to Fcn / Rcd / Tuple
} | ||
|
||
@Override | ||
public final boolean mutates() { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is according to the doc (it normalization and fingerprinting don't mutate the object) but not in line with the e.g. IntValues...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a new basic data type to TLC introduces problems, a serious one being to modify the fingerprinting code to maintain the extremely small probability of collisions. This is not trivial. The same fingerprinting algorithm was used inside a compiler once. A compiler bug was traced to a collision of fingerprints that should have had an astronomically small probability. It was discovered that this was the result of a bug in the algorithm's implementation.
One possibility might be to make characters typed model values, using as type a character that is not currently allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no experience in this matter, but I fail to see why characters can not be handled in a similar way as integers since both are essentially numbers. In fact, it seems to me that your suggestion would have a very similar result, except for substituting the char kind byte with the model value kind byte and the character value with the unique id of the character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand the fingerprinting algorithm. But I know that some smart people (researchers at one of the world's leading research labs), working from a description of the algorithm, produced an incorrect program. This tells me that modifying code that implements the algorithm, without fully understanding the code or the algorithm, is risky.
Fingerprinting is tricky. You're mapping numbers, sets of number, functions on sets of numbers, etc. all into strings of bits. For any single one of those kinds of objects, it's not too hard to find some mapping that has a small probability of collisions. But how do you know that there is a small probability of a collision between two different kinds of value? It's not easy. Saying characters are like numbers so you just treat them like numbers sounds like a recipe for producing collisions between characters and numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see, the code uses a one-way compression function that produces a 64bit tag from a tag and a byte (FP64.Extend
). To address the problem you mentioned, every class first extends the fingerprint with a byte unique to the class, followed by the data. This leads to different sequences of bytes fed to the compression function even if values of different types share the same binary representation (at least for primitives/types of constant length).
Speaking of collisions: As far as I can tell the code of FP64.Extend
treats chars as a single byte. Strings that contain characters outside the range 0..2^8-1 are probably rare but would result in colliding fingerprints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good. I believe the problem solved by the general fingerprinting algorithm involved fingerprinting values from an unbounded set of types. With values from a fixed set of types, the problem does seem simple.
Thanks for pointing out the problem that can arise if TLA+ is extended to allow non-ASCII characters in strings. It would be nice if it could be documented somewhere in case that extension is ever implemented, but I don't know how to add documentation for code that no one has begun writing.
|
||
@Override | ||
final boolean assignable(Value val) { | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method seems to be unused.
On a general note: When I wrote this back in July I noticed that the code base of the different value types is rather convoluted and contains a lot of code duplication. Also, it seems as if somebody wanted to extract interfaces but stopped in the middle of the refactoring process. Is a continuation of this process/clean-up desired? If so, I could probably spend some time on it. |
TLC was originally written by one person in an amazingly short time. As a result, the code was not well planned. Any improvements to it are most welcome. However, the poor structuring makes it very hard to make any changes without introducing bugs. So, as with doctors, the prime commandment for anyone making changes is: Do no harm. This means that anyone who makes changes must commit to doing a lot of thinking and testing. There are engineers who depend on TLC for designing important systems. |
The |
Strings should be included in the elementary data types of TLC. Only a tiny fraction of specs will ever care that strings are made up of characters, and you should not slow down the evaluation of operations on strings in all specs just for the benefit of that tiny fraction. That means that whenever TLC represents a value in terms of elementary data types, it will have to check every sequence to see if it's a string. (TLC already has to check every function to see if it's a sequence.) |
I don't think that the added functionality would slow down existing specs since basically all new code paths would previously have resulted in errors if I'm not mistaken. |
I just remembered that, because string handling in Java was slow at the time, TLC creates a unique ID for each string and uses it to check for equality of strings. I don't remember if those IDs are assigned by the parser, but new ones are probably produced if strings are dynamically created, which they can be. Anyway, you need to be aware of the use of those IDs and make sure things work properly when creating new strings in previously impossible ways. If you remove strings as elementary data types, this will change fingerprinting for all specs. I don't know if that will significantly affect performance. |
The corresponding class is Overall, it would be good to get rid of |
Thanks for pointing that out. This is indeed a problem because the unique id is basically the index in the internal table. Thus in my implementation comparing strings depends on whether are not they are represented as
I assume that every TLA+ value must result in the same fingerprint regardless of its representation? In this case fingerprinting would indeed have to be adapted in order to guarantee accurate fingerprints for specs using the new feature, leading to a slow down for old specs. Instead of checking each sequence for string-ness a better option would probably be to treat Should I implement what I outlined above (i.e. is it worth the cost)? Do you see any other issues?
Getting rid of Tangential question: Fingerprinting |
A potential solution would be to compare on a character basis even if Anything that significantly slows down the evaluation of specs that do I assume that every TLA+ value must result in the same fingerprint The same value having different fingerprints slows down model Should I implement what I outlined above (i.e. is it worth the cost)? My opinion has been that such a small percentage of TLA+ specs would I do have one question: To do what you want to do with first-class |
Maybe the following would be acceptable (I don't really like it though)? I do not depend on this feature. I simply tend to stumble upon corner cases. When I started out with TLA+ about a year ago I read "specifying systems" and set out to do things a certain way. It happened more than once that my ideas would not work because of some restrictions in TLC or TLAPM. All of it could be worked around, but especially concerning strings I saw #512 and thought that maybe I could contribute to it, especially since there seemed to be a plan on what to do. But I do see the downsides of it now. I don't think that anyone would benefit from adding new syntax except if it were very visibly documented. One of my main issues with TLA and its tools is its lack of central documentation. If you want to look something up you often have to consult several documents and even then you might still get unexpected errors. I wanted to counteract this by making TLC support more of TLA+, but as the downsides seem to outweigh the benefits I think we can close this PR. |
Thank you for trying to help. I'm afraid one of the things that aren't properly documented is how best to help. You wrote that a major problem for you has been the lack of central documentation. That is something we can certainly use help with. Neither Markus Kuppe nor I have the time to devote to it. I would be happy to advise on such a task. As is true of any effort to improve TLA+ or its tools, one should talk to us before implementing anything to make sure that it will be as useful as possible. Leslie |
I think it would take at least one person who has both a decent amount of time to spare and a clue of what they are doing to realize a truly unified documentation of TLA+ and its tools. I fear I fulfill neither of those requirements. It is one thing to spend two days patching together a feature that is ultimately rejected because of a lack of consideration and another to create an overarching compendium / reference. Anything less than that would just be one more resource to consider and thus counterproductive. |
bd71c07
to
0a543cc
Compare
Closes #512 and related issues.
Added several todos where I'm uncertain how to implement functions correctly and may raise questions regarding the implementation of other value types