-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bitwise operators. Fixes #3345. #4833
base: master
Are you sure you want to change the base?
Conversation
maybe we should add some test cases for this? |
Yes, though if key people think that this should not be merged then there's no point writing tests for it. |
src/core/lexer.l
Outdated
@@ -262,6 +262,10 @@ use[ \t\r\n]*"<" { BEGIN(cond_use); LOCATION_COUNT_LINES(parserlloc, yytext); } | |||
|
|||
{UNICODE}+ { parser_error_pos -= strlen(yytext); return TOK_ERROR; } | |||
|
|||
0x{H}{1,8} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: There's an ugly bug in OpenSCAD's lexer which allows variables to start with a number. Your patch might break code which relies on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this may become a tricky discussion, I'd suggest split that off as a separate PR for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It breaks anything that uses variables that would be legal hex constants. It does not break variables that don't match the pattern. Whether or not disallowing 0x0f
as a variable is acceptable is probably the biggest technical question in this change.
#3345 mentions a couple of possible alternative styles: a function that accepts a hex string and returns a number, or a Python-style syntax like h'0f'
.
Over in #3345 you will see that apparently @nophead relies on the identifiers-that-start-with-digits behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but I don't think any of them look like hex constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still... allowing 0w123 and 0y123 as identifiers, but not allowing 0x123, would be odd. Except for the incompatibility, I'd be in favor of disallowing identifiers that start with digits. Being able to use real part numbers as names is nice, but if the part number is 4632-342 you can't use it, so maybe the starts-with-digit-otherwise-alphanumeric case is narrow enough to disallow.
Maybe we should think about some kind of pragma mechanism to enable/disable language features. Start out with #pragma hex-constants
to enable the new feature, but eventually migrate to #pragma leading-digit-identifiers
to enable the legacy behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will split hex constants off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a per-feature pragma, I'd say just deprecate and use a monotonically increasing version pragma for backwards compatibility instead. That way, the language always moves forward, and it's much easier on us to manage. ..but that's a completely separate discussion :)
@@ -404,6 +407,34 @@ comparison | |||
} | |||
; | |||
|
|||
binaryor | |||
: binaryand | |||
| binaryor '|' binaryand |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider also adding and
, or
, and xor
as keywords, like C++ did? That way we'd get xor
back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Being primarily a C guy, and something of a C++ newbie, no, I didn't... and I can't say that it gives me warm fuzzy feelings to increase the number of reserved words.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: I didn't add them because I didn't even know that C++ did that. And, now that I know, I'm not sure it's a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, it's a bit late to add keywords, especially if that makes them unusable as symbols.
I'm uncertain about the 32-bit thing. JavaScript's version of integer ops is also very much battle tested and we can stea^H^H^H^Hget inspired by existing implementation details. |
I will compare against what JS does. Agree that ... adopting ... behavior is best. Though I'd kind of prefer to avoid adopting C's precedence where binary operators are lower precedence than comparison; that seems like it leads only to bugs. |
Note that there are some design comments over in #3345, including comments on the 0x thing. |
So many comments... but I'll read #3345 |
Two gaps have occurred to me: hex-string to number, and number to hex-string. One possible answer, that may be too clever, would be to use |
In terms of hex values, it could be interesting to let |
I believe JS does pretty much the same as you; convert to 32-bit. Just nice to have the same semantics to avoid surprises. precedence is an orthogonal topic. I don't have any opinions on that yet :) |
WRT Unfortunately, you could not support the lesser-known |
I investigated JS a little. They allow arbitrary-sized hex constants, that (like decimal) just start losing precision when they get over 13 digits. Bitwise arithmetic is limited to the low 32 bits. JS bitwise arithmetic is signed(!). JS precedence is C-like; Python has two types, Python precedence puts bitwise operators above comparison, so Python's behavior seems like the better one to emulate, though implementing unlimited-precision int does not seem worthwhile and I don't know what the implications would be of not implementing it. |
[ Transcribing comments from #3345. Let's keep comments on this PR here. ] @rewolff said: I think there is atypo. You write 2^32-1 where you mean 2^31-1. Hmm. Oh! ... OK. Now I understand. Maybe just say: unsigned integers from 0 to 2^32-1 work as For "future compatibility" I'd just reserve EVERYTHING outside If it were me to extend this, I would just convert a double to a 64bit Consider supporting "-1" as an "all bits set": a noop with "and". @jordanbrown0 said: I started out converting to uint64_t, but binary-not yielded bad results, because with small inputs it yielded numbers in the vicinity of 2^64 and on conversion back to double the low-order bits are lost. @jordanbrown0 said: It's been pointed out in IRC that the 0xHHHH syntax is a breaking change, because OpenSCAD allows identifiers that start with digits(?!?). Thus, e.g.,
is legal today and would be broken by this change. Whether it is a good thing that it's legal today is left as an exercise for the reader. @nophead said: It is good for me because it allows real names for parts that I model, which sometimes begin with numbers. @jordanbrown0 said: Alternative strategies might include a function that accepts a hex string and returns a number, or a Python-style @jordanbrown0 said: Note that it's only breaking for identifiers that match the 0xHHHHHHHH pattern; identifiers that don't start with |
-1 does naturally equal 0xffffffff, 2^32-1. Within the bounds of 32-bit arithmetic, -1 & x == x. |
373c819
to
4e19b5f
Compare
Wrote some tests, which exposed some bugs. Fixed those bugs. |
Note from writing tests: doing bitwise stuff without hex constants is a nuisance. |
The test failed on some systems, with differences like:
The "-" lines are the expected lines; the "+" lines are the actual lines. The actual lines are simply wrong; the expected lines are where the statements in question are located. My guess is that there's a CRLF vs LF problem that is causing the lexer to count lines incorrectly when it's presented with a file with unexpected line breaks. The lexer should really silently handle those issues, but I bet that I ended up with CRLF lines and that the MSYS2 operations didn't clean them up during the push. Will look at it more in a few hours. |
Indeed, looking at the lexer it sure looks like a CRLF file on a UNIX system will count both the CR and the LF as starting new lines, except that CR in comments (at the end of But I've stripped out the CRs from the test files, so (fingers crossed) the problem should go away for this PR. |
I filed the CRLF bug as #4882 . |
Open questions... These first few are easy, I think, but should get concurrence.
That brings us to the not-easy questions: presenting values as hex. (Octal and binary are theoretically possible but probably have very small audiences.) There's an argument for leaving this for another PR, but in writing the test cases I found it very unnatural to talk about bitwise arithmetic in decimal. There are three sub-cases:
Details:
Any other open questions? |
On Wed, Dec 06, 2023 at 11:56:55AM -0800, Jordan Brown wrote:
Open questions...
These first few are easy, I think, but should get concurrence. * Is
the non-C-like (but Python-like) precedence OK? (My answer is
"yes", that the C precedence is error-prone and not useful.)
Agreed!
* Xor, either using punctuation, a text operator, or a function.
(I'd prefer a function, if at all. I see xor as very much a
secondary feature, not useful for the key use case of bitmaps.)
If you start doing things with bitmaps/bitvectors, xor becomes
"useful" pretty quickly. e.g. "somehow the user indicated that bit 5
needs to be the other way around". So I say: Yes I see the usefulness
in an xor, a function is fine if the ^ operator is not available.
* 32-bit, or more? And if more, 53-bit, 64-bit, or unlimited?
53-bit would be straightforward; the other two more difficult. (My
answer: leave it at 32-bit but make it clear that that might change
in the future.)
Fully agreed.
* Signed, or not? Simplest case: is `~0` equal to 2^31-1, or is it
equal to -1? (My answer: unsigned; signed does not seem useful for
the use cases here.)
Agreed. But I'd like to add that ~0 and -1 are the same. There are
people to whom -1 is more natural than ~0 (me :-) ).
* Right shift sign-extending, or not? (My answer: not, especially
if these values are unsigned.)
Indeed. -1 >> 1 == 2^31-1.
That brings us to the not-easy questions: presenting values as hex.
(Octal and binary are theoretically possible but probably have very
small audiences.) There's an argument for leaving this for another
PR, but in writing the test cases I found it very unnatural to talk
about bitwise arithmetic in decimal.
There are three sub-cases:
* Hex constants.
* Hex-string-to-number.
* Number-to-hex-string.
Details:
* Hex constants
* 0xHHHHHHHH. Incompatible change, but obvious.
I'd vote for this one: Such incompatible changes happen from time to
time when a language evolves. I've had a C variable name suddenly be a
keyword in a later edition of C. Thus: Although 0x89ab is now a valid
identifier, it would become a numerical constant in the future.
The 0.001% of users who have their variables or modules named that way
are out of luck. Sorry. (it makes life easier for the 1% of users who
will use hex constants).
* Number-to-string
* `str(n, base=16)` - adding a named parameter to `str()`.
Currently `str()` ignores names, and theoretically giving them
semantics would be incompatible, but it seems unlikely that
anybody uses this "feature". Would allow for additional
numeric-formatting options, e.g. `width=10`, `pad="0"`, et cetera
and so lead to solving additional problems.
I like generalizations that happen to solve additional problems!
…--
** ***@***.*** ** https://www.BitWizard.nl/ ** +31-15-2049110 **
** Delftechpark 11 2628 XJ Delft, The Netherlands. KVK: 27239233 **
f equals m times a. When your f is steady, and your m is going down
your a is going up. -- Chris Hadfield about flying up the space shuttle.
|
As described, they aren't the same. They're the same after you feed the -1 through at least one bitwise operator. But before that, ~0 is 2^32-1, and -1 is... -1. The way that I would think of it is that the bitwise operators act on integer values 0..2^32-1. If you try to use a bitwise operator on a non-integer, or on a value outside that range (like -1), it gets coerced into being an integer inside that range - for a non-integer by truncating the fractional part; for a value outside that range by stripping off the high bits in 2's-complement representation. So there are really three behaviors there:
|
bb897c8
to
3a8f42a
Compare
Adds:
<<
and>>
. Right shift is zero-filling.&
,|
, and~
. (No xor; C-style^
conflicts with exponentiation.)Precedence is C-like but not quite the same.
C precedence is: unary (including ~), multiply, add, shift, greater/less, equality, binary-and, binary-or, logical-and, logical-or, ternary
Precedence here moves binary-and and binary-or above comparisons: unary (including ~), multiply, add, shift, binary-and, binary-or, greater/less, equality, logical-and, logical-or, ternary
This avoids the C strangeness where
x & 1 == 0
tests whether 1 is equal to 0, and then ands the result with x. (I think that a very early C did not have logical-and and local-or, and used binary-and and binary-or for the purpose.)Does not add a new integer type. Operates on 32-bit values stored as ordinary OpenSCAD numbers. (OpenSCAD numbers could store ~53-bit values but that seemed harder to explain.) The conversion is straightforward C conversion from double to unsigned 32-bit integer: fractions are dropped, values above 2^32 have the top parts chopped off, negative values are converted to 32-bit signed and then unsigned, and so small negative values turn into values near 2^32.
Mostly, you should use these operations only on integers 0 through 2^32-1. -(2^31) through -1 behave sensibly if you understand two's complement. It would be best not to rely on the fact that values above 2^32 are truncated; a future version might extend to 53-bit support.
Possibly missing: a way to turn a number into a hexadecimal string.
I'm not sure that this is a good idea, but it's not clearly a bad idea. I think there are better solutions for many of the problems that people would try to solve with bit operations, but bit operations may be more comfortable for programmers used to languages like C. The major "cost" is consuming the
&
,|
, and~
operators, and using them for any other purpose would confuse people.