Bitwise operators. Fixes #3345. #4833

jordanbrown0 · 2023-11-17T06:20:55Z

Adds:

Hexadecimal constants, in C-style 0xHHHHHHHH.
Left and right shift, using C-style << and >>. Right shift is zero-filling.
Binary "and", "or", and "not", using C-style &, |, and ~. (No xor; C-style ^ conflicts with exponentiation.)

Precedence is C-like but not quite the same.
C precedence is: unary (including ~), multiply, add, shift, greater/less, equality, binary-and, binary-or, logical-and, logical-or, ternary
Precedence here moves binary-and and binary-or above comparisons: unary (including ~), multiply, add, shift, binary-and, binary-or, greater/less, equality, logical-and, logical-or, ternary
This avoids the C strangeness where x & 1 == 0 tests whether 1 is equal to 0, and then ands the result with x. (I think that a very early C did not have logical-and and local-or, and used binary-and and binary-or for the purpose.)

Does not add a new integer type. Operates on 32-bit values stored as ordinary OpenSCAD numbers. (OpenSCAD numbers could store ~53-bit values but that seemed harder to explain.) The conversion is straightforward C conversion from double to unsigned 32-bit integer: fractions are dropped, values above 2^32 have the top parts chopped off, negative values are converted to 32-bit signed and then unsigned, and so small negative values turn into values near 2^32.

Mostly, you should use these operations only on integers 0 through 2^32-1. -(2^31) through -1 behave sensibly if you understand two's complement. It would be best not to rely on the fact that values above 2^32 are truncated; a future version might extend to 53-bit support.

Possibly missing: a way to turn a number into a hexadecimal string.

I'm not sure that this is a good idea, but it's not clearly a bad idea. I think there are better solutions for many of the problems that people would try to solve with bit operations, but bit operations may be more comfortable for programmers used to languages like C. The major "cost" is consuming the &, |, and ~ operators, and using them for any other purpose would confuse people.

pca006132 · 2023-12-02T12:38:59Z

maybe we should add some test cases for this?

jordanbrown0 · 2023-12-03T01:24:16Z

Yes, though if key people think that this should not be merged then there's no point writing tests for it.

kintel · 2023-12-03T17:58:37Z

src/core/lexer.l

@@ -262,6 +262,10 @@ use[ \t\r\n]*"<"		{ BEGIN(cond_use); LOCATION_COUNT_LINES(parserlloc, yytext); }

 {UNICODE}+              { parser_error_pos -= strlen(yytext); return TOK_ERROR; }

+0x{H}{1,8}              {


Note: There's an ugly bug in OpenSCAD's lexer which allows variables to start with a number. Your patch might break code which relies on that.

As this may become a tricky discussion, I'd suggest split that off as a separate PR for now.

It breaks anything that uses variables that would be legal hex constants. It does not break variables that don't match the pattern. Whether or not disallowing 0x0f as a variable is acceptable is probably the biggest technical question in this change.

#3345 mentions a couple of possible alternative styles: a function that accepts a hex string and returns a number, or a Python-style syntax like h'0f'.

Over in #3345 you will see that apparently @nophead relies on the identifiers-that-start-with-digits behavior.

Yes but I don't think any of them look like hex constants.

Still... allowing 0w123 and 0y123 as identifiers, but not allowing 0x123, would be odd. Except for the incompatibility, I'd be in favor of disallowing identifiers that start with digits. Being able to use real part numbers as names is nice, but if the part number is 4632-342 you can't use it, so maybe the starts-with-digit-otherwise-alphanumeric case is narrow enough to disallow.

Maybe we should think about some kind of pragma mechanism to enable/disable language features. Start out with #pragma hex-constants to enable the new feature, but eventually migrate to #pragma leading-digit-identifiers to enable the legacy behavior.

Will split hex constants off.

Instead of a per-feature pragma, I'd say just deprecate and use a monotonically increasing version pragma for backwards compatibility instead. That way, the language always moves forward, and it's much easier on us to manage. ..but that's a completely separate discussion :)

kintel · 2023-12-03T18:03:57Z

src/core/parser.y

@@ -404,6 +407,34 @@ comparison
            }
 		;

+binaryor
+        : binaryand
+        | binaryor '|' binaryand


Did you consider also adding and, or, and xor as keywords, like C++ did? That way we'd get xor back

Being primarily a C guy, and something of a C++ newbie, no, I didn't... and I can't say that it gives me warm fuzzy feelings to increase the number of reserved words.

To be clear: I didn't add them because I didn't even know that C++ did that. And, now that I know, I'm not sure it's a good idea.

Fair, it's a bit late to add keywords, especially if that makes them unusable as symbols.

kintel · 2023-12-03T18:17:53Z

I'm uncertain about the 32-bit thing.
Otoh, since this is implemented in JavaScript in almost the same way, perhaps it would be cleaner (and easier to explain) if we make our integer operator implementation exactly the same as JavaScript's?

JavaScript's version of integer ops is also very much battle tested and we can stea^H^H^H^Hget inspired by existing implementation details.

jordanbrown0 · 2023-12-03T18:21:22Z

I will compare against what JS does. Agree that ... adopting ... behavior is best.

Though I'd kind of prefer to avoid adopting C's precedence where binary operators are lower precedence than comparison; that seems like it leads only to bugs.

jordanbrown0 · 2023-12-03T18:22:06Z

Note that there are some design comments over in #3345, including comments on the 0x thing.

kintel · 2023-12-03T18:35:42Z

So many comments... but I'll read #3345

jordanbrown0 · 2023-12-03T18:38:42Z

Two gaps have occurred to me: hex-string to number, and number to hex-string.

One possible answer, that may be too clever, would be to use hex() for both. hex(number) would return a string, while hex(string) would return a number.

kintel · 2023-12-03T18:46:39Z

In terms of hex values, it could be interesting to let color() accept numeric hex values too
..and let hex() accept "#rrggbb" strings.

kintel · 2023-12-03T18:52:43Z

I will compare against what JS does. Agree that ... adopting ... behavior is best.

I believe JS does pretty much the same as you; convert to 32-bit. Just nice to have the same semantics to avoid surprises.

precedence is an orthogonal topic. I don't have any opinions on that yet :)

jordanbrown0 · 2023-12-03T19:59:39Z

WRT color(), you mean so that color(0xff0000) was equivalent to color([1,0,0]), full-red? And noting of course that it would be the same as color(16711680). Sure, I suppose, but I'm not sure what the use case is.

Unfortunately, you could not support the lesser-known #rgb form (one digit each), because there wouldn't be any way to distinguish 0xfff (white) from 0x000fff (full blue with a little green). You can only draw that distinction with strings.

jordanbrown0 · 2023-12-03T20:45:12Z

I investigated JS a little. They allow arbitrary-sized hex constants, that (like decimal) just start losing precision when they get over 13 digits. 0x100000000000000 plus one is unchanged. My now-backed-out hex constant support rejects constants over 8 digits.

Bitwise arithmetic is limited to the low 32 bits. 0xfffffffffffff & 0xf00000000 (13 Fs, 8 zeroes) is zero. That's consistent with what I did.

JS bitwise arithmetic is signed(!). 0xffffffff (2^32-1) is a positive number. 0xffffffff | 1 is -1. 1<<31 is a large negative number. I made bitwise arithmetic unsigned because nobody (or at least nobody sane) wants to to signed arithmetic with these operators. (In a low-level language like C where you might be using bitwise arithmetic to implement normal arithmetic, maybe... but not in a high-level language where you are using bitwise arithmetic to manipulate bit masks.) If we decided that consistency was critical I could match, but I'd have to hold my nose.

JS precedence is C-like; a&b==c is a & (b==c), with a numeric result.

Python has two types, int and float. int appears to have unlimited precision. int appears to be signed, in that ~0 is displayed as -1. int appears to convert to float at appropriate times - int divided by int is float, for instance. (Integer division is //.) Conversion to float has expected problems at 2^52; if you set x to 0x100000000000000 (14 zeroes) then x+1 is precise but x+1.0 is not. x+1-x is 1; x+1.0-x is 0. (pow(int,positiveInt) is int!)

Python precedence puts bitwise operators above comparison, so a&b==c is (a&b) == c, with a boolean result.

Python's behavior seems like the better one to emulate, though implementing unlimited-precision int does not seem worthwhile and I don't know what the implications would be of not implementing it.

jordanbrown0 · 2023-12-03T21:01:27Z

[ Transcribing comments from #3345. Let's keep comments on this PR here. ]

@rewolff said:
Jordan, Good work! Thanks!

I think there is atypo. You write 2^32-1 where you mean 2^31-1.

Hmm. Oh! ... OK. Now I understand.

Maybe just say: unsigned integers from 0 to 2^32-1 work as
expected. Integers outside that range or non-integer values are
currently truncated to unsigned 32 bit integers. This includes
negative numbers. If you know how 2s complement and the truncation
work, numbers outside 0 .. 2^32-1 could be used, but are unsupported.

For "future compatibility" I'd just reserve EVERYTHING outside
the first 32 bits. Do not predict 53 bits.

If it were me to extend this, I would just convert a double to a 64bit
longlong. in the range 2^53 .. 2^63 you're loosing bottom bits. You
cannot OR something with 2^61 and then test for the lower 8 bits (or
so)... Anyway, the official "API" is now 0..2^32-1, outside that
behaviour is currently documented but unsupported (=may change at any
time) and when it is extended, you get 0..2^53-1 and outside that
unsupported, again possibly with documeented behaviour.

Consider supporting "-1" as an "all bits set": a noop with "and".

@jordanbrown0 said:

I started out converting to uint64_t, but binary-not yielded bad results, because with small inputs it yielded numbers in the vicinity of 2^64 and on conversion back to double the low-order bits are lost.

@jordanbrown0 said:

It's been pointed out in IRC that the 0xHHHH syntax is a breaking change, because OpenSCAD allows identifiers that start with digits(?!?). Thus, e.g.,

0x0f = 14;

is legal today and would be broken by this change.

Whether it is a good thing that it's legal today is left as an exercise for the reader.

@nophead said:

It is good for me because it allows real names for parts that I model, which sometimes begin with numbers.

@jordanbrown0 said:

Alternative strategies might include a function that accepts a hex string and returns a number, or a Python-style x'foo' syntax like h'7f'.

@jordanbrown0 said:

Note that it's only breaking for identifiers that match the 0xHHHHHHHH pattern; identifiers that don't start with 0x would not be affected. Similarly, identifiers that have non-hex characters after the 0x would not be affected; 0x123z would continue to be a valid identifier.

jordanbrown0 · 2023-12-03T21:35:30Z

Consider supporting "-1" as an "all bits set": a noop with "and".

-1 does naturally equal 0xffffffff, 2^32-1. Within the bounds of 32-bit arithmetic, -1 & x == x.

jordanbrown0 · 2023-12-05T09:59:08Z

Wrote some tests, which exposed some bugs. Fixed those bugs.

jordanbrown0 · 2023-12-05T09:59:40Z

Note from writing tests: doing bitwise stuff without hex constants is a nuisance.

jordanbrown0 · 2023-12-06T00:06:09Z

The test failed on some systems, with differences like:

 ECHO: "Expect two warnings:"
-WARNING: undefined operation (number | bool) in file bitwise-operators.scad, line 36
-WARNING: undefined operation (bool | number) in file bitwise-operators.scad, line 38
+WARNING: undefined operation (number | bool) in file bitwise-operators.scad, line 61
+WARNING: undefined operation (bool | number) in file bitwise-operators.scad, line 65

The "-" lines are the expected lines; the "+" lines are the actual lines.

The actual lines are simply wrong; the expected lines are where the statements in question are located.

My guess is that there's a CRLF vs LF problem that is causing the lexer to count lines incorrectly when it's presented with a file with unexpected line breaks. The lexer should really silently handle those issues, but I bet that I ended up with CRLF lines and that the MSYS2 operations didn't clean them up during the push.

Will look at it more in a few hours.

jordanbrown0 · 2023-12-06T07:36:58Z

Indeed, looking at the lexer it sure looks like a CRLF file on a UNIX system will count both the CR and the LF as starting new lines, except that CR in comments (at the end of //, or anywhere inside /*...*/) won't be counted.

But I've stripped out the CRs from the test files, so (fingers crossed) the problem should go away for this PR.

jordanbrown0 · 2023-12-06T18:30:54Z

I filed the CRLF bug as #4882 .

jordanbrown0 · 2023-12-06T19:56:45Z

Open questions...

These first few are easy, I think, but should get concurrence.

Is the non-C-like (but Python-like) precedence OK? (My answer is "yes", that the C precedence is error-prone and not useful.)
Xor, either using punctuation, a text operator, or a function. (I'd prefer a function, if at all. I see xor as very much a secondary feature, not useful for the key use case of bitmaps.)
32-bit, or more? And if more, 53-bit, 64-bit, or unlimited? 53-bit would be straightforward; the other two more difficult. (My answer: leave it at 32-bit but make it clear that that might change in the future.)
Signed, or not? Simplest case: is ~0 equal to 2^32-1, or is it equal to -1? (My answer: unsigned; signed does not seem useful for the use cases here.)
Right shift sign-extending, or not? (My answer: not, especially if these values are unsigned.)

That brings us to the not-easy questions: presenting values as hex. (Octal and binary are theoretically possible but probably have very small audiences.) There's an argument for leaving this for another PR, but in writing the test cases I found it very unnatural to talk about bitwise arithmetic in decimal.

There are three sub-cases:

Hex constants.
Hex-string-to-number.
Number-to-hex-string.

Details:

Hex constants
- 0xHHHHHHHH. Incompatible change, but obvious.
- Python-esque (but Python doesn't do it this way) h'........'. Are there other cases where we'd like that structure for other special strings?
- Don't have hex constants; rely on string-to-number.
String-to-number
- n = hex(s);
- num(s [, base=b]) - solving a more general problem, but very verbose for use as a constant.
Number-to-string
- s = hex(n); - it would be clever, perhaps too clever, to use the same function for both directions.
- tohex(n)
- str(n, base=16) - adding a named parameter to str(). Currently str() ignores names, and theoretically giving them semantics would be incompatible, but it seems unlikely that anybody uses this "feature". Would allow for additional numeric-formatting options, e.g. width=10, pad="0", et cetera and so lead to solving additional problems.
- Object-style: n.toString(16). (Probably with named options as above.) It would be nice to move in this direction for namespace reasons, and doing one-off cases probably wouldn't be hard, but it's trickier to do in general.

Any other open questions?

rewolff · 2023-12-07T10:06:04Z

On Wed, Dec 06, 2023 at 11:56:55AM -0800, Jordan Brown wrote: Open questions... These first few are easy, I think, but should get concurrence. * Is the non-C-like (but Python-like) precedence OK? (My answer is "yes", that the C precedence is error-prone and not useful.)

Agreed!

* Xor, either using punctuation, a text operator, or a function. (I'd prefer a function, if at all. I see xor as very much a secondary feature, not useful for the key use case of bitmaps.)

If you start doing things with bitmaps/bitvectors, xor becomes "useful" pretty quickly. e.g. "somehow the user indicated that bit 5 needs to be the other way around". So I say: Yes I see the usefulness in an xor, a function is fine if the ^ operator is not available.

* 32-bit, or more? And if more, 53-bit, 64-bit, or unlimited? 53-bit would be straightforward; the other two more difficult. (My answer: leave it at 32-bit but make it clear that that might change in the future.)

Fully agreed.

* Signed, or not? Simplest case: is `~0` equal to 2^31-1, or is it equal to -1? (My answer: unsigned; signed does not seem useful for the use cases here.)

Agreed. But I'd like to add that ~0 and -1 are the same. There are people to whom -1 is more natural than ~0 (me :-) ).

* Right shift sign-extending, or not? (My answer: not, especially if these values are unsigned.)

Indeed. -1 >> 1 == 2^31-1.

That brings us to the not-easy questions: presenting values as hex. (Octal and binary are theoretically possible but probably have very small audiences.) There's an argument for leaving this for another PR, but in writing the test cases I found it very unnatural to talk about bitwise arithmetic in decimal. There are three sub-cases: * Hex constants. * Hex-string-to-number. * Number-to-hex-string. Details: * Hex constants * 0xHHHHHHHH. Incompatible change, but obvious.

I'd vote for this one: Such incompatible changes happen from time to time when a language evolves. I've had a C variable name suddenly be a keyword in a later edition of C. Thus: Although 0x89ab is now a valid identifier, it would become a numerical constant in the future. The 0.001% of users who have their variables or modules named that way are out of luck. Sorry. (it makes life easier for the 1% of users who will use hex constants).

* Number-to-string * `str(n, base=16)` - adding a named parameter to `str()`. Currently `str()` ignores names, and theoretically giving them semantics would be incompatible, but it seems unlikely that anybody uses this "feature". Would allow for additional numeric-formatting options, e.g. `width=10`, `pad="0"`, et cetera and so lead to solving additional problems.

I like generalizations that happen to solve additional problems!

…

-- ** ***@***.*** ** https://www.BitWizard.nl/ ** +31-15-2049110 ** ** Delftechpark 11 2628 XJ Delft, The Netherlands. KVK: 27239233 ** f equals m times a. When your f is steady, and your m is going down your a is going up. -- Chris Hadfield about flying up the space shuttle.

jordanbrown0 · 2023-12-07T17:11:50Z

But I'd like to add that ~0 and -1 are the same.

As described, they aren't the same. They're the same after you feed the -1 through at least one bitwise operator. But before that, ~0 is 2^32-1, and -1 is... -1.

The way that I would think of it is that the bitwise operators act on integer values 0..2^32-1. If you try to use a bitwise operator on a non-integer, or on a value outside that range (like -1), it gets coerced into being an integer inside that range - for a non-integer by truncating the fractional part; for a value outside that range by stripping off the high bits in 2's-complement representation.

So there are really three behaviors there:

Non-integers. I won't promise that I'll never extend the bitwise operators into the fractional space. Can't think of why I would, but the purist in me likes it.
Numbers greater than 2^32-1. I specifically don't promise that they will always be stripped down to 32 bits.
Negative numbers. This is a tough question. Today, -1|0 is a positive number, as is ~0. Arguably, we should reserve the right for one or them to both be negative, some day. We should probably guarantee that 0x80000000 will never be negative, but if we go above 32 bits maybe we'll go signed. I'd say we should be very clear in documentation that the future is uncertain.

jordanbrown0 mentioned this pull request Nov 17, 2023

Suggestion / question: Bitwise operator support #3345

Open

pca006132 mentioned this pull request Dec 3, 2023

Consider implementing a digit separator symbol for better skimmability of large numbers #4872

Open

kintel reviewed Dec 3, 2023

View reviewed changes

jordanbrown0 force-pushed the binary-arithmetic branch from 373c819 to 4e19b5f Compare December 3, 2023 21:37

jordanbrown0 changed the title ~~Bitwise operators, hexadecimal constants. Fixes #3345.~~ Bitwise operators. Fixes #3345. Dec 5, 2023

jordanbrown0 added 3 commits April 4, 2024 19:30

Bitwise operators, hexadecimal constants. Fixes openscad#3345.

7127c85

Back out hex constants for now.

98479c4

Add AST de-parsing for bitwise operators.

b43dfb7

jordanbrown0 added 4 commits April 4, 2024 19:30

Check for negative and too-large values for rhs of shifts.

d6f665e

Bitwise operators were improperly inserted in the precedence chain.

ffa7a24

Tests for bitwise operators.

87a0246

Remove CRs from test files.

3a8f42a

jordanbrown0 force-pushed the binary-arithmetic branch from bb897c8 to 3a8f42a Compare April 5, 2024 02:43

More robust conversion from double to int

7a65ef8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bitwise operators. Fixes #3345. #4833

Bitwise operators. Fixes #3345. #4833

jordanbrown0 commented Nov 17, 2023 •

edited

pca006132 commented Dec 2, 2023

jordanbrown0 commented Dec 3, 2023

kintel Dec 3, 2023

kintel Dec 3, 2023

jordanbrown0 Dec 3, 2023

nophead Dec 3, 2023

jordanbrown0 Dec 3, 2023

jordanbrown0 Dec 3, 2023

kintel Dec 3, 2023

kintel Dec 3, 2023

jordanbrown0 Dec 3, 2023

jordanbrown0 Dec 3, 2023

kintel Dec 3, 2023

kintel commented Dec 3, 2023

jordanbrown0 commented Dec 3, 2023

jordanbrown0 commented Dec 3, 2023

kintel commented Dec 3, 2023

jordanbrown0 commented Dec 3, 2023

kintel commented Dec 3, 2023 •

edited

kintel commented Dec 3, 2023

jordanbrown0 commented Dec 3, 2023

jordanbrown0 commented Dec 3, 2023

jordanbrown0 commented Dec 3, 2023

jordanbrown0 commented Dec 3, 2023

jordanbrown0 commented Dec 5, 2023

jordanbrown0 commented Dec 5, 2023

jordanbrown0 commented Dec 6, 2023

jordanbrown0 commented Dec 6, 2023 •

edited

jordanbrown0 commented Dec 6, 2023

jordanbrown0 commented Dec 6, 2023 •

edited

rewolff commented Dec 7, 2023 via email

jordanbrown0 commented Dec 7, 2023 •

edited

		@@ -262,6 +262,10 @@ use[ \t\r\n]*"<" { BEGIN(cond_use); LOCATION_COUNT_LINES(parserlloc, yytext); }

		{UNICODE}+ { parser_error_pos -= strlen(yytext); return TOK_ERROR; }

		0x{H}{1,8} {

Bitwise operators. Fixes #3345. #4833

Are you sure you want to change the base?

Bitwise operators. Fixes #3345. #4833

Conversation

jordanbrown0 commented Nov 17, 2023 • edited

pca006132 commented Dec 2, 2023

jordanbrown0 commented Dec 3, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kintel commented Dec 3, 2023

jordanbrown0 commented Dec 3, 2023

jordanbrown0 commented Dec 3, 2023

kintel commented Dec 3, 2023

jordanbrown0 commented Dec 3, 2023

kintel commented Dec 3, 2023 • edited

kintel commented Dec 3, 2023

jordanbrown0 commented Dec 3, 2023

jordanbrown0 commented Dec 3, 2023

jordanbrown0 commented Dec 3, 2023

jordanbrown0 commented Dec 3, 2023

jordanbrown0 commented Dec 5, 2023

jordanbrown0 commented Dec 5, 2023

jordanbrown0 commented Dec 6, 2023

jordanbrown0 commented Dec 6, 2023 • edited

jordanbrown0 commented Dec 6, 2023

jordanbrown0 commented Dec 6, 2023 • edited

rewolff commented Dec 7, 2023 via email

jordanbrown0 commented Dec 7, 2023 • edited

jordanbrown0 commented Nov 17, 2023 •

edited

kintel commented Dec 3, 2023 •

edited

jordanbrown0 commented Dec 6, 2023 •

edited

jordanbrown0 commented Dec 6, 2023 •

edited

jordanbrown0 commented Dec 7, 2023 •

edited