Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bitwise operators. Fixes #3345. #4833

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

jordanbrown0
Copy link
Contributor

@jordanbrown0 jordanbrown0 commented Nov 17, 2023

Adds:

  • Hexadecimal constants, in C-style 0xHHHHHHHH.
  • Left and right shift, using C-style << and >>. Right shift is zero-filling.
  • Binary "and", "or", and "not", using C-style &, |, and ~. (No xor; C-style ^ conflicts with exponentiation.)

Precedence is C-like but not quite the same.
C precedence is: unary (including ~), multiply, add, shift, greater/less, equality, binary-and, binary-or, logical-and, logical-or, ternary
Precedence here moves binary-and and binary-or above comparisons: unary (including ~), multiply, add, shift, binary-and, binary-or, greater/less, equality, logical-and, logical-or, ternary
This avoids the C strangeness where x & 1 == 0 tests whether 1 is equal to 0, and then ands the result with x. (I think that a very early C did not have logical-and and local-or, and used binary-and and binary-or for the purpose.)

Does not add a new integer type. Operates on 32-bit values stored as ordinary OpenSCAD numbers. (OpenSCAD numbers could store ~53-bit values but that seemed harder to explain.) The conversion is straightforward C conversion from double to unsigned 32-bit integer: fractions are dropped, values above 2^32 have the top parts chopped off, negative values are converted to 32-bit signed and then unsigned, and so small negative values turn into values near 2^32.

Mostly, you should use these operations only on integers 0 through 2^32-1. -(2^31) through -1 behave sensibly if you understand two's complement. It would be best not to rely on the fact that values above 2^32 are truncated; a future version might extend to 53-bit support.

Possibly missing: a way to turn a number into a hexadecimal string.

I'm not sure that this is a good idea, but it's not clearly a bad idea. I think there are better solutions for many of the problems that people would try to solve with bit operations, but bit operations may be more comfortable for programmers used to languages like C. The major "cost" is consuming the &, |, and ~ operators, and using them for any other purpose would confuse people.

@pca006132
Copy link
Member

maybe we should add some test cases for this?

@jordanbrown0
Copy link
Contributor Author

Yes, though if key people think that this should not be merged then there's no point writing tests for it.

src/core/lexer.l Outdated
@@ -262,6 +262,10 @@ use[ \t\r\n]*"<" { BEGIN(cond_use); LOCATION_COUNT_LINES(parserlloc, yytext); }

{UNICODE}+ { parser_error_pos -= strlen(yytext); return TOK_ERROR; }

0x{H}{1,8} {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: There's an ugly bug in OpenSCAD's lexer which allows variables to start with a number. Your patch might break code which relies on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this may become a tricky discussion, I'd suggest split that off as a separate PR for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It breaks anything that uses variables that would be legal hex constants. It does not break variables that don't match the pattern. Whether or not disallowing 0x0f as a variable is acceptable is probably the biggest technical question in this change.

#3345 mentions a couple of possible alternative styles: a function that accepts a hex string and returns a number, or a Python-style syntax like h'0f'.

Over in #3345 you will see that apparently @nophead relies on the identifiers-that-start-with-digits behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but I don't think any of them look like hex constants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still... allowing 0w123 and 0y123 as identifiers, but not allowing 0x123, would be odd. Except for the incompatibility, I'd be in favor of disallowing identifiers that start with digits. Being able to use real part numbers as names is nice, but if the part number is 4632-342 you can't use it, so maybe the starts-with-digit-otherwise-alphanumeric case is narrow enough to disallow.

Maybe we should think about some kind of pragma mechanism to enable/disable language features. Start out with #pragma hex-constants to enable the new feature, but eventually migrate to #pragma leading-digit-identifiers to enable the legacy behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will split hex constants off.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a per-feature pragma, I'd say just deprecate and use a monotonically increasing version pragma for backwards compatibility instead. That way, the language always moves forward, and it's much easier on us to manage. ..but that's a completely separate discussion :)

@@ -404,6 +407,34 @@ comparison
}
;

binaryor
: binaryand
| binaryor '|' binaryand
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider also adding and, or, and xor as keywords, like C++ did? That way we'd get xor back

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being primarily a C guy, and something of a C++ newbie, no, I didn't... and I can't say that it gives me warm fuzzy feelings to increase the number of reserved words.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: I didn't add them because I didn't even know that C++ did that. And, now that I know, I'm not sure it's a good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, it's a bit late to add keywords, especially if that makes them unusable as symbols.

@kintel
Copy link
Member

kintel commented Dec 3, 2023

I'm uncertain about the 32-bit thing.
Otoh, since this is implemented in JavaScript in almost the same way, perhaps it would be cleaner (and easier to explain) if we make our integer operator implementation exactly the same as JavaScript's?

JavaScript's version of integer ops is also very much battle tested and we can stea^H^H^H^Hget inspired by existing implementation details.

@jordanbrown0
Copy link
Contributor Author

I will compare against what JS does. Agree that ... adopting ... behavior is best.

Though I'd kind of prefer to avoid adopting C's precedence where binary operators are lower precedence than comparison; that seems like it leads only to bugs.

@jordanbrown0
Copy link
Contributor Author

Note that there are some design comments over in #3345, including comments on the 0x thing.

@kintel
Copy link
Member

kintel commented Dec 3, 2023

So many comments... but I'll read #3345

@jordanbrown0
Copy link
Contributor Author

Two gaps have occurred to me: hex-string to number, and number to hex-string.

One possible answer, that may be too clever, would be to use hex() for both. hex(number) would return a string, while hex(string) would return a number.

@kintel
Copy link
Member

kintel commented Dec 3, 2023

In terms of hex values, it could be interesting to let color() accept numeric hex values too
..and let hex() accept "#rrggbb" strings.

@kintel
Copy link
Member

kintel commented Dec 3, 2023

I will compare against what JS does. Agree that ... adopting ... behavior is best.

I believe JS does pretty much the same as you; convert to 32-bit. Just nice to have the same semantics to avoid surprises.

precedence is an orthogonal topic. I don't have any opinions on that yet :)

@jordanbrown0
Copy link
Contributor Author

WRT color(), you mean so that color(0xff0000) was equivalent to color([1,0,0]), full-red? And noting of course that it would be the same as color(16711680). Sure, I suppose, but I'm not sure what the use case is.

Unfortunately, you could not support the lesser-known #rgb form (one digit each), because there wouldn't be any way to distinguish 0xfff (white) from 0x000fff (full blue with a little green). You can only draw that distinction with strings.

@jordanbrown0
Copy link
Contributor Author

I investigated JS a little. They allow arbitrary-sized hex constants, that (like decimal) just start losing precision when they get over 13 digits. 0x100000000000000 plus one is unchanged. My now-backed-out hex constant support rejects constants over 8 digits.

Bitwise arithmetic is limited to the low 32 bits. 0xfffffffffffff & 0xf00000000 (13 Fs, 8 zeroes) is zero. That's consistent with what I did.

JS bitwise arithmetic is signed(!). 0xffffffff (2^32-1) is a positive number. 0xffffffff | 1 is -1. 1<<31 is a large negative number. I made bitwise arithmetic unsigned because nobody (or at least nobody sane) wants to to signed arithmetic with these operators. (In a low-level language like C where you might be using bitwise arithmetic to implement normal arithmetic, maybe... but not in a high-level language where you are using bitwise arithmetic to manipulate bit masks.) If we decided that consistency was critical I could match, but I'd have to hold my nose.

JS precedence is C-like; a&b==c is a & (b==c), with a numeric result.

Python has two types, int and float. int appears to have unlimited precision. int appears to be signed, in that ~0 is displayed as -1. int appears to convert to float at appropriate times - int divided by int is float, for instance. (Integer division is //.) Conversion to float has expected problems at 2^52; if you set x to 0x100000000000000 (14 zeroes) then x+1 is precise but x+1.0 is not. x+1-x is 1; x+1.0-x is 0. (pow(int,positiveInt) is int!)

Python precedence puts bitwise operators above comparison, so a&b==c is (a&b) == c, with a boolean result.

Python's behavior seems like the better one to emulate, though implementing unlimited-precision int does not seem worthwhile and I don't know what the implications would be of not implementing it.

@jordanbrown0
Copy link
Contributor Author

[ Transcribing comments from #3345. Let's keep comments on this PR here. ]


@rewolff said:
Jordan, Good work! Thanks!

I think there is atypo. You write 2^32-1 where you mean 2^31-1.

Hmm. Oh! ... OK. Now I understand.

Maybe just say: unsigned integers from 0 to 2^32-1 work as
expected. Integers outside that range or non-integer values are
currently truncated to unsigned 32 bit integers. This includes
negative numbers. If you know how 2s complement and the truncation
work, numbers outside 0 .. 2^32-1 could be used, but are unsupported.

For "future compatibility" I'd just reserve EVERYTHING outside
the first 32 bits. Do not predict 53 bits.

If it were me to extend this, I would just convert a double to a 64bit
longlong. in the range 2^53 .. 2^63 you're loosing bottom bits. You
cannot OR something with 2^61 and then test for the lower 8 bits (or
so)... Anyway, the official "API" is now 0..2^32-1, outside that
behaviour is currently documented but unsupported (=may change at any
time) and when it is extended, you get 0..2^53-1 and outside that
unsupported, again possibly with documeented behaviour.

Consider supporting "-1" as an "all bits set": a noop with "and".


@jordanbrown0 said:

I started out converting to uint64_t, but binary-not yielded bad results, because with small inputs it yielded numbers in the vicinity of 2^64 and on conversion back to double the low-order bits are lost.


@jordanbrown0 said:

It's been pointed out in IRC that the 0xHHHH syntax is a breaking change, because OpenSCAD allows identifiers that start with digits(?!?). Thus, e.g.,

0x0f = 14;

is legal today and would be broken by this change.

Whether it is a good thing that it's legal today is left as an exercise for the reader.


@nophead said:

It is good for me because it allows real names for parts that I model, which sometimes begin with numbers.


@jordanbrown0 said:

Alternative strategies might include a function that accepts a hex string and returns a number, or a Python-style x'foo' syntax like h'7f'.


@jordanbrown0 said:

Note that it's only breaking for identifiers that match the 0xHHHHHHHH pattern; identifiers that don't start with 0x would not be affected. Similarly, identifiers that have non-hex characters after the 0x would not be affected; 0x123z would continue to be a valid identifier.

@jordanbrown0
Copy link
Contributor Author

Consider supporting "-1" as an "all bits set": a noop with "and".

-1 does naturally equal 0xffffffff, 2^32-1. Within the bounds of 32-bit arithmetic, -1 & x == x.

@jordanbrown0 jordanbrown0 changed the title Bitwise operators, hexadecimal constants. Fixes #3345. Bitwise operators. Fixes #3345. Dec 5, 2023
@jordanbrown0
Copy link
Contributor Author

Wrote some tests, which exposed some bugs. Fixed those bugs.

@jordanbrown0
Copy link
Contributor Author

Note from writing tests: doing bitwise stuff without hex constants is a nuisance.

@jordanbrown0
Copy link
Contributor Author

The test failed on some systems, with differences like:

 ECHO: "Expect two warnings:"
-WARNING: undefined operation (number | bool) in file bitwise-operators.scad, line 36
-WARNING: undefined operation (bool | number) in file bitwise-operators.scad, line 38
+WARNING: undefined operation (number | bool) in file bitwise-operators.scad, line 61
+WARNING: undefined operation (bool | number) in file bitwise-operators.scad, line 65

The "-" lines are the expected lines; the "+" lines are the actual lines.

The actual lines are simply wrong; the expected lines are where the statements in question are located.

My guess is that there's a CRLF vs LF problem that is causing the lexer to count lines incorrectly when it's presented with a file with unexpected line breaks. The lexer should really silently handle those issues, but I bet that I ended up with CRLF lines and that the MSYS2 operations didn't clean them up during the push.

Will look at it more in a few hours.

@jordanbrown0
Copy link
Contributor Author

jordanbrown0 commented Dec 6, 2023

Indeed, looking at the lexer it sure looks like a CRLF file on a UNIX system will count both the CR and the LF as starting new lines, except that CR in comments (at the end of //, or anywhere inside /*...*/) won't be counted.

But I've stripped out the CRs from the test files, so (fingers crossed) the problem should go away for this PR.

@jordanbrown0
Copy link
Contributor Author

I filed the CRLF bug as #4882 .

@jordanbrown0
Copy link
Contributor Author

jordanbrown0 commented Dec 6, 2023

Open questions...

These first few are easy, I think, but should get concurrence.

  • Is the non-C-like (but Python-like) precedence OK? (My answer is "yes", that the C precedence is error-prone and not useful.)
  • Xor, either using punctuation, a text operator, or a function. (I'd prefer a function, if at all. I see xor as very much a secondary feature, not useful for the key use case of bitmaps.)
  • 32-bit, or more? And if more, 53-bit, 64-bit, or unlimited? 53-bit would be straightforward; the other two more difficult. (My answer: leave it at 32-bit but make it clear that that might change in the future.)
  • Signed, or not? Simplest case: is ~0 equal to 2^32-1, or is it equal to -1? (My answer: unsigned; signed does not seem useful for the use cases here.)
  • Right shift sign-extending, or not? (My answer: not, especially if these values are unsigned.)

That brings us to the not-easy questions: presenting values as hex. (Octal and binary are theoretically possible but probably have very small audiences.) There's an argument for leaving this for another PR, but in writing the test cases I found it very unnatural to talk about bitwise arithmetic in decimal.

There are three sub-cases:

  • Hex constants.
  • Hex-string-to-number.
  • Number-to-hex-string.

Details:

  • Hex constants
    • 0xHHHHHHHH. Incompatible change, but obvious.
    • Python-esque (but Python doesn't do it this way) h'........'. Are there other cases where we'd like that structure for other special strings?
    • Don't have hex constants; rely on string-to-number.
  • String-to-number
    • n = hex(s);
    • num(s [, base=b]) - solving a more general problem, but very verbose for use as a constant.
  • Number-to-string
    • s = hex(n); - it would be clever, perhaps too clever, to use the same function for both directions.
    • tohex(n)
    • str(n, base=16) - adding a named parameter to str(). Currently str() ignores names, and theoretically giving them semantics would be incompatible, but it seems unlikely that anybody uses this "feature". Would allow for additional numeric-formatting options, e.g. width=10, pad="0", et cetera and so lead to solving additional problems.
    • Object-style: n.toString(16). (Probably with named options as above.) It would be nice to move in this direction for namespace reasons, and doing one-off cases probably wouldn't be hard, but it's trickier to do in general.

Any other open questions?

@rewolff
Copy link
Contributor

rewolff commented Dec 7, 2023 via email

@jordanbrown0
Copy link
Contributor Author

jordanbrown0 commented Dec 7, 2023

But I'd like to add that ~0 and -1 are the same.

As described, they aren't the same. They're the same after you feed the -1 through at least one bitwise operator. But before that, ~0 is 2^32-1, and -1 is... -1.

The way that I would think of it is that the bitwise operators act on integer values 0..2^32-1. If you try to use a bitwise operator on a non-integer, or on a value outside that range (like -1), it gets coerced into being an integer inside that range - for a non-integer by truncating the fractional part; for a value outside that range by stripping off the high bits in 2's-complement representation.

So there are really three behaviors there:

  • Non-integers. I won't promise that I'll never extend the bitwise operators into the fractional space. Can't think of why I would, but the purist in me likes it.
  • Numbers greater than 2^32-1. I specifically don't promise that they will always be stripped down to 32 bits.
  • Negative numbers. This is a tough question. Today, -1|0 is a positive number, as is ~0. Arguably, we should reserve the right for one or them to both be negative, some day. We should probably guarantee that 0x80000000 will never be negative, but if we go above 32 bits maybe we'll go signed. I'd say we should be very clear in documentation that the future is uncertain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

None yet

5 participants