Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: building on windows #12

Merged
merged 31 commits into from Dec 2, 2021
Merged

fix: building on windows #12

merged 31 commits into from Dec 2, 2021

Conversation

kboroszko
Copy link
Collaborator

@kboroszko kboroszko commented Nov 26, 2021

This PR fixes building on Windows by:

  • adding serialization methods on windows, because we can't use xdr.
  • changing OPTIONAL to OPIONAL in field_behavior.proto. For more info look here.

This change is Reviewable

Copy link

@dopiera dopiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @dopiera and @kboroszko)


bld.sh, line 1 at r1 (raw file):

#!/usr/bin/env bash

I'm guessing this file got here by accident, right?


WORKSPACE, line 112 at r1 (raw file):

https://github.com/protocolbuffers/protobuf/issues/7076

I think we need more explanation that that.

How about this:
Because of a bug in protocol buffers (protocolbuffers/protobuf#7076), new versions of this project fail to compile on Windows. The problem hinges on OPTIONAL being defined as an empty string under Windows. This makes the preprocessor remove every mention of OPTIONAL from the code, which causes compilation failures. This temporary workaround renames the name of the protobuf value OPTIONAL to OPIONAL. This should be safe as it does not affect the generated protobufs.


WORKSPACE, line 118 at r1 (raw file):

-i.bak

what do we need those *.baks for?


tensorflow_io/core/kernels/bigtable/serialization.h, line 28 at r1 (raw file):

values as byte buffers

I think it should be "byte buffers as values".


tensorflow_io/core/kernels/bigtable/serialization.h, line 32 at r1 (raw file):

 XDR seems to match what HBase does.

I think we need to elaborate on why we need two implementations.

I think we should say the following:

HBase stores integers as big-endian and floats as IEEE754 (also big-endian). Given that integer endianness does not always match float endianness, and the fact that there are architectures where it is neither little nor big (BE-32), implementing this properly is non-trivial. Ideally, we would use a library to do that. XDR matches what HBase does, but it is not easily available on Windows, so we decided to go with a hybrid approach. On Windows we assume that integer endianness matches float endianness and implement the deserialization ourselves and everywhere else we use XDR. For that reason we provide two implementations

Copy link
Collaborator Author

@kboroszko kboroszko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @dopiera)


WORKSPACE, line 112 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…
https://github.com/protocolbuffers/protobuf/issues/7076

I think we need more explanation that that.

How about this:
Because of a bug in protocol buffers (protocolbuffers/protobuf#7076), new versions of this project fail to compile on Windows. The problem hinges on OPTIONAL being defined as an empty string under Windows. This makes the preprocessor remove every mention of OPTIONAL from the code, which causes compilation failures. This temporary workaround renames the name of the protobuf value OPTIONAL to OPIONAL. This should be safe as it does not affect the generated protobufs.

Done.


WORKSPACE, line 118 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…
-i.bak

what do we need those *.baks for?

Cargo cult.


tensorflow_io/core/kernels/bigtable/serialization.h, line 28 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…
values as byte buffers

I think it should be "byte buffers as values".

Actually, I don't get why. Each cell has a value, and this value is represented as bytes, no?


tensorflow_io/core/kernels/bigtable/serialization.h, line 32 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…
 XDR seems to match what HBase does.

I think we need to elaborate on why we need two implementations.

I think we should say the following:

HBase stores integers as big-endian and floats as IEEE754 (also big-endian). Given that integer endianness does not always match float endianness, and the fact that there are architectures where it is neither little nor big (BE-32), implementing this properly is non-trivial. Ideally, we would use a library to do that. XDR matches what HBase does, but it is not easily available on Windows, so we decided to go with a hybrid approach. On Windows we assume that integer endianness matches float endianness and implement the deserialization ourselves and everywhere else we use XDR. For that reason we provide two implementations

Done.


bld.sh, line 1 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

I'm guessing this file got here by accident, right?

yup

dopiera
dopiera previously approved these changes Nov 29, 2021
Copy link

@dopiera dopiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @dopiera)

Copy link
Collaborator Author

@kboroszko kboroszko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @dopiera)


WORKSPACE, line 118 at r1 (raw file):

Previously, kboroszko (Kajetan Boroszko) wrote…

Cargo cult.

It turns out that we have this because without it the builiding on MacOS mysteriously fails. It's some problem with sed on OSX, but i didn't spend much time debugging it. All I know is it breaks, when I remove the .bak. Other sed's in this file use this option as well, I assume for the same reason. I don't think it's worth investigating.

Copy link

@dopiera dopiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @dopiera and @kboroszko)


tensorflow_io/core/kernels/bigtable/serialization.h, line 28 at r1 (raw file):

Previously, kboroszko (Kajetan Boroszko) wrote…

Actually, I don't get why. Each cell has a value, and this value is represented as bytes, no?

I think it's a tautology to say that data is stored as byte buffers - I don't think there's an alternative. After all, ints, floats, protobufs or images are all stored as byte buffers.

I think that when you say "byte buffers as values", you indicate that the only type of the value is a byte buffer, which is what we want to indicate here.

Either way, I think the reader will get what we mean, so fix it only if you like.


tests/test_bigtable/test_serialization.py, line 40 at r8 (raw file):

        )
    ):
        test_case.assertEqual(values[i].numpy(), r.numpy()[0])

For floats and doubles you want assertAlmostEqual.

Also, why isn't this a method in BigtableReadTest?

Copy link
Collaborator Author

@kboroszko kboroszko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @dopiera)


tests/test_bigtable/test_serialization.py, line 40 at r8 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

For floats and doubles you want assertAlmostEqual.

Also, why isn't this a method in BigtableReadTest?

Done

Copy link

@dopiera dopiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on @dopiera)

@kboroszko kboroszko merged commit 391ce6a into master Dec 2, 2021
@kboroszko kboroszko deleted the kb/win_fix branch December 2, 2021 19:45
dopiera pushed a commit that referenced this pull request Dec 3, 2021
dopiera pushed a commit that referenced this pull request Dec 3, 2021
kboroszko added a commit that referenced this pull request Dec 13, 2021
kboroszko added a commit that referenced this pull request Dec 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants