fix: building on windows #12

kboroszko · 2021-11-26T13:36:12Z

This PR fixes building on Windows by:

adding serialization methods for Windows, because we can't use XDR there.
changing OPTIONAL to OPIONAL in field_behavior.proto. For more info look here.

dopiera

Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @dopiera and @kboroszko)

bld.sh, line 1 at r1 (raw file):

#!/usr/bin/env bash

I'm guessing this file got here by accident, right?

WORKSPACE, line 112 at r1 (raw file):

https://github.com/protocolbuffers/protobuf/issues/7076

I think we need more explanation than that.

How about this:
Because of a bug in protocol buffers (protocolbuffers/protobuf#7076), new versions of this project fail to compile on Windows. The problem hinges on OPTIONAL being defined as an empty string under Windows. This makes the preprocessor remove every mention of OPTIONAL from the code, which causes compilation failures. This temporary workaround renames the protobuf value OPTIONAL to OPIONAL. This should be safe as it does not affect the generated protobufs.
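
For illustration only, the failure mode and the rename can be mimicked with a toy Python simulation. This is a sketch under assumptions: the helper names are hypothetical, the `generated` line is an invented stand-in rather than real generated code, and the regular expression only imitates how a preprocessor deletes a token that is defined as empty.

```python
import re


def expand_empty_macro(source: str, macro: str) -> str:
    """Toy stand-in for the preprocessor: delete every whole-word use of `macro`."""
    return re.sub(rf"\b{re.escape(macro)}\b", "", source)


def rename_identifier(source: str, old: str, new: str) -> str:
    """Rename whole-word occurrences of `old` to `new` (the workaround)."""
    return re.sub(rf"\b{re.escape(old)}\b", new, source)


# Invented example line, not copied from real generated code.
generated = "enum FieldBehavior { OPTIONAL = 1, REQUIRED = 2 };"

# With OPTIONAL defined as empty, the enumerator name vanishes.
broken = expand_empty_macro(generated, "OPTIONAL")

# After the rename there is nothing left for the macro to erase.
patched = expand_empty_macro(
    rename_identifier(generated, "OPTIONAL", "OPIONAL"), "OPTIONAL"
)
```

In this toy model `broken` no longer contains an enumerator name before `= 1`, which is the kind of text a compiler would reject, while `patched` keeps a valid identifier.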

WORKSPACE, line 118 at r1 (raw file):

-i.bak

what do we need those *.baks for?

tensorflow_io/core/kernels/bigtable/serialization.h, line 28 at r1 (raw file):

values as byte buffers

I think it should be "byte buffers as values".

tensorflow_io/core/kernels/bigtable/serialization.h, line 32 at r1 (raw file):

 XDR seems to match what HBase does.

I think we need to elaborate on why we need two implementations.

I think we should say the following:

HBase stores integers as big-endian and floats as IEEE754 (also big-endian). Given that integer endianness does not always match float endianness, and the fact that there are architectures where it is neither little nor big (BE-32), implementing this properly is non-trivial. Ideally, we would use a library to do that. XDR matches what HBase does, but it is not easily available on Windows, so we decided to go with a hybrid approach. On Windows we assume that integer endianness matches float endianness and implement the deserialization ourselves, and everywhere else we use XDR. For that reason we provide two implementations.
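
To make the byte layout concrete, here is a minimal Python sketch of big-endian integers and big-endian IEEE 754 floats using the standard `struct` module. The helper names are hypothetical, and the widths (64-bit integers, 32-bit floats, 64-bit doubles) are assumptions chosen for illustration; this is not the C++ code from the PR.

```python
import struct

# Hypothetical helpers that only illustrate the byte layout.
# ">" selects big-endian byte order regardless of the host platform.


def int64_to_bytes(value: int) -> bytes:
    """Encode a signed 64-bit integer as 8 big-endian bytes."""
    return struct.pack(">q", value)


def bytes_to_int64(data: bytes) -> int:
    """Decode 8 big-endian bytes into a signed 64-bit integer."""
    return struct.unpack(">q", data)[0]


def float_to_bytes(value: float) -> bytes:
    """Encode a 32-bit IEEE 754 float as 4 big-endian bytes."""
    return struct.pack(">f", value)


def bytes_to_float(data: bytes) -> float:
    """Decode 4 big-endian bytes into a 32-bit IEEE 754 float."""
    return struct.unpack(">f", data)[0]


def double_to_bytes(value: float) -> bytes:
    """Encode a 64-bit IEEE 754 double as 8 big-endian bytes."""
    return struct.pack(">d", value)


def bytes_to_double(data: bytes) -> float:
    """Decode 8 big-endian bytes into a 64-bit IEEE 754 double."""
    return struct.unpack(">d", data)[0]
```

Because `struct` applies the requested byte order explicitly, the same bytes come out on any host, which is the property the two C++ implementations need to share.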

kboroszko

Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on @dopiera)

WORKSPACE, line 112 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

https://github.com/protocolbuffers/protobuf/issues/7076
I think we need more explanation than that.

How about this:
Because of a bug in protocol buffers (protocolbuffers/protobuf#7076), new versions of this project fail to compile on Windows. The problem hinges on OPTIONAL being defined as an empty string under Windows. This makes the preprocessor remove every mention of OPTIONAL from the code, which causes compilation failures. This temporary workaround renames the protobuf value OPTIONAL to OPIONAL. This should be safe as it does not affect the generated protobufs.

Done.

WORKSPACE, line 118 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

-i.bak
what do we need those *.baks for?

Cargo cult.

tensorflow_io/core/kernels/bigtable/serialization.h, line 28 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

values as byte buffers
I think it should be "byte buffers as values".

Actually, I don't get why. Each cell has a value, and this value is represented as bytes, no?

tensorflow_io/core/kernels/bigtable/serialization.h, line 32 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

 XDR seems to match what HBase does.
I think we need to elaborate on why we need two implementations.

I think we should say the following:

HBase stores integers as big-endian and floats as IEEE754 (also big-endian). Given that integer endianness does not always match float endianness, and the fact that there are architectures where it is neither little nor big (BE-32), implementing this properly is non-trivial. Ideally, we would use a library to do that. XDR matches what HBase does, but it is not easily available on Windows, so we decided to go with a hybrid approach. On Windows we assume that integer endianness matches float endianness and implement the deserialization ourselves, and everywhere else we use XDR. For that reason we provide two implementations.

Done.

bld.sh, line 1 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

I'm guessing this file got here by accident, right?

yup

dopiera

Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @dopiera)

kboroszko

Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @dopiera)

WORKSPACE, line 118 at r1 (raw file):

Previously, kboroszko (Kajetan Boroszko) wrote…

Cargo cult.

It turns out that we have this because without it building on macOS mysteriously fails. It's some problem with sed on macOS, but I didn't spend much time debugging it. All I know is that it breaks when I remove the .bak. Other sed invocations in this file use this option as well, I assume for the same reason. I don't think it's worth investigating.

dopiera

Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @dopiera and @kboroszko)

tensorflow_io/core/kernels/bigtable/serialization.h, line 28 at r1 (raw file):

Previously, kboroszko (Kajetan Boroszko) wrote…

Actually, I don't get why. Each cell has a value, and this value is represented as bytes, no?

I think it's a tautology to say that data is stored as byte buffers - I don't think there's an alternative. After all, ints, floats, protobufs or images are all stored as byte buffers.

I think that when you say "byte buffers as values", you indicate that the only type of the value is a byte buffer, which is what we want to indicate here.

Either way, I think the reader will get what we mean, so fix it only if you like.

tests/test_bigtable/test_serialization.py, line 40 at r8 (raw file):

        )
    ):
        test_case.assertEqual(values[i].numpy(), r.numpy()[0])

For floats and doubles you want assertAlmostEqual.

Also, why isn't this a method in BigtableReadTest?
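
A minimal sketch of why exact equality is fragile here, assuming a value passes through a 32-bit float representation on the way. The test class, helper, and values are hypothetical and are not taken from test_serialization.py.

```python
import struct
import unittest


def roundtrip_float32(value: float) -> float:
    """Pack a Python float into 4 big-endian bytes and read it back."""
    return struct.unpack(">f", struct.pack(">f", value))[0]


class FloatComparisonExample(unittest.TestCase):  # hypothetical test class
    def test_roundtrip(self):
        original = 0.1
        restored = roundtrip_float32(original)
        # 0.1 has no exact 32-bit representation, so the restored value
        # differs slightly from the 64-bit original.
        self.assertNotEqual(original, restored)
        # assertAlmostEqual rounds the difference to `places` decimal
        # places before comparing it with zero.
        self.assertAlmostEqual(original, restored, places=6)
```

Values such as 0.5 that are exactly representable in 32 bits would pass assertEqual as well, which is why exact comparisons can look fine until a less convenient value is tested.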

kboroszko

Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @dopiera)

tests/test_bigtable/test_serialization.py, line 40 at r8 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

For floats and doubles you want assertAlmostEqual.

Also, why isn't this a method in BigtableReadTest?

Done

dopiera

Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on @dopiera)

kboroszko added 5 commits November 26, 2021 11:07

serialization for windows

54db247

patched com_google_googleapis to not include OPTIONAL keyword

9fb5a99

linting

16b6197

linux line endings

d546f24

passing cell instead of value

b4c1f70

kboroszko force-pushed the kb/win_fix branch from b4c1f70 to d546f24 November 26, 2021 17:03

kboroszko added 8 commits November 26, 2021 19:08

using two implementations

4cd52d5

removed too much logging

548ea9e

use_xdr proper value

960d157

added tests for serialization

08692f0

different bool byte representation

2a9bb12

linting

078b218

refactored code to two subclasses

52c5679

refactored name of ReinterpretSerializer

2620357

kboroszko requested a review from dopiera November 29, 2021 16:04

wrong type

438a9a0

dopiera requested changes Nov 29, 2021

kboroszko added 5 commits November 29, 2021 18:21

removed switching serializers

92e8709

upadated tests

c2b83cc

PR comments

2bb9f92

linting

9b1489d

return status in bytesToBool

d15dbe4

kboroszko commented Nov 29, 2021

dopiera previously approved these changes Nov 29, 2021

_win32 flag

614bf88

kboroszko dismissed dopiera’s stale review via 614bf88 November 29, 2021 17:50

kboroszko added 4 commits November 30, 2021 09:07

static cast

d8148a7

merged master

64cecc2

added bak to sed in WROKSPACE

d602a90

static cast no std

c91d6e4

kboroszko added 6 commits November 30, 2021 10:24

different value

1daac68

removed status()

05d08af

changed bool representation

58be56e

bytes same for win and linux

549afad

linting

d5db234

check bool one byte

0617cb1

kboroszko commented Dec 1, 2021

dopiera requested changes Dec 2, 2021

list and refactor tests

6656892

kboroszko commented Dec 2, 2021

dopiera approved these changes Dec 2, 2021

kboroszko merged commit 391ce6a into master Dec 2, 2021

kboroszko deleted the kb/win_fix branch December 2, 2021 19:45

dopiera pushed a commit that referenced this pull request Dec 3, 2021

fix: building on windows (#12)

47162cd

dopiera pushed a commit that referenced this pull request Dec 3, 2021

fix: building on windows (#12)

c703514

kboroszko added a commit that referenced this pull request Dec 13, 2021

fix: building on windows (#12)

8ba023a

kboroszko added a commit that referenced this pull request Dec 20, 2021

fix: building on windows (#12)

fb09814
