Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure the Transaction Manager node name is less than or equal to 28 bytes #40613

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

marcosgopen
Copy link
Contributor

@marcosgopen marcosgopen commented May 13, 2024

We need to include the node name in the fixed size Xid data structure(limited to 28 bytes) so that the TM can determine which XA resource branches it is responsible for.

related to PR: #36752

please note: this is fixing the issue truncating the byte array which means that it impacts the collision avoidance. This is why I added a stress test to 'test' the collision. I think that the risk of collision is still very low but a preferable solution may be to have a byte array type 'nodeName' so that the collision resistance exclusively relies on the hash algorithm.

@quarkus-bot quarkus-bot bot added the area/narayana Transactions / Narayana label May 13, 2024
@marcosgopen
Copy link
Contributor Author

FYI @turing85 and @mmusgrov

@mmusgrov
Copy link
Contributor

mmusgrov commented May 14, 2024

I added a comment to the issue indicating that the spring boot port should just avoid using the UTF-8 encoding and use ISO-LATIN-1 instead. This would mean the original code submitted by @turing85 would stand.

Copy link
Contributor

@mmusgrov mmusgrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am declining this PR because the node name is less than or equal to 28 bytes when a fixed-length character encoding is used to convert it to a byte array.

@marcosgopen
Copy link
Contributor Author

marcosgopen commented May 14, 2024

Closed as not a bug

@quarkus-bot quarkus-bot bot added the triage/invalid This doesn't seem right label May 14, 2024
@geoand geoand reopened this May 16, 2024
@quarkus-bot quarkus-bot bot removed the triage/invalid This doesn't seem right label May 16, 2024
@geoand
Copy link
Contributor

geoand commented May 16, 2024

This is going to need a better title

@marcosgopen marcosgopen changed the title Node name Ensure the Transaction Manager node name is less than or equal to 28 bytes May 16, 2024
@quarkus-bot

This comment has been minimized.

@geoand geoand requested review from mmusgrov and yrodiere and removed request for mmusgrov May 16, 2024 11:07
Copy link
Member

@yrodiere yrodiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though I'll wait for @mmusgrov's approval before actually approving -- he knows the implications in Narayana, I really don't.

MessageDigest messageDigest224 = MessageDigest.getInstance(HASH_ALGORITHM_FOR_SHORTENING);
transactions.nodeName = new String(messageDigest224.digest(nodeNameAsBytes), StandardCharsets.UTF_8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that your problem was with the length of the resulting string, but it seems to me there was an equally important problem: given the content of the hash is arbitrary, this particular line could throw exceptions due to malformed input... not all byte arrays are valid in UTF-8. See java.lang.String#decodeUTF8_UTF16 for example. So it seems to me base64 encoding is necessary regardless of length issues.

Even without that, you could (and likely would) end up with non-printable characters in your node name... Though from what I read elsewhere it seems the node name gets re-hashed afterwards, so I suppose Narayana takes care of making it human-readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yrodiere for your analysis. I would like to clarify some points (I hope we all agree about them):

  • only Strings longer then 28 bytes are shorten by the following process.
  • the original string has an arbitrary length ( >28 bytes), then it is hashed and it's output is 28 bytes (SHA-224). This is the only hash used in this process.
  • Base64 encoding is done after the byte array is hashed.
  • the base64 encoded byte array has an arbitrary length and so it is truncate so that it is a 28 bytes array
  • Using the Base64 encoding for the byte array the resulting Strings are printable, are utf-8 valid and can be easily encoded to UTF-8 without changing the length

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this also be documented in the JavaDoc of

?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw. I heard about different numbers that are actually known limits. 28 for Narayana itself and 23 for usage under Kubernetes. Would it make sense to make the actual algorithm more configurable. E.g. a number property with valid range from 1 to 28 and 28 as default. Maybe even replace TransactionManagerConfiguration.shortenNodeNameIfNecessary and let the default of the new property be 0 to signal no shortening at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the JavaDoc, thanks @graben . Concerning having a parameter to configure the length of the string I think it might be risky because the hash collision would be directly impacted when the string length is changed. So we would need to make sure that the users know exactly about the impact of that configuration's change. However we need to remember that only string with byte array's length > 28 are shorten, other strings are not. So when the users have a strict limitation about the string length (e.g. for Kubernetes 23 as you said) they can pass 23 bytes string and that string will not be changed.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, doesn't the same argument belong to 28, too? If 28 is the allowed max, disallow greater values and force user to shrink the config? :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @graben. Yes, it does. But I would say that the collision avoidance with 28 bytes is still acceptable (see the stress test) IMO, but it probably would not with a random number of bytes between 1 to 27. So again I am OK to make that number configurable but I would recommend anyway to shorten the hash with the longest string allowed (which is 28).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, the question is if 28 is the maximum allowed by Narayana, why offer an optional property to shorten values that are obviously wrong at all? Shouldn't Quarkus disallow values larger than 28 and force users to shorten themselves? No potential problems with collisions anymore! I know it's a bit of a radical solution, but it feels odd that users can define invalid properties and the software should solve it with a workaround with potential drawbacks. Just my 2 cents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @graben for your comment here. I am not sure I am following you here. You first were talking about adding an extra parameter for the length of the node name, now you are talking about removing this 'Generate right-length node name' feature (the one discussed in the issue #30491 ). I might be wrong but this PR's goal is to patch the existing implementation, not to change it. Could I maybe ask you to discuss this in a different issue or on Zulip where it is easier to get the community's feedback about this topic?
Getting back to your comment I would like to underline that the default behaviour is the one that you described: in fact the 'shortenNodeNameIfNecessary' variable is set to false by default (see https://github.com/quarkusio/quarkus/blob/main/extensions/narayana-jta/runtime/src/main/java/io/quarkus/narayana/jta/runtime/TransactionManagerConfiguration.java#L34), the original reasons to implement this feature are discussed in the issue #30491 and zulip topic https://quarkusio.zulipchat.com/#narrow/stream/187030-users/topic/manipulate.20environment.20variables.20in.20configuration.20properties. We had a discussion about it also in narayana-users group https://groups.google.com/g/narayana-users/c/ttSff9HvXdA.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep it as is, it just came into my mind, while following the process involving from the very first beginning. ;-)

@marcosgopen marcosgopen force-pushed the node-name branch 2 times, most recently from 0a2ce53 to a595636 Compare May 17, 2024 11:58
Add unit test, stress test commented and updated the JavaDoc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/narayana Transactions / Narayana triage/flaky-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Patch from #30491 "Generate right-length node name" seems invalid
7 participants