Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subject interning #824

Merged
merged 1 commit into from Sep 20, 2021
Merged

Subject interning #824

merged 1 commit into from Sep 20, 2021

Conversation

moredure
Copy link
Contributor

@moredure moredure commented Sep 15, 2021

@derekcollison Please review. The price of this comparison near 2ns per operation but in case of plain subscriptions it reduces single allocation per operation and results in speedup. Looking forward for your comments. If somehow to know beforehand that subscription using wildcard then it's possible to make it no cost. So I suggest to merge this pull request first, than make some changes to subscription struct to contain isWildcard bool field and then switch to using it. so no need to compare strings just single if isWildcard and nothings else.

@coveralls
Copy link

coveralls commented Sep 15, 2021

Coverage Status

Coverage increased (+0.02%) to 87.248% when pulling 12db3f2 on domaincrawler:main into 1714547 on nats-io:main.

@moredure moredure changed the title Update nats.go Subject interning Sep 15, 2021
nats.go Outdated
@@ -2692,7 +2692,10 @@ func (nc *Conn) processMsg(data []byte) {
}

// Copy them into string
subj := string(nc.ps.ma.subject)
subj := sub.Subject
if subj != string(nc.ps.ma.subject) {
Copy link
Contributor Author

@moredure moredure Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no allocations here in case of plain subject without wildcard used for subscription

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you do a subscription those subjects can contain wildcards, and we have utilities to test for that. When a NATS application receives a message they are always on literal subjects (no wildcards).

Copy link
Contributor Author

@moredure moredure Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point me where can I find them? Somewhere in the nats-server? Then it's possible to extend Subscription struct with wildcard bool or smth like that and change this subj != string(nc.ps.ma.subject) to this sub.wildcard and make it O(1) in all scenarios instead of O(n) in worst case scenario.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for wildcard, are 2 copies made here? since we call string(nc.ps.ma.subject) twice? So yes, maybe evaluating if the subscription is wildcarded when created (so only once) and use that here would be better.

Copy link
Contributor Author

@moredure moredure Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kozlovic no, because of Go compiler optimisation take place here, just for reference: bytes.Equal works same way if you check internals

// Neither cmd/compile nor gccgo allocates for these string conversions.
return string(a) == string(b)

@derekcollison
Copy link
Member

Thanks, will have a few others take a look.

@derekcollison
Copy link
Member

I believe if you only store and retrieve from the map when a message is received that would solve the problem.

@moredure
Copy link
Contributor Author

moredure commented Sep 15, 2021

in case of high amount of different subjects (matching wildcard) we can overflow map and it will grow large, may be inappropriate, it needs to be sometimes cleaned and requires more logic, also to check for isWildcard looks much simpler because of we know beforehand type of subscription subject plain or wildcard and it cannot change. Later we can optimise interning for case of isWildcard (store and sometimes free the map of subjects (by len(map) == opts.MaxMapLen) or pooled maps so they will be cleaned by gc sometimes, etc)

@derekcollison
Copy link
Member

I was assuming you would use an LRU of some sort etc..

@moredure
Copy link
Contributor Author

moredure commented Sep 15, 2021

type Suscription struct {
  // ...
  // Whether wildcard used in Subject
  wildcard bool
 // ...
}

to struct
utility to check for wildcardness

// wildcard will check a subject name for wildcardness.
func wildcard(subj string) bool {
	return strings.ContainsAny(subj, "*>")
}

it will be used

sub := &Subscription{Subject: subj, Queue: queue, mcb: cb, conn: nc, jsi: js, wildcard: wildcard(subj)}

and
the change

subj := sub.Subject
if sub.wildcard {
      subj = string(nc.ps.ma.subject)
}

so it will be much better than comparing string for each message processed.

@derekcollison
Copy link
Member

Each message you receive has a non-wildcard subject. I think a low contention LRU would be fine here when receiving a message.

Note that with JetStream the subject will match the origin subject not the low level one, so even if you did a non-wildcard subscription you still need to check the subject.

@kozlovic
Copy link
Member

Note that with JetStream the subject will match the origin subject not the low level one, so even if you did a non-wildcard subscription you still need to check the subject.

Ah right, with JetSteam the incoming message subject will likely NEVER match the subscription's subject, even if non-wildcard'ed. At this point (adding a map, etc..) I am really not sure if that will yield any benefit. We may trade one mem copy with CPU time that will make things worse. But if it can be evaluated that it does improve (for wildcard, non wildcard and JetStream), why not.

@moredure
Copy link
Contributor Author

moredure commented Sep 15, 2021

I am not known with the way JetStream works yet, I'll check it with JetSteam the incoming message subject will likely NEVER match the subscription's subject.

@moredure
Copy link
Contributor Author

moredure commented Sep 15, 2021

@kozlovic it's not match subject passed to js.SubjectSync, etc but it may match internal created Subscription.Subject or not match depending on wildcard it is or not.

@kozlovic
Copy link
Member

it's not much subject passed to js.Subject*() but it may match internal created Subscription.Subject or not match depending on wildcard it is or not.

No. The subject passed to js.Subscribe() is not the one going to be used by the internal subscription created in js.Subscribe() call. This will likely be a non-wc, but the point is that it could be a subscription on say "_INBOX.1", but the server will send to this subscription a message on subject "foo". This is because the server knows the subscription ID, but rewrites the subject. So you can't rely on the fact that the incoming message's subject will be the subscription's subject (even with no WC), at least for JetStream.

@moredure
Copy link
Contributor Author

moredure commented Sep 15, 2021

Looks like I had a typo in previous build) yes you are right but delivery passed to subscribeLocked so wildcard function detect it anyway.
single if costs less than a nanosecond for wildcard and the performance benefit pretty good for plain subscriptions where Subject match received message subject since string allocation costs more that if. If you have any concerns about code style please highlight it. I am also not like the view of , wildcard: wildcard(subj)}, it makes me cry with blood tears actually)

@moredure
Copy link
Contributor Author

moredure commented Sep 15, 2021

Maybe it's time to start doing smth like this? instead of single line, looking forward for your suggestions.

sub := &Subscription{
	Subject:  subj,
	Queue:    queue,
	mcb:      cb,
	conn:     nc,
	jsi:      js,
	wildcard: wildcard(subj),
}

@kozlovic
Copy link
Member

But again, for JetStream, the subscription may be done on "_INBOX.1" but message may arrive on "foo". This has nothing to do with wildcards. At the very least, you should do this change only for sub.jsi == nil and leave current code for JS.

@moredure
Copy link
Contributor Author

moredure commented Sep 15, 2021

Why is that happening, is JetStream some sort of broker that's why messages could be messed up?

@kozlovic
Copy link
Member

Why is that happening, is JetStream some sort of broker that's why messages could be messed up?

JetStream is the persistence layer added to core NATS. It rewrites subjects. So you can't really write a test for this outside of using JetStream, unless you write your own "hand crafted" server, which we have in some tests...

@moredure
Copy link
Contributor Author

@kozlovic sounds reasonable, added check for jsi != nil || wildcard

nats.go Outdated Show resolved Hide resolved
nats.go Outdated Show resolved Hide resolved
nats.go Outdated Show resolved Hide resolved
Copy link
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@wallyqs wallyqs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's squash into single commit before merging

@moredure
Copy link
Contributor Author

utilities @derekcollison mentioned inside nats-server

// Determine if a subject has any wildcard tokens.
func subjectHasWildcard(subject string) bool {
	return !subjectIsLiteral(subject)
}

// Determine if the subject has any wildcards. Fast version, does not check for
// valid subject. Used in caching layer.
func subjectIsLiteral(subject string) bool {
	for i, c := range subject {
		if c == pwc || c == fwc {
			if (i == 0 || subject[i-1] == btsep) &&
				(i+1 == len(subject) || subject[i+1] == btsep) {
				return false
			}
		}
	}
	return true
}

But they are unexported.

@kozlovic
Copy link
Member

We could port, or tokenize the subject, which makes the code simpler (the one in server is written for performance, not readability). But even your current approach is ok to me since the worse case is that a string does not really have real wildcard, just * or > as part of a token, and in that case your change would make a copy of nc.ps.ma.subject (which is the original, safe, code).

@kozlovic kozlovic merged commit 3dc8947 into nats-io:main Sep 20, 2021
kozlovic added a commit that referenced this pull request Sep 21, 2021
The recent PR #824 tried to optimize reducing memory copy
by using the non JS, non wildcard subscription's subject
as the message subject.

However, in case where a NATS subscription is used for
the delivery subject of a AddConsumer call, then this
breaks and the message would be received with the subscription
subject instead of the subject that we get from MSG parsing.

This PR basically reverts changes from #824 and add a subject
to make sure that we catch such issue in the future.

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
@kozlovic
Copy link
Member

@moredure For your information: we had to release version v1.12.3 that reverts changes from your PR. There was unfortunately issues with non JS subscriptions used to receive messages from a JetStream consumer. In that case, message's subject was rewritten but we would use the subscription's subject (say the inbox, instead of "foo"). I added a test that will now make sure that any change to try to optimize things still don't break this use case.
So we still like the idea to try reduce memory copies, but this would need more tweaking.

@moredure
Copy link
Contributor Author

moredure commented Sep 21, 2021

Aha, see, I guess I needed to make this change before jetmagic happens) ok, I will investigate this thing more deeply later since this change fits my use case and not producing bugs with invalid subscription.
@kozlovic as I understand plain subscription without wildcard can receive message with absolutely different subject name by jetstream producer, then the only way to make subject interning? by utilising initial version of this change?

subj := sub.Subject
if subj != string(nc.subject) {
  subj = string(nc.subject)
}

@kozlovic
Copy link
Member

as I understand plain subscription without wildcard can receive message with absolutely different subject name?

Yes, at least in the context of JetStream (for now).

@moredure
Copy link
Contributor Author

moredure commented Sep 21, 2021

@kozlovic I don't know, to make another pull request with such a behaviour or is it better to postpone until better times?)

@kozlovic
Copy link
Member

Would have to profile because it seems to me that doing what you suggest may still result in a copy (simply for the comparison, or maybe compiler optimize this, or we use bytes.Equal(), etc..). If you want and can provide some bench with various subscription types (wildcard, non wildcard, and various subjects length) and see if performance actually increases, etc.. this would help. But I understand that all this takes time and you may not have the time since your modification works well for your use case. Thanks again for the contribution, regardless.

@moredure
Copy link
Contributor Author

moredure commented Sep 21, 2021

No it's not, same thing with nc.filters[string(nc.ps.ma.subject)] not causing any allocation. I left the new pull request, I will come with some benchmarks later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants