Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve udp tunnel stability #406

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Improve udp tunnel stability #406

wants to merge 9 commits into from

Conversation

magnww
Copy link

@magnww magnww commented Feb 5, 2023

}
}

func (t *Tunnel) getSSHNoWait(ctx context.Context) ssh.Conn {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't getSSH equivalent when c != nil ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right.
Remove the wait may also cause high cpu usage, I will revert this change.

return u.Errorf("inbound-udpchan: %w", err)
u.Errorf("inbound-udpchan: %w", err)
// wait for a short while for the ssh connection
time.Sleep(time.Duration(100) * time.Millisecond)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of sleeping, can use channels to know when the ssh connection is ready?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for adding sleep before is that, when the connection is lost, getSSH() will return nil immediately, and the runOutbound() loop will take up a lot of memory and cause the process to be killed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh the loop spins? I haven’t seen that happen before though that sounds like a bug, where the real fix might be to block with a waitgroup

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After connection lost, activatingConnWait() cannot block, because t.activatingConn.DoneAll() is called.
Removing DoneAll() fixes the issue.

@@ -68,6 +68,10 @@ func New(c Config) *Tunnel {
return t
}

func (t *Tunnel) Close() {
t.activatingConn.DoneAll()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need a lock/unlock + nil check?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

activatingConn is a waitGroup, it is value type and thread safe, so no need for lock and nil checks

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yea that's right, its a custom wait group (which might be a bad idea itself - anyway)

it should block while disconnected/connecting and it should not-block while connected

DoneAll here would stop blocking, but Tunnel Done means closed?

how does this improve UDP? sorry, I think I'm missing something

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the original code, when the connection is lost, DoneAll in the BindSSH method will be called, which causes activatingConnWait() to not work and cannot block.

This leads to high resource usage of runOutbound in udpListener. Calling u.getUDPChan(ctx) will immediately gets nil instead of blocking.

Close is added to correctly release resources in server mode, because DoneAll was removed from BindSSH.

In client mode, there is no need to call DoneAll because the same tunnel is always used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants