
panic in rte_eth_tx_burst - how to manage thread safety? #725

Open
mikebromwich opened this issue Mar 9, 2021 · 1 comment
mikebromwich commented Mar 9, 2021

Hi,

I am using nff-go with the netvsc DPDK driver (for Hyper-V) with two ports. In order to respond to ARP and ICMP requests, I am using DealARPICMP.

It appears that if an ARP response (sent in handleARPICMPRequests via answerPacket.SendPacket) coincides with an outgoing packet being sent by the flow graph, the result is a panic (SIGSEGV) in rte_eth_tx_burst.

I've read in various articles (e.g. http://mails.dpdk.org/archives/dev/2014-January/001077.html) that rte_eth_tx_burst is not thread-safe when multiple threads use the same port and queue. The Intel documentation also says:

'If multiple threads are to use the same hardware queue on the same NIC port, then locking, or some other form of mutual exclusion, is necessary.'

How can I avoid this crash and coordinate the calls to rte_eth_tx_burst between nff_go_send and directSend?

I can synchronize the calls to directSend by using my own implementation of DealARPICMP - but seemingly can't avoid collisions with nff_go_send.
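For illustration, the only synchronization I can add on my side looks roughly like this (a fragment from my own main package; the names are mine, not part of nff-go). It serializes my replies against each other, but the framework's nff_go_send never takes this lock, so it doesn't close the race:

```go
package main

import (
	"sync"

	"github.com/intel-go/nff-go/packet"
)

// txLock serializes only my own ARP/ICMP replies against each other.
// nff_go_send inside the framework never takes this lock, so the race on
// rte_eth_tx_burst for the same port/queue is still possible.
var txLock sync.Mutex

func sendReply(answerPacket *packet.Packet, port uint16) {
	txLock.Lock()
	defer txLock.Unlock()
	answerPacket.SendPacket(port)
}
```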

Thanks,

Mike

Edited to add relevant stack trace:

[signal SIGSEGV: segmentation violation code=0x1 addr=0xc pc=0xa25660]

runtime stack:
runtime.throw(0xc19c64, 0x2a)
/usr/local/go/src/runtime/panic.go:1117 +0x72
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:718 +0x2e5

goroutine 37 [syscall, locked to thread]:
runtime.cgocall(0x863a30, 0xc000317928, 0xc000317938)
/usr/local/go/src/runtime/cgocall.go:154 +0x5b fp=0xc0003178f8 sp=0xc0003178c0 pc=0x4dfd9b
github.com/intel-go/nff-go/internal/low._Cfunc_directSend(0x12d0a9fc0, 0x12d0a0000, 0x0)
_cgo_gotypes.go:572 +0x45 fp=0xc000317928 sp=0xc0003178f8 pc=0x7e19a5
github.com/intel-go/nff-go/internal/low.DirectSend.func1(0x12d0a9fc0, 0x0, 0xc00031a170)
/home/mike/upf/nff-go/internal/low/low.go:95 +0x57 fp=0xc000317958 sp=0xc000317928 pc=0x7e4ed7
github.com/intel-go/nff-go/internal/low.DirectSend(0x12d0a9fc0, 0x9ed806524e5d0000, 0x6f4ea8c0dd6193f3)
/home/mike/upf/nff-go/internal/low/low.go:95 +0x35 fp=0xc000317980 sp=0xc000317958 pc=0x7e2b15
github.com/intel-go/nff-go/packet.(*Packet).SendPacket(...)
/home/mike/upf/nff-go/packet/packet.go:848
main.handleARP(0x1170b34ce, 0xc00021e108, 0x1e00a10)
/home/mike/upf/main.go:114 +0x237 fp=0xc0003179f8 sp=0xc000317980 pc=0x85d757
main.handleCorePacket(0x1170b3440, 0xc87c90, 0xc00021e108, 0x3c0000003f)
/home/mike/upf/main.go:194 +0x115 fp=0xc000317a20 sp=0xc0003179f8 pc=0x85dd75
github.com/intel-go/nff-go/flow.separate(0x1170b3440, 0xc000226310, 0xc87c90, 0xc00021e108, 0x3)
/home/mike/upf/nff-go/flow/flow.go:1796 +0x48 fp=0xc000317a50 sp=0xc000317a20 pc=0x7f1408
github.com/intel-go/nff-go/flow.segmentProcess(0xb7b720, 0xc0002045a0, 0xc000184140, 0x11, 0x11, 0xc0001a0120, 0xc0001a0180, 0xc0001a8600, 0xc000310000, 0x3, ...)
/home/mike/upf/nff-go/flow/flow.go:1466 +0x4d9 fp=0xc000317ef0 sp=0xc000317a50 pc=0x7f01f9
github.com/intel-go/nff-go/flow.(*instance).startNewClone.func1(0xc000228780, 0x5, 0xc00018e900)
/home/mike/upf/nff-go/flow/scheduler.go:289 +0x25e fp=0xc000317fc8 sp=0xc000317ef0 pc=0x7f77be
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc000317fd0 sp=0xc000317fc8 pc=0x5485e1
created by github.com/intel-go/nff-go/flow.(*instance).startNewClone
/home/mike/upf/nff-go/flow/scheduler.go:283 +0x2c5

mikebromwich (Author) commented:

I've temporarily worked around this by creating two Generator flow functions and connecting them to the ICMP and ARP processing via channels. However, this ties up two more cores and required duplicating the framework's ARP code, so I'd appreciate any better solutions anyone can suggest.
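For reference, the workaround looks roughly like this (a minimal sketch with illustrative names and an example port number, not my exact code, and assuming SetGenerator/GeneratePacketFromByte behave as in the nff-go tutorial examples). The ARP/ICMP handler queues raw reply frames on a channel, and a Generator flow function drains it, so every transmission goes through the flow graph's own send path instead of a second, unsynchronized caller of rte_eth_tx_burst:

```go
package main

import (
	"github.com/intel-go/nff-go/flow"
	"github.com/intel-go/nff-go/packet"
)

// Raw reply frames queued by the ARP/ICMP handler instead of calling
// answerPacket.SendPacket directly.
var replies = make(chan []byte, 256)

// Generator flow function: blocks until a reply has been queued, then copies
// it into the mbuf supplied by the framework, so the frame is transmitted by
// the flow graph rather than by a separate rte_eth_tx_burst call.
func replyGenerator(pkt *packet.Packet, ctx flow.UserContext) {
	data := <-replies
	packet.GeneratePacketFromByte(pkt, data)
}

func main() {
	config := flow.Config{}
	flow.CheckFatal(flow.SystemInit(&config))

	// One extra generator flow (and core) per port that needs to emit replies.
	replyFlow := flow.SetGenerator(replyGenerator, nil)
	flow.CheckFatal(flow.SetSender(replyFlow, 0)) // port 0 as an example

	flow.CheckFatal(flow.SystemStart())
}
```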

Thanks,

Mike
