SIP messages out-of-order in Call Flow #408

Open
jcabezas61 opened this issue Jul 16, 2022 · 6 comments

@jcabezas61

jcabezas61 commented Jul 16, 2022

Hi,
Very often (but not always) when I run sngrep on my OpenSIPS server, the SIP messages in the Call Flow window are shown badly out of order.
I'm not sure, but it seems that all the messages are displayed, just out of order.

sngrep version = 1.4.6
OS = Ubuntu 20.04.2 LTS

sngrep-out-of-order-blurred

Thanks,
Julio

@Kaian
Member

Kaian commented Jul 18, 2022

Hi Julio!

It looks like the time sort function is not working as expected, because there are negative time diffs in the left column.
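
To make the symptom concrete: a negative diff means a message with an earlier capture timestamp is drawn after one with a later timestamp. The following is not sngrep's code, only a minimal Python sketch with hypothetical names (`negative_deltas`, `sort_by_time`, and the `(timestamp, label)` tuples) showing how such pairs can be spotted and how sorting on the timestamp removes them.

```python
from typing import List, Tuple

# Hypothetical message shape: (capture timestamp in seconds, label).
Message = Tuple[float, str]


def negative_deltas(messages: List[Message]) -> List[Tuple[str, str, float]]:
    """Return (previous, current, delta) for every consecutive pair with delta < 0."""
    found = []
    for prev, cur in zip(messages, messages[1:]):
        delta = cur[0] - prev[0]
        if delta < 0:
            found.append((prev[1], cur[1], delta))
    return found


def sort_by_time(messages: List[Message]) -> List[Message]:
    """Order messages by capture timestamp (sorted() is stable for equal timestamps)."""
    return sorted(messages, key=lambda m: m[0])


# Illustrative data only, not taken from the reported capture.
flow = [
    (10.0, "INVITE"),
    (10.5, "180 Ringing"),
    (10.25, "100 Trying"),
    (12.0, "200 OK"),
]

print(negative_deltas(flow))
print(negative_deltas(sort_by_time(flow)))
```

If the displayed order still shows negative diffs after a sort like this, the timestamps themselves (or the point at which messages are attached to the call) would be the next thing to look at.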

Can this be reproduced with an offline pcap file? Could you send me one so I can debug the issue?

Thanks!

@Kaian Kaian self-assigned this Jul 18, 2022
@Kaian Kaian added the bug label Jul 18, 2022
@jcabezas61
Author

Hi,

Here is a "bad call" as seen on screen and in the exported pcap.
For the export I selected only the bad call, but the pcap includes many other SIP messages (that were flowing through the server during the capture).
You can select the relevant call messages in Wireshark using a filter like "sip.Call-ID~MWQ2".

I noticed a few things about the problem while using sngrep:

1- In my experience, the same sngrep installation on the same server works fine (message order correct) for some minutes/hours during a day, then starts to scramble things for some more minutes/hours, and then works fine again. It forms a succession of cycles of well- and ill-functioning.

2- I have not yet understood the duration of those cycles, or what triggers the change from well- to ill-functioning and vice versa.

3- Besides the scrambled order of the messages in the displayed call flow, during the capture some messages frequently take a few random seconds to appear in the flow, and some appear after later messages have already been rendered on screen.

4- To produce the .pcap I select just the one call that I want to export. It seems that a "problematic call" is exported with several other messages that do not belong to the selected call. A "good call" export, on the other hand, contains strictly the messages that are part of the call and nothing else.

Thanks
out-of-order_19-07-22

Link to pcap: https://www.dropbox.com/s/ytyewxwm5rs4yoy/out-of-order_19-07-22.pcap?dl=0

@jcabezas61
Author

hi,
Any news on this issue?
BR

@Kaian
Member

Kaian commented Aug 10, 2022

Hi!

Sorry, I've been on holidays these weeks.

I've tested the attached pcap and the message order seems OK in both sngrep 1.4.6 and 1.5.0.
Although the order is OK, the flow shows lots of messages that are probably packet retransmissions.

sngrep does not support TCP retransmissions (#102): retransmitted packets are handled like normal packets, so flows may end up with a lot of duplicated arrows.
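
As a rough illustration of what recognising retransmissions could look like (this is not sngrep's code, and real TCP analysis such as Wireshark's is considerably more involved), the sketch below remembers the sequence number and payload length already seen in each direction and skips segments that repeat them. The `Segment` type and `drop_retransmissions` function are hypothetical names introduced only for this example.

```python
from typing import Iterable, List, NamedTuple, Set


class Segment(NamedTuple):
    """Hypothetical, simplified view of one captured TCP segment."""
    src: str      # "ip:port" of the sender
    dst: str      # "ip:port" of the receiver
    seq: int      # TCP sequence number of the first payload byte
    length: int   # payload length in bytes


def drop_retransmissions(segments: Iterable[Segment]) -> List[Segment]:
    """Keep the first copy of each payload-carrying segment per direction."""
    seen: Set[Segment] = set()
    unique: List[Segment] = []
    for seg in segments:
        # Pure ACKs (length 0) legitimately repeat a sequence number, so only
        # payload-carrying segments are treated as possible retransmissions.
        if seg.length > 0 and seg in seen:
            continue
        seen.add(seg)
        unique.append(seg)
    return unique


# Illustrative data only: the second segment repeats the first one.
captured = [
    Segment("10.0.0.1:5060", "10.0.0.2:40000", 1000, 500),
    Segment("10.0.0.1:5060", "10.0.0.2:40000", 1000, 500),
    Segment("10.0.0.1:5060", "10.0.0.2:40000", 1500, 300),
]

print(len(drop_retransmissions(captured)))
```

This simplification assumes a retransmission carries exactly the same byte range as the original; segments that are re-split on retransmission would not be caught.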

image

image

Maybe the problem is totally related to TCP dialogs?

Regards

@jcabezas61
Author

jcabezas61 commented Aug 12, 2022

Hi,

You ask "Maybe the problem is totally related to TCP dialogs?" and I don't know what to say, but the fact is that sometimes, for some periods (see below), sngrep handles the TCP-based dialogs well. By the way, all my important SIP traffic is TCP.

Let's make a fresh assessment of the problem as we know it today:

There are time intervals (periods that can last for minutes or more) when all successive sngrep captures seem flawless

  • all message-flows appear correctly ordered with no missing messages
  • if you save (F2) any single selected call and open the resulting .pcap in Wireshark or sngrep, you get back the original selected flow, wonderful!
  • this recovered flow has all the original messages and no messages associated with any other Call-ID
  • Let's call this a "healthy capture" occurring inside a "healthy capturing period"

But there are time intervals (periods that last for minutes/hours) when all sngrep captures are defective

  • message flows appear out of order, and messages that we know existed (because the call succeeded) are missing from the flow
  • some time diffs between messages appear negative
  • if you save (F2) just one selected call and open the resulting .pcap in Wireshark or sngrep, you DO NOT get back the original flow!
  • besides the messages of the selected call, messages belonging to other, undesired Call-IDs are saved into the .pcap
  • Let's call this a "sick capture" occurring inside a "sick capturing period"
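
One way to tell a "healthy" export from a "sick" one without inspecting it by eye is to list the Call-IDs it contains. The sketch below is only an illustration: it assumes the SIP messages have already been extracted from the .pcap as text (for example with Wireshark), and `call_ids` and `foreign_call_ids` are hypothetical helper names. It matches both the full `Call-ID` header and its compact form `i`.

```python
import re
from typing import Iterable, Set

# Header names are case-insensitive in SIP; "i" is the compact form of Call-ID.
CALL_ID_RE = re.compile(r"^(?:Call-ID|i)[ \t]*:[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE)


def call_ids(messages: Iterable[str]) -> Set[str]:
    """Collect the Call-ID of every SIP message that carries one."""
    ids = set()
    for text in messages:
        match = CALL_ID_RE.search(text)
        if match:
            ids.add(match.group(1))
    return ids


def foreign_call_ids(messages: Iterable[str], selected: str) -> Set[str]:
    """Call-IDs present in the export that are not the selected call."""
    return call_ids(messages) - {selected}


# Illustrative messages only, not taken from the reported capture.
exported = [
    "INVITE sip:bob@example.com SIP/2.0\r\nCall-ID: abc123@host\r\n\r\n",
    "SIP/2.0 200 OK\r\ni: xyz789@host\r\n\r\n",
    "BYE sip:bob@example.com SIP/2.0\r\nCall-ID: abc123@host\r\n\r\n",
]

print(foreign_call_ids(exported, "abc123@host"))
```

An empty result would correspond to a "healthy capture"; any other Call-ID in the result is a message that should not have been saved with the selected call.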

Also I observed that:

  • you can be inside a "healthy capturing period" and suddenly it becomes a "sick capturing period"
  • as an attempt to solve the problem, if you exit sngrep during a "sick period" and start it again, you don't get a "healthy period"
  • I have not found anything that avoids a "sick period" or ends one.

What could be the next step towards understanding this? Or some new experiment?

BR.

@Kaian
Member

Kaian commented Aug 16, 2022

Hi!

My guess is that the periods with defective captures are caused by network errors that generate TCP retransmissions. When those retransmissions occur, sngrep handles them as normal packets, causing errors in the flows (because it only supports flawless TCP streams, as mentioned earlier).

One approach would be to try to reproduce this with an offline capture. Capture all the traffic at the same time with another raw capture tool such as tcpdump and, as soon as sngrep fails, stop the capture and check whether there have been errors in the TCP streams. Configure tcpdump to rotate capture files so that you only have a small number of packets to analyze. Opening that capture with sngrep will probably cause the same defective behaviour.
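
A possible way to set up the rotating capture described above is sketched below. It only builds and prints the tcpdump command line; the interface, port 5060, file size, file count, and output path are assumptions to adapt, and `tcpdump_ring_command` is a hypothetical helper name. The tcpdump options used are `-C` (rotate after roughly that many million bytes), `-W` (number of files to keep before overwriting the oldest), and `-w` (output file).

```python
import shlex
from typing import List


def tcpdump_ring_command(
    interface: str = "any",        # assumption: capture on all interfaces
    port: int = 5060,              # assumption: SIP over TCP on the default port
    file_mb: int = 20,             # rotate after roughly this many million bytes (-C)
    file_count: int = 10,          # keep this many files, then overwrite the oldest (-W)
    prefix: str = "/tmp/sip-ring.pcap",  # assumption: any writable path works
) -> List[str]:
    """Build a tcpdump command that writes a small ring of capture files."""
    return [
        "tcpdump",
        "-i", interface,
        "-s", "0",                 # capture full packets, not just headers
        "-C", str(file_mb),
        "-W", str(file_count),
        "-w", prefix,
        "tcp", "port", str(port),  # capture filter expression
    ]


command = tcpdump_ring_command()
print(shlex.join(command))
```

The printed command can be run as root alongside sngrep and stopped as soon as a "sick period" starts. The most recent files can then be opened offline with `sngrep -I <file>`, and checked in Wireshark with a display filter such as `tcp.analysis.retransmission`.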

Regards!
