* Debugging stuck tcp connection across localhost
@ 2022-01-06 14:59 Ben Greear
  2022-01-06 15:20 ` Neal Cardwell
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Greear @ 2022-01-06 14:59 UTC (permalink / raw)
  To: netdev

Hello,

I'm working on a strange problem, and could use some help if anyone has ideas.

On a heavily loaded system (500+ wifi station devices, VRF device per 'real' netdev,
traffic generation on the netdevs, etc), I see cases where two processes trying
to communicate across localhost with TCP seem to get a stuck network
connection:

[greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep 127.0.0.1
tcp        0 7988926 127.0.0.1:4004          127.0.0.1:23184         ESTABLISHED
tcp        0  59805 127.0.0.1:23184         127.0.0.1:4004          ESTABLISHED

Both processes in question continue to execute, and as far as I can tell, they are properly
attempting to read/write the socket, but they are reading/writing 0 bytes (these sockets
are non-blocking).  If one were stuck not reading, I would expect netstat
to show bytes in the rcv buffer, but it is zero as you can see above.
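
For reference, here is a minimal sketch of the kind of check involved (not our actual code;
SIOCINQ/SIOCOUTQ are the Linux ioctls behind those netstat columns, and the function and fd
names are just placeholders).  It shows how the reading side can tell "no data yet" apart from
a closed peer on a non-blocking socket, and how the queue counters can be read from inside the
application:

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <linux/sockios.h>  /* SIOCINQ, SIOCOUTQ */

  /* Probe a non-blocking TCP socket and report why a read returned nothing,
   * plus the kernel's per-socket queue counters. */
  static void check_stuck_socket(int fd)
  {
      char buf[4096];
      int inq = 0, outq = 0;
      ssize_t n = read(fd, buf, sizeof(buf));

      if (n > 0)
          printf("read %zd bytes\n", n);
      else if (n == 0)
          printf("peer closed the connection (EOF)\n");
      else if (errno == EAGAIN || errno == EWOULDBLOCK)
          printf("no data available right now (would block)\n");
      else
          printf("read error: %s\n", strerror(errno));

      /* SIOCINQ: unread bytes in the receive queue (netstat's Recv-Q).
       * SIOCOUTQ: bytes not yet acked by the peer (roughly netstat's Send-Q). */
      if (!ioctl(fd, SIOCINQ, &inq) && !ioctl(fd, SIOCOUTQ, &outq))
          printf("Recv-Q=%d Send-Q=%d\n", inq, outq);
  }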

Kernel is 5.15.7 plus local hacks.  I can only reproduce this in a big, messy, complicated
test case, with my local ath10k-ct and other patches that enable virtual wifi stations,
but my code can grab logs at the time it sees the problem.  Is there anything
more I can do to figure out why the TCP connection appears to be stuck?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: Debugging stuck tcp connection across localhost
  2022-01-06 14:59 Debugging stuck tcp connection across localhost Ben Greear
@ 2022-01-06 15:20 ` Neal Cardwell
  2022-01-06 15:39   ` Ben Greear
  0 siblings, 1 reply; 19+ messages in thread
From: Neal Cardwell @ 2022-01-06 15:20 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

On Thu, Jan 6, 2022 at 10:06 AM Ben Greear <greearb@candelatech.com> wrote:
>
> Hello,
>
> I'm working on a strange problem, and could use some help if anyone has ideas.
>
> On a heavily loaded system (500+ wifi station devices, VRF device per 'real' netdev,
> traffic generation on the netdevs, etc), I see cases where two processes trying
> to communicate across localhost with TCP seem to get a stuck network
> connection:
>
> [greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep 127.0.0.1
> tcp        0 7988926 127.0.0.1:4004          127.0.0.1:23184         ESTABLISHED
> tcp        0  59805 127.0.0.1:23184         127.0.0.1:4004          ESTABLISHED
>
> Both processes in question continue to execute, and as far as I can tell, they are properly
> attempting to read/write the socket, but they are reading/writing 0 bytes (these sockets
> are non blocking).  If one was stuck not reading, I would expect netstat
> to show bytes in the rcv buffer, but it is zero as you can see above.
>
> Kernel is 5.15.7+ local hacks.  I can only reproduce this in a big messy complicated
> test case, with my local ath10k-ct and other patches that enable virtual wifi stations,
> but my code can grab logs at time it sees the problem.  Is there anything
> more I can do to figure out why the TCP connection appears to be stuck?

It could be very useful to get more information about the state of all
the stuck connections (sender and receiver side) with something like:

  ss -tinmo 'sport = :4004 or dport = :4004'

I would recommend downloading and building a recent version of the
'ss' tool to maximize the information. Here is a recipe for doing
that:

 https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-monitor-linux-tcp-bbr-connections
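
If it is easier to log from inside the stuck processes when they detect the hang, the
per-connection state that 'ss -i' prints is the kernel's struct tcp_info, and the application
can query the same data directly with the TCP_INFO socket option.  A rough sketch (only a few
long-standing fields are printed; exact field availability depends on your kernel headers):

  #include <stdio.h>
  #include <string.h>
  #include <sys/socket.h>
  #include <netinet/in.h>  /* IPPROTO_TCP */
  #include <linux/tcp.h>   /* TCP_INFO, struct tcp_info */

  /* Log a few struct tcp_info fields for a connected TCP socket. */
  static void log_tcp_info(int fd)
  {
      struct tcp_info ti;
      socklen_t len = sizeof(ti);

      memset(&ti, 0, sizeof(ti));
      if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len)) {
          perror("getsockopt(TCP_INFO)");
          return;
      }
      printf("state=%u rto=%uus backoff=%u cwnd=%u unacked=%u retrans=%u notsent=%u\n",
             (unsigned)ti.tcpi_state, ti.tcpi_rto, (unsigned)ti.tcpi_backoff,
             ti.tcpi_snd_cwnd, ti.tcpi_unacked, ti.tcpi_total_retrans,
             ti.tcpi_notsent_bytes);
  }

Calling something like this from the read/write loop as soon as it notices the stall would
timestamp the rto/backoff state without waiting for an external ss run.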

It could also be very useful to collect and share packet traces, as
long as taking traces does not consume an infeasible amount of space,
or perturb timing in a way that makes the buggy behavior disappear.
For example, as root:

  tcpdump -w /tmp/trace.pcap -s 120 -c 100000000 -i any port 4004 &

If space is an issue, you might start taking traces once things get
stuck to see what the retry behavior, if any, looks like.

thanks,
neal


* Re: Debugging stuck tcp connection across localhost
  2022-01-06 15:20 ` Neal Cardwell
@ 2022-01-06 15:39   ` Ben Greear
  2022-01-06 16:16     ` Neal Cardwell
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Greear @ 2022-01-06 15:39 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: netdev

On 1/6/22 7:20 AM, Neal Cardwell wrote:
> On Thu, Jan 6, 2022 at 10:06 AM Ben Greear <greearb@candelatech.com> wrote:
>>
>> Hello,
>>
>> I'm working on a strange problem, and could use some help if anyone has ideas.
>>
>> On a heavily loaded system (500+ wifi station devices, VRF device per 'real' netdev,
>> traffic generation on the netdevs, etc), I see cases where two processes trying
>> to communicate across localhost with TCP seem to get a stuck network
>> connection:
>>
>> [greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep 127.0.0.1
>> tcp        0 7988926 127.0.0.1:4004          127.0.0.1:23184         ESTABLISHED
>> tcp        0  59805 127.0.0.1:23184         127.0.0.1:4004          ESTABLISHED
>>
>> Both processes in question continue to execute, and as far as I can tell, they are properly
>> attempting to read/write the socket, but they are reading/writing 0 bytes (these sockets
>> are non blocking).  If one was stuck not reading, I would expect netstat
>> to show bytes in the rcv buffer, but it is zero as you can see above.
>>
>> Kernel is 5.15.7+ local hacks.  I can only reproduce this in a big messy complicated
>> test case, with my local ath10k-ct and other patches that enable virtual wifi stations,
>> but my code can grab logs at time it sees the problem.  Is there anything
>> more I can do to figure out why the TCP connection appears to be stuck?
> 
> It could be very useful to get more information about the state of all
> the stuck connections (sender and receiver side) with something like:
> 
>    ss -tinmo 'sport = :4004 or dport = :4004'
> 
> I would recommend downloading and building a recent version of the
> 'ss' tool to maximize the information. Here is a recipe for doing
> that:
> 
>   https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-monitor-linux-tcp-bbr-connections

Thanks for the suggestions!

Here is output from a working system of the same OS; the hand-compiled ss seems to give similar output.
Do you think it is still worth building ss manually on my system that shows the bugs?

[root@ct523c-3b29 iproute2]# ss -tinmo 'sport = :4004 or sport = :4004'
State             Recv-Q             Send-Q                         Local Address:Port                         Peer Address:Port
ESTAB             0                  0                                  127.0.0.1:4004                            127.0.0.1:40902
	 skmem:(r0,rb87380,t0,tb2626560,f12288,w0,o0,bl0,d0) ts sack reno wscale:4,10 rto:201 rtt:0.009/0.004 ato:40 mss:65483 pmtu:65535 rcvmss:1196 advmss:65483 
cwnd:10 bytes_sent:654589126 bytes_acked:654589126 bytes_received:1687846 segs_out:61416 segs_in:72611 data_segs_out:61406 data_segs_in:11890 send 
582071111111bps lastsnd:163 lastrcv:62910122 lastack:163 pacing_rate 1088548571424bps delivery_rate 261932000000bps delivered:61407 app_limited busy:42494ms 
rcv_rtt:1 rcv_space:43690 rcv_ssthresh:43690 minrtt:0.002
[root@ct523c-3b29 iproute2]# ./misc/ss -tinmo 'sport = :4004 or sport = :4004'
State          Recv-Q          Send-Q                    Local Address:Port                     Peer Address:Port           Process
ESTAB          0               0                             127.0.0.1:4004                        127.0.0.1:40902
	 skmem:(r0,rb87380,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:4,10 rto:201 rtt:0.009/0.003 ato:40 mss:65483 pmtu:65535 rcvmss:1196 advmss:65483 cwnd:10 
bytes_sent:654597556 bytes_acked:654597556 bytes_received:1687846 segs_out:61418 segs_in:72613 data_segs_out:61408 data_segs_in:11890 send 582071111111bps 
lastsnd:219 lastrcv:62916882 lastack:218 pacing_rate 1088548571424bps delivery_rate 261932000000bps delivered:61409 app_limited busy:42495ms rcv_rtt:1 
rcv_space:43690 rcv_ssthresh:43690 minrtt:0.002

> 
> It could also be very useful to collect and share packet traces, as
> long as taking traces does not consume an infeasible amount of space,
> or perturb timing in a way that makes the buggy behavior disappear.
> For example, as root:
> 
>    tcpdump -w /tmp/trace.pcap -s 120 -c 100000000 -i any port 4004 &

I guess this could be '-i lo'?

I sometimes see what is likely a similar problem on connections to an external process, but the easiest
thing to reproduce is the stuck localhost connection, and my assumption is that it would also be the
easiest to debug.

I should have enough space for captures, I'll give that a try.

Thanks,
Ben

> 
> If space is an issue, you might start taking traces once things get
> stuck to see what the retry behavior, if any, looks like.
> 
> thanks,
> neal
> 


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: Debugging stuck tcp connection across localhost
  2022-01-06 15:39   ` Ben Greear
@ 2022-01-06 16:16     ` Neal Cardwell
  2022-01-06 19:05       ` Ben Greear
  0 siblings, 1 reply; 19+ messages in thread
From: Neal Cardwell @ 2022-01-06 16:16 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

On Thu, Jan 6, 2022 at 10:39 AM Ben Greear <greearb@candelatech.com> wrote:
>
> On 1/6/22 7:20 AM, Neal Cardwell wrote:
> > On Thu, Jan 6, 2022 at 10:06 AM Ben Greear <greearb@candelatech.com> wrote:
> >>
> >> Hello,
> >>
> >> I'm working on a strange problem, and could use some help if anyone has ideas.
> >>
> >> On a heavily loaded system (500+ wifi station devices, VRF device per 'real' netdev,
> >> traffic generation on the netdevs, etc), I see cases where two processes trying
> >> to communicate across localhost with TCP seem to get a stuck network
> >> connection:
> >>
> >> [greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep 127.0.0.1
> >> tcp        0 7988926 127.0.0.1:4004          127.0.0.1:23184         ESTABLISHED
> >> tcp        0  59805 127.0.0.1:23184         127.0.0.1:4004          ESTABLISHED
> >>
> >> Both processes in question continue to execute, and as far as I can tell, they are properly
> >> attempting to read/write the socket, but they are reading/writing 0 bytes (these sockets
> >> are non blocking).  If one was stuck not reading, I would expect netstat
> >> to show bytes in the rcv buffer, but it is zero as you can see above.
> >>
> >> Kernel is 5.15.7+ local hacks.  I can only reproduce this in a big messy complicated
> >> test case, with my local ath10k-ct and other patches that enable virtual wifi stations,
> >> but my code can grab logs at time it sees the problem.  Is there anything
> >> more I can do to figure out why the TCP connection appears to be stuck?
> >
> > It could be very useful to get more information about the state of all
> > the stuck connections (sender and receiver side) with something like:
> >
> >    ss -tinmo 'sport = :4004 or dport = :4004'
> >
> > I would recommend downloading and building a recent version of the
> > 'ss' tool to maximize the information. Here is a recipe for doing
> > that:
> >
> >   https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-monitor-linux-tcp-bbr-connections
>
> Thanks for the suggestions!
>
> Here is output from a working system of same OS, the hand-compiled ss seems to give similar output,
> do you think it is still worth building ss manually on my system that shows the bugs?
>
> [root@ct523c-3b29 iproute2]# ss -tinmo 'sport = :4004 or sport = :4004'
> State             Recv-Q             Send-Q                         Local Address:Port                         Peer Address:Port
> ESTAB             0                  0                                  127.0.0.1:4004                            127.0.0.1:40902
>          skmem:(r0,rb87380,t0,tb2626560,f12288,w0,o0,bl0,d0) ts sack reno wscale:4,10 rto:201 rtt:0.009/0.004 ato:40 mss:65483 pmtu:65535 rcvmss:1196 advmss:65483
> cwnd:10 bytes_sent:654589126 bytes_acked:654589126 bytes_received:1687846 segs_out:61416 segs_in:72611 data_segs_out:61406 data_segs_in:11890 send
> 582071111111bps lastsnd:163 lastrcv:62910122 lastack:163 pacing_rate 1088548571424bps delivery_rate 261932000000bps delivered:61407 app_limited busy:42494ms
> rcv_rtt:1 rcv_space:43690 rcv_ssthresh:43690 minrtt:0.002
> [root@ct523c-3b29 iproute2]# ./misc/ss -tinmo 'sport = :4004 or sport = :4004'
> State          Recv-Q          Send-Q                    Local Address:Port                     Peer Address:Port           Process
> ESTAB          0               0                             127.0.0.1:4004                        127.0.0.1:40902
>          skmem:(r0,rb87380,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:4,10 rto:201 rtt:0.009/0.003 ato:40 mss:65483 pmtu:65535 rcvmss:1196 advmss:65483 cwnd:10
> bytes_sent:654597556 bytes_acked:654597556 bytes_received:1687846 segs_out:61418 segs_in:72613 data_segs_out:61408 data_segs_in:11890 send 582071111111bps
> lastsnd:219 lastrcv:62916882 lastack:218 pacing_rate 1088548571424bps delivery_rate 261932000000bps delivered:61409 app_limited busy:42495ms rcv_rtt:1
> rcv_space:43690 rcv_ssthresh:43690 minrtt:0.002

Great. Yes, it looks like your system has a new enough ss.

> >
> > It could also be very useful to collect and share packet traces, as
> > long as taking traces does not consume an infeasible amount of space,
> > or perturb timing in a way that makes the buggy behavior disappear.
> > For example, as root:
> >
> >    tcpdump -w /tmp/trace.pcap -s 120 -c 100000000 -i any port 4004 &
>
> I guess this could be  -i lo ?

Yes, if the problem is always on the lo device then that should be fine.

> I sometimes see what is likely a similar problem to an external process, but easiest thing to
> reproduce is the localhost stuck connection, and my assumption is that it would be easiest
> to debug.
>
> I should have enough space for captures, I'll give that a try.

Great, thanks!

neal

> Thanks,
> Ben
>
> >
> > If space is an issue, you might start taking traces once things get
> > stuck to see what the retry behavior, if any, looks like.
> >
> > thanks,
> > neal
> >
>
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com


* Re: Debugging stuck tcp connection across localhost
  2022-01-06 16:16     ` Neal Cardwell
@ 2022-01-06 19:05       ` Ben Greear
  2022-01-06 20:04         ` Neal Cardwell
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Greear @ 2022-01-06 19:05 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: netdev

On 1/6/22 8:16 AM, Neal Cardwell wrote:
> On Thu, Jan 6, 2022 at 10:39 AM Ben Greear <greearb@candelatech.com> wrote:
>>
>> On 1/6/22 7:20 AM, Neal Cardwell wrote:
>>> On Thu, Jan 6, 2022 at 10:06 AM Ben Greear <greearb@candelatech.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I'm working on a strange problem, and could use some help if anyone has ideas.
>>>>
>>>> On a heavily loaded system (500+ wifi station devices, VRF device per 'real' netdev,
>>>> traffic generation on the netdevs, etc), I see cases where two processes trying
>>>> to communicate across localhost with TCP seem to get a stuck network
>>>> connection:
>>>>
>>>> [greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep 127.0.0.1
>>>> tcp        0 7988926 127.0.0.1:4004          127.0.0.1:23184         ESTABLISHED
>>>> tcp        0  59805 127.0.0.1:23184         127.0.0.1:4004          ESTABLISHED
>>>>
>>>> Both processes in question continue to execute, and as far as I can tell, they are properly
>>>> attempting to read/write the socket, but they are reading/writing 0 bytes (these sockets
>>>> are non blocking).  If one was stuck not reading, I would expect netstat
>>>> to show bytes in the rcv buffer, but it is zero as you can see above.
>>>>
>>>> Kernel is 5.15.7+ local hacks.  I can only reproduce this in a big messy complicated
>>>> test case, with my local ath10k-ct and other patches that enable virtual wifi stations,
>>>> but my code can grab logs at time it sees the problem.  Is there anything
>>>> more I can do to figure out why the TCP connection appears to be stuck?
>>>
>>> It could be very useful to get more information about the state of all
>>> the stuck connections (sender and receiver side) with something like:
>>>
>>>     ss -tinmo 'sport = :4004 or dport = :4004'
>>>
>>> I would recommend downloading and building a recent version of the
>>> 'ss' tool to maximize the information. Here is a recipe for doing
>>> that:
>>>
>>>    https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-monitor-linux-tcp-bbr-connections

Hello Neal,

Here is the ss output from when the problem was happening.  I think you can ignore the non-127.0.0.1
connections, but I left them in just in case it is somehow helpful.

In addition, the pcap capture file is uploaded here:

http://www.candelatech.com/downloads/trace-lo-4004.pcap

The problem was happening in this time frame:

[root@ct523c-0bdd ~]# date
Thu 06 Jan 2022 10:14:49 AM PST
[root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port 

ESTAB       0            222024                   127.0.0.1:57224                  127.0.0.1:4004         timer:(persist,1min23sec,9)
	 skmem:(r0,rb2000000,t0,tb2000000,f2232,w227144,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483 
advmss:65483 cwnd:10 bytes_sent:36810035 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20134 segs_in:17497 data_segs_out:11969 
data_segs_in:16642 send 6049237875bps lastsnd:3266 lastrcv:125252 lastack:125263 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863 
app_limited busy:275880ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222024 minrtt:0.013
ESTAB       0            0                   192.168.200.34:4004              192.168.200.34:16906
	 skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483 advmss:65483 
cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377 
data_segs_in:2330 send 1823271222bps lastsnd:125253 lastrcv:125250 lastack:125251 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372 
busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
ESTAB       0            139656              192.168.200.34:16908             192.168.200.34:4004         timer:(persist,1min52sec,2)
	 skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535 
rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642 
data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:7465 lastrcv:125250 lastack:125253 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357 
busy:271236ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
ESTAB       0            460                 192.168.200.34:4004              192.168.200.34:16908        timer:(on,1min23sec,9)
	 skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535 
rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504 
data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:19753 lastrcv:158000 lastack:125250 pacing_rate 854817384bps delivery_rate 115645432bps 
delivered:2355 busy:200993ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
ESTAB       0            147205              192.168.200.34:16906             192.168.200.34:4004         timer:(persist,1min46sec,9)
	 skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016 
advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409 
data_segs_in:2371 send 459529825bps lastsnd:7465 lastrcv:125253 lastack:125250 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331 app_limited 
busy:185315ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
ESTAB       0            3928980                  127.0.0.1:4004                   127.0.0.1:57224        timer:(persist,7.639ms,8)
	 skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:8 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535 
rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312422779 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18944 
segs_in:20021 data_segs_out:18090 data_segs_in:11862 send 394446201bps lastsnd:56617 lastrcv:125271 lastack:125252 pacing_rate 709983112bps delivery_rate 
104772800000bps delivered:16643 app_limited busy:370468ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268 
notsent:3928980 minrtt:0.003
[root@ct523c-0bdd ~]# date
Thu 06 Jan 2022 10:14:57 AM PST
[root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port 

ESTAB       0            222208                   127.0.0.1:57224                  127.0.0.1:4004         timer:(persist,1min11sec,9)
	 skmem:(r0,rb2000000,t0,tb2000000,f2048,w227328,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483 
advmss:65483 cwnd:10 bytes_sent:36941001 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20136 segs_in:17497 data_segs_out:11971 
data_segs_in:16642 send 6049237875bps lastsnd:2663 lastrcv:136933 lastack:136944 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863 
app_limited busy:287561ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222208 minrtt:0.013
ESTAB       0            0                   192.168.200.34:4004              192.168.200.34:16906
	 skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483 advmss:65483 
cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377 
data_segs_in:2330 send 1823271222bps lastsnd:136934 lastrcv:136931 lastack:136932 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372 
busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
ESTAB       0            139656              192.168.200.34:16908             192.168.200.34:4004         timer:(persist,1min40sec,2)
	 skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535 
rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642 
data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:19146 lastrcv:136931 lastack:136934 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357 
busy:282917ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
ESTAB       0            460                 192.168.200.34:4004              192.168.200.34:16908        timer:(on,1min11sec,9)
	 skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535 
rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504 
data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:31434 lastrcv:169681 lastack:136931 pacing_rate 854817384bps delivery_rate 115645432bps 
delivered:2355 busy:212674ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
ESTAB       0            147205              192.168.200.34:16906             192.168.200.34:4004         timer:(persist,1min35sec,9)
	 skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016 
advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409 
data_segs_in:2371 send 459529825bps lastsnd:19146 lastrcv:136934 lastack:136931 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331 app_limited 
busy:196996ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
ESTAB       0            3928980                  127.0.0.1:4004                   127.0.0.1:57224        timer:(persist,1min57sec,9)
	 skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:9 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535 
rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312488262 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18945 
segs_in:20021 data_segs_out:18091 data_segs_in:11862 send 394446201bps lastsnd:2762 lastrcv:136952 lastack:136933 pacing_rate 709983112bps delivery_rate 
104772800000bps delivered:16643 app_limited busy:382149ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268 
notsent:3928980 minrtt:0.003
[root@ct523c-0bdd ~]#


We can reproduce this readily at the moment, and I'm happy to try patches and/or do more debugging.  We also tried with a 5.12 kernel,
and saw the same problems, but in all cases we have local patches applied, and there is no way for us to run this test without
at least a fair number of local patches applied.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: Debugging stuck tcp connection across localhost
  2022-01-06 19:05       ` Ben Greear
@ 2022-01-06 20:04         ` Neal Cardwell
  2022-01-06 20:20           ` Ben Greear
  2022-01-06 22:26           ` Ben Greear
  0 siblings, 2 replies; 19+ messages in thread
From: Neal Cardwell @ 2022-01-06 20:04 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

On Thu, Jan 6, 2022 at 2:05 PM Ben Greear <greearb@candelatech.com> wrote:
>
> On 1/6/22 8:16 AM, Neal Cardwell wrote:
> > On Thu, Jan 6, 2022 at 10:39 AM Ben Greear <greearb@candelatech.com> wrote:
> >>
> >> On 1/6/22 7:20 AM, Neal Cardwell wrote:
> >>> On Thu, Jan 6, 2022 at 10:06 AM Ben Greear <greearb@candelatech.com> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> I'm working on a strange problem, and could use some help if anyone has ideas.
> >>>>
> >>>> On a heavily loaded system (500+ wifi station devices, VRF device per 'real' netdev,
> >>>> traffic generation on the netdevs, etc), I see cases where two processes trying
> >>>> to communicate across localhost with TCP seem to get a stuck network
> >>>> connection:
> >>>>
> >>>> [greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep 127.0.0.1
> >>>> tcp        0 7988926 127.0.0.1:4004          127.0.0.1:23184         ESTABLISHED
> >>>> tcp        0  59805 127.0.0.1:23184         127.0.0.1:4004          ESTABLISHED
> >>>>
> >>>> Both processes in question continue to execute, and as far as I can tell, they are properly
> >>>> attempting to read/write the socket, but they are reading/writing 0 bytes (these sockets
> >>>> are non blocking).  If one was stuck not reading, I would expect netstat
> >>>> to show bytes in the rcv buffer, but it is zero as you can see above.
> >>>>
> >>>> Kernel is 5.15.7+ local hacks.  I can only reproduce this in a big messy complicated
> >>>> test case, with my local ath10k-ct and other patches that enable virtual wifi stations,
> >>>> but my code can grab logs at time it sees the problem.  Is there anything
> >>>> more I can do to figure out why the TCP connection appears to be stuck?
> >>>
> >>> It could be very useful to get more information about the state of all
> >>> the stuck connections (sender and receiver side) with something like:
> >>>
> >>>     ss -tinmo 'sport = :4004 or dport = :4004'
> >>>
> >>> I would recommend downloading and building a recent version of the
> >>> 'ss' tool to maximize the information. Here is a recipe for doing
> >>> that:
> >>>
> >>>    https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-monitor-linux-tcp-bbr-connections
>
> Hello Neal,
>
> Here is the ss output from when the problem was happening.  I think you can ignore the non-127.0.0.1
> connections, but I left them in just in case it is somehow helpful.
>
> In addition, the pcap capture file is uploaded here:
>
> http://www.candelatech.com/downloads/trace-lo-4004.pcap
>
> The problem was happening in this time frame:
>
> [root@ct523c-0bdd ~]# date
> Thu 06 Jan 2022 10:14:49 AM PST
> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
>
> ESTAB       0            222024                   127.0.0.1:57224                  127.0.0.1:4004         timer:(persist,1min23sec,9)
>          skmem:(r0,rb2000000,t0,tb2000000,f2232,w227144,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483
> advmss:65483 cwnd:10 bytes_sent:36810035 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20134 segs_in:17497 data_segs_out:11969
> data_segs_in:16642 send 6049237875bps lastsnd:3266 lastrcv:125252 lastack:125263 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
> app_limited busy:275880ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222024 minrtt:0.013
> ESTAB       0            0                   192.168.200.34:4004              192.168.200.34:16906
>          skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483 advmss:65483
> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
> data_segs_in:2330 send 1823271222bps lastsnd:125253 lastrcv:125250 lastack:125251 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
> ESTAB       0            139656              192.168.200.34:16908             192.168.200.34:4004         timer:(persist,1min52sec,2)
>          skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535
> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:7465 lastrcv:125250 lastack:125253 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
> busy:271236ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
> ESTAB       0            460                 192.168.200.34:4004              192.168.200.34:16908        timer:(on,1min23sec,9)
>          skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:19753 lastrcv:158000 lastack:125250 pacing_rate 854817384bps delivery_rate 115645432bps
> delivered:2355 busy:200993ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
> ESTAB       0            147205              192.168.200.34:16906             192.168.200.34:4004         timer:(persist,1min46sec,9)
>          skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016
> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
> data_segs_in:2371 send 459529825bps lastsnd:7465 lastrcv:125253 lastack:125250 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331 app_limited
> busy:185315ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
> ESTAB       0            3928980                  127.0.0.1:4004                   127.0.0.1:57224        timer:(persist,7.639ms,8)
>          skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:8 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312422779 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18944
> segs_in:20021 data_segs_out:18090 data_segs_in:11862 send 394446201bps lastsnd:56617 lastrcv:125271 lastack:125252 pacing_rate 709983112bps delivery_rate
> 104772800000bps delivered:16643 app_limited busy:370468ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
> notsent:3928980 minrtt:0.003
> [root@ct523c-0bdd ~]# date
> Thu 06 Jan 2022 10:14:57 AM PST
> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
>
> ESTAB       0            222208                   127.0.0.1:57224                  127.0.0.1:4004         timer:(persist,1min11sec,9)
>          skmem:(r0,rb2000000,t0,tb2000000,f2048,w227328,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483
> advmss:65483 cwnd:10 bytes_sent:36941001 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20136 segs_in:17497 data_segs_out:11971
> data_segs_in:16642 send 6049237875bps lastsnd:2663 lastrcv:136933 lastack:136944 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
> app_limited busy:287561ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222208 minrtt:0.013
> ESTAB       0            0                   192.168.200.34:4004              192.168.200.34:16906
>          skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483 advmss:65483
> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
> data_segs_in:2330 send 1823271222bps lastsnd:136934 lastrcv:136931 lastack:136932 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
> ESTAB       0            139656              192.168.200.34:16908             192.168.200.34:4004         timer:(persist,1min40sec,2)
>          skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535
> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:19146 lastrcv:136931 lastack:136934 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
> busy:282917ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
> ESTAB       0            460                 192.168.200.34:4004              192.168.200.34:16908        timer:(on,1min11sec,9)
>          skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:31434 lastrcv:169681 lastack:136931 pacing_rate 854817384bps delivery_rate 115645432bps
> delivered:2355 busy:212674ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
> ESTAB       0            147205              192.168.200.34:16906             192.168.200.34:4004         timer:(persist,1min35sec,9)
>          skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016
> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
> data_segs_in:2371 send 459529825bps lastsnd:19146 lastrcv:136934 lastack:136931 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331 app_limited
> busy:196996ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
> ESTAB       0            3928980                  127.0.0.1:4004                   127.0.0.1:57224        timer:(persist,1min57sec,9)
>          skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:9 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312488262 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18945
> segs_in:20021 data_segs_out:18091 data_segs_in:11862 send 394446201bps lastsnd:2762 lastrcv:136952 lastack:136933 pacing_rate 709983112bps delivery_rate
> 104772800000bps delivered:16643 app_limited busy:382149ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
> notsent:3928980 minrtt:0.003
> [root@ct523c-0bdd ~]#
>
>
> We can reproduce this readily at current, and I'm happy to try patches and/or do more debugging.  We also tried with a 5.12 kernel,
> and saw same problems, but in all cases, we have local patches applied, and there is no way for us to do this test without
> at least a fair bit of local patches applied.

Thanks for the ss traces and tcpdump output! The tcpdump traces are
nice, in that they start before the connection starts, so they capture the
SYN and its critical options like wscale.

From the "timer:(persist" in the ss output, it seems the stalls (that
are preventing the send buffers from being transmitted) are caused by
a 0-byte receive window causing the senders to stop sending, and
periodically fire the ICSK_TIME_PROBE0 timer to check for an open
receive window. From "backoff:9" it seems this condition has lasted
for a very long exponential backoff process.
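
To put rough numbers on that backoff: each unanswered zero-window probe roughly doubles the
wait, starting from the connection's RTO and clamped at TCP_RTO_MAX (120 seconds by default),
so with the rto:201 and backoff:9 shown above the connection has been probing an unresponsive
window for several minutes.  A back-of-the-envelope sketch, not the exact kernel code:

  #include <stdio.h>

  /* Approximate zero-window probe schedule: each unanswered probe doubles
   * the wait, clamped at TCP_RTO_MAX (120 s by default).  Numbers below
   * mirror the rto:201 / backoff:9 shown in the ss output. */
  int main(void)
  {
      const unsigned long long rto_ms = 201;
      const unsigned long long rto_max_ms = 120000;  /* TCP_RTO_MAX */
      unsigned long long total_ms = 0;
      unsigned int backoff;

      for (backoff = 0; backoff <= 9; backoff++) {
          unsigned long long wait = rto_ms << backoff;

          if (wait > rto_max_ms)
              wait = rto_max_ms;
          total_ms += wait;
          printf("backoff %u: next probe after ~%llu ms (elapsed ~%llu s)\n",
                 backoff, wait, total_ms / 1000);
      }
      return 0;
  }

At backoff 9 that works out to a probe roughly every 103 seconds (the rto:102912 with backoff:9
on one of the sockets above is the same arithmetic: 201 ms doubled nine times), so a connection
can sit in this state for many minutes.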

I don't see 0-byte receive window problems in the trace, but this is
probably because the tcpdump traces only last through 10:12:47 PST,
and the problem is showing up in ss at 10:14:49 AM PST and later.

Is it possible to reproduce the problem again, and this time let the
tcpdump traces run all the way through the period where the
connections freeze and you grab the "ss" output?

You may also have to explicitly kill the tcpdump. Perhaps the tail of
the trace was buffered in tcpdump's output buffer and not flushed to
disk. A "killall tcpdump" should do the trick to force it to cleanly
flush everything.

thanks,
neal


* Re: Debugging stuck tcp connection across localhost
  2022-01-06 20:04         ` Neal Cardwell
@ 2022-01-06 20:20           ` Ben Greear
  2022-01-06 22:26           ` Ben Greear
  1 sibling, 0 replies; 19+ messages in thread
From: Ben Greear @ 2022-01-06 20:20 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: netdev

On 1/6/22 12:04 PM, Neal Cardwell wrote:
> On Thu, Jan 6, 2022 at 2:05 PM Ben Greear <greearb@candelatech.com> wrote:
>>
>> On 1/6/22 8:16 AM, Neal Cardwell wrote:
>>> On Thu, Jan 6, 2022 at 10:39 AM Ben Greear <greearb@candelatech.com> wrote:
>>>>
>>>> On 1/6/22 7:20 AM, Neal Cardwell wrote:
>>>>> On Thu, Jan 6, 2022 at 10:06 AM Ben Greear <greearb@candelatech.com> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm working on a strange problem, and could use some help if anyone has ideas.
>>>>>>
>>>>>> On a heavily loaded system (500+ wifi station devices, VRF device per 'real' netdev,
>>>>>> traffic generation on the netdevs, etc), I see cases where two processes trying
>>>>>> to communicate across localhost with TCP seem to get a stuck network
>>>>>> connection:
>>>>>>
>>>>>> [greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep 127.0.0.1
>>>>>> tcp        0 7988926 127.0.0.1:4004          127.0.0.1:23184         ESTABLISHED
>>>>>> tcp        0  59805 127.0.0.1:23184         127.0.0.1:4004          ESTABLISHED
>>>>>>
>>>>>> Both processes in question continue to execute, and as far as I can tell, they are properly
>>>>>> attempting to read/write the socket, but they are reading/writing 0 bytes (these sockets
>>>>>> are non blocking).  If one was stuck not reading, I would expect netstat
>>>>>> to show bytes in the rcv buffer, but it is zero as you can see above.
>>>>>>
>>>>>> Kernel is 5.15.7+ local hacks.  I can only reproduce this in a big messy complicated
>>>>>> test case, with my local ath10k-ct and other patches that enable virtual wifi stations,
>>>>>> but my code can grab logs at time it sees the problem.  Is there anything
>>>>>> more I can do to figure out why the TCP connection appears to be stuck?
>>>>>
>>>>> It could be very useful to get more information about the state of all
>>>>> the stuck connections (sender and receiver side) with something like:
>>>>>
>>>>>      ss -tinmo 'sport = :4004 or dport = :4004'
>>>>>
>>>>> I would recommend downloading and building a recent version of the
>>>>> 'ss' tool to maximize the information. Here is a recipe for doing
>>>>> that:
>>>>>
>>>>>     https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-monitor-linux-tcp-bbr-connections
>>
>> Hello Neal,
>>
>> Here is the ss output from when the problem was happening.  I think you can ignore the non-127.0.0.1
>> connections, but I left them in just in case it is somehow helpful.
>>
>> In addition, the pcap capture file is uploaded here:
>>
>> http://www.candelatech.com/downloads/trace-lo-4004.pcap
>>
>> The problem was happening in this time frame:
>>
>> [root@ct523c-0bdd ~]# date
>> Thu 06 Jan 2022 10:14:49 AM PST
>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
>> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
>>
>> ESTAB       0            222024                   127.0.0.1:57224                  127.0.0.1:4004         timer:(persist,1min23sec,9)
>>           skmem:(r0,rb2000000,t0,tb2000000,f2232,w227144,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483
>> advmss:65483 cwnd:10 bytes_sent:36810035 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20134 segs_in:17497 data_segs_out:11969
>> data_segs_in:16642 send 6049237875bps lastsnd:3266 lastrcv:125252 lastack:125263 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
>> app_limited busy:275880ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222024 minrtt:0.013
>> ESTAB       0            0                   192.168.200.34:4004              192.168.200.34:16906
>>           skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483 advmss:65483
>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
>> data_segs_in:2330 send 1823271222bps lastsnd:125253 lastrcv:125250 lastack:125251 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
>> ESTAB       0            139656              192.168.200.34:16908             192.168.200.34:4004         timer:(persist,1min52sec,2)
>>           skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535
>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:7465 lastrcv:125250 lastack:125253 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
>> busy:271236ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
>> ESTAB       0            460                 192.168.200.34:4004              192.168.200.34:16908        timer:(on,1min23sec,9)
>>           skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:19753 lastrcv:158000 lastack:125250 pacing_rate 854817384bps delivery_rate 115645432bps
>> delivered:2355 busy:200993ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
>> ESTAB       0            147205              192.168.200.34:16906             192.168.200.34:4004         timer:(persist,1min46sec,9)
>>           skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016
>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
>> data_segs_in:2371 send 459529825bps lastsnd:7465 lastrcv:125253 lastack:125250 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331 app_limited
>> busy:185315ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
>> ESTAB       0            3928980                  127.0.0.1:4004                   127.0.0.1:57224        timer:(persist,7.639ms,8)
>>           skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:8 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312422779 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18944
>> segs_in:20021 data_segs_out:18090 data_segs_in:11862 send 394446201bps lastsnd:56617 lastrcv:125271 lastack:125252 pacing_rate 709983112bps delivery_rate
>> 104772800000bps delivered:16643 app_limited busy:370468ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
>> notsent:3928980 minrtt:0.003
>> [root@ct523c-0bdd ~]# date
>> Thu 06 Jan 2022 10:14:57 AM PST
>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
>> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
>>
>> ESTAB       0            222208                   127.0.0.1:57224                  127.0.0.1:4004         timer:(persist,1min11sec,9)
>>           skmem:(r0,rb2000000,t0,tb2000000,f2048,w227328,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483
>> advmss:65483 cwnd:10 bytes_sent:36941001 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20136 segs_in:17497 data_segs_out:11971
>> data_segs_in:16642 send 6049237875bps lastsnd:2663 lastrcv:136933 lastack:136944 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
>> app_limited busy:287561ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222208 minrtt:0.013
>> ESTAB       0            0                   192.168.200.34:4004              192.168.200.34:16906
>>           skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483 advmss:65483
>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
>> data_segs_in:2330 send 1823271222bps lastsnd:136934 lastrcv:136931 lastack:136932 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
>> ESTAB       0            139656              192.168.200.34:16908             192.168.200.34:4004         timer:(persist,1min40sec,2)
>>           skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535
>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:19146 lastrcv:136931 lastack:136934 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
>> busy:282917ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
>> ESTAB       0            460                 192.168.200.34:4004              192.168.200.34:16908        timer:(on,1min11sec,9)
>>           skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:31434 lastrcv:169681 lastack:136931 pacing_rate 854817384bps delivery_rate 115645432bps
>> delivered:2355 busy:212674ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
>> ESTAB       0            147205              192.168.200.34:16906             192.168.200.34:4004         timer:(persist,1min35sec,9)
>>           skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016
>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
>> data_segs_in:2371 send 459529825bps lastsnd:19146 lastrcv:136934 lastack:136931 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331 app_limited
>> busy:196996ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
>> ESTAB       0            3928980                  127.0.0.1:4004                   127.0.0.1:57224        timer:(persist,1min57sec,9)
>>           skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:9 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312488262 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18945
>> segs_in:20021 data_segs_out:18091 data_segs_in:11862 send 394446201bps lastsnd:2762 lastrcv:136952 lastack:136933 pacing_rate 709983112bps delivery_rate
>> 104772800000bps delivered:16643 app_limited busy:382149ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
>> notsent:3928980 minrtt:0.003
>> [root@ct523c-0bdd ~]#
>>
>>
>> We can reproduce this readily at current, and I'm happy to try patches and/or do more debugging.  We also tried with a 5.12 kernel,
>> and saw same problems, but in all cases, we have local patches applied, and there is no way for us to do this test without
>> at least a fair bit of local patches applied.
> 
> Thanks for the ss traces and tcpdump output! The tcpdump traces are
> nice, in that they start before the connection starts, so capture the
> SYN and its critical options like wscale.
> 
>  From the "timer:(persist" in the ss output, it seems the stalls (that
> are preventing the send buffers from being transmitted) are caused by
> a 0-byte receive window causing the senders to stop sending, and
> periodically fire the ICSK_TIME_PROBE0 timer to check for an open
> receive window. From "backoff:9" it seems this condition has lasted
> for a very long exponential backoff process.
> 
> I don't see 0-byte receive window problems in the trace, but this is
> probably because the tcpdump traces only last through 10:12:47 PST,
> and the problem is showing up in ss at 10:14:49 AM PST and later.
> 
> Is it possible to reproduce the problem again, and this time let the
> tcpdump traces run all the way through the period where the
> connections freeze and you grab the "ss" output?
> 
> You may also have to explicitly kill the tcpdump. Perhaps the tail of
> the trace was buffered in tcpdump's output buffer and not flushed to
> disk. A "killall tcpdump" should do the trick to force it to cleanly
> flush everything.

I am pretty sure I let tcpdump run for the entire time, and killed it with ctrl-c.

The 'ss' output was from several minutes after the problem started happening
(most likely; it is difficult to know exactly when communication hangs).

I will attempt to reproduce, running 'ss; date; sleep 5' in a loop for the entirety of the test,
and will make sure tcpdump is running for the entire duration.

Would netstat (to show send/rcv buffers) be useful output as well?  Anything else
you'd like to see?

Thanks,
Ben

> 
> thanks,
> neal
> 


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: Debugging stuck tcp connection across localhost
  2022-01-06 20:04         ` Neal Cardwell
  2022-01-06 20:20           ` Ben Greear
@ 2022-01-06 22:26           ` Ben Greear
  2022-01-10 18:10             ` Ben Greear
  1 sibling, 1 reply; 19+ messages in thread
From: Ben Greear @ 2022-01-06 22:26 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: netdev

On 1/6/22 12:04 PM, Neal Cardwell wrote:
> On Thu, Jan 6, 2022 at 2:05 PM Ben Greear <greearb@candelatech.com> wrote:
>>
>> On 1/6/22 8:16 AM, Neal Cardwell wrote:
>>> On Thu, Jan 6, 2022 at 10:39 AM Ben Greear <greearb@candelatech.com> wrote:
>>>>
>>>> On 1/6/22 7:20 AM, Neal Cardwell wrote:
>>>>> On Thu, Jan 6, 2022 at 10:06 AM Ben Greear <greearb@candelatech.com> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm working on a strange problem, and could use some help if anyone has ideas.
>>>>>>
>>>>>> On a heavily loaded system (500+ wifi station devices, VRF device per 'real' netdev,
>>>>>> traffic generation on the netdevs, etc), I see cases where two processes trying
>>>>>> to communicate across localhost with TCP seem to get a stuck network
>>>>>> connection:
>>>>>>
>>>>>> [greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep 127.0.0.1
>>>>>> tcp        0 7988926 127.0.0.1:4004          127.0.0.1:23184         ESTABLISHED
>>>>>> tcp        0  59805 127.0.0.1:23184         127.0.0.1:4004          ESTABLISHED
>>>>>>
>>>>>> Both processes in question continue to execute, and as far as I can tell, they are properly
>>>>>> attempting to read/write the socket, but they are reading/writing 0 bytes (these sockets
>>>>>> are non blocking).  If one was stuck not reading, I would expect netstat
>>>>>> to show bytes in the rcv buffer, but it is zero as you can see above.
>>>>>>
>>>>>> Kernel is 5.15.7+ local hacks.  I can only reproduce this in a big messy complicated
>>>>>> test case, with my local ath10k-ct and other patches that enable virtual wifi stations,
>>>>>> but my code can grab logs at time it sees the problem.  Is there anything
>>>>>> more I can do to figure out why the TCP connection appears to be stuck?
>>>>>
>>>>> It could be very useful to get more information about the state of all
>>>>> the stuck connections (sender and receiver side) with something like:
>>>>>
>>>>>      ss -tinmo 'sport = :4004 or sport = :4004'
>>>>>
>>>>> I would recommend downloading and building a recent version of the
>>>>> 'ss' tool to maximize the information. Here is a recipe for doing
>>>>> that:
>>>>>
>>>>>     https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-monitor-linux-tcp-bbr-connections
>>
>> Hello Neal,
>>
>> Here is the ss output from when the problem was happening.  I think you can ignore the non-127.0.0.1
>> connections, but I left them in just in case it is somehow helpful.
>>
>> In addition, the pcap capture file is uploaded here:
>>
>> http://www.candelatech.com/downloads/trace-lo-4004.pcap
>>
>> The problem was happening in this time frame:
>>
>> [root@ct523c-0bdd ~]# date
>> Thu 06 Jan 2022 10:14:49 AM PST
>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
>> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
>>
>> ESTAB       0            222024                   127.0.0.1:57224                  127.0.0.1:4004         timer:(persist,1min23sec,9)
>>           skmem:(r0,rb2000000,t0,tb2000000,f2232,w227144,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483
>> advmss:65483 cwnd:10 bytes_sent:36810035 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20134 segs_in:17497 data_segs_out:11969
>> data_segs_in:16642 send 6049237875bps lastsnd:3266 lastrcv:125252 lastack:125263 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
>> app_limited busy:275880ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222024 minrtt:0.013
>> ESTAB       0            0                   192.168.200.34:4004              192.168.200.34:16906
>>           skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483 advmss:65483
>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
>> data_segs_in:2330 send 1823271222bps lastsnd:125253 lastrcv:125250 lastack:125251 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
>> ESTAB       0            139656              192.168.200.34:16908             192.168.200.34:4004         timer:(persist,1min52sec,2)
>>           skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535
>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:7465 lastrcv:125250 lastack:125253 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
>> busy:271236ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
>> ESTAB       0            460                 192.168.200.34:4004              192.168.200.34:16908        timer:(on,1min23sec,9)
>>           skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:19753 lastrcv:158000 lastack:125250 pacing_rate 854817384bps delivery_rate 115645432bps
>> delivered:2355 busy:200993ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
>> ESTAB       0            147205              192.168.200.34:16906             192.168.200.34:4004         timer:(persist,1min46sec,9)
>>           skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016
>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
>> data_segs_in:2371 send 459529825bps lastsnd:7465 lastrcv:125253 lastack:125250 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331 app_limited
>> busy:185315ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
>> ESTAB       0            3928980                  127.0.0.1:4004                   127.0.0.1:57224        timer:(persist,7.639ms,8)
>>           skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:8 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312422779 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18944
>> segs_in:20021 data_segs_out:18090 data_segs_in:11862 send 394446201bps lastsnd:56617 lastrcv:125271 lastack:125252 pacing_rate 709983112bps delivery_rate
>> 104772800000bps delivered:16643 app_limited busy:370468ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
>> notsent:3928980 minrtt:0.003
>> [root@ct523c-0bdd ~]# date
>> Thu 06 Jan 2022 10:14:57 AM PST
>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
>> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
>>
>> ESTAB       0            222208                   127.0.0.1:57224                  127.0.0.1:4004         timer:(persist,1min11sec,9)
>>           skmem:(r0,rb2000000,t0,tb2000000,f2048,w227328,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483
>> advmss:65483 cwnd:10 bytes_sent:36941001 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20136 segs_in:17497 data_segs_out:11971
>> data_segs_in:16642 send 6049237875bps lastsnd:2663 lastrcv:136933 lastack:136944 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
>> app_limited busy:287561ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222208 minrtt:0.013
>> ESTAB       0            0                   192.168.200.34:4004              192.168.200.34:16906
>>           skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483 advmss:65483
>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
>> data_segs_in:2330 send 1823271222bps lastsnd:136934 lastrcv:136931 lastack:136932 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
>> ESTAB       0            139656              192.168.200.34:16908             192.168.200.34:4004         timer:(persist,1min40sec,2)
>>           skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535
>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:19146 lastrcv:136931 lastack:136934 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
>> busy:282917ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
>> ESTAB       0            460                 192.168.200.34:4004              192.168.200.34:16908        timer:(on,1min11sec,9)
>>           skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:31434 lastrcv:169681 lastack:136931 pacing_rate 854817384bps delivery_rate 115645432bps
>> delivered:2355 busy:212674ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
>> ESTAB       0            147205              192.168.200.34:16906             192.168.200.34:4004         timer:(persist,1min35sec,9)
>>           skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016
>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
>> data_segs_in:2371 send 459529825bps lastsnd:19146 lastrcv:136934 lastack:136931 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331 app_limited
>> busy:196996ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
>> ESTAB       0            3928980                  127.0.0.1:4004                   127.0.0.1:57224        timer:(persist,1min57sec,9)
>>           skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:9 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312488262 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18945
>> segs_in:20021 data_segs_out:18091 data_segs_in:11862 send 394446201bps lastsnd:2762 lastrcv:136952 lastack:136933 pacing_rate 709983112bps delivery_rate
>> 104772800000bps delivered:16643 app_limited busy:382149ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
>> notsent:3928980 minrtt:0.003
>> [root@ct523c-0bdd ~]#
>>
>>
>> We can reproduce this readily at current, and I'm happy to try patches and/or do more debugging.  We also tried with a 5.12 kernel,
>> and saw same problems, but in all cases, we have local patches applied, and there is no way for us to do this test without
>> at least a fair bit of local patches applied.
> 
> Thanks for the ss traces and tcpdump output! The tcpdump traces are
> nice, in that they start before the connection starts, so capture the
> SYN and its critical options like wscale.
> 
>  From the "timer:(persist" in the ss output, it seems the stalls (that
> are preventing the send buffers from being transmitted) are caused by
> a 0-byte receive window causing the senders to stop sending, and
> periodically fire the ICSK_TIME_PROBE0 timer to check for an open
> receive window. From "backoff:9" it seems this condition has lasted
> for a very long exponential backoff process.
> 
> I don't see 0-byte receive window problems in the trace, but this is
> probably because the tcpdump traces only last through 10:12:47 PST,
> and the problem is showing up in ss at 10:14:49 AM PST and later.
> 
> Is it possible to reproduce the problem again, and this time let the
> tcpdump traces run all the way through the period where the
> connections freeze and you grab the "ss" output?
> 
> You may also have to explicitly kill the tcpdump. Perhaps the tail of
> the trace was buffered in tcpdump's output buffer and not flushed to
> disk. A "killall tcpdump" should do the trick to force it to cleanly
> flush everything.

Here is another set of debugging; this time I made sure tcpdump ran for the entire time,
as did the ss monitoring script.

http://www.candelatech.com/downloads/ss_log.txt
http://www.candelatech.com/downloads/trace-lo-4004-b.pcap

In addition, here are logs from my tool with msec timestamps.  It detects the
communication failure and logs it.  Interestingly, I think it recovered
after one long timeout, but in the end it went past the 2-minute cutoff mark
where my program will close the TCP connection and restart things.

1641506767983:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 34458ms, sending req for update, read-isset: 0
1641506773839:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 40314ms, sending req for update, read-isset: 0
1641506780563:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 47038ms, sending req for update, read-isset: 0
1641506786567:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 53041ms, sending req for update, read-isset: 0
1641506823537:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 34949ms, sending req for update, read-isset: 0
1641506829280:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 40692ms, sending req for update, read-isset: 0
1641506834878:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 46289ms, sending req for update, read-isset: 0
1641506840778:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 52189ms, sending req for update, read-isset: 0
1641506846786:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 58198ms, sending req for update, read-isset: 0
1641506852746:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 64158ms, sending req for update, read-isset: 0
1641506858280:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 69692ms, sending req for update, read-isset: 0
1641506864200:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 75612ms, sending req for update, read-isset: 0
1641506870556:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 81968ms, sending req for update, read-isset: 0
1641506876564:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 87976ms, sending req for update, read-isset: 0
1641506882774:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 94185ms, sending req for update, read-isset: 0

# Recovered between here and above it seems.

1641507005029:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 35840ms, sending req for update, read-isset: 0
1641507035759:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 30164ms, sending req for update, read-isset: 0
1641507042161:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 36565ms, sending req for update, read-isset: 0
1641507048397:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 42802ms, sending req for update, read-isset: 0
1641507054491:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 48896ms, sending req for update, read-isset: 0
1641507060748:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 55153ms, sending req for update, read-isset: 0
1641507067083:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 61488ms, sending req for update, read-isset: 0
1641507073438:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 67842ms, sending req for update, read-isset: 0
1641507079638:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 74042ms, sending req for update, read-isset: 0
1641507085926:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 80330ms, sending req for update, read-isset: 0
1641507091788:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 86192ms, sending req for update, read-isset: 0
1641507098042:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 92447ms, sending req for update, read-isset: 0
1641507104283:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 98687ms, sending req for update, read-isset: 0
1641507110466:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 104871ms, sending req for update, read-isset: 0
1641507116381:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 110786ms, sending req for update, read-isset: 0
1641507123034:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 117439ms, sending req for update, read-isset: 0
1641507128975:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 123379ms, sending req for update, read-isset: 0

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Debugging stuck tcp connection across localhost
  2022-01-06 22:26           ` Ben Greear
@ 2022-01-10 18:10             ` Ben Greear
  2022-01-10 22:16               ` David Laight
  2022-01-11 10:46               ` Eric Dumazet
  0 siblings, 2 replies; 19+ messages in thread
From: Ben Greear @ 2022-01-10 18:10 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: netdev

On 1/6/22 2:26 PM, Ben Greear wrote:
> On 1/6/22 12:04 PM, Neal Cardwell wrote:
>> On Thu, Jan 6, 2022 at 2:05 PM Ben Greear <greearb@candelatech.com> wrote:
>>>
>>> On 1/6/22 8:16 AM, Neal Cardwell wrote:
>>>> On Thu, Jan 6, 2022 at 10:39 AM Ben Greear <greearb@candelatech.com> wrote:
>>>>>
>>>>> On 1/6/22 7:20 AM, Neal Cardwell wrote:
>>>>>> On Thu, Jan 6, 2022 at 10:06 AM Ben Greear <greearb@candelatech.com> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I'm working on a strange problem, and could use some help if anyone has ideas.
>>>>>>>
>>>>>>> On a heavily loaded system (500+ wifi station devices, VRF device per 'real' netdev,
>>>>>>> traffic generation on the netdevs, etc), I see cases where two processes trying
>>>>>>> to communicate across localhost with TCP seem to get a stuck network
>>>>>>> connection:
>>>>>>>
>>>>>>> [greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep 127.0.0.1
>>>>>>> tcp        0 7988926 127.0.0.1:4004          127.0.0.1:23184         ESTABLISHED
>>>>>>> tcp        0  59805 127.0.0.1:23184         127.0.0.1:4004          ESTABLISHED
>>>>>>>
>>>>>>> Both processes in question continue to execute, and as far as I can tell, they are properly
>>>>>>> attempting to read/write the socket, but they are reading/writing 0 bytes (these sockets
>>>>>>> are non blocking).  If one was stuck not reading, I would expect netstat
>>>>>>> to show bytes in the rcv buffer, but it is zero as you can see above.
>>>>>>>
>>>>>>> Kernel is 5.15.7+ local hacks.  I can only reproduce this in a big messy complicated
>>>>>>> test case, with my local ath10k-ct and other patches that enable virtual wifi stations,
>>>>>>> but my code can grab logs at time it sees the problem.  Is there anything
>>>>>>> more I can do to figure out why the TCP connection appears to be stuck?
>>>>>>
>>>>>> It could be very useful to get more information about the state of all
>>>>>> the stuck connections (sender and receiver side) with something like:
>>>>>>
>>>>>>      ss -tinmo 'sport = :4004 or sport = :4004'
>>>>>>
>>>>>> I would recommend downloading and building a recent version of the
>>>>>> 'ss' tool to maximize the information. Here is a recipe for doing
>>>>>> that:
>>>>>>
>>>>>>     https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-monitor-linux-tcp-bbr-connections
>>>
>>> Hello Neal,
>>>
>>> Here is the ss output from when the problem was happening.  I think you can ignore the non-127.0.0.1
>>> connections, but I left them in just in case it is somehow helpful.
>>>
>>> In addition, the pcap capture file is uploaded here:
>>>
>>> http://www.candelatech.com/downloads/trace-lo-4004.pcap
>>>
>>> The problem was happening in this time frame:
>>>
>>> [root@ct523c-0bdd ~]# date
>>> Thu 06 Jan 2022 10:14:49 AM PST
>>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
>>> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
>>>
>>> ESTAB       0            222024                   127.0.0.1:57224                  127.0.0.1:4004         timer:(persist,1min23sec,9)
>>>           skmem:(r0,rb2000000,t0,tb2000000,f2232,w227144,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 
>>> rcvmss:65483
>>> advmss:65483 cwnd:10 bytes_sent:36810035 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20134 segs_in:17497 data_segs_out:11969
>>> data_segs_in:16642 send 6049237875bps lastsnd:3266 lastrcv:125252 lastack:125263 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
>>> app_limited busy:275880ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222024 minrtt:0.013
>>> ESTAB       0            0                   192.168.200.34:4004              192.168.200.34:16906
>>>           skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483 
>>> advmss:65483
>>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
>>> data_segs_in:2330 send 1823271222bps lastsnd:125253 lastrcv:125250 lastack:125251 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
>>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
>>> ESTAB       0            139656              192.168.200.34:16908             192.168.200.34:4004         timer:(persist,1min52sec,2)
>>>           skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 
>>> pmtu:65535
>>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
>>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:7465 lastrcv:125250 lastack:125253 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
>>> busy:271236ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
>>> ESTAB       0            460                 192.168.200.34:4004              192.168.200.34:16908        timer:(on,1min23sec,9)
>>>           skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
>>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
>>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:19753 lastrcv:158000 lastack:125250 pacing_rate 854817384bps delivery_rate 115645432bps
>>> delivered:2355 busy:200993ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
>>> ESTAB       0            147205              192.168.200.34:16906             192.168.200.34:4004         timer:(persist,1min46sec,9)
>>>           skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 
>>> rcvmss:22016
>>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
>>> data_segs_in:2371 send 459529825bps lastsnd:7465 lastrcv:125253 lastack:125250 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331 app_limited
>>> busy:185315ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
>>> ESTAB       0            3928980                  127.0.0.1:4004                   127.0.0.1:57224        timer:(persist,7.639ms,8)
>>>           skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:8 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
>>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312422779 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18944
>>> segs_in:20021 data_segs_out:18090 data_segs_in:11862 send 394446201bps lastsnd:56617 lastrcv:125271 lastack:125252 pacing_rate 709983112bps delivery_rate
>>> 104772800000bps delivered:16643 app_limited busy:370468ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
>>> notsent:3928980 minrtt:0.003
>>> [root@ct523c-0bdd ~]# date
>>> Thu 06 Jan 2022 10:14:57 AM PST
>>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
>>> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
>>>
>>> ESTAB       0            222208                   127.0.0.1:57224                  127.0.0.1:4004         timer:(persist,1min11sec,9)
>>>           skmem:(r0,rb2000000,t0,tb2000000,f2048,w227328,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 
>>> rcvmss:65483
>>> advmss:65483 cwnd:10 bytes_sent:36941001 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20136 segs_in:17497 data_segs_out:11971
>>> data_segs_in:16642 send 6049237875bps lastsnd:2663 lastrcv:136933 lastack:136944 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
>>> app_limited busy:287561ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222208 minrtt:0.013
>>> ESTAB       0            0                   192.168.200.34:4004              192.168.200.34:16906
>>>           skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483 
>>> advmss:65483
>>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
>>> data_segs_in:2330 send 1823271222bps lastsnd:136934 lastrcv:136931 lastack:136932 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
>>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
>>> ESTAB       0            139656              192.168.200.34:16908             192.168.200.34:4004         timer:(persist,1min40sec,2)
>>>           skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 
>>> pmtu:65535
>>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
>>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:19146 lastrcv:136931 lastack:136934 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
>>> busy:282917ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
>>> ESTAB       0            460                 192.168.200.34:4004              192.168.200.34:16908        timer:(on,1min11sec,9)
>>>           skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
>>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
>>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:31434 lastrcv:169681 lastack:136931 pacing_rate 854817384bps delivery_rate 115645432bps
>>> delivered:2355 busy:212674ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
>>> ESTAB       0            147205              192.168.200.34:16906             192.168.200.34:4004         timer:(persist,1min35sec,9)
>>>           skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 
>>> rcvmss:22016
>>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
>>> data_segs_in:2371 send 459529825bps lastsnd:19146 lastrcv:136934 lastack:136931 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331 app_limited
>>> busy:196996ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
>>> ESTAB       0            3928980                  127.0.0.1:4004                   127.0.0.1:57224        timer:(persist,1min57sec,9)
>>>           skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:9 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
>>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312488262 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18945
>>> segs_in:20021 data_segs_out:18091 data_segs_in:11862 send 394446201bps lastsnd:2762 lastrcv:136952 lastack:136933 pacing_rate 709983112bps delivery_rate
>>> 104772800000bps delivered:16643 app_limited busy:382149ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
>>> notsent:3928980 minrtt:0.003
>>> [root@ct523c-0bdd ~]#
>>>
>>>
>>> We can reproduce this readily at current, and I'm happy to try patches and/or do more debugging.  We also tried with a 5.12 kernel,
>>> and saw same problems, but in all cases, we have local patches applied, and there is no way for us to do this test without
>>> at least a fair bit of local patches applied.
>>
>> Thanks for the ss traces and tcpdump output! The tcpdump traces are
>> nice, in that they start before the connection starts, so capture the
>> SYN and its critical options like wscale.
>>
>>  From the "timer:(persist" in the ss output, it seems the stalls (that
>> are preventing the send buffers from being transmitted) are caused by
>> a 0-byte receive window causing the senders to stop sending, and
>> periodically fire the ICSK_TIME_PROBE0 timer to check for an open
>> receive window. From "backoff:9" it seems this condition has lasted
>> for a very long exponential backoff process.
>>
>> I don't see 0-byte receive window problems in the trace, but this is
>> probably because the tcpdump traces only last through 10:12:47 PST,
>> and the problem is showing up in ss at 10:14:49 AM PST and later.
>>
>> Is it possible to reproduce the problem again, and this time let the
>> tcpdump traces run all the way through the period where the
>> connections freeze and you grab the "ss" output?
>>
>> You may also have to explicitly kill the tcpdump. Perhaps the tail of
>> the trace was buffered in tcpdump's output buffer and not flushed to
>> disk. A "killall tcpdump" should do the trick to force it to cleanly
>> flush everything.
> 
> Here is another set of debugging, I made sure tcpdump ran the entire time,
> as well as the ss monitoring script.
> 
> http://www.candelatech.com/downloads/ss_log.txt
> http://www.candelatech.com/downloads/trace-lo-4004-b.pcap
> 
> In addition, here are logs from my tool with msec timestamps.  It is detecting
> communication failure and logging about it.  Interestingly, I think it recovered
> after one long timeout, but in the end, it went past the 2-minute cutoff mark
> where my program will close the TCP connection and restart things.
> 
> 1641506767983:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 34458ms, sending req for update, read-isset: 0
> 1641506773839:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 40314ms, sending req for update, read-isset: 0
> 1641506780563:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 47038ms, sending req for update, read-isset: 0
> 1641506786567:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 53041ms, sending req for update, read-isset: 0
> 1641506823537:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 34949ms, sending req for update, read-isset: 0
> 1641506829280:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 40692ms, sending req for update, read-isset: 0
> 1641506834878:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 46289ms, sending req for update, read-isset: 0
> 1641506840778:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 52189ms, sending req for update, read-isset: 0
> 1641506846786:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 58198ms, sending req for update, read-isset: 0
> 1641506852746:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 64158ms, sending req for update, read-isset: 0
> 1641506858280:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 69692ms, sending req for update, read-isset: 0
> 1641506864200:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 75612ms, sending req for update, read-isset: 0
> 1641506870556:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 81968ms, sending req for update, read-isset: 0
> 1641506876564:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 87976ms, sending req for update, read-isset: 0
> 1641506882774:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 94185ms, sending req for update, read-isset: 0
> 
> # Recovered between here and above it seems.
> 
> 1641507005029:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 35840ms, sending req for update, read-isset: 0
> 1641507035759:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 30164ms, sending req for update, read-isset: 0
> 1641507042161:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 36565ms, sending req for update, read-isset: 0
> 1641507048397:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 42802ms, sending req for update, read-isset: 0
> 1641507054491:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 48896ms, sending req for update, read-isset: 0
> 1641507060748:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 55153ms, sending req for update, read-isset: 0
> 1641507067083:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 61488ms, sending req for update, read-isset: 0
> 1641507073438:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 67842ms, sending req for update, read-isset: 0
> 1641507079638:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 74042ms, sending req for update, read-isset: 0
> 1641507085926:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 80330ms, sending req for update, read-isset: 0
> 1641507091788:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 86192ms, sending req for update, read-isset: 0
> 1641507098042:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 92447ms, sending req for update, read-isset: 0
> 1641507104283:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 98687ms, sending req for update, read-isset: 0
> 1641507110466:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 104871ms, sending req for update, read-isset: 0
> 1641507116381:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 110786ms, sending req for update, read-isset: 0
> 1641507123034:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 117439ms, sending req for update, read-isset: 0
> 1641507128975:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 123379ms, sending req for update, read-isset: 0

Hello Neal,

Do the new captures help any?

 From my own looking at things, it seems the sniffer stops capturing frames around the time the problem
starts happening.  I am baffled as to how that can happen, especially since it seems to stop getting
packets from multiple different TCP connections (the sniffer filter would also pick up some other
loopback-related connections to the same IP port).

And, if I interpret the ss output properly, after the problem happens, the sockets still think they are
sending data.  I didn't check closely enough to see if the peer side thought it received it.
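
One cheap cross-check I can run on the next attempt (a sketch; the counter names assume a
reasonably recent kernel and iproute2) is to compare the receive-side drop/prune counters
before and during a stall:

  # Anything incrementing here would suggest receive-buffer pressure or dropped
  # segments on the receiving side of the stuck connections.
  nstat -az | grep -Ei 'ZeroWindowDrop|RcvQDrop|PruneCalled|BacklogDrop|RcvCollapsed'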

We are going to try to reproduce w/out wifi, but not sure we'll have any luck with that.
We did test w/out VRF (using lots of ip rules instead), and similar problem was seen according to my
test team (I did not debug it in detail).

Do you have any suggestions for how to debug this further?  I am happy to hack stuff into the
kernel if you have some suggested places to add debugging...
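
One thing I can also try without patching anything, assuming the usual tracefs mount point
and that these tracepoints exist on this kernel, is to enable the built-in TCP/sock
tracepoints while the test runs:

  cd /sys/kernel/tracing
  echo 1 > events/tcp/tcp_probe/enable              # snd_wnd/rcv_wnd per received segment
  echo 1 > events/tcp/tcp_retransmit_skb/enable
  echo 1 > events/sock/sock_rcvqueue_full/enable
  echo 1 > events/sock/sock_exceed_buf_limit/enable
  cat trace_pipe > tcp_events.txt &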

Thanks,
Ben


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: Debugging stuck tcp connection across localhost
  2022-01-10 18:10             ` Ben Greear
@ 2022-01-10 22:16               ` David Laight
  2022-01-11 10:46               ` Eric Dumazet
  1 sibling, 0 replies; 19+ messages in thread
From: David Laight @ 2022-01-10 22:16 UTC (permalink / raw)
  To: 'Ben Greear', Neal Cardwell; +Cc: netdev

From: Ben Greear <greearb@candelatech.com>
> Sent: 10 January 2022 18:10
...
>  From my own looking at things, it seems the sniffer stops capturing frames around the time the problem
> starts happening.  I am baffled as to how that can happen, especially since it seems to stop getting
> packets from multiple different TCP connections (the sniffer filter would also pick up some other
> loopback-related connections to the same IP port).
> 
> And, if I interpret the ss output properly, after the problem happens, the sockets still think they are
> sending data.  I didn't check closely enough to see if the peer side thought it received it.
> 
> We are going to try to reproduce w/out wifi, but not sure we'll have any luck with that.
> We did test w/out VRF (using lots of ip rules instead), and similar problem was seen according to my
> test team (I did not debug it in detail).
> 
> Do you have any suggestions for how to debug this further?  I am happy to hack stuff into the
> kernel if you have some suggested places to add debugging...

Sounds like all transmit traffic on the loopback interface is being discarded
before the point where the frames get fed to tcpdump.

Possibly you could use ftrace to trace function entry+exit of a few
functions that happen in the transmit path and then isolate the point
where the discard is happening.
You can't afford to trace everything - it slows things down too much.
But a few traces on each send path should be ok.
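
For concreteness, a minimal sketch (assuming the standard tracefs mount and that these
symbols show up in available_filter_functions on that kernel; the function list is only
illustrative):

  cd /sys/kernel/tracing
  echo 0 > tracing_on
  echo function_graph > current_tracer
  # Trace entry+exit of a couple of functions on the TCP/loopback transmit path.
  echo tcp_write_xmit  > set_graph_function
  echo loopback_xmit  >> set_graph_function
  echo 1 > tracing_on
  cat trace_pipe > xmit_trace.txt &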

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Debugging stuck tcp connection across localhost
  2022-01-10 18:10             ` Ben Greear
  2022-01-10 22:16               ` David Laight
@ 2022-01-11 10:46               ` Eric Dumazet
  2022-01-11 21:35                 ` Ben Greear
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2022-01-11 10:46 UTC (permalink / raw)
  To: Ben Greear, Neal Cardwell, edumazet; +Cc: netdev


On 1/10/22 10:10, Ben Greear wrote:
> On 1/6/22 2:26 PM, Ben Greear wrote:
>> On 1/6/22 12:04 PM, Neal Cardwell wrote:
>>> On Thu, Jan 6, 2022 at 2:05 PM Ben Greear <greearb@candelatech.com> 
>>> wrote:
>>>>
>>>> On 1/6/22 8:16 AM, Neal Cardwell wrote:
>>>>> On Thu, Jan 6, 2022 at 10:39 AM Ben Greear 
>>>>> <greearb@candelatech.com> wrote:
>>>>>>
>>>>>> On 1/6/22 7:20 AM, Neal Cardwell wrote:
>>>>>>> On Thu, Jan 6, 2022 at 10:06 AM Ben Greear 
>>>>>>> <greearb@candelatech.com> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I'm working on a strange problem, and could use some help if 
>>>>>>>> anyone has ideas.
>>>>>>>>
>>>>>>>> On a heavily loaded system (500+ wifi station devices, VRF 
>>>>>>>> device per 'real' netdev,
>>>>>>>> traffic generation on the netdevs, etc), I see cases where two 
>>>>>>>> processes trying
>>>>>>>> to communicate across localhost with TCP seem to get a stuck 
>>>>>>>> network
>>>>>>>> connection:
>>>>>>>>
>>>>>>>> [greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep 127.0.0.1
>>>>>>>> tcp        0 7988926 127.0.0.1:4004 127.0.0.1:23184         
>>>>>>>> ESTABLISHED
>>>>>>>> tcp        0  59805 127.0.0.1:23184 127.0.0.1:4004          
>>>>>>>> ESTABLISHED
>>>>>>>>
>>>>>>>> Both processes in question continue to execute, and as far as I 
>>>>>>>> can tell, they are properly
>>>>>>>> attempting to read/write the socket, but they are 
>>>>>>>> reading/writing 0 bytes (these sockets
>>>>>>>> are non blocking).  If one was stuck not reading, I would 
>>>>>>>> expect netstat
>>>>>>>> to show bytes in the rcv buffer, but it is zero as you can see 
>>>>>>>> above.
>>>>>>>>
>>>>>>>> Kernel is 5.15.7+ local hacks.  I can only reproduce this in a 
>>>>>>>> big messy complicated
>>>>>>>> test case, with my local ath10k-ct and other patches that 
>>>>>>>> enable virtual wifi stations,
>>>>>>>> but my code can grab logs at time it sees the problem.  Is 
>>>>>>>> there anything
>>>>>>>> more I can do to figure out why the TCP connection appears to 
>>>>>>>> be stuck?
>>>>>>>
>>>>>>> It could be very useful to get more information about the state 
>>>>>>> of all
>>>>>>> the stuck connections (sender and receiver side) with something 
>>>>>>> like:
>>>>>>>
>>>>>>>      ss -tinmo 'sport = :4004 or sport = :4004'
>>>>>>>
>>>>>>> I would recommend downloading and building a recent version of the
>>>>>>> 'ss' tool to maximize the information. Here is a recipe for doing
>>>>>>> that:
>>>>>>>
>>>>>>> https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-monitor-linux-tcp-bbr-connections
>>>>
>>>> Hello Neal,
>>>>
>>>> Here is the ss output from when the problem was happening. I think 
>>>> you can ignore the non-127.0.0.1
>>>> connections, but I left them in just in case it is somehow helpful.
>>>>
>>>> In addition, the pcap capture file is uploaded here:
>>>>
>>>> http://www.candelatech.com/downloads/trace-lo-4004.pcap
>>>>
>>>> The problem was happening in this time frame:
>>>>
>>>> [root@ct523c-0bdd ~]# date
>>>> Thu 06 Jan 2022 10:14:49 AM PST
>>>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
>>>> State       Recv-Q       Send-Q               Local 
>>>> Address:Port                Peer Address:Port
>>>>
>>>> ESTAB       0            222024 127.0.0.1:57224                  
>>>> 127.0.0.1:4004 timer:(persist,1min23sec,9)
>>>> skmem:(r0,rb2000000,t0,tb2000000,f2232,w227144,o0,bl0,d0) ts sack 
>>>> reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 
>>>> pmtu:65535 rcvmss:65483
>>>> advmss:65483 cwnd:10 bytes_sent:36810035 bytes_retrans:22025 
>>>> bytes_acked:31729223 bytes_received:228063971 segs_out:20134 
>>>> segs_in:17497 data_segs_out:11969
>>>> data_segs_in:16642 send 6049237875bps lastsnd:3266 lastrcv:125252 
>>>> lastack:125263 pacing_rate 12093239064bps delivery_rate 
>>>> 130966000000bps delivered:11863
>>>> app_limited busy:275880ms rwnd_limited:21ms(0.0%) retrans:0/2 
>>>> dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 
>>>> notsent:222024 minrtt:0.013
>>>> ESTAB       0            0 192.168.200.34:4004              
>>>> 192.168.200.34:16906
>>>>           skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts 
>>>> sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 
>>>> pmtu:65535 rcvmss:65483 advmss:65483
>>>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 
>>>> bytes_acked:8174668 bytes_received:20820708 segs_out:3635 
>>>> segs_in:2491 data_segs_out:2377
>>>> data_segs_in:2330 send 1823271222bps lastsnd:125253 lastrcv:125250 
>>>> lastack:125251 pacing_rate 2185097952bps delivery_rate 
>>>> 70451200000bps delivered:2372
>>>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 
>>>> rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
>>>> ESTAB       0            139656 192.168.200.34:16908             
>>>> 192.168.200.34:4004 timer:(persist,1min52sec,2)
>>>> skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack 
>>>> reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 
>>>> mss:65483 pmtu:65535
>>>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 
>>>> bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 
>>>> segs_out:2585 segs_in:3642
>>>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:7465 
>>>> lastrcv:125250 lastack:125253 pacing_rate 2504952bps delivery_rate 
>>>> 15992bps delivered:2357
>>>> busy:271236ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 
>>>> rcv_ssthresh:43690 notsent:139656 minrtt:0.004
>>>> ESTAB       0            460 192.168.200.34:4004              
>>>> 192.168.200.34:16908 timer:(on,1min23sec,9)
>>>> skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack 
>>>> reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 
>>>> mss:22016 pmtu:65535
>>>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 
>>>> bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 
>>>> segs_out:3672 segs_in:2504
>>>> data_segs_out:2380 data_segs_in:2356 send 237689609bps 
>>>> lastsnd:19753 lastrcv:158000 lastack:125250 pacing_rate 
>>>> 854817384bps delivery_rate 115645432bps
>>>> delivered:2355 busy:200993ms unacked:1 retrans:0/24 lost:1 
>>>> rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
>>>> ESTAB       0            147205 192.168.200.34:16906             
>>>> 192.168.200.34:4004 timer:(persist,1min46sec,9)
>>>> skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack 
>>>> reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 
>>>> mss:65483 pmtu:65535 rcvmss:22016
>>>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 
>>>> bytes_acked:20820709 bytes_received:8174668 segs_out:2570 
>>>> segs_in:3625 data_segs_out:2409
>>>> data_segs_in:2371 send 459529825bps lastsnd:7465 lastrcv:125253 
>>>> lastack:125250 pacing_rate 918999184bps delivery_rate 
>>>> 43655333328bps delivered:2331 app_limited
>>>> busy:185315ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 
>>>> rcv_ssthresh:43690 notsent:147205 minrtt:0.003
>>>> ESTAB       0            3928980 127.0.0.1:4004                   
>>>> 127.0.0.1:57224 timer:(persist,7.639ms,8)
>>>> skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack 
>>>> reno wscale:4,10 rto:251 backoff:8 rtt:13.281/25.84 ato:40 
>>>> mss:65483 pmtu:65535
>>>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312422779 
>>>> bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 
>>>> segs_out:18944
>>>> segs_in:20021 data_segs_out:18090 data_segs_in:11862 send 
>>>> 394446201bps lastsnd:56617 lastrcv:125271 lastack:125252 
>>>> pacing_rate 709983112bps delivery_rate
>>>> 104772800000bps delivered:16643 app_limited busy:370468ms 
>>>> rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 
>>>> rcv_space:2279928 rcv_ssthresh:24999268
>>>> notsent:3928980 minrtt:0.003
>>>> [root@ct523c-0bdd ~]# date
>>>> Thu 06 Jan 2022 10:14:57 AM PST
>>>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
>>>> State       Recv-Q       Send-Q               Local 
>>>> Address:Port                Peer Address:Port
>>>>
>>>> ESTAB       0            222208 127.0.0.1:57224                  
>>>> 127.0.0.1:4004 timer:(persist,1min11sec,9)
>>>> skmem:(r0,rb2000000,t0,tb2000000,f2048,w227328,o0,bl0,d0) ts sack 
>>>> reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 
>>>> pmtu:65535 rcvmss:65483
>>>> advmss:65483 cwnd:10 bytes_sent:36941001 bytes_retrans:22025 
>>>> bytes_acked:31729223 bytes_received:228063971 segs_out:20136 
>>>> segs_in:17497 data_segs_out:11971
>>>> data_segs_in:16642 send 6049237875bps lastsnd:2663 lastrcv:136933 
>>>> lastack:136944 pacing_rate 12093239064bps delivery_rate 
>>>> 130966000000bps delivered:11863
>>>> app_limited busy:287561ms rwnd_limited:21ms(0.0%) retrans:0/2 
>>>> dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 
>>>> notsent:222208 minrtt:0.013
>>>> ESTAB       0            0 192.168.200.34:4004              
>>>> 192.168.200.34:16906
>>>>           skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts 
>>>> sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 
>>>> pmtu:65535 rcvmss:65483 advmss:65483
>>>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 
>>>> bytes_acked:8174668 bytes_received:20820708 segs_out:3635 
>>>> segs_in:2491 data_segs_out:2377
>>>> data_segs_in:2330 send 1823271222bps lastsnd:136934 lastrcv:136931 
>>>> lastack:136932 pacing_rate 2185097952bps delivery_rate 
>>>> 70451200000bps delivered:2372
>>>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 
>>>> rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
>>>> ESTAB       0            139656 192.168.200.34:16908             
>>>> 192.168.200.34:4004 timer:(persist,1min40sec,2)
>>>> skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack 
>>>> reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 
>>>> mss:65483 pmtu:65535
>>>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 
>>>> bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 
>>>> segs_out:2585 segs_in:3642
>>>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:19146 
>>>> lastrcv:136931 lastack:136934 pacing_rate 2504952bps delivery_rate 
>>>> 15992bps delivered:2357
>>>> busy:282917ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 
>>>> rcv_ssthresh:43690 notsent:139656 minrtt:0.004
>>>> ESTAB       0            460 192.168.200.34:4004              
>>>> 192.168.200.34:16908 timer:(on,1min11sec,9)
>>>> skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack 
>>>> reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 
>>>> mss:22016 pmtu:65535
>>>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 
>>>> bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 
>>>> segs_out:3672 segs_in:2504
>>>> data_segs_out:2380 data_segs_in:2356 send 237689609bps 
>>>> lastsnd:31434 lastrcv:169681 lastack:136931 pacing_rate 
>>>> 854817384bps delivery_rate 115645432bps
>>>> delivered:2355 busy:212674ms unacked:1 retrans:0/24 lost:1 
>>>> rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
>>>> ESTAB       0            147205 192.168.200.34:16906             
>>>> 192.168.200.34:4004 timer:(persist,1min35sec,9)
>>>> skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack 
>>>> reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 
>>>> mss:65483 pmtu:65535 rcvmss:22016
>>>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 
>>>> bytes_acked:20820709 bytes_received:8174668 segs_out:2570 
>>>> segs_in:3625 data_segs_out:2409
>>>> data_segs_in:2371 send 459529825bps lastsnd:19146 lastrcv:136934 
>>>> lastack:136931 pacing_rate 918999184bps delivery_rate 
>>>> 43655333328bps delivered:2331 app_limited
>>>> busy:196996ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 
>>>> rcv_ssthresh:43690 notsent:147205 minrtt:0.003
>>>> ESTAB       0            3928980 127.0.0.1:4004                   
>>>> 127.0.0.1:57224 timer:(persist,1min57sec,9)
>>>> skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack 
>>>> reno wscale:4,10 rto:251 backoff:9 rtt:13.281/25.84 ato:40 
>>>> mss:65483 pmtu:65535
>>>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312488262 
>>>> bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 
>>>> segs_out:18945
>>>> segs_in:20021 data_segs_out:18091 data_segs_in:11862 send 
>>>> 394446201bps lastsnd:2762 lastrcv:136952 lastack:136933 pacing_rate 
>>>> 709983112bps delivery_rate
>>>> 104772800000bps delivered:16643 app_limited busy:382149ms 
>>>> rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 
>>>> rcv_space:2279928 rcv_ssthresh:24999268
>>>> notsent:3928980 minrtt:0.003
>>>> [root@ct523c-0bdd ~]#
>>>>
>>>>
>>>> We can reproduce this readily at current, and I'm happy to try 
>>>> patches and/or do more debugging.  We also tried with a 5.12 kernel,
>>>> and saw same problems, but in all cases, we have local patches 
>>>> applied, and there is no way for us to do this test without
>>>> at least a fair bit of local patches applied.
>>>
>>> Thanks for the ss traces and tcpdump output! The tcpdump traces are
>>> nice, in that they start before the connection starts, so capture the
>>> SYN and its critical options like wscale.
>>>
>>>  From the "timer:(persist" in the ss output, it seems the stalls (that
>>> are preventing the send buffers from being transmitted) are caused by
>>> a 0-byte receive window causing the senders to stop sending, and
>>> periodically fire the ICSK_TIME_PROBE0 timer to check for an open
>>> receive window. From "backoff:9" it seems this condition has lasted
>>> for a very long exponential backoff process.
>>>
>>> I don't see 0-byte receive window problems in the trace, but this is
>>> probably because the tcpdump traces only last through 10:12:47 PST,
>>> and the problem is showing up in ss at 10:14:49 AM PST and later.
>>>
>>> Is it possible to reproduce the problem again, and this time let the
>>> tcpdump traces run all the way through the period where the
>>> connections freeze and you grab the "ss" output?
>>>
>>> You may also have to explicitly kill the tcpdump. Perhaps the tail of
>>> the trace was buffered in tcpdump's output buffer and not flushed to
>>> disk. A "killall tcpdump" should do the trick to force it to cleanly
>>> flush everything.
>>
>> Here is another set of debugging, I made sure tcpdump ran the entire 
>> time,
>> as well as the ss monitoring script.
>>
>> http://www.candelatech.com/downloads/ss_log.txt
>> http://www.candelatech.com/downloads/trace-lo-4004-b.pcap
>>
>> In addition, here are logs from my tool with msec timestamps. It is 
>> detecting
>> communication failure and logging about it.  Interestingly, I think 
>> it recovered
>> after one long timeout, but in the end, it went past the 2-minute 
>> cutoff mark
>> where my program will close the TCP connection and restart things.
>>
>> 1641506767983:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 34458ms, sending req for update, 
>> read-isset: 0
>> 1641506773839:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 40314ms, sending req for update, 
>> read-isset: 0
>> 1641506780563:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 47038ms, sending req for update, 
>> read-isset: 0
>> 1641506786567:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 53041ms, sending req for update, 
>> read-isset: 0
>> 1641506823537:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 34949ms, sending req for update, 
>> read-isset: 0
>> 1641506829280:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 40692ms, sending req for update, 
>> read-isset: 0
>> 1641506834878:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 46289ms, sending req for update, 
>> read-isset: 0
>> 1641506840778:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 52189ms, sending req for update, 
>> read-isset: 0
>> 1641506846786:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 58198ms, sending req for update, 
>> read-isset: 0
>> 1641506852746:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 64158ms, sending req for update, 
>> read-isset: 0
>> 1641506858280:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 69692ms, sending req for update, 
>> read-isset: 0
>> 1641506864200:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 75612ms, sending req for update, 
>> read-isset: 0
>> 1641506870556:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 81968ms, sending req for update, 
>> read-isset: 0
>> 1641506876564:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 87976ms, sending req for update, 
>> read-isset: 0
>> 1641506882774:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 94185ms, sending req for update, 
>> read-isset: 0
>>
>> # Recovered between here and above it seems.
>>
>> 1641507005029:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 35840ms, sending req for update, 
>> read-isset: 0
>> 1641507035759:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 30164ms, sending req for update, 
>> read-isset: 0
>> 1641507042161:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 36565ms, sending req for update, 
>> read-isset: 0
>> 1641507048397:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 42802ms, sending req for update, 
>> read-isset: 0
>> 1641507054491:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 48896ms, sending req for update, 
>> read-isset: 0
>> 1641507060748:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 55153ms, sending req for update, 
>> read-isset: 0
>> 1641507067083:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 61488ms, sending req for update, 
>> read-isset: 0
>> 1641507073438:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 67842ms, sending req for update, 
>> read-isset: 0
>> 1641507079638:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 74042ms, sending req for update, 
>> read-isset: 0
>> 1641507085926:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 80330ms, sending req for update, 
>> read-isset: 0
>> 1641507091788:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 86192ms, sending req for update, 
>> read-isset: 0
>> 1641507098042:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 92447ms, sending req for update, 
>> read-isset: 0
>> 1641507104283:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 98687ms, sending req for update, 
>> read-isset: 0
>> 1641507110466:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 104871ms, sending req for update, 
>> read-isset: 0
>> 1641507116381:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 110786ms, sending req for update, 
>> read-isset: 0
>> 1641507123034:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 117439ms, sending req for update, 
>> read-isset: 0
>> 1641507128975:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has 
>> not received communication in: 123379ms, sending req for update, 
>> read-isset: 0
>
> Hello Neal,
>
> Do the new captures help any?
>
> From my own looking at things, it seems that the sniffer fails to get 
> frames near when the problem
> starts happening.  I am baffled as to how that can happen, especially 
> since it seems to stop getting
> packets from multiple different TCP connections (the sniffer filter 
> would pick up some other loop-back
> related connections to the same IP port).
>
> And, if I interpret the ss output properly, after the problem happens, 
> the sockets still think they are
> sending data.  I didn't check closely enough to see if the peer side 
> thought it received it.
>
> We are going to try to reproduce w/out wifi, but not sure we'll have 
> any luck with that.
> We did test w/out VRF (using lots of ip rules instead), and similar 
> problem was seen according to my
> test team (I did not debug it in detail).
>
> Do you have any suggestions for how to debug this further?  I am happy 
> to hack stuff into the
> kernel if you have some suggested places to add debugging...
>
> Thanks,
> Ben
>
These 2 packets (or lack of packets between them) are suspicious.

14:06:28.294557 IP 127.0.0.1.4004 > 127.0.0.1.57234: Flags [.], seq 
226598466:226663949, ack 31270804, win 24414, options [nop,nop,TS val 
3325283657 ecr 3325283657], length 65483

14:08:04.143548 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [P.], seq 
31270804:31336287, ack 226663949, win 58408, options [nop,nop,TS val 
3325379506 ecr 3325283657], length 65483


It looks like ACK for first packet is not sent, and packet is not 
retransmitted.

This could be a bug in conntrack (dropping rtx packets over loopback so 
pcap do not show the attempts),

or persistent memory allocation errors in __tcp_send_ack()


Or this could be that the missing ACK packet was dropped by tcpdump.

Another suspicious sequence is:

14:09:29.159816 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [P.], seq 
32608288:32609208, ack 265640849, win 58408, options [nop,nop,TS val 
3325464522 ecr 3325464521], length 920

14:10:05.487756 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [.], seq 
32609208:32674691, ack 265640849, win 58408, options [nop,nop,TS val 
3325500850 ecr 3325464521], length 65483

14:10:05.487762 IP 127.0.0.1.4004 > 127.0.0.1.57234: Flags [P.], seq 
265640849:265706332, ack 32674691, win 24382, options [nop,nop,TS val 
3325500850 ecr 3325500850], length 65483

Same pattern here, missing ACK for 1st packet, and no retransmits.
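
If you want to rule the __tcp_send_ack() allocation theory in or out, a ratelimited warning in its
failure branch would make persistent GFP_ATOMIC failures visible in dmesg. A rough sketch (the
helper below is hypothetical, not in any tree; it would be called from the !buff branch of
__tcp_send_ack() in net/ipv4/tcp_output.c):

/* Hypothetical helper, not in any tree: make repeated ACK skb allocation
 * failures visible without flooding the log.
 */
static void tcp_debug_ack_alloc_failed(const struct sock *sk)
{
	const struct inet_sock *inet = inet_sk(sk);

	pr_warn_ratelimited("TCP: ACK skb alloc failed %pI4:%u -> %pI4:%u\n",
			    &inet->inet_saddr, ntohs(inet->inet_sport),
			    &inet->inet_daddr, ntohs(inet->inet_dport));
}

As far as I can tell, the existing failure branch just re-arms the delayed-ACK timer, and the
allocation passes __GFP_NOWARN, so repeated failures there would be completely silent and would
look exactly like a missing ACK in the capture.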



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Debugging stuck tcp connection across localhost
  2022-01-11 10:46               ` Eric Dumazet
@ 2022-01-11 21:35                 ` Ben Greear
  2022-01-12  7:41                   ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Greear @ 2022-01-11 21:35 UTC (permalink / raw)
  To: Eric Dumazet, Neal Cardwell, edumazet; +Cc: netdev

On 1/11/22 2:46 AM, Eric Dumazet wrote:
> 
> On 1/10/22 10:10, Ben Greear wrote:
>> On 1/6/22 2:26 PM, Ben Greear wrote:
>>> On 1/6/22 12:04 PM, Neal Cardwell wrote:
>>>> On Thu, Jan 6, 2022 at 2:05 PM Ben Greear <greearb@candelatech.com> wrote:
>>>>>
>>>>> On 1/6/22 8:16 AM, Neal Cardwell wrote:
>>>>>> On Thu, Jan 6, 2022 at 10:39 AM Ben Greear <greearb@candelatech.com> wrote:
>>>>>>>
>>>>>>> On 1/6/22 7:20 AM, Neal Cardwell wrote:
>>>>>>>> On Thu, Jan 6, 2022 at 10:06 AM Ben Greear <greearb@candelatech.com> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I'm working on a strange problem, and could use some help if anyone has ideas.
>>>>>>>>>
>>>>>>>>> On a heavily loaded system (500+ wifi station devices, VRF device per 'real' netdev,
>>>>>>>>> traffic generation on the netdevs, etc), I see cases where two processes trying
>>>>>>>>> to communicate across localhost with TCP seem to get a stuck network
>>>>>>>>> connection:
>>>>>>>>>
>>>>>>>>> [greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep 127.0.0.1
>>>>>>>>> tcp        0 7988926 127.0.0.1:4004 127.0.0.1:23184 ESTABLISHED
>>>>>>>>> tcp        0  59805 127.0.0.1:23184 127.0.0.1:4004 ESTABLISHED
>>>>>>>>>
>>>>>>>>> Both processes in question continue to execute, and as far as I can tell, they are properly
>>>>>>>>> attempting to read/write the socket, but they are reading/writing 0 bytes (these sockets
>>>>>>>>> are non blocking).  If one was stuck not reading, I would expect netstat
>>>>>>>>> to show bytes in the rcv buffer, but it is zero as you can see above.
>>>>>>>>>
>>>>>>>>> Kernel is 5.15.7+ local hacks.  I can only reproduce this in a big messy complicated
>>>>>>>>> test case, with my local ath10k-ct and other patches that enable virtual wifi stations,
>>>>>>>>> but my code can grab logs at time it sees the problem.  Is there anything
>>>>>>>>> more I can do to figure out why the TCP connection appears to be stuck?
>>>>>>>>
>>>>>>>> It could be very useful to get more information about the state of all
>>>>>>>> the stuck connections (sender and receiver side) with something like:
>>>>>>>>
>>>>>>>>      ss -tinmo 'sport = :4004 or sport = :4004'
>>>>>>>>
>>>>>>>> I would recommend downloading and building a recent version of the
>>>>>>>> 'ss' tool to maximize the information. Here is a recipe for doing
>>>>>>>> that:
>>>>>>>>
>>>>>>>> https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-monitor-linux-tcp-bbr-connections
>>>>>
>>>>> Hello Neal,
>>>>>
>>>>> Here is the ss output from when the problem was happening. I think you can ignore the non-127.0.0.1
>>>>> connections, but I left them in just in case it is somehow helpful.
>>>>>
>>>>> In addition, the pcap capture file is uploaded here:
>>>>>
>>>>> http://www.candelatech.com/downloads/trace-lo-4004.pcap
>>>>>
>>>>> The problem was happening in this time frame:
>>>>>
>>>>> [root@ct523c-0bdd ~]# date
>>>>> Thu 06 Jan 2022 10:14:49 AM PST
>>>>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
>>>>> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
>>>>>
>>>>> ESTAB       0            222024 127.0.0.1:57224 127.0.0.1:4004 timer:(persist,1min23sec,9)
>>>>> skmem:(r0,rb2000000,t0,tb2000000,f2232,w227144,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483
>>>>> advmss:65483 cwnd:10 bytes_sent:36810035 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20134 segs_in:17497 data_segs_out:11969
>>>>> data_segs_in:16642 send 6049237875bps lastsnd:3266 lastrcv:125252 lastack:125263 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
>>>>> app_limited busy:275880ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222024 minrtt:0.013
>>>>> ESTAB       0            0 192.168.200.34:4004 192.168.200.34:16906
>>>>>           skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483 
>>>>> advmss:65483
>>>>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
>>>>> data_segs_in:2330 send 1823271222bps lastsnd:125253 lastrcv:125250 lastack:125251 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
>>>>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
>>>>> ESTAB       0            139656 192.168.200.34:16908 192.168.200.34:4004 timer:(persist,1min52sec,2)
>>>>> skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535
>>>>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
>>>>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:7465 lastrcv:125250 lastack:125253 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
>>>>> busy:271236ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
>>>>> ESTAB       0            460 192.168.200.34:4004 192.168.200.34:16908 timer:(on,1min23sec,9)
>>>>> skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
>>>>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
>>>>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:19753 lastrcv:158000 lastack:125250 pacing_rate 854817384bps delivery_rate 115645432bps
>>>>> delivered:2355 busy:200993ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
>>>>> ESTAB       0            147205 192.168.200.34:16906 192.168.200.34:4004 timer:(persist,1min46sec,9)
>>>>> skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016
>>>>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
>>>>> data_segs_in:2371 send 459529825bps lastsnd:7465 lastrcv:125253 lastack:125250 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331 
>>>>> app_limited
>>>>> busy:185315ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
>>>>> ESTAB       0            3928980 127.0.0.1:4004 127.0.0.1:57224 timer:(persist,7.639ms,8)
>>>>> skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:8 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
>>>>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312422779 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18944
>>>>> segs_in:20021 data_segs_out:18090 data_segs_in:11862 send 394446201bps lastsnd:56617 lastrcv:125271 lastack:125252 pacing_rate 709983112bps delivery_rate
>>>>> 104772800000bps delivered:16643 app_limited busy:370468ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
>>>>> notsent:3928980 minrtt:0.003
>>>>> [root@ct523c-0bdd ~]# date
>>>>> Thu 06 Jan 2022 10:14:57 AM PST
>>>>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
>>>>> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
>>>>>
>>>>> ESTAB       0            222208 127.0.0.1:57224 127.0.0.1:4004 timer:(persist,1min11sec,9)
>>>>> skmem:(r0,rb2000000,t0,tb2000000,f2048,w227328,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483
>>>>> advmss:65483 cwnd:10 bytes_sent:36941001 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20136 segs_in:17497 data_segs_out:11971
>>>>> data_segs_in:16642 send 6049237875bps lastsnd:2663 lastrcv:136933 lastack:136944 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
>>>>> app_limited busy:287561ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222208 minrtt:0.013
>>>>> ESTAB       0            0 192.168.200.34:4004 192.168.200.34:16906
>>>>>           skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483 
>>>>> advmss:65483
>>>>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
>>>>> data_segs_in:2330 send 1823271222bps lastsnd:136934 lastrcv:136931 lastack:136932 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
>>>>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
>>>>> ESTAB       0            139656 192.168.200.34:16908 192.168.200.34:4004 timer:(persist,1min40sec,2)
>>>>> skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535
>>>>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
>>>>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:19146 lastrcv:136931 lastack:136934 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
>>>>> busy:282917ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
>>>>> ESTAB       0            460 192.168.200.34:4004 192.168.200.34:16908 timer:(on,1min11sec,9)
>>>>> skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
>>>>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
>>>>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:31434 lastrcv:169681 lastack:136931 pacing_rate 854817384bps delivery_rate 115645432bps
>>>>> delivered:2355 busy:212674ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
>>>>> ESTAB       0            147205 192.168.200.34:16906 192.168.200.34:4004 timer:(persist,1min35sec,9)
>>>>> skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016
>>>>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
>>>>> data_segs_in:2371 send 459529825bps lastsnd:19146 lastrcv:136934 lastack:136931 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331 
>>>>> app_limited
>>>>> busy:196996ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
>>>>> ESTAB       0            3928980 127.0.0.1:4004 127.0.0.1:57224 timer:(persist,1min57sec,9)
>>>>> skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:9 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
>>>>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312488262 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18945
>>>>> segs_in:20021 data_segs_out:18091 data_segs_in:11862 send 394446201bps lastsnd:2762 lastrcv:136952 lastack:136933 pacing_rate 709983112bps delivery_rate
>>>>> 104772800000bps delivered:16643 app_limited busy:382149ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
>>>>> notsent:3928980 minrtt:0.003
>>>>> [root@ct523c-0bdd ~]#
>>>>>
>>>>>
>>>>> We can reproduce this readily at current, and I'm happy to try patches and/or do more debugging.  We also tried with a 5.12 kernel,
>>>>> and saw same problems, but in all cases, we have local patches applied, and there is no way for us to do this test without
>>>>> at least a fair bit of local patches applied.
>>>>
>>>> Thanks for the ss traces and tcpdump output! The tcpdump traces are
>>>> nice, in that they start before the connection starts, so capture the
>>>> SYN and its critical options like wscale.
>>>>
>>>>  From the "timer:(persist" in the ss output, it seems the stalls (that
>>>> are preventing the send buffers from being transmitted) are caused by
>>>> a 0-byte receive window causing the senders to stop sending, and
>>>> periodically fire the ICSK_TIME_PROBE0 timer to check for an open
>>>> receive window. From "backoff:9" it seems this condition has lasted
>>>> for a very long exponential backoff process.
>>>>
>>>> I don't see 0-byte receive window problems in the trace, but this is
>>>> probably because the tcpdump traces only last through 10:12:47 PST,
>>>> and the problem is showing up in ss at 10:14:49 AM PST and later.
>>>>
>>>> Is it possible to reproduce the problem again, and this time let the
>>>> tcpdump traces run all the way through the period where the
>>>> connections freeze and you grab the "ss" output?
>>>>
>>>> You may also have to explicitly kill the tcpdump. Perhaps the tail of
>>>> the trace was buffered in tcpdump's output buffer and not flushed to
>>>> disk. A "killall tcpdump" should do the trick to force it to cleanly
>>>> flush everything.
>>>
>>> Here is another set of debugging, I made sure tcpdump ran the entire time,
>>> as well as the ss monitoring script.
>>>
>>> http://www.candelatech.com/downloads/ss_log.txt
>>> http://www.candelatech.com/downloads/trace-lo-4004-b.pcap
>>>
>>> In addition, here are logs from my tool with msec timestamps. It is detecting
>>> communication failure and logging about it.  Interestingly, I think it recovered
>>> after one long timeout, but in the end, it went past the 2-minute cutoff mark
>>> where my program will close the TCP connection and restart things.
>>>
>>> 1641506767983:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 34458ms, sending req for update, read-isset: 0
>>> 1641506773839:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 40314ms, sending req for update, read-isset: 0
>>> 1641506780563:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 47038ms, sending req for update, read-isset: 0
>>> 1641506786567:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 53041ms, sending req for update, read-isset: 0
>>> 1641506823537:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 34949ms, sending req for update, read-isset: 0
>>> 1641506829280:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 40692ms, sending req for update, read-isset: 0
>>> 1641506834878:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 46289ms, sending req for update, read-isset: 0
>>> 1641506840778:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 52189ms, sending req for update, read-isset: 0
>>> 1641506846786:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 58198ms, sending req for update, read-isset: 0
>>> 1641506852746:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 64158ms, sending req for update, read-isset: 0
>>> 1641506858280:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 69692ms, sending req for update, read-isset: 0
>>> 1641506864200:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 75612ms, sending req for update, read-isset: 0
>>> 1641506870556:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 81968ms, sending req for update, read-isset: 0
>>> 1641506876564:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 87976ms, sending req for update, read-isset: 0
>>> 1641506882774:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 94185ms, sending req for update, read-isset: 0
>>>
>>> # Recovered between here and above it seems.
>>>
>>> 1641507005029:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 35840ms, sending req for update, read-isset: 0
>>> 1641507035759:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 30164ms, sending req for update, read-isset: 0
>>> 1641507042161:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 36565ms, sending req for update, read-isset: 0
>>> 1641507048397:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 42802ms, sending req for update, read-isset: 0
>>> 1641507054491:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 48896ms, sending req for update, read-isset: 0
>>> 1641507060748:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 55153ms, sending req for update, read-isset: 0
>>> 1641507067083:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 61488ms, sending req for update, read-isset: 0
>>> 1641507073438:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 67842ms, sending req for update, read-isset: 0
>>> 1641507079638:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 74042ms, sending req for update, read-isset: 0
>>> 1641507085926:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 80330ms, sending req for update, read-isset: 0
>>> 1641507091788:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 86192ms, sending req for update, read-isset: 0
>>> 1641507098042:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 92447ms, sending req for update, read-isset: 0
>>> 1641507104283:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 98687ms, sending req for update, read-isset: 0
>>> 1641507110466:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 104871ms, sending req for update, read-isset: 0
>>> 1641507116381:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 110786ms, sending req for update, read-isset: 0
>>> 1641507123034:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 117439ms, sending req for update, read-isset: 0
>>> 1641507128975:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 123379ms, sending req for update, read-isset: 0
>>
>> Hello Neal,
>>
>> Do the new captures help any?
>>
>> From my own looking at things, it seems that the sniffer fails to get frames near when the problem
>> starts happening.  I am baffled as to how that can happen, especially since it seems to stop getting
>> packets from multiple different TCP connections (the sniffer filter would pick up some other loop-back
>> related connections to the same IP port).
>>
>> And, if I interpret the ss output properly, after the problem happens, the sockets still think they are
>> sending data.  I didn't check closely enough to see if the peer side thought it received it.
>>
>> We are going to try to reproduce w/out wifi, but not sure we'll have any luck with that.
>> We did test w/out VRF (using lots of ip rules instead), and similar problem was seen according to my
>> test team (I did not debug it in detail).
>>
>> Do you have any suggestions for how to debug this further?  I am happy to hack stuff into the
>> kernel if you have some suggested places to add debugging...
>>
>> Thanks,
>> Ben
>>
> These 2 packets (or lack of packets between them) are suspicious.
> 
> 14:06:28.294557 IP 127.0.0.1.4004 > 127.0.0.1.57234: Flags [.], seq 226598466:226663949, ack 31270804, win 24414, options [nop,nop,TS val 3325283657 ecr 
> 3325283657], length 65483
> 
> 14:08:04.143548 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [P.], seq 31270804:31336287, ack 226663949, win 58408, options [nop,nop,TS val 3325379506 ecr 
> 3325283657], length 65483
> 
> 
> It looks like ACK for first packet is not sent, and packet is not retransmitted.
> 
> This could be a bug in conntrack (dropping rtx packets over loopback so pcap do not show the attempts),
> 
> or persistent memory allocation errors in __tcp_send_ack()
> 
> 
> Or this could be that the missing ACK packet was dropped by tcpdump.
> 
> Another suspicious sequence is:
> 
> 14:09:29.159816 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [P.], seq 32608288:32609208, ack 265640849, win 58408, options [nop,nop,TS val 3325464522 ecr 
> 3325464521], length 920
> 
> 14:10:05.487756 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [.], seq 32609208:32674691, ack 265640849, win 58408, options [nop,nop,TS val 3325500850 ecr 
> 3325464521], length 65483
> 
> 14:10:05.487762 IP 127.0.0.1.4004 > 127.0.0.1.57234: Flags [P.], seq 265640849:265706332, ack 32674691, win 24382, options [nop,nop,TS val 3325500850 ecr 
> 3325500850], length 65483
> 
> Same pattern here, missing ACK for 1st packet, and no retransmits.

Thanks for your insight as well.

I have not poked around in the TCP stack much, certainly not recently.  Do you have any suggestions
for checking whether conntrack is at fault?  For instance, could I run a command that clears
all conntrack entries, and then see whether they get rebuilt properly and the connection recovers?

Or is there something like the 'ss' tool that shows enough detail to spot problematic conntrack
entries?

Or, if nothing else, I was planning to try adding a printk in the packet tx path, matching only
frames to or from port 4004 on 127.0.0.1.
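
Roughly what I have in mind is the sketch below (untested, the helper name is invented, and calling
it from near the top of __tcp_transmit_skb() in net/ipv4/tcp_output.c is only an assumption about a
convenient hook point):

/* Hypothetical debug helper, not in any tree: log segments for the stuck
 * loopback flows as the stack hands them to the IP layer.
 */
static void tcp_debug_trace_4004(const struct sock *sk, struct sk_buff *skb)
{
	const struct inet_sock *inet = inet_sk(sk);

	/* Only loopback flows with port 4004 on either end. */
	if (inet->inet_saddr != htonl(INADDR_LOOPBACK) &&
	    inet->inet_daddr != htonl(INADDR_LOOPBACK))
		return;
	if (inet->inet_sport != htons(4004) && inet->inet_dport != htons(4004))
		return;

	pr_info_ratelimited("tcp4004: %pI4:%u -> %pI4:%u seq %u len %u\n",
			    &inet->inet_saddr, ntohs(inet->inet_sport),
			    &inet->inet_daddr, ntohs(inet->inet_dport),
			    TCP_SKB_CB(skb)->seq, skb->len);
}

That should at least show whether the stack keeps handing segments to the IP layer after the
sniffer goes quiet.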

Two captures in a row showed that the packet capture stopped near the time the problem started, so
I think it is probably not just some random packet loss in the capture itself, but rather
a real symptom of the problem.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Debugging stuck tcp connection across localhost
  2022-01-11 21:35                 ` Ben Greear
@ 2022-01-12  7:41                   ` Eric Dumazet
  2022-01-12 14:52                     ` Ben Greear
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2022-01-12  7:41 UTC (permalink / raw)
  To: Ben Greear; +Cc: Eric Dumazet, Neal Cardwell, netdev

On Tue, Jan 11, 2022 at 1:35 PM Ben Greear <greearb@candelatech.com> wrote:
>
> On 1/11/22 2:46 AM, Eric Dumazet wrote:
> >
> > On 1/10/22 10:10, Ben Greear wrote:
> >> On 1/6/22 2:26 PM, Ben Greear wrote:
> >>> On 1/6/22 12:04 PM, Neal Cardwell wrote:
> >>>> On Thu, Jan 6, 2022 at 2:05 PM Ben Greear <greearb@candelatech.com> wrote:
> >>>>>
> >>>>> On 1/6/22 8:16 AM, Neal Cardwell wrote:
> >>>>>> On Thu, Jan 6, 2022 at 10:39 AM Ben Greear <greearb@candelatech.com> wrote:
> >>>>>>>
> >>>>>>> On 1/6/22 7:20 AM, Neal Cardwell wrote:
> >>>>>>>> On Thu, Jan 6, 2022 at 10:06 AM Ben Greear <greearb@candelatech.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hello,
> >>>>>>>>>
> >>>>>>>>> I'm working on a strange problem, and could use some help if anyone has ideas.
> >>>>>>>>>
> >>>>>>>>> On a heavily loaded system (500+ wifi station devices, VRF device per 'real' netdev,
> >>>>>>>>> traffic generation on the netdevs, etc), I see cases where two processes trying
> >>>>>>>>> to communicate across localhost with TCP seem to get a stuck network
> >>>>>>>>> connection:
> >>>>>>>>>
> >>>>>>>>> [greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep 127.0.0.1
> >>>>>>>>> tcp        0 7988926 127.0.0.1:4004 127.0.0.1:23184 ESTABLISHED
> >>>>>>>>> tcp        0  59805 127.0.0.1:23184 127.0.0.1:4004 ESTABLISHED
> >>>>>>>>>
> >>>>>>>>> Both processes in question continue to execute, and as far as I can tell, they are properly
> >>>>>>>>> attempting to read/write the socket, but they are reading/writing 0 bytes (these sockets
> >>>>>>>>> are non blocking).  If one was stuck not reading, I would expect netstat
> >>>>>>>>> to show bytes in the rcv buffer, but it is zero as you can see above.
> >>>>>>>>>
> >>>>>>>>> Kernel is 5.15.7+ local hacks.  I can only reproduce this in a big messy complicated
> >>>>>>>>> test case, with my local ath10k-ct and other patches that enable virtual wifi stations,
> >>>>>>>>> but my code can grab logs at time it sees the problem.  Is there anything
> >>>>>>>>> more I can do to figure out why the TCP connection appears to be stuck?
> >>>>>>>>
> >>>>>>>> It could be very useful to get more information about the state of all
> >>>>>>>> the stuck connections (sender and receiver side) with something like:
> >>>>>>>>
> >>>>>>>>      ss -tinmo 'sport = :4004 or sport = :4004'
> >>>>>>>>
> >>>>>>>> I would recommend downloading and building a recent version of the
> >>>>>>>> 'ss' tool to maximize the information. Here is a recipe for doing
> >>>>>>>> that:
> >>>>>>>>
> >>>>>>>> https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-monitor-linux-tcp-bbr-connections
> >>>>>
> >>>>> Hello Neal,
> >>>>>
> >>>>> Here is the ss output from when the problem was happening. I think you can ignore the non-127.0.0.1
> >>>>> connections, but I left them in just in case it is somehow helpful.
> >>>>>
> >>>>> In addition, the pcap capture file is uploaded here:
> >>>>>
> >>>>> http://www.candelatech.com/downloads/trace-lo-4004.pcap
> >>>>>
> >>>>> The problem was happening in this time frame:
> >>>>>
> >>>>> [root@ct523c-0bdd ~]# date
> >>>>> Thu 06 Jan 2022 10:14:49 AM PST
> >>>>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
> >>>>> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
> >>>>>
> >>>>> ESTAB       0            222024 127.0.0.1:57224 127.0.0.1:4004 timer:(persist,1min23sec,9)
> >>>>> skmem:(r0,rb2000000,t0,tb2000000,f2232,w227144,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483
> >>>>> advmss:65483 cwnd:10 bytes_sent:36810035 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20134 segs_in:17497 data_segs_out:11969
> >>>>> data_segs_in:16642 send 6049237875bps lastsnd:3266 lastrcv:125252 lastack:125263 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
> >>>>> app_limited busy:275880ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222024 minrtt:0.013
> >>>>> ESTAB       0            0 192.168.200.34:4004 192.168.200.34:16906
> >>>>>           skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483
> >>>>> advmss:65483
> >>>>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
> >>>>> data_segs_in:2330 send 1823271222bps lastsnd:125253 lastrcv:125250 lastack:125251 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
> >>>>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
> >>>>> ESTAB       0            139656 192.168.200.34:16908 192.168.200.34:4004 timer:(persist,1min52sec,2)
> >>>>> skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535
> >>>>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
> >>>>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:7465 lastrcv:125250 lastack:125253 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
> >>>>> busy:271236ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
> >>>>> ESTAB       0            460 192.168.200.34:4004 192.168.200.34:16908 timer:(on,1min23sec,9)
> >>>>> skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
> >>>>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
> >>>>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:19753 lastrcv:158000 lastack:125250 pacing_rate 854817384bps delivery_rate 115645432bps
> >>>>> delivered:2355 busy:200993ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
> >>>>> ESTAB       0            147205 192.168.200.34:16906 192.168.200.34:4004 timer:(persist,1min46sec,9)
> >>>>> skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016
> >>>>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
> >>>>> data_segs_in:2371 send 459529825bps lastsnd:7465 lastrcv:125253 lastack:125250 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331
> >>>>> app_limited
> >>>>> busy:185315ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
> >>>>> ESTAB       0            3928980 127.0.0.1:4004 127.0.0.1:57224 timer:(persist,7.639ms,8)
> >>>>> skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:8 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
> >>>>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312422779 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18944
> >>>>> segs_in:20021 data_segs_out:18090 data_segs_in:11862 send 394446201bps lastsnd:56617 lastrcv:125271 lastack:125252 pacing_rate 709983112bps delivery_rate
> >>>>> 104772800000bps delivered:16643 app_limited busy:370468ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
> >>>>> notsent:3928980 minrtt:0.003
> >>>>> [root@ct523c-0bdd ~]# date
> >>>>> Thu 06 Jan 2022 10:14:57 AM PST
> >>>>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
> >>>>> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
> >>>>>
> >>>>> ESTAB       0            222208 127.0.0.1:57224 127.0.0.1:4004 timer:(persist,1min11sec,9)
> >>>>> skmem:(r0,rb2000000,t0,tb2000000,f2048,w227328,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483
> >>>>> advmss:65483 cwnd:10 bytes_sent:36941001 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20136 segs_in:17497 data_segs_out:11971
> >>>>> data_segs_in:16642 send 6049237875bps lastsnd:2663 lastrcv:136933 lastack:136944 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
> >>>>> app_limited busy:287561ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222208 minrtt:0.013
> >>>>> ESTAB       0            0 192.168.200.34:4004 192.168.200.34:16906
> >>>>>           skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483
> >>>>> advmss:65483
> >>>>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
> >>>>> data_segs_in:2330 send 1823271222bps lastsnd:136934 lastrcv:136931 lastack:136932 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
> >>>>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
> >>>>> ESTAB       0            139656 192.168.200.34:16908 192.168.200.34:4004 timer:(persist,1min40sec,2)
> >>>>> skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535
> >>>>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
> >>>>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:19146 lastrcv:136931 lastack:136934 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
> >>>>> busy:282917ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
> >>>>> ESTAB       0            460 192.168.200.34:4004 192.168.200.34:16908 timer:(on,1min11sec,9)
> >>>>> skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
> >>>>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
> >>>>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:31434 lastrcv:169681 lastack:136931 pacing_rate 854817384bps delivery_rate 115645432bps
> >>>>> delivered:2355 busy:212674ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
> >>>>> ESTAB       0            147205 192.168.200.34:16906 192.168.200.34:4004 timer:(persist,1min35sec,9)
> >>>>> skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016
> >>>>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
> >>>>> data_segs_in:2371 send 459529825bps lastsnd:19146 lastrcv:136934 lastack:136931 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331
> >>>>> app_limited
> >>>>> busy:196996ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
> >>>>> ESTAB       0            3928980 127.0.0.1:4004 127.0.0.1:57224 timer:(persist,1min57sec,9)
> >>>>> skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:9 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
> >>>>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312488262 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18945
> >>>>> segs_in:20021 data_segs_out:18091 data_segs_in:11862 send 394446201bps lastsnd:2762 lastrcv:136952 lastack:136933 pacing_rate 709983112bps delivery_rate
> >>>>> 104772800000bps delivered:16643 app_limited busy:382149ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
> >>>>> notsent:3928980 minrtt:0.003
> >>>>> [root@ct523c-0bdd ~]#
> >>>>>
> >>>>>
> >>>>> We can reproduce this readily at current, and I'm happy to try patches and/or do more debugging.  We also tried with a 5.12 kernel,
> >>>>> and saw same problems, but in all cases, we have local patches applied, and there is no way for us to do this test without
> >>>>> at least a fair bit of local patches applied.
> >>>>
> >>>> Thanks for the ss traces and tcpdump output! The tcpdump traces are
> >>>> nice, in that they start before the connection starts, so capture the
> >>>> SYN and its critical options like wscale.
> >>>>
> >>>>  From the "timer:(persist" in the ss output, it seems the stalls (that
> >>>> are preventing the send buffers from being transmitted) are caused by
> >>>> a 0-byte receive window causing the senders to stop sending, and
> >>>> periodically fire the ICSK_TIME_PROBE0 timer to check for an open
> >>>> receive window. From "backoff:9" it seems this condition has lasted
> >>>> for a very long exponential backoff process.
> >>>>
> >>>> I don't see 0-byte receive window problems in the trace, but this is
> >>>> probably because the tcpdump traces only last through 10:12:47 PST,
> >>>> and the problem is showing up in ss at 10:14:49 AM PST and later.
> >>>>
> >>>> Is it possible to reproduce the problem again, and this time let the
> >>>> tcpdump traces run all the way through the period where the
> >>>> connections freeze and you grab the "ss" output?
> >>>>
> >>>> You may also have to explicitly kill the tcpdump. Perhaps the tail of
> >>>> the trace was buffered in tcpdump's output buffer and not flushed to
> >>>> disk. A "killall tcpdump" should do the trick to force it to cleanly
> >>>> flush everything.
> >>>
> >>> Here is another set of debugging, I made sure tcpdump ran the entire time,
> >>> as well as the ss monitoring script.
> >>>
> >>> http://www.candelatech.com/downloads/ss_log.txt
> >>> http://www.candelatech.com/downloads/trace-lo-4004-b.pcap
> >>>
> >>> In addition, here are logs from my tool with msec timestamps. It is detecting
> >>> communication failure and logging about it.  Interestingly, I think it recovered
> >>> after one long timeout, but in the end, it went past the 2-minute cutoff mark
> >>> where my program will close the TCP connection and restart things.
> >>>
> >>> 1641506767983:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 34458ms, sending req for update, read-isset: 0
> >>> 1641506773839:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 40314ms, sending req for update, read-isset: 0
> >>> 1641506780563:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 47038ms, sending req for update, read-isset: 0
> >>> 1641506786567:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 53041ms, sending req for update, read-isset: 0
> >>> 1641506823537:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 34949ms, sending req for update, read-isset: 0
> >>> 1641506829280:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 40692ms, sending req for update, read-isset: 0
> >>> 1641506834878:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 46289ms, sending req for update, read-isset: 0
> >>> 1641506840778:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 52189ms, sending req for update, read-isset: 0
> >>> 1641506846786:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 58198ms, sending req for update, read-isset: 0
> >>> 1641506852746:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 64158ms, sending req for update, read-isset: 0
> >>> 1641506858280:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 69692ms, sending req for update, read-isset: 0
> >>> 1641506864200:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 75612ms, sending req for update, read-isset: 0
> >>> 1641506870556:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 81968ms, sending req for update, read-isset: 0
> >>> 1641506876564:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 87976ms, sending req for update, read-isset: 0
> >>> 1641506882774:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 94185ms, sending req for update, read-isset: 0
> >>>
> >>> # Recovered between here and above it seems.
> >>>
> >>> 1641507005029:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 35840ms, sending req for update, read-isset: 0
> >>> 1641507035759:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 30164ms, sending req for update, read-isset: 0
> >>> 1641507042161:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 36565ms, sending req for update, read-isset: 0
> >>> 1641507048397:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 42802ms, sending req for update, read-isset: 0
> >>> 1641507054491:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 48896ms, sending req for update, read-isset: 0
> >>> 1641507060748:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 55153ms, sending req for update, read-isset: 0
> >>> 1641507067083:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 61488ms, sending req for update, read-isset: 0
> >>> 1641507073438:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 67842ms, sending req for update, read-isset: 0
> >>> 1641507079638:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 74042ms, sending req for update, read-isset: 0
> >>> 1641507085926:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 80330ms, sending req for update, read-isset: 0
> >>> 1641507091788:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 86192ms, sending req for update, read-isset: 0
> >>> 1641507098042:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 92447ms, sending req for update, read-isset: 0
> >>> 1641507104283:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 98687ms, sending req for update, read-isset: 0
> >>> 1641507110466:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 104871ms, sending req for update, read-isset: 0
> >>> 1641507116381:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 110786ms, sending req for update, read-isset: 0
> >>> 1641507123034:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 117439ms, sending req for update, read-isset: 0
> >>> 1641507128975:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 123379ms, sending req for update, read-isset: 0
> >>
> >> Hello Neal,
> >>
> >> Do the new captures help any?
> >>
> >> From my own looking at things, it seems that the sniffer fails to get frames near when the problem
> >> starts happening.  I am baffled as to how that can happen, especially since it seems to stop getting
> >> packets from multiple different TCP connections (the sniffer filter would pick up some other loop-back
> >> related connections to the same IP port).
> >>
> >> And, if I interpret the ss output properly, after the problem happens, the sockets still think they are
> >> sending data.  I didn't check closely enough to see if the peer side thought it received it.
> >>
> >> We are going to try to reproduce w/out wifi, but not sure we'll have any luck with that.
> >> We did test w/out VRF (using lots of ip rules instead), and similar problem was seen according to my
> >> test team (I did not debug it in detail).
> >>
> >> Do you have any suggestions for how to debug this further?  I am happy to hack stuff into the
> >> kernel if you have some suggested places to add debugging...
> >>
> >> Thanks,
> >> Ben
> >>
> > These 2 packets (or lack of packets between them) are suspicious.
> >
> > 14:06:28.294557 IP 127.0.0.1.4004 > 127.0.0.1.57234: Flags [.], seq 226598466:226663949, ack 31270804, win 24414, options [nop,nop,TS val 3325283657 ecr
> > 3325283657], length 65483
> >
> > 14:08:04.143548 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [P.], seq 31270804:31336287, ack 226663949, win 58408, options [nop,nop,TS val 3325379506 ecr
> > 3325283657], length 65483
> >
> >
> > It looks like ACK for first packet is not sent, and packet is not retransmitted.
> >
> > This could be a bug in conntrack (dropping rtx packets over loopback so pcap do not show the attempts),
> >
> > or persistent memory allocation errors in __tcp_send_ack()
> >
> >
> > Or this could be that the missing ACK packet was dropped by tcpdump.
> >
> > Another suspicious sequence is:
> >
> > 14:09:29.159816 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [P.], seq 32608288:32609208, ack 265640849, win 58408, options [nop,nop,TS val 3325464522 ecr
> > 3325464521], length 920
> >
> > 14:10:05.487756 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [.], seq 32609208:32674691, ack 265640849, win 58408, options [nop,nop,TS val 3325500850 ecr
> > 3325464521], length 65483
> >
> > 14:10:05.487762 IP 127.0.0.1.4004 > 127.0.0.1.57234: Flags [P.], seq 265640849:265706332, ack 32674691, win 24382, options [nop,nop,TS val 3325500850 ecr
> > 3325500850], length 65483
> >
> > Same pattern here, missing ACK for 1st packet, and no retransmits.
>
> Thanks for your insight as well.
>
> I have not poked around in the TCP stack much, certainly not recently.  Do you have any suggestions
> for checking whether conntrack is at fault?  For instance, could I run a command that clears
> all conntrack entries, and then see whether they get rebuilt properly and the connection recovers?
>
> Or is there something like the 'ss' tool that shows enough detail to spot problematic conntrack
> entries?
>
> Or, if nothing else, I was planning to try adding a printk in the packet tx path, matching only
> frames to or from port 4004 on 127.0.0.1.
>
> Two captures in a row showed that the packet capture stopped near the time the problem started, so
> I think it is probably not just some random packet loss in the capture itself, but rather
> a real symptom of the problem.
>

Just to clarify:

Do you have any qdisc installed on the lo interface?

Can you try:
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5079832af5c1090917a8fd5dfb1a3025e2d85ae0..81a26ce4d79fd48f870b5c1d076a9082950e2a57 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2769,6 +2769,7 @@ bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto)
 static bool skb_still_in_host_queue(struct sock *sk,
                                    const struct sk_buff *skb)
 {
+#if 0
        if (unlikely(skb_fclone_busy(sk, skb))) {
                set_bit(TSQ_THROTTLED, &sk->sk_tsq_flags);
                smp_mb__after_atomic();
@@ -2778,6 +2779,7 @@ static bool skb_still_in_host_queue(struct sock *sk,
                        return true;
                }
        }
+#endif
        return false;
 }

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Debugging stuck tcp connection across localhost
  2022-01-12  7:41                   ` Eric Dumazet
@ 2022-01-12 14:52                     ` Ben Greear
  2022-01-12 17:12                       ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Greear @ 2022-01-12 14:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, Neal Cardwell, netdev

On 1/11/22 11:41 PM, Eric Dumazet wrote:
> On Tue, Jan 11, 2022 at 1:35 PM Ben Greear <greearb@candelatech.com> wrote:
>>
>> On 1/11/22 2:46 AM, Eric Dumazet wrote:
>>>
>>> On 1/10/22 10:10, Ben Greear wrote:
>>>> On 1/6/22 2:26 PM, Ben Greear wrote:
>>>>> On 1/6/22 12:04 PM, Neal Cardwell wrote:
>>>>>> On Thu, Jan 6, 2022 at 2:05 PM Ben Greear <greearb@candelatech.com> wrote:
>>>>>>>
>>>>>>> On 1/6/22 8:16 AM, Neal Cardwell wrote:
>>>>>>>> On Thu, Jan 6, 2022 at 10:39 AM Ben Greear <greearb@candelatech.com> wrote:
>>>>>>>>>
>>>>>>>>> On 1/6/22 7:20 AM, Neal Cardwell wrote:
>>>>>>>>>> On Thu, Jan 6, 2022 at 10:06 AM Ben Greear <greearb@candelatech.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I'm working on a strange problem, and could use some help if anyone has ideas.
>>>>>>>>>>>
>>>>>>>>>>> On a heavily loaded system (500+ wifi station devices, VRF device per 'real' netdev,
>>>>>>>>>>> traffic generation on the netdevs, etc), I see cases where two processes trying
>>>>>>>>>>> to communicate across localhost with TCP seem to get a stuck network
>>>>>>>>>>> connection:
>>>>>>>>>>>
>>>>>>>>>>> [greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep 127.0.0.1
>>>>>>>>>>> tcp        0 7988926 127.0.0.1:4004 127.0.0.1:23184 ESTABLISHED
>>>>>>>>>>> tcp        0  59805 127.0.0.1:23184 127.0.0.1:4004 ESTABLISHED
>>>>>>>>>>>
>>>>>>>>>>> Both processes in question continue to execute, and as far as I can tell, they are properly
>>>>>>>>>>> attempting to read/write the socket, but they are reading/writing 0 bytes (these sockets
>>>>>>>>>>> are non blocking).  If one was stuck not reading, I would expect netstat
>>>>>>>>>>> to show bytes in the rcv buffer, but it is zero as you can see above.
>>>>>>>>>>>
>>>>>>>>>>> Kernel is 5.15.7+ local hacks.  I can only reproduce this in a big messy complicated
>>>>>>>>>>> test case, with my local ath10k-ct and other patches that enable virtual wifi stations,
>>>>>>>>>>> but my code can grab logs at time it sees the problem.  Is there anything
>>>>>>>>>>> more I can do to figure out why the TCP connection appears to be stuck?
>>>>>>>>>>
>>>>>>>>>> It could be very useful to get more information about the state of all
>>>>>>>>>> the stuck connections (sender and receiver side) with something like:
>>>>>>>>>>
>>>>>>>>>>       ss -tinmo 'sport = :4004 or sport = :4004'
>>>>>>>>>>
>>>>>>>>>> I would recommend downloading and building a recent version of the
>>>>>>>>>> 'ss' tool to maximize the information. Here is a recipe for doing
>>>>>>>>>> that:
>>>>>>>>>>
>>>>>>>>>> https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-monitor-linux-tcp-bbr-connections
>>>>>>>
>>>>>>> Hello Neal,
>>>>>>>
>>>>>>> Here is the ss output from when the problem was happening. I think you can ignore the non-127.0.0.1
>>>>>>> connections, but I left them in just in case it is somehow helpful.
>>>>>>>
>>>>>>> In addition, the pcap capture file is uploaded here:
>>>>>>>
>>>>>>> http://www.candelatech.com/downloads/trace-lo-4004.pcap
>>>>>>>
>>>>>>> The problem was happening in this time frame:
>>>>>>>
>>>>>>> [root@ct523c-0bdd ~]# date
>>>>>>> Thu 06 Jan 2022 10:14:49 AM PST
>>>>>>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
>>>>>>> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
>>>>>>>
>>>>>>> ESTAB       0            222024 127.0.0.1:57224 127.0.0.1:4004 timer:(persist,1min23sec,9)
>>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f2232,w227144,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483
>>>>>>> advmss:65483 cwnd:10 bytes_sent:36810035 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20134 segs_in:17497 data_segs_out:11969
>>>>>>> data_segs_in:16642 send 6049237875bps lastsnd:3266 lastrcv:125252 lastack:125263 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
>>>>>>> app_limited busy:275880ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222024 minrtt:0.013
>>>>>>> ESTAB       0            0 192.168.200.34:4004 192.168.200.34:16906
>>>>>>>            skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483
>>>>>>> advmss:65483
>>>>>>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
>>>>>>> data_segs_in:2330 send 1823271222bps lastsnd:125253 lastrcv:125250 lastack:125251 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
>>>>>>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
>>>>>>> ESTAB       0            139656 192.168.200.34:16908 192.168.200.34:4004 timer:(persist,1min52sec,2)
>>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535
>>>>>>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
>>>>>>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:7465 lastrcv:125250 lastack:125253 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
>>>>>>> busy:271236ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
>>>>>>> ESTAB       0            460 192.168.200.34:4004 192.168.200.34:16908 timer:(on,1min23sec,9)
>>>>>>> skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
>>>>>>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
>>>>>>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:19753 lastrcv:158000 lastack:125250 pacing_rate 854817384bps delivery_rate 115645432bps
>>>>>>> delivered:2355 busy:200993ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
>>>>>>> ESTAB       0            147205 192.168.200.34:16906 192.168.200.34:4004 timer:(persist,1min46sec,9)
>>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016
>>>>>>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
>>>>>>> data_segs_in:2371 send 459529825bps lastsnd:7465 lastrcv:125253 lastack:125250 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331
>>>>>>> app_limited
>>>>>>> busy:185315ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
>>>>>>> ESTAB       0            3928980 127.0.0.1:4004 127.0.0.1:57224 timer:(persist,7.639ms,8)
>>>>>>> skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:8 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
>>>>>>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312422779 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18944
>>>>>>> segs_in:20021 data_segs_out:18090 data_segs_in:11862 send 394446201bps lastsnd:56617 lastrcv:125271 lastack:125252 pacing_rate 709983112bps delivery_rate
>>>>>>> 104772800000bps delivered:16643 app_limited busy:370468ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
>>>>>>> notsent:3928980 minrtt:0.003
>>>>>>> [root@ct523c-0bdd ~]# date
>>>>>>> Thu 06 Jan 2022 10:14:57 AM PST
>>>>>>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
>>>>>>> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
>>>>>>>
>>>>>>> ESTAB       0            222208 127.0.0.1:57224 127.0.0.1:4004 timer:(persist,1min11sec,9)
>>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f2048,w227328,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483
>>>>>>> advmss:65483 cwnd:10 bytes_sent:36941001 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20136 segs_in:17497 data_segs_out:11971
>>>>>>> data_segs_in:16642 send 6049237875bps lastsnd:2663 lastrcv:136933 lastack:136944 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
>>>>>>> app_limited busy:287561ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222208 minrtt:0.013
>>>>>>> ESTAB       0            0 192.168.200.34:4004 192.168.200.34:16906
>>>>>>>            skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483
>>>>>>> advmss:65483
>>>>>>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
>>>>>>> data_segs_in:2330 send 1823271222bps lastsnd:136934 lastrcv:136931 lastack:136932 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
>>>>>>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
>>>>>>> ESTAB       0            139656 192.168.200.34:16908 192.168.200.34:4004 timer:(persist,1min40sec,2)
>>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535
>>>>>>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
>>>>>>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:19146 lastrcv:136931 lastack:136934 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
>>>>>>> busy:282917ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
>>>>>>> ESTAB       0            460 192.168.200.34:4004 192.168.200.34:16908 timer:(on,1min11sec,9)
>>>>>>> skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
>>>>>>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
>>>>>>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:31434 lastrcv:169681 lastack:136931 pacing_rate 854817384bps delivery_rate 115645432bps
>>>>>>> delivered:2355 busy:212674ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
>>>>>>> ESTAB       0            147205 192.168.200.34:16906 192.168.200.34:4004 timer:(persist,1min35sec,9)
>>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016
>>>>>>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
>>>>>>> data_segs_in:2371 send 459529825bps lastsnd:19146 lastrcv:136934 lastack:136931 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331
>>>>>>> app_limited
>>>>>>> busy:196996ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
>>>>>>> ESTAB       0            3928980 127.0.0.1:4004 127.0.0.1:57224 timer:(persist,1min57sec,9)
>>>>>>> skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:9 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
>>>>>>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312488262 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18945
>>>>>>> segs_in:20021 data_segs_out:18091 data_segs_in:11862 send 394446201bps lastsnd:2762 lastrcv:136952 lastack:136933 pacing_rate 709983112bps delivery_rate
>>>>>>> 104772800000bps delivered:16643 app_limited busy:382149ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
>>>>>>> notsent:3928980 minrtt:0.003
>>>>>>> [root@ct523c-0bdd ~]#
>>>>>>>
>>>>>>>
>>>>>>> We can reproduce this readily at current, and I'm happy to try patches and/or do more debugging.  We also tried with a 5.12 kernel,
>>>>>>> and saw same problems, but in all cases, we have local patches applied, and there is no way for us to do this test without
>>>>>>> at least a fair bit of local patches applied.
>>>>>>
>>>>>> Thanks for the ss traces and tcpdump output! The tcpdump traces are
>>>>>> nice, in that they start before the connection starts, so capture the
>>>>>> SYN and its critical options like wscale.
>>>>>>
>>>>>>   From the "timer:(persist" in the ss output, it seems the stalls (that
>>>>>> are preventing the send buffers from being transmitted) are caused by
>>>>>> a 0-byte receive window causing the senders to stop sending, and
>>>>>> periodically fire the ICSK_TIME_PROBE0 timer to check for an open
>>>>>> receive window. From "backoff:9" it seems this condition has lasted
>>>>>> for a very long exponential backoff process.
>>>>>>
>>>>>> I don't see 0-byte receive window problems in the trace, but this is
>>>>>> probably because the tcpdump traces only last through 10:12:47 PST,
>>>>>> and the problem is showing up in ss at 10:14:49 AM PST and later.
>>>>>>
>>>>>> Is it possible to reproduce the problem again, and this time let the
>>>>>> tcpdump traces run all the way through the period where the
>>>>>> connections freeze and you grab the "ss" output?
>>>>>>
>>>>>> You may also have to explicitly kill the tcpdump. Perhaps the tail of
>>>>>> the trace was buffered in tcpdump's output buffer and not flushed to
>>>>>> disk. A "killall tcpdump" should do the trick to force it to cleanly
>>>>>> flush everything.
>>>>>
>>>>> Here is another set of debugging, I made sure tcpdump ran the entire time,
>>>>> as well as the ss monitoring script.
>>>>>
>>>>> http://www.candelatech.com/downloads/ss_log.txt
>>>>> http://www.candelatech.com/downloads/trace-lo-4004-b.pcap
>>>>>
>>>>> In addition, here are logs from my tool with msec timestamps. It is detecting
>>>>> communication failure and logging about it.  Interestingly, I think it recovered
>>>>> after one long timeout, but in the end, it went past the 2-minute cutoff mark
>>>>> where my program will close the TCP connection and restart things.
>>>>>
>>>>> 1641506767983:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 34458ms, sending req for update, read-isset: 0
>>>>> 1641506773839:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 40314ms, sending req for update, read-isset: 0
>>>>> 1641506780563:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 47038ms, sending req for update, read-isset: 0
>>>>> 1641506786567:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 53041ms, sending req for update, read-isset: 0
>>>>> 1641506823537:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 34949ms, sending req for update, read-isset: 0
>>>>> 1641506829280:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 40692ms, sending req for update, read-isset: 0
>>>>> 1641506834878:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 46289ms, sending req for update, read-isset: 0
>>>>> 1641506840778:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 52189ms, sending req for update, read-isset: 0
>>>>> 1641506846786:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 58198ms, sending req for update, read-isset: 0
>>>>> 1641506852746:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 64158ms, sending req for update, read-isset: 0
>>>>> 1641506858280:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 69692ms, sending req for update, read-isset: 0
>>>>> 1641506864200:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 75612ms, sending req for update, read-isset: 0
>>>>> 1641506870556:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 81968ms, sending req for update, read-isset: 0
>>>>> 1641506876564:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 87976ms, sending req for update, read-isset: 0
>>>>> 1641506882774:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 94185ms, sending req for update, read-isset: 0
>>>>>
>>>>> # Recovered between here and above it seems.
>>>>>
>>>>> 1641507005029:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 35840ms, sending req for update, read-isset: 0
>>>>> 1641507035759:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 30164ms, sending req for update, read-isset: 0
>>>>> 1641507042161:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 36565ms, sending req for update, read-isset: 0
>>>>> 1641507048397:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 42802ms, sending req for update, read-isset: 0
>>>>> 1641507054491:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 48896ms, sending req for update, read-isset: 0
>>>>> 1641507060748:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 55153ms, sending req for update, read-isset: 0
>>>>> 1641507067083:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 61488ms, sending req for update, read-isset: 0
>>>>> 1641507073438:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 67842ms, sending req for update, read-isset: 0
>>>>> 1641507079638:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 74042ms, sending req for update, read-isset: 0
>>>>> 1641507085926:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 80330ms, sending req for update, read-isset: 0
>>>>> 1641507091788:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 86192ms, sending req for update, read-isset: 0
>>>>> 1641507098042:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 92447ms, sending req for update, read-isset: 0
>>>>> 1641507104283:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 98687ms, sending req for update, read-isset: 0
>>>>> 1641507110466:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 104871ms, sending req for update, read-isset: 0
>>>>> 1641507116381:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 110786ms, sending req for update, read-isset: 0
>>>>> 1641507123034:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 117439ms, sending req for update, read-isset: 0
>>>>> 1641507128975:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 123379ms, sending req for update, read-isset: 0
>>>>
>>>> Hello Neal,
>>>>
>>>> Do the new captures help any?
>>>>
>>>>  From my own looking at things, it seems that the sniffer fails to get frames near when the problem
>>>> starts happening.  I am baffled as to how that can happen, especially since it seems to stop getting
>>>> packets from multiple different TCP connections (the sniffer filter would pick up some other loop-back
>>>> related connections to the same IP port).
>>>>
>>>> And, if I interpret the ss output properly, after the problem happens, the sockets still think they are
>>>> sending data.  I didn't check closely enough to see if the peer side thought it received it.
>>>>
>>>> We are going to try to reproduce w/out wifi, but not sure we'll have any luck with that.
>>>> We did test w/out VRF (using lots of ip rules instead), and similar problem was seen according to my
>>>> test team (I did not debug it in detail).
>>>>
>>>> Do you have any suggestions for how to debug this further?  I am happy to hack stuff into the
>>>> kernel if you have some suggested places to add debugging...
>>>>
>>>> Thanks,
>>>> Ben
>>>>
>>> These 2 packets (or lack of packets between them) are suspicious.
>>>
>>> 14:06:28.294557 IP 127.0.0.1.4004 > 127.0.0.1.57234: Flags [.], seq 226598466:226663949, ack 31270804, win 24414, options [nop,nop,TS val 3325283657 ecr
>>> 3325283657], length 65483
>>>
>>> 14:08:04.143548 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [P.], seq 31270804:31336287, ack 226663949, win 58408, options [nop,nop,TS val 3325379506 ecr
>>> 3325283657], length 65483
>>>
>>>
>>> It looks like ACK for first packet is not sent, and packet is not retransmitted.
>>>
>>> This could be a bug in conntrack (dropping rtx packets over loopback so pcap do not show the attempts),
>>>
>>> or persistent memory allocation errors in __tcp_send_ack()
>>>
>>>
>>> Or this could be that the missing ACK packet was dropped by tcpdump.
>>>
>>> Another suspicious sequence is:
>>>
>>> 14:09:29.159816 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [P.], seq 32608288:32609208, ack 265640849, win 58408, options [nop,nop,TS val 3325464522 ecr
>>> 3325464521], length 920
>>>
>>> 14:10:05.487756 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [.], seq 32609208:32674691, ack 265640849, win 58408, options [nop,nop,TS val 3325500850 ecr
>>> 3325464521], length 65483
>>>
>>> 14:10:05.487762 IP 127.0.0.1.4004 > 127.0.0.1.57234: Flags [P.], seq 265640849:265706332, ack 32674691, win 24382, options [nop,nop,TS val 3325500850 ecr
>>> 3325500850], length 65483
>>>
>>> Same pattern here, missing ACK for 1st packet, and no retransmits.
>>
>> Thanks for your insight as well.
>>
>> I have not poked in the tcp stack much, certainly not recently.  Do you have any suggestions
>> for checking if conntrack is at fault?  For instance, could I run a command that would clear
>> all contrack entries and then see if it re-built them properly and recovered?
>>
>> Or something like the 'ss' tool that would show enough debug to find problematic conntrack
>> entries?
>>
>> Or, if nothing else, I was planning to try to add printk in the packet tx path, trying to match only for
>> frames to or from port 4004 and to/from 127.0.0.1.
>>
>> Two captures in a row showed that packet capture stopped near time of the problem starting, so
>> I think it is probably not just some random packet-loss issue with the capture, but rather
>> a real symptom of the problem.
>>
> 
> Just to clarify:
> 
> Have you any qdisc on lo interface ?
> 
> Can you try:
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 5079832af5c1090917a8fd5dfb1a3025e2d85ae0..81a26ce4d79fd48f870b5c1d076a9082950e2a57
> 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2769,6 +2769,7 @@ bool tcp_schedule_loss_probe(struct sock *sk,
> bool advancing_rto)
>   static bool skb_still_in_host_queue(struct sock *sk,
>                                      const struct sk_buff *skb)
>   {
> +#if 0
>          if (unlikely(skb_fclone_busy(sk, skb))) {
>                  set_bit(TSQ_THROTTLED, &sk->sk_tsq_flags);
>                  smp_mb__after_atomic();
> @@ -2778,6 +2779,7 @@ static bool skb_still_in_host_queue(struct sock *sk,
>                          return true;
>                  }
>          }
> +#endif
>          return false;
>   }
> 

I will try that today.

I don't think I have qdisc on lo:

# tc qdisc show|grep 'dev lo'
qdisc noqueue 0: dev lo root refcnt 2

The eth ports are using fq_codel, and I guess they are using mq as well.
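
For what it's worth, since lo has no real qdisc, I assume any drops there would show up in the
device or protocol counters rather than in qdisc stats; the sort of checks I have in mind:

   ip -s link show dev lo                        # per-device RX/TX drop counters
   netstat -s | grep -iE 'drop|prune|collaps'    # protocol-level drop/prune counters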

We moved one of the processes off the problematic machine so that it communicates over
Ethernet instead of 'lo', and the problem seems to have gone away.  But that also
changes the system load, so it could be a coincidence.

Also, conntrack -L showed nothing on a machine with a simpler config where the two problematic processes
are talking over 'lo'.  The machine that shows the problem does have a lot of conntrack entries because it
is also doing some NAT for other data connections, but I don't think this should affect the 127.0.0.1 traffic.
There is a decent chance I misunderstand your comment about conntrack though...
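
For the record, the conntrack checks I had in mind are along these lines (conntrack-tools; the
flush is disruptive to the NAT'ed flows, but entries are re-created as traffic passes):

   conntrack -L -s 127.0.0.1    # list entries with a loopback source
   conntrack -L -d 127.0.0.1    # list entries with a loopback destination
   conntrack -F                 # flush the entire table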

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Debugging stuck tcp connection across localhost
  2022-01-12 14:52                     ` Ben Greear
@ 2022-01-12 17:12                       ` Eric Dumazet
  2022-01-12 18:01                         ` Debugging stuck tcp connection across localhost [snip] Ben Greear
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2022-01-12 17:12 UTC (permalink / raw)
  To: Ben Greear; +Cc: Eric Dumazet, Neal Cardwell, netdev

On Wed, Jan 12, 2022 at 6:52 AM Ben Greear <greearb@candelatech.com> wrote:
>
> On 1/11/22 11:41 PM, Eric Dumazet wrote:
> > On Tue, Jan 11, 2022 at 1:35 PM Ben Greear <greearb@candelatech.com> wrote:
> >>
> >> On 1/11/22 2:46 AM, Eric Dumazet wrote:
> >>>
> >>> On 1/10/22 10:10, Ben Greear wrote:
> >>>> On 1/6/22 2:26 PM, Ben Greear wrote:
> >>>>> On 1/6/22 12:04 PM, Neal Cardwell wrote:
> >>>>>> On Thu, Jan 6, 2022 at 2:05 PM Ben Greear <greearb@candelatech.com> wrote:
> >>>>>>>
> >>>>>>> On 1/6/22 8:16 AM, Neal Cardwell wrote:
> >>>>>>>> On Thu, Jan 6, 2022 at 10:39 AM Ben Greear <greearb@candelatech.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On 1/6/22 7:20 AM, Neal Cardwell wrote:
> >>>>>>>>>> On Thu, Jan 6, 2022 at 10:06 AM Ben Greear <greearb@candelatech.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hello,
> >>>>>>>>>>>
> >>>>>>>>>>> I'm working on a strange problem, and could use some help if anyone has ideas.
> >>>>>>>>>>>
> >>>>>>>>>>> On a heavily loaded system (500+ wifi station devices, VRF device per 'real' netdev,
> >>>>>>>>>>> traffic generation on the netdevs, etc), I see cases where two processes trying
> >>>>>>>>>>> to communicate across localhost with TCP seem to get a stuck network
> >>>>>>>>>>> connection:
> >>>>>>>>>>>
> >>>>>>>>>>> [greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep 127.0.0.1
> >>>>>>>>>>> tcp        0 7988926 127.0.0.1:4004 127.0.0.1:23184 ESTABLISHED
> >>>>>>>>>>> tcp        0  59805 127.0.0.1:23184 127.0.0.1:4004 ESTABLISHED
> >>>>>>>>>>>
> >>>>>>>>>>> Both processes in question continue to execute, and as far as I can tell, they are properly
> >>>>>>>>>>> attempting to read/write the socket, but they are reading/writing 0 bytes (these sockets
> >>>>>>>>>>> are non blocking).  If one was stuck not reading, I would expect netstat
> >>>>>>>>>>> to show bytes in the rcv buffer, but it is zero as you can see above.
> >>>>>>>>>>>
> >>>>>>>>>>> Kernel is 5.15.7+ local hacks.  I can only reproduce this in a big messy complicated
> >>>>>>>>>>> test case, with my local ath10k-ct and other patches that enable virtual wifi stations,
> >>>>>>>>>>> but my code can grab logs at time it sees the problem.  Is there anything
> >>>>>>>>>>> more I can do to figure out why the TCP connection appears to be stuck?
> >>>>>>>>>>
> >>>>>>>>>> It could be very useful to get more information about the state of all
> >>>>>>>>>> the stuck connections (sender and receiver side) with something like:
> >>>>>>>>>>
> >>>>>>>>>>       ss -tinmo 'sport = :4004 or sport = :4004'
> >>>>>>>>>>
> >>>>>>>>>> I would recommend downloading and building a recent version of the
> >>>>>>>>>> 'ss' tool to maximize the information. Here is a recipe for doing
> >>>>>>>>>> that:
> >>>>>>>>>>
> >>>>>>>>>> https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-monitor-linux-tcp-bbr-connections
> >>>>>>>
> >>>>>>> Hello Neal,
> >>>>>>>
> >>>>>>> Here is the ss output from when the problem was happening. I think you can ignore the non-127.0.0.1
> >>>>>>> connections, but I left them in just in case it is somehow helpful.
> >>>>>>>
> >>>>>>> In addition, the pcap capture file is uploaded here:
> >>>>>>>
> >>>>>>> http://www.candelatech.com/downloads/trace-lo-4004.pcap
> >>>>>>>
> >>>>>>> The problem was happening in this time frame:
> >>>>>>>
> >>>>>>> [root@ct523c-0bdd ~]# date
> >>>>>>> Thu 06 Jan 2022 10:14:49 AM PST
> >>>>>>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
> >>>>>>> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
> >>>>>>>
> >>>>>>> ESTAB       0            222024 127.0.0.1:57224 127.0.0.1:4004 timer:(persist,1min23sec,9)
> >>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f2232,w227144,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483
> >>>>>>> advmss:65483 cwnd:10 bytes_sent:36810035 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20134 segs_in:17497 data_segs_out:11969
> >>>>>>> data_segs_in:16642 send 6049237875bps lastsnd:3266 lastrcv:125252 lastack:125263 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
> >>>>>>> app_limited busy:275880ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222024 minrtt:0.013
> >>>>>>> ESTAB       0            0 192.168.200.34:4004 192.168.200.34:16906
> >>>>>>>            skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483
> >>>>>>> advmss:65483
> >>>>>>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
> >>>>>>> data_segs_in:2330 send 1823271222bps lastsnd:125253 lastrcv:125250 lastack:125251 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
> >>>>>>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
> >>>>>>> ESTAB       0            139656 192.168.200.34:16908 192.168.200.34:4004 timer:(persist,1min52sec,2)
> >>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535
> >>>>>>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
> >>>>>>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:7465 lastrcv:125250 lastack:125253 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
> >>>>>>> busy:271236ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
> >>>>>>> ESTAB       0            460 192.168.200.34:4004 192.168.200.34:16908 timer:(on,1min23sec,9)
> >>>>>>> skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
> >>>>>>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
> >>>>>>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:19753 lastrcv:158000 lastack:125250 pacing_rate 854817384bps delivery_rate 115645432bps
> >>>>>>> delivered:2355 busy:200993ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
> >>>>>>> ESTAB       0            147205 192.168.200.34:16906 192.168.200.34:4004 timer:(persist,1min46sec,9)
> >>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016
> >>>>>>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
> >>>>>>> data_segs_in:2371 send 459529825bps lastsnd:7465 lastrcv:125253 lastack:125250 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331
> >>>>>>> app_limited
> >>>>>>> busy:185315ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
> >>>>>>> ESTAB       0            3928980 127.0.0.1:4004 127.0.0.1:57224 timer:(persist,7.639ms,8)
> >>>>>>> skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:8 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
> >>>>>>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312422779 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18944
> >>>>>>> segs_in:20021 data_segs_out:18090 data_segs_in:11862 send 394446201bps lastsnd:56617 lastrcv:125271 lastack:125252 pacing_rate 709983112bps delivery_rate
> >>>>>>> 104772800000bps delivered:16643 app_limited busy:370468ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
> >>>>>>> notsent:3928980 minrtt:0.003
> >>>>>>> [root@ct523c-0bdd ~]# date
> >>>>>>> Thu 06 Jan 2022 10:14:57 AM PST
> >>>>>>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004'
> >>>>>>> State       Recv-Q       Send-Q               Local Address:Port                Peer Address:Port
> >>>>>>>
> >>>>>>> ESTAB       0            222208 127.0.0.1:57224 127.0.0.1:4004 timer:(persist,1min11sec,9)
> >>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f2048,w227328,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483
> >>>>>>> advmss:65483 cwnd:10 bytes_sent:36941001 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20136 segs_in:17497 data_segs_out:11971
> >>>>>>> data_segs_in:16642 send 6049237875bps lastsnd:2663 lastrcv:136933 lastack:136944 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863
> >>>>>>> app_limited busy:287561ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222208 minrtt:0.013
> >>>>>>> ESTAB       0            0 192.168.200.34:4004 192.168.200.34:16906
> >>>>>>>            skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483
> >>>>>>> advmss:65483
> >>>>>>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377
> >>>>>>> data_segs_in:2330 send 1823271222bps lastsnd:136934 lastrcv:136931 lastack:136932 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372
> >>>>>>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003
> >>>>>>> ESTAB       0            139656 192.168.200.34:16908 192.168.200.34:4004 timer:(persist,1min40sec,2)
> >>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535
> >>>>>>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642
> >>>>>>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:19146 lastrcv:136931 lastack:136934 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357
> >>>>>>> busy:282917ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004
> >>>>>>> ESTAB       0            460 192.168.200.34:4004 192.168.200.34:16908 timer:(on,1min11sec,9)
> >>>>>>> skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535
> >>>>>>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504
> >>>>>>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:31434 lastrcv:169681 lastack:136931 pacing_rate 854817384bps delivery_rate 115645432bps
> >>>>>>> delivered:2355 busy:212674ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003
> >>>>>>> ESTAB       0            147205 192.168.200.34:16906 192.168.200.34:4004 timer:(persist,1min35sec,9)
> >>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016
> >>>>>>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409
> >>>>>>> data_segs_in:2371 send 459529825bps lastsnd:19146 lastrcv:136934 lastack:136931 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331
> >>>>>>> app_limited
> >>>>>>> busy:196996ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003
> >>>>>>> ESTAB       0            3928980 127.0.0.1:4004 127.0.0.1:57224 timer:(persist,1min57sec,9)
> >>>>>>> skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:9 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535
> >>>>>>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312488262 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18945
> >>>>>>> segs_in:20021 data_segs_out:18091 data_segs_in:11862 send 394446201bps lastsnd:2762 lastrcv:136952 lastack:136933 pacing_rate 709983112bps delivery_rate
> >>>>>>> 104772800000bps delivered:16643 app_limited busy:382149ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268
> >>>>>>> notsent:3928980 minrtt:0.003
> >>>>>>> [root@ct523c-0bdd ~]#
> >>>>>>>
> >>>>>>>
> >>>>>>> We can reproduce this readily at current, and I'm happy to try patches and/or do more debugging.  We also tried with a 5.12 kernel,
> >>>>>>> and saw same problems, but in all cases, we have local patches applied, and there is no way for us to do this test without
> >>>>>>> at least a fair bit of local patches applied.
> >>>>>>
> >>>>>> Thanks for the ss traces and tcpdump output! The tcpdump traces are
> >>>>>> nice, in that they start before the connection starts, so capture the
> >>>>>> SYN and its critical options like wscale.
> >>>>>>
> >>>>>>   From the "timer:(persist" in the ss output, it seems the stalls (that
> >>>>>> are preventing the send buffers from being transmitted) are caused by
> >>>>>> a 0-byte receive window causing the senders to stop sending, and
> >>>>>> periodically fire the ICSK_TIME_PROBE0 timer to check for an open
> >>>>>> receive window. From "backoff:9" it seems this condition has lasted
> >>>>>> for a very long exponential backoff process.
> >>>>>>
> >>>>>> I don't see 0-byte receive window problems in the trace, but this is
> >>>>>> probably because the tcpdump traces only last through 10:12:47 PST,
> >>>>>> and the problem is showing up in ss at 10:14:49 AM PST and later.
> >>>>>>
> >>>>>> Is it possible to reproduce the problem again, and this time let the
> >>>>>> tcpdump traces run all the way through the period where the
> >>>>>> connections freeze and you grab the "ss" output?
> >>>>>>
> >>>>>> You may also have to explicitly kill the tcpdump. Perhaps the tail of
> >>>>>> the trace was buffered in tcpdump's output buffer and not flushed to
> >>>>>> disk. A "killall tcpdump" should do the trick to force it to cleanly
> >>>>>> flush everything.
> >>>>>
> >>>>> Here is another set of debugging, I made sure tcpdump ran the entire time,
> >>>>> as well as the ss monitoring script.
> >>>>>
> >>>>> http://www.candelatech.com/downloads/ss_log.txt
> >>>>> http://www.candelatech.com/downloads/trace-lo-4004-b.pcap
> >>>>>
> >>>>> In addition, here are logs from my tool with msec timestamps. It is detecting
> >>>>> communication failure and logging about it.  Interestingly, I think it recovered
> >>>>> after one long timeout, but in the end, it went past the 2-minute cutoff mark
> >>>>> where my program will close the TCP connection and restart things.
> >>>>>
> >>>>> 1641506767983:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 34458ms, sending req for update, read-isset: 0
> >>>>> 1641506773839:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 40314ms, sending req for update, read-isset: 0
> >>>>> 1641506780563:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 47038ms, sending req for update, read-isset: 0
> >>>>> 1641506786567:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 53041ms, sending req for update, read-isset: 0
> >>>>> 1641506823537:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 34949ms, sending req for update, read-isset: 0
> >>>>> 1641506829280:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 40692ms, sending req for update, read-isset: 0
> >>>>> 1641506834878:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 46289ms, sending req for update, read-isset: 0
> >>>>> 1641506840778:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 52189ms, sending req for update, read-isset: 0
> >>>>> 1641506846786:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 58198ms, sending req for update, read-isset: 0
> >>>>> 1641506852746:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 64158ms, sending req for update, read-isset: 0
> >>>>> 1641506858280:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 69692ms, sending req for update, read-isset: 0
> >>>>> 1641506864200:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 75612ms, sending req for update, read-isset: 0
> >>>>> 1641506870556:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 81968ms, sending req for update, read-isset: 0
> >>>>> 1641506876564:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 87976ms, sending req for update, read-isset: 0
> >>>>> 1641506882774:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 94185ms, sending req for update, read-isset: 0
> >>>>>
> >>>>> # Recovered between here and above it seems.
> >>>>>
> >>>>> 1641507005029:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 35840ms, sending req for update, read-isset: 0
> >>>>> 1641507035759:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 30164ms, sending req for update, read-isset: 0
> >>>>> 1641507042161:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 36565ms, sending req for update, read-isset: 0
> >>>>> 1641507048397:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 42802ms, sending req for update, read-isset: 0
> >>>>> 1641507054491:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 48896ms, sending req for update, read-isset: 0
> >>>>> 1641507060748:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 55153ms, sending req for update, read-isset: 0
> >>>>> 1641507067083:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 61488ms, sending req for update, read-isset: 0
> >>>>> 1641507073438:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 67842ms, sending req for update, read-isset: 0
> >>>>> 1641507079638:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 74042ms, sending req for update, read-isset: 0
> >>>>> 1641507085926:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 80330ms, sending req for update, read-isset: 0
> >>>>> 1641507091788:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 86192ms, sending req for update, read-isset: 0
> >>>>> 1641507098042:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 92447ms, sending req for update, read-isset: 0
> >>>>> 1641507104283:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 98687ms, sending req for update, read-isset: 0
> >>>>> 1641507110466:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 104871ms, sending req for update, read-isset: 0
> >>>>> 1641507116381:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 110786ms, sending req for update, read-isset: 0
> >>>>> 1641507123034:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 117439ms, sending req for update, read-isset: 0
> >>>>> 1641507128975:  Card.cc 801: WARNING:  Card: Shelf: 1, Card: 1 has not received communication in: 123379ms, sending req for update, read-isset: 0
> >>>>
> >>>> Hello Neal,
> >>>>
> >>>> Do the new captures help any?
> >>>>
> >>>>  From my own looking at things, it seems that the sniffer fails to get frames near when the problem
> >>>> starts happening.  I am baffled as to how that can happen, especially since it seems to stop getting
> >>>> packets from multiple different TCP connections (the sniffer filter would pick up some other loop-back
> >>>> related connections to the same IP port).
> >>>>
> >>>> And, if I interpret the ss output properly, after the problem happens, the sockets still think they are
> >>>> sending data.  I didn't check closely enough to see if the peer side thought it received it.
> >>>>
> >>>> We are going to try to reproduce w/out wifi, but not sure we'll have any luck with that.
> >>>> We did test w/out VRF (using lots of ip rules instead), and similar problem was seen according to my
> >>>> test team (I did not debug it in detail).
> >>>>
> >>>> Do you have any suggestions for how to debug this further?  I am happy to hack stuff into the
> >>>> kernel if you have some suggested places to add debugging...
> >>>>
> >>>> Thanks,
> >>>> Ben
> >>>>
> >>> These 2 packets (or lack of packets between them) are suspicious.
> >>>
> >>> 14:06:28.294557 IP 127.0.0.1.4004 > 127.0.0.1.57234: Flags [.], seq 226598466:226663949, ack 31270804, win 24414, options [nop,nop,TS val 3325283657 ecr
> >>> 3325283657], length 65483
> >>>
> >>> 14:08:04.143548 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [P.], seq 31270804:31336287, ack 226663949, win 58408, options [nop,nop,TS val 3325379506 ecr
> >>> 3325283657], length 65483
> >>>
> >>>
> >>> It looks like ACK for first packet is not sent, and packet is not retransmitted.
> >>>
> >>> This could be a bug in conntrack (dropping rtx packets over loopback so pcap do not show the attempts),
> >>>
> >>> or persistent memory allocation errors in __tcp_send_ack()
> >>>
> >>>
> >>> Or this could be that the missing ACK packet was dropped by tcpdump.
> >>>
> >>> Another suspicious sequence is:
> >>>
> >>> 14:09:29.159816 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [P.], seq 32608288:32609208, ack 265640849, win 58408, options [nop,nop,TS val 3325464522 ecr
> >>> 3325464521], length 920
> >>>
> >>> 14:10:05.487756 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [.], seq 32609208:32674691, ack 265640849, win 58408, options [nop,nop,TS val 3325500850 ecr
> >>> 3325464521], length 65483
> >>>
> >>> 14:10:05.487762 IP 127.0.0.1.4004 > 127.0.0.1.57234: Flags [P.], seq 265640849:265706332, ack 32674691, win 24382, options [nop,nop,TS val 3325500850 ecr
> >>> 3325500850], length 65483
> >>>
> >>> Same pattern here, missing ACK for 1st packet, and no retransmits.
> >>
> >> Thanks for your insight as well.
> >>
> >> I have not poked in the tcp stack much, certainly not recently.  Do you have any suggestions
> >> for checking if conntrack is at fault?  For instance, could I run a command that would clear
> >> all contrack entries and then see if it re-built them properly and recovered?
> >>
> >> Or something like the 'ss' tool that would show enough debug to find problematic conntrack
> >> entries?
> >>
> >> Or, if nothing else, I was planning to try to add printk in the packet tx path, trying to match only for
> >> frames to or from port 4004 and to/from 127.0.0.1.
> >>
> >> Two captures in a row showed that packet capture stopped near time of the problem starting, so
> >> I think it is probably not just some random packet-loss issue with the capture, but rather
> >> a real symptom of the problem.
> >>
> >
> > Just to clarify:
> >
> > Have you any qdisc on lo interface ?
> >
> > Can you try:
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 5079832af5c1090917a8fd5dfb1a3025e2d85ae0..81a26ce4d79fd48f870b5c1d076a9082950e2a57
> > 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -2769,6 +2769,7 @@ bool tcp_schedule_loss_probe(struct sock *sk,
> > bool advancing_rto)
> >   static bool skb_still_in_host_queue(struct sock *sk,
> >                                      const struct sk_buff *skb)
> >   {
> > +#if 0
> >          if (unlikely(skb_fclone_busy(sk, skb))) {
> >                  set_bit(TSQ_THROTTLED, &sk->sk_tsq_flags);
> >                  smp_mb__after_atomic();
> > @@ -2778,6 +2779,7 @@ static bool skb_still_in_host_queue(struct sock *sk,
> >                          return true;
> >                  }
> >          }
> > +#endif
> >          return false;
> >   }
> >
>
> I will try that today.
>
> I don't think I have qdisc on lo:
>
> # tc qdisc show|grep 'dev lo'
> qdisc noqueue 0: dev lo root refcnt 2

Great, I wanted to make sure you were not hitting some bug there
(pfifo_fast has been buggy for many kernel versions)
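
Side note: mq by itself just attaches the system default qdisc to each tx queue, so it may be
worth a quick check that none of those children silently fell back to pfifo_fast, e.g.:

   tc qdisc show | grep pfifo_fast

(no output would mean none is in use).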

>
> The eth ports are using fq_codel, and I guess they are using mq as well.
>
> We moved one of the processes off of the problematic machine so that it communicates over
> Ethernet instead of 'lo', and problem seems to have gone away.  But, that also
> changes system load, so it could be coincidence.
>
> Also, conntrack -L showed nothing on a machine with simpler config where the two problematic processes
> are talking over 'lo'.  The machine that shows problem does have a lot of conntrack entries because it
> is also doing some NAT for other data connections, but I don't think this should affect the 127.0.0.1 traffic.
> There is a decent chance I mis-understand your comment about conntrack though...

This was a wild guess. Honestly, I do not have a smoking gun yet.

>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Debugging stuck tcp connection across localhost [snip]
  2022-01-12 17:12                       ` Eric Dumazet
@ 2022-01-12 18:01                         ` Ben Greear
  2022-01-12 18:44                           ` Ben Greear
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Greear @ 2022-01-12 18:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, Neal Cardwell, netdev

On 1/12/22 9:12 AM, Eric Dumazet wrote:
> On Wed, Jan 12, 2022 at 6:52 AM Ben Greear <greearb@candelatech.com> wrote:
>>
>> On 1/11/22 11:41 PM, Eric Dumazet wrote:
>>> On Tue, Jan 11, 2022 at 1:35 PM Ben Greear <greearb@candelatech.com> wrote:
>>>>
>>>> On 1/11/22 2:46 AM, Eric Dumazet wrote:
>>>>>

>>> Just to clarify:
>>>
>>> Have you any qdisc on lo interface ?
>>>
>>> Can you try:
>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>> index 5079832af5c1090917a8fd5dfb1a3025e2d85ae0..81a26ce4d79fd48f870b5c1d076a9082950e2a57
>>> 100644
>>> --- a/net/ipv4/tcp_output.c
>>> +++ b/net/ipv4/tcp_output.c
>>> @@ -2769,6 +2769,7 @@ bool tcp_schedule_loss_probe(struct sock *sk,
>>> bool advancing_rto)
>>>    static bool skb_still_in_host_queue(struct sock *sk,
>>>                                       const struct sk_buff *skb)
>>>    {
>>> +#if 0
>>>           if (unlikely(skb_fclone_busy(sk, skb))) {
>>>                   set_bit(TSQ_THROTTLED, &sk->sk_tsq_flags);
>>>                   smp_mb__after_atomic();
>>> @@ -2778,6 +2779,7 @@ static bool skb_still_in_host_queue(struct sock *sk,
>>>                           return true;
>>>                   }
>>>           }
>>> +#endif
>>>           return false;
>>>    }
>>>
>>
>> I will try that today.
>>
>> I don't think I have qdisc on lo:
>>
>> # tc qdisc show|grep 'dev lo'
>> qdisc noqueue 0: dev lo root refcnt 2
> 
> Great, I wanted to make sure you were not hitting some bug there
> (pfifo_fast has been buggy for many kernel versions)
> 
>>
>> The eth ports are using fq_codel, and I guess they are using mq as well.
>>
>> We moved one of the processes off of the problematic machine so that it communicates over
>> Ethernet instead of 'lo', and problem seems to have gone away.  But, that also
>> changes system load, so it could be coincidence.
>>
>> Also, conntrack -L showed nothing on a machine with simpler config where the two problematic processes
>> are talking over 'lo'.  The machine that shows problem does have a lot of conntrack entries because it
>> is also doing some NAT for other data connections, but I don't think this should affect the 127.0.0.1 traffic.
>> There is a decent chance I mis-understand your comment about conntrack though...
> 
> This was a wild guess. Honestly, I do not have a smoking gun yet.

I tried your patch above; it did not help.

Also, it looks like we may have reproduced the same issue with processes on different
machines, but I was not able to verify it was the same root cause, and at the least
it was harder to reproduce.

I'm back to testing in the easily reproducible case now.

I have a few local patches in the general networking path, so I'm going to
attempt to back those out just in case my patches are buggy.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Debugging stuck tcp connection across localhost [snip]
  2022-01-12 18:01                         ` Debugging stuck tcp connection across localhost [snip] Ben Greear
@ 2022-01-12 18:44                           ` Ben Greear
  2022-01-12 18:47                             ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Greear @ 2022-01-12 18:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, Neal Cardwell, netdev

On 1/12/22 10:01 AM, Ben Greear wrote:
> On 1/12/22 9:12 AM, Eric Dumazet wrote:
>> On Wed, Jan 12, 2022 at 6:52 AM Ben Greear <greearb@candelatech.com> wrote:
>>>
>>> On 1/11/22 11:41 PM, Eric Dumazet wrote:
>>>> On Tue, Jan 11, 2022 at 1:35 PM Ben Greear <greearb@candelatech.com> wrote:
>>>>>
>>>>> On 1/11/22 2:46 AM, Eric Dumazet wrote:
>>>>>>
> 
>>>> Just to clarify:
>>>>
>>>> Have you any qdisc on lo interface ?
>>>>
>>>> Can you try:
>>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>>> index 5079832af5c1090917a8fd5dfb1a3025e2d85ae0..81a26ce4d79fd48f870b5c1d076a9082950e2a57
>>>> 100644
>>>> --- a/net/ipv4/tcp_output.c
>>>> +++ b/net/ipv4/tcp_output.c
>>>> @@ -2769,6 +2769,7 @@ bool tcp_schedule_loss_probe(struct sock *sk,
>>>> bool advancing_rto)
>>>>    static bool skb_still_in_host_queue(struct sock *sk,
>>>>                                       const struct sk_buff *skb)
>>>>    {
>>>> +#if 0
>>>>           if (unlikely(skb_fclone_busy(sk, skb))) {
>>>>                   set_bit(TSQ_THROTTLED, &sk->sk_tsq_flags);
>>>>                   smp_mb__after_atomic();
>>>> @@ -2778,6 +2779,7 @@ static bool skb_still_in_host_queue(struct sock *sk,
>>>>                           return true;
>>>>                   }
>>>>           }
>>>> +#endif
>>>>           return false;
>>>>    }
>>>>
>>>
>>> I will try that today.
>>>
>>> I don't think I have qdisc on lo:
>>>
>>> # tc qdisc show|grep 'dev lo'
>>> qdisc noqueue 0: dev lo root refcnt 2
>>
>> Great, I wanted to make sure you were not hitting some bug there
>> (pfifo_fast has been buggy for many kernel versions)
>>
>>>
>>> The eth ports are using fq_codel, and I guess they are using mq as well.
>>>
>>> We moved one of the processes off of the problematic machine so that it communicates over
>>> Ethernet instead of 'lo', and problem seems to have gone away.  But, that also
>>> changes system load, so it could be coincidence.
>>>
>>> Also, conntrack -L showed nothing on a machine with simpler config where the two problematic processes
>>> are talking over 'lo'.  The machine that shows problem does have a lot of conntrack entries because it
>>> is also doing some NAT for other data connections, but I don't think this should affect the 127.0.0.1 traffic.
>>> There is a decent chance I mis-understand your comment about conntrack though...
>>
>> This was a wild guess. Honestly, I do not have a smoking gun yet.
> 
> I tried your patch above, it did not help.
> 
> Also, looks like maybe we reproduced same issue with processes on different
> machines, but I was not able to verify it was the same root cause, and at
> least, it was harder to reproduce.
> 
> I'm back to testing in the easily reproducible case now.
> 
> I have a few local patches in the general networking path, I'm going to
> attempt to back those out just in case my patches are buggy.

Well, I think maybe I found the problem.

I looked in the right place at the right time and saw that the kernel was spewing warnings
about the neighbour table being full.  The default thresholds are too small for the number
of interfaces we are using.  Our script that was supposed to set the thresholds higher had
a typo in it that caused it to not actually set the values.

When the neighbour table is completely full, even communication across 127.0.0.1
fails in somewhat mysterious ways, and I guess this can break existing connections
too, not just new connections.
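
For reference, the knobs in question are the neighbour table gc thresholds; roughly what the
script was supposed to be setting (the numbers below are example values only, sized for our
interface count -- tune to taste):

   sysctl -w net.ipv4.neigh.default.gc_thresh1=4096    # below this, no garbage collection
   sysctl -w net.ipv4.neigh.default.gc_thresh2=8192    # soft limit, GC gets aggressive
   sysctl -w net.ipv4.neigh.default.gc_thresh3=16384   # hard limit on non-permanent entries
   sysctl -w net.ipv6.neigh.default.gc_thresh1=4096
   sysctl -w net.ipv6.neigh.default.gc_thresh2=8192
   sysctl -w net.ipv6.neigh.default.gc_thresh3=16384

Something like 'dmesg | grep -i "table overflow"' is a quick way to spot when the limits are hit.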

We'll do some more testing with the thresh setting fix in...always a chance there is more than one
problem in this area.

Thanks,
Ben

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Debugging stuck tcp connection across localhost [snip]
  2022-01-12 18:44                           ` Ben Greear
@ 2022-01-12 18:47                             ` Eric Dumazet
  2022-01-12 18:54                               ` Ben Greear
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2022-01-12 18:47 UTC (permalink / raw)
  To: Ben Greear; +Cc: Eric Dumazet, Neal Cardwell, netdev

On Wed, Jan 12, 2022 at 10:44 AM Ben Greear <greearb@candelatech.com> wrote:

> Well, I think maybe I found the problem.
>
> I looked in the right place at the right time and saw that the kernel was spewing about
> neigh entries being full.  The defaults are too small for the number of interfaces
> we are using.  Our script that was supposed to set the thresholds higher had a typo
> in it that caused it to not actually set the values.
>
> When the neigh entries are fully consumed, then even communication across 127.0.0.1
> fails in somewhat mysterious ways, and I guess this can break existing connections
> too, not just new connections.

Nice to hear this is not a TCP bug ;)

>
> We'll do some more testing with the thresh setting fix in...always a chance there is more than one
> problem in this area.
>
> Thanks,
> Ben

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Debugging stuck tcp connection across localhost [snip]
  2022-01-12 18:47                             ` Eric Dumazet
@ 2022-01-12 18:54                               ` Ben Greear
  0 siblings, 0 replies; 19+ messages in thread
From: Ben Greear @ 2022-01-12 18:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, Neal Cardwell, netdev

On 1/12/22 10:47 AM, Eric Dumazet wrote:
> On Wed, Jan 12, 2022 at 10:44 AM Ben Greear <greearb@candelatech.com> wrote:
> 
>> Well, I think maybe I found the problem.
>>
>> I looked in the right place at the right time and saw that the kernel was spewing about
>> neigh entries being full.  The defaults are too small for the number of interfaces
>> we are using.  Our script that was supposed to set the thresholds higher had a typo
>> in it that caused it to not actually set the values.
>>
>> When the neigh entries are fully consumed, then even communication across 127.0.0.1
>> fails in somewhat mysterious ways, and I guess this can break existing connections
>> too, not just new connections.
> 
> Nice to hear this is not a TCP bug ;)

For sure...it took me a long time to get brave enough to even suggest it might
be one on a public list :)

That said, it might be nice to always reserve a neigh entry for localhost
at least...it can't be that hard to arp for it!
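
In the meantime, I suppose a user-space workaround on our side is to pin the next-hop entries
we care about as permanent, since permanent entries are never garbage collected; something
like (address, MAC and device made up):

   ip neigh replace 192.0.2.1 lladdr 00:11:22:33:44:55 dev eth0 nud permanent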

--Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-01-12 18:54 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-06 14:59 Debugging stuck tcp connection across localhost Ben Greear
2022-01-06 15:20 ` Neal Cardwell
2022-01-06 15:39   ` Ben Greear
2022-01-06 16:16     ` Neal Cardwell
2022-01-06 19:05       ` Ben Greear
2022-01-06 20:04         ` Neal Cardwell
2022-01-06 20:20           ` Ben Greear
2022-01-06 22:26           ` Ben Greear
2022-01-10 18:10             ` Ben Greear
2022-01-10 22:16               ` David Laight
2022-01-11 10:46               ` Eric Dumazet
2022-01-11 21:35                 ` Ben Greear
2022-01-12  7:41                   ` Eric Dumazet
2022-01-12 14:52                     ` Ben Greear
2022-01-12 17:12                       ` Eric Dumazet
2022-01-12 18:01                         ` Debugging stuck tcp connection across localhost [snip] Ben Greear
2022-01-12 18:44                           ` Ben Greear
2022-01-12 18:47                             ` Eric Dumazet
2022-01-12 18:54                               ` Ben Greear
