From: Eric Dumazet
Date: Wed, 12 Jan 2022 09:12:21 -0800
Subject: Re: Debugging stuck tcp connection across localhost
To: Ben Greear
Cc: Eric Dumazet, Neal Cardwell, netdev
X-Mailing-List: netdev@vger.kernel.org

On Wed, Jan 12, 2022 at 6:52 AM Ben Greear wrote: > > On 1/11/22 11:41 PM, Eric Dumazet wrote: > > On Tue, Jan 11, 2022 at 1:35 PM Ben Greear wrote: > >> > >> On 1/11/22 2:46 AM, Eric Dumazet wrote: > >>> > >>> On 1/10/22 10:10, Ben Greear wrote: > >>>> On 1/6/22 2:26 PM, Ben Greear wrote: > >>>>> On 1/6/22 12:04 PM, Neal Cardwell wrote: > >>>>>> On Thu, Jan 6, 2022 at 2:05 PM Ben Greear wrote: > >>>>>>> > >>>>>>> On 1/6/22 8:16 AM, Neal Cardwell wrote: > >>>>>>>> On Thu, Jan 6, 2022 at 10:39 AM Ben Greear wrote: > >>>>>>>>> > >>>>>>>>> On 1/6/22 7:20 AM, Neal
Cardwell wrote: > >>>>>>>>>> On Thu, Jan 6, 2022 at 10:06 AM Ben Greear wrote: > >>>>>>>>>>> > >>>>>>>>>>> Hello, > >>>>>>>>>>> > >>>>>>>>>>> I'm working on a strange problem, and could use some help if anyone has ideas. > >>>>>>>>>>> > >>>>>>>>>>> On a heavily loaded system (500+ wifi station devices, VRF device per 'real' netdev, > >>>>>>>>>>> traffic generation on the netdevs, etc), I see cases where two processes trying > >>>>>>>>>>> to communicate across localhost with TCP seem to get a stuck network > >>>>>>>>>>> connection: > >>>>>>>>>>> > >>>>>>>>>>> [greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep 127.0.0.1 > >>>>>>>>>>> tcp 0 7988926 127.0.0.1:4004 127.0.0.1:23184 ESTABLISHED > >>>>>>>>>>> tcp 0 59805 127.0.0.1:23184 127.0.0.1:4004 ESTABLISHED > >>>>>>>>>>> > >>>>>>>>>>> Both processes in question continue to execute, and as far as I can tell, they are properly > >>>>>>>>>>> attempting to read/write the socket, but they are reading/writing 0 bytes (these sockets > >>>>>>>>>>> are non blocking). If one was stuck not reading, I would expect netstat > >>>>>>>>>>> to show bytes in the rcv buffer, but it is zero as you can see above. > >>>>>>>>>>> > >>>>>>>>>>> Kernel is 5.15.7+ local hacks. I can only reproduce this in a big messy complicated > >>>>>>>>>>> test case, with my local ath10k-ct and other patches that enable virtual wifi stations, > >>>>>>>>>>> but my code can grab logs at time it sees the problem. Is there anything > >>>>>>>>>>> more I can do to figure out why the TCP connection appears to be stuck? > >>>>>>>>>> > >>>>>>>>>> It could be very useful to get more information about the state of all > >>>>>>>>>> the stuck connections (sender and receiver side) with something like: > >>>>>>>>>> > >>>>>>>>>> ss -tinmo 'sport = :4004 or sport = :4004' > >>>>>>>>>> > >>>>>>>>>> I would recommend downloading and building a recent version of the > >>>>>>>>>> 'ss' tool to maximize the information. Here is a recipe for doing > >>>>>>>>>> that: > >>>>>>>>>> > >>>>>>>>>> https://github.com/google/bbr/blob/master/Documentation/bbr-faq.md#how-can-i-monitor-linux-tcp-bbr-connections > >>>>>>> > >>>>>>> Hello Neal, > >>>>>>> > >>>>>>> Here is the ss output from when the problem was happening. I think you can ignore the non-127.0.0.1 > >>>>>>> connections, but I left them in just in case it is somehow helpful. 
> >>>>>>> > >>>>>>> In addition, the pcap capture file is uploaded here: > >>>>>>> > >>>>>>> http://www.candelatech.com/downloads/trace-lo-4004.pcap > >>>>>>> > >>>>>>> The problem was happening in this time frame: > >>>>>>> > >>>>>>> [root@ct523c-0bdd ~]# date > >>>>>>> Thu 06 Jan 2022 10:14:49 AM PST > >>>>>>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004' > >>>>>>> State Recv-Q Send-Q Local Address:Port Peer Address:Port > >>>>>>> > >>>>>>> ESTAB 0 222024 127.0.0.1:57224 127.0.0.1:4004 timer:(persist,1min23sec,9) > >>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f2232,w227144,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483 > >>>>>>> advmss:65483 cwnd:10 bytes_sent:36810035 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20134 segs_in:17497 data_segs_out:11969 > >>>>>>> data_segs_in:16642 send 6049237875bps lastsnd:3266 lastrcv:125252 lastack:125263 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863 > >>>>>>> app_limited busy:275880ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222024 minrtt:0.013 > >>>>>>> ESTAB 0 0 192.168.200.34:4004 192.168.200.34:16906 > >>>>>>> skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483 > >>>>>>> advmss:65483 > >>>>>>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377 > >>>>>>> data_segs_in:2330 send 1823271222bps lastsnd:125253 lastrcv:125250 lastack:125251 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372 > >>>>>>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003 > >>>>>>> ESTAB 0 139656 192.168.200.34:16908 192.168.200.34:4004 timer:(persist,1min52sec,2) > >>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535 > >>>>>>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642 > >>>>>>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:7465 lastrcv:125250 lastack:125253 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357 > >>>>>>> busy:271236ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 minrtt:0.004 > >>>>>>> ESTAB 0 460 192.168.200.34:4004 192.168.200.34:16908 timer:(on,1min23sec,9) > >>>>>>> skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535 > >>>>>>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504 > >>>>>>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:19753 lastrcv:158000 lastack:125250 pacing_rate 854817384bps delivery_rate 115645432bps > >>>>>>> delivered:2355 busy:200993ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003 > >>>>>>> ESTAB 0 147205 192.168.200.34:16906 192.168.200.34:4004 timer:(persist,1min46sec,9) > >>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016 > >>>>>>> 
advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409 > >>>>>>> data_segs_in:2371 send 459529825bps lastsnd:7465 lastrcv:125253 lastack:125250 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331 > >>>>>>> app_limited > >>>>>>> busy:185315ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003 > >>>>>>> ESTAB 0 3928980 127.0.0.1:4004 127.0.0.1:57224 timer:(persist,7.639ms,8) > >>>>>>> skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:8 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535 > >>>>>>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312422779 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18944 > >>>>>>> segs_in:20021 data_segs_out:18090 data_segs_in:11862 send 394446201bps lastsnd:56617 lastrcv:125271 lastack:125252 pacing_rate 709983112bps delivery_rate > >>>>>>> 104772800000bps delivered:16643 app_limited busy:370468ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268 > >>>>>>> notsent:3928980 minrtt:0.003 > >>>>>>> [root@ct523c-0bdd ~]# date > >>>>>>> Thu 06 Jan 2022 10:14:57 AM PST > >>>>>>> [root@ct523c-0bdd ~]# ss -tinmo 'dport = :4004 or sport = :4004' > >>>>>>> State Recv-Q Send-Q Local Address:Port Peer Address:Port > >>>>>>> > >>>>>>> ESTAB 0 222208 127.0.0.1:57224 127.0.0.1:4004 timer:(persist,1min11sec,9) > >>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f2048,w227328,o0,bl0,d0) ts sack reno wscale:10,4 rto:201 backoff:9 rtt:0.866/0.944 ato:40 mss:65483 pmtu:65535 rcvmss:65483 > >>>>>>> advmss:65483 cwnd:10 bytes_sent:36941001 bytes_retrans:22025 bytes_acked:31729223 bytes_received:228063971 segs_out:20136 segs_in:17497 data_segs_out:11971 > >>>>>>> data_segs_in:16642 send 6049237875bps lastsnd:2663 lastrcv:136933 lastack:136944 pacing_rate 12093239064bps delivery_rate 130966000000bps delivered:11863 > >>>>>>> app_limited busy:287561ms rwnd_limited:21ms(0.0%) retrans:0/2 dsack_dups:2 rcv_rtt:0.671 rcv_space:1793073 rcv_ssthresh:934517 notsent:222208 minrtt:0.013 > >>>>>>> ESTAB 0 0 192.168.200.34:4004 192.168.200.34:16906 > >>>>>>> skmem:(r0,rb19521831,t0,tb2626560,f0,w0,o0,bl0,d0) ts sack reno wscale:10,10 rto:201 rtt:0.483/0.64 ato:40 mss:22016 pmtu:65535 rcvmss:65483 > >>>>>>> advmss:65483 > >>>>>>> cwnd:5 ssthresh:5 bytes_sent:8175956 bytes_retrans:460 bytes_acked:8174668 bytes_received:20820708 segs_out:3635 segs_in:2491 data_segs_out:2377 > >>>>>>> data_segs_in:2330 send 1823271222bps lastsnd:136934 lastrcv:136931 lastack:136932 pacing_rate 2185097952bps delivery_rate 70451200000bps delivered:2372 > >>>>>>> busy:14988ms rwnd_limited:1ms(0.0%) retrans:0/5 rcv_rtt:1.216 rcv_space:779351 rcv_ssthresh:9759798 minrtt:0.003 > >>>>>>> ESTAB 0 139656 192.168.200.34:16908 192.168.200.34:4004 timer:(persist,1min40sec,2) > >>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f3960,w143496,o0,bl0,d0) ts sack reno wscale:10,10 rto:37397 backoff:2 rtt:4182.62/8303.35 ato:40 mss:65483 pmtu:65535 > >>>>>>> rcvmss:22016 advmss:65483 cwnd:10 bytes_sent:22351275 bytes_retrans:397320 bytes_acked:20703982 bytes_received:7815946 segs_out:2585 segs_in:3642 > >>>>>>> data_segs_out:2437 data_segs_in:2355 send 1252479bps lastsnd:19146 lastrcv:136931 lastack:136934 pacing_rate 2504952bps delivery_rate 15992bps delivered:2357 > >>>>>>> busy:282917ms retrans:0/19 rcv_rtt:0.004 rcv_space:288293 rcv_ssthresh:43690 notsent:139656 
minrtt:0.004 > >>>>>>> ESTAB 0 460 192.168.200.34:4004 192.168.200.34:16908 timer:(on,1min11sec,9) > >>>>>>> skmem:(r0,rb9433368,t0,tb2626560,f2356,w1740,o0,bl0,d0) ts sack reno wscale:10,10 rto:102912 backoff:9 rtt:0.741/1.167 ato:40 mss:22016 pmtu:65535 > >>>>>>> rcvmss:65483 advmss:65483 cwnd:1 ssthresh:2 bytes_sent:7850211 bytes_retrans:33437 bytes_acked:7815486 bytes_received:20703981 segs_out:3672 segs_in:2504 > >>>>>>> data_segs_out:2380 data_segs_in:2356 send 237689609bps lastsnd:31434 lastrcv:169681 lastack:136931 pacing_rate 854817384bps delivery_rate 115645432bps > >>>>>>> delivered:2355 busy:212674ms unacked:1 retrans:0/24 lost:1 rcv_rtt:1.439 rcv_space:385874 rcv_ssthresh:4715943 minrtt:0.003 > >>>>>>> ESTAB 0 147205 192.168.200.34:16906 192.168.200.34:4004 timer:(persist,1min35sec,9) > >>>>>>> skmem:(r0,rb2000000,t0,tb2000000,f507,w151045,o0,bl0,d0) ts sack reno wscale:10,10 rto:223 backoff:9 rtt:11.4/18.962 ato:40 mss:65483 pmtu:65535 rcvmss:22016 > >>>>>>> advmss:65483 cwnd:10 bytes_sent:23635760 bytes_retrans:220124 bytes_acked:20820709 bytes_received:8174668 segs_out:2570 segs_in:3625 data_segs_out:2409 > >>>>>>> data_segs_in:2371 send 459529825bps lastsnd:19146 lastrcv:136934 lastack:136931 pacing_rate 918999184bps delivery_rate 43655333328bps delivered:2331 > >>>>>>> app_limited > >>>>>>> busy:196996ms retrans:0/14 rcv_rtt:0.005 rcv_space:220160 rcv_ssthresh:43690 notsent:147205 minrtt:0.003 > >>>>>>> ESTAB 0 3928980 127.0.0.1:4004 127.0.0.1:57224 timer:(persist,1min57sec,9) > >>>>>>> skmem:(r0,rb50000000,t0,tb3939840,f108,w4005780,o0,bl0,d3) ts sack reno wscale:4,10 rto:251 backoff:9 rtt:13.281/25.84 ato:40 mss:65483 pmtu:65535 > >>>>>>> rcvmss:65483 advmss:65483 cwnd:10 ssthresh:10 bytes_sent:312488262 bytes_retrans:245567 bytes_acked:228063971 bytes_received:31729222 segs_out:18945 > >>>>>>> segs_in:20021 data_segs_out:18091 data_segs_in:11862 send 394446201bps lastsnd:2762 lastrcv:136952 lastack:136933 pacing_rate 709983112bps delivery_rate > >>>>>>> 104772800000bps delivered:16643 app_limited busy:382149ms rwnd_limited:127ms(0.0%) retrans:0/26 rcv_rtt:7666.22 rcv_space:2279928 rcv_ssthresh:24999268 > >>>>>>> notsent:3928980 minrtt:0.003 > >>>>>>> [root@ct523c-0bdd ~]# > >>>>>>> > >>>>>>> > >>>>>>> We can reproduce this readily at current, and I'm happy to try patches and/or do more debugging. We also tried with a 5.12 kernel, > >>>>>>> and saw same problems, but in all cases, we have local patches applied, and there is no way for us to do this test without > >>>>>>> at least a fair bit of local patches applied. > >>>>>> > >>>>>> Thanks for the ss traces and tcpdump output! The tcpdump traces are > >>>>>> nice, in that they start before the connection starts, so capture the > >>>>>> SYN and its critical options like wscale. > >>>>>> > >>>>>> From the "timer:(persist" in the ss output, it seems the stalls (that > >>>>>> are preventing the send buffers from being transmitted) are caused by > >>>>>> a 0-byte receive window causing the senders to stop sending, and > >>>>>> periodically fire the ICSK_TIME_PROBE0 timer to check for an open > >>>>>> receive window. From "backoff:9" it seems this condition has lasted > >>>>>> for a very long exponential backoff process. > >>>>>> > >>>>>> I don't see 0-byte receive window problems in the trace, but this is > >>>>>> probably because the tcpdump traces only last through 10:12:47 PST, > >>>>>> and the problem is showing up in ss at 10:14:49 AM PST and later. 
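For reference on the persist-timer behavior described above: the kernel schedules the next zero-window probe at roughly icsk_rto << icsk_backoff, capped at TCP_RTO_MAX (120 s). A small user-space sketch of that arithmetic (cap value assumed from mainline defaults, helper name illustrative):

/* Approximate interval to the next zero-window probe, mirroring the
 * kernel's "rto << backoff, capped at TCP_RTO_MAX" behaviour. */
#include <stdio.h>

#define TCP_RTO_MAX_MS (120 * 1000)

static unsigned long probe0_interval_ms(unsigned long rto_ms, unsigned int backoff)
{
        unsigned long long when = (unsigned long long)rto_ms << backoff;

        return when > TCP_RTO_MAX_MS ? TCP_RTO_MAX_MS : (unsigned long)when;
}

int main(void)
{
        /* rto:201 backoff:9, as ss reports for the stuck 127.0.0.1 sockets */
        for (unsigned int backoff = 0; backoff <= 9; backoff++)
                printf("backoff %u -> next probe in ~%lu ms\n",
                       backoff, probe0_interval_ms(201, backoff));
        return 0;
}

At backoff 9 this is 201 ms << 9 = 102912 ms, about 1 min 43 s between probes, which matches the rto:102912 value and the minute-scale persist countdowns in the ss dumps: the connections have been stuck long enough to reach the deepest backoff level.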
> >>>>>> > >>>>>> Is it possible to reproduce the problem again, and this time let the > >>>>>> tcpdump traces run all the way through the period where the > >>>>>> connections freeze and you grab the "ss" output? > >>>>>> > >>>>>> You may also have to explicitly kill the tcpdump. Perhaps the tail of > >>>>>> the trace was buffered in tcpdump's output buffer and not flushed to > >>>>>> disk. A "killall tcpdump" should do the trick to force it to cleanly > >>>>>> flush everything. > >>>>> > >>>>> Here is another set of debugging, I made sure tcpdump ran the entire time, > >>>>> as well as the ss monitoring script. > >>>>> > >>>>> http://www.candelatech.com/downloads/ss_log.txt > >>>>> http://www.candelatech.com/downloads/trace-lo-4004-b.pcap > >>>>> > >>>>> In addition, here are logs from my tool with msec timestamps. It is detecting > >>>>> communication failure and logging about it. Interestingly, I think it recovered > >>>>> after one long timeout, but in the end, it went past the 2-minute cutoff mark > >>>>> where my program will close the TCP connection and restart things. > >>>>> > >>>>> 1641506767983: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 34458ms, sending req for update, read-isset: 0 > >>>>> 1641506773839: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 40314ms, sending req for update, read-isset: 0 > >>>>> 1641506780563: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 47038ms, sending req for update, read-isset: 0 > >>>>> 1641506786567: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 53041ms, sending req for update, read-isset: 0 > >>>>> 1641506823537: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 34949ms, sending req for update, read-isset: 0 > >>>>> 1641506829280: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 40692ms, sending req for update, read-isset: 0 > >>>>> 1641506834878: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 46289ms, sending req for update, read-isset: 0 > >>>>> 1641506840778: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 52189ms, sending req for update, read-isset: 0 > >>>>> 1641506846786: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 58198ms, sending req for update, read-isset: 0 > >>>>> 1641506852746: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 64158ms, sending req for update, read-isset: 0 > >>>>> 1641506858280: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 69692ms, sending req for update, read-isset: 0 > >>>>> 1641506864200: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 75612ms, sending req for update, read-isset: 0 > >>>>> 1641506870556: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 81968ms, sending req for update, read-isset: 0 > >>>>> 1641506876564: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 87976ms, sending req for update, read-isset: 0 > >>>>> 1641506882774: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 94185ms, sending req for update, read-isset: 0 > >>>>> > >>>>> # Recovered between here and above it seems. 
> >>>>> > >>>>> 1641507005029: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 35840ms, sending req for update, read-isset: 0 > >>>>> 1641507035759: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 30164ms, sending req for update, read-isset: 0 > >>>>> 1641507042161: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 36565ms, sending req for update, read-isset: 0 > >>>>> 1641507048397: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 42802ms, sending req for update, read-isset: 0 > >>>>> 1641507054491: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 48896ms, sending req for update, read-isset: 0 > >>>>> 1641507060748: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 55153ms, sending req for update, read-isset: 0 > >>>>> 1641507067083: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 61488ms, sending req for update, read-isset: 0 > >>>>> 1641507073438: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 67842ms, sending req for update, read-isset: 0 > >>>>> 1641507079638: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 74042ms, sending req for update, read-isset: 0 > >>>>> 1641507085926: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 80330ms, sending req for update, read-isset: 0 > >>>>> 1641507091788: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 86192ms, sending req for update, read-isset: 0 > >>>>> 1641507098042: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 92447ms, sending req for update, read-isset: 0 > >>>>> 1641507104283: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 98687ms, sending req for update, read-isset: 0 > >>>>> 1641507110466: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 104871ms, sending req for update, read-isset: 0 > >>>>> 1641507116381: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 110786ms, sending req for update, read-isset: 0 > >>>>> 1641507123034: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 117439ms, sending req for update, read-isset: 0 > >>>>> 1641507128975: Card.cc 801: WARNING: Card: Shelf: 1, Card: 1 has not received communication in: 123379ms, sending req for update, read-isset: 0 > >>>> > >>>> Hello Neal, > >>>> > >>>> Do the new captures help any? > >>>> > >>>> From my own looking at things, it seems that the sniffer fails to get frames near when the problem > >>>> starts happening. I am baffled as to how that can happen, especially since it seems to stop getting > >>>> packets from multiple different TCP connections (the sniffer filter would pick up some other loop-back > >>>> related connections to the same IP port). > >>>> > >>>> And, if I interpret the ss output properly, after the problem happens, the sockets still think they are > >>>> sending data. I didn't check closely enough to see if the peer side thought it received it. > >>>> > >>>> We are going to try to reproduce w/out wifi, but not sure we'll have any luck with that. > >>>> We did test w/out VRF (using lots of ip rules instead), and similar problem was seen according to my > >>>> test team (I did not debug it in detail). 
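As an application-side cross-check on the ss output above, the same stall can be detected from the processes that own the sockets with getsockopt(TCP_INFO): unsent data together with a persistently non-zero tcpi_backoff is the state ss reports as timer:(persist,...). A minimal sketch, restricted to long-standing struct tcp_info fields from <netinet/tcp.h> (helper name illustrative):

#include <stdio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Call periodically on the application's existing non-blocking sockets.
 * A non-zero tcpi_backoff means the kernel is in exponential backoff
 * (RTO retransmits or zero-window probes). */
static void check_for_stall(int fd)
{
        struct tcp_info ti;
        socklen_t len = sizeof(ti);

        if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
                return;

        if (ti.tcpi_backoff > 0)
                fprintf(stderr,
                        "fd %d possibly stalled: rto=%u us backoff=%u unacked=%u last_data_sent=%u ms ago\n",
                        fd, ti.tcpi_rto, ti.tcpi_backoff,
                        ti.tcpi_unacked, ti.tcpi_last_data_sent);
}

int main(void)
{
        int fd = socket(AF_INET, SOCK_STREAM, 0);       /* demo only; real use: the app's own fds */

        check_for_stall(fd);
        return 0;
}

Logging this alongside the existing "has not received communication" warnings would tie the application-level timeouts directly to the kernel's persist state.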
> >>>> > >>>> Do you have any suggestions for how to debug this further? I am happy to hack stuff into the > >>>> kernel if you have some suggested places to add debugging... > >>>> > >>>> Thanks, > >>>> Ben > >>>> > >>> These 2 packets (or lack of packets between them) are suspicious. > >>> > >>> 14:06:28.294557 IP 127.0.0.1.4004 > 127.0.0.1.57234: Flags [.], seq 226598466:226663949, ack 31270804, win 24414, options [nop,nop,TS val 3325283657 ecr > >>> 3325283657], length 65483 > >>> > >>> 14:08:04.143548 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [P.], seq 31270804:31336287, ack 226663949, win 58408, options [nop,nop,TS val 3325379506 ecr > >>> 3325283657], length 65483 > >>> > >>> > >>> It looks like ACK for first packet is not sent, and packet is not retransmitted. > >>> > >>> This could be a bug in conntrack (dropping rtx packets over loopback so pcap do not show the attempts), > >>> > >>> or persistent memory allocation errors in __tcp_send_ack() > >>> > >>> > >>> Or this could be that the missing ACK packet was dropped by tcpdump. > >>> > >>> Another suspicious sequence is: > >>> > >>> 14:09:29.159816 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [P.], seq 32608288:32609208, ack 265640849, win 58408, options [nop,nop,TS val 3325464522 ecr > >>> 3325464521], length 920 > >>> > >>> 14:10:05.487756 IP 127.0.0.1.57234 > 127.0.0.1.4004: Flags [.], seq 32609208:32674691, ack 265640849, win 58408, options [nop,nop,TS val 3325500850 ecr > >>> 3325464521], length 65483 > >>> > >>> 14:10:05.487762 IP 127.0.0.1.4004 > 127.0.0.1.57234: Flags [P.], seq 265640849:265706332, ack 32674691, win 24382, options [nop,nop,TS val 3325500850 ecr > >>> 3325500850], length 65483 > >>> > >>> Same pattern here, missing ACK for 1st packet, and no retransmits. > >> > >> Thanks for your insight as well. > >> > >> I have not poked in the tcp stack much, certainly not recently. Do you have any suggestions > >> for checking if conntrack is at fault? For instance, could I run a command that would clear > >> all contrack entries and then see if it re-built them properly and recovered? > >> > >> Or something like the 'ss' tool that would show enough debug to find problematic conntrack > >> entries? > >> > >> Or, if nothing else, I was planning to try to add printk in the packet tx path, trying to match only for > >> frames to or from port 4004 and to/from 127.0.0.1. > >> > >> Two captures in a row showed that packet capture stopped near time of the problem starting, so > >> I think it is probably not just some random packet-loss issue with the capture, but rather > >> a real symptom of the problem. > >> > > > > Just to clarify: > > > > Have you any qdisc on lo interface ? > > > > Can you try: > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > > index 5079832af5c1090917a8fd5dfb1a3025e2d85ae0..81a26ce4d79fd48f870b5c1d076a9082950e2a57 > > 100644 > > --- a/net/ipv4/tcp_output.c > > +++ b/net/ipv4/tcp_output.c > > @@ -2769,6 +2769,7 @@ bool tcp_schedule_loss_probe(struct sock *sk, > > bool advancing_rto) > > static bool skb_still_in_host_queue(struct sock *sk, > > const struct sk_buff *skb) > > { > > +#if 0 > > if (unlikely(skb_fclone_busy(sk, skb))) { > > set_bit(TSQ_THROTTLED, &sk->sk_tsq_flags); > > smp_mb__after_atomic(); > > @@ -2778,6 +2779,7 @@ static bool skb_still_in_host_queue(struct sock *sk, > > return true; > > } > > } > > +#endif > > return false; > > } > > > > I will try that today. 
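For the targeted printk Ben mentions, something along these lines could be dropped into net/ipv4/tcp_output.c and called from the transmit and probe paths; this is an untested editorial sketch against mainline (helper name and call sites are illustrative, not part of any existing patch):

/* Debug only: log loopback traffic involving port 4004, rate-limited.
 * Assumes the usual tcp_output.c includes (<net/tcp.h>) are in scope. */
static void tcp_dbg_4004(struct sock *sk, const char *where)
{
        const struct inet_sock *inet = inet_sk(sk);
        const struct tcp_sock *tp = tcp_sk(sk);

        if (inet->inet_daddr != htonl(INADDR_LOOPBACK))
                return;
        if (inet->inet_dport != htons(4004) && inet->inet_sport != htons(4004))
                return;

        pr_info_ratelimited("tcp4004 %s: %pI4:%u -> %pI4:%u snd_una=%u snd_nxt=%u snd_wnd=%u rcv_wnd=%u backoff=%u\n",
                            where,
                            &inet->inet_saddr, ntohs(inet->inet_sport),
                            &inet->inet_daddr, ntohs(inet->inet_dport),
                            tp->snd_una, tp->snd_nxt, tp->snd_wnd, tp->rcv_wnd,
                            inet_csk(sk)->icsk_backoff);
}

Calling it from, say, __tcp_transmit_skb() and tcp_send_probe0() would show whether the zero-window probes and ACKs are being generated at all, which complements the pcap view where they appear to go missing.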
> > I don't think I have qdisc on lo: > > # tc qdisc show|grep 'dev lo' > qdisc noqueue 0: dev lo root refcnt 2 Great, I wanted to make sure you were not hitting some bug there (pfifo_fast has been buggy for many kernel versions) > > The eth ports are using fq_codel, and I guess they are using mq as well. > > We moved one of the processes off of the problematic machine so that it communicates over > Ethernet instead of 'lo', and problem seems to have gone away. But, that also > changes system load, so it could be coincidence. > > Also, conntrack -L showed nothing on a machine with simpler config where the two problematic processes > are talking over 'lo'. The machine that shows problem does have a lot of conntrack entries because it > is also doing some NAT for other data connections, but I don't think this should affect the 127.0.0.1 traffic. > There is a decent chance I mis-understand your comment about conntrack though... This was a wild guess. Honestly, I do not have a smoking gun yet. > > Thanks, > Ben > > -- > Ben Greear > Candela Technologies Inc http://www.candelatech.com
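One way to probe the skb_still_in_host_queue() theory without rebuilding the kernel is to watch the TcpExt counter that path increments when it suppresses a retransmit, TCPSpuriousRtxHostQueues (counter name as in mainline /proc/net/netstat; the usual two-line header/value layout is assumed). A minimal reader:

/* Print TcpExt:TCPSpuriousRtxHostQueues from /proc/net/netstat. */
#include <stdio.h>
#include <string.h>

int main(void)
{
        char names[8192], values[8192];
        FILE *f = fopen("/proc/net/netstat", "r");

        if (!f) {
                perror("fopen");
                return 1;
        }
        /* The file is pairs of lines: "TcpExt: <names...>" then "TcpExt: <values...>" */
        while (fgets(names, sizeof(names), f) && fgets(values, sizeof(values), f)) {
                char *sn, *sv;
                char *n, *v;

                if (strncmp(names, "TcpExt:", 7))
                        continue;
                n = strtok_r(names + 7, " \n", &sn);
                v = strtok_r(values + 7, " \n", &sv);
                while (n && v) {
                        if (!strcmp(n, "TCPSpuriousRtxHostQueues"))
                                printf("TCPSpuriousRtxHostQueues = %s\n", v);
                        n = strtok_r(NULL, " \n", &sn);
                        v = strtok_r(NULL, " \n", &sv);
                }
                break;
        }
        fclose(f);
        return 0;
}

If that counter climbs while the 127.0.0.1 sockets are wedged, the #if 0 patch above is disabling exactly the code path that was holding back the retransmits; if it stays flat, the missing ACKs and retransmits are more likely being lost elsewhere (conntrack, a qdisc, or the capture itself).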