From: Yunsheng Lin <linyunsheng@huawei.com>
To: Vishwanath Pai <vpai@akamai.com>, Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Hunt, Joshua" <johunt@akamai.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Jiri Pirko <jiri@resnulli.us>, David Miller <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"linuxarm@huawei.com" <linuxarm@huawei.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	"dsahern@gmail.com" <dsahern@gmail.com>,
	Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
Subject: Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc
Date: Mon, 2 Nov 2020 17:08:13 +0800
Message-ID: <ab88bf4e-e022-dafe-4150-7314bf70c817@huawei.com>
In-Reply-To: <cd4b2482-c3dc-fba6-6287-1218dc4bed6e@akamai.com>

On 2020/10/30 1:20, Vishwanath Pai wrote:
> On 10/29/20 6:24 AM, Yunsheng Lin wrote:
>> On 2020/10/29 12:50, Vishwanath Pai wrote:
>>> On 10/28/20 10:37 PM, Yunsheng Lin wrote:
>>>> On 2020/10/29 4:04, Vishwanath Pai wrote:
>>>>> On 10/28/20 1:47 PM, Cong Wang wrote:
>>>>>> On Wed, Oct 28, 2020 at 8:37 AM Pai, Vishwanath <vpai@akamai.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We noticed some problems when testing the latest 5.4 LTS kernel and traced it
>>>>>>> back to this commit using git bisect. When running our tests, the machine stops
>>>>>>> responding to all traffic and the only way to recover is a reboot. I do not see
>>>>>>> a stack trace on the console.
>>>>>>
>>>>>> Do you mean the machine is still running fine just the network is down?
>>>>>>
>>>>>> If so, can you dump your tc config with stats when the problem is happening?
>>>>>> (You can use `tc -s -d qd show ...`.)
>>>>>>
>>>>>>>
>>>>>>> This can be reproduced using the packetdrill test below; it should be run a
>>>>>>> few times or in a loop. You should hit this issue within a few tries, but it
>>>>>>> can sometimes take up to 15-20 tries.
>>>>>> ...
>>>>>>> I can reproduce the issue easily on v5.4.68, and after reverting this commit it
>>>>>>> does not happen anymore.
>>>>>>
>>>>>> This is odd. The patch in this thread touches the netdev reset path. If packetdrill
>>>>>> is the only thing you use to trigger the bug (that is, the netdev is always active),
>>>>>> I cannot connect the two.
>>>>>>
>>>>>> Thanks.
>>>>>
>>>>> Hi Cong,
>>>>>
>>>>>> Do you mean the machine is still running fine just the network is down?
>>>>>
>>>>> I was able to access the machine via the serial console; it looks like it is
>>>>> up and running, just that networking is down.
>>>>>
>>>>>> If so, can you dump your tc config with stats when the problem is happening?
>>>>>> (You can use `tc -s -d qd show ...`.)
>>>>>
>>>>> If I try running tc when the machine is in this state, the command never
>>>>> returns. It doesn't print anything but doesn't exit either.
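Because tc hangs here rather than failing, any diagnostic script that simply calls it hangs too. A minimal sketch, assuming a POSIX sh with mktemp(1) available, of polling such a command against a deadline instead. `probe_hang` is a hypothetical helper name, not part of iproute2; timeout(1) is avoided because, after signalling the command, it still waits for it to exit, which a task stuck in uninterruptible sleep in the kernel may never do.

```sh
#!/bin/sh
# Hypothetical helper (not from the thread): run COMMAND in the background
# and stop waiting for it after SECONDS, without ever blocking on it.
#
# Usage: probe_hang SECONDS COMMAND [ARGS...]
# Prints COMMAND's output and returns 0 if it finished within SECONDS;
# otherwise reports on stderr and returns 1 (COMMAND is left running).
probe_hang() {
    secs=$1
    shift
    dir=$(mktemp -d) || return 2
    # The subshell creates the "finished" marker only after COMMAND returns.
    # All of its file descriptors point away from our stdout, so a caller
    # using $(probe_hang ...) is not blocked by a hung COMMAND.
    ( "$@" > "$dir/out" 2>&1; : > "$dir/finished" ) < /dev/null > /dev/null 2>&1 &
    i=0
    while [ ! -e "$dir/finished" ]; do
        if [ "$i" -ge "$secs" ]; then
            echo "still running after ${secs}s: $*" >&2
            return 1    # marker dir left behind on purpose
        fi
        sleep 1
        i=$((i + 1))
    done
    cat "$dir/out"
    rm -rf "$dir"
    return 0
}
```

For example, `probe_hang 5 tc -s -d qdisc show` as root. If it reports that tc is still running, `echo w > /proc/sysrq-trigger` (as root, on a kernel built with CONFIG_MAGIC_SYSRQ) asks the kernel to log stack traces of tasks in uninterruptible (blocked) state, which may show where tc is waiting.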
>>>>>
>>>>>> This is odd. The patch in this thread touches the netdev reset path. If packetdrill
>>>>>> is the only thing you use to trigger the bug (that is, the netdev is always active),
>>>>>> I cannot connect the two.
>>>>>
>>>>> I think packetdrill creates a tun0 interface when it starts the
>>>>> test and tears it down at the end, so it might be hitting this code path
>>>>> during teardown.
>>>>
>>>> Hi, is there any setup needed before running the above packetdrill test
>>>> case? I ran it on 5.9-rc4 with this patch applied, without any
>>>> preparation, and did not reproduce it.
>>>>
>>>> By the way, I am new to packetdrill :), so it would be good if you could
>>>> provide the detailed setup to reproduce it, thanks.
>>>>
>>>>>
>>>>> P.S: My mail server is having connectivity issues with vger.kernel.org
>>>>> so messages aren't getting delivered to netdev. It'll hopefully get
>>>>> resolved soon.
>>>>>
>>>>> Thanks,
>>>>> Vishwanath
>>>>>
>>>
>>> I can't reproduce it on v5.9-rc4 either, so it is probably an issue only on
>>> 5.4 (and maybe older LTS versions). Can you give it a try on
>>> 5.4.68?
>>>
>>> To run packetdrill, download the latest version from its GitHub
>>> repo, then run it in a loop without any special arguments. This is what
>>> I do to reproduce it:
>>>
>>> while true; do ./packetdrill <test-file>; done
>>>
>>> I don't think any other setup is necessary.
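A variant of the loop above, sketched under the assumption that packetdrill exits with a non-zero status when a run fails; `repeat_until_fail` is a hypothetical helper name. Instead of looping forever, it caps the number of runs and stops at the first failing one, reporting which iteration it was.

```sh
#!/bin/sh
# Hypothetical helper (not from the thread): run COMMAND up to MAX_RUNS
# times, stopping at the first run that exits with a non-zero status.
#
# Usage: repeat_until_fail MAX_RUNS COMMAND [ARGS...]
repeat_until_fail() {
    max=$1
    shift
    n=1
    while [ "$n" -le "$max" ]; do
        if ! "$@"; then
            echo "run $n of $max failed: $*" >&2
            return 1
        fi
        n=$((n + 1))
    done
    echo "all $max runs passed"
    return 0
}
```

For example, `repeat_until_fail 10000 ./packetdrill test.pd`. If a run hangs instead of failing, the loop simply stalls at that iteration, which also leaves the machine in the problem state for inspection.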
>>
>> Hi, I ran the above test for over an hour on 5.4.68 and still did not
>> reproduce it; see below:
>>
>>
>> root@(none)$ cd /home/root/
>> root@(none)$ ls
>> creat_vlan.sh  packetdrill    test.pd
>> root@(none)$ cat test.pd
>> 0 `echo 4 > /proc/sys/net/ipv4/tcp_min_tso_segs`
>>
>> 0.400 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
>> 0.400 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>>
>> // set maxseg to 1000 to work with both ipv4 and ipv6
>> 0.500 setsockopt(3, SOL_TCP, TCP_MAXSEG, [1000], 4) = 0
>> 0.500 bind(3, ..., ...) = 0
>> 0.500 listen(3, 1) = 0
>>
>> // Establish connection
>> 0.600 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 5>
>> 0.600 > S. 0:0(0) ack 1 <...>
>>
>> 0.800 < . 1:1(0) ack 1 win 320
>> 0.800 accept(3, ..., ...) = 4
>>
>> // Send 4 data segments.
>> +0 write(4, ..., 4000) = 4000
>> +0 > P. 1:4001(4000) ack 1
>>
>> // Receive a SACK
>> +.1 < . 1:1(0) ack 1 win 320 <sack 1001:2001,nop,nop>
>>
>> +.3 %{ print "TCP CA state: ",tcpi_ca_state  }%
>> root@(none)$ cat creat_vlan.sh
>> #!/bin/sh
>>
>> for((i=0; i<10000; i++))
>> do
>>     ./packetdrill test.pd
>> done
>> root@(none)$ ./creat_vlan.sh
>> TCP CA state:  3
>> ^C
>> root@(none)$ ifconfig
>> eth0      Link encap:Ethernet  HWaddr 5c:e8:83:0d:f7:ed
>>           inet addr:192.168.1.93  Bcast:192.168.1.255 Mask:255.255.255.0
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:3570 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:3190 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:1076349 (1.0 MiB)  TX bytes:414874 (405.1 KiB)
>>
>> eth2      Link encap:Ethernet  HWaddr 5c:e8:83:0d:f7:ec
>>           inet addr:192.168.100.1  Bcast:192.168.100.255 Mask:255.255.255.0
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:81848576 errors:0 dropped:0 overruns:0 frame:78
>>           TX packets:72497816 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:2044282289568 (1.8 TiB)  TX bytes:2457441698852 (2.2 TiB)
>>
>> lo        Link encap:Local Loopback
>>           inet addr:127.0.0.1  Mask:255.0.0.0
>>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>>           RX packets:1 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:1 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:68 (68.0 B)  TX bytes:68 (68.0 B)
>>
>> root@(none)$ ./creat_vlan.sh
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> TCP CA state:  3
>> ^C
>> root@(none)$ cat /proc/cmdline
>> BOOT_IMAGE=/linyunsheng/Image.5.0 rdinit=/init console=ttyAMA0,115200 earlycon=pl011,mmio32,0x94080000 iommu.strict=1
>> root@(none)$ cat /proc/version
>> Linux version 5.4.68 (linyunsheng@ubuntu) (gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.12)) #1 SMP PREEMPT Thu Oct 29 16:59:37 CST 2020
>> root@(none)$
>>
>>
>>
>>>
>>> -Vishwanath
>>>
> I couldn't reproduce it on an Ubuntu VM; maybe something is different in
> the way we set up our machines. We do have some scripts in
> /etc/network/{if-up.d,if-post-down.d} etc., or it could be something else.
> I'll let you know when I can reliably reproduce it on the VM.

Hi, Vishwanath
    Please see if the patch at the link below fixes your problem, thanks.
https://www.spinics.net/lists/netdev/msg695908.html

Thread overview: 26+ messages
2020-09-08 11:02 [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc Yunsheng Lin
2020-09-10 19:39 ` David Miller
2020-09-10 20:07 ` Cong Wang
2020-09-11  8:13   ` Yunsheng Lin
2020-09-11  8:25     ` Yunsheng Lin
2020-09-17 19:26     ` Cong Wang
     [not found]       ` <CAP12E-+3DY-dgzVercKc-NYGPExWO1NjTOr1Gf3tPLKvp6O6+g@mail.gmail.com>
2020-10-28 15:37         ` Pai, Vishwanath
2020-10-28 17:47           ` Cong Wang
2020-10-28 20:04             ` Vishwanath Pai
2020-10-29  2:37               ` Yunsheng Lin
2020-10-29  4:50                 ` Vishwanath Pai
2020-10-29 10:24                   ` Yunsheng Lin
2020-10-29 17:20                     ` Vishwanath Pai
2020-11-02  9:08                       ` Yunsheng Lin [this message]
2020-11-02 18:23                         ` Vishwanath Pai
2020-10-28 17:46       ` Vishwanath Pai
2020-10-29  2:52       ` Yunsheng Lin
2020-10-29 19:05         ` Cong Wang
2020-10-30  7:37           ` Yunsheng Lin
2020-11-02 16:55             ` Cong Wang
2020-11-03  7:24               ` Yunsheng Lin
2020-11-05  6:04                 ` Cong Wang
2020-11-05  6:16                   ` Cong Wang
2020-11-05  6:32                     ` Yunsheng Lin
2020-11-05  6:22                   ` Yunsheng Lin
2020-09-09  4:07 kernel test robot
