From: Eric Dumazet <eric.dumazet@gmail.com>
To: Ben Greear <greearb@candelatech.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	netdev <netdev@vger.kernel.org>
Subject: Re: 5.15-rc3+ crash in fq-codel?
Date: Tue, 28 Sep 2021 16:25:39 -0700
Message-ID: <7f1d67f1-3a2c-2e74-bb86-c02a56370526@gmail.com>
In-Reply-To: <7e87883e-42f5-2341-ab67-9f1614fb8b86@candelatech.com>



On 9/28/21 3:00 PM, Ben Greear wrote:
> On 9/27/21 5:16 PM, Ben Greear wrote:
>> On 9/27/21 5:04 PM, Ben Greear wrote:
>>> On 9/27/21 4:49 PM, Eric Dumazet wrote:
>>>>
>>>>
>>>> On 9/27/21 4:30 PM, Ben Greear wrote:
>>>>> Hello,
>>>>>
>>>>> In a hacked-upon kernel, I'm getting crashes in fq-codel when doing bi-directional
>>>>> pktgen traffic on top of mac-vlans.  Unfortunately for me, I've made big changes to
>>>>> pktgen so I cannot easily run this test on stock kernels, and there is some chance
>>>>> some of my hackings have caused this issue.
>>>>>
>>>>> But, in case others have seen similar, please let me know.  I shall go digging
>>>>> in the meantime...
>>>>>
>>>>> Looks to me like 'skb' is NULL in line 120 below.
>>>>
>>>>
>>>> pktgen must not be used in a mode where a single skb
>>>> is cloned and reused, if the packet needs to be stored in a qdisc.
>>>>
>>>> qdiscs of all sorts assume skb->next/prev can be used as
>>>> anchor in their list.
>>>>
>>>> If the same skb is queued multiple times, lists are corrupted.
>>>>
>>>> Please double check your clone_skb pktgen setup.
>>>>
>>>> I thought we had IFF_TX_SKB_SHARING for this, and that macvlan was properly clearing this bit.
>>>
>>> My pktgen config was not using any duplicated queueing in this case.
>>>
>>> I changed to pfifo_fast and so far it is stable for ~10 minutes, where before it would crash
>>> within a minute.  I'll let it bake overnight....
>>
>> Still running stable.  I also notice we have been using fq-codel for a while and haven't noticed
>> this problem (next most recent kernel we might have run similar test on would be 5.13-ish).
>>
>> I'll duplicate this test on our older kernels tomorrow to see if it looks like a regression or
>> if we just haven't actually done this exact test in a while...
> 
> We can reproduce this crash as far back as 5.4 using fq-codel, with our pktgen driving mac-vlans.
> We did not try any kernels older than 5.4.
> We cannot reproduce with pfifo on 5.15-rc3 on an overnight run.
> We cannot reproduce with user-space UDP traffic on any kernel/qdisc combination.
> Our pktgen is configured for multi-skb of 0 (no multiple submits of the same skb)
> 
> While looking briefly at fq-codel, I didn't notice any locking in the code that crashed.
> Any chance that it makes assumptions that would be incorrect with pktgen running multiple
> threads (one thread per mac-vlan) on top of a single qdisc belonging to the underlying NIC?
> 


qdiscs are protected by a qdisc spinlock.

fq-codel does not have to lock anything in its enqueue() and dequeue() methods.

I guess your local changes to pktgen might be to blame.

pfifo is much simpler than fq-codel; it uses fewer fields from the skb.


Thread overview: 16+ messages
2021-09-27 23:30 5.15-rc3+ crash in fq-codel? Ben Greear
2021-09-27 23:49 ` Eric Dumazet
2021-09-28  0:04   ` Ben Greear
2021-09-28  0:16     ` Ben Greear
2021-09-28 22:00       ` Ben Greear
2021-09-28 23:25         ` Eric Dumazet [this message]
2021-09-29 19:07           ` Ben Greear
2021-09-29 23:21             ` Eric Dumazet
2021-09-29 23:28               ` Eric Dumazet
2021-09-29 23:42                 ` Eric Dumazet
2021-09-29 23:48                   ` Ben Greear
2021-09-30  0:04                     ` Ben Greear
2021-09-30  0:29                       ` Eric Dumazet
2021-09-30  0:40                         ` Eric Dumazet
2021-09-30  1:36                           ` Ben Greear
2021-09-30 16:44                             ` Ben Greear
