netdev.vger.kernel.org archive mirror
From: Vlad Buslov <vladbu@nvidia.com>
To: Peilin Ye <yepeilin.cs@gmail.com>, Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
	Pedro Tammela <pctammela@mojatatu.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Peilin Ye <peilin.ye@bytedance.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"John Fastabend" <john.fastabend@gmail.com>,
	Hillf Danton <hdanton@sina.com>, <netdev@vger.kernel.org>,
	Cong Wang <cong.wang@bytedance.com>
Subject: Re: [PATCH v5 net 6/6] net/sched: qdisc_destroy() old ingress and clsact Qdiscs before grafting
Date: Mon, 29 May 2023 14:50:26 +0300
Message-ID: <87jzwrxrz8.fsf@nvidia.com>
In-Reply-To: <CAM0EoM=xLkAr5EF7bty+ETmZ3GXnmB9De3fYSCrQjKPb8qDy7Q@mail.gmail.com>

On Sun 28 May 2023 at 14:54, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On Sat, May 27, 2023 at 4:23 AM Peilin Ye <yepeilin.cs@gmail.com> wrote:
>>
>> Hi Jakub and all,
>>
>> On Fri, May 26, 2023 at 07:33:24PM -0700, Jakub Kicinski wrote:
>> > On Fri, 26 May 2023 16:09:51 -0700 Peilin Ye wrote:
>> > > Thanks a lot, I'll get right on it.
>> >
>> > Any insights? Is it just a live-lock inherent to the retry scheme
>> > or we actually forget to release the lock/refcnt?
>>
>> I think it's just a thread holding the RTNL mutex for too long (replaying
>> too many times).  We could replay an arbitrary number of times in
>> tc_{modify,get}_qdisc() if the user keeps sending RTNL-unlocked filter
>> requests for the old Qdisc.

After looking very carefully at the code I think I know what the issue
might be:

   Task 1 graft Qdisc   Task 2 new filter
           +                    +
           |                    |
           v                    v
        rtnl_lock()       take  q->refcnt
           +                    +
           |                    |
           v                    v
Spin while q->refcnt!=1   Block on rtnl_lock() indefinitely due to -EAGAIN

This will cause a real deadlock with the proposed patch. I'll try to
come up with a better approach. Sorry for not seeing it earlier.
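
To make the diagram above concrete, here is a minimal user-space sketch of
the same shape. It is not kernel code and not taken from the patch:
big_lock, refcnt, filter_task() and graft_task() are hypothetical stand-ins
for the RTNL mutex, q->refcnt, the filter request and the graft path. The
retry bound and the sched_yield() call exist only so the demo terminates.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: big_lock plays the RTNL mutex, refcnt plays
 * q->refcnt. Neither is a kernel API. */
struct sim {
	pthread_mutex_t big_lock;
	atomic_int refcnt;
	atomic_bool filter_holds_ref;
};

/* Task 2: takes a reference without the lock, then needs the lock before
 * it can finish and drop that reference. */
static void *filter_task(void *arg)
{
	struct sim *s = arg;

	atomic_fetch_add(&s->refcnt, 1);
	atomic_store(&s->filter_holds_ref, true);

	pthread_mutex_lock(&s->big_lock);	/* blocks while Task 1 holds it */
	atomic_fetch_sub(&s->refcnt, 1);
	pthread_mutex_unlock(&s->big_lock);
	return NULL;
}

/* Task 1: holds the lock and retries until it is the only reference holder.
 * Returns true if refcnt reached 1 within max_retries, false otherwise. */
static bool graft_task(bool drop_lock_between_retries, int max_retries)
{
	struct sim s;
	pthread_t filter;
	bool sole_owner = false;
	int rc;

	pthread_mutex_init(&s.big_lock, NULL);
	atomic_init(&s.refcnt, 1);		/* the Qdisc's own reference */
	atomic_init(&s.filter_holds_ref, false);

	pthread_mutex_lock(&s.big_lock);
	rc = pthread_create(&filter, NULL, filter_task, &s);
	assert(rc == 0);
	(void)rc;

	/* Wait until Task 2 has taken its reference. */
	while (!atomic_load(&s.filter_holds_ref))
		sched_yield();

	for (int i = 0; i < max_retries; i++) {
		if (atomic_load(&s.refcnt) == 1) {
			sole_owner = true;
			break;
		}
		if (drop_lock_between_retries) {
			/* Give the blocked task a window to take the lock. */
			pthread_mutex_unlock(&s.big_lock);
			sched_yield();
			pthread_mutex_lock(&s.big_lock);
		}
	}

	/* Give up: release the lock so Task 2 can finish and be joined. */
	pthread_mutex_unlock(&s.big_lock);
	pthread_join(filter, NULL);
	assert(atomic_load(&s.refcnt) == 1);
	pthread_mutex_destroy(&s.big_lock);
	return sole_owner;
}
```

With drop_lock_between_retries set to false, graft_task() returns false for
any retry bound: the only task that could drop the extra reference is blocked
on the lock the spinning task holds. Passing true mirrors the unlock/relock
idea quoted below; it will often let the other task through, but a plain
mutex makes no fairness promise, so this is an illustration rather than a fix.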

>>
>> I tested the new reproducer Pedro posted, on:
>>
>> 1. All 6 v5 patches, FWIW, which caused a hang similar to the one Pedro reported
>>
>> 2. First 5 v5 patches, plus patch 6 in v1 (no replaying), did not trigger
>>    any issues (in about 30 minutes).
>>
>> 3. All 6 v5 patches, plus this diff:
>>
>> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
>> index 286b7c58f5b9..988718ba5abe 100644
>> --- a/net/sched/sch_api.c
>> +++ b/net/sched/sch_api.c
>> @@ -1090,8 +1090,11 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
>>                          * RTNL-unlocked filter request(s).  This is the counterpart of that
>>                          * qdisc_refcount_inc_nz() call in __tcf_qdisc_find().
>>                          */
>> -                       if (!qdisc_refcount_dec_if_one(dev_queue->qdisc_sleeping))
>> +                       if (!qdisc_refcount_dec_if_one(dev_queue->qdisc_sleeping)) {
>> +                               rtnl_unlock();
>> +                               rtnl_lock();
>>                                 return -EAGAIN;
>> +                       }
>>                 }
>>
>>                 if (dev->flags & IFF_UP)
>>
>>    Did not trigger any issues (in about 30 minutes) either.
>>
>> What would you suggest?
>
>
> I am more worried it is a whack-a-mole situation. We fixed the first
> reproducer with essentially patches 1-4 but we opened a new one which
> the second reproducer catches. One thing the current reproducer does
> is create a lot of rtnl contention in the beginning by creating all those
> devices and then after it is just creating/deleting qdisc and doing
> update with flower where such contention is reduced. i.e., it may just
> take longer for the mole to pop up.
>
> Why don't we push the V1 patch in and then worry about getting clever
> with EAGAIN after? Can you test the V1 version with the repro Pedro
> posted? It shouldn't have these issues. Also it would be interesting to
> see how performance of the parallel updates to flower is affected.

This, or at least push the first 4 patches of this series. They target
other, older commits and fix straightforward issues with the API.


