All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jamal Hadi Salim <jhs@mojatatu.com>
To: Eric Dumazet <edumazet@google.com>
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	jiri@resnulli.us,  xiyou.wangcong@gmail.com,
	netdev@vger.kernel.org, renmingshuai@huawei.com,
	 Victor Nogueira <victor@mojatatu.com>
Subject: Re: [PATCH RFC net 1/1] net/sched: Fix mirred to self recursion
Date: Mon, 15 Apr 2024 10:01:42 -0400	[thread overview]
Message-ID: <CAM0EoMmdp_ik6EA2q8vhr+gGh=OcxUkvBOsxPHFWjn1eDX_33Q@mail.gmail.com> (raw)
In-Reply-To: <CAM0EoMm+cqkY9tQC6+jpvLJrRxw43Gzffgw85Q3Fe2tBgA7k2Q@mail.gmail.com>

On Mon, Apr 15, 2024 at 9:59 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Mon, Apr 15, 2024 at 5:20 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Wed, Apr 10, 2024 at 10:30 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > >
> > > On Tue, Apr 2, 2024 at 1:35 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > > >
> > > > On Tue, Apr 2, 2024 at 12:47 PM Eric Dumazet <edumazet@google.com> wrote:
> > > > >
> > > > > On Wed, Mar 27, 2024 at 11:58 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > > > > >
> > > > > > On Wed, Mar 27, 2024 at 9:23 AM Eric Dumazet <edumazet@google.com> wrote:
> > > > > > >
> > > > > > > On Wed, Mar 27, 2024 at 12:03 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > > > > > > >
> > > > > > > > When the mirred action is used on a classful egress qdisc and a packet is
> > > > > > > > mirrored or redirected to self we hit a qdisc lock deadlock.
> > > > > > > > See trace below.
> > > > > > > >
> > > > > > > > [..... other info removed for brevity....]
> > > > > > > > [   82.890906]
> > > > > > > > [   82.890906] ============================================
> > > > > > > > [   82.890906] WARNING: possible recursive locking detected
> > > > > > > > [   82.890906] 6.8.0-05205-g77fadd89fe2d-dirty #213 Tainted: G        W
> > > > > > > > [   82.890906] --------------------------------------------
> > > > > > > > [   82.890906] ping/418 is trying to acquire lock:
> > > > > > > > [   82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at:
> > > > > > > > __dev_queue_xmit+0x1778/0x3550
> > > > > > > > [   82.890906]
> > > > > > > > [   82.890906] but task is already holding lock:
> > > > > > > > [   82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at:
> > > > > > > > __dev_queue_xmit+0x1778/0x3550
> > > > > > > > [   82.890906]
> > > > > > > > [   82.890906] other info that might help us debug this:
> > > > > > > > [   82.890906]  Possible unsafe locking scenario:
> > > > > > > > [   82.890906]
> > > > > > > > [   82.890906]        CPU0
> > > > > > > > [   82.890906]        ----
> > > > > > > > [   82.890906]   lock(&sch->q.lock);
> > > > > > > > [   82.890906]   lock(&sch->q.lock);
> > > > > > > > [   82.890906]
> > > > > > > > [   82.890906]  *** DEADLOCK ***
> > > > > > > > [   82.890906]
> > > > > > > > [..... other info removed for brevity....]
> > > > > > > >
> > > > > > > > Example setup (eth0->eth0) to recreate
> > > > > > > > tc qdisc add dev eth0 root handle 1: htb default 30
> > > > > > > > tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
> > > > > > > >      action mirred egress redirect dev eth0
> > > > > > > >
> > > > > > > > Another example(eth0->eth1->eth0) to recreate
> > > > > > > > tc qdisc add dev eth0 root handle 1: htb default 30
> > > > > > > > tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
> > > > > > > >      action mirred egress redirect dev eth1
> > > > > > > >
> > > > > > > > tc qdisc add dev eth1 root handle 1: htb default 30
> > > > > > > > tc filter add dev eth1 handle 1: protocol ip prio 2 matchall \
> > > > > > > >      action mirred egress redirect dev eth0
> > > > > > > >
> > > > > > > > We fix this by adding a per-cpu, per-qdisc recursion counter which is
> > > > > > > > incremented the first time a root qdisc is entered and on a second attempt
> > > > > > > > enter the same root qdisc from the top, the packet is dropped to break the
> > > > > > > > loop.
> > > > > > > >
> > > > > > > > Reported-by: renmingshuai@huawei.com
> > > > > > > > Closes: https://lore.kernel.org/netdev/20240314111713.5979-1-renmingshuai@huawei.com/
> > > > > > > > Fixes: 3bcb846ca4cf ("net: get rid of spin_trylock() in net_tx_action()")
> > > > > > > > Fixes: e578d9c02587 ("net: sched: use counter to break reclassify loops")
> > > > > > > > Co-developed-by: Victor Nogueira <victor@mojatatu.com>
> > > > > > > > Signed-off-by: Victor Nogueira <victor@mojatatu.com>
> > > > > > > > Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > > > > > > > ---
> > > > > > > >  include/net/sch_generic.h |  2 ++
> > > > > > > >  net/core/dev.c            |  9 +++++++++
> > > > > > > >  net/sched/sch_api.c       | 12 ++++++++++++
> > > > > > > >  net/sched/sch_generic.c   |  2 ++
> > > > > > > >  4 files changed, 25 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > > > > > > > index cefe0c4bdae3..f9f99df037ed 100644
> > > > > > > > --- a/include/net/sch_generic.h
> > > > > > > > +++ b/include/net/sch_generic.h
> > > > > > > > @@ -125,6 +125,8 @@ struct Qdisc {
> > > > > > > >         spinlock_t              busylock ____cacheline_aligned_in_smp;
> > > > > > > >         spinlock_t              seqlock;
> > > > > > > >
> > > > > > > > +       u16 __percpu            *xmit_recursion;
> > > > > > > > +
> > > > > > > >         struct rcu_head         rcu;
> > > > > > > >         netdevice_tracker       dev_tracker;
> > > > > > > >         /* private data */
> > > > > > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > > > > > index 9a67003e49db..2b712388c06f 100644
> > > > > > > > --- a/net/core/dev.c
> > > > > > > > +++ b/net/core/dev.c
> > > > > > > > @@ -3789,6 +3789,13 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
> > > > > > > >         if (unlikely(contended))
> > > > > > > >                 spin_lock(&q->busylock);
> > > > > > >
> > > > > > > This could hang here (busylock)
> > > > > >
> > > > > > Notice the goto free_skb_list has an spin_unlock(&q->busylock);  in
> > > > > > its code vicinity. Am I missing something?
> > > > >
> > > > > The hang would happen in above spin_lock(&q->busylock), before you can
> > > > > get a chance...
> > > > >
> > > > > If you want to test your patch, add this debugging feature, pretending
> > > > > the spinlock is contended.
> > > > >
> > > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > > index 818699dea9d7040ee74532ccdebf01c4fd6887cc..b2fe3aa2716f0fe128ef10f9d06c2431b3246933
> > > > > 100644
> > > > > --- a/net/core/dev.c
> > > > > +++ b/net/core/dev.c
> > > > > @@ -3816,7 +3816,7 @@ static inline int __dev_xmit_skb(struct sk_buff
> > > > > *skb, struct Qdisc *q,
> > > > >          * sent after the qdisc owner is scheduled again. To prevent this
> > > > >          * scenario the task always serialize on the lock.
> > > > >          */
> > > > > -       contended = qdisc_is_running(q) || IS_ENABLED(CONFIG_PREEMPT_RT);
> > > > > +       contended = true; // DEBUG for Jamal
> > > > >         if (unlikely(contended))
> > > > >                 spin_lock(&q->busylock);
> > > >
> > > > Will do.
> > >
> > > Finally got time to look again. Probably being too clever, but moving
> > > the check before the contended check resolves it as well. The only
> > > strange thing is now with the latest net-next seems to be spitting
> > > some false positive lockdep splat for the test of A->B->A (i am sure
> > > it's fixable).
> > >
> > > See attached. Didnt try the other idea, see if you like this one.
> >
> > A spinlock can only be held by one cpu at a time, so recording the cpu
> > number of the lock owner should be
> > enough to avoid a deadlock.
> >
> > So I really do not understand your push for a per-cpu variable with
> > extra cache line misses.
> >
>
> I was asking earlier - do per-cpu (qdisc struct) variables incur extra
> cache misses on top of the cache miss that is incurred when accessing
> the qdisc struct? I think you have been saying yes, but not being
> explicit ;->
>
> > I think the following would work just fine ? What do you think ?
> >
>
> Yep, good compromise ;-> Small missing thing - the initialization. See attached.
> Do you want to send it or should we?

Sorry - shows Victor's name but this is your patch, so feel free if
you send to add your name as author.

cheers,
jamal

  reply	other threads:[~2024-04-15 14:01 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-26 23:03 [PATCH RFC net 1/1] net/sched: Fix mirred to self recursion Jamal Hadi Salim
2024-03-27 13:23 ` Eric Dumazet
2024-03-27 22:57   ` Jamal Hadi Salim
2024-03-27 23:12     ` Jamal Hadi Salim
2024-04-02  2:00       ` renmingshuai
2024-04-02 16:38         ` Jamal Hadi Salim
2024-04-02 16:47     ` Eric Dumazet
2024-04-02 17:35       ` Jamal Hadi Salim
2024-04-10 20:30         ` Jamal Hadi Salim
2024-04-15  9:20           ` Eric Dumazet
2024-04-15  9:29             ` Eric Dumazet
2024-04-15 13:59             ` Jamal Hadi Salim
2024-04-15 14:01               ` Jamal Hadi Salim [this message]
2024-04-15 14:11                 ` Eric Dumazet
2024-04-15 14:17                   ` Jamal Hadi Salim
2024-04-15 21:14                   ` Jamal Hadi Salim
2024-04-16  8:05                     ` Davide Caratti
2024-04-16  9:14                       ` Eric Dumazet
2024-04-16  9:28                         ` Davide Caratti
2024-04-16  9:37                           ` Eric Dumazet
2024-04-16 10:30                       ` Jamal Hadi Salim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAM0EoMmdp_ik6EA2q8vhr+gGh=OcxUkvBOsxPHFWjn1eDX_33Q@mail.gmail.com' \
    --to=jhs@mojatatu.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=renmingshuai@huawei.com \
    --cc=victor@mojatatu.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.