From: Eric Dumazet <edumazet@google.com>
To: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	jiri@resnulli.us,  xiyou.wangcong@gmail.com,
	netdev@vger.kernel.org, renmingshuai@huawei.com,
	 Victor Nogueira <victor@mojatatu.com>
Subject: Re: [PATCH RFC net 1/1] net/sched: Fix mirred to self recursion
Date: Mon, 15 Apr 2024 11:20:27 +0200
Message-ID: <CANn89iLmhaC8fuu4UpPdELOAapBzLv0+S50gr0Rs+J+=4+9j=g@mail.gmail.com>
In-Reply-To: <CAM0EoMk33ga5dh12ViZz8QeFwjwNQBvykM53VQo1B3BdfAZtaQ@mail.gmail.com>

On Wed, Apr 10, 2024 at 10:30 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Tue, Apr 2, 2024 at 1:35 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > On Tue, Apr 2, 2024 at 12:47 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Wed, Mar 27, 2024 at 11:58 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > > >
> > > > On Wed, Mar 27, 2024 at 9:23 AM Eric Dumazet <edumazet@google.com> wrote:
> > > > >
> > > > > On Wed, Mar 27, 2024 at 12:03 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > > > > >
> > > > > > When the mirred action is used on a classful egress qdisc and a packet is
> > > > > > mirrored or redirected to self we hit a qdisc lock deadlock.
> > > > > > See trace below.
> > > > > >
> > > > > > [..... other info removed for brevity....]
> > > > > > [   82.890906]
> > > > > > [   82.890906] ============================================
> > > > > > [   82.890906] WARNING: possible recursive locking detected
> > > > > > [   82.890906] 6.8.0-05205-g77fadd89fe2d-dirty #213 Tainted: G        W
> > > > > > [   82.890906] --------------------------------------------
> > > > > > [   82.890906] ping/418 is trying to acquire lock:
> > > > > > [   82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at: __dev_queue_xmit+0x1778/0x3550
> > > > > > [   82.890906]
> > > > > > [   82.890906] but task is already holding lock:
> > > > > > [   82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at: __dev_queue_xmit+0x1778/0x3550
> > > > > > [   82.890906]
> > > > > > [   82.890906] other info that might help us debug this:
> > > > > > [   82.890906]  Possible unsafe locking scenario:
> > > > > > [   82.890906]
> > > > > > [   82.890906]        CPU0
> > > > > > [   82.890906]        ----
> > > > > > [   82.890906]   lock(&sch->q.lock);
> > > > > > [   82.890906]   lock(&sch->q.lock);
> > > > > > [   82.890906]
> > > > > > [   82.890906]  *** DEADLOCK ***
> > > > > > [   82.890906]
> > > > > > [..... other info removed for brevity....]
> > > > > >
> > > > > > Example setup (eth0->eth0) to recreate
> > > > > > tc qdisc add dev eth0 root handle 1: htb default 30
> > > > > > tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
> > > > > >      action mirred egress redirect dev eth0
> > > > > >
> > > > > > Another example (eth0->eth1->eth0) to recreate
> > > > > > tc qdisc add dev eth0 root handle 1: htb default 30
> > > > > > tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
> > > > > >      action mirred egress redirect dev eth1
> > > > > >
> > > > > > tc qdisc add dev eth1 root handle 1: htb default 30
> > > > > > tc filter add dev eth1 handle 1: protocol ip prio 2 matchall \
> > > > > >      action mirred egress redirect dev eth0
> > > > > >
> > > > > > We fix this by adding a per-cpu, per-qdisc recursion counter which is
> > > > > > incremented the first time a root qdisc is entered; on a second attempt
> > > > > > to enter the same root qdisc from the top, the packet is dropped to break
> > > > > > the loop.
> > > > > >
> > > > > > Reported-by: renmingshuai@huawei.com
> > > > > > Closes: https://lore.kernel.org/netdev/20240314111713.5979-1-renmingshuai@huawei.com/
> > > > > > Fixes: 3bcb846ca4cf ("net: get rid of spin_trylock() in net_tx_action()")
> > > > > > Fixes: e578d9c02587 ("net: sched: use counter to break reclassify loops")
> > > > > > Co-developed-by: Victor Nogueira <victor@mojatatu.com>
> > > > > > Signed-off-by: Victor Nogueira <victor@mojatatu.com>
> > > > > > Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
> > > > > > ---
> > > > > >  include/net/sch_generic.h |  2 ++
> > > > > >  net/core/dev.c            |  9 +++++++++
> > > > > >  net/sched/sch_api.c       | 12 ++++++++++++
> > > > > >  net/sched/sch_generic.c   |  2 ++
> > > > > >  4 files changed, 25 insertions(+)
> > > > > >
> > > > > > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > > > > > index cefe0c4bdae3..f9f99df037ed 100644
> > > > > > --- a/include/net/sch_generic.h
> > > > > > +++ b/include/net/sch_generic.h
> > > > > > @@ -125,6 +125,8 @@ struct Qdisc {
> > > > > >         spinlock_t              busylock ____cacheline_aligned_in_smp;
> > > > > >         spinlock_t              seqlock;
> > > > > >
> > > > > > +       u16 __percpu            *xmit_recursion;
> > > > > > +
> > > > > >         struct rcu_head         rcu;
> > > > > >         netdevice_tracker       dev_tracker;
> > > > > >         /* private data */
> > > > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > > > index 9a67003e49db..2b712388c06f 100644
> > > > > > --- a/net/core/dev.c
> > > > > > +++ b/net/core/dev.c
> > > > > > @@ -3789,6 +3789,13 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
> > > > > >         if (unlikely(contended))
> > > > > >                 spin_lock(&q->busylock);
> > > > >
> > > > > This could hang here (busylock)
> > > >
> > > > Notice the goto free_skb_list has a spin_unlock(&q->busylock); in
> > > > its code vicinity. Am I missing something?
> > >
> > > The hang would happen in above spin_lock(&q->busylock), before you can
> > > get a chance...
> > >
> > > If you want to test your patch, add this debugging feature, pretending
> > > the spinlock is contended.
> > >
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 818699dea9d7040ee74532ccdebf01c4fd6887cc..b2fe3aa2716f0fe128ef10f9d06c2431b3246933 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -3816,7 +3816,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
> > >          * sent after the qdisc owner is scheduled again. To prevent this
> > >          * scenario the task always serialize on the lock.
> > >          */
> > > -       contended = qdisc_is_running(q) || IS_ENABLED(CONFIG_PREEMPT_RT);
> > > +       contended = true; // DEBUG for Jamal
> > >         if (unlikely(contended))
> > >                 spin_lock(&q->busylock);
> >
> > Will do.
>
> Finally got time to look again. Probably being too clever, but moving
> the check before the contended check resolves it as well. The only
> strange thing is that the latest net-next now seems to spit out a
> false-positive lockdep splat for the A->B->A test (I am sure it's
> fixable).
>
> See attached. Didn't try the other idea; see if you like this one.

A spinlock can only be held by one CPU at a time, so recording the CPU
number of the lock owner should be enough to avoid a deadlock.

So I really do not understand your push for a per-cpu variable with
extra cache line misses.

I think the following would work just fine? What do you think?

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 76db6be1608315102495dd6372fc30e6c9d41a99..dcd92ed7f69fae00deaca2c88fed248a559108ea 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -117,6 +117,7 @@ struct Qdisc {
        struct qdisc_skb_head   q;
        struct gnet_stats_basic_sync bstats;
        struct gnet_stats_queue qstats;
+       int                     owner;
        unsigned long           state;
        unsigned long           state2; /* must be written under qdisc spinlock */
        struct Qdisc            *next_sched;
diff --git a/net/core/dev.c b/net/core/dev.c
index 854a3a28a8d85b335a9158378ae0cca6dfbf8b36..d77cac53df4b4af478548dd17e7a3a7cfe4bd792 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3808,6 +3808,11 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
                return rc;
        }

+       if (unlikely(READ_ONCE(q->owner) == smp_processor_id())) {
+               /* add a specific drop_reason later in net-next */
+               kfree_skb_reason(skb, SKB_DROP_REASON_TC_RECLASSIFY_LOOP);
+               return NET_XMIT_DROP;
+       }
        /*
         * Heuristic to force contended enqueues to serialize on a
         * separate lock before trying to get qdisc main lock.
@@ -3847,7 +3852,9 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
                qdisc_run_end(q);
                rc = NET_XMIT_SUCCESS;
        } else {
+               WRITE_ONCE(q->owner, smp_processor_id());
                rc = dev_qdisc_enqueue(skb, q, &to_free, txq);
+               WRITE_ONCE(q->owner, -1);
                if (qdisc_run_begin(q)) {
                        if (unlikely(contended)) {
                                spin_unlock(&q->busylock);
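
One thing not shown in the diff above: the new owner field would also need
to be initialised to -1 when the qdisc is allocated (e.g. in qdisc_alloc()),
otherwise the zero-initialised field would make CPU 0 look like the owner of
a freshly allocated qdisc and its packets would be wrongly dropped.

For illustration only, here is an untested, self-contained userspace sketch
of the same owner-recording pattern (pthread-based, all names made up, not
kernel code):

/* recursion_guard.c - sketch of the "record the lock owner" idea.
 * Before blocking on the queue lock, check whether the current thread
 * already owns it; if so, drop instead of deadlocking on re-entry.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t q_owner;      /* valid only while q_owned is true */
static volatile bool q_owned;  /* set while the lock is held, like q->owner */

/* Returns false when the caller already owns the lock (recursion detected). */
static bool enqueue(int pkt)
{
	/* Racy read, like READ_ONCE(q->owner): a non-owner thread can never
	 * observe its own id here, so there are no false positives; at worst
	 * it misses the fast path and blocks on the lock as usual.
	 */
	if (q_owned && pthread_equal(q_owner, pthread_self()))
		return false;           /* drop to break the loop */

	pthread_mutex_lock(&q_lock);
	q_owner = pthread_self();
	q_owned = true;

	printf("enqueued packet %d\n", pkt);
	/* A mirred-style redirect back to the same queue would call
	 * enqueue() again right here and hit the check above.
	 */

	q_owned = false;
	pthread_mutex_unlock(&q_lock);
	return true;
}

int main(void)
{
	return enqueue(1) ? 0 : 1;
}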

Thread overview: 21+ messages
2024-03-26 23:03 [PATCH RFC net 1/1] net/sched: Fix mirred to self recursion Jamal Hadi Salim
2024-03-27 13:23 ` Eric Dumazet
2024-03-27 22:57   ` Jamal Hadi Salim
2024-03-27 23:12     ` Jamal Hadi Salim
2024-04-02  2:00       ` renmingshuai
2024-04-02 16:38         ` Jamal Hadi Salim
2024-04-02 16:47     ` Eric Dumazet
2024-04-02 17:35       ` Jamal Hadi Salim
2024-04-10 20:30         ` Jamal Hadi Salim
2024-04-15  9:20           ` Eric Dumazet [this message]
2024-04-15  9:29             ` Eric Dumazet
2024-04-15 13:59             ` Jamal Hadi Salim
2024-04-15 14:01               ` Jamal Hadi Salim
2024-04-15 14:11                 ` Eric Dumazet
2024-04-15 14:17                   ` Jamal Hadi Salim
2024-04-15 21:14                   ` Jamal Hadi Salim
2024-04-16  8:05                     ` Davide Caratti
2024-04-16  9:14                       ` Eric Dumazet
2024-04-16  9:28                         ` Davide Caratti
2024-04-16  9:37                           ` Eric Dumazet
2024-04-16 10:30                       ` Jamal Hadi Salim
