All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fengkehuan Feng <kehuan.feng@gmail.com>
To: Hillf Danton <hdanton@sina.com>
Cc: Jike Song <albcamus@gmail.com>, Josh Hunt <johunt@akamai.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Jonas Bonn <jonas.bonn@netrounds.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Michael Zhivich <mzhivich@akamai.com>,
	David Miller <davem@davemloft.net>,
	John Fastabend <john.fastabend@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Netdev <netdev@vger.kernel.org>
Subject: Re: Packet gets stuck in NOLOCK pfifo_fast qdisc
Date: Tue, 25 Aug 2020 10:18:05 +0800	[thread overview]
Message-ID: <CACS=qqKhsu6waaXndO5tQL_gC9TztuUQpqQigJA2Ac0y12czMQ@mail.gmail.com> (raw)
In-Reply-To: <20200822032800.16296-1-hdanton@sina.com>

[-- Attachment #1: Type: text/plain, Size: 5412 bytes --]

Hillf,

With the latest version (attached what I have changed on my tree), the
system failed to start up with cpu stalled.


Hillf Danton <hdanton@sina.com> 于2020年8月22日周六 上午11:30写道:
>
>
> On Thu, 20 Aug 2020 20:43:17 +0800 Hillf Danton wrote:
> > Hi Jike,
> >
> > On Thu, 20 Aug 2020 15:43:17 +0800 Jike Song wrote:
> > > Hi Josh,
> > >
> > > On Fri, Jul 3, 2020 at 2:14 AM Josh Hunt <johunt@akamai.com> wrote:
> > > {snip}
> > > > Initial results with Cong's patch look promising, so far no stalls. We
> > > > will let it run over the long weekend and report back on Tuesday.
> > > >
> > > > Paolo - I have concerns about possible performance regression with the
> > > > change as well. If you can gather some data that would be great. If
> > > > things look good with our low throughput test over the weekend we can
> > > > also try assessing performance next week.
> > > >
> > >
> > > We met possibly the same problem when testing nvidia/mellanox's
> >
> > Below is what was sent in reply to this thread early last month with
> > minor tuning, based on the seqlock. Feel free to drop an echo if it
> > makes ant-antenna-size sense in your tests.
> >
> > > GPUDirect RDMA product, we found that changing NET_SCH_DEFAULT to
> > > DEFAULT_FQ_CODEL mitigated the problem, having no idea why. Maybe you
> > > can also have a try?
> > >
> > > Besides, our testing is pretty complex, do you have a quick test to
> > > reproduce it?
> > >
> > > --
> > > Thanks,
> > > Jike
> >
> >
> > --- a/include/net/sch_generic.h
> > +++ b/include/net/sch_generic.h
> > @@ -79,6 +79,7 @@ struct Qdisc {
> >  #define TCQ_F_INVISIBLE              0x80 /* invisible by default in dump */
> >  #define TCQ_F_NOLOCK         0x100 /* qdisc does not require locking */
> >  #define TCQ_F_OFFLOADED              0x200 /* qdisc is offloaded to HW */
> > +     int                     pkt_seq;
> >       u32                     limit;
> >       const struct Qdisc_ops  *ops;
> >       struct qdisc_size_table __rcu *stab;
> > @@ -156,6 +157,7 @@ static inline bool qdisc_is_empty(const
> >  static inline bool qdisc_run_begin(struct Qdisc *qdisc)
> >  {
> >       if (qdisc->flags & TCQ_F_NOLOCK) {
> > +             qdisc->pkt_seq++;
> >               if (!spin_trylock(&qdisc->seqlock))
> >                       return false;
> >               WRITE_ONCE(qdisc->empty, false);
> > --- a/include/net/pkt_sched.h
> > +++ b/include/net/pkt_sched.h
> > @@ -117,7 +117,9 @@ void __qdisc_run(struct Qdisc *q);
> >
> >  static inline void qdisc_run(struct Qdisc *q)
> >  {
> > -     if (qdisc_run_begin(q)) {
> > +     while (qdisc_run_begin(q)) {
> > +             int seq = q->pkt_seq;
> > +
> >               /* NOLOCK qdisc must check 'state' under the qdisc seqlock
> >                * to avoid racing with dev_qdisc_reset()
> >                */
> > @@ -125,6 +127,9 @@ static inline void qdisc_run(struct Qdis
> >                   likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
> >                       __qdisc_run(q);
> >               qdisc_run_end(q);
> > +
> > +             if (!(q->flags & TCQ_F_NOLOCK) || seq == q->pkt_seq)
> > +                     return;
> >       }
> >  }
>
> The echo from Feng indicates that it's hard to conclude that TCQ_F_NOLOCK
> is the culprit, lets try again with it ignored for now.
>
> Every pkt enqueued on pfifo_fast is tracked in the below diff, and those
> pkts enqueued while we're running qdisc are detected and handled to cut
> the chance for the stuck pkts reported.
>
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -79,6 +79,7 @@ struct Qdisc {
>  #define TCQ_F_INVISIBLE                0x80 /* invisible by default in dump */
>  #define TCQ_F_NOLOCK           0x100 /* qdisc does not require locking */
>  #define TCQ_F_OFFLOADED                0x200 /* qdisc is offloaded to HW */
> +       int                     pkt_seq;
>         u32                     limit;
>         const struct Qdisc_ops  *ops;
>         struct qdisc_size_table __rcu *stab;
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -631,6 +631,7 @@ static int pfifo_fast_enqueue(struct sk_
>                         return qdisc_drop(skb, qdisc, to_free);
>         }
>
> +       qdisc->pkt_seq++;
>         qdisc_update_stats_at_enqueue(qdisc, pkt_len);
>         return NET_XMIT_SUCCESS;
>  }
> --- a/include/net/pkt_sched.h
> +++ b/include/net/pkt_sched.h
> @@ -117,7 +117,8 @@ void __qdisc_run(struct Qdisc *q);
>
>  static inline void qdisc_run(struct Qdisc *q)
>  {
> -       if (qdisc_run_begin(q)) {
> +       while (qdisc_run_begin(q)) {
> +               int seq = q->pkt_seq;
>                 /* NOLOCK qdisc must check 'state' under the qdisc seqlock
>                  * to avoid racing with dev_qdisc_reset()
>                  */
> @@ -125,6 +126,12 @@ static inline void qdisc_run(struct Qdis
>                     likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
>                         __qdisc_run(q);
>                 qdisc_run_end(q);
> +
> +               /* go another round if there are pkts enqueued after
> +                * taking seq_lock
> +                */
> +               if (seq != q->pkt_seq)
> +                       continue;
>         }
>  }
>
>

[-- Attachment #2: fix_nolock_from_hillf.patch --]
[-- Type: application/octet-stream, Size: 1260 bytes --]

--- include/net/sch_generic.h.orig	2020-08-21 15:13:51.787952710 +0800
+++ include/net/sch_generic.h	2020-08-24 22:01:46.718709912 +0800
@@ -79,6 +79,7 @@
 #define TCQ_F_INVISIBLE		0x80 /* invisible by default in dump */
 #define TCQ_F_NOLOCK		0x100 /* qdisc does not require locking */
 #define TCQ_F_OFFLOADED		0x200 /* qdisc is offloaded to HW */
+	int                     pkt_seq;
 	u32			limit;
 	const struct Qdisc_ops	*ops;
 	struct qdisc_size_table	__rcu *stab;
--- net/sched/sch_generic.c.orig	2020-08-24 22:02:04.589830751 +0800
+++ net/sched/sch_generic.c	2020-08-24 22:03:48.010728381 +0800
@@ -638,6 +638,8 @@
 	 * so we better not use qdisc_qstats_cpu_backlog_inc()
 	 */
 	this_cpu_add(qdisc->cpu_qstats->backlog, pkt_len);
+
+	qdisc->pkt_seq++;
 	return NET_XMIT_SUCCESS;
 }
 
--- include/net/pkt_sched.h.orig	2020-08-21 15:13:51.787952710 +0800
+++ include/net/pkt_sched.h	2020-08-24 22:06:58.856005213 +0800
@@ -116,9 +116,16 @@
 
 static inline void qdisc_run(struct Qdisc *q)
 {
-	if (qdisc_run_begin(q)) {
+	while (qdisc_run_begin(q)) {
+		int seq = q->pkt_seq;
 		__qdisc_run(q);
 		qdisc_run_end(q);
+
+		/* go another round if there are pkts enqueued after
+ 		* taking seq_lock
+ 		*/
+ 		if (seq != q->pkt_seq)
+			continue;
 	}
 }
 

  parent reply	other threads:[~2020-08-25  2:18 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-09  6:46 Packet gets stuck in NOLOCK pfifo_fast qdisc Jonas Bonn
2019-10-09 19:14 ` Paolo Abeni
2019-10-10  6:27   ` Jonas Bonn
2019-10-11  0:39   ` Jonas Bonn
2020-06-23 13:42     ` Michael Zhivich
2020-06-30 19:14       ` Josh Hunt
2020-07-01  7:53         ` Jonas Bonn
2020-07-01 16:05         ` Cong Wang
2020-07-01 19:58           ` Cong Wang
2020-07-01 22:02             ` Josh Hunt
2020-07-02  6:14             ` Jonas Bonn
2020-07-02  9:45               ` Paolo Abeni
2020-07-02 18:08                 ` Josh Hunt
2020-07-07 14:18                   ` Paolo Abeni
2020-07-08 20:16                     ` Cong Wang
2020-07-09  9:20                       ` Paolo Abeni
2020-07-08 20:33                   ` Zhivich, Michael
2020-08-20  7:43                   ` Jike Song
2020-08-20 18:13                     ` Josh Hunt
     [not found]                     ` <20200822032800.16296-1-hdanton@sina.com>
2020-08-25  2:18                       ` Fengkehuan Feng [this message]
     [not found]                         ` <20200825032312.11776-1-hdanton@sina.com>
2020-08-25  7:14                           ` Fengkehuan Feng
     [not found]                             ` <20200825162329.11292-1-hdanton@sina.com>
2020-08-26  2:38                               ` Kehuan Feng
     [not found]                                 ` <CACS=qqKptAQQGiMoCs1Zgs9S4ZppHhasy1AK4df2NxnCDR+vCw@mail.gmail.com>
     [not found]                                   ` <5f46032e.1c69fb81.9880c.7a6cSMTPIN_ADDED_MISSING@mx.google.com>
2020-08-27  6:56                                     ` Kehuan Feng
     [not found]                                       ` <20200827125747.5816-1-hdanton@sina.com>
2020-08-28  1:45                                         ` Kehuan Feng
2020-09-03  5:01                                           ` Cong Wang
2020-09-03  8:39                                             ` Paolo Abeni
2020-09-03 17:43                                               ` Cong Wang
2020-09-04  5:07                                                 ` John Fastabend
2020-09-10 20:15                                                   ` Cong Wang
2020-09-10 21:07                                                     ` John Fastabend
2020-09-10 21:40                                                       ` Paolo Abeni
2021-04-02 19:25                                                   ` Jiri Kosina
2021-04-02 19:33                                                     ` Josh Hunt
     [not found]                                                     ` <20210403003537.2032-1-hdanton@sina.com>
2021-04-03 12:23                                                       ` Jiri Kosina
2021-04-06  0:55                                                         ` Yunsheng Lin
2021-04-06  7:06                                                           ` Michal Kubecek
2021-04-06 10:13                                                             ` Juergen Gross
2021-04-06 12:17                                                               ` Yunsheng Lin
2021-04-06  1:49                                                         ` Cong Wang
2021-04-06  2:46                                                           ` Yunsheng Lin
2021-04-06  7:31                                                             ` Michal Kubecek
2021-04-06 12:24                                                               ` Yunsheng Lin
     [not found]                                               ` <20200903101957.428-1-hdanton@sina.com>
2020-09-04  3:20                                                 ` Kehuan Feng
2020-09-10 20:19                                                   ` Cong Wang
2020-09-14  2:10                                                     ` Yunsheng Lin
2020-09-17 19:52                                                       ` Cong Wang
2020-09-18  2:06                                                         ` Kehuan Feng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACS=qqKhsu6waaXndO5tQL_gC9TztuUQpqQigJA2Ac0y12czMQ@mail.gmail.com' \
    --to=kehuan.feng@gmail.com \
    --cc=albcamus@gmail.com \
    --cc=davem@davemloft.net \
    --cc=hdanton@sina.com \
    --cc=john.fastabend@gmail.com \
    --cc=johunt@akamai.com \
    --cc=jonas.bonn@netrounds.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mzhivich@akamai.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.