bpf.vger.kernel.org archive mirror
From: Cong Wang <xiyou.wangcong@gmail.com>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: "David Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Vladimir Oltean" <olteanv@gmail.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andriin@fb.com>,
	"Eric Dumazet" <edumazet@google.com>,
	"Wei Wang" <weiwan@google.com>,
	"Cong Wang ." <cong.wang@bytedance.com>,
	"Taehee Yoo" <ap420073@gmail.com>,
	"Linux Kernel Network Developers" <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linuxarm@openeuler.org, "Marc Kleine-Budde" <mkl@pengutronix.de>,
	linux-can@vger.kernel.org, "Jamal Hadi Salim" <jhs@mojatatu.com>,
	"Jiri Pirko" <jiri@resnulli.us>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	kpsingh@kernel.org, bpf <bpf@vger.kernel.org>,
	"Jonas Bonn" <jonas.bonn@netrounds.com>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Michael Zhivich" <mzhivich@akamai.com>,
	"Josh Hunt" <johunt@akamai.com>, "Jike Song" <albcamus@gmail.com>,
	"Kehuan Feng" <kehuan.feng@gmail.com>,
	"Ahmad Fatoum" <a.fatoum@pengutronix.de>,
	atenart@kernel.org, "Alexander Duyck" <alexander.duyck@gmail.com>,
	"Hillf Danton" <hdanton@sina.com>,
	jgross@suse.com, JKosina@suse.com,
	"Michal Kubecek" <mkubecek@suse.cz>,
	"Björn Töpel" <bjorn@kernel.org>,
	"Alexander Lobakin" <alobakin@pm.me>
Subject: Re: [PATCH net v8 1/3] net: sched: fix packet stuck problem for lockless qdisc
Date: Fri, 14 May 2021 16:36:16 -0700
Message-ID: <CAM_iQpXWgYQxf8Ba-D4JQJMPUaR9MBfQFTLFCHWJMVq9PcUWRg@mail.gmail.com>
In-Reply-To: <1620959218-17250-2-git-send-email-linyunsheng@huawei.com>

On Thu, May 13, 2021 at 7:27 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>  struct qdisc_size_table {
> @@ -159,8 +160,33 @@ static inline bool qdisc_is_empty(const struct Qdisc *qdisc)
>  static inline bool qdisc_run_begin(struct Qdisc *qdisc)
>  {
>         if (qdisc->flags & TCQ_F_NOLOCK) {
> +               if (spin_trylock(&qdisc->seqlock))
> +                       goto nolock_empty;
> +
> +               /* If the MISSED flag is set, it means other thread has
> +                * set the MISSED flag before second spin_trylock(), so
> +                * we can return false here to avoid multi cpus doing
> +                * the set_bit() and second spin_trylock() concurrently.
> +                */
> +               if (test_bit(__QDISC_STATE_MISSED, &qdisc->state))
> +                       return false;
> +
> +               /* Set the MISSED flag before the second spin_trylock(),
> +                * if the second spin_trylock() return false, it means
> +                * other cpu holding the lock will do dequeuing for us
> +                * or it will see the MISSED flag set after releasing
> +                * lock and reschedule the net_tx_action() to do the
> +                * dequeuing.
> +                */
> +               set_bit(__QDISC_STATE_MISSED, &qdisc->state);
> +
> +               /* Retry again in case other CPU may not see the new flag
> +                * after it releases the lock at the end of qdisc_run_end().
> +                */
>                 if (!spin_trylock(&qdisc->seqlock))
>                         return false;
> +
> +nolock_empty:
>                 WRITE_ONCE(qdisc->empty, false);
>         } else if (qdisc_is_running(qdisc)) {
>                 return false;
> @@ -176,8 +202,15 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
>  static inline void qdisc_run_end(struct Qdisc *qdisc)
>  {
>         write_seqcount_end(&qdisc->running);
> -       if (qdisc->flags & TCQ_F_NOLOCK)
> +       if (qdisc->flags & TCQ_F_NOLOCK) {
>                 spin_unlock(&qdisc->seqlock);
> +
> +               if (unlikely(test_bit(__QDISC_STATE_MISSED,
> +                                     &qdisc->state))) {
> +                       clear_bit(__QDISC_STATE_MISSED, &qdisc->state);


We have test_and_clear_bit(), which is atomic; a separate test_bit()
followed by clear_bit() is not.
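
To illustrate the difference, here is a minimal user-space sketch built on
C11 atomics. The demo_* helpers and DEMO_STATE_MISSED are hypothetical
stand-ins that only mimic the kernel bitops; they are not the kernel
implementation. The point is that the test and the clear happen in one
read-modify-write, so only one caller can observe the bit as set:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; the demo_ names are not kernel APIs. */
#define DEMO_STATE_MISSED 0

static void demo_set_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_or(addr, 1UL << nr);
}

static bool demo_test_bit(int nr, atomic_ulong *addr)
{
	return (atomic_load(addr) >> nr) & 1UL;
}

/* One read-modify-write: clear the bit and return its previous value. */
static bool demo_test_and_clear_bit(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_and(addr, ~mask) & mask) != 0;
}

/* Returns true only for the single caller that consumed the flag. */
static bool demo_consume_missed(atomic_ulong *state)
{
	return demo_test_and_clear_bit(DEMO_STATE_MISSED, state);
}
```

With the flag set once, the first demo_consume_missed() call returns true
and any later call returns false until the flag is set again.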


> +                       __netif_schedule(qdisc);
> +               }
> +       }
>  }
>
>  static inline bool qdisc_may_bulk(const struct Qdisc *qdisc)
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 44991ea..795d986 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -640,8 +640,10 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>  {
>         struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
>         struct sk_buff *skb = NULL;
> +       bool need_retry = true;
>         int band;
>
> +retry:
>         for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
>                 struct skb_array *q = band2list(priv, band);
>
> @@ -652,6 +654,23 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>         }
>         if (likely(skb)) {
>                 qdisc_update_stats_at_dequeue(qdisc, skb);
> +       } else if (need_retry &&
> +                  test_bit(__QDISC_STATE_MISSED, &qdisc->state)) {
> +               /* Delay clearing the STATE_MISSED here to reduce
> +                * the overhead of the second spin_trylock() in
> +                * qdisc_run_begin() and __netif_schedule() calling
> +                * in qdisc_run_end().
> +                */
> +               clear_bit(__QDISC_STATE_MISSED, &qdisc->state);

Ditto.

> +
> +               /* Make sure dequeuing happens after clearing
> +                * STATE_MISSED.
> +                */
> +               smp_mb__after_atomic();
> +
> +               need_retry = false;
> +
> +               goto retry;

If two concurrent pfifo_fast_dequeue() calls test __QDISC_STATE_MISSED
at the same time and both see it set, both would clear it and both
would retry. Is this a problem?
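
The interleaving I have in mind can be replayed deterministically in user
space. This is only a sketch: the demo_* names are hypothetical, the two
"callers" are simulated step by step in one thread, and it assumes the two
dequeue paths really can overlap, which may not hold if they are already
serialized by qdisc->seqlock:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the __QDISC_STATE_MISSED bit. */
#define DEMO_MISSED_MASK 1UL

/* Separate test and clear: both callers test before either one clears. */
static int demo_retries_split(void)
{
	atomic_ulong state;
	bool a_sees, b_sees;
	int retries = 0;

	atomic_init(&state, DEMO_MISSED_MASK);

	a_sees = (atomic_load(&state) & DEMO_MISSED_MASK) != 0;
	b_sees = (atomic_load(&state) & DEMO_MISSED_MASK) != 0;

	if (a_sees) {
		atomic_fetch_and(&state, ~DEMO_MISSED_MASK);
		retries++;
	}
	if (b_sees) {
		atomic_fetch_and(&state, ~DEMO_MISSED_MASK);
		retries++;
	}
	return retries;
}

/* Combined test-and-clear: the second caller already sees the bit cleared. */
static int demo_retries_combined(void)
{
	atomic_ulong state;
	int retries = 0;

	atomic_init(&state, DEMO_MISSED_MASK);

	if (atomic_fetch_and(&state, ~DEMO_MISSED_MASK) & DEMO_MISSED_MASK)
		retries++;
	if (atomic_fetch_and(&state, ~DEMO_MISSED_MASK) & DEMO_MISSED_MASK)
		retries++;
	return retries;
}
```

In this replay the split variant ends up with two retries for a single
MISSED event, while the combined variant ends up with one.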

Also, any reason why you want pfifo_fast to handle a generic
Qdisc flag? IOW, why not handle this logic in, for example,
qdisc_restart()?

Thanks.
