From: Jakub Kicinski <kuba@kernel.org>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: <davem@davemloft.net>, <olteanv@gmail.com>, <ast@kernel.org>,
<daniel@iogearbox.net>, <andriin@fb.com>, <edumazet@google.com>,
<weiwan@google.com>, <cong.wang@bytedance.com>,
<ap420073@gmail.com>, <netdev@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <linuxarm@openeuler.org>,
<mkl@pengutronix.de>, <linux-can@vger.kernel.org>,
<jhs@mojatatu.com>, <xiyou.wangcong@gmail.com>,
<jiri@resnulli.us>, <andrii@kernel.org>, <kafai@fb.com>,
<songliubraving@fb.com>, <yhs@fb.com>, <john.fastabend@gmail.com>,
<kpsingh@kernel.org>, <bpf@vger.kernel.org>,
<jonas.bonn@netrounds.com>, <pabeni@redhat.com>,
<mzhivich@akamai.com>, <johunt@akamai.com>, <albcamus@gmail.com>,
<kehuan.feng@gmail.com>, <a.fatoum@pengutronix.de>,
<atenart@kernel.org>, <alexander.duyck@gmail.com>,
<hdanton@sina.com>, <jgross@suse.com>, <JKosina@suse.com>,
<mkubecek@suse.cz>, <bjorn@kernel.org>
Subject: Re: [PATCH net v5 1/3] net: sched: fix packet stuck problem for lockless qdisc
Date: Fri, 7 May 2021 16:57:03 -0700
Message-ID: <20210507165703.70771c55@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
In-Reply-To: <1620266264-48109-2-git-send-email-linyunsheng@huawei.com>

On Thu, 6 May 2021 09:57:42 +0800 Yunsheng Lin wrote:
> @@ -159,8 +160,37 @@ static inline bool qdisc_is_empty(const struct Qdisc *qdisc)
>  static inline bool qdisc_run_begin(struct Qdisc *qdisc)
>  {
>  	if (qdisc->flags & TCQ_F_NOLOCK) {
> +		bool dont_retry = test_bit(__QDISC_STATE_MISSED,
> +					   &qdisc->state);
> +
> +		if (spin_trylock(&qdisc->seqlock))
> +			goto nolock_empty;
> +
> +		/* If the flag is set before doing the spin_trylock() and
> +		 * the above spin_trylock() return false, it means other cpu
> +		 * holding the lock will do dequeuing for us, or it wil see

s/wil/will/

> +		 * the flag set after releasing lock and reschedule the
> +		 * net_tx_action() to do the dequeuing.

I don't understand why MISSED is checked before the trylock.
Could you explain why it can't be tested directly here?

> +		 */
> +		if (dont_retry)
> +			return false;
> +
> +		/* We could do set_bit() before the first spin_trylock(),
> +		 * and avoid doing second spin_trylock() completely, then
> +		 * we could have multi cpus doing the set_bit(). Here use
> +		 * dont_retry to avoid doing the set_bit() and the second
> +		 * spin_trylock(), which has 5% performance improvement than
> +		 * doing the set_bit() before the first spin_trylock().
> +		 */
> +		set_bit(__QDISC_STATE_MISSED, &qdisc->state);
> +
> +		/* Retry again in case other CPU may not see the new flag
> +		 * after it releases the lock at the end of qdisc_run_end().
> +		 */
>  		if (!spin_trylock(&qdisc->seqlock))
>  			return false;
> +
> +nolock_empty:
>  		WRITE_ONCE(qdisc->empty, false);
>  	} else if (qdisc_is_running(qdisc)) {
>  		return false;
> @@ -176,8 +206,13 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
>  static inline void qdisc_run_end(struct Qdisc *qdisc)
>  {
>  	write_seqcount_end(&qdisc->running);
> -	if (qdisc->flags & TCQ_F_NOLOCK)
> +	if (qdisc->flags & TCQ_F_NOLOCK) {
>  		spin_unlock(&qdisc->seqlock);
> +
> +		if (unlikely(test_bit(__QDISC_STATE_MISSED,
> +				      &qdisc->state)))
> +			__netif_schedule(qdisc);
> +	}
>  }
>
>  static inline bool qdisc_may_bulk(const struct Qdisc *qdisc)
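
For readers following the two hunks above, the handshake between qdisc_run_begin() and qdisc_run_end() can be modelled in plain userspace C. This is only a single-threaded sketch of the quoted control flow, not kernel code: `struct model_qdisc`, `model_trylock()`, `model_run_begin()`, `model_run_end()` and `model_scenario()` are hypothetical names, the spinlock and the MISSED bit are reduced to plain booleans, and no real concurrency or memory ordering is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct Qdisc; not kernel code. */
struct model_qdisc {
	bool locked;		/* models qdisc->seqlock */
	bool missed;		/* models __QDISC_STATE_MISSED */
	int rescheduled;	/* counts modelled __netif_schedule() calls */
};

static bool model_trylock(struct model_qdisc *q)
{
	if (q->locked)
		return false;
	q->locked = true;
	return true;
}

/* Mirrors the control flow of the quoted qdisc_run_begin() hunk. */
static bool model_run_begin(struct model_qdisc *q)
{
	bool dont_retry = q->missed;	/* sampled before the first trylock */

	if (model_trylock(q))
		return true;
	if (dont_retry)
		return false;		/* someone already flagged the miss */
	q->missed = true;
	return model_trylock(q);	/* second attempt after setting the flag */
}

/* Mirrors the quoted qdisc_run_end() hunk: unlock first, then check. */
static void model_run_end(struct model_qdisc *q)
{
	assert(q->locked);		/* only the lock holder may release */
	q->locked = false;
	if (q->missed)
		q->rescheduled++;
}

/* One owner takes the lock, `contenders` other callers fail to take it,
 * then the owner releases. Returns the number of modelled reschedules,
 * or -1 if the model behaved unexpectedly. The flag is never cleared
 * here; in the patch that happens in pfifo_fast_dequeue().
 */
static int model_scenario(int contenders)
{
	struct model_qdisc q = { false, false, 0 };
	int i;

	if (!model_run_begin(&q))
		return -1;
	for (i = 0; i < contenders; i++)
		if (model_run_begin(&q))
			return -1;
	model_run_end(&q);
	return q.rescheduled;
}
```

In this model only the first contender performs the flag write and the second trylock; later contenders return early through the dont_retry path, and the owner reschedules once on release regardless of how many callers missed.
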
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 44991ea..9bc73ea 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -640,8 +640,10 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>  {
>  	struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
>  	struct sk_buff *skb = NULL;
> +	bool need_retry = true;
>  	int band;
>
> +retry:
>  	for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
>  		struct skb_array *q = band2list(priv, band);
>
> @@ -652,6 +654,16 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>  	}
>  	if (likely(skb)) {
>  		qdisc_update_stats_at_dequeue(qdisc, skb);
> +	} else if (need_retry &&
> +		   test_and_clear_bit(__QDISC_STATE_MISSED,
> +				      &qdisc->state)) {

Why test_and_clear_bit() here? AFAICT this is the only place the bit
is cleared, so the test and the clear do not have to be atomic.
To my limited understanding, on x86 test_bit() is never a locked
operation, while test_and_clear_bit() is always locked. So we'd save
an atomic operation in the uncontended case if we tested first and then
cleared.

> +		/* do another dequeuing after clearing the flag to
> +		 * avoid calling __netif_schedule().
> +		 */
> +		smp_mb__after_atomic();

A test_and_clear_bit() which returned true implies a memory barrier,
AFAIU, so the barrier is not needed with the code as is. It will be
needed if we switch to test_bit() + clear_bit(), but please clarify
what it is pairing with.

> +		need_retry = false;
> +
> +		goto retry;
>  	} else {
>  		WRITE_ONCE(qdisc->empty, true);
>  	}
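
To make the test-then-clear suggestion above concrete, here is a minimal userspace sketch using C11 atomics instead of the kernel bitops. `MODEL_MISSED`, `model_test_and_clear()`, `model_test_then_clear()` and `model_variants_agree()` are hypothetical names for illustration only. The second variant relies on the assumption stated above that the bit is cleared in exactly one place; the fence merely stands in for smp_mb__after_atomic(), and what it should pair with remains the open question.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the __QDISC_STATE_MISSED bit. */
#define MODEL_MISSED 0x1UL

/* Variant A: unconditional atomic read-modify-write, analogous in
 * shape to test_and_clear_bit().
 */
static bool model_test_and_clear(atomic_ulong *state)
{
	return (atomic_fetch_and(state, ~MODEL_MISSED) & MODEL_MISSED) != 0;
}

/* Variant B: plain load first, so the common "bit not set" case does
 * no read-modify-write at all. Only valid while a single caller ever
 * clears the bit.
 */
static bool model_test_then_clear(atomic_ulong *state)
{
	if (!(atomic_load_explicit(state, memory_order_relaxed) & MODEL_MISSED))
		return false;

	atomic_fetch_and_explicit(state, ~MODEL_MISSED, memory_order_relaxed);
	/* Explicit full fence, standing in for smp_mb__after_atomic(). */
	atomic_thread_fence(memory_order_seq_cst);
	return true;
}

/* Single-threaded check that both variants report the same result and
 * leave the same value behind for a given starting word.
 */
static bool model_variants_agree(unsigned long initial)
{
	atomic_ulong a, b;
	bool ra, rb;

	atomic_init(&a, initial);
	atomic_init(&b, initial);
	ra = model_test_and_clear(&a);
	rb = model_test_then_clear(&b);
	return ra == rb && atomic_load(&a) == atomic_load(&b);
}
```

Both variants return the same result when run from a single thread; the difference is only that variant B skips the atomic update when the flag is already clear, which is the uncontended case discussed above.
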

Thread overview: 7+ messages
2021-05-06 1:57 [PATCH net v5 0/3] fix packet stuck problem for lockless qdisc Yunsheng Lin
2021-05-06 1:57 ` [PATCH net v5 1/3] net: sched: " Yunsheng Lin
2021-05-07 23:57 ` Jakub Kicinski [this message]
2021-05-08 2:55 ` Yunsheng Lin
2021-05-08 3:05 ` Jakub Kicinski
2021-05-06 1:57 ` [PATCH net v5 2/3] net: sched: fix endless tx action reschedule during deactivation Yunsheng Lin
2021-05-06 1:57 ` [PATCH net v5 3/3] net: sched: fix tx action reschedule issue with stopped queue Yunsheng Lin