linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: <davem@davemloft.net>, <olteanv@gmail.com>, <ast@kernel.org>,
	<daniel@iogearbox.net>, <andriin@fb.com>, <edumazet@google.com>,
	<weiwan@google.com>, <cong.wang@bytedance.com>,
	<ap420073@gmail.com>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linuxarm@openeuler.org>,
	<mkl@pengutronix.de>, <linux-can@vger.kernel.org>,
	<jhs@mojatatu.com>, <xiyou.wangcong@gmail.com>,
	<jiri@resnulli.us>, <andrii@kernel.org>, <kafai@fb.com>,
	<songliubraving@fb.com>, <yhs@fb.com>, <john.fastabend@gmail.com>,
	<kpsingh@kernel.org>, <bpf@vger.kernel.org>,
	<jonas.bonn@netrounds.com>, <pabeni@redhat.com>,
	<mzhivich@akamai.com>, <johunt@akamai.com>, <albcamus@gmail.com>,
	<kehuan.feng@gmail.com>, <a.fatoum@pengutronix.de>,
	<atenart@kernel.org>, <alexander.duyck@gmail.com>,
	<hdanton@sina.com>, <jgross@suse.com>, <JKosina@suse.com>,
	<mkubecek@suse.cz>, <bjorn@kernel.org>
Subject: Re: [PATCH net v5 1/3] net: sched: fix packet stuck problem for lockless qdisc
Date: Fri, 7 May 2021 16:57:03 -0700	[thread overview]
Message-ID: <20210507165703.70771c55@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com> (raw)
In-Reply-To: <1620266264-48109-2-git-send-email-linyunsheng@huawei.com>

On Thu, 6 May 2021 09:57:42 +0800 Yunsheng Lin wrote:
> @@ -159,8 +160,37 @@ static inline bool qdisc_is_empty(const struct Qdisc *qdisc)
>  static inline bool qdisc_run_begin(struct Qdisc *qdisc)
>  {
>  	if (qdisc->flags & TCQ_F_NOLOCK) {
> +		bool dont_retry = test_bit(__QDISC_STATE_MISSED,
> +					   &qdisc->state);
> +
> +		if (spin_trylock(&qdisc->seqlock))
> +			goto nolock_empty;
> +
> +		/* If the flag is set before doing the spin_trylock() and
> +		 * the above spin_trylock() return false, it means other cpu
> +		 * holding the lock will do dequeuing for us, or it wil see

s/wil/will/

> +		 * the flag set after releasing lock and reschedule the
> +		 * net_tx_action() to do the dequeuing.

I don't understand why MISSED is checked before the trylock.
Could you explain why it can't be tested directly here?

> +		 */
> +		if (dont_retry)
> +			return false;
> +
> +		/* We could do set_bit() before the first spin_trylock(),
> +		 * and avoid doing second spin_trylock() completely, then
> +		 * we could have multi cpus doing the set_bit(). Here use
> +		 * dont_retry to avoid doing the set_bit() and the second
> +		 * spin_trylock(), which has 5% performance improvement than
> +		 * doing the set_bit() before the first spin_trylock().
> +		 */
> +		set_bit(__QDISC_STATE_MISSED, &qdisc->state);
> +
> +		/* Retry again in case other CPU may not see the new flag
> +		 * after it releases the lock at the end of qdisc_run_end().
> +		 */
>  		if (!spin_trylock(&qdisc->seqlock))
>  			return false;
> +
> +nolock_empty:
>  		WRITE_ONCE(qdisc->empty, false);
>  	} else if (qdisc_is_running(qdisc)) {
>  		return false;
> @@ -176,8 +206,13 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
>  static inline void qdisc_run_end(struct Qdisc *qdisc)
>  {
>  	write_seqcount_end(&qdisc->running);
> -	if (qdisc->flags & TCQ_F_NOLOCK)
> +	if (qdisc->flags & TCQ_F_NOLOCK) {
>  		spin_unlock(&qdisc->seqlock);
> +
> +		if (unlikely(test_bit(__QDISC_STATE_MISSED,
> +				      &qdisc->state)))
> +			__netif_schedule(qdisc);
> +	}
>  }
>  
>  static inline bool qdisc_may_bulk(const struct Qdisc *qdisc)
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 44991ea..9bc73ea 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -640,8 +640,10 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>  {
>  	struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
>  	struct sk_buff *skb = NULL;
> +	bool need_retry = true;
>  	int band;
>  
> +retry:
>  	for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
>  		struct skb_array *q = band2list(priv, band);
>  
> @@ -652,6 +654,16 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>  	}
>  	if (likely(skb)) {
>  		qdisc_update_stats_at_dequeue(qdisc, skb);
> +	} else if (need_retry &&
> +		   test_and_clear_bit(__QDISC_STATE_MISSED,
> +				      &qdisc->state)) {

Why test_and_clear_bit() here? AFAICT this is the only place the bit 
is cleared. So the test and clear do not have to be atomic.

To my limited understanding on x86 test_bit() is never a locked
operation, while test_and_clear_bit() is always locked. So we'd save
an atomic operation in un-contended case if we tested first and then
cleared.

> +		/* do another dequeuing after clearing the flag to
> +		 * avoid calling __netif_schedule().
> +		 */
> +		smp_mb__after_atomic();

test_and_clear_bit() which returned true implies a memory barrier,
AFAIU, so the barrier is not needed with the code as is. It will be
needed if we switch to test_bit() + clear_bit(), but please clarify
what it is paring with.

> +		need_retry = false;
> +
> +		goto retry;
>  	} else {
>  		WRITE_ONCE(qdisc->empty, true);
>  	}


  reply	other threads:[~2021-05-07 23:57 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-06  1:57 [PATCH net v5 0/3] fix packet stuck problem for lockless qdisc Yunsheng Lin
2021-05-06  1:57 ` [PATCH net v5 1/3] net: sched: " Yunsheng Lin
2021-05-07 23:57   ` Jakub Kicinski [this message]
2021-05-08  2:55     ` Yunsheng Lin
2021-05-08  3:05       ` Jakub Kicinski
2021-05-06  1:57 ` [PATCH net v5 2/3] net: sched: fix endless tx action reschedule during deactivation Yunsheng Lin
2021-05-06  1:57 ` [PATCH net v5 3/3] net: sched: fix tx action reschedule issue with stopped queue Yunsheng Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210507165703.70771c55@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com \
    --to=kuba@kernel.org \
    --cc=JKosina@suse.com \
    --cc=a.fatoum@pengutronix.de \
    --cc=albcamus@gmail.com \
    --cc=alexander.duyck@gmail.com \
    --cc=andrii@kernel.org \
    --cc=andriin@fb.com \
    --cc=ap420073@gmail.com \
    --cc=ast@kernel.org \
    --cc=atenart@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hdanton@sina.com \
    --cc=jgross@suse.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=johunt@akamai.com \
    --cc=jonas.bonn@netrounds.com \
    --cc=kafai@fb.com \
    --cc=kehuan.feng@gmail.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-can@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@openeuler.org \
    --cc=linyunsheng@huawei.com \
    --cc=mkl@pengutronix.de \
    --cc=mkubecek@suse.cz \
    --cc=mzhivich@akamai.com \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=pabeni@redhat.com \
    --cc=songliubraving@fb.com \
    --cc=weiwan@google.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).