All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Oltean <olteanv@gmail.com>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: davem@davemloft.net, kuba@kernel.org, ast@kernel.org,
	daniel@iogearbox.net, andriin@fb.com, edumazet@google.com,
	weiwan@google.com, cong.wang@bytedance.com, ap420073@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@openeuler.org, mkl@pengutronix.de,
	linux-can@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, andrii@kernel.org,
	kafai@fb.com, songliubraving@fb.com, yhs@fb.com,
	john.fastabend@gmail.com, kpsingh@kernel.org,
	bpf@vger.kernel.org, jonas.bonn@netrounds.com, pabeni@redhat.com,
	mzhivich@akamai.com, johunt@akamai.com, albcamus@gmail.com,
	kehuan.feng@gmail.com, a.fatoum@pengutronix.de
Subject: Re: [RFC v3] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc
Date: Mon, 22 Mar 2021 22:00:33 +0200	[thread overview]
Message-ID: <20210322200033.uphemtsunfqsvjej@skbuf> (raw)
In-Reply-To: <1616404156-11772-1-git-send-email-linyunsheng@huawei.com>

Hi Yunsheng,

On Mon, Mar 22, 2021 at 05:09:16PM +0800, Yunsheng Lin wrote:
> Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK
> flag set, but queue discipline by-pass does not work for lockless
> qdisc because skb is always enqueued to qdisc even when the qdisc
> is empty, see __dev_xmit_skb().
> 
> This patch calls sch_direct_xmit() to transmit the skb directly
> to the driver for empty lockless qdisc too, which aviod enqueuing
> and dequeuing operation. qdisc->empty is set to false whenever a
> skb is enqueued, see pfifo_fast_enqueue(), and is set to true when
> skb dequeuing return NULL, see pfifo_fast_dequeue().
> 
> There is a data race between enqueue/dequeue and qdisc->empty
> setting, qdisc->empty is only used as a hint, so we need to call
> sch_may_need_requeuing() to see if the queue is really empty and if
> there is requeued skb, which has higher priority than the current
> skb.
> 
> The performance for ip_forward test increases about 10% with this
> patch.
> 
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
> Hi, Vladimir and Ahmad
> 	Please give it a test to see if there is any out of order
> packet for this patch, which has removed the priv->lock added in
> RFC v2.
> 
> There is a data race as below:
> 
>       CPU1                                   CPU2
> qdisc_run_begin(q)                            .
>         .                                q->enqueue()
> sch_may_need_requeuing()                      .
>     return true                               .
>         .                                     .
>         .                                     .
>     q->enqueue()                              .
> 
> When above happen, the skb enqueued by CPU1 is dequeued after the
> skb enqueued by CPU2 because sch_may_need_requeuing() return true.
> If there is not qdisc bypass, the CPU1 has better chance to queue
> the skb quicker than CPU2.
> 
> This patch does not take care of the above data race, because I
> view this as similar as below:
> 
> Even at the same time CPU1 and CPU2 write the skb to two socket
> which both heading to the same qdisc, there is no guarantee that
> which skb will hit the qdisc first, becuase there is a lot of
> factor like interrupt/softirq/cache miss/scheduling afffecting
> that.
> 
> So I hope the above data race will not cause problem for Vladimir
> and Ahmad.
> ---

Preliminary results on my test setup look fine, but please allow me to
run the canfdtest overnight, since as you say, races are still
theoretically possible.

  reply	other threads:[~2021-03-22 20:01 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-18  6:53 [PATCH net] net: sched: fix packet stuck problem for lockless qdisc Yunsheng Lin
2021-03-19  9:25 ` [Linuxarm] " Yunsheng Lin
2021-03-19 19:45   ` Cong Wang
2021-03-22  1:37     ` Yunsheng Lin
2021-03-19 19:40 ` Cong Wang
2021-03-22  1:31   ` Yunsheng Lin
2021-03-22  9:09 ` [RFC v3] net: sched: implement TCQ_F_CAN_BYPASS " Yunsheng Lin
2021-03-22 20:00   ` Vladimir Oltean [this message]
2021-03-23 11:39     ` Vladimir Oltean
2021-03-23  6:37   ` Ahmad Fatoum
2021-03-23 11:34     ` Yunsheng Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210322200033.uphemtsunfqsvjej@skbuf \
    --to=olteanv@gmail.com \
    --cc=a.fatoum@pengutronix.de \
    --cc=albcamus@gmail.com \
    --cc=andrii@kernel.org \
    --cc=andriin@fb.com \
    --cc=ap420073@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=johunt@akamai.com \
    --cc=jonas.bonn@netrounds.com \
    --cc=kafai@fb.com \
    --cc=kehuan.feng@gmail.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-can@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@openeuler.org \
    --cc=linyunsheng@huawei.com \
    --cc=mkl@pengutronix.de \
    --cc=mzhivich@akamai.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=songliubraving@fb.com \
    --cc=weiwan@google.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.