netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Yunsheng Lin <yunshenglin0825@gmail.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>,
	davem@davemloft.net, olteanv@gmail.com, ast@kernel.org,
	daniel@iogearbox.net, andriin@fb.com, edumazet@google.com,
	weiwan@google.com, cong.wang@bytedance.com, ap420073@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@openeuler.org, mkl@pengutronix.de,
	linux-can@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, andrii@kernel.org,
	kafai@fb.com, songliubraving@fb.com, yhs@fb.com,
	john.fastabend@gmail.com, kpsingh@kernel.org,
	bpf@vger.kernel.org, jonas.bonn@netrounds.com, pabeni@redhat.com,
	mzhivich@akamai.com, johunt@akamai.com, albcamus@gmail.com,
	kehuan.feng@gmail.com, a.fatoum@pengutronix.de,
	atenart@kernel.org, alexander.duyck@gmail.com, hdanton@sina.com,
	jgross@suse.com, JKosina@suse.com, mkubecek@suse.cz,
	bjorn@kernel.org, alobakin@pm.me
Subject: Re: [PATCH net-next 2/3] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc
Date: Sun, 30 May 2021 13:21:11 -0700	[thread overview]
Message-ID: <20210530132111.3a974275@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net> (raw)
In-Reply-To: <9cc9f513-7655-07df-3c74-5abe07ae8321@gmail.com>

On Sun, 30 May 2021 09:37:09 +0800 Yunsheng Lin wrote:
> On 2021/5/30 2:49, Jakub Kicinski wrote:
> > The fact that MISSED is only cleared under q->seqlock does not matter,
> > because setting it and ->enqueue() are not under any lock. If the thread
> > gets interrupted between:
> > 
> > 	if (q->flags & TCQ_F_CAN_BYPASS && nolock_qdisc_is_empty(q) &&
> > 	    qdisc_run_begin(q)) {
> > 
> > and ->enqueue() we can't guarantee that something else won't come in,
> > take q->seqlock and clear MISSED.
> > 
> > thread1                thread2             thread3
> > # holds seqlock
> >                        qdisc_run_begin(q)
> >                        set(MISSED)
> > pfifo_fast_dequeue
> >   clear(MISSED)
> >   # recheck the queue
> > qdisc_run_end()  
> >                        ->enqueue()  
> >                                             q->flags & TCQ_F_CAN_BYPASS..
> >                                             qdisc_run_begin() # true
> >                                             sch_direct_xmit()
> >                        qdisc_run_begin()
> >                        set(MISSED)
> > 
> > Or am I missing something?
> > 
> > Re-checking nolock_qdisc_is_empty() may or may not help.
> > But it doesn't really matter because there is no ordering
> > requirement between thread2 and thread3 here.  
> 
> I were more focued on explaining that using MISSED is reliable
> as sch_may_need_requeuing() checking in RFCv3 [1] to indicate a
> empty qdisc, and forgot to mention the data race described in
> RFCv3, which is kind of like the one described above:
> 
> "There is a data race as below:
> 
>       CPU1                                   CPU2
> qdisc_run_begin(q)                            .
>         .                                q->enqueue()
> sch_may_need_requeuing()                      .
>     return true                               .
>         .                                     .
>         .                                     .
>     q->enqueue()                              .
> 
> When above happen, the skb enqueued by CPU1 is dequeued after the
> skb enqueued by CPU2 because sch_may_need_requeuing() return true.
> If there is not qdisc bypass, the CPU1 has better chance to queue
> the skb quicker than CPU2.
> 
> This patch does not take care of the above data race, because I
> view this as similar as below:
> 
> Even at the same time CPU1 and CPU2 write the skb to two socket
> which both heading to the same qdisc, there is no guarantee that
> which skb will hit the qdisc first, becuase there is a lot of
> factor like interrupt/softirq/cache miss/scheduling afffecting
> that."
> 
> Does above make sense? Or any idea to avoid it?
> 
> 1. https://patchwork.kernel.org/project/netdevbpf/patch/1616404156-11772-1-git-send-email-linyunsheng@huawei.com/

We agree on this one.

Could you draw a sequence diagram of different CPUs (like the one
above) for the case where removing re-checking nolock_qdisc_is_empty()
under q->seqlock leads to incorrect behavior? 

If there is no such case would you be willing to repeat the benchmark
with and without this test?

Sorry for dragging the review out..

  reply	other threads:[~2021-05-30 20:21 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-28  2:49 [PATCH net-next 0/3] Some optimization for lockless qdisc Yunsheng Lin
2021-05-28  2:49 ` [PATCH net-next 1/3] net: sched: avoid unnecessary seqcount operation " Yunsheng Lin
2021-05-28  2:49 ` [PATCH net-next 2/3] net: sched: implement TCQ_F_CAN_BYPASS " Yunsheng Lin
2021-05-29  1:00   ` Jakub Kicinski
2021-05-29  1:44     ` Yunsheng Lin
2021-05-29  4:32       ` Jakub Kicinski
2021-05-29  7:03         ` Yunsheng Lin
2021-05-29 18:49           ` Jakub Kicinski
2021-05-30  1:37             ` Yunsheng Lin
2021-05-30 20:21               ` Jakub Kicinski [this message]
2021-05-31  0:40                 ` Yunsheng Lin
2021-05-31  1:10                   ` [Linuxarm] " Yunsheng Lin
2021-05-31 12:40                     ` Yunsheng Lin
2021-06-01  4:51                       ` Jakub Kicinski
2021-06-01  8:18                         ` Yunsheng Lin
2021-06-01 20:48                           ` Jakub Kicinski
2021-06-02  1:21                             ` Yunsheng Lin
2021-06-02 16:28                               ` Jakub Kicinski
2021-05-28  2:49 ` [PATCH net-next 3/3] net: sched: remove qdisc->empty " Yunsheng Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210530132111.3a974275@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net \
    --to=kuba@kernel.org \
    --cc=JKosina@suse.com \
    --cc=a.fatoum@pengutronix.de \
    --cc=albcamus@gmail.com \
    --cc=alexander.duyck@gmail.com \
    --cc=alobakin@pm.me \
    --cc=andrii@kernel.org \
    --cc=andriin@fb.com \
    --cc=ap420073@gmail.com \
    --cc=ast@kernel.org \
    --cc=atenart@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hdanton@sina.com \
    --cc=jgross@suse.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=johunt@akamai.com \
    --cc=jonas.bonn@netrounds.com \
    --cc=kafai@fb.com \
    --cc=kehuan.feng@gmail.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-can@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@openeuler.org \
    --cc=linyunsheng@huawei.com \
    --cc=mkl@pengutronix.de \
    --cc=mkubecek@suse.cz \
    --cc=mzhivich@akamai.com \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=pabeni@redhat.com \
    --cc=songliubraving@fb.com \
    --cc=weiwan@google.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yhs@fb.com \
    --cc=yunshenglin0825@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).