netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: <davem@davemloft.net>, <olteanv@gmail.com>, <ast@kernel.org>,
	<daniel@iogearbox.net>, <andriin@fb.com>, <edumazet@google.com>,
	<weiwan@google.com>, <cong.wang@bytedance.com>,
	<ap420073@gmail.com>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linuxarm@openeuler.org>,
	<mkl@pengutronix.de>, <linux-can@vger.kernel.org>,
	<jhs@mojatatu.com>, <xiyou.wangcong@gmail.com>,
	<jiri@resnulli.us>, <andrii@kernel.org>, <kafai@fb.com>,
	<songliubraving@fb.com>, <yhs@fb.com>, <john.fastabend@gmail.com>,
	<kpsingh@kernel.org>, <bpf@vger.kernel.org>,
	<jonas.bonn@netrounds.com>, <pabeni@redhat.com>,
	<mzhivich@akamai.com>, <johunt@akamai.com>, <albcamus@gmail.com>,
	<kehuan.feng@gmail.com>, <a.fatoum@pengutronix.de>,
	<atenart@kernel.org>, <alexander.duyck@gmail.com>,
	<hdanton@sina.com>, <jgross@suse.com>, <JKosina@suse.com>,
	<mkubecek@suse.cz>, <bjorn@kernel.org>, <alobakin@pm.me>
Subject: Re: [PATCH net-next 2/3] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc
Date: Sat, 29 May 2021 15:03:09 +0800	[thread overview]
Message-ID: <ee1a62da-9758-70db-abd3-c5ca2e8e0ce0@huawei.com> (raw)
In-Reply-To: <20210528213218.2b90864c@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net>

On 2021/5/29 12:32, Jakub Kicinski wrote:
> On Sat, 29 May 2021 09:44:57 +0800 Yunsheng Lin wrote:
>>>> @@ -3852,10 +3852,32 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>>>>  	qdisc_calculate_pkt_len(skb, q);
>>>>  
>>>>  	if (q->flags & TCQ_F_NOLOCK) {
>>>> +		if (q->flags & TCQ_F_CAN_BYPASS && nolock_qdisc_is_empty(q) &&
>>>> +		    qdisc_run_begin(q)) {
>>>> +			/* Retest nolock_qdisc_is_empty() within the protection
>>>> +			 * of q->seqlock to ensure qdisc is indeed empty.
>>>> +			 */
>>>> +			if (unlikely(!nolock_qdisc_is_empty(q))) {  
>>>
>>> This is just for the DRAINING case right? 
>>>
>>> MISSED can be set at any moment, testing MISSED seems confusing.  
>>
>> MISSED is only set when there is lock contention, which means it
>> is better not to do the qdisc bypass to avoid out of order packet
>> problem, 
> 
> Avoid as in make less likely? Nothing guarantees other thread is not
> interrupted after ->enqueue and before qdisc_run_begin().
> 
> TBH I'm not sure what out-of-order situation you're referring to,
> there is no ordering guarantee between separate threads trying to
> transmit AFAIU.
A thread need to do the bypass checking before doing enqueuing, so
it means MISSED is set or the trylock fails for the bypass transmiting(
which will set the MISSED after the first trylock), so the MISSED will
always be set before a thread doing a enqueuing, and we ensure MISSED
only be cleared during the protection of q->seqlock, after clearing
MISSED, we do anther round of dequeuing within the protection of
q->seqlock.

So if a thread has taken the q->seqlock and the MISSED is not set yet,
it is allowed to send the packet directly without going through the
qdisc enqueuing and dequeuing process.


> 
> IOW this check is not required for correctness, right?

if a thread has taken the q->seqlock and the MISSED is not set, it means
other thread has not set MISSED after the first trylock and before the
second trylock, which means the enqueuing is not done yet.
So I assume the this check is required for correctness if I understand
your question correctly.

> 
>> another good thing is that we could also do the batch
>> dequeuing and transmiting of packets when there is lock contention.
> 
> No doubt, but did you see the flag get set significantly often here 
> to warrant the double-checking?

No, that is just my guess:)

> 
>>> Is it really worth the extra code?  
>>
>> Currently DRAINING is only set for the netdev queue stopped.
>> We could only use DRAINING to indicate the non-empty of a qdisc,
>> then we need to set the DRAINING evrywhere MISSED is set, that is
>> why I use both DRAINING and MISSED to indicate a non-empty qdisc.



  reply	other threads:[~2021-05-29  7:03 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-28  2:49 [PATCH net-next 0/3] Some optimization for lockless qdisc Yunsheng Lin
2021-05-28  2:49 ` [PATCH net-next 1/3] net: sched: avoid unnecessary seqcount operation " Yunsheng Lin
2021-05-28  2:49 ` [PATCH net-next 2/3] net: sched: implement TCQ_F_CAN_BYPASS " Yunsheng Lin
2021-05-29  1:00   ` Jakub Kicinski
2021-05-29  1:44     ` Yunsheng Lin
2021-05-29  4:32       ` Jakub Kicinski
2021-05-29  7:03         ` Yunsheng Lin [this message]
2021-05-29 18:49           ` Jakub Kicinski
2021-05-30  1:37             ` Yunsheng Lin
2021-05-30 20:21               ` Jakub Kicinski
2021-05-31  0:40                 ` Yunsheng Lin
2021-05-31  1:10                   ` [Linuxarm] " Yunsheng Lin
2021-05-31 12:40                     ` Yunsheng Lin
2021-06-01  4:51                       ` Jakub Kicinski
2021-06-01  8:18                         ` Yunsheng Lin
2021-06-01 20:48                           ` Jakub Kicinski
2021-06-02  1:21                             ` Yunsheng Lin
2021-06-02 16:28                               ` Jakub Kicinski
2021-05-28  2:49 ` [PATCH net-next 3/3] net: sched: remove qdisc->empty " Yunsheng Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ee1a62da-9758-70db-abd3-c5ca2e8e0ce0@huawei.com \
    --to=linyunsheng@huawei.com \
    --cc=JKosina@suse.com \
    --cc=a.fatoum@pengutronix.de \
    --cc=albcamus@gmail.com \
    --cc=alexander.duyck@gmail.com \
    --cc=alobakin@pm.me \
    --cc=andrii@kernel.org \
    --cc=andriin@fb.com \
    --cc=ap420073@gmail.com \
    --cc=ast@kernel.org \
    --cc=atenart@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hdanton@sina.com \
    --cc=jgross@suse.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=johunt@akamai.com \
    --cc=jonas.bonn@netrounds.com \
    --cc=kafai@fb.com \
    --cc=kehuan.feng@gmail.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-can@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxarm@openeuler.org \
    --cc=mkl@pengutronix.de \
    --cc=mkubecek@suse.cz \
    --cc=mzhivich@akamai.com \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=pabeni@redhat.com \
    --cc=songliubraving@fb.com \
    --cc=weiwan@google.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).