From: Yunsheng Lin <linyunsheng@huawei.com>
To: Hillf Danton <hdanton@sina.com>
Cc: Juergen Gross <jgross@suse.com>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, Jiri Kosina <JKosina@suse.com>
Subject: Re: [PATCH net v3] net: sched: fix packet stuck problem for lockless qdisc
Date: Mon, 12 Apr 2021 20:00:43 +0800
Message-ID: <d7b8a391-0b2f-f0a9-82ed-0609addcadb2@huawei.com>
In-Reply-To: <20210412072856.2046-1-hdanton@sina.com>

On 2021/4/12 15:28, Hillf Danton wrote:
> On Mon, 12 Apr 2021 11:37:24 Yunsheng Lin wrote:
>> On 2021/4/12 11:21, Hillf Danton wrote:
>>> On Mon, 12 Apr 2021 09:24:30  Yunsheng Lin wrote:
>>>> On 2021/4/9 17:09, Hillf Danton wrote:
>>>>> On Fri, 9 Apr 2021 07:31:03  Juergen Gross wrote:
>>>>>> On 25.03.21 04:13, Yunsheng Lin wrote:
>>>>>> I have a setup which is able to reproduce the issue quite reliably:
>>>>>>
>>>>>> In a Xen guest I mount 8 NFS shares and run sysbench fileio on
>>>>>> each of them. The average latency reported by sysbench is well below
>>>>>> 1 msec, but at least once per hour I get latencies in the minute
>>>>>> range.
>>>>>>
>>>>>> With this patch I don't see these high latencies any longer (test
>>>>>> is running for more than 20 hours now).
>>>>>>
>>>>>> So you can add my:
>>>>>>
>>>>>> Tested-by: Juergen Gross <jgross@suse.com>
>>>>>>
>>>>>
>>>>> If retry is allowed in the dequeue method then a simple seqcount can do the
>>>>> work of serializing enqueuer and dequeuer. IIUC it was not attempted last year.
>>>>
>>>> At first glance, I do not think the patch below fixes the data race
>>>
>>> Thanks for taking a look.
>>>
>>>> described in the commit log, as it does not handle the time window
>>>> between dequeuing and q->seqlock releasing, as below:
>>>>
>>> Yes the time window does exist.
>>>
>>>> cpu1 may not see the change to qdisc->pad after pfifo_fast_dequeue(),
>>>> and cpu2 cannot take q->seqlock yet because cpu1 has not
>>>> released q->seqlock.
>>>>
>>> It's now covered by extending the seqcount aperture a bit.
>>>
>>> --- x/net/sched/sch_generic.c
>>> +++ y/net/sched/sch_generic.c
>>> @@ -380,14 +380,23 @@ void __qdisc_run(struct Qdisc *q)
>>>  {
>>>  	int quota = dev_tx_weight;
>>>  	int packets;
>>> +	int seq;
>>> +
>>> +again:
>>> +	seq = READ_ONCE(q->pad);
>>> +	smp_rmb();
>>>  
>>>  	while (qdisc_restart(q, &packets)) {
>>>  		quota -= packets;
>>>  		if (quota <= 0) {
>>>  			__netif_schedule(q);
>>> -			break;
>>> +			return;
>>>  		}
>>>  	}
>>> +
>>> +	smp_rmb();
>>> +	if (seq != READ_ONCE(q->pad))
>>> +		goto again;
>>
>> As I understand it, there is still a time window between the q->pad check
>> above and the release of q->seqlock in qdisc_run_end().
>>
> Then extend the cover across q->seqlock on top of the flag you added.

Yes, the patch below seems to fix the data race described in
the commit log.
Then what is the difference between my patch and your patch below? :)
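
To make sure we are talking about the same window, here is the two-CPU
timeline as I understand it with the __QDISC_STATE_NEED_RESCHEDULE bit
from your patch below (a rough sketch in my own words, not taken
verbatim from the patch):

	/*
	 * cpu1 (holds q->seqlock)            cpu2
	 * -----------------------            ----
	 * __qdisc_run() dequeues and
	 * finds the queue empty
	 *                                     pfifo_fast_enqueue(skb)
	 *                                     set_bit(NEED_RESCHEDULE)
	 *                                     qdisc_run_begin() trylock
	 *                                     fails, cpu2 gives up
	 * qdisc_run_end():
	 *   spin_unlock(&q->seqlock)
	 *   test_and_clear_bit(NEED_RESCHEDULE)
	 *     -> __netif_schedule(q), so the
	 *        skb from cpu2 is not stuck
	 */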

> 
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -36,6 +36,7 @@ struct qdisc_rate_table {
>  enum qdisc_state_t {
>  	__QDISC_STATE_SCHED,
>  	__QDISC_STATE_DEACTIVATED,
> +	__QDISC_STATE_NEED_RESCHEDULE,
>  };
>  
>  struct qdisc_size_table {
> @@ -176,8 +177,13 @@ static inline bool qdisc_run_begin(struc
>  static inline void qdisc_run_end(struct Qdisc *qdisc)
>  {
>  	write_seqcount_end(&qdisc->running);
> -	if (qdisc->flags & TCQ_F_NOLOCK)
> +	if (qdisc->flags & TCQ_F_NOLOCK) {
>  		spin_unlock(&qdisc->seqlock);
> +
> +		if (test_and_clear_bit(__QDISC_STATE_NEED_RESCHEDULE,
> +							&qdisc->state))
> +			__netif_schedule(qdisc);
> +	}
>  }
>  
>  static inline bool qdisc_may_bulk(const struct Qdisc *qdisc)
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -381,13 +381,21 @@ void __qdisc_run(struct Qdisc *q)
>  	int quota = dev_tx_weight;
>  	int packets;
>  
> +	if (q->flags & TCQ_F_NOLOCK)
> +		clear_bit(__QDISC_STATE_NEED_RESCHEDULE, &q->state);
> +again:
>  	while (qdisc_restart(q, &packets)) {
>  		quota -= packets;
>  		if (quota <= 0) {
>  			__netif_schedule(q);
> -			break;
> +			return;
>  		}
>  	}
> +
> +	if (q->flags & TCQ_F_NOLOCK)
> +		if (test_and_clear_bit(__QDISC_STATE_NEED_RESCHEDULE,
> +					&q->state))
> +			goto again;
>  }
>  
>  unsigned long dev_trans_start(struct net_device *dev)
> @@ -632,6 +640,9 @@ static int pfifo_fast_enqueue(struct sk_
>  			return qdisc_drop(skb, qdisc, to_free);
>  	}
>  
> +	if (qdisc->flags & TCQ_F_NOLOCK)
> +		set_bit(__QDISC_STATE_NEED_RESCHEDULE, &qdisc->state);

Doing set_bit() in pfifo_fast_enqueue() unconditionally does not
seem performance friendly, because it requires exclusive access
to the cache line holding qdisc->state.
Perhaps run some performance tests?
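
If the unconditional write does show up in the numbers, one option is
the usual check-before-set pattern (only a sketch, not something I
have tested):

	/* Hypothetical tweak, not part of the patch above: only dirty
	 * the cache line when the bit is not already set.
	 */
	if ((qdisc->flags & TCQ_F_NOLOCK) &&
	    !test_bit(__QDISC_STATE_NEED_RESCHEDULE, &qdisc->state))
		set_bit(__QDISC_STATE_NEED_RESCHEDULE, &qdisc->state);

That still adds a read to the enqueue fast path, so only a benchmark
can tell whether it is worth it.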


> +
>  	qdisc_update_stats_at_enqueue(qdisc, pkt_len);
>  	return NET_XMIT_SUCCESS;
>  }
> 
> .
> 


Thread overview: 19+ messages
2021-03-25  3:13 [PATCH net v3] net: sched: fix packet stuck problem for lockless qdisc Yunsheng Lin
2021-04-09  5:31 ` Juergen Gross
2021-04-12  1:04   ` Yunsheng Lin
2021-04-12  8:21     ` Juergen Gross
     [not found] ` <20210409090909.1767-1-hdanton@sina.com>
2021-04-12  1:24   ` Yunsheng Lin
     [not found]   ` <20210412032111.1887-1-hdanton@sina.com>
2021-04-12  3:37     ` Yunsheng Lin
     [not found]     ` <20210412072856.2046-1-hdanton@sina.com>
2021-04-12 12:00       ` Yunsheng Lin [this message]
     [not found]       ` <20210413022129.2203-1-hdanton@sina.com>
2021-04-13  2:56         ` Yunsheng Lin
     [not found]         ` <20210413032620.2259-1-hdanton@sina.com>
2021-04-13  3:34           ` Yunsheng Lin
     [not found]           ` <20210413071241.2325-1-hdanton@sina.com>
2021-04-13  7:57             ` Yunsheng Lin
     [not found]             ` <20210413083352.2424-1-hdanton@sina.com>
2021-04-13  9:03               ` Yunsheng Lin
2021-04-13  9:15                 ` Juergen Gross
2021-04-16 11:46                   ` Jiri Kosina
2021-04-18 22:59 ` Michal Kubecek
2021-04-19  2:04   ` Yunsheng Lin
2021-04-19 12:21     ` [Linuxarm] " Yunsheng Lin
2021-04-19 15:25       ` Michal Kubecek
2021-04-19 14:57     ` Michal Kubecek
2021-04-20  1:45       ` Yunsheng Lin
