All of lore.kernel.org
 help / color / mirror / Atom feed
From: xiakaixu <xiakaixu@huawei.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>, <linux-kernel@vger.kernel.org>,
	<linux-fsdevel@vger.kernel.org>, <linux-block@vger.kernel.org>,
	<dchinner@redhat.com>, <sedat.dilek@gmail.com>,
	Huxinwei <huxinwei@huawei.com>,
	"miaoxie (A)" <miaoxie@huawei.com>,
	Bintian <bintian.wang@huawei.com>,
	Xia Kaixu <xiakaixu@huawei.com>
Subject: Re: [PATCHSET v5] Make background writeback great again for the first time
Date: Thu, 28 Apr 2016 12:06:51 +0800	[thread overview]
Message-ID: <57218C5B.2070308@huawei.com> (raw)
In-Reply-To: <20160427205915.GC25397@kernel.dk>

于 2016/4/28 4:59, Jens Axboe 写道:
> On Wed, Apr 27 2016, Jens Axboe wrote:
>> On Wed, Apr 27 2016, Jens Axboe wrote:
>>> On 04/27/2016 12:01 PM, Jan Kara wrote:
>>>> Hi,
>>>>
>>>> On Tue 26-04-16 09:55:23, Jens Axboe wrote:
>>>>> Since the dawn of time, our background buffered writeback has sucked.
>>>>> When we do background buffered writeback, it should have little impact
>>>>> on foreground activity. That's the definition of background activity...
>>>>> But for as long as I can remember, heavy buffered writers have not
>>>>> behaved like that. For instance, if I do something like this:
>>>>>
>>>>> $ dd if=/dev/zero of=foo bs=1M count=10k
>>>>>
>>>>> on my laptop, and then try and start chrome, it basically won't start
>>>>> before the buffered writeback is done. Or, for server oriented
>>>>> workloads, where installation of a big RPM (or similar) adversely
>>>>> impacts database reads or sync writes. When that happens, I get people
>>>>> yelling at me.
>>>>>
>>>>> I have posted plenty of results previously, I'll keep it shorter
>>>>> this time. Here's a run on my laptop, using read-to-pipe-async for
>>>>> reading a 5g file, and rewriting it. You can find this test program
>>>>> in the fio git repo.
>>>>
>>>> I have tested your patchset on my test system. Generally I have observed
>>>> noticeable drop in average throughput for heavy background writes without
>>>> any other disk activity and also somewhat increased variance in the
>>>> runtimes. It is most visible on this simple testcases:
>>>>
>>>> dd if=/dev/zero of=/mnt/file bs=1M count=10000
>>>>
>>>> and
>>>>
>>>> dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
>>>>
>>>> The machine has 4GB of ram, /mnt is an ext3 filesystem that is freshly
>>>> created before each dd run on a dedicated disk.
>>>>
>>>> Without your patches I get pretty stable dd runtimes for both cases:
>>>>
>>>> dd if=/dev/zero of=/mnt/file bs=1M count=10000
>>>> Runtimes: 87.9611 87.3279 87.2554
>>>>
>>>> dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
>>>> Runtimes: 93.3502 93.2086 93.541
>>>>
>>>> With your patches the numbers look like:
>>>>
>>>> dd if=/dev/zero of=/mnt/file bs=1M count=10000
>>>> Runtimes: 108.183, 97.184, 99.9587
>>>>
>>>> dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
>>>> Runtimes: 104.9, 102.775, 102.892
>>>>
>>>> I have checked whether the variance is due to some interaction with CFQ
>>>> which is used for the disk. When I switched the disk to deadline, I still
>>>> get some variance although, the throughput is still ~10% lower:
>>>>
>>>> dd if=/dev/zero of=/mnt/file bs=1M count=10000
>>>> Runtimes: 100.417 100.643 100.866
>>>>
>>>> dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
>>>> Runtimes: 104.208 106.341 105.483
>>>>
>>>> The disk is rotational SATA drive with writeback cache, queue depth of the
>>>> disk reported in /sys/block/sdb/device/queue_depth is 1.
>>>>
>>>> So I think we still need some tweaking on the low end of the storage
>>>> spectrum so that we don't lose 10% of throughput for simple cases like
>>>> this.
>>>
>>> Thanks for testing, Jan! I haven't tried old QD=1 SATA. I wonder if
>>> you are seeing smaller requests, and that is why it both varies and
>>> you get lower throughput? I'll try and setup a test here similar to
>>> yours.
>>
>> Jan, care to try the below patch? I can't fully reproduce your issue on
>> a SCSI disk limited to QD=1, but I have a feeling this might help. It's
>> a bit of a hack, but the general idea is to allow one more request to
>> build up for QD=1 devices. That eliminates wait time between one request
>> finishing, and the next being submitted.
> 
> That accidentally added a potentially stall, this one is both cleaner
> and should have that fixed.
> 
> diff --git a/lib/wbt.c b/lib/wbt.c
> index 650da911f24f..322f5e04e994 100644
> --- a/lib/wbt.c
> +++ b/lib/wbt.c
> @@ -98,18 +98,23 @@ void __wbt_done(struct rq_wb *rwb)
>  	else
>  		limit = rwb->wb_normal;
Hi Jens,

This statement 'limit = rwb->wb_normal' is executed twice, maybe once is
enough. It is not a big deal anyway :)


Another question about this if branch:

   if (rwb->wc && !atomic_read(&rwb->bdi->wb.dirty_sleeping))
	limit = 0;

I can't follow the logic of this if branch. why set limit equal to 0
when the device supports write back caches and there are tasks being
limited in balance_dirty_pages(). Could you pelase give more info
about this ?  Thanks!
>  
> +	inflight = atomic_dec_return(&rwb->inflight);
> +
>  	/*
> -	 * Don't wake anyone up if we are above the normal limit. If
> -	 * throttling got disabled (limit == 0) with waiters, ensure
> -	 * that we wake them up.
> +	 * wbt got disabled with IO in flight. Wake up any potential
> +	 * waiters, we don't have to do more than that.
>  	 */
> -	inflight = atomic_dec_return(&rwb->inflight);
> -	if (limit && inflight >= limit) {
> -		if (!rwb->wb_max)
> -			wake_up_all(&rwb->wait);
> +	if (!rwb_enabled(rwb)) {
> +		wake_up_all(&rwb->wait);
>  		return;
>  	}

Maybe it is better that executing this if branch earlier. So we can wake up
potential waiters in time when wbt got disabled.
>  
> +	/*
> +	 * Don't wake anyone up if we are above the normal limit.
> +	 */
> +	if (inflight && inflight >= limit)
> +		return;
> +
>  	if (waitqueue_active(&rwb->wait)) {
>  		int diff = limit - inflight;
>  
> @@ -150,14 +155,26 @@ static void calc_wb_limits(struct rq_wb *rwb)
>  		return;
>  	}
>  
> -	depth = min_t(unsigned int, RWB_MAX_DEPTH, rwb->queue_depth);
> -
>  	/*
> -	 * Reduce max depth by 50%, and re-calculate normal/bg based on that
> +	 * For QD=1 devices, this is a special case. It's important for those
> +	 * to have one request ready when one completes, so force a depth of
> +	 * 2 for those devices. On the backend, it'll be a depth of 1 anyway,
> +	 * since the device can't have more than that in flight.
>  	 */
> -	rwb->wb_max = 1 + ((depth - 1) >> min(31U, rwb->scale_step));
> -	rwb->wb_normal = (rwb->wb_max + 1) / 2;
> -	rwb->wb_background = (rwb->wb_max + 3) / 4;
> +	if (rwb->queue_depth == 1) {
> +		rwb->wb_max = rwb->wb_normal = 2;
> +		rwb->wb_background = 1;
> +	} else {
> +		depth = min_t(unsigned int, RWB_MAX_DEPTH, rwb->queue_depth);
> +
> +		/*
> +		 * Reduce max depth by 50%, and re-calculate normal/bg based on
> +		 * that.
> +		 */
> +		rwb->wb_max = 1 + ((depth - 1) >> min(31U, rwb->scale_step));
> +		rwb->wb_normal = (rwb->wb_max + 1) / 2;
> +		rwb->wb_background = (rwb->wb_max + 3) / 4;
> +	}
>  }
>  
>  static bool inline stat_sample_valid(struct blk_rq_stat *stat)
> 


-- 
Regards
Kaixu Xia


  reply	other threads:[~2016-04-28  4:07 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-26 15:55 [PATCHSET v5] Make background writeback great again for the first time Jens Axboe
2016-04-26 15:55 ` [PATCH 1/8] block: add WRITE_BG Jens Axboe
2016-04-26 15:55 ` [PATCH 2/8] writeback: add wbc_to_write_cmd() Jens Axboe
2016-04-26 15:55 ` [PATCH 3/8] writeback: use WRITE_BG for kupdate and background writeback Jens Axboe
2016-04-26 15:55 ` [PATCH 4/8] writeback: track if we're sleeping on progress in balance_dirty_pages() Jens Axboe
2016-04-26 15:55 ` [PATCH 5/8] block: add code to track actual device queue depth Jens Axboe
2016-04-26 15:55 ` [PATCH 6/8] block: add scalable completion tracking of requests Jens Axboe
2016-05-05  7:52   ` Ming Lei
2016-04-26 15:55 ` [PATCH 7/8] wbt: add general throttling mechanism Jens Axboe
2016-04-27 12:06   ` xiakaixu
2016-04-27 15:21     ` Jens Axboe
2016-04-28  3:29       ` xiakaixu
2016-04-28 11:05   ` Jan Kara
2016-04-28 18:53     ` Jens Axboe
2016-04-28 19:03       ` Jens Axboe
2016-05-03  9:34       ` Jan Kara
2016-05-03 14:23         ` Jens Axboe
2016-05-03 15:22           ` Jan Kara
2016-05-03 15:32             ` Jens Axboe
2016-05-03 15:40         ` Jan Kara
2016-05-03 15:48           ` Jan Kara
2016-05-03 16:59             ` Jens Axboe
2016-05-03 18:14               ` Jens Axboe
2016-05-03 19:07                 ` Jens Axboe
2016-04-26 15:55 ` [PATCH 8/8] writeback: throttle buffered writeback Jens Axboe
2016-04-27 18:01 ` [PATCHSET v5] Make background writeback great again for the first time Jan Kara
2016-04-27 18:17   ` Jens Axboe
2016-04-27 20:37     ` Jens Axboe
2016-04-27 20:59       ` Jens Axboe
2016-04-28  4:06         ` xiakaixu [this message]
2016-04-28 18:36           ` Jens Axboe
2016-04-28 11:54         ` Jan Kara
2016-04-28 18:46           ` Jens Axboe
2016-05-03 12:17             ` Jan Kara
2016-05-03 12:40               ` Chris Mason
2016-05-03 13:06                 ` Jan Kara
2016-05-03 13:42                   ` Chris Mason
2016-05-03 13:57                     ` Jan Kara
2016-05-11 16:36               ` Jan Kara
2016-05-13 18:29                 ` Jens Axboe
2016-05-16  7:47                   ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57218C5B.2070308@huawei.com \
    --to=xiakaixu@huawei.com \
    --cc=axboe@kernel.dk \
    --cc=bintian.wang@huawei.com \
    --cc=dchinner@redhat.com \
    --cc=huxinwei@huawei.com \
    --cc=jack@suse.cz \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miaoxie@huawei.com \
    --cc=sedat.dilek@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.