From: Chris Mason <clm@fb.com>
To: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>, <linux-kernel@vger.kernel.org>,
	<linux-fsdevel@vger.kernel.org>, <linux-block@vger.kernel.org>,
	<dchinner@redhat.com>, <sedat.dilek@gmail.com>
Subject: Re: [PATCHSET v5] Make background writeback great again for the first time
Date: Tue, 3 May 2016 09:42:40 -0400
Message-ID: <20160503134240.6p65qehgv5uruxt4@floor.masoncoding.com>
In-Reply-To: <20160503130609.GB25436@quack2.suse.cz>

On Tue, May 03, 2016 at 03:06:09PM +0200, Jan Kara wrote:
> On Tue 03-05-16 08:40:11, Chris Mason wrote:
> > On Tue, May 03, 2016 at 02:17:19PM +0200, Jan Kara wrote:
> > > On Thu 28-04-16 12:46:41, Jens Axboe wrote:
> > > > >>-	rwb->wb_max = 1 + ((depth - 1) >> min(31U, rwb->scale_step));
> > > > >>-	rwb->wb_normal = (rwb->wb_max + 1) / 2;
> > > > >>-	rwb->wb_background = (rwb->wb_max + 3) / 4;
> > > > >>+	if (rwb->queue_depth == 1) {
> > > > >>+		rwb->wb_max = rwb->wb_normal = 2;
> > > > >>+		rwb->wb_background = 1;
> > > > >
> > > > > >This breaks the detection of a too-big scale_step in scale_up(), where we key
> > > > > >off the wb_max == 1 value. However, even with that fixed, no luck :(:
> > > > 
> > > > > Yeah, I need to look at that. For QD=1, I think the only sensible values for
> > > > > max/normal/bg are 2/2/1, and 1/1/1 if we step down.
> > > > 
> > > > >dd if=/dev/zero of=/mnt/file bs=1M count=10000 conv=fsync
> > > > >Runtime: 105.126 107.125 105.641
> > > > >
> > > > >So about the same as before. I'll try to debug this later today...
> > > > 
> > > > Thanks, I'm very interested in what you find!
> > > 
> > > OK, so the reason was relatively standard in the end. I was using ext3 (or
> > > more exactly ext4 without delayed allocation) for the test. The throttling
> > > of background writes gave more priority to writes from the journalling
> > > thread which happen with WRITE_SYNC and thus are not throttled. Thus the
> > > journalling thread ended up having to do more data writeback to be able to
> > > commit a transaction (due to requirements of data=ordered mode) and it is
> > > less efficient at that than the normal flusher thread.
> > > 
> > > So this is an example where throttling background writeback effectively
> > > just pushes more work into another context which does it less efficiently
> > > > and indirectly makes everyone wait for it. ext3 has always been sensitive to
> > > issues like this. ext4 is using delayed allocation and thus only data
> > > writes into holes end up being part of a transaction -> simple dd test case
> > > doesn't hit that path. And indeed when I repeat the same test with ext4,
> > > the numbers with and without your patch are exactly the same.
> > > 
> > > > The question remains how common the pattern is where throttling of
> > > > background writeback also delays something else. I'll schedule a couple of
> > > > benchmarks to measure the impact of your patches for a wider range of
> > > > workloads (but sadly on a pretty limited set of hw). If ext3 is the only one seeing
> > > issues, I would be willing to accept that ext3 takes the hit since it is
> > > doing something rather stupid (but inherent in its journal design) and we
> > > have a way to deal with this either by enabling delayed allocation or by
> > > turning off the writeback throttling...
> > 
> > At least in the case of io that we know is going to be data=ordered, we
> > can bump the prio of those pages?
> 
> But how would the flusher thread, which is submitting the IO, know that? We would
> have to somehow mark inodes that are part of the running transaction, and
> the flusher thread could give more priority to such writeback, e.g. by using
> WRITE_SYNC or at least plain writes. Hmm, if we use an inode flag for that,
> it could be doable.

This would be specific to the data=ordered code in the FS.  If there's
some way to test for an inode or a page's status in the data=ordered
list, the FS writepages call could flag the IO as higher prio?
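
To make the idea concrete, here is a rough userspace sketch of the decision
I have in mind. Nothing here is existing kernel API: struct model_inode, the
in_ordered_list flag, the MODEL_* names and model_pick_write_class() are all
made up for illustration, and mapping ordered data to the sync class (rather
than to plain writes) is just one possible choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the write classes discussed in this thread (illustrative). */
enum model_write_class {
	MODEL_WRITE_BG,   /* background writeback, subject to throttling */
	MODEL_WRITE,      /* plain write */
	MODEL_WRITE_SYNC, /* sync write, not throttled */
};

/* Hypothetical per-inode state; not a real kernel structure. */
struct model_inode {
	/* Assumed flag: data must reach disk before the transaction commits. */
	bool in_ordered_list;
};

/*
 * Pick the write class for one inode's writeback. Data on the
 * data=ordered list is bumped out of the throttled background class.
 */
static enum model_write_class
model_pick_write_class(const struct model_inode *inode, bool background)
{
	assert(inode != NULL);

	if (inode->in_ordered_list)
		return MODEL_WRITE_SYNC;

	return background ? MODEL_WRITE_BG : MODEL_WRITE;
}
```

The point is only that the check lives in the FS, so the flusher does not
need to know anything about the journal.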

-chris
