linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Myklebust, Trond" <Trond.Myklebust@netapp.com>,
	linux-fsdevel@vger.kernel.org,
	Linux Memory Management List <linux-mm@kvack.org>,
	Jens Axboe <axboe@kernel.dk>
Subject: Re: write-behind on streaming writes
Date: Thu, 7 Jun 2012 11:45:04 +0200	[thread overview]
Message-ID: <20120607094504.GB25074@quack.suse.cz> (raw)
In-Reply-To: <20120606170428.GB8133@redhat.com>

On Wed 06-06-12 13:04:28, Vivek Goyal wrote:
> On Wed, Jun 06, 2012 at 10:00:58PM +0800, Fengguang Wu wrote:
> > On Wed, Jun 06, 2012 at 08:14:08AM -0400, Vivek Goyal wrote:
> > > On Tue, Jun 05, 2012 at 08:14:08PM -0700, Linus Torvalds wrote:
> > > > On Tue, Jun 5, 2012 at 7:57 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > >
> > > > > I had expected a bigger difference as sync_file_range() is just driving
> > > > > max queue depth of 32 (total 16MB IO in flight), while flushers are
> > > > > driving queue depths up to 140 or so. So in this paritcular test, driving
> > > > > much deeper queue depths is not really helping much. (I have seen higher
> > > > > throughputs with higher queue depths in the past. Now sure why don't we
> > > > > see it here).
> > > > 
> > > > How did interactivity feel?
> > > > 
> > > > Because quite frankly, if the throughput difference is 12.5 vs 12
> > > > seconds, I suspect the interactivity thing is what dominates.
> > > > 
> > > > And from my memory of the interactivity different was absolutely
> > > > *huge*. Even back when I used rotational media, I basically couldn't
> > > > even notice the background write with the sync_file_range() approach.
> > > > While the regular writeback without the writebehind had absolutely
> > > > *huge* pauses if you used something like firefox that uses fsync()
> > > > etc. And starting new applications that weren't cached was noticeably
> > > > worse too - and then with sync_file_range it wasn't even all that
> > > > noticeable.
> > > > 
> > > > NOTE! For the real "firefox + fsync" test, I suspect you'd need to do
> > > > the writeback on the same filesystem (and obviously disk) as your home
> > > > directory is. If the big write is to another filesystem and another
> > > > disk, I think you won't see the same issues.
> > > 
> > > Ok, I did following test on my single SATA disk and my root filesystem
> > > is on this disk.
> > > 
> > > I dropped caches and launched firefox and monitored the time it takes
> > > for firefox to start. (cache cold).
> > > 
> > > And my results are reverse of what you have been seeing. With
> > > sync_file_range() running, firefox takes roughly 30 seconds to start and
> > > with flusher in operation, it takes roughly 20 seconds to start. (I have
> > > approximated the average of 3 runs for simplicity).
> > > 
> > > I think it is happening because sync_file_range() will send all
> > > the writes as SYNC and it will compete with firefox IO. On the other
> > > hand, flusher's IO will show up as ASYNC and CFQ  will be penalize it
> > > heavily and firefox's IO will be prioritized. And this effect should
> > > just get worse as more processes do sync_file_range().
> > > 
> > > So write-behind should provide better interactivity if writes submitted
> > > are ASYNC and not SYNC.
> > 
> > Hi Vivek, thanks for testing all of these out! The result is
> > definitely interesting and a surprise: we overlooked the SYNC nature
> > of sync_file_range().
> > 
> > I'd suggest to use these calls to achieve the write-and-drop-behind
> > behavior, *with* WB_SYNC_NONE:
> > 
> >         posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
> >         sync_file_range(fd, offset, len, SYNC_FILE_RANGE_WAIT_AFTER);
> > 
> > The caveat is, the below bdi_write_congested() will never evaluate to
> > true since we are only filling the request queue with 8MB data.
> > 
> > SYSCALL_DEFINE(fadvise64_64):
> > 
> >         case POSIX_FADV_DONTNEED:
> >                 if (!bdi_write_congested(mapping->backing_dev_info))
> >                         __filemap_fdatawrite_range(mapping, offset, endbyte,
> >                                                    WB_SYNC_NONE);
> 
> Hi Fengguang,
> 
> Instead of above, I modified sync_file_range() to call
> __filemap_fdatawrite_range(WB_SYNC_NONE) and I do see now ASYNC writes
> showing up at elevator.
> 
> With 4 processes doing sync_file_range() now, firefox start time test
> clocks around 18-19 seconds which is better than 30-35 seconds of 4
> processes doing buffered writes. And system looks pretty good from
> interactivity point of view.
  So do you have any idea why is that? Do we drive shallower queues? Also
how does speed of the writers compare to the speed with normal buffered
writes + fsync (you'd need fsync for sync_file_range writers as well to
make comparison fair)?

								Honza
> ---
>  fs/sync.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6/fs/sync.c
> ===================================================================
> --- linux-2.6.orig/fs/sync.c	2012-06-06 00:12:33.000000000 -0400
> +++ linux-2.6/fs/sync.c	2012-06-06 23:11:17.050691776 -0400
> @@ -342,7 +342,7 @@ SYSCALL_DEFINE(sync_file_range)(int fd, 
>  	}
>  
>  	if (flags & SYNC_FILE_RANGE_WRITE) {
> -		ret = filemap_fdatawrite_range(mapping, offset, endbyte);
> +		ret = __filemap_fdatawrite_range(mapping, offset, endbyte, WB_SYNC_NONE);
>  		if (ret < 0)
>  			goto out_put;
>  	}
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

  reply	other threads:[~2012-06-07  9:45 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-28 11:41 [GIT PULL] writeback changes for 3.5-rc1 Fengguang Wu
2012-05-28 17:09 ` Linus Torvalds
2012-05-29 15:57   ` write-behind on streaming writes Fengguang Wu
2012-05-29 17:35     ` Linus Torvalds
2012-05-30  3:21       ` Fengguang Wu
2012-06-05  1:01         ` Dave Chinner
2012-06-05 17:18           ` Vivek Goyal
2012-06-05 17:23         ` Vivek Goyal
2012-06-05 17:41           ` Vivek Goyal
2012-06-05 18:48             ` Vivek Goyal
2012-06-05 20:10               ` Vivek Goyal
2012-06-06  2:57                 ` Vivek Goyal
2012-06-06  3:14                   ` Linus Torvalds
2012-06-06 12:14                     ` Vivek Goyal
2012-06-06 14:00                       ` Fengguang Wu
2012-06-06 17:04                         ` Vivek Goyal
2012-06-07  9:45                           ` Jan Kara [this message]
2012-06-07 19:06                             ` Vivek Goyal
2012-06-06 16:15                       ` Vivek Goyal
2012-06-06 14:08                   ` Fengguang Wu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120607094504.GB25074@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=Trond.Myklebust@netapp.com \
    --cc=axboe@kernel.dk \
    --cc=fengguang.wu@intel.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).