From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Theodore Tso <tytso@mit.edu>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"Li, Shaohua" <shaohua.li@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"richard@rsk.demon.co.uk" <richard@rsk.demon.co.uk>,
	"jens.axboe@oracle.com" <jens.axboe@oracle.com>
Subject: Re: regression in page writeback
Date: Tue, 6 Oct 2009 21:18:40 +0800
Message-ID: <20091006131840.GA14111@localhost>
In-Reply-To: <20091006125519.GB22781@duck.suse.cz>

On Tue, Oct 06, 2009 at 08:55:19PM +0800, Jan Kara wrote:
> On Fri 02-10-09 11:27:14, Wu Fengguang wrote:
> > On Fri, Oct 02, 2009 at 06:17:39AM +0800, Jan Kara wrote:
> > > On Wed 30-09-09 13:32:23, Wu Fengguang wrote:
> > > > writeback: bump up writeback chunk size to 128MB
> > > > 
> > > > Adjust the writeback call stack to support larger writeback chunk size.
> > > > 
> > > > - make wbc.nr_to_write a per-file parameter
> > > > - init wbc.nr_to_write with MAX_WRITEBACK_PAGES=128MB
> > > >   (proposed by Ted)
> > > > - add wbc.nr_segments to limit seeks inside sparsely dirtied file
> > > >   (proposed by Chris)
> > > > - add wbc.timeout which will be used to control IO submission time
> > > >   either per-file or globally.
> > > >   
> > > > The wbc.nr_segments is now determined purely by logical page index
> > > > distance: if two pages are 1MB apart, it makes a new segment.
> > > > 
> > > > Filesystems could do this better with real extent knowledge.
> > > > One possible scheme is to record the previous page index in
> > > > wbc.writeback_index, and let ->writepage compare if the current and
> > > > previous pages lie in the same extent, and decrease wbc.nr_segments
> > > > accordingly. Care should be taken to avoid double decreases in
> > > > writepage and write_cache_pages.
> > > > 
> > > > The wbc.timeout (when used per-file) is mainly a safeguard against slow
> > > > devices, which may take too long to sync 128MB of data.
> > > > 
> > > > The wbc.timeout (when used globally) could be useful when we decide to
> > > > do two sync scans on dirty pages and dirty metadata. XFS could say:
> > > > please return to sync dirty metadata after 10s. Would need another
> > > > b_io_metadata queue, but that's possible.
> > > > 
> > > > This work depends on the balance_dirty_pages() wait queue patch.
> > >   I don't know, I think it gets too complicated... I'd either use the
> > > segments idea or the timeout idea but not both (unless you can find real
> > > world tests in which both help).
>   I'm sorry for a delayed reply but I had to work on something else.
> 
> > Maybe complicated, but nr_segments and timeout each have their target
> > application.  nr_segments serves two major purposes:
> > - fairness between two large files, one is continuously dirtied,
> >   another is sparsely dirtied. Given the same amount of dirty pages,
> >   it could take vastly different time to sync them to the _same_
> >   device. The nr_segments check helps to favor continuous data.
> > - avoid seeks/fragmentations. To give each file fair chance of
> >   writeback, we have to abort a file when some nr_to_write or timeout
> >   is reached. However they are both not good abort conditions.
> >   The best is for filesystem to abort earlier in seek boundaries,
> >   and treat nr_to_write/timeout as large enough bottom lines.
> > timeout is mainly a safeguard in case nr_to_write is too large for
> > slow devices. It is not necessary if nr_to_write is auto-computed,
> > however timeout in itself serves as a simple throughput adapting
> > scheme.
>   I understand why you have introduced both segments and timeout value
> and I completely agree with your reasons to introduce them. I just think
> that when the system gets too complex (there will be several independent
> methods of determining when writeback should be terminated, and even
> though each method is simple on its own, their interactions needn't be
> simple...) it will be hard to debug all the corner cases - even more
> because they will manifest "just" by slow or unfair writeback. So I'd

I definitely agree on the complications. There are some known issues,
and possibly some corner cases still to be discovered. One problem I
just noticed: what if all the files are sparsely dirtied? Then
a small nr_segments can only hurt.  Another problem is that the block
device file tends to have sparsely dirtied pages (with metadata on
them).  Not sure how to detect/handle such conditions.
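
To illustrate the sparse-file concern, here is a minimal userspace sketch
of the page-index rule from the patch description (a gap of 1MB or more
between consecutive dirty pages opens a new segment). It is not kernel
code: pages_before_abort() and SEGMENT_GAP_PAGES are hypothetical names,
and 4KB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4KB pages, so a 1MB gap equals 256 page indexes. */
#define SEGMENT_GAP_PAGES 256UL

/*
 * Walk dirty page indexes (must be in ascending order) and return how
 * many pages are written before the segment budget runs out.
 * nr_segments is the number of segments the file is allowed to use.
 */
static size_t pages_before_abort(const unsigned long *idx, size_t n,
				 long nr_segments)
{
	long used = 0;
	size_t i;

	assert(n == 0 || idx != NULL);
	for (i = 0; i < n; i++) {
		/* The first page, or a jump of >= 1MB, opens a new segment. */
		if (i == 0 || idx[i] - idx[i - 1] >= SEGMENT_GAP_PAGES) {
			if (used >= nr_segments)
				break;	/* budget exhausted: abort this file */
			used++;
		}
	}
	return i;
}
```

With a budget of 2 segments, a densely dirtied file gets all its pages
written, while a file with one dirty page every 1000 indexes is cut off
after two pages. If every file looks like the latter, the early abort
buys nothing.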

> prefer a single metric to determine when to stop writeback of an inode
> even though it might be a bit more complicated.
>   For example terminating on writeout does not really get a file fair
> chance of writeback because it might have been blocked just because we were
> writing some heavily fragmented file just before. And your nr_segments

You mean timeout? I've dropped that idea in favor of an nr_to_write
that adapts to the bdi write speed :)
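
As a rough sketch of what "adaptive" could mean here: size the chunk so
that it takes about a fixed time slice at the measured write bandwidth,
clamped to sane bounds. Everything below is an assumption for
illustration (the function name, the 4KB page size, the 4MB floor, the
caller-supplied time slice); only the 128MB ceiling comes from the
MAX_WRITEBACK_PAGES value in the patch description.

```c
#include <assert.h>

#define PAGE_SIZE_BYTES	4096UL		/* assumption: 4KB pages */
#define MIN_CHUNK_PAGES	1024UL		/* hypothetical floor: 4MB */
#define MAX_CHUNK_PAGES	32768UL		/* 128MB with 4KB pages */

/*
 * Return a per-file nr_to_write (in pages) that should take roughly
 * target_ms to write at bytes_per_sec, clamped to [MIN, MAX].
 */
static unsigned long adaptive_nr_to_write(unsigned long bytes_per_sec,
					  unsigned long target_ms)
{
	/* Widen before multiplying to avoid overflow on 32-bit longs. */
	unsigned long long bytes =
		(unsigned long long)bytes_per_sec * target_ms / 1000;
	unsigned long long pages = bytes / PAGE_SIZE_BYTES;

	if (pages < MIN_CHUNK_PAGES)
		return MIN_CHUNK_PAGES;
	if (pages > MAX_CHUNK_PAGES)
		return MAX_CHUNK_PAGES;
	return (unsigned long)pages;
}
```

A slow device then gets a small chunk without needing a separate
timeout, and a fast one is still capped at 128MB.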

> check is just a rough guess of whether a writeback is going to be
> fragmented or not.

It could be made accurate if btrfs decreases it in its own writepages,
based on the extent info. Should also be possible for ext4.
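
A minimal userspace sketch of that scheme, following the earlier
description (remember the previous page index, and decrease nr_segments
only when the current page falls in a different extent). The types and
helpers here (struct extent, struct wb_state, extent_of(),
account_page()) are hypothetical stand-ins, not btrfs or ext4
interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extent: a run of logical pages that is contiguous on disk. */
struct extent {
	unsigned long start;	/* first logical page index */
	unsigned long len;	/* number of pages */
};

/* Hypothetical per-file writeback state. */
struct wb_state {
	unsigned long prev_index;	/* previously written page */
	int have_prev;			/* 0 until the first page is seen */
	long nr_segments;		/* remaining segment budget */
};

/* Return the position of the extent containing page, or -1 for a hole. */
static long extent_of(const struct extent *map, size_t n, unsigned long page)
{
	for (size_t i = 0; i < n; i++)
		if (page >= map[i].start && page - map[i].start < map[i].len)
			return (long)i;
	return -1;
}

/*
 * Charge one segment whenever the current page lies in a different
 * extent than the previous one. Simplification: all holes compare equal.
 */
static void account_page(struct wb_state *wb, const struct extent *map,
			 size_t n, unsigned long page)
{
	assert(wb != NULL);
	if (wb->have_prev &&
	    extent_of(map, n, wb->prev_index) != extent_of(map, n, page))
		wb->nr_segments--;
	wb->prev_index = page;
	wb->have_prev = 1;
}
```

Pages that are far apart by index but sit in the same extent cost
nothing here, which is the accuracy gain over the pure index-distance
rule.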

>   So I'd rather implement in mpage_ functions a proper detection of how
> fragmented the writeback is and give each inode a limit on number of
> fragments which mpage_ functions would obey. We could even use a queue's
> NONROT flag (set for solid state disks) to detect whether we should expect
> higher or lower seek times.

Yes, mpage_* can also utilize nr_segments.

Anyway, nr_segments is not perfect. I'll post a patch and let fs
developers decide whether it is convenient/useful :)

Thanks,
Fengguang
