From: Wu Fengguang <fengguang.wu@intel.com>
To: "Li, Shaohua" <shaohua.li@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Chris Mason <chris.mason@oracle.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"richard@rsk.demon.co.uk" <richard@rsk.demon.co.uk>,
	"jens.axboe@oracle.com" <jens.axboe@oracle.com>
Subject: Re: regression in page writeback
Date: Wed, 23 Sep 2009 11:25:08 +0800
Message-ID: <20090923032508.GA28860@localhost>
In-Reply-To: <20090923031450.GC26530@localhost>

On Wed, Sep 23, 2009 at 11:14:50AM +0800, Wu Fengguang wrote:
> On Wed, Sep 23, 2009 at 11:10:12AM +0800, Li, Shaohua wrote:
> > On Wed, Sep 23, 2009 at 10:49:58AM +0800, Wu, Fengguang wrote:
> > > On Wed, Sep 23, 2009 at 10:36:22AM +0800, Andrew Morton wrote:
> > > > On Wed, 23 Sep 2009 10:26:22 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
> > > > 
> > > > > On Wed, Sep 23, 2009 at 09:59:41AM +0800, Andrew Morton wrote:
> > > > > > On Wed, 23 Sep 2009 09:45:00 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
> > > > > > 
> > > > > > > On Wed, Sep 23, 2009 at 09:28:32AM +0800, Andrew Morton wrote:
> > > > > > > > On Wed, 23 Sep 2009 09:17:58 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
> > > > > > > > 
> > > > > > > > > On Wed, Sep 23, 2009 at 08:54:52AM +0800, Andrew Morton wrote:
> > > > > > > > > > On Wed, 23 Sep 2009 08:22:20 +0800 Wu Fengguang <fengguang.wu@intel.com> wrote:
> > > > > > > > > > 
> > > > > > > > > > > Jens' per-bdi writeback has another improvement. In 2.6.31, when
> > > > > > > > > > > superblocks A and B both have 100000 dirty pages, it will first
> > > > > > > > > > > exhaust A's 100000 dirty pages before going on to sync B's.
> > > > > > > > > > 
> > > > > > > > > > That would only be true if someone broke 2.6.31.  Did they?
> > > > > > > > > > 
> > > > > > > > > > SYSCALL_DEFINE0(sync)
> > > > > > > > > > {
> > > > > > > > > > 	wakeup_pdflush(0);
> > > > > > > > > > 	sync_filesystems(0);
> > > > > > > > > > 	sync_filesystems(1);
> > > > > > > > > > 	if (unlikely(laptop_mode))
> > > > > > > > > > 		laptop_sync_completion();
> > > > > > > > > > 	return 0;
> > > > > > > > > > }
> > > > > > > > > > 
> > > > > > > > > > the sync_filesystems(0) is supposed to non-blockingly start IO against
> > > > > > > > > > all devices.  It used to do that correctly.  But people mucked with it
> > > > > > > > > > so perhaps it no longer does.
> > > > > > > > > 
> > > > > > > > > I'm referring to writeback_inodes(). Each invocation of it (to sync
> > > > > > > > > 4MB) does the same iteration over superblocks A => B => C ... So if
> > > > > > > > > A has dirty pages, it will always be served first.
> > > > > > > > > 
> > > > > > > > > So if wbc->bdi == NULL (which is true for kupdate/background sync), it
> > > > > > > > > will have to first exhaust A before going on to B and C.
> > > > > > > > 
> > > > > > > > But that works OK.  We fill the first device's queue, then it gets
> > > > > > > > congested and sync_sb_inodes() does nothing and we advance to the next
> > > > > > > > queue.
> > > > > > > 
> > > > > > > So in common cases "exhaust" is a bit exaggerated, but A does receive
> > > > > > > many more opportunities than B. CPU time for IO submission is
> > > > > > > skewed toward A, and rechecking A adds pointless overhead.
> > > > > > 
> > > > > > That's unquantified handwaving.  One CPU can do a *lot* of IO.
> > > > > 
> > > > > Yes.. I had the impression that writeback submission can be pretty slow.
> > > > > It should be because of the congestion_wait. Now that it is removed,
> > > > > things are going faster when the queue is not full.
> > > > 
> > > > What?  The wait is short.  The design intent there is that we repoll
> > > > all previously-congested queues well before they start to run empty.
> > > 
> > > When the queue is not congested (in which case congestion_wait is not
> > > necessary), congestion_wait() degrades IO submission speed to near
> > > IO completion speed.
> > > 
> > > > > > > > If a device has more than a queue's worth of dirty data then we'll
> > > > > > > > probably leave some of that dirty memory un-queued, so there's some
> > > > > > > > lack of concurrency in that situation.
> > > > > > > 
> > > > > > > Good insight.
> > > > > > 
> > > > > > It was wrong.  See the other email.
> > > > > 
> > > > > No, your first insight is correct, because the (unnecessary) teeny
> > > > > sleeps are independent of the A=>B=>C traversal order. Only queue
> > > > > congestion could help skip A.
> > > > 
> > > > The sleeps are completely necessary!  Otherwise we end up busywaiting.
> > > > 
> > > > After the sleep we repoll all queues.
> > > 
> > > I mean, it is not always necessary. Only when _all_ superblocks cannot
> > > write back their inodes (eg. all in congestion) should we wait.
> > > 
> > > Just before Jens' work, I had a patch to convert
> > > 
> > > -                       if (wbc.encountered_congestion || wbc.more_io)
> > > -                               congestion_wait(WRITE, HZ/10);
> > > -                       else
> > > -                               break;
> > > 
> > > to
> > > 
> > > +       if (wbc->encountered_congestion && wbc->nr_to_write == MAX_WRITEBACK_PAGES)
> > > +               congestion_wait(WRITE, HZ/10);
> > > 
> > > Note that wbc->encountered_congestion only means "at least one bdi
> > > encountered congestion". We may still make progress in other bdis
> > > hence should not sleep.
> > Hi,
> > encountered_congestion is only checked when nr_to_write > 0; if some superblocks
> > aren't congested, nr_to_write should be 0, right?
> 
> Yeah, good spot! So the change only helps some corner cases.

Then the question remains why IO submission is slow when !congested.

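For example, in the trace below each background_writeout pass submits
1024 pages (4MB) and consecutive passes are about 0.01s apart, so the
IO submission speed is about

        4MB / 0.01s = 400MB/s

The workload is a plain copy on a 2GHz Intel Core 2 CPU.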

[   71.487121] mm/page-writeback.c 761 background_writeout: comm=pdflush pid=343 n=-4096
[   71.489635] global dirty=65513 writeback=5925 nfs=1 flags=_M towrite=0 skipped=0
[   71.496019] redirty_tail +442: inode 79232
[   71.497432] mm/page-writeback.c 761 background_writeout: comm=pdflush pid=343 n=-5120
[   71.498890] global dirty=64490 writeback=6700 nfs=1 flags=_M towrite=0 skipped=0
[   71.506355] redirty_tail +442: inode 79232
[   71.508473] mm/page-writeback.c 761 background_writeout: comm=pdflush pid=343 n=-6144
[   71.510538] global dirty=62475 writeback=7599 nfs=1 flags=_M towrite=0 skipped=0
[   71.511910] redirty_tail +502: inode 3438
[   71.512846] redirty_tail +502: inode 1920

Thanks,
Fengguang
