linux-kernel.vger.kernel.org archive mirror
From: Miklos Szeredi <miklos@szeredi.hu>
To: dgc@sgi.com
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [patch 3/8] per backing_dev dirty and writeback page accounting
Date: Mon, 12 Mar 2007 23:36:16 +0100
Message-ID: <E1HQt7o-00042r-00@dorka.pomaz.szeredi.hu>
In-Reply-To: <20070312214405.GQ6095633@melbourne.sgi.com> (message from David Chinner on Tue, 13 Mar 2007 08:44:05 +1100)

I'll try to explain the reason for the deadlock first.

> IIUC, your problem is that there's another bdi that holds all the
> dirty pages, and this throttle loop never flushes pages from that
> other bdi and we sleep instead. It seems to me that the fundamental
> problem is that to clean the pages we need to flush both bdi's, not
> just the bdi we are directly dirtying.

This is what happens:

write fault on upper filesystem
  balance_dirty_pages
    submit write requests
  loop ...
------- fuse IPC ---------------
[fuse loopback fs thread 1]
read request
sys_write
  mutex_lock(i_mutex)
  ...
     balance_dirty_pages
        submit write requests
        loop ... write requests completed ... dirty still over limit ... 
	... loop forever

[fuse loopback fs thread 2]
read request
sys_write
  mutex_lock(i_mutex) blocks

So the queue for the upper filesystem is full.  The queue for the
lower filesystem is empty.  There are no dirty pages in the lower
filesystem.

So kicking pdflush for the lower filesystem doesn't help; there's
nothing to do.  balance_dirty_pages() for the lower filesystem should
just realize that there's nothing to do and return, and then there
would be progress.

So there's really no need to do any accounting, just some
logic to determine that a backing dev is nearly or completely
quiescent.
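To make the proposed fix concrete, here is a minimal userspace model in C, not kernel code. struct sim_bdi, struct sim_vm and the sim_* functions are hypothetical, write completion is simulated as synchronous, and the 16-page writeback chunk is arbitrary. It contrasts a throttle loop that only looks at the global dirty count with one that also returns once its own backing device is quiescent (nothing dirty, nothing under writeback).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one backing device (not the kernel's
 * struct backing_dev_info). */
struct sim_bdi {
	long dirty;      /* dirty pages belonging to this bdi */
	long writeback;  /* pages of this bdi currently under writeback */
};

/* Global VM state: dirty + writeback pages over all bdis, and the limit. */
struct sim_vm {
	long global_dirty;
	long dirty_limit;
};

/* Stand-in for "submit write requests": move up to `chunk` of this
 * bdi's dirty pages to writeback. */
static void sim_submit_writeback(struct sim_bdi *bdi, long chunk)
{
	long n = bdi->dirty < chunk ? bdi->dirty : chunk;

	bdi->dirty -= n;
	bdi->writeback += n;
}

/* Stand-in for "write requests completed": all in-flight pages on this
 * bdi become clean and leave the global count. */
static void sim_complete_writeback(struct sim_vm *vm, struct sim_bdi *bdi)
{
	vm->global_dirty -= bdi->writeback;
	bdi->writeback = 0;
	assert(vm->global_dirty >= 0);
}

/* Throttle a writer on `bdi` until the global count is under the limit.
 * Returns the number of passes taken, or -1 after max_passes (standing
 * in for "loop forever").  With check_quiescent set, it also returns
 * once this bdi has nothing dirty and nothing in flight. */
static int sim_throttle(struct sim_vm *vm, struct sim_bdi *bdi,
			bool check_quiescent, int max_passes)
{
	for (int pass = 1; pass <= max_passes; pass++) {
		if (vm->global_dirty <= vm->dirty_limit)
			return pass;	/* under the limit: stop throttling */

		/* Nothing dirty and nothing in flight on this bdi: more
		 * passes cannot lower the global count, so give up. */
		if (check_quiescent && bdi->dirty == 0 && bdi->writeback == 0)
			return pass;

		sim_submit_writeback(bdi, 16);
		sim_complete_writeback(vm, bdi);
	}
	return -1;
}
```

With the lower filesystem's bdi empty and the global count held over the limit by the upper filesystem, the first variant exhausts its pass budget while the second returns on its first pass. When the bdi does have dirty pages, the quiescence check only fires after they have been written out, so normal throttling is unchanged.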

And getting out of this tight situation doesn't have to be efficient.
This is probably a very rare corner case that almost never happens in
real life; it shows up only with aggressive test tools like
bash_shared_mapping.

> > OK.  How about just accounting writeback pages?  That should be much
> > less of a problem, since normally writeback is started from
> > pdflush/kupdate in large batches without any concurrency.
> 
> Except when you are throttling you bounce the cacheline around
> each cpu as it triggers foreground writeback.....

Yeah, we'd lose a bit of CPU, but not any write performance, since it
is being throttled back anyway.

> > Or is it possible to export the state of the device queue to mm?
> > E.g. could balance_dirty_pages() query the backing dev if there are
> > any outstanding write requests?
> 
> Not directly - writeback_in_progress(bdi) is a coarse measure
> indicating pdflush is active on this bdi (which implies outstanding
> write requests).

Hmm, not quite what I need.
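The query proposed in the quoted question, whether a backing dev has any write requests outstanding, could in principle be answered by a per-device in-flight count. The sketch below is a hypothetical userspace model using C11 atomics; sim_queue and its functions are illustrative names, not an existing kernel or block-layer interface.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device count of write requests in flight. */
struct sim_queue {
	atomic_long inflight;
};

static void sim_queue_init(struct sim_queue *q)
{
	atomic_init(&q->inflight, 0);
}

/* Called when a write request is submitted to the device. */
static void sim_submit_request(struct sim_queue *q)
{
	atomic_fetch_add_explicit(&q->inflight, 1, memory_order_relaxed);
}

/* Called when the device completes a write request. */
static void sim_complete_request(struct sim_queue *q)
{
	long old = atomic_fetch_sub_explicit(&q->inflight, 1,
					     memory_order_relaxed);
	assert(old > 0);	/* completion without a matching submission */
}

/* The proposed query: are any write requests outstanding right now?
 * Relaxed ordering is enough for a throttling heuristic. */
static bool sim_has_outstanding_writes(struct sim_queue *q)
{
	return atomic_load_explicit(&q->inflight, memory_order_relaxed) != 0;
}
```

A single shared counter like this is written on every submission and completion, so under concurrent I/O its cacheline moves between CPUs, which is the cost raised in the quoted objection about foreground writeback.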

> > > I'd call this a showstopper right now - maybe you need to look at
> > > something like the ZVC code that Christoph Lameter wrote, perhaps?
> > 
> > That's rather a heavyweight approach for this I think.
> 
> But if you want to use per-page accounting, you are going to
> need a per-cpu or per-zone set of counters on each bdi to do
> this without introducing regressions.

Yes, this is an option, but I hope for a simpler solution.
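The per-cpu alternative Chinner mentions usually means a batched counter, similar in spirit to the kernel's percpu_counter: each CPU accumulates a private delta and folds it into the shared total only when it reaches a batch size, trading read accuracy for fewer writes to the shared cacheline. Below is a single-threaded model in C in which CPUs are simulated as array indices; SIM_NR_CPUS, the struct and the function names are hypothetical, and a real implementation would need a lock or atomics around the fold.

```c
#include <assert.h>
#include <stdlib.h>

#define SIM_NR_CPUS 4	/* hypothetical CPU count for the model */

struct sim_pcpu_counter {
	long total;			/* shared, approximate value */
	long local[SIM_NR_CPUS];	/* per-CPU pending deltas */
	long batch;			/* fold threshold */
};

/* Add delta on behalf of `cpu`; touch the shared total only when the
 * local delta reaches the batch size. */
static void sim_counter_add(struct sim_pcpu_counter *c, int cpu, long delta)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);

	long v = c->local[cpu] + delta;
	if (labs(v) >= c->batch) {
		c->total += v;		/* needs a lock/atomic if concurrent */
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v;
	}
}

/* Cheap read: off by at most SIM_NR_CPUS * (batch - 1). */
static long sim_counter_read(const struct sim_pcpu_counter *c)
{
	return c->total;
}

/* Exact read: sums every CPU's pending delta (expensive). */
static long sim_counter_sum(const struct sim_pcpu_counter *c)
{
	long s = c->total;

	for (int i = 0; i < SIM_NR_CPUS; i++)
		s += c->local[i];
	return s;
}
```

The error bound on the cheap read is the price of this design: a throttling decision close to the limit would need the exact sum, which walks every CPU's data.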

Thanks,
Miklos


Thread overview: 28+ messages
2007-03-06 18:04 [patch 0/8] VFS/VM patches Miklos Szeredi
2007-03-06 18:04 ` [patch 1/8] fix race in clear_page_dirty_for_io() Miklos Szeredi
2007-03-06 22:25   ` Andrew Morton
2007-03-06 18:04 ` [patch 2/8] update ctime and mtime for mmaped write Miklos Szeredi
2007-03-06 20:32   ` Peter Zijlstra
2007-03-06 21:24     ` Miklos Szeredi
2007-03-06 21:47       ` Peter Zijlstra
2007-03-06 22:00         ` Miklos Szeredi
2007-03-06 22:07         ` Peter Zijlstra
2007-03-06 22:18           ` Miklos Szeredi
2007-03-06 22:28             ` Peter Zijlstra
2007-03-06 22:36               ` Miklos Szeredi
2007-03-06 18:04 ` [patch 3/8] per backing_dev dirty and writeback page accounting Miklos Szeredi
2007-03-12  6:23   ` David Chinner
2007-03-12 11:40     ` Miklos Szeredi
2007-03-12 21:44       ` David Chinner
2007-03-12 22:36         ` Miklos Szeredi [this message]
2007-03-12 23:12           ` David Chinner
2007-03-13  8:21             ` Miklos Szeredi
2007-03-13 22:12               ` David Chinner
2007-03-14 22:09                 ` Miklos Szeredi
2007-03-06 18:04 ` [patch 4/8] fix deadlock in balance_dirty_pages Miklos Szeredi
2007-03-06 18:04 ` [patch 5/8] fix deadlock in throttle_vm_writeout Miklos Szeredi
2007-03-06 18:04 ` [patch 6/8] balance dirty pages from loop device Miklos Szeredi
2007-03-06 18:04 ` [patch 7/8] add filesystem subtype support Miklos Szeredi
2007-03-06 18:04 ` [patch 8/8] consolidate generic_writepages and mpage_writepages fix Miklos Szeredi
2007-03-07 20:46   ` Andrew Morton
2007-03-07 21:26     ` Miklos Szeredi
