linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Chinner <dgc@sgi.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: dgc@sgi.com, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [patch 3/8] per backing_dev dirty and writeback page accounting
Date: Wed, 14 Mar 2007 09:12:36 +1100	[thread overview]
Message-ID: <20070313221236.GY6095633@melbourne.sgi.com> (raw)
In-Reply-To: <E1HR2Gd-0004oR-00@dorka.pomaz.szeredi.hu>

On Tue, Mar 13, 2007 at 09:21:59AM +0100, Miklos Szeredi wrote:
> > > read request
> > > sys_write
> > >   mutex_lock(i_mutex)
> > >   ...
> > >      balance_dirty_pages
> > >         submit write requests
> > >         loop ... write requests completed ... dirty still over limit ... 
> > > 	... loop forever
> > 
> > Hmmm - the situation in balance_dirty_pages() after an attempt
> > to writeback_inodes(&wbc) that has written nothing because there
> > is nothing to write would be:
> > 
> > 	wbc->nr_write == write_chunk &&
> > 	wbc->pages_skipped == 0 &&
> > 	wbc->encountered_congestion == 0 &&
> > 	!bdi_congested(wbc->bdi)
> > 
> > What happens if you make that an exit condition to the loop?
> 
> That's almost right.  The only problem is that even if there's no
> congestion, the device queue can be holding a great amount of yet
> unwritten pages.  So exiting on this condition would mean, that
> dirty+writeback could go way over the threshold.

Only if the queue depth is not bound. Queue depths are bound and so
the distance we can go over the threshold is limited.  This is the
fundamental principle on which the throttling is based.....

Hence, if the queue is not full, then we will have either written
dirty pages to it (i.e wbc->nr_write != write_chunk so we will throttle
or continue normally if write_chunk was written) or we have no more
dirty pages left.

Having no dirty pages left on the bdi and it not being congested
means we effectively have a clean, idle bdi. We should not be trying
to throttle writeback here - we can't do anything to improve the
situation by continuing to try to do writeback on this bdi, so we
may as well give up and let the writer continue. Once we have dirty
pages on the bdi, we'll get throttled appropriately.

The point I'm making here is that if the bdi is not congested, any
pages dirtied on that bdi can be cleaned _quickly_ and so writing
more pages to it isn't a big deal even if we are over the global
dirty threshold.

Remember, the global dirty threshold is not really a hard limit -
it's a threshold at which we change behaviour. Throttling idle bdi's
does not contribute usefully to reducing the number of dirty pages
in the system; all it really does is deny service to devices that could
otherwise be doing useful work.

> How much this would be a problem?  I don't know, I guess it depends on
> many things: how many queues, how many requests per queue, how many
> bytes per request.

Right, and most ppl don't have enough devices in their system for
this to be a problem. Even those of us that do have enough devices
for this to potentially be a problem usually have enough RAM in
the machine so that it is not a problem....

> > Or alternatively, adding another bit to the wbc structure to
> > say "there was nothing to do" and setting that if we find
> > list_empty(&sb->s_dirty) when trying to flush dirty inodes."
> > 
> > [ FWIW, this may also solve another problem of fast block devices
> > being throttled incorrectly when a slow block dev is consuming
> > all the dirty pages... ]
> 
> There may be a patch floating around, which I think basically does
> this, but only as long as the dirty+writeback are over a soft limit,
> but under the hard limit.
> 
> When over the the hard limit, balance_dirty_pages still loops until
> dirty+writeback go below the threshold.

The difference between the two methods is that if there is any hard
limit that results in balance_dirty_pages looping then you have a
potential deadlock.  Hence the soft+hard limits will reduce the
occurrence but not remove the deadlock. Breaking out of the loop
when there is nothing to do simply means we'll reenter again
with something to do very shortly (and *then* throttle) if the
process continues to write.....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

  reply	other threads:[~2007-03-13 22:12 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-06 18:04 [patch 0/8] VFS/VM patches Miklos Szeredi
2007-03-06 18:04 ` [patch 1/8] fix race in clear_page_dirty_for_io() Miklos Szeredi
2007-03-06 22:25   ` Andrew Morton
2007-03-06 18:04 ` [patch 2/8] update ctime and mtime for mmaped write Miklos Szeredi
2007-03-06 20:32   ` Peter Zijlstra
2007-03-06 21:24     ` Miklos Szeredi
2007-03-06 21:47       ` Peter Zijlstra
2007-03-06 22:00         ` Miklos Szeredi
2007-03-06 22:07         ` Peter Zijlstra
2007-03-06 22:18           ` Miklos Szeredi
2007-03-06 22:28             ` Peter Zijlstra
2007-03-06 22:36               ` Miklos Szeredi
2007-03-06 18:04 ` [patch 3/8] per backing_dev dirty and writeback page accounting Miklos Szeredi
2007-03-12  6:23   ` David Chinner
2007-03-12 11:40     ` Miklos Szeredi
2007-03-12 21:44       ` David Chinner
2007-03-12 22:36         ` Miklos Szeredi
2007-03-12 23:12           ` David Chinner
2007-03-13  8:21             ` Miklos Szeredi
2007-03-13 22:12               ` David Chinner [this message]
2007-03-14 22:09                 ` Miklos Szeredi
2007-03-06 18:04 ` [patch 4/8] fix deadlock in balance_dirty_pages Miklos Szeredi
2007-03-06 18:04 ` [patch 5/8] fix deadlock in throttle_vm_writeout Miklos Szeredi
2007-03-06 18:04 ` [patch 6/8] balance dirty pages from loop device Miklos Szeredi
2007-03-06 18:04 ` [patch 7/8] add filesystem subtype support Miklos Szeredi
2007-03-06 18:04 ` [patch 8/8] consolidate generic_writepages and mpage_writepages fix Miklos Szeredi
2007-03-07 20:46   ` Andrew Morton
2007-03-07 21:26     ` Miklos Szeredi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070313221236.GY6095633@melbourne.sgi.com \
    --to=dgc@sgi.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).