From: "John Stoffel" <john@stoffel.org>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	miklos@szeredi.hu, akpm@linux-foundation.org, neilb@suse.de,
	dgc@sgi.com, tomoki.sekiyama.qu@hitachi.com,
	nikita@clusterfs.com, trond.myklebust@fys.uio.no,
	yingchao.zhou@gmail.com, richard@rsk.demon.co.uk,
	torvalds@linux-foundation.org
Subject: Re: [PATCH 21/23] mm: per device dirty threshold
Date: Tue, 11 Sep 2007 22:36:12 -0400
Message-ID: <18151.20636.425784.226044@stoffel.org>
In-Reply-To: <20070911200015.732492000@chello.nl>


Peter> Scale writeback cache per backing device, proportional to its
Peter> writeout speed.  By decoupling the BDI dirty thresholds a
Peter> number of problems we currently have will go away, namely:

Ah, this clarifies my questions!  Thanks!

Peter>  - mutual interference starvation (for any number of BDIs);
Peter>  - deadlocks with stacked BDIs (loop, FUSE and local NFS mounts).

Peter> It might be that all dirty pages are for a single BDI while
Peter> other BDIs are idling. By giving each BDI a 'fair' share of the
Peter> dirty limit, each one can have dirty pages outstanding and make
Peter> progress.

Question: can you change (shrink) the limit on a BDI while it has IO
in flight?  And what will that do to the system?  For example, suppose
one device is doing IO, so that it holds a majority of the dirty limit,
and then another, *faster* device starts IO.  How quickly or slowly do
the BDI dirty limits change for both the old and the new device?

Peter> A global threshold also creates a deadlock for stacked BDIs;
Peter> when A writes to B, and A generates enough dirty pages to get
Peter> throttled, B will never start writeback until the dirty pages
Peter> go away. Again, by giving each BDI its own 'independent' dirty
Peter> limit, this problem is avoided.

Peter> So the problem is to determine how to distribute the total
Peter> dirty limit across the BDIs fairly and efficiently. A DBI that

You mean BDI here, not DBI.  

Peter> has a large dirty limit but does not have any dirty pages
Peter> outstanding is a waste.

Peter> What is done is to keep a floating proportion between the DBIs
Peter> based on writeback completions. This way faster/more active
Peter> devices get a larger share than slower/idle devices.
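
A toy sketch of the idea in the quoted paragraph, not the code from the
patch: each device's share of the global dirty limit follows its recent
writeback completions, and older completions are aged away.  The names
`FloatingProportion`, `period` and `bdi_dirty_limit`, and the choice to
halve all counts once per period, are assumptions made for illustration.

```python
class FloatingProportion:
    """Toy model: per-device completion counts, halved every `period` events."""

    def __init__(self, period):
        self.period = period    # hypothetical: completions per aging step
        self.events = {}        # device name -> decayed completion count
        self.since_decay = 0    # completions seen since the last aging step

    def completed(self, dev, pages=1):
        """Record `pages` writeback completions for device `dev`."""
        for _ in range(pages):
            self.events[dev] = self.events.get(dev, 0.0) + 1.0
            self.since_decay += 1
            if self.since_decay >= self.period:
                # Age every count so that old activity fades out.
                for d in self.events:
                    self.events[d] /= 2.0
                self.since_decay = 0

    def share(self, dev):
        """Fraction of recent completions that belong to `dev`."""
        total = sum(self.events.values())
        if total == 0:
            return 0.0
        return self.events.get(dev, 0.0) / total


def bdi_dirty_limit(prop, dev, global_limit):
    """Scale the global dirty limit (in pages) by the device's share."""
    return int(global_limit * prop.share(dev))
```

In this model, two devices completing writeback at a steady 3:1 ratio
settle near 75% and 25% of the limit.  If the busier one then goes idle,
its share halves with every `period` completions by the other device,
so how fast the limits shift is governed by the assumed period length.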

Does a slower device get a BDI limit which is calculated to keep it
under a certain number of seconds of outstanding IO?  That way no
device could build up more than, say, 15 seconds of outstanding IO to
flush at any one time.
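
To make the question concrete, here is a minimal sketch of what such a
time-based cap could look like.  It is purely hypothetical and nothing
in the quoted description says the patch does this; the function name,
the `pages_per_sec` bandwidth estimate and the 15 second default are
all assumptions.

```python
def time_capped_limit(share_limit_pages, pages_per_sec, max_seconds=15):
    """Return the smaller of the proportional limit and a time-based cap.

    share_limit_pages: the device's proportional dirty limit, in pages
    pages_per_sec:     assumed estimate of the device's writeout speed
    max_seconds:       hypothetical bound on outstanding IO, in seconds
    """
    time_cap = int(pages_per_sec * max_seconds)
    return min(share_limit_pages, time_cap)
```

With a cap like this, a slow device is bounded by how much it can write
out in `max_seconds`, while a fast device is still bounded by its
proportional share of the dirty limit.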

Thanks!
John
