linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm/tracking dirty pages: update get_dirty_limits for mmap tracking
@ 2006-06-21 17:01 Nate Diller
  2006-06-21 18:08 ` Nick Piggin
  2006-06-21 18:13 ` Martin Bligh
  0 siblings, 2 replies; 5+ messages in thread
From: Nate Diller @ 2006-06-21 17:01 UTC (permalink / raw)
  To: Peter Zijlstra, linux-mm, linux-kernel
  Cc: Hugh Dickins, Andrew Morton, David Howells, Christoph Lameter,
	Martin Bligh, Nick Piggin, Linus Torvalds, Hans Reiser,
	E. Gryaznova

Update write throttling calculations now that we can track and
throttle dirty mmap'd pages.  A version of this patch has been tested
with iozone:

http://namesys.com/intbenchmarks/iozone/06.06.19.tracking.dirty.page-noatime_-B/e3-2.6.16-tr.drt.pgs-rt.40_vs_rt.80.html
http://namesys.com/intbenchmarks/iozone/06.06.19.tracking.dirty.page-noatime_-B/r4-2.6.16-tr.drt.pgs-rt.40_vs_rt.80.html

Signed-off-by: Nate Diller <nate.diller@gmail.com>

--- linux-2.6.orig/mm/page-writeback.c	2005-10-27 17:02:08.000000000 -0700
+++ linux-2.6/mm/page-writeback.c	2006-06-21 08:24:11.000000000 -0700
@@ -69,7 +69,7 @@ int dirty_background_ratio = 10;
 /*
  * The generator of dirty data starts writeback at this percentage
  */
-int vm_dirty_ratio = 40;
+int vm_dirty_ratio = 80;

 /*
  * The interval between `kupdate'-style writebacks, in centiseconds
@@ -119,15 +119,14 @@ static void get_writeback_state(struct w
  * Work out the current dirty-memory clamping and background writeout
  * thresholds.
  *
- * The main aim here is to lower them aggressively if there is a lot of mapped
- * memory around.  To avoid stressing page reclaim with lots of unreclaimable
- * pages.  It is better to clamp down on writers than to start swapping, and
- * performing lots of scanning.
- *
- * We only allow 1/2 of the currently-unmapped memory to be dirtied.
- *
- * We don't permit the clamping level to fall below 5% - that is getting rather
- * excessive.
+ * We now have dirty memory accounting for mmap'd pages, so we calculate the
+ * ratios based on the available memory.  We still have no way of tracking
+ * how many pages are pinned (eg BSD wired accounting), so we still need the
+ * hard clamping, but the default has been raised to 80.
+ *
+ * We now allow the ratios to be set to anything, because there is less risk
+ * of OOM, and because databases and such will need more flexible tuning,
+ * now that they are being throttled too.
  *
  * We make sure that the background writeout level is below the adjusted
  * clamping level.
@@ -136,9 +135,6 @@ static void
 get_dirty_limits(struct writeback_state *wbs, long *pbackground, long *pdirty,
 		struct address_space *mapping)
 {
-	int background_ratio;		/* Percentages */
-	int dirty_ratio;
-	int unmapped_ratio;
 	long background;
 	long dirty;
 	unsigned long available_memory = total_pages;
@@ -155,27 +151,16 @@ get_dirty_limits(struct writeback_state
 		available_memory -= totalhigh_pages;
 #endif

-
-	unmapped_ratio = 100 - (wbs->nr_mapped * 100) / total_pages;
-
-	dirty_ratio = vm_dirty_ratio;
-	if (dirty_ratio > unmapped_ratio / 2)
-		dirty_ratio = unmapped_ratio / 2;
-
-	if (dirty_ratio < 5)
-		dirty_ratio = 5;
-
-	background_ratio = dirty_background_ratio;
-	if (background_ratio >= dirty_ratio)
-		background_ratio = dirty_ratio / 2;
-
-	background = (background_ratio * available_memory) / 100;
-	dirty = (dirty_ratio * available_memory) / 100;
+	background = (dirty_background_ratio * available_memory) / 100;
+	dirty = (vm_dirty_ratio * available_memory) / 100;
 	tsk = current;
 	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
 		background += background / 4;
-		dirty += dirty / 4;
+		dirty += dirty / 8;
 	}
+	if (background > dirty)
+		background = dirty;
+
 	*pbackground = background;
 	*pdirty = dirty;
 }


* Re: [PATCH] mm/tracking dirty pages: update get_dirty_limits for mmap tracking
  2006-06-21 17:01 [PATCH] mm/tracking dirty pages: update get_dirty_limits for mmap tracking Nate Diller
@ 2006-06-21 18:08 ` Nick Piggin
  2006-06-21 22:25   ` Nate Diller
  2006-06-21 18:13 ` Martin Bligh
  1 sibling, 1 reply; 5+ messages in thread
From: Nick Piggin @ 2006-06-21 18:08 UTC (permalink / raw)
  To: Nate Diller
  Cc: Peter Zijlstra, linux-mm, linux-kernel, Hugh Dickins,
	Andrew Morton, David Howells, Christoph Lameter, Martin Bligh,
	Linus Torvalds, Hans Reiser, E. Gryaznova

On Wed, Jun 21, 2006 at 10:01:17AM -0700, Nate Diller wrote:
> Update write throttling calculations now that we can track and
> throttle dirty mmap'd pages.  A version of this patch has been tested
> with iozone:

Your changelog doesn't tell much about the "why" side of things,
and omits the fact that you have upped the dirty ratio to 80.

> 
> http://namesys.com/intbenchmarks/iozone/06.06.19.tracking.dirty.page-noatime_-B/e3-2.6.16-tr.drt.pgs-rt.40_vs_rt.80.html
> http://namesys.com/intbenchmarks/iozone/06.06.19.tracking.dirty.page-noatime_-B/r4-2.6.16-tr.drt.pgs-rt.40_vs_rt.80.html

I'm guessing the reason you get all those red numbers when
iozone files are larger than RAM is because writeout and reclaim
tend to get worse when there are large amounts of dirty pages
floating around in memory?



* Re: [PATCH] mm/tracking dirty pages: update get_dirty_limits for mmap tracking
  2006-06-21 17:01 [PATCH] mm/tracking dirty pages: update get_dirty_limits for mmap tracking Nate Diller
  2006-06-21 18:08 ` Nick Piggin
@ 2006-06-21 18:13 ` Martin Bligh
  1 sibling, 0 replies; 5+ messages in thread
From: Martin Bligh @ 2006-06-21 18:13 UTC (permalink / raw)
  To: Nate Diller
  Cc: Peter Zijlstra, linux-mm, linux-kernel, Hugh Dickins,
	Andrew Morton, David Howells, Christoph Lameter, Nick Piggin,
	Linus Torvalds, Hans Reiser, E. Gryaznova


> -int vm_dirty_ratio = 40;
> +int vm_dirty_ratio = 80;

I don't think you can do that. Because ...

>     unsigned long available_memory = total_pages;
...
> +    dirty = (vm_dirty_ratio * available_memory) / 100;

... there are other things in memory besides pagecache. Limiting
dirty pages to 80% of pagecache might be fine, but not 80%
of total memory.

dirty = (vm_dirty_ratio * (nr_active + nr_inactive)) / 100

might be more sensible. Frankly the whole thing is a crock
anyway, because we should be counting easily freeable clean
pages, not dirty pages, but still.

M.


* Re: [PATCH] mm/tracking dirty pages: update get_dirty_limits for mmap tracking
  2006-06-21 18:08 ` Nick Piggin
@ 2006-06-21 22:25   ` Nate Diller
  2006-06-23  7:31     ` Hans Reiser
  0 siblings, 1 reply; 5+ messages in thread
From: Nate Diller @ 2006-06-21 22:25 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Zijlstra, linux-mm, linux-kernel, Hugh Dickins,
	Andrew Morton, David Howells, Christoph Lameter, Martin Bligh,
	Linus Torvalds, Hans Reiser, E. Gryaznova

On 6/21/06, Nick Piggin <npiggin@suse.de> wrote:
> On Wed, Jun 21, 2006 at 10:01:17AM -0700, Nate Diller wrote:
> > Update write throttling calculations now that we can track and
> > throttle dirty mmap'd pages.  A version of this patch has been tested
> > with iozone:
>
> Your changelog doesn't tell much about the "why" side of things,
> and omits the fact that you have upped the dirty ratio to 80.

hmm, you are right, documenting it in the code comment is not really
enough here, because there are going to be performance corner cases
and such for this patch (as well as the whole tracking dirty patchset).

> >
> > http://namesys.com/intbenchmarks/iozone/06.06.19.tracking.dirty.page-noatime_-B/e3-2.6.16-tr.drt.pgs-rt.40_vs_rt.80.html
> > http://namesys.com/intbenchmarks/iozone/06.06.19.tracking.dirty.page-noatime_-B/r4-2.6.16-tr.drt.pgs-rt.40_vs_rt.80.html
>
> I'm guessing the reason you get all those red numbers when
> iozone files are larger than RAM is because writeout and reclaim
> tend to get worse when there are large amounts of dirty pages
> floating around in memory?

actually, there is a great deal of variation in the test results once
you get into the large I/O part of the test.  also, the fact that we
are tracking mmap'd pages at all changes the performance.  here are
links which compare the old and new configurations, but with
dirty_pages set to 40 on both:

http://namesys.com/intbenchmarks/iozone/06.06.19.tracking.dirty.page-noatime_-B/e3-2.6.16_vs_tr.drt.pgs-rt.40.html
http://namesys.com/intbenchmarks/iozone/06.06.19.tracking.dirty.page-noatime_-B/r4-2.6.16_vs_tr.drt.pgs-rt.40.html

grev posted the variance as well, but for some reason the link doesn't work.

NATE


* Re: [PATCH] mm/tracking dirty pages: update get_dirty_limits for mmap tracking
  2006-06-21 22:25   ` Nate Diller
@ 2006-06-23  7:31     ` Hans Reiser
  0 siblings, 0 replies; 5+ messages in thread
From: Hans Reiser @ 2006-06-23  7:31 UTC (permalink / raw)
  To: Nate Diller
  Cc: Nick Piggin, Peter Zijlstra, linux-mm, linux-kernel,
	Hugh Dickins, Andrew Morton, David Howells, Christoph Lameter,
	Martin Bligh, Linus Torvalds, E. Gryaznova

Nate, you should note that A: increasing to 80% was my idea, and B: the
data from the benchmarks provide no indication that it is a good idea.

That said, it is very possible that C: the benchmark is flawed, because
the variance is so high that I suspect something is wrong with it, and
D: the implementation is flawed in some way we don't yet see.

All that said, I cannot say that we have anything here that suggests the
change is a good one.  My intuition says it should be, but the data does
not.  Not yet.

Hans

