linux-kernel.vger.kernel.org archive mirror
* Re: [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22
  2003-08-29 15:01 [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22 Shantanu Goel
@ 2003-08-29 14:57 ` Antonio Vargas
  2003-08-29 17:55   ` Shantanu Goel
  2003-08-29 19:11 ` [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22 Rahul Karnik
  1 sibling, 1 reply; 21+ messages in thread
From: Antonio Vargas @ 2003-08-29 14:57 UTC (permalink / raw)
  To: Shantanu Goel; +Cc: linux-kernel

On Fri, Aug 29, 2003 at 08:01:11AM -0700, Shantanu Goel wrote:
> Hi kernel hackers,
> 
> The VM subsystem in Linux 2.4.22 can cause spurious
> swapouts in the presence of lots of dirty pages. 
> Presently, as dirty pages are encountered,
> shrink_cache() schedules a writepage() and moves the
> page to the head of the inactive list.  When a lot of
> dirty pages are present, this can break the FIFO
> ordering of the inactive list because clean pages
> further down the list will be reclaimed first.  The
> following patch records the pages being laundered, and
> once SWAP_CLUSTER_MAX pages have been accumulated or
> the scan is complete, goes back and attempts to move
> them back to the tail of the list.
> 
> The second part of the patch reclaims unused
> inode/dentry/dquot entries more aggressively.  I have
> observed that on an NFS server where swap out activity
> is low, the VM can shrink the page cache to the point
> where most pages are used up by unused inode/dentry
> entries.  This is because page cache reclamation
> succeeds most of the time except when a swap_out()
> happens.
> 
> Feedback and comments are welcome.

Micro-optimization (which helps on x86 a lot):
 
-       for (i = nr_pages - 1; i >= 0; i--) {
-               page = laundry[i];
+	laundry += nr_pages;
+	for (i = -nr_pages - 1; ++i ;) {
+               page = laundry[i];

Your original code reads from higher to lower addresses,
while the new one reads from lower to higher addresses.

The new code increments and then tests the loop counter,
so it's a little bit faster :)

Both check against zero, so both can use the result flags
directly and need no intervening "cmp ecx, CONSTANT".
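
For reference, here is a minimal standalone sketch of the idiom
(hypothetical buffer and names, not kernel code).  Note the extra -1
bias in the start value: the pre-increment in the condition runs before
the first pass through the body, so starting at plain -nr_items would
skip the first element.

#include <stdio.h>

/* Count-up-to-zero idiom: bias the pointer past the end and walk a
 * negative index up to zero.  The increment and the termination test
 * can share one flag-setting instruction on x86, so no separate
 * "cmp ecx, CONSTANT" is needed. */
static void walk(int *items, int nr_items)
{
	int i;

	items += nr_items;		/* bias pointer past the end */
	for (i = -nr_items - 1; ++i ;)	/* ++i yields -nr_items first */
		printf("%d\n", items[i]);
}

int main(void)
{
	int v[] = { 10, 20, 30 };

	walk(v, 3);			/* prints 10, 20, 30 */
	return 0;
}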

To the "powers that be": would this type of micro-optimization
be welcome?

Greets, Antonio.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22
@ 2003-08-29 15:01 Shantanu Goel
  2003-08-29 14:57 ` Antonio Vargas
  2003-08-29 19:11 ` [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22 Rahul Karnik
  0 siblings, 2 replies; 21+ messages in thread
From: Shantanu Goel @ 2003-08-29 15:01 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1191 bytes --]

Hi kernel hackers,

The VM subsystem in Linux 2.4.22 can cause spurious
swapouts in the presence of lots of dirty pages. 
Presently, as dirty pages are encountered,
shrink_cache() schedules a writepage() and moves the
page to the head of the inactive list.  When a lot of
dirty pages are present, this can break the FIFO
ordering of the inactive list because clean pages
further down the list will be reclaimed first.  The
following patch records the pages being laundered, and
once SWAP_CLUSTER_MAX pages have been accumulated or
the scan is complete, goes back and attempts to move
them back to the tail of the list.

The second part of the patch reclaims unused
inode/dentry/dquot entries more aggressively.  I have
observed that on an NFS server where swap out activity
is low, the VM can shrink the page cache to the point
where most pages are used up by unused inode/dentry
entries.  This is because page cache reclamation
succeeds most of the time except when a swap_out()
happens.

Feedback and comments are welcome.
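
In outline, the first change works like this (condensed from the
attached patch; locking, refcounting and error paths elided):

	/* shrink_cache(): when a dirty page is sent to writeback ... */
	if (!writepage(page)) {
		laundry[nr_laundry++] = page;		/* remember in-flight page */
		if (nr_laundry == SWAP_CLUSTER_MAX) {	/* batch full: drain it */
			process_laundry(laundry, nr_laundry);
			nr_laundry = 0;
		}
	} else
		page_cache_release(page);

	/*
	 * process_laundry() waits for the writeback to finish and, if a
	 * page came back clean and is still freeable, moves it to the
	 * tail of the inactive list so it is reclaimed ahead of clean
	 * pages that have not been waiting as long.
	 */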

Thanks,
Shantanu Goel


[-- Attachment #2: 2.4.22-vm-writeback-inode.patch --]
[-- Type: application/octet-stream, Size: 3331 bytes --]

--- vmscan.c.~1~	2002-11-28 18:53:15.000000000 -0500
+++ vmscan.c	2003-08-28 19:09:48.000000000 -0400
@@ -23,6 +23,11 @@
 #include <linux/init.h>
 #include <linux/highmem.h>
 #include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/dcache.h>
+#ifdef CONFIG_QUOTA
+#include <linux/quota.h>
+#endif
 
 #include <asm/pgalloc.h>
 
@@ -334,12 +339,52 @@
 	return 0;
 }
 
+/*
+ * Wait for pages being laundered and move them
+ * to tail of inactive list if they are still freeable.
+ */
+static void process_laundry(struct page **laundry, int nr_pages)
+{
+	int i;
+	struct page *page;
+
+	for (i = nr_pages - 1; i >= 0; i--) {
+		page = laundry[i];
+		spin_lock(&pagemap_lru_lock);
+
+		if (!PageLRU(page) || PageActive(page))
+			goto next_page;
+
+		if (TryLockPage(page)) {
+			if (PageLaunder(page)) {
+				spin_unlock(&pagemap_lru_lock);
+				wait_on_page(page);
+				i++;
+				continue;
+			}
+			goto next_page;
+		}
+
+		if (!PageDirty(page) && !PageError(page) && page->mapping && (page_count(page) - !!page->buffers) == 2) {
+			list_del(&page->lru);
+			list_add_tail(&page->lru, &inactive_list);
+		}
+
+		UnlockPage(page);
+next_page:
+		spin_unlock(&pagemap_lru_lock);
+		page_cache_release(page);
+	}
+}
+
 static int FASTCALL(shrink_cache(int nr_pages, zone_t * classzone, unsigned int gfp_mask, int priority));
 static int shrink_cache(int nr_pages, zone_t * classzone, unsigned int gfp_mask, int priority)
 {
 	struct list_head * entry;
+	int nr_laundry = 0;
 	int max_scan = nr_inactive_pages / priority;
 	int max_mapped = min((nr_pages << (10 - priority)), max_scan / 10);
+	struct page *laundry[SWAP_CLUSTER_MAX];
 
 	spin_lock(&pagemap_lru_lock);
 	while (--max_scan >= 0 && (entry = inactive_list.prev) != &inactive_list) {
@@ -408,8 +453,14 @@
 				page_cache_get(page);
 				spin_unlock(&pagemap_lru_lock);
 
-				writepage(page);
-				page_cache_release(page);
+				if (!writepage(page)) {
+					laundry[nr_laundry++] = page;
+					if (nr_laundry == SWAP_CLUSTER_MAX) {
+						process_laundry(laundry, nr_laundry);
+						nr_laundry = 0;
+					}
+				} else
+					page_cache_release(page);
 
 				spin_lock(&pagemap_lru_lock);
 				continue;
@@ -483,7 +534,7 @@
 			 */
 			spin_unlock(&pagemap_lru_lock);
 			swap_out(priority, gfp_mask, classzone);
-			return nr_pages;
+			goto out;
 		}
 
 		/*
@@ -519,6 +570,9 @@
 		break;
 	}
 	spin_unlock(&pagemap_lru_lock);
+out:
+	if (nr_laundry)
+		process_laundry(laundry, nr_laundry);
 
 	return nr_pages;
 }
@@ -572,14 +626,22 @@
 	refill_inactive(ratio);
 
 	nr_pages = shrink_cache(nr_pages, classzone, gfp_mask, priority);
-	if (nr_pages <= 0)
-		return 0;
 
-	shrink_dcache_memory(priority, gfp_mask);
-	shrink_icache_memory(priority, gfp_mask);
+	/*
+	 * Keep an eye on unused inode/dcache/quota entries.
+	 */
+	ratio = (inodes_stat.nr_unused * sizeof(struct inode)) >> PAGE_SHIFT;
+	ratio += (dentry_stat.nr_unused * sizeof(struct dentry)) >> PAGE_SHIFT;
+#ifdef CONFIG_QUOTA
+	ratio += (dqstats.free_dquots * sizeof(struct dquot)) >> PAGE_SHIFT;
+#endif
+	if (nr_pages > 0 || ratio > SWAP_CLUSTER_MAX) {
+		shrink_dcache_memory(priority, gfp_mask);
+		shrink_icache_memory(priority, gfp_mask);
 #ifdef CONFIG_QUOTA
-	shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
+		shrink_dqcache_memory(priority, gfp_mask);
 #endif
+	}
 
 	return nr_pages;
 }

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22
  2003-08-29 14:57 ` Antonio Vargas
@ 2003-08-29 17:55   ` Shantanu Goel
  2003-08-29 18:06     ` Mike Fedyk
  0 siblings, 1 reply; 21+ messages in thread
From: Shantanu Goel @ 2003-08-29 17:55 UTC (permalink / raw)
  To: Antonio Vargas; +Cc: linux-kernel

I am not very knowledgeable about micro-optimizations,
so I'll take your word for it. ;-)  What interests me
more is whether others notice any performance
improvement under swapout with this patch, given that
it is on the order of milliseconds.

--- Antonio Vargas <wind@cocodriloo.com> wrote:
> On Fri, Aug 29, 2003 at 08:01:11AM -0700, Shantanu Goel wrote:
> > Hi kernel hackers,
> > 
> > The VM subsystem in Linux 2.4.22 can cause spurious
> > swapouts in the presence of lots of dirty pages.
> > Presently, as dirty pages are encountered,
> > shrink_cache() schedules a writepage() and moves the
> > page to the head of the inactive list.  When a lot of
> > dirty pages are present, this can break the FIFO
> > ordering of the inactive list because clean pages
> > further down the list will be reclaimed first.  The
> > following patch records the pages being laundered, and
> > once SWAP_CLUSTER_MAX pages have been accumulated or
> > the scan is complete, goes back and attempts to move
> > them back to the tail of the list.
> > 
> > The second part of the patch reclaims unused
> > inode/dentry/dquot entries more aggressively.  I have
> > observed that on an NFS server where swap out activity
> > is low, the VM can shrink the page cache to the point
> > where most pages are used up by unused inode/dentry
> > entries.  This is because page cache reclamation
> > succeeds most of the time except when a swap_out()
> > happens.
> > 
> > Feedback and comments are welcome.
> 
> Micro-optimization (which helps on x86 a lot):
> 
> -       for (i = nr_pages - 1; i >= 0; i--) {
> -               page = laundry[i];
> +	laundry += nr_pages;
> +	for (i = -nr_pages - 1; ++i ;) {
> +               page = laundry[i];
> 
> Your original code reads from higher to lower addresses,
> while the new one reads from lower to higher addresses.
> 
> The new code increments and then tests the loop counter,
> so it's a little bit faster :)
> 
> Both check against zero, so both can use the result flags
> directly and need no intervening "cmp ecx, CONSTANT".
> 
> To the "powers that be": would this type of micro-optimization
> be welcome?
> 
> Greets, Antonio.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22
  2003-08-29 17:55   ` Shantanu Goel
@ 2003-08-29 18:06     ` Mike Fedyk
  2003-08-29 18:46       ` Shantanu Goel
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Fedyk @ 2003-08-29 18:06 UTC (permalink / raw)
  To: Shantanu Goel; +Cc: Antonio Vargas, linux-kernel

On Fri, Aug 29, 2003 at 10:55:44AM -0700, Shantanu Goel wrote:
> I am not very knowledgeable about micro-optimizations,
> so I'll take your word for it. ;-)  What interests me
> more is whether others notice any performance
> improvement under swapout with this patch, given that
> it is on the order of milliseconds.

But have you compared your patch with the VM patches in -aa?  Will your
patch apply on -aa and make improvements there too?

In other words: Why would I want to use this patch when I could use -aa?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22
  2003-08-29 18:06     ` Mike Fedyk
@ 2003-08-29 18:46       ` Shantanu Goel
  2003-08-29 18:57         ` Mike Fedyk
  0 siblings, 1 reply; 21+ messages in thread
From: Shantanu Goel @ 2003-08-29 18:46 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Antonio Vargas, linux-kernel

I prefer to run stock kernels, so I don't have as much
experience with the -aa patches.  However, I took a
look at the relevant code in 2.4.22pre7aa1 and I
believe my patch should help there as well.  The
writepage() and page rotation behaviour is similar to
stock 2.4.22, though the inactive_list is per-classzone
in -aa.  I am less sure about the inode/dcache part
under -aa, though.

--- Mike Fedyk <mfedyk@matchmail.com> wrote:
> On Fri, Aug 29, 2003 at 10:55:44AM -0700, Shantanu Goel wrote:
> > I am not very knowledgeable about micro-optimizations,
> > so I'll take your word for it. ;-)  What interests me
> > more is whether others notice any performance
> > improvement under swapout with this patch, given that
> > it is on the order of milliseconds.
> 
> But have you compared your patch with the VM patches in -aa?  Will your
> patch apply on -aa and make improvements there too?
> 
> In other words: Why would I want to use this patch when I could use -aa?



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22
  2003-08-29 18:46       ` Shantanu Goel
@ 2003-08-29 18:57         ` Mike Fedyk
  2003-08-29 19:28           ` Andrea Arcangeli
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Fedyk @ 2003-08-29 18:57 UTC (permalink / raw)
  To: Shantanu Goel
  Cc: Antonio Vargas, linux-kernel, Andrea Arcangeli, Marc-Christian Petersen

[CCing AA & MCP]

> --- Mike Fedyk <mfedyk@matchmail.com> wrote:
> > But have you compared your patch with the VM patches in -aa?  Will
> > your patch apply on -aa and make improvements there too?
> > 
> > In other words: Why would I want to use this patch when I could use -aa?

On Fri, Aug 29, 2003 at 11:46:44AM -0700, Shantanu Goel wrote:
> I prefer to run stock kernels, so I don't have as much
> experience with the -aa patches.  However, I took a
> look at the relevant code in 2.4.22pre7aa1 and I
> believe my patch should help there as well.  The
> writepage() and page rotation behaviour is similar to
> stock 2.4.22, though the inactive_list is per-classzone
> in -aa.  I am less sure about the inode/dcache part
> under -aa, though.

You need to integrate with -aa on the VM.  It has been hard enough for
Andrea to get his stuff in; I doubt you will fare any better.

If your patch shows improvements when applied on -aa, Andrea will
probably integrate it.

Marc/Andrea, what do you think?  Any holes to poke in this here patch?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22
  2003-08-29 15:01 [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22 Shantanu Goel
  2003-08-29 14:57 ` Antonio Vargas
@ 2003-08-29 19:11 ` Rahul Karnik
  1 sibling, 0 replies; 21+ messages in thread
From: Rahul Karnik @ 2003-08-29 19:11 UTC (permalink / raw)
  To: Shantanu Goel; +Cc: linux-kernel

Shantanu Goel wrote:

> Feedback and comments are welcome.

This is an optimization patch.  Since the VM is all voodoo :), there are
no obvious improvements.  Numbers and/or test scripts would go a long way
toward getting it accepted.

-Rahul
-- 
Rahul Karnik
rahul@genebrew.com
http://www.genebrew.com/


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22
  2003-08-29 18:57         ` Mike Fedyk
@ 2003-08-29 19:28           ` Andrea Arcangeli
  2003-08-29 19:46             ` Shantanu Goel
  0 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2003-08-29 19:28 UTC (permalink / raw)
  To: Shantanu Goel, Antonio Vargas, linux-kernel, Marc-Christian Petersen

On Fri, Aug 29, 2003 at 11:57:28AM -0700, Mike Fedyk wrote:
> [CCing AA & MCP]
> 
> > --- Mike Fedyk <mfedyk@matchmail.com> wrote:
> > > But have you compared your patch with the VM patches in -aa?  Will
> > > your patch apply on -aa and make improvements there too?
> > > 
> > > In other words: Why would I want to use this patch when I could use -aa?
> 
> On Fri, Aug 29, 2003 at 11:46:44AM -0700, Shantanu Goel wrote:
> > I prefer to run stock kernels, so I don't have as much
> > experience with the -aa patches.  However, I took a
> > look at the relevant code in 2.4.22pre7aa1 and I
> > believe my patch should help there as well.  The
> > writepage() and page rotation behaviour is similar to
> > stock 2.4.22, though the inactive_list is per-classzone
> > in -aa.  I am less sure about the inode/dcache part
> > under -aa, though.
> 
> You need to integrate with -aa on the VM.  It has been hard enough for
> Andrea to get his stuff in; I doubt you will fare any better.
> 
> If your patch shows improvements when applied on -aa, Andrea will
> probably integrate it.

Yes, at this point in time I'm only willing to merge things that are an
obvious improvement.  More dubious things had better go into 2.6.

I didn't see the patch in question.  Shantanu, if you're interested in
merging it into -aa, could you submit it against 2.4.22pre7aa1?
Otherwise I'll check it and possibly merge it myself later (I've got
quite a backlog to merge already for the short term, but it can go in
the queue).

> Marc/Andrea, what do you think?  Any holes to poke in this here patch?

didn't check it yet.

Andrea

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22
  2003-08-29 19:28           ` Andrea Arcangeli
@ 2003-08-29 19:46             ` Shantanu Goel
  2003-08-29 19:55               ` Andrea Arcangeli
  0 siblings, 1 reply; 21+ messages in thread
From: Shantanu Goel @ 2003-08-29 19:46 UTC (permalink / raw)
  To: Andrea Arcangeli, Antonio Vargas, linux-kernel, Marc-Christian Petersen

Andrea,

I'll test and submit a patch against -aa.  Also, is
there a common benchmark that you use to test for
regression?

Thanks,
Shantanu

--- Andrea Arcangeli <andrea@suse.de> wrote:
> On Fri, Aug 29, 2003 at 11:57:28AM -0700, Mike Fedyk wrote:
> > [CCing AA & MCP]
> > 
> > > --- Mike Fedyk <mfedyk@matchmail.com> wrote:
> > > > But have you compared your patch with the VM patches in -aa?
> > > > Will your patch apply on -aa and make improvements there too?
> > > > 
> > > > In other words: Why would I want to use this patch when I could use -aa?
> > 
> > On Fri, Aug 29, 2003 at 11:46:44AM -0700, Shantanu Goel wrote:
> > > I prefer to run stock kernels, so I don't have as much
> > > experience with the -aa patches.  However, I took a
> > > look at the relevant code in 2.4.22pre7aa1 and I
> > > believe my patch should help there as well.  The
> > > writepage() and page rotation behaviour is similar to
> > > stock 2.4.22, though the inactive_list is per-classzone
> > > in -aa.  I am less sure about the inode/dcache part
> > > under -aa, though.
> > 
> > You need to integrate with -aa on the VM.  It has been hard enough for
> > Andrea to get his stuff in; I doubt you will fare any better.
> > 
> > If your patch shows improvements when applied on -aa, Andrea will
> > probably integrate it.
> 
> Yes, at this point in time I'm only willing to merge things that are an
> obvious improvement.  More dubious things had better go into 2.6.
> 
> I didn't see the patch in question.  Shantanu, if you're interested in
> merging it into -aa, could you submit it against 2.4.22pre7aa1?
> Otherwise I'll check it and possibly merge it myself later (I've got
> quite a backlog to merge already for the short term, but it can go in
> the queue).
> 
> > Marc/Andrea, what do you think?  Any holes to poke in this here patch?
> 
> didn't check it yet.
> 
> Andrea



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22
  2003-08-29 19:46             ` Shantanu Goel
@ 2003-08-29 19:55               ` Andrea Arcangeli
  2003-08-29 20:20                 ` Shantanu Goel
  0 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2003-08-29 19:55 UTC (permalink / raw)
  To: Shantanu Goel; +Cc: Antonio Vargas, linux-kernel, Marc-Christian Petersen

On Fri, Aug 29, 2003 at 12:46:36PM -0700, Shantanu Goel wrote:
> Andrea,
> 
> I'll test and submit a patch against -aa.  Also, is
> there a common benchmark that you use to test for
> regression?

bonnie, tiobench and dbench would be a very good start for the basics
(note: dbench can be misleading, but at the same fairness levels it's
interesting too; it's just that dbench doesn't measure the fairness
level itself [like tiobench started doing relatively recently]).

(I'm assuming the patch makes a difference not only for mmapped dirty
pages; otherwise the above would be uninteresting.)

thanks,

Andrea

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22
  2003-08-29 19:55               ` Andrea Arcangeli
@ 2003-08-29 20:20                 ` Shantanu Goel
  2003-08-30  5:01                   ` Antonio Vargas
  0 siblings, 1 reply; 21+ messages in thread
From: Shantanu Goel @ 2003-08-29 20:20 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Antonio Vargas, linux-kernel, Marc-Christian Petersen

Thanks for the pointer to the benchmarks.

The patch I posted only helps the mmap case, so it won't
help (or hurt, hopefully ;-) any program that primarily
does read/write instead of mmap.  The extreme case where
I observed this was a perl script that created a gigantic
hash and tried to populate it.  The perl in question uses
mmap for malloc.  The difference in execution time between
stock 2.4.22 and one with the patch was insignificant
because the script is primarily I/O bound; however, the
other apps I was running, Mozilla and several xterms, were
paged out much less frequently in the latter case.  The
machine has 256MB of memory and perl grew to about 1 GB.

I have written another patch that more aggressively tries
to free pages with dirty buffers, which should help with
the buffered I/O case.  It essentially changes
try_to_free_buffers() so it immediately starts and waits
for I/O to complete if the gfp_mask allows it.  It does
not do any clustering, so its performance is questionable
at the moment.
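
Roughly, the try_to_free_buffers() change would look like the sketch
below.  This is a paraphrase of the behaviour described above, not the
actual (unposted) patch; 2.4-style names, error paths omitted.

#include <linux/mm.h>
#include <linux/fs.h>
#include <linux/locks.h>

/*
 * Sketch only: start writeback on a page's dirty buffers and, when the
 * allocation context permits, wait for it, so try_to_free_buffers()
 * can succeed instead of skipping the page.
 */
static int launder_page_buffers(struct page *page, unsigned int gfp_mask)
{
	struct buffer_head *bh, *head = page->buffers;

	if (!(gfp_mask & __GFP_IO))
		return 0;			/* caller may not do I/O */

	bh = head;
	do {
		if (buffer_dirty(bh))
			ll_rw_block(WRITE, 1, &bh);	/* start writeback */
		bh = bh->b_this_page;
	} while (bh != head);

	if (gfp_mask & __GFP_WAIT) {
		bh = head;
		do {
			wait_on_buffer(bh);	/* wait for completion */
			bh = bh->b_this_page;
		} while (bh != head);
	}

	return 1;	/* buffers written back; caller can retry the free */
}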

--- Andrea Arcangeli <andrea@suse.de> wrote:
> On Fri, Aug 29, 2003 at 12:46:36PM -0700, Shantanu Goel wrote:
> > Andrea,
> > 
> > I'll test and submit a patch against -aa.  Also, is
> > there a common benchmark that you use to test for
> > regression?
> 
> bonnie, tiobench and dbench would be a very good start for the basics
> (note: dbench can be misleading, but at the same fairness levels it's
> interesting too; it's just that dbench doesn't measure the fairness
> level itself [like tiobench started doing relatively recently]).
> 
> (I'm assuming the patch makes a difference not only for mmapped dirty
> pages; otherwise the above would be uninteresting.)
> 
> thanks,
> 
> Andrea



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22
  2003-08-29 20:20                 ` Shantanu Goel
@ 2003-08-30  5:01                   ` Antonio Vargas
  2003-09-20 13:39                     ` A couple of 2.4.23-pre4 VM nits Shantanu Goel
  0 siblings, 1 reply; 21+ messages in thread
From: Antonio Vargas @ 2003-08-30  5:01 UTC (permalink / raw)
  To: Shantanu Goel
  Cc: Andrea Arcangeli, Antonio Vargas, linux-kernel, Marc-Christian Petersen

On Fri, Aug 29, 2003 at 01:20:01PM -0700, Shantanu Goel wrote:
> Thanks for the pointer to the benchmarks.
> 
> The patch I posted only helps the mmap case so it
> won't help (or hurt hopefully ;-) any program that
> primarily does read/write instead of mmap.  The
> extreme case where I observed this was a perl script
> that created a gigantic hash and tried to populate it.

I've experienced this workload and it's easily reproducible:
get the lxr tools and try to build the indexes.

>  The perl in question uses mmap for malloc.  The
> difference in execution time between stock 2.4.22 and
> one with the patch was insignificant because the script
> is primarily I/O bound; however, the other apps I was
> running, Mozilla and several xterms, were paged out
> much less frequently in the latter case.  The machine
> has 256MB of memory and perl grew to about 1 GB.
> 
> I have written another patch that more aggressively
> tries to free pages with dirty buffers, which should
> help with the buffered I/O case.  It essentially changes
> try_to_free_buffers() so it immediately starts and waits
> for I/O to complete if the gfp_mask allows it.  It does
> not do any clustering, so its performance is questionable
> at the moment.
> 
> --- Andrea Arcangeli <andrea@suse.de> wrote:
> > On Fri, Aug 29, 2003 at 12:46:36PM -0700, Shantanu Goel wrote:
> > > Andrea,
> > > 
> > > I'll test and submit a patch against -aa.  Also, is
> > > there a common benchmark that you use to test for
> > > regression?
> > 
> > bonnie, tiobench and dbench would be a very good start for the basics
> > (note: dbench can be misleading, but at the same fairness levels it's
> > interesting too; it's just that dbench doesn't measure the fairness
> > level itself [like tiobench started doing relatively recently]).
> > 
> > (I'm assuming the patch makes a difference not only for mmapped dirty
> > pages; otherwise the above would be uninteresting.)
> > 
> > thanks,
> > 
> > Andrea

-- 
winden/network

1. Given a program, it always has at least one bug.
2. Given several lines of code, they can always be shortened to fewer lines.
3. By induction, every program can be reduced
   to a single line that does not work.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* A couple of 2.4.23-pre4 VM nits
  2003-08-30  5:01                   ` Antonio Vargas
@ 2003-09-20 13:39                     ` Shantanu Goel
  2003-09-21  2:46                       ` Andrea Arcangeli
  0 siblings, 1 reply; 21+ messages in thread
From: Shantanu Goel @ 2003-09-20 13:39 UTC (permalink / raw)
  To: andrea; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 814 bytes --]

Hi Andrea,

The VM fixes perform rather well in my testing
(thanks!), but I noticed a couple of glitches that the
attached patch addresses.

1. max_scan is never decremented in shrink_cache().  I
am assuming this is a typo.

2. The second part of the patch makes sure that
inode/dentry caches are shrunk at least once every 5
secs.  On a machine with a heavy inode stat/directory
lookup load (e.g. an NFS server), most of the memory
winds up sitting idle in unused inodes/dentries.  The
present code only reclaims these when a swap_out()
happens or shrink_caches() fails.  This can take a
while on a machine with very few mapped pages, such as
an NFS server.

Thanks,
Shantanu


[-- Attachment #2: vmscan.patch --]
[-- Type: application/octet-stream, Size: 2287 bytes --]

--- vmscan.c.~1~	2003-09-16 20:44:14.000000000 -0400
+++ vmscan.c	2003-09-17 18:23:03.000000000 -0400
@@ -364,6 +364,20 @@
 	return 0;
 }
 
+static void shrink_vfs_caches(int force, unsigned int gfp_mask)
+{
+	static unsigned long last_scan = 0;
+
+	if (force || time_after(jiffies, last_scan + 5*HZ)) {
+		shrink_dcache_memory(vm_vfs_scan_ratio, gfp_mask);
+		shrink_icache_memory(vm_vfs_scan_ratio, gfp_mask);
+#ifdef CONFIG_QUOTA
+		shrink_dqcache_memory(vm_vfs_scan_ratio, gfp_mask);
+#endif
+		last_scan = jiffies;
+	}
+}
+
 static void FASTCALL(refill_inactive(int nr_pages, zone_t * classzone));
 static int FASTCALL(shrink_cache(int nr_pages, zone_t * classzone, unsigned int gfp_mask, int * failed_swapout));
 static int shrink_cache(int nr_pages, zone_t * classzone, unsigned int gfp_mask, int * failed_swapout)
@@ -372,7 +386,7 @@
 	int max_scan = (classzone->nr_inactive_pages + classzone->nr_active_pages) / vm_cache_scan_ratio;
 	int max_mapped = vm_mapped_ratio * nr_pages;
 
-	while (max_scan && classzone->nr_inactive_pages && (entry = inactive_list.prev) != &inactive_list) {
+	while (--max_scan >= 0 && classzone->nr_inactive_pages && (entry = inactive_list.prev) != &inactive_list) {
 		struct page * page;
 
 		if (unlikely(current->need_resched)) {
@@ -516,11 +530,7 @@
 				if (nr_pages <= 0)
 					goto out;
 
-				shrink_dcache_memory(vm_vfs_scan_ratio, gfp_mask);
-				shrink_icache_memory(vm_vfs_scan_ratio, gfp_mask);
-#ifdef CONFIG_QUOTA
-				shrink_dqcache_memory(vm_vfs_scan_ratio, gfp_mask);
-#endif
+				shrink_vfs_caches(*failed_swapout, gfp_mask);
 
 				if (!*failed_swapout)
 					*failed_swapout = !swap_out(classzone);
@@ -614,6 +624,8 @@
 	if (nr_pages <= 0)
 		goto out;
 
+	shrink_vfs_caches(0, gfp_mask);
+
 	spin_lock(&pagemap_lru_lock);
 	refill_inactive(nr_pages, classzone);
 
@@ -638,11 +650,9 @@
 			nr_pages = shrink_caches(classzone, gfp_mask, nr_pages, &failed_swapout);
 			if (nr_pages <= 0)
 				return 1;
-			shrink_dcache_memory(vm_vfs_scan_ratio, gfp_mask);
-			shrink_icache_memory(vm_vfs_scan_ratio, gfp_mask);
-#ifdef CONFIG_QUOTA
-			shrink_dqcache_memory(vm_vfs_scan_ratio, gfp_mask);
-#endif
+
+			shrink_vfs_caches(1, gfp_mask);
+
 			if (!failed_swapout)
 				failed_swapout = !swap_out(classzone);
 		} while (--tries);
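
A side note on the patch above: the periodic trigger in
shrink_vfs_caches() uses time_after(), which stays correct across
jiffies wraparound where a plain ">" comparison would not.  A minimal
user-space illustration (the macro body mirrors the kernel's
definition):

#include <stdio.h>

#define time_after(a, b)	((long)(b) - (long)(a) < 0)

int main(void)
{
	unsigned long last = ~0UL - 10;	/* counter just before wraparound */
	unsigned long now = 5;		/* 16 ticks later, after the wrap */

	printf("naive:      %d\n", now > last);		   /* 0: wrong */
	printf("time_after: %d\n", time_after(now, last)); /* 1: right */
	return 0;
}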

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: A couple of 2.4.23-pre4 VM nits
  2003-09-20 13:39                     ` A couple of 2.4.23-pre4 VM nits Shantanu Goel
@ 2003-09-21  2:46                       ` Andrea Arcangeli
  2003-09-21  5:32                         ` Shantanu Goel
  2003-09-21 18:37                         ` Marcelo Tosatti
  0 siblings, 2 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2003-09-21  2:46 UTC (permalink / raw)
  To: Shantanu Goel; +Cc: linux-kernel, Marcelo Tosatti

Hi Shantanu,

On Sat, Sep 20, 2003 at 06:39:30AM -0700, Shantanu Goel wrote:
> Hi Andrea,
> 
> The VM fixes perform rather well in my testing
> (thanks!), but I noticed a couple of glitches that the
> attached patch addresses.
> 
> 1. max_scan is never decremented in shrink_cache().  I
> am assuming this is a typo.

this is a huge half-merging error; no surprise Andi ran into VM
troubles with pre4.  My tree has looked like this for years:

		if (unlikely(!page_count(page)))
			continue;

		only_metadata = 0;
		if (!memclass(page_zone(page), classzone)) {
			/*
			 * Hack to address an issue found by Rik. The
			 * problem is that highmem pages can hold buffer
			 * headers allocated from the slab on lowmem, and
			 * so if we are working on the NORMAL classzone
			 * here, it is correct not to try to free the
			 * highmem pages themself (that would be useless)
			 * but we must make sure to drop any lowmem
			 * metadata related to those highmem pages.
			 */
			if (page->buffers && page->mapping) { /* fast path racy check */
				if (unlikely(TryLockPage(page)))
					continue;
				if (page->buffers && page->mapping && memclass_related_bhs(page, classzone)) { /* non racy check */
					only_metadata = 1;
					goto free_bhs;
				}
				UnlockPage(page);
			}
			continue;
		}

		max_scan--;

max_scan-- should happen only after the memclass check, to ensure we
don't fail too early on GFP_KERNEL/DMA allocations (i.e. no highmem).

This is the right fix:

--- 2.4.23pre4/mm/vmscan.c.~1~	2003-09-13 00:08:04.000000000 +0200
+++ 2.4.23pre4/mm/vmscan.c	2003-09-21 04:40:12.000000000 +0200
@@ -401,6 +401,8 @@ static int shrink_cache(int nr_pages, zo
 		if (!memclass(page_zone(page), classzone))
 			continue;
 
+		max_scan--;
+
 		/* Racy check to avoid trylocking when not worthwhile */
 		if (!page->buffers && (page_count(page) != 1 || !page->mapping))
 			goto page_mapped;

so your fix is slightly wrong in non-highmem terms (and in scheduling
terms too: you would decrease it even when there's a schedule).  We
need to finish the merge ASAP with Marcelo.  I didn't send specific
patches to Marcelo myself yet; I only pointed out the URL and the list
of them, plus I described the details he wanted to know.  I understand
he merged what he found most interesting, so I didn't notice this
half-merge problem yet, but I will start looking into this with the
highest prio on Monday.  (I was going to look into pre4 very soon
anyway, for -aa, which will now reject bigtime, and at the watermarks
too, but I had some trouble with the RAM and the scheduler on my fast
amd64 this week that kept me busy.  The scheduler fix will be in the
next -aa and improves ppc by 11% as well; on some of my workloads it's
a 99% improvement [this isn't relevant for mainline though, and
apparently it was already fixed in 2.6] ;).

> 2. The second part of the patch makes sure that
> inode/dentry caches are shrunk at least once every 5
> secs.  On a machine with a heavy inode stat/directory
> lookup load (e.g. an NFS server), most of the memory
> winds up sitting idle in unused inodes/dentries.  The
> present code only reclaims these when a swap_out()
> happens or shrink_caches() fails.  This can take a
> while on a machine with very few mapped pages, such as
> an NFS server.

For lots of workloads this will nuke the dcache way too fast, and
rebuilding it with the fs and buffercache is costly.  However, if you
make the delay a sysctl that defaults to MAX_SCHEDULE_TIMEOUT, I would
definitely be fine with merging it.  That way it won't make any
difference by default, and it can help in the corner cases where hacks
like this may provide benefits, as you noted.

thanks!

Andrea - If you prefer relying on open source software, check these links:
	    rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
	    http://www.cobite.com/cvsps/
	    svn://svn.kernel.org/linux-2.[46]/trunk

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: A couple of 2.4.23-pre4 VM nits
  2003-09-21  2:46                       ` Andrea Arcangeli
@ 2003-09-21  5:32                         ` Shantanu Goel
  2003-09-21 14:04                           ` Andrea Arcangeli
  2003-09-21 18:37                         ` Marcelo Tosatti
  1 sibling, 1 reply; 21+ messages in thread
From: Shantanu Goel @ 2003-09-21  5:32 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel, Marcelo Tosatti

[-- Attachment #1: Type: text/plain, Size: 4602 bytes --]

Agreed on all counts.  I just blindly copied the
max_scan decrement from 2.4.22.  Even there your
suggestion would make sense.  Attached is a new patch
which adds support for a vm_vfs_scan_interval sysctl
and also fixes the location of the max_scan decrement.

Thanks,
Shantanu

--- Andrea Arcangeli <andrea@suse.de> wrote:
> Hi Shantanu,
> 
> On Sat, Sep 20, 2003 at 06:39:30AM -0700, Shantanu Goel wrote:
> > Hi Andrea,
> > 
> > The VM fixes perform rather well in my testing
> > (thanks!), but I noticed a couple of glitches that the
> > attached patch addresses.
> > 
> > 1. max_scan is never decremented in shrink_cache().  I
> > am assuming this is a typo.
> 
> this is a huge half-merging error; no surprise Andi ran into VM
> troubles with pre4.  My tree has looked like this for years:
> 
> 		if (unlikely(!page_count(page)))
> 			continue;
> 
> 		only_metadata = 0;
> 		if (!memclass(page_zone(page), classzone)) {
> 			/*
> 			 * Hack to address an issue found by Rik. The
> 			 * problem is that highmem pages can hold buffer
> 			 * headers allocated from the slab on lowmem, and
> 			 * so if we are working on the NORMAL classzone
> 			 * here, it is correct not to try to free the
> 			 * highmem pages themself (that would be useless)
> 			 * but we must make sure to drop any lowmem
> 			 * metadata related to those highmem pages.
> 			 */
> 			if (page->buffers && page->mapping) { /* fast path racy check */
> 				if (unlikely(TryLockPage(page)))
> 					continue;
> 				if (page->buffers && page->mapping && memclass_related_bhs(page, classzone)) { /* non racy check */
> 					only_metadata = 1;
> 					goto free_bhs;
> 				}
> 				UnlockPage(page);
> 			}
> 			continue;
> 		}
> 
> 		max_scan--;
> 
> max_scan-- should happen only after the memclass check, to ensure we
> don't fail too early on GFP_KERNEL/DMA allocations (i.e. no highmem).
> 
> This is the right fix:
> 
> --- 2.4.23pre4/mm/vmscan.c.~1~	2003-09-13 00:08:04.000000000 +0200
> +++ 2.4.23pre4/mm/vmscan.c	2003-09-21 04:40:12.000000000 +0200
> @@ -401,6 +401,8 @@ static int shrink_cache(int nr_pages, zo
>  		if (!memclass(page_zone(page), classzone))
>  			continue;
>  
> +		max_scan--;
> +
>  		/* Racy check to avoid trylocking when not worthwhile */
>  		if (!page->buffers && (page_count(page) != 1 || !page->mapping))
>  			goto page_mapped;
> 
> so your fix is slightly wrong in non-highmem terms (and in scheduling
> terms too: you would decrease it even when there's a schedule).  We
> need to finish the merge ASAP with Marcelo.  I didn't send specific
> patches to Marcelo myself yet; I only pointed out the URL and the list
> of them, plus I described the details he wanted to know.  I understand
> he merged what he found most interesting, so I didn't notice this
> half-merge problem yet, but I will start looking into this with the
> highest prio on Monday.  (I was going to look into pre4 very soon
> anyway, for -aa, which will now reject bigtime, and at the watermarks
> too, but I had some trouble with the RAM and the scheduler on my fast
> amd64 this week that kept me busy.  The scheduler fix will be in the
> next -aa and improves ppc by 11% as well; on some of my workloads it's
> a 99% improvement [this isn't relevant for mainline though, and
> apparently it was already fixed in 2.6] ;).
> 
> > 2. The second part of the patch makes sure that
> > inode/dentry caches are shrunk at least once every 5
> > secs.  On a machine with a heavy inode stat/directory
> > lookup load (e.g. an NFS server), most of the memory
> > winds up sitting idle in unused inodes/dentries.  The
> > present code only reclaims these when a swap_out()
> > happens or shrink_caches() fails.  This can take a
> > while on a machine with very few mapped pages, such as
> > an NFS server.
> 
> For lots of workloads this will nuke the dcache way too fast, and
> rebuilding it with the fs and buffercache is costly.  However, if you
> make the delay a sysctl that defaults to MAX_SCHEDULE_TIMEOUT, I would
> definitely be fine with merging it.  That way it won't make any
> difference by default, and it can help in the corner cases where hacks
> like this may provide benefits, as you noted.
> 
> thanks!
> 
> Andrea - If you prefer relying on open source software, check these links:
> 	    rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
> 	    http://www.cobite.com/cvsps/
> 	    svn://svn.kernel.org/linux-2.[46]/trunk


[-- Attachment #2: vfs-interval.patch --]
[-- Type: application/octet-stream, Size: 4046 bytes --]

--- ./kernel/sysctl.c.~1~	2003-09-16 20:44:14.000000000 -0400
+++ ./kernel/sysctl.c	2003-09-21 00:49:32.000000000 -0400
@@ -281,6 +281,8 @@
 	 &vm_gfp_debug, sizeof(int), 0644, NULL, &proc_dointvec},
 	{VM_VFS_SCAN_RATIO, "vm_vfs_scan_ratio", 
 	 &vm_vfs_scan_ratio, sizeof(int), 0644, NULL, &proc_dointvec},
+	{VM_VFS_SCAN_INTERVAL, "vm_vfs_scan_interval", 
+	 &vm_vfs_scan_interval, sizeof(int), 0644, NULL, &proc_dointvec},
 	{VM_CACHE_SCAN_RATIO, "vm_cache_scan_ratio", 
 	 &vm_cache_scan_ratio, sizeof(int), 0644, NULL, &proc_dointvec},
 	{VM_MAPPED_RATIO, "vm_mapped_ratio", 
--- ./mm/vmscan.c.~1~	2003-09-16 20:44:14.000000000 -0400
+++ ./mm/vmscan.c	2003-09-21 00:59:11.000000000 -0400
@@ -65,6 +65,12 @@
 int vm_vfs_scan_ratio = 6;
 
 /*
+ * "vm_vfs_scan_interval" is how often (in seconds)
+ * memory gets reclaimed from inode/dentry cache.
+ */
+int vm_vfs_scan_interval = MAX_SCHEDULE_TIMEOUT / HZ;
+
+/*
  * The swap-out function returns 1 if it successfully
  * scanned all the pages it was asked to (`count').
  * It returns zero if it couldn't do anything,
@@ -364,6 +370,20 @@
 	return 0;
 }
 
+static void shrink_vfs_caches(int force, unsigned int gfp_mask)
+{
+	static unsigned long last_scan = 0;
+
+	if (force || time_after(jiffies, last_scan + vm_vfs_scan_interval * HZ)) {
+		shrink_dcache_memory(vm_vfs_scan_ratio, gfp_mask);
+		shrink_icache_memory(vm_vfs_scan_ratio, gfp_mask);
+#ifdef CONFIG_QUOTA
+		shrink_dqcache_memory(vm_vfs_scan_ratio, gfp_mask);
+#endif
+		last_scan = jiffies;
+	}
+}
+
 static void FASTCALL(refill_inactive(int nr_pages, zone_t * classzone));
 static int FASTCALL(shrink_cache(int nr_pages, zone_t * classzone, unsigned int gfp_mask, int * failed_swapout));
 static int shrink_cache(int nr_pages, zone_t * classzone, unsigned int gfp_mask, int * failed_swapout)
@@ -401,6 +421,8 @@
 		if (!memclass(page_zone(page), classzone))
 			continue;
 
+		max_scan--;
+
 		/* Racy check to avoid trylocking when not worthwhile */
 		if (!page->buffers && (page_count(page) != 1 || !page->mapping))
 			goto page_mapped;
@@ -516,11 +538,7 @@
 				if (nr_pages <= 0)
 					goto out;
 
-				shrink_dcache_memory(vm_vfs_scan_ratio, gfp_mask);
-				shrink_icache_memory(vm_vfs_scan_ratio, gfp_mask);
-#ifdef CONFIG_QUOTA
-				shrink_dqcache_memory(vm_vfs_scan_ratio, gfp_mask);
-#endif
+				shrink_vfs_caches(*failed_swapout, gfp_mask);
 
 				if (!*failed_swapout)
 					*failed_swapout = !swap_out(classzone);
@@ -614,6 +632,8 @@
 	if (nr_pages <= 0)
 		goto out;
 
+	shrink_vfs_caches(0, gfp_mask);
+
 	spin_lock(&pagemap_lru_lock);
 	refill_inactive(nr_pages, classzone);
 
@@ -638,11 +658,9 @@
 			nr_pages = shrink_caches(classzone, gfp_mask, nr_pages, &failed_swapout);
 			if (nr_pages <= 0)
 				return 1;
-			shrink_dcache_memory(vm_vfs_scan_ratio, gfp_mask);
-			shrink_icache_memory(vm_vfs_scan_ratio, gfp_mask);
-#ifdef CONFIG_QUOTA
-			shrink_dqcache_memory(vm_vfs_scan_ratio, gfp_mask);
-#endif
+
+			shrink_vfs_caches(1, gfp_mask);
+
 			if (!failed_swapout)
 				failed_swapout = !swap_out(classzone);
 		} while (--tries);
--- ./include/linux/swap.h.~1~	2003-09-16 20:46:35.000000000 -0400
+++ ./include/linux/swap.h	2003-09-21 00:56:23.000000000 -0400
@@ -116,6 +116,7 @@
 extern int FASTCALL(try_to_free_pages_zone(zone_t *, unsigned int));
 extern int FASTCALL(try_to_free_pages(unsigned int));
 extern int vm_vfs_scan_ratio, vm_cache_scan_ratio, vm_lru_balance_ratio, vm_passes, vm_gfp_debug, vm_mapped_ratio;
+extern int vm_vfs_scan_interval;
 
 /* linux/mm/page_io.c */
 extern void rw_swap_page(int, struct page *);
--- ./include/linux/sysctl.h.~1~	2003-09-16 20:46:35.000000000 -0400
+++ ./include/linux/sysctl.h	2003-09-21 00:59:37.000000000 -0400
@@ -154,6 +154,7 @@
 	VM_GFP_DEBUG=18,        /* debug GFP failures */
 	VM_CACHE_SCAN_RATIO=19, /* part of the inactive cache list to scan */
 	VM_MAPPED_RATIO=20,     /* amount of unfreeable pages that triggers swapout */
+	VM_VFS_SCAN_INTERVAL=21,/* interval (in secs) between reclaiming memory from inode/dentry cache */
 };
 
 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: A couple of 2.4.23-pre4 VM nits
  2003-09-21  5:32                         ` Shantanu Goel
@ 2003-09-21 14:04                           ` Andrea Arcangeli
  2003-09-21 14:51                             ` Shantanu Goel
  0 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2003-09-21 14:04 UTC (permalink / raw)
  To: Shantanu Goel; +Cc: linux-kernel, Marcelo Tosatti

On Sat, Sep 20, 2003 at 10:32:09PM -0700, Shantanu Goel wrote:
> Agreed on all counts.  I just blindly copied the
> max_scan decrement from 2.4.22.  Even there your
> suggestion would make sense.  Attached is a new patch
> which adds support for a vm_vfs_scan_interval sysctl
> and also fixes the location of the max_scan decrement.

this patch looks fine to me, thanks.  Marcelo, feel free to merge this
one instead of my one-liner fix.

Andrea - If you prefer relying on open source software, check these links:
	    rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
	    http://www.cobite.com/cvsps/
	    svn://svn.kernel.org/linux-2.[46]/trunk

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: A couple of 2.4.23-pre4 VM nits
  2003-09-21 14:04                           ` Andrea Arcangeli
@ 2003-09-21 14:51                             ` Shantanu Goel
  2003-09-21 15:14                               ` Andrea Arcangeli
  0 siblings, 1 reply; 21+ messages in thread
From: Shantanu Goel @ 2003-09-21 14:51 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel, Marcelo Tosatti

[-- Attachment #1: Type: text/plain, Size: 1260 bytes --]

I just realized the original code had one desirable
behaviour that my patch is missing: namely, it
reclaimed memory from dcache/inode every time
swap_out() is called.  Please use the attached patch,
which restores the original behaviour.  Otherwise, if
the interval is very long, no reclamation will happen
until swap_out() fails, which in the common case is
unlikely.

Thanks,
Shantanu

--- Andrea Arcangeli <andrea@suse.de> wrote:
> On Sat, Sep 20, 2003 at 10:32:09PM -0700, Shantanu Goel wrote:
> > Agreed on all counts.  I just blindly copied the
> > max_scan decrement from 2.4.22.  Even there your
> > suggestion would make sense.  Attached is a new patch
> > which adds support for a vm_vfs_scan_interval sysctl
> > and also fixes the location of the max_scan decrement.
> 
> this patch looks fine to me, thanks.  Marcelo, feel free to merge this
> one instead of my one-liner fix.
> 
> Andrea - If you prefer relying on open source software, check these links:
> 	    rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
> 	    http://www.cobite.com/cvsps/
> 	    svn://svn.kernel.org/linux-2.[46]/trunk


[-- Attachment #2: vfs-interval2.patch --]
[-- Type: application/octet-stream, Size: 4032 bytes --]

--- ./kernel/sysctl.c.~1~	2003-09-16 20:44:14.000000000 -0400
+++ ./kernel/sysctl.c	2003-09-21 00:49:32.000000000 -0400
@@ -281,6 +281,8 @@
 	 &vm_gfp_debug, sizeof(int), 0644, NULL, &proc_dointvec},
 	{VM_VFS_SCAN_RATIO, "vm_vfs_scan_ratio", 
 	 &vm_vfs_scan_ratio, sizeof(int), 0644, NULL, &proc_dointvec},
+	{VM_VFS_SCAN_INTERVAL, "vm_vfs_scan_interval", 
+	 &vm_vfs_scan_interval, sizeof(int), 0644, NULL, &proc_dointvec},
 	{VM_CACHE_SCAN_RATIO, "vm_cache_scan_ratio", 
 	 &vm_cache_scan_ratio, sizeof(int), 0644, NULL, &proc_dointvec},
 	{VM_MAPPED_RATIO, "vm_mapped_ratio", 
--- ./mm/vmscan.c.~1~	2003-09-16 20:44:14.000000000 -0400
+++ ./mm/vmscan.c	2003-09-21 02:47:49.000000000 -0400
@@ -65,6 +65,12 @@
 int vm_vfs_scan_ratio = 6;
 
 /*
+ * "vm_vfs_scan_interval" is how often (in seconds)
+ * memory gets reclaimed from inode/dentry cache.
+ */
+int vm_vfs_scan_interval = MAX_SCHEDULE_TIMEOUT / HZ;
+
+/*
  * The swap-out function returns 1 if it successfully
  * scanned all the pages it was asked to (`count').
  * It returns zero if it couldn't do anything,
@@ -364,6 +370,20 @@
 	return 0;
 }
 
+static void shrink_vfs_caches(int force, unsigned int gfp_mask)
+{
+	static unsigned long last_scan = 0;
+
+	if (force || time_after(jiffies, last_scan + vm_vfs_scan_interval * HZ)) {
+		shrink_dcache_memory(vm_vfs_scan_ratio, gfp_mask);
+		shrink_icache_memory(vm_vfs_scan_ratio, gfp_mask);
+#ifdef CONFIG_QUOTA
+		shrink_dqcache_memory(vm_vfs_scan_ratio, gfp_mask);
+#endif
+		last_scan = jiffies;
+	}
+}
+
 static void FASTCALL(refill_inactive(int nr_pages, zone_t * classzone));
 static int FASTCALL(shrink_cache(int nr_pages, zone_t * classzone, unsigned int gfp_mask, int * failed_swapout));
 static int shrink_cache(int nr_pages, zone_t * classzone, unsigned int gfp_mask, int * failed_swapout)
@@ -401,6 +421,8 @@
 		if (!memclass(page_zone(page), classzone))
 			continue;
 
+		max_scan--;
+
 		/* Racy check to avoid trylocking when not worthwhile */
 		if (!page->buffers && (page_count(page) != 1 || !page->mapping))
 			goto page_mapped;
@@ -516,11 +538,7 @@
 				if (nr_pages <= 0)
 					goto out;
 
-				shrink_dcache_memory(vm_vfs_scan_ratio, gfp_mask);
-				shrink_icache_memory(vm_vfs_scan_ratio, gfp_mask);
-#ifdef CONFIG_QUOTA
-				shrink_dqcache_memory(vm_vfs_scan_ratio, gfp_mask);
-#endif
+				shrink_vfs_caches(1, gfp_mask);
 
 				if (!*failed_swapout)
 					*failed_swapout = !swap_out(classzone);
@@ -614,6 +632,8 @@
 	if (nr_pages <= 0)
 		goto out;
 
+	shrink_vfs_caches(0, gfp_mask);
+
 	spin_lock(&pagemap_lru_lock);
 	refill_inactive(nr_pages, classzone);
 
@@ -638,11 +658,9 @@
 			nr_pages = shrink_caches(classzone, gfp_mask, nr_pages, &failed_swapout);
 			if (nr_pages <= 0)
 				return 1;
-			shrink_dcache_memory(vm_vfs_scan_ratio, gfp_mask);
-			shrink_icache_memory(vm_vfs_scan_ratio, gfp_mask);
-#ifdef CONFIG_QUOTA
-			shrink_dqcache_memory(vm_vfs_scan_ratio, gfp_mask);
-#endif
+
+			shrink_vfs_caches(1, gfp_mask);
+
 			if (!failed_swapout)
 				failed_swapout = !swap_out(classzone);
 		} while (--tries);
--- ./include/linux/swap.h.~1~	2003-09-16 20:46:35.000000000 -0400
+++ ./include/linux/swap.h	2003-09-21 00:56:23.000000000 -0400
@@ -116,6 +116,7 @@
 extern int FASTCALL(try_to_free_pages_zone(zone_t *, unsigned int));
 extern int FASTCALL(try_to_free_pages(unsigned int));
 extern int vm_vfs_scan_ratio, vm_cache_scan_ratio, vm_lru_balance_ratio, vm_passes, vm_gfp_debug, vm_mapped_ratio;
+extern int vm_vfs_scan_interval;
 
 /* linux/mm/page_io.c */
 extern void rw_swap_page(int, struct page *);
--- ./include/linux/sysctl.h.~1~	2003-09-16 20:46:35.000000000 -0400
+++ ./include/linux/sysctl.h	2003-09-21 00:59:37.000000000 -0400
@@ -154,6 +154,7 @@
 	VM_GFP_DEBUG=18,        /* debug GFP failures */
 	VM_CACHE_SCAN_RATIO=19, /* part of the inactive cache list to scan */
 	VM_MAPPED_RATIO=20,     /* amount of unfreeable pages that triggers swapout */
+	VM_VFS_SCAN_INTERVAL=21,/* interval (in secs) between reclaiming memory from inode/dentry cache */
 };
 
 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: A couple of 2.4.23-pre4 VM nits
  2003-09-21 14:51                             ` Shantanu Goel
@ 2003-09-21 15:14                               ` Andrea Arcangeli
  2003-09-21 15:28                                 ` Shantanu Goel
  0 siblings, 1 reply; 21+ messages in thread
From: Andrea Arcangeli @ 2003-09-21 15:14 UTC (permalink / raw)
  To: Shantanu Goel; +Cc: linux-kernel, Marcelo Tosatti

On Sun, Sep 21, 2003 at 07:51:27AM -0700, Shantanu Goel wrote:
> I just realized the original code had one desirable
> behaviour that my patch is missing, namely, it
> reclaimed memory from dcache/inode every time swap_out
> is called.  Please use the attached patch that
> restores the original behaviour.  Otherwise, if the
> interval is very long, no reclamation will happen
> until swap_out() fails which in the common case is
> unlikely.

I overlooked the *failed_swapout; I thought you used only 0 and 1 as
parameters.  The new version is fine.

BTW, it would also be cleaner to add a __ in front of the function name
and to #define a _force version that passes 1, so you don't have the
less readable 0/1 in the caller.  But I don't mind even with the status
in version 2 (it's simple enough to understand the semantics of the 0/1).
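
Concretely, the suggested shape would be something like this (the names
here are just an example; the follow-up patch below happens to use
exactly these):

	#define shrink_vfs_caches(gfp_mask)		__shrink_vfs_caches(0, gfp_mask)
	#define shrink_vfs_caches_force(gfp_mask)	__shrink_vfs_caches(1, gfp_mask)

so call sites read as "shrink" vs. "shrink ... force" instead of a bare
0/1 argument.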

Andrea - If you prefer relying on open source software, check these links:
	    rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
	    http://www.cobite.com/cvsps/
	    svn://svn.kernel.org/linux-2.[46]/trunk

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: A couple of 2.4.23-pre4 VM nits
  2003-09-21 15:14                               ` Andrea Arcangeli
@ 2003-09-21 15:28                                 ` Shantanu Goel
  2003-09-21 15:57                                   ` Andrea Arcangeli
  0 siblings, 1 reply; 21+ messages in thread
From: Shantanu Goel @ 2003-09-21 15:28 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel, Marcelo Tosatti

[-- Attachment #1: Type: text/plain, Size: 1357 bytes --]

Great suggestion.  Attached is a fixed patch.

--- Andrea Arcangeli <andrea@suse.de> wrote:
> On Sun, Sep 21, 2003 at 07:51:27AM -0700, Shantanu Goel wrote:
> > I just realized the original code had one desirable
> > behaviour that my patch is missing: namely, it
> > reclaimed memory from dcache/inode every time
> > swap_out() is called.  Please use the attached patch,
> > which restores the original behaviour.  Otherwise, if
> > the interval is very long, no reclamation will happen
> > until swap_out() fails, which in the common case is
> > unlikely.
> 
> I overlooked the *failed_swapout; I thought you used only 0 and 1 as
> parameters.  The new version is fine.
> 
> BTW, it would also be cleaner to add a __ in front of the function name
> and to #define a _force version that passes 1, so you don't have the
> less readable 0/1 in the caller.  But I don't mind even with the status
> in version 2 (it's simple enough to understand the semantics of the 0/1).
> 
> Andrea - If you prefer relying on open source software, check these links:
> 	    rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
> 	    http://www.cobite.com/cvsps/
> 	    svn://svn.kernel.org/linux-2.[46]/trunk


[-- Attachment #2: vfs-interval3.patch --]
[-- Type: application/octet-stream, Size: 4186 bytes --]

--- ./kernel/sysctl.c.~1~	2003-09-16 20:44:14.000000000 -0400
+++ ./kernel/sysctl.c	2003-09-21 00:49:32.000000000 -0400
@@ -281,6 +281,8 @@
 	 &vm_gfp_debug, sizeof(int), 0644, NULL, &proc_dointvec},
 	{VM_VFS_SCAN_RATIO, "vm_vfs_scan_ratio", 
 	 &vm_vfs_scan_ratio, sizeof(int), 0644, NULL, &proc_dointvec},
+	{VM_VFS_SCAN_INTERVAL, "vm_vfs_scan_interval", 
+	 &vm_vfs_scan_interval, sizeof(int), 0644, NULL, &proc_dointvec},
 	{VM_CACHE_SCAN_RATIO, "vm_cache_scan_ratio", 
 	 &vm_cache_scan_ratio, sizeof(int), 0644, NULL, &proc_dointvec},
 	{VM_MAPPED_RATIO, "vm_mapped_ratio", 
--- ./mm/vmscan.c.~1~	2003-09-16 20:44:14.000000000 -0400
+++ ./mm/vmscan.c	2003-09-21 11:24:49.000000000 -0400
@@ -65,6 +65,12 @@
 int vm_vfs_scan_ratio = 6;
 
 /*
+ * "vm_vfs_scan_interval" is how often (in seconds)
+ * memory gets reclaimed from inode/dentry cache.
+ */
+int vm_vfs_scan_interval = MAX_SCHEDULE_TIMEOUT / HZ;
+
+/*
  * The swap-out function returns 1 if it successfully
  * scanned all the pages it was asked to (`count').
  * It returns zero if it couldn't do anything,
@@ -364,6 +370,23 @@
 	return 0;
 }
 
+#define shrink_vfs_caches(gfp_mask)		__shrink_vfs_caches(0, gfp_mask)
+#define shrink_vfs_caches_force(gfp_mask)	__shrink_vfs_caches(1, gfp_mask)
+
+static void __shrink_vfs_caches(int force, unsigned int gfp_mask)
+{
+	static unsigned long last_scan = 0;
+
+	if (force || time_after(jiffies, last_scan + vm_vfs_scan_interval * HZ)) {
+		shrink_dcache_memory(vm_vfs_scan_ratio, gfp_mask);
+		shrink_icache_memory(vm_vfs_scan_ratio, gfp_mask);
+#ifdef CONFIG_QUOTA
+		shrink_dqcache_memory(vm_vfs_scan_ratio, gfp_mask);
+#endif
+		last_scan = jiffies;
+	}
+}
+
 static void FASTCALL(refill_inactive(int nr_pages, zone_t * classzone));
 static int FASTCALL(shrink_cache(int nr_pages, zone_t * classzone, unsigned int gfp_mask, int * failed_swapout));
 static int shrink_cache(int nr_pages, zone_t * classzone, unsigned int gfp_mask, int * failed_swapout)
@@ -401,6 +424,8 @@
 		if (!memclass(page_zone(page), classzone))
 			continue;
 
+		max_scan--;
+
 		/* Racy check to avoid trylocking when not worthwhile */
 		if (!page->buffers && (page_count(page) != 1 || !page->mapping))
 			goto page_mapped;
@@ -516,11 +541,7 @@
 				if (nr_pages <= 0)
 					goto out;
 
-				shrink_dcache_memory(vm_vfs_scan_ratio, gfp_mask);
-				shrink_icache_memory(vm_vfs_scan_ratio, gfp_mask);
-#ifdef CONFIG_QUOTA
-				shrink_dqcache_memory(vm_vfs_scan_ratio, gfp_mask);
-#endif
+				shrink_vfs_caches_force(gfp_mask);
 
 				if (!*failed_swapout)
 					*failed_swapout = !swap_out(classzone);
@@ -614,6 +635,8 @@
 	if (nr_pages <= 0)
 		goto out;
 
+	shrink_vfs_caches(gfp_mask);
+
 	spin_lock(&pagemap_lru_lock);
 	refill_inactive(nr_pages, classzone);
 
@@ -638,11 +661,9 @@
 			nr_pages = shrink_caches(classzone, gfp_mask, nr_pages, &failed_swapout);
 			if (nr_pages <= 0)
 				return 1;
-			shrink_dcache_memory(vm_vfs_scan_ratio, gfp_mask);
-			shrink_icache_memory(vm_vfs_scan_ratio, gfp_mask);
-#ifdef CONFIG_QUOTA
-			shrink_dqcache_memory(vm_vfs_scan_ratio, gfp_mask);
-#endif
+
+			shrink_vfs_caches_force(gfp_mask);
+
 			if (!failed_swapout)
 				failed_swapout = !swap_out(classzone);
 		} while (--tries);
--- ./include/linux/swap.h.~1~	2003-09-16 20:46:35.000000000 -0400
+++ ./include/linux/swap.h	2003-09-21 00:56:23.000000000 -0400
@@ -116,6 +116,7 @@
 extern int FASTCALL(try_to_free_pages_zone(zone_t *, unsigned int));
 extern int FASTCALL(try_to_free_pages(unsigned int));
 extern int vm_vfs_scan_ratio, vm_cache_scan_ratio, vm_lru_balance_ratio, vm_passes, vm_gfp_debug, vm_mapped_ratio;
+extern int vm_vfs_scan_interval;
 
 /* linux/mm/page_io.c */
 extern void rw_swap_page(int, struct page *);
--- ./include/linux/sysctl.h.~1~	2003-09-16 20:46:35.000000000 -0400
+++ ./include/linux/sysctl.h	2003-09-21 00:59:37.000000000 -0400
@@ -154,6 +154,7 @@
 	VM_GFP_DEBUG=18,        /* debug GFP failures */
 	VM_CACHE_SCAN_RATIO=19, /* part of the inactive cache list to scan */
 	VM_MAPPED_RATIO=20,     /* amount of unfreeable pages that triggers swapout */
+	VM_VFS_SCAN_INTERVAL=21,/* interval (in secs) between reclaiming memory from inode/dentry cache */
 };
 
 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: A couple of 2.4.23-pre4 VM nits
  2003-09-21 15:28                                 ` Shantanu Goel
@ 2003-09-21 15:57                                   ` Andrea Arcangeli
  0 siblings, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2003-09-21 15:57 UTC (permalink / raw)
  To: Shantanu Goel; +Cc: linux-kernel, Marcelo Tosatti

On Sun, Sep 21, 2003 at 08:28:55AM -0700, Shantanu Goel wrote:
> Great suggestion.  Attached is a fixed patch.

this looks even better now! :) Thanks.

Andrea - If you prefer relying on open source software, check these links:
	    rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
	    http://www.cobite.com/cvsps/
	    svn://svn.kernel.org/linux-2.[46]/trunk

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: A couple of 2.4.23-pre4 VM nits
  2003-09-21  2:46                       ` Andrea Arcangeli
  2003-09-21  5:32                         ` Shantanu Goel
@ 2003-09-21 18:37                         ` Marcelo Tosatti
  1 sibling, 0 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2003-09-21 18:37 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Shantanu Goel, linux-kernel, Marcelo Tosatti



On Sun, 21 Sep 2003, Andrea Arcangeli wrote:

> Hi Shantanu,
> 
> On Sat, Sep 20, 2003 at 06:39:30AM -0700, Shantanu Goel wrote:
> > Hi Andrea,
> > 
> > The VM fixes perform rather well in my testing
> > (thanks!), but I noticed a couple of glitches that the
> > attached patch addresses.
> > 
> > 1. max_scan is never decremented in shrink_cache().  I
> > am assuming this is a typo.
> 
> this is a huge half-merging error; no surprise Andi ran into VM
> troubles with pre4.  My tree has looked like this for years:
> 
> 		if (unlikely(!page_count(page)))
> 			continue;
> 
> 		only_metadata = 0;
> 		if (!memclass(page_zone(page), classzone)) {
> 			/*
> 			 * Hack to address an issue found by Rik. The
> 			 * problem is that highmem pages can hold buffer
> 			 * headers allocated from the slab on lowmem, and
> 			 * so if we are working on the NORMAL classzone
> 			 * here, it is correct not to try to free the
> 			 * highmem pages themself (that would be useless)
> 			 * but we must make sure to drop any lowmem
> 			 * metadata related to those highmem pages.
> 			 */
> 			if (page->buffers && page->mapping) { /* fast path racy check */
> 				if (unlikely(TryLockPage(page)))
> 					continue;
> 				if (page->buffers && page->mapping && memclass_related_bhs(page, classzone)) { /* non racy check */
> 					only_metadata = 1;
> 					goto free_bhs;
> 				}
> 				UnlockPage(page);
> 			}
> 			continue;
> 		}
> 
> 		max_scan--;
> 
> max_scan-- should happen only after the memclass check, to ensure we
> don't fail too early on GFP_KERNEL/DMA allocations (i.e. no highmem).
> 
> This is the right fix:
> 
> --- 2.4.23pre4/mm/vmscan.c.~1~	2003-09-13 00:08:04.000000000 +0200
> +++ 2.4.23pre4/mm/vmscan.c	2003-09-21 04:40:12.000000000 +0200
> @@ -401,6 +401,8 @@ static int shrink_cache(int nr_pages, zo
>  		if (!memclass(page_zone(page), classzone))
>  			continue;
>  
> +		max_scan--;
> +
>  		/* Racy check to avoid trylocking when not worthwhile */
>  		if (!page->buffers && (page_count(page) != 1 || !page->mapping))
>  			goto page_mapped;

Right!  Thanks Andrea.

Sorry for the merge mistake, people.  Shame on me.

I'm going to release pre5 now with this.


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2003-09-21 18:35 UTC | newest]

Thread overview: 21+ messages
2003-08-29 15:01 [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22 Shantanu Goel
2003-08-29 14:57 ` Antonio Vargas
2003-08-29 17:55   ` Shantanu Goel
2003-08-29 18:06     ` Mike Fedyk
2003-08-29 18:46       ` Shantanu Goel
2003-08-29 18:57         ` Mike Fedyk
2003-08-29 19:28           ` Andrea Arcangeli
2003-08-29 19:46             ` Shantanu Goel
2003-08-29 19:55               ` Andrea Arcangeli
2003-08-29 20:20                 ` Shantanu Goel
2003-08-30  5:01                   ` Antonio Vargas
2003-09-20 13:39                     ` A couple of 2.4.23-pre4 VM nits Shantanu Goel
2003-09-21  2:46                       ` Andrea Arcangeli
2003-09-21  5:32                         ` Shantanu Goel
2003-09-21 14:04                           ` Andrea Arcangeli
2003-09-21 14:51                             ` Shantanu Goel
2003-09-21 15:14                               ` Andrea Arcangeli
2003-09-21 15:28                                 ` Shantanu Goel
2003-09-21 15:57                                   ` Andrea Arcangeli
2003-09-21 18:37                         ` Marcelo Tosatti
2003-08-29 19:11 ` [VM PATCH] Faster reclamation of dirty pages and unused inode/dcache entries in 2.4.22 Rahul Karnik
