linux-kernel.vger.kernel.org archive mirror
* [PATCH] 2.4.6-pre2 page_launder() improvements
@ 2001-06-10  4:40 Rik van Riel
  2001-06-10  8:38 ` George Bonser
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Rik van Riel @ 2001-06-10  4:40 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel

[Request For Testers ... patch below]

Hi,

during my holidays I've written the following patch (forward-ported
to 2.4.6-pre2 and improved a tad today), which implements these
improvements to page_launder():

1) don't "roll over" inactive_dirty pages to the back of the
   list, but reclaim them in something more resembling LRU
   order (see the sketch after this list); this is especially
   good when the system has tons of inactive_dirty pages due
   to e.g. background scanning

2) eliminate the infinite penalty that clean pages incurred
   relative to dirty pages, by not scanning the complete
   inactive_dirty list and letting real dirty pages build up
   near the front of the list ... we flush them asynchronously
   once we have enough of them

3) when going into the launder_loop, we scan a larger fraction
   of the inactive_dirty list; under most workloads this means
   we can always flush the dirty pages asynchronously because
   we'll have clean, freeable pages in the part of the list we
   only scan in the launder_loop

4) when we have only dirty pages and cannot free pages, we
   remember this for the next run of page_launder() and won't
   waste CPU by scanning pages without flushing them in the
   launder loop (after maxlaunder goes negative)

5) this same logic is used to control when we use synchronous
   IO; only when we cannot free any pages right now do we wait
   on IO, which stops kswapd wasting CPU under heavy write loads

6) the "sync" argument to page_launder() now means whether
   we're _allowed_ to do synchronous IO or not ... page_launder()
   is now smart enough to determine if we should use asynchronous
   IO only or if we should wait on IO
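
To make the bookmark walk in points 1)-3) a bit more concrete, here is a
rough, self-contained userspace sketch (illustration only, not kernel code;
the list type and all names are made up for the example):

#include <stdio.h>

/* Minimal doubly-linked list, standing in for the kernel's list_head. */
struct node {
	struct node *prev, *next;
	int is_marker;
	int id;
};

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void list_add(struct node *n, struct node *head)	/* insert after head */
{
	n->prev = head;
	n->next = head->next;
	head->next->prev = n;
	head->next = n;
}

static void list_add_tail(struct node *n, struct node *head)	/* insert at tail */
{
	list_add(n, head->prev);
}

/*
 * Scan at most 'maxscan' of the oldest entries.  The marker is dropped at
 * the tail and every visited entry is flipped to the other side of it, so
 * skipped pages keep their LRU order and a later pass (the launder loop)
 * simply walks further from the same end.
 */
static void scan_some(struct node *head, struct node *marker, int maxscan)
{
	list_add_tail(marker, head);
	while (marker->prev != head && maxscan-- > 0) {
		struct node *page = marker->prev;

		/* Move the bookmark forward by flipping the page past it. */
		list_del(page);
		list_add(page, marker);

		if (page->is_marker)	/* somebody else's bookmark: skip */
			continue;

		printf("visiting page %d\n", page->id);
		/* ... free, flush or reactivate the page here ... */
	}
	list_del(marker);		/* remove our bookmark again */
}

int main(void)
{
	struct node head = { &head, &head, 0, -1 };
	struct node marker = { NULL, NULL, 1, -2 };
	struct node pages[8];
	int i;

	for (i = 0; i < 8; i++) {
		pages[i].is_marker = 0;
		pages[i].id = i;
		list_add(&pages[i], &head);	/* newest pages at the front */
	}
	scan_some(&head, &marker, 8 / 4);	/* first pass: a quarter */
	scan_some(&head, &marker, 8 / 2);	/* launder loop: twice as much */
	return 0;
}

The first pass only visits pages 0 and 1 (the oldest quarter); the second
pass starts from the same end and reaches pages 0-3, exactly because the
skipped pages were never rotated away to the far end of the list.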

This patch has given excellent results on my laptop and my
workstation here and seems to improve kernel behaviour in tests
quite a bit. I can play mp3's unbuffered during moderate write
loads or moderately heavy IO ;)

YMMV, please test it. If it works great for everybody I'd like
to get this improvement merged into the next -pre kernel.

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


diff -ur linux-2.4.6-pre2-virgin/include/linux/mm.h linux-2.4.6-pre2/include/linux/mm.h
--- linux-2.4.6-pre2-virgin/include/linux/mm.h	Sun Jun 10 00:44:01 2001
+++ linux-2.4.6-pre2/include/linux/mm.h	Sat Jun  9 23:19:54 2001
@@ -169,6 +169,7 @@
 #define PG_inactive_clean	11
 #define PG_highmem		12
 #define PG_checked		13	/* kill me in 2.5.<early>. */
+#define PG_marker		14
 				/* bits 21-29 unused */
 #define PG_arch_1		30
 #define PG_reserved		31
@@ -242,6 +243,9 @@
 #define PageInactiveClean(page)	test_bit(PG_inactive_clean, &(page)->flags)
 #define SetPageInactiveClean(page)	set_bit(PG_inactive_clean, &(page)->flags)
 #define ClearPageInactiveClean(page)	clear_bit(PG_inactive_clean, &(page)->flags)
+
+#define PageMarker(page)	test_bit(PG_marker, &(page)->flags)
+#define SetPageMarker(page)	set_bit(PG_marker, &(page)->flags)

 #ifdef CONFIG_HIGHMEM
 #define PageHighMem(page)		test_bit(PG_highmem, &(page)->flags)
diff -ur linux-2.4.6-pre2-virgin/include/linux/swap.h linux-2.4.6-pre2/include/linux/swap.h
--- linux-2.4.6-pre2-virgin/include/linux/swap.h	Sun Jun 10 00:44:01 2001
+++ linux-2.4.6-pre2/include/linux/swap.h	Sat Jun  9 23:19:54 2001
@@ -205,6 +205,16 @@
 	page->zone->inactive_dirty_pages++; \
 }

+/* Like the above, but add us after the bookmark. */
+#define add_page_to_inactive_dirty_list_marker(page) { \
+	DEBUG_ADD_PAGE \
+	ZERO_PAGE_BUG \
+	SetPageInactiveDirty(page); \
+	list_add(&(page)->lru, marker_lru); \
+	nr_inactive_dirty_pages++; \
+	page->zone->inactive_dirty_pages++; \
+}
+
 #define add_page_to_inactive_clean_list(page) { \
 	DEBUG_ADD_PAGE \
 	ZERO_PAGE_BUG \
diff -ur linux-2.4.6-pre2-virgin/mm/vmscan.c linux-2.4.6-pre2/mm/vmscan.c
--- linux-2.4.6-pre2-virgin/mm/vmscan.c	Sun Jun 10 00:44:02 2001
+++ linux-2.4.6-pre2/mm/vmscan.c	Sun Jun 10 00:57:25 2001
@@ -407,7 +407,7 @@
 /**
  * page_launder - clean dirty inactive pages, move to inactive_clean list
  * @gfp_mask: what operations we are allowed to do
- * @sync: should we wait synchronously for the cleaning of pages
+ * @sync: are we allowed to do synchronous IO in emergencies ?
  *
  * When this function is called, we are most likely low on free +
  * inactive_clean pages. Since we want to refill those pages as
@@ -428,20 +428,54 @@
 #define CAN_DO_BUFFERS		(gfp_mask & __GFP_BUFFER)
 int page_launder(int gfp_mask, int sync)
 {
+	static int cannot_free_pages;
 	int launder_loop, maxscan, cleaned_pages, maxlaunder;
-	struct list_head * page_lru;
+	struct list_head * page_lru, * marker_lru;
 	struct page * page;

+	/* Our bookmark of where we are in the inactive_dirty list. */
+	struct page marker_page_struct = {
+		flags: (1<<PG_marker),
+		lru: { NULL, NULL },
+	};
+	marker_lru = &marker_page_struct.lru;
+
 	launder_loop = 0;
 	maxlaunder = 0;
 	cleaned_pages = 0;

 dirty_page_rescan:
 	spin_lock(&pagemap_lru_lock);
-	maxscan = nr_inactive_dirty_pages;
-	while ((page_lru = inactive_dirty_list.prev) != &inactive_dirty_list &&
-				maxscan-- > 0) {
+	/*
+	 * By not scanning all inactive dirty pages we'll write out
+	 * really old dirty pages before evicting newer clean pages.
+	 * This should cause some LRU behaviour if we have a large
+	 * amount of inactive pages (due to eg. drop behind).
+	 *
+	 * It also makes us accumulate dirty pages until we have enough
+	 * to be worth writing to disk without causing excessive disk
+	 * seeks and eliminates the infinite penalty clean pages incurred
+	 * vs. dirty pages.
+	 */
+	maxscan = nr_inactive_dirty_pages / 4;
+	if (launder_loop)
+		maxscan *= 2;
+	list_add_tail(marker_lru, &inactive_dirty_list);
+	while ((page_lru = marker_lru->prev) != &inactive_dirty_list &&
+			maxscan-- > 0 && free_shortage()) {
 		page = list_entry(page_lru, struct page, lru);
+		/* We move the bookmark forward by flipping the page ;) */
+		list_del(page_lru);
+		list_add(page_lru, marker_lru);
+
+		/* Don't waste CPU if chances are we cannot free anything. */
+		if (launder_loop && maxlaunder < 0 && cannot_free_pages)
+			break;
+
+		/* Skip other people's marker pages. */
+		if (PageMarker(page)) {
+			continue;
+		}

 		/* Wrong page on list?! (list corruption, should not happen) */
 		if (!PageInactiveDirty(page)) {
@@ -454,7 +488,6 @@

 		/* Page is or was in use?  Move it to the active list. */
 		if (PageReferenced(page) || page->age > 0 ||
-				page->zone->free_pages > page->zone->pages_high ||
 				(!page->buffers && page_count(page) > 1) ||
 				page_ramdisk(page)) {
 			del_page_from_inactive_dirty_list(page);
@@ -464,11 +497,9 @@

 		/*
 		 * The page is locked. IO in progress?
-		 * Move it to the back of the list.
+		 * Skip the page, we'll take a look when it unlocks.
 		 */
 		if (TryLockPage(page)) {
-			list_del(page_lru);
-			list_add(page_lru, &inactive_dirty_list);
 			continue;
 		}

@@ -482,10 +513,8 @@
 			if (!writepage)
 				goto page_active;

-			/* First time through? Move it to the back of the list */
+			/* First time through? Skip the page. */
 			if (!launder_loop || !CAN_DO_IO) {
-				list_del(page_lru);
-				list_add(page_lru, &inactive_dirty_list);
 				UnlockPage(page);
 				continue;
 			}
@@ -544,7 +573,7 @@

 			/* The buffers were not freed. */
 			if (!clearedbuf) {
-				add_page_to_inactive_dirty_list(page);
+				add_page_to_inactive_dirty_list_marker(page);

 			/* The page was only in the buffer cache. */
 			} else if (!page->mapping) {
@@ -600,6 +629,8 @@
 			UnlockPage(page);
 		}
 	}
+	/* Remove our marker. */
+	list_del(marker_lru);
 	spin_unlock(&pagemap_lru_lock);

 	/*
@@ -615,16 +646,29 @@
 	 */
 	if ((CAN_DO_IO || CAN_DO_BUFFERS) && !launder_loop && free_shortage()) {
 		launder_loop = 1;
-		/* If we cleaned pages, never do synchronous IO. */
-		if (cleaned_pages)
+		/*
+		 * If we, or the previous process running page_launder(),
+		 * managed to free any pages we never do synchronous IO.
+		 */
+		if (cleaned_pages || !cannot_free_pages)
 			sync = 0;
+		/* Else, do synchronous IO (if we are allowed to). */
+		else if (sync)
+			sync = 1;
 		/* We only do a few "out of order" flushes. */
 		maxlaunder = MAX_LAUNDER;
-		/* Kflushd takes care of the rest. */
+		/* Let bdflush take care of the rest. */
 		wakeup_bdflush(0);
 		goto dirty_page_rescan;
 	}

+	/*
+	 * If we failed to free pages (because all pages are dirty)
+	 * we remember this for the next time. This will prevent us
+	 * from wasting too much CPU here.
+	 */
+	cannot_free_pages = !cleaned_pages;
+
 	/* Return the number of pages moved to the inactive_clean list. */
 	return cleaned_pages;
 }
@@ -852,7 +896,7 @@
 	 * list, so this is a relatively cheap operation.
 	 */
 	if (free_shortage()) {
-		ret += page_launder(gfp_mask, user);
+		ret += page_launder(gfp_mask, 1);
 		shrink_dcache_memory(DEF_PRIORITY, gfp_mask);
 		shrink_icache_memory(DEF_PRIORITY, gfp_mask);
 	}
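
For anyone skimming the diff, the launder-loop / synchronous-IO decision in
the last two vmscan.c hunks boils down to roughly the following paraphrase
(not the kernel code: free_shortage() is real, but the lower-case helpers
and the standalone function are simplified stand-ins):

#include <stdio.h>

/* Stand-ins for the kernel's CAN_DO_IO / CAN_DO_BUFFERS / free_shortage(). */
static int can_do_io = 1;
static int can_do_buffers = 1;
static int free_shortage(void) { return 1; }

/* Remembered across runs, like the static variable added by the patch. */
static int cannot_free_pages;

/* Returns the sync mode actually used: 0 = asynchronous only, 1 = wait on IO. */
static int decide_sync(int cleaned_pages, int sync_allowed)
{
	int sync = 0;

	if ((can_do_io || can_do_buffers) && free_shortage()) {
		/*
		 * Entering the launder loop.  If this run, or the previous
		 * one, freed anything, asynchronous IO is enough; only in a
		 * real emergency do we wait on IO.
		 */
		if (!cleaned_pages && cannot_free_pages)
			sync = sync_allowed;
	}
	/*
	 * Remember whether we freed anything, so the next run does not
	 * waste CPU scanning pages it can neither flush nor free.
	 */
	cannot_free_pages = !cleaned_pages;
	return sync;
}

int main(void)
{
	printf("%d\n", decide_sync(10, 1));	/* freed pages      -> 0 (async) */
	printf("%d\n", decide_sync(0, 1));	/* prev. run freed  -> 0 (async) */
	printf("%d\n", decide_sync(0, 1));	/* stuck two times  -> 1 (sync)  */
	return 0;
}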


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] 2.4.6-pre2 page_launder() improvements
  2001-06-10  4:40 [PATCH] 2.4.6-pre2 page_launder() improvements Rik van Riel
@ 2001-06-10  8:38 ` George Bonser
  2001-06-10  8:43   ` Rik van Riel
  2001-06-11  3:03 ` Daniel Stone
  2001-06-13  4:42 ` Alok K. Dhir
  2 siblings, 1 reply; 12+ messages in thread
From: George Bonser @ 2001-06-10  8:38 UTC (permalink / raw)
  To: Rik van Riel, linux-mm; +Cc: linux-kernel

>
> This patch has given excellent results on my laptop and my
> workstation here and seems to improve kernel behaviour in tests
> quite a bit. I can play mp3's unbuffered during moderate write
> loads or moderately heavy IO ;)
>
> YMMV, please test it. If it works great for everybody I'd like
> to get this improvement merged into the next -pre kernel.

For the test I ran it through, it was about the same as 2.4.6-pre2 vanilla.

The test I did can be simulated easily enough: Apache with keepalives off,
about 50 connections/second pulling a 10K file, half a gig of swap, and the
machine told on boot that it only had 64MB of RAM. I prestarted 250 Apache
children. Once everything settled down I was about 10MB into swap and
everything was running smoothly. So far so good. I then wanted to push it a
little further into swap; this being a production server, I can't really
tell the world to make more connections, so I did the next best thing and
turned keepalives on with a timeout of 2 seconds, figuring this would
increase the number of live Apache children and push me deeper into swap.
I restarted Apache, and within about 5 seconds the machine stopped
responding to console input. top would not update the screen, but the
machine would still respond to pings.

I took it out of the load balancer and regained control in seconds. The
15-minute load average showed somewhere over 150 with a bazillion Apache
processes. Even top -q would not update when I put it back into the
balancer. The load average and number of processes kept increasing until,
at some point, it would just stop producing output. Again, control returned
within seconds after taking it out of the balancer. As far as I could tell,
I never at any time got more than 100MB into swap.

Your patch did seem to keep the machine alive a little longer, but that is
subjective and I have no data to back that statement up. Vanilla 2.4.6-pre2
seemed to die off a little faster. Again, with both kernels, pings were
fine; there was just no interactive response at all. I was logged in over
the net with no console, so I could not see what was hogging the CPU, but
it did not appear to be a user process. That top -q would not update tells
me it was likely in the kernel, because top -q runs as the highest-priority
user process. I just could not get any CPU in user space.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] 2.4.6-pre2 page_launder() improvements
  2001-06-10  8:38 ` George Bonser
@ 2001-06-10  8:43   ` Rik van Riel
  2001-06-10  8:52     ` George Bonser
                       ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Rik van Riel @ 2001-06-10  8:43 UTC (permalink / raw)
  To: George Bonser; +Cc: linux-mm, linux-kernel

On Sun, 10 Jun 2001, George Bonser wrote:

> I took it out of the load balancer and regained control in
> seconds. The 15 minute load average showed somewhere over 150
> with a bazillion apache processes. Even top -q would not update
> when I put it back into the balancer. The load average and
> number of processes started to increase until I got to some
> point where it would just stop providing output. Again, control
> returned within seconds after taking it out of the balancer. As
> far as I could tell, I never at any time got more than 100MB
> into swap.

OK, I guess it's just thrashing.  Having 64MB of RAM with
250 Apache processes gives you about 256kB per Apache
process ... minus page table, TCP, etc. overhead.
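
(Back-of-the-envelope: 64 * 1024 kB / 250 processes is roughly 262 kB per
process before subtracting those overheads, hence the ~256kB figure.)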

That sounds like the machine just gets a working set
larger than the amount of available memory. It should
work better with e.g. 96, 128 or more MB of memory.

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] 2.4.6-pre2 page_launder() improvements
  2001-06-10  8:43   ` Rik van Riel
@ 2001-06-10  8:52     ` George Bonser
  2001-06-10  9:06     ` George Bonser
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: George Bonser @ 2001-06-10  8:52 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm, linux-kernel

>
> That sounds like the machine just gets a working set
> larger than the amount of available memory. It should
> work better with eg. 96, 128 or more MBs of memory.
>

Right, I run them with 256M ... thought I would try to squeeze it a bit to
see what broke.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] 2.4.6-pre2 page_launder() improvements
  2001-06-10  8:43   ` Rik van Riel
  2001-06-10  8:52     ` George Bonser
@ 2001-06-10  9:06     ` George Bonser
  2001-06-10  9:08     ` George Bonser
  2001-06-10 19:30     ` George Bonser
  3 siblings, 0 replies; 12+ messages in thread
From: George Bonser @ 2001-06-10  9:06 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm, linux-kernel

>
> That sounds like the machine just gets a working set
> larger than the amount of available memory. It should
> work better with eg. 96, 128 or more MBs of memory.

Now that I think about it a little more ... once I took it out of the
balancer and got control back, I had over 500 Apache kids alive and it was
responsive.  Also, when top -q started giving out, it was still updating
the screen, though it got slower and slower ... at that point I only had
MAYBE 300 Apache processes. It almost felt like the system could not catch
up as fast as the new connections were arriving. Let's say it "goes dead"
at about 300 or so connections; I let it run for a while, then take it out
of the rotation, and it "comes back" and shows me it has about 500
processes, its interactive response is fine, and it is only about 100MB
into swap. It just feels like it can't get out of its own way fast enough.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] 2.4.6-pre2 page_launder() improvements
  2001-06-10  8:43   ` Rik van Riel
  2001-06-10  8:52     ` George Bonser
  2001-06-10  9:06     ` George Bonser
@ 2001-06-10  9:08     ` George Bonser
  2001-06-10 19:30     ` George Bonser
  3 siblings, 0 replies; 12+ messages in thread
From: George Bonser @ 2001-06-10  9:08 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm, linux-kernel

My bad, I just looked at my notes again. It both went away and returned with
right around 500 processes.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] 2.4.6-pre2 page_launder() improvements
  2001-06-10  8:43   ` Rik van Riel
                       ` (2 preceding siblings ...)
  2001-06-10  9:08     ` George Bonser
@ 2001-06-10 19:30     ` George Bonser
  3 siblings, 0 replies; 12+ messages in thread
From: George Bonser @ 2001-06-10 19:30 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm, linux-kernel

OK, new test.  Apache, no keepalives, 85 requests/sec for a 10K file,
128MB of RAM; the processor is a UP 700MHz Intel.

vanilla 2.4.6-pre2

After everything settles down I have about 230-250 Apache processes running,
with about 4% of CPU in user and roughly 6% in system.

Top shows:

 18:12:47 up 59 min,  2 users,  load average: 1.36, 3.26, 12.76  <- from a previous test
240 processes: 239 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:   3.7% user,   6.8% system,   0.0% nice,  89.5% idle
Mem:    127184K total,    95308K used,    31876K free,    14300K buffers
Swap:   498004K total,     5032K used,   492972K free,    14244K cached

I have about 5MB in swap.

Now kick off make -j8 in the Linux kernel tree.  CPU goes to about 80/20
user/system.
Top looks like this:

 18:21:46 up  1:08,  2 users,  load average: 17.71, 12.19, 12.55
299 processes: 290 sleeping, 9 running, 0 zombie, 0 stopped
CPU states:  77.4% user,  22.6% system,   0.0% nice,   0.0% idle
Mem:    127184K total,   124664K used,     2520K free,     2644K buffers
Swap:   498004K total,    20160K used,   477844K free,    11612K cached

So I have pushed about 20M into swap with a really busy system (SCSI ext2)

I run make -j8 in the source tree and am able to get the machine into a
state where it is no longer interactive at the terminal: it seems to be
hung, the make is producing nothing, top is not refreshing, but the machine
responds to pings fine.  The last top refresh shows:

 18:26:41 up  1:13,  3 users,  load average: 39.39, 19.70, 14.99
393 processes: 383 sleeping, 10 running, 0 zombie, 0 stopped
CPU states:  39.6% user,  35.7% system,   0.0% nice,  24.7% idle
Mem:    127184K total,   123244K used,     3940K free,     2880K buffers
Swap:   498004K total,    34504K used,   463500K free,    26332K cached

I wait about 5 minutes to be sure I am not going to get anything back any
time soon, then send Ctrl-C to the make; it takes about a minute to quit.
top starts refreshing again and looks like this:

 18:31:55 up  1:19,  3 users,  load average: 228.76, 167.10, 80.71
331 processes: 330 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:   1.3% user,  27.7% system,   0.0% nice,  71.0% idle
Mem:    127184K total,    87804K used,    39380K free,     3184K buffers
Swap:   498004K total,    31776K used,   466228K free,    24856K cached

If I try to run the make -j8 again, I cannot seem to get it to "hang" again.
Figuring it now has something cached that it can get to, I move to a
different source tree and run another make -j; it hangs again in a
different spot but with the same symptoms.

So much for vanilla 2.4.6-pre2. I shut down and reboot into 2.4.6-pre2+Rik's
patch.
After everything settles down top shows:

 19:07:32 up 12 min,  2 users,  load average: 0.08, 0.05, 0.00
300 processes: 297 sleeping, 1 running, 2 zombie, 0 stopped
CPU states:   5.3% user,   6.7% system,   0.0% nice,  88.0% idle
Mem:    127028K total,   106788K used,    20240K free,     1144K buffers
Swap:   498004K total,        0K used,   498004K free,    27408K cached

Hmmm, not in swap at all. OK ... now let's kick off a make -j8.

CPU goes to 85/15 user/system, about a 5% difference from the vanilla
kernel. About 20MB into swap and the compile is progressing well.
The compile finishes with no hangup; top looks like:

 19:13:19 up 18 min,  2 users,  load average: 2.80, 3.89, 1.85
259 processes: 258 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:   3.7% user,   6.3% system,   0.0% nice,  90.0% idle
Mem:    127028K total,    87368K used,    39660K free,     1276K buffers
Swap:   498004K total,    17424K used,   480580K free,    34064K cached

So I pushed it about 17M into swap. Now let me move over to the other tree
and build it, just so it won't find anything cached that it can use :-)
I am not timing the compiles, but they run faster under Rik's patch.

Second compile the system looks like this:

 19:18:28 up 23 min,  2 users,  load average: 8.21, 6.06, 3.23
257 processes: 251 sleeping, 6 running, 0 zombie, 0 stopped
CPU states:  24.1% user,   7.6% system,   0.0% nice,  68.2% idle
Mem:    127028K total,    78384K used,    48644K free,      976K buffers
Swap:   498004K total,    19760K used,   478244K free,    28992K cached

And it completes normally, no hangs. I can pretty much cause 2.4.6-pre2 to
"wedge" (for lack of a better description) by just making it busy and having
it do something that takes up swap. With Rik's patch everything stayed
usable, interactive response was good and I could log in while the compile
was running and got good response to the login prompts.

Several minutes after the last compile, things are pretty much back to
normal:

 19:26:14 up 30 min,  2 users,  load average: 0.02, 1.27, 1.94
266 processes: 265 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:   3.7% user,   8.1% system,   0.0% nice,  88.2% idle
Mem:    127028K total,   103512K used,    23516K free,     1168K buffers
Swap:   498004K total,     7768K used,   490236K free,    38784K cached

I am going to run it a few days and see if anything breaks.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] 2.4.6-pre2 page_launder() improvements
  2001-06-10  4:40 [PATCH] 2.4.6-pre2 page_launder() improvements Rik van Riel
  2001-06-10  8:38 ` George Bonser
@ 2001-06-11  3:03 ` Daniel Stone
  2001-06-11 16:18   ` George Bonser
  2001-06-13  4:42 ` Alok K. Dhir
  2 siblings, 1 reply; 12+ messages in thread
From: Daniel Stone @ 2001-06-11  3:03 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm, linux-kernel

On Sun, Jun 10, 2001 at 01:40:44AM -0300, Rik van Riel wrote:
> [Request For Testers ... patch below]
> 
> Hi,
> 
> during my holidays I've written the following patch (forward-ported
> to 2.4.6-pre2 and improved a tad today), which implements these
> improvements to page_launder():
> 
> YMMV, please test it. If it works great for everybody I'd like
> to get this improvement merged into the next -pre kernel.

I forgot about vmstat, so this is just anecdotal evidence, on -ac12: my
(weak) system performs far better under heavy load (mpg123 nice'd to -20,
plus apt/dpkg and gcc) than with vanilla -ac12. To get it to compile on
-ac, just hand-hack the patch in and s/CAN_GET_IO/can_get_io_locks/ in
vmscan.c.

:) d

-- 
Daniel Stone		<daniel@kabuki.openfridge.net> <daniel@kabuki.sfarc.net>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] 2.4.6-pre2 page_launder() improvements
  2001-06-11  3:03 ` Daniel Stone
@ 2001-06-11 16:18   ` George Bonser
  0 siblings, 0 replies; 12+ messages in thread
From: George Bonser @ 2001-06-11 16:18 UTC (permalink / raw)
  To: Daniel Stone; +Cc: linux-kernel

I ran some more tests yesterday with a little more RAM than last
time, and Rik's kernel performed much better than the vanilla kernel
in the face of memory pressure when it was very busy. I could get
both kernels into situations where they were unresponsive, but these
periods were much shorter with Rik's than with vanilla
2.4.6-pre2.  A background kernel compile completed much faster on
Rik's version on a fairly busy web server with 128MB of RAM.

I goofed and forwarded the vmstat logs to the linux-mm
mailing list so there is a huge message there with my results :-/
but I can forward them to anyone interested.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] 2.4.6-pre2 page_launder() improvements
  2001-06-10  4:40 [PATCH] 2.4.6-pre2 page_launder() improvements Rik van Riel
  2001-06-10  8:38 ` George Bonser
  2001-06-11  3:03 ` Daniel Stone
@ 2001-06-13  4:42 ` Alok K. Dhir
  2001-06-13 13:49   ` Rik van Riel
  2001-06-13 18:46   ` Marcelo Tosatti
  2 siblings, 2 replies; 12+ messages in thread
From: Alok K. Dhir @ 2001-06-13  4:42 UTC (permalink / raw)
  To: 'Rik van Riel'; +Cc: linux-kernel


Are these page_launder improvements included in 2.4.6-pre3?  Linus
mentions "VM tuning has also happened" in the announcement, but there
doesn't seem to be any mention of it in his list of changes from -pre2...

Thanks

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org 
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Rik van Riel
> Sent: Sunday, June 10, 2001 12:41 AM
> To: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Subject: [PATCH] 2.4.6-pre2 page_launder() improvements
> 
> 
> [Request For Testers ... patch below]
> 
> Hi,
> 
> during my holidays I've written the following patch 
> (forward-ported to 2.4.6-pre2 and improved a tad today), 
> which implements these improvements to page_launder():
> [... rest of the original message, including the full patch, snipped ...]


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] 2.4.6-pre2 page_launder() improvements
  2001-06-13  4:42 ` Alok K. Dhir
@ 2001-06-13 13:49   ` Rik van Riel
  2001-06-13 18:46   ` Marcelo Tosatti
  1 sibling, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2001-06-13 13:49 UTC (permalink / raw)
  To: Alok K. Dhir; +Cc: linux-kernel

On Wed, 13 Jun 2001, Alok K. Dhir wrote:

> Are these page_launder improvements included in 2.4.6-pre3?

Please, don't send whole patches to the list just to ask a
question like this.  But, since you sent the patch anyway,
why not read patch-2.4.6-pre3 to see if it's there?

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] 2.4.6-pre2 page_launder() improvements
  2001-06-13  4:42 ` Alok K. Dhir
  2001-06-13 13:49   ` Rik van Riel
@ 2001-06-13 18:46   ` Marcelo Tosatti
  1 sibling, 0 replies; 12+ messages in thread
From: Marcelo Tosatti @ 2001-06-13 18:46 UTC (permalink / raw)
  To: Alok K. Dhir; +Cc: 'Rik van Riel', linux-kernel



On Wed, 13 Jun 2001, Alok K. Dhir wrote:

> 
> Are these page_launder improvements included in 2.4.6-pre3?  Linus
> mentions "VM tuning has also happened" in the announcement - but there
> doesn't seem to be mention of it in his list of changes from -pre2...

Yes, it is. 


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2001-06-13 20:22 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-06-10  4:40 [PATCH] 2.4.6-pre2 page_launder() improvements Rik van Riel
2001-06-10  8:38 ` George Bonser
2001-06-10  8:43   ` Rik van Riel
2001-06-10  8:52     ` George Bonser
2001-06-10  9:06     ` George Bonser
2001-06-10  9:08     ` George Bonser
2001-06-10 19:30     ` George Bonser
2001-06-11  3:03 ` Daniel Stone
2001-06-11 16:18   ` George Bonser
2001-06-13  4:42 ` Alok K. Dhir
2001-06-13 13:49   ` Rik van Riel
2001-06-13 18:46   ` Marcelo Tosatti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).