linux-kernel.vger.kernel.org archive mirror
From: Rik van Riel <riel@conectiva.com.br>
To: <linux-mm@kvack.org>
Cc: <linux-kernel@vger.kernel.org>
Subject: [PATCH] 2.4.6-pre2 page_launder() improvements
Date: Sun, 10 Jun 2001 01:40:44 -0300 (BRST)
Message-ID: <Pine.LNX.4.33.0106100128100.4239-100000@duckman.distro.conectiva>

[Request For Testers ... patch below]

Hi,

during my holidays I've written the following patch (forward-ported
to 2.4.6-pre2 and improved a tad today), which implements these
improvements to page_launder():

1) don't "roll over" inactive_dirty pages to the back of the
   list, but reclaim them in something closer to LRU order;
   this is especially good when the system has tons of
   inactive_dirty pages due to e.g. background scanning

2) eliminate the infinite penalty clean pages incurred relative
   to dirty pages, by not scanning the complete inactive_dirty
   list and letting real dirty pages build up near the front of
   the list ... we flush them asynchronously when we have enough
   of them

3) when going into the launder_loop, we scan a larger fraction
   of the inactive_dirty list (half of it instead of a quarter);
   under most workloads this means we can always flush the dirty
   pages asynchronously because we'll have clean, freeable pages
   in the part of the list we only scan in the launder_loop

4) when we have only dirty pages and cannot free pages, we
   remember this for the next run of page_launder() and won't
   waste CPU by scanning pages without flushing them in the
   launder loop (after maxlaunder goes negative)

5) this same logic is used to control when we use synchronous
   IO; only when we cannot free any pages (neither in the
   previous run nor in this run's first pass) do we wait on
   IO, which stops kswapd from wasting CPU under heavy write loads

6) the "sync" argument to page_launder() now means whether
   we're _allowed_ to do synchronous IO or not ... page_launder()
   is now smart enough to determine if we should use asynchronous
   IO only or if we should wait on IO
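
Items 1-3 rest on two mechanisms in the vmscan.c changes: an on-stack
bookmark page that walks the inactive_dirty list from its tail end
without rolling skipped pages over to the head, and a scan budget of
a quarter of the list (half of it in the launder_loop). The user-space
C sketch below models only those two mechanisms. struct lru_node,
lru_add_after(), scan_with_marker() and the other helpers are
hypothetical stand-ins for struct page and the kernel list primitives;
locking, free_shortage() and the per-page checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page with its lru list_head. */
struct lru_node {
    struct lru_node *prev, *next;
    int id;            /* demo page identifier */
    bool is_marker;    /* plays the role of PG_marker */
};

static void lru_init(struct lru_node *head)
{
    head->prev = head->next = head;
}

/* Link n between the adjacent nodes a and b. */
static void lru_link(struct lru_node *n, struct lru_node *a, struct lru_node *b)
{
    a->next = n;
    n->prev = a;
    n->next = b;
    b->prev = n;
}

/* Like list_add(): insert n right after pos. */
static void lru_add_after(struct lru_node *n, struct lru_node *pos)
{
    lru_link(n, pos, pos->next);
}

/* Like list_add_tail(): insert n just before head, i.e. at the tail. */
static void lru_add_tail(struct lru_node *n, struct lru_node *head)
{
    lru_link(n, head->prev, head);
}

/* Like list_del(): unlink n from the list it is on. */
static void lru_del(struct lru_node *n)
{
    assert(n->prev != NULL && n->next != NULL);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
}

/* Budget from the patch: 1/4 of the list, doubled in the launder loop. */
static int scan_budget(int nr_inactive_dirty, bool launder_loop)
{
    int maxscan = nr_inactive_dirty / 4;
    return launder_loop ? maxscan * 2 : maxscan;
}

/*
 * Visit at most maxscan nodes starting from the tail, using an on-stack
 * bookmark. Each visited node is moved to just after the bookmark, so it
 * keeps its place relative to the other nodes. Ids of visited non-marker
 * nodes go into seen[] (up to cap); returns how many were visited.
 */
static int scan_with_marker(struct lru_node *head, int maxscan, int *seen, int cap)
{
    struct lru_node marker = { .prev = NULL, .next = NULL, .id = -1, .is_marker = true };
    struct lru_node *p;
    int visited = 0;

    lru_add_tail(&marker, head);
    while ((p = marker.prev) != head && maxscan-- > 0) {
        /* Advance the bookmark by flipping p to the other side of it. */
        lru_del(p);
        lru_add_after(p, &marker);

        if (p->is_marker)
            continue;          /* another scanner's bookmark: skip it */
        if (visited < cap)
            seen[visited] = p->id;
        visited++;
    }
    lru_del(&marker);          /* never leave the bookmark on the list */
    return visited;
}

/* Copy the ids on the list, head to tail, into out[]; returns the count. */
static int lru_ids(const struct lru_node *head, int *out, int cap)
{
    int n = 0;
    for (const struct lru_node *p = head->next; p != head; p = p->next) {
        if (n < cap)
            out[n] = p->id;
        n++;
    }
    return n;
}
```

Because each visited page is moved just past the bookmark, a scan
leaves the pages in their original relative order, and a rescan (as in
the launder_loop) starts again from the tail; the old code instead
moved each skipped page to the head of the list.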

This patch has given excellent results on my laptop and my
workstation here and seems to improve kernel behaviour in tests
quite a bit. I can play MP3s unbuffered during moderate write
loads or moderately heavy IO ;)

YMMV, please test it. If it works great for everybody I'd like
to get this improvement merged into the next -pre kernel.

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


diff -ur linux-2.4.6-pre2-virgin/include/linux/mm.h linux-2.4.6-pre2/include/linux/mm.h
--- linux-2.4.6-pre2-virgin/include/linux/mm.h	Sun Jun 10 00:44:01 2001
+++ linux-2.4.6-pre2/include/linux/mm.h	Sat Jun  9 23:19:54 2001
@@ -169,6 +169,7 @@
 #define PG_inactive_clean	11
 #define PG_highmem		12
 #define PG_checked		13	/* kill me in 2.5.<early>. */
+#define PG_marker		14
 				/* bits 21-29 unused */
 #define PG_arch_1		30
 #define PG_reserved		31
@@ -242,6 +243,9 @@
 #define PageInactiveClean(page)	test_bit(PG_inactive_clean, &(page)->flags)
 #define SetPageInactiveClean(page)	set_bit(PG_inactive_clean, &(page)->flags)
 #define ClearPageInactiveClean(page)	clear_bit(PG_inactive_clean, &(page)->flags)
+
+#define PageMarker(page)	test_bit(PG_marker, &(page)->flags)
+#define SetPageMarker(page)	set_bit(PG_marker, &(page)->flags)

 #ifdef CONFIG_HIGHMEM
 #define PageHighMem(page)		test_bit(PG_highmem, &(page)->flags)
diff -ur linux-2.4.6-pre2-virgin/include/linux/swap.h linux-2.4.6-pre2/include/linux/swap.h
--- linux-2.4.6-pre2-virgin/include/linux/swap.h	Sun Jun 10 00:44:01 2001
+++ linux-2.4.6-pre2/include/linux/swap.h	Sat Jun  9 23:19:54 2001
@@ -205,6 +205,16 @@
 	page->zone->inactive_dirty_pages++; \
 }

+/* Like the above, but add us after the bookmark (the caller's marker_lru). */
+#define add_page_to_inactive_dirty_list_marker(page) { \
+	DEBUG_ADD_PAGE \
+	ZERO_PAGE_BUG \
+	SetPageInactiveDirty(page); \
+	list_add(&(page)->lru, marker_lru); \
+	nr_inactive_dirty_pages++; \
+	page->zone->inactive_dirty_pages++; \
+}
+
 #define add_page_to_inactive_clean_list(page) { \
 	DEBUG_ADD_PAGE \
 	ZERO_PAGE_BUG \
diff -ur linux-2.4.6-pre2-virgin/mm/vmscan.c linux-2.4.6-pre2/mm/vmscan.c
--- linux-2.4.6-pre2-virgin/mm/vmscan.c	Sun Jun 10 00:44:02 2001
+++ linux-2.4.6-pre2/mm/vmscan.c	Sun Jun 10 00:57:25 2001
@@ -407,7 +407,7 @@
 /**
  * page_launder - clean dirty inactive pages, move to inactive_clean list
  * @gfp_mask: what operations we are allowed to do
- * @sync: should we wait synchronously for the cleaning of pages
+ * @sync: are we allowed to do synchronous IO in emergencies?
  *
  * When this function is called, we are most likely low on free +
  * inactive_clean pages. Since we want to refill those pages as
@@ -428,20 +428,54 @@
 #define CAN_DO_BUFFERS		(gfp_mask & __GFP_BUFFER)
 int page_launder(int gfp_mask, int sync)
 {
+	static int cannot_free_pages;
 	int launder_loop, maxscan, cleaned_pages, maxlaunder;
-	struct list_head * page_lru;
+	struct list_head * page_lru, * marker_lru;
 	struct page * page;

+	/* Our bookmark of where we are in the inactive_dirty list. */
+	struct page marker_page_struct = {
+		flags: (1<<PG_marker),
+		lru: { NULL, NULL },
+	};
+	marker_lru = &marker_page_struct.lru;
+
 	launder_loop = 0;
 	maxlaunder = 0;
 	cleaned_pages = 0;

 dirty_page_rescan:
 	spin_lock(&pagemap_lru_lock);
-	maxscan = nr_inactive_dirty_pages;
-	while ((page_lru = inactive_dirty_list.prev) != &inactive_dirty_list &&
-				maxscan-- > 0) {
+	/*
+	 * By not scanning all inactive dirty pages we'll write out
+	 * really old dirty pages before evicting newer clean pages.
+	 * This should cause some LRU behaviour if we have a large
+	 * amount of inactive pages (due to e.g. drop behind).
+	 *
+	 * It also makes us accumulate dirty pages until we have enough
+	 * to be worth writing to disk without causing excessive disk
+	 * seeks and eliminates the infinite penalty clean pages incurred
+	 * vs. dirty pages.
+	 */
+	maxscan = nr_inactive_dirty_pages / 4;
+	if (launder_loop)
+		maxscan *= 2;
+	list_add_tail(marker_lru, &inactive_dirty_list);
+	while ((page_lru = marker_lru->prev) != &inactive_dirty_list &&
+			maxscan-- > 0 && free_shortage()) {
 		page = list_entry(page_lru, struct page, lru);
+		/* We move the bookmark forward by flipping the page ;) */
+		list_del(page_lru);
+		list_add(page_lru, marker_lru);
+
+		/* Don't waste CPU if chances are we cannot free anything. */
+		if (launder_loop && maxlaunder < 0 && cannot_free_pages)
+			break;
+
+		/* Skip other people's marker pages. */
+		if (PageMarker(page)) {
+			continue;
+		}

 		/* Wrong page on list?! (list corruption, should not happen) */
 		if (!PageInactiveDirty(page)) {
@@ -454,7 +488,6 @@

 		/* Page is or was in use?  Move it to the active list. */
 		if (PageReferenced(page) || page->age > 0 ||
-				page->zone->free_pages > page->zone->pages_high ||
 				(!page->buffers && page_count(page) > 1) ||
 				page_ramdisk(page)) {
 			del_page_from_inactive_dirty_list(page);
@@ -464,11 +497,9 @@

 		/*
 		 * The page is locked. IO in progress?
-		 * Move it to the back of the list.
+		 * Skip the page, we'll take a look when it unlocks.
 		 */
 		if (TryLockPage(page)) {
-			list_del(page_lru);
-			list_add(page_lru, &inactive_dirty_list);
 			continue;
 		}

@@ -482,10 +513,8 @@
 			if (!writepage)
 				goto page_active;

-			/* First time through? Move it to the back of the list */
+			/* First time through? Skip the page. */
 			if (!launder_loop || !CAN_DO_IO) {
-				list_del(page_lru);
-				list_add(page_lru, &inactive_dirty_list);
 				UnlockPage(page);
 				continue;
 			}
@@ -544,7 +573,7 @@

 			/* The buffers were not freed. */
 			if (!clearedbuf) {
-				add_page_to_inactive_dirty_list(page);
+				add_page_to_inactive_dirty_list_marker(page);

 			/* The page was only in the buffer cache. */
 			} else if (!page->mapping) {
@@ -600,6 +629,8 @@
 			UnlockPage(page);
 		}
 	}
+	/* Remove our marker. */
+	list_del(marker_lru);
 	spin_unlock(&pagemap_lru_lock);

 	/*
@@ -615,16 +646,29 @@
 	 */
 	if ((CAN_DO_IO || CAN_DO_BUFFERS) && !launder_loop && free_shortage()) {
 		launder_loop = 1;
-		/* If we cleaned pages, never do synchronous IO. */
-		if (cleaned_pages)
+		/*
+		 * If we, or the previous process running page_launder(),
+		 * managed to free any pages we never do synchronous IO.
+		 */
+		if (cleaned_pages || !cannot_free_pages)
 			sync = 0;
+		/* Else, do synchronous IO (if we are allowed to). */
+		else if (sync)
+			sync = 1;
 		/* We only do a few "out of order" flushes. */
 		maxlaunder = MAX_LAUNDER;
-		/* Kflushd takes care of the rest. */
+		/* Let bdflush take care of the rest. */
 		wakeup_bdflush(0);
 		goto dirty_page_rescan;
 	}

+	/*
+	 * If we failed to free pages (because all pages are dirty)
+	 * we remember this for the next time. This will prevent us
+	 * from wasting too much CPU here.
+	 */
+	cannot_free_pages = !cleaned_pages;
+
 	/* Return the number of pages moved to the inactive_clean list. */
 	return cleaned_pages;
 }
@@ -852,7 +896,7 @@
 	 * list, so this is a relatively cheap operation.
 	 */
 	if (free_shortage()) {
-		ret += page_launder(gfp_mask, user);
+		ret += page_launder(gfp_mask, 1);
 		shrink_dcache_memory(DEF_PRIORITY, gfp_mask);
 		shrink_icache_memory(DEF_PRIORITY, gfp_mask);
 	}
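
The last vmscan.c hunk, together with the early break added to the
scan loop, amounts to a small policy driven by the static
cannot_free_pages flag. A minimal user-space C model of that policy
follows. struct launder_state, use_sync_io(), stop_scanning() and
finish_run() are hypothetical names; the model assumes, as in the
patch, that the flag is read during a call and only updated at its
end, so each decision reflects the previous run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state page_launder() keeps across calls. */
struct launder_state {
    bool cannot_free_pages;   /* mirrors the static variable in the patch */
};

/*
 * Before the launder_loop pass: wait on IO only if neither this call's
 * first pass nor the previous run freed pages, and the caller allows it.
 */
static bool use_sync_io(const struct launder_state *st, int cleaned_pages,
                        bool sync_allowed)
{
    if (cleaned_pages != 0 || !st->cannot_free_pages)
        return false;
    return sync_allowed;
}

/* Inside the scan loop: stop once the flush quota is used up and the
 * previous run could not free anything anyway. */
static bool stop_scanning(const struct launder_state *st, bool launder_loop,
                          int maxlaunder)
{
    return launder_loop && maxlaunder < 0 && st->cannot_free_pages;
}

/* At the end of the call: remember whether this run freed anything. */
static void finish_run(struct launder_state *st, int cleaned_pages)
{
    assert(cleaned_pages >= 0);
    st->cannot_free_pages = (cleaned_pages == 0);
}
```

Since the flag is only updated at the end of a call, one run that
frees nothing makes synchronous IO possible on the next call (if that
call's first pass also frees nothing and the caller permits it), while
one run that frees anything makes the next call asynchronous-only.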



Thread overview: 12+ messages
2001-06-10  4:40 Rik van Riel [this message]
2001-06-10  8:38 ` George Bonser
2001-06-10  8:43   ` Rik van Riel
2001-06-10  8:52     ` George Bonser
2001-06-10  9:06     ` George Bonser
2001-06-10  9:08     ` George Bonser
2001-06-10 19:30     ` George Bonser
2001-06-11  3:03 ` Daniel Stone
2001-06-11 16:18   ` George Bonser
2001-06-13  4:42 ` Alok K. Dhir
2001-06-13 13:49   ` Rik van Riel
2001-06-13 18:46   ` Marcelo Tosatti
