All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Osterlund <petero2@telia.com>
To: Andrew Morton <akpm@osdl.org>
Cc: pavel@ucw.cz, linux-kernel@vger.kernel.org
Subject: Re: Software suspend testing in 2.6.0-test1
Date: 20 Jul 2003 02:22:27 +0200	[thread overview]
Message-ID: <m2wuee9hdo.fsf@telia.com> (raw)
In-Reply-To: <20030718131527.7cf4ca5e.akpm@osdl.org>

Andrew Morton <akpm@osdl.org> writes:

> Yup.  Probaby we should not be throttling kswapd if it is making good
> progress, but this needs broad testing and thought.  Something like this,
> untested:
> 
> --- 25/mm/vmscan.c~less-kswapd-throttling	Thu Jul 17 15:35:09 2003
> +++ 25-akpm/mm/vmscan.c	Thu Jul 17 15:35:12 2003
> @@ -930,7 +930,8 @@ static int balance_pgdat(pg_data_t *pgda
>  		}
>  		if (all_zones_ok)
>  			break;
> -		blk_congestion_wait(WRITE, HZ/10);
> +		if (to_free)
> +			blk_congestion_wait(WRITE, HZ/10);
>  	}
>  	return nr_pages - to_free;
>  }
> 
> 
> > Note that I had to use HZ/5 in free_some_memory() or else the loop
> > would terminate too early. The main problem seems to be that
> > balance_pgdat() is too slow when freeing memory. The function can fail
> > to free memory in the inner loop either because the disk is congested
> > or because too few pages are scanned, but in both cases the function
> > calls blk_congestion_wait(), and in the second case the disk may be
> > idle and then blk_congestion_wait() doesn't return until the timeout
> > expires.
> 
> hm, perhaps because kswapd is throttled rather than doing work.
> 
> Try the above change, see if you still need the extended timeout in
> free_some_memory().

I have tried the change, but the writeout is still very slow. (Maybe
somewhat faster than the original code, but far from being limited by
disk bandwidth.)

It turns out the extended timeout in free_some_memory() is needed to
handle things like disk spin up time and thermal recalibration.
Ideally there should not be a timeout at all in that function. It
should wait until the congestion goes away or until all I/O is
finished. But that's probably much harder to implement. Is there some
global counter somewhere that can be used to figure out how many pages
are currently being on their way to the disk?

> Also, play around with using __GFP_WAIT|__GFP_HIGH|__GFP_NORETRY rather
> than GFP_ATOMIC.

I did, without much success.

One thing that does seem to work though, is to only throttle in
balance_pgdat() if we have called writepage() since the last
throttling, like in the patch below. (ugly, writepage_called should
not be a static variable.) With that patch I get full disk bandwidth
speed during page freeing.

Another disturbing thing I've seen is that on one of my machines, the
suspend process sometimes fails to write anything to disk. When this
happens, the suspend will fail if not enough memory is free, and the
machine hangs shortly afterwards. I've seen this with plain
2.6.0-test1 and with all patches I have tested, with and without
preempt. On another laptop, I have never seen this problem. (I'll try
to debug this further, but it is complicated by the fact that the
problematic laptop doesn't have a serial port.)


diff -u -r orig/linux-page/fs/jbd/journal.c linux-page/fs/jbd/journal.c
--- orig/linux-page/fs/jbd/journal.c	Fri Jul 18 21:07:12 2003
+++ linux-page/fs/jbd/journal.c	Sun Jul 20 01:08:56 2003
@@ -132,6 +132,8 @@
 
 	daemonize("kjournald");
 
+	current->flags |= PF_IOTHREAD;
+
 	/* Set up an interval timer which can be used to trigger a
            commit wakeup after the commit interval expires */
 	init_timer(&timer);
diff -u -r orig/linux-page/kernel/suspend.c linux-page/kernel/suspend.c
--- orig/linux-page/kernel/suspend.c	Sat Jul 19 21:07:14 2003
+++ linux-page/kernel/suspend.c	Sun Jul 20 01:17:33 2003
@@ -623,10 +623,32 @@
  */
 static void free_some_memory(void)
 {
-	printk("Freeing memory: ");
-	while (shrink_all_memory(10000))
-		printk(".");
+	LIST_HEAD(list);
+	struct page *page, *tmp;
+	int sleep_count = 0;
+	int i = 0;
+
+	while (sleep_count < 10) {
+		page = alloc_page(__GFP_HIGH | __GFP_NOWARN);
+		if (page) {
+			list_add(&page->list, &list);
+			sleep_count = 0;
+		} else {
+			blk_congestion_wait(WRITE, HZ/5);
+			sleep_count++;
+		}
+		i++;
+		if (!(i%1000))
+			printk(".");
+	}
 	printk("|\n");
+
+	i = 0;
+	list_for_each_entry_safe(page, tmp, &list, list) {
+		__free_page(page);
+		i++;
+	}
+	printk("%d pages freed\n", i);
 }
 
 /* Make disk drivers accept operations, again */
diff -u -r orig/linux-page/mm/pdflush.c linux-page/mm/pdflush.c
--- orig/linux-page/mm/pdflush.c	Fri Jul 18 21:08:47 2003
+++ linux-page/mm/pdflush.c	Sun Jul 20 01:08:56 2003
@@ -88,7 +88,7 @@
 {
 	daemonize("pdflush");
 
-	current->flags |= PF_FLUSHER;
+	current->flags |= PF_FLUSHER | PF_IOTHREAD;
 	my_work->fn = NULL;
 	my_work->who = current;
 	INIT_LIST_HEAD(&my_work->list);
diff -u -r orig/linux-page/mm/vmscan.c linux-page/mm/vmscan.c
--- orig/linux-page/mm/vmscan.c	Fri Jul 18 21:08:47 2003
+++ linux-page/mm/vmscan.c	Sun Jul 20 01:40:24 2003
@@ -49,6 +49,8 @@
 int vm_swappiness = 60;
 static long total_memory;
 
+static int writepage_called;
+
 #ifdef ARCH_HAS_PREFETCH
 #define prefetch_prev_lru_page(_page, _base, _field)			\
 	do {								\
@@ -339,6 +341,7 @@
 
 				SetPageReclaim(page);
 				res = mapping->a_ops->writepage(page, &wbc);
+				writepage_called = 1;
 
 				if (res == WRITEPAGE_ACTIVATE) {
 					ClearPageReclaim(page);
@@ -894,6 +897,8 @@
 	for (priority = DEF_PRIORITY; priority; priority--) {
 		int all_zones_ok = 1;
 
+		writepage_called = 0;
+
 		for (i = 0; i < pgdat->nr_zones; i++) {
 			struct zone *zone = pgdat->node_zones + i;
 			int nr_mapped = 0;
@@ -921,7 +926,7 @@
 			if (i < ZONE_HIGHMEM) {
 				reclaim_state->reclaimed_slab = 0;
 				shrink_slab(max_scan + nr_mapped, GFP_KERNEL);
-				to_free += reclaim_state->reclaimed_slab;
+				to_free -= reclaim_state->reclaimed_slab;
 			}
 			if (zone->all_unreclaimable)
 				continue;
@@ -930,7 +935,9 @@
 		}
 		if (all_zones_ok)
 			break;
-		blk_congestion_wait(WRITE, HZ/10);
+//		if (to_free)
+		if (writepage_called)
+			blk_congestion_wait(WRITE, HZ/10);
 	}
 	return nr_pages - to_free;
 }
@@ -976,10 +983,11 @@
 	 * us from recursively trying to free more memory as we're
 	 * trying to free the first piece of memory in the first place).
 	 */
-	tsk->flags |= PF_MEMALLOC|PF_KSWAPD;
+	tsk->flags |= PF_MEMALLOC|PF_KSWAPD|PF_IOTHREAD;
 
 	for ( ; ; ) {
 		struct page_state ps;
+		int freed;
 
 		if (current->flags & PF_FREEZE)
 			refrigerator(PF_IOTHREAD);
@@ -987,7 +995,8 @@
 		schedule();
 		finish_wait(&pgdat->kswapd_wait, &wait);
 		get_page_state(&ps);
-		balance_pgdat(pgdat, 0, &ps);
+		freed = balance_pgdat(pgdat, 0, &ps);
+//		printk("kswapd: freed %d pages\n", freed);
 	}
 }
 
-- 
Peter Osterlund - petero2@telia.com
http://w1.894.telia.com/~u89404340

  parent reply	other threads:[~2003-07-20  0:11 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-07-17 19:46 Software suspend testing in 2.6.0-test1 Peter Osterlund
2003-07-17 20:00 ` Pavel Machek
2003-07-17 20:09   ` Andrew Morton
2003-07-18  9:59     ` Peter Osterlund
2003-07-18 10:24       ` Andrew Morton
2003-07-18 15:22         ` Pavel Machek
2003-07-18 15:55           ` Peter Osterlund
2003-07-18 16:45             ` Andrew Morton
2003-07-18 17:50               ` Pavel Machek
2003-07-18 18:02                 ` Patrick Mochel
2003-07-18 18:04                   ` Pavel Machek
2003-07-18 21:05                     ` Nigel Cunningham
2003-07-18 19:37                 ` Andrew Morton
2003-07-18 19:58               ` Peter Osterlund
2003-07-18 20:15                 ` Andrew Morton
2003-07-18 22:13                   ` Pavel Machek
2003-07-20  0:22                   ` Peter Osterlund [this message]
2003-07-20  1:01                     ` Andrew Morton
2003-07-20  7:45                       ` Peter Osterlund
2003-07-18 21:05               ` Nigel Cunningham
2003-07-21 10:00   ` Peter Osterlund
2003-07-21 12:58     ` Pavel Machek
2003-07-21 14:36       ` Peter Osterlund
2003-07-21 21:28         ` Pavel Machek
2003-07-21 23:46           ` Peter Osterlund
2003-07-22 11:04             ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m2wuee9hdo.fsf@telia.com \
    --to=petero2@telia.com \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pavel@ucw.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.