From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Cc: Mel Gorman <mel@csn.ul.ie>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	hugh.dickins@tiscali.co.uk,
	pm list <linux-pm@lists.linux-foundation.org>,
	Kernel Testers List <kernel-testers@vger.kernel.org>,
	Alan Jenkins <sourcejedi.lkml@googlemail.com>
Subject: Re: s2disk hang update
Date: Tue, 16 Feb 2010 00:08:51 +0100
Message-ID: <201002160008.51875.rjw__30604.8665698201$1266275467$gmane$org@sisk.pl>
In-Reply-To: <4B718F11.2010402@tuffmail.co.uk>

On Tuesday 09 February 2010, Alan Jenkins wrote:
> Alan Jenkins wrote:
> > On 2/2/10, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >   
> >> On Tuesday 02 February 2010, Alan Jenkins wrote:
> >>     
> >>> On 1/2/10, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >>>       
> >>>> On Saturday 02 January 2010, Alan Jenkins wrote:
> >>>> Hi,
> >>>>
> >>>>         
> >>>>> I've been suffering from s2disk hangs again.  This time, the hangs
> >>>>> were always before the hibernation image was written out.
> >>>>>
> >>>>> They're still frustratingly random.  I just started trying to work out
> >>>>> whether doubling PAGES_FOR_IO makes them go away, but they went away
> >>>>> on their own again.
> >>>>>
> >>>>> I did manage to capture a backtrace with debug info though.  Here it
> >>>>> is for 2.6.33-rc2.  (It has also happened on rc1).  I was able to get
> >>>>> the line numbers (using gdb, e.g.  "info line
> >>>>> *stop_machine_create+0x27"), having built the kernel with debug info.
> >>>>>
> >>>>> [top of trace lost due to screen height]
> >>>>> ? sync_page	(filemap.c:183)
> >>>>> ? wait_on_page_bit	(filemap.c:506)
> >>>>> ? wake_bit_function	(wait.c:174)
> >>>>> ? shrink_page_list	(vmscan.c:696)
> >>>>> ? __delayacct_blkio_end	(delayacct.c:94)
> >>>>> ? finish_wait	(list.h:142)
> >>>>> ? congestion_wait	(backing-dev.c:761)
> >>>>> ? shrink_inactive_list	(vmscan.c:1193)
> >>>>> ? scsi_request_fn	(spinlock.h:306)
> >>>>> ? blk_run_queue	(blk-core.c:434)
> >>>>> ? shrink_zone	(vmscan.c:1484)
> >>>>> ? do_try_to_free_pages	(vmscan.c:1684)
> >>>>> ? try_to_free_pages	(vmscan.c:1848)
> >>>>> ? isolate_pages_global	(vmscan.c:980)
> >>>>> ? __alloc_pages_nodemask	(page_alloc.c:1702)
> >>>>> ? __get_free_pages	(page_alloc.c:1990)
> >>>>> ? copy_process	(fork.c:237)
> >>>>> ? do_fork	(fork.c:1443)
> >>>>> ? rb_erase
> >>>>> ? __switch_to
> >>>>> ? kthread
> >>>>> ? kernel_thread
> >>>>> ? kthread
> >>>>> ? kernel_thread_helper
> >>>>> ? kthreadd
> >>>>> ? kthreadd
> >>>>> ? kernel_thread_helper
> >>>>>
> >>>>> INFO: task s2disk:2174 blocked for more than 120 seconds
> >>>>>           
> >>>> This looks like we have run out of memory while creating a new kernel
> >>>> thread
> >>>> and we have blocked on I/O while trying to free some space (quite
> >>>> obviously,
> >>>> because the I/O doesn't work at this point).
> >>>>         
> >>> For context, the kernel thread being created here is the stop_machine
> >>> thread.  It is created by disable_nonboot_cpus(), called from
> >>> hibernation_snapshot().  See e.g. this hung task backtrace -
> >>>
> >>> http://picasaweb.google.com/lh/photo/BkKUwZCrQ2ceBIM9ZOh7Ow?feat=directlink
> >>>
> >>>       
> >>>> I think it should help if you increase PAGES_FOR_IO, then.
> >>>>         
> >>> Ok, it's been happening again on 2.6.33-rc6.  Unfortunately increasing
> >>> PAGES_FOR_IO doesn't help.
> >>>
> >>> I've been using a test patch to make PAGES_FOR_IO tunable at run time.
> >>>  I get the same hang if I increase it by a factor of 10, to 10240:
> >>>
> >>> # cd /sys/module/kernel/parameters/
> >>> # ls
> >>> consoleblank  initcall_debug  PAGES_FOR_IO  panic  pause_on_oops
> >>> SPARE_PAGES
> >>> # echo 10240 > PAGES_FOR_IO
> >>> # echo 2560 > SPARE_PAGES
> >>> # cat SPARE_PAGES
> >>> 2560
> >>> # cat PAGES_FOR_IO
> >>> 10240
> >>>
> >>> I also added a debug patch to try and understand the calculations with
> >>> PAGES_FOR_IO in hibernate_preallocate_memory().  I still don't really
> >>> understand them and there could easily be errors in my debug patch,
> >>> but the output is interesting.
> >>>
> >>> Increasing PAGES_FOR_IO by almost 10000 has the expected effect of
> >>> decreasing "max_size" by the same amount.  However it doesn't appear
> >>> to increase the number of free pages at the critical moment.
> >>>
> >>> PAGES_FOR_IO = 1024:
> >>> http://picasaweb.google.com/lh/photo/DYQGvB_4hvCvVuxZf2ibxg?feat=directlink
> >>>
> >>> PAGES_FOR_IO = 10240:
> >>> http://picasaweb.google.com/lh/photo/AIkV_ZBwt22nzN-JdOJCWA?feat=directlink
> >>>
> >>>
> >>> You may remember that I was originally able to avoid the hang by
> >>> reverting commit 5f8dcc2.  It doesn't revert cleanly any more.
> >>> However, I tried applying my test&debug patches on top of 5f8dcc2~1
> >>> (just before the commit that triggered the hang).  That kernel
> >>> apparently left ~5000 pages free at hibernation time, v.s. ~1200 when
> >>> testing the same scenario on 2.6.33-rc6.  (As before, the number of
> >>> free pages remained the same if I increased PAGES_FOR_IO to 10240).
> >>>       
> >> I think the hang may be avoided by using this patch
> >> http://patchwork.kernel.org/patch/74740/
> >> but the hibernation will fail instead.
> >>
> >> Can you please repeat your experiments with the patch below applied and
> >> report back?
> >>
> >> Rafael
> >>     
> >
> > It causes hibernation to succeed <grin>.
> >   
> 
> Perhaps I spoke too soon.  I see the same hang if I run too many 
> applications.  The first hibernation fails with "not enough swap" as 
> expected, but the second or third attempt hangs (with the same backtrace 
> as before).
> 
> The patch definitely helps though.  Without the patch, I see a hang the 
> first time I try to hibernate with too many applications running.

Well, I have an idea.

Can you please apply the appended patch on top of the previous one and see if
that helps?

Rafael

---
 kernel/power/snapshot.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

Index: linux-2.6/kernel/power/snapshot.c
===================================================================
--- linux-2.6.orig/kernel/power/snapshot.c
+++ linux-2.6/kernel/power/snapshot.c
@@ -1179,6 +1179,17 @@ static void free_unnecessary_pages(void)
 		to_free_normal -= save_highmem - alloc_highmem;
 	}
 
+	/*
+	 * After we have preallocated memory for the image there may be too
+	 * little memory for other things done later down the road, like
+	 * starting new kernel threads for disabling nonboot CPUs.  Try to
+	 * mitigate this by reducing the number of pages that we're going to
+	 * keep preallocated by 20%.
+	 */
+	to_free_normal += (alloc_normal - to_free_normal) / 5;
+	if (to_free_normal > alloc_normal)
+		to_free_normal = alloc_normal;
+
 	memory_bm_position_reset(&copy_bm);
 
 	while (to_free_normal > 0 && to_free_highmem > 0) {
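
For illustration, the adjustment added by the hunk above can be tried in
isolation in user space.  This is only a minimal sketch, not kernel code:
adjust_to_free() is a hypothetical helper name, and the unsigned long types
are an assumption about how the counters are declared in snapshot.c.

```c
/*
 * Hypothetical user-space helper mirroring the arithmetic in the patch.
 *
 * alloc_normal:   number of normal-zone pages that were preallocated
 * to_free_normal: number of those pages already scheduled to be freed
 *
 * Assumes to_free_normal <= alloc_normal on entry, so the unsigned
 * subtraction below cannot wrap around.
 */
unsigned long adjust_to_free(unsigned long alloc_normal,
			     unsigned long to_free_normal)
{
	/* Pages that would stay preallocated without the patch. */
	unsigned long kept = alloc_normal - to_free_normal;

	/* Release one fifth (20%) of them in addition. */
	to_free_normal += kept / 5;

	/* Never free more pages than were allocated. */
	if (to_free_normal > alloc_normal)
		to_free_normal = alloc_normal;

	return to_free_normal;
}
```

With alloc_normal = 10000 and to_free_normal = 2000, 8000 pages would have
been kept.  A fifth of that (1600) is added to the pages to free, giving 3600
and leaving 6400 pages preallocated.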
