On 2/2/10, Rafael J. Wysocki wrote:
> On Tuesday 02 February 2010, Alan Jenkins wrote:
>> On 1/2/10, Rafael J. Wysocki wrote:
>> > On Saturday 02 January 2010, Alan Jenkins wrote:
>> > Hi,
>> >
>> >> I've been suffering from s2disk hangs again. This time, the hangs
>> >> were always before the hibernation image was written out.
>> >>
>> >> They're still frustratingly random. I just started trying to work out
>> >> whether doubling PAGES_FOR_IO makes them go away, but they went away
>> >> on their own again.
>> >>
>> >> I did manage to capture a backtrace with debug info though. Here it
>> >> is for 2.6.33-rc2. (It has also happened on rc1). I was able to get
>> >> the line numbers (using gdb, e.g. "info line
>> >> *stop_machine_create+0x27"), having built the kernel with debug info.
>> >>
>> >> [top of trace lost due to screen height]
>> >> ? sync_page (filemap.c:183)
>> >> ? wait_on_page_bit (filemap.c:506)
>> >> ? wake_bit_function (wait.c:174)
>> >> ? shrink_page_list (vmscan.c:696)
>> >> ? __delayacct_blkio_end (delayacct.c:94)
>> >> ? finish_wait (list.h:142)
>> >> ? congestion_wait (backing-dev.c:761)
>> >> ? shrink_inactive_list (vmscan.c:1193)
>> >> ? scsi_request_fn (spinlock.h:306)
>> >> ? blk_run_queue (blk-core.c:434)
>> >> ? shrink_zone (vmscan.c:1484)
>> >> ? do_try_to_free_pages (vmscan.c:1684)
>> >> ? try_to_free_pages (vmscan.c:1848)
>> >> ? isolate_pages_global (vmscan.c:980)
>> >> ? __alloc_pages_nodemask (page_alloc.c:1702)
>> >> ? __get_free_pages (page_alloc.c:1990)
>> >> ? copy_process (fork.c:237)
>> >> ? do_fork (fork.c:1443)
>> >> ? rb_erase
>> >> ? __switch_to
>> >> ? kthread
>> >> ? kernel_thread
>> >> ? kthread
>> >> ? kernel_thread_helper
>> >> ? kthreadd
>> >> ? kthreadd
>> >> ? kernel_thread_helper
>> >>
>> >> INFO: task s2disk:2174 blocked for more than 120 seconds
>> >
>> > This looks like we have run out of memory while creating a new kernel
>> > thread and we have blocked on I/O while trying to free some space
>> > (quite obviously, because the I/O doesn't work at this point).
>>
>> For context, the kernel thread being created here is the stop_machine
>> thread. It is created by disable_nonboot_cpus(), called from
>> hibernation_snapshot(). See e.g. this hung task backtrace -
>>
>> http://picasaweb.google.com/lh/photo/BkKUwZCrQ2ceBIM9ZOh7Ow?feat=directlink
>>
>> > I think it should help if you increase PAGES_FOR_IO, then.
>>
>> Ok, it's been happening again on 2.6.33-rc6. Unfortunately increasing
>> PAGES_FOR_IO doesn't help.
>>
>> I've been using a test patch to make PAGES_FOR_IO tunable at run time.
>> I get the same hang if I increase it by a factor of 10, to 10240:
>>
>> # cd /sys/module/kernel/parameters/
>> # ls
>> consoleblank initcall_debug PAGES_FOR_IO panic pause_on_oops SPARE_PAGES
>> # echo 10240 > PAGES_FOR_IO
>> # echo 2560 > SPARE_PAGES
>> # cat SPARE_PAGES
>> 2560
>> # cat PAGES_FOR_IO
>> 10240
>>
>> I also added a debug patch to try and understand the calculations with
>> PAGES_FOR_IO in hibernate_preallocate_memory(). I still don't really
>> understand them and there could easily be errors in my debug patch,
>> but the output is interesting.
>>
>> Increasing PAGES_FOR_IO by almost 10000 has the expected effect of
>> decreasing "max_size" by the same amount. However it doesn't appear
>> to increase the number of free pages at the critical moment.
>>
>> PAGES_FOR_IO = 1024:
>> http://picasaweb.google.com/lh/photo/DYQGvB_4hvCvVuxZf2ibxg?feat=directlink
>>
>> PAGES_FOR_IO = 10240:
>> http://picasaweb.google.com/lh/photo/AIkV_ZBwt22nzN-JdOJCWA?feat=directlink
>>
>>
>> You may remember that I was originally able to avoid the hang by
>> reverting commit 5f8dcc2. It doesn't revert cleanly any more.
>> However, I tried applying my test&debug patches on top of 5f8dcc2~1
>> (just before the commit that triggered the hang). That kernel
>> apparently left ~5000 pages free at hibernation time, v.s. ~1200 when
>> testing the same scenario on 2.6.33-rc6. (As before, the number of
>> free pages remained the same if I increased PAGES_FOR_IO to 10240).
>
> I think the hang may be avoided by using this patch
> http://patchwork.kernel.org/patch/74740/
> but the hibernation will fail instead.
>
> Can you please repeat your experiments with the patch below applied and
> report back?
>
> Rafael

It causes hibernation to succeed.

I've attached a dmesg from a successful hibernation with both patches
applied. And for comparison, a screenshot from a hung hibernation
without the fix, but with the debug patch you sent me.

[In both cases I tested directly on top of v2.6.33-rc6, i.e. no changes
to PAGES_FOR_IO or anything else].

Many thanks!
Alan
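
For a sense of scale, the page counts quoted in this thread can be turned
into sizes. This is only a rough sketch: it assumes 4 KiB pages (the usual
x86 page size, which the thread does not actually state), and the helper
name pages_to_mib is made up for illustration.

```python
# Assumption: 4 KiB pages; the thread does not say which page size was in use.
PAGE_SIZE_BYTES = 4096


def pages_to_mib(pages: int, page_size: int = PAGE_SIZE_BYTES) -> float:
    """Convert a page count to mebibytes (hypothetical helper)."""
    return pages * page_size / (1024 * 1024)


# Page counts taken from the messages above; the "free pages" figures are
# the approximate values reported there (~5000 and ~1200).
figures = {
    "PAGES_FOR_IO, original": 1024,
    "PAGES_FOR_IO, raised": 10240,
    "SPARE_PAGES, as set in the test": 2560,
    "free pages on 5f8dcc2~1 (approx.)": 5000,
    "free pages on 2.6.33-rc6 (approx.)": 1200,
}

for label, pages in figures.items():
    print(f"{label}: {pages} pages = {pages_to_mib(pages):.1f} MiB")
```

Under that assumption, the two free-page figures come to roughly 19.5 MiB
and 4.7 MiB respectively, and raising PAGES_FOR_IO from 1024 to 10240
corresponds to asking for 40 MiB instead of 4 MiB.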