All of lore.kernel.org
 help / color / mirror / Atom feed
* PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
@ 2011-07-12 16:05 Christoph
  2011-07-13  0:03 ` Dave Chinner
  0 siblings, 1 reply; 68+ messages in thread
From: Christoph @ 2011-07-12 16:05 UTC (permalink / raw)
  To: xfs

Hi!

I'd like you to have a look into this issue:

pm-hibernate locks up when using xfs while "Preallocating image memory".

https://bugzilla.kernel.org/show_bug.cgi?id=33622

I got at least this backtrace (2.6.39.3)

tia

chris



SysRq : Show Blocked State

pm-hibernate    D 0000000000000000     0  3638   3637 0x00000000
 ffff8800017bf918 0000000000000082 ffff8800017be010 ffff880000000000
 ffff8800017be010 ffff88000b8a6170 0000000000013900 ffff8800017bffd8
 ffff8800017bffd8 0000000000013900 ffffffff8148b020 ffff88000b8a6170
Call Trace:
 [<ffffffff81344ce2>] schedule_timeout+0x22/0xbb
 [<ffffffff81344b64>] wait_for_common+0xcb/0x148
 [<ffffffff810408ea>] ? try_to_wake_up+0x18c/0x18c
 [<ffffffff81345527>] ? down_write+0x2d/0x31
 [<ffffffff81344c7b>] wait_for_completion+0x18/0x1a
 [<ffffffffa02374da>] xfs_reclaim_inode+0x74/0x258 [xfs]
 [<ffffffffa0237853>] xfs_reclaim_inodes_ag+0x195/0x264 [xfs]
 [<ffffffffa0237974>] xfs_reclaim_inode_shrink+0x52/0x90 [xfs]
 [<ffffffff810c4e21>] shrink_slab+0xdb/0x151
 [<ffffffff810c625a>] do_try_to_free_pages+0x204/0x39a
 [<ffffffff8134ce4e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff810c647f>] shrink_all_memory+0x8f/0xa8
 [<ffffffff810cc41a>] ? next_online_pgdat+0x20/0x41
 [<ffffffff8107937d>] hibernate_preallocate_memory+0x1c4/0x30f
 [<ffffffff811a8fa2>] ? kobject_put+0x47/0x4b
 [<ffffffff81077eb2>] hibernation_snapshot+0x45/0x281
 [<ffffffff810781bf>] hibernate+0xd1/0x1b8
 [<ffffffff81076c58>] state_store+0x57/0xce
 [<ffffffff811a8d0b>] kobj_attr_store+0x17/0x19
 [<ffffffff81152bda>] sysfs_write_file+0xfc/0x138
 [<ffffffff810fca74>] vfs_write+0xa9/0x105
 [<ffffffff810fcb89>] sys_write+0x45/0x6c
 [<ffffffff8134c492>] system_call_fastpath+0x16/0x1b

kworker/0:2     D 0000000000000000     0   379      2 0x00000000
 ffff88000ea99b90 0000000000000046 ffff88000b8ff840 0000000000000000
 ffff88000ea98010 ffff88000c319ac0 0000000000013900 ffff88000ea99fd8
 ffff88000ea99fd8 0000000000013900 ffffffff8148b020 ffff88000c319ac0
Call Trace:
 [<ffffffff8134515f>] __mutex_lock_common+0x12e/0x195
 [<ffffffff813451da>] __mutex_lock_slowpath+0x14/0x16
 [<ffffffff813452a8>] mutex_lock+0x1e/0x38
 [<ffffffffa0237738>] xfs_reclaim_inodes_ag+0x7a/0x264 [xfs]
 [<ffffffffa02379cc>] ? xfs_reclaim_inodes+0x1a/0x1a [xfs]
 [<ffffffffa02379ca>] xfs_reclaim_inodes+0x18/0x1a [xfs]
 [<ffffffffa02379e9>] xfs_reclaim_worker+0x1d/0x29 [xfs]
 [<ffffffff81059ec4>] process_one_work+0x1de/0x2ec
 [<ffffffff8105bf60>] worker_thread+0x13d/0x262
 [<ffffffff8105be23>] ? manage_workers+0x194/0x194
 [<ffffffff8105f25a>] kthread+0x7d/0x85
 [<ffffffff8134d5a4>] kernel_thread_helper+0x4/0x10
 [<ffffffff8105f1dd>] ? kthread_worker_fn+0x148/0x148
 [<ffffffff8134d5a0>] ? gs_change+0x13/0x13

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-07-12 16:05 PM / hibernate xfs lock up / xfs_reclaim_inodes_ag Christoph
@ 2011-07-13  0:03 ` Dave Chinner
  2011-07-26 20:28     ` Rafael J. Wysocki
  0 siblings, 1 reply; 68+ messages in thread
From: Dave Chinner @ 2011-07-13  0:03 UTC (permalink / raw)
  To: Christoph; +Cc: rjw, xfs

On Tue, Jul 12, 2011 at 06:05:01PM +0200, Christoph wrote:
> Hi!
> 
> I'd like you to have a look into this issue:
> 
> pm-hibernate locks up when using xfs while "Preallocating image memory".
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=33622
> 
> I got at least this backtrace (2.6.39.3)
> 
> tia
> 
> chris
> 
> 
> 
> SysRq : Show Blocked State
> 
> pm-hibernate    D 0000000000000000     0  3638   3637 0x00000000
>  ffff8800017bf918 0000000000000082 ffff8800017be010 ffff880000000000
>  ffff8800017be010 ffff88000b8a6170 0000000000013900 ffff8800017bffd8
>  ffff8800017bffd8 0000000000013900 ffffffff8148b020 ffff88000b8a6170
> Call Trace:
>  [<ffffffff81344ce2>] schedule_timeout+0x22/0xbb
>  [<ffffffff81344b64>] wait_for_common+0xcb/0x148
>  [<ffffffff810408ea>] ? try_to_wake_up+0x18c/0x18c
>  [<ffffffff81345527>] ? down_write+0x2d/0x31
>  [<ffffffff81344c7b>] wait_for_completion+0x18/0x1a
>  [<ffffffffa02374da>] xfs_reclaim_inode+0x74/0x258 [xfs]
>  [<ffffffffa0237853>] xfs_reclaim_inodes_ag+0x195/0x264 [xfs]
>  [<ffffffffa0237974>] xfs_reclaim_inode_shrink+0x52/0x90 [xfs]
>  [<ffffffff810c4e21>] shrink_slab+0xdb/0x151
>  [<ffffffff810c625a>] do_try_to_free_pages+0x204/0x39a
>  [<ffffffff8134ce4e>] ? apic_timer_interrupt+0xe/0x20
>  [<ffffffff810c647f>] shrink_all_memory+0x8f/0xa8
>  [<ffffffff810cc41a>] ? next_online_pgdat+0x20/0x41
>  [<ffffffff8107937d>] hibernate_preallocate_memory+0x1c4/0x30f
>  [<ffffffff811a8fa2>] ? kobject_put+0x47/0x4b
>  [<ffffffff81077eb2>] hibernation_snapshot+0x45/0x281
>  [<ffffffff810781bf>] hibernate+0xd1/0x1b8
>  [<ffffffff81076c58>] state_store+0x57/0xce
>  [<ffffffff811a8d0b>] kobj_attr_store+0x17/0x19
>  [<ffffffff81152bda>] sysfs_write_file+0xfc/0x138
>  [<ffffffff810fca74>] vfs_write+0xa9/0x105
>  [<ffffffff810fcb89>] sys_write+0x45/0x6c
>  [<ffffffff8134c492>] system_call_fastpath+0x16/0x1b

It's waiting for IO completion, and holding an AG scan lock.

And IO completion requires a workqueue to run. Just FYI, this
process of inode reclaim can dirty the filesystem, long after
hibernate have assumed that it is clean due to the sys_sync() call
you do after freezing the processes. I pointed out this flaw in
using sync to write dirty data prior to hibernate a couple of years
ago.

Anyway, it's a good thing that XFS doesn't use freezable work
queues, otherwise it would hang on every hibernate. Perhaps I should
do that to force hibernate to do things properly in filesystems
land.

However, it is entirely possible that something else that XFS relies
on for IO completion has been put to sleep by this point.

/me finds the smoking cannon:

[  648.794455] xfsbufd/sda3    D 0000000000000000     0   192      2 0x00000000
[  648.794455]  ffff88003720be00 0000000000000046 ffff88003720bd90 ffffffff00000000
[  648.794455]  ffff88003720a010 ffff880056bc3580 0000000000013900 ffff88003720bfd8
[  648.794455]  ffff88003720bfd8 0000000000013900 ffffffff8148b020 ffff880056bc3580
[  648.794455] Call Trace:
[  648.794455]  [<ffffffff81065c0a>] refrigerator+0xbd/0xd3
[  648.794455]  [<ffffffffa022d072>] xfsbufd+0x93/0x14d [xfs]
[  648.794455]  [<ffffffffa022cfdf>] ? xfs_free_buftarg+0x4c/0x4c [xfs]
[  648.794455]  [<ffffffff8105f25a>] kthread+0x7d/0x85
[  648.794455]  [<ffffffff8134d6e4>] kernel_thread_helper+0x4/0x10
[  648.794455]  [<ffffffff8105f1dd>] ? kthread_worker_fn+0x148/0x148
[  648.794455]  [<ffffffff8134d6e0>] ? gs_change+0x13/0x13

The xfsbufd, responsible for pushing out dirty metadata, has been
been frozen. sys_sync() does not push out dirty metadata because it
is already on stable storage in the journal. If the flush lock is
already held on the inode, then inode reclaim will wait for the
xfsbufd to flush the backing buffer because reclaim can't do it
directly. And hibernate has already frozen the xfsbufd.

IOWs, what hibernate does is:

	freeze_processes()
	sys_sync()
	allocate a large amount of memory

Freezing the processes causes parts of filesystems to be put in the
fridge, which means there is no guarantee that sys_sync() actually
does what it is supposed to. As it is, sys_sync() really only
guarantees file data is clean in memory - metadata does not need to
be clean as long s it has been journalled and the journal is safe on
disk.

Further, allocating memory can cause memory reclaim to enter the
filesystem and try to free memory held by the filesystem. In XFS (at
least) this can cause the filesystem to issue tranactions and
metadata IO to clean the dirty metadata to enable it to be
reclaimed. So hibernate is effectively guaranteed to dirty the
filesystem after it has frozen all the worker threads the filesystem
might rely on.

Also, by this point kswapd has already been frozen, so hibernate is
relying totally on direct memory reclaim to free up the memory it
requires. I'm not sure that's a good idea.

IOWs, hibernate is still broken by design - and broken in exactly
the way that was pointed out a couple of years ago by myself and
others in the filesystem world: sys_sync() does not quiesce or
guarantee a clean filesystem in memory after it completes.

There is a solution to this, and it already exists - it's called
freezing the filesystem. Effectively hibernate needs to allocate
memory before it freezes kernel/filesystem worker threads:

	freeze_userspace_processes()

	// just to clean the page cache quickly
	sys_sync()

	// optionally to free page/inode/dentry caches:
		iterate_supers(drop_pagecache_sb, NULL);
		drop_slab()

	allocate a large amount of memory

	// Now quiesce the filesystems and clean remaining metadata
	iterate_supers(freeze_super, NULL);

	freeze_remaining_processes()

This guarantees that filesystems are still working when memory
reclaim comes along to free memory for the hibernate image, and that
once it is allocated that filesystems will not be changed until
thawed on the hibernate wakeup.

So, like I said a couple of years ago: fix hibernate to quiesce
filesystems properly, and the hibernate will be much more reliable
and robust and less likely to break randomly in the future.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-07-13  0:03 ` Dave Chinner
@ 2011-07-26 20:28     ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-07-26 20:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph, Linux PM mailing list, Pavel Machek, xfs

On Wednesday, July 13, 2011, Dave Chinner wrote:
> On Tue, Jul 12, 2011 at 06:05:01PM +0200, Christoph wrote:
> > Hi!
> > 
> > I'd like you to have a look into this issue:
> > 
> > pm-hibernate locks up when using xfs while "Preallocating image memory".
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=33622
> > 
> > I got at least this backtrace (2.6.39.3)
> > 
> > tia
> > 
> > chris
> > 
> > 
> > 
> > SysRq : Show Blocked State
> > 
> > pm-hibernate    D 0000000000000000     0  3638   3637 0x00000000
> >  ffff8800017bf918 0000000000000082 ffff8800017be010 ffff880000000000
> >  ffff8800017be010 ffff88000b8a6170 0000000000013900 ffff8800017bffd8
> >  ffff8800017bffd8 0000000000013900 ffffffff8148b020 ffff88000b8a6170
> > Call Trace:
> >  [<ffffffff81344ce2>] schedule_timeout+0x22/0xbb
> >  [<ffffffff81344b64>] wait_for_common+0xcb/0x148
> >  [<ffffffff810408ea>] ? try_to_wake_up+0x18c/0x18c
> >  [<ffffffff81345527>] ? down_write+0x2d/0x31
> >  [<ffffffff81344c7b>] wait_for_completion+0x18/0x1a
> >  [<ffffffffa02374da>] xfs_reclaim_inode+0x74/0x258 [xfs]
> >  [<ffffffffa0237853>] xfs_reclaim_inodes_ag+0x195/0x264 [xfs]
> >  [<ffffffffa0237974>] xfs_reclaim_inode_shrink+0x52/0x90 [xfs]
> >  [<ffffffff810c4e21>] shrink_slab+0xdb/0x151
> >  [<ffffffff810c625a>] do_try_to_free_pages+0x204/0x39a
> >  [<ffffffff8134ce4e>] ? apic_timer_interrupt+0xe/0x20
> >  [<ffffffff810c647f>] shrink_all_memory+0x8f/0xa8
> >  [<ffffffff810cc41a>] ? next_online_pgdat+0x20/0x41
> >  [<ffffffff8107937d>] hibernate_preallocate_memory+0x1c4/0x30f
> >  [<ffffffff811a8fa2>] ? kobject_put+0x47/0x4b
> >  [<ffffffff81077eb2>] hibernation_snapshot+0x45/0x281
> >  [<ffffffff810781bf>] hibernate+0xd1/0x1b8
> >  [<ffffffff81076c58>] state_store+0x57/0xce
> >  [<ffffffff811a8d0b>] kobj_attr_store+0x17/0x19
> >  [<ffffffff81152bda>] sysfs_write_file+0xfc/0x138
> >  [<ffffffff810fca74>] vfs_write+0xa9/0x105
> >  [<ffffffff810fcb89>] sys_write+0x45/0x6c
> >  [<ffffffff8134c492>] system_call_fastpath+0x16/0x1b
> 
> It's waiting for IO completion, and holding an AG scan lock.
> 
> And IO completion requires a workqueue to run. Just FYI, this
> process of inode reclaim can dirty the filesystem, long after
> hibernate have assumed that it is clean due to the sys_sync() call
> you do after freezing the processes. I pointed out this flaw in
> using sync to write dirty data prior to hibernate a couple of years
> ago.

However, attempts to remove the sys_sync() from the hibernate code
were objected to by some developers, since they believe it will increase
the probability of data loss in case of a failing hibernation in general. 

> Anyway, it's a good thing that XFS doesn't use freezable work
> queues, otherwise it would hang on every hibernate. Perhaps I should
> do that to force hibernate to do things properly in filesystems
> land.

Well, I'd say it's a very well known fact that filesystems are not
handled in any special way during hibernation, which is not a good
thing.  Nevertheless, I've never seen anyone from the filesystems land
pay any kind of attention to this issue.

> However, it is entirely possible that something else that XFS relies
> on for IO completion has been put to sleep by this point.
> 
> /me finds the smoking cannon:
> 
> [  648.794455] xfsbufd/sda3    D 0000000000000000     0   192      2 0x00000000
> [  648.794455]  ffff88003720be00 0000000000000046 ffff88003720bd90 ffffffff00000000
> [  648.794455]  ffff88003720a010 ffff880056bc3580 0000000000013900 ffff88003720bfd8
> [  648.794455]  ffff88003720bfd8 0000000000013900 ffffffff8148b020 ffff880056bc3580
> [  648.794455] Call Trace:
> [  648.794455]  [<ffffffff81065c0a>] refrigerator+0xbd/0xd3
> [  648.794455]  [<ffffffffa022d072>] xfsbufd+0x93/0x14d [xfs]
> [  648.794455]  [<ffffffffa022cfdf>] ? xfs_free_buftarg+0x4c/0x4c [xfs]
> [  648.794455]  [<ffffffff8105f25a>] kthread+0x7d/0x85
> [  648.794455]  [<ffffffff8134d6e4>] kernel_thread_helper+0x4/0x10
> [  648.794455]  [<ffffffff8105f1dd>] ? kthread_worker_fn+0x148/0x148
> [  648.794455]  [<ffffffff8134d6e0>] ? gs_change+0x13/0x13
> 
> The xfsbufd, responsible for pushing out dirty metadata, has been
> been frozen. sys_sync() does not push out dirty metadata because it
> is already on stable storage in the journal. If the flush lock is
> already held on the inode, then inode reclaim will wait for the
> xfsbufd to flush the backing buffer because reclaim can't do it
> directly. And hibernate has already frozen the xfsbufd.
> 
> IOWs, what hibernate does is:
> 
> 	freeze_processes()
> 	sys_sync()
> 	allocate a large amount of memory
> 
> Freezing the processes causes parts of filesystems to be put in the
> fridge, which means there is no guarantee that sys_sync() actually
> does what it is supposed to. As it is, sys_sync() really only
> guarantees file data is clean in memory - metadata does not need to
> be clean as long s it has been journalled and the journal is safe on
> disk.
>
> Further, allocating memory can cause memory reclaim to enter the
> filesystem and try to free memory held by the filesystem. In XFS (at
> least) this can cause the filesystem to issue tranactions and
> metadata IO to clean the dirty metadata to enable it to be
> reclaimed. So hibernate is effectively guaranteed to dirty the
> filesystem after it has frozen all the worker threads the filesystem
> might rely on.
> 
> Also, by this point kswapd has already been frozen, so hibernate is
> relying totally on direct memory reclaim to free up the memory it
> requires. I'm not sure that's a good idea.
> 
> IOWs, hibernate is still broken by design - and broken in exactly
> the way that was pointed out a couple of years ago by myself and
> others in the filesystem world: sys_sync() does not quiesce or
> guarantee a clean filesystem in memory after it completes.
> 
> There is a solution to this, and it already exists - it's called
> freezing the filesystem. Effectively hibernate needs to allocate
> memory before it freezes kernel/filesystem worker threads:
> 
> 	freeze_userspace_processes()
> 
> 	// just to clean the page cache quickly
> 	sys_sync()
> 
> 	// optionally to free page/inode/dentry caches:
> 		iterate_supers(drop_pagecache_sb, NULL);
> 		drop_slab()
> 
> 	allocate a large amount of memory
> 
> 	// Now quiesce the filesystems and clean remaining metadata
> 	iterate_supers(freeze_super, NULL);
> 
> 	freeze_remaining_processes()
> 
> This guarantees that filesystems are still working when memory
> reclaim comes along to free memory for the hibernate image, and that
> once it is allocated that filesystems will not be changed until
> thawed on the hibernate wakeup.
> 
> So, like I said a couple of years ago: fix hibernate to quiesce
> filesystems properly, and the hibernate will be much more reliable
> and robust and less likely to break randomly in the future.

Why don't you simply submit a patch to do that?

Rafael

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
@ 2011-07-26 20:28     ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-07-26 20:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph, Linux PM mailing list, xfs

On Wednesday, July 13, 2011, Dave Chinner wrote:
> On Tue, Jul 12, 2011 at 06:05:01PM +0200, Christoph wrote:
> > Hi!
> > 
> > I'd like you to have a look into this issue:
> > 
> > pm-hibernate locks up when using xfs while "Preallocating image memory".
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=33622
> > 
> > I got at least this backtrace (2.6.39.3)
> > 
> > tia
> > 
> > chris
> > 
> > 
> > 
> > SysRq : Show Blocked State
> > 
> > pm-hibernate    D 0000000000000000     0  3638   3637 0x00000000
> >  ffff8800017bf918 0000000000000082 ffff8800017be010 ffff880000000000
> >  ffff8800017be010 ffff88000b8a6170 0000000000013900 ffff8800017bffd8
> >  ffff8800017bffd8 0000000000013900 ffffffff8148b020 ffff88000b8a6170
> > Call Trace:
> >  [<ffffffff81344ce2>] schedule_timeout+0x22/0xbb
> >  [<ffffffff81344b64>] wait_for_common+0xcb/0x148
> >  [<ffffffff810408ea>] ? try_to_wake_up+0x18c/0x18c
> >  [<ffffffff81345527>] ? down_write+0x2d/0x31
> >  [<ffffffff81344c7b>] wait_for_completion+0x18/0x1a
> >  [<ffffffffa02374da>] xfs_reclaim_inode+0x74/0x258 [xfs]
> >  [<ffffffffa0237853>] xfs_reclaim_inodes_ag+0x195/0x264 [xfs]
> >  [<ffffffffa0237974>] xfs_reclaim_inode_shrink+0x52/0x90 [xfs]
> >  [<ffffffff810c4e21>] shrink_slab+0xdb/0x151
> >  [<ffffffff810c625a>] do_try_to_free_pages+0x204/0x39a
> >  [<ffffffff8134ce4e>] ? apic_timer_interrupt+0xe/0x20
> >  [<ffffffff810c647f>] shrink_all_memory+0x8f/0xa8
> >  [<ffffffff810cc41a>] ? next_online_pgdat+0x20/0x41
> >  [<ffffffff8107937d>] hibernate_preallocate_memory+0x1c4/0x30f
> >  [<ffffffff811a8fa2>] ? kobject_put+0x47/0x4b
> >  [<ffffffff81077eb2>] hibernation_snapshot+0x45/0x281
> >  [<ffffffff810781bf>] hibernate+0xd1/0x1b8
> >  [<ffffffff81076c58>] state_store+0x57/0xce
> >  [<ffffffff811a8d0b>] kobj_attr_store+0x17/0x19
> >  [<ffffffff81152bda>] sysfs_write_file+0xfc/0x138
> >  [<ffffffff810fca74>] vfs_write+0xa9/0x105
> >  [<ffffffff810fcb89>] sys_write+0x45/0x6c
> >  [<ffffffff8134c492>] system_call_fastpath+0x16/0x1b
> 
> It's waiting for IO completion, and holding an AG scan lock.
> 
> And IO completion requires a workqueue to run. Just FYI, this
> process of inode reclaim can dirty the filesystem, long after
> hibernate have assumed that it is clean due to the sys_sync() call
> you do after freezing the processes. I pointed out this flaw in
> using sync to write dirty data prior to hibernate a couple of years
> ago.

However, attempts to remove the sys_sync() from the hibernate code
were objected to by some developers, since they believe it will increase
the probability of data loss in case of a failing hibernation in general. 

> Anyway, it's a good thing that XFS doesn't use freezable work
> queues, otherwise it would hang on every hibernate. Perhaps I should
> do that to force hibernate to do things properly in filesystems
> land.

Well, I'd say it's a very well known fact that filesystems are not
handled in any special way during hibernation, which is not a good
thing.  Nevertheless, I've never seen anyone from the filesystems land
pay any kind of attention to this issue.

> However, it is entirely possible that something else that XFS relies
> on for IO completion has been put to sleep by this point.
> 
> /me finds the smoking cannon:
> 
> [  648.794455] xfsbufd/sda3    D 0000000000000000     0   192      2 0x00000000
> [  648.794455]  ffff88003720be00 0000000000000046 ffff88003720bd90 ffffffff00000000
> [  648.794455]  ffff88003720a010 ffff880056bc3580 0000000000013900 ffff88003720bfd8
> [  648.794455]  ffff88003720bfd8 0000000000013900 ffffffff8148b020 ffff880056bc3580
> [  648.794455] Call Trace:
> [  648.794455]  [<ffffffff81065c0a>] refrigerator+0xbd/0xd3
> [  648.794455]  [<ffffffffa022d072>] xfsbufd+0x93/0x14d [xfs]
> [  648.794455]  [<ffffffffa022cfdf>] ? xfs_free_buftarg+0x4c/0x4c [xfs]
> [  648.794455]  [<ffffffff8105f25a>] kthread+0x7d/0x85
> [  648.794455]  [<ffffffff8134d6e4>] kernel_thread_helper+0x4/0x10
> [  648.794455]  [<ffffffff8105f1dd>] ? kthread_worker_fn+0x148/0x148
> [  648.794455]  [<ffffffff8134d6e0>] ? gs_change+0x13/0x13
> 
> The xfsbufd, responsible for pushing out dirty metadata, has been
> been frozen. sys_sync() does not push out dirty metadata because it
> is already on stable storage in the journal. If the flush lock is
> already held on the inode, then inode reclaim will wait for the
> xfsbufd to flush the backing buffer because reclaim can't do it
> directly. And hibernate has already frozen the xfsbufd.
> 
> IOWs, what hibernate does is:
> 
> 	freeze_processes()
> 	sys_sync()
> 	allocate a large amount of memory
> 
> Freezing the processes causes parts of filesystems to be put in the
> fridge, which means there is no guarantee that sys_sync() actually
> does what it is supposed to. As it is, sys_sync() really only
> guarantees file data is clean in memory - metadata does not need to
> be clean as long s it has been journalled and the journal is safe on
> disk.
>
> Further, allocating memory can cause memory reclaim to enter the
> filesystem and try to free memory held by the filesystem. In XFS (at
> least) this can cause the filesystem to issue tranactions and
> metadata IO to clean the dirty metadata to enable it to be
> reclaimed. So hibernate is effectively guaranteed to dirty the
> filesystem after it has frozen all the worker threads the filesystem
> might rely on.
> 
> Also, by this point kswapd has already been frozen, so hibernate is
> relying totally on direct memory reclaim to free up the memory it
> requires. I'm not sure that's a good idea.
> 
> IOWs, hibernate is still broken by design - and broken in exactly
> the way that was pointed out a couple of years ago by myself and
> others in the filesystem world: sys_sync() does not quiesce or
> guarantee a clean filesystem in memory after it completes.
> 
> There is a solution to this, and it already exists - it's called
> freezing the filesystem. Effectively hibernate needs to allocate
> memory before it freezes kernel/filesystem worker threads:
> 
> 	freeze_userspace_processes()
> 
> 	// just to clean the page cache quickly
> 	sys_sync()
> 
> 	// optionally to free page/inode/dentry caches:
> 		iterate_supers(drop_pagecache_sb, NULL);
> 		drop_slab()
> 
> 	allocate a large amount of memory
> 
> 	// Now quiesce the filesystems and clean remaining metadata
> 	iterate_supers(freeze_super, NULL);
> 
> 	freeze_remaining_processes()
> 
> This guarantees that filesystems are still working when memory
> reclaim comes along to free memory for the hibernate image, and that
> once it is allocated that filesystems will not be changed until
> thawed on the hibernate wakeup.
> 
> So, like I said a couple of years ago: fix hibernate to quiesce
> filesystems properly, and the hibernate will be much more reliable
> and robust and less likely to break randomly in the future.

Why don't you simply submit a patch to do that?

Rafael

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-07-26 20:28     ` Rafael J. Wysocki
@ 2011-07-27  0:45       ` Dave Chinner
  -1 siblings, 0 replies; 68+ messages in thread
From: Dave Chinner @ 2011-07-27  0:45 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Christoph, Linux PM mailing list, Pavel Machek, xfs

On Tue, Jul 26, 2011 at 10:28:11PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, July 13, 2011, Dave Chinner wrote:
> > On Tue, Jul 12, 2011 at 06:05:01PM +0200, Christoph wrote:
.....
> > > SysRq : Show Blocked State
> > > 
> > > pm-hibernate    D 0000000000000000     0  3638   3637 0x00000000
> > >  ffff8800017bf918 0000000000000082 ffff8800017be010 ffff880000000000
> > >  ffff8800017be010 ffff88000b8a6170 0000000000013900 ffff8800017bffd8
> > >  ffff8800017bffd8 0000000000013900 ffffffff8148b020 ffff88000b8a6170
> > > Call Trace:
> > >  [<ffffffff81344ce2>] schedule_timeout+0x22/0xbb
> > >  [<ffffffff81344b64>] wait_for_common+0xcb/0x148
> > >  [<ffffffff810408ea>] ? try_to_wake_up+0x18c/0x18c
> > >  [<ffffffff81345527>] ? down_write+0x2d/0x31
> > >  [<ffffffff81344c7b>] wait_for_completion+0x18/0x1a
> > >  [<ffffffffa02374da>] xfs_reclaim_inode+0x74/0x258 [xfs]
> > >  [<ffffffffa0237853>] xfs_reclaim_inodes_ag+0x195/0x264 [xfs]
> > >  [<ffffffffa0237974>] xfs_reclaim_inode_shrink+0x52/0x90 [xfs]
> > >  [<ffffffff810c4e21>] shrink_slab+0xdb/0x151
> > >  [<ffffffff810c625a>] do_try_to_free_pages+0x204/0x39a
> > >  [<ffffffff8134ce4e>] ? apic_timer_interrupt+0xe/0x20
> > >  [<ffffffff810c647f>] shrink_all_memory+0x8f/0xa8
> > >  [<ffffffff810cc41a>] ? next_online_pgdat+0x20/0x41
> > >  [<ffffffff8107937d>] hibernate_preallocate_memory+0x1c4/0x30f
> > >  [<ffffffff811a8fa2>] ? kobject_put+0x47/0x4b
> > >  [<ffffffff81077eb2>] hibernation_snapshot+0x45/0x281
> > >  [<ffffffff810781bf>] hibernate+0xd1/0x1b8
> > >  [<ffffffff81076c58>] state_store+0x57/0xce
> > >  [<ffffffff811a8d0b>] kobj_attr_store+0x17/0x19
> > >  [<ffffffff81152bda>] sysfs_write_file+0xfc/0x138
> > >  [<ffffffff810fca74>] vfs_write+0xa9/0x105
> > >  [<ffffffff810fcb89>] sys_write+0x45/0x6c
> > >  [<ffffffff8134c492>] system_call_fastpath+0x16/0x1b
> > 
> > It's waiting for IO completion, and holding an AG scan lock.
> > 
> > And IO completion requires a workqueue to run. Just FYI, this
> > process of inode reclaim can dirty the filesystem, long after
> > hibernate have assumed that it is clean due to the sys_sync() call
> > you do after freezing the processes. I pointed out this flaw in
> > using sync to write dirty data prior to hibernate a couple of years
> > ago.
> 
> However, attempts to remove the sys_sync() from the hibernate code
> were objected to by some developers, since they believe it will increase
> the probability of data loss in case of a failing hibernation in general. 

I'm not suggesting it gets removed, I'm suggesting it gets replaced
because it doesn't give the guarantees that you want or need.

> > Anyway, it's a good thing that XFS doesn't use freezable work
> > queues, otherwise it would hang on every hibernate. Perhaps I should
> > do that to force hibernate to do things properly in filesystems
> > land.
> 
> Well, I'd say it's a very well known fact that filesystems are not
> handled in any special way during hibernation, which is not a good
> thing.  Nevertheless, I've never seen anyone from the filesystems land
> pay any kind of attention to this issue.

I beg to differ. We went through this exact clas of bugs with swsusp
back in 2006:

https://lkml.org/lkml/2006/11/12/144

And this patch:

https://lkml.org/lkml/2006/11/1/155

([PATCH -mm] swsusp: Freeze filesystems during suspend)

"This is needed by swsusp, because some filesystems (eg. XFS) use
work queues and worker_threads run with PF_NOFREEZE set, so they can
cause some writes to be performed after the suspend image has been
created which may corrupt the filesystem.  The additional benefit of
it is that if the resume fails, the filesystems will be in a
consistent state and there won't be any journal replays needed."

--

And the patch essentially does:

-			sys_sync();
+			freeze_filesystems();


But, Pavel didn't like freezing filesystems to quiesce them
correctly, so the sys_sync() and all it's problems have remained
until this day, where we still have users tripping over the same
"filesystem not idle" problems.

[....]

> > IOWs, what hibernate does is:
> > 
> > 	freeze_processes()
> > 	sys_sync()
> > 	allocate a large amount of memory
> > 
> > Freezing the processes causes parts of filesystems to be put in the
> > fridge, which means there is no guarantee that sys_sync() actually
> > does what it is supposed to. As it is, sys_sync() really only
> > guarantees file data is clean in memory - metadata does not need to
> > be clean as long s it has been journalled and the journal is safe on
> > disk.
> >
> > Further, allocating memory can cause memory reclaim to enter the
> > filesystem and try to free memory held by the filesystem. In XFS (at
> > least) this can cause the filesystem to issue tranactions and
> > metadata IO to clean the dirty metadata to enable it to be
> > reclaimed. So hibernate is effectively guaranteed to dirty the
> > filesystem after it has frozen all the worker threads the filesystem
> > might rely on.
> > 
> > Also, by this point kswapd has already been frozen, so hibernate is
> > relying totally on direct memory reclaim to free up the memory it
> > requires. I'm not sure that's a good idea.
> > 
> > IOWs, hibernate is still broken by design - and broken in exactly
> > the way that was pointed out a couple of years ago by myself and
> > others in the filesystem world: sys_sync() does not quiesce or
> > guarantee a clean filesystem in memory after it completes.
> > 
> > There is a solution to this, and it already exists - it's called
> > freezing the filesystem. Effectively hibernate needs to allocate
> > memory before it freezes kernel/filesystem worker threads:
> > 
> > 	freeze_userspace_processes()
> > 
> > 	// just to clean the page cache quickly
> > 	sys_sync()
> > 
> > 	// optionally to free page/inode/dentry caches:
> > 		iterate_supers(drop_pagecache_sb, NULL);
> > 		drop_slab()
> > 
> > 	allocate a large amount of memory
> > 
> > 	// Now quiesce the filesystems and clean remaining metadata
> > 	iterate_supers(freeze_super, NULL);
> > 
> > 	freeze_remaining_processes()
> > 
> > This guarantees that filesystems are still working when memory
> > reclaim comes along to free memory for the hibernate image, and that
> > once it is allocated that filesystems will not be changed until
> > thawed on the hibernate wakeup.
> > 
> > So, like I said a couple of years ago: fix hibernate to quiesce
> > filesystems properly, and the hibernate will be much more reliable
> > and robust and less likely to break randomly in the future.
> 
> Why don't you simply submit a patch to do that?

a) I don't know how to test suspend/hibernate
b) I don't have any hardware I can test it on.
c) I don't scale to solving every problem Linux has
d) you guys need to decide how you're going to fix this because the
problem has already been solved once before and it didn't get merged
because nobody in the swsusp/hibernate world could agree on anything
at the time.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
@ 2011-07-27  0:45       ` Dave Chinner
  0 siblings, 0 replies; 68+ messages in thread
From: Dave Chinner @ 2011-07-27  0:45 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Christoph, Linux PM mailing list, xfs

On Tue, Jul 26, 2011 at 10:28:11PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, July 13, 2011, Dave Chinner wrote:
> > On Tue, Jul 12, 2011 at 06:05:01PM +0200, Christoph wrote:
.....
> > > SysRq : Show Blocked State
> > > 
> > > pm-hibernate    D 0000000000000000     0  3638   3637 0x00000000
> > >  ffff8800017bf918 0000000000000082 ffff8800017be010 ffff880000000000
> > >  ffff8800017be010 ffff88000b8a6170 0000000000013900 ffff8800017bffd8
> > >  ffff8800017bffd8 0000000000013900 ffffffff8148b020 ffff88000b8a6170
> > > Call Trace:
> > >  [<ffffffff81344ce2>] schedule_timeout+0x22/0xbb
> > >  [<ffffffff81344b64>] wait_for_common+0xcb/0x148
> > >  [<ffffffff810408ea>] ? try_to_wake_up+0x18c/0x18c
> > >  [<ffffffff81345527>] ? down_write+0x2d/0x31
> > >  [<ffffffff81344c7b>] wait_for_completion+0x18/0x1a
> > >  [<ffffffffa02374da>] xfs_reclaim_inode+0x74/0x258 [xfs]
> > >  [<ffffffffa0237853>] xfs_reclaim_inodes_ag+0x195/0x264 [xfs]
> > >  [<ffffffffa0237974>] xfs_reclaim_inode_shrink+0x52/0x90 [xfs]
> > >  [<ffffffff810c4e21>] shrink_slab+0xdb/0x151
> > >  [<ffffffff810c625a>] do_try_to_free_pages+0x204/0x39a
> > >  [<ffffffff8134ce4e>] ? apic_timer_interrupt+0xe/0x20
> > >  [<ffffffff810c647f>] shrink_all_memory+0x8f/0xa8
> > >  [<ffffffff810cc41a>] ? next_online_pgdat+0x20/0x41
> > >  [<ffffffff8107937d>] hibernate_preallocate_memory+0x1c4/0x30f
> > >  [<ffffffff811a8fa2>] ? kobject_put+0x47/0x4b
> > >  [<ffffffff81077eb2>] hibernation_snapshot+0x45/0x281
> > >  [<ffffffff810781bf>] hibernate+0xd1/0x1b8
> > >  [<ffffffff81076c58>] state_store+0x57/0xce
> > >  [<ffffffff811a8d0b>] kobj_attr_store+0x17/0x19
> > >  [<ffffffff81152bda>] sysfs_write_file+0xfc/0x138
> > >  [<ffffffff810fca74>] vfs_write+0xa9/0x105
> > >  [<ffffffff810fcb89>] sys_write+0x45/0x6c
> > >  [<ffffffff8134c492>] system_call_fastpath+0x16/0x1b
> > 
> > It's waiting for IO completion, and holding an AG scan lock.
> > 
> > And IO completion requires a workqueue to run. Just FYI, this
> > process of inode reclaim can dirty the filesystem, long after
> > hibernate have assumed that it is clean due to the sys_sync() call
> > you do after freezing the processes. I pointed out this flaw in
> > using sync to write dirty data prior to hibernate a couple of years
> > ago.
> 
> However, attempts to remove the sys_sync() from the hibernate code
> were objected to by some developers, since they believe it will increase
> the probability of data loss in case of a failing hibernation in general. 

I'm not suggesting it gets removed, I'm suggesting it gets replaced
because it doesn't give the guarantees that you want or need.

> > Anyway, it's a good thing that XFS doesn't use freezable work
> > queues, otherwise it would hang on every hibernate. Perhaps I should
> > do that to force hibernate to do things properly in filesystems
> > land.
> 
> Well, I'd say it's a very well known fact that filesystems are not
> handled in any special way during hibernation, which is not a good
> thing.  Nevertheless, I've never seen anyone from the filesystems land
> pay any kind of attention to this issue.

I beg to differ. We went through this exact clas of bugs with swsusp
back in 2006:

https://lkml.org/lkml/2006/11/12/144

And this patch:

https://lkml.org/lkml/2006/11/1/155

([PATCH -mm] swsusp: Freeze filesystems during suspend)

"This is needed by swsusp, because some filesystems (eg. XFS) use
work queues and worker_threads run with PF_NOFREEZE set, so they can
cause some writes to be performed after the suspend image has been
created which may corrupt the filesystem.  The additional benefit of
it is that if the resume fails, the filesystems will be in a
consistent state and there won't be any journal replays needed."

--

And the patch essentially does:

-			sys_sync();
+			freeze_filesystems();


But, Pavel didn't like freezing filesystems to quiesce them
correctly, so the sys_sync() and all it's problems have remained
until this day, where we still have users tripping over the same
"filesystem not idle" problems.

[....]

> > IOWs, what hibernate does is:
> > 
> > 	freeze_processes()
> > 	sys_sync()
> > 	allocate a large amount of memory
> > 
> > Freezing the processes causes parts of filesystems to be put in the
> > fridge, which means there is no guarantee that sys_sync() actually
> > does what it is supposed to. As it is, sys_sync() really only
> > guarantees file data is clean in memory - metadata does not need to
> > be clean as long s it has been journalled and the journal is safe on
> > disk.
> >
> > Further, allocating memory can cause memory reclaim to enter the
> > filesystem and try to free memory held by the filesystem. In XFS (at
> > least) this can cause the filesystem to issue tranactions and
> > metadata IO to clean the dirty metadata to enable it to be
> > reclaimed. So hibernate is effectively guaranteed to dirty the
> > filesystem after it has frozen all the worker threads the filesystem
> > might rely on.
> > 
> > Also, by this point kswapd has already been frozen, so hibernate is
> > relying totally on direct memory reclaim to free up the memory it
> > requires. I'm not sure that's a good idea.
> > 
> > IOWs, hibernate is still broken by design - and broken in exactly
> > the way that was pointed out a couple of years ago by myself and
> > others in the filesystem world: sys_sync() does not quiesce or
> > guarantee a clean filesystem in memory after it completes.
> > 
> > There is a solution to this, and it already exists - it's called
> > freezing the filesystem. Effectively hibernate needs to allocate
> > memory before it freezes kernel/filesystem worker threads:
> > 
> > 	freeze_userspace_processes()
> > 
> > 	// just to clean the page cache quickly
> > 	sys_sync()
> > 
> > 	// optionally to free page/inode/dentry caches:
> > 		iterate_supers(drop_pagecache_sb, NULL);
> > 		drop_slab()
> > 
> > 	allocate a large amount of memory
> > 
> > 	// Now quiesce the filesystems and clean remaining metadata
> > 	iterate_supers(freeze_super, NULL);
> > 
> > 	freeze_remaining_processes()
> > 
> > This guarantees that filesystems are still working when memory
> > reclaim comes along to free memory for the hibernate image, and that
> > once it is allocated that filesystems will not be changed until
> > thawed on the hibernate wakeup.
> > 
> > So, like I said a couple of years ago: fix hibernate to quiesce
> > filesystems properly, and the hibernate will be much more reliable
> > and robust and less likely to break randomly in the future.
> 
> Why don't you simply submit a patch to do that?

a) I don't know how to test suspend/hibernate
b) I don't have any hardware I can test it on.
c) I don't scale to solving every problem Linux has
d) you guys need to decide how you're going to fix this because the
problem has already been solved once before and it didn't get merged
because nobody in the swsusp/hibernate world could agree on anything
at the time.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-07-27  0:45       ` Dave Chinner
@ 2011-07-27  9:35         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-07-27  9:35 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph, Linux PM mailing list, Nigel Cunningham, Pavel Machek, xfs

On Wednesday, July 27, 2011, Dave Chinner wrote:
> On Tue, Jul 26, 2011 at 10:28:11PM +0200, Rafael J. Wysocki wrote:
> > On Wednesday, July 13, 2011, Dave Chinner wrote:
> > > On Tue, Jul 12, 2011 at 06:05:01PM +0200, Christoph wrote:
> .....
> > > > SysRq : Show Blocked State
> > > > 
> > > > pm-hibernate    D 0000000000000000     0  3638   3637 0x00000000
> > > >  ffff8800017bf918 0000000000000082 ffff8800017be010 ffff880000000000
> > > >  ffff8800017be010 ffff88000b8a6170 0000000000013900 ffff8800017bffd8
> > > >  ffff8800017bffd8 0000000000013900 ffffffff8148b020 ffff88000b8a6170
> > > > Call Trace:
> > > >  [<ffffffff81344ce2>] schedule_timeout+0x22/0xbb
> > > >  [<ffffffff81344b64>] wait_for_common+0xcb/0x148
> > > >  [<ffffffff810408ea>] ? try_to_wake_up+0x18c/0x18c
> > > >  [<ffffffff81345527>] ? down_write+0x2d/0x31
> > > >  [<ffffffff81344c7b>] wait_for_completion+0x18/0x1a
> > > >  [<ffffffffa02374da>] xfs_reclaim_inode+0x74/0x258 [xfs]
> > > >  [<ffffffffa0237853>] xfs_reclaim_inodes_ag+0x195/0x264 [xfs]
> > > >  [<ffffffffa0237974>] xfs_reclaim_inode_shrink+0x52/0x90 [xfs]
> > > >  [<ffffffff810c4e21>] shrink_slab+0xdb/0x151
> > > >  [<ffffffff810c625a>] do_try_to_free_pages+0x204/0x39a
> > > >  [<ffffffff8134ce4e>] ? apic_timer_interrupt+0xe/0x20
> > > >  [<ffffffff810c647f>] shrink_all_memory+0x8f/0xa8
> > > >  [<ffffffff810cc41a>] ? next_online_pgdat+0x20/0x41
> > > >  [<ffffffff8107937d>] hibernate_preallocate_memory+0x1c4/0x30f
> > > >  [<ffffffff811a8fa2>] ? kobject_put+0x47/0x4b
> > > >  [<ffffffff81077eb2>] hibernation_snapshot+0x45/0x281
> > > >  [<ffffffff810781bf>] hibernate+0xd1/0x1b8
> > > >  [<ffffffff81076c58>] state_store+0x57/0xce
> > > >  [<ffffffff811a8d0b>] kobj_attr_store+0x17/0x19
> > > >  [<ffffffff81152bda>] sysfs_write_file+0xfc/0x138
> > > >  [<ffffffff810fca74>] vfs_write+0xa9/0x105
> > > >  [<ffffffff810fcb89>] sys_write+0x45/0x6c
> > > >  [<ffffffff8134c492>] system_call_fastpath+0x16/0x1b
> > > 
> > > It's waiting for IO completion, and holding an AG scan lock.
> > > 
> > > And IO completion requires a workqueue to run. Just FYI, this
> > > process of inode reclaim can dirty the filesystem, long after
> > > hibernate have assumed that it is clean due to the sys_sync() call
> > > you do after freezing the processes. I pointed out this flaw in
> > > using sync to write dirty data prior to hibernate a couple of years
> > > ago.
> > 
> > However, attempts to remove the sys_sync() from the hibernate code
> > were objected to by some developers, since they believe it will increase
> > the probability of data loss in case of a failing hibernation in general. 
> 
> I'm not suggesting it gets removed, I'm suggesting it gets replaced
> because it doesn't give the guarantees that you want or need.
> 
> > > Anyway, it's a good thing that XFS doesn't use freezable work
> > > queues, otherwise it would hang on every hibernate. Perhaps I should
> > > do that to force hibernate to do things properly in filesystems
> > > land.
> > 
> > Well, I'd say it's a very well known fact that filesystems are not
> > handled in any special way during hibernation, which is not a good
> > thing.  Nevertheless, I've never seen anyone from the filesystems land
> > pay any kind of attention to this issue.
> 
> I beg to differ. We went through this exact clas of bugs with swsusp
> back in 2006:
> 
> https://lkml.org/lkml/2006/11/12/144

You're right, sorry.  We discussed this 5 years ago, but the context
was a bit different (the hibernation code was using a different
mechanism for freeing memory).

> And this patch:
> 
> https://lkml.org/lkml/2006/11/1/155
> 
> ([PATCH -mm] swsusp: Freeze filesystems during suspend)
> 
> "This is needed by swsusp, because some filesystems (eg. XFS) use
> work queues and worker_threads run with PF_NOFREEZE set, so they can
> cause some writes to be performed after the suspend image has been
> created which may corrupt the filesystem.  The additional benefit of
> it is that if the resume fails, the filesystems will be in a
> consistent state and there won't be any journal replays needed."
> 
> --
> 
> And the patch essentially does:
> 
> -			sys_sync();
> +			freeze_filesystems();

Well, if you still think the patch does the right thing, I can rebase it
on top of the current freezer code and resubmit.

> But, Pavel didn't like freezing filesystems to quiesce them
> correctly, so the sys_sync() and all it's problems have remained
> until this day, where we still have users tripping over the same
> "filesystem not idle" problems.

The problem seems to be quite specific to XFS, though.

The Pavel's objection, if I remember it correctly, was that some
(or the majority of?) filesystems didn't implement the freezing operation,
so they would be more vulnerable to data loss in case of a failing hibernation
after this change.  However, that's better than actively causing pain to XFS
users.

> 
> [....]
> 
> > > IOWs, what hibernate does is:
> > > 
> > > 	freeze_processes()
> > > 	sys_sync()
> > > 	allocate a large amount of memory
> > > 
> > > Freezing the processes causes parts of filesystems to be put in the
> > > fridge, which means there is no guarantee that sys_sync() actually
> > > does what it is supposed to. As it is, sys_sync() really only
> > > guarantees file data is clean in memory - metadata does not need to
> > > be clean as long s it has been journalled and the journal is safe on
> > > disk.
> > >
> > > Further, allocating memory can cause memory reclaim to enter the
> > > filesystem and try to free memory held by the filesystem. In XFS (at
> > > least) this can cause the filesystem to issue tranactions and
> > > metadata IO to clean the dirty metadata to enable it to be
> > > reclaimed. So hibernate is effectively guaranteed to dirty the
> > > filesystem after it has frozen all the worker threads the filesystem
> > > might rely on.
> > > 
> > > Also, by this point kswapd has already been frozen, so hibernate is
> > > relying totally on direct memory reclaim to free up the memory it
> > > requires. I'm not sure that's a good idea.
> > > 
> > > IOWs, hibernate is still broken by design - and broken in exactly
> > > the way that was pointed out a couple of years ago by myself and
> > > others in the filesystem world: sys_sync() does not quiesce or
> > > guarantee a clean filesystem in memory after it completes.
> > > 
> > > There is a solution to this, and it already exists - it's called
> > > freezing the filesystem. Effectively hibernate needs to allocate
> > > memory before it freezes kernel/filesystem worker threads:
> > > 
> > > 	freeze_userspace_processes()
> > > 
> > > 	// just to clean the page cache quickly
> > > 	sys_sync()
> > > 
> > > 	// optionally to free page/inode/dentry caches:
> > > 		iterate_supers(drop_pagecache_sb, NULL);
> > > 		drop_slab()
> > > 
> > > 	allocate a large amount of memory
> > > 
> > > 	// Now quiesce the filesystems and clean remaining metadata
> > > 	iterate_supers(freeze_super, NULL);
> > > 
> > > 	freeze_remaining_processes()
> > > 
> > > This guarantees that filesystems are still working when memory
> > > reclaim comes along to free memory for the hibernate image, and that
> > > once it is allocated that filesystems will not be changed until
> > > thawed on the hibernate wakeup.
> > > 
> > > So, like I said a couple of years ago: fix hibernate to quiesce
> > > filesystems properly, and the hibernate will be much more reliable
> > > and robust and less likely to break randomly in the future.
> > 
> > Why don't you simply submit a patch to do that?
> 
> a) I don't know how to test suspend/hibernate
> b) I don't have any hardware I can test it on.
> c) I don't scale to solving every problem Linux has
> d) you guys need to decide how you're going to fix this because the
> problem has already been solved once before and it didn't get merged
> because nobody in the swsusp/hibernate world could agree on anything
> at the time.

OK, I'm not a filesystem expert in turn.

As I said, I can revive the Nigel's patch if you think it's better than
the code we have, but I'm afraid that's all I can do without any help from
the filesystems people.

Thanks,
Rafael

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
@ 2011-07-27  9:35         ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-07-27  9:35 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph, Linux PM mailing list, xfs

On Wednesday, July 27, 2011, Dave Chinner wrote:
> On Tue, Jul 26, 2011 at 10:28:11PM +0200, Rafael J. Wysocki wrote:
> > On Wednesday, July 13, 2011, Dave Chinner wrote:
> > > On Tue, Jul 12, 2011 at 06:05:01PM +0200, Christoph wrote:
> .....
> > > > SysRq : Show Blocked State
> > > > 
> > > > pm-hibernate    D 0000000000000000     0  3638   3637 0x00000000
> > > >  ffff8800017bf918 0000000000000082 ffff8800017be010 ffff880000000000
> > > >  ffff8800017be010 ffff88000b8a6170 0000000000013900 ffff8800017bffd8
> > > >  ffff8800017bffd8 0000000000013900 ffffffff8148b020 ffff88000b8a6170
> > > > Call Trace:
> > > >  [<ffffffff81344ce2>] schedule_timeout+0x22/0xbb
> > > >  [<ffffffff81344b64>] wait_for_common+0xcb/0x148
> > > >  [<ffffffff810408ea>] ? try_to_wake_up+0x18c/0x18c
> > > >  [<ffffffff81345527>] ? down_write+0x2d/0x31
> > > >  [<ffffffff81344c7b>] wait_for_completion+0x18/0x1a
> > > >  [<ffffffffa02374da>] xfs_reclaim_inode+0x74/0x258 [xfs]
> > > >  [<ffffffffa0237853>] xfs_reclaim_inodes_ag+0x195/0x264 [xfs]
> > > >  [<ffffffffa0237974>] xfs_reclaim_inode_shrink+0x52/0x90 [xfs]
> > > >  [<ffffffff810c4e21>] shrink_slab+0xdb/0x151
> > > >  [<ffffffff810c625a>] do_try_to_free_pages+0x204/0x39a
> > > >  [<ffffffff8134ce4e>] ? apic_timer_interrupt+0xe/0x20
> > > >  [<ffffffff810c647f>] shrink_all_memory+0x8f/0xa8
> > > >  [<ffffffff810cc41a>] ? next_online_pgdat+0x20/0x41
> > > >  [<ffffffff8107937d>] hibernate_preallocate_memory+0x1c4/0x30f
> > > >  [<ffffffff811a8fa2>] ? kobject_put+0x47/0x4b
> > > >  [<ffffffff81077eb2>] hibernation_snapshot+0x45/0x281
> > > >  [<ffffffff810781bf>] hibernate+0xd1/0x1b8
> > > >  [<ffffffff81076c58>] state_store+0x57/0xce
> > > >  [<ffffffff811a8d0b>] kobj_attr_store+0x17/0x19
> > > >  [<ffffffff81152bda>] sysfs_write_file+0xfc/0x138
> > > >  [<ffffffff810fca74>] vfs_write+0xa9/0x105
> > > >  [<ffffffff810fcb89>] sys_write+0x45/0x6c
> > > >  [<ffffffff8134c492>] system_call_fastpath+0x16/0x1b
> > > 
> > > It's waiting for IO completion, and holding an AG scan lock.
> > > 
> > > And IO completion requires a workqueue to run. Just FYI, this
> > > process of inode reclaim can dirty the filesystem, long after
> > > hibernate have assumed that it is clean due to the sys_sync() call
> > > you do after freezing the processes. I pointed out this flaw in
> > > using sync to write dirty data prior to hibernate a couple of years
> > > ago.
> > 
> > However, attempts to remove the sys_sync() from the hibernate code
> > were objected to by some developers, since they believe it will increase
> > the probability of data loss in case of a failing hibernation in general. 
> 
> I'm not suggesting it gets removed, I'm suggesting it gets replaced
> because it doesn't give the guarantees that you want or need.
> 
> > > Anyway, it's a good thing that XFS doesn't use freezable work
> > > queues, otherwise it would hang on every hibernate. Perhaps I should
> > > do that to force hibernate to do things properly in filesystems
> > > land.
> > 
> > Well, I'd say it's a very well known fact that filesystems are not
> > handled in any special way during hibernation, which is not a good
> > thing.  Nevertheless, I've never seen anyone from the filesystems land
> > pay any kind of attention to this issue.
> 
> I beg to differ. We went through this exact clas of bugs with swsusp
> back in 2006:
> 
> https://lkml.org/lkml/2006/11/12/144

You're right, sorry.  We discussed this 5 years ago, but the context
was a bit different (the hibernation code was using a different
mechanism for freeing memory).

> And this patch:
> 
> https://lkml.org/lkml/2006/11/1/155
> 
> ([PATCH -mm] swsusp: Freeze filesystems during suspend)
> 
> "This is needed by swsusp, because some filesystems (eg. XFS) use
> work queues and worker_threads run with PF_NOFREEZE set, so they can
> cause some writes to be performed after the suspend image has been
> created which may corrupt the filesystem.  The additional benefit of
> it is that if the resume fails, the filesystems will be in a
> consistent state and there won't be any journal replays needed."
> 
> --
> 
> And the patch essentially does:
> 
> -			sys_sync();
> +			freeze_filesystems();

Well, if you still think the patch does the right thing, I can rebase it
on top of the current freezer code and resubmit.

> But, Pavel didn't like freezing filesystems to quiesce them
> correctly, so the sys_sync() and all it's problems have remained
> until this day, where we still have users tripping over the same
> "filesystem not idle" problems.

The problem seems to be quite specific to XFS, though.

The Pavel's objection, if I remember it correctly, was that some
(or the majority of?) filesystems didn't implement the freezing operation,
so they would be more vulnerable to data loss in case of a failing hibernation
after this change.  However, that's better than actively causing pain to XFS
users.

> 
> [....]
> 
> > > IOWs, what hibernate does is:
> > > 
> > > 	freeze_processes()
> > > 	sys_sync()
> > > 	allocate a large amount of memory
> > > 
> > > Freezing the processes causes parts of filesystems to be put in the
> > > fridge, which means there is no guarantee that sys_sync() actually
> > > does what it is supposed to. As it is, sys_sync() really only
> > > guarantees file data is clean in memory - metadata does not need to
> > > be clean as long s it has been journalled and the journal is safe on
> > > disk.
> > >
> > > Further, allocating memory can cause memory reclaim to enter the
> > > filesystem and try to free memory held by the filesystem. In XFS (at
> > > least) this can cause the filesystem to issue tranactions and
> > > metadata IO to clean the dirty metadata to enable it to be
> > > reclaimed. So hibernate is effectively guaranteed to dirty the
> > > filesystem after it has frozen all the worker threads the filesystem
> > > might rely on.
> > > 
> > > Also, by this point kswapd has already been frozen, so hibernate is
> > > relying totally on direct memory reclaim to free up the memory it
> > > requires. I'm not sure that's a good idea.
> > > 
> > > IOWs, hibernate is still broken by design - and broken in exactly
> > > the way that was pointed out a couple of years ago by myself and
> > > others in the filesystem world: sys_sync() does not quiesce or
> > > guarantee a clean filesystem in memory after it completes.
> > > 
> > > There is a solution to this, and it already exists - it's called
> > > freezing the filesystem. Effectively hibernate needs to allocate
> > > memory before it freezes kernel/filesystem worker threads:
> > > 
> > > 	freeze_userspace_processes()
> > > 
> > > 	// just to clean the page cache quickly
> > > 	sys_sync()
> > > 
> > > 	// optionally to free page/inode/dentry caches:
> > > 		iterate_supers(drop_pagecache_sb, NULL);
> > > 		drop_slab()
> > > 
> > > 	allocate a large amount of memory
> > > 
> > > 	// Now quiesce the filesystems and clean remaining metadata
> > > 	iterate_supers(freeze_super, NULL);
> > > 
> > > 	freeze_remaining_processes()
> > > 
> > > This guarantees that filesystems are still working when memory
> > > reclaim comes along to free memory for the hibernate image, and that
> > > once it is allocated that filesystems will not be changed until
> > > thawed on the hibernate wakeup.
> > > 
> > > So, like I said a couple of years ago: fix hibernate to quiesce
> > > filesystems properly, and the hibernate will be much more reliable
> > > and robust and less likely to break randomly in the future.
> > 
> > Why don't you simply submit a patch to do that?
> 
> a) I don't know how to test suspend/hibernate
> b) I don't have any hardware I can test it on.
> c) I don't scale to solving every problem Linux has
> d) you guys need to decide how you're going to fix this because the
> problem has already been solved once before and it didn't get merged
> because nobody in the swsusp/hibernate world could agree on anything
> at the time.

OK, I'm not a filesystem expert in turn.

As I said, I can revive the Nigel's patch if you think it's better than
the code we have, but I'm afraid that's all I can do without any help from
the filesystems people.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-07-27  9:35         ` Rafael J. Wysocki
@ 2011-07-27 10:33           ` Christoph Hellwig
  -1 siblings, 0 replies; 68+ messages in thread
From: Christoph Hellwig @ 2011-07-27 10:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Christoph, xfs, Nigel Cunningham, Pavel Machek, Linux PM mailing list

On Wed, Jul 27, 2011 at 11:35:13AM +0200, Rafael J. Wysocki wrote:
> The Pavel's objection, if I remember it correctly, was that some
> (or the majority of?) filesystems didn't implement the freezing operation,
> so they would be more vulnerable to data loss in case of a failing hibernation
> after this change.  However, that's better than actively causing pain to XFS
> users.

The objection never made sense and only means he never read the code.
freeze_super (or freeze_bdev back then) always does a sync_filesystem
before even checking if we have a freeze method, and sync_filesystem is
what we iterate over for each superblock in sync().

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
@ 2011-07-27 10:33           ` Christoph Hellwig
  0 siblings, 0 replies; 68+ messages in thread
From: Christoph Hellwig @ 2011-07-27 10:33 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Christoph, Dave Chinner, xfs, Linux PM mailing list

On Wed, Jul 27, 2011 at 11:35:13AM +0200, Rafael J. Wysocki wrote:
> The Pavel's objection, if I remember it correctly, was that some
> (or the majority of?) filesystems didn't implement the freezing operation,
> so they would be more vulnerable to data loss in case of a failing hibernation
> after this change.  However, that's better than actively causing pain to XFS
> users.

The objection never made sense and only means he never read the code.
freeze_super (or freeze_bdev back then) always does a sync_filesystem
before even checking if we have a freeze method, and sync_filesystem is
what we iterate over for each superblock in sync().

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-07-27 10:33           ` Christoph Hellwig
@ 2011-07-27 12:22             ` Nigel Cunningham
  -1 siblings, 0 replies; 68+ messages in thread
From: Nigel Cunningham @ 2011-07-27 12:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph, xfs, Rafael J. Wysocki, Pavel Machek, Linux PM mailing list

Hi.

On 27/07/11 20:33, Christoph Hellwig wrote:
> On Wed, Jul 27, 2011 at 11:35:13AM +0200, Rafael J. Wysocki wrote:
>> The Pavel's objection, if I remember it correctly, was that some
>> (or the majority of?) filesystems didn't implement the freezing operation,
>> so they would be more vulnerable to data loss in case of a failing hibernation
>> after this change.  However, that's better than actively causing pain to XFS
>> users.
> 
> The objection never made sense and only means he never read the code.
> freeze_super (or freeze_bdev back then) always does a sync_filesystem
> before even checking if we have a freeze method, and sync_filesystem is
> what we iterate over for each superblock in sync().

I've had freezing supers in TOI for a couple of years now and it has
only ever helped. To be honest, if you have a ton of dirty pages, it
does result in a big delay, but that's the worst of it.

Regards,

Nigel
-- 
Evolution (n): A hypothetical process whereby improbable
events occur with alarming frequency, order arises from chaos, and
no one is given credit.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
@ 2011-07-27 12:22             ` Nigel Cunningham
  0 siblings, 0 replies; 68+ messages in thread
From: Nigel Cunningham @ 2011-07-27 12:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Christoph, Dave Chinner, xfs, Linux PM mailing list

Hi.

On 27/07/11 20:33, Christoph Hellwig wrote:
> On Wed, Jul 27, 2011 at 11:35:13AM +0200, Rafael J. Wysocki wrote:
>> The Pavel's objection, if I remember it correctly, was that some
>> (or the majority of?) filesystems didn't implement the freezing operation,
>> so they would be more vulnerable to data loss in case of a failing hibernation
>> after this change.  However, that's better than actively causing pain to XFS
>> users.
> 
> The objection never made sense and only means he never read the code.
> freeze_super (or freeze_bdev back then) always does a sync_filesystem
> before even checking if we have a freeze method, and sync_filesystem is
> what we iterate over for each superblock in sync().

I've had freezing supers in TOI for a couple of years now and it has
only ever helped. To be honest, if you have a ton of dirty pages, it
does result in a big delay, but that's the worst of it.

Regards,

Nigel
-- 
Evolution (n): A hypothetical process whereby improbable
events occur with alarming frequency, order arises from chaos, and
no one is given credit.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)
  2011-08-03 21:15               ` Rafael J. Wysocki
  (?)
  (?)
@ 2011-08-03 17:29               ` Pavel Machek
  2011-08-04  9:27                   ` Rafael J. Wysocki
  2011-08-04  9:27                 ` [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag) Rafael J. Wysocki
  -1 siblings, 2 replies; 68+ messages in thread
From: Pavel Machek @ 2011-08-03 17:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Nigel Cunningham, Christoph Hellwig, Dave Chinner, Christoph,
	Linux PM mailing list, xfs, LKML

Hi!

> Freeze all filesystems during the freezing of tasks by calling
> freeze_bdev() for each of them and thaw them during the thawing
> of tasks with the help of thaw_bdev().
> 
> This is needed by hibernation, because some filesystems (e.g. XFS)
> deadlock with the preallocation of memory used by it if the memory
> pressure caused by it is too heavy.
> 
> The additional benefit of this change is that, if something goes
> wrong after filesystems have been frozen, they will stay in a
> consistent state and journal replays won't be necessary (e.g. after
> a failing suspend or resume).  In particular, this should help to
> solve a long-standing issue that in some cases during resume from
> hibernation the boot loader causes the journal to be replied for the
> filesystem containing the kernel image and initrd causing it to
> become inconsistent with the information stored in the hibernation
> image.

> +/**
> + * freeze_filesystems - Force all filesystems into a consistent state.
> + */
> +void freeze_filesystems(void)
> +{
> +	struct super_block *sb;
> +
> +	lockdep_off();

Ouch. So... why do we need to silence this?

> +	/*
> +	 * Freeze in reverse order so filesystems dependant upon others are
> +	 * frozen in the right order (eg. loopback on ext3).
> +	 */
> +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> +		if (!sb->s_root || !sb->s_bdev ||
> +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> +		    (sb->s_flags & MS_RDONLY) ||
> +		    (sb->s_flags & MS_FROZEN))
> +			continue;

Should we stop NFS from modifying remote server, too?

Plus... ext3 writes to read-only filesystems on mount; not sure if it
does it later. But RDONLY means 'user cant write to it' not 'bdev will
not be modified'. Should we freeze all?

How can 'already frozen' happen?

> +	list_for_each_entry(sb, &super_blocks, s_list)
> +		if (sb->s_flags & MS_FROZEN) {
> +			sb->s_flags &= ~MS_FROZEN;
> +			thaw_bdev(sb->s_bdev, sb);
> +		}

...because we'll unfreeze it even if we did not freeze it...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)
  2011-08-03 21:15               ` Rafael J. Wysocki
  (?)
@ 2011-08-03 17:29               ` Pavel Machek
  -1 siblings, 0 replies; 68+ messages in thread
From: Pavel Machek @ 2011-08-03 17:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Christoph, Dave Chinner, LKML, xfs, Christoph Hellwig,
	Linux PM mailing list

Hi!

> Freeze all filesystems during the freezing of tasks by calling
> freeze_bdev() for each of them and thaw them during the thawing
> of tasks with the help of thaw_bdev().
> 
> This is needed by hibernation, because some filesystems (e.g. XFS)
> deadlock with the preallocation of memory used by it if the memory
> pressure caused by it is too heavy.
> 
> The additional benefit of this change is that, if something goes
> wrong after filesystems have been frozen, they will stay in a
> consistent state and journal replays won't be necessary (e.g. after
> a failing suspend or resume).  In particular, this should help to
> solve a long-standing issue that in some cases during resume from
> hibernation the boot loader causes the journal to be replied for the
> filesystem containing the kernel image and initrd causing it to
> become inconsistent with the information stored in the hibernation
> image.

> +/**
> + * freeze_filesystems - Force all filesystems into a consistent state.
> + */
> +void freeze_filesystems(void)
> +{
> +	struct super_block *sb;
> +
> +	lockdep_off();

Ouch. So... why do we need to silence this?

> +	/*
> +	 * Freeze in reverse order so filesystems dependant upon others are
> +	 * frozen in the right order (eg. loopback on ext3).
> +	 */
> +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> +		if (!sb->s_root || !sb->s_bdev ||
> +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> +		    (sb->s_flags & MS_RDONLY) ||
> +		    (sb->s_flags & MS_FROZEN))
> +			continue;

Should we stop NFS from modifying remote server, too?

Plus... ext3 writes to read-only filesystems on mount; not sure if it
does it later. But RDONLY means 'user cant write to it' not 'bdev will
not be modified'. Should we freeze all?

How can 'already frozen' happen?

> +	list_for_each_entry(sb, &super_blocks, s_list)
> +		if (sb->s_flags & MS_FROZEN) {
> +			sb->s_flags &= ~MS_FROZEN;
> +			thaw_bdev(sb->s_bdev, sb);
> +		}

...because we'll unfreeze it even if we did not freeze it...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)
  2011-07-27 12:22             ` Nigel Cunningham
@ 2011-08-03 21:15               ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-03 21:15 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Christoph Hellwig, Dave Chinner, Christoph,
	Linux PM mailing list, Pavel Machek, xfs, LKML

On Wednesday, July 27, 2011, Nigel Cunningham wrote:
> Hi.
> 
> On 27/07/11 20:33, Christoph Hellwig wrote:
> > On Wed, Jul 27, 2011 at 11:35:13AM +0200, Rafael J. Wysocki wrote:
> >> The Pavel's objection, if I remember it correctly, was that some
> >> (or the majority of?) filesystems didn't implement the freezing operation,
> >> so they would be more vulnerable to data loss in case of a failing hibernation
> >> after this change.  However, that's better than actively causing pain to XFS
> >> users.
> > 
> > The objection never made sense and only means he never read the code.
> > freeze_super (or freeze_bdev back then) always does a sync_filesystem
> > before even checking if we have a freeze method, and sync_filesystem is
> > what we iterate over for each superblock in sync().
> 
> I've had freezing supers in TOI for a couple of years now and it has
> only ever helped. To be honest, if you have a ton of dirty pages, it
> does result in a big delay, but that's the worst of it.

OK, so below is the revived patch.

To be precise, we don't call sys_sync() from the freezer an more
(evidently, I'd removed in myself, but later forgot about that), so
it only adds freeze_filesystems() and thaw_filesystems().

Comments welcome.

Thanks,
Rafael

---

Freeze all filesystems during the freezing of tasks by calling
freeze_bdev() for each of them and thaw them during the thawing
of tasks with the help of thaw_bdev().

This is needed by hibernation, because some filesystems (e.g. XFS)
deadlock with the preallocation of memory used by it if the memory
pressure caused by it is too heavy.

The additional benefit of this change is that, if something goes
wrong after filesystems have been frozen, they will stay in a
consistent state and journal replays won't be necessary (e.g. after
a failing suspend or resume).  In particular, this should help to
solve a long-standing issue that in some cases during resume from
hibernation the boot loader causes the journal to be replied for the
filesystem containing the kernel image and initrd causing it to
become inconsistent with the information stored in the hibernation
image.

This change is based on earlier work by Nigel Cunningham.

Not-really-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 fs/block_dev.c         |   43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     |    6 ++++++
 kernel/power/process.c |    7 ++++++-
 3 files changed, 55 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -211,6 +211,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_FROZEN	(1<<25) /* bdev has been frozen */
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
@@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
+extern void freeze_filesystems(void);
+extern void thaw_filesystems(void);
 #else
 static inline void bd_forget(struct inode *inode) {}
 static inline int sync_blockdev(struct block_device *bdev) { return 0; }
@@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
 {
 	return 0;
 }
+
+static inline void freeze_filesystems(void) {}
+static inline void thaw_filesystems(void) {}
 #endif
 extern int sync_filesystem(struct super_block *);
 extern const struct file_operations def_blk_fops;
Index: linux-2.6/fs/block_dev.c
===================================================================
--- linux-2.6.orig/fs/block_dev.c
+++ linux-2.6/fs/block_dev.c
@@ -314,6 +314,49 @@ out:
 }
 EXPORT_SYMBOL(thaw_bdev);
 
+/**
+ * freeze_filesystems - Force all filesystems into a consistent state.
+ */
+void freeze_filesystems(void)
+{
+	struct super_block *sb;
+
+	lockdep_off();
+	/*
+	 * Freeze in reverse order so filesystems dependant upon others are
+	 * frozen in the right order (eg. loopback on ext3).
+	 */
+	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		if (!sb->s_root || !sb->s_bdev ||
+		    (sb->s_frozen == SB_FREEZE_TRANS) ||
+		    (sb->s_flags & MS_RDONLY) ||
+		    (sb->s_flags & MS_FROZEN))
+			continue;
+
+		freeze_bdev(sb->s_bdev);
+		sb->s_flags |= MS_FROZEN;
+	}
+	lockdep_on();
+}
+
+/**
+ * thaw_filesystems - Make all filesystems active again.
+ */
+void thaw_filesystems(void)
+{
+	struct super_block *sb;
+
+	lockdep_off();
+
+	list_for_each_entry(sb, &super_blocks, s_list)
+		if (sb->s_flags & MS_FROZEN) {
+			sb->s_flags &= ~MS_FROZEN;
+			thaw_bdev(sb->s_bdev, sb);
+		}
+
+	lockdep_on();
+}
+
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 {
 	return block_write_full_page(page, blkdev_get_block, wbc);
Index: linux-2.6/kernel/power/process.c
===================================================================
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -12,10 +12,10 @@
 #include <linux/oom.h>
 #include <linux/suspend.h>
 #include <linux/module.h>
-#include <linux/syscalls.h>
 #include <linux/freezer.h>
 #include <linux/delay.h>
 #include <linux/workqueue.h>
+#include <linux/fs.h>
 
 /* 
  * Timeout for stopping processes
@@ -147,6 +147,10 @@ int freeze_processes(void)
 		goto Exit;
 	printk("done.\n");
 
+	pr_info("Freezing filesystems ... ");
+	freeze_filesystems();
+	pr_info("done.\n");
+
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
 	if (error)
@@ -188,6 +192,7 @@ void thaw_processes(void)
 	printk("Restarting tasks ... ");
 	thaw_workqueues();
 	thaw_tasks(true);
+	thaw_filesystems();
 	thaw_tasks(false);
 	schedule();
 	printk("done.\n");

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)
@ 2011-08-03 21:15               ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-03 21:15 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Christoph, LKML, xfs, Christoph Hellwig, Pavel Machek,
	Linux PM mailing list

On Wednesday, July 27, 2011, Nigel Cunningham wrote:
> Hi.
> 
> On 27/07/11 20:33, Christoph Hellwig wrote:
> > On Wed, Jul 27, 2011 at 11:35:13AM +0200, Rafael J. Wysocki wrote:
> >> The Pavel's objection, if I remember it correctly, was that some
> >> (or the majority of?) filesystems didn't implement the freezing operation,
> >> so they would be more vulnerable to data loss in case of a failing hibernation
> >> after this change.  However, that's better than actively causing pain to XFS
> >> users.
> > 
> > The objection never made sense and only means he never read the code.
> > freeze_super (or freeze_bdev back then) always does a sync_filesystem
> > before even checking if we have a freeze method, and sync_filesystem is
> > what we iterate over for each superblock in sync().
> 
> I've had freezing supers in TOI for a couple of years now and it has
> only ever helped. To be honest, if you have a ton of dirty pages, it
> does result in a big delay, but that's the worst of it.

OK, so below is the revived patch.

To be precise, we don't call sys_sync() from the freezer an more
(evidently, I'd removed in myself, but later forgot about that), so
it only adds freeze_filesystems() and thaw_filesystems().

Comments welcome.

Thanks,
Rafael

---

Freeze all filesystems during the freezing of tasks by calling
freeze_bdev() for each of them and thaw them during the thawing
of tasks with the help of thaw_bdev().

This is needed by hibernation, because some filesystems (e.g. XFS)
deadlock with the preallocation of memory used by it if the memory
pressure caused by it is too heavy.

The additional benefit of this change is that, if something goes
wrong after filesystems have been frozen, they will stay in a
consistent state and journal replays won't be necessary (e.g. after
a failing suspend or resume).  In particular, this should help to
solve a long-standing issue that in some cases during resume from
hibernation the boot loader causes the journal to be replied for the
filesystem containing the kernel image and initrd causing it to
become inconsistent with the information stored in the hibernation
image.

This change is based on earlier work by Nigel Cunningham.

Not-really-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 fs/block_dev.c         |   43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     |    6 ++++++
 kernel/power/process.c |    7 ++++++-
 3 files changed, 55 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -211,6 +211,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_FROZEN	(1<<25) /* bdev has been frozen */
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
@@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
+extern void freeze_filesystems(void);
+extern void thaw_filesystems(void);
 #else
 static inline void bd_forget(struct inode *inode) {}
 static inline int sync_blockdev(struct block_device *bdev) { return 0; }
@@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
 {
 	return 0;
 }
+
+static inline void freeze_filesystems(void) {}
+static inline void thaw_filesystems(void) {}
 #endif
 extern int sync_filesystem(struct super_block *);
 extern const struct file_operations def_blk_fops;
Index: linux-2.6/fs/block_dev.c
===================================================================
--- linux-2.6.orig/fs/block_dev.c
+++ linux-2.6/fs/block_dev.c
@@ -314,6 +314,49 @@ out:
 }
 EXPORT_SYMBOL(thaw_bdev);
 
+/**
+ * freeze_filesystems - Force all filesystems into a consistent state.
+ */
+void freeze_filesystems(void)
+{
+	struct super_block *sb;
+
+	lockdep_off();
+	/*
+	 * Freeze in reverse order so filesystems dependant upon others are
+	 * frozen in the right order (eg. loopback on ext3).
+	 */
+	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		if (!sb->s_root || !sb->s_bdev ||
+		    (sb->s_frozen == SB_FREEZE_TRANS) ||
+		    (sb->s_flags & MS_RDONLY) ||
+		    (sb->s_flags & MS_FROZEN))
+			continue;
+
+		freeze_bdev(sb->s_bdev);
+		sb->s_flags |= MS_FROZEN;
+	}
+	lockdep_on();
+}
+
+/**
+ * thaw_filesystems - Make all filesystems active again.
+ */
+void thaw_filesystems(void)
+{
+	struct super_block *sb;
+
+	lockdep_off();
+
+	list_for_each_entry(sb, &super_blocks, s_list)
+		if (sb->s_flags & MS_FROZEN) {
+			sb->s_flags &= ~MS_FROZEN;
+			thaw_bdev(sb->s_bdev, sb);
+		}
+
+	lockdep_on();
+}
+
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 {
 	return block_write_full_page(page, blkdev_get_block, wbc);
Index: linux-2.6/kernel/power/process.c
===================================================================
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -12,10 +12,10 @@
 #include <linux/oom.h>
 #include <linux/suspend.h>
 #include <linux/module.h>
-#include <linux/syscalls.h>
 #include <linux/freezer.h>
 #include <linux/delay.h>
 #include <linux/workqueue.h>
+#include <linux/fs.h>
 
 /* 
  * Timeout for stopping processes
@@ -147,6 +147,10 @@ int freeze_processes(void)
 		goto Exit;
 	printk("done.\n");
 
+	pr_info("Freezing filesystems ... ");
+	freeze_filesystems();
+	pr_info("done.\n");
+
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
 	if (error)
@@ -188,6 +192,7 @@ void thaw_processes(void)
 	printk("Restarting tasks ... ");
 	thaw_workqueues();
 	thaw_tasks(true);
+	thaw_filesystems();
 	thaw_tasks(false);
 	schedule();
 	printk("done.\n");

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)
  2011-07-27 12:22             ` Nigel Cunningham
  (?)
  (?)
@ 2011-08-03 21:15             ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-03 21:15 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Christoph, Dave Chinner, LKML, xfs, Christoph Hellwig,
	Linux PM mailing list

On Wednesday, July 27, 2011, Nigel Cunningham wrote:
> Hi.
> 
> On 27/07/11 20:33, Christoph Hellwig wrote:
> > On Wed, Jul 27, 2011 at 11:35:13AM +0200, Rafael J. Wysocki wrote:
> >> The Pavel's objection, if I remember it correctly, was that some
> >> (or the majority of?) filesystems didn't implement the freezing operation,
> >> so they would be more vulnerable to data loss in case of a failing hibernation
> >> after this change.  However, that's better than actively causing pain to XFS
> >> users.
> > 
> > The objection never made sense and only means he never read the code.
> > freeze_super (or freeze_bdev back then) always does a sync_filesystem
> > before even checking if we have a freeze method, and sync_filesystem is
> > what we iterate over for each superblock in sync().
> 
> I've had freezing supers in TOI for a couple of years now and it has
> only ever helped. To be honest, if you have a ton of dirty pages, it
> does result in a big delay, but that's the worst of it.

OK, so below is the revived patch.

To be precise, we don't call sys_sync() from the freezer an more
(evidently, I'd removed in myself, but later forgot about that), so
it only adds freeze_filesystems() and thaw_filesystems().

Comments welcome.

Thanks,
Rafael

---

Freeze all filesystems during the freezing of tasks by calling
freeze_bdev() for each of them and thaw them during the thawing
of tasks with the help of thaw_bdev().

This is needed by hibernation, because some filesystems (e.g. XFS)
deadlock with the preallocation of memory used by it if the memory
pressure caused by it is too heavy.

The additional benefit of this change is that, if something goes
wrong after filesystems have been frozen, they will stay in a
consistent state and journal replays won't be necessary (e.g. after
a failing suspend or resume).  In particular, this should help to
solve a long-standing issue that in some cases during resume from
hibernation the boot loader causes the journal to be replied for the
filesystem containing the kernel image and initrd causing it to
become inconsistent with the information stored in the hibernation
image.

This change is based on earlier work by Nigel Cunningham.

Not-really-signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 fs/block_dev.c         |   43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     |    6 ++++++
 kernel/power/process.c |    7 ++++++-
 3 files changed, 55 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -211,6 +211,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_FROZEN	(1<<25) /* bdev has been frozen */
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
@@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
+extern void freeze_filesystems(void);
+extern void thaw_filesystems(void);
 #else
 static inline void bd_forget(struct inode *inode) {}
 static inline int sync_blockdev(struct block_device *bdev) { return 0; }
@@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
 {
 	return 0;
 }
+
+static inline void freeze_filesystems(void) {}
+static inline void thaw_filesystems(void) {}
 #endif
 extern int sync_filesystem(struct super_block *);
 extern const struct file_operations def_blk_fops;
Index: linux-2.6/fs/block_dev.c
===================================================================
--- linux-2.6.orig/fs/block_dev.c
+++ linux-2.6/fs/block_dev.c
@@ -314,6 +314,49 @@ out:
 }
 EXPORT_SYMBOL(thaw_bdev);
 
+/**
+ * freeze_filesystems - Force all filesystems into a consistent state.
+ */
+void freeze_filesystems(void)
+{
+	struct super_block *sb;
+
+	lockdep_off();
+	/*
+	 * Freeze in reverse order so filesystems dependant upon others are
+	 * frozen in the right order (eg. loopback on ext3).
+	 */
+	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		if (!sb->s_root || !sb->s_bdev ||
+		    (sb->s_frozen == SB_FREEZE_TRANS) ||
+		    (sb->s_flags & MS_RDONLY) ||
+		    (sb->s_flags & MS_FROZEN))
+			continue;
+
+		freeze_bdev(sb->s_bdev);
+		sb->s_flags |= MS_FROZEN;
+	}
+	lockdep_on();
+}
+
+/**
+ * thaw_filesystems - Make all filesystems active again.
+ */
+void thaw_filesystems(void)
+{
+	struct super_block *sb;
+
+	lockdep_off();
+
+	list_for_each_entry(sb, &super_blocks, s_list)
+		if (sb->s_flags & MS_FROZEN) {
+			sb->s_flags &= ~MS_FROZEN;
+			thaw_bdev(sb->s_bdev, sb);
+		}
+
+	lockdep_on();
+}
+
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 {
 	return block_write_full_page(page, blkdev_get_block, wbc);
Index: linux-2.6/kernel/power/process.c
===================================================================
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -12,10 +12,10 @@
 #include <linux/oom.h>
 #include <linux/suspend.h>
 #include <linux/module.h>
-#include <linux/syscalls.h>
 #include <linux/freezer.h>
 #include <linux/delay.h>
 #include <linux/workqueue.h>
+#include <linux/fs.h>
 
 /* 
  * Timeout for stopping processes
@@ -147,6 +147,10 @@ int freeze_processes(void)
 		goto Exit;
 	printk("done.\n");
 
+	pr_info("Freezing filesystems ... ");
+	freeze_filesystems();
+	pr_info("done.\n");
+
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
 	if (error)
@@ -188,6 +192,7 @@ void thaw_processes(void)
 	printk("Restarting tasks ... ");
 	thaw_workqueues();
 	thaw_tasks(true);
+	thaw_filesystems();
 	thaw_tasks(false);
 	schedule();
 	printk("done.\n");

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)
  2011-08-03 17:29               ` Pavel Machek
@ 2011-08-04  9:27                   ` Rafael J. Wysocki
  2011-08-04  9:27                 ` [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag) Rafael J. Wysocki
  1 sibling, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-04  9:27 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Nigel Cunningham, Christoph Hellwig, Dave Chinner, Christoph,
	Linux PM mailing list, xfs, LKML

On Wednesday, August 03, 2011, Pavel Machek wrote:
> Hi!
> 
> > Freeze all filesystems during the freezing of tasks by calling
> > freeze_bdev() for each of them and thaw them during the thawing
> > of tasks with the help of thaw_bdev().
> > 
> > This is needed by hibernation, because some filesystems (e.g. XFS)
> > deadlock with the preallocation of memory used by it if the memory
> > pressure caused by it is too heavy.
> > 
> > The additional benefit of this change is that, if something goes
> > wrong after filesystems have been frozen, they will stay in a
> > consistent state and journal replays won't be necessary (e.g. after
> > a failing suspend or resume).  In particular, this should help to
> > solve a long-standing issue that in some cases during resume from
> > hibernation the boot loader causes the journal to be replied for the
> > filesystem containing the kernel image and initrd causing it to
> > become inconsistent with the information stored in the hibernation
> > image.
> 
> > +/**
> > + * freeze_filesystems - Force all filesystems into a consistent state.
> > + */
> > +void freeze_filesystems(void)
> > +{
> > +	struct super_block *sb;
> > +
> > +	lockdep_off();
> 
> Ouch. So... why do we need to silence this?

So that it doesn't complain? :-)

I'll need some time to get the exact details here.

> > +	/*
> > +	 * Freeze in reverse order so filesystems dependant upon others are
> > +	 * frozen in the right order (eg. loopback on ext3).
> > +	 */
> > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > +		if (!sb->s_root || !sb->s_bdev ||
> > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > +		    (sb->s_flags & MS_RDONLY) ||
> > +		    (sb->s_flags & MS_FROZEN))
> > +			continue;
> 
> Should we stop NFS from modifying remote server, too?

What do you mean exactly?

> Plus... ext3 writes to read-only filesystems on mount; not sure if it
> does it later. But RDONLY means 'user cant write to it' not 'bdev will
> not be modified'. Should we freeze all?
> 
> How can 'already frozen' happen?
> 
> > +	list_for_each_entry(sb, &super_blocks, s_list)
> > +		if (sb->s_flags & MS_FROZEN) {
> > +			sb->s_flags &= ~MS_FROZEN;
> > +			thaw_bdev(sb->s_bdev, sb);
> > +		}
> 
> ...because we'll unfreeze it even if we did not freeze it...

So we need not check MS_FROZEN in freeze_filesystems().  OK

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)
@ 2011-08-04  9:27                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-04  9:27 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Christoph, LKML, xfs, Christoph Hellwig, Nigel Cunningham,
	Linux PM mailing list

On Wednesday, August 03, 2011, Pavel Machek wrote:
> Hi!
> 
> > Freeze all filesystems during the freezing of tasks by calling
> > freeze_bdev() for each of them and thaw them during the thawing
> > of tasks with the help of thaw_bdev().
> > 
> > This is needed by hibernation, because some filesystems (e.g. XFS)
> > deadlock with the preallocation of memory used by it if the memory
> > pressure caused by it is too heavy.
> > 
> > The additional benefit of this change is that, if something goes
> > wrong after filesystems have been frozen, they will stay in a
> > consistent state and journal replays won't be necessary (e.g. after
> > a failing suspend or resume).  In particular, this should help to
> > solve a long-standing issue that in some cases during resume from
> > hibernation the boot loader causes the journal to be replied for the
> > filesystem containing the kernel image and initrd causing it to
> > become inconsistent with the information stored in the hibernation
> > image.
> 
> > +/**
> > + * freeze_filesystems - Force all filesystems into a consistent state.
> > + */
> > +void freeze_filesystems(void)
> > +{
> > +	struct super_block *sb;
> > +
> > +	lockdep_off();
> 
> Ouch. So... why do we need to silence this?

So that it doesn't complain? :-)

I'll need some time to get the exact details here.

> > +	/*
> > +	 * Freeze in reverse order so filesystems dependant upon others are
> > +	 * frozen in the right order (eg. loopback on ext3).
> > +	 */
> > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > +		if (!sb->s_root || !sb->s_bdev ||
> > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > +		    (sb->s_flags & MS_RDONLY) ||
> > +		    (sb->s_flags & MS_FROZEN))
> > +			continue;
> 
> Should we stop NFS from modifying remote server, too?

What do you mean exactly?

> Plus... ext3 writes to read-only filesystems on mount; not sure if it
> does it later. But RDONLY means 'user cant write to it' not 'bdev will
> not be modified'. Should we freeze all?
> 
> How can 'already frozen' happen?
> 
> > +	list_for_each_entry(sb, &super_blocks, s_list)
> > +		if (sb->s_flags & MS_FROZEN) {
> > +			sb->s_flags &= ~MS_FROZEN;
> > +			thaw_bdev(sb->s_bdev, sb);
> > +		}
> 
> ...because we'll unfreeze it even if we did not freeze it...

So we need not check MS_FROZEN in freeze_filesystems().  OK

Thanks,
Rafael

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)
  2011-08-03 17:29               ` Pavel Machek
  2011-08-04  9:27                   ` Rafael J. Wysocki
@ 2011-08-04  9:27                 ` Rafael J. Wysocki
  1 sibling, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-04  9:27 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Christoph, Dave Chinner, LKML, xfs, Christoph Hellwig,
	Linux PM mailing list

On Wednesday, August 03, 2011, Pavel Machek wrote:
> Hi!
> 
> > Freeze all filesystems during the freezing of tasks by calling
> > freeze_bdev() for each of them and thaw them during the thawing
> > of tasks with the help of thaw_bdev().
> > 
> > This is needed by hibernation, because some filesystems (e.g. XFS)
> > deadlock with the preallocation of memory used by it if the memory
> > pressure caused by it is too heavy.
> > 
> > The additional benefit of this change is that, if something goes
> > wrong after filesystems have been frozen, they will stay in a
> > consistent state and journal replays won't be necessary (e.g. after
> > a failing suspend or resume).  In particular, this should help to
> > solve a long-standing issue that in some cases during resume from
> > hibernation the boot loader causes the journal to be replied for the
> > filesystem containing the kernel image and initrd causing it to
> > become inconsistent with the information stored in the hibernation
> > image.
> 
> > +/**
> > + * freeze_filesystems - Force all filesystems into a consistent state.
> > + */
> > +void freeze_filesystems(void)
> > +{
> > +	struct super_block *sb;
> > +
> > +	lockdep_off();
> 
> Ouch. So... why do we need to silence this?

So that it doesn't complain? :-)

I'll need some time to get the exact details here.

> > +	/*
> > +	 * Freeze in reverse order so filesystems dependant upon others are
> > +	 * frozen in the right order (eg. loopback on ext3).
> > +	 */
> > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > +		if (!sb->s_root || !sb->s_bdev ||
> > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > +		    (sb->s_flags & MS_RDONLY) ||
> > +		    (sb->s_flags & MS_FROZEN))
> > +			continue;
> 
> Should we stop NFS from modifying remote server, too?

What do you mean exactly?

> Plus... ext3 writes to read-only filesystems on mount; not sure if it
> does it later. But RDONLY means 'user cant write to it' not 'bdev will
> not be modified'. Should we freeze all?
> 
> How can 'already frozen' happen?
> 
> > +	list_for_each_entry(sb, &super_blocks, s_list)
> > +		if (sb->s_flags & MS_FROZEN) {
> > +			sb->s_flags &= ~MS_FROZEN;
> > +			thaw_bdev(sb->s_bdev, sb);
> > +		}
> 
> ...because we'll unfreeze it even if we did not freeze it...

So we need not check MS_FROZEN in freeze_filesystems().  OK

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)
  2011-08-04  9:27                   ` Rafael J. Wysocki
@ 2011-08-04 22:25                     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-04 22:25 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Nigel Cunningham, Christoph Hellwig, Dave Chinner, Christoph,
	Linux PM mailing list, xfs, LKML, linux-ext4, Theodore Ts'o

On Thursday, August 04, 2011, Rafael J. Wysocki wrote:
> On Wednesday, August 03, 2011, Pavel Machek wrote:
> > Hi!
> > 
> > > Freeze all filesystems during the freezing of tasks by calling
> > > freeze_bdev() for each of them and thaw them during the thawing
> > > of tasks with the help of thaw_bdev().
> > > 
> > > This is needed by hibernation, because some filesystems (e.g. XFS)
> > > deadlock with the preallocation of memory used by it if the memory
> > > pressure caused by it is too heavy.
> > > 
> > > The additional benefit of this change is that, if something goes
> > > wrong after filesystems have been frozen, they will stay in a
> > > consistent state and journal replays won't be necessary (e.g. after
> > > a failing suspend or resume).  In particular, this should help to
> > > solve a long-standing issue that in some cases during resume from
> > > hibernation the boot loader causes the journal to be replied for the
> > > filesystem containing the kernel image and initrd causing it to
> > > become inconsistent with the information stored in the hibernation
> > > image.
> > 
> > > +/**
> > > + * freeze_filesystems - Force all filesystems into a consistent state.
> > > + */
> > > +void freeze_filesystems(void)
> > > +{
> > > +	struct super_block *sb;
> > > +
> > > +	lockdep_off();
> > 
> > Ouch. So... why do we need to silence this?
> 
> So that it doesn't complain? :-)
> 
> I'll need some time to get the exact details here.

So, this is because ext3_freeze() that doesn't call
journal_unlock_updates() on success, which quite frankly looks like
a bug in ext3 to me.  At least that's different from what ext4 does
in exactly the same situation (which looks correct).

If ext3_freeze() called journal_unlock_updates() on success too and
the call to journal_unlock_updates() is removed from ext3_unfreeze(),
we wouldn't need that lockdep_off()/lockdep_on() around the loop.

I need someone with ext3/ext4 knowledge to comment here, though.

Moreover, I'm not sure if other filesystems don't do such things.

Anyway, this is just a false-positive, even with the ext3 code as is.

> > > +	/*
> > > +	 * Freeze in reverse order so filesystems dependant upon others are
> > > +	 * frozen in the right order (eg. loopback on ext3).
> > > +	 */
> > > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > > +		if (!sb->s_root || !sb->s_bdev ||
> > > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > > +		    (sb->s_flags & MS_RDONLY) ||
> > > +		    (sb->s_flags & MS_FROZEN))
> > > +			continue;
> > 
> > Should we stop NFS from modifying remote server, too?
> 
> What do you mean exactly?
> 
> > Plus... ext3 writes to read-only filesystems on mount; not sure if it
> > does it later. But RDONLY means 'user cant write to it' not 'bdev will
> > not be modified'. Should we freeze all?
> > 
> > How can 'already frozen' happen?
> > 
> > > +	list_for_each_entry(sb, &super_blocks, s_list)
> > > +		if (sb->s_flags & MS_FROZEN) {
> > > +			sb->s_flags &= ~MS_FROZEN;
> > > +			thaw_bdev(sb->s_bdev, sb);
> > > +		}
> > 
> > ...because we'll unfreeze it even if we did not freeze it...
> 
> So we need not check MS_FROZEN in freeze_filesystems().  OK

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)
@ 2011-08-04 22:25                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-04 22:25 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Linux PM mailing list, linux-ext4

On Thursday, August 04, 2011, Rafael J. Wysocki wrote:
> On Wednesday, August 03, 2011, Pavel Machek wrote:
> > Hi!
> > 
> > > Freeze all filesystems during the freezing of tasks by calling
> > > freeze_bdev() for each of them and thaw them during the thawing
> > > of tasks with the help of thaw_bdev().
> > > 
> > > This is needed by hibernation, because some filesystems (e.g. XFS)
> > > deadlock with the preallocation of memory used by it if the memory
> > > pressure caused by it is too heavy.
> > > 
> > > The additional benefit of this change is that, if something goes
> > > wrong after filesystems have been frozen, they will stay in a
> > > consistent state and journal replays won't be necessary (e.g. after
> > > a failing suspend or resume).  In particular, this should help to
> > > solve a long-standing issue that in some cases during resume from
> > > hibernation the boot loader causes the journal to be replied for the
> > > filesystem containing the kernel image and initrd causing it to
> > > become inconsistent with the information stored in the hibernation
> > > image.
> > 
> > > +/**
> > > + * freeze_filesystems - Force all filesystems into a consistent state.
> > > + */
> > > +void freeze_filesystems(void)
> > > +{
> > > +	struct super_block *sb;
> > > +
> > > +	lockdep_off();
> > 
> > Ouch. So... why do we need to silence this?
> 
> So that it doesn't complain? :-)
> 
> I'll need some time to get the exact details here.

So, this is because ext3_freeze() that doesn't call
journal_unlock_updates() on success, which quite frankly looks like
a bug in ext3 to me.  At least that's different from what ext4 does
in exactly the same situation (which looks correct).

If ext3_freeze() called journal_unlock_updates() on success too and
the call to journal_unlock_updates() is removed from ext3_unfreeze(),
we wouldn't need that lockdep_off()/lockdep_on() around the loop.

I need someone with ext3/ext4 knowledge to comment here, though.

Moreover, I'm not sure if other filesystems don't do such things.

Anyway, this is just a false-positive, even with the ext3 code as is.

> > > +	/*
> > > +	 * Freeze in reverse order so filesystems dependant upon others are
> > > +	 * frozen in the right order (eg. loopback on ext3).
> > > +	 */
> > > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > > +		if (!sb->s_root || !sb->s_bdev ||
> > > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > > +		    (sb->s_flags & MS_RDONLY) ||
> > > +		    (sb->s_flags & MS_FROZEN))
> > > +			continue;
> > 
> > Should we stop NFS from modifying remote server, too?
> 
> What do you mean exactly?
> 
> > Plus... ext3 writes to read-only filesystems on mount; not sure if it
> > does it later. But RDONLY means 'user cant write to it' not 'bdev will
> > not be modified'. Should we freeze all?
> > 
> > How can 'already frozen' happen?
> > 
> > > +	list_for_each_entry(sb, &super_blocks, s_list)
> > > +		if (sb->s_flags & MS_FROZEN) {
> > > +			sb->s_flags &= ~MS_FROZEN;
> > > +			thaw_bdev(sb->s_bdev, sb);
> > > +		}
> > 
> > ...because we'll unfreeze it even if we did not freeze it...
> 
> So we need not check MS_FROZEN in freeze_filesystems().  OK

Thanks,
Rafael

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag)
  2011-08-04  9:27                   ` Rafael J. Wysocki
  (?)
@ 2011-08-04 22:25                   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-04 22:25 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Christoph, Theodore Ts'o, Dave Chinner, LKML, xfs,
	Christoph Hellwig, Linux PM mailing list, linux-ext4

On Thursday, August 04, 2011, Rafael J. Wysocki wrote:
> On Wednesday, August 03, 2011, Pavel Machek wrote:
> > Hi!
> > 
> > > Freeze all filesystems during the freezing of tasks by calling
> > > freeze_bdev() for each of them and thaw them during the thawing
> > > of tasks with the help of thaw_bdev().
> > > 
> > > This is needed by hibernation, because some filesystems (e.g. XFS)
> > > deadlock with the preallocation of memory used by it if the memory
> > > pressure caused by it is too heavy.
> > > 
> > > The additional benefit of this change is that, if something goes
> > > wrong after filesystems have been frozen, they will stay in a
> > > consistent state and journal replays won't be necessary (e.g. after
> > > a failing suspend or resume).  In particular, this should help to
> > > solve a long-standing issue that in some cases during resume from
> > > hibernation the boot loader causes the journal to be replied for the
> > > filesystem containing the kernel image and initrd causing it to
> > > become inconsistent with the information stored in the hibernation
> > > image.
> > 
> > > +/**
> > > + * freeze_filesystems - Force all filesystems into a consistent state.
> > > + */
> > > +void freeze_filesystems(void)
> > > +{
> > > +	struct super_block *sb;
> > > +
> > > +	lockdep_off();
> > 
> > Ouch. So... why do we need to silence this?
> 
> So that it doesn't complain? :-)
> 
> I'll need some time to get the exact details here.

So, this is because ext3_freeze() that doesn't call
journal_unlock_updates() on success, which quite frankly looks like
a bug in ext3 to me.  At least that's different from what ext4 does
in exactly the same situation (which looks correct).

If ext3_freeze() called journal_unlock_updates() on success too and
the call to journal_unlock_updates() is removed from ext3_unfreeze(),
we wouldn't need that lockdep_off()/lockdep_on() around the loop.

I need someone with ext3/ext4 knowledge to comment here, though.

Moreover, I'm not sure if other filesystems don't do such things.

Anyway, this is just a false-positive, even with the ext3 code as is.

> > > +	/*
> > > +	 * Freeze in reverse order so filesystems dependant upon others are
> > > +	 * frozen in the right order (eg. loopback on ext3).
> > > +	 */
> > > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > > +		if (!sb->s_root || !sb->s_bdev ||
> > > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > > +		    (sb->s_flags & MS_RDONLY) ||
> > > +		    (sb->s_flags & MS_FROZEN))
> > > +			continue;
> > 
> > Should we stop NFS from modifying remote server, too?
> 
> What do you mean exactly?
> 
> > Plus... ext3 writes to read-only filesystems on mount; not sure if it
> > does it later. But RDONLY means 'user cant write to it' not 'bdev will
> > not be modified'. Should we freeze all?
> > 
> > How can 'already frozen' happen?
> > 
> > > +	list_for_each_entry(sb, &super_blocks, s_list)
> > > +		if (sb->s_flags & MS_FROZEN) {
> > > +			sb->s_flags &= ~MS_FROZEN;
> > > +			thaw_bdev(sb->s_bdev, sb);
> > > +		}
> > 
> > ...because we'll unfreeze it even if we did not freeze it...
> 
> So we need not check MS_FROZEN in freeze_filesystems().  OK

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-08-04 22:25                     ` Rafael J. Wysocki
@ 2011-08-06 21:17                       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-06 21:17 UTC (permalink / raw)
  To: Linux PM mailing list
  Cc: Pavel Machek, Nigel Cunningham, Christoph Hellwig, Dave Chinner,
	Christoph, xfs, LKML, linux-ext4, Theodore Ts'o,
	linux-fsdevel

From: Rafael J. Wysocki <rjw@sisk.pl>

Freeze all filesystems during the freezing of tasks by calling
freeze_bdev() for each of them and thaw them during the thawing
of tasks with the help of thaw_bdev().

This is needed by hibernation, because some filesystems (e.g. XFS)
deadlock with the preallocation of memory used by it if the memory
pressure caused by it is too heavy.

The additional benefit of this change is that, if something goes
wrong after filesystems have been frozen, they will stay in a
consistent state and journal replays won't be necessary (e.g. after
a failing suspend or resume).  In particular, this should help to
solve a long-standing issue that in some cases during resume from
hibernation the boot loader causes the journal to be replied for the
filesystem containing the kernel image and initrd causing it to
become inconsistent with the information stored in the hibernation
image.

This change is based on earlier work by Nigel Cunningham.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---

OK, so nobody except for Pavel appears to have any comments, so I assume
that everyone except for Pavel is fine with the approach, interestingly enough.

I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
and added comments explaining why lockdep_off/on() are used.

Thanks,
Rafael

---
 fs/block_dev.c         |   56 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     |    6 +++++
 kernel/power/process.c |    7 +++++-
 3 files changed, 68 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -211,6 +211,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_FROZEN	(1<<25) /* bdev has been frozen */
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
@@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
+extern void freeze_filesystems(void);
+extern void thaw_filesystems(void);
 #else
 static inline void bd_forget(struct inode *inode) {}
 static inline int sync_blockdev(struct block_device *bdev) { return 0; }
@@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
 {
 	return 0;
 }
+
+static inline void freeze_filesystems(void) {}
+static inline void thaw_filesystems(void) {}
 #endif
 extern int sync_filesystem(struct super_block *);
 extern const struct file_operations def_blk_fops;
Index: linux-2.6/fs/block_dev.c
===================================================================
--- linux-2.6.orig/fs/block_dev.c
+++ linux-2.6/fs/block_dev.c
@@ -314,6 +314,62 @@ out:
 }
 EXPORT_SYMBOL(thaw_bdev);
 
+/**
+ * freeze_filesystems - Force all filesystems into a consistent state.
+ */
+void freeze_filesystems(void)
+{
+	struct super_block *sb;
+
+	/*
+	 * This is necessary, because some filesystems (e.g. ext3) lock
+	 * mutexes in their .freeze_fs() callbacks and leave them locked for
+	 * their .unfreeze_fs() callbacks to unlock.  This is done under
+	 * bdev->bd_fsfreeze_mutex, which is then released, but it makes
+	 * lockdep think something may be wrong when freeze_bdev() attempts
+	 * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
+	 */
+	lockdep_off();
+
+	/*
+	 * Freeze in reverse order so filesystems depending on others are
+	 * frozen in the right order (eg. loopback on ext3).
+	 */
+	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		if (!sb->s_root || !sb->s_bdev ||
+		    (sb->s_frozen == SB_FREEZE_TRANS) ||
+		    (sb->s_flags & MS_RDONLY))
+			continue;
+
+		freeze_bdev(sb->s_bdev);
+		sb->s_flags |= MS_FROZEN;
+	}
+
+	lockdep_on();
+}
+
+/**
+ * thaw_filesystems - Make all filesystems active again.
+ */
+void thaw_filesystems(void)
+{
+	struct super_block *sb;
+
+	/*
+	 * This is necessary for the same reason as in freeze_filesystems()
+	 * above.
+	 */
+	lockdep_off();
+
+	list_for_each_entry(sb, &super_blocks, s_list)
+		if (sb->s_flags & MS_FROZEN) {
+			sb->s_flags &= ~MS_FROZEN;
+			thaw_bdev(sb->s_bdev, sb);
+		}
+
+	lockdep_on();
+}
+
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 {
 	return block_write_full_page(page, blkdev_get_block, wbc);
Index: linux-2.6/kernel/power/process.c
===================================================================
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -12,10 +12,10 @@
 #include <linux/oom.h>
 #include <linux/suspend.h>
 #include <linux/module.h>
-#include <linux/syscalls.h>
 #include <linux/freezer.h>
 #include <linux/delay.h>
 #include <linux/workqueue.h>
+#include <linux/fs.h>
 
 /* 
  * Timeout for stopping processes
@@ -147,6 +147,10 @@ int freeze_processes(void)
 		goto Exit;
 	printk("done.\n");
 
+	pr_info("Freezing filesystems ... ");
+	freeze_filesystems();
+	pr_info("done.\n");
+
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
 	if (error)
@@ -188,6 +192,7 @@ void thaw_processes(void)
 	printk("Restarting tasks ... ");
 	thaw_workqueues();
 	thaw_tasks(true);
+	thaw_filesystems();
 	thaw_tasks(false);
 	schedule();
 	printk("done.\n");

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-08-06 21:17                       ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-06 21:17 UTC (permalink / raw)
  To: Linux PM mailing list
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel, linux-ext4

From: Rafael J. Wysocki <rjw@sisk.pl>

Freeze all filesystems during the freezing of tasks by calling
freeze_bdev() for each of them and thaw them during the thawing
of tasks with the help of thaw_bdev().

This is needed by hibernation, because some filesystems (e.g. XFS)
deadlock with the preallocation of memory used by it if the memory
pressure caused by it is too heavy.

The additional benefit of this change is that, if something goes
wrong after filesystems have been frozen, they will stay in a
consistent state and journal replays won't be necessary (e.g. after
a failing suspend or resume).  In particular, this should help to
solve a long-standing issue that in some cases during resume from
hibernation the boot loader causes the journal to be replied for the
filesystem containing the kernel image and initrd causing it to
become inconsistent with the information stored in the hibernation
image.

This change is based on earlier work by Nigel Cunningham.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---

OK, so nobody except for Pavel appears to have any comments, so I assume
that everyone except for Pavel is fine with the approach, interestingly enough.

I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
and added comments explaining why lockdep_off/on() are used.

Thanks,
Rafael

---
 fs/block_dev.c         |   56 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     |    6 +++++
 kernel/power/process.c |    7 +++++-
 3 files changed, 68 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -211,6 +211,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_FROZEN	(1<<25) /* bdev has been frozen */
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
@@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
+extern void freeze_filesystems(void);
+extern void thaw_filesystems(void);
 #else
 static inline void bd_forget(struct inode *inode) {}
 static inline int sync_blockdev(struct block_device *bdev) { return 0; }
@@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
 {
 	return 0;
 }
+
+static inline void freeze_filesystems(void) {}
+static inline void thaw_filesystems(void) {}
 #endif
 extern int sync_filesystem(struct super_block *);
 extern const struct file_operations def_blk_fops;
Index: linux-2.6/fs/block_dev.c
===================================================================
--- linux-2.6.orig/fs/block_dev.c
+++ linux-2.6/fs/block_dev.c
@@ -314,6 +314,62 @@ out:
 }
 EXPORT_SYMBOL(thaw_bdev);
 
+/**
+ * freeze_filesystems - Force all filesystems into a consistent state.
+ */
+void freeze_filesystems(void)
+{
+	struct super_block *sb;
+
+	/*
+	 * This is necessary, because some filesystems (e.g. ext3) lock
+	 * mutexes in their .freeze_fs() callbacks and leave them locked for
+	 * their .unfreeze_fs() callbacks to unlock.  This is done under
+	 * bdev->bd_fsfreeze_mutex, which is then released, but it makes
+	 * lockdep think something may be wrong when freeze_bdev() attempts
+	 * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
+	 */
+	lockdep_off();
+
+	/*
+	 * Freeze in reverse order so filesystems depending on others are
+	 * frozen in the right order (eg. loopback on ext3).
+	 */
+	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		if (!sb->s_root || !sb->s_bdev ||
+		    (sb->s_frozen == SB_FREEZE_TRANS) ||
+		    (sb->s_flags & MS_RDONLY))
+			continue;
+
+		freeze_bdev(sb->s_bdev);
+		sb->s_flags |= MS_FROZEN;
+	}
+
+	lockdep_on();
+}
+
+/**
+ * thaw_filesystems - Make all filesystems active again.
+ */
+void thaw_filesystems(void)
+{
+	struct super_block *sb;
+
+	/*
+	 * This is necessary for the same reason as in freeze_filesystems()
+	 * above.
+	 */
+	lockdep_off();
+
+	list_for_each_entry(sb, &super_blocks, s_list)
+		if (sb->s_flags & MS_FROZEN) {
+			sb->s_flags &= ~MS_FROZEN;
+			thaw_bdev(sb->s_bdev, sb);
+		}
+
+	lockdep_on();
+}
+
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 {
 	return block_write_full_page(page, blkdev_get_block, wbc);
Index: linux-2.6/kernel/power/process.c
===================================================================
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -12,10 +12,10 @@
 #include <linux/oom.h>
 #include <linux/suspend.h>
 #include <linux/module.h>
-#include <linux/syscalls.h>
 #include <linux/freezer.h>
 #include <linux/delay.h>
 #include <linux/workqueue.h>
+#include <linux/fs.h>
 
 /* 
  * Timeout for stopping processes
@@ -147,6 +147,10 @@ int freeze_processes(void)
 		goto Exit;
 	printk("done.\n");
 
+	pr_info("Freezing filesystems ... ");
+	freeze_filesystems();
+	pr_info("done.\n");
+
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
 	if (error)
@@ -188,6 +192,7 @@ void thaw_processes(void)
 	printk("Restarting tasks ... ");
 	thaw_workqueues();
 	thaw_tasks(true);
+	thaw_filesystems();
 	thaw_tasks(false);
 	schedule();
 	printk("done.\n");

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-08-04 22:25                     ` Rafael J. Wysocki
  (?)
  (?)
@ 2011-08-06 21:17                     ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-06 21:17 UTC (permalink / raw)
  To: Linux PM mailing list
  Cc: Christoph, Theodore Ts'o, Dave Chinner, LKML, xfs,
	Christoph Hellwig, linux-fsdevel, linux-ext4

From: Rafael J. Wysocki <rjw@sisk.pl>

Freeze all filesystems during the freezing of tasks by calling
freeze_bdev() for each of them and thaw them during the thawing
of tasks with the help of thaw_bdev().

This is needed by hibernation, because some filesystems (e.g. XFS)
deadlock with the preallocation of memory used by it if the memory
pressure caused by it is too heavy.

The additional benefit of this change is that, if something goes
wrong after filesystems have been frozen, they will stay in a
consistent state and journal replays won't be necessary (e.g. after
a failing suspend or resume).  In particular, this should help to
solve a long-standing issue that in some cases during resume from
hibernation the boot loader causes the journal to be replied for the
filesystem containing the kernel image and initrd causing it to
become inconsistent with the information stored in the hibernation
image.

This change is based on earlier work by Nigel Cunningham.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---

OK, so nobody except for Pavel appears to have any comments, so I assume
that everyone except for Pavel is fine with the approach, interestingly enough.

I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
and added comments explaining why lockdep_off/on() are used.

Thanks,
Rafael

---
 fs/block_dev.c         |   56 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     |    6 +++++
 kernel/power/process.c |    7 +++++-
 3 files changed, 68 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -211,6 +211,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_FROZEN	(1<<25) /* bdev has been frozen */
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
@@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
 extern void emergency_thaw_all(void);
 extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
 extern int fsync_bdev(struct block_device *);
+extern void freeze_filesystems(void);
+extern void thaw_filesystems(void);
 #else
 static inline void bd_forget(struct inode *inode) {}
 static inline int sync_blockdev(struct block_device *bdev) { return 0; }
@@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
 {
 	return 0;
 }
+
+static inline void freeze_filesystems(void) {}
+static inline void thaw_filesystems(void) {}
 #endif
 extern int sync_filesystem(struct super_block *);
 extern const struct file_operations def_blk_fops;
Index: linux-2.6/fs/block_dev.c
===================================================================
--- linux-2.6.orig/fs/block_dev.c
+++ linux-2.6/fs/block_dev.c
@@ -314,6 +314,62 @@ out:
 }
 EXPORT_SYMBOL(thaw_bdev);
 
+/**
+ * freeze_filesystems - Force all filesystems into a consistent state.
+ */
+void freeze_filesystems(void)
+{
+	struct super_block *sb;
+
+	/*
+	 * This is necessary, because some filesystems (e.g. ext3) lock
+	 * mutexes in their .freeze_fs() callbacks and leave them locked for
+	 * their .unfreeze_fs() callbacks to unlock.  This is done under
+	 * bdev->bd_fsfreeze_mutex, which is then released, but it makes
+	 * lockdep think something may be wrong when freeze_bdev() attempts
+	 * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
+	 */
+	lockdep_off();
+
+	/*
+	 * Freeze in reverse order so filesystems depending on others are
+	 * frozen in the right order (eg. loopback on ext3).
+	 */
+	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		if (!sb->s_root || !sb->s_bdev ||
+		    (sb->s_frozen == SB_FREEZE_TRANS) ||
+		    (sb->s_flags & MS_RDONLY))
+			continue;
+
+		freeze_bdev(sb->s_bdev);
+		sb->s_flags |= MS_FROZEN;
+	}
+
+	lockdep_on();
+}
+
+/**
+ * thaw_filesystems - Make all filesystems active again.
+ */
+void thaw_filesystems(void)
+{
+	struct super_block *sb;
+
+	/*
+	 * This is necessary for the same reason as in freeze_filesystems()
+	 * above.
+	 */
+	lockdep_off();
+
+	list_for_each_entry(sb, &super_blocks, s_list)
+		if (sb->s_flags & MS_FROZEN) {
+			sb->s_flags &= ~MS_FROZEN;
+			thaw_bdev(sb->s_bdev, sb);
+		}
+
+	lockdep_on();
+}
+
 static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
 {
 	return block_write_full_page(page, blkdev_get_block, wbc);
Index: linux-2.6/kernel/power/process.c
===================================================================
--- linux-2.6.orig/kernel/power/process.c
+++ linux-2.6/kernel/power/process.c
@@ -12,10 +12,10 @@
 #include <linux/oom.h>
 #include <linux/suspend.h>
 #include <linux/module.h>
-#include <linux/syscalls.h>
 #include <linux/freezer.h>
 #include <linux/delay.h>
 #include <linux/workqueue.h>
+#include <linux/fs.h>
 
 /* 
  * Timeout for stopping processes
@@ -147,6 +147,10 @@ int freeze_processes(void)
 		goto Exit;
 	printk("done.\n");
 
+	pr_info("Freezing filesystems ... ");
+	freeze_filesystems();
+	pr_info("done.\n");
+
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
 	if (error)
@@ -188,6 +192,7 @@ void thaw_processes(void)
 	printk("Restarting tasks ... ");
 	thaw_workqueues();
 	thaw_tasks(true);
+	thaw_filesystems();
 	thaw_tasks(false);
 	schedule();
 	printk("done.\n");

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-08-06 21:17                       ` Rafael J. Wysocki
@ 2011-08-07  0:14                         ` Dave Chinner
  -1 siblings, 0 replies; 68+ messages in thread
From: Dave Chinner @ 2011-08-07  0:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM mailing list, Pavel Machek, Nigel Cunningham,
	Christoph Hellwig, Christoph, xfs, LKML, linux-ext4,
	Theodore Ts'o, linux-fsdevel

On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Freeze all filesystems during the freezing of tasks by calling
> freeze_bdev() for each of them and thaw them during the thawing
> of tasks with the help of thaw_bdev().
> 
> This is needed by hibernation, because some filesystems (e.g. XFS)
> deadlock with the preallocation of memory used by it if the memory
> pressure caused by it is too heavy.
> 
> The additional benefit of this change is that, if something goes
> wrong after filesystems have been frozen, they will stay in a
> consistent state and journal replays won't be necessary (e.g. after
> a failing suspend or resume).  In particular, this should help to
> solve a long-standing issue that in some cases during resume from
> hibernation the boot loader causes the journal to be replied for the
> filesystem containing the kernel image and initrd causing it to
> become inconsistent with the information stored in the hibernation
> image.
> 
> This change is based on earlier work by Nigel Cunningham.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
> 
> OK, so nobody except for Pavel appears to have any comments, so I assume
> that everyone except for Pavel is fine with the approach, interestingly enough.
> 
> I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
> and added comments explaining why lockdep_off/on() are used.
> 
> Thanks,
> Rafael
> 
> ---
>  fs/block_dev.c         |   56 +++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h     |    6 +++++
>  kernel/power/process.c |    7 +++++-
>  3 files changed, 68 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h
> +++ linux-2.6/include/linux/fs.h
> @@ -211,6 +211,7 @@ struct inodes_stat_t {
>  #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
>  #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
>  #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
> +#define MS_FROZEN	(1<<25) /* bdev has been frozen */
>  #define MS_NOSEC	(1<<28)
>  #define MS_BORN		(1<<29)
>  #define MS_ACTIVE	(1<<30)
> @@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
>  extern void emergency_thaw_all(void);
>  extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
>  extern int fsync_bdev(struct block_device *);
> +extern void freeze_filesystems(void);
> +extern void thaw_filesystems(void);
>  #else
>  static inline void bd_forget(struct inode *inode) {}
>  static inline int sync_blockdev(struct block_device *bdev) { return 0; }
> @@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
>  {
>  	return 0;
>  }
> +
> +static inline void freeze_filesystems(void) {}
> +static inline void thaw_filesystems(void) {}
>  #endif
>  extern int sync_filesystem(struct super_block *);
>  extern const struct file_operations def_blk_fops;
> Index: linux-2.6/fs/block_dev.c
> ===================================================================
> --- linux-2.6.orig/fs/block_dev.c
> +++ linux-2.6/fs/block_dev.c
> @@ -314,6 +314,62 @@ out:
>  }
>  EXPORT_SYMBOL(thaw_bdev);
>  
> +/**
> + * freeze_filesystems - Force all filesystems into a consistent state.
> + */
> +void freeze_filesystems(void)
> +{
> +	struct super_block *sb;
> +
> +	/*
> +	 * This is necessary, because some filesystems (e.g. ext3) lock
> +	 * mutexes in their .freeze_fs() callbacks and leave them locked for
> +	 * their .unfreeze_fs() callbacks to unlock.  This is done under
> +	 * bdev->bd_fsfreeze_mutex, which is then released, but it makes
> +	 * lockdep think something may be wrong when freeze_bdev() attempts
> +	 * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
> +	 */
> +	lockdep_off();

I thought those problems were fixed. If they aren't, then they most
certainly need to be because holding mutexes over system calls is a
bug.

Well, well:

[252182.603134] ================================================
[252182.604832] [ BUG: lock held when returning to user space! ]
[252182.606086] ------------------------------------------------
[252182.607400] xfs_io/4917 is leaving the kernel with locks still held!
[252182.608905] 1 lock held by xfs_io/4917:
[252182.609739]  #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff812a2aaf>] journal_lock_updates+0xef/0x100

<sigh>

Looks like the problem was fixed for ext4, but not ext3.  Please
report this to the ext3/4 list and get it fixed, don't work around
it here.

> +	/*
> +	 * Freeze in reverse order so filesystems depending on others are
> +	 * frozen in the right order (eg. loopback on ext3).
> +	 */
> +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> +		if (!sb->s_root || !sb->s_bdev ||
> +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> +		    (sb->s_flags & MS_RDONLY))
> +			continue;
> +
> +		freeze_bdev(sb->s_bdev);
> +		sb->s_flags |= MS_FROZEN;
> +	}

AFAIK, that won't work for btrfs - you have to call freeze_super()
directly for btrfs because it has a special relationship with
sb->s_bdev. And besides, all freeze_bdev does is get an active
reference on the superblock and call freeze_super().

Also, that's traversing the list of superblock with locking and
dereferencing the superblock without properly checking that the
superblock is not being torn down. You should probably use
iterate_supers (or at least copy the code), with a function that
drops the s_umount read lock befor calling freeze_super() and then
picks it back up afterwards.

> +
> +	lockdep_on();
> +}
> +
> +/**
> + * thaw_filesystems - Make all filesystems active again.
> + */
> +void thaw_filesystems(void)
> +{
> +	struct super_block *sb;
> +
> +	/*
> +	 * This is necessary for the same reason as in freeze_filesystems()
> +	 * above.
> +	 */
> +	lockdep_off();
> +
> +	list_for_each_entry(sb, &super_blocks, s_list)
> +		if (sb->s_flags & MS_FROZEN) {
> +			sb->s_flags &= ~MS_FROZEN;
> +			thaw_bdev(sb->s_bdev, sb);
> +		}

And once again, iterate_supers() is what you want here. And you
should only call thaw_bdev() as it needs to do checks other than
checking MS_FROZEN e.g. the above will unfreeze filesystems that
were already frozen at the time a suspend occurs, and that could
lead to corruption depending on why the filesystem was frozen...

Also, you still need to check for a valid sb->s_bdev here, otherwise
<splat>.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-08-07  0:14                         ` Dave Chinner
  0 siblings, 0 replies; 68+ messages in thread
From: Dave Chinner @ 2011-08-07  0:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel,
	Linux PM mailing list, linux-ext4

On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Freeze all filesystems during the freezing of tasks by calling
> freeze_bdev() for each of them and thaw them during the thawing
> of tasks with the help of thaw_bdev().
> 
> This is needed by hibernation, because some filesystems (e.g. XFS)
> deadlock with the preallocation of memory used by it if the memory
> pressure caused by it is too heavy.
> 
> The additional benefit of this change is that, if something goes
> wrong after filesystems have been frozen, they will stay in a
> consistent state and journal replays won't be necessary (e.g. after
> a failing suspend or resume).  In particular, this should help to
> solve a long-standing issue that in some cases during resume from
> hibernation the boot loader causes the journal to be replied for the
> filesystem containing the kernel image and initrd causing it to
> become inconsistent with the information stored in the hibernation
> image.
> 
> This change is based on earlier work by Nigel Cunningham.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
> 
> OK, so nobody except for Pavel appears to have any comments, so I assume
> that everyone except for Pavel is fine with the approach, interestingly enough.
> 
> I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
> and added comments explaining why lockdep_off/on() are used.
> 
> Thanks,
> Rafael
> 
> ---
>  fs/block_dev.c         |   56 +++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h     |    6 +++++
>  kernel/power/process.c |    7 +++++-
>  3 files changed, 68 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h
> +++ linux-2.6/include/linux/fs.h
> @@ -211,6 +211,7 @@ struct inodes_stat_t {
>  #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
>  #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
>  #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
> +#define MS_FROZEN	(1<<25) /* bdev has been frozen */
>  #define MS_NOSEC	(1<<28)
>  #define MS_BORN		(1<<29)
>  #define MS_ACTIVE	(1<<30)
> @@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
>  extern void emergency_thaw_all(void);
>  extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
>  extern int fsync_bdev(struct block_device *);
> +extern void freeze_filesystems(void);
> +extern void thaw_filesystems(void);
>  #else
>  static inline void bd_forget(struct inode *inode) {}
>  static inline int sync_blockdev(struct block_device *bdev) { return 0; }
> @@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
>  {
>  	return 0;
>  }
> +
> +static inline void freeze_filesystems(void) {}
> +static inline void thaw_filesystems(void) {}
>  #endif
>  extern int sync_filesystem(struct super_block *);
>  extern const struct file_operations def_blk_fops;
> Index: linux-2.6/fs/block_dev.c
> ===================================================================
> --- linux-2.6.orig/fs/block_dev.c
> +++ linux-2.6/fs/block_dev.c
> @@ -314,6 +314,62 @@ out:
>  }
>  EXPORT_SYMBOL(thaw_bdev);
>  
> +/**
> + * freeze_filesystems - Force all filesystems into a consistent state.
> + */
> +void freeze_filesystems(void)
> +{
> +	struct super_block *sb;
> +
> +	/*
> +	 * This is necessary, because some filesystems (e.g. ext3) lock
> +	 * mutexes in their .freeze_fs() callbacks and leave them locked for
> +	 * their .unfreeze_fs() callbacks to unlock.  This is done under
> +	 * bdev->bd_fsfreeze_mutex, which is then released, but it makes
> +	 * lockdep think something may be wrong when freeze_bdev() attempts
> +	 * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
> +	 */
> +	lockdep_off();

I thought those problems were fixed. If they aren't, then they most
certainly need to be because holding mutexes over system calls is a
bug.

Well, well:

[252182.603134] ================================================
[252182.604832] [ BUG: lock held when returning to user space! ]
[252182.606086] ------------------------------------------------
[252182.607400] xfs_io/4917 is leaving the kernel with locks still held!
[252182.608905] 1 lock held by xfs_io/4917:
[252182.609739]  #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff812a2aaf>] journal_lock_updates+0xef/0x100

<sigh>

Looks like the problem was fixed for ext4, but not ext3.  Please
report this to the ext3/4 list and get it fixed, don't work around
it here.

> +	/*
> +	 * Freeze in reverse order so filesystems depending on others are
> +	 * frozen in the right order (eg. loopback on ext3).
> +	 */
> +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> +		if (!sb->s_root || !sb->s_bdev ||
> +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> +		    (sb->s_flags & MS_RDONLY))
> +			continue;
> +
> +		freeze_bdev(sb->s_bdev);
> +		sb->s_flags |= MS_FROZEN;
> +	}

AFAIK, that won't work for btrfs - you have to call freeze_super()
directly for btrfs because it has a special relationship with
sb->s_bdev. And besides, all freeze_bdev does is get an active
reference on the superblock and call freeze_super().

Also, that's traversing the list of superblock with locking and
dereferencing the superblock without properly checking that the
superblock is not being torn down. You should probably use
iterate_supers (or at least copy the code), with a function that
drops the s_umount read lock befor calling freeze_super() and then
picks it back up afterwards.

> +
> +	lockdep_on();
> +}
> +
> +/**
> + * thaw_filesystems - Make all filesystems active again.
> + */
> +void thaw_filesystems(void)
> +{
> +	struct super_block *sb;
> +
> +	/*
> +	 * This is necessary for the same reason as in freeze_filesystems()
> +	 * above.
> +	 */
> +	lockdep_off();
> +
> +	list_for_each_entry(sb, &super_blocks, s_list)
> +		if (sb->s_flags & MS_FROZEN) {
> +			sb->s_flags &= ~MS_FROZEN;
> +			thaw_bdev(sb->s_bdev, sb);
> +		}

And once again, iterate_supers() is what you want here. And you
should only call thaw_bdev() as it needs to do checks other than
checking MS_FROZEN e.g. the above will unfreeze filesystems that
were already frozen at the time a suspend occurs, and that could
lead to corruption depending on why the filesystem was frozen...

Also, you still need to check for a valid sb->s_bdev here, otherwise
<splat>.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-08-06 21:17                       ` Rafael J. Wysocki
  (?)
@ 2011-08-07  0:14                       ` Dave Chinner
  -1 siblings, 0 replies; 68+ messages in thread
From: Dave Chinner @ 2011-08-07  0:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	linux-fsdevel, Linux PM mailing list, linux-ext4

On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> Freeze all filesystems during the freezing of tasks by calling
> freeze_bdev() for each of them and thaw them during the thawing
> of tasks with the help of thaw_bdev().
> 
> This is needed by hibernation, because some filesystems (e.g. XFS)
> deadlock with the preallocation of memory used by it if the memory
> pressure caused by it is too heavy.
> 
> The additional benefit of this change is that, if something goes
> wrong after filesystems have been frozen, they will stay in a
> consistent state and journal replays won't be necessary (e.g. after
> a failing suspend or resume).  In particular, this should help to
> solve a long-standing issue that in some cases during resume from
> hibernation the boot loader causes the journal to be replied for the
> filesystem containing the kernel image and initrd causing it to
> become inconsistent with the information stored in the hibernation
> image.
> 
> This change is based on earlier work by Nigel Cunningham.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
> 
> OK, so nobody except for Pavel appears to have any comments, so I assume
> that everyone except for Pavel is fine with the approach, interestingly enough.
> 
> I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
> and added comments explaining why lockdep_off/on() are used.
> 
> Thanks,
> Rafael
> 
> ---
>  fs/block_dev.c         |   56 +++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h     |    6 +++++
>  kernel/power/process.c |    7 +++++-
>  3 files changed, 68 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h
> +++ linux-2.6/include/linux/fs.h
> @@ -211,6 +211,7 @@ struct inodes_stat_t {
>  #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
>  #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
>  #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
> +#define MS_FROZEN	(1<<25) /* bdev has been frozen */
>  #define MS_NOSEC	(1<<28)
>  #define MS_BORN		(1<<29)
>  #define MS_ACTIVE	(1<<30)
> @@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
>  extern void emergency_thaw_all(void);
>  extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
>  extern int fsync_bdev(struct block_device *);
> +extern void freeze_filesystems(void);
> +extern void thaw_filesystems(void);
>  #else
>  static inline void bd_forget(struct inode *inode) {}
>  static inline int sync_blockdev(struct block_device *bdev) { return 0; }
> @@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
>  {
>  	return 0;
>  }
> +
> +static inline void freeze_filesystems(void) {}
> +static inline void thaw_filesystems(void) {}
>  #endif
>  extern int sync_filesystem(struct super_block *);
>  extern const struct file_operations def_blk_fops;
> Index: linux-2.6/fs/block_dev.c
> ===================================================================
> --- linux-2.6.orig/fs/block_dev.c
> +++ linux-2.6/fs/block_dev.c
> @@ -314,6 +314,62 @@ out:
>  }
>  EXPORT_SYMBOL(thaw_bdev);
>  
> +/**
> + * freeze_filesystems - Force all filesystems into a consistent state.
> + */
> +void freeze_filesystems(void)
> +{
> +	struct super_block *sb;
> +
> +	/*
> +	 * This is necessary, because some filesystems (e.g. ext3) lock
> +	 * mutexes in their .freeze_fs() callbacks and leave them locked for
> +	 * their .unfreeze_fs() callbacks to unlock.  This is done under
> +	 * bdev->bd_fsfreeze_mutex, which is then released, but it makes
> +	 * lockdep think something may be wrong when freeze_bdev() attempts
> +	 * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
> +	 */
> +	lockdep_off();

I thought those problems were fixed. If they aren't, then they most
certainly need to be because holding mutexes over system calls is a
bug.

Well, well:

[252182.603134] ================================================
[252182.604832] [ BUG: lock held when returning to user space! ]
[252182.606086] ------------------------------------------------
[252182.607400] xfs_io/4917 is leaving the kernel with locks still held!
[252182.608905] 1 lock held by xfs_io/4917:
[252182.609739]  #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff812a2aaf>] journal_lock_updates+0xef/0x100

<sigh>

Looks like the problem was fixed for ext4, but not ext3.  Please
report this to the ext3/4 list and get it fixed, don't work around
it here.

> +	/*
> +	 * Freeze in reverse order so filesystems depending on others are
> +	 * frozen in the right order (eg. loopback on ext3).
> +	 */
> +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> +		if (!sb->s_root || !sb->s_bdev ||
> +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> +		    (sb->s_flags & MS_RDONLY))
> +			continue;
> +
> +		freeze_bdev(sb->s_bdev);
> +		sb->s_flags |= MS_FROZEN;
> +	}

AFAIK, that won't work for btrfs - you have to call freeze_super()
directly for btrfs because it has a special relationship with
sb->s_bdev. And besides, all freeze_bdev does is get an active
reference on the superblock and call freeze_super().

Also, that's traversing the list of superblock with locking and
dereferencing the superblock without properly checking that the
superblock is not being torn down. You should probably use
iterate_supers (or at least copy the code), with a function that
drops the s_umount read lock befor calling freeze_super() and then
picks it back up afterwards.

> +
> +	lockdep_on();
> +}
> +
> +/**
> + * thaw_filesystems - Make all filesystems active again.
> + */
> +void thaw_filesystems(void)
> +{
> +	struct super_block *sb;
> +
> +	/*
> +	 * This is necessary for the same reason as in freeze_filesystems()
> +	 * above.
> +	 */
> +	lockdep_off();
> +
> +	list_for_each_entry(sb, &super_blocks, s_list)
> +		if (sb->s_flags & MS_FROZEN) {
> +			sb->s_flags &= ~MS_FROZEN;
> +			thaw_bdev(sb->s_bdev, sb);
> +		}

And once again, iterate_supers() is what you want here. And you
should only call thaw_bdev() as it needs to do checks other than
checking MS_FROZEN e.g. the above will unfreeze filesystems that
were already frozen at the time a suspend occurs, and that could
lead to corruption depending on why the filesystem was frozen...

Also, you still need to check for a valid sb->s_bdev here, otherwise
<splat>.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-08-07  0:14                         ` Dave Chinner
@ 2011-08-08 21:11                           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-08 21:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linux PM mailing list, Pavel Machek, Nigel Cunningham,
	Christoph Hellwig, Christoph, xfs, LKML, linux-ext4,
	Theodore Ts'o, linux-fsdevel

On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > Freeze all filesystems during the freezing of tasks by calling
> > freeze_bdev() for each of them and thaw them during the thawing
> > of tasks with the help of thaw_bdev().
> > 
> > This is needed by hibernation, because some filesystems (e.g. XFS)
> > deadlock with the preallocation of memory used by it if the memory
> > pressure caused by it is too heavy.
> > 
> > The additional benefit of this change is that, if something goes
> > wrong after filesystems have been frozen, they will stay in a
> > consistent state and journal replays won't be necessary (e.g. after
> > a failing suspend or resume).  In particular, this should help to
> > solve a long-standing issue that in some cases during resume from
> > hibernation the boot loader causes the journal to be replied for the
> > filesystem containing the kernel image and initrd causing it to
> > become inconsistent with the information stored in the hibernation
> > image.
> > 
> > This change is based on earlier work by Nigel Cunningham.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> > 
> > OK, so nobody except for Pavel appears to have any comments, so I assume
> > that everyone except for Pavel is fine with the approach, interestingly enough.
> > 
> > I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
> > and added comments explaining why lockdep_off/on() are used.
> > 
> > Thanks,
> > Rafael
> > 
> > ---
> >  fs/block_dev.c         |   56 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/fs.h     |    6 +++++
> >  kernel/power/process.c |    7 +++++-
> >  3 files changed, 68 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6/include/linux/fs.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/fs.h
> > +++ linux-2.6/include/linux/fs.h
> > @@ -211,6 +211,7 @@ struct inodes_stat_t {
> >  #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
> >  #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
> >  #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
> > +#define MS_FROZEN	(1<<25) /* bdev has been frozen */
> >  #define MS_NOSEC	(1<<28)
> >  #define MS_BORN		(1<<29)
> >  #define MS_ACTIVE	(1<<30)
> > @@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
> >  extern void emergency_thaw_all(void);
> >  extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
> >  extern int fsync_bdev(struct block_device *);
> > +extern void freeze_filesystems(void);
> > +extern void thaw_filesystems(void);
> >  #else
> >  static inline void bd_forget(struct inode *inode) {}
> >  static inline int sync_blockdev(struct block_device *bdev) { return 0; }
> > @@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
> >  {
> >  	return 0;
> >  }
> > +
> > +static inline void freeze_filesystems(void) {}
> > +static inline void thaw_filesystems(void) {}
> >  #endif
> >  extern int sync_filesystem(struct super_block *);
> >  extern const struct file_operations def_blk_fops;
> > Index: linux-2.6/fs/block_dev.c
> > ===================================================================
> > --- linux-2.6.orig/fs/block_dev.c
> > +++ linux-2.6/fs/block_dev.c
> > @@ -314,6 +314,62 @@ out:
> >  }
> >  EXPORT_SYMBOL(thaw_bdev);
> >  
> > +/**
> > + * freeze_filesystems - Force all filesystems into a consistent state.
> > + */
> > +void freeze_filesystems(void)
> > +{
> > +	struct super_block *sb;
> > +
> > +	/*
> > +	 * This is necessary, because some filesystems (e.g. ext3) lock
> > +	 * mutexes in their .freeze_fs() callbacks and leave them locked for
> > +	 * their .unfreeze_fs() callbacks to unlock.  This is done under
> > +	 * bdev->bd_fsfreeze_mutex, which is then released, but it makes
> > +	 * lockdep think something may be wrong when freeze_bdev() attempts
> > +	 * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
> > +	 */
> > +	lockdep_off();
> 
> I thought those problems were fixed. If they aren't, then they most
> certainly need to be because holding mutexes over system calls is a
> bug.
> 
> Well, well:
> 
> [252182.603134] ================================================
> [252182.604832] [ BUG: lock held when returning to user space! ]
> [252182.606086] ------------------------------------------------
> [252182.607400] xfs_io/4917 is leaving the kernel with locks still held!
> [252182.608905] 1 lock held by xfs_io/4917:
> [252182.609739]  #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff812a2aaf>] journal_lock_updates+0xef/0x100
> 
> <sigh>
> 
> Looks like the problem was fixed for ext4, but not ext3.  Please
> report this to the ext3/4 list and get it fixed, don't work around
> it here.

OK, but I guess I'll have to post a patch to fix this myself so that
anyone notices. :-)

> > +	/*
> > +	 * Freeze in reverse order so filesystems depending on others are
> > +	 * frozen in the right order (eg. loopback on ext3).
> > +	 */
> > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > +		if (!sb->s_root || !sb->s_bdev ||
> > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > +		    (sb->s_flags & MS_RDONLY))
> > +			continue;
> > +
> > +		freeze_bdev(sb->s_bdev);
> > +		sb->s_flags |= MS_FROZEN;
> > +	}
> 
> AFAIK, that won't work for btrfs - you have to call freeze_super()
> directly for btrfs because it has a special relationship with
> sb->s_bdev. And besides, all freeze_bdev does is get an active
> reference on the superblock and call freeze_super().

OK, so do you mean I should call freeze_super() rather than freeze_bdev()?

> Also, that's traversing the list of superblock with locking and
> dereferencing the superblock without properly checking that the
> superblock is not being torn down. You should probably use
> iterate_supers (or at least copy the code), with a function that
> drops the s_umount read lock befor calling freeze_super() and then
> picks it back up afterwards.

Hmm, I'll try that, but I doubt I'll get it right first time. :-)

> > +
> > +	lockdep_on();
> > +}
> > +
> > +/**
> > + * thaw_filesystems - Make all filesystems active again.
> > + */
> > +void thaw_filesystems(void)
> > +{
> > +	struct super_block *sb;
> > +
> > +	/*
> > +	 * This is necessary for the same reason as in freeze_filesystems()
> > +	 * above.
> > +	 */
> > +	lockdep_off();
> > +
> > +	list_for_each_entry(sb, &super_blocks, s_list)
> > +		if (sb->s_flags & MS_FROZEN) {
> > +			sb->s_flags &= ~MS_FROZEN;
> > +			thaw_bdev(sb->s_bdev, sb);
> > +		}
> 
> And once again, iterate_supers() is what you want here.

OK

> And you should only call thaw_bdev() as it needs to do checks other
> than checking MS_FROZEN

Hmm, I'm not really sure what you mean?

> e.g. the above will unfreeze filesystems that
> were already frozen at the time a suspend occurs, and that could
> lead to corruption depending on why the filesystem was frozen...
> 
> Also, you still need to check for a valid sb->s_bdev here, otherwise
> <splat>.

I see.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-08-08 21:11                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-08 21:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel,
	Linux PM mailing list, linux-ext4

On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > Freeze all filesystems during the freezing of tasks by calling
> > freeze_bdev() for each of them and thaw them during the thawing
> > of tasks with the help of thaw_bdev().
> > 
> > This is needed by hibernation, because some filesystems (e.g. XFS)
> > deadlock with the preallocation of memory used by it if the memory
> > pressure caused by it is too heavy.
> > 
> > The additional benefit of this change is that, if something goes
> > wrong after filesystems have been frozen, they will stay in a
> > consistent state and journal replays won't be necessary (e.g. after
> > a failing suspend or resume).  In particular, this should help to
> > solve a long-standing issue that in some cases during resume from
> > hibernation the boot loader causes the journal to be replied for the
> > filesystem containing the kernel image and initrd causing it to
> > become inconsistent with the information stored in the hibernation
> > image.
> > 
> > This change is based on earlier work by Nigel Cunningham.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> > 
> > OK, so nobody except for Pavel appears to have any comments, so I assume
> > that everyone except for Pavel is fine with the approach, interestingly enough.
> > 
> > I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
> > and added comments explaining why lockdep_off/on() are used.
> > 
> > Thanks,
> > Rafael
> > 
> > ---
> >  fs/block_dev.c         |   56 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/fs.h     |    6 +++++
> >  kernel/power/process.c |    7 +++++-
> >  3 files changed, 68 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6/include/linux/fs.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/fs.h
> > +++ linux-2.6/include/linux/fs.h
> > @@ -211,6 +211,7 @@ struct inodes_stat_t {
> >  #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
> >  #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
> >  #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
> > +#define MS_FROZEN	(1<<25) /* bdev has been frozen */
> >  #define MS_NOSEC	(1<<28)
> >  #define MS_BORN		(1<<29)
> >  #define MS_ACTIVE	(1<<30)
> > @@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
> >  extern void emergency_thaw_all(void);
> >  extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
> >  extern int fsync_bdev(struct block_device *);
> > +extern void freeze_filesystems(void);
> > +extern void thaw_filesystems(void);
> >  #else
> >  static inline void bd_forget(struct inode *inode) {}
> >  static inline int sync_blockdev(struct block_device *bdev) { return 0; }
> > @@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
> >  {
> >  	return 0;
> >  }
> > +
> > +static inline void freeze_filesystems(void) {}
> > +static inline void thaw_filesystems(void) {}
> >  #endif
> >  extern int sync_filesystem(struct super_block *);
> >  extern const struct file_operations def_blk_fops;
> > Index: linux-2.6/fs/block_dev.c
> > ===================================================================
> > --- linux-2.6.orig/fs/block_dev.c
> > +++ linux-2.6/fs/block_dev.c
> > @@ -314,6 +314,62 @@ out:
> >  }
> >  EXPORT_SYMBOL(thaw_bdev);
> >  
> > +/**
> > + * freeze_filesystems - Force all filesystems into a consistent state.
> > + */
> > +void freeze_filesystems(void)
> > +{
> > +	struct super_block *sb;
> > +
> > +	/*
> > +	 * This is necessary, because some filesystems (e.g. ext3) lock
> > +	 * mutexes in their .freeze_fs() callbacks and leave them locked for
> > +	 * their .unfreeze_fs() callbacks to unlock.  This is done under
> > +	 * bdev->bd_fsfreeze_mutex, which is then released, but it makes
> > +	 * lockdep think something may be wrong when freeze_bdev() attempts
> > +	 * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
> > +	 */
> > +	lockdep_off();
> 
> I thought those problems were fixed. If they aren't, then they most
> certainly need to be because holding mutexes over system calls is a
> bug.
> 
> Well, well:
> 
> [252182.603134] ================================================
> [252182.604832] [ BUG: lock held when returning to user space! ]
> [252182.606086] ------------------------------------------------
> [252182.607400] xfs_io/4917 is leaving the kernel with locks still held!
> [252182.608905] 1 lock held by xfs_io/4917:
> [252182.609739]  #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff812a2aaf>] journal_lock_updates+0xef/0x100
> 
> <sigh>
> 
> Looks like the problem was fixed for ext4, but not ext3.  Please
> report this to the ext3/4 list and get it fixed, don't work around
> it here.

OK, but I guess I'll have to post a patch to fix this myself so that
anyone notices. :-)

> > +	/*
> > +	 * Freeze in reverse order so filesystems depending on others are
> > +	 * frozen in the right order (eg. loopback on ext3).
> > +	 */
> > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > +		if (!sb->s_root || !sb->s_bdev ||
> > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > +		    (sb->s_flags & MS_RDONLY))
> > +			continue;
> > +
> > +		freeze_bdev(sb->s_bdev);
> > +		sb->s_flags |= MS_FROZEN;
> > +	}
> 
> AFAIK, that won't work for btrfs - you have to call freeze_super()
> directly for btrfs because it has a special relationship with
> sb->s_bdev. And besides, all freeze_bdev does is get an active
> reference on the superblock and call freeze_super().

OK, so do you mean I should call freeze_super() rather than freeze_bdev()?

> Also, that's traversing the list of superblock with locking and
> dereferencing the superblock without properly checking that the
> superblock is not being torn down. You should probably use
> iterate_supers (or at least copy the code), with a function that
> drops the s_umount read lock befor calling freeze_super() and then
> picks it back up afterwards.

Hmm, I'll try that, but I doubt I'll get it right first time. :-)

> > +
> > +	lockdep_on();
> > +}
> > +
> > +/**
> > + * thaw_filesystems - Make all filesystems active again.
> > + */
> > +void thaw_filesystems(void)
> > +{
> > +	struct super_block *sb;
> > +
> > +	/*
> > +	 * This is necessary for the same reason as in freeze_filesystems()
> > +	 * above.
> > +	 */
> > +	lockdep_off();
> > +
> > +	list_for_each_entry(sb, &super_blocks, s_list)
> > +		if (sb->s_flags & MS_FROZEN) {
> > +			sb->s_flags &= ~MS_FROZEN;
> > +			thaw_bdev(sb->s_bdev, sb);
> > +		}
> 
> And once again, iterate_supers() is what you want here.

OK

> And you should only call thaw_bdev() as it needs to do checks other
> than checking MS_FROZEN

Hmm, I'm not really sure what you mean?

> e.g. the above will unfreeze filesystems that
> were already frozen at the time a suspend occurs, and that could
> lead to corruption depending on why the filesystem was frozen...
> 
> Also, you still need to check for a valid sb->s_bdev here, otherwise
> <splat>.

I see.

Thanks,
Rafael

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-08-07  0:14                         ` Dave Chinner
  (?)
  (?)
@ 2011-08-08 21:11                         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-08 21:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	linux-fsdevel, Linux PM mailing list, linux-ext4

On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > Freeze all filesystems during the freezing of tasks by calling
> > freeze_bdev() for each of them and thaw them during the thawing
> > of tasks with the help of thaw_bdev().
> > 
> > This is needed by hibernation, because some filesystems (e.g. XFS)
> > deadlock with the preallocation of memory used by it if the memory
> > pressure caused by it is too heavy.
> > 
> > The additional benefit of this change is that, if something goes
> > wrong after filesystems have been frozen, they will stay in a
> > consistent state and journal replays won't be necessary (e.g. after
> > a failing suspend or resume).  In particular, this should help to
> > solve a long-standing issue that in some cases during resume from
> > hibernation the boot loader causes the journal to be replied for the
> > filesystem containing the kernel image and initrd causing it to
> > become inconsistent with the information stored in the hibernation
> > image.
> > 
> > This change is based on earlier work by Nigel Cunningham.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> > 
> > OK, so nobody except for Pavel appears to have any comments, so I assume
> > that everyone except for Pavel is fine with the approach, interestingly enough.
> > 
> > I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
> > and added comments explaining why lockdep_off/on() are used.
> > 
> > Thanks,
> > Rafael
> > 
> > ---
> >  fs/block_dev.c         |   56 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/fs.h     |    6 +++++
> >  kernel/power/process.c |    7 +++++-
> >  3 files changed, 68 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6/include/linux/fs.h
> > ===================================================================
> > --- linux-2.6.orig/include/linux/fs.h
> > +++ linux-2.6/include/linux/fs.h
> > @@ -211,6 +211,7 @@ struct inodes_stat_t {
> >  #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
> >  #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
> >  #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
> > +#define MS_FROZEN	(1<<25) /* bdev has been frozen */
> >  #define MS_NOSEC	(1<<28)
> >  #define MS_BORN		(1<<29)
> >  #define MS_ACTIVE	(1<<30)
> > @@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
> >  extern void emergency_thaw_all(void);
> >  extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
> >  extern int fsync_bdev(struct block_device *);
> > +extern void freeze_filesystems(void);
> > +extern void thaw_filesystems(void);
> >  #else
> >  static inline void bd_forget(struct inode *inode) {}
> >  static inline int sync_blockdev(struct block_device *bdev) { return 0; }
> > @@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
> >  {
> >  	return 0;
> >  }
> > +
> > +static inline void freeze_filesystems(void) {}
> > +static inline void thaw_filesystems(void) {}
> >  #endif
> >  extern int sync_filesystem(struct super_block *);
> >  extern const struct file_operations def_blk_fops;
> > Index: linux-2.6/fs/block_dev.c
> > ===================================================================
> > --- linux-2.6.orig/fs/block_dev.c
> > +++ linux-2.6/fs/block_dev.c
> > @@ -314,6 +314,62 @@ out:
> >  }
> >  EXPORT_SYMBOL(thaw_bdev);
> >  
> > +/**
> > + * freeze_filesystems - Force all filesystems into a consistent state.
> > + */
> > +void freeze_filesystems(void)
> > +{
> > +	struct super_block *sb;
> > +
> > +	/*
> > +	 * This is necessary, because some filesystems (e.g. ext3) lock
> > +	 * mutexes in their .freeze_fs() callbacks and leave them locked for
> > +	 * their .unfreeze_fs() callbacks to unlock.  This is done under
> > +	 * bdev->bd_fsfreeze_mutex, which is then released, but it makes
> > +	 * lockdep think something may be wrong when freeze_bdev() attempts
> > +	 * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
> > +	 */
> > +	lockdep_off();
> 
> I thought those problems were fixed. If they aren't, then they most
> certainly need to be because holding mutexes over system calls is a
> bug.
> 
> Well, well:
> 
> [252182.603134] ================================================
> [252182.604832] [ BUG: lock held when returning to user space! ]
> [252182.606086] ------------------------------------------------
> [252182.607400] xfs_io/4917 is leaving the kernel with locks still held!
> [252182.608905] 1 lock held by xfs_io/4917:
> [252182.609739]  #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff812a2aaf>] journal_lock_updates+0xef/0x100
> 
> <sigh>
> 
> Looks like the problem was fixed for ext4, but not ext3.  Please
> report this to the ext3/4 list and get it fixed, don't work around
> it here.

OK, but I guess I'll have to post a patch to fix this myself so that
anyone notices. :-)

> > +	/*
> > +	 * Freeze in reverse order so filesystems depending on others are
> > +	 * frozen in the right order (eg. loopback on ext3).
> > +	 */
> > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > +		if (!sb->s_root || !sb->s_bdev ||
> > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > +		    (sb->s_flags & MS_RDONLY))
> > +			continue;
> > +
> > +		freeze_bdev(sb->s_bdev);
> > +		sb->s_flags |= MS_FROZEN;
> > +	}
> 
> AFAIK, that won't work for btrfs - you have to call freeze_super()
> directly for btrfs because it has a special relationship with
> sb->s_bdev. And besides, all freeze_bdev does is get an active
> reference on the superblock and call freeze_super().

OK, so do you mean I should call freeze_super() rather than freeze_bdev()?

> Also, that's traversing the list of superblock with locking and
> dereferencing the superblock without properly checking that the
> superblock is not being torn down. You should probably use
> iterate_supers (or at least copy the code), with a function that
> drops the s_umount read lock befor calling freeze_super() and then
> picks it back up afterwards.

Hmm, I'll try that, but I doubt I'll get it right first time. :-)

> > +
> > +	lockdep_on();
> > +}
> > +
> > +/**
> > + * thaw_filesystems - Make all filesystems active again.
> > + */
> > +void thaw_filesystems(void)
> > +{
> > +	struct super_block *sb;
> > +
> > +	/*
> > +	 * This is necessary for the same reason as in freeze_filesystems()
> > +	 * above.
> > +	 */
> > +	lockdep_off();
> > +
> > +	list_for_each_entry(sb, &super_blocks, s_list)
> > +		if (sb->s_flags & MS_FROZEN) {
> > +			sb->s_flags &= ~MS_FROZEN;
> > +			thaw_bdev(sb->s_bdev, sb);
> > +		}
> 
> And once again, iterate_supers() is what you want here.

OK

> And you should only call thaw_bdev() as it needs to do checks other
> than checking MS_FROZEN

Hmm, I'm not really sure what you mean?

> e.g. the above will unfreeze filesystems that
> were already frozen at the time a suspend occurs, and that could
> lead to corruption depending on why the filesystem was frozen...
> 
> Also, you still need to check for a valid sb->s_bdev here, otherwise
> <splat>.

I see.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-07-27  9:35         ` Rafael J. Wysocki
@ 2011-08-10 21:43           ` Pavel Machek
  -1 siblings, 0 replies; 68+ messages in thread
From: Pavel Machek @ 2011-08-10 21:43 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Christoph, Linux PM mailing list, Nigel Cunningham, xfs

Hi!

> > > Why don't you simply submit a patch to do that?
> > 
> > a) I don't know how to test suspend/hibernate
> > b) I don't have any hardware I can test it on.

hibernation should work on any hardware. Just add cmdline parameter
resume=<your swap> and use echo disk > /sys/power/state.

suspend is supported on most PCs, and certainly on any notebook. Just
echo mem > /sys/power/state.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
@ 2011-08-10 21:43           ` Pavel Machek
  0 siblings, 0 replies; 68+ messages in thread
From: Pavel Machek @ 2011-08-10 21:43 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Christoph, Linux PM mailing list, Dave Chinner, xfs

Hi!

> > > Why don't you simply submit a patch to do that?
> > 
> > a) I don't know how to test suspend/hibernate
> > b) I don't have any hardware I can test it on.

hibernation should work on any hardware. Just add cmdline parameter
resume=<your swap> and use echo disk > /sys/power/state.

suspend is supported on most PCs, and certainly on any notebook. Just
echo mem > /sys/power/state.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-08-07  0:14                         ` Dave Chinner
@ 2011-08-14  0:16                           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-14  0:16 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linux PM mailing list, Pavel Machek, Nigel Cunningham,
	Christoph Hellwig, Christoph, xfs, LKML, linux-ext4,
	Theodore Ts'o, linux-fsdevel

On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
...
> > +	/*
> > +	 * Freeze in reverse order so filesystems depending on others are
> > +	 * frozen in the right order (eg. loopback on ext3).
> > +	 */
> > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > +		if (!sb->s_root || !sb->s_bdev ||
> > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > +		    (sb->s_flags & MS_RDONLY))
> > +			continue;
> > +
> > +		freeze_bdev(sb->s_bdev);
> > +		sb->s_flags |= MS_FROZEN;
> > +	}
> 
> AFAIK, that won't work for btrfs - you have to call freeze_super()
> directly for btrfs because it has a special relationship with
> sb->s_bdev. And besides, all freeze_bdev does is get an active
> reference on the superblock and call freeze_super().
> 
> Also, that's traversing the list of superblock with locking and
> dereferencing the superblock without properly checking that the
> superblock is not being torn down. You should probably use
> iterate_supers (or at least copy the code), with a function that
> drops the s_umount read lock befor calling freeze_super() and then
> picks it back up afterwards.

So, what about the patch below?  It appears to work on my test boxes.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM / Freezer: Freeze filesystems while freezing processes (v3)

Freeze all filesystems during the freezing of tasks by calling
freeze_super() for all superblocks and thaw them during the thawing
of tasks with the help of thaw_super().

This is needed by hibernation, because some filesystems (e.g. XFS)
deadlock with the preallocation of memory used by it if the memory
pressure caused by it is too heavy.

The additional benefit of this change is that, if something goes
wrong after filesystems have been frozen, they will stay in a
consistent state and journal replays won't be necessary (e.g. after
a failing suspend or resume).  In particular, this should help to
solve a long-standing issue that in some cases during resume from
hibernation the boot loader causes the journal to be replied for the
filesystem containing the kernel image and initrd causing it to
become inconsistent with the information stored in the hibernation
image.

This change is based on earlier work by Nigel Cunningham.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 fs/super.c             |   70 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     |    3 ++
 kernel/power/process.c |    9 +++++-
 3 files changed, 81 insertions(+), 1 deletion(-)

Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h
+++ linux/include/linux/fs.h
@@ -211,6 +211,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_FROZEN	(1<<25) /* Frozen filesystem */
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
@@ -2497,6 +2498,8 @@ extern void drop_super(struct super_bloc
 extern void iterate_supers(void (*)(struct super_block *, void *), void *);
 extern void iterate_supers_type(struct file_system_type *,
 			        void (*)(struct super_block *, void *), void *);
+extern int freeze_supers(void);
+extern void thaw_supers(void);
 
 extern int dcache_dir_open(struct inode *, struct file *);
 extern int dcache_dir_close(struct inode *, struct file *);
Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -12,10 +12,10 @@
 #include <linux/oom.h>
 #include <linux/suspend.h>
 #include <linux/module.h>
-#include <linux/syscalls.h>
 #include <linux/freezer.h>
 #include <linux/delay.h>
 #include <linux/workqueue.h>
+#include <linux/fs.h>
 
 /* 
  * Timeout for stopping processes
@@ -147,6 +147,12 @@ int freeze_processes(void)
 		goto Exit;
 	printk("done.\n");
 
+	printk("Freezing filesystems ... ");
+	error = freeze_supers();
+	if (error)
+		goto Exit;
+	printk("done.\n");
+
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
 	if (error)
@@ -188,6 +194,7 @@ void thaw_processes(void)
 	printk("Restarting tasks ... ");
 	thaw_workqueues();
 	thaw_tasks(true);
+	thaw_supers();
 	thaw_tasks(false);
 	schedule();
 	printk("done.\n");
Index: linux/fs/super.c
===================================================================
--- linux.orig/fs/super.c
+++ linux/fs/super.c
@@ -590,6 +590,76 @@ void iterate_supers_type(struct file_sys
 EXPORT_SYMBOL(iterate_supers_type);
 
 /**
+ *	freeze_supers - call freeze_super() for all superblocks
+ */
+int freeze_supers(void)
+{
+	struct super_block *sb, *p = NULL;
+	int error = 0;
+
+	spin_lock(&sb_lock);
+	/*
+	 * Freeze in reverse order so filesystems depending on others are
+	 * frozen in the right order (eg. loopback on ext3).
+	 */
+	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		if (list_empty(&sb->s_instances))
+			continue;
+		sb->s_count++;
+		spin_unlock(&sb_lock);
+
+		if (sb->s_root && sb->s_frozen != SB_FREEZE_TRANS
+		    && !(sb->s_flags & MS_RDONLY)) {
+			error = freeze_super(sb);
+			if (!error)
+				sb->s_flags |= MS_FROZEN;
+		}
+
+		spin_lock(&sb_lock);
+		if (error)
+			break;
+		if (p)
+			__put_super(p);
+		p = sb;
+	}
+	if (p)
+		__put_super(p);
+	spin_unlock(&sb_lock);
+
+	return error;
+}
+
+/**
+ *	thaw_supers - call thaw_super() for all superblocks
+ */
+void thaw_supers(void)
+{
+	struct super_block *sb, *p = NULL;
+
+	spin_lock(&sb_lock);
+	list_for_each_entry(sb, &super_blocks, s_list) {
+		if (list_empty(&sb->s_instances))
+			continue;
+		sb->s_count++;
+		spin_unlock(&sb_lock);
+
+		if (sb->s_flags & MS_FROZEN) {
+			thaw_super(sb);
+			sb->s_flags &= ~MS_FROZEN;
+		}
+
+		spin_lock(&sb_lock);
+		if (p)
+			__put_super(p);
+		p = sb;
+	}
+	if (p)
+		__put_super(p);
+	spin_unlock(&sb_lock);
+}
+
+
+/**
  *	get_super - get the superblock of a device
  *	@bdev: device to get the superblock for
  *	

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-08-14  0:16                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-14  0:16 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel,
	Linux PM mailing list, linux-ext4

On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
...
> > +	/*
> > +	 * Freeze in reverse order so filesystems depending on others are
> > +	 * frozen in the right order (eg. loopback on ext3).
> > +	 */
> > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > +		if (!sb->s_root || !sb->s_bdev ||
> > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > +		    (sb->s_flags & MS_RDONLY))
> > +			continue;
> > +
> > +		freeze_bdev(sb->s_bdev);
> > +		sb->s_flags |= MS_FROZEN;
> > +	}
> 
> AFAIK, that won't work for btrfs - you have to call freeze_super()
> directly for btrfs because it has a special relationship with
> sb->s_bdev. And besides, all freeze_bdev does is get an active
> reference on the superblock and call freeze_super().
> 
> Also, that's traversing the list of superblock with locking and
> dereferencing the superblock without properly checking that the
> superblock is not being torn down. You should probably use
> iterate_supers (or at least copy the code), with a function that
> drops the s_umount read lock befor calling freeze_super() and then
> picks it back up afterwards.

So, what about the patch below?  It appears to work on my test boxes.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM / Freezer: Freeze filesystems while freezing processes (v3)

Freeze all filesystems during the freezing of tasks by calling
freeze_super() for all superblocks and thaw them during the thawing
of tasks with the help of thaw_super().

This is needed by hibernation, because some filesystems (e.g. XFS)
deadlock with the preallocation of memory used by it if the memory
pressure caused by it is too heavy.

The additional benefit of this change is that, if something goes
wrong after filesystems have been frozen, they will stay in a
consistent state and journal replays won't be necessary (e.g. after
a failing suspend or resume).  In particular, this should help to
solve a long-standing issue that in some cases during resume from
hibernation the boot loader causes the journal to be replied for the
filesystem containing the kernel image and initrd causing it to
become inconsistent with the information stored in the hibernation
image.

This change is based on earlier work by Nigel Cunningham.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 fs/super.c             |   70 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     |    3 ++
 kernel/power/process.c |    9 +++++-
 3 files changed, 81 insertions(+), 1 deletion(-)

Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h
+++ linux/include/linux/fs.h
@@ -211,6 +211,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_FROZEN	(1<<25) /* Frozen filesystem */
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
@@ -2497,6 +2498,8 @@ extern void drop_super(struct super_bloc
 extern void iterate_supers(void (*)(struct super_block *, void *), void *);
 extern void iterate_supers_type(struct file_system_type *,
 			        void (*)(struct super_block *, void *), void *);
+extern int freeze_supers(void);
+extern void thaw_supers(void);
 
 extern int dcache_dir_open(struct inode *, struct file *);
 extern int dcache_dir_close(struct inode *, struct file *);
Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -12,10 +12,10 @@
 #include <linux/oom.h>
 #include <linux/suspend.h>
 #include <linux/module.h>
-#include <linux/syscalls.h>
 #include <linux/freezer.h>
 #include <linux/delay.h>
 #include <linux/workqueue.h>
+#include <linux/fs.h>
 
 /* 
  * Timeout for stopping processes
@@ -147,6 +147,12 @@ int freeze_processes(void)
 		goto Exit;
 	printk("done.\n");
 
+	printk("Freezing filesystems ... ");
+	error = freeze_supers();
+	if (error)
+		goto Exit;
+	printk("done.\n");
+
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
 	if (error)
@@ -188,6 +194,7 @@ void thaw_processes(void)
 	printk("Restarting tasks ... ");
 	thaw_workqueues();
 	thaw_tasks(true);
+	thaw_supers();
 	thaw_tasks(false);
 	schedule();
 	printk("done.\n");
Index: linux/fs/super.c
===================================================================
--- linux.orig/fs/super.c
+++ linux/fs/super.c
@@ -590,6 +590,76 @@ void iterate_supers_type(struct file_sys
 EXPORT_SYMBOL(iterate_supers_type);
 
 /**
+ *	freeze_supers - call freeze_super() for all superblocks
+ */
+int freeze_supers(void)
+{
+	struct super_block *sb, *p = NULL;
+	int error = 0;
+
+	spin_lock(&sb_lock);
+	/*
+	 * Freeze in reverse order so filesystems depending on others are
+	 * frozen in the right order (eg. loopback on ext3).
+	 */
+	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		if (list_empty(&sb->s_instances))
+			continue;
+		sb->s_count++;
+		spin_unlock(&sb_lock);
+
+		if (sb->s_root && sb->s_frozen != SB_FREEZE_TRANS
+		    && !(sb->s_flags & MS_RDONLY)) {
+			error = freeze_super(sb);
+			if (!error)
+				sb->s_flags |= MS_FROZEN;
+		}
+
+		spin_lock(&sb_lock);
+		if (error)
+			break;
+		if (p)
+			__put_super(p);
+		p = sb;
+	}
+	if (p)
+		__put_super(p);
+	spin_unlock(&sb_lock);
+
+	return error;
+}
+
+/**
+ *	thaw_supers - call thaw_super() for all superblocks
+ */
+void thaw_supers(void)
+{
+	struct super_block *sb, *p = NULL;
+
+	spin_lock(&sb_lock);
+	list_for_each_entry(sb, &super_blocks, s_list) {
+		if (list_empty(&sb->s_instances))
+			continue;
+		sb->s_count++;
+		spin_unlock(&sb_lock);
+
+		if (sb->s_flags & MS_FROZEN) {
+			thaw_super(sb);
+			sb->s_flags &= ~MS_FROZEN;
+		}
+
+		spin_lock(&sb_lock);
+		if (p)
+			__put_super(p);
+		p = sb;
+	}
+	if (p)
+		__put_super(p);
+	spin_unlock(&sb_lock);
+}
+
+
+/**
  *	get_super - get the superblock of a device
  *	@bdev: device to get the superblock for
  *	

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-08-07  0:14                         ` Dave Chinner
                                           ` (3 preceding siblings ...)
  (?)
@ 2011-08-14  0:16                         ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-14  0:16 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	linux-fsdevel, Linux PM mailing list, linux-ext4

On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
...
> > +	/*
> > +	 * Freeze in reverse order so filesystems depending on others are
> > +	 * frozen in the right order (eg. loopback on ext3).
> > +	 */
> > +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> > +		if (!sb->s_root || !sb->s_bdev ||
> > +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> > +		    (sb->s_flags & MS_RDONLY))
> > +			continue;
> > +
> > +		freeze_bdev(sb->s_bdev);
> > +		sb->s_flags |= MS_FROZEN;
> > +	}
> 
> AFAIK, that won't work for btrfs - you have to call freeze_super()
> directly for btrfs because it has a special relationship with
> sb->s_bdev. And besides, all freeze_bdev does is get an active
> reference on the superblock and call freeze_super().
> 
> Also, that's traversing the list of superblock with locking and
> dereferencing the superblock without properly checking that the
> superblock is not being torn down. You should probably use
> iterate_supers (or at least copy the code), with a function that
> drops the s_umount read lock befor calling freeze_super() and then
> picks it back up afterwards.

So, what about the patch below?  It appears to work on my test boxes.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM / Freezer: Freeze filesystems while freezing processes (v3)

Freeze all filesystems during the freezing of tasks by calling
freeze_super() for all superblocks and thaw them during the thawing
of tasks with the help of thaw_super().

This is needed by hibernation, because some filesystems (e.g. XFS)
deadlock with the preallocation of memory used by it if the memory
pressure caused by it is too heavy.

The additional benefit of this change is that, if something goes
wrong after filesystems have been frozen, they will stay in a
consistent state and journal replays won't be necessary (e.g. after
a failing suspend or resume).  In particular, this should help to
solve a long-standing issue that in some cases during resume from
hibernation the boot loader causes the journal to be replied for the
filesystem containing the kernel image and initrd causing it to
become inconsistent with the information stored in the hibernation
image.

This change is based on earlier work by Nigel Cunningham.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 fs/super.c             |   70 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h     |    3 ++
 kernel/power/process.c |    9 +++++-
 3 files changed, 81 insertions(+), 1 deletion(-)

Index: linux/include/linux/fs.h
===================================================================
--- linux.orig/include/linux/fs.h
+++ linux/include/linux/fs.h
@@ -211,6 +211,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_FROZEN	(1<<25) /* Frozen filesystem */
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
@@ -2497,6 +2498,8 @@ extern void drop_super(struct super_bloc
 extern void iterate_supers(void (*)(struct super_block *, void *), void *);
 extern void iterate_supers_type(struct file_system_type *,
 			        void (*)(struct super_block *, void *), void *);
+extern int freeze_supers(void);
+extern void thaw_supers(void);
 
 extern int dcache_dir_open(struct inode *, struct file *);
 extern int dcache_dir_close(struct inode *, struct file *);
Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -12,10 +12,10 @@
 #include <linux/oom.h>
 #include <linux/suspend.h>
 #include <linux/module.h>
-#include <linux/syscalls.h>
 #include <linux/freezer.h>
 #include <linux/delay.h>
 #include <linux/workqueue.h>
+#include <linux/fs.h>
 
 /* 
  * Timeout for stopping processes
@@ -147,6 +147,12 @@ int freeze_processes(void)
 		goto Exit;
 	printk("done.\n");
 
+	printk("Freezing filesystems ... ");
+	error = freeze_supers();
+	if (error)
+		goto Exit;
+	printk("done.\n");
+
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
 	if (error)
@@ -188,6 +194,7 @@ void thaw_processes(void)
 	printk("Restarting tasks ... ");
 	thaw_workqueues();
 	thaw_tasks(true);
+	thaw_supers();
 	thaw_tasks(false);
 	schedule();
 	printk("done.\n");
Index: linux/fs/super.c
===================================================================
--- linux.orig/fs/super.c
+++ linux/fs/super.c
@@ -590,6 +590,76 @@ void iterate_supers_type(struct file_sys
 EXPORT_SYMBOL(iterate_supers_type);
 
 /**
+ *	freeze_supers - call freeze_super() for all superblocks
+ */
+int freeze_supers(void)
+{
+	struct super_block *sb, *p = NULL;
+	int error = 0;
+
+	spin_lock(&sb_lock);
+	/*
+	 * Freeze in reverse order so filesystems depending on others are
+	 * frozen in the right order (eg. loopback on ext3).
+	 */
+	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		if (list_empty(&sb->s_instances))
+			continue;
+		sb->s_count++;
+		spin_unlock(&sb_lock);
+
+		if (sb->s_root && sb->s_frozen != SB_FREEZE_TRANS
+		    && !(sb->s_flags & MS_RDONLY)) {
+			error = freeze_super(sb);
+			if (!error)
+				sb->s_flags |= MS_FROZEN;
+		}
+
+		spin_lock(&sb_lock);
+		if (error)
+			break;
+		if (p)
+			__put_super(p);
+		p = sb;
+	}
+	if (p)
+		__put_super(p);
+	spin_unlock(&sb_lock);
+
+	return error;
+}
+
+/**
+ *	thaw_supers - call thaw_super() for all superblocks
+ */
+void thaw_supers(void)
+{
+	struct super_block *sb, *p = NULL;
+
+	spin_lock(&sb_lock);
+	list_for_each_entry(sb, &super_blocks, s_list) {
+		if (list_empty(&sb->s_instances))
+			continue;
+		sb->s_count++;
+		spin_unlock(&sb_lock);
+
+		if (sb->s_flags & MS_FROZEN) {
+			thaw_super(sb);
+			sb->s_flags &= ~MS_FROZEN;
+		}
+
+		spin_lock(&sb_lock);
+		if (p)
+			__put_super(p);
+		p = sb;
+	}
+	if (p)
+		__put_super(p);
+	spin_unlock(&sb_lock);
+}
+
+
+/**
  *	get_super - get the superblock of a device
  *	@bdev: device to get the superblock for
  *	

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-08-10 21:43           ` Pavel Machek
  (?)
@ 2011-08-16 12:38           ` Christoph
  2011-08-16 18:05               ` Rafael J. Wysocki
  -1 siblings, 1 reply; 68+ messages in thread
From: Christoph @ 2011-08-16 12:38 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Rafael J. Wysocki, Linux PM mailing list, Nigel Cunningham, xfs

On 10.08.2011 23:43, Pavel Machek wrote:
> Hi!
> 
>>>> Why don't you simply submit a patch to do that?
>>>
>>> a) I don't know how to test suspend/hibernate
>>> b) I don't have any hardware I can test it on.


Hi!

I do apologize fmy silence but I've been in a serious productions the last
weeks.

If there any patches to test, let me know. I can run some decent stress tests.

chris


> 
> hibernation should work on any hardware. Just add cmdline parameter
> resume=<your swap> and use echo disk > /sys/power/state.
> 
> suspend is supported on most PCs, and certainly on any notebook. Just
> echo mem > /sys/power/state.
> 									Pavel

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-08-10 21:43           ` Pavel Machek
  (?)
  (?)
@ 2011-08-16 12:38           ` Christoph
  -1 siblings, 0 replies; 68+ messages in thread
From: Christoph @ 2011-08-16 12:38 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Linux PM mailing list, Dave Chinner, xfs

On 10.08.2011 23:43, Pavel Machek wrote:
> Hi!
> 
>>>> Why don't you simply submit a patch to do that?
>>>
>>> a) I don't know how to test suspend/hibernate
>>> b) I don't have any hardware I can test it on.


Hi!

I do apologize fmy silence but I've been in a serious productions the last
weeks.

If there any patches to test, let me know. I can run some decent stress tests.

chris


> 
> hibernation should work on any hardware. Just add cmdline parameter
> resume=<your swap> and use echo disk > /sys/power/state.
> 
> suspend is supported on most PCs, and certainly on any notebook. Just
> echo mem > /sys/power/state.
> 									Pavel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
  2011-08-16 12:38           ` Christoph
@ 2011-08-16 18:05               ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-16 18:05 UTC (permalink / raw)
  To: Christoph; +Cc: Linux PM mailing list, Nigel Cunningham, Pavel Machek, xfs

On Tuesday, August 16, 2011, Christoph wrote:
> On 10.08.2011 23:43, Pavel Machek wrote:
> > Hi!
> > 
> >>>> Why don't you simply submit a patch to do that?
> >>>
> >>> a) I don't know how to test suspend/hibernate
> >>> b) I don't have any hardware I can test it on.
> 
> 
> Hi!
> 
> I do apologize fmy silence but I've been in a serious productions the last
> weeks.
> 
> If there any patches to test, let me know. I can run some decent stress tests.

Well, please try the one attached to
https://bugzilla.kernel.org/show_bug.cgi?id=33622#c10

Thanks,
Rafael

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag
@ 2011-08-16 18:05               ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-08-16 18:05 UTC (permalink / raw)
  To: Christoph; +Cc: Linux PM mailing list, Dave Chinner, xfs

On Tuesday, August 16, 2011, Christoph wrote:
> On 10.08.2011 23:43, Pavel Machek wrote:
> > Hi!
> > 
> >>>> Why don't you simply submit a patch to do that?
> >>>
> >>> a) I don't know how to test suspend/hibernate
> >>> b) I don't have any hardware I can test it on.
> 
> 
> Hi!
> 
> I do apologize fmy silence but I've been in a serious productions the last
> weeks.
> 
> If there any patches to test, let me know. I can run some decent stress tests.

Well, please try the one attached to
https://bugzilla.kernel.org/show_bug.cgi?id=33622#c10

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-08-07  0:14                         ` Dave Chinner
  (?)
@ 2011-09-24 22:56                           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-09-24 22:56 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linux PM mailing list, Pavel Machek, Nigel Cunningham,
	Christoph Hellwig, Christoph, xfs, LKML, linux-ext4,
	Theodore Ts'o, linux-fsdevel

On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > Freeze all filesystems during the freezing of tasks by calling
> > freeze_bdev() for each of them and thaw them during the thawing
> > of tasks with the help of thaw_bdev().
> > 
> > This is needed by hibernation, because some filesystems (e.g. XFS)
> > deadlock with the preallocation of memory used by it if the memory
> > pressure caused by it is too heavy.
> > 
> > The additional benefit of this change is that, if something goes
> > wrong after filesystems have been frozen, they will stay in a
> > consistent state and journal replays won't be necessary (e.g. after
> > a failing suspend or resume).  In particular, this should help to
> > solve a long-standing issue that in some cases during resume from
> > hibernation the boot loader causes the journal to be replied for the
> > filesystem containing the kernel image and initrd causing it to
> > become inconsistent with the information stored in the hibernation
> > image.
> > 
> > This change is based on earlier work by Nigel Cunningham.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---

Below is an alternative fix, the changelog pretty much explains the idea.

I've tested it on Toshiba Portege R500, but I don't have an XFS partition
to verify that it really helps, so I'd appreciate it if someone able to
reproduce the original issue could test it and report back.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM / Hibernate: Freeze kernel threads after preallocating memory

There is a problem with the current ordering of hibernate code which
leads to deadlocks in some filesystems' memory shrinkers.  Namely,
some filesystems use freezable kernel threads that are inactive when
the hibernate memory preallocation is carried out.  Those same
filesystems use memory shrinkers that may be triggered by the
hibernate memory preallocation.  If those memory shrinkers wait for
the frozen kernel threads, the hibernate process deadlocks (this
happens with XFS, for one example).

Apparently, it is not technically viable to redesign the filesystems
in question to avoid the situation described above, so the only
possible solution of this issue is to defer the freezing of kernel
threads until the hibernate memory preallocation is done, which is
implemented by this change.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/freezer.h  |    4 +++-
 kernel/power/hibernate.c |   12 ++++++++----
 kernel/power/power.h     |    3 ++-
 kernel/power/process.c   |   30 ++++++++++++++++++++----------
 4 files changed, 33 insertions(+), 16 deletions(-)

Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -135,7 +135,7 @@ static int try_to_freeze_tasks(bool sig_
 }
 
 /**
- *	freeze_processes - tell processes to enter the refrigerator
+ * freeze_processes - Signal user space processes to enter the refrigerator.
  */
 int freeze_processes(void)
 {
@@ -143,20 +143,30 @@ int freeze_processes(void)
 
 	printk("Freezing user space processes ... ");
 	error = try_to_freeze_tasks(true);
-	if (error)
-		goto Exit;
-	printk("done.\n");
+	if (!error) {
+		printk("done.");
+		oom_killer_disable();
+	}
+	printk("\n");
+	BUG_ON(in_atomic());
+
+	return error;
+}
+
+/**
+ * freeze_kernel_threads - Make freezable kernel threads go to the refrigerator.
+ */
+int freeze_kernel_threads(void)
+{
+	int error;
 
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
-	if (error)
-		goto Exit;
-	printk("done.");
+	if (!error)
+		printk("done.");
 
-	oom_killer_disable();
- Exit:
-	BUG_ON(in_atomic());
 	printk("\n");
+	BUG_ON(in_atomic());
 
 	return error;
 }
Index: linux/include/linux/freezer.h
===================================================================
--- linux.orig/include/linux/freezer.h
+++ linux/include/linux/freezer.h
@@ -49,6 +49,7 @@ extern int thaw_process(struct task_stru
 
 extern void refrigerator(void);
 extern int freeze_processes(void);
+extern int freeze_kernel_threads(void);
 extern void thaw_processes(void);
 
 static inline int try_to_freeze(void)
@@ -171,7 +172,8 @@ static inline void clear_freeze_flag(str
 static inline int thaw_process(struct task_struct *p) { return 1; }
 
 static inline void refrigerator(void) {}
-static inline int freeze_processes(void) { BUG(); return 0; }
+static inline int freeze_processes(void) { return -ENOSYS; }
+static inline int freeze_kernel_threads(void) { return -ENOSYS; }
 static inline void thaw_processes(void) {}
 
 static inline int try_to_freeze(void) { return 0; }
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -228,7 +228,8 @@ extern int pm_test_level;
 #ifdef CONFIG_SUSPEND_FREEZER
 static inline int suspend_freeze_processes(void)
 {
-	return freeze_processes();
+	int error = freeze_processes();
+	return error ? : freeze_kernel_threads();
 }
 
 static inline void suspend_thaw_processes(void)
Index: linux/kernel/power/hibernate.c
===================================================================
--- linux.orig/kernel/power/hibernate.c
+++ linux/kernel/power/hibernate.c
@@ -334,13 +334,17 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		goto Close;
 
-	error = dpm_prepare(PMSG_FREEZE);
-	if (error)
-		goto Complete_devices;
-
 	/* Preallocate image memory before shutting down devices. */
 	error = hibernate_preallocate_memory();
 	if (error)
+		goto Close;
+
+	error = freeze_kernel_threads();
+	if (error)
+		goto Close;
+
+	error = dpm_prepare(PMSG_FREEZE);
+	if (error)
 		goto Complete_devices;
 
 	suspend_console();

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-09-24 22:56                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-09-24 22:56 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linux PM mailing list, Pavel Machek, Nigel Cunningham,
	Christoph Hellwig, Christoph, xfs, LKML, linux-ext4,
	Theodore Ts'o, linux-fsdevel

On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > Freeze all filesystems during the freezing of tasks by calling
> > freeze_bdev() for each of them and thaw them during the thawing
> > of tasks with the help of thaw_bdev().
> > 
> > This is needed by hibernation, because some filesystems (e.g. XFS)
> > deadlock with the preallocation of memory used by it if the memory
> > pressure caused by it is too heavy.
> > 
> > The additional benefit of this change is that, if something goes
> > wrong after filesystems have been frozen, they will stay in a
> > consistent state and journal replays won't be necessary (e.g. after
> > a failing suspend or resume).  In particular, this should help to
> > solve a long-standing issue that in some cases during resume from
> > hibernation the boot loader causes the journal to be replied for the
> > filesystem containing the kernel image and initrd causing it to
> > become inconsistent with the information stored in the hibernation
> > image.
> > 
> > This change is based on earlier work by Nigel Cunningham.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---

Below is an alternative fix, the changelog pretty much explains the idea.

I've tested it on Toshiba Portege R500, but I don't have an XFS partition
to verify that it really helps, so I'd appreciate it if someone able to
reproduce the original issue could test it and report back.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM / Hibernate: Freeze kernel threads after preallocating memory

There is a problem with the current ordering of hibernate code which
leads to deadlocks in some filesystems' memory shrinkers.  Namely,
some filesystems use freezable kernel threads that are inactive when
the hibernate memory preallocation is carried out.  Those same
filesystems use memory shrinkers that may be triggered by the
hibernate memory preallocation.  If those memory shrinkers wait for
the frozen kernel threads, the hibernate process deadlocks (this
happens with XFS, for one example).

Apparently, it is not technically viable to redesign the filesystems
in question to avoid the situation described above, so the only
possible solution of this issue is to defer the freezing of kernel
threads until the hibernate memory preallocation is done, which is
implemented by this change.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/freezer.h  |    4 +++-
 kernel/power/hibernate.c |   12 ++++++++----
 kernel/power/power.h     |    3 ++-
 kernel/power/process.c   |   30 ++++++++++++++++++++----------
 4 files changed, 33 insertions(+), 16 deletions(-)

Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -135,7 +135,7 @@ static int try_to_freeze_tasks(bool sig_
 }
 
 /**
- *	freeze_processes - tell processes to enter the refrigerator
+ * freeze_processes - Signal user space processes to enter the refrigerator.
  */
 int freeze_processes(void)
 {
@@ -143,20 +143,30 @@ int freeze_processes(void)
 
 	printk("Freezing user space processes ... ");
 	error = try_to_freeze_tasks(true);
-	if (error)
-		goto Exit;
-	printk("done.\n");
+	if (!error) {
+		printk("done.");
+		oom_killer_disable();
+	}
+	printk("\n");
+	BUG_ON(in_atomic());
+
+	return error;
+}
+
+/**
+ * freeze_kernel_threads - Make freezable kernel threads go to the refrigerator.
+ */
+int freeze_kernel_threads(void)
+{
+	int error;
 
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
-	if (error)
-		goto Exit;
-	printk("done.");
+	if (!error)
+		printk("done.");
 
-	oom_killer_disable();
- Exit:
-	BUG_ON(in_atomic());
 	printk("\n");
+	BUG_ON(in_atomic());
 
 	return error;
 }
Index: linux/include/linux/freezer.h
===================================================================
--- linux.orig/include/linux/freezer.h
+++ linux/include/linux/freezer.h
@@ -49,6 +49,7 @@ extern int thaw_process(struct task_stru
 
 extern void refrigerator(void);
 extern int freeze_processes(void);
+extern int freeze_kernel_threads(void);
 extern void thaw_processes(void);
 
 static inline int try_to_freeze(void)
@@ -171,7 +172,8 @@ static inline void clear_freeze_flag(str
 static inline int thaw_process(struct task_struct *p) { return 1; }
 
 static inline void refrigerator(void) {}
-static inline int freeze_processes(void) { BUG(); return 0; }
+static inline int freeze_processes(void) { return -ENOSYS; }
+static inline int freeze_kernel_threads(void) { return -ENOSYS; }
 static inline void thaw_processes(void) {}
 
 static inline int try_to_freeze(void) { return 0; }
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -228,7 +228,8 @@ extern int pm_test_level;
 #ifdef CONFIG_SUSPEND_FREEZER
 static inline int suspend_freeze_processes(void)
 {
-	return freeze_processes();
+	int error = freeze_processes();
+	return error ? : freeze_kernel_threads();
 }
 
 static inline void suspend_thaw_processes(void)
Index: linux/kernel/power/hibernate.c
===================================================================
--- linux.orig/kernel/power/hibernate.c
+++ linux/kernel/power/hibernate.c
@@ -334,13 +334,17 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		goto Close;
 
-	error = dpm_prepare(PMSG_FREEZE);
-	if (error)
-		goto Complete_devices;

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-09-24 22:56                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-09-24 22:56 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel,
	Linux PM mailing list, linux-ext4

On Sunday, August 07, 2011, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > 
> > Freeze all filesystems during the freezing of tasks by calling
> > freeze_bdev() for each of them and thaw them during the thawing
> > of tasks with the help of thaw_bdev().
> > 
> > This is needed by hibernation, because some filesystems (e.g. XFS)
> > deadlock with the preallocation of memory used by it if the memory
> > pressure caused by it is too heavy.
> > 
> > The additional benefit of this change is that, if something goes
> > wrong after filesystems have been frozen, they will stay in a
> > consistent state and journal replays won't be necessary (e.g. after
> > a failing suspend or resume).  In particular, this should help to
> > solve a long-standing issue that in some cases during resume from
> > hibernation the boot loader causes the journal to be replied for the
> > filesystem containing the kernel image and initrd causing it to
> > become inconsistent with the information stored in the hibernation
> > image.
> > 
> > This change is based on earlier work by Nigel Cunningham.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---

Below is an alternative fix, the changelog pretty much explains the idea.

I've tested it on Toshiba Portege R500, but I don't have an XFS partition
to verify that it really helps, so I'd appreciate it if someone able to
reproduce the original issue could test it and report back.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM / Hibernate: Freeze kernel threads after preallocating memory

There is a problem with the current ordering of hibernate code which
leads to deadlocks in some filesystems' memory shrinkers.  Namely,
some filesystems use freezable kernel threads that are inactive when
the hibernate memory preallocation is carried out.  Those same
filesystems use memory shrinkers that may be triggered by the
hibernate memory preallocation.  If those memory shrinkers wait for
the frozen kernel threads, the hibernate process deadlocks (this
happens with XFS, for one example).

Apparently, it is not technically viable to redesign the filesystems
in question to avoid the situation described above, so the only
possible solution of this issue is to defer the freezing of kernel
threads until the hibernate memory preallocation is done, which is
implemented by this change.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 include/linux/freezer.h  |    4 +++-
 kernel/power/hibernate.c |   12 ++++++++----
 kernel/power/power.h     |    3 ++-
 kernel/power/process.c   |   30 ++++++++++++++++++++----------
 4 files changed, 33 insertions(+), 16 deletions(-)

Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -135,7 +135,7 @@ static int try_to_freeze_tasks(bool sig_
 }
 
 /**
- *	freeze_processes - tell processes to enter the refrigerator
+ * freeze_processes - Signal user space processes to enter the refrigerator.
  */
 int freeze_processes(void)
 {
@@ -143,20 +143,30 @@ int freeze_processes(void)
 
 	printk("Freezing user space processes ... ");
 	error = try_to_freeze_tasks(true);
-	if (error)
-		goto Exit;
-	printk("done.\n");
+	if (!error) {
+		printk("done.");
+		oom_killer_disable();
+	}
+	printk("\n");
+	BUG_ON(in_atomic());
+
+	return error;
+}
+
+/**
+ * freeze_kernel_threads - Make freezable kernel threads go to the refrigerator.
+ */
+int freeze_kernel_threads(void)
+{
+	int error;
 
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
-	if (error)
-		goto Exit;
-	printk("done.");
+	if (!error)
+		printk("done.");
 
-	oom_killer_disable();
- Exit:
-	BUG_ON(in_atomic());
 	printk("\n");
+	BUG_ON(in_atomic());
 
 	return error;
 }
Index: linux/include/linux/freezer.h
===================================================================
--- linux.orig/include/linux/freezer.h
+++ linux/include/linux/freezer.h
@@ -49,6 +49,7 @@ extern int thaw_process(struct task_stru
 
 extern void refrigerator(void);
 extern int freeze_processes(void);
+extern int freeze_kernel_threads(void);
 extern void thaw_processes(void);
 
 static inline int try_to_freeze(void)
@@ -171,7 +172,8 @@ static inline void clear_freeze_flag(str
 static inline int thaw_process(struct task_struct *p) { return 1; }
 
 static inline void refrigerator(void) {}
-static inline int freeze_processes(void) { BUG(); return 0; }
+static inline int freeze_processes(void) { return -ENOSYS; }
+static inline int freeze_kernel_threads(void) { return -ENOSYS; }
 static inline void thaw_processes(void) {}
 
 static inline int try_to_freeze(void) { return 0; }
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -228,7 +228,8 @@ extern int pm_test_level;
 #ifdef CONFIG_SUSPEND_FREEZER
 static inline int suspend_freeze_processes(void)
 {
-	return freeze_processes();
+	int error = freeze_processes();
+	return error ? : freeze_kernel_threads();
 }
 
 static inline void suspend_thaw_processes(void)
Index: linux/kernel/power/hibernate.c
===================================================================
--- linux.orig/kernel/power/hibernate.c
+++ linux/kernel/power/hibernate.c
@@ -334,13 +334,17 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		goto Close;
 
-	error = dpm_prepare(PMSG_FREEZE);
-	if (error)
-		goto Complete_devices;
-
 	/* Preallocate image memory before shutting down devices. */
 	error = hibernate_preallocate_memory();
 	if (error)
+		goto Close;
+
+	error = freeze_kernel_threads();
+	if (error)
+		goto Close;
+
+	error = dpm_prepare(PMSG_FREEZE);
+	if (error)
 		goto Complete_devices;
 
 	suspend_console();

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-09-24 22:56                           ` Rafael J. Wysocki
@ 2011-09-25  5:32                             ` Nigel Cunningham
  -1 siblings, 0 replies; 68+ messages in thread
From: Nigel Cunningham @ 2011-09-25  5:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Dave Chinner, Linux PM mailing list, Pavel Machek,
	Christoph Hellwig, Christoph, xfs, LKML, linux-ext4,
	Theodore Ts'o, linux-fsdevel

Hi.

On 25/09/11 08:56, Rafael J. Wysocki wrote:
> On Sunday, August 07, 2011, Dave Chinner wrote:
>> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rjw@sisk.pl>
>>>
>>> Freeze all filesystems during the freezing of tasks by calling
>>> freeze_bdev() for each of them and thaw them during the thawing
>>> of tasks with the help of thaw_bdev().
>>>
>>> This is needed by hibernation, because some filesystems (e.g. XFS)
>>> deadlock with the preallocation of memory used by it if the memory
>>> pressure caused by it is too heavy.
>>>
>>> The additional benefit of this change is that, if something goes
>>> wrong after filesystems have been frozen, they will stay in a
>>> consistent state and journal replays won't be necessary (e.g. after
>>> a failing suspend or resume).  In particular, this should help to
>>> solve a long-standing issue that in some cases during resume from
>>> hibernation the boot loader causes the journal to be replied for the
>>> filesystem containing the kernel image and initrd causing it to
>>> become inconsistent with the information stored in the hibernation
>>> image.
>>>
>>> This change is based on earlier work by Nigel Cunningham.
>>>
>>> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
>>> ---
> 
> Below is an alternative fix, the changelog pretty much explains the idea.
> 
> I've tested it on Toshiba Portege R500, but I don't have an XFS partition
> to verify that it really helps, so I'd appreciate it if someone able to
> reproduce the original issue could test it and report back.
> 
> Thanks,
> Rafael
> 
> ---
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM / Hibernate: Freeze kernel threads after preallocating memory
> 
> There is a problem with the current ordering of hibernate code which
> leads to deadlocks in some filesystems' memory shrinkers.  Namely,
> some filesystems use freezable kernel threads that are inactive when
> the hibernate memory preallocation is carried out.  Those same
> filesystems use memory shrinkers that may be triggered by the
> hibernate memory preallocation.  If those memory shrinkers wait for
> the frozen kernel threads, the hibernate process deadlocks (this
> happens with XFS, for one example).
> 
> Apparently, it is not technically viable to redesign the filesystems
> in question to avoid the situation described above, so the only
> possible solution of this issue is to defer the freezing of kernel
> threads until the hibernate memory preallocation is done, which is
> implemented by this change.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

TuxOnIce has the following logic at the moment: Freeze all threads.
Calculate whether we have enough memory for the image, thaw kernel
threads only, free memory and try again if it looks like we don't have
enough. I've never heard of a deadlock, though I suppose one would be
possible if you had the added complication of userspace
drivers/filesystems - it would be good to be able to distinguish and
thaw them.

It does this prior to the atomic copy, using a user-supplied estimate of
the amount of memory drivers will need - the actual amount used is show
in debugging info at the end of the cycle. Apart from that, if you have
everything else frozen, everything else is pretty deterministic
(assuming you don't have any memory leaks in your image-writing code).

Regards,

Nigel
-- 
Evolution (n): A hypothetical process whereby improbable
events occur with alarming frequency, order arises from chaos, and
no one is given credit.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-09-25  5:32                             ` Nigel Cunningham
  0 siblings, 0 replies; 68+ messages in thread
From: Nigel Cunningham @ 2011-09-25  5:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Pavel Machek, linux-fsdevel, Linux PM mailing list, linux-ext4

Hi.

On 25/09/11 08:56, Rafael J. Wysocki wrote:
> On Sunday, August 07, 2011, Dave Chinner wrote:
>> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rjw@sisk.pl>
>>>
>>> Freeze all filesystems during the freezing of tasks by calling
>>> freeze_bdev() for each of them and thaw them during the thawing
>>> of tasks with the help of thaw_bdev().
>>>
>>> This is needed by hibernation, because some filesystems (e.g. XFS)
>>> deadlock with the preallocation of memory used by it if the memory
>>> pressure caused by it is too heavy.
>>>
>>> The additional benefit of this change is that, if something goes
>>> wrong after filesystems have been frozen, they will stay in a
>>> consistent state and journal replays won't be necessary (e.g. after
>>> a failing suspend or resume).  In particular, this should help to
>>> solve a long-standing issue that in some cases during resume from
>>> hibernation the boot loader causes the journal to be replied for the
>>> filesystem containing the kernel image and initrd causing it to
>>> become inconsistent with the information stored in the hibernation
>>> image.
>>>
>>> This change is based on earlier work by Nigel Cunningham.
>>>
>>> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
>>> ---
> 
> Below is an alternative fix, the changelog pretty much explains the idea.
> 
> I've tested it on Toshiba Portege R500, but I don't have an XFS partition
> to verify that it really helps, so I'd appreciate it if someone able to
> reproduce the original issue could test it and report back.
> 
> Thanks,
> Rafael
> 
> ---
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM / Hibernate: Freeze kernel threads after preallocating memory
> 
> There is a problem with the current ordering of hibernate code which
> leads to deadlocks in some filesystems' memory shrinkers.  Namely,
> some filesystems use freezable kernel threads that are inactive when
> the hibernate memory preallocation is carried out.  Those same
> filesystems use memory shrinkers that may be triggered by the
> hibernate memory preallocation.  If those memory shrinkers wait for
> the frozen kernel threads, the hibernate process deadlocks (this
> happens with XFS, for one example).
> 
> Apparently, it is not technically viable to redesign the filesystems
> in question to avoid the situation described above, so the only
> possible solution of this issue is to defer the freezing of kernel
> threads until the hibernate memory preallocation is done, which is
> implemented by this change.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

TuxOnIce has the following logic at the moment: Freeze all threads.
Calculate whether we have enough memory for the image, thaw kernel
threads only, free memory and try again if it looks like we don't have
enough. I've never heard of a deadlock, though I suppose one would be
possible if you had the added complication of userspace
drivers/filesystems - it would be good to be able to distinguish and
thaw them.

It does this prior to the atomic copy, using a user-supplied estimate of
the amount of memory drivers will need - the actual amount used is show
in debugging info at the end of the cycle. Apart from that, if you have
everything else frozen, everything else is pretty deterministic
(assuming you don't have any memory leaks in your image-writing code).

Regards,

Nigel
-- 
Evolution (n): A hypothetical process whereby improbable
events occur with alarming frequency, order arises from chaos, and
no one is given credit.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-09-24 22:56                           ` Rafael J. Wysocki
@ 2011-09-25 10:38                             ` Christoph
  -1 siblings, 0 replies; 68+ messages in thread
From: Christoph @ 2011-09-25 10:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Dave Chinner, Linux PM mailing list, Pavel Machek,
	Nigel Cunningham, Christoph Hellwig, xfs, LKML, linux-ext4,
	Theodore Ts'o, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 8179 bytes --]

test results of the patch below:

1. real machine

suspends fine but on wakeup, after loading image: hard reset.
nvidia gpu => disabled compitz  => wakeup worked two times.


2. virtualbox / stress test / xfs and ext4

on 3rd resume, it booted up "normal" like this:

[    3.351813] Freeing unused kernel memory: 568k freed
[    3.460973] Freeing unused kernel memory: 284k freed

[   17.328356] PM: Preparing processes for restore.

[   17.328357] Freezing user space processes ...
[   37.345414] Freezing of tasks failed after 20.01 seconds (1 tasks
refusing to freeze, wq_busy=0):
[   37.475244]  ffff88001f06fd68 0000000000000086 0000000000000000
0000000000000000
[   37.526163]  ffff88001f06e010 ffff88001f4c4410 0000000000012ec0
ffff88001f06ffd8
[   37.580110]  ffff88001f06ffd8 0000000000012ec0 ffffffff8160d020
ffff88001f4c4410
[   37.626167] Call Trace:
[   37.626769]  [<ffffffff81049944>] schedule+0x55/0x57
[   37.674925]  [<ffffffff81360dbe>] __mutex_lock_common+0x117/0x178
[   37.792559]  [<ffffffff81113ef2>] ? user_path_at+0x61/0x90
[   37.888501]  [<ffffffff81360e35>] __mutex_lock_slowpath+0x16/0x18
[   37.986966]  [<ffffffff81360efb>] mutex_lock+0x1e/0x32
[   38.086931]  [<ffffffffa00a4d43>] show_manufacturer+0x23/0x51 [usbcore]
[   38.212500]  [<ffffffff8125cd44>] dev_attr_show+0x22/0x49
[   38.282319]  [<ffffffff810c6f8c>] ? __get_free_pages+0x9/0x38
[   38.397449]  [<ffffffff8115febb>] sysfs_read_file+0xa9/0x12b
[   38.491607]  [<ffffffff81107ff6>] vfs_read+0xa6/0x102
[   38.541994]  [<ffffffff81105ebf>] ? do_sys_open+0xee/0x100
[   38.564907]  [<ffffffff8110810b>] sys_read+0x45/0x6c
[   38.578397]  [<ffffffff81368412>] system_call_fastpath+0x16/0x1b
[   38.590083]
[   38.598046] Restarting tasks ... done.
[   38.660448] XFS (sda3): Mounting Filesystem

restarted the test runs, increased delay between awake and sleep from 20
to 25 sec:

36 time successful hibernate+resume so far.


chris



On 25.09.2011 00:56, Rafael J. Wysocki wrote:
> On Sunday, August 07, 2011, Dave Chinner wrote:
>> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rjw@sisk.pl>
>>>
>>> Freeze all filesystems during the freezing of tasks by calling
>>> freeze_bdev() for each of them and thaw them during the thawing
>>> of tasks with the help of thaw_bdev().
>>>
>>> This is needed by hibernation, because some filesystems (e.g. XFS)
>>> deadlock with the preallocation of memory used by it if the memory
>>> pressure caused by it is too heavy.
>>>
>>> The additional benefit of this change is that, if something goes
>>> wrong after filesystems have been frozen, they will stay in a
>>> consistent state and journal replays won't be necessary (e.g. after
>>> a failing suspend or resume).  In particular, this should help to
>>> solve a long-standing issue that in some cases during resume from
>>> hibernation the boot loader causes the journal to be replied for the
>>> filesystem containing the kernel image and initrd causing it to
>>> become inconsistent with the information stored in the hibernation
>>> image.
>>>
>>> This change is based on earlier work by Nigel Cunningham.
>>>
>>> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
>>> ---
> 
> Below is an alternative fix, the changelog pretty much explains the idea.
> 
> I've tested it on Toshiba Portege R500, but I don't have an XFS partition
> to verify that it really helps, so I'd appreciate it if someone able to
> reproduce the original issue could test it and report back.
> 
> Thanks,
> Rafael
> 
> ---
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM / Hibernate: Freeze kernel threads after preallocating memory
> 
> There is a problem with the current ordering of hibernate code which
> leads to deadlocks in some filesystems' memory shrinkers.  Namely,
> some filesystems use freezable kernel threads that are inactive when
> the hibernate memory preallocation is carried out.  Those same
> filesystems use memory shrinkers that may be triggered by the
> hibernate memory preallocation.  If those memory shrinkers wait for
> the frozen kernel threads, the hibernate process deadlocks (this
> happens with XFS, for one example).
> 
> Apparently, it is not technically viable to redesign the filesystems
> in question to avoid the situation described above, so the only
> possible solution of this issue is to defer the freezing of kernel
> threads until the hibernate memory preallocation is done, which is
> implemented by this change.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  include/linux/freezer.h  |    4 +++-
>  kernel/power/hibernate.c |   12 ++++++++----
>  kernel/power/power.h     |    3 ++-
>  kernel/power/process.c   |   30 ++++++++++++++++++++----------
>  4 files changed, 33 insertions(+), 16 deletions(-)
> 
> Index: linux/kernel/power/process.c
> ===================================================================
> --- linux.orig/kernel/power/process.c
> +++ linux/kernel/power/process.c
> @@ -135,7 +135,7 @@ static int try_to_freeze_tasks(bool sig_
>  }
>  
>  /**
> - *	freeze_processes - tell processes to enter the refrigerator
> + * freeze_processes - Signal user space processes to enter the refrigerator.
>   */
>  int freeze_processes(void)
>  {
> @@ -143,20 +143,30 @@ int freeze_processes(void)
>  
>  	printk("Freezing user space processes ... ");
>  	error = try_to_freeze_tasks(true);
> -	if (error)
> -		goto Exit;
> -	printk("done.\n");
> +	if (!error) {
> +		printk("done.");
> +		oom_killer_disable();
> +	}
> +	printk("\n");
> +	BUG_ON(in_atomic());
> +
> +	return error;
> +}
> +
> +/**
> + * freeze_kernel_threads - Make freezable kernel threads go to the refrigerator.
> + */
> +int freeze_kernel_threads(void)
> +{
> +	int error;
>  
>  	printk("Freezing remaining freezable tasks ... ");
>  	error = try_to_freeze_tasks(false);
> -	if (error)
> -		goto Exit;
> -	printk("done.");
> +	if (!error)
> +		printk("done.");
>  
> -	oom_killer_disable();
> - Exit:
> -	BUG_ON(in_atomic());
>  	printk("\n");
> +	BUG_ON(in_atomic());
>  
>  	return error;
>  }
> Index: linux/include/linux/freezer.h
> ===================================================================
> --- linux.orig/include/linux/freezer.h
> +++ linux/include/linux/freezer.h
> @@ -49,6 +49,7 @@ extern int thaw_process(struct task_stru
>  
>  extern void refrigerator(void);
>  extern int freeze_processes(void);
> +extern int freeze_kernel_threads(void);
>  extern void thaw_processes(void);
>  
>  static inline int try_to_freeze(void)
> @@ -171,7 +172,8 @@ static inline void clear_freeze_flag(str
>  static inline int thaw_process(struct task_struct *p) { return 1; }
>  
>  static inline void refrigerator(void) {}
> -static inline int freeze_processes(void) { BUG(); return 0; }
> +static inline int freeze_processes(void) { return -ENOSYS; }
> +static inline int freeze_kernel_threads(void) { return -ENOSYS; }
>  static inline void thaw_processes(void) {}
>  
>  static inline int try_to_freeze(void) { return 0; }
> Index: linux/kernel/power/power.h
> ===================================================================
> --- linux.orig/kernel/power/power.h
> +++ linux/kernel/power/power.h
> @@ -228,7 +228,8 @@ extern int pm_test_level;
>  #ifdef CONFIG_SUSPEND_FREEZER
>  static inline int suspend_freeze_processes(void)
>  {
> -	return freeze_processes();
> +	int error = freeze_processes();
> +	return error ? : freeze_kernel_threads();
>  }
>  
>  static inline void suspend_thaw_processes(void)
> Index: linux/kernel/power/hibernate.c
> ===================================================================
> --- linux.orig/kernel/power/hibernate.c
> +++ linux/kernel/power/hibernate.c
> @@ -334,13 +334,17 @@ int hibernation_snapshot(int platform_mo
>  	if (error)
>  		goto Close;
>  
> -	error = dpm_prepare(PMSG_FREEZE);
> -	if (error)
> -		goto Complete_devices;
> -
>  	/* Preallocate image memory before shutting down devices. */
>  	error = hibernate_preallocate_memory();
>  	if (error)
> +		goto Close;
> +
> +	error = freeze_kernel_threads();
> +	if (error)
> +		goto Close;
> +
> +	error = dpm_prepare(PMSG_FREEZE);
> +	if (error)
>  		goto Complete_devices;
>  
>  	suspend_console();

[-- Attachment #2: run1-v3.1-p1.bz2 --]
[-- Type: application/octet-stream, Size: 21428 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-09-25 10:38                             ` Christoph
  0 siblings, 0 replies; 68+ messages in thread
From: Christoph @ 2011-09-25 10:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel,
	Linux PM mailing list, linux-ext4

[-- Attachment #1: Type: text/plain, Size: 8179 bytes --]

test results of the patch below:

1. real machine

suspends fine but on wakeup, after loading image: hard reset.
nvidia gpu => disabled compitz  => wakeup worked two times.


2. virtualbox / stress test / xfs and ext4

on 3rd resume, it booted up "normal" like this:

[    3.351813] Freeing unused kernel memory: 568k freed
[    3.460973] Freeing unused kernel memory: 284k freed

[   17.328356] PM: Preparing processes for restore.

[   17.328357] Freezing user space processes ...
[   37.345414] Freezing of tasks failed after 20.01 seconds (1 tasks
refusing to freeze, wq_busy=0):
[   37.475244]  ffff88001f06fd68 0000000000000086 0000000000000000
0000000000000000
[   37.526163]  ffff88001f06e010 ffff88001f4c4410 0000000000012ec0
ffff88001f06ffd8
[   37.580110]  ffff88001f06ffd8 0000000000012ec0 ffffffff8160d020
ffff88001f4c4410
[   37.626167] Call Trace:
[   37.626769]  [<ffffffff81049944>] schedule+0x55/0x57
[   37.674925]  [<ffffffff81360dbe>] __mutex_lock_common+0x117/0x178
[   37.792559]  [<ffffffff81113ef2>] ? user_path_at+0x61/0x90
[   37.888501]  [<ffffffff81360e35>] __mutex_lock_slowpath+0x16/0x18
[   37.986966]  [<ffffffff81360efb>] mutex_lock+0x1e/0x32
[   38.086931]  [<ffffffffa00a4d43>] show_manufacturer+0x23/0x51 [usbcore]
[   38.212500]  [<ffffffff8125cd44>] dev_attr_show+0x22/0x49
[   38.282319]  [<ffffffff810c6f8c>] ? __get_free_pages+0x9/0x38
[   38.397449]  [<ffffffff8115febb>] sysfs_read_file+0xa9/0x12b
[   38.491607]  [<ffffffff81107ff6>] vfs_read+0xa6/0x102
[   38.541994]  [<ffffffff81105ebf>] ? do_sys_open+0xee/0x100
[   38.564907]  [<ffffffff8110810b>] sys_read+0x45/0x6c
[   38.578397]  [<ffffffff81368412>] system_call_fastpath+0x16/0x1b
[   38.590083]
[   38.598046] Restarting tasks ... done.
[   38.660448] XFS (sda3): Mounting Filesystem

restarted the test runs, increased delay between awake and sleep from 20
to 25 sec:

36 time successful hibernate+resume so far.


chris



On 25.09.2011 00:56, Rafael J. Wysocki wrote:
> On Sunday, August 07, 2011, Dave Chinner wrote:
>> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rjw@sisk.pl>
>>>
>>> Freeze all filesystems during the freezing of tasks by calling
>>> freeze_bdev() for each of them and thaw them during the thawing
>>> of tasks with the help of thaw_bdev().
>>>
>>> This is needed by hibernation, because some filesystems (e.g. XFS)
>>> deadlock with the preallocation of memory used by it if the memory
>>> pressure caused by it is too heavy.
>>>
>>> The additional benefit of this change is that, if something goes
>>> wrong after filesystems have been frozen, they will stay in a
>>> consistent state and journal replays won't be necessary (e.g. after
>>> a failing suspend or resume).  In particular, this should help to
>>> solve a long-standing issue that in some cases during resume from
>>> hibernation the boot loader causes the journal to be replied for the
>>> filesystem containing the kernel image and initrd causing it to
>>> become inconsistent with the information stored in the hibernation
>>> image.
>>>
>>> This change is based on earlier work by Nigel Cunningham.
>>>
>>> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
>>> ---
> 
> Below is an alternative fix, the changelog pretty much explains the idea.
> 
> I've tested it on Toshiba Portege R500, but I don't have an XFS partition
> to verify that it really helps, so I'd appreciate it if someone able to
> reproduce the original issue could test it and report back.
> 
> Thanks,
> Rafael
> 
> ---
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM / Hibernate: Freeze kernel threads after preallocating memory
> 
> There is a problem with the current ordering of hibernate code which
> leads to deadlocks in some filesystems' memory shrinkers.  Namely,
> some filesystems use freezable kernel threads that are inactive when
> the hibernate memory preallocation is carried out.  Those same
> filesystems use memory shrinkers that may be triggered by the
> hibernate memory preallocation.  If those memory shrinkers wait for
> the frozen kernel threads, the hibernate process deadlocks (this
> happens with XFS, for one example).
> 
> Apparently, it is not technically viable to redesign the filesystems
> in question to avoid the situation described above, so the only
> possible solution of this issue is to defer the freezing of kernel
> threads until the hibernate memory preallocation is done, which is
> implemented by this change.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  include/linux/freezer.h  |    4 +++-
>  kernel/power/hibernate.c |   12 ++++++++----
>  kernel/power/power.h     |    3 ++-
>  kernel/power/process.c   |   30 ++++++++++++++++++++----------
>  4 files changed, 33 insertions(+), 16 deletions(-)
> 
> Index: linux/kernel/power/process.c
> ===================================================================
> --- linux.orig/kernel/power/process.c
> +++ linux/kernel/power/process.c
> @@ -135,7 +135,7 @@ static int try_to_freeze_tasks(bool sig_
>  }
>  
>  /**
> - *	freeze_processes - tell processes to enter the refrigerator
> + * freeze_processes - Signal user space processes to enter the refrigerator.
>   */
>  int freeze_processes(void)
>  {
> @@ -143,20 +143,30 @@ int freeze_processes(void)
>  
>  	printk("Freezing user space processes ... ");
>  	error = try_to_freeze_tasks(true);
> -	if (error)
> -		goto Exit;
> -	printk("done.\n");
> +	if (!error) {
> +		printk("done.");
> +		oom_killer_disable();
> +	}
> +	printk("\n");
> +	BUG_ON(in_atomic());
> +
> +	return error;
> +}
> +
> +/**
> + * freeze_kernel_threads - Make freezable kernel threads go to the refrigerator.
> + */
> +int freeze_kernel_threads(void)
> +{
> +	int error;
>  
>  	printk("Freezing remaining freezable tasks ... ");
>  	error = try_to_freeze_tasks(false);
> -	if (error)
> -		goto Exit;
> -	printk("done.");
> +	if (!error)
> +		printk("done.");
>  
> -	oom_killer_disable();
> - Exit:
> -	BUG_ON(in_atomic());
>  	printk("\n");
> +	BUG_ON(in_atomic());
>  
>  	return error;
>  }
> Index: linux/include/linux/freezer.h
> ===================================================================
> --- linux.orig/include/linux/freezer.h
> +++ linux/include/linux/freezer.h
> @@ -49,6 +49,7 @@ extern int thaw_process(struct task_stru
>  
>  extern void refrigerator(void);
>  extern int freeze_processes(void);
> +extern int freeze_kernel_threads(void);
>  extern void thaw_processes(void);
>  
>  static inline int try_to_freeze(void)
> @@ -171,7 +172,8 @@ static inline void clear_freeze_flag(str
>  static inline int thaw_process(struct task_struct *p) { return 1; }
>  
>  static inline void refrigerator(void) {}
> -static inline int freeze_processes(void) { BUG(); return 0; }
> +static inline int freeze_processes(void) { return -ENOSYS; }
> +static inline int freeze_kernel_threads(void) { return -ENOSYS; }
>  static inline void thaw_processes(void) {}
>  
>  static inline int try_to_freeze(void) { return 0; }
> Index: linux/kernel/power/power.h
> ===================================================================
> --- linux.orig/kernel/power/power.h
> +++ linux/kernel/power/power.h
> @@ -228,7 +228,8 @@ extern int pm_test_level;
>  #ifdef CONFIG_SUSPEND_FREEZER
>  static inline int suspend_freeze_processes(void)
>  {
> -	return freeze_processes();
> +	int error = freeze_processes();
> +	return error ? : freeze_kernel_threads();
>  }
>  
>  static inline void suspend_thaw_processes(void)
> Index: linux/kernel/power/hibernate.c
> ===================================================================
> --- linux.orig/kernel/power/hibernate.c
> +++ linux/kernel/power/hibernate.c
> @@ -334,13 +334,17 @@ int hibernation_snapshot(int platform_mo
>  	if (error)
>  		goto Close;
>  
> -	error = dpm_prepare(PMSG_FREEZE);
> -	if (error)
> -		goto Complete_devices;
> -
>  	/* Preallocate image memory before shutting down devices. */
>  	error = hibernate_preallocate_memory();
>  	if (error)
> +		goto Close;
> +
> +	error = freeze_kernel_threads();
> +	if (error)
> +		goto Close;
> +
> +	error = dpm_prepare(PMSG_FREEZE);
> +	if (error)
>  		goto Complete_devices;
>  
>  	suspend_console();

[-- Attachment #2: run1-v3.1-p1.bz2 --]
[-- Type: application/octet-stream, Size: 21428 bytes --]

[-- Attachment #3: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-09-25 10:38                             ` Christoph
@ 2011-09-25 13:32                               ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-09-25 13:32 UTC (permalink / raw)
  To: Christoph
  Cc: Dave Chinner, Linux PM mailing list, Pavel Machek,
	Nigel Cunningham, Christoph Hellwig, xfs, LKML, linux-ext4,
	Theodore Ts'o, linux-fsdevel

On Sunday, September 25, 2011, Christoph wrote:
> test results of the patch below:
> 
> 1. real machine
> 
> suspends fine but on wakeup, after loading image: hard reset.
> nvidia gpu => disabled compitz  => wakeup worked two times.

Hmm, so there's a separate bug related to NVidia I guess.

> 2. virtualbox / stress test / xfs and ext4
> 
> on 3rd resume, it booted up "normal" like this:
> 
> [    3.351813] Freeing unused kernel memory: 568k freed
> [    3.460973] Freeing unused kernel memory: 284k freed
> 
> [   17.328356] PM: Preparing processes for restore.
> 
> [   17.328357] Freezing user space processes ...
> [   37.345414] Freezing of tasks failed after 20.01 seconds (1 tasks
> refusing to freeze, wq_busy=0):
> [   37.475244]  ffff88001f06fd68 0000000000000086 0000000000000000
> 0000000000000000
> [   37.526163]  ffff88001f06e010 ffff88001f4c4410 0000000000012ec0
> ffff88001f06ffd8
> [   37.580110]  ffff88001f06ffd8 0000000000012ec0 ffffffff8160d020
> ffff88001f4c4410
> [   37.626167] Call Trace:
> [   37.626769]  [<ffffffff81049944>] schedule+0x55/0x57
> [   37.674925]  [<ffffffff81360dbe>] __mutex_lock_common+0x117/0x178
> [   37.792559]  [<ffffffff81113ef2>] ? user_path_at+0x61/0x90
> [   37.888501]  [<ffffffff81360e35>] __mutex_lock_slowpath+0x16/0x18
> [   37.986966]  [<ffffffff81360efb>] mutex_lock+0x1e/0x32
> [   38.086931]  [<ffffffffa00a4d43>] show_manufacturer+0x23/0x51 [usbcore]
> [   38.212500]  [<ffffffff8125cd44>] dev_attr_show+0x22/0x49
> [   38.282319]  [<ffffffff810c6f8c>] ? __get_free_pages+0x9/0x38
> [   38.397449]  [<ffffffff8115febb>] sysfs_read_file+0xa9/0x12b
> [   38.491607]  [<ffffffff81107ff6>] vfs_read+0xa6/0x102
> [   38.541994]  [<ffffffff81105ebf>] ? do_sys_open+0xee/0x100
> [   38.564907]  [<ffffffff8110810b>] sys_read+0x45/0x6c
> [   38.578397]  [<ffffffff81368412>] system_call_fastpath+0x16/0x1b
> [   38.590083]
> [   38.598046] Restarting tasks ... done.
> [   38.660448] XFS (sda3): Mounting Filesystem
> 
> restarted the test runs, increased delay between awake and sleep from 20
> to 25 sec:
> 
> 36 time successful hibernate+resume so far.

OK, cool.  Thanks for testing!

Rafael


> On 25.09.2011 00:56, Rafael J. Wysocki wrote:
> > On Sunday, August 07, 2011, Dave Chinner wrote:
> >> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <rjw@sisk.pl>
> >>>
> >>> Freeze all filesystems during the freezing of tasks by calling
> >>> freeze_bdev() for each of them and thaw them during the thawing
> >>> of tasks with the help of thaw_bdev().
> >>>
> >>> This is needed by hibernation, because some filesystems (e.g. XFS)
> >>> deadlock with the preallocation of memory used by it if the memory
> >>> pressure caused by it is too heavy.
> >>>
> >>> The additional benefit of this change is that, if something goes
> >>> wrong after filesystems have been frozen, they will stay in a
> >>> consistent state and journal replays won't be necessary (e.g. after
> >>> a failing suspend or resume).  In particular, this should help to
> >>> solve a long-standing issue that in some cases during resume from
> >>> hibernation the boot loader causes the journal to be replied for the
> >>> filesystem containing the kernel image and initrd causing it to
> >>> become inconsistent with the information stored in the hibernation
> >>> image.
> >>>
> >>> This change is based on earlier work by Nigel Cunningham.
> >>>
> >>> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> >>> ---
> > 
> > Below is an alternative fix, the changelog pretty much explains the idea.
> > 
> > I've tested it on Toshiba Portege R500, but I don't have an XFS partition
> > to verify that it really helps, so I'd appreciate it if someone able to
> > reproduce the original issue could test it and report back.
> > 
> > Thanks,
> > Rafael
> > 
> > ---
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM / Hibernate: Freeze kernel threads after preallocating memory
> > 
> > There is a problem with the current ordering of hibernate code which
> > leads to deadlocks in some filesystems' memory shrinkers.  Namely,
> > some filesystems use freezable kernel threads that are inactive when
> > the hibernate memory preallocation is carried out.  Those same
> > filesystems use memory shrinkers that may be triggered by the
> > hibernate memory preallocation.  If those memory shrinkers wait for
> > the frozen kernel threads, the hibernate process deadlocks (this
> > happens with XFS, for one example).
> > 
> > Apparently, it is not technically viable to redesign the filesystems
> > in question to avoid the situation described above, so the only
> > possible solution of this issue is to defer the freezing of kernel
> > threads until the hibernate memory preallocation is done, which is
> > implemented by this change.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> >  include/linux/freezer.h  |    4 +++-
> >  kernel/power/hibernate.c |   12 ++++++++----
> >  kernel/power/power.h     |    3 ++-
> >  kernel/power/process.c   |   30 ++++++++++++++++++++----------
> >  4 files changed, 33 insertions(+), 16 deletions(-)
> > 
> > Index: linux/kernel/power/process.c
> > ===================================================================
> > --- linux.orig/kernel/power/process.c
> > +++ linux/kernel/power/process.c
> > @@ -135,7 +135,7 @@ static int try_to_freeze_tasks(bool sig_
> >  }
> >  
> >  /**
> > - *	freeze_processes - tell processes to enter the refrigerator
> > + * freeze_processes - Signal user space processes to enter the refrigerator.
> >   */
> >  int freeze_processes(void)
> >  {
> > @@ -143,20 +143,30 @@ int freeze_processes(void)
> >  
> >  	printk("Freezing user space processes ... ");
> >  	error = try_to_freeze_tasks(true);
> > -	if (error)
> > -		goto Exit;
> > -	printk("done.\n");
> > +	if (!error) {
> > +		printk("done.");
> > +		oom_killer_disable();
> > +	}
> > +	printk("\n");
> > +	BUG_ON(in_atomic());
> > +
> > +	return error;
> > +}
> > +
> > +/**
> > + * freeze_kernel_threads - Make freezable kernel threads go to the refrigerator.
> > + */
> > +int freeze_kernel_threads(void)
> > +{
> > +	int error;
> >  
> >  	printk("Freezing remaining freezable tasks ... ");
> >  	error = try_to_freeze_tasks(false);
> > -	if (error)
> > -		goto Exit;
> > -	printk("done.");
> > +	if (!error)
> > +		printk("done.");
> >  
> > -	oom_killer_disable();
> > - Exit:
> > -	BUG_ON(in_atomic());
> >  	printk("\n");
> > +	BUG_ON(in_atomic());
> >  
> >  	return error;
> >  }
> > Index: linux/include/linux/freezer.h
> > ===================================================================
> > --- linux.orig/include/linux/freezer.h
> > +++ linux/include/linux/freezer.h
> > @@ -49,6 +49,7 @@ extern int thaw_process(struct task_stru
> >  
> >  extern void refrigerator(void);
> >  extern int freeze_processes(void);
> > +extern int freeze_kernel_threads(void);
> >  extern void thaw_processes(void);
> >  
> >  static inline int try_to_freeze(void)
> > @@ -171,7 +172,8 @@ static inline void clear_freeze_flag(str
> >  static inline int thaw_process(struct task_struct *p) { return 1; }
> >  
> >  static inline void refrigerator(void) {}
> > -static inline int freeze_processes(void) { BUG(); return 0; }
> > +static inline int freeze_processes(void) { return -ENOSYS; }
> > +static inline int freeze_kernel_threads(void) { return -ENOSYS; }
> >  static inline void thaw_processes(void) {}
> >  
> >  static inline int try_to_freeze(void) { return 0; }
> > Index: linux/kernel/power/power.h
> > ===================================================================
> > --- linux.orig/kernel/power/power.h
> > +++ linux/kernel/power/power.h
> > @@ -228,7 +228,8 @@ extern int pm_test_level;
> >  #ifdef CONFIG_SUSPEND_FREEZER
> >  static inline int suspend_freeze_processes(void)
> >  {
> > -	return freeze_processes();
> > +	int error = freeze_processes();
> > +	return error ? : freeze_kernel_threads();
> >  }
> >  
> >  static inline void suspend_thaw_processes(void)
> > Index: linux/kernel/power/hibernate.c
> > ===================================================================
> > --- linux.orig/kernel/power/hibernate.c
> > +++ linux/kernel/power/hibernate.c
> > @@ -334,13 +334,17 @@ int hibernation_snapshot(int platform_mo
> >  	if (error)
> >  		goto Close;
> >  
> > -	error = dpm_prepare(PMSG_FREEZE);
> > -	if (error)
> > -		goto Complete_devices;
> > -
> >  	/* Preallocate image memory before shutting down devices. */
> >  	error = hibernate_preallocate_memory();
> >  	if (error)
> > +		goto Close;
> > +
> > +	error = freeze_kernel_threads();
> > +	if (error)
> > +		goto Close;
> > +
> > +	error = dpm_prepare(PMSG_FREEZE);
> > +	if (error)
> >  		goto Complete_devices;
> >  
> >  	suspend_console();
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-09-25 13:32                               ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-09-25 13:32 UTC (permalink / raw)
  To: Christoph
  Cc: Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel,
	Linux PM mailing list, linux-ext4

On Sunday, September 25, 2011, Christoph wrote:
> test results of the patch below:
> 
> 1. real machine
> 
> suspends fine but on wakeup, after loading image: hard reset.
> nvidia gpu => disabled compitz  => wakeup worked two times.

Hmm, so there's a separate bug related to NVidia I guess.

> 2. virtualbox / stress test / xfs and ext4
> 
> on 3rd resume, it booted up "normal" like this:
> 
> [    3.351813] Freeing unused kernel memory: 568k freed
> [    3.460973] Freeing unused kernel memory: 284k freed
> 
> [   17.328356] PM: Preparing processes for restore.
> 
> [   17.328357] Freezing user space processes ...
> [   37.345414] Freezing of tasks failed after 20.01 seconds (1 tasks
> refusing to freeze, wq_busy=0):
> [   37.475244]  ffff88001f06fd68 0000000000000086 0000000000000000
> 0000000000000000
> [   37.526163]  ffff88001f06e010 ffff88001f4c4410 0000000000012ec0
> ffff88001f06ffd8
> [   37.580110]  ffff88001f06ffd8 0000000000012ec0 ffffffff8160d020
> ffff88001f4c4410
> [   37.626167] Call Trace:
> [   37.626769]  [<ffffffff81049944>] schedule+0x55/0x57
> [   37.674925]  [<ffffffff81360dbe>] __mutex_lock_common+0x117/0x178
> [   37.792559]  [<ffffffff81113ef2>] ? user_path_at+0x61/0x90
> [   37.888501]  [<ffffffff81360e35>] __mutex_lock_slowpath+0x16/0x18
> [   37.986966]  [<ffffffff81360efb>] mutex_lock+0x1e/0x32
> [   38.086931]  [<ffffffffa00a4d43>] show_manufacturer+0x23/0x51 [usbcore]
> [   38.212500]  [<ffffffff8125cd44>] dev_attr_show+0x22/0x49
> [   38.282319]  [<ffffffff810c6f8c>] ? __get_free_pages+0x9/0x38
> [   38.397449]  [<ffffffff8115febb>] sysfs_read_file+0xa9/0x12b
> [   38.491607]  [<ffffffff81107ff6>] vfs_read+0xa6/0x102
> [   38.541994]  [<ffffffff81105ebf>] ? do_sys_open+0xee/0x100
> [   38.564907]  [<ffffffff8110810b>] sys_read+0x45/0x6c
> [   38.578397]  [<ffffffff81368412>] system_call_fastpath+0x16/0x1b
> [   38.590083]
> [   38.598046] Restarting tasks ... done.
> [   38.660448] XFS (sda3): Mounting Filesystem
> 
> restarted the test runs, increased delay between awake and sleep from 20
> to 25 sec:
> 
> 36 time successful hibernate+resume so far.

OK, cool.  Thanks for testing!

Rafael


> On 25.09.2011 00:56, Rafael J. Wysocki wrote:
> > On Sunday, August 07, 2011, Dave Chinner wrote:
> >> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <rjw@sisk.pl>
> >>>
> >>> Freeze all filesystems during the freezing of tasks by calling
> >>> freeze_bdev() for each of them and thaw them during the thawing
> >>> of tasks with the help of thaw_bdev().
> >>>
> >>> This is needed by hibernation, because some filesystems (e.g. XFS)
> >>> deadlock with the preallocation of memory used by it if the memory
> >>> pressure caused by it is too heavy.
> >>>
> >>> The additional benefit of this change is that, if something goes
> >>> wrong after filesystems have been frozen, they will stay in a
> >>> consistent state and journal replays won't be necessary (e.g. after
> >>> a failing suspend or resume).  In particular, this should help to
> >>> solve a long-standing issue that in some cases during resume from
> >>> hibernation the boot loader causes the journal to be replied for the
> >>> filesystem containing the kernel image and initrd causing it to
> >>> become inconsistent with the information stored in the hibernation
> >>> image.
> >>>
> >>> This change is based on earlier work by Nigel Cunningham.
> >>>
> >>> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> >>> ---
> > 
> > Below is an alternative fix, the changelog pretty much explains the idea.
> > 
> > I've tested it on Toshiba Portege R500, but I don't have an XFS partition
> > to verify that it really helps, so I'd appreciate it if someone able to
> > reproduce the original issue could test it and report back.
> > 
> > Thanks,
> > Rafael
> > 
> > ---
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM / Hibernate: Freeze kernel threads after preallocating memory
> > 
> > There is a problem with the current ordering of hibernate code which
> > leads to deadlocks in some filesystems' memory shrinkers.  Namely,
> > some filesystems use freezable kernel threads that are inactive when
> > the hibernate memory preallocation is carried out.  Those same
> > filesystems use memory shrinkers that may be triggered by the
> > hibernate memory preallocation.  If those memory shrinkers wait for
> > the frozen kernel threads, the hibernate process deadlocks (this
> > happens with XFS, for one example).
> > 
> > Apparently, it is not technically viable to redesign the filesystems
> > in question to avoid the situation described above, so the only
> > possible solution of this issue is to defer the freezing of kernel
> > threads until the hibernate memory preallocation is done, which is
> > implemented by this change.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> >  include/linux/freezer.h  |    4 +++-
> >  kernel/power/hibernate.c |   12 ++++++++----
> >  kernel/power/power.h     |    3 ++-
> >  kernel/power/process.c   |   30 ++++++++++++++++++++----------
> >  4 files changed, 33 insertions(+), 16 deletions(-)
> > 
> > Index: linux/kernel/power/process.c
> > ===================================================================
> > --- linux.orig/kernel/power/process.c
> > +++ linux/kernel/power/process.c
> > @@ -135,7 +135,7 @@ static int try_to_freeze_tasks(bool sig_
> >  }
> >  
> >  /**
> > - *	freeze_processes - tell processes to enter the refrigerator
> > + * freeze_processes - Signal user space processes to enter the refrigerator.
> >   */
> >  int freeze_processes(void)
> >  {
> > @@ -143,20 +143,30 @@ int freeze_processes(void)
> >  
> >  	printk("Freezing user space processes ... ");
> >  	error = try_to_freeze_tasks(true);
> > -	if (error)
> > -		goto Exit;
> > -	printk("done.\n");
> > +	if (!error) {
> > +		printk("done.");
> > +		oom_killer_disable();
> > +	}
> > +	printk("\n");
> > +	BUG_ON(in_atomic());
> > +
> > +	return error;
> > +}
> > +
> > +/**
> > + * freeze_kernel_threads - Make freezable kernel threads go to the refrigerator.
> > + */
> > +int freeze_kernel_threads(void)
> > +{
> > +	int error;
> >  
> >  	printk("Freezing remaining freezable tasks ... ");
> >  	error = try_to_freeze_tasks(false);
> > -	if (error)
> > -		goto Exit;
> > -	printk("done.");
> > +	if (!error)
> > +		printk("done.");
> >  
> > -	oom_killer_disable();
> > - Exit:
> > -	BUG_ON(in_atomic());
> >  	printk("\n");
> > +	BUG_ON(in_atomic());
> >  
> >  	return error;
> >  }
> > Index: linux/include/linux/freezer.h
> > ===================================================================
> > --- linux.orig/include/linux/freezer.h
> > +++ linux/include/linux/freezer.h
> > @@ -49,6 +49,7 @@ extern int thaw_process(struct task_stru
> >  
> >  extern void refrigerator(void);
> >  extern int freeze_processes(void);
> > +extern int freeze_kernel_threads(void);
> >  extern void thaw_processes(void);
> >  
> >  static inline int try_to_freeze(void)
> > @@ -171,7 +172,8 @@ static inline void clear_freeze_flag(str
> >  static inline int thaw_process(struct task_struct *p) { return 1; }
> >  
> >  static inline void refrigerator(void) {}
> > -static inline int freeze_processes(void) { BUG(); return 0; }
> > +static inline int freeze_processes(void) { return -ENOSYS; }
> > +static inline int freeze_kernel_threads(void) { return -ENOSYS; }
> >  static inline void thaw_processes(void) {}
> >  
> >  static inline int try_to_freeze(void) { return 0; }
> > Index: linux/kernel/power/power.h
> > ===================================================================
> > --- linux.orig/kernel/power/power.h
> > +++ linux/kernel/power/power.h
> > @@ -228,7 +228,8 @@ extern int pm_test_level;
> >  #ifdef CONFIG_SUSPEND_FREEZER
> >  static inline int suspend_freeze_processes(void)
> >  {
> > -	return freeze_processes();
> > +	int error = freeze_processes();
> > +	return error ? : freeze_kernel_threads();
> >  }
> >  
> >  static inline void suspend_thaw_processes(void)
> > Index: linux/kernel/power/hibernate.c
> > ===================================================================
> > --- linux.orig/kernel/power/hibernate.c
> > +++ linux/kernel/power/hibernate.c
> > @@ -334,13 +334,17 @@ int hibernation_snapshot(int platform_mo
> >  	if (error)
> >  		goto Close;
> >  
> > -	error = dpm_prepare(PMSG_FREEZE);
> > -	if (error)
> > -		goto Complete_devices;
> > -
> >  	/* Preallocate image memory before shutting down devices. */
> >  	error = hibernate_preallocate_memory();
> >  	if (error)
> > +		goto Close;
> > +
> > +	error = freeze_kernel_threads();
> > +	if (error)
> > +		goto Close;
> > +
> > +	error = dpm_prepare(PMSG_FREEZE);
> > +	if (error)
> >  		goto Complete_devices;
> >  
> >  	suspend_console();
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-09-25  5:32                             ` Nigel Cunningham
@ 2011-09-25 13:37                               ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-09-25 13:37 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Dave Chinner, Linux PM mailing list, Pavel Machek,
	Christoph Hellwig, Christoph, xfs, LKML, linux-ext4,
	Theodore Ts'o, linux-fsdevel

On Sunday, September 25, 2011, Nigel Cunningham wrote:
> Hi.
> 
> On 25/09/11 08:56, Rafael J. Wysocki wrote:
> > On Sunday, August 07, 2011, Dave Chinner wrote:
> >> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <rjw@sisk.pl>
> >>>
> >>> Freeze all filesystems during the freezing of tasks by calling
> >>> freeze_bdev() for each of them and thaw them during the thawing
> >>> of tasks with the help of thaw_bdev().
> >>>
> >>> This is needed by hibernation, because some filesystems (e.g. XFS)
> >>> deadlock with the preallocation of memory used by it if the memory
> >>> pressure caused by it is too heavy.
> >>>
> >>> The additional benefit of this change is that, if something goes
> >>> wrong after filesystems have been frozen, they will stay in a
> >>> consistent state and journal replays won't be necessary (e.g. after
> >>> a failing suspend or resume).  In particular, this should help to
> >>> solve a long-standing issue that in some cases during resume from
> >>> hibernation the boot loader causes the journal to be replied for the
> >>> filesystem containing the kernel image and initrd causing it to
> >>> become inconsistent with the information stored in the hibernation
> >>> image.
> >>>
> >>> This change is based on earlier work by Nigel Cunningham.
> >>>
> >>> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> >>> ---
> > 
> > Below is an alternative fix, the changelog pretty much explains the idea.
> > 
> > I've tested it on Toshiba Portege R500, but I don't have an XFS partition
> > to verify that it really helps, so I'd appreciate it if someone able to
> > reproduce the original issue could test it and report back.
> > 
> > Thanks,
> > Rafael
> > 
> > ---
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM / Hibernate: Freeze kernel threads after preallocating memory
> > 
> > There is a problem with the current ordering of hibernate code which
> > leads to deadlocks in some filesystems' memory shrinkers.  Namely,
> > some filesystems use freezable kernel threads that are inactive when
> > the hibernate memory preallocation is carried out.  Those same
> > filesystems use memory shrinkers that may be triggered by the
> > hibernate memory preallocation.  If those memory shrinkers wait for
> > the frozen kernel threads, the hibernate process deadlocks (this
> > happens with XFS, for one example).
> > 
> > Apparently, it is not technically viable to redesign the filesystems
> > in question to avoid the situation described above, so the only
> > possible solution of this issue is to defer the freezing of kernel
> > threads until the hibernate memory preallocation is done, which is
> > implemented by this change.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> 
> TuxOnIce has the following logic at the moment: Freeze all threads.
> Calculate whether we have enough memory for the image, thaw kernel
> threads only, free memory and try again if it looks like we don't have
> enough.

Well, it seems that the freezing of kernel threads in the first step
is not necessary.  You can do (1) freeze user space, (2) check if there's
enough free memory, (3) free memory if necessary, (4) freeze kernel
threads instead.  Which is what my patch does, actually. :-)

> I've never heard of a deadlock, though I suppose one would be
> possible if you had the added complication of userspace
> drivers/filesystems - it would be good to be able to distinguish and
> thaw them.

Yes, there is a known problem with FUSE in that area.

> It does this prior to the atomic copy, using a user-supplied estimate of
> the amount of memory drivers will need - the actual amount used is show
> in debugging info at the end of the cycle. Apart from that, if you have
> everything else frozen, everything else is pretty deterministic
> (assuming you don't have any memory leaks in your image-writing code).

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-09-25 13:37                               ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-09-25 13:37 UTC (permalink / raw)
  To: Nigel Cunningham
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Pavel Machek, linux-fsdevel, Linux PM mailing list, linux-ext4

On Sunday, September 25, 2011, Nigel Cunningham wrote:
> Hi.
> 
> On 25/09/11 08:56, Rafael J. Wysocki wrote:
> > On Sunday, August 07, 2011, Dave Chinner wrote:
> >> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <rjw@sisk.pl>
> >>>
> >>> Freeze all filesystems during the freezing of tasks by calling
> >>> freeze_bdev() for each of them and thaw them during the thawing
> >>> of tasks with the help of thaw_bdev().
> >>>
> >>> This is needed by hibernation, because some filesystems (e.g. XFS)
> >>> deadlock with the preallocation of memory used by it if the memory
> >>> pressure caused by it is too heavy.
> >>>
> >>> The additional benefit of this change is that, if something goes
> >>> wrong after filesystems have been frozen, they will stay in a
> >>> consistent state and journal replays won't be necessary (e.g. after
> >>> a failing suspend or resume).  In particular, this should help to
> >>> solve a long-standing issue that in some cases during resume from
> >>> hibernation the boot loader causes the journal to be replied for the
> >>> filesystem containing the kernel image and initrd causing it to
> >>> become inconsistent with the information stored in the hibernation
> >>> image.
> >>>
> >>> This change is based on earlier work by Nigel Cunningham.
> >>>
> >>> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> >>> ---
> > 
> > Below is an alternative fix, the changelog pretty much explains the idea.
> > 
> > I've tested it on Toshiba Portege R500, but I don't have an XFS partition
> > to verify that it really helps, so I'd appreciate it if someone able to
> > reproduce the original issue could test it and report back.
> > 
> > Thanks,
> > Rafael
> > 
> > ---
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> > Subject: PM / Hibernate: Freeze kernel threads after preallocating memory
> > 
> > There is a problem with the current ordering of hibernate code which
> > leads to deadlocks in some filesystems' memory shrinkers.  Namely,
> > some filesystems use freezable kernel threads that are inactive when
> > the hibernate memory preallocation is carried out.  Those same
> > filesystems use memory shrinkers that may be triggered by the
> > hibernate memory preallocation.  If those memory shrinkers wait for
> > the frozen kernel threads, the hibernate process deadlocks (this
> > happens with XFS, for one example).
> > 
> > Apparently, it is not technically viable to redesign the filesystems
> > in question to avoid the situation described above, so the only
> > possible solution of this issue is to defer the freezing of kernel
> > threads until the hibernate memory preallocation is done, which is
> > implemented by this change.
> > 
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> 
> TuxOnIce has the following logic at the moment: Freeze all threads.
> Calculate whether we have enough memory for the image, thaw kernel
> threads only, free memory and try again if it looks like we don't have
> enough.

Well, it seems that the freezing of kernel threads in the first step
is not necessary.  You can do (1) freeze user space, (2) check if there's
enough free memory, (3) free memory if necessary, (4) freeze kernel
threads instead.  Which is what my patch does, actually. :-)

> I've never heard of a deadlock, though I suppose one would be
> possible if you had the added complication of userspace
> drivers/filesystems - it would be good to be able to distinguish and
> thaw them.

Yes, there is a known problem with FUSE in that area.

> It does this prior to the atomic copy, using a user-supplied estimate of
> the amount of memory drivers will need - the actual amount used is show
> in debugging info at the end of the cycle. Apart from that, if you have
> everything else frozen, everything else is pretty deterministic
> (assuming you don't have any memory leaks in your image-writing code).

Thanks,
Rafael

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Update][PATCH] PM / Hibernate: Freeze kernel threads after preallocating memory
  2011-09-24 22:56                           ` Rafael J. Wysocki
@ 2011-09-25 13:40                             ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-09-25 13:40 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linux PM mailing list, Pavel Machek, Nigel Cunningham,
	Christoph Hellwig, Christoph, xfs, LKML, linux-ext4,
	Theodore Ts'o, linux-fsdevel

From: Rafael J. Wysocki <rjw@sisk.pl>

There is a problem with the current ordering of hibernate code which
leads to deadlocks in some filesystems' memory shrinkers.  Namely,
some filesystems use freezable kernel threads that are inactive when
the hibernate memory preallocation is carried out.  Those same
filesystems use memory shrinkers that may be triggered by the
hibernate memory preallocation.  If those memory shrinkers wait for
the frozen kernel threads, the hibernate process deadlocks (this
happens with XFS, for one example).

Apparently, it is not technically viable to redesign the filesystems
in question to avoid the situation described above, so the only
possible solution of this issue is to defer the freezing of kernel
threads until the hibernate memory preallocation is done, which is
implemented by this change.

Unfortunately, this requires the memory preallocation to be done
before the "prepare" stage of device freeze, so after this change the
only way drivers can allocate additional memory for their freeze
routines in a clean way is to use PM notifiers.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/power/devices.txt |    4 ----
 include/linux/freezer.h         |    4 +++-
 kernel/power/hibernate.c        |   12 ++++++++----
 kernel/power/power.h            |    3 ++-
 kernel/power/process.c          |   30 ++++++++++++++++++++----------
 5 files changed, 33 insertions(+), 20 deletions(-)

Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -135,7 +135,7 @@ static int try_to_freeze_tasks(bool sig_
 }
 
 /**
- *	freeze_processes - tell processes to enter the refrigerator
+ * freeze_processes - Signal user space processes to enter the refrigerator.
  */
 int freeze_processes(void)
 {
@@ -143,20 +143,30 @@ int freeze_processes(void)
 
 	printk("Freezing user space processes ... ");
 	error = try_to_freeze_tasks(true);
-	if (error)
-		goto Exit;
-	printk("done.\n");
+	if (!error) {
+		printk("done.");
+		oom_killer_disable();
+	}
+	printk("\n");
+	BUG_ON(in_atomic());
+
+	return error;
+}
+
+/**
+ * freeze_kernel_threads - Make freezable kernel threads go to the refrigerator.
+ */
+int freeze_kernel_threads(void)
+{
+	int error;
 
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
-	if (error)
-		goto Exit;
-	printk("done.");
+	if (!error)
+		printk("done.");
 
-	oom_killer_disable();
- Exit:
-	BUG_ON(in_atomic());
 	printk("\n");
+	BUG_ON(in_atomic());
 
 	return error;
 }
Index: linux/include/linux/freezer.h
===================================================================
--- linux.orig/include/linux/freezer.h
+++ linux/include/linux/freezer.h
@@ -49,6 +49,7 @@ extern int thaw_process(struct task_stru
 
 extern void refrigerator(void);
 extern int freeze_processes(void);
+extern int freeze_kernel_threads(void);
 extern void thaw_processes(void);
 
 static inline int try_to_freeze(void)
@@ -171,7 +172,8 @@ static inline void clear_freeze_flag(str
 static inline int thaw_process(struct task_struct *p) { return 1; }
 
 static inline void refrigerator(void) {}
-static inline int freeze_processes(void) { BUG(); return 0; }
+static inline int freeze_processes(void) { return -ENOSYS; }
+static inline int freeze_kernel_threads(void) { return -ENOSYS; }
 static inline void thaw_processes(void) {}
 
 static inline int try_to_freeze(void) { return 0; }
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -228,7 +228,8 @@ extern int pm_test_level;
 #ifdef CONFIG_SUSPEND_FREEZER
 static inline int suspend_freeze_processes(void)
 {
-	return freeze_processes();
+	int error = freeze_processes();
+	return error ? : freeze_kernel_threads();
 }
 
 static inline void suspend_thaw_processes(void)
Index: linux/kernel/power/hibernate.c
===================================================================
--- linux.orig/kernel/power/hibernate.c
+++ linux/kernel/power/hibernate.c
@@ -334,13 +334,17 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		goto Close;
 
-	error = dpm_prepare(PMSG_FREEZE);
-	if (error)
-		goto Complete_devices;
-
 	/* Preallocate image memory before shutting down devices. */
 	error = hibernate_preallocate_memory();
 	if (error)
+		goto Close;
+
+	error = freeze_kernel_threads();
+	if (error)
+		goto Close;
+
+	error = dpm_prepare(PMSG_FREEZE);
+	if (error)
 		goto Complete_devices;
 
 	suspend_console();
Index: linux/Documentation/power/devices.txt
===================================================================
--- linux.orig/Documentation/power/devices.txt
+++ linux/Documentation/power/devices.txt
@@ -279,10 +279,6 @@ When the system goes into the standby or
 	time.)  Unlike the other suspend-related phases, during the prepare
 	phase the device tree is traversed top-down.
 
-	In addition to that, if device drivers need to allocate additional
-	memory to be able to hadle device suspend correctly, that should be
-	done in the prepare phase.
-
 	After the prepare callback method returns, no new children may be
 	registered below the device.  The method may also prepare the device or
 	driver in some way for the upcoming system power transition (for

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Update][PATCH] PM / Hibernate: Freeze kernel threads after preallocating memory
@ 2011-09-25 13:40                             ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-09-25 13:40 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel,
	Linux PM mailing list, linux-ext4

From: Rafael J. Wysocki <rjw@sisk.pl>

There is a problem with the current ordering of hibernate code which
leads to deadlocks in some filesystems' memory shrinkers.  Namely,
some filesystems use freezable kernel threads that are inactive when
the hibernate memory preallocation is carried out.  Those same
filesystems use memory shrinkers that may be triggered by the
hibernate memory preallocation.  If those memory shrinkers wait for
the frozen kernel threads, the hibernate process deadlocks (this
happens with XFS, for one example).

Apparently, it is not technically viable to redesign the filesystems
in question to avoid the situation described above, so the only
possible solution of this issue is to defer the freezing of kernel
threads until the hibernate memory preallocation is done, which is
implemented by this change.

Unfortunately, this requires the memory preallocation to be done
before the "prepare" stage of device freeze, so after this change the
only way drivers can allocate additional memory for their freeze
routines in a clean way is to use PM notifiers.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 Documentation/power/devices.txt |    4 ----
 include/linux/freezer.h         |    4 +++-
 kernel/power/hibernate.c        |   12 ++++++++----
 kernel/power/power.h            |    3 ++-
 kernel/power/process.c          |   30 ++++++++++++++++++++----------
 5 files changed, 33 insertions(+), 20 deletions(-)

Index: linux/kernel/power/process.c
===================================================================
--- linux.orig/kernel/power/process.c
+++ linux/kernel/power/process.c
@@ -135,7 +135,7 @@ static int try_to_freeze_tasks(bool sig_
 }
 
 /**
- *	freeze_processes - tell processes to enter the refrigerator
+ * freeze_processes - Signal user space processes to enter the refrigerator.
  */
 int freeze_processes(void)
 {
@@ -143,20 +143,30 @@ int freeze_processes(void)
 
 	printk("Freezing user space processes ... ");
 	error = try_to_freeze_tasks(true);
-	if (error)
-		goto Exit;
-	printk("done.\n");
+	if (!error) {
+		printk("done.");
+		oom_killer_disable();
+	}
+	printk("\n");
+	BUG_ON(in_atomic());
+
+	return error;
+}
+
+/**
+ * freeze_kernel_threads - Make freezable kernel threads go to the refrigerator.
+ */
+int freeze_kernel_threads(void)
+{
+	int error;
 
 	printk("Freezing remaining freezable tasks ... ");
 	error = try_to_freeze_tasks(false);
-	if (error)
-		goto Exit;
-	printk("done.");
+	if (!error)
+		printk("done.");
 
-	oom_killer_disable();
- Exit:
-	BUG_ON(in_atomic());
 	printk("\n");
+	BUG_ON(in_atomic());
 
 	return error;
 }
Index: linux/include/linux/freezer.h
===================================================================
--- linux.orig/include/linux/freezer.h
+++ linux/include/linux/freezer.h
@@ -49,6 +49,7 @@ extern int thaw_process(struct task_stru
 
 extern void refrigerator(void);
 extern int freeze_processes(void);
+extern int freeze_kernel_threads(void);
 extern void thaw_processes(void);
 
 static inline int try_to_freeze(void)
@@ -171,7 +172,8 @@ static inline void clear_freeze_flag(str
 static inline int thaw_process(struct task_struct *p) { return 1; }
 
 static inline void refrigerator(void) {}
-static inline int freeze_processes(void) { BUG(); return 0; }
+static inline int freeze_processes(void) { return -ENOSYS; }
+static inline int freeze_kernel_threads(void) { return -ENOSYS; }
 static inline void thaw_processes(void) {}
 
 static inline int try_to_freeze(void) { return 0; }
Index: linux/kernel/power/power.h
===================================================================
--- linux.orig/kernel/power/power.h
+++ linux/kernel/power/power.h
@@ -228,7 +228,8 @@ extern int pm_test_level;
 #ifdef CONFIG_SUSPEND_FREEZER
 static inline int suspend_freeze_processes(void)
 {
-	return freeze_processes();
+	int error = freeze_processes();
+	return error ? : freeze_kernel_threads();
 }
 
 static inline void suspend_thaw_processes(void)
Index: linux/kernel/power/hibernate.c
===================================================================
--- linux.orig/kernel/power/hibernate.c
+++ linux/kernel/power/hibernate.c
@@ -334,13 +334,17 @@ int hibernation_snapshot(int platform_mo
 	if (error)
 		goto Close;
 
-	error = dpm_prepare(PMSG_FREEZE);
-	if (error)
-		goto Complete_devices;
-
 	/* Preallocate image memory before shutting down devices. */
 	error = hibernate_preallocate_memory();
 	if (error)
+		goto Close;
+
+	error = freeze_kernel_threads();
+	if (error)
+		goto Close;
+
+	error = dpm_prepare(PMSG_FREEZE);
+	if (error)
 		goto Complete_devices;
 
 	suspend_console();
Index: linux/Documentation/power/devices.txt
===================================================================
--- linux.orig/Documentation/power/devices.txt
+++ linux/Documentation/power/devices.txt
@@ -279,10 +279,6 @@ When the system goes into the standby or
 	time.)  Unlike the other suspend-related phases, during the prepare
 	phase the device tree is traversed top-down.
 
-	In addition to that, if device drivers need to allocate additional
-	memory to be able to hadle device suspend correctly, that should be
-	done in the prepare phase.
-
 	After the prepare callback method returns, no new children may be
 	registered below the device.  The method may also prepare the device or
 	driver in some way for the upcoming system power transition (for

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-09-25 13:32                               ` Rafael J. Wysocki
@ 2011-09-25 21:57                                 ` Christoph
  -1 siblings, 0 replies; 68+ messages in thread
From: Christoph @ 2011-09-25 21:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Dave Chinner, Linux PM mailing list, Pavel Machek,
	Nigel Cunningham, Christoph Hellwig, xfs, LKML, linux-ext4,
	Theodore Ts'o, linux-fsdevel

On 25.09.2011 15:32, Rafael J. Wysocki wrote:
> On Sunday, September 25, 2011, Christoph wrote:
>> test results of the patch below:
>> 
>> 1. real machine
>> 
>> suspends fine but on wakeup, after loading image: hard reset. nvidia
>> gpu => disabled compitz  => wakeup worked two times.
> 
> Hmm, so there's a separate bug related to NVidia I guess.

Maybe.

Just made another test: the machine (macbook) woke up, loaded image, thaw.
It got stuck at vt#1, displaying console with login. Cursor blinking, but
no (usb) keyboard or network enabled. Bricked?!!


On the other hand I've got another box with nvidia gpu:

debian5 32bit
2.6.38.2+ #3 SMP Fri Apr 1
nvidia 260.19.36

It's up since I compiled the kernel: I use it twice a week and I kept it
freezed all the time. It was solid rock until today: hard reset on resume.
WTF? (I remember this version combo was stable on the macbook but the
kernel lacks a solid wireless driver).

There are a stupid situation where you can't debug. What else can go
wrong? This is off topic but it's a cute kernel crash while I gave the
nouveau driver a chance:

http://events.ccc.de/camp/2011/wiki/File:Dome22.jpg

chris

http://events.ccc.de/camp/2011/wiki/DomeTent

> 
>> 2. virtualbox / stress test / xfs and ext4
>> 
>> on 3rd resume, it booted up "normal" like this:
>> 
>> [    3.351813] Freeing unused kernel memory: 568k freed [
>> 3.460973] Freeing unused kernel memory: 284k freed
>> 
>> [   17.328356] PM: Preparing processes for restore.
>> 

...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-09-25 21:57                                 ` Christoph
  0 siblings, 0 replies; 68+ messages in thread
From: Christoph @ 2011-09-25 21:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel,
	Linux PM mailing list, linux-ext4

On 25.09.2011 15:32, Rafael J. Wysocki wrote:
> On Sunday, September 25, 2011, Christoph wrote:
>> test results of the patch below:
>> 
>> 1. real machine
>> 
>> suspends fine but on wakeup, after loading image: hard reset. nvidia
>> gpu => disabled compitz  => wakeup worked two times.
> 
> Hmm, so there's a separate bug related to NVidia I guess.

Maybe.

Just made another test: the machine (macbook) woke up, loaded image, thaw.
It got stuck at vt#1, displaying console with login. Cursor blinking, but
no (usb) keyboard or network enabled. Bricked?!!


On the other hand I've got another box with nvidia gpu:

debian5 32bit
2.6.38.2+ #3 SMP Fri Apr 1
nvidia 260.19.36

It's up since I compiled the kernel: I use it twice a week and I kept it
freezed all the time. It was solid rock until today: hard reset on resume.
WTF? (I remember this version combo was stable on the macbook but the
kernel lacks a solid wireless driver).

There are a stupid situation where you can't debug. What else can go
wrong? This is off topic but it's a cute kernel crash while I gave the
nouveau driver a chance:

http://events.ccc.de/camp/2011/wiki/File:Dome22.jpg

chris

http://events.ccc.de/camp/2011/wiki/DomeTent

> 
>> 2. virtualbox / stress test / xfs and ext4
>> 
>> on 3rd resume, it booted up "normal" like this:
>> 
>> [    3.351813] Freeing unused kernel memory: 568k freed [
>> 3.460973] Freeing unused kernel memory: 284k freed
>> 
>> [   17.328356] PM: Preparing processes for restore.
>> 

...

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-09-25 21:57                                 ` Christoph
@ 2011-09-25 22:10                                   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-09-25 22:10 UTC (permalink / raw)
  To: Christoph
  Cc: Dave Chinner, Linux PM mailing list, Pavel Machek,
	Nigel Cunningham, Christoph Hellwig, xfs, LKML, linux-ext4,
	Theodore Ts'o, linux-fsdevel

On Sunday, September 25, 2011, Christoph wrote:
> On 25.09.2011 15:32, Rafael J. Wysocki wrote:
> > On Sunday, September 25, 2011, Christoph wrote:
> >> test results of the patch below:
> >> 
> >> 1. real machine
> >> 
> >> suspends fine but on wakeup, after loading image: hard reset. nvidia
> >> gpu => disabled compitz  => wakeup worked two times.
> > 
> > Hmm, so there's a separate bug related to NVidia I guess.
> 
> Maybe.
> 
> Just made another test: the machine (macbook) woke up, loaded image, thaw.
> It got stuck at vt#1, displaying console with login. Cursor blinking, but
> no (usb) keyboard or network enabled. Bricked?!!
> 
> 
> On the other hand I've got another box with nvidia gpu:
> 
> debian5 32bit
> 2.6.38.2+ #3 SMP Fri Apr 1
> nvidia 260.19.36
> 
> It's up since I compiled the kernel: I use it twice a week and I kept it
> freezed all the time. It was solid rock until today: hard reset on resume.
> WTF? (I remember this version combo was stable on the macbook but the
> kernel lacks a solid wireless driver).

If that's an x86_64 system, there is a known bug causing problems like
this to happen.  There's a patch fixing it, but not conclusive:
http://marc.info/?l=linux-kernel&m=131653513414314&w=2

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-09-25 22:10                                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-09-25 22:10 UTC (permalink / raw)
  To: Christoph
  Cc: Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel,
	Linux PM mailing list, linux-ext4

On Sunday, September 25, 2011, Christoph wrote:
> On 25.09.2011 15:32, Rafael J. Wysocki wrote:
> > On Sunday, September 25, 2011, Christoph wrote:
> >> test results of the patch below:
> >> 
> >> 1. real machine
> >> 
> >> suspends fine but on wakeup, after loading image: hard reset. nvidia
> >> gpu => disabled compitz  => wakeup worked two times.
> > 
> > Hmm, so there's a separate bug related to NVidia I guess.
> 
> Maybe.
> 
> Just made another test: the machine (macbook) woke up, loaded image, thaw.
> It got stuck at vt#1, displaying console with login. Cursor blinking, but
> no (usb) keyboard or network enabled. Bricked?!!
> 
> 
> On the other hand I've got another box with nvidia gpu:
> 
> debian5 32bit
> 2.6.38.2+ #3 SMP Fri Apr 1
> nvidia 260.19.36
> 
> It's up since I compiled the kernel: I use it twice a week and I kept it
> freezed all the time. It was solid rock until today: hard reset on resume.
> WTF? (I remember this version combo was stable on the macbook but the
> kernel lacks a solid wireless driver).

If that's an x86_64 system, there is a known bug causing problems like
this to happen.  There's a patch fixing it, but not conclusive:
http://marc.info/?l=linux-kernel&m=131653513414314&w=2

Thanks,
Rafael

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-09-25 22:10                                   ` Rafael J. Wysocki
@ 2011-09-26  5:27                                     ` Christoph
  -1 siblings, 0 replies; 68+ messages in thread
From: Christoph @ 2011-09-26  5:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel,
	Linux PM mailing list, linux-ext4


On 26.09.2011 00:10, Rafael J. Wysocki wrote:
> On Sunday, September 25, 2011, Christoph wrote:
>> On 25.09.2011 15:32, Rafael J. Wysocki wrote:
>>> On Sunday, September 25, 2011, Christoph wrote:
>>>> test results of the patch below:
>>>>
>>>> 1. real machine
>>>>
>>>> suspends fine but on wakeup, after loading image: hard reset. nvidia
>>>> gpu => disabled compitz  => wakeup worked two times.
>>>
>>> Hmm, so there's a separate bug related to NVidia I guess.
>>
>> Maybe.
>>
>> Just made another test: the machine (macbook) woke up, loaded image, thaw.
>> It got stuck at vt#1, displaying console with login. Cursor blinking, but
>> no (usb) keyboard or network enabled. Bricked?!!
>>
>>
>> On the other hand I've got another box with nvidia gpu:
>>
>> debian5 32bit
>> 2.6.38.2+ #3 SMP Fri Apr 1
>> nvidia 260.19.36
>>
>> It's up since I compiled the kernel: I use it twice a week and I kept it
>> freezed all the time. It was solid rock until today: hard reset on resume.
>> WTF? (I remember this version combo was stable on the macbook but the
>> kernel lacks a solid wireless driver).
> 
> If that's an x86_64 system, there is a known bug causing problems like
> this to happen.  There's a patch fixing it, but not conclusive:
> http://marc.info/?l=linux-kernel&m=131653513414314&w=2

very good. seems to fix resume, at least the 2 times I tested so far.

tx! :)

chris


> 
> Thanks,
> Rafael
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-09-26  5:27                                     ` Christoph
  0 siblings, 0 replies; 68+ messages in thread
From: Christoph @ 2011-09-26  5:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel,
	Linux PM mailing list, linux-ext4


On 26.09.2011 00:10, Rafael J. Wysocki wrote:
> On Sunday, September 25, 2011, Christoph wrote:
>> On 25.09.2011 15:32, Rafael J. Wysocki wrote:
>>> On Sunday, September 25, 2011, Christoph wrote:
>>>> test results of the patch below:
>>>>
>>>> 1. real machine
>>>>
>>>> suspends fine but on wakeup, after loading image: hard reset. nvidia
>>>> gpu => disabled compitz  => wakeup worked two times.
>>>
>>> Hmm, so there's a separate bug related to NVidia I guess.
>>
>> Maybe.
>>
>> Just made another test: the machine (macbook) woke up, loaded image, thaw.
>> It got stuck at vt#1, displaying console with login. Cursor blinking, but
>> no (usb) keyboard or network enabled. Bricked?!!
>>
>>
>> On the other hand I've got another box with nvidia gpu:
>>
>> debian5 32bit
>> 2.6.38.2+ #3 SMP Fri Apr 1
>> nvidia 260.19.36
>>
>> It's up since I compiled the kernel: I use it twice a week and I kept it
>> freezed all the time. It was solid rock until today: hard reset on resume.
>> WTF? (I remember this version combo was stable on the macbook but the
>> kernel lacks a solid wireless driver).
> 
> If that's an x86_64 system, there is a known bug causing problems like
> this to happen.  There's a patch fixing it, but not conclusive:
> http://marc.info/?l=linux-kernel&m=131653513414314&w=2

very good. seems to fix resume, at least the 2 times I tested so far.

tx! :)

chris


> 
> Thanks,
> Rafael
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-09-25 22:10                                   ` Rafael J. Wysocki
@ 2011-10-22 15:14                                     ` Christoph
  -1 siblings, 0 replies; 68+ messages in thread
From: Christoph @ 2011-10-22 15:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel,
	Linux PM mailing list, linux-ext4

> PM / Freezer: Freeze filesystems while freezing processes (v2)
> 
> On Sunday, August 07, 2011, Dave Chinner wrote:
>> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rjw@sisk.pl>
>>> 
>>> Freeze all filesystems during the freezing of tasks by calling 
>>> freeze_bdev() for each of them and thaw them during the thawing of
>>> tasks with the help of thaw_bdev().
>>> 
>>> This is needed by hibernation, because some filesystems (e.g. XFS) 
>>> deadlock with the preallocation of memory used by it if the memory 
>>> pressure caused by it is too heavy.
>>> 
...
> 
> Below is an alternative fix, the changelog pretty much explains the
> idea.
> 
> I've tested it on Toshiba Portege R500, but I don't have an XFS
> partition to verify that it really helps, so I'd appreciate it if
> someone able to reproduce the original issue could test it and report
> back.

Hi Rafael!

Well, the kernel bugtracker is still down and I just like to post my
experience with kernel (x64) v3.1-rc8/9 + patches. My machine is a
MacBookPro, doomed with 4GB RAM running debian.

Bug #1

on the way to hibernate, machine hangs on

"PM: Preallocating image memory..."

this patch worked for me now for weeks:
"[PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)"
https://lkml.org/lkml/2011/9/24/77

I was able to reproduce this bug with virtualbox and tested the patch ~40
cycles.

Bug#2

on resume from hibernate, hard reset (x64 only):
http://marc.info/?l=linux-kernel&m=131653513414314&w=2

With this patch I haven't got this issue again the last weeks.

I wasn't able to reproduce this bug with virtualbox.





I only got one pm-hibernate issue. Last line:

Disabling non-boot CPUs ...

This time I've enabled debug hung task :)

schedule_timeout
...
workqueue_cpu_callback
notifier_call_chain
...
__cpu_notify
_cpu_down
printk
disable_nonboot_cpus
hibernation_snapshot
hibernate
...

Any other idea besides the possibility it's caused by evil earth
radiation, isn't it?


Gruss,
chris




On 26.09.2011 00:10, Rafael J. Wysocki wrote:
> On Sunday, September 25, 2011, Christoph wrote:
>> On 25.09.2011 15:32, Rafael J. Wysocki wrote:
>>> On Sunday, September 25, 2011, Christoph wrote:
>>>> test results of the patch below:
>>>>
>>>> 1. real machine
>>>>
>>>> suspends fine but on wakeup, after loading image: hard reset. nvidia
>>>> gpu => disabled compitz  => wakeup worked two times.
>>>
>>> Hmm, so there's a separate bug related to NVidia I guess.
>>
>> Maybe.
>>
>> Just made another test: the machine (macbook) woke up, loaded image, thaw.
>> It got stuck at vt#1, displaying console with login. Cursor blinking, but
>> no (usb) keyboard or network enabled. Bricked?!!
>>
>>
>> On the other hand I've got another box with nvidia gpu:
>>
>> debian5 32bit
>> 2.6.38.2+ #3 SMP Fri Apr 1
>> nvidia 260.19.36
>>
>> It's up since I compiled the kernel: I use it twice a week and I kept it
>> freezed all the time. It was solid rock until today: hard reset on resume.
>> WTF? (I remember this version combo was stable on the macbook but the
>> kernel lacks a solid wireless driver).
> 
> If that's an x86_64 system, there is a known bug causing problems like
> this to happen.  There's a patch fixing it, but not conclusive:
> http://marc.info/?l=linux-kernel&m=131653513414314&w=2
> 
> Thanks,
> Rafael
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-10-22 15:14                                     ` Christoph
  0 siblings, 0 replies; 68+ messages in thread
From: Christoph @ 2011-10-22 15:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel,
	Linux PM mailing list, linux-ext4

> PM / Freezer: Freeze filesystems while freezing processes (v2)
> 
> On Sunday, August 07, 2011, Dave Chinner wrote:
>> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rjw@sisk.pl>
>>> 
>>> Freeze all filesystems during the freezing of tasks by calling 
>>> freeze_bdev() for each of them and thaw them during the thawing of
>>> tasks with the help of thaw_bdev().
>>> 
>>> This is needed by hibernation, because some filesystems (e.g. XFS) 
>>> deadlock with the preallocation of memory used by it if the memory 
>>> pressure caused by it is too heavy.
>>> 
...
> 
> Below is an alternative fix, the changelog pretty much explains the
> idea.
> 
> I've tested it on Toshiba Portege R500, but I don't have an XFS
> partition to verify that it really helps, so I'd appreciate it if
> someone able to reproduce the original issue could test it and report
> back.

Hi Rafael!

Well, the kernel bugtracker is still down and I just like to post my
experience with kernel (x64) v3.1-rc8/9 + patches. My machine is a
MacBookPro, doomed with 4GB RAM running debian.

Bug #1

on the way to hibernate, machine hangs on

"PM: Preallocating image memory..."

this patch worked for me now for weeks:
"[PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)"
https://lkml.org/lkml/2011/9/24/77

I was able to reproduce this bug with virtualbox and tested the patch ~40
cycles.

Bug#2

on resume from hibernate, hard reset (x64 only):
http://marc.info/?l=linux-kernel&m=131653513414314&w=2

With this patch I haven't got this issue again the last weeks.

I wasn't able to reproduce this bug with virtualbox.





I only got one pm-hibernate issue. Last line:

Disabling non-boot CPUs ...

This time I've enabled debug hung task :)

schedule_timeout
...
workqueue_cpu_callback
notifier_call_chain
...
__cpu_notify
_cpu_down
printk
disable_nonboot_cpus
hibernation_snapshot
hibernate
...

Any other idea besides the possibility it's caused by evil earth
radiation, isn't it?


Gruss,
chris




On 26.09.2011 00:10, Rafael J. Wysocki wrote:
> On Sunday, September 25, 2011, Christoph wrote:
>> On 25.09.2011 15:32, Rafael J. Wysocki wrote:
>>> On Sunday, September 25, 2011, Christoph wrote:
>>>> test results of the patch below:
>>>>
>>>> 1. real machine
>>>>
>>>> suspends fine but on wakeup, after loading image: hard reset. nvidia
>>>> gpu => disabled compitz  => wakeup worked two times.
>>>
>>> Hmm, so there's a separate bug related to NVidia I guess.
>>
>> Maybe.
>>
>> Just made another test: the machine (macbook) woke up, loaded image, thaw.
>> It got stuck at vt#1, displaying console with login. Cursor blinking, but
>> no (usb) keyboard or network enabled. Bricked?!!
>>
>>
>> On the other hand I've got another box with nvidia gpu:
>>
>> debian5 32bit
>> 2.6.38.2+ #3 SMP Fri Apr 1
>> nvidia 260.19.36
>>
>> It's up since I compiled the kernel: I use it twice a week and I kept it
>> freezed all the time. It was solid rock until today: hard reset on resume.
>> WTF? (I remember this version combo was stable on the macbook but the
>> kernel lacks a solid wireless driver).
> 
> If that's an x86_64 system, there is a known bug causing problems like
> this to happen.  There's a patch fixing it, but not conclusive:
> http://marc.info/?l=linux-kernel&m=131653513414314&w=2
> 
> Thanks,
> Rafael
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-10-22 15:14                                     ` Christoph
@ 2011-10-22 21:35                                       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-10-22 21:35 UTC (permalink / raw)
  To: Christoph, Takashi Iwai
  Cc: Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel, linux-ext4,
	Linux PM list

On Saturday, October 22, 2011, Christoph wrote:
> > PM / Freezer: Freeze filesystems while freezing processes (v2)
> > 
> > On Sunday, August 07, 2011, Dave Chinner wrote:
> >> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <rjw@sisk.pl>
> >>> 
> >>> Freeze all filesystems during the freezing of tasks by calling 
> >>> freeze_bdev() for each of them and thaw them during the thawing of
> >>> tasks with the help of thaw_bdev().
> >>> 
> >>> This is needed by hibernation, because some filesystems (e.g. XFS) 
> >>> deadlock with the preallocation of memory used by it if the memory 
> >>> pressure caused by it is too heavy.
> >>> 
> ...
> > 
> > Below is an alternative fix, the changelog pretty much explains the
> > idea.
> > 
> > I've tested it on Toshiba Portege R500, but I don't have an XFS
> > partition to verify that it really helps, so I'd appreciate it if
> > someone able to reproduce the original issue could test it and report
> > back.
> 
> Hi Rafael!
> 
> Well, the kernel bugtracker is still down and I just like to post my
> experience with kernel (x64) v3.1-rc8/9 + patches. My machine is a
> MacBookPro, doomed with 4GB RAM running debian.
> 
> Bug #1
> 
> on the way to hibernate, machine hangs on
> 
> "PM: Preallocating image memory..."
> 
> this patch worked for me now for weeks:
> "[PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)"
> https://lkml.org/lkml/2011/9/24/77

This patch is going to be merged into 3.2.

> I was able to reproduce this bug with virtualbox and tested the patch ~40
> cycles.
> 
> Bug#2
> 
> on resume from hibernate, hard reset (x64 only):
> http://marc.info/?l=linux-kernel&m=131653513414314&w=2
> 
> With this patch I haven't got this issue again the last weeks.

Hmm.  This issue appears to be still under investigation to me, but perhaps
that's taken too much already.

Takashi, perhaps you can repost the patch as a proper submission?  It would
be good to have this regression fixed even if we don't know the real source of
it.

> I wasn't able to reproduce this bug with virtualbox.
> 
> 
> 
> 
> 
> I only got one pm-hibernate issue. Last line:
> 
> Disabling non-boot CPUs ...
> 
> This time I've enabled debug hung task :)
> 
> schedule_timeout
> ...
> workqueue_cpu_callback
> notifier_call_chain
> ...
> __cpu_notify
> _cpu_down
> printk
> disable_nonboot_cpus
> hibernation_snapshot
> hibernate
> ...
> 
> Any other idea besides the possibility it's caused by evil earth
> radiation, isn't it?

I'm not exactly sure what happened from your description, care to explain?

Rafael

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-10-22 21:35                                       ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-10-22 21:35 UTC (permalink / raw)
  To: Christoph, Takashi Iwai
  Cc: Theodore Ts'o, Linux PM list, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel, linux-ext4

On Saturday, October 22, 2011, Christoph wrote:
> > PM / Freezer: Freeze filesystems while freezing processes (v2)
> > 
> > On Sunday, August 07, 2011, Dave Chinner wrote:
> >> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <rjw@sisk.pl>
> >>> 
> >>> Freeze all filesystems during the freezing of tasks by calling 
> >>> freeze_bdev() for each of them and thaw them during the thawing of
> >>> tasks with the help of thaw_bdev().
> >>> 
> >>> This is needed by hibernation, because some filesystems (e.g. XFS) 
> >>> deadlock with the preallocation of memory used by it if the memory 
> >>> pressure caused by it is too heavy.
> >>> 
> ...
> > 
> > Below is an alternative fix, the changelog pretty much explains the
> > idea.
> > 
> > I've tested it on Toshiba Portege R500, but I don't have an XFS
> > partition to verify that it really helps, so I'd appreciate it if
> > someone able to reproduce the original issue could test it and report
> > back.
> 
> Hi Rafael!
> 
> Well, the kernel bugtracker is still down and I just like to post my
> experience with kernel (x64) v3.1-rc8/9 + patches. My machine is a
> MacBookPro, doomed with 4GB RAM running debian.
> 
> Bug #1
> 
> on the way to hibernate, machine hangs on
> 
> "PM: Preallocating image memory..."
> 
> this patch worked for me now for weeks:
> "[PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)"
> https://lkml.org/lkml/2011/9/24/77

This patch is going to be merged into 3.2.

> I was able to reproduce this bug with virtualbox and tested the patch ~40
> cycles.
> 
> Bug#2
> 
> on resume from hibernate, hard reset (x64 only):
> http://marc.info/?l=linux-kernel&m=131653513414314&w=2
> 
> With this patch I haven't got this issue again the last weeks.

Hmm.  This issue appears to be still under investigation to me, but perhaps
that's taken too much already.

Takashi, perhaps you can repost the patch as a proper submission?  It would
be good to have this regression fixed even if we don't know the real source of
it.

> I wasn't able to reproduce this bug with virtualbox.
> 
> 
> 
> 
> 
> I only got one pm-hibernate issue. Last line:
> 
> Disabling non-boot CPUs ...
> 
> This time I've enabled debug hung task :)
> 
> schedule_timeout
> ...
> workqueue_cpu_callback
> notifier_call_chain
> ...
> __cpu_notify
> _cpu_down
> printk
> disable_nonboot_cpus
> hibernation_snapshot
> hibernate
> ...
> 
> Any other idea besides the possibility it's caused by evil earth
> radiation, isn't it?

I'm not exactly sure what happened from your description, care to explain?

Rafael

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-10-22 21:35                                       ` Rafael J. Wysocki
@ 2011-11-16 13:49                                         ` Ferenc Wagner
  -1 siblings, 0 replies; 68+ messages in thread
From: Ferenc Wagner @ 2011-11-16 13:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel, linux-ext4,
	Linux PM list

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> On Saturday, October 22, 2011, Christoph wrote:
>
>>> PM / Freezer: Freeze filesystems while freezing processes (v2)
>>> 
>>> On Sunday, August 07, 2011, Dave Chinner wrote:
>>>
>>>> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
>>>>
>>>>> From: Rafael J. Wysocki <rjw@sisk.pl>
>>>>> 
>>>>> Freeze all filesystems during the freezing of tasks by calling 
>>>>> freeze_bdev() for each of them and thaw them during the thawing of
>>>>> tasks with the help of thaw_bdev().
>>>>> 
>>>>> This is needed by hibernation, because some filesystems (e.g. XFS) 
>>>>> deadlock with the preallocation of memory used by it if the memory 
>>>>> pressure caused by it is too heavy.
>>>
>>> Below is an alternative fix, the changelog pretty much explains the
>>> idea.
>>> 
>>> I've tested it on Toshiba Portege R500, but I don't have an XFS
>>> partition to verify that it really helps, so I'd appreciate it if
>>> someone able to reproduce the original issue could test it and report
>>> back.
>> 
>> Well, the kernel bugtracker is still down and I just like to post my
>> experience with kernel (x64) v3.1-rc8/9 + patches. My machine is a
>> MacBookPro, doomed with 4GB RAM running debian.
>> 
>> Bug #1
>> 
>> on the way to hibernate, machine hangs on
>> 
>> "PM: Preallocating image memory..."
>> 
>> this patch worked for me now for weeks:
>> "[PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)"
>> https://lkml.org/lkml/2011/9/24/77
>
> This patch is going to be merged into 3.2.

Hi,

I was the original reporter of the Bugzilla issue, just didn't know
about this thread until recently.  Anyway, I'm running 3.2-rc1 now,
which contains the alternative fix, and I can confirm that it indeed
works: hibernation does not deadlock on my XFS rooted system anymore
during memory preallocation.  Thanks everybody for his or her work on
the issue!

To add something still, preallocation now ends with a couple of seconds
of heavy disk activity, but with several seconds of total inactivity
beforehand.  Is this warranted by some CPU intensive task at that stage?
-- 
Thanks,
Feri.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-11-16 13:49                                         ` Ferenc Wagner
  0 siblings, 0 replies; 68+ messages in thread
From: Ferenc Wagner @ 2011-11-16 13:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Christoph, Theodore Ts'o, Linux PM list, LKML, xfs,
	Christoph Hellwig, Nigel Cunningham, Pavel Machek, linux-fsdevel,
	linux-ext4

"Rafael J. Wysocki" <rjw@sisk.pl> writes:

> On Saturday, October 22, 2011, Christoph wrote:
>
>>> PM / Freezer: Freeze filesystems while freezing processes (v2)
>>> 
>>> On Sunday, August 07, 2011, Dave Chinner wrote:
>>>
>>>> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
>>>>
>>>>> From: Rafael J. Wysocki <rjw@sisk.pl>
>>>>> 
>>>>> Freeze all filesystems during the freezing of tasks by calling 
>>>>> freeze_bdev() for each of them and thaw them during the thawing of
>>>>> tasks with the help of thaw_bdev().
>>>>> 
>>>>> This is needed by hibernation, because some filesystems (e.g. XFS) 
>>>>> deadlock with the preallocation of memory used by it if the memory 
>>>>> pressure caused by it is too heavy.
>>>
>>> Below is an alternative fix, the changelog pretty much explains the
>>> idea.
>>> 
>>> I've tested it on Toshiba Portege R500, but I don't have an XFS
>>> partition to verify that it really helps, so I'd appreciate it if
>>> someone able to reproduce the original issue could test it and report
>>> back.
>> 
>> Well, the kernel bugtracker is still down and I just like to post my
>> experience with kernel (x64) v3.1-rc8/9 + patches. My machine is a
>> MacBookPro, doomed with 4GB RAM running debian.
>> 
>> Bug #1
>> 
>> on the way to hibernate, machine hangs on
>> 
>> "PM: Preallocating image memory..."
>> 
>> this patch worked for me now for weeks:
>> "[PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)"
>> https://lkml.org/lkml/2011/9/24/77
>
> This patch is going to be merged into 3.2.

Hi,

I was the original reporter of the Bugzilla issue, just didn't know
about this thread until recently.  Anyway, I'm running 3.2-rc1 now,
which contains the alternative fix, and I can confirm that it indeed
works: hibernation does not deadlock on my XFS rooted system anymore
during memory preallocation.  Thanks everybody for his or her work on
the issue!

To add something still, preallocation now ends with a couple of seconds
of heavy disk activity, but with several seconds of total inactivity
beforehand.  Is this warranted by some CPU intensive task at that stage?
-- 
Thanks,
Feri.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
  2011-11-16 13:49                                         ` Ferenc Wagner
@ 2011-11-16 21:50                                           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-11-16 21:50 UTC (permalink / raw)
  To: Ferenc Wagner
  Cc: Christoph, Theodore Ts'o, LKML, xfs, Christoph Hellwig,
	Nigel Cunningham, Pavel Machek, linux-fsdevel, linux-ext4,
	Linux PM list

On Wednesday, November 16, 2011, Ferenc Wagner wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > On Saturday, October 22, 2011, Christoph wrote:
> >
> >>> PM / Freezer: Freeze filesystems while freezing processes (v2)
> >>> 
> >>> On Sunday, August 07, 2011, Dave Chinner wrote:
> >>>
> >>>> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> >>>>
> >>>>> From: Rafael J. Wysocki <rjw@sisk.pl>
> >>>>> 
> >>>>> Freeze all filesystems during the freezing of tasks by calling 
> >>>>> freeze_bdev() for each of them and thaw them during the thawing of
> >>>>> tasks with the help of thaw_bdev().
> >>>>> 
> >>>>> This is needed by hibernation, because some filesystems (e.g. XFS) 
> >>>>> deadlock with the preallocation of memory used by it if the memory 
> >>>>> pressure caused by it is too heavy.
> >>>
> >>> Below is an alternative fix, the changelog pretty much explains the
> >>> idea.
> >>> 
> >>> I've tested it on Toshiba Portege R500, but I don't have an XFS
> >>> partition to verify that it really helps, so I'd appreciate it if
> >>> someone able to reproduce the original issue could test it and report
> >>> back.
> >> 
> >> Well, the kernel bugtracker is still down and I just like to post my
> >> experience with kernel (x64) v3.1-rc8/9 + patches. My machine is a
> >> MacBookPro, doomed with 4GB RAM running debian.
> >> 
> >> Bug #1
> >> 
> >> on the way to hibernate, machine hangs on
> >> 
> >> "PM: Preallocating image memory..."
> >> 
> >> this patch worked for me now for weeks:
> >> "[PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)"
> >> https://lkml.org/lkml/2011/9/24/77
> >
> > This patch is going to be merged into 3.2.
> 
> Hi,
> 
> I was the original reporter of the Bugzilla issue, just didn't know
> about this thread until recently.  Anyway, I'm running 3.2-rc1 now,
> which contains the alternative fix, and I can confirm that it indeed
> works: hibernation does not deadlock on my XFS rooted system anymore
> during memory preallocation.  Thanks everybody for his or her work on
> the issue!
> 
> To add something still, preallocation now ends with a couple of seconds
> of heavy disk activity, but with several seconds of total inactivity
> beforehand.  Is this warranted by some CPU intensive task at that stage?

Quilte frankly, I have no idea.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)
@ 2011-11-16 21:50                                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 68+ messages in thread
From: Rafael J. Wysocki @ 2011-11-16 21:50 UTC (permalink / raw)
  To: Ferenc Wagner
  Cc: Christoph, Theodore Ts'o, Linux PM list, LKML, xfs,
	Christoph Hellwig, Nigel Cunningham, Pavel Machek, linux-fsdevel,
	linux-ext4

On Wednesday, November 16, 2011, Ferenc Wagner wrote:
> "Rafael J. Wysocki" <rjw@sisk.pl> writes:
> 
> > On Saturday, October 22, 2011, Christoph wrote:
> >
> >>> PM / Freezer: Freeze filesystems while freezing processes (v2)
> >>> 
> >>> On Sunday, August 07, 2011, Dave Chinner wrote:
> >>>
> >>>> On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> >>>>
> >>>>> From: Rafael J. Wysocki <rjw@sisk.pl>
> >>>>> 
> >>>>> Freeze all filesystems during the freezing of tasks by calling 
> >>>>> freeze_bdev() for each of them and thaw them during the thawing of
> >>>>> tasks with the help of thaw_bdev().
> >>>>> 
> >>>>> This is needed by hibernation, because some filesystems (e.g. XFS) 
> >>>>> deadlock with the preallocation of memory used by it if the memory 
> >>>>> pressure caused by it is too heavy.
> >>>
> >>> Below is an alternative fix, the changelog pretty much explains the
> >>> idea.
> >>> 
> >>> I've tested it on Toshiba Portege R500, but I don't have an XFS
> >>> partition to verify that it really helps, so I'd appreciate it if
> >>> someone able to reproduce the original issue could test it and report
> >>> back.
> >> 
> >> Well, the kernel bugtracker is still down and I just like to post my
> >> experience with kernel (x64) v3.1-rc8/9 + patches. My machine is a
> >> MacBookPro, doomed with 4GB RAM running debian.
> >> 
> >> Bug #1
> >> 
> >> on the way to hibernate, machine hangs on
> >> 
> >> "PM: Preallocating image memory..."
> >> 
> >> this patch worked for me now for weeks:
> >> "[PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)"
> >> https://lkml.org/lkml/2011/9/24/77
> >
> > This patch is going to be merged into 3.2.
> 
> Hi,
> 
> I was the original reporter of the Bugzilla issue, just didn't know
> about this thread until recently.  Anyway, I'm running 3.2-rc1 now,
> which contains the alternative fix, and I can confirm that it indeed
> works: hibernation does not deadlock on my XFS rooted system anymore
> during memory preallocation.  Thanks everybody for his or her work on
> the issue!
> 
> To add something still, preallocation now ends with a couple of seconds
> of heavy disk activity, but with several seconds of total inactivity
> beforehand.  Is this warranted by some CPU intensive task at that stage?

Quilte frankly, I have no idea.

Thanks,
Rafael

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2011-11-16 22:15 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-12 16:05 PM / hibernate xfs lock up / xfs_reclaim_inodes_ag Christoph
2011-07-13  0:03 ` Dave Chinner
2011-07-26 20:28   ` Rafael J. Wysocki
2011-07-26 20:28     ` Rafael J. Wysocki
2011-07-27  0:45     ` Dave Chinner
2011-07-27  0:45       ` Dave Chinner
2011-07-27  9:35       ` Rafael J. Wysocki
2011-07-27  9:35         ` Rafael J. Wysocki
2011-07-27 10:33         ` Christoph Hellwig
2011-07-27 10:33           ` Christoph Hellwig
2011-07-27 12:22           ` Nigel Cunningham
2011-07-27 12:22             ` Nigel Cunningham
2011-08-03 21:15             ` [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag) Rafael J. Wysocki
2011-08-03 21:15               ` Rafael J. Wysocki
2011-08-03 17:29               ` Pavel Machek
2011-08-03 17:29               ` Pavel Machek
2011-08-04  9:27                 ` Rafael J. Wysocki
2011-08-04  9:27                   ` Rafael J. Wysocki
2011-08-04 22:25                   ` Rafael J. Wysocki
2011-08-04 22:25                   ` Rafael J. Wysocki
2011-08-04 22:25                     ` Rafael J. Wysocki
2011-08-06 21:17                     ` [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2) Rafael J. Wysocki
2011-08-06 21:17                       ` Rafael J. Wysocki
2011-08-07  0:14                       ` Dave Chinner
2011-08-07  0:14                       ` Dave Chinner
2011-08-07  0:14                         ` Dave Chinner
2011-08-08 21:11                         ` Rafael J. Wysocki
2011-08-08 21:11                           ` Rafael J. Wysocki
2011-08-08 21:11                         ` Rafael J. Wysocki
2011-08-14  0:16                         ` Rafael J. Wysocki
2011-08-14  0:16                           ` Rafael J. Wysocki
2011-08-14  0:16                         ` Rafael J. Wysocki
2011-09-24 22:56                         ` Rafael J. Wysocki
2011-09-24 22:56                           ` Rafael J. Wysocki
2011-09-24 22:56                           ` Rafael J. Wysocki
2011-09-25  5:32                           ` Nigel Cunningham
2011-09-25  5:32                             ` Nigel Cunningham
2011-09-25 13:37                             ` Rafael J. Wysocki
2011-09-25 13:37                               ` Rafael J. Wysocki
2011-09-25 10:38                           ` Christoph
2011-09-25 10:38                             ` Christoph
2011-09-25 13:32                             ` Rafael J. Wysocki
2011-09-25 13:32                               ` Rafael J. Wysocki
2011-09-25 21:57                               ` Christoph
2011-09-25 21:57                                 ` Christoph
2011-09-25 22:10                                 ` Rafael J. Wysocki
2011-09-25 22:10                                   ` Rafael J. Wysocki
2011-09-26  5:27                                   ` Christoph
2011-09-26  5:27                                     ` Christoph
2011-10-22 15:14                                   ` Christoph
2011-10-22 15:14                                     ` Christoph
2011-10-22 21:35                                     ` Rafael J. Wysocki
2011-10-22 21:35                                       ` Rafael J. Wysocki
2011-11-16 13:49                                       ` Ferenc Wagner
2011-11-16 13:49                                         ` Ferenc Wagner
2011-11-16 21:50                                         ` Rafael J. Wysocki
2011-11-16 21:50                                           ` Rafael J. Wysocki
2011-09-25 13:40                           ` [Update][PATCH] PM / Hibernate: Freeze kernel threads after preallocating memory Rafael J. Wysocki
2011-09-25 13:40                             ` Rafael J. Wysocki
2011-08-06 21:17                     ` [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2) Rafael J. Wysocki
2011-08-04  9:27                 ` [RFC][PATCH] PM / Freezer: Freeze filesystems along with freezing processes (was: Re: PM / hibernate xfs lock up / xfs_reclaim_inodes_ag) Rafael J. Wysocki
2011-08-03 21:15             ` Rafael J. Wysocki
2011-08-10 21:43         ` PM / hibernate xfs lock up / xfs_reclaim_inodes_ag Pavel Machek
2011-08-10 21:43           ` Pavel Machek
2011-08-16 12:38           ` Christoph
2011-08-16 18:05             ` Rafael J. Wysocki
2011-08-16 18:05               ` Rafael J. Wysocki
2011-08-16 12:38           ` Christoph

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.