All of lore.kernel.org
 help / color / mirror / Atom feed
* mem-hotplug + ksm make lockdep warning
@ 2010-10-25 10:49 ` KOSAKI Motohiro
  0 siblings, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2010-10-25 10:49 UTC (permalink / raw)
  To: LKML, linux-mm, Hugh Dickins; +Cc: kosaki.motohiro

Hi Hugh,

commit 62b61f611e(ksm: memory hotremove migration only) makes following
lockdep warnings. Is this intentional?

More detail: current lockdep hieralcy is here.

memory_notify
	offline_pages
		lock_system_sleep();
			mutex_lock(&pm_mutex);
		memory_notify(MEM_GOING_OFFLINE)
			__blocking_notifier_call_chain
				down_read(memory_chain.rwsem)
				ksm_memory_callback()
					mutex_lock(&ksm_thread_mutex);  // memory_chain.rmsem -> ksm_thread_mutex order
				up_read(memory_chain.rwsem)
		memory_notify(MEM_OFFLINE)
			__blocking_notifier_call_chain
				down_read(memory_chain.rwsem)		// ksm_thread_mutex -> memory_chain.rmsem order
				ksm_memory_callback()
					mutex_unlock(&ksm_thread_mutex);
				up_read(memory_chain.rwsem)
		unlock_system_sleep();
			mutex_unlock(&pm_mutex);

So, I think pm_mutex protect ABBA deadlock. but it exist only when
CONFIG_HIBERNATION=y. IOW, this code is not correct generically. Am I
missing something?

Thanks.



=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.36-rc7-mm1+ #148
-------------------------------------------------------
bash/1621 is trying to acquire lock:
 ((memory_chain).rwsem){.+.+.+}, at: [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0

but task is already holding lock:
 (ksm_thread_mutex){+.+.+.}, at: [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (ksm_thread_mutex){+.+.+.}:
       [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
       [<ffffffff81505d74>] __mutex_lock_common+0x44/0x3f0
       [<ffffffff81506228>] mutex_lock_nested+0x48/0x60
       [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0		
       [<ffffffff8150c21c>] notifier_call_chain+0x8c/0xe0
       [<ffffffff8107934e>] __blocking_notifier_call_chain+0x7e/0xc0	
       [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
       [<ffffffff813afbfb>] memory_notify+0x1b/0x20
       [<ffffffff81141b7c>] remove_memory+0x1cc/0x5f0
       [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
       [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
       [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
       [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
       [<ffffffff8114f398>] vfs_write+0xc8/0x190
       [<ffffffff8114fc14>] sys_write+0x54/0x90
       [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b

-> #0 ((memory_chain).rwsem){.+.+.+}:
       [<ffffffff8108b5ba>] __lock_acquire+0x155a/0x1600
       [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
       [<ffffffff81506601>] down_read+0x51/0xa0
       [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0	
       [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
       [<ffffffff813afbfb>] memory_notify+0x1b/0x20
       [<ffffffff81141f1e>] remove_memory+0x56e/0x5f0
       [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
       [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
       [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
       [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
       [<ffffffff8114f398>] vfs_write+0xc8/0x190
       [<ffffffff8114fc14>] sys_write+0x54/0x90
       [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

5 locks held by bash/1621:
 #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff811bc074>] sysfs_write_file+0x44/0x170
 #1:  (s_active#110){.+.+.+}, at: [<ffffffff811bc0fd>] sysfs_write_file+0xcd/0x170
 #2:  (&mem->state_mutex){+.+.+.}, at: [<ffffffff813af478>] memory_block_change_state+0x38/0x1a0
 #3:  (pm_mutex){+.+.+.}, at: [<ffffffff81141ad9>] remove_memory+0x129/0x5f0
 #4:  (ksm_thread_mutex){+.+.+.}, at: [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0

stack backtrace:
Pid: 1621, comm: bash Not tainted 2.6.36-rc7-mm1+ #148
Call Trace:
 [<ffffffff81088b5b>] print_circular_bug+0xeb/0xf0
 [<ffffffff8108b5ba>] __lock_acquire+0x155a/0x1600
 [<ffffffff8103a1f9>] ? finish_task_switch+0x79/0xe0
 [<ffffffff815049a9>] ? schedule+0x419/0xa80
 [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
 [<ffffffff81079339>] ? __blocking_notifier_call_chain+0x69/0xc0	
 [<ffffffff81506601>] down_read+0x51/0xa0
 [<ffffffff81079339>] ? __blocking_notifier_call_chain+0x69/0xc0
 [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0
 [<ffffffff81110f06>] ? next_online_pgdat+0x26/0x50
 [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
 [<ffffffff813afbfb>] memory_notify+0x1b/0x20			
 [<ffffffff81141f1e>] remove_memory+0x56e/0x5f0
 [<ffffffff8108ba98>] ? lock_release_non_nested+0x2f8/0x3a0
 [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
 [<ffffffff8111705c>] ? might_fault+0x5c/0xb0
 [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
 [<ffffffff811bc0fd>] ? sysfs_write_file+0xcd/0x170
 [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
 [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
 [<ffffffff8114f398>] vfs_write+0xc8/0x190
 [<ffffffff8114fc14>] sys_write+0x54/0x90
 [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b


^ permalink raw reply	[flat|nested] 14+ messages in thread

* mem-hotplug + ksm make lockdep warning
@ 2010-10-25 10:49 ` KOSAKI Motohiro
  0 siblings, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2010-10-25 10:49 UTC (permalink / raw)
  To: LKML, linux-mm, Hugh Dickins; +Cc: kosaki.motohiro

Hi Hugh,

commit 62b61f611e(ksm: memory hotremove migration only) makes following
lockdep warnings. Is this intentional?

More detail: current lockdep hieralcy is here.

memory_notify
	offline_pages
		lock_system_sleep();
			mutex_lock(&pm_mutex);
		memory_notify(MEM_GOING_OFFLINE)
			__blocking_notifier_call_chain
				down_read(memory_chain.rwsem)
				ksm_memory_callback()
					mutex_lock(&ksm_thread_mutex);  // memory_chain.rmsem -> ksm_thread_mutex order
				up_read(memory_chain.rwsem)
		memory_notify(MEM_OFFLINE)
			__blocking_notifier_call_chain
				down_read(memory_chain.rwsem)		// ksm_thread_mutex -> memory_chain.rmsem order
				ksm_memory_callback()
					mutex_unlock(&ksm_thread_mutex);
				up_read(memory_chain.rwsem)
		unlock_system_sleep();
			mutex_unlock(&pm_mutex);

So, I think pm_mutex protect ABBA deadlock. but it exist only when
CONFIG_HIBERNATION=y. IOW, this code is not correct generically. Am I
missing something?

Thanks.



=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.36-rc7-mm1+ #148
-------------------------------------------------------
bash/1621 is trying to acquire lock:
 ((memory_chain).rwsem){.+.+.+}, at: [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0

but task is already holding lock:
 (ksm_thread_mutex){+.+.+.}, at: [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (ksm_thread_mutex){+.+.+.}:
       [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
       [<ffffffff81505d74>] __mutex_lock_common+0x44/0x3f0
       [<ffffffff81506228>] mutex_lock_nested+0x48/0x60
       [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0		
       [<ffffffff8150c21c>] notifier_call_chain+0x8c/0xe0
       [<ffffffff8107934e>] __blocking_notifier_call_chain+0x7e/0xc0	
       [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
       [<ffffffff813afbfb>] memory_notify+0x1b/0x20
       [<ffffffff81141b7c>] remove_memory+0x1cc/0x5f0
       [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
       [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
       [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
       [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
       [<ffffffff8114f398>] vfs_write+0xc8/0x190
       [<ffffffff8114fc14>] sys_write+0x54/0x90
       [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b

-> #0 ((memory_chain).rwsem){.+.+.+}:
       [<ffffffff8108b5ba>] __lock_acquire+0x155a/0x1600
       [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
       [<ffffffff81506601>] down_read+0x51/0xa0
       [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0	
       [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
       [<ffffffff813afbfb>] memory_notify+0x1b/0x20
       [<ffffffff81141f1e>] remove_memory+0x56e/0x5f0
       [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
       [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
       [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
       [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
       [<ffffffff8114f398>] vfs_write+0xc8/0x190
       [<ffffffff8114fc14>] sys_write+0x54/0x90
       [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

5 locks held by bash/1621:
 #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff811bc074>] sysfs_write_file+0x44/0x170
 #1:  (s_active#110){.+.+.+}, at: [<ffffffff811bc0fd>] sysfs_write_file+0xcd/0x170
 #2:  (&mem->state_mutex){+.+.+.}, at: [<ffffffff813af478>] memory_block_change_state+0x38/0x1a0
 #3:  (pm_mutex){+.+.+.}, at: [<ffffffff81141ad9>] remove_memory+0x129/0x5f0
 #4:  (ksm_thread_mutex){+.+.+.}, at: [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0

stack backtrace:
Pid: 1621, comm: bash Not tainted 2.6.36-rc7-mm1+ #148
Call Trace:
 [<ffffffff81088b5b>] print_circular_bug+0xeb/0xf0
 [<ffffffff8108b5ba>] __lock_acquire+0x155a/0x1600
 [<ffffffff8103a1f9>] ? finish_task_switch+0x79/0xe0
 [<ffffffff815049a9>] ? schedule+0x419/0xa80
 [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
 [<ffffffff81079339>] ? __blocking_notifier_call_chain+0x69/0xc0	
 [<ffffffff81506601>] down_read+0x51/0xa0
 [<ffffffff81079339>] ? __blocking_notifier_call_chain+0x69/0xc0
 [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0
 [<ffffffff81110f06>] ? next_online_pgdat+0x26/0x50
 [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
 [<ffffffff813afbfb>] memory_notify+0x1b/0x20			
 [<ffffffff81141f1e>] remove_memory+0x56e/0x5f0
 [<ffffffff8108ba98>] ? lock_release_non_nested+0x2f8/0x3a0
 [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
 [<ffffffff8111705c>] ? might_fault+0x5c/0xb0
 [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
 [<ffffffff811bc0fd>] ? sysfs_write_file+0xcd/0x170
 [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
 [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
 [<ffffffff8114f398>] vfs_write+0xc8/0x190
 [<ffffffff8114fc14>] sys_write+0x54/0x90
 [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: mem-hotplug + ksm make lockdep warning
  2010-10-25 10:49 ` KOSAKI Motohiro
@ 2010-10-26  7:10   ` Hugh Dickins
  -1 siblings, 0 replies; 14+ messages in thread
From: Hugh Dickins @ 2010-10-26  7:10 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Andrea Arcangeli, Andi Kleen, LKML, linux-mm

On Mon, 25 Oct 2010, KOSAKI Motohiro wrote:
> Hi Hugh,
> 
> commit 62b61f611e(ksm: memory hotremove migration only) makes following
> lockdep warnings. Is this intentional?

No, certainly not intentional: thanks for finding this.  Looking back,
I think the machine I tested memory hotplug versus KSM upon was not
the machine I habitually ran lockdep on, I bet I forgot to try it.

> 
> More detail: current lockdep hieralcy is here.

And especial thanks for taking the trouble to present it in a way
that I find much easier to understand than lockdep's pronouncements.

> 
> memory_notify
> 	offline_pages
> 		lock_system_sleep();
> 			mutex_lock(&pm_mutex);
> 		memory_notify(MEM_GOING_OFFLINE)
> 			__blocking_notifier_call_chain
> 				down_read(memory_chain.rwsem)
> 				ksm_memory_callback()
> 					mutex_lock(&ksm_thread_mutex);  // memory_chain.rmsem -> ksm_thread_mutex order
> 				up_read(memory_chain.rwsem)
> 		memory_notify(MEM_OFFLINE)
> 			__blocking_notifier_call_chain
> 				down_read(memory_chain.rwsem)		// ksm_thread_mutex -> memory_chain.rmsem order
> 				ksm_memory_callback()
> 					mutex_unlock(&ksm_thread_mutex);
> 				up_read(memory_chain.rwsem)
> 		unlock_system_sleep();
> 			mutex_unlock(&pm_mutex);
> 
> So, I think pm_mutex protect ABBA deadlock. but it exist only when
> CONFIG_HIBERNATION=y. IOW, this code is not correct generically. Am I
> missing something?

I do remember taking great comfort from lock_system_sleep() i.e. pm_mutex
when I did the ksm_memory_callback(); but I think that comfort was more
along the lines of it making obvious that taking a mutex was okay there,
than it providing any safety.  I think I was unconscious of the issue you
raise, perhaps didn't even notice rwsem in __blocking_notifier_call_chain.

But is it really a problem, given that it's down_read(rwsem) in each case?
Yes, but I had to look up akpm's comment on msync in ChangeLog-2.6.11 to
remember why:

	And yes, the ranking of down_read() versus down() does matter:
	
		Task A			Task B		Task C
	
		down_read(rwsem)
					down(sem)
							down_write(rwsem)
		down(sem)
					down_read(rwsem)
	
	C's down_write() will cause B's down_read to block.
	B holds `sem', so A will never release `rwsem'.

Am I mistaken, or is get_any_page() in mm/memory-failure.c also relying
on lock_system_sleep() to do real locking, even without CONFIG_HIBERNATION?

If it is, then I think we should solve both problems by making it lock
unconditionally: though neither "lock_system_sleep" nor "pm_mutex" is an
appropriate name then... maybe "lock_memory_hotplug", but still using a
pm_mutex declared outside of CONFIG_PM?  Seems a bit weird.

And some kind of lockdep annotation needed for ksm_memory_callback(),
to help it understand how the outer mutex makes the inner inversion safe?
Or does lockdep manage that without help?

I think I'm not going to find time to do the patch for a while,
so please go ahead if you can.

Thanks,
Hugh

> 
> Thanks.
> 
> 
> 
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.36-rc7-mm1+ #148
> -------------------------------------------------------
> bash/1621 is trying to acquire lock:
>  ((memory_chain).rwsem){.+.+.+}, at: [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0
> 
> but task is already holding lock:
>  (ksm_thread_mutex){+.+.+.}, at: [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (ksm_thread_mutex){+.+.+.}:
>        [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
>        [<ffffffff81505d74>] __mutex_lock_common+0x44/0x3f0
>        [<ffffffff81506228>] mutex_lock_nested+0x48/0x60
>        [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0		
>        [<ffffffff8150c21c>] notifier_call_chain+0x8c/0xe0
>        [<ffffffff8107934e>] __blocking_notifier_call_chain+0x7e/0xc0	
>        [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
>        [<ffffffff813afbfb>] memory_notify+0x1b/0x20
>        [<ffffffff81141b7c>] remove_memory+0x1cc/0x5f0
>        [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
>        [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
>        [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
>        [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
>        [<ffffffff8114f398>] vfs_write+0xc8/0x190
>        [<ffffffff8114fc14>] sys_write+0x54/0x90
>        [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b
> 
> -> #0 ((memory_chain).rwsem){.+.+.+}:
>        [<ffffffff8108b5ba>] __lock_acquire+0x155a/0x1600
>        [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
>        [<ffffffff81506601>] down_read+0x51/0xa0
>        [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0	
>        [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
>        [<ffffffff813afbfb>] memory_notify+0x1b/0x20
>        [<ffffffff81141f1e>] remove_memory+0x56e/0x5f0
>        [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
>        [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
>        [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
>        [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
>        [<ffffffff8114f398>] vfs_write+0xc8/0x190
>        [<ffffffff8114fc14>] sys_write+0x54/0x90
>        [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b
> 
> other info that might help us debug this:
> 
> 5 locks held by bash/1621:
>  #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff811bc074>] sysfs_write_file+0x44/0x170
>  #1:  (s_active#110){.+.+.+}, at: [<ffffffff811bc0fd>] sysfs_write_file+0xcd/0x170
>  #2:  (&mem->state_mutex){+.+.+.}, at: [<ffffffff813af478>] memory_block_change_state+0x38/0x1a0
>  #3:  (pm_mutex){+.+.+.}, at: [<ffffffff81141ad9>] remove_memory+0x129/0x5f0
>  #4:  (ksm_thread_mutex){+.+.+.}, at: [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0
> 
> stack backtrace:
> Pid: 1621, comm: bash Not tainted 2.6.36-rc7-mm1+ #148
> Call Trace:
>  [<ffffffff81088b5b>] print_circular_bug+0xeb/0xf0
>  [<ffffffff8108b5ba>] __lock_acquire+0x155a/0x1600
>  [<ffffffff8103a1f9>] ? finish_task_switch+0x79/0xe0
>  [<ffffffff815049a9>] ? schedule+0x419/0xa80
>  [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
>  [<ffffffff81079339>] ? __blocking_notifier_call_chain+0x69/0xc0	
>  [<ffffffff81506601>] down_read+0x51/0xa0
>  [<ffffffff81079339>] ? __blocking_notifier_call_chain+0x69/0xc0
>  [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0
>  [<ffffffff81110f06>] ? next_online_pgdat+0x26/0x50
>  [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
>  [<ffffffff813afbfb>] memory_notify+0x1b/0x20			
>  [<ffffffff81141f1e>] remove_memory+0x56e/0x5f0
>  [<ffffffff8108ba98>] ? lock_release_non_nested+0x2f8/0x3a0
>  [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
>  [<ffffffff8111705c>] ? might_fault+0x5c/0xb0
>  [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
>  [<ffffffff811bc0fd>] ? sysfs_write_file+0xcd/0x170
>  [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
>  [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
>  [<ffffffff8114f398>] vfs_write+0xc8/0x190
>  [<ffffffff8114fc14>] sys_write+0x54/0x90
>  [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: mem-hotplug + ksm make lockdep warning
@ 2010-10-26  7:10   ` Hugh Dickins
  0 siblings, 0 replies; 14+ messages in thread
From: Hugh Dickins @ 2010-10-26  7:10 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Andrea Arcangeli, Andi Kleen, LKML, linux-mm

On Mon, 25 Oct 2010, KOSAKI Motohiro wrote:
> Hi Hugh,
> 
> commit 62b61f611e(ksm: memory hotremove migration only) makes following
> lockdep warnings. Is this intentional?

No, certainly not intentional: thanks for finding this.  Looking back,
I think the machine I tested memory hotplug versus KSM upon was not
the machine I habitually ran lockdep on, I bet I forgot to try it.

> 
> More detail: current lockdep hieralcy is here.

And especial thanks for taking the trouble to present it in a way
that I find much easier to understand than lockdep's pronouncements.

> 
> memory_notify
> 	offline_pages
> 		lock_system_sleep();
> 			mutex_lock(&pm_mutex);
> 		memory_notify(MEM_GOING_OFFLINE)
> 			__blocking_notifier_call_chain
> 				down_read(memory_chain.rwsem)
> 				ksm_memory_callback()
> 					mutex_lock(&ksm_thread_mutex);  // memory_chain.rmsem -> ksm_thread_mutex order
> 				up_read(memory_chain.rwsem)
> 		memory_notify(MEM_OFFLINE)
> 			__blocking_notifier_call_chain
> 				down_read(memory_chain.rwsem)		// ksm_thread_mutex -> memory_chain.rmsem order
> 				ksm_memory_callback()
> 					mutex_unlock(&ksm_thread_mutex);
> 				up_read(memory_chain.rwsem)
> 		unlock_system_sleep();
> 			mutex_unlock(&pm_mutex);
> 
> So, I think pm_mutex protect ABBA deadlock. but it exist only when
> CONFIG_HIBERNATION=y. IOW, this code is not correct generically. Am I
> missing something?

I do remember taking great comfort from lock_system_sleep() i.e. pm_mutex
when I did the ksm_memory_callback(); but I think that comfort was more
along the lines of it making obvious that taking a mutex was okay there,
than it providing any safety.  I think I was unconscious of the issue you
raise, perhaps didn't even notice rwsem in __blocking_notifier_call_chain.

But is it really a problem, given that it's down_read(rwsem) in each case?
Yes, but I had to look up akpm's comment on msync in ChangeLog-2.6.11 to
remember why:

	And yes, the ranking of down_read() versus down() does matter:
	
		Task A			Task B		Task C
	
		down_read(rwsem)
					down(sem)
							down_write(rwsem)
		down(sem)
					down_read(rwsem)
	
	C's down_write() will cause B's down_read to block.
	B holds `sem', so A will never release `rwsem'.

Am I mistaken, or is get_any_page() in mm/memory-failure.c also relying
on lock_system_sleep() to do real locking, even without CONFIG_HIBERNATION?

If it is, then I think we should solve both problems by making it lock
unconditionally: though neither "lock_system_sleep" nor "pm_mutex" is an
appropriate name then... maybe "lock_memory_hotplug", but still using a
pm_mutex declared outside of CONFIG_PM?  Seems a bit weird.

And some kind of lockdep annotation needed for ksm_memory_callback(),
to help it understand how the outer mutex makes the inner inversion safe?
Or does lockdep manage that without help?

I think I'm not going to find time to do the patch for a while,
so please go ahead if you can.

Thanks,
Hugh

> 
> Thanks.
> 
> 
> 
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.36-rc7-mm1+ #148
> -------------------------------------------------------
> bash/1621 is trying to acquire lock:
>  ((memory_chain).rwsem){.+.+.+}, at: [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0
> 
> but task is already holding lock:
>  (ksm_thread_mutex){+.+.+.}, at: [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (ksm_thread_mutex){+.+.+.}:
>        [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
>        [<ffffffff81505d74>] __mutex_lock_common+0x44/0x3f0
>        [<ffffffff81506228>] mutex_lock_nested+0x48/0x60
>        [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0		
>        [<ffffffff8150c21c>] notifier_call_chain+0x8c/0xe0
>        [<ffffffff8107934e>] __blocking_notifier_call_chain+0x7e/0xc0	
>        [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
>        [<ffffffff813afbfb>] memory_notify+0x1b/0x20
>        [<ffffffff81141b7c>] remove_memory+0x1cc/0x5f0
>        [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
>        [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
>        [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
>        [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
>        [<ffffffff8114f398>] vfs_write+0xc8/0x190
>        [<ffffffff8114fc14>] sys_write+0x54/0x90
>        [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b
> 
> -> #0 ((memory_chain).rwsem){.+.+.+}:
>        [<ffffffff8108b5ba>] __lock_acquire+0x155a/0x1600
>        [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
>        [<ffffffff81506601>] down_read+0x51/0xa0
>        [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0	
>        [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
>        [<ffffffff813afbfb>] memory_notify+0x1b/0x20
>        [<ffffffff81141f1e>] remove_memory+0x56e/0x5f0
>        [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
>        [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
>        [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
>        [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
>        [<ffffffff8114f398>] vfs_write+0xc8/0x190
>        [<ffffffff8114fc14>] sys_write+0x54/0x90
>        [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b
> 
> other info that might help us debug this:
> 
> 5 locks held by bash/1621:
>  #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff811bc074>] sysfs_write_file+0x44/0x170
>  #1:  (s_active#110){.+.+.+}, at: [<ffffffff811bc0fd>] sysfs_write_file+0xcd/0x170
>  #2:  (&mem->state_mutex){+.+.+.}, at: [<ffffffff813af478>] memory_block_change_state+0x38/0x1a0
>  #3:  (pm_mutex){+.+.+.}, at: [<ffffffff81141ad9>] remove_memory+0x129/0x5f0
>  #4:  (ksm_thread_mutex){+.+.+.}, at: [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0
> 
> stack backtrace:
> Pid: 1621, comm: bash Not tainted 2.6.36-rc7-mm1+ #148
> Call Trace:
>  [<ffffffff81088b5b>] print_circular_bug+0xeb/0xf0
>  [<ffffffff8108b5ba>] __lock_acquire+0x155a/0x1600
>  [<ffffffff8103a1f9>] ? finish_task_switch+0x79/0xe0
>  [<ffffffff815049a9>] ? schedule+0x419/0xa80
>  [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
>  [<ffffffff81079339>] ? __blocking_notifier_call_chain+0x69/0xc0	
>  [<ffffffff81506601>] down_read+0x51/0xa0
>  [<ffffffff81079339>] ? __blocking_notifier_call_chain+0x69/0xc0
>  [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0
>  [<ffffffff81110f06>] ? next_online_pgdat+0x26/0x50
>  [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
>  [<ffffffff813afbfb>] memory_notify+0x1b/0x20			
>  [<ffffffff81141f1e>] remove_memory+0x56e/0x5f0
>  [<ffffffff8108ba98>] ? lock_release_non_nested+0x2f8/0x3a0
>  [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
>  [<ffffffff8111705c>] ? might_fault+0x5c/0xb0
>  [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
>  [<ffffffff811bc0fd>] ? sysfs_write_file+0xcd/0x170
>  [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
>  [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
>  [<ffffffff8114f398>] vfs_write+0xc8/0x190
>  [<ffffffff8114fc14>] sys_write+0x54/0x90
>  [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: mem-hotplug + ksm make lockdep warning
  2010-10-26  7:10   ` Hugh Dickins
@ 2010-10-26  8:07     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2010-10-26  8:07 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: kosaki.motohiro, Andrew Morton, Andrea Arcangeli, Andi Kleen,
	LKML, linux-mm

> On Mon, 25 Oct 2010, KOSAKI Motohiro wrote:
> > Hi Hugh,
> > 
> > commit 62b61f611e(ksm: memory hotremove migration only) makes following
> > lockdep warnings. Is this intentional?
> 
> No, certainly not intentional: thanks for finding this.  Looking back,
> I think the machine I tested memory hotplug versus KSM upon was not
> the machine I habitually ran lockdep on, I bet I forgot to try it.
> 
> > 
> > More detail: current lockdep hieralcy is here.
> 
> And especial thanks for taking the trouble to present it in a way
> that I find much easier to understand than lockdep's pronouncements.
> 
> > 
> > memory_notify
> > 	offline_pages
> > 		lock_system_sleep();
> > 			mutex_lock(&pm_mutex);
> > 		memory_notify(MEM_GOING_OFFLINE)
> > 			__blocking_notifier_call_chain
> > 				down_read(memory_chain.rwsem)
> > 				ksm_memory_callback()
> > 					mutex_lock(&ksm_thread_mutex);  // memory_chain.rmsem -> ksm_thread_mutex order
> > 				up_read(memory_chain.rwsem)
> > 		memory_notify(MEM_OFFLINE)
> > 			__blocking_notifier_call_chain
> > 				down_read(memory_chain.rwsem)		// ksm_thread_mutex -> memory_chain.rmsem order
> > 				ksm_memory_callback()
> > 					mutex_unlock(&ksm_thread_mutex);
> > 				up_read(memory_chain.rwsem)
> > 		unlock_system_sleep();
> > 			mutex_unlock(&pm_mutex);
> > 
> > So, I think pm_mutex protect ABBA deadlock. but it exist only when
> > CONFIG_HIBERNATION=y. IOW, this code is not correct generically. Am I
> > missing something?
> 
> I do remember taking great comfort from lock_system_sleep() i.e. pm_mutex
> when I did the ksm_memory_callback(); but I think that comfort was more
> along the lines of it making obvious that taking a mutex was okay there,
> than it providing any safety.  I think I was unconscious of the issue you
> raise, perhaps didn't even notice rwsem in __blocking_notifier_call_chain.
> 
> But is it really a problem, given that it's down_read(rwsem) in each case?
> Yes, but I had to look up akpm's comment on msync in ChangeLog-2.6.11 to
> remember why:
> 
> 	And yes, the ranking of down_read() versus down() does matter:
> 	
> 		Task A			Task B		Task C
> 	
> 		down_read(rwsem)
> 					down(sem)
> 							down_write(rwsem)
> 		down(sem)
> 					down_read(rwsem)
> 	
> 	C's down_write() will cause B's down_read to block.
> 	B holds `sem', so A will never release `rwsem'.

Yeah, in other word, my raised issue is neccessary following three actor.

A. do memory unplug
B. ditto
C. register new blocking notifier chain

Thus, I don't think this issue is occur so frquently. (Who want to unplug memory
concurrently?) But even though, some arch don't have hibernation support at all
and we need to fix it, maybe.


> 
> Am I mistaken, or is get_any_page() in mm/memory-failure.c also relying
> on lock_system_sleep() to do real locking, even without CONFIG_HIBERNATION?

I think get_any_page() also need to fix. ;)
Andi, please double check.

> If it is, then I think we should solve both problems by making it lock
> unconditionally: though neither "lock_system_sleep" nor "pm_mutex" is an
> appropriate name then... maybe "lock_memory_hotplug", but still using a
> pm_mutex declared outside of CONFIG_PM?  Seems a bit weird.

I agree with making lock_memory_hotplug.

> And some kind of lockdep annotation needed for ksm_memory_callback(),
> to help it understand how the outer mutex makes the inner inversion safe?
> Or does lockdep manage that without help?

I don't know lockdep internal at all. I can only say CONFIG_HIBERNATION=y
still makes this lockdep splat. iow, lockdep can't handle this inner 
inversion safe issue automatically.



> I think I'm not going to find time to do the patch for a while,
> so please go ahead if you can.

I also need to attend KS. So, If you can accept to waiting until middle of 
next month, I'll do.

Thanks.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: mem-hotplug + ksm make lockdep warning
@ 2010-10-26  8:07     ` KOSAKI Motohiro
  0 siblings, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2010-10-26  8:07 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: kosaki.motohiro, Andrew Morton, Andrea Arcangeli, Andi Kleen,
	LKML, linux-mm

> On Mon, 25 Oct 2010, KOSAKI Motohiro wrote:
> > Hi Hugh,
> > 
> > commit 62b61f611e(ksm: memory hotremove migration only) makes following
> > lockdep warnings. Is this intentional?
> 
> No, certainly not intentional: thanks for finding this.  Looking back,
> I think the machine I tested memory hotplug versus KSM upon was not
> the machine I habitually ran lockdep on, I bet I forgot to try it.
> 
> > 
> > More detail: current lockdep hieralcy is here.
> 
> And especial thanks for taking the trouble to present it in a way
> that I find much easier to understand than lockdep's pronouncements.
> 
> > 
> > memory_notify
> > 	offline_pages
> > 		lock_system_sleep();
> > 			mutex_lock(&pm_mutex);
> > 		memory_notify(MEM_GOING_OFFLINE)
> > 			__blocking_notifier_call_chain
> > 				down_read(memory_chain.rwsem)
> > 				ksm_memory_callback()
> > 					mutex_lock(&ksm_thread_mutex);  // memory_chain.rmsem -> ksm_thread_mutex order
> > 				up_read(memory_chain.rwsem)
> > 		memory_notify(MEM_OFFLINE)
> > 			__blocking_notifier_call_chain
> > 				down_read(memory_chain.rwsem)		// ksm_thread_mutex -> memory_chain.rmsem order
> > 				ksm_memory_callback()
> > 					mutex_unlock(&ksm_thread_mutex);
> > 				up_read(memory_chain.rwsem)
> > 		unlock_system_sleep();
> > 			mutex_unlock(&pm_mutex);
> > 
> > So, I think pm_mutex protect ABBA deadlock. but it exist only when
> > CONFIG_HIBERNATION=y. IOW, this code is not correct generically. Am I
> > missing something?
> 
> I do remember taking great comfort from lock_system_sleep() i.e. pm_mutex
> when I did the ksm_memory_callback(); but I think that comfort was more
> along the lines of it making obvious that taking a mutex was okay there,
> than it providing any safety.  I think I was unconscious of the issue you
> raise, perhaps didn't even notice rwsem in __blocking_notifier_call_chain.
> 
> But is it really a problem, given that it's down_read(rwsem) in each case?
> Yes, but I had to look up akpm's comment on msync in ChangeLog-2.6.11 to
> remember why:
> 
> 	And yes, the ranking of down_read() versus down() does matter:
> 	
> 		Task A			Task B		Task C
> 	
> 		down_read(rwsem)
> 					down(sem)
> 							down_write(rwsem)
> 		down(sem)
> 					down_read(rwsem)
> 	
> 	C's down_write() will cause B's down_read to block.
> 	B holds `sem', so A will never release `rwsem'.

Yeah, in other word, my raised issue is neccessary following three actor.

A. do memory unplug
B. ditto
C. register new blocking notifier chain

Thus, I don't think this issue is occur so frquently. (Who want to unplug memory
concurrently?) But even though, some arch don't have hibernation support at all
and we need to fix it, maybe.


> 
> Am I mistaken, or is get_any_page() in mm/memory-failure.c also relying
> on lock_system_sleep() to do real locking, even without CONFIG_HIBERNATION?

I think get_any_page() also need to fix. ;)
Andi, please double check.

> If it is, then I think we should solve both problems by making it lock
> unconditionally: though neither "lock_system_sleep" nor "pm_mutex" is an
> appropriate name then... maybe "lock_memory_hotplug", but still using a
> pm_mutex declared outside of CONFIG_PM?  Seems a bit weird.

I agree with making lock_memory_hotplug.

> And some kind of lockdep annotation needed for ksm_memory_callback(),
> to help it understand how the outer mutex makes the inner inversion safe?
> Or does lockdep manage that without help?

I don't know lockdep internal at all. I can only say CONFIG_HIBERNATION=y
still makes this lockdep splat. iow, lockdep can't handle this inner 
inversion safe issue automatically.



> I think I'm not going to find time to do the patch for a while,
> so please go ahead if you can.

I also need to attend KS. So, If you can accept to waiting until middle of 
next month, I'll do.

Thanks.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] mem-hotplug: Introduce {un}lock_memory_hotplug()
  2010-10-26  8:07     ` KOSAKI Motohiro
@ 2010-12-01  5:28       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2010-12-01  5:28 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: kosaki.motohiro, Hugh Dickins, Andrew Morton, Andrea Arcangeli,
	Andi Kleen, LKML, linux-mm, KAMEZAWA Hiroyuki

Now, hwpoison are using lock_system_sleep() for prevent a race
with memory hotplug. However lock_system_sleep() is no-op if
CONFIG_HIBERNATION=n. Therefore we need new lock.

Cc: Andi Kleen <andi@firstfloor.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 include/linux/memory_hotplug.h |    6 ++++++
 mm/memory-failure.c            |    8 ++++----
 mm/memory_hotplug.c            |   31 ++++++++++++++++++++++++-------
 3 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 4307231..31c237a 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -161,6 +161,9 @@ extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
 extern void put_page_bootmem(struct page *page);
 #endif
 
+void lock_memory_hotplug(void);
+void unlock_memory_hotplug(void);
+
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 /*
  * Stub functions for when hotplug is off
@@ -192,6 +195,9 @@ static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
 }
 
+static inline void lock_memory_hotplug(void) {}
+static inline void unlock_memory_hotplug(void) {}
+
 #endif /* ! CONFIG_MEMORY_HOTPLUG */
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 1243241..46ab2c0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -51,6 +51,7 @@
 #include <linux/slab.h>
 #include <linux/swapops.h>
 #include <linux/hugetlb.h>
+#include <linux/memory_hotplug.h>
 #include "internal.h"
 
 int sysctl_memory_failure_early_kill __read_mostly = 0;
@@ -1230,11 +1231,10 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
 		return 1;
 
 	/*
-	 * The lock_system_sleep prevents a race with memory hotplug,
-	 * because the isolation assumes there's only a single user.
+	 * The lock_memory_hotplug prevents a race with memory hotplug.
 	 * This is a big hammer, a better would be nicer.
 	 */
-	lock_system_sleep();
+	lock_memory_hotplug();
 
 	/*
 	 * Isolate the page, so that it doesn't get reallocated if it
@@ -1264,7 +1264,7 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
 		ret = 1;
 	}
 	unset_migratetype_isolate(p);
-	unlock_system_sleep();
+	unlock_memory_hotplug();
 	return ret;
 }
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 833e286..7549a01 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -34,6 +34,23 @@
 
 #include "internal.h"
 
+DEFINE_MUTEX(mem_hotplug_mutex);
+
+void lock_memory_hotplug(void)
+{
+	mutex_lock(&mem_hotplug_mutex);
+
+	/* for exclusive hibernation if CONFIG_HIBERNATION=y */
+	lock_system_sleep();
+}
+
+void unlock_memory_hotplug(void)
+{
+	unlock_system_sleep();
+	mutex_unlock(&mem_hotplug_mutex);
+}
+
+
 /* add this memory to iomem resource */
 static struct resource *register_memory_resource(u64 start, u64 size)
 {
@@ -491,7 +508,7 @@ int mem_online_node(int nid)
 	pg_data_t	*pgdat;
 	int	ret;
 
-	lock_system_sleep();
+	lock_memory_hotplug();
 	pgdat = hotadd_new_pgdat(nid, 0);
 	if (pgdat) {
 		ret = -ENOMEM;
@@ -502,7 +519,7 @@ int mem_online_node(int nid)
 	BUG_ON(ret);
 
 out:
-	unlock_system_sleep();
+	unlock_memory_hotplug();
 	return ret;
 }
 
@@ -514,7 +531,7 @@ int __ref add_memory(int nid, u64 start, u64 size)
 	struct resource *res;
 	int ret;
 
-	lock_system_sleep();
+	lock_memory_hotplug();
 
 	res = register_memory_resource(start, size);
 	ret = -EEXIST;
@@ -561,7 +578,7 @@ error:
 		release_memory_resource(res);
 
 out:
-	unlock_system_sleep();
+	unlock_memory_hotplug();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(add_memory);
@@ -789,7 +806,7 @@ static int offline_pages(unsigned long start_pfn,
 	if (!test_pages_in_a_zone(start_pfn, end_pfn))
 		return -EINVAL;
 
-	lock_system_sleep();
+	lock_memory_hotplug();
 
 	zone = page_zone(pfn_to_page(start_pfn));
 	node = zone_to_nid(zone);
@@ -877,7 +894,7 @@ repeat:
 	vm_total_pages = nr_free_pagecache_pages();
 
 	memory_notify(MEM_OFFLINE, &arg);
-	unlock_system_sleep();
+	unlock_memory_hotplug();
 	return 0;
 
 failed_removal:
@@ -888,7 +905,7 @@ failed_removal:
 	undo_isolate_page_range(start_pfn, end_pfn);
 
 out:
-	unlock_system_sleep();
+	unlock_memory_hotplug();
 	return ret;
 }
 
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 1/2] mem-hotplug: Introduce {un}lock_memory_hotplug()
@ 2010-12-01  5:28       ` KOSAKI Motohiro
  0 siblings, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2010-12-01  5:28 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Hugh Dickins, Andrew Morton, Andrea Arcangeli, Andi Kleen, LKML,
	linux-mm, KAMEZAWA Hiroyuki

Now, hwpoison are using lock_system_sleep() for prevent a race
with memory hotplug. However lock_system_sleep() is no-op if
CONFIG_HIBERNATION=n. Therefore we need new lock.

Cc: Andi Kleen <andi@firstfloor.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 include/linux/memory_hotplug.h |    6 ++++++
 mm/memory-failure.c            |    8 ++++----
 mm/memory_hotplug.c            |   31 ++++++++++++++++++++++++-------
 3 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 4307231..31c237a 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -161,6 +161,9 @@ extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
 extern void put_page_bootmem(struct page *page);
 #endif
 
+void lock_memory_hotplug(void);
+void unlock_memory_hotplug(void);
+
 #else /* ! CONFIG_MEMORY_HOTPLUG */
 /*
  * Stub functions for when hotplug is off
@@ -192,6 +195,9 @@ static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
 }
 
+static inline void lock_memory_hotplug(void) {}
+static inline void unlock_memory_hotplug(void) {}
+
 #endif /* ! CONFIG_MEMORY_HOTPLUG */
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 1243241..46ab2c0 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -51,6 +51,7 @@
 #include <linux/slab.h>
 #include <linux/swapops.h>
 #include <linux/hugetlb.h>
+#include <linux/memory_hotplug.h>
 #include "internal.h"
 
 int sysctl_memory_failure_early_kill __read_mostly = 0;
@@ -1230,11 +1231,10 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
 		return 1;
 
 	/*
-	 * The lock_system_sleep prevents a race with memory hotplug,
-	 * because the isolation assumes there's only a single user.
+	 * The lock_memory_hotplug prevents a race with memory hotplug.
 	 * This is a big hammer, a better would be nicer.
 	 */
-	lock_system_sleep();
+	lock_memory_hotplug();
 
 	/*
 	 * Isolate the page, so that it doesn't get reallocated if it
@@ -1264,7 +1264,7 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
 		ret = 1;
 	}
 	unset_migratetype_isolate(p);
-	unlock_system_sleep();
+	unlock_memory_hotplug();
 	return ret;
 }
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 833e286..7549a01 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -34,6 +34,23 @@
 
 #include "internal.h"
 
+DEFINE_MUTEX(mem_hotplug_mutex);
+
+void lock_memory_hotplug(void)
+{
+	mutex_lock(&mem_hotplug_mutex);
+
+	/* for exclusive hibernation if CONFIG_HIBERNATION=y */
+	lock_system_sleep();
+}
+
+void unlock_memory_hotplug(void)
+{
+	unlock_system_sleep();
+	mutex_unlock(&mem_hotplug_mutex);
+}
+
+
 /* add this memory to iomem resource */
 static struct resource *register_memory_resource(u64 start, u64 size)
 {
@@ -491,7 +508,7 @@ int mem_online_node(int nid)
 	pg_data_t	*pgdat;
 	int	ret;
 
-	lock_system_sleep();
+	lock_memory_hotplug();
 	pgdat = hotadd_new_pgdat(nid, 0);
 	if (pgdat) {
 		ret = -ENOMEM;
@@ -502,7 +519,7 @@ int mem_online_node(int nid)
 	BUG_ON(ret);
 
 out:
-	unlock_system_sleep();
+	unlock_memory_hotplug();
 	return ret;
 }
 
@@ -514,7 +531,7 @@ int __ref add_memory(int nid, u64 start, u64 size)
 	struct resource *res;
 	int ret;
 
-	lock_system_sleep();
+	lock_memory_hotplug();
 
 	res = register_memory_resource(start, size);
 	ret = -EEXIST;
@@ -561,7 +578,7 @@ error:
 		release_memory_resource(res);
 
 out:
-	unlock_system_sleep();
+	unlock_memory_hotplug();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(add_memory);
@@ -789,7 +806,7 @@ static int offline_pages(unsigned long start_pfn,
 	if (!test_pages_in_a_zone(start_pfn, end_pfn))
 		return -EINVAL;
 
-	lock_system_sleep();
+	lock_memory_hotplug();
 
 	zone = page_zone(pfn_to_page(start_pfn));
 	node = zone_to_nid(zone);
@@ -877,7 +894,7 @@ repeat:
 	vm_total_pages = nr_free_pagecache_pages();
 
 	memory_notify(MEM_OFFLINE, &arg);
-	unlock_system_sleep();
+	unlock_memory_hotplug();
 	return 0;
 
 failed_removal:
@@ -888,7 +905,7 @@ failed_removal:
 	undo_isolate_page_range(start_pfn, end_pfn);
 
 out:
-	unlock_system_sleep();
+	unlock_memory_hotplug();
 	return ret;
 }
 
-- 
1.6.5.2



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] ksm: annotate ksm_thread_mutex is no deadlock source
  2010-10-26  8:07     ` KOSAKI Motohiro
@ 2010-12-01  5:29       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2010-12-01  5:29 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: kosaki.motohiro, Hugh Dickins, Andrew Morton, Andrea Arcangeli,
	Andi Kleen, LKML, linux-mm, KAMEZAWA Hiroyuki


commit 62b61f611e(ksm: memory hotremove migration only) made following
new lockdep warning.

  =======================================================
  [ INFO: possible circular locking dependency detected ]
  -------------------------------------------------------
  bash/1621 is trying to acquire lock:
   ((memory_chain).rwsem){.+.+.+}, at: [<ffffffff81079339>]
  __blocking_notifier_call_chain+0x69/0xc0

  but task is already holding lock:
   (ksm_thread_mutex){+.+.+.}, at: [<ffffffff8113a3aa>]
  ksm_memory_callback+0x3a/0xc0

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (ksm_thread_mutex){+.+.+.}:
       [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
       [<ffffffff81505d74>] __mutex_lock_common+0x44/0x3f0
       [<ffffffff81506228>] mutex_lock_nested+0x48/0x60
       [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0
       [<ffffffff8150c21c>] notifier_call_chain+0x8c/0xe0
       [<ffffffff8107934e>] __blocking_notifier_call_chain+0x7e/0xc0
       [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
       [<ffffffff813afbfb>] memory_notify+0x1b/0x20
       [<ffffffff81141b7c>] remove_memory+0x1cc/0x5f0
       [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
       [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
       [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
       [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
       [<ffffffff8114f398>] vfs_write+0xc8/0x190
       [<ffffffff8114fc14>] sys_write+0x54/0x90
       [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b

  -> #0 ((memory_chain).rwsem){.+.+.+}:
       [<ffffffff8108b5ba>] __lock_acquire+0x155a/0x1600
       [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
       [<ffffffff81506601>] down_read+0x51/0xa0
       [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0
       [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
       [<ffffffff813afbfb>] memory_notify+0x1b/0x20
       [<ffffffff81141f1e>] remove_memory+0x56e/0x5f0
       [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
       [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
       [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
       [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
       [<ffffffff8114f398>] vfs_write+0xc8/0x190
       [<ffffffff8114fc14>] sys_write+0x54/0x90
       [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b

But it's false positive. Both memory_chain.rwsem and ksm_thread_mutex
have outer lock (mem_hotplug_mutex). then, they can't make deadlock.

Thus, This patch annotate ksm_thread_mutex is not deadlock source.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/ksm.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 65ab5c7..5aa4900 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1724,8 +1724,10 @@ static int ksm_memory_callback(struct notifier_block *self,
 		/*
 		 * Keep it very simple for now: just lock out ksmd and
 		 * MADV_UNMERGEABLE while any memory is going offline.
+		 * Mutex_lock_nested() is necessary to tell that
+		 * ksm_thread_mutex is not unlocked here intentionally.
 		 */
-		mutex_lock(&ksm_thread_mutex);
+		mutex_lock_nested(&ksm_thread_mutex, SINGLE_DEPTH_NESTING);
 		break;
 
 	case MEM_OFFLINE:
-- 
1.6.5.2




^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] ksm: annotate ksm_thread_mutex is no deadlock source
@ 2010-12-01  5:29       ` KOSAKI Motohiro
  0 siblings, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2010-12-01  5:29 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Hugh Dickins, Andrew Morton, Andrea Arcangeli, Andi Kleen, LKML,
	linux-mm, KAMEZAWA Hiroyuki


commit 62b61f611e(ksm: memory hotremove migration only) made following
new lockdep warning.

  =======================================================
  [ INFO: possible circular locking dependency detected ]
  -------------------------------------------------------
  bash/1621 is trying to acquire lock:
   ((memory_chain).rwsem){.+.+.+}, at: [<ffffffff81079339>]
  __blocking_notifier_call_chain+0x69/0xc0

  but task is already holding lock:
   (ksm_thread_mutex){+.+.+.}, at: [<ffffffff8113a3aa>]
  ksm_memory_callback+0x3a/0xc0

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (ksm_thread_mutex){+.+.+.}:
       [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
       [<ffffffff81505d74>] __mutex_lock_common+0x44/0x3f0
       [<ffffffff81506228>] mutex_lock_nested+0x48/0x60
       [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0
       [<ffffffff8150c21c>] notifier_call_chain+0x8c/0xe0
       [<ffffffff8107934e>] __blocking_notifier_call_chain+0x7e/0xc0
       [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
       [<ffffffff813afbfb>] memory_notify+0x1b/0x20
       [<ffffffff81141b7c>] remove_memory+0x1cc/0x5f0
       [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
       [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
       [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
       [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
       [<ffffffff8114f398>] vfs_write+0xc8/0x190
       [<ffffffff8114fc14>] sys_write+0x54/0x90
       [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b

  -> #0 ((memory_chain).rwsem){.+.+.+}:
       [<ffffffff8108b5ba>] __lock_acquire+0x155a/0x1600
       [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
       [<ffffffff81506601>] down_read+0x51/0xa0
       [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0
       [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
       [<ffffffff813afbfb>] memory_notify+0x1b/0x20
       [<ffffffff81141f1e>] remove_memory+0x56e/0x5f0
       [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
       [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
       [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
       [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
       [<ffffffff8114f398>] vfs_write+0xc8/0x190
       [<ffffffff8114fc14>] sys_write+0x54/0x90
       [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b

But it's false positive. Both memory_chain.rwsem and ksm_thread_mutex
have outer lock (mem_hotplug_mutex). then, they can't make deadlock.

Thus, This patch annotate ksm_thread_mutex is not deadlock source.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/ksm.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 65ab5c7..5aa4900 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1724,8 +1724,10 @@ static int ksm_memory_callback(struct notifier_block *self,
 		/*
 		 * Keep it very simple for now: just lock out ksmd and
 		 * MADV_UNMERGEABLE while any memory is going offline.
+		 * Mutex_lock_nested() is necessary to tell that
+		 * ksm_thread_mutex is not unlocked here intentionally.
 		 */
-		mutex_lock(&ksm_thread_mutex);
+		mutex_lock_nested(&ksm_thread_mutex, SINGLE_DEPTH_NESTING);
 		break;
 
 	case MEM_OFFLINE:
-- 
1.6.5.2



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mem-hotplug: Introduce {un}lock_memory_hotplug()
  2010-12-01  5:28       ` KOSAKI Motohiro
@ 2010-12-01 19:57         ` Hugh Dickins
  -1 siblings, 0 replies; 14+ messages in thread
From: Hugh Dickins @ 2010-12-01 19:57 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Andrea Arcangeli, Andi Kleen, LKML, linux-mm,
	KAMEZAWA Hiroyuki

On Wed, 1 Dec 2010, KOSAKI Motohiro wrote:

> Now, hwpoison are using lock_system_sleep() for prevent a race
> with memory hotplug. However lock_system_sleep() is no-op if
> CONFIG_HIBERNATION=n. Therefore we need new lock.
> 
> Cc: Andi Kleen <andi@firstfloor.org>
> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Suggested-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Yes, that looks better than what I had had in mind (further abusing
pm_mutex).  I notice that if MEMORY_HOTPLUG is off but HIBERNATION on,
lock_memory_hotplug does nothing where lock_system_sleep did something;
but I don't think that's a problem at all.  Thanks.

Acked-by: Hugh Dickins <hughd@google.com>

> ---
>  include/linux/memory_hotplug.h |    6 ++++++
>  mm/memory-failure.c            |    8 ++++----
>  mm/memory_hotplug.c            |   31 ++++++++++++++++++++++++-------
>  3 files changed, 34 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 4307231..31c237a 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -161,6 +161,9 @@ extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
>  extern void put_page_bootmem(struct page *page);
>  #endif
>  
> +void lock_memory_hotplug(void);
> +void unlock_memory_hotplug(void);
> +
>  #else /* ! CONFIG_MEMORY_HOTPLUG */
>  /*
>   * Stub functions for when hotplug is off
> @@ -192,6 +195,9 @@ static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
>  {
>  }
>  
> +static inline void lock_memory_hotplug(void) {}
> +static inline void unlock_memory_hotplug(void) {}
> +
>  #endif /* ! CONFIG_MEMORY_HOTPLUG */
>  
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 1243241..46ab2c0 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -51,6 +51,7 @@
>  #include <linux/slab.h>
>  #include <linux/swapops.h>
>  #include <linux/hugetlb.h>
> +#include <linux/memory_hotplug.h>
>  #include "internal.h"
>  
>  int sysctl_memory_failure_early_kill __read_mostly = 0;
> @@ -1230,11 +1231,10 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
>  		return 1;
>  
>  	/*
> -	 * The lock_system_sleep prevents a race with memory hotplug,
> -	 * because the isolation assumes there's only a single user.
> +	 * The lock_memory_hotplug prevents a race with memory hotplug.
>  	 * This is a big hammer, a better would be nicer.
>  	 */
> -	lock_system_sleep();
> +	lock_memory_hotplug();
>  
>  	/*
>  	 * Isolate the page, so that it doesn't get reallocated if it
> @@ -1264,7 +1264,7 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
>  		ret = 1;
>  	}
>  	unset_migratetype_isolate(p);
> -	unlock_system_sleep();
> +	unlock_memory_hotplug();
>  	return ret;
>  }
>  
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 833e286..7549a01 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -34,6 +34,23 @@
>  
>  #include "internal.h"
>  
> +DEFINE_MUTEX(mem_hotplug_mutex);
> +
> +void lock_memory_hotplug(void)
> +{
> +	mutex_lock(&mem_hotplug_mutex);
> +
> +	/* for exclusive hibernation if CONFIG_HIBERNATION=y */
> +	lock_system_sleep();
> +}
> +
> +void unlock_memory_hotplug(void)
> +{
> +	unlock_system_sleep();
> +	mutex_unlock(&mem_hotplug_mutex);
> +}
> +
> +
>  /* add this memory to iomem resource */
>  static struct resource *register_memory_resource(u64 start, u64 size)
>  {
> @@ -491,7 +508,7 @@ int mem_online_node(int nid)
>  	pg_data_t	*pgdat;
>  	int	ret;
>  
> -	lock_system_sleep();
> +	lock_memory_hotplug();
>  	pgdat = hotadd_new_pgdat(nid, 0);
>  	if (pgdat) {
>  		ret = -ENOMEM;
> @@ -502,7 +519,7 @@ int mem_online_node(int nid)
>  	BUG_ON(ret);
>  
>  out:
> -	unlock_system_sleep();
> +	unlock_memory_hotplug();
>  	return ret;
>  }
>  
> @@ -514,7 +531,7 @@ int __ref add_memory(int nid, u64 start, u64 size)
>  	struct resource *res;
>  	int ret;
>  
> -	lock_system_sleep();
> +	lock_memory_hotplug();
>  
>  	res = register_memory_resource(start, size);
>  	ret = -EEXIST;
> @@ -561,7 +578,7 @@ error:
>  		release_memory_resource(res);
>  
>  out:
> -	unlock_system_sleep();
> +	unlock_memory_hotplug();
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(add_memory);
> @@ -789,7 +806,7 @@ static int offline_pages(unsigned long start_pfn,
>  	if (!test_pages_in_a_zone(start_pfn, end_pfn))
>  		return -EINVAL;
>  
> -	lock_system_sleep();
> +	lock_memory_hotplug();
>  
>  	zone = page_zone(pfn_to_page(start_pfn));
>  	node = zone_to_nid(zone);
> @@ -877,7 +894,7 @@ repeat:
>  	vm_total_pages = nr_free_pagecache_pages();
>  
>  	memory_notify(MEM_OFFLINE, &arg);
> -	unlock_system_sleep();
> +	unlock_memory_hotplug();
>  	return 0;
>  
>  failed_removal:
> @@ -888,7 +905,7 @@ failed_removal:
>  	undo_isolate_page_range(start_pfn, end_pfn);
>  
>  out:
> -	unlock_system_sleep();
> +	unlock_memory_hotplug();
>  	return ret;
>  }
>  
> -- 
> 1.6.5.2

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mem-hotplug: Introduce {un}lock_memory_hotplug()
@ 2010-12-01 19:57         ` Hugh Dickins
  0 siblings, 0 replies; 14+ messages in thread
From: Hugh Dickins @ 2010-12-01 19:57 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Andrea Arcangeli, Andi Kleen, LKML, linux-mm,
	KAMEZAWA Hiroyuki

On Wed, 1 Dec 2010, KOSAKI Motohiro wrote:

> Now, hwpoison are using lock_system_sleep() for prevent a race
> with memory hotplug. However lock_system_sleep() is no-op if
> CONFIG_HIBERNATION=n. Therefore we need new lock.
> 
> Cc: Andi Kleen <andi@firstfloor.org>
> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Suggested-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Yes, that looks better than what I had had in mind (further abusing
pm_mutex).  I notice that if MEMORY_HOTPLUG is off but HIBERNATION on,
lock_memory_hotplug does nothing where lock_system_sleep did something;
but I don't think that's a problem at all.  Thanks.

Acked-by: Hugh Dickins <hughd@google.com>

> ---
>  include/linux/memory_hotplug.h |    6 ++++++
>  mm/memory-failure.c            |    8 ++++----
>  mm/memory_hotplug.c            |   31 ++++++++++++++++++++++++-------
>  3 files changed, 34 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 4307231..31c237a 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -161,6 +161,9 @@ extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
>  extern void put_page_bootmem(struct page *page);
>  #endif
>  
> +void lock_memory_hotplug(void);
> +void unlock_memory_hotplug(void);
> +
>  #else /* ! CONFIG_MEMORY_HOTPLUG */
>  /*
>   * Stub functions for when hotplug is off
> @@ -192,6 +195,9 @@ static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
>  {
>  }
>  
> +static inline void lock_memory_hotplug(void) {}
> +static inline void unlock_memory_hotplug(void) {}
> +
>  #endif /* ! CONFIG_MEMORY_HOTPLUG */
>  
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 1243241..46ab2c0 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -51,6 +51,7 @@
>  #include <linux/slab.h>
>  #include <linux/swapops.h>
>  #include <linux/hugetlb.h>
> +#include <linux/memory_hotplug.h>
>  #include "internal.h"
>  
>  int sysctl_memory_failure_early_kill __read_mostly = 0;
> @@ -1230,11 +1231,10 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
>  		return 1;
>  
>  	/*
> -	 * The lock_system_sleep prevents a race with memory hotplug,
> -	 * because the isolation assumes there's only a single user.
> +	 * The lock_memory_hotplug prevents a race with memory hotplug.
>  	 * This is a big hammer, a better would be nicer.
>  	 */
> -	lock_system_sleep();
> +	lock_memory_hotplug();
>  
>  	/*
>  	 * Isolate the page, so that it doesn't get reallocated if it
> @@ -1264,7 +1264,7 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
>  		ret = 1;
>  	}
>  	unset_migratetype_isolate(p);
> -	unlock_system_sleep();
> +	unlock_memory_hotplug();
>  	return ret;
>  }
>  
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 833e286..7549a01 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -34,6 +34,23 @@
>  
>  #include "internal.h"
>  
> +DEFINE_MUTEX(mem_hotplug_mutex);
> +
> +void lock_memory_hotplug(void)
> +{
> +	mutex_lock(&mem_hotplug_mutex);
> +
> +	/* for exclusive hibernation if CONFIG_HIBERNATION=y */
> +	lock_system_sleep();
> +}
> +
> +void unlock_memory_hotplug(void)
> +{
> +	unlock_system_sleep();
> +	mutex_unlock(&mem_hotplug_mutex);
> +}
> +
> +
>  /* add this memory to iomem resource */
>  static struct resource *register_memory_resource(u64 start, u64 size)
>  {
> @@ -491,7 +508,7 @@ int mem_online_node(int nid)
>  	pg_data_t	*pgdat;
>  	int	ret;
>  
> -	lock_system_sleep();
> +	lock_memory_hotplug();
>  	pgdat = hotadd_new_pgdat(nid, 0);
>  	if (pgdat) {
>  		ret = -ENOMEM;
> @@ -502,7 +519,7 @@ int mem_online_node(int nid)
>  	BUG_ON(ret);
>  
>  out:
> -	unlock_system_sleep();
> +	unlock_memory_hotplug();
>  	return ret;
>  }
>  
> @@ -514,7 +531,7 @@ int __ref add_memory(int nid, u64 start, u64 size)
>  	struct resource *res;
>  	int ret;
>  
> -	lock_system_sleep();
> +	lock_memory_hotplug();
>  
>  	res = register_memory_resource(start, size);
>  	ret = -EEXIST;
> @@ -561,7 +578,7 @@ error:
>  		release_memory_resource(res);
>  
>  out:
> -	unlock_system_sleep();
> +	unlock_memory_hotplug();
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(add_memory);
> @@ -789,7 +806,7 @@ static int offline_pages(unsigned long start_pfn,
>  	if (!test_pages_in_a_zone(start_pfn, end_pfn))
>  		return -EINVAL;
>  
> -	lock_system_sleep();
> +	lock_memory_hotplug();
>  
>  	zone = page_zone(pfn_to_page(start_pfn));
>  	node = zone_to_nid(zone);
> @@ -877,7 +894,7 @@ repeat:
>  	vm_total_pages = nr_free_pagecache_pages();
>  
>  	memory_notify(MEM_OFFLINE, &arg);
> -	unlock_system_sleep();
> +	unlock_memory_hotplug();
>  	return 0;
>  
>  failed_removal:
> @@ -888,7 +905,7 @@ failed_removal:
>  	undo_isolate_page_range(start_pfn, end_pfn);
>  
>  out:
> -	unlock_system_sleep();
> +	unlock_memory_hotplug();
>  	return ret;
>  }
>  
> -- 
> 1.6.5.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] ksm: annotate ksm_thread_mutex is no deadlock source
  2010-12-01  5:29       ` KOSAKI Motohiro
@ 2010-12-01 20:13         ` Hugh Dickins
  -1 siblings, 0 replies; 14+ messages in thread
From: Hugh Dickins @ 2010-12-01 20:13 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Andrea Arcangeli, Andi Kleen, LKML, linux-mm,
	KAMEZAWA Hiroyuki

On Wed, 1 Dec 2010, KOSAKI Motohiro wrote:
> 
> commit 62b61f611e(ksm: memory hotremove migration only) made following
> new lockdep warning.
> 
>   =======================================================
>   [ INFO: possible circular locking dependency detected ]
>   -------------------------------------------------------
>   bash/1621 is trying to acquire lock:
>    ((memory_chain).rwsem){.+.+.+}, at: [<ffffffff81079339>]
>   __blocking_notifier_call_chain+0x69/0xc0
> 
>   but task is already holding lock:
>    (ksm_thread_mutex){+.+.+.}, at: [<ffffffff8113a3aa>]
>   ksm_memory_callback+0x3a/0xc0
> 
>   which lock already depends on the new lock.
> 
>   the existing dependency chain (in reverse order) is:
> 
>   -> #1 (ksm_thread_mutex){+.+.+.}:
>        [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
>        [<ffffffff81505d74>] __mutex_lock_common+0x44/0x3f0
>        [<ffffffff81506228>] mutex_lock_nested+0x48/0x60
>        [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0
>        [<ffffffff8150c21c>] notifier_call_chain+0x8c/0xe0
>        [<ffffffff8107934e>] __blocking_notifier_call_chain+0x7e/0xc0
>        [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
>        [<ffffffff813afbfb>] memory_notify+0x1b/0x20
>        [<ffffffff81141b7c>] remove_memory+0x1cc/0x5f0
>        [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
>        [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
>        [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
>        [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
>        [<ffffffff8114f398>] vfs_write+0xc8/0x190
>        [<ffffffff8114fc14>] sys_write+0x54/0x90
>        [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b
> 
>   -> #0 ((memory_chain).rwsem){.+.+.+}:
>        [<ffffffff8108b5ba>] __lock_acquire+0x155a/0x1600
>        [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
>        [<ffffffff81506601>] down_read+0x51/0xa0
>        [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0
>        [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
>        [<ffffffff813afbfb>] memory_notify+0x1b/0x20
>        [<ffffffff81141f1e>] remove_memory+0x56e/0x5f0
>        [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
>        [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
>        [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
>        [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
>        [<ffffffff8114f398>] vfs_write+0xc8/0x190
>        [<ffffffff8114fc14>] sys_write+0x54/0x90
>        [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b
> 
> But it's false positive. Both memory_chain.rwsem and ksm_thread_mutex
> have outer lock (mem_hotplug_mutex). then, they can't make deadlock.
> 
> Thus, This patch annotate ksm_thread_mutex is not deadlock source.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Thank you (I assume it does the job, I've not yet checked).  My only
issue with this is that the comment you added below tells a different
story from the fuller comment you give above.  Maybe change it to:

	* mutex_lock_nested() is necessary because lockdep was alarmed that
	* here we take ksm_thread_mutex inside notifier chain mutex, and
	* later take notifier chain mutex inside ksm_thread_mutex to unlock
	* it: but that's safe because both are inside mem_hotplug_mutex.

Acked-by: Hugh Dickins <hughd@google.com>

> ---
>  mm/ksm.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 65ab5c7..5aa4900 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1724,8 +1724,10 @@ static int ksm_memory_callback(struct notifier_block *self,
>  		/*
>  		 * Keep it very simple for now: just lock out ksmd and
>  		 * MADV_UNMERGEABLE while any memory is going offline.
> +		 * Mutex_lock_nested() is necessary to tell that
> +		 * ksm_thread_mutex is not unlocked here intentionally.
>  		 */
> -		mutex_lock(&ksm_thread_mutex);
> +		mutex_lock_nested(&ksm_thread_mutex, SINGLE_DEPTH_NESTING);
>  		break;
>  
>  	case MEM_OFFLINE:
> -- 
> 1.6.5.2

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] ksm: annotate ksm_thread_mutex is no deadlock source
@ 2010-12-01 20:13         ` Hugh Dickins
  0 siblings, 0 replies; 14+ messages in thread
From: Hugh Dickins @ 2010-12-01 20:13 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Andrea Arcangeli, Andi Kleen, LKML, linux-mm,
	KAMEZAWA Hiroyuki

On Wed, 1 Dec 2010, KOSAKI Motohiro wrote:
> 
> commit 62b61f611e(ksm: memory hotremove migration only) made following
> new lockdep warning.
> 
>   =======================================================
>   [ INFO: possible circular locking dependency detected ]
>   -------------------------------------------------------
>   bash/1621 is trying to acquire lock:
>    ((memory_chain).rwsem){.+.+.+}, at: [<ffffffff81079339>]
>   __blocking_notifier_call_chain+0x69/0xc0
> 
>   but task is already holding lock:
>    (ksm_thread_mutex){+.+.+.}, at: [<ffffffff8113a3aa>]
>   ksm_memory_callback+0x3a/0xc0
> 
>   which lock already depends on the new lock.
> 
>   the existing dependency chain (in reverse order) is:
> 
>   -> #1 (ksm_thread_mutex){+.+.+.}:
>        [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
>        [<ffffffff81505d74>] __mutex_lock_common+0x44/0x3f0
>        [<ffffffff81506228>] mutex_lock_nested+0x48/0x60
>        [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0
>        [<ffffffff8150c21c>] notifier_call_chain+0x8c/0xe0
>        [<ffffffff8107934e>] __blocking_notifier_call_chain+0x7e/0xc0
>        [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
>        [<ffffffff813afbfb>] memory_notify+0x1b/0x20
>        [<ffffffff81141b7c>] remove_memory+0x1cc/0x5f0
>        [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
>        [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
>        [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
>        [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
>        [<ffffffff8114f398>] vfs_write+0xc8/0x190
>        [<ffffffff8114fc14>] sys_write+0x54/0x90
>        [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b
> 
>   -> #0 ((memory_chain).rwsem){.+.+.+}:
>        [<ffffffff8108b5ba>] __lock_acquire+0x155a/0x1600
>        [<ffffffff8108b70a>] lock_acquire+0xaa/0x140
>        [<ffffffff81506601>] down_read+0x51/0xa0
>        [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0
>        [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20
>        [<ffffffff813afbfb>] memory_notify+0x1b/0x20
>        [<ffffffff81141f1e>] remove_memory+0x56e/0x5f0
>        [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0
>        [<ffffffff813afd62>] store_mem_state+0xe2/0xf0
>        [<ffffffff813a0bb0>] sysdev_store+0x20/0x30
>        [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170
>        [<ffffffff8114f398>] vfs_write+0xc8/0x190
>        [<ffffffff8114fc14>] sys_write+0x54/0x90
>        [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b
> 
> But it's false positive. Both memory_chain.rwsem and ksm_thread_mutex
> have outer lock (mem_hotplug_mutex). then, they can't make deadlock.
> 
> Thus, This patch annotate ksm_thread_mutex is not deadlock source.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Thank you (I assume it does the job, I've not yet checked).  My only
issue with this is that the comment you added below tells a different
story from the fuller comment you give above.  Maybe change it to:

	* mutex_lock_nested() is necessary because lockdep was alarmed that
	* here we take ksm_thread_mutex inside notifier chain mutex, and
	* later take notifier chain mutex inside ksm_thread_mutex to unlock
	* it: but that's safe because both are inside mem_hotplug_mutex.

Acked-by: Hugh Dickins <hughd@google.com>

> ---
>  mm/ksm.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 65ab5c7..5aa4900 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1724,8 +1724,10 @@ static int ksm_memory_callback(struct notifier_block *self,
>  		/*
>  		 * Keep it very simple for now: just lock out ksmd and
>  		 * MADV_UNMERGEABLE while any memory is going offline.
> +		 * Mutex_lock_nested() is necessary to tell that
> +		 * ksm_thread_mutex is not unlocked here intentionally.
>  		 */
> -		mutex_lock(&ksm_thread_mutex);
> +		mutex_lock_nested(&ksm_thread_mutex, SINGLE_DEPTH_NESTING);
>  		break;
>  
>  	case MEM_OFFLINE:
> -- 
> 1.6.5.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-12-01 20:13 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-25 10:49 mem-hotplug + ksm make lockdep warning KOSAKI Motohiro
2010-10-25 10:49 ` KOSAKI Motohiro
2010-10-26  7:10 ` Hugh Dickins
2010-10-26  7:10   ` Hugh Dickins
2010-10-26  8:07   ` KOSAKI Motohiro
2010-10-26  8:07     ` KOSAKI Motohiro
2010-12-01  5:28     ` [PATCH 1/2] mem-hotplug: Introduce {un}lock_memory_hotplug() KOSAKI Motohiro
2010-12-01  5:28       ` KOSAKI Motohiro
2010-12-01 19:57       ` Hugh Dickins
2010-12-01 19:57         ` Hugh Dickins
2010-12-01  5:29     ` [PATCH 2/2] ksm: annotate ksm_thread_mutex is no deadlock source KOSAKI Motohiro
2010-12-01  5:29       ` KOSAKI Motohiro
2010-12-01 20:13       ` Hugh Dickins
2010-12-01 20:13         ` Hugh Dickins

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.