All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
@ 2010-11-21 11:26 Arun Bhanu
  2010-11-21 13:30   ` Ted Ts'o
  0 siblings, 1 reply; 37+ messages in thread
From: Arun Bhanu @ 2010-11-21 11:26 UTC (permalink / raw)
  To: linux-ext4, linux-kernel
  Cc: Theodore Ts'o, Andreas Dilger, Paul McKenney, Eric Sandeen

I saw this in kernel log messages while testing 2.6.37-rc2. I think it
appeared while mounting an external hard-disk. I can't seem to
reproduce it.

Please let me know if you need more info.

[47064.272151] ===================================================
[47064.273474] [ INFO: suspicious rcu_dereference_check() usage. ]
[47064.273956] ---------------------------------------------------
[47064.274431] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
[47064.274905] 
[47064.274906] other info that might help us debug this:
[47064.274907] 
[47064.276303] 
[47064.276303] rcu_scheduler_active = 1, debug_locks = 0
[47064.277202] 2 locks held by mount/21199:
[47064.277635]  #0:  (&type->s_umount_key#20/1){+.+.+.}, at: [<c05007f0>] sget+0x21f/0x35d
[47064.278078]  #1:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c04f5e67>] migrate_page_move_mapping+0x3a/0x120
[47064.278529] 
[47064.278529] stack backtrace:
[47064.279409] Pid: 21199, comm: mount Not tainted 2.6.37-rc2-ab2-589136bfa784a4558b397f017ca2f06f0ca9080e+ #1
[47064.279864] Call Trace:
[47064.280313]  [<c0822e61>] ? printk+0x2d/0x34
[47064.280765]  [<c04709c8>] lockdep_rcu_dereference+0x97/0x9f
[47064.281220]  [<c04f5e28>] radix_tree_deref_slot+0x4a/0x4f
[47064.281680]  [<c04f5ea7>] migrate_page_move_mapping+0x7a/0x120
[47064.282129]  [<c04f6323>] migrate_page+0x1f/0x35
[47064.282573]  [<c04f6464>] move_to_new_page+0x12b/0x164
[47064.283017]  [<c04f6773>] migrate_pages+0x1e1/0x2ee
[47064.283463]  [<c04eed89>] ? compaction_alloc+0x0/0x1ef
[47064.283918]  [<c04eebdc>] compact_zone+0x24f/0x3fc
[47064.284363]  [<c04ef0f4>] try_to_compact_pages+0x17c/0x1e6
[47064.284820]  [<c04cac33>] __alloc_pages_nodemask+0x397/0x6b7
[47064.285276]  [<c04caf75>] __get_free_pages+0x22/0x33
[47064.285729]  [<c04f4304>] __kmalloc+0x2f/0x112
[47064.286193]  [<c0435adc>] ? should_resched+0xd/0x28
[47064.286655]  [<c0578ca9>] kzalloc.clone.58+0x12/0x14
[47064.287118]  [<c057c7f1>] ext4_fill_super+0x1090/0x2521
[47064.287573]  [<c040ea02>] ? native_sched_clock+0x14/0x52
[47064.288029]  [<c054774a>] ? disk_name+0x86/0x90
[47064.288477]  [<c0523bfa>] ? set_blocksize+0x33/0x78
[47064.288930]  [<c0500aed>] mount_bdev+0x123/0x16d
[47064.289385]  [<c057b761>] ? ext4_fill_super+0x0/0x2521
[47064.289842]  [<c05776fd>] ? ext4_mount+0x0/0x24
[47064.290300]  [<c057771c>] ext4_mount+0x1f/0x24
[47064.290760]  [<c057b761>] ? ext4_fill_super+0x0/0x2521
[47064.291222]  [<c05003e1>] vfs_kern_mount+0xa1/0x1ad
[47064.291687]  [<c0511c36>] ? get_fs_type+0x38/0x91
[47064.292149]  [<c0500546>] do_kern_mount+0x3d/0xc8
[47064.292615]  [<c05142b2>] do_mount+0x614/0x640
[47064.293056]  [<c04d69f0>] ? strndup_user+0x2e/0x3f
[47064.293489]  [<c05144a1>] sys_mount+0x6d/0x99
[47064.293925]  [<c040971f>] sysenter_do_call+0x12/0x38
[47064.368116] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)

-Arun

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
  2010-11-21 11:26 [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage Arun Bhanu
  2010-11-21 13:30   ` Ted Ts'o
@ 2010-11-21 13:30   ` Ted Ts'o
  0 siblings, 0 replies; 37+ messages in thread
From: Ted Ts'o @ 2010-11-21 13:30 UTC (permalink / raw)
  To: linux-ext4, linux-mm, linux-kernel, Andreas Dilger,
	Paul McKenney, Eric Sandeen

On Sun, Nov 21, 2010 at 07:26:11PM +0800, Arun Bhanu wrote:
> I saw this in kernel log messages while testing 2.6.37-rc2. I think it
> appeared while mounting an external hard-disk. I can't seem to
> reproduce it.

I could be wrong but this looks like it's a bug in mm/migrate.c in
migrate_page_move_mapping(): it is calling radix_tree_lookup_slot()
without first taking an rcu_read_lock().

It was triggered by a memory allocation out of ext4_fill_super(),
which then triggered a memory compaction/migration, but I don't
believe it's otherwise related to the ext4 code.

Over to the linux-mm folks for confirmation...

					- Ted

> Please let me know if you need more info.
> 
> [47064.272151] ===================================================
> [47064.273474] [ INFO: suspicious rcu_dereference_check() usage. ]
> [47064.273956] ---------------------------------------------------
> [47064.274431] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> [47064.274905] 
> [47064.274906] other info that might help us debug this:
> [47064.274907] 
> [47064.276303] 
> [47064.276303] rcu_scheduler_active = 1, debug_locks = 0
> [47064.277202] 2 locks held by mount/21199:
> [47064.277635]  #0:  (&type->s_umount_key#20/1){+.+.+.}, at: [<c05007f0>] sget+0x21f/0x35d
> [47064.278078]  #1:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c04f5e67>] migrate_page_move_mapping+0x3a/0x120
> [47064.278529] 
> [47064.278529] stack backtrace:
> [47064.279409] Pid: 21199, comm: mount Not tainted 2.6.37-rc2-ab2-589136bfa784a4558b397f017ca2f06f0ca9080e+ #1
> [47064.279864] Call Trace:
> [47064.280313]  [<c0822e61>] ? printk+0x2d/0x34
> [47064.280765]  [<c04709c8>] lockdep_rcu_dereference+0x97/0x9f
> [47064.281220]  [<c04f5e28>] radix_tree_deref_slot+0x4a/0x4f
> [47064.281680]  [<c04f5ea7>] migrate_page_move_mapping+0x7a/0x120
> [47064.282129]  [<c04f6323>] migrate_page+0x1f/0x35
> [47064.282573]  [<c04f6464>] move_to_new_page+0x12b/0x164
> [47064.283017]  [<c04f6773>] migrate_pages+0x1e1/0x2ee
> [47064.283463]  [<c04eed89>] ? compaction_alloc+0x0/0x1ef
> [47064.283918]  [<c04eebdc>] compact_zone+0x24f/0x3fc
> [47064.284363]  [<c04ef0f4>] try_to_compact_pages+0x17c/0x1e6
> [47064.284820]  [<c04cac33>] __alloc_pages_nodemask+0x397/0x6b7
> [47064.285276]  [<c04caf75>] __get_free_pages+0x22/0x33
> [47064.285729]  [<c04f4304>] __kmalloc+0x2f/0x112
> [47064.286193]  [<c0435adc>] ? should_resched+0xd/0x28
> [47064.286655]  [<c0578ca9>] kzalloc.clone.58+0x12/0x14
> [47064.287118]  [<c057c7f1>] ext4_fill_super+0x1090/0x2521
> [47064.287573]  [<c040ea02>] ? native_sched_clock+0x14/0x52
> [47064.288029]  [<c054774a>] ? disk_name+0x86/0x90
> [47064.288477]  [<c0523bfa>] ? set_blocksize+0x33/0x78
> [47064.288930]  [<c0500aed>] mount_bdev+0x123/0x16d
> [47064.289385]  [<c057b761>] ? ext4_fill_super+0x0/0x2521
> [47064.289842]  [<c05776fd>] ? ext4_mount+0x0/0x24
> [47064.290300]  [<c057771c>] ext4_mount+0x1f/0x24
> [47064.290760]  [<c057b761>] ? ext4_fill_super+0x0/0x2521
> [47064.291222]  [<c05003e1>] vfs_kern_mount+0xa1/0x1ad
> [47064.291687]  [<c0511c36>] ? get_fs_type+0x38/0x91
> [47064.292149]  [<c0500546>] do_kern_mount+0x3d/0xc8
> [47064.292615]  [<c05142b2>] do_mount+0x614/0x640
> [47064.293056]  [<c04d69f0>] ? strndup_user+0x2e/0x3f
> [47064.293489]  [<c05144a1>] sys_mount+0x6d/0x99
> [47064.293925]  [<c040971f>] sysenter_do_call+0x12/0x38
> [47064.368116] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
> 
> -Arun

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
@ 2010-11-21 13:30   ` Ted Ts'o
  0 siblings, 0 replies; 37+ messages in thread
From: Ted Ts'o @ 2010-11-21 13:30 UTC (permalink / raw)
  To: linux-ext4, linux-mm, linux-kernel, Andreas Dilger,
	Paul McKenney, Eric Sandeen

On Sun, Nov 21, 2010 at 07:26:11PM +0800, Arun Bhanu wrote:
> I saw this in kernel log messages while testing 2.6.37-rc2. I think it
> appeared while mounting an external hard-disk. I can't seem to
> reproduce it.

I could be wrong but this looks like it's a bug in mm/migrate.c in
migrate_page_move_mapping(): it is calling radix_tree_lookup_slot()
without first taking an rcu_read_lock().

It was triggered by a memory allocation out of ext4_fill_super(),
which then triggered a memory compaction/migration, but I don't
believe it's otherwise related to the ext4 code.

Over to the linux-mm folks for confirmation...

					- Ted

> Please let me know if you need more info.
> 
> [47064.272151] ===================================================
> [47064.273474] [ INFO: suspicious rcu_dereference_check() usage. ]
> [47064.273956] ---------------------------------------------------
> [47064.274431] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> [47064.274905] 
> [47064.274906] other info that might help us debug this:
> [47064.274907] 
> [47064.276303] 
> [47064.276303] rcu_scheduler_active = 1, debug_locks = 0
> [47064.277202] 2 locks held by mount/21199:
> [47064.277635]  #0:  (&type->s_umount_key#20/1){+.+.+.}, at: [<c05007f0>] sget+0x21f/0x35d
> [47064.278078]  #1:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c04f5e67>] migrate_page_move_mapping+0x3a/0x120
> [47064.278529] 
> [47064.278529] stack backtrace:
> [47064.279409] Pid: 21199, comm: mount Not tainted 2.6.37-rc2-ab2-589136bfa784a4558b397f017ca2f06f0ca9080e+ #1
> [47064.279864] Call Trace:
> [47064.280313]  [<c0822e61>] ? printk+0x2d/0x34
> [47064.280765]  [<c04709c8>] lockdep_rcu_dereference+0x97/0x9f
> [47064.281220]  [<c04f5e28>] radix_tree_deref_slot+0x4a/0x4f
> [47064.281680]  [<c04f5ea7>] migrate_page_move_mapping+0x7a/0x120
> [47064.282129]  [<c04f6323>] migrate_page+0x1f/0x35
> [47064.282573]  [<c04f6464>] move_to_new_page+0x12b/0x164
> [47064.283017]  [<c04f6773>] migrate_pages+0x1e1/0x2ee
> [47064.283463]  [<c04eed89>] ? compaction_alloc+0x0/0x1ef
> [47064.283918]  [<c04eebdc>] compact_zone+0x24f/0x3fc
> [47064.284363]  [<c04ef0f4>] try_to_compact_pages+0x17c/0x1e6
> [47064.284820]  [<c04cac33>] __alloc_pages_nodemask+0x397/0x6b7
> [47064.285276]  [<c04caf75>] __get_free_pages+0x22/0x33
> [47064.285729]  [<c04f4304>] __kmalloc+0x2f/0x112
> [47064.286193]  [<c0435adc>] ? should_resched+0xd/0x28
> [47064.286655]  [<c0578ca9>] kzalloc.clone.58+0x12/0x14
> [47064.287118]  [<c057c7f1>] ext4_fill_super+0x1090/0x2521
> [47064.287573]  [<c040ea02>] ? native_sched_clock+0x14/0x52
> [47064.288029]  [<c054774a>] ? disk_name+0x86/0x90
> [47064.288477]  [<c0523bfa>] ? set_blocksize+0x33/0x78
> [47064.288930]  [<c0500aed>] mount_bdev+0x123/0x16d
> [47064.289385]  [<c057b761>] ? ext4_fill_super+0x0/0x2521
> [47064.289842]  [<c05776fd>] ? ext4_mount+0x0/0x24
> [47064.290300]  [<c057771c>] ext4_mount+0x1f/0x24
> [47064.290760]  [<c057b761>] ? ext4_fill_super+0x0/0x2521
> [47064.291222]  [<c05003e1>] vfs_kern_mount+0xa1/0x1ad
> [47064.291687]  [<c0511c36>] ? get_fs_type+0x38/0x91
> [47064.292149]  [<c0500546>] do_kern_mount+0x3d/0xc8
> [47064.292615]  [<c05142b2>] do_mount+0x614/0x640
> [47064.293056]  [<c04d69f0>] ? strndup_user+0x2e/0x3f
> [47064.293489]  [<c05144a1>] sys_mount+0x6d/0x99
> [47064.293925]  [<c040971f>] sysenter_do_call+0x12/0x38
> [47064.368116] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
> 
> -Arun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
@ 2010-11-21 13:30   ` Ted Ts'o
  0 siblings, 0 replies; 37+ messages in thread
From: Ted Ts'o @ 2010-11-21 13:30 UTC (permalink / raw)
  To: linux-ext4, linux-mm, linux-kernel, Andreas Dilger,
	Paul McKenney, Eric Sandeen

On Sun, Nov 21, 2010 at 07:26:11PM +0800, Arun Bhanu wrote:
> I saw this in kernel log messages while testing 2.6.37-rc2. I think it
> appeared while mounting an external hard-disk. I can't seem to
> reproduce it.

I could be wrong but this looks like it's a bug in mm/migrate.c in
migrate_page_move_mapping(): it is calling radix_tree_lookup_slot()
without first taking an rcu_read_lock().

It was triggered by a memory allocation out of ext4_fill_super(),
which then triggered a memory compaction/migration, but I don't
believe it's otherwise related to the ext4 code.

Over to the linux-mm folks for confirmation...

					- Ted

> Please let me know if you need more info.
> 
> [47064.272151] ===================================================
> [47064.273474] [ INFO: suspicious rcu_dereference_check() usage. ]
> [47064.273956] ---------------------------------------------------
> [47064.274431] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> [47064.274905] 
> [47064.274906] other info that might help us debug this:
> [47064.274907] 
> [47064.276303] 
> [47064.276303] rcu_scheduler_active = 1, debug_locks = 0
> [47064.277202] 2 locks held by mount/21199:
> [47064.277635]  #0:  (&type->s_umount_key#20/1){+.+.+.}, at: [<c05007f0>] sget+0x21f/0x35d
> [47064.278078]  #1:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c04f5e67>] migrate_page_move_mapping+0x3a/0x120
> [47064.278529] 
> [47064.278529] stack backtrace:
> [47064.279409] Pid: 21199, comm: mount Not tainted 2.6.37-rc2-ab2-589136bfa784a4558b397f017ca2f06f0ca9080e+ #1
> [47064.279864] Call Trace:
> [47064.280313]  [<c0822e61>] ? printk+0x2d/0x34
> [47064.280765]  [<c04709c8>] lockdep_rcu_dereference+0x97/0x9f
> [47064.281220]  [<c04f5e28>] radix_tree_deref_slot+0x4a/0x4f
> [47064.281680]  [<c04f5ea7>] migrate_page_move_mapping+0x7a/0x120
> [47064.282129]  [<c04f6323>] migrate_page+0x1f/0x35
> [47064.282573]  [<c04f6464>] move_to_new_page+0x12b/0x164
> [47064.283017]  [<c04f6773>] migrate_pages+0x1e1/0x2ee
> [47064.283463]  [<c04eed89>] ? compaction_alloc+0x0/0x1ef
> [47064.283918]  [<c04eebdc>] compact_zone+0x24f/0x3fc
> [47064.284363]  [<c04ef0f4>] try_to_compact_pages+0x17c/0x1e6
> [47064.284820]  [<c04cac33>] __alloc_pages_nodemask+0x397/0x6b7
> [47064.285276]  [<c04caf75>] __get_free_pages+0x22/0x33
> [47064.285729]  [<c04f4304>] __kmalloc+0x2f/0x112
> [47064.286193]  [<c0435adc>] ? should_resched+0xd/0x28
> [47064.286655]  [<c0578ca9>] kzalloc.clone.58+0x12/0x14
> [47064.287118]  [<c057c7f1>] ext4_fill_super+0x1090/0x2521
> [47064.287573]  [<c040ea02>] ? native_sched_clock+0x14/0x52
> [47064.288029]  [<c054774a>] ? disk_name+0x86/0x90
> [47064.288477]  [<c0523bfa>] ? set_blocksize+0x33/0x78
> [47064.288930]  [<c0500aed>] mount_bdev+0x123/0x16d
> [47064.289385]  [<c057b761>] ? ext4_fill_super+0x0/0x2521
> [47064.289842]  [<c05776fd>] ? ext4_mount+0x0/0x24
> [47064.290300]  [<c057771c>] ext4_mount+0x1f/0x24
> [47064.290760]  [<c057b761>] ? ext4_fill_super+0x0/0x2521
> [47064.291222]  [<c05003e1>] vfs_kern_mount+0xa1/0x1ad
> [47064.291687]  [<c0511c36>] ? get_fs_type+0x38/0x91
> [47064.292149]  [<c0500546>] do_kern_mount+0x3d/0xc8
> [47064.292615]  [<c05142b2>] do_mount+0x614/0x640
> [47064.293056]  [<c04d69f0>] ? strndup_user+0x2e/0x3f
> [47064.293489]  [<c05144a1>] sys_mount+0x6d/0x99
> [47064.293925]  [<c040971f>] sysenter_do_call+0x12/0x38
> [47064.368116] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
> 
> -Arun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
  2010-11-21 13:30   ` Ted Ts'o
  (?)
@ 2010-11-21 15:39     ` Minchan Kim
  -1 siblings, 0 replies; 37+ messages in thread
From: Minchan Kim @ 2010-11-21 15:39 UTC (permalink / raw)
  To: Ted Ts'o, linux-ext4, linux-mm, linux-kernel, Andreas Dilger,
	Paul McKenney, Eric Sandeen

On Sun, Nov 21, 2010 at 08:30:24AM -0500, Ted Ts'o wrote:
> On Sun, Nov 21, 2010 at 07:26:11PM +0800, Arun Bhanu wrote:
> > I saw this in kernel log messages while testing 2.6.37-rc2. I think it
> > appeared while mounting an external hard-disk. I can't seem to
> > reproduce it.
> 
> I could be wrong but this looks like it's a bug in mm/migrate.c in
> migrate_page_move_mapping(): it is calling radix_tree_lookup_slot()
> without first taking an rcu_read_lock().
> 
> It was triggered by a memory allocation out of ext4_fill_super(),
> which then triggered a memory compaction/migration, but I don't
> believe it's otherwise related to the ext4 code.
> 
> Over to the linux-mm folks for confirmation...

I think it's no problem. 

That's because migration always holds lock_page on the file page.
So the page couldn't remove from radix. 


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
@ 2010-11-21 15:39     ` Minchan Kim
  0 siblings, 0 replies; 37+ messages in thread
From: Minchan Kim @ 2010-11-21 15:39 UTC (permalink / raw)
  To: Ted Ts'o, linux-ext4, linux-mm, linux-kernel, Andreas Dilger,
	Paul McKenney

On Sun, Nov 21, 2010 at 08:30:24AM -0500, Ted Ts'o wrote:
> On Sun, Nov 21, 2010 at 07:26:11PM +0800, Arun Bhanu wrote:
> > I saw this in kernel log messages while testing 2.6.37-rc2. I think it
> > appeared while mounting an external hard-disk. I can't seem to
> > reproduce it.
> 
> I could be wrong but this looks like it's a bug in mm/migrate.c in
> migrate_page_move_mapping(): it is calling radix_tree_lookup_slot()
> without first taking an rcu_read_lock().
> 
> It was triggered by a memory allocation out of ext4_fill_super(),
> which then triggered a memory compaction/migration, but I don't
> believe it's otherwise related to the ext4 code.
> 
> Over to the linux-mm folks for confirmation...

I think it's no problem. 

That's because migration always holds lock_page on the file page.
So the page couldn't remove from radix. 


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
@ 2010-11-21 15:39     ` Minchan Kim
  0 siblings, 0 replies; 37+ messages in thread
From: Minchan Kim @ 2010-11-21 15:39 UTC (permalink / raw)
  To: Ted Ts'o, linux-ext4, linux-mm, linux-kernel, Andreas Dilger,
	Paul McKenney, Eric Sandeen

On Sun, Nov 21, 2010 at 08:30:24AM -0500, Ted Ts'o wrote:
> On Sun, Nov 21, 2010 at 07:26:11PM +0800, Arun Bhanu wrote:
> > I saw this in kernel log messages while testing 2.6.37-rc2. I think it
> > appeared while mounting an external hard-disk. I can't seem to
> > reproduce it.
> 
> I could be wrong but this looks like it's a bug in mm/migrate.c in
> migrate_page_move_mapping(): it is calling radix_tree_lookup_slot()
> without first taking an rcu_read_lock().
> 
> It was triggered by a memory allocation out of ext4_fill_super(),
> which then triggered a memory compaction/migration, but I don't
> believe it's otherwise related to the ext4 code.
> 
> Over to the linux-mm folks for confirmation...

I think it's no problem. 

That's because migration always holds lock_page on the file page.
So the page couldn't remove from radix. 


-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
  2010-11-21 15:39     ` Minchan Kim
@ 2010-11-21 17:37       ` Ted Ts'o
  -1 siblings, 0 replies; 37+ messages in thread
From: Ted Ts'o @ 2010-11-21 17:37 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-ext4, linux-mm, linux-kernel, Andreas Dilger,
	Paul McKenney, Eric Sandeen

On Mon, Nov 22, 2010 at 12:39:49AM +0900, Minchan Kim wrote:
> 
> I think it's no problem. 
> 
> That's because migration always holds lock_page on the file page.
> So the page couldn't remove from radix. 

It may be "ok" in that it won't cause a race, but it still leaves an
unsightly warning if LOCKDEP is enabled, and LOCKDEP warnings will
cause /proc_lock_stat to be disabled.  So I think it still needs to be
fixed by adding rcu_read_lock()/rcu_read_unlock() to
migrate_page_move_mapping().

      	 					     - Ted

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
@ 2010-11-21 17:37       ` Ted Ts'o
  0 siblings, 0 replies; 37+ messages in thread
From: Ted Ts'o @ 2010-11-21 17:37 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-ext4, linux-mm, linux-kernel, Andreas Dilger,
	Paul McKenney, Eric Sandeen

On Mon, Nov 22, 2010 at 12:39:49AM +0900, Minchan Kim wrote:
> 
> I think it's no problem. 
> 
> That's because migration always holds lock_page on the file page.
> So the page couldn't remove from radix. 

It may be "ok" in that it won't cause a race, but it still leaves an
unsightly warning if LOCKDEP is enabled, and LOCKDEP warnings will
cause /proc_lock_stat to be disabled.  So I think it still needs to be
fixed by adding rcu_read_lock()/rcu_read_unlock() to
migrate_page_move_mapping().

      	 					     - Ted

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
  2010-11-21 17:37       ` Ted Ts'o
  (?)
@ 2010-11-22  0:38         ` Minchan Kim
  -1 siblings, 0 replies; 37+ messages in thread
From: Minchan Kim @ 2010-11-22  0:38 UTC (permalink / raw)
  To: Ted Ts'o, Minchan Kim, linux-ext4, linux-mm, linux-kernel,
	Andreas Dilger, Paul McKenney, Eric Sandeen

On Mon, Nov 22, 2010 at 2:37 AM, Ted Ts'o <tytso@mit.edu> wrote:
> On Mon, Nov 22, 2010 at 12:39:49AM +0900, Minchan Kim wrote:
>>
>> I think it's no problem.
>>
>> That's because migration always holds lock_page on the file page.
>> So the page couldn't remove from radix.
>
> It may be "ok" in that it won't cause a race, but it still leaves an
> unsightly warning if LOCKDEP is enabled, and LOCKDEP warnings will
> cause /proc_lock_stat to be disabled.  So I think it still needs to be
> fixed by adding rcu_read_lock()/rcu_read_unlock() to
> migrate_page_move_mapping().
>
>                                                     - Ted
>

Yes. if it is really "ok" about race, we will add rcu_read_lock with
below comment to prevent false positive.
"suppress RCU lockdep false positives".
But I am not sure it's good although rcu_read_lock is little cost.
Whenever we find false positive, should we add rcu_read_lock to
suppress although it's no problem in real product?
Couldn't we provide following function? (or we might have already it
but I missed it. )

/*
 * Suppress RCU lockdep false positive.
 */
#ifdef CONFIG_PROVE_RCU
#define rcu_read_lock_suppress rcu_read_lock
#else
#define rcu_read_lock_suppress
#endif


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
@ 2010-11-22  0:38         ` Minchan Kim
  0 siblings, 0 replies; 37+ messages in thread
From: Minchan Kim @ 2010-11-22  0:38 UTC (permalink / raw)
  To: Ted Ts'o, Minchan Kim, linux-ext4, linux-mm, linux-kernel,
	Andreas Dilger

On Mon, Nov 22, 2010 at 2:37 AM, Ted Ts'o <tytso@mit.edu> wrote:
> On Mon, Nov 22, 2010 at 12:39:49AM +0900, Minchan Kim wrote:
>>
>> I think it's no problem.
>>
>> That's because migration always holds lock_page on the file page.
>> So the page couldn't remove from radix.
>
> It may be "ok" in that it won't cause a race, but it still leaves an
> unsightly warning if LOCKDEP is enabled, and LOCKDEP warnings will
> cause /proc_lock_stat to be disabled.  So I think it still needs to be
> fixed by adding rcu_read_lock()/rcu_read_unlock() to
> migrate_page_move_mapping().
>
>                                                     - Ted
>

Yes. if it is really "ok" about race, we will add rcu_read_lock with
below comment to prevent false positive.
"suppress RCU lockdep false positives".
But I am not sure it's good although rcu_read_lock is little cost.
Whenever we find false positive, should we add rcu_read_lock to
suppress although it's no problem in real product?
Couldn't we provide following function? (or we might have already it
but I missed it. )

/*
 * Suppress RCU lockdep false positive.
 */
#ifdef CONFIG_PROVE_RCU
#define rcu_read_lock_suppress rcu_read_lock
#else
#define rcu_read_lock_suppress
#endif


-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
@ 2010-11-22  0:38         ` Minchan Kim
  0 siblings, 0 replies; 37+ messages in thread
From: Minchan Kim @ 2010-11-22  0:38 UTC (permalink / raw)
  To: Ted Ts'o, Minchan Kim, linux-ext4, linux-mm, linux-kernel,
	Andreas Dilger, Paul McKenney, Eric Sandeen

On Mon, Nov 22, 2010 at 2:37 AM, Ted Ts'o <tytso@mit.edu> wrote:
> On Mon, Nov 22, 2010 at 12:39:49AM +0900, Minchan Kim wrote:
>>
>> I think it's no problem.
>>
>> That's because migration always holds lock_page on the file page.
>> So the page couldn't remove from radix.
>
> It may be "ok" in that it won't cause a race, but it still leaves an
> unsightly warning if LOCKDEP is enabled, and LOCKDEP warnings will
> cause /proc_lock_stat to be disabled.  So I think it still needs to be
> fixed by adding rcu_read_lock()/rcu_read_unlock() to
> migrate_page_move_mapping().
>
>                                                     - Ted
>

Yes. if it is really "ok" about race, we will add rcu_read_lock with
below comment to prevent false positive.
"suppress RCU lockdep false positives".
But I am not sure it's good although rcu_read_lock is little cost.
Whenever we find false positive, should we add rcu_read_lock to
suppress although it's no problem in real product?
Couldn't we provide following function? (or we might have already it
but I missed it. )

/*
 * Suppress RCU lockdep false positive.
 */
#ifdef CONFIG_PROVE_RCU
#define rcu_read_lock_suppress rcu_read_lock
#else
#define rcu_read_lock_suppress
#endif


-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
  2010-11-22  0:38         ` Minchan Kim
  (?)
  (?)
@ 2010-11-22  3:31         ` Milton Miller
  2010-11-22  6:16           ` Paul E. McKenney
  -1 siblings, 1 reply; 37+ messages in thread
From: Milton Miller @ 2010-11-22  3:31 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu,
	Paul E. McKenney

On 2010-11-22 at around 0:38:49, Minchan Kim wrote:
> On Mon, Nov 22, 2010 at 2:37 AM, Ted Ts'o <tytso@mit.edu> wrote:
> > On Mon, Nov 22, 2010 at 12:39:49AM +0900, Minchan Kim wrote:
> > >
> > > I think it's no problem.
> > >
> > > That's because migration always holds lock_page on the file page.
> > > So the page couldn't remove from radix.
> >
> > It may be "ok" in that it won't cause a race, but it still leaves an
> > unsightly warning if LOCKDEP is enabled, and LOCKDEP warnings will
> > cause /proc_lock_stat to be disabled.  So I think it still needs to be
> > fixed by adding rcu_read_lock()/rcu_read_unlock() to
> > migrate_page_move_mapping().
> >
> >                                                     - Ted
> >
> 
> Yes. if it is really "ok" about race, we will add rcu_read_lock with
> below comment to prevent false positive.
> "suppress RCU lockdep false positives".
> But I am not sure it's good although rcu_read_lock is little cost.
> Whenever we find false positive, should we add rcu_read_lock to
> suppress although it's no problem in real product?
> Couldn't we provide following function? (or we might have already it
> but I missed it. )
> 
> /*
>  * Suppress RCU lockdep false positive.
>  */
> #ifdef CONFIG_PROVE_RCU
> #define rcu_read_lock_suppress rcu_read_lock
> #else
> #define rcu_read_lock_suppress
> #endif

No, you don't need anything like this, as rcu_dereference_check already
takes a test for alternate locking.

However, looking more closely at the code, it appears this is the
"the tree is write locked" case as described in radix-tree.h

Looking at rcupdate.h, perhaps we need a version of radix_tree_deref_slot
that uses rcu_dereference_protected?

Copying Paul McKenney for rcu ...

milton

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
  2010-11-22  3:31         ` Milton Miller
@ 2010-11-22  6:16           ` Paul E. McKenney
  2010-12-07 19:01             ` [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! Gerald Schaefer
  0 siblings, 1 reply; 37+ messages in thread
From: Paul E. McKenney @ 2010-11-22  6:16 UTC (permalink / raw)
  To: Milton Miller
  Cc: Minchan Kim, linux-kernel, linux-mm, linux-ext4, Ted Ts'o,
	Arun Bhanu

On Sun, Nov 21, 2010 at 09:31:14PM -0600, Milton Miller wrote:
> On 2010-11-22 at around 0:38:49, Minchan Kim wrote:
> > On Mon, Nov 22, 2010 at 2:37 AM, Ted Ts'o <tytso@mit.edu> wrote:
> > > On Mon, Nov 22, 2010 at 12:39:49AM +0900, Minchan Kim wrote:
> > > >
> > > > I think it's no problem.
> > > >
> > > > That's because migration always holds lock_page on the file page.
> > > > So the page couldn't remove from radix.
> > >
> > > It may be "ok" in that it won't cause a race, but it still leaves an
> > > unsightly warning if LOCKDEP is enabled, and LOCKDEP warnings will
> > > cause /proc_lock_stat to be disabled.  So I think it still needs to be
> > > fixed by adding rcu_read_lock()/rcu_read_unlock() to
> > > migrate_page_move_mapping().
> > >
> > >                                                     - Ted
> > >
> > 
> > Yes. if it is really "ok" about race, we will add rcu_read_lock with
> > below comment to prevent false positive.
> > "suppress RCU lockdep false positives".
> > But I am not sure it's good although rcu_read_lock is little cost.
> > Whenever we find false positive, should we add rcu_read_lock to
> > suppress although it's no problem in real product?
> > Couldn't we provide following function? (or we might have already it
> > but I missed it. )
> > 
> > /*
> >  * Suppress RCU lockdep false positive.
> >  */
> > #ifdef CONFIG_PROVE_RCU
> > #define rcu_read_lock_suppress rcu_read_lock
> > #else
> > #define rcu_read_lock_suppress
> > #endif
> 
> No, you don't need anything like this, as rcu_dereference_check already
> takes a test for alternate locking.
> 
> However, looking more closely at the code, it appears this is the
> "the tree is write locked" case as described in radix-tree.h
> 
> Looking at rcupdate.h, perhaps we need a version of radix_tree_deref_slot
> that uses rcu_dereference_protected?
> 
> Copying Paul McKenney for rcu ...

This approach could work.  One way of doing it would be to add a second
argument:

static inline void *radix_tree_deref_slot_check(void **pslot, int ldc)
{
	void *ret = rcu_dereference_check(*pslot, ldc);
	if (unlikely(radix_tree_is_indirect_ptr(ret)))
		ret = RADIX_TREE_RETRY;
	return ret;
}

static inline void *radix_tree_deref_slot(void **pslot)
{
	return radix_tree_deref_slot_check(pslot, rcu_read_lock_held());
}

Another alternative would have radix_tree_deref_slot() pass "1" into
the "ldc" argument, which reduces splats but at the expense of failing
to detect problems.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
  2010-11-21 17:37       ` Ted Ts'o
  (?)
@ 2010-11-23  7:16         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 37+ messages in thread
From: KOSAKI Motohiro @ 2010-11-23  7:16 UTC (permalink / raw)
  To: Ted Ts'o, Minchan Kim, linux-ext4, linux-mm, linux-kernel,
	Andreas Dilger, Paul McKenney, Eric Sandeen
  Cc: kosaki.motohiro

> On Mon, Nov 22, 2010 at 12:39:49AM +0900, Minchan Kim wrote:
> > 
> > I think it's no problem. 
> > 
> > That's because migration always holds lock_page on the file page.
> > So the page couldn't remove from radix. 
> 
> It may be "ok" in that it won't cause a race, but it still leaves an
> unsightly warning if LOCKDEP is enabled, and LOCKDEP warnings will
> cause /proc_lock_stat to be disabled.  So I think it still needs to be
> fixed by adding rcu_read_lock()/rcu_read_unlock() to
> migrate_page_move_mapping().

Hi Ted,

Current mmotm has following patch and I think it should be fixed your
issue.

Thanks.




From: Zeng Zhaoming <zengzm.kernel@gmail.com>

find_task_by_vpid() should be protected by rcu_read_lock(), to prevent
free_pid() reclaiming pid.

Signed-off-by: Zeng Zhaoming <zengzm.kernel@gmail.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mempolicy.c |    3 +++
 1 file changed, 3 insertions(+)

diff -puN mm/mempolicy.c~mm-mempolicyc-add-rcu-read-lock-to-protect-pid-structure mm/mempolicy.c
--- a/mm/mempolicy.c~mm-mempolicyc-add-rcu-read-lock-to-protect-pid-structure
+++ a/mm/mempolicy.c
@@ -1307,15 +1307,18 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pi
                goto out;

        /* Find the mm_struct */
+       rcu_read_lock();
        read_lock(&tasklist_lock);
        task = pid ? find_task_by_vpid(pid) : current;
        if (!task) {
                read_unlock(&tasklist_lock);
+               rcu_read_unlock();
                err = -ESRCH;
                goto out;
        }
        mm = get_task_mm(task);
        read_unlock(&tasklist_lock);
+       rcu_read_unlock();



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
@ 2010-11-23  7:16         ` KOSAKI Motohiro
  0 siblings, 0 replies; 37+ messages in thread
From: KOSAKI Motohiro @ 2010-11-23  7:16 UTC (permalink / raw)
  To: Ted Ts'o, Minchan Kim, linux-ext4, linux-mm, linux-kernel,
	Andreas Dilger
  Cc: kosaki.motohiro

> On Mon, Nov 22, 2010 at 12:39:49AM +0900, Minchan Kim wrote:
> > 
> > I think it's no problem. 
> > 
> > That's because migration always holds lock_page on the file page.
> > So the page couldn't remove from radix. 
> 
> It may be "ok" in that it won't cause a race, but it still leaves an
> unsightly warning if LOCKDEP is enabled, and LOCKDEP warnings will
> cause /proc_lock_stat to be disabled.  So I think it still needs to be
> fixed by adding rcu_read_lock()/rcu_read_unlock() to
> migrate_page_move_mapping().

Hi Ted,

Current mmotm has following patch and I think it should be fixed your
issue.

Thanks.




From: Zeng Zhaoming <zengzm.kernel@gmail.com>

find_task_by_vpid() should be protected by rcu_read_lock(), to prevent
free_pid() reclaiming pid.

Signed-off-by: Zeng Zhaoming <zengzm.kernel@gmail.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mempolicy.c |    3 +++
 1 file changed, 3 insertions(+)

diff -puN mm/mempolicy.c~mm-mempolicyc-add-rcu-read-lock-to-protect-pid-structure mm/mempolicy.c
--- a/mm/mempolicy.c~mm-mempolicyc-add-rcu-read-lock-to-protect-pid-structure
+++ a/mm/mempolicy.c
@@ -1307,15 +1307,18 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pi
                goto out;

        /* Find the mm_struct */
+       rcu_read_lock();
        read_lock(&tasklist_lock);
        task = pid ? find_task_by_vpid(pid) : current;
        if (!task) {
                read_unlock(&tasklist_lock);
+               rcu_read_unlock();
                err = -ESRCH;
                goto out;
        }
        mm = get_task_mm(task);
        read_unlock(&tasklist_lock);
+       rcu_read_unlock();


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage
@ 2010-11-23  7:16         ` KOSAKI Motohiro
  0 siblings, 0 replies; 37+ messages in thread
From: KOSAKI Motohiro @ 2010-11-23  7:16 UTC (permalink / raw)
  To: Ted Ts'o, Minchan Kim, linux-ext4, linux-mm, linux-kernel,
	Andreas Dilger, Paul McKenney, Eric Sandeen
  Cc: kosaki.motohiro

> On Mon, Nov 22, 2010 at 12:39:49AM +0900, Minchan Kim wrote:
> > 
> > I think it's no problem. 
> > 
> > That's because migration always holds lock_page on the file page.
> > So the page couldn't remove from radix. 
> 
> It may be "ok" in that it won't cause a race, but it still leaves an
> unsightly warning if LOCKDEP is enabled, and LOCKDEP warnings will
> cause /proc_lock_stat to be disabled.  So I think it still needs to be
> fixed by adding rcu_read_lock()/rcu_read_unlock() to
> migrate_page_move_mapping().

Hi Ted,

Current mmotm has following patch and I think it should be fixed your
issue.

Thanks.




From: Zeng Zhaoming <zengzm.kernel@gmail.com>

find_task_by_vpid() should be protected by rcu_read_lock(), to prevent
free_pid() reclaiming pid.

Signed-off-by: Zeng Zhaoming <zengzm.kernel@gmail.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mempolicy.c |    3 +++
 1 file changed, 3 insertions(+)

diff -puN mm/mempolicy.c~mm-mempolicyc-add-rcu-read-lock-to-protect-pid-structure mm/mempolicy.c
--- a/mm/mempolicy.c~mm-mempolicyc-add-rcu-read-lock-to-protect-pid-structure
+++ a/mm/mempolicy.c
@@ -1307,15 +1307,18 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pi
                goto out;

        /* Find the mm_struct */
+       rcu_read_lock();
        read_lock(&tasklist_lock);
        task = pid ? find_task_by_vpid(pid) : current;
        if (!task) {
                read_unlock(&tasklist_lock);
+               rcu_read_unlock();
                err = -ESRCH;
                goto out;
        }
        mm = get_task_mm(task);
        read_unlock(&tasklist_lock);
+       rcu_read_unlock();


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
  2010-11-22  6:16           ` Paul E. McKenney
@ 2010-12-07 19:01             ` Gerald Schaefer
  2010-12-08  1:19               ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 37+ messages in thread
From: Gerald Schaefer @ 2010-12-07 19:01 UTC (permalink / raw)
  To: paulmck, KAMEZAWA Hiroyuki
  Cc: Milton Miller, Minchan Kim, linux-kernel, linux-mm, linux-ext4,
	Ted Ts'o, Arun Bhanu, Mel Gorman, Andrew Morton,
	Heiko Carstens, Martin Schwidefsky

I got the same warning as reported by Arun Bhanu in "[BUG?] [Ext4] INFO:
suspicious rcu_dereference_check() usage" during memory hotplug test on
2.6.37-rc5, see below.

migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous
pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d.
This should be ok, as the comment suggests, so the warning is probably a
false positive. Eventually, migrate_page_move_mapping() will call
radix_tree_deref_slot() with tree_lock held, but w/o rcu_read_lock() for
non-anonymous pages.

As suggested by Milton before, a new version of radix_tree_deref_slot that
uses rcu_dereference_protected could help, if this is a false positive, or
maybe the rcu_read_lock() should be called for all pages, not just anonymous.
Any thoughts?


===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 0
6 locks held by sh/960:
 #0:  (&buffer->mutex){+.+.+.}, at: [<00000000002afa7a>] sysfs_write_file+0x4a/0x1a8
 #1:  (s_active#52){.+.+.+}, at: [<00000000002afb02>] sysfs_write_file+0xd2/0x1a8
 #2:  (&mem->state_mutex){+.+.+.}, at: [<0000000000404d78>] memory_block_change_state+0x40/0x1a0
 #3:  (mem_hotplug_mutex){+.+.+.}, at: [<00000000002242c4>] lock_memory_hotplug+0x2c/0x44
 #4:  (pm_mutex){+.+.+.}, at: [<000000000022483c>] remove_memory+0x104/0x758
 #5:  (&(&inode->i_data.tree_lock)->rlock){..-...}, at: [<000000000022624a>] migrate_page_move_mapping+0x4a/0x2d8

stack backtrace:
CPU: 2 Not tainted 2.6.37-rc5 #2
Process sh (pid: 960, task: 0000000018e38640, ksp: 000000000f037818)
000000000f037ad8 000000000f037a58 0000000000000002 0000000000000000
       000000000f037af8 000000000f037a70 000000000f037a70 0000000000567c42
       0000000000000000 0000000017bd7e78 0000000000000002 0000000017bd2e30
       000003e00000000d 000003e00000000c 000000000f037ac0 0000000000000000
       0000000000000000 0000000000100bfa 000000000f037a58 000000000f037a98
Call Trace:
([<0000000000100b02>] show_trace+0xee/0x144)
 [<00000000002263ea>] migrate_page_move_mapping+0x1ea/0x2d8
 [<0000000000226b1c>] migrate_page+0x38/0x68
 [<0000000000226c36>] move_to_new_page+0xea/0x2bc
 [<00000000002276f6>] migrate_pages+0x496/0x568
 [<0000000000224be6>] remove_memory+0x4ae/0x758
 [<0000000000404e20>] memory_block_change_state+0xe8/0x1a0
 [<0000000000404fda>] store_mem_state+0x102/0x138
 [<00000000002afb26>] sysfs_write_file+0xf6/0x1a8
 [<000000000023357c>] vfs_write+0xac/0x18c
 [<0000000000233758>] SyS_write+0x58/0xa8
 [<0000000000113976>] sysc_noemu+0x16/0x1c
 [<0000020000162edc>] 0x20000162edc
INFO: lockdep is turned off.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
  2010-12-07 19:01             ` [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! Gerald Schaefer
@ 2010-12-08  1:19               ` KAMEZAWA Hiroyuki
  2010-12-16 13:50                 ` Gerald Schaefer
  0 siblings, 1 reply; 37+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-12-08  1:19 UTC (permalink / raw)
  To: gerald.schaefer
  Cc: paulmck, Milton Miller, Minchan Kim, linux-kernel, linux-mm,
	linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman, Andrew Morton,
	Heiko Carstens, Martin Schwidefsky

On Tue, 07 Dec 2010 20:01:49 +0100
Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:

> I got the same warning as reported by Arun Bhanu in "[BUG?] [Ext4] INFO:
> suspicious rcu_dereference_check() usage" during memory hotplug test on
> 2.6.37-rc5, see below.
> 
> migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous
> pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d.
> This should be ok, as the comment suggests, so the warning is probably a
> false positive. Eventually, migrate_page_move_mapping() will call
> radix_tree_deref_slot() with tree_lock held, but w/o rcu_read_lock() for
> non-anonymous pages.
> 
> As suggested by Milton before, a new version of radix_tree_deref_slot that
> uses rcu_dereference_protected could help, if this is a false positive, or
> maybe the rcu_read_lock() should be called for all pages, not just anonymous.
> Any thoughts?
> 

Hmm, in radix_tree_deref_slot() comment,

 140  * For use with radix_tree_lookup_slot().  Caller must hold tree at least read
 141  * locked across slot lookup and dereference.  More likely, will be used with
 142  * radix_tree_replace_slot(), as well, so caller will hold tree write locked.

racy update must not happen. rcu_dereference_protected() sounds nice.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
  2010-12-08  1:19               ` KAMEZAWA Hiroyuki
@ 2010-12-16 13:50                 ` Gerald Schaefer
  2010-12-17  0:04                     ` Minchan Kim
  0 siblings, 1 reply; 37+ messages in thread
From: Gerald Schaefer @ 2010-12-16 13:50 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: paulmck, Milton Miller, Minchan Kim, linux-kernel, linux-mm,
	linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman, Andrew Morton,
	Heiko Carstens, Martin Schwidefsky

I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
below. Both cases are easily reproducible: memory unplug with big page cache,
or adding large pages during run-time.

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 0
1 lock held by bash/761:
 #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8

stack backtrace:
CPU: 1 Not tainted 2.6.37-rc6 #4
Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
00000000181bb818 00000000181bb798 0000000000000002 0000000000000000 
       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa 
       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30 
       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000 
       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8 
Call Trace:
([<0000000000100b02>] show_trace+0xee/0x144)
 [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
 [<0000000000226c80>] migrate_page+0x38/0x68
 [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
 [<000000000022785a>] migrate_pages+0x496/0x568
 [<000000000021e24e>] compact_zone+0x432/0x7d8
 [<000000000021e772>] compact_zone_order+0x9e/0xbc
 [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
 [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
 [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
 [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
 [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
 [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
 [<00000000002a65be>] proc_sys_write+0x26/0x34
 [<00000000002336e0>] vfs_write+0xac/0x18c
 [<00000000002338bc>] SyS_write+0x58/0xa8
 [<0000000000113976>] sysc_noemu+0x16/0x1c
 [<0000020000162edc>] 0x20000162edc
INFO: lockdep is turned off.

I honestly do not understand 100% why this is a false positive, seeing that
e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
the &mapping->tree_lock instead. So I'm not quite sure how to fix this
properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
even if it is not necessary for synchronization, would get rid of the warning,
like in the following patch. Any ideas?

---
 fs/hugetlbfs/inode.c |    2 ++
 mm/migrate.c         |    4 ++++
 2 files changed, 6 insertions(+)

--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
 {
 	int rc;
 
+	rcu_read_lock();
 	rc = migrate_huge_page_move_mapping(mapping, newpage, page);
+	rcu_read_unlock();
 	if (rc)
 		return rc;
 	migrate_page_copy(newpage, page);
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
 
 	BUG_ON(PageWriteback(page));	/* Writeback must be complete */
 
+	rcu_read_lock();
 	rc = migrate_page_move_mapping(mapping, newpage, page);
+	rcu_read_unlock();
 
 	if (rc)
 		return rc;
@@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
 
 	head = page_buffers(page);
 
+	rcu_read_lock();
 	rc = migrate_page_move_mapping(mapping, newpage, page);
+	rcu_read_unlock();
 
 	if (rc)
 		return rc;



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
  2010-12-16 13:50                 ` Gerald Schaefer
@ 2010-12-17  0:04                     ` Minchan Kim
  0 siblings, 0 replies; 37+ messages in thread
From: Minchan Kim @ 2010-12-17  0:04 UTC (permalink / raw)
  To: gerald.schaefer
  Cc: KAMEZAWA Hiroyuki, paulmck, Milton Miller, linux-kernel,
	linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman,
	Andrew Morton, Heiko Carstens, Martin Schwidefsky

On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
<gerald.schaefer@de.ibm.com> wrote:
> I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> below. Both cases are easily reproducible: memory unplug with big page cache,
> or adding large pages during run-time.
>
> ===================================================
> [ INFO: suspicious rcu_dereference_check() usage. ]
> ---------------------------------------------------
> include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 1, debug_locks = 0
> 1 lock held by bash/761:
>  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
>
> stack backtrace:
> CPU: 1 Not tainted 2.6.37-rc6 #4
> Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
>       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
>       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
>       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
>       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> Call Trace:
> ([<0000000000100b02>] show_trace+0xee/0x144)
>  [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
>  [<0000000000226c80>] migrate_page+0x38/0x68
>  [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
>  [<000000000022785a>] migrate_pages+0x496/0x568
>  [<000000000021e24e>] compact_zone+0x432/0x7d8
>  [<000000000021e772>] compact_zone_order+0x9e/0xbc
>  [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
>  [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
>  [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
>  [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
>  [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
>  [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
>  [<00000000002a65be>] proc_sys_write+0x26/0x34
>  [<00000000002336e0>] vfs_write+0xac/0x18c
>  [<00000000002338bc>] SyS_write+0x58/0xa8
>  [<0000000000113976>] sysc_noemu+0x16/0x1c
>  [<0000020000162edc>] 0x20000162edc
> INFO: lockdep is turned off.
>
> I honestly do not understand 100% why this is a false positive, seeing that
> e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> even if it is not necessary for synchronization, would get rid of the warning,
> like in the following patch. Any ideas?

In case of anon page, we hold rcu_read_lock in unmap_and_move.
The problem is file-backed page. In case of that, we hold lock_page
and mapping->tree_lock as update-side lock.
So we don't need rcu_read_lock.

>
> ---
>  fs/hugetlbfs/inode.c |    2 ++
>  mm/migrate.c         |    4 ++++
>  2 files changed, 6 insertions(+)
>
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
>  {
>        int rc;
>
> +       rcu_read_lock();
>        rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> +       rcu_read_unlock();
>        if (rc)
>                return rc;
>        migrate_page_copy(newpage, page);
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
>
>        BUG_ON(PageWriteback(page));    /* Writeback must be complete */
>
> +       rcu_read_lock();
>        rc = migrate_page_move_mapping(mapping, newpage, page);
> +       rcu_read_unlock();
>
>        if (rc)
>                return rc;
> @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
>
>        head = page_buffers(page);
>
> +       rcu_read_lock();
>        rc = migrate_page_move_mapping(mapping, newpage, page);
> +       rcu_read_unlock();
>
>        if (rc)
>                return rc;
>
>
>


How about this?
Maybe Paul have better idea.
(It's apparently be word-wrapped.)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index ab2baa5..135af1e 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
 }

 /**
+ * radix_tree_deref_slot_nocheck       - dereference a slot without RCU check
+ * @pslot:     pointer to slot, returned by radix_tree_lookup_slot
+ * Returns:    item that was stored in that slot with any direct pointer flag
+ *             removed.
+ *
+ * This functions works like radix_tree_deref_slot except it doesn't check
+ * RCU rule. Normally this funcion is used with update-side lock.
+ * You should use this function very carefully.
+ */
+static inline void *radix_tree_deref_slot_nocheck(void **pslot)
+{
+       return rcu_dereference_protected(*pslot, 1);
+}
+/**
  * radix_tree_deref_retry      - check radix_tree_deref_slot
  * @arg:       pointer returned by radix_tree_deref_slot
  * Returns:    0 if retry is not required, otherwise retry is required
diff --git a/mm/migrate.c b/mm/migrate.c
index 2eb2243..5be2841 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct
address_space *mappin

        expected_count = 2 + page_has_private(page);
        if (page_count(page) != expected_count ||
-                       (struct page *)radix_tree_deref_slot(pslot) != page) {
+                       (struct page *)radix_tree_deref_slot_nocheck(pslot)
+                               != page) {
                spin_unlock_irq(&mapping->tree_lock);
                return -EAGAIN;
        }
@@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct
address_space *mapping,

        expected_count = 2 + page_has_private(page);
        if (page_count(page) != expected_count ||
-           (struct page *)radix_tree_deref_slot(pslot) != page) {
+           (struct page *)radix_tree_deref_slot_nocheck(pslot)
+                               != page) {
                spin_unlock_irq(&mapping->tree_lock);
                return -EAGAIN;
        }




-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
@ 2010-12-17  0:04                     ` Minchan Kim
  0 siblings, 0 replies; 37+ messages in thread
From: Minchan Kim @ 2010-12-17  0:04 UTC (permalink / raw)
  To: gerald.schaefer
  Cc: KAMEZAWA Hiroyuki, paulmck, Milton Miller, linux-kernel,
	linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman,
	Andrew Morton, Heiko Carstens, Martin Schwidefsky

On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
<gerald.schaefer@de.ibm.com> wrote:
> I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> below. Both cases are easily reproducible: memory unplug with big page cache,
> or adding large pages during run-time.
>
> ===================================================
> [ INFO: suspicious rcu_dereference_check() usage. ]
> ---------------------------------------------------
> include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 1, debug_locks = 0
> 1 lock held by bash/761:
>  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
>
> stack backtrace:
> CPU: 1 Not tainted 2.6.37-rc6 #4
> Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
>       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
>       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
>       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
>       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> Call Trace:
> ([<0000000000100b02>] show_trace+0xee/0x144)
>  [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
>  [<0000000000226c80>] migrate_page+0x38/0x68
>  [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
>  [<000000000022785a>] migrate_pages+0x496/0x568
>  [<000000000021e24e>] compact_zone+0x432/0x7d8
>  [<000000000021e772>] compact_zone_order+0x9e/0xbc
>  [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
>  [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
>  [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
>  [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
>  [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
>  [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
>  [<00000000002a65be>] proc_sys_write+0x26/0x34
>  [<00000000002336e0>] vfs_write+0xac/0x18c
>  [<00000000002338bc>] SyS_write+0x58/0xa8
>  [<0000000000113976>] sysc_noemu+0x16/0x1c
>  [<0000020000162edc>] 0x20000162edc
> INFO: lockdep is turned off.
>
> I honestly do not understand 100% why this is a false positive, seeing that
> e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> even if it is not necessary for synchronization, would get rid of the warning,
> like in the following patch. Any ideas?

In case of anon page, we hold rcu_read_lock in unmap_and_move.
The problem is file-backed page. In case of that, we hold lock_page
and mapping->tree_lock as update-side lock.
So we don't need rcu_read_lock.

>
> ---
>  fs/hugetlbfs/inode.c |    2 ++
>  mm/migrate.c         |    4 ++++
>  2 files changed, 6 insertions(+)
>
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
>  {
>        int rc;
>
> +       rcu_read_lock();
>        rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> +       rcu_read_unlock();
>        if (rc)
>                return rc;
>        migrate_page_copy(newpage, page);
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
>
>        BUG_ON(PageWriteback(page));    /* Writeback must be complete */
>
> +       rcu_read_lock();
>        rc = migrate_page_move_mapping(mapping, newpage, page);
> +       rcu_read_unlock();
>
>        if (rc)
>                return rc;
> @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
>
>        head = page_buffers(page);
>
> +       rcu_read_lock();
>        rc = migrate_page_move_mapping(mapping, newpage, page);
> +       rcu_read_unlock();
>
>        if (rc)
>                return rc;
>
>
>


How about this?
Maybe Paul have better idea.
(It's apparently be word-wrapped.)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index ab2baa5..135af1e 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
 }

 /**
+ * radix_tree_deref_slot_nocheck       - dereference a slot without RCU check
+ * @pslot:     pointer to slot, returned by radix_tree_lookup_slot
+ * Returns:    item that was stored in that slot with any direct pointer flag
+ *             removed.
+ *
+ * This functions works like radix_tree_deref_slot except it doesn't check
+ * RCU rule. Normally this funcion is used with update-side lock.
+ * You should use this function very carefully.
+ */
+static inline void *radix_tree_deref_slot_nocheck(void **pslot)
+{
+       return rcu_dereference_protected(*pslot, 1);
+}
+/**
  * radix_tree_deref_retry      - check radix_tree_deref_slot
  * @arg:       pointer returned by radix_tree_deref_slot
  * Returns:    0 if retry is not required, otherwise retry is required
diff --git a/mm/migrate.c b/mm/migrate.c
index 2eb2243..5be2841 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct
address_space *mappin

        expected_count = 2 + page_has_private(page);
        if (page_count(page) != expected_count ||
-                       (struct page *)radix_tree_deref_slot(pslot) != page) {
+                       (struct page *)radix_tree_deref_slot_nocheck(pslot)
+                               != page) {
                spin_unlock_irq(&mapping->tree_lock);
                return -EAGAIN;
        }
@@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct
address_space *mapping,

        expected_count = 2 + page_has_private(page);
        if (page_count(page) != expected_count ||
-           (struct page *)radix_tree_deref_slot(pslot) != page) {
+           (struct page *)radix_tree_deref_slot_nocheck(pslot)
+                               != page) {
                spin_unlock_irq(&mapping->tree_lock);
                return -EAGAIN;
        }




-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
  2010-12-17  0:04                     ` Minchan Kim
@ 2010-12-17  5:47                       ` Paul E. McKenney
  -1 siblings, 0 replies; 37+ messages in thread
From: Paul E. McKenney @ 2010-12-17  5:47 UTC (permalink / raw)
  To: Minchan Kim
  Cc: gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel,
	linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman,
	Andrew Morton, Heiko Carstens, Martin Schwidefsky

On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
> <gerald.schaefer@de.ibm.com> wrote:
> > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> > below. Both cases are easily reproducible: memory unplug with big page cache,
> > or adding large pages during run-time.
> >
> > ===================================================
> > [ INFO: suspicious rcu_dereference_check() usage. ]
> > ---------------------------------------------------
> > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> >
> > other info that might help us debug this:
> >
> >
> > rcu_scheduler_active = 1, debug_locks = 0
> > 1 lock held by bash/761:
> >  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
> >
> > stack backtrace:
> > CPU: 1 Not tainted 2.6.37-rc6 #4
> > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
> >       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
> >       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
> >       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
> >       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> > Call Trace:
> > ([<0000000000100b02>] show_trace+0xee/0x144)
> >  [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
> >  [<0000000000226c80>] migrate_page+0x38/0x68
> >  [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
> >  [<000000000022785a>] migrate_pages+0x496/0x568
> >  [<000000000021e24e>] compact_zone+0x432/0x7d8
> >  [<000000000021e772>] compact_zone_order+0x9e/0xbc
> >  [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
> >  [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
> >  [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
> >  [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
> >  [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
> >  [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
> >  [<00000000002a65be>] proc_sys_write+0x26/0x34
> >  [<00000000002336e0>] vfs_write+0xac/0x18c
> >  [<00000000002338bc>] SyS_write+0x58/0xa8
> >  [<0000000000113976>] sysc_noemu+0x16/0x1c
> >  [<0000020000162edc>] 0x20000162edc
> > INFO: lockdep is turned off.
> >
> > I honestly do not understand 100% why this is a false positive, seeing that
> > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> > the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> > properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> > even if it is not necessary for synchronization, would get rid of the warning,
> > like in the following patch. Any ideas?
> 
> In case of anon page, we hold rcu_read_lock in unmap_and_move.
> The problem is file-backed page. In case of that, we hold lock_page
> and mapping->tree_lock as update-side lock.
> So we don't need rcu_read_lock.
> 
> >
> > ---
> >  fs/hugetlbfs/inode.c |    2 ++
> >  mm/migrate.c         |    4 ++++
> >  2 files changed, 6 insertions(+)
> >
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
> >  {
> >        int rc;
> >
> > +       rcu_read_lock();
> >        rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > +       rcu_read_unlock();
> >        if (rc)
> >                return rc;
> >        migrate_page_copy(newpage, page);
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
> >
> >        BUG_ON(PageWriteback(page));    /* Writeback must be complete */
> >
> > +       rcu_read_lock();
> >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > +       rcu_read_unlock();
> >
> >        if (rc)
> >                return rc;
> > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
> >
> >        head = page_buffers(page);
> >
> > +       rcu_read_lock();
> >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > +       rcu_read_unlock();
> >
> >        if (rc)
> >                return rc;
> >
> >
> >
> 
> 
> How about this?
> Maybe Paul have better idea.
> (It's apparently be word-wrapped.)
> 
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index ab2baa5..135af1e 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
>  }
> 
>  /**
> + * radix_tree_deref_slot_nocheck       - dereference a slot without RCU check
> + * @pslot:     pointer to slot, returned by radix_tree_lookup_slot
> + * Returns:    item that was stored in that slot with any direct pointer flag
> + *             removed.
> + *
> + * This functions works like radix_tree_deref_slot except it doesn't check
> + * RCU rule. Normally this funcion is used with update-side lock.
> + * You should use this function very carefully.
> + */
> +static inline void *radix_tree_deref_slot_nocheck(void **pslot)
> +{
> +       return rcu_dereference_protected(*pslot, 1);

I suggest replacing the "1" with lockdep expressions for the locks
that you say might be held:

	return rcu_dereference_check(*pslot,
				     lockdep_is_held(&mapping->tree_lock));

This assumes that when you said "and" you meant both lock_page() and
mapping->tree_lock.  Also you need to pass in the mapping, which
should not be a problem given likely inlining.

If you meant that either mapping->tree_lock or page_lock() might be
held, I suppose that the page_lock() state could be passed in, but
perhaps better to take a general lockdep expression.

So, either or both?  ;-)

						Thanx, Paul

> +}
> +/**
>   * radix_tree_deref_retry      - check radix_tree_deref_slot
>   * @arg:       pointer returned by radix_tree_deref_slot
>   * Returns:    0 if retry is not required, otherwise retry is required
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 2eb2243..5be2841 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct
> address_space *mappin
> 
>         expected_count = 2 + page_has_private(page);
>         if (page_count(page) != expected_count ||
> -                       (struct page *)radix_tree_deref_slot(pslot) != page) {
> +                       (struct page *)radix_tree_deref_slot_nocheck(pslot)
> +                               != page) {
>                 spin_unlock_irq(&mapping->tree_lock);
>                 return -EAGAIN;
>         }
> @@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct
> address_space *mapping,
> 
>         expected_count = 2 + page_has_private(page);
>         if (page_count(page) != expected_count ||
> -           (struct page *)radix_tree_deref_slot(pslot) != page) {
> +           (struct page *)radix_tree_deref_slot_nocheck(pslot)
> +                               != page) {
>                 spin_unlock_irq(&mapping->tree_lock);
>                 return -EAGAIN;
>         }
> 
> 
> 
> 
> -- 
> Kind regards,
> Minchan Kim
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
@ 2010-12-17  5:47                       ` Paul E. McKenney
  0 siblings, 0 replies; 37+ messages in thread
From: Paul E. McKenney @ 2010-12-17  5:47 UTC (permalink / raw)
  To: Minchan Kim
  Cc: gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel,
	linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman,
	Andrew Morton, Heiko Carstens, Martin Schwidefsky

On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
> <gerald.schaefer@de.ibm.com> wrote:
> > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> > below. Both cases are easily reproducible: memory unplug with big page cache,
> > or adding large pages during run-time.
> >
> > ===================================================
> > [ INFO: suspicious rcu_dereference_check() usage. ]
> > ---------------------------------------------------
> > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> >
> > other info that might help us debug this:
> >
> >
> > rcu_scheduler_active = 1, debug_locks = 0
> > 1 lock held by bash/761:
> >  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
> >
> > stack backtrace:
> > CPU: 1 Not tainted 2.6.37-rc6 #4
> > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
> >       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
> >       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
> >       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
> >       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> > Call Trace:
> > ([<0000000000100b02>] show_trace+0xee/0x144)
> >  [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
> >  [<0000000000226c80>] migrate_page+0x38/0x68
> >  [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
> >  [<000000000022785a>] migrate_pages+0x496/0x568
> >  [<000000000021e24e>] compact_zone+0x432/0x7d8
> >  [<000000000021e772>] compact_zone_order+0x9e/0xbc
> >  [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
> >  [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
> >  [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
> >  [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
> >  [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
> >  [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
> >  [<00000000002a65be>] proc_sys_write+0x26/0x34
> >  [<00000000002336e0>] vfs_write+0xac/0x18c
> >  [<00000000002338bc>] SyS_write+0x58/0xa8
> >  [<0000000000113976>] sysc_noemu+0x16/0x1c
> >  [<0000020000162edc>] 0x20000162edc
> > INFO: lockdep is turned off.
> >
> > I honestly do not understand 100% why this is a false positive, seeing that
> > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> > the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> > properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> > even if it is not necessary for synchronization, would get rid of the warning,
> > like in the following patch. Any ideas?
> 
> In case of anon page, we hold rcu_read_lock in unmap_and_move.
> The problem is file-backed page. In case of that, we hold lock_page
> and mapping->tree_lock as update-side lock.
> So we don't need rcu_read_lock.
> 
> >
> > ---
> >  fs/hugetlbfs/inode.c |    2 ++
> >  mm/migrate.c         |    4 ++++
> >  2 files changed, 6 insertions(+)
> >
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
> >  {
> >        int rc;
> >
> > +       rcu_read_lock();
> >        rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > +       rcu_read_unlock();
> >        if (rc)
> >                return rc;
> >        migrate_page_copy(newpage, page);
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
> >
> >        BUG_ON(PageWriteback(page));    /* Writeback must be complete */
> >
> > +       rcu_read_lock();
> >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > +       rcu_read_unlock();
> >
> >        if (rc)
> >                return rc;
> > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
> >
> >        head = page_buffers(page);
> >
> > +       rcu_read_lock();
> >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > +       rcu_read_unlock();
> >
> >        if (rc)
> >                return rc;
> >
> >
> >
> 
> 
> How about this?
> Maybe Paul have better idea.
> (It's apparently be word-wrapped.)
> 
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index ab2baa5..135af1e 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
>  }
> 
>  /**
> + * radix_tree_deref_slot_nocheck       - dereference a slot without RCU check
> + * @pslot:     pointer to slot, returned by radix_tree_lookup_slot
> + * Returns:    item that was stored in that slot with any direct pointer flag
> + *             removed.
> + *
> + * This functions works like radix_tree_deref_slot except it doesn't check
> + * RCU rule. Normally this funcion is used with update-side lock.
> + * You should use this function very carefully.
> + */
> +static inline void *radix_tree_deref_slot_nocheck(void **pslot)
> +{
> +       return rcu_dereference_protected(*pslot, 1);

I suggest replacing the "1" with lockdep expressions for the locks
that you say might be held:

	return rcu_dereference_check(*pslot,
				     lockdep_is_held(&mapping->tree_lock));

This assumes that when you said "and" you meant both lock_page() and
mapping->tree_lock.  Also you need to pass in the mapping, which
should not be a problem given likely inlining.

If you meant that either mapping->tree_lock or page_lock() might be
held, I suppose that the page_lock() state could be passed in, but
perhaps better to take a general lockdep expression.

So, either or both?  ;-)

						Thanx, Paul

> +}
> +/**
>   * radix_tree_deref_retry      - check radix_tree_deref_slot
>   * @arg:       pointer returned by radix_tree_deref_slot
>   * Returns:    0 if retry is not required, otherwise retry is required
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 2eb2243..5be2841 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct
> address_space *mappin
> 
>         expected_count = 2 + page_has_private(page);
>         if (page_count(page) != expected_count ||
> -                       (struct page *)radix_tree_deref_slot(pslot) != page) {
> +                       (struct page *)radix_tree_deref_slot_nocheck(pslot)
> +                               != page) {
>                 spin_unlock_irq(&mapping->tree_lock);
>                 return -EAGAIN;
>         }
> @@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct
> address_space *mapping,
> 
>         expected_count = 2 + page_has_private(page);
>         if (page_count(page) != expected_count ||
> -           (struct page *)radix_tree_deref_slot(pslot) != page) {
> +           (struct page *)radix_tree_deref_slot_nocheck(pslot)
> +                               != page) {
>                 spin_unlock_irq(&mapping->tree_lock);
>                 return -EAGAIN;
>         }
> 
> 
> 
> 
> -- 
> Kind regards,
> Minchan Kim
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
  2010-12-17  5:47                       ` Paul E. McKenney
@ 2010-12-17  5:59                         ` Eric Dumazet
  -1 siblings, 0 replies; 37+ messages in thread
From: Eric Dumazet @ 2010-12-17  5:59 UTC (permalink / raw)
  To: paulmck
  Cc: Minchan Kim, gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller,
	linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu,
	Mel Gorman, Andrew Morton, Heiko Carstens, Martin Schwidefsky

Le jeudi 16 décembre 2010 à 21:47 -0800, Paul E. McKenney a écrit :
> On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> > How about this?
> > Maybe Paul have better idea.
> > (It's apparently be word-wrapped.)
> > 
> > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> > index ab2baa5..135af1e 100644
> > --- a/include/linux/radix-tree.h
> > +++ b/include/linux/radix-tree.h
> > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
> >  }
> > 
> >  /**
> > + * radix_tree_deref_slot_nocheck       - dereference a slot without RCU check
> > + * @pslot:     pointer to slot, returned by radix_tree_lookup_slot
> > + * Returns:    item that was stored in that slot with any direct pointer flag
> > + *             removed.
> > + *
> > + * This functions works like radix_tree_deref_slot except it doesn't check
> > + * RCU rule. Normally this funcion is used with update-side lock.
> > + * You should use this function very carefully.
> > + */
> > +static inline void *radix_tree_deref_slot_nocheck(void **pslot)
> > +{
> > +       return rcu_dereference_protected(*pslot, 1);
> 
> I suggest replacing the "1" with lockdep expressions for the locks
> that you say might be held:
> 
> 	return rcu_dereference_check(*pslot,
> 				     lockdep_is_held(&mapping->tree_lock));
> 

Yes, but point was also to use rcu_dereference_protected() here, not
rcu_dereference_check().



> This assumes that when you said "and" you meant both lock_page() and
> mapping->tree_lock.  Also you need to pass in the mapping, which
> should not be a problem given likely inlining.
> 
> If you meant that either mapping->tree_lock or page_lock() might be
> held, I suppose that the page_lock() state could be passed in, but
> perhaps better to take a general lockdep expression.
> 
> So, either or both?  ;-)
> 

Thanks



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
@ 2010-12-17  5:59                         ` Eric Dumazet
  0 siblings, 0 replies; 37+ messages in thread
From: Eric Dumazet @ 2010-12-17  5:59 UTC (permalink / raw)
  To: paulmck
  Cc: Minchan Kim, gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller,
	linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu,
	Mel Gorman, Andrew Morton, Heiko Carstens, Martin Schwidefsky

Le jeudi 16 décembre 2010 à 21:47 -0800, Paul E. McKenney a écrit :
> On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> > How about this?
> > Maybe Paul have better idea.
> > (It's apparently be word-wrapped.)
> > 
> > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> > index ab2baa5..135af1e 100644
> > --- a/include/linux/radix-tree.h
> > +++ b/include/linux/radix-tree.h
> > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
> >  }
> > 
> >  /**
> > + * radix_tree_deref_slot_nocheck       - dereference a slot without RCU check
> > + * @pslot:     pointer to slot, returned by radix_tree_lookup_slot
> > + * Returns:    item that was stored in that slot with any direct pointer flag
> > + *             removed.
> > + *
> > + * This functions works like radix_tree_deref_slot except it doesn't check
> > + * RCU rule. Normally this funcion is used with update-side lock.
> > + * You should use this function very carefully.
> > + */
> > +static inline void *radix_tree_deref_slot_nocheck(void **pslot)
> > +{
> > +       return rcu_dereference_protected(*pslot, 1);
> 
> I suggest replacing the "1" with lockdep expressions for the locks
> that you say might be held:
> 
> 	return rcu_dereference_check(*pslot,
> 				     lockdep_is_held(&mapping->tree_lock));
> 

Yes, but point was also to use rcu_dereference_protected() here, not
rcu_dereference_check().



> This assumes that when you said "and" you meant both lock_page() and
> mapping->tree_lock.  Also you need to pass in the mapping, which
> should not be a problem given likely inlining.
> 
> If you meant that either mapping->tree_lock or page_lock() might be
> held, I suppose that the page_lock() state could be passed in, but
> perhaps better to take a general lockdep expression.
> 
> So, either or both?  ;-)
> 

Thanks


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
  2010-12-17  0:04                     ` Minchan Kim
@ 2010-12-17  8:39                       ` Mel Gorman
  -1 siblings, 0 replies; 37+ messages in thread
From: Mel Gorman @ 2010-12-17  8:39 UTC (permalink / raw)
  To: Minchan Kim
  Cc: gerald.schaefer, KAMEZAWA Hiroyuki, paulmck, Milton Miller,
	linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu,
	Andrew Morton, Heiko Carstens, Martin Schwidefsky

On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
> <gerald.schaefer@de.ibm.com> wrote:
> > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> > below. Both cases are easily reproducible: memory unplug with big page cache,
> > or adding large pages during run-time.
> >
> > ===================================================
> > [ INFO: suspicious rcu_dereference_check() usage. ]
> > ---------------------------------------------------
> > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> >
> > other info that might help us debug this:
> >
> >
> > rcu_scheduler_active = 1, debug_locks = 0
> > 1 lock held by bash/761:
> >  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
> >
> > stack backtrace:
> > CPU: 1 Not tainted 2.6.37-rc6 #4
> > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
> >       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
> >       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
> >       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
> >       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> > Call Trace:
> > ([<0000000000100b02>] show_trace+0xee/0x144)
> >  [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
> >  [<0000000000226c80>] migrate_page+0x38/0x68
> >  [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
> >  [<000000000022785a>] migrate_pages+0x496/0x568
> >  [<000000000021e24e>] compact_zone+0x432/0x7d8
> >  [<000000000021e772>] compact_zone_order+0x9e/0xbc
> >  [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
> >  [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
> >  [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
> >  [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
> >  [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
> >  [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
> >  [<00000000002a65be>] proc_sys_write+0x26/0x34
> >  [<00000000002336e0>] vfs_write+0xac/0x18c
> >  [<00000000002338bc>] SyS_write+0x58/0xa8
> >  [<0000000000113976>] sysc_noemu+0x16/0x1c
> >  [<0000020000162edc>] 0x20000162edc
> > INFO: lockdep is turned off.
> >
> > I honestly do not understand 100% why this is a false positive, seeing that
> > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> > the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> > properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> > even if it is not necessary for synchronization, would get rid of the warning,
> > like in the following patch. Any ideas?
> 
> In case of anon page, we hold rcu_read_lock in unmap_and_move.
> The problem is file-backed page. In case of that, we hold lock_page
> and mapping->tree_lock as update-side lock.
> So we don't need rcu_read_lock.
> 
> >
> > ---
> >  fs/hugetlbfs/inode.c |    2 ++
> >  mm/migrate.c         |    4 ++++
> >  2 files changed, 6 insertions(+)
> >
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
> >  {
> >        int rc;
> >
> > +       rcu_read_lock();
> >        rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > +       rcu_read_unlock();
> >        if (rc)
> >                return rc;
> >        migrate_page_copy(newpage, page);
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
> >
> >        BUG_ON(PageWriteback(page));    /* Writeback must be complete */
> >
> > +       rcu_read_lock();
> >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > +       rcu_read_unlock();
> >
> >        if (rc)
> >                return rc;
> > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
> >
> >        head = page_buffers(page);
> >
> > +       rcu_read_lock();
> >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > +       rcu_read_unlock();
> >
> >        if (rc)
> >                return rc;
> >
> >
> >
> 
> 
> How about this?
> Maybe Paul have better idea.
> (It's apparently be word-wrapped.)
> 

heh, I wrote a patch almost identical to this and ran it overnight for testing
(test was a memory consumer running while a parallel process grew and shrunk
the hugepage pool). It passes but that is hardly a surprise. We differed
slightly in a number of respects though.

> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index ab2baa5..135af1e 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
>  }
> 
>  /**
> + * radix_tree_deref_slot_nocheck       - dereference a slot without RCU check
> + * @pslot:     pointer to slot, returned by radix_tree_lookup_slot
> + * Returns:    item that was stored in that slot with any direct pointer flag
> + *             removed.
> + *
> + * This functions works like radix_tree_deref_slot except it doesn't check
> + * RCU rule. Normally this funcion is used with update-side lock.
> + * You should use this function very carefully.
> + */
> +static inline void *radix_tree_deref_slot_nocheck(void **pslot)
> +{
> +       return rcu_dereference_protected(*pslot, 1);
> +}

For this, I had

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index ab2baa5..252d21c 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -146,6 +146,25 @@ static inline void *radix_tree_deref_slot(void **pslot)
 }
 
 /**
+ * radix_tree_deref_slot_protected	- dereference a slot without RCU lock but with tree lock held
+ * @pslot:	pointer to slot, returned by radix_tree_lookup_slot
+ * Returns:	item that was stored in that slot with any direct pointer flag
+ *		removed.
+ *
+ * For use with radix_tree_lookup_slot().  Caller must hold tree read
+ * locked across slot lookup and dereference. Not required if write lock is
+ * held (ie. items cannot be concurrently inserted).
+ *
+ * radix_tree_deref_retry must be used to confirm validity of the pointer if
+ * only the read lock is held.
+ */
+static inline void *radix_tree_deref_slot_protected(void **pslot,
+							spinlock_t *treelock)
+{
+	return rcu_dereference_protected(*pslot, lockdep_is_held(treelock));
+}
+
+/**
  * radix_tree_deref_retry	- check radix_tree_deref_slot
  * @arg:	pointer returned by radix_tree_deref_slot
  * Returns:	0 if retry is not required, otherwise retry is required

In the documentation, I noted that the check might be without RCU but with
the knowledge that it's protected by the tree lock. I'm not a RCU expert
but this is only safe when you know there isn't a parallel updater and the
treelock should be preventing that, right?

Even so, other users of rcu_dereference_protected() check a lock condition
which I used tree lock for. I intended to read through the rest of
documentation properly this morning to determine if this was indeed the
right approach.

I used the name _protected instead of _nocheck because the dereference
is still protected (by the tree lock) just not by RCU. Again, have to
check the documentation to ensure this is correct.

> +/**
>   * radix_tree_deref_retry      - check radix_tree_deref_slot
>   * @arg:       pointer returned by radix_tree_deref_slot
>   * Returns:    0 if retry is not required, otherwise retry is required
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 2eb2243..5be2841 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct
> address_space *mappin
> 
>         expected_count = 2 + page_has_private(page);
>         if (page_count(page) != expected_count ||
> -                       (struct page *)radix_tree_deref_slot(pslot) != page) {
> +                       (struct page *)radix_tree_deref_slot_nocheck(pslot)
> +                               != page) {
>                 spin_unlock_irq(&mapping->tree_lock);
>                 return -EAGAIN;
>         }

We only differed here by my passing in the &mapping->tree_lock



> @@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct
> address_space *mapping,
> 
>         expected_count = 2 + page_has_private(page);
>         if (page_count(page) != expected_count ||
> -           (struct page *)radix_tree_deref_slot(pslot) != page) {
> +           (struct page *)radix_tree_deref_slot_nocheck(pslot)
> +                               != page) {
>                 spin_unlock_irq(&mapping->tree_lock);
>                 return -EAGAIN;
>         }

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
@ 2010-12-17  8:39                       ` Mel Gorman
  0 siblings, 0 replies; 37+ messages in thread
From: Mel Gorman @ 2010-12-17  8:39 UTC (permalink / raw)
  To: Minchan Kim
  Cc: gerald.schaefer, KAMEZAWA Hiroyuki, paulmck, Milton Miller,
	linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu,
	Andrew Morton, Heiko Carstens, Martin Schwidefsky

On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
> <gerald.schaefer@de.ibm.com> wrote:
> > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> > below. Both cases are easily reproducible: memory unplug with big page cache,
> > or adding large pages during run-time.
> >
> > ===================================================
> > [ INFO: suspicious rcu_dereference_check() usage. ]
> > ---------------------------------------------------
> > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> >
> > other info that might help us debug this:
> >
> >
> > rcu_scheduler_active = 1, debug_locks = 0
> > 1 lock held by bash/761:
> >  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
> >
> > stack backtrace:
> > CPU: 1 Not tainted 2.6.37-rc6 #4
> > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
> >       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
> >       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
> >       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
> >       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> > Call Trace:
> > ([<0000000000100b02>] show_trace+0xee/0x144)
> >  [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
> >  [<0000000000226c80>] migrate_page+0x38/0x68
> >  [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
> >  [<000000000022785a>] migrate_pages+0x496/0x568
> >  [<000000000021e24e>] compact_zone+0x432/0x7d8
> >  [<000000000021e772>] compact_zone_order+0x9e/0xbc
> >  [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
> >  [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
> >  [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
> >  [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
> >  [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
> >  [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
> >  [<00000000002a65be>] proc_sys_write+0x26/0x34
> >  [<00000000002336e0>] vfs_write+0xac/0x18c
> >  [<00000000002338bc>] SyS_write+0x58/0xa8
> >  [<0000000000113976>] sysc_noemu+0x16/0x1c
> >  [<0000020000162edc>] 0x20000162edc
> > INFO: lockdep is turned off.
> >
> > I honestly do not understand 100% why this is a false positive, seeing that
> > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> > the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> > properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> > even if it is not necessary for synchronization, would get rid of the warning,
> > like in the following patch. Any ideas?
> 
> In case of anon page, we hold rcu_read_lock in unmap_and_move.
> The problem is file-backed page. In case of that, we hold lock_page
> and mapping->tree_lock as update-side lock.
> So we don't need rcu_read_lock.
> 
> >
> > ---
> >  fs/hugetlbfs/inode.c |    2 ++
> >  mm/migrate.c         |    4 ++++
> >  2 files changed, 6 insertions(+)
> >
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
> >  {
> >        int rc;
> >
> > +       rcu_read_lock();
> >        rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > +       rcu_read_unlock();
> >        if (rc)
> >                return rc;
> >        migrate_page_copy(newpage, page);
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
> >
> >        BUG_ON(PageWriteback(page));    /* Writeback must be complete */
> >
> > +       rcu_read_lock();
> >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > +       rcu_read_unlock();
> >
> >        if (rc)
> >                return rc;
> > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
> >
> >        head = page_buffers(page);
> >
> > +       rcu_read_lock();
> >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > +       rcu_read_unlock();
> >
> >        if (rc)
> >                return rc;
> >
> >
> >
> 
> 
> How about this?
> Maybe Paul have better idea.
> (It's apparently be word-wrapped.)
> 

heh, I wrote a patch almost identical to this and ran it overnight for testing
(test was a memory consumer running while a parallel process grew and shrunk
the hugepage pool). It passes but that is hardly a surprise. We differed
slightly in a number of respects though.

> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index ab2baa5..135af1e 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
>  }
> 
>  /**
> + * radix_tree_deref_slot_nocheck       - dereference a slot without RCU check
> + * @pslot:     pointer to slot, returned by radix_tree_lookup_slot
> + * Returns:    item that was stored in that slot with any direct pointer flag
> + *             removed.
> + *
> + * This functions works like radix_tree_deref_slot except it doesn't check
> + * RCU rule. Normally this funcion is used with update-side lock.
> + * You should use this function very carefully.
> + */
> +static inline void *radix_tree_deref_slot_nocheck(void **pslot)
> +{
> +       return rcu_dereference_protected(*pslot, 1);
> +}

For this, I had

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index ab2baa5..252d21c 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -146,6 +146,25 @@ static inline void *radix_tree_deref_slot(void **pslot)
 }
 
 /**
+ * radix_tree_deref_slot_protected	- dereference a slot without RCU lock but with tree lock held
+ * @pslot:	pointer to slot, returned by radix_tree_lookup_slot
+ * Returns:	item that was stored in that slot with any direct pointer flag
+ *		removed.
+ *
+ * For use with radix_tree_lookup_slot().  Caller must hold tree read
+ * locked across slot lookup and dereference. Not required if write lock is
+ * held (ie. items cannot be concurrently inserted).
+ *
+ * radix_tree_deref_retry must be used to confirm validity of the pointer if
+ * only the read lock is held.
+ */
+static inline void *radix_tree_deref_slot_protected(void **pslot,
+							spinlock_t *treelock)
+{
+	return rcu_dereference_protected(*pslot, lockdep_is_held(treelock));
+}
+
+/**
  * radix_tree_deref_retry	- check radix_tree_deref_slot
  * @arg:	pointer returned by radix_tree_deref_slot
  * Returns:	0 if retry is not required, otherwise retry is required

In the documentation, I noted that the check might be without RCU but with
the knowledge that it's protected by the tree lock. I'm not a RCU expert
but this is only safe when you know there isn't a parallel updater and the
treelock should be preventing that, right?

Even so, other users of rcu_dereference_protected() check a lock condition
which I used tree lock for. I intended to read through the rest of
documentation properly this morning to determine if this was indeed the
right approach.

I used the name _protected instead of _nocheck because the dereference
is still protected (by the tree lock) just not by RCU. Again, have to
check the documentation to ensure this is correct.

> +/**
>   * radix_tree_deref_retry      - check radix_tree_deref_slot
>   * @arg:       pointer returned by radix_tree_deref_slot
>   * Returns:    0 if retry is not required, otherwise retry is required
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 2eb2243..5be2841 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct
> address_space *mappin
> 
>         expected_count = 2 + page_has_private(page);
>         if (page_count(page) != expected_count ||
> -                       (struct page *)radix_tree_deref_slot(pslot) != page) {
> +                       (struct page *)radix_tree_deref_slot_nocheck(pslot)
> +                               != page) {
>                 spin_unlock_irq(&mapping->tree_lock);
>                 return -EAGAIN;
>         }

We only differed here by my passing in the &mapping->tree_lock



> @@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct
> address_space *mapping,
> 
>         expected_count = 2 + page_has_private(page);
>         if (page_count(page) != expected_count ||
> -           (struct page *)radix_tree_deref_slot(pslot) != page) {
> +           (struct page *)radix_tree_deref_slot_nocheck(pslot)
> +                               != page) {
>                 spin_unlock_irq(&mapping->tree_lock);
>                 return -EAGAIN;
>         }

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
  2010-12-17  8:39                       ` Mel Gorman
  (?)
@ 2010-12-17  9:28                       ` Mel Gorman
  2010-12-17 15:22                         ` Minchan Kim
  -1 siblings, 1 reply; 37+ messages in thread
From: Mel Gorman @ 2010-12-17  9:28 UTC (permalink / raw)
  To: Minchan Kim
  Cc: gerald.schaefer, KAMEZAWA Hiroyuki, paulmck, Milton Miller,
	linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu,
	Andrew Morton, Heiko Carstens, Martin Schwidefsky

On Fri, Dec 17, 2010 at 08:39:12AM +0000, Mel Gorman wrote:
> On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
> > <gerald.schaefer@de.ibm.com> wrote:
> > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> > > below. Both cases are easily reproducible: memory unplug with big page cache,
> > > or adding large pages during run-time.
> > >
> > > ===================================================
> > > [ INFO: suspicious rcu_dereference_check() usage. ]
> > > ---------------------------------------------------
> > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> > >
> > > other info that might help us debug this:
> > >
> > >
> > > rcu_scheduler_active = 1, debug_locks = 0
> > > 1 lock held by bash/761:
> > >  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
> > >
> > > stack backtrace:
> > > CPU: 1 Not tainted 2.6.37-rc6 #4
> > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
> > >       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
> > >       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
> > >       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
> > >       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> > > Call Trace:
> > > ([<0000000000100b02>] show_trace+0xee/0x144)
> > >  [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
> > >  [<0000000000226c80>] migrate_page+0x38/0x68
> > >  [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
> > >  [<000000000022785a>] migrate_pages+0x496/0x568
> > >  [<000000000021e24e>] compact_zone+0x432/0x7d8
> > >  [<000000000021e772>] compact_zone_order+0x9e/0xbc
> > >  [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
> > >  [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
> > >  [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
> > >  [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
> > >  [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
> > >  [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
> > >  [<00000000002a65be>] proc_sys_write+0x26/0x34
> > >  [<00000000002336e0>] vfs_write+0xac/0x18c
> > >  [<00000000002338bc>] SyS_write+0x58/0xa8
> > >  [<0000000000113976>] sysc_noemu+0x16/0x1c
> > >  [<0000020000162edc>] 0x20000162edc
> > > INFO: lockdep is turned off.
> > >
> > > I honestly do not understand 100% why this is a false positive, seeing that
> > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> > > even if it is not necessary for synchronization, would get rid of the warning,
> > > like in the following patch. Any ideas?
> > 
> > In case of anon page, we hold rcu_read_lock in unmap_and_move.
> > The problem is file-backed page. In case of that, we hold lock_page
> > and mapping->tree_lock as update-side lock.
> > So we don't need rcu_read_lock.
> > 
> > >
> > > ---
> > >  fs/hugetlbfs/inode.c |    2 ++
> > >  mm/migrate.c         |    4 ++++
> > >  2 files changed, 6 insertions(+)
> > >
> > > --- a/fs/hugetlbfs/inode.c
> > > +++ b/fs/hugetlbfs/inode.c
> > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
> > >  {
> > >        int rc;
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >        if (rc)
> > >                return rc;
> > >        migrate_page_copy(newpage, page);
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
> > >
> > >        BUG_ON(PageWriteback(page));    /* Writeback must be complete */
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >
> > >        if (rc)
> > >                return rc;
> > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
> > >
> > >        head = page_buffers(page);
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >
> > >        if (rc)
> > >                return rc;
> > >
> > >
> > >
> > 
> > 
> > How about this?
> > Maybe Paul have better idea.
> > (It's apparently be word-wrapped.)
> > 
> 
> heh, I wrote a patch almost identical to this and ran it overnight for testing
> (test was a memory consumer running while a parallel process grew and shrunk
> the hugepage pool). It passes but that is hardly a surprise. We differed
> slightly in a number of respects though.
> 

For completeness, this is what I tested last night. There are two "confirms"
in the changelog that I intended to work out today but maybe someone can
confirm faster.

==== CUT HERE ====
mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration

migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous
pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d.
The point of the RCU protection there is part of getting a stable reference
to anon_vma and is only held for anon pages as file pages are locked
which is sufficient protection against freeing.

However, while a file page's mapping is being migrated, the radix tree
is double checked to ensure it is the expected page. This uses
radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held
triggering the following warning.

[  173.674290] ===================================================
[  173.676016] [ INFO: suspicious rcu_dereference_check() usage. ]
[  173.676016] ---------------------------------------------------
[  173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
[  173.676016]
[  173.676016] other info that might help us debug this:
[  173.676016]
[  173.676016]
[  173.676016] rcu_scheduler_active = 1, debug_locks = 0
[  173.676016] 1 lock held by hugeadm/2899:
[  173.676016]  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab
[  173.676016]
[  173.676016] stack backtrace:
[  173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild
[  173.676016] Call Trace:
[  173.676016]  [<c128cc01>] ? printk+0x14/0x1b
[  173.676016]  [<c1063502>] lockdep_rcu_dereference+0x7d/0x86
[  173.676016]  [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab
[  173.676016]  [<c10e41ad>] migrate_page+0x23/0x39
[  173.676016]  [<c10e491b>] buffer_migrate_page+0x22/0x107
[  173.676016]  [<c10e48f9>] ? buffer_migrate_page+0x0/0x107
[  173.676016]  [<c10e425d>] move_to_new_page+0x9a/0x1ae
[  173.676016]  [<c10e47e6>] migrate_pages+0x1e7/0x2fa

This patch introduces radix_tree_deref_slot_protected() which calls
rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock
that is protecting this dereference. Based on the locking hierarchy described
in mm/filemap.c, holding the tree lock is protecting the radix tree from
concurrent updaters in all cases (Confirm that no case has been missed).
According to Documentation/RCU/lockdep.txt, if there is a guarantee that
no parallel updaters exist, use of rcu_dereference_protected() is allowed
(Confirm this is accurate?).

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/radix-tree.h |   19 +++++++++++++++++++
 mm/migrate.c               |    4 ++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index ab2baa5..252d21c 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -146,6 +146,25 @@ static inline void *radix_tree_deref_slot(void **pslot)
 }
 
 /**
+ * radix_tree_deref_slot_protected	- dereference a slot without RCU lock but with tree lock held
+ * @pslot:	pointer to slot, returned by radix_tree_lookup_slot
+ * Returns:	item that was stored in that slot with any direct pointer flag
+ *		removed.
+ *
+ * For use with radix_tree_lookup_slot().  Caller must hold tree read
+ * locked across slot lookup and dereference. Not required if write lock is
+ * held (ie. items cannot be concurrently inserted).
+ *
+ * radix_tree_deref_retry must be used to confirm validity of the pointer if
+ * only the read lock is held.
+ */
+static inline void *radix_tree_deref_slot_protected(void **pslot,
+							spinlock_t *treelock)
+{
+	return rcu_dereference_protected(*pslot, lockdep_is_held(treelock));
+}
+
+/**
  * radix_tree_deref_retry	- check radix_tree_deref_slot
  * @arg:	pointer returned by radix_tree_deref_slot
  * Returns:	0 if retry is not required, otherwise retry is required
diff --git a/mm/migrate.c b/mm/migrate.c
index fe5a3c6..7d4686a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -244,7 +244,7 @@ static int migrate_page_move_mapping(struct address_space *mapping,
 
 	expected_count = 2 + page_has_private(page);
 	if (page_count(page) != expected_count ||
-			(struct page *)radix_tree_deref_slot(pslot) != page) {
+			(struct page *)radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) {
 		spin_unlock_irq(&mapping->tree_lock);
 		return -EAGAIN;
 	}
@@ -316,7 +316,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
 
 	expected_count = 2 + page_has_private(page);
 	if (page_count(page) != expected_count ||
-	    (struct page *)radix_tree_deref_slot(pslot) != page) {
+	    (struct page *)radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) {
 		spin_unlock_irq(&mapping->tree_lock);
 		return -EAGAIN;
 	}

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
  2010-12-17  5:47                       ` Paul E. McKenney
  (?)
  (?)
@ 2010-12-17 15:08                       ` Minchan Kim
  2010-12-17 16:03                           ` Paul E. McKenney
  -1 siblings, 1 reply; 37+ messages in thread
From: Minchan Kim @ 2010-12-17 15:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel,
	linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman,
	Andrew Morton, Heiko Carstens, Martin Schwidefsky

On Thu, Dec 16, 2010 at 09:47:22PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
> > <gerald.schaefer@de.ibm.com> wrote:
> > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> > > below. Both cases are easily reproducible: memory unplug with big page cache,
> > > or adding large pages during run-time.
> > >
> > > ===================================================
> > > [ INFO: suspicious rcu_dereference_check() usage. ]
> > > ---------------------------------------------------
> > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> > >
> > > other info that might help us debug this:
> > >
> > >
> > > rcu_scheduler_active = 1, debug_locks = 0
> > > 1 lock held by bash/761:
> > >  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
> > >
> > > stack backtrace:
> > > CPU: 1 Not tainted 2.6.37-rc6 #4
> > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
> > >       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
> > >       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
> > >       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
> > >       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> > > Call Trace:
> > > ([<0000000000100b02>] show_trace+0xee/0x144)
> > >  [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
> > >  [<0000000000226c80>] migrate_page+0x38/0x68
> > >  [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
> > >  [<000000000022785a>] migrate_pages+0x496/0x568
> > >  [<000000000021e24e>] compact_zone+0x432/0x7d8
> > >  [<000000000021e772>] compact_zone_order+0x9e/0xbc
> > >  [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
> > >  [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
> > >  [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
> > >  [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
> > >  [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
> > >  [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
> > >  [<00000000002a65be>] proc_sys_write+0x26/0x34
> > >  [<00000000002336e0>] vfs_write+0xac/0x18c
> > >  [<00000000002338bc>] SyS_write+0x58/0xa8
> > >  [<0000000000113976>] sysc_noemu+0x16/0x1c
> > >  [<0000020000162edc>] 0x20000162edc
> > > INFO: lockdep is turned off.
> > >
> > > I honestly do not understand 100% why this is a false positive, seeing that
> > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> > > even if it is not necessary for synchronization, would get rid of the warning,
> > > like in the following patch. Any ideas?
> > 
> > In case of anon page, we hold rcu_read_lock in unmap_and_move.
> > The problem is file-backed page. In case of that, we hold lock_page
> > and mapping->tree_lock as update-side lock.
> > So we don't need rcu_read_lock.
> > 
> > >
> > > ---
> > >  fs/hugetlbfs/inode.c |    2 ++
> > >  mm/migrate.c         |    4 ++++
> > >  2 files changed, 6 insertions(+)
> > >
> > > --- a/fs/hugetlbfs/inode.c
> > > +++ b/fs/hugetlbfs/inode.c
> > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
> > >  {
> > >        int rc;
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >        if (rc)
> > >                return rc;
> > >        migrate_page_copy(newpage, page);
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
> > >
> > >        BUG_ON(PageWriteback(page));    /* Writeback must be complete */
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >
> > >        if (rc)
> > >                return rc;
> > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
> > >
> > >        head = page_buffers(page);
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >
> > >        if (rc)
> > >                return rc;
> > >
> > >
> > >
> > 
> > 
> > How about this?
> > Maybe Paul have better idea.
> > (It's apparently be word-wrapped.)
> > 
> > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> > index ab2baa5..135af1e 100644
> > --- a/include/linux/radix-tree.h
> > +++ b/include/linux/radix-tree.h
> > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
> >  }
> > 
> >  /**
> > + * radix_tree_deref_slot_nocheck       - dereference a slot without RCU check
> > + * @pslot:     pointer to slot, returned by radix_tree_lookup_slot
> > + * Returns:    item that was stored in that slot with any direct pointer flag
> > + *             removed.
> > + *
> > + * This functions works like radix_tree_deref_slot except it doesn't check
> > + * RCU rule. Normally this funcion is used with update-side lock.
> > + * You should use this function very carefully.
> > + */
> > +static inline void *radix_tree_deref_slot_nocheck(void **pslot)
> > +{
> > +       return rcu_dereference_protected(*pslot, 1);
> 
> I suggest replacing the "1" with lockdep expressions for the locks
> that you say might be held:

It's not hard.

> 
> 	return rcu_dereference_check(*pslot,
> 				     lockdep_is_held(&mapping->tree_lock));

rcu_dereference_check still pass rcu_read_lock_held check we don't want.
I think rcu_dereference_protected is proper.

Why I don't add lockdep expressions is radix_tree_deref_slot is general API.
It might be used anywhere where it doesn't related to mapping->tree_lock.
If we add argument 'mapping', it has a very strong dependency with address_space.
so I decided making the function general and then caller must use it very carefully.
But I am not strong in this point.

> 
> This assumes that when you said "and" you meant both lock_page() and
> mapping->tree_lock.  Also you need to pass in the mapping, which
> should not be a problem given likely inlining.
> 
> If you meant that either mapping->tree_lock or page_lock() might be
> held, I suppose that the page_lock() state could be passed in, but
> perhaps better to take a general lockdep expression.
> 
> So, either or both?  ;-)
> 
> 						Thanx, Paul

I think either is okay. That's because remove_from_page_cache/__remove_from_page_cache
needs both locks so we can't prevent update if we get a either lock.
In code context, I think mapping->tree_lock is more readable since it is used near by.

Thanks for the comment, Paul.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
  2010-12-17  8:39                       ` Mel Gorman
@ 2010-12-17 15:13                         ` Minchan Kim
  -1 siblings, 0 replies; 37+ messages in thread
From: Minchan Kim @ 2010-12-17 15:13 UTC (permalink / raw)
  To: Mel Gorman
  Cc: gerald.schaefer, KAMEZAWA Hiroyuki, paulmck, Milton Miller,
	linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu,
	Andrew Morton, Heiko Carstens, Martin Schwidefsky

On Fri, Dec 17, 2010 at 08:39:12AM +0000, Mel Gorman wrote:
> On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
> > <gerald.schaefer@de.ibm.com> wrote:
> > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> > > below. Both cases are easily reproducible: memory unplug with big page cache,
> > > or adding large pages during run-time.
> > >
> > > ===================================================
> > > [ INFO: suspicious rcu_dereference_check() usage. ]
> > > ---------------------------------------------------
> > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> > >
> > > other info that might help us debug this:
> > >
> > >
> > > rcu_scheduler_active = 1, debug_locks = 0
> > > 1 lock held by bash/761:
> > >  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
> > >
> > > stack backtrace:
> > > CPU: 1 Not tainted 2.6.37-rc6 #4
> > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
> > >       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
> > >       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
> > >       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
> > >       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> > > Call Trace:
> > > ([<0000000000100b02>] show_trace+0xee/0x144)
> > >  [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
> > >  [<0000000000226c80>] migrate_page+0x38/0x68
> > >  [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
> > >  [<000000000022785a>] migrate_pages+0x496/0x568
> > >  [<000000000021e24e>] compact_zone+0x432/0x7d8
> > >  [<000000000021e772>] compact_zone_order+0x9e/0xbc
> > >  [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
> > >  [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
> > >  [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
> > >  [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
> > >  [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
> > >  [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
> > >  [<00000000002a65be>] proc_sys_write+0x26/0x34
> > >  [<00000000002336e0>] vfs_write+0xac/0x18c
> > >  [<00000000002338bc>] SyS_write+0x58/0xa8
> > >  [<0000000000113976>] sysc_noemu+0x16/0x1c
> > >  [<0000020000162edc>] 0x20000162edc
> > > INFO: lockdep is turned off.
> > >
> > > I honestly do not understand 100% why this is a false positive, seeing that
> > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> > > even if it is not necessary for synchronization, would get rid of the warning,
> > > like in the following patch. Any ideas?
> > 
> > In case of anon page, we hold rcu_read_lock in unmap_and_move.
> > The problem is file-backed page. In case of that, we hold lock_page
> > and mapping->tree_lock as update-side lock.
> > So we don't need rcu_read_lock.
> > 
> > >
> > > ---
> > >  fs/hugetlbfs/inode.c |    2 ++
> > >  mm/migrate.c         |    4 ++++
> > >  2 files changed, 6 insertions(+)
> > >
> > > --- a/fs/hugetlbfs/inode.c
> > > +++ b/fs/hugetlbfs/inode.c
> > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
> > >  {
> > >        int rc;
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >        if (rc)
> > >                return rc;
> > >        migrate_page_copy(newpage, page);
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
> > >
> > >        BUG_ON(PageWriteback(page));    /* Writeback must be complete */
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >
> > >        if (rc)
> > >                return rc;
> > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
> > >
> > >        head = page_buffers(page);
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >
> > >        if (rc)
> > >                return rc;
> > >
> > >
> > >
> > 
> > 
> > How about this?
> > Maybe Paul have better idea.
> > (It's apparently be word-wrapped.)
> > 
> 
> heh, I wrote a patch almost identical to this and ran it overnight for testing
> (test was a memory consumer running while a parallel process grew and shrunk
> the hugepage pool). It passes but that is hardly a surprise. We differed

Good. 

> slightly in a number of respects though.
> 
> > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> > index ab2baa5..135af1e 100644
> > --- a/include/linux/radix-tree.h
> > +++ b/include/linux/radix-tree.h
> > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
> >  }
> > 
> >  /**
> > + * radix_tree_deref_slot_nocheck       - dereference a slot without RCU check
> > + * @pslot:     pointer to slot, returned by radix_tree_lookup_slot
> > + * Returns:    item that was stored in that slot with any direct pointer flag
> > + *             removed.
> > + *
> > + * This functions works like radix_tree_deref_slot except it doesn't check
> > + * RCU rule. Normally this funcion is used with update-side lock.
> > + * You should use this function very carefully.
> > + */
> > +static inline void *radix_tree_deref_slot_nocheck(void **pslot)
> > +{
> > +       return rcu_dereference_protected(*pslot, 1);
> > +}
> 
> For this, I had
> 
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index ab2baa5..252d21c 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -146,6 +146,25 @@ static inline void *radix_tree_deref_slot(void **pslot)
>  }
>  
>  /**
> + * radix_tree_deref_slot_protected	- dereference a slot without RCU lock but with tree lock held
> + * @pslot:	pointer to slot, returned by radix_tree_lookup_slot
> + * Returns:	item that was stored in that slot with any direct pointer flag
> + *		removed.
> + *
> + * For use with radix_tree_lookup_slot().  Caller must hold tree read
> + * locked across slot lookup and dereference. Not required if write lock is
> + * held (ie. items cannot be concurrently inserted).
> + *
> + * radix_tree_deref_retry must be used to confirm validity of the pointer if
> + * only the read lock is held.
> + */
> +static inline void *radix_tree_deref_slot_protected(void **pslot,
> +							spinlock_t *treelock)
> +{
> +	return rcu_dereference_protected(*pslot, lockdep_is_held(treelock));
> +}
> +
> +/**
>   * radix_tree_deref_retry	- check radix_tree_deref_slot
>   * @arg:	pointer returned by radix_tree_deref_slot
>   * Returns:	0 if retry is not required, otherwise retry is required
> 
> In the documentation, I noted that the check might be without RCU but with
> the knowledge that it's protected by the tree lock. I'm not a RCU expert
> but this is only safe when you know there isn't a parallel updater and the
> treelock should be preventing that, right?

I think so.

> 
> Even so, other users of rcu_dereference_protected() check a lock condition
> which I used tree lock for. I intended to read through the rest of
> documentation properly this morning to determine if this was indeed the
> right approach.
> 
> I used the name _protected instead of _nocheck because the dereference
> is still protected (by the tree lock) just not by RCU. Again, have to
> check the documentation to ensure this is correct.

I agree _proected naming is good.

> 
> > +/**
> >   * radix_tree_deref_retry      - check radix_tree_deref_slot
> >   * @arg:       pointer returned by radix_tree_deref_slot
> >   * Returns:    0 if retry is not required, otherwise retry is required
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 2eb2243..5be2841 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct
> > address_space *mappin
> > 
> >         expected_count = 2 + page_has_private(page);
> >         if (page_count(page) != expected_count ||
> > -                       (struct page *)radix_tree_deref_slot(pslot) != page) {
> > +                       (struct page *)radix_tree_deref_slot_nocheck(pslot)
> > +                               != page) {
> >                 spin_unlock_irq(&mapping->tree_lock);
> >                 return -EAGAIN;
> >         }
> 
> We only differed here by my passing in the &mapping->tree_lock

I will add the comment in your formal patch.

> 
> 
> 
> > @@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct
> > address_space *mapping,
> > 
> >         expected_count = 2 + page_has_private(page);
> >         if (page_count(page) != expected_count ||
> > -           (struct page *)radix_tree_deref_slot(pslot) != page) {
> > +           (struct page *)radix_tree_deref_slot_nocheck(pslot)
> > +                               != page) {
> >                 spin_unlock_irq(&mapping->tree_lock);
> >                 return -EAGAIN;
> >         }
> 
> -- 
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
@ 2010-12-17 15:13                         ` Minchan Kim
  0 siblings, 0 replies; 37+ messages in thread
From: Minchan Kim @ 2010-12-17 15:13 UTC (permalink / raw)
  To: Mel Gorman
  Cc: gerald.schaefer, KAMEZAWA Hiroyuki, paulmck, Milton Miller,
	linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu,
	Andrew Morton, Heiko Carstens, Martin Schwidefsky

On Fri, Dec 17, 2010 at 08:39:12AM +0000, Mel Gorman wrote:
> On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
> > <gerald.schaefer@de.ibm.com> wrote:
> > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> > > below. Both cases are easily reproducible: memory unplug with big page cache,
> > > or adding large pages during run-time.
> > >
> > > ===================================================
> > > [ INFO: suspicious rcu_dereference_check() usage. ]
> > > ---------------------------------------------------
> > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> > >
> > > other info that might help us debug this:
> > >
> > >
> > > rcu_scheduler_active = 1, debug_locks = 0
> > > 1 lock held by bash/761:
> > >  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
> > >
> > > stack backtrace:
> > > CPU: 1 Not tainted 2.6.37-rc6 #4
> > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
> > >       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
> > >       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
> > >       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
> > >       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> > > Call Trace:
> > > ([<0000000000100b02>] show_trace+0xee/0x144)
> > >  [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
> > >  [<0000000000226c80>] migrate_page+0x38/0x68
> > >  [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
> > >  [<000000000022785a>] migrate_pages+0x496/0x568
> > >  [<000000000021e24e>] compact_zone+0x432/0x7d8
> > >  [<000000000021e772>] compact_zone_order+0x9e/0xbc
> > >  [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
> > >  [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
> > >  [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
> > >  [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
> > >  [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
> > >  [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
> > >  [<00000000002a65be>] proc_sys_write+0x26/0x34
> > >  [<00000000002336e0>] vfs_write+0xac/0x18c
> > >  [<00000000002338bc>] SyS_write+0x58/0xa8
> > >  [<0000000000113976>] sysc_noemu+0x16/0x1c
> > >  [<0000020000162edc>] 0x20000162edc
> > > INFO: lockdep is turned off.
> > >
> > > I honestly do not understand 100% why this is a false positive, seeing that
> > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> > > even if it is not necessary for synchronization, would get rid of the warning,
> > > like in the following patch. Any ideas?
> > 
> > In case of anon page, we hold rcu_read_lock in unmap_and_move.
> > The problem is file-backed page. In case of that, we hold lock_page
> > and mapping->tree_lock as update-side lock.
> > So we don't need rcu_read_lock.
> > 
> > >
> > > ---
> > >  fs/hugetlbfs/inode.c |    2 ++
> > >  mm/migrate.c         |    4 ++++
> > >  2 files changed, 6 insertions(+)
> > >
> > > --- a/fs/hugetlbfs/inode.c
> > > +++ b/fs/hugetlbfs/inode.c
> > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
> > >  {
> > >        int rc;
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >        if (rc)
> > >                return rc;
> > >        migrate_page_copy(newpage, page);
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
> > >
> > >        BUG_ON(PageWriteback(page));    /* Writeback must be complete */
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >
> > >        if (rc)
> > >                return rc;
> > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
> > >
> > >        head = page_buffers(page);
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >
> > >        if (rc)
> > >                return rc;
> > >
> > >
> > >
> > 
> > 
> > How about this?
> > Maybe Paul have better idea.
> > (It's apparently be word-wrapped.)
> > 
> 
> heh, I wrote a patch almost identical to this and ran it overnight for testing
> (test was a memory consumer running while a parallel process grew and shrunk
> the hugepage pool). It passes but that is hardly a surprise. We differed

Good. 

> slightly in a number of respects though.
> 
> > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> > index ab2baa5..135af1e 100644
> > --- a/include/linux/radix-tree.h
> > +++ b/include/linux/radix-tree.h
> > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
> >  }
> > 
> >  /**
> > + * radix_tree_deref_slot_nocheck       - dereference a slot without RCU check
> > + * @pslot:     pointer to slot, returned by radix_tree_lookup_slot
> > + * Returns:    item that was stored in that slot with any direct pointer flag
> > + *             removed.
> > + *
> > + * This functions works like radix_tree_deref_slot except it doesn't check
> > + * RCU rule. Normally this funcion is used with update-side lock.
> > + * You should use this function very carefully.
> > + */
> > +static inline void *radix_tree_deref_slot_nocheck(void **pslot)
> > +{
> > +       return rcu_dereference_protected(*pslot, 1);
> > +}
> 
> For this, I had
> 
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index ab2baa5..252d21c 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -146,6 +146,25 @@ static inline void *radix_tree_deref_slot(void **pslot)
>  }
>  
>  /**
> + * radix_tree_deref_slot_protected	- dereference a slot without RCU lock but with tree lock held
> + * @pslot:	pointer to slot, returned by radix_tree_lookup_slot
> + * Returns:	item that was stored in that slot with any direct pointer flag
> + *		removed.
> + *
> + * For use with radix_tree_lookup_slot().  Caller must hold tree read
> + * locked across slot lookup and dereference. Not required if write lock is
> + * held (ie. items cannot be concurrently inserted).
> + *
> + * radix_tree_deref_retry must be used to confirm validity of the pointer if
> + * only the read lock is held.
> + */
> +static inline void *radix_tree_deref_slot_protected(void **pslot,
> +							spinlock_t *treelock)
> +{
> +	return rcu_dereference_protected(*pslot, lockdep_is_held(treelock));
> +}
> +
> +/**
>   * radix_tree_deref_retry	- check radix_tree_deref_slot
>   * @arg:	pointer returned by radix_tree_deref_slot
>   * Returns:	0 if retry is not required, otherwise retry is required
> 
> In the documentation, I noted that the check might be without RCU but with
> the knowledge that it's protected by the tree lock. I'm not a RCU expert
> but this is only safe when you know there isn't a parallel updater and the
> treelock should be preventing that, right?

I think so.

> 
> Even so, other users of rcu_dereference_protected() check a lock condition
> which I used tree lock for. I intended to read through the rest of
> documentation properly this morning to determine if this was indeed the
> right approach.
> 
> I used the name _protected instead of _nocheck because the dereference
> is still protected (by the tree lock) just not by RCU. Again, have to
> check the documentation to ensure this is correct.

I agree _proected naming is good.

> 
> > +/**
> >   * radix_tree_deref_retry      - check radix_tree_deref_slot
> >   * @arg:       pointer returned by radix_tree_deref_slot
> >   * Returns:    0 if retry is not required, otherwise retry is required
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 2eb2243..5be2841 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct
> > address_space *mappin
> > 
> >         expected_count = 2 + page_has_private(page);
> >         if (page_count(page) != expected_count ||
> > -                       (struct page *)radix_tree_deref_slot(pslot) != page) {
> > +                       (struct page *)radix_tree_deref_slot_nocheck(pslot)
> > +                               != page) {
> >                 spin_unlock_irq(&mapping->tree_lock);
> >                 return -EAGAIN;
> >         }
> 
> We only differed here by my passing in the &mapping->tree_lock

I will add the comment in your formal patch.

> 
> 
> 
> > @@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct
> > address_space *mapping,
> > 
> >         expected_count = 2 + page_has_private(page);
> >         if (page_count(page) != expected_count ||
> > -           (struct page *)radix_tree_deref_slot(pslot) != page) {
> > +           (struct page *)radix_tree_deref_slot_nocheck(pslot)
> > +                               != page) {
> >                 spin_unlock_irq(&mapping->tree_lock);
> >                 return -EAGAIN;
> >         }
> 
> -- 
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
  2010-12-17  9:28                       ` Mel Gorman
@ 2010-12-17 15:22                         ` Minchan Kim
  0 siblings, 0 replies; 37+ messages in thread
From: Minchan Kim @ 2010-12-17 15:22 UTC (permalink / raw)
  To: Mel Gorman
  Cc: gerald.schaefer, KAMEZAWA Hiroyuki, paulmck, Milton Miller,
	linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu,
	Andrew Morton, Heiko Carstens, Martin Schwidefsky

On Fri, Dec 17, 2010 at 09:28:28AM +0000, Mel Gorman wrote:
> On Fri, Dec 17, 2010 at 08:39:12AM +0000, Mel Gorman wrote:
> > On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> > > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
> > > <gerald.schaefer@de.ibm.com> wrote:
> > > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> > > > below. Both cases are easily reproducible: memory unplug with big page cache,
> > > > or adding large pages during run-time.
> > > >
> > > > ===================================================
> > > > [ INFO: suspicious rcu_dereference_check() usage. ]
> > > > ---------------------------------------------------
> > > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> > > >
> > > > other info that might help us debug this:
> > > >
> > > >
> > > > rcu_scheduler_active = 1, debug_locks = 0
> > > > 1 lock held by bash/761:
> > > >  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
> > > >
> > > > stack backtrace:
> > > > CPU: 1 Not tainted 2.6.37-rc6 #4
> > > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> > > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
> > > >       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
> > > >       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
> > > >       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
> > > >       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> > > > Call Trace:
> > > > ([<0000000000100b02>] show_trace+0xee/0x144)
> > > >  [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
> > > >  [<0000000000226c80>] migrate_page+0x38/0x68
> > > >  [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
> > > >  [<000000000022785a>] migrate_pages+0x496/0x568
> > > >  [<000000000021e24e>] compact_zone+0x432/0x7d8
> > > >  [<000000000021e772>] compact_zone_order+0x9e/0xbc
> > > >  [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
> > > >  [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
> > > >  [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
> > > >  [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
> > > >  [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
> > > >  [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
> > > >  [<00000000002a65be>] proc_sys_write+0x26/0x34
> > > >  [<00000000002336e0>] vfs_write+0xac/0x18c
> > > >  [<00000000002338bc>] SyS_write+0x58/0xa8
> > > >  [<0000000000113976>] sysc_noemu+0x16/0x1c
> > > >  [<0000020000162edc>] 0x20000162edc
> > > > INFO: lockdep is turned off.
> > > >
> > > > I honestly do not understand 100% why this is a false positive, seeing that
> > > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> > > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> > > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> > > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> > > > even if it is not necessary for synchronization, would get rid of the warning,
> > > > like in the following patch. Any ideas?
> > > 
> > > In case of anon page, we hold rcu_read_lock in unmap_and_move.
> > > The problem is file-backed page. In case of that, we hold lock_page
> > > and mapping->tree_lock as update-side lock.
> > > So we don't need rcu_read_lock.
> > > 
> > > >
> > > > ---
> > > >  fs/hugetlbfs/inode.c |    2 ++
> > > >  mm/migrate.c         |    4 ++++
> > > >  2 files changed, 6 insertions(+)
> > > >
> > > > --- a/fs/hugetlbfs/inode.c
> > > > +++ b/fs/hugetlbfs/inode.c
> > > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
> > > >  {
> > > >        int rc;
> > > >
> > > > +       rcu_read_lock();
> > > >        rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > > > +       rcu_read_unlock();
> > > >        if (rc)
> > > >                return rc;
> > > >        migrate_page_copy(newpage, page);
> > > > --- a/mm/migrate.c
> > > > +++ b/mm/migrate.c
> > > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
> > > >
> > > >        BUG_ON(PageWriteback(page));    /* Writeback must be complete */
> > > >
> > > > +       rcu_read_lock();
> > > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > > +       rcu_read_unlock();
> > > >
> > > >        if (rc)
> > > >                return rc;
> > > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
> > > >
> > > >        head = page_buffers(page);
> > > >
> > > > +       rcu_read_lock();
> > > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > > +       rcu_read_unlock();
> > > >
> > > >        if (rc)
> > > >                return rc;
> > > >
> > > >
> > > >
> > > 
> > > 
> > > How about this?
> > > Maybe Paul have better idea.
> > > (It's apparently be word-wrapped.)
> > > 
> > 
> > heh, I wrote a patch almost identical to this and ran it overnight for testing
> > (test was a memory consumer running while a parallel process grew and shrunk
> > the hugepage pool). It passes but that is hardly a surprise. We differed
> > slightly in a number of respects though.
> > 
> 
> For completeness, this is what I tested last night. There are two "confirms"
> in the changelog that I intended to work out today but maybe someone can
> confirm faster.
> 
> ==== CUT HERE ====
> mm: migration: Use rcu_dereference_protected when dereferencing the radix tree slot during file page migration
> 
> migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for anonymous
> pages, as introduced by git commit 989f89c57e6361e7d16fbd9572b5da7d313b073d.
> The point of the RCU protection there is part of getting a stable reference
> to anon_vma and is only held for anon pages as file pages are locked
> which is sufficient protection against freeing.
> 
> However, while a file page's mapping is being migrated, the radix tree
> is double checked to ensure it is the expected page. This uses
> radix_tree_deref_slot() -> rcu_dereference() without the RCU lock held
> triggering the following warning.
> 
> [  173.674290] ===================================================
> [  173.676016] [ INFO: suspicious rcu_dereference_check() usage. ]
> [  173.676016] ---------------------------------------------------
> [  173.676016] include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> [  173.676016]
> [  173.676016] other info that might help us debug this:
> [  173.676016]
> [  173.676016]
> [  173.676016] rcu_scheduler_active = 1, debug_locks = 0
> [  173.676016] 1 lock held by hugeadm/2899:
> [  173.676016]  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<c10e3d2b>] migrate_page_move_mapping+0x40/0x1ab
> [  173.676016]
> [  173.676016] stack backtrace:
> [  173.676016] Pid: 2899, comm: hugeadm Not tainted 2.6.37-rc5-autobuild
> [  173.676016] Call Trace:
> [  173.676016]  [<c128cc01>] ? printk+0x14/0x1b
> [  173.676016]  [<c1063502>] lockdep_rcu_dereference+0x7d/0x86
> [  173.676016]  [<c10e3db5>] migrate_page_move_mapping+0xca/0x1ab
> [  173.676016]  [<c10e41ad>] migrate_page+0x23/0x39
> [  173.676016]  [<c10e491b>] buffer_migrate_page+0x22/0x107
> [  173.676016]  [<c10e48f9>] ? buffer_migrate_page+0x0/0x107
> [  173.676016]  [<c10e425d>] move_to_new_page+0x9a/0x1ae
> [  173.676016]  [<c10e47e6>] migrate_pages+0x1e7/0x2fa
> 
> This patch introduces radix_tree_deref_slot_protected() which calls
> rcu_dereference_protected(). Users of it must pass in the mapping->tree_lock
> that is protecting this dereference. Based on the locking hierarchy described
> in mm/filemap.c, holding the tree lock is protecting the radix tree from
> concurrent updaters in all cases (Confirm that no case has been missed).
> According to Documentation/RCU/lockdep.txt, if there is a guarantee that
> no parallel updaters exist, use of rcu_dereference_protected() is allowed
> (Confirm this is accurate?).
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  include/linux/radix-tree.h |   19 +++++++++++++++++++
>  mm/migrate.c               |    4 ++--
>  2 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index ab2baa5..252d21c 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -146,6 +146,25 @@ static inline void *radix_tree_deref_slot(void **pslot)
>  }
>  
>  /**
> + * radix_tree_deref_slot_protected	- dereference a slot without RCU lock but with tree lock held
> + * @pslot:	pointer to slot, returned by radix_tree_lookup_slot
> + * Returns:	item that was stored in that slot with any direct pointer flag
> + *		removed.
> + *
> + * For use with radix_tree_lookup_slot().  Caller must hold tree read
> + * locked across slot lookup and dereference. Not required if write lock is
> + * held (ie. items cannot be concurrently inserted).
> + *
> + * radix_tree_deref_retry must be used to confirm validity of the pointer if
> + * only the read lock is held.
> + */
> +static inline void *radix_tree_deref_slot_protected(void **pslot,
> +							spinlock_t *treelock)
> +{
> +	return rcu_dereference_protected(*pslot, lockdep_is_held(treelock));
> +}

It seems to be good than mine. Just a nitpick.
Can't we get the mutex lock as update-side lock in future?

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
  2010-12-17  8:39                       ` Mel Gorman
@ 2010-12-17 16:01                         ` Paul E. McKenney
  -1 siblings, 0 replies; 37+ messages in thread
From: Paul E. McKenney @ 2010-12-17 16:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Minchan Kim, gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller,
	linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu,
	Andrew Morton, Heiko Carstens, Martin Schwidefsky

On Fri, Dec 17, 2010 at 08:39:12AM +0000, Mel Gorman wrote:
> On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
> > <gerald.schaefer@de.ibm.com> wrote:
> > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> > > below. Both cases are easily reproducible: memory unplug with big page cache,
> > > or adding large pages during run-time.
> > >
> > > ===================================================
> > > [ INFO: suspicious rcu_dereference_check() usage. ]
> > > ---------------------------------------------------
> > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> > >
> > > other info that might help us debug this:
> > >
> > >
> > > rcu_scheduler_active = 1, debug_locks = 0
> > > 1 lock held by bash/761:
> > >  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
> > >
> > > stack backtrace:
> > > CPU: 1 Not tainted 2.6.37-rc6 #4
> > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
> > >       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
> > >       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
> > >       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
> > >       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> > > Call Trace:
> > > ([<0000000000100b02>] show_trace+0xee/0x144)
> > >  [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
> > >  [<0000000000226c80>] migrate_page+0x38/0x68
> > >  [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
> > >  [<000000000022785a>] migrate_pages+0x496/0x568
> > >  [<000000000021e24e>] compact_zone+0x432/0x7d8
> > >  [<000000000021e772>] compact_zone_order+0x9e/0xbc
> > >  [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
> > >  [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
> > >  [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
> > >  [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
> > >  [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
> > >  [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
> > >  [<00000000002a65be>] proc_sys_write+0x26/0x34
> > >  [<00000000002336e0>] vfs_write+0xac/0x18c
> > >  [<00000000002338bc>] SyS_write+0x58/0xa8
> > >  [<0000000000113976>] sysc_noemu+0x16/0x1c
> > >  [<0000020000162edc>] 0x20000162edc
> > > INFO: lockdep is turned off.
> > >
> > > I honestly do not understand 100% why this is a false positive, seeing that
> > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> > > even if it is not necessary for synchronization, would get rid of the warning,
> > > like in the following patch. Any ideas?
> > 
> > In case of anon page, we hold rcu_read_lock in unmap_and_move.
> > The problem is file-backed page. In case of that, we hold lock_page
> > and mapping->tree_lock as update-side lock.
> > So we don't need rcu_read_lock.
> > 
> > >
> > > ---
> > >  fs/hugetlbfs/inode.c |    2 ++
> > >  mm/migrate.c         |    4 ++++
> > >  2 files changed, 6 insertions(+)
> > >
> > > --- a/fs/hugetlbfs/inode.c
> > > +++ b/fs/hugetlbfs/inode.c
> > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
> > >  {
> > >        int rc;
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >        if (rc)
> > >                return rc;
> > >        migrate_page_copy(newpage, page);
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
> > >
> > >        BUG_ON(PageWriteback(page));    /* Writeback must be complete */
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >
> > >        if (rc)
> > >                return rc;
> > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
> > >
> > >        head = page_buffers(page);
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >
> > >        if (rc)
> > >                return rc;
> > >
> > >
> > >
> > 
> > 
> > How about this?
> > Maybe Paul have better idea.
> > (It's apparently be word-wrapped.)
> > 
> 
> heh, I wrote a patch almost identical to this and ran it overnight for testing
> (test was a memory consumer running while a parallel process grew and shrunk
> the hugepage pool). It passes but that is hardly a surprise. We differed
> slightly in a number of respects though.
> 
> > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> > index ab2baa5..135af1e 100644
> > --- a/include/linux/radix-tree.h
> > +++ b/include/linux/radix-tree.h
> > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
> >  }
> > 
> >  /**
> > + * radix_tree_deref_slot_nocheck       - dereference a slot without RCU check
> > + * @pslot:     pointer to slot, returned by radix_tree_lookup_slot
> > + * Returns:    item that was stored in that slot with any direct pointer flag
> > + *             removed.
> > + *
> > + * This functions works like radix_tree_deref_slot except it doesn't check
> > + * RCU rule. Normally this funcion is used with update-side lock.
> > + * You should use this function very carefully.
> > + */
> > +static inline void *radix_tree_deref_slot_nocheck(void **pslot)
> > +{
> > +       return rcu_dereference_protected(*pslot, 1);
> > +}
> 
> For this, I had
> 
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index ab2baa5..252d21c 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -146,6 +146,25 @@ static inline void *radix_tree_deref_slot(void **pslot)
>  }
> 
>  /**
> + * radix_tree_deref_slot_protected	- dereference a slot without RCU lock but with tree lock held
> + * @pslot:	pointer to slot, returned by radix_tree_lookup_slot
> + * Returns:	item that was stored in that slot with any direct pointer flag
> + *		removed.
> + *
> + * For use with radix_tree_lookup_slot().  Caller must hold tree read
> + * locked across slot lookup and dereference. Not required if write lock is
> + * held (ie. items cannot be concurrently inserted).
> + *
> + * radix_tree_deref_retry must be used to confirm validity of the pointer if
> + * only the read lock is held.
> + */
> +static inline void *radix_tree_deref_slot_protected(void **pslot,
> +							spinlock_t *treelock)
> +{
> +	return rcu_dereference_protected(*pslot, lockdep_is_held(treelock));
> +}
> +
> +/**
>   * radix_tree_deref_retry	- check radix_tree_deref_slot
>   * @arg:	pointer returned by radix_tree_deref_slot
>   * Returns:	0 if retry is not required, otherwise retry is required
> 
> In the documentation, I noted that the check might be without RCU but with
> the knowledge that it's protected by the tree lock. I'm not a RCU expert
> but this is only safe when you know there isn't a parallel updater and the
> treelock should be preventing that, right?

Yes, if you have prevented updates from happening, then it is OK to
use rcu_dereference_protected() instead of rcu_dereference_check().

However, if RCU readers can invoke this code, things will break.

By the way, this means that something like:

	rcu_dereference_protected(p, rcu_read_lock_held()) /* BUGGY!!! */

is just plain wrong.

> Even so, other users of rcu_dereference_protected() check a lock condition
> which I used tree lock for. I intended to read through the rest of
> documentation properly this morning to determine if this was indeed the
> right approach.
> 
> I used the name _protected instead of _nocheck because the dereference
> is still protected (by the tree lock) just not by RCU. Again, have to
> check the documentation to ensure this is correct.

Yep, this is indeed the case that rcu_dereference_protected() is for.

> > +/**
> >   * radix_tree_deref_retry      - check radix_tree_deref_slot
> >   * @arg:       pointer returned by radix_tree_deref_slot
> >   * Returns:    0 if retry is not required, otherwise retry is required
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 2eb2243..5be2841 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct
> > address_space *mappin
> > 
> >         expected_count = 2 + page_has_private(page);
> >         if (page_count(page) != expected_count ||
> > -                       (struct page *)radix_tree_deref_slot(pslot) != page) {
> > +                       (struct page *)radix_tree_deref_slot_nocheck(pslot)
> > +                               != page) {
> >                 spin_unlock_irq(&mapping->tree_lock);
> >                 return -EAGAIN;
> >         }
> 
> We only differed here by my passing in the &mapping->tree_lock

Which should be optimized away during inlining, so no performance
penalty.  ;-)

							Thanx, Paul

> > @@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct
> > address_space *mapping,
> > 
> >         expected_count = 2 + page_has_private(page);
> >         if (page_count(page) != expected_count ||
> > -           (struct page *)radix_tree_deref_slot(pslot) != page) {
> > +           (struct page *)radix_tree_deref_slot_nocheck(pslot)
> > +                               != page) {
> >                 spin_unlock_irq(&mapping->tree_lock);
> >                 return -EAGAIN;
> >         }
> 
> -- 
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
@ 2010-12-17 16:01                         ` Paul E. McKenney
  0 siblings, 0 replies; 37+ messages in thread
From: Paul E. McKenney @ 2010-12-17 16:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Minchan Kim, gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller,
	linux-kernel, linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu,
	Andrew Morton, Heiko Carstens, Martin Schwidefsky

On Fri, Dec 17, 2010 at 08:39:12AM +0000, Mel Gorman wrote:
> On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
> > <gerald.schaefer@de.ibm.com> wrote:
> > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> > > below. Both cases are easily reproducible: memory unplug with big page cache,
> > > or adding large pages during run-time.
> > >
> > > ===================================================
> > > [ INFO: suspicious rcu_dereference_check() usage. ]
> > > ---------------------------------------------------
> > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> > >
> > > other info that might help us debug this:
> > >
> > >
> > > rcu_scheduler_active = 1, debug_locks = 0
> > > 1 lock held by bash/761:
> > >  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
> > >
> > > stack backtrace:
> > > CPU: 1 Not tainted 2.6.37-rc6 #4
> > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
> > >       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
> > >       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
> > >       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
> > >       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> > > Call Trace:
> > > ([<0000000000100b02>] show_trace+0xee/0x144)
> > >  [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
> > >  [<0000000000226c80>] migrate_page+0x38/0x68
> > >  [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
> > >  [<000000000022785a>] migrate_pages+0x496/0x568
> > >  [<000000000021e24e>] compact_zone+0x432/0x7d8
> > >  [<000000000021e772>] compact_zone_order+0x9e/0xbc
> > >  [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
> > >  [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
> > >  [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
> > >  [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
> > >  [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
> > >  [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
> > >  [<00000000002a65be>] proc_sys_write+0x26/0x34
> > >  [<00000000002336e0>] vfs_write+0xac/0x18c
> > >  [<00000000002338bc>] SyS_write+0x58/0xa8
> > >  [<0000000000113976>] sysc_noemu+0x16/0x1c
> > >  [<0000020000162edc>] 0x20000162edc
> > > INFO: lockdep is turned off.
> > >
> > > I honestly do not understand 100% why this is a false positive, seeing that
> > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> > > even if it is not necessary for synchronization, would get rid of the warning,
> > > like in the following patch. Any ideas?
> > 
> > In case of anon page, we hold rcu_read_lock in unmap_and_move.
> > The problem is file-backed page. In case of that, we hold lock_page
> > and mapping->tree_lock as update-side lock.
> > So we don't need rcu_read_lock.
> > 
> > >
> > > ---
> > >  fs/hugetlbfs/inode.c |    2 ++
> > >  mm/migrate.c         |    4 ++++
> > >  2 files changed, 6 insertions(+)
> > >
> > > --- a/fs/hugetlbfs/inode.c
> > > +++ b/fs/hugetlbfs/inode.c
> > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
> > >  {
> > >        int rc;
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >        if (rc)
> > >                return rc;
> > >        migrate_page_copy(newpage, page);
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
> > >
> > >        BUG_ON(PageWriteback(page));    /* Writeback must be complete */
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >
> > >        if (rc)
> > >                return rc;
> > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
> > >
> > >        head = page_buffers(page);
> > >
> > > +       rcu_read_lock();
> > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > +       rcu_read_unlock();
> > >
> > >        if (rc)
> > >                return rc;
> > >
> > >
> > >
> > 
> > 
> > How about this?
> > Maybe Paul have better idea.
> > (It's apparently be word-wrapped.)
> > 
> 
> heh, I wrote a patch almost identical to this and ran it overnight for testing
> (test was a memory consumer running while a parallel process grew and shrunk
> the hugepage pool). It passes but that is hardly a surprise. We differed
> slightly in a number of respects though.
> 
> > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> > index ab2baa5..135af1e 100644
> > --- a/include/linux/radix-tree.h
> > +++ b/include/linux/radix-tree.h
> > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
> >  }
> > 
> >  /**
> > + * radix_tree_deref_slot_nocheck       - dereference a slot without RCU check
> > + * @pslot:     pointer to slot, returned by radix_tree_lookup_slot
> > + * Returns:    item that was stored in that slot with any direct pointer flag
> > + *             removed.
> > + *
> > + * This functions works like radix_tree_deref_slot except it doesn't check
> > + * RCU rule. Normally this funcion is used with update-side lock.
> > + * You should use this function very carefully.
> > + */
> > +static inline void *radix_tree_deref_slot_nocheck(void **pslot)
> > +{
> > +       return rcu_dereference_protected(*pslot, 1);
> > +}
> 
> For this, I had
> 
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index ab2baa5..252d21c 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -146,6 +146,25 @@ static inline void *radix_tree_deref_slot(void **pslot)
>  }
> 
>  /**
> + * radix_tree_deref_slot_protected	- dereference a slot without RCU lock but with tree lock held
> + * @pslot:	pointer to slot, returned by radix_tree_lookup_slot
> + * Returns:	item that was stored in that slot with any direct pointer flag
> + *		removed.
> + *
> + * For use with radix_tree_lookup_slot().  Caller must hold tree read
> + * locked across slot lookup and dereference. Not required if write lock is
> + * held (ie. items cannot be concurrently inserted).
> + *
> + * radix_tree_deref_retry must be used to confirm validity of the pointer if
> + * only the read lock is held.
> + */
> +static inline void *radix_tree_deref_slot_protected(void **pslot,
> +							spinlock_t *treelock)
> +{
> +	return rcu_dereference_protected(*pslot, lockdep_is_held(treelock));
> +}
> +
> +/**
>   * radix_tree_deref_retry	- check radix_tree_deref_slot
>   * @arg:	pointer returned by radix_tree_deref_slot
>   * Returns:	0 if retry is not required, otherwise retry is required
> 
> In the documentation, I noted that the check might be without RCU but with
> the knowledge that it's protected by the tree lock. I'm not a RCU expert
> but this is only safe when you know there isn't a parallel updater and the
> treelock should be preventing that, right?

Yes, if you have prevented updates from happening, then it is OK to
use rcu_dereference_protected() instead of rcu_dereference_check().

However, if RCU readers can invoke this code, things will break.

By the way, this means that something like:

	rcu_dereference_protected(p, rcu_read_lock_held()) /* BUGGY!!! */

is just plain wrong.

> Even so, other users of rcu_dereference_protected() check a lock condition
> which I used tree lock for. I intended to read through the rest of
> documentation properly this morning to determine if this was indeed the
> right approach.
> 
> I used the name _protected instead of _nocheck because the dereference
> is still protected (by the tree lock) just not by RCU. Again, have to
> check the documentation to ensure this is correct.

Yep, this is indeed the case that rcu_dereference_protected() is for.

> > +/**
> >   * radix_tree_deref_retry      - check radix_tree_deref_slot
> >   * @arg:       pointer returned by radix_tree_deref_slot
> >   * Returns:    0 if retry is not required, otherwise retry is required
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 2eb2243..5be2841 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -244,7 +244,8 @@ static int migrate_page_move_mapping(struct
> > address_space *mappin
> > 
> >         expected_count = 2 + page_has_private(page);
> >         if (page_count(page) != expected_count ||
> > -                       (struct page *)radix_tree_deref_slot(pslot) != page) {
> > +                       (struct page *)radix_tree_deref_slot_nocheck(pslot)
> > +                               != page) {
> >                 spin_unlock_irq(&mapping->tree_lock);
> >                 return -EAGAIN;
> >         }
> 
> We only differed here by my passing in the &mapping->tree_lock

Which should be optimized away during inlining, so no performance
penalty.  ;-)

							Thanx, Paul

> > @@ -316,7 +317,8 @@ int migrate_huge_page_move_mapping(struct
> > address_space *mapping,
> > 
> >         expected_count = 2 + page_has_private(page);
> >         if (page_count(page) != expected_count ||
> > -           (struct page *)radix_tree_deref_slot(pslot) != page) {
> > +           (struct page *)radix_tree_deref_slot_nocheck(pslot)
> > +                               != page) {
> >                 spin_unlock_irq(&mapping->tree_lock);
> >                 return -EAGAIN;
> >         }
> 
> -- 
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
  2010-12-17 15:08                       ` Minchan Kim
@ 2010-12-17 16:03                           ` Paul E. McKenney
  0 siblings, 0 replies; 37+ messages in thread
From: Paul E. McKenney @ 2010-12-17 16:03 UTC (permalink / raw)
  To: Minchan Kim
  Cc: gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel,
	linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman,
	Andrew Morton, Heiko Carstens, Martin Schwidefsky

On Sat, Dec 18, 2010 at 12:08:27AM +0900, Minchan Kim wrote:
> On Thu, Dec 16, 2010 at 09:47:22PM -0800, Paul E. McKenney wrote:
> > On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> > > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
> > > <gerald.schaefer@de.ibm.com> wrote:
> > > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> > > > below. Both cases are easily reproducible: memory unplug with big page cache,
> > > > or adding large pages during run-time.
> > > >
> > > > ===================================================
> > > > [ INFO: suspicious rcu_dereference_check() usage. ]
> > > > ---------------------------------------------------
> > > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> > > >
> > > > other info that might help us debug this:
> > > >
> > > >
> > > > rcu_scheduler_active = 1, debug_locks = 0
> > > > 1 lock held by bash/761:
> > > >  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
> > > >
> > > > stack backtrace:
> > > > CPU: 1 Not tainted 2.6.37-rc6 #4
> > > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> > > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
> > > >       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
> > > >       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
> > > >       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
> > > >       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> > > > Call Trace:
> > > > ([<0000000000100b02>] show_trace+0xee/0x144)
> > > >  [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
> > > >  [<0000000000226c80>] migrate_page+0x38/0x68
> > > >  [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
> > > >  [<000000000022785a>] migrate_pages+0x496/0x568
> > > >  [<000000000021e24e>] compact_zone+0x432/0x7d8
> > > >  [<000000000021e772>] compact_zone_order+0x9e/0xbc
> > > >  [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
> > > >  [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
> > > >  [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
> > > >  [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
> > > >  [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
> > > >  [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
> > > >  [<00000000002a65be>] proc_sys_write+0x26/0x34
> > > >  [<00000000002336e0>] vfs_write+0xac/0x18c
> > > >  [<00000000002338bc>] SyS_write+0x58/0xa8
> > > >  [<0000000000113976>] sysc_noemu+0x16/0x1c
> > > >  [<0000020000162edc>] 0x20000162edc
> > > > INFO: lockdep is turned off.
> > > >
> > > > I honestly do not understand 100% why this is a false positive, seeing that
> > > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> > > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> > > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> > > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> > > > even if it is not necessary for synchronization, would get rid of the warning,
> > > > like in the following patch. Any ideas?
> > > 
> > > In case of anon page, we hold rcu_read_lock in unmap_and_move.
> > > The problem is file-backed page. In case of that, we hold lock_page
> > > and mapping->tree_lock as update-side lock.
> > > So we don't need rcu_read_lock.
> > > 
> > > >
> > > > ---
> > > >  fs/hugetlbfs/inode.c |    2 ++
> > > >  mm/migrate.c         |    4 ++++
> > > >  2 files changed, 6 insertions(+)
> > > >
> > > > --- a/fs/hugetlbfs/inode.c
> > > > +++ b/fs/hugetlbfs/inode.c
> > > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
> > > >  {
> > > >        int rc;
> > > >
> > > > +       rcu_read_lock();
> > > >        rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > > > +       rcu_read_unlock();
> > > >        if (rc)
> > > >                return rc;
> > > >        migrate_page_copy(newpage, page);
> > > > --- a/mm/migrate.c
> > > > +++ b/mm/migrate.c
> > > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
> > > >
> > > >        BUG_ON(PageWriteback(page));    /* Writeback must be complete */
> > > >
> > > > +       rcu_read_lock();
> > > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > > +       rcu_read_unlock();
> > > >
> > > >        if (rc)
> > > >                return rc;
> > > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
> > > >
> > > >        head = page_buffers(page);
> > > >
> > > > +       rcu_read_lock();
> > > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > > +       rcu_read_unlock();
> > > >
> > > >        if (rc)
> > > >                return rc;
> > > >
> > > >
> > > >
> > > 
> > > 
> > > How about this?
> > > Maybe Paul have better idea.
> > > (It's apparently be word-wrapped.)
> > > 
> > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> > > index ab2baa5..135af1e 100644
> > > --- a/include/linux/radix-tree.h
> > > +++ b/include/linux/radix-tree.h
> > > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
> > >  }
> > > 
> > >  /**
> > > + * radix_tree_deref_slot_nocheck       - dereference a slot without RCU check
> > > + * @pslot:     pointer to slot, returned by radix_tree_lookup_slot
> > > + * Returns:    item that was stored in that slot with any direct pointer flag
> > > + *             removed.
> > > + *
> > > + * This functions works like radix_tree_deref_slot except it doesn't check
> > > + * RCU rule. Normally this funcion is used with update-side lock.
> > > + * You should use this function very carefully.
> > > + */
> > > +static inline void *radix_tree_deref_slot_nocheck(void **pslot)
> > > +{
> > > +       return rcu_dereference_protected(*pslot, 1);
> > 
> > I suggest replacing the "1" with lockdep expressions for the locks
> > that you say might be held:
> 
> It's not hard.
> 
> > 
> > 	return rcu_dereference_check(*pslot,
> > 				     lockdep_is_held(&mapping->tree_lock));
> 
> rcu_dereference_check still pass rcu_read_lock_held check we don't want.
> I think rcu_dereference_protected is proper.

You are exactly right.  The only reason that I used rcu_dereference_check()
instead of rcu_dereference_protected() is because I didn't realize that
RCU readers never called this function.

> Why I don't add lockdep expressions is radix_tree_deref_slot is general API.
> It might be used anywhere where it doesn't related to mapping->tree_lock.
> If we add argument 'mapping', it has a very strong dependency with address_space.
> so I decided making the function general and then caller must use it very carefully.
> But I am not strong in this point.

I believe that this would be a good thing.

> > This assumes that when you said "and" you meant both lock_page() and
> > mapping->tree_lock.  Also you need to pass in the mapping, which
> > should not be a problem given likely inlining.
> > 
> > If you meant that either mapping->tree_lock or page_lock() might be
> > held, I suppose that the page_lock() state could be passed in, but
> > perhaps better to take a general lockdep expression.
> > 
> > So, either or both?  ;-)
> > 
> > 						Thanx, Paul
> 
> I think either is okay. That's because remove_from_page_cache/__remove_from_page_cache
> needs both locks so we can't prevent update if we get a either lock.
> In code context, I think mapping->tree_lock is more readable since it is used near by.

Good!!!  So we only really need to check for one or the other.

> Thanks for the comment, Paul.

No problem!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
@ 2010-12-17 16:03                           ` Paul E. McKenney
  0 siblings, 0 replies; 37+ messages in thread
From: Paul E. McKenney @ 2010-12-17 16:03 UTC (permalink / raw)
  To: Minchan Kim
  Cc: gerald.schaefer, KAMEZAWA Hiroyuki, Milton Miller, linux-kernel,
	linux-mm, linux-ext4, Ted Ts'o, Arun Bhanu, Mel Gorman,
	Andrew Morton, Heiko Carstens, Martin Schwidefsky

On Sat, Dec 18, 2010 at 12:08:27AM +0900, Minchan Kim wrote:
> On Thu, Dec 16, 2010 at 09:47:22PM -0800, Paul E. McKenney wrote:
> > On Fri, Dec 17, 2010 at 09:04:13AM +0900, Minchan Kim wrote:
> > > On Thu, Dec 16, 2010 at 10:50 PM, Gerald Schaefer
> > > <gerald.schaefer@de.ibm.com> wrote:
> > > > I got the same warning now after increasing /proc/sys/vm/nr_hugepages, see
> > > > below. Both cases are easily reproducible: memory unplug with big page cache,
> > > > or adding large pages during run-time.
> > > >
> > > > ===================================================
> > > > [ INFO: suspicious rcu_dereference_check() usage. ]
> > > > ---------------------------------------------------
> > > > include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection!
> > > >
> > > > other info that might help us debug this:
> > > >
> > > >
> > > > rcu_scheduler_active = 1, debug_locks = 0
> > > > 1 lock held by bash/761:
> > > >  #0:  (&(&inode->i_data.tree_lock)->rlock){..-.-.}, at: [<00000000002263ae>] migrate_page_move_mapping+0x4a/0x2d8
> > > >
> > > > stack backtrace:
> > > > CPU: 1 Not tainted 2.6.37-rc6 #4
> > > > Process bash (pid: 761, task: 00000000181b5540, ksp: 00000000181bb7f8)
> > > > 00000000181bb818 00000000181bb798 0000000000000002 0000000000000000
> > > >       00000000181bb838 00000000181bb7b0 00000000181bb7b0 000000000056bafa
> > > >       0000000000000000 000000003f42bdf0 0000000000000002 000000001c43be30
> > > >       000003e00000000d 000003e00000000c 00000000181bb800 0000000000000000
> > > >       0000000000000000 0000000000100bfa 00000000181bb798 00000000181bb7d8
> > > > Call Trace:
> > > > ([<0000000000100b02>] show_trace+0xee/0x144)
> > > >  [<000000000022654e>] migrate_page_move_mapping+0x1ea/0x2d8
> > > >  [<0000000000226c80>] migrate_page+0x38/0x68
> > > >  [<0000000000226d9a>] move_to_new_page+0xea/0x2bc
> > > >  [<000000000022785a>] migrate_pages+0x496/0x568
> > > >  [<000000000021e24e>] compact_zone+0x432/0x7d8
> > > >  [<000000000021e772>] compact_zone_order+0x9e/0xbc
> > > >  [<000000000021ed52>] try_to_compact_pages+0x1ba/0x24c
> > > >  [<00000000001e1afa>] __alloc_pages_nodemask+0x86a/0xa64
> > > >  [<000000000021c80c>] alloc_fresh_huge_page.clone.2+0x68/0x18c
> > > >  [<000000000021cc4c>] set_max_huge_pages.clone.0+0xa4/0x1ac
> > > >  [<000000000021ce06>] hugetlb_sysctl_handler+0xb2/0xcc
> > > >  [<00000000002a6572>] proc_sys_call_handler+0xe6/0x10c
> > > >  [<00000000002a65be>] proc_sys_write+0x26/0x34
> > > >  [<00000000002336e0>] vfs_write+0xac/0x18c
> > > >  [<00000000002338bc>] SyS_write+0x58/0xa8
> > > >  [<0000000000113976>] sysc_noemu+0x16/0x1c
> > > >  [<0000020000162edc>] 0x20000162edc
> > > > INFO: lockdep is turned off.
> > > >
> > > > I honestly do not understand 100% why this is a false positive, seeing that
> > > > e.g. find_get_page() will also use radix_tree_deref_slot(), holding only the
> > > > rcu_read_lock, while migrate_page_move_mapping() has no rcu_read_lock() but
> > > > the &mapping->tree_lock instead. So I'm not quite sure how to fix this
> > > > properly, but simply adding rcu_read_lock/unlock() to the affected code paths,
> > > > even if it is not necessary for synchronization, would get rid of the warning,
> > > > like in the following patch. Any ideas?
> > > 
> > > In case of anon page, we hold rcu_read_lock in unmap_and_move.
> > > The problem is file-backed page. In case of that, we hold lock_page
> > > and mapping->tree_lock as update-side lock.
> > > So we don't need rcu_read_lock.
> > > 
> > > >
> > > > ---
> > > >  fs/hugetlbfs/inode.c |    2 ++
> > > >  mm/migrate.c         |    4 ++++
> > > >  2 files changed, 6 insertions(+)
> > > >
> > > > --- a/fs/hugetlbfs/inode.c
> > > > +++ b/fs/hugetlbfs/inode.c
> > > > @@ -580,7 +580,9 @@ static int hugetlbfs_migrate_page(struct
> > > >  {
> > > >        int rc;
> > > >
> > > > +       rcu_read_lock();
> > > >        rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > > > +       rcu_read_unlock();
> > > >        if (rc)
> > > >                return rc;
> > > >        migrate_page_copy(newpage, page);
> > > > --- a/mm/migrate.c
> > > > +++ b/mm/migrate.c
> > > > @@ -417,7 +417,9 @@ int migrate_page(struct address_space *m
> > > >
> > > >        BUG_ON(PageWriteback(page));    /* Writeback must be complete */
> > > >
> > > > +       rcu_read_lock();
> > > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > > +       rcu_read_unlock();
> > > >
> > > >        if (rc)
> > > >                return rc;
> > > > @@ -444,7 +446,9 @@ int buffer_migrate_page(struct address_s
> > > >
> > > >        head = page_buffers(page);
> > > >
> > > > +       rcu_read_lock();
> > > >        rc = migrate_page_move_mapping(mapping, newpage, page);
> > > > +       rcu_read_unlock();
> > > >
> > > >        if (rc)
> > > >                return rc;
> > > >
> > > >
> > > >
> > > 
> > > 
> > > How about this?
> > > Maybe Paul have better idea.
> > > (It's apparently be word-wrapped.)
> > > 
> > > diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> > > index ab2baa5..135af1e 100644
> > > --- a/include/linux/radix-tree.h
> > > +++ b/include/linux/radix-tree.h
> > > @@ -146,6 +146,20 @@ static inline void *radix_tree_deref_slot(void **pslot)
> > >  }
> > > 
> > >  /**
> > > + * radix_tree_deref_slot_nocheck       - dereference a slot without RCU check
> > > + * @pslot:     pointer to slot, returned by radix_tree_lookup_slot
> > > + * Returns:    item that was stored in that slot with any direct pointer flag
> > > + *             removed.
> > > + *
> > > + * This functions works like radix_tree_deref_slot except it doesn't check
> > > + * RCU rule. Normally this funcion is used with update-side lock.
> > > + * You should use this function very carefully.
> > > + */
> > > +static inline void *radix_tree_deref_slot_nocheck(void **pslot)
> > > +{
> > > +       return rcu_dereference_protected(*pslot, 1);
> > 
> > I suggest replacing the "1" with lockdep expressions for the locks
> > that you say might be held:
> 
> It's not hard.
> 
> > 
> > 	return rcu_dereference_check(*pslot,
> > 				     lockdep_is_held(&mapping->tree_lock));
> 
> rcu_dereference_check still pass rcu_read_lock_held check we don't want.
> I think rcu_dereference_protected is proper.

You are exactly right.  The only reason that I used rcu_dereference_check()
instead of rcu_dereference_protected() is because I didn't realize that
RCU readers never called this function.

> Why I don't add lockdep expressions is radix_tree_deref_slot is general API.
> It might be used anywhere where it doesn't related to mapping->tree_lock.
> If we add argument 'mapping', it has a very strong dependency with address_space.
> so I decided making the function general and then caller must use it very carefully.
> But I am not strong in this point.

I believe that this would be a good thing.

> > This assumes that when you said "and" you meant both lock_page() and
> > mapping->tree_lock.  Also you need to pass in the mapping, which
> > should not be a problem given likely inlining.
> > 
> > If you meant that either mapping->tree_lock or page_lock() might be
> > held, I suppose that the page_lock() state could be passed in, but
> > perhaps better to take a general lockdep expression.
> > 
> > So, either or both?  ;-)
> > 
> > 						Thanx, Paul
> 
> I think either is okay. That's because remove_from_page_cache/__remove_from_page_cache
> needs both locks so we can't prevent update if we get a either lock.
> In code context, I think mapping->tree_lock is more readable since it is used near by.

Good!!!  So we only really need to check for one or the other.

> Thanks for the comment, Paul.

No problem!

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2010-12-17 16:04 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-21 11:26 [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage Arun Bhanu
2010-11-21 13:30 ` Ted Ts'o
2010-11-21 13:30   ` Ted Ts'o
2010-11-21 13:30   ` Ted Ts'o
2010-11-21 15:39   ` Minchan Kim
2010-11-21 15:39     ` Minchan Kim
2010-11-21 15:39     ` Minchan Kim
2010-11-21 17:37     ` Ted Ts'o
2010-11-21 17:37       ` Ted Ts'o
2010-11-22  0:38       ` Minchan Kim
2010-11-22  0:38         ` Minchan Kim
2010-11-22  0:38         ` Minchan Kim
2010-11-22  3:31         ` Milton Miller
2010-11-22  6:16           ` Paul E. McKenney
2010-12-07 19:01             ` [BUG?] memory hotplug: include/linux/radix-tree.h:145 invoked rcu_dereference_check() without protection! Gerald Schaefer
2010-12-08  1:19               ` KAMEZAWA Hiroyuki
2010-12-16 13:50                 ` Gerald Schaefer
2010-12-17  0:04                   ` Minchan Kim
2010-12-17  0:04                     ` Minchan Kim
2010-12-17  5:47                     ` Paul E. McKenney
2010-12-17  5:47                       ` Paul E. McKenney
2010-12-17  5:59                       ` Eric Dumazet
2010-12-17  5:59                         ` Eric Dumazet
2010-12-17 15:08                       ` Minchan Kim
2010-12-17 16:03                         ` Paul E. McKenney
2010-12-17 16:03                           ` Paul E. McKenney
2010-12-17  8:39                     ` Mel Gorman
2010-12-17  8:39                       ` Mel Gorman
2010-12-17  9:28                       ` Mel Gorman
2010-12-17 15:22                         ` Minchan Kim
2010-12-17 15:13                       ` Minchan Kim
2010-12-17 15:13                         ` Minchan Kim
2010-12-17 16:01                       ` Paul E. McKenney
2010-12-17 16:01                         ` Paul E. McKenney
2010-11-23  7:16       ` [BUG?] [Ext4] INFO: suspicious rcu_dereference_check() usage KOSAKI Motohiro
2010-11-23  7:16         ` KOSAKI Motohiro
2010-11-23  7:16         ` KOSAKI Motohiro

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.