* [PATCH] mm/slub: fix a BUG_ON() when offlining a memory node and CONFIG_SLUB_DEBUG is on
@ 2012-07-17 16:50 ` Jiang Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Jiang Liu @ 2012-07-17 16:50 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, Matt Mackall, Mel Gorman
  Cc: Jianguo Wu, Jiang Liu, Tony Luck, KAMEZAWA Hiroyuki,
	KOSAKI Motohiro, David Rientjes, Minchan Kim, Keping Chen,
	linux-mm, linux-kernel, Jiang Liu

From: Jianguo Wu <wujianguo@huawei.com>

The SLUB allocator may trigger a BUG_ON() when offlining a memory node if
CONFIG_SLUB_DEBUG is on. The scenario is:

1) When creating the kmem_cache_node slab, inc_slabs_node() is called twice:
early_kmem_cache_node_alloc
	->new_slab
		->inc_slabs_node
	->inc_slabs_node

2) Later, when offlining a memory node, the extra inc_slabs_node() from
early_kmem_cache_node_alloc() triggers the BUG_ON() in
slab_mem_offline_callback():
	if (n) {
		/*
		 * if n->nr_slabs > 0, slabs still exist on the node
		 * that is going down. We were unable to free them,
		 * and offline_pages() function shouldn't call this
		 * callback. So, we must fail.
		 */
		BUG_ON(slabs_node(s, offline_node));
	}
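
(For context: slabs_node() just reads the per-node counter that
inc_slabs_node()/dec_slabs_node() maintain, so a single unbalanced increment
is enough to make the check fire. A paraphrase of the CONFIG_SLUB_DEBUG
helper in mm/slub.c of that era, for illustration rather than the verbatim
source:)

static inline unsigned long slabs_node(struct kmem_cache *s, int node)
{
	struct kmem_cache_node *n = get_node(s, node);

	/* bumped in inc_slabs_node(), dropped in dec_slabs_node();
	 * expected to be zero once the node has been emptied */
	return atomic_long_read(&n->nr_slabs);
}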

------------[ cut here ]------------
kernel BUG at mm/slub.c:3590!
invalid opcode: 0000 [#1] SMP
CPU 61
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 vfat fat dm_mirror dm_region_hash dm_log uinput iTCO_wdt iTCO_vendor_support coretemp hwmon kvm_intel kvm crc32c_intel ghash_clmulni_intel serio_raw pcspkr cdc_ether usbnet mii i2c_i801 i2c_core sg lpc_ich mfd_core shpchp ioatdma i7core_edac edac_core igb dca bnx2 ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif aesni_intel cryptd aes_x86_64 aes_generic bfa scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix megaraid_sas dm_mod [last unloaded: microcode]

Pid: 46287, comm: sh Not tainted 3.5.0-rc4-pgtable-00215-g35f0828-dirty #85 IBM System x3850 X5 -[7143O3G]-/Node 1, Processor Card
RIP: 0010:[<ffffffff81160b2a>]  [<ffffffff81160b2a>] slab_memory_callback+0x1ba/0x1c0
RSP: 0018:ffff880efdcb7c68  EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff880f7ec06100 RCX: 0000000100400001
RDX: 0000000100400002 RSI: ffff880f7ec02000 RDI: ffff880f7ec06100
RBP: ffff880efdcb7c78 R08: ffff88107b6fb098 R09: ffffffff81160a00
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000019
R13: 00000000fffffffb R14: 0000000000000000 R15: ffffffff81abe930
FS:  00007f709f342700(0000) GS:ffff880f7f3a0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003b5a874570 CR3: 0000000f0da20000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sh (pid: 46287, threadinfo ffff880efdcb6000, task ffff880f0fa50000)
Stack:
 0000000000000004 ffff880efdcb7da8 ffff880efdcb7cb8 ffffffff81524af5
 0000000000000001 ffffffff81a8b620 ffffffff81a8b640 0000000000000004
 ffff880efdcb7da8 00000000ffffffff ffff880efdcb7d08 ffffffff8107a89a
Call Trace:
 [<ffffffff81524af5>] notifier_call_chain+0x55/0x80
 [<ffffffff8107a89a>] __blocking_notifier_call_chain+0x5a/0x80
 [<ffffffff8107a8d6>] blocking_notifier_call_chain+0x16/0x20
 [<ffffffff81352f0b>] memory_notify+0x1b/0x20
 [<ffffffff81507104>] offline_pages+0x624/0x700
 [<ffffffff811619de>] remove_memory+0x1e/0x20
 [<ffffffff813530cc>] memory_block_change_state+0x13c/0x2e0
 [<ffffffff81153e96>] ? alloc_pages_current+0xb6/0x120
 [<ffffffff81353332>] store_mem_state+0xc2/0xd0
 [<ffffffff8133e190>] dev_attr_store+0x20/0x30
 [<ffffffff811e2d4f>] sysfs_write_file+0xef/0x170
 [<ffffffff81173e28>] vfs_write+0xc8/0x190
 [<ffffffff81173ff1>] sys_write+0x51/0x90
 [<ffffffff81528d29>] system_call_fastpath+0x16/0x1b
Code: 8b 3d cb fd c4 00 be d0 00 00 00 e8 71 de ff ff 48 85 c0 75 9c 48 c7 c7 c0 7f a5 81 e8 c0 89 f1 ff b8 0d 80 00 00 e9 69 fe ff ff <0f> 0b eb fe 66 90 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83
RIP  [<ffffffff81160b2a>] slab_memory_callback+0x1ba/0x1c0
 RSP <ffff880efdcb7c68>
---[ end trace 749e9e9a67c78c12 ]---


Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
---
 mm/slub.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8c691fa..f8276db 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2840,7 +2840,6 @@ static void early_kmem_cache_node_alloc(int node)
 	init_tracking(kmem_cache_node, n);
 #endif
 	init_kmem_cache_node(n);
-	inc_slabs_node(kmem_cache_node, node, page->objects);
 
 	add_partial(n, page, DEACTIVATE_TO_HEAD);
 }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/slub: fix a BUG_ON() when offlining a memory node and CONFIG_SLUB_DEBUG is on
  2012-07-17 16:50 ` Jiang Liu
@ 2012-07-17 17:39   ` Christoph Lameter
  -1 siblings, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2012-07-17 17:39 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Pekka Enberg, Matt Mackall, Mel Gorman, Jianguo Wu, Jiang Liu,
	Tony Luck, KAMEZAWA Hiroyuki, KOSAKI Motohiro, David Rientjes,
	Minchan Kim, Keping Chen, linux-mm, linux-kernel

On Wed, 18 Jul 2012, Jiang Liu wrote:

> From: Jianguo Wu <wujianguo@huawei.com>
>
> The SLUB allocator may trigger a BUG_ON() when offlining a memory node if
> CONFIG_SLUB_DEBUG is on. The scenario is:
>
> 1) When creating the kmem_cache_node slab, inc_slabs_node() is called twice:
> early_kmem_cache_node_alloc
> 	->new_slab
> 		->inc_slabs_node
> 	->inc_slabs_node

new_slab() will not be able to increment the slab counter there:
inc_slabs_node() checks whether the per-node structure exists yet and skips
the accounting when it does not.
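
(Roughly what that check looks like in mm/slub.c's CONFIG_SLUB_DEBUG
accounting helper; paraphrased for illustration, not the verbatim source:)

static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
{
	struct kmem_cache_node *n = get_node(s, node);

	/*
	 * May be called early in order to allocate the kmem_cache_node
	 * structure itself. Solve the chicken-and-egg dilemma by only
	 * accounting once the per-node structure exists.
	 */
	if (n) {
		atomic_long_inc(&n->nr_slabs);
		atomic_long_add(objects, &n->total_objects);
	}
}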

This suggests that the call to early_kmem_cache_node_alloc() was not needed
because the per-node structure already existed. Let's fix that instead.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH] mm/slub: fix a BUG_ON() when offlining a memory node and CONFIG_SLUB_DEBUG is on
  2012-07-17 17:39   ` Christoph Lameter
@ 2012-07-17 17:53     ` Luck, Tony
  -1 siblings, 0 replies; 20+ messages in thread
From: Luck, Tony @ 2012-07-17 17:53 UTC (permalink / raw)
  To: Christoph Lameter, Jiang Liu
  Cc: Pekka Enberg, Matt Mackall, Mel Gorman, Jianguo Wu, Jiang Liu,
	KAMEZAWA Hiroyuki, KOSAKI Motohiro, David Rientjes, Minchan Kim,
	Keping Chen, linux-mm, linux-kernel

> This suggests that the call to early_kmem_cache_node_alloc() was not needed
> because the per-node structure already existed. Let's fix that instead.

Perhaps by just having one API for users to call? It seems odd to force users
to figure out whether they are being called before some magic point during
boot and must use the "early...()" variant. Shouldn't we hide this sort of
detail from them?

-Tony


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH] mm/slub: fix a BUG_ON() when offlining a memory node and CONFIG_SLUB_DEBUG is on
  2012-07-17 17:53     ` Luck, Tony
@ 2012-07-18 15:30       ` Christoph Lameter
  -1 siblings, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2012-07-18 15:30 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Jiang Liu, Pekka Enberg, Matt Mackall, Mel Gorman, Jianguo Wu,
	Jiang Liu, KAMEZAWA Hiroyuki, KOSAKI Motohiro, David Rientjes,
	Minchan Kim, Keping Chen, linux-mm, linux-kernel

On Tue, 17 Jul 2012, Luck, Tony wrote:

> > This suggests that the call to early_kmem_cache_node_alloc() was not needed
> > because the per-node structure already existed. Let's fix that instead.
>
> Perhaps by just having one API for users to call? It seems odd to force users
> to figure out whether they are being called before some magic point during
> boot and must use the "early...()" variant. Shouldn't we hide this sort of
> detail from them?

The early_ calls are internal to the allocator and not exposed to the
user.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/slub: fix a BUG_ON() when offlining a memory node and CONFIG_SLUB_DEBUG is on
  2012-07-17 17:39   ` Christoph Lameter
@ 2012-07-18 16:52     ` Jiang Liu
  -1 siblings, 0 replies; 20+ messages in thread
From: Jiang Liu @ 2012-07-18 16:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Matt Mackall, Mel Gorman, Jianguo Wu, Jiang Liu,
	Tony Luck, KAMEZAWA Hiroyuki, KOSAKI Motohiro, David Rientjes,
	Minchan Kim, Keping Chen, linux-mm, linux-kernel

Hi Chris,
	I found that the previous analysis of the BUG_ON() issue was incorrect
after another round of code review.
	The real issue is that early_kmem_cache_node_alloc() calls
inc_slabs_node(kmem_cache_node, node, page->objects) to increase the object
count on the local node no matter whether the page was allocated from the
local or a remote node. With the current implementation this is OK because
every memory node has normal memory, so the page is allocated from the local
node. Now we are working on a patch set to improve memory hotplug. The basic
idea is to let some memory nodes host only a ZONE_MOVABLE zone, so we can
easily remove the whole memory node when needed. That means some memory nodes
have no ZONE_NORMAL/ZONE_DMA, so the page will be allocated from a remote
node in early_kmem_cache_node_alloc(). But early_kmem_cache_node_alloc()
still increases the object count on the local node, which eventually triggers
the BUG_ON() when removing the affected memory node.
	I will try to work out another version for it.
	Thanks!
	Gerry
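
(A condensed paraphrase of early_kmem_cache_node_alloc() from mm/slub.c of
that era, to illustrate the mismatch described above; not the verbatim
source:)

static void early_kmem_cache_node_alloc(int node)
{
	struct page *page;
	struct kmem_cache_node *n;

	/* may fall back to a remote node when 'node' has no normal memory */
	page = new_slab(kmem_cache_node, GFP_NOWAIT, node);
	BUG_ON(!page);
	if (page_to_nid(page) != node)
		printk(KERN_ERR "SLUB: Unable to allocate memory from node %d\n",
		       node);

	n = page->freelist;
	/* ... carve the kmem_cache_node structure 'n' out of the new slab ... */

	/* the accounting is charged to 'node' unconditionally, even when the
	 * page actually sits on page_to_nid(page) */
	inc_slabs_node(kmem_cache_node, node, page->objects);

	add_partial(n, page, DEACTIVATE_TO_HEAD);
}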

On 07/18/2012 01:39 AM, Christoph Lameter wrote:
> On Wed, 18 Jul 2012, Jiang Liu wrote:
> 
>> From: Jianguo Wu <wujianguo@huawei.com>
>>
>> The SLUB allocator may trigger a BUG_ON() when offlining a memory node if
>> CONFIG_SLUB_DEBUG is on. The scenario is:
>>
>> 1) When creating the kmem_cache_node slab, inc_slabs_node() is called twice:
>> early_kmem_cache_node_alloc
>> 	->new_slab
>> 		->inc_slabs_node
>> 	->inc_slabs_node
> 
> new_slab() will not be able to increment the slab counter there:
> inc_slabs_node() checks whether the per-node structure exists yet and skips
> the accounting when it does not.
> 
> This suggests that the call to early_kmem_cache_node_alloc() was not needed
> because the per-node structure already existed. Let's fix that instead.
> 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/slub: fix a BUG_ON() when offlining a memory node and CONFIG_SLUB_DEBUG is on
  2012-07-18 16:52     ` Jiang Liu
@ 2012-07-18 18:53       ` Christoph Lameter
  -1 siblings, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2012-07-18 18:53 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Pekka Enberg, Matt Mackall, Mel Gorman, Jianguo Wu, Jiang Liu,
	Tony Luck, KAMEZAWA Hiroyuki, KOSAKI Motohiro, David Rientjes,
	Minchan Kim, Keping Chen, linux-mm, linux-kernel

On Thu, 19 Jul 2012, Jiang Liu wrote:

> 	I found that the previous analysis of the BUG_ON() issue was incorrect
> after another round of code review.
> 	The real issue is that early_kmem_cache_node_alloc() calls
> inc_slabs_node(kmem_cache_node, node, page->objects) to increase the object
> count on the local node no matter whether the page was allocated from the
> local or a remote node. With the current implementation this is OK because
> every memory node has normal memory, so the page is allocated from the local
> node. Now we are working on a patch set to improve memory hotplug. The basic
> idea is to let some memory nodes host only a ZONE_MOVABLE zone, so we can
> easily remove the whole memory node when needed. That means some memory nodes
> have no ZONE_NORMAL/ZONE_DMA, so the page will be allocated from a remote
> node in early_kmem_cache_node_alloc(). But early_kmem_cache_node_alloc()
> still increases the object count on the local node, which eventually triggers
> the BUG_ON() when removing the affected memory node.

That does not work. If the node only has ZONE_MOVABLE then no slab
object can be allocated from it. You need to modify the slab
allocators to not allocate a per-node structure for those nodes and to
forbid all allocations from such a node. Actually, that should already work,
because only ZONE_NORMAL nodes should get a per-node structure: slab
objects can only be allocated from ZONE_NORMAL.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH v2] SLUB: enhance slub to handle memory nodes without normal memory
  2012-07-18 18:53       ` Christoph Lameter
@ 2012-07-24  9:55         ` Jiang Liu
  -1 siblings, 0 replies; 20+ messages in thread
From: Jiang Liu @ 2012-07-24  9:55 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: WuJianguo, Tony Luck, Pekka Enberg, Matt Mackall, Mel Gorman,
	Yinghai Lu, KAMEZAWA Hiroyuki, KOSAKI Motohiro, David Rientjes,
	Minchan Kim, Keping Chen, linux-mm, linux-kernel, Jiang Liu

From: WuJianguo <wujianguo@huawei.com>

When handling a memory node with only a movable zone,
early_kmem_cache_node_alloc() will allocate a page from a remote node but
still increase the object count on the local node, which triggers the
BUG_ON() below when hot-removing that memory node. Actually, there is no
need to create a kmem_cache_node for a memory node with only a movable zone
at all.

------------[ cut here ]------------
kernel BUG at mm/slub.c:3590!
invalid opcode: 0000 [#1] SMP
CPU 61
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table
mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
ipv6 vfat fat dm_mirror dm_region_hash dm_log uinput iTCO_wdt
iTCO_vendor_support coretemp hwmon kvm_intel kvm crc32c_intel
ghash_clmulni_intel serio_raw pcspkr cdc_ether usbnet mii i2c_i801 i2c_core sg
lpc_ich mfd_core shpchp ioatdma i7core_edac edac_core igb dca bnx2 ext4
mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif aesni_intel cryptd aes_x86_64
aes_generic bfa scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix
megaraid_sas dm_mod [last unloaded: microcode]

Pid: 46287, comm: sh Not tainted 3.5.0-rc4-pgtable-00215-g35f0828-dirty #85
IBM System x3850 X5 -[7143O3G]-/Node 1, Processor Card
RIP: 0010:[<ffffffff81160b2a>]  [<ffffffff81160b2a>]
slab_memory_callback+0x1ba/0x1c0
RSP: 0018:ffff880efdcb7c68  EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff880f7ec06100 RCX: 0000000100400001
RDX: 0000000100400002 RSI: ffff880f7ec02000 RDI: ffff880f7ec06100
RBP: ffff880efdcb7c78 R08: ffff88107b6fb098 R09: ffffffff81160a00
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000019
R13: 00000000fffffffb R14: 0000000000000000 R15: ffffffff81abe930
FS:  00007f709f342700(0000) GS:ffff880f7f3a0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003b5a874570 CR3: 0000000f0da20000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sh (pid: 46287, threadinfo ffff880efdcb6000, task ffff880f0fa50000)
Stack:
 0000000000000004 ffff880efdcb7da8 ffff880efdcb7cb8 ffffffff81524af5
 0000000000000001 ffffffff81a8b620 ffffffff81a8b640 0000000000000004
 ffff880efdcb7da8 00000000ffffffff ffff880efdcb7d08 ffffffff8107a89a
Call Trace:
 [<ffffffff81524af5>] notifier_call_chain+0x55/0x80
 [<ffffffff8107a89a>] __blocking_notifier_call_chain+0x5a/0x80
 [<ffffffff8107a8d6>] blocking_notifier_call_chain+0x16/0x20
 [<ffffffff81352f0b>] memory_notify+0x1b/0x20
 [<ffffffff81507104>] offline_pages+0x624/0x700
 [<ffffffff811619de>] remove_memory+0x1e/0x20
 [<ffffffff813530cc>] memory_block_change_state+0x13c/0x2e0
 [<ffffffff81153e96>] ? alloc_pages_current+0xb6/0x120
 [<ffffffff81353332>] store_mem_state+0xc2/0xd0
 [<ffffffff8133e190>] dev_attr_store+0x20/0x30
 [<ffffffff811e2d4f>] sysfs_write_file+0xef/0x170
 [<ffffffff81173e28>] vfs_write+0xc8/0x190
 [<ffffffff81173ff1>] sys_write+0x51/0x90
 [<ffffffff81528d29>] system_call_fastpath+0x16/0x1b
Code: 8b 3d cb fd c4 00 be d0 00 00 00 e8 71 de ff ff 48 85 c0 75 9c 48 c7 c7
c0 7f a5 81 e8 c0 89 f1 ff b8 0d 80 00 00 e9 69 fe ff ff <0f> 0b eb fe 66 90
55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83
RIP  [<ffffffff81160b2a>] slab_memory_callback+0x1ba/0x1c0
 RSP <ffff880efdcb7c68>
---[ end trace 749e9e9a67c78c12 ]---

Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
---
 mm/slub.c |   44 +++++++++++++++++++++++++++++++++-----------
 1 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8c691fa..3976745 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2803,6 +2803,17 @@ static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
 
 static struct kmem_cache *kmem_cache_node;
 
+static bool node_has_normal_memory(int node)
+{
+	int i;
+
+	for (i = ZONE_NORMAL; i >= 0; i--)
+		if (populated_zone(&NODE_DATA(node)->node_zones[i]))
+			return true;
+
+	return false;
+}
+
 /*
  * No kmalloc_node yet so do it by hand. We know that this is the first
  * slab on the node for this slabcache. There are no concurrent accesses
@@ -2866,6 +2877,10 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
 	for_each_node_state(node, N_NORMAL_MEMORY) {
 		struct kmem_cache_node *n;
 
+		/* Do not create kmem_cache_node for node without normal memory */
+		if (!node_has_normal_memory(node))
+			continue;
+
 		if (slab_state == DOWN) {
 			early_kmem_cache_node_alloc(node);
 			continue;
@@ -3178,9 +3193,11 @@ static inline int kmem_cache_close(struct kmem_cache *s)
 	for_each_node_state(node, N_NORMAL_MEMORY) {
 		struct kmem_cache_node *n = get_node(s, node);
 
-		free_partial(s, n);
-		if (n->nr_partial || slabs_node(s, node))
-			return 1;
+		if (n) {
+			free_partial(s, n);
+			if (n->nr_partial || slabs_node(s, node))
+				return 1;
+		}
 	}
 	free_kmem_cache_nodes(s);
 	return 0;
@@ -3509,7 +3526,7 @@ int kmem_cache_shrink(struct kmem_cache *s)
 	for_each_node_state(node, N_NORMAL_MEMORY) {
 		n = get_node(s, node);
 
-		if (!n->nr_partial)
+		if (!n || !n->nr_partial)
 			continue;
 
 		for (i = 0; i < objects; i++)
@@ -4170,7 +4187,8 @@ static long validate_slab_cache(struct kmem_cache *s)
 	for_each_node_state(node, N_NORMAL_MEMORY) {
 		struct kmem_cache_node *n = get_node(s, node);
 
-		count += validate_slab_node(s, n, map);
+		if (n)
+			count += validate_slab_node(s, n, map);
 	}
 	kfree(map);
 	return count;
@@ -4339,7 +4357,7 @@ static int list_locations(struct kmem_cache *s, char *buf,
 		unsigned long flags;
 		struct page *page;
 
-		if (!atomic_long_read(&n->nr_slabs))
+		if (!n || !atomic_long_read(&n->nr_slabs))
 			continue;
 
 		spin_lock_irqsave(&n->list_lock, flags);
@@ -4534,11 +4552,13 @@ static ssize_t show_slab_objects(struct kmem_cache *s,
 		for_each_node_state(node, N_NORMAL_MEMORY) {
 			struct kmem_cache_node *n = get_node(s, node);
 
-		if (flags & SO_TOTAL)
-			x = atomic_long_read(&n->total_objects);
-		else if (flags & SO_OBJECTS)
-			x = atomic_long_read(&n->total_objects) -
-				count_partial(n, count_free);
+			if (!n)
+				continue;
+			if (flags & SO_TOTAL)
+				x = atomic_long_read(&n->total_objects);
+			else if (flags & SO_OBJECTS)
+				x = atomic_long_read(&n->total_objects) -
+					count_partial(n, count_free);
 
 			else
 				x = atomic_long_read(&n->nr_slabs);
@@ -4552,6 +4572,8 @@ static ssize_t show_slab_objects(struct kmem_cache *s,
 		for_each_node_state(node, N_NORMAL_MEMORY) {
 			struct kmem_cache_node *n = get_node(s, node);
 
+			if (!n)
+				continue;
 			if (flags & SO_TOTAL)
 				x = count_partial(n, count_total);
 			else if (flags & SO_OBJECTS)
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v2] SLUB: enhance slub to handle memory nodes without normal memory
  2012-07-24  9:55         ` Jiang Liu
@ 2012-07-24 14:45           ` Christoph Lameter
  -1 siblings, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2012-07-24 14:45 UTC (permalink / raw)
  To: Jiang Liu
  Cc: WuJianguo, Tony Luck, Pekka Enberg, Matt Mackall, Mel Gorman,
	Yinghai Lu, KAMEZAWA Hiroyuki, KOSAKI Motohiro, David Rientjes,
	Minchan Kim, Keping Chen, linux-mm, linux-kernel, Jiang Liu

On Tue, 24 Jul 2012, Jiang Liu wrote:

>
> diff --git a/mm/slub.c b/mm/slub.c
> index 8c691fa..3976745 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2803,6 +2803,17 @@ static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
>
>  static struct kmem_cache *kmem_cache_node;
>
> +static bool node_has_normal_memory(int node)
> +{
> +	int i;
> +
> +	for (i = ZONE_NORMAL; i >= 0; i--)
> +		if (populated_zone(&NODE_DATA(node)->node_zones[i]))
> +			return true;
> +
> +	return false;
> +}

There is already an N_NORMAL_MEMORY node map that contains the list of nodes
that have *normal* memory usable by slab allocators etc. I think the
cleanest solution would be to clear the corresponding node bits for your
special movable-only nodes. Then you won't need to modify other
subsystems anymore.
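
(A minimal sketch of that suggestion. node_set_state()/node_clear_state()
are the existing nodemask.h helpers; the hook below, and where it would be
called from, are purely illustrative:)

#include <linux/nodemask.h>

/* hypothetical hotplug-side hook: a node hosting only ZONE_MOVABLE stays
 * online but is never advertised as having slab-usable memory, so the
 * for_each_node_state(node, N_NORMAL_MEMORY) loops in slab skip it */
static void mark_movable_only_node(int nid)
{
	node_clear_state(nid, N_NORMAL_MEMORY);
}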


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v2] SLUB: enhance slub to handle memory nodes without normal memory
  2012-07-24 14:45           ` Christoph Lameter
@ 2012-07-24 17:00             ` Jiang Liu
  -1 siblings, 0 replies; 20+ messages in thread
From: Jiang Liu @ 2012-07-24 17:00 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jiang Liu, WuJianguo, Tony Luck, Pekka Enberg, Matt Mackall,
	Mel Gorman, Yinghai Lu, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	David Rientjes, Minchan Kim, Keping Chen, linux-mm, linux-kernel

On 07/24/2012 10:45 PM, Christoph Lameter wrote:
> On Tue, 24 Jul 2012, Jiang Liu wrote:
> 
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 8c691fa..3976745 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2803,6 +2803,17 @@ static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
>>
>>  static struct kmem_cache *kmem_cache_node;
>>
>> +static bool node_has_normal_memory(int node)
>> +{
>> +	int i;
>> +
>> +	for (i = ZONE_NORMAL; i >= 0; i--)
>> +		if (populated_zone(&NODE_DATA(node)->node_zones[i]))
>> +			return true;
>> +
>> +	return false;
>> +}
> 
> There is already an N_NORMAL_MEMORY node map that contains the list of nodes
> that have *normal* memory usable by slab allocators etc. I think the
> cleanest solution would be to clear the corresponding node bits for your
> special movable-only nodes. Then you won't need to modify other
> subsystems anymore.
> 
Hi Chris,
	Thanks for your comments! I have thought about the solution you
mentioned, but it seems it doesn't work. We have node masks for both
N_NORMAL_MEMORY and N_HIGH_MEMORY to distinguish between normal and high
memory on platforms such as x86, but we still don't have a mechanism to
distinguish between "normal" and "movable" memory. So for memory nodes with
only movable zones, we still set N_NORMAL_MEMORY for them. One possible
solution is to add a node mask such as "N_NORMAL_OR_MOVABLE_MEMORY", but we
haven't tried that yet. We will give it a try.
	Thanks!
	Gerry
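
(For reference, the node states available at the time, paraphrased from
include/linux/nodemask.h around v3.5; note there is no mask that covers
movable-only memory:)

enum node_states {
	N_POSSIBLE,		/* The node could become online at some point */
	N_ONLINE,		/* The node is online */
	N_NORMAL_MEMORY,	/* The node has regular memory */
#ifdef CONFIG_HIGHMEM
	N_HIGH_MEMORY,		/* The node has regular or high memory */
#else
	N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif
	N_CPU,			/* The node has one or more cpus */
	NR_NODE_STATES
};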

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH v2] SLUB: enhance slub to handle memory nodes without normal memory
  2012-07-24 17:00             ` Jiang Liu
@ 2012-07-25 15:31               ` Christoph Lameter
  -1 siblings, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2012-07-25 15:31 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Jiang Liu, WuJianguo, Tony Luck, Pekka Enberg, Matt Mackall,
	Mel Gorman, Yinghai Lu, KAMEZAWA Hiroyuki, KOSAKI Motohiro,
	David Rientjes, Minchan Kim, Keping Chen, linux-mm, linux-kernel

On Wed, 25 Jul 2012, Jiang Liu wrote:

> > There is already an N_NORMAL_MEMORY node map that contains the list of nodes
> > that have *normal* memory usable by slab allocators etc. I think the
> > cleanest solution would be to clear the corresponding node bits for your
> > special movable-only nodes. Then you won't need to modify other
> > subsystems anymore.
> >
> Hi Chris,
> 	Thanks for your comments! I have thought about the solution you
> mentioned, but it seems it doesn't work. We have node masks for both
> N_NORMAL_MEMORY and N_HIGH_MEMORY to distinguish between normal and high
> memory on platforms such as x86, but we still don't have a mechanism to
> distinguish between "normal" and "movable" memory. So for memory nodes with
> only movable zones, we still set N_NORMAL_MEMORY for them. One possible
> solution is to add a node mask such as "N_NORMAL_OR_MOVABLE_MEMORY", but we
> haven't tried that yet. We will give it a try.

Hmmm... Maybe add another N_LRU_MEMORY bitmask and replace those
N_NORMAL_MEMORY uses with N_LRU_MEMORY as needed? Use N_NORMAL_MEMORY for
subsystems that need to do regular (non-LRU) allocations that are not
movable?
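
(Sketching that idea; N_LRU_MEMORY is not a real kernel symbol, and its name
and placement below are purely hypothetical:)

enum node_states {
	N_POSSIBLE,
	N_ONLINE,
	N_NORMAL_MEMORY,	/* usable for slab and other unmovable data */
#ifdef CONFIG_HIGHMEM
	N_HIGH_MEMORY,
#else
	N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif
	N_LRU_MEMORY,		/* hypothetical: any memory usable for LRU
				 * (page cache/anon) pages, which would also
				 * cover movable-only nodes */
	N_CPU,
	NR_NODE_STATES
};

Loops allocating unmovable data would keep iterating N_NORMAL_MEMORY, while
page-cache/LRU users would switch to N_LRU_MEMORY.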

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-07-25 15:31 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-17 16:50 [PATCH] mm/slub: fix a BUG_ON() when offlining a memory node and CONFIG_SLUB_DEBUG is on Jiang Liu
2012-07-17 17:39 ` Christoph Lameter
2012-07-17 17:53   ` Luck, Tony
2012-07-18 15:30     ` Christoph Lameter
2012-07-18 16:52   ` Jiang Liu
2012-07-18 18:53     ` Christoph Lameter
2012-07-24  9:55       ` [RFC PATCH v2] SLUB: enhance slub to handle memory nodes without normal memory Jiang Liu
2012-07-24 14:45         ` Christoph Lameter
2012-07-24 17:00           ` Jiang Liu
2012-07-25 15:31             ` Christoph Lameter
