All of lore.kernel.org
 help / color / mirror / Atom feed
* node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-03  3:30 ` Xishi Qiu
  0 siblings, 0 replies; 32+ messages in thread
From: Xishi Qiu @ 2015-03-03  3:30 UTC (permalink / raw)
  To: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu
  Cc: Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo

When hot-remove a numa node, we will clear pgdat,
but is memset 0 safe in try_offline_node()?

process A:			offline node XX:
for_each_populated_zone()
find online node XX
cond_resched()
				offline cpu and memory, then try_offline_node()
				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
access node XX's pgdat
NULL pointer access error

Thanks,
Xishi Qiu


^ permalink raw reply	[flat|nested] 32+ messages in thread

* node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-03  3:30 ` Xishi Qiu
  0 siblings, 0 replies; 32+ messages in thread
From: Xishi Qiu @ 2015-03-03  3:30 UTC (permalink / raw)
  To: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu
  Cc: Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo

When hot-remove a numa node, we will clear pgdat,
but is memset 0 safe in try_offline_node()?

process A:			offline node XX:
for_each_populated_zone()
find online node XX
cond_resched()
				offline cpu and memory, then try_offline_node()
				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
access node XX's pgdat
NULL pointer access error

Thanks,
Xishi Qiu

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
  2015-03-03  3:30 ` Xishi Qiu
@ 2015-03-03 10:20   ` Gu Zheng
  -1 siblings, 0 replies; 32+ messages in thread
From: Gu Zheng @ 2015-03-03 10:20 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo

Hi Xishi,
On 03/03/2015 11:30 AM, Xishi Qiu wrote:

> When hot-remove a numa node, we will clear pgdat,
> but is memset 0 safe in try_offline_node()?

It is not safe here. In fact, this is a temporary solution here.
As you know, pgdat is accessed lock-less now, so protection
mechanism (RCU?) is needed to make it completely safe here,
but it seems a bit over-kill.

> 
> process A:			offline node XX:
> for_each_populated_zone()
> find online node XX
> cond_resched()
> 				offline cpu and memory, then try_offline_node()
> 				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
> access node XX's pgdat
> NULL pointer access error

It's possible, but I did not meet this condition, did you?

Regards,
Gu

> 
> Thanks,
> Xishi Qiu
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-03 10:20   ` Gu Zheng
  0 siblings, 0 replies; 32+ messages in thread
From: Gu Zheng @ 2015-03-03 10:20 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo

Hi Xishi,
On 03/03/2015 11:30 AM, Xishi Qiu wrote:

> When hot-remove a numa node, we will clear pgdat,
> but is memset 0 safe in try_offline_node()?

It is not safe here. In fact, this is a temporary solution here.
As you know, pgdat is accessed lock-less now, so protection
mechanism (RCU?) is needed to make it completely safe here,
but it seems a bit over-kill.

> 
> process A:			offline node XX:
> for_each_populated_zone()
> find online node XX
> cond_resched()
> 				offline cpu and memory, then try_offline_node()
> 				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
> access node XX's pgdat
> NULL pointer access error

It's possible, but I did not meet this condition, did you?

Regards,
Gu

> 
> Thanks,
> Xishi Qiu
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
  2015-03-03 10:20   ` Gu Zheng
@ 2015-03-04  2:22     ` Xishi Qiu
  -1 siblings, 0 replies; 32+ messages in thread
From: Xishi Qiu @ 2015-03-04  2:22 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo

On 2015/3/3 18:20, Gu Zheng wrote:

> Hi Xishi,
> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
> 
>> When hot-remove a numa node, we will clear pgdat,
>> but is memset 0 safe in try_offline_node()?
> 
> It is not safe here. In fact, this is a temporary solution here.
> As you know, pgdat is accessed lock-less now, so protection
> mechanism (RCU?) is needed to make it completely safe here,
> but it seems a bit over-kill.
> 
>>
>> process A:			offline node XX:
>> for_each_populated_zone()
>> find online node XX
>> cond_resched()
>> 				offline cpu and memory, then try_offline_node()
>> 				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>> access node XX's pgdat
>> NULL pointer access error
> 
> It's possible, but I did not meet this condition, did you?
> 

Yes, we test hot-add/hot-remove node with stress, and meet the following
call trace several times.

	next_online_pgdat()
		int nid = next_online_node(pgdat->node_id);  // it's here, pgdat is NULL

I add some printk, it shows the above pgdat is just the offline node's pgdat.
The reason may be that for_each_zone() and for_each_populated_zone() are lock-less.
And stop machine could not resolve it, because cond_resched() maybe in cyclical code.

[ 1422.011064] BUG: unable to handle kernel paging request at 0000000000025f60
[ 1422.011086] IP: [<ffffffff81126b91>] next_online_pgdat+0x1/0x50
[ 1422.011178] PGD 0 
[ 1422.011180] Oops: 0000 [#1] SMP 
[ 1422.011409] ACPI: Device does not support D3cold
[ 1422.011961] Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf]
[ 1422.012006] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G           O 3.10.15-5885-euler0302 #1
[ 1422.012024] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 03/02/2015
[ 1422.012065] Workqueue: events vmstat_update
[ 1422.012084] task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000
[ 1422.012165] RIP: 0010:[<ffffffff81126b91>]  [<ffffffff81126b91>] next_online_pgdat+0x1/0x50
[ 1422.012205] RSP: 0018:ffffa800d32afce8  EFLAGS: 00010286
[ 1422.012225] RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082
[ 1422.012226] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000
[ 1422.012254] RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96
[ 1422.012272] R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440
[ 1422.012290] R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800
[ 1422.012292] FS:  0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000
[ 1422.012314] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1422.012328] CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0
[ 1422.012328] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1422.012328] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1422.012328] Stack:
[ 1422.012328]  ffffa800d32afd28 ffffffff81126ca5 ffffa800ffffffff ffffffff814b4314
[ 1422.012328]  ffffa800d32ae010 0000000000000000 ffffa800e6616180 ffffa800fffb3440
[ 1422.012328]  ffffa800d32afde8 ffffffff81128220 ffffffff00000013 0000000000000038
[ 1422.012328] Call Trace:
[ 1422.012328]  [<ffffffff81126ca5>] ? next_zone+0xc5/0x150
[ 1422.012328]  [<ffffffff814b4314>] ? __schedule+0x544/0x780
[ 1422.012328]  [<ffffffff81128220>] refresh_cpu_vm_stats+0xd0/0x140
[ 1422.012328]  [<ffffffff811282a1>] vmstat_update+0x11/0x50
[ 1422.012328]  [<ffffffff81064c24>] process_one_work+0x194/0x3d0
[ 1422.012328]  [<ffffffff810660bb>] worker_thread+0x12b/0x410
[ 1422.012328]  [<ffffffff81065f90>] ? manage_workers+0x1a0/0x1a0
[ 1422.012328]  [<ffffffff8106ba66>] kthread+0xc6/0xd0
[ 1422.012328]  [<ffffffff8106b9a0>] ? kthread_freezable_should_stop+0x70/0x70
[ 1422.012328]  [<ffffffff814be0ac>] ret_from_fork+0x7c/0xb0
[ 1422.012328]  [<ffffffff8106b9a0>] ? kthread_freezable_should_stop+0x70/0x70

Thanks,
Xishi Qiu

> Regards,
> Gu
> 
>>
>> Thanks,
>> Xishi Qiu
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>
> 
> 
> 
> .
> 




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-04  2:22     ` Xishi Qiu
  0 siblings, 0 replies; 32+ messages in thread
From: Xishi Qiu @ 2015-03-04  2:22 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo

On 2015/3/3 18:20, Gu Zheng wrote:

> Hi Xishi,
> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
> 
>> When hot-remove a numa node, we will clear pgdat,
>> but is memset 0 safe in try_offline_node()?
> 
> It is not safe here. In fact, this is a temporary solution here.
> As you know, pgdat is accessed lock-less now, so protection
> mechanism (RCUi 1/4 ?) is needed to make it completely safe here,
> but it seems a bit over-kill.
> 
>>
>> process A:			offline node XX:
>> for_each_populated_zone()
>> find online node XX
>> cond_resched()
>> 				offline cpu and memory, then try_offline_node()
>> 				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>> access node XX's pgdat
>> NULL pointer access error
> 
> It's possible, but I did not meet this condition, did you?
> 

Yes, we test hot-add/hot-remove node with stress, and meet the following
call trace several times.

	next_online_pgdat()
		int nid = next_online_node(pgdat->node_id);  // it's here, pgdat is NULL

I add some printk, it shows the above pgdat is just the offline node's pgdat.
The reason may be that for_each_zone() and for_each_populated_zone() are lock-less.
And stop machine could not resolve it, because cond_resched() maybe in cyclical code.

[ 1422.011064] BUG: unable to handle kernel paging request at 0000000000025f60
[ 1422.011086] IP: [<ffffffff81126b91>] next_online_pgdat+0x1/0x50
[ 1422.011178] PGD 0 
[ 1422.011180] Oops: 0000 [#1] SMP 
[ 1422.011409] ACPI: Device does not support D3cold
[ 1422.011961] Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf]
[ 1422.012006] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G           O 3.10.15-5885-euler0302 #1
[ 1422.012024] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 03/02/2015
[ 1422.012065] Workqueue: events vmstat_update
[ 1422.012084] task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000
[ 1422.012165] RIP: 0010:[<ffffffff81126b91>]  [<ffffffff81126b91>] next_online_pgdat+0x1/0x50
[ 1422.012205] RSP: 0018:ffffa800d32afce8  EFLAGS: 00010286
[ 1422.012225] RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082
[ 1422.012226] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000
[ 1422.012254] RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96
[ 1422.012272] R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440
[ 1422.012290] R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800
[ 1422.012292] FS:  0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000
[ 1422.012314] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1422.012328] CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0
[ 1422.012328] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1422.012328] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1422.012328] Stack:
[ 1422.012328]  ffffa800d32afd28 ffffffff81126ca5 ffffa800ffffffff ffffffff814b4314
[ 1422.012328]  ffffa800d32ae010 0000000000000000 ffffa800e6616180 ffffa800fffb3440
[ 1422.012328]  ffffa800d32afde8 ffffffff81128220 ffffffff00000013 0000000000000038
[ 1422.012328] Call Trace:
[ 1422.012328]  [<ffffffff81126ca5>] ? next_zone+0xc5/0x150
[ 1422.012328]  [<ffffffff814b4314>] ? __schedule+0x544/0x780
[ 1422.012328]  [<ffffffff81128220>] refresh_cpu_vm_stats+0xd0/0x140
[ 1422.012328]  [<ffffffff811282a1>] vmstat_update+0x11/0x50
[ 1422.012328]  [<ffffffff81064c24>] process_one_work+0x194/0x3d0
[ 1422.012328]  [<ffffffff810660bb>] worker_thread+0x12b/0x410
[ 1422.012328]  [<ffffffff81065f90>] ? manage_workers+0x1a0/0x1a0
[ 1422.012328]  [<ffffffff8106ba66>] kthread+0xc6/0xd0
[ 1422.012328]  [<ffffffff8106b9a0>] ? kthread_freezable_should_stop+0x70/0x70
[ 1422.012328]  [<ffffffff814be0ac>] ret_from_fork+0x7c/0xb0
[ 1422.012328]  [<ffffffff8106b9a0>] ? kthread_freezable_should_stop+0x70/0x70

Thanks,
Xishi Qiu

> Regards,
> Gu
> 
>>
>> Thanks,
>> Xishi Qiu
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>
> 
> 
> 
> .
> 



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
  2015-03-04  2:22     ` Xishi Qiu
@ 2015-03-04  2:52       ` Xishi Qiu
  -1 siblings, 0 replies; 32+ messages in thread
From: Xishi Qiu @ 2015-03-04  2:52 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo, Li Zefan

On 2015/3/4 10:22, Xishi Qiu wrote:

> On 2015/3/3 18:20, Gu Zheng wrote:
> 
>> Hi Xishi,
>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>
>>> When hot-remove a numa node, we will clear pgdat,
>>> but is memset 0 safe in try_offline_node()?
>>
>> It is not safe here. In fact, this is a temporary solution here.
>> As you know, pgdat is accessed lock-less now, so protection
>> mechanism (RCU?) is needed to make it completely safe here,
>> but it seems a bit over-kill.
>>

Hi Gu,

Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ?
I find this will be fine in the stress test except the warning 
when hot-add memory.

Thanks,
Xishi Qiu


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-04  2:52       ` Xishi Qiu
  0 siblings, 0 replies; 32+ messages in thread
From: Xishi Qiu @ 2015-03-04  2:52 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo, Li Zefan

On 2015/3/4 10:22, Xishi Qiu wrote:

> On 2015/3/3 18:20, Gu Zheng wrote:
> 
>> Hi Xishi,
>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>
>>> When hot-remove a numa node, we will clear pgdat,
>>> but is memset 0 safe in try_offline_node()?
>>
>> It is not safe here. In fact, this is a temporary solution here.
>> As you know, pgdat is accessed lock-less now, so protection
>> mechanism (RCUi 1/4 ?) is needed to make it completely safe here,
>> but it seems a bit over-kill.
>>

Hi Gu,

Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ?
I find this will be fine in the stress test except the warning 
when hot-add memory.

Thanks,
Xishi Qiu

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
  2015-03-04  2:22     ` Xishi Qiu
@ 2015-03-04  3:53       ` Gu Zheng
  -1 siblings, 0 replies; 32+ messages in thread
From: Gu Zheng @ 2015-03-04  3:53 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo

Hi Xishi,

On 03/04/2015 10:22 AM, Xishi Qiu wrote:

> On 2015/3/3 18:20, Gu Zheng wrote:
> 
>> Hi Xishi,
>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>
>>> When hot-remove a numa node, we will clear pgdat,
>>> but is memset 0 safe in try_offline_node()?
>>
>> It is not safe here. In fact, this is a temporary solution here.
>> As you know, pgdat is accessed lock-less now, so protection
>> mechanism (RCU?) is needed to make it completely safe here,
>> but it seems a bit over-kill.
>>
>>>
>>> process A:			offline node XX:
>>> for_each_populated_zone()
>>> find online node XX
>>> cond_resched()
>>> 				offline cpu and memory, then try_offline_node()
>>> 				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>>> access node XX's pgdat
>>> NULL pointer access error
>>
>> It's possible, but I did not meet this condition, did you?
>>
> 
> Yes, we test hot-add/hot-remove node with stress, and meet the following
> call trace several times.

Thanks.

> 
> 	next_online_pgdat()
> 		int nid = next_online_node(pgdat->node_id);  // it's here, pgdat is NULL

	memset(pgdat, 0, sizeof(*pgdat));
This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is
NULL* is strange here.
But anyway, the bug is real, we must fix it.

Regards,
Gu

> 
> I add some printk, it shows the above pgdat is just the offline node's pgdat.
> The reason may be that for_each_zone() and for_each_populated_zone() are lock-less.
> And stop machine could not resolve it, because cond_resched() maybe in cyclical code.
> 
> [ 1422.011064] BUG: unable to handle kernel paging request at 0000000000025f60
> [ 1422.011086] IP: [<ffffffff81126b91>] next_online_pgdat+0x1/0x50
> [ 1422.011178] PGD 0 
> [ 1422.011180] Oops: 0000 [#1] SMP 
> [ 1422.011409] ACPI: Device does not support D3cold
> [ 1422.011961] Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf]
> [ 1422.012006] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G           O 3.10.15-5885-euler0302 #1
> [ 1422.012024] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 03/02/2015
> [ 1422.012065] Workqueue: events vmstat_update
> [ 1422.012084] task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000
> [ 1422.012165] RIP: 0010:[<ffffffff81126b91>]  [<ffffffff81126b91>] next_online_pgdat+0x1/0x50
> [ 1422.012205] RSP: 0018:ffffa800d32afce8  EFLAGS: 00010286
> [ 1422.012225] RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082
> [ 1422.012226] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000
> [ 1422.012254] RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96
> [ 1422.012272] R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440
> [ 1422.012290] R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800
> [ 1422.012292] FS:  0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000
> [ 1422.012314] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1422.012328] CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0
> [ 1422.012328] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1422.012328] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1422.012328] Stack:
> [ 1422.012328]  ffffa800d32afd28 ffffffff81126ca5 ffffa800ffffffff ffffffff814b4314
> [ 1422.012328]  ffffa800d32ae010 0000000000000000 ffffa800e6616180 ffffa800fffb3440
> [ 1422.012328]  ffffa800d32afde8 ffffffff81128220 ffffffff00000013 0000000000000038
> [ 1422.012328] Call Trace:
> [ 1422.012328]  [<ffffffff81126ca5>] ? next_zone+0xc5/0x150
> [ 1422.012328]  [<ffffffff814b4314>] ? __schedule+0x544/0x780
> [ 1422.012328]  [<ffffffff81128220>] refresh_cpu_vm_stats+0xd0/0x140
> [ 1422.012328]  [<ffffffff811282a1>] vmstat_update+0x11/0x50
> [ 1422.012328]  [<ffffffff81064c24>] process_one_work+0x194/0x3d0
> [ 1422.012328]  [<ffffffff810660bb>] worker_thread+0x12b/0x410
> [ 1422.012328]  [<ffffffff81065f90>] ? manage_workers+0x1a0/0x1a0
> [ 1422.012328]  [<ffffffff8106ba66>] kthread+0xc6/0xd0
> [ 1422.012328]  [<ffffffff8106b9a0>] ? kthread_freezable_should_stop+0x70/0x70
> [ 1422.012328]  [<ffffffff814be0ac>] ret_from_fork+0x7c/0xb0
> [ 1422.012328]  [<ffffffff8106b9a0>] ? kthread_freezable_should_stop+0x70/0x70
> 
> Thanks,
> Xishi Qiu
> 
>> Regards,
>> Gu
>>
>>>
>>> Thanks,
>>> Xishi Qiu
>>>
>>> --
>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>> see: http://www.linux-mm.org/ .
>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>
>>
>>
>>
>> .
>>
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> .
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-04  3:53       ` Gu Zheng
  0 siblings, 0 replies; 32+ messages in thread
From: Gu Zheng @ 2015-03-04  3:53 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo

Hi Xishi,

On 03/04/2015 10:22 AM, Xishi Qiu wrote:

> On 2015/3/3 18:20, Gu Zheng wrote:
> 
>> Hi Xishi,
>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>
>>> When hot-remove a numa node, we will clear pgdat,
>>> but is memset 0 safe in try_offline_node()?
>>
>> It is not safe here. In fact, this is a temporary solution here.
>> As you know, pgdat is accessed lock-less now, so protection
>> mechanism (RCU?) is needed to make it completely safe here,
>> but it seems a bit over-kill.
>>
>>>
>>> process A:			offline node XX:
>>> for_each_populated_zone()
>>> find online node XX
>>> cond_resched()
>>> 				offline cpu and memory, then try_offline_node()
>>> 				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>>> access node XX's pgdat
>>> NULL pointer access error
>>
>> It's possible, but I did not meet this condition, did you?
>>
> 
> Yes, we test hot-add/hot-remove node with stress, and meet the following
> call trace several times.

Thanks.

> 
> 	next_online_pgdat()
> 		int nid = next_online_node(pgdat->node_id);  // it's here, pgdat is NULL

	memset(pgdat, 0, sizeof(*pgdat));
This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is
NULL* is strange here.
But anyway, the bug is real, we must fix it.

Regards,
Gu

> 
> I add some printk, it shows the above pgdat is just the offline node's pgdat.
> The reason may be that for_each_zone() and for_each_populated_zone() are lock-less.
> And stop machine could not resolve it, because cond_resched() maybe in cyclical code.
> 
> [ 1422.011064] BUG: unable to handle kernel paging request at 0000000000025f60
> [ 1422.011086] IP: [<ffffffff81126b91>] next_online_pgdat+0x1/0x50
> [ 1422.011178] PGD 0 
> [ 1422.011180] Oops: 0000 [#1] SMP 
> [ 1422.011409] ACPI: Device does not support D3cold
> [ 1422.011961] Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf]
> [ 1422.012006] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G           O 3.10.15-5885-euler0302 #1
> [ 1422.012024] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 03/02/2015
> [ 1422.012065] Workqueue: events vmstat_update
> [ 1422.012084] task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000
> [ 1422.012165] RIP: 0010:[<ffffffff81126b91>]  [<ffffffff81126b91>] next_online_pgdat+0x1/0x50
> [ 1422.012205] RSP: 0018:ffffa800d32afce8  EFLAGS: 00010286
> [ 1422.012225] RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082
> [ 1422.012226] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000
> [ 1422.012254] RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96
> [ 1422.012272] R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440
> [ 1422.012290] R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800
> [ 1422.012292] FS:  0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000
> [ 1422.012314] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1422.012328] CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0
> [ 1422.012328] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1422.012328] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1422.012328] Stack:
> [ 1422.012328]  ffffa800d32afd28 ffffffff81126ca5 ffffa800ffffffff ffffffff814b4314
> [ 1422.012328]  ffffa800d32ae010 0000000000000000 ffffa800e6616180 ffffa800fffb3440
> [ 1422.012328]  ffffa800d32afde8 ffffffff81128220 ffffffff00000013 0000000000000038
> [ 1422.012328] Call Trace:
> [ 1422.012328]  [<ffffffff81126ca5>] ? next_zone+0xc5/0x150
> [ 1422.012328]  [<ffffffff814b4314>] ? __schedule+0x544/0x780
> [ 1422.012328]  [<ffffffff81128220>] refresh_cpu_vm_stats+0xd0/0x140
> [ 1422.012328]  [<ffffffff811282a1>] vmstat_update+0x11/0x50
> [ 1422.012328]  [<ffffffff81064c24>] process_one_work+0x194/0x3d0
> [ 1422.012328]  [<ffffffff810660bb>] worker_thread+0x12b/0x410
> [ 1422.012328]  [<ffffffff81065f90>] ? manage_workers+0x1a0/0x1a0
> [ 1422.012328]  [<ffffffff8106ba66>] kthread+0xc6/0xd0
> [ 1422.012328]  [<ffffffff8106b9a0>] ? kthread_freezable_should_stop+0x70/0x70
> [ 1422.012328]  [<ffffffff814be0ac>] ret_from_fork+0x7c/0xb0
> [ 1422.012328]  [<ffffffff8106b9a0>] ? kthread_freezable_should_stop+0x70/0x70
> 
> Thanks,
> Xishi Qiu
> 
>> Regards,
>> Gu
>>
>>>
>>> Thanks,
>>> Xishi Qiu
>>>
>>> --
>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>> see: http://www.linux-mm.org/ .
>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>
>>
>>
>>
>> .
>>
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> .
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
  2015-03-04  2:52       ` Xishi Qiu
@ 2015-03-04  3:56         ` Gu Zheng
  -1 siblings, 0 replies; 32+ messages in thread
From: Gu Zheng @ 2015-03-04  3:56 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo, Li Zefan

Hi Xishi,
On 03/04/2015 10:52 AM, Xishi Qiu wrote:

> On 2015/3/4 10:22, Xishi Qiu wrote:
> 
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-remove a numa node, we will clear pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCU?) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
> 
> Hi Gu,
> 
> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ?
> I find this will be fine in the stress test except the warning 
> when hot-add memory.

As you see, it will trigger the warning in free_area_init_node().
Could you try the following patch? It will reset the pgdat before reuse it.

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1778628..0717649 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
                        return NULL;
 
                arch_refresh_nodedata(nid, pgdat);
+       } else {
+               /* Reset the pgdat to reuse */
+               memset(pgdat, 0, sizeof(*pgdat));
        }
 
        /* we can use NODE_DATA(nid) from here */
@@ -2021,15 +2024,6 @@ void try_offline_node(int nid)
 
        /* notify that the node is down */
        call_node_notify(NODE_DOWN, (void *)(long)nid);
-
-       /*
-        * Since there is no way to guarentee the address of pgdat/zone is not
-        * on stack of any kernel threads or used by other kernel objects
-        * without reference counting or other symchronizing method, do not
-        * reset node_data and free pgdat here. Just reset it to 0 and reuse
-        * the memory when the node is online again.
-        */
-       memset(pgdat, 0, sizeof(*pgdat));
 }
 EXPORT_SYMBOL(try_offline_node);
 

> 
> Thanks,
> Xishi Qiu
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> .
> 



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-04  3:56         ` Gu Zheng
  0 siblings, 0 replies; 32+ messages in thread
From: Gu Zheng @ 2015-03-04  3:56 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo, Li Zefan

Hi Xishi,
On 03/04/2015 10:52 AM, Xishi Qiu wrote:

> On 2015/3/4 10:22, Xishi Qiu wrote:
> 
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-remove a numa node, we will clear pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCU?) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
> 
> Hi Gu,
> 
> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ?
> I find this will be fine in the stress test except the warning 
> when hot-add memory.

As you see, it will trigger the warning in free_area_init_node().
Could you try the following patch? It will reset the pgdat before reuse it.

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1778628..0717649 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
                        return NULL;
 
                arch_refresh_nodedata(nid, pgdat);
+       } else {
+               /* Reset the pgdat to reuse */
+               memset(pgdat, 0, sizeof(*pgdat));
        }
 
        /* we can use NODE_DATA(nid) from here */
@@ -2021,15 +2024,6 @@ void try_offline_node(int nid)
 
        /* notify that the node is down */
        call_node_notify(NODE_DOWN, (void *)(long)nid);
-
-       /*
-        * Since there is no way to guarentee the address of pgdat/zone is not
-        * on stack of any kernel threads or used by other kernel objects
-        * without reference counting or other symchronizing method, do not
-        * reset node_data and free pgdat here. Just reset it to 0 and reuse
-        * the memory when the node is online again.
-        */
-       memset(pgdat, 0, sizeof(*pgdat));
 }
 EXPORT_SYMBOL(try_offline_node);
 

> 
> Thanks,
> Xishi Qiu
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> .
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
  2015-03-04  3:53       ` Gu Zheng
@ 2015-03-04  7:01         ` Xishi Qiu
  -1 siblings, 0 replies; 32+ messages in thread
From: Xishi Qiu @ 2015-03-04  7:01 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo

On 2015/3/4 11:53, Gu Zheng wrote:

> Hi Xishi,
> 
> On 03/04/2015 10:22 AM, Xishi Qiu wrote:
> 
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-remove a numa node, we will clear pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCU?) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
>>>>
>>>> process A:			offline node XX:
>>>> for_each_populated_zone()
>>>> find online node XX
>>>> cond_resched()
>>>> 				offline cpu and memory, then try_offline_node()
>>>> 				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>>>> access node XX's pgdat
>>>> NULL pointer access error
>>>
>>> It's possible, but I did not meet this condition, did you?
>>>
>>
>> Yes, we test hot-add/hot-remove node with stress, and meet the following
>> call trace several times.
> 
> Thanks.
> 
>>
>> 	next_online_pgdat()
>> 		int nid = next_online_node(pgdat->node_id);  // it's here, pgdat is NULL
> 
> 	memset(pgdat, 0, sizeof(*pgdat));
> This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is
> NULL* is strange here.
> But anyway, the bug is real, we must fix it.

next_zone()
	pg_data_t *pgdat = zone->zone_pgdat;  // I think this pgdat is NULL, and NODE_DATA() is not NULL.
	...
	pgdat = next_online_pgdat(pgdat);
		int nid = next_online_node(pgdat->node_id);  // so here is the null pointer access

Thanks for your new patch, I'll test it.

Thanks,
Xishi Qiu


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-04  7:01         ` Xishi Qiu
  0 siblings, 0 replies; 32+ messages in thread
From: Xishi Qiu @ 2015-03-04  7:01 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo

On 2015/3/4 11:53, Gu Zheng wrote:

> Hi Xishi,
> 
> On 03/04/2015 10:22 AM, Xishi Qiu wrote:
> 
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-remove a numa node, we will clear pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCUi 1/4 ?) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
>>>>
>>>> process A:			offline node XX:
>>>> for_each_populated_zone()
>>>> find online node XX
>>>> cond_resched()
>>>> 				offline cpu and memory, then try_offline_node()
>>>> 				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>>>> access node XX's pgdat
>>>> NULL pointer access error
>>>
>>> It's possible, but I did not meet this condition, did you?
>>>
>>
>> Yes, we test hot-add/hot-remove node with stress, and meet the following
>> call trace several times.
> 
> Thanks.
> 
>>
>> 	next_online_pgdat()
>> 		int nid = next_online_node(pgdat->node_id);  // it's here, pgdat is NULL
> 
> 	memset(pgdat, 0, sizeof(*pgdat));
> This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is
> NULL* is strange here.
> But anyway, the bug is real, we must fix it.

next_zone()
	pg_data_t *pgdat = zone->zone_pgdat;  // I think this pgdat is NULL, and NODE_DATA() is not NULL.
	...
	pgdat = next_online_pgdat(pgdat);
		int nid = next_online_node(pgdat->node_id);  // so here is the null pointer access

Thanks for your new patch, I'll test it.

Thanks,
Xishi Qiu

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
  2015-03-04  3:56         ` Gu Zheng
@ 2015-03-04  8:03           ` Xishi Qiu
  -1 siblings, 0 replies; 32+ messages in thread
From: Xishi Qiu @ 2015-03-04  8:03 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo, Li Zefan

On 2015/3/4 11:56, Gu Zheng wrote:

> Hi Xishi,
> On 03/04/2015 10:52 AM, Xishi Qiu wrote:
> 
>> On 2015/3/4 10:22, Xishi Qiu wrote:
>>
>>> On 2015/3/3 18:20, Gu Zheng wrote:
>>>
>>>> Hi Xishi,
>>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>>
>>>>> When hot-remove a numa node, we will clear pgdat,
>>>>> but is memset 0 safe in try_offline_node()?
>>>>
>>>> It is not safe here. In fact, this is a temporary solution here.
>>>> As you know, pgdat is accessed lock-less now, so protection
>>>> mechanism (RCU?) is needed to make it completely safe here,
>>>> but it seems a bit over-kill.
>>>>
>>
>> Hi Gu,
>>
>> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ?
>> I find this will be fine in the stress test except the warning 
>> when hot-add memory.
> 
> As you see, it will trigger the warning in free_area_init_node().
> Could you try the following patch? It will reset the pgdat before reuse it.
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1778628..0717649 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>                         return NULL;
>  
>                 arch_refresh_nodedata(nid, pgdat);
> +       } else {
> +               /* Reset the pgdat to reuse */
> +               memset(pgdat, 0, sizeof(*pgdat));
>         }

Hi Gu,

If schedule last a long time, next_zone may be still access the pgdat here,
so it is not safe enough, right?

Thanks
Xishi Qiu

>  
>         /* we can use NODE_DATA(nid) from here */
> @@ -2021,15 +2024,6 @@ void try_offline_node(int nid)
>  
>         /* notify that the node is down */
>         call_node_notify(NODE_DOWN, (void *)(long)nid);
> -
> -       /*
> -        * Since there is no way to guarentee the address of pgdat/zone is not
> -        * on stack of any kernel threads or used by other kernel objects
> -        * without reference counting or other symchronizing method, do not
> -        * reset node_data and free pgdat here. Just reset it to 0 and reuse
> -        * the memory when the node is online again.
> -        */
> -       memset(pgdat, 0, sizeof(*pgdat));
>  }
>  EXPORT_SYMBOL(try_offline_node);
>  
> 
>>
>> Thanks,
>> Xishi Qiu
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>> .
>>
> 
> 
> 
> .
> 




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-04  8:03           ` Xishi Qiu
  0 siblings, 0 replies; 32+ messages in thread
From: Xishi Qiu @ 2015-03-04  8:03 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo, Li Zefan

On 2015/3/4 11:56, Gu Zheng wrote:

> Hi Xishi,
> On 03/04/2015 10:52 AM, Xishi Qiu wrote:
> 
>> On 2015/3/4 10:22, Xishi Qiu wrote:
>>
>>> On 2015/3/3 18:20, Gu Zheng wrote:
>>>
>>>> Hi Xishi,
>>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>>
>>>>> When hot-remove a numa node, we will clear pgdat,
>>>>> but is memset 0 safe in try_offline_node()?
>>>>
>>>> It is not safe here. In fact, this is a temporary solution here.
>>>> As you know, pgdat is accessed lock-less now, so protection
>>>> mechanism (RCUi 1/4 ?) is needed to make it completely safe here,
>>>> but it seems a bit over-kill.
>>>>
>>
>> Hi Gu,
>>
>> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ?
>> I find this will be fine in the stress test except the warning 
>> when hot-add memory.
> 
> As you see, it will trigger the warning in free_area_init_node().
> Could you try the following patch? It will reset the pgdat before reuse it.
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1778628..0717649 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>                         return NULL;
>  
>                 arch_refresh_nodedata(nid, pgdat);
> +       } else {
> +               /* Reset the pgdat to reuse */
> +               memset(pgdat, 0, sizeof(*pgdat));
>         }

Hi Gu,

If schedule last a long time, next_zone may be still access the pgdat here,
so it is not safe enough, right?

Thanks
Xishi Qiu

>  
>         /* we can use NODE_DATA(nid) from here */
> @@ -2021,15 +2024,6 @@ void try_offline_node(int nid)
>  
>         /* notify that the node is down */
>         call_node_notify(NODE_DOWN, (void *)(long)nid);
> -
> -       /*
> -        * Since there is no way to guarentee the address of pgdat/zone is not
> -        * on stack of any kernel threads or used by other kernel objects
> -        * without reference counting or other symchronizing method, do not
> -        * reset node_data and free pgdat here. Just reset it to 0 and reuse
> -        * the memory when the node is online again.
> -        */
> -       memset(pgdat, 0, sizeof(*pgdat));
>  }
>  EXPORT_SYMBOL(try_offline_node);
>  
> 
>>
>> Thanks,
>> Xishi Qiu
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>> .
>>
> 
> 
> 
> .
> 



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
  2015-03-04  3:53       ` Gu Zheng
@ 2015-03-04  8:31         ` Xie XiuQi
  -1 siblings, 0 replies; 32+ messages in thread
From: Xie XiuQi @ 2015-03-04  8:31 UTC (permalink / raw)
  To: Gu Zheng, Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Hanjun Guo

On 2015/3/4 11:53, Gu Zheng wrote:
> Hi Xishi,
> 
> On 03/04/2015 10:22 AM, Xishi Qiu wrote:
> 
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-remove a numa node, we will clear pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCU?) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
>>>>
>>>> process A:			offline node XX:
>>>> for_each_populated_zone()
>>>> find online node XX
>>>> cond_resched()
>>>> 				offline cpu and memory, then try_offline_node()
>>>> 				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>>>> access node XX's pgdat
>>>> NULL pointer access error
>>>
>>> It's possible, but I did not meet this condition, did you?
>>>
>>
>> Yes, we test hot-add/hot-remove node with stress, and meet the following
>> call trace several times.
> 
> Thanks.
> 
>>
>> 	next_online_pgdat()
>> 		int nid = next_online_node(pgdat->node_id);  // it's here, pgdat is NULL
> 
> 	memset(pgdat, 0, sizeof(*pgdat));
> This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is
> NULL* is strange here.

Hi Gu,

This pgdat isn't 0, but pgdat->zone[i]->zone_pgdat is 0.
So pgdat is 0 in next_zone().

--
/*
 * next_zone - helper magic for for_each_zone()
 */
struct zone *next_zone(struct zone *zone)
{
        pg_data_t *pgdat = zone->zone_pgdat;

        if (zone < pgdat->node_zones + MAX_NR_ZONES - 1)
                zone++;
        else {
                pgdat = next_online_pgdat(pgdat);
                if (pgdat)
                        zone = pgdat->node_zones;
                else
                        zone = NULL;
        }
        return zone;
}

> But anyway, the bug is real, we must fix it.
> 
> Regards,
> Gu
> 
>>
>> I add some printk, it shows the above pgdat is just the offline node's pgdat.
>> The reason may be that for_each_zone() and for_each_populated_zone() are lock-less.
>> And stop machine could not resolve it, because cond_resched() maybe in cyclical code.
>>
>> [ 1422.011064] BUG: unable to handle kernel paging request at 0000000000025f60
>> [ 1422.011086] IP: [<ffffffff81126b91>] next_online_pgdat+0x1/0x50
>> [ 1422.011178] PGD 0 
>> [ 1422.011180] Oops: 0000 [#1] SMP 
>> [ 1422.011409] ACPI: Device does not support D3cold
>> [ 1422.011961] Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf]
>> [ 1422.012006] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G           O 3.10.15-5885-euler0302 #1
>> [ 1422.012024] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 03/02/2015
>> [ 1422.012065] Workqueue: events vmstat_update
>> [ 1422.012084] task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000
>> [ 1422.012165] RIP: 0010:[<ffffffff81126b91>]  [<ffffffff81126b91>] next_online_pgdat+0x1/0x50
>> [ 1422.012205] RSP: 0018:ffffa800d32afce8  EFLAGS: 00010286
>> [ 1422.012225] RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082
>> [ 1422.012226] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000
>> [ 1422.012254] RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96
>> [ 1422.012272] R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440
>> [ 1422.012290] R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800
>> [ 1422.012292] FS:  0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000
>> [ 1422.012314] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 1422.012328] CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0
>> [ 1422.012328] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 1422.012328] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 1422.012328] Stack:
>> [ 1422.012328]  ffffa800d32afd28 ffffffff81126ca5 ffffa800ffffffff ffffffff814b4314
>> [ 1422.012328]  ffffa800d32ae010 0000000000000000 ffffa800e6616180 ffffa800fffb3440
>> [ 1422.012328]  ffffa800d32afde8 ffffffff81128220 ffffffff00000013 0000000000000038
>> [ 1422.012328] Call Trace:
>> [ 1422.012328]  [<ffffffff81126ca5>] ? next_zone+0xc5/0x150
>> [ 1422.012328]  [<ffffffff814b4314>] ? __schedule+0x544/0x780
>> [ 1422.012328]  [<ffffffff81128220>] refresh_cpu_vm_stats+0xd0/0x140
>> [ 1422.012328]  [<ffffffff811282a1>] vmstat_update+0x11/0x50
>> [ 1422.012328]  [<ffffffff81064c24>] process_one_work+0x194/0x3d0
>> [ 1422.012328]  [<ffffffff810660bb>] worker_thread+0x12b/0x410
>> [ 1422.012328]  [<ffffffff81065f90>] ? manage_workers+0x1a0/0x1a0
>> [ 1422.012328]  [<ffffffff8106ba66>] kthread+0xc6/0xd0
>> [ 1422.012328]  [<ffffffff8106b9a0>] ? kthread_freezable_should_stop+0x70/0x70
>> [ 1422.012328]  [<ffffffff814be0ac>] ret_from_fork+0x7c/0xb0
>> [ 1422.012328]  [<ffffffff8106b9a0>] ? kthread_freezable_should_stop+0x70/0x70
>>
>> Thanks,
>> Xishi Qiu
>>
>>> Regards,
>>> Gu
>>>
>>>>
>>>> Thanks,
>>>> Xishi Qiu
>>>>
>>>> --
>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>> see: http://www.linux-mm.org/ .
>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>>
>>>
>>>
>>>
>>> .
>>>
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>> .
>>
> 
> 
> 
> .
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-04  8:31         ` Xie XiuQi
  0 siblings, 0 replies; 32+ messages in thread
From: Xie XiuQi @ 2015-03-04  8:31 UTC (permalink / raw)
  To: Gu Zheng, Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Hanjun Guo

On 2015/3/4 11:53, Gu Zheng wrote:
> Hi Xishi,
> 
> On 03/04/2015 10:22 AM, Xishi Qiu wrote:
> 
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-remove a numa node, we will clear pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCUi 1/4 ?) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
>>>>
>>>> process A:			offline node XX:
>>>> for_each_populated_zone()
>>>> find online node XX
>>>> cond_resched()
>>>> 				offline cpu and memory, then try_offline_node()
>>>> 				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>>>> access node XX's pgdat
>>>> NULL pointer access error
>>>
>>> It's possible, but I did not meet this condition, did you?
>>>
>>
>> Yes, we test hot-add/hot-remove node with stress, and meet the following
>> call trace several times.
> 
> Thanks.
> 
>>
>> 	next_online_pgdat()
>> 		int nid = next_online_node(pgdat->node_id);  // it's here, pgdat is NULL
> 
> 	memset(pgdat, 0, sizeof(*pgdat));
> This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is
> NULL* is strange here.

Hi Gu,

This pgdat isn't 0, but pgdat->zone[i]->zone_pgdat is 0.
So pgdat is 0 in next_zone().

--
/*
 * next_zone - helper magic for for_each_zone()
 */
struct zone *next_zone(struct zone *zone)
{
        pg_data_t *pgdat = zone->zone_pgdat;

        if (zone < pgdat->node_zones + MAX_NR_ZONES - 1)
                zone++;
        else {
                pgdat = next_online_pgdat(pgdat);
                if (pgdat)
                        zone = pgdat->node_zones;
                else
                        zone = NULL;
        }
        return zone;
}

> But anyway, the bug is real, we must fix it.
> 
> Regards,
> Gu
> 
>>
>> I add some printk, it shows the above pgdat is just the offline node's pgdat.
>> The reason may be that for_each_zone() and for_each_populated_zone() are lock-less.
>> And stop machine could not resolve it, because cond_resched() maybe in cyclical code.
>>
>> [ 1422.011064] BUG: unable to handle kernel paging request at 0000000000025f60
>> [ 1422.011086] IP: [<ffffffff81126b91>] next_online_pgdat+0x1/0x50
>> [ 1422.011178] PGD 0 
>> [ 1422.011180] Oops: 0000 [#1] SMP 
>> [ 1422.011409] ACPI: Device does not support D3cold
>> [ 1422.011961] Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf]
>> [ 1422.012006] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G           O 3.10.15-5885-euler0302 #1
>> [ 1422.012024] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 03/02/2015
>> [ 1422.012065] Workqueue: events vmstat_update
>> [ 1422.012084] task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000
>> [ 1422.012165] RIP: 0010:[<ffffffff81126b91>]  [<ffffffff81126b91>] next_online_pgdat+0x1/0x50
>> [ 1422.012205] RSP: 0018:ffffa800d32afce8  EFLAGS: 00010286
>> [ 1422.012225] RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082
>> [ 1422.012226] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000
>> [ 1422.012254] RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96
>> [ 1422.012272] R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440
>> [ 1422.012290] R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800
>> [ 1422.012292] FS:  0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000
>> [ 1422.012314] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 1422.012328] CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0
>> [ 1422.012328] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 1422.012328] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 1422.012328] Stack:
>> [ 1422.012328]  ffffa800d32afd28 ffffffff81126ca5 ffffa800ffffffff ffffffff814b4314
>> [ 1422.012328]  ffffa800d32ae010 0000000000000000 ffffa800e6616180 ffffa800fffb3440
>> [ 1422.012328]  ffffa800d32afde8 ffffffff81128220 ffffffff00000013 0000000000000038
>> [ 1422.012328] Call Trace:
>> [ 1422.012328]  [<ffffffff81126ca5>] ? next_zone+0xc5/0x150
>> [ 1422.012328]  [<ffffffff814b4314>] ? __schedule+0x544/0x780
>> [ 1422.012328]  [<ffffffff81128220>] refresh_cpu_vm_stats+0xd0/0x140
>> [ 1422.012328]  [<ffffffff811282a1>] vmstat_update+0x11/0x50
>> [ 1422.012328]  [<ffffffff81064c24>] process_one_work+0x194/0x3d0
>> [ 1422.012328]  [<ffffffff810660bb>] worker_thread+0x12b/0x410
>> [ 1422.012328]  [<ffffffff81065f90>] ? manage_workers+0x1a0/0x1a0
>> [ 1422.012328]  [<ffffffff8106ba66>] kthread+0xc6/0xd0
>> [ 1422.012328]  [<ffffffff8106b9a0>] ? kthread_freezable_should_stop+0x70/0x70
>> [ 1422.012328]  [<ffffffff814be0ac>] ret_from_fork+0x7c/0xb0
>> [ 1422.012328]  [<ffffffff8106b9a0>] ? kthread_freezable_should_stop+0x70/0x70
>>
>> Thanks,
>> Xishi Qiu
>>
>>> Regards,
>>> Gu
>>>
>>>>
>>>> Thanks,
>>>> Xishi Qiu
>>>>
>>>> --
>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>> see: http://www.linux-mm.org/ .
>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>>
>>>
>>>
>>>
>>> .
>>>
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>> .
>>
> 
> 
> 
> .
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
  2015-03-04  8:03           ` Xishi Qiu
@ 2015-03-04  8:53             ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 32+ messages in thread
From: Kamezawa Hiroyuki @ 2015-03-04  8:53 UTC (permalink / raw)
  To: Xishi Qiu, Gu Zheng
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo, Li Zefan

On 2015/03/04 17:03, Xishi Qiu wrote:
> On 2015/3/4 11:56, Gu Zheng wrote:
>
>> Hi Xishi,
>> On 03/04/2015 10:52 AM, Xishi Qiu wrote:
>>
>>> On 2015/3/4 10:22, Xishi Qiu wrote:
>>>
>>>> On 2015/3/3 18:20, Gu Zheng wrote:
>>>>
>>>>> Hi Xishi,
>>>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>>>
>>>>>> When hot-remove a numa node, we will clear pgdat,
>>>>>> but is memset 0 safe in try_offline_node()?
>>>>>
>>>>> It is not safe here. In fact, this is a temporary solution here.
>>>>> As you know, pgdat is accessed lock-less now, so protection
>>>>> mechanism (RCU?) is needed to make it completely safe here,
>>>>> but it seems a bit over-kill.
>>>>>
>>>
>>> Hi Gu,
>>>
>>> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ?
>>> I find this will be fine in the stress test except the warning
>>> when hot-add memory.
>>
>> As you see, it will trigger the warning in free_area_init_node().
>> Could you try the following patch? It will reset the pgdat before reuse it.
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 1778628..0717649 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>>                          return NULL;
>>
>>                  arch_refresh_nodedata(nid, pgdat);
>> +       } else {
>> +               /* Reset the pgdat to reuse */
>> +               memset(pgdat, 0, sizeof(*pgdat));
>>          }
>
> Hi Gu,
>
> If schedule last a long time, next_zone may be still access the pgdat here,
> so it is not safe enough, right?
>

How about just reseting pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than
memset() ?

It seems breaking pointer information in pgdat is not a choice.
Just proper "values" should be reset.

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-04  8:53             ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 32+ messages in thread
From: Kamezawa Hiroyuki @ 2015-03-04  8:53 UTC (permalink / raw)
  To: Xishi Qiu, Gu Zheng
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo, Li Zefan

On 2015/03/04 17:03, Xishi Qiu wrote:
> On 2015/3/4 11:56, Gu Zheng wrote:
>
>> Hi Xishi,
>> On 03/04/2015 10:52 AM, Xishi Qiu wrote:
>>
>>> On 2015/3/4 10:22, Xishi Qiu wrote:
>>>
>>>> On 2015/3/3 18:20, Gu Zheng wrote:
>>>>
>>>>> Hi Xishi,
>>>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>>>
>>>>>> When hot-remove a numa node, we will clear pgdat,
>>>>>> but is memset 0 safe in try_offline_node()?
>>>>>
>>>>> It is not safe here. In fact, this is a temporary solution here.
>>>>> As you know, pgdat is accessed lock-less now, so protection
>>>>> mechanism (RCUi 1/4 ?) is needed to make it completely safe here,
>>>>> but it seems a bit over-kill.
>>>>>
>>>
>>> Hi Gu,
>>>
>>> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ?
>>> I find this will be fine in the stress test except the warning
>>> when hot-add memory.
>>
>> As you see, it will trigger the warning in free_area_init_node().
>> Could you try the following patch? It will reset the pgdat before reuse it.
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 1778628..0717649 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>>                          return NULL;
>>
>>                  arch_refresh_nodedata(nid, pgdat);
>> +       } else {
>> +               /* Reset the pgdat to reuse */
>> +               memset(pgdat, 0, sizeof(*pgdat));
>>          }
>
> Hi Gu,
>
> If schedule last a long time, next_zone may be still access the pgdat here,
> so it is not safe enough, right?
>

How about just reseting pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than
memset() ?

It seems breaking pointer information in pgdat is not a choice.
Just proper "values" should be reset.

Thanks,
-Kame



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
  2015-03-04  8:53             ` Kamezawa Hiroyuki
@ 2015-03-04  9:53               ` Gu Zheng
  -1 siblings, 0 replies; 32+ messages in thread
From: Gu Zheng @ 2015-03-04  9:53 UTC (permalink / raw)
  To: Kamezawa Hiroyuki, Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo, Li Zefan, Taku Izumi

On 03/04/2015 04:53 PM, Kamezawa Hiroyuki wrote:

> On 2015/03/04 17:03, Xishi Qiu wrote:
>> On 2015/3/4 11:56, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/04/2015 10:52 AM, Xishi Qiu wrote:
>>>
>>>> On 2015/3/4 10:22, Xishi Qiu wrote:
>>>>
>>>>> On 2015/3/3 18:20, Gu Zheng wrote:
>>>>>
>>>>>> Hi Xishi,
>>>>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>>>>
>>>>>>> When hot-remove a numa node, we will clear pgdat,
>>>>>>> but is memset 0 safe in try_offline_node()?
>>>>>>
>>>>>> It is not safe here. In fact, this is a temporary solution here.
>>>>>> As you know, pgdat is accessed lock-less now, so protection
>>>>>> mechanism (RCU?) is needed to make it completely safe here,
>>>>>> but it seems a bit over-kill.
>>>>>>
>>>>
>>>> Hi Gu,
>>>>
>>>> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ?
>>>> I find this will be fine in the stress test except the warning
>>>> when hot-add memory.
>>>
>>> As you see, it will trigger the warning in free_area_init_node().
>>> Could you try the following patch? It will reset the pgdat before reuse it.
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 1778628..0717649 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>>>                          return NULL;
>>>
>>>                  arch_refresh_nodedata(nid, pgdat);
>>> +       } else {
>>> +               /* Reset the pgdat to reuse */
>>> +               memset(pgdat, 0, sizeof(*pgdat));
>>>          }
>>
>> Hi Gu,
>>
>> If schedule last a long time, next_zone may be still access the pgdat here,
>> so it is not safe enough, right?


Hi Xishi,

IMO, the scheduled time is rather short if compares with the time gap
between hot remove and hot re-add a node, so we can say it is safe here.

>>
> 
> How about just reseting pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than
> memset() ?
> 
> It seems breaking pointer information in pgdat is not a choice.
> Just proper "values" should be reset.

Anyway, sounds reasonable.

Best regards,
Gu

> 
> Thanks,
> -Kame
> 
> 
> 
> .
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-04  9:53               ` Gu Zheng
  0 siblings, 0 replies; 32+ messages in thread
From: Gu Zheng @ 2015-03-04  9:53 UTC (permalink / raw)
  To: Kamezawa Hiroyuki, Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo, Xiexiuqi,
	Hanjun Guo, Li Zefan, Taku Izumi

On 03/04/2015 04:53 PM, Kamezawa Hiroyuki wrote:

> On 2015/03/04 17:03, Xishi Qiu wrote:
>> On 2015/3/4 11:56, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/04/2015 10:52 AM, Xishi Qiu wrote:
>>>
>>>> On 2015/3/4 10:22, Xishi Qiu wrote:
>>>>
>>>>> On 2015/3/3 18:20, Gu Zheng wrote:
>>>>>
>>>>>> Hi Xishi,
>>>>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>>>>
>>>>>>> When hot-remove a numa node, we will clear pgdat,
>>>>>>> but is memset 0 safe in try_offline_node()?
>>>>>>
>>>>>> It is not safe here. In fact, this is a temporary solution here.
>>>>>> As you know, pgdat is accessed lock-less now, so protection
>>>>>> mechanism (RCU?) is needed to make it completely safe here,
>>>>>> but it seems a bit over-kill.
>>>>>>
>>>>
>>>> Hi Gu,
>>>>
>>>> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ?
>>>> I find this will be fine in the stress test except the warning
>>>> when hot-add memory.
>>>
>>> As you see, it will trigger the warning in free_area_init_node().
>>> Could you try the following patch? It will reset the pgdat before reuse it.
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 1778628..0717649 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>>>                          return NULL;
>>>
>>>                  arch_refresh_nodedata(nid, pgdat);
>>> +       } else {
>>> +               /* Reset the pgdat to reuse */
>>> +               memset(pgdat, 0, sizeof(*pgdat));
>>>          }
>>
>> Hi Gu,
>>
>> If schedule last a long time, next_zone may be still access the pgdat here,
>> so it is not safe enough, right?


Hi Xishi,

IMO, the scheduled time is rather short if compares with the time gap
between hot remove and hot re-add a node, so we can say it is safe here.

>>
> 
> How about just reseting pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than
> memset() ?
> 
> It seems breaking pointer information in pgdat is not a choice.
> Just proper "values" should be reset.

Anyway, sounds reasonable.

Best regards,
Gu

> 
> Thanks,
> -Kame
> 
> 
> 
> .
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
  2015-03-03  3:30 ` Xishi Qiu
@ 2015-03-05  8:26   ` Gu Zheng
  -1 siblings, 0 replies; 32+ messages in thread
From: Gu Zheng @ 2015-03-05  8:26 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo

Hi Xishi,
Could you please try the following one?
It postpones the reset of obsolete pgdat from try_offline_node() to
hotadd_new_pgdat(), and just resetting pgdat->nr_zones and
pgdat->classzone_idx to be 0 rather than the whole reset by memset()
as Kame suggested.

Regards,
Gu

---
 mm/memory_hotplug.c |   13 ++++---------
 1 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1778628..c17eebf 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
 			return NULL;
 
 		arch_refresh_nodedata(nid, pgdat);
+	} else {
+		/* Reset the nr_zones and classzone_idx to 0 before reuse */
+		pgdat->nr_zones = 0;
+		pgdat->classzone_idx = 0;
 	}
 
 	/* we can use NODE_DATA(nid) from here */
@@ -2021,15 +2025,6 @@ void try_offline_node(int nid)
 
 	/* notify that the node is down */
 	call_node_notify(NODE_DOWN, (void *)(long)nid);
-
-	/*
-	 * Since there is no way to guarentee the address of pgdat/zone is not
-	 * on stack of any kernel threads or used by other kernel objects
-	 * without reference counting or other symchronizing method, do not
-	 * reset node_data and free pgdat here. Just reset it to 0 and reuse
-	 * the memory when the node is online again.
-	 */
-	memset(pgdat, 0, sizeof(*pgdat));
 }
 EXPORT_SYMBOL(try_offline_node);
 
-- 
1.7.7



On 03/03/2015 11:30 AM, Xishi Qiu wrote:

> When hot-remove a numa node, we will clear pgdat,
> but is memset 0 safe in try_offline_node()?
> 
> process A:			offline node XX:
> for_each_populated_zone()
> find online node XX
> cond_resched()
> 				offline cpu and memory, then try_offline_node()
> 				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
> access node XX's pgdat
> NULL pointer access error
> 
> Thanks,
> Xishi Qiu
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-05  8:26   ` Gu Zheng
  0 siblings, 0 replies; 32+ messages in thread
From: Gu Zheng @ 2015-03-05  8:26 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo

Hi Xishi,
Could you please try the following one?
It postpones the reset of obsolete pgdat from try_offline_node() to
hotadd_new_pgdat(), and just resetting pgdat->nr_zones and
pgdat->classzone_idx to be 0 rather than the whole reset by memset()
as Kame suggested.

Regards,
Gu

---
 mm/memory_hotplug.c |   13 ++++---------
 1 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1778628..c17eebf 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
 			return NULL;
 
 		arch_refresh_nodedata(nid, pgdat);
+	} else {
+		/* Reset the nr_zones and classzone_idx to 0 before reuse */
+		pgdat->nr_zones = 0;
+		pgdat->classzone_idx = 0;
 	}
 
 	/* we can use NODE_DATA(nid) from here */
@@ -2021,15 +2025,6 @@ void try_offline_node(int nid)
 
 	/* notify that the node is down */
 	call_node_notify(NODE_DOWN, (void *)(long)nid);
-
-	/*
-	 * Since there is no way to guarentee the address of pgdat/zone is not
-	 * on stack of any kernel threads or used by other kernel objects
-	 * without reference counting or other symchronizing method, do not
-	 * reset node_data and free pgdat here. Just reset it to 0 and reuse
-	 * the memory when the node is online again.
-	 */
-	memset(pgdat, 0, sizeof(*pgdat));
 }
 EXPORT_SYMBOL(try_offline_node);
 
-- 
1.7.7



On 03/03/2015 11:30 AM, Xishi Qiu wrote:

> When hot-remove a numa node, we will clear pgdat,
> but is memset 0 safe in try_offline_node()?
> 
> process A:			offline node XX:
> for_each_populated_zone()
> find online node XX
> cond_resched()
> 				offline cpu and memory, then try_offline_node()
> 				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
> access node XX's pgdat
> NULL pointer access error
> 
> Thanks,
> Xishi Qiu
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
  2015-03-05  8:26   ` Gu Zheng
@ 2015-03-05  9:39     ` Xishi Qiu
  -1 siblings, 0 replies; 32+ messages in thread
From: Xishi Qiu @ 2015-03-05  9:39 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo,
	Kamezawa Hiroyuki

On 2015/3/5 16:26, Gu Zheng wrote:

> Hi Xishi,
> Could you please try the following one?
> It postpones the reset of obsolete pgdat from try_offline_node() to
> hotadd_new_pgdat(), and just resetting pgdat->nr_zones and
> pgdat->classzone_idx to be 0 rather than the whole reset by memset()
> as Kame suggested.
> 
> Regards,
> Gu
> 
> ---
>  mm/memory_hotplug.c |   13 ++++---------
>  1 files changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1778628..c17eebf 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>  			return NULL;
>  
>  		arch_refresh_nodedata(nid, pgdat);
> +	} else {
> +		/* Reset the nr_zones and classzone_idx to 0 before reuse */
> +		pgdat->nr_zones = 0;
> +		pgdat->classzone_idx = 0;

Hi Gu,

This is just to avoid the warning, I think it's no meaning.
Here is the changlog from the original patch:

commit 88fdf75d1bb51d85ba00c466391770056d44bc03
    ...
    Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero
    when it is allocated.  Arch code and memory hotplug already initiailize
    pg_data_t.  So this warning should never happen.  I select fields *randomly*
    near the beginning, middle and end of pg_data_t for checking.
    ...

Thanks,
Xishi Qiu

>  	}
>  
>  	/* we can use NODE_DATA(nid) from here */
> @@ -2021,15 +2025,6 @@ void try_offline_node(int nid)
>  
>  	/* notify that the node is down */
>  	call_node_notify(NODE_DOWN, (void *)(long)nid);
> -
> -	/*
> -	 * Since there is no way to guarentee the address of pgdat/zone is not
> -	 * on stack of any kernel threads or used by other kernel objects
> -	 * without reference counting or other symchronizing method, do not
> -	 * reset node_data and free pgdat here. Just reset it to 0 and reuse
> -	 * the memory when the node is online again.
> -	 */
> -	memset(pgdat, 0, sizeof(*pgdat));
>  }
>  EXPORT_SYMBOL(try_offline_node);
>  




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-05  9:39     ` Xishi Qiu
  0 siblings, 0 replies; 32+ messages in thread
From: Xishi Qiu @ 2015-03-05  9:39 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo,
	Kamezawa Hiroyuki

On 2015/3/5 16:26, Gu Zheng wrote:

> Hi Xishi,
> Could you please try the following one?
> It postpones the reset of obsolete pgdat from try_offline_node() to
> hotadd_new_pgdat(), and just resetting pgdat->nr_zones and
> pgdat->classzone_idx to be 0 rather than the whole reset by memset()
> as Kame suggested.
> 
> Regards,
> Gu
> 
> ---
>  mm/memory_hotplug.c |   13 ++++---------
>  1 files changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1778628..c17eebf 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>  			return NULL;
>  
>  		arch_refresh_nodedata(nid, pgdat);
> +	} else {
> +		/* Reset the nr_zones and classzone_idx to 0 before reuse */
> +		pgdat->nr_zones = 0;
> +		pgdat->classzone_idx = 0;

Hi Gu,

This is just to avoid the warning, I think it's no meaning.
Here is the changlog from the original patch:

commit 88fdf75d1bb51d85ba00c466391770056d44bc03
    ...
    Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero
    when it is allocated.  Arch code and memory hotplug already initiailize
    pg_data_t.  So this warning should never happen.  I select fields *randomly*
    near the beginning, middle and end of pg_data_t for checking.
    ...

Thanks,
Xishi Qiu

>  	}
>  
>  	/* we can use NODE_DATA(nid) from here */
> @@ -2021,15 +2025,6 @@ void try_offline_node(int nid)
>  
>  	/* notify that the node is down */
>  	call_node_notify(NODE_DOWN, (void *)(long)nid);
> -
> -	/*
> -	 * Since there is no way to guarentee the address of pgdat/zone is not
> -	 * on stack of any kernel threads or used by other kernel objects
> -	 * without reference counting or other symchronizing method, do not
> -	 * reset node_data and free pgdat here. Just reset it to 0 and reuse
> -	 * the memory when the node is online again.
> -	 */
> -	memset(pgdat, 0, sizeof(*pgdat));
>  }
>  EXPORT_SYMBOL(try_offline_node);
>  



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
  2015-03-05  9:39     ` Xishi Qiu
@ 2015-03-05  9:45       ` Gu Zheng
  -1 siblings, 0 replies; 32+ messages in thread
From: Gu Zheng @ 2015-03-05  9:45 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo,
	Kamezawa Hiroyuki

Hi Xishi,

On 03/05/2015 05:39 PM, Xishi Qiu wrote:

> On 2015/3/5 16:26, Gu Zheng wrote:
> 
>> Hi Xishi,
>> Could you please try the following one?
>> It postpones the reset of obsolete pgdat from try_offline_node() to
>> hotadd_new_pgdat(), and just resetting pgdat->nr_zones and
>> pgdat->classzone_idx to be 0 rather than the whole reset by memset()
>> as Kame suggested.
>>
>> Regards,
>> Gu
>>
>> ---
>>  mm/memory_hotplug.c |   13 ++++---------
>>  1 files changed, 4 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 1778628..c17eebf 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>>  			return NULL;
>>  
>>  		arch_refresh_nodedata(nid, pgdat);
>> +	} else {
>> +		/* Reset the nr_zones and classzone_idx to 0 before reuse */
>> +		pgdat->nr_zones = 0;
>> +		pgdat->classzone_idx = 0;
> 
> Hi Gu,
> 
> This is just to avoid the warning, I think it's no meaning.

Can not agree.
The key point here is postponing the reset of obsolete pgdat to the time we
want to reuse it to avoid the effect(Oops: 0000 as you mentioned), and avoiding
warning is the minor benefit, though it is also important.

> Here is the changlog from the original patch:
> 
> commit 88fdf75d1bb51d85ba00c466391770056d44bc03
>     ...
>     Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero
>     when it is allocated.  Arch code and memory hotplug already initiailize
>     pg_data_t.  So this warning should never happen.  I select fields *randomly*
>     near the beginning, middle and end of pg_data_t for checking.
>     ...

There was not hot remove node that time, so it seems did not consider the *reuse*
case, but anyway, we should not break it here.

Regards,
Gu

> 
> Thanks,
> Xishi Qiu
> 
>>  	}
>>  
>>  	/* we can use NODE_DATA(nid) from here */
>> @@ -2021,15 +2025,6 @@ void try_offline_node(int nid)
>>  
>>  	/* notify that the node is down */
>>  	call_node_notify(NODE_DOWN, (void *)(long)nid);
>> -
>> -	/*
>> -	 * Since there is no way to guarentee the address of pgdat/zone is not
>> -	 * on stack of any kernel threads or used by other kernel objects
>> -	 * without reference counting or other symchronizing method, do not
>> -	 * reset node_data and free pgdat here. Just reset it to 0 and reuse
>> -	 * the memory when the node is online again.
>> -	 */
>> -	memset(pgdat, 0, sizeof(*pgdat));
>>  }
>>  EXPORT_SYMBOL(try_offline_node);
>>  
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> .
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-05  9:45       ` Gu Zheng
  0 siblings, 0 replies; 32+ messages in thread
From: Gu Zheng @ 2015-03-05  9:45 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo,
	Kamezawa Hiroyuki

Hi Xishi,

On 03/05/2015 05:39 PM, Xishi Qiu wrote:

> On 2015/3/5 16:26, Gu Zheng wrote:
> 
>> Hi Xishi,
>> Could you please try the following one?
>> It postpones the reset of obsolete pgdat from try_offline_node() to
>> hotadd_new_pgdat(), and just resetting pgdat->nr_zones and
>> pgdat->classzone_idx to be 0 rather than the whole reset by memset()
>> as Kame suggested.
>>
>> Regards,
>> Gu
>>
>> ---
>>  mm/memory_hotplug.c |   13 ++++---------
>>  1 files changed, 4 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 1778628..c17eebf 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>>  			return NULL;
>>  
>>  		arch_refresh_nodedata(nid, pgdat);
>> +	} else {
>> +		/* Reset the nr_zones and classzone_idx to 0 before reuse */
>> +		pgdat->nr_zones = 0;
>> +		pgdat->classzone_idx = 0;
> 
> Hi Gu,
> 
> This is just to avoid the warning, I think it's no meaning.

Can not agree.
The key point here is postponing the reset of obsolete pgdat to the time we
want to reuse it to avoid the effect(Oops: 0000 as you mentioned), and avoiding
warning is the minor benefit, though it is also important.

> Here is the changlog from the original patch:
> 
> commit 88fdf75d1bb51d85ba00c466391770056d44bc03
>     ...
>     Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero
>     when it is allocated.  Arch code and memory hotplug already initiailize
>     pg_data_t.  So this warning should never happen.  I select fields *randomly*
>     near the beginning, middle and end of pg_data_t for checking.
>     ...

There was not hot remove node that time, so it seems did not consider the *reuse*
case, but anyway, we should not break it here.

Regards,
Gu

> 
> Thanks,
> Xishi Qiu
> 
>>  	}
>>  
>>  	/* we can use NODE_DATA(nid) from here */
>> @@ -2021,15 +2025,6 @@ void try_offline_node(int nid)
>>  
>>  	/* notify that the node is down */
>>  	call_node_notify(NODE_DOWN, (void *)(long)nid);
>> -
>> -	/*
>> -	 * Since there is no way to guarentee the address of pgdat/zone is not
>> -	 * on stack of any kernel threads or used by other kernel objects
>> -	 * without reference counting or other symchronizing method, do not
>> -	 * reset node_data and free pgdat here. Just reset it to 0 and reuse
>> -	 * the memory when the node is online again.
>> -	 */
>> -	memset(pgdat, 0, sizeof(*pgdat));
>>  }
>>  EXPORT_SYMBOL(try_offline_node);
>>  
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> .
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
  2015-03-05  9:39     ` Xishi Qiu
@ 2015-03-11  1:12       ` Gu Zheng
  -1 siblings, 0 replies; 32+ messages in thread
From: Gu Zheng @ 2015-03-11  1:12 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo,
	Kamezawa Hiroyuki

Hi Xishi,

What is the condition of this problem now?

Regards,
Gu
On 03/05/2015 05:39 PM, Xishi Qiu wrote:

> On 2015/3/5 16:26, Gu Zheng wrote:
> 
>> Hi Xishi,
>> Could you please try the following one?
>> It postpones the reset of obsolete pgdat from try_offline_node() to
>> hotadd_new_pgdat(), and just resetting pgdat->nr_zones and
>> pgdat->classzone_idx to be 0 rather than the whole reset by memset()
>> as Kame suggested.
>>
>> Regards,
>> Gu
>>
>> ---
>>  mm/memory_hotplug.c |   13 ++++---------
>>  1 files changed, 4 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 1778628..c17eebf 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>>  			return NULL;
>>  
>>  		arch_refresh_nodedata(nid, pgdat);
>> +	} else {
>> +		/* Reset the nr_zones and classzone_idx to 0 before reuse */
>> +		pgdat->nr_zones = 0;
>> +		pgdat->classzone_idx = 0;
> 
> Hi Gu,
> 
> This is just to avoid the warning, I think it's no meaning.
> Here is the changlog from the original patch:
> 
> commit 88fdf75d1bb51d85ba00c466391770056d44bc03
>     ...
>     Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero
>     when it is allocated.  Arch code and memory hotplug already initiailize
>     pg_data_t.  So this warning should never happen.  I select fields *randomly*
>     near the beginning, middle and end of pg_data_t for checking.
>     ...
> 
> Thanks,
> Xishi Qiu
> 
>>  	}
>>  
>>  	/* we can use NODE_DATA(nid) from here */
>> @@ -2021,15 +2025,6 @@ void try_offline_node(int nid)
>>  
>>  	/* notify that the node is down */
>>  	call_node_notify(NODE_DOWN, (void *)(long)nid);
>> -
>> -	/*
>> -	 * Since there is no way to guarentee the address of pgdat/zone is not
>> -	 * on stack of any kernel threads or used by other kernel objects
>> -	 * without reference counting or other symchronizing method, do not
>> -	 * reset node_data and free pgdat here. Just reset it to 0 and reuse
>> -	 * the memory when the node is online again.
>> -	 */
>> -	memset(pgdat, 0, sizeof(*pgdat));
>>  }
>>  EXPORT_SYMBOL(try_offline_node);
>>  
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> .
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-11  1:12       ` Gu Zheng
  0 siblings, 0 replies; 32+ messages in thread
From: Gu Zheng @ 2015-03-11  1:12 UTC (permalink / raw)
  To: Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo,
	Kamezawa Hiroyuki

Hi Xishi,

What is the condition of this problem now?

Regards,
Gu
On 03/05/2015 05:39 PM, Xishi Qiu wrote:

> On 2015/3/5 16:26, Gu Zheng wrote:
> 
>> Hi Xishi,
>> Could you please try the following one?
>> It postpones the reset of obsolete pgdat from try_offline_node() to
>> hotadd_new_pgdat(), and just resetting pgdat->nr_zones and
>> pgdat->classzone_idx to be 0 rather than the whole reset by memset()
>> as Kame suggested.
>>
>> Regards,
>> Gu
>>
>> ---
>>  mm/memory_hotplug.c |   13 ++++---------
>>  1 files changed, 4 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 1778628..c17eebf 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>>  			return NULL;
>>  
>>  		arch_refresh_nodedata(nid, pgdat);
>> +	} else {
>> +		/* Reset the nr_zones and classzone_idx to 0 before reuse */
>> +		pgdat->nr_zones = 0;
>> +		pgdat->classzone_idx = 0;
> 
> Hi Gu,
> 
> This is just to avoid the warning, I think it's no meaning.
> Here is the changlog from the original patch:
> 
> commit 88fdf75d1bb51d85ba00c466391770056d44bc03
>     ...
>     Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero
>     when it is allocated.  Arch code and memory hotplug already initiailize
>     pg_data_t.  So this warning should never happen.  I select fields *randomly*
>     near the beginning, middle and end of pg_data_t for checking.
>     ...
> 
> Thanks,
> Xishi Qiu
> 
>>  	}
>>  
>>  	/* we can use NODE_DATA(nid) from here */
>> @@ -2021,15 +2025,6 @@ void try_offline_node(int nid)
>>  
>>  	/* notify that the node is down */
>>  	call_node_notify(NODE_DOWN, (void *)(long)nid);
>> -
>> -	/*
>> -	 * Since there is no way to guarentee the address of pgdat/zone is not
>> -	 * on stack of any kernel threads or used by other kernel objects
>> -	 * without reference counting or other symchronizing method, do not
>> -	 * reset node_data and free pgdat here. Just reset it to 0 and reuse
>> -	 * the memory when the node is online again.
>> -	 */
>> -	memset(pgdat, 0, sizeof(*pgdat));
>>  }
>>  EXPORT_SYMBOL(try_offline_node);
>>  
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> .
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
  2015-03-11  1:12       ` Gu Zheng
@ 2015-03-11  2:51         ` Xie XiuQi
  -1 siblings, 0 replies; 32+ messages in thread
From: Xie XiuQi @ 2015-03-11  2:51 UTC (permalink / raw)
  To: Gu Zheng, Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo,
	Kamezawa Hiroyuki

On 2015/3/11 9:12, Gu Zheng wrote:
> Hi Xishi,
> 
> What is the condition of this problem now?

Hi Gu,

I have no machine to do this test now. But I've tested the
patch "just remove memset 0" more than 20 hours last week,
it's OK.

Thanks,
	Xie XiuQi

> 
> Regards,
> Gu
> On 03/05/2015 05:39 PM, Xishi Qiu wrote:
> 
>> On 2015/3/5 16:26, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> Could you please try the following one?
>>> It postpones the reset of obsolete pgdat from try_offline_node() to
>>> hotadd_new_pgdat(), and just resetting pgdat->nr_zones and
>>> pgdat->classzone_idx to be 0 rather than the whole reset by memset()
>>> as Kame suggested.
>>>
>>> Regards,
>>> Gu
>>>
>>> ---
>>>  mm/memory_hotplug.c |   13 ++++---------
>>>  1 files changed, 4 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 1778628..c17eebf 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>>>  			return NULL;
>>>  
>>>  		arch_refresh_nodedata(nid, pgdat);
>>> +	} else {
>>> +		/* Reset the nr_zones and classzone_idx to 0 before reuse */
>>> +		pgdat->nr_zones = 0;
>>> +		pgdat->classzone_idx = 0;
>>
>> Hi Gu,
>>
>> This is just to avoid the warning, I think it's no meaning.
>> Here is the changlog from the original patch:
>>
>> commit 88fdf75d1bb51d85ba00c466391770056d44bc03
>>     ...
>>     Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero
>>     when it is allocated.  Arch code and memory hotplug already initiailize
>>     pg_data_t.  So this warning should never happen.  I select fields *randomly*
>>     near the beginning, middle and end of pg_data_t for checking.
>>     ...
>>
>> Thanks,
>> Xishi Qiu
>>
>>>  	}
>>>  
>>>  	/* we can use NODE_DATA(nid) from here */
>>> @@ -2021,15 +2025,6 @@ void try_offline_node(int nid)
>>>  
>>>  	/* notify that the node is down */
>>>  	call_node_notify(NODE_DOWN, (void *)(long)nid);
>>> -
>>> -	/*
>>> -	 * Since there is no way to guarentee the address of pgdat/zone is not
>>> -	 * on stack of any kernel threads or used by other kernel objects
>>> -	 * without reference counting or other symchronizing method, do not
>>> -	 * reset node_data and free pgdat here. Just reset it to 0 and reuse
>>> -	 * the memory when the node is online again.
>>> -	 */
>>> -	memset(pgdat, 0, sizeof(*pgdat));
>>>  }
>>>  EXPORT_SYMBOL(try_offline_node);
>>>  
>>
>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> .
>>
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> .
> 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: node-hotplug: is memset 0 safe in try_offline_node()?
@ 2015-03-11  2:51         ` Xie XiuQi
  0 siblings, 0 replies; 32+ messages in thread
From: Xie XiuQi @ 2015-03-11  2:51 UTC (permalink / raw)
  To: Gu Zheng, Xishi Qiu
  Cc: Yasuaki Ishimatsu, Andrew Morton, Tang Chen, Yinghai Lu,
	Linux MM, LKML, Toshi Kani, Mel Gorman, Tejun Heo,
	Kamezawa Hiroyuki

On 2015/3/11 9:12, Gu Zheng wrote:
> Hi Xishi,
> 
> What is the condition of this problem now?

Hi Gu,

I have no machine to do this test now. But I've tested the
patch "just remove memset 0" more than 20 hours last week,
it's OK.

Thanks,
	Xie XiuQi

> 
> Regards,
> Gu
> On 03/05/2015 05:39 PM, Xishi Qiu wrote:
> 
>> On 2015/3/5 16:26, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> Could you please try the following one?
>>> It postpones the reset of obsolete pgdat from try_offline_node() to
>>> hotadd_new_pgdat(), and just resetting pgdat->nr_zones and
>>> pgdat->classzone_idx to be 0 rather than the whole reset by memset()
>>> as Kame suggested.
>>>
>>> Regards,
>>> Gu
>>>
>>> ---
>>>  mm/memory_hotplug.c |   13 ++++---------
>>>  1 files changed, 4 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 1778628..c17eebf 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>>>  			return NULL;
>>>  
>>>  		arch_refresh_nodedata(nid, pgdat);
>>> +	} else {
>>> +		/* Reset the nr_zones and classzone_idx to 0 before reuse */
>>> +		pgdat->nr_zones = 0;
>>> +		pgdat->classzone_idx = 0;
>>
>> Hi Gu,
>>
>> This is just to avoid the warning, I think it's no meaning.
>> Here is the changlog from the original patch:
>>
>> commit 88fdf75d1bb51d85ba00c466391770056d44bc03
>>     ...
>>     Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero
>>     when it is allocated.  Arch code and memory hotplug already initiailize
>>     pg_data_t.  So this warning should never happen.  I select fields *randomly*
>>     near the beginning, middle and end of pg_data_t for checking.
>>     ...
>>
>> Thanks,
>> Xishi Qiu
>>
>>>  	}
>>>  
>>>  	/* we can use NODE_DATA(nid) from here */
>>> @@ -2021,15 +2025,6 @@ void try_offline_node(int nid)
>>>  
>>>  	/* notify that the node is down */
>>>  	call_node_notify(NODE_DOWN, (void *)(long)nid);
>>> -
>>> -	/*
>>> -	 * Since there is no way to guarentee the address of pgdat/zone is not
>>> -	 * on stack of any kernel threads or used by other kernel objects
>>> -	 * without reference counting or other symchronizing method, do not
>>> -	 * reset node_data and free pgdat here. Just reset it to 0 and reuse
>>> -	 * the memory when the node is online again.
>>> -	 */
>>> -	memset(pgdat, 0, sizeof(*pgdat));
>>>  }
>>>  EXPORT_SYMBOL(try_offline_node);
>>>  
>>
>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> .
>>
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> .
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2015-03-11  2:54 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-03  3:30 node-hotplug: is memset 0 safe in try_offline_node()? Xishi Qiu
2015-03-03  3:30 ` Xishi Qiu
2015-03-03 10:20 ` Gu Zheng
2015-03-03 10:20   ` Gu Zheng
2015-03-04  2:22   ` Xishi Qiu
2015-03-04  2:22     ` Xishi Qiu
2015-03-04  2:52     ` Xishi Qiu
2015-03-04  2:52       ` Xishi Qiu
2015-03-04  3:56       ` Gu Zheng
2015-03-04  3:56         ` Gu Zheng
2015-03-04  8:03         ` Xishi Qiu
2015-03-04  8:03           ` Xishi Qiu
2015-03-04  8:53           ` Kamezawa Hiroyuki
2015-03-04  8:53             ` Kamezawa Hiroyuki
2015-03-04  9:53             ` Gu Zheng
2015-03-04  9:53               ` Gu Zheng
2015-03-04  3:53     ` Gu Zheng
2015-03-04  3:53       ` Gu Zheng
2015-03-04  7:01       ` Xishi Qiu
2015-03-04  7:01         ` Xishi Qiu
2015-03-04  8:31       ` Xie XiuQi
2015-03-04  8:31         ` Xie XiuQi
2015-03-05  8:26 ` Gu Zheng
2015-03-05  8:26   ` Gu Zheng
2015-03-05  9:39   ` Xishi Qiu
2015-03-05  9:39     ` Xishi Qiu
2015-03-05  9:45     ` Gu Zheng
2015-03-05  9:45       ` Gu Zheng
2015-03-11  1:12     ` Gu Zheng
2015-03-11  1:12       ` Gu Zheng
2015-03-11  2:51       ` Xie XiuQi
2015-03-11  2:51         ` Xie XiuQi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.