* [PATCH 0/2] Fix node meminfo corruption. @ 2014-10-31 9:46 Tang Chen 2014-10-31 9:46 ` [PATCH 1/2] mem-hotplug: Reset node managed pages when hot-adding a new pgdat Tang Chen ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Tang Chen @ 2014-10-31 9:46 UTC (permalink / raw) To: akpm, santosh.shilimkar, grygorii.strashko, yinghai, isimatu.yasuaki, fabf, nzimmer, wangnan0, vdavydov, toshi.kani, phacht, tj, kirill.shutemov, riel, luto, hpa, aarcange, qiuxishi, mgorman, rientjes, hannes Cc: linux-mm, linux-kernel, tangchen There are two problems when calculating node meminfo: 1. When hot-adding a node without onlining any page, MemTotal corrupted. # hot-add node2 (memory not onlined) # cat /sys/device/system/node/node2/meminfo Node 2 MemTotal: 33554432 kB /* corrupted */ Node 2 MemFree: 0 kB Node 2 MemUsed: 33554432 kB Node 2 Active: 0 kB ...... 2. When onlining memory on node2, MemFree of node3 corrupted. # for ((i = 2048; i < 2064; i++)); do echo online_movable > /sys/devices/system/node/node2/memory$i/state; done # cat /sys/devices/system/node/node2/meminfo Node 2 MemTotal: 33554432 kB Node 2 MemFree: 33549092 kB Node 2 MemUsed: 5340 kB ...... # cat /sys/devices/system/node/node3/meminfo Node 3 MemTotal: 0 kB Node 3 MemFree: 248 kB /* corrupted */ Node 3 MemUsed: 0 kB ...... This patch-set fixes them. Tang Chen (2): mem-hotplug: Reset node managed pages when hot-adding a new pgdat. mem-hotplug: Fix wrong check for zone->pageset initialization in online_pages(). include/linux/bootmem.h | 1 + include/linux/mm.h | 1 + mm/bootmem.c | 9 +++++---- mm/memory_hotplug.c | 15 ++++++++++++++- mm/nobootmem.c | 8 +++++--- mm/page_alloc.c | 5 +++++ 6 files changed, 31 insertions(+), 8 deletions(-) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] mem-hotplug: Reset node managed pages when hot-adding a new pgdat. 2014-10-31 9:46 [PATCH 0/2] Fix node meminfo corruption Tang Chen @ 2014-10-31 9:46 ` Tang Chen 2014-11-04 1:50 ` Xishi Qiu 2014-10-31 9:46 ` [PATCH 2/2] mem-hotplug: Fix wrong check for zone->pageset initialization in online_pages() Tang Chen 2014-11-04 1:10 ` [PATCH 0/2] Fix node meminfo corruption Tang Chen 2 siblings, 1 reply; 9+ messages in thread From: Tang Chen @ 2014-10-31 9:46 UTC (permalink / raw) To: akpm, santosh.shilimkar, grygorii.strashko, yinghai, isimatu.yasuaki, fabf, nzimmer, wangnan0, vdavydov, toshi.kani, phacht, tj, kirill.shutemov, riel, luto, hpa, aarcange, qiuxishi, mgorman, rientjes, hannes Cc: linux-mm, linux-kernel, tangchen, Yasuaki Ishimatsu In free_area_init_core(), zone->managed_pages is set to an approximate value for lowmem, and will be adjusted when the bootmem allocator frees pages into the buddy system. But free_area_init_core() is also called by hotadd_new_pgdat() when hot-adding memory. As a result, zone->managed_pages of the newly added node's pgdat is set to an approximate value in the very beginning. Even if the memory on that node has node been onlined, /sys/device/system/node/nodeXXX/meminfo has wrong value. hot-add node2 (memory not onlined) cat /sys/device/system/node/node2/meminfo Node 2 MemTotal: 33554432 kB Node 2 MemFree: 0 kB Node 2 MemUsed: 33554432 kB Node 2 Active: 0 kB This patch fixes this problem by reset node managed pages to 0 after hot-adding a new node. 1. Move reset_managed_pages_done from reset_node_managed_pages() to reset_all_zones_managed_pages() 2. Make reset_node_managed_pages() non-static 3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat is initialized Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> --- include/linux/bootmem.h | 1 + mm/bootmem.c | 9 +++++---- mm/memory_hotplug.c | 9 +++++++++ mm/nobootmem.c | 8 +++++--- 4 files changed, 20 insertions(+), 7 deletions(-) diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h index 4e2bd4c..0995c2d 100644 --- a/include/linux/bootmem.h +++ b/include/linux/bootmem.h @@ -46,6 +46,7 @@ extern unsigned long init_bootmem_node(pg_data_t *pgdat, extern unsigned long init_bootmem(unsigned long addr, unsigned long memend); extern unsigned long free_all_bootmem(void); +extern void reset_node_managed_pages(pg_data_t *pgdat); extern void reset_all_zones_managed_pages(void); extern void free_bootmem_node(pg_data_t *pgdat, diff --git a/mm/bootmem.c b/mm/bootmem.c index 8a000ce..477be69 100644 --- a/mm/bootmem.c +++ b/mm/bootmem.c @@ -243,13 +243,10 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata) static int reset_managed_pages_done __initdata; -static inline void __init reset_node_managed_pages(pg_data_t *pgdat) +void reset_node_managed_pages(pg_data_t *pgdat) { struct zone *z; - if (reset_managed_pages_done) - return; - for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++) z->managed_pages = 0; } @@ -258,8 +255,12 @@ void __init reset_all_zones_managed_pages(void) { struct pglist_data *pgdat; + if (reset_managed_pages_done) + return; + for_each_online_pgdat(pgdat) reset_node_managed_pages(pgdat); + reset_managed_pages_done = 1; } diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 29d8693..3ab01b2 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -31,6 +31,7 @@ #include <linux/stop_machine.h> #include <linux/hugetlb.h> #include <linux/memblock.h> +#include <linux/bootmem.h> #include <asm/tlbflush.h> @@ -1096,6 +1097,14 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) build_all_zonelists(pgdat, NULL); mutex_unlock(&zonelists_mutex); + /* + * zone->managed_pages is set to an approximate value in + * free_area_init_core(), which will cause + * /sys/device/system/node/nodeX/meminfo has wrong data. + * So reset it to 0 before any memory is onlined. + */ + reset_node_managed_pages(pgdat); + return pgdat; } diff --git a/mm/nobootmem.c b/mm/nobootmem.c index 7c7ab32..90b5046 100644 --- a/mm/nobootmem.c +++ b/mm/nobootmem.c @@ -145,12 +145,10 @@ static unsigned long __init free_low_memory_core_early(void) static int reset_managed_pages_done __initdata; -static inline void __init reset_node_managed_pages(pg_data_t *pgdat) +void reset_node_managed_pages(pg_data_t *pgdat) { struct zone *z; - if (reset_managed_pages_done) - return; for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++) z->managed_pages = 0; } @@ -159,8 +157,12 @@ void __init reset_all_zones_managed_pages(void) { struct pglist_data *pgdat; + if (reset_managed_pages_done) + return; + for_each_online_pgdat(pgdat) reset_node_managed_pages(pgdat); + reset_managed_pages_done = 1; } -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] mem-hotplug: Reset node managed pages when hot-adding a new pgdat. 2014-10-31 9:46 ` [PATCH 1/2] mem-hotplug: Reset node managed pages when hot-adding a new pgdat Tang Chen @ 2014-11-04 1:50 ` Xishi Qiu 0 siblings, 0 replies; 9+ messages in thread From: Xishi Qiu @ 2014-11-04 1:50 UTC (permalink / raw) To: Tang Chen Cc: akpm, santosh.shilimkar, grygorii.strashko, yinghai, isimatu.yasuaki, fabf, nzimmer, wangnan0, vdavydov, toshi.kani, phacht, tj, kirill.shutemov, riel, luto, hpa, aarcange, mgorman, rientjes, hannes, linux-mm, linux-kernel, Yasuaki Ishimatsu On 2014/10/31 17:46, Tang Chen wrote: > In free_area_init_core(), zone->managed_pages is set to an approximate > value for lowmem, and will be adjusted when the bootmem allocator frees > pages into the buddy system. But free_area_init_core() is also called > by hotadd_new_pgdat() when hot-adding memory. As a result, zone->managed_pages > of the newly added node's pgdat is set to an approximate value in the > very beginning. Even if the memory on that node has node been onlined, > /sys/device/system/node/nodeXXX/meminfo has wrong value. > > hot-add node2 (memory not onlined) > cat /sys/device/system/node/node2/meminfo > Node 2 MemTotal: 33554432 kB > Node 2 MemFree: 0 kB > Node 2 MemUsed: 33554432 kB > Node 2 Active: 0 kB > > This patch fixes this problem by reset node managed pages to 0 after hot-adding > a new node. > > 1. Move reset_managed_pages_done from reset_node_managed_pages() to reset_all_zones_managed_pages() > 2. Make reset_node_managed_pages() non-static > 3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat is initialized > > Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> > Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> > --- > include/linux/bootmem.h | 1 + > mm/bootmem.c | 9 +++++---- > mm/memory_hotplug.c | 9 +++++++++ > mm/nobootmem.c | 8 +++++--- > 4 files changed, 20 insertions(+), 7 deletions(-) > > diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h > index 4e2bd4c..0995c2d 100644 > --- a/include/linux/bootmem.h > +++ b/include/linux/bootmem.h > @@ -46,6 +46,7 @@ extern unsigned long init_bootmem_node(pg_data_t *pgdat, > extern unsigned long init_bootmem(unsigned long addr, unsigned long memend); > > extern unsigned long free_all_bootmem(void); > +extern void reset_node_managed_pages(pg_data_t *pgdat); > extern void reset_all_zones_managed_pages(void); > > extern void free_bootmem_node(pg_data_t *pgdat, > diff --git a/mm/bootmem.c b/mm/bootmem.c > index 8a000ce..477be69 100644 > --- a/mm/bootmem.c > +++ b/mm/bootmem.c > @@ -243,13 +243,10 @@ static unsigned long __init free_all_bootmem_core(bootmem_data_t *bdata) > > static int reset_managed_pages_done __initdata; > > -static inline void __init reset_node_managed_pages(pg_data_t *pgdat) > +void reset_node_managed_pages(pg_data_t *pgdat) > { > struct zone *z; > > - if (reset_managed_pages_done) > - return; > - > for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++) > z->managed_pages = 0; > } > @@ -258,8 +255,12 @@ void __init reset_all_zones_managed_pages(void) > { > struct pglist_data *pgdat; > > + if (reset_managed_pages_done) > + return; > + > for_each_online_pgdat(pgdat) > reset_node_managed_pages(pgdat); > + > reset_managed_pages_done = 1; > } > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 29d8693..3ab01b2 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -31,6 +31,7 @@ > #include <linux/stop_machine.h> > #include <linux/hugetlb.h> > #include <linux/memblock.h> > +#include <linux/bootmem.h> > > #include <asm/tlbflush.h> > > @@ -1096,6 +1097,14 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) > build_all_zonelists(pgdat, NULL); > mutex_unlock(&zonelists_mutex); > > + /* > + * zone->managed_pages is set to an approximate value in > + * free_area_init_core(), which will cause > + * /sys/device/system/node/nodeX/meminfo has wrong data. > + * So reset it to 0 before any memory is onlined. > + */ > + reset_node_managed_pages(pgdat); > + I aggree with you, we should reset it before online pages. Thanks, Xishi Qiu > return pgdat; > } > > diff --git a/mm/nobootmem.c b/mm/nobootmem.c > index 7c7ab32..90b5046 100644 > --- a/mm/nobootmem.c > +++ b/mm/nobootmem.c > @@ -145,12 +145,10 @@ static unsigned long __init free_low_memory_core_early(void) > > static int reset_managed_pages_done __initdata; > > -static inline void __init reset_node_managed_pages(pg_data_t *pgdat) > +void reset_node_managed_pages(pg_data_t *pgdat) > { > struct zone *z; > > - if (reset_managed_pages_done) > - return; > for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++) > z->managed_pages = 0; > } > @@ -159,8 +157,12 @@ void __init reset_all_zones_managed_pages(void) > { > struct pglist_data *pgdat; > > + if (reset_managed_pages_done) > + return; > + > for_each_online_pgdat(pgdat) > reset_node_managed_pages(pgdat); > + > reset_managed_pages_done = 1; > } > ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/2] mem-hotplug: Fix wrong check for zone->pageset initialization in online_pages(). 2014-10-31 9:46 [PATCH 0/2] Fix node meminfo corruption Tang Chen 2014-10-31 9:46 ` [PATCH 1/2] mem-hotplug: Reset node managed pages when hot-adding a new pgdat Tang Chen @ 2014-10-31 9:46 ` Tang Chen 2014-11-04 8:10 ` Yasuaki Ishimatsu 2014-11-05 1:01 ` Kamezawa Hiroyuki 2014-11-04 1:10 ` [PATCH 0/2] Fix node meminfo corruption Tang Chen 2 siblings, 2 replies; 9+ messages in thread From: Tang Chen @ 2014-10-31 9:46 UTC (permalink / raw) To: akpm, santosh.shilimkar, grygorii.strashko, yinghai, isimatu.yasuaki, fabf, nzimmer, wangnan0, vdavydov, toshi.kani, phacht, tj, kirill.shutemov, riel, luto, hpa, aarcange, qiuxishi, mgorman, rientjes, hannes Cc: linux-mm, linux-kernel, tangchen, Gu Zheng When we are doing memory hot-add, the following functions are called: add_memory() |--> hotadd_new_pgdat() |--> free_area_init_node() |--> free_area_init_core() |--> zone->present_pages = realsize; /* 1. zone is populated */ |--> zone_pcp_init() |--> zone->pageset = &boot_pageset; /* 2. zone->pageset is set to boot_pageset */ There are two problems here: 1. Zones could be populated before any memory is onlined. 2. All the zones on a newly added node have the same pageset pointing to boot_pageset. The above two problems will result in the following problem: When we online memory on one node, e.g node2, the following code is executed: online_pages() { ...... if (!populated_zone(zone)) { need_zonelists_rebuild = 1; build_all_zonelists(NULL, zone); } ...... } Because of problem 1, the zone has been populated, and the build_all_zonelists() will never called. zone->pageset won't be updated. Because of problem 2, All the zones on a newly added node have the same pageset pointing to boot_pageset. And as a result, when we online memory on node2, node3's meminfo will corrupt. Pages on node2 may be freed to node3. # for ((i = 2048; i < 2064; i++)); do echo online_movable > /sys/devices/system/node/node2/memory$i/state; done # cat /sys/devices/system/node/node2/meminfo Node 2 MemTotal: 33554432 kB Node 2 MemFree: 33549092 kB Node 2 MemUsed: 5340 kB ...... # cat /sys/devices/system/node/node3/meminfo Node 3 MemTotal: 0 kB Node 3 MemFree: 248 kB /* corrupted */ Node 3 MemUsed: 0 kB ...... We have to populate some zones before onlining memory, otherwise no memory could be onlined. So when onlining pages, we should also check if zone->pageset is pointing to boot_pageset. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> --- include/linux/mm.h | 1 + mm/memory_hotplug.c | 6 +++++- mm/page_alloc.c | 5 +++++ 3 files changed, 11 insertions(+), 1 deletion(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 02d11ee..83e6505 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1732,6 +1732,7 @@ void warn_alloc_failed(gfp_t gfp_mask, int order, const char *fmt, ...); extern void setup_per_cpu_pageset(void); +extern bool zone_pcp_initialized(struct zone *zone); extern void zone_pcp_update(struct zone *zone); extern void zone_pcp_reset(struct zone *zone); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 3ab01b2..bc0de0f 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1013,9 +1013,13 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ * If this zone is not populated, then it is not in zonelist. * This means the page allocator ignores this zone. * So, zonelist must be updated after online. + * + * If this zone is populated, zone->pageset could be initialized + * to boot_pageset for the first time a node is added. If so, + * zone->pageset should be allocated. */ mutex_lock(&zonelists_mutex); - if (!populated_zone(zone)) { + if (!populated_zone(zone) || !zone_pcp_initialized(zone)) { need_zonelists_rebuild = 1; build_all_zonelists(NULL, zone); } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 736d8e1..4ff1540 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6456,6 +6456,11 @@ void __meminit zone_pcp_update(struct zone *zone) } #endif +bool zone_pcp_initialized(struct zone *zone) +{ + return (zone->pageset != &boot_pageset); +} + void zone_pcp_reset(struct zone *zone) { unsigned long flags; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] mem-hotplug: Fix wrong check for zone->pageset initialization in online_pages(). 2014-10-31 9:46 ` [PATCH 2/2] mem-hotplug: Fix wrong check for zone->pageset initialization in online_pages() Tang Chen @ 2014-11-04 8:10 ` Yasuaki Ishimatsu 2014-11-04 8:46 ` Tang Chen 2014-11-05 1:01 ` Kamezawa Hiroyuki 1 sibling, 1 reply; 9+ messages in thread From: Yasuaki Ishimatsu @ 2014-11-04 8:10 UTC (permalink / raw) To: Tang Chen, akpm, santosh.shilimkar, grygorii.strashko, yinghai, isimatu.yasuaki, fabf, nzimmer, wangnan0, vdavydov, toshi.kani, phacht, tj, kirill.shutemov, riel, luto, hpa, aarcange, qiuxishi, mgorman, rientjes, hannes Cc: linux-mm, linux-kernel, Gu Zheng (2014/10/31 18:46), Tang Chen wrote: > When we are doing memory hot-add, the following functions are called: > > add_memory() > |--> hotadd_new_pgdat() > |--> free_area_init_node() > |--> free_area_init_core() > |--> zone->present_pages = realsize; /* 1. zone is populated */ > |--> zone_pcp_init() > |--> zone->pageset = &boot_pageset; /* 2. zone->pageset is set to boot_pageset */ > > There are two problems here: > 1. Zones could be populated before any memory is onlined. > 2. All the zones on a newly added node have the same pageset pointing to boot_pageset. > > The above two problems will result in the following problem: > When we online memory on one node, e.g node2, the following code is executed: > > online_pages() > { > ...... > if (!populated_zone(zone)) { > need_zonelists_rebuild = 1; > build_all_zonelists(NULL, zone); > } > ...... > } > > Because of problem 1, the zone has been populated, and the build_all_zonelists() > will never called. zone->pageset won't be updated. > Because of problem 2, All the zones on a newly added node have the same pageset > pointing to boot_pageset. > And as a result, when we online memory on node2, node3's meminfo will corrupt. > Pages on node2 may be freed to node3. > > # for ((i = 2048; i < 2064; i++)); do echo online_movable > /sys/devices/system/node/node2/memory$i/state; done > # cat /sys/devices/system/node/node2/meminfo > Node 2 MemTotal: 33554432 kB > Node 2 MemFree: 33549092 kB > Node 2 MemUsed: 5340 kB > ...... > # cat /sys/devices/system/node/node3/meminfo > Node 3 MemTotal: 0 kB > Node 3 MemFree: 248 kB /* corrupted */ > Node 3 MemUsed: 0 kB > ...... > > We have to populate some zones before onlining memory, otherwise no memory could be onlined. > So when onlining pages, we should also check if zone->pageset is pointing to boot_pageset. > > Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> > Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> > --- > include/linux/mm.h | 1 + > mm/memory_hotplug.c | 6 +++++- > mm/page_alloc.c | 5 +++++ > 3 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 02d11ee..83e6505 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1732,6 +1732,7 @@ void warn_alloc_failed(gfp_t gfp_mask, int order, const char *fmt, ...); > > extern void setup_per_cpu_pageset(void); > > +extern bool zone_pcp_initialized(struct zone *zone); > extern void zone_pcp_update(struct zone *zone); > extern void zone_pcp_reset(struct zone *zone); > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 3ab01b2..bc0de0f 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1013,9 +1013,13 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ > * If this zone is not populated, then it is not in zonelist. > * This means the page allocator ignores this zone. > * So, zonelist must be updated after online. > + * > + * If this zone is populated, zone->pageset could be initialized > + * to boot_pageset for the first time a node is added. If so, > + * zone->pageset should be allocated. > */ > mutex_lock(&zonelists_mutex); > - if (!populated_zone(zone)) { > + if (!populated_zone(zone) || !zone_pcp_initialized(zone)) { > need_zonelists_rebuild = 1; > build_all_zonelists(NULL, zone); > } Why does zone->present_pages of the hot-added memroy have valid value? In my understading, the present_pages is incremented/decremented by memory online/offline. So when hot adding memory, the zone->present_pages of the memory should be 0. Thanks, Yasuaki Ishimatsu > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 736d8e1..4ff1540 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -6456,6 +6456,11 @@ void __meminit zone_pcp_update(struct zone *zone) > } > #endif > > +bool zone_pcp_initialized(struct zone *zone) > +{ > + return (zone->pageset != &boot_pageset); > +} > + > void zone_pcp_reset(struct zone *zone) > { > unsigned long flags; > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] mem-hotplug: Fix wrong check for zone->pageset initialization in online_pages(). 2014-11-04 8:10 ` Yasuaki Ishimatsu @ 2014-11-04 8:46 ` Tang Chen 0 siblings, 0 replies; 9+ messages in thread From: Tang Chen @ 2014-11-04 8:46 UTC (permalink / raw) To: Yasuaki Ishimatsu, akpm, santosh.shilimkar, grygorii.strashko, yinghai, isimatu.yasuaki, fabf, nzimmer, wangnan0, vdavydov, toshi.kani, phacht, tj, kirill.shutemov, riel, luto, hpa, aarcange, qiuxishi, mgorman, rientjes, hannes Cc: linux-mm, linux-kernel, Gu Zheng, tangchen On 11/04/2014 04:10 PM, Yasuaki Ishimatsu wrote: > (2014/10/31 18:46), Tang Chen wrote: >> When we are doing memory hot-add, the following functions are called: >> >> add_memory() >> |--> hotadd_new_pgdat() >> |--> free_area_init_node() >> |--> free_area_init_core() >> |--> zone->present_pages = realsize; /* 1. zone is populated */ >> |--> zone_pcp_init() >> |--> zone->pageset = &boot_pageset; /* 2. zone->pageset is set to boot_pageset */ >> >> There are two problems here: >> 1. Zones could be populated before any memory is onlined. >> 2. All the zones on a newly added node have the same pageset pointing to boot_pageset. >> >> The above two problems will result in the following problem: >> When we online memory on one node, e.g node2, the following code is executed: >> >> online_pages() >> { >> ...... >> if (!populated_zone(zone)) { >> need_zonelists_rebuild = 1; >> build_all_zonelists(NULL, zone); >> } >> ...... >> } >> >> Because of problem 1, the zone has been populated, and the build_all_zonelists() >> will never called. zone->pageset won't be updated. >> Because of problem 2, All the zones on a newly added node have the same pageset >> pointing to boot_pageset. >> And as a result, when we online memory on node2, node3's meminfo will corrupt. >> Pages on node2 may be freed to node3. >> >> # for ((i = 2048; i < 2064; i++)); do echo online_movable > /sys/devices/system/node/node2/memory$i/state; done >> # cat /sys/devices/system/node/node2/meminfo >> Node 2 MemTotal: 33554432 kB >> Node 2 MemFree: 33549092 kB >> Node 2 MemUsed: 5340 kB >> ...... >> # cat /sys/devices/system/node/node3/meminfo >> Node 3 MemTotal: 0 kB >> Node 3 MemFree: 248 kB /* corrupted */ >> Node 3 MemUsed: 0 kB >> ...... >> >> We have to populate some zones before onlining memory, otherwise no memory could be onlined. >> So when onlining pages, we should also check if zone->pageset is pointing to boot_pageset. >> >> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> >> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> >> --- >> include/linux/mm.h | 1 + >> mm/memory_hotplug.c | 6 +++++- >> mm/page_alloc.c | 5 +++++ >> 3 files changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/mm.h b/include/linux/mm.h >> index 02d11ee..83e6505 100644 >> --- a/include/linux/mm.h >> +++ b/include/linux/mm.h >> @@ -1732,6 +1732,7 @@ void warn_alloc_failed(gfp_t gfp_mask, int order, const char *fmt, ...); >> >> extern void setup_per_cpu_pageset(void); >> >> +extern bool zone_pcp_initialized(struct zone *zone); >> extern void zone_pcp_update(struct zone *zone); >> extern void zone_pcp_reset(struct zone *zone); >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 3ab01b2..bc0de0f 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -1013,9 +1013,13 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ >> * If this zone is not populated, then it is not in zonelist. >> * This means the page allocator ignores this zone. >> * So, zonelist must be updated after online. >> + * >> + * If this zone is populated, zone->pageset could be initialized >> + * to boot_pageset for the first time a node is added. If so, >> + * zone->pageset should be allocated. >> */ >> mutex_lock(&zonelists_mutex); >> - if (!populated_zone(zone)) { >> + if (!populated_zone(zone) || !zone_pcp_initialized(zone)) { >> need_zonelists_rebuild = 1; >> build_all_zonelists(NULL, zone); >> } > Why does zone->present_pages of the hot-added memroy have valid value? > In my understading, the present_pages is incremented/decremented by memory > online/offline. So when hot adding memory, the zone->present_pages of the > memory should be 0. Before zone->managed_pages was introduced, zone->present_pages had been abused. It had two meaning: 1. pages existing in the zone 2. pages managed by buddy system in the zone So, zone->managed_pages was introduced. Now, it looks like: 1. zone->present_pages is the pages existing in the zone 2. zone->managed_pages is the pages managed by buddy system So when hot adding memory, the zone->present_pages should not be 0. zone->managed_pages should be 0. And here, since zone->managed_pages is updated in online_pages_range() callback, I think we should remove zone->present_pages and node_present_pages updates from online_pages(). Furthermore, since we can online pages or online_movable pages, zones in a node could be changed. So we should update zone->present_pages when pfn is moved left or right. Thanks. > > Thanks, > Yasuaki Ishimatsu > > >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 736d8e1..4ff1540 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -6456,6 +6456,11 @@ void __meminit zone_pcp_update(struct zone *zone) >> } >> #endif >> >> +bool zone_pcp_initialized(struct zone *zone) >> +{ >> + return (zone->pageset != &boot_pageset); >> +} >> + >> void zone_pcp_reset(struct zone *zone) >> { >> unsigned long flags; >> > > . > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] mem-hotplug: Fix wrong check for zone->pageset initialization in online_pages(). 2014-10-31 9:46 ` [PATCH 2/2] mem-hotplug: Fix wrong check for zone->pageset initialization in online_pages() Tang Chen 2014-11-04 8:10 ` Yasuaki Ishimatsu @ 2014-11-05 1:01 ` Kamezawa Hiroyuki 2014-11-05 2:17 ` Tang Chen 1 sibling, 1 reply; 9+ messages in thread From: Kamezawa Hiroyuki @ 2014-11-05 1:01 UTC (permalink / raw) To: Tang Chen, akpm, santosh.shilimkar, grygorii.strashko, yinghai, isimatu.yasuaki, fabf, nzimmer, wangnan0, vdavydov, toshi.kani, phacht, tj, kirill.shutemov, riel, luto, hpa, aarcange, qiuxishi, mgorman, rientjes, hannes Cc: linux-mm, linux-kernel, Gu Zheng (2014/10/31 18:46), Tang Chen wrote: > When we are doing memory hot-add, the following functions are called: > > add_memory() > |--> hotadd_new_pgdat() > |--> free_area_init_node() > |--> free_area_init_core() > |--> zone->present_pages = realsize; /* 1. zone is populated */ > |--> zone_pcp_init() > |--> zone->pageset = &boot_pageset; /* 2. zone->pageset is set to boot_pageset */ > > There are two problems here: > 1. Zones could be populated before any memory is onlined. > 2. All the zones on a newly added node have the same pageset pointing to boot_pageset. > > The above two problems will result in the following problem: > When we online memory on one node, e.g node2, the following code is executed: > > online_pages() > { > ...... > if (!populated_zone(zone)) { > need_zonelists_rebuild = 1; > build_all_zonelists(NULL, zone); > } > ...... > } > > Because of problem 1, the zone has been populated, and the build_all_zonelists() > will never called. zone->pageset won't be updated. > Because of problem 2, All the zones on a newly added node have the same pageset > pointing to boot_pageset. > And as a result, when we online memory on node2, node3's meminfo will corrupt. > Pages on node2 may be freed to node3. > > # for ((i = 2048; i < 2064; i++)); do echo online_movable > /sys/devices/system/node/node2/memory$i/state; done > # cat /sys/devices/system/node/node2/meminfo > Node 2 MemTotal: 33554432 kB > Node 2 MemFree: 33549092 kB > Node 2 MemUsed: 5340 kB > ...... > # cat /sys/devices/system/node/node3/meminfo > Node 3 MemTotal: 0 kB > Node 3 MemFree: 248 kB /* corrupted */ > Node 3 MemUsed: 0 kB > ...... > > We have to populate some zones before onlining memory, otherwise no memory could be onlined. > So when onlining pages, we should also check if zone->pageset is pointing to boot_pageset. > > Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> > Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> > --- > include/linux/mm.h | 1 + > mm/memory_hotplug.c | 6 +++++- > mm/page_alloc.c | 5 +++++ > 3 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 02d11ee..83e6505 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1732,6 +1732,7 @@ void warn_alloc_failed(gfp_t gfp_mask, int order, const char *fmt, ...); > > extern void setup_per_cpu_pageset(void); > > +extern bool zone_pcp_initialized(struct zone *zone); > extern void zone_pcp_update(struct zone *zone); > extern void zone_pcp_reset(struct zone *zone); > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 3ab01b2..bc0de0f 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1013,9 +1013,13 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ > * If this zone is not populated, then it is not in zonelist. > * This means the page allocator ignores this zone. > * So, zonelist must be updated after online. > + * > + * If this zone is populated, zone->pageset could be initialized > + * to boot_pageset for the first time a node is added. If so, > + * zone->pageset should be allocated. > */ > mutex_lock(&zonelists_mutex); > - if (!populated_zone(zone)) { > + if (!populated_zone(zone) || !zone_pcp_initialized(zone)) { Please don't add another strange meanings to zone's pcplist. If you say zone->present_pages doesn't mean zone has pages in buddy list any more, please rewrite all parts using zone->present_pages including populated_zone(). I think there are several parts calculates parameters based on present_pages. I myself doesn't welcome having both of compliated "managed pages" and "present pages"... How about adding static inline int managed_zone(struct zone *zone) { return (!!zone->managed_pages); } for this bug fix ? Other parts, using present_pages, should be considered well. Thanks, -Kame Thanks, -Kame > need_zonelists_rebuild = 1; > build_all_zonelists(NULL, zone); > } > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 736d8e1..4ff1540 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -6456,6 +6456,11 @@ void __meminit zone_pcp_update(struct zone *zone) > } > #endif > > +bool zone_pcp_initialized(struct zone *zone) > +{ > + return (zone->pageset != &boot_pageset); > +} > + > void zone_pcp_reset(struct zone *zone) > { > unsigned long flags; > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] mem-hotplug: Fix wrong check for zone->pageset initialization in online_pages(). 2014-11-05 1:01 ` Kamezawa Hiroyuki @ 2014-11-05 2:17 ` Tang Chen 0 siblings, 0 replies; 9+ messages in thread From: Tang Chen @ 2014-11-05 2:17 UTC (permalink / raw) To: Kamezawa Hiroyuki, akpm, santosh.shilimkar, grygorii.strashko, yinghai, isimatu.yasuaki, fabf, nzimmer, wangnan0, vdavydov, toshi.kani, phacht, tj, kirill.shutemov, riel, luto, hpa, aarcange, qiuxishi, mgorman, rientjes, hannes Cc: linux-mm, linux-kernel, Gu Zheng, jiang.liu On 11/05/2014 09:01 AM, Kamezawa Hiroyuki wrote: > ...... > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 3ab01b2..bc0de0f 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1013,9 +1013,13 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ > * If this zone is not populated, then it is not in zonelist. > * This means the page allocator ignores this zone. > * So, zonelist must be updated after online. > + * > + * If this zone is populated, zone->pageset could be initialized > + * to boot_pageset for the first time a node is added. If so, > + * zone->pageset should be allocated. > */ > mutex_lock(&zonelists_mutex); > - if (!populated_zone(zone)) { > + if (!populated_zone(zone) || !zone_pcp_initialized(zone)) { > Please don't add another strange meanings to zone's pcplist. > > If you say zone->present_pages doesn't mean zone has pages in buddy list any more, > please rewrite all parts using zone->present_pages including populated_zone(). Adding Liu Jiang... I think zone->managed_pages was introduced by Liu Jiang in the following patch: >From 9feedc9d831e18ae6d0d15aa562e5e46ba53647b Mon Sep 17 00:00:00 2001 From: Jiang Liu <liuj97@gmail.com> Date: Wed, 12 Dec 2012 13:52:12 -0800 Subject: [PATCH 1/1] mm: introduce new field "managed_pages" to struct zone Before this patch, zone->present_pages meant "amount of memory (excluding holes)", not the zone has pages in buddy system. So I think zone->present_pages keeps its meaning as before. But it may be abused somewhere else, such as here. > I think there are several parts calculates parameters based on present_pages. > > I myself doesn't welcome having both of compliated "managed pages" and "present pages"... > > How about adding > > static inline int managed_zone(struct zone *zone) > { > return (!!zone->managed_pages); > } > > for this bug fix ? Yes, adding this function could fix this problem. And by the way, we have the following code in onine_pages(): zone->present_pages += onlined_pages; pgdat_resize_lock(zone->zone_pgdat, &flags); zone->zone_pgdat->node_present_pages += onlined_pages; pgdat_resize_unlock(zone->zone_pgdat, &flags); We should do this when the zone size is changed, not where it is now. Will send patches for this soon. Thanks. > > Other parts, using present_pages, should be considered well. > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] Fix node meminfo corruption. 2014-10-31 9:46 [PATCH 0/2] Fix node meminfo corruption Tang Chen 2014-10-31 9:46 ` [PATCH 1/2] mem-hotplug: Reset node managed pages when hot-adding a new pgdat Tang Chen 2014-10-31 9:46 ` [PATCH 2/2] mem-hotplug: Fix wrong check for zone->pageset initialization in online_pages() Tang Chen @ 2014-11-04 1:10 ` Tang Chen 2 siblings, 0 replies; 9+ messages in thread From: Tang Chen @ 2014-11-04 1:10 UTC (permalink / raw) To: akpm, santosh.shilimkar, grygorii.strashko, yinghai, isimatu.yasuaki, fabf, nzimmer, wangnan0, vdavydov, toshi.kani, phacht, tj, kirill.shutemov, riel, luto, hpa, aarcange, qiuxishi, mgorman, rientjes, hannes Cc: linux-mm, linux-kernel, tangchen Hi all, I thinks these two problems are very important and should be merged ASAP. Would you please help to have a look at it ? Thanks. On 10/31/2014 05:46 PM, Tang Chen wrote: > There are two problems when calculating node meminfo: > > 1. When hot-adding a node without onlining any page, MemTotal corrupted. > > # hot-add node2 (memory not onlined) > # cat /sys/device/system/node/node2/meminfo > Node 2 MemTotal: 33554432 kB /* corrupted */ > Node 2 MemFree: 0 kB > Node 2 MemUsed: 33554432 kB > Node 2 Active: 0 kB > ...... > > > 2. When onlining memory on node2, MemFree of node3 corrupted. > > # for ((i = 2048; i < 2064; i++)); do echo online_movable > /sys/devices/system/node/node2/memory$i/state; done > # cat /sys/devices/system/node/node2/meminfo > Node 2 MemTotal: 33554432 kB > Node 2 MemFree: 33549092 kB > Node 2 MemUsed: 5340 kB > ...... > # cat /sys/devices/system/node/node3/meminfo > Node 3 MemTotal: 0 kB > Node 3 MemFree: 248 kB /* corrupted */ > Node 3 MemUsed: 0 kB > ...... > > This patch-set fixes them. > > Tang Chen (2): > mem-hotplug: Reset node managed pages when hot-adding a new pgdat. > mem-hotplug: Fix wrong check for zone->pageset initialization in > online_pages(). > > include/linux/bootmem.h | 1 + > include/linux/mm.h | 1 + > mm/bootmem.c | 9 +++++---- > mm/memory_hotplug.c | 15 ++++++++++++++- > mm/nobootmem.c | 8 +++++--- > mm/page_alloc.c | 5 +++++ > 6 files changed, 31 insertions(+), 8 deletions(-) > ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-11-05 2:18 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-10-31 9:46 [PATCH 0/2] Fix node meminfo corruption Tang Chen 2014-10-31 9:46 ` [PATCH 1/2] mem-hotplug: Reset node managed pages when hot-adding a new pgdat Tang Chen 2014-11-04 1:50 ` Xishi Qiu 2014-10-31 9:46 ` [PATCH 2/2] mem-hotplug: Fix wrong check for zone->pageset initialization in online_pages() Tang Chen 2014-11-04 8:10 ` Yasuaki Ishimatsu 2014-11-04 8:46 ` Tang Chen 2014-11-05 1:01 ` Kamezawa Hiroyuki 2014-11-05 2:17 ` Tang Chen 2014-11-04 1:10 ` [PATCH 0/2] Fix node meminfo corruption Tang Chen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).