* [PATCHv2 1/3] mm/numa: change the topo of build_zonelist_xx()
2018-12-20 9:50 [PATCHv2 0/3] mm: bugfix for NULL reference in mm on all archs Pingfan Liu
@ 2018-12-20 9:50 ` Pingfan Liu
2018-12-21 7:56 ` kbuild test robot
2018-12-20 9:50 ` [PATCHv2 2/3] mm/numa: build zonelist when alloc for device on offline node Pingfan Liu
2018-12-20 9:50 ` [PATCHv2 3/3] powerpc/numa: make all possible node be instanced against NULL reference in node_zonelist() Pingfan Liu
2 siblings, 1 reply; 9+ messages in thread
From: Pingfan Liu @ 2018-12-20 9:50 UTC (permalink / raw)
To: linux-mm
Cc: Michal Hocko, H. Peter Anvin, Ingo Molnar, x86, linux-kernel,
Pingfan Liu, Paul Mackerras, Mike Rapoport, Borislav Petkov,
Jonathan Cameron, Bjorn Helgaas, David Rientjes, Andrew Morton,
linuxppc-dev, Thomas Gleixner, Vlastimil Babka
The current build_zonelist_xx() functions rely on a pgdat instance to build
the zonelist, but if a NUMA node is offline, there is no pgdat instance for
it. In some cases a zonelist is still required for an offline node,
especially with the nr_cpus option.
This patch changes the signatures of these functions so that a zonelist can
also be built for offline nodes.
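The new calling convention can be illustrated with a standalone userspace mock (the types below — MAX_NR_ZONES, struct zone, NODE_DATA() — are simplified stand-ins, not the kernel definitions): the helper now takes a node id and looks up the node data itself, so callers no longer need a pgdat pointer in hand.

```c
#include <stddef.h>

/* Illustrative mock of the patched helper; not kernel code. */
#define MAX_NR_ZONES 3
#define MAX_NUMNODES 4

struct zone { unsigned long managed_pages; };
struct zoneref { struct zone *zone; int zone_idx; };
struct pglist_data { struct zone node_zones[MAX_NR_ZONES]; };

static struct pglist_data node_data[MAX_NUMNODES];
#define NODE_DATA(nid) (&node_data[nid])

/* After the patch, the helper takes a node id instead of a pgdat
 * pointer and resolves the node data internally. */
static int build_zonerefs_node(int nid, struct zoneref *zonerefs)
{
	int zone_type = MAX_NR_ZONES;
	int nr_zones = 0;

	do {
		zone_type--;
		struct zone *zone = NODE_DATA(nid)->node_zones + zone_type;
		/* stand-in for managed_zone() */
		if (zone->managed_pages)
			zonerefs[nr_zones++] =
				(struct zoneref){ zone, zone_type };
	} while (zone_type);
	return nr_zones;
}
```

As in the kernel, zones are added highest type first, and only managed (non-empty) zones are recorded.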
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
---
mm/page_alloc.c | 44 +++++++++++++++++++++-----------------------
1 file changed, 21 insertions(+), 23 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ec9cc4..17dbf6e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5049,7 +5049,7 @@ static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
*
* Add all populated zones of a node to the zonelist.
*/
-static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
+static int build_zonerefs_node(int nid, struct zoneref *zonerefs)
{
struct zone *zone;
enum zone_type zone_type = MAX_NR_ZONES;
@@ -5057,7 +5057,7 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
do {
zone_type--;
- zone = pgdat->node_zones + zone_type;
+ zone = NODE_DATA(nid)->node_zones + zone_type;
if (managed_zone(zone)) {
zoneref_set_zone(zone, &zonerefs[nr_zones++]);
check_highest_zone(zone_type);
@@ -5186,20 +5186,20 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
* This results in maximum locality--normal zone overflows into local
* DMA zone, if any--but risks exhausting DMA zone.
*/
-static void build_zonelists_in_node_order(pg_data_t *pgdat, int *node_order,
- unsigned nr_nodes)
+static void build_zonelists_in_node_order(struct zonelist *node_zonelists,
+ int *node_order, unsigned int nr_nodes)
{
struct zoneref *zonerefs;
int i;
- zonerefs = pgdat->node_zonelists[ZONELIST_FALLBACK]._zonerefs;
+ zonerefs = node_zonelists[ZONELIST_FALLBACK]._zonerefs;
for (i = 0; i < nr_nodes; i++) {
int nr_zones;
pg_data_t *node = NODE_DATA(node_order[i]);
- nr_zones = build_zonerefs_node(node, zonerefs);
+ nr_zones = build_zonerefs_node(node->node_id, zonerefs);
zonerefs += nr_zones;
}
zonerefs->zone = NULL;
@@ -5209,13 +5209,14 @@ static void build_zonelists_in_node_order(pg_data_t *pgdat, int *node_order,
/*
* Build gfp_thisnode zonelists
*/
-static void build_thisnode_zonelists(pg_data_t *pgdat)
+static void build_thisnode_zonelists(struct zonelist *node_zonelists,
+ int nid)
{
struct zoneref *zonerefs;
int nr_zones;
- zonerefs = pgdat->node_zonelists[ZONELIST_NOFALLBACK]._zonerefs;
- nr_zones = build_zonerefs_node(pgdat, zonerefs);
+ zonerefs = node_zonelists[ZONELIST_NOFALLBACK]._zonerefs;
+ nr_zones = build_zonerefs_node(nid, zonerefs);
zonerefs += nr_zones;
zonerefs->zone = NULL;
zonerefs->zone_idx = 0;
@@ -5228,15 +5229,14 @@ static void build_thisnode_zonelists(pg_data_t *pgdat)
* may still exist in local DMA zone.
*/
-static void build_zonelists(pg_data_t *pgdat)
+static void build_zonelists(struct zonelist *node_zonelists, int local_node)
{
static int node_order[MAX_NUMNODES];
int node, load, nr_nodes = 0;
nodemask_t used_mask;
- int local_node, prev_node;
+ int prev_node;
/* NUMA-aware ordering of nodes */
- local_node = pgdat->node_id;
load = nr_online_nodes;
prev_node = local_node;
nodes_clear(used_mask);
@@ -5257,8 +5257,8 @@ static void build_zonelists(pg_data_t *pgdat)
load--;
}
- build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
- build_thisnode_zonelists(pgdat);
+ build_zonelists_in_node_order(node_zonelists, node_order, nr_nodes);
+ build_thisnode_zonelists(node_zonelists, local_node);
}
#ifdef CONFIG_HAVE_MEMORYLESS_NODES
@@ -5283,16 +5283,14 @@ static void setup_min_unmapped_ratio(void);
static void setup_min_slab_ratio(void);
#else /* CONFIG_NUMA */
-static void build_zonelists(pg_data_t *pgdat)
+static void build_zonelists(struct zonelist *node_zonelists, int local_node)
{
int node, local_node;
struct zoneref *zonerefs;
int nr_zones;
- local_node = pgdat->node_id;
-
- zonerefs = pgdat->node_zonelists[ZONELIST_FALLBACK]._zonerefs;
- nr_zones = build_zonerefs_node(pgdat, zonerefs);
+ zonerefs = node_zonelists[ZONELIST_FALLBACK]._zonerefs;
+ nr_zones = build_zonerefs_node(local_node, zonerefs);
zonerefs += nr_zones;
/*
@@ -5306,13 +5304,13 @@ static void build_zonelists(pg_data_t *pgdat)
for (node = local_node + 1; node < MAX_NUMNODES; node++) {
if (!node_online(node))
continue;
- nr_zones = build_zonerefs_node(NODE_DATA(node), zonerefs);
+ nr_zones = build_zonerefs_node(node, zonerefs);
zonerefs += nr_zones;
}
for (node = 0; node < local_node; node++) {
if (!node_online(node))
continue;
- nr_zones = build_zonerefs_node(NODE_DATA(node), zonerefs);
+ nr_zones = build_zonerefs_node(node, zonerefs);
zonerefs += nr_zones;
}
@@ -5359,12 +5357,12 @@ static void __build_all_zonelists(void *data)
* building zonelists is fine - no need to touch other nodes.
*/
if (self && !node_online(self->node_id)) {
- build_zonelists(self);
+ build_zonelists(self->node_zonelists, self->node_id);
} else {
for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
- build_zonelists(pgdat);
+ build_zonelists(pgdat->node_zonelists, pgdat->node_id);
}
#ifdef CONFIG_HAVE_MEMORYLESS_NODES
--
2.7.4
* [PATCHv2 2/3] mm/numa: build zonelist when alloc for device on offline node
2018-12-20 9:50 [PATCHv2 0/3] mm: bugfix for NULL reference in mm on all archs Pingfan Liu
2018-12-20 9:50 ` [PATCHv2 1/3] mm/numa: change the topo of build_zonelist_xx() Pingfan Liu
@ 2018-12-20 9:50 ` Pingfan Liu
2018-12-20 11:35 ` Michal Hocko
2018-12-20 9:50 ` [PATCHv2 3/3] powerpc/numa: make all possible node be instanced against NULL reference in node_zonelist() Pingfan Liu
2 siblings, 1 reply; 9+ messages in thread
From: Pingfan Liu @ 2018-12-20 9:50 UTC (permalink / raw)
To: linux-mm
Cc: Michal Hocko, H. Peter Anvin, Ingo Molnar, x86, linux-kernel,
Pingfan Liu, Paul Mackerras, Mike Rapoport, Borislav Petkov,
Jonathan Cameron, Bjorn Helgaas, David Rientjes, Andrew Morton,
linuxppc-dev, Thomas Gleixner, Vlastimil Babka
I hit a bug on an AMD machine when running kexec -l with the nr_cpus=4
option. It happens because some pgdat instances are not initialized when
nr_cpus is specified, e.g. on x86 they are not initialized by
init_cpu_to_node()->init_memory_less_node(). But device->numa_node is used
as the preferred_nid parameter for __alloc_pages_nodemask(), which causes a
NULL reference in:
  ac->zonelist = node_zonelist(preferred_nid, gfp_mask);
Although this bug was detected on x86, it should affect all archs where a
machine has a memory-less NUMA node, nr_cpus prevents the node from being
instanced, and a device on that node allocates memory using its
device->numa_node info.
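The faulting address in the oops below (CR2: 0x2088) fits this failure mode: with a NULL pgdat, the zonelist address node_zonelist() computes is just the field offset inside the struct, a small bogus pointer. A minimal sketch — the layout and offsets here are invented for illustration, not the real pg_data_t layout:

```c
#include <stddef.h>
#include <stdint.h>

/* Invented layout: with pgdat == NULL, the computed zonelist address
 * equals the field offset, i.e. a small invalid pointer of the same
 * flavor as the CR2 value in the oops. */
struct zonelist_mock { void *zonerefs; };
struct pg_data_mock {
	char node_zones[0x2080];          /* padding up to the field */
	struct zonelist_mock node_zonelists[2];
};

/* What node_zonelist() effectively computes before dereferencing. */
static uintptr_t zonelist_addr(struct pg_data_mock *pgdat)
{
	return (uintptr_t)pgdat +
	       offsetof(struct pg_data_mock, node_zonelists);
}
```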
There are two alternative ways to fix the bug:
1. Instance all possible NUMA nodes. This would have to be done for all
archs.
2. Use a separately built zonelist when encountering an un-instanced node,
and only do so when needed.
This patch adopts the 2nd method: it uses possible_zonelists[] to mirror
node_zonelists[] and builds the zonelist for an offline node on demand.
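The lazy-build path added by this patch can be sketched in userspace as follows. This is a single-threaded mock: the real patch serializes with a static spinlock and calls the real build_zonelists(); here the zonelist is reduced to an int and "first online node" is hardcoded as node 0.

```c
#include <stdlib.h>

#define MAX_NUMNODES 8

/* Built zonelists, indexed by node id; NULL means "not built yet". */
static int *possible_zonelists[MAX_NUMNODES];
static int builds;                      /* counts real builds */

/* Build the fallback zonelist for an offline node on first use.
 * (The kernel patch takes a static spinlock and re-checks under it;
 * locking is elided in this single-threaded sketch.) */
static int build_fallback_zonelists(int node)
{
	if (possible_zonelists[node])   /* already built */
		return 0;

	int *zl = malloc(sizeof(*zl));
	if (!zl)
		return -1;
	*zl = node;                     /* stand-in for build_zonelists() */
	builds++;
	possible_zonelists[node] = zl;  /* publish for later lookups */
	return 0;
}

static int *node_zonelist(int nid)
{
	if (!possible_zonelists[nid] && build_fallback_zonelists(nid))
		nid = 0;                /* fall back to first online node */
	return possible_zonelists[nid];
}
```

The first lookup for an un-built node builds and caches its zonelist; every later lookup returns the cached pointer without rebuilding.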
Notes about the crash:
1. kexec -l with nr_cpus=4
2. system info
NUMA node0 CPU(s): 0,8,16,24
NUMA node1 CPU(s): 2,10,18,26
NUMA node2 CPU(s): 4,12,20,28
NUMA node3 CPU(s): 6,14,22,30
NUMA node4 CPU(s): 1,9,17,25
NUMA node5 CPU(s): 3,11,19,27
NUMA node6 CPU(s): 5,13,21,29
NUMA node7 CPU(s): 7,15,23,31
3. panic stack
[...]
[ 5.721547] atomic64_test: passed for x86-64 platform with CX8 and with SSE
[ 5.729187] pcieport 0000:00:01.1: Signaling PME with IRQ 34
[ 5.735187] pcieport 0000:00:01.2: Signaling PME with IRQ 35
[ 5.741168] pcieport 0000:00:01.3: Signaling PME with IRQ 36
[ 5.747189] pcieport 0000:00:07.1: Signaling PME with IRQ 37
[ 5.754061] pcieport 0000:00:08.1: Signaling PME with IRQ 39
[ 5.760727] pcieport 0000:20:07.1: Signaling PME with IRQ 40
[ 5.766955] pcieport 0000:20:08.1: Signaling PME with IRQ 42
[ 5.772742] BUG: unable to handle kernel paging request at 0000000000002088
[ 5.773618] PGD 0 P4D 0
[ 5.773618] Oops: 0000 [#1] SMP NOPTI
[ 5.773618] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc1+ #3
[ 5.773618] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.4.3 06/29/2018
[ 5.773618] RIP: 0010:__alloc_pages_nodemask+0xe2/0x2a0
[ 5.773618] Code: 00 00 44 89 ea 80 ca 80 41 83 f8 01 44 0f 44 ea 89 da c1 ea 08 83 e2 01 88 54 24 20 48 8b 54 24 08 48 85 d2 0f 85 46 01 00 00 <3b> 77 08 0f 82 3d 01 00 00 48 89 f8 44 89 ea 48 89 e1 44 89 e6 89
[ 5.773618] RSP: 0018:ffffaa600005fb20 EFLAGS: 00010246
[ 5.773618] RAX: 0000000000000000 RBX: 00000000006012c0 RCX: 0000000000000000
[ 5.773618] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002080
[ 5.773618] RBP: 00000000006012c0 R08: 0000000000000000 R09: 0000000000000002
[ 5.773618] R10: 00000000006080c0 R11: 0000000000000002 R12: 0000000000000000
[ 5.773618] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000002
[ 5.773618] FS: 0000000000000000(0000) GS:ffff8c69afe00000(0000) knlGS:0000000000000000
[ 5.773618] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.773618] CR2: 0000000000002088 CR3: 000000087e00a000 CR4: 00000000003406e0
[ 5.773618] Call Trace:
[ 5.773618] new_slab+0xa9/0x570
[ 5.773618] ___slab_alloc+0x375/0x540
[ 5.773618] ? pinctrl_bind_pins+0x2b/0x2a0
[ 5.773618] __slab_alloc+0x1c/0x38
[ 5.773618] __kmalloc_node_track_caller+0xc8/0x270
[ 5.773618] ? pinctrl_bind_pins+0x2b/0x2a0
[ 5.773618] devm_kmalloc+0x28/0x60
[ 5.773618] pinctrl_bind_pins+0x2b/0x2a0
[ 5.773618] really_probe+0x73/0x420
[ 5.773618] driver_probe_device+0x115/0x130
[ 5.773618] __driver_attach+0x103/0x110
[ 5.773618] ? driver_probe_device+0x130/0x130
[ 5.773618] bus_for_each_dev+0x67/0xc0
[ 5.773618] ? klist_add_tail+0x3b/0x70
[ 5.773618] bus_add_driver+0x41/0x260
[ 5.773618] ? pcie_port_setup+0x4d/0x4d
[ 5.773618] driver_register+0x5b/0xe0
[ 5.773618] ? pcie_port_setup+0x4d/0x4d
[ 5.773618] do_one_initcall+0x4e/0x1d4
[ 5.773618] ? init_setup+0x25/0x28
[ 5.773618] kernel_init_freeable+0x1c1/0x26e
[ 5.773618] ? loglevel+0x5b/0x5b
[ 5.773618] ? rest_init+0xb0/0xb0
[ 5.773618] kernel_init+0xa/0x110
[ 5.773618] ret_from_fork+0x22/0x40
[ 5.773618] Modules linked in:
[ 5.773618] CR2: 0000000000002088
[ 5.773618] ---[ end trace 1030c9120a03d081 ]---
[...]
Other notes about reproducing this bug:
After applying
'commit 0d76bcc960e6 ("Revert "ACPI/PCI: Pay attention to device-specific
_PXM node values"")'
the bug is masked and no longer triggers on my test AMD machine. But it
should still exist, since dev->numa_node can be set by other means on other
archs when the nr_cpus parameter is used.
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
---
include/linux/gfp.h | 10 +++++++++-
mm/page_alloc.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 55 insertions(+), 7 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0705164..0ddf809 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -442,6 +442,9 @@ static inline int gfp_zonelist(gfp_t flags)
return ZONELIST_FALLBACK;
}
+extern struct zonelist *possible_zonelists[];
+extern int build_fallback_zonelists(int node);
+
/*
* We get the zone list from the current node and the gfp_mask.
* This zone list contains a maximum of MAXNODES*MAX_NR_ZONES zones.
@@ -453,7 +456,12 @@ static inline int gfp_zonelist(gfp_t flags)
*/
static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
{
- return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
+ if (unlikely(!possible_zonelists[nid])) {
+ WARN_ONCE(1, "alloc from offline node: %d\n", nid);
+ if (unlikely(build_fallback_zonelists(nid)))
+ nid = first_online_node;
+ }
+ return possible_zonelists[nid] + gfp_zonelist(flags);
}
#ifndef HAVE_ARCH_FREE_PAGE
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 17dbf6e..608b51d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -121,6 +121,8 @@ nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
};
EXPORT_SYMBOL(node_states);
+struct zonelist *possible_zonelists[MAX_NUMNODES] __read_mostly;
+
/* Protect totalram_pages and zone->managed_pages */
static DEFINE_SPINLOCK(managed_page_count_lock);
@@ -5180,7 +5182,6 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
return best_node;
}
-
/*
* Build zonelists ordered by node and zones within node.
* This results in maximum locality--normal zone overflows into local
@@ -5222,6 +5223,7 @@ static void build_thisnode_zonelists(struct zonelist *node_zonelists,
zonerefs->zone_idx = 0;
}
+
/*
* Build zonelists ordered by zone and nodes within zones.
* This results in conserving DMA zone[s] until all Normal memory is
@@ -5229,7 +5231,8 @@ static void build_thisnode_zonelists(struct zonelist *node_zonelists,
* may still exist in local DMA zone.
*/
-static void build_zonelists(struct zonelist *node_zonelists, int local_node)
+static void build_zonelists(struct zonelist *node_zonelists,
+ int local_node, bool exclude_self)
{
static int node_order[MAX_NUMNODES];
int node, load, nr_nodes = 0;
@@ -5240,6 +5243,8 @@ static void build_zonelists(struct zonelist *node_zonelists, int local_node)
load = nr_online_nodes;
prev_node = local_node;
nodes_clear(used_mask);
+ if (exclude_self)
+ node_set(local_node, used_mask);
memset(node_order, 0, sizeof(node_order));
while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
@@ -5258,7 +5263,40 @@ static void build_zonelists(struct zonelist *node_zonelists, int local_node)
}
build_zonelists_in_node_order(node_zonelists, node_order, nr_nodes);
- build_thisnode_zonelists(node_zonelists, local_node);
+ if (!exclude_self)
+ build_thisnode_zonelists(node_zonelists, local_node);
+ possible_zonelists[local_node] = node_zonelists;
+}
+
+/* Rare case: build zonelists for an offline node because a device
+ * that is in use sits on it
+ */
+int build_fallback_zonelists(int node)
+{
+ static DEFINE_SPINLOCK(lock);
+ nodemask_t *used_mask;
+ struct zonelist *zl;
+ int ret = 0;
+
+ spin_lock(&lock);
+ if (unlikely(possible_zonelists[node] != NULL))
+ goto unlock;
+
+ used_mask = kmalloc(sizeof(nodemask_t), GFP_ATOMIC);
+ zl = kmalloc(sizeof(struct zonelist)*MAX_ZONELISTS, GFP_ATOMIC);
+ if (unlikely(!used_mask || !zl)) {
+ ret = -ENOMEM;
+ kfree(used_mask);
+ kfree(zl);
+ goto unlock;
+ }
+
+ __nodes_complement(used_mask, &node_online_map, MAX_NUMNODES);
+ build_zonelists(zl, node, true);
+ kfree(used_mask);
+unlock:
+ spin_unlock(&lock);
+ return ret;
}
#ifdef CONFIG_HAVE_MEMORYLESS_NODES
@@ -5283,7 +5321,8 @@ static void setup_min_unmapped_ratio(void);
static void setup_min_slab_ratio(void);
#else /* CONFIG_NUMA */
-static void build_zonelists(struct zonelist *node_zonelists, int local_node)
+static void build_zonelists(struct zonelist *node_zonelists,
+ int local_node, bool _unused)
{
int node, local_node;
struct zoneref *zonerefs;
@@ -5357,12 +5396,13 @@ static void __build_all_zonelists(void *data)
* building zonelists is fine - no need to touch other nodes.
*/
if (self && !node_online(self->node_id)) {
- build_zonelists(self->node_zonelists, self->node_id);
+ build_zonelists(self->node_zonelists, self->node_id, false);
} else {
for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
- build_zonelists(pgdat->node_zonelists, pgdat->node_id);
+ build_zonelists(pgdat->node_zonelists, pgdat->node_id,
+ false);
}
#ifdef CONFIG_HAVE_MEMORYLESS_NODES
--
2.7.4
* [PATCHv2 3/3] powerpc/numa: make all possible node be instanced against NULL reference in node_zonelist()
2018-12-20 9:50 [PATCHv2 0/3] mm: bugfix for NULL reference in mm on all archs Pingfan Liu
2018-12-20 9:50 ` [PATCHv2 1/3] mm/numa: change the topo of build_zonelist_xx() Pingfan Liu
2018-12-20 9:50 ` [PATCHv2 2/3] mm/numa: build zonelist when alloc for device on offline node Pingfan Liu
@ 2018-12-20 9:50 ` Pingfan Liu
2 siblings, 0 replies; 9+ messages in thread
From: Pingfan Liu @ 2018-12-20 9:50 UTC (permalink / raw)
To: linux-mm
Cc: Michal Hocko, H. Peter Anvin, Ingo Molnar, x86, linux-kernel,
Pingfan Liu, Paul Mackerras, Mike Rapoport, Borislav Petkov,
Jonathan Cameron, Bjorn Helgaas, David Rientjes, Andrew Morton,
linuxppc-dev, Thomas Gleixner, Vlastimil Babka
This patch tries to resolve a bug rooted in mm when using nr_cpus. It was
reported at [1]. The root cause: device->numa_node is used as the
preferred_nid parameter for __alloc_pages_nodemask(), which causes a NULL
reference in ac->zonelist = node_zonelist(preferred_nid, gfp_mask), because
preferred_nid is neither online nor instanced. Hence the bug affects all
archs where a machine has a memory-less NUMA node but a device on that node
is used and provides numa_node info to __alloc_pages_nodemask().
This patch makes all possible nodes online for ppc.
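The initmem_init() change below reduces to the following userspace sketch. The node-mask helpers and pfn ranges are mocked; powerpc's real get_pfn_range_for_nid()/setup_node_data() are replaced by stand-ins.

```c
#define MAX_NUMNODES 8

static unsigned online_mask = 0x05;    /* nodes 0 and 2 start online */
static unsigned long start_pfn[MAX_NUMNODES], end_pfn[MAX_NUMNODES];

static int node_online(int nid) { return (online_mask >> nid) & 1; }
static void node_set_online(int nid) { online_mask |= 1u << nid; }

/* Iterate all possible nodes: online ones get their real pfn range,
 * offline ones get an empty range and are marked online so zonelists
 * will be built for them later. */
static void instance_all_possible_nodes(void)
{
	for (int nid = 0; nid < MAX_NUMNODES; nid++) {
		if (node_online(nid)) {
			/* stand-in for get_pfn_range_for_nid() */
			start_pfn[nid] = 1;
			end_pfn[nid] = 100;
		} else {
			start_pfn[nid] = end_pfn[nid] = 0;
			node_set_online(nid);
		}
		/* setup_node_data(nid, start_pfn, end_pfn) would go here */
	}
}
```

After the loop every possible node is online: memory-less nodes simply carry an empty pfn range, which is enough for node_zonelist() to find a valid zonelist later.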
[1]: https://lore.kernel.org/patchwork/patch/1020838/
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
---
Note:
[1-2/3] implements one way to fix the bug, while this patch tries another.
Use this patch only if [1-2/3] is not acceptable.
arch/powerpc/mm/numa.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index ce28ae5..31d81a4 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -864,10 +864,19 @@ void __init initmem_init(void)
memblock_dump_all();
- for_each_online_node(nid) {
+ /* Instance all possible nodes to avoid a potential NULL reference
+ * in node_zonelist() when using nr_cpus
+ */
+ for_each_node(nid) {
unsigned long start_pfn, end_pfn;
- get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
+ if (node_online(nid))
+ get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
+ else {
+ start_pfn = end_pfn = 0;
+ /* online it, so later zonelists[] will be built */
+ node_set_online(nid);
+ }
setup_node_data(nid, start_pfn, end_pfn);
sparse_memory_present_with_active_regions(nid);
}
--
2.7.4