* [PATCH 0/2] x86, numa: always initialize all possible nodes @ 2019-02-12 9:53 Michal Hocko

From: Michal Hocko @ 2019-02-12 9:53 UTC
To: linux-mm
Cc: Tony Luck, linux-ia64, Peter Zijlstra, x86, LKML, Pingfan Liu, Dave Hansen, Ingo Molnar, linuxppc-dev

Hi,
this has been posted as an RFC previously [1]. There didn't seem to be any
objections, so I am reposting this for inclusion. I have added a debugging
patch which prints the zonelist setup for each NUMA node, to make a broken
zonelist setup easier to debug.

[1] http://lkml.kernel.org/r/20190114082416.30939-1-mhocko@kernel.org
* [PATCH 1/2] x86, numa: always initialize all possible nodes @ 2019-02-12 9:53 Michal Hocko

From: Michal Hocko @ 2019-02-12 9:53 UTC
To: linux-mm
Cc: Tony Luck, linux-ia64, Peter Zijlstra, x86, LKML, Pingfan Liu, Dave Hansen, Michal Hocko, Ingo Molnar, linuxppc-dev

From: Michal Hocko <mhocko@suse.com>

Pingfan Liu has reported the following splat

[    5.772742] BUG: unable to handle kernel paging request at 0000000000002088
[    5.773618] PGD 0 P4D 0
[    5.773618] Oops: 0000 [#1] SMP NOPTI
[    5.773618] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc1+ #3
[    5.773618] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.4.3 06/29/2018
[    5.773618] RIP: 0010:__alloc_pages_nodemask+0xe2/0x2a0
[    5.773618] Code: 00 00 44 89 ea 80 ca 80 41 83 f8 01 44 0f 44 ea 89 da c1 ea 08 83 e2 01 88 54 24 20 48 8b 54 24 08 48 85 d2 0f 85 46 01 00 00 <3b> 77 08 0f 82 3d 01 00 00 48 89 f8 44 89 ea 48 89 e1 44 89 e6 89
[    5.773618] RSP: 0018:ffffaa600005fb20 EFLAGS: 00010246
[    5.773618] RAX: 0000000000000000 RBX: 00000000006012c0 RCX: 0000000000000000
[    5.773618] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002080
[    5.773618] RBP: 00000000006012c0 R08: 0000000000000000 R09: 0000000000000002
[    5.773618] R10: 00000000006080c0 R11: 0000000000000002 R12: 0000000000000000
[    5.773618] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000002
[    5.773618] FS: 0000000000000000(0000) GS:ffff8c69afe00000(0000) knlGS:0000000000000000
[    5.773618] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.773618] CR2: 0000000000002088 CR3: 000000087e00a000 CR4: 00000000003406e0
[    5.773618] Call Trace:
[    5.773618]  new_slab+0xa9/0x570
[    5.773618]  ___slab_alloc+0x375/0x540
[    5.773618]  ? pinctrl_bind_pins+0x2b/0x2a0
[    5.773618]  __slab_alloc+0x1c/0x38
[    5.773618]  __kmalloc_node_track_caller+0xc8/0x270
[    5.773618]  ? pinctrl_bind_pins+0x2b/0x2a0
[    5.773618]  devm_kmalloc+0x28/0x60
[    5.773618]  pinctrl_bind_pins+0x2b/0x2a0
[    5.773618]  really_probe+0x73/0x420
[    5.773618]  driver_probe_device+0x115/0x130
[    5.773618]  __driver_attach+0x103/0x110
[    5.773618]  ? driver_probe_device+0x130/0x130
[    5.773618]  bus_for_each_dev+0x67/0xc0
[    5.773618]  ? klist_add_tail+0x3b/0x70
[    5.773618]  bus_add_driver+0x41/0x260
[    5.773618]  ? pcie_port_setup+0x4d/0x4d
[    5.773618]  driver_register+0x5b/0xe0
[    5.773618]  ? pcie_port_setup+0x4d/0x4d
[    5.773618]  do_one_initcall+0x4e/0x1d4
[    5.773618]  ? init_setup+0x25/0x28
[    5.773618]  kernel_init_freeable+0x1c1/0x26e
[    5.773618]  ? loglevel+0x5b/0x5b
[    5.773618]  ? rest_init+0xb0/0xb0
[    5.773618]  kernel_init+0xa/0x110
[    5.773618]  ret_from_fork+0x22/0x40
[    5.773618] Modules linked in:
[    5.773618] CR2: 0000000000002088
[    5.773618] ---[ end trace 1030c9120a03d081 ]---

with his AMD machine with the following topology

  NUMA node0 CPU(s): 0,8,16,24
  NUMA node1 CPU(s): 2,10,18,26
  NUMA node2 CPU(s): 4,12,20,28
  NUMA node3 CPU(s): 6,14,22,30
  NUMA node4 CPU(s): 1,9,17,25
  NUMA node5 CPU(s): 3,11,19,27
  NUMA node6 CPU(s): 5,13,21,29
  NUMA node7 CPU(s): 7,15,23,31

[    0.007418] Early memory node ranges
[    0.007419]   node   1: [mem 0x0000000000001000-0x000000000008efff]
[    0.007420]   node   1: [mem 0x0000000000090000-0x000000000009ffff]
[    0.007422]   node   1: [mem 0x0000000000100000-0x000000005c3d6fff]
[    0.007422]   node   1: [mem 0x00000000643df000-0x0000000068ff7fff]
[    0.007423]   node   1: [mem 0x000000006c528000-0x000000006fffffff]
[    0.007424]   node   1: [mem 0x0000000100000000-0x000000047fffffff]
[    0.007425]   node   5: [mem 0x0000000480000000-0x000000087effffff]

and nr_cpus set to 4. The underlying reason is that the device is bound
to node 2, which doesn't have any memory, and init_cpu_to_node only
initializes memory-less nodes for possible CPUs, which nr_cpus restricts.
This in turn means that proper zonelists are not allocated and the page
allocator blows up.

Fix the issue by reworking how x86 initializes memory-less nodes. The
current implementation is hacked into the workflow and doesn't allow any
flexibility. As already mentioned above, init_memory_less_node is called
for each offline node that has a CPU. This makes sure that we get a new
online node without any memory. Much later on, we build a zonelist for
this node and things seem to work, except they do not (e.g. due to
nr_cpus). Not to mention that it doesn't really make much sense to
consider an empty node online: an online node is considered whenever we
iterate over nodes to allocate from, and an empty node is obviously not
the best candidate. This is all just too fragile.

The new code relies on the arch-specific initialization to allocate all
possible NUMA nodes (including memory-less ones), numa_register_memblks
in this case. Generic code then initializes both zonelists
(__build_all_zonelists) and allocator internals (free_area_init_nodes)
for all non-NULL pgdats rather than for online ones.

For the x86-specific part, also do not mark a new node online in
alloc_node_data, because it is too early to know that at that point.
numa_register_memblks knows whether a node has some memory, so it can
mark the node online appropriately. The init_memory_less_node hack can
now be safely removed altogether.
Reported-by: Pingfan Liu <kernelfans@gmail.com>
Tested-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/x86/mm/numa.c | 27 +++------------------------
 mm/page_alloc.c    | 15 +++++++++------
 2 files changed, 12 insertions(+), 30 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 1308f5408bf7..b3621ee4dfe8 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -216,8 +216,6 @@ static void __init alloc_node_data(int nid)
 
 	node_data[nid] = nd;
 	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
-
-	node_set_online(nid);
 }
 
 /**
@@ -570,7 +568,7 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 		return -EINVAL;
 
 	/* Finally register nodes. */
-	for_each_node_mask(nid, node_possible_map) {
+	for_each_node_mask(nid, numa_nodes_parsed) {
 		u64 start = PFN_PHYS(max_pfn);
 		u64 end = 0;
 
@@ -581,9 +579,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 			end = max(mi->blk[i].end, end);
 		}
 
-		if (start >= end)
-			continue;
-
 		/*
 		 * Don't confuse VM with a node that doesn't have the
 		 * minimum amount of memory:
@@ -592,6 +587,8 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 			continue;
 
 		alloc_node_data(nid);
+		if (end)
+			node_set_online(nid);
 	}
 
 	/* Dump memblock with node info and return. */
@@ -721,21 +718,6 @@ void __init x86_numa_init(void)
 	numa_init(dummy_numa_init);
 }
 
-static void __init init_memory_less_node(int nid)
-{
-	unsigned long zones_size[MAX_NR_ZONES] = {0};
-	unsigned long zholes_size[MAX_NR_ZONES] = {0};
-
-	/* Allocate and initialize node data. Memory-less node is now online.*/
-	alloc_node_data(nid);
-	free_area_init_node(nid, zones_size, 0, zholes_size);
-
-	/*
-	 * All zonelists will be built later in start_kernel() after per cpu
-	 * areas are initialized.
-	 */
-}
-
 /*
  * Setup early cpu_to_node.
  *
@@ -763,9 +745,6 @@ void __init init_cpu_to_node(void)
 		if (node == NUMA_NO_NODE)
 			continue;
 
-		if (!node_online(node))
-			init_memory_less_node(node);
-
 		numa_set_node(cpu, node);
 	}
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ec9cc407216..2e097f336126 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5361,10 +5361,11 @@ static void __build_all_zonelists(void *data)
 	if (self && !node_online(self->node_id)) {
 		build_zonelists(self);
 	} else {
-		for_each_online_node(nid) {
+		for_each_node(nid) {
 			pg_data_t *pgdat = NODE_DATA(nid);
 
-			build_zonelists(pgdat);
+			if (pgdat)
+				build_zonelists(pgdat);
 		}
 
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES
@@ -6644,10 +6645,8 @@ static unsigned long __init find_min_pfn_for_node(int nid)
 	for_each_mem_pfn_range(i, nid, &start_pfn, NULL, NULL)
 		min_pfn = min(min_pfn, start_pfn);
 
-	if (min_pfn == ULONG_MAX) {
-		pr_warn("Could not find start_pfn for node %d\n", nid);
+	if (min_pfn == ULONG_MAX)
 		return 0;
-	}
 
 	return min_pfn;
 }
@@ -6991,8 +6990,12 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	mminit_verify_pageflags_layout();
 	setup_nr_node_ids();
 	zero_resv_unavail();
-	for_each_online_node(nid) {
+	for_each_node(nid) {
 		pg_data_t *pgdat = NODE_DATA(nid);
+
+		if (!pgdat)
+			continue;
+
 		free_area_init_node(nid, NULL,
 				find_min_pfn_for_node(nid), NULL);
 
-- 
2.20.1
* Re: [PATCH 1/2] x86, numa: always initialize all possible nodes @ 2019-05-01 19:12 Barret Rhoden

From: Barret Rhoden @ 2019-05-01 19:12 UTC
To: Michal Hocko, linux-mm
Cc: Tony Luck, linux-ia64, Peter Zijlstra, x86, LKML, Pingfan Liu, Dave Hansen, Michal Hocko, Ingo Molnar, linuxppc-dev

Hi -

This patch triggered an oops for me (more below).

On 2/12/19 4:53 AM, Michal Hocko wrote:
[snip]
> Fix the issue by reworking how x86 initializes the memory less nodes.
> The current implementation is hacked into the workflow and it doesn't
> allow any flexibility. There is init_memory_less_node called for each
> offline node that has a CPU as already mentioned above. This will make
> sure that we will have a new online node without any memory. Much later
> on we build a zone list for this node and things seem to work, except
> they do not (e.g. due to nr_cpus). Not to mention that it doesn't really
> make much sense to consider an empty node as online because we just
> consider this node whenever we want to iterate nodes to use and empty
> node is obviously not the best candidate. This is all just too fragile.

The problem might be in here - I have a case with a 'memoryless' node
that has CPUs that get onlined during SMP boot, but that onlining
triggers a page fault during device registration.

I'm running on a NUMA machine, but I marked all of the memory on node 1
as type 12 (PRAM), using the memmap arg. That makes node 1 appear to
have no memory.

During SMP boot, the fault is in bus_add_device():

	error = sysfs_create_link(&bus->p->devices_kset->kobj,

bus->p is NULL. That p is the subsys_private struct, and it should have
been set in

	postcore_initcall(register_node_type);

But that happens after SMP boot. This fault happens during SMP boot.
The old code had set this node online via alloc_node_data(), so when it
came time to do_cpu_up() -> try_online_node(), the node was already up
and nothing happened.

Now, it attempts to online the node, which registers the node with
sysfs, but that can't happen before the 'node' subsystem is registered.

My modified e820 map looks like this:

> [    0.000000] user: [mem 0x0000000000000100-0x000000000009c7ff] usable
> [    0.000000] user: [mem 0x000000000009c800-0x000000000009ffff] reserved
> [    0.000000] user: [mem 0x00000000000e0000-0x00000000000fffff] reserved
> [    0.000000] user: [mem 0x0000000000100000-0x0000000073216fff] usable
> [    0.000000] user: [mem 0x0000000073217000-0x0000000075316fff] reserved
> [    0.000000] user: [mem 0x0000000075317000-0x00000000754f8fff] ACPI data
> [    0.000000] user: [mem 0x00000000754f9000-0x0000000076057fff] ACPI NVS
> [    0.000000] user: [mem 0x0000000076058000-0x0000000077ae9fff] reserved
> [    0.000000] user: [mem 0x0000000077aea000-0x0000000077ffffff] usable
> [    0.000000] user: [mem 0x0000000078000000-0x000000008fffffff] reserved
> [    0.000000] user: [mem 0x00000000fd000000-0x00000000fe7fffff] reserved
> [    0.000000] user: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
> [    0.000000] user: [mem 0x0000000100000000-0x00000004ffffffff] usable
> [    0.000000] user: [mem 0x0000000500000000-0x000000603fffffff] persistent (type 12)

Which leads to an empty node 1:

> [    0.016060] Initmem setup node 0 [mem 0x0000000000001000-0x00000004ffffffff]
> [    0.073310] Initmem setup node 1 [mem 0x0000000000000000-0x0000000000000000]

The backtrace:

> [    2.175327] Call Trace:
> [    2.175327]  device_add+0x43e/0x690
> [    2.175327]  device_register+0x107/0x110
> [    2.175327]  __register_one_node+0x72/0x150
> [    2.175327]  __try_online_node+0x8f/0xd0
> [    2.175327]  try_online_node+0x2b/0x50
> [    2.175327]  do_cpu_up+0x46/0xf0
> [    2.175327]  cpu_up+0x13/0x20
> [    2.175327]  smp_init+0x6e/0xd0
> [    2.175327]  kernel_init_freeable+0xe5/0x21f
> [    2.175327]  ? rest_init+0xb0/0xb0
> [    2.175327]  kernel_init+0xf/0x180
> [    2.175327]  ? rest_init+0xb0/0xb0
> [    2.175327]  ret_from_fork+0x1f/0x30

To get it booting again, I unconditionally call node_set_online:

arch/x86/mm/numa.c
@@ -583,7 +583,7 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 			continue;
 
 		alloc_node_data(nid);
-		if (end)
+		//if (end)
 			node_set_online(nid);
 	}
 

A more elegant solution may be to avoid registering with sysfs during
early boot, or something else entirely. But I figured I'd ask for help
at this point. =)

Thanks,

Barret
* Re: [PATCH 1/2] x86, numa: always initialize all possible nodes @ 2019-05-02 13:00 Michal Hocko

From: Michal Hocko @ 2019-05-02 13:00 UTC
To: Barret Rhoden
Cc: Tony Luck, linux-ia64, Dave Hansen, Peter Zijlstra, x86, LKML, Pingfan Liu, linux-mm, Ingo Molnar, linuxppc-dev

On Wed 01-05-19 15:12:32, Barret Rhoden wrote:
[...]
> A more elegant solution may be to avoid registering with sysfs during early
> boot, or something else entirely. But I figured I'd ask for help at this
> point. =)

Thanks for the report and an excellent analysis! This is really helpful.
I will think about this some more, but I am traveling this week. It seems
really awkward to register a sysfs file for an empty range. That looks
like a bug to me.
-- 
Michal Hocko
SUSE Labs
* Re: [PATCH 1/2] x86, numa: always initialize all possible nodes @ 2019-06-26 13:54 Michal Hocko

From: Michal Hocko @ 2019-06-26 13:54 UTC
To: Barret Rhoden
Cc: Tony Luck, linux-ia64, Dave Hansen, Peter Zijlstra, x86, LKML, Pingfan Liu, linux-mm, Ingo Molnar, linuxppc-dev

On Thu 02-05-19 09:00:31, Michal Hocko wrote:
> On Wed 01-05-19 15:12:32, Barret Rhoden wrote:
> [...]
> > A more elegant solution may be to avoid registering with sysfs during early
> > boot, or something else entirely. But I figured I'd ask for help at this
> > point. =)
>
> Thanks for the report and an excellent analysis! This is really helpful.
> I will think about this some more but I am traveling this week. It seems
> really awkward to register a sysfs file for an empty range. That looks
> like a bug to me.

I am sorry, but I didn't get to this for a long time and I am still
busy. The patch has been dropped from the mm tree (and thus from
linux-next). I hope I can revisit this, or that somebody else will take
over and finish this work. Unfortunately, this is much trickier than I
anticipated.
-- 
Michal Hocko
SUSE Labs
* [PATCH 2/2] mm: be more verbose about zonelist initialization @ 2019-02-12 9:53 Michal Hocko

From: Michal Hocko @ 2019-02-12 9:53 UTC
To: linux-mm
Cc: Tony Luck, linux-ia64, Peter Zijlstra, x86, LKML, Pingfan Liu, Dave Hansen, Michal Hocko, Ingo Molnar, linuxppc-dev

From: Michal Hocko <mhocko@suse.com>

We have seen several bugs where zonelists have not been initialized
properly, and it is not really straightforward to track those bugs down.
One way to help at least a bit is to dump the zonelists of each node
when they are (re)initialized.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2e097f336126..c30d59f803fb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5259,6 +5259,11 @@ static void build_zonelists(pg_data_t *pgdat)
 
 	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
 	build_thisnode_zonelists(pgdat);
+
+	pr_info("node[%d] zonelist: ", pgdat->node_id);
+	for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1)
+		pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
+	pr_cont("\n");
 }
 
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES
-- 
2.20.1
* Re: [PATCH 2/2] mm: be more verbose about zonelist initialization 2019-02-12 9:53 ` [PATCH 2/2] mm: be more verbose about zonelist initialization Michal Hocko @ 2019-02-13 0:12 ` kbuild test robot 2019-02-13 2:13 ` kbuild test robot ` (2 subsequent siblings) 3 siblings, 0 replies; 21+ messages in thread From: kbuild test robot @ 2019-02-13 0:12 UTC (permalink / raw) To: Michal Hocko Cc: Tony Luck, linux-ia64, Dave Hansen, Peter Zijlstra, x86, LKML, Pingfan Liu, linux-mm, Michal Hocko, kbuild-all, Ingo Molnar, linuxppc-dev [-- Attachment #1: Type: text/plain, Size: 5852 bytes --] Hi Michal, I love your patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v5.0-rc4 next-20190212] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Michal-Hocko/x86-numa-always-initialize-all-possible-nodes/20190213-071628 config: x86_64-randconfig-x016-201906 (attached as .config) compiler: gcc-8 (Debian 8.2.0-20) 8.2.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): In file included from include/linux/gfp.h:6, from include/linux/mm.h:10, from mm/page_alloc.c:18: mm/page_alloc.c: In function 'build_zonelists': >> mm/page_alloc.c:5423:31: error: 'z' undeclared (first use in this function) for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^ include/linux/mmzone.h:1036:7: note: in definition of macro 'for_each_zone_zonelist_nodemask' for (z = first_zones_zonelist(zlist, highidx, nodemask), zone = zonelist_zone(z); \ ^ mm/page_alloc.c:5423:2: note: in expansion of macro 'for_each_zone_zonelist' for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^~~~~~~~~~~~~~~~~~~~~~ mm/page_alloc.c:5423:31: note: each undeclared identifier is reported only once for each function it appears in for_each_zone_zonelist(zone, 
z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^ include/linux/mmzone.h:1036:7: note: in definition of macro 'for_each_zone_zonelist_nodemask' for (z = first_zones_zonelist(zlist, highidx, nodemask), zone = zonelist_zone(z); \ ^ mm/page_alloc.c:5423:2: note: in expansion of macro 'for_each_zone_zonelist' for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^~~~~~~~~~~~~~~~~~~~~~ >> mm/page_alloc.c:5423:25: error: 'zone' undeclared (first use in this function) for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^~~~ include/linux/mmzone.h:1036:59: note: in definition of macro 'for_each_zone_zonelist_nodemask' for (z = first_zones_zonelist(zlist, highidx, nodemask), zone = zonelist_zone(z); \ ^~~~ mm/page_alloc.c:5423:2: note: in expansion of macro 'for_each_zone_zonelist' for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^~~~~~~~~~~~~~~~~~~~~~ include/linux/mmzone.h:1036:57: warning: left-hand operand of comma expression has no effect [-Wunused-value] for (z = first_zones_zonelist(zlist, highidx, nodemask), zone = zonelist_zone(z); \ ^ include/linux/mmzone.h:1058:2: note: in expansion of macro 'for_each_zone_zonelist_nodemask' for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, NULL) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/page_alloc.c:5423:2: note: in expansion of macro 'for_each_zone_zonelist' for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^~~~~~~~~~~~~~~~~~~~~~ include/linux/mmzone.h:1038:50: warning: left-hand operand of comma expression has no effect [-Wunused-value] z = next_zones_zonelist(++z, highidx, nodemask), \ ^ include/linux/mmzone.h:1058:2: note: in expansion of macro 'for_each_zone_zonelist_nodemask' for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, NULL) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/page_alloc.c:5423:2: note: in expansion of macro 'for_each_zone_zonelist' 
for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^~~~~~~~~~~~~~~~~~~~~~ vim +/z +5423 mm/page_alloc.c 5382 5383 /* 5384 * Build zonelists ordered by zone and nodes within zones. 5385 * This results in conserving DMA zone[s] until all Normal memory is 5386 * exhausted, but results in overflowing to remote node while memory 5387 * may still exist in local DMA zone. 5388 */ 5389 5390 static void build_zonelists(pg_data_t *pgdat) 5391 { 5392 static int node_order[MAX_NUMNODES]; 5393 int node, load, nr_nodes = 0; 5394 nodemask_t used_mask; 5395 int local_node, prev_node; 5396 5397 /* NUMA-aware ordering of nodes */ 5398 local_node = pgdat->node_id; 5399 load = nr_online_nodes; 5400 prev_node = local_node; 5401 nodes_clear(used_mask); 5402 5403 memset(node_order, 0, sizeof(node_order)); 5404 while ((node = find_next_best_node(local_node, &used_mask)) >= 0) { 5405 /* 5406 * We don't want to pressure a particular node. 5407 * So adding penalty to the first node in same 5408 * distance group to make it round-robin. 5409 */ 5410 if (node_distance(local_node, node) != 5411 node_distance(local_node, prev_node)) 5412 node_load[node] = load; 5413 5414 node_order[nr_nodes++] = node; 5415 prev_node = node; 5416 load--; 5417 } 5418 5419 build_zonelists_in_node_order(pgdat, node_order, nr_nodes); 5420 build_thisnode_zonelists(pgdat); 5421 5422 pr_info("node[%d] zonelist: ", pgdat->node_id); > 5423 for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) 5424 pr_cont("%d:%s ", zone_to_nid(zone), zone->name); 5425 pr_cont("\n"); 5426 } 5427 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 26045 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/2] mm: be more verbose about zonelist initialization 2019-02-12 9:53 ` [PATCH 2/2] mm: be more verbose about zonelist initialization Michal Hocko 2019-02-13 0:12 ` kbuild test robot @ 2019-02-13 2:13 ` kbuild test robot 2019-02-13 9:40 ` [PATCH v2 " Michal Hocko 2019-02-13 9:43 ` [PATCH v3 " Michal Hocko 3 siblings, 0 replies; 21+ messages in thread From: kbuild test robot @ 2019-02-13 2:13 UTC (permalink / raw) To: Michal Hocko Cc: Tony Luck, linux-ia64, Dave Hansen, Peter Zijlstra, x86, LKML, Pingfan Liu, linux-mm, Michal Hocko, kbuild-all, Ingo Molnar, linuxppc-dev [-- Attachment #1: Type: text/plain, Size: 5864 bytes --] Hi Michal, I love your patch! Perhaps something to improve: [auto build test WARNING on linus/master] [also build test WARNING on v5.0-rc4 next-20190212] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Michal-Hocko/x86-numa-always-initialize-all-possible-nodes/20190213-071628 config: x86_64-kexec (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): In file included from include/linux/gfp.h:6:0, from include/linux/mm.h:10, from mm/page_alloc.c:18: mm/page_alloc.c: In function 'build_zonelists': mm/page_alloc.c:5423:31: error: 'z' undeclared (first use in this function) for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^ include/linux/mmzone.h:1036:7: note: in definition of macro 'for_each_zone_zonelist_nodemask' for (z = first_zones_zonelist(zlist, highidx, nodemask), zone = zonelist_zone(z); \ ^ >> mm/page_alloc.c:5423:2: note: in expansion of macro 'for_each_zone_zonelist' for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^~~~~~~~~~~~~~~~~~~~~~ mm/page_alloc.c:5423:31: note: each undeclared identifier is reported only once for each 
function it appears in for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^ include/linux/mmzone.h:1036:7: note: in definition of macro 'for_each_zone_zonelist_nodemask' for (z = first_zones_zonelist(zlist, highidx, nodemask), zone = zonelist_zone(z); \ ^ >> mm/page_alloc.c:5423:2: note: in expansion of macro 'for_each_zone_zonelist' for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^~~~~~~~~~~~~~~~~~~~~~ mm/page_alloc.c:5423:25: error: 'zone' undeclared (first use in this function) for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^ include/linux/mmzone.h:1036:59: note: in definition of macro 'for_each_zone_zonelist_nodemask' for (z = first_zones_zonelist(zlist, highidx, nodemask), zone = zonelist_zone(z); \ ^~~~ >> mm/page_alloc.c:5423:2: note: in expansion of macro 'for_each_zone_zonelist' for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^~~~~~~~~~~~~~~~~~~~~~ include/linux/mmzone.h:1036:57: warning: left-hand operand of comma expression has no effect [-Wunused-value] for (z = first_zones_zonelist(zlist, highidx, nodemask), zone = zonelist_zone(z); \ ^ >> include/linux/mmzone.h:1058:2: note: in expansion of macro 'for_each_zone_zonelist_nodemask' for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, NULL) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> mm/page_alloc.c:5423:2: note: in expansion of macro 'for_each_zone_zonelist' for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^~~~~~~~~~~~~~~~~~~~~~ include/linux/mmzone.h:1038:50: warning: left-hand operand of comma expression has no effect [-Wunused-value] z = next_zones_zonelist(++z, highidx, nodemask), \ ^ >> include/linux/mmzone.h:1058:2: note: in expansion of macro 'for_each_zone_zonelist_nodemask' for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, NULL) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> 
mm/page_alloc.c:5423:2: note: in expansion of macro 'for_each_zone_zonelist' for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) ^~~~~~~~~~~~~~~~~~~~~~ vim +/for_each_zone_zonelist +5423 mm/page_alloc.c 5382 5383 /* 5384 * Build zonelists ordered by zone and nodes within zones. 5385 * This results in conserving DMA zone[s] until all Normal memory is 5386 * exhausted, but results in overflowing to remote node while memory 5387 * may still exist in local DMA zone. 5388 */ 5389 5390 static void build_zonelists(pg_data_t *pgdat) 5391 { 5392 static int node_order[MAX_NUMNODES]; 5393 int node, load, nr_nodes = 0; 5394 nodemask_t used_mask; 5395 int local_node, prev_node; 5396 5397 /* NUMA-aware ordering of nodes */ 5398 local_node = pgdat->node_id; 5399 load = nr_online_nodes; 5400 prev_node = local_node; 5401 nodes_clear(used_mask); 5402 5403 memset(node_order, 0, sizeof(node_order)); 5404 while ((node = find_next_best_node(local_node, &used_mask)) >= 0) { 5405 /* 5406 * We don't want to pressure a particular node. 5407 * So adding penalty to the first node in same 5408 * distance group to make it round-robin. 5409 */ 5410 if (node_distance(local_node, node) != 5411 node_distance(local_node, prev_node)) 5412 node_load[node] = load; 5413 5414 node_order[nr_nodes++] = node; 5415 prev_node = node; 5416 load--; 5417 } 5418 5419 build_zonelists_in_node_order(pgdat, node_order, nr_nodes); 5420 build_thisnode_zonelists(pgdat); 5421 5422 pr_info("node[%d] zonelist: ", pgdat->node_id); > 5423 for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1) 5424 pr_cont("%d:%s ", zone_to_nid(zone), zone->name); 5425 pr_cont("\n"); 5426 } 5427 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 26383 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v2 2/2] mm: be more verbose about zonelist initialization @ 2019-02-13 9:40 Michal Hocko

From: Michal Hocko @ 2019-02-13 9:40 UTC
To: linux-mm
Cc: Tony Luck, linux-ia64, Peter Zijlstra, x86, LKML, Pingfan Liu, Dave Hansen, Michal Hocko, Ingo Molnar, linuxppc-dev

From: Michal Hocko <mhocko@suse.com>

We have seen several bugs where zonelists have not been initialized
properly, and it is not really straightforward to track those bugs down.
One way to help at least a bit is to dump the zonelists of each node
when they are (re)initialized.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2e097f336126..02c843f0db4f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5234,6 +5234,7 @@ static void build_zonelists(pg_data_t *pgdat)
 	int node, load, nr_nodes = 0;
 	nodemask_t used_mask;
 	int local_node, prev_node;
+	struct zone *zone;
 
 	/* NUMA-aware ordering of nodes */
 	local_node = pgdat->node_id;
@@ -5259,6 +5260,11 @@ static void build_zonelists(pg_data_t *pgdat)
 
 	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
 	build_thisnode_zonelists(pgdat);
+
+	pr_info("node[%d] zonelist: ", pgdat->node_id);
+	for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1)
+		pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
+	pr_cont("\n");
 }
 
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES
-- 
2.20.1
* [PATCH v3 2/2] mm: be more verbose about zonelist initialization @ 2019-02-13 9:43 Michal Hocko

From: Michal Hocko @ 2019-02-13 9:43 UTC
To: linux-mm
Cc: Tony Luck, linux-ia64, Peter Zijlstra, x86, LKML, Pingfan Liu, Dave Hansen, Michal Hocko, Ingo Molnar, linuxppc-dev

From: Michal Hocko <mhocko@suse.com>

We have seen several bugs where zonelists have not been initialized
properly, and it is not really straightforward to track those bugs down.
One way to help at least a bit is to dump the zonelists of each node
when they are (re)initialized.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Sorry for spamming. I have screwed up amending the previous version.

 mm/page_alloc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2e097f336126..52e54d16662a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5234,6 +5234,8 @@ static void build_zonelists(pg_data_t *pgdat)
 	int node, load, nr_nodes = 0;
 	nodemask_t used_mask;
 	int local_node, prev_node;
+	struct zone *zone;
+	struct zoneref *z;
 
 	/* NUMA-aware ordering of nodes */
 	local_node = pgdat->node_id;
@@ -5259,6 +5261,11 @@ static void build_zonelists(pg_data_t *pgdat)
 
 	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
 	build_thisnode_zonelists(pgdat);
+
+	pr_info("node[%d] zonelist: ", pgdat->node_id);
+	for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1)
+		pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
+	pr_cont("\n");
 }
 
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES
-- 
2.20.1
* Re: [PATCH v3 2/2] mm: be more verbose about zonelist initialization
From: Peter Zijlstra @ 2019-02-13 10:32 UTC
To: Michal Hocko
Cc: Tony Luck, linux-ia64, Dave Hansen, x86, LKML, Pingfan Liu, linux-mm, Michal Hocko, Ingo Molnar, linuxppc-dev

On Wed, Feb 13, 2019 at 10:43:15AM +0100, Michal Hocko wrote:
> @@ -5259,6 +5261,11 @@ static void build_zonelists(pg_data_t *pgdat)
> 
>  	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
>  	build_thisnode_zonelists(pgdat);
> +
> +	pr_info("node[%d] zonelist: ", pgdat->node_id);
> +	for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1)
> +		pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
> +	pr_cont("\n");
>  }

Have you run this by the SGI and other stupid large machine vendors?

Traditionally they tend to want to remove such things instead of adding
them.
* Re: [PATCH v3 2/2] mm: be more verbose about zonelist initialization
From: Michal Hocko @ 2019-02-13 11:50 UTC
To: Peter Zijlstra
Cc: Tony Luck, linux-ia64, Dave Hansen, x86, LKML, Pingfan Liu, linux-mm, Ingo Molnar, linuxppc-dev

On Wed 13-02-19 11:32:31, Peter Zijlstra wrote:
> On Wed, Feb 13, 2019 at 10:43:15AM +0100, Michal Hocko wrote:
> > @@ -5259,6 +5261,11 @@ static void build_zonelists(pg_data_t *pgdat)
> > 
> >  	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
> >  	build_thisnode_zonelists(pgdat);
> > +
> > +	pr_info("node[%d] zonelist: ", pgdat->node_id);
> > +	for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1)
> > +		pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
> > +	pr_cont("\n");
> >  }
> 
> Have you run this by the SGI and other stupid large machine vendors?

I do not have such a large machine handy. The biggest I have has a
handful (say a dozen) of NUMA nodes.

> Traditionally they tend to want to remove such things instead of adding
> them.

I do not insist on this patch but I find it handy. If there is an
opposition I will not miss it much.

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH v3 2/2] mm: be more verbose about zonelist initialization
From: Peter Zijlstra @ 2019-02-13 13:11 UTC
To: Michal Hocko
Cc: Tony Luck, linux-ia64, Dave Hansen, x86, LKML, Pingfan Liu, linux-mm, Ingo Molnar, linuxppc-dev

On Wed, Feb 13, 2019 at 12:50:14PM +0100, Michal Hocko wrote:
> On Wed 13-02-19 11:32:31, Peter Zijlstra wrote:
> > On Wed, Feb 13, 2019 at 10:43:15AM +0100, Michal Hocko wrote:
> > > @@ -5259,6 +5261,11 @@ static void build_zonelists(pg_data_t *pgdat)
> > > 
> > >  	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
> > >  	build_thisnode_zonelists(pgdat);
> > > +
> > > +	pr_info("node[%d] zonelist: ", pgdat->node_id);
> > > +	for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1)
> > > +		pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
> > > +	pr_cont("\n");
> > >  }
> > 
> > Have you run this by the SGI and other stupid large machine vendors?
> 
> I do not have such a large machine handy. The biggest I have has a
> handful (say a dozen) of NUMA nodes.
> 
> > Traditionally they tend to want to remove such things instead of adding
> > them.
> 
> I do not insist on this patch but I find it handy. If there is an
> opposition I will not miss it much.

Well, I don't have machines like that either and don't mind the patch.
Just raising the issue; I've had the big iron boys complain about
similar things (typically printing something for every CPU, which gets
out of hand much faster than zones, but still).
* Re: [PATCH v3 2/2] mm: be more verbose about zonelist initialization
From: Michal Hocko @ 2019-02-13 13:41 UTC
To: Peter Zijlstra
Cc: Tony Luck, linux-ia64, Dave Hansen, x86, LKML, Pingfan Liu, linux-mm, Ingo Molnar, linuxppc-dev

On Wed 13-02-19 14:11:31, Peter Zijlstra wrote:
> On Wed, Feb 13, 2019 at 12:50:14PM +0100, Michal Hocko wrote:
> > On Wed 13-02-19 11:32:31, Peter Zijlstra wrote:
> > > On Wed, Feb 13, 2019 at 10:43:15AM +0100, Michal Hocko wrote:
> > > > @@ -5259,6 +5261,11 @@ static void build_zonelists(pg_data_t *pgdat)
> > > > 
> > > >  	build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
> > > >  	build_thisnode_zonelists(pgdat);
> > > > +
> > > > +	pr_info("node[%d] zonelist: ", pgdat->node_id);
> > > > +	for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1)
> > > > +		pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
> > > > +	pr_cont("\n");
> > > >  }
> > > 
> > > Have you run this by the SGI and other stupid large machine vendors?
> > 
> > I do not have such a large machine handy. The biggest I have has a
> > handful (say a dozen) of NUMA nodes.
> > 
> > > Traditionally they tend to want to remove such things instead of adding
> > > them.
> > 
> > I do not insist on this patch but I find it handy. If there is an
> > opposition I will not miss it much.
> 
> Well, I don't have machines like that either and don't mind the patch.
> Just raising the issue; I've had the big iron boys complain about
> similar things (typically printing something for every CPU, which gets
> out of hand much faster than zones, but still).

Maybe we can try to push this through and revert if somebody complains
about excessive output.

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH v3 2/2] mm: be more verbose about zonelist initialization
From: Dave Hansen @ 2019-02-13 16:14 UTC
To: Michal Hocko, linux-mm
Cc: Tony Luck, linux-ia64, Peter Zijlstra, x86, LKML, Pingfan Liu, Michal Hocko, Ingo Molnar, linuxppc-dev

On 2/13/19 1:43 AM, Michal Hocko wrote:
> 
> We have seen several bugs where zonelists have not been initialized
> properly and it is not really straightforward to track those bugs down.
> One way to help a bit at least is to dump zonelists of each node when
> they are (re)initialized.

Were you thinking of boot-time bugs and crashes, or just stuff going
wonky after boot?

We don't have the zonelists dumped in /proc anywhere, do we? Would that
help?
* Re: [PATCH v3 2/2] mm: be more verbose about zonelist initialization
From: Michal Hocko @ 2019-02-13 16:18 UTC
To: Dave Hansen
Cc: Tony Luck, linux-ia64, Peter Zijlstra, x86, LKML, Pingfan Liu, linux-mm, Ingo Molnar, linuxppc-dev

On Wed 13-02-19 08:14:50, Dave Hansen wrote:
> On 2/13/19 1:43 AM, Michal Hocko wrote:
> > 
> > We have seen several bugs where zonelists have not been initialized
> > properly and it is not really straightforward to track those bugs down.
> > One way to help a bit at least is to dump zonelists of each node when
> > they are (re)initialized.
> 
> Were you thinking of boot-time bugs and crashes, or just stuff going
> wonky after boot?

Mostly boot time. I haven't seen hotplug related bugs in this direction.
All the issues I have seen so far were cases where we forgot a node
altogether and it ended up with no zonelists at all. But who knows, maybe
we have some hidden bugs where zonelists are initialized only partially
for some reason and there is no real way to find out.

> We don't have the zonelists dumped in /proc anywhere, do we? Would that
> help?

I would prefer not to export such an implementation detail into proc.

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH 0/2] x86, numa: always initialize all possible nodes
From: Mike Rapoport @ 2019-02-12 10:19 UTC
To: Michal Hocko
Cc: Tony Luck, linux-ia64, Dave Hansen, Peter Zijlstra, x86, LKML, Pingfan Liu, linux-mm, Ingo Molnar, linuxppc-dev

On Tue, Feb 12, 2019 at 10:53:41AM +0100, Michal Hocko wrote:
> Hi,
> this has been posted as an RFC previously [1]. There didn't seem to be
> any objections so I am reposting this for inclusion. I have added a
> debugging patch which prints the zonelist setup for each numa node
> for an easier debugging of a broken zonelist setup.
> 
> [1] http://lkml.kernel.org/r/20190114082416.30939-1-mhocko@kernel.org

FWIW,

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

for the series.

-- 
Sincerely yours,
Mike.
* Re: [PATCH 0/2] x86, numa: always initialize all possible nodes
From: Michal Hocko @ 2019-02-26 13:12 UTC
To: linux-mm
Cc: Tony Luck, linux-ia64, Peter Zijlstra, x86, LKML, Pingfan Liu, Dave Hansen, Ingo Molnar, linuxppc-dev

On Tue 12-02-19 10:53:41, Michal Hocko wrote:
> Hi,
> this has been posted as an RFC previously [1]. There didn't seem to be
> any objections so I am reposting this for inclusion. I have added a
> debugging patch which prints the zonelist setup for each numa node
> for an easier debugging of a broken zonelist setup.
> 
> [1] http://lkml.kernel.org/r/20190114082416.30939-1-mhocko@kernel.org

Friendly ping. I haven't heard any complaints, so can we route this via
tip/x86/mm or should we go via mmotm?

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH 0/2] x86, numa: always initialize all possible nodes
From: Michal Hocko @ 2019-04-15 11:42 UTC
To: linux-mm
Cc: Tony Luck, linux-ia64, Peter Zijlstra, x86, LKML, Pingfan Liu, Dave Hansen, Ingo Molnar, linuxppc-dev

On Tue 26-02-19 14:12:01, Michal Hocko wrote:
> On Tue 12-02-19 10:53:41, Michal Hocko wrote:
> > Hi,
> > this has been posted as an RFC previously [1]. There didn't seem to be
> > any objections so I am reposting this for inclusion. I have added a
> > debugging patch which prints the zonelist setup for each numa node
> > for an easier debugging of a broken zonelist setup.
> > 
> > [1] http://lkml.kernel.org/r/20190114082416.30939-1-mhocko@kernel.org
> 
> Friendly ping. I haven't heard any complaints, so can we route this via
> tip/x86/mm or should we go via mmotm?

It seems that Dave is busy. Let's add Andrew. Can we get this [1] merged
finally, please?

[1] http://lkml.kernel.org/r/20190212095343.23315-1-mhocko@kernel.org

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH 0/2] x86, numa: always initialize all possible nodes
From: Dave Hansen @ 2019-04-15 15:43 UTC
To: Michal Hocko, linux-mm
Cc: Tony Luck, linux-ia64, Peter Zijlstra, x86, LKML, Pingfan Liu, Ingo Molnar, linuxppc-dev

On 4/15/19 4:42 AM, Michal Hocko wrote:
>> Friendly ping. I haven't heard any complaints, so can we route this via
>> tip/x86/mm or should we go via mmotm?
> It seems that Dave is busy. Let's add Andrew. Can we get this [1] merged
> finally, please?

Sorry these slipped through the cracks.

These look sane to me. Because it pokes around mm/page_alloc.c a bit,
and could impact other architectures, my preference would be for Andrew
to pick these up for -mm. But, I don't feel that strongly about it.

Reviewed-by: Dave Hansen <dave.hansen@intel.com>
* Re: [PATCH 0/2] x86, numa: always initialize all possible nodes
From: Michal Hocko @ 2019-04-16 6:54 UTC
To: linux-mm, Andrew Morton
Cc: Tony Luck, linux-ia64, Peter Zijlstra, x86, LKML, Pingfan Liu, Dave Hansen, Ingo Molnar, linuxppc-dev

Forgot to cc Andrew. Now for real.

Andrew, please note that Dave has reviewed the patch
http://lkml.kernel.org/r/77b364e5-a30c-964a-6985-00b759dac128@intel.com
Or do you want me to resubmit?

On Mon 15-04-19 13:42:09, Michal Hocko wrote:
> On Tue 26-02-19 14:12:01, Michal Hocko wrote:
> > On Tue 12-02-19 10:53:41, Michal Hocko wrote:
> > > Hi,
> > > this has been posted as an RFC previously [1]. There didn't seem to be
> > > any objections so I am reposting this for inclusion. I have added a
> > > debugging patch which prints the zonelist setup for each numa node
> > > for an easier debugging of a broken zonelist setup.
> > > 
> > > [1] http://lkml.kernel.org/r/20190114082416.30939-1-mhocko@kernel.org
> > 
> > Friendly ping. I haven't heard any complaints, so can we route this via
> > tip/x86/mm or should we go via mmotm?
> 
> It seems that Dave is busy. Let's add Andrew. Can we get this [1] merged
> finally, please?
> 
> [1] http://lkml.kernel.org/r/20190212095343.23315-1-mhocko@kernel.org

-- 
Michal Hocko
SUSE Labs
Thread overview: 21+ messages
2019-02-12  9:53 [PATCH 0/2] x86, numa: always initialize all possible nodes Michal Hocko
2019-02-12  9:53 ` [PATCH 1/2] " Michal Hocko
2019-05-01 19:12 ` Barret Rhoden
2019-05-02 13:00 ` Michal Hocko
2019-06-26 13:54 ` Michal Hocko
2019-02-12  9:53 ` [PATCH 2/2] mm: be more verbose about zonelist initialization Michal Hocko
2019-02-13  0:12 ` kbuild test robot
2019-02-13  2:13 ` kbuild test robot
2019-02-13  9:40 ` [PATCH v2 " Michal Hocko
2019-02-13  9:43 ` [PATCH v3 " Michal Hocko
2019-02-13 10:32 ` Peter Zijlstra
2019-02-13 11:50 ` Michal Hocko
2019-02-13 13:11 ` Peter Zijlstra
2019-02-13 13:41 ` Michal Hocko
2019-02-13 16:14 ` Dave Hansen
2019-02-13 16:18 ` Michal Hocko
2019-02-12 10:19 ` [PATCH 0/2] x86, numa: always initialize all possible nodes Mike Rapoport
2019-02-26 13:12 ` Michal Hocko
2019-04-15 11:42 ` Michal Hocko
2019-04-15 15:43 ` Dave Hansen
2019-04-16  6:54 ` Michal Hocko