linux-kernel.vger.kernel.org archive mirror
From: Pingfan Liu <kernelfans@gmail.com>
To: mhocko@kernel.org
Cc: Vlastimil Babka <vbabka@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>
Subject: Re: [PATCH] mm/alloc: fallback to first node if the wanted node offline
Date: Fri, 7 Dec 2018 21:20:17 +0800
Message-ID: <CAFgQCTuf95pJSWDc1BNQ=gN76aJ_dtxMRbAV9a28X6w8vapdMQ@mail.gmail.com>
In-Reply-To: <20181207113044.GB1286@dhcp22.suse.cz>

On Fri, Dec 7, 2018 at 7:30 PM Michal Hocko <mhocko@kernel.org> wrote:
>
[...]
> On Fri 07-12-18 17:40:09, Pingfan Liu wrote:
> > On Fri, Dec 7, 2018 at 3:53 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Fri 07-12-18 10:56:51, Pingfan Liu wrote:
> > > [...]
> > > > In short, the fix should take two factors into account: the
> > > > semantics of an online node and the effect on all archs
> > >
> > > I am pretty sure there is a lot of room for unification in this area.
> > > Nevertheless I strongly believe the bug should be fixed first in the
> > > simplest way and all the cleanup should be done on top.
> > >
> > > Do I get it right that the diff worked for you and I can prepare a full
> > > patch?
> > >
> > Sure, I am glad to test your new patch.
>
> From 46e68be89d9c299fd497b2b8bea3f2add144f17f Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Fri, 7 Dec 2018 12:23:32 +0100
> Subject: [PATCH] x86, numa: always initialize all possible nodes
>
> Pingfan Liu has reported the following splat
> [    5.772742] BUG: unable to handle kernel paging request at 0000000000002088
> [    5.773618] PGD 0 P4D 0
> [    5.773618] Oops: 0000 [#1] SMP NOPTI
> [    5.773618] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc1+ #3
> [    5.773618] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.4.3 06/29/2018
> [    5.773618] RIP: 0010:__alloc_pages_nodemask+0xe2/0x2a0
> [    5.773618] Code: 00 00 44 89 ea 80 ca 80 41 83 f8 01 44 0f 44 ea 89 da c1 ea 08 83 e2 01 88 54 24 20 48 8b 54 24 08 48 85 d2 0f 85 46 01 00 00 <3b> 77 08 0f 82 3d 01 00 00 48 89 f8 44 89 ea 48 89 e1 44 89 e6 89
> [    5.773618] RSP: 0018:ffffaa600005fb20 EFLAGS: 00010246
> [    5.773618] RAX: 0000000000000000 RBX: 00000000006012c0 RCX: 0000000000000000
> [    5.773618] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002080
> [    5.773618] RBP: 00000000006012c0 R08: 0000000000000000 R09: 0000000000000002
> [    5.773618] R10: 00000000006080c0 R11: 0000000000000002 R12: 0000000000000000
> [    5.773618] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000002
> [    5.773618] FS:  0000000000000000(0000) GS:ffff8c69afe00000(0000) knlGS:0000000000000000
> [    5.773618] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    5.773618] CR2: 0000000000002088 CR3: 000000087e00a000 CR4: 00000000003406e0
> [    5.773618] Call Trace:
> [    5.773618]  new_slab+0xa9/0x570
> [    5.773618]  ___slab_alloc+0x375/0x540
> [    5.773618]  ? pinctrl_bind_pins+0x2b/0x2a0
> [    5.773618]  __slab_alloc+0x1c/0x38
> [    5.773618]  __kmalloc_node_track_caller+0xc8/0x270
> [    5.773618]  ? pinctrl_bind_pins+0x2b/0x2a0
> [    5.773618]  devm_kmalloc+0x28/0x60
> [    5.773618]  pinctrl_bind_pins+0x2b/0x2a0
> [    5.773618]  really_probe+0x73/0x420
> [    5.773618]  driver_probe_device+0x115/0x130
> [    5.773618]  __driver_attach+0x103/0x110
> [    5.773618]  ? driver_probe_device+0x130/0x130
> [    5.773618]  bus_for_each_dev+0x67/0xc0
> [    5.773618]  ? klist_add_tail+0x3b/0x70
> [    5.773618]  bus_add_driver+0x41/0x260
> [    5.773618]  ? pcie_port_setup+0x4d/0x4d
> [    5.773618]  driver_register+0x5b/0xe0
> [    5.773618]  ? pcie_port_setup+0x4d/0x4d
> [    5.773618]  do_one_initcall+0x4e/0x1d4
> [    5.773618]  ? init_setup+0x25/0x28
> [    5.773618]  kernel_init_freeable+0x1c1/0x26e
> [    5.773618]  ? loglevel+0x5b/0x5b
> [    5.773618]  ? rest_init+0xb0/0xb0
> [    5.773618]  kernel_init+0xa/0x110
> [    5.773618]  ret_from_fork+0x22/0x40
> [    5.773618] Modules linked in:
> [    5.773618] CR2: 0000000000002088
> [    5.773618] ---[ end trace 1030c9120a03d081 ]---
>
> with his AMD machine with the following topology
>   NUMA node0 CPU(s):     0,8,16,24
>   NUMA node1 CPU(s):     2,10,18,26
>   NUMA node2 CPU(s):     4,12,20,28
>   NUMA node3 CPU(s):     6,14,22,30
>   NUMA node4 CPU(s):     1,9,17,25
>   NUMA node5 CPU(s):     3,11,19,27
>   NUMA node6 CPU(s):     5,13,21,29
>   NUMA node7 CPU(s):     7,15,23,31
>
> [    0.007418] Early memory node ranges
> [    0.007419]   node   1: [mem 0x0000000000001000-0x000000000008efff]
> [    0.007420]   node   1: [mem 0x0000000000090000-0x000000000009ffff]
> [    0.007422]   node   1: [mem 0x0000000000100000-0x000000005c3d6fff]
> [    0.007422]   node   1: [mem 0x00000000643df000-0x0000000068ff7fff]
> [    0.007423]   node   1: [mem 0x000000006c528000-0x000000006fffffff]
> [    0.007424]   node   1: [mem 0x0000000100000000-0x000000047fffffff]
> [    0.007425]   node   5: [mem 0x0000000480000000-0x000000087effffff]
>
> and nr_cpus set to 4. The underlying reason is that the device is bound
> to node 2 which doesn't have any memory and init_cpu_to_node only
> initializes memory-less nodes for possible cpus which nr_cpus restricts.
> This in turn means that proper zonelists are not allocated and the page
> allocator blows up.
>
> Fix the issue by moving init_memory_less_node into numa_register_memblks
> and always initialize all possible nodes consistently at a single place.
>
> Reported-by: Pingfan Liu <kernelfans@gmail.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  arch/x86/mm/numa.c | 33 +++++++++++++++------------------
>  1 file changed, 15 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 1308f5408bf7..4575ae4d5449 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -527,6 +527,19 @@ static void __init numa_clear_kernel_node_hotplug(void)
>         }
>  }
>
> +static void __init init_memory_less_node(int nid)
> +{
> +       unsigned long zones_size[MAX_NR_ZONES] = {0};
> +       unsigned long zholes_size[MAX_NR_ZONES] = {0};
> +
> +       free_area_init_node(nid, zones_size, 0, zholes_size);
> +
> +       /*
> +        * All zonelists will be built later in start_kernel() after per cpu
> +        * areas are initialized.
> +        */
> +}
> +
>  static int __init numa_register_memblks(struct numa_meminfo *mi)
>  {
>         unsigned long uninitialized_var(pfn_align);
> @@ -592,6 +605,8 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
>                         continue;
>
>                 alloc_node_data(nid);
> +               if (!end)
> +                       init_memory_less_node(nid);
>         }
>
>         /* Dump memblock with node info and return. */
> @@ -721,21 +736,6 @@ void __init x86_numa_init(void)
>         numa_init(dummy_numa_init);
>  }
>
> -static void __init init_memory_less_node(int nid)
> -{
> -       unsigned long zones_size[MAX_NR_ZONES] = {0};
> -       unsigned long zholes_size[MAX_NR_ZONES] = {0};
> -
> -       /* Allocate and initialize node data. Memory-less node is now online.*/
> -       alloc_node_data(nid);
> -       free_area_init_node(nid, zones_size, 0, zholes_size);
> -
> -       /*
> -        * All zonelists will be built later in start_kernel() after per cpu
> -        * areas are initialized.
> -        */
> -}
> -
>  /*
>   * Setup early cpu_to_node.
>   *
> @@ -763,9 +763,6 @@ void __init init_cpu_to_node(void)
>                 if (node == NUMA_NO_NODE)
>                         continue;
>
> -               if (!node_online(node))
> -                       init_memory_less_node(node);
> -
>                 numa_set_node(cpu, node);
>         }
>  }
> --
> 2.19.2
>
Hi Michal,

As I mentioned in my previous email, I manually applied the patch,
and a normal bootup does not work with it. Your new patch seems to
have no essential changes; I applied it and gave it a try, but it
still does not work.
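
For anyone following the thread, the failure mode described in the
changelog above can be sketched in userspace. This is only an
illustration, not kernel code: the struct, the table and every function
name below are hypothetical stand-ins, the CPU-to-node map and the
memory layout are copied from the topology and "Early memory node
ranges" output quoted above, and it assumes that nr_cpus=N leaves CPUs
0..N-1 as the possible ones. It does not model the boot failure
reported in this reply.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_NODES 8

/* Hypothetical stand-in for the per-node data (pg_data_t in the kernel). */
struct node_data {
	bool initialized;
};

static struct node_data node_storage[NR_NODES];
static struct node_data *node_table[NR_NODES];	/* NULL: never set up */

/* Topology from the report: node0 has CPUs 0,8,16,24; node4 has 1,9,... */
static int cpu_to_node_id(int cpu)
{
	static const int map[8] = { 0, 4, 1, 5, 2, 6, 3, 7 };

	return map[cpu % 8];
}

/* Per the early memory node ranges, only nodes 1 and 5 have memory. */
static bool node_has_memory(int nid)
{
	return nid == 1 || nid == 5;
}

static void setup_node(int nid)
{
	node_storage[nid].initialized = true;
	node_table[nid] = &node_storage[nid];
}

static void reset_nodes(void)
{
	for (int nid = 0; nid < NR_NODES; nid++) {
		node_storage[nid].initialized = false;
		node_table[nid] = NULL;
	}
}

/*
 * Sketch of the behaviour the changelog describes: nodes with memory are
 * set up, but a memory-less node is only set up if a possible CPU
 * (assumed here to be 0..nr_cpus-1) belongs to it.
 */
static void init_nodes_per_cpu(int nr_cpus)
{
	reset_nodes();
	for (int nid = 0; nid < NR_NODES; nid++)
		if (node_has_memory(nid))
			setup_node(nid);
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		int nid = cpu_to_node_id(cpu);

		if (!node_table[nid])
			setup_node(nid);
	}
}

/* Sketch of the intended behaviour: set up every possible node in one place. */
static void init_nodes_all(void)
{
	reset_nodes();
	for (int nid = 0; nid < NR_NODES; nid++)
		setup_node(nid);
}

/* A device bound to nid can only allocate if the node data exists. */
static bool node_usable(int nid)
{
	return node_table[nid] != NULL;
}
```

In this model, init_nodes_per_cpu(4) only visits CPUs 0-3, which sit on
nodes 0, 4, 1 and 5, so node_usable(2) is false for a device bound to
node 2. After init_nodes_all() every node, including node 2, has its
data set up.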

Thanks,
Pingfan

