All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Christopher Lameter <cl@linux.com>,
	Sachin Sant <sachinp@linux.vnet.ibm.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	Linux-Next Mailing List <linux-next@vger.kernel.org>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9
Date: Thu, 27 Feb 2020 13:02:59 +0100	[thread overview]
Message-ID: <20200227120259.GD3771@dhcp22.suse.cz> (raw)
In-Reply-To: <c412ee69-80f9-b013-67d4-3b0a2f6aff7f@suse.cz>

On Wed 26-02-20 22:45:52, Vlastimil Babka wrote:
> On 2/26/20 7:41 PM, Michal Hocko wrote:
> > On Wed 26-02-20 18:25:28, Cristopher Lameter wrote:
> >> On Mon, 24 Feb 2020, Michal Hocko wrote:
> >>
> >>> Hmm, nasty. Is there any reason why kmalloc_node behaves differently
> >>> from the page allocator?
> >>
> >> The page allocator will do the same thing if you pass GFP_THISNODE and
> >> insist on allocating memory from a node that does not exist.
> > 
> > I do not think that the page allocator would blow up even with
> > GFP_THISNODE. The allocation would just fail on memory less node.
> > 
> > Besides that kmalloc_node shouldn't really have an implicit GFP_THISNODE
> > semantic right? At least I do not see anything like that documented
> > anywhere.
> 
> Seems like SLAB at least behaves like the page allocator. See
> ____cache_alloc_node() where it basically does:
> 
> page = cache_grow_begin(cachep, gfp_exact_node(flags), nodeid);
> ...
> if (!page)
> 	fallback_alloc(cachep, flags)
> 
> gfp_exact_node() adds __GFP_THISNODE among other things, so the initial
> attempt does try to stick only to the given node. But fallback_alloc()
> doesn't. In fact, even if kmalloc_node() was called with __GFP_THISNODE
> then it wouldn't work as intended, as fallback_alloc() doesn't get the
> nodeid, but instead will use numa_mem_id(). That part could probably be
> improved.
> 
> SLUB's ___slab_alloc() has for example this:
> if (node != NUMA_NO_NODE && !node_present_pages(node))

Hmm, just a quick note. Shouldn't this be node_managed_pages? In most
cases the difference is negligible but I can imagine crazy setups where
all present pages are simply consumed.

>     searchnode = node_to_mem_node(node);
> 
> That's from Joonsoo's 2014 commit a561ce00b09e ("slub: fall back to
> node_to_mem_node() node if allocating on memoryless node"), suggesting
> that the scenario in this bug report should work. Perhaps it just got
> broken unintentionally later.

A very good reference. Thanks!

> And AFAICS the whole path leading to alloc_slab_page() also doesn't add
> __GFP_THISNODE, but will keep it if caller passed it, and ultimately it
> does:
> 
> 
> if (node == NUMA_NO_NODE)
>     page = alloc_pages(flags, order);
> else
>     page = __alloc_pages_node(node, flags, order);
> 
> So yeah looks like SLUB's kmalloc_node() is supposed to behave like the
> page allocator's __alloc_pages_node() and respect __GFP_THISNODE but not
> enforce it by itself. There's probably just some missing data structure
> initialization somewhere right now for memoryless nodes.

Thanks for the confirmation!
-- 
Michal Hocko
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@kernel.org>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>,
	Pekka Enberg <penberg@kernel.org>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	Linux-Next Mailing List <linux-next@vger.kernel.org>,
	David Rientjes <rientjes@google.com>,
	Christopher Lameter <cl@linux.com>,
	linuxppc-dev@lists.ozlabs.org,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9
Date: Thu, 27 Feb 2020 13:02:59 +0100	[thread overview]
Message-ID: <20200227120259.GD3771@dhcp22.suse.cz> (raw)
In-Reply-To: <c412ee69-80f9-b013-67d4-3b0a2f6aff7f@suse.cz>

On Wed 26-02-20 22:45:52, Vlastimil Babka wrote:
> On 2/26/20 7:41 PM, Michal Hocko wrote:
> > On Wed 26-02-20 18:25:28, Cristopher Lameter wrote:
> >> On Mon, 24 Feb 2020, Michal Hocko wrote:
> >>
> >>> Hmm, nasty. Is there any reason why kmalloc_node behaves differently
> >>> from the page allocator?
> >>
> >> The page allocator will do the same thing if you pass GFP_THISNODE and
> >> insist on allocating memory from a node that does not exist.
> > 
> > I do not think that the page allocator would blow up even with
> > GFP_THISNODE. The allocation would just fail on memory less node.
> > 
> > Besides that kmalloc_node shouldn't really have an implicit GFP_THISNODE
> > semantic right? At least I do not see anything like that documented
> > anywhere.
> 
> Seems like SLAB at least behaves like the page allocator. See
> ____cache_alloc_node() where it basically does:
> 
> page = cache_grow_begin(cachep, gfp_exact_node(flags), nodeid);
> ...
> if (!page)
> 	fallback_alloc(cachep, flags)
> 
> gfp_exact_node() adds __GFP_THISNODE among other things, so the initial
> attempt does try to stick only to the given node. But fallback_alloc()
> doesn't. In fact, even if kmalloc_node() was called with __GFP_THISNODE
> then it wouldn't work as intended, as fallback_alloc() doesn't get the
> nodeid, but instead will use numa_mem_id(). That part could probably be
> improved.
> 
> SLUB's ___slab_alloc() has for example this:
> if (node != NUMA_NO_NODE && !node_present_pages(node))

Hmm, just a quick note. Shouldn't this be node_managed_pages? In most
cases the difference is negligible but I can imagine crazy setups where
all present pages are simply consumed.

>     searchnode = node_to_mem_node(node);
> 
> That's from Joonsoo's 2014 commit a561ce00b09e ("slub: fall back to
> node_to_mem_node() node if allocating on memoryless node"), suggesting
> that the scenario in this bug report should work. Perhaps it just got
> broken unintentionally later.

A very good reference. Thanks!

> And AFAICS the whole path leading to alloc_slab_page() also doesn't add
> __GFP_THISNODE, but will keep it if caller passed it, and ultimately it
> does:
> 
> 
> if (node == NUMA_NO_NODE)
>     page = alloc_pages(flags, order);
> else
>     page = __alloc_pages_node(node, flags, order);
> 
> So yeah looks like SLUB's kmalloc_node() is supposed to behave like the
> page allocator's __alloc_pages_node() and respect __GFP_THISNODE but not
> enforce it by itself. There's probably just some missing data structure
> initialization somewhere right now for memoryless nodes.

Thanks for the confirmation!
-- 
Michal Hocko
SUSE Labs

  parent reply	other threads:[~2020-02-27 12:03 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-18 10:45 [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9 Sachin Sant
2020-02-18 10:50 ` Kirill Tkhai
2020-02-18 10:50   ` Kirill Tkhai
2020-02-18 11:01   ` Kirill Tkhai
2020-02-18 11:01     ` Kirill Tkhai
2020-02-18 11:35     ` Kirill Tkhai
2020-02-18 11:35       ` Kirill Tkhai
2020-02-18 11:40     ` Sachin Sant
2020-02-18 11:55       ` Michal Hocko
2020-02-18 11:55         ` Michal Hocko
2020-02-18 14:00         ` Sachin Sant
2020-02-18 14:00           ` Sachin Sant
2020-02-18 14:26           ` Michal Hocko
2020-02-18 14:26             ` Michal Hocko
2020-02-18 15:11             ` Sachin Sant
2020-02-18 15:11               ` Sachin Sant
2020-02-18 15:24               ` Michal Hocko
2020-02-18 15:24                 ` Michal Hocko
2020-02-22  3:38                 ` Christopher Lameter
2020-02-22  3:38                   ` Christopher Lameter
2020-02-24  8:58                   ` Michal Hocko
2020-02-24  8:58                     ` Michal Hocko
2020-02-26 18:25                     ` Christopher Lameter
2020-02-26 18:25                       ` Christopher Lameter
2020-02-26 18:41                       ` Michal Hocko
2020-02-26 18:41                         ` Michal Hocko
2020-02-26 18:44                         ` Christopher Lameter
2020-02-26 18:44                           ` Christopher Lameter
2020-02-26 19:01                           ` Michal Hocko
2020-02-26 19:01                             ` Michal Hocko
2020-02-26 20:31                             ` David Rientjes
2020-02-26 20:31                               ` David Rientjes
2020-02-26 20:52                               ` Michal Hocko
2020-02-26 20:52                                 ` Michal Hocko
2020-02-26 21:45                         ` Vlastimil Babka
2020-02-26 21:45                           ` Vlastimil Babka
2020-02-26 22:29                           ` Vlastimil Babka
2020-02-26 22:29                             ` Vlastimil Babka
2020-02-27 12:12                             ` Michal Hocko
2020-02-27 12:12                               ` Michal Hocko
2020-02-27 16:00                               ` Sachin Sant
2020-02-27 16:00                                 ` Sachin Sant
2020-02-27 16:16                                 ` Vlastimil Babka
2020-02-27 18:26                                   ` Michal Hocko
2020-02-27 18:26                                     ` Michal Hocko
2020-03-10 15:01                                     ` Michal Hocko
2020-03-10 15:01                                       ` Michal Hocko
2020-03-12 12:18                                       ` Michael Ellerman
2020-03-12 12:18                                         ` Michael Ellerman
2020-03-12 16:51                                         ` Sachin Sant
2020-03-12 16:51                                           ` Sachin Sant
2020-03-13 10:48                                           ` Michael Ellerman
2020-03-13 10:48                                             ` Michael Ellerman
2020-03-13 11:12                                             ` Srikar Dronamraju
2020-03-13 11:12                                               ` Srikar Dronamraju
2020-03-13 11:35                                               ` Vlastimil Babka
2020-03-13 11:35                                                 ` Vlastimil Babka
2020-03-14  8:10                                                 ` Sachin Sant
2020-02-27 12:02                           ` Michal Hocko [this message]
2020-02-27 12:02                             ` Michal Hocko
2020-02-18 11:38   ` Sachin Sant
2020-02-18 11:53     ` Kirill Tkhai
2020-03-17 13:17 ` [PATCH 0/4] Fix kmalloc_node on offline nodes Srikar Dronamraju
2020-03-17 13:17   ` Srikar Dronamraju
2020-03-17 13:17   ` [PATCH 1/4] mm: Check for node_online in node_present_pages Srikar Dronamraju
2020-03-17 13:17     ` Srikar Dronamraju
2020-03-17 13:37     ` Srikar Dronamraju
2020-03-17 13:37       ` Srikar Dronamraju
2020-03-17 13:17   ` [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab Srikar Dronamraju
2020-03-17 13:17     ` Srikar Dronamraju
2020-03-17 13:34     ` Vlastimil Babka
2020-03-17 13:34       ` Vlastimil Babka
2020-03-17 13:45       ` Srikar Dronamraju
2020-03-17 13:45         ` Srikar Dronamraju
2020-03-17 13:53         ` Vlastimil Babka
2020-03-17 13:53           ` Vlastimil Babka
2020-03-17 14:51           ` Srikar Dronamraju
2020-03-17 14:51             ` Srikar Dronamraju
2020-03-17 15:29             ` Vlastimil Babka
2020-03-17 15:29               ` Vlastimil Babka
2020-03-18  7:29               ` Srikar Dronamraju
2020-03-18  7:29                 ` Srikar Dronamraju
2020-03-17 16:41       ` Srikar Dronamraju
2020-03-17 16:41         ` Srikar Dronamraju
2020-03-17 13:17   ` [PATCH 3/4] mm: Implement reset_numa_mem Srikar Dronamraju
2020-03-17 13:17     ` Srikar Dronamraju
2020-03-17 13:17   ` [PATCH 4/4] powerpc/numa: Set fallback nodes for offline nodes Srikar Dronamraju
2020-03-17 13:17     ` Srikar Dronamraju
2020-03-17 14:22     ` Bharata B Rao
2020-03-17 14:22       ` Bharata B Rao
2020-03-17 14:29       ` Srikar Dronamraju
2020-03-17 14:29         ` Srikar Dronamraju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200227120259.GD3771@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=cl@linux.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-next@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=sachinp@linux.vnet.ibm.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.