* [RFC 0/2] harden alloc_pages against bogus nid
@ 2018-08-01 20:04 Jeremy Linton
  2018-08-01 20:04 ` [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes Jeremy Linton
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Jeremy Linton @ 2018-08-01 20:04 UTC (permalink / raw)
  To: linux-mm
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko, vbabka,
	Punit.Agrawal, Lorenzo.Pieralisi, linux-arm-kernel, bhelgaas,
	linux-kernel, Jeremy Linton

The thread "avoid alloc memory on offline node"

https://lkml.org/lkml/2018/6/7/251

asked at one point why kzalloc_node() was crashing rather than
returning memory from a valid node. The thread ended up fixing
the immediate causes of the crash, but left open the case of bad
proximity values being in DSDT tables without corresponding
SRAT/SLIT entries, as is happening on another machine.

It's also easy to fix that, but we should also harden the allocator
sufficiently that it doesn't crash when passed an invalid node id.
There are a couple of possible ways to do this, and I've attached two
separate patches which individually fix that problem.

The first detects the offline node before calling into the new_slab()
code path, once it becomes apparent that the allocation isn't
going to succeed. The second actually hardens node_zonelist() and
prepare_alloc_pages() against NODE_DATA(nid) being NULL, and hence the
node zonelist being unavailable. This latter case happens if the node
has never been initialized or is possibly out of range. There are other
places (NODE_DATA & node_online) which should be checking whether the
node ids are >= MAX_NUMNODES.
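
As an illustration (hypothetical driver snippet with made-up names; the
real reports involved PCI port setup):

	/*
	 * dev_to_node() hands back a firmware-derived node id with no
	 * SRAT/SLIT entry behind it, so NODE_DATA(nid) is NULL and the
	 * allocation can oops instead of falling back to a valid node.
	 */
	int nid = dev_to_node(&pdev->dev);	/* e.g. 1 on a single-node system */
	struct resource *res = kzalloc_node(sizeof(*res), GFP_KERNEL, nid);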

Jeremy Linton (2):
  slub: Avoid trying to allocate memory on offline nodes
  mm: harden alloc_pages code paths against bogus nodes

 include/linux/gfp.h | 2 ++
 mm/page_alloc.c     | 2 ++
 mm/slub.c           | 2 ++
 3 files changed, 6 insertions(+)

-- 
2.14.3



* [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes
  2018-08-01 20:04 [RFC 0/2] harden alloc_pages against bogus nid Jeremy Linton
@ 2018-08-01 20:04 ` Jeremy Linton
  2018-08-02  9:15   ` Michal Hocko
  2018-08-02 14:23   ` Christopher Lameter
  2018-08-01 20:04 ` [RFC 2/2] mm: harden alloc_pages code paths against bogus nodes Jeremy Linton
  2018-08-01 21:50 ` [RFC 0/2] harden alloc_pages against bogus nid Andrew Morton
  2 siblings, 2 replies; 15+ messages in thread
From: Jeremy Linton @ 2018-08-01 20:04 UTC (permalink / raw)
  To: linux-mm
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko, vbabka,
	Punit.Agrawal, Lorenzo.Pieralisi, linux-arm-kernel, bhelgaas,
	linux-kernel, Jeremy Linton

If a user calls the *alloc_node() functions with an invalid node
it's possible to crash in __alloc_pages_nodemask() because NODE_DATA()
returns an invalid pgdat, which propagates into the node zonelist in
prepare_alloc_pages(). Avoid that by not trying to allocate
new slabs against offline nodes.

(example backtrace)

  __alloc_pages_nodemask+0x128/0xf48
  allocate_slab+0x94/0x528
  new_slab+0x68/0xc8
  ___slab_alloc+0x44c/0x520
  __slab_alloc+0x50/0x68
  kmem_cache_alloc_node_trace+0xe0/0x230

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 mm/slub.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index 51258eff4178..e03719bac1e2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2519,6 +2519,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		if (unlikely(!node_match(page, searchnode))) {
 			stat(s, ALLOC_NODE_MISMATCH);
 			deactivate_slab(s, page, c->freelist, c);
+			if (!node_online(searchnode))
+				node = NUMA_NO_NODE;
 			goto new_slab;
 		}
 	}
-- 
2.14.3



* [RFC 2/2] mm: harden alloc_pages code paths against bogus nodes
  2018-08-01 20:04 [RFC 0/2] harden alloc_pages against bogus nid Jeremy Linton
  2018-08-01 20:04 ` [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes Jeremy Linton
@ 2018-08-01 20:04 ` Jeremy Linton
  2018-08-02  7:31   ` Michal Hocko
  2018-08-01 21:50 ` [RFC 0/2] harden alloc_pages against bogus nid Andrew Morton
  2 siblings, 1 reply; 15+ messages in thread
From: Jeremy Linton @ 2018-08-01 20:04 UTC (permalink / raw)
  To: linux-mm
  Cc: cl, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko, vbabka,
	Punit.Agrawal, Lorenzo.Pieralisi, linux-arm-kernel, bhelgaas,
	linux-kernel, Jeremy Linton

It's possible to crash __alloc_pages_nodemask() by passing it
bogus node ids. This is caused by NODE_DATA() returning NULL
(hopefully) when the requested node is offline. We can
harden against the basic case of a mostly-valid node that
isn't online by checking for NULL and failing prepare_alloc_pages().

But this then suggests we should also harden NODE_DATA() like this

#define NODE_DATA(nid)         ( (nid) < MAX_NUMNODES ? node_data[(nid)] : NULL)

Eventually this starts to add a bunch of generally unneeded checks
in some code paths that are called quite frequently.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 include/linux/gfp.h | 2 ++
 mm/page_alloc.c     | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index a6afcec53795..17d70271c42e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -436,6 +436,8 @@ static inline int gfp_zonelist(gfp_t flags)
  */
 static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
 {
+	if (unlikely(!NODE_DATA(nid))) //VM_WARN_ON?
+		return NULL;
 	return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
 }
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a790ef4be74e..3a3d9ac2662a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4306,6 +4306,8 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 {
 	ac->high_zoneidx = gfp_zone(gfp_mask);
 	ac->zonelist = node_zonelist(preferred_nid, gfp_mask);
+	if (!ac->zonelist)
+		return false;
 	ac->nodemask = nodemask;
 	ac->migratetype = gfpflags_to_migratetype(gfp_mask);
 
-- 
2.14.3



* Re: [RFC 0/2] harden alloc_pages against bogus nid
  2018-08-01 20:04 [RFC 0/2] harden alloc_pages against bogus nid Jeremy Linton
  2018-08-01 20:04 ` [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes Jeremy Linton
  2018-08-01 20:04 ` [RFC 2/2] mm: harden alloc_pages code paths against bogus nodes Jeremy Linton
@ 2018-08-01 21:50 ` Andrew Morton
  2018-08-01 22:56   ` Jeremy Linton
  2 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2018-08-01 21:50 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-mm, cl, penberg, rientjes, iamjoonsoo.kim, mhocko, vbabka,
	Punit.Agrawal, Lorenzo.Pieralisi, linux-arm-kernel, bhelgaas,
	linux-kernel

On Wed,  1 Aug 2018 15:04:16 -0500 Jeremy Linton <jeremy.linton@arm.com> wrote:

> The thread "avoid alloc memory on offline node"
> 
> https://lkml.org/lkml/2018/6/7/251
> 
> asked at one point why kzalloc_node() was crashing rather than
> returning memory from a valid node. The thread ended up fixing
> the immediate causes of the crash, but left open the case of bad
> proximity values being in DSDT tables without corresponding
> SRAT/SLIT entries, as is happening on another machine.
> 
> It's also easy to fix that, but we should also harden the allocator
> sufficiently that it doesn't crash when passed an invalid node id.
> There are a couple of possible ways to do this, and I've attached two
> separate patches which individually fix that problem.
> 
> The first detects the offline node before calling into the new_slab()
> code path, once it becomes apparent that the allocation isn't
> going to succeed. The second actually hardens node_zonelist() and
> prepare_alloc_pages() against NODE_DATA(nid) being NULL, and hence the
> node zonelist being unavailable. This latter case happens if the node
> has never been initialized or is possibly out of range. There are other
> places (NODE_DATA & node_online) which should be checking whether the
> node ids are >= MAX_NUMNODES.
> 

What is it that leads to a caller requesting memory from an invalid
node?  A race against offlining?  If so then that's a lack of
appropriate locking, isn't it?

I don't see a problem with emitting a warning and then selecting a
different node so we can keep running.  But we do want that warning, so
we can understand the root cause and fix it?



* Re: [RFC 0/2] harden alloc_pages against bogus nid
  2018-08-01 21:50 ` [RFC 0/2] harden alloc_pages against bogus nid Andrew Morton
@ 2018-08-01 22:56   ` Jeremy Linton
  2018-08-02  0:14     ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Jeremy Linton @ 2018-08-01 22:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, cl, penberg, rientjes, iamjoonsoo.kim, mhocko, vbabka,
	Punit.Agrawal, Lorenzo.Pieralisi, linux-arm-kernel, bhelgaas,
	linux-kernel

Hi,

On 08/01/2018 04:50 PM, Andrew Morton wrote:
> On Wed,  1 Aug 2018 15:04:16 -0500 Jeremy Linton <jeremy.linton@arm.com> wrote:
> 
>> The thread "avoid alloc memory on offline node"
>>
>> https://lkml.org/lkml/2018/6/7/251
>>
>> asked at one point why kzalloc_node() was crashing rather than
>> returning memory from a valid node. The thread ended up fixing
>> the immediate causes of the crash, but left open the case of bad
>> proximity values being in DSDT tables without corresponding
>> SRAT/SLIT entries, as is happening on another machine.
>>
>> It's also easy to fix that, but we should also harden the allocator
>> sufficiently that it doesn't crash when passed an invalid node id.
>> There are a couple of possible ways to do this, and I've attached two
>> separate patches which individually fix that problem.
>>
>> The first detects the offline node before calling into the new_slab()
>> code path, once it becomes apparent that the allocation isn't
>> going to succeed. The second actually hardens node_zonelist() and
>> prepare_alloc_pages() against NODE_DATA(nid) being NULL, and hence the
>> node zonelist being unavailable. This latter case happens if the node
>> has never been initialized or is possibly out of range. There are other
>> places (NODE_DATA & node_online) which should be checking whether the
>> node ids are >= MAX_NUMNODES.
>>
> 
> What is it that leads to a caller requesting memory from an invalid
> node?  A race against offlining?  If so then that's a lack of
> appropriate locking, isn't it?

There were a couple of unrelated cases, both having to do with the _PXM 
associated with a PCI port. In the first case, AFAIK, the domain wasn't 
really invalid if the entire SRAT was parsed and nodes were created even 
when there weren't associated CPUs. The second case (a different machine) 
is simply a _PXM value that is completely invalid (no associated 
SLIT/SRAT/etc entries) due to firmware making a mistake when a socket 
isn't populated.

There have been a few other suggested or merged patches for the 
individual problems above; this set is just an attempt at avoiding a 
full crash if/when another similar problem happens.


> 
> I don't see a problem with emitting a warning and then selecting a
> different node so we can keep running.  But we do want that warning, so
> we can understand the root cause and fix it?

Yes, we do want to know when an invalid id is passed; I will add the 
VM_WARN in the first one.
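
Roughly like this (a sketch of the v2 hunk; exact form TBD):

		if (unlikely(!node_match(page, searchnode))) {
			stat(s, ALLOC_NODE_MISMATCH);
			deactivate_slab(s, page, c->freelist, c);
			if (unlikely(!node_online(searchnode))) {
				VM_WARN_ON(1);		/* flag the bogus nid in debug builds */
				node = NUMA_NO_NODE;	/* retry under the default policy */
			}
			goto new_slab;
		}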

The second one I wasn't sure about, as failing prepare_alloc_pages() 
generates a couple of error messages, but the system then continues 
operation.

I guess my question though is which method (or both/something else?) is 
the preferred way to harden this up?

Thanks for looking at this.



* Re: [RFC 0/2] harden alloc_pages against bogus nid
  2018-08-01 22:56   ` Jeremy Linton
@ 2018-08-02  0:14     ` Andrew Morton
  2018-08-03  3:15       ` Jeremy Linton
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2018-08-02  0:14 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-mm, cl, penberg, rientjes, iamjoonsoo.kim, mhocko, vbabka,
	Punit.Agrawal, Lorenzo.Pieralisi, linux-arm-kernel, bhelgaas,
	linux-kernel

On Wed, 1 Aug 2018 17:56:46 -0500 Jeremy Linton <jeremy.linton@arm.com> wrote:

> Hi,
> 
> On 08/01/2018 04:50 PM, Andrew Morton wrote:
> > On Wed,  1 Aug 2018 15:04:16 -0500 Jeremy Linton <jeremy.linton@arm.com> wrote:
> > 
> >> The thread "avoid alloc memory on offline node"
> >>
> >> https://lkml.org/lkml/2018/6/7/251
> >>
> >> asked at one point why kzalloc_node() was crashing rather than
> >> returning memory from a valid node. The thread ended up fixing
> >> the immediate causes of the crash, but left open the case of bad
> >> proximity values being in DSDT tables without corresponding
> >> SRAT/SLIT entries, as is happening on another machine.
> >>
> >> It's also easy to fix that, but we should also harden the allocator
> >> sufficiently that it doesn't crash when passed an invalid node id.
> >> There are a couple of possible ways to do this, and I've attached two
> >> separate patches which individually fix that problem.
> >>
> >> The first detects the offline node before calling into the new_slab()
> >> code path, once it becomes apparent that the allocation isn't
> >> going to succeed. The second actually hardens node_zonelist() and
> >> prepare_alloc_pages() against NODE_DATA(nid) being NULL, and hence the
> >> node zonelist being unavailable. This latter case happens if the node
> >> has never been initialized or is possibly out of range. There are other
> >> places (NODE_DATA & node_online) which should be checking whether the
> >> node ids are >= MAX_NUMNODES.
> >>
> > 
> > What is it that leads to a caller requesting memory from an invalid
> > node?  A race against offlining?  If so then that's a lack of
> > appropriate locking, isn't it?
> 
> There were a couple of unrelated cases, both having to do with the _PXM 
> associated with a PCI port. In the first case, AFAIK, the domain wasn't 
> really invalid if the entire SRAT was parsed and nodes were created even 
> when there weren't associated CPUs. The second case (a different machine) 
> is simply a _PXM value that is completely invalid (no associated 
> SLIT/SRAT/etc entries) due to firmware making a mistake when a socket 
> isn't populated.
> 
> There have been a few other suggested or merged patches for the 
> individual problems above; this set is just an attempt at avoiding a 
> full crash if/when another similar problem happens.

Please add the above info to the changelog.

> 
> > 
> > I don't see a problem with emitting a warning and then selecting a
> > different node so we can keep running.  But we do want that warning, so
> > we can understand the root cause and fix it?
> 
> Yes, we do want to know when an invalid id is passed; I will add the 
> VM_WARN in the first one.
> 
> The second one I wasn't sure about, as failing prepare_alloc_pages() 
> generates a couple of error messages, but the system then continues 
> operation.
> 
> I guess my question though is which method (or both/something else?) is 
> the preferred way to harden this up?

The first patch looked neater.  Can we get a WARN_ON in there as well?


* Re: [RFC 2/2] mm: harden alloc_pages code paths against bogus nodes
  2018-08-01 20:04 ` [RFC 2/2] mm: harden alloc_pages code paths against bogus nodes Jeremy Linton
@ 2018-08-02  7:31   ` Michal Hocko
  2018-08-03  3:17     ` Jeremy Linton
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2018-08-02  7:31 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-mm, cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	Punit.Agrawal, Lorenzo.Pieralisi, linux-arm-kernel, bhelgaas,
	linux-kernel

On Wed 01-08-18 15:04:18, Jeremy Linton wrote:
> It's possible to crash __alloc_pages_nodemask() by passing it
> bogus node ids. This is caused by NODE_DATA() returning NULL
> (hopefully) when the requested node is offline. We can
> harden against the basic case of a mostly-valid node that
> isn't online by checking for NULL and failing prepare_alloc_pages().
> 
> But this then suggests we should also harden NODE_DATA() like this
> 
> #define NODE_DATA(nid)         ( (nid) < MAX_NUMNODES ? node_data[(nid)] : NULL)
> 
> Eventually this starts to add a bunch of generally unneeded checks
> in some code paths that are called quite frequently.

But the page allocator is really a hot path and people will not be happy
to have yet another branch there. No code should really use invalid numa
node ids in the first place.

If I remember those bugs correctly then it was the arch code which was
doing something wrong. I would prefer that code to be fixed instead.

> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  include/linux/gfp.h | 2 ++
>  mm/page_alloc.c     | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index a6afcec53795..17d70271c42e 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -436,6 +436,8 @@ static inline int gfp_zonelist(gfp_t flags)
>   */
>  static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
>  {
> +	if (unlikely(!NODE_DATA(nid))) //VM_WARN_ON?
> +		return NULL;
>  	return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
>  }
>  
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a790ef4be74e..3a3d9ac2662a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4306,6 +4306,8 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
>  {
>  	ac->high_zoneidx = gfp_zone(gfp_mask);
>  	ac->zonelist = node_zonelist(preferred_nid, gfp_mask);
> +	if (!ac->zonelist)
> +		return false;
>  	ac->nodemask = nodemask;
>  	ac->migratetype = gfpflags_to_migratetype(gfp_mask);
>  
> -- 
> 2.14.3
> 

-- 
Michal Hocko
SUSE Labs


* Re: [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes
  2018-08-01 20:04 ` [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes Jeremy Linton
@ 2018-08-02  9:15   ` Michal Hocko
  2018-08-03  3:21     ` Jeremy Linton
  2018-08-02 14:23   ` Christopher Lameter
  1 sibling, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2018-08-02  9:15 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-mm, cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	Punit.Agrawal, Lorenzo.Pieralisi, linux-arm-kernel, bhelgaas,
	linux-kernel

On Wed 01-08-18 15:04:17, Jeremy Linton wrote:
[...]
> @@ -2519,6 +2519,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  		if (unlikely(!node_match(page, searchnode))) {
>  			stat(s, ALLOC_NODE_MISMATCH);
>  			deactivate_slab(s, page, c->freelist, c);
> +			if (!node_online(searchnode))
> +				node = NUMA_NO_NODE;
>  			goto new_slab;

This is inherently racy. A NUMA node can go offline at any point after
you check it here. Making it race-free would involve some sort of
locking, and I am not really convinced this is a good idea.

>  		}
>  	}
> -- 
> 2.14.3
> 

-- 
Michal Hocko
SUSE Labs


* Re: [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes
  2018-08-01 20:04 ` [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes Jeremy Linton
  2018-08-02  9:15   ` Michal Hocko
@ 2018-08-02 14:23   ` Christopher Lameter
  2018-08-03  3:12     ` Jeremy Linton
  1 sibling, 1 reply; 15+ messages in thread
From: Christopher Lameter @ 2018-08-02 14:23 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-mm, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko,
	vbabka, Punit.Agrawal, Lorenzo.Pieralisi, linux-arm-kernel,
	bhelgaas, linux-kernel

On Wed, 1 Aug 2018, Jeremy Linton wrote:

> diff --git a/mm/slub.c b/mm/slub.c
> index 51258eff4178..e03719bac1e2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2519,6 +2519,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  		if (unlikely(!node_match(page, searchnode))) {
>  			stat(s, ALLOC_NODE_MISMATCH);
>  			deactivate_slab(s, page, c->freelist, c);
> +			if (!node_online(searchnode))
> +				node = NUMA_NO_NODE;
>  			goto new_slab;
>  		}
>  	}
>

Would it not be better to implement this check in the page allocator?
There is also the issue of how to fall back to the nearest node.

NUMA_NO_NODE should fall back to the current memory allocation policy, but
it seems that by inserting it here you would end up just with the default
node for the processor.
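
For reference, the common fallback in include/linux/gfp.h looks roughly
like this (quoting from memory, so treat it as a sketch):

static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
						unsigned int order)
{
	/* With no node requested, prefer the node closest to this CPU. */
	if (nid == NUMA_NO_NODE)
		nid = numa_mem_id();

	return __alloc_pages_node(nid, gfp_mask, order);
}

i.e. on that path the late fallback just picks numa_mem_id() for the
current processor rather than consulting the task's mempolicy.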



* Re: [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes
  2018-08-02 14:23   ` Christopher Lameter
@ 2018-08-03  3:12     ` Jeremy Linton
  0 siblings, 0 replies; 15+ messages in thread
From: Jeremy Linton @ 2018-08-03  3:12 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: linux-mm, penberg, rientjes, iamjoonsoo.kim, akpm, mhocko,
	vbabka, Punit.Agrawal, Lorenzo.Pieralisi, linux-arm-kernel,
	bhelgaas, linux-kernel

Hi,

On 08/02/2018 09:23 AM, Christopher Lameter wrote:
> On Wed, 1 Aug 2018, Jeremy Linton wrote:
> 
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 51258eff4178..e03719bac1e2 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2519,6 +2519,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>   		if (unlikely(!node_match(page, searchnode))) {
>>   			stat(s, ALLOC_NODE_MISMATCH);
>>   			deactivate_slab(s, page, c->freelist, c);
>> +			if (!node_online(searchnode))
>> +				node = NUMA_NO_NODE;
>>   			goto new_slab;
>>   		}
>>   	}
>>
> 
> Would it not be better to implement this check in the page allocator?
> There is also the issue of how to fall back to the nearest node.

Possibly? Falling back to the nearest node, though, should be handled if 
memory-less node support is enabled, which in the problematic case it isn't.

> 
> NUMA_NO_NODE should fall back to the current memory allocation policy, but
> it seems that by inserting it here you would end up just with the default
> node for the processor.

I picked this spot (compared to 2/2) because a number of paths 
(kmalloc_node(), kmem_cache_alloc_node(), etc.) funnel through here, and 
in this case it shouldn't be a very hot path.


* Re: [RFC 0/2] harden alloc_pages against bogus nid
  2018-08-02  0:14     ` Andrew Morton
@ 2018-08-03  3:15       ` Jeremy Linton
  0 siblings, 0 replies; 15+ messages in thread
From: Jeremy Linton @ 2018-08-03  3:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, cl, penberg, rientjes, iamjoonsoo.kim, mhocko, vbabka,
	Punit.Agrawal, Lorenzo.Pieralisi, linux-arm-kernel, bhelgaas,
	linux-kernel

Hi,

On 08/01/2018 07:14 PM, Andrew Morton wrote:
> On Wed, 1 Aug 2018 17:56:46 -0500 Jeremy Linton <jeremy.linton@arm.com> wrote:
> 
>> Hi,
>>
>> On 08/01/2018 04:50 PM, Andrew Morton wrote:
>>> On Wed,  1 Aug 2018 15:04:16 -0500 Jeremy Linton <jeremy.linton@arm.com> wrote:
>>>
>>>> The thread "avoid alloc memory on offline node"
>>>>
>>>> https://lkml.org/lkml/2018/6/7/251
>>>>
>>>> asked at one point why kzalloc_node() was crashing rather than
>>>> returning memory from a valid node. The thread ended up fixing
>>>> the immediate causes of the crash, but left open the case of bad
>>>> proximity values being in DSDT tables without corresponding
>>>> SRAT/SLIT entries, as is happening on another machine.
>>>>
>>>> It's also easy to fix that, but we should also harden the allocator
>>>> sufficiently that it doesn't crash when passed an invalid node id.
>>>> There are a couple of possible ways to do this, and I've attached two
>>>> separate patches which individually fix that problem.
>>>>
>>>> The first detects the offline node before calling into the new_slab()
>>>> code path, once it becomes apparent that the allocation isn't
>>>> going to succeed. The second actually hardens node_zonelist() and
>>>> prepare_alloc_pages() against NODE_DATA(nid) being NULL, and hence the
>>>> node zonelist being unavailable. This latter case happens if the node
>>>> has never been initialized or is possibly out of range. There are other
>>>> places (NODE_DATA & node_online) which should be checking whether the
>>>> node ids are >= MAX_NUMNODES.
>>>>
>>>
>>> What is it that leads to a caller requesting memory from an invalid
>>> node?  A race against offlining?  If so then that's a lack of
>>> appropriate locking, isn't it?
>>
>> There were a couple of unrelated cases, both having to do with the _PXM
>> associated with a PCI port. In the first case, AFAIK, the domain wasn't
>> really invalid if the entire SRAT was parsed and nodes were created even
>> when there weren't associated CPUs. The second case (a different machine)
>> is simply a _PXM value that is completely invalid (no associated
>> SLIT/SRAT/etc entries) due to firmware making a mistake when a socket
>> isn't populated.
>>
>> There have been a few other suggested or merged patches for the
>> individual problems above; this set is just an attempt at avoiding a
>> full crash if/when another similar problem happens.
> 
> Please add the above info to the changelog.

Sure.

> 
>>
>>>
>>> I don't see a problem with emitting a warning and then selecting a
>>> different node so we can keep running.  But we do want that warning, so
>>> we can understand the root cause and fix it?
>>
>> Yes, we do want to know when an invalid id is passed; I will add the
>> VM_WARN in the first one.
>>
>> The second one I wasn't sure about, as failing prepare_alloc_pages()
>> generates a couple of error messages, but the system then continues
>> operation.
>>
>> I guess my question though is which method (or both/something else?) is
>> the preferred way to harden this up?
> 
> The first patch looked neater.  Can we get a WARN_ON in there as well?
> 

Yes,

Thanks,


* Re: [RFC 2/2] mm: harden alloc_pages code paths against bogus nodes
  2018-08-02  7:31   ` Michal Hocko
@ 2018-08-03  3:17     ` Jeremy Linton
  2018-08-03  6:24       ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: Jeremy Linton @ 2018-08-03  3:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	Punit.Agrawal, Lorenzo.Pieralisi, linux-arm-kernel, bhelgaas,
	linux-kernel

Hi,

On 08/02/2018 02:31 AM, Michal Hocko wrote:
> On Wed 01-08-18 15:04:18, Jeremy Linton wrote:
>> It's possible to crash __alloc_pages_nodemask() by passing it
>> bogus node ids. This is caused by NODE_DATA() returning NULL
>> (hopefully) when the requested node is offline. We can
>> harden against the basic case of a mostly-valid node that
>> isn't online by checking for NULL and failing prepare_alloc_pages().
>>
>> But this then suggests we should also harden NODE_DATA() like this
>>
>> #define NODE_DATA(nid)         ( (nid) < MAX_NUMNODES ? node_data[(nid)] : NULL)
>>
>> Eventually this starts to add a bunch of generally unneeded checks
>> in some code paths that are called quite frequently.
> 
> But the page allocator is really a hot path and people will not be happy
> to have yet another branch there. No code should really use invalid numa
> node ids in the first place.
> 
> If I remember those bugs correctly then it was the arch code which was
> doing something wrong. I would prefer that code to be fixed instead.

Yes, I think the consensus is that 2/2 should be dropped.

The arch code is being fixed (in both cases); this patch set is just an 
attempt to harden this code path against future failures like that, so 
that we get some warnings/ugly messages rather than early boot failures.

Thanks,



>> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
>> ---
>>   include/linux/gfp.h | 2 ++
>>   mm/page_alloc.c     | 2 ++
>>   2 files changed, 4 insertions(+)
>>
>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>> index a6afcec53795..17d70271c42e 100644
>> --- a/include/linux/gfp.h
>> +++ b/include/linux/gfp.h
>> @@ -436,6 +436,8 @@ static inline int gfp_zonelist(gfp_t flags)
>>    */
>>   static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
>>   {
>> +	if (unlikely(!NODE_DATA(nid))) //VM_WARN_ON?
>> +		return NULL;
>>   	return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
>>   }
>>   
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index a790ef4be74e..3a3d9ac2662a 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4306,6 +4306,8 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
>>   {
>>   	ac->high_zoneidx = gfp_zone(gfp_mask);
>>   	ac->zonelist = node_zonelist(preferred_nid, gfp_mask);
>> +	if (!ac->zonelist)
>> +		return false;
>>   	ac->nodemask = nodemask;
>>   	ac->migratetype = gfpflags_to_migratetype(gfp_mask);
>>   
>> -- 
>> 2.14.3
>>
> 



* Re: [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes
  2018-08-02  9:15   ` Michal Hocko
@ 2018-08-03  3:21     ` Jeremy Linton
  2018-08-03  6:20       ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: Jeremy Linton @ 2018-08-03  3:21 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	Punit.Agrawal, Lorenzo.Pieralisi, linux-arm-kernel, bhelgaas,
	linux-kernel

Hi,

On 08/02/2018 04:15 AM, Michal Hocko wrote:
> On Wed 01-08-18 15:04:17, Jeremy Linton wrote:
> [...]
>> @@ -2519,6 +2519,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>   		if (unlikely(!node_match(page, searchnode))) {
>>   			stat(s, ALLOC_NODE_MISMATCH);
>>   			deactivate_slab(s, page, c->freelist, c);
>> +			if (!node_online(searchnode))
>> +				node = NUMA_NO_NODE;
>>   			goto new_slab;
> 
> This is inherently racy. Numa node can get offline at any point after
> you check it here. Making it race free would involve some sort of
> locking and I am not really convinced this is a good idea.

I spent some time looking/thinking about this, and I'm pretty sure it's 
not creating any new problems. But OTOH, I think the node_online() check 
is probably a bit misleading, as what we really want to assure is that 
node < MAX_NUMNODES and that there is going to be a valid entry in 
NODE_DATA() so we don't dereference NULL.
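
Maybe something like this hypothetical helper (illustration only, I'm
not proposing the name):

static inline bool node_has_pgdat(int nid)
{
	/* What we actually need: a sane index and a non-NULL pgdat. */
	return nid >= 0 && nid < MAX_NUMNODES && NODE_DATA(nid);
}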


> 
>>   		}
>>   	}
>> -- 
>> 2.14.3
>>
> 



* Re: [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes
  2018-08-03  3:21     ` Jeremy Linton
@ 2018-08-03  6:20       ` Michal Hocko
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2018-08-03  6:20 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-mm, cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	Punit.Agrawal, Lorenzo.Pieralisi, linux-arm-kernel, bhelgaas,
	linux-kernel

On Thu 02-08-18 22:21:53, Jeremy Linton wrote:
> Hi,
> 
> On 08/02/2018 04:15 AM, Michal Hocko wrote:
> > On Wed 01-08-18 15:04:17, Jeremy Linton wrote:
> > [...]
> > > @@ -2519,6 +2519,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> > >   		if (unlikely(!node_match(page, searchnode))) {
> > >   			stat(s, ALLOC_NODE_MISMATCH);
> > >   			deactivate_slab(s, page, c->freelist, c);
> > > +			if (!node_online(searchnode))
> > > +				node = NUMA_NO_NODE;
> > >   			goto new_slab;
> > 
> > This is inherently racy. A NUMA node can go offline at any point after
> > you check it here. Making it race-free would involve some sort of
> > locking, and I am not really convinced this is a good idea.
> 
> I spent some time looking/thinking about this, and I'm pretty sure it's
> not creating any new problems. But OTOH, I think the node_online() check
> is probably a bit misleading, as what we really want to assure is that
> node < MAX_NUMNODES and that there is going to be a valid entry in
> NODE_DATA() so we don't dereference NULL.

Exactly. And we do rely on the user of the allocator not passing bogus
parameters. This is not a function to be used for untrusted or
unsanitized inputs.

-- 
Michal Hocko
SUSE Labs


* Re: [RFC 2/2] mm: harden alloc_pages code paths against bogus nodes
  2018-08-03  3:17     ` Jeremy Linton
@ 2018-08-03  6:24       ` Michal Hocko
  0 siblings, 0 replies; 15+ messages in thread
From: Michal Hocko @ 2018-08-03  6:24 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-mm, cl, penberg, rientjes, iamjoonsoo.kim, akpm, vbabka,
	Punit.Agrawal, Lorenzo.Pieralisi, linux-arm-kernel, bhelgaas,
	linux-kernel

On Thu 02-08-18 22:17:49, Jeremy Linton wrote:
> Hi,
> 
> On 08/02/2018 02:31 AM, Michal Hocko wrote:
> > On Wed 01-08-18 15:04:18, Jeremy Linton wrote:
> > > It's possible to crash __alloc_pages_nodemask() by passing it
> > > bogus node ids. This is caused by NODE_DATA() returning NULL
> > > (hopefully) when the requested node is offline. We can
> > > harden against the basic case of a mostly-valid node that
> > > isn't online by checking for NULL and failing prepare_alloc_pages().
> > > 
> > > But this then suggests we should also harden NODE_DATA() like this
> > > 
> > > #define NODE_DATA(nid)         ( (nid) < MAX_NUMNODES ? node_data[(nid)] : NULL)
> > > 
> > > Eventually this starts to add a bunch of generally unneeded checks
> > > in some code paths that are called quite frequently.
> > 
> > But the page allocator is really a hot path and people will not be happy
> > to have yet another branch there. No code should really use invalid numa
> > node ids in the first place.
> > 
> > If I remember those bugs correctly then it was the arch code which was
> > doing something wrong. I would prefer that code to be fixed instead.
> 
> Yes, I think the consensus is that 2/2 should be dropped.
> 
> The arch code is being fixed (in both cases); this patch set is just an
> attempt to harden this code path against future failures like that, so
> that we get some warnings/ugly messages rather than early boot failures.

Hmm, this is a completely different story. We do have VM_{BUG,WARN}_ON,
which are no-ops for most configurations. They are primarily meant to be
enabled for developers or special debug kernels. If you have an example
where such an early splat in the log would save a lot of head scratching,
then that would sound like a reasonable justification to add
	VM_WARN_ON(!NODE_DATA(nid))
into the page allocator, methinks. But considering that we should
get a NULL ptr splat anyway, I am not really so sure. But maybe we are
in a context where the warning would get into the log while a blow-up would
just make the whole machine silent...
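
I.e. something like this sketch in node_zonelist(), keeping the
dereference so the failure mode itself is unchanged:

static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
{
	/*
	 * No-op unless CONFIG_DEBUG_VM; gets a splat into the log just
	 * before the NULL dereference below takes the machine down.
	 */
	VM_WARN_ON(!NODE_DATA(nid));
	return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
}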
-- 
Michal Hocko
SUSE Labs
