linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeremy Linton <jeremy.linton@arm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, cl@linux.com, penberg@kernel.org,
	rientjes@google.com, iamjoonsoo.kim@lge.com, mhocko@suse.com,
	vbabka@suse.cz, Punit.Agrawal@arm.com, Lorenzo.Pieralisi@arm.com,
	linux-arm-kernel@lists.infradead.org, bhelgaas@google.com,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC 0/2] harden alloc_pages against bogus nid
Date: Thu, 2 Aug 2018 22:15:10 -0500	[thread overview]
Message-ID: <a38071b1-6068-c5ca-d408-736bdb8e7073@arm.com> (raw)
In-Reply-To: <20180801171414.30e54a106733ccaaa566388d@linux-foundation.org>

Hi,

On 08/01/2018 07:14 PM, Andrew Morton wrote:
> On Wed, 1 Aug 2018 17:56:46 -0500 Jeremy Linton <jeremy.linton@arm.com> wrote:
> 
>> Hi,
>>
>> On 08/01/2018 04:50 PM, Andrew Morton wrote:
>>> On Wed,  1 Aug 2018 15:04:16 -0500 Jeremy Linton <jeremy.linton@arm.com> wrote:
>>>
>>>> The thread "avoid alloc memory on offline node"
>>>>
>>>> https://lkml.org/lkml/2018/6/7/251
>>>>
>>>> Asked at one point why the kzalloc_node was crashing rather than
>>>> returning memory from a valid node. The thread ended up fixing
>>>> the immediate causes of the crash but left open the case of bad
>>>> proximity values being in DSDT tables without corrisponding
>>>> SRAT/SLIT entries as is happening on another machine.
>>>>
>>>> Its also easy to fix that, but we should also harden the allocator
>>>> sufficiently that it doesn't crash when passed an invalid node id.
>>>> There are a couple possible ways to do this, and i've attached two
>>>> separate patches which individually fix that problem.
>>>>
>>>> The first detects the offline node before calling
>>>> the new_slab code path when it becomes apparent that the allocation isn't
>>>> going to succeed. The second actually hardens node_zonelist() and
>>>> prepare_alloc_pages() in the face of NODE_DATA(nid) returning a NULL
>>>> zonelist. This latter case happens if the node has never been initialized
>>>> or is possibly out of range. There are other places (NODE_DATA &
>>>> online_node) which should be checking if the node id's are > MAX_NUMNODES.
>>>>
>>>
>>> What is it that leads to a caller requesting memory from an invalid
>>> node?  A race against offlining?  If so then that's a lack of
>>> appropriate locking, isn't it?
>>
>> There were a couple unrelated cases, both having to do with the PXN
>> associated with a PCI port. The first case AFAIK, the domain wasn't
>> really invalid if the entire SRAT was parsed and nodes created even when
>> there weren't associated CPUs. The second case (a different machine) is
>> simply a PXN value that is completely invalid (no associated
>> SLIT/SRAT/etc entries) due to firmware making a mistake when a socket
>> isn't populated.
>>
>> There have been a few other suggested or merged patches for the
>> individual problems above, this set is just an attempt at avoiding a
>> full crash if/when another similar problem happens.
> 
> Please add the above info to the changelog.

Sure.

> 
>>
>>>
>>> I don't see a problem with emitting a warning and then selecting a
>>> different node so we can keep running.  But we do want that warning, so
>>> we can understand the root cause and fix it?
>>
>> Yes, we do want to know when an invalid id is passed, i will add the
>> VM_WARN in the first one.
>>
>> The second one I wasn't sure about as failing prepare_alloc_pages()
>> generates a couple of error messages, but the system then continues
>> operation.
>>
>> I guess my question though is which method (or both/something else?) is
>> the preferred way to harden this up?
> 
> The first patch looked neater.  Can we get a WARN_ON in there as well?
> 

Yes,

Thanks,

      reply	other threads:[~2018-08-03  3:15 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-01 20:04 [RFC 0/2] harden alloc_pages against bogus nid Jeremy Linton
2018-08-01 20:04 ` [RFC 1/2] slub: Avoid trying to allocate memory on offline nodes Jeremy Linton
2018-08-02  9:15   ` Michal Hocko
2018-08-03  3:21     ` Jeremy Linton
2018-08-03  6:20       ` Michal Hocko
2018-08-02 14:23   ` Christopher Lameter
2018-08-03  3:12     ` Jeremy Linton
2018-08-01 20:04 ` [RFC 2/2] mm: harden alloc_pages code paths against bogus nodes Jeremy Linton
2018-08-02  7:31   ` Michal Hocko
2018-08-03  3:17     ` Jeremy Linton
2018-08-03  6:24       ` Michal Hocko
2018-08-01 21:50 ` [RFC 0/2] harden alloc_pages against bogus nid Andrew Morton
2018-08-01 22:56   ` Jeremy Linton
2018-08-02  0:14     ` Andrew Morton
2018-08-03  3:15       ` Jeremy Linton [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a38071b1-6068-c5ca-d408-736bdb8e7073@arm.com \
    --to=jeremy.linton@arm.com \
    --cc=Lorenzo.Pieralisi@arm.com \
    --cc=Punit.Agrawal@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhelgaas@google.com \
    --cc=cl@linux.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).