Re: [PATCH V2] x86: Fix an issue with invalid ACPI NUMA config

From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: <linux-pci@vger.kernel.org>, <x86@kernel.org>,
	<helgaas@kernel.org>, <linuxarm@huawei.com>,
	Ingo Molnar <mingo@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>, <martin@geanix.com>
Subject: Re: [PATCH V2] x86: Fix an issue with invalid ACPI NUMA config
Date: Wed, 12 Dec 2018 09:39:14 +0000	[thread overview]
Message-ID: <20181212093914.00002aed@huawei.com> (raw)
In-Reply-To: <a5a938d3-ecc9-028a-3b28-610feda8f3f8@intel.com>

On Tue, 11 Dec 2018 10:19:49 -0800
Dave Hansen <dave.hansen@intel.com> wrote:

> On 12/11/18 1:47 AM, Jonathan Cameron wrote:
> > When the PCI code later comes along and calls acpi_get_node() for any PCI
> > card below the root port, it navigates up the ACPI tree until it finds the
> > _PXM value in the root port. This value is then passed to
> > acpi_map_pxm_to_node().
> > 
> > As numa_off has not been set on x86 it tries to allocate a NUMA node, from
> > the unused set, without setting up all the infrastructure that would
> > normally accompany such a call.   
> 
> FWIW, this _sounds_ like the real problem here.  We're allowing an
> allocation to proceed without some infrastructure that we require.
> Shouldn't we be detecting that this infrastructure is not in place and
> warn about *it* at least?
> 
> I'm a bit worried that this is just papering over an unknown error to
> make a hang go away.  It seems a bit too far away from the root cause.

I'm not totally convinced.  We are warning about it on the two lines just off the
top of this patch.

"No NUMA configuration found"
"Faking a node at [mem....]"

We are falling back to the exact same code paths as if you had deliberately
turned off NUMA at the command line with messages stating that is the case.
That approach seems to be safe and is consistent.

Now there is a potential corner here where I agree with you that it may
make sense to 'also' add protections in the acpi_map_pxm_to_node() path
which is that where we do have a valid NUMA configuration and along comes
a new device with a node outside of those that are defined,
(note there is a change coming in next ACPI precisely to work around a case
that causes this to validly happen when the OS sees some new features and
doesn't know what to do with them - it still relies on the ACPI tables
having the right magic in them though for the fallback to work - more
on that when the spec is out...).

One option would be to (in addition to this patch) add a new version of
acpi_get_node that will only give you a node that actually exists
and an error otherwise allowing code to fallback to to NO_NODE.

Other than the error we might be able to use acpi_map_pxm_to_online_node
for this, or call both acpi_map_pxm_to_node and acpi_map_pxm_to_online_node
and compare the answers to verify we are getting the node we want?

Jonathan