All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: <helgaas@kernel.org>, <linux-pci@vger.kernel.org>,
	<linuxarm@huawei.com>, Ingo Molnar <mingo@kernel.org>,
	<linux-acpi@vger.kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>, <x86@kernel.org>
Subject: Re: [PATCH] x86: Fix an issue with invalid ACPI numa config
Date: Tue, 20 Nov 2018 13:19:35 +0000	[thread overview]
Message-ID: <20181120131935.00001f3a@huawei.com> (raw)
In-Reply-To: <20181120120105.GI2131@hirez.programming.kicks-ass.net>

On Tue, 20 Nov 2018 13:01:05 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Nov 15, 2018 at 11:06:17AM +0000, Jonathan Cameron wrote:
> > The addition of support to read the numa node for a PCI
> > card specified by _PXM resulted in Martin's system not
> > booting.   Looking at the ACPI tables it seems that there
> > are PXM entries for the root ports, but no SRAT table.
> > 
> > The absence of SRAT table results in dummy_numa_init being
> > called.  However, unlike on arm64, this doesn't then result
> > in numa_off being set.  When the PCI code later comes along
> > and calls acpi_get_node for any PCI card below the root port,
> > it navigates up the ACPI tree until it finds the PXM value in
> > the root port. This value is then passed to
> > acpi_map_pxm_to_node.  If numa_off is set this returns,
> > NUMA_NO_NODE (as it does on arm64), on x86 it instead tries
> > to allocate a numa node from the unused set without setting
> > up all the infrastructure that would normally accompany such
> > a call.  We have not identified exactly which driver is
> > causing the subsequent hang for Martin.
> > 
> > It is invalid under the ACPI spec to specify new
> > numa nodes using PXM if they have no presence in SRAT.
> > Thus the simplest fix is to set numa_off when it is off due
> > to an invalid SRAT (here not present at all).
> > 
> > I do not have easy access to appropriate x86 numa systems so
> > would appreciate some testing of this one!
> > 
> > Known problem boards setups:
> > 
> > AMD Ryzen Threadripper 2950X on ASROCK X399 TAICHI
> > MSI X399 SLI PLUS (probably - not confirmed yet)
> > 
> > The PCI patch has been reverted, so this fix is not critical.
> > 
> > Reported-by: Martin Hundeb?ll <martin@geanix.com>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Fixes: bad7dcd94f39 ("ACPI/PCI: Pay attention to device-specific _PXM node values")
> > 
> > ---
> >  arch/x86/mm/numa.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> > index 1308f5408bf7..ce1182f953ff 100644
> > --- a/arch/x86/mm/numa.c
> > +++ b/arch/x86/mm/numa.c
> > @@ -695,6 +695,8 @@ static int __init dummy_numa_init(void)
> >  	node_set(0, numa_nodes_parsed);
> >  	numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
> >  
> > +	numa_off = true;  
> 
> Should we not:
> 
> 	pr_err(FW_BUG "Invalid SRAT table.\n");
> 
> or something along those lines?
> 
> We should take every possibility to call out broken and non-compliant
> firmware.

While I agree we should definitely be calling this out nice and loud,
not having an SRAT isn't indicating a broken firmware as SRAT is optional
in ACPI.  The breakage only occurs much later when the DSDT contains a
PXM entry that doesn't correspond to any entries in SRAT.

To report at that point, we would need some background info stashed
on why numa_off was set.  It could be off because we set it so via
the kernel command line.

If people are happy with this general direction I'm happy to spin a
patch to do that state tracking and pr_err as a follow up.

Thanks,

Jonathan


  reply	other threads:[~2018-11-20 13:20 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-15 11:06 [PATCH] x86: Fix an issue with invalid ACPI numa config Jonathan Cameron
2018-11-15 11:09 ` Jonathan Cameron
2018-11-20 12:01 ` Peter Zijlstra
2018-11-20 13:19   ` Jonathan Cameron [this message]
2018-12-03 10:15     ` Jonathan Cameron
2018-12-10 23:56 ` Bjorn Helgaas
2018-12-11  9:47   ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181120131935.00001f3a@huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=helgaas@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.