All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	mhocko@kernel.org, peterz@infradead.org, robin.murphy@arm.com,
	geert@linux-m68k.org, gregkh@linuxfoundation.org,
	paul.burton@mips.com
Subject: Re: [PATCH] PCI: Warn about host bridge device when its numa node is NO_NODE
Date: Tue, 22 Oct 2019 16:04:20 -0500	[thread overview]
Message-ID: <20191022210420.GA17717@google.com> (raw)
In-Reply-To: <1571467543-26125-1-git-send-email-linyunsheng@huawei.com>

On Sat, Oct 19, 2019 at 02:45:43PM +0800, Yunsheng Lin wrote:
> As the disscusion in [1]:

We need to justify this patch right here in the commit log, not with a
pointer to a 50+ message email thread.

> A PCI device really _MUST_ have a node assigned. 

No, it's not really essential.  It's *nice* if we know the node
closest to a PCI device, but the system should function correctly even
if we don't.  The only problem is that it will be slower.

I think the underlying problem you're addressing is that:

  - NUMA_NO_NODE == -1,
  - dev_to_node(dev) may return NUMA_NO_NODE,
  - kmalloc(dev) relies on cpumask_of_node(dev_to_node(dev)), and
  - cpumask_of_node(NUMA_NO_NODE) makes an invalid array reference

For example, on arm64, mips loongson, s390, and x86,
cpumask_of_node(node) returns "node_to_cpumask_map[node]", and -1 is
an invalid array index.

That problem can't be solved by emitting a warning, of course.  I
assume some variation of your "numa: make node_to_cpumask_map()
NUMA_NO_NODE aware" patch [a] will solve that problem.

[a] https://lore.kernel.org/linux-mips/1568535656-158979-1-git-send-email-linyunsheng@huawei.com/T/#u

It is probably a good idea to emit a warning about the performance
issue.

When I run your patch on qemu, I see this:

  ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
  acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
  acpi PNP0A08:00: _OSC: platform does not support [LTR]
  acpi PNP0A08:00: _OSC: OS now controls [PME AER PCIeCapability]
  PCI host bridge to bus 0000:00
  pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
  pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
  pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
  pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff window]
  pci_bus 0000:00: root bus resource [mem 0x100000000-0x8ffffffff window]
  pci_bus 0000:00: root bus resource [bus 00-ff]
   pci0000:00: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.

I didn't debug it to see what's wrong with the " pci0000:00" string.
Ideally it would be connected with "acpi PNP0A08:00" since that's the
place where BIOS would make a fix but I suppose "pci_bus 0000:00"
would be adequate.

> It is possible to
> have a PCI bridge shared between two nodes, such that the PCI
> devices have equidistance. But the moment you scale this out, you
> either get devices that are 'local' to a package while having
> multiple packages, or if you maintain a single bridge in a big
> system, things become so slow it all doesn't matter anyway.
> Assigning a node (one of the shared) is, in the generic ase of
> multiple packages, the better solution over assigning all nodes.
> 
> As pci_device_add() will assign the pci device' node according to
> the bus the device is on, which is decided by pcibus_to_node().
> Currently different arch may implement the pcibus_to_node() based
> on bus->sysdata or bus device' node, which has the same node as
> the bridge device.
>
> And for devices behind another bridge case, the child bus device
> is setup with proper parent bus device and inherit its parent'
> sysdata in pci_alloc_child_bus(), so the pcie device under the
> child bus should have the same node as the parent bridge when
> device_add() is called, which will set the node to its parent's
> node when the child device' node is NUMA_NO_NODE.
> 
> So this patch only warns about the case when a host bridge device
> is registered with a node of NO_NODE in pci_register_host_bridge().
> And it only warns about that when there are more than one numa
> nodes in the system.


> [1] https://lore.kernel.org/lkml/1568724534-146242-1-git-send-email-linyunsheng@huawei.com/
> 
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>  drivers/pci/probe.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 3d5271a..22be96a 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -927,6 +927,9 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
>  	list_add_tail(&bus->node, &pci_root_buses);
>  	up_write(&pci_bus_sem);
>  
> +	if (nr_node_ids > 1 && dev_to_node(bus->bridge) == NUMA_NO_NODE)
> +		dev_err(bus->bridge, FW_BUG "No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.\n");
> +
>  	return 0;
>  
>  unregister:

  parent reply	other threads:[~2019-10-22 21:04 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-19  6:45 [PATCH] PCI: Warn about host bridge device when its numa node is NO_NODE Yunsheng Lin
2019-10-19  8:34 ` Christoph Hellwig
2019-10-21  4:05   ` Yunsheng Lin
2019-10-22 13:55     ` Robin Murphy
2019-10-23  8:24       ` Yunsheng Lin
2019-10-22 21:04 ` Bjorn Helgaas [this message]
2019-10-23  8:22   ` Yunsheng Lin
2019-10-23 17:10     ` Bjorn Helgaas
2019-10-24  9:20       ` Michal Hocko
2019-10-24 17:40         ` Bjorn Helgaas
2019-10-25  8:16           ` Michal Hocko
2019-10-25  8:51             ` Yunsheng Lin
2019-10-24  9:39       ` Yunsheng Lin
2019-10-24 10:16       ` Robin Murphy
2019-10-25 12:51         ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191022210420.GA17717@google.com \
    --to=helgaas@kernel.org \
    --cc=geert@linux-m68k.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linyunsheng@huawei.com \
    --cc=mhocko@kernel.org \
    --cc=paul.burton@mips.com \
    --cc=peterz@infradead.org \
    --cc=robin.murphy@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.