LinuxPPC-Dev Archive on lore.kernel.org
From: Yunsheng Lin <linyunsheng@huawei.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: dalias@libc.org, linux-sh@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	catalin.marinas@arm.com, dave.hansen@linux.intel.com,
	heiko.carstens@de.ibm.com, jiaxun.yang@flygoat.com,
	Michal Hocko <mhocko@kernel.org>,
	mwb@linux.vnet.ibm.com, paulus@samba.org, hpa@zytor.com,
	sparclinux@vger.kernel.org, chenhc@lemote.com, will@kernel.org,
	cai@lca.pw, linux-s390@vger.kernel.org,
	ysato@users.sourceforge.jp, linux-acpi@vger.kernel.org,
	x86@kernel.org, rppt@linux.ibm.com, borntraeger@de.ibm.com,
	dledford@redhat.com, mingo@redhat.com,
	jeffrey.t.kirsher@intel.com, jhogan@kernel.org,
	mattst88@gmail.com, linux-mips@vger.kernel.org, lenb@kernel.org,
	len.brown@intel.com, gor@linux.ibm.com,
	anshuman.khandual@arm.com, bp@alien8.de, luto@kernel.org,
	bhelgaas@google.com, tglx@linutronix.de,
	naveen.n.rao@linux.vnet.ibm.com,
	linux-arm-kernel@lists.infradead.org, rth@twiddle.net,
	axboe@kernel.dk, linux-pci@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	linux-kernel@vger.kernel.org, ralf@linux-mips.org,
	tbogendoerfer@suse.de, paul.burton@mips.com,
	linux-alpha@vger.kernel.org, rafael@kernel.org,
	ink@jurassic.park.msu.ru, akpm@linux-foundation.org,
	Robin Murphy <robin.murphy@arm.com>,
	davem@davemloft.net
Subject: Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
Date: Tue, 15 Oct 2019 18:40:29 +0800
Message-ID: <34450edf-2249-ee7a-fc83-f4a923f75989@huawei.com>
In-Reply-To: <20191014092509.GA3050088@kroah.com>

On 2019/10/14 17:25, Greg KH wrote:
> On Mon, Oct 14, 2019 at 04:00:46PM +0800, Yunsheng Lin wrote:
>> On 2019/10/12 18:47, Greg KH wrote:
>>> On Sat, Oct 12, 2019 at 12:40:01PM +0200, Greg KH wrote:
>>>> On Sat, Oct 12, 2019 at 05:47:56PM +0800, Yunsheng Lin wrote:
>>>>> On 2019/10/12 15:40, Greg KH wrote:
>>>>>> On Sat, Oct 12, 2019 at 02:17:26PM +0800, Yunsheng Lin wrote:
>>>>>>> add pci and acpi maintainer
>>>>>>> cc linux-pci@vger.kernel.org and linux-acpi@vger.kernel.org
>>>>>>>
>>>>>>> On 2019/10/11 19:15, Peter Zijlstra wrote:
>>>>>>>> On Fri, Oct 11, 2019 at 11:27:54AM +0800, Yunsheng Lin wrote:
>>>>>>>>> But I failed to see why the above is related to making node_to_cpumask_map()
>>>>>>>>> NUMA_NO_NODE aware?
>>>>>>>>
>>>>>>>> Your initial bug is for hns3, which is a PCI device, which really _MUST_
>>>>>>>> have a node assigned.
>>>>>>>>
>>>>>>>> It not having one, is a straight up bug. We must not silently accept
>>>>>>>> NO_NODE there, ever.
>>>>>>>>
>>>>>>>
>>>>>>> I suppose that by "not silently accept NO_NODE" you mean reporting a
>>>>>>> lack of affinity when the node of a PCIe device is not set.
>>>>>>
>>>>>> If the firmware of a pci device does not provide the node information,
>>>>>> then yes, warn about that.
>>>>>>
>>>>>>> As Greg has asked about in [1]:
>>>>>>> what is a user to do when the user sees the kernel reporting that?
>>>>>>>
>>>>>>> We may tell users to contact their vendor for info or updates when
>>>>>>> they do not know their system well enough, but the vendor may get
>>>>>>> away with this by quoting the ACPI spec, which considers this
>>>>>>> optional. Should the user believe this is indeed a firmware bug, or
>>>>>>> a misreport from the kernel?
>>>>>>
>>>>>> Say it is a firmware bug, if it is a firmware bug, that's simple.
>>>>>>
>>>>>>> If this kind of reporting is common practice and will not cause any
>>>>>>> misunderstanding, then maybe we can report that.
>>>>>>
>>>>>> Yes, please do so, that's the only way those boxes are ever going to get
>>>>>> fixed.  And go add the test to the "firmware testing" tool that is based
>>>>>> on Linux that Intel has somewhere, to give vendors a chance to fix this
>>>>>> before they ship hardware.
>>>>>>
>>>>>> This shouldn't be a big deal, we warn of other hardware bugs all the
>>>>>> time.
>>>>>
>>>>> Ok, thanks for clarifying.
>>>>>
>>>>> Will send a patch to catch the case where a PCIe device has no NUMA node
>>>>> set, and warn about it.
>>>>>
>>>>> Maybe use dev->bus to verify if it is a pci device?
>>>>
>>>> No, do that in the pci bus core code itself, when creating the devices
>>>> as that is when you know, or do not know, the numa node, right?
>>>>
>>>> This can't be in the driver core only, as each bus type will have a
>>>> different way of determining what the node the device is on.  For some
>>>> reason, I thought the PCI core code already does this, right?
>>>
>>> Yes, pci_irq_get_node(), which NO ONE CALLS!  I should go delete that
>>> thing...
>>>
>>> Anyway, it looks like the pci core code does call set_dev_node() based
>>> on the PCI bridge, so if that is set up properly, all should be fine.
>>>
>>> If not, well, you have buggy firmware and you need to warn about that at
>>> the time you are creating the bridge.  Look at the call to
>>> pcibus_to_node() in pci_register_host_bridge().
>>
>> Thanks for pointing out the specific function.
>> Maybe we do not need to warn about the case when the device has a parent,
>> because we must have warned about the parent if the device has a parent
>> and the parent also has a node of NO_NODE, so we do not need to warn about
>> the child device anymore? Like below:
>>
>> @@ -932,6 +932,10 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
>>         list_add_tail(&bus->node, &pci_root_buses);
>>         up_write(&pci_bus_sem);
>>
>> +       if (nr_node_ids > 1 && !parent &&
> 
> Why do you need to check this?  If you have a parent, your node
> should be set, if not, that's an error, right?

If the device has a parent and the parent device also has a node of
NUMA_NO_NODE, then we have presumably already warned about the parent
device, so we do not have to warn about the child device?
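To make the proposed condition concrete, here is a minimal userspace sketch of it (not kernel code: host_bridge_needs_fw_warning() is a hypothetical name, and NUMA_NO_NODE is redefined locally with the kernel's value of -1). It warns only on multi-node systems, only for a bridge without a parent, and only when no node was assigned:

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1) /* same value the kernel uses */

/*
 * Hypothetical model of the check proposed for pci_register_host_bridge().
 * A bridge with a parent inherits the parent's node, so a missing node
 * would already have been reported at the parent.
 */
static bool host_bridge_needs_fw_warning(int nr_node_ids, bool has_parent,
					 int bridge_node)
{
	assert(nr_node_ids >= 1);

	if (nr_node_ids <= 1)	/* single-node system: affinity is moot */
		return false;
	if (has_parent)		/* reported (if at all) on the parent */
		return false;
	return bridge_node == NUMA_NO_NODE;
}
```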

In pci_register_host_bridge():

	if (!parent)
		set_dev_node(bus->bridge, pcibus_to_node(bus));

The above only sets the node of the bridge device to the node of the bus
if the bridge device does not have a parent.

	bus->dev.parent = bus->bridge;

	dev_set_name(&bus->dev, "%04x:%02x", pci_domain_nr(bus), bus->number);
	name = dev_name(&bus->dev);

	err = device_register(&bus->dev);

The above then sets the bus device's parent to the bridge device and
calls device_register(), which sets the bus device's node from the
bridge device's node (when the bus device has no node of its own).
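A simplified model of that inheritance, assuming it follows the parent-node fallback in device_add() (which device_register() calls): a device without a node of its own takes its parent's node, so a NUMA_NO_NODE host bridge passes NUMA_NO_NODE down to the bus device. struct model_device and model_device_register() are hypothetical stand-ins, not kernel API:

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1) /* same value the kernel uses */

/* Hypothetical, heavily simplified stand-in for struct device. */
struct model_device {
	struct model_device *parent;
	int numa_node;
};

/*
 * Models the node fallback done at registration time: only a device
 * that has no node yet copies its parent's node, so whatever the parent
 * has (including NUMA_NO_NODE) propagates down the hierarchy.
 */
static void model_device_register(struct model_device *dev)
{
	assert(dev != NULL);

	if (dev->parent && dev->numa_node == NUMA_NO_NODE)
		dev->numa_node = dev->parent->numa_node;
}
```

This is why warning once at the host bridge should be enough: every child registered under it inherits the same missing node.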

> 
>> +           dev_to_node(bus->bridge) == NUMA_NO_NODE)
>> +               dev_err(bus->bridge, FW_BUG "No node assigned on NUMA capable HW. Please contact your vendor for updates.\n");
>> +
>>         return 0;
> 
> Who set that bus->bridge node to NUMA_NO_NODE?

It seems x86 and arm64 have different implementations of
pcibus_to_node():

For arm64:
int pcibus_to_node(struct pci_bus *bus)
{
	return dev_to_node(&bus->dev);
}

And the node of the bus is set in:
int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
{
	if (!acpi_disabled) {
		struct pci_config_window *cfg = bridge->bus->sysdata;
		struct acpi_device *adev = to_acpi_device(cfg->parent);
		struct device *bus_dev = &bridge->bus->dev;

		ACPI_COMPANION_SET(&bridge->dev, adev);
		set_dev_node(bus_dev, acpi_get_node(acpi_device_handle(adev)));
	}

	return 0;
}

acpi_get_node() may return NUMA_NO_NODE, in which case
pcibios_root_bridge_prepare() sets the node of bus_dev to NUMA_NO_NODE.
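A rough sketch of why the arm64 path can end up with NUMA_NO_NODE, assuming acpi_get_node() behaves like a _PXM lookup followed by a PXM-to-node table lookup (as the acpi_get_pxm()/acpi_map_pxm_to_node() pair in the test patch further down suggests). The table and model_acpi_get_node() are hypothetical, and a missing _PXM is modeled as pxm < 0:

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1) /* same value the kernel uses */

/*
 * Hypothetical model: translate a host bridge's proximity domain (pxm)
 * into a NUMA node through a lookup table. Any gap in what firmware
 * provides ends up as NUMA_NO_NODE, which is then stored in the bus
 * device.
 */
static int model_acpi_get_node(const int *pxm_to_node, size_t nr_pxm,
			       int pxm)
{
	assert(pxm_to_node != NULL || nr_pxm == 0);

	if (pxm < 0 || (size_t)pxm >= nr_pxm)
		return NUMA_NO_NODE;	/* no _PXM, or PXM outside the table */
	return pxm_to_node[pxm];	/* entry may itself be NUMA_NO_NODE */
}
```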


For x86:
static inline int __pcibus_to_node(const struct pci_bus *bus)
{
	const struct pci_sysdata *sd = bus->sysdata;

	return sd->node;
}

And the node of the bus is set in pci_acpi_scan_root(), which uses
pci_acpi_root_get_node() to get the node of a bus. That may also return
NUMA_NO_NODE.


> If that is set, the firmware is broken, as you say, but you need to tell
> the user what firmware is broken.

Maybe mention the BIOS in the log message?
dev_err(bus->bridge, FW_BUG "No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.\n");
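For reference, a small userspace sketch of how that message would be assembled. FW_BUG is a plain string literal ("[Firmware Bug]: ") in the kernel headers, so it concatenates with the message text at compile time, and dev_err() prints the result after the driver and device names. model_dev_err_line() is a hypothetical helper that only imitates that "<driver> <device>: <message>" layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define FW_BUG "[Firmware Bug]: "

/* Adjacent string literals are joined by the compiler into one string. */
#define NO_NODE_MSG FW_BUG "No node assigned on NUMA capable HW by BIOS. " \
	"Please contact your vendor for updates.\n"

/*
 * Hypothetical helper imitating the "<driver> <device>: <message>" line.
 * Returns the number of characters written, or -1 on an output error or
 * if buf is too small to hold the whole line.
 */
static int model_dev_err_line(char *buf, size_t len, const char *driver,
			      const char *dev_name)
{
	int n;

	assert(buf != NULL && len > 0);
	assert(driver != NULL && dev_name != NULL);

	n = snprintf(buf, len, "%s %s: %s", driver, dev_name, NO_NODE_MSG);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output error or truncated line */
	return n;
}
```

With an empty driver string and the device name pci0000:00, the buffer holds " pci0000:00: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates." plus a newline, which appears to match the leading space in the log lines below.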


> 
> Try something like this out and see what happens on your machine that
> had things "broken".  What does it say?

I do not have an older BIOS at hand right now. But after forcing
acpi_get_node() to always return NUMA_NO_NODE with the patch below:

--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -484,6 +484,7 @@ int acpi_get_node(acpi_handle handle)

        pxm = acpi_get_pxm(handle);

-       return acpi_map_pxm_to_node(pxm);
+       return -1;
+       //return acpi_map_pxm_to_node(pxm);

it gives the warnings below on my machine:

[   16.126136]  pci0000:00: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   17.733831]  pci0000:7b: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   18.020924]  pci0000:7a: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   18.552832]  pci0000:78: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   19.514948]  pci0000:7c: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   20.652990]  pci0000:74: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   22.573200]  pci0000:80: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   23.225355]  pci0000:bb: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   23.514040]  pci0000:ba: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   24.050107]  pci0000:b8: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   25.017491]  pci0000:bc: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   25.557974]  pci0000:b4: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.

> 
>> Also, we do not need to warn about that in pci_device_add(), right?
>> Because we must have warned about the pci host bridge of the pci device.
> 
> That should be true, yes.
> 
>> I may be wrong about the above because I am not so familiar with PCI.
>>
>>>
>>> And yes, you need to do this all on a per-bus-type basis, as has been
>>> pointed out.  It's up to the bus to create the device and set this up
>>> properly.
>>
>> Thanks.
>> Will do that on a per-bus-type basis.
> 
> Good luck, I don't really think that most, if any, of this is needed,
> but hey, it's nice to clean it up where it can be :)
> 
> greg k-h
> 
> .
> 


