Linux-ACPI Archive on lore.kernel.org
 help / color / Atom feed
* Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
       [not found]                   ` <20191011111539.GX2311@hirez.programming.kicks-ass.net>
@ 2019-10-12  6:17                     ` Yunsheng Lin
  2019-10-12  7:40                       ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Yunsheng Lin @ 2019-10-12  6:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michal Hocko, Robin Murphy, catalin.marinas, will, mingo, bp,
	rth, ink, mattst88, benh, paulus, mpe, heiko.carstens, gor,
	borntraeger, ysato, dalias, davem, ralf, paul.burton, jhogan,
	jiaxun.yang, chenhc, akpm, rppt, anshuman.khandual, tglx, cai,
	linux-arm-kernel, linux-kernel, hpa, x86, dave.hansen, luto,
	len.brown, axboe, dledford, jeffrey.t.kirsher, linux-alpha,
	naveen.n.rao, mwb, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, tbogendoerfer, linux-mips, rafael, gregkh, bhelgaas,
	linux-pci, Rafael J. Wysocki, lenb, linux-acpi

add pci and acpi maintainer
cc linux-pci@vger.kernel.org and linux-acpi@vger.kernel.org

On 2019/10/11 19:15, Peter Zijlstra wrote:
> On Fri, Oct 11, 2019 at 11:27:54AM +0800, Yunsheng Lin wrote:
>> But I failed to see why the above is related to making node_to_cpumask_map()
>> NUMA_NO_NODE aware?
> 
> Your initial bug is for hns3, which is a PCI device, which really _MUST_
> have a node assigned.
> 
> It not having one, is a straight up bug. We must not silently accept
> NO_NODE there, ever.
> 

I suppose you mean reporting a lack of affinity when the node of a pcie
device is not set by "not silently accept NO_NODE".

As Greg has asked about in [1]:
what is a user to do when the user sees the kernel reporting that?

We may tell user to contact their vendor for info or updates about
that when they do not know about their system well enough, but their
vendor may get away with this by quoting ACPI spec as the spec
considering this optional. Should the user believe this is indeed a
fw bug or a misreport from the kernel?

If this kind of reporting is common pratice and will not cause any
misunderstanding, then maybe we can report that.

[1] https://lore.kernel.org/lkml/20190905055727.GB23826@kroah.com/

> .
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-12  6:17                     ` [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware Yunsheng Lin
@ 2019-10-12  7:40                       ` Greg KH
  2019-10-12  9:47                         ` Yunsheng Lin
  2019-10-28  9:20                         ` Yunsheng Lin
  0 siblings, 2 replies; 15+ messages in thread
From: Greg KH @ 2019-10-12  7:40 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Peter Zijlstra, Michal Hocko, Robin Murphy, catalin.marinas,
	will, mingo, bp, rth, ink, mattst88, benh, paulus, mpe,
	heiko.carstens, gor, borntraeger, ysato, dalias, davem, ralf,
	paul.burton, jhogan, jiaxun.yang, chenhc, akpm, rppt,
	anshuman.khandual, tglx, cai, linux-arm-kernel, linux-kernel,
	hpa, x86, dave.hansen, luto, len.brown, axboe, dledford,
	jeffrey.t.kirsher, linux-alpha, naveen.n.rao, mwb, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, tbogendoerfer, linux-mips,
	rafael, bhelgaas, linux-pci, Rafael J. Wysocki, lenb, linux-acpi

On Sat, Oct 12, 2019 at 02:17:26PM +0800, Yunsheng Lin wrote:
> add pci and acpi maintainer
> cc linux-pci@vger.kernel.org and linux-acpi@vger.kernel.org
> 
> On 2019/10/11 19:15, Peter Zijlstra wrote:
> > On Fri, Oct 11, 2019 at 11:27:54AM +0800, Yunsheng Lin wrote:
> >> But I failed to see why the above is related to making node_to_cpumask_map()
> >> NUMA_NO_NODE aware?
> > 
> > Your initial bug is for hns3, which is a PCI device, which really _MUST_
> > have a node assigned.
> > 
> > It not having one, is a straight up bug. We must not silently accept
> > NO_NODE there, ever.
> > 
> 
> I suppose you mean reporting a lack of affinity when the node of a pcie
> device is not set by "not silently accept NO_NODE".

If the firmware of a pci device does not provide the node information,
then yes, warn about that.

> As Greg has asked about in [1]:
> what is a user to do when the user sees the kernel reporting that?
> 
> We may tell user to contact their vendor for info or updates about
> that when they do not know about their system well enough, but their
> vendor may get away with this by quoting ACPI spec as the spec
> considering this optional. Should the user believe this is indeed a
> fw bug or a misreport from the kernel?

Say it is a firmware bug, if it is a firmware bug, that's simple.

> If this kind of reporting is common pratice and will not cause any
> misunderstanding, then maybe we can report that.

Yes, please do so, that's the only way those boxes are ever going to get
fixed.  And go add the test to the "firmware testing" tool that is based
on Linux that Intel has somewhere, to give vendors a chance to fix this
before they ship hardware.

This shouldn't be a big deal, we warn of other hardware bugs all the
time.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-12  7:40                       ` Greg KH
@ 2019-10-12  9:47                         ` Yunsheng Lin
  2019-10-12 10:40                           ` Greg KH
  2019-10-28  9:20                         ` Yunsheng Lin
  1 sibling, 1 reply; 15+ messages in thread
From: Yunsheng Lin @ 2019-10-12  9:47 UTC (permalink / raw)
  To: Greg KH
  Cc: Peter Zijlstra, Michal Hocko, Robin Murphy, catalin.marinas,
	will, mingo, bp, rth, ink, mattst88, benh, paulus, mpe,
	heiko.carstens, gor, borntraeger, ysato, dalias, davem, ralf,
	paul.burton, jhogan, jiaxun.yang, chenhc, akpm, rppt,
	anshuman.khandual, tglx, cai, linux-arm-kernel, linux-kernel,
	hpa, x86, dave.hansen, luto, len.brown, axboe, dledford,
	jeffrey.t.kirsher, linux-alpha, naveen.n.rao, mwb, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, tbogendoerfer, linux-mips,
	rafael, bhelgaas, linux-pci, Rafael J. Wysocki, lenb, linux-acpi

On 2019/10/12 15:40, Greg KH wrote:
> On Sat, Oct 12, 2019 at 02:17:26PM +0800, Yunsheng Lin wrote:
>> add pci and acpi maintainer
>> cc linux-pci@vger.kernel.org and linux-acpi@vger.kernel.org
>>
>> On 2019/10/11 19:15, Peter Zijlstra wrote:
>>> On Fri, Oct 11, 2019 at 11:27:54AM +0800, Yunsheng Lin wrote:
>>>> But I failed to see why the above is related to making node_to_cpumask_map()
>>>> NUMA_NO_NODE aware?
>>>
>>> Your initial bug is for hns3, which is a PCI device, which really _MUST_
>>> have a node assigned.
>>>
>>> It not having one, is a straight up bug. We must not silently accept
>>> NO_NODE there, ever.
>>>
>>
>> I suppose you mean reporting a lack of affinity when the node of a pcie
>> device is not set by "not silently accept NO_NODE".
> 
> If the firmware of a pci device does not provide the node information,
> then yes, warn about that.
> 
>> As Greg has asked about in [1]:
>> what is a user to do when the user sees the kernel reporting that?
>>
>> We may tell user to contact their vendor for info or updates about
>> that when they do not know about their system well enough, but their
>> vendor may get away with this by quoting ACPI spec as the spec
>> considering this optional. Should the user believe this is indeed a
>> fw bug or a misreport from the kernel?
> 
> Say it is a firmware bug, if it is a firmware bug, that's simple.
> 
>> If this kind of reporting is common pratice and will not cause any
>> misunderstanding, then maybe we can report that.
> 
> Yes, please do so, that's the only way those boxes are ever going to get
> fixed.  And go add the test to the "firmware testing" tool that is based
> on Linux that Intel has somewhere, to give vendors a chance to fix this
> before they ship hardware.
> 
> This shouldn't be a big deal, we warn of other hardware bugs all the
> time.

Ok, thanks for clarifying.

Will send a patch to catch the case when a pcie device without numa node
being set and warn about it.

Maybe use dev->bus to verify if it is a pci device?

> 
> thanks,
> 
> greg k-h
> 
> .
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-12  9:47                         ` Yunsheng Lin
@ 2019-10-12 10:40                           ` Greg KH
  2019-10-12 10:47                             ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2019-10-12 10:40 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Peter Zijlstra, Michal Hocko, Robin Murphy, catalin.marinas,
	will, mingo, bp, rth, ink, mattst88, benh, paulus, mpe,
	heiko.carstens, gor, borntraeger, ysato, dalias, davem, ralf,
	paul.burton, jhogan, jiaxun.yang, chenhc, akpm, rppt,
	anshuman.khandual, tglx, cai, linux-arm-kernel, linux-kernel,
	hpa, x86, dave.hansen, luto, len.brown, axboe, dledford,
	jeffrey.t.kirsher, linux-alpha, naveen.n.rao, mwb, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, tbogendoerfer, linux-mips,
	rafael, bhelgaas, linux-pci, Rafael J. Wysocki, lenb, linux-acpi

On Sat, Oct 12, 2019 at 05:47:56PM +0800, Yunsheng Lin wrote:
> On 2019/10/12 15:40, Greg KH wrote:
> > On Sat, Oct 12, 2019 at 02:17:26PM +0800, Yunsheng Lin wrote:
> >> add pci and acpi maintainer
> >> cc linux-pci@vger.kernel.org and linux-acpi@vger.kernel.org
> >>
> >> On 2019/10/11 19:15, Peter Zijlstra wrote:
> >>> On Fri, Oct 11, 2019 at 11:27:54AM +0800, Yunsheng Lin wrote:
> >>>> But I failed to see why the above is related to making node_to_cpumask_map()
> >>>> NUMA_NO_NODE aware?
> >>>
> >>> Your initial bug is for hns3, which is a PCI device, which really _MUST_
> >>> have a node assigned.
> >>>
> >>> It not having one, is a straight up bug. We must not silently accept
> >>> NO_NODE there, ever.
> >>>
> >>
> >> I suppose you mean reporting a lack of affinity when the node of a pcie
> >> device is not set by "not silently accept NO_NODE".
> > 
> > If the firmware of a pci device does not provide the node information,
> > then yes, warn about that.
> > 
> >> As Greg has asked about in [1]:
> >> what is a user to do when the user sees the kernel reporting that?
> >>
> >> We may tell user to contact their vendor for info or updates about
> >> that when they do not know about their system well enough, but their
> >> vendor may get away with this by quoting ACPI spec as the spec
> >> considering this optional. Should the user believe this is indeed a
> >> fw bug or a misreport from the kernel?
> > 
> > Say it is a firmware bug, if it is a firmware bug, that's simple.
> > 
> >> If this kind of reporting is common pratice and will not cause any
> >> misunderstanding, then maybe we can report that.
> > 
> > Yes, please do so, that's the only way those boxes are ever going to get
> > fixed.  And go add the test to the "firmware testing" tool that is based
> > on Linux that Intel has somewhere, to give vendors a chance to fix this
> > before they ship hardware.
> > 
> > This shouldn't be a big deal, we warn of other hardware bugs all the
> > time.
> 
> Ok, thanks for clarifying.
> 
> Will send a patch to catch the case when a pcie device without numa node
> being set and warn about it.
> 
> Maybe use dev->bus to verify if it is a pci device?

No, do that in the pci bus core code itself, when creating the devices
as that is when you know, or do not know, the numa node, right?

This can't be in the driver core only, as each bus type will have a
different way of determining what the node the device is on.  For some
reason, I thought the PCI core code already does this, right?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-12 10:40                           ` Greg KH
@ 2019-10-12 10:47                             ` Greg KH
  2019-10-14  8:00                               ` Yunsheng Lin
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2019-10-12 10:47 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Peter Zijlstra, Michal Hocko, Robin Murphy, catalin.marinas,
	will, mingo, bp, rth, ink, mattst88, benh, paulus, mpe,
	heiko.carstens, gor, borntraeger, ysato, dalias, davem, ralf,
	paul.burton, jhogan, jiaxun.yang, chenhc, akpm, rppt,
	anshuman.khandual, tglx, cai, linux-arm-kernel, linux-kernel,
	hpa, x86, dave.hansen, luto, len.brown, axboe, dledford,
	jeffrey.t.kirsher, linux-alpha, naveen.n.rao, mwb, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, tbogendoerfer, linux-mips,
	rafael, bhelgaas, linux-pci, Rafael J. Wysocki, lenb, linux-acpi

On Sat, Oct 12, 2019 at 12:40:01PM +0200, Greg KH wrote:
> On Sat, Oct 12, 2019 at 05:47:56PM +0800, Yunsheng Lin wrote:
> > On 2019/10/12 15:40, Greg KH wrote:
> > > On Sat, Oct 12, 2019 at 02:17:26PM +0800, Yunsheng Lin wrote:
> > >> add pci and acpi maintainer
> > >> cc linux-pci@vger.kernel.org and linux-acpi@vger.kernel.org
> > >>
> > >> On 2019/10/11 19:15, Peter Zijlstra wrote:
> > >>> On Fri, Oct 11, 2019 at 11:27:54AM +0800, Yunsheng Lin wrote:
> > >>>> But I failed to see why the above is related to making node_to_cpumask_map()
> > >>>> NUMA_NO_NODE aware?
> > >>>
> > >>> Your initial bug is for hns3, which is a PCI device, which really _MUST_
> > >>> have a node assigned.
> > >>>
> > >>> It not having one, is a straight up bug. We must not silently accept
> > >>> NO_NODE there, ever.
> > >>>
> > >>
> > >> I suppose you mean reporting a lack of affinity when the node of a pcie
> > >> device is not set by "not silently accept NO_NODE".
> > > 
> > > If the firmware of a pci device does not provide the node information,
> > > then yes, warn about that.
> > > 
> > >> As Greg has asked about in [1]:
> > >> what is a user to do when the user sees the kernel reporting that?
> > >>
> > >> We may tell user to contact their vendor for info or updates about
> > >> that when they do not know about their system well enough, but their
> > >> vendor may get away with this by quoting ACPI spec as the spec
> > >> considering this optional. Should the user believe this is indeed a
> > >> fw bug or a misreport from the kernel?
> > > 
> > > Say it is a firmware bug, if it is a firmware bug, that's simple.
> > > 
> > >> If this kind of reporting is common pratice and will not cause any
> > >> misunderstanding, then maybe we can report that.
> > > 
> > > Yes, please do so, that's the only way those boxes are ever going to get
> > > fixed.  And go add the test to the "firmware testing" tool that is based
> > > on Linux that Intel has somewhere, to give vendors a chance to fix this
> > > before they ship hardware.
> > > 
> > > This shouldn't be a big deal, we warn of other hardware bugs all the
> > > time.
> > 
> > Ok, thanks for clarifying.
> > 
> > Will send a patch to catch the case when a pcie device without numa node
> > being set and warn about it.
> > 
> > Maybe use dev->bus to verify if it is a pci device?
> 
> No, do that in the pci bus core code itself, when creating the devices
> as that is when you know, or do not know, the numa node, right?
> 
> This can't be in the driver core only, as each bus type will have a
> different way of determining what the node the device is on.  For some
> reason, I thought the PCI core code already does this, right?

Yes, pci_irq_get_node(), which NO ONE CALLS!  I should go delete that
thing...

Anyway, it looks like the pci core code does call set_dev_node() based
on the PCI bridge, so if that is set up properly, all should be fine.

If not, well, you have buggy firmware and you need to warn about that at
the time you are creating the bridge.  Look at the call to
pcibus_to_node() in pci_register_host_bridge().

And yes, you need to do this all on a per-bus-type basis, as has been
pointed out.  It's up to the bus to create the device and set this up
properly.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-12 10:47                             ` Greg KH
@ 2019-10-14  8:00                               ` Yunsheng Lin
  2019-10-14  9:25                                 ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Yunsheng Lin @ 2019-10-14  8:00 UTC (permalink / raw)
  To: Greg KH
  Cc: Peter Zijlstra, Michal Hocko, Robin Murphy, catalin.marinas,
	will, mingo, bp, rth, ink, mattst88, benh, paulus, mpe,
	heiko.carstens, gor, borntraeger, ysato, dalias, davem, ralf,
	paul.burton, jhogan, jiaxun.yang, chenhc, akpm, rppt,
	anshuman.khandual, tglx, cai, linux-arm-kernel, linux-kernel,
	hpa, x86, dave.hansen, luto, len.brown, axboe, dledford,
	jeffrey.t.kirsher, linux-alpha, naveen.n.rao, mwb, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, tbogendoerfer, linux-mips,
	rafael, bhelgaas, linux-pci, Rafael J. Wysocki, lenb, linux-acpi

On 2019/10/12 18:47, Greg KH wrote:
> On Sat, Oct 12, 2019 at 12:40:01PM +0200, Greg KH wrote:
>> On Sat, Oct 12, 2019 at 05:47:56PM +0800, Yunsheng Lin wrote:
>>> On 2019/10/12 15:40, Greg KH wrote:
>>>> On Sat, Oct 12, 2019 at 02:17:26PM +0800, Yunsheng Lin wrote:
>>>>> add pci and acpi maintainer
>>>>> cc linux-pci@vger.kernel.org and linux-acpi@vger.kernel.org
>>>>>
>>>>> On 2019/10/11 19:15, Peter Zijlstra wrote:
>>>>>> On Fri, Oct 11, 2019 at 11:27:54AM +0800, Yunsheng Lin wrote:
>>>>>>> But I failed to see why the above is related to making node_to_cpumask_map()
>>>>>>> NUMA_NO_NODE aware?
>>>>>>
>>>>>> Your initial bug is for hns3, which is a PCI device, which really _MUST_
>>>>>> have a node assigned.
>>>>>>
>>>>>> It not having one, is a straight up bug. We must not silently accept
>>>>>> NO_NODE there, ever.
>>>>>>
>>>>>
>>>>> I suppose you mean reporting a lack of affinity when the node of a pcie
>>>>> device is not set by "not silently accept NO_NODE".
>>>>
>>>> If the firmware of a pci device does not provide the node information,
>>>> then yes, warn about that.
>>>>
>>>>> As Greg has asked about in [1]:
>>>>> what is a user to do when the user sees the kernel reporting that?
>>>>>
>>>>> We may tell user to contact their vendor for info or updates about
>>>>> that when they do not know about their system well enough, but their
>>>>> vendor may get away with this by quoting ACPI spec as the spec
>>>>> considering this optional. Should the user believe this is indeed a
>>>>> fw bug or a misreport from the kernel?
>>>>
>>>> Say it is a firmware bug, if it is a firmware bug, that's simple.
>>>>
>>>>> If this kind of reporting is common pratice and will not cause any
>>>>> misunderstanding, then maybe we can report that.
>>>>
>>>> Yes, please do so, that's the only way those boxes are ever going to get
>>>> fixed.  And go add the test to the "firmware testing" tool that is based
>>>> on Linux that Intel has somewhere, to give vendors a chance to fix this
>>>> before they ship hardware.
>>>>
>>>> This shouldn't be a big deal, we warn of other hardware bugs all the
>>>> time.
>>>
>>> Ok, thanks for clarifying.
>>>
>>> Will send a patch to catch the case when a pcie device without numa node
>>> being set and warn about it.
>>>
>>> Maybe use dev->bus to verify if it is a pci device?
>>
>> No, do that in the pci bus core code itself, when creating the devices
>> as that is when you know, or do not know, the numa node, right?
>>
>> This can't be in the driver core only, as each bus type will have a
>> different way of determining what the node the device is on.  For some
>> reason, I thought the PCI core code already does this, right?
> 
> Yes, pci_irq_get_node(), which NO ONE CALLS!  I should go delete that
> thing...
> 
> Anyway, it looks like the pci core code does call set_dev_node() based
> on the PCI bridge, so if that is set up properly, all should be fine.
> 
> If not, well, you have buggy firmware and you need to warn about that at
> the time you are creating the bridge.  Look at the call to
> pcibus_to_node() in pci_register_host_bridge().

Thanks for pointing out the specific function.
Maybe we do not need to warn about the case when the device has a parent,
because we must have warned about the parent if the device has a parent
and the parent also has a node of NO_NODE, so do not need to warn the child
device anymore? like blew:

@@ -932,6 +932,10 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
        list_add_tail(&bus->node, &pci_root_buses);
        up_write(&pci_bus_sem);

+       if (nr_node_ids > 1 && !parent &&
+           dev_to_node(bus->bridge) == NUMA_NO_NODE)
+               dev_err(bus->bridge, FW_BUG "No node assigned on NUMA capable HW. Please contact your vendor for updates.\n");
+
        return 0;


Also, we do not need to warn about that in pci_device_add(), Right?
Because we must have warned about the pci host bridge of the pci device.

I may be wrong about above because I am not so familiar with the pci.

> 
> And yes, you need to do this all on a per-bus-type basis, as has been
> pointed out.  It's up to the bus to create the device and set this up
> properly.

Thanks.
Will do that on per-bus-type basis.

> 
> thanks,
> 
> greg k-h
> 
> .
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-14  8:00                               ` Yunsheng Lin
@ 2019-10-14  9:25                                 ` Greg KH
  2019-10-14  9:49                                   ` Peter Zijlstra
  2019-10-15 10:40                                   ` Yunsheng Lin
  0 siblings, 2 replies; 15+ messages in thread
From: Greg KH @ 2019-10-14  9:25 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Peter Zijlstra, Michal Hocko, Robin Murphy, catalin.marinas,
	will, mingo, bp, rth, ink, mattst88, benh, paulus, mpe,
	heiko.carstens, gor, borntraeger, ysato, dalias, davem, ralf,
	paul.burton, jhogan, jiaxun.yang, chenhc, akpm, rppt,
	anshuman.khandual, tglx, cai, linux-arm-kernel, linux-kernel,
	hpa, x86, dave.hansen, luto, len.brown, axboe, dledford,
	jeffrey.t.kirsher, linux-alpha, naveen.n.rao, mwb, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, tbogendoerfer, linux-mips,
	rafael, bhelgaas, linux-pci, Rafael J. Wysocki, lenb, linux-acpi

On Mon, Oct 14, 2019 at 04:00:46PM +0800, Yunsheng Lin wrote:
> On 2019/10/12 18:47, Greg KH wrote:
> > On Sat, Oct 12, 2019 at 12:40:01PM +0200, Greg KH wrote:
> >> On Sat, Oct 12, 2019 at 05:47:56PM +0800, Yunsheng Lin wrote:
> >>> On 2019/10/12 15:40, Greg KH wrote:
> >>>> On Sat, Oct 12, 2019 at 02:17:26PM +0800, Yunsheng Lin wrote:
> >>>>> add pci and acpi maintainer
> >>>>> cc linux-pci@vger.kernel.org and linux-acpi@vger.kernel.org
> >>>>>
> >>>>> On 2019/10/11 19:15, Peter Zijlstra wrote:
> >>>>>> On Fri, Oct 11, 2019 at 11:27:54AM +0800, Yunsheng Lin wrote:
> >>>>>>> But I failed to see why the above is related to making node_to_cpumask_map()
> >>>>>>> NUMA_NO_NODE aware?
> >>>>>>
> >>>>>> Your initial bug is for hns3, which is a PCI device, which really _MUST_
> >>>>>> have a node assigned.
> >>>>>>
> >>>>>> It not having one, is a straight up bug. We must not silently accept
> >>>>>> NO_NODE there, ever.
> >>>>>>
> >>>>>
> >>>>> I suppose you mean reporting a lack of affinity when the node of a pcie
> >>>>> device is not set by "not silently accept NO_NODE".
> >>>>
> >>>> If the firmware of a pci device does not provide the node information,
> >>>> then yes, warn about that.
> >>>>
> >>>>> As Greg has asked about in [1]:
> >>>>> what is a user to do when the user sees the kernel reporting that?
> >>>>>
> >>>>> We may tell user to contact their vendor for info or updates about
> >>>>> that when they do not know about their system well enough, but their
> >>>>> vendor may get away with this by quoting ACPI spec as the spec
> >>>>> considering this optional. Should the user believe this is indeed a
> >>>>> fw bug or a misreport from the kernel?
> >>>>
> >>>> Say it is a firmware bug, if it is a firmware bug, that's simple.
> >>>>
> >>>>> If this kind of reporting is common pratice and will not cause any
> >>>>> misunderstanding, then maybe we can report that.
> >>>>
> >>>> Yes, please do so, that's the only way those boxes are ever going to get
> >>>> fixed.  And go add the test to the "firmware testing" tool that is based
> >>>> on Linux that Intel has somewhere, to give vendors a chance to fix this
> >>>> before they ship hardware.
> >>>>
> >>>> This shouldn't be a big deal, we warn of other hardware bugs all the
> >>>> time.
> >>>
> >>> Ok, thanks for clarifying.
> >>>
> >>> Will send a patch to catch the case when a pcie device without numa node
> >>> being set and warn about it.
> >>>
> >>> Maybe use dev->bus to verify if it is a pci device?
> >>
> >> No, do that in the pci bus core code itself, when creating the devices
> >> as that is when you know, or do not know, the numa node, right?
> >>
> >> This can't be in the driver core only, as each bus type will have a
> >> different way of determining what the node the device is on.  For some
> >> reason, I thought the PCI core code already does this, right?
> > 
> > Yes, pci_irq_get_node(), which NO ONE CALLS!  I should go delete that
> > thing...
> > 
> > Anyway, it looks like the pci core code does call set_dev_node() based
> > on the PCI bridge, so if that is set up properly, all should be fine.
> > 
> > If not, well, you have buggy firmware and you need to warn about that at
> > the time you are creating the bridge.  Look at the call to
> > pcibus_to_node() in pci_register_host_bridge().
> 
> Thanks for pointing out the specific function.
> Maybe we do not need to warn about the case when the device has a parent,
> because we must have warned about the parent if the device has a parent
> and the parent also has a node of NO_NODE, so do not need to warn the child
> device anymore? like blew:
> 
> @@ -932,6 +932,10 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
>         list_add_tail(&bus->node, &pci_root_buses);
>         up_write(&pci_bus_sem);
> 
> +       if (nr_node_ids > 1 && !parent &&

Why do you need to check this?  If you have a parent, it's your node
should be set, if not, that's an error, right?

> +           dev_to_node(bus->bridge) == NUMA_NO_NODE)
> +               dev_err(bus->bridge, FW_BUG "No node assigned on NUMA capable HW. Please contact your vendor for updates.\n");
> +
>         return 0;

Who set that bus->bridge node to NUMA_NO_NODE?
If that is set, the firmware is broken, as you say, but you need to tell
the user what firmware is broken.

Try something like this out and see what happens on your machine that
had things "broken".  What does it say?

> Also, we do not need to warn about that in pci_device_add(), Right?
> Because we must have warned about the pci host bridge of the pci device.

That should be true, yes.

> I may be wrong about above because I am not so familiar with the pci.
> 
> > 
> > And yes, you need to do this all on a per-bus-type basis, as has been
> > pointed out.  It's up to the bus to create the device and set this up
> > properly.
> 
> Thanks.
> Will do that on per-bus-type basis.

Good luck, I don't really think that most, if any, of this is needed,
but hey, it's nice to clean it up where it can be :)

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-14  9:25                                 ` Greg KH
@ 2019-10-14  9:49                                   ` Peter Zijlstra
  2019-10-14 10:04                                     ` Greg KH
  2019-10-15 10:40                                   ` Yunsheng Lin
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2019-10-14  9:49 UTC (permalink / raw)
  To: Greg KH
  Cc: Yunsheng Lin, Michal Hocko, Robin Murphy, catalin.marinas, will,
	mingo, bp, rth, ink, mattst88, benh, paulus, mpe, heiko.carstens,
	gor, borntraeger, ysato, dalias, davem, ralf, paul.burton,
	jhogan, jiaxun.yang, chenhc, akpm, rppt, anshuman.khandual, tglx,
	cai, linux-arm-kernel, linux-kernel, hpa, x86, dave.hansen, luto,
	len.brown, axboe, dledford, jeffrey.t.kirsher, linux-alpha,
	naveen.n.rao, mwb, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, tbogendoerfer, linux-mips, rafael, bhelgaas,
	linux-pci, Rafael J. Wysocki, lenb, linux-acpi

On Mon, Oct 14, 2019 at 11:25:09AM +0200, Greg KH wrote:
> Good luck, I don't really think that most, if any, of this is needed,
> but hey, it's nice to clean it up where it can be :)

Some of the virtual devices we have (that use devm) really ought to set
the node too, like drivers/base/cpu.c and driver/base/node.c and
arguably the cooling devices too (they create a device per cpu).

The patch I had here:

  https://lkml.kernel.org/r/20190925214526.GA4643@worktop.programming.kicks-ass.net

takes the more radical approach of requiring a node, except when
explicitly marked not (the fake devices that don't use devm for
example).

But yes, PCI and other physical busses really should be having a node
set, no excuses.

Anyway, I don't think non-physical devices actually use
cpumask_of_node() much, a quick grep didn't show any.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-14  9:49                                   ` Peter Zijlstra
@ 2019-10-14 10:04                                     ` Greg KH
  0 siblings, 0 replies; 15+ messages in thread
From: Greg KH @ 2019-10-14 10:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Yunsheng Lin, Michal Hocko, Robin Murphy, catalin.marinas, will,
	mingo, bp, rth, ink, mattst88, benh, paulus, mpe, heiko.carstens,
	gor, borntraeger, ysato, dalias, davem, ralf, paul.burton,
	jhogan, jiaxun.yang, chenhc, akpm, rppt, anshuman.khandual, tglx,
	cai, linux-arm-kernel, linux-kernel, hpa, x86, dave.hansen, luto,
	len.brown, axboe, dledford, jeffrey.t.kirsher, linux-alpha,
	naveen.n.rao, mwb, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, tbogendoerfer, linux-mips, rafael, bhelgaas,
	linux-pci, Rafael J. Wysocki, lenb, linux-acpi

On Mon, Oct 14, 2019 at 11:49:12AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 14, 2019 at 11:25:09AM +0200, Greg KH wrote:
> > Good luck, I don't really think that most, if any, of this is needed,
> > but hey, it's nice to clean it up where it can be :)
> 
> Some of the virtual devices we have (that use devm) really ought to set
> the node too, like drivers/base/cpu.c and driver/base/node.c and
> arguably the cooling devices too (they create a device per cpu).
> 
> The patch I had here:
> 
>   https://lkml.kernel.org/r/20190925214526.GA4643@worktop.programming.kicks-ass.net
> 
> takes the more radical approach of requiring a node, except when
> explicitly marked not (the fake devices that don't use devm for
> example).

I like that patch :)

> But yes, PCI and other physical busses really should be having a node
> set, no excuses.

Agreed, at least just warning on the bus creation will make it a bit
less "noisy", in contrast to your patch.  But the messages in your patch
show people just how broken their bioses really are.  Which is always
fun...

> Anyway, I don't think non-physical devices actually use
> cpumask_of_node() much, a quick grep didn't show any.

That's good.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-14  9:25                                 ` Greg KH
  2019-10-14  9:49                                   ` Peter Zijlstra
@ 2019-10-15 10:40                                   ` Yunsheng Lin
  2019-10-15 16:58                                     ` Greg KH
  1 sibling, 1 reply; 15+ messages in thread
From: Yunsheng Lin @ 2019-10-15 10:40 UTC (permalink / raw)
  To: Greg KH
  Cc: Peter Zijlstra, Michal Hocko, Robin Murphy, catalin.marinas,
	will, mingo, bp, rth, ink, mattst88, benh, paulus, mpe,
	heiko.carstens, gor, borntraeger, ysato, dalias, davem, ralf,
	paul.burton, jhogan, jiaxun.yang, chenhc, akpm, rppt,
	anshuman.khandual, tglx, cai, linux-arm-kernel, linux-kernel,
	hpa, x86, dave.hansen, luto, len.brown, axboe, dledford,
	jeffrey.t.kirsher, linux-alpha, naveen.n.rao, mwb, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, tbogendoerfer, linux-mips,
	rafael, bhelgaas, linux-pci, Rafael J. Wysocki, lenb, linux-acpi

On 2019/10/14 17:25, Greg KH wrote:
> On Mon, Oct 14, 2019 at 04:00:46PM +0800, Yunsheng Lin wrote:
>> On 2019/10/12 18:47, Greg KH wrote:
>>> On Sat, Oct 12, 2019 at 12:40:01PM +0200, Greg KH wrote:
>>>> On Sat, Oct 12, 2019 at 05:47:56PM +0800, Yunsheng Lin wrote:
>>>>> On 2019/10/12 15:40, Greg KH wrote:
>>>>>> On Sat, Oct 12, 2019 at 02:17:26PM +0800, Yunsheng Lin wrote:
>>>>>>> add pci and acpi maintainer
>>>>>>> cc linux-pci@vger.kernel.org and linux-acpi@vger.kernel.org
>>>>>>>
>>>>>>> On 2019/10/11 19:15, Peter Zijlstra wrote:
>>>>>>>> On Fri, Oct 11, 2019 at 11:27:54AM +0800, Yunsheng Lin wrote:
>>>>>>>>> But I failed to see why the above is related to making node_to_cpumask_map()
>>>>>>>>> NUMA_NO_NODE aware?
>>>>>>>>
>>>>>>>> Your initial bug is for hns3, which is a PCI device, which really _MUST_
>>>>>>>> have a node assigned.
>>>>>>>>
>>>>>>>> It not having one, is a straight up bug. We must not silently accept
>>>>>>>> NO_NODE there, ever.
>>>>>>>>
>>>>>>>
>>>>>>> I suppose you mean reporting a lack of affinity when the node of a pcie
>>>>>>> device is not set by "not silently accept NO_NODE".
>>>>>>
>>>>>> If the firmware of a pci device does not provide the node information,
>>>>>> then yes, warn about that.
>>>>>>
>>>>>>> As Greg has asked about in [1]:
>>>>>>> what is a user to do when the user sees the kernel reporting that?
>>>>>>>
>>>>>>> We may tell user to contact their vendor for info or updates about
>>>>>>> that when they do not know about their system well enough, but their
>>>>>>> vendor may get away with this by quoting ACPI spec as the spec
>>>>>>> considering this optional. Should the user believe this is indeed a
>>>>>>> fw bug or a misreport from the kernel?
>>>>>>
>>>>>> Say it is a firmware bug, if it is a firmware bug, that's simple.
>>>>>>
>>>>>>> If this kind of reporting is common pratice and will not cause any
>>>>>>> misunderstanding, then maybe we can report that.
>>>>>>
>>>>>> Yes, please do so, that's the only way those boxes are ever going to get
>>>>>> fixed.  And go add the test to the "firmware testing" tool that is based
>>>>>> on Linux that Intel has somewhere, to give vendors a chance to fix this
>>>>>> before they ship hardware.
>>>>>>
>>>>>> This shouldn't be a big deal, we warn of other hardware bugs all the
>>>>>> time.
>>>>>
>>>>> Ok, thanks for clarifying.
>>>>>
>>>>> Will send a patch to catch the case when a pcie device without numa node
>>>>> being set and warn about it.
>>>>>
>>>>> Maybe use dev->bus to verify if it is a pci device?
>>>>
>>>> No, do that in the pci bus core code itself, when creating the devices
>>>> as that is when you know, or do not know, the numa node, right?
>>>>
>>>> This can't be in the driver core only, as each bus type will have a
>>>> different way of determining what the node the device is on.  For some
>>>> reason, I thought the PCI core code already does this, right?
>>>
>>> Yes, pci_irq_get_node(), which NO ONE CALLS!  I should go delete that
>>> thing...
>>>
>>> Anyway, it looks like the pci core code does call set_dev_node() based
>>> on the PCI bridge, so if that is set up properly, all should be fine.
>>>
>>> If not, well, you have buggy firmware and you need to warn about that at
>>> the time you are creating the bridge.  Look at the call to
>>> pcibus_to_node() in pci_register_host_bridge().
>>
>> Thanks for pointing out the specific function.
>> Maybe we do not need to warn about the case when the device has a parent,
>> because we must have warned about the parent if the device has a parent
>> and the parent also has a node of NO_NODE, so do not need to warn the child
>> device anymore? like blew:
>>
>> @@ -932,6 +932,10 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
>>         list_add_tail(&bus->node, &pci_root_buses);
>>         up_write(&pci_bus_sem);
>>
>> +       if (nr_node_ids > 1 && !parent &&
> 
> Why do you need to check this?  If you have a parent, it's your node
> should be set, if not, that's an error, right?

If the device has parent and the parent device also has a node of
NUMA_NO_NODE, then maybe we have warned about the parent device, so
we do not have to warn about the child device?

In pci_register_host_bridge():

	if (!parent)
		set_dev_node(bus->bridge, pcibus_to_node(bus));

The above only set the node of the bridge device to the node of bus if
the bridge device does not have a parent.

	bus->dev.parent = bus->bridge;

	dev_set_name(&bus->dev, "%04x:%02x", pci_domain_nr(bus), bus->number);
	name = dev_name(&bus->dev);

	err = device_register(&bus->dev);

The above then set the bus device's parent to bridge device, and then
call device_register(), which will set the bus device's node according to
bridge device' node.

> 
>> +           dev_to_node(bus->bridge) == NUMA_NO_NODE)
>> +               dev_err(bus->bridge, FW_BUG "No node assigned on NUMA capable HW. Please contact your vendor for updates.\n");
>> +
>>         return 0;
> 
> Who set that bus->bridge node to NUMA_NO_NODE?

It seems x86 and arm64 may have different implemention of
pcibus_to_node():

For arm64:
int pcibus_to_node(struct pci_bus *bus)
{
	return dev_to_node(&bus->dev);
}

And the node of bus is set in:
int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
{
	if (!acpi_disabled) {
		struct pci_config_window *cfg = bridge->bus->sysdata;
		struct acpi_device *adev = to_acpi_device(cfg->parent);
		struct device *bus_dev = &bridge->bus->dev;

		ACPI_COMPANION_SET(&bridge->dev, adev);
		set_dev_node(bus_dev, acpi_get_node(acpi_device_handle(adev)));
	}

	return 0;
}

acpi_get_node() may return NUMA_NO_NODE in pcibios_root_bridge_prepare(),
which will set the node of bus_dev to NUMA_NO_NODE


x86:
static inline int __pcibus_to_node(const struct pci_bus *bus)
{
	const struct pci_sysdata *sd = bus->sysdata;

	return sd->node;
}

And the node of bus is set in pci_acpi_scan_root(), which uses
pci_acpi_root_get_node() get the node of a bus. And it also may return
NUMA_NO_NODE.


> If that is set, the firmware is broken, as you say, but you need to tell
> the user what firmware is broken.

Maybe mentioning the BIOS in log?
dev_err(bus->bridge, FW_BUG "No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.\n");


> 
> Try something like this out and see what happens on your machine that
> had things "broken".  What does it say?

Does not have a older bios right now.
But always returning NUMA_NO_NODE by below patch:

--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -484,6 +484,7 @@ int acpi_get_node(acpi_handle handle)

        pxm = acpi_get_pxm(handle);

-       return acpi_map_pxm_to_node(pxm);
+       return -1;
+       //return acpi_map_pxm_to_node(pxm);

it gives the blow warning in my machine:

[   16.126136]  pci0000:00: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   17.733831]  pci0000:7b: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   18.020924]  pci0000:7a: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   18.552832]  pci0000:78: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   19.514948]  pci0000:7c: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   20.652990]  pci0000:74: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   22.573200]  pci0000:80: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   23.225355]  pci0000:bb: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   23.514040]  pci0000:ba: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   24.050107]  pci0000:b8: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   25.017491]  pci0000:bc: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
[   25.557974]  pci0000:b4: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.

> 
>> Also, we do not need to warn about that in pci_device_add(), Right?
>> Because we must have warned about the pci host bridge of the pci device.
> 
> That should be true, yes.
> 
>> I may be wrong about above because I am not so familiar with the pci.
>>
>>>
>>> And yes, you need to do this all on a per-bus-type basis, as has been
>>> pointed out.  It's up to the bus to create the device and set this up
>>> properly.
>>
>> Thanks.
>> Will do that on per-bus-type basis.
> 
> Good luck, I don't really think that most, if any, of this is needed,
> but hey, it's nice to clean it up where it can be :)
> 
> greg k-h
> 
> .
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-15 10:40                                   ` Yunsheng Lin
@ 2019-10-15 16:58                                     ` Greg KH
  2019-10-16 12:07                                       ` Yunsheng Lin
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2019-10-15 16:58 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Peter Zijlstra, Michal Hocko, Robin Murphy, catalin.marinas,
	will, mingo, bp, rth, ink, mattst88, benh, paulus, mpe,
	heiko.carstens, gor, borntraeger, ysato, dalias, davem, ralf,
	paul.burton, jhogan, jiaxun.yang, chenhc, akpm, rppt,
	anshuman.khandual, tglx, cai, linux-arm-kernel, linux-kernel,
	hpa, x86, dave.hansen, luto, len.brown, axboe, dledford,
	jeffrey.t.kirsher, linux-alpha, naveen.n.rao, mwb, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, tbogendoerfer, linux-mips,
	rafael, bhelgaas, linux-pci, Rafael J. Wysocki, lenb, linux-acpi

On Tue, Oct 15, 2019 at 06:40:29PM +0800, Yunsheng Lin wrote:
> On 2019/10/14 17:25, Greg KH wrote:
> > On Mon, Oct 14, 2019 at 04:00:46PM +0800, Yunsheng Lin wrote:
> >> On 2019/10/12 18:47, Greg KH wrote:
> >>> On Sat, Oct 12, 2019 at 12:40:01PM +0200, Greg KH wrote:
> >>>> On Sat, Oct 12, 2019 at 05:47:56PM +0800, Yunsheng Lin wrote:
> >>>>> On 2019/10/12 15:40, Greg KH wrote:
> >>>>>> On Sat, Oct 12, 2019 at 02:17:26PM +0800, Yunsheng Lin wrote:
> >>>>>>> add pci and acpi maintainer
> >>>>>>> cc linux-pci@vger.kernel.org and linux-acpi@vger.kernel.org
> >>>>>>>
> >>>>>>> On 2019/10/11 19:15, Peter Zijlstra wrote:
> >>>>>>>> On Fri, Oct 11, 2019 at 11:27:54AM +0800, Yunsheng Lin wrote:
> >>>>>>>>> But I failed to see why the above is related to making node_to_cpumask_map()
> >>>>>>>>> NUMA_NO_NODE aware?
> >>>>>>>>
> >>>>>>>> Your initial bug is for hns3, which is a PCI device, which really _MUST_
> >>>>>>>> have a node assigned.
> >>>>>>>>
> >>>>>>>> It not having one, is a straight up bug. We must not silently accept
> >>>>>>>> NO_NODE there, ever.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I suppose you mean reporting a lack of affinity when the node of a pcie
> >>>>>>> device is not set by "not silently accept NO_NODE".
> >>>>>>
> >>>>>> If the firmware of a pci device does not provide the node information,
> >>>>>> then yes, warn about that.
> >>>>>>
> >>>>>>> As Greg has asked about in [1]:
> >>>>>>> what is a user to do when the user sees the kernel reporting that?
> >>>>>>>
> >>>>>>> We may tell user to contact their vendor for info or updates about
> >>>>>>> that when they do not know about their system well enough, but their
> >>>>>>> vendor may get away with this by quoting ACPI spec as the spec
> >>>>>>> considering this optional. Should the user believe this is indeed a
> >>>>>>> fw bug or a misreport from the kernel?
> >>>>>>
> >>>>>> Say it is a firmware bug, if it is a firmware bug, that's simple.
> >>>>>>
> >>>>>>> If this kind of reporting is common pratice and will not cause any
> >>>>>>> misunderstanding, then maybe we can report that.
> >>>>>>
> >>>>>> Yes, please do so, that's the only way those boxes are ever going to get
> >>>>>> fixed.  And go add the test to the "firmware testing" tool that is based
> >>>>>> on Linux that Intel has somewhere, to give vendors a chance to fix this
> >>>>>> before they ship hardware.
> >>>>>>
> >>>>>> This shouldn't be a big deal, we warn of other hardware bugs all the
> >>>>>> time.
> >>>>>
> >>>>> Ok, thanks for clarifying.
> >>>>>
> >>>>> Will send a patch to catch the case when a pcie device without numa node
> >>>>> being set and warn about it.
> >>>>>
> >>>>> Maybe use dev->bus to verify if it is a pci device?
> >>>>
> >>>> No, do that in the pci bus core code itself, when creating the devices
> >>>> as that is when you know, or do not know, the numa node, right?
> >>>>
> >>>> This can't be in the driver core only, as each bus type will have a
> >>>> different way of determining what the node the device is on.  For some
> >>>> reason, I thought the PCI core code already does this, right?
> >>>
> >>> Yes, pci_irq_get_node(), which NO ONE CALLS!  I should go delete that
> >>> thing...
> >>>
> >>> Anyway, it looks like the pci core code does call set_dev_node() based
> >>> on the PCI bridge, so if that is set up properly, all should be fine.
> >>>
> >>> If not, well, you have buggy firmware and you need to warn about that at
> >>> the time you are creating the bridge.  Look at the call to
> >>> pcibus_to_node() in pci_register_host_bridge().
> >>
> >> Thanks for pointing out the specific function.
> >> Maybe we do not need to warn about the case when the device has a parent,
> >> because we must have warned about the parent if the device has a parent
> >> and the parent also has a node of NO_NODE, so do not need to warn the child
> >> device anymore? like blew:
> >>
> >> @@ -932,6 +932,10 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
> >>         list_add_tail(&bus->node, &pci_root_buses);
> >>         up_write(&pci_bus_sem);
> >>
> >> +       if (nr_node_ids > 1 && !parent &&
> > 
> > Why do you need to check this?  If you have a parent, it's your node
> > should be set, if not, that's an error, right?
> 
> If the device has parent and the parent device also has a node of
> NUMA_NO_NODE, then maybe we have warned about the parent device, so
> we do not have to warn about the child device?

But it's a PCI bridge, if it is not set properly, that needs to be fixed
otherwise the PCI devices attached to it have no hope of working
properly.

> In pci_register_host_bridge():
> 
> 	if (!parent)
> 		set_dev_node(bus->bridge, pcibus_to_node(bus));
> 
> The above only set the node of the bridge device to the node of bus if
> the bridge device does not have a parent.

Odd, what happens to devices behind another bridge today?  Are their
nodes set properly today?  Is the node supposed to be the same as the
parent bridge?

> >> +           dev_to_node(bus->bridge) == NUMA_NO_NODE)
> >> +               dev_err(bus->bridge, FW_BUG "No node assigned on NUMA capable HW. Please contact your vendor for updates.\n");
> >> +
> >>         return 0;
> > 
> > Who set that bus->bridge node to NUMA_NO_NODE?
> 
> It seems x86 and arm64 may have different implemention of
> pcibus_to_node():
> 
> For arm64:
> int pcibus_to_node(struct pci_bus *bus)
> {
> 	return dev_to_node(&bus->dev);
> }
> 
> And the node of bus is set in:
> int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
> {
> 	if (!acpi_disabled) {
> 		struct pci_config_window *cfg = bridge->bus->sysdata;
> 		struct acpi_device *adev = to_acpi_device(cfg->parent);
> 		struct device *bus_dev = &bridge->bus->dev;
> 
> 		ACPI_COMPANION_SET(&bridge->dev, adev);
> 		set_dev_node(bus_dev, acpi_get_node(acpi_device_handle(adev)));
> 	}
> 
> 	return 0;
> }
> 
> acpi_get_node() may return NUMA_NO_NODE in pcibios_root_bridge_prepare(),
> which will set the node of bus_dev to NUMA_NO_NODE
> 
> 
> x86:
> static inline int __pcibus_to_node(const struct pci_bus *bus)
> {
> 	const struct pci_sysdata *sd = bus->sysdata;
> 
> 	return sd->node;
> }
> 
> And the node of bus is set in pci_acpi_scan_root(), which uses
> pci_acpi_root_get_node() get the node of a bus. And it also may return
> NUMA_NO_NODE.

Fixing that will be good :)

> > If that is set, the firmware is broken, as you say, but you need to tell
> > the user what firmware is broken.
> 
> Maybe mentioning the BIOS in log?
> dev_err(bus->bridge, FW_BUG "No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.\n");

That's a good start.  Try running it on your machines (big and small)
and see what happens.

> > Try something like this out and see what happens on your machine that
> > had things "broken".  What does it say?
> 
> Does not have a older bios right now.
> But always returning NUMA_NO_NODE by below patch:
> 
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -484,6 +484,7 @@ int acpi_get_node(acpi_handle handle)
> 
>         pxm = acpi_get_pxm(handle);
> 
> -       return acpi_map_pxm_to_node(pxm);
> +       return -1;
> +       //return acpi_map_pxm_to_node(pxm);
> 
> it gives the blow warning in my machine:
> 
> [   16.126136]  pci0000:00: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
> [   17.733831]  pci0000:7b: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
> [   18.020924]  pci0000:7a: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
> [   18.552832]  pci0000:78: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
> [   19.514948]  pci0000:7c: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
> [   20.652990]  pci0000:74: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
> [   22.573200]  pci0000:80: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
> [   23.225355]  pci0000:bb: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
> [   23.514040]  pci0000:ba: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
> [   24.050107]  pci0000:b8: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
> [   25.017491]  pci0000:bc: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
> [   25.557974]  pci0000:b4: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.

And can you fix your bios?  If you can't then why are we going to warn
about this?

:)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-15 16:58                                     ` Greg KH
@ 2019-10-16 12:07                                       ` Yunsheng Lin
  0 siblings, 0 replies; 15+ messages in thread
From: Yunsheng Lin @ 2019-10-16 12:07 UTC (permalink / raw)
  To: Greg KH
  Cc: Peter Zijlstra, Michal Hocko, Robin Murphy, catalin.marinas,
	will, mingo, bp, rth, ink, mattst88, benh, paulus, mpe,
	heiko.carstens, gor, borntraeger, ysato, dalias, davem, ralf,
	paul.burton, jhogan, jiaxun.yang, chenhc, akpm, rppt,
	anshuman.khandual, tglx, cai, linux-arm-kernel, linux-kernel,
	hpa, x86, dave.hansen, luto, len.brown, axboe, dledford,
	jeffrey.t.kirsher, linux-alpha, naveen.n.rao, mwb, linuxppc-dev,
	linux-s390, linux-sh, sparclinux, tbogendoerfer, linux-mips,
	rafael, bhelgaas, linux-pci, Rafael J. Wysocki, lenb, linux-acpi

On 2019/10/16 0:58, Greg KH wrote:
> On Tue, Oct 15, 2019 at 06:40:29PM +0800, Yunsheng Lin wrote:
>> On 2019/10/14 17:25, Greg KH wrote:
>>> On Mon, Oct 14, 2019 at 04:00:46PM +0800, Yunsheng Lin wrote:
>>>> On 2019/10/12 18:47, Greg KH wrote:
>>>>> On Sat, Oct 12, 2019 at 12:40:01PM +0200, Greg KH wrote:
>>>>>> On Sat, Oct 12, 2019 at 05:47:56PM +0800, Yunsheng Lin wrote:
>>>>>>> On 2019/10/12 15:40, Greg KH wrote:
>>>>>>>> On Sat, Oct 12, 2019 at 02:17:26PM +0800, Yunsheng Lin wrote:
>>>>>>>>> add pci and acpi maintainer
>>>>>>>>> cc linux-pci@vger.kernel.org and linux-acpi@vger.kernel.org
>>>>>>>>>
>>>>>>>>> On 2019/10/11 19:15, Peter Zijlstra wrote:
>>>>>>>>>> On Fri, Oct 11, 2019 at 11:27:54AM +0800, Yunsheng Lin wrote:
>>>>>>>>>>> But I failed to see why the above is related to making node_to_cpumask_map()
>>>>>>>>>>> NUMA_NO_NODE aware?
>>>>>>>>>>
>>>>>>>>>> Your initial bug is for hns3, which is a PCI device, which really _MUST_
>>>>>>>>>> have a node assigned.
>>>>>>>>>>
>>>>>>>>>> It not having one, is a straight up bug. We must not silently accept
>>>>>>>>>> NO_NODE there, ever.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I suppose you mean reporting a lack of affinity when the node of a pcie
>>>>>>>>> device is not set by "not silently accept NO_NODE".
>>>>>>>>
>>>>>>>> If the firmware of a pci device does not provide the node information,
>>>>>>>> then yes, warn about that.
>>>>>>>>
>>>>>>>>> As Greg has asked about in [1]:
>>>>>>>>> what is a user to do when the user sees the kernel reporting that?
>>>>>>>>>
>>>>>>>>> We may tell user to contact their vendor for info or updates about
>>>>>>>>> that when they do not know about their system well enough, but their
>>>>>>>>> vendor may get away with this by quoting ACPI spec as the spec
>>>>>>>>> considering this optional. Should the user believe this is indeed a
>>>>>>>>> fw bug or a misreport from the kernel?
>>>>>>>>
>>>>>>>> Say it is a firmware bug, if it is a firmware bug, that's simple.
>>>>>>>>
>>>>>>>>> If this kind of reporting is common pratice and will not cause any
>>>>>>>>> misunderstanding, then maybe we can report that.
>>>>>>>>
>>>>>>>> Yes, please do so, that's the only way those boxes are ever going to get
>>>>>>>> fixed.  And go add the test to the "firmware testing" tool that is based
>>>>>>>> on Linux that Intel has somewhere, to give vendors a chance to fix this
>>>>>>>> before they ship hardware.
>>>>>>>>
>>>>>>>> This shouldn't be a big deal, we warn of other hardware bugs all the
>>>>>>>> time.
>>>>>>>
>>>>>>> Ok, thanks for clarifying.
>>>>>>>
>>>>>>> Will send a patch to catch the case when a pcie device without numa node
>>>>>>> being set and warn about it.
>>>>>>>
>>>>>>> Maybe use dev->bus to verify if it is a pci device?
>>>>>>
>>>>>> No, do that in the pci bus core code itself, when creating the devices
>>>>>> as that is when you know, or do not know, the numa node, right?
>>>>>>
>>>>>> This can't be in the driver core only, as each bus type will have a
>>>>>> different way of determining what the node the device is on.  For some
>>>>>> reason, I thought the PCI core code already does this, right?
>>>>>
>>>>> Yes, pci_irq_get_node(), which NO ONE CALLS!  I should go delete that
>>>>> thing...
>>>>>
>>>>> Anyway, it looks like the pci core code does call set_dev_node() based
>>>>> on the PCI bridge, so if that is set up properly, all should be fine.
>>>>>
>>>>> If not, well, you have buggy firmware and you need to warn about that at
>>>>> the time you are creating the bridge.  Look at the call to
>>>>> pcibus_to_node() in pci_register_host_bridge().
>>>>
>>>> Thanks for pointing out the specific function.
>>>> Maybe we do not need to warn about the case when the device has a parent,
>>>> because we must have warned about the parent if the device has a parent
>>>> and the parent also has a node of NO_NODE, so do not need to warn the child
>>>> device anymore? like blew:
>>>>
>>>> @@ -932,6 +932,10 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
>>>>         list_add_tail(&bus->node, &pci_root_buses);
>>>>         up_write(&pci_bus_sem);
>>>>
>>>> +       if (nr_node_ids > 1 && !parent &&
>>>
>>> Why do you need to check this?  If you have a parent, it's your node
>>> should be set, if not, that's an error, right?
>>
>> If the device has parent and the parent device also has a node of
>> NUMA_NO_NODE, then maybe we have warned about the parent device, so
>> we do not have to warn about the child device?
> 
> But it's a PCI bridge, if it is not set properly, that needs to be fixed
> otherwise the PCI devices attached to it have no hope of working
> properly.

You may be right, thanks.

If it's a root PCI bridge and it does have a parent device, but
the parent device is not a pcie device and it's node is NUMA_NO_NODE,
then we will miss warning about this case.

> 
>> In pci_register_host_bridge():
>>
>> 	if (!parent)
>> 		set_dev_node(bus->bridge, pcibus_to_node(bus));
>>
>> The above only set the node of the bridge device to the node of bus if
>> the bridge device does not have a parent.
> 
> Odd, what happens to devices behind another bridge today?  Are their
> nodes set properly today?  Is the node supposed to be the same as the
> parent bridge?

It seems only the root bridge is added in pci_register_host_bridge(),
and other bridges under the root bridge is added in pci_alloc_child_bus().

And in pci_alloc_child_bus(), the child bus device is setup with proper
parent, so the pcie device under the child bus should have the same node
as the parent bridge when device_add() is called, which will set the node
to its parent's node when the child device' node is NUMA_NO_NODE.

Do not have a system with multi bridges at hand to debug it, so I may
be wrong about above.

> 
>>>> +           dev_to_node(bus->bridge) == NUMA_NO_NODE)
>>>> +               dev_err(bus->bridge, FW_BUG "No node assigned on NUMA capable HW. Please contact your vendor for updates.\n");
>>>> +
>>>>         return 0;
>>>
>>> Who set that bus->bridge node to NUMA_NO_NODE?
>>
>> It seems x86 and arm64 may have different implemention of
>> pcibus_to_node():
>>
>> For arm64:
>> int pcibus_to_node(struct pci_bus *bus)
>> {
>> 	return dev_to_node(&bus->dev);
>> }
>>
>> And the node of bus is set in:
>> int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
>> {
>> 	if (!acpi_disabled) {
>> 		struct pci_config_window *cfg = bridge->bus->sysdata;
>> 		struct acpi_device *adev = to_acpi_device(cfg->parent);
>> 		struct device *bus_dev = &bridge->bus->dev;
>>
>> 		ACPI_COMPANION_SET(&bridge->dev, adev);
>> 		set_dev_node(bus_dev, acpi_get_node(acpi_device_handle(adev)));
>> 	}
>>
>> 	return 0;
>> }
>>
>> acpi_get_node() may return NUMA_NO_NODE in pcibios_root_bridge_prepare(),
>> which will set the node of bus_dev to NUMA_NO_NODE
>>
>>
>> x86:
>> static inline int __pcibus_to_node(const struct pci_bus *bus)
>> {
>> 	const struct pci_sysdata *sd = bus->sysdata;
>>
>> 	return sd->node;
>> }
>>
>> And the node of bus is set in pci_acpi_scan_root(), which uses
>> pci_acpi_root_get_node() get the node of a bus. And it also may return
>> NUMA_NO_NODE.
> 
> Fixing that will be good :)>
>>> If that is set, the firmware is broken, as you say, but you need to tell
>>> the user what firmware is broken.
>>
>> Maybe mentioning the BIOS in log?
>> dev_err(bus->bridge, FW_BUG "No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.\n");
> 
> That's a good start.  Try running it on your machines (big and small)
> and see what happens.

There is no fw bug log output as above when using newer bios(
which has provided the device node through ACPI) in my machine.

> 
>>> Try something like this out and see what happens on your machine that
>>> had things "broken".  What does it say?
>>
>> Does not have a older bios right now.
>> But always returning NUMA_NO_NODE by below patch:
>>
>> --- a/drivers/acpi/numa.c
>> +++ b/drivers/acpi/numa.c
>> @@ -484,6 +484,7 @@ int acpi_get_node(acpi_handle handle)
>>
>>         pxm = acpi_get_pxm(handle);
>>
>> -       return acpi_map_pxm_to_node(pxm);
>> +       return -1;
>> +       //return acpi_map_pxm_to_node(pxm);
>>
>> it gives the blow warning in my machine:
>>
>> [   16.126136]  pci0000:00: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
>> [   17.733831]  pci0000:7b: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
>> [   18.020924]  pci0000:7a: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
>> [   18.552832]  pci0000:78: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
>> [   19.514948]  pci0000:7c: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
>> [   20.652990]  pci0000:74: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
>> [   22.573200]  pci0000:80: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
>> [   23.225355]  pci0000:bb: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
>> [   23.514040]  pci0000:ba: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
>> [   24.050107]  pci0000:b8: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
>> [   25.017491]  pci0000:bc: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
>> [   25.557974]  pci0000:b4: [Firmware Bug]: No node assigned on NUMA capable HW by BIOS. Please contact your vendor for updates.
> 
> And can you fix your bios?  If you can't then why are we going to warn
> about this?

Yes, our new bios has fixed that.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-12  7:40                       ` Greg KH
  2019-10-12  9:47                         ` Yunsheng Lin
@ 2019-10-28  9:20                         ` Yunsheng Lin
  2019-10-29  8:53                           ` Michal Hocko
  1 sibling, 1 reply; 15+ messages in thread
From: Yunsheng Lin @ 2019-10-28  9:20 UTC (permalink / raw)
  To: Greg KH, Peter Zijlstra, Michal Hocko
  Cc: Robin Murphy, catalin.marinas, will, mingo, bp, rth, ink,
	mattst88, benh, paulus, mpe, heiko.carstens, gor, borntraeger,
	ysato, dalias, davem, ralf, paul.burton, jhogan, jiaxun.yang,
	chenhc, akpm, rppt, anshuman.khandual, tglx, cai,
	linux-arm-kernel, linux-kernel, hpa, x86, dave.hansen, luto,
	len.brown, axboe, dledford, jeffrey.t.kirsher, linux-alpha,
	naveen.n.rao, mwb, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, tbogendoerfer, linux-mips, rafael, bhelgaas,
	linux-pci, Rafael J. Wysocki, lenb, linux-acpi

On 2019/10/12 15:40, Greg KH wrote:
> On Sat, Oct 12, 2019 at 02:17:26PM +0800, Yunsheng Lin wrote:
>> add pci and acpi maintainer
>> cc linux-pci@vger.kernel.org and linux-acpi@vger.kernel.org
>>
>> On 2019/10/11 19:15, Peter Zijlstra wrote:
>>> On Fri, Oct 11, 2019 at 11:27:54AM +0800, Yunsheng Lin wrote:
>>>> But I failed to see why the above is related to making node_to_cpumask_map()
>>>> NUMA_NO_NODE aware?
>>>
>>> Your initial bug is for hns3, which is a PCI device, which really _MUST_
>>> have a node assigned.
>>>
>>> It not having one, is a straight up bug. We must not silently accept
>>> NO_NODE there, ever.
>>>
>>
>> I suppose you mean reporting a lack of affinity when the node of a pcie
>> device is not set by "not silently accept NO_NODE".
> 
> If the firmware of a pci device does not provide the node information,
> then yes, warn about that.
> 
>> As Greg has asked about in [1]:
>> what is a user to do when the user sees the kernel reporting that?
>>
>> We may tell user to contact their vendor for info or updates about
>> that when they do not know about their system well enough, but their
>> vendor may get away with this by quoting ACPI spec as the spec
>> considering this optional. Should the user believe this is indeed a
>> fw bug or a misreport from the kernel?
> 
> Say it is a firmware bug, if it is a firmware bug, that's simple.
> 
>> If this kind of reporting is common pratice and will not cause any
>> misunderstanding, then maybe we can report that.
> 
> Yes, please do so, that's the only way those boxes are ever going to get
> fixed.  And go add the test to the "firmware testing" tool that is based
> on Linux that Intel has somewhere, to give vendors a chance to fix this
> before they ship hardware.
> 
> This shouldn't be a big deal, we warn of other hardware bugs all the
> time.

Hi, all.

The warning for the above case has been added in [1].

So maybe it makes sense to make node_to_cpumask_map() NUMA_NO_NODE aware
now?

If Yes, this patch still can be applied to the latest linus' tree cleanly,
Do I need to resend it?


[1] https://lore.kernel.org/linux-pci/1571467543-26125-1-git-send-email-linyunsheng@huawei.com/
> 
> thanks,
> 
> greg k-h
> 
> .
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-28  9:20                         ` Yunsheng Lin
@ 2019-10-29  8:53                           ` Michal Hocko
  2019-10-30  1:58                             ` Yunsheng Lin
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2019-10-29  8:53 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Greg KH, Peter Zijlstra, Robin Murphy, catalin.marinas, will,
	mingo, bp, rth, ink, mattst88, benh, paulus, mpe, heiko.carstens,
	gor, borntraeger, ysato, dalias, davem, ralf, paul.burton,
	jhogan, jiaxun.yang, chenhc, akpm, rppt, anshuman.khandual, tglx,
	cai, linux-arm-kernel, linux-kernel, hpa, x86, dave.hansen, luto,
	len.brown, axboe, dledford, jeffrey.t.kirsher, linux-alpha,
	naveen.n.rao, mwb, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, tbogendoerfer, linux-mips, rafael, bhelgaas,
	linux-pci, Rafael J. Wysocki, lenb, linux-acpi

On Mon 28-10-19 17:20:33, Yunsheng Lin wrote:
> On 2019/10/12 15:40, Greg KH wrote:
> > On Sat, Oct 12, 2019 at 02:17:26PM +0800, Yunsheng Lin wrote:
> >> add pci and acpi maintainer
> >> cc linux-pci@vger.kernel.org and linux-acpi@vger.kernel.org
> >>
> >> On 2019/10/11 19:15, Peter Zijlstra wrote:
> >>> On Fri, Oct 11, 2019 at 11:27:54AM +0800, Yunsheng Lin wrote:
> >>>> But I failed to see why the above is related to making node_to_cpumask_map()
> >>>> NUMA_NO_NODE aware?
> >>>
> >>> Your initial bug is for hns3, which is a PCI device, which really _MUST_
> >>> have a node assigned.
> >>>
> >>> It not having one, is a straight up bug. We must not silently accept
> >>> NO_NODE there, ever.
> >>>
> >>
> >> I suppose you mean reporting a lack of affinity when the node of a pcie
> >> device is not set by "not silently accept NO_NODE".
> > 
> > If the firmware of a pci device does not provide the node information,
> > then yes, warn about that.
> > 
> >> As Greg has asked about in [1]:
> >> what is a user to do when the user sees the kernel reporting that?
> >>
> >> We may tell user to contact their vendor for info or updates about
> >> that when they do not know about their system well enough, but their
> >> vendor may get away with this by quoting ACPI spec as the spec
> >> considering this optional. Should the user believe this is indeed a
> >> fw bug or a misreport from the kernel?
> > 
> > Say it is a firmware bug, if it is a firmware bug, that's simple.
> > 
> >> If this kind of reporting is common pratice and will not cause any
> >> misunderstanding, then maybe we can report that.
> > 
> > Yes, please do so, that's the only way those boxes are ever going to get
> > fixed.  And go add the test to the "firmware testing" tool that is based
> > on Linux that Intel has somewhere, to give vendors a chance to fix this
> > before they ship hardware.
> > 
> > This shouldn't be a big deal, we warn of other hardware bugs all the
> > time.
> 
> Hi, all.
> 
> The warning for the above case has been added in [1].
> 
> So maybe it makes sense to make node_to_cpumask_map() NUMA_NO_NODE aware
> now?
> 
> If Yes, this patch still can be applied to the latest linus' tree cleanly,
> Do I need to resend it?
> 

By this patch you mean http://lkml.kernel.org/r/1568724534-146242-1-git-send-email-linyunsheng@huawei.com
right?

I would just resend it unless there is still a clear disagreement over
it.

> [1] https://lore.kernel.org/linux-pci/1571467543-26125-1-git-send-email-linyunsheng@huawei.com/

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware
  2019-10-29  8:53                           ` Michal Hocko
@ 2019-10-30  1:58                             ` Yunsheng Lin
  0 siblings, 0 replies; 15+ messages in thread
From: Yunsheng Lin @ 2019-10-30  1:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Greg KH, Peter Zijlstra, Robin Murphy, catalin.marinas, will,
	mingo, bp, rth, ink, mattst88, benh, paulus, mpe, heiko.carstens,
	gor, borntraeger, ysato, dalias, davem, ralf, paul.burton,
	jhogan, jiaxun.yang, chenhc, akpm, rppt, anshuman.khandual, tglx,
	cai, linux-arm-kernel, linux-kernel, hpa, x86, dave.hansen, luto,
	len.brown, axboe, dledford, jeffrey.t.kirsher, linux-alpha,
	naveen.n.rao, mwb, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, tbogendoerfer, linux-mips, rafael, bhelgaas,
	linux-pci, Rafael J. Wysocki, lenb, linux-acpi

On 2019/10/29 16:53, Michal Hocko wrote:
> On Mon 28-10-19 17:20:33, Yunsheng Lin wrote:
>> On 2019/10/12 15:40, Greg KH wrote:
>>> On Sat, Oct 12, 2019 at 02:17:26PM +0800, Yunsheng Lin wrote:
>>>> add pci and acpi maintainer
>>>> cc linux-pci@vger.kernel.org and linux-acpi@vger.kernel.org
>>>>
>>>> On 2019/10/11 19:15, Peter Zijlstra wrote:
>>>>> On Fri, Oct 11, 2019 at 11:27:54AM +0800, Yunsheng Lin wrote:
>>>>>> But I failed to see why the above is related to making node_to_cpumask_map()
>>>>>> NUMA_NO_NODE aware?
>>>>>
>>>>> Your initial bug is for hns3, which is a PCI device, which really _MUST_
>>>>> have a node assigned.
>>>>>
>>>>> It not having one, is a straight up bug. We must not silently accept
>>>>> NO_NODE there, ever.
>>>>>
>>>>
>>>> I suppose you mean reporting a lack of affinity when the node of a pcie
>>>> device is not set by "not silently accept NO_NODE".
>>>
>>> If the firmware of a pci device does not provide the node information,
>>> then yes, warn about that.
>>>
>>>> As Greg has asked about in [1]:
>>>> what is a user to do when the user sees the kernel reporting that?
>>>>
>>>> We may tell user to contact their vendor for info or updates about
>>>> that when they do not know about their system well enough, but their
>>>> vendor may get away with this by quoting ACPI spec as the spec
>>>> considering this optional. Should the user believe this is indeed a
>>>> fw bug or a misreport from the kernel?
>>>
>>> Say it is a firmware bug, if it is a firmware bug, that's simple.
>>>
>>>> If this kind of reporting is common pratice and will not cause any
>>>> misunderstanding, then maybe we can report that.
>>>
>>> Yes, please do so, that's the only way those boxes are ever going to get
>>> fixed.  And go add the test to the "firmware testing" tool that is based
>>> on Linux that Intel has somewhere, to give vendors a chance to fix this
>>> before they ship hardware.
>>>
>>> This shouldn't be a big deal, we warn of other hardware bugs all the
>>> time.
>>
>> Hi, all.
>>
>> The warning for the above case has been added in [1].
>>
>> So maybe it makes sense to make node_to_cpumask_map() NUMA_NO_NODE aware
>> now?
>>
>> If Yes, this patch still can be applied to the latest linus' tree cleanly,
>> Do I need to resend it?
>>
> 
> By this patch you mean http://lkml.kernel.org/r/1568724534-146242-1-git-send-email-linyunsheng@huawei.com
> right?

Yes.

> 
> I would just resend it unless there is still a clear disagreement over
> it.

Ok, thanks.

Will resend it to see if there is still a disagreement over it.

> 
>> [1] https://lore.kernel.org/linux-pci/1571467543-26125-1-git-send-email-linyunsheng@huawei.com/
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, back to index

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190924124325.GQ2349@hirez.programming.kicks-ass.net>
     [not found] ` <20190924125936.GR2349@hirez.programming.kicks-ass.net>
     [not found]   ` <20190924131939.GS23050@dhcp22.suse.cz>
     [not found]     ` <1adcbe68-6753-3497-48a0-cc84ac503372@huawei.com>
     [not found]       ` <20190925104108.GE4553@hirez.programming.kicks-ass.net>
     [not found]         ` <47fa4cee-8528-7c23-c7de-7be1b65aa2ae@huawei.com>
     [not found]           ` <bec80499-86d9-bf1f-df23-9044a8099992@arm.com>
     [not found]             ` <a5f0fc80-8e88-b781-77ce-1213e5d62125@huawei.com>
     [not found]               ` <20191010073212.GB18412@dhcp22.suse.cz>
     [not found]                 ` <6cc94f9b-0d79-93a8-5ec2-4f6c21639268@huawei.com>
     [not found]                   ` <20191011111539.GX2311@hirez.programming.kicks-ass.net>
2019-10-12  6:17                     ` [PATCH v6] numa: make node_to_cpumask_map() NUMA_NO_NODE aware Yunsheng Lin
2019-10-12  7:40                       ` Greg KH
2019-10-12  9:47                         ` Yunsheng Lin
2019-10-12 10:40                           ` Greg KH
2019-10-12 10:47                             ` Greg KH
2019-10-14  8:00                               ` Yunsheng Lin
2019-10-14  9:25                                 ` Greg KH
2019-10-14  9:49                                   ` Peter Zijlstra
2019-10-14 10:04                                     ` Greg KH
2019-10-15 10:40                                   ` Yunsheng Lin
2019-10-15 16:58                                     ` Greg KH
2019-10-16 12:07                                       ` Yunsheng Lin
2019-10-28  9:20                         ` Yunsheng Lin
2019-10-29  8:53                           ` Michal Hocko
2019-10-30  1:58                             ` Yunsheng Lin

Linux-ACPI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-acpi/0 linux-acpi/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-acpi linux-acpi/ https://lore.kernel.org/linux-acpi \
		linux-acpi@vger.kernel.org
	public-inbox-index linux-acpi

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-acpi


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git