linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] arm64/numa: Add more vetting in numa_set_distance()
@ 2018-10-26 13:57 John Garry
  2018-10-29 11:25 ` Will Deacon
  0 siblings, 1 reply; 13+ messages in thread
From: John Garry @ 2018-10-26 13:57 UTC (permalink / raw)
  To: catalin.marinas, will.deacon
  Cc: linux-arm-kernel, linux-kernel, linuxarm, John Garry

Currently it is acceptable to set the distance between 2 separate nodes to
LOCAL_DISTANCE.

Reject this as it is invalid.

This change avoids a crash reported in [1].

[1] https://www.spinics.net/lists/arm-kernel/msg683304.html

Signed-off-by: John Garry <john.garry@huawei.com>

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 146c04c..6092e3d 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int distance)
 	}
 
 	if ((u8)distance != distance ||
-	    (from == to && distance != LOCAL_DISTANCE)) {
+	    (from == to && distance != LOCAL_DISTANCE) ||
+	    (from != to && distance == LOCAL_DISTANCE)) {
 		pr_warn_once("Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
 			     from, to, distance);
 		return;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()
  2018-10-26 13:57 [PATCH] arm64/numa: Add more vetting in numa_set_distance() John Garry
@ 2018-10-29 11:25 ` Will Deacon
  2018-10-29 12:14   ` John Garry
  0 siblings, 1 reply; 13+ messages in thread
From: Will Deacon @ 2018-10-29 11:25 UTC (permalink / raw)
  To: John Garry; +Cc: catalin.marinas, linux-arm-kernel, linux-kernel, linuxarm

Hi John,

On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
> Currently it is acceptable to set the distance between 2 separate nodes to
> LOCAL_DISTANCE.
> 
> Reject this as it is invalid.
> 
> This change avoids a crash reported in [1].
> 
> [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> 
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index 146c04c..6092e3d 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int distance)
>  	}
>  
>  	if ((u8)distance != distance ||
> -	    (from == to && distance != LOCAL_DISTANCE)) {
> +	    (from == to && distance != LOCAL_DISTANCE) ||
> +	    (from != to && distance == LOCAL_DISTANCE)) {

The current code here is more-or-less lifted from the x86 implementation
of numa_set_distance(). I think we should either factor out the sanity check
into a core helper or make the core code robust to these funny configurations.

Will

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()
  2018-10-29 11:25 ` Will Deacon
@ 2018-10-29 12:14   ` John Garry
  2018-10-29 12:16     ` Will Deacon
  0 siblings, 1 reply; 13+ messages in thread
From: John Garry @ 2018-10-29 12:14 UTC (permalink / raw)
  To: Will Deacon; +Cc: catalin.marinas, linux-kernel, linux-arm-kernel, linuxarm

On 29/10/2018 11:25, Will Deacon wrote:
> Hi John,
>

Hi Will,

> On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
>> Currently it is acceptable to set the distance between 2 separate nodes to
>> LOCAL_DISTANCE.
>>
>> Reject this as it is invalid.
>>
>> This change avoids a crash reported in [1].
>>
>> [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
>>
>> Signed-off-by: John Garry <john.garry@huawei.com>
>>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index 146c04c..6092e3d 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int distance)
>>  	}
>>
>>  	if ((u8)distance != distance ||
>> -	    (from == to && distance != LOCAL_DISTANCE)) {
>> +	    (from == to && distance != LOCAL_DISTANCE) ||
>> +	    (from != to && distance == LOCAL_DISTANCE)) {
>
> The current code here is more-or-less lifted from the x86 implementation
> of numa_set_distance().

Right, I did notice this. I didn't think that x86 folks would be so 
concerned since they generally only use ACPI, and the ACPI code already 
validates these distances in drivers/acpi/numa.c: slit_valid() [unlike 
OF code].

  I think we should either factor out the sanity check
> into a core helper or make the core code robust to these funny configurations.

OK, so to me it would make sense to factor out a sanity check into a 
core helper.

Cheers,
John

>
> Will
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()
  2018-10-29 12:14   ` John Garry
@ 2018-10-29 12:16     ` Will Deacon
  2018-10-29 12:32       ` John Garry
  0 siblings, 1 reply; 13+ messages in thread
From: Will Deacon @ 2018-10-29 12:16 UTC (permalink / raw)
  To: John Garry; +Cc: catalin.marinas, linux-kernel, linux-arm-kernel, linuxarm

On Mon, Oct 29, 2018 at 12:14:09PM +0000, John Garry wrote:
> On 29/10/2018 11:25, Will Deacon wrote:
> >On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
> >>Currently it is acceptable to set the distance between 2 separate nodes to
> >>LOCAL_DISTANCE.
> >>
> >>Reject this as it is invalid.
> >>
> >>This change avoids a crash reported in [1].
> >>
> >>[1] https://www.spinics.net/lists/arm-kernel/msg683304.html
> >>
> >>Signed-off-by: John Garry <john.garry@huawei.com>
> >>
> >>diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> >>index 146c04c..6092e3d 100644
> >>--- a/arch/arm64/mm/numa.c
> >>+++ b/arch/arm64/mm/numa.c
> >>@@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int distance)
> >> 	}
> >>
> >> 	if ((u8)distance != distance ||
> >>-	    (from == to && distance != LOCAL_DISTANCE)) {
> >>+	    (from == to && distance != LOCAL_DISTANCE) ||
> >>+	    (from != to && distance == LOCAL_DISTANCE)) {
> >
> >The current code here is more-or-less lifted from the x86 implementation
> >of numa_set_distance().
> 
> Right, I did notice this. I didn't think that x86 folks would be so
> concerned since they generally only use ACPI, and the ACPI code already
> validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
> code].
> 
>  I think we should either factor out the sanity check
> >into a core helper or make the core code robust to these funny configurations.
> 
> OK, so to me it would make sense to factor out a sanity check into a core
> helper.

That, or have the OF code perform the same validation that slit_valid() is
doing for ACPI. I'm just trying to avoid other architectures running into
this problem down the line.

Will

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()
  2018-10-29 12:16     ` Will Deacon
@ 2018-10-29 12:32       ` John Garry
  2018-10-29 12:45         ` Anshuman Khandual
  0 siblings, 1 reply; 13+ messages in thread
From: John Garry @ 2018-10-29 12:32 UTC (permalink / raw)
  To: Will Deacon; +Cc: catalin.marinas, linux-kernel, linux-arm-kernel, linuxarm

On 29/10/2018 12:16, Will Deacon wrote:
> On Mon, Oct 29, 2018 at 12:14:09PM +0000, John Garry wrote:
>> On 29/10/2018 11:25, Will Deacon wrote:
>>> On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
>>>> Currently it is acceptable to set the distance between 2 separate nodes to
>>>> LOCAL_DISTANCE.
>>>>
>>>> Reject this as it is invalid.
>>>>
>>>> This change avoids a crash reported in [1].
>>>>
>>>> [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
>>>>
>>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>>>
>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>>> index 146c04c..6092e3d 100644
>>>> --- a/arch/arm64/mm/numa.c
>>>> +++ b/arch/arm64/mm/numa.c
>>>> @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int distance)
>>>> 	}
>>>>
>>>> 	if ((u8)distance != distance ||
>>>> -	    (from == to && distance != LOCAL_DISTANCE)) {
>>>> +	    (from == to && distance != LOCAL_DISTANCE) ||
>>>> +	    (from != to && distance == LOCAL_DISTANCE)) {
>>>
>>> The current code here is more-or-less lifted from the x86 implementation
>>> of numa_set_distance().
>>
>> Right, I did notice this. I didn't think that x86 folks would be so
>> concerned since they generally only use ACPI, and the ACPI code already
>> validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
>> code].
>>
>>  I think we should either factor out the sanity check
>>> into a core helper or make the core code robust to these funny configurations.
>>
>> OK, so to me it would make sense to factor out a sanity check into a core
>> helper.
>
> That, or have the OF code perform the same validation that slit_valid() is
> doing for ACPI. I'm just trying to avoid other architectures running into
> this problem down the line.
>

Right, OF code should do this validation job if ACPI is doing it 
(especially since the DT bindings actually specify the distance rules), 
and not rely on the arch NUMA code to accept/reject numa_set_distance() 
combinations.

And, in addition to this, I'd say OF should disable NUMA if given an 
invalid table (like ACPI does).

Cheers,
John

> Will
>
> .
>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()
  2018-10-29 12:32       ` John Garry
@ 2018-10-29 12:45         ` Anshuman Khandual
  2018-10-29 14:44           ` John Garry
  2018-10-29 14:48           ` Will Deacon
  0 siblings, 2 replies; 13+ messages in thread
From: Anshuman Khandual @ 2018-10-29 12:45 UTC (permalink / raw)
  To: John Garry, Will Deacon
  Cc: catalin.marinas, linux-kernel, linux-arm-kernel, linuxarm



On 10/29/2018 06:02 PM, John Garry wrote:
> On 29/10/2018 12:16, Will Deacon wrote:
>> On Mon, Oct 29, 2018 at 12:14:09PM +0000, John Garry wrote:
>>> On 29/10/2018 11:25, Will Deacon wrote:
>>>> On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
>>>>> Currently it is acceptable to set the distance between 2 separate nodes to
>>>>> LOCAL_DISTANCE.
>>>>>
>>>>> Reject this as it is invalid.
>>>>>
>>>>> This change avoids a crash reported in [1].
>>>>>
>>>>> [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
>>>>>
>>>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>>>>
>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>>>> index 146c04c..6092e3d 100644
>>>>> --- a/arch/arm64/mm/numa.c
>>>>> +++ b/arch/arm64/mm/numa.c
>>>>> @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int distance)
>>>>>     }
>>>>>
>>>>>     if ((u8)distance != distance ||
>>>>> -        (from == to && distance != LOCAL_DISTANCE)) {
>>>>> +        (from == to && distance != LOCAL_DISTANCE) ||
>>>>> +        (from != to && distance == LOCAL_DISTANCE)) {
>>>>
>>>> The current code here is more-or-less lifted from the x86 implementation
>>>> of numa_set_distance().
>>>
>>> Right, I did notice this. I didn't think that x86 folks would be so
>>> concerned since they generally only use ACPI, and the ACPI code already
>>> validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
>>> code].
>>>
>>>  I think we should either factor out the sanity check
>>>> into a core helper or make the core code robust to these funny configurations.
>>>
>>> OK, so to me it would make sense to factor out a sanity check into a core
>>> helper.
>>
>> That, or have the OF code perform the same validation that slit_valid() is
>> doing for ACPI. I'm just trying to avoid other architectures running into
>> this problem down the line.
>>
> 
> Right, OF code should do this validation job if ACPI is doing it (especially since the DT bindings actually specify the distance rules), and not rely on the arch NUMA code to accept/reject numa_set_distance() combinations.

I would say this particular condition checking still falls under arch NUMA init
code sanity check like other basic tests what numa_set_distance() currently does
already but it should not be a necessity for the OF driver to check these. It can
choose to check but arch NUMA should check basic things like two different NUMA
nodes should not have LOCAL_DISTANCE as distance like in this case.

	(from == to && distance != LOCAL_DISTANCE) ||
		(from != to && distance == LOCAL_DISTANCE))


> 
> And, in addition to this, I'd say OF should disable NUMA if given an invalid table (like ACPI does).

Taking a decision to disable NUMA should be with kernel (arch NUMA) once kernel
starts booting. Platform should have sent right values, OF driver trying to
adjust stuff what platform has sent with FDT once the kernel starts booting is
not right. For example "Kernel NUMA wont like the distance factors lets clean
then up before passing on to MM". Disabling NUMA is one such major decision which
should be with arch NUMA code not with OF driver.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()
  2018-10-29 12:45         ` Anshuman Khandual
@ 2018-10-29 14:44           ` John Garry
  2018-10-30  2:46             ` Anshuman Khandual
  2018-10-29 14:48           ` Will Deacon
  1 sibling, 1 reply; 13+ messages in thread
From: John Garry @ 2018-10-29 14:44 UTC (permalink / raw)
  To: Anshuman Khandual, Will Deacon
  Cc: catalin.marinas, linux-kernel, linux-arm-kernel, linuxarm

>>>>
>>>>  I think we should either factor out the sanity check
>>>>> into a core helper or make the core code robust to these funny configurations.
>>>>
>>>> OK, so to me it would make sense to factor out a sanity check into a core
>>>> helper.
>>>
>>> That, or have the OF code perform the same validation that slit_valid() is
>>> doing for ACPI. I'm just trying to avoid other architectures running into
>>> this problem down the line.
>>>
>>
>> Right, OF code should do this validation job if ACPI is doing it (especially since the DT bindings actually specify the distance rules), and not rely on the arch NUMA code to accept/reject numa_set_distance() combinations.
>
> I would say this particular condition checking still falls under arch NUMA init
> code sanity check like other basic tests what numa_set_distance() currently does
> already but it should not be a necessity for the OF driver to check these.

The checks in the arch NUMA code mean that invalid inter-node distance 
combinations are ignored.

However, if any entries in the table are invalid, then the whole table 
can be discarded as none of it can be believed, i.e. it's better to 
validate the table.

It can
> choose to check but arch NUMA should check basic things like two different NUMA
> nodes should not have LOCAL_DISTANCE as distance like in this case.
>
> 	(from == to && distance != LOCAL_DISTANCE) ||
> 		(from != to && distance == LOCAL_DISTANCE))
>
>
>>
>> And, in addition to this, I'd say OF should disable NUMA if given an invalid table (like ACPI does).
>
> Taking a decision to disable NUMA should be with kernel (arch NUMA) once kernel
> starts booting. Platform should have sent right values, OF driver trying to
> adjust stuff what platform has sent with FDT once the kernel starts booting is
> not right. For example "Kernel NUMA wont like the distance factors lets clean
> then up before passing on to MM".

Sorry, but I don't know who was advocating this.

Disabling NUMA is one such major decision which
> should be with arch NUMA code not with OF driver.

I meant parsing the table would fail, so arch NUMA would fall back on 
dummy NUMA.

>

Thanks,
John



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()
  2018-10-29 12:45         ` Anshuman Khandual
  2018-10-29 14:44           ` John Garry
@ 2018-10-29 14:48           ` Will Deacon
  2018-10-30  3:00             ` Anshuman Khandual
  1 sibling, 1 reply; 13+ messages in thread
From: Will Deacon @ 2018-10-29 14:48 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: John Garry, catalin.marinas, linux-kernel, linux-arm-kernel, linuxarm

On Mon, Oct 29, 2018 at 06:15:42PM +0530, Anshuman Khandual wrote:
> On 10/29/2018 06:02 PM, John Garry wrote:
> > On 29/10/2018 12:16, Will Deacon wrote:
> >> On Mon, Oct 29, 2018 at 12:14:09PM +0000, John Garry wrote:
> >>> On 29/10/2018 11:25, Will Deacon wrote:
> >>>> On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
> >>>>> Currently it is acceptable to set the distance between 2 separate nodes to
> >>>>> LOCAL_DISTANCE.
> >>>>>
> >>>>> Reject this as it is invalid.
> >>>>>
> >>>>> This change avoids a crash reported in [1].
> >>>>>
> >>>>> [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
> >>>>>
> >>>>> Signed-off-by: John Garry <john.garry@huawei.com>
> >>>>>
> >>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> >>>>> index 146c04c..6092e3d 100644
> >>>>> --- a/arch/arm64/mm/numa.c
> >>>>> +++ b/arch/arm64/mm/numa.c
> >>>>> @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int distance)
> >>>>>     }
> >>>>>
> >>>>>     if ((u8)distance != distance ||
> >>>>> -        (from == to && distance != LOCAL_DISTANCE)) {
> >>>>> +        (from == to && distance != LOCAL_DISTANCE) ||
> >>>>> +        (from != to && distance == LOCAL_DISTANCE)) {
> >>>>
> >>>> The current code here is more-or-less lifted from the x86 implementation
> >>>> of numa_set_distance().
> >>>
> >>> Right, I did notice this. I didn't think that x86 folks would be so
> >>> concerned since they generally only use ACPI, and the ACPI code already
> >>> validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
> >>> code].
> >>>
> >>>  I think we should either factor out the sanity check
> >>>> into a core helper or make the core code robust to these funny configurations.
> >>>
> >>> OK, so to me it would make sense to factor out a sanity check into a core
> >>> helper.
> >>
> >> That, or have the OF code perform the same validation that slit_valid() is
> >> doing for ACPI. I'm just trying to avoid other architectures running into
> >> this problem down the line.
> >>
> > 
> > Right, OF code should do this validation job if ACPI is doing it
> > (especially since the DT bindings actually specify the distance rules),
> > and not rely on the arch NUMA code to accept/reject numa_set_distance()
> > combinations.
> 
> I would say this particular condition checking still falls under arch NUMA init
> code sanity check like other basic tests what numa_set_distance() currently does
> already but it should not be a necessity for the OF driver to check these. It can
> choose to check but arch NUMA should check basic things like two different NUMA
> nodes should not have LOCAL_DISTANCE as distance like in this case.
> 
> 	(from == to && distance != LOCAL_DISTANCE) ||
> 		(from != to && distance == LOCAL_DISTANCE))
> 
> 
> > 
> > And, in addition to this, I'd say OF should disable NUMA if given an
> > invalid table (like ACPI does).
> 
> Taking a decision to disable NUMA should be with kernel (arch NUMA) once kernel
> starts booting. Platform should have sent right values, OF driver trying to
> adjust stuff what platform has sent with FDT once the kernel starts booting is
> not right. For example "Kernel NUMA wont like the distance factors lets clean
> then up before passing on to MM". Disabling NUMA is one such major decision which
> should be with arch NUMA code not with OF driver.

I don't fully understand what you're getting at here, but why would the
check posted by John be arch-specific? It's already done in the core code
for ACPI, so there's a discrepancy between ACPI and FDT that should be
resolved. I'd also argue that the subtleties of this check are actually
based on what the core code is willing to accept in terms of the NUMA
description, so it's also the best place to enforce it.

Will

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()
  2018-10-29 14:44           ` John Garry
@ 2018-10-30  2:46             ` Anshuman Khandual
  2018-11-01 11:27               ` Will Deacon
  0 siblings, 1 reply; 13+ messages in thread
From: Anshuman Khandual @ 2018-10-30  2:46 UTC (permalink / raw)
  To: John Garry, Will Deacon
  Cc: catalin.marinas, linux-kernel, linux-arm-kernel, linuxarm



On 10/29/2018 08:14 PM, John Garry wrote:
>>>>>
>>>>>  I think we should either factor out the sanity check
>>>>>> into a core helper or make the core code robust to these funny configurations.
>>>>>
>>>>> OK, so to me it would make sense to factor out a sanity check into a core
>>>>> helper.
>>>>
>>>> That, or have the OF code perform the same validation that slit_valid() is
>>>> doing for ACPI. I'm just trying to avoid other architectures running into
>>>> this problem down the line.
>>>>
>>>
>>> Right, OF code should do this validation job if ACPI is doing it (especially since the DT bindings actually specify the distance rules), and not rely on the arch NUMA code to accept/reject numa_set_distance() combinations.
>>
>> I would say this particular condition checking still falls under arch NUMA init
>> code sanity check like other basic tests what numa_set_distance() currently does
>> already but it should not be a necessity for the OF driver to check these.
> 
> The checks in the arch NUMA code mean that invalid inter-node distance combinations are ignored.

Right and should not this new test (from != to && distance == LOCAL_DISTANCE) be
one of them as well ? numa_set_distance() updates the table or just throws some
warnings while skipping entries it deems invalid. It would be okay to have this
new check there in addition to others like this patch suggests.

> 
> However, if any entries in the table are invalid, then the whole table can be discarded as none of it can be believed, i.e. it's better to validate the table.
>

Agreed. slit_valid() on the ACPI parsing is currently enforcing that before
acpi_numa_slit_init() which would call into numa_set_distance(). Hence arch
NUMA code numa_set_distance() never had the opportunity to do the sanity
checks as ACPI slit_valid() has completely invalidated the table.

Unlike ACPI path, of_numa_parse_distance_map_v1() does not do any sanity
checks on the distance values parse from the "distance-matrix" property
and all the checks directly falls on numa_set_distance(). This needs to
be fixed in line with ACPI

* If (to == from) ---> distance = LOCAL_DISTANCE
* If (to != from) ---> distance > LOCAL_DISTANCE

At the same time its okay to just enhance numa_set_distance() test coverage
to include this new test. If we would have trusted firmware parsing all the
way, existing basic checks about node range, distance stuff should not have
been there in numa_set_distance(). Hence IMHO even if we fix the OF driver
part, we should include this new check there as well.

> It can
>> choose to check but arch NUMA should check basic things like two different NUMA
>> nodes should not have LOCAL_DISTANCE as distance like in this case.
>>
>>     (from == to && distance != LOCAL_DISTANCE) ||
>>         (from != to && distance == LOCAL_DISTANCE))
>>
>>
>>>
>>> And, in addition to this, I'd say OF should disable NUMA if given an invalid table (like ACPI does).
>>
>> Taking a decision to disable NUMA should be with kernel (arch NUMA) once kernel
>> starts booting. Platform should have sent right values, OF driver trying to
>> adjust stuff what platform has sent with FDT once the kernel starts booting is
>> not right. For example "Kernel NUMA wont like the distance factors lets clean
>> then up before passing on to MM".
> 
> Sorry, but I don't know who was advocating this.

I was just giving an example. Invalidating NUMA distance table during firmware
table (ACPI or FDT) parsing forces arm64_numa_init() to fall back on dummy NUMA
node which is like disabling NUMA. But that is the current semantics with ACPI
parsing which I overlooked. Fixing OF driver to do the same wont extend this
any further, hence my previous concern does not stand valid.

> 
> Disabling NUMA is one such major decision which
>> should be with arch NUMA code not with OF driver.
> 
> I meant parsing the table would fail, so arch NUMA would fall back on dummy NUMA.

Right and ACPI parsing does that and can force a fallback on a dummy NUMA node.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()
  2018-10-29 14:48           ` Will Deacon
@ 2018-10-30  3:00             ` Anshuman Khandual
  0 siblings, 0 replies; 13+ messages in thread
From: Anshuman Khandual @ 2018-10-30  3:00 UTC (permalink / raw)
  To: Will Deacon
  Cc: John Garry, catalin.marinas, linux-kernel, linux-arm-kernel, linuxarm



On 10/29/2018 08:18 PM, Will Deacon wrote:
> On Mon, Oct 29, 2018 at 06:15:42PM +0530, Anshuman Khandual wrote:
>> On 10/29/2018 06:02 PM, John Garry wrote:
>>> On 29/10/2018 12:16, Will Deacon wrote:
>>>> On Mon, Oct 29, 2018 at 12:14:09PM +0000, John Garry wrote:
>>>>> On 29/10/2018 11:25, Will Deacon wrote:
>>>>>> On Fri, Oct 26, 2018 at 09:57:47PM +0800, John Garry wrote:
>>>>>>> Currently it is acceptable to set the distance between 2 separate nodes to
>>>>>>> LOCAL_DISTANCE.
>>>>>>>
>>>>>>> Reject this as it is invalid.
>>>>>>>
>>>>>>> This change avoids a crash reported in [1].
>>>>>>>
>>>>>>> [1] https://www.spinics.net/lists/arm-kernel/msg683304.html
>>>>>>>
>>>>>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>>>>>>
>>>>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>>>>>> index 146c04c..6092e3d 100644
>>>>>>> --- a/arch/arm64/mm/numa.c
>>>>>>> +++ b/arch/arm64/mm/numa.c
>>>>>>> @@ -335,7 +335,8 @@ void __init numa_set_distance(int from, int to, int distance)
>>>>>>>     }
>>>>>>>
>>>>>>>     if ((u8)distance != distance ||
>>>>>>> -        (from == to && distance != LOCAL_DISTANCE)) {
>>>>>>> +        (from == to && distance != LOCAL_DISTANCE) ||
>>>>>>> +        (from != to && distance == LOCAL_DISTANCE)) {
>>>>>>
>>>>>> The current code here is more-or-less lifted from the x86 implementation
>>>>>> of numa_set_distance().
>>>>>
>>>>> Right, I did notice this. I didn't think that x86 folks would be so
>>>>> concerned since they generally only use ACPI, and the ACPI code already
>>>>> validates these distances in drivers/acpi/numa.c: slit_valid() [unlike OF
>>>>> code].
>>>>>
>>>>>  I think we should either factor out the sanity check
>>>>>> into a core helper or make the core code robust to these funny configurations.
>>>>>
>>>>> OK, so to me it would make sense to factor out a sanity check into a core
>>>>> helper.
>>>>
>>>> That, or have the OF code perform the same validation that slit_valid() is
>>>> doing for ACPI. I'm just trying to avoid other architectures running into
>>>> this problem down the line.
>>>>
>>>
>>> Right, OF code should do this validation job if ACPI is doing it
>>> (especially since the DT bindings actually specify the distance rules),
>>> and not rely on the arch NUMA code to accept/reject numa_set_distance()
>>> combinations.
>>
>> I would say this particular condition checking still falls under arch NUMA init
>> code sanity check like other basic tests what numa_set_distance() currently does
>> already but it should not be a necessity for the OF driver to check these. It can
>> choose to check but arch NUMA should check basic things like two different NUMA
>> nodes should not have LOCAL_DISTANCE as distance like in this case.
>>
>> 	(from == to && distance != LOCAL_DISTANCE) ||
>> 		(from != to && distance == LOCAL_DISTANCE))
>>
>>
>>>
>>> And, in addition to this, I'd say OF should disable NUMA if given an
>>> invalid table (like ACPI does).
>>
>> Taking a decision to disable NUMA should be with kernel (arch NUMA) once kernel
>> starts booting. Platform should have sent right values, OF driver trying to
>> adjust stuff what platform has sent with FDT once the kernel starts booting is
>> not right. For example "Kernel NUMA wont like the distance factors lets clean
>> then up before passing on to MM". Disabling NUMA is one such major decision which
>> should be with arch NUMA code not with OF driver.
> 
> I don't fully understand what you're getting at here, but why would the
> check posted by John be arch-specific? It's already done in the core code
> for ACPI, so there's a discrepancy between ACPI and FDT that should be
> resolved. I'd also argue that the subtleties of this check are actually
> based on what the core code is willing to accept in terms of the NUMA
> description, so it's also the best place to enforce it.

Agreed. I had overlooked the existing semantics with respect to ACPI parsing.
Yes, there is a discrepancy with respect to FDT which should be fixed. But
IMHO its also worth to enhance numa_set_distance() checks with this proposed
new check as well.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()
  2018-10-30  2:46             ` Anshuman Khandual
@ 2018-11-01 11:27               ` Will Deacon
  2018-11-01 11:39                 ` John Garry
  2018-11-08 14:20                 ` Anshuman Khandual
  0 siblings, 2 replies; 13+ messages in thread
From: Will Deacon @ 2018-11-01 11:27 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: John Garry, catalin.marinas, linux-kernel, linux-arm-kernel, linuxarm

[ Nit: Please wrap your lines when replying -- I've done it for you here ]

On Tue, Oct 30, 2018 at 08:16:21AM +0530, Anshuman Khandual wrote:
> On 10/29/2018 08:14 PM, John Garry wrote:
> >>>>>  I think we should either factor out the sanity check
> >>>>>> into a core helper or make the core code robust to these funny configurations.
> >>>>>
> >>>>> OK, so to me it would make sense to factor out a sanity check into a core
> >>>>> helper.
> >>>>
> >>>> That, or have the OF code perform the same validation that slit_valid() is
> >>>> doing for ACPI. I'm just trying to avoid other architectures running into
> >>>> this problem down the line.
> >>>>
> >>>
> >>> Right, OF code should do this validation job if ACPI is doing it
> >>> (especially since the DT bindings actually specify the distance
> >>> rules), and not rely on the arch NUMA code to accept/reject
> >>> numa_set_distance() combinations.
> >>
> >> I would say this particular condition checking still falls under arch NUMA init
> >> code sanity check like other basic tests what numa_set_distance() currently does
> >> already but it should not be a necessity for the OF driver to check these.
> > 
> > The checks in the arch NUMA code mean that invalid inter-node distance
> > combinations are ignored.
> 
> Right and should not this new test (from != to && distance == LOCAL_DISTANCE) be
> one of them as well ? numa_set_distance() updates the table or just throws some
> warnings while skipping entries it deems invalid. It would be okay to have this
> new check there in addition to others like this patch suggests.

If we're changing numa_set_distance(), I'd actually be inclined to do the
opposite of what you're suggesting and move as much of this boilerplate
checking into the core code. Perhaps we could add something like
check_fw_numa_topology() and have both ACPI and OF call that so that the
arch doesn't need to worry about malformed tables at all.

> > However, if any entries in the table are invalid, then the whole table
> > can be discarded as none of it can be believed, i.e. it's better to
> > validate the table.
> >
> 
> Agreed. slit_valid() on the ACPI parsing is currently enforcing that before
> acpi_numa_slit_init() which would call into numa_set_distance(). Hence arch
> NUMA code numa_set_distance() never had the opportunity to do the sanity
> checks as ACPI slit_valid() has completely invalidated the table.
> 
> Unlike ACPI path, of_numa_parse_distance_map_v1() does not do any sanity
> checks on the distance values parse from the "distance-matrix" property
> and all the checks directly falls on numa_set_distance(). This needs to
> be fixed in line with ACPI
> 
> * If (to == from) ---> distance = LOCAL_DISTANCE
> * If (to != from) ---> distance > LOCAL_DISTANCE
> 
> At the same time its okay to just enhance numa_set_distance() test coverage
> to include this new test. If we would have trusted firmware parsing all the
> way, existing basic checks about node range, distance stuff should not have
> been there in numa_set_distance(). Hence IMHO even if we fix the OF driver
> part, we should include this new check there as well.

I don't see what we gain by duplicating the check. In fact, it has a few
downsides:

  (1) It confuses the semantics of the API, because it is no longer clear
      who "owns" the check

  (2) It duplicates code in each architecture

  (3) Some clever-cloggs will remove at least some of the duplication in
      future

I'm not willing to accept the check in the arm64 code if we update the
OF code.

I think the way forward here is for John to fix the crash he reported by
adding the check to the OF code. If somebody wants to follow up with
subsequent patches to move more of the checking out of the arch code, then
we can review that as a separate series.

Will

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()
  2018-11-01 11:27               ` Will Deacon
@ 2018-11-01 11:39                 ` John Garry
  2018-11-08 14:20                 ` Anshuman Khandual
  1 sibling, 0 replies; 13+ messages in thread
From: John Garry @ 2018-11-01 11:39 UTC (permalink / raw)
  To: Will Deacon, Anshuman Khandual
  Cc: catalin.marinas, linux-kernel, linux-arm-kernel, linuxarm

>>
>> Agreed. slit_valid() on the ACPI parsing is currently enforcing that before
>> acpi_numa_slit_init() which would call into numa_set_distance(). Hence arch
>> NUMA code numa_set_distance() never had the opportunity to do the sanity
>> checks as ACPI slit_valid() has completely invalidated the table.
>>
>> Unlike ACPI path, of_numa_parse_distance_map_v1() does not do any sanity
>> checks on the distance values parse from the "distance-matrix" property
>> and all the checks directly falls on numa_set_distance(). This needs to
>> be fixed in line with ACPI
>>
>> * If (to == from) ---> distance = LOCAL_DISTANCE
>> * If (to != from) ---> distance > LOCAL_DISTANCE
>>
>> At the same time its okay to just enhance numa_set_distance() test coverage
>> to include this new test. If we would have trusted firmware parsing all the
>> way, existing basic checks about node range, distance stuff should not have
>> been there in numa_set_distance(). Hence IMHO even if we fix the OF driver
>> part, we should include this new check there as well.
>
> I don't see what we gain by duplicating the check. In fact, it has a few
> downsides:
>
>   (1) It confuses the semantics of the API, because it is no longer clear
>       who "owns" the check
>
>   (2) It duplicates code in each architecture
>
>   (3) Some clever-cloggs will remove at least some of the duplication in
>       future
>
> I'm not willing to accept the check in the arm64 code if we update the
> OF code.
>
> I think the way forward here is for John to fix the crash he reported by
> adding the check to the OF code.

I was planning on doing that.

If somebody wants to follow up with
> subsequent patches to move more of the checking out of the arch code, then
> we can review that as a separate series.

Cheers,
John

>
> Will
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
> .
>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] arm64/numa: Add more vetting in numa_set_distance()
  2018-11-01 11:27               ` Will Deacon
  2018-11-01 11:39                 ` John Garry
@ 2018-11-08 14:20                 ` Anshuman Khandual
  1 sibling, 0 replies; 13+ messages in thread
From: Anshuman Khandual @ 2018-11-08 14:20 UTC (permalink / raw)
  To: Will Deacon
  Cc: John Garry, catalin.marinas, linux-kernel, linux-arm-kernel, linuxarm



On 11/01/2018 04:57 PM, Will Deacon wrote:
> [ Nit: Please wrap your lines when replying -- I've done it for you here ]
> 
> On Tue, Oct 30, 2018 at 08:16:21AM +0530, Anshuman Khandual wrote:
>> On 10/29/2018 08:14 PM, John Garry wrote:
>>>>>>>  I think we should either factor out the sanity check
>>>>>>>> into a core helper or make the core code robust to these funny configurations.
>>>>>>>
>>>>>>> OK, so to me it would make sense to factor out a sanity check into a core
>>>>>>> helper.
>>>>>>
>>>>>> That, or have the OF code perform the same validation that slit_valid() is
>>>>>> doing for ACPI. I'm just trying to avoid other architectures running into
>>>>>> this problem down the line.
>>>>>>
>>>>>
>>>>> Right, OF code should do this validation job if ACPI is doing it
>>>>> (especially since the DT bindings actually specify the distance
>>>>> rules), and not rely on the arch NUMA code to accept/reject
>>>>> numa_set_distance() combinations.
>>>>
>>>> I would say this particular condition checking still falls under arch NUMA init
>>>> code sanity check like other basic tests what numa_set_distance() currently does
>>>> already but it should not be a necessity for the OF driver to check these.
>>>
>>> The checks in the arch NUMA code mean that invalid inter-node distance
>>> combinations are ignored.
>>
>> Right and should not this new test (from != to && distance == LOCAL_DISTANCE) be
>> one of them as well ? numa_set_distance() updates the table or just throws some
>> warnings while skipping entries it deems invalid. It would be okay to have this
>> new check there in addition to others like this patch suggests.

I missed this response due to some problem in my mail client and re-iterated
some of the same points again on the V2 (https://lkml.org/lkml/2018/11/8/734).
My apologies for the same.

> 
> If we're changing numa_set_distance(), I'd actually be inclined to do the
> opposite of what you're suggesting and move as much of this boilerplate
> checking into the core code. Perhaps we could add something like
> check_fw_numa_topology() and have both ACPI and OF call that so that the
> arch doesn't need to worry about malformed tables at all.

Right although I doubt that we could ever have a common check_fw_numa_topology()
check as the semantics for the table and individual entries there in for DT and
ACPI might be different. But I agree what you are saying.

> 
>>> However, if any entries in the table are invalid, then the whole table
>>> can be discarded as none of it can be believed, i.e. it's better to
>>> validate the table.
>>>
>>
>> Agreed. slit_valid() on the ACPI parsing is currently enforcing that before
>> acpi_numa_slit_init() which would call into numa_set_distance(). Hence arch
>> NUMA code numa_set_distance() never had the opportunity to do the sanity
>> checks as ACPI slit_valid() has completely invalidated the table.
>>
>> Unlike ACPI path, of_numa_parse_distance_map_v1() does not do any sanity
>> checks on the distance values parse from the "distance-matrix" property
>> and all the checks directly falls on numa_set_distance(). This needs to
>> be fixed in line with ACPI
>>
>> * If (to == from) ---> distance = LOCAL_DISTANCE
>> * If (to != from) ---> distance > LOCAL_DISTANCE
>>
>> At the same time its okay to just enhance numa_set_distance() test coverage
>> to include this new test. If we would have trusted firmware parsing all the
>> way, existing basic checks about node range, distance stuff should not have
>> been there in numa_set_distance(). Hence IMHO even if we fix the OF driver
>> part, we should include this new check there as well.
> 
> I don't see what we gain by duplicating the check. In fact, it has a few
> downsides:
> 
>   (1) It confuses the semantics of the API, because it is no longer clear
>       who "owns" the check
> 
>   (2) It duplicates code in each architecture
> 
>   (3) Some clever-cloggs will remove at least some of the duplication in
>       future

Agreed, it has down sides.

> 
> I'm not willing to accept the check in the arm64 code if we update the
> OF code.

Okay, we should remove them instead.

> 
> I think the way forward here is for John to fix the crash he reported by
> adding the check to the OF code. If somebody wants to follow up with
> subsequent patches to move more of the checking out of the arch code, then
> we can review that as a separate series.

Okay. Anyways I had some other comments related to semantics checking at
the OF driver level but I can probably send them out as a separate patch
later. Once again it was my bad to miss this response in the first place.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-11-08 14:20 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-26 13:57 [PATCH] arm64/numa: Add more vetting in numa_set_distance() John Garry
2018-10-29 11:25 ` Will Deacon
2018-10-29 12:14   ` John Garry
2018-10-29 12:16     ` Will Deacon
2018-10-29 12:32       ` John Garry
2018-10-29 12:45         ` Anshuman Khandual
2018-10-29 14:44           ` John Garry
2018-10-30  2:46             ` Anshuman Khandual
2018-11-01 11:27               ` Will Deacon
2018-11-01 11:39                 ` John Garry
2018-11-08 14:20                 ` Anshuman Khandual
2018-10-29 14:48           ` Will Deacon
2018-10-30  3:00             ` Anshuman Khandual

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).