linux-kernel.vger.kernel.org archive mirror
* ACPI and NUMA guys, please help to check if this patch is OK
@ 2012-05-23  2:46 ethan zhao
  2012-05-23  3:30 ` David Rientjes
  0 siblings, 1 reply; 6+ messages in thread
From: ethan zhao @ 2012-05-23  2:46 UTC
  To: linux-kernel; +Cc: Kurt Garloff, Len Brown

[PATCH] drivers/acpi/numa.c: Add localities checking code against
proximity domains to slit_valid()

Some buggy BIOS/ACPI implementations report a different number of SLIT
localities than SRAT proximity domains. That makes the NUMA configuration
invalid, and the kernel outputs a message like the following:

NUMA:Warning:invalid distance parameter, from=-1 to=-1 distance=83

This patch adds a check to slit_valid() that compares the SLIT locality
count against the number of SRAT proximity domains and prints a clear
message about the ACPI bug.

Signed-off-by: ethan.zhao <ethan.kernel@gmail.com>
---
 drivers/acpi/numa.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index e56f3be..55c8a8e 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -161,6 +161,13 @@ static __init int slit_valid(struct acpi_table_slit *slit)
 {
        int i, j;
        int d = slit->locality_count;
+       int pxd = nodes_weight(nodes_found_map);
+       if (pxd != d) {
+               printk(KERN_INFO "ACPI: BIOS bug! SLIT localities count %d doesn't equal SRAT proximity domains number %d\n",
+                       d, pxd);
+               return 0;
+       }
+
        for (i = 0; i < d; i++) {
                for (j = 0; j < d; j++)  {
                        u8 val = slit->entry[d*i + j];
--
1.7.1
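
For illustration only, here is a minimal user-space sketch of the validation
this patch proposes. It is not the kernel code: struct slit_sketch,
slit_valid_sketch() and the srat_domains parameter are hypothetical stand-ins
(srat_domains models nodes_weight(nodes_found_map)), and the per-entry rules
(diagonal equal to 10, off-diagonal greater than 10) are an assumption about
what the rest of slit_valid() checks, since the hunk above only shows the
start of the loop.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define LOCAL_DISTANCE 10 /* assumed distance from a locality to itself */

/* Hypothetical stand-in for struct acpi_table_slit (ACPI header omitted). */
struct slit_sketch {
    int locality_count;
    const uint8_t *entry; /* locality_count * locality_count bytes */
};

/*
 * Return 1 if the matrix looks sane, 0 otherwise.
 * srat_domains models nodes_weight(nodes_found_map) from the patch.
 */
static int slit_valid_sketch(const struct slit_sketch *slit, int srat_domains)
{
    assert(slit && slit->entry);

    int d = slit->locality_count;

    /* The check proposed in the patch: both counts must match. */
    if (srat_domains != d) {
        fprintf(stderr,
                "SLIT locality count %d != SRAT proximity domain count %d\n",
                d, srat_domains);
        return 0;
    }

    /* Assumed per-entry rules: 10 on the diagonal, more than 10 elsewhere. */
    for (int i = 0; i < d; i++) {
        for (int j = 0; j < d; j++) {
            uint8_t val = slit->entry[d * i + j];

            if (i == j) {
                if (val != LOCAL_DISTANCE)
                    return 0;
            } else if (val <= LOCAL_DISTANCE) {
                return 0;
            }
        }
    }
    return 1;
}
```

With a 2x2 matrix {10, 20, 20, 10}, the sketch accepts the table when
srat_domains is 2 and rejects the whole table when srat_domains is 4, even
though every entry in the matrix is well formed.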


* Re: ACPI and NUMA guys, please help to check if this patch is OK
  2012-05-23  2:46 ACPI and NUMA guys, please help to check if this patch is OK ethan zhao
@ 2012-05-23  3:30 ` David Rientjes
  2012-05-24  3:08   ` ethan zhao
  0 siblings, 1 reply; 6+ messages in thread
From: David Rientjes @ 2012-05-23  3:30 UTC
  To: ethan zhao; +Cc: linux-kernel, Kurt Garloff, Len Brown

On Wed, 23 May 2012, ethan zhao wrote:

> [PATCH] drivers/acpi/numa.c: Add localities checking code against
> proximity domains to slit_valid()
> 
> Some buggy BIOS/ACPI implementations report a different number of SLIT
> localities than SRAT proximity domains. That makes the NUMA configuration
> invalid, and the kernel outputs a message like the following:
> 
> NUMA:Warning:invalid distance parameter, from=-1 to=-1 distance=83
> 
> This patch adds a check to slit_valid() that compares the SLIT locality
> count against the number of SRAT proximity domains and prints a clear
> message about the ACPI bug.
> 
> Signed-off-by: ethan.zhao <ethan.kernel@gmail.com>

There's nothing in the ACPI spec that prohibits this and the result is 
non-fatal (it only emits a warning of the mismatch), so this patch is 
unnecessary.


* Re: ACPI and NUMA guys, please help to check if this patch is OK
  2012-05-23  3:30 ` David Rientjes
@ 2012-05-24  3:08   ` ethan zhao
  2012-05-24  4:45     ` David Rientjes
  0 siblings, 1 reply; 6+ messages in thread
From: ethan zhao @ 2012-05-24  3:08 UTC
  To: David Rientjes; +Cc: linux-kernel, Kurt Garloff, Len Brown

David,
    Is helping to improve Linux only about fatal errors? Can I do
something to make the warning clearer, so that the root cause is
easier to find?

Thanks,
Ethan

On Wed, May 23, 2012 at 11:30 AM, David Rientjes <rientjes@google.com> wrote:
> On Wed, 23 May 2012, ethan zhao wrote:
>
>> [PATCH] drivers/acpi/numa.c: Add localities checking code against
>> proximity domains to slit_valid()
>>
>> Some buggy BIOS/ACPI implementations report a different number of SLIT
>> localities than SRAT proximity domains. That makes the NUMA configuration
>> invalid, and the kernel outputs a message like the following:
>>
>> NUMA:Warning:invalid distance parameter, from=-1 to=-1 distance=83
>>
>> This patch adds a check to slit_valid() that compares the SLIT locality
>> count against the number of SRAT proximity domains and prints a clear
>> message about the ACPI bug.
>>
>> Signed-off-by: ethan.zhao <ethan.kernel@gmail.com>
>
> There's nothing in the ACPI spec that prohibits this and the result is
> non-fatal (it only emits a warning of the mismatch), so this patch is
> unnecessary.


* Re: ACPI and NUMA guys, please help to check if this patch is OK
  2012-05-24  3:08   ` ethan zhao
@ 2012-05-24  4:45     ` David Rientjes
       [not found]       ` <CABawtvMtO5NR-j3zX2xxXfgZ_XxrCdF+k+n4jG3KyEVe=xYjjA@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: David Rientjes @ 2012-05-24  4:45 UTC
  To: ethan zhao; +Cc: linux-kernel, Kurt Garloff, Len Brown

On Thu, 24 May 2012, ethan zhao wrote:

> David,
>     Is helping to improve Linux only about fatal errors? Can I do
> something to make the warning clearer, so that the root cause is
> easier to find?
> 

The warning you already quoted in your changelog is sufficient warning 
that the SLIT is bad and it suppresses setting that distance.  Your 
change, however, completely invalidates the SRAT if the number of 
localities does not match the number of pxms.  We'd much rather simply 
suppress the bad distance than invalidate the SRAT, especially 
considering there is nothing in the ACPI spec that requires it.
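
The difference between suppressing a single bad distance and rejecting the
whole table can be sketched in user space as follows. This is a simplified
model under stated assumptions, not the kernel's implementation:
set_distance_sketch(), apply_slit_sketch(), MAX_NODES_SKETCH, NO_NODE and the
locality_to_node table are all hypothetical names, and a locality without a
matching proximity domain is assumed to map to node -1 (which is what the
from=-1 to=-1 in the quoted warning suggests).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_NODES_SKETCH 4    /* hypothetical node limit for this sketch */
#define LOCAL_DISTANCE   10   /* assumed distance from a node to itself */
#define NO_NODE          (-1) /* locality with no mapped node (assumption) */

static int distance_sketch[MAX_NODES_SKETCH][MAX_NODES_SKETCH];

/* Store one distance; warn and skip it if the pair is unusable. */
static int set_distance_sketch(int from, int to, int distance)
{
    if (from < 0 || to < 0 ||
        from >= MAX_NODES_SKETCH || to >= MAX_NODES_SKETCH ||
        (from == to && distance != LOCAL_DISTANCE)) {
        fprintf(stderr,
                "NUMA: Warning: invalid distance parameter, "
                "from=%d to=%d distance=%d\n", from, to, distance);
        return 0; /* only this entry is suppressed */
    }
    distance_sketch[from][to] = distance;
    return 1;
}

/* Walk the SLIT matrix and return how many entries were stored. */
static int apply_slit_sketch(const uint8_t *entry, int localities,
                             const int *locality_to_node)
{
    int stored = 0;

    assert(entry && locality_to_node);

    for (int i = 0; i < localities; i++)
        for (int j = 0; j < localities; j++)
            stored += set_distance_sketch(locality_to_node[i],
                                          locality_to_node[j],
                                          entry[localities * i + j]);
    return stored;
}
```

With three localities of which only the first two map to nodes, the sketch
still records the four distances between nodes 0 and 1 and skips only the
entries that involve the unmapped locality.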


* Re: ACPI and NUMA guys, please help to check if this patch is OK
       [not found]       ` <CABawtvMtO5NR-j3zX2xxXfgZ_XxrCdF+k+n4jG3KyEVe=xYjjA@mail.gmail.com>
@ 2012-11-13  8:12         ` Ethan Zhao
  2012-11-13  8:27           ` David Rientjes
  0 siblings, 1 reply; 6+ messages in thread
From: Ethan Zhao @ 2012-11-13  8:12 UTC
  To: David Rientjes; +Cc: LKML, len.brown

David,
   I am coming back to suggest this patch again because I hit the same
issue on another type of server, and it took me some time to find out
what was wrong because there is no clear information when the SLIT is
validated. The patch will not invalidate the SRAT if the SLIT is bad.
It will only suppress the optional SLIT table if that table has more or
fewer PXMs (localities) than the SRAT.

[PATCH] drivers/acpi/numa.c: Add localities checking code against
proximity domains to slit_valid()

Some buggy BIOS/ACPI implementations report a different number of SLIT
localities than SRAT proximity domains. That makes the NUMA configuration
invalid, and the kernel outputs a message like the following:

NUMA:Warning:invalid distance parameter, from=-1 to=-1 distance=83

This patch adds a check to slit_valid() that compares the SLIT locality
count against the number of SRAT proximity domains and prints a clear
message about the ACPI bug.

Signed-off-by: ethan.zhao <ethan.kernel@gmail.com>
---
 drivers/acpi/numa.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index e56f3be..55c8a8e 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -161,6 +161,13 @@ static __init int slit_valid(struct acpi_table_slit *slit)
 {
        int i, j;
        int d = slit->locality_count;
+       int pxd = nodes_weight(nodes_found_map);
+       if (pxd != d) {
+               printk(KERN_INFO "ACPI: BIOS bug! SLIT localities count %d doesn't equal SRAT proximity domains number %d\n",
+                       d, pxd);
+               return 0;
+       }
+
        for (i = 0; i < d; i++) {
                for (j = 0; j < d; j++)  {
                        u8 val = slit->entry[d*i + j];
--
1.7.1


Thanks,
Ethan

On Thu, May 24, 2012 at 12:59 PM, ethan zhao <ethan.kernel@gmail.com> wrote:
> That is OK,
> Thanks
>
> On Thu, May 24, 2012 at 12:45 PM, David Rientjes <rientjes@google.com> wrote:
>> On Thu, 24 May 2012, ethan zhao wrote:
>>
>>> David,
>>>     Is helping to improve Linux only about fatal errors? Can I do
>>> something to make the warning clearer, so that the root cause is
>>> easier to find?
>>>
>>
>> The warning you already quoted in your changelog is sufficient warning
>> that the SLIT is bad and it suppresses setting that distance.  Your
>> change, however, completely invalidates the SRAT if the number of
>> localities does not match the number of pxms.  We'd much rather simply
>> suppress the bad distance than invalidate the SRAT, especially
>> considering there is nothing in the ACPI spec that requires it.


* Re: ACPI and NUMA guys, please help to check if this patch is OK
  2012-11-13  8:12         ` Ethan Zhao
@ 2012-11-13  8:27           ` David Rientjes
  0 siblings, 0 replies; 6+ messages in thread
From: David Rientjes @ 2012-11-13  8:27 UTC
  To: Ethan Zhao; +Cc: LKML, len.brown

On Tue, 13 Nov 2012, Ethan Zhao wrote:

> diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
> index e56f3be..55c8a8e 100644
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -161,6 +161,13 @@ static __init int slit_valid(struct acpi_table_slit *slit)
>  {
>         int i, j;
>         int d = slit->locality_count;
> +       int pxd = nodes_weight(nodes_found_map);
> +       if (pxd != d) {
> +               printk(KERN_INFO "ACPI: BIOS bug! SLIT localities count %d doesn't equal SRAT proximity domains number %d\n",
> +                       d, pxd);
> +               return 0;
> +       }
> +
>         for (i = 0; i < d; i++) {
>                 for (j = 0; j < d; j++)  {
>                         u8 val = slit->entry[d*i + j];

This is incorrect: the SLIT locality count comes straight from the BIOS,
whereas the maximum number of nodes the kernel supports is defined by
CONFIG_NODES_SHIFT.  If this patch were merged, the SLIT would be
invalidated whenever CONFIG_NODES_SHIFT is too low, whereas today we
respect the proximity of the nodes that can be onlined.  That would be a
regression.
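
As a rough illustration of this objection, the following user-space sketch
models how a low node limit makes the proposed comparison fail on correct
firmware. It is a simplified model, not kernel code: NODES_SHIFT_SKETCH,
nodes_found_sketch() and proposed_check_sketch() are hypothetical names, and
the sketch assumes that at most 1 << NODES_SHIFT proximity domains can be
mapped to nodes.

```c
#include <assert.h>

#define NODES_SHIFT_SKETCH  2 /* stands in for CONFIG_NODES_SHIFT */
#define MAX_NUMNODES_SKETCH (1 << NODES_SHIFT_SKETCH)

/*
 * Number of nodes the kernel can map when firmware describes
 * firmware_domains proximity domains; models nodes_weight(nodes_found_map).
 */
static int nodes_found_sketch(int firmware_domains)
{
    assert(firmware_domains >= 0);
    return firmware_domains < MAX_NUMNODES_SKETCH ?
           firmware_domains : MAX_NUMNODES_SKETCH;
}

/* The comparison added by the patch: 1 = SLIT accepted, 0 = rejected. */
static int proposed_check_sketch(int slit_locality_count, int firmware_domains)
{
    return nodes_found_sketch(firmware_domains) == slit_locality_count;
}
```

With a limit of four nodes, firmware that consistently describes eight
domains in both SRAT and SLIT passes no longer: only four nodes are found, the
locality count is eight, and the whole SLIT is rejected even though the
distances for the four usable nodes are valid.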

