linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS
@ 2010-01-01 19:50 David John
  2010-01-01 19:55 ` [PATCH] Check the node argument passed to cpumask_of_node David John
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: David John @ 2010-01-01 19:50 UTC (permalink / raw)
  To: Jesse Barnes, Rusty Russell; +Cc: andreas.herrmann3, linux-kernel

Hi All,

Commit e0cd516 causes an null pointer dereference when reading from the
sysfs attributes local_cpu* on Intel machines with no ACPI NUMA
proximity info, since dev->numa_node gets set to -1 for all PCI devices,
which then gets passed to cpumask_of_node.

The patch following this mail fixes the problem for x86. Perhaps a more
thorough solution would be to fix the PCI layer to set the node
information for devices to zero rather than -1 (Since if CONFIG_NUMA=y
we have node 0)? I don't know whether it is safe / correct to do this.

Regards,
David.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] Check the node argument passed to cpumask_of_node
  2010-01-01 19:50 [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS David John
@ 2010-01-01 19:55 ` David John
  2010-01-01 21:55 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Rafael J. Wysocki
  2010-01-01 22:22 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Yinghai Lu
  2 siblings, 0 replies; 11+ messages in thread
From: David John @ 2010-01-01 19:55 UTC (permalink / raw)
  To: jbarnes, rusty; +Cc: andreas.herrmann3, linux-kernel

Ensure that the node value is valid.

Signed-off-by: David John <davidjon@xenontk.org>

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index c5087d7..1141a6e 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -99,7 +99,8 @@ extern const struct cpumask *cpumask_of_node(int node);
 /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
 static inline const struct cpumask *cpumask_of_node(int node)
 {
-	return node_to_cpumask_map[node];
+	return (node < 0 || node >= nr_node_ids) ? cpu_online_mask :
+						   node_to_cpumask_map[node];
 }
 #endif
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS
  2010-01-01 19:50 [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS David John
  2010-01-01 19:55 ` [PATCH] Check the node argument passed to cpumask_of_node David John
@ 2010-01-01 21:55 ` Rafael J. Wysocki
  2010-01-02  6:15   ` [PATCH v2] Check the node argument passed to cpumask_of_node David John
  2010-01-01 22:22 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Yinghai Lu
  2 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2010-01-01 21:55 UTC (permalink / raw)
  To: davidjon; +Cc: Jesse Barnes, Rusty Russell, andreas.herrmann3, linux-kernel

On Friday 01 January 2010, David John wrote:
> Hi All,

Hi,

> Commit e0cd516 causes an null pointer dereference when reading from the
> sysfs attributes local_cpu* on Intel machines with no ACPI NUMA
> proximity info, since dev->numa_node gets set to -1 for all PCI devices,
> which then gets passed to cpumask_of_node.
> 
> The patch following this mail fixes the problem for x86. Perhaps a more
> thorough solution would be to fix the PCI layer to set the node
> information for devices to zero rather than -1 (Since if CONFIG_NUMA=y
> we have node 0)? I don't know whether it is safe / correct to do this.

First, it would be nicer if you said what the commit subject was in addition
to providing its hash.

Second, why don't you put the information above into the changelog of your
patch?

Rafael

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS
  2010-01-01 19:50 [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS David John
  2010-01-01 19:55 ` [PATCH] Check the node argument passed to cpumask_of_node David John
  2010-01-01 21:55 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Rafael J. Wysocki
@ 2010-01-01 22:22 ` Yinghai Lu
  2010-01-02  6:18   ` David John
  2 siblings, 1 reply; 11+ messages in thread
From: Yinghai Lu @ 2010-01-01 22:22 UTC (permalink / raw)
  To: davidjon; +Cc: Jesse Barnes, Rusty Russell, andreas.herrmann3, linux-kernel

On Fri, Jan 1, 2010 at 11:50 AM, David John <davidjon@xenontk.org> wrote:
> Hi All,
>
> Commit e0cd516 causes an null pointer dereference when reading from the
> sysfs attributes local_cpu* on Intel machines with no ACPI NUMA
> proximity info, since dev->numa_node gets set to -1 for all PCI devices,
> which then gets passed to cpumask_of_node.
>
> The patch following this mail fixes the problem for x86. Perhaps a more
> thorough solution would be to fix the PCI layer to set the node
> information for devices to zero rather than -1 (Since if CONFIG_NUMA=y
> we have node 0)? I don't know whether it is safe / correct to do this.

no.

1. -1, mean calling code will use node that code is running on.
2. the system that have two or more nodes, and more peer root buses.
if the first node doesn't have RAM installed, no node0 then.

YH

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2] Check the node argument passed to cpumask_of_node
  2010-01-01 21:55 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Rafael J. Wysocki
@ 2010-01-02  6:15   ` David John
  2010-01-04  4:28     ` Rusty Russell
  2010-01-04  5:15     ` [PATCH v2] " Jin Dongming
  0 siblings, 2 replies; 11+ messages in thread
From: David John @ 2010-01-02  6:15 UTC (permalink / raw)
  To: jbarnes, rusty; +Cc: andreas.herrmann3, rjw, linux-kernel

Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus' NUMA information"
causes an null pointer dereference when reading from the sysfs attributes local_cpu*
on Intel machines with no ACPI NUMA proximity info, since dev->numa_node gets set to -1
for all PCI devices, which then gets passed to cpumask_of_node.

Ensure that the node value is valid.

Signed-off-by: David John <davidjon@xenontk.org>

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index c5087d7..1141a6e 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -99,7 +99,8 @@ extern const struct cpumask *cpumask_of_node(int node);
 /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
 static inline const struct cpumask *cpumask_of_node(int node)
 {
-	return node_to_cpumask_map[node];
+	return (node < 0 || node >= nr_node_ids) ? cpu_online_mask :
+						   node_to_cpumask_map[node];
 }
 #endif
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS
  2010-01-01 22:22 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Yinghai Lu
@ 2010-01-02  6:18   ` David John
  0 siblings, 0 replies; 11+ messages in thread
From: David John @ 2010-01-02  6:18 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Jesse Barnes, Rusty Russell, andreas.herrmann3, linux-kernel

On 01/02/2010 03:52 AM, Yinghai Lu wrote:
> On Fri, Jan 1, 2010 at 11:50 AM, David John <davidjon@xenontk.org> wrote:
>> Hi All,
>>
>> Commit e0cd516 causes an null pointer dereference when reading from the
>> sysfs attributes local_cpu* on Intel machines with no ACPI NUMA
>> proximity info, since dev->numa_node gets set to -1 for all PCI devices,
>> which then gets passed to cpumask_of_node.
>>
>> The patch following this mail fixes the problem for x86. Perhaps a more
>> thorough solution would be to fix the PCI layer to set the node
>> information for devices to zero rather than -1 (Since if CONFIG_NUMA=y
>> we have node 0)? I don't know whether it is safe / correct to do this.
> 
> no.
> 
> 1. -1, mean calling code will use node that code is running on.
> 2. the system that have two or more nodes, and more peer root buses.
> if the first node doesn't have RAM installed, no node0 then.
> 
> YH
> 

Oh I see. Thanks.

Regards,
David.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] Check the node argument passed to cpumask_of_node
  2010-01-02  6:15   ` [PATCH v2] Check the node argument passed to cpumask_of_node David John
@ 2010-01-04  4:28     ` Rusty Russell
  2010-01-04 11:42       ` David John
  2010-01-04  5:15     ` [PATCH v2] " Jin Dongming
  1 sibling, 1 reply; 11+ messages in thread
From: Rusty Russell @ 2010-01-04  4:28 UTC (permalink / raw)
  To: David John; +Cc: jbarnes, andreas.herrmann3, rjw, linux-kernel

On Sat, 2 Jan 2010 04:45:47 pm David John wrote:
> Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus' NUMA information"
> causes an null pointer dereference when reading from the sysfs attributes local_cpu*
> on Intel machines with no ACPI NUMA proximity info, since dev->numa_node gets set to -1
> for all PCI devices, which then gets passed to cpumask_of_node.
> 
> Ensure that the node value is valid.

This only works for x86, and only for !CONFIG_DEBUG_PER_CPU_MAPS.

I suggest fixing the callers introduced in e0cd516 for the moment,
then if you feel enthused, change the semantics of cpumask_of_node and
remove the checks from the various callers.

(And please only check for -1: it has a special meaning, unlike other
invalid numbers which indicate a bug).

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] Check the node argument passed to cpumask_of_node
  2010-01-02  6:15   ` [PATCH v2] Check the node argument passed to cpumask_of_node David John
  2010-01-04  4:28     ` Rusty Russell
@ 2010-01-04  5:15     ` Jin Dongming
  1 sibling, 0 replies; 11+ messages in thread
From: Jin Dongming @ 2010-01-04  5:15 UTC (permalink / raw)
  To: David John; +Cc: jbarnes, rusty, andreas.herrmann3, rjw, linux-kernel

Hi, David

This problem also happened when the CONFIG_DEBUG_PER_CPU_MAPS was used,
so how about modifying the code for it working well?

Best Regards,
Jin Dongming

David John wrote:
> Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus' NUMA information"
> causes an null pointer dereference when reading from the sysfs attributes local_cpu*
> on Intel machines with no ACPI NUMA proximity info, since dev->numa_node gets set to -1
> for all PCI devices, which then gets passed to cpumask_of_node.
> 
> Ensure that the node value is valid.
> 
> Signed-off-by: David John <davidjon@xenontk.org>
> 
> diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
> index c5087d7..1141a6e 100644
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -99,7 +99,8 @@ extern const struct cpumask *cpumask_of_node(int node);
>  /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
>  static inline const struct cpumask *cpumask_of_node(int node)
>  {
> -	return node_to_cpumask_map[node];
> +	return (node < 0 || node >= nr_node_ids) ? cpu_online_mask :
> +						   node_to_cpumask_map[node];
>  }
>  #endif
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] Check the node argument passed to cpumask_of_node
  2010-01-04  4:28     ` Rusty Russell
@ 2010-01-04 11:42       ` David John
  2010-01-04 14:58         ` [PATCH v3] " David John
  0 siblings, 1 reply; 11+ messages in thread
From: David John @ 2010-01-04 11:42 UTC (permalink / raw)
  To: Rusty Russell; +Cc: jbarnes, andreas.herrmann3, rjw, linux-kernel

On 01/04/2010 09:58 AM, Rusty Russell wrote:
> On Sat, 2 Jan 2010 04:45:47 pm David John wrote:
>> Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus' NUMA information"
>> causes an null pointer dereference when reading from the sysfs attributes local_cpu*
>> on Intel machines with no ACPI NUMA proximity info, since dev->numa_node gets set to -1
>> for all PCI devices, which then gets passed to cpumask_of_node.
>>
>> Ensure that the node value is valid.
> 
> This only works for x86, and only for !CONFIG_DEBUG_PER_CPU_MAPS.
> 
> I suggest fixing the callers introduced in e0cd516 for the moment,
> then if you feel enthused, change the semantics of cpumask_of_node and
> remove the checks from the various callers.

Okay, I'll do as you suggested.

Regards,
David.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3] Check the node argument passed to cpumask_of_node
  2010-01-04 11:42       ` David John
@ 2010-01-04 14:58         ` David John
  2010-01-04 17:18           ` Jesse Barnes
  0 siblings, 1 reply; 11+ messages in thread
From: David John @ 2010-01-04 14:58 UTC (permalink / raw)
  To: jbarnes, rusty; +Cc: andreas.herrmann3, rjw, jin.dongming, linux-kernel

Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus' NUMA information"
causes an null pointer dereference when reading from the sysfs attributes local_cpu*
on Intel machines with no ACPI NUMA proximity info, since dev->numa_node gets set to -1
for all PCI devices, which then gets passed to cpumask_of_node.

Add a check to prevent this.

Signed-off-by: David John <davidjon@xenontk.org>

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index c5df94e..807224e 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -75,7 +75,8 @@ static ssize_t local_cpus_show(struct device *dev,
 	int len;
 
 #ifdef CONFIG_NUMA
-	mask = cpumask_of_node(dev_to_node(dev));
+	mask = (dev_to_node(dev) == -1) ? cpu_online_mask :
+					  cpumask_of_node(dev_to_node(dev));
 #else
 	mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
 #endif
@@ -93,7 +94,8 @@ static ssize_t local_cpulist_show(struct device *dev,
 	int len;
 
 #ifdef CONFIG_NUMA
-	mask = cpumask_of_node(dev_to_node(dev));
+	mask = (dev_to_node(dev) == -1) ? cpu_online_mask :
+					  cpumask_of_node(dev_to_node(dev));
 #else
 	mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
 #endif

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] Check the node argument passed to cpumask_of_node
  2010-01-04 14:58         ` [PATCH v3] " David John
@ 2010-01-04 17:18           ` Jesse Barnes
  0 siblings, 0 replies; 11+ messages in thread
From: Jesse Barnes @ 2010-01-04 17:18 UTC (permalink / raw)
  To: David John; +Cc: rusty, andreas.herrmann3, rjw, jin.dongming, linux-kernel

On Mon,  4 Jan 2010 20:28:57 +0530
David John <davidjon@xenontk.org> wrote:

> Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus'
> NUMA information" causes an null pointer dereference when reading
> from the sysfs attributes local_cpu* on Intel machines with no ACPI
> NUMA proximity info, since dev->numa_node gets set to -1 for all PCI
> devices, which then gets passed to cpumask_of_node.
> 
> Add a check to prevent this.
> 
> Signed-off-by: David John <davidjon@xenontk.org>
> 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index c5df94e..807224e 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -75,7 +75,8 @@ static ssize_t local_cpus_show(struct device *dev,
>  	int len;
>  
>  #ifdef CONFIG_NUMA
> -	mask = cpumask_of_node(dev_to_node(dev));
> +	mask = (dev_to_node(dev) == -1) ? cpu_online_mask :
> +
> cpumask_of_node(dev_to_node(dev)); #else
>  	mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
>  #endif
> @@ -93,7 +94,8 @@ static ssize_t local_cpulist_show(struct device
> *dev, int len;
>  
>  #ifdef CONFIG_NUMA
> -	mask = cpumask_of_node(dev_to_node(dev));
> +	mask = (dev_to_node(dev) == -1) ? cpu_online_mask :
> +
> cpumask_of_node(dev_to_node(dev)); #else
>  	mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
>  #endif
> 

Applied to my for-linus branch, thanks David.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-01-04 17:18 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-01-01 19:50 [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS David John
2010-01-01 19:55 ` [PATCH] Check the node argument passed to cpumask_of_node David John
2010-01-01 21:55 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Rafael J. Wysocki
2010-01-02  6:15   ` [PATCH v2] Check the node argument passed to cpumask_of_node David John
2010-01-04  4:28     ` Rusty Russell
2010-01-04 11:42       ` David John
2010-01-04 14:58         ` [PATCH v3] " David John
2010-01-04 17:18           ` Jesse Barnes
2010-01-04  5:15     ` [PATCH v2] " Jin Dongming
2010-01-01 22:22 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Yinghai Lu
2010-01-02  6:18   ` David John

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).