* [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS
@ 2010-01-01 19:50 David John
2010-01-01 19:55 ` [PATCH] Check the node argument passed to cpumask_of_node David John
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: David John @ 2010-01-01 19:50 UTC (permalink / raw)
To: Jesse Barnes, Rusty Russell; +Cc: andreas.herrmann3, linux-kernel
Hi All,
Commit e0cd516 causes an null pointer dereference when reading from the
sysfs attributes local_cpu* on Intel machines with no ACPI NUMA
proximity info, since dev->numa_node gets set to -1 for all PCI devices,
which then gets passed to cpumask_of_node.
The patch following this mail fixes the problem for x86. Perhaps a more
thorough solution would be to fix the PCI layer to set the node
information for devices to zero rather than -1 (Since if CONFIG_NUMA=y
we have node 0)? I don't know whether it is safe / correct to do this.
Regards,
David.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH] Check the node argument passed to cpumask_of_node
2010-01-01 19:50 [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS David John
@ 2010-01-01 19:55 ` David John
2010-01-01 21:55 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Rafael J. Wysocki
2010-01-01 22:22 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Yinghai Lu
2 siblings, 0 replies; 11+ messages in thread
From: David John @ 2010-01-01 19:55 UTC (permalink / raw)
To: jbarnes, rusty; +Cc: andreas.herrmann3, linux-kernel
Ensure that the node value is valid.
Signed-off-by: David John <davidjon@xenontk.org>
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index c5087d7..1141a6e 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -99,7 +99,8 @@ extern const struct cpumask *cpumask_of_node(int node);
/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
static inline const struct cpumask *cpumask_of_node(int node)
{
- return node_to_cpumask_map[node];
+ return (node < 0 || node >= nr_node_ids) ? cpu_online_mask :
+ node_to_cpumask_map[node];
}
#endif
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS
2010-01-01 19:50 [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS David John
2010-01-01 19:55 ` [PATCH] Check the node argument passed to cpumask_of_node David John
@ 2010-01-01 21:55 ` Rafael J. Wysocki
2010-01-02 6:15 ` [PATCH v2] Check the node argument passed to cpumask_of_node David John
2010-01-01 22:22 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Yinghai Lu
2 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2010-01-01 21:55 UTC (permalink / raw)
To: davidjon; +Cc: Jesse Barnes, Rusty Russell, andreas.herrmann3, linux-kernel
On Friday 01 January 2010, David John wrote:
> Hi All,
Hi,
> Commit e0cd516 causes an null pointer dereference when reading from the
> sysfs attributes local_cpu* on Intel machines with no ACPI NUMA
> proximity info, since dev->numa_node gets set to -1 for all PCI devices,
> which then gets passed to cpumask_of_node.
>
> The patch following this mail fixes the problem for x86. Perhaps a more
> thorough solution would be to fix the PCI layer to set the node
> information for devices to zero rather than -1 (Since if CONFIG_NUMA=y
> we have node 0)? I don't know whether it is safe / correct to do this.
First, it would be nicer if you said what the commit subject was in addition
to providing its hash.
Second, why don't you put the information above into the changelog of your
patch?
Rafael
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS
2010-01-01 19:50 [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS David John
2010-01-01 19:55 ` [PATCH] Check the node argument passed to cpumask_of_node David John
2010-01-01 21:55 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Rafael J. Wysocki
@ 2010-01-01 22:22 ` Yinghai Lu
2010-01-02 6:18 ` David John
2 siblings, 1 reply; 11+ messages in thread
From: Yinghai Lu @ 2010-01-01 22:22 UTC (permalink / raw)
To: davidjon; +Cc: Jesse Barnes, Rusty Russell, andreas.herrmann3, linux-kernel
On Fri, Jan 1, 2010 at 11:50 AM, David John <davidjon@xenontk.org> wrote:
> Hi All,
>
> Commit e0cd516 causes an null pointer dereference when reading from the
> sysfs attributes local_cpu* on Intel machines with no ACPI NUMA
> proximity info, since dev->numa_node gets set to -1 for all PCI devices,
> which then gets passed to cpumask_of_node.
>
> The patch following this mail fixes the problem for x86. Perhaps a more
> thorough solution would be to fix the PCI layer to set the node
> information for devices to zero rather than -1 (Since if CONFIG_NUMA=y
> we have node 0)? I don't know whether it is safe / correct to do this.
no.
1. -1, mean calling code will use node that code is running on.
2. the system that have two or more nodes, and more peer root buses.
if the first node doesn't have RAM installed, no node0 then.
YH
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2] Check the node argument passed to cpumask_of_node
2010-01-01 21:55 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Rafael J. Wysocki
@ 2010-01-02 6:15 ` David John
2010-01-04 4:28 ` Rusty Russell
2010-01-04 5:15 ` [PATCH v2] " Jin Dongming
0 siblings, 2 replies; 11+ messages in thread
From: David John @ 2010-01-02 6:15 UTC (permalink / raw)
To: jbarnes, rusty; +Cc: andreas.herrmann3, rjw, linux-kernel
Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus' NUMA information"
causes an null pointer dereference when reading from the sysfs attributes local_cpu*
on Intel machines with no ACPI NUMA proximity info, since dev->numa_node gets set to -1
for all PCI devices, which then gets passed to cpumask_of_node.
Ensure that the node value is valid.
Signed-off-by: David John <davidjon@xenontk.org>
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index c5087d7..1141a6e 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -99,7 +99,8 @@ extern const struct cpumask *cpumask_of_node(int node);
/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
static inline const struct cpumask *cpumask_of_node(int node)
{
- return node_to_cpumask_map[node];
+ return (node < 0 || node >= nr_node_ids) ? cpu_online_mask :
+ node_to_cpumask_map[node];
}
#endif
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS
2010-01-01 22:22 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Yinghai Lu
@ 2010-01-02 6:18 ` David John
0 siblings, 0 replies; 11+ messages in thread
From: David John @ 2010-01-02 6:18 UTC (permalink / raw)
To: Yinghai Lu; +Cc: Jesse Barnes, Rusty Russell, andreas.herrmann3, linux-kernel
On 01/02/2010 03:52 AM, Yinghai Lu wrote:
> On Fri, Jan 1, 2010 at 11:50 AM, David John <davidjon@xenontk.org> wrote:
>> Hi All,
>>
>> Commit e0cd516 causes an null pointer dereference when reading from the
>> sysfs attributes local_cpu* on Intel machines with no ACPI NUMA
>> proximity info, since dev->numa_node gets set to -1 for all PCI devices,
>> which then gets passed to cpumask_of_node.
>>
>> The patch following this mail fixes the problem for x86. Perhaps a more
>> thorough solution would be to fix the PCI layer to set the node
>> information for devices to zero rather than -1 (Since if CONFIG_NUMA=y
>> we have node 0)? I don't know whether it is safe / correct to do this.
>
> no.
>
> 1. -1, mean calling code will use node that code is running on.
> 2. the system that have two or more nodes, and more peer root buses.
> if the first node doesn't have RAM installed, no node0 then.
>
> YH
>
Oh I see. Thanks.
Regards,
David.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] Check the node argument passed to cpumask_of_node
2010-01-02 6:15 ` [PATCH v2] Check the node argument passed to cpumask_of_node David John
@ 2010-01-04 4:28 ` Rusty Russell
2010-01-04 11:42 ` David John
2010-01-04 5:15 ` [PATCH v2] " Jin Dongming
1 sibling, 1 reply; 11+ messages in thread
From: Rusty Russell @ 2010-01-04 4:28 UTC (permalink / raw)
To: David John; +Cc: jbarnes, andreas.herrmann3, rjw, linux-kernel
On Sat, 2 Jan 2010 04:45:47 pm David John wrote:
> Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus' NUMA information"
> causes an null pointer dereference when reading from the sysfs attributes local_cpu*
> on Intel machines with no ACPI NUMA proximity info, since dev->numa_node gets set to -1
> for all PCI devices, which then gets passed to cpumask_of_node.
>
> Ensure that the node value is valid.
This only works for x86, and only for !CONFIG_DEBUG_PER_CPU_MAPS.
I suggest fixing the callers introduced in e0cd516 for the moment,
then if you feel enthused, change the semantics of cpumask_of_node and
remove the checks from the various callers.
(And please only check for -1: it has a special meaning, unlike other
invalid numbers which indicate a bug).
Thanks,
Rusty.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] Check the node argument passed to cpumask_of_node
2010-01-02 6:15 ` [PATCH v2] Check the node argument passed to cpumask_of_node David John
2010-01-04 4:28 ` Rusty Russell
@ 2010-01-04 5:15 ` Jin Dongming
1 sibling, 0 replies; 11+ messages in thread
From: Jin Dongming @ 2010-01-04 5:15 UTC (permalink / raw)
To: David John; +Cc: jbarnes, rusty, andreas.herrmann3, rjw, linux-kernel
Hi, David
This problem also happened when the CONFIG_DEBUG_PER_CPU_MAPS was used,
so how about modifying the code for it working well?
Best Regards,
Jin Dongming
David John wrote:
> Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus' NUMA information"
> causes an null pointer dereference when reading from the sysfs attributes local_cpu*
> on Intel machines with no ACPI NUMA proximity info, since dev->numa_node gets set to -1
> for all PCI devices, which then gets passed to cpumask_of_node.
>
> Ensure that the node value is valid.
>
> Signed-off-by: David John <davidjon@xenontk.org>
>
> diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
> index c5087d7..1141a6e 100644
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -99,7 +99,8 @@ extern const struct cpumask *cpumask_of_node(int node);
> /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
> static inline const struct cpumask *cpumask_of_node(int node)
> {
> - return node_to_cpumask_map[node];
> + return (node < 0 || node >= nr_node_ids) ? cpu_online_mask :
> + node_to_cpumask_map[node];
> }
> #endif
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] Check the node argument passed to cpumask_of_node
2010-01-04 4:28 ` Rusty Russell
@ 2010-01-04 11:42 ` David John
2010-01-04 14:58 ` [PATCH v3] " David John
0 siblings, 1 reply; 11+ messages in thread
From: David John @ 2010-01-04 11:42 UTC (permalink / raw)
To: Rusty Russell; +Cc: jbarnes, andreas.herrmann3, rjw, linux-kernel
On 01/04/2010 09:58 AM, Rusty Russell wrote:
> On Sat, 2 Jan 2010 04:45:47 pm David John wrote:
>> Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus' NUMA information"
>> causes an null pointer dereference when reading from the sysfs attributes local_cpu*
>> on Intel machines with no ACPI NUMA proximity info, since dev->numa_node gets set to -1
>> for all PCI devices, which then gets passed to cpumask_of_node.
>>
>> Ensure that the node value is valid.
>
> This only works for x86, and only for !CONFIG_DEBUG_PER_CPU_MAPS.
>
> I suggest fixing the callers introduced in e0cd516 for the moment,
> then if you feel enthused, change the semantics of cpumask_of_node and
> remove the checks from the various callers.
Okay, I'll do as you suggested.
Regards,
David.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3] Check the node argument passed to cpumask_of_node
2010-01-04 11:42 ` David John
@ 2010-01-04 14:58 ` David John
2010-01-04 17:18 ` Jesse Barnes
0 siblings, 1 reply; 11+ messages in thread
From: David John @ 2010-01-04 14:58 UTC (permalink / raw)
To: jbarnes, rusty; +Cc: andreas.herrmann3, rjw, jin.dongming, linux-kernel
Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus' NUMA information"
causes an null pointer dereference when reading from the sysfs attributes local_cpu*
on Intel machines with no ACPI NUMA proximity info, since dev->numa_node gets set to -1
for all PCI devices, which then gets passed to cpumask_of_node.
Add a check to prevent this.
Signed-off-by: David John <davidjon@xenontk.org>
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index c5df94e..807224e 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -75,7 +75,8 @@ static ssize_t local_cpus_show(struct device *dev,
int len;
#ifdef CONFIG_NUMA
- mask = cpumask_of_node(dev_to_node(dev));
+ mask = (dev_to_node(dev) == -1) ? cpu_online_mask :
+ cpumask_of_node(dev_to_node(dev));
#else
mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
#endif
@@ -93,7 +94,8 @@ static ssize_t local_cpulist_show(struct device *dev,
int len;
#ifdef CONFIG_NUMA
- mask = cpumask_of_node(dev_to_node(dev));
+ mask = (dev_to_node(dev) == -1) ? cpu_online_mask :
+ cpumask_of_node(dev_to_node(dev));
#else
mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
#endif
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v3] Check the node argument passed to cpumask_of_node
2010-01-04 14:58 ` [PATCH v3] " David John
@ 2010-01-04 17:18 ` Jesse Barnes
0 siblings, 0 replies; 11+ messages in thread
From: Jesse Barnes @ 2010-01-04 17:18 UTC (permalink / raw)
To: David John; +Cc: rusty, andreas.herrmann3, rjw, jin.dongming, linux-kernel
On Mon, 4 Jan 2010 20:28:57 +0530
David John <davidjon@xenontk.org> wrote:
> Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus'
> NUMA information" causes an null pointer dereference when reading
> from the sysfs attributes local_cpu* on Intel machines with no ACPI
> NUMA proximity info, since dev->numa_node gets set to -1 for all PCI
> devices, which then gets passed to cpumask_of_node.
>
> Add a check to prevent this.
>
> Signed-off-by: David John <davidjon@xenontk.org>
>
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index c5df94e..807224e 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -75,7 +75,8 @@ static ssize_t local_cpus_show(struct device *dev,
> int len;
>
> #ifdef CONFIG_NUMA
> - mask = cpumask_of_node(dev_to_node(dev));
> + mask = (dev_to_node(dev) == -1) ? cpu_online_mask :
> +
> cpumask_of_node(dev_to_node(dev)); #else
> mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
> #endif
> @@ -93,7 +94,8 @@ static ssize_t local_cpulist_show(struct device
> *dev, int len;
>
> #ifdef CONFIG_NUMA
> - mask = cpumask_of_node(dev_to_node(dev));
> + mask = (dev_to_node(dev) == -1) ? cpu_online_mask :
> +
> cpumask_of_node(dev_to_node(dev)); #else
> mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
> #endif
>
Applied to my for-linus branch, thanks David.
--
Jesse Barnes, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2010-01-04 17:18 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-01-01 19:50 [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS David John
2010-01-01 19:55 ` [PATCH] Check the node argument passed to cpumask_of_node David John
2010-01-01 21:55 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Rafael J. Wysocki
2010-01-02 6:15 ` [PATCH v2] Check the node argument passed to cpumask_of_node David John
2010-01-04 4:28 ` Rusty Russell
2010-01-04 11:42 ` David John
2010-01-04 14:58 ` [PATCH v3] " David John
2010-01-04 17:18 ` Jesse Barnes
2010-01-04 5:15 ` [PATCH v2] " Jin Dongming
2010-01-01 22:22 ` [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS Yinghai Lu
2010-01-02 6:18 ` David John
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).