linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] PCI, x86: fix default vga ref_count
@ 2012-09-15  0:48 Yinghai Lu
  2012-09-15  0:48 ` [PATCH] PCI: Use correct type when freeing bus resource list Yinghai Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Yinghai Lu @ 2012-09-15  0:48 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, Yinghai Lu, x86

when __ARCH_HAS_VGA_DEFAULT_DEVICE is not defined, aka EFIFB is not used,

for static path, vga_default setting is through vga_arbiter_add_pci_device.
and for x86 pci_fixup_video, will skip that.
because subsys_initcall(vga_arb_device_init) come first to call vga_arbiter_add_pci_device.

for hotplug path, even vga_arbiter_add_pci_device is called via notifier, but it
will check VGA_RSRC_LEGACY_MASK that is not set for hotplug path.
So x86 pci_fixup_video will take over to call vga_set_default_device().

We need to hold one dev reference there.

otherwise vga_arbiter_del_pci_device that does not check VGA_RSRC_LEGACY_MASK
will call put_device and it will cause ref_count to decrease extra.
that will have that device get deleted early wrongly.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: x86@kernel.org

---
 arch/x86/pci/fixup.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86/pci/fixup.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/fixup.c
+++ linux-2.6/arch/x86/pci/fixup.c
@@ -349,8 +349,12 @@ static void __devinit pci_fixup_video(st
 	if (config & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) {
 		pdev->resource[PCI_ROM_RESOURCE].flags |= IORESOURCE_ROM_SHADOW;
 		dev_printk(KERN_DEBUG, &pdev->dev, "Boot video device\n");
-		if (!vga_default_device())
+		if (!vga_default_device()) {
+#ifndef __ARCH_HAS_VGA_DEFAULT_DEVICE
+			pdev = pci_dev_get(pdev);
+#endif
 			vga_set_default_device(pdev);
+		}
 	}
 }
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_ANY_ID, PCI_ANY_ID,

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] PCI: Use correct type when freeing bus resource list
  2012-09-15  0:48 [PATCH] PCI, x86: fix default vga ref_count Yinghai Lu
@ 2012-09-15  0:48 ` Yinghai Lu
  2012-09-18 22:53   ` Bjorn Helgaas
  2012-09-15  0:48 ` [PATCH] PCI, x86: clear initial value for root info resources Yinghai Lu
  2012-09-18 22:15 ` [PATCH] PCI, x86: fix default vga ref_count Bjorn Helgaas
  2 siblings, 1 reply; 17+ messages in thread
From: Yinghai Lu @ 2012-09-15  0:48 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, Yinghai Lu

Should use struct pci_bus_resource instead of struct pci_host_bridge_window

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/bus.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/bus.c
+++ linux-2.6/drivers/pci/bus.c
@@ -87,11 +87,15 @@ EXPORT_SYMBOL_GPL(pci_bus_resource_n);
 void pci_bus_remove_resources(struct pci_bus *bus)
 {
 	int i;
+	struct pci_bus_resource *bus_res, *tmp;
 
 	for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++)
 		bus->resource[i] = NULL;
 
-	pci_free_resource_list(&bus->resources);
+	list_for_each_entry_safe(bus_res, tmp, &bus->resources, list) {
+		list_del(&bus_res->list);
+		kfree(bus_res);
+	}
 }
 
 /**

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] PCI, x86: clear initial value for root info resources
  2012-09-15  0:48 [PATCH] PCI, x86: fix default vga ref_count Yinghai Lu
  2012-09-15  0:48 ` [PATCH] PCI: Use correct type when freeing bus resource list Yinghai Lu
@ 2012-09-15  0:48 ` Yinghai Lu
  2012-09-18 22:46   ` Bjorn Helgaas
  2012-09-18 22:15 ` [PATCH] PCI, x86: fix default vga ref_count Bjorn Helgaas
  2 siblings, 1 reply; 17+ messages in thread
From: Yinghai Lu @ 2012-09-15  0:48 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, Yinghai Lu, x86

Found one system one root bus hot remove get panic.
Panic happens when try to release hostbridge resource.

It turns out that resource get reject during put into resource tree
because of conflicts.
Also that resource parent pointer have random value.

That invalid value cause it pass through check __release_pci_root_info
and panic in release_resource.

Try to use kzalloc instead.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: x86@kernel.org

---
 arch/x86/pci/acpi.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux-2.6/arch/x86/pci/acpi.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/acpi.c
+++ linux-2.6/arch/x86/pci/acpi.c
@@ -305,7 +305,6 @@ setup_resource(struct acpi_resource *acp
 	res->flags = flags;
 	res->start = start;
 	res->end = end;
-	res->child = NULL;
 
 	if (!pci_use_crs) {
 		dev_printk(KERN_DEBUG, &info->bridge->dev,
@@ -434,7 +433,7 @@ probe_pci_root_info(struct pci_root_info
 
 	size = sizeof(*info->res) * info->res_num;
 	info->res_num = 0;
-	info->res = kmalloc(size, GFP_KERNEL);
+	info->res = kzalloc(size, GFP_KERNEL);
 	if (!info->res)
 		return;
 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PCI, x86: fix default vga ref_count
  2012-09-15  0:48 [PATCH] PCI, x86: fix default vga ref_count Yinghai Lu
  2012-09-15  0:48 ` [PATCH] PCI: Use correct type when freeing bus resource list Yinghai Lu
  2012-09-15  0:48 ` [PATCH] PCI, x86: clear initial value for root info resources Yinghai Lu
@ 2012-09-18 22:15 ` Bjorn Helgaas
  2012-09-18 22:39   ` Yinghai Lu
  2012-09-18 23:44   ` [PATCH] PCI: " Yinghai Lu
  2 siblings, 2 replies; 17+ messages in thread
From: Bjorn Helgaas @ 2012-09-18 22:15 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-pci, linux-kernel, x86, Dave Airlie

On Fri, Sep 14, 2012 at 6:48 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> when __ARCH_HAS_VGA_DEFAULT_DEVICE is not defined, aka EFIFB is not used,
>
> for static path, vga_default setting is through vga_arbiter_add_pci_device.
> and for x86 pci_fixup_video, will skip that.
> because subsys_initcall(vga_arb_device_init) come first to call vga_arbiter_add_pci_device.
>
> for hotplug path, even vga_arbiter_add_pci_device is called via notifier, but it
> will check VGA_RSRC_LEGACY_MASK that is not set for hotplug path.
> So x86 pci_fixup_video will take over to call vga_set_default_device().
>
> We need to hold one dev reference there.
>
> otherwise vga_arbiter_del_pci_device that does not check VGA_RSRC_LEGACY_MASK
> will call put_device and it will cause ref_count to decrease extra.
> that will have that device get deleted early wrongly.

I'm sure you're fixing a real bug, but this patch is completely unintelligible.

I tried to decipher the changelog and I failed.  I tried to figure out
the PCI reference counting from the vgaarb code, and I failed there
too.

Apparently the reference is connected with the vga_default device,
since that's what vga_arbiter_del_pci_device() checks, but
vga_set_default_device() is blissfully ignorant of references (and it
isn't used consistently anyway).

If you can do some rework to make this all make more sense, that would be great.

While you're at it, look at both the x86 and ia64 versions of
pci_fixup_video().  They are 90% identical, and it's not clear why
they should be different.  In fact, it's not clear why there's a fixup
for x86 and ia64 but not for any other architecture with PCI.

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: x86@kernel.org
>
> ---
>  arch/x86/pci/fixup.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/arch/x86/pci/fixup.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/pci/fixup.c
> +++ linux-2.6/arch/x86/pci/fixup.c
> @@ -349,8 +349,12 @@ static void __devinit pci_fixup_video(st
>         if (config & (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) {
>                 pdev->resource[PCI_ROM_RESOURCE].flags |= IORESOURCE_ROM_SHADOW;
>                 dev_printk(KERN_DEBUG, &pdev->dev, "Boot video device\n");
> -               if (!vga_default_device())
> +               if (!vga_default_device()) {
> +#ifndef __ARCH_HAS_VGA_DEFAULT_DEVICE
> +                       pdev = pci_dev_get(pdev);
> +#endif
>                         vga_set_default_device(pdev);
> +               }
>         }
>  }
>  DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_ANY_ID, PCI_ANY_ID,

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PCI, x86: fix default vga ref_count
  2012-09-18 22:15 ` [PATCH] PCI, x86: fix default vga ref_count Bjorn Helgaas
@ 2012-09-18 22:39   ` Yinghai Lu
  2012-09-18 23:44   ` [PATCH] PCI: " Yinghai Lu
  1 sibling, 0 replies; 17+ messages in thread
From: Yinghai Lu @ 2012-09-18 22:39 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, x86, Dave Airlie

On Tue, Sep 18, 2012 at 3:15 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, Sep 14, 2012 at 6:48 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> when __ARCH_HAS_VGA_DEFAULT_DEVICE is not defined, aka EFIFB is not used,
>>
>> for static path, vga_default setting is through vga_arbiter_add_pci_device.
>> and for x86 pci_fixup_video, will skip that.
>> because subsys_initcall(vga_arb_device_init) come first to call vga_arbiter_add_pci_device.
>>
>> for hotplug path, even vga_arbiter_add_pci_device is called via notifier, but it
>> will check VGA_RSRC_LEGACY_MASK that is not set for hotplug path.
>> So x86 pci_fixup_video will take over to call vga_set_default_device().
>>
>> We need to hold one dev reference there.
>>
>> otherwise vga_arbiter_del_pci_device that does not check VGA_RSRC_LEGACY_MASK
>> will call put_device and it will cause ref_count to decrease extra.
>> that will have that device get deleted early wrongly.
>
> I'm sure you're fixing a real bug, but this patch is completely unintelligible.

yes, only can hit this bug while remove root bus with vga adapater.

or remove bridge with child pci device that is vga adapter.

for root bus removal, the less one ref_count will call that pci device
get destroyed too early.

>
> I tried to decipher the changelog and I failed.  I tried to figure out
> the PCI reference counting from the vgaarb code, and I failed there
> too.
>
> Apparently the reference is connected with the vga_default device,
> since that's what vga_arbiter_del_pci_device() checks, but
> vga_set_default_device() is blissfully ignorant of references (and it
> isn't used consistently anyway).

yes, could fix vga_set_default_device instead, but it has two versions
- no __ARCH_HAS_VGA_DEFAULT_DEVICE
- EFI version...

also there is one other user for vga switching...

>
> If you can do some rework to make this all make more sense, that would be great.
>
> While you're at it, look at both the x86 and ia64 versions of
> pci_fixup_video().  They are 90% identical, and it's not clear why
> they should be different.  In fact, it's not clear why there's a fixup
> for x86 and ia64 but not for any other architecture with PCI.

ok, will give it a try.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PCI, x86: clear initial value for root info resources
  2012-09-15  0:48 ` [PATCH] PCI, x86: clear initial value for root info resources Yinghai Lu
@ 2012-09-18 22:46   ` Bjorn Helgaas
  2012-09-18 23:49     ` Yinghai Lu
  0 siblings, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2012-09-18 22:46 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-pci, linux-kernel, x86, Tony Luck

On Fri, Sep 14, 2012 at 6:48 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> Found one system one root bus hot remove get panic.
> Panic happens when try to release hostbridge resource.
>
> It turns out that resource get reject during put into resource tree
> because of conflicts.
> Also that resource parent pointer have random value.
>
> That invalid value cause it pass through check __release_pci_root_info
> and panic in release_resource.
>
> Try to use kzalloc instead.

Don't we need the same fix for ia64 in pci_acpi_scan_root()?  Here's
what it does:

        if (windows) {
                controller->window =
                        kmalloc_node(sizeof(*controller->window) * windows,
                                     GFP_KERNEL, controller->node);


> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: x86@kernel.org
>
> ---
>  arch/x86/pci/acpi.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> Index: linux-2.6/arch/x86/pci/acpi.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/pci/acpi.c
> +++ linux-2.6/arch/x86/pci/acpi.c
> @@ -305,7 +305,6 @@ setup_resource(struct acpi_resource *acp
>         res->flags = flags;
>         res->start = start;
>         res->end = end;
> -       res->child = NULL;
>
>         if (!pci_use_crs) {
>                 dev_printk(KERN_DEBUG, &info->bridge->dev,
> @@ -434,7 +433,7 @@ probe_pci_root_info(struct pci_root_info
>
>         size = sizeof(*info->res) * info->res_num;
>         info->res_num = 0;
> -       info->res = kmalloc(size, GFP_KERNEL);
> +       info->res = kzalloc(size, GFP_KERNEL);
>         if (!info->res)
>                 return;
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PCI: Use correct type when freeing bus resource list
  2012-09-15  0:48 ` [PATCH] PCI: Use correct type when freeing bus resource list Yinghai Lu
@ 2012-09-18 22:53   ` Bjorn Helgaas
  0 siblings, 0 replies; 17+ messages in thread
From: Bjorn Helgaas @ 2012-09-18 22:53 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-pci, linux-kernel

On Fri, Sep 14, 2012 at 6:48 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> Should use struct pci_bus_resource instead of struct pci_host_bridge_window

Looks good, and this is all my fault, sorry.  Applied to a
pci/yinghai-misc branch.

> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> ---
>  drivers/pci/bus.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/drivers/pci/bus.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/bus.c
> +++ linux-2.6/drivers/pci/bus.c
> @@ -87,11 +87,15 @@ EXPORT_SYMBOL_GPL(pci_bus_resource_n);
>  void pci_bus_remove_resources(struct pci_bus *bus)
>  {
>         int i;
> +       struct pci_bus_resource *bus_res, *tmp;
>
>         for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++)
>                 bus->resource[i] = NULL;
>
> -       pci_free_resource_list(&bus->resources);
> +       list_for_each_entry_safe(bus_res, tmp, &bus->resources, list) {
> +               list_del(&bus_res->list);
> +               kfree(bus_res);
> +       }
>  }
>
>  /**

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] PCI: fix default vga ref_count
  2012-09-18 22:15 ` [PATCH] PCI, x86: fix default vga ref_count Bjorn Helgaas
  2012-09-18 22:39   ` Yinghai Lu
@ 2012-09-18 23:44   ` Yinghai Lu
  2012-09-18 23:54     ` Matthew Garrett
  1 sibling, 1 reply; 17+ messages in thread
From: Yinghai Lu @ 2012-09-18 23:44 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Yinghai Lu, x86, Dave Airlie,
	Andrew Morton, Julia Lawall, Matthew Garrett

when __ARCH_HAS_VGA_DEFAULT_DEVICE is not defined, aka EFIFB is not used,
for static path, vga_default setting is through vga_arbiter_add_pci_device.
and later x86 pci_fixup_video, will skip setting again.
- subsys_initcall(vga_arb_device_init) come first to call vga_arbiter_add_pci_device. It will call pci_get_dev to hold one reference.

for hotplug add path, even vga_arbiter_add_pci_device is called via notifier,
but it will check VGA_RSRC_LEGACY_MASK that is not set for hotplug path.
So x86 pci_fixup_video will take over to call vga_set_default_device().
It will not hold one refrence.

Later for hotplug remove path, vga_arbiter_del_pci_device that does not check
VGA_RSRC_LEGACY_MASK will call put_device and it will cause ref_count to
decrease extra. that will have that pci device get deleted early wrongly.

Need to make get/put balance for both cases.

-v2: According to Bjorn, update vga_set_default_device instead...

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: x86@kernel.org
Cc: Dave Airlie <airlied@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Julia Lawall <julia@diku.dk>
Cc: Matthew Garrett <mjg@redhat.com>

---
 drivers/gpu/vga/vgaarb.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Index: linux-2.6/drivers/gpu/vga/vgaarb.c
===================================================================
--- linux-2.6.orig/drivers/gpu/vga/vgaarb.c
+++ linux-2.6/drivers/gpu/vga/vgaarb.c
@@ -141,6 +141,12 @@ EXPORT_SYMBOL_GPL(vga_default_device);
 
 void vga_set_default_device(struct pci_dev *pdev)
 {
+	if (vga_default)
+		pci_dev_put(vga_default);
+
+	if (pdev)
+		pdev = pci_dev_get(pdev);
+
 	vga_default = pdev;
 }
 #endif
@@ -577,7 +583,7 @@ static bool vga_arbiter_add_pci_device(s
 #ifndef __ARCH_HAS_VGA_DEFAULT_DEVICE
 	if (vga_default == NULL &&
 	    ((vgadev->owns & VGA_RSRC_LEGACY_MASK) == VGA_RSRC_LEGACY_MASK))
-		vga_default = pci_dev_get(pdev);
+		vga_set_default_device(pdev);
 #endif
 
 	vga_arbiter_check_bridge_sharing(vgadev);
@@ -613,10 +619,8 @@ static bool vga_arbiter_del_pci_device(s
 	}
 
 #ifndef __ARCH_HAS_VGA_DEFAULT_DEVICE
-	if (vga_default == pdev) {
-		pci_dev_put(vga_default);
-		vga_default = NULL;
-	}
+	if (vga_default == pdev)
+		vga_set_default_device(NULL);
 #endif
 
 	if (vgadev->decodes & (VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM))

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PCI, x86: clear initial value for root info resources
  2012-09-18 22:46   ` Bjorn Helgaas
@ 2012-09-18 23:49     ` Yinghai Lu
  2012-09-19 13:12       ` Bjorn Helgaas
  2012-09-19 15:34       ` Jiang Liu
  0 siblings, 2 replies; 17+ messages in thread
From: Yinghai Lu @ 2012-09-18 23:49 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, x86, Tony Luck

On Tue, Sep 18, 2012 at 3:46 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, Sep 14, 2012 at 6:48 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>> Found one system one root bus hot remove get panic.
>> Panic happens when try to release hostbridge resource.
>>
>> It turns out that resource get reject during put into resource tree
>> because of conflicts.
>> Also that resource parent pointer have random value.
>>
>> That invalid value cause it pass through check __release_pci_root_info
>> and panic in release_resource.
>>
>> Try to use kzalloc instead.
>
> Don't we need the same fix for ia64 in pci_acpi_scan_root()?  Here's
> what it does:
>
>         if (windows) {
>                 controller->window =
>                         kmalloc_node(sizeof(*controller->window) * windows,
>                                      GFP_KERNEL, controller->node);
>


yes, but they don't support pci_set_host_bridge_release yet. so they
should not meet this problem yet.

Thanks

Yinghai

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PCI: fix default vga ref_count
  2012-09-18 23:44   ` [PATCH] PCI: " Yinghai Lu
@ 2012-09-18 23:54     ` Matthew Garrett
  0 siblings, 0 replies; 17+ messages in thread
From: Matthew Garrett @ 2012-09-18 23:54 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, x86, Dave Airlie,
	Andrew Morton, Julia Lawall

On Tue, Sep 18, 2012 at 04:44:42PM -0700, Yinghai Lu wrote:

>  void vga_set_default_device(struct pci_dev *pdev)
>  {
> +	if (vga_default)
> +		pci_dev_put(vga_default);
> +
> +	if (pdev)
> +		pdev = pci_dev_get(pdev);
> +
>  	vga_default = pdev;

It shouldn't happen, but do we want to check for the vga_default == pdev 
case?

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PCI, x86: clear initial value for root info resources
  2012-09-18 23:49     ` Yinghai Lu
@ 2012-09-19 13:12       ` Bjorn Helgaas
  2012-09-19 17:17         ` Yinghai Lu
  2012-09-19 15:34       ` Jiang Liu
  1 sibling, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2012-09-19 13:12 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: linux-pci, linux-kernel, x86, Tony Luck

On Tue, Sep 18, 2012 at 5:49 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Tue, Sep 18, 2012 at 3:46 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Fri, Sep 14, 2012 at 6:48 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> Found one system one root bus hot remove get panic.
>>> Panic happens when try to release hostbridge resource.
>>>
>>> It turns out that resource get reject during put into resource tree
>>> because of conflicts.
>>> Also that resource parent pointer have random value.
>>>
>>> That invalid value cause it pass through check __release_pci_root_info
>>> and panic in release_resource.
>>>
>>> Try to use kzalloc instead.
>>
>> Don't we need the same fix for ia64 in pci_acpi_scan_root()?  Here's
>> what it does:
>>
>>         if (windows) {
>>                 controller->window =
>>                         kmalloc_node(sizeof(*controller->window) * windows,
>>                                      GFP_KERNEL, controller->node);
>>
>
>
> yes, but they don't support pci_set_host_bridge_release yet. so they
> should not meet this problem yet.

Why should we wait to fix that bug until later?  I'm not interested in
debugging this in the future.  If you fix a bug, you should always try
to fix other occurrences of the same bug at the same time.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PCI, x86: clear initial value for root info resources
  2012-09-18 23:49     ` Yinghai Lu
  2012-09-19 13:12       ` Bjorn Helgaas
@ 2012-09-19 15:34       ` Jiang Liu
  1 sibling, 0 replies; 17+ messages in thread
From: Jiang Liu @ 2012-09-19 15:34 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Bjorn Helgaas, linux-pci, linux-kernel, x86, Tony Luck

On 09/19/2012 07:49 AM, Yinghai Lu wrote:
> On Tue, Sep 18, 2012 at 3:46 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Fri, Sep 14, 2012 at 6:48 PM, Yinghai Lu <yinghai@kernel.org> wrote:
>>> Found one system one root bus hot remove get panic.
>>> Panic happens when try to release hostbridge resource.
>>>
>>> It turns out that resource get reject during put into resource tree
>>> because of conflicts.
>>> Also that resource parent pointer have random value.
>>>
>>> That invalid value cause it pass through check __release_pci_root_info
>>> and panic in release_resource.
>>>
>>> Try to use kzalloc instead.
>>
>> Don't we need the same fix for ia64 in pci_acpi_scan_root()?  Here's
>> what it does:
>>
>>         if (windows) {
>>                 controller->window =
>>                         kmalloc_node(sizeof(*controller->window) * windows,
>>                                      GFP_KERNEL, controller->node);
>>
> 
> 
> yes, but they don't support pci_set_host_bridge_release yet. so they
> should not meet this problem yet.
Hi Yinghai,
	We are trying to add pci_set_host_bridge_release() for IA64, so would
appreciate it if you could help to fix IA64 too.
	--Gerry


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PCI, x86: clear initial value for root info resources
  2012-09-19 13:12       ` Bjorn Helgaas
@ 2012-09-19 17:17         ` Yinghai Lu
  2012-09-19 17:49           ` Yinghai Lu
  0 siblings, 1 reply; 17+ messages in thread
From: Yinghai Lu @ 2012-09-19 17:17 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, x86, Tony Luck

On Wed, Sep 19, 2012 at 6:12 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> yes, but they don't support pci_set_host_bridge_release yet. so they
>> should not meet this problem yet.
>
> Why should we wait to fix that bug until later?  I'm not interested in
> debugging this in the future.  If you fix a bug, you should always try
> to fix other occurrences of the same bug at the same time.

ok, will update the patch.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] PCI, x86: clear initial value for root info resources
  2012-09-19 17:17         ` Yinghai Lu
@ 2012-09-19 17:49           ` Yinghai Lu
  2012-09-19 17:49             ` [PATCH] PCI, ia64: " Yinghai Lu
  2012-09-21 16:50             ` [PATCH] PCI, x86: " Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 17+ messages in thread
From: Yinghai Lu @ 2012-09-19 17:49 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, linux-kernel, Yinghai Lu, x86

Found one system one root bus hot remove get panic.
Panic happens when try to release hostbridge resource.

It turns out that resource get reject during put into resource tree
because of conflicts.
Also that resource parent pointer have random value.

That invalid value cause it pass through check __release_pci_root_info
and panic in release_resource.

Try to use kzalloc instead.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: x86@kernel.org

---
 arch/x86/pci/acpi.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux-2.6/arch/x86/pci/acpi.c
===================================================================
--- linux-2.6.orig/arch/x86/pci/acpi.c
+++ linux-2.6/arch/x86/pci/acpi.c
@@ -305,7 +305,6 @@ setup_resource(struct acpi_resource *acp
 	res->flags = flags;
 	res->start = start;
 	res->end = end;
-	res->child = NULL;
 
 	if (!pci_use_crs) {
 		dev_printk(KERN_DEBUG, &info->bridge->dev,
@@ -434,7 +433,7 @@ probe_pci_root_info(struct pci_root_info
 
 	size = sizeof(*info->res) * info->res_num;
 	info->res_num = 0;
-	info->res = kmalloc(size, GFP_KERNEL);
+	info->res = kzalloc(size, GFP_KERNEL);
 	if (!info->res)
 		return;
 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] PCI, ia64: clear initial value for root info resources
  2012-09-19 17:49           ` Yinghai Lu
@ 2012-09-19 17:49             ` Yinghai Lu
  2012-09-21 16:50             ` [PATCH] PCI, x86: " Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 17+ messages in thread
From: Yinghai Lu @ 2012-09-19 17:49 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, Yinghai Lu, Tony Luck, Fenghua Yu, linux-ia64

after
	PCI, ia64: clear initial value for root info resources

change ia64 code accordingly, because later ia64 will support root bus removal too.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org

---
 arch/ia64/pci/pci.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux-2.6/arch/ia64/pci/pci.c
===================================================================
--- linux-2.6.orig/arch/ia64/pci/pci.c
+++ linux-2.6/arch/ia64/pci/pci.c
@@ -295,7 +295,6 @@ static __devinit acpi_status add_window(
 	window->resource.flags = flags;
 	window->resource.start = addr.minimum + offset;
 	window->resource.end = window->resource.start + addr.address_length - 1;
-	window->resource.child = NULL;
 	window->offset = offset;
 
 	if (insert_resource(root, &window->resource)) {
@@ -357,7 +356,7 @@ pci_acpi_scan_root(struct acpi_pci_root
 			&windows);
 	if (windows) {
 		controller->window =
-			kmalloc_node(sizeof(*controller->window) * windows,
+			kzalloc_node(sizeof(*controller->window) * windows,
 				     GFP_KERNEL, controller->node);
 		if (!controller->window)
 			goto out2;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PCI, x86: clear initial value for root info resources
  2012-09-19 17:49           ` Yinghai Lu
  2012-09-19 17:49             ` [PATCH] PCI, ia64: " Yinghai Lu
@ 2012-09-21 16:50             ` Konrad Rzeszutek Wilk
       [not found]               ` <CAE9FiQV9WK4NG5+aGwVrGO3ueFH3TmmmG5zea+JjwgtQyngNRg@mail.gmail.com>
  1 sibling, 1 reply; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-09-21 16:50 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Bjorn Helgaas, linux-pci, linux-kernel, x86

On Wed, Sep 19, 2012 at 10:49:00AM -0700, Yinghai Lu wrote:
> Found one system one root bus hot remove get panic.

Boy, this is mangled.

Can you include the name of the system? Perhaps you can
rephrase this to say:

"On XYZ removing the root bus (through ACPI unplug) causes
a panic."

> Panic happens when try to release hostbridge resource.

Can you include the stack trace?
> 
> It turns out that resource get reject during put into resource tree
> because of conflicts.

Come again? Are you saying:

"The reason for this is that the resources was never initialized
properly because of .. (what type of conflict?)"?

> Also that resource parent pointer have random value.
> 
> That invalid value cause it pass through check __release_pci_root_info
> and panic in release_resource.
> 
> Try to use kzalloc instead.

It is not just try, it _is_ using that now.

> 
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: x86@kernel.org
> 
> ---
>  arch/x86/pci/acpi.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> Index: linux-2.6/arch/x86/pci/acpi.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/pci/acpi.c
> +++ linux-2.6/arch/x86/pci/acpi.c
> @@ -305,7 +305,6 @@ setup_resource(struct acpi_resource *acp
>  	res->flags = flags;
>  	res->start = start;
>  	res->end = end;
> -	res->child = NULL;
>  
>  	if (!pci_use_crs) {
>  		dev_printk(KERN_DEBUG, &info->bridge->dev,
> @@ -434,7 +433,7 @@ probe_pci_root_info(struct pci_root_info
>  
>  	size = sizeof(*info->res) * info->res_num;
>  	info->res_num = 0;
> -	info->res = kmalloc(size, GFP_KERNEL);
> +	info->res = kzalloc(size, GFP_KERNEL);
>  	if (!info->res)
>  		return;
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] PCI, x86: clear initial value for root info resources
       [not found]               ` <CAE9FiQV9WK4NG5+aGwVrGO3ueFH3TmmmG5zea+JjwgtQyngNRg@mail.gmail.com>
@ 2012-09-23 20:33                 ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 17+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-09-23 20:33 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Bjorn Helgaas, linux-pci, linux-kernel, x86

On Fri, Sep 21, 2012 at 11:27:00AM -0700, Yinghai Lu wrote:
> On Fri, Sep 21, 2012 at 9:50 AM, Konrad Rzeszutek Wilk
> <konrad@kernel.org> wrote:
> > On Wed, Sep 19, 2012 at 10:49:00AM -0700, Yinghai Lu wrote:
> >
> > Can you include the stack trace?
> 
> [  414.997281] pci_bus 0000:00: freeing pci_bus info
> [  415.017221]  pci0000:00: freeing pci_host_bridge info
> [  415.017468] general protection fault: 0000 [#1] SMP
> [  415.036758] Modules linked in:
> [  415.036959] CPU 0
> [  415.037051] Pid: 4, comm: kworker/0:0 Not tainted
> 3.6.0-rc6-yh-03463-gbcab6f1-dirty #303 Oracle Corporation  Sun Fire
> X4800 M2 /
> [  415.056989] RIP: 0010:[<ffffffff81075218>]  [<ffffffff81075218>]
> __release_resource+0x11/0x3e
> [  415.076854] RSP: 0018:ffff883f4cea39f0  EFLAGS: 00010206
> [  415.077112] RAX: 0d00000000000030 RBX: ffff897ea6cd4070 RCX: 8c6318c6318c6320
> [  415.097000] RDX: ffff883f9d60e190 RSI: ffffffff8265f228 RDI: ffff897ea6cd4070
> [  415.116627] RBP: ffff883f4cea39f0 R08: 0000000000000005 R09: ffffffff810755de
> [  415.116953] R10: ffffffff81e2e7df R11: 0000000000000000 R12: 0000000000000002
> [  415.136790] R13: ffff897ea6d18600 R14: ffffffff826f18c0 R15: ffff883f9d60d800
> [  415.156541] FS:  0000000000000000(0000) GS:ffff883f9d600000(0000)
> knlGS:0000000000000000
> [  415.156953] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  415.176622] CR2: 0000000000619718 CR3: 000000000263f000 CR4: 00000000000007f0
> [  415.196401] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  415.196761] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  415.216702] Process kworker/0:0 (pid: 4, threadinfo
> ffff883f4cea2000, task ffff883f4cea8000)
> [  415.236370] Stack:
> [  415.236475]  ffff883f4cea3a20 ffffffff810755e6 ffff883f4cea3a60
> ffff883f00000000
> [  415.236866]  ffffffff824884ad ffff88fea6ce2440 ffff883f4cea3a40
> ffffffff81c2cfc5
> [  415.256682]  ffff88bf4a1ec800 ffff88bf4a1ec800 ffff883f4cea3a50
> ffffffff81c2d023
> [  415.276452] Call Trace:
> [  415.276611]  [<ffffffff810755e6>] release_resource+0x25/0x3e
> [  415.296206]  [<ffffffff81c2cfc5>] __release_pci_root_info+0x35/0x7e
> [  415.296556]  [<ffffffff81c2d023>] release_pci_root_info+0x15/0x1a
> [  415.316249]  [<ffffffff813f568a>] pci_release_bus_bridge_dev+0x37/0x4f
> [  415.316581]  [<ffffffff815c04d8>] device_release+0x93/0xcb
> [  415.336365]  [<ffffffff813d5888>] kobject_release+0x51/0x67
> [  415.336666]  [<ffffffff813d579a>] kobject_put+0x4a/0x4e
> [  415.361466]  [<ffffffff815c026a>] put_device+0x17/0x19
> [  415.361734]  [<ffffffff813f75b2>] pci_remove_root_bus+0x5a/0x64
> [  415.376188]  [<ffffffff81437dd8>] acpi_pci_root_remove+0xeb/0x114
> [  415.376509]  [<ffffffff81434c13>] acpi_device_remove+0x86/0xa5
> [  415.396230]  [<ffffffff815c33b6>] __device_release_driver+0x86/0xdc
> [  415.396551]  [<ffffffff815c3431>] device_release_driver+0x25/0x32
> [  415.416198]  [<ffffffff8143658b>] acpi_bus_trim+0x86/0x110
> [  415.416473]  [<ffffffff81439ae1>] handle_root_bridge_removal+0x4f/0x118
> [  415.436227]  [<ffffffff81439c8b>] _handle_hotplug_event_root+0xe1/0x121
> [  415.455957]  [<ffffffff81085515>] process_one_work+0x28a/0x453
> [  415.456228]  [<ffffffff81085480>] ? process_one_work+0x1f5/0x453
> [  415.475980]  [<ffffffff81439baa>] ? handle_root_bridge_removal+0x118/0x118
> [  415.476311]  [<ffffffff81085bd1>] worker_thread+0x12a/0x201
> [  415.495995]  [<ffffffff81085aa7>] ? manage_workers+0xed/0xed
> [  415.496301]  [<ffffffff8108b93a>] kthread+0xe7/0xef
> [  415.515957]  [<ffffffff81e2ef03>] ? _raw_spin_unlock_irq+0x2e/0x33
> [  415.516257]  [<ffffffff81e372e4>] kernel_thread_helper+0x4/0x10
> [  415.535926]  [<ffffffff81e2f259>] ? retint_restore_args+0xe/0xe
> [  415.536242]  [<ffffffff8108b853>] ? __init_kthread_worker+0x5a/0x5a
> [  415.555959]  [<ffffffff81e372e0>] ? gs_change+0xb/0xb
> [  415.556209] Code: 48 89 31 48 89 46 20 31 c0 eb 0d 48 39 7a 08 48
> 8d 4a 28 72 da 48 89 d0 5d c3 66 66 66 66 90 48 8b 47 20 55 48 89 e5
> 48 83 c0 30 <48> 8b 10 48 85 d2 74 1e 48 39 fa 75 13 48 8b 57 28 48 89
> 10 48
> [  415.596509] RIP  [<ffffffff81075218>] __release_resource+0x11/0x3e
> [  415.615741]  RSP <ffff883f4cea39f0>
> [  415.616020] ---[ end trace 8bcbc1bea9c3e0bc ]---
> 
> whole log is attached.

I meant it (and a filtered with just the valid lines in it) in terms of the patch commit description.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2012-09-23 20:33 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-15  0:48 [PATCH] PCI, x86: fix default vga ref_count Yinghai Lu
2012-09-15  0:48 ` [PATCH] PCI: Use correct type when freeing bus resource list Yinghai Lu
2012-09-18 22:53   ` Bjorn Helgaas
2012-09-15  0:48 ` [PATCH] PCI, x86: clear initial value for root info resources Yinghai Lu
2012-09-18 22:46   ` Bjorn Helgaas
2012-09-18 23:49     ` Yinghai Lu
2012-09-19 13:12       ` Bjorn Helgaas
2012-09-19 17:17         ` Yinghai Lu
2012-09-19 17:49           ` Yinghai Lu
2012-09-19 17:49             ` [PATCH] PCI, ia64: " Yinghai Lu
2012-09-21 16:50             ` [PATCH] PCI, x86: " Konrad Rzeszutek Wilk
     [not found]               ` <CAE9FiQV9WK4NG5+aGwVrGO3ueFH3TmmmG5zea+JjwgtQyngNRg@mail.gmail.com>
2012-09-23 20:33                 ` Konrad Rzeszutek Wilk
2012-09-19 15:34       ` Jiang Liu
2012-09-18 22:15 ` [PATCH] PCI, x86: fix default vga ref_count Bjorn Helgaas
2012-09-18 22:39   ` Yinghai Lu
2012-09-18 23:44   ` [PATCH] PCI: " Yinghai Lu
2012-09-18 23:54     ` Matthew Garrett

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).