All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove
@ 2016-09-15  2:10 Long Li
  2016-09-15  2:10 ` [PATCH 2/2 v2] pci-hyperv: lock pci bus on device eject Long Li
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Long Li @ 2016-09-15  2:10 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Bjorn Helgaas
  Cc: devel, linux-pci, linux-kernel, Long Li

From: Long Li <longli@microsoft.com>

hv_pci_devices_present is called in hv_pci_remove when we remove a PCI device from host (e.g. by disabling SRIOV on a device). In hv_pci_remove, the bus is already removed before the call, so we don't need to rescan the bus in the workqueue scheduled from hv_pci_devices_present. By introducing status hv_pcibus_removed, we can avoid this situation.

The patch fixes the following kernel panic.

[  383.853124] Workqueue: events pci_devices_present_work [pci_hyperv]
[  383.853124] task: ffff88007f5f8000 ti: ffff88007f600000 task.ti:
ffff88007f600000
[  383.853124] RIP: 0010:[<ffffffff81349806>]  [<ffffffff81349806>]
pci_is_pcie+0x6/0x20
[  383.853124] RSP: 0018:ffff88007f603d38  EFLAGS: 00010206
[  383.853124] RAX: ffff88007f5f8000 RBX: 642f3d4854415056 RCX:
ffff88007f603fd8
[  383.853124] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
642f3d4854415056
[  383.853124] RBP: ffff88007f603d68 R08: 0000000000000246 R09:
ffffffffa045eb9e
[  383.853124] R10: ffff88007b419a80 R11: ffffea0001c0ef40 R12:
ffff880003ee1c00
[  383.853124] R13: 63702f30303a3137 R14: 0000000000000000 R15:
0000000000000246
[  383.853124] FS:  0000000000000000(0000) GS:ffff88007b400000(0000)
knlGS:0000000000000000
[  383.853124] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  383.853124] CR2: 00007f68b3f52350 CR3: 0000000003546000 CR4:
00000000000406f0
[  383.853124] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  383.853124] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[  383.853124] Stack:
[  383.853124]  ffff88007f603d68 ffffffff8134db17 0000000000000008
ffff880003ee1c00
[  383.853124]  63702f30303a3137 ffff880003d8edb8 ffff88007f603da0
ffffffff8134ee2d
[  383.853124]  ffff880003d8ed00 ffff88007f603dd8 ffff880075fec320
ffff880003d8edb8
[  383.853124] Call Trace:
[  383.853124]  [<ffffffff8134db17>] ? pci_scan_slot+0x27/0x140
[  383.853124]  [<ffffffff8134ee2d>] pci_scan_child_bus+0x3d/0x150
[  383.853124]  [<ffffffffa045ef5a>]
pci_devices_present_work+0x3ea/0x400 [pci_hyperv]
[  383.853124]  [<ffffffff810a682b>] process_one_work+0x17b/0x470
[  383.853124]  [<ffffffff810a7666>] worker_thread+0x126/0x410
[  383.853124]  [<ffffffff810a7540>] ? rescuer_thread+0x460/0x460
[  383.853124]  [<ffffffff810aee1f>] kthread+0xcf/0xe0
[  383.853124]  [<ffffffff810aed50>] ?
kthread_create_on_node+0x140/0x140
[  383.853124]  [<ffffffff81699958>] ret_from_fork+0x58/0x90
[  383.853124]  [<ffffffff810aed50>] ?
kthread_create_on_node+0x140/0x140
[  383.853124] Code: 89 e5 5d 25 f0 00 00 00 c1 f8 04 c3 66 0f 1f 84 00
00 00 00 00 66 66 66 66 90 55 0f b6 47 4a 48 89 e5 5d c3 90 66 66 66 66
90 55 <80> 7f 4a 00 48 89 e5 5d 0f 95 c0 c3 0f 1f 40 00 66 2e 0f 1f 84
[  383.853124] RIP  [<ffffffff81349806>] pci_is_pcie+0x6/0x20
[  383.853124]  RSP <ffff88007f603d38>

Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/pci/host/pci-hyperv.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
index a8deeca..4a37598 100644
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -348,6 +348,7 @@ enum hv_pcibus_state {
 	hv_pcibus_init = 0,
 	hv_pcibus_probed,
 	hv_pcibus_installed,
+	hv_pcibus_removed,
 	hv_pcibus_maximum
 };
 
@@ -1481,13 +1482,24 @@ static void pci_devices_present_work(struct work_struct *work)
 		put_pcichild(hpdev, hv_pcidev_ref_initial);
 	}
 
-	/* Tell the core to rescan bus because there may have been changes. */
-	if (hbus->state == hv_pcibus_installed) {
+	switch (hbus->state) {
+	case hv_pcibus_installed:
+		/*
+		 * Tell the core to rescan bus
+		 * because there may have been changes.
+		 */
 		pci_lock_rescan_remove();
 		pci_scan_child_bus(hbus->pci_bus);
 		pci_unlock_rescan_remove();
-	} else {
+		break;
+
+	case hv_pcibus_init:
+	case hv_pcibus_probed:
 		survey_child_resources(hbus);
+		break;
+
+	default:
+		break;
 	}
 
 	up(&hbus->enum_sem);
@@ -2163,6 +2175,7 @@ static int hv_pci_probe(struct hv_device *hdev,
 	hbus = kzalloc(sizeof(*hbus), GFP_KERNEL);
 	if (!hbus)
 		return -ENOMEM;
+	hbus->state = hv_pcibus_init;
 
 	/*
 	 * The PCI bus "domain" is what is called "segment" in ACPI and
@@ -2305,6 +2318,7 @@ static int hv_pci_remove(struct hv_device *hdev)
 		pci_stop_root_bus(hbus->pci_bus);
 		pci_remove_root_bus(hbus->pci_bus);
 		pci_unlock_rescan_remove();
+		hbus->state = hv_pcibus_removed;
 	}
 
 	ret = hv_send_resources_released(hdev);
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2 v2] pci-hyperv: lock pci bus on device eject
  2016-09-15  2:10 [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove Long Li
@ 2016-09-15  2:10 ` Long Li
  2016-09-23 11:58 ` [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove Cathy Avery
  2016-09-27 19:29 ` Bjorn Helgaas
  2 siblings, 0 replies; 9+ messages in thread
From: Long Li @ 2016-09-15  2:10 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Bjorn Helgaas
  Cc: devel, linux-pci, linux-kernel, Long Li

From: Long Li <longli@microsoft.com>

A PCI_EJECT message can arrive at the same time we are calling pci_scan_child_bus in the workqueue for the previous PCI_BUS_RELATIONS message or in create_root_hv_pci_bus(), in this case we could potentailly modify the bus from multiple places. Properly lock the bus access.

Thanks Dexuan Cui <decui@microsoft.com> for pointing out the race condition in create_root_hv_pci_bus().

Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/pci/host/pci-hyperv.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
index 4a37598..33c75c9 100644
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -1198,9 +1198,11 @@ static int create_root_hv_pci_bus(struct hv_pcibus_device *hbus)
 	hbus->pci_bus->msi = &hbus->msi_chip;
 	hbus->pci_bus->msi->dev = &hbus->hdev->device;
 
+	pci_lock_rescan_remove();
 	pci_scan_child_bus(hbus->pci_bus);
 	pci_bus_assign_resources(hbus->pci_bus);
 	pci_bus_add_devices(hbus->pci_bus);
+	pci_unlock_rescan_remove();
 	hbus->state = hv_pcibus_installed;
 	return 0;
 }
@@ -1590,8 +1592,10 @@ static void hv_eject_device_work(struct work_struct *work)
 	pdev = pci_get_domain_bus_and_slot(hpdev->hbus->sysdata.domain, 0,
 					   wslot);
 	if (pdev) {
+		pci_lock_rescan_remove();
 		pci_stop_and_remove_bus_device(pdev);
 		pci_dev_put(pdev);
+		pci_unlock_rescan_remove();
 	}
 
 	memset(&ctxt, 0, sizeof(ctxt));
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove
  2016-09-15  2:10 [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove Long Li
  2016-09-15  2:10 ` [PATCH 2/2 v2] pci-hyperv: lock pci bus on device eject Long Li
@ 2016-09-23 11:58 ` Cathy Avery
  2016-09-27 23:54   ` Long Li
  2016-09-27 19:29 ` Bjorn Helgaas
  2 siblings, 1 reply; 9+ messages in thread
From: Cathy Avery @ 2016-09-23 11:58 UTC (permalink / raw)
  To: driverdev-devel

Hi,

You seem to be missing the Reported-by tag.

That's xiaofwan@redhat.com.

Cathy

On 09/14/2016 10:10 PM, Long Li wrote:
> From: Long Li <longli@microsoft.com>
>
> hv_pci_devices_present is called in hv_pci_remove when we remove a PCI device from host (e.g. by disabling SRIOV on a device). In hv_pci_remove, the bus is already removed before the call, so we don't need to rescan the bus in the workqueue scheduled from hv_pci_devices_present. By introducing status hv_pcibus_removed, we can avoid this situation.
>
> The patch fixes the following kernel panic.
>
> [  383.853124] Workqueue: events pci_devices_present_work [pci_hyperv]
> [  383.853124] task: ffff88007f5f8000 ti: ffff88007f600000 task.ti:
> ffff88007f600000
> [  383.853124] RIP: 0010:[<ffffffff81349806>]  [<ffffffff81349806>]
> pci_is_pcie+0x6/0x20
> [  383.853124] RSP: 0018:ffff88007f603d38  EFLAGS: 00010206
> [  383.853124] RAX: ffff88007f5f8000 RBX: 642f3d4854415056 RCX:
> ffff88007f603fd8
> [  383.853124] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> 642f3d4854415056
> [  383.853124] RBP: ffff88007f603d68 R08: 0000000000000246 R09:
> ffffffffa045eb9e
> [  383.853124] R10: ffff88007b419a80 R11: ffffea0001c0ef40 R12:
> ffff880003ee1c00
> [  383.853124] R13: 63702f30303a3137 R14: 0000000000000000 R15:
> 0000000000000246
> [  383.853124] FS:  0000000000000000(0000) GS:ffff88007b400000(0000)
> knlGS:0000000000000000
> [  383.853124] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  383.853124] CR2: 00007f68b3f52350 CR3: 0000000003546000 CR4:
> 00000000000406f0
> [  383.853124] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [  383.853124] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [  383.853124] Stack:
> [  383.853124]  ffff88007f603d68 ffffffff8134db17 0000000000000008
> ffff880003ee1c00
> [  383.853124]  63702f30303a3137 ffff880003d8edb8 ffff88007f603da0
> ffffffff8134ee2d
> [  383.853124]  ffff880003d8ed00 ffff88007f603dd8 ffff880075fec320
> ffff880003d8edb8
> [  383.853124] Call Trace:
> [  383.853124]  [<ffffffff8134db17>] ? pci_scan_slot+0x27/0x140
> [  383.853124]  [<ffffffff8134ee2d>] pci_scan_child_bus+0x3d/0x150
> [  383.853124]  [<ffffffffa045ef5a>]
> pci_devices_present_work+0x3ea/0x400 [pci_hyperv]
> [  383.853124]  [<ffffffff810a682b>] process_one_work+0x17b/0x470
> [  383.853124]  [<ffffffff810a7666>] worker_thread+0x126/0x410
> [  383.853124]  [<ffffffff810a7540>] ? rescuer_thread+0x460/0x460
> [  383.853124]  [<ffffffff810aee1f>] kthread+0xcf/0xe0
> [  383.853124]  [<ffffffff810aed50>] ?
> kthread_create_on_node+0x140/0x140
> [  383.853124]  [<ffffffff81699958>] ret_from_fork+0x58/0x90
> [  383.853124]  [<ffffffff810aed50>] ?
> kthread_create_on_node+0x140/0x140
> [  383.853124] Code: 89 e5 5d 25 f0 00 00 00 c1 f8 04 c3 66 0f 1f 84 00
> 00 00 00 00 66 66 66 66 90 55 0f b6 47 4a 48 89 e5 5d c3 90 66 66 66 66
> 90 55 <80> 7f 4a 00 48 89 e5 5d 0f 95 c0 c3 0f 1f 40 00 66 2e 0f 1f 84
> [  383.853124] RIP  [<ffffffff81349806>] pci_is_pcie+0x6/0x20
> [  383.853124]  RSP <ffff88007f603d38>
>
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>   drivers/pci/host/pci-hyperv.c | 20 +++++++++++++++++---
>   1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
> index a8deeca..4a37598 100644
> --- a/drivers/pci/host/pci-hyperv.c
> +++ b/drivers/pci/host/pci-hyperv.c
> @@ -348,6 +348,7 @@ enum hv_pcibus_state {
>   	hv_pcibus_init = 0,
>   	hv_pcibus_probed,
>   	hv_pcibus_installed,
> +	hv_pcibus_removed,
>   	hv_pcibus_maximum
>   };
>   
> @@ -1481,13 +1482,24 @@ static void pci_devices_present_work(struct work_struct *work)
>   		put_pcichild(hpdev, hv_pcidev_ref_initial);
>   	}
>   
> -	/* Tell the core to rescan bus because there may have been changes. */
> -	if (hbus->state == hv_pcibus_installed) {
> +	switch (hbus->state) {
> +	case hv_pcibus_installed:
> +		/*
> +		 * Tell the core to rescan bus
> +		 * because there may have been changes.
> +		 */
>   		pci_lock_rescan_remove();
>   		pci_scan_child_bus(hbus->pci_bus);
>   		pci_unlock_rescan_remove();
> -	} else {
> +		break;
> +
> +	case hv_pcibus_init:
> +	case hv_pcibus_probed:
>   		survey_child_resources(hbus);
> +		break;
> +
> +	default:
> +		break;
>   	}
>   
>   	up(&hbus->enum_sem);
> @@ -2163,6 +2175,7 @@ static int hv_pci_probe(struct hv_device *hdev,
>   	hbus = kzalloc(sizeof(*hbus), GFP_KERNEL);
>   	if (!hbus)
>   		return -ENOMEM;
> +	hbus->state = hv_pcibus_init;
>   
>   	/*
>   	 * The PCI bus "domain" is what is called "segment" in ACPI and
> @@ -2305,6 +2318,7 @@ static int hv_pci_remove(struct hv_device *hdev)
>   		pci_stop_root_bus(hbus->pci_bus);
>   		pci_remove_root_bus(hbus->pci_bus);
>   		pci_unlock_rescan_remove();
> +		hbus->state = hv_pcibus_removed;
>   	}
>   
>   	ret = hv_send_resources_released(hdev);

_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove
  2016-09-15  2:10 [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove Long Li
  2016-09-15  2:10 ` [PATCH 2/2 v2] pci-hyperv: lock pci bus on device eject Long Li
  2016-09-23 11:58 ` [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove Cathy Avery
@ 2016-09-27 19:29 ` Bjorn Helgaas
  2016-09-27 23:57   ` Long Li
  2016-10-04 14:32     ` KY Srinivasan
  2 siblings, 2 replies; 9+ messages in thread
From: Bjorn Helgaas @ 2016-09-27 19:29 UTC (permalink / raw)
  To: Long Li
  Cc: K. Y. Srinivasan, Haiyang Zhang, Bjorn Helgaas, devel, linux-pci,
	linux-kernel, Long Li

On Wed, Sep 14, 2016 at 07:10:01PM -0700, Long Li wrote:
> From: Long Li <longli@microsoft.com>
> 
> hv_pci_devices_present is called in hv_pci_remove when we remove a PCI device from host (e.g. by disabling SRIOV on a device). In hv_pci_remove, the bus is already removed before the call, so we don't need to rescan the bus in the workqueue scheduled from hv_pci_devices_present. By introducing status hv_pcibus_removed, we can avoid this situation.
> 
> The patch fixes the following kernel panic.
> 
> [  383.853124] Workqueue: events pci_devices_present_work [pci_hyperv]
> [  383.853124] task: ffff88007f5f8000 ti: ffff88007f600000 task.ti:
> ffff88007f600000
> [  383.853124] RIP: 0010:[<ffffffff81349806>]  [<ffffffff81349806>]
> pci_is_pcie+0x6/0x20
> [  383.853124] RSP: 0018:ffff88007f603d38  EFLAGS: 00010206
> [  383.853124] RAX: ffff88007f5f8000 RBX: 642f3d4854415056 RCX:
> ffff88007f603fd8
> [  383.853124] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> 642f3d4854415056
> [  383.853124] RBP: ffff88007f603d68 R08: 0000000000000246 R09:
> ffffffffa045eb9e
> [  383.853124] R10: ffff88007b419a80 R11: ffffea0001c0ef40 R12:
> ffff880003ee1c00
> [  383.853124] R13: 63702f30303a3137 R14: 0000000000000000 R15:
> 0000000000000246
> [  383.853124] FS:  0000000000000000(0000) GS:ffff88007b400000(0000)
> knlGS:0000000000000000
> [  383.853124] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  383.853124] CR2: 00007f68b3f52350 CR3: 0000000003546000 CR4:
> 00000000000406f0
> [  383.853124] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [  383.853124] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [  383.853124] Stack:
> [  383.853124]  ffff88007f603d68 ffffffff8134db17 0000000000000008
> ffff880003ee1c00
> [  383.853124]  63702f30303a3137 ffff880003d8edb8 ffff88007f603da0
> ffffffff8134ee2d
> [  383.853124]  ffff880003d8ed00 ffff88007f603dd8 ffff880075fec320
> ffff880003d8edb8
> [  383.853124] Call Trace:
> [  383.853124]  [<ffffffff8134db17>] ? pci_scan_slot+0x27/0x140
> [  383.853124]  [<ffffffff8134ee2d>] pci_scan_child_bus+0x3d/0x150
> [  383.853124]  [<ffffffffa045ef5a>]
> pci_devices_present_work+0x3ea/0x400 [pci_hyperv]
> [  383.853124]  [<ffffffff810a682b>] process_one_work+0x17b/0x470
> [  383.853124]  [<ffffffff810a7666>] worker_thread+0x126/0x410
> [  383.853124]  [<ffffffff810a7540>] ? rescuer_thread+0x460/0x460
> [  383.853124]  [<ffffffff810aee1f>] kthread+0xcf/0xe0
> [  383.853124]  [<ffffffff810aed50>] ?
> kthread_create_on_node+0x140/0x140
> [  383.853124]  [<ffffffff81699958>] ret_from_fork+0x58/0x90
> [  383.853124]  [<ffffffff810aed50>] ?
> kthread_create_on_node+0x140/0x140
> [  383.853124] Code: 89 e5 5d 25 f0 00 00 00 c1 f8 04 c3 66 0f 1f 84 00
> 00 00 00 00 66 66 66 66 90 55 0f b6 47 4a 48 89 e5 5d c3 90 66 66 66 66
> 90 55 <80> 7f 4a 00 48 89 e5 5d 0f 95 c0 c3 0f 1f 40 00 66 2e 0f 1f 84
> [  383.853124] RIP  [<ffffffff81349806>] pci_is_pcie+0x6/0x20
> [  383.853124]  RSP <ffff88007f603d38>

Personally, I would remove the timestamps and addresses from this trace
because I don't think they contribute to diagnosing the problem.

> Signed-off-by: Long Li <longli@microsoft.com>

I'm ready to apply these but am waiting for an ack from the maintainers
listed in MAINTAINERS (feel free to update that if it's out of date).

> ---
>  drivers/pci/host/pci-hyperv.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
> index a8deeca..4a37598 100644
> --- a/drivers/pci/host/pci-hyperv.c
> +++ b/drivers/pci/host/pci-hyperv.c
> @@ -348,6 +348,7 @@ enum hv_pcibus_state {
>  	hv_pcibus_init = 0,
>  	hv_pcibus_probed,
>  	hv_pcibus_installed,
> +	hv_pcibus_removed,
>  	hv_pcibus_maximum
>  };
>  
> @@ -1481,13 +1482,24 @@ static void pci_devices_present_work(struct work_struct *work)
>  		put_pcichild(hpdev, hv_pcidev_ref_initial);
>  	}
>  
> -	/* Tell the core to rescan bus because there may have been changes. */
> -	if (hbus->state == hv_pcibus_installed) {
> +	switch (hbus->state) {
> +	case hv_pcibus_installed:
> +		/*
> +		 * Tell the core to rescan bus
> +		 * because there may have been changes.
> +		 */
>  		pci_lock_rescan_remove();
>  		pci_scan_child_bus(hbus->pci_bus);
>  		pci_unlock_rescan_remove();
> -	} else {
> +		break;
> +
> +	case hv_pcibus_init:
> +	case hv_pcibus_probed:
>  		survey_child_resources(hbus);
> +		break;
> +
> +	default:
> +		break;
>  	}
>  
>  	up(&hbus->enum_sem);
> @@ -2163,6 +2175,7 @@ static int hv_pci_probe(struct hv_device *hdev,
>  	hbus = kzalloc(sizeof(*hbus), GFP_KERNEL);
>  	if (!hbus)
>  		return -ENOMEM;
> +	hbus->state = hv_pcibus_init;
>  
>  	/*
>  	 * The PCI bus "domain" is what is called "segment" in ACPI and
> @@ -2305,6 +2318,7 @@ static int hv_pci_remove(struct hv_device *hdev)
>  		pci_stop_root_bus(hbus->pci_bus);
>  		pci_remove_root_bus(hbus->pci_bus);
>  		pci_unlock_rescan_remove();
> +		hbus->state = hv_pcibus_removed;
>  	}
>  
>  	ret = hv_send_resources_released(hdev);
> -- 
> 1.8.5.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove
  2016-09-23 11:58 ` [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove Cathy Avery
@ 2016-09-27 23:54   ` Long Li
  2016-09-30 12:43     ` Cathy Avery
  0 siblings, 1 reply; 9+ messages in thread
From: Long Li @ 2016-09-27 23:54 UTC (permalink / raw)
  To: Cathy Avery, driverdev-devel

Thanks for pointing that out.

If you don't mind, I will also add "Tested-by: Cathy Avery <cavery@redhat.com>".

> -----Original Message-----
> From: devel [mailto:driverdev-devel-bounces@linuxdriverproject.org] On
> Behalf Of Cathy Avery
> Sent: Friday, September 23, 2016 4:59 AM
> To: driverdev-devel@linuxdriverproject.org
> Subject: Re: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove
> 
> Hi,
> 
> You seem to be missing the Reported-by tag.
> 
> That's xiaofwan@redhat.com.
> 
> Cathy
> 
> On 09/14/2016 10:10 PM, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > hv_pci_devices_present is called in hv_pci_remove when we remove a PCI
> device from host (e.g. by disabling SRIOV on a device). In hv_pci_remove,
> the bus is already removed before the call, so we don't need to rescan the
> bus in the workqueue scheduled from hv_pci_devices_present. By
> introducing status hv_pcibus_removed, we can avoid this situation.
> >
> > The patch fixes the following kernel panic.
> >
> > [  383.853124] Workqueue: events pci_devices_present_work [pci_hyperv]
> > [  383.853124] task: ffff88007f5f8000 ti: ffff88007f600000 task.ti:
> > ffff88007f600000
> > [  383.853124] RIP: 0010:[<ffffffff81349806>]  [<ffffffff81349806>]
> > pci_is_pcie+0x6/0x20
> > [  383.853124] RSP: 0018:ffff88007f603d38  EFLAGS: 00010206 [
> > 383.853124] RAX: ffff88007f5f8000 RBX: 642f3d4854415056 RCX:
> > ffff88007f603fd8
> > [  383.853124] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> > 642f3d4854415056
> > [  383.853124] RBP: ffff88007f603d68 R08: 0000000000000246 R09:
> > ffffffffa045eb9e
> > [  383.853124] R10: ffff88007b419a80 R11: ffffea0001c0ef40 R12:
> > ffff880003ee1c00
> > [  383.853124] R13: 63702f30303a3137 R14: 0000000000000000 R15:
> > 0000000000000246
> > [  383.853124] FS:  0000000000000000(0000) GS:ffff88007b400000(0000)
> > knlGS:0000000000000000
> > [  383.853124] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [
> > 383.853124] CR2: 00007f68b3f52350 CR3: 0000000003546000 CR4:
> > 00000000000406f0
> > [  383.853124] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [  383.853124] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > 0000000000000400
> > [  383.853124] Stack:
> > [  383.853124]  ffff88007f603d68 ffffffff8134db17 0000000000000008
> > ffff880003ee1c00
> > [  383.853124]  63702f30303a3137 ffff880003d8edb8 ffff88007f603da0
> > ffffffff8134ee2d [  383.853124]  ffff880003d8ed00 ffff88007f603dd8
> > ffff880075fec320
> > ffff880003d8edb8
> > [  383.853124] Call Trace:
> > [  383.853124]  [<ffffffff8134db17>] ? pci_scan_slot+0x27/0x140 [
> > 383.853124]  [<ffffffff8134ee2d>] pci_scan_child_bus+0x3d/0x150 [
> > 383.853124]  [<ffffffffa045ef5a>]
> > pci_devices_present_work+0x3ea/0x400 [pci_hyperv] [  383.853124]
> > [<ffffffff810a682b>] process_one_work+0x17b/0x470 [  383.853124]
> > [<ffffffff810a7666>] worker_thread+0x126/0x410 [  383.853124]
> > [<ffffffff810a7540>] ? rescuer_thread+0x460/0x460 [  383.853124]
> > [<ffffffff810aee1f>] kthread+0xcf/0xe0 [  383.853124]
> > [<ffffffff810aed50>] ?
> > kthread_create_on_node+0x140/0x140
> > [  383.853124]  [<ffffffff81699958>] ret_from_fork+0x58/0x90 [
> > 383.853124]  [<ffffffff810aed50>] ?
> > kthread_create_on_node+0x140/0x140
> > [  383.853124] Code: 89 e5 5d 25 f0 00 00 00 c1 f8 04 c3 66 0f 1f 84
> > 00
> > 00 00 00 00 66 66 66 66 90 55 0f b6 47 4a 48 89 e5 5d c3 90 66 66 66
> > 66
> > 90 55 <80> 7f 4a 00 48 89 e5 5d 0f 95 c0 c3 0f 1f 40 00 66 2e 0f 1f 84
> > [  383.853124] RIP  [<ffffffff81349806>] pci_is_pcie+0x6/0x20 [
> > 383.853124]  RSP <ffff88007f603d38>
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >   drivers/pci/host/pci-hyperv.c | 20 +++++++++++++++++---
> >   1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/host/pci-hyperv.c
> > b/drivers/pci/host/pci-hyperv.c index a8deeca..4a37598 100644
> > --- a/drivers/pci/host/pci-hyperv.c
> > +++ b/drivers/pci/host/pci-hyperv.c
> > @@ -348,6 +348,7 @@ enum hv_pcibus_state {
> >   	hv_pcibus_init = 0,
> >   	hv_pcibus_probed,
> >   	hv_pcibus_installed,
> > +	hv_pcibus_removed,
> >   	hv_pcibus_maximum
> >   };
> >
> > @@ -1481,13 +1482,24 @@ static void pci_devices_present_work(struct
> work_struct *work)
> >   		put_pcichild(hpdev, hv_pcidev_ref_initial);
> >   	}
> >
> > -	/* Tell the core to rescan bus because there may have been changes.
> */
> > -	if (hbus->state == hv_pcibus_installed) {
> > +	switch (hbus->state) {
> > +	case hv_pcibus_installed:
> > +		/*
> > +		 * Tell the core to rescan bus
> > +		 * because there may have been changes.
> > +		 */
> >   		pci_lock_rescan_remove();
> >   		pci_scan_child_bus(hbus->pci_bus);
> >   		pci_unlock_rescan_remove();
> > -	} else {
> > +		break;
> > +
> > +	case hv_pcibus_init:
> > +	case hv_pcibus_probed:
> >   		survey_child_resources(hbus);
> > +		break;
> > +
> > +	default:
> > +		break;
> >   	}
> >
> >   	up(&hbus->enum_sem);
> > @@ -2163,6 +2175,7 @@ static int hv_pci_probe(struct hv_device *hdev,
> >   	hbus = kzalloc(sizeof(*hbus), GFP_KERNEL);
> >   	if (!hbus)
> >   		return -ENOMEM;
> > +	hbus->state = hv_pcibus_init;
> >
> >   	/*
> >   	 * The PCI bus "domain" is what is called "segment" in ACPI and @@
> > -2305,6 +2318,7 @@ static int hv_pci_remove(struct hv_device *hdev)
> >   		pci_stop_root_bus(hbus->pci_bus);
> >   		pci_remove_root_bus(hbus->pci_bus);
> >   		pci_unlock_rescan_remove();
> > +		hbus->state = hv_pcibus_removed;
> >   	}
> >
> >   	ret = hv_send_resources_released(hdev);
> 
> _______________________________________________
> devel mailing list
> devel@linuxdriverproject.org
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdriverd
> ev.linuxdriverproject.org%2Fmailman%2Flistinfo%2Fdriverdev-
> devel&data=01%7C01%7Clongli%40microsoft.com%7Ce61e6f946aeb437969f9
> 08d3e3a9047e%7C72f988bf86f141af91ab2d7cd011db47%7C1&sdata=sAZoL6h
> %2F%2BZ7xHETp1KHOXPUAQSIJN6hUUqUUP4v67rk%3D&reserved=0
_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove
  2016-09-27 19:29 ` Bjorn Helgaas
@ 2016-09-27 23:57   ` Long Li
  2016-10-04 14:32     ` KY Srinivasan
  1 sibling, 0 replies; 9+ messages in thread
From: Long Li @ 2016-09-27 23:57 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: KY Srinivasan, Haiyang Zhang, Bjorn Helgaas, devel, linux-pci,
	linux-kernel



> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> Sent: Tuesday, September 27, 2016 12:30 PM
> To: Long Li <longli@microsoft.com>
> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Bjorn Helgaas <bhelgaas@google.com>;
> devel@linuxdriverproject.org; linux-pci@vger.kernel.org; linux-
> kernel@vger.kernel.org; Long Li <longli@microsoft.com>
> Subject: Re: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove
> 
> On Wed, Sep 14, 2016 at 07:10:01PM -0700, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > hv_pci_devices_present is called in hv_pci_remove when we remove a PCI
> device from host (e.g. by disabling SRIOV on a device). In hv_pci_remove,
> the bus is already removed before the call, so we don't need to rescan the
> bus in the workqueue scheduled from hv_pci_devices_present. By
> introducing status hv_pcibus_removed, we can avoid this situation.
> >
> > The patch fixes the following kernel panic.
> >
> > [  383.853124] Workqueue: events pci_devices_present_work [pci_hyperv]
> > [  383.853124] task: ffff88007f5f8000 ti: ffff88007f600000 task.ti:
> > ffff88007f600000
> > [  383.853124] RIP: 0010:[<ffffffff81349806>]  [<ffffffff81349806>]
> > pci_is_pcie+0x6/0x20
> > [  383.853124] RSP: 0018:ffff88007f603d38  EFLAGS: 00010206 [
> > 383.853124] RAX: ffff88007f5f8000 RBX: 642f3d4854415056 RCX:
> > ffff88007f603fd8
> > [  383.853124] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> > 642f3d4854415056
> > [  383.853124] RBP: ffff88007f603d68 R08: 0000000000000246 R09:
> > ffffffffa045eb9e
> > [  383.853124] R10: ffff88007b419a80 R11: ffffea0001c0ef40 R12:
> > ffff880003ee1c00
> > [  383.853124] R13: 63702f30303a3137 R14: 0000000000000000 R15:
> > 0000000000000246
> > [  383.853124] FS:  0000000000000000(0000) GS:ffff88007b400000(0000)
> > knlGS:0000000000000000
> > [  383.853124] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [
> > 383.853124] CR2: 00007f68b3f52350 CR3: 0000000003546000 CR4:
> > 00000000000406f0
> > [  383.853124] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [  383.853124] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > 0000000000000400
> > [  383.853124] Stack:
> > [  383.853124]  ffff88007f603d68 ffffffff8134db17 0000000000000008
> > ffff880003ee1c00
> > [  383.853124]  63702f30303a3137 ffff880003d8edb8 ffff88007f603da0
> > ffffffff8134ee2d [  383.853124]  ffff880003d8ed00 ffff88007f603dd8
> > ffff880075fec320
> > ffff880003d8edb8
> > [  383.853124] Call Trace:
> > [  383.853124]  [<ffffffff8134db17>] ? pci_scan_slot+0x27/0x140 [
> > 383.853124]  [<ffffffff8134ee2d>] pci_scan_child_bus+0x3d/0x150 [
> > 383.853124]  [<ffffffffa045ef5a>]
> > pci_devices_present_work+0x3ea/0x400 [pci_hyperv] [  383.853124]
> > [<ffffffff810a682b>] process_one_work+0x17b/0x470 [  383.853124]
> > [<ffffffff810a7666>] worker_thread+0x126/0x410 [  383.853124]
> > [<ffffffff810a7540>] ? rescuer_thread+0x460/0x460 [  383.853124]
> > [<ffffffff810aee1f>] kthread+0xcf/0xe0 [  383.853124]
> > [<ffffffff810aed50>] ?
> > kthread_create_on_node+0x140/0x140
> > [  383.853124]  [<ffffffff81699958>] ret_from_fork+0x58/0x90 [
> > 383.853124]  [<ffffffff810aed50>] ?
> > kthread_create_on_node+0x140/0x140
> > [  383.853124] Code: 89 e5 5d 25 f0 00 00 00 c1 f8 04 c3 66 0f 1f 84
> > 00
> > 00 00 00 00 66 66 66 66 90 55 0f b6 47 4a 48 89 e5 5d c3 90 66 66 66
> > 66
> > 90 55 <80> 7f 4a 00 48 89 e5 5d 0f 95 c0 c3 0f 1f 40 00 66 2e 0f 1f 84
> > [  383.853124] RIP  [<ffffffff81349806>] pci_is_pcie+0x6/0x20 [
> > 383.853124]  RSP <ffff88007f603d38>
> 
> Personally, I would remove the timestamps and addresses from this trace
> because I don't think they contribute to diagnosing the problem.

Thanks Bjorn. I will remove those kernel traces and send a v3 patch.

> 
> > Signed-off-by: Long Li <longli@microsoft.com>
> 
> I'm ready to apply these but am waiting for an ack from the maintainers listed
> in MAINTAINERS (feel free to update that if it's out of date).
> 
> > ---
> >  drivers/pci/host/pci-hyperv.c | 20 +++++++++++++++++---
> >  1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/host/pci-hyperv.c
> > b/drivers/pci/host/pci-hyperv.c index a8deeca..4a37598 100644
> > --- a/drivers/pci/host/pci-hyperv.c
> > +++ b/drivers/pci/host/pci-hyperv.c
> > @@ -348,6 +348,7 @@ enum hv_pcibus_state {
> >  	hv_pcibus_init = 0,
> >  	hv_pcibus_probed,
> >  	hv_pcibus_installed,
> > +	hv_pcibus_removed,
> >  	hv_pcibus_maximum
> >  };
> >
> > @@ -1481,13 +1482,24 @@ static void pci_devices_present_work(struct
> work_struct *work)
> >  		put_pcichild(hpdev, hv_pcidev_ref_initial);
> >  	}
> >
> > -	/* Tell the core to rescan bus because there may have been changes.
> */
> > -	if (hbus->state == hv_pcibus_installed) {
> > +	switch (hbus->state) {
> > +	case hv_pcibus_installed:
> > +		/*
> > +		 * Tell the core to rescan bus
> > +		 * because there may have been changes.
> > +		 */
> >  		pci_lock_rescan_remove();
> >  		pci_scan_child_bus(hbus->pci_bus);
> >  		pci_unlock_rescan_remove();
> > -	} else {
> > +		break;
> > +
> > +	case hv_pcibus_init:
> > +	case hv_pcibus_probed:
> >  		survey_child_resources(hbus);
> > +		break;
> > +
> > +	default:
> > +		break;
> >  	}
> >
> >  	up(&hbus->enum_sem);
> > @@ -2163,6 +2175,7 @@ static int hv_pci_probe(struct hv_device *hdev,
> >  	hbus = kzalloc(sizeof(*hbus), GFP_KERNEL);
> >  	if (!hbus)
> >  		return -ENOMEM;
> > +	hbus->state = hv_pcibus_init;
> >
> >  	/*
> >  	 * The PCI bus "domain" is what is called "segment" in ACPI and @@
> > -2305,6 +2318,7 @@ static int hv_pci_remove(struct hv_device *hdev)
> >  		pci_stop_root_bus(hbus->pci_bus);
> >  		pci_remove_root_bus(hbus->pci_bus);
> >  		pci_unlock_rescan_remove();
> > +		hbus->state = hv_pcibus_removed;
> >  	}
> >
> >  	ret = hv_send_resources_released(hdev);
> > --
> > 1.8.5.6
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-pci"
> > in the body of a message to majordomo@vger.kernel.org More
> majordomo
> > info at
> >
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fvger.k
> > ernel.org%2Fmajordomo-
> info.html&data=01%7C01%7Clongli%40microsoft.com%
> >
> 7Cfa9e8dff88df4dfb474808d3e70ca246%7C72f988bf86f141af91ab2d7cd011db
> 47%
> >
> 7C1&sdata=pi7cmLozpu8EqSjH6C0ihAMWtg4FX5Usy9e0PJqpH5Q%3D&reser
> ved=0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove
  2016-09-27 23:54   ` Long Li
@ 2016-09-30 12:43     ` Cathy Avery
  0 siblings, 0 replies; 9+ messages in thread
From: Cathy Avery @ 2016-09-30 12:43 UTC (permalink / raw)
  To: Long Li, driverdev-devel

I have not been able to try out your second patch.

So once that happens you can add my name.

Cathy



On 09/27/2016 07:54 PM, Long Li wrote:
> Thanks for pointing that out.
>
> If you don't mind, I will also add "Tested-by: Cathy Avery <cavery@redhat.com>".
>
>> -----Original Message-----
>> From: devel [mailto:driverdev-devel-bounces@linuxdriverproject.org] On
>> Behalf Of Cathy Avery
>> Sent: Friday, September 23, 2016 4:59 AM
>> To: driverdev-devel@linuxdriverproject.org
>> Subject: Re: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove
>>
>> Hi,
>>
>> You seem to be missing the Reported-by tag.
>>
>> That's xiaofwan@redhat.com.
>>
>> Cathy
>>
>> On 09/14/2016 10:10 PM, Long Li wrote:
>>> From: Long Li <longli@microsoft.com>
>>>
>>> hv_pci_devices_present is called in hv_pci_remove when we remove a PCI
>> device from host (e.g. by disabling SRIOV on a device). In hv_pci_remove,
>> the bus is already removed before the call, so we don't need to rescan the
>> bus in the workqueue scheduled from hv_pci_devices_present. By
>> introducing status hv_pcibus_removed, we can avoid this situation.
>>> The patch fixes the following kernel panic.
>>>
>>> [  383.853124] Workqueue: events pci_devices_present_work [pci_hyperv]
>>> [  383.853124] task: ffff88007f5f8000 ti: ffff88007f600000 task.ti:
>>> ffff88007f600000
>>> [  383.853124] RIP: 0010:[<ffffffff81349806>]  [<ffffffff81349806>]
>>> pci_is_pcie+0x6/0x20
>>> [  383.853124] RSP: 0018:ffff88007f603d38  EFLAGS: 00010206 [
>>> 383.853124] RAX: ffff88007f5f8000 RBX: 642f3d4854415056 RCX:
>>> ffff88007f603fd8
>>> [  383.853124] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
>>> 642f3d4854415056
>>> [  383.853124] RBP: ffff88007f603d68 R08: 0000000000000246 R09:
>>> ffffffffa045eb9e
>>> [  383.853124] R10: ffff88007b419a80 R11: ffffea0001c0ef40 R12:
>>> ffff880003ee1c00
>>> [  383.853124] R13: 63702f30303a3137 R14: 0000000000000000 R15:
>>> 0000000000000246
>>> [  383.853124] FS:  0000000000000000(0000) GS:ffff88007b400000(0000)
>>> knlGS:0000000000000000
>>> [  383.853124] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [
>>> 383.853124] CR2: 00007f68b3f52350 CR3: 0000000003546000 CR4:
>>> 00000000000406f0
>>> [  383.853124] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>>> 0000000000000000
>>> [  383.853124] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>>> 0000000000000400
>>> [  383.853124] Stack:
>>> [  383.853124]  ffff88007f603d68 ffffffff8134db17 0000000000000008
>>> ffff880003ee1c00
>>> [  383.853124]  63702f30303a3137 ffff880003d8edb8 ffff88007f603da0
>>> ffffffff8134ee2d [  383.853124]  ffff880003d8ed00 ffff88007f603dd8
>>> ffff880075fec320
>>> ffff880003d8edb8
>>> [  383.853124] Call Trace:
>>> [  383.853124]  [<ffffffff8134db17>] ? pci_scan_slot+0x27/0x140 [
>>> 383.853124]  [<ffffffff8134ee2d>] pci_scan_child_bus+0x3d/0x150 [
>>> 383.853124]  [<ffffffffa045ef5a>]
>>> pci_devices_present_work+0x3ea/0x400 [pci_hyperv] [  383.853124]
>>> [<ffffffff810a682b>] process_one_work+0x17b/0x470 [  383.853124]
>>> [<ffffffff810a7666>] worker_thread+0x126/0x410 [  383.853124]
>>> [<ffffffff810a7540>] ? rescuer_thread+0x460/0x460 [  383.853124]
>>> [<ffffffff810aee1f>] kthread+0xcf/0xe0 [  383.853124]
>>> [<ffffffff810aed50>] ?
>>> kthread_create_on_node+0x140/0x140
>>> [  383.853124]  [<ffffffff81699958>] ret_from_fork+0x58/0x90 [
>>> 383.853124]  [<ffffffff810aed50>] ?
>>> kthread_create_on_node+0x140/0x140
>>> [  383.853124] Code: 89 e5 5d 25 f0 00 00 00 c1 f8 04 c3 66 0f 1f 84
>>> 00
>>> 00 00 00 00 66 66 66 66 90 55 0f b6 47 4a 48 89 e5 5d c3 90 66 66 66
>>> 66
>>> 90 55 <80> 7f 4a 00 48 89 e5 5d 0f 95 c0 c3 0f 1f 40 00 66 2e 0f 1f 84
>>> [  383.853124] RIP  [<ffffffff81349806>] pci_is_pcie+0x6/0x20 [
>>> 383.853124]  RSP <ffff88007f603d38>
>>>
>>> Signed-off-by: Long Li <longli@microsoft.com>
>>> ---
>>>    drivers/pci/host/pci-hyperv.c | 20 +++++++++++++++++---
>>>    1 file changed, 17 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/pci/host/pci-hyperv.c
>>> b/drivers/pci/host/pci-hyperv.c index a8deeca..4a37598 100644
>>> --- a/drivers/pci/host/pci-hyperv.c
>>> +++ b/drivers/pci/host/pci-hyperv.c
>>> @@ -348,6 +348,7 @@ enum hv_pcibus_state {
>>>    	hv_pcibus_init = 0,
>>>    	hv_pcibus_probed,
>>>    	hv_pcibus_installed,
>>> +	hv_pcibus_removed,
>>>    	hv_pcibus_maximum
>>>    };
>>>
>>> @@ -1481,13 +1482,24 @@ static void pci_devices_present_work(struct
>> work_struct *work)
>>>    		put_pcichild(hpdev, hv_pcidev_ref_initial);
>>>    	}
>>>
>>> -	/* Tell the core to rescan bus because there may have been changes.
>> */
>>> -	if (hbus->state == hv_pcibus_installed) {
>>> +	switch (hbus->state) {
>>> +	case hv_pcibus_installed:
>>> +		/*
>>> +		 * Tell the core to rescan bus
>>> +		 * because there may have been changes.
>>> +		 */
>>>    		pci_lock_rescan_remove();
>>>    		pci_scan_child_bus(hbus->pci_bus);
>>>    		pci_unlock_rescan_remove();
>>> -	} else {
>>> +		break;
>>> +
>>> +	case hv_pcibus_init:
>>> +	case hv_pcibus_probed:
>>>    		survey_child_resources(hbus);
>>> +		break;
>>> +
>>> +	default:
>>> +		break;
>>>    	}
>>>
>>>    	up(&hbus->enum_sem);
>>> @@ -2163,6 +2175,7 @@ static int hv_pci_probe(struct hv_device *hdev,
>>>    	hbus = kzalloc(sizeof(*hbus), GFP_KERNEL);
>>>    	if (!hbus)
>>>    		return -ENOMEM;
>>> +	hbus->state = hv_pcibus_init;
>>>
>>>    	/*
>>>    	 * The PCI bus "domain" is what is called "segment" in ACPI and @@
>>> -2305,6 +2318,7 @@ static int hv_pci_remove(struct hv_device *hdev)
>>>    		pci_stop_root_bus(hbus->pci_bus);
>>>    		pci_remove_root_bus(hbus->pci_bus);
>>>    		pci_unlock_rescan_remove();
>>> +		hbus->state = hv_pcibus_removed;
>>>    	}
>>>
>>>    	ret = hv_send_resources_released(hdev);
>> _______________________________________________
>> devel mailing list
>> devel@linuxdriverproject.org
>> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdriverd
>> ev.linuxdriverproject.org%2Fmailman%2Flistinfo%2Fdriverdev-
>> devel&data=01%7C01%7Clongli%40microsoft.com%7Ce61e6f946aeb437969f9
>> 08d3e3a9047e%7C72f988bf86f141af91ab2d7cd011db47%7C1&sdata=sAZoL6h
>> %2F%2BZ7xHETp1KHOXPUAQSIJN6hUUqUUP4v67rk%3D&reserved=0

_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove
  2016-09-27 19:29 ` Bjorn Helgaas
@ 2016-10-04 14:32     ` KY Srinivasan
  2016-10-04 14:32     ` KY Srinivasan
  1 sibling, 0 replies; 9+ messages in thread
From: KY Srinivasan @ 2016-10-04 14:32 UTC (permalink / raw)
  To: Bjorn Helgaas, Long Li
  Cc: Haiyang Zhang, Bjorn Helgaas, devel, linux-pci, linux-kernel, Long Li



> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> Sent: Tuesday, September 27, 2016 12:30 PM
> To: Long Li <longli@microsoft.com>
> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Bjorn Helgaas <bhelgaas@google.com>;
> devel@linuxdriverproject.org; linux-pci@vger.kernel.org; linux-
> kernel@vger.kernel.org; Long Li <longli@microsoft.com>
> Subject: Re: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove
> 
> On Wed, Sep 14, 2016 at 07:10:01PM -0700, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > hv_pci_devices_present is called in hv_pci_remove when we remove a PCI
> device from host (e.g. by disabling SRIOV on a device). In hv_pci_remove,
> the bus is already removed before the call, so we don't need to rescan the
> bus in the workqueue scheduled from hv_pci_devices_present. By
> introducing status hv_pcibus_removed, we can avoid this situation.
> >
> > The patch fixes the following kernel panic.
> >
> > [  383.853124] Workqueue: events pci_devices_present_work [pci_hyperv]
> > [  383.853124] task: ffff88007f5f8000 ti: ffff88007f600000 task.ti:
> > ffff88007f600000
> > [  383.853124] RIP: 0010:[<ffffffff81349806>]  [<ffffffff81349806>]
> > pci_is_pcie+0x6/0x20
> > [  383.853124] RSP: 0018:ffff88007f603d38  EFLAGS: 00010206
> > [  383.853124] RAX: ffff88007f5f8000 RBX: 642f3d4854415056 RCX:
> > ffff88007f603fd8
> > [  383.853124] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> > 642f3d4854415056
> > [  383.853124] RBP: ffff88007f603d68 R08: 0000000000000246 R09:
> > ffffffffa045eb9e
> > [  383.853124] R10: ffff88007b419a80 R11: ffffea0001c0ef40 R12:
> > ffff880003ee1c00
> > [  383.853124] R13: 63702f30303a3137 R14: 0000000000000000 R15:
> > 0000000000000246
> > [  383.853124] FS:  0000000000000000(0000) GS:ffff88007b400000(0000)
> > knlGS:0000000000000000
> > [  383.853124] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  383.853124] CR2: 00007f68b3f52350 CR3: 0000000003546000 CR4:
> > 00000000000406f0
> > [  383.853124] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [  383.853124] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > 0000000000000400
> > [  383.853124] Stack:
> > [  383.853124]  ffff88007f603d68 ffffffff8134db17 0000000000000008
> > ffff880003ee1c00
> > [  383.853124]  63702f30303a3137 ffff880003d8edb8 ffff88007f603da0
> > ffffffff8134ee2d
> > [  383.853124]  ffff880003d8ed00 ffff88007f603dd8 ffff880075fec320
> > ffff880003d8edb8
> > [  383.853124] Call Trace:
> > [  383.853124]  [<ffffffff8134db17>] ? pci_scan_slot+0x27/0x140
> > [  383.853124]  [<ffffffff8134ee2d>] pci_scan_child_bus+0x3d/0x150
> > [  383.853124]  [<ffffffffa045ef5a>]
> > pci_devices_present_work+0x3ea/0x400 [pci_hyperv]
> > [  383.853124]  [<ffffffff810a682b>] process_one_work+0x17b/0x470
> > [  383.853124]  [<ffffffff810a7666>] worker_thread+0x126/0x410
> > [  383.853124]  [<ffffffff810a7540>] ? rescuer_thread+0x460/0x460
> > [  383.853124]  [<ffffffff810aee1f>] kthread+0xcf/0xe0
> > [  383.853124]  [<ffffffff810aed50>] ?
> > kthread_create_on_node+0x140/0x140
> > [  383.853124]  [<ffffffff81699958>] ret_from_fork+0x58/0x90
> > [  383.853124]  [<ffffffff810aed50>] ?
> > kthread_create_on_node+0x140/0x140
> > [  383.853124] Code: 89 e5 5d 25 f0 00 00 00 c1 f8 04 c3 66 0f 1f 84 00
> > 00 00 00 00 66 66 66 66 90 55 0f b6 47 4a 48 89 e5 5d c3 90 66 66 66 66
> > 90 55 <80> 7f 4a 00 48 89 e5 5d 0f 95 c0 c3 0f 1f 40 00 66 2e 0f 1f 84
> > [  383.853124] RIP  [<ffffffff81349806>] pci_is_pcie+0x6/0x20
> > [  383.853124]  RSP <ffff88007f603d38>
> 
> Personally, I would remove the timestamps and addresses from this trace
> because I don't think they contribute to diagnosing the problem.
> 
> > Signed-off-by: Long Li <longli@microsoft.com>

Acked-by: KY Srinivasan <kys@microsoft.com>

Thanks,

K. Y
> 
> I'm ready to apply these but am waiting for an ack from the maintainers
> listed in MAINTAINERS (feel free to update that if it's out of date).
> 
> > ---
> >  drivers/pci/host/pci-hyperv.c | 20 +++++++++++++++++---
> >  1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
> > index a8deeca..4a37598 100644
> > --- a/drivers/pci/host/pci-hyperv.c
> > +++ b/drivers/pci/host/pci-hyperv.c
> > @@ -348,6 +348,7 @@ enum hv_pcibus_state {
> >  	hv_pcibus_init = 0,
> >  	hv_pcibus_probed,
> >  	hv_pcibus_installed,
> > +	hv_pcibus_removed,
> >  	hv_pcibus_maximum
> >  };
> >
> > @@ -1481,13 +1482,24 @@ static void pci_devices_present_work(struct
> work_struct *work)
> >  		put_pcichild(hpdev, hv_pcidev_ref_initial);
> >  	}
> >
> > -	/* Tell the core to rescan bus because there may have been changes.
> */
> > -	if (hbus->state == hv_pcibus_installed) {
> > +	switch (hbus->state) {
> > +	case hv_pcibus_installed:
> > +		/*
> > +		 * Tell the core to rescan bus
> > +		 * because there may have been changes.
> > +		 */
> >  		pci_lock_rescan_remove();
> >  		pci_scan_child_bus(hbus->pci_bus);
> >  		pci_unlock_rescan_remove();
> > -	} else {
> > +		break;
> > +
> > +	case hv_pcibus_init:
> > +	case hv_pcibus_probed:
> >  		survey_child_resources(hbus);
> > +		break;
> > +
> > +	default:
> > +		break;
> >  	}
> >
> >  	up(&hbus->enum_sem);
> > @@ -2163,6 +2175,7 @@ static int hv_pci_probe(struct hv_device *hdev,
> >  	hbus = kzalloc(sizeof(*hbus), GFP_KERNEL);
> >  	if (!hbus)
> >  		return -ENOMEM;
> > +	hbus->state = hv_pcibus_init;
> >
> >  	/*
> >  	 * The PCI bus "domain" is what is called "segment" in ACPI and
> > @@ -2305,6 +2318,7 @@ static int hv_pci_remove(struct hv_device *hdev)
> >  		pci_stop_root_bus(hbus->pci_bus);
> >  		pci_remove_root_bus(hbus->pci_bus);
> >  		pci_unlock_rescan_remove();
> > +		hbus->state = hv_pcibus_removed;
> >  	}
> >
> >  	ret = hv_send_resources_released(hdev);
> > --
> > 1.8.5.6
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fvger.ke
> rnel.org%2Fmajordomo-
> info.html&data=01%7C01%7Ckys%40microsoft.com%7C16b06c0f0b744cc493c
> c08d3e70ca23f%7C72f988bf86f141af91ab2d7cd011db47%7C1&sdata=mdY6Lg
> az4D1S360p7orMYAHz6MfpVEp%2F412G2aBUHPw%3D&reserved=0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove
@ 2016-10-04 14:32     ` KY Srinivasan
  0 siblings, 0 replies; 9+ messages in thread
From: KY Srinivasan @ 2016-10-04 14:32 UTC (permalink / raw)
  To: Bjorn Helgaas, Long Li
  Cc: Haiyang Zhang, Bjorn Helgaas, devel, linux-pci, linux-kernel, Long Li



> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> Sent: Tuesday, September 27, 2016 12:30 PM
> To: Long Li <longli@microsoft.com>
> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Bjorn Helgaas <bhelgaas@google.com>;
> devel@linuxdriverproject.org; linux-pci@vger.kernel.org; linux-
> kernel@vger.kernel.org; Long Li <longli@microsoft.com>
> Subject: Re: [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove
>=20
> On Wed, Sep 14, 2016 at 07:10:01PM -0700, Long Li wrote:
> > From: Long Li <longli@microsoft.com>
> >
> > hv_pci_devices_present is called in hv_pci_remove when we remove a PCI
> device from host (e.g. by disabling SRIOV on a device). In hv_pci_remove,
> the bus is already removed before the call, so we don't need to rescan th=
e
> bus in the workqueue scheduled from hv_pci_devices_present. By
> introducing status hv_pcibus_removed, we can avoid this situation.
> >
> > The patch fixes the following kernel panic.
> >
> > [  383.853124] Workqueue: events pci_devices_present_work [pci_hyperv]
> > [  383.853124] task: ffff88007f5f8000 ti: ffff88007f600000 task.ti:
> > ffff88007f600000
> > [  383.853124] RIP: 0010:[<ffffffff81349806>]  [<ffffffff81349806>]
> > pci_is_pcie+0x6/0x20
> > [  383.853124] RSP: 0018:ffff88007f603d38  EFLAGS: 00010206
> > [  383.853124] RAX: ffff88007f5f8000 RBX: 642f3d4854415056 RCX:
> > ffff88007f603fd8
> > [  383.853124] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> > 642f3d4854415056
> > [  383.853124] RBP: ffff88007f603d68 R08: 0000000000000246 R09:
> > ffffffffa045eb9e
> > [  383.853124] R10: ffff88007b419a80 R11: ffffea0001c0ef40 R12:
> > ffff880003ee1c00
> > [  383.853124] R13: 63702f30303a3137 R14: 0000000000000000 R15:
> > 0000000000000246
> > [  383.853124] FS:  0000000000000000(0000) GS:ffff88007b400000(0000)
> > knlGS:0000000000000000
> > [  383.853124] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  383.853124] CR2: 00007f68b3f52350 CR3: 0000000003546000 CR4:
> > 00000000000406f0
> > [  383.853124] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [  383.853124] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> > 0000000000000400
> > [  383.853124] Stack:
> > [  383.853124]  ffff88007f603d68 ffffffff8134db17 0000000000000008
> > ffff880003ee1c00
> > [  383.853124]  63702f30303a3137 ffff880003d8edb8 ffff88007f603da0
> > ffffffff8134ee2d
> > [  383.853124]  ffff880003d8ed00 ffff88007f603dd8 ffff880075fec320
> > ffff880003d8edb8
> > [  383.853124] Call Trace:
> > [  383.853124]  [<ffffffff8134db17>] ? pci_scan_slot+0x27/0x140
> > [  383.853124]  [<ffffffff8134ee2d>] pci_scan_child_bus+0x3d/0x150
> > [  383.853124]  [<ffffffffa045ef5a>]
> > pci_devices_present_work+0x3ea/0x400 [pci_hyperv]
> > [  383.853124]  [<ffffffff810a682b>] process_one_work+0x17b/0x470
> > [  383.853124]  [<ffffffff810a7666>] worker_thread+0x126/0x410
> > [  383.853124]  [<ffffffff810a7540>] ? rescuer_thread+0x460/0x460
> > [  383.853124]  [<ffffffff810aee1f>] kthread+0xcf/0xe0
> > [  383.853124]  [<ffffffff810aed50>] ?
> > kthread_create_on_node+0x140/0x140
> > [  383.853124]  [<ffffffff81699958>] ret_from_fork+0x58/0x90
> > [  383.853124]  [<ffffffff810aed50>] ?
> > kthread_create_on_node+0x140/0x140
> > [  383.853124] Code: 89 e5 5d 25 f0 00 00 00 c1 f8 04 c3 66 0f 1f 84 00
> > 00 00 00 00 66 66 66 66 90 55 0f b6 47 4a 48 89 e5 5d c3 90 66 66 66 66
> > 90 55 <80> 7f 4a 00 48 89 e5 5d 0f 95 c0 c3 0f 1f 40 00 66 2e 0f 1f 84
> > [  383.853124] RIP  [<ffffffff81349806>] pci_is_pcie+0x6/0x20
> > [  383.853124]  RSP <ffff88007f603d38>
>=20
> Personally, I would remove the timestamps and addresses from this trace
> because I don't think they contribute to diagnosing the problem.
>=20
> > Signed-off-by: Long Li <longli@microsoft.com>

Acked-by: KY Srinivasan <kys@microsoft.com>

Thanks,

K. Y
>=20
> I'm ready to apply these but am waiting for an ack from the maintainers
> listed in MAINTAINERS (feel free to update that if it's out of date).
>=20
> > ---
> >  drivers/pci/host/pci-hyperv.c | 20 +++++++++++++++++---
> >  1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyper=
v.c
> > index a8deeca..4a37598 100644
> > --- a/drivers/pci/host/pci-hyperv.c
> > +++ b/drivers/pci/host/pci-hyperv.c
> > @@ -348,6 +348,7 @@ enum hv_pcibus_state {
> >  	hv_pcibus_init =3D 0,
> >  	hv_pcibus_probed,
> >  	hv_pcibus_installed,
> > +	hv_pcibus_removed,
> >  	hv_pcibus_maximum
> >  };
> >
> > @@ -1481,13 +1482,24 @@ static void pci_devices_present_work(struct
> work_struct *work)
> >  		put_pcichild(hpdev, hv_pcidev_ref_initial);
> >  	}
> >
> > -	/* Tell the core to rescan bus because there may have been changes.
> */
> > -	if (hbus->state =3D=3D hv_pcibus_installed) {
> > +	switch (hbus->state) {
> > +	case hv_pcibus_installed:
> > +		/*
> > +		 * Tell the core to rescan bus
> > +		 * because there may have been changes.
> > +		 */
> >  		pci_lock_rescan_remove();
> >  		pci_scan_child_bus(hbus->pci_bus);
> >  		pci_unlock_rescan_remove();
> > -	} else {
> > +		break;
> > +
> > +	case hv_pcibus_init:
> > +	case hv_pcibus_probed:
> >  		survey_child_resources(hbus);
> > +		break;
> > +
> > +	default:
> > +		break;
> >  	}
> >
> >  	up(&hbus->enum_sem);
> > @@ -2163,6 +2175,7 @@ static int hv_pci_probe(struct hv_device *hdev,
> >  	hbus =3D kzalloc(sizeof(*hbus), GFP_KERNEL);
> >  	if (!hbus)
> >  		return -ENOMEM;
> > +	hbus->state =3D hv_pcibus_init;
> >
> >  	/*
> >  	 * The PCI bus "domain" is what is called "segment" in ACPI and
> > @@ -2305,6 +2318,7 @@ static int hv_pci_remove(struct hv_device *hdev)
> >  		pci_stop_root_bus(hbus->pci_bus);
> >  		pci_remove_root_bus(hbus->pci_bus);
> >  		pci_unlock_rescan_remove();
> > +		hbus->state =3D hv_pcibus_removed;
> >  	}
> >
> >  	ret =3D hv_send_resources_released(hdev);
> > --
> > 1.8.5.6
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at
> https://na01.safelinks.protection.outlook.com/?url=3Dhttp%3A%2F%2Fvger.ke
> rnel.org%2Fmajordomo-
> info.html&data=3D01%7C01%7Ckys%40microsoft.com%7C16b06c0f0b744cc493c
> c08d3e70ca23f%7C72f988bf86f141af91ab2d7cd011db47%7C1&sdata=3DmdY6Lg
> az4D1S360p7orMYAHz6MfpVEp%2F412G2aBUHPw%3D&reserved=3D0

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-10-04 14:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-15  2:10 [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove Long Li
2016-09-15  2:10 ` [PATCH 2/2 v2] pci-hyperv: lock pci bus on device eject Long Li
2016-09-23 11:58 ` [PATCH 1/2 v2] pci-hyperv: properly handle pci bus remove Cathy Avery
2016-09-27 23:54   ` Long Li
2016-09-30 12:43     ` Cathy Avery
2016-09-27 19:29 ` Bjorn Helgaas
2016-09-27 23:57   ` Long Li
2016-10-04 14:32   ` KY Srinivasan
2016-10-04 14:32     ` KY Srinivasan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.