linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch v2 1/2] PCI: hv: Fix a race condition when removing the device
@ 2021-04-22  5:45 longli
  2021-04-22  5:45 ` [Patch v2 2/2] PCI: hv: Remove unused refcount and supporting functions for handling bus device removal longli
  2021-04-23  7:09 ` [Patch v2 1/2] PCI: hv: Fix a race condition when removing the device Dexuan Cui
  0 siblings, 2 replies; 11+ messages in thread
From: longli @ 2021-04-22  5:45 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Lorenzo Pieralisi, Rob Herring, Bjorn Helgaas, linux-hyperv,
	linux-pci, linux-kernel
  Cc: Long Li

From: Long Li <longli@microsoft.com>

On removing the device, any work item (hv_pci_devices_present() or
hv_pci_eject_device()) scheduled on workqueue hbus->wq may still be running
and race with hv_pci_remove().

This can happen because the host may send PCI_EJECT or PCI_BUS_RELATIONS(2)
and decide to rescind the channel immediately after that.

Fix this by flushing/stopping the workqueue of hbus before doing hbus remove.

Signed-off-by: Long Li <longli@microsoft.com>
---

Change in v2: Remove unused bus state hv_pcibus_removed

 drivers/pci/controller/pci-hyperv.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 27a17a1e4a7c..fc948a2ed703 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -444,7 +444,6 @@ enum hv_pcibus_state {
 	hv_pcibus_probed,
 	hv_pcibus_installed,
 	hv_pcibus_removing,
-	hv_pcibus_removed,
 	hv_pcibus_maximum
 };
 
@@ -3305,13 +3304,22 @@ static int hv_pci_remove(struct hv_device *hdev)
 
 	hbus = hv_get_drvdata(hdev);
 	if (hbus->state == hv_pcibus_installed) {
+		tasklet_disable(&hdev->channel->callback_event);
+		hbus->state = hv_pcibus_removing;
+		tasklet_enable(&hdev->channel->callback_event);
+		destroy_workqueue(hbus->wq);
+		/*
+		 * At this point, no work is running or can be scheduled
+		 * on hbus-wq. We can't race with hv_pci_devices_present()
+		 * or hv_pci_eject_device(), it's safe to proceed.
+		 */
+
 		/* Remove the bus from PCI's point of view. */
 		pci_lock_rescan_remove();
 		pci_stop_root_bus(hbus->pci_bus);
 		hv_pci_remove_slots(hbus);
 		pci_remove_root_bus(hbus->pci_bus);
 		pci_unlock_rescan_remove();
-		hbus->state = hv_pcibus_removed;
 	}
 
 	ret = hv_pci_bus_exit(hdev, false);
@@ -3326,7 +3334,6 @@ static int hv_pci_remove(struct hv_device *hdev)
 	irq_domain_free_fwnode(hbus->sysdata.fwnode);
 	put_hvpcibus(hbus);
 	wait_for_completion(&hbus->remove_event);
-	destroy_workqueue(hbus->wq);
 
 	hv_put_dom_num(hbus->sysdata.domain);
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Patch v2 2/2] PCI: hv: Remove unused refcount and supporting functions for handling bus device removal
  2021-04-22  5:45 [Patch v2 1/2] PCI: hv: Fix a race condition when removing the device longli
@ 2021-04-22  5:45 ` longli
  2021-04-23  7:18   ` Dexuan Cui
  2021-04-23  7:09 ` [Patch v2 1/2] PCI: hv: Fix a race condition when removing the device Dexuan Cui
  1 sibling, 1 reply; 11+ messages in thread
From: longli @ 2021-04-22  5:45 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Lorenzo Pieralisi, Rob Herring, Bjorn Helgaas, linux-hyperv,
	linux-pci, linux-kernel
  Cc: Long Li

From: Long Li <longli@microsoft.com>

With the new method of flushing/stopping the workqueue before doing bus
removal, the old mechanisum of using refcount and wait for completion
is no longer needed. Remove those dead code.

Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/pci/controller/pci-hyperv.c | 34 +++--------------------------
 1 file changed, 3 insertions(+), 31 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index fc948a2ed703..e6b4ee323068 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -452,7 +452,6 @@ struct hv_pcibus_device {
 	/* Protocol version negotiated with the host */
 	enum pci_protocol_version_t protocol_version;
 	enum hv_pcibus_state state;
-	refcount_t remove_lock;
 	struct hv_device *hdev;
 	resource_size_t low_mmio_space;
 	resource_size_t high_mmio_space;
@@ -460,7 +459,6 @@ struct hv_pcibus_device {
 	struct resource *low_mmio_res;
 	struct resource *high_mmio_res;
 	struct completion *survey_event;
-	struct completion remove_event;
 	struct pci_bus *pci_bus;
 	spinlock_t config_lock;	/* Avoid two threads writing index page */
 	spinlock_t device_list_lock;	/* Protect lists below */
@@ -593,9 +591,6 @@ static void put_pcichild(struct hv_pci_dev *hpdev)
 		kfree(hpdev);
 }
 
-static void get_hvpcibus(struct hv_pcibus_device *hv_pcibus);
-static void put_hvpcibus(struct hv_pcibus_device *hv_pcibus);
-
 /*
  * There is no good way to get notified from vmbus_onoffer_rescind(),
  * so let's use polling here, since this is not a hot path.
@@ -2067,10 +2062,8 @@ static void pci_devices_present_work(struct work_struct *work)
 	}
 	spin_unlock_irqrestore(&hbus->device_list_lock, flags);
 
-	if (!dr) {
-		put_hvpcibus(hbus);
+	if (!dr)
 		return;
-	}
 
 	/* First, mark all existing children as reported missing. */
 	spin_lock_irqsave(&hbus->device_list_lock, flags);
@@ -2153,7 +2146,6 @@ static void pci_devices_present_work(struct work_struct *work)
 		break;
 	}
 
-	put_hvpcibus(hbus);
 	kfree(dr);
 }
 
@@ -2194,12 +2186,10 @@ static int hv_pci_start_relations_work(struct hv_pcibus_device *hbus,
 	list_add_tail(&dr->list_entry, &hbus->dr_list);
 	spin_unlock_irqrestore(&hbus->device_list_lock, flags);
 
-	if (pending_dr) {
+	if (pending_dr)
 		kfree(dr_wrk);
-	} else {
-		get_hvpcibus(hbus);
+	else
 		queue_work(hbus->wq, &dr_wrk->wrk);
-	}
 
 	return 0;
 }
@@ -2342,8 +2332,6 @@ static void hv_eject_device_work(struct work_struct *work)
 	put_pcichild(hpdev);
 	put_pcichild(hpdev);
 	/* hpdev has been freed. Do not use it any more. */
-
-	put_hvpcibus(hbus);
 }
 
 /**
@@ -2367,7 +2355,6 @@ static void hv_pci_eject_device(struct hv_pci_dev *hpdev)
 	hpdev->state = hv_pcichild_ejecting;
 	get_pcichild(hpdev);
 	INIT_WORK(&hpdev->wrk, hv_eject_device_work);
-	get_hvpcibus(hbus);
 	queue_work(hbus->wq, &hpdev->wrk);
 }
 
@@ -2967,17 +2954,6 @@ static int hv_send_resources_released(struct hv_device *hdev)
 	return 0;
 }
 
-static void get_hvpcibus(struct hv_pcibus_device *hbus)
-{
-	refcount_inc(&hbus->remove_lock);
-}
-
-static void put_hvpcibus(struct hv_pcibus_device *hbus)
-{
-	if (refcount_dec_and_test(&hbus->remove_lock))
-		complete(&hbus->remove_event);
-}
-
 #define HVPCI_DOM_MAP_SIZE (64 * 1024)
 static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
 
@@ -3097,14 +3073,12 @@ static int hv_pci_probe(struct hv_device *hdev,
 	hbus->sysdata.domain = dom;
 
 	hbus->hdev = hdev;
-	refcount_set(&hbus->remove_lock, 1);
 	INIT_LIST_HEAD(&hbus->children);
 	INIT_LIST_HEAD(&hbus->dr_list);
 	INIT_LIST_HEAD(&hbus->resources_for_children);
 	spin_lock_init(&hbus->config_lock);
 	spin_lock_init(&hbus->device_list_lock);
 	spin_lock_init(&hbus->retarget_msi_interrupt_lock);
-	init_completion(&hbus->remove_event);
 	hbus->wq = alloc_ordered_workqueue("hv_pci_%x", 0,
 					   hbus->sysdata.domain);
 	if (!hbus->wq) {
@@ -3332,8 +3306,6 @@ static int hv_pci_remove(struct hv_device *hdev)
 	hv_pci_free_bridge_windows(hbus);
 	irq_domain_remove(hbus->irq_domain);
 	irq_domain_free_fwnode(hbus->sysdata.fwnode);
-	put_hvpcibus(hbus);
-	wait_for_completion(&hbus->remove_event);
 
 	hv_put_dom_num(hbus->sysdata.domain);
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [Patch v2 1/2] PCI: hv: Fix a race condition when removing the device
  2021-04-22  5:45 [Patch v2 1/2] PCI: hv: Fix a race condition when removing the device longli
  2021-04-22  5:45 ` [Patch v2 2/2] PCI: hv: Remove unused refcount and supporting functions for handling bus device removal longli
@ 2021-04-23  7:09 ` Dexuan Cui
  2021-04-23 18:32   ` Long Li
  1 sibling, 1 reply; 11+ messages in thread
From: Dexuan Cui @ 2021-04-23  7:09 UTC (permalink / raw)
  To: longli, KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Lorenzo Pieralisi, Rob Herring, Bjorn Helgaas, linux-hyperv,
	linux-pci, linux-kernel
  Cc: Long Li

> From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>
> Sent: Wednesday, April 21, 2021 10:46 PM
> ...
> diff --git a/drivers/pci/controller/pci-hyperv.c
> b/drivers/pci/controller/pci-hyperv.c
> index 27a17a1e4a7c..fc948a2ed703 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -444,7 +444,6 @@ enum hv_pcibus_state {
>  	hv_pcibus_probed,
>  	hv_pcibus_installed,
>  	hv_pcibus_removing,
> -	hv_pcibus_removed,
>  	hv_pcibus_maximum
>  };
> 
> @@ -3305,13 +3304,22 @@ static int hv_pci_remove(struct hv_device *hdev)
> 
>  	hbus = hv_get_drvdata(hdev);
>  	if (hbus->state == hv_pcibus_installed) {
> +		tasklet_disable(&hdev->channel->callback_event);
> +		hbus->state = hv_pcibus_removing;
> +		tasklet_enable(&hdev->channel->callback_event);
> +		destroy_workqueue(hbus->wq);

If we test "rmmod pci-hyperv", I suspect the warning will be printed:
hv_pci_remove() -> hv_pci_bus_exit() -> hv_pci_start_relations_work():

        if (hbus->state == hv_pcibus_removing) {
                dev_info(&hbus->hdev->device,
                         "PCI VMBus BUS_RELATIONS: ignored\n");
                return -ENOENT;
        }

Ideally we'd like to avoid the warning in the driver unloading case.

BTW, can you please add "hbus->wq = NULL;" after the line
"destroy_workqueue(hbus->wq);"? In case some other function could
still try to use hbus->wq by accident in the future, the error would be
easier to be understood.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [Patch v2 2/2] PCI: hv: Remove unused refcount and supporting functions for handling bus device removal
  2021-04-22  5:45 ` [Patch v2 2/2] PCI: hv: Remove unused refcount and supporting functions for handling bus device removal longli
@ 2021-04-23  7:18   ` Dexuan Cui
  2021-04-23 18:40     ` Long Li
  0 siblings, 1 reply; 11+ messages in thread
From: Dexuan Cui @ 2021-04-23  7:18 UTC (permalink / raw)
  To: longli, KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Lorenzo Pieralisi, Rob Herring, Bjorn Helgaas, linux-hyperv,
	linux-pci, linux-kernel
  Cc: Long Li

> From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>
> Sent: Wednesday, April 21, 2021 10:46 PM
> 
> With the new method of flushing/stopping the workqueue before doing bus
> removal, the old mechanisum of using refcount and wait for completion

mechanisum -> mechanism

> is no longer needed. Remove those dead code.
> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---

The patch looks good to me. BTW, can we also remove get_pcichild() and
put_pcichild() in an extra patch? I suspect we don't really need those either.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [Patch v2 1/2] PCI: hv: Fix a race condition when removing the device
  2021-04-23  7:09 ` [Patch v2 1/2] PCI: hv: Fix a race condition when removing the device Dexuan Cui
@ 2021-04-23 18:32   ` Long Li
  2021-04-23 18:41     ` Dexuan Cui
  0 siblings, 1 reply; 11+ messages in thread
From: Long Li @ 2021-04-23 18:32 UTC (permalink / raw)
  To: Dexuan Cui, longli, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Wei Liu, Lorenzo Pieralisi, Rob Herring,
	Bjorn Helgaas, linux-hyperv, linux-pci, linux-kernel

> Subject: RE: [Patch v2 1/2] PCI: hv: Fix a race condition when removing the
> device
> 
> > From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>
> > Sent: Wednesday, April 21, 2021 10:46 PM ...
> > diff --git a/drivers/pci/controller/pci-hyperv.c
> > b/drivers/pci/controller/pci-hyperv.c
> > index 27a17a1e4a7c..fc948a2ed703 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -444,7 +444,6 @@ enum hv_pcibus_state {
> >  	hv_pcibus_probed,
> >  	hv_pcibus_installed,
> >  	hv_pcibus_removing,
> > -	hv_pcibus_removed,
> >  	hv_pcibus_maximum
> >  };
> >
> > @@ -3305,13 +3304,22 @@ static int hv_pci_remove(struct hv_device
> > *hdev)
> >
> >  	hbus = hv_get_drvdata(hdev);
> >  	if (hbus->state == hv_pcibus_installed) {
> > +		tasklet_disable(&hdev->channel->callback_event);
> > +		hbus->state = hv_pcibus_removing;
> > +		tasklet_enable(&hdev->channel->callback_event);
> > +		destroy_workqueue(hbus->wq);
> 
> If we test "rmmod pci-hyperv", I suspect the warning will be printed:
> hv_pci_remove() -> hv_pci_bus_exit() -> hv_pci_start_relations_work():

In most case, it will not print anything. 

It will print something if there is a PCI_BUS_RELATION work pending at the time of remove. The same goes to PCI_EJECT. In those cases, the message is valuable to troubleshooting.

> 
>         if (hbus->state == hv_pcibus_removing) {
>                 dev_info(&hbus->hdev->device,
>                          "PCI VMBus BUS_RELATIONS: ignored\n");
>                 return -ENOENT;
>         }
> 
> Ideally we'd like to avoid the warning in the driver unloading case.
> 
> BTW, can you please add "hbus->wq = NULL;" after the line
> "destroy_workqueue(hbus->wq);"? In case some other function could still
> try to use hbus->wq by accident in the future, the error would be easier to
> be understood.

I will send v3 to add =NULL.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [Patch v2 2/2] PCI: hv: Remove unused refcount and supporting functions for handling bus device removal
  2021-04-23  7:18   ` Dexuan Cui
@ 2021-04-23 18:40     ` Long Li
  2021-04-25  2:57       ` Dexuan Cui
  0 siblings, 1 reply; 11+ messages in thread
From: Long Li @ 2021-04-23 18:40 UTC (permalink / raw)
  To: Dexuan Cui, longli, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Wei Liu, Lorenzo Pieralisi, Rob Herring,
	Bjorn Helgaas, linux-hyperv, linux-pci, linux-kernel

> Subject: RE: [Patch v2 2/2] PCI: hv: Remove unused refcount and supporting
> functions for handling bus device removal
> 
> > From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>
> > Sent: Wednesday, April 21, 2021 10:46 PM
> >
> > With the new method of flushing/stopping the workqueue before doing
> > bus removal, the old mechanisum of using refcount and wait for
> > completion
> 
> mechanisum -> mechanism
> 
> > is no longer needed. Remove those dead code.
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> 
> The patch looks good to me. BTW, can we also remove get_pcichild() and
> put_pcichild() in an extra patch? I suspect we don't really need those either.

Those two functions are for protecting accessing to the devices on the hbus. There are interactions from PCI layer that need guarantee from hbus that the device is present at the time of access.

Why do you think we don't' need those?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [Patch v2 1/2] PCI: hv: Fix a race condition when removing the device
  2021-04-23 18:32   ` Long Li
@ 2021-04-23 18:41     ` Dexuan Cui
  2021-04-23 18:49       ` Long Li
  0 siblings, 1 reply; 11+ messages in thread
From: Dexuan Cui @ 2021-04-23 18:41 UTC (permalink / raw)
  To: Long Li, longli, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Wei Liu, Lorenzo Pieralisi, Rob Herring, Bjorn Helgaas,
	linux-hyperv, linux-pci, linux-kernel

> From: Long Li <longli@microsoft.com>
> Sent: Friday, April 23, 2021 11:32 AM
> > ...
> > If we test "rmmod pci-hyperv", I suspect the warning will be printed:
> > hv_pci_remove() -> hv_pci_bus_exit() -> hv_pci_start_relations_work():
> 
> In most case, it will not print anything.

If I read the code correctly, I think the warning is printed _every time_ we
unload pci-hyperv.

> It will print something if there is a PCI_BUS_RELATION work pending at the time
> of remove. The same goes to PCI_EJECT. In those cases, the message is valuable
> to troubleshooting.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [Patch v2 1/2] PCI: hv: Fix a race condition when removing the device
  2021-04-23 18:41     ` Dexuan Cui
@ 2021-04-23 18:49       ` Long Li
  2021-04-25  2:24         ` Dexuan Cui
  0 siblings, 1 reply; 11+ messages in thread
From: Long Li @ 2021-04-23 18:49 UTC (permalink / raw)
  To: Dexuan Cui, longli, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Wei Liu, Lorenzo Pieralisi, Rob Herring,
	Bjorn Helgaas, linux-hyperv, linux-pci, linux-kernel

> Subject: RE: [Patch v2 1/2] PCI: hv: Fix a race condition when removing the
> device
> 
> > From: Long Li <longli@microsoft.com>
> > Sent: Friday, April 23, 2021 11:32 AM
> > > ...
> > > If we test "rmmod pci-hyperv", I suspect the warning will be printed:
> > > hv_pci_remove() -> hv_pci_bus_exit() -> hv_pci_start_relations_work():
> >
> > In most case, it will not print anything.
> 
> If I read the code correctly, I think the warning is printed _every time_ we
> unload pci-hyperv.

Okay I see what you mean. I'll remove this message.

> 
> > It will print something if there is a PCI_BUS_RELATION work pending at
> > the time of remove. The same goes to PCI_EJECT. In those cases, the
> > message is valuable to troubleshooting.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [Patch v2 1/2] PCI: hv: Fix a race condition when removing the device
  2021-04-23 18:49       ` Long Li
@ 2021-04-25  2:24         ` Dexuan Cui
  2021-04-25  4:53           ` Long Li
  0 siblings, 1 reply; 11+ messages in thread
From: Dexuan Cui @ 2021-04-25  2:24 UTC (permalink / raw)
  To: Long Li, longli, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Wei Liu, Lorenzo Pieralisi, Rob Herring, Bjorn Helgaas,
	linux-hyperv, linux-pci, linux-kernel

> From: Long Li <longli@microsoft.com>
> Sent: Friday, April 23, 2021 11:49 AM
> To: Dexuan Cui <decui@microsoft.com>; longli@linuxonhyperv.com; KY
> 
> > Subject: RE: [Patch v2 1/2] PCI: hv: Fix a race condition when removing the
> > device
> >
> > > From: Long Li <longli@microsoft.com>
> > > Sent: Friday, April 23, 2021 11:32 AM
> > > > ...
> > > > If we test "rmmod pci-hyperv", I suspect the warning will be printed:
> > > > hv_pci_remove() -> hv_pci_bus_exit() -> hv_pci_start_relations_work():
> > >
> > > In most case, it will not print anything.
> >
> > If I read the code correctly, I think the warning is printed _every time_ we
> > unload pci-hyperv.
> 
> Okay I see what you mean. I'll remove this message.

Here we just want to avoid the message every time the pci-hyperv driver is
unloaded. We might want to see the possible message when the PCI device
is removed, but it's ok to me if the message is unconditionally removed.

The real issus with the patch is that the 'hpdev' struct is never freed when
the driver is unloaded: if we print out the value of the ref counter in
put_pcichild(), we would notice that the ref counter is still two when the
driver is unloaded, i.e. memory leak occurs.

Before the patch, hv_pci_remove() calls hv_pci_bus_exit() ->
hv_pci_start_relations_work(), and the ref counter drops to zero in
pci_devices_present_work() due to the two calls of put_pcichild().

With the patch, when the driver is unloaded, pci_devices_present_work() 
is not scheduled, hence the ref counter doesn't drop to zero.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [Patch v2 2/2] PCI: hv: Remove unused refcount and supporting functions for handling bus device removal
  2021-04-23 18:40     ` Long Li
@ 2021-04-25  2:57       ` Dexuan Cui
  0 siblings, 0 replies; 11+ messages in thread
From: Dexuan Cui @ 2021-04-25  2:57 UTC (permalink / raw)
  To: Long Li, longli, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Wei Liu, Lorenzo Pieralisi, Rob Herring, Bjorn Helgaas,
	linux-hyperv, linux-pci, linux-kernel

> From: Long Li <longli@microsoft.com>
> Sent: Friday, April 23, 2021 11:40 AM
> To: Dexuan Cui <decui@microsoft.com>; longli@linuxonhyperv.com; KY
> 
> > Subject: RE: [Patch v2 2/2] PCI: hv: Remove unused refcount and supporting
> > functions for handling bus device removal
> >
> > > From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>
> > > Sent: Wednesday, April 21, 2021 10:46 PM
> > >
> > > With the new method of flushing/stopping the workqueue before doing
> > > bus removal, the old mechanisum of using refcount and wait for
> > > completion
> >
> > mechanisum -> mechanism
> >
> > > is no longer needed. Remove those dead code.
> > >
> > > Signed-off-by: Long Li <longli@microsoft.com>
> >
> > The patch looks good to me. BTW, can we also remove get_pcichild() and
> > put_pcichild() in an extra patch? I suspect we don't really need those either.
> 
> Those two functions are for protecting accessing to the devices on the hbus.
> There are interactions from PCI layer that need guarantee from hbus that the
> device is present at the time of access.
> 
> Why do you think we don't' need those?

IMO there is proper locking and synchronization logic in the PCI layer, so
I don't think the 'hpdev' struct can vanish when it's being accessed from the
PCI layer.

I think the 'hpdev' struct can only be freed in two scenarios:
1) the PCI device is removed: the VM receives the PCI_EJECT message,
hv_eject_device_work() calls pci_stop_and_remove_bus_device() to deregister
the pci_dev from the PCI layer, and then does other cleanup, and finally
call kfree(hpdev) in the third put_pcichild() in hv_eject_device_work().

2) the pci-hyperv driver is unloaded: in this case, hv_pci_remove() calls
pci_remove_root_bus() to deregister the pci_dev, and then calls
hv_pci_bus_exit() -> hv_pci_start_relations_work(), and eventually
pci_devices_present_work() decreases the ref counter to zero and free
the 'hpdev'.

In both the case, when the hpdev or the pdev is still being used by the
PCI layer, I think the pci_stop_and_remove_bus_device() and
pci_remove_root_bus() should be blocked, i.e. the hpdev can't be freed
even if we don't have the ref counter mechanism.

For example. we know the 'lspci' program can read the PCI device's config
space directly via the sysfile /sys/bus/pci/devices/XXXX/config; when
'lspci' is reading the config space, the code path can be:

do_syscall_64
  ksys_pread64
    vfs_read
      new_sync_read
        kernfs_fop_read_iter
          kernfs_file_read_iter
            kernfs_get_active -> atomic_inc_unless_negative(&kn->active).
            pci_read_config
              pci_user_read_config_dword
                hv_pcifront_read_config
                  _hv_pcifront_read_config

At this moment, if the host tries to remove the PCI device, the PCI_EJECT
code path will be blocked due to the kn->active:

hv_eject_device_work()
  pci_stop_and_remove_bus_device()
    pci_stop_bus_device
      pci_remove_sysfs_dev_files
        kernfs_remove_by_name_ns
          __kernfs_remove
            kernfs_drain
               wait_event(root->deactivate_waitq,
                        atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);

I don't check all the scenarios and code paths, but generally speaking
I suppose the PCI subsystem should already have the proper locking and
synchronization logic we need here.

PS, I'm not sure if the host can remove the device by only sending
a PCI_BUS_RELATIONS message with bus_rel->device_count == 0 (i.e. no
PCI_EJECT message): in this case, if we don't have the ref counter for
hpdev, pci_devices_present_work() frees the hpdev before calling
pci_scan_child_bus() and this can be a potential race condition. But IMO
this can be easily fixed by moving "free the hpdev" to a later place, afer
pci_scan_child_bus() is called.

Thanks,
-- Dexuan


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [Patch v2 1/2] PCI: hv: Fix a race condition when removing the device
  2021-04-25  2:24         ` Dexuan Cui
@ 2021-04-25  4:53           ` Long Li
  0 siblings, 0 replies; 11+ messages in thread
From: Long Li @ 2021-04-25  4:53 UTC (permalink / raw)
  To: Dexuan Cui, longli, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Wei Liu, Lorenzo Pieralisi, Rob Herring,
	Bjorn Helgaas, linux-hyperv, linux-pci, linux-kernel

> Subject: RE: [Patch v2 1/2] PCI: hv: Fix a race condition when removing the
> device
> 
> > From: Long Li <longli@microsoft.com>
> > Sent: Friday, April 23, 2021 11:49 AM
> > To: Dexuan Cui <decui@microsoft.com>; longli@linuxonhyperv.com; KY
> >
> > > Subject: RE: [Patch v2 1/2] PCI: hv: Fix a race condition when
> > > removing the device
> > >
> > > > From: Long Li <longli@microsoft.com>
> > > > Sent: Friday, April 23, 2021 11:32 AM
> > > > > ...
> > > > > If we test "rmmod pci-hyperv", I suspect the warning will be printed:
> > > > > hv_pci_remove() -> hv_pci_bus_exit() ->
> hv_pci_start_relations_work():
> > > >
> > > > In most case, it will not print anything.
> > >
> > > If I read the code correctly, I think the warning is printed _every
> > > time_ we unload pci-hyperv.
> >
> > Okay I see what you mean. I'll remove this message.
> 
> Here we just want to avoid the message every time the pci-hyperv driver is
> unloaded. We might want to see the possible message when the PCI device
> is removed, but it's ok to me if the message is unconditionally removed.
> 
> The real issus with the patch is that the 'hpdev' struct is never freed when
> the driver is unloaded: if we print out the value of the ref counter in
> put_pcichild(), we would notice that the ref counter is still two when the
> driver is unloaded, i.e. memory leak occurs.
> 
> Before the patch, hv_pci_remove() calls hv_pci_bus_exit() ->
> hv_pci_start_relations_work(), and the ref counter drops to zero in
> pci_devices_present_work() due to the two calls of put_pcichild().
> 
> With the patch, when the driver is unloaded, pci_devices_present_work() is
> not scheduled, hence the ref counter doesn't drop to zero.

Yes, I also see the leak, thanks to this warning message. Those will get fixed in v3.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-04-25  4:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-22  5:45 [Patch v2 1/2] PCI: hv: Fix a race condition when removing the device longli
2021-04-22  5:45 ` [Patch v2 2/2] PCI: hv: Remove unused refcount and supporting functions for handling bus device removal longli
2021-04-23  7:18   ` Dexuan Cui
2021-04-23 18:40     ` Long Li
2021-04-25  2:57       ` Dexuan Cui
2021-04-23  7:09 ` [Patch v2 1/2] PCI: hv: Fix a race condition when removing the device Dexuan Cui
2021-04-23 18:32   ` Long Li
2021-04-23 18:41     ` Dexuan Cui
2021-04-23 18:49       ` Long Li
2021-04-25  2:24         ` Dexuan Cui
2021-04-25  4:53           ` Long Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).