Linux-HyperV Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] PCI: hv: Fix a race condition when removing the device
@ 2021-04-19 19:20 longli
  2021-04-21 17:33 ` Michael Kelley
  0 siblings, 1 reply; 7+ messages in thread
From: longli @ 2021-04-19 19:20 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Lorenzo Pieralisi, Rob Herring, Bjorn Helgaas, linux-hyperv,
	linux-pci, linux-kernel
  Cc: Long Li

From: Long Li <longli@microsoft.com>

On removing the device, any work item (hv_pci_devices_present() or
hv_pci_eject_device()) scheduled on workqueue hbus->wq may still be running
and race with hv_pci_remove().

This can happen because the host may send PCI_EJECT or PCI_BUS_RELATIONS(2)
and decide to rescind the channel immediately after that.

Fix this by flushing/stopping the workqueue of hbus before doing hbus remove.

Signed-off-by: Long Li <longli@microsoft.com>
---
 drivers/pci/controller/pci-hyperv.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 27a17a1e4a7c..116815404313 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3305,6 +3305,17 @@ static int hv_pci_remove(struct hv_device *hdev)
 
 	hbus = hv_get_drvdata(hdev);
 	if (hbus->state == hv_pcibus_installed) {
+		tasklet_disable(&hdev->channel->callback_event);
+		hbus->state = hv_pcibus_removing;
+		tasklet_enable(&hdev->channel->callback_event);
+
+		flush_workqueue(hbus->wq);
+		/*
+		 * At this point, no work is running or can be scheduled
+		 * on hbus-wq. We can't race with hv_pci_devices_present()
+		 * or hv_pci_eject_device(), it's safe to proceed.
+		 */
+
 		/* Remove the bus from PCI's point of view. */
 		pci_lock_rescan_remove();
 		pci_stop_root_bus(hbus->pci_bus);
-- 
2.27.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] PCI: hv: Fix a race condition when removing the device
  2021-04-19 19:20 [PATCH] PCI: hv: Fix a race condition when removing the device longli
@ 2021-04-21 17:33 ` Michael Kelley
  2021-04-21 19:56   ` Long Li
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Kelley @ 2021-04-21 17:33 UTC (permalink / raw)
  To: longli, KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Lorenzo Pieralisi, Rob Herring, Bjorn Helgaas, linux-hyperv,
	linux-pci, linux-kernel, Dexuan Cui
  Cc: Long Li

From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>  Sent: Monday, April 19, 2021 12:21 PM
> 
> On removing the device, any work item (hv_pci_devices_present() or
> hv_pci_eject_device()) scheduled on workqueue hbus->wq may still be running
> and race with hv_pci_remove().
> 
> This can happen because the host may send PCI_EJECT or PCI_BUS_RELATIONS(2)
> and decide to rescind the channel immediately after that.
> 
> Fix this by flushing/stopping the workqueue of hbus before doing hbus remove.

I can see that this change follows the same pattern as in hv_pci_suspend().   The
comments there give a full explanation of the issue and the solution.  But
interestingly, the current code also has a reference count mechanism on
the hbus.  And code near the end of hv_pci_remove() decrements the reference
count and then waits for all users to finish before destroying the workqueue.
With this change, is this reference counting mechanism still needed?   If the
workqueue has already been emptied, it seems like the wait_for_completion()
near the end of hv_pci_remove() would never be waiting for anything.  It makes
me wonder if moving the reference count checking code from near the end of
hv_pci_remove() up to near the beginning would solve the problem as well
(and maybe in hv_pci_suspend also?).

Michael 

> 
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
>  drivers/pci/controller/pci-hyperv.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 27a17a1e4a7c..116815404313 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -3305,6 +3305,17 @@ static int hv_pci_remove(struct hv_device *hdev)
> 
>  	hbus = hv_get_drvdata(hdev);
>  	if (hbus->state == hv_pcibus_installed) {
> +		tasklet_disable(&hdev->channel->callback_event);
> +		hbus->state = hv_pcibus_removing;
> +		tasklet_enable(&hdev->channel->callback_event);
> +
> +		flush_workqueue(hbus->wq);
> +		/*
> +		 * At this point, no work is running or can be scheduled
> +		 * on hbus-wq. We can't race with hv_pci_devices_present()
> +		 * or hv_pci_eject_device(), it's safe to proceed.
> +		 */
> +
>  		/* Remove the bus from PCI's point of view. */
>  		pci_lock_rescan_remove();
>  		pci_stop_root_bus(hbus->pci_bus);
> --
> 2.27.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] PCI: hv: Fix a race condition when removing the device
  2021-04-21 17:33 ` Michael Kelley
@ 2021-04-21 19:56   ` Long Li
  2021-04-21 21:05     ` Michael Kelley
  0 siblings, 1 reply; 7+ messages in thread
From: Long Li @ 2021-04-21 19:56 UTC (permalink / raw)
  To: Michael Kelley, longli, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Wei Liu, Lorenzo Pieralisi, Rob Herring,
	Bjorn Helgaas, linux-hyperv, linux-pci, linux-kernel, Dexuan Cui

> Subject: RE: [PATCH] PCI: hv: Fix a race condition when removing the device
> 
> From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>  Sent:
> Monday, April 19, 2021 12:21 PM
> >
> > On removing the device, any work item (hv_pci_devices_present() or
> > hv_pci_eject_device()) scheduled on workqueue hbus->wq may still be
> > running and race with hv_pci_remove().
> >
> > This can happen because the host may send PCI_EJECT or
> > PCI_BUS_RELATIONS(2) and decide to rescind the channel immediately
> after that.
> >
> > Fix this by flushing/stopping the workqueue of hbus before doing hbus
> remove.
> 
> I can see that this change follows the same pattern as in hv_pci_suspend().
> The
> comments there give a full explanation of the issue and the solution.  But
> interestingly, the current code also has a reference count mechanism on the
> hbus.  And code near the end of hv_pci_remove() decrements the reference
> count and then waits for all users to finish before destroying the workqueue.
> With this change, is this reference counting mechanism still needed?   If the
> workqueue has already been emptied, it seems like the
> wait_for_completion() near the end of hv_pci_remove() would never be
> waiting for anything.  It makes me wonder if moving the reference count
> checking code from near the end of
> hv_pci_remove() up to near the beginning would solve the problem as well
> (and maybe in hv_pci_suspend also?).

Yes I think put_hvpcibus() and get_hvpcibus() can be removed, as we have changed to use a dedicated workqueue for hbus since they were introduced.

But we still need to call tasklet_disable/enable() the same way hv_pci_suspend() does, the reason is that we need to protect hbus->state. This value needs to be consistent for the driver. For example, a CPU may decide to schedule a work on a work queue that we just flushed or destroyed, by reading the wrong hbus->state.

> 
> Michael
> 
> >
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> >  drivers/pci/controller/pci-hyperv.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/drivers/pci/controller/pci-hyperv.c
> > b/drivers/pci/controller/pci-hyperv.c
> > index 27a17a1e4a7c..116815404313 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -3305,6 +3305,17 @@ static int hv_pci_remove(struct hv_device
> > *hdev)
> >
> >  	hbus = hv_get_drvdata(hdev);
> >  	if (hbus->state == hv_pcibus_installed) {
> > +		tasklet_disable(&hdev->channel->callback_event);
> > +		hbus->state = hv_pcibus_removing;
> > +		tasklet_enable(&hdev->channel->callback_event);
> > +
> > +		flush_workqueue(hbus->wq);
> > +		/*
> > +		 * At this point, no work is running or can be scheduled
> > +		 * on hbus-wq. We can't race with hv_pci_devices_present()
> > +		 * or hv_pci_eject_device(), it's safe to proceed.
> > +		 */
> > +
> >  		/* Remove the bus from PCI's point of view. */
> >  		pci_lock_rescan_remove();
> >  		pci_stop_root_bus(hbus->pci_bus);
> > --
> > 2.27.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] PCI: hv: Fix a race condition when removing the device
  2021-04-21 19:56   ` Long Li
@ 2021-04-21 21:05     ` Michael Kelley
  2021-04-22  2:31       ` Dexuan Cui
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Kelley @ 2021-04-21 21:05 UTC (permalink / raw)
  To: Long Li, longli, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Wei Liu, Lorenzo Pieralisi, Rob Herring, Bjorn Helgaas,
	linux-hyperv, linux-pci, linux-kernel, Dexuan Cui

From: Long Li <longli@microsoft.com> Sent: Wednesday, April 21, 2021 12:57 PM
> > From: longli@linuxonhyperv.com <longli@linuxonhyperv.com>  Sent:
> > Monday, April 19, 2021 12:21 PM
> > >
> > > On removing the device, any work item (hv_pci_devices_present() or
> > > hv_pci_eject_device()) scheduled on workqueue hbus->wq may still be
> > > running and race with hv_pci_remove().
> > >
> > > This can happen because the host may send PCI_EJECT or
> > > PCI_BUS_RELATIONS(2) and decide to rescind the channel immediately
> > after that.
> > >
> > > Fix this by flushing/stopping the workqueue of hbus before doing hbus
> > remove.
> >
> > I can see that this change follows the same pattern as in hv_pci_suspend().
> > The comments there give a full explanation of the issue and the solution.  But
> > interestingly, the current code also has a reference count mechanism on the
> > hbus.  And code near the end of hv_pci_remove() decrements the reference
> > count and then waits for all users to finish before destroying the workqueue.
> > With this change, is this reference counting mechanism still needed?   If the
> > workqueue has already been emptied, it seems like the
> > wait_for_completion() near the end of hv_pci_remove() would never be
> > waiting for anything.  It makes me wonder if moving the reference count
> > checking code from near the end of hv_pci_remove() up to near the beginning
> > would solve the problem as well (and maybe in hv_pci_suspend also?).
> 
> Yes I think put_hvpcibus() and get_hvpcibus() can be removed, as we have changed to use
> a dedicated workqueue for hbus since they were introduced.
> 
> But we still need to call tasklet_disable/enable() the same way hv_pci_suspend() does, the
> reason is that we need to protect hbus->state. This value needs to be consistent for the
> driver. For example, a CPU may decide to schedule a work on a work queue that we just
> flushed or destroyed, by reading the wrong hbus->state.
> 

Yes, I would agree the tasklet disable/enable are needed, especially since tasklet_disable()
is what ensures that the tasklet is not currently running.

If the hbus ref counting isn't needed any longer, I would strongly recommend adding
a patch to the series that removes it.  This synchronization stuff is hard enough to
understand and reason about; having a leftover mechanism that doesn't really do
anything useful makes it nearly impossible. :-)

Dexuan -- I'm hoping you can take a look as well and see if you agree.

Michael

> >
> > Michael
> >
> > >
> > > Signed-off-by: Long Li <longli@microsoft.com>
> > > ---
> > >  drivers/pci/controller/pci-hyperv.c | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > >
> > > diff --git a/drivers/pci/controller/pci-hyperv.c
> > > b/drivers/pci/controller/pci-hyperv.c
> > > index 27a17a1e4a7c..116815404313 100644
> > > --- a/drivers/pci/controller/pci-hyperv.c
> > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > @@ -3305,6 +3305,17 @@ static int hv_pci_remove(struct hv_device
> > > *hdev)
> > >
> > >  	hbus = hv_get_drvdata(hdev);
> > >  	if (hbus->state == hv_pcibus_installed) {
> > > +		tasklet_disable(&hdev->channel->callback_event);
> > > +		hbus->state = hv_pcibus_removing;
> > > +		tasklet_enable(&hdev->channel->callback_event);
> > > +
> > > +		flush_workqueue(hbus->wq);
> > > +		/*
> > > +		 * At this point, no work is running or can be scheduled
> > > +		 * on hbus-wq. We can't race with hv_pci_devices_present()
> > > +		 * or hv_pci_eject_device(), it's safe to proceed.
> > > +		 */
> > > +
> > >  		/* Remove the bus from PCI's point of view. */
> > >  		pci_lock_rescan_remove();
> > >  		pci_stop_root_bus(hbus->pci_bus);
> > > --
> > > 2.27.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] PCI: hv: Fix a race condition when removing the device
  2021-04-21 21:05     ` Michael Kelley
@ 2021-04-22  2:31       ` Dexuan Cui
  2021-04-22  3:57         ` Long Li
  0 siblings, 1 reply; 7+ messages in thread
From: Dexuan Cui @ 2021-04-22  2:31 UTC (permalink / raw)
  To: Michael Kelley, Long Li, longli, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Wei Liu, Lorenzo Pieralisi, Rob Herring,
	Bjorn Helgaas, linux-hyperv, linux-pci, linux-kernel

> From: Michael Kelley <mikelley@microsoft.com>
> Sent: Wednesday, April 21, 2021 2:06 PM
>  ...
> > Yes I think put_hvpcibus() and get_hvpcibus() can be removed, as we have
> > changed to use
> > a dedicated workqueue for hbus since they were introduced.
> >
> > But we still need to call tasklet_disable/enable() the same way
> > hv_pci_suspend() does, the
> > reason is that we need to protect hbus->state. This value needs to be
> consistent for the
> > driver. For example, a CPU may decide to schedule a work on a work queue
> that we just
> > flushed or destroyed, by reading the wrong hbus->state.
> >
> 
> Yes, I would agree the tasklet disable/enable are needed, especially since
> tasklet_disable()
> is what ensures that the tasklet is not currently running.
> 
> If the hbus ref counting isn't needed any longer, I would strongly recommend
> adding
> a patch to the series that removes it.  This synchronization stuff is hard
> enough to
> understand and reason about; having a leftover mechanism that doesn't really
> do
> anything useful makes it nearly impossible. :-)
> 
> Dexuan -- I'm hoping you can take a look as well and see if you agree.
> 
> Michael

I also think we can remove the reference counting.

But it looks like there is still race in hv_pci_remove() even with Long's
patch: in hv_pci_remove(), we disable the tasklet, change hbus->state to
hv_pcibus_removing, re-enable the tasklet and flush hbus->wq, and set
hbus->state to hv_pcibus_removed -- what if the channel callback runs
again? -- now hbus->state is no longer hv_pcibus_removing, so
hv_pci_devices_present() -> hv_pci_start_relations_work() and
hv_pci_eject_device() can still add new work items to hbus->wq, and the new
work items may race with the vmbus_close().

It looks like we should remove the state hv_pcibus_removed?

Thanks,
-- Dexuan


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] PCI: hv: Fix a race condition when removing the device
  2021-04-22  2:31       ` Dexuan Cui
@ 2021-04-22  3:57         ` Long Li
  2021-04-22  4:19           ` Long Li
  0 siblings, 1 reply; 7+ messages in thread
From: Long Li @ 2021-04-22  3:57 UTC (permalink / raw)
  To: Dexuan Cui, Michael Kelley, longli, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Wei Liu, Lorenzo Pieralisi, Rob Herring,
	Bjorn Helgaas, linux-hyperv, linux-pci, linux-kernel

> Subject: RE: [PATCH] PCI: hv: Fix a race condition when removing the device
> 
> > From: Michael Kelley <mikelley@microsoft.com>
> > Sent: Wednesday, April 21, 2021 2:06 PM  ...
> > > Yes I think put_hvpcibus() and get_hvpcibus() can be removed, as we
> > > have changed to use a dedicated workqueue for hbus since they were
> > > introduced.
> > >
> > > But we still need to call tasklet_disable/enable() the same way
> > > hv_pci_suspend() does, the
> > > reason is that we need to protect hbus->state. This value needs to
> > > be
> > consistent for the
> > > driver. For example, a CPU may decide to schedule a work on a work
> > > queue
> > that we just
> > > flushed or destroyed, by reading the wrong hbus->state.
> > >
> >
> > Yes, I would agree the tasklet disable/enable are needed, especially
> > since
> > tasklet_disable()
> > is what ensures that the tasklet is not currently running.
> >
> > If the hbus ref counting isn't needed any longer, I would strongly
> > recommend adding a patch to the series that removes it.  This
> > synchronization stuff is hard enough to understand and reason about;
> > having a leftover mechanism that doesn't really do anything useful
> > makes it nearly impossible. :-)
> >
> > Dexuan -- I'm hoping you can take a look as well and see if you agree.
> >
> > Michael
> 
> I also think we can remove the reference counting.
> 
> But it looks like there is still race in hv_pci_remove() even with Long's
> patch: in hv_pci_remove(), we disable the tasklet, change hbus->state to
> hv_pcibus_removing, re-enable the tasklet and flush hbus->wq, and set
> hbus->state to hv_pcibus_removed -- what if the channel callback runs
> again? -- now hbus->state is no longer hv_pcibus_removing, so
> hv_pci_devices_present() -> hv_pci_start_relations_work() and
> hv_pci_eject_device() can still add new work items to hbus->wq, and the
> new work items may race with the vmbus_close().

Good catch, adding a check for hv_pcibus_removed should fix it. I'm sending a v2.

> 
> It looks like we should remove the state hv_pcibus_removed?
> 
> Thanks,
> -- Dexuan


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] PCI: hv: Fix a race condition when removing the device
  2021-04-22  3:57         ` Long Li
@ 2021-04-22  4:19           ` Long Li
  0 siblings, 0 replies; 7+ messages in thread
From: Long Li @ 2021-04-22  4:19 UTC (permalink / raw)
  To: Long Li, Dexuan Cui, Michael Kelley, longli, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Wei Liu, Lorenzo Pieralisi,
	Rob Herring, Bjorn Helgaas, linux-hyperv, linux-pci,
	linux-kernel

> Subject: RE: [PATCH] PCI: hv: Fix a race condition when removing the device
> 
> > Subject: RE: [PATCH] PCI: hv: Fix a race condition when removing the
> > device
> >
> > > From: Michael Kelley <mikelley@microsoft.com>
> > > Sent: Wednesday, April 21, 2021 2:06 PM  ...
> > > > Yes I think put_hvpcibus() and get_hvpcibus() can be removed, as
> > > > we have changed to use a dedicated workqueue for hbus since they
> > > > were introduced.
> > > >
> > > > But we still need to call tasklet_disable/enable() the same way
> > > > hv_pci_suspend() does, the
> > > > reason is that we need to protect hbus->state. This value needs to
> > > > be
> > > consistent for the
> > > > driver. For example, a CPU may decide to schedule a work on a work
> > > > queue
> > > that we just
> > > > flushed or destroyed, by reading the wrong hbus->state.
> > > >
> > >
> > > Yes, I would agree the tasklet disable/enable are needed, especially
> > > since
> > > tasklet_disable()
> > > is what ensures that the tasklet is not currently running.
> > >
> > > If the hbus ref counting isn't needed any longer, I would strongly
> > > recommend adding a patch to the series that removes it.  This
> > > synchronization stuff is hard enough to understand and reason about;
> > > having a leftover mechanism that doesn't really do anything useful
> > > makes it nearly impossible. :-)
> > >
> > > Dexuan -- I'm hoping you can take a look as well and see if you agree.
> > >
> > > Michael
> >
> > I also think we can remove the reference counting.
> >
> > But it looks like there is still race in hv_pci_remove() even with
> > Long's
> > patch: in hv_pci_remove(), we disable the tasklet, change hbus->state
> > to hv_pcibus_removing, re-enable the tasklet and flush hbus->wq, and
> > set
> > hbus->state to hv_pcibus_removed -- what if the channel callback runs
> > again? -- now hbus->state is no longer hv_pcibus_removing, so
> > hv_pci_devices_present() -> hv_pci_start_relations_work() and
> > hv_pci_eject_device() can still add new work items to hbus->wq, and
> > the new work items may race with the vmbus_close().
> 
> Good catch, adding a check for hv_pcibus_removed should fix it. I'm sending
> a v2.

I don't see the reason why we want to assign hbus->state to hv_pcibus_removed at all, so just removing it will take care of the race.

> 
> >
> > It looks like we should remove the state hv_pcibus_removed?
> >
> > Thanks,
> > -- Dexuan


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-19 19:20 [PATCH] PCI: hv: Fix a race condition when removing the device longli
2021-04-21 17:33 ` Michael Kelley
2021-04-21 19:56   ` Long Li
2021-04-21 21:05     ` Michael Kelley
2021-04-22  2:31       ` Dexuan Cui
2021-04-22  3:57         ` Long Li
2021-04-22  4:19           ` Long Li

Linux-HyperV Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-hyperv/0 linux-hyperv/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-hyperv linux-hyperv/ https://lore.kernel.org/linux-hyperv \
		linux-hyperv@vger.kernel.org
	public-inbox-index linux-hyperv

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-hyperv


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git