linux-kernel.vger.kernel.org archive mirror
* [PATCH] virtio: Remove virtio device during shutdown
@ 2015-03-11  8:09 Fam Zheng
  2015-03-11  9:06 ` Michael S. Tsirkin
  0 siblings, 1 reply; 7+ messages in thread
From: Fam Zheng @ 2015-03-11  8:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Rusty Russell, Michael S. Tsirkin, virtualization, Paolo Bonzini,
	Jason Wang

Currently shutdown is nop for virtio devices, but the core code could
remove things behind us such as MSI-X handler etc. For example in the
case of virtio-scsi-pci, the device may still try to send interrupts,
which will be on IRQ lines seeing MSI-X disabled. Those interrupts will
be unhandled, and may cause flood.

Remove the device in "shutdown" callback to allow device drivers clean
up things.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 drivers/virtio/virtio.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 5ce2aa4..12f1f1e 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -269,6 +269,19 @@ static int virtio_dev_remove(struct device *_d)
 	return 0;
 }
 
+static void virtio_dev_shutdown(struct device *_d)
+{
+	struct virtio_device *dev = dev_to_virtio(_d);
+	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
+
+	virtio_config_disable(dev);
+
+	drv->remove(dev);
+
+	/* Driver should have reset device. */
+	WARN_ON_ONCE(dev->config->get_status(dev));
+}
+
 static struct bus_type virtio_bus = {
 	.name  = "virtio",
 	.match = virtio_dev_match,
@@ -276,6 +289,7 @@ static struct bus_type virtio_bus = {
 	.uevent = virtio_uevent,
 	.probe = virtio_dev_probe,
 	.remove = virtio_dev_remove,
+	.shutdown = virtio_dev_shutdown,
 };
 
 bool virtio_device_is_legacy_only(struct virtio_device_id id)
-- 
1.9.3



* Re: [PATCH] virtio: Remove virtio device during shutdown
  2015-03-11  8:09 [PATCH] virtio: Remove virtio device during shutdown Fam Zheng
@ 2015-03-11  9:06 ` Michael S. Tsirkin
  2015-03-11 10:11   ` Fam Zheng
  0 siblings, 1 reply; 7+ messages in thread
From: Michael S. Tsirkin @ 2015-03-11  9:06 UTC (permalink / raw)
  To: Fam Zheng
  Cc: linux-kernel, Rusty Russell, virtualization, Paolo Bonzini, Jason Wang

On Wed, Mar 11, 2015 at 04:09:17PM +0800, Fam Zheng wrote:
> Currently shutdown is nop for virtio devices, but the core code could
> remove things behind us such as MSI-X handler etc. For example in the
> case of virtio-scsi-pci, the device may still try to send interrupts,
> which will be on IRQ lines seeing MSI-X disabled. Those interrupts will
> be unhandled, and may cause flood.

This sounds very tentative. Do you, in fact, observe some problems
with virtio scsi? How can they be reproduced? This needs to go
into the commit message.

> Remove the device in "shutdown" callback to allow device drivers clean
> up things.
> 
> Signed-off-by: Fam Zheng <famz@redhat.com>

I'm concerned this will cause more hangs on shutdown: one
of the reasons for reboot is device mal-functioning.
How about we just reset devices instead? Something like
the below (untested).

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 5ce2aa4..0769941 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -269,6 +269,17 @@ static int virtio_dev_remove(struct device *_d)
 	return 0;
 }
 
+static void virtio_dev_shutdown(struct device *_d)
+{
+	struct virtio_device *dev = dev_to_virtio(_d);
+	/*
+	 * Reset the device to make it stop sending interrupts, DMA, etc.
+	 * We are shutting down, no need for full cleanup.
+	 */
+	dev->config->reset(dev);
+
+}
+
 static struct bus_type virtio_bus = {
 	.name  = "virtio",
 	.match = virtio_dev_match,
@@ -276,6 +288,7 @@ static struct bus_type virtio_bus = {
 	.uevent = virtio_uevent,
 	.probe = virtio_dev_probe,
 	.remove = virtio_dev_remove,
+	.shutdown = virtio_dev_shutdown,
 };
 
 bool virtio_device_is_legacy_only(struct virtio_device_id id)


* Re: [PATCH] virtio: Remove virtio device during shutdown
  2015-03-11  9:06 ` Michael S. Tsirkin
@ 2015-03-11 10:11   ` Fam Zheng
  2015-03-12 16:22     ` Michael S. Tsirkin
  0 siblings, 1 reply; 7+ messages in thread
From: Fam Zheng @ 2015-03-11 10:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Rusty Russell, virtualization, Paolo Bonzini, Jason Wang

On Wed, 03/11 10:06, Michael S. Tsirkin wrote:
> On Wed, Mar 11, 2015 at 04:09:17PM +0800, Fam Zheng wrote:
> > Currently shutdown is nop for virtio devices, but the core code could
> > remove things behind us such as MSI-X handler etc. For example in the
> > case of virtio-scsi-pci, the device may still try to send interrupts,
> > which will be on IRQ lines seeing MSI-X disabled. Those interrupts will
> > be unhandled, and may cause flood.

Here is the problem I want to solve - file system driver hang:

If fs code happens to hit __wait_on_buffer right after pci_device_shutdown
disabled msix, it will never make progress because the requests it waits for
will never be completed. So the system hangs.

In other words we will want to reset virtio device before pci_device_shutdown
AND wake up all waiters.

Unfortunately, neither your patch nor mine does that, because the virtio bus can
be shut down after the pci bus (thanks to Jason for pointing this out). In that case,
any completion after disabling msix is lost.

Maybe we need both the pci shutdown handler to reset the device and the virtio
shutdown handler to remove the device?

Fam

> 
> This sounds very tentative. Do you, in fact, observe some problems
> with virtio scsi? How can they be reproduced? This needs to go
> into the commit message.

OK, my bad.

> 
> > Remove the device in "shutdown" callback to allow device drivers clean
> > up things.
> > 
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> 
> I'm concerned this will cause more hangs on shutdown: one
> of the reasons for reboot is device mal-functioning.
> How about we just reset devices instead? Something like
> the below (untested).
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 5ce2aa4..0769941 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -269,6 +269,17 @@ static int virtio_dev_remove(struct device *_d)
>  	return 0;
>  }
>  
> +static void virtio_dev_shutdown(struct device *_d)
> +{
> +	struct virtio_device *dev = dev_to_virtio(_d);
> +	/*
> +	 * Reset the device to make it stop sending interrupts, DMA, etc.
> +	 * We are shutting down, no need for full cleanup.
> +	 */
> +	dev->config->reset(dev);
> +
> +}
> +
>  static struct bus_type virtio_bus = {
>  	.name  = "virtio",
>  	.match = virtio_dev_match,
> @@ -276,6 +288,7 @@ static struct bus_type virtio_bus = {
>  	.uevent = virtio_uevent,
>  	.probe = virtio_dev_probe,
>  	.remove = virtio_dev_remove,
> +	.shutdown = virtio_dev_shutdown,
>  };
>  
>  bool virtio_device_is_legacy_only(struct virtio_device_id id)


* Re: [PATCH] virtio: Remove virtio device during shutdown
  2015-03-11 10:11   ` Fam Zheng
@ 2015-03-12 16:22     ` Michael S. Tsirkin
  2015-03-12 16:39       ` Paolo Bonzini
  2015-03-12 23:35       ` Fam Zheng
  0 siblings, 2 replies; 7+ messages in thread
From: Michael S. Tsirkin @ 2015-03-12 16:22 UTC (permalink / raw)
  To: Fam Zheng
  Cc: linux-kernel, Rusty Russell, virtualization, Paolo Bonzini, Jason Wang

On Wed, Mar 11, 2015 at 06:11:35PM +0800, Fam Zheng wrote:
> On Wed, 03/11 10:06, Michael S. Tsirkin wrote:
> > On Wed, Mar 11, 2015 at 04:09:17PM +0800, Fam Zheng wrote:
> > > Currently shutdown is nop for virtio devices, but the core code could
> > > remove things behind us such as MSI-X handler etc. For example in the
> > > case of virtio-scsi-pci, the device may still try to send interrupts,
> > > which will be on IRQ lines seeing MSI-X disabled. Those interrupts will
> > > be unhandled, and may cause flood.
> 
> Here is the problem I want to solve - file system driver hang:
> 
> If fs code happens to hit __wait_on_buffer right after pci_device_shutdown
> disabled msix, it will never make progress because the requests it waits for
> will never be completed. So the system hangs.

Paolo says that pci reset of virtio scsi device guarantees
that all outstanding requests complete.

If true and implemented correctly, I don't see what else
needs to be done.

You will need to debug this some more.


> In other words we will want to reset virtio device before pci_device_shutdown
> AND wake up all waiters.
> 
> Unfortunately, neither your patch nor mine does that, because the virtio bus can
> be shut down after the pci bus (thanks to Jason for pointing this out). In that case,
> any completion after disabling msix is lost.
> 
> Maybe we need both the pci shutdown handler to reset the device and the virtio
> shutdown handler to remove the device?
> 
> Fam
> 
> > 
> > This sounds very tentative. Do you, in fact, observe some problems
> > with virtio scsi? How can they be reproduced? This needs to go
> > into the commit message.
> 
> OK, my bad.
> 
> > 
> > > Remove the device in "shutdown" callback to allow device drivers clean
> > > up things.
> > > 
> > > Signed-off-by: Fam Zheng <famz@redhat.com>
> > 
> > I'm concerned this will cause more hangs on shutdown: one
> > of the reasons for reboot is device mal-functioning.
> > How about we just reset devices instead? Something like
> > the below (untested).
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 5ce2aa4..0769941 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -269,6 +269,17 @@ static int virtio_dev_remove(struct device *_d)
> >  	return 0;
> >  }
> >  
> > +static void virtio_dev_shutdown(struct device *_d)
> > +{
> > +	struct virtio_device *dev = dev_to_virtio(_d);
> > +	/*
> > +	 * Reset the device to make it stop sending interrupts, DMA, etc.
> > +	 * We are shutting down, no need for full cleanup.
> > +	 */
> > +	dev->config->reset(dev);
> > +
> > +}
> > +
> >  static struct bus_type virtio_bus = {
> >  	.name  = "virtio",
> >  	.match = virtio_dev_match,
> > @@ -276,6 +288,7 @@ static struct bus_type virtio_bus = {
> >  	.uevent = virtio_uevent,
> >  	.probe = virtio_dev_probe,
> >  	.remove = virtio_dev_remove,
> > +	.shutdown = virtio_dev_shutdown,
> >  };
> >  
> >  bool virtio_device_is_legacy_only(struct virtio_device_id id)


* Re: [PATCH] virtio: Remove virtio device during shutdown
  2015-03-12 16:22     ` Michael S. Tsirkin
@ 2015-03-12 16:39       ` Paolo Bonzini
  2015-03-12 23:35       ` Fam Zheng
  1 sibling, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2015-03-12 16:39 UTC (permalink / raw)
  To: Michael S. Tsirkin, Fam Zheng
  Cc: linux-kernel, Rusty Russell, virtualization, Jason Wang



On 12/03/2015 17:22, Michael S. Tsirkin wrote:
> On Wed, Mar 11, 2015 at 06:11:35PM +0800, Fam Zheng wrote:
>> On Wed, 03/11 10:06, Michael S. Tsirkin wrote:
>>> On Wed, Mar 11, 2015 at 04:09:17PM +0800, Fam Zheng wrote:
>>>> Currently shutdown is nop for virtio devices, but the core code could
>>>> remove things behind us such as MSI-X handler etc. For example in the
>>>> case of virtio-scsi-pci, the device may still try to send interrupts,
>>>> which will be on IRQ lines seeing MSI-X disabled. Those interrupts will
>>>> be unhandled, and may cause flood.
>>
>> Here is the problem I want to solve - file system driver hang:
>>
>> If fs code happens to hit __wait_on_buffer right after pci_device_shutdown
>> disabled msix, it will never make progress because the requests it waits for
>> will never be completed. So the system hangs.
> 
> Paolo says that pci reset of virtio scsi device guarantees
> that all outstanding requests complete.

For what it's worth, see here:

static void virtio_scsi_reset(VirtIODevice *vdev)
{
    ...
    s->resetting++;
    qbus_reset_all(&s->bus.qbus);
    s->resetting--;
    ...
}

static void scsi_disk_reset(DeviceState *dev)
{
    SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev.qdev, dev);
    uint64_t nb_sectors;

    scsi_device_purge_requests(&s->qdev, SENSE_CODE(RESET));
    ...
}

Paolo

> If true and implemented correctly, I don't see what else
> needs to be done.
> 
> You will need to debug this some more.
> 
> 
>> In other words we will want to reset virtio device before pci_device_shutdown
>> AND wake up all waiters.
>>
>> Unfortunately, neither your patch nor mine does that, because the virtio bus can
>> be shut down after the pci bus (thanks to Jason for pointing this out). In that case,
>> any completion after disabling msix is lost.
>>
>> Maybe we need both the pci shutdown handler to reset the device and the virtio
>> shutdown handler to remove the device?
>>
>> Fam
>>
>>>
>>> This sounds very tentative. Do you, in fact, observe some problems
>>> with virtio scsi? How can they be reproduced? This needs to go
>>> into the commit message.
>>
>> OK, my bad.
>>
>>>
>>>> Remove the device in "shutdown" callback to allow device drivers clean
>>>> up things.
>>>>
>>>> Signed-off-by: Fam Zheng <famz@redhat.com>
>>>
>>> I'm concerned this will cause more hangs on shutdown: one
>>> of the reasons for reboot is device mal-functioning.
>>> How about we just reset devices instead? Something like
>>> the below (untested).
>>>
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>
>>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>>> index 5ce2aa4..0769941 100644
>>> --- a/drivers/virtio/virtio.c
>>> +++ b/drivers/virtio/virtio.c
>>> @@ -269,6 +269,17 @@ static int virtio_dev_remove(struct device *_d)
>>>  	return 0;
>>>  }
>>>  
>>> +static void virtio_dev_shutdown(struct device *_d)
>>> +{
>>> +	struct virtio_device *dev = dev_to_virtio(_d);
>>> +	/*
>>> +	 * Reset the device to make it stop sending interrupts, DMA, etc.
>>> +	 * We are shutting down, no need for full cleanup.
>>> +	 */
>>> +	dev->config->reset(dev);
>>> +
>>> +}
>>> +
>>>  static struct bus_type virtio_bus = {
>>>  	.name  = "virtio",
>>>  	.match = virtio_dev_match,
>>> @@ -276,6 +288,7 @@ static struct bus_type virtio_bus = {
>>>  	.uevent = virtio_uevent,
>>>  	.probe = virtio_dev_probe,
>>>  	.remove = virtio_dev_remove,
>>> +	.shutdown = virtio_dev_shutdown,
>>>  };
>>>  
>>>  bool virtio_device_is_legacy_only(struct virtio_device_id id)


* Re: [PATCH] virtio: Remove virtio device during shutdown
  2015-03-12 16:22     ` Michael S. Tsirkin
  2015-03-12 16:39       ` Paolo Bonzini
@ 2015-03-12 23:35       ` Fam Zheng
  2015-03-13 14:17         ` Michael S. Tsirkin
  1 sibling, 1 reply; 7+ messages in thread
From: Fam Zheng @ 2015-03-12 23:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Rusty Russell, virtualization, Paolo Bonzini, Jason Wang

On Thu, 03/12 17:22, Michael S. Tsirkin wrote:
> On Wed, Mar 11, 2015 at 06:11:35PM +0800, Fam Zheng wrote:
> > On Wed, 03/11 10:06, Michael S. Tsirkin wrote:
> > > On Wed, Mar 11, 2015 at 04:09:17PM +0800, Fam Zheng wrote:
> > > > Currently shutdown is nop for virtio devices, but the core code could
> > > > remove things behind us such as MSI-X handler etc. For example in the
> > > > case of virtio-scsi-pci, the device may still try to send interrupts,
> > > > which will be on IRQ lines seeing MSI-X disabled. Those interrupts will
> > > > be unhandled, and may cause flood.
> > 
> > Here is the problem I want to solve - file system driver hang:
> > 
> > If fs code happens to hit __wait_on_buffer right after pci_device_shutdown
> > disabled msix, it will never make progress because the requests it waits for
> > will never be completed. So the system hangs.
> 
> Paolo says that pci reset of virtio scsi device guarantees
> that all outstanding requests complete.
> 
> If true and implemented correctly, I don't see what else
> needs to be done.
> 
> You will need to debug this some more.

First of all, I was wrong about the fs driver above; scratch that. Sorry for
the confusion.

Regarding the hang in shutdown, Ulrich Obergfell has already pointed out that
the vcpu is "busy/stuck in interrupt processing":

https://bugzilla.redhat.com/attachment.cgi?id=998391 (RHBZ 1199155)

Summary: The reason it is stuck is that an IRQ from virtio-scsi-pci is not
handled. Why is there that IRQ? Because pci core code disabled msix. Why is it
not handled?  Because it's done behind virtio-scsi, who still is waiting for
msix.

"Hence, the interrupt will not be acknowledged and the guest becomes flooded
with IRQ 11 interrupt."
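
The flood described in the summary can be pictured with a small userspace
model. This is only a sketch under stated assumptions, not kernel code: every
name below (toy_dev, toy_dev_complete_request, toy_ack_handler,
toy_deliver_intx) is hypothetical, and the delivery cap exists purely so the
model terminates. It assumes the device falls back to a level-triggered
legacy line once MSI-X is off, and that the line stays asserted until some
handler acknowledges the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_dev {
	bool msix_enabled;  /* cleared when the modelled PCI core shuts down */
	bool intx_asserted; /* level-triggered legacy line, e.g. IRQ 11 */
};

/* A handler returns true if it acknowledged the device. */
typedef bool (*toy_irq_handler)(struct toy_dev *dev);

/* Device side: a completion raises whichever interrupt type is enabled. */
void toy_dev_complete_request(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (!dev->msix_enabled)
		dev->intx_asserted = true; /* fall back to the legacy line */
	/* MSI-X delivery is message-based and is not modelled here. */
}

/* A handler that acknowledges the device, which deasserts the line. */
bool toy_ack_handler(struct toy_dev *dev)
{
	assert(dev != NULL);
	dev->intx_asserted = false;
	return true;
}

/*
 * Guest side: a level-triggered line is delivered again for as long as it
 * stays asserted. Returns the number of deliveries, capped at
 * max_deliveries so the model terminates.
 */
unsigned int toy_deliver_intx(struct toy_dev *dev, toy_irq_handler handler,
			      unsigned int max_deliveries)
{
	unsigned int n = 0;

	assert(dev != NULL);
	while (dev->intx_asserted && n < max_deliveries) {
		n++;
		if (handler)
			handler(dev);
	}
	return n;
}
```

With msix_enabled cleared and no handler passed in, toy_deliver_intx() runs
until it hits the cap, which is the toy equivalent of the guest being flooded.
Passing toy_ack_handler ends the loop after a single delivery.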

Fortunately it's not a livelock for upstream, because of:

    commit 184564efae4d775225c8fe3b762a56956fb1f827
    Author: Zhang Haoyu <zhanghy@sangfor.com>
    Date:   Thu Sep 11 16:47:04 2014 +0800

    kvm: ioapic: conditionally delay irq delivery duringeoi broadcast

But we still should do the shutdown right.

I also propose to not shutdown msix from pci core shutdown if the device
doesn't have shutdown function:

http://www.spinics.net/lists/kernel/msg1944041.html

With that patch applied, the "nop" .shutdown in virtio-pci shouldn't hurt
much.
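
To make the proposal concrete, here is a userspace sketch of the intended
control flow. It is not the linked patch, whose contents are not reproduced
here; it is one reading of the one-sentence description above, and all names
(toy_pci_dev, toy_pci_driver, toy_quiesce, toy_pci_core_shutdown) are
hypothetical stand-ins rather than kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_pci_dev;

/* Hypothetical stand-in for a PCI driver with an optional shutdown hook. */
struct toy_pci_driver {
	void (*shutdown)(struct toy_pci_dev *pdev); /* may be NULL */
};

struct toy_pci_dev {
	const struct toy_pci_driver *driver;
	bool msix_enabled; /* interrupt delivery the driver relies on */
	bool quiesced;     /* set once the driver has stopped the device */
};

/* Example driver hook: stops the device so it raises no more interrupts. */
void toy_quiesce(struct toy_pci_dev *pdev)
{
	assert(pdev != NULL);
	pdev->quiesced = true;
}

/*
 * Modelled core shutdown: tear down MSI-X only when the driver had a
 * chance to quiesce the device first. A driver without a shutdown hook
 * keeps its MSI-X setup, so its interrupts are still handled.
 */
void toy_pci_core_shutdown(struct toy_pci_dev *pdev)
{
	assert(pdev != NULL);
	if (pdev->driver && pdev->driver->shutdown) {
		pdev->driver->shutdown(pdev);
		pdev->msix_enabled = false;
	}
	/* No shutdown hook: leave MSI-X untouched. */
}
```

In this model a device whose driver has no shutdown hook comes out of
toy_pci_core_shutdown() with msix_enabled still set, so the fallback to an
unhandled legacy interrupt never happens.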

Regarding handling the requests, I no longer know whether we really care about them
at shutdown. As you said, waiting for requests may cause more hangs.

Ideas?

Fam


* Re: [PATCH] virtio: Remove virtio device during shutdown
  2015-03-12 23:35       ` Fam Zheng
@ 2015-03-13 14:17         ` Michael S. Tsirkin
  0 siblings, 0 replies; 7+ messages in thread
From: Michael S. Tsirkin @ 2015-03-13 14:17 UTC (permalink / raw)
  To: Fam Zheng
  Cc: linux-kernel, Rusty Russell, virtualization, Paolo Bonzini, Jason Wang

On Fri, Mar 13, 2015 at 07:35:52AM +0800, Fam Zheng wrote:
> On Thu, 03/12 17:22, Michael S. Tsirkin wrote:
> > On Wed, Mar 11, 2015 at 06:11:35PM +0800, Fam Zheng wrote:
> > > On Wed, 03/11 10:06, Michael S. Tsirkin wrote:
> > > > On Wed, Mar 11, 2015 at 04:09:17PM +0800, Fam Zheng wrote:
> > > > > Currently shutdown is nop for virtio devices, but the core code could
> > > > > remove things behind us such as MSI-X handler etc. For example in the
> > > > > case of virtio-scsi-pci, the device may still try to send interrupts,
> > > > > which will be on IRQ lines seeing MSI-X disabled. Those interrupts will
> > > > > be unhandled, and may cause flood.
> > > 
> > > Here is the problem I want to solve - file system driver hang:
> > > 
> > > If fs code happens to hit __wait_on_buffer right after pci_device_shutdown
> > > disabled msix, it will never make progress because the requests it waits for
> > > will never be completed. So the system hangs.
> > 
> > Paolo says that pci reset of virtio scsi device guarantees
> > that all outstanding requests complete.
> > 
> > If true and implemented correctly, I don't see what else
> > needs to be done.
> > 
> > You will need to debug this some more.
> 
> First of all, I was wrong about the fs driver above; scratch that. Sorry for
> the confusion.
> 
> Regarding the hang in shutdown, Ulrich Obergfell has already pointed out that
> the vcpu is "busy/stuck in interrupt processing":
> 
> https://bugzilla.redhat.com/attachment.cgi?id=998391 (RHBZ 1199155)
> 
> Summary: The reason it is stuck is that an IRQ from virtio-scsi-pci is not
> handled. Why is there that IRQ? Because pci core code disabled msix. Why is it
> not handled?  Because it's done behind virtio-scsi, who still is waiting for
> msix.
> 
> "Hence, the interrupt will not be acknowledged and the guest becomes flooded
> with IRQ 11 interrupt."
> 
> Fortunately it's not a livelock for upstream, because of:
> 
>     commit 184564efae4d775225c8fe3b762a56956fb1f827
>     Author: Zhang Haoyu <zhanghy@sangfor.com>
>     Date:   Thu Sep 11 16:47:04 2014 +0800
> 
>     kvm: ioapic: conditionally delay irq delivery duringeoi broadcast
> 
> But we still should do the shutdown right.
> 
> I also propose to not shutdown msix from pci core shutdown if the device
> doesn't have shutdown function:
> 
> http://www.spinics.net/lists/kernel/msg1944041.html

Makes sense.
Can you bounce this one to me please?
I'll ack.

> With that patch applied, the "nop" .shutdown in virtio-pci shouldn't hurt
> much.
> 
> Regarding handling the requests, I no longer know whether we really care about them
> at shutdown. As you said, waiting for requests may cause more hangs.
> 
> Ideas?
> 
> Fam

